This new group lists all UID/GID credential changing syscalls (which are
quite a number these days). This will become particularly useful in a
later commit, which uses this group to optionally permit user credential
changing to daemons in case ambient capabilities are not available.
These booleans simply store whether selinux/apparmor/smack are supposed
ot be used, and chache the various mac_xyz_use() calls before we
transition into the namespace, hence let's use the same verb for the
variables and the functions: "use"
Let's merge three if blocks that shall only run when sandboxing is applied
into one.
Note that this changes behaviour in one corner case: PrivateUsers=1 is
now honours both PermissionsStartOnly= and the "+" modifier in
ExecStart=, and not just the former, as before. This was an oversight,
so let's fix this now, at a point in time the option isn't used much
yet.
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
execute: add one more ExecFlags flag, for controlling unconditional directory chowning
Let's decouple the Manager object from the execution logic a bit more
here too, and simply pass along the fact whether we should
unconditionally chown the runtime/... directories via the ExecFlags
field too.
execute: let's decouple execute.c a bit from the unit logic
Let's try to decouple the execution engine a bit from the Unit/Manager
concept, and hence pass one more flag as part of the ExecParameters flags
field.
This adds a per-service restart counter. Each time an automatic
restart is scheduled (due to Restart=) it is increased by one. Its
current value is exposed over the bus as NRestarts=. It is also logged
(in a structured, recognizable way) on each restart.
Note that this really only counts automatic starts triggered by Restart=
(which it nicely complements). Manual restarts will reset the counter,
as will explicit calls to "systemctl reset-failed". It's supposed to be
a tool for measure the automatic restart feature, and nothing else.
Alan Jenkins [Wed, 9 Aug 2017 17:56:26 +0000 (18:56 +0100)]
units: console-getty.service: use the default RestartSec
> Note that console-getty.service as more uses than just containers. The
> idea is that it may be used as alternative to the whole VC/logind stuff,
> if all you need is a console on /dev/console, even on physical devices.
This means we want to remove RestartSec=0, for serial systems.
See 4bf0432 "units/serial-getty@.service: use the default RestartSec".
Alan Jenkins [Mon, 7 Aug 2017 18:24:32 +0000 (19:24 +0100)]
units: add Conflicts=rescue.service to container-getty@.service
The traditional runlevel 1 is "single user mode", and shuts down all but
the main console. In systemd, rescue.target provides runlevel1.target.
But it did not shut down logins on secondary consoles... if systemd was
running in a container.
I don't think we strictly need to change this. But when you look at both
container-getty@.service and getty@.service, you see that both have
IgnoreOnIsolate, but only the latter has Conflicts=rescue.service.
This also makes rescue.target in a container consistent with
emergency.target. In the latter case, the gettys were already stopped,
because they have a Requires dependency on sysinit.target.
Alan Jenkins [Wed, 9 Aug 2017 13:43:41 +0000 (14:43 +0100)]
units/console-getty.service: comment reason for ConditionPathExists
Currently we have 4 getty services. 1 has a BindsTo dependency on a
device unit. 3 have ConditionPathExists, but the reason is different in
every single one.
* Add comment to console-getty@.service (see commit 1b41981d)
* getty@.service is already commented
* container-getty.service is not strictly correct, as I realized while
trying to compose a comment. Reported as #6584.
William Douglas [Wed, 9 Aug 2017 15:53:03 +0000 (08:53 -0700)]
tmpfiles: Allow create symlink on directories (#6039)
Currently if tmpfiles is run with force on symlink creation but there already
exists a directory at that location, the creation will fail. This change
updates the behavior to remove the directory with rm_fr and then attempts to
create the symlink again.
tests: when running a manager object in a test, migrate to private cgroup subroot first (#6576)
Without this "meson test" will end up running all tests in the same
cgroup root, and they all will try to manage it. Which usually isn't too
bad, except when they end up clearing up each other's cgroups. This race
is hard to trigger but has caused various CI runs to fail spuriously.
With this change we simply move every test that runs a manager object
into their own private cgroup. Note that we don't clean up the cgroup at
the end, we leave that to the cgroup manager around it.
This fixes races that become visible by test runs throwing out errors
like this:
```
exec-systemcallfilter-failing.service: Passing 0 fds to service
exec-systemcallfilter-failing.service: About to execute: /bin/echo 'This should not be seen'
exec-systemcallfilter-failing.service: Forked /bin/echo as 5693
exec-systemcallfilter-failing.service: Changed dead -> start
exec-systemcallfilter-failing.service: Failed to attach to cgroup /exec-systemcallfilter-failing.service: No such file or directory
Received SIGCHLD from PID 5693 ((echo)).
Child 5693 ((echo)) died (code=exited, status=219/CGROUP)
exec-systemcallfilter-failing.service: Child 5693 belongs to exec-systemcallfilter-failing.service
exec-systemcallfilter-failing.service: Main process exited, code=exited, status=219/CGROUP
exec-systemcallfilter-failing.service: Changed start -> failed
exec-systemcallfilter-failing.service: Unit entered failed state.
exec-systemcallfilter-failing.service: Failed with result 'exit-code'.
exec-systemcallfilter-failing.service: cgroup is empty
Assertion 'service->main_exec_status.status == status_expected' failed at ../src/src/test/test-execute.c:71, function check(). Aborting.
```
bengal [Tue, 8 Aug 2017 16:55:31 +0000 (18:55 +0200)]
dhcp-network: adjust sockaddr length for addresses longer than 8 bytes (#6527)
An infiniband hardware address is 20 bytes, but sockaddr_ll.sll_addr is only 8
bytes. Explicitly ensure that sockaddr_union has enough space for infiniband
addresses, even if they run over sockaddr_ll and add a macro to compute the
proper size to pass to kernel.
sd-bus: free everything when bus_set_address_user fails (#6552)
Fixes:
```
$ env -i valgrind --leak-check=full ./build/test-bus-chat
...
==7763== 1,888 (1,824 direct, 64 indirect) bytes in 1 blocks are
definitely lost in loss record 2 of 2
==7763== at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==7763== by 0x4F8FF9A: sd_bus_new (sd-bus.c:175)
==7763== by 0x4F938BF: sd_bus_open_user (sd-bus.c:1138)
==7763== by 0x109ACD: server_init (test-bus-chat.c:70)
==7763== by 0x10BCF8: main (test-bus-chat.c:526)
==7763==
```
The directories are only used by the specific services, and
created before the services are started. So, it is not necessary
to create them by systemd-tmpfiles.
Luca Bruno [Sun, 6 Aug 2017 13:24:24 +0000 (13:24 +0000)]
core: evaluate presets after generators have run (#6526)
This commit moves the first-boot system preset-settings evaluation out
of main and into the manager startup logic itself. Notably, it reverses
the order between generators and presets evaluation, so that any changes
performed by first-boot generators are taken into the account by presets
logic.
After this change, units created by a generator can be enabled as part
of a preset.
I hit a test failure with the `max_gid+1` test. Problem is that we loop
over 0..r, but set `r` again within the loop (to 1). So max_gid is only
set based on the first supplementary GID.
dkg [Sat, 5 Aug 2017 23:19:09 +0000 (19:19 -0400)]
man: document socket requirement for systemd-socket-proxyd (#6535)
Without this requirement, if proxy-to-nginx.socket was down, and the sysadmin
were to do:
systemctl start proxy-to-nginx.service
then the service would come up without a configured socket, which doesn't make
sense. Normally this isn't how we expect a socket-activated service to start,
but it's possible for an admin to do this (if the .socket were already running,
the systemd-socket-proxyd process will start effectively idle). But the
.service shouldn't end up in a broken state if the .socket isn't already
listening.
Adding the explicit Requires: should ensure that an admin with this
configuration state can't accidentally break their system.
Martin Pitt [Fri, 4 Aug 2017 12:34:14 +0000 (14:34 +0200)]
test: Factorize common integration test functions (#6540)
All test/TEST* but TEST-02-CRYPTSETUP share the same check_result_qemu()
and test_cleanup(), so move them into test_functions and only override
them in TEST-02-CRYPTSETUP.
Also provide a common test_run() which by default assumes that both QEMU
and nspawn tests are run. Particular tests which don't support either
need to explicitly opt out by setting $TEST_NO_{QEMU,NSPAWN}. Do it this
way around to avoid accidentally forgetting to opt in, and to encourage
test authors to at least always support nspawn.
resolved,nss-myhostname: use _gateway for the gateway
This changes the symbolic name for the default gateway from "gateway" to
"_gateway". A new configuration option -Dcompat-gateway-hostname=true|false
is added. If it is set, the old name is also supported, but the new name
is used as the canonical name in either case. This is intended as a temporary
measure to make the transition easier, and the option should be removed
after a few releases, at which point only the new name will be used.
The old "gateway" name mostly works OK, but hasn't gained widespread acceptance
because of the following (potential) conflicts:
- it is completely legal to have a host called "gateway"
- there is no guarantee that "gateway" will not be registered as a TLD, even
though this currently seems unlikely. (Even then, there would be no
conflict except for the case when the top-level domain itself was being resolved.
The "gateway" or "_gateway" labels have only special meaning when the
whole name consists of a single label, so resolution of any subdomain
of the hypothetical gateway. TLD would still work OK. )
Moving to "_gateway" avoids those issues because underscores are not allowed
in host names (RFC 1123, §2.1) and avoids potential conflicts with local or
global names.
v2:
- simplify the logic to hardcode "_gateway" and allow
-Dcompat-gateway-hostname=true as a temporary measure.
alloc-util: add new helpers memdup_suffix0() and newdup_suffix0()
These are similar to memdup() and newdup(), but reserve one extra NUL
byte at the end of the new allocation and initialize it. It's useful
when copying out data from fixed size character arrays where NUL
termination can't be assumed.
execute: don't pass unit ID in --user mode to journald for stream logging
When we create a log stream connection to journald, we pass along the
unit ID. With this change we do this only when we run as system
instance, not as user instance, to remove the ambiguity whether a user
or system unit is specified. The effect of this change is minor:
journald ignores the field anyway from clients with UID != 0. This patch
hence only fixes the unit attribution for the --user instance of the
root user.
Checking for validity of a PID is relatively easy, but let's add a
helper cal for this too, in order to make things more readable and more
similar to uid_is_valid(), gid_is_valid() and friends.
audit: introduce audit_session_is_valid() and make use of it everywhere
Let's add a proper validation function, since validation isn't entirely
trivial. Make use of it where applicable. Also make use of
AUDIT_SESSION_INVALID where we need a marker for an invalid audit
session.
The long man page paragraph got it right: the tool is for escaping systemd unit
names, not just system unit names. Also fix the short man page paragraph
and the --help text.
Nicolas Iooss [Mon, 31 Jul 2017 15:45:33 +0000 (17:45 +0200)]
namespace: keep selinuxfs mounted read-write with ProtectKernelTunables (#5741)
When a service unit uses "ProtectKernelTunables=yes", it currently
remounts /sys/fs/selinux read-only. This makes libselinux report SELinux
state as "disabled", because most SELinux features are not usable. For
example it is not possible to validate security contexts (with
security_check_context_raw() or /sys/fs/selinux/context). This behavior
of libselinux has been described in
http://danwalsh.livejournal.com/73099.html and confirmed in a recent
email, https://marc.info/?l=selinux&m=149220233032594&w=2 .
Since commit 0c28d51ac849 ("units: further lock down our long-running
services"), systemd-localed unit uses ProtectKernelTunables=yes.
Nevertheless this service needs to use libselinux API in order to create
/etc/vconsole.conf, /etc/locale.conf... with the right SELinux contexts.
This is broken when /sys/fs/selinux is mounted read-only in the mount
namespace of the service.
Make SELinux-aware systemd services work again when they are using
ProtectKernelTunables=yes by keeping selinuxfs mounted read-write.
core: Do not fail perpetual mount units without fragment (#6459)
mount_load does not require fragment files to be present in order to
load mount units which are perpetual, or come from /proc/self/mountinfo.
mount_verify should do the same, otherwise a synthesized '-.mount' would
be marked as failed with "No such file or directory", as it is perpetual
but not marked to come from /proc/self/mountinfo at this point.
This happens for the user instance, and I suspect it was the cause of #5375
for the system instance, without gpt-generator.