Michal Sekletar [Fri, 25 Aug 2017 13:36:10 +0000 (15:36 +0200)]
service: attempt to execute next main command only for oneshot services (#6619)
This commit fixes crash described in
https://github.com/systemd/systemd/issues/6533
Multiple ExecStart lines are allowed only for oneshot services
anyway so it doesn't make sense to call service_run_next_main() with
services of type other than SERVICE_ONESHOT.
Referring back to reproducer from the issue, previously we didn't observe
this problem because s->main_command was reset after daemon-reload hence
we never reached the assert statement in service_run_next_main().
core: add two new special ExecStart= character prefixes
This patch adds two new special character prefixes to ExecStart= and
friends, in addition to the existing "-", "@" and "+":
"!" → much like "+", except with a much reduced effect as it only
disables the actual setresuid()/setresgid()/setgroups() calls, but
leaves all other security features on, including namespace
options. This is very useful in combination with
RuntimeDirectory= or DynamicUser= and similar option, as a user
is still allocated and used for the runtime directory, but the
actual UID/GID dropping is left to the daemon process itself.
This should make RuntimeDirectory= a lot more useful for daemons
which insist on doing their own privilege dropping.
"!!" → Similar to "!", but on systems supporting ambient caps this
becomes a NOP. This makes it relatively straightforward to write
unit files that make use of ambient capabilities to let systemd
drop all privs while retaining compatibility with systems that
lack ambient caps, where priv dropping is the left to the daemon
codes themselves.
This is an alternative approach to #6564 and related PRs.
This new group lists all UID/GID credential changing syscalls (which are
quite a number these days). This will become particularly useful in a
later commit, which uses this group to optionally permit user credential
changing to daemons in case ambient capabilities are not available.
These booleans simply store whether selinux/apparmor/smack are supposed
ot be used, and chache the various mac_xyz_use() calls before we
transition into the namespace, hence let's use the same verb for the
variables and the functions: "use"
Let's merge three if blocks that shall only run when sandboxing is applied
into one.
Note that this changes behaviour in one corner case: PrivateUsers=1 is
now honours both PermissionsStartOnly= and the "+" modifier in
ExecStart=, and not just the former, as before. This was an oversight,
so let's fix this now, at a point in time the option isn't used much
yet.
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
execute: add one more ExecFlags flag, for controlling unconditional directory chowning
Let's decouple the Manager object from the execution logic a bit more
here too, and simply pass along the fact whether we should
unconditionally chown the runtime/... directories via the ExecFlags
field too.
execute: let's decouple execute.c a bit from the unit logic
Let's try to decouple the execution engine a bit from the Unit/Manager
concept, and hence pass one more flag as part of the ExecParameters flags
field.
This adds a per-service restart counter. Each time an automatic
restart is scheduled (due to Restart=) it is increased by one. Its
current value is exposed over the bus as NRestarts=. It is also logged
(in a structured, recognizable way) on each restart.
Note that this really only counts automatic starts triggered by Restart=
(which it nicely complements). Manual restarts will reset the counter,
as will explicit calls to "systemctl reset-failed". It's supposed to be
a tool for measure the automatic restart feature, and nothing else.
Alan Jenkins [Wed, 9 Aug 2017 17:56:26 +0000 (18:56 +0100)]
units: console-getty.service: use the default RestartSec
> Note that console-getty.service as more uses than just containers. The
> idea is that it may be used as alternative to the whole VC/logind stuff,
> if all you need is a console on /dev/console, even on physical devices.
This means we want to remove RestartSec=0, for serial systems.
See 4bf0432 "units/serial-getty@.service: use the default RestartSec".
Alan Jenkins [Mon, 7 Aug 2017 18:24:32 +0000 (19:24 +0100)]
units: add Conflicts=rescue.service to container-getty@.service
The traditional runlevel 1 is "single user mode", and shuts down all but
the main console. In systemd, rescue.target provides runlevel1.target.
But it did not shut down logins on secondary consoles... if systemd was
running in a container.
I don't think we strictly need to change this. But when you look at both
container-getty@.service and getty@.service, you see that both have
IgnoreOnIsolate, but only the latter has Conflicts=rescue.service.
This also makes rescue.target in a container consistent with
emergency.target. In the latter case, the gettys were already stopped,
because they have a Requires dependency on sysinit.target.
Alan Jenkins [Wed, 9 Aug 2017 13:43:41 +0000 (14:43 +0100)]
units/console-getty.service: comment reason for ConditionPathExists
Currently we have 4 getty services. 1 has a BindsTo dependency on a
device unit. 3 have ConditionPathExists, but the reason is different in
every single one.
* Add comment to console-getty@.service (see commit 1b41981d)
* getty@.service is already commented
* container-getty.service is not strictly correct, as I realized while
trying to compose a comment. Reported as #6584.
pam_logind: skip leading /dev/ from PAM_TTY field before passing it on
Apparently, PAM documents that the PAM_TTY should come with a /dev
prefix, but we don't expect it so far, except that Wayland ends up
setting it after all, the way the docs suggest. Hence, let's simply drop
the /dev prefix if it is there.
This new helper removes a leading /dev if there is one. We have code
doing this all over the place, let's unify this, and correct it while
we are at it, by using path_startswith() rather than startswith() to
drop the prefix.
William Douglas [Wed, 9 Aug 2017 15:53:03 +0000 (08:53 -0700)]
tmpfiles: Allow create symlink on directories (#6039)
Currently if tmpfiles is run with force on symlink creation but there already
exists a directory at that location, the creation will fail. This change
updates the behavior to remove the directory with rm_fr and then attempts to
create the symlink again.
tests: when running a manager object in a test, migrate to private cgroup subroot first (#6576)
Without this "meson test" will end up running all tests in the same
cgroup root, and they all will try to manage it. Which usually isn't too
bad, except when they end up clearing up each other's cgroups. This race
is hard to trigger but has caused various CI runs to fail spuriously.
With this change we simply move every test that runs a manager object
into their own private cgroup. Note that we don't clean up the cgroup at
the end, we leave that to the cgroup manager around it.
This fixes races that become visible by test runs throwing out errors
like this:
```
exec-systemcallfilter-failing.service: Passing 0 fds to service
exec-systemcallfilter-failing.service: About to execute: /bin/echo 'This should not be seen'
exec-systemcallfilter-failing.service: Forked /bin/echo as 5693
exec-systemcallfilter-failing.service: Changed dead -> start
exec-systemcallfilter-failing.service: Failed to attach to cgroup /exec-systemcallfilter-failing.service: No such file or directory
Received SIGCHLD from PID 5693 ((echo)).
Child 5693 ((echo)) died (code=exited, status=219/CGROUP)
exec-systemcallfilter-failing.service: Child 5693 belongs to exec-systemcallfilter-failing.service
exec-systemcallfilter-failing.service: Main process exited, code=exited, status=219/CGROUP
exec-systemcallfilter-failing.service: Changed start -> failed
exec-systemcallfilter-failing.service: Unit entered failed state.
exec-systemcallfilter-failing.service: Failed with result 'exit-code'.
exec-systemcallfilter-failing.service: cgroup is empty
Assertion 'service->main_exec_status.status == status_expected' failed at ../src/src/test/test-execute.c:71, function check(). Aborting.
```
bengal [Tue, 8 Aug 2017 16:55:31 +0000 (18:55 +0200)]
dhcp-network: adjust sockaddr length for addresses longer than 8 bytes (#6527)
An infiniband hardware address is 20 bytes, but sockaddr_ll.sll_addr is only 8
bytes. Explicitly ensure that sockaddr_union has enough space for infiniband
addresses, even if they run over sockaddr_ll and add a macro to compute the
proper size to pass to kernel.
sd-bus: free everything when bus_set_address_user fails (#6552)
Fixes:
```
$ env -i valgrind --leak-check=full ./build/test-bus-chat
...
==7763== 1,888 (1,824 direct, 64 indirect) bytes in 1 blocks are
definitely lost in loss record 2 of 2
==7763== at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==7763== by 0x4F8FF9A: sd_bus_new (sd-bus.c:175)
==7763== by 0x4F938BF: sd_bus_open_user (sd-bus.c:1138)
==7763== by 0x109ACD: server_init (test-bus-chat.c:70)
==7763== by 0x10BCF8: main (test-bus-chat.c:526)
==7763==
```
The directories are only used by the specific services, and
created before the services are started. So, it is not necessary
to create them by systemd-tmpfiles.
Luca Bruno [Sun, 6 Aug 2017 13:24:24 +0000 (13:24 +0000)]
core: evaluate presets after generators have run (#6526)
This commit moves the first-boot system preset-settings evaluation out
of main and into the manager startup logic itself. Notably, it reverses
the order between generators and presets evaluation, so that any changes
performed by first-boot generators are taken into the account by presets
logic.
After this change, units created by a generator can be enabled as part
of a preset.
I hit a test failure with the `max_gid+1` test. Problem is that we loop
over 0..r, but set `r` again within the loop (to 1). So max_gid is only
set based on the first supplementary GID.
dkg [Sat, 5 Aug 2017 23:19:09 +0000 (19:19 -0400)]
man: document socket requirement for systemd-socket-proxyd (#6535)
Without this requirement, if proxy-to-nginx.socket was down, and the sysadmin
were to do:
systemctl start proxy-to-nginx.service
then the service would come up without a configured socket, which doesn't make
sense. Normally this isn't how we expect a socket-activated service to start,
but it's possible for an admin to do this (if the .socket were already running,
the systemd-socket-proxyd process will start effectively idle). But the
.service shouldn't end up in a broken state if the .socket isn't already
listening.
Adding the explicit Requires: should ensure that an admin with this
configuration state can't accidentally break their system.
Martin Pitt [Fri, 4 Aug 2017 12:34:14 +0000 (14:34 +0200)]
test: Factorize common integration test functions (#6540)
All test/TEST* but TEST-02-CRYPTSETUP share the same check_result_qemu()
and test_cleanup(), so move them into test_functions and only override
them in TEST-02-CRYPTSETUP.
Also provide a common test_run() which by default assumes that both QEMU
and nspawn tests are run. Particular tests which don't support either
need to explicitly opt out by setting $TEST_NO_{QEMU,NSPAWN}. Do it this
way around to avoid accidentally forgetting to opt in, and to encourage
test authors to at least always support nspawn.
resolved,nss-myhostname: use _gateway for the gateway
This changes the symbolic name for the default gateway from "gateway" to
"_gateway". A new configuration option -Dcompat-gateway-hostname=true|false
is added. If it is set, the old name is also supported, but the new name
is used as the canonical name in either case. This is intended as a temporary
measure to make the transition easier, and the option should be removed
after a few releases, at which point only the new name will be used.
The old "gateway" name mostly works OK, but hasn't gained widespread acceptance
because of the following (potential) conflicts:
- it is completely legal to have a host called "gateway"
- there is no guarantee that "gateway" will not be registered as a TLD, even
though this currently seems unlikely. (Even then, there would be no
conflict except for the case when the top-level domain itself was being resolved.
The "gateway" or "_gateway" labels have only special meaning when the
whole name consists of a single label, so resolution of any subdomain
of the hypothetical gateway. TLD would still work OK. )
Moving to "_gateway" avoids those issues because underscores are not allowed
in host names (RFC 1123, §2.1) and avoids potential conflicts with local or
global names.
v2:
- simplify the logic to hardcode "_gateway" and allow
-Dcompat-gateway-hostname=true as a temporary measure.