man: significantly downgrade the Options section in systemd(1)
This structure of the man page originates from the time when systemd was
installed on top of sysvinit systems, and users had an actual chance to
interact with the systemd binary directly. Nowadays it is almost never called
directly, so let's properly explain this in the overview.
The Options section is moved down below the kernel command line, those options
are only needed in special circumstances. Let's refer the reader to the
description of the kernel command line options, and not duplicate the
descriptions (which makes the text longer than necessary and increases chances
for discrepancies).
Systemd is also prominently used as the user manager, let's mention that in the
Overview.
While at it, use "=" only when an argument is required as we nowadays do.
This makes the ask-password agent handling more alike the polkit agent
handling again, and introduces ask_password_agent_open_if_enabled() that
works just like the already existing polkit_agent_open_if_enabled().
Torsten Hilbrich [Tue, 12 Nov 2019 07:36:06 +0000 (08:36 +0100)]
nspawn: Allow Capability= to overrule private network setting
The commit:
a3fc6b55ac nspawn: mask out CAP_NET_ADMIN again if settings file turns off private networking
turned off the CAP_NET_ADMIN capability whenever no private networking
feature was enabled. This broke configurations where the CAP_NET_ADMIN
capability was explicitly requested in the configuration.
Changing the order of evalution here to allow the Capability= setting
to overrule this implicit setting:
Order of evaluation:
1. if no private network setting is enabled, CAP_NET_ADMIN is removed
2. if a private network setting is enabled, CAP_NET_ADMIN is added
3. the settings of Capability= are added
4. the settings of DropCapability= are removed
This allows the fix for #11755 to be retained and to still allow the
admin to specify CAP_NET_ADMIN as additional capability.
Kevin Kuehler [Thu, 14 Nov 2019 00:56:23 +0000 (16:56 -0800)]
units: set ProtectKernelLogs=yes on relevant units
We set ProtectKernelLogs=yes on all long running services except for
udevd, since it accesses /dev/kmsg, and journald, since it calls syslog
and accesses /dev/kmsg.
If we fail to start polkit, we get a message like
"org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate remote peer.",
which has no meaning for the caller of our StartUnit method. Let's just
return -EACCES.
$ systemctl start apache
Failed to start apache.service: Could not activate remote peer. (before)
Failed to start apache.service: Access denied (after)
test: Disable LUKS devices from initramfs in QEMU tests
We currently use the host's kernel and initramfs in our QEMU tests.
If the host is running on an encrypted LUKS partition, then the initramfs
will have a crypttab setup looking for the particular root disk it needs to
encrypt before booting into the system.
However, this disk obviously doesn't exist in our QEMU VM, so it turns out
our tests end up waiting for this device to become available, which will
never actually happen, and boot hangs for 90s until that service times out.
[*** ] A start job is running for /dev/disk/by-uuid/01234567-abcd-1234-abcd-0123456789ab (20s / 1min 30s)
In order to prevent this issue, let's pass "rd.luks=0" to disable LUKS in
the initramfs only as part of our default kernel command-line in our QEMU
tests.
This is enough to disable this behavior and prevent the timeout, while at
the same time doesn't conflict with our tests that actually check for LUKS
behavior in the systemd running under test (such as TEST-02-CRYPTSETUP).
Tested: `sudo make -C TEST-02-CRYPTSETUP/ clean setup run`
Be more specific in resolved.conf man page with regard to DNSOverTLS
DNSOverTLS in strict mode (value yes) does check the server, as it is said in
the first few lines of the option documentation. The check is not performed in
"opportunistic" mode, however, as that is allowed by RFC 7858, section "4.1.
Opportunistic Privacy Profile".
> With such a discovered DNS server, the client might or might not validate the
> resolver. These choices maximize availability and performance, but they leave
> the client vulnerable to on-path attacks that remove privacy.
This partially reverts db11487d1062655f17db54c4d710653f16c87313 (the logic to
calculate the correct value is removed, we always use the same setting as for
the system manager). Distributions have an easy mechanism to override this if
they wish.
I think making this configurable is better, because different distros clearly
want different defaults here, and making this configurable is nice and clean.
If we don't make it configurable, distros which either have to carry patches,
or what would be worse, rely on some other configuration mechanism, like
/etc/profile. Those other solutions do not apply everywhere (they usually
require the shell to be used at some point), so it is better if we provide
a nice way to override the default.
HATAYAMA Daisuke [Wed, 13 Nov 2019 11:30:58 +0000 (06:30 -0500)]
verify: fix segmentation fault
systemd-analyze verify command now results in segmentation fault if two
consecutive non-existent unit file names are given:
# ./build/systemd-analyze a.service b.service
...<snip irrelevant part>...
Unit a.service not found.
Unit b.service not found.
Segmentation fault (core dumped)
The cause of this is a wrong handling of return value of
manager_load_startable_unit_or_warn() in verify_units() in failure case.
It looks that the current logic wants to assign the first error status
throughout verify_units() into variable r and count up variable count only when
a given unit file exists.
However, due to the wrong handling of the return value of
manager_load_startable_unit_or_warn() in verify_units(), the variable count is
unexpectedly incremented even when there is no such unit file because the
variable r already contains non-zero value in the 2nd failure, set by the 1st
failure, and then the condition k < 0 && r == 0 evaluates to false.
This commit fixes the wrong handling of return value of
manager_load_startable_unit_or_warn() in verify_units().
cryptsetup-generator: allow overriding /run/systemd/cryptsetup with $RUNTIME_DIRECTORY
I added a fairly vague entry to docs/ENVIRONMENT because I think it is worth
mentioning there (in case someone is looking for any environment variable that
might be relevant).
nspawn: do not emit any warning when $UNIFIED_CGROUP_HIERARCHY is used
Initially I thought this is a good idea, but when reviewing a different PR
(https://github.com/systemd/systemd/pull/13862#discussion_r340604313) I changed
my mind about this. At some point we probably should start warning about the
old option name, and yet later remove it. But it'll make it easier for people
to transition to the new option name if there's a period of support for both
names without any fuss. There's nothing particularly wrong about the old name,
and there is no support cost.
Martin Wilck [Tue, 12 Nov 2019 15:43:42 +0000 (16:43 +0100)]
udevd: fix crash when workers time out after exit is signal caught
If udevd receives an exit signal, it releases its reference on the udev
monitor in manager_exit(). If at this time a worker is hanging, and if
the event timeout for this worker expires before udevd exits, udevd
crashes in on_sigchld()->udev_monitor_send_device(), because the monitor
has already been freed.
Fix this by releasing the main process's monitor ref later, in
manager_free().
Franck Bui [Fri, 18 Oct 2019 10:44:51 +0000 (12:44 +0200)]
logind: fix (again) the race that might happen when logind restores VT
This patch is a new attempt to fix the race originally described in issue #9754.
The initial fix (commit ad96887a1205bad9656d280c5681f482e6d04838) consisted in
spawning a sub process that became the controlling process of the VT and hence
kicked the old controlling process off to make sure that the VT wouldn't have
entered in HUP state while logind restored the VT.
But it introduced a regression (see issue #11269) and thus was reverted. But
unlike it was described in the revert commit message, commit adb8688b3ff445d9c48ed0d72208c7844c2acc01 alone doen't fix the initial race.
This patch fixes the race in a simpler way by trying to restore the VT a second
time after making sure to re-open it if the first attempt fails.
Indeed if the old controlling process dies before or during the first attempt,
logind will fail to restore the VT. At this point the VT is in HUP state but
we're sure that it won't enter in a HUP state a second time. Therefore we will
retry by re-opening the VT to clear the HUP state and by restoring the VT a
second time, which should be safe this time.
Martin Wilck [Wed, 6 Nov 2019 11:24:41 +0000 (12:24 +0100)]
udevd: wait for workers to finish when exiting
On some systems with lots of devices, device probing for certain drivers can
take a very long time. If systemd-udevd detects a timeout and kills the worker
running modprobe using SIGKILL, some devices will not be probed, or end up in
unusable state. The --event-timeout option can be used to modify the maximum
time spent in an uevent handler. But if systemd-udevd exits, it uses a
different timeout, hard-coded to 30s, and exits when this timeout expires,
causing all workers to be KILLed by systemd afterwards. In practice, this may
lead to workers being killed after significantly less time than specified with
the event-timeout. This is particularly significant during initrd processing:
systemd-udevd will be stopped by systemd when initrd-switch-root.target is
about to be isolated, which usually happens quickly after finding and mounting
the root FS.
If systemd-udevd is started by PID 1 (i.e. basically always), systemd will
kill both udevd and the workers after expiry of TimeoutStopSec. This is
actually better than the built-in udevd timeout, because it's more transparent
and configurable for users. This way users can avoid the mentioned boot problem
by simply increasing StopTimeoutSec= in systemd-udevd.service.
If udevd is not started by systemd (standalone), this is still an
improvement. udevd will kill hanging workers when the event timeout is
reached, which is configurable via the udev.event_timeout= kernel
command line parameter. Before this patch, udevd would simply exit with
workers still running, which would then become zombie processes.
With the timeout removed, the sd_event_now() assertion in manager_exit() can be
dropped.
test-unit-name: add usual headers and add more verbose output
This makes it easier to see what unit_name_is_valid() returns at a glance.
The output is not whitespace clean, but I think it's good enough for a test.
We compile some c++ code for tests. We would simply use the default options for
those. When the previous commit raised the default warning level, we started
getting warnings from c++ code. Let's add the most important options to the c++
command, so that we get a compilation without any warnings again.
I don't think it makes sense to add *all* the options that we add for c to the
c++ flags, because testing them takes quite a while, and the c++ compilations
are for small amounts of code, mostly to check that the headers have compatible
syntax.
Let's bump up the warning level, and not add by -Wextra by hand. This is the
approach recommended by meson. The idea is that all projects should be as
similar as possible to make it easier for users to switch between projects.
With meson-0.52.0-1.module_f31+6771+f5d842eb.noarch I get:
src/test/meson.build:19: WARNING: Overriding previous value of environment variable 'PATH' with a new one
When we're using *prepend*, the whole point is to modify an existing variable,
so meson shouldn't warn. But let's set avoid the warning and shorten things by
setting the final value immediately.
The code in cgroup.c has support for all hierarchies, but the test,
as written, will only work on unified. Since the test is really about
bpf code, and not the legacy devices controller, let's just skip
the test.
(Yes, Ubuntu should install the UTC timezone data unconditionally: it
should not be an option, even if all other timezone data is excluded,
but since it's our business to validate user input but not out business
to validate distros, let's just accept "UTC" unconditionally, it's magic
after all)
bpf: make sure the kernel do not submit an invalid program if no pattern matched
It turns out that the kernel verifier would reject a program we would build
if there was a whitelist, but no entries in the whitelist matched.
The program would approximately like this:
0: (61) r2 = *(u32 *)(r1 +0)
1: (54) w2 &= 65535
2: (61) r3 = *(u32 *)(r1 +0)
3: (74) w3 >>= 16
4: (61) r4 = *(u32 *)(r1 +4)
5: (61) r5 = *(u32 *)(r1 +8)
48: (b7) r0 = 0
49: (05) goto pc+1
50: (b7) r0 = 1
51: (95) exit
and insn 50 is unreachable, which is illegal. We would then either keep a
previous version of the program or allow everything. Make sure we build a
valid program that simply rejects everything.
This makes the code a bit longer, but easier to read I think, because
the cgroup v1 and v2 code paths are more similar. And whent he type is
a char, any backtrace is easier to interpret.