execute: make ExecStatus dump more useful by showing passed time
Let's show the runtimes of our commands and preparations for them. It's
actually quite interesting, we sometimes are irritatingly slow with our
handoffs.
manager: switch service unit type over to using new handoff timestamping logic
Also: rename Handover → Handoff. I think it makes it clearer that this
is not really about handing over any resources, but that the executor is
out off the game from that point on.
execute: send handoff timestamps from executor to service manager
This changes the executor to systematically send handoff timestamps to
the service manager if a socket for that is supplied. This drops the
code that did this via Type=exec messages, and reverts that part to the
old behaviour before 93cb78aee2cff8109a5a70128287732f03d7a062.
Benefits of this approach:
1. We can collect the handoff for any command we fork off, regardless
if it's ExecStart= something else, regardless whether it's Type=exec,
Type=simple or some any other service type, regardless of the unit
type.
2. We collect both CLOCK_REALTIME and CLOCK_MONOTONIC, as we do for the
other process timestamps.
3. It's entirely backwards compatible, as this doesn't change the
protocol between service manager and executor, but just extends it.
manager: add socket for receiving handoff timestamps from forked children
This adds an AF_UNIX socket pair to the manager that we can collect
handoff timestamp messages on.
The idea is that forked off children send a datagram with a timestamp
and we use its sender PID to match it against the right forked off
process.
This part only implements the receiving side: a socket is created, and
listened on. Received datagrams are parsed, verified and then dispatched
to the interested units.
manager: comprehensively mark manager_dispatch_user_lookup_fd() as static
The prototype was static, but the implementation was not. Make both
static, this is otherwise too confusing. (This doesn't actually change
anything, since the prototype decides about this anyway, but it makes
things easier to read.)
While stracing PID1's forking off of children I noticed that every
single forked off child reads cap_last_cap from procfs. That value is a
kernel constant, hence we can save a lot of work if we'd cache it.
Thing is, we actually do cache it, in a thread_local cache field. This
means that the forked off processes (which are considered new threads)
will have to re-query it, even though we already know the result.
Hence, let's get rid of the thread_local stuff (given that the value is
going to be the same for all threads anyway, and we pretty much have a
single thread only anyway). Use an C11 atomic_int instead, which ensures
the value is either initialized or not initialized, but we don't need to
be concerned of partial initialization.
This makes the cap_last_cap reading go away in the children, as strace
shows (since cap_last_cap() is already called by PID 1 before
fork()ing, anyway).
Richard Maw [Tue, 23 Apr 2024 13:13:22 +0000 (14:13 +0100)]
test: Shut down tests on crash
If an assert in systemd fails it can't shut down normally.
By default it freezes. For interactive runs we want the crash shell
to enable further debugging, but during test runs we want it to exit
without having to wait for the test timeout.
By deactivating the crash shell, enabling reboot, and configuring qemu
so that it shuts down instead of rebooting we can shut down instead.
Because by default UEFI will enroll keys and then reboot
we also have to set --qemu-firmware-variables=custom
so it doesn't need to auto-enroll.
Because mkosi has to handle not receiving an EXIT_STATUS notification
it falls back to the exit code of qemu, which in the case of reboot
would be 0, we also override the success exit status to 123
and check that we got that as an exit code from mkosi.
sd-radv: allow to send multiple routes or prefix64 that have intersection with others
I cannot find any RFC that states we should not send multiple route
prefix or pref64 options that have intersection with others.
Moreover, each route prefix option has preference field, thus, user may
want to send e.g. a prefix with the normal preference, and another sub
prefix with the high preference. Previously, such configuration was
prohibited. Let's allow that now.
cryptenroll: default to block device backing /var/ rather than /
With 1df4b21abdb9e562805a7b006d179507182f845e we started to default to
enrolling into the LUKS device backing the root fs if none was specified
(and no wipe operation is used). This changes to look for /var/ instead.
On most systems /var/ is going to be on the root fs, hence this change
is with little effect.
However, on systems where / and /var/ is separate it makes more sense to
default to /var/ because that's where the persistent and variable data
is placed (i.e. where LUKS should be used) while / doesn't really have
to be variable, could as well be immutable, or ephemeral. Hence /var/
should be a safer default.
Or to say this differently: I think it makes sense to support systems
with /var/ being on / well. I also think it makes sense to support
systems with them being separate, and /var/ being variable and
persistent. But any other kind of system I find much less interesting to
support, and in that case people should just specify the device name.
Also, while we are at it, tighten the checks a bit, insist on a dm-crypt
+ LUKS superblock before continuing.
And finally, let's print a short message indicating the device we
operate on.
journal: do not rotate unrelated journal files when full or corrupted
When we fail to add an entry to a journal file, typically when the file
is full or corrupted, it is not necessary to rotate other journal files.
Not only that's unnecessary, rotating all journal files allows
unprivileged users to wipe system or other user's journals by writing
many journal entries to their own user journal file.
Let's rotate all journal files only when
- it is really requested by a privileged user (e.g. by journalctl --rotate), or
- the system time jumps backwards.
And, otherwise rotate only the journal file we are currently writing.
In that commit, we made Freeze/Thaw work with unrealized cgroups.
However, the unit was left in a strange state: it would be frozen by the
kernel but systemd would be unaware, and it remained possible to try and
realize the cgroup while the unit is supposed to be frozen. This commit
fixes the state tracking and prevents cgroups from being realized when
the unit is frozen.
We do the image build and run the tests in a btrfs loopback so we
can make use of btrfs subvolumes and COW to keep the disk space
requirements to a minimum and speed up the ephemeral copies we make
of the image to run the tests.
We also switch to building debug packages and publishing the built
packages as artifacts.
- Stop using logging module since the default output formatting is
pretty bad. Prefer print() for now.
- Log less, logging the full mkosi command line is rather verbose,
especially when it contains multi-line dropins.
- Streamline the journalctl command we output for debugging failed
tests.
- Don't force usage of the disk image format.
- Don't force running without unit tests.
- Don't force disabling RuntimeBuildSources.
- Update documentation to streamline the command for running a single
test and remove sudo as it's not required anymore.
- Improve the console output by having the test unit's output logged
to both the journal and the console.
- Disable journal console log forwarding as we have journal forwarding
as a better alternative.
- Delete existing journal file before running test.
- Delete journal files of succeeded tests to reduce disk usage.
- Rename system_mkosi target to just mkosi
- Pass in mkosi source directory explicitly to accomodate arbitrary
build directory locations.
- Add test interactive debugging if stdout is connected to a tty
- Stop explicitly using the 'system' image since it'll likely be
dropped soon.
- Only forward journal if we're not running in debugging mode.
- Stop using testsuite.target and instead just add the necessary
extras to the main testsuite unit via the credential dropin.
- Override type to idle so test output is not interleaved with
status output.
- Don't build mkosi target by default
- Always add the mkosi target if mkosi is found
- Remove dependency of the integration tests on the mkosi target
as otherwise the image is always built, even though we configure
it to not be built by default.
- Move mkosi output, cache and build directory into build/ so that
invocations from meson and regular invocations share the same
directories.
- Various aesthetic cleanups.
Building debug packages on ubuntu requires the "debug" option to be
specified explicitly. Debug packages on Ubuntu have the .ddeb extension,
so let's make sure we handle that by copying the .ddeb packages in the
build script as well.
isc-dhcp-server does not ship units, only sysv scripts, so the mkosi
presets that disable it have no effect. The generated unit is started on
each boot and fails, causing delays and noise.
Mask it so that the generated unit is overridden. It is installed only
to bring in binaries used by the networkd tests anyway.