Luca Boccassi [Tue, 12 Aug 2025 22:59:15 +0000 (23:59 +0100)]
test-cgroup: cleanup test cgroup
One test cgroup gets left behind by the test, as it moves itself
into it. Move itself and back to the original cgroup at the end
and clean up.
This fixes a failure when running the test first as root, and then
as unprivileged (initial cleanup fails as the leftover test cgroup
is owned by root).
Found using linkchecker.
For virtiofsd, the man page is maintained upstream, but doesn't seem to be
available in any of the usual places. So let's link to the Debian version.
systemd.filter I have no idea what it is.
logging: Improve logging messages related to NFTSet.
The 'NFTSet' directive in various units adds and removes entries in nftables
sets, it does not add or remove entire sets. The logging messages should
indicate that an entry was added or removed, not that a set was added or
removed.
Luca Boccassi [Wed, 6 Aug 2025 13:33:10 +0000 (14:33 +0100)]
test: use Europe/Helsinki instead of Europe/Kyiv in test-calendarspec
Europe/Kyiv was added somewhat recently. Use Europe/Helsinki which is
much older and thus works with older tzdata like version 2022a.
line 193: "2016-03-27 03:17:00" new_tz=:Europe/Kyiv
At: Sun 2016-03-27 03:17:00.000000 Europe
Assertion 'r == -ENOENT' failed at src/test/test-calendarspec.c:70, function _test_next(). Aborting.
Luca Boccassi [Wed, 6 Aug 2025 13:07:26 +0000 (14:07 +0100)]
test: fix repeated runs of test-oomd-util by clearing test cgroup
If the test is ran multiple times in a row, without an ephemeral
scope (eg: non-booted nspawn), then subsequent runs will fail as
the test cgroup is not cleared so the previous xattrs are still
present. Trim the test cgroup before and after the test.
ukify: fix insertion of padding in merged sections
The padding was done to expand the new section contents to the expected size of
the new section. And this then would be used for the content in the existing
section. The new section cannot be larger than the old section, but it can be
smaller. If the new section was smaller, then we'd not write enough padding and
the output file would be corrupted.
This was observed in CI when the .sbat section in the stub was padded to 1k.
The UKI with an .sbat section that was merged and was fairly short would hit
this scenario and be corrupted.
Yu Watanabe [Mon, 10 Mar 2025 19:21:11 +0000 (04:21 +0900)]
TEST-23-UNIT-FILE: skip verifying masked unit
This fixes the following failure:
TEST-23-UNIT-FILE.sh[2408]: + systemd-analyze --recursive-errors=no --man=no verify /usr/lib/systemd/system/sysinit.target.wants/systemd-hwdb-update.service
systemd-analyze[2737]: sys-kernel-config.mount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: proc-sys-fs-binfmt_misc.automount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: dev-hugepages.mount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: sys-kernel-tracing.mount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: sys-kernel-debug.mount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: sys-fs-fuse-connections.mount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: dev-mqueue.mount: symlinks are not allowed for units of this type, rejecting.
systemd-analyze[2737]: Unit systemd-hwdb-update.service is masked.
TEST-23-UNIT-FILE.sh[166]: + :
TEST-23-UNIT-FILE.sh[166]: + kill -0 2408
TEST-23-UNIT-FILE.sh[166]: + wait 2408
TEST-23-UNIT-FILE.sh[166]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-23-UNIT-FILE.verify-unit-files.sh failed'
TEST-23-UNIT-FILE.sh[166]: Subtest /usr/lib/systemd/tests/testdata/units/TEST-23-UNIT-FILE.verify-unit-files.sh failed
Yu Watanabe [Mon, 4 Aug 2025 17:44:18 +0000 (02:44 +0900)]
udev/spawn: continue to read stdout even if the result buffer is full
Previously, when the stdout of a spawned process (e.g. dmi_memory_id) is
truncated, the event source was not re-enabled, that will cause the process
to remain in a write-blocked state if the stdout buffer is full, and the
process will time out:
```
Spawned process 'dmi_memory_id' [1116] timed out after 2min 59s, killing.
Process 'dmi_memory_id' terminated by signal KILL.
```
The solution is to continue enabling the event source so that on_spawn_io()
can continue reading the stdout buffer. When the result buffer is full, the
local `buf` variable will be used to drain remaining stdout.
journal-file: let's make journal_file_copy_entry() robust against concurrent writing of the source
As usual, we need to protect ourselves against concurrent modification
of journal files. We a pretty good at that these days when reading
journal files. But journal_file_copy_entry() so far wasn't too good with
that. journal_file_append_data() so far returned EINVAL when you pass
invalid data to it. Since we pass the source data as-is in there, it's
going to fail if the journal source file is slightly invalid due to a
concurrent update.
Hence, we need to validate data gracefully here that we think comes from
a safe place, because actually it doesn't, it's directly copied from an
unsafe journal file.
Hence, let's introduce a clear error code here, and look for it in
journal_file_copy_entry(), and handle it gracefully.
Pretty sure this fixes #33372, but it's a race, so I don't know for
sure. If this remains reproducible we need to look at this again.
journal: replace a bunch of assert() with friendlier checks
We should not rely that data stored in the journal files remains
entirely untouched at all times. Because we unallocate files, data might
go away any time. Hence, never assert() on any expectations on what the
file contains. Instead, handle it more gracefully as a corruption issue,
and return EBADMSG.
This is just paranoia: let's determine the compression to use once,
instead of twice, after all te data is in journal files which might be
corrupted any time, and it would be weird if we came to different
results here each time.
journal: use EBADMSG for invalid data in file mmap
We must assume that any data in the mmap can change anytime because the
file is deallocated or similar. Let's strictly use EBADMSG for reporting
invalid file contents though (as opposed to using EINVAL if our own code
passes a wrong parameter somwhere).
terminal-util: switch from TCSADRAIN to TCSANOW for all tcsetattr() calls
TCSADRAIN means tcsetattr() will become blocking (waiting for ability to
write out queued bytes), which is problematic, if the referenced TTY is
dead for some reason.
Since all these calls just modify *input* parameters anyway (i.e. mostly
local echo, and canonical mode), forcing out queued output is kinda
pointless anyway, hence just don't do it: leave it in the queue and just
change the flags we want to change.
The tcsetattr(3) man page kinda hints that we want to use TCSANOW here,
because it documents for TCSADRAIN:
"This option should be used when changing parameters that affect
output."
Which one can read so that TCSADRAIN should not be used if it doesn't
affect output, which is the case here.
The offending commit fails to account for the case where
we have fewer lines before --until= than what's specified
in --lines=. Aside from that, if --grep= + --lines=+N are used,
we might also seek forward in the middle of the loop,
breaking the --until= boundary.
Let's turn the logic around then. Context.until_safe will
be set iff we're certain that there's enough to output,
and it gets reset whenever we seek forward.
[1] says:
> Since 0.60.0 the name argument is optional and defaults to the basename of
> the first output
We specify >= 0.62 as the supported version, so drop the duplicate name in all cases
where it is the same as outputs[0], i.e. almost all cases.
Yu Watanabe [Tue, 24 Jun 2025 17:31:48 +0000 (02:31 +0900)]
man: fix @BUILD_ROOT@ insertion
@BUILD_ROOT@ is replaced with the _quoted_ build path. Hence, if
@BUILD_ROOT@ is quoted, the result is doubly quoted, and the script does
not work if the path contains spaces.
Fabian Vogt [Fri, 1 Aug 2025 08:59:09 +0000 (10:59 +0200)]
virt: Actually use DMI detection on RISC-V as well
When booting Linux with ACPI in QEMU, the device tree is not used and
the DT based detection will not work. DMI values are accurate though
and indicate QEMU.
While detect_vm_dmi_vendor() was enabled for RISC-V in a previous commit,
it missed detect_vm_dmi(), so it was never actually used. Fix that.
TEST-13-NSPAWN: wait for a few seconds after markers found
Otherwise, the scope that the nspawn container belonging to may be
removed before the grandchild process of the machined exits and it may
be SIGKILLed.
```
[ 100.829613] systemd-machined[678]: Successfully forked off '(sd-bindmnt)' as PID 2962.
[ 100.833366] systemd-nspawn[2953]: Inner child finished, invoking payload.
[ 100.836111] (sd-bindmnt)[2962]: Skipping PR_SET_MM, as we don't have privileges.
[ 100.836401] (sd-bindmnt)[2962]: Successfully forked off '(sd-bindmnt-inner)' as PID 2964.
[ 100.846498] (sd-bindmnt)[2962]: (sd-bindmnt-inner) terminated by signal KILL.
[ 100.848846] systemd[1]: machine-TEST\x2d13\x2dNSPAWN.machinectl\x2dbind.7ye.scope: cgroup is empty
[ 100.849303] systemd[1]: machine-TEST\x2d13\x2dNSPAWN.machinectl\x2dbind.7ye.scope: Deactivated successfully.
[ 100.849317] systemd[1]: machine-TEST\x2d13\x2dNSPAWN.machinectl\x2dbind.7ye.scope: Changed running -> dead
[ 100.849752] systemd[1]: machine-TEST\x2d13\x2dNSPAWN.machinectl\x2dbind.7ye.scope: Consumed 91ms CPU time, 1.3M memory peak.
[ 100.850399] systemd-machined[678]: (sd-bindmnt) failed with exit status 1.
[ 100.850414] systemd-machined[678]: Child failed.
[ 100.854574] systemd-machined[678]: Failed to mount /tmp/marker-varlink on /tmp/marker-varlink in the namespace of machine 'TEST-13-NSPAWN.machinectl-bind.7ye': Protocol error
```
journald: add debug logs around offlining/archiving/rotating/varlink operations
It is not easy to understand what happens to a journal file
even with debug logs enabled. Add more dbg messages around operations
started by users to make it possible to follow the flow of operations.
- drop unused variables,
- adjust number of partitions, interations, and timeout,
- clear partitions on each test case finished,
- check if unnecessary devlinks are removed,
- several coding style cleanups.
TEST-64-UDEV-STORAGE: several fixlets for check_device_units()
To suppress the following warnings in case check_device_unit() failed e.g.
when the device is already removed:
```
sed: couldn't write 130 items to stdout: Broken pipe
awk: write failure (Broken pipe)
awk: close failed on file "/dev/stdout" (Broken pipe)
```
udev/node: check the target device node of devlink on removal
If the removal of the devlink is requested due to this is a 'remove' event,
it is trivial that the devlink will not be owned by this device anymore.
Let's read the devlink and if it points to our device node, then we need
to update the devlink. If it points to another device node, then it is already
owned by another device, hence we should not touch it and keep it as is.
- move to TEST-07-PID1, as it is a timer setting,
- rename the timer and service, to emphasize they are for testing
DeferReactivation=,
- use timeout command to wait for the timer being triggered several times,
- stop the timer when not necessary,
- accept 9 seconds as delta, as there are fluctuations.
TEST-71-HOSTNAME: specify job mode for the stop job (#38413)
The CI run is failing in the stop command:
```
[ 4841.936906] TEST-71-HOSTNAME.sh[140]: + stop_hostnamed
[ 4841.936906] TEST-71-HOSTNAME.sh[140]: + systemctl stop systemd-hostnamed.service
[ 4845.959747] TEST-71-HOSTNAME.sh[226]: Job for systemd-hostnamed.service canceled.
[ 4846.013286] systemd[1]: TEST-71-HOSTNAME.service: Main process exited, code=exited, status=1/FAILURE
[ 4846.013792] systemd[1]: TEST-71-HOSTNAME.service: Failed with result 'exit-code'.
[ 4846.021821] systemd[1]: Failed to start TEST-71-HOSTNAME.service - TEST-71-HOSTNAME.
```
This happens when we create the stop job, but while we're waiting for
it to finish, something triggers a start of the unit and we lose to competing
start job.
man/systemd-boot: recommend holding space by default
https://github.com/systemd/systemd/pull/15509/files#r2234113960 complains that the
advice is still not clear enough. systemd-boot itself says
"Menu hidden. Hold down key at bootup to show menu."
so let's do the same and tell users to hold down space as the first option.
This should work fine for 99% of people. Then invert the following advice to
try repeated pressing as the alternative option.
Also, fix the advice about --boot-loader-menu=. The whole para is about getting
the menu to show, so 0 is not a good value.
Follow-up for https://github.com/systemd/systemd/pull/15509.
man/systemd-boot: describe which keys use EFI variables
Some keys have only a transient effect, e.g. 'e', but some have a persistent
effect, e.g. 'd'. This is important informations, but the reader might be
forgiven for not finding that at all obvious when reading the descriptions of
the keys.
Also, mention in loader.conf man page that the settings there might be overriden
by EFI variables. This is another thing that is important but not obvious.
For some reason, the man page for loader.conf also mentioned type#1 entries
in passing. Except for using the same file extension, those files are in a
completely different format and with a different purpose. This mixup was
first introduced in f37d3835828c45b3a92ed12d9a6a30796c0a4a27, was then
reported in #10923, which was closed by cbae79b8d07327051c1e1f438f7086ab634b93f8,
but that didn't fix the actual issue.
Really fixes #10923.
While at it, simplify and improve the wording a bit.
TEST-03-JOBS: modernize test code and extend timeout
- use timeout command more,
- use `(! cmd)` rather than `cmd && exit 1`,
- drop unnecessary `|| exit 1`,
- extend timeout to support slow test environment.
* cb1a3c9049 FirmwareVariables: allow generating during image build
* 6104923534 env: export $EFI_ARCHITECTURE in hook scripts on EFI arches
* fef33f96a2 mkosi-tools: ukify moved to systemd-ukify in openSUSE
* ec4475a846 ensure builds with cache over device boundaries
* 7be5159f24 Change UnifiedKernelImages to enum and accept signed/unsigned
* 071ac4a575 mkosi-vm: install systemd-boot-efi-signed where available
* 1865be628e opensuse: Install OpenSUSE-release if another release package is not installed
* 0381b17819 qemu: Disable hpet for x86 VMs
* 4f63700eb3 mkosi-tools: install systemd-boot-tools for bootctl
* 1230ed333b man: remove duplicate 'the' in FirmwareVariables description
Fixes the following error when running with sanitizers:
```
TEST-87-AUX-UTILS-VM.sh[670]: + bootctl install --make-entry-directory=yes
TEST-87-AUX-UTILS-VM.sh[695]: Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi.signed" to "/boot/EFI/systemd/systemd-bootx64.efi".
TEST-87-AUX-UTILS-VM.sh[695]: Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi.signed" to "/boot/EFI/BOOT/BOOTX64.EFI".
TEST-87-AUX-UTILS-VM.sh[695]: Created "/boot/fedora".
TEST-87-AUX-UTILS-VM.sh[695]: Random seed file /boot/loader/random-seed successfully refreshed (32 bytes).
TEST-87-AUX-UTILS-VM.sh[695]: ../src/shared/efi-api.c:618:38: runtime error: left shift of 243 by 24 places cannot be represented in type 'int'
```
TEST-23-UNIT-FILE: do not wait indefinitely but set a reasonable timeout
Otherwise, the test does not finish until the global timeout is reached.
This is for making the test fail earlier when something spurious happens:
```
[FAILED] Failed to start TEST-23-UNIT-FILE-short-lived.service - Shortlived Unit.
TEST-23-UNIT-FILE.sh[776]: + '[' 0 -eq 0 ']'
TEST-23-UNIT-FILE.sh[776]: + sleep .5
(snip)
58/98 systemd:integration-tests / TEST-23-UNIT-FILE TIMEOUT 1800.52s killed by signal 9 SIGKILL
```
- move scripts from test/units/ to the test specific units directory,
- drop meaningless true from silent-success.service,
- call journalctl from the same bash invocation of echo.
No functional change, just refactoring and preparation for the next
commit.
TEST-64-UDEV-STORAGE: wait for partition devices being created before calling udevadm trigger
For some reasons, kernel or sfdisk once remove the created partitions
and recreated them. And if 'udevadm trigger' triggers devices currently
being removed, the udevd does not receive the triggered events, and the
command stuck.
```
[ 33.150452] TEST-64-UDEV-STORAGE.sh[546]: + sfdisk --wipe=always /dev/md/mdmirpar
[ 33.478336] systemd-udevd[442]: md127: Device is queued (SEQNUM=2163, ACTION=change)
[ 33.480153] kernel: md127: p1 p2 p3
[ 33.483772] systemd-udevd[442]: md127p1: Device is queued (SEQNUM=2164, ACTION=add)
[ 33.483914] systemd-udevd[442]: md127p2: Device is queued (SEQNUM=2165, ACTION=add)
[ 33.484999] systemd-udevd[442]: md127p3: Device is queued (SEQNUM=2166, ACTION=add)
[ 33.485564] systemd-udevd[442]: md127: Received inotify event of watch handle 164.
[ 33.503016] TEST-64-UDEV-STORAGE.sh[546]: + SYSTEMD_LOG_LEVEL=debug
[ 33.503016] TEST-64-UDEV-STORAGE.sh[546]: + timeout 30 udevadm trigger --settle --parent-match /dev/md/mdmirpar
[ 33.485905] systemd-udevd[442]: Successfully forked off '(udev-synth)' as PID 3208.
[ 33.486067] systemd-udevd[442]: md127: Removing watch handle 164.
[ 33.489035] systemd-udevd[442]: md127p1: Device is queued (SEQNUM=2167, ACTION=remove)
[ 33.489048] systemd-udevd[442]: Received inotify event about removal of watch handle 164.
[ 33.489507] systemd-udevd[442]: md127p2: Device is queued (SEQNUM=2168, ACTION=remove)
[ 33.496298] systemd-udevd[442]: md127p3: Device is queued (SEQNUM=2169, ACTION=remove)
[ 33.500628] systemd-udevd[442]: md127: Device is queued (SEQNUM=2170, ACTION=change)
[ 33.502355] systemd-udevd[442]: md127p1: Device is queued (SEQNUM=2171, ACTION=add)
[ 33.509371] TEST-64-UDEV-STORAGE.sh[3211]: md127: Triggered device with action 'change'.
[ 33.509371] TEST-64-UDEV-STORAGE.sh[3211]: md127p1: Triggered device with action 'change'.
[ 33.509371] TEST-64-UDEV-STORAGE.sh[3211]: md127p2: Triggered device with action 'change'.
[ 33.512532] systemd-udevd[442]: md127: Device is queued (SEQNUM=2172, ACTION=change, UUID=a0b75692-08ad-428a-859b-9ef8772874d7)
[ 33.512666] systemd-udevd[442]: md127p1: Device is queued (SEQNUM=2173, ACTION=change, UUID=4cd75a91-aa5b-4678-878c-0420b6c2e1e9)
[ 33.512796] systemd-udevd[442]: md127p2: Device is queued (SEQNUM=2174, ACTION=add)
[ 33.512910] systemd-udevd[442]: md127p3: Device is queued (SEQNUM=2175, ACTION=add)
[ 33.531834] TEST-64-UDEV-STORAGE.sh[3211]: md127: Got uevent without UUID, ignoring: No such file or directory
[ 33.553563] TEST-64-UDEV-STORAGE.sh[3211]: md127p1: Got uevent without UUID, ignoring: No such file or directory
[ 33.561262] TEST-64-UDEV-STORAGE.sh[3211]: md127p2: Got uevent without UUID, ignoring: No such file or directory
[ 33.562468] TEST-64-UDEV-STORAGE.sh[3211]: md127p2: Got uevent without UUID, ignoring: No such file or directory
[ 33.563143] TEST-64-UDEV-STORAGE.sh[3211]: md127p3: Got uevent without UUID, ignoring: No such file or directory
[ 33.564174] TEST-64-UDEV-STORAGE.sh[3211]: md127p1: Got uevent without UUID, ignoring: No such file or directory
[ 33.567614] TEST-64-UDEV-STORAGE.sh[3211]: md127p3: Got uevent without UUID, ignoring: No such file or directory
[ 33.597750] TEST-64-UDEV-STORAGE.sh[3211]: md127: Got uevent without UUID, ignoring: No such file or directory
[ 33.623522] TEST-64-UDEV-STORAGE.sh[3211]: md127p1: Got uevent without UUID, ignoring: No such file or directory
[ 33.676268] TEST-64-UDEV-STORAGE.sh[3211]: md127p3: Got uevent without UUID, ignoring: No such file or directory
[ 33.686088] TEST-64-UDEV-STORAGE.sh[3211]: md127p2: Got uevent without UUID, ignoring: No such file or directory
```
Let's wait for partition devices being actually created, and wait for
all queued events being processed. Then, call 'udevadm trigger'.
TEST-10-MOUNT: wait for userspace mount options being loaded
When a device is mounted with userspace options such as _netdev, even when the mount event source is
triggered, only /proc/self/mountinfo may be updated, and /run/mount/utab may not be updated yet.
Hence, the mount unit may be created/updated without the userspace options. In that case, the mount
event source will be retriggered when /run/mount/utab is updated, and the mount unit will be updated
again with the userspace options. Typically, the window between the two calls is very short, but when
the mount event source is ratelimited after the first event, processing the second event may be delayed
about 1 secound. Hence, here we need to wait for a while.
By adding a debugging logs in mount_setup_unit(), the userspace mount is
not obtained in the first event, and the second event is delayed by the ratelimit.
```
[ 20.023086] H TEST-10-MOUNT.sh[446]: + mount -t ext4 -o _netdev /dev/loop1p1 /tmp/deptest
[ 20.026255] H kernel: EXT4-fs (loop1p1): mounted filesystem c1fa00ea-2ba8-46b2-9002-2ac997f4cda9 r/w with ordered data mode. Quota mode: none.
[ 20.026537] H TEST-10-MOUNT.sh[446]: + timeout 10 bash -c 'until systemctl -q is-active tmp-deptest.mount; do sleep .1; done'
[ 20.032293] H systemd[1]: tmp-deptest.mount: mount_setup_unit: proc: yes, netdev: no
[ 20.035978] H systemd[1]: Unit blockdev@dev-loop1p1.target has alias blockdev@.target.
[ 20.039765] H systemd[1]: tmp-deptest.mount: Changed dead -> mounted
[ 20.046598] H systemd[1]: Event source 0x7c73093e05e0 (mount-monitor-dispatch) entered rate limit state.
```
test: move testcase_dependencies() to TEST-10-MOUNT
TEST-60-MOUNT_RATELIMIT is run on nspawn by default, and currently run
on vm only on arch mkosi. Let's move the test case to new TEST-10-MOUNT,
which always run on vm.