Convert systemd-analyze to option and verb macros (#41945)
I thought that this would require some bigger changes, but it turns out
that the existing functionality is good enough with some minor
adjustments if used appropriately.
The behaviour of help_section() is changed to simplify all callers.
systemd-dissect: do not fail dissection on LUKS v1 partitions
partition_is_luks2_integrity() was returning -EINVAL when it
encountered a non-LUKS2 header (e.g. LUKS v1), which caused the
caller to abort the entire disk dissection. A LUKS v1 partition
simply isn't LUKS2-with-integrity, so return 0 instead and let
dissection continue normally.
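The return convention can be sketched roughly as follows. This is an illustration only, not the actual systemd code: the function names and the has_integrity parameter are hypothetical, while the magic bytes and the big-endian version field follow the on-disk LUKS header layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real probe: returns the LUKS version found in
 * the header, or 0 if the buffer does not start with the LUKS magic. */
static int example_luks_header_version(const uint8_t *hdr, size_t len) {
        static const uint8_t magic[6] = { 'L', 'U', 'K', 'S', 0xba, 0xbe };

        assert(hdr);

        if (len < 8 || memcmp(hdr, magic, sizeof(magic)) != 0)
                return 0;

        /* The 16-bit version is stored big-endian right after the magic. */
        return (hdr[6] << 8) | hdr[7];
}

/* > 0: LUKS2 with integrity, 0: anything else (including LUKS v1),
 * < 0: a genuine error that should abort the caller. */
static int example_is_luks2_integrity(const uint8_t *hdr, size_t len, bool has_integrity) {
        if (!hdr)
                return -EINVAL;

        if (example_luks_header_version(hdr, len) != 2)
                return 0; /* not LUKS2: not an error, just "no" */

        return has_integrity ? 1 : 0;
}
```

With this shape a caller only needs to abort on negative values, so a LUKS v1 header no longer stops the dissection.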
Luca Boccassi [Tue, 5 May 2026 08:52:29 +0000 (09:52 +0100)]
test: skip TEST-70-TPM2.nvpcr check if pcrextend socket inactive
systemd-dissect --mtree calls io.systemd.PCRExtend over Varlink to extend
the verity NvPCR after activation, and the test then diffs the measure
log to find the new entry. But systemd-pcrextend.socket has
ConditionSecurity=measured-os, which fails when the firmware did not
initialize PCRs, so the test fails.
Simon Lucido [Mon, 4 May 2026 09:40:41 +0000 (11:40 +0200)]
core/varlink-metrics: expose ReloadCount as a metric
Add ReloadCount to the io.systemd.Metrics family table so it can be
queried alongside other manager-level metrics via systemd-report.
Also extend the existing integration test to cross-check the value
returned by systemd-report against the D-Bus and Varlink transports
on every assertion.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Simon Lucido <simonlucido@meta.com>
The logic that was tested in the previous commit is used to implement
the behaviour for unit-shell and other verbs without changes.
The compare-versions synopsis is shortened to "V1 [OP] V2" to make the
verb synopsis fit. Unusual capitalization of "Command" is changed to
"COMMAND" (it's a replaceable arg, not a fixed string), and some help
strings are adjusted. The order of options in --help is based on the
existing order in parse_argv(). The old order in --help was mostly
random. I think it might be good to figure out something more rational
here, but I'm leaving that as a separate step.
The urlification of dot(1) in the --help string is lost. It's hard to
do this with the help string being stored in a read-only section.
I think this is not worth the trouble to reimplement in the current
scheme.
resolve: enforce the search domain limit earlier (#41938)
The search domain limit is already enforced by dns_search_domain_new(),
but in this case it's way too late. Let's enforce it during the first
loop to avoid unnecessary parsing.
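A minimal sketch of rejecting oversized input in the first pass, before any per-entry work. The function name and the tiny limit are hypothetical and chosen only for illustration; the real constant lives in resolved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny limit used only for this illustration. */
#define EXAMPLE_SEARCH_DOMAINS_MAX 3

/* First pass over a NULL-terminated list of domain names: bail out as soon as
 * the limit is exceeded. Returns the number of entries on success. */
static int example_count_search_domains(const char *const *domains) {
        size_t n = 0;

        assert(domains);

        for (const char *const *d = domains; *d; d++) {
                if (n >= EXAMPLE_SEARCH_DOMAINS_MAX)
                        return -E2BIG;
                n++;
        }

        return (int) n;
}
```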
---
Also, set a similar limit for NTAs - introduce a new constant, since
there's no pre-existing limit. I pulled the value out of thin air since
there's (AFAIK) no mandated maximum/minimum for NTAs, but given they're
supposed to be manual and _temporary_ workarounds, hopefully 2K of
NTAs will be more than enough (if not, please yell).
Also note: the newly added error messages don't have the trailing "."
and similarly the newly introduced constant doesn't have the "u" suffix
to match the style of the surrounding code (and I didn't want to fix the
surrounding code to make the diff minimal). If this is not desirable,
please also yell.
cryptsetup: avoid a segfault when a keyfile is passed along with a TPM device (#41892)
A segfault is observed when both key_file and tpm2-device are
simultaneously passed to systemd-cryptsetup, e.g.:
systemd-cryptsetup attach test_data /vol /my-pass tpm2-device=auto
The crash appears after commit 5c6aad9 but the flaw in the logic was
pre-existing.
Luca Boccassi [Mon, 4 May 2026 13:42:03 +0000 (14:42 +0100)]
test: suppress PCR public key auto-loading in TEST-70-TPM2 dditest
The dditest block calls systemd-repart with Encrypt=tpm2 but without
--tpm2-public-key-pcrs=. Since systemd-stub drops
/run/systemd/tpm2-pcr-public-key.pem when booting from a signed UKI,
systemd-repart auto-loads it and enrolls a signed PCR policy, and
then systemd-cryptsetup tpm2-device=auto has no matching signature file,
so unlock fails.
--tpm2-public-key= is not enough as the default kicks in then.
Luca Boccassi [Sun, 3 May 2026 21:16:15 +0000 (22:16 +0100)]
test: make TEST-64 mdadm_lvm cleanup robust against reruns
mdadm --zero-superblock only wipes the MD metadata on the underlying
disks, not the LVM PV header that lives in the array data area. When
the VM is restarted and the test re-creates the array with the same
UUID, /dev/md127 exposes the old data including the LVM PV header, so
udev's 69-lvm.rules auto-triggers lvm-activate-mdlvm_vg.service which
races with the test's own pvcreate for exclusive access on /dev/md127.
Wipe the LVM signature off the MD device (and the underlying disks as
a belt-and-braces measure) to avoid the race on re-run, fixing failures
when the VM is rebooted instead of shut down.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Luca Boccassi [Mon, 4 May 2026 11:58:33 +0000 (12:58 +0100)]
semaphore: stop deleting all apt sources
The image configuration was changed and the main sources are
now in a drop-in apt sources file too, so deleting the whole
drop-in directory breaks installing packages. Just delete the
disabled ones and chrome.
Valentin David [Mon, 4 May 2026 08:25:19 +0000 (10:25 +0200)]
core: Open netfilter socket only when needed
On initrds where the nfnetlink module is missing, trying to open
a NETLINK_NETFILTER netlink socket takes a long time and then fails.
This makes boot noticeably slower, even though probably no
unit in an initrd needs netfilter.
So here we delay opening the socket until we know we need it.
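The deferred open can be pictured as a lazy getter. This is a sketch under assumptions: the struct, the getter and the fake opener are all hypothetical, and the opener merely stands in for the expensive NETLINK_NETFILTER socket creation.

```c
#include <assert.h>

typedef struct ExampleNfManager {
        int nfnl_fd;                 /* -1 until the socket is first needed */
        int (*open_socket)(void);    /* stand-in for the expensive open */
} ExampleNfManager;

/* Counts how often the fake opener ran, so the laziness is observable. */
static int example_open_calls = 0;

static int example_fake_open(void) {
        example_open_calls++;
        return 42; /* pretend file descriptor */
}

/* Opens the socket on first use only; later calls reuse the cached fd.
 * Failures are returned as-is and not cached. */
static int example_manager_get_nfnl(ExampleNfManager *m) {
        int fd;

        assert(m);
        assert(m->open_socket);

        if (m->nfnl_fd >= 0)
                return m->nfnl_fd;

        fd = m->open_socket();
        if (fd < 0)
                return fd;

        m->nfnl_fd = fd;
        return fd;
}
```

A manager that never calls the getter never pays for the open, which is the case the commit cares about on initrds.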
TEST-70-TPM2: Test the key_file + tpm2-device= combo
When key_file is passed along with tpm2-device= to systemd-cryptsetup, the
logic is to try the blob as a TPM blob first, and then fall back to trying the
file as a regular key file. Check that this fallback works.
The logic in attach_luks_or_plain_or_bitlk_by_tpm2() tries to process the key_file as a
TPM blob first. This did not work properly because it passes n_blobs=1 to
acquire_tpm2_key(), and the key_file is only read when n_blobs == 0. As a
result, the code ends up calling tpm2_unseal(..., blobs=NULL, n_blobs=1, ...).
Before commit 5c6aad9 ("cryptsetup-tokens: Print tpm2-primary-alg: only when it
is known"), the segfault was not observed because tpm2_unseal() was bailing out
early when primary_alg == 0. However, after that change, it attempts to process
the blob (which is NULL) and crashes.
Fix this logic by passing n_blobs=0 to acquire_tpm2_key() so that it actually
reads the key_file. Additionally, assert 'blobs' in tpm2_unseal() as a
safeguard.
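A simplified model of the fix, assuming hypothetical names throughout (this is not the actual cryptsetup code): the key file is only consulted when n_blobs == 0, and the unseal step asserts that it received a real blob array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef struct ExampleBlob {
        const void *data;
        size_t size;
} ExampleBlob;

/* Stand-in for the unseal step: insists on a non-NULL blob array, mirroring
 * the added assert. Returns the total number of bytes it was handed. */
static int example_unseal(const ExampleBlob *blobs, size_t n_blobs) {
        size_t total = 0;

        assert(blobs);
        assert(n_blobs > 0);

        for (size_t i = 0; i < n_blobs; i++)
                total += blobs[i].size;

        return (int) total;
}

/* The key file contents are only used when the caller passes n_blobs == 0.
 * Passing blobs=NULL with n_blobs=1 (the old behaviour) would skip the key
 * file and trip the assert in example_unseal(). */
static int example_acquire_key(const char *key_file_data, const ExampleBlob *blobs, size_t n_blobs) {
        ExampleBlob loaded;

        if (n_blobs == 0) {
                if (!key_file_data)
                        return -EINVAL;

                loaded = (ExampleBlob) { .data = key_file_data, .size = strlen(key_file_data) };
                blobs = &loaded;
                n_blobs = 1;
        }

        return example_unseal(blobs, n_blobs);
}
```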
Valentin David [Sat, 18 Apr 2026 13:09:00 +0000 (15:09 +0200)]
boot: Try to load UKI from simple filesystem before LoadImage
When the source buffer is NULL, the firmware is supposed to try to load the UKI
with the simple filesystem protocol and then the load file 2 protocol. But it seems
that some versions of AMI firmware do not use the simple filesystem protocol,
and then fail to load if the ESP was loaded from an El Torito boot
catalog. Loading the source buffer via the simple filesystem protocol
ourselves seems to work around this limitation.
Shim, for example, also loads the source buffer before calling LoadImage, so it
seems to be a safe thing to do. In the future we could maybe also fall back to the
load file 2 protocol if the simple filesystem protocol fails.
test: make TEST-70-TPM2 and TEST-86-MULTI-PROFILE-UKI robust against reruns (#41922)
These tests leave a lot of state around, and when the test is re-run,
for example due to the qemu bug that makes a VM reboot instead of
shutting down, it fails.
Luca Boccassi [Sun, 3 May 2026 15:33:38 +0000 (16:33 +0100)]
test: make TEST-86-MULTI-PROFILE-UKI robust against reruns
When qemu reboots instead of shutting down after the last iteration,
the profile is already set to profile2 but the /root/encrypted.raw is
gone so the test fails. Reset the default boot entry at the end of the
test to make it robust against reruns.
Luca Boccassi [Sun, 3 May 2026 15:23:41 +0000 (16:23 +0100)]
test: make TEST-70-TPM2 robust against reruns
The test leaves a lot of state around, and when the test is re-run,
for example due to the qemu bug that makes a VM reboot instead of
shutting down, it fails.
Do more cleanups in the traps.
[ 162.642175] TEST-70-TPM2.sh[2815]: Calculated public key name: 000b2b66edc3a466e81059286aaf38d09ea42a7a9dcdf6ba3b664c62f0cae4ce4f66
[ 162.642628] TEST-70-TPM2.sh[2815]: PolicyAuthorize calculated digest: 2caa740101f65734d50395d6abc64fa46015d40d1f5de239434578544e592a92
[ 162.643681] TEST-70-TPM2.sh[2815]: Calculated NV index name: 000b439cfa1534815bbe8d33b80c56f5a8d17d36fe94a7782b23a37b50def5fc5eaa
[ 162.645111] TEST-70-TPM2.sh[2815]: PolicyAuthorizeNV calculated digest: 69ee0e89fafe6b9df2cd6a5defbf74aa46cf6d92703e645d463549da4ba5e1a4
[ 162.645407] TEST-70-TPM2.sh[2815]: Combined signed PCR policies and pcrlock policies cannot be calculated offline, currently.
[ 162.649576] TEST-70-TPM2.sh[2815]: Releasing crypt device /dev/loop0 context.
[ 162.652433] TEST-70-TPM2.sh[2815]: Releasing device-mapper backend.
[ 162.653518] TEST-70-TPM2.sh[2815]: Closing read only fd for /dev/loop0.
[ 162.654359] TEST-70-TPM2.sh[2815]: Closing read write fd for /dev/loop0.
[ 162.654786] TEST-70-TPM2.sh[2815]: Failed to encrypt device: Operation not supported
Luca Boccassi [Sat, 2 May 2026 22:18:22 +0000 (23:18 +0100)]
test: make varlink StartTransient checks compatible with jq 1.6
The new "varlinkctl --more StartTransient" subtest pipes a JSON-SEQ
stream of multiple records into "jq --seq -e ...". CentOS 9
ships jq 1.6, where -e only inspects the last input record's output:
when the trailing record (the final reply) doesn't match the
"select()" filter, jq exits non-zero even though earlier records
match, so the test fails.
Use --slurp which collapses the records into an array first and
returns a single bool.
Simon Lucido [Mon, 20 Apr 2026 15:05:27 +0000 (17:05 +0200)]
core: add ReloadCount to Manager and bump on successful reload
Introduce a counter that tracks how many configuration reloads have
been successfully completed by the manager. The increment lives in
manager_reload() right after the "point of no return", so failed
reload attempts that bail out earlier (e.g. during serialization)
do not bump the counter.
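The placement of the increment can be sketched like this. The struct and the serialization_fails flag are hypothetical stand-ins; the point is only that early failures return before the counter is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ExampleReloadManager {
        uint64_t reload_count;
} ExampleReloadManager;

/* serialization_fails simulates an error before the point of no return. */
static int example_manager_reload(ExampleReloadManager *m, bool serialization_fails) {
        assert(m);

        if (serialization_fails)
                return -EIO; /* bailed out early: counter stays untouched */

        /* Point of no return: from here on the reload counts as done. */
        m->reload_count++;

        return 0;
}
```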
It is accessible as a new ReloadCount property to
org.freedesktop.systemd1.Manager (D-Bus) and ReloadCount to
io.systemd.Manager.Describe (Varlink).
Also add an integration test for ReloadCount
that verifies that the new ReloadCount property increments by one per
daemon-reload, accumulates correctly across multiple reloads, and that
D-Bus and Varlink return identical values. Also test that the counter
resets after a reexec.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Simon Lucido <simonlucido@meta.com>
Yu Watanabe [Sat, 2 May 2026 13:31:03 +0000 (22:31 +0900)]
socket-util: introduce tos_to_priority()
This maps from TOS, which can be used for setsockopt(IPPROTO_IP, IP_TOS),
to socket priority, which can be used for setsockopt(SOL_SOCKET, SO_PRIORITY).
With this, we can set priority like the following:
```
uint8_t tos = IPTOS_CLASS_CS6;
setsockopt_int(fd, IPPROTO_IP, IP_TOS, tos);
setsockopt_int(fd, SOL_SOCKET, SO_PRIORITY, tos_to_priority(tos));
```
The Hub for these headsets uses the following
USB entries:
Bus 007 Device 002: ID 0451:2036 Texas Instruments, Inc. TUSB2036 Hub
Bus 007 Device 003: ID 1038:1290 SteelSeries ApS Arctis Pro Wireless
Bus 007 Device 004: ID 1038:1294 SteelSeries ApS Arctis Pro Wireless
dbus: limit the number of env variables to something reasonable, vol. 2
Turns out we can utilize this limit at a couple more places, so let's
move the previously defined limit constant to env-util.h and use it to
guard a couple more D-Bus methods. Also, bump it a bit, given it's meant
to be a safety cap that can't be hit in valid scenarios.
bootctl: rework/modernize "unlink" and add Varlink API for it
Among other things this changes the tracking of the location of resources
during GC to use the BootEntrySource enum rather than a path, since
we have that and it is more efficient and easier to grok.
* 1302f123d9 Restrict wildcard for new files
* a6d0098d10 Install new files for upstream build
* ce07fd7616 d/t/boot-and-services: use coreutils tunable in apparmor test (LP: #2125614)
Yaping Li [Wed, 29 Apr 2026 22:17:22 +0000 (15:17 -0700)]
report: report user and system CPU time per cgroup
Extend io.systemd.CGroup.CpuUsage from a single per-unit nanosecond
counter to three rows distinguished by a "type" field of "total",
"user", or "system". The values come from cpu.stat's usage_usec,
user_usec and system_usec keys, read in a single keyed-attribute
fetch and cached on each CGroupInfo so each scrape only opens
cpu.stat once per cgroup.
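For illustration, the three keys can be picked out of a cpu.stat buffer in one pass. This is a sketch with hypothetical names, operating on an in-memory string rather than on the real file, and it is not the keyed-attribute helper the commit uses.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct ExampleCpuStat {
        uint64_t usage_usec;
        uint64_t user_usec;
        uint64_t system_usec;
} ExampleCpuStat;

/* Parses "key value" lines and extracts the three keys of interest in a
 * single pass. Unknown keys are ignored. Returns -ENODATA if any of the
 * three is missing. */
static int example_parse_cpu_stat(const char *text, ExampleCpuStat *ret) {
        ExampleCpuStat s = { 0, 0, 0 };
        unsigned found = 0;

        assert(text);
        assert(ret);

        for (const char *p = text; p && *p; ) {
                char key[32];
                uint64_t value;
                const char *nl;

                if (sscanf(p, "%31s %" SCNu64, key, &value) == 2) {
                        if (strcmp(key, "usage_usec") == 0) {
                                s.usage_usec = value;
                                found |= 1u;
                        } else if (strcmp(key, "user_usec") == 0) {
                                s.user_usec = value;
                                found |= 2u;
                        } else if (strcmp(key, "system_usec") == 0) {
                                s.system_usec = value;
                                found |= 4u;
                        }
                }

                /* Advance to the start of the next line, if any. */
                nl = strchr(p, '\n');
                p = nl ? nl + 1 : NULL;
        }

        if (found != 7u)
                return -ENODATA;

        *ret = s;
        return 0;
}
```

Caching the parsed struct per cgroup would then let the "total", "user" and "system" rows be served from one read.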
options: get rid of "on_error" parameter to FOREACH_OPTION
I am really not a fan of full code lines passed to macros as parameters.
Let's get rid of the 3rd parameter of FOREACH_OPTION() hence:
1. Let's return errors just as a regular value (though a negative one),
that can be handled via an OPTION_ERROR case statement for the switch.
This normalizes handling of the error, just like any other event
returned by the option parser.
2. In order to avoid exploding the amount of boilerplate in each use
(that just propagates the error on OPTION_ERROR), let's then
introduce an explicit FOREACH_OPTION_OR_RETURN(), that returns from
the calling function on its own (and makes that clear in the name).
Together this cleans up, normalizes the logic and shortens the code.
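The two points above can be illustrated with a toy parser. Everything here is hypothetical (the EXAMPLE_ macros, the parser struct, the single-letter option syntax) and is not the real FOREACH_OPTION() implementation. It only shows an error arriving as an ordinary negative value, and a variant of the loop macro that returns from the calling function on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EXAMPLE_OPTION_ERROR (-1)

typedef struct ExampleOptionParser {
        const char *const *argv;
        size_t argc;
        size_t index;
        int error;      /* negative errno of the last failure */
} ExampleOptionParser;

/* Returns the option character, 0 when all arguments are consumed, or
 * EXAMPLE_OPTION_ERROR if an argument is not of the form "-x". */
static int example_option_next(ExampleOptionParser *p) {
        const char *a;

        assert(p);

        if (p->index >= p->argc)
                return 0;

        a = p->argv[p->index++];
        if (a[0] != '-' || a[1] == '\0' || a[2] != '\0') {
                p->error = -EINVAL;
                return EXAMPLE_OPTION_ERROR;
        }

        return a[1];
}

/* Plain loop: errors show up as a regular value the switch can handle. */
#define EXAMPLE_FOREACH_OPTION(p, c) \
        for (int c; (c = example_option_next(p)) != 0; )

/* Variant that propagates the error by returning from the caller. */
#define EXAMPLE_FOREACH_OPTION_OR_RETURN(p, c) \
        EXAMPLE_FOREACH_OPTION(p, c) \
                if (c == EXAMPLE_OPTION_ERROR) \
                        return (p)->error; \
                else

static int example_count_verbose(const char *const *argv, size_t argc) {
        ExampleOptionParser p = { .argv = argv, .argc = argc };
        int verbose = 0;

        EXAMPLE_FOREACH_OPTION_OR_RETURN(&p, c)
                switch (c) {
                case 'v':
                        verbose++;
                        break;
                default:
                        return -ENOTSUP;
                }

        return verbose;
}
```

A caller that wants custom handling would use the plain loop and add a case for EXAMPLE_OPTION_ERROR instead.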
dns-question: limit the number of questions per query
Let's cap the number of questions each query can have to something
reasonable - 128 questions per query should be more than enough for any
real-world scenario.
fundamental/cleanup: add CLEANUP_ELEMENTS() and DEFINE_POINTER_ARRAY_CLEAR_FUNC()
DEFINE_POINTER_ARRAY_CLEAR_FUNC() generates a helper of the form
helper_array_clear(T *array, size_t n) that drops each element but does
not free the array itself, parallel to DEFINE_POINTER_ARRAY_FREE_FUNC()
for cases where the array has automatic storage duration.
CLEANUP_ELEMENTS() pairs with these helpers to provide a _cleanup_-like
attribute for fixed-size arrays: the bound is taken from ELEMENTSOF(),
and the helper is invoked across the elements at scope exit. Compared to
CLEANUP_ARRAY(), the storage is neither freed nor zeroed.
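One way such a pair of macros could be built on the GCC/Clang cleanup attribute is sketched below. The EXAMPLE_ names and the auxiliary struct are assumptions made for this illustration; the actual definitions in the tree may look quite different.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_ELEMENTSOF(x) (sizeof(x) / sizeof((x)[0]))

typedef void (*example_clear_func_t)(void *array, size_t n);

/* Auxiliary record that remembers which array to clear at scope exit. */
typedef struct ExampleElementsCleanup {
        void *array;
        size_t n;
        example_clear_func_t clear;
} ExampleElementsCleanup;

static inline void example_elements_cleanup(const ExampleElementsCleanup *c) {
        assert(c);
        c->clear(c->array, c->n);
}

/* Generates name(array, n): drops every element, but neither frees nor
 * zeroes the array storage itself. */
#define EXAMPLE_DEFINE_POINTER_ARRAY_CLEAR_FUNC(type, name, drop)      \
        static void name(void *p, size_t n) {                         \
                type *array = p;                                       \
                for (size_t i = 0; i < n; i++)                         \
                        drop(array[i]);                                \
        }

/* Registers a fixed-size array for cleanup; the bound comes from the
 * array type, so no separate length variable is needed. */
#define EXAMPLE_CLEANUP_ELEMENTS(array, clear_func)                    \
        __attribute__((cleanup(example_elements_cleanup), unused))     \
        const ExampleElementsCleanup example_cleanup_##array = {       \
                (array), EXAMPLE_ELEMENTSOF(array), (clear_func)       \
        }

typedef struct ExampleItem {
        int refs;
} ExampleItem;

static void example_item_unref(ExampleItem *i) {
        if (i)
                i->refs--;
}

EXAMPLE_DEFINE_POINTER_ARRAY_CLEAR_FUNC(ExampleItem *, example_item_array_clear, example_item_unref)

/* Takes one reference on each item; both are dropped again at scope exit. */
static void example_use(ExampleItem *a, ExampleItem *b) {
        ExampleItem *items[3] = { a, b, NULL };
        EXAMPLE_CLEANUP_ELEMENTS(items, example_item_array_clear);

        a->refs++;
        b->refs++;
}
```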
Migrate various logic across the tree over to the new macros.
sd-device: use DEFINE_POINTER_ARRAY_CLEAR_FUNC() for sd_device_unref_array_clear()
Replace the local device_unref_many() helper with the macro-generated
equivalent.
format-table: switch help-table arrays to CLEANUP_ELEMENTS()
Generate table_unref_array_clear() via DEFINE_POINTER_ARRAY_CLEAR_FUNC()
and convert the help-table arrays in bootctl, cryptenroll, nspawn,
repart and vmspawn to CLEANUP_ELEMENTS(). The arrays no longer need a
trailing NULL slot, so the size matches ELEMENTSOF() of the groups
array.
firewall-util: switch netlink message arrays to CLEANUP_ELEMENTS()
Generate sd_netlink_message_unref_array_clear() via
DEFINE_POINTER_ARRAY_CLEAR_FUNC() in place of the NULL-terminated
sd_netlink_message_unref_many(), and convert the two stack arrays of
sd_netlink_message pointers to CLEANUP_ELEMENTS().
Dan Anderson [Thu, 30 Apr 2026 02:53:10 +0000 (22:53 -0400)]
Improve error logging for fstat failure
Small hygiene fix. r must be >= 0 here as per the prior statement (otherwise we would have returned already). In practice this is only ever r == 0, which means "return r;" is "return 0;". Update this to use log_debug_errno() instead.
Samuel Dainard [Tue, 28 Apr 2026 15:57:26 +0000 (15:57 +0000)]
binfmt-util: handle ELOOP/EACCES from automount in read-only bind mounts
When /proc is bind-mounted read-only (common in mock/Koji buildroots,
containers, and other sandboxed environments), opening
/proc/sys/fs/binfmt_misc returns ELOOP if it is an automount point
that cannot be triggered in the read-only context.
Currently binfmt_mounted_and_writable() only handles ENOENT, so ELOOP
propagates as an error. This causes test-binfmt-util to fail with
SIGABRT and disable_binfmt() to log a spurious warning at shutdown.
Treat ELOOP and EACCES the same as ENOENT: binfmt_misc is not usably
available, return false.
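The errno handling described above boils down to a small classification. The helper below is a hypothetical sketch that takes the errno from the failed open() as a parameter; it is not the actual binfmt_mounted_and_writable() code.

```c
#include <assert.h>
#include <errno.h>

/* open_errno is the positive errno from a failed open(), or 0 on success.
 * Returns 1 if the caller may continue checking, 0 if binfmt_misc is not
 * usably available, and a negative errno for unexpected failures. */
static int example_classify_binfmt_open(int open_errno) {
        assert(open_errno >= 0);

        switch (open_errno) {
        case 0:
                return 1;
        case ENOENT:
        case ELOOP:   /* automount point that cannot be triggered */
        case EACCES:
                return 0;
        default:
                return -open_errno;
        }
}
```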
Note: PR #37006 (merged April 2025) addressed ELOOP in the xstatfsat()
path, but the open() call in binfmt_mounted_and_writable() remained
unhandled.
blockdev-list: fix per-element leak in block_device_array_free() (#41869)
FOREACH_ARRAY declares 'i' as the iterator but the body passed 'd' (the
array base) to block_device_done(). Since mfree() leaves the field NULL
after the first call, element 0 is freed repeatedly while elements
1..N-1 leak their node, symlinks strv, model, vendor and subsystem.
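The corrected loop shape looks roughly like this. The struct is a trimmed, hypothetical stand-in with only two fields, and the explicit pointer loop plays the role of FOREACH_ARRAY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ExampleBlockDevice {
        char *node;
        char *model;
} ExampleBlockDevice;

/* Frees the fields of one element and leaves them NULL. */
static void example_block_device_done(ExampleBlockDevice *d) {
        assert(d);

        free(d->node);
        d->node = NULL;
        free(d->model);
        d->model = NULL;
}

static void example_block_device_array_clear(ExampleBlockDevice *d, size_t n) {
        if (!d || n == 0)
                return;

        for (ExampleBlockDevice *i = d; i < d + n; i++)
                example_block_device_done(i); /* the bug passed 'd' (the base) here */
}
```

Passing the base pointer instead of the iterator compiles fine, which is why the leak only showed up under the sanitizers.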
The bug predates the sanitizer-instrumented callers. PR #41776's new
systemd-storage-block daemon runs blockdev_list() under ASan/LSan in
TEST-87-AUX-UTILS-VM and exposes it (15 allocs / 804 bytes leaked per
ListVolumes request). The fix also benefits repart and blockdev_list's
internal CLEANUP_ARRAY cleanup.
volume: add an "io.systemd.StorageProvider" IPC API that is supposed to be used by vmspawn/nspawn/pid1 to provide storage volumes in a generic fashion (#41776)
BindPath= in unit files and --bind= in nspawn/vmspawn don't really
cut it for connecting arbitrary storage infra to them. Let's do something
about it, and implement a simple, light-weight API for acquiring an fd
to a storage volume. Benefits:
1. the interface can be implemented by anyone, connecting anything to
vmspawn/nspawn/service management
2. very loose coupling: just bind a socket into a well-known dir, done
3. mounting can happen on-demand
shared/options: add new helper option_parser_get_arg
option_parser_next_arg() is renamed to option_parser_peek_next_arg()
to match option_parser_consume_next_arg().
A new helper option_parser_get_arg(…, n) is added. It is a common pattern
to only need a single arg, and getting an array and extracting a single
item from it is too verbose.
It comes with a really thorough test suite matching our current level
of testing of systemd-boot (read: there is none, I ask you to trust me,
Claude, and your review on this one)...