The search domain limit is already enforced by dns_search_domain_new(),
but in this case it's way too late. Let's enforce it during the first
loop to avoid unnecessary parsing.
Luca Boccassi [Sun, 3 May 2026 21:16:15 +0000 (22:16 +0100)]
test: make TEST-64 mdadm_lvm cleanup robust against reruns
mdadm --zero-superblock only wipes the MD metadata on the underlying
disks, not the LVM PV header that lives in the array data area. When
the VM is restarted and the test re-creates the array with the same
UUID, /dev/md127 exposes the old data including the LVM PV header, so
udev's 69-lvm.rules auto-triggers lvm-activate-mdlvm_vg.service which
races with the test's own pvcreate for exclusive access on /dev/md127.
Wipe the LVM signature off the MD device (and the underlying disks as
a belt-and-braces measure) to avoid the race on re-run, fixing failures
when the VM is rebooted instead of shut down.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Luca Boccassi [Mon, 4 May 2026 11:58:33 +0000 (12:58 +0100)]
semaphore: stop deleting all apt sources
The image configuration was changed and the main sources are
now in drop-in apt sources files too, so deleting the whole
drop-in directory breaks installing packages. Just delete the
disabled ones and chrome.
Valentin David [Mon, 4 May 2026 08:25:19 +0000 (10:25 +0200)]
core: Open netfilter socket only when needed
On initrds where nfnetlink module is missing, trying to open
a NETLINK_NETFILTER netlink socket takes a lot of time then fails.
This makes boot noticeably slower, even though probably no
unit in an initrd needs netfilter.
So here we delay opening the socket until we know we need it.
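A minimal sketch of the deferred-open pattern described above, not the actual PID 1 code. ToyManager, the open_socket callback and all toy_* names are hypothetical; in the real code the opener would create the NETLINK_NETFILTER socket, while the fake opener here only lets the sketch run without the nfnetlink module.

```c
/* Hypothetical stand-in for the manager state; not systemd's Manager. */
typedef struct ToyManager {
        int nfnl_fd;               /* -1 until the socket is first needed */
        int (*open_socket)(void);  /* returns an fd >= 0 or a negative errno */
} ToyManager;

/* Fake opener for illustration: counts calls and hands out a made-up fd. */
static int toy_open_calls;
static int toy_fake_open(void) {
        toy_open_calls++;
        return 42;
}

/* Opens the socket on first use and caches it; later calls reuse the fd. */
static int toy_manager_get_nfnl(ToyManager *m) {
        if (m->nfnl_fd >= 0)
                return m->nfnl_fd;

        int fd = m->open_socket();
        if (fd < 0)
                return fd;         /* not cached, so a later call retries */

        m->nfnl_fd = fd;
        return fd;
}
```

With this shape, a boot in which nothing asks for netfilter never pays for the slow open attempt.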
Valentin David [Sat, 18 Apr 2026 13:09:00 +0000 (15:09 +0200)]
boot: Try to load UKI from simple filesystem before LoadImage
When the source buffer is NULL, the firmware is supposed to try to load the UKI
with the simple filesystem protocol and then with the load file 2 protocol. But
it seems that some versions of AMI firmware do not use the simple filesystem
protocol, and then fail to load if the ESP was loaded from an El Torito boot
catalog. Loading the source buffer ourselves via the simple filesystem protocol
seems to work around this limitation.
Shim, for example, also loads the source buffer before calling LoadImage, so it
seems to be a safe thing to do. In the future we could also fall back to the load
file 2 protocol if the simple filesystem protocol fails.
test: make TEST-70-TPM2 and TEST-86-MULTI-PROFILE-UKI robust against reruns (#41922)
These tests leave a lot of state around, and when the test is re-run,
for example due to the qemu bug that makes a VM reboot instead of
shutting down, it fails.
Luca Boccassi [Sun, 3 May 2026 15:33:38 +0000 (16:33 +0100)]
test: make TEST-86-MULTI-PROFILE-UKI robust against reruns
When qemu reboots instead of shutting down after the last iteration,
the profile is already set to profile2 but the /root/encrypted.raw is
gone so the test fails. Reset the default boot entry at the end of the
test to make it robust against reruns.
Luca Boccassi [Sun, 3 May 2026 15:23:41 +0000 (16:23 +0100)]
test: make TEST-70-TPM2 robust against reruns
The test leaves a lot of state around, and when the test is re-run,
for example due to the qemu bug that makes a VM reboot instead of
shutting down, it fails.
Do more cleanups in the traps.
[ 162.642175] TEST-70-TPM2.sh[2815]: Calculated public key name: 000b2b66edc3a466e81059286aaf38d09ea42a7a9dcdf6ba3b664c62f0cae4ce4f66
[ 162.642628] TEST-70-TPM2.sh[2815]: PolicyAuthorize calculated digest: 2caa740101f65734d50395d6abc64fa46015d40d1f5de239434578544e592a92
[ 162.643681] TEST-70-TPM2.sh[2815]: Calculated NV index name: 000b439cfa1534815bbe8d33b80c56f5a8d17d36fe94a7782b23a37b50def5fc5eaa
[ 162.645111] TEST-70-TPM2.sh[2815]: PolicyAuthorizeNV calculated digest: 69ee0e89fafe6b9df2cd6a5defbf74aa46cf6d92703e645d463549da4ba5e1a4
[ 162.645407] TEST-70-TPM2.sh[2815]: Combined signed PCR policies and pcrlock policies cannot be calculated offline, currently.
[ 162.649576] TEST-70-TPM2.sh[2815]: Releasing crypt device /dev/loop0 context.
[ 162.652433] TEST-70-TPM2.sh[2815]: Releasing device-mapper backend.
[ 162.653518] TEST-70-TPM2.sh[2815]: Closing read only fd for /dev/loop0.
[ 162.654359] TEST-70-TPM2.sh[2815]: Closing read write fd for /dev/loop0.
[ 162.654786] TEST-70-TPM2.sh[2815]: Failed to encrypt device: Operation not supported
Luca Boccassi [Sat, 2 May 2026 22:18:22 +0000 (23:18 +0100)]
test: make varlink StartTransient checks compatible with jq 1.6
The new "varlinkctl --more StartTransient" subtest pipes a JSON-SEQ
stream of multiple records into "jq --seq -e ...". CentOS 9
ships jq 1.6, where -e only inspects the last input record's output:
when the trailing record (the final reply) doesn't match the
"select()" filter, jq exits non-zero even though earlier records
match, so the test fails.
Use --slurp which collapses the records into an array first and
returns a single bool.
Simon Lucido [Mon, 20 Apr 2026 15:05:27 +0000 (17:05 +0200)]
core: add ReloadCount to Manager and bump on successful reload
Introduce a counter that tracks how many configuration reloads have
been successfully completed by the manager. The increment lives in
manager_reload() right after the "point of no return", so failed
reload attempts that bail out earlier (e.g. during serialization)
do not bump the counter.
It is exposed as a new ReloadCount property on
org.freedesktop.systemd1.Manager (D-Bus) and as ReloadCount in
io.systemd.Manager.Describe (Varlink).
Also add an integration test that verifies that ReloadCount
increments by one per daemon-reload, accumulates correctly across
multiple reloads, and that D-Bus and Varlink return identical values.
It also tests that the counter resets after a reexec.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Simon Lucido <simonlucido@meta.com>
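A rough sketch of where such a counter increment sits relative to the "point of no return". ToyReloadManager, toy_manager_reload() and the serialize_ok flag are hypothetical, and the 64-bit counter type is an assumption; this is not the code of manager_reload().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical manager state; the counter width is an assumption. */
typedef struct ToyReloadManager {
        uint64_t reload_count;  /* number of reloads that passed the point of no return */
} ToyReloadManager;

/* serialize_ok simulates whether the fallible preparation step succeeds. */
static int toy_manager_reload(ToyReloadManager *m, bool serialize_ok) {
        if (!serialize_ok)
                return -EIO;    /* bail out early: counter stays untouched */

        /* Point of no return: from here on the reload counts. */
        m->reload_count++;

        /* ... old state would be dropped and the new configuration loaded here ... */
        return 0;
}
```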
Yu Watanabe [Sat, 2 May 2026 13:31:03 +0000 (22:31 +0900)]
socket-util: introduce tos_to_priority()
This maps from TOS, which can be used for setsockopt(IPPROTO_IP, IP_TOS),
to socket priority, which can be used for setsockopt(SOL_SOCKET, SO_PRIORITY).
With this, we can set priority like the following:
```
uint8_t tos = IPTOS_CLASS_CS6;
setsockopt_int(fd, IPPROTO_IP, IP_TOS, tos);
setsockopt_int(fd, SOL_SOCKET, SO_PRIORITY, tos_to_priority(tos));
```
The Hub for these headsets uses the following
USB entries:
Bus 007 Device 002: ID 0451:2036 Texas Instruments, Inc. TUSB2036 Hub
Bus 007 Device 003: ID 1038:1290 SteelSeries ApS Arctis Pro Wireless
Bus 007 Device 004: ID 1038:1294 SteelSeries ApS Arctis Pro Wireless
dbus: limit the number of env variables to something reasonable, vol. 2
Turns out we can utilize this limit at a couple more places, so let's
move the previously defined limit constant to env-util.h and use it to
guard a couple more D-Bus methods. Also, bump it a bit, given it's meant
to be a safety cap that can't be hit in valid scenarios.
bootctl: rework/modernize "unlink" and add Varlink API for it
Among other things this changes the tracking of the location of resources
during GC to use the BootEntrySource enum rather than a path, since
we have that and it is more efficient and easier to grok.
* 1302f123d9 Restrict wildcard for new files
* a6d0098d10 Install new files for upstream build
* ce07fd7616 d/t/boot-and-services: use coreutils tunable in apparmor test (LP: #2125614)
Yaping Li [Wed, 29 Apr 2026 22:17:22 +0000 (15:17 -0700)]
report: report user and system CPU time per cgroup
Extend io.systemd.CGroup.CpuUsage from a single per-unit nanosecond
counter to three rows distinguished by a "type" field of "total",
"user", or "system". The values come from cpu.stat's usage_usec,
user_usec and system_usec keys, read in a single keyed-attribute
fetch and cached on each CGroupInfo so each scrape only opens
cpu.stat once per cgroup.
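The single-pass extraction of the three keys could look roughly like the sketch below. It works on an in-memory copy of cpu.stat; ToyCpuUsage and toy_parse_cpu_stat() are hypothetical names, and the real code uses systemd's own keyed-attribute helpers rather than sscanf().

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct ToyCpuUsage {
        uint64_t total_usec, user_usec, system_usec;
} ToyCpuUsage;

/* Parses usage_usec, user_usec and system_usec in one pass over the text.
 * Returns 0 on success, -ENODATA if any of the three keys is missing. */
static int toy_parse_cpu_stat(const char *text, ToyCpuUsage *ret) {
        ToyCpuUsage u = { 0, 0, 0 };
        unsigned found = 0;

        for (const char *line = text; line && *line; ) {
                char key[32];
                uint64_t value;

                if (sscanf(line, "%31s %" SCNu64, key, &value) == 2) {
                        if (strcmp(key, "usage_usec") == 0) {
                                u.total_usec = value;
                                found |= 1;
                        } else if (strcmp(key, "user_usec") == 0) {
                                u.user_usec = value;
                                found |= 2;
                        } else if (strcmp(key, "system_usec") == 0) {
                                u.system_usec = value;
                                found |= 4;
                        }
                }

                /* Advance to the start of the next line, if there is one. */
                line = strchr(line, '\n');
                if (line)
                        line++;
        }

        if (found != 7)
                return -ENODATA;

        *ret = u;
        return 0;
}
```

Caching the resulting struct per cgroup is what lets each scrape emit the "total", "user" and "system" rows from one read.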
options: get rid of "on_error" parameter to FOREACH_OPTION
I am really not a fan of full code lines passed to macros as parameters.
Let's get rid of the 3rd parameter of FOREACH_OPTION() hence:
1. Let's return errors just as a regular value (though a negative one),
that can be handled via a OPTION_ERROR case statement for the switch.
This normalizes handling of the error, just like any other event
returned by the option parser.
2. In order to avoid exploding the amount of boilerplate in each use
(that just propagates the error on OPTION_ERROR), let's then
introduce an explicit FOREACH_OPTION_OR_RETURN(), that returns from
the calling function on its own (and makes that clear in the name).
Together this cleans up, normalizes the logic and shortens the code.
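To illustrate the two styles, here is a toy model of the idea. Everything prefixed TOY_/toy_ is hypothetical and much simpler than the real option parser; in particular, storing the errno in the parser and signalling it with a single negative marker value is an assumption made for this sketch.

```c
#include <errno.h>
#include <string.h>

enum { TOY_OPTION_ERROR = -1 };  /* hypothetical marker; the errno lives in the parser */

typedef struct ToyOptionParser {
        int argc;
        char **argv;
        int index;
        int error;
} ToyOptionParser;

/* Returns the option character, 0 at the end, or TOY_OPTION_ERROR.
 * Only "-v" and "-q" are known to this toy parser. */
static int toy_option_next(ToyOptionParser *p) {
        if (p->index >= p->argc)
                return 0;

        const char *a = p->argv[p->index++];
        if (a[0] == '-' && a[1] != '\0' && a[2] == '\0' && strchr("vq", a[1]))
                return a[1];

        p->error = -EINVAL;
        return TOY_OPTION_ERROR;
}

#define TOY_FOREACH_OPTION(p, c) \
        for (int c; (c = toy_option_next(p)) != 0; )

/* Same loop, but returns from the calling function on error. */
#define TOY_FOREACH_OPTION_OR_RETURN(p, c)      \
        TOY_FOREACH_OPTION(p, c)                \
                if (c == TOY_OPTION_ERROR)      \
                        return (p)->error;      \
                else

/* Style 1: the error is just another case in the switch. */
static int toy_parse_explicit(int argc, char **argv, int *ret) {
        ToyOptionParser p = { argc, argv, 0, 0 };
        int level = 0;

        TOY_FOREACH_OPTION(&p, c)
                switch (c) {
                case 'v': level++; break;
                case 'q': level--; break;
                case TOY_OPTION_ERROR:
                        return p.error;
                }

        *ret = level;
        return 0;
}

/* Style 2: the macro propagates the error, so the switch stays short. */
static int toy_parse_or_return(int argc, char **argv, int *ret) {
        ToyOptionParser p = { argc, argv, 0, 0 };
        int level = 0;

        TOY_FOREACH_OPTION_OR_RETURN(&p, c)
                switch (c) {
                case 'v': level++; break;
                case 'q': level--; break;
                }

        *ret = level;
        return 0;
}
```

Neither variant passes a statement as a macro argument; the second one hides a return, which is why the name says so.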
dns-question: limit the number of questions per query
Let's cap the number of questions each query can have to something
reasonable - 128 questions per query should be more than enough for any
real-world scenario.
fundamental/cleanup: add CLEANUP_ELEMENTS() and DEFINE_POINTER_ARRAY_CLEAR_FUNC()
DEFINE_POINTER_ARRAY_CLEAR_FUNC() generates a helper of the form
helper_array_clear(T *array, size_t n) that drops each element but does
not free the array itself, parallel to DEFINE_POINTER_ARRAY_FREE_FUNC()
for cases where the array has automatic storage duration.
CLEANUP_ELEMENTS() pairs with these helpers to provide a _cleanup_-like
attribute for fixed-size arrays: the bound is taken from ELEMENTSOF(),
and the helper is invoked across the elements at scope exit. Compared to
CLEANUP_ARRAY(), the storage is neither freed nor zeroed.
Migrate various logic across the tree over to the new macros.
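A simplified model of how such a pair of macros can be built on the compiler's cleanup attribute (GCC/Clang). All TOY_/toy_ names, the extra thunk function and the ToyItem type are hypothetical; the real macros in fundamental/cleanup differ in detail.

```c
#include <stddef.h>

#define TOY_ELEMENTSOF(a) (sizeof(a) / sizeof((a)[0]))

/* Generates helper(array, n): drops every non-NULL element, leaves the
 * array storage itself alone. The _thunk variant is an artifact of this
 * sketch, giving the cleanup record a type-independent entry point. */
#define TOY_DEFINE_POINTER_ARRAY_CLEAR_FUNC(type, helper, drop)  \
        static void helper(type *array, size_t n) {              \
                for (size_t i = 0; i < n; i++)                   \
                        if (array[i])                            \
                                drop(array[i]);                  \
        }                                                        \
        static void helper##_thunk(void *array, size_t n) {      \
                helper(array, n);                                \
        }

typedef struct ToyElementsCleanup {
        void *array;
        size_t n;
        void (*clear)(void *array, size_t n);
} ToyElementsCleanup;

static void toy_elements_cleanup(const ToyElementsCleanup *c) {
        c->clear(c->array, c->n);
}

/* Declares a hidden record next to the array; when it leaves scope, the
 * helper runs over all TOY_ELEMENTSOF(array) slots. */
#define TOY_CLEANUP_ELEMENTS(array, helper)                             \
        __attribute__((cleanup(toy_elements_cleanup), unused))          \
        const ToyElementsCleanup array##_cleanup = {                    \
                (array), TOY_ELEMENTSOF(array), helper##_thunk }

typedef struct ToyItem { int refs; } ToyItem;

static void toy_item_unref(ToyItem *it) {
        it->refs--;
}

TOY_DEFINE_POINTER_ARRAY_CLEAR_FUNC(ToyItem*, toy_item_unref_array_clear, toy_item_unref)

static void toy_use_items(ToyItem *a, ToyItem *b) {
        ToyItem *items[3] = { a, b, NULL };
        TOY_CLEANUP_ELEMENTS(items, toy_item_unref_array_clear);

        /* ... work with items[] ...; on return each non-NULL slot is unreffed,
         * and no trailing NULL sentinel is required to find the end. */
}
```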
sd-device: use DEFINE_POINTER_ARRAY_CLEAR_FUNC() for sd_device_unref_array_clear()
Replace the local device_unref_many() helper with the macro-generated
equivalent.
format-table: switch help-table arrays to CLEANUP_ELEMENTS()
Generate table_unref_array_clear() via DEFINE_POINTER_ARRAY_CLEAR_FUNC()
and convert the help-table arrays in bootctl, cryptenroll, nspawn,
repart and vmspawn to CLEANUP_ELEMENTS(). The arrays no longer need a
trailing NULL slot, so the size matches ELEMENTSOF() of the groups
array.
firewall-util: switch netlink message arrays to CLEANUP_ELEMENTS()
Generate sd_netlink_message_unref_array_clear() via
DEFINE_POINTER_ARRAY_CLEAR_FUNC() in place of the NULL-terminated
sd_netlink_message_unref_many(), and convert the two stack arrays of
sd_netlink_message pointers to CLEANUP_ELEMENTS().
Dan Anderson [Thu, 30 Apr 2026 02:53:10 +0000 (22:53 -0400)]
Improve error logging for fstat failure
Small hygiene fix. r must be >= 0 at this point, per the prior statement (otherwise we would have returned). In practice it can only be r == 0, so "return r;" is really "return 0;". Update this to use log_debug_errno.
Samuel Dainard [Tue, 28 Apr 2026 15:57:26 +0000 (15:57 +0000)]
binfmt-util: handle ELOOP/EACCES from automount in read-only bind mounts
When /proc is bind-mounted read-only (common in mock/Koji buildroots,
containers, and other sandboxed environments), opening
/proc/sys/fs/binfmt_misc returns ELOOP if it is an automount point
that cannot be triggered in the read-only context.
Currently binfmt_mounted_and_writable() only handles ENOENT, so ELOOP
propagates as an error. This causes test-binfmt-util to fail with
SIGABRT and disable_binfmt() to log a spurious warning at shutdown.
Treat ELOOP and EACCES the same as ENOENT: binfmt_misc is not usably
available, return false.
Note: PR #37006 (merged April 2025) addressed ELOOP in the xstatfsat()
path, but the open() call in binfmt_mounted_and_writable() remained
unhandled.
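The error handling described above boils down to a small mapping, sketched here with a hypothetical helper name. It takes the negative errno from the failed open and is not the actual body of binfmt_mounted_and_writable().

```c
#include <errno.h>

/* Maps the negative errno from opening /proc/sys/fs/binfmt_misc to a result:
 * 0 ("false", binfmt_misc is not usably available) for the tolerated errors,
 * the unchanged negative errno for everything else. */
static int toy_binfmt_open_error_to_result(int r) {
        if (r == -ENOENT || r == -ELOOP || r == -EACCES)
                return 0;

        return r;
}
```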
blockdev-list: fix per-element leak in block_device_array_free() (#41869)
FOREACH_ARRAY declares 'i' as the iterator but the body passed 'd' (the
array base) to block_device_done(). Since mfree() leaves the field NULL
after the first call, element 0 is freed repeatedly while elements
1..N-1 leak their node, symlinks strv, model, vendor and subsystem.
The bug predates the sanitizer-instrumented callers. PR #41776's new
systemd-storage-block daemon runs blockdev_list() under ASan/LSan in
TEST-87-AUX-UTILS-VM and exposes it (15 allocs / 804 bytes leaked per
ListVolumes request). The fix also benefits repart and blockdev_list's
internal CLEANUP_ARRAY cleanup.
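The bug pattern is easy to reproduce in miniature. The sketch below uses a hypothetical one-field ToyBlockDevice and a simplified stand-in for FOREACH_ARRAY(); it shows the fixed loop, with the buggy argument noted in a comment.

```c
#include <stdlib.h>

typedef struct ToyBlockDevice {
        char *node;
} ToyBlockDevice;

/* Frees the fields of one element and resets them, so a repeated call is a no-op. */
static void toy_block_device_done(ToyBlockDevice *d) {
        free(d->node);
        d->node = NULL;
}

/* Simplified stand-in for FOREACH_ARRAY(): 'i' points at each element in turn. */
#define TOY_FOREACH_ARRAY(i, array, n) \
        for (ToyBlockDevice *i = (array); i < (array) + (n); i++)

static void toy_block_device_array_done(ToyBlockDevice *d, size_t n) {
        TOY_FOREACH_ARRAY(i, d, n)
                toy_block_device_done(i);  /* the bug passed 'd' here: element 0 every time */
}
```

Because the per-element function resets the pointer after freeing, the buggy version does not crash with a double free; it silently leaks elements 1..N-1, which is why only the leak sanitizer caught it.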
volume: add an "io.systemd.StorageProvider" IPC API that is supposed to be used by vmspawn/nspawn/pid1 to provide storage volumes in a generic fashion (#41776)
BindPath= in unit files and --bind= in nspawn/vmspawn don't really
cut it when it comes to connecting arbitrary storage infra to them. Let's do something
about it, and implement a simple, light-weight API for acquiring an fd
to a storage volume. Benefits:
1. the interface can be implemented by anyone, connecting anything to
vmspawn/nspawn/service management
2. very loose coupling: just bind a socket into a well-known dir, done
3. mounting can happen on-demand
shared/options: add new helper option_parser_get_arg
option_parser_next_arg() is renamed to option_parser_peek_next_arg()
to match option_parser_consume_next_arg().
A new helper, option_parser_get_arg(…, n), is added. It is a common pattern
to only need a single arg, and getting an array and extracting a single
item from it is too verbose.
It comes with a really thorough test suite matching our current level
of testing of systemd-boot (read: there is none, I ask you to trust me,
Claude, and your review on this one)...
boot: load extra files for UKIs into memory and register as initrds
This generates on-the-fly cpio initrds from 'extra' resources declared
in Type #1 entries and installs them via the Linux initrd protocol so
that they get passed to the Linux kernel.
The PCR to measure into is closely associated with where we place a
resource in the initrd cpios. Hence, let's also track it in CpioTarget,
thus simplifying our function parameter lists that way.
TODO: track StorageProvider follow-ups, sketch a NetworkProvider sibling
Records the still-missing StorageProvider integrations (nspawn,
vmspawn, service-manager BindVolume=) and replaces the now-obsolete
generic "storage API via varlink" entry with a NetworkProvider
proposal modelled on it.
test: add integration test for storagectl and storage providers
VM-only test that exercises both shipped providers through storagectl:
verifies the well-known sockets exist, lists providers/volumes/
templates, creates and acquires volumes from each template
(sparse-file, allocated-file, directory, subvolume), attaches a loop
device to cover the block provider, and exercises the mount.storage
helper.
CLI for inspecting and using storage providers. Scans
/run/systemd/io.systemd.StorageProvider/ (or the user-mode equivalent)
for AF_UNIX sockets and talks to each one over Varlink. Verbs:
"volumes" lists volumes across all providers, "templates" lists
supported creation templates, "providers" lists the endpoints
themselves.
Also installed as a mount.storage helper, so
'mount -t storage PROVIDER:VOLUME /mnt' (or 'mount -t storage.<fstype>'
to put a fresh filesystem on a block volume) acquires the volume and
mounts it. Ships with bash/zsh completions and a man page.
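Splitting the PROVIDER:VOLUME source string might look like the following sketch. The helper name, the error codes and the choice to split at the first colon are assumptions; the commit message does not say how the real helper parses it.

```c
#include <errno.h>
#include <string.h>

/* Splits "PROVIDER:VOLUME" at the first colon (an assumption of this sketch).
 * Both parts must be non-empty and fit into the caller's buffers. */
static int toy_split_volume_spec(const char *spec,
                                 char *provider, size_t psize,
                                 char *volume, size_t vsize) {
        const char *colon = strchr(spec, ':');
        if (!colon || colon == spec || colon[1] == '\0')
                return -EINVAL;

        size_t plen = (size_t) (colon - spec), vlen = strlen(colon + 1);
        if (plen >= psize || vlen >= vsize)
                return -ENAMETOOLONG;

        memcpy(provider, spec, plen);
        provider[plen] = '\0';
        memcpy(volume, colon + 1, vlen + 1);  /* includes the trailing NUL */
        return 0;
}
```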
Second StorageProvider implementation, exposing regular files and
directories from a backing filesystem. In system mode the backing
directory is /var/lib/storage/, in user mode $XDG_STATE_HOME/storage/;
entries with a .volume suffix are exposed, with the inode type
determining whether the volume is reported as reg, dir or (via
symlinked/bind-mounted device node) blk.
Unlike the block provider, this one supports creating volumes
on-demand from a small set of built-in templates: sparse-file,
allocated-file, directory and subvolume.
First implementation of io.systemd.StorageProvider, exposing all block
devices known to udev (disks, partitions, dm nodes, …) as volumes of
type "blk". Names are picked from stable /dev/mapper and /dev/disk/by-*
symlinks; content-derived identifiers (by-uuid, by-label, …) are
intentionally avoided for security. Volume creation is not supported by
this backend.
Socket-activated via /run/systemd/io.systemd.StorageProvider/block.
Also adds shared storage-util.[ch] (VolumeType / CreateMode helpers)
that subsequent providers reuse.
Generic Varlink API for services that hand out file descriptors to
storage volumes. Three methods: Acquire() returns an fd for a named
volume (optionally creating it from a template), ListVolumes()
enumerates available volumes, ListTemplates() enumerates supported
creation templates. Volume types follow kernel inode-type naming:
blk (block device), reg (regular file), dir (directory).
Intent is that multiple providers can sit behind AF_UNIX sockets in a
well-known directory and be consumed uniformly by nspawn, vmspawn,
the service manager (BindVolume=) and similar tools.
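On the wire, a Varlink call is a JSON object with a "method" and a "parameters" field, terminated by a NUL byte. The sketch below formats such a call for one of the three methods named above; the helper name is hypothetical, and whether the real methods take parameters or expect the "more" flag is not stated here, so an empty parameter object and an optional flag are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Formats a parameterless call to io.systemd.StorageProvider.<method> into buf.
 * Returns the number of bytes to send, including the trailing NUL that ends
 * every Varlink message, or -1 if buf is too small. */
static int toy_format_call(char *buf, size_t size, const char *method, bool more) {
        int n = snprintf(buf, size,
                         "{\"method\":\"io.systemd.StorageProvider.%s\",\"parameters\":{}%s}",
                         method, more ? ",\"more\":true" : "");
        if (n < 0 || (size_t) n >= size)
                return -1;

        return n + 1;  /* snprintf() already wrote the NUL terminator */
}
```

A client would write these bytes to one of the AF_UNIX sockets in the well-known directory and read NUL-terminated JSON replies back.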
Merge the two blocks adding tests, since there seems to be
no obvious reason to have two separate blocks, as they both
contain tests from the same libraries.
sd-json: stop printing debug messages about extension fields
The intent was good, but we now print two or three of those messages
for each report metrics received on the wire. If the json object is
extensible, then it's all good and we don't need to inundate the user
with this trivial information. (And the message also sounds like
something is wrong or unexpected, when it totally isn't.)
...
(string):1:73: Unrecognized object field 'object', assuming extension.
(string):1:89: Unrecognized object field 'value', assuming extension.
json-stream: Received message: {"parameters":{"name":"io.systemd.Network.CarrierState","object":"virbr0","value":"degraded-carrier"},"continues":true}
(string):1:66: Unrecognized object field 'object', assuming extension.
(string):1:83: Unrecognized object field 'value', assuming extension.
json-stream: Received message: {"parameters":{"name":"io.systemd.Network.CarrierState","object":"lo","value":"carrier"},"continues":true}
(string):1:66: Unrecognized object field 'object', assuming extension.
(string):1:79: Unrecognized object field 'value', assuming extension.
json-stream: Received message: {"parameters":{"name":"io.systemd.Network.CarrierState","object":"wlp0s20f3","value":"carrier"},"continues":true}
(string):1:66: Unrecognized object field 'object', assuming extension.
(string):1:86: Unrecognized object field 'value', assuming extension.
...
As is often the case, in this case because of alignment, we are actually
not saving any space. With the bitfield we are using one bit of the 8 bytes
allocated, and without the bitfield we are using 8 bits of that.
But we're paying a price in generated code, at every access site to the
field:
Michael Vogt [Wed, 29 Apr 2026 06:20:56 +0000 (08:20 +0200)]
core: add io.systemd.Unit.StartTransient() to the varlink API (#41583)
This commit adds a simple version of io.systemd.Unit.StartTransient
for varlink. It is similar to the dbus version, but there are a few key
differences:
1. Instead of building the unit from key/value properties it
takes a structured json object "UnitContext" with a "Service" field
inside.
It is also only implementing a minimal set of what can be done with a
service.
2. No aux units (for now)
3. When called with --more the varlink socket can notify about
state changes depending on the notify{Job,Unit}Changes parameter
This aligns to the json objects/format from
https://github.com/systemd/systemd/pull/39391
and to show how the format can be shared it adds a new
(minimal) `ServiceContext` that is now part of
`io.systemd.Unit.List()`.