Frantisek Sumsal [Wed, 17 Jun 2026 12:09:43 +0000 (14:09 +0200)]
unit-name: introduce "strict" mode for unit name mangling
unit_name_mangle_with_suffix() is quite benevolent by default and allows
the unit to "transition" into a different unit type than what's
requested via its suffix argument. For example, calling
unit_name_mangle_with_suffix() with "/foo/bar" as a unit name and
".service" as a suffix would give you "foo-bar.mount", without any
warning or error.
This could then lead to quite confusing errors in certain situations:
~# systemd-run --remain-after-exit --unit /foo/bar true
Failed to start transient service unit: Cannot set property RemainAfterExit, or unknown property.
Given we can't change the default behaviour of
unit_name_mangle_with_suffix() as some parts of systemd already depend
on its "benevolence" (like systemctl), let's introduce a new flag -
UNIT_NAME_MANGLE_STRICT - that checks if the mangled/resolved unit
name's suffix matches the requested one and errors out if not.
With the flag used throughout systemd-run's code, the error in the above
case is now a bit more clear:
~# build/systemd-run --remain-after-exit --unit /foo/bar true
Path "/foo/bar" resolves to unit type "mount", but "service" is expected as unit.
Failed to mangle unit name: Invalid argument
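A minimal sketch of the strict check, assuming a heavily simplified mangling rule. toy_mangle() and its escaping are hypothetical stand-ins, not the real unit_name_mangle_with_suffix() logic; only the "compare the resulting suffix with the requested one" step mirrors the description above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for unit name mangling: absolute
 * paths turn into "<escaped path>.mount", names that already carry a suffix
 * are kept as-is, and everything else gets the requested suffix appended. */
int toy_mangle(const char *name, const char *suffix, int strict, char *ret, size_t size) {
        int n;

        if (name[0] == '/') {
                char escaped[256];
                size_t i = 0;

                /* Drop the leading slash and turn the remaining ones into dashes */
                for (const char *p = name + 1; *p && i + 1 < sizeof(escaped); p++)
                        escaped[i++] = *p == '/' ? '-' : *p;
                escaped[i] = 0;

                n = snprintf(ret, size, "%s.mount", escaped);
        } else if (strchr(name, '.'))
                n = snprintf(ret, size, "%s", name);
        else
                n = snprintf(ret, size, "%s%s", name, suffix);

        if (n < 0 || (size_t) n >= size)
                return -ENAMETOOLONG;

        if (strict) {
                /* Strict mode: the resulting suffix must match the requested one */
                const char *dot = strrchr(ret, '.');
                if (!dot || strcmp(dot, suffix) != 0)
                        return -EINVAL;
        }

        return 0;
}
```

Without the strict argument, "/foo/bar" with ".service" quietly yields "foo-bar.mount"; with it, the same call fails with -EINVAL.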
Michael Vogt [Tue, 23 Jun 2026 15:34:18 +0000 (17:34 +0200)]
basic: add assert() when doing pointer deref
Lennart reminded me in [1] that we need to add assert() in functions
that do pointer access. For the simple `*p` pointer dereferences
we even have an automatic coccinelle script that ensures that as
part of the automatic code checks.
However for deref in the `p->` style this is not supported right
now and adding it to coccinelle is hard because it's too slow for
this kind of check. So I created a (slightly messy) tree-sitter
python script to see how many asserts we are currently missing.
This commit is the result of running it over the `src/basic`
dir and fixing the flagged issues. I plan to tidy it up and
add it to the checks too but this is orthogonal to this commit.
Luca Boccassi [Wed, 24 Jun 2026 18:02:06 +0000 (19:02 +0100)]
dhcp-message-dump: guard against negative option type before indexing
dhcp_option_type_from_code() returns _DHCP_OPTION_TYPE_INVALID (-EINVAL)
for the PAD and END option codes, and dump_dhcp_option_one() uses the
returned value directly as an index into the functions[] table. Those
codes are excluded by an assert() at the top of the function, but
assert() compiles down to __builtin_unreachable() under NDEBUG, so a
negative array index read is reachable there (and trips static
analyzers). Bail out explicitly on the error return.
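A rough sketch of the pattern, with a made-up two-entry type table. All toy_* names and the code-to-type mapping are hypothetical; only the shape (a lookup that may return a negative errno, followed by table indexing) follows the description above.

```c
#include <errno.h>

/* Hypothetical miniature of the option type table; only the shape matters. */
enum {
        TOY_TYPE_FLAG,
        TOY_TYPE_U8,
        _TOY_TYPE_MAX,
        _TOY_TYPE_INVALID = -EINVAL,
};

int toy_type_from_code(unsigned code) {
        if (code == 0 || code == 255) /* PAD and END carry no typed payload */
                return _TOY_TYPE_INVALID;
        return code % 2 ? TOY_TYPE_U8 : TOY_TYPE_FLAG;
}

typedef int (*toy_dump_fn)(unsigned code);

int toy_dump_flag(unsigned code) { return (int) code; }
int toy_dump_u8(unsigned code) { return (int) code * 10; }

int toy_dump_option_one(unsigned code) {
        static const toy_dump_fn functions[_TOY_TYPE_MAX] = {
                [TOY_TYPE_FLAG] = toy_dump_flag,
                [TOY_TYPE_U8]   = toy_dump_u8,
        };

        int t = toy_type_from_code(code);
        if (t < 0)
                return t; /* explicit check: stays in place even when assert() is compiled out */

        return functions[t](code);
}
```

The early return costs one comparison and keeps the table access in bounds regardless of how the build treats assert().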
Luca Boccassi [Wed, 24 Jun 2026 11:24:37 +0000 (12:24 +0100)]
hostname-setup: avoid O(N^2) string building in wildcard substitution
Building the result one char at a time via strextendn() is O(N^2)
because each call rescans and reallocs the buffer. With lines up to
LONG_LINE_MAX this caused a timeout in fuzz-hostname-setup. Use
GREEDY_REALLOC_APPEND to make it linear.
LucasTavaresA [Wed, 24 Jun 2026 12:21:29 +0000 (09:21 -0300)]
hwdb: map Brazilian ThinkPad T14 Gen 1 slash key to KEY_RO
On Lenovo ThinkPad T14 Gen 1 AMD model 20UES5TQ00 with the Brazilian
keyboard, the physical slash/question key reports as KEY_RIGHTCTRL.
This keyboard layout has no physical Right Ctrl key in that position. The
key after Space is AltGr, then PrtSc, then the slash/question key. Map the
AT keyboard scancode 0x9d to KEY_RO, matching the ABNT slash/question key
used by Brazilian keyboard layouts.
Verified with evtest:
Event: type 4 (EV_MSC), code 4 (MSC_SCAN), value 9d
Event: type 1 (EV_KEY), code 97 (KEY_RIGHTCTRL), value 1
After applying the hwdb mapping, the key reports as KEY_RO.
DMI: svnLENOVO:pn20UES5TQ00:pvrThinkPadT14Gen1
AT keyboard scancode: 0x9d
TODO: drop bootctl link + sysupdate integration item
This is now implemented: sysupdate calls out to the
/run/systemd/sysupdate/notify/ Varlink directory on completion, and bootctl
binds a socket there that links a UKI plus extras staged below
/var/lib/systemd/uki/ (with .v/ vpick support) via "bootctl link-auto".
test: verify bootctl link-auto and io.systemd.BootControl.LinkAuto
Add a TEST-87 testcase exercising "bootctl link-auto" and the equivalent
io.systemd.BootControl.LinkAuto() Varlink method: a UKI plus extras are staged
below the search directories and we assert the kernel and sidecar resources
are linked into $BOOT. Covered: plain kernel.efi + extras.d/, versioned
kernel.efi.v/ and extras .v/ resolved via vpick, directory priority
(/etc wins over /run), the no-op case when nothing is staged, and the Varlink
method including its empty reply when there is nothing to link.
test: verify sysupdate invokes the notification callout directory
Extend TEST-72-SYSUPDATE with a check that, after a successful update,
systemd-sysupdate connects to every socket linked into
/run/systemd/sysupdate/notify/ and invokes
io.systemd.SysUpdate.Notify.OnCompletedUpdate(). A tiny recorder socket is
hooked into that directory; it captures the request and replies with success.
We assert the recorded call carries the expected method, version and resource
list, and that a subsequent no-op update emits no notification.
sysext: refresh sysexts and confexts on completed system update
Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
sysext Varlink server. systemd-sysext provides a single Varlink service
covering both the sysext and confext image classes, so one notification
refreshes both (equivalent to "systemd-sysext refresh" plus
"systemd-confext refresh"). Hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-sysext.socket,
enabled by default via the preset.
bootctl: add link-auto/LinkAuto and auto-link on completed system update
Add a "bootctl link-auto" verb and a matching io.systemd.BootControl.LinkAuto()
Varlink method that behave exactly like "bootctl link" / Link(), except that
the UKI and extra resources are discovered automatically instead of being
passed in. The following directories are searched, in decreasing priority:
/etc/systemd/uki/, /run/systemd/uki/, /var/lib/systemd/uki/ (where
systemd-sysupdate stages downloaded resources), /usr/local/lib/systemd/uki/
and /usr/lib/systemd/uki/.
- the UKI is taken from kernel.efi, or the best version in kernel.efi.v/
(resolved via vpick, without honouring boot-counting suffixes), from the
highest-priority directory that has one;
- extra resources are picked up from extras.d/, matching *.sysext.raw,
*.confext.raw and *.cred, each either as a plain file or as a versioned
*.v/ directory resolved via vpick, combined across all directories with
higher-priority directories winning on conflicts.
Everything is resolved relative to the pinned root directory fd. Files passed
via --extra= on the command line are linked in addition to the auto-discovered
ones.
Also bind io.systemd.SysUpdate.Notify.OnCompletedUpdate() in the boot control
Varlink server, which simply does the same as LinkAuto(), and hook a socket
into /run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-bootctl.socket
(enabled by default via the preset) so a freshly downloaded kernel is linked
into $BOOT automatically after a sysupdate run.
pcrlock: recompute PCR policy on completed system update
Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
pcrlock Varlink server and hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-pcrlock.socket,
enabled by default via the preset. When sysupdate signals a completed
update, we unconditionally re-run make-policy, since the set of measured
components may have changed.
sysupdate: notify hook subscribers after a successful update
Define a new io.systemd.SysUpdate.Notify Varlink interface with a single
OnCompletedUpdate() method, and after sysupdate successfully installs an
update, invoke that method on every socket linked into
/run/systemd/sysupdate/notify/ via varlink_execute_directory(). This
gives other components a hook to react to applied updates (e.g. recompute
a TPM policy, link a freshly downloaded kernel, refresh extensions).
The notification carries the component name, the installed version and the
list of updated resources (transfer id + on-disk path). Subscribers are
free to ignore the parameters and just treat the call as a trigger.
Setting SYSTEMD_SYSUPDATE_FORCE_NOTIFY=1 forces the notification to be sent
even when no update was applied (in which case no resource list is included),
so follow-up work can be triggered unconditionally.
Mirror how chaseat() works these days: instead of a single toplevel_fd that
serves as both the root (chroot) boundary and the directory that resolution
starts from, path_pick() now takes a separate root_fd and dir_fd. This lets
callers resolve a path relative to a specific directory fd while confining
symlink and absolute-path resolution to a root directory fd.
All existing callers are updated to pass the same fd for both, preserving
their current behaviour.
sysupdate: automatically clean up orphaned files after auto-update (#42714)
This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update completed (regardless of whether that update was entirely successful
or not). This ensures that any orphaned files are automatically cleaned
up, if they are not referenced by any transfer file's patterns anymore.
sysupdate: automatically clean up orphaned files after auto-update
This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update completed (regardless of whether that update was entirely successful
or not). This ensures that any orphaned files are automatically cleaned
up, if they are not referenced by any transfer file's patterns anymore.
This happens in test units with many commands, so reset the timer when
a command completes and the test advances. The number of Exec
instructions is bounded so this will terminate jobs that are really
stuck anyway.
tunaichao [Wed, 24 Jun 2026 06:01:06 +0000 (14:01 +0800)]
core: pin restrict-fsaccess initramfs_s_dev store width to skeleton field
The clear-store in restrict_fsaccess_clear_initramfs_trust() writes a fixed
4 bytes (*(uint32_t *)(p + INITRAMFS_S_DEV_OFF) = 0). INITRAMFS_S_DEV_OFF is
derived from the skeleton, so the offset tracks any field widening, but the
store width does not: were initramfs_s_dev widened (e.g. __u32 -> __u64) in
the BPF program, the store would clear only the low 4 bytes and silently
leave the initramfs trust window partially open. That is exactly the class
of bug the mirror-struct asserts (removed earlier in this branch) guarded
against.
Add a compile-time assert pinning the store width to the skeleton field
width (sizeof_field(typeof_field(struct restrict_fsaccess_bpf, bss[0]),
initramfs_s_dev) == sizeof(uint32_t)), so widening the field fails the build
instead of clearing half of it.
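A small sketch of the idea, using a hypothetical struct toy_bss in place of the skeleton-generated type. The macro and function names are made up for illustration; the point is that both the offset and the store width are tied to the struct definition at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the skeleton-generated .bss struct; deliberately
 * laid out with the field of interest not at offset 0. */
struct toy_bss {
        uint32_t other_global;
        uint32_t initramfs_s_dev;
};

#define toy_sizeof_field(type, member) sizeof(((type *) 0)->member)
#define TOY_S_DEV_OFF offsetof(struct toy_bss, initramfs_s_dev)

/* Widening the field (e.g. to uint64_t) now breaks the build instead of
 * leaving the upper half uncleared. */
_Static_assert(toy_sizeof_field(struct toy_bss, initramfs_s_dev) == sizeof(uint32_t),
               "store width must match the field width");

/* A single aligned 4-byte store is what the clear relies on. */
_Static_assert(TOY_S_DEV_OFF % sizeof(uint32_t) == 0,
               "field must be 4-byte aligned");

void toy_clear_s_dev(void *bss) {
        *(uint32_t *) ((uint8_t *) bss + TOY_S_DEV_OFF) = 0;
}
```

Because the offset comes from offsetof(), reordering the fields moves the store along with the field, and the neighbouring global stays untouched.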
tunaichao [Tue, 23 Jun 2026 07:45:49 +0000 (15:45 +0800)]
core: derive restrict-fsaccess initramfs_s_dev offset from skeleton
Building with -Dbpf=enabled -Dbpf_compiler=gcc (GCC's BPF backend) fails on
the static assertions in bpf-restrict-fsaccess.c, introduced in 68fe7fa4d6:
The hand-written struct restrict_fsaccess_bss lists the BPF .bss globals in
source declaration order and asserts that its layout matches the skeleton's
generated bss struct. bpftool gen skeleton emits that struct from the BTF
.bss DATASEC, whose member order reflects the physical order the compiler
placed the variables, not the source order. clang preserves declaration
order, so the asserts pass; gcc reorders .bss globals, so initramfs_s_dev no
longer sits at offset 0 and the asserts fail.
This is more than a build break: restrict_fsaccess_clear_initramfs_trust()
clears initramfs_s_dev by mmap()ing the .bss map and storing 0 at a hardcoded
offset 0. Under the gcc layout that store would clobber the wrong global,
silently leaving the initramfs trust window open after switch_root instead of
closing it. The asserts were correctly catching this.
Fix it by deriving the offset from the generated skeleton instead of a mirror
struct: drop struct restrict_fsaccess_bss and the four field-order
assert_cc()s, take INITRAMFS_S_DEV_OFF from the skeleton's bss struct
(offsetof(typeof_field(struct restrict_fsaccess_bpf, bss[0]),
initramfs_s_dev)), and store at p + INITRAMFS_S_DEV_OFF. The offset is a
compile-time constant, so clang (offset 0) is unchanged while gcc tracks the
real layout. A retained assert_cc() documents the 4-byte alignment the
single-store atomicity relies on.
sd-varlink: mark varlink sockets via xattrs (#42454)
Linux 7.0 added the ability to mark socket inodes with xattrs. Let's use
that to clearly mark all our Varlink sockets as being varlink related.
This is then used to implement a very useful new command "varlinkctl
list-sockets" which lists all varlink entrypoint sockets marked this
way.
By marking not just the entrypoint inodes but also the connection
sockets properly, we can one day add an eBPF-based "varlinkctl trace"
command that watches varlink sockets for traffic, but that's material
for a later PR.
Frantisek Sumsal [Tue, 23 Jun 2026 19:29:53 +0000 (21:29 +0200)]
test: skip fdstore tests if test-fdstore is not available
When the test suite is run in the "standalone" mode, the minimal
container might not contain the test-fdstore binary that's needed for a
couple of tests. Since installing systemd-tests into the minimal
container pulls in a lot of other dependencies, let's just skip the
affected tests instead to avoid this.
units: tag all .varlink sockets with the right xattrs
This also relaxes the inode access modes a bit, in case they were set to
0600: we now set the "r" bit too, i.e. use 0644. This is beneficial
since it permits unpriv code to read the xattrs of the entrypoints
(which require read access). Note that in order to be able to connect()
to a socket inode you need write access, hence this shouldn't compromise
security in any way.
Luca Boccassi [Mon, 22 Jun 2026 21:48:53 +0000 (22:48 +0100)]
tpm2-setup: call DLOPEN_TPM2 to add dependency and fail immediately if not present
tpm2-setup requires both libcrypto and the tpm2-tss libraries, but so
far it only directly dlopen'ed libcrypto, with a clear error on startup
if missing, and a dependency added via dlopen notes.
Do the same for the tpm2-tss dlopens, to get a clear error and the
required dependencies.
journal: expose last 10 high priority logs as metrics (#42621)
This commit exposes the last 10 high priority logs as metrics so that
the systemd-report reports them. The entries are reported as
`io.systemd.Journal.HighPriorityMessage` and include all fields that are
printable as strings.
This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal
Luca Boccassi [Tue, 23 Jun 2026 10:45:11 +0000 (11:45 +0100)]
core: fix assertion when inactive unit pulled in by try-restart and start at the same time
With EnqueueUnitJobMany(), one anchor can collapse to NOP (inactive
unit + try-restart) while another anchor pulls that same unit in as a
regular start/restart job, leaving a NOP and a regular job in one
unit's transaction list, hitting an assert:
#11 0x00007f3fd2a446dc in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>,
function=<optimized out>) at ./assert/assert.c:127
#12 0x00007f3fd326e872 in job_type_lookup_merge (a=<optimized out>, b=<optimized out>) at ../src/core/job.c:428
#13 0x00007f3fd32e5641 in job_type_merge_and_collapse (a=0x7ffc7dda2430, b=<optimized out>, u=0x557bb11434c0)
at ../src/core/job.c:523
#14 0x00007f3fd335e4b3 in transaction_ensure_mergeable (tr=tr@entry=0x557bb0f6d150,
matters_to_anchor=matters_to_anchor@entry=true, e=e@entry=0x7ffc7dda33e0) at ../src/core/transaction.c:241
#15 0x00007f3fd3360242 in transaction_merge_jobs (tr=0x557bb0f6d150, e=0x7ffc7dda33e0)
at ../src/core/transaction.c:273
#16 transaction_activate (tr=0x557bb0f6d150, m=0x557bb0dd9c10, mode=JOB_REPLACE, affected_jobs=0x0, e=0x7ffc7dda33e0)
at ../src/core/transaction.c:797
#17 0x00007f3fd33091ed in manager_add_jobs (m=<optimized out>, type=<optimized out>, names=<optimized out>,
reload_if_possible=false, mode=JOB_REPLACE, extra_flags=0, affected_jobs=0x0, reterr_error=0x7ffc7dda33e0,
ret_jobs=0x557bb0fe8790) at ../src/core/manager.c:2386
Luca Boccassi [Mon, 22 Jun 2026 23:04:41 +0000 (00:04 +0100)]
growfs: downgrade dependency on libcryptsetup to optional
growfs actually gracefully skips when cryptsetup fails or is
missing already, and it is only necessary when the device is
a LUKS device anyway. Downgrade from required to recommended.
Michael Vogt [Tue, 16 Jun 2026 14:20:10 +0000 (16:20 +0200)]
journal: expose last 10 high priority logs as metrics
This commit exposes the last 10 high priority logs as metrics
so that the systemd-report reports them. The entries are
reported as `io.systemd.Journal.HighPriorityMessage` and
include all fields as the new METRIC_FAMILY_TYPE_OBJECT.
Individual fields from a journal entry that are unprintable
(invalid utf-8) are skipped.
This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal
Michael Vogt [Mon, 22 Jun 2026 05:40:34 +0000 (07:40 +0200)]
metrics: add METRIC_FAMILY_TYPE_OBJECT type
We will need a way to send journal entries as metrics. Those are already
JSON objects. So Lennart suggested introducing a new type
METRIC_FAMILY_TYPE_OBJECT that does this. This commit implements
his suggestion.
Paul Meyer [Tue, 23 Jun 2026 10:34:12 +0000 (12:34 +0200)]
boot: read the TDX CPUID leaf unconditionally
vmm.c carries the confidential-VM detection used by sd-boot/sd-stub.
Its detect_tdx() had the same dead guard as the userspace copy: it
gated the 0x21 read on CPUID_GET_HIGHEST_FUNCTION (0x80000000, the
extended max function), which is always >= 0x80000000, so the guard
never held.
Mirror the userspace fix: read leaf 0x21 directly and rely on the
IntelTDX signature, matching the kernel. An out-of-range CPUID leaf
returns the highest basic leaf's data (no fault), and 0x21 is a
synthetic TDX leaf whose presence need not be reflected in the max
basic function, so it must not be gated on it.
Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).
Paul Meyer [Thu, 18 Jun 2026 05:44:09 +0000 (07:44 +0200)]
confidential-virt: read the TDX CPUID leaf unconditionally
detect_tdx() guarded the read of the TDX enumeration leaf (0x21, a
standard leaf) with CPUID_GET_HIGHEST_FUNCTION (0x80000000), which
returns the highest *extended* function. eax is therefore always
>= 0x80000000, so the "eax < 0x21" guard never held and the leaf was
read unconditionally anyway.
Drop the guard rather than re-gate it on the basic max function
(leaf 0), and read 0x21 directly, relying on the IntelTDX signature
compare. This matches the kernel, which reads the leaf unconditionally
on purpose: an out-of-range CPUID leaf returns the highest basic leaf's
data (no fault, per the Intel SDM), and 0x21 is a synthetic TDX leaf
whose presence need not be reflected in the reported max basic function,
so gating the read on it risks missing a genuine TDX guest. With no
guard the Hyper-V isolation fallback (Azure TDX guests have 0x21
blocked) also stays reachable.
Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).
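The two points above can be sketched without executing CPUID at all. The toy_* helpers below are hypothetical and operate on plain register values; the EBX, EDX, ECX ordering of the 12-byte signature is stated here as my understanding of how the kernel compares it, not taken from this commit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The old guard: "eax" came from leaf 0x80000000, which reports the highest
 * *extended* function and is therefore never below 0x21. */
bool toy_old_guard_skips_leaf(uint32_t highest_extended_function) {
        return highest_extended_function < 0x21;
}

/* Signature compare on the registers returned by leaf 0x21: the 12 bytes are
 * assumed to be laid out as EBX, EDX, ECX. */
bool toy_is_tdx_signature(uint32_t ebx, uint32_t ecx, uint32_t edx) {
        uint32_t sig[3] = { ebx, edx, ecx };

        return memcmp(sig, "IntelTDX    ", sizeof(sig)) == 0;
}
```

Since the guard can never be true for any value leaf 0x80000000 returns, dropping it changes nothing about which code runs; the signature compare is what decides.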
Paul Meyer [Wed, 17 Jun 2026 14:13:35 +0000 (16:13 +0200)]
confidential-virt: treat an unreadable SEV MSR as confidential
msr() returned 0 on failure, indistinguishable from a real MSR value of
0. With /dev/cpu/0/msr unavailable (e.g. the msr module not loaded in the
initrd), detect_sev() read 0 and reported a genuine SEV/-ES/-SNP guest as
CONFIDENTIAL_VIRTUALIZATION_NONE.
That inverts the firmware-credential trust gate: import_credentials_*()
skip fw_cfg/SMBIOS credentials only when detect_confidential_virtualization()
is > 0 ("don't trust firmware in confidential VMs"). A false NONE makes a
confidential guest trust and import credentials injected by the untrusted
hypervisor.
msr() now returns a negative errno, and detect_sev() assumes plain SEV when
the MSR is unreadable but CPUID already advertised SEV under a hypervisor,
so the gate still trips.
The conservative branch only fires when CPUID already advertised SEV, i.e.
for a guest the hypervisor marked SEV-capable. QEMU gates that CPUID leaf on
the SEV launch object and does not expose it to ordinary guests even under
-cpu host, so it does not misfire for non-confidential guests. Were a
hypervisor to expose the bit anyway the outcome is fail-safe (we only
decline to trust firmware-supplied data); nothing in-tree branches on the
specific SEV tier.
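A sketch of the resulting decision logic, with the MSR read result passed in as plain values so it can be exercised without hardware. The toy_* names are hypothetical, and the bit positions (bit 0 SEV, bit 1 SEV-ES, bit 2 SEV-SNP) are an assumption about the SEV status MSR layout rather than something this commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
        TOY_CV_NONE,
        TOY_CV_SEV,
        TOY_CV_SEV_ES,
        TOY_CV_SEV_SNP,
};

/* msr_r < 0 models a failed read (negative errno); msr_value is only
 * meaningful when msr_r >= 0. */
int toy_detect_sev(bool cpuid_advertises_sev, int msr_r, uint64_t msr_value) {
        if (!cpuid_advertises_sev)
                return TOY_CV_NONE;

        if (msr_r < 0)
                return TOY_CV_SEV; /* unreadable MSR: assume plain SEV so the gate still trips */

        if (msr_value & (UINT64_C(1) << 2))
                return TOY_CV_SEV_SNP;
        if (msr_value & (UINT64_C(1) << 1))
                return TOY_CV_SEV_ES;
        if (msr_value & (UINT64_C(1) << 0))
                return TOY_CV_SEV;

        return TOY_CV_NONE;
}
```

The key property is that a failed read and a successful read of 0 now take different branches.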
Paul Meyer [Wed, 17 Jun 2026 16:03:55 +0000 (18:03 +0200)]
pcrextend: refuse empty measurement over Varlink
vl_method_extend() accepted an empty text/data value and measured an
empty word, bypassing the empty-word refusal the CLI path already
enforces. Measured words are joined with ":" in the record, so an empty
word is ambiguous. Reject it.
Paul Meyer [Wed, 17 Jun 2026 15:40:18 +0000 (17:40 +0200)]
tpm2-util: refuse NvPCR extend when the NV index is gone
tpm2_index_to_handle() returns 0 with a NULL handle when the NV index is
not present on the TPM. tpm2_nvpcr_extend_bytes() only checked for r < 0,
so a tombstoned NvPCR (anchor file present, NV slot cleared out from under
us) passed the NULL handle to tpm2_extend_nvpcr_nv_index() and aborted the
process via its assert(). Handle r == 0 explicitly, as the other
tpm2_index_to_handle() callers already do.
The newly introduced -ENODEV is mapped together with -ENOENT to the
io.systemd.PCRExtend.NoSuchNvPCR varlink error.
Luca Boccassi [Mon, 22 Jun 2026 20:35:17 +0000 (21:35 +0100)]
core: add method to enqueue multiple jobs in a single call (#42182)
Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.
Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.
dongshengyuan [Mon, 22 Jun 2026 05:38:05 +0000 (13:38 +0800)]
homed: fix min_free tracking in manager_rebalance_calculate()
min_free is supposed to track the minimum free space across all home
directories to scale the next rebalance interval. However, it was
incorrectly assigned h->rebalance_size (the home's current total
allocation) instead of new_free (the remaining allocatable space).
This caused the rebalance interval to be computed from allocation sizes
rather than free space, so a nearly-full home would not trigger the
shorter intervals it should, delaying response to low-space conditions.
repart: Sort the partition list by partition offset (#42488)
Currently the partition list is ordered like this: First come the
partitions that exist as definition files (could be pre-existing
partitions or could be new ones), then come the pre-existing partitions
that aren't matched to a definition file.
This ordering is visible to the user when we print our partition table,
and it doesn't really make sense from a UX perspective: Partition tables
are usually either presented in order of the partition indices, or in
order of the partition offsets. Arguably the latter would be nicer here,
since the visualization below is already ordered by physical offsets.
So reorder the list after we assigned the new partitions to their
respective free areas, according to the physical offset (or, for
partitions yet to be created, the order that we will allocate them in).
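As a rough illustration of the reordering, here is a sketch that sorts an array of partitions by offset. ToyPartition and the array representation are hypothetical simplifications; how repart actually stores its list, and how it orders partitions that have no offset yet, is not shown here.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyPartition {
        const char *label;
        uint64_t offset; /* physical start offset in bytes */
} ToyPartition;

int toy_partition_compare(const void *a, const void *b) {
        const ToyPartition *x = a, *y = b;

        /* Avoid subtraction, which could overflow for 64-bit offsets */
        return (x->offset > y->offset) - (x->offset < y->offset);
}

void toy_sort_partitions(ToyPartition *p, size_t n) {
        qsort(p, n, sizeof(ToyPartition), toy_partition_compare);
}
```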
Another potential upside is that the code can now rely more on the
partition order, too.
To ensure it keeps working, also add a test in the integration tests for
it.
Convert the --help text of systemd-imds and systemd-imdsd to the common
help_cmdline()/help_abstract()/help_section()/help_man_page_reference()
helpers, for a uniform output style across tools.
imds: expose instance metadata as an io.systemd.Metrics provider
When systemd-imds is invoked as a Varlink service (via the new
systemd-imds-metrics.socket), it now acts as an io.systemd.Metrics
provider for systemd-report. It connects to systemd-imdsd over the
existing io.systemd.InstanceMetadata interface to acquire the real
data and re-exposes the detected cloud vendor plus the well-known
hostname, region, zone and public IPv4/IPv6 fields as metrics in the
io.systemd.InstanceMetadata.* namespace.
The metrics logic lives entirely on the client side
(imds-tool-metrics.c); systemd-imdsd is unchanged. Each metric is
acquired on demand with a blocking call to the daemon, benefiting from
its local cache. Fields that are unset or unsupported by the vendor are
simply omitted.
The metrics socket is statically enabled into sockets.target.wants/.
dongshengyuan [Mon, 22 Jun 2026 05:16:32 +0000 (13:16 +0800)]
oci-util: fix and harden oci_registry_is_valid()
- Pass colon+1 (port string) instead of s (hostname) to safe_atou16,
so host:port registries are no longer always rejected.
- Switch to safe_atou16_full() with base-10 and strict flags to reject
non-decimal port forms (hex, octal, leading whitespace, sign prefix)
that would produce malformed URL authorities.
- Reject empty host explicitly via isempty() guard (covers both NULL
and empty-string input), and guard colon == n to reject ':port' form,
since dns_name_is_valid('') == 1 (DNS root) would otherwise accept
empty host as valid.
- Wrap overlong line to fit 109-column limit.
- Add test coverage for oci_registry_is_valid().
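The checks listed above can be sketched as follows. This does not use safe_atou16_full() or dns_name_is_valid(); toy_parse_port_strict() and toy_registry_is_valid() are hypothetical, host name validation is elided entirely, and IPv6 literals are not handled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Accept plain decimal digits only: no sign, whitespace or base prefix. */
int toy_parse_port_strict(const char *s, uint16_t *ret) {
        unsigned long v = 0;

        if (!s || !*s)
                return -EINVAL;

        for (const char *p = s; *p; p++) {
                if (*p < '0' || *p > '9')
                        return -EINVAL;

                v = v * 10 + (unsigned long) (*p - '0');
                if (v > UINT16_MAX)
                        return -ERANGE;
        }

        *ret = (uint16_t) v;
        return 0;
}

bool toy_registry_is_valid(const char *s) {
        const char *colon;
        uint16_t port;

        if (!s || !*s)
                return false; /* NULL or empty input */

        colon = strrchr(s, ':');
        if (!colon)
                return true; /* host only; real host validation elided */

        if (colon == s)
                return false; /* ":port" form with an empty host */

        /* Parse what follows the colon, not the whole string */
        return toy_parse_port_strict(colon + 1, &port) == 0;
}
```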
Gabriel [Mon, 22 Jun 2026 19:57:03 +0000 (21:57 +0200)]
Add handling for '-1' when parsing vsock CID (#42654)
Currently `systemd-ssh-generator` supports
`systemd.ssh_listen=vsock::22` and aliases the "empty CID" towards
`VMADDR_CID_ANY`. VMADDR_CID_ANY is -1, so it's confusing from a user
experience that `systemd.ssh_listen=vsock:-1:22` isn't supported.
Paul Meyer [Fri, 19 Jun 2026 08:11:52 +0000 (10:11 +0200)]
report: add systemd-report-sign-tsm backend
A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.
Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller-supplied input.
Luca Boccassi [Mon, 22 Jun 2026 18:22:37 +0000 (19:22 +0100)]
Translations update from Fedora Weblate (#42699)
Translations update from [Fedora
Weblate](https://translate.fedoraproject.org) for
[systemd/main](https://translate.fedoraproject.org/projects/systemd/main/).
Paul Meyer [Wed, 17 Jun 2026 15:21:51 +0000 (17:21 +0200)]
veritysetup: don't measure root hash signature after unsigned fallback
verb_attach() falls back to unsigned activation (crypt_activate_by_volume_key)
when signed activation fails, but still passed the signature to
pcrextend_verity_now(). The signer is parsed out of the (unverified)
signature and folded into the dm_verity NvPCR measurement, making an
unsigned fallback indistinguishable from a genuinely signed activation to
an attester. Only measure the signature when signed activation succeeded.
Allow systemd-executor to be compiled into a single binary.
The existing -Dlink-executor-shared=true|false is extended to also
allow -Dlink-executor-shared=single (*). The new mode is opt-in,
to allow experimentation and introduce this smoothly.
This saves a little space, but not as much as I expected:
$ ls -l build/{systemd,systemd-executor} build-new/systemd
-rwxr-xr-x 1 zbyszek zbyszek 631520 May 25 22:44 build/systemd
-rwxr-xr-x 1 zbyszek zbyszek 670464 May 25 22:44 build/systemd-executor
-rwxr-xr-x 1 zbyszek zbyszek 1214488 May 25 22:45 build-new/systemd
(This is with -Dbuildtype=debugoptimized -Db_lto=true).
The combined binary is slightly smaller than the sum of the separate
ones, but not much. In both cases, the binaries are linked to
libsystemd-core which is 10MB, so the size of the binaries themselves
doesn't make much of a difference. The executor needs exec-invoke.c
which is huge and not shared with anything else.
Longer term, I want to allow systemd to be linked statically. In
that case, having systemd-executor separate would be very painful.
So the option to use a multicall binary will be necessary.
Previously, we stored the resolved path to systemd-executor and
used it as argv[0]. I don't think this was useful. After all, normally
we would use the non-resolved original path as argv[0]. So that
part is dropped, and the resolved path is only logged, but
"systemd-executor" is always used as argv[0]. This makes the
multicall binary work reliably, no matter what the actual file
name is.
(*) This means that compat at the command-line level is maintained:
'meson setup build -Dlink-executor-shared=true …' works as before.
Unfortunately, when using an existing build directory, meson chokes
on the type change and refuses to reconfigure the directory or change
the option or do anything useful. I think meson is DTWT here, but
this is hard to fix. So the build directory probably needs to be
recreated.
sysupdate: keep database of installed files/patterns, and use to GC them (#42646)
Transfer files might come and go, components might be enabled and
disabled. Patterns might change. Let's keep track of what we install, so
that we can automatically gc everything no longer owned by any enabled
transfer.
Luca Boccassi [Mon, 22 Jun 2026 13:47:20 +0000 (14:47 +0100)]
machine-tags: extend syntax to support key/value pairs (#42618)
This is a minor extension, to move the machine tags concept more closely
towards what higher-level solutions support for tagging machines, such
as kubernetes, simply to reduce the conceptual impedance mismatch.
Luca Boccassi [Mon, 22 Jun 2026 13:15:39 +0000 (14:15 +0100)]
resolved: load libcrypto/libssl lazily on first use and make them optional (#42681)
Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
Luca Boccassi [Mon, 22 Jun 2026 13:08:21 +0000 (14:08 +0100)]
Expand specifiers in `MakeSymlinks=` target in `repart.d` (#42694)
Closes #42693. Specifiers are now expanded in symlink targets
(previously, they were only expanded in the source) - this is
technically a breaking change, but I'd be very surprised if anyone was
relying on this.
No other simplification is applied to the target (unlike the source,
which goes through `path_simplify_and_warn`).
Also a few minor changes:
- rename local `path` variable to `source` to match documentation
convention
- document that `MakeSymlinks=` accepts specifiers
- fix error message to print `MakeSymlinks=` option instead of
`Subvolumes=`
Luca Boccassi [Sat, 20 Jun 2026 00:05:00 +0000 (01:05 +0100)]
systemctl: add --kernel-cmdline-reuse option
kexec-tools has a --reuse-cmdline option, which is very convenient
when doing a lot of reboots. Add the same to systemctl.
Dedup options, letting the last one win in case of duplicates,
so that 'systemctl kexec --reuse-cmdline' can be chained many times
without continuously expanding the cmdline with duplicates from
the boot entry.
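A minimal sketch of "last one wins" deduplication, not taken from the systemctl code. It assumes that options are separated by plain whitespace (no quoting support) and that two options are duplicates only if they are byte-for-byte identical; the function name cmdline_dedup_last_wins() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char WHITESPACE_CHARS[] = " \t\n";

/* Copies the whitespace-separated options from 'in' to 'out', separated by single
 * spaces, dropping every option that appears again later in the string. Hence
 * only the last occurrence of each option survives, at its original position.
 * Returns 0 on success, -1 if 'out' is too small. */
int cmdline_dedup_last_wins(const char *in, char *out, size_t out_size) {
        size_t n = 0;

        assert(in);
        assert(out);

        if (out_size == 0)
                return -1;
        out[0] = 0;

        for (const char *p = in;;) {
                bool repeated_later = false;
                size_t l;

                p += strspn(p, WHITESPACE_CHARS);
                if (!*p)
                        break;
                l = strcspn(p, WHITESPACE_CHARS);

                /* Scan the remainder of the input for an identical option. */
                for (const char *q = p + l;;) {
                        size_t ql;

                        q += strspn(q, WHITESPACE_CHARS);
                        if (!*q)
                                break;
                        ql = strcspn(q, WHITESPACE_CHARS);
                        if (ql == l && memcmp(p, q, l) == 0) {
                                repeated_later = true;
                                break;
                        }
                        q += ql;
                }

                if (!repeated_later) {
                        size_t need = l + (n > 0 ? 1 : 0);

                        if (n + need + 1 > out_size)
                                return -1;
                        if (n > 0)
                                out[n++] = ' ';
                        memcpy(out + n, p, l);
                        n += l;
                        out[n] = 0;
                }

                p += l;
        }

        return 0;
}
```

Because the output of one run contains no duplicates, feeding it back in together with the same boot entry options yields the same result again, so repeated chaining does not grow the command line.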
sysupdate: introduce "installdb" that keeps track of installed resources
Let's make sure we keep track of any file we drop into the system via a
database in /var/. This database is implemented based on symlinks, i.e.
reuses the fs as a simple database. Given the database most likely will
have <= 10 entries only (as we store *patterns* of installed file paths in
them, not the file paths themselves), this should be very efficient.
For implementation details see comments at top of
src/sysupdate/sysupdate-cleanup.c.
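To illustrate the general idea of using symlinks as a tiny key/value store, here is a rough sketch. It is not the sysupdate implementation and makes no claim about the actual on-disk layout: it simply assumes that the link name is the key and the link target is the stored value. The names symdb_put() and symdb_get() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Stores 'value' under 'key' by creating a symlink named 'key' in the directory
 * 'dir_fd' whose target is 'value'. An existing entry is replaced (not
 * atomically: the old link is removed first). 'key' must be a plain filename
 * and 'value' must be non-empty. Returns 0 or a negative errno. */
int symdb_put(int dir_fd, const char *key, const char *value) {
        assert(key);
        assert(value);

        if (unlinkat(dir_fd, key, 0) < 0 && errno != ENOENT)
                return -errno;

        if (symlinkat(value, dir_fd, key) < 0)
                return -errno;

        return 0;
}

/* Reads the value stored under 'key' into 'buf' as a NUL-terminated string.
 * Returns 0 on success, -ENOENT if there is no such entry, and -ENOBUFS if the
 * value filled the buffer completely and hence may have been truncated. */
int symdb_get(int dir_fd, const char *key, char *buf, size_t size) {
        ssize_t n;

        assert(key);
        assert(buf);

        if (size == 0)
                return -ENOBUFS;

        n = readlinkat(dir_fd, key, buf, size);
        if (n < 0)
                return -errno;
        if ((size_t) n >= size)
                return -ENOBUFS;

        buf[n] = 0;
        return 0;
}
```

Enumerating the database is then just a directory listing, and no file contents ever need to be parsed, which keeps such a store cheap for a handful of entries.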
string-util: introduce STRING_FILENAME_PART flag for string_is_safe()
Whenever we are validating a string that shall eventually appear in a
filename, we want to use filename_part_is_valid() rather than
filename_is_valid(). Let's add explicit support for that to
string_is_safe(), since it's actually a really common case.
recurse-dir: optionally, only enumerate dentries of a specific type
At various places we filter directory enumerations by inode type. Let's
add explicit support for that, so that the "struct dirent" array we
return already excludes entries of any other type.
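A rough sketch of filtering a directory enumeration by inode type, written against plain readdir() rather than the recurse-dir API. The function name dir_list_by_type() and its signature are hypothetical. It assumes a libc that exposes d_type and IFTODT() (glibc and musl do), and falls back to fstatat() when the file system reports DT_UNKNOWN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Collects the names of all entries in 'path' whose inode type equals 'want'
 * (DT_REG, DT_DIR, DT_LNK, ...), skipping "." and "..". Stores at most 'max'
 * names in 'names'; the caller frees each of them. Returns the number of names
 * stored, or a negative errno. */
int dir_list_by_type(const char *path, unsigned char want, char **names, size_t max) {
        size_t n = 0;
        int r = 0;
        DIR *d;

        assert(path);
        assert(names || max == 0);

        d = opendir(path);
        if (!d)
                return -errno;

        while (n < max) {
                struct dirent *de;
                unsigned char type;

                errno = 0;
                de = readdir(d);
                if (!de) {
                        r = -errno; /* errno is still 0 at the regular end of the directory */
                        break;
                }

                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;

                type = de->d_type;
                if (type == DT_UNKNOWN) {
                        struct stat st;

                        /* Not every file system fills in d_type, ask the inode instead. */
                        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) < 0)
                                continue; /* entry vanished in the meantime */
                        type = IFTODT(st.st_mode);
                }

                if (type != want)
                        continue;

                names[n] = strdup(de->d_name);
                if (!names[n]) {
                        r = -ENOMEM;
                        break;
                }
                n++;
        }

        closedir(d);

        if (r < 0) {
                while (n > 0)
                        free(names[--n]);
                return r;
        }

        return (int) n;
}
```

Doing the filtering inside the enumeration means callers receive only the entries they care about and do not need to repeat the DT_UNKNOWN fallback themselves.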
Luca Boccassi [Thu, 29 Aug 2024 12:17:13 +0000 (13:17 +0100)]
core: add method to enqueue multiple jobs in a single call
Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.
Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.
Luca Boccassi [Sat, 20 Jun 2026 23:11:08 +0000 (00:11 +0100)]
btrfs-util,rm-rf: clean up subvolumes without user_subvol_rm_allowed
Without CAP_SYS_ADMIN and without the 'user_subvol_rm_allowed' mount
option, BTRFS_IOC_SNAP_DESTROY is rejected with EPERM (or EROFS for a
read-only subvolume), so rm_rf_subvolume() left subvolumes behind.
test-btrfs thus accumulated leftover subvolumes in /var/tmp on every
unprivileged run on a btrfs filesystem.
An unprivileged owner can however clear the RDONLY flag, empty a
subvolume and rmdir() it. So clear the RDONLY flag on EPERM/EACCES too
(not just EROFS) to leave the subvolume writable, and let rm_rf() fall
through on EPERM/EACCES to empty the subvolume recursively and rmdir()
it, matching what rm_rf_at() already did.
dongshengyuan [Mon, 22 Jun 2026 06:13:11 +0000 (14:13 +0800)]
resolve: fix transaction leak in dns_transaction_new() error path
hashmap_replace() failure left t in s->manager->dns_transactions with
t->scope still NULL, causing the destructor to skip hashmap_remove().
Add the missing cleanup mirroring the earlier error path in the same
function.
Luca Boccassi [Sun, 21 Jun 2026 09:42:02 +0000 (10:42 +0100)]
resolved: load libcrypto/libssl lazily on first use and make them optional
Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
Daan De Meyer [Wed, 3 Jun 2026 10:58:02 +0000 (10:58 +0000)]
tree-wide: Beef up openssl logging
Let's translate openssl's errors to proper errnos
where we can instead of returning EIO for everything.
Let's also make log_openssl_errors() public so we can
use it everywhere and migrate the rest of the codebase
to use it.