git.ipfire.org Git - thirdparty/systemd.git/log

unit-name: introduce "strict" mode for unit name mangling

unit_name_mangle_with_suffix() is quite benevolent by default and allows
the unit to "transition" into a different unit type than what's
requested via its suffix argument. For example, calling
unit_name_mangle_with_suffix() with "/foo/bar" as a unit name and
".service" as a suffix would give you "foo-bar.mount", without any
warning or error.

This could then lead to quite confusing errors in certain situations:

~# systemd-run --remain-after-exit --unit /foo/bar true
Failed to start transient service unit: Cannot set property RemainAfterExit, or unknown property.

Given we can't change the default behaviour of
unit_name_mangle_with_suffix() as some parts of systemd already depend
on its "benevolence" (like systemctl), let's introduce a new flag -
UNIT_NAME_MANGLE_STRICT - that checks if the mangled/resolved unit
name's suffix matches the requested one and errors out if not.

With the flag used throughout systemd-run's code, the error in the above
case is now a bit more clear:

~# build/systemd-run --remain-after-exit --unit /foo/bar true
Path "/foo/bar" resolves to unit type "mount", but "service" is expected as unit.
Failed to mangle unit name: Invalid argument
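
A minimal sketch of the kind of check the strict mode performs, assuming the
mangled name is already available as a string. The helper name and its return
convention are illustrative only and are not the actual systemd code:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not systemd's API): returns 0 if 'mangled' ends in
 * 'suffix' (e.g. ".service"), and -EINVAL if the name resolved to another type. */
static int check_mangled_suffix(const char *mangled, const char *suffix) {
        size_t n, k;

        assert(mangled);
        assert(suffix);

        n = strlen(mangled);
        k = strlen(suffix);

        if (k > n || strcmp(mangled + n - k, suffix) != 0)
                return -EINVAL;

        return 0;
}
```

With this sketch, "foo-bar.mount" checked against ".service" is refused, which
corresponds to the "Invalid argument" error shown above.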

Resolves: #39996

unit-name: use FLAGS_SET() more

core: derive restrict-fsaccess initramfs_s_dev offset from skeleton (#42705)

Fixes #42689.

basic: add assert() when doing pointer deref

Lennart reminded me in [1] that we need to add assert() in functions
that do pointer access. For the simple `*p` pointer dereferences
we even have an automatic coccinelle script that ensures that as
part of the automatic code checks.

However for deref in the `p->` style this is not supported right
now and adding it to coccinelle is hard because it's too slow for
this kind of check. So I created a (slightly messy) tree-sitter
python script to see how many asserts we are currently missing.

This commit is the result of running it over the `src/basic`
dir and fixing the flagged issues. I plan to tidy it up and
add it to the checks too but this is orthogonal to this commit.

[1] https://github.com/systemd/systemd/pull/42360#discussion_r3426964562

Assorted coverity fixes (#42738)

Coverity is back online, and it's not happy

sd-future: drop redundant branch in test reader fiber

Both the error and the success path returned (int) n, so the check was
a no-op. Return the value directly.

CID#1660095

Follow-up for 7bc793e21f2d4bf67bd311545270bc515fe63ad9

core: actually sort the parsed LUO session list

The strv_sort() call sat after a for (;;) loop whose only exits are
return statements inside the loop, so it never ran.

CID#1660125

Follow-up for 82b8615463c306f8f7eeaec13600c89a7bbef151

dhcp-message-dump: guard against negative option type before indexing

dhcp_option_type_from_code() returns _DHCP_OPTION_TYPE_INVALID (-EINVAL)
for the PAD and END option codes, and dump_dhcp_option_one() uses the
returned value directly as an index into the functions[] table. Those
codes are excluded by an assert() at the top of the function, but
assert() compiles down to __builtin_unreachable() under NDEBUG, so a
negative array index read is reachable there (and trips static
analyzers). Bail out explicitly on the error return.
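
The pattern can be sketched as follows. The enum, the table and the function
name are stand-ins invented for illustration; only the idea (check the range
explicitly before indexing, independent of assert()) is taken from the text
above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-ins; the real enum and table live in the DHCP code. */
typedef enum {
        OPTION_TYPE_FLAG,
        OPTION_TYPE_STRING,
        _OPTION_TYPE_MAX,
        _OPTION_TYPE_INVALID = -EINVAL,
} OptionType;

static const char *const type_names[_OPTION_TYPE_MAX] = {
        [OPTION_TYPE_FLAG]   = "flag",
        [OPTION_TYPE_STRING] = "string",
};

/* Returns the table entry, or NULL if 'type' is negative or too large.
 * The explicit check remains in place even when assert() is compiled out. */
static const char *option_type_name(OptionType type) {
        if (type < 0 || type >= _OPTION_TYPE_MAX)
                return NULL;

        return type_names[type];
}
```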

CID#1660105

Follow-up for 149adb2fdce0d9a40f9332ecb1a48a486fce5194

hostname-setup: avoid O(N^2) string building in wildcard substitution

Building the result one char at a time via strextendn() is O(N^2)
because each call rescans and reallocs the buffer. With lines up to
LONG_LINE_MAX this caused a timeout in fuzz-hostname-setup. Use
GREEDY_REALLOC_APPEND to make it linear.
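
To illustrate why amortized growth makes this linear, here is a small sketch
using plain realloc(). The Buffer type and buffer_append() are hypothetical and
only approximate what a greedy-realloc style helper does:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, used here instead of systemd's own macros. */
typedef struct {
        char *data;
        size_t len, allocated;
} Buffer;

/* Appends 'n' bytes and keeps the buffer NUL-terminated. The allocation is
 * doubled when it runs out, so many small appends cost linear time overall.
 * Returns 0 on success, -1 on overflow or allocation failure. */
static int buffer_append(Buffer *b, const char *p, size_t n) {
        size_t need;

        assert(b);
        assert(p || n == 0);

        if (n >= SIZE_MAX - b->len)
                return -1;

        need = b->len + n + 1; /* room for the trailing NUL */
        if (need > b->allocated) {
                size_t m = b->allocated > 0 ? b->allocated : 16;
                char *q;

                while (m < need) {
                        if (m > SIZE_MAX / 2) {
                                m = need;
                                break;
                        }
                        m *= 2;
                }

                q = realloc(b->data, m);
                if (!q)
                        return -1;

                b->data = q;
                b->allocated = m;
        }

        if (n > 0)
                memcpy(b->data + b->len, p, n);
        b->len += n;
        b->data[b->len] = 0;
        return 0;
}
```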

Fixes https://github.com/systemd/systemd/issues/42713

sysupdate: do a varlink callout to a ready when completing an update, and hook bootctl install, pcrlock and sysext refresh into it (#42365)

hwdb: map Brazilian ThinkPad T14 Gen 1 slash key to KEY_RO

On Lenovo ThinkPad T14 Gen 1 AMD model 20UES5TQ00 with the Brazilian
keyboard, the physical slash/question key reports as KEY_RIGHTCTRL.

This keyboard layout has no physical Right Ctrl key in that position. The
key after Space is AltGr, then PrtSc, then the slash/question key. Map the
AT keyboard scancode 0x9d to KEY_RO, matching the ABNT slash/question key
used by Brazilian keyboard layouts.

Verified with evtest:

Event: type 4 (EV_MSC), code 4 (MSC_SCAN), value 9d
Event: type 1 (EV_KEY), code 97 (KEY_RIGHTCTRL), value 1

After applying the hwdb mapping, the key reports as KEY_RO.

DMI: svnLENOVO:pn20UES5TQ00:pvrThinkPadT14Gen1
AT keyboard scancode: 0x9d

TODO: drop bootctl link + sysupdate integration item

This is now implemented: sysupdate calls out to the
/run/systemd/sysupdate/notify/ Varlink directory on completion, and bootctl
binds a socket there that links a UKI plus extras staged below
/var/lib/systemd/uki/ (with .v/ vpick support) via "bootctl link-auto".

test: verify bootctl link-auto and io.systemd.BootControl.LinkAuto

Add a TEST-87 testcase exercising "bootctl link-auto" and the equivalent
io.systemd.BootControl.LinkAuto() Varlink method: a UKI plus extras are staged
below the search directories and we assert the kernel and sidecar resources
are linked into $BOOT. Covered: plain kernel.efi + extras.d/, versioned
kernel.efi.v/ and extras .v/ resolved via vpick, directory priority
(/etc wins over /run), the no-op case when nothing is staged, and the Varlink
method including its empty reply when there is nothing to link.

test: verify sysupdate invokes the notification callout directory

Extend TEST-72-SYSUPDATE with a check that, after a successful update,
systemd-sysupdate connects to every socket linked into
/run/systemd/sysupdate/notify/ and invokes
io.systemd.SysUpdate.Notify.OnCompletedUpdate(). A tiny recorder socket is
hooked into that directory; it captures the request and replies with success.
We assert the recorded call carries the expected method, version and resource
list, and that a subsequent no-op update emits no notification.

systemd-boot-update: condition on UEFI

Our boot loader logic only supports UEFI, hence let's condition the
updater on it.

sysext: refresh sysexts and confexts on completed system update

Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
sysext Varlink server. systemd-sysext provides a single Varlink service
covering both the sysext and confext image classes, so one notification
refreshes both (equivalent to "systemd-sysext refresh" plus
"systemd-confext refresh"). Hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-sysext.socket,
enabled by default via the preset.

bootctl: add link-auto/LinkAuto and auto-link on completed system update

Add a "bootctl link-auto" verb and a matching io.systemd.BootControl.LinkAuto()
Varlink method that behave exactly like "bootctl link" / Link(), except that
the UKI and extra resources are discovered automatically instead of being
passed in. The following directories are searched, in decreasing priority:
/etc/systemd/uki/, /run/systemd/uki/, /var/lib/systemd/uki/ (where
systemd-sysupdate stages downloaded resources), /usr/local/lib/systemd/uki/
and /usr/lib/systemd/uki/.

  - the UKI is taken from kernel.efi, or the best version in kernel.efi.v/
    (resolved via vpick, without honouring boot-counting suffixes), from the
    highest-priority directory that has one;
  - extra resources are picked up from extras.d/, matching *.sysext.raw,
    *.confext.raw and *.cred, each either as a plain file or as a versioned
    *.v/ directory resolved via vpick, combined across all directories with
    higher-priority directories winning on conflicts.

Everything is resolved relative to the pinned root directory fd. Files passed
via --extra= on the command line are linked in addition to the auto-discovered
ones.

Also bind io.systemd.SysUpdate.Notify.OnCompletedUpdate() in the boot control
Varlink server, which simply does the same as LinkAuto(), and hook a socket
into /run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-bootctl.socket
(enabled by default via the preset) so a freshly downloaded kernel is linked
into $BOOT automatically after a sysupdate run.

pcrlock: recompute PCR policy on completed system update

Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
pcrlock Varlink server and hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-pcrlock.socket,
enabled by default via the preset. When sysupdate signals a completed
update, we unconditionally re-run make-policy, since the set of measured
components may have changed.

sysupdate: notify hook subscribers after a successful update

Define a new io.systemd.SysUpdate.Notify Varlink interface with a single
OnCompletedUpdate() method, and after sysupdate successfully installs an
update, invoke that method on every socket linked into
/run/systemd/sysupdate/notify/ via varlink_execute_directory(). This
gives other components a hook to react to applied updates (e.g. recompute
a TPM policy, link a freshly downloaded kernel, refresh extensions).

The notification carries the component name, the installed version and the
list of updated resources (transfer id + on-disk path). Subscribers are
free to ignore the parameters and just treat the call as a trigger.

Setting SYSTEMD_SYSUPDATE_FORCE_NOTIFY=1 forces the notification to be sent
even when no update was applied (in which case no resource list is included),
so follow-up work can be triggered unconditionally.

Fixes: #35988

vpick: take separate root_fd and dir_fd arguments

Mirror how chaseat() works these days: instead of a single toplevel_fd that
serves as both the root (chroot) boundary and the directory that resolution
starts from, path_pick() now takes a separate root_fd and dir_fd. This lets
callers resolve a path relative to a specific directory fd while confining
symlink and absolute-path resolution to a root directory fd.

All existing callers are updated to pass the same fd for both, preserving
their current behaviour.

units: tag more units correctly with varlink xattrs

These were added in parallel to #42454, hence catch up and add missing
xattrs.

Follow-up for 53fc4c48e7d40293e8f79392e2da91323dd50268

sysupdate: automatically clean up orphaned files after auto-update (#42714)

This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update has completed (regardless of whether that update was entirely
successful or not). This ensures that any orphaned files are automatically
cleaned up if they are no longer referenced by any transfer file's patterns.

Follow-up for: d82e256bb9d151b185a8afec1fcacd8fbe80555c

po: Translated using Weblate (Romanian)

Currently translated at 76.2% (218 of 286 strings)

Co-authored-by: Petru Rebeja <petru@rebeja.eu>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ro/
Translation: systemd/main

sysupdate: automatically clean up orphaned files after auto-update

This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update has completed (regardless of whether that update was entirely
successful or not). This ensures that any orphaned files are automatically
cleaned up if they are no longer referenced by any transfer file's patterns.

Follow-up for: d82e256bb9d151b185a8afec1fcacd8fbe80555c

test-execute: use per-Exec timeout instead of per-service timeout

The previous x2 was still not enough, and the test is still often killed on
slow GHA CI workers, e.g.:

https://github.com/systemd/systemd/actions/runs/28012425459/job/82908555094?pr=42705

This happens in test units with many commands, so reset the timer when
a command completes and the test advances. The number of Exec
instructions is bounded so this will terminate jobs that are really
stuck anyway.

Follow-up for 3b00327fe6004b03c4a963de3df51998cf0c79b4

core: pin restrict-fsaccess initramfs_s_dev store width to skeleton field

The clear-store in restrict_fsaccess_clear_initramfs_trust() writes a fixed
4 bytes (*(uint32_t *)(p + INITRAMFS_S_DEV_OFF) = 0). INITRAMFS_S_DEV_OFF is
derived from the skeleton, so the offset tracks any field widening, but the
store width does not: were initramfs_s_dev widened (e.g. __u32 -> __u64) in
the BPF program, the store would clear only the low 4 bytes and silently
leave the initramfs trust window partially open. That is exactly the class
of bug the mirror-struct asserts (removed earlier in this branch) guarded
against.

Add a compile-time assert pinning the store width to the skeleton field
width (sizeof_field(typeof_field(struct restrict_fsaccess_bpf, bss[0]),
initramfs_s_dev) == sizeof(uint32_t)), so widening the field fails the build
instead of clearing half of it.
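
A minimal sketch of such a width pin. The struct below is a stand-in for the
generated skeleton struct, and the local sizeof_field() macro is defined here
only for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the skeleton's generated .bss struct (illustrative layout only). */
struct bss_sketch {
        uint32_t initramfs_s_dev;
        uint32_t other_global;
};

/* Size of a struct member without needing an instance. */
#define sizeof_field(type, member) sizeof(((type *) 0)->member)

/* The clear-store writes exactly sizeof(uint32_t) bytes, so fail the build
 * if the field is ever widened. */
_Static_assert(sizeof_field(struct bss_sketch, initramfs_s_dev) == sizeof(uint32_t),
               "initramfs_s_dev store width does not match field width");

/* Runtime view of the same constant, for illustration. */
static size_t initramfs_s_dev_width(void) {
        return sizeof_field(struct bss_sketch, initramfs_s_dev);
}
```

Changing the member to uint64_t in this sketch makes compilation fail at the
_Static_assert line rather than producing a store that clears half the field.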

core: derive restrict-fsaccess initramfs_s_dev offset from skeleton

Building with -Dbpf=enabled -Dbpf_compiler=gcc (GCC's BPF backend) fails on
the static assertions in bpf-restrict-fsaccess.c, introduced in 68fe7fa4d6:

  error: static assertion failed:
  "offsetof(struct restrict_fsaccess_bss, initramfs_s_dev) ==
   offsetof(typeof_field(struct restrict_fsaccess_bpf, bss[0]), initramfs_s_dev)"

The hand-written struct restrict_fsaccess_bss lists the BPF .bss globals in
source declaration order and asserts that its layout matches the skeleton's
generated bss struct. bpftool gen skeleton emits that struct from the BTF
.bss DATASEC, whose member order reflects the physical order the compiler
placed the variables, not the source order. clang preserves declaration
order, so the asserts pass; gcc reorders .bss globals, so initramfs_s_dev no
longer sits at offset 0 and the asserts fail.

This is more than a build break: restrict_fsaccess_clear_initramfs_trust()
clears initramfs_s_dev by mmap()ing the .bss map and storing 0 at a hardcoded
offset 0. Under the gcc layout that store would clobber the wrong global,
silently leaving the initramfs trust window open after switch_root instead of
closing it. The asserts were correctly catching this.

Fix it by deriving the offset from the generated skeleton instead of a mirror
struct: drop struct restrict_fsaccess_bss and the four field-order
assert_cc()s, take INITRAMFS_S_DEV_OFF from the skeleton's bss struct
(offsetof(typeof_field(struct restrict_fsaccess_bpf, bss[0]),
initramfs_s_dev)), and store at p + INITRAMFS_S_DEV_OFF. The offset is a
compile-time constant, so clang (offset 0) is unchanged while gcc tracks the
real layout. A retained assert_cc() documents the 4-byte alignment the
single-store atomicity relies on.
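
The idea can be sketched with a stand-in struct whose field order mimics a
layout where initramfs_s_dev is not first. The struct and function names are
hypothetical; only the offsetof()-based store is the point:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a generated .bss struct where the compiler placed another
 * global first (illustrative layout only). */
struct bss_gcc_like {
        uint32_t some_other_global;
        uint32_t initramfs_s_dev;
};

/* Derived from the struct, so it follows whatever layout was generated. */
#define INITRAMFS_S_DEV_OFF offsetof(struct bss_gcc_like, initramfs_s_dev)

/* The single 4-byte store relies on natural alignment. */
_Static_assert(INITRAMFS_S_DEV_OFF % sizeof(uint32_t) == 0,
               "initramfs_s_dev must be 4-byte aligned");

/* 'map' points at the start of the mapped .bss area. */
static void clear_initramfs_s_dev(void *map) {
        assert(map);

        *(uint32_t *) ((uint8_t *) map + INITRAMFS_S_DEV_OFF) = 0;
}
```

A store at a hardcoded offset 0 would have cleared some_other_global in this
layout, which is the misbehaviour described above.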

Fixes: #42689

sd-varlink: mark varlink sockets via xattrs (#42454)

Linux 7.0 added the ability to mark socket inodes with xattrs. Let's use
that to clearly mark all our Varlink sockets as being varlink related.
This is then used to implement a very useful new command "varlinkctl
list-sockets" which lists all varlink entrypoint sockets marked this
way.

By marking not just the entrypoint inodes but also the connection
sockets properly, we can one day add an eBPF-based "varlinkctl trace"
command that watches varlink sockets for traffic. But that's material
for a later PR.

test: skip fdstore tests if test-fdstore is not available

When the test suite is run in the "standalone" mode, the minimal
container might not contain the test-fdstore binary that's needed for a
couple of tests. Since installing systemd-tests into the minimal
container pulls in a lot of other dependencies, let's just skip the
affected tests instead to avoid this.

update TODO

man: document sd_varlink_server_listen_address() and friends

tree-wide: relax access mode of private Varlink sockets a bit

units: tag all .varlink sockets with the right xattrs

This also relaxes the inode access modes a bit, in case they were set to
0600: we now set the "r" bit too, i.e. use 0644. This is beneficial
since it permits unpriv code to read the xattrs of the entrypoints
(which require read access). Note that in order to be able to connect()
to a socket inode you need write access, hence this shouldn't compromise
security in any way.

varlinkctl: add 'list-sockets' verb

sd-netlink: beef up sock-diag code a bit

Let's make it useful to enumerate AF_UNIX sockets.

bpf-restrict-fsaccess: move STAT_DEV_TO_KERNEL into generic code

We want to reuse it when processing sock-diag messages, hence let's
generalize this.

core: add socket xattr settings for socket unit

varlinkctl: port to new help-util.[ch] apis

sd-varlink: mark varlink sockets and entrypoint inodes as varlink via xattrs

socket-util: add new helper socket_xattr_supported()

xattr-util: use empty_to_null() where appropriate

confidential-virt: fixes to detection and reporting (#42697)

tpm2-setup: call DLOPEN_TPM2 to add dependency and fail immediately if not present

tpm2-setup requires both libcrypto and the tpm2-tss libraries, but so
far it only directly dlopen'ed libcrypto, with a clear error on startup
if missing, and a dependency added via dlopen notes.
Do the same for the tpm2-tss dlopens, to get a clear error and the
required dependencies.

journal: expose last 10 high priority logs as metrics (#42621)

This commit exposes the last 10 high priority logs as metrics so that
systemd-report reports them. The entries are reported as
`io.systemd.Journal.HighPriorityMessage` and include all fields that are
printable as strings.

This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal

core: fix assertion when inactive unit pulled in by try-restart and start at the same time

With EnqueueUnitJobMany(), one anchor can collapse to NOP (inactive
unit + try-restart) while another anchor pulls that same unit in as a
regular start/restart job, leaving a NOP and a regular job in one
unit's transaction list, hitting an assert:

#11 0x00007f3fd2a446dc in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>,
     function=<optimized out>) at ./assert/assert.c:127
#12 0x00007f3fd326e872 in job_type_lookup_merge (a=<optimized out>, b=<optimized out>) at ../src/core/job.c:428
#13 0x00007f3fd32e5641 in job_type_merge_and_collapse (a=0x7ffc7dda2430, b=<optimized out>, u=0x557bb11434c0)
     at ../src/core/job.c:523
#14 0x00007f3fd335e4b3 in transaction_ensure_mergeable (tr=tr@entry=0x557bb0f6d150,
     matters_to_anchor=matters_to_anchor@entry=true, e=e@entry=0x7ffc7dda33e0) at ../src/core/transaction.c:241
#15 0x00007f3fd3360242 in transaction_merge_jobs (tr=0x557bb0f6d150, e=0x7ffc7dda33e0)
     at ../src/core/transaction.c:273
#16 transaction_activate (tr=0x557bb0f6d150, m=0x557bb0dd9c10, mode=JOB_REPLACE, affected_jobs=0x0, e=0x7ffc7dda33e0)
     at ../src/core/transaction.c:797
#17 0x00007f3fd33091ed in manager_add_jobs (m=<optimized out>, type=<optimized out>, names=<optimized out>,
     reload_if_possible=false, mode=JOB_REPLACE, extra_flags=0, affected_jobs=0x0, reterr_error=0x7ffc7dda33e0,
     ret_jobs=0x557bb0fe8790) at ../src/core/manager.c:2386

Follow-up for 7d3b32daef3125e70dd3f1689fb563a06b0c6753

various measurement-related fixes (#42698)

growfs: downgrade dependency on libcryptsetup to optional

growfs already skips gracefully when cryptsetup fails or is
missing, and it is only necessary when the device is
a LUKS device anyway. Downgrade from required to recommended.

Follow-up for b0ede9f9eebf3f5507e6b3cef9e1de33af7cea68

confidential-virt: fix comment regarding vmm.c location

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

TODO: remove "10 most recent emergency message as metrics" todo

journal: expose last 10 high priority logs as metrics

This commit exposes the last 10 high priority logs as metrics
so that systemd-report reports them. The entries are
reported as `io.systemd.Journal.HighPriorityMessage` and
include all fields as the new METRIC_FAMILY_TYPE_OBJECT.

Individual fields from a journal entry that are unprintable
(invalid UTF-8) are skipped.

This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal

shared: add OUTPUT_SKIP_UNPRINTABLE to log-show

This commit adds a new OUTPUT_SKIP_UNPRINTABLE to the OutputFlags
and adds code in `update_json_data` and `json_escape` to honor it.

When set, all JSON fields that have unprintable data will be skipped
and `null` is sent instead.

metrics: add METRIC_FAMILY_TYPE_OBJECT type

We will need a way to send journal entries as metrics. Those are already
json objects. So Lennart suggested to introduce a new type
METRIC_FAMILY_TYPE_OBJECT that does this. This commit implements
his suggestion.

boot: read the TDX CPUID leaf unconditionally

vmm.c carries the confidential-VM detection used by sd-boot/sd-stub.
Its detect_tdx() had the same dead guard as the userspace copy: it
gated the 0x21 read on CPUID_GET_HIGHEST_FUNCTION (0x80000000, the
extended max function), which is always >= 0x80000000, so the guard
never held.

Mirror the userspace fix: read leaf 0x21 directly and rely on the
IntelTDX signature, matching the kernel. An out-of-range CPUID leaf
returns the highest basic leaf's data (no fault), and 0x21 is a
synthetic TDX leaf whose presence need not be reflected in the max
basic function, so it must not be gated on it.

Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

confidential-virt: read the TDX CPUID leaf unconditionally

detect_tdx() guarded the read of the TDX enumeration leaf (0x21, a
standard leaf) with CPUID_GET_HIGHEST_FUNCTION (0x80000000), which
returns the highest *extended* function. eax is therefore always
>= 0x80000000, so the "eax < 0x21" guard never held and the leaf was
read unconditionally anyway.

Drop the guard rather than re-gate it on the basic max function
(leaf 0), and read 0x21 directly, relying on the IntelTDX signature
compare. This matches the kernel, which reads the leaf unconditionally
on purpose: an out-of-range CPUID leaf returns the highest basic leaf's
data (no fault, per the Intel SDM), and 0x21 is a synthetic TDX leaf
whose presence need not be reflected in the reported max basic function,
so gating the read on it risks missing a genuine TDX guest. With no
guard the Hyper-V isolation fallback (Azure TDX guests have 0x21
blocked) also stays reachable.
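
A rough sketch of an unguarded leaf read plus signature compare, assuming GCC's
or clang's <cpuid.h> on x86. The function names are made up for this example,
and the Hyper-V fallback mentioned above is not modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define TDX_CPUID_LEAF 0x21U

/* The 12-byte signature "IntelTDX    " is spread over EBX, EDX, ECX, in that order. */
static bool tdx_signature_matches(uint32_t ebx, uint32_t ecx, uint32_t edx) {
        uint32_t sig[3] = { ebx, edx, ecx };

        return memcmp(sig, "IntelTDX    ", sizeof(sig)) == 0;
}

/* Hypothetical detection helper: no max-leaf guard, the signature decides. */
static bool detect_tdx_sketch(void) {
#if defined(__x86_64__) || defined(__i386__)
        uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;

        __cpuid_count(TDX_CPUID_LEAF, 0, eax, ebx, ecx, edx);
        (void) eax;

        return tdx_signature_matches(ebx, ecx, edx);
#else
        return false;
#endif
}
```

On a machine that is not a TDX guest the leaf returns unrelated data, the
compare fails, and the sketch reports false without any separate range check.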

Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

confidential-virt: treat an unreadable SEV MSR as confidential

msr() returned 0 on failure, indistinguishable from a real MSR value of
0. With /dev/cpu/0/msr unavailable (e.g. the msr module not loaded in the
initrd), detect_sev() read 0 and reported a genuine SEV/-ES/-SNP guest as
CONFIDENTIAL_VIRTUALIZATION_NONE.

That inverts the firmware-credential trust gate: import_credentials_*()
skip fw_cfg/SMBIOS credentials only when detect_confidential_virtualization()
is > 0 ("don't trust firmware in confidential VMs"). A false NONE makes a
confidential guest trust and import credentials injected by the untrusted
hypervisor.

msr() now returns a negative errno, and detect_sev() assumes plain SEV when
the MSR is unreadable but CPUID already advertised SEV under a hypervisor,
so the gate still trips.

The conservative branch only fires when CPUID already advertised SEV, i.e.
for a guest the hypervisor marked SEV-capable. QEMU gates that CPUID leaf on
the SEV launch object and does not expose it to ordinary guests even under
-cpu host, so it does not misfire for non-confidential guests. Were a
hypervisor to expose the bit anyway the outcome is fail-safe (we only
decline to trust firmware-supplied data); nothing in-tree branches on the
specific SEV tier.
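
The decision logic described above might look roughly like this. The enum, the
function name and the bit macros are illustrative; the bit positions are an
assumption based on the AMD SEV status MSR layout (bit 0 SEV, bit 1 SEV-ES,
bit 2 SEV-SNP):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative result type, ordered so that any value > 0 means "confidential". */
typedef enum {
        CV_NONE,
        CV_SEV,
        CV_SEV_ES,
        CV_SEV_SNP,
} ConfidentialVirtSketch;

/* Assumed bit layout of the SEV status MSR. */
#define MSR_SEV_BIT     (UINT64_C(1) << 0)
#define MSR_SEV_ES_BIT  (UINT64_C(1) << 1)
#define MSR_SEV_SNP_BIT (UINT64_C(1) << 2)

/* To be called only after CPUID already advertised SEV under a hypervisor.
 * 'msr_r' is 0 on success or a negative errno; 'msr_value' is valid only on success. */
static ConfidentialVirtSketch classify_sev(int msr_r, uint64_t msr_value) {
        if (msr_r < 0)
                return CV_SEV; /* unreadable MSR: assume plain SEV, not "none" */

        if (msr_value & MSR_SEV_SNP_BIT)
                return CV_SEV_SNP;
        if (msr_value & MSR_SEV_ES_BIT)
                return CV_SEV_ES;
        if (msr_value & MSR_SEV_BIT)
                return CV_SEV;

        return CV_NONE;
}
```

Keeping the error code separate from the value is what lets a failed read be
told apart from a genuine MSR value of 0.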

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

pcrextend: refuse empty measurement over Varlink

vl_method_extend() accepted an empty text/data value and measured an
empty word, bypassing the empty-word refusal the CLI path already
enforces. Measured words are joined with ":" in the record, so an empty
word is ambiguous. Reject it.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

tpm2-util: refuse NvPCR extend when the NV index is gone

tpm2_index_to_handle() returns 0 with a NULL handle when the NV index is
not present on the TPM. tpm2_nvpcr_extend_bytes() only checked for r < 0,
so a tombstoned NvPCR (anchor file present, NV slot cleared out from under
us) passed the NULL handle to tpm2_extend_nvpcr_nv_index() and aborted the
process via its assert(). Handle r == 0 explicitly, as the other
tpm2_index_to_handle() callers already do.

The newly introduced -ENODEV is mapped together with -ENOENT to the
io.systemd.PCRExtend.NoSuchNvPCR varlink error.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

imds: expose imds info fields also as metrics (#42409)

report: add systemd-report-sign-tsm backend (#42683)

Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.

A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.

core: add method to enqueue multiple jobs in a single call (#42182)

Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.

Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.

test: add coverage for multi-unit transactions

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

portablectl: use new EnqueueUnitJobMany() when available

systemctl: use new EnqueueUnitJobMany() when available

Fixes https://github.com/systemd/systemd/issues/7877
Replaces https://github.com/systemd/systemd/pull/7947

homed: fix min_free tracking in manager_rebalance_calculate()

min_free is supposed to track the minimum free space across all home
directories to scale the next rebalance interval. However, it was
incorrectly assigned h->rebalance_size (the home's current total
allocation) instead of new_free (the remaining allocatable space).

This caused the rebalance interval to be computed from allocation sizes
rather than free space, so a nearly-full home would not trigger the
shorter intervals it should, delaying response to low-space conditions.
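
A simplified sketch of the corrected tracking. The struct and its fields are
stand-ins named after the variables mentioned above, not the real homed data
structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-home values: current allocation and remaining free space. */
typedef struct {
        uint64_t rebalance_size;
        uint64_t new_free;
} HomeSketch;

/* Returns the smallest free space across all homes, or UINT64_MAX if there
 * are none. Note that it looks at new_free, never at rebalance_size. */
static uint64_t min_free_across(const HomeSketch *homes, size_t n) {
        uint64_t min_free = UINT64_MAX;

        assert(homes || n == 0);

        for (size_t i = 0; i < n; i++)
                if (homes[i].new_free < min_free)
                        min_free = homes[i].new_free;

        return min_free;
}
```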

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

repart: Sort the partition list by partition offset (#42488)

Currently the partition list is ordered like this: First come the
partitions that exist as definition files (could be pre-existing
partitions or could be new ones), then come the pre-existing partitions
that aren't matched to a definition file.

This ordering is visible to the user when we print our partition table,
and it doesn't really make sense from a UX perspective: Partition tables
are usually either presented in order of the partition indices, or in
order of the partition offsets. Arguably the latter would be nicer here,
since the visualization below is already ordered by physical offsets.

So reorder the list after we assigned the new partitions to their
respective free areas, according to the physical offset (or, for
partitions that are to be newly created, the order that we will allocate them in).

Another potential upside of this is that we could rely on the partition
order in the code now more, too.

To ensure it keeps working, also add a test in the integration tests for
it.
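
As an illustration of ordering by physical offset, here is a small sketch that
sorts an array with qsort(). The PartitionSketch type is hypothetical, and the
array is an assumption made for brevity; how repart actually stores its
partition list is not shown here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced view of a partition: a label and its byte offset. */
typedef struct {
        const char *label;
        uint64_t offset;
} PartitionSketch;

/* Three-way compare without subtraction, since 64-bit offsets do not fit an int. */
static int partition_offset_compare(const void *a, const void *b) {
        const PartitionSketch *x = a, *y = b;

        return (x->offset > y->offset) - (x->offset < y->offset);
}

static void sort_partitions(PartitionSketch *p, size_t n) {
        assert(p || n == 0);

        if (n > 1)
                qsort(p, n, sizeof(*p), partition_offset_compare);
}
```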

Screenshot before:
<img width="2853" height="686" alt="Screenshot From 2026-06-05 00-58-07"
src="https://github.com/user-attachments/assets/7f24b527-7d79-49c4-916b-52faa892d4eb"
/>

Screenshot after:

<img width="2853" height="686" alt="Screenshot From 2026-06-05 00-58-16"
src="https://github.com/user-attachments/assets/4505ec5e-cab4-4ac1-95f0-b5af3991509e"
/>

imds: use help-util.h helpers for --help output

Convert the --help text of systemd-imds and systemd-imdsd to the common
help_cmdline()/help_abstract()/help_section()/help_man_page_reference()
helpers, for a uniform output style across tools.

imds: expose instance metadata as an io.systemd.Metrics provider

When systemd-imds is invoked as a Varlink service (via the new
systemd-imds-metrics.socket), it now acts as an io.systemd.Metrics
provider for systemd-report. It connects to systemd-imdsd over the
existing io.systemd.InstanceMetadata interface to acquire the real
data and re-exposes the detected cloud vendor plus the well-known
hostname, region, zone and public IPv4/IPv6 fields as metrics in the
io.systemd.InstanceMetadata.* namespace.

The metrics logic lives entirely on the client side
(imds-tool-metrics.c); systemd-imdsd is unchanged. Each metric is
acquired on demand with a blocking call to the daemon, benefiting from
its local cache. Fields that are unset or unsupported by the vendor are
simply omitted.

The metrics socket is statically enabled into sockets.target.wants/.

imds: fix logging

Follow our coding style rules and make functions that log about most
errors log about all errors.

oci-util: fix and harden oci_registry_is_valid()

- Pass colon+1 (port string) instead of s (hostname) to safe_atou16,
  so host:port registries are no longer always rejected.
- Switch to safe_atou16_full() with base-10 and strict flags to reject
  non-decimal port forms (hex, octal, leading whitespace, sign prefix)
  that would produce malformed URL authorities.
- Reject empty host explicitly via isempty() guard (covers both NULL
  and empty-string input), and guard colon == n to reject ':port' form,
  since dns_name_is_valid('') == 1 (DNS root) would otherwise accept
  empty host as valid.
- Wrap overlong line to fit 109-column limit.
- Add test coverage for oci_registry_is_valid().
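
The host:port handling listed above can be approximated with the following
sketch. It uses only the C standard library rather than safe_atou16_full(), it
omits the DNS name validation of the host part, and both function names are
made up for the example. Leading zeros in the port are accepted here, which may
differ from the real code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Strict base-10 port parse: digits only, so no sign, whitespace or 0x prefix. */
static bool parse_port_strict(const char *s, unsigned *ret) {
        unsigned v = 0;

        assert(s);
        assert(ret);

        if (*s == 0)
                return false;

        for (const char *p = s; *p; p++) {
                if (*p < '0' || *p > '9')
                        return false;

                v = v * 10 + (unsigned) (*p - '0');
                if (v > 65535)
                        return false;
        }

        *ret = v;
        return true;
}

/* Hypothetical validity check: rejects NULL/empty input and the ":port" form,
 * and parses the part after the last colon (not the host) as the port. */
static bool registry_is_valid_sketch(const char *s) {
        const char *colon;
        unsigned port;

        if (!s || *s == 0)
                return false;

        colon = strrchr(s, ':');
        if (!colon)
                return true; /* host validation omitted in this sketch */
        if (colon == s)
                return false;

        return parse_port_strict(colon + 1, &port);
}
```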

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

Add handling for '-1' when parsing vsock CID (#42654)

Currently `systemd-ssh-generator` supports
`systemd.ssh_listen=vsock::22` and aliases the "empty CID" towards
`VMADDR_CID_ANY`. VMADDR_CID_ANY is -1, so it's confusing from a user
experience that `systemd.ssh_listen=vsock:-1:22` isn't supported.
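
A small sketch of CID parsing that treats both the empty string and "-1" as
"any". The function name and the local constant are illustrative; the constant
has the same value as VMADDR_CID_ANY (-1U) but is defined locally to keep the
example free of kernel headers:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CID_ANY_SKETCH UINT32_MAX /* same value as VMADDR_CID_ANY */

/* Hypothetical parser: returns 0 and stores the CID, or -EINVAL on bad input. */
static int parse_vsock_cid(const char *s, uint32_t *ret) {
        unsigned long long v;
        char *end = NULL;

        assert(s);
        assert(ret);

        /* Both the empty CID and an explicit -1 mean "any". */
        if (*s == 0 || strcmp(s, "-1") == 0) {
                *ret = CID_ANY_SKETCH;
                return 0;
        }

        /* Everything else must be a plain decimal number. */
        if (*s < '0' || *s > '9')
                return -EINVAL;

        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0 || *end != 0 || v > UINT32_MAX)
                return -EINVAL;

        *ret = (uint32_t) v;
        return 0;
}
```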

report: add systemd-report-sign-tsm backend

A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

shared: add configfs-tsm attestation report helper

Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

Translations update from Fedora Weblate (#42699)

Translations update from [Fedora
Weblate](https://translate.fedoraproject.org) for
[systemd/main](https://translate.fedoraproject.org/projects/systemd/main/).

Current translation status:

![Weblate translation
status](https://translate.fedoraproject.org/widget/systemd/main/horizontal-auto.svg)

po: Translated using Weblate (Romanian)

Currently translated at 74.4% (213 of 286 strings)

Co-authored-by: Fedora Weblate user 1831 <atony076@users.noreply.translate.fedoraproject.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ro/
Translation: systemd/main

po: Translated using Weblate (Korean)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: 김인수 <simmon@nplob.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ko/
Translation: systemd/main

veritysetup: don't measure root hash signature after unsigned fallback

verb_attach() falls back to unsigned activation (crypt_activate_by_volume_key)
when signed activation fails, but still passed the signature to
pcrextend_verity_now(). The signer is parsed out of the (unverified)
signature and folded into the dm_verity NvPCR measurement, making an
unsigned fallback indistinguishable from a genuinely signed activation to
an attester. Only measure the signature when signed activation succeeded.
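
The intended control flow can be sketched roughly as follows. This is
an illustration only: `attach`, the callback parameters and the use of
OSError as the failure signal are all hypothetical and do not mirror
the real verb_attach() code.

```python
def attach(activate_signed, activate_unsigned, measure, signature):
    """Activate a volume and measure it. Hypothetical sketch.

    The signature is passed on to the measurement only if signed
    activation actually succeeded, so an unsigned fallback is
    measured without any signer information.
    """
    signed_ok = False

    if signature is not None:
        try:
            activate_signed(signature)
            signed_ok = True
        except OSError:
            # Signed activation failed, fall back to unsigned.
            activate_unsigned()
    else:
        activate_unsigned()

    measure(signature if signed_ok else None)
    return signed_ok
```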

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

manager: make systemd+executor a multicall binary

Allow systemd-executor to be compiled into a single binary.
The existing -Dlink-executor-shared=true|false is extended to also
allow -Dlink-executor-shared=single (*). The new mode is opt-in,
to allow experimentation and introduce this smoothly.

This saves a little space, but not as much as I expected:
$ ls -l build/{systemd,systemd-executor} build-new/systemd
-rwxr-xr-x 1 zbyszek zbyszek 631520 May 25 22:44 build/systemd
-rwxr-xr-x 1 zbyszek zbyszek 670464 May 25 22:44 build/systemd-executor
-rwxr-xr-x 1 zbyszek zbyszek 1214488 May 25 22:45 build-new/systemd
(This is with -Dbuildtype=debugoptimized -Db_lto=true).
The combined binary is slightly smaller than the sum of the separate
ones, but not much. In both cases, the binaries are linked to
libsystemd-core which is 10MB, so the size of the binaries themselves
doesn't make much of a difference. The executor needs exec-invoke.c
which is huge and not shared with anything else.

Longer term, I want to allow systemd to be linked statically. In
that case, having systemd-executor separate would be very painful.
So the option to use a multicall binary will be necessary.

Previously, we stored the resolved path to systemd-executor and
used it as argv[0]. I don't think this was useful. After all, normally
we would use the non-resolved original path as argv[0]. So that
part is dropped, and the resolved path is only logged, but
"systemd-executor" is always used as argv[0]. This makes the
multicall binary work reliably, no matter what the actual file
name is.
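
For illustration, the generic multicall pattern looks roughly like
this. The sketch is not taken from the patch: the applet table, the
function names and the fallback to the manager are assumptions. The
point is that dispatch keys on argv[0], not on the path of the file
that was actually executed.

```python
import os


def run_manager(args):
    # Placeholder for the service manager entry point.
    return "manager"


def run_executor(args):
    # Placeholder for the executor entry point.
    return "executor"


# Hypothetical applet table, keyed by the name passed as argv[0].
APPLETS = {
    "systemd": run_manager,
    "systemd-executor": run_executor,
}


def dispatch(argv):
    """Pick the entry point from the basename of argv[0]."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name, run_manager)
    return applet(argv[1:])
```

Because the caller always passes "systemd-executor" as argv[0], the
lookup succeeds even if the binary on disk is installed under a
different file name.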

(*) This means that compat at the command-line level is maintained:
'meson setup build -Dlink-executor-shared=true …' works as before.
Unfortunately, when using an existing build directory, meson chokes
on the type change and refuses to reconfigure the directory or change
the option or do anything useful. I think meson is doing the wrong
thing here, but this is hard to fix. So the build directory probably
needs to be recreated.

sysupdate: keep database of installed files/patterns, and use to GC them (#42646)

Transfer files might come and go, components might be enabled and
disabled. Patterns might change. Let's keep track of what we install, so
that we can automatically gc everything no longer owned by any enabled
transfer.

machine-tags: extend syntax to support key/value pairs (#42618)

This is a minor extension, to move the machine tags concept more closely
towards what higher-level solutions support for tagging machines, such
as kubernetes, simply to reduce the conceptual impedance mismatch.

resolved: load libcrypto/libssl lazily on first use and make them optional (#42681)

Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
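
The lazy-initialization pattern described here can be sketched as
below. This is a generic illustration, not resolved's code: the class
name `LazyLibrary` and the injected `loader` callable are assumptions
standing in for the real dlopen() wrappers.

```python
class LazyLibrary:
    """Load an optional dependency on first use. Hypothetical sketch."""

    def __init__(self, loader):
        self._loader = loader   # callable that performs the actual load
        self._handle = None

    def get(self):
        # Nothing is loaded at construction time. A missing library
        # only causes an error once a feature that needs it is used.
        if self._handle is None:
            self._handle = self._loader()
        return self._handle
```

With this shape, a configuration that never enables the feature never
calls get(), so the loader never runs.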

Expand specifiers in `MakeSymlinks=` target in `repart.d` (#42694)

Closes #42693. Specifiers are now expanded in symlink targets
(previously, they were only expanded in the source) - this is
technically a breaking change, but I'd be very surprised if anyone was
relying on this.

No other simplification is applied to the target (unlike the source,
which goes through `path_simplify_and_warn`).

Also a few minor changes:

- rename local `path` variable to `source` to match documentation
convention
- document that `MakeSymlinks=` accepts specifiers
- fix error message to print `MakeSymlinks=` option instead of
`Subvolumes=`
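
As a rough illustration of what specifier expansion in a symlink target
means, consider the sketch below. It is not the repart code: the
function name, the caller-provided table and the choice to reject
unknown specifiers are assumptions. Only the `%%` escape for a literal
percent sign is taken from systemd's general specifier syntax.

```python
def expand_specifiers(text, table):
    """Expand '%x' sequences using a caller-provided table.

    Hypothetical sketch: unknown specifiers raise an error here, which
    may differ from what the real implementation does.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling '%' at end of string")
        spec = text[i + 1]
        if spec == "%":
            out.append("%")          # '%%' is a literal percent sign
        elif spec in table:
            out.append(table[spec])
        else:
            raise ValueError("unknown specifier %%%s" % spec)
        i += 2
    return "".join(out)
```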

systemctl: add --kernel-cmdline-reuse option

kexec-tools has a --reuse-cmdline option, which is very convenient
when doing a lot of reboots. Add the same to systemctl.
Dedup options, letting the last one win in case of duplicates,
so that 'systemctl kexec --reuse-cmdline' can be chained many times
without continuously expanding the cmdline with duplicates from
the boot entry.
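
A minimal sketch of last-one-wins deduplication, under two assumptions
that are not confirmed by this message: options are keyed by the part
before '=', and quoting is ignored. The function name `dedup_cmdline`
is made up for the example.

```python
def dedup_cmdline(cmdline):
    """Drop earlier duplicates of each option, keeping the last one."""
    tokens = cmdline.split()

    def key(token):
        # Assumption: 'root=/dev/sda1' and 'root=/dev/sdb1' share a key.
        return token.split("=", 1)[0]

    # Remember the index of the last occurrence of every key.
    last_index = {}
    for i, token in enumerate(tokens):
        last_index[key(token)] = i

    kept = [t for i, t in enumerate(tokens) if last_index[key(t)] == i]
    return " ".join(kept)
```

Running the function on its own output yields the same string, which is
the property that keeps repeated reboots from growing the command line.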

ci: add test-case for new cleanup logic

sysupdate: port to new help-util.[ch] apis

As usual for stuff we touch, let's modernize the --help texts to our new
APIs.

sysupdate: introduce "installdb" that keeps track of installed resources

Let's make sure we keep track of any file we drop into the system via a
database in /var/. This database is implemented based on symlinks, i.e.
reuses the fs as a simple database. Given the database most likely will
have <= 10 entries only (as we store *patterns* of installed file paths in
them, not the file paths themselves), this should be very efficient.

For implementation details see comments at top of
src/sysupdate/sysupdate-cleanup.c.

sysupdate: some smaller clean-ups

Nothing earth shattering, just some minor tweaks.

sysupdate: split out component validation/enumeration into sysupdate-util.[ch]

Just some refactoring.

The code is slightly updated, for example it now uses string_is_safe().
But mostly this just splits out code from one large file to a smaller
one.

string-util: introduce STRING_FILENAME_PART flag for string_is_safe()

Whenever we are validating a string that shall eventually appear in a
filename, we want to use filename_part_is_valid() rather than
filename_is_valid(). Let's add explicit support for that to
string_is_safe(), since it's actually a really common case.

recurse-dir: optionally, only enumerate dentries of a specific type

At various places we filter directory enumerations by inode type. Let's
add explicit support for that, so that the "struct dirent" array we
return already suppresses them.

This shortens code and makes things more robust.

sha256: add sha256_direct_hex() helper

At various places we need a SHA256SUM of something as a string. Let's add
a simple helper for this that does this generically.
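
The idea, digest plus hex formatting in one call, is shown below as a
Python analogue. The signature of the C helper is not given in this
message, so the name `sha256_hex` and its interface are illustrative
only.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of 'data' as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()
```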

core: add method to enqueue multiple jobs in a single call

Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.

Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.

Fixes https://github.com/systemd/systemd/issues/8102

Co-authored-by: Michal Koutný <mkoutny@suse.com>

btrfs-util,rm-rf: clean up subvolumes without user_subvol_rm_allowed

Without CAP_SYS_ADMIN and without the 'user_subvol_rm_allowed' mount
option, BTRFS_IOC_SNAP_DESTROY is rejected with EPERM (or EROFS for a
read-only subvolume), so rm_rf_subvolume() left subvolumes behind.
test-btrfs thus accumulated leftover subvolumes in /var/tmp on every
unprivileged run on a btrfs filesystem.

An unprivileged owner can however clear the RDONLY flag, empty a
subvolume and rmdir() it. So clear the RDONLY flag on EPERM/EACCES too
(not just EROFS) to leave the subvolume writable, and let rm_rf() fall
through on EPERM/EACCES to empty the subvolume recursively and rmdir()
it, matching what rm_rf_at() already did.

Fixes https://github.com/systemd/systemd/issues/42674

report-basic, networkd: add Version, KernelTimestamp, Address metrics (#42315)

This PR adds some more useful metrics:
- io.systemd.Network.Address
- io.systemd.Basic.KernelTimestamp.{Realtime,Monotonic}
- io.systemd.Basic.Version

ssl-util: support OpenSSL 4 (#42676)

OpenSSL 4 broke ABI, so we need to look for both SONAMEs.

Follow-up for
https://github.com/systemd/systemd/commit/ccdd42351f79cbb9c2e034a96280a1ded40a2f95

Fixes https://github.com/systemd/systemd/issues/42675

resolve: fix transaction leak in dns_transaction_new() error path

hashmap_replace() failure left t in s->manager->dns_transactions with
t->scope still NULL, causing the destructor to skip hashmap_remove().
Add the missing cleanup mirroring the earlier error path in the same
function.

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

resolved: load libcrypto/libssl lazily on first use and make them optional

Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.

ssl-util: add cleanup helper for SSL_CTX

journal: add catalog message for missing dlopen dep

log: add log_struct_once macro

Combines log_once and log_struct

tree-wide: Beef up openssl logging

Let's translate openssl's errors to proper errnos
where we can instead of returning EIO for everything.
Let's also make log_openssl_errors() public so we can
use it everywhere and migrate the rest of the codebase
to use it.