git.ipfire.org Git - thirdparty/systemd.git/log

TODO: drop bootctl link + sysupdate integration item

This is now implemented: sysupdate calls out to the
/run/systemd/sysupdate/notify/ Varlink directory on completion, and bootctl
binds a socket there that links a UKI plus extras staged below
/var/lib/systemd/uki/ (with .v/ vpick support) via "bootctl link-auto".

test: verify bootctl link-auto and io.systemd.BootControl.LinkAuto

Add a TEST-87 testcase exercising "bootctl link-auto" and the equivalent
io.systemd.BootControl.LinkAuto() Varlink method: a UKI plus extras are staged
below the search directories and we assert the kernel and sidecar resources
are linked into $BOOT. Covered: plain kernel.efi + extras.d/, versioned
kernel.efi.v/ and extras .v/ resolved via vpick, directory priority
(/etc wins over /run), the no-op case when nothing is staged, and the Varlink
method including its empty reply when there is nothing to link.

test: verify sysupdate invokes the notification callout directory

Extend TEST-72-SYSUPDATE with a check that, after a successful update,
systemd-sysupdate connects to every socket linked into
/run/systemd/sysupdate/notify/ and invokes
io.systemd.SysUpdate.Notify.OnCompletedUpdate(). A tiny recorder socket is
hooked into that directory; it captures the request and replies with success.
We assert the recorded call carries the expected method, version and resource
list, and that a subsequent no-op update emits no notification.

systemd-boot-update: condition on UEFI

Our boot loader logic only supports UEFI, hence let's condition the
updater on it.

sysext: refresh sysexts and confexts on completed system update

Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
sysext Varlink server. systemd-sysext provides a single Varlink service
covering both the sysext and confext image classes, so one notification
refreshes both (equivalent to "systemd-sysext refresh" plus
"systemd-confext refresh"). Hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-sysext.socket,
enabled by default via the preset.

bootctl: add link-auto/LinkAuto and auto-link on completed system update

Add a "bootctl link-auto" verb and a matching io.systemd.BootControl.LinkAuto()
Varlink method that behave exactly like "bootctl link" / Link(), except that
the UKI and extra resources are discovered automatically instead of being
passed in. The following directories are searched, in decreasing priority:
/etc/systemd/uki/, /run/systemd/uki/, /var/lib/systemd/uki/ (where
systemd-sysupdate stages downloaded resources), /usr/local/lib/systemd/uki/
and /usr/lib/systemd/uki/.

  - the UKI is taken from kernel.efi, or the best version in kernel.efi.v/
    (resolved via vpick, without honouring boot-counting suffixes), from the
    highest-priority directory that has one;
  - extra resources are picked up from extras.d/, matching *.sysext.raw,
    *.confext.raw and *.cred, each either as a plain file or as a versioned
    *.v/ directory resolved via vpick, combined across all directories with
    higher-priority directories winning on conflicts.

Everything is resolved relative to the pinned root directory fd. Files passed
via --extra= on the command line are linked in addition to the auto-discovered
ones.
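A minimal sketch of the extras selection rules, assuming the directory scan has already produced (directory, entry name) candidates ordered by decreasing directory priority. `ExtraCandidate`, `is_extra_name()` and `merge_extras()` are hypothetical names. Conflicts are assumed to be keyed on the exact entry name, and vpick resolution of `.v/` directories is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one entry found below some extras.d/ directory */
typedef struct {
        const char *dir;   /* extras.d/ directory the entry was found in */
        const char *name;  /* entry name inside that directory */
} ExtraCandidate;

/* True if s[0..len) ends in suffix and has a non-empty stem before it */
static bool has_suffix(const char *s, size_t len, const char *suffix) {
        size_t l = strlen(suffix);
        return len > l && memcmp(s + len - l, suffix, l) == 0;
}

/* Accepts *.sysext.raw, *.confext.raw and *.cred, optionally followed by
 * ".v" (a versioned directory that would then be resolved via vpick). */
static bool is_extra_name(const char *name) {
        static const char *const suffixes[] = { ".sysext.raw", ".confext.raw", ".cred" };
        size_t len = strlen(name);

        if (has_suffix(name, len, ".v"))
                len -= 2; /* ignore the ".v" marker, check the rest */

        for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++)
                if (has_suffix(name, len, suffixes[i]))
                        return true;
        return false;
}

/* Candidates must be ordered by directory priority, highest first. The first
 * occurrence of a name wins, lower-priority duplicates are dropped. Writes the
 * survivors to out[] (room for n entries) and returns their count. */
static size_t merge_extras(const ExtraCandidate *c, size_t n, ExtraCandidate *out) {
        size_t m = 0;

        for (size_t i = 0; i < n; i++) {
                bool seen = false;

                if (!is_extra_name(c[i].name))
                        continue; /* not a sysext, confext or credential */

                for (size_t j = 0; j < m && !seen; j++)
                        seen = strcmp(out[j].name, c[i].name) == 0;

                if (!seen)
                        out[m++] = c[i];
        }
        return m;
}
```

In this sketch the caller would list candidates from /etc/systemd/uki/ first and /usr/lib/systemd/uki/ last, so an entry in /etc shadows one with the same name further down.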

Also bind io.systemd.SysUpdate.Notify.OnCompletedUpdate() in the boot control
Varlink server, which simply does the same as LinkAuto(), and hook a socket
into /run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-bootctl.socket
(enabled by default via the preset) so a freshly downloaded kernel is linked
into $BOOT automatically after a sysupdate run.

pcrlock: recompute PCR policy on completed system update

Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
pcrlock Varlink server and hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-pcrlock.socket,
enabled by default via the preset. When sysupdate signals a completed
update, we unconditionally re-run make-policy, since the set of measured
components may have changed.

sysupdate: notify hook subscribers after a successful update

Define a new io.systemd.SysUpdate.Notify Varlink interface with a single
OnCompletedUpdate() method, and after sysupdate successfully installs an
update, invoke that method on every socket linked into
/run/systemd/sysupdate/notify/ via varlink_execute_directory(). This
gives other components a hook to react to applied updates (e.g. recompute
a TPM policy, link a freshly downloaded kernel, refresh extensions).
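The fan-out over the notification directory can be pictured with plain POSIX calls. This is a hypothetical sketch, not `varlink_execute_directory()` itself. It follows symlinks, connects to every socket inode in the directory and counts successful connections. The actual Varlink `OnCompletedUpdate()` request is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical: connect to each AF_UNIX socket in dir_path.
 * Returns the number of successful connections, or -1 if the
 * directory cannot be opened. */
static int connect_all_sockets(const char *dir_path) {
        DIR *d = opendir(dir_path);
        struct dirent *de;
        int connected = 0;

        if (!d)
                return -1;

        while ((de = readdir(d))) {
                struct sockaddr_un sa = { .sun_family = AF_UNIX };
                struct stat st;
                int fd, r;

                /* flags = 0 follows symlinks, so sockets linked in count too */
                if (fstatat(dirfd(d), de->d_name, &st, 0) < 0 || !S_ISSOCK(st.st_mode))
                        continue;

                r = snprintf(sa.sun_path, sizeof(sa.sun_path), "%s/%s", dir_path, de->d_name);
                if (r < 0 || (size_t) r >= sizeof(sa.sun_path))
                        continue; /* path too long for sockaddr_un */

                fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
                if (fd < 0)
                        continue;

                if (connect(fd, (struct sockaddr *) &sa, sizeof(sa)) == 0)
                        connected++; /* the OnCompletedUpdate() call would be sent here */

                close(fd);
        }

        closedir(d);
        return connected;
}
```

Entries that are not sockets, such as regular files or the `.` and `..` directories, are skipped, so a subscriber only has to bind a socket (or link one) into the directory.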

The notification carries the component name, the installed version and the
list of updated resources (transfer id + on-disk path). Subscribers are
free to ignore the parameters and just treat the call as a trigger.

Setting SYSTEMD_SYSUPDATE_FORCE_NOTIFY=1 forces the notification to be sent
even when no update was applied (in which case no resource list is included),
so follow-up work can be triggered unconditionally.

Fixes: #35988

vpick: take separate root_fd and dir_fd arguments

Mirror how chaseat() works these days: instead of a single toplevel_fd that
serves as both the root (chroot) boundary and the directory that resolution
starts from, path_pick() now takes a separate root_fd and dir_fd. This lets
callers resolve a path relative to a specific directory fd while confining
symlink and absolute-path resolution to a root directory fd.

All existing callers are updated to pass the same fd for both, preserving
their current behaviour.

units: tag more units correctly with varlink xattrs

These were added in parallel to #42454, hence catch up and add missing
xattrs.

Follow-up for 53fc4c48e7d40293e8f79392e2da91323dd50268

sysupdate: automatically clean up orphaned files after auto-update (#42714)

This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update completed (regardless of whether that update was entirely
successful or not). This ensures that any orphaned files are automatically
cleaned up if they are no longer referenced by any transfer file's patterns.
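Conceptually, a file is orphaned when none of the remaining transfer patterns matches it. The following hypothetical sketch uses shell globs via `fnmatch()` as a stand-in for sysupdate's own pattern syntax, which uses placeholders such as `@v` for the version and is not modelled here.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: true if no pattern owns this file, i.e. it may be removed. */
static bool is_orphaned(const char *filename, const char *const *patterns, size_t n_patterns) {
        for (size_t i = 0; i < n_patterns; i++)
                if (fnmatch(patterns[i], filename, 0) == 0)
                        return false; /* still referenced by some transfer */
        return true;
}
```

If a transfer file is removed, its patterns drop out of the list, and the files it installed become orphans on the next run.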

Follow-up for: d82e256bb9d151b185a8afec1fcacd8fbe80555c

po: Translated using Weblate (Romanian)

Currently translated at 76.2% (218 of 286 strings)

Co-authored-by: Petru Rebeja <petru@rebeja.eu>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ro/
Translation: systemd/main

sysupdate: automatically clean up orphaned files after auto-update

This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update completed (regardless of whether that update was entirely
successful or not). This ensures that any orphaned files are automatically
cleaned up if they are no longer referenced by any transfer file's patterns.

Follow-up for: d82e256bb9d151b185a8afec1fcacd8fbe80555c

test-execute: use per-Exec timeout instead of per-service timeout

The previous x2 was still not enough, and the test is still often killed on
slow GHA CI workers, e.g.:

https://github.com/systemd/systemd/actions/runs/28012425459/job/82908555094?pr=42705

This happens in test units with many commands, so reset the timer when
a command completes and the test advances. The number of Exec
instructions is bounded so this will terminate jobs that are really
stuck anyway.
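The difference between a per-service and a per-Exec deadline can be shown with a small simulation. In this hypothetical sketch each command's duration is known up front. The per-Exec timer restarts whenever a command completes, so only a single slow command trips it, while the total runtime stays bounded by `n_commands * per_exec_timeout`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: durations[i] is how long command i runs (ms).
 * Returns the index of the first command exceeding the per-Exec
 * timeout, or -1 if every command finishes in time. */
static long first_timed_out_exec(const uint64_t *durations, size_t n, uint64_t per_exec_timeout) {
        for (size_t i = 0; i < n; i++)
                /* The timer is reset when the previous command completes,
                 * so each command gets its own full budget. */
                if (durations[i] > per_exec_timeout)
                        return (long) i;
        return -1;
}

/* A single per-service timeout instead measures the running total. */
static long first_timed_out_service(const uint64_t *durations, size_t n, uint64_t service_timeout) {
        uint64_t elapsed = 0;

        for (size_t i = 0; i < n; i++) {
                elapsed += durations[i];
                if (elapsed > service_timeout)
                        return (long) i;
        }
        return -1;
}
```

With three 400 ms commands and a 500 ms limit, the per-service variant fails at the second command although none of them is stuck, while the per-Exec variant lets all three pass.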

Follow-up for 3b00327fe6004b03c4a963de3df51998cf0c79b4

sd-varlink: mark varlink sockets via xattrs (#42454)

Linux 7.0 added the ability to mark socket inodes with xattrs. Let's use
that to clearly mark all our Varlink sockets as being varlink related.
This is then used to implement a very useful new command "varlinkctl
list-sockets" which lists all varlink entrypoint sockets marked this
way.

By marking not just the entrypoint inodes but also the connection
sockets properly, we can one day add an ebpf based "varlinkctl trace"
command that watches varlink sockets for traffic, but that's material
for a later PR.

test: skip fdstore tests if test-fdstore is not available

When the test suite is run in the "standalone" mode, the minimal
container might not contain the test-fdstore binary that's needed for a
couple of tests. Since installing systemd-tests into the minimal
container pulls in a lot of other dependencies, let's just skip the
affected tests instead to avoid this.

update TODO

man: document sd_varlink_server_listen_address() and friends

tree-wide: relax access mode of private Varlink sockets a bit

units: tag all .varlink sockets with the right xattrs

This also relaxes the inode access modes a bit, in case they were set to
0600: we now set the "r" bit too, i.e. use 0644. This is beneficial
since it permits unpriv code to read the xattrs of the entrypoints
(which require read access). Note that in order to be able to connect()
to a socket inode you need write access, hence this shouldn't compromise
security in any way.

varlinkctl: add 'list-sockets' verb

sd-netlink: beef up sock-diag code a bit

Let's make it useful to enumerate AF_UNIX sockets.

bpf-restrict-fsaccess: move STAT_DEV_TO_KERNEL into generic code

We want to reuse it when processing sock-diag messages, hence let's
generalize this.

core: add socket xattr settings for socket unit

varlinkctl: port to new help-util.[ch] apis

sd-varlink: mark varlink sockets and entrypoint inodes as varlink via xattrs

socket-util: add new helper socket_xattr_supported()

xattr-util: use empty_to_null() where appropriate

confidential-virt: fixes to detection and reporting (#42697)

tpm2-setup: call DLOPEN_TPM2 to add dependency and fail immediately if not present

tpm2-setup requires both libcrypto and the tpm2-tss libraries, but so
far it only directly dlopen'ed libcrypto, with a clear error on startup
if missing, and a dependency added via dlopen notes.
Do the same for the tpm2-tss dlopens, to get a clear error and the
required dependencies.

journal: expose last 10 high priority logs as metrics (#42621)

This commit exposes the last 10 high priority logs as metrics so that
systemd-report reports them. The entries are reported as
`io.systemd.Journal.HighPriorityMessage` and include all fields that are
printable as strings.

This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal.

core: fix assertion when inactive unit pulled in by try-restart and start at the same time

With EnqueueUnitJobMany(), one anchor can collapse to NOP (inactive
unit + try-restart) while another anchor pulls that same unit in as a
regular start/restart job, leaving a NOP and a regular job in one
unit's transaction list, hitting an assert:

#11 0x00007f3fd2a446dc in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>,
     function=<optimized out>) at ./assert/assert.c:127
#12 0x00007f3fd326e872 in job_type_lookup_merge (a=<optimized out>, b=<optimized out>) at ../src/core/job.c:428
#13 0x00007f3fd32e5641 in job_type_merge_and_collapse (a=0x7ffc7dda2430, b=<optimized out>, u=0x557bb11434c0)
     at ../src/core/job.c:523
#14 0x00007f3fd335e4b3 in transaction_ensure_mergeable (tr=tr@entry=0x557bb0f6d150,
     matters_to_anchor=matters_to_anchor@entry=true, e=e@entry=0x7ffc7dda33e0) at ../src/core/transaction.c:241
#15 0x00007f3fd3360242 in transaction_merge_jobs (tr=0x557bb0f6d150, e=0x7ffc7dda33e0)
     at ../src/core/transaction.c:273
#16 transaction_activate (tr=0x557bb0f6d150, m=0x557bb0dd9c10, mode=JOB_REPLACE, affected_jobs=0x0, e=0x7ffc7dda33e0)
     at ../src/core/transaction.c:797
#17 0x00007f3fd33091ed in manager_add_jobs (m=<optimized out>, type=<optimized out>, names=<optimized out>,
     reload_if_possible=false, mode=JOB_REPLACE, extra_flags=0, affected_jobs=0x0, reterr_error=0x7ffc7dda33e0,
     ret_jobs=0x557bb0fe8790) at ../src/core/manager.c:2386

Follow-up for 7d3b32daef3125e70dd3f1689fb563a06b0c6753

various measurement-related fixes (#42698)

growfs: downgrade dependency on libcryptsetup to optional

growfs already skips gracefully when cryptsetup fails or is
missing, and it is only necessary when the device is
a LUKS device anyway. Downgrade from required to recommended.

Follow-up for b0ede9f9eebf3f5507e6b3cef9e1de33af7cea68

confidential-virt: fix comment regarding vmm.c location

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

TODO: remove "10 most recent emergency message as metrics" todo

journal: expose last 10 high priority logs as metrics

This commit exposes the last 10 high priority logs as metrics
so that systemd-report reports them. The entries are
reported as `io.systemd.Journal.HighPriorityMessage` and
include all fields, encoded as the new METRIC_FAMILY_TYPE_OBJECT.

Individual fields from a journal entry that are unprintable
(invalid utf-8) are skipped.

This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal.
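Keeping only the most recent high-priority entries amounts to a bounded ring buffer. This is a hypothetical sketch. The capacity of 10 comes from the text. The priority threshold (syslog levels 0 to 3, emerg through err) and all names are assumptions, and entries are reduced to a message string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RECENT_MAX 10
#define HIGH_PRIORITY_MAX 3 /* assumption: LOG_EMERG (0) .. LOG_ERR (3) */

typedef struct {
        const char *messages[RECENT_MAX];
        size_t next;   /* slot that the next entry overwrites */
        size_t count;  /* number of valid slots, at most RECENT_MAX */
} RecentLog;

static void recent_log_add(RecentLog *r, int priority, const char *message) {
        if (priority > HIGH_PRIORITY_MAX)
                return; /* not high priority, ignore */

        r->messages[r->next] = message;
        r->next = (r->next + 1) % RECENT_MAX;
        if (r->count < RECENT_MAX)
                r->count++;
}

/* Returns the i-th retained entry, oldest first (requires i < r->count). */
static const char *recent_log_get(const RecentLog *r, size_t i) {
        size_t oldest = (r->next + RECENT_MAX - r->count) % RECENT_MAX;
        return r->messages[(oldest + i) % RECENT_MAX];
}
```

Once the buffer is full, each new high-priority entry overwrites the oldest one, so memory use stays constant regardless of how many errors are logged.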

shared: add OUTPUT_SKIP_UNPRINTABLE to log-show

This commit adds a new OUTPUT_SKIP_UNPRINTABLE to the OutputFlags
and adds code in `update_json_data` and `json_escape` to honor it.

When set, all JSON fields that have unprintable data are skipped
and `null` is sent instead.
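A hypothetical sketch of the per-field decision: a value is emitted as a JSON string only if it is well-formed UTF-8 without control characters. Otherwise it would be serialized as `null`. The exact rules below (reject overlong encodings, surrogates, code points above U+10FFFF and C0/C1 controls other than tab and newline) are assumptions for illustration, not a copy of systemd's own UTF-8 helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if s[0..n) is well-formed UTF-8 without control characters
 * (tab and newline allowed). */
static bool field_is_printable(const unsigned char *s, size_t n) {
        size_t i = 0;

        while (i < n) {
                unsigned char c = s[i];
                uint32_t cp;
                size_t len;

                if (c < 0x80) {
                        if ((c < 0x20 && c != '\t' && c != '\n') || c == 0x7f)
                                return false; /* ASCII control character */
                        i++;
                        continue;
                }

                if ((c & 0xe0) == 0xc0) { len = 2; cp = c & 0x1f; }
                else if ((c & 0xf0) == 0xe0) { len = 3; cp = c & 0x0f; }
                else if ((c & 0xf8) == 0xf0) { len = 4; cp = c & 0x07; }
                else return false; /* stray continuation or invalid lead byte */

                if (n - i < len)
                        return false; /* truncated sequence */

                for (size_t k = 1; k < len; k++) {
                        if ((s[i + k] & 0xc0) != 0x80)
                                return false;
                        cp = (cp << 6) | (s[i + k] & 0x3f);
                }

                if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
                        return false; /* overlong encoding */
                if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
                        return false; /* out of range or UTF-16 surrogate */
                if (cp < 0xa0)
                        return false; /* C1 control character (U+0080..U+009F) */

                i += len;
        }
        return true;
}
```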

metrics: add METRIC_FAMILY_TYPE_OBJECT type

We will need a way to send journal entries as metrics. Those are already
JSON objects, so Lennart suggested introducing a new type,
METRIC_FAMILY_TYPE_OBJECT, for this. This commit implements
his suggestion.

boot: read the TDX CPUID leaf unconditionally

vmm.c carries the confidential-VM detection used by sd-boot/sd-stub.
Its detect_tdx() had the same dead guard as the userspace copy: it
gated the 0x21 read on CPUID_GET_HIGHEST_FUNCTION (0x80000000, the
extended max function), which is always >= 0x80000000, so the guard
never held.

Mirror the userspace fix: read leaf 0x21 directly and rely on the
IntelTDX signature, matching the kernel. An out-of-range CPUID leaf
returns the highest basic leaf's data (no fault), and 0x21 is a
synthetic TDX leaf whose presence need not be reflected in the max
basic function, so it must not be gated on it.

Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

confidential-virt: read the TDX CPUID leaf unconditionally

detect_tdx() guarded the read of the TDX enumeration leaf (0x21, a
standard leaf) with CPUID_GET_HIGHEST_FUNCTION (0x80000000), which
returns the highest *extended* function. eax is therefore always
>= 0x80000000, so the "eax < 0x21" guard never held and the leaf was
read unconditionally anyway.

Drop the guard rather than re-gate it on the basic max function
(leaf 0), and read 0x21 directly, relying on the IntelTDX signature
compare. This matches the kernel, which reads the leaf unconditionally
on purpose: an out-of-range CPUID leaf returns the highest basic leaf's
data (no fault, per the Intel SDM), and 0x21 is a synthetic TDX leaf
whose presence need not be reflected in the reported max basic function,
so gating the read on it risks missing a genuine TDX guest. With no
guard the Hyper-V isolation fallback (Azure TDX guests have 0x21
blocked) also stays reachable.
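A minimal sketch of the unguarded check, assuming GCC or Clang's `<cpuid.h>` on x86. The 12-byte signature is assembled from EBX, EDX and ECX in that order and compared against "IntelTDX" padded with four spaces, which is how the kernel's `tdx_early_init()` does it as far as we know. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define TDX_CPUID_LEAF 0x21

/* Compare the 12-byte signature spread over EBX, EDX, ECX. */
static bool tdx_signature_matches(uint32_t ebx, uint32_t ecx, uint32_t edx) {
        char sig[12];

        memcpy(sig + 0, &ebx, sizeof(ebx));
        memcpy(sig + 4, &edx, sizeof(edx));
        memcpy(sig + 8, &ecx, sizeof(ecx));
        return memcmp(sig, "IntelTDX    ", sizeof(sig)) == 0;
}

static bool detect_tdx_sketch(void) {
#if defined(__i386__) || defined(__x86_64__)
        unsigned int eax, ebx, ecx, edx;

        /* No max-leaf guard: an out-of-range leaf returns the highest basic
         * leaf's data without faulting, and that data will not carry the
         * TDX signature. */
        __cpuid_count(TDX_CPUID_LEAF, 0, eax, ebx, ecx, edx);
        (void) eax;
        return tdx_signature_matches(ebx, ecx, edx);
#else
        return false; /* TDX only exists on x86 */
#endif
}
```

Relying only on the signature compare keeps the check correct whether or not leaf 0x21 is reported by the max basic function, which is the point of dropping the guard.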

Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

confidential-virt: treat an unreadable SEV MSR as confidential

msr() returned 0 on failure, indistinguishable from a real MSR value of
0. With /dev/cpu/0/msr unavailable (e.g. the msr module not loaded in the
initrd), detect_sev() read 0 and reported a genuine SEV/-ES/-SNP guest as
CONFIDENTIAL_VIRTUALIZATION_NONE.

That inverts the firmware-credential trust gate: import_credentials_*()
skip fw_cfg/SMBIOS credentials only when detect_confidential_virtualization()
is > 0 ("don't trust firmware in confidential VMs"). A false NONE makes a
confidential guest trust and import credentials injected by the untrusted
hypervisor.

msr() now returns a negative errno, and detect_sev() assumes plain SEV when
the MSR is unreadable but CPUID already advertised SEV under a hypervisor,
so the gate still trips.

The conservative branch only fires when CPUID already advertised SEV, i.e.
for a guest the hypervisor marked SEV-capable. QEMU gates that CPUID leaf on
the SEV launch object and does not expose it to ordinary guests even under
-cpu host, so it does not misfire for non-confidential guests. Were a
hypervisor to expose the bit anyway the outcome is fail-safe (we only
decline to trust firmware-supplied data); nothing in-tree branches on the
specific SEV tier.
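A hypothetical sketch of the error-aware MSR read and the fail-safe classification. It assumes the SEV status MSR at index 0xC0010131 (Linux's `MSR_AMD64_SEV`) with bit 0 = SEV, bit 1 = SEV-ES and bit 2 = SEV-SNP, read through the msr driver's `/dev/cpu/0/msr` with the MSR index as the file offset. The enum and function names are illustrative.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define MSR_AMD64_SEV 0xc0010131u /* assumed SEV status MSR index */

typedef enum { CV_NONE, CV_SEV, CV_SEV_ES, CV_SEV_SNP } ConfVirt;

/* Returns 0 and stores the value, or a negative errno (never a fake 0). */
static int msr_read(uint32_t index, uint64_t *ret) {
        uint64_t v;
        ssize_t n;
        int fd, saved;

        fd = open("/dev/cpu/0/msr", O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return -errno; /* e.g. msr module not loaded in the initrd */

        n = pread(fd, &v, sizeof(v), (off_t) index);
        saved = errno;
        close(fd);

        if (n < 0)
                return -saved;
        if ((size_t) n != sizeof(v))
                return -EIO;

        *ret = v;
        return 0;
}

/* cpuid_advertises_sev: CPUID reported SEV while running under a hypervisor */
static ConfVirt classify_sev(int msr_r, uint64_t msr, bool cpuid_advertises_sev) {
        if (!cpuid_advertises_sev)
                return CV_NONE;
        if (msr_r < 0)
                return CV_SEV; /* unreadable: fail safe, assume plain SEV */
        if (msr & (UINT64_C(1) << 2))
                return CV_SEV_SNP;
        if (msr & (UINT64_C(1) << 1))
                return CV_SEV_ES;
        if (msr & UINT64_C(1))
                return CV_SEV;
        return CV_NONE;
}
```

Because the read error is propagated separately from the value, a genuine 0 and "could not read" no longer collapse into the same answer.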

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

pcrextend: refuse empty measurement over Varlink

vl_method_extend() accepted an empty text/data value and measured an
empty word, bypassing the empty-word refusal the CLI path already
enforces. Measured words are joined with ":" in the record, so an empty
word is ambiguous. Reject it.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

tpm2-util: refuse NvPCR extend when the NV index is gone

tpm2_index_to_handle() returns 0 with a NULL handle when the NV index is
not present on the TPM. tpm2_nvpcr_extend_bytes() only checked for r < 0,
so a tombstoned NvPCR (anchor file present, NV slot cleared out from under
us) passed the NULL handle to tpm2_extend_nvpcr_nv_index() and aborted the
process via its assert(). Handle r == 0 explicitly, as the other
tpm2_index_to_handle() callers already do.

The newly introduced -ENODEV is mapped together with -ENOENT to the
io.systemd.PCRExtend.NoSuchNvPCR varlink error.
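The fix follows the usual three-way return convention: negative errno on failure, 0 for "not found", positive for "found". A hypothetical sketch with stand-in names (only the Varlink error name is taken from the text). The in-memory table of present NV indexes replaces the real TPM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t nv_index; } NvHandle;

/* Hypothetical stand-in for the TPM: the NV indexes that currently exist. */
static NvHandle present[] = { { 0x01d10200 } };

/* Returns <0 on error, 0 with *ret = NULL if the index is absent, 1 if found. */
static int index_to_handle(uint32_t nv_index, NvHandle **ret) {
        for (size_t i = 0; i < sizeof(present) / sizeof(present[0]); i++)
                if (present[i].nv_index == nv_index) {
                        *ret = &present[i];
                        return 1;
                }
        *ret = NULL;
        return 0;
}

static int nvpcr_extend(uint32_t nv_index) {
        NvHandle *h;
        int r = index_to_handle(nv_index, &h);

        if (r < 0)
                return r;
        if (r == 0)
                return -ENODEV; /* tombstoned: refuse instead of asserting on NULL */

        assert(h); /* safe now: r > 0 guarantees a handle */
        /* ... the actual extend operation would go here ... */
        return 0;
}

/* Both "anchor missing" and "NV index gone" map to the same Varlink error. */
static const char *varlink_error_for(int r) {
        if (r == -ENOENT || r == -ENODEV)
                return "io.systemd.PCRExtend.NoSuchNvPCR";
        return NULL;
}
```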

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

imds: expose imds info fields also as metrics (#42409)

report: add systemd-report-sign-tsm backend (#42683)

Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.

A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.

core: add method to enqueue multiple jobs in a single call (#42182)

Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.

Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.

test: add coverage for multi-unit transactions

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

portablectl: use new EnqueueUnitJobMany() when available

systemctl: use new EnqueueUnitJobMany() when available

Fixes https://github.com/systemd/systemd/issues/7877
Replaces https://github.com/systemd/systemd/pull/7947

homed: fix min_free tracking in manager_rebalance_calculate()

min_free is supposed to track the minimum free space across all home
directories to scale the next rebalance interval. However, it was
incorrectly assigned h->rebalance_size (the home's current total
allocation) instead of new_free (the remaining allocatable space).

This caused the rebalance interval to be computed from allocation sizes
rather than free space, so a nearly-full home would not trigger the
shorter intervals it should, delaying response to low-space conditions.
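The corrected bookkeeping reduces to a running minimum over the per-home free space. A hypothetical sketch, where the struct is illustrative and only `rebalance_size` and `new_free` correspond to names in the text:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
        uint64_t rebalance_size; /* current total allocation of the home */
        uint64_t new_free;       /* remaining allocatable space after rebalancing */
} HomeAlloc;

/* Smallest free space across all homes, UINT64_MAX if there are none. */
static uint64_t min_free_of(const HomeAlloc *homes, size_t n) {
        uint64_t min_free = UINT64_MAX;

        for (size_t i = 0; i < n; i++)
                if (homes[i].new_free < min_free)
                        min_free = homes[i].new_free; /* the bug stored rebalance_size here */

        return min_free;
}
```

With a large, nearly full home (allocation 100, free 5) and a small, mostly empty one (allocation 10, free 50), the minimum is 5, so the next interval is scaled by the home that is actually running low.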

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

repart: Sort the partition list by partition offset (#42488)

Currently the partition list is ordered like this: First come the
partitions that exist as definition files (could be pre-existing
partitions or could be new ones), then come the pre-existing partitions
that aren't matched to a definition file.

This ordering is visible to the user when we print our partition table,
and it doesn't really make sense from a UX perspective: Partition tables
are usually either presented in order of the partition indices, or in
order of the partition offsets. Arguably the latter would be nicer here,
since the visualization below is already ordered by physical offsets.

So reorder the list after we assigned the new partitions to their
respective free areas, according to the physical offset (or, for
partitions to newly create, the order that we will allocate them in).
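Once every partition has an offset, including the planned one for partitions still to be created, the reordering is an ordinary sort. A hypothetical sketch with `qsort()`. The struct and the tie-break on the original index, which keeps the result deterministic, are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
        const char *label;
        uint64_t offset; /* existing offset, or the planned one for new partitions */
        size_t index;    /* original position in the list, used as tie-breaker */
} PartitionEntry;

static int partition_cmp(const void *a, const void *b) {
        const PartitionEntry *x = a, *y = b;

        if (x->offset != y->offset)
                return x->offset < y->offset ? -1 : 1;
        if (x->index != y->index)
                return x->index < y->index ? -1 : 1;
        return 0;
}

static void sort_partitions_by_offset(PartitionEntry *p, size_t n) {
        qsort(p, n, sizeof(*p), partition_cmp);
}
```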

Another potential upside of this is that the code could now rely more
on the partition order, too.

To ensure it keeps working, also add an integration test for it.

Screenshot before: [image: Screenshot From 2026-06-05 00-58-07]

Screenshot after: [image: Screenshot From 2026-06-05 00-58-16]

imds: use help-util.h helpers for --help output

Convert the --help text of systemd-imds and systemd-imdsd to the common
help_cmdline()/help_abstract()/help_section()/help_man_page_reference()
helpers, for a uniform output style across tools.

imds: expose instance metadata as an io.systemd.Metrics provider

When systemd-imds is invoked as a Varlink service (via the new
systemd-imds-metrics.socket), it now acts as an io.systemd.Metrics
provider for systemd-report. It connects to systemd-imdsd over the
existing io.systemd.InstanceMetadata interface to acquire the real
data and re-exposes the detected cloud vendor plus the well-known
hostname, region, zone and public IPv4/IPv6 fields as metrics in the
io.systemd.InstanceMetadata.* namespace.

The metrics logic lives entirely on the client side
(imds-tool-metrics.c); systemd-imdsd is unchanged. Each metric is
acquired on demand with a blocking call to the daemon, benefiting from
its local cache. Fields that are unset or unsupported by the vendor are
simply omitted.

The metrics socket is statically enabled into sockets.target.wants/.

imds: fix logging

Follow our coding style rules and make functions that log about most
errors log about all errors.

oci-util: fix and harden oci_registry_is_valid()

- Pass colon+1 (port string) instead of s (hostname) to safe_atou16,
  so host:port registries are no longer always rejected.
- Switch to safe_atou16_full() with base-10 and strict flags to reject
  non-decimal port forms (hex, octal, leading whitespace, sign prefix)
  that would produce malformed URL authorities.
- Reject empty host explicitly via isempty() guard (covers both NULL
  and empty-string input), and guard colon == n to reject ':port' form,
  since dns_name_is_valid('') == 1 (DNS root) would otherwise accept
  empty host as valid.
- Wrap overlong line to fit 109-column limit.
- Add test coverage for oci_registry_is_valid().
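The host:port rules above can be sketched as a stand-alone function. This is hypothetical. The host check is a simplified label syntax rather than `dns_name_is_valid()`. The port check accepts only plain decimal digits within the 16-bit range. Upstream's exact handling of port 0 or leading zeros is not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Simplified host check: non-empty labels of [A-Za-z0-9-] separated by dots. */
static bool host_is_valid(const char *s, size_t n) {
        size_t label = 0;

        if (n == 0)
                return false; /* empty host, e.g. ":5000" */

        for (size_t i = 0; i < n; i++) {
                unsigned char c = (unsigned char) s[i];

                if (c == '.') {
                        if (label == 0)
                                return false; /* empty label */
                        label = 0;
                } else if (isalnum(c) || c == '-')
                        label++;
                else
                        return false;
        }
        return label > 0;
}

/* Strict decimal port: digits only, no sign, whitespace or 0x prefix. */
static bool port_is_valid(const char *s) {
        unsigned long v = 0;

        if (*s == '\0')
                return false;
        for (; *s; s++) {
                if (*s < '0' || *s > '9')
                        return false;
                v = v * 10 + (unsigned long) (*s - '0');
                if (v > 65535)
                        return false;
        }
        return true;
}

static bool registry_is_valid(const char *s) {
        const char *colon;

        if (!s || !*s)
                return false;

        colon = strchr(s, ':');
        if (!colon)
                return host_is_valid(s, strlen(s));

        /* The part after the colon is validated as the port, not the host. */
        return host_is_valid(s, (size_t) (colon - s)) && port_is_valid(colon + 1);
}
```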

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

Add handling for '-1' when parsing vsock CID (#42654)

Currently `systemd-ssh-generator` supports
`systemd.ssh_listen=vsock::22` and aliases the "empty CID" to
`VMADDR_CID_ANY`. Since VMADDR_CID_ANY is -1, it is confusing for users
that `systemd.ssh_listen=vsock:-1:22` isn't supported.
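A hypothetical parser illustrating the aliasing. Both the empty string and "-1" map to `VMADDR_CID_ANY`, which `<linux/vm_sockets.h>` defines as `-1U` (0xFFFFFFFF). Anything else must be an unsigned decimal number. The function name and strictness rules are assumptions, not the generator's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define CID_ANY UINT32_MAX /* same value as VMADDR_CID_ANY (-1U) */

/* Returns 0 and stores the CID, or -EINVAL for anything unparsable. */
static int parse_vsock_cid(const char *s, uint32_t *ret) {
        uint64_t v = 0;

        if (*s == '\0' || strcmp(s, "-1") == 0) {
                *ret = CID_ANY; /* "vsock::22" and "vsock:-1:22" mean the same */
                return 0;
        }

        for (const char *p = s; *p; p++) {
                if (*p < '0' || *p > '9')
                        return -EINVAL; /* other negative numbers, signs, junk */
                v = v * 10 + (uint64_t) (*p - '0');
                if (v > UINT32_MAX)
                        return -EINVAL;
        }

        *ret = (uint32_t) v;
        return 0;
}
```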

report: add systemd-report-sign-tsm backend

A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

shared: add configfs-tsm attestation report helper

Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

Translations update from Fedora Weblate (#42699)

Translations update from [Fedora
Weblate](https://translate.fedoraproject.org) for
[systemd/main](https://translate.fedoraproject.org/projects/systemd/main/).

po: Translated using Weblate (Romanian)

Currently translated at 74.4% (213 of 286 strings)

Co-authored-by: Fedora Weblate user 1831 <atony076@users.noreply.translate.fedoraproject.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ro/
Translation: systemd/main

po: Translated using Weblate (Korean)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: 김인수 <simmon@nplob.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ko/
Translation: systemd/main

veritysetup: don't measure root hash signature after unsigned fallback

verb_attach() falls back to unsigned activation (crypt_activate_by_volume_key)
when signed activation fails, but still passed the signature to
pcrextend_verity_now(). The signer is parsed out of the (unverified)
signature and folded into the dm_verity NvPCR measurement, making an
unsigned fallback indistinguishable from a genuinely signed activation to
an attester. Only measure the signature when signed activation succeeded.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

manager: make systemd+executor a multicall binary

Allow systemd and systemd-executor to be compiled into a single binary.
The existing -Dlink-executor-shared=true|false is extended to also
allow -Dlink-executor-shared=single (*). The new mode is opt-in,
to allow experimentation and introduce this smoothly.

This saves a little space, but not as much as I expected:
$ ls -l build/{systemd,systemd-executor} build-new/systemd
-rwxr-xr-x 1 zbyszek zbyszek 631520 May 25 22:44 build/systemd
-rwxr-xr-x 1 zbyszek zbyszek 670464 May 25 22:44 build/systemd-executor
-rwxr-xr-x 1 zbyszek zbyszek 1214488 May 25 22:45 build-new/systemd
(This is with -Dbuildtype=debugoptimized -Db_lto=true).
The combined binary is slightly smaller than the sum of the separate
ones, but not much. In both cases, the binaries are linked to
libsystemd-core which is 10MB, so the size of the binaries themselves
doesn't make much of a difference. The executor needs exec-invoke.c
which is huge and not shared with anything else.

Longer term, I want to allow systemd to be linked statically. In
that case, having systemd-executor separate would be very painful.
So the option to use a multicall binary will be necessary.

Previously, we stored the resolved path to systemd-executor and
used it as argv[0]. I don't think this was useful. After all, normally
we would use the non-resolved original path as argv[0]. So that
part is dropped, and the resolved path is only logged, but
"systemd-executor" is always used as argv[0]. This makes the
multicall binary work reliably, no matter what the actual file
name is.
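With a multicall binary the entry point is chosen from the invocation name. This is why always passing "systemd-executor" as argv[0] makes the dispatch independent of the file name on disk. A hypothetical sketch, where the enum and function name are illustrative:

```c
#include <assert.h>
#include <string.h>

typedef enum { ENTRY_MANAGER, ENTRY_EXECUTOR } EntryPoint;

/* Dispatch on the last path component of argv[0]. */
static EntryPoint pick_entry_point(const char *argv0) {
        const char *slash = strrchr(argv0, '/');
        const char *name = slash ? slash + 1 : argv0;

        return strcmp(name, "systemd-executor") == 0 ? ENTRY_EXECUTOR : ENTRY_MANAGER;
}
```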

(*) This means that compatibility at the command-line level is maintained:
'meson setup build -Dlink-executor-shared=true …' works as before.
Unfortunately, when using an existing build directory, meson chokes
on the type change and refuses to reconfigure the directory or change
the option or do anything useful. I think meson is DTWT here, but
this is hard to fix. So the build directory probably needs to be
recreated.

sysupdate: keep database of installed files/patterns, and use to GC them (#42646)

Transfer files might come and go, components might be enabled and
disabled. Patterns might change. Let's keep track of what we install, so
that we can automatically gc everything no longer owned by any enabled
transfer.

machine-tags: extend syntax to support key/value pairs (#42618)

This is a minor extension, to move the machine tags concept more closely
towards what higher-level solutions support for tagging machines, such
as kubernetes, simply to reduce the conceptual impedance mismatch.

resolved: load libcrypto/libssl lazily on first use and make them optional (#42681)

Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
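Lazy initialization on first use can be sketched generically. The loader runs only when a code path actually needs the libraries, and never if DNS-over-TLS is not configured. This is a hypothetical sketch. The fake loader stands in for the real `dlopen()` calls, and thread safety is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
        bool initialized;
        int result;          /* cached outcome of the first load attempt */
        int (*load)(void);   /* would dlopen() libssl/libcrypto */
} LazyLib;

/* Runs the loader at most once, later calls return the cached result. */
static int lazy_lib_ensure(LazyLib *l) {
        if (!l->initialized) {
                l->result = l->load();
                l->initialized = true;
        }
        return l->result;
}

/* Hypothetical stand-in loader that counts how often it is invoked. */
static unsigned load_calls = 0;
static int fake_load(void) {
        load_calls++;
        return 0;
}
```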

Expand specifiers in `MakeSymlinks=` target in `repart.d` (#42694)

Closes #42693. Specifiers are now expanded in symlink targets
(previously, they were only expanded in the source) - this is
technically a breaking change, but I'd be very surprised if anyone was
relying on this.

No other simplification is applied to the target (unlike the source,
which goes through `path_simplify_and_warn`).

Also a few minor changes:

- rename local `path` variable to `source` to match documentation
  convention
- document that `MakeSymlinks=` accepts specifiers
- fix error message to print `MakeSymlinks=` option instead of
  `Subvolumes=`

systemctl: add --kernel-cmdline-reuse option

kexec-tools has a --reuse-cmdline option which is very convenient
when doing a lot of reboots; add the same to systemctl.
Deduplicate options, letting the last one win in case of duplicates,
so that 'systemctl kexec --reuse-cmdline' can be chained many times
without continuously expanding the cmdline with duplicates from
the boot entry.
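"Last one wins" deduplication can be sketched by keeping a word only if no later word has the same key (the part before "="). This is hypothetical. It splits on spaces only and ignores quoting, which a real kernel command line parser has to handle. It also handles at most 256 words.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Length of the key part of a "key=value" (or bare "key") word. */
static size_t key_len(const char *w) {
        const char *eq = strchr(w, '=');
        return eq ? (size_t) (eq - w) : strlen(w);
}

static bool same_key(const char *a, const char *b) {
        size_t la = key_len(a);
        return la == key_len(b) && strncmp(a, b, la) == 0;
}

/* Returns a new string in which each key appears once, at the position of
 * its last occurrence (at most 256 words). Caller frees. NULL on OOM. */
static char *cmdline_dedup(const char *cmdline) {
        char *copy = strdup(cmdline), *out = calloc(strlen(cmdline) + 1, 1);
        char *words[256];
        size_t n = 0;

        if (!copy || !out) {
                free(copy);
                free(out);
                return NULL;
        }

        for (char *w = strtok(copy, " "); w && n < 256; w = strtok(NULL, " "))
                words[n++] = w;

        for (size_t i = 0; i < n; i++) {
                bool overridden = false;

                for (size_t j = i + 1; j < n && !overridden; j++)
                        overridden = same_key(words[i], words[j]);
                if (overridden)
                        continue; /* a later word with the same key wins */

                if (*out)
                        strcat(out, " ");
                strcat(out, words[i]);
        }

        free(copy);
        return out;
}
```

Because a reused command line is appended to the boot entry's own options, applying such a pass after each kexec keeps the line from growing with every reboot.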

ci: add test-case for new cleanup logic

sysupdate: port to new help-util.[ch] apis

As usual for stuff we touch, let's modernize the --help texts to our new
APIs.

sysupdate: introduce "installdb" that keeps track of installed resources

Let's make sure we keep track of any file we drop into the system via a
database in /var/. This database is implemented based on symlinks, i.e.
reuses the fs as a simple database. Given that the database will most
likely have only <= 10 entries (as we store *patterns* of installed file
paths in them, not the file paths themselves), this should be very
efficient.
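Symlinks work as a tiny key/value store because a symlink target can hold an almost arbitrary string, written with `symlinkat()` and read back with `readlinkat()`. The layout below (one symlink per entry, named by a key and pointing to a stored pattern) is a hypothetical illustration only. The real format is described at the top of src/sysupdate/sysupdate-cleanup.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Store pattern under key: dir_fd/key becomes a symlink pointing to it.
 * Replacing an existing entry is not atomic in this sketch. */
static int db_put(int dir_fd, const char *key, const char *pattern) {
        if (unlinkat(dir_fd, key, 0) < 0 && errno != ENOENT)
                return -errno;
        return symlinkat(pattern, dir_fd, key) < 0 ? -errno : 0;
}

/* Read the pattern stored under key into buf (NUL-terminated). */
static int db_get(int dir_fd, const char *key, char *buf, size_t size) {
        ssize_t n = readlinkat(dir_fd, key, buf, size);

        if (n < 0)
                return -errno;
        if ((size_t) n >= size)
                return -ENAMETOOLONG; /* target did not fit, possibly truncated */
        buf[n] = '\0';                /* readlinkat() does not terminate */
        return 0;
}
```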

For implementation details see comments at top of
src/sysupdate/sysupdate-cleanup.c.

sysupdate: some smaller clean-ups

Nothing earth shattering, just some minor tweaks.

sysupdate: split out component validation/enumeration into sysupdate-util.[ch]

Just some refactoring.

The code is slightly updated, for example it now uses string_is_safe().
But mostly this just splits out code from one large file to a smaller
one.

string-util: introduce STRING_FILENAME_PART flag for string_is_safe()

Whenever we are validating a string that shall eventually appear as part
of a filename, we want to use filename_part_is_valid() rather than
filename_is_valid(). Let's add explicit support for that to
string_is_safe(), since it's actually a really common case.

recurse-dir: optionally, only enumerate dentries of a specific type

At various places we filter directory enumerations by inode type. Let's
add explicit support for that, so that the "struct dirent" array we
return already suppresses them.

This shortens code and makes things more robust.
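Filtering an enumeration by inode type looks roughly like this in plain POSIX. This is a hypothetical analogue, not the recurse-dir API. It uses `d_type` when the file system fills it in and falls back to `fstatat()` for `DT_UNKNOWN`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a stat mode to the matching DT_* constant. */
static unsigned char mode_to_dtype(mode_t m) {
        return S_ISDIR(m) ? DT_DIR : S_ISREG(m) ? DT_REG : S_ISLNK(m) ? DT_LNK :
               S_ISSOCK(m) ? DT_SOCK : S_ISFIFO(m) ? DT_FIFO : S_ISCHR(m) ? DT_CHR :
               S_ISBLK(m) ? DT_BLK : DT_UNKNOWN;
}

/* Count entries of dir_path whose type is dtype, skipping "." and "..". */
static int count_entries_of_type(const char *dir_path, unsigned char dtype) {
        DIR *d = opendir(dir_path);
        struct dirent *de;
        int n = 0;

        if (!d)
                return -1;

        while ((de = readdir(d))) {
                unsigned char t = de->d_type;

                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;

                if (t == DT_UNKNOWN) { /* some file systems don't fill in d_type */
                        struct stat st;

                        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) < 0)
                                continue;
                        t = mode_to_dtype(st.st_mode);
                }

                if (t == dtype)
                        n++;
        }

        closedir(d);
        return n;
}
```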

sha256: add sha256_direct_hex() helper

At various places we need the SHA256 sum of something as a string.
Let's add a simple generic helper for this.

core: add method to enqueue multiple jobs in a single call

Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.

Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.

Fixes https://github.com/systemd/systemd/issues/8102

Co-authored-by: Michal Koutný <mkoutny@suse.com>

btrfs-util,rm-rf: clean up subvolumes without user_subvol_rm_allowed

Without CAP_SYS_ADMIN and without the 'user_subvol_rm_allowed' mount
option, BTRFS_IOC_SNAP_DESTROY is rejected with EPERM (or EROFS for a
read-only subvolume), so rm_rf_subvolume() left subvolumes behind.
test-btrfs thus accumulated leftover subvolumes in /var/tmp on every
unprivileged run on a btrfs filesystem.

An unprivileged owner can however clear the RDONLY flag, empty a
subvolume and rmdir() it. So clear the RDONLY flag on EPERM/EACCES too
(not just EROFS) to leave the subvolume writable, and let rm_rf() fall
through on EPERM/EACCES to empty the subvolume recursively and rmdir()
it, matching what rm_rf_at() already did.

Fixes https://github.com/systemd/systemd/issues/42674

report-basic, networkd: add Version, KernelTimestamp, Address metrics (#42315)

This PR adds some more useful metrics:
- io.systemd.Network.Address
- io.systemd.Basic.KernelTimestamp.{Realtime,Monotonic}
- io.systemd.Basic.Version

ssl-util: support OpenSSL 4 (#42676)

OpenSSL 4 broke ABI, so we need to look for both SONAMEs.

Follow-up for
https://github.com/systemd/systemd/commit/ccdd42351f79cbb9c2e034a96280a1ded40a2f95

Fixes https://github.com/systemd/systemd/issues/42675

resolve: fix transaction leak in dns_transaction_new() error path

hashmap_replace() failure left t in s->manager->dns_transactions with
t->scope still NULL, causing the destructor to skip hashmap_remove().
Add the missing cleanup mirroring the earlier error path in the same
function.

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

resolved: load libcrypto/libssl lazily on first use and make them optional

Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if
DNS-over-TLS is not enabled in the config, it never runs and the
libraries are never dlopened, so they can be downgraded to recommends.
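The lazy-initialization pattern can be sketched as below. All names are hypothetical, the loader is injected as a function pointer (a real one would dlopen() libssl/libcrypto), and caching a failed load so it is not retried on every query is an assumption of this sketch, not necessarily what resolved does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of an optional, dlopen()ed TLS backend. */
typedef enum {
        TLS_UNLOADED,      /* not attempted yet (zero-initialized default) */
        TLS_READY,
        TLS_UNAVAILABLE,
} TlsState;

typedef struct {
        bool dns_over_tls_enabled;   /* from the configuration */
        TlsState tls_state;
        int (*load)(void);           /* would dlopen() the libraries; <0 on failure */
} Manager;

/* Returns 0 if TLS is not needed, 1 if it is loaded, -1 if it is needed
 * but unavailable. The libraries are only loaded on the first call that
 * actually needs them, and the outcome is cached. */
static int manager_ensure_tls(Manager *m) {
        if (!m->dns_over_tls_enabled)
                return 0;

        if (m->tls_state == TLS_UNLOADED)
                m->tls_state = m->load() >= 0 ? TLS_READY : TLS_UNAVAILABLE;

        return m->tls_state == TLS_READY ? 1 : -1;
}

/* Stub loader for demonstration: counts how often loading is attempted. */
static int load_calls = 0;

static int stub_load(void) {
        load_calls++;
        return 0;
}
```

With DNS-over-TLS disabled, the loader is never called, which is what allows the libraries to become optional.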

ssl-util: add cleanup helper for SSL_CTX

journal: add catalog message for missing dlopen dep

log: add log_struct_once macro

Combines log_once and log_struct
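A minimal sketch of the "at most once per call site" idea behind such a macro. The ONCE() macro and log_fields() are hypothetical stand-ins, not systemd's log_once/log_struct. Each macro expansion gets its own static flag. The flag is not atomic, so this sketch is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Evaluate stmt at most once per expansion site. */
#define ONCE(stmt)                                      \
        do {                                            \
                static bool once_done_ = false;         \
                if (!once_done_) {                      \
                        once_done_ = true;              \
                        stmt;                           \
                }                                       \
        } while (false)

static int emitted = 0;

/* Stand-in for a structured logger: prints KEY=VALUE fields. */
static void log_fields(const char *message_id, const char *message) {
        emitted++;
        fprintf(stderr, "MESSAGE_ID=%s MESSAGE=%s\n", message_id, message);
}

static void warn_missing_dep(void) {
        ONCE(log_fields("hypothetical-id", "Optional dependency missing."));
}
```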

tree-wide: Beef up openssl logging

Let's translate openssl's errors to proper errnos
where we can instead of returning EIO for everything.
Let's also make log_openssl_errors() public so we can
use it everywhere and migrate the rest of the codebase
to use it.

repart: make vfat creation reproducible (#42446)

Two fixes to get this byte-stable:

- `fd_copy_directory()` was using `FOREACH_DIRENT_ALL`, which doesn't
give stable ordering. Read all paths, sort, then iterate.
- `mcopy -s` depends on `readdir()` ordering and thus isn't
reproducible. Implement the recursion/sorting here and only invoke
mcopy/mmd per dir.

The first change increases memory usage, as we no longer stream the
paths; the second increases the number of context switches when invoking
external tools. Both should be fine, given that the ESP content is
usually fairly small.
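The "read all paths, sort, then iterate" approach can be sketched with POSIX scandir(). This is an illustration, not the repart code. The comparator is plain strcmp() rather than alphasort(), because alphasort() uses strcoll() and would make the order depend on the build host's locale (that choice is part of the sketch).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Skip "." and "..". */
static int not_dot(const struct dirent *de) {
        return strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0;
}

/* Byte-wise, locale-independent ordering. */
static int cmp_bytes(const struct dirent **a, const struct dirent **b) {
        return strcmp((*a)->d_name, (*b)->d_name);
}

/* Read the whole directory, sort it, and return the entries in a stable
 * order. Returns the number of entries, or -1 on error; the caller frees
 * each entry and the array with free(). */
static int read_dir_sorted(const char *path, struct dirent ***ret) {
        return scandir(path, ret, not_dot, cmp_bytes);
}
```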

I'd like to write a test for this, but didn't come up with a way that
doesn't require privileges and would surface the error reliably.

shared/tpm2: support chunked reads of NV indexes

The TPM2_NV_Read command returns the requested data in a
TPM2B_MAX_NV_BUFFER type, the maximum size of which is TPM-specific and
can be determined by querying the value of the TPM_PT_NV_BUFFER_MAX
property.

This value may be smaller than the payload size of some NV indexes,
particularly when that payload is an X.509 certificate with an RSA
public key. E.g., the manufacturer-supplied RSA EK certificate on my own
machine has a size of 1035 bytes, while TPM_PT_NV_BUFFER_MAX is 1024.

To handle this case and make it possible to read any EK certificate from
the TPM, make tpm2_read_nv_index support chunked reads when the payload
size is larger than what the TPM can return in a single command.
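The chunking loop can be sketched as follows. The single-command read is modelled as a hypothetical callback (a real one would issue TPM2_NV_Read with the given offset and size), and max_chunk stands for the value of TPM_PT_NV_BUFFER_MAX. The fake NV index is only for demonstration and, like a real TPM, rejects requests above its buffer limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-command read of `size` bytes at `offset`. */
typedef int (*nv_read_fn)(void *userdata, uint16_t offset, uint16_t size, uint8_t *buf);

/* Read `total` bytes in pieces of at most `max_chunk` bytes each. */
static int nv_read_chunked(nv_read_fn read_fn, void *userdata,
                           uint16_t total, uint16_t max_chunk, uint8_t **ret) {
        uint8_t *buf;

        if (max_chunk == 0)
                return -1;

        buf = malloc(total > 0 ? total : 1);
        if (!buf)
                return -1;

        for (uint16_t offset = 0; offset < total; ) {
                /* The last chunk may be shorter than max_chunk. */
                uint16_t n = (total - offset < max_chunk) ? total - offset : max_chunk;

                if (read_fn(userdata, offset, n, buf + offset) < 0) {
                        free(buf);
                        return -1;
                }
                offset += n;
        }

        *ret = buf;
        return 0;
}

/* Fake NV index for demonstration. */
typedef struct {
        const uint8_t *data;
        uint16_t limit;
        unsigned calls;
} FakeNv;

static int fake_nv_read(void *userdata, uint16_t offset, uint16_t size, uint8_t *buf) {
        FakeNv *nv = userdata;

        if (size > nv->limit)
                return -1;
        memcpy(buf, nv->data + offset, size);
        nv->calls++;
        return 0;
}
```

With the numbers from above (1035 bytes, limit 1024), this issues two reads: 1024 bytes, then 11.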

ssl-util: prefer OpenSSL 4

For the next release we can switch to preferring the new version.

ssl-util: support OpenSSL 4

OpenSSL 4 broke ABI, so we need to look for both SONAMEs.
Try libssl.so.3 first, and fallback to libssl.so.4,
so that the older and more stable version is used if both
are installed, giving distros time to fix regressions.
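A minimal sketch of trying SONAMEs in order of preference with dlopen(). The helper name is hypothetical, and the list order follows this commit (libssl.so.3 first, then libssl.so.4). Loading the library is only half the job: because the ABI changed, callers would still need to resolve symbols in a way that is valid for whichever version was picked.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

static const char *const ssl_sonames[] = {
        "libssl.so.3",
        "libssl.so.4",
};

/* Return a handle for the first SONAME that loads, and report which one
 * was used; NULL if none is installed. */
static void *dlopen_first(const char *const *names, size_t n, const char **ret_name) {
        for (size_t i = 0; i < n; i++) {
                void *h = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);

                if (h) {
                        if (ret_name)
                                *ret_name = names[i];
                        return h;
                }
        }
        return NULL;
}
```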

Follow-up for ccdd42351f79cbb9c2e034a96280a1ded40a2f95

Fixes https://github.com/systemd/systemd/issues/42675

core: create abstraction/more properties for the "Exec" part of Unit.StartTransient (#42360)

This is a bit of an RFC (but I hope I got it mostly right). @daandemeyer
suggested in
https://github.com/systemd/systemd/pull/42161#pullrequestreview-4336323314
improving the abstractions around Exec= in
io.systemd.Unit.StartTransient, as we will add a bunch more of those. So
this PR first adds a better abstraction and then uses it. See the
individual commits for details.

Add NEWS entry

This is a breaking change, even if it is unlikely that anyone is relying
on it.

repart: expand specifiers in MakeSymlinks= target

Previously, they were only expanded in the source part of the arguments.
No other validation is applied to the target component.

repart: Sort the partition list by partition offset

Currently the partition list is ordered like this: First come the partitions that
exist as definition files (could be pre-existing partitions or could be new ones),
then come the pre-existing partitions that aren't matched to a definition file.

This ordering is visible to the user when we print our partition table, and it
doesn't really make sense from a UX perspective: Partition tables are usually
either presented in order of the partition indices, or in order of the partition
offsets. Arguably the latter would be nicer here, since the visualization below
is already ordered by physical offsets.

So reorder the list after we have assigned the new partitions to their respective free
areas, according to the physical offset (or, for partitions that are yet to be created,
the order in which we will allocate them).

Another potential upside is that the code can now rely more on the partition order,
too.

To ensure it keeps working, also add an integration test for it.
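A simplified sketch of such an ordering. The Partition struct here is hypothetical, not repart's. New partitions are ordered by the offset they were assigned in a free area, and ties fall back to the original list position, since qsort() is not stable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified partition record. */
typedef struct {
        const char *label;
        uint64_t offset;          /* current offset, or UINT64_MAX if not yet on disk */
        uint64_t planned_offset;  /* where a new partition will be allocated */
        size_t index;             /* original list position, for a stable order */
} Partition;

static uint64_t effective_offset(const Partition *p) {
        return p->offset != UINT64_MAX ? p->offset : p->planned_offset;
}

static int cmp_by_offset(const void *a, const void *b) {
        const Partition *x = a, *y = b;
        uint64_t ox = effective_offset(x), oy = effective_offset(y);

        if (ox != oy)
                return ox < oy ? -1 : 1;
        /* Tie-break on the original position to keep the order stable. */
        return x->index < y->index ? -1 : x->index > y->index;
}

static void sort_partitions(Partition *p, size_t n) {
        for (size_t i = 0; i < n; i++)
                p[i].index = i;
        qsort(p, n, sizeof(Partition), cmp_by_offset);
}
```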

repart: Always print underline in the last row of the partition table

Claude found a small bug in the partition table we print: We filter out
partitions with p->dropped while building the table, but we want to put an
underline after the last row of the table. If the last entry in the
context->partitions list is a dropped partition, the check for
!p->partitions_next returns FALSE for the row that actually *is* the last
one in the table.

So move to a check that's based on a pre-counted number of partitions to
print rather than checking for !p->partitions_next.
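The pre-counting approach can be sketched as follows. The Entry type and helper name are hypothetical. The idea is to count the rows that will be printed first, then underline the row whose running count reaches that total, regardless of any dropped entries that follow it in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: dropped entries are filtered out of the table. */
typedef struct {
        bool dropped;
} Entry;

/* Index (in the original array) of the last *printed* row, which gets the
 * underline; -1 if nothing is printed. */
static long underlined_row(const Entry *e, size_t n) {
        size_t to_print = 0, printed = 0;

        for (size_t i = 0; i < n; i++)
                if (!e[i].dropped)
                        to_print++;

        for (size_t i = 0; i < n; i++) {
                if (e[i].dropped)
                        continue;
                if (++printed == to_print)
                        return (long) i;
        }
        return -1;
}
```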

Co-developed-by: Claude Opus 4.8 <noreply@anthropic.com>

core: add _parameters_init for the Unit.StartTransient dispatch

This commit extracts the initialization of the transient parameters
for io.systemd.Unit.StartTransient into a set of helpers that follow
the _parameters_init() pattern. This way the code is more uniform
and easier to extend and less fragile. It also means there is a
single (logical) place to init the fields.

core: add more settable properties to varlink Unit.StartTransient()

This commit uses the abstractions added in the previous commit to
add a bunch more properties to io.systemd.Unit.StartTransient(),
showcasing how straightforward this is now.

New helpers for tristate bools and an init helper are added. A
dedicated dispatcher for LogLevelMax parses the string-form name
("info", "debug" etc.) declared in the varlink IDL.

The new properties are: DynamicUser, IgnoreSIGPIPE, LockPersonality,
MemoryDenyWriteExecute, NoNewPrivileges, OOMScoreAdjust, RemoveIPC,
RestrictRealtime, RestrictSUIDSGID, RootEphemeral, UMask.

The remaining ProtectKernel*, Private*, ProtectClock properties are
declared as STRING in the varlink IDL (matching the modern *Ex/enum
form) so a bool dispatcher does not pass schema validation. Those
need a string-parsing dispatcher and will be added in a follow-up.

This brings us closer to parity with the D-Bus code (still a long
way to go though).
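At its core, a string-form LogLevelMax dispatcher maps a level name to its syslog(3) priority. A minimal sketch follows. The function name is hypothetical, and accepting exactly the eight standard syslog level names is an assumption, not a copy of the actual dispatcher.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <syslog.h>

/* Map "info", "debug", ... to LOG_INFO, LOG_DEBUG, ...; -EINVAL otherwise. */
static int log_level_from_string_sketch(const char *s) {
        static const char *const names[] = {
                [LOG_EMERG]   = "emerg",
                [LOG_ALERT]   = "alert",
                [LOG_CRIT]    = "crit",
                [LOG_ERR]     = "err",
                [LOG_WARNING] = "warning",
                [LOG_NOTICE]  = "notice",
                [LOG_INFO]    = "info",
                [LOG_DEBUG]   = "debug",
        };

        if (!s)
                return -EINVAL;
        for (int i = 0; i < (int) (sizeof(names) / sizeof(names[0])); i++)
                if (strcmp(names[i], s) == 0)
                        return i;
        return -EINVAL;
}
```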

core: create abstraction for the "Exec" part of Unit.StartTransient

The handling of the `Exec` parameters for the varlink
`io.systemd.Unit.StartTransient()` became a bit unwieldy. So
this commit creates another abstraction to handle the various
fields in the `Exec` part of the StartTransient code.

Each Exec property is now described by a single TransientExecProperty
entry and adding a new property is just a single entry there plus
an apply function.
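The table-driven shape described here can be sketched roughly as below. ExecCtx, ExecProperty and the apply functions are illustrative stand-ins, not the real TransientExecProperty definitions. With this shape, adding a property means adding one table row and one apply function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified execution context. */
typedef struct {
        bool no_new_privileges;
        bool dynamic_user;
} ExecCtx;

/* One entry per property: its name plus the function applying it. */
typedef struct {
        const char *name;
        int (*apply)(ExecCtx *c, const char *value);
} ExecProperty;

static int parse_bool(const char *v) {
        if (strcmp(v, "true") == 0)
                return 1;
        if (strcmp(v, "false") == 0)
                return 0;
        return -1;
}

static int apply_no_new_privileges(ExecCtx *c, const char *v) {
        int b = parse_bool(v);

        if (b < 0)
                return -1;
        c->no_new_privileges = b;
        return 0;
}

static int apply_dynamic_user(ExecCtx *c, const char *v) {
        int b = parse_bool(v);

        if (b < 0)
                return -1;
        c->dynamic_user = b;
        return 0;
}

static const ExecProperty exec_properties[] = {
        { "NoNewPrivileges", apply_no_new_privileges },
        { "DynamicUser",     apply_dynamic_user },
};

/* Look up the property by name and apply it; -1 if unknown or invalid. */
static int exec_property_apply(ExecCtx *c, const char *name, const char *value) {
        for (size_t i = 0; i < sizeof(exec_properties) / sizeof(exec_properties[0]); i++)
                if (strcmp(exec_properties[i].name, name) == 0)
                        return exec_properties[i].apply(c, value);
        return -1;
}
```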

Thanks to Ivan Kruglov for many useful suggestions.

tmpfiles: add %D specifier resolution

systemd-tmpfiles now resolves the %D specifier to /usr/share
(for the system manager) or $XDG_DATA_HOME (for the user manager).
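A minimal sketch of the resolution rule. The function name is hypothetical. When XDG_DATA_HOME is unset or empty, the XDG Base Directory specification says to fall back to $HOME/.local/share. Whether systemd-tmpfiles applies exactly that fallback is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Resolve %D into buf: /usr/share for the system instance, the XDG data
 * home for a user instance. Returns 0 on success, -1 if nothing can be
 * resolved or the buffer is too small. */
static int resolve_data_dir(bool user, char *buf, size_t size) {
        const char *xdg, *home;
        int r;

        if (!user)
                r = snprintf(buf, size, "/usr/share");
        else if ((xdg = getenv("XDG_DATA_HOME")) && *xdg)
                r = snprintf(buf, size, "%s", xdg);
        else if ((home = getenv("HOME")) && *home)
                r = snprintf(buf, size, "%s/.local/share", home);  /* XDG default */
        else
                return -1;

        return r >= 0 && (size_t) r < size ? 0 : -1;
}
```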

Closes: https://github.com/systemd/systemd/issues/42010
Signed-off-by: Skye Soss <skye@soss.website>