TODO: drop bootctl link + sysupdate integration item
This is now implemented: sysupdate calls out to the
/run/systemd/sysupdate/notify/ Varlink directory on completion, and bootctl
binds a socket there that links a UKI plus extras staged below
/var/lib/systemd/uki/ (with .v/ vpick support) via "bootctl link-auto".
test: verify bootctl link-auto and io.systemd.BootControl.LinkAuto
Add a TEST-87 testcase exercising "bootctl link-auto" and the equivalent
io.systemd.BootControl.LinkAuto() Varlink method: a UKI plus extras are staged
below the search directories and we assert the kernel and sidecar resources
are linked into $BOOT. Covered: plain kernel.efi + extras.d/, versioned
kernel.efi.v/ and extras .v/ resolved via vpick, directory priority
(/etc wins over /run), the no-op case when nothing is staged, and the Varlink
method including its empty reply when there is nothing to link.
test: verify sysupdate invokes the notification callout directory
Extend TEST-72-SYSUPDATE with a check that, after a successful update,
systemd-sysupdate connects to every socket linked into
/run/systemd/sysupdate/notify/ and invokes
io.systemd.SysUpdate.Notify.OnCompletedUpdate(). A tiny recorder socket is
hooked into that directory; it captures the request and replies with success.
We assert the recorded call carries the expected method, version and resource
list, and that a subsequent no-op update emits no notification.
sysext: refresh sysexts and confexts on completed system update
Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
sysext Varlink server. systemd-sysext provides a single Varlink service
covering both the sysext and confext image classes, so one notification
refreshes both (equivalent to "systemd-sysext refresh" plus
"systemd-confext refresh"). Hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-sysext.socket,
enabled by default via the preset.
bootctl: add link-auto/LinkAuto and auto-link on completed system update
Add a "bootctl link-auto" verb and a matching io.systemd.BootControl.LinkAuto()
Varlink method that behave exactly like "bootctl link" / Link(), except that
the UKI and extra resources are discovered automatically instead of being
passed in. The following directories are searched, in decreasing priority:
/etc/systemd/uki/, /run/systemd/uki/, /var/lib/systemd/uki/ (where
systemd-sysupdate stages downloaded resources), /usr/local/lib/systemd/uki/
and /usr/lib/systemd/uki/.
- the UKI is taken from kernel.efi, or the best version in kernel.efi.v/
(resolved via vpick, without honouring boot-counting suffixes), from the
highest-priority directory that has one;
- extra resources are picked up from extras.d/, matching *.sysext.raw,
*.confext.raw and *.cred, each either as a plain file or as a versioned
*.v/ directory resolved via vpick, combined across all directories with
higher-priority directories winning on conflicts.
Everything is resolved relative to the pinned root directory fd. Files passed
via --extra= on the command line are linked in addition to the auto-discovered
ones.
Also bind io.systemd.SysUpdate.Notify.OnCompletedUpdate() in the boot control
Varlink server, which simply does the same as LinkAuto(), and hook a socket
into /run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-bootctl.socket
(enabled by default via the preset) so a freshly downloaded kernel is linked
into $BOOT automatically after a sysupdate run.
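The discovery order described above can be sketched as follows. This is a minimal illustration under a simplified model: the set of existing paths is passed in as plain strings, the `discover()` helper is hypothetical, and vpick resolution of `.v/` directories is left out entirely.

```python
# Search directories in decreasing priority, as listed in the commit message.
SEARCH_DIRS = [
    "/etc/systemd/uki",
    "/run/systemd/uki",
    "/var/lib/systemd/uki",
    "/usr/local/lib/systemd/uki",
    "/usr/lib/systemd/uki",
]
EXTRA_SUFFIXES = (".sysext.raw", ".confext.raw", ".cred")


def discover(files):
    """Pick the UKI and extras from a set of existing absolute paths.

    `files` is a hypothetical stand-in for the file system; versioned
    *.v/ directories (vpick) are not modelled here.
    """
    uki = None
    extras = {}
    for d in SEARCH_DIRS:
        candidate = f"{d}/kernel.efi"
        # The UKI comes from the highest-priority directory that has one.
        if uki is None and candidate in files:
            uki = candidate
        prefix = f"{d}/extras.d/"
        for path in sorted(files):
            if not path.startswith(prefix):
                continue
            name = path[len(prefix):]
            if "/" in name or not name.endswith(EXTRA_SUFFIXES):
                continue
            # Extras are combined across directories; on a name conflict the
            # higher-priority directory (seen first) wins.
            extras.setdefault(name, path)
    return uki, extras
```

With a `kernel.efi` in both /run and /usr/lib, the /run copy is chosen, while extras from all directories are merged by file name.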
pcrlock: recompute PCR policy on completed system update
Bind the io.systemd.SysUpdate.Notify.OnCompletedUpdate() method in the
pcrlock Varlink server and hook a socket into
/run/systemd/sysupdate/notify/ via systemd-sysupdate-notify-pcrlock.socket,
enabled by default via the preset. When sysupdate signals a completed
update, we unconditionally re-run make-policy, since the set of measured
components may have changed.
sysupdate: notify hook subscribers after a successful update
Define a new io.systemd.SysUpdate.Notify Varlink interface with a single
OnCompletedUpdate() method, and after sysupdate successfully installs an
update, invoke that method on every socket linked into
/run/systemd/sysupdate/notify/ via varlink_execute_directory(). This
gives other components a hook to react to applied updates (e.g. recompute
a TPM policy, link a freshly downloaded kernel, refresh extensions).
The notification carries the component name, the installed version and the
list of updated resources (transfer id + on-disk path). Subscribers are
free to ignore the parameters and just treat the call as a trigger.
Setting SYSTEMD_SYSUPDATE_FORCE_NOTIFY=1 forces the notification to be sent
even when no update was applied (in which case no resource list is included),
so follow-up work can be triggered unconditionally.
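The decision of when to notify, and what the notification carries, might be sketched like this. The parameter keys (`component`, `version`, `resources`, `id`, `path`) and the `build_notification()` helper are hypothetical names chosen for illustration; the actual field names are defined by the io.systemd.SysUpdate.Notify interface and are not given in this message.

```python
def build_notification(component, version, resources, force_notify=False):
    """Return the call parameters, or None when no notification is due.

    resources: list of (transfer_id, path) tuples describing what the update
    installed; an empty list models "no update was applied".
    force_notify: models SYSTEMD_SYSUPDATE_FORCE_NOTIFY=1.
    """
    if not resources and not force_notify:
        return None
    # Hypothetical key names, for illustration only.
    params = {"component": component, "version": version}
    if resources:
        params["resources"] = [{"id": t, "path": p} for t, p in resources]
    return params
```

A forced notification without an applied update therefore carries no resource list, which matches the behaviour described above.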
Mirror how chaseat() works these days: instead of a single toplevel_fd that
serves as both the root (chroot) boundary and the directory that resolution
starts from, path_pick() now takes a separate root_fd and dir_fd. This lets
callers resolve a path relative to a specific directory fd while confining
symlink and absolute-path resolution to a root directory fd.
All existing callers are updated to pass the same fd for both, preserving
their current behaviour.
sysupdate: automatically clean up orphaned files after auto-update (#42714)
This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update has completed (regardless of whether that update was entirely
successful or not). This ensures that any orphaned files are automatically
cleaned up, if they are no longer referenced by any transfer file's patterns.
sysupdate: automatically clean up orphaned files after auto-update
This adds an operation equivalent to "systemd-sysupdate cleanup" after
an update has completed (regardless of whether that update was entirely
successful or not). This ensures that any orphaned files are automatically
cleaned up, if they are no longer referenced by any transfer file's patterns.
This happens in test units with many commands, so reset the timer when
a command completes and the test advances. The number of Exec
instructions is bounded so this will terminate jobs that are really
stuck anyway.
sd-varlink: mark varlink sockets via xattrs (#42454)
Linux 7.0 added the ability to mark socket inodes with xattrs. Let's use
that to clearly mark all our Varlink sockets as being varlink related.
This is then used to implement a very useful new command "varlinkctl
list-sockets" which lists all varlink entrypoint sockets marked this
way.
By marking not just the entrypoint inodes but also the connection
sockets properly, we can one day add an eBPF-based "varlinkctl trace"
command that watches varlink sockets for traffic, but that's material
for a later PR.
Frantisek Sumsal [Tue, 23 Jun 2026 19:29:53 +0000 (21:29 +0200)]
test: skip fdstore tests if test-fdstore is not available
When the test suite is run in the "standalone" mode, the minimal
container might not contain the test-fdstore binary that's needed for a
couple of tests. Since installing systemd-tests into the minimal
container pulls in a lot of other dependencies, let's just skip the
affected tests instead to avoid this.
units: tag all .varlink sockets with the right xattrs
This also relaxes the inode access modes a bit, in case they were set to
0600: we now set the "r" bit too, i.e. use 0644. This is beneficial
since it permits unpriv code to read the xattrs of the entrypoints
(which require read access). Note that in order to be able to connect()
to a socket inode you need write access, hence this shouldn't compromise
security in any way.
Luca Boccassi [Mon, 22 Jun 2026 21:48:53 +0000 (22:48 +0100)]
tpm2-setup: call DLOPEN_TPM2 to add dependency and fail immediately if not present
tpm2-setup requires both libcrypto and the tpm2-tss libraries, but so
far it only directly dlopen'ed libcrypto, with a clear error on startup
if missing, and a dependency added via dlopen notes.
Do the same for the tpm2-tss dlopens, to get a clear error and the
required dependencies.
journal: expose last 10 high priority logs as metrics (#42621)
This commit exposes the last 10 high priority logs as metrics so that
the systemd-report reports them. The entries are reported as
`io.systemd.Journal.HighPriorityMessage` and include all fields that are
printable as strings.
This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal
Luca Boccassi [Tue, 23 Jun 2026 10:45:11 +0000 (11:45 +0100)]
core: fix assertion when inactive unit pulled in by try-restart and start at the same time
With EnqueueUnitJobMany(), one anchor can collapse to NOP (inactive
unit + try-restart) while another anchor pulls that same unit in as a
regular start/restart job, leaving a NOP and a regular job in one
unit's transaction list, hitting an assert:
#11 0x00007f3fd2a446dc in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>,
function=<optimized out>) at ./assert/assert.c:127
#12 0x00007f3fd326e872 in job_type_lookup_merge (a=<optimized out>, b=<optimized out>) at ../src/core/job.c:428
#13 0x00007f3fd32e5641 in job_type_merge_and_collapse (a=0x7ffc7dda2430, b=<optimized out>, u=0x557bb11434c0)
at ../src/core/job.c:523
#14 0x00007f3fd335e4b3 in transaction_ensure_mergeable (tr=tr@entry=0x557bb0f6d150,
matters_to_anchor=matters_to_anchor@entry=true, e=e@entry=0x7ffc7dda33e0) at ../src/core/transaction.c:241
#15 0x00007f3fd3360242 in transaction_merge_jobs (tr=0x557bb0f6d150, e=0x7ffc7dda33e0)
at ../src/core/transaction.c:273
#16 transaction_activate (tr=0x557bb0f6d150, m=0x557bb0dd9c10, mode=JOB_REPLACE, affected_jobs=0x0, e=0x7ffc7dda33e0)
at ../src/core/transaction.c:797
#17 0x00007f3fd33091ed in manager_add_jobs (m=<optimized out>, type=<optimized out>, names=<optimized out>,
reload_if_possible=false, mode=JOB_REPLACE, extra_flags=0, affected_jobs=0x0, reterr_error=0x7ffc7dda33e0,
ret_jobs=0x557bb0fe8790) at ../src/core/manager.c:2386
Luca Boccassi [Mon, 22 Jun 2026 23:04:41 +0000 (00:04 +0100)]
growfs: downgrade dependency on libcryptsetup to optional
growfs actually gracefully skips when cryptsetup fails or is
missing already, and it is only necessary when the device is
a LUKS device anyway. Downgrade from required to recommended.
Michael Vogt [Tue, 16 Jun 2026 14:20:10 +0000 (16:20 +0200)]
journal: expose last 10 high priority logs as metrics
This commit exposes the last 10 high priority logs as metrics
so that the systemd-report reports them. The entries are
reported as `io.systemd.Journal.HighPriorityMessage` and
include all fields as the new METRIC_FAMILY_TYPE_OBJECT.
Individual fields from a journal entry that are unprintable
(invalid utf-8) are skipped.
This is achieved via a new socket-activated unit that listens on
/run/systemd/report/io.systemd.Journal
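The rule that unprintable fields are skipped can be illustrated with a small sketch. It assumes a journal entry is modelled as a mapping from field names to raw bytes; the `printable_fields()` helper is hypothetical and not the function used in the journal code.

```python
def printable_fields(entry):
    """Keep only the fields whose values are valid UTF-8.

    entry: dict mapping field name (str) to raw value (bytes).
    """
    out = {}
    for name, raw in entry.items():
        try:
            out[name] = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Invalid UTF-8: drop this field, keep the rest of the entry.
            continue
    return out
```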
Michael Vogt [Mon, 22 Jun 2026 05:40:34 +0000 (07:40 +0200)]
metrics: add METRIC_FAMILY_TYPE_OBJECT type
We will need a way to send journal entries as metrics. Those are already
json objects. So Lennart suggested to introduce a new type
METRIC_FAMILY_TYPE_OBJECT that does this. This commit implements
his suggestion.
Paul Meyer [Tue, 23 Jun 2026 10:34:12 +0000 (12:34 +0200)]
boot: read the TDX CPUID leaf unconditionally
vmm.c carries the confidential-VM detection used by sd-boot/sd-stub.
Its detect_tdx() had the same dead guard as the userspace copy: it
gated the 0x21 read on CPUID_GET_HIGHEST_FUNCTION (0x80000000, the
extended max function), which is always >= 0x80000000, so the guard
never held.
Mirror the userspace fix: read leaf 0x21 directly and rely on the
IntelTDX signature, matching the kernel. An out-of-range CPUID leaf
returns the highest basic leaf's data (no fault), and 0x21 is a
synthetic TDX leaf whose presence need not be reflected in the max
basic function, so it must not be gated on it.
Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).
Paul Meyer [Thu, 18 Jun 2026 05:44:09 +0000 (07:44 +0200)]
confidential-virt: read the TDX CPUID leaf unconditionally
detect_tdx() guarded the read of the TDX enumeration leaf (0x21, a
standard leaf) with CPUID_GET_HIGHEST_FUNCTION (0x80000000), which
returns the highest *extended* function. eax is therefore always
>= 0x80000000, so the "eax < 0x21" guard never held and the leaf was
read unconditionally anyway.
Drop the guard rather than re-gate it on the basic max function
(leaf 0), and read 0x21 directly, relying on the IntelTDX signature
compare. This matches the kernel, which reads the leaf unconditionally
on purpose: an out-of-range CPUID leaf returns the highest basic leaf's
data (no fault, per the Intel SDM), and 0x21 is a synthetic TDX leaf
whose presence need not be reflected in the reported max basic function,
so gating the read on it risks missing a genuine TDX guest. With no
guard the Hyper-V isolation fallback (Azure TDX guests have 0x21
blocked) also stays reachable.
Ref: Linux 59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in
early boot"), arch/x86/coco/tdx/tdx.c:1119 (tdx_early_init()).
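Why the removed guard never fired can be checked with plain arithmetic. The sketch below only models the comparison; it executes no CPUID instruction, and the helper names are hypothetical.

```python
CPUID_GET_HIGHEST_FUNCTION = 0x80000000  # query leaf for the extended max function
TDX_LEAF = 0x21                          # TDX enumeration leaf


def old_guard_skips_leaf(eax):
    """The removed check: skip reading leaf 0x21 when eax < 0x21."""
    return eax < TDX_LEAF


def guard_is_dead(samples):
    """True if the guard never skips the read for any of the sample values."""
    return not any(old_guard_skips_leaf(eax) for eax in samples)
```

Since the value compared was the highest extended function, which is always at least 0x80000000, every realistic input is far above 0x21 and the guard cannot trigger. A value such as 0x20 would trigger it, but that would be a basic max function, which the old code never queried.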
Paul Meyer [Wed, 17 Jun 2026 14:13:35 +0000 (16:13 +0200)]
confidential-virt: treat an unreadable SEV MSR as confidential
msr() returned 0 on failure, indistinguishable from a real MSR value of
0. With /dev/cpu/0/msr unavailable (e.g. the msr module not loaded in the
initrd), detect_sev() read 0 and reported a genuine SEV/-ES/-SNP guest as
CONFIDENTIAL_VIRTUALIZATION_NONE.
That inverts the firmware-credential trust gate: import_credentials_*()
skip fw_cfg/SMBIOS credentials only when detect_confidential_virtualization()
is > 0 ("don't trust firmware in confidential VMs"). A false NONE makes a
confidential guest trust and import credentials injected by the untrusted
hypervisor.
msr() now returns a negative errno, and detect_sev() assumes plain SEV when
the MSR is unreadable but CPUID already advertised SEV under a hypervisor,
so the gate still trips.
The conservative branch only fires when CPUID already advertised SEV, i.e.
for a guest the hypervisor marked SEV-capable. QEMU gates that CPUID leaf on
the SEV launch object and does not expose it to ordinary guests even under
-cpu host, so it does not misfire for non-confidential guests. Were a
hypervisor to expose the bit anyway the outcome is fail-safe (we only
decline to trust firmware-supplied data); nothing in-tree branches on the
specific SEV tier.
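The fail-safe behaviour can be sketched as below. This is a simplified model, not the code from the commit: the function signature is hypothetical, and the MSR bit layout (bit 0 SEV, bit 1 SEV-ES, bit 2 SEV-SNP) is an assumption based on the AMD SEV status MSR.

```python
import errno

NONE, SEV, SEV_ES, SEV_SNP = "none", "sev", "sev-es", "sev-snp"

# Assumed bit layout of the SEV status MSR.
MSR_SEV_ENABLED = 1 << 0
MSR_SEV_ES_ENABLED = 1 << 1
MSR_SEV_SNP_ENABLED = 1 << 2


def detect_sev(cpuid_advertises_sev, r, msr_value=0):
    """r is 0 on success or a negative errno if the MSR could not be read."""
    if not cpuid_advertises_sev:
        return NONE
    if r < 0:
        # MSR unreadable, but CPUID already advertised SEV: assume plain SEV
        # so that firmware-supplied credentials stay untrusted.
        return SEV
    if msr_value & MSR_SEV_SNP_ENABLED:
        return SEV_SNP
    if msr_value & MSR_SEV_ES_ENABLED:
        return SEV_ES
    if msr_value & MSR_SEV_ENABLED:
        return SEV
    return NONE
```

The key point is that a read failure is no longer conflated with a genuine MSR value of 0.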
Paul Meyer [Wed, 17 Jun 2026 16:03:55 +0000 (18:03 +0200)]
pcrextend: refuse empty measurement over Varlink
vl_method_extend() accepted an empty text/data value and measured an
empty word, bypassing the empty-word refusal the CLI path already
enforces. Measured words are joined with ":" in the record, so an empty
word is ambiguous. Reject it.
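The ambiguity is easy to demonstrate: once words are joined with ":", a record built from a single empty word cannot be told apart from a record built from no words at all. The `join_words()` helper below is a hypothetical illustration of the refusal, not the function from pcrextend.

```python
def join_words(words):
    """Join measured words with ':' and refuse empty ones."""
    for word in words:
        if not word:
            raise ValueError("refusing to measure an empty word")
    return ":".join(words)


# Without the check, two different inputs yield the same record.
ambiguous = ":".join([""]) == ":".join([])
```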
Paul Meyer [Wed, 17 Jun 2026 15:40:18 +0000 (17:40 +0200)]
tpm2-util: refuse NvPCR extend when the NV index is gone
tpm2_index_to_handle() returns 0 with a NULL handle when the NV index is
not present on the TPM. tpm2_nvpcr_extend_bytes() only checked for r < 0,
so a tombstoned NvPCR (anchor file present, NV slot cleared out from under
us) passed the NULL handle to tpm2_extend_nvpcr_nv_index() and aborted the
process via its assert(). Handle r == 0 explicitly, as the other
tpm2_index_to_handle() callers already do.
The newly introduced -ENODEV is mapped together with -ENOENT to the
io.systemd.PCRExtend.NoSuchNvPCR varlink error.
Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.
A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.
Luca Boccassi [Mon, 22 Jun 2026 20:35:17 +0000 (21:35 +0100)]
core: add method to enqueue multiple jobs in a single call (#42182)
Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.
Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.
dongshengyuan [Mon, 22 Jun 2026 05:38:05 +0000 (13:38 +0800)]
homed: fix min_free tracking in manager_rebalance_calculate()
min_free is supposed to track the minimum free space across all home
directories to scale the next rebalance interval. However, it was
incorrectly assigned h->rebalance_size (the home's current total
allocation) instead of new_free (the remaining allocatable space).
This caused the rebalance interval to be computed from allocation sizes
rather than free space, so a nearly-full home would not trigger the
shorter intervals it should, delaying response to low-space conditions.
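The difference between the buggy and the fixed tracking can be shown with a small sketch. The tuple layout and function names are hypothetical; only the choice of which value feeds the minimum reflects the commit.

```python
def min_free_buggy(homes):
    """Tracks the minimum of the allocation sizes (the old behaviour)."""
    return min(rebalance_size for rebalance_size, _new_free in homes)


def min_free_fixed(homes):
    """Tracks the minimum of the remaining free space (the intended value)."""
    return min(new_free for _rebalance_size, new_free in homes)


# Each home is (rebalance_size, new_free), in arbitrary units.
homes = [(100_000, 50), (10_000, 5_000)]
```

Here the first home is nearly full (50 units free), yet the buggy variant reports 10,000 and would not shorten the rebalance interval.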
repart: Sort the partition list by partition offset (#42488)
Currently the partition list is ordered like this: First come the
partitions that exist as definition files (could be pre-existing
partitions or could be new ones), then come the pre-existing partitions
that aren't matched to a definition file.
This ordering is visible to the user when we print our partition table,
and it doesn't really make sense from a UX perspective: Partition tables
are usually either presented in order of the partition indices, or in
order of the partition offsets. Arguably the latter would be nicer here,
since the visualization below is already ordered by physical offsets.
So reorder the list after we have assigned the new partitions to their
respective free areas, according to the physical offset (or, for
partitions that are to be newly created, the order that we will allocate them in).
Another potential upside of this is that we could rely on the partition
order in the code now more, too.
To ensure it keeps working, also add a test in the integration tests for
it.
Convert the --help text of systemd-imds and systemd-imdsd to the common
help_cmdline()/help_abstract()/help_section()/help_man_page_reference()
helpers, for a uniform output style across tools.
imds: expose instance metadata as an io.systemd.Metrics provider
When systemd-imds is invoked as a Varlink service (via the new
systemd-imds-metrics.socket), it now acts as an io.systemd.Metrics
provider for systemd-report. It connects to systemd-imdsd over the
existing io.systemd.InstanceMetadata interface to acquire the real
data and re-exposes the detected cloud vendor plus the well-known
hostname, region, zone and public IPv4/IPv6 fields as metrics in the
io.systemd.InstanceMetadata.* namespace.
The metrics logic lives entirely on the client side
(imds-tool-metrics.c); systemd-imdsd is unchanged. Each metric is
acquired on demand with a blocking call to the daemon, benefiting from
its local cache. Fields that are unset or unsupported by the vendor are
simply omitted.
The metrics socket is statically enabled into sockets.target.wants/.
dongshengyuan [Mon, 22 Jun 2026 05:16:32 +0000 (13:16 +0800)]
oci-util: fix and harden oci_registry_is_valid()
- Pass colon+1 (port string) instead of s (hostname) to safe_atou16,
so host:port registries are no longer always rejected.
- Switch to safe_atou16_full() with base-10 and strict flags to reject
non-decimal port forms (hex, octal, leading whitespace, sign prefix)
that would produce malformed URL authorities.
- Reject empty host explicitly via isempty() guard (covers both NULL
and empty-string input), and guard colon == n to reject ':port' form,
since dns_name_is_valid('') == 1 (DNS root) would otherwise accept
empty host as valid.
- Wrap overlong line to fit 109-column limit.
- Add test coverage for oci_registry_is_valid().
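The validation rules listed above can be approximated as follows. This is a sketch, not a port of oci_registry_is_valid(): DNS-name validation of the host is omitted, leading zeros are accepted as decimal (the strict flags used in the real code may refuse them), and both function names are hypothetical.

```python
import string


def parse_port_strict(s):
    """Parse a base-10 port in 0..65535; return None for anything else."""
    # Only ASCII digits: rejects hex, sign prefixes and whitespace.
    if not s or any(c not in string.digits for c in s):
        return None
    value = int(s)
    return value if value <= 0xFFFF else None


def registry_is_valid(s):
    """Accept 'host' or 'host:port'; reject empty hosts and bad ports."""
    if not s:
        return False
    host, sep, port = s.rpartition(":")
    if not sep:
        return True  # host only; DNS-name validation omitted in this sketch
    if not host:
        return False  # ':port' form
    # Parse the part after the colon, not the whole string.
    return parse_port_strict(port) is not None
```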
Gabriel [Mon, 22 Jun 2026 19:57:03 +0000 (21:57 +0200)]
Add handling for '-1' when parsing vsock CID (#42654)
Currently `systemd-ssh-generator` supports
`systemd.ssh_listen=vsock::22` and aliases the "empty CID" towards
`VMADDR_CID_ANY`. VMADDR_CID_ANY is -1, so it's confusing from a user
experience that `systemd.ssh_listen=vsock:-1:22` isn't supported.
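A minimal sketch of the parsing rule, assuming the CID is handled as a 32-bit unsigned value and that VMADDR_CID_ANY is the all-ones value (-1 as unsigned). The `parse_vsock_cid()` helper is hypothetical and does not mirror the generator's actual code.

```python
VMADDR_CID_ANY = 0xFFFFFFFF  # -1 when interpreted as a 32-bit unsigned value


def parse_vsock_cid(s):
    """Map an empty CID and '-1' to VMADDR_CID_ANY, else parse a decimal CID."""
    if s in ("", "-1"):
        return VMADDR_CID_ANY
    if not s.isascii() or not s.isdigit():
        raise ValueError(f"invalid vsock CID: {s!r}")
    cid = int(s)
    if cid > 0xFFFFFFFF:
        raise ValueError(f"vsock CID out of range: {s!r}")
    return cid
```

With this rule, `vsock::22` and `vsock:-1:22` resolve to the same CID.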
Paul Meyer [Fri, 19 Jun 2026 08:11:52 +0000 (10:11 +0200)]
report: add systemd-report-sign-tsm backend
A report signing backend that returns a confidential-computing
attestation report obtained via configfs-tsm. Implements
io.systemd.Report.Signer.Sign(): embeds the digest as the report's
inblob and returns the outblob (plus provider and any aux/manifest
blobs). Wired up as the "tsm" mechanism with a socket-activated service.
Add tsm_report_acquire(), a thin wrapper around the kernel's
/sys/kernel/config/tsm/report/ configfs interface for fetching a
confidential-computing attestation report (SEV-SNP, TDX, ...), including
a caller supplied input.
Luca Boccassi [Mon, 22 Jun 2026 18:22:37 +0000 (19:22 +0100)]
Translations update from Fedora Weblate (#42699)
Translations update from [Fedora
Weblate](https://translate.fedoraproject.org) for
[systemd/main](https://translate.fedoraproject.org/projects/systemd/main/).
Paul Meyer [Wed, 17 Jun 2026 15:21:51 +0000 (17:21 +0200)]
veritysetup: don't measure root hash signature after unsigned fallback
verb_attach() falls back to unsigned activation (crypt_activate_by_volume_key)
when signed activation fails, but still passed the signature to
pcrextend_verity_now(). The signer is parsed out of the (unverified)
signature and folded into the dm_verity NvPCR measurement, making an
unsigned fallback indistinguishable from a genuinely signed activation to
an attester. Only measure the signature when signed activation succeeded.
Allow systemd-executor to be compiled into a single binary.
The existing -Dlink-executor-shared=true|false is extended to also
allow -Dlink-executor-shared=single (*). The new mode is opt-in,
to allow experimentation and introduce this smoothly.
This saves a little space, but not as much as I expected:
$ ls -l build/{systemd,systemd-executor} build-new/systemd
-rwxr-xr-x 1 zbyszek zbyszek 631520 May 25 22:44 build/systemd
-rwxr-xr-x 1 zbyszek zbyszek 670464 May 25 22:44 build/systemd-executor
-rwxr-xr-x 1 zbyszek zbyszek 1214488 May 25 22:45 build-new/systemd
(This is with -Dbuildtype=debugoptimized -Db_lto=true).
The combined binary is slightly smaller than the sum of the separate
ones, but not much. In both cases, the binaries are linked to
libsystemd-core which is 10MB, so the size of the binaries themselves
doesn't make much of a difference. The executor needs exec-invoke.c
which is huge and not shared with anything else.
Longer term, I want to allow systemd to be linked statically. In
that case, having systemd-executor separate would be very painful.
So the option to use a multicall binary will be necessary.
Previously, we stored the resolved path to systemd-executor and
used it as argv[0]. I don't think this was useful. After all, normally
we would use the non-resolved original path as argv[0]. So that
part is dropped, and the resolved path is only logged, but
"systemd-executor" is always used as argv[0]. This makes the
multicall binary work reliably, no matter what the actual file
name is.
(*) This means that compat at the command-line level is maintained:
'meson setup build -Dlink-executor-shared=true …' works as before.
Unfortunately, when using an existing build directory, meson chokes
on the type change and refuses to reconfigure the directory or change
the option or do anything useful. I think meson is DTWT here, but
this is hard to fix. So the build directory probably needs to be
recreated.
sysupdate: keep database of installed files/patterns, and use to GC them (#42646)
Transfer files might come and go, components might be enabled and
disabled. Patterns might change. Let's keep track of what we install, so
that we can automatically gc everything no longer owned by any enabled
transfer.
Luca Boccassi [Mon, 22 Jun 2026 13:47:20 +0000 (14:47 +0100)]
machine-tags: extend syntax to support key/value pairs (#42618)
This is a minor extension, to move the machine tags concept more closely
towards what higher-level solutions support for tagging machines, such
as kubernetes, simply to reduce the conceptual impedance mismatch.
Luca Boccassi [Mon, 22 Jun 2026 13:15:39 +0000 (14:15 +0100)]
resolved: load libcrypto/libssl lazily on first use and make them optional (#42681)
Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
Luca Boccassi [Mon, 22 Jun 2026 13:08:21 +0000 (14:08 +0100)]
Expand specifiers in `MakeSymlinks=` target in `repart.d` (#42694)
Closes #42693. Specifiers are now expanded in symlink targets
(previously, they were only expanded in the source) - this is
technically a breaking change, but I'd be very surprised if anyone was
relying on this.
No other simplification is applied to the target (unlike the source,
which goes through `path_simplify_and_warn`).
Also a few minor changes:
- rename local `path` variable to `source` to match documentation
convention
- document that `MakeSymlinks=` accepts specifiers
- fix error message to print `MakeSymlinks=` option instead of
`Subvolumes=`
Luca Boccassi [Sat, 20 Jun 2026 00:05:00 +0000 (01:05 +0100)]
systemctl: add --kernel-cmdline-reuse option
kexec-tools has a --reuse-cmdline option which is very convenient
when doing a lot of reboots, add the same to systemctl.
Dedup options, letting the last one win in case of duplicates,
so that 'systemctl kexec --reuse-cmdline' can be chained many times
without continuously expanding the cmdline with duplicates from
the boot entry.
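The deduplication could look like the sketch below. It assumes duplicates are detected by comparing whole tokens; the message does not say whether the real code compares whole options or only their keys, so treat that as an assumption. The helper name is hypothetical.

```python
def dedup_keep_last(args):
    """Drop repeated tokens, keeping only the last occurrence of each."""
    seen = set()
    out = []
    # Walk backwards so the last occurrence is the one that survives.
    for arg in reversed(args):
        if arg not in seen:
            seen.add(arg)
            out.append(arg)
    out.reverse()
    return out
```

Applying it repeatedly is idempotent, so chaining several kexec reboots does not grow the command line.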
sysupdate: introduce "installdb" that keeps track of installed resources
Let's make sure we keep track of any file we drop into the system via a
database in /var/. This database is implemented based on symlinks, i.e.
reuses the fs as a simple database. Given the database most likely will
have <= 10 entries only (as we store *patterns* of installed file paths in
them, not the file paths themselves), this should be very efficient.
For implementation details see comments at top of
src/sysupdate/sysupdate-cleanup.c.
string-util: introduce STRING_FILENAME_PART flag for string_is_safe()
Whenever we are validating a string that shall appear in a filename
eventually we want to use filename_part_is_valid() rather than
filename_is_valid(). Let's add explicit support for that to
string_is_safe(), since it's actually a really common case.
recurse-dir: optionally, only enumerate dentries of a specific type
At various places we filter directory enumerations by inode type. Let's
add explicit support for that, so that the "struct dirent" array we
return already suppresses them.
Luca Boccassi [Thu, 29 Aug 2024 12:17:13 +0000 (13:17 +0100)]
core: add method to enqueue multiple jobs in a single call
Currently only a single job for a single unit can be enqueued atomically,
so there is no guarantee that, e.g., starting a unit and its socket
at the same time will happen in the same transaction. That forces
callers to 'know' the right order in which to start new units being
installed, or failures will occur. It also means some ordering
constraints are ignored, in case the separate calls are done
in the wrong manual order.
Add a new EnqueueUnitJobMany() D-Bus method that takes a list of units
to start.
Luca Boccassi [Sat, 20 Jun 2026 23:11:08 +0000 (00:11 +0100)]
btrfs-util,rm-rf: clean up subvolumes without user_subvol_rm_allowed
Without CAP_SYS_ADMIN and without the 'user_subvol_rm_allowed' mount
option, BTRFS_IOC_SNAP_DESTROY is rejected with EPERM (or EROFS for a
read-only subvolume), so rm_rf_subvolume() left subvolumes behind.
test-btrfs thus accumulated leftover subvolumes in /var/tmp on every
unprivileged run on a btrfs filesystem.
An unprivileged owner can however clear the RDONLY flag, empty a
subvolume and rmdir() it. So clear the RDONLY flag on EPERM/EACCES too
(not just EROFS) to leave the subvolume writable, and let rm_rf() fall
through on EPERM/EACCES to empty the subvolume recursively and rmdir()
it, matching what rm_rf_at() already did.
dongshengyuan [Mon, 22 Jun 2026 06:13:11 +0000 (14:13 +0800)]
resolve: fix transaction leak in dns_transaction_new() error path
hashmap_replace() failure left t in s->manager->dns_transactions with
t->scope still NULL, causing the destructor to skip hashmap_remove().
Add the missing cleanup mirroring the earlier error path in the same
function.
Luca Boccassi [Sun, 21 Jun 2026 09:42:02 +0000 (10:42 +0100)]
resolved: load libcrypto/libssl lazily on first use and make them optional
Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
Daan De Meyer [Wed, 3 Jun 2026 10:58:02 +0000 (10:58 +0000)]
tree-wide: Beef up openssl logging
Let's translate openssl's errors to proper errnos
where we can instead of returning EIO for everything.
Let's also make log_openssl_errors() public so we can
use it everywhere and migrate the rest of the codebase
to use it.
- `fd_copy_directory()` was using `FOREACH_DIRENT_ALL`, which doesn't
give stable ordering. Read all paths, sort, then iterate.
- `mcopy -s` depends on `readdir()` ordering and thus isn't
reproducible. Implement the recursion/sorting here and only invoke
mcopy/mmd per dir.
First change increases memory usage, as we don't stream the paths
anymore, second increases the number of context switches when invoking
external tools. Both should be fine given the ESP content should usually
be pretty limited.
I'd like to write a test for this, but didn't come up with a way that
doesn't require privileges and would surface the error reliably.
Chris Coulson [Tue, 2 Jun 2026 17:02:35 +0000 (18:02 +0100)]
shared/tpm2: support chunked reads of NV indexes
The TPM2_NV_Read command returns the requested data in a
TPM2B_MAX_NV_BUFFER type, the maximum size of which is TPM-specific and
can be determined by querying the value of the TPM_PT_NV_BUFFER_MAX
property.
The value of this may be smaller than the payload size of some NV
indexes, particularly when that payload is an X.509 certificate with an RSA
public key. E.g., the manufacturer-supplied RSA EK certificate on my own
machine has a size of 1035 bytes, and the value of TPM_PT_NV_BUFFER_MAX
is 1024.
To handle this case and make it possible to read any EK certificate from
the TPM, make tpm2_read_nv_index support chunked reads when the payload
size is larger than what the TPM can return in a single command.
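The chunking idea can be sketched independently of the TPM API. Here `read_chunk(offset, size)` is a hypothetical callback standing in for a single TPM2_NV_Read command, and `max_chunk` stands in for the queried TPM_PT_NV_BUFFER_MAX value.

```python
def read_nv_index(read_chunk, total_size, max_chunk):
    """Read total_size bytes in pieces of at most max_chunk bytes each."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    data = bytearray()
    offset = 0
    while offset < total_size:
        n = min(max_chunk, total_size - offset)
        chunk = read_chunk(offset, n)
        if len(chunk) != n:
            raise IOError("short read from NV index")
        data += chunk
        offset += n
    return bytes(data)
```

For the example from the commit message (a 1035-byte certificate and a 1024-byte limit) this issues two reads: 1024 bytes at offset 0 and 11 bytes at offset 1024.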
Luca Boccassi [Sat, 20 Jun 2026 14:20:19 +0000 (15:20 +0100)]
ssl-util: support OpenSSL 4
OpenSSL 4 broke ABI, so we need to look for both SONAMEs.
Try libssl.so.3 first, and fall back to libssl.so.4,
so that the older and more stable version is used if both
are installed, giving distros time to fix regressions.
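The lookup order can be illustrated with a short sketch using Python's `ctypes` in place of dlopen(). The `dlopen_first()` helper and its injectable `loader` parameter are hypothetical; the SONAME order is the one stated above.

```python
import ctypes

# Older, more stable version first; newer ABI as fallback.
SONAMES = ("libssl.so.3", "libssl.so.4")


def dlopen_first(sonames=SONAMES, loader=ctypes.CDLL):
    """Return (soname, handle) for the first library that can be loaded."""
    for name in sonames:
        try:
            return name, loader(name)
        except OSError:
            continue
    raise OSError("none of %s could be loaded" % ", ".join(sonames))
```

Passing a custom `loader` makes the order testable without either library being installed.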
core: create abstraction/more properties for the "Exec" part of Unit.StartTransient (#42360)
This is a bit of an RFC (but I hope I got it mostly right), @daandemeyer
suggested in
https://github.com/systemd/systemd/pull/42161#pullrequestreview-4336323314
to improve the abstractions around the Exec= in
io.systemd.Unit.StartTransient as we will add a bunch more of those. So
this PR adds first a better abstraction and then uses it. See the
individual commits for details.
Jonas Dreßler [Fri, 29 May 2026 23:25:45 +0000 (01:25 +0200)]
repart: Sort the partition list by partition offset
Currently the partition list is ordered like this: First come the partitions that
exist as definition files (could be pre-existing partitions or could be new ones),
then come the pre-existing partitions that aren't matched to a definition file.
This ordering is visible to the user when we print our partition table, and it
doesn't really make sense from a UX perspective: Partition tables are usually
either presented in order of the partition indices, or in order of the partition
offsets. Arguably the latter would be nicer here, since the visualization below
is already ordered by physical offsets.
So reorder the list after we have assigned the new partitions to their respective free
areas, according to the physical offset (or, for partitions that are to be newly created, the
order that we will allocate them in).
Another potential upside of this is that we could rely on the partition order in
the code now more, too.
To ensure it keeps working, also add a test in the integration tests for it.
Jonas Dreßler [Sat, 6 Jun 2026 15:32:54 +0000 (17:32 +0200)]
repart: Always print underline in the last row of the partition table
Claude found a small bug with the partition table we print: We filter out
partitions with p->dropped while making the table, but we want to put an
underline after the last row of the table. In the case where the last entry
in the context->partitions list is a dropped partition, the check for
!p->partitions_next returns FALSE when it actually *is* the last row in the
table.
So move to a check that's based on a pre-counted number of partitions to
print rather than checking for !p->partitions_next.
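The bug and the fix can be reproduced with a toy model of the table. Partitions are represented as dictionaries with hypothetical `name` and `dropped` keys; each function returns the printed rows together with a flag saying whether the row gets the underline.

```python
def rows_buggy(partitions):
    """Underline the row whose partition is last in the full list."""
    last = len(partitions) - 1
    return [(p["name"], i == last)
            for i, p in enumerate(partitions) if not p["dropped"]]


def rows_fixed(partitions):
    """Count the rows to print first, then underline the last printed row."""
    visible = [p for p in partitions if not p["dropped"]]
    n = len(visible)
    return [(p["name"], i == n - 1) for i, p in enumerate(visible)]
```

When the final list entry is a dropped partition, the buggy variant underlines nothing, while the fixed variant underlines the last row that is actually printed.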
Co-developed-by: Claude Opus 4.8 <noreply@anthropic.com>
Michael Vogt [Mon, 15 Jun 2026 06:40:50 +0000 (08:40 +0200)]
core: add _parameters_init for the Unit.StartTransient dispatch
This commit extracts the initialization of the transient parameters
for io.systemd.Unit.StartTransient into a set of helpers that follow
the _parameters_init() pattern. This way the code is more uniform
and easier to extend and less fragile. It also means there is a
single (logical) place to init the fields.
Michael Vogt [Thu, 21 May 2026 14:28:57 +0000 (16:28 +0200)]
core: add more settable properties to varlink Unit.StartTransient()
This commit uses the abstractions added in the previous commit to
add a bunch more properties to the io.systemd.Unit.StartTransient()
method, to showcase how straightforward this is now.
New helpers for tristate bools and an init helper are added. A
dedicated dispatcher for LogLevelMax parses the string-form name
("info", "debug" etc.) declared in the varlink IDL.
The new properties are: DynamicUser, IgnoreSIGPIPE, LockPersonality,
MemoryDenyWriteExecute, NoNewPrivileges, OOMScoreAdjust, RemoveIPC,
RestrictRealtime, RestrictSUIDSGID, RootEphemeral, UMask.
The remaining ProtectKernel*, Private*, ProtectClock properties are
declared as STRING in the varlink IDL (matching the modern *Ex/enum
form) so a bool dispatcher does not pass schema validation. Those
need a string-parsing dispatcher and will be added in a follow-up.
This brings us closer to parity with the D-Bus code (still a long
way to go though).
Michael Vogt [Thu, 21 May 2026 11:49:33 +0000 (13:49 +0200)]
core: create abstraction for the "Exec" part of Unit.StartTransient
The handling of the `Exec` parameters for the varlink
`io.systemd.Unit.StartTransient()` became a bit unwieldy. So
this commit creates another abstraction to handle the various
fields in the `Exec` part of the StartTransient code.
Each Exec property is now described by a single TransientExecProperty
entry and adding a new property is just a single entry there plus
an apply function.
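The table-driven pattern can be sketched as below. Only the name TransientExecProperty comes from the commit; the field layout, the apply helpers and the context dictionary are hypothetical stand-ins for the C structures.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TransientExecProperty:
    name: str
    apply: Callable[[dict, Any], None]


def apply_bool(field):
    """Build an apply function that stores a boolean under `field`."""
    def apply(ctx, value):
        if not isinstance(value, bool):
            raise TypeError(f"{field} expects a boolean")
        ctx[field] = value
    return apply


def apply_umask(ctx, value):
    if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 0o777:
        raise ValueError("UMask expects an integer in 0..0777")
    ctx["umask"] = value


# Adding a property means adding one entry here plus an apply function.
PROPERTIES = {p.name: p for p in (
    TransientExecProperty("NoNewPrivileges", apply_bool("no_new_privileges")),
    TransientExecProperty("UMask", apply_umask),
)}


def apply_exec_properties(ctx, params):
    for name, value in params.items():
        prop = PROPERTIES.get(name)
        if prop is None:
            raise KeyError(f"unknown Exec property: {name}")
        prop.apply(ctx, value)
```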
Thanks to Ivan Kruglov for many useful suggestions.