tpm2: cache NvPCR NV space exhaustion via flag files in /run/
When we run out of NV index space while allocating an NvPCR, the
situation is unlikely to improve until (at least) reboot. Retrying the
(doomed) Esys_NV_DefineSpace call on every subsequent allocation attempt
is wasteful (and very slow), so remember the exhaustion in a flag file
under /run/ and fail early next time.
We use two separate flag files, one for orderly and one for non-orderly
NvPCRs, since the two draw on different TPM resources (RAM-backed vs.
NVRAM-backed): exhaustion of one doesn't imply exhaustion of the other.
The files live in /run/, hence are cleared on reboot, which is
potentially when NV space might become available again.
tpm2: optionally disable TPMA_NV_ORDERLY for NvPCRs
NVIndexes in TPMs can operate in two modes:
1. Backed by TPM RAM. In this case they are only written to NVRAM on an
orderly TPM shutdown when the system goes down. (TPMA_NV_ORDERLY flag
is on)
2. Backed by TPM NVRAM. In this case the NV index value is written to NVRAM
on every write, and things are not delayed until orderly shutdown.
Mode 1 sounds like the obvious choice for NvPCRs, which are reset
to zero at boot anyway. However, things are more complicated, since
real-life TPMs tend to have a lot less RAM than NVRAM (both are
constrained, but RAM even more so than NVRAM). Hence there's value in
using NVRAM right away. However, writing to NVRAM all the time wears
it out (since NVRAM is more vulnerable to wear).
So far we unconditionally went for mode 1, but ran into RAM space
constraints as a result.
Let's improve things a bit, and use orderly mode for NvPCRs we expect to
write many times, and non-orderly mode for those we expect to write only
a small, fixed number of times at boot, and not anymore during runtime.
Right now, this is only the "hardware" NvPCR, which measures hw identity
at boot.
Hopefully, this stretches available resources a bit further.
This also makes sure that if the flag was set differently at allocation
time than we'd set it now, we accept it without complaint, to make upgrades safe.
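A sketch of the attribute choice and the upgrade-tolerant comparison, assuming the TPMA_NV_ORDERLY value from TPM 2.0 Part 2 (bit 26). The helper names are hypothetical, and real code would use the tpm2-tss constant rather than a local define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TPMA_NV_ORDERLY is bit 26 of TPMA_NV; defined locally only so the sketch
 * builds without the tpm2-tss headers. */
#define SKETCH_TPMA_NV_ORDERLY UINT32_C(0x04000000)

/* Orderly (RAM-backed) for NvPCRs extended often, non-orderly (NVRAM-backed)
 * for those written a few times at boot only, such as "hardware". */
static uint32_t nvpcr_attributes(uint32_t base, bool written_often) {
        return written_often ? (base | SKETCH_TPMA_NV_ORDERLY)
                             : (base & ~SKETCH_TPMA_NV_ORDERLY);
}

/* For an already allocated index, compare attributes while ignoring the
 * ORDERLY bit, so an index created by an older version still matches. */
static bool nvpcr_attributes_acceptable(uint32_t existing, uint32_t wanted) {
        return (existing & ~SKETCH_TPMA_NV_ORDERLY) == (wanted & ~SKETCH_TPMA_NV_ORDERLY);
}
```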
dongshengyuan [Wed, 24 Jun 2026 11:57:04 +0000 (19:57 +0800)]
pcrextend,tpm2-util,tpm2-setup: gracefully skip NvPCR when TPM NV space is exhausted
Map TPM2_RC_NV_SPACE to -ENOBUFS in tpm2_define_nvpcr_nv_index() rather
than -ENOSPC, giving it a dedicated errno distinct from the -ENOSPC that
write_string_file_at() can return when /run is full (a different failure
that occurs after the NV index is already allocated on the TPM).
In extend_nvpcr_now(), propagate -ENOBUFS as-is so callers can handle it
more gracefully. In vl_method_extend(), map it to the new varlink error
io.systemd.PCRExtend.NvPCRSpaceExhausted. In run(), handle -ENOBUFS
explicitly under --graceful and print an appropriate message. Update
tpm2-setup.c accordingly.
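A sketch of the errno split, assuming the TPM_RC_NV_SPACE value from TPM 2.0 Part 2 (0x14B, exposed by tpm2-tss as TPM2_RC_NV_SPACE). The helper names and the -EIO fallback for other errors are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* TPM_RC_NV_SPACE (RC_VER1 + 0x04B); defined locally to stay standalone. */
#define SKETCH_RC_NV_SPACE UINT32_C(0x14B)

/* NV exhaustion gets its own errno, so it cannot be confused with -ENOSPC
 * from a full /run, which only happens after the index already exists. */
static int define_nvpcr_result_to_errno(uint32_t rc) {
        if (rc == 0)
                return 0;
        if (rc == SKETCH_RC_NV_SPACE)
                return -ENOBUFS;
        return -EIO;  /* assumption: everything else is a generic failure */
}

/* Hypothetical --graceful handling: exhaustion means "skip this NvPCR". */
static int extend_graceful(int r) {
        if (r == -ENOBUFS)
                return 0;  /* log a notice and carry on instead of failing */
        return r;
}
```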
import: Trust subkeys included in signature (#41860)
- import: Trust subkeys included in signature
With gpg sub keys one can rotate signing keys while having a stable
trust anchor. So far one still had to ship the sub key out of band, but
a newer gpg has the option to include the sub key in the signature and
import it automatically. This is safe if we only allow importing a sub
key signed by the top key we already have in the keyring.
Add the --auto-key-import argument to gpg to import subkeys, but also
set --import-options=merge-only,import-clean to restrict what we import
to sub keys signed by the top key we have in the keyring, and to
discard any irrelevant parts.
- import: Support env var to override gpg keyring
By default there is a fixed keyring in /usr or /etc. But when running
systemd-pull unprivileged in the user context or with a custom transfer
definition as in systemd-sysupdate --definitions=./... (e.g., for local
ParticleOS updates) it is limiting to require that all keys have to be
part of the OS keyring or otherwise no verification can be used. Also,
for testing it is valuable to point it at a different keyring.
Add a SYSTEMD_OPENPGP_KEYRING env var where the omission or empty
assignment sticks to the current behavior of the global OS keyrings but
a keyring path given will take precedence. While an env var can leak
down the process tree and is more difficult to secure as the trust
anchor, the advantage is that one can specify it directly in a
service unit drop-in instead of having to patch the command
invocation. Anyway, it's a niche use case and thus not documented in
the man page.
Shihao Ren [Wed, 24 Jun 2026 03:46:56 +0000 (11:46 +0800)]
run: default run0 to root explicitly
When neither --user=, --area= nor --empower is given, run0 already
behaves as if root was requested, but only implicitly. That can be
misleading downstream.
Set arg_exec_user to "root" up front in parse_argv_sudo_mode(), so the
intent is visible and the rest of the code can rely on it.
Chris Coulson [Thu, 4 Jun 2026 14:04:29 +0000 (15:04 +0100)]
measure: Support binding signed policies to individual phases
systemd-measure can produce multiple signed policies for different
phases. However, a policy for a TPM resource that includes these signed
policies can currently be satisfied by any policy that is signed
with the same key.
It can be desirable to bind a resource's policy to one or more
specific phases. One way to do this could be to sign policies for
different phases or phase combinations with different keys. Another
approach is to limit the scope of signed policies using a policy
reference.
Using a policy reference works because:
- The reference is included along with the approved policy digest in the
digest that is signed.
- The reference argument is included in the authorization policy for a
resource via the TPM2_PolicyAuthorize assertion.
- During execution of the TPM2_PolicyAuthorize assertion, the TPM checks
that the session's current policy digest is the approved policy digest,
computes a digest from the approved policy digest and the supplied policy
reference, and checks that the resulting digest is the one that was
verified by TPM2_VerifySignature (via the returned ticket).
This adds a new --policyref argument to systemd-measure which binds all
of the signed policies to the specified policy reference. I did consider
making this more intelligent by auto-generating policy references for each
phase, but this approach provides the most flexibility for now. By making
use of the existing --append argument, a signer can produce multiple
signed policies with the same key that are bound to any individual phases
or combinations of phases.
The policy reference is a string without the NUL terminator. It is
supplied to the TPM via the TPM2B_NONCE type, which has a maximum size
equivalent to the size of the largest digest supported by the TPM. As
the signer doesn't know the capabilities of the target TPM,
systemd-measure limits the size of the policy reference to 32 bytes, to
fit within the size of a SHA256 digest.
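A sketch of the size check, using a hypothetical validator; it only illustrates that the NUL terminator does not count and that 32 bytes (the SHA-256 digest size) is the cap.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* The signer cannot know which larger digests the target TPM supports, so
 * the SHA-256 digest size is the safe upper bound for the TPM2B_NONCE. */
#define POLICYREF_MAX_SIZE 32

/* The reference goes to the TPM without its NUL terminator, so only the
 * strlen() bytes count against the limit. */
static int policyref_validate(const char *s, size_t *ret_size) {
        size_t n = strlen(s);

        if (n > POLICYREF_MAX_SIZE)
                return -EINVAL;

        *ret_size = n;
        return 0;
}
```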
This also includes the corresponding changes to systemd-cryptenroll and
systemd-repart (to add an equivalent --tpm2-public-key-policyref argument
so that the policy can be bound to the desired corresponding phase, and to
ensure that the policy reference is included in the LUKS2 token metadata),
and systemd-cryptsetup (to handle the policy reference stored in the LUKS2
token metadata).
This doesn't include policy reference support for credentials yet
because it requires a change to the credential headers.
dongshengyuan [Tue, 30 Jun 2026 10:10:49 +0000 (18:10 +0800)]
coredumpctl: use break instead of continue for time bound checks
When iterating journal entries with --until (forward scan) or --since
(reverse scan), the code used continue instead of break after crossing
the time boundary.
Since sd_journal_seek_realtime_usec() is called before the loop to
position at the start of the range, sd_journal_next()/previous()
returns entries in monotonically increasing/decreasing time order.
Once an entry's timestamp exceeds arg_until (or falls below arg_since
in reverse), all subsequent entries will also be out of range.
Using continue caused the entire remaining journal to be scanned
unnecessarily. journalctl uses break for the identical pattern in
src/journal/journalctl-show.c.
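A sketch of why break is sufficient, using a sorted array as a stand-in for a journal that has already been positioned at the start of the range. The function is hypothetical; the reverse --since scan is the mirror image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timestamps only increase in a forward scan, so the first entry past
 * 'until' ends the range; 'visited' counts how many entries were read. */
static size_t count_until(const uint64_t *ts, size_t n, uint64_t until, size_t *ret_visited) {
        size_t matched = 0, visited = 0;

        for (size_t i = 0; i < n; i++) {
                visited++;
                if (ts[i] > until)
                        break;  /* 'continue' would read every remaining entry */
                matched++;
        }

        *ret_visited = visited;
        return matched;
}
```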
Kai Lüke [Mon, 15 Jun 2026 13:11:41 +0000 (15:11 +0200)]
vmspawn/nspawn: Always use a per-machine runtime subdirectory
Some state files were prefixed with the machine name which protected
against collisions when running concurrent machines in a systemd unit
with RuntimeDirectory= set. However, the socket files tpm.sock and
control were not, which caused a race. Prefixing these socket files
does not work because the path gets too long when both the subdirectory
and the socket file within it get unique identifiers.
Instead of prefixing all files, we can always create a
subdirectory and use simple names within it. This makes paths shorter
in the normal case and protects against races with RuntimeDirectory=,
where instead of directly reusing RUNTIME_DIRECTORY we now also create the
normal per-machine subdirectory. Since this only concerns runtime state,
it should not impact any running VMs/containers. The ssh-proxy looks
only at the normal case and does not support RUNTIME_DIRECTORY, so there
is no impact there either.
meson: fix fs.exists() check for fuzz corpus samples
Commit 8355eb6e11 ("meson: Check if files returned by git ls-files
actually exist") checks the paths printed by git ls-files, which are
relative to the project root, with fs.exists(), which resolves them
relative to the test/fuzz subdirectory. As a result, every corpus
sample was silently skipped, and only the generated directives tests
were registered. Resolve the paths against the project source root
instead.
Daan De Meyer [Mon, 29 Jun 2026 12:08:27 +0000 (12:08 +0000)]
compress: handle ZSTD_CONTENTSIZE_UNKNOWN when decompressing blobs
The zstd blob decompression code assumed ZSTD_getFrameContentSize() always
returns the decompressed size, which holds for the journal and coredump blobs
systemd compresses itself with the one-shot ZSTD_compress(). Frames produced by
the streaming API, however, don't record the decompressed size in their header,
so ZSTD_getFrameContentSize() returns ZSTD_CONTENTSIZE_UNKNOWN. Per the zstd
documentation this is not an error, but decompress_blob_zstd() and
decompress_startswith_zstd() bailed out with -EBADMSG regardless.
This broke kexec of kernel images compressed with 'zstd' (e.g. zstd -22), whose
ZBOOT payload is decompressed via decompress_blob().
Treat only ZSTD_CONTENTSIZE_ERROR as fatal and, when the size is unknown, grow
the output buffer as we stream the frame out instead of relying on the recorded
size. Add a regression test that feeds a streaming-compressed (hence
unknown-size) zstd frame through decompress_blob() and decompress_startswith().
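A sketch of the size decision, assuming local copies of zstd's sentinel values (ZSTD_CONTENTSIZE_UNKNOWN is 0ULL - 1, ZSTD_CONTENTSIZE_ERROR is 0ULL - 2) so it builds without libzstd. The helpers, the fallback size and the -E2BIG limit are hypothetical and not the code in decompress_blob_zstd().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the zstd.h sentinels returned by ZSTD_getFrameContentSize(). */
#define SKETCH_CONTENTSIZE_UNKNOWN (0ULL - 1)
#define SKETCH_CONTENTSIZE_ERROR   (0ULL - 2)

/* Only ERROR is fatal. UNKNOWN (streaming frames) starts from a guess, and
 * the caller grows the buffer while streaming the frame out. */
static int initial_output_size(unsigned long long content_size, size_t fallback, size_t max, size_t *ret) {
        if (content_size == SKETCH_CONTENTSIZE_ERROR)
                return -EBADMSG;
        if (content_size == SKETCH_CONTENTSIZE_UNKNOWN) {
                *ret = fallback < max ? fallback : max;
                return 0;
        }
        if (content_size > max)
                return -E2BIG;  /* assumption: callers impose an upper bound */
        *ret = (size_t) content_size;
        return 0;
}

/* Doubling growth step, capped at 'max', used when the output buffer fills. */
static size_t grow_output_size(size_t cur, size_t max) {
        if (cur >= max / 2)
                return max;
        return cur > 0 ? cur * 2 : 1;
}
```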
Philip Withnall [Thu, 2 Jul 2026 13:06:50 +0000 (14:06 +0100)]
docs: Update memory pressure docs for latest GLib support for it
As of GLib 2.90.0 (not yet released), GLib will fully support the spec,
including the `MEMORY_PRESSURE_WATCH` and `MEMORY_PRESSURE_WRITE`
environment variables, which it did not support previously.
See https://gitlab.gnome.org/GNOME/glib/-/merge_requests/5046
Paul Meyer [Sun, 7 Jun 2026 17:07:41 +0000 (19:07 +0200)]
ukify: show all sections and profiles in inspect JSON output
The JSON output keyed every section by name, so a UKI with repeated
sections only showed the last one: in a multi-profile UKI all but one
.profile/.cmdline/.pcrsig were dropped, as were extra .dtbauto/.efifw.
Report the shared base sections by name at the top level (so .cmdline
etc. stay where they were), each profile as its own by-name object under
a new "_profiles" array, and the alternative-set sections (.dtbauto,
.efifw) as arrays.
Co-developed-by: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Paul Meyer [Thu, 2 Jul 2026 07:28:35 +0000 (09:28 +0200)]
boot: measure SMBIOS via non-TPM runtime measurements
In commit 29c6d1c12549 ("boot: measure select SMBIOS objects
explicitly") we added an explicit SMBIOS measurement in systemd-boot
and systemd-stub, covering cases where firmware doesn't measure SMBIOS
itself. measure_smbios() bailed unless a TPM was present, so it skipped
the measurement on confidential guests that only expose a CC
measurement protocol (e.g. Intel TDX RTMRs).
We can have much higher expectations of the virtual firmware used for
confidential computing, and the firmware is attested, so we can expect
it to always measure SMBIOS in this case. We still do our own
measurement to get a measurement structure similar to that of
non-confidential guests, and as another line of defense.
Michael Vogt [Tue, 30 Jun 2026 16:08:47 +0000 (18:08 +0200)]
firstboot: add new systemd.firstboot=headless mode
This adds a new systemd.firstboot=headless mode. It differs from the
existing systemd.firstboot=no mode in that it still performs the
non-interactive auto-configuration that requires no user input (such
as selecting the only installed locale, or applying settings provided
via credentials), and only skips the prompts that would otherwise
block waiting for user input. In contrast, =no disables that
auto-configuration along with the prompts.
The option is also honoured by homectl's firstboot logic and
systemd-cryptenroll, where headless behaves the same as no
(because there is no auto-configuration).
Kai Lüke [Wed, 29 Apr 2026 06:05:04 +0000 (15:05 +0900)]
test: Add sysupdate gpg verification test
The test first checks that simple signatures work and that a foreign
signature is rejected. Then it also tests that sub key signatures are
accepted when the public sub key is included in the signature.
bpf-restrict-fs: use a 32-bit magic key on big-endian too
The inner map is created with a uint32_t key, but the update passed
&magic[i] where magic is a (possibly 64-bit) statfs_f_type_t. On
little-endian the low 32 bits happen to be read; on big-endian 64-bit
(s390x, ppc64 BE) the zero high word is read instead, so every
filesystem collides on key 0 (the allow/deny selector) and
RestrictFileSystems= is silently broken. Pass a truncated copy.
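A sketch reproducing the endianness issue in userspace. `read_u32_key()` is a hypothetical stand-in for the kernel reading a 4-byte key from the pointer passed to the map update, and the magic value is just an example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads the first 4 bytes behind 'key', as a map with a uint32_t key would. */
static uint32_t read_u32_key(const void *key) {
        uint32_t k;
        memcpy(&k, key, sizeof(k));
        return k;
}

/* Buggy: a pointer to a 64-bit value where a 32-bit key is expected. The
 * first 4 bytes are the low word on little-endian but the (zero) high word
 * on big-endian 64-bit. */
static uint32_t key_from_wide_pointer(const uint64_t *magic) {
        return read_u32_key(magic);
}

/* Fixed: truncate into a real uint32_t first, then pass its address. */
static uint32_t key_from_truncated_copy(uint64_t magic) {
        uint32_t k = (uint32_t) magic;
        return read_u32_key(&k);
}
```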
core/scope: don't assert when start is retried during cgroup chown
While a delegated scope waits for the async cgroup chown helper it sits
in SCOPE_START_CHOWN (UNIT_ACTIVATING). unit_start() forwards to
->start() while activating, so scope_start() could be re-entered in
that state and trip assert(s->state == SCOPE_DEAD), aborting PID 1.
Treat SCOPE_START_CHOWN as already-starting instead.
core: donate the fdset to do_reexecute() to avoid a double free
do_reexecute() freed the FDSet on the switch-root/soft-reboot fallback
but the caller's copy stayed live, so main() freed it again if every
fallback exec then failed. Donate the fdset instead: pass it with
TAKE_PTR() and take ownership via a _cleanup_ local, freeing it exactly
once on every exit path.
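A sketch of the donation pattern, assuming simplified local versions of systemd's _cleanup_ and TAKE_PTR() helpers and a hypothetical FDSet placeholder; it only shows that the callee frees the set exactly once on every path while the caller's copy is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for systemd's helpers (GCC/Clang extensions). */
#define _cleanup_(f) __attribute__((cleanup(f)))
#define TAKE_PTR(p) ({ __typeof__(p) _p = (p); (p) = NULL; _p; })

typedef struct FDSet { int dummy; } FDSet;  /* hypothetical placeholder */

static int fdset_frees = 0;

static void fdset_free(FDSet *s) {
        if (!s)
                return;
        fdset_frees++;
        free(s);
}

static void fdset_freep(FDSet **s) {
        fdset_free(*s);
}

/* Takes ownership of '_fds': it is freed by the cleanup on every return. */
static int reexecute(FDSet *_fds, bool fail_early) {
        _cleanup_(fdset_freep) FDSet *fds = _fds;

        if (fail_early)
                return -1;  /* one fallback failed: freed here, once */

        /* ... further fallback exec attempts would go here ... */
        return -2;          /* every fallback failed: still freed once */
}

/* The caller donates its set, so its own cleanup sees NULL and does nothing. */
static int caller(bool fail_early) {
        _cleanup_(fdset_freep) FDSet *fds = calloc(1, sizeof(FDSet));
        if (!fds)
                return -3;
        return reexecute(TAKE_PTR(fds), fail_early);
}
```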
Shihao Ren [Tue, 30 Jun 2026 12:30:50 +0000 (20:30 +0800)]
run: refuse --no-block when combined with --scope
In systemd-run --scope mode, --no-block has no effect and was
silently ignored. Reject this combination explicitly to avoid
confusing users.
shared/install: give the borrowed name back before bailing on error
In the unmask path, when install_changes_add() fails the borrowed
*name was not handed back via TAKE_PTR() before returning, so
install_info_clear() freed the caller's strv entry, leaving a dangling
pointer that is double-freed at the caller's strv_free().
Chris Coulson [Tue, 2 Jun 2026 13:48:55 +0000 (14:48 +0100)]
tpm2-setup: Create and persist an endorsement key
This updates systemd-tpm2-setup to create and persist an endorsement
key if there isn't one already. For each supported EK template profile,
it will read the EK certificate from its NV index if there is one.
When there is an EK certificate present, a primary key is created using
the corresponding template. If the resulting EKpub matches the public
key in the certificate, the created EK is persisted and the process is
complete.
The low-range templates and the high-range RSA 2048/3072 and ECC NIST
P256/P384 storage templates are supported, as detailed in section
5.3 of the "TCG EK Credential Profile For TPM Family 2.0" spec v2.7.
High-range templates are preferred because these permit EK usage without
requiring knowledge of the authorization value for the endorsement
hierarchy, meaning that, like with the SRK, it is possible to restrict
the usage of the endorsement hierarchy whilst still permitting use of
the persistent EK.
The EK is always persisted at handle 0x81010001. This handle is
reserved in Table 2 of the "TCG TPM v2.0 Provisioning Guidance" spec
v1.0r1.0, although this is only a recommendation. This
handle is within the block of handles reserved for endorsement
primary keys in the "Registry of Reserved TPM 2.0 Handles and
Localities" spec v1.2r1.00. Section 2.3.2 of this specification also
makes a suggestion that there should be a relationship between the EK
certificate NV index and a corresponding persistent EK handle by
using handles at the same offsets within their respective ranges.
However, this contradicts the provisioning guidance spec which reserves
0x81010001 when there isn't a certificate at 0x01c00001. For simplicity,
I've chosen to use a single handle for the EK regardless of which profile
it is created with.
The "TCG EK Credential Profile For TPM Family 2.0" spec also provides a
way for endorsement keys to be certified with non-standard templates by
storing the template in an NV index. This is also supported.
The EK creation is not executed with tpm2-setup --early, as there's no
need for it to be created so early, unlike with the SRK. I also haven't
stored EKpub in /var/lib/systemd like with the SRKpub, as I'm not sure
there will be a use case for this yet.
A follow-up PR may be needed to add some internal helpers to make use
of the persisted EK, as use of low-range EKs requires a policy session.
High-range EKs can be used with an HMAC session because they have the
userWithAuth attribute set and we are creating them with an empty
authorization value.
Michael Vogt [Thu, 21 May 2026 14:57:15 +0000 (16:57 +0200)]
core: add absolute-path properties to varlink StartTransient
Similar to PR#42360 this commit adds missing properties for
absolute path handling in io.systemd.Unit.StartTransient for
the `Exec` context and a macro helper to share the common code.
Note that RootHashPath, RootHashSignaturePath need custom apply
functions because the varlink name "RootHashPath" differs from the
name that needs to be written into the unit file ("RootHash=")
and the iovec must be cleared.
In various cases we'll touch the same directories multiple times with
mtools: for example the /loader/ dir itself. Unfortunately mtools' mmd
does not implement a graceful "-p" switch like mkdir, but will do some
interactive name clash thing instead. We can turn this off via "-Ds
-DS", so let's do that. But that's not enough since the tool will still
fail with a non-explanatory exit status of 1. This hence ignores that
failure and proceeds anyway, under the assumption that failures to
create a directory will sooner or later be detected anyway once the
directory is to be populated and turns out not to exist because the
creation failed.
This makes some integration test/mkosi invocations work
non-interactively again.
Paul Meyer [Tue, 30 Jun 2026 11:30:08 +0000 (13:30 +0200)]
core: trust SMBIOS credentials under Intel TDX
SMBIOS OEM strings are host-controlled and normally distrusted by PID1
in confidential VMs. Under TDX, however, the SMBIOS table (including
Type 11) is measured into RTMR0 by the firmware (TDVF), so a remote
verifier can detect host tampering with credentials delivered this
way. Accept them there, while keeping fw_cfg distrusted as those
items are not measured even on TDX.
This lets systemd-vmspawn deliver credentials to TDX guests via the
normal SMBIOS path, unlike SNP which requires the initrd cpio channel.
Add more shutdown timestamps and preserve via LUO (#42671)
Add more shutdown timestamps, export them via D-Bus/Varlink, use them in
`systemd-analyze time`, and preserve them across kexec via LUO. This is
useful to measure reboot performance with added measurement points and
more granular intervals.
Kai Lüke [Wed, 29 Apr 2026 04:57:56 +0000 (13:57 +0900)]
import: Trust subkeys included in signature
With gpg sub keys one can rotate signing keys while having a stable
trust anchor. So far one still had to ship the sub key out of band, but
a newer gpg has the option to include the sub key in the signature and
import it automatically. This is safe if we only allow importing a sub
key signed by the top key we already have in the keyring.
Add the --auto-key-import argument to gpg to import subkeys, but also
set --import-options=merge-only,import-clean to restrict what we import
to sub keys signed by the top key we have in the keyring, and to
discard any irrelevant parts. The ugly part is that we also have to
work on a temporary copy of the keyring, because gpg wants to persist
the added key material, but we don't want that here.
Kai Lüke [Wed, 29 Apr 2026 06:06:32 +0000 (15:06 +0900)]
import: Support env var to override gpg keyring
By default there is a fixed keyring in /usr or /etc. But when running
systemd-pull unprivileged in the user context or with a custom transfer
definition as in systemd-sysupdate --definitions=./... (e.g., for local
ParticleOS updates) it is limiting to require that all keys have to be
part of the OS keyring or otherwise no verification can be used. Also,
for testing it is valuable to point it at a different keyring.
Add a SYSTEMD_OPENPGP_KEYRING env var where the omission or empty
assignment sticks to the current behavior of the global OS keyrings but
a keyring path given will take precedence. While an env var can leak
down the process tree and is more difficult to secure as the trust
anchor, the advantage is that one can specify it directly in a
service unit drop-in instead of having to patch the command
invocation. Anyway, it's a niche use case and thus not documented in
the man page.
Paul Meyer [Mon, 29 Jun 2026 10:59:45 +0000 (12:59 +0200)]
tsm-report: error on empty report
Previously we would return a valid signature containing the empty
outblob, which is undesirable. In other cases where we cannot query a
report because the guest doesn't support it we currently return an
empty response so the signature aggregator in systemd-report silently
skips it. In this case, we have everything we need to actually get a
report on the guest side, but the host isn't providing us with the
quote, so we fail.
Paul Meyer [Mon, 29 Jun 2026 10:21:02 +0000 (12:21 +0200)]
vmspawn: allow TDX guest to connect to host QGS
To query a TD quote, the TDX guest must connect to the Quote Generation
Service (QGS), an SGX enclave running on the host. We check if the
service is exposed via a well-known unix socket, then pass that socket
or a fallback well-known vsock address to QEMU.
Paul Meyer [Fri, 26 Jun 2026 09:57:29 +0000 (11:57 +0200)]
vmspawn: add Intel TDX confidential VM support
Wire up --coco=tdx alongside the existing SEV-SNP path. TDX requires KVM
on x86_64, a raw TDVF firmware loaded via -bios (no pflash/NVRAM split),
kernel-irqchip=split, and the "host" CPU model since QEMU rejects named
models. Sets up the tdx-guest object and confidential-guest-support=tdx0.
TDX measurement is different from QEMU's kernel-hashes injection: TDX
provides runtime measurements via RTMRs, so the initial measurement only
covers the firmware, which then measures the rest of the boot chain into
those RTMRs (done by OVMF today). Therefore a restriction to direct
kernel boot isn't required either.
Luca Boccassi [Fri, 26 Jun 2026 12:57:27 +0000 (13:57 +0100)]
core: deserialize soft-reboot shutdown timestamps into previous-* fields
Until now the SHUTDOWN_START/FINISH timestamps were carried across a
soft-reboot in the same fields, so afterwards they described the
previous boot rather than the current one, mixing current- and
previous-boot data in the same set. Mirror what is now done on kexec
and move them into the PREVIOUS_SHUTDOWN_* fields on deserialization.
Also reset local timestamps for events that rerun (e.g. generators,
unit loading, etc.) on soft-reboot and switch-root.
Nick Rosbrook [Fri, 26 Jun 2026 14:00:21 +0000 (10:00 -0400)]
test: reduce number of disks in TEST-64-UDEV-STORAGE-simultaneous_events on Debian/Ubuntu
This test never finishes in Ubuntu autopkgtest with the current values,
and is currently skipped altogether on Debian. When running on either,
reduce the number of disks to make the test more reliable.
dongshengyuan [Wed, 24 Jun 2026 05:01:32 +0000 (13:01 +0800)]
sd-journal: rate-limit tail timestamp refresh during iteration
journal_file_read_tail_timestamp() is called unconditionally in
next_beyond_location() for every file on every iteration step,
resulting in O(N x files) volatile mmap reads. For large queries
like 'journalctl -n 1000000' this makes the command unusably slow
(~5 minutes on systems with many journal files).
Rather than suppressing the call entirely (which would make the
inotify path fully load-bearing for cross-boot ordering), rate-limit
it to at most once per second per file. This reduces the overhead
from O(N x files) to O(T x files) where T is the iteration time in
seconds, while still providing periodic refresh as a fallback for
any missed inotify events and keeping cross-boot ordering
reasonably fresh.
Embed a RateLimit struct in JournalFile for this purpose.
Measured improvement on a real system: 5:24 -> 2:39 (-51%) for
'journalctl -n 1000000'.
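A sketch of a once-per-second-per-file throttle, assuming a hypothetical RefreshLimit struct in the spirit of systemd's RateLimit (interval plus burst) and a caller-supplied monotonic timestamp in microseconds; the real struct and its helpers differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most 'burst' events per 'interval_usec' window. */
typedef struct RefreshLimit {
        uint64_t interval_usec;
        unsigned burst;
        uint64_t begin;
        unsigned num;
} RefreshLimit;

/* Returns true if another refresh is allowed at time 'now'. */
static bool refresh_allowed(RefreshLimit *rl, uint64_t now) {
        if (rl->num == 0 || now - rl->begin >= rl->interval_usec) {
                rl->begin = now;  /* start a new window */
                rl->num = 1;
                return true;
        }
        if (rl->num < rl->burst) {
                rl->num++;
                return true;
        }
        return false;
}
```

Embedding one such struct per journal file and checking it before calling journal_file_read_tail_timestamp() bounds the refreshes by elapsed time rather than by the number of iteration steps.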
Shihao Ren [Tue, 30 Jun 2026 06:18:14 +0000 (14:18 +0800)]
man: fix wrong KillUserProcesses= default in systemd-run(1)
systemd-run(1) hard-coded "the default" wording for KillUserProcesses=, but the
actual compile-time default is determined by the -Ddefault-kill-user-processes=
meson build option, which distributions set differently at packaging time.
Luca Boccassi [Mon, 29 Jun 2026 17:07:01 +0000 (18:07 +0100)]
ptyfwd: avoid touching forwarder after exit drain
on_exit_event() can synchronously drain buffered data through
shovel_force(). If that completes the drain, pty_forward_done() runs
the hangup handler and may free the forwarder, so do not call
pty_forward_done() again afterwards.
[ 25.052879] TEST-74-AUX-UTILS.sh[909]: ==909==ERROR: AddressSanitizer: heap-use-after-free on address 0x7ccc8a5e0b41 at pc 0x7efc8cde106e bp 0x7ffd668629b0 sp 0x7ffd668629a8
[ 25.053136] TEST-74-AUX-UTILS.sh[909]: READ of size 1 at 0x7ccc8a5e0b41 thread T0
[ 25.092784] TEST-74-AUX-UTILS.sh[909]: #0 0x7efc8cde106d in pty_forward_done ../src/src/shared/ptyfwd.c:187
[ 25.093920] TEST-74-AUX-UTILS.sh[909]: #1 0x7efc8cdedba1 in on_exit_event ../src/src/shared/ptyfwd.c:904
[ 25.094148] TEST-74-AUX-UTILS.sh[909]: #2 0x7efc8d375eff in source_dispatch ../src/src/libsystemd/sd-event/sd-event.c:4301
[ 25.095074] TEST-74-AUX-UTILS.sh[909]: #3 0x7efc8d378032 in dispatch_exit ../src/src/libsystemd/sd-event/sd-event.c:4431
[ 25.095295] TEST-74-AUX-UTILS.sh[909]: #4 0x7efc8d37e932 in sd_event_dispatch ../src/src/libsystemd/sd-event/sd-event.c:4896
[ 25.095467] TEST-74-AUX-UTILS.sh[909]: #5 0x7efc8d37fc8c in sd_event_run ../src/src/libsystemd/sd-event/sd-event.c:4971
[ 25.095647] TEST-74-AUX-UTILS.sh[909]: #6 0x7efc8d3800ad in sd_event_loop ../src/src/libsystemd/sd-event/sd-event.c:4992
[ 25.097174] TEST-74-AUX-UTILS.sh[909]: #7 0x56049b541aba in start_transient_service ../src/src/run/run.c:2479
[ 25.097403] TEST-74-AUX-UTILS.sh[909]: #8 0x56049b552a65 in run ../src/src/run/run.c:3288
[ 25.097569] TEST-74-AUX-UTILS.sh[909]: #9 0x56049b552cb0 in main ../src/src/run/run.c:3291
[ 25.097780] TEST-74-AUX-UTILS.sh[909]: #10 0x7efc8b882300 in __libc_start_call_main (/lib64/libc.so.6+0x7d300) (BuildId: 830c94f480c13d9b01dc65a1035310882136094a)
[ 25.097952] TEST-74-AUX-UTILS.sh[909]: #11 0x7efc8b882417 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x7d417) (BuildId: 830c94f480c13d9b01dc65a1035310882136094a)
[ 25.098139] TEST-74-AUX-UTILS.sh[909]: #12 0x56049b51cf54 in _start (/usr/bin/systemd-run+0x19f54) (BuildId: 0daacdb9f20151f3517312ee99e489a9b8f4989c)
[ 25.098316] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0b41 is located 193 bytes inside of 2384-byte region [0x7ccc8a5e0a80,0x7ccc8a5e13d0)
[ 25.099202] TEST-74-AUX-UTILS.sh[909]: freed by thread T0 here:
[ 25.099410] TEST-74-AUX-UTILS.sh[909]: #0 0x7efc8e76420f in free.part.0 (/lib64/libasan.so.8+0x16420f) (BuildId: 173395e60f171589489dde2b7a156d0ae380734b)
[ 25.099557] TEST-74-AUX-UTILS.sh[909]: #1 0x7efc8cdf14d1 in pty_forward_free ../src/src/shared/ptyfwd.c:1122
[ 25.099691] TEST-74-AUX-UTILS.sh[909]: #2 0x56049b535328 in pty_forward_handler ../src/src/run/run.c:1952
[ 25.100063] TEST-74-AUX-UTILS.sh[909]: #3 0x7efc8cde138c in pty_forward_done ../src/src/shared/ptyfwd.c:196
[ 25.100197] TEST-74-AUX-UTILS.sh[909]: #4 0x7efc8cdec757 in shovel ../src/src/shared/ptyfwd.c:813
[ 25.101144] TEST-74-AUX-UTILS.sh[909]: #5 0x7efc8cdecc1f in shovel_force ../src/src/shared/ptyfwd.c:828
[ 25.102273] TEST-74-AUX-UTILS.sh[909]: #6 0x7efc8cdedb82 in on_exit_event ../src/src/shared/ptyfwd.c:899
[ 25.103564] TEST-74-AUX-UTILS.sh[909]: #7 0x7efc8d375eff in source_dispatch ../src/src/libsystemd/sd-event/sd-event.c:4301
[ 25.103712] TEST-74-AUX-UTILS.sh[909]: #8 0x7efc8d378032 in dispatch_exit ../src/src/libsystemd/sd-event/sd-event.c:4431
[ 25.104081] TEST-74-AUX-UTILS.sh[909]: #9 0x7efc8d37e932 in sd_event_dispatch ../src/src/libsystemd/sd-event/sd-event.c:4896
[ 25.104954] TEST-74-AUX-UTILS.sh[909]: #10 0x7efc8d37fc8c in sd_event_run ../src/src/libsystemd/sd-event/sd-event.c:4971
[ 25.105160] TEST-74-AUX-UTILS.sh[909]: #11 0x7efc8d3800ad in sd_event_loop ../src/src/libsystemd/sd-event/sd-event.c:4992
[ 25.105310] TEST-74-AUX-UTILS.sh[909]: #12 0x56049b541aba in start_transient_service ../src/src/run/run.c:2479
[ 25.105454] TEST-74-AUX-UTILS.sh[909]: #13 0x56049b552a65 in run ../src/src/run/run.c:3288
[ 25.105572] TEST-74-AUX-UTILS.sh[909]: #14 0x56049b552cb0 in main ../src/src/run/run.c:3291
[ 25.106136] TEST-74-AUX-UTILS.sh[909]: #15 0x7efc8b882300 in __libc_start_call_main (/lib64/libc.so.6+0x7d300) (BuildId: 830c94f480c13d9b01dc65a1035310882136094a)
[ 25.106263] TEST-74-AUX-UTILS.sh[909]: #16 0x7efc8b882417 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x7d417) (BuildId: 830c94f480c13d9b01dc65a1035310882136094a)
[ 25.106385] TEST-74-AUX-UTILS.sh[909]: #17 0x56049b51cf54 in _start (/usr/bin/systemd-run+0x19f54) (BuildId: 0daacdb9f20151f3517312ee99e489a9b8f4989c)
[ 25.106792] TEST-74-AUX-UTILS.sh[909]: previously allocated by thread T0 here:
[ 25.106957] TEST-74-AUX-UTILS.sh[909]: #0 0x7efc8e76515f in malloc (/lib64/libasan.so.8+0x16515f) (BuildId: 173395e60f171589489dde2b7a156d0ae380734b)
[ 25.108013] TEST-74-AUX-UTILS.sh[909]: #1 0x7efc8cddebed in malloc_multiply ../src/src/basic/alloc-util.h:92
[ 25.108188] TEST-74-AUX-UTILS.sh[909]: #2 0x7efc8cdee47b in pty_forward_new ../src/src/shared/ptyfwd.c:955
[ 25.108324] TEST-74-AUX-UTILS.sh[909]: #3 0x56049b538700 in run_context_setup_ptyfwd ../src/src/run/run.c:2130
[ 25.108472] TEST-74-AUX-UTILS.sh[909]: #4 0x56049b5419f9 in start_transient_service ../src/src/run/run.c:2465
[ 25.109152] TEST-74-AUX-UTILS.sh[909]: #5 0x56049b552a65 in run ../src/src/run/run.c:3288
[ 25.109311] TEST-74-AUX-UTILS.sh[909]: #6 0x56049b552cb0 in main ../src/src/run/run.c:3291
[ 25.109450] TEST-74-AUX-UTILS.sh[909]: #7 0x7efc8b882300 in __libc_start_call_main (/lib64/libc.so.6+0x7d300) (BuildId: 830c94f480c13d9b01dc65a1035310882136094a)
[ 25.109847] TEST-74-AUX-UTILS.sh[909]: #8 0x7efc8b882417 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x7d417) (BuildId: 830c94f480c13d9b01dc65a1035310882136094a)
[ 25.110760] TEST-74-AUX-UTILS.sh[909]: #9 0x56049b51cf54 in _start (/usr/bin/systemd-run+0x19f54) (BuildId: 0daacdb9f20151f3517312ee99e489a9b8f4989c)
[ 25.110911] TEST-74-AUX-UTILS.sh[909]: SUMMARY: AddressSanitizer: heap-use-after-free ../src/src/shared/ptyfwd.c:187 in pty_forward_done
[ 25.111054] TEST-74-AUX-UTILS.sh[909]: Shadow bytes around the buggy address:
[ 25.111213] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[ 25.111378] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[ 25.111520] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0980: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[ 25.112210] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[ 25.112399] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0a80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[ 25.112767] TEST-74-AUX-UTILS.sh[909]: =>0x7ccc8a5e0b00: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
[ 25.112901] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0b80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[ 25.113789] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0c00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[ 25.113906] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0c80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[ 25.114046] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0d00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[ 25.114159] TEST-74-AUX-UTILS.sh[909]: 0x7ccc8a5e0d80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[ 25.114278] TEST-74-AUX-UTILS.sh[909]: Shadow byte legend (one shadow byte represents 8 application bytes):
[ 25.114400] TEST-74-AUX-UTILS.sh[909]: Addressable: 00
[ 25.115099] TEST-74-AUX-UTILS.sh[909]: Partially addressable: 01 02 03 04 05 06 07
[ 25.115246] TEST-74-AUX-UTILS.sh[909]: Heap left redzone: fa
[ 25.115365] TEST-74-AUX-UTILS.sh[909]: Freed heap region: fd
[ 25.115483] TEST-74-AUX-UTILS.sh[909]: Stack left redzone: f1
[ 25.115618] TEST-74-AUX-UTILS.sh[909]: Stack mid redzone: f2
[ 25.115882] TEST-74-AUX-UTILS.sh[909]: Stack right redzone: f3
[ 25.116735] TEST-74-AUX-UTILS.sh[909]: Stack after return: f5
[ 25.116857] TEST-74-AUX-UTILS.sh[909]: Stack use after scope: f8
[ 25.116974] TEST-74-AUX-UTILS.sh[909]: Global redzone: f9
[ 25.117108] TEST-74-AUX-UTILS.sh[909]: Global init order: f6
[ 25.117257] TEST-74-AUX-UTILS.sh[909]: Poisoned by user: f7
[ 25.118128] TEST-74-AUX-UTILS.sh[909]: Container overflow: fc
[ 25.118288] TEST-74-AUX-UTILS.sh[909]: Array cookie: ac
[ 25.118433] TEST-74-AUX-UTILS.sh[909]: Intra object redzone: bb
[ 25.118546] TEST-74-AUX-UTILS.sh[909]: ASan internal: fe
[ 25.118684] TEST-74-AUX-UTILS.sh[909]: Left alloca redzone: ca
[ 25.118792] TEST-74-AUX-UTILS.sh[909]: Right alloca redzone: cb
[ 25.119282] TEST-74-AUX-UTILS.sh[909]: Command: systemd-run --quiet --pty -- bash -c echo PTY_FORWARD_READY; exec sleep 60
[ 25.119395] TEST-74-AUX-UTILS.sh[909]: ==909==ABORTING
dongshengyuan [Tue, 30 Jun 2026 01:47:22 +0000 (09:47 +0800)]
boot/random-seed: create \loader\ dir if missing when seeding
When the random seed file doesn't exist but we have good entropy
(seeded_by_efi=true), we attempt to create it. This requires a handle
to the \loader\ directory, which may not exist on systems using
UKI+EFISTUB without systemd-boot installed.
Obtain the directory handle by first trying a read-only open; if that
returns EFI_NOT_FOUND, create the directory. We deliberately avoid
requesting write access on an already-present directory because some
firmware implementations return EFI_INVALID_PARAMETER for a
WRITE|CREATE open on an existing directory — this would be logged at
LOG_ERR and abort seed creation on systems where \loader\ exists but
random-seed does not (the normal systemd-boot layout).
Once a handle to \loader\ is obtained, open the seed file relative to
that handle rather than using the full path from root.
Introduced-by: c0e7046c17 ("boot: log about RO I/O errors at debug level.")
Fixes: #42801
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
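A minimal sketch of the open-then-create fallback. It runs against a hypothetical in-memory mock, not real firmware. The status codes and mode bits are stand-ins for EFI_NOT_FOUND, EFI_INVALID_PARAMETER and EFI_FILE_MODE_READ/WRITE/CREATE. The mock's quirk, rejecting WRITE|CREATE on an existing directory, models the firmware behaviour described above. None of these names come from the sd-boot code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for UEFI status codes and EFI_FILE_PROTOCOL open modes. */
typedef enum { SKETCH_SUCCESS, SKETCH_NOT_FOUND, SKETCH_INVALID_PARAMETER } sketch_status;
enum { SKETCH_MODE_READ = 1, SKETCH_MODE_WRITE = 2, SKETCH_MODE_CREATE = 4 };

/* Hypothetical mock filesystem holding only the \loader\ directory. */
typedef struct {
        bool loader_exists;
} mock_fs;

/* Mock open of \loader\ on quirky firmware: WRITE|CREATE on an existing
 * directory fails with INVALID_PARAMETER instead of returning a handle. */
static sketch_status mock_open_dir(mock_fs *fs, unsigned mode) {
        assert(fs);
        if (mode & SKETCH_MODE_CREATE) {
                if (fs->loader_exists)
                        return SKETCH_INVALID_PARAMETER; /* the firmware quirk */
                fs->loader_exists = true;
                return SKETCH_SUCCESS;
        }
        return fs->loader_exists ? SKETCH_SUCCESS : SKETCH_NOT_FOUND;
}

/* Try a read-only open first; create the directory only if it is missing. */
static sketch_status open_loader_dir(mock_fs *fs) {
        sketch_status s = mock_open_dir(fs, SKETCH_MODE_READ);
        if (s != SKETCH_NOT_FOUND)
                return s;
        return mock_open_dir(fs, SKETCH_MODE_READ | SKETCH_MODE_WRITE | SKETCH_MODE_CREATE);
}
```

With an existing \loader\ the read-only open succeeds and the quirky WRITE|CREATE path is never taken. With a missing one, the directory gets created.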
Luca Boccassi [Mon, 29 Jun 2026 14:05:51 +0000 (15:05 +0100)]
env-util: ensure NUL termination of the replace_env_argv() output array
The output array is allocated with new() and left uninitialized, but a
bare unset "$VAR" token expands to nothing and writes no terminator.
When such a token leads or is the only word, the returned strv is left
without a trailing NULL.
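A simplified sketch of the termination issue. It assumes a hypothetical expander in which a "$NAME" token is replaced by its value, or dropped when NAME is unset. This is not systemd's replace_env_argv(). It only shows why the terminator must be written unconditionally after the loop: an input whose tokens all expand to nothing never writes any slot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated string array (hypothetical helper). */
static void strv_free_sketch(char **l) {
        if (!l)
                return;
        for (char **p = l; *p; p++)
                free(*p);
        free(l);
}

/* Hypothetical expander: "$NAME" becomes NAME's value, or nothing if unset;
 * any other token is copied verbatim. */
static char **expand_argv_sketch(char *const *argv) {
        size_t n = 0, k = 0;

        assert(argv);
        while (argv[n])
                n++;

        /* Like new() in systemd, malloc() leaves the array uninitialized. */
        char **ret = malloc((n + 1) * sizeof(char *));
        if (!ret)
                return NULL;

        for (size_t i = 0; i < n; i++) {
                const char *src = argv[i];
                if (src[0] == '$' && src[1] != '\0') {
                        src = getenv(src + 1);
                        if (!src)
                                continue; /* unset: expands to nothing, no slot written */
                }
                ret[k] = strdup(src);
                if (!ret[k]) {
                        strv_free_sketch(ret); /* ret[k] is NULL, so this stops there */
                        return NULL;
                }
                k++;
        }

        ret[k] = NULL; /* terminate unconditionally, even when k == 0 */
        return ret;
}
```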
dongshengyuan [Tue, 30 Jun 2026 09:13:10 +0000 (17:13 +0800)]
resolvectl: fix JSON reply cleanup in varlink_dump_dns_configuration
varlink_call_and_log() does not hand out a new reference for the reply
object, so the caller should not unref it. The _cleanup_(sd_json_variant_unrefp)
on reply was therefore wrong from the start.
The original TAKE_PTR(reply) was working around this incorrect cleanup
by preventing it from firing, but that left reply's refcount one too
high after sd_json_variant_ref(v) incremented the parent's count.
Fix by dropping _cleanup_(sd_json_variant_unrefp) from the reply
variable declaration entirely, as suggested by Lennart Poettering.
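A toy illustration of borrowed versus owned references. It uses a hypothetical refcounted type in place of sd_json_variant. As the message says of varlink_call_and_log(), the call wrapper here hands out a pointer the caller does not own. The caller therefore declares it without any cleanup handler and takes its own reference only on the part it keeps.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy refcounted object standing in for sd_json_variant. */
typedef struct obj {
        unsigned n_ref;
        struct obj *child; /* owned reference */
} obj;

static obj *obj_new(obj *child) {
        obj *o = calloc(1, sizeof *o);
        if (!o)
                return NULL;
        o->n_ref = 1;
        o->child = child; /* takes over the caller's reference to child */
        return o;
}

static obj *obj_ref(obj *o) {
        if (o)
                o->n_ref++;
        return o;
}

static obj *obj_unref(obj *o) {
        if (o && --o->n_ref == 0) {
                obj_unref(o->child);
                free(o);
        }
        return NULL;
}

/* Hypothetical call wrapper returning a *borrowed* reply: 'owner' keeps the ref. */
static int call_sketch(obj *owner, obj **ret_reply) {
        *ret_reply = owner;
        return 0;
}

static obj *get_child_owned(obj *owner) {
        obj *reply; /* borrowed: no unref, no cleanup attribute */

        if (call_sketch(owner, &reply) < 0)
                return NULL;
        return obj_ref(reply->child); /* own only what we keep */
}
```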
Luca Boccassi [Fri, 19 Jun 2026 22:48:21 +0000 (23:48 +0100)]
core/shutdown: add more shutdown timestamps
We already record when shutdown.target is initiated (ShutdownStart).
This adds a few more measuring points during shutdown: when
shutdown.target completes (i.e., all units stopped), when the shutdown
binary starts executing, and immediately before it hands off to the
kernel.
These are, of course, lost upon reboot. But they will later be wired
into LUO so that they are preserved across kexec and can be inspected,
which is very useful for performance measurements.
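For illustration, here is a sketch of how such measuring points might be recorded. It assumes a hypothetical struct that pairs a wall-clock reading with a monotonic one, similar in spirit to systemd's dual timestamps. The enum names are made up and mirror the points listed above. Durations come from the monotonic clock, so wall-clock adjustments during shutdown cannot skew them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical dual timestamp: realtime for humans, monotonic for durations. */
typedef struct {
        uint64_t realtime_usec;
        uint64_t monotonic_usec;
} dual_ts_sketch;

static uint64_t now_usec(clockid_t clock) {
        struct timespec ts;
        if (clock_gettime(clock, &ts) < 0)
                return 0; /* sketch: treat failure as "unset" */
        return (uint64_t) ts.tv_sec * UINT64_C(1000000) + (uint64_t) ts.tv_nsec / 1000;
}

static void dual_ts_now(dual_ts_sketch *t) {
        assert(t);
        t->realtime_usec = now_usec(CLOCK_REALTIME);
        t->monotonic_usec = now_usec(CLOCK_MONOTONIC);
}

/* Hypothetical measuring points, named after the ones described above. */
enum { TS_SHUTDOWN_START, TS_SHUTDOWN_TARGET_DONE, TS_SHUTDOWN_BINARY_START, TS_KERNEL_HANDOFF, TS_MAX };
static dual_ts_sketch shutdown_ts[TS_MAX];

/* Elapsed time between two points, from the monotonic clock. */
static uint64_t span_usec(int from, int to) {
        return shutdown_ts[to].monotonic_usec - shutdown_ts[from].monotonic_usec;
}
```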
dongshengyuan [Tue, 30 Jun 2026 09:11:32 +0000 (17:11 +0800)]
exec-invoke: fix wrong errno in log_error_errno for setenv failure
When setenv("CREDENTIALS_DIRECTORY") fails, log_error_errno() was
called with the stale return value of exec_context_get_credential_directory()
(which is >= 0 on success) instead of errno.
The %m in the format string correctly expands from libc's errno, so
the human-readable log message was unaffected. However, the structured
journal field ERRNO= received an incorrect value (0 or positive),
making automated log analysis and alerting on this failure unreliable.
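A minimal sketch of the corrected pattern. log_error_errno_sketch() is a hypothetical stand-in for systemd's log_error_errno(). It records the error it is passed, as the structured ERRNO= field would, and returns that error negated. setenv() reports failure through errno, so errno is the value that must be passed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the journal's structured ERRNO= field. */
static int last_logged_errno;

/* Hypothetical logger: logs, records the error, returns it negated. */
static int log_error_errno_sketch(int error, const char *msg) {
        last_logged_errno = error < 0 ? -error : error;
        fprintf(stderr, "%s: %s (ERRNO=%d)\n", msg, strerror(last_logged_errno), last_logged_errno);
        return -last_logged_errno;
}

static int set_env_or_log(const char *name, const char *value) {
        assert(name && value);
        if (setenv(name, value, 1) < 0)
                /* Pass errno here. A stale, non-negative return value from an
                 * earlier successful call would record ERRNO=0 or junk. */
                return log_error_errno_sketch(errno, "Failed to set $CREDENTIALS_DIRECTORY");
        return 0;
}
```

setenv() fails with EINVAL for a name containing '=', which makes the error path easy to exercise.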
Ronan Pigott [Mon, 29 Jun 2026 19:14:58 +0000 (12:14 -0700)]
run: do not munge user.slice with --slice-inherit
When using --slice-inherit, setting arg_slice would inadvertently munge
user.slice into the current user slice, usually producing a slice name
like user-1000-user.slice. This is treated as a user-UID slice by
user-.slice.d/10-defaults.conf, resulting in a strange description:
literally "User slice for UID user" (instead of an actual user UID).
Keep arg_slice empty when using --slice-inherit so that we actually
inherit from the relevant slice instead of a munged version.
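A hypothetical sketch of how such a name can arise. Slice names encode hierarchy with '-', so nesting a requested slice under the current one by joining the names turns "user.slice" under "user-1000.slice" into "user-1000-user.slice". The function is illustrative only and is not systemd-run's code. An empty request models the fixed --slice-inherit behaviour of keeping the current slice unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: nest 'requested' under 'current', or keep 'current' if empty. */
static int pick_slice_sketch(const char *current, const char *requested, char *buf, size_t size) {
        const char *suffix = ".slice";
        size_t cl = strlen(current), sl = strlen(suffix);
        int n;

        assert(cl > sl && strcmp(current + cl - sl, suffix) == 0);

        if (!requested || requested[0] == '\0')
                n = snprintf(buf, size, "%s", current); /* inherit as-is */
        else
                /* "user-1000.slice" + "user.slice" -> "user-1000-user.slice" */
                n = snprintf(buf, size, "%.*s-%s", (int) (cl - sl), current, requested);

        return n >= 0 && (size_t) n < size ? 0 : -1;
}
```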
portable: leave room for trailing NUL in metadata receive buffer
receive_portable_metadata() reads each item into a stack buffer of
PATH_MAX + NAME_MAX + 2 bytes, passes the full sizeof() as the recv
iovec length, and then NUL-terminates with iov_buffer[n] = 0. recvmsg()
can return n equal to the buffer size, so the terminator is written one
byte past the end.
Grow the buffer by one byte and cap the iovec at sizeof - 1, so a full
record is still received and the trailing NUL always fits, matching the
coredump-receive.c reader.
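A small sketch of the fixed receive pattern using an AF_UNIX datagram socketpair. RECORD_MAX is a hypothetical, smaller stand-in for PATH_MAX + NAME_MAX + 2. The buffer is one byte larger than the largest record and the iovec is capped at that record size, so buf[n] always stays in bounds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define RECORD_MAX 16 /* hypothetical stand-in for PATH_MAX + NAME_MAX + 2 */

/* Receive one record into buf (RECORD_MAX + 1 bytes) and NUL-terminate it.
 * Capping iov_len one byte short of the buffer keeps buf[n] in bounds even
 * when a full RECORD_MAX-byte record arrives. */
static ssize_t recv_record(int fd, char buf[static RECORD_MAX + 1]) {
        struct iovec iov = { .iov_base = buf, .iov_len = RECORD_MAX };
        struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };

        ssize_t n = recvmsg(fd, &mh, 0);
        if (n < 0)
                return -1;
        assert((size_t) n <= RECORD_MAX);
        buf[n] = '\0';
        return n;
}
```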
dongshengyuan [Mon, 29 Jun 2026 05:24:51 +0000 (13:24 +0800)]
man: document that $XDG_CONFIG_HOME affects environment.d lookup path
Align the documentation with the actual behavior: if $XDG_CONFIG_HOME is
set to an absolute path in the user service manager environment, it takes
precedence over the default ~/.config/ when locating environment.d files.
Also note the bootstrapping limitation that variables defined inside
environment.d files are not yet available when the generator runs.
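A sketch of the per-user lookup rule as documented above. It models only the choice between $XDG_CONFIG_HOME and ~/.config. The other directories environment.d files are read from are deliberately left out, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: compute the per-user environment.d directory. */
static int user_environment_d_dir(char *buf, size_t size) {
        const char *xdg = getenv("XDG_CONFIG_HOME");
        int n;

        assert(buf);
        if (xdg && xdg[0] == '/')
                /* Absolute $XDG_CONFIG_HOME takes precedence. */
                n = snprintf(buf, size, "%s/environment.d", xdg);
        else {
                /* Unset or relative: fall back to ~/.config. */
                const char *home = getenv("HOME");
                if (!home || home[0] != '/')
                        return -1;
                n = snprintf(buf, size, "%s/.config/environment.d", home);
        }
        return n >= 0 && (size_t) n < size ? 0 : -1;
}
```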
Luca Boccassi [Mon, 29 Jun 2026 10:56:30 +0000 (11:56 +0100)]
test: make TEST-07-PID1.issue-14566 more robust
The test slept a fixed 4s after starting the service, then read the
child PID from /leakedtestpid. On a loaded host the executor had not
exec'd the script yet:
Make the service Type=notify and notify readiness after writing the PID
file, and wait for the service to go inactive in a timeout loop instead
of fixed sleeps.
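The timeout-loop half of the fix is sketched here in C, for consistency with the rest of this log, rather than in the shell the test itself uses. The predicate is a hypothetical stand-in, "a file exists", for "the service has gone inactive". The Type=notify readiness part is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical predicate standing in for "service is inactive". */
static bool file_exists(const char *path) {
        return access(path, F_OK) == 0;
}

static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Poll every 100ms until the predicate holds or the deadline passes. */
static int wait_for_file(const char *path, double timeout_sec) {
        assert(path);
        double deadline = now_sec() + timeout_sec;

        for (;;) {
                if (file_exists(path))
                        return 0;
                if (now_sec() >= deadline)
                        return -1; /* timed out */
                struct timespec step = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
                nanosleep(&step, NULL);
        }
}
```

Unlike a fixed sleep, this returns as soon as the condition holds on a fast host, yet still tolerates a slow one up to the deadline.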
wangzhaohui [Wed, 24 Jun 2026 03:10:56 +0000 (11:10 +0800)]
shell-completion: add missing commands and options to timedatectl zsh
The zsh completion for timedatectl was missing three commands ('show',
'ntp-servers', 'revert') and five options (--monitor, -p/--property=,
-a/--all, --value, -P) that are already present in the bash completion,
documented in the man page, and implemented in the binary.
dongshengyuan [Mon, 29 Jun 2026 02:20:59 +0000 (10:20 +0800)]
tmpfiles: fix device node major:minor logging to use i->major_minor
The debug log after creating a device node passed major(i->mode) and
minor(i->mode) to format the device number, but i->mode holds the
file permission bits (e.g. 0644), not the device number.
The device major:minor is stored in i->major_minor, which is also
what mknodat() receives. Use that field instead so the log correctly
reports the created device number.
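A sketch of the corrected formatting using a hypothetical, trimmed-down item struct. major() and minor() from <sys/sysmacros.h> decode a dev_t, so they only make sense when applied to the field that holds the device number.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical item: mode holds type + permission bits, major_minor holds
 * the dev_t that would be passed to mknodat(). */
typedef struct {
        mode_t mode;
        dev_t major_minor;
} item_sketch;

static int format_devnum(const item_sketch *i, char *buf, size_t size) {
        assert(i && buf);
        /* Decode the dev_t, not the mode. */
        int n = snprintf(buf, size, "%u:%u", major(i->major_minor), minor(i->major_minor));
        return n >= 0 && (size_t) n < size ? 0 : -1;
}
```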
dongshengyuan [Mon, 29 Jun 2026 06:54:51 +0000 (14:54 +0800)]
logind: fix typo in reboot-to-boot-loader-entry path
SetRebootToBootLoaderEntry on non-EFI systems wrote the boot loader
entry name to /run/systemd/reboot-boot-to-loader-entry ("boot" and "to" swapped),
while the getter and unlink both use the correct path
/run/systemd/reboot-to-boot-loader-entry.
The written value was never read back, silently breaking the feature
on non-EFI systems.
dongshengyuan [Mon, 29 Jun 2026 06:53:35 +0000 (14:53 +0800)]
journal-verify: fix offset reported for tail hash mismatch
After walking a hash chain, the loop exits with p == 0. The error()
call for a tail_hash_offset mismatch passed p as the file offset,
printing 0000000000000000 instead of the actual last data object.
Pass 'last' instead, which holds the offset of the final chain entry.
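A sketch of the offset-tracking pattern on a hypothetical in-memory chain. Here next[off] gives the following entry and 0 ends the chain, matching the loop described above. After the loop, p is always 0, so the tail must be remembered separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical chain walk: returns the offset of the last entry (0 if empty). */
static uint64_t chain_tail(const uint64_t *next, uint64_t head) {
        uint64_t p = head, last = 0;

        assert(next);
        while (p != 0) {
                last = p; /* remember the entry we are on */
                p = next[p];
        }
        /* p == 0 here; 'last' is the offset to compare with, and to report
         * when it differs from, the recorded tail offset. */
        return last;
}
```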
dongshengyuan [Mon, 29 Jun 2026 02:20:36 +0000 (10:20 +0800)]
tmpfiles: propagate clean_item_instance() error in clean_item()
The CREATE_DIRECTORY, TRUNCATE_DIRECTORY, CREATE_SUBVOLUME,
CREATE_SUBVOLUME_INHERIT_QUOTA, CREATE_SUBVOLUME_NEW_QUOTA and
COPY_FILES branches in clean_item() called clean_item_instance()
but discarded its return value, always returning 0.
If dir_cleanup() fails (e.g. cannot stat a subdirectory), the error
is silently swallowed and the caller has no way to detect the failure.
The sibling EMPTY_DIRECTORY/IGNORE_PATH branches already propagate
the error correctly via glob_item().
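A minimal sketch of the control-flow fix, with hypothetical names. The listed branches return the cleanup helper's result instead of discarding it and falling through to return 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical item types; only the control flow matters here. */
typedef enum { CREATE_DIRECTORY_S, TRUNCATE_DIRECTORY_S, COPY_FILES_S, OTHER_S } item_type_sketch;

/* Hypothetical helper: fails e.g. when a subdirectory cannot be stat()ed. */
static int clean_instance_sketch(int fail) {
        return fail ? -EACCES : 0;
}

static int clean_item_sketch(item_type_sketch t, int fail) {
        assert(t >= CREATE_DIRECTORY_S && t <= OTHER_S);
        switch (t) {
        case CREATE_DIRECTORY_S:
        case TRUNCATE_DIRECTORY_S:
        case COPY_FILES_S:
                /* Previously: call, ignore the result, break, return 0. */
                return clean_instance_sketch(fail);
        default:
                return 0;
        }
}
```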
The file is located outside the mkosi/ subdirectory, hence it is currently
unused. If it were moved to the mkosi/ subdirectory, its config would
conflict with TEST-58-REPART. Let's remove it for now, and reintroduce it
later at the correct place, with test adjustments, if it is really useful.