This introduces PinnedResources as a structure combining pinned
references to a root directory, root image, or root mstack. This is not
only easier to work with, but essential to make certain unpriv things
work, as we need some mechanism to pin resources before we drop into a
userns which might possibly not provide access anymore to those
resources.
Hence this does two things: introduce the new structure, and immediately
hook it up so that we pin things properly before dropping into userns,
and then make use of the pinned resources the right way after dropping,
which enables unpriv userns operation.
The concept is generic enough to eventually implement extension images +
mount images with the same structure, but in order to keep the changes
manageable this is left for another time.
(This also makes one further clean-up: client-side verity-reuse checks
are moved server side if we are unpriv. Previously we'd do them client
side, but they were doomed to fail because of lack of privs. Hence let's
drop the client side if we are unpriv and purely do them server-side in
that case.)
So far we opened a new Varlink connection for every mountfsd/nsresourced
method call. Given each tool only does a very small number of calls
(usually 1…5) on them and the connections are cheap this is not too
wasteful. Nonetheless, let's do something about it, and allow reusing
the connection for multiple calls.
This not only makes things a bit more efficient, but has one more
important benefit: Varlink connections pin the security context of the
client when connecting. This means that varlink method calls done with a
connection established while some code was privileged will still operate
as privileged once privs are dropped, until the connection is closed.
This pinning effect is really nice, as it gives us behaviour in a
"capability system" like scheme. Later code is going to use that to
continue doing certain priv userns ops even after unsharing userns and
becoming fully unpriv.
namespace: extend bind mount ignore field to permission issues
A later commit will add transient allocation of user namespaces with
dynamic UID range assignment. That creates certain permission issues.
Let's hence allow them to be handled gracefully in case the 'ignore'
field is set for a mount.
namespace: port mount_private_apivfs() to fsopen() and friends
This is not just refactoring, but has the big benefit that it makes us
independent from a temporary directory we might not have enough access
to create. (This matters with the new PrivateUsers=managed).
core: introduce exec_context_with_rootfs_strict() as a stricter version of exec_context_with_rootfs()
We have two very similar checks in place: in some contexts we want to
know if *any* RootDirectory= is configured, in the other we want to
suppress if it is configured to our regular root. Let's add a helper for
both (even if we only need it once), to make the mirrored behaviour
clear.
pull-job: make sure pull_job_restart() can be used to fetch the same resource again, just with new headers
Let's flush out all response state from the job, but let's keep the
request data previously configured, in particular the headers set. This
is useful to re-request a resource, just with a slightly modified or
identical URL.
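The split between response state and request state can be sketched as follows. This is a minimal Python model, not the actual pull-job code: the class, its field names and the `restart()` signature are hypothetical and only illustrate which data is flushed and which is kept.

```python
class PullJob:
    """Toy model of a download job (hypothetical names, not the systemd API)."""

    def __init__(self, url):
        # Request state: configured by the caller, survives a restart.
        self.url = url
        self.request_headers = []
        self._reset_response()

    def _reset_response(self):
        # Response state: filled in while the transfer runs.
        self.status = None
        self.etag = None
        self.payload = bytearray()

    def restart(self, new_url=None):
        # Optionally switch to a modified URL, otherwise keep the old one.
        if new_url is not None:
            self.url = new_url
        # Flush everything learned from the previous response,
        # but leave the configured request headers untouched.
        self._reset_response()


job = PullJob("https://example.com/image.raw")
job.request_headers.append("Authorization: Bearer old")
job.status, job.etag = 401, "abc"
job.payload += b"partial"

# Adjust the headers and re-request the same resource.
job.request_headers[0] = "Authorization: Bearer new"
job.restart()
```

After `restart()` the job again looks like a fresh request as far as the response is concerned, while the caller does not have to set up the headers a second time.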
Kai Lüke [Thu, 19 Feb 2026 07:01:06 +0000 (16:01 +0900)]
openssl-util: pass the UI callback for interactive PIN prompts
With the tpm2 provider and the tpm2tss engine, the auth process was
observed to fail because the provider/engine could not ask for the
PIN through the callback, resulting in:
"Failed to load private key from ...: Input/output error"
Apparently the default UI method is not enough and the key setup
functions expect an explicit method.
Pass the existing UI method through as callback for the key setup.
Oblivionsage [Wed, 18 Feb 2026 17:22:48 +0000 (18:22 +0100)]
pe-binary: wrap remaining LE fields with byte-swap macros
Follow-up to 02cab70acf5ca67e838d0d34860baacbf9fc3b6c. pe_hash(),
section_offset_cmp() and uki_hash() still had a bunch of raw accesses
to LE fields (e_lfanew, SizeOfHeaders, PointerToRawData, SizeOfRawData,
VirtualSize, certificate_table->Size) without le32toh(), so they'd
produce garbage on big-endian.
Also wrap VirtualSize in bootspec.c for consistency.
Daan De Meyer [Wed, 18 Feb 2026 20:27:45 +0000 (21:27 +0100)]
vpick: Make suffix a single string again instead of a strv
This was made a strv to handle either directories or raw images but
since we now handle that via multiple PickFilter instances, we don't
need suffixes to be a strv anymore.
Daan De Meyer [Wed, 18 Feb 2026 14:58:39 +0000 (15:58 +0100)]
machined: Skip root user namespace check for user managers
You can register whatever process you want in a user machined instance
that is running in the same namespace as pid 1, as machined won't be allowed
to do anything privileged that could be dangerous when running as a user
instance anyway.
We have to skip the check as user machined instances don't have
privileges to inspect pid1's user namespaces.
Dmitry V. Levin [Wed, 18 Feb 2026 08:00:00 +0000 (08:00 +0000)]
github/workflows: disable persisting credentials for actions/checkout
Set `persist-credentials: false` for actions/checkout.
By default, using `actions/checkout` causes a credential to be persisted on
disk. Subsequent steps may accidentally publicly persist the credential, e.g.
by including it in a publicly accessible artifact via actions/upload-artifact.
However, even without this, persisting the credential on disk is non-ideal
unless actually needed.
Dmitry V. Levin [Wed, 18 Feb 2026 08:00:00 +0000 (08:00 +0000)]
github/dependabot: set cooldown period
By default, Dependabot does not perform any cooldown on dependency updates.
In other words, a regularly scheduled Dependabot run may perform an update
on a dependency that was just released moments before the run began.
This presents both stability and supply-chain security risks.
To mitigate these risks, explicitly set Dependabot cooldown period to 7 days.
ID_INTEGRATION is not being updated with hwdb entries; assign the new
value to it when hwdb has been imported.
We still need the 65-integration.rules assignment for devices that aren't
in hwdb.
While at it, remove the unneeded check in 70-touchpad.rules. It was not
added for 70-joystick.rules, with the statement that if ID_INPUT_* is set
and ID_INPUT is not, there is a bug elsewhere. Also remove unneeded gotos
in both files.
Oblivionsage [Tue, 17 Feb 2026 18:39:05 +0000 (19:39 +0100)]
pe-binary: fix missing le16toh() on NumberOfSections in pe_hash/uki_hash
pe_hash() and uki_hash() pass pe_header->pe.NumberOfSections directly
to typesafe_qsort() and FOREACH_ARRAY() without le16toh(). On
big-endian (s390x), NumberOfSections=3 gets read as 0x0300 (768),
while pe_load_sections() correctly converts it and only allocates 3
sections. This makes qsort process 768 elements on a 3-element
buffer, causing a heap-buffer-overflow (confirmed with ASAN on
native s390x).
Wrap all three raw usages with le16toh() to match pe_load_sections().
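The misread described above can be reproduced in a few lines. This is a small Python illustration rather than the C code in question: it stores the value 3 as a 16-bit little-endian field and then interprets the same bytes in big-endian order, which is what a raw access on s390x amounts to. The variable names are illustrative only.

```python
import struct

# NumberOfSections is a 16-bit little-endian field; 3 is stored as 03 00.
raw = struct.pack("<H", 3)

# A raw access on a big-endian host interprets the bytes in native order.
misread = struct.unpack(">H", raw)[0]

# An explicit little-endian read is what le16toh() provides on any host.
correct = struct.unpack("<H", raw)[0]

print(hex(misread), misread)
print(correct)
```

The first value is the inflated element count that ends up in the sort and loop bounds; the second matches what pe_load_sections() allocates.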
dissect-image: measure Verity before making use of them
Let's hook up the dissection logic with the new measurement infra, and
issue the measurement after successfully unlocking an image, but before
returning to the caller.
Note that ideally we'd do this measurement in the kernel, so that we can
place it after authenticating the root hash, but before activating the
medium. One day we should be able to do that via eBPF based on userspace
policies, but for now, this would require too much kernel rework.
Let's however make sure our measurements only contain data that the
kernel could know too, so that we hopefully can move these measurements
to the kernel without changing their formatting.
pcrextend-util: add helpers for measuring roothash/signature of Verity volumes
This adds infrastructure for measuring Verity root hashes from
userspace, along with the issuer/serial of the signatures used to unlock
them.
We measure the triplet of volume name, root hash and issuer/serial. If
confext/sysext use different signing keys then this ensures the event
log carries information about the type of image measured.
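To illustrate why including the issuer/serial distinguishes image types, here is a rough Python sketch of a PCR-style extend over such a triplet. The serialization in `verity_event()` is purely hypothetical and is not the format used by pcrextend-util; only the extend rule (new value = hash of old value concatenated with the event digest) is the standard TPM construction.

```python
import hashlib


def extend(pcr: bytes, event: bytes) -> bytes:
    """PCR-style extend: hash the old value together with the event digest."""
    return hashlib.sha256(pcr + hashlib.sha256(event).digest()).digest()


def verity_event(volume: str, root_hash_hex: str, issuer: str, serial_hex: str) -> bytes:
    # Hypothetical serialization of the (volume, root hash, issuer/serial)
    # triplet, chosen only for this example.
    return f"{volume}:{root_hash_hex}:{issuer}:{serial_hex}".encode()


start = bytes(32)  # a freshly initialized register, all zeroes
root_hash = "ab" * 32

# Same root hash, but signed with different (made-up) keys.
as_sysext = extend(start, verity_event("root", root_hash, "CN=sysext-signer", "01"))
as_confext = extend(start, verity_event("root", root_hash, "CN=confext-signer", "02"))
```

Both measurements cover the same root hash, yet the resulting values differ, so a verifier replaying the event log can tell which signing key authorized the image.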
pkcs7-util: add helpers for extracting signer info from PKCS7 signatures
Once we start measuring Verity volumes as we activate them we want to
include information about the signature keys used, so that we can have
distinct ones for confext and for sysext and other purposes and thus have
a cryptographically protected hint about the kind of image we have
activated in the event log.
Ideally we'd measure a fingerprint of the signing certificate here, but
we don't have that here typically (as PKCS7 signatures used here
typically do not embed that), hence use the next best thing: the issuer
name and the serial number.
tpm2-setup: introduce nvpcr for measuring Verity images
I think it's crucial we start to measure Verity images as we activate
them, so that the event log has a full trace of the composition of the
system. Hence let's introduce a new NvPCR for this purpose, under the
name "verity".
pcrextend: allow access to the userspace event type field when measuring something
I think we should move most measurements out of the individual tools
requesting them and into the pcrextend service via Varlink, so that
fewer components require access to the TPM.
This only works however, if we can actually write full-blown event log
records via this mechanism, and for that we still were missing access to
the userspace event type we insert into the event log. Add that.
cryptsetup: move default choice of nvpcr for keyslots from generator into cryptsetup
Let's pick the default NvPCR name to use inside of cryptsetup itself, instead
of in the generator. I think this is the better choice, since it means
the default can also be used if the regular veritytab generator is used
instead of the gpt-auto generator.
r-vdp [Sun, 11 Jan 2026 18:49:34 +0000 (19:49 +0100)]
systemd-boot: add a preferred setting that's similar to default but avoids booting known-bad entries
Motivation:
Currently, when setting the default boot pattern, boot assessment status
is not taken into account. This means that with boot assessment enabled,
when an explicit boot entry is configured as the default entry using an
EFI var, as is common for instance in A/B boot schemes, the configured
entry will be booted indefinitely, regardless of the entry's boot
assessment status.
In order to allow for this use case in combination with boot assessment,
we introduce a new `preferred` keyword, both in the config file and in the
bootctl CLI, that acts very similar to the existing `default` keyword but
takes boot assessment into account and never selects any entries that
have been marked as bad.
If the preferred pattern does not resolve to any bootable entry, and a
default pattern is also specified, then the default pattern will be
considered next, and we may then still select a known-bad entry to be
booted.
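The selection order described above can be sketched roughly as follows. This is a simplified Python model, not the systemd-boot implementation: the entry representation, the function name and the "first match wins" rule are assumptions made for the example, and the fallbacks the boot loader applies when no pattern matches are left out.

```python
from fnmatch import fnmatch


def pick_entry(entries, preferred=None, default=None):
    """entries: list of (entry_id, is_bad) tuples in menu order.

    Returns the chosen entry id, or None if neither pattern resolves.
    """
    # 'preferred' honours boot assessment: known-bad entries are never picked.
    if preferred:
        for entry_id, is_bad in entries:
            if fnmatch(entry_id, preferred) and not is_bad:
                return entry_id

    # 'default' ignores boot assessment, so it may still pick a bad entry.
    if default:
        for entry_id, _is_bad in entries:
            if fnmatch(entry_id, default):
                return entry_id

    return None


entries = [("slot-a.conf", True), ("slot-b.conf", False)]

only_preferred = pick_entry(entries, preferred="slot-a.conf")
with_fallback = pick_entry(entries, preferred="slot-a.conf", default="slot-a.conf")
glob_preferred = pick_entry(entries, preferred="slot-*.conf")
```

With slot A marked bad, the preferred pattern alone resolves to nothing, the combination falls through to the default pattern and selects the bad entry, and a glob that also covers slot B picks the good one.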
Yu Watanabe [Tue, 17 Feb 2026 18:27:41 +0000 (03:27 +0900)]
boot: Fix UKI boot for kernels with non-zero ImageBase (#40429)
The current code incorrectly subtracts ImageBase from section
VirtualAddress values when loading sections into memory. This is based
on a misunderstanding of the PE specification.
VirtualAddress in section headers is the address of the first byte of
the section relative to the image base when the section is loaded into
memory. In other words, VirtualAddress is already an RVA measured from
the image base, it is definitely NOT an absolute address that needs to
be adjusted.
So when loading a PE image into a newly allocated buffer, sections
should be copied to buffer + VirtualAddress, regardless of what
ImageBase says. The ImageBase field merely indicates the *preferred*
load address, it does not affect how section RVAs are interpreted.
This happens to not cause issues when ImageBase was 0 (since
VirtualAddress - 0 = VirtualAddress), which is why this bug went
undetected on modern kernels. However, it fails with kernels that have
non-zero ImageBase values.
So let's remove the nonsensical VirtualAddress < ImageBase check, and
remove the ImageBase subtractions from section loading offsets. This
lets all kernel UKIs work properly again.
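The corrected placement rule can be illustrated with a small Python model. It is a sketch under simplifying assumptions (no alignment, relocations or zero-fill handling), and the `Section` type and `load_sections()` helper are made up for the example; the point is only that the copy offset is the section's VirtualAddress and that ImageBase plays no part in it.

```python
from dataclasses import dataclass


@dataclass
class Section:
    virtual_address: int  # RVA, already relative to the image base
    data: bytes


def load_sections(sections, size_of_image, image_base):
    """Copy each section to buffer + VirtualAddress.

    image_base is accepted only to show that it is deliberately unused:
    it is the preferred load address, not an offset to subtract.
    """
    buf = bytearray(size_of_image)
    for s in sections:
        start = s.virtual_address
        end = start + len(s.data)
        if end > size_of_image:
            raise ValueError("section does not fit into the image")
        buf[start:end] = s.data
    return buf


sections = [Section(0x1000, b"text"), Section(0x2000, b"data")]

zero_base = load_sections(sections, 0x3000, image_base=0)
high_base = load_sections(sections, 0x3000, image_base=0x1000000)
```

Both calls produce the same buffer layout. Subtracting ImageBase from the offsets would only leave them unchanged in the first case, which is why the bug stayed hidden while ImageBase was 0.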
Daan De Meyer [Mon, 16 Feb 2026 12:14:58 +0000 (13:14 +0100)]
sd-bus: Don't fork unnecessarily to connect to container
Let's check if we're already in the right namespaces and call connect()
directly if that's the case. This can easily happen when the machine is
specified as .host or so.
Daan De Meyer [Tue, 17 Feb 2026 14:36:00 +0000 (15:36 +0100)]
namespace-util: Do is_our_namespace() checks first in namespace_enter()
These checks may rely on /proc on older kernels which we could lose access
to by joining namespaces so let's do all the checks first and then join
namespaces.
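The "check everything first, then join" ordering can be sketched like this. It is a Python model with hypothetical helper names, not the namespace-util code: namespace identity is compared via the device and inode numbers of the namespace files, and the actual join is delegated to a caller-supplied callback (which in real code would be something like setns()).

```python
import os


def same_namespace(path_a, path_b):
    """Two namespace files refer to the same namespace if dev and inode match."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def enter_namespaces(targets, join, self_dir="/proc/self/ns"):
    """targets maps a namespace name (e.g. "mnt") to the path of the target.

    All comparisons run before the first join, because joining a namespace
    may take away access to the files the comparisons need.
    """
    to_join = [
        (name, path)
        for name, path in targets.items()
        if not same_namespace(os.path.join(self_dir, name), path)
    ]

    # Only now start joining; namespaces we are already in are skipped.
    for name, path in to_join:
        join(name, path)

    return [name for name, _ in to_join]
```

If the returned list is empty, the caller is already in all requested namespaces and can take the direct path, for example calling connect() without forking first.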
NEWS: clarify the change for non-system accounts in v260 vs. v259
In 5c05a339c6665e3a35f6000a46dcd1da80fcdced I retroactively changed the NEWS
entry for v259. But this is very confusing, because it looks like the original
change never happened and it's not clear what is being reverted.
Let's restore the original text, and just add a short note, but then move
the new text to the section for v260.