Mike Yuan [Thu, 12 Feb 2026 01:58:35 +0000 (02:58 +0100)]
vmspawn: clean up OVMF secure boot support check a bit
find_ovmf_config() would do filtering based on arg_secure_boot
already, hence the mismatch can only occur if we're using
user-specified firmware. So be explicit about this in log.
Mike Yuan [Wed, 11 Feb 2026 22:15:24 +0000 (23:15 +0100)]
parse-argument: make parse_tristate_argument() do something useful
I expressed the issue I have with parse_tristate_argument()
in #37751: it doesn't add any value over direct use of parse_tristate();
on the contrary, it provides no means to reset the arg to the "auto"/-1 state.
The only reason it exists is that we need an int-typed ret param.
Since the previous attempt to address this mess failed, let's
try to make the function more useful by making it accept "auto".
I figure this is useful on its own.
As requested in
https://github.com/systemd/systemd/pull/40652#discussion_r2831833996,
the function name is suffixed with _with_auto() to establish
that "auto" is handled internally.
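For illustration only, a minimal sketch of what "accepting auto" could look like. The function name and the list of accepted spellings are assumptions made for this example and are not taken from the systemd sources:

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for a tristate parser with "auto" support.
 * Writes 1 (true), 0 (false) or -1 ("auto") to *ret.
 * Returns 0 on success, -EINVAL on unparsable input (leaving *ret untouched).
 * The accepted spellings are an assumption, not systemd's exact list. */
static int parse_tristate_with_auto_sketch(const char *s, int *ret) {
        static const char *const yes[] = { "1", "yes", "y", "true", "t", "on" };
        static const char *const no[]  = { "0", "no", "n", "false", "f", "off" };

        if (!s || !ret)
                return -EINVAL;

        /* "auto" maps to the -1 state, which a plain boolean parser cannot express */
        if (strcasecmp(s, "auto") == 0) {
                *ret = -1;
                return 0;
        }

        for (size_t i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
                if (strcasecmp(s, yes[i]) == 0) {
                        *ret = 1;
                        return 0;
                }

        for (size_t i = 0; i < sizeof(no) / sizeof(no[0]); i++)
                if (strcasecmp(s, no[i]) == 0) {
                        *ret = 0;
                        return 0;
                }

        return -EINVAL;
}
```

The int output parameter is what allows the third state: a bool could not carry -1.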
Michael Vogt [Fri, 20 Feb 2026 10:20:02 +0000 (11:20 +0100)]
machine: switch CleanPool to SD_VARLINK_REQUIRES_MORE
CleanPool requires --more to be set and checks that in
`vl_method_clean_pool`. By switching to SD_VARLINK_REQUIRES_MORE
this is handled automatically and is clearer to
varlink users.
Based on the comment from Lennart in
https://github.com/systemd/systemd/pull/40650#discussion_r2832378002
and the work done by Mike in 09388a6b9e4 (thanks!).
Julian Sparber [Thu, 12 Feb 2026 16:32:32 +0000 (17:32 +0100)]
repart-varlink: Consider only managed partitions for size errors
Report DiskTooSmall only if the partitions managed by repart don't fit on
the disk. Otherwise, if the disk is already full of foreign partitions, we
would always report DiskTooSmall instead of InsufficientFreeSpace.
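The distinction described above could be sketched roughly as follows. This is a simplified illustration: the enum, the function name and the three byte counts are hypothetical, and the real repart logic also has to account for metadata, padding and alignment:

```c
#include <stdint.h>

/* Hypothetical result type for this sketch only. */
typedef enum {
        FIT_OK,
        FIT_INSUFFICIENT_FREE_SPACE,
        FIT_DISK_TOO_SMALL,
} FitResult;

/* disk_size:    total size of the disk, in bytes
 * foreign_size: bytes occupied by partitions not managed by repart
 * managed_size: bytes needed by the partitions repart manages */
static FitResult classify_fit(uint64_t disk_size, uint64_t foreign_size, uint64_t managed_size) {
        /* The managed partitions would not fit even on an otherwise empty disk */
        if (managed_size > disk_size)
                return FIT_DISK_TOO_SMALL;

        /* They would fit in principle, but foreign partitions take up the room */
        uint64_t available = foreign_size > disk_size ? 0 : disk_size - foreign_size;
        if (managed_size > available)
                return FIT_INSUFFICIENT_FREE_SPACE;

        return FIT_OK;
}
```

With this split, a disk that is full of foreign partitions yields the free-space error rather than the too-small error.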
Julian Sparber [Thu, 12 Feb 2026 16:28:43 +0000 (17:28 +0100)]
repart-varlink: Calculate the size of foreign partitions
To decide whether the disk is too small or has insufficient free space we
need to know how much of the disk is filled with foreign partitions.
The calculated size is used in a future commit.
Julian Sparber [Wed, 19 Nov 2025 17:30:04 +0000 (18:30 +0100)]
repart: Sum partitions size to get current disk size instead of using total size
When working on disks, the disk may have a total size bigger than the
actually allocated size; therefore sum up the current partitions to
calculate the current disk size instead of assuming that the entire disk
is currently allocated.
Let's also rename the "metric_prefix" to "name", because it's actually
the service name, and by giving it this generic name we can use it
reasonably in log messages.
Let's stay close to Varlink's naming rules and insist that metrics
prefixes must be valid varlink interface names, and suffixes are valid
varlink field names.
The former rule is clear: because a metric <x>.<y> can only be provided
by a varlink service <x>, it is obvious we should validate them the
same way. Validating the suffix via varlink field rules is not that
obvious, but I think it makes sense to stay close to Varlink naming
rules if we already started out at one place.
Yu Watanabe [Fri, 20 Feb 2026 07:18:07 +0000 (16:18 +0900)]
mstack: fix resource leak on failure path
This makes mstack_load() require 'ret', as clearing the loaded
mstack without use is meaningless. All callers already pass non-NULL for
the argument.
In a way, metrics are a key-value concept, where the key is a triplet of
metrics family name, object name, and "fields". Let's put them together
in the varlink call, and put the value last, separately from that.
Also, update docs a bit, i.e. be explicit about the metrics *family* name
everywhere.
Daan De Meyer [Wed, 18 Feb 2026 18:30:12 +0000 (19:30 +0100)]
uid-range: Handle same userns in uid_range_load_userns_by_fd()
If we're asked to look up our own user namespace mapping, don't go
via fd as trying to setns() to our own user namespace in
userns_enter_and_pin() would fail with EPERM as the kernel doesn't
allow switching to your own userns.
repart: return 1 from probe_sector_size_prefer_ioctl() on block device success
probe_sector_size() returns 1 when it successfully determines the sector size,
0 when falling back to the default. blockdev_get_sector_size() returns 0 on
success. probe_sector_size_prefer_ioctl() was passing the return value of
blockdev_get_sector_size() through directly, so callers checking r > 0 to detect a
successfully probed sector size never saw it for block devices.
In context_load_partition_table(), this caused fs_secsz to stay at 4096 bytes
even on 512-byte sector block devices, making verity hash partition sizes wrong
unless --sector-size=512 was passed explicitly.
Fix by returning 1 on success from the block device path, matching the
probe_sector_size() convention.
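The mismatch between the two return conventions can be illustrated with stubs. Everything below is a hypothetical stand-in for the real functions (names, the simulate_error parameter and the fixed 512-byte answer are assumptions for this sketch); only the conventions themselves come from the description above:

```c
#include <errno.h>
#include <stdint.h>

/* Stub following the "0 on success, negative errno on failure" convention. */
static int blockdev_sector_size_stub(int simulate_error, uint32_t *ret) {
        if (simulate_error)
                return -ENOTTY;
        *ret = 512;
        return 0;
}

/* Wrapper following the "1 = probed, 0 = fell back to default" convention.
 * Passing r through unchanged would report 0 ("not probed") on success. */
static int probe_prefer_ioctl_sketch(int simulate_error, uint32_t *ret) {
        int r = blockdev_sector_size_stub(simulate_error, ret);
        if (r < 0)
                return r;
        return 1; /* translate "0 on success" into "1 = probed" */
}

/* Caller in the style described above: only r > 0 overrides the default. */
static uint32_t pick_fs_sector_size(int simulate_error) {
        uint32_t ssz = 0, fs_secsz = 4096;

        int r = probe_prefer_ioctl_sketch(simulate_error, &ssz);
        if (r > 0)
                fs_secsz = ssz;

        return fs_secsz;
}
```

With the translation in place the caller picks up 512; without it, the r > 0 check never fires and the 4096 default stays in effect.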
importd: add support for downloading OCI images (#39621)
This adds the ability to download OCI images via importd.
Not a fan of the OCI format tbh, in particular its security properties
are a bit sad. But I guess it exists and is very popular, hence we might
as well add support for it, even if it comes at much weaker security
properties than DDIs.
Bring Bash profile for reporting context via Operating System Commands (OSC) into compliance with specifications (#40696)
This script fails to comply with the spec it's designed to implement,
[UAPI.15 OSC 3008: Hierarchical Context
Signalling](https://uapi-group.org/specifications/specs/osc_context/),
and fails to correctly utilize the specs provided by
[POSIX.1-2024](https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/mindex.html)
and [man 1
bash](https://www.man7.org/linux/man-pages//man1/bash.1.html); improve
compliance.
Changes are made in small atomic commits, with more detailed
descriptions of the work done in each message.
elf2efi: import whole module, not individual symbols
When reading the code, it was hard to figure out if the given name was
imported or a local class. And the renaming of imports also made it
harder to look things up online. Arguably, the deeply nested import
structure and inconsistent naming in elftools is partially to blame:
there is just no good way to make this look nice. But anyway, let's use
the usual style of importing the module and using names prefixed with
the module path so that the origin of imported names is clear.
elftools.elf.elffile is imported separately, because a) it needs to be
imported separately anyway because the module does lazy imports
internally, b) the name already indicates the origin, c) it is used in
quite a few places so the shorter name is nice.
This introduces PinnedResources as a structure combining pinned
references to a root directory, root image, or root mstack. This is not
only easier to work with, but essential to make certain unpriv things
work, as we need some mechanism to pin resources before we drop into a
userns which might possibly not provide access anymore to those
resources.
Hence this does two things: introduce the new structure, and immediately
hook it up so that we pin things properly before dropping into userns,
and then makes use of this after dropping the right way, and enables
unpriv userns operation.
The concept is generic enough to eventually implement extension images +
mount images with the same structure, but in order to keep the changes
manageable this is left for another time.
(This also makes one further clean-up: client-side verity-reuse checks
are moved server side if we are unpriv. Previously we'd do them client
side, but they were doomed to fail because of lack of privs. Hence let's
drop the client side if we are unpriv and purely do them server-side in
that case.)
So far we opened a new Varlink connection for every mountfsd/nsresourced
method call. Given each tool only does a very small number of calls
(usually 1…5) on them and the connections are cheap this is not too
wasteful. Nonetheless, let's do something about it, and allow reusing
the connection for multiple calls.
This not only makes things a bit more efficient, but has one more
important benefit: Varlink connections pin the security context of the
client when connecting. This means that varlink method calls done with a
connection established while some code was privileged will still operate
as privileged once privs are dropped, until the connection is closed.
This pinning effect is really nice, as it gives us behaviour in a
"capability system" like scheme. Later code is going to use that to
continue doing certain priv userns ops even after unsharing userns and
becoming fully unpriv.
namespace: extend bind mount ignore field to permission issues
A later commit will add transient allocation of user namespaces with
dynamic UID range assignment. That creates certain permission issues.
Let's hence allow them to be handled gracefully in case the 'ignore'
field is set for a mount.
namespace: port mount_private_apivfs() to fsopen() and friends
This is not just refactoring, but has the big benefit that it makes us
independent from a temporary directory we might not have enough access
to create. (This matters with the new PrivateUsers=managed).
core: introduce exec_context_with_rootfs_strict() as a stricter version of exec_context_with_rootfs()
We have two very similar checks in place: in some contexts we want to
know if *any* RootDirectory= is configured, in the other we want to
suppress if it is configured to our regular root. Let's add a helper for
both (even if we only need it once), to make the mirrored behaviour
clear.
pull-job: make sure pull_job_restart() can be used to fetch the same resource again, just with new headers
Let's flush out all response state from the job, but let's keep the
request data previously configured, in particular the headers set. This
is useful to re-request a resource, just with a slightly modified or
identical URL.
Carolina Jubran [Mon, 16 Feb 2026 09:24:53 +0000 (11:24 +0200)]
udev: grant read access to PTP devices for unprivileged users
Change the default udev rule for /dev/ptp* from 0660 to 0664,
allowing unprivileged users read-only access.
NIC telemetry and hardware logs often use device timestamps that must
be correlated with host time via read-only PTP ioctls (e.g.
cross-timestamp queries). Requiring privileged access makes these
workflows unnecessarily restrictive.
Older kernels lacked proper permission checks in some PTP ioctls.
Kernel commit b4e53b15c04e3852949003752f48f7a14ae39e86 ("ptp: Add PHC
file mode checks. Allow RO adjtime() without FMODE_WRITE.") introduces
the necessary file mode validation, ensuring that read access does not
permit clock modification or configuration changes, which still require
write permissions.
This commit has been backported to all actively maintained stable
kernel branches.
Kai Lüke [Thu, 19 Feb 2026 07:01:06 +0000 (16:01 +0900)]
openssl-util: pass the UI callback for interactive PIN prompts
With the tpm2 provider and the tpm2tss engine, the auth process was
observed to fail because the provider/engine could not ask for the
PIN through the callback, resulting in:
"Failed to load private key from ...: Input/output error"
Apparently the default UI method is not enough and the key setup
functions expect an explicit method.
Pass the existing UI method through as callback for the key setup.