Mike Yuan [Mon, 23 Feb 2026 06:48:43 +0000 (07:48 +0100)]
logs-show: clean up journal_entry_to_json() a bit
* Make sure ret is initialized on success return
* Drop unneeded 'object' variable
* No need to ref/unref json objects when constructing
intermediary array
Mike Yuan [Mon, 23 Feb 2026 08:30:17 +0000 (09:30 +0100)]
units/user/systemd-journalctl.socket: drop MaxConnectionsPerSource=
For AF_UNIX sockets connection sources are accounted for
based on UID, hence in user scope this effectively
limits total number of connections, which is not really
desirable.
This follows the existing practice for
systemd-journal-{upload,gatewayd}.service,
as I think allocating a full-blown user
specifically for this purpose is overkill.
And with DynamicUser=yes we can also take
advantage of implied sandboxing.
man/systemd.mstack: use <varname> instead of <variable>
Otherwise, `<variable>location</variable>` is rendered:
```
[2365/2925] Generating man/systemd.mstack.7 with a custom command
Element variable in namespace '' encountered in para, but no template matches.
```
resolved: Add ifindex=0 support for BrowseServices to browse all mDNS interfaces
Avahi provides AVAHI_IF_UNSPEC (-1) to browse mDNS services on all
interfaces simultaneously. Currently, systemd-resolved's BrowseServices
varlink API requires a specific interface index and lacks the ability to browse on
all available interfaces.
This change adds support for ifindex = 0 to mean "browse on all mDNS-enabled
interfaces" to match the Avahi API.
When ifindex = 0 is specified the browser will now iterate all mDNS scopes
instead of a single interface.
This enables applications to discover services on any network interface
without needing to know the specific interface index in advance.
Assisted-by: Claude Opus 4.6 (Eclipse Theia IDE AI)
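A minimal Python sketch of the scope selection described in this commit
message. `Scope` and `select_mdns_scopes()` are hypothetical names used only
for illustration, not systemd-resolved code; the only rule taken from the
message is that ifindex 0 selects every mDNS scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    ifindex: int    # interface the scope is bound to
    protocol: str   # illustrative labels such as "mdns" or "llmnr"


def select_mdns_scopes(scopes, ifindex):
    """Return the mDNS scopes a browse request should cover.

    ifindex == 0 means "all mDNS-enabled interfaces"; any other value
    restricts the browse to that single interface.
    """
    mdns = [s for s in scopes if s.protocol == "mdns"]
    if ifindex == 0:
        return mdns
    return [s for s in mdns if s.ifindex == ifindex]
```

A caller that does not know the interface in advance would pass 0 and
receive results from every mDNS scope in the list.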
resolved: Track per-service item ifindex in DnssdDiscoveredService
The interface where each service was discovered needs to be remembered
so it can be correctly reported when the service is later removed.
Previously, service removal would use sb->ifindex, losing the actual
interface information from the original discovery.
This change:
- Adds an ifindex field to DnssdDiscoveredService struct
- Stores the discovered interface index when adding new services,
preferring the per-item ifindex from DnsAnswerItem over the service
browser's ifindex
- Uses the stored ifindex when reporting service removal events
This ensures that service removal notifications include the correct
interface index where the service was originally discovered, matching
the behavior of the corresponding service addition notifications.
Assisted-by: Claude Opus 4.6 (Eclipse Theia IDE AI)
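The bookkeeping listed in this commit message can be sketched in Python as
follows. The `ServiceBrowser` class and its methods are hypothetical
stand-ins for illustration; they only model the rule that the per-item
ifindex is preferred on add and the stored value is reported on removal.

```python
class ServiceBrowser:
    def __init__(self, ifindex):
        self.ifindex = ifindex   # 0 when browsing all interfaces
        self.services = {}       # service name -> ifindex it was found on

    def add(self, name, item_ifindex=0):
        # Prefer the per-item ifindex; fall back to the browser's own.
        ifindex = item_ifindex if item_ifindex > 0 else self.ifindex
        self.services[name] = ifindex
        return ("added", name, ifindex)

    def remove(self, name):
        # Report the ifindex stored at discovery time, not self.ifindex.
        ifindex = self.services.pop(name)
        return ("removed", name, ifindex)
```

With a browser created for ifindex 0, a service found on interface 3 is
reported as removed from interface 3 rather than from 0.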
Yu Watanabe [Sun, 22 Feb 2026 20:38:03 +0000 (05:38 +0900)]
sd-device: do not try to remove previous tag indexes
The removed code in device_tag_index() in fact does nothing,
as sd_device.all_tags is never cleared. Moreover, the code is not only
meaningless but also logically wrong, as the symlinks
in /run/udev/tags/ should be 'sticky', hence we should not even try to
remove them.
However, when TAG= (rather than TAG+=) is specified, then the tags
assigned in the previous events were also cleared.
This fixes the issue and now symlinks in /run/udev/tags/ are really
'sticky'.
Fortunately, TAG= is mostly unused, so on almost all systems the issue
has no effect and the fix changes nothing.
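To illustrate the 'sticky' behaviour, here is a small Python model. It is an
assumption-laden sketch, not udev code: `Device`, `apply()` and `tag_index()`
are hypothetical names, and the model only captures that TAG= replaces the
current tags while the set of all tags ever assigned keeps growing.

```python
class Device:
    def __init__(self):
        self.current_tags = set()   # tags in effect after the latest rule
        self.all_tags = set()       # every tag ever assigned; never cleared

    def apply(self, op, tag):
        if op == "=":               # TAG=: replace the current tags
            self.current_tags.clear()
        elif op != "+=":            # TAG+=: append to the current tags
            raise ValueError(f"unsupported operator: {op!r}")
        self.current_tags.add(tag)
        self.all_tags.add(tag)

    def tag_index(self):
        # Entries that would exist under /run/udev/tags/ in this model.
        return sorted(self.all_tags)
```

After `apply("+=", "seat")` followed by `apply("=", "uaccess")`, the current
tags contain only "uaccess", while the index still lists both tags.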
journalctl: add new varlink GetEntries endpoint (#40650)
journalctl: add new varlink read service to get entries
We already have some varlink support for the journal to perform
some actions like `Rotate`. It would be nice to be able to query
the journal via varlink too so this commit adds a new varlinkctl
based journal service that exposes a single GetEntries() call
to retrieve journal entries. Basic filtering is supported and
we can expand the API as needed.
This is a separate `io.systemd.JournalControl` [1] service from the
existing `io.systemd.Journald` to decouple read and write (thanks
to Lennart for suggesting this).
This also extracts some shared helper so that we do not duplicate
code when generating the json or when adding the filters.
[1] The name mirrors the bootctl->io.systemd.BootControl naming.
Luca Boccassi [Sat, 21 Feb 2026 11:27:37 +0000 (11:27 +0000)]
core: validate ref_uid before checking in AttachProcesses method
ref_uid is initialized to invalid, and is only set in some
circumstances. The AttachProcesses method will attempt to check it,
and assert that it is valid. Check beforehand.
Michael Vogt [Tue, 10 Feb 2026 15:27:58 +0000 (16:27 +0100)]
journalctl: add new varlink read service to get entries
We already have some varlink support for the journal to perform
some actions like `Rotate`. It would be nice to be able to query
the journal via varlink too so this commit adds a new varlinkctl
based journal service that exposes a single GetEntries() call
to retrieve journal entries. Basic filtering is supported and
we can expand the API as needed.
This is a separate `io.systemd.JournalControl` [1] service from the
existing `io.systemd.Journald` to decouple read and write (thanks
to Lennart for suggesting this).
This also extracts some shared helper so that we do not duplicate
code when generating the json or when adding the filters.
[1] The name mirrors the bootctl->io.systemd.BootControl naming.
Mike Yuan [Thu, 12 Feb 2026 01:58:35 +0000 (02:58 +0100)]
vmspawn: clean up OVMF secure boot support check a bit
find_ovmf_config() would do filtering based on arg_secure_boot
already, hence the mismatch can only occur if we're using
user-specified firmware. So be explicit about this in log.
Mike Yuan [Wed, 11 Feb 2026 22:15:24 +0000 (23:15 +0100)]
parse-argument: make parse_tristate_argument() do something useful
I expressed the issue I have with parse_tristate_argument()
in #37751: it doesn't add any value over direct use of parse_tristate();
on the contrary, it provides no means to reset the arg to the "auto"/-1 state.
The only reason it existed is that we need an int-typed ret param.
Since the previous attempt to address this mess failed, let's
try to make the function more useful by making it accept "auto".
I figure this is useful on its own.
As requested in
https://github.com/systemd/systemd/pull/40652#discussion_r2831833996,
the function name is suffixed with _with_auto() to establish
that "auto" is handled internally.
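A rough Python sketch of the behaviour described here: a tristate parser
that maps "auto" to -1 in addition to the boolean values. This is an
illustration only, not the C implementation; the accepted boolean spellings
and the use of ValueError are assumptions made for the example.

```python
_TRUE = {"1", "yes", "y", "true", "t", "on"}
_FALSE = {"0", "no", "n", "false", "f", "off"}


def parse_tristate_with_auto(value):
    """Return 1 (true), 0 (false) or -1 ("auto"); raise ValueError otherwise."""
    v = value.strip().lower()
    if v == "auto":
        return -1       # resets the setting to its "auto" state
    if v in _TRUE:
        return 1
    if v in _FALSE:
        return 0
    raise ValueError(f"not a boolean or 'auto': {value!r}")
```

The point of handling "auto" inside the parser is that callers get a way to
reset an option to -1 without special-casing the string themselves.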
Michael Vogt [Fri, 20 Feb 2026 10:20:02 +0000 (11:20 +0100)]
machine: switch CleanPool to SD_VARLINK_REQUIRES_MORE
CleanPool requires --more to be set and checks that in
`vl_method_clean_pool`. By switching to SD_VARLINK_REQUIRES_MORE
this is handled automatically and is clearer to
varlink users.
Based on the comment from Lennart in
https://github.com/systemd/systemd/pull/40650#discussion_r2832378002
and the work done by Mike in 09388a6b9e4 (thanks!).
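The difference between a per-method check and a declarative flag can be
sketched in Python. Everything below is hypothetical: `REQUIRES_MORE` merely
stands in for SD_VARLINK_REQUIRES_MORE, and `dispatch()`, `ExpectedMore` and
the method table are invented for the example and are not the sd-varlink API.

```python
REQUIRES_MORE = 1 << 0   # hypothetical stand-in for SD_VARLINK_REQUIRES_MORE


class ExpectedMore(Exception):
    """Raised when a streaming method is called without 'more'."""


def dispatch(methods, name, params, more=False):
    handler, flags = methods[name]
    # Central check: handlers no longer validate 'more' themselves.
    if flags & REQUIRES_MORE and not more:
        raise ExpectedMore(name)
    return handler(params)


def clean_pool(params):
    # Placeholder handler body; the real method is not modelled here.
    return "ok"


METHODS = {"CleanPool": (clean_pool, REQUIRES_MORE)}
```

Because the requirement lives in the method table, it is visible to anyone
inspecting the interface rather than hidden inside the handler.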
Julian Sparber [Thu, 12 Feb 2026 16:32:32 +0000 (17:32 +0100)]
repart-varlink: Consider only managed partitions for size errors
Report DiskTooSmall only if partitions managed by repart don't fit the
disk. Otherwise, if the disk is already full with foreign partitions, we
would always report DiskTooSmall instead of InsufficientFreeSpace.
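One way to read the distinction, as a Python sketch. The function name, its
parameters and the exact comparisons are assumptions for illustration; the
commit message only states that DiskTooSmall should depend on the managed
partitions alone.

```python
def classify_size_error(disk_size, foreign_size, managed_min_size):
    """Return None if the managed partitions fit, else an error label.

    All sizes are in bytes. foreign_size is the space taken by partitions
    that repart does not manage.
    """
    if managed_min_size > disk_size:
        # Even an otherwise empty disk could not hold the managed partitions.
        return "DiskTooSmall"
    if managed_min_size > disk_size - foreign_size:
        # The disk is large enough, but foreign partitions occupy the space.
        return "InsufficientFreeSpace"
    return None
```

For a 100-byte disk with 90 bytes of foreign partitions and 20 bytes of
managed partitions, this sketch reports InsufficientFreeSpace, not
DiskTooSmall.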
Julian Sparber [Thu, 12 Feb 2026 16:28:43 +0000 (17:28 +0100)]
repart-varlink: Calculate the size of foreign partitions
To decide whether the disk is too small or has insufficient free space we
need to know how much of the disk is filled with foreign partitions.
The calculated size is used in a future commit.
Julian Sparber [Wed, 19 Nov 2025 17:30:04 +0000 (18:30 +0100)]
repart: Sum partitions size to get current disk size instead of using total size
When working on disks the disk may have a total size bigger than the
actual allocated size, therefore sum up the current partitions to
calculate the current disk size instead of assuming that the entire disk
is currently allocated.
Let's also rename the "metric_prefix" to "name", because it's actually
the service name, and by giving it this generic name we can use it
reasonably in log messages.
Let's stay close to Varlink's naming rules and insist that metrics
prefixes must be valid varlink interface names, and suffixes are valid
varlink field names.
The former rule is clear: because a metric <x>.<y> can only be provided
by a varlink service <x>, it is obvious we should validate them the
same way. Validating the suffix via varlink field rules is not that
obvious, but I think it makes sense to stay close to Varlink naming
rules if we already started out at one place.
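A Python sketch of the validation rule described above. The two regular
expressions are my approximation of the Varlink grammar for interface and
field names and should be treated as assumptions; `split_metric_name()` is a
hypothetical helper, as is the metric name in the usage note.

```python
import re

# Approximations of the Varlink naming grammar (assumed, not copied from systemd).
INTERFACE_NAME = re.compile(
    r"[A-Za-z](-*[A-Za-z0-9])*(\.[A-Za-z0-9](-*[A-Za-z0-9])*)+"
)
FIELD_NAME = re.compile(r"[A-Za-z](_?[A-Za-z0-9])*")


def split_metric_name(name):
    """Split '<x>.<y>' into (prefix, suffix) and validate both parts."""
    prefix, sep, suffix = name.rpartition(".")
    if not sep or not INTERFACE_NAME.fullmatch(prefix):
        raise ValueError(f"prefix is not a valid interface name: {name!r}")
    if not FIELD_NAME.fullmatch(suffix):
        raise ValueError(f"suffix is not a valid field name: {name!r}")
    return prefix, suffix
```

For example, `split_metric_name("io.systemd.Example.RequestCount")` yields
the prefix `io.systemd.Example` and the suffix `RequestCount`, while a suffix
such as `bad-suffix` is rejected.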
Yu Watanabe [Fri, 20 Feb 2026 07:18:07 +0000 (16:18 +0900)]
mstack: fix resource leak on failure path
This makes mstack_load() require 'ret', as clearing the loaded
mstack without use is meaningless. All callers already pass non-NULL for
the argument.
In a way, metrics are a key-value concept, where the key is a triplet of
metrics family name, object name, and "fields". Let's put them together
in the varlink call, and put the value last, separately from that.
Also, update docs a bit, i.e. be explicit about the metrics *family* name
everywhere.
Daan De Meyer [Wed, 18 Feb 2026 18:30:12 +0000 (19:30 +0100)]
uid-range: Handle same userns in uid_range_load_userns_by_fd()
If we're asked to look up our own user namespace mapping, don't go
via fd as trying to setns() to our own user namespace in
userns_enter_and_pin() would fail with EPERM as the kernel doesn't
allow switching to your own userns.
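The shortcut can be modelled in Python as below. This is a sketch under
assumptions, not the C code: namespaces are identified by the
(st_dev, st_ino) pair of their /proc/<pid>/ns/user entry, and the two reader
callbacks are hypothetical placeholders for "read our own uid_map" and
"enter the namespace via fd and read it there".

```python
def parse_uid_map(text):
    """Parse uid_map text into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, count = (int(field) for field in line.split())
        entries.append((inside, outside, count))
    return entries


def load_uid_range(target_id, own_id, read_own_map, read_via_fd):
    """target_id and own_id are (st_dev, st_ino) pairs of the userns files."""
    if target_id == own_id:
        # Same namespace: entering it would fail with EPERM, so read directly.
        return parse_uid_map(read_own_map())
    return parse_uid_map(read_via_fd())
```

When both identifiers match, only `read_own_map` is called and the
setns()-based path is never attempted.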
repart: return 1 from probe_sector_size_prefer_ioctl() on block device success
probe_sector_size() returns 1 when it successfully determines the sector size,
0 when falling back to the default. blockdev_get_sector_size() returns 0 on
success. probe_sector_size_prefer_ioctl() was passing the blockdev_get_sector_size()
return value through directly, so callers checking r > 0 to detect a
successfully probed sector size never saw it for block devices.
In context_load_partition_table(), this caused fs_secsz to stay at 4096 bytes
even on 512-byte sector block devices, making verity hash partition sizes wrong
unless --sector-size=512 was passed explicitly.
Fix by returning 1 on success from the block device path to match probe_sector_size()
convention.
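The return-value mismatch can be reproduced with a small Python model. The
function names mirror those in the commit message, but the bodies are
simplified assumptions: devices are plain dicts, the non-block-device path is
not modelled, and the caller is reduced to the `r > 0` check.

```python
def blockdev_get_sector_size(dev):
    # Models the helper's convention: 0 on success, plus the size.
    return 0, dev["sector_size"]


def probe_sector_size_prefer_ioctl(dev):
    """Return (r, size): r is 1 if the size was determined, 0 if not."""
    if dev.get("is_block_device"):
        r, size = blockdev_get_sector_size(dev)
        if r < 0:
            return r, None
        return 1, size   # the fix: previously r (0) was passed through
    return 0, None       # other paths are not modelled in this sketch


def pick_fs_sector_size(dev):
    fs_secsz = 4096      # caller's starting value, per the commit message
    r, size = probe_sector_size_prefer_ioctl(dev)
    if r > 0:            # only a positive return counts as "probed"
        fs_secsz = size
    return fs_secsz
```

With the fix, a block device reporting 512-byte sectors yields 512 here;
returning the helper's 0 unchanged would have left the value at 4096.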
importd: add support for downloading OCI images (#39621)
This adds the ability to download OCI images via importd.
Not a fan of the OCI format tbh, in particular its security properties
are a bit sad. But I guess it exists and is very popular, hence we might
as well add support for it, even if it comes at much weaker security
properties than DDIs.
Bring Bash profile for reporting context via Operating System Commands (OSC) into compliance with specifications (#40696)
This script fails to comply with the spec it's designed to implement,
[UAPI.15 OSC 3008: Hierarchical Context
Signalling](https://uapi-group.org/specifications/specs/osc_context/),
and fails to correctly utilize the specs provided by
[POSIX.1-2024](https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/mindex.html)
and [man 1
bash](https://www.man7.org/linux/man-pages//man1/bash.1.html); improve
compliance.
Changes are made in small atomic commits, with more detailed
descriptions of the work done in each message.
elf2efi: import whole module, not individual symbols
When reading the code, it was hard to figure out if the given name was
imported or a local class. And the renaming of imports also made it
harder to look things up online. Arguably, the deeply nested import
structure and inconsistent naming in elftools is partially to blame:
there is just no good way to make this look nice. But anyway, let's use
the usual style of importing the module and using names prefixed with
the module path so that the origin of imported names is clear.
elftools.elf.elffile is imported separately, because a) it needs to be
imported separately anyway because the module does lazy imports
internally, b) the name already indicates the origin, c) it is used in
quite a few places so the shorter name is nice.
This introduces PinnedResources as a structure combining pinned
references to a root directory, root image, or root mstack. This is not
only easier to work with, but essential to make certain unpriv things
work, as we need some mechanism to pin resources before we drop into a
userns which might possibly not provide access anymore to those
resources.
Hence this does two things: introduce the new structure, and immediately
hook it up so that we pin things properly before dropping into userns,
and then makes use of this after dropping the right way, and enables
unpriv userns operation.
The concept is generic enough to eventually implement extension images +
mount images with the same structure, but in order to keep the changes
manageable this is left for another time.
(This also makes one further clean-up: client-side verity-reuse checks
are moved server side if we are unpriv. Previously we'd do them client
side, but they were doomed to fail because of lack of privs. Hence let's
drop the client side if we are unpriv and purely do them server-side in
that case.)
So far we opened a new Varlink connection for every mountfsd/nsresourced
method call. Given each tool only does a very small number of calls
(usually 1…5) on them and the connections are cheap this is not too
wasteful. Nonetheless, let's do something about it, and allow reusing
the connection for multiple calls.
This not only makes things a bit more efficient, but has one more
important benefit: Varlink connections pin the security context of the
client when connecting. This means that varlink method calls done with a
connection established while some code was privileged will still operate
as privileged once privs are dropped, until the connection is closed.
This pinning effect is really nice, as it gives us behaviour in a
"capability system" like scheme. Later code is going to use that to
continue doing certain priv userns ops even after unsharing userns and
becoming fully unpriv.
namespace: extend bind mount ignore field to permission issues
A later commit will add transient allocation of user namespaces with
dynamic UID range assignment. That creates certain permission issues.
Let's hence allow them to be handled gracefully in case the 'ignore'
field is set for a mount.
namespace: port mount_private_apivfs() to fsopen() and friends
This is not just refactoring, but has the big benefit that it makes us
independent from a temporary directory we might not have enough access
to create. (This matters with the new PrivateUsers=managed).