Luca Boccassi [Mon, 22 Jun 2026 13:47:20 +0000 (14:47 +0100)]
machine-tags: extend syntax to support key/value pairs (#42618)
This is a minor extension, to move the machine tags concept more closely
towards what higher-level solutions support for tagging machines, such
as kubernetes, simply to reduce the conceptual impedance mismatch.
Luca Boccassi [Mon, 22 Jun 2026 13:15:39 +0000 (14:15 +0100)]
resolved: load libcrypto/libssl lazily on first use and make them optional (#42681)
Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
Luca Boccassi [Mon, 22 Jun 2026 13:08:21 +0000 (14:08 +0100)]
Expand specifiers in `MakeSymlinks=` target in `repart.d` (#42694)
Closes #42693. Specifiers are now expanded in symlink targets
(previously, they were only expanded in the source) - this is
technically a breaking change, but I'd be very surprised if anyone was
relying on this.
No other simplification is applied to the target (unlike the source,
which goes through `path_simplify_and_warn`).
Also a few minor changes:
- rename local `path` variable to `source` to match documentation
convention
- document that `MakeSymlinks=` accepts specifiers
- fix error message to print `MakeSymlinks=` option instead of
`Subvolumes=`
Luca Boccassi [Sat, 20 Jun 2026 00:05:00 +0000 (01:05 +0100)]
systemctl: add --kernel-cmdline-reuse option
kexec-tools has a --reuse-cmdline option which is very convenient
when doing a lot of reboots, add the same to systemctl.
Dedup options, letting the last one win in case of duplicates,
so that 'systemctl kexec --reuse-cmdline' can be chained many times
without continuously expanding the cmdline with duplicates from
the boot entry.
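A minimal Python sketch of the "last one wins" deduplication described above. It assumes that options are separated by plain whitespace and that only exact duplicates are dropped; `dedup_cmdline` is a hypothetical name, not the function used in systemctl.

```python
def dedup_cmdline(cmdline: str) -> str:
    """Drop exact duplicate options, keeping only the last occurrence of each.

    Assumption: options are whitespace-separated and contain no quoted spaces.
    """
    tokens = cmdline.split()
    # Remember the index of the last occurrence of every token.
    last_seen = {tok: i for i, tok in enumerate(tokens)}
    return " ".join(tok for i, tok in enumerate(tokens) if last_seen[tok] == i)


once = dedup_cmdline("root=/dev/sda1 quiet ro quiet root=/dev/sda1")
# Chaining the operation does not grow the command line any further.
twice = dedup_cmdline(once + " " + once)
```

Because the result is idempotent, repeated kexec cycles that append the same boot entry options leave the command line at a fixed length.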
Luca Boccassi [Sat, 20 Jun 2026 23:11:08 +0000 (00:11 +0100)]
btrfs-util,rm-rf: clean up subvolumes without user_subvol_rm_allowed
Without CAP_SYS_ADMIN and without the 'user_subvol_rm_allowed' mount
option, BTRFS_IOC_SNAP_DESTROY is rejected with EPERM (or EROFS for a
read-only subvolume), so rm_rf_subvolume() left subvolumes behind.
test-btrfs thus accumulated leftover subvolumes in /var/tmp on every
unprivileged run on a btrfs filesystem.
An unprivileged owner can however clear the RDONLY flag, empty a
subvolume and rmdir() it. So clear the RDONLY flag on EPERM/EACCES too
(not just EROFS) to leave the subvolume writable, and let rm_rf() fall
through on EPERM/EACCES to empty the subvolume recursively and rmdir()
it, matching what rm_rf_at() already did.
dongshengyuan [Mon, 22 Jun 2026 06:13:11 +0000 (14:13 +0800)]
resolve: fix transaction leak in dns_transaction_new() error path
hashmap_replace() failure left t in s->manager->dns_transactions with
t->scope still NULL, causing the destructor to skip hashmap_remove().
Add the missing cleanup mirroring the earlier error path in the same
function.
Luca Boccassi [Sun, 21 Jun 2026 09:42:02 +0000 (10:42 +0100)]
resolved: load libcrypto/libssl lazily on first use and make them optional
Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.
Daan De Meyer [Wed, 3 Jun 2026 10:58:02 +0000 (10:58 +0000)]
tree-wide: Beef up openssl logging
Let's translate openssl's errors to proper errnos
where we can instead of returning EIO for everything.
Let's also make log_openssl_errors() public so we can
use it everywhere and migrate the rest of the codebase
to use it.
- `fd_copy_directory()` was using `FOREACH_DIRENT_ALL`, which doesn't
give stable ordering. Read all paths, sort, then iterate.
- `mcopy -s` depends on `readdir()` ordering and thus isn't
reproducible. Implement the recursion/sorting here and only invoke
mcopy/mmd per dir.
First change increases memory usage, as we don't stream the paths
anymore, second increases the number of context switches when invoking
external tools. Both should be fine given the ESP content should usually
be pretty limited.
I'd like to write a test for this, but didn't come up with a way that
doesn't require privileges and would surface the error reliably.
Chris Coulson [Tue, 2 Jun 2026 17:02:35 +0000 (18:02 +0100)]
shared/tpm2: support chunked reads of NV indexes
The TPM2_NV_Read command returns the requested data in a
TPM2B_MAX_NV_BUFFER type, the maximum size of which is TPM-specific and
can be determined by querying the value of the TPM_PT_NV_BUFFER_MAX
property.
The value of this may be smaller than the payload size of some NV
indexes, particularly when that payload is an X509 certificate with an RSA
public key. E.g., the manufacturer-supplied RSA EK certificate on my own
machine has a size of 1035 bytes, and the value of TPM_PT_NV_BUFFER_MAX
is 1024.
To handle this case and make it possible to read any EK certificate from
the TPM, make tpm2_read_nv_index support chunked reads when the payload
size is larger than what the TPM can return in a single command.
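The chunking logic can be sketched in Python as follows. This is only an illustration under stated assumptions: `read_chunk(offset, size)` stands in for a single TPM2_NV_Read call, and `read_nv_chunked` is a hypothetical name, not the signature of tpm2_read_nv_index.

```python
from typing import Callable, List, Tuple


def read_nv_chunked(read_chunk: Callable[[int, int], bytes],
                    total_size: int, max_chunk: int) -> bytes:
    """Read total_size bytes in pieces of at most max_chunk bytes each."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    data = bytearray()
    offset = 0
    while offset < total_size:
        size = min(max_chunk, total_size - offset)
        chunk = read_chunk(offset, size)
        if len(chunk) != size:
            raise IOError("short read at offset %d" % offset)
        data += chunk
        offset += size
    return bytes(data)


# Simulated NV index using the sizes quoted above: 1035 byte payload, 1024 byte buffer.
payload = bytes(i % 251 for i in range(1035))
calls: List[Tuple[int, int]] = []


def fake_read(offset: int, size: int) -> bytes:
    calls.append((offset, size))
    return payload[offset:offset + size]


result = read_nv_chunked(fake_read, len(payload), 1024)
```

With these sizes the loop issues two reads, one of 1024 bytes at offset 0 and one of 11 bytes at offset 1024.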
Luca Boccassi [Sat, 20 Jun 2026 14:20:19 +0000 (15:20 +0100)]
ssl-util: support OpenSSL 4
OpenSSL 4 broke ABI, so we need to look for both SONAMEs.
Try libssl.so.3 first, and fall back to libssl.so.4,
so that the older and more stable version is used if both
are installed, giving distros time to fix regressions.
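A rough Python sketch of the "try one SONAME, then fall back to the next" pattern, using `ctypes.CDLL` as a stand-in for dlopen(). The helper name `dlopen_first` and its `loader` parameter are assumptions made for illustration; systemd's own helper is written in C and differs in detail.

```python
import ctypes
from typing import Any, Callable, Sequence, Tuple


def dlopen_first(sonames: Sequence[str],
                 loader: Callable[[str], Any] = ctypes.CDLL) -> Tuple[str, Any]:
    """Return (soname, handle) for the first library that loads.

    Candidates are tried in the order given, so the caller decides
    which version is preferred when several are installed.
    """
    errors = []
    for name in sonames:
        try:
            return name, loader(name)
        except OSError as e:
            errors.append("%s: %s" % (name, e))
    raise OSError("no candidate could be loaded: " + "; ".join(errors))
```

A call such as `dlopen_first(["libssl.so.3", "libssl.so.4"])` would express the preference order described in this commit; whether either library is present depends on the system.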
core: create abstraction/more properties for the "Exec" part of Unit.StartTransient (#42360)
This is a bit of an RFC (but I hope I got it mostly right), @daandemeyer
suggested in
https://github.com/systemd/systemd/pull/42161#pullrequestreview-4336323314
to improve the abstractions around the Exec= in
io.systemd.Unit.StartTransient as we will add a bunch more of those. So
this PR adds first a better abstraction and then uses it. See the
individual commits for details.
Michael Vogt [Mon, 15 Jun 2026 06:40:50 +0000 (08:40 +0200)]
core: add _parameters_init for the Unit.StartTransient dispatch
This commit extracts the initialization of the transient parameters
for io.systemd.Unit.StartTransient into a set of helpers that follow
the _parameters_init() pattern. This way the code is more uniform
and easier to extend and less fragile. It also means there is a
single (logical) place to init the fields.
Michael Vogt [Thu, 21 May 2026 14:28:57 +0000 (16:28 +0200)]
core: add more settable properties to varlink Unit.StartTransient()
This commit uses the abstractions added in the previous commit to
add a bunch more properties to io.systemd.Unit.StartTransient()
to showcase how straightforward this is now.
New helpers for tristate bools and an init helper are added. A
dedicated dispatcher for LogLevelMax parses the string-form name
("info", "debug" etc.) declared in the varlink IDL.
The new properties are: DynamicUser, IgnoreSIGPIPE, LockPersonality,
MemoryDenyWriteExecute, NoNewPrivileges, OOMScoreAdjust, RemoveIPC,
RestrictRealtime, RestrictSUIDSGID, RootEphemeral, UMask.
The remaining ProtectKernel*, Private*, ProtectClock properties are
declared as STRING in the varlink IDL (matching the modern *Ex/enum
form) so a bool dispatcher does not pass schema validation. Those
need a string-parsing dispatcher and will be added in a follow-up.
This brings us closer to parity with the D-Bus code (still a long
way to go though).
Michael Vogt [Thu, 21 May 2026 11:49:33 +0000 (13:49 +0200)]
core: create abstraction for the "Exec" part of Unit.StartTransient
The handling of the `Exec` parameters for the varlink
`io.systemd.Unit.StartTransient()` became a bit unwieldy. So
this commit creates another abstraction to handle the various
fields in the `Exec` part of the StartTransient code.
Each Exec property is now described by a single TransientExecProperty
entry and adding a new property is just a single entry there plus
an apply function.
Thanks to Ivan Kruglov for many useful suggestions.
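The "one table entry plus an apply function" idea can be illustrated with a small Python sketch. The class `ExecProperty`, the table `EXEC_PROPERTIES`, and the context keys are all hypothetical; the real TransientExecProperty is a C structure whose fields are not shown in this log.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ExecProperty:
    name: str                                      # property name as sent by the client
    apply: Callable[[Dict[str, Any], Any], None]   # stores the value in the context


def _apply_bool(key: str) -> Callable[[Dict[str, Any], Any], None]:
    def apply(ctx: Dict[str, Any], value: Any) -> None:
        if not isinstance(value, bool):
            raise TypeError("%s expects a boolean" % key)
        ctx[key] = value
    return apply


def _apply_umask(ctx: Dict[str, Any], value: Any) -> None:
    if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 0o777:
        raise ValueError("UMask must be an integer between 0 and 0777")
    ctx["umask"] = value


# Adding a property means adding one entry here plus its apply function.
EXEC_PROPERTIES = {p.name: p for p in (
    ExecProperty("NoNewPrivileges", _apply_bool("no_new_privileges")),
    ExecProperty("DynamicUser", _apply_bool("dynamic_user")),
    ExecProperty("UMask", _apply_umask),
)}


def apply_exec_parameters(params: Dict[str, Any]) -> Dict[str, Any]:
    ctx: Dict[str, Any] = {}
    for name, value in params.items():
        prop = EXEC_PROPERTIES.get(name)
        if prop is None:
            raise KeyError("unknown Exec property: %s" % name)
        prop.apply(ctx, value)
    return ctx
```

The dispatcher itself never changes when a property is added, which is the benefit the commit message describes.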
Luca Boccassi [Sun, 21 Jun 2026 12:18:04 +0000 (13:18 +0100)]
Translations update from Fedora Weblate (#42682)
Translations update from [Fedora
Weblate](https://translate.fedoraproject.org) for
[systemd/main](https://translate.fedoraproject.org/projects/systemd/main/).
Pranay Pawar [Sun, 21 Jun 2026 05:10:21 +0000 (10:40 +0530)]
hwdb: Map Acer Nitro AN515-58 NitroSense button to prog1
The Acer Nitro AN515-58 has a dedicated NitroSense button (scan code
0xf5) with no entry in its device-specific block. It currently falls
through to the generic Acer rule:
KEYBOARD_KEY_f5=presentation
This is semantically wrong — the button has no relation to
presentation mode. Other Nitro models (AN515-47, AN517-54, ANV15-51)
already map it to prog1 (XF86Launch1), making it a user-programmable
button that desktop environments (KDE, GNOME, etc.) can bind to any
action.
Add the same mapping for AN515-58 for consistency:
KEYBOARD_KEY_f5=prog1 # NitroSense button
Tested on Acer Nitro AN515-58 with Linux 7.0.12 and KDE Plasma 6.7
Luca Boccassi [Fri, 19 Jun 2026 16:58:56 +0000 (17:58 +0100)]
executor: also preload libcrypto
It's needed for the userspace fallback verity verification, so
it needs to be pre-loaded to avoid getting blocked by RTLD_NOLOAD:
[ 57.163995] (cat)[1560]: minimal-app0-foo.service: Validation of dm-verity signature failed via the kernel, trying userspace validation instead: Required key not available
[ 57.194696] (cat)[1560]: minimal-app0-foo.service: Refusing loading of 'libcrypto.so.3', as loading further dlopen() modules has been blocked.
[ 57.197940] (cat)[1560]: minimal-app0-foo.service: Shared library 'libcrypto.so.3' is not available: Operation not permitted
[ 57.204283] (cat)[1560]: minimal-app0-foo.service: Failed to activate verity device /dev/mapper/2b2fd83f324c3aa2ea1a979899f9c630761f1de3c5e00ce8c6bb36f4d137f450-verity: Operation not supported
[ 57.272782] (cat)[1560]: minimal-app0-foo.service: Failed to set up mount namespacing: Operation not supported
[ 57.274250] (cat)[1560]: minimal-app0-foo.service: Failed at step NAMESPACE spawning cat: Operation not supported
Rocker Zhang [Sat, 6 Jun 2026 07:29:43 +0000 (07:29 +0000)]
generator: order cryptsetup/verity/integrity after systemd-udevd
During soft-reboot teardown, systemd-cryptsetup@.service's ExecStop runs
libcryptsetup's crypt_deactivate(), which issues DM_REMOVE with a udev
cookie and blocks in dm_udev_wait() until 95-dm-notify.rules decrements
the cookie semaphore via "dmsetup udevcomplete $env{DM_COOKIE}".
The generated cryptsetup unit had only After=systemd-udevd-kernel.socket,
not After=systemd-udevd.service. The .socket has IgnoreOnIsolate=yes and
is not stopped during soft-reboot, so it provides no ordering at all for
the service teardown. Meanwhile systemd-udevd.service has
Conflicts=soft-reboot.target (added in 0d1819e791) and stops as soon as
soft-reboot.target starts, with no ordering relative to cryptsetup's
ExecStop.
Once udevd's device monitor event source is disabled in manager_exit(),
pending DM_REMOVE uevents are no longer processed and the cookie
semaphore stays at 1 forever, blocking soft-reboot at "Stopping
Cryptography Setup..." until JobTimeoutSec=30min on soft-reboot.target
fires.
Add systemd-udevd.service to the After= ordering of the generated
cryptsetup, veritysetup and integritysetup units. By systemd's job
ordering rules (job_compare() in src/core/job.c), when two units are
both being stopped and one has After= the other, the After= unit is
stopped first. So with cryptsetup@*.service After=systemd-udevd.service,
cryptsetup stops first (cookie acknowledged by the still-live udevd),
then udevd stops.
Putting After=umount.target on systemd-udevd.service does not work: at
soft-reboot, udevd's stop job runs concurrently with umount.target's
start job, and JOB_STOP unconditionally precedes JOB_START in the
transaction (see src/core/job.c:1742). The ordering has to be expressed
between two stop jobs, which is what putting After=systemd-udevd.service
on the dm consumers achieves.
pam_systemd: add option for acquiring logind inhibitor locks for the duration of a session (#42275)
This adds support in pam_systemd for acquiring an inhibitor lock when a
pam session is active, and can be used (for example) to prevent
suspend/sleep of a host when an ssh session is active on the host. The
reason string can be customized via `inhibit-why=`.
For example, /etc/pam.d/sshd can now contain this line: `-session
optional pam_systemd.so inhibit=sleep`
And sleep will be blocked when someone is logged into the system over
ssh:
```
foo:~$ systemd-inhibit --list --no-pager
WHO UID USER PID COMM WHAT WHY MODE
NetworkManager 0 root 397 NetworkManager sleep NetworkManager nee… delay
Realtime Kit 0 root 401 rtkit-daemon sleep Demote realtime sc… delay
sshd 0 root 667 sshd-session.pa sleep Active PAM session block
```
Chris Coulson [Mon, 1 Jun 2026 12:27:15 +0000 (13:27 +0100)]
shared/tpm2: split the tpm2_index_to_handle helper function
The existing tpm2_index_to_handle helper has optional return arguments
for an object's public area and its qualified name. However, it can also
be called for handles corresponding to NV indexes, where the public area
and qualified name don't apply.
I'm working on another change that would benefit from having an
equivalent helper for NV indexes, and there is already some code in
shared/tpm2-util.c that could use this as well.
This splits the existing tpm2_index_to_handle function into 3 functions:
- tpm2_index_to_handle, which works for both objects and NV indexes and
returns a name and handle.
- tpm2_object_index_to_handle, which works the same as the existing
function for transient or persistent objects, returning a public area,
name, qualified name and handle.
- tpm2_nv_index_to_handle, which works for NV indexes, returning the NV
public area, name and handle.
hostname: add $ hostname substitution and petnames wordfiles (#42566)
This commit adds support to /etc/hostname for substitution of $ from
wordlists located in {/etc,/run,/usr/lib}/systemd/. Each $ is resolved to
a number (1,2,3...) and the corresponding file "1" is opened to acquire
the word. With that we can do a petname [1] style hostname in systemd,
e.g. below a possible expansion for a hostname template:
$-$-$-???? -> wildly-happy-octopus-92a9
The substitution of words is stable (based on machine-id) but if the
wordlist changes the hostname would change. We could pick it once and
cache it but Lennart did not like this so this version instead always
picks it (based on offset of the file so the operation is cheap). To
persist it one can use the `firstboot.hostname` credential. One a live
system this will be expanded and then written in the expanded form to
/etc/hostname.
This also includes a wordlist from the "petname" project that can be
optionally installed.
Thanks to Dustin Kirkland for this wonderful project.
[1] https://github.com/dustinkirkland/petname
---
I'm a bit unsure if this should include the word lists (I think it's nice
to have them though) and if so if they should be their own commit.
This implements a close facsimile of
[sudo](https://man.archlinux.org/man/sudo.8#K)'s -k/-K/-v options, which
manipulate the temporary authorizations used by sudo. Use like `run0 -k
whoami` to always re-auth, or just `run0 -k` to revoke a prior temporary
authorization.
~~Depends on polkit-org/polkit#662 due to a bug in polkit.~~ Now it
should work ok despite the bug.
Jonas Dreßler [Fri, 29 May 2026 13:32:07 +0000 (15:32 +0200)]
repart: Place new partitions at beginning of free area rather than at end
When placing new partitions and there's space left because the new partitions
aren't occupying the whole free space, context_grow_partitions_on_free_area()
is supposed to distribute the free space between the new partitions. If still
no partition wants the free space, the free space ends up becoming padding.
Currently that padding is allocated to the partition preceding the FreeArea
(ie. a->after). This obviously means that the new partitions now end up at
the *end* of the free area rather than at the beginning, which is somewhat
unexpected given how partition placement usually is done.
Fix it by finding the last partition that belongs to the free area, and then
allocating the padding to that partition, so that the new partitions end up
getting aligned with the beginning of the free area, not the end.
Because the span might not be rounded to grain if there's a pre-allocated
a->after partition before the free area, we need to round it down ourselves
(otherwise the "left >= p->new_padding" assertion in context_place_partitions()
is going to fail).
Also ensure the fix works as expected by adding a test.
tpm2: support SHA384/SHA512 PCR banks in tpm2_get_best_pcr_bank() (#42538)
`tpm2_get_best_pcr_bank()` only ever considered the SHA256 and SHA1
banks (both the `LoaderTpm2ActivePcrBanks` path and the capability
guesswork). On a TPM whose only active bank is SHA384 it returned
`-EOPNOTSUPP`, breaking sealing/enrollment (cryptenroll, credential
encryption, legacy unseal). The restriction looks like a historical
simplification — `efi_get_active_pcr_banks()` already decodes
SHA384/SHA512 and `tpm2_hash_algorithms[]` already lists them.
This PR introduces an explicit preference table (SHA256 > SHA512 >
SHA384 > SHA1) and selects from it. SHA256 stays the top preference for
backwards compatibility, so existing systems keep using the same bank
and the legacy unseal-guess in `tpm2_unseal()` stays consistent;
SHA384/SHA512 are only chosen when SHA256 is unavailable, SHA1 remains
the last resort.
Behavior for existing SHA256/SHA1 systems is unchanged. Includes a unit
test for the bank-preference logic.
Related to https://github.com/systemd/systemd/pull/42537
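The preference order described above (SHA256 > SHA512 > SHA384 > SHA1) can be sketched in Python like this. The function name `pick_best_bank` and the use of lowercase strings for bank names are assumptions for illustration; the actual code works on TPM algorithm identifiers.

```python
from typing import Iterable, Optional

# Most preferred first; SHA256 stays on top for backwards compatibility.
BANK_PREFERENCE = ("sha256", "sha512", "sha384", "sha1")


def pick_best_bank(active_banks: Iterable[str]) -> Optional[str]:
    """Return the most preferred active PCR bank, or None if none is usable."""
    active = {bank.lower() for bank in active_banks}
    for bank in BANK_PREFERENCE:
        if bank in active:
            return bank
    return None
```

A TPM whose only active bank is SHA384 now yields `"sha384"` instead of an error, while any system with SHA256 active keeps selecting SHA256.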
Ani Sinha [Fri, 12 Jun 2026 06:40:47 +0000 (12:10 +0530)]
chid_match: match UEFI firmware hwid entry with the current smbios data
UEFI firmware type hwids must be matched against the current hardware first.
This change implements that. Additionally, some extra validations on the hwids
entries have also been added.
Roman Vinogradov [Thu, 11 Jun 2026 14:21:55 +0000 (14:21 +0000)]
nss-systemd: avoid ELF TLS for recursion guard
libnss_systemd currently uses a thread_local recursion guard to
avoid re-entering nss-systemd during NSS lookups.
Since libnss_systemd.so.2 is loaded lazily by glibc, accessing ELF TLS
may trigger dynamic TLS allocation in __tls_get_addr(). Under allocation
failure conditions, glibc terminates the process from the dynamic loader
instead of allowing the NSS module to return a normal failure.
Replace the recursion guard with POSIX thread-specific data to preserve the
same per-thread semantics while avoiding ELF TLS in the NSS module.
Note that pthread_setspecific() may still allocate internally on first use
per thread. The key improvement is that any such failure is returned
as a normal error code rather than terminating the process from inside
the dynamic loader.
Michael Vogt [Wed, 17 Jun 2026 07:51:53 +0000 (09:51 +0200)]
hostname: improve the algorithm in hostname_pick_word()
Lennart suggested to use a more uniform algorithm for
the picking of the hostname words that is not biased
for long words by just (predictably) randomly going over
the offsets until we land on a word boundary. This is a
very nice suggestion so this commit implements it with
a fallback to the "old" behavior if we do not find a
word boundary within a reasonable amount of attempts.
A small python script shows that 64 iterations plus
fallback is a good number:
```
$ python3 simulate-hostname-pick.py 64
hostname-wordlist/adverbs
words=261 p_accept=0.1119 avg_bytes/word=1/p=8.94
max_iterations=64, n_trials=1000000
fallback rate : 0.051000% (510/1_000_000)
mean seeks per word : 8.93
hostname-wordlist/adjectives
words=449 p_accept=0.1380 avg_bytes/word=1/p=7.24
max_iterations=64, n_trials=1000000
fallback rate : 0.007500% (75/1_000_000)
mean seeks per word : 7.25
hostname-wordlist/nouns
words=449 p_accept=0.1472 avg_bytes/word=1/p=6.79
max_iterations=64, n_trials=1000000
fallback rate : 0.002700% (27/1_000_000)
mean seeks per word : 6.79
```
Combined with the fallback to the previous method if
we can't find anything within the 64 attempts this seems
to be the best tradeoff and give us very good uniformity.
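A minimal Python sketch of the sampling idea, not the code of hostname_pick_word(). It assumes a word list with one non-empty word per line, derives offsets from a seed with SHA-256 (an arbitrary choice for this sketch), and uses "rewind to the start of the word containing the offset" as the fallback, which is a guess at the previous behavior.

```python
import hashlib

MAX_ATTEMPTS = 64  # iteration limit quoted in the commit message


def _offset(seed: bytes, attempt: int, size: int) -> int:
    """Derive a stable pseudo-random offset in [0, size) from the seed."""
    digest = hashlib.sha256(seed + attempt.to_bytes(4, "little")).digest()
    return int.from_bytes(digest[:8], "little") % size


def _word_from(data: bytes, start: int) -> str:
    end = data.find(b"\n", start)
    return data[start:end if end >= 0 else len(data)].decode()


def pick_word(data: bytes, seed: bytes) -> str:
    if not data:
        raise ValueError("empty word list")
    for attempt in range(MAX_ATTEMPTS):
        off = _offset(seed, attempt, len(data))
        # Accept only offsets that land exactly on the first byte of a word,
        # so every word is equally likely regardless of its length.
        starts_word = (off == 0 or data[off - 1] == 0x0A) and data[off] != 0x0A
        if starts_word:
            return _word_from(data, off)
    # Fallback (assumed): use the word containing the first offset.
    # This favors long words, which is why it is only a last resort.
    off = _offset(seed, 0, len(data))
    start = data.rfind(b"\n", 0, off) + 1
    return _word_from(data, start)
```

Each attempt succeeds with probability words/bytes, which is the `p_accept` value in the simulation output, so the expected number of seeks is roughly 1/p.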
Michael Vogt [Fri, 12 Jun 2026 14:27:31 +0000 (16:27 +0200)]
hostname: add ? and $ in systemd.hostname= kernel cmdline
Similar to the support for ?/$ in /etc/hostname and the credentials
we now add this to the kernel commandline systemd.hostname= option.
If the expansion fails, e.g. in the initrd where the word lists (and
possibly the machine ID) are not available yet, the option is ignored
and the usual default hostname logic applies. Once the host system is
up the expansion succeeds and the intended name is applied.
Michael Vogt [Fri, 12 Jun 2026 10:43:15 +0000 (12:43 +0200)]
creds,firstboot: add support for ? and $ via credentials
Now that we support the `$` we want to also make this available
inside the system.hostname and firstboot.hostname credentials and
the firstboot --hostname option. This commit adds it (and also `?`).
Michael Vogt [Wed, 3 Jun 2026 15:05:55 +0000 (17:05 +0200)]
hostname: add $ hostname substitution and petnames
This commit adds support to /etc/hostname for substitution
of $ wordlists from {/etc,/run,/usr/lib}/systemd/hostname-wordlist.
The first $ will lookup hostname-wordlist/1, the next
hostname-wordlist/2 and so on.
With that we can do a petname [1] style hostname in systemd, e.g.
below a possible expansion for a hostname template:
$-$-$-???? -> wildly-happy-octopus-92a9
The substitution of words is stable (based on machine-id) but
not persisted, it is picked on every boot via a stable file
offset so the operation is cheap. But this means that if the
wordlist changes the hostname would change. The next commit
will add the pattern to the firstboot.hostname credential which
is persisted with the resolved names to avoid this issue.
This also includes a wordlist from the "petname" project
that can be optionally installed.
Thanks to Dustin Kirkland for this wonderful project.
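To make the template expansion concrete, here is a hedged Python sketch. It assumes that the n-th `$` takes a word from list n and that each `?` becomes one hexadecimal digit, with both derived from a seed such as the machine ID. The hashing scheme, the in-memory `wordlists` mapping, and the name `expand_hostname` are illustrative assumptions, and the word choice uses a simple index rather than the file-offset method.

```python
import hashlib
from typing import Dict, List


def _digest(seed: bytes, label: str) -> bytes:
    return hashlib.sha256(seed + b"\0" + label.encode()).digest()


def expand_hostname(template: str, wordlists: Dict[int, List[str]], seed: bytes) -> str:
    hexdigits = _digest(seed, "hex").hex()
    out = []
    n_words = 0
    n_hex = 0
    for ch in template:
        if ch == "$":
            # The first "$" uses list 1, the second list 2, and so on.
            n_words += 1
            words = wordlists[n_words]
            idx = int.from_bytes(_digest(seed, "word%d" % n_words)[:8], "little")
            out.append(words[idx % len(words)])
        elif ch == "?":
            out.append(hexdigits[n_hex % len(hexdigits)])
            n_hex += 1
        else:
            out.append(ch)
    return "".join(out)
```

With the same seed and the same lists the result never changes, but editing a list can change the chosen word, which is the stability caveat noted above.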
Raito Bezarius [Fri, 29 May 2026 22:10:21 +0000 (00:10 +0200)]
shared/libfido2: show number of retries before lockout
For a good user experience, users expect to be informed of how many
attempts they have before being locked out of their FIDO2 device.
By displaying this information in advance, the user can plan how to
obtain the correct PIN, or wait until they are near an authority who
can provide them with a recovery key.
Michael Vogt [Wed, 27 May 2026 16:32:04 +0000 (18:32 +0200)]
core: add RestartRandomizedDelaySec= service option
We already support exponential backoff for automatic restarts via
RestartSec=/RestartSteps=/RestartMaxDelaySec=, but there is no way to
randomize the restart delay. When many instances of a service fail at
the same time (e.g. because a shared resource briefly went away) they
are all restarted in lockstep, creating a thundering herd problem.
So this commit adds a simple `RestartRandomizedDelaySec=` service
option which is similar to the timer `RandomizedDelaySec=` and
adds a randomized restart delay.
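A small Python sketch of how a randomized restart delay breaks the lockstep. It assumes the random component is drawn uniformly between 0 and the configured value and added to the base delay, by analogy with the timer setting; `restart_delay` is a hypothetical helper, not systemd code.

```python
import random


def restart_delay(restart_sec: float, randomized_delay_sec: float,
                  rng: random.Random = random.Random()) -> float:
    """Base restart delay plus a uniformly distributed random extra delay."""
    if restart_sec < 0 or randomized_delay_sec < 0:
        raise ValueError("delays must not be negative")
    return restart_sec + rng.uniform(0.0, randomized_delay_sec)


# 100 instances failing at the same moment no longer restart together.
delays = [restart_delay(1.0, 5.0) for _ in range(100)]
```

Every value in `delays` lies between 1 and 6 seconds, so the restarts are spread over a five second window instead of hitting the shared resource at once.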
Michael Vogt [Fri, 12 Jun 2026 09:45:07 +0000 (11:45 +0200)]
test: enable and fix the TEST-71 hostname wildcard test
test_wildcard() was never executed: run_testcases() only picks up
functions named testcase_* so this test never ran. This commit
makes it run and fixes two issues in the test:
1. /etc/hostname is absent in the test image so we need to guard
for that.
2. The pattern check was written as [[ "$P" == "$H" ]] with both
sides quoted, but we need one side unquoted as otherwise
the comparison will always be false.
Clayton Craft [Sun, 24 May 2026 03:06:31 +0000 (20:06 -0700)]
pam_systemd: acquire inhibit lock when inhibit= is specified
This adds a new function, acquire_inhibit_lock, that takes a logind
inhibitor lock for the duration of the PAM session. The lock fd is
stored in PAM data with pam_cleanup_close. In pam_sm_close_session, the
lock is released explicitly by passing NULL to pam_set_data, which
triggers the registered cleanup function immediately rather than waiting
for pam_end.
The inhibit type and reason string can also be configured via the
XDG_SESSION_INHIBIT and XDG_SESSION_INHIBIT_WHY environment variables,
which take precedence over the module arguments.
Ronan Pigott [Thu, 4 Jun 2026 00:36:25 +0000 (17:36 -0700)]
run0: implement -k/-K to revoke temporary auth
This is meant to mirror sudo's -k/--reset-timestamp and
-K/--remove-timestamp options, which revoke the temporary authorization
provided by the timestamp files in /var/run/sudo/ts.
To achieve the same effect in run0, we ask polkit to revoke our
temporary authorization. If used with a command, run0 will revoke the
temporary auth and then immediately authorize the user again, just like
sudo -k. All the bus calls are completed synchronously, as they need to
complete before authorizing the user anyway.
Like sudo, the effect of -k/--reset-timestamp is to revoke only the
tmpauthz that polkit would have used to authorize the command, if
available. The -K/--remove-timestamp option will revoke all temporary
authorizations across all ttys.
Michael Vogt [Tue, 26 May 2026 13:57:03 +0000 (15:57 +0200)]
networkd: report per-interface addresses as io.systemd.Network.Address
The networkd metrics interface already reports a lot of interesting
metrics. With this commit it also reports the network addresses.
Each ready address is emitted as one record per (interface, address)
pair:
- object: ifname
- value: address in CIDR notation
- fields: { family: "ipv4"|"ipv6", scope: "global"|"link"|"host"|... }
The loopback addresses are not reported as they are just noise.
Michael Vogt [Tue, 26 May 2026 13:52:40 +0000 (15:52 +0200)]
core: add kernel/userspace/finish boot timestamps to io.systemd.Manager metrics
This commit adds the boot timeline (MANAGER_TIMESTAMP_KERNEL/USERSPACE/FINISH) as
metrics. The kernel CLOCK_MONOTONIC value is 0 by definition, so only its
.Realtime is reported. For userspace and finish report both .Realtime and
.Monotonic. The naming follows D-Bus.
Fedora has switched to openssl 4, and we generate a Requires dependency
on libcrypto for the systemd-udev subpackage, so preferring openssl-3
does the wrong thing. So the order in the dlopen note needs to be switched.
But in general, we want to get rid of openssl-3, so we want to load
openssl-4 in preference. Change the order in both places.
(The "compat" order can stay in 261-stable for other distros.)
machine-tags: optionally support key/value pairs as machine tags
Other systems (kubernetes…) allow tagging machines with key/value pairs.
Let's extend our allowed syntax slightly to allow that too. Thankfully,
we enforced a pretty strict ruleset on machine tags, hence we can
introduce this without breaking compatibility.
This basically allows tags to contain "=". If so, then the left-hand
side of it must be unique among machine tags.
When matching against a machine tag, we apply the same rules as before.
This means, that if people want to check if a tag with value applies
they can do:
ConditionMachineTag=foo=bar
If they just want to check if "foo=" is set to anything, they can use
the usual glob matching:
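A minimal Python sketch of the two rules described above, assuming shell-style globs as implemented by `fnmatch`. The helper names are hypothetical, the pattern `foo=*` in the usage note is an assumed example, and the uniqueness check only covers tags that contain "=".

```python
from fnmatch import fnmatchcase
from typing import Iterable


def validate_tags(tags: Iterable[str]) -> None:
    """Reject tag sets in which two key/value tags share the same key."""
    seen = set()
    for tag in tags:
        if "=" not in tag:
            continue
        key = tag.split("=", 1)[0]
        if key in seen:
            raise ValueError("duplicate machine tag key: %s" % key)
        seen.add(key)


def tag_matches(pattern: str, tags: Iterable[str]) -> bool:
    """Match a pattern against whole tags, exactly as for plain tags."""
    return any(fnmatchcase(tag, pattern) for tag in tags)
```

With tags `["foo=bar", "web"]`, `tag_matches("foo=bar", ...)` checks for a specific value, and a glob such as `foo=*` checks only that the key is set.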
Josh Hoffer [Thu, 4 Jun 2026 00:19:33 +0000 (17:19 -0700)]
boot/stub: honor PE SectionAlignment when loading inner kernel
The stub copies the embedded kernel PE into an xmalloc_pages()
allocation with AllocateAnyPages, which only guarantees EFI_PAGE_SIZE
(4 KiB) alignment.
The arm64 kernel declares SectionAlignment = SZ_64K. When _text is not
64K-aligned the kernel's EFI stub reallocates and copies the image,
which can fail with EFI_OUT_OF_RESOURCES on memory-constrained firmware
(e.g. U-Boot, bounded by CONFIG_SYS_MALLOC_LEN rather than full DRAM).
The failure is non-deterministic since AllocateAnyPages returns whatever
page the allocator finds first.
Plumb SectionAlignment through pe_kernel_info() and allocate via
xmalloc_aligned_pages(). For alignment <= EFI_PAGE_SIZE (e.g. x86_64) it
reduces to a plain AllocatePages(), so the change is free there.
SectionAlignment comes from the PE header, which the spec requires to be
a power of 2. Sanitize it in pe_kernel_info() right where it is parsed:
a non-conforming value falls back to plain page alignment (matching the
behavior of older systemd-stub versions that ignored the field) rather
than propagate something that would break xmalloc_aligned_pages()'s
over-alignment maths, with a log_warning() so the fallback stays
diagnosable.
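The sanitizing and over-alignment arithmetic can be sketched in Python as follows. All function names are hypothetical, and the over-allocation formula is a common approach to aligned allocation rather than a transcription of xmalloc_aligned_pages().

```python
EFI_PAGE_SIZE = 4096


def is_power_of_two(value: int) -> bool:
    return value > 0 and (value & (value - 1)) == 0


def sanitize_section_alignment(value: int) -> int:
    """Fall back to page alignment for values that are not a power of 2."""
    if not is_power_of_two(value):
        return EFI_PAGE_SIZE  # the commit also logs a warning in this case
    return value


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of 2)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def pages_to_allocate(size: int, alignment: int) -> int:
    """Pages needed so that an aligned region of size bytes always fits."""
    n_pages = -(-size // EFI_PAGE_SIZE)  # ceiling division
    if alignment <= EFI_PAGE_SIZE:
        return n_pages  # page-aligned allocations need no slack
    # A page-aligned base is at most (alignment - page size) below the
    # next aligned address, so that much slack is always sufficient.
    return n_pages + alignment // EFI_PAGE_SIZE - 1
```

For a 64 KiB alignment this reserves 15 extra pages, while the x86_64 case with 4 KiB alignment needs none. The bit mask in `align_up` is only valid for powers of 2, which is why the value must be sanitized first.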
Co-developed-by: Claude <claude@anthropic.com>
Closes: #42443
Reported-By: Agathe Porte <agathe.porte@oss.qualcomm.com>
Tested-By: Agathe Porte <agathe.porte@oss.qualcomm.com>
Kai Lüke [Thu, 28 May 2026 13:07:16 +0000 (22:07 +0900)]
sysext: Allow to (re)start units from extension-release metadata file
Up to now we recommended to use TARGET.upholds/ symlinks to start units
when an extension is loaded. However, this has some drawbacks. First,
for services that should not be tried to be started all the time we have
to resort to hiding them through a target that gets upheld and then
uses regular .wants/ for the actual service. Second, we actually leak
services on extension unload even if the unit has disappeared with the
extension. Third, to affect a service through a drop-in or a config
change from a confext/sysext and that service is already running, we
need a way to restart/reload it instead of just starting it.
Similar to EXTENSION_RELOAD_MANAGER=1, add an EXTENSION_RESTART_UNITS=
and an EXTENSION_RELOAD_OR_RESTART_UNITS= setting to the
extension-release metadata file, carrying a whitespace-separated list
of units to restart/reload on merge/refresh/unmerge after the daemon
reload. Also detect when the unit has vanished which is normally the
case when the unit was part of the unmerged extension, and stop it
explicitly to prevent it leaking. When the extension itself ships the
binary it should use EXTENSION_RESTART_UNITS= to make sure the new
binary is picked up. Since starting through this setting does not work
when the extension is mounted from the initrd, extensions should still
ship at least a .wants/ symlink to start at boot but can also continue
to ship a .upholds/ symlink for backwards compatibility without any
drawback and still benefit from the unit stopping triggered by the new
setting. While there are cases where one could want to set
EXTENSION_RESTART_UNITS= without requiring a daemon reload (e.g., an
env var file change instead of a unit drop-in), we now do an implicit
daemon reload when we have to restart units so that we know we work on
the right state and we spare users remembering to set this setting in
addition to prevent running into this issue.
Accept NSS aliases for canonicalized user records (#42452)
This PR fixes userdb lookups for NSS users that are resolved through an
alias but returned with a canonical user name.
Some NSS providers, such as SSSD, can successfully resolve a user by an
alias-like name, for example a Kerberos/AD UPN (for example
testuser@example.test) while returning a passwd record with the
canonical login name.
The original lookup name was not preserved. Later, the userdb worker
checked whether the returned record matched the requested name with
user_record_matches_user_name(). Since the requested name was
testuser@example.test, but the record only contained testuser, the
lookup was rejected as:
`io.systemd.UserDatabase.ConflictingRecordFound`
This also caused pam_systemd to fail opening sessions for such users
with:
`pam_systemd(...:session): Failed to get user record`
In my case, this broke graphical logins for Samba users logging in with
a UPN on systems where SSSD canonicalizes the NSS result.
My solution preserves the requested name as a UserRecord alias when an
NSS lookup by name succeeds but the returned pw_name differs from the
requested name.
This allows user_record_matches_user_name() to accept the canonicalized
NSS result instead of treating it as a conflicting record.
The patch does not invent new aliases. It only records the name that NSS
itself already accepted and resolved successfully.
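The alias handling described above can be sketched roughly as follows. This is a simplified Python model, not the actual C code: the `UserRecord` class, `record_from_nss()` and `matches_user_name()` are hypothetical stand-ins for the userdb record and for user_record_matches_user_name().

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    # Hypothetical, simplified model of a user record: a canonical
    # name plus a list of accepted aliases.
    user_name: str
    aliases: list = field(default_factory=list)


def matches_user_name(record, name):
    # Stand-in for user_record_matches_user_name(): a record matches
    # its canonical name or any of its aliases.
    return name == record.user_name or name in record.aliases


def record_from_nss(requested_name, pw_name):
    # Build a record from an NSS result. If NSS resolved the requested
    # name but returned a different pw_name, keep the requested name as
    # an alias. No other aliases are invented.
    record = UserRecord(user_name=pw_name)
    if requested_name != pw_name:
        record.aliases.append(requested_name)
    return record
```

With this, a lookup for testuser@example.test that comes back as testuser still matches the requested name, while unrelated names keep being rejected.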
I'm not an expert on working with systemd, so I'm asking for a review of
my PR.
Liu Zhangjian [Tue, 9 Jun 2026 13:15:52 +0000 (21:15 +0800)]
repart: fix SizeMinBytes/SizeMaxBytes rounding direction
The ltype parameter for SizeMinBytes and SizeMaxBytes in the config
parsing table was reversed. SizeMinBytes should round UP (ltype > 0)
to ensure the partition is at least the specified size, while
SizeMaxBytes should round DOWN (ltype < 0) to ensure the partition
doesn't exceed the specified size.
This matches the documentation in repart.d.xml which correctly states:
- SizeMinBytes: 'rounded upwards'
- SizeMaxBytes: 'rounded downwards'
The same fix is applied to PaddingMinBytes and PaddingMaxBytes which
share the same config_parse_size4096 parser.
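To illustrate the intended rounding directions, here is a minimal Python sketch. It assumes a 4096-byte granularity (suggested by the parser name) and that ltype == 0 leaves the value untouched; the function names are made up for illustration and do not mirror the C implementation.

```python
GRANULARITY = 4096  # assumed from the parser name config_parse_size4096


def round_down(value, granularity=GRANULARITY):
    # Largest multiple of granularity that is <= value.
    return value - (value % granularity)


def round_up(value, granularity=GRANULARITY):
    # Smallest multiple of granularity that is >= value.
    return round_down(value + granularity - 1, granularity)


def apply_ltype(value, ltype):
    # ltype > 0: round up (minimum sizes must not shrink).
    # ltype < 0: round down (maximum sizes must not grow).
    # ltype == 0: unchanged (an assumption of this sketch).
    if ltype > 0:
        return round_up(value)
    if ltype < 0:
        return round_down(value)
    return value
```

For a configured value of 5000 bytes, a minimum rounds up to 8192 and a maximum rounds down to 4096, so the resulting range never violates what the user asked for.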
Fixes: #42526
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Liu Zhangjian <liuzhangjian@uniontech.com>
sysupdate: refuse reboot/pending logic when --component= is used (#42578)
Fixes #42330
The `pending` and `reboot` verbs, and the `--reboot` switch, decide
whether to reboot by comparing the newest installed version against the
booted OS version (`IMAGE_VERSION=` from os-release). When a component
is selected via `--component=`, this ends up comparing the component's
version against the unrelated host OS version; by design these live in
separate version spaces, so the comparison is meaningless and reboot
decisions become arbitrary: depending on the relative version strings,
sysupdate either always reboots or never does.
### Example from the issue:
% /usr/lib/systemd/systemd-sysupdate --component=containerd reboot
Newest installed version '2.3.0' is older than booted version
'20260527200656'.
This refuses the combination with a clear error instead of silently
performing a bogus comparison:
- `verb_pending_or_reboot()` rejects `--component=` for the
`pending`/`reboot` verbs (mirroring the existing `--root=`/`--image=`
rejection).
- `verb_update_impl()` rejects `--reboot` combined with `--component=`,
before any update work is done.
Correctly tracking a per-component "booted" version (which could be
"none", making a reboot always apply) is a larger feature and left for
the future, as the reporter suggested.
The daemon (`sysupdated`) and `updatectl` don't perform this
host-version comparison, so the change is confined to the
`systemd-sysupdate` CLI.
Documentation and a negative regression test (TEST-72) are included.
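The shape of the guard can be sketched in Python with argparse. This is only an illustration of the cross-option check: it collapses the verb check and the `--reboot` check into one place, and the parser layout and the error text are assumptions, not the actual systemd-sysupdate code or message.

```python
import argparse


def parse_argv(argv):
    # Hypothetical, reduced command line: one optional verb plus the
    # two switches that matter for the guard.
    parser = argparse.ArgumentParser(prog="sysupdate-sketch")
    parser.add_argument("--component")
    parser.add_argument("--reboot", action="store_true")
    parser.add_argument("verb", nargs="?", default="list")
    args = parser.parse_args(argv)

    # A component's version lives in a different version space than
    # the booted OS version, so refuse anything that would compare
    # the two instead of making an arbitrary reboot decision.
    wants_reboot_logic = args.reboot or args.verb in ("pending", "reboot")
    if args.component is not None and wants_reboot_logic:
        raise ValueError(
            "reboot/pending logic cannot be combined with --component="
        )
    return args
```

Rejecting the combination during argument parsing means the error is reported before any update work is done.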
### AI-use Disclosure:
I took assistance from Claude Opus 4.8 to scope out the issue, help with
writing proper comments/documentation, and help with writing the PR
description.
Armaan Sandhu [Sat, 13 Jun 2026 09:25:12 +0000 (14:55 +0530)]
sysupdate: address review feedback on component/reboot guard
Move the --reboot/--component= rejection into parse_argv() alongside the
other cross-option checks, and tighten TEST-72 to assert the specific
guard message rather than merely a non-zero exit.
Armaan Sandhu [Sat, 13 Jun 2026 07:25:51 +0000 (12:55 +0530)]
sysupdate: refuse reboot/pending logic when a component is selected
The `pending` and `reboot` verbs, as well as the `--reboot` switch, compare
the newest installed version against the booted OS version (IMAGE_VERSION= from
os-release). When a component is selected via --component=, this compares the
component's version against the unrelated host OS version, which by design live
in separate version spaces. The result is arbitrary reboot decisions: depending
on the relative version strings sysupdate would either always or never reboot.
Refuse the combination with a clear error instead of silently performing a
bogus comparison. Correctly tracking a per-component booted version is left as a
future feature.
Ronan Pigott [Mon, 15 Jun 2026 23:58:42 +0000 (16:58 -0700)]
pam: use default auth pam_deny.so
run0 doesn't actually use the auth pam stack, since polkit does the
requisite authorization. However, if the service type is left undefined,
pam falls back to the definitions of the "other" service, which, at
least on Arch Linux and possibly other distributions, includes
pam_warn.so to notify the user about this apparent error.
This creates a bit of logspam, as systemd does actually call pam_setcred
in its generic pam code, which depends on the auth pam stack, creating a
warning message in the journal on every invocation of run0.
pam_deny.so is a no-op, which avoids falling back to the other pam
service.
dongshengyuan [Wed, 17 Jun 2026 05:06:21 +0000 (13:06 +0800)]
hibernate: fix swap selection and prefer swap that can hold image
When multiple swap devices exist, prefer one with enough free space
to hold the hibernation image over one that cannot, regardless of
priority. If no swap can fit, fall back to priority-first selection.
This avoids deterministically failing hibernation when the
highest-priority swap is too small but a lower-priority one fits.
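The selection policy can be sketched as follows. This is a Python illustration under stated assumptions: the `Swap` fields are hypothetical, and picking the highest-priority device among those that fit is this sketch's reading of the description, not a copy of the C logic.

```python
from dataclasses import dataclass


@dataclass
class Swap:
    # Hypothetical, reduced view of a swap device.
    path: str
    priority: int
    free: int  # free space in bytes


def pick_swap(swaps, image_size):
    # Prefer swaps that can hold the hibernation image. If none can,
    # fall back to considering all of them. Within the chosen set the
    # highest priority wins (max() keeps the first entry on ties).
    if not swaps:
        return None
    fitting = [s for s in swaps if s.free >= image_size]
    candidates = fitting or swaps
    return max(candidates, key=lambda s: s.priority)
```

With a small high-priority swap and a large low-priority one, the large one is chosen whenever only it can hold the image.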
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
Co-developed-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
report: add support for signing reports via varlink backends + make report a varlink service (#42595)
This adds the following:
1. systemd-report gains a new --sign= option, taking a boolean. If true,
this makes systemd-report generate and systemd-report upload produce a
signed report instead of a regular one. The signatures are collected
from Varlink-based backends.
2. One such backend is added, which implements a simple Ed25519-based
signing scheme.
3. A new metrics source is added which simply reports text files
symlinked into a special directory as metrics. By default this is used
to report the Ed25519 public key as a metric, if it exists.
4. Finally, systemd-report itself is turned into a Varlink service. This
is useful, for example, for extracting a report from a system coming in
via the varlink/http bridge.
I thought for a long time about the format for signing reports.
Initially I intended to do this like homed's user record signing, i.e.
require normalization of the record, then normalize the record, write
it out in dense form, and sign the result. Finally insert the resulting
hash into the user record itself. People have pointed me to the inherent
messiness of signing JSON this way though, as it requires any
participant that wishes to sign/authenticate records this way to
implement the exact same normalization/formatting rules, and in
particular in the area of floating point numbers (of which metrics
presumably will have many) this is quite problematic.
This signing hence goes a different way. Instead of expecting
signer+verifier to independently come to the same normalized text form
of the JSON data, let's instead output a JSON-SEQ sequence, where the
first object is the report, and any subsequent objects are one signature
each. The signatures are supposed to cover the precise binary
representation of the first element in the JSON-SEQ stream (i.e. from
the RS to the NL).
Or in other words: a verifier would receive the JSON-SEQ stream and
split it up before each RS. Then it would leave object 1 unparsed for
the moment, and parse objects 2…n. It would then authenticate object 1's
precise binary representation with objects 2…n. Once that checks out, it
would parse object 1, and use it as the report.
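The verifier flow above can be sketched in Python. Several details are assumptions of this sketch rather than part of the described format: the signed bytes are taken as the payload between RS and NL (exclusive), every signature object must verify, and the actual cryptographic check is delegated to a caller-supplied `check_signature` callback, since the layout of the signature objects is not specified here.

```python
import json

RS = b"\x1e"  # JSON-SEQ record separator (RFC 7464)


def split_json_seq(stream):
    # Split before each RS and return the raw record bodies, with the
    # leading RS and the trailing NL removed but otherwise untouched.
    records = []
    for chunk in stream.split(RS)[1:]:
        records.append(chunk[:-1] if chunk.endswith(b"\n") else chunk)
    return records


def verify_report(stream, check_signature):
    records = split_json_seq(stream)
    if len(records) < 2:
        raise ValueError("need a report and at least one signature")
    raw_report, raw_signatures = records[0], records[1:]

    # Parse only the signature objects first; the report stays raw.
    signatures = [json.loads(raw) for raw in raw_signatures]
    for signature in signatures:
        # check_signature(raw_bytes, signature_object) -> bool is a
        # hypothetical hook for the real verification.
        if not check_signature(raw_report, signature):
            raise ValueError("signature does not cover the report bytes")

    # Only after authentication is the report itself parsed.
    return json.loads(raw_report)
```

Because the check runs over the raw bytes, a report that is re-serialized (say, 1.50 rewritten as 1.5) no longer verifies, which is exactly what avoids the need for shared normalization rules.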
Luca Boccassi [Fri, 29 May 2026 10:23:23 +0000 (11:23 +0100)]
udev: run workers in sibling cgroup and use cgroup.kill
Since a1f4fd387603673a79a84ca4e5ce25b439b85fe6 udev processes
run in an 'udev' subcgroup, to avoid killing control processes
when clearing workers. But the main process is still in the same
cgroup, so the atomic cgroup.kill cannot be used.
If the main subcgroup exists, try to create a sibling 'workers'
cgroup, use it for worker processes, and use cgroup.kill if
available.
This is especially useful as rules can spawn arbitrary programs/scripts
and TasksMax= is set to unlimited.
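As a rough illustration of why a dedicated workers cgroup helps, the following Python sketch kills everything in a cgroup directory via cgroup.kill when that file exists, and otherwise falls back to signalling each listed PID. The function name and return values are made up for this example; the real logic lives in udev's C code.

```python
import os
import signal
from pathlib import Path


def kill_cgroup(cgroup_dir):
    # Writing "1" to cgroup.kill asks the kernel to kill every process
    # in the cgroup atomically. This must not be done on a cgroup that
    # also contains the main process, hence the sibling cgroup.
    cgroup_dir = Path(cgroup_dir)
    kill_file = cgroup_dir / "cgroup.kill"
    if kill_file.exists():
        kill_file.write_text("1")
        return "cgroup.kill"

    # Fallback: signal each process individually. This is racy, since
    # workers may fork between reading the list and sending signals.
    for pid in (cgroup_dir / "cgroup.procs").read_text().split():
        os.kill(int(pid), signal.SIGKILL)
    return "per-process"
```

The atomic variant matters most when rules spawn arbitrary programs, as a per-process loop can miss children created while it runs.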
Luca Boccassi [Fri, 19 Jun 2026 18:35:16 +0000 (19:35 +0100)]
core: add LUOSession= unit setting (#42530)
Acquiring a LUO session from /dev/liveupdate requires privileges, and
also the device is a single-owner driver so only a single process can
open it at any given time.
Add a LUOSession= service setting that allows units running without
privileges to get a session assigned to them.
The kernel imposes a 64-character limit on session names, which is too
short to avoid clashes, so derive a hash from joining the unit name with
the parameter name; that way two units using the same setting don't clash.
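The naming idea can be sketched like this in Python. The hash function (SHA-256), the NUL separator, and the hex encoding are all assumptions chosen for illustration; the commit message only states that a hash is derived from the unit name joined with the parameter name.

```python
import hashlib

SESSION_NAME_MAX = 64  # kernel limit on session names, per the text above


def luo_session_name(unit_name, parameter):
    # Join unit name and parameter with a separator that cannot occur
    # in either, hash the result, and keep the name within the limit.
    joined = f"{unit_name}\0{parameter}".encode()
    digest = hashlib.sha256(joined).hexdigest()
    return digest[:SESSION_NAME_MAX]
```

Two units that both use the same LUOSession= value end up with different session names, while the same unit and value always map to the same name.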
Revert "units: drop After=network-online.target from imds services"
IMDS access requires networking, hence we need to run after
network-online.target. Everything else would be racy and result in
likely timeouts, because we might try to contact the network too early.