git.ipfire.org Git - thirdparty/systemd.git/log

machine-tags: extend syntax to support key/value pairs (#42618)

This is a minor extension, to move the machine tags concept more closely
towards what higher-level solutions support for tagging machines, such
as kubernetes, simply to reduce the conceptual impedance mismatch.

resolved: load libcrypto/libssl lazily on first use and make them optional (#42681)

Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.

Expand specifiers in `MakeSymlinks=` target in `repart.d` (#42694)

Closes #42693. Specifiers are now expanded in symlink targets
(previously, they were only expanded in the source) - this is
technically a breaking change, but I'd be very surprised if anyone was
relying on this.

No other simplification is applied to the target (unlike the source,
which goes through `path_simplify_and_warn`).

Also a few minor changes:

- rename local `path` variable to `source` to match documentation
convention
- document that `MakeSymlinks=` accepts specifiers
- fix error message to print `MakeSymlinks=` option instead of
`Subvolumes=`

systemctl: add --kernel-cmdline-reuse option

kexec-tools has a --reuse-cmdline option which is very convenient
when doing a lot of reboots, so add the same to systemctl.
Dedup options, letting the last one win in case of duplicates,
so that 'systemctl kexec --reuse-cmdline' can be chained many times
without continuously expanding the cmdline with duplicates from
the boot entry.
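
A minimal sketch of the "last one wins" deduplication described above. It assumes that duplicates are exact token matches and that options are separated by plain whitespace (quoted values are ignored); `dedup_cmdline` is a hypothetical name, not the function systemctl uses.

```python
def dedup_cmdline(cmdline: str) -> str:
    """Drop earlier copies of repeated options, keeping the last occurrence."""
    seen = set()
    kept = []
    # Walk backwards so the copy we keep is the last one on the command line.
    for token in reversed(cmdline.split()):
        if token in seen:
            continue
        seen.add(token)
        kept.append(token)
    # Restore the original left-to-right order of the surviving options.
    return " ".join(reversed(kept))
```

Chaining the function on its own output returns the same string, which is the property that keeps repeated kexec cycles from growing the command line.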

btrfs-util,rm-rf: clean up subvolumes without user_subvol_rm_allowed

Without CAP_SYS_ADMIN and without the 'user_subvol_rm_allowed' mount
option, BTRFS_IOC_SNAP_DESTROY is rejected with EPERM (or EROFS for a
read-only subvolume), so rm_rf_subvolume() left subvolumes behind.
test-btrfs thus accumulated leftover subvolumes in /var/tmp on every
unprivileged run on a btrfs filesystem.

An unprivileged owner can however clear the RDONLY flag, empty a
subvolume and rmdir() it. So clear the RDONLY flag on EPERM/EACCES too
(not just EROFS) to leave the subvolume writable, and let rm_rf() fall
through on EPERM/EACCES to empty the subvolume recursively and rmdir()
it, matching what rm_rf_at() already did.

Fixes https://github.com/systemd/systemd/issues/42674

report-basic, networkd: add Version, KernelTimestamp, Address metrics (#42315)

This PR adds some more useful metrics:
- io.systemd.Network.Address
- io.systemd.Basic.KernelTimestamp.{Realtime,Monotonic}
- io.systemd.Basic.Version

ssl-util: support OpenSSL 4 (#42676)

OpenSSL 4 broke ABI, so we need to look for both SONAMEs.

Follow-up for
https://github.com/systemd/systemd/commit/ccdd42351f79cbb9c2e034a96280a1ded40a2f95

Fixes https://github.com/systemd/systemd/issues/42675

resolve: fix transaction leak in dns_transaction_new() error path

hashmap_replace() failure left t in s->manager->dns_transactions with
t->scope still NULL, causing the destructor to skip hashmap_remove().
Add the missing cleanup mirroring the earlier error path in the same
function.

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

resolved: load libcrypto/libssl lazily on first use and make them optional

Currently they are marked as required, as resolved aborts on startup if
dns-over-tls is built in, even if it is not enabled in the config.
Change initialization to be done lazily on first use, so that if the
config is not enabled, it never runs, and the libraries are never
dlopened, so they can be downgraded to recommends.

ssl-util: add cleanup helper for SSL_CTX

journal: add catalog message for missing dlopen dep

log: add log_struct_once macro

Combines log_once and log_struct

tree-wide: Beef up openssl logging

Let's translate openssl's errors to proper errnos
where we can instead of returning EIO for everything.
Let's also make log_openssl_errors() public so we can
use it everywhere and migrate the rest of the codebase
to use it.

repart: make vfat creation reproducible (#42446)

Two fixes to get this byte-stable:

- `fd_copy_directory()` was using `FOREACH_DIRENT_ALL`, which doesn't
give stable ordering. Read all paths, sort, then iterate.
- `mcopy -s` depends on `readdir()` ordering and thus isn't
reproducible. Implement the recursion/sorting here and only invoke
mcopy/mmd per dir.

First change increases memory usage, as we don't stream the paths
anymore, second increases the number of context switches when invoking
external tools. Both should be fine given the ESP content should usually
be pretty limited.

I'd like to write a test for this, but didn't come up with a way that
doesn't require privileges and would surface the error reliably.

shared/tpm2: support chunked reads of NV indexes

The TPM2_NV_Read command returns the requested data in a
TPM2B_MAX_NV_BUFFER type, the maximum size of which is TPM-specific and
can be determined by querying the value of the TPM_PT_NV_BUFFER_MAX
property.

The value of this may be smaller than the payload size of some NV
indexes, particularly when that payload is an X509 certificate with an RSA
public key. E.g., the manufacturer-supplied RSA EK certificate on my own
machine has a size of 1035 bytes, and the value of TPM_PT_NV_BUFFER_MAX
is 1024.

To handle this case and make it possible to read any EK certificate from
the TPM, make tpm2_read_nv_index support chunked reads when the payload
size is larger than what the TPM can return in a single command.
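
The chunking loop can be sketched as follows. This is an illustration under stated assumptions, not the tpm2_read_nv_index code: `read_chunk(offset, size)` is a hypothetical callback standing in for one TPM2_NV_Read command, and `max_chunk` stands in for the queried TPM_PT_NV_BUFFER_MAX value.

```python
from typing import Callable


def read_nv_chunked(read_chunk: Callable[[int, int], bytes],
                    total_size: int, max_chunk: int) -> bytes:
    """Read total_size bytes using reads of at most max_chunk bytes each."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    data = bytearray()
    while len(data) < total_size:
        # Never request more than the device can return in one command.
        want = min(max_chunk, total_size - len(data))
        chunk = read_chunk(len(data), want)
        if len(chunk) != want:
            raise IOError("short read from NV index")
        data += chunk
    return bytes(data)
```

With the numbers from the message (a 1035-byte certificate and a 1024-byte limit), the loop issues two reads: 1024 bytes at offset 0 and 11 bytes at offset 1024.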

ssl-util: prefer OpenSSL 4

For the next version we can switch to preferring the new version.

ssl-util: support OpenSSL 4

OpenSSL 4 broke ABI, so we need to look for both SONAMEs.
Try libssl.so.3 first, and fall back to libssl.so.4,
so that the older and more stable version is used if both
are installed, giving distros time to fix regressions.

Follow-up for ccdd42351f79cbb9c2e034a96280a1ded40a2f95

Fixes https://github.com/systemd/systemd/issues/42675
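
As a rough illustration of the SONAME fallback, the sketch below tries candidate library names in order and returns the first that loads. It uses Python's `ctypes.CDLL` as a stand-in for dlopen(); `load_first` and its `loader` parameter are hypothetical names introduced for the example.

```python
import ctypes
from typing import Callable, Sequence


def load_first(sonames: Sequence[str],
               loader: Callable[[str], object] = ctypes.CDLL):
    """Return (soname, handle) for the first candidate that can be loaded."""
    last_error = None
    for name in sonames:
        try:
            return name, loader(name)
        except OSError as e:
            # Remember the failure and move on to the next SONAME.
            last_error = e
    raise OSError(f"none of {list(sonames)} could be loaded") from last_error
```

Calling `load_first(["libssl.so.3", "libssl.so.4"])` corresponds to the order described in this commit; reversing the list expresses a preference for the newer version instead.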

core: create abstraction/more properties for the "Exec" part of Unit.StartTransient (#42360)

This is a bit of an RFC (but I hope I got it mostly right), @daandemeyer
suggested in
https://github.com/systemd/systemd/pull/42161#pullrequestreview-4336323314
to improve the abstractions around the Exec= in
io.systemd.Unit.StartTransient as we will add a bunch more of those. So
this PR adds first a better abstraction and then uses it. See the
individual commits for details.

Add NEWS entry

This is a breaking change, even if it is unlikely that anyone is relying
on it.

repart: expand specifiers in MakeSymlinks= target

Previously, they were only expanded in the source part of the arguments.
No other validation is applied to the target component.

core: add _parameters_init for the Unit.StartTransient dispatch

This commit extracts the initialization of the transient parameters
for io.systemd.Unit.StartTransient into a set of helpers that follow
the _parameters_init() pattern. This way the code is more uniform
and easier to extend and less fragile. It also means there is a
single (logical) place to init the fields.

core: add more settable properties to varlink Unit.StartTransient()

This commit uses the abstractions added in the previous commit to
add a bunch more properties to io.systemd.Unit.StartTransient()
to showcase how straightforward this is now.

New helpers for tristate bools and an init helper are added. A
dedicated dispatcher for LogLevelMax parses the string-form name
("info", "debug" etc.) declared in the varlink IDL.

The new properties are: DynamicUser, IgnoreSIGPIPE, LockPersonality,
MemoryDenyWriteExecute, NoNewPrivileges, OOMScoreAdjust, RemoveIPC,
RestrictRealtime, RestrictSUIDSGID, RootEphemeral, UMask.

The remaining ProtectKernel*, Private*, ProtectClock properties are
declared as STRING in the varlink IDL (matching the modern *Ex/enum
form) so a bool dispatcher does not pass schema validation. Those
need a string-parsing dispatcher and will be added in a follow-up.

This brings us closer to parity with the D-Bus code (still a long
way to go though).

core: create abstraction for the "Exec" part of Unit.StartTransient

The handling of the `Exec` parameters for the varlink
`io.systemd.Unit.StartTransient()` became a bit unwieldy. So
this commit creates another abstraction to handle the various
fields in the `Exec` part of the StartTransient code.

Each Exec property is now described by a single TransientExecProperty
entry and adding a new property is just a single entry there plus
an apply function.

Thanks to Ivan Kruglov for many useful suggestions.

tmpfiles: add %D specifier resolution

systemd-tmpfiles now resolves the %D specifier to /usr/share
(for the system manager) or $XDG_DATA_HOME (for the user manager).

Closes: https://github.com/systemd/systemd/issues/42010
Signed-off-by: Skye Soss <skye@soss.website>

Translations update from Fedora Weblate (#42682)

Translations update from [Fedora
Weblate](https://translate.fedoraproject.org) for
[systemd/main](https://translate.fedoraproject.org/projects/systemd/main/).

Current translation status:

![Weblate translation
status](https://translate.fedoraproject.org/widget/systemd/main/horizontal-auto.svg)

po: Translated using Weblate (Slovenian)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Martin Srebotnjak <miles@filmsi.net>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/sl/
Translation: systemd/main

po: Translated using Weblate (Georgian)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Temuri Doghonadze <temuri.doghonadze@gmail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ka/
Translation: systemd/main

po: Translated using Weblate (Polish)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Marek Adamski <maradam@users.noreply.translate.fedoraproject.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/pl/
Translation: systemd/main

po: Translated using Weblate (Russian)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Sergey A. <Ser82-png@yandex.ru>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ru/
Translation: systemd/main

po: Translated using Weblate (Swedish)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Luna Jernberg <droidbittin@gmail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/sv/
Translation: systemd/main

po: Translated using Weblate (Portuguese)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Américo Monteiro <a_monteiro@gmx.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/pt/
Translation: systemd/main

po: Translated using Weblate (Turkish)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: Oğuz Ersen <oguz@ersen.moe>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/tr/
Translation: systemd/main

po: Translated using Weblate (Arabic)

Currently translated at 100.0% (286 of 286 strings)

Co-authored-by: joo es <jonnyse@users.noreply.translate.fedoraproject.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/ar/
Translation: systemd/main

cdrom_id: zero scsi response buffers before parsing

hwdb: Map Acer Nitro AN515-58 NitroSense button to prog1

The Acer Nitro AN515-58 has a dedicated NitroSense button (scan code
0xf5) with no entry in its device-specific block. It currently falls
through to the generic Acer rule:

KEYBOARD_KEY_f5=presentation

This is semantically wrong — the button has no relation to
presentation mode. Other Nitro models (AN515-47, AN517-54, ANV15-51)
already map it to prog1 (XF86Launch1), making it a user-programmable
button that desktop environments (KDE, GNOME, etc.) can bind to any
action.

Add the same mapping for AN515-58 for consistency:

KEYBOARD_KEY_f5=prog1 # NitroSense button

Tested on Acer Nitro AN515-58 with Linux 7.0.12 and KDE Plasma 6.7

executor: also preload libcrypto

It's needed for the userspace fallback verity verification, so
it needs to be pre-loaded to avoid getting blocked by RTLD_NOLOAD:

[   57.163995] (cat)[1560]: minimal-app0-foo.service: Validation of dm-verity signature failed via the kernel, trying userspace validation instead: Required key not available
[   57.194696] (cat)[1560]: minimal-app0-foo.service: Refusing loading of 'libcrypto.so.3', as loading further dlopen() modules has been blocked.
[   57.197940] (cat)[1560]: minimal-app0-foo.service: Shared library 'libcrypto.so.3' is not available: Operation not permitted
[   57.204283] (cat)[1560]: minimal-app0-foo.service: Failed to activate verity device /dev/mapper/2b2fd83f324c3aa2ea1a979899f9c630761f1de3c5e00ce8c6bb36f4d137f450-verity: Operation not supported
[   57.272782] (cat)[1560]: minimal-app0-foo.service: Failed to set up mount namespacing: Operation not supported
[   57.274250] (cat)[1560]: minimal-app0-foo.service: Failed at step NAMESPACE spawning cat: Operation not supported

Follow-up for efaf5a763d6a06645dba8e88ebc15e887d59cbef

generator: order cryptsetup/verity/integrity after systemd-udevd

During soft-reboot teardown, systemd-cryptsetup@.service's ExecStop runs
libcryptsetup's crypt_deactivate(), which issues DM_REMOVE with a udev
cookie and blocks in dm_udev_wait() until 95-dm-notify.rules decrements
the cookie semaphore via "dmsetup udevcomplete $env{DM_COOKIE}".

The generated cryptsetup unit had only After=systemd-udevd-kernel.socket,
not After=systemd-udevd.service. The .socket has IgnoreOnIsolate=yes and
is not stopped during soft-reboot, so it provides no ordering at all for
the service teardown. Meanwhile systemd-udevd.service has
Conflicts=soft-reboot.target (added in 0d1819e791) and stops as soon as
soft-reboot.target starts, with no ordering relative to cryptsetup's
ExecStop.

Once udevd's device monitor event source is disabled in manager_exit(),
pending DM_REMOVE uevents are no longer processed and the cookie
semaphore stays at 1 forever, blocking soft-reboot at "Stopping
Cryptography Setup..." until JobTimeoutSec=30min on soft-reboot.target
fires.

Add systemd-udevd.service to the After= ordering of the generated
cryptsetup, veritysetup and integritysetup units. By systemd's job
ordering rules (job_compare() in src/core/job.c), when two units are
both being stopped and one has After= the other, the After= unit is
stopped first. So with cryptsetup@*.service After=systemd-udevd.service,
cryptsetup stops first (cookie acknowledged by the still-live udevd),
then udevd stops.

Putting After=umount.target on systemd-udevd.service does not work: at
soft-reboot, udevd's stop job runs concurrently with umount.target's
start job, and JOB_STOP unconditionally precedes JOB_START in the
transaction (see src/core/job.c:1742). The ordering has to be expressed
between two stop jobs, which is what putting After=systemd-udevd.service
on the dm consumers achieves.

Fixes: #40298

pam_systemd: add option for acquiring logind inhibitor locks for the duration of a session (#42275)

This adds support in pam_systemd for acquiring an inhibitor lock when a
pam session is active, and can be used (for example) to prevent
suspend/sleep of a host when an ssh session is active on the host. The
reason string can be customized via `inhibit-why=`.

For example, /etc/pam.d/sshd can now contain this line: `-session
optional pam_systemd.so inhibit=sleep`

And sleep will be blocked when someone is logged into the system over
ssh:

```
foo:~$ systemd-inhibit --list --no-pager
WHO            UID USER PID COMM            WHAT  WHY                 MODE
NetworkManager 0   root 397 NetworkManager  sleep NetworkManager nee… delay
Realtime Kit   0   root 401 rtkit-daemon    sleep Demote realtime sc… delay
sshd           0   root 667 sshd-session.pa sleep Active PAM session  block
```

Fixes: #20654

shared/tpm2: split the tpm2_index_to_handle helper function

The existing tpm2_index_to_handle helper has optional return arguments
for an object's public area and its qualified name. However, it can also
be called for handles corresponding to NV indexes, where the public area
and qualified name don't apply.

I'm working on another change that would benefit from having an
equivalent helper for NV indexes, and there is already some code in
shared/tpm2-util.c that could use this as well.

This splits the existing tpm2_index_to_handle function into 3 functions:
- tpm2_index_to_handle, which works for both objects and NV indexes and
  returns a name and handle.
- tpm2_object_index_to_handle, which works the same as the existing
  function for transient or persistent objects, returning a public area,
  name, qualified name and handle.
- tpm2_nv_index_to_handle, which works for NV indexes, returning the NV
  public area, name and handle.

hostname: add $ hostname substitution and petnames wordfiles (#42566)

This commit adds support to /etc/hostname for substitution of $ from
wordlists located in {/etc,/run,/usr/lib}/systemd/. Each $ is resolved to
a number (1,2,3...) and the corresponding file "1" is opened to acquire
the word. With that we can do a petname [1] style hostname in systemd,
e.g. below a possible expansion for a hostname template:

$-$-$-???? -> wildly-happy-octopus-92a9

The substitution of words is stable (based on machine-id) but if the
wordlist changes the hostname would change. We could pick it once and
cache it but Lennart did not like this so this version instead always
picks it (based on offset of the file so the operation is cheap). To
persist it one can use the `firstboot.hostname` credential. On a live
system this will be expanded and then written in the expanded form to
/etc/hostname.

This also includes a wordlist from the "petname" project that can be
optionally installed.

Thanks to Dustin Kirkland for this wonderful project.

[1] https://github.com/dustinkirkland/petname

---

I'm a bit unsure if this should include the word lists (I think it's nice
to have them though) and if so if they should be their own commit.

run0: implement sudo options -k/-K/-v (#42465)

This implements a close facsimile of
[sudo](https://man.archlinux.org/man/sudo.8#K)'s -k/-K/-v options, which
manipulate the temporary authorizations used by sudo. Use like `run0 -k
whoami` to always re-auth, or just `run0 -k` to revoke a prior temporary
authorization.

~~Depends on polkit-org/polkit#662 due to a bug in polkit.~~ Now it
should work ok despite the bug.

repart: Place new partitions at beginning of free area rather than at end

When placing new partitions and there's space left because the new partitions
aren't occupying the whole free space, context_grow_partitions_on_free_area()
is supposed to distribute the free space between the new partitions. If still
no partition wants the free space, the free space ends up becoming padding.

Currently that padding is allocated to the partition preceding the FreeArea
(ie. a->after). This obviously means that the new partitions now end up at
the *end* of the free area rather than at the beginning, which is somewhat
unexpected given how partition placement usually is done.

Fix it by finding the last partition that belongs to the free area, and then
allocating the padding to that partition, so that the new partitions end up
getting aligned with the beginning of the free area, not the end.

Because the span might not be rounded to grain if there's a pre-allocated
a->after partition before the free area, we need to round it down ourselves
(otherwise the "left >= p->new_padding" assertion in context_place_partitions()
is going to fail).

Also ensure the fix works as expected by adding a test.
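
The placement rule can be illustrated with a simplified model. This is only a sketch, assuming partition sizes are already grain-aligned and ignoring everything else repart takes into account; `place_in_free_area` and the 4096-byte default grain are assumptions made for the example.

```python
def place_in_free_area(area_start: int, area_size: int, sizes, grain: int = 4096):
    """Place partitions from the start of a free area; leftover space becomes
    padding on the last partition, rounded down to the grain."""
    used = sum(sizes)
    if used > area_size:
        raise ValueError("partitions do not fit into the free area")
    # Round the leftover down so the padding never exceeds the usable span.
    padding = (area_size - used) // grain * grain
    placed = []
    offset = area_start
    for i, size in enumerate(sizes):
        pad = padding if i == len(sizes) - 1 else 0
        placed.append({"offset": offset, "size": size, "padding": pad})
        offset += size + pad
    return placed
```

Because the padding trails the last new partition instead of preceding the first, the first partition's offset equals the start of the free area.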

tpm2: support SHA384/SHA512 PCR banks in tpm2_get_best_pcr_bank() (#42538)

`tpm2_get_best_pcr_bank()` only ever considered the SHA256 and SHA1
banks (both the `LoaderTpm2ActivePcrBanks` path and the capability
guesswork). On a TPM whose only active bank is SHA384 it returned
`-EOPNOTSUPP`, breaking sealing/enrollment (cryptenroll, credential
encryption, legacy unseal). The restriction looks like a historical
simplification — `efi_get_active_pcr_banks()` already decodes
SHA384/SHA512 and `tpm2_hash_algorithms[]` already lists them.

This PR introduces an explicit preference table (SHA256 > SHA512 >
SHA384 > SHA1) and selects from it. SHA256 stays the top preference for
backwards compatibility, so existing systems keep using the same bank
and the legacy unseal-guess in `tpm2_unseal()` stays consistent;
SHA384/SHA512 are only chosen when SHA256 is unavailable, SHA1 remains
the last resort.

Behavior for existing SHA256/SHA1 systems is unchanged. Includes a unit
test for the bank-preference logic.
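
The preference table amounts to a first-match lookup. A minimal sketch, assuming the set of active banks is already known; `pick_best_bank` is a hypothetical name, and returning `None` stands in for the `-EOPNOTSUPP` error of the real function.

```python
# Preference order as stated in the PR description.
BANK_PREFERENCE = ("sha256", "sha512", "sha384", "sha1")


def pick_best_bank(active_banks):
    """Return the most preferred active PCR bank, or None if none is usable."""
    active = {bank.lower() for bank in active_banks}
    for bank in BANK_PREFERENCE:
        if bank in active:
            return bank
    return None
```

A TPM with only SHA384 active now yields "sha384", while any system with SHA256 active keeps selecting "sha256" as before.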

Related to https://github.com/systemd/systemd/pull/42537

chid_match: match UEFI firmware hwid entry with the current smbios data

UEFI firmware type hwids must be matched against the current hardware first.
This change implements that. Additionally, some extra validations on the hwids
entries have also been added.

nss-systemd: avoid ELF TLS for recursion guard

libnss_systemd currently uses a thread_local recursion guard to
avoid re-entering nss-systemd during NSS lookups.
Since libnss_systemd.so.2 is loaded lazily by glibc, accessing ELF TLS
may trigger dynamic TLS allocation in __tls_get_addr(). Under allocation
failure conditions, glibc terminates the process from the dynamic loader
instead of allowing the NSS module to return a normal failure.
Replace the recursion guard with POSIX thread-specific data to preserve the
same per-thread semantics while avoiding ELF TLS in the NSS module.
Note that pthread_setspecific() may still allocate internally on first use
per thread. The key improvement is that any such failure is returned
as a normal error code rather than terminating the process from inside
the dynamic loader.

Related: #42559

hostname: improve the algorithm in hostname_pick_word()

Lennart suggested to use a more uniform algorithm for
the picking of the hostname words that is not biased
for long words by just (predictably) randomly going over
the offsets until we land on a word boundary. This is a
very nice suggestion so this commit implements it with
a fallback to the "old" behavior if we do not find a
word boundary within a reasonable amount of attempts.

A small python script shows that 64 iterations plus
fallback is a good number:
```
$ python3 simulate-hostname-pick.py 64
hostname-wordlist/adverbs
  words=261  p_accept=0.1119  avg_bytes/word=1/p=8.94
  max_iterations=64, n_trials=1000000
    fallback rate       :   0.051000%  (510/1_000_000)
    mean seeks per word :        8.93

hostname-wordlist/adjectives
  words=449  p_accept=0.1380  avg_bytes/word=1/p=7.24
  max_iterations=64, n_trials=1000000
    fallback rate       :   0.007500%  (75/1_000_000)
    mean seeks per word :        7.25

hostname-wordlist/nouns
  words=449  p_accept=0.1472  avg_bytes/word=1/p=6.79
  max_iterations=64, n_trials=1000000
    fallback rate       :   0.002700%  (27/1_000_000)
    mean seeks per word :        6.79
```
Combined with the fallback to the previous method if
we can't find anything within the 64 attempts this seems
to be the best tradeoff and give us very good uniformity.
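
The idea can be sketched in Python as below. This is not the hostname_pick_word() implementation: deriving offsets from SHA-256 over a seed, the helper names, and the exact shape of the fallback are all assumptions made for illustration. Each word has exactly one accepted offset (its first byte), so accepted picks are uniform over words regardless of word length.

```python
import hashlib

MAX_ATTEMPTS = 64  # iteration limit from the simulation above


def _offset(seed: bytes, attempt: int, size: int) -> int:
    # Predictable pseudo-random offset (assumed scheme: SHA-256 of seed + counter).
    digest = hashlib.sha256(seed + attempt.to_bytes(4, "little")).digest()
    return int.from_bytes(digest[:8], "little") % size


def _word_at(data: bytes, start: int) -> str:
    end = data.find(b"\n", start)
    return data[start:end if end >= 0 else len(data)].decode()


def pick_word(data: bytes, seed: bytes) -> str:
    """Pick a word from a newline-separated list, stable for a given seed."""
    if not data:
        raise ValueError("empty word list")
    for attempt in range(MAX_ATTEMPTS):
        off = _offset(seed, attempt, len(data))
        at_boundary = off == 0 or data[off - 1:off] == b"\n"
        if at_boundary and data[off:off + 1] != b"\n":
            return _word_at(data, off)
    # Fallback (biased): skip ahead to the word after the offset, wrapping around.
    off = _offset(seed, MAX_ATTEMPTS, len(data))
    nl = data.find(b"\n", off)
    start = nl + 1 if 0 <= nl < len(data) - 1 else 0
    return _word_at(data, start)
```

The acceptance probability per attempt is the number of words divided by the file size in bytes, which is consistent with the `p_accept` and `avg_bytes/word` figures in the simulation output.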

hostname: add ? and $ in systemd.hostname= kernel cmdline

Similar to the support for ?/$ in /etc/hostname and the credentials
we now add this to the kernel commandline systemd.hostname= option.

If the expansion fails, e.g. in the initrd where the word lists (and
possibly the machine ID) are not available yet, the option is ignored
and the usual default hostname logic applies. Once the host system is
up the expansion succeeds and the intended name is applied.

creds,firstboot: add support for ? and $ via credentials

Now that we support the `$` we want to also make this available
inside the system.hostname and firstboot.hostname credentials and
the firstboot --hostname option. This commit adds it (and also `?`).

hostname: add $ hostname substitution and petnames

This commit adds support to /etc/hostname for substitution
of $ with words from wordlists in
{/etc,/run,/usr/lib}/systemd/hostname-wordlist.
The first $ will look up hostname-wordlist/1, the next
hostname-wordlist/2 and so on.

With that we can do a petname [1] style hostname in systemd, e.g.
below a possible expansion for a hostname template:

$-$-$-???? -> wildly-happy-octopus-92a9

The substitution of words is stable (based on machine-id) but
not persisted, it is picked on every boot via a stable file
offset so the operation is cheap. But this means that if the
wordlist changes the hostname would change. The next commit
will add the pattern to the firstboot.hostname credential which
is persisted with the resolved names to avoid this issue.

This also includes a wordlist from the "petname" project
that can be optionally installed.

Thanks to Dustin Kirkland for this wonderful project.

[1] https://github.com/dustinkirkland/petname

po: Update translation files

Updated by "Update PO files to match POT (msgmerge)" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/
Translation: systemd/main

shared/libfido2: show number of retries before lockout

For a good user experience, users expect to be informed of how many
attempts they have before being locked out of their FIDO2 device.

By displaying such information in advance, the user can plan how to
obtain the correct PIN, or wait until they are near an authority who
can provide them with a recovery key.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>

core: add RestartRandomizedDelaySec= service option

We already support exponential backoff for automatic restarts via
RestartSec=/RestartSteps=/RestartMaxDelaySec=, but there is no way to
randomize the restart delay. When many instances of a service fail at
the same time (e.g. because a shared resource briefly went away) they
are all restarted in lockstep, creating a thundering herd problem.

So this commit adds a simple `RestartRandomizedDelaySec=` service
option which is similar to the timer `RandomizedDelaySec=` and
adds a randomized restart delay.
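
A minimal sketch of the effect, assuming the jitter is drawn uniformly from zero up to the configured value and added on top of the base restart delay, as with the timer `RandomizedDelaySec=`. How the option interacts with RestartSteps=/RestartMaxDelaySec= is not stated here and is not modelled; `restart_delay` is a hypothetical name.

```python
import random


def restart_delay(restart_sec: float, randomized_delay_sec: float, rng=None) -> float:
    """Base restart delay plus a uniformly distributed random extra delay."""
    rng = rng or random.Random()
    return restart_sec + rng.uniform(0.0, randomized_delay_sec)
```

With many instances failing at once, each one draws its own jitter, so the restarts spread out over the configured window instead of happening in lockstep.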

core: set pids.max=0 before sending SIGKILL to cgroup

When doing the final SIGKILL when stopping units ensure no more
processes can be created by setting pids.max=0 in the cgroup.

test: enable and fix the TEST-71 hostname wildcard test

test_wildcard() was never executed: run_testcases() only picks up
functions named testcase_* so this test never ran. This commit
makes it run and fixes two issues in the test:

1. /etc/hostname is absent in the test image so we need to guard
   for that.
2. The pattern check was written as [[ "$P" == "$H" ]] with both
   sides quoted, but we need to leave one side unquoted as otherwise
   the comparison will always be false.

pam_systemd: document inhibit= and inhibit-why= options

pam_systemd: acquire inhibit lock when inhibit= is specified

This adds a new function, acquire_inhibit_lock, that takes a logind
inhibitor lock for the duration of the PAM session. The lock fd is
stored in PAM data with pam_cleanup_close. In pam_sm_close_session, the
lock is released explicitly by passing NULL to pam_set_data, which
triggers the registered cleanup function immediately rather than waiting
for pam_end.

The inhibit type and reason string can also be configured via the
XDG_SESSION_INHIBIT and XDG_SESSION_INHIBIT_WHY environment variables,
which take precedence over the module arguments.

Fixes: #20654

pam_systemd: add inhibit= and inhibit-why= module arguments to parse_argv

This adds two new module arguments to parse_argv:

1) inhibit= takes a colon-separated list of inhibitor lock types to acquire

2) inhibit-why= takes an optional human-readable reason string.

Both are currently parsed but are unused until a later commit.
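
How the two arguments and the XDG_SESSION_INHIBIT / XDG_SESSION_INHIBIT_WHY environment variables might combine can be sketched as follows. This is an illustration only: `resolve_inhibit` is a hypothetical helper, and the real parse_argv in pam_systemd handles many more arguments.

```python
def resolve_inhibit(module_args, env):
    """Return (lock types, reason) from PAM module arguments and environment."""
    what = why = None
    for arg in module_args:
        if arg.startswith("inhibit="):
            what = arg[len("inhibit="):]
        elif arg.startswith("inhibit-why="):
            why = arg[len("inhibit-why="):]
    # The environment variables take precedence over the module arguments.
    what = env.get("XDG_SESSION_INHIBIT", what)
    why = env.get("XDG_SESSION_INHIBIT_WHY", why)
    # inhibit= is a colon-separated list of inhibitor lock types.
    types = [t for t in (what or "").split(":") if t]
    return types, why
```

For instance, `resolve_inhibit(["inhibit=sleep"], {})` gives `(["sleep"], None)`, matching the /etc/pam.d/sshd line shown in the PR description.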

run0: add -n/--non-interactive as alias to --no-ask-password

These are the flag names used by sudo with similar effect.

run0: implement -v/--validate to renew temporary auth

This is meant to mirror sudo's -v/--validate options, which authorize
the user without running a command.

run0: implement -k/-K to revoke temporary auth

This is meant to mirror sudo's -k/--reset-timestamp and
-K/--remove-timestamp options, which revoke the temporary authorization
provided by the timestamp files in /var/run/sudo/ts.

To achieve the same effect in run0, we ask polkit to revoke our
temporary authorization. If used with a command, run0 will revoke the
temporary auth and then immediately authorize the user again, just like
sudo -k. All the bus calls are completed synchronously, as they need to
complete before authorizing the user anyway.

Like sudo, the effect of -k/--reset-timestamp is to revoke only the
tmpauthz that polkit would have used to authorize the command, if
available. The -K/--remove-timestamp option will revoke all temporary
authorizations across all ttys.

networkd: report per-interface addresses as io.systemd.Network.Address

The networkd metrics interface already reports a lot of interesting
metrics. With this commit it also reports the network addresses.

Each ready address is emitted as one record per (interface, address)
pair:
- object: ifname
- value: address in CIDR notation
- fields: { family: "ipv4"|"ipv6", scope: "global"|"link"|"host"|... }

The loopback addresses are not reported as they are just noise.

Example output:
```
root@localhost:~# varlinkctl --more --json=short call /run/systemd/report/io.systemd.Network io.systemd.Metrics.List '{}'
{"name":"io.systemd.Network.Address","object":"enp0s1","value":"fe80::5054:ff:fe12:3456/64","fields":{"family":"ipv6","scope":"link"}}
{"name":"io.systemd.Network.Address","object":"enp0s1","value":"fec0::5054:ff:fe12:3456/64","fields":{"family":"ipv6","scope":"site"}}
{"name":"io.systemd.Network.Address","object":"enp0s1","value":"10.0.2.15/24","fields":{"family":"ipv4","scope":"global"}}
```
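
The record layout can be reproduced with a short sketch. It assumes the scope string is supplied by the caller and that loopback is detected from the address itself; networkd may filter differently. `address_metric` is a hypothetical helper.

```python
import ipaddress
import json


def address_metric(ifname: str, cidr: str, scope: str):
    """Build one io.systemd.Network.Address record, or None for loopback."""
    iface = ipaddress.ip_interface(cidr)
    if iface.ip.is_loopback:
        return None  # loopback addresses are not reported
    return {
        "name": "io.systemd.Network.Address",
        "object": ifname,
        "value": iface.with_prefixlen,
        "fields": {"family": f"ipv{iface.version}", "scope": scope},
    }


record = address_metric("enp0s1", "10.0.2.15/24", "global")
print(json.dumps(record, separators=(",", ":")))
```

The printed line has the same shape as the last record in the example output above.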

core: add kernel/userspace/finish boot timestamps to io.systemd.Manager metrics

This commit adds the boot timeline (MANAGER_TIMESTAMP_KERNEL/USERSPACE/FINISH) as
metrics. The kernel CLOCK_MONOTONIC value is 0 by definition, so only its
.Realtime is reported. For userspace and finish report both .Realtime and
.Monotonic. The naming follows D-Bus.

core: add io.systemd.Manager.Version to metrics

This commit adds the systemd version to the metrics that
`io.systemd.Manager` generates.

test: drop whitespace after shell redirection operators

crypto-util: prefer openssl-4

Fedora has switched to openssl 4, and we generate a Requires dependency
on libcrypto for the systemd-udev subpackage, so preferring openssl-3
does the wrong thing. So the order in the dlopen note needs to be switched.
But in general, we want to get rid of openssl-3, so we want to load
openssl-4 in preference. Change the order in both places.

(The "compat" order can stay in 261-stable for other distros.)

report: disable json normalization

Two PRs got merged at the same time, which causes a test to fail,
as they work individually but fail when combined:

TEST-74-AUX-UTILS.sh[1688]: + /usr/lib/systemd/systemd-report generate io.systemd.Manager.UnitsTotal
TEST-74-AUX-UTILS.sh[1805]: {"mediaType":"application/vnd.io.systemd.report","metrics":[{"name":"io.systemd.Manager.UnitsTotal","value":249}],"timestamp":"Fri 2026-06-19 19:50:48 UTC"}
TEST-74-AUX-UTILS.sh[1806]: + /usr/lib/systemd/systemd-report generate io.systemd.Manager.UnitsTotal
TEST-74-AUX-UTILS.sh[1807]: + jq .
TEST-74-AUX-UTILS.sh[1807]: {
TEST-74-AUX-UTILS.sh[1807]:   "mediaType": "application/vnd.io.systemd.report",
TEST-74-AUX-UTILS.sh[1807]:   "metrics": [
TEST-74-AUX-UTILS.sh[1807]: {
TEST-74-AUX-UTILS.sh[1807]:   "name": "io.systemd.Manager.UnitsTotal",
TEST-74-AUX-UTILS.sh[1807]:   "value": 249
TEST-74-AUX-UTILS.sh[1807]: }
TEST-74-AUX-UTILS.sh[1807]:   ],
TEST-74-AUX-UTILS.sh[1807]:   "timestamp": "Fri 2026-06-19 19:50:48 UTC"
TEST-74-AUX-UTILS.sh[1807]: }
TEST-74-AUX-UTILS.sh[1688]: + /usr/lib/systemd/systemd-report upload --url=http://localhost:8089/
TEST-74-AUX-UTILS.sh[1808]: Failed to normalize report JSON: Wrong medium type

https://github.com/systemd/systemd/pull/42594
https://github.com/systemd/systemd/pull/42595

Disable normalization for now, and track the issue at
https://github.com/systemd/systemd/issues/42669

Follow-up for 3c2f7c6002254fa7108e186aeedf2b2c6a86bd4f

test: extend tests for new machine tag rules

machine-tags: optionally support key/value pairs as machine tags

Other systems (kubernetes…) allow tagging machines with key/value pairs.
Let's extend our allowed syntax slightly to allow that too. Thankfully,
we enforced a pretty strict ruleset on machine tags, hence we can
introduce this without breaking compatibility.

This basically allows tags to contain "=". If so, then the left-hand
side of it must be unique among machine tags.

When matching against a machine tag, we apply the same rules as before.
This means that if people want to check whether a tag with a specific
value applies, they can do:

ConditionMachineTag=foo=bar

If they just want to check if "foo=" is set to anything, they can use
the usual glob matching:

ConditionMachineTag=foo=*
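
As a rough illustration of the two rules above (unique left-hand side,
unchanged glob matching), here is a minimal sketch. It is not the
actual implementation: validate_tags() and condition_machine_tag() are
hypothetical names, and Python's fnmatchcase() stands in for whatever
glob matcher the real code uses.

```python
from fnmatch import fnmatchcase


def validate_tags(tags):
    """Reject a tag list in which two key/value tags share the same key."""
    seen_keys = set()
    for tag in tags:
        if "=" in tag:
            key, _, _ = tag.partition("=")
            if key in seen_keys:
                raise ValueError(f"duplicate machine tag key: {key!r}")
            seen_keys.add(key)
    return list(tags)


def condition_machine_tag(pattern, tags):
    """Return True if any tag matches the glob pattern as a whole string."""
    return any(fnmatchcase(tag, pattern) for tag in tags)
```

With the tags "foo=bar" and "webserver", both the pattern "foo=bar" and
the pattern "foo=*" match, while "foo=baz" does not.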

string-util: make split_pair() return parameters optional

boot/stub: honor PE SectionAlignment when loading inner kernel

The stub copies the embedded kernel PE into an xmalloc_pages()
allocation with AllocateAnyPages, which only guarantees EFI_PAGE_SIZE
(4 KiB) alignment.

The arm64 kernel declares SectionAlignment = SZ_64K. When _text is not
64K-aligned the kernel's EFI stub reallocates and copies the image,
which can fail with EFI_OUT_OF_RESOURCES on memory-constrained firmware
(e.g. U-Boot, bounded by CONFIG_SYS_MALLOC_LEN rather than full DRAM).
The failure is non-deterministic since AllocateAnyPages returns whatever
page the allocator finds first.

Plumb SectionAlignment through pe_kernel_info() and allocate via
xmalloc_aligned_pages(). For alignment <= EFI_PAGE_SIZE (e.g. x86_64) it
reduces to a plain AllocatePages(), so the change is free there.

SectionAlignment comes from the PE header, which the spec requires to be
a power of 2. Sanitize it in pe_kernel_info() right where it is parsed:
a non-conforming value falls back to plain page alignment (matching the
behavior of older systemd-stub versions that ignored the field) rather
than propagate something that would break xmalloc_aligned_pages()'s
over-alignment maths, with a log_warning() so the fallback stays
diagnosable.
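
The sanitizing and over-alignment arithmetic described above can be
modelled roughly as follows. This is a sketch in Python rather than the
stub's C code, all function names are hypothetical, and the "request
extra pages, then round the base address up" strategy is an assumption
about how an aligned page allocator typically works.

```python
EFI_PAGE_SIZE = 4096  # 4 KiB, as stated in the commit message


def sanitize_alignment(section_alignment):
    """Fall back to page alignment unless the value is a power of two."""
    is_pow2 = section_alignment > 0 and (section_alignment & (section_alignment - 1)) == 0
    if not is_pow2:
        return EFI_PAGE_SIZE
    # Anything at or below the page size needs no special handling.
    return max(section_alignment, EFI_PAGE_SIZE)


def pages_to_request(size, alignment):
    """Pages to allocate so that an aligned block of `size` bytes always fits."""
    pages = -(-size // EFI_PAGE_SIZE)  # ceiling division
    if alignment <= EFI_PAGE_SIZE:
        return pages  # plain page allocation, no over-allocation
    # Worst case the page-aligned base is (alignment - page) bytes short.
    return pages + alignment // EFI_PAGE_SIZE - 1


def align_up(address, alignment):
    """Round address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)
```

For a 64 KiB alignment and a one-page image this requests 16 pages, so
that a page-aligned base such as 0x1000 can still be rounded up to
0x10000 with the image fitting inside the allocation.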

Co-developed-by: Claude <claude@anthropic.com>
Closes: #42443
Reported-By: Agathe Porte <agathe.porte@oss.qualcomm.com>
Tested-By: Agathe Porte <agathe.porte@oss.qualcomm.com>

sysext: Allow to (re)start units from extension-release metadata file

Up to now we recommended to use TARGET.upholds/ symlinks to start units
when an extension is loaded. However, this has some drawbacks. First,
for services that should not be started over and over again we have
to resort to hiding them behind a target that gets upheld and then
uses a regular .wants/ for the actual service. Second, we actually leak
services on extension unload even if the unit has disappeared with the
extension. Third, to affect a service through a drop-in or a config
change from a confext/sysext and that service is already running, we
need a way to restart/reload it instead of just starting it.

Similar to EXTENSION_RELOAD_MANAGER=1, add an EXTENSION_RESTART_UNITS=
and an EXTENSION_RELOAD_OR_RESTART_UNITS= setting to the
extension-release metadata file, carrying a whitespace-separated list
of units to restart/reload on merge/refresh/unmerge after the daemon
reload. Also detect when the unit has vanished which is normally the
case when the unit was part of the unmerged extension, and stop it
explicitly to prevent it leaking. When the extension itself ships the
binary it should use EXTENSION_RESTART_UNITS= to make sure the new
binary is picked up. Since starting through this setting does not work
when the extension is mounted from the initrd, extensions should still
ship at least a .wants/ symlink to start at boot but can also continue
to ship a .upholds/ symlink for backwards compatibility without any
drawback and still benefit from the unit stopping triggered by the new
setting. While there are cases where one could want to set
EXTENSION_RESTART_UNITS= without requiring a daemon reload (e.g., an
env var file change instead of a unit drop-in), we now do an implicit
daemon reload whenever we have to restart units, so that we know we
work on the right state and users do not have to remember to also set
EXTENSION_RELOAD_MANAGER=1 to avoid running into this issue.
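
To make the new settings concrete, here is a small sketch that reads
them from the text of an extension-release file and derives what should
happen after a merge. It is only a model: the function names and the
example unit names are hypothetical, the parser handles just simple
KEY=VALUE lines, and treating either list as triggering the implicit
daemon reload is an assumption.

```python
import shlex


def parse_extension_release(text):
    """Parse simple KEY=VALUE lines, stripping shell-style quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)
        settings[key] = parts[0] if parts else ""
    return settings


def plan_actions(settings):
    """Derive the reload/restart plan from the parsed settings."""
    restart = settings.get("EXTENSION_RESTART_UNITS", "").split()
    reload_or_restart = settings.get("EXTENSION_RELOAD_OR_RESTART_UNITS", "").split()
    # Restarting units implies a daemon reload first (assumed for both lists).
    reload_manager = settings.get("EXTENSION_RELOAD_MANAGER") == "1" or bool(
        restart or reload_or_restart
    )
    return {
        "reload_manager": reload_manager,
        "restart": restart,
        "reload_or_restart": reload_or_restart,
    }
```

A file containing EXTENSION_RESTART_UNITS="example.service
other.service" would yield a plan that reloads the manager and then
restarts both units.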

Accept NSS aliases for canonicalized user records (#42452)

This PR fixes userdb lookups for NSS users that are resolved through an
alias but returned with a canonical user name.

Some NSS providers, such as SSSD, can successfully resolve a user by an
alias-like name, such as a Kerberos/AD UPN (for example
testuser@example.test), while returning a passwd record with the
canonical login name.

The original lookup name was not preserved. Later, the userdb worker
checked whether the returned record matched the requested name with
user_record_matches_user_name(). Since the requested name was
testuser@example.test, but the record only contained testuser, the
lookup was rejected as:
`io.systemd.UserDatabase.ConflictingRecordFound`

This also caused pam_systemd to fail opening sessions for such users
with:
`pam_systemd(...:session): Failed to get user record`

In my case, this broke graphical logins for Samba users logging in with
a UPN on systems where SSSD canonicalizes the NSS result.

My solution preserves the requested name as a UserRecord alias when an
NSS lookup by name succeeds but the returned pw_name differs from the
requested name.
This allows user_record_matches_user_name() to accept the canonicalized
NSS result instead of treating it as a conflicting record.
The patch does not invent new aliases. It only records the name that NSS
itself already accepted and resolved successfully.
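
The idea can be sketched as follows. This is a simplified Python model
and not the actual C code: the record is a plain dict, and
build_user_record() and matches_user_name() are hypothetical stand-ins
for the real conversion and for user_record_matches_user_name().

```python
def build_user_record(requested_name, pw_name):
    """Build a record from an NSS result, keeping the lookup name as alias."""
    record = {"userName": pw_name, "aliases": []}
    # Only record the name NSS itself resolved; never invent other aliases.
    if requested_name != pw_name:
        record["aliases"].append(requested_name)
    return record


def matches_user_name(record, name):
    """Accept either the canonical name or any recorded alias."""
    return name == record["userName"] or name in record["aliases"]
```

A lookup for testuser@example.test that returns pw_name "testuser" then
matches under both names, while unrelated names are still rejected.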

I'm not an expert on working with systemd, so I'm asking for a review of
my PR.

repart: Slightly simplify supplement partition code (#42508)

Just two small things that hopefully make it easier to read the code
around supplement/suppressing partitions.

cc @AdrianVovk

repart: fix SizeMinBytes/SizeMaxBytes rounding direction

The ltype parameter for SizeMinBytes and SizeMaxBytes in the config
parsing table was reversed. SizeMinBytes should round UP (ltype > 0)
to ensure the partition is at least the specified size, while
SizeMaxBytes should round DOWN (ltype < 0) to ensure the partition
doesn't exceed the specified size.

This matches the documentation in repart.d.xml which correctly states:
- SizeMinBytes: 'rounded upwards'
- SizeMaxBytes: 'rounded downwards'

The same fix is applied to PaddingMinBytes and PaddingMaxBytes which
share the same config_parse_size4096 parser.
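
For illustration, the intended rounding directions look like this in a
small Python sketch. The helper names are hypothetical; only the 4096
granularity and the up/down directions come from the description above.

```python
GRANULARITY = 4096


def round_up(value, granularity=GRANULARITY):
    """Round up to a multiple of granularity (used for minimum sizes)."""
    return -(-value // granularity) * granularity


def round_down(value, granularity=GRANULARITY):
    """Round down to a multiple of granularity (used for maximum sizes)."""
    return value // granularity * granularity
```

With a configured value of 5000, a minimum size becomes 8192 (the
partition is at least as large as requested) and a maximum size becomes
4096 (the partition never exceeds the request). Values that are already
multiples of 4096 are left unchanged.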

Fixes: #42526
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Liu Zhangjian <liuzhangjian@uniontech.com>

sysupdate: refuse reboot/pending logic when --component= is used (#42578)

Fixes #42330

The `pending` and `reboot` verbs, and the `--reboot` switch, decide
whether to reboot by comparing the newest installed version against the
booted OS version (`IMAGE_VERSION=` from os-release). When a component
is selected via `--component=`, this ends up comparing the component's
version against the unrelated host OS version; by design these live in
separate version spaces, so the comparison is meaningless and reboot
decisions become arbitrary: depending on the relative version strings,
sysupdate either always reboots or never does.

### Example from the issue:

% /usr/lib/systemd/systemd-sysupdate --component=containerd reboot
Newest installed version '2.3.0' is older than booted version
'20260527200656'.

This refuses the combination with a clear error instead of silently
performing a bogus comparison:

- `verb_pending_or_reboot()` rejects `--component=` for the
`pending`/`reboot` verbs (mirroring the existing `--root=`/`--image=`
rejection).
- `verb_update_impl()` rejects `--reboot` combined with `--component=`,
before any update work is done.

Correctly tracking a per-component "booted" version (which could be
"none", making a reboot always apply) is a larger feature and left for
the future, as the reporter suggested.

The daemon (`sysupdated`) and `updatectl` don't perform this
host-version comparison, so the change is confined to the
`systemd-sysupdate` CLI.

Documentation and a negative regression test (TEST-72) are included.

### AI-use Disclosure:

I took assistance from Claude Opus 4.8 to scope out the issue, help with
writing proper comments/documentation, and help with writing the PR
description.

sysupdate: address review feedback on component/reboot guard

Move the --reboot/--component= rejection into parse_argv() alongside the
other cross-option checks, and tighten TEST-72 to assert the specific
guard message rather than merely a non-zero exit.
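
The shape of such a cross-option check can be sketched like this. It is
a Python/argparse model, not the C parse_argv(): the error strings are
made up for illustration, and the verb check lives in
verb_pending_or_reboot() in the real code rather than in the parser.

```python
import argparse


def parse_argv(argv):
    """Parse options and refuse combinations that make no sense together."""
    parser = argparse.ArgumentParser(prog="sysupdate-sketch")
    parser.add_argument("--component")
    parser.add_argument("--reboot", action="store_true")
    parser.add_argument("verb", nargs="?", default="list")
    args = parser.parse_args(argv)

    if args.component is not None:
        # Component versions and the booted OS version are not comparable.
        if args.reboot:
            raise ValueError("--reboot may not be combined with --component=")
        if args.verb in ("pending", "reboot"):
            raise ValueError(f"'{args.verb}' may not be combined with --component=")
    return args
```

Rejecting the combination before any work is done means an update is
never started only to fail at the reboot decision afterwards.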

sysupdate: refuse reboot/pending logic when a component is selected

The `pending` and `reboot` verbs, as well as the `--reboot` switch, compare
the newest installed version against the booted OS version (IMAGE_VERSION= from
os-release). When a component is selected via --component=, this compares the
component's version against the unrelated host OS version, which by design live
in separate version spaces. The result is arbitrary reboot decisions: depending
on the relative version strings sysupdate would either always or never reboot.

Refuse the combination with a clear error instead of silently performing a
bogus comparison. Correctly tracking a per-component booted version is left as a
future feature.

Fixes: https://github.com/systemd/systemd/issues/42330

pam: use default auth pam_deny.so

run0 doesn't actually use the auth pam stack, since polkit does the
requisite authorization. However, if the service type is left undefined
pam falls back to the definitions of the "other" service, which, at
least in Arch Linux but possibly more, includes pam_warn.so to notify
the user about this apparent error.

This creates a bit of logspam, as systemd does actually call pam_setcred
in its generic pam code, which depends on the auth pam stack, creating a
warning message in the journal on every invocation of run0.

pam_deny.so is a no-op, which avoids falling back to the other pam
service.

udev: fix build failure due to concurrent merge

Follow-up for 60686a33c3030e8eb60cd1d10ac44bc8987f3627

hibernate: fix swap selection and prefer swap that can hold image

When multiple swap devices exist, prefer one with enough free space
to hold the hibernation image over one that cannot, regardless of
priority. If no swap can fit, fall back to priority-first selection.

This avoids deterministically failing hibernation when the
highest-priority swap is too small but a lower-priority one fits.
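
A minimal sketch of this selection rule, in Python rather than the
actual C code. The Swap type and pick_swap() are hypothetical, and
breaking ties among fitting swaps by priority is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Swap:
    device: str
    priority: int
    free: int  # free space in bytes


def pick_swap(swaps, image_size):
    """Prefer swaps that can hold the image, then the highest priority."""
    if not swaps:
        return None
    fitting = [s for s in swaps if s.free >= image_size]
    # If nothing fits, fall back to priority-first selection over all swaps.
    candidates = fitting or swaps
    return max(candidates, key=lambda s: s.priority)
```

Given a small high-priority swap and a large low-priority one, the
large one is picked whenever only it can hold the image.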

Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>
Co-developed-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vmspawn: minor fixes (#42634)

repart: add live subscription support to ListCandidateDevices() varlink call (#42350)

Let's ensure one can watch suitable devices come and go via
ListCandidateDevices().

boot: before starting the random seed logic, let's check the backing volume R/O state (#42379)

Related: #42307

hostnamed: allow setting machine tags via udev rules (#42390)

This gives hostnamed a lot of love, and makes it possible to "auto-tag"
a machine via a udev rule.

It builds on the machine tag concept added in v261.

network: hook up "machine tag" concept, also with .network/.link/.netdev files

Follow-up for: 461ec6facc4291cbdb3264d473bcab3e1d88e13a

report-basic: also report load average + swap size in metrics report

report: add support for signing reports via varlink backends + make report a varlink service (#42595)

This adds the following:

1. systemd-report gains a new --sign= option, taking a boolean. If true,
this makes systemd-report generate + systemd-report upload generate a
signed report, instead of a regular one. The signatures are collected
from Varlink-based backends.
2. One such backend is added which does a simple Ed25519 based signing
scheme.
3. This adds a new metrics source which just reports text files
symlinked into a special dir as metrics. This is used to report the
Ed25519 public key as metric, by default, if it exists.
4. Finally, systemd-report itself is turned into a varlink service. This
is useful for example for extracting a report from a system coming in
via the varlink/http bridge.

I thought a long time about the format of signing of reports. Initially
I intended to do this like homed's user record signing, i.e. require
normalization of the record, then normalize the record, write it out
in dense form, and sign the result. Finally insert the resulting hash into
the user record itself. People have pointed me to the inherent messiness
of signing JSON this way though, as it requires any participant that
wishes to sign/authenticate records this way to implement the exact same
normalization/formatting rules, and in particular in the area of
floating point numbers (of which metrics presumably will have many) this
is quite problematic.

This signing hence goes a different way. Instead of expecting
signer+verifier to independently come to the same normalized text form
of the JSON data, let's instead output a JSON-SEQ sequence, where the
first object is the report, and any subsequent objects are one signature
each. The signatures are supposed to cover the precise binary
representation of the first element in the JSON-SEQ stream (i.e. from
the RS to the NL).

Or in other words: a verifier would receive the JSON-SEQ stream, split
it up before each RS. Then it would leave object 1 unparsed for the
moment, and parse objects 2…n. It would then authenticate object 1's
precise binary representation with objects 2…n. Once that checks out, it
would parse object 1, and use it as report.
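
The verifier flow described above could look roughly like this in
Python. It is a sketch under assumptions: check_signature() is a
hypothetical callback that does the actual cryptographic check on the
raw bytes, the covered bytes are taken to include the leading RS and
the trailing NL, and accepting the report when at least one signature
checks out is a policy chosen for illustration only.

```python
import json

RS = b"\x1e"  # ASCII record separator, starts each JSON-SEQ record


def split_json_seq(stream):
    """Split before each RS, keeping every record's raw bytes intact."""
    parts = stream.split(RS)
    if parts[0] != b"":
        raise ValueError("stream does not start with RS")
    return [RS + part for part in parts[1:]]


def verify_report(stream, check_signature):
    """Authenticate record 1 with records 2..n, then parse record 1."""
    records = split_json_seq(stream)
    if len(records) < 2:
        raise ValueError("no signature records present")
    raw_report = records[0]  # left unparsed until authenticated
    signatures = [json.loads(raw[1:]) for raw in records[1:]]
    if not any(check_signature(raw_report, sig) for sig in signatures):
        raise ValueError("report could not be authenticated")
    return json.loads(raw_report[1:])
```

Because the signature covers the bytes as transmitted, neither side has
to agree on a JSON normalization, which is the point of the format.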

udev: run workers in sibling cgroup and use cgroup.kill

Since a1f4fd387603673a79a84ca4e5ce25b439b85fe6 udev processes
run in an 'udev' subcgroup, to avoid killing control processes
when clearing workers. But the main process is still in the same
cgroup, so the atomic cgroup.kill cannot be used.

If the main subcgroup exists, try to create a sibling 'workers'
cgroup, and use it for worker processes, and use cgroup.kill if
available.

This is especially useful as rules can spawn arbitrary programs/scripts
and TasksMax= is set to unlimited.
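
For illustration, the difference between the two kill strategies can be
sketched as below. This is a Python model, not udev's code: the
function name is hypothetical, and the per-process fallback is an
assumption about what happens when cgroup.kill is unavailable. Writing
"1" to cgroup.kill is the cgroup v2 interface for killing every process
in a cgroup at once.

```python
import os
import signal


def kill_cgroup(path):
    """Kill all processes in a cgroup, atomically if cgroup.kill exists."""
    kill_file = os.path.join(path, "cgroup.kill")
    if os.path.exists(kill_file):
        # One write kills everything, including processes forked meanwhile.
        with open(kill_file, "w") as f:
            f.write("1")
        return "cgroup.kill"

    # Fallback: racy, since new processes may appear while we iterate.
    with open(os.path.join(path, "cgroup.procs")) as f:
        pids = [int(line) for line in f if line.strip()]
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
    return "per-process"
```

The atomic variant only works when the main process is not a member of
the cgroup being killed, hence the separate 'workers' cgroup.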

core: add KExecsCount property

Use LUO to count kexec reboots, and expose the count through a new property.

Fixes https://github.com/systemd/systemd/issues/42581

core: add LUOSession= unit setting (#42530)

Acquiring a LUO session from /dev/liveupdate requires privileges, and
also the device is a single-owner driver so only a single process can
open it at any given time.

Add a LUOSession= service setting that allows units running without
privileges to get a session assigned to them.

The kernel imposes a 64-character limit on session names, which is too
short to avoid clashes, so derive a hash from joining the unit name with
the parameter name; that way two units using the same setting don't clash.
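
A sketch of such a derivation is shown below. The hash function
(SHA-256), the NUL separator and the function name are all assumptions
made for illustration; the commit message only states that a hash is
derived from the unit name joined with the parameter name.

```python
import hashlib

LUO_SESSION_NAME_MAX = 64  # limit mentioned in the commit message


def luo_session_name(unit, parameter):
    """Derive a fixed-length session name from unit and parameter name."""
    # The separator keeps ("ab", "c") and ("a", "bc") from colliding.
    joined = f"{unit}\0{parameter}".encode()
    return hashlib.sha256(joined).hexdigest()[:LUO_SESSION_NAME_MAX]
```

Two units that both set the same parameter value get different session
names, while the same unit and parameter always map to the same name.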

meson: switch version to 262~devel

Finalize meson.version for v261

NEWS: finalize

NEWS: update contributors list

Update hwdb autosuspend

ninja -C build update-hwdb-autosuspend

Update hwdb

ninja -C build update-hwdb

Revert "units: drop After=network-online.target from imds services"

IMDS access requires networking, hence we need to run after
network-online.target. Everything else would be racy and result in
likely timeouts, because we might try to contact the network too early.

This reverts commit 3f33f4d057506970682e9de4eff06881d678f18a.

po: Translated using Weblate (Indonesian)

Currently translated at 100.0% (285 of 285 strings)

Co-authored-by: Arif Budiman <arifpedia@gmail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/id/
Translation: systemd/main

mkosi: copy packaged test-fdstore if not built locally

On Ubuntu autopkgtest, TEST-91-LIVEUPDATE fails because test-fdstore is
not found in the dummy container.

crypto-util: support OpenSSL 4 (#42655)