Frantisek Sumsal [Mon, 13 Oct 2025 15:36:55 +0000 (17:36 +0200)]
timer: rebase the next elapse timestamp only if timer didn't already run
The test added in f4c3c107d9be4e922a080fc292ed3889c4e0f4a5 uncovered a
corner case while recalculating the next elapse timestamp of a timer unit
that uses RandomizedDelaySec= during deserialization.
If the scheduled time (without RandomizedDelaySec=) already elapsed,
systemd "rebases" the next elapse timestamp to the time when systemd
first started, to make the RandomizedDelaySec= feature work even at
boot. However, since it was done unconditionally, it always overrode the
next elapse timestamp, which could then cause the final next elapse
timestamp to fall out of the expected window.
With a couple of additional debug logs, one of the test failures looks like
this:
[ 132.129815] TEST-53-TIMER.sh[384]: + : 'Next elapse timestamp after daemon-reload, try #328'
[ 132.129815] TEST-53-TIMER.sh[384]: + systemctl daemon-reload
[ 132.136352] systemd[1]: Reload requested from client PID 16399 ('systemctl') (unit TEST-53-TIMER.service)...
[ 132.136636] systemd[1]: Reloading...
[ 132.446160] systemd[1]: Rebasing next elapse timestamp
[ 132.446168] systemd[1]: v->next_elapse: Tue 2025-10-14 00:10:00 CEST
[ 132.446170] systemd[1]: rebased: Tue 2025-10-14 00:10:56 CEST
[ 132.446172] systemd[1]: v->next_elapse after rebase: Tue 2025-10-14 00:10:56 CEST
[ 132.447361] systemd[1]: Reloading finished in 310 ms.
[ 132.484041] TEST-53-TIMER.sh[384]: + check_elapse_timestamp
[ 132.484041] TEST-53-TIMER.sh[384]: + systemctl status timer-RandomizedDelaySec-16377.timer
[ 132.533657] TEST-53-TIMER.sh[16440]: ● timer-RandomizedDelaySec-16377.timer
[ 132.533657] TEST-53-TIMER.sh[16440]: Loaded: loaded (/run/systemd/system/timer-RandomizedDelaySec-16377.timer; static)
[ 132.533657] TEST-53-TIMER.sh[16440]: Active: active (waiting) since Mon 2025-10-13 23:00:00 CEST; 1h 13min ago
[ 132.533657] TEST-53-TIMER.sh[16440]: Invocation: 5555d4f060114a5493ff228013830d17
[ 132.533657] TEST-53-TIMER.sh[16440]: Trigger: Tue 2025-10-14 22:10:04 CEST; 21h left
[ 132.533657] TEST-53-TIMER.sh[16440]: Triggers: ● timer-RandomizedDelaySec-16377.service
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:07 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Changed dead -> waiting
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:07 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Adding 15h 35min 1.230173s random time.
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:07 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Realtime timer elapses at Tue 2025-10-14 15:45:58 CEST.
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:07 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Changed dead -> waiting
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:08 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Adding 16h 29min 44.084409s random time.
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:08 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Realtime timer elapses at Tue 2025-10-14 16:40:41 CEST.
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:08 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Changed dead -> waiting
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:08 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Adding 21h 59min 7.955828s random time.
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:08 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Realtime timer elapses at Tue 2025-10-14 22:10:04 CEST.
[ 132.533657] TEST-53-TIMER.sh[16440]: Oct 14 00:13:08 H systemd[1]: timer-RandomizedDelaySec-16377.timer: Changed dead -> waiting
[ 132.535386] TEST-53-TIMER.sh[384]: + systemctl show -p InactiveExitTimestamp timer-RandomizedDelaySec-16377.timer
[ 132.537727] TEST-53-TIMER.sh[16442]: InactiveExitTimestamp=Mon 2025-10-13 23:00:00 CEST
[ 132.540317] TEST-53-TIMER.sh[16444]: ++ systemctl show -P NextElapseUSecRealtime timer-RandomizedDelaySec-16377.timer
[ 132.547745] TEST-53-TIMER.sh[384]: + NEXT_ELAPSE_REALTIME='Tue 2025-10-14 22:10:04 CEST'
[ 132.548020] TEST-53-TIMER.sh[16445]: ++ date '--date=Tue 2025-10-14 22:10:04 CEST' +%s
[ 132.550218] TEST-53-TIMER.sh[384]: + NEXT_ELAPSE_REALTIME_S=1760472604
[ 132.550218] TEST-53-TIMER.sh[384]: + : 'Next elapse timestamp should be Tue 2025-10-14 00:10:00 CEST <= Tue 2025-10-14 22:10:04 CEST <= Tue 2025-10-14 22:10:00 CEST'
[ 132.550218] TEST-53-TIMER.sh[384]: + assert_ge 1760472604 1760393400
[ 132.550555] TEST-53-TIMER.sh[16446]: + set +ex
[ 132.550702] TEST-53-TIMER.sh[384]: + assert_le 1760472604 1760472600
[ 132.550832] TEST-53-TIMER.sh[16447]: + set +ex
[ 132.551091] TEST-53-TIMER.sh[16447]: FAIL: '1760472604' > '1760472600'
Here the original next elapse timestamp was Tue 2025-10-14 00:10:00 CEST
as expected, but it was overridden by the rebased timestamp:
Tue 2025-10-14 00:10:56 CEST. And when a new randomized delay was added
to it (21h 59min 7.955828s) the final next elapse timestamp fell out of
the expected window, i.e. Tue 2025-10-14 00:10:00 (scheduled time) <
Tue 2025-10-14 22:10:04 CEST (rebased elapse timestamp + randomized
delay) < Tue 2025-10-14 22:10:00 CEST (scheduled time + maximum from
RandomizedDelaySec=, i.e. 22h).
Limiting the timestamp rebase to the case where the unit hasn't
already run should prevent this from happening during daemon-reload.
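The arithmetic behind the failure can be replayed with a small model. This is only a sketch, not systemd's timer code: the `next_elapse()` helper, its parameters, and the "rebase only if later" rule are assumptions for illustration, while the timestamps and the delay are taken from the log above.

```python
# Illustrative model only; not systemd's actual timer implementation.
HOUR = 3600.0


def next_elapse(scheduled, rebase_to, random_delay, already_ran):
    """Return the final elapse time in seconds since the epoch."""
    base = scheduled
    # Assumption: rebase only for timers that have not run yet (boot case).
    if not already_ran and rebase_to > base:
        base = rebase_to
    return base + random_delay


scheduled = 1760393400.0                 # Tue 2025-10-14 00:10:00 CEST
rebased = scheduled + 56                 # Tue 2025-10-14 00:10:56 CEST
delay = 21 * HOUR + 59 * 60 + 7.955828   # random delay from the log
window_end = scheduled + 22 * HOUR       # scheduled time + 22h maximum

old = rebased + delay                    # unconditional rebase (old behaviour)
new = next_elapse(scheduled, rebased, delay, already_ran=True)
print(old > window_end, new <= window_end)  # prints: True True
```

The 56 seconds added by the rebase are enough to push a delay close to the 22h maximum past the end of the window, which matches the 4 second overshoot in the failed assertion.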
Frantisek Sumsal [Mon, 13 Oct 2025 15:35:02 +0000 (17:35 +0200)]
test: format the min/max timestamps in "systemd" style
Before:
Next elapse timestamp should be Sun Oct 12 00:10:00 UTC 2025 <= Sun 2025-10-12 05:43:04 UTC <= Sun Oct 12 22:10:00 UTC
After:
Next elapse timestamp should be Tue 2025-10-14 00:10:00 CEST <= Tue 2025-10-14 19:39:11 CEST <= Tue 2025-10-14 22:10:00 CEST
(cherry picked from commit 62ca845ac776d5877fe46dab52692053df6c8efa)
core: allow split /usr/local/s?bin with merged /usr/s?bin
Previously, we used either the fully split path or the fully merged path,
treating "split sbin" as a boolean condition. The idea was that conversion
to merged bin would be a single event, so we don't need to care about the
details of the transition. But it turns out that some systems may be converted
in disparate steps. In https://bugzilla.redhat.com/show_bug.cgi?id=2400220,
there was a lengthy discussion about a coreos system where
/usr/local/{bin,sbin} were created as separate directories. Since /usr/local is
not part of the packaged system, it might remain split for a longer time. So
check /usr/local/s?bin separately and stop adding /usr/sbin to $PATH if only
/usr/local/s?bin is split. (I don't think it makes sense to handle the reverse
case, i.e. only /usr/s?bin being split, since that should be much rarer.)
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=2400220.
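A rough sketch of the resulting decision, assuming two independent booleans. The `default_path()` helper and the directory order are hypothetical and only illustrate the rule described above; they are not taken from the systemd sources.

```python
def default_path(usr_split, usr_local_split):
    """Build a default $PATH from two independent "split sbin" checks."""
    dirs = []
    # /usr/local/sbin is listed when either hierarchy is still split.
    if usr_split or usr_local_split:
        dirs.append("/usr/local/sbin")
    dirs.append("/usr/local/bin")
    # /usr/sbin is only listed when /usr itself is split.
    if usr_split:
        dirs.append("/usr/sbin")
    dirs.append("/usr/bin")
    return ":".join(dirs)


print(default_path(usr_split=False, usr_local_split=True))
# prints: /usr/local/sbin:/usr/local/bin:/usr/bin
```

The reverse case (only /usr split) simply falls back to the fully split path in this sketch, in line with the note that it is not handled separately.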
* 8e2833a5b6 Automatically figure out the name of the top-level tar dir
* dffbf2beba Make sure fallback source is listed first
* 1d3b892105 Enable sysupdate and sysupdated
docs: add comment about requiring the mount hierarchy to be mounted MS_SHARED
This has been tripping up container manager people. Let's document this
explicitly.
(Note that the container interface could really use some updates, i.e.
it was written before cgroup namespacing was a thing. But I
am too lazy to fix that now, so let's just add this one facet.)
doc: indicate Type=oneshot also detects invocation failures
Type `simple` explicitly mentions that invocation failures like a missing binary
or `User=` name won’t get detected – whereas type `exec` mentions that it does.
Type `oneshot` refers to being similar to `simple`, which could lead one to
assume it doesn’t detect such invocation failures either – it seems however it
does.
Indicate this by changing its wording to be similar to `exec`.
Frantisek Sumsal [Thu, 23 Oct 2025 13:30:52 +0000 (15:30 +0200)]
man: handle leading/trailing/repeating whitespaces in anchor links
So even if a <term> section contains newlines, we get a reasonable
anchor link to it.
Before:
<dt id="
bind
UNIT
PATH
[PATH]
"><span class="term">
...
<a class="headerlink" title="Permalink to this term" href="#%0A%20%20%20%20%20%20%20%20%20%20%20%20bind%0A%20%20%20%20%20%20%20%20%20%20%20%20UNIT%0A%20%20%20%20%20%20%20%20%20%20%20%20PATH%0A%20%20%20%20%20%20%20%20%20%20%20%20[PATH]%0A%20%20%20%20%20%20%20%20%20%20">¶</a>
After:
<dt id="bind UNIT PATH [PATH]"><span class="term">
...
<a class="headerlink" title="Permalink to this term" href="#bind%20UNIT%20PATH%20[PATH]">¶</a>
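The same normalization can be shown outside of the stylesheet. The following is a Python analogue only, assuming the fix amounts to stripping leading/trailing whitespace and collapsing runs into a single space; the helper names are hypothetical and the actual change lives in the man page tooling.

```python
from urllib.parse import quote


def anchor_id(term_text):
    """Strip leading/trailing whitespace and collapse runs to one space."""
    return " ".join(term_text.split())


def anchor_href(term_text):
    """Percent-encode the normalized id, keeping the brackets literal."""
    return "#" + quote(anchor_id(term_text), safe="[]")


raw = "\n            bind\n            UNIT\n            PATH\n            [PATH]\n          "
print(anchor_id(raw))    # prints: bind UNIT PATH [PATH]
print(anchor_href(raw))  # prints: #bind%20UNIT%20PATH%20[PATH]
```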
Ronan Pigott [Sun, 26 Oct 2025 04:04:03 +0000 (21:04 -0700)]
zsh: add completion for dbus bus address
The DBUS_SESSION_BUS_ADDRESS and DBUS_SYSTEM_BUS_ADDRESS parameters have
an interesting syntax that's useful to complete. Let's include a
completion definition for these parameters.
Ryan Brue [Mon, 18 Aug 2025 17:12:26 +0000 (12:12 -0500)]
man: Clarify usage of /usr/share/factory/ in programs
As discussed in this thread:
https://github.com/redhat-performance/tuned/issues/798#issuecomment-3197697654
/usr/share/factory/ is not intended to be read from by programs,
but the wording in the FHS can be misread to think that programs
should be using /usr/share/factory/ as the vendor supplied configuration
directory rather than something like /usr/lib/foo/ or /usr/share/foo/.
This commit points developers to the UAPI configuration spec for how to
make their programs hermetic /usr/ compatible.
Marien Zwart [Sun, 19 Oct 2025 13:41:08 +0000 (00:41 +1100)]
docs: fix conversion / calculation errors
0x1770 is 6000, not 60000. It looks like 60000 is intended (the next
range starts at 60000 in both decimal and hex), so use that.
1000 to 60000 is 59001 users, as the range is inclusive on both sides.
Similar off-by-one for one of the "unused" ranges. After these changes,
the sizes of the ranges up to and including the "-1" ID sum up to 65536,
as expected.
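The conversions above are easy to double-check. A minimal sketch, where `range_size()` is a hypothetical helper for an inclusive range:

```python
def range_size(first, last):
    """Number of IDs in a range that is inclusive on both sides."""
    return last - first + 1


print(0x1770)                    # prints: 6000
print(hex(60000))                # prints: 0xea60
print(range_size(1000, 60000))   # prints: 59001
```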
I'm not sure where the size of the unused range after the container UID
range came from, but it is not correct (the "Container UID" and this
reserved range combined would be larger than the "HIC SVNT LEONES" 2^31
to 2^32-2 range...). Fix it.
It is unfortunate that the first half of this table makes more sense in
decimal while the second half makes more sense in hex (which would also
make the size in 65536 chunks easy to obtain): I'm tempted to add a
"sizes in hex" column...
man/systemd-system.conf: describe DefaultEnvironment= and ManagerEnvironment= better
The description of ME= said "see above", but it was actually above the other
one. So change the order. But while reading this, I found it very hard to
understand. So reword things, hopefully in a way that is easier to understand.
The current behaviour is rather complex and unintuitive, but this description
just tries to describe it truthfully.
Luca Boccassi [Fri, 17 Oct 2025 13:00:23 +0000 (14:00 +0100)]
ci: re-enable bpf-framework option for build and unit test jobs
Use the same trickery we do in the package build and search for
the actual bpftool binary. For the CI job any one we find is
good enough.
When we switch all jobs to 26.04 we can drop all of this.
core/unit: fail earlier before spawning executor when we failed to realize cgroup
Before 23ac08115af83e3a0a937fa207fc52511aba2ffa, even if we failed to
create the cgroup for a unit, a cgroup runtime object for the cgroup was
still created with the cgroup path. Hence, when the creation of the cgroup
failed, execution of the unit would fail in posix_spawn_wrapper() and log
something like the following:
```
systemd[1]: testservice.service: Failed to create cgroup /testslice.slice/testservice.service: Cannot allocate memory
systemd[1]: testservice.service: Failed to spawn executor: No such file or directory
systemd[1]: testservice.service: Failed to spawn 'start' task: No such file or directory
systemd[1]: testservice.service: Failed with result 'resources'.
systemd[1]: Failed to start testservice.service.
```
However, after the commit, when we failed to create the cgroup, a cgroup
runtime object is not created, hence NULL will be assigned to
ExecParameters.cgroup_path in unit_set_exec_params().
Hence, the unit process will be invoked in the init.scope.
```
systemd[1]: testservice.service: Failed to create cgroup /testslice.slice/testservice.service: Cannot allocate memory
systemd[1]: Starting testservice.service...
cat[1094]: 0::/init.scope
systemd[1]: testservice.service: Deactivated successfully.
systemd[1]: Finished testservice.service.
```
where the test service calls 'cat /proc/self/cgroup'.
To fix the issue, let's fail earlier when we fail to create the cgroup.
Daan De Meyer [Mon, 13 Oct 2025 08:43:16 +0000 (10:43 +0200)]
sd-id128: Drop _sd_const_ from sd_id128_in_setv()
Both the const and pure attributes disallow modifying input arguments
but sd_id128_in_setv() clearly modifies its ap input argument by iterating
over it with va_arg() so drop the _sd_const_ attribute from
sd_id128_in_setv().
test: check the next elapse timer timestamp after deserialization
When deserializing a serialized timer unit with RandomizedDelaySec= set,
systemd should use the last inactive exit timestamp instead of current
realtime to calculate the new next elapse, so the timer unit actually
runs in the given calendar window.
Luca Boccassi [Tue, 26 Aug 2025 18:12:53 +0000 (19:12 +0100)]
sysext: do not attempt to unlock images interactively
These images are not using a passphrase, they are using keys
or at most TPM-based sealing (not yet implemented, for context).
Do not use the interactive helper, as it will block and ask the
user for a password if it fails to find the signing cert, which
is not useful for this tool.
Since mountfsd was added in 702a52f4b5d49cce11e2adbc740deb3b644e2de0 the
caps bounding set line has been commented out. That's an accident. Fix that.
(We need to add a bunch of caps to the list.)
This fixes two things: first of all it ensures we take the override
status output field properly into account, instead of going directly to
the regular one.
Moreover, it ensures that we bypass auto for both notice and emergency,
since both have the same "impact", rather than limiting this to notice
only.
The UID entry in the machine state file was introduced in v258.
Hence, when a host is upgraded to v258, the field does not exist in the
file and the variable 'uid' is NULL.
core/bpf-firewall: replace unnecessary unit_setup_cgroup_runtime() with unit_get_cgroup_runtime()
Except for the test, bpf_firewall_compile() is only called by the following:
cgroup_context_apply() -> cgroup_apply_firewall() -> bpf_firewall_compile()
and in the early stage of cgroup_context_apply(), it checks if the cgroup
runtime exists. Hence, it is not necessary to try to allocate the
runtime in bpf_firewall_compile().
Mike Yuan [Thu, 25 Sep 2025 20:28:33 +0000 (22:28 +0200)]
core/cgroup: make sure deserialized accounting data is not voided
Currently, cgroup_path is (de-)serialized after all the cached
accounting data. This is bogus though, since unit_set_cgroup_path()
destroys the CGroupRuntime object and starts afresh, discarding
all deserialized values. This matters especially for IP accounting,
whose BPF maps get recreated on reload/reexec and the previous values
are exclusively retrievable from deserialization. Let's hence swap things
around and serialize cgroup_path first, accounting data only afterwards.
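Why the order matters can be shown with a toy model. This is a sketch under stated assumptions: the `Unit` class, the `cgroup` key and the accounting key name are hypothetical stand-ins, and only the "setting the path resets the runtime" behaviour is taken from the description above.

```python
class Unit:
    """Toy stand-in for a unit with a cgroup runtime object."""

    def __init__(self):
        self.cgroup_path = None
        self.accounting = {}

    def set_cgroup_path(self, path):
        # Models the reset described above: a new path starts a fresh
        # runtime and discards whatever was cached before.
        self.cgroup_path = path
        self.accounting = {}


def deserialize(lines):
    """Apply key=value lines in the order in which they were serialized."""
    unit = Unit()
    for line in lines:
        key, value = line.split("=", 1)
        if key == "cgroup":
            unit.set_cgroup_path(value)
        else:
            unit.accounting[key] = int(value)
    return unit


old_order = ["ip-accounting-ingress-bytes=4096", "cgroup=/test.slice/test.service"]
new_order = list(reversed(old_order))

print(deserialize(old_order).accounting)  # prints: {}
print(deserialize(new_order).accounting)  # prints: {'ip-accounting-ingress-bytes': 4096}
```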
Felix Pehla [Sat, 27 Sep 2025 13:01:06 +0000 (15:01 +0200)]
shared/bootspec: parse 'uki' boot entry option
Commit e2a3d562189c413de3262ec47cdc1e1b0b13d78b (as part of #36314)
makes sd-boot recognize a 'uki' stanza in a boot loader entry and
uapi-group/specifications@3f2bd8236d7f9ce6dedf8bda9cadffd0d363cb08 adds
it to the BLS, but bootctl and other components parsing said config do
not know about it, leading to the error message
`Unknown line 'uki', ignoring.` when attempting to parse the same entry.
This commit makes it get parsed the same way that 'efi' is.
In commit c960ca2be1cfd183675df581f049a0c022c1c802, the logic for
updating ACLs on device nodes was moved from logind to udevd, but the
logic for static nodes was mistakenly removed at that time.
man: fix advice regarding thread safety of libsystemd
The prohibition to move libsystemd objects between threads was added in
64a7ef8bc06b5dcfcd9f99ea10a43bde75c4370f ('man: be more explicit about thread
safety of sd_journal'). At the time, this was valid, because we were using the
mempool for allocation and it apparently didn't handle access from different
threads. Sadly, the bugzilla entry referenced in the commit is not publicly
visible anymore, so the details are murky. But we stopped using the mempool in
a5d8835c78112206bbf0812dd4cb471f803bfe88 ('mempool: only enable mempool use
when linked to libsystemd-shared.so'), with a subsequent followup in
b01f31954f1c7c4601925173ae2638b572224e9a ('Turn mempool_enabled() into a weak
symbol'). The restriction added in the man page has not been necessary since
then.
The text in the man page was arguably incorrect in calling the code
"thread-agnostic". If the code does not support being touched from threads at
all and has global state tied to the main thread, it is not "agnostic", but
just doesn't support threads.
(I'm looking into https://github.com/systemd/python-systemd/issues/143, and
with the current scheme, the python-systemd module and all python code using
libsystemd would be very hard to use. With the change to free-threaded python
in python3.13, i.e. the replacement of single Global Interpreter Lock by
locking on individual objects, this limitation would become even more
constraining.)
Anton Tiurin [Mon, 15 Sep 2025 19:32:39 +0000 (12:32 -0700)]
networkd: fix RequiredOperationalStateForOnline serialization
In integration tests (for example TEST-85-NETWORK-NetworkctlTests)
LINK_OPERSTATE_RANGE_INVALID and required_for_online == -1 are serialized as
```
"RequiredForOnline": "true",
"RequiredOperationalStateForOnline": [null, null]
```
Such a link should be reported as required_for_online=False, and nulls
should not be serialized.
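One reading of the intended output, as a sketch only: the `link_online_fields()` helper is hypothetical, and dropping the range field entirely when the range is invalid is an assumption about how "not serialize nulls" is meant.

```python
import json


def link_online_fields(required_for_online, operstate_range):
    """operstate_range is a (min, max) pair of names, or None if invalid."""
    if operstate_range is None:
        # Invalid range: report the link as not required, emit no nulls.
        return {"RequiredForOnline": False}
    return {
        # A negative value means "unset" and must not count as true.
        "RequiredForOnline": required_for_online > 0,
        "RequiredOperationalStateForOnline": list(operstate_range),
    }


print(json.dumps(link_online_fields(-1, None)))
# prints: {"RequiredForOnline": false}
```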
machined: do not allow unprivileged users to shell into the root namespace
We intend to make self-registering machines an unprivileged operation,
but currently that would allow an unprivileged user to register a
process they own in the root namespace, and then login as any
user they like, including root, which is not ideal.
Forbid non-root from shelling into a machine that is running in
the root user namespace.
Let's never bother with old TPM 1.x structures, they are not mentioned
in the TCG for TPM2 spec at all. However, the spec does say we should
check the Size field of the relevant structs, before accessing them,
hence do that.
On older versions, if the flag is anything other than AT_SYMLINK_NOFOLLOW,
it returns EINVAL, so we can detect it and call the kernel syscall directly
ourselves.
Using the glibc wrappers when possible is preferable so that programs
like fakeroot can intercept their calls and redirect them.
In systemd <= 257, each set_audit tristate value had special meaning,
- true: enable the kernel audit subsystem,
- false: disable the kernel audit subsystem,
- negative: keep the current kernel audit subsystem state.
And the default is true, rather than negative. So, users sometimes
explicitly pass an empty string to Audit= setting to keep the state.
But since f48cf2a96dfdc23fe30ba0f870125fe55cab64c7 (v258), the negative
value is mistakenly used as 'really unspecified' even if an empty string
is explicitly specified.
This makes negative values be handled as unspecified, as usual, and assigns a
new positive value AUDIT_KEEP for when an empty string is explicitly specified.
Also, make the Audit= setting accept "keep" setting, and suggest to use "keep"
rather than an empty string.
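The resulting state space can be sketched as follows. Only AUDIT_KEEP and the meaning of the values come from the text above; the enum, its numeric values, the `parse_audit()` helper and the subset of accepted boolean spellings are assumptions for illustration.

```python
from enum import Enum


class Audit(Enum):
    UNSPECIFIED = -1   # setting absent from the configuration
    DISABLE = 0
    ENABLE = 1
    KEEP = 2           # modelled after AUDIT_KEEP


def parse_audit(value):
    """value is None when Audit= does not appear in the configuration."""
    if value is None:
        return Audit.UNSPECIFIED
    text = value.strip().lower()
    if text in ("", "keep"):
        return Audit.KEEP
    if text in ("1", "yes", "true", "on"):
        return Audit.ENABLE
    if text in ("0", "no", "false", "off"):
        return Audit.DISABLE
    raise ValueError(f"invalid Audit= value: {value!r}")


def effective(setting):
    """Apply the default (enable) only when nothing was specified."""
    return Audit.ENABLE if setting is Audit.UNSPECIFIED else setting


print(effective(parse_audit(None)).name)  # prints: ENABLE
print(effective(parse_audit("")).name)    # prints: KEEP
```

Keeping "explicitly empty" and "absent" as distinct values is what lets the default apply to the latter without overriding the former.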
So, arch-chroot currently uses a rather cursed setup:
it sets up a PID namespace, but mounts /proc/ from the outside
into the chroot tree, and then calls chroot(2), essentially
making it something between chroot(8) and a full-blown
container. Hence, the PID dirs in /proc/ reveal the outer world.
The offending commit switched chroot detection to compare
/proc/1/root and /proc/OUR_PID/root, exhibiting the faulty behavior
where the mentioned environment now gets deemed to be non-chroot.
Now, this is very much an issue in arch-chroot. However,
if /proc/ is to be properly associated with the pidns,
then we'd treat it as a container and no longer a chroot.
Also, the previous logic feels more readable and more
honestly reported errors in proc_mounted(). Hence I opted
for reverting the change here. Still note that the culprit
(once again :/) lies in the arch-chroot's pidns impl, not
systemd.
It's not well-formed to begin with. And util-linux's mount(8)
is pretty much ubiquitously employed, hence it will be rejected
elsewhere too. Just stop pretending it is valid just because
glibc parser is sloppy.
core: if we cannot decode a TPM credential skip over it for ImportCredential=
Let's skip over credentials we cannot decode when they are found with
ImportCredential=. When installing an OS on some disk and using that
disk on a different machine than assumed we'll otherwise end up with a
broken boot, because the credentials cannot be decoded when starting
systemd-firstboot. Let's handle this somewhat gracefully.
This leaves handling for LoadCredential=/SetCredential= as it is (i.e.
failure to decrypt results in service failure), because it is a lot more
explicit and focussed, as opposed to ImportCredential=, which looks
everywhere, uses globs and so on, and is hence very vague and unfocussed.
basic/efivars: read EFI variables using one read(), not two
In https://github.com/systemd/systemd/issues/38842 it is reported that we're again
having trouble accessing EFI variables:
[ 292.212415] H (udev-worker)[253]: Reading EFI variable /sys/firmware/efi/efivars/LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f.
...
[ 344.397961] H (udev-worker)[253]: Detected slow EFI variable read access on LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f: 52.185510s
We don't know what causes the slowdown, but it seems reasonable to avoid
unnecessary read() calls. We would read the 4-byte attr first, and then the
actual value later. But our code always reads the value (and discards the attr
in all cases except one, when _writing_ the variable), so let's optimize for
the case where we read the value and read the whole contents in one readv().
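The layout being read is a 4-byte attribute word followed by the variable's value. A minimal Python sketch of the single-read idea, assuming little-endian attributes as exposed by efivarfs; the helper names and the 64 KiB buffer size are hypothetical, and the actual change is in C.

```python
import os
import struct


def split_efi_variable(raw):
    """Split raw efivarfs contents into (attributes, value)."""
    if len(raw) < 4:
        raise ValueError("EFI variable shorter than its attribute header")
    (attr,) = struct.unpack_from("<I", raw, 0)
    return attr, raw[4:]


def read_efi_variable(path, bufsize=64 * 1024):
    """Fetch attributes and value with a single read() call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.read(fd, bufsize)  # assumption: the variable fits in bufsize
    finally:
        os.close(fd)
    return split_efi_variable(raw)


attr, value = split_efi_variable(struct.pack("<I", 7) + b"abc")
print(attr, value)  # prints: 7 b'abc'
```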
Tobias Heider [Mon, 25 Aug 2025 14:07:54 +0000 (16:07 +0200)]
stub: fix file path handling for loaded kernel
- Actually pass the new memory file path to parent_loaded_image->FilePath
- Restore old parent_loaded_image if Linux returns
- Pass the same kernel_file_path in load_via_boot_services path
- s/Re-use/Patch in comment explaining what we are doing
Luca Boccassi [Sun, 24 Aug 2025 19:51:23 +0000 (20:51 +0100)]
repart: do not fail when CopyBlocks= is used in the initrd
When running in the initrd --root= is automatically set to /sysroot or /sysusr
but then using CopyBlocks fails due to a security measure:
root@particle-caba-1e47:~# systemd-repart --dry-run=no /dev/vda
No machine ID set, using randomized partition UUIDs.
Automatic discovery of backing block devices not permitted in --root= mode, refusing.