Daan De Meyer [Tue, 28 Oct 2025 21:54:14 +0000 (22:54 +0100)]
mount-util: Iterate mountinfo backwards when unmounting
Submounts will always be located further in the mountinfo file, so
when we're unmounting, iterating backwards is likely to be more
efficient than iterating forwards. It'll also reduce the amount of
EBUSY debug logging we'll get since we'll stop trying to unmount
parent mounts with submounts which will always fail with EBUSY.
If you're using `udevadm monitor` from a script, without a tty, then
libc defaults to being fully-buffered, and won't flush stdout after
newlines. This is fine for tools that dump a bunch of data and then
exit immediately. It's a problem for tools like `udevadm monitor` which
have long pauses: the buffered data can get stuck in the buffer for an
unbounded amount of time.
In the Cockpit project we've been working around this for some time with
`stdbuf` which is a `LD_PRELOAD` hack to change the libc buffering
behaviour, but we'd like to stop doing that.
Let's make sure we flush the buffer after each event.
Yu Watanabe [Tue, 28 Oct 2025 04:20:58 +0000 (13:20 +0900)]
TEST-07-PID1: wait for systemd-resolved being stopped
As 'systemctl stop' is called with --no-block, previously systemd-resolved
might not be stopped when 'resolvectl' is called, and the DBus connection
might be closed during the call:
```
TEST-07-PID1.sh[5643]: + systemctl stop --no-block systemd-resolved.service
TEST-07-PID1.sh[5643]: + resolvectl
TEST-07-PID1.sh[5732]: Failed to get global data: Remote peer disconnected
```
The reverts are not strictly necessary here (as already pointed out in
https://github.com/systemd/systemd/pull/39154#issuecomment-3360118164)
but they were helpful in checking if the fix works as expected. I can
drop them if needed.
Ronan Pigott [Sun, 26 Oct 2025 04:04:03 +0000 (21:04 -0700)]
zsh: add completion for dbus bus address
The DBUS_SESSION_BUS_ADDRESS and DBUS_SYSTEM_BUS_ADDRESS parameters have
an interesting syntax thats useful to complete. Let's include a
completion definition for these parameters.
Mike Yuan [Fri, 24 Oct 2025 21:09:50 +0000 (23:09 +0200)]
logind: support deserializing session leader through pidfdid
People make weird assumptions around state preservation and
expect logind to be stoppable. While this is realistically
not OK we can probably improve things a little.
This complements f01d8658a3a57d05a5156aefd32d8137c3ee3996 and
adds support for deserializing the LEADER_PIDFDID= field.
We still prioritize pidfd if got one from fdstore (as with
service_notify_message_parse_new_pid() in pid1), but otherwise
this should make logind restart more robust when fdstore
gets spuriously cleared.
core/exec-invoke: relax restriction for process name length
Previously, we limit the length of process name by 8.
This relax the restriction then at least process comm or
program_invocation_name contains the untrucated process name.
Yu Watanabe [Sat, 25 Oct 2025 06:34:44 +0000 (15:34 +0900)]
test: extend start limit interval
As the modified service requires about ~10 seconds for stopping, the
service never hit the start limit even if we tried to restart the
service more than 5 times.
This also checks that the service is actually triggered by dbus method
call.
Daniel Hast [Fri, 24 Oct 2025 22:47:59 +0000 (18:47 -0400)]
tree-wide: add basic validation of --background argument
Check whether the argument of the `--background` option of
`systemd-run`, `run0`, `systemd-nspawn`, `systemd-vmspawn`, and
`systemd-pty-forward` is either empty or looks like an ANSI color code,
and reject invalid values when parsing arguments.
We consider a string to look like an ANSI color code if it consists of
one or more sequences of ASCII digits separated by semicolons. This
permits every valid ANSI color code, and should reject anything that
results in garbled output.
discover-image: imply that hidden images are read-only
Marking a whole directory tree OS image as read-only is difficult
privilege-wise, because so far we rely on the FS_IMMUTABLE_FL which is
not accessible to unpriv clients.
One fundamental place where we currently rely on marking images
read-only is for keeping pristine copies of the originally downloaded
image around, which we place in "hidden" image directories. This is
probably the most relevant usecase for the read-only flag. And moreover,
the only usecase for the hidden images are these read-only pristine
copies.
Hence, let's make this work reasonably in the unpriv case, and simply
imply the read-only flag for hidden images. This is strictly speaking a
change in behaviour, but effectively it shouldn't be, because for nspawn
containers that are executed we insist on names that are hostname
compatible, and hidden names aren't (because they start with a dot).
rm-rf: make sure we can safely remove dirs we have no access to via rm_rf_at()
Previously, we'd first empty a dir, and then remove it. This works fine
as long as we have access to a dir. But in some cases (like for example
a foreign owned container tree) we might not have access to the dir, but
are still able to remove it (because it is empty, and in a dir we own).
Hence let's try that first. If it works, we do not need to enter the dir
(and thus fail).
Michal Sekletar [Fri, 24 Oct 2025 10:55:20 +0000 (12:55 +0200)]
coredump: handle ENOBUFS and EMSGSIZE the same way
Depending on the runtime configuration, e.g. sysctls
net.core.wmem_default= and net.core.rmem_default and on the actual
message size, sendmsg() can fail also with ENOBUFS. E.g. alloc_skb()
failure caused by net.core.[rw]mem_default=64MiB and huge fdinfo list
from process that has 90k opened FDs.
We should handle this case in the same way as EMSGSIZE and drop part of
the message.
Yu Watanabe [Mon, 20 Oct 2025 10:40:28 +0000 (19:40 +0900)]
core: increment start limit counter only when we can start the unit
Otherwise, e.g. requesting to start a unit that is under stopping may
enter the failed state.
This makes
- rename .can_start() -> .test_startable(), and make it allow to return
boolean and refuse to start units when it returns false,
- refuse earlier to start units that are in the deactivating state, so
several redundant conditions in .start() can be dropped,
- move checks for unit states mapped to UNIT_ACTIVATING from .start() to
.test_startable().
Frantisek Sumsal [Thu, 23 Oct 2025 13:30:52 +0000 (15:30 +0200)]
man: handle leading/trailing/repeating whitespaces in anchor links
So even if a <term> section contains newlines, we get a reasonable
anchor link to it.
Before:
<dt id="
bind
UNIT
PATH
[PATH]
"><span class="term">
...
<a class="headerlink" title="Permalink to this term" href="#%0A%20%20%20%20%20%20%20%20%20%20%20%20bind%0A%20%20%20%20%20%20%20%20%20%20%20%20UNIT%0A%20%20%20%20%20%20%20%20%20%20%20%20PATH%0A%20%20%20%20%20%20%20%20%20%20%20%20[PATH]%0A%20%20%20%20%20%20%20%20%20%20">¶</a>
After:
<dt id="bind UNIT PATH [PATH]"><span class="term">
...
<a class="headerlink" title="Permalink to this term" href="#bind%20UNIT%20PATH%20[PATH]">¶</a>
Frantisek Sumsal [Thu, 23 Oct 2025 08:28:07 +0000 (10:28 +0200)]
test: properly wait for the forked process
The process forked off by `systemd-notify --fork` is not a child of the
current shell, so using `wait` doesn't work. This then later causes a
race, when the test occasionally fails because it attempts to start a
new systemd-socket-activate instance before the old one is completely
gone:
[ 1488.947744] TEST-74-AUX-UTILS.sh[1938]: Child 1947 died with code 0
[ 1488.947952] TEST-74-AUX-UTILS.sh[1933]: + assert_eq hello hello
[ 1488.949716] TEST-74-AUX-UTILS.sh[1948]: + set +ex
[ 1488.950112] TEST-74-AUX-UTILS.sh[1950]: ++ cat /proc/1938/comm
[ 1488.945555] systemd[1]: Started systemd-networkd.service - Network Management.
[ 1488.950365] TEST-74-AUX-UTILS.sh[1933]: + assert_in systemd-socket systemd-socket-
[ 1488.950563] TEST-74-AUX-UTILS.sh[1951]: + set +ex
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: + kill 1938
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: + wait 1938
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: .//usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.socket-activate.sh: line 14: wait: pid 1938 is not a child of this shell
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: + :
[ 1488.951486] TEST-74-AUX-UTILS.sh[1952]: ++ systemd-notify --fork -- systemd-socket-activate -l 1234 --now socat ACCEPT-FD:3 PIPE
[ 1488.952222] TEST-74-AUX-UTILS.sh[1953]: Failed to listen on [::]:1234: Address already in use
[ 1488.952222] TEST-74-AUX-UTILS.sh[1953]: Failed to open '1234': Address already in use
[ 1488.956831] TEST-74-AUX-UTILS.sh[1933]: + PID=1953
[ 1488.957078] TEST-74-AUX-UTILS.sh[102]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.socket-activate.sh failed'
[ 1488.957078] TEST-74-AUX-UTILS.sh[102]: Subtest /usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.socket-activate.sh failed
Yu Watanabe [Thu, 23 Oct 2025 00:35:03 +0000 (09:35 +0900)]
rereadpt: always update kernel partition tables from userspace in an incremental fashion (#39390)
Let's address #38672 comprehensively: let's avoid BLKRRPART as much as
we can, and always do careful userspace controlled, incremental updates
to the kernel partition tables.
This simply iterates through blkid's partition parsing, and turns it
into a BLKPG ioctls, adding, updating, removing partitions as necessary,
suppressing unnecessary changes. This has the major benefit that the
call becomes truly idempotent: if nothing changed then nothing is
removed/readed, like BLKRRPART is doing it.
This then ports over all code currently doing partition refreshing,
specifcially: udev, repart, and homed.
tree-wide: open block device locks in writable mode
udev's block device locking protocol has one pitfall not even the
example in the documentation got right so far (even though this is
explained in all detail above): udev's rescanning is only triggered when
an fd that is opened for writing is closed. This means that if a
separate locking fd is opened on a block device – one that is maintained
independently of the fd actually used for writing – it must be opened for
writing too, so that closing the lock definitely triggers a rescan. This
matters in cases where the lock fd is kept for longer than the fd used
for writing to disk. (Because otherwise udev might get the
IN_CLOSE_WRITE event, but when it tries to rescan will find the device
locked, and never retry because no IN_CLOSE_WRITE is triggred anymore.)
Let's fix that across the codebase, at 4 places:
1. in makefs (a lock fd is kept, and mkfs then invoked as child, which
uses a different fd, and the lock fd is closed only once the child
died)
2. in udevadm lock (embarassing!): which is intended to be used to wrap tools
that modify disk contents, very similar to the makefs case. The lock
is also kept until after the tool exited.
3. In storagetm: the kernel nvme-tcp layer writes to the device
directly, we just keep a lock fd.
blockdev-util: rename BlockDeviceLookupFlag to plural
This is a flags type and a flag function argument, let's name it in
plural, because it allows many flags combinations. Internally, the
implementation already used plural, but let's fix the prototypes too.
* 5650452e6b Install new files for upstream build
* 607afcd060 salsa: disable arm64/ppc64el again
* b1bb6d4849 systemd-tests: drop unused overrides
* b3790a36ca getty-static: add missing Documentation=
* 1cea27caba Backport patch to fix autopkgtest with new util-linux due to file move
* 2e74a7f969 Update changelog for 258.1-1 release
* 9250e242b9 Make /run/lock world writable by default
Luca Boccassi [Mon, 20 Oct 2025 23:37:44 +0000 (00:37 +0100)]
mountfsd: allow privileged users to mount bare unprotected filesystems
This is useful when we start to call mountfsd from root, for example
from the tests where we just use a simple squashfs/erofs.
Note that this requires the caller to be root, and it will be rejected
otherwise, as such images are classified as 'unprotected' and the
enforced policy does not accept them for unprivileged users.
Disable abort in log_assert in libsystemd/libudev (#39307)
See the second commit for details.
I think we might want to apply the same treatment to nss and pam
modules. Asserting in such "plugin code" seems iffy. But this PR doesn't
change those in any way.