importd: support OS tree "mangling" unpriv too (#39406)
Split out of #38728
(background: os tree "mangling" is what we do if a tarball with an OS
image inside it if is nested inside an extra top-level dir inside the
tarball, which we need to "mangle" and move everything inside one level
up)
This also drops the mkosi testuser from the wheel and systemd-journal
groups as the integration tests rely on the testuser not being to read
the full journal.
Mike Yuan [Thu, 30 Oct 2025 14:38:19 +0000 (15:38 +0100)]
core/exec-invoke: switch keep_fds to heap allocation
Hardcoding total size of the array is error-prone, especially
considering the exeuctable_fd is added far below, so the '4' is
not entirely obvious. Also we seldomly do VLAs.
Mike Yuan [Wed, 29 Oct 2025 20:25:42 +0000 (21:25 +0100)]
core/service: only pass socket fds to control processes
If socket is used as stdio, we'd currently imply EXEC_PASS_FDS
and dump the whole set of fds to the control processes. This is
pretty much unexpected and unnecessary though, instead let's
pass only the socket fds.
Yes, this is a compat break, but a relatively minor one I'd
argue. And we can always revisit things if users do complain.
Mike Yuan [Wed, 29 Oct 2025 20:20:26 +0000 (21:20 +0100)]
core/execute: merge n_storage_fds and n_extra_fds into stashed_fds
The distinction between fdstore and extra fds is only meaningful
to struct Service. As far as executor is concerned they're just
some fds to pass to the service. Let's just merge it hence,
for the sake of simplicity.
Daan De Meyer [Thu, 30 Oct 2025 11:28:19 +0000 (12:28 +0100)]
run0: Add --empower
--empower gives full privileges to a non-root user. Currently this
includes all capabilities but we leave the option open to add more
privileges via this option in the future.
Why is this useful? When running privileged development or debugging
commands from your home directory (think bpftrace, strace and such),
you want any files written by these tools to be owned by your current
user, and not by the root user. run0 --empower will allow you to run
all privileged operations (assuming the tools check for capabilities
and not UIDs), while any files written by the tools will still be owned
by the current user.
This creates a chicken-and-egg problem: we stuff the pcrlock policy into
a credential in the ESP, but credentials get measured into PCR 12, hence
PCR 12 is both input and output of the pcrlock logic, which makes
impossible to calculate.
Let's drop PCR 12 for now.
(We might want to pass the policy some other way one day, to avoid this,
but that's something for another day.)
Note that this still allows locking to PCR12 if people want to (for
example because they don't need this for the rootfs, and hence need no
cred passing via the ESP), this hence only changes the default, nothing
more.
Yu Watanabe [Thu, 23 Oct 2025 02:19:52 +0000 (11:19 +0900)]
network/sysctl: logs when per-link IPMasquerade= setting changes the global IPv6Forwarding= setting
All other cases, settings on different interfaces are completely
independent. But IPMasquerade=yes on an interface enables the global
IPv6Forwarding= setting, and hence affects other interfaces.
Let's log about that.
Prompted by https://github.com/systemd/systemd/issues/39304#issuecomment-3430382233.
* ea1d871ecd Add missing networkd socket units
* b76b5da2e6 Merge #214 `Drop backwards compat logic from integration tests script`
* 7208fa2b1b Require systemd-rpm-macros for build
* 2e1a6c7474 Require python3-zstandard in ELN
* 79c9db1bc8 Require systemd-libs and systemd-shared to be in the same version
* db38445a7e Drop two patches with workaround (selinux, kernel)
* 593a204189 Version 258.1
* a3e9e27982 Change '%{systemd}' to systemd in Conflicts/Provides/Requires/Recommends
* 88877a4184 Require systemd-networkd and systemd-udev to be in the same version
* 8a446daec7 Version 258 💝
* cceac93491 Pre-create /etc/userdb directory
* b442086d5f Version 258~rc4
* 327e54e421 Add to patch to create userdb root directory with correct label
* 2289d65726 Fix unit name in scriptlet
* 5acde9f1fd Add workaround patch to hopefully pass podman CI tests
* 1f5ed0da1f Version 258~rc3
* 50936458a7 obs: move recipe files in place
* 1bdb4efe40 obs: switch to xz for compression
* be7a4d0863 Version 258~rc2
* 2ace9416e8 obs: also use version with tilde for Source0
* 8d1645af75 Use again %{version} when building in OBS
* 98cc5fd91a Version 258~rc1
* ed7d2f1132 Add "test" that LTO effectively removes unused code from shared lib
* 40b38a04d2 Build docs on 64-bit architectures only
* 5d30fd3b26 Version 257.7
Daan De Meyer [Tue, 28 Oct 2025 21:54:14 +0000 (22:54 +0100)]
mount-util: Iterate mountinfo backwards when unmounting
Submounts will always be located further in the mountinfo file, so
when we're unmounting, iterating backwards is likely to be more
efficient than iterating forwards. It'll also reduce the amount of
EBUSY debug logging we'll get since we'll stop trying to unmount
parent mounts with submounts which will always fail with EBUSY.
If you're using `udevadm monitor` from a script, without a tty, then
libc defaults to being fully-buffered, and won't flush stdout after
newlines. This is fine for tools that dump a bunch of data and then
exit immediately. It's a problem for tools like `udevadm monitor` which
have long pauses: the buffered data can get stuck in the buffer for an
unbounded amount of time.
In the Cockpit project we've been working around this for some time with
`stdbuf` which is a `LD_PRELOAD` hack to change the libc buffering
behaviour, but we'd like to stop doing that.
Let's make sure we flush the buffer after each event.
Yu Watanabe [Tue, 28 Oct 2025 04:20:58 +0000 (13:20 +0900)]
TEST-07-PID1: wait for systemd-resolved being stopped
As 'systemctl stop' is called with --no-block, previously systemd-resolved
might not be stopped when 'resolvectl' is called, and the DBus connection
might be closed during the call:
```
TEST-07-PID1.sh[5643]: + systemctl stop --no-block systemd-resolved.service
TEST-07-PID1.sh[5643]: + resolvectl
TEST-07-PID1.sh[5732]: Failed to get global data: Remote peer disconnected
```
The reverts are not strictly necessary here (as already pointed out in
https://github.com/systemd/systemd/pull/39154#issuecomment-3360118164)
but they were helpful in checking if the fix works as expected. I can
drop them if needed.
Ronan Pigott [Sun, 26 Oct 2025 04:04:03 +0000 (21:04 -0700)]
zsh: add completion for dbus bus address
The DBUS_SESSION_BUS_ADDRESS and DBUS_SYSTEM_BUS_ADDRESS parameters have
an interesting syntax thats useful to complete. Let's include a
completion definition for these parameters.
Mike Yuan [Fri, 24 Oct 2025 21:09:50 +0000 (23:09 +0200)]
logind: support deserializing session leader through pidfdid
People make weird assumptions around state preservation and
expect logind to be stoppable. While this is realistically
not OK we can probably improve things a little.
This complements f01d8658a3a57d05a5156aefd32d8137c3ee3996 and
adds support for deserializing the LEADER_PIDFDID= field.
We still prioritize pidfd if got one from fdstore (as with
service_notify_message_parse_new_pid() in pid1), but otherwise
this should make logind restart more robust when fdstore
gets spuriously cleared.
core/exec-invoke: relax restriction for process name length
Previously, we limit the length of process name by 8.
This relax the restriction then at least process comm or
program_invocation_name contains the untrucated process name.
Yu Watanabe [Sat, 25 Oct 2025 06:34:44 +0000 (15:34 +0900)]
test: extend start limit interval
As the modified service requires about ~10 seconds for stopping, the
service never hit the start limit even if we tried to restart the
service more than 5 times.
This also checks that the service is actually triggered by dbus method
call.
Daniel Hast [Fri, 24 Oct 2025 22:47:59 +0000 (18:47 -0400)]
tree-wide: add basic validation of --background argument
Check whether the argument of the `--background` option of
`systemd-run`, `run0`, `systemd-nspawn`, `systemd-vmspawn`, and
`systemd-pty-forward` is either empty or looks like an ANSI color code,
and reject invalid values when parsing arguments.
We consider a string to look like an ANSI color code if it consists of
one or more sequences of ASCII digits separated by semicolons. This
permits every valid ANSI color code, and should reject anything that
results in garbled output.
discover-image: imply that hidden images are read-only
Marking a whole directory tree OS image as read-only is difficult
privilege-wise, because so far we rely on the FS_IMMUTABLE_FL which is
not accessible to unpriv clients.
One fundamental place where we currently rely on marking images
read-only is for keeping pristine copies of the originally downloaded
image around, which we place in "hidden" image directories. This is
probably the most relevant usecase for the read-only flag. And moreover,
the only usecase for the hidden images are these read-only pristine
copies.
Hence, let's make this work reasonably in the unpriv case, and simply
imply the read-only flag for hidden images. This is strictly speaking a
change in behaviour, but effectively it shouldn't be, because for nspawn
containers that are executed we insist on names that are hostname
compatible, and hidden names aren't (because they start with a dot).
rm-rf: make sure we can safely remove dirs we have no access to via rm_rf_at()
Previously, we'd first empty a dir, and then remove it. This works fine
as long as we have access to a dir. But in some cases (like for example
a foreign owned container tree) we might not have access to the dir, but
are still able to remove it (because it is empty, and in a dir we own).
Hence let's try that first. If it works, we do not need to enter the dir
(and thus fail).
Michal Sekletar [Fri, 24 Oct 2025 10:55:20 +0000 (12:55 +0200)]
coredump: handle ENOBUFS and EMSGSIZE the same way
Depending on the runtime configuration, e.g. sysctls
net.core.wmem_default= and net.core.rmem_default and on the actual
message size, sendmsg() can fail also with ENOBUFS. E.g. alloc_skb()
failure caused by net.core.[rw]mem_default=64MiB and huge fdinfo list
from process that has 90k opened FDs.
We should handle this case in the same way as EMSGSIZE and drop part of
the message.