chase: mask away CHASE_MUST_BE_REGULAR in chase_openat()
We pin the parent directory of the specified directory via CHASE_PARENT,
but if we do that we really should mask off CHASE_MUST_BE_REGULAR,
because a parent dir of course is a dir, nothing else. The
CHASE_MUST_BE_REGULAR after all should apply to the file created in that
dir, not to the parent.
sd-boot: allow configuration of log levels (#38701)
This allows for more liberal usage of logging functionality as messages
will no longer always show up on screen, regardless of urgency. The log
level to use can be configured through an SMBIOS type 11 string
(`io.systemd.boot.loglevel=`) or by using the `log-level` option in
loader.conf. Valid values are debug, info, notice, warning, err, crit,
alert, and emerg. By default, info will be used.
basic/efivars: read EFI variables using one read(), not two (#38864)
In https://github.com/systemd/systemd/issues/38842 it is reported that
we're again having trouble accessing EFI variables:
```
[ 292.212415] H (udev-worker)[253]: Reading EFI variable /sys/firmware/efi/efivars/LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f.
...
[ 344.397961] H (udev-worker)[253]: Detected slow EFI variable read access on LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f: 52.185510s
```
We don't know what causes the slowdown, but it seems reasonable to avoid
unnecessary read() calls. We would read the 4-byte attr first, and then
the actual value later. But our code always reads the value (and
discards the attr in all cases except one, when _writing_ the variable),
so let's optimize for the case where we read the value and read the
whole contents in one read().
Tobias Heider [Mon, 25 Aug 2025 14:07:54 +0000 (16:07 +0200)]
stub: fix file path handling for loaded kernel
- Actually pass the new memory file path to parent_loaded_image->FilePath
- Restore old parent_loaded_image if Linux returns
- Pass the same kernel_file_path in load_via_boot_services path
- s/Re-use/Patch in comment explaining what we are doing
Felix Pehla [Sat, 23 Aug 2025 15:27:20 +0000 (17:27 +0200)]
sd-boot: efi-log: use log levels internally
Change log_internal() to receive a log level from which a text color is
derived, rather than the text color directly, and adjust various log_*
macros to use them internally.
Implements the ability to add recovery keys to existing user accounts
via homectl update --recovery-key=yes. Previously, recovery keys could
only be configured during initial user creation, requiring users to
recreate their entire home directory to add recovery keys later.
macro: flip ONCE macro to make log_once() and friend actually log once
Previously, ONCE is false for the first time, and true for later times,
hence log_once() and log_once_errno() suppress logging in the first call,
rather than later calls.
Fortunately, ONCE macro is only used in log_once() and log_once_errno(),
hence this only fixes spurious logging.
"path" sounds like a fully qualified complete string referencing some
terminal object. But here it's not like that, the field just stores the
directory the object we actually care about is placed in. Hence let's
change this field to be named "directory", to be less confusing for
readers.
Some distributions does not have man package, but named man-db or so,
and most distribution specific mkosi.conf files already have them.
Let's drop man from the global config.
test: skip several test cases when running in chroot
When we are running in chroot, safe_fork() with FORK_MOUNTNS_SLAVE fails
with the following eror:
```
Failed to set mount propagation to MS_SLAVE for all mounts: Invalid argument
```
Let's skip the test cases when we are running in chroot.
systemd-sysext: introduce a global config (#38250)
This PR implements what is proposed in
https://github.com/systemd/systemd/issues/37992.
Having a global config file that supports the same cmdline options for
sysext/confext allows the user to customize the behavior of
systemd-sysext.service unit too, without the need of hacking the service
manually.
The global config will live in
`CONF_PATHS_STRV()/systemd/{sysext/confext}.conf` and it will be
overridden by cmdline, so it is possible to customize a run if
`systemd-sysext` is executed manually.
For now support `--mutable=` (`Mutable`) and `--image-policy=`
(`ImagePolicy`).
core: Add wall clock duration to CPU usage logging
Enhance CPU time logging to include wall clock duration alongside
CPU consumption. When a unit transitions to inactive/failed state,
the log message now shows both CPU time consumed and the total wall
clock time since activation.
Changes:
- Calculate wall clock duration using active_enter_timestamp
- Update log format: "Consumed Xs CPU time over Ys wall clock time"
- Fallback to original format if no activation timestamp available
- Use monotonic clock for accurate duration calculation
This addresses issue #35738 by providing administrators better context
about service performance and resource efficiency.
Example output:
- With wall clock: "service: Consumed 30s CPU time over 5min wall clock time"
- Without timestamp: "service: Consumed 30s CPU time"
Ryan Brue [Mon, 28 Jul 2025 16:46:22 +0000 (11:46 -0500)]
doc: document /run/host/root/ as an optional bind mount for the host fs
Container managers may want to bind mount the root filesystem
somewhere within the container. Security-wise, this is very much not
recommended, but it may be something application containers may want
to do nonetheless.
dissect: use blkid_probe filters to restrict probing to supported FSes and no raid
We only support a subset of filesystems, and no RAID, for DDIs. blkid spends a lot
of time trying to probe for the filesystem type, so cut it short by using
the filtering options to restrict it to the filesystems we support, and to
exclude raid probing.
The code was of two minds about error_id: it was used directly in
pam_syslog_errno(), but in the next line checked with streq_ptr().
sd_varlink_callbo() may return negative and then it does not set the output
params, or it returns the error in ret_error_id. We cannot assume that error_id
is non-null. Also fix a select-and-paste mistake in one place.
I'm seeing this in the initrd (with the dev_ksmg_record line added to clarify
where the error is coming from):
[ 6.114232] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.2
[ 6.116842] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.2".
[ 6.134115] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.2".
[ 6.139427] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.3
[ 6.144327] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.3".
[ 6.149442] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.3".
[ 6.155091] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.3
[ 6.160118] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.3".
[ 6.164814] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.3".
[ 6.169201] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.3
[ 6.173990] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.3".
[ 6.183104] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.3".
[ 6.187746] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.3
[ 6.192825] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.3".
[ 6.197733] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.3".
[ 6.203015] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.3
[ 6.207184] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.3".
[ 6.211943] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.3".
[ 6.216703] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.4
[ 6.221944] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.4".
[ 6.226803] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.4".
[ 6.231238] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.4
[ 6.236078] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.4".
[ 6.241845] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.4".
[ 6.247976] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.4
[ 6.252545] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.4".
[ 6.256146] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.4".
[ 6.260651] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.4
[ 6.265151] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.4".
[ 6.269755] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.4".
[ 6.276206] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.4
[ 6.280034] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.4".
[ 6.284603] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.4".
[ 6.288710] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.5
[ 6.293312] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.5".
[ 6.297763] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.5".
[ 6.302438] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.5
[ 6.306948] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.5".
[ 6.310797] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.5".
[ 6.315097] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.5
[ 6.319033] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.5".
[ 6.323593] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.5".
[ 6.328834] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.5
[ 6.333057] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.5".
[ 6.337644] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.5".
[ 6.341152] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.5
[ 6.345436] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.5".
[ 6.349824] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.5".
[ 6.354306] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.6
[ 6.358131] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.6".
[ 6.366568] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.6".
[ 6.371139] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.6
[ 6.375207] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.6".
[ 6.378681] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.6".
[ 6.382820] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.6
[ 6.387143] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.6".
[ 6.392192] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.6".
[ 6.397109] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.6
[ 6.400991] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.6".
[ 6.405992] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.6".
[ 6.410889] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.6
[ 6.414730] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.6".
[ 6.418266] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.6".
[ 6.422575] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.7
[ 6.429942] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.7".
[ 6.433780] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.7".
[ 6.438509] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.7
[ 6.442293] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.7".
[ 6.447236] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.7".
[ 6.453336] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.7
[ 6.458031] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.7".
[ 6.461948] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.7".
[ 6.465883] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.7
[ 6.470072] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.7".
[ 6.476196] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.7".
[ 6.481182] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:02.7
[ 6.484938] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:02.7".
[ 6.491322] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:02.7".
[ 6.497289] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:03.0
[ 6.501935] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:03.0".
[ 6.505217] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:03.0".
[ 6.509819] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:03.0
[ 6.516078] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:03.0".
[ 6.520942] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:03.0".
[ 6.525178] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:03.0
[ 6.528505] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:03.0".
[ 6.534669] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:03.0".
[ 6.539353] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:03.0
[ 6.543035] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:03.0".
[ 6.547441] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:03.0".
[ 6.553211] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:03.0
[ 6.557452] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/class/pci/0000:00:03.0".
[ 6.562468] systemd-journald[251]: sd-device: Failed to chase symlinks in "/sys/firmware/pci/0000:00:03.0".
[ 6.566955] systemd-journald[251]: dev_kmsg_record: kernel_device=+pci:0000:00:03.1
[ 6.570846] systemd-journald[251]: Too many messages being logged to kmsg, ignoring
The error message was misleading, since it sounds like there's an issue with
symlinks, but the device simply doesn't exist. But I think we should suppress
the message altogether. journald spewing messages like this fills up the logs
for no benefit. The sd_device_new* functions can legitimately be used for
"invalid" devices, e.g. to check if they even exist. We have no idea for what
purpose the caller is creating the device object, so let's not log this at all.
The caller can log if appropriate.
--no-hostname is one of the switches I use very often. In particular,
when looking at CI logs, the hostname is almost never interesting.
-H is not yet used in journalctl, because journal operates locally, but
will want it if display of remote journals is implemented. Use -W.
Luca Boccassi [Sun, 24 Aug 2025 19:51:23 +0000 (20:51 +0100)]
repart: do not fail when CopyBlocks= is used in the initrd
When running in the initrd --root= is automatically set to /sysroot or /sysusr
but then using CopyBlocks fails due to a security measure:
root@particle-caba-1e47:~# systemd-repart --dry-run=no /dev/vda
No machine ID set, using randomized partition UUIDs.
Automatic discovery of backing block devices not permitted in --root= mode, refusing.
I noticed in our NixOS packaging that we were working around the fact
that core/swap.c looks for swapon and swapoff in /sbin
Lets make it configurable just like all the other util-linux binaries
through meson and make it default to /usr/sbin/{swapon,swapoff}
This way mounts work on a systemd without the /sbin -> /usr/sbin
compatibility symlink. (And as a side-effect has NixOS be able to have
it in /nix/store too like the other util-linux tools).
Given that `unmerged-usr` support was dropped in 255 I think this is a
safe change?
fd-util: fix path_is_root_at() when dealing with detached mounts (#38636)
path_is_root_at() is supposed to detect if the inode referenced by the
specified fd is the "root inode". For that it checks if the inode and
its parent are the same inode and the same mount. Traditionally this
check was correct. But these days we actually have detached mounts (i.e.
those returned by fsmount() and related calls), whose root inode also
behaves like that.
Our uses for path_is_root_at() use the function to detect if an absolute
path would be identical to a relative path based on the specified fd
(sepifically: chaseat()), which goes really wrong if used on a detached
mount.
hence, let's adjust the function a bit, and let's go by path to "/" to
check if the referenced inode is the actual root inode in our chroot.
Alan Brady [Wed, 6 Aug 2025 17:38:59 +0000 (20:38 +0300)]
nspawn: add NamespacePath support for nspawn files
Commit d7bea6b6 ("nspawn: introduce an option for specifying network
namespace path") already did most of the work here enabling a command
line option for specifying the namespace path for a given container.
Someone even took care of the merging code in merge_settings as though
this already worked. All that's then needed is to add a line to the
nspawn-gperf.gperf file to actually enable being able to specify
NamespacePath from nspawn files as well.
This greatly simplifies how we configure nspawn containers by being able
to give all the options we need in .nspawn files instead of needing to
also use command line parameters.
Luca Boccassi [Tue, 26 Aug 2025 18:12:53 +0000 (19:12 +0100)]
sysext: do not attempt to unlock images interactively
These images are not using a passphrase, they are using keys
or at most TPM-based sealing (not yet implemented, for contexts).
Do not use the interactive helper, as it will block and ask the
user for a password if it fails to find the signing cert, which
is not useful for this tool.
vmspawn: initialize block device "serials" from backing file name
If we pass multiple block devices into a VM it's really useful to pass
recognizable serial numbers on them, so that we know which one is which.
qemu allows setting them, hence initialize them automatically from the
filename of the backing file, as a convenience feature.
Inside of a VM this means /dev/disk/by-id/… symlinks will be generated
with useful identifiers.
This was added in c242a082793df77a1dc0bce7f470660ab0a86fe5, and AFAICT, the
code was never exercised, not even in the tests. With this chunk gone, if
anyone ever calls the function without any output params, we'll do open + fstat
instead of access, which will work just fine too.
I also thought about converting efi_set_variable() to use writev, but we don't
have loop_writev. I'm not sure if the loop around write here is important.
Coinceivably, it could make a difference it we were writing a long value.
The loop was introduced in b7749eb517ff5dd379cf61ee9fb50a0105ab2c0f, without
much comment unfortunately. So it doesn't seem worth the risk of changing this
to not use a loop, and writing loop_writev just for this also seems overkill.
basic/efivars: read EFI variables using one read(), not two
In https://github.com/systemd/systemd/issues/38842 it is reported that we're again
having trouble accessing EFI variables:
[ 292.212415] H (udev-worker)[253]: Reading EFI variable /sys/firmware/efi/efivars/LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f.
...
[ 344.397961] H (udev-worker)[253]: Detected slow EFI variable read access on LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f: 52.185510s
We don't know what causes the slowdown, but it seems reasonable to avoid
unnecessary read() calls. We would read the 4-byte attr first, and then the
actual value later. But our code always reads the value (and discards the attr
in all cases except one, when _writing_ the variable), so let's optimize for
the case where we read the value and read the whole contents in one readv().
machine: do not allow unprivileged users to register other users' processes as machines (#38911)
Registering a process as a machine means a caller can get machined to
send sigterm to it, and more. If an unpriv user is registering, ensure
the registered process has the same uid.