Let's check both the per user machined and the system machined instead
of only the system machined. We give preference to the per user machined
and fall back to the system machined.
mute console kernel log/pid1 status output while firstboot is running (#39101)
This is also preparation for the installer later, split out of #38764.
It makes the experience a lot nicer if our nice little tools aren't
constantly interrupted by log spew from the kernel.
mute-console: add simple varlink service that can disable log/status spew to kernel console
For "wizard" style interactive tools it's very annoying if they are
interrupted by kernel log output or PID1's status output. let's add some
infra to disable this temporarily. I decided to implement this as an IPC
service so that we can make this robust: if the client request the
muting dies we can automatically unmute again.
This is hence a tiny varlink service, but it can also be started
directly from the cmdline.
It seems
- the address sanitizer on fedora 42 reports false-positive, or
- probing partitions in libblkid 2.40.4 has a bug.
Not sure which causes the issue, but anyway the address sanitizer
kills udev-worker when sym_blkid_partition_get_name() is called
in udev-builtin-blkid.c.
```
systemd-udevd[488]: ==488==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7ffdd716e020 at pc 0x563e3ca66fcb bp 0x7ffdd716d970 sp 0x7ffdd716d968
systemd-udevd[488]: READ of size 8 at 0x7ffdd716e020 thread T0 ((udev-worker))
(snip)
systemd-udevd[488]: HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
systemd-udevd[488]: (longjmp and C++ exceptions *are* supported)
systemd-udevd[488]: SUMMARY: AddressSanitizer: stack-buffer-underflow (/usr/bin/udevadm+0x187fca) (BuildId: 1fb56dbdf0447aba1185d6e34560b782b76098be)
(snip)
systemd-udevd[488]: Command: (udev-worker)
systemd-udevd[488]: ==488==ABORTING
```
Anton Tiurin [Mon, 15 Sep 2025 19:32:39 +0000 (12:32 -0700)]
networkd: fia xRequiredOperationalStateForOnline serializtion
In integration tests (for example TEST-85-NETWORK-NetworkctlTests)
LINK_OPERSTATE_RANGE_INVALID and required_for_online == -1 are serialized as
```
"RequiredForOnline": "true",
"RequiredOperationalStateForOnline": [null, null]
```
Such link should be reported as required_for_online=False and not
serialize nulls.
core/bpf-firewall: replace unnecessary unit_setup_cgroup_runtime() with unit_get_cgroup_runtime()
Except for the test, bpf_firewall_compile() is only called by the following:
cgroup_context_apply() -> cgroup_apply_firewall() -> bpf_firewall_compile()
and in the early stage of cgroup_context_apply(), it checks if the cgroup
runtime exists. Hence, it is not necessary to try to allocate the
runtime in bpf_firewall_compile().
core/bpf-firewall: make failures in loading custom BPF program not critical
All other resource control features work as 'best-effort', and failures
in applying them are handled gracefully. However, unlike the other features,
we tested if the BPF programs can be loaded and refuse execution on failure.
Moreover, the previous behavior of testing loading BPF programs had
inconsistency: the test was silently skipped if the cgroup for the unit does
not exist yet, but tested when the cgroup already exists.
Let's not handle failures in loading custom BPF programs as critical, but
gracefully ignore them, like we do for the other resource control features.
core/unit: fail earlier before spawning executor when we failed to realize cgroup
Before 23ac08115af83e3a0a937fa207fc52511aba2ffa, even if we failed to
create the cgroup for a unit, a cgroup runtime object for the cgroup is
created with the cgroup path. Hence, the creation of cgroup is failed,
execution of the unit will fail in posix_spawn_wrapper() and logged
something like the following:
```
systemd[1]: testservice.service: Failed to create cgroup /testslice.slice/testservice.service: Cannot allocate memory
systemd[1]: testservice.service: Failed to spawn executor: No such file or directory
systemd[1]: testservice.service: Failed to spawn 'start' task: No such file or directory
systemd[1]: testservice.service: Failed with result 'resources'.
systemd[1]: Failed to start testservice.service.
```
However, after the commit, when we failed to create the cgroup, a cgroup
runtime object is not created, hence NULL will be assigned to
ExecParameters.cgroup_path in unit_set_exec_params().
Hence, the unit process will be invoked in the init.scope.
```
systemd[1]: testservice.service: Failed to create cgroup /testslice.slice/testservice.service: Cannot allocate memory
systemd[1]: Starting testservice.service...
cat[1094]: 0::/init.scope
systemd[1]: testservice.service: Deactivated successfully.
systemd[1]: Finished testservice.service.
```
where the test service calls 'cat /proc/self/cgroup'.
To fix the issue, let's fail earlier when we failed to create cgroup.
This is the first part of #38728, just the machined stuff, no the
importd stuff.
This definitely makes sense of its own, hence let's get this in first.
The original PR contains a tescase that tests machined + importd in
combination. This PR here hence is without a testcase, but it's there,
just in the other PR.
This looks large and is large, but do note that much of the machined
changes are very repetitive: they conditionalize PK checks to the system
version, as PK doesn't make sense in the use rversion.
integration tests: do not adjust log level in the test script
We passes log level through kernel command line. It is not necessary to
set to debug level at the beginning, and set to info at the end.
This is important when a test has several subtests. If a subtest sets
log level to info at the end, then subsequent tests may not generate any
useful logs.
When HAVE_SECCOMP is not set, a build error happens:
../src/analyze/analyze-security.c: In function ‘get_security_info’:
../src/analyze/analyze-security.c:2449:13: error: unused variable ‘r’ [-Werror=unused-variable]
2449 | int r;
| ^
cc1: some warnings being treated as errors
Mike Yuan [Thu, 25 Sep 2025 20:28:33 +0000 (22:28 +0200)]
core/cgroup: make sure deserialized accounting data is not voided
Currently, cgroup_path is (de-)serialized after all the cached
accounting data. This is bogus though, since unit_set_cgroup_path()
destroys the CGroupRuntime object and starts afresh, discarding
all deserialized values. This matters especially for IP accounting,
whose BPF maps get recreated on reload/reexec and the previous values
are exclusively retrievable from deserialization. Let's hence swap things
around and serialize cgroup_path first, accounting data only afterwards.
discover-image: support runtime scope also for .nspawn settings files and the pool dir
discover-image.[ch] largely already supports per-scope operations, let's
extend this however to also cover finding .nspawn settings files and
managing the pool dir.
machined: do not allow unprivileged users to shell into the root namespace
We intend to make self-registering machines an unprivileged operation,
but currently that would allow an unprivileged user to register a
process they own in the root namespace, and then login as any
user they like, including root, which is not ideal.
Forbid non-root from shelling into a machine that is running in
the root user namespace.
gpt: Introduce function to convert verity hash or sig to data partition
Let's rename the existing partition_verity_to_data() to
partition_verity_hash_to_data() and make a new partition_verity_to_data() that
handles both verity hash and verity signature partitions.
Currently it's next to impossible to find out why dissect_image()
has failed with EADDRNOTAVAIL, so let's add debug logging and use
EREMOTE for the different architectures error to help out with
debugging a bit.
mountfsd: slightly relax restrictions on dir fds to mount
When establishing a idmapped mount for a directory with foreign mapping
we so far insisted in the dir being properly opened (i.e. via a
non-O_PATH fd) being passed to mountfsd. This is problematic however,
since the client might not actually be able to open the dir (which after
all is owned by the foreign UID, not by the user). Hence, let's relax
the rules, and accept an O_PATH fd too (which the client can get even
without privs). This should be safe, since the load-bearing security
check is whether the dir has a parent owned by the client's UID, and
for that check O_PATH or not O_PATH is not relevant.
test: check the next elapse timer timestamp after deserialization
When deserializing a serialized timer unit with RandomizedDelaySec= set,
systemd should use the last inactive exit timestamp instead of current
realtime to calculate the new next elapse, so the timer unit actually
runs in the given calendar window.
It mostly introduces the "chrome" stuff that puts a blue at the top of
bottom of the terminal screen when going through interactive tools such
as firstboot, homed-firstboot, and (in future) systemd-sysinstall).
it also introduces a generic "prompt_loop()" helper thatn queries the
user for some option in a loop until the rsponse matches certain
requirements. It's a generalization of a function of the same name that
so far only existed in firstboot.c. The more generic version will be
reused in a later PR by homed-firstboot and by sysinstall.
prompt-util: add helpers that paint some "chrome" on top/bottom of screen
We'll soon have three different kind of interactive "wizard"-like console
UIs: systemd-firstboot, homectl firstboot and soon systemd-sysinstall.
Let's give them a limited, recognizable visual identity, to distinguish
them from the usual console output: let's add a bit of "chrome" to the
top and bottom of the screen, that we show during ther wizards, but hide
again afterwards.
This makes use of the DECSTBM sequence that reduces the scrolling area
by chopping off blocks from the top or bottom of the screen. The
sequence is quite standard, given it has been part of VT100 already.
xterm, vte, Linux console all support it just fine.
machined: add PIDFD D-Bus variants for registering/creating machines
Current methods take a numeric PID, but we know that is unreliable for
the usual reasons. Add variants that take a PIDFD instead, or a
PID + PIDFDID combination for remote users.
boot: let's make the one space we output early on invisible
let's place the cursor at the beginning of the line before/after, so we
know it's the first char we overwrite, and we return to the front again
right after.
boot: work around ansi color issues between sd-boot, uefi and terminals
So, UEFI's color texting is a bit weird. It translates everything to
ANSI sequences, but unlike ANSI sequences it has no understanding of a
distinct "default" bg/fg color, it assumes the ansi color "0" is always
equal to white on black, but that's of course not really true, most
terminal emulators at the very least support white background too.
tianocore then also tries to be smart and suppresses ANSI color changes
from a color to itself. But if the understanding of the color is wrong
in the first place, then any color change suppression like this hurts
more than it helps.
Then in addition there are certain terminal tools that will reset the bg
color on every line break ("less" for example) to the default.
Let's deal with that and improve the situation on all fronts:
1. force out color changes by doing two color changes whenever we really
want it.
2. on every newline force out the color change again.
with this in place, using sd-boot on a terminal emulator is a lot nicer.
This also drops space between number and 's', like we do in format_timespan(),
and fixes spurious type mismatch between timeout_sec and timeout_remain.
measure: strip tpm 1.x remnants and make GetActivePcrBanks() work (#39089)
Let's never bother with old TPM 1.x structures, they are not mentioned
in the TCG for TPM2 spec at all. However, the spec does say we should
check the Size field of the relevant structs, before accessing them,
hence do that.
Use that to determine the version of the protocol, before accessing
GetActiveBanks().
udev-builtin-net_id: Add DeviceTree-based names for WLAN devices (#39060)
Add support for generating names like wldN based on DeviceTree aliases.
DeviceTree alias names follow de facto conventions. As of writing, there
are so far two ways WLAN devices are represented in DeviceTree aliases
in upstream Linux DTS files:
- Firstly, as wifi0, used for example in t600x-j314-j316.dtsi
- Secondly, as ethernet0 or ethernet1, used for example in
sun8i-q8-common.dtsi, with a comment saying the reason is to "Make
u-boot set mac-address for wifi without an eeprom"
So we need to handle both while generating names. Refactor most of the
logic in names_devicetree() into a helper
names_devicetree_alias_prefix() that takes an alias_prefix instead of
hardcoding "ethernet", and, in the new names_devicetree():
- For prefix "en", use alias_prefix "ethernet"
- For prefix "wl", try alias_prefix "wifi" first, and if that was not
found, fall back to alias_prefix "ethernet"
Since this is a naming scheme change, also gate this behind
NAMING_DEVICETREE_ALIASES_WLAN and NAMING_V259, and document this
change.
I initially didn't think it would be worth doing this, but I changed my
mind. People out there quite successfully build systemd without ACL
support, and that suggests life without it is quite possible. Moreover
we only use it as very specific places:
1. in udev/logind for "uaccess" mgmt
2. in tmpfiles to implement explicitly configured acl changes
3. in journald/coredump/pstore to manage access to unpriv users
4. in pid1 to manage access to credential files
5. when shifting UIDs of container trees
I specific container environments it should be entirely fine to live without all
of these, hence let's pull this in on demand only.
Let's never bother with old TPM 1.x structures, they are not mentioned
in the TCG for TPM2 spec at all. However, the spec does say we should
check the Size field of the relevant structs, before accessing them,
hence do that.
On older versions, if the flag is anything other than AT_SYMLINK_NOFOLLOW,
it returns EINVAL, so we can detect it and call the kernel syscall directly
ourselves.
Using the glibc wrappers when possible is prefereable so that programs
like fakeroot can intercept its calls and redirect them.
Since there have been disagreements on certain aspects of the technical
direction, let's clear things up, and introduce a governance document,
taking inspiration from:
udev-builtin-net_id: Add DeviceTree-based names for WLAN devices
Add support for generating names like wldN based on DeviceTree aliases.
DeviceTree alias names follow de facto conventions. As of writing, there
are so far two ways WLAN devices are represented in DeviceTree aliases
in upstream Linux DTS files:
- Firstly, as wifi0, used for example in t600x-j314-j316.dtsi
- Secondly, as ethernet0 or ethernet1, used for example in
sun8i-q8-common.dtsi, with a comment saying the reason is to "Make
u-boot set mac-address for wifi without an eeprom"
Therefore for prefix "wl", try alias_prefix "wifi" first, and if that
was not found, fall back to alias_prefix "ethernet"
Since this is a naming scheme change, also gate this behind
NAMING_DEVICETREE_ALIASES_WLAN and NAMING_V259, and document this
change.
udev-builtin-net_id: Refactor names_devicetree() to avoid hardcoding
Refactor most of the logic in names_devicetree() into a helper
names_devicetree_alias_prefix() that takes an alias_prefix instead of
hardcoding "ethernet".
The return value names_devicetree_alias_prefix() will be used in further
commits to allow for alias_prefix fallback.
This also makes shebang always use env command, and drops unnecessary
'bash -c' or 'sh -c' when a signle command is invoked in the shell,
like sleep or echo.
In the commit c960ca2be1cfd183675df581f049a0c022c1c802, the logic of
updating ACL on device node was moved from logind to udevd, but at that
time, mistakenly removed the logic for static nodes.