Currently with FileDescriptorStorePreserve=yes the FD store is kept around
regardless of what happens to a unit, which is useful in many cases. But in
some cases, for example when complex services crash horribly, it's hard to
reason about what was in the intermediate state, and it's better to start
fresh.
Add a new 'on-success' option for the FileDescriptorStorePreserve= setting
that keeps it around only for as long as the unit doesn't go to a persistently
failed state.
This is especially useful in combination with LUO, where we don't want to
keep around LUO sessions created by units that then proceeded to crash and
burn, and might be in a bad state afterwards.
Daan De Meyer [Wed, 20 May 2026 20:46:40 +0000 (20:46 +0000)]
test-btrfs: skip info test when GET_SUBVOL_INFO ioctl is unsupported
On 32-bit userspace running against a 64-bit kernel
BTRFS_IOC_GET_SUBVOL_INFO returns -ENOTTY: struct
btrfs_ioctl_get_subvol_info_args embeds four btrfs_ioctl_timespec
values, and that timespec struct (__u64 sec; __u32 nsec) packs to 12
bytes on i386 but 16 on x86_64 due to differing __u64 alignment.
sizeof(struct) is part of the ioctl cmd number via _IOR(), so the cmd
emitted by 32-bit userspace doesn't match the case label compiled by
the 64-bit kernel and the switch falls through to -ENOTTY.
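
To make the size dependence concrete, here is a minimal Python sketch of how `_IOR()` packs the argument size into the command number on x86. The magic value 0x94, the command number 60, and the struct sizes of 504 and 488 bytes are assumptions for illustration (the 16-byte difference follows from the four timespec fields described above); none of them are taken from this commit.

```python
# Sketch of the generic Linux _IOR() encoding used on x86:
# bits 0-7 = nr, bits 8-15 = type, bits 16-29 = size, bits 30-31 = direction.
IOC_NRSHIFT = 0
IOC_TYPESHIFT = 8
IOC_SIZESHIFT = 16
IOC_DIRSHIFT = 30
IOC_READ = 2  # data flows from the kernel to userspace


def ior(ioc_type: int, nr: int, size: int) -> int:
    """Build a read-direction ioctl command number from its parts."""
    return (
        (IOC_READ << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (ioc_type << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


# Assumed values, for illustration only.
BTRFS_IOCTL_MAGIC = 0x94
GET_SUBVOL_INFO_NR = 60
SIZE_64BIT = 504                  # four timespecs at 16 bytes each
SIZE_32BIT = SIZE_64BIT - 4 * 4   # four timespecs at 12 bytes each

cmd_64 = ior(BTRFS_IOCTL_MAGIC, GET_SUBVOL_INFO_NR, SIZE_64BIT)
cmd_32 = ior(BTRFS_IOCTL_MAGIC, GET_SUBVOL_INFO_NR, SIZE_32BIT)

# The two numbers differ only in the size bits, so a kernel switch
# statement keyed on cmd_64 never matches cmd_32.
print(hex(cmd_64), hex(cmd_32))
```
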
btrfs already handles this exact class of bug for
BTRFS_IOC_SET_RECEIVED_SUBVOL via a btrfs_ioctl_timespec_32 struct
plus a _32 cmd alias in fs/btrfs/ioctl.c, but GET_SUBVOL_INFO (added
in 2018, four years after that fix) didn't get the same treatment.
Until a kernel patch lands the test can't exercise the ioctl on
32-bit, so convert TEST(info) to TEST_RET(info) and return
EXIT_TEST_SKIP with a clear message when -ENOTTY comes back. The
other tests in the file use ioctls that already have working compat
paths and remain unaffected.
Luca Boccassi [Thu, 21 May 2026 11:05:34 +0000 (12:05 +0100)]
network: several fixlets for NDisc (#42218)
Unfortunately, the path to test-ndisc-send was previously wrong, so
some test cases were not run in our mkosi CIs, and two test cases
turned out to be broken.
The test case `test_ndisc_redirect` was not updated when the logic in
networkd was changed by 9142bd5a8e9ed94ecbb1e335305e24760b90ad2a. The
change itself should be OK, so the test case is updated.
The test case `test_ndisc_mtu` was broken when commit
32417c172383847ec78b672c537594e3efe8f0e0 was merged. That commit is not
correct, as we cannot set an IPv6 MTU larger than the interface MTU, so
the offending commit is reverted.
Daan De Meyer [Thu, 21 May 2026 11:03:29 +0000 (13:03 +0200)]
core: better errors and more fields for io.systemd.Unit.StartTransient (#42161)
core: add User,Group,SupplementaryGroups,Nice to varlink Unit.StartTransient
This commit adds more writable fields to the io.systemd.Unit.StartTransient
varlink method. With this it's possible to set the
User, Group, SupplementaryGroups and Nice values.
Tests for them are included.
---
core: report unsupported service fields in varlink calls
Just like for the unsupported/bad exec_fields we should show
a message about what field is bad for service parameters. This
commit adds it using the same pattern. The JSON parser works in
fail-fast mode so we only display the first bad field (and
it depends on the parser what it finds first).
Daan De Meyer [Wed, 20 May 2026 12:37:15 +0000 (12:37 +0000)]
math-util: round to declared FP precision consistently across architectures
Add -fexcess-precision=standard so gcc inserts ISO C99 conformant
rounding at assignments, casts, and returns — without it, double values
on x87 happily stay at 80-bit extended precision across operations and
diverge from the SSE/x86_64 behavior, making strict equality comparisons
architecture-dependent.
The flag doesn't fully cover x87: per gcc PR#323
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323), a function return
value carried in ST(0) can arrive at the caller still at 80-bit, so a
double that ought to compare equal to a same-magnitude literal picks up
extra mantissa bits and doesn't. Wrap fp_equal in volatile-double
temporaries to force a memory roundtrip — the only operation that
reliably truncates on x87 — so its callers get consistent results
regardless of how the operands were produced.
Add a TEST(fp_equal) case that exercises the previously-broken pattern:
a runtime 1.0/10.0 computed inside a noinline helper, returned across
the function ABI boundary, then compared against the literal 0.1.
Without the volatile truncation this assertion fails on 32-bit gcc.
network: restart DHCPv6, NDisc, and RADV when tracked IPv6LL is dropped
When the tracked IPv6 link-local address is removed, networkd clears
link->ipv6ll_address, but keeps DHCPv6, NDisc, and RADV running. These
engines keep using a stale source identity which affects the following:
- DHCPv6 client continues to send Solicit/Renew/Rebind from a nonexistent
source address.
- NDisc continues to send Router Solicitations from a nonexistent source
address. Router Advertisements cannot be received properly.
- RADV continues to advertise with a stale source address. This can lead
to downstream hosts configuring invalid routes.
- DHCP-PD prefixes remain configured without a valid upstream DHCPv6 path.
Added link_ipv6ll_lost() to stop IPv6 dynamic engines and related states:
- sd_dhcp6_client_stop()
- ndisc_stop() + ndisc_flush()
- sd_radv_stop()
This is called from address_drop() when the dropped address matches the
tracked IPv6LL. After clearing the tracked address, it scans for another
ready link-local address on the interface. If found, this is set as
link->ipv6ll_address and link_ipv6ll_gained() is called to restart the
engines with the new source identity.
Rocker Zhang [Sat, 16 May 2026 05:07:56 +0000 (13:07 +0800)]
logind: keep lingering users at startup-time GC
manager_startup() runs manager_gc(m, /* drop_not_started= */ false)
before the user_start() loop. user_may_gc()'s linger guard requires
user_unit_active() to be true to keep a user, but at this point the
per-user units have not been started yet, so for any lingering user
that ended up in the user_gc_queue the guard falls through and
manager_gc frees the User struct before user_start() ever runs.
This only manifests after `systemctl soft-reboot`, because /run is
tmpfs and survives soft-reboot: /run/systemd/users/UID files persist,
and manager_enumerate_users() in src/login/logind.c explicitly calls
user_add_to_gc_queue() for every UID it loads from there. Cold boot is
unaffected because /run is empty, so the linger users that come in via
manager_enumerate_linger_users() never enter the GC queue at all and
reach user_start() directly.
Special-case the startup-time GC: if a linger file exists, keep the
user regardless of unit state — user_start() is about to run and will
queue the appropriate jobs. Steady-state GC (drop_not_started=true, in
the event loop) still requires user_unit_active() so we don't hold on
to records for lingering users whose units genuinely died.
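
The resulting linger guard can be sketched as a small truth function. This is a simplified Python model, not the C code: `linger_guard_keeps_user` is a hypothetical name, and the real user_may_gc() checks other conditions that are left out here.

```python
def linger_guard_keeps_user(
    linger_file_exists: bool,
    user_unit_active: bool,
    drop_not_started: bool,
) -> bool:
    """Return True if the linger guard alone keeps the user record alive.

    Hypothetical model of the behaviour described above.
    """
    if not linger_file_exists:
        return False
    if not drop_not_started:
        # Startup-time GC: user_start() has not run yet, so unit state
        # says nothing about whether the user should be kept.
        return True
    # Steady-state GC: keep lingering users only while their units are active.
    return user_unit_active
```

With this shape, the soft-reboot case (linger file present, units not yet started, drop_not_started false) keeps the user, while a lingering user whose units died is still collected in the event loop.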
Fixes: https://github.com/systemd/systemd/issues/41789
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Daan De Meyer [Mon, 18 May 2026 20:29:04 +0000 (20:29 +0000)]
tree-wide: standardize header names across src/fundamental, src/basic and src/shared
Drop the -fundamental suffix from src/fundamental/ headers in favor of names
that match their src/basic/ or src/shared/ counterparts (e.g.
macro-fundamental.h -> macro.h, assert-fundamental.h -> assert-util.h,
cleanup-fundamental.h -> cleanup-util.h). Rename src/basic/{btrfs,label}.{c,h}
to use the -util suffix to match the existing shared/btrfs-util and
shared/label-util siblings. Rename src/shared/mkdir-label.{c,h} to mkdir.{c,h}
and src/shared/tmpfile-util-label.{c,h} to tmpfile-util.{c,h} to match the
corresponding src/basic names.
This saves us from having to come up with separate names for files that do
the same thing across tiers, and it makes it easier to move stuff between
src/fundamental, src/basic and src/shared: consumers just #include "foo.h"
and pick up whichever tier their -I path resolves to first, so call sites
don't need to be updated when an API moves between layers.
Where a higher-tier wrapper exists (e.g. src/basic/macro.h wrapping
src/fundamental/macro.h), the wrapper uses an explicit "../fundamental/foo.h"
or "../basic/foo.h" relative include for the lower-tier header. We can't use
GCC's #include_next directive for this — when the wrapper is reachable both
via same-dir-as-source lookup and via -I (e.g. -Isrc/shared) for the
directory it lives in, #include_next advances by exactly one slot in libcpp's
internal directory chain and lands on the same physical directory it was
already in, never reaching the lower-tier sibling (see make_cpp_dir() in
gcc/libcpp/files.cc:1986).
To make sure the right headers are always picked up, the include directories
are reordered so that e.g. src/shared always takes priority over src/basic and
similar for the other directories.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
firstboot: make clear where the full screen console wizards end
The three first boot screens are visually separated from the console
output before them via the "chrome" and welcome strings and sufficient
whitespace. But so far they weren't separated from the output after them.
This is sometimes a bit confusing. Let's add a bit of a separator between
the end and what comes next, too.
Yu Watanabe [Thu, 21 May 2026 00:07:34 +0000 (09:07 +0900)]
vmspawn: rearrange --help and man page a bit (#42200)
This does not change man page or --help contents at all, but it does
introduce new sections, and moves some knobs between sections. It also
reorders things in the man page so that man page and --help show items
in the same order again.
Yu Watanabe [Wed, 20 May 2026 22:34:43 +0000 (07:34 +0900)]
test-network: fix test case for Neighbor Announcement message handling
After 9142bd5a8e9ed94ecbb1e335305e24760b90ad2a, when an NA without the
router flag is received, the corresponding redirect route and the default
route are removed, but the other routes are kept.
The corresponding test case was not updated by that commit, and the test
case has unfortunately been skipped...
Yu Watanabe [Wed, 20 May 2026 14:40:51 +0000 (23:40 +0900)]
test-network: try to stop test-modem-manager-mock.service only when necessary
Otherwise, every test case that does not create/start the service emits
the following error:
```
Failed to stop test-modem-manager-mock.service: Unit test-modem-manager-mock.service not loaded.
```
Moreover, without this change, an extra 'systemctl daemon-reload' is triggered
after every test case. That's super heavy, especially when the test is running
under sanitizers.
* 3e1930512d Downgrade dependency on dbus to recommends in sd-container
* 61d6ecf0a0 Conflict with sysuser-helper
* fbc4646437 autopkgtest: add dependency on procps
vmspawn: move --kernel=/--initrd= under the "Execution" section
This has little to do with host configuration (where it was so far), and
a lot to do with what is being executed, so let's move it over.
Note that --help and man page so far differed here quite a bit: the
former had the "Execution" section, the latter didn't. This creates it
in the man page, to bring the two back in sync.
Philip Withnall [Wed, 20 May 2026 16:15:00 +0000 (17:15 +0100)]
updatectl: Add a --no-ask-password argument
While commit 83c1e8ff5f9 added support for interactive polkit
authentication in `updatectl`, some users might want to disable that for
some use cases; so add the standard `--no-ask-password` argument.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Helps: https://github.com/systemd/systemd/issues/37412
Daan De Meyer [Tue, 19 May 2026 10:36:01 +0000 (10:36 +0000)]
ci: run the musl build & test under mkosi with a postmarketOS tools tree
Drop the standalone Unit-tests (musl) workflow that ran on an Alpine sandbox
spun up by jirutka/setup-alpine, and merge it into unit-tests.yml as a new
build-musl job that provisions a postmarketOS tools tree via mkosi and runs
the meson build + test suite through 'mkosi box'. postmarketOS is musl-native,
so the musl-gcc / -idirafter /usr/include wrappers the Fedora tools tree
needed are gone; the linter.yml's own musl build step also goes away since
the unit-tests workflow now covers it (and tests it).
postmarketOS doesn't ship a downstream systemd packaging spec, so the new
tools tree config in mkosi.tools.conf/mkosi.conf.d/postmarketos.conf does not
set PrepareScripts and lists build deps manually. mkosi.sync now early-exits
when PKG_SUBDIR is unset so the missing pkgenv entry doesn't trip set -u.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Daan De Meyer [Wed, 20 May 2026 17:21:13 +0000 (17:21 +0000)]
test-path: Skip test when we can't create a cgroup
Instead of having CI runner specific checks, let's just
skip the test if we get EXIT_CGROUP which is what we get
when we can't create a cgroup. This makes the check work
independently of CI runner, and specifically also on github
actions.
Daan De Meyer [Wed, 20 May 2026 12:14:52 +0000 (12:14 +0000)]
ptyfwd: Imply PTY_FORWARD_READ_ONLY if stdin isn't readable
If stdin is connected to a closed pipe or similar, imply
PTY_FORWARD_READ_ONLY so we don't even try to read from it
in the first place. Otherwise we'll immediately get a hangup
which will cause the forwarder to call sd_event_exit() and
shut down the event loop.
Debugged-by: Christian Brauner <brauner@kernel.org>
Yu Watanabe [Wed, 20 May 2026 15:39:47 +0000 (00:39 +0900)]
systemd-udevd: configure NIC IRQs CPU affinity (#40304)
# Context
#40195 defines the initial proposal and the motivation behind this PR.
This PR introduces 3 new options for `.link` files `[Link]` section:
- `IRQAffinityPolicy=`
- `IRQAffinity=`
- `IRQAffinityNUMA=`
The purpose is to allow `systemd-udevd` to configure a NIC's IRQs
affinity to specific CPU(s).
`IRQAffinityPolicy=` supports two policies:
- `single`: assign all the NIC IRQs to CPU 0, or the first CPU in the
CPU set resulting from the union of `IRQAffinity=` and
`IRQAffinityNUMA=`.
- `spread`: assign all the NIC IRQs to all the CPUs (or the union of
`IRQAffinity=` and `IRQAffinityNUMA=` if defined) in a round-robin
fashion, optimizing for cache locality while spreading queues apart
across CPUs as much as possible.
Both `IRQAffinity=` and `IRQAffinityNUMA=` behave as filters that reduce
the CPU set to assign IRQs to, and are only valid if
`IRQAffinityPolicy=` is defined.
# Spreading IRQs
This section describes the algorithm responsible for spreading IRQs over
different CPUs to maximize performance.
## 1. Discover CPU topology
Read from `/sys/devices/system/cpu/cpu*/topology` to identify:
- L3 cache domains (dies)
- Physical cores VS hyperthreads
- NUMA nodes
- Core ordering within each die
```
Example: Dual-socket server with 2 dies per socket, 4 cores per die
```
## 2. Filter out sibling hyperthreads
Use only the first hyperthread of each physical core to avoid SMT
contention. Two IRQs on sibling HTs contend for ALU/cache without cache
benefit.
```
Before: CPUs 0-31 (16 cores × 2 HTs)
After: CPUs 0-15 (first HT of each core)
```
## 3. Equidistant permutations
Reorder dies and CPUs so consecutive selections are maximally spread
apart.
```
Original order: [0, 1, 2, 3] (adjacent dies/CPUs)
|
v
Equidistant: [0, 2, 1, 3] (spread apart)
```
This ensures that even if only 2 IRQs are assigned, they land on dies 0
and 2 (not 0 and 1), maximizing physical distance. The permutation is
also applied within each die:
```
Die permutation: Die0 -> Die2 -> Die1 -> Die3
Within each die:
┌───────────────────────────────────────────────┐
│ Die 0: [C0,C1,C2,C3] -> [C0,C2,C1,C3] │
│ Die 1: [C4,C5,C6,C7] -> [C4,C6,C5,C7] │
│ Die 2: [C8,C9,C10,C11] -> [C8,C10,C9,C11] │
│ Die 3: [C12,C13,C14,C15] -> [C12,C14,C13,C15] │
└───────────────────────────────────────────────┘
```
## 4. Round-robin selection across dies
Pick one CPU from each die in rotation, following permuted order.
```
Round 1: Die0->C0 Die2->C8 Die1->C4 Die3->C12
| | | |
v v v v
IRQs: [IRQ0] [IRQ1] [IRQ2] [IRQ3]
Round 2: Die0->C2 Die2->C10 Die1->C6 Die3->C14
| | | |
v v v v
IRQs: [IRQ4] [IRQ5] [IRQ6] [IRQ7]
```
If there are more IRQs than physical cores, this logic wraps around and
reuses CPUs. Only the first hyperthread of each core is used to avoid
cache line contention between queues.
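
The permutation and round-robin steps above can be sketched in Python as follows. This is a minimal sketch, not the udev implementation: the even/odd recursive split in `equidistant()` is an assumed ordering chosen because it reproduces the `[0, 2, 1, 3]` example, and all function names are hypothetical. The input is assumed to be one list of first-hyperthread CPUs per die.

```python
from typing import List, Sequence


def equidistant(items: Sequence[int]) -> List[int]:
    """Reorder items so consecutive picks are far apart in the input order.

    Assumed scheme: recursively take the even-indexed half, then the
    odd-indexed half.
    """
    if len(items) <= 2:
        return list(items)
    return equidistant(items[0::2]) + equidistant(items[1::2])


def spread_order(dies: Sequence[Sequence[int]]) -> List[int]:
    """Return the CPU visiting order: one CPU per die per round."""
    if not dies:
        return []
    die_order = equidistant(range(len(dies)))
    permuted = [equidistant(dies[d]) for d in die_order]
    order: List[int] = []
    for round_idx in range(max(len(cpus) for cpus in permuted)):
        for cpus in permuted:
            if round_idx < len(cpus):
                order.append(cpus[round_idx])
    return order


def assign_irqs(n_irqs: int, dies: Sequence[Sequence[int]]) -> List[int]:
    """Map IRQ index -> CPU, wrapping around when IRQs outnumber CPUs."""
    order = spread_order(dies)
    if not order:
        return []
    return [order[i % len(order)] for i in range(n_irqs)]


# Four dies with four cores each, as in the diagrams above.
dies = [list(range(d * 4, d * 4 + 4)) for d in range(4)]
print(assign_irqs(8, dies))  # [0, 8, 4, 12, 2, 10, 6, 14]
```

For the 16-core example the first eight IRQs land on C0, C8, C4, C12, C2, C10, C6 and C14, matching rounds 1 and 2 in the diagram; an 17th IRQ would wrap back to C0.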
Daan De Meyer [Fri, 15 May 2026 14:59:49 +0000 (14:59 +0000)]
basic/math-util: drop libm where possible
- test-random-util is reworked to not use sqrt()
- pretty-print.c inlines ceil() so libm doesn't have
to be linked into libshared
- We add fno-math-errno to allow inlining of more math
functions by not requiring standard math functions to
set errno on invalid input.
Daan De Meyer [Fri, 15 May 2026 20:46:54 +0000 (20:46 +0000)]
meson: shrink developer-mode build artifacts
Two complementary changes in the developer-mode branch of meson.build:
1. -ffunction-sections -fdata-sections: pair with the existing
-Wl,--gc-sections so the linker can drop unused individual functions
and data instead of being forced to pull whole .o files into each
binary. Biggest impact on statically-linked NSS/PAM modules (a single
call into creds-util.c used to drag in the entire creds-util
translation unit, which transitively pulled TPM2, OpenSSL, PKCS11 and
KDF helpers via tpm2-util.c / openssl-util.c) and on tests that embed
daemon objects via meson's objects: extraction.
2. -gz=zstd + -fdebug-types-section + -Wl,--compress-debug-sections=zstd:
compress every .debug_* section with zstd, and move type DIEs into a
COMDAT-mergeable section so identical types described across many TUs
land once. Both are transparent to GDB / readelf / addr2line.
Gated to mode == 'developer' for now: no major distro (Fedora, Debian/
Ubuntu, Arch, Alpine, Gentoo, openSUSE, Yocto) enables -ffunction-sections
in their system-wide default CFLAGS, and the interaction with -flto=auto +
-ffat-lto-objects (which Fedora et al. ship by default) deserves a broader
evaluation before turning it on for release builds. Developer mode benefits
straightforwardly: smaller plugins, smaller tests, smaller libraries, no
interference with the hardening/LTO flag combinations distros pin.
Size impact on a clean developer-mode build, 626 ELF objects:
The big test wins come from the ~30 daemons (systemd-networkd,
systemd-resolved, systemd-journald, systemd-logind, systemd-homed,
systemd-importd, systemd-machined, …) whose compiled .o files are embedded
directly into their unit tests via meson's objects: extraction mechanism.
With per-function sections on the daemon sources, the test binary can GC
the bulk of code it never exercises; the remaining DWARF is then shared
zstd-compressed across every .o.
Build-speed cost is below noise on a 24-core build: across four clean
builds (with-flags / sections-only / baseline / with-flags rerun) the
range was 23.6–26.0 s real time and 7m39s–7m48s user time, with the
two with-flags runs faster than the baseline by a couple of seconds —
overhead from per-function-section bookkeeping and zstd compression
disappears into parallel-build noise.
Daan De Meyer [Wed, 13 May 2026 21:01:41 +0000 (23:01 +0200)]
repart: canonicalize node in varlink Run method
Run acquire_root_devno() on the varlink-provided node so symlinks (e.g.
/dev/disk/by-id/...) resolve to their canonical /dev/ path before being
used. Without this, sym_fdisk_partname() produces a "-partN" symlink
that udev hasn't created yet when repart calls open() on it right after
BLKPG_ADD_PARTITION, failing with ENOENT.
This also brings the varlink path in line with the CLI path's
partition-to-whole-disk and dm-crypt-to-backing resolution.
Michael Vogt [Wed, 20 May 2026 06:52:20 +0000 (08:52 +0200)]
core: tweak error handling around VARLINK_ERROR_UNIT_BAD_SETTING
This commit moves the errors in StartTransient from
VARLINK_ERROR_UNIT_BAD_SETTING to SD_VARLINK_ERROR_INVALID_PARAMETER
and also ensures that the bad field is included in the error.
Michael Vogt [Mon, 18 May 2026 16:32:23 +0000 (18:32 +0200)]
core: add User,Group,SupplementaryGroups,Nice to varlink Unit.StartTransient
This commit adds more writable fields to the io.systemd.Unit.StartTransient
varlink method. With this it's possible to set the
User, Group, SupplementaryGroups and Nice values.
Luca Boccassi [Mon, 18 May 2026 11:56:39 +0000 (12:56 +0100)]
po: skip automated fuzzy translations when generating new po files
The fuzzy translations are always wrong, but meson's integration does
not allow skipping them. Add a tiny wrapper for 'msgmerge' to work
around the issue and skip them when running ninja systemd-update-po.
Paul Meyer [Tue, 19 May 2026 11:56:46 +0000 (13:56 +0200)]
vmspawn: use EPYC-v4 cpu for SNP
SNP requires a named, stable CPU model so the launch measurement is
reproducible across hosts. EPYC-v4 is the baseline that covers all
SNP-capable processors (Milan and later).
Paul Meyer [Mon, 18 May 2026 05:50:34 +0000 (07:50 +0200)]
vmspawn: initial support for SEV-SNP guests
Add --confidential-computing=sev-snp to run the guest as an AMD SEV-SNP
confidential VM. Loads a raw OVMF firmware blob via -bios (SNP doesn't
support the pflash + NVRAM split), attaches a sev-snp-guest object,
and hashes the kernel, initrd and cmdline into the launch measurement
when direct kernel boot is used. Incompatible features (Secure Boot,
CXL, virtio-balloon, SMBIOS credentials) are rejected or disabled; an
attached vTPM must be treated as untrusted by the guest.
The feature is marked experimental in the man page.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
udev/net: add IRQAffinityNUMA= option for NUMA-aware filtering
Add support for filtering IRQ affinity to CPUs on a specific NUMA node
via the new IRQAffinityNUMA= option in .link files. The option accepts:
- "local": use the NUMA node local to the NIC's PCIe slot
- Explicit node number (0, 1, 2, ...): use CPUs on the specified node
When both IRQAffinity= and IRQAffinityNUMA= are specified, their
intersection is used. If the intersection is empty, an error is logged
and IRQ affinity configuration is skipped.
When "local" is specified but the device's NUMA node cannot be
determined (numa_node shows -1), a warning is logged and IRQ affinity
configuration is skipped.
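
The filter combination can be illustrated with a short Python sketch. It assumes both inputs are given in the kernel's cpulist notation (for example `0-3,8`), which is how sysfs reports a node's CPUs; whether IRQAffinity= uses exactly this syntax is an assumption, and both function names are hypothetical.

```python
from typing import Optional, Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a cpulist string such as '0-3,8' into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def eligible_cpus(irq_affinity: str, numa_cpus: str) -> Optional[Set[int]]:
    """Intersect both filters; None means 'skip IRQ affinity configuration'."""
    result = parse_cpu_list(irq_affinity) & parse_cpu_list(numa_cpus)
    return result or None


print(eligible_cpus("0-7", "4-11"))   # {4, 5, 6, 7}
print(eligible_cpus("0-3", "8-11"))   # None
```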
udev/net: add IRQAffinity= option to filter eligible CPUs
Add IRQAffinity= option to .link files that filters the set of CPUs
eligible for IRQ placement. This works in conjunction with
IRQAffinityPolicy= to constrain which CPUs receive network IRQs.
When specified with spread policy, only the listed CPUs are considered
for IRQ distribution. When specified with single policy, IRQs are
pinned to the first CPU in the allowed set instead of CPU 0.
udev/net: implement IRQAffinityPolicy=spread with topology awareness
Implement the spread policy for IRQ affinity distribution using a
topology-aware algorithm. The algorithm:
1. Discovers CPU topology from sysfs (NUMA node, package, die/L3, core)
2. Groups CPUs by L3 cache domain (die) with equidistant ordering
3. Round-robins across dies, spreading IRQs across the system
4. Uses first hyperthread of each core before second hyperthreads
5. Applies IRQ affinity via /proc/irq/<n>/smp_affinity
When there are more IRQs than CPUs, queues wrap around using round-robin.
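
For reference, the value written in step 5 is a hexadecimal CPU bitmask. The following Python sketch builds such a mask in the comma-separated 32-bit-word form the kernel uses for CPU masks; `smp_affinity_mask` is a hypothetical helper, and the sketch deliberately does not write to /proc.

```python
from typing import Iterable


def smp_affinity_mask(cpus: Iterable[int]) -> str:
    """Format a set of CPUs as a hex bitmask, 32 bits per comma-separated word."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while True:
        words.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if mask == 0:
            break
    # The most significant word comes first.
    return ",".join(reversed(words))


print(smp_affinity_mask([8]))   # 00000100
print(smp_affinity_mask([33]))  # 00000002,00000000
```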
udev/net: add IRQAffinityPolicy= option for .link files
Add support for configuring IRQ affinity for network interfaces via
systemd .link files. For now, the new IRQAffinityPolicy= option in the [Link]
section only accepts "single", which pins all MSI IRQs to CPU 0.
This allows declarative IRQ affinity configuration for network devices
during udev processing, which is useful for optimizing network
performance on multi-core systems.
Further commits will expand the options supported by IRQAffinityPolicy=.
Michael Vogt [Thu, 30 Apr 2026 07:18:41 +0000 (09:18 +0200)]
core: report unsupported service fields in varlink calls
Just like for the unsupported/bad exec_fields we should show
a message about what field is bad for service parameters. This
commit adds it using the same pattern. The JSON parser works in
fail-fast mode so we only display the first bad field (and
it depends on the parser what it finds first).
Frantisek Sumsal [Wed, 20 May 2026 08:47:22 +0000 (10:47 +0200)]
dns-packet: bail out early if the packet is too short
Let's bail out early if the packet claims to contain some
questions or answer RRs, but the remaining packet data size is not
enough to hold a single such entry.
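
A rough Python sketch of this kind of early check is shown below. The minimum entry sizes are assumptions derived from the DNS wire format (a one-byte root name plus the fixed fields); the actual thresholds and structure of the resolved code may differ, and `packet_plausible` is a hypothetical name.

```python
import struct

DNS_HEADER_SIZE = 12
# Assumed minimums: 1-byte root name + type + class
MIN_QUESTION_SIZE = 1 + 2 + 2
# Assumed minimums: 1-byte root name + type + class + TTL + rdlength
MIN_RR_SIZE = 1 + 2 + 2 + 4 + 2


def packet_plausible(packet: bytes) -> bool:
    """Reject packets whose counts cannot possibly fit the remaining data."""
    if len(packet) < DNS_HEADER_SIZE:
        return False
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack(
        "!6H", packet[:DNS_HEADER_SIZE]
    )
    remaining = len(packet) - DNS_HEADER_SIZE
    if qdcount > 0 and remaining < MIN_QUESTION_SIZE:
        return False
    if ancount > 0 and remaining < MIN_RR_SIZE:
        return False
    return True
```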
Daan De Meyer [Thu, 14 May 2026 19:20:02 +0000 (19:20 +0000)]
test: add test-link-abi to enforce link-time ABI invariants
For every built executable, internal shared library, and plugin module,
verify two link-time properties via readelf:
1. No imported GLIBC symbol's version is newer than 2.34.
2. The dynamic section's NEEDED entries reference only glibc, the
runtime linker, and our own libraries.
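
The first check can be approximated in a few lines of Python. This is only a sketch of the idea, not the test itself: it scans text for `GLIBC_x.y` version tags and compares them against the baseline, and the sample input below is made up for illustration rather than real readelf output.

```python
import re
from typing import List, Tuple

BASELINE: Tuple[int, ...] = (2, 34)
_VERSION_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def too_new_glibc_versions(
    text: str, baseline: Tuple[int, ...] = BASELINE
) -> List[str]:
    """Return the sorted set of GLIBC version tags in text newer than baseline."""
    found = set()
    for match in _VERSION_RE.finditer(text):
        version = tuple(int(part) for part in match.group(1).split("."))
        # Tuple comparison is numeric per component, so 2.4 < 2.34.
        if version > baseline:
            found.add(match.group(0))
    return sorted(found)


# Hypothetical symbol listing, for illustration only.
sample = """
memcpy@GLIBC_2.14
open@GLIBC_2.2.5
exp10@GLIBC_2.39
"""
print(too_new_glibc_versions(sample))  # ['GLIBC_2.39']
```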
Daan De Meyer [Fri, 15 May 2026 12:16:01 +0000 (12:16 +0000)]
tree-wide: Replace exp10() with our own impl
exp10() has a symbol version > 2.34 on latest glibc. To allow
dropping our baseline required glibc runtime version to <= 2.34,
let's add our own version to prevent pulling in the newer symbol
from glibc.
- `io.systemd.Job.List` — list all queued jobs or look up by `id`/`unit`
name, with streaming support. Uses context/runtime split: `JobContext`
(Unit, JobType) and `JobRuntime` (Id, State, Result,
ActivationDetails). Follows the same SELinux and parameter-conflict
patterns as `io.systemd.Unit.List`.
- `io.systemd.Job.Cancel` — cancel a specific job by ID, with SELinux
and polkit authorization.
- `io.systemd.Job.ClearAll` — cancel all pending jobs, with SELinux and
polkit authorization.
Patrick Rohr [Mon, 4 May 2026 20:31:10 +0000 (13:31 -0700)]
networkd: fix race condition in per-interface ICMPv6 processing
There exists a small window of time in icmp6_bind() between creating the
ICMPv6 socket and binding it to an ifindex, where the link-scoped socket
can process an ICMPv6 packet received on any interface. This applies to
both sd-radv and sd-ndisc codepaths.
This change adds an explicit check for ifindex on the receive path and
ignores packets received on other interfaces.
Yu Watanabe [Wed, 20 May 2026 01:22:32 +0000 (10:22 +0900)]
sd-bus: add depth limit to message_skip_fields() to prevent stack overflow (#42164)
`message_skip_fields()` recursively processes D-Bus variant types in
message header fields with no depth limit. A crafted message with deeply
nested variants can cause unbounded recursion and overflow the stack.
Add a `depth` parameter checked against `BUS_CONTAINER_DEPTH` (128),
matching the limit already enforced by the public
`sd_bus_message_skip()` API. All recursive call sites pass `depth + 1`,
and the top-level caller in `message_parse_fields()` passes `0`.
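
The pattern is easy to show on a toy model. In the Python sketch below a value is either a scalar or a `("variant", inner)` tuple; this representation, the function names, and the exact comparison against the limit are assumptions for illustration. Only the limit of 128 is taken from the text above.

```python
BUS_CONTAINER_DEPTH = 128


class TooDeepError(Exception):
    """Raised when variant nesting exceeds BUS_CONTAINER_DEPTH."""


def skip_value(value, depth: int = 0) -> int:
    """Skip one value and return how many variant layers were traversed."""
    if depth > BUS_CONTAINER_DEPTH:
        raise TooDeepError(f"nesting deeper than {BUS_CONTAINER_DEPTH}")
    if isinstance(value, tuple) and len(value) == 2 and value[0] == "variant":
        # Every recursive call site passes depth + 1.
        return 1 + skip_value(value[1], depth + 1)
    return 0


def nest(levels: int, payload=0):
    """Build a payload wrapped in the given number of variant layers."""
    value = payload
    for _ in range(levels):
        value = ("variant", value)
    return value


print(skip_value(nest(10)))  # 10
```

A crafted input such as `nest(500)` now fails with a well-defined error after a bounded number of frames instead of recursing until the stack is exhausted.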
Michael Vogt [Tue, 19 May 2026 18:32:41 +0000 (20:32 +0200)]
core: improve errors from varlink io.systemd.Unit.StartTransient
The existing error reporting for the varlink `StartTransient` code
was converting all errors into `VARLINK_ERROR_UNIT_BAD_SETTING`.
This is not correct in some cases; we need a more targeted
pattern here, i.e. only convert EINVAL to VARLINK_ERROR_UNIT_BAD_SETTING
and otherwise return the matching varlink error from the errno instead.
This commit fixes this issue. Thanks to Ivan Kruglov for raising
this.
Yu Watanabe [Wed, 20 May 2026 01:08:11 +0000 (10:08 +0900)]
tree-wide: move static dl handles into their dlopen_*() functions (#42168)
Each dlopen_*() wrapper kept its dl handle as a file-scope
'static void *xxx_dl = NULL;' even though only the wrapper itself
ever referenced it. Move each one inside the corresponding function
so its scope matches its actual use, leaving the rest of each
translation unit free of the unused file-scope name.
In pcre2-util.c the assert(pcre2_dl) in pattern_matches_and_log()
becomes assert(sym_pcre2_match), which carries the same invariant
(pattern_compile_and_log() ran dlopen_pcre2()).
Luca Boccassi [Tue, 19 May 2026 21:42:25 +0000 (22:42 +0100)]
test: switch TEST-55-OOMD stress-ng --vm-method to lfsr32
Commit 881e4717c7 ("test: pin stress-ng --vm-method to a portable
scalar method in TEST-55-OOMD") pinned --vm-method=zero-one with the
rationale that it is "a long-standing scalar method". That rationale is
wrong: stress_vm_zero_one() in stress-ng's stress-vm.c is declared
i.e. it carries the exact same TARGET_CLONES attribute as 33 of the 35
other vm methods. On x86_64 with GCC >=5, TARGET_CLONES expands (see
core-target-clones.h in stress-ng) to a target_clones attribute
including "arch=skylake-avx512", "arch=cooperlake", "arch=tigerlake",
"arch=sapphirerapids", and several other AVX-512-bearing arch variants,
plus "default". GCC generates AVX-512 clones of stress_vm_zero_one() and
the IFUNC resolver picks them on any CPU that advertises AVX-512.
The only vm methods in stress-ng's registry whose function definitions
omit TARGET_CLONES entirely (and are therefore guaranteed not to
dispatch to an AVX-512 clone) are lfsr32 (portable, always registered)
and write64ds (x86_64-only, gated on HAVE_ASM_X86_MOVDIRI, i.e. Intel
Tremont / Tiger Lake+ MOVDIRI instruction).
Switch the four stress-ng --vm invocations in TEST-55-OOMD to
--vm-method=lfsr32 so the AVX-512 SIGILL on CPUs without AVX-512 (e.g.
AMD Zen 1-3) can no longer occur regardless of compiler version,
optimization level, or stress-ng package build.