Luca Boccassi [Tue, 19 May 2026 10:29:58 +0000 (11:29 +0100)]
test: pin stress-ng --vm-method to a portable scalar method in TEST-55-OOMD
The stress-ng "vm" stressor's default --vm-method=all cycles through every
VM stress method, including newer ones that use AVX-512 instructions. On
CPUs without AVX-512 support (e.g. AMD Zen 1 to 3) those methods crash with
SIGILL. In testcase_oom_rulesets_lasting_sec all 10 stress-ng workers die
within ~2.34 seconds, so by the time the 6 second sleep elapses the unit
is already in failed/exit-code state and the assert_eq for
ActiveState=active trips.
Pin --vm-method=zero-one, a long-standing scalar method, on all four
stress-ng --vm invocations in this test (the two transient services in
testcase_oom_rulesets and testcase_oom_rulesets_lasting_sec, plus
TEST-55-OOMD-testbloat.service and TEST-55-OOMD-testmunch.service) so the
workers do not crash on AVX-512-less CPUs. testbloat, testmunch and
testcase_oom_rulesets have not been observed failing because they get
OOM-killed by systemd-oomd within ~1 to 2 seconds, before stress-ng cycles
into an AVX-512 method, but they share the same latent flake.
Journal excerpts from the failing run, TEST-55-OOMD-slowrule.service in
testcase_oom_rulesets_lasting_sec (journalctl -o short-monotonic):
[ 58.018676] stress-ng[1015]: invoked with '/usr/bin/stress-ng --timeout 15s --vm 10 --vm-bytes 50M --vm-keep' by user 0 'root'
[ 59.866072] stress-ng[1030]: stress-ng: debug: [1030] caught SIGILL, address 0x000055bd8d609140 (ILL_ILLOPN)
[ 59.921050] stress-ng[1030]: stress-ng: debug: [1030] stress-ng: info: 0x000055bd8d609140:<62>71 fd 48 6f 2d 36 14 1c 00 c5 d1 ef ed 49 29
[ 59.929310] stress-ng[1015]: stress-ng: error: [1015] vm: [1021] terminated with an error, exit status=2 (stressor failed)
[ 60.364111] stress-ng[1015]: stress-ng: info: [1015] failed: 10: vm (10)
[ 60.364493] stress-ng[1015]: stress-ng: info: [1015] unsuccessful run completed in 2.34 secs
[ 60.371290] systemd[1]: TEST-55-OOMD-slowrule.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
[ 60.371396] systemd[1]: TEST-55-OOMD-slowrule.service: Failed with result 'exit-code'.
[ 64.017061] TEST-55-OOMD.sh[1010]: + assert_eq failed active
[ 64.018167] TEST-55-OOMD.sh[1039]: FAIL: expected: 'active' actual: 'failed'
The faulting bytes marked by stress-ng with <62> (the byte at the
instruction pointer) decode unambiguously to an AVX-512 VMOVDQA64 using
the 512-bit zmm13 register, confirmed independently by two disassemblers.
The leading 0x62 is the EVEX prefix (exclusive to AVX-512 on this target),
zmm13 is a 512-bit register that only exists when AVX-512 is implemented,
and VMOVDQA64 requires the AVX512F (Foundation) CPUID feature (Intel SDM
Vol 2C). Executing this on a CPU without AVX-512 raises #UD, delivered by
the kernel as SIGILL/ILL_ILLOPN, matching the journal entry above. The
same journal shows the kernel reporting "kvm_amd: TSC scaling supported",
i.e. the guest is on AMD KVM, and AMD did not ship AVX-512 before Zen 4.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
fstab-generator: fix spurious quota warning for xfs
Filesystems like xfs, btrfs, gfs2 and ocfs2 handle quotas internally
and do not need external quotacheck/quotaon services. When usrquota or
grpquota mount options are used in fstab for these filesystems,
generator_hook_up_quotacheck() falls through to the !fstype_needs_quota()
branch and emits a misleading warning that quotas are "not supported"
when they actually work fine — the kernel handles them internally.
Add fstype_has_internal_quota() to return early with a debug message,
and adopt a tri-state return convention so the caller skips quotaon
when quotacheck was not needed.
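
The tri-state convention described above could look roughly like the following sketch. It is not the actual patch: only the name fstype_has_internal_quota() and the four filesystems come from this message, while fstype_needs_quota_sketch(), hook_up_quotacheck_sketch(), the ext2/3/4 list and the concrete return values are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Filesystems named in the commit message as handling quotas internally. */
static bool fstype_has_internal_quota(const char *fstype) {
        static const char *const table[] = { "xfs", "btrfs", "gfs2", "ocfs2" };

        assert(fstype);

        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
                if (strcmp(fstype, table[i]) == 0)
                        return true;

        return false;
}

/* Hypothetical stand-in for fstype_needs_quota(); the list is illustrative only. */
static bool fstype_needs_quota_sketch(const char *fstype) {
        return strcmp(fstype, "ext2") == 0 ||
               strcmp(fstype, "ext3") == 0 ||
               strcmp(fstype, "ext4") == 0;
}

/* Tri-state result: < 0 on error, 0 if nothing was hooked up (the caller
 * then skips quotaon), > 0 if quotacheck was hooked up. */
static int hook_up_quotacheck_sketch(const char *fstype) {
        assert(fstype);

        if (fstype[0] == '\0')
                return -EINVAL;

        if (fstype_has_internal_quota(fstype))
                return 0; /* the kernel handles quotas; a debug message would go here */

        if (!fstype_needs_quota_sketch(fstype))
                return 0; /* genuinely unsupported; the warning would go here */

        return 1;
}
```

With this shape the caller only needs a single "r > 0" test to decide whether to go on and hook up quotaon.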
The buggy code path was introduced in #24824 and #24880.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Daan De Meyer [Tue, 19 May 2026 06:54:27 +0000 (08:54 +0200)]
Various dlopen/linking cleanups from #42100 (#42166)
- **bpf-util: rename from bpf-dlopen, unify version-specific symbol
handling**
- **cryptsetup: dlopen libcryptsetup in tokens**
- **tree-wide: dlopen libpam in pam plugins**
- **test-bus-marshal: dlopen() glib and libdbus instead of linking
directly**
- **lock-util: Simplify timeout for lock_generic_with_timeout()**
- **color-util: simplify hsv_to_rgb, fix rgb_to_hsv negative-hue wrap**
- **tree-wide: Use our own macros instead of fabs()/fmax()/fmin()**
- **locale-util: dlopen() libintl instead of linking against it**
- **home: Use log2u64() over log2()**
- **meson: drop libdl, threads, and librt dependencies**
- **libc: Use dlsym() from a constructor instead of weak symbols**
- **libc: Make sure C23 versions of strtol(), sscanf() are not used**
`NFTSet` is supposed to be in `Service` instead of `Unit` section. The current example leads to `Unknown key 'NFTSet' in section [Unit], ignoring` in systemd logs.
dissect: guard against ssize_t overflow in LUKS2 header parser (#42162)
The `json_len` variable in `partition_is_luks2_integrity()` is
`ssize_t`, but the subtraction `be64toh(header.hdr_len) -
LUKS2_FIXED_HDR_SIZE` can yield a value exceeding `SSIZE_MAX` when
`hdr_len` is a large crafted value. This causes signed integer overflow
and a subsequent oversized `malloc()` that fails with `-ENOMEM`,
producing a misleading out-of-memory error instead of a clear
invalid-header rejection.
Two call sites pass `size = UINT64_MAX`, which neutralizes the existing
`hdr_len > size` guard.
Add an explicit check against `SSIZE_MAX` before the cast to `ssize_t`.
TristanInSec [Mon, 18 May 2026 17:30:51 +0000 (13:30 -0400)]
resolved: add missing polkit checks on FlushCaches and ResetServerFeatures D-Bus methods
The FlushCaches and ResetServerFeatures D-Bus methods perform
destructive operations (flushing all DNS caches and resetting server
feature negotiation including DNS-over-TLS state) without any
authorization check. The corresponding Varlink methods already enforce
polkit via verify_polkit(), but the D-Bus handlers were not updated.
Add bus_verify_polkit_async() calls to both methods, matching the
pattern used by ResetStatistics. Add the corresponding policy actions
to the polkit policy file.
Daan De Meyer [Thu, 14 May 2026 19:20:02 +0000 (19:20 +0000)]
libc: Make sure C23 versions of strtol(), sscanf() are not used
When _GNU_SOURCE is defined, glibc will always use the C23 versions
of strtol(), sscanf() and friends if available (introduced after
glibc 2.34). This means that any binaries built with headers
from newer glibc won't load on glibc < 2.38. To work around this,
redefine the appropriate constants to zero to make sure the C99
versions are used instead.
Daan De Meyer [Thu, 14 May 2026 19:20:02 +0000 (19:20 +0000)]
libc: Use dlsym() from a constructor instead of weak symbols
Weak symbols still introduce a version requirement on a newer libc.
Resolve each libc symbol via dlsym(RTLD_DEFAULT) from a per-shim
constructor and cache the result in a file-scope static instead. This
avoids the version requirement, keeps the call path free of atomics
(constructors run single-threaded before main() and before any signal
handler can fire), and keeps dlsym() out of contexts where it is not
async-signal-safe.
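
The pattern can be sketched as follows. This is an illustration only: strnlen() is used as a stand-in symbol, and the shim name and the local fallback are assumptions, not the in-tree shims.

```c
#define _GNU_SOURCE /* for RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*strnlen_fn)(const char *s, size_t maxlen);

/* File-scope cache, written exactly once by the constructor below. */
static strnlen_fn cached_strnlen = NULL;

/* Runs before main(), so no locking or atomics are needed for the cache. */
__attribute__((constructor))
static void resolve_strnlen(void) {
        /* POSIX expects dlsym() results to be convertible to function pointers. */
        cached_strnlen = (strnlen_fn) dlsym(RTLD_DEFAULT, "strnlen");
}

/* The shim only reads the cached pointer; dlsym() is never called here. */
static size_t shim_strnlen(const char *s, size_t maxlen) {
        const char *end;

        assert(s);

        if (cached_strnlen)
                return cached_strnlen(s, maxlen);

        /* Local fallback for the case where libc does not provide the symbol. */
        end = memchr(s, 0, maxlen);
        return end ? (size_t) (end - s) : maxlen;
}
```

Because the symbol is looked up by name at runtime, the binary carries no versioned reference to it, which is the property weak symbols do not provide.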
Daan De Meyer [Fri, 15 May 2026 09:54:53 +0000 (09:54 +0000)]
meson: drop libdl, threads, and librt dependencies
Our baseline glibc is 2.34, which merged libdl, libpthread (the
dependency('threads') target), and librt into libc. Empty .so/.a stubs
remain for backward compatibility with old binaries, but new builds
resolve dl_*, pthread_*, mq_*, timer_*, etc. directly from libc.
On musl the same libraries are likewise empty stubs.
Drop the libdl, threads, and librt entries from every meson.build, and
remove the now-stale 'Libs.private: -lrt -pthread' from libudev.pc.in
since both flags resolve to empty link-time stubs on glibc 2.34+ and
musl.
Verified with readelf -d that libsystemd.so, libudev.so, and systemd no
longer carry DT_NEEDED entries for libdl/libpthread/librt.
Daan De Meyer [Fri, 15 May 2026 18:33:43 +0000 (18:33 +0000)]
locale-util: dlopen() libintl instead of linking against it
dgettext() lives in libc on glibc and in libintl.so.8 on musl with
gettext. Resolve it via dlsym() so neither configuration produces a
hard link-time dependency on libintl: try libintl.so.8 first and fall
back to RTLD_DEFAULT (which finds dgettext in libc on glibc).
The _() macro now expands to a runtime check that returns the
untranslated string if dlopen_libintl() has not run successfully, so
callers don't have to gate every translatable message on a runtime
check. pam_systemd_home — currently the only consumer of _() — calls
dlopen_libintl() best-effort from each PAM entry point.
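
A rough sketch of the lookup order and of the runtime check behind _(), under stated assumptions: the names dlopen_libintl_sketch() and translate_sketch(), the "systemd" text domain and the -EOPNOTSUPP return value are illustrative, and the in-tree helpers are structured differently.

```c
#define _GNU_SOURCE /* for RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>

typedef char *(*dgettext_fn)(const char *domain, const char *msgid);

static dgettext_fn sym_dgettext = NULL;

/* Tries libintl.so.8 first (musl with gettext), then the already loaded
 * objects (glibc keeps dgettext() in libc itself). */
static int dlopen_libintl_sketch(void) {
        void *dl;

        if (sym_dgettext)
                return 0;

        dl = dlopen("libintl.so.8", RTLD_NOW);
        if (dl)
                sym_dgettext = (dgettext_fn) dlsym(dl, "dgettext");
        if (!sym_dgettext)
                sym_dgettext = (dgettext_fn) dlsym(RTLD_DEFAULT, "dgettext");
        if (!sym_dgettext) {
                if (dl)
                        dlclose(dl);
                return -EOPNOTSUPP;
        }

        return 0; /* on success the handle stays open for the process lifetime */
}

/* Returns the untranslated string until the lookup above has succeeded. */
static const char *translate_sketch(const char *msgid) {
        assert(msgid);
        return sym_dgettext ? sym_dgettext("systemd", msgid) : msgid;
}

#define _(s) translate_sketch(s)
```

A caller would invoke the loader once, ignore its result, and then use _() unconditionally.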
The meson find_library('intl') dance is replaced with a has_header()
check; the only thing we need at build time is the prototype.
Daan De Meyer [Fri, 15 May 2026 11:06:21 +0000 (11:06 +0000)]
tree-wide: Use our own macros instead of fabs()/fmax()/fmin()
To make this work, ABS() is made generic so it also works on
floats and doubles.
While at it, fold the __ABS_INTEGER indirection and the
assert_cc(sizeof(long long) == sizeof(intmax_t)) away. The previous
form switched between __builtin_llabs (clang) and __builtin_imaxabs
(gcc), with the assert keeping the two paths behaviorally identical
on every platform we build for. imaxabs was originally chosen because
intmax_t is conceptually the widest signed integer type the platform
exposes, but the _Generic ABS already casts to (long long) before the
call, so the extra width imaxabs could in theory carry was being
narrowed away immediately anyway. With both paths collapsed to
__builtin_llabs((long long) (a)), the size relationship between
long long and intmax_t is no longer relevant.
Also add explicit unsigned long long / unsigned long / unsigned int
cases that pass the argument through unchanged. The previous default
branch cast unsigned values to (long long); for values above LLONG_MAX
this reinterprets them as negative, and __builtin_llabs(LLONG_MIN) is
UB. Unsigned values are already non-negative, so passing them through
is both correct and avoids the narrowing. Smaller unsigned types
(unsigned char, unsigned short) still go through the default branch
but promote to int first and fit in long long losslessly.
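
For illustration, a generic ABS() along these lines might be written as below. This is a sketch assuming GCC or Clang builtins, not the in-tree macro; the name ABS_SKETCH and the exact set of cases are assumptions.

```c
#include <assert.h>
#include <limits.h>

/* Only the selected association is evaluated, so (a) is evaluated once. */
#define ABS_SKETCH(a)                                                    \
        _Generic((a),                                                    \
                 float:              __builtin_fabsf((float) (a)),       \
                 double:             __builtin_fabs((double) (a)),       \
                 unsigned int:       (a),                                \
                 unsigned long:      (a),                                \
                 unsigned long long: (a),                                \
                 default:            __builtin_llabs((long long) (a)))

/* Small self-check covering the floating-point, signed and unsigned cases. */
static void abs_sketch_demo(void) {
        assert(ABS_SKETCH(-3) == 3);                  /* default branch */
        assert(ABS_SKETCH(-2.5) == 2.5);              /* double branch */
        assert(ABS_SKETCH(ULLONG_MAX) == ULLONG_MAX); /* passed through unchanged */
}
```

Without the explicit unsigned cases, ULLONG_MAX would be cast to (long long) and reinterpreted as a negative value before the builtin is applied.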
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
In hsv_to_rgb, restructure the conversion around the sector index
k = (int)(h/60) and fractional offset f = h/60 - k. The auxiliary
x value becomes c * (k & 1 ? 1.0 - f : f) and the six branches turn
into a switch on k. This drops the two xfmod() calls that were doing
the modulo work, in exchange for a single assert(h >= 0 && h < 360) —
all in-tree callers satisfy this and never relied on the wrap.
In rgb_to_hsv, the two xfmod() calls were no-ops (their arguments
were always within the divisor's magnitude). The trailing
xfmod(*ret_h, 360) appeared to be wrapping negative hues from the
r-max branch back into [0, 360), but fmod is sign-preserving so it
never did. Drop the no-ops and add an explicit +360 wrap so magenta
(1, 0, 1) now yields h ≈ 300 instead of -60.
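
The two conversions can be sketched as follows. The code assumes s, v and the RGB channels are doubles in [0, 1] and returns only the hue from the reverse direction; the in-tree helpers use different signatures and units, so the function names and types here are illustrative.

```c
#include <assert.h>

static void hsv_to_rgb_sketch(double h, double s, double v,
                              double *r, double *g, double *b) {
        double c, f, x, m, r1 = 0, g1 = 0, b1 = 0;
        int k;

        assert(h >= 0 && h < 360); /* replaces the modulo wrap */
        assert(r && g && b);

        c = v * s;                       /* chroma */
        k = (int) (h / 60);              /* sector index, 0..5 */
        f = h / 60 - k;                  /* fractional offset within the sector */
        x = c * (k & 1 ? 1.0 - f : f);   /* falling ramp in odd sectors, rising in even */
        m = v - c;

        switch (k) {
        case 0:  r1 = c; g1 = x; break;
        case 1:  r1 = x; g1 = c; break;
        case 2:  g1 = c; b1 = x; break;
        case 3:  g1 = x; b1 = c; break;
        case 4:  r1 = x; b1 = c; break;
        default: r1 = c; b1 = x; break;  /* k == 5 */
        }

        *r = r1 + m;
        *g = g1 + m;
        *b = b1 + m;
}

static double rgb_to_hue_sketch(double r, double g, double b) {
        double max = r > g ? (r > b ? r : b) : (g > b ? g : b);
        double min = r < g ? (r < b ? r : b) : (g < b ? g : b);
        double d = max - min, h;

        if (d <= 0)
                return 0; /* grey: hue is undefined, report 0 */

        if (max == r)
                h = 60 * ((g - b) / d);      /* may be negative, down to -60 */
        else if (max == g)
                h = 60 * ((b - r) / d + 2);
        else
                h = 60 * ((r - g) / d + 4);

        if (h < 0)
                h += 360; /* explicit wrap; fmod() would keep the sign */

        return h;
}
```

Only the r-max branch can produce a negative hue, which is why a single conditional addition is enough.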
Extend the tests to cover all six primary/secondary colors at sector
boundaries, all six sector midpoints (to catch any future inversion
of the ramp direction), the h-near-360 edge of the last sector, and
the rgb_to_hsv negative-wrap path via magenta. Switch the new and
existing integer-channel checks to ASSERT_EQ from tests.h; the
double-typed h/s/v range checks stay on ASSERT_TRUE since the
ASSERT_* comparison macros only support integer types.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Daan De Meyer [Fri, 15 May 2026 18:24:11 +0000 (18:24 +0000)]
test-bus-marshal: dlopen() glib and libdbus instead of linking directly
The test only uses 9 symbols (5 from glib, 4 from libdbus) for its
interop checks; dlopen them at runtime so the binary no longer carries a
hard link-time dependency on either library. Headers are still pulled in
through the *_cflags partial dependencies for the type declarations.
While we're at it, drop the compat glue for glib 2.36 which is long obsolete
at this point.
Daan De Meyer [Fri, 15 May 2026 14:54:04 +0000 (14:54 +0000)]
tree-wide: dlopen libpam in pam plugins
Same reasoning as for cryptsetup tokens. It means we can include
the pam plugins in the main systemd package without the package
manager introducing a dependency on libpam. It also makes things
more consistent and makes writing the upcoming linking test script
a lot simpler.
At the same time, we get rid of the libpam_misc dependency as the
one symbol we were using from it is trivial to reimplement ourselves.
Daan De Meyer [Fri, 15 May 2026 12:16:01 +0000 (12:16 +0000)]
cryptsetup: dlopen libcryptsetup in tokens
This avoids having to subpackage the tokens separately. If they link directly
against libcryptsetup, the package manager will automatically add a dependency on
libcryptsetup to the package containing the tokens. With this change, the tokens
can ship in the main systemd package without necessarily pulling in libcryptsetup.
It also makes things more consistent. Once we also do the same for pam, any direct
linking will be limited to just libc, which for example simplifies writing tests for
ensuring we don't link unnecessarily as we don't have to add exceptions for the
cryptsetup tokens.
This actually drops the dependency on cryptsetup-libs for the fedora/centos/opensuse
systemd-udev package so install it explicitly in the initrd now to keep the tests
working.
Daan De Meyer [Thu, 14 May 2026 17:13:06 +0000 (17:13 +0000)]
bpf-util: rename from bpf-dlopen, unify version-specific symbol handling
Renames src/shared/bpf-dlopen.{c,h} to src/shared/bpf-util.{c,h} and
folds the former src/shared/bpf-compat.h (struct forward decl and
compat_bpf_map_create() helper) into the new header.
Aligns dlopen_bpf() with the standard wrapper pattern: drops the
manual dlopen_safe()/dlsym_many_or_warn()/TAKE_PTR(dl) plumbing and
the bespoke 'cached' int in favor of dlopen_many_sym_or_warn() inside
a FOREACH_STRING() soname-fallback loop.
Unifies declaration of the version-specific symbols (bpf_create_map,
bpf_map_create, bpf_object__next_map, bpf_token_create) into a single
DISABLE_WARNING_REDUNDANT_DECLS block in the header, and alphabetically
merges the DLSYM_PROTOTYPE list. DLSYM_OPTIONAL is used to load each
one — call sites already handle NULL (compat_bpf_map_create() and the
sym_bpf_object__next_map guard in userns-restrict.c). bpf_token_create
additionally defaults to a missing_bpf_token_create() stub returning
-ENOSYS, so callers can branch on the errno instead of NULL-checking
the pointer.
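
The stub-default idea can be sketched like this. It is an illustration rather than the in-tree code: install_bpf_token_create() and token_supported_sketch() are hypothetical helpers, and the struct is only forward-declared so the sketch builds without libbpf headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct bpf_token_create_opts; /* the real definition comes from libbpf */

typedef int (*bpf_token_create_fn)(int bpffs_fd, struct bpf_token_create_opts *opts);

/* Default used when the loaded libbpf does not export bpf_token_create(). */
static int missing_bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts) {
        (void) bpffs_fd;
        (void) opts;
        return -ENOSYS;
}

/* Never NULL: starts out pointing at the stub. */
static bpf_token_create_fn sym_bpf_token_create = missing_bpf_token_create;

/* Hypothetical loader hook: keeps the stub if the optional lookup found nothing. */
static void install_bpf_token_create(void *resolved) {
        if (resolved)
                sym_bpf_token_create = (bpf_token_create_fn) resolved;
}

/* Callers branch on the return value instead of NULL-checking the pointer. */
static int token_supported_sketch(int bpffs_fd) {
        int r;

        assert(sym_bpf_token_create);

        r = sym_bpf_token_create(bpffs_fd, NULL);
        if (r == -ENOSYS)
                return 0; /* treated as "skip" */

        return r < 0 ? r : 1;
}
```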
Updates test-bpf-token to match: drops the compile-time
LIBBPF_MAJOR_VERSION ≥ 1.5 gate and the direct <bpf/bpf.h> include in
favor of dlopen_bpf() + sym_bpf_token_create(), and treats -ENOSYS as
the test-skip path (covering both 'libbpf too old' and 'kernel lacks
BPF_TOKEN_CREATE support').
Luca Boccassi [Mon, 18 May 2026 20:47:41 +0000 (21:47 +0100)]
import: Handle small files (#42150)
When systemd-pull encountered a file shorter than the compression magic
headers it looks for, it would complete the download while still in the
analysis state and fail.
When we are still in the analysis state and the download is done, we
know there is no compression and we should leave the analysis state and
continue writing out to disk as usual.
TristanInSec [Mon, 18 May 2026 18:39:44 +0000 (14:39 -0400)]
dissect: use practical 16 MiB limit instead of SSIZE_MAX
As suggested by @yuwata, SSIZE_MAX is still too large and would cause
malloc() to fail anyway. Use a 16 MiB limit which is generous compared
to the typical 4 MiB maximum in cryptsetup (LUKS2_HDR_OFFSET_MAX).
TristanInSec [Mon, 18 May 2026 17:30:02 +0000 (13:30 -0400)]
dissect: guard against ssize_t overflow in LUKS2 header parser
The json_len variable is ssize_t, but the subtraction
be64toh(header.hdr_len) - LUKS2_FIXED_HDR_SIZE can yield a value
exceeding SSIZE_MAX when hdr_len is a large crafted value. This causes
signed integer overflow and a subsequent oversized malloc() that fails
with -ENOMEM, producing a misleading out-of-memory error.
Add an explicit check against SSIZE_MAX before the cast to ssize_t.
Ivan Kruglov [Mon, 18 May 2026 10:57:43 +0000 (03:57 -0700)]
errno-util: include -ENOENT in ERRNO_IS_XATTR_ABSENT()
The getxattr(2) man page only enumerates xattr-specific errors (ENODATA,
ENOTSUP, ERANGE, E2BIG, ...) in its own ERRORS section, but at the
bottom of that section notes that "the errors documented in stat(2) can
also be returned." stat(2) returns -ENOENT when a component of the path
does not exist, so any xattr lookup against a path can fail with -ENOENT
exactly the same way as -ENODATA — both mean "there is nothing here for
me to read." The previous definition of ERRNO_IS_NEG_XATTR_ABSENT()
reflected only the directly-enumerated errors and missed -ENOENT, so
callers that should semantically swallow "the xattr is absent" instead
bubbled -ENOENT up as a hard error.
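
As a simplified sketch of the widened check (the real macro is built from other errno-util helpers, so the function names and the exact set of "not supported" codes below are assumptions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* True if a negative errno means "there is no such xattr to read". */
static bool errno_is_neg_xattr_absent_sketch(int r) {
        return r == -ENODATA ||     /* the xattr is not set */
               r == -ENOTSUP ||     /* the filesystem has no xattr support */
               r == -EOPNOTSUPP ||
               r == -ENOENT;        /* new: the path itself is gone */
}

/* Typical caller shape: absence is folded into "not present", while any
 * other negative value is still propagated as a hard error. */
static int xattr_result_sketch(int r, bool *ret_present) {
        assert(ret_present);

        if (errno_is_neg_xattr_absent_sketch(r)) {
                *ret_present = false;
                return 0;
        }
        if (r < 0)
                return r;

        *ret_present = true;
        return 0;
}
```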
The most visible consequence on real fleets has been systemd-journald
spamming dmesg with one line per dispatched log message whenever a
unit's cgroup directory cannot be found at the time
client_context_read_log_filter_patterns() is called — typically inside
containers whose journald observes clients whose unit cgroup is no
longer present in its view (cgroup-namespace boundary, unit teardown
race, transient sub-scope already collapsed back to its unit cgroup,
etc.). The same bug pattern lurks at every other cgroup-xattr callsite:
systemd-oomd reading user.oomd_avoid / user.oomd_omit / user.oomd_ooms
on cgroups it is concurrently killing; killall reading
user.survive_final_kill_signal during shutdown; cg_is_delegated() /
cg_has_coredump_receive() / cgroup_get_managed_oom_kill_last(); etc. For
these, "path is gone" is by construction the same answer as "xattr is
not set" — there is no way for the user to have attached an xattr to a
path that does not exist.
A quick survey of non-cgroup callers (src/portable/portable.c,
src/home/{homework-luks,user-record-util}.c,
src/random-seed/random-seed-tool.c, src/basic/os-util.c) confirms they
all operate on fds or on paths whose absence is already the desired
silent-skip outcome, so widening the macro to also fold in -ENOENT does
not change observable behavior at any other site.
Extend test-xattr-util's getxattr_at_malloc test with an explicit
non-existent-path lookup that asserts ERRNO_IS_NEG_XATTR_ABSENT() now
matches, alongside the pre-existing non-existent-xattr (-ENODATA) check.
Luca Boccassi [Mon, 18 May 2026 11:05:10 +0000 (12:05 +0100)]
dhcp-client: reject messages larger than the maximum UDP payload
dhcp_message_verify_header() only enforced a lower bound on the input
length, so dhcp_message_parse() happily accepted arbitrarily large
buffers. Such inputs could never have been received via UDP and would
later fail in dhcp_message_build() with -E2BIG once the parsed options'
combined size exceeds UDP_PAYLOAD_MAX_SIZE, which the fuzzer surfaced as
an assertion failure.
Reject inputs above UDP_PAYLOAD_MAX_SIZE up front, so the parse stage
mirrors what the wire format can actually carry.
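
A minimal sketch of a two-sided length check. The constants are assumptions for illustration (a 240-byte minimum for the fixed BOOTP header plus magic cookie, and 65535 - 20 - 8 for an IPv4 UDP payload), as are the function name and the -EBADMSG return value; the in-tree code uses its own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DHCP_MIN_LEN_SKETCH    240u                /* 236-byte header + 4-byte magic cookie */
#define UDP_PAYLOAD_MAX_SKETCH (65535u - 20u - 8u) /* max IPv4 datagram minus IP and UDP headers */

static int dhcp_len_verify_sketch(const void *buf, size_t len) {
        assert(buf || len == 0);

        if (len < DHCP_MIN_LEN_SKETCH)
                return -EBADMSG; /* too short to hold the fixed header */

        if (len > UDP_PAYLOAD_MAX_SKETCH)
                return -EBADMSG; /* could never have arrived in one UDP datagram */

        return 0;
}
```

Rejecting oversized input at this stage means the build step never sees a message it cannot serialize.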
ishwarbb [Mon, 23 Mar 2026 13:02:40 +0000 (13:02 +0000)]
resolved: add configurable DNS cache size
Add CacheSize= option to [Resolve] section of resolved.conf to allow
configuring the maximum number of entries in the per-scope DNS cache.
The default remains 4096 entries. Setting this to 0 disables caching
(similar to Cache=no).
CacheSize= is only read when Cache=yes or Cache=no-negative. When
Cache=no, caching is fully disabled regardless of CacheSize=.
Changes:
- Add cache_size field to Manager struct
- Parse CacheSize= from resolved.conf via gperf
- Thread cache_size through dns_cache_put() and helper functions
- Replace hard-coded CACHE_MAX with the configurable cache_size
- When cache_size is 0 or Cache=no, flush cache and skip caching
- Add man page documentation for the new option
- Add unit tests for cache size enforcement
coredump: use a fixed string instead of a scope-delimited compound literal
The compound literal (const char[]){'.','.','.'} has block scope
(C99 6.5.2.5p6). Once we leave the if and loop back, copy[1].iov_base
formally points into a destroyed object. Works on GCC/Clang in practice,
but is UB.
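
The lifetime issue and the fix can be illustrated with the sketch below. It is not the coredump code: build_iovec_sketch() and its truncation logic are hypothetical and only show an iov_base that must stay valid after the enclosing block ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/uio.h>

/* Static storage duration: valid for the whole program run. */
static const char ELLIPSIS[] = "...";

/* Fills up to two iovec entries: the (possibly truncated) field and,
 * if it was truncated, a trailing ellipsis. Returns the entry count. */
static size_t build_iovec_sketch(struct iovec copy[2], const char *field, size_t max_len) {
        size_t n = 0, len;
        bool truncated;

        assert(copy);
        assert(field);

        len = strlen(field);
        truncated = len > max_len;

        copy[n++] = (struct iovec) {
                .iov_base = (void *) field,
                .iov_len = truncated ? max_len : len,
        };

        if (truncated) {
                /* A compound literal such as (const char[]) { '.', '.', '.' }
                 * written here would end its lifetime at the closing brace,
                 * leaving iov_base dangling for the caller. */
                copy[n++] = (struct iovec) {
                        .iov_base = (void *) ELLIPSIS,
                        .iov_len = sizeof(ELLIPSIS) - 1,
                };
        }

        return n;
}
```

The struct iovec compound literals above are safe because they are copied by value into the array; only a pointer into a compound literal outlives its block.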
Kai Lüke [Mon, 18 May 2026 07:46:28 +0000 (16:46 +0900)]
import: Handle small files
When systemd-pull encountered a file shorter than the compression magic
headers it looks for, it would complete the download while still in the
analysis state and fail.
When we are still in the analysis state and the download is done, we
know there is no compression and we should leave the analysis state and
continue writing out to disk as usual.
core/dbus-execute: propagate oom in property_get_cpu_affinity
The function already returns errors, so I'm not sure why we ignored
the error in the second call, potentially leaving variables uninitialized.
It seems easiest to propagate the error.
Kai Lüke [Fri, 15 May 2026 14:49:44 +0000 (23:49 +0900)]
import: Move pull_job_curl_on_finished after pull_job_open_disk
To call into pull_job_write_(un)compressed and pull_job_open_disk from
pull_job_curl_on_finished it has to be defined after them. This is in
preparation for a bug fix for small files where we need to leave the
compression analysis state to finish the download successfully.
Yu Watanabe [Sun, 17 May 2026 23:40:44 +0000 (08:40 +0900)]
network/address: drop duplicated address earlier
network_adjust_dhcp_server() searches network->addresses_by_section,
hence without this change, an address entry picked by
network_adjust_dhcp_server() may be detached and freed by the cleanup
function.
Luca Boccassi [Sun, 17 May 2026 17:22:20 +0000 (18:22 +0100)]
dhcp-client: clear overloaded sname/file fields after parsing
When SD_DHCP_OPTION_OVERLOAD indicates that the sname and/or file header
fields are overloaded with extra DHCP options, dhcp_message_parse() merges
those options into message->options but leaves the raw bytes untouched in
the header. As a result, dhcp_message_build() emits the header (including
the overloaded bytes) verbatim, and the next parse re-parses those bytes,
appending duplicate entries to the options map (each tag's iov list grows).
Subsequent builds then differ from the first, breaking the parse/build
roundtrip.
This was caught by fuzz-dhcp-client, which asserts that two consecutive
build calls produce identical output.
Zero out the overloaded fields after parsing them, since their content has
already been merged into the options map. This makes the roundtrip
idempotent and avoids re-emitting stale overloaded data in the rebuilt
header. The JSON build/parse path was already correct (it omits sname/file
from the JSON when the overload bit is set), so only the binary path needed
fixing.
Yu Watanabe [Wed, 11 Mar 2026 22:00:06 +0000 (07:00 +0900)]
tree-wide: use device_get_sysattr_safe_string()
The obtained strings are passed to another function, e.g. handled as a
path and opened, printed to the terminal, written to a file, saved to
udev database as udev property, exposed through DBus, passed to logger,
and so on. Hence, these should not contain any malicious characters.
Yu Watanabe [Wed, 11 Mar 2026 20:44:51 +0000 (05:44 +0900)]
sd-device: use device_get_sysattr_safe_string()
The read values are exposed by sd_device_get_subsystem() and friends.
Hence, it is better to filter invalid characters.
Of course, these should always be safe unless the kernel is buggy,
but filter them anyway, just for safety.
Note that even if the uevent file contains invalid characters,
device_read_uevent_file() should succeed without parsing the contents.
The caller should fail later with a proper error code if a necessary
field is unset. E.g. sd_device_get_ifindex() should still return -ENOENT
even when the uevent file contains invalid characters.