Extend fake-report-server.py with optional --cert, --key, --port
arguments for TLS support. Add a test case that generates a
self-signed certificate and tests HTTPS upload of metrics and facts.
Also exercise the --header param.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a fake HTTP server (fake-report-server.py) that accepts JSON POST
requests and validates the report structure, and test cases in
TEST-74-AUX-UTILS.report.sh that exercise plain HTTP upload of both
metrics and facts.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
All non-test users of iovec_wrapper define the struct as a field in a
bigger structure, so we never free it individually. Let's simplify the
code and assume it is never NULL.
journal-upload: require TLS 1.2 as the minimum version
RFC 8996 says:
> This document formally deprecates Transport Layer Security (TLS)
> versions 1.0 (RFC 2246) and 1.1 (RFC 4346). Accordingly, those
> documents have been moved to Historic status. These versions lack
> support for current and recommended cryptographic algorithms and
> mechanisms, and various government and industry profiles of
> applications using TLS now mandate avoiding these old TLS versions.
> TLS version 1.2 became the recommended version for IETF protocols in
> 2008 (subsequently being obsoleted by TLS version 1.3 in 2018),
> providing sufficient time to transition away from older versions.
> Removing support for older versions from implementations reduces the
> attack surface, reduces opportunity for misconfiguration, and
> streamlines library and product maintenance.
This code probably only talks to our own receiver, which uses
libmicrohttpd. That in turn delegates to GnuTLS, which supports
TLS 1.2 and 1.3, so requiring TLS 1.2 as the minimum should not cause
problems there.
Previously we compiled curl-util.c at least twice, and additionally
shared it via extract+object. Let's build a static "convenience lib"
for it instead.
(Using extract+object everywhere is not possible because the different
places where it is used are conditionalized independently, so we don't
have a single "source" that is always available.)
shared: move src/import/curl-util.h to src/shared/
Move more common definitions into the header file instead of repeating
them in a bunch of places. src/import/curl-util.[ch] is moved so that
it's shared more naturally with other components.
We cannot use a function, because the type is unknown and we want
to stringify the option name, but we can use a block macro to make
this a bit nicer, with normal code structure in the caller.
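A minimal sketch of such a block macro. handle_setopt(), the option IDs, struct handle and SETOPT_OR_RETURN() are all hypothetical, standing in for a variadic setter like curl_easy_setopt(): the value passes through untouched whatever its type, and #opt turns the option into a string for the error message.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical options and handle; each option takes a different value type. */
enum { OPT_TIMEOUT = 1, OPT_URL = 2 };
struct handle { long timeout; const char *url; };

static int handle_setopt(struct handle *h, int opt, ...) {
        va_list ap;
        int r = 0;

        va_start(ap, opt);
        switch (opt) {
        case OPT_TIMEOUT:
                h->timeout = va_arg(ap, long);
                break;
        case OPT_URL:
                h->url = va_arg(ap, char *);
                break;
        default:
                r = -EINVAL;
        }
        va_end(ap);
        return r;
}

/* A function cannot stringify 'opt' and would need a fixed type for 'value';
 * a do { } while (0) block macro does both and still reads like a statement. */
#define SETOPT_OR_RETURN(h, opt, value)                                  \
        do {                                                             \
                if (handle_setopt((h), (opt), (value)) < 0) {            \
                        fprintf(stderr, "Failed to set %s\n", #opt);     \
                        return -EIO;                                     \
                }                                                        \
        } while (0)

static int configure(struct handle *h) {
        SETOPT_OR_RETURN(h, OPT_TIMEOUT, 30L);
        SETOPT_OR_RETURN(h, OPT_URL, "https://example.com/upload");
        return 0;
}

static int configure_bad(struct handle *h) {
        SETOPT_OR_RETURN(h, 99, 0L); /* unknown option: prints "Failed to set 99" */
        return 0;
}
```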
Nick Rosbrook [Mon, 13 Apr 2026 20:06:23 +0000 (16:06 -0400)]
test: do not use nanoseconds width specifier in date command
GNU date honors the width in the format specifier +%s%6N and prints
only the first 6 digits of the nanoseconds field (i.e. microseconds).
The uutils implementation of date does not honor this and always prints
all 9 digits. This is a known bug[1], but it can be worked around by
adapting this test to use nanoseconds instead of microseconds.
tree-wide: convert remaining varlink string fields to enum types (#41615)
Follow-up to #40972. Convert remaining plain string fields to proper
varlink enum types across all interfaces, per the policy that
user-controlled/API fields should be declared as proper enums in the
IDL.
Shared types moved to varlink-idl-common: ExecOutputType,
CGroupPressureWatch, EmergencyAction, ManagedOOMMode — these are reused
across multiple interfaces.
Each interface change includes a corresponding enum sync test to catch
future drift between C string tables and varlink IDL definitions.
Ivan Kruglov [Tue, 14 Apr 2026 09:25:43 +0000 (02:25 -0700)]
docs: clarify when to use varlink enum types vs plain strings
Add guidance on when a field should use a proper varlink enum type
versus remaining a plain string: user-controlled/API fields should be
enums, engine-internal state fields may stay as strings.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Ivan Kruglov [Mon, 13 Apr 2026 10:56:48 +0000 (03:56 -0700)]
varlink: add ManagedOOMMode enum type to io.systemd.oom
Convert the mode field in ControlGroup from plain string to the
ManagedOOMMode enum type from varlink-idl-common. Register
ManagedOOMMode in both io.systemd.oom and io.systemd.ManagedOOM
interfaces since both use the ControlGroup struct.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Ivan Kruglov [Mon, 13 Apr 2026 10:32:16 +0000 (03:32 -0700)]
varlink: add enum types for class and whom fields in io.systemd.Machine
Convert the class field (Register input, List output) from plain string
to MachineClass enum type, and the whom field (Kill input) to KillWhom
enum type.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Ivan Kruglov [Mon, 13 Apr 2026 10:25:21 +0000 (03:25 -0700)]
varlink: add enum types for scheduling and mount settings in io.systemd.Unit
Convert CPUSchedulingPolicy, IOSchedulingClass, NUMAPolicy and MountFlags
fields from plain strings to proper varlink enum types in the io.systemd.Unit
interface. Update the corresponding serialization code to use
json_underscorify() for correct enum value formatting.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Ivan Kruglov [Mon, 13 Apr 2026 09:53:38 +0000 (02:53 -0700)]
varlink: add enum types for configuration settings in io.systemd.Manager
Convert 8 string fields in the io.systemd.Manager varlink interface to
proper enum types:
- LogTarget: new enum (console, console_prefixed, kmsg, journal, ...)
- DefaultStandardOutput/Error: reuse ExecOutputType from common
- DefaultMemory/CPU/IOPressureWatch: reuse CGroupPressureWatch from common
- DefaultOOMPolicy: new enum (continue, stop, kill)
- CtrlAltDelBurstAction: reuse EmergencyAction from common
Output serialization updated to use JSON_BUILD_PAIR_ENUM for automatic
underscorification of dash-containing values.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
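Varlink enum identifiers cannot contain '-' or '+', so such values are underscorified on the wire. A minimal sketch of that mapping as a hypothetical in-place helper (not json_underscorify() or JSON_BUILD_PAIR_ENUM themselves):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: turn a C string-table value into a varlink enum
 * identifier by mapping '-' and '+' to '_' in place. */
static char *underscorify(char *s) {
        for (char *p = s; *p; p++)
                if (*p == '-' || *p == '+')
                        *p = '_';
        return s;
}
```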
NEWS: pre-announce removal of /run/boot-loader-entries/ support in logind (#41622)
logind could read UAPI.1 Boot Loader Spec entries from
/run/boot-loader-entries/ in addition to ESP/XBOOTLDR. This was pretty
half-assed, and to my knowledge was never actually used much.
Let's remove support for it and simplify our codebase.
Let's schedule it for removal via NEWS in a future version, to give
people a chance to speak up.
journal-upload: also disable VERIFYHOST when --trust=all is used
When --trust=all disables CURLOPT_SSL_VERIFYPEER, the residual
CURLOPT_SSL_VERIFYHOST check is ineffective since an attacker can
present a self-signed certificate with the expected hostname. Disable
both for consistency and log that server certificate verification is
disabled.
machined: pass user as positional argument in machine_default_shell_args()
Instead of interpolating the user name directly into the sh -c script
body via asprintf %s, pass it as a positional parameter ($1) in a
separate argv entry. This avoids the user string being parsed as part
of the shell script syntax.
Also validate the user name in bus_machine_method_open_shell() with
valid_user_group_name(), matching the validation already done on the
Varlink path via json_dispatch_const_user_group_name().
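A sketch of the resulting argv layout, assuming a hypothetical script body (the real one in machine_default_shell_args() is not reproduced here). With `sh -c`, the word after the script becomes $0 and the next one $1, so the user name reaches the shell as one opaque argument that is never parsed as script syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical script body: the user name is referenced only as "$1". */
static const char shell_script[] = "printf 'user=%s\\n' \"$1\"";

/* Fill argv for: sh -c SCRIPT sh USER */
static void build_shell_argv(const char *user, const char *argv[static 6]) {
        argv[0] = "sh";
        argv[1] = "-c";
        argv[2] = shell_script; /* fixed text, no user data spliced in */
        argv[3] = "sh";         /* becomes $0 */
        argv[4] = user;         /* becomes $1, passed verbatim */
        argv[5] = NULL;
}
```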
logind: reject wall messages containing control characters
method_set_wall_message() and the property setter only checked the
message length but not its content. Since wall messages are broadcast
to all TTYs, control characters in the message could interfere with
terminal state. Reject messages containing control characters other
than newline and tab.
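A minimal check along these lines, as a hypothetical helper: it rejects bytes below 0x20 other than newline and tab. Treating DEL (0x7f) as a control character too is an assumption of this sketch, and the length check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical validator for wall message content. */
static bool wall_message_is_valid(const char *m) {
        for (const unsigned char *p = (const unsigned char *) m; *p; p++) {
                if (*p == '\n' || *p == '\t')
                        continue;            /* explicitly allowed */
                if (*p < 0x20 || *p == 0x7f)
                        return false;        /* e.g. ESC could rewrite terminal state */
        }
        return true;
}
```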
core: add missing SELinux access checks when listing units
Add mac_selinux_unit_access_check_varlink() to the unit enumeration
loop in vl_method_list_units(), silently skipping units the caller
is not permitted to see, matching the D-Bus ListUnits behavior.
Add mac_selinux_access_check_varlink() to vl_method_describe_manager().
- Use persist-credentials: false for actions/checkout, so we don't
leak the GitHub token credentials to subsequent jobs.
- Remove one / from the Edit/Write permissions. Currently, with the
absolute path from github.workspace, we expand to three slashes while
we only need two.
Ivan Kruglov [Mon, 13 Apr 2026 09:53:23 +0000 (02:53 -0700)]
varlink: move shared enum types to varlink-idl-common
Move ExecOutputType, CGroupPressureWatch, EmergencyAction and
ManagedOOMMode enum type definitions from varlink-io.systemd.Unit to
varlink-idl-common, as these types are shared across multiple varlink
interfaces.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Kai Lüke [Mon, 13 Apr 2026 12:21:39 +0000 (21:21 +0900)]
vmspawn: Support RUNTIME_DIRECTORY again
In ccecae0efd ("vmspawn: use machine name in runtime directory path"),
support for RUNTIME_DIRECTORY was dropped. This makes it difficult to
run systemd-vmspawn in a service unit that has no write access to the
regular /run but should use its own managed RUNTIME_DIRECTORY.
What worked before was --keep-unit --system. Using XDG_RUNTIME_DIR with
--user is not an option, because --keep-unit then breaks, and we need
--keep-unit since vmspawn cannot create a scope when there is no session.
Switch back to runtime_directory, which handles RUNTIME_DIRECTORY and
tells us whether to use it as-is without later cleanup, or to use the
regular path where we create and delete the directory ourselves.
many: final final set of coccinelle check-pointer-deref tweaks (#41595)
I promised in https://github.com/systemd/systemd/pull/41426 that it's the
final update for coccinelle pointer deref checks. However, it turned out
there is a coccinelle/parsing_hacks.h that I wasn't aware of. The
file lacked some important things like _cleanup_(x), which prevented
coccinelle from checking a bunch of functions.
This PR adds some missing defines to parsing_hacks.h and adds the
missing assert()s. I apologize that it's a bit long (and frankly boring)
and that I missed this earlier.
The last commit contains one small behavior change (ret in
sd_varlink_idl_parse() is now really optional), but the big one is very
mechanical.
This is useful when moving from `--pty` or `--pipe` to using
`--verbose`: you can use `--verbose-output=cat` to get similar output on
stdout while still having all of the advantages of `--verbose` over the
other options.
stat-util: always check S_ISDIR() before S_ISLNK()
In all stat_verify_xyz() helpers that check both, check S_ISDIR()
before S_ISLNK(), to ensure we systematically return the same errors.
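A sketch of that check order on a raw mode_t, using a hypothetical verify_regular_mode(); the errno values are chosen for illustration and are not taken from stat-util.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: S_ISDIR() first, then S_ISLNK(), then the type
 * this helper actually wants. */
static int verify_regular_mode(mode_t mode) {
        if (S_ISDIR(mode))
                return -EISDIR;
        if (S_ISLNK(mode))
                return -ELOOP;
        if (!S_ISREG(mode))
                return -EBADFD;
        return 0;
}
```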
Milan Kyselica [Sat, 11 Apr 2026 08:26:13 +0000 (10:26 +0200)]
boot: fix loop bound and OOB in devicetree_get_compatible()
The loop used the byte offset end (struct_off + struct_size) as the
iteration limit, but cursor[i] indexes uint32_t words. This reads
past the struct block when end > size_words.
Use size_words (struct_size / sizeof(uint32_t)) which is the correct
number of words to iterate over.
Also fix a pre-existing OOB in the FDT_BEGIN_NODE handler: the guard
i >= size_words is always false inside the loop (since the loop
condition already ensures i < size_words), so cursor[++i] at the
boundary reads one word past the struct block. Use i + 1 >= size_words
to check before incrementing.
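A simplified model of both fixes, not the real devicetree_get_compatible(): token values follow the devicetree spec, but here they are host-order words (real FDT is big-endian) and each BEGIN_NODE is followed by exactly one payload word, whereas the real block carries a padded node name. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOK_BEGIN_NODE 0x1u
#define TOK_END_NODE   0x2u
#define TOK_END        0x9u

/* Returns the number of nodes, or -1 if a BEGIN_NODE token is the last word,
 * i.e. its payload would lie past the end of the struct block. */
static long count_nodes(const uint32_t *cursor, size_t struct_size) {
        /* Bound the loop in words; a byte count here would overrun cursor[]. */
        size_t size_words = struct_size / sizeof(uint32_t);
        long n = 0;

        for (size_t i = 0; i < size_words; i++) {
                switch (cursor[i]) {
                case TOK_BEGIN_NODE:
                        /* Inside the loop 'i >= size_words' is always false, so
                         * check the next index before stepping onto it. */
                        if (i + 1 >= size_words)
                                return -1;
                        i++; /* skip the payload word, now known to be in bounds */
                        n++;
                        break;
                case TOK_END:
                        return n;
                default:
                        break;
                }
        }
        return n;
}
```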
Milan Kyselica [Sat, 11 Apr 2026 08:25:19 +0000 (10:25 +0200)]
boot: fix integer overflow and division by zero in BMP splash parser
Bound image dimensions before computing row_size to prevent overflow
in the depth * x multiplication on 32-bit. Without this, crafted
dimensions like depth=32 x=0x10000001 wrap to a small row_size that
passes all subsequent checks.
Reject channel masks where all bits are set (popcount == 32), since
1U << 32 is undefined behavior and causes division by zero on
architectures where it evaluates to zero. Move the validation before
computing derived values for clarity. Use unsigned 1U in shifts to
avoid signed integer overflow UB for popcount == 31.
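A sketch of both guards. MAX_DIMENSION and the helper names are hypothetical, the row size is the usual BMP formula (depth * x bits rounded up to a 4-byte multiple), and also rejecting an empty mask is an extra assumption of this sketch, since it would likewise yield a zero divisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DIMENSION 16384u /* hypothetical cap */

/* With depth <= 32 and x <= MAX_DIMENSION, depth * x cannot overflow 32 bits. */
static bool bmp_row_size(uint32_t depth, uint32_t x, uint32_t *ret) {
        if (depth == 0 || depth > 32 || x == 0 || x > MAX_DIMENSION)
                return false;
        *ret = ((depth * x + 31) / 32) * 4;
        return true;
}

static unsigned popcount32(uint32_t v) {
        unsigned n = 0;
        for (; v; v &= v - 1) /* clear the lowest set bit each round */
                n++;
        return n;
}

/* Largest channel value for 'mask', used as a divisor when scaling.
 * bits == 32 is rejected because 1U << 32 is undefined; the unsigned 1U
 * keeps bits == 31 well-defined. */
static bool channel_max(uint32_t mask, uint32_t *ret) {
        unsigned bits = popcount32(mask);
        if (bits == 0 || bits >= 32)
                return false;
        *ret = (1U << bits) - 1;
        return true;
}
```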
journal: limit decompress_blob() output to DATA_SIZE_MAX (#41604)
We already have checks in place during compression that limit the data
we compress, so they shouldn't decompress to anything larger than
DATA_SIZE_MAX unless they've been tampered with. Let's make this
explicit and limit all our decompress_blob() calls in journal-handling
code to that limit.
One possible scenario this should prevent is when one tries to open and
verify a journal file that contains a compression bomb in its payload:
```
$ systemd-run --user --wait --pipe -- build-local/journalctl --verify --file=$PWD/test.journal
Running as unit: run-p682422-i4875779.service
000110: Invalid hash (00000000 vs. 11e4948d73bdafdd)
000110: Invalid object contents: Bad message
File corruption detected at /home/fsumsal/repos/@systemd/systemd/test.journal:272 (of 1249896 bytes, 0%).
FAIL: /home/fsumsal/repos/@systemd/systemd/test.journal (Bad message)
Finished with result: exit-code
Main processes terminated with: code=exited, status=1/FAILURE
Service runtime: 48.051s
CPU time consumed: 47.941s
Memory peak: 8G (swap: 0B)
```
The same could, in theory, be possible with just `journalctl --file=`, but
the reproducer would be a bit more complicated (I haven't tried it yet).
Lastly, the change in journal-remote is mostly hardening, as the maximum
input size to decompress_blob() there is bounded by MHD's connection
memory limit (set to JOURNAL_SERVER_MEMORY_MAX, which is 128 KiB at the
time of writing), so the possible output size there is already quite
limited (e.g. ~800 - 900 MiB for xz-compressed data).
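The capping idea, shown on a toy run-length format instead of xz/lz4/zstd. The format and function are stand-ins, and max_out plays the role of DATA_SIZE_MAX. The check runs before each expansion, so a tiny input that claims a huge output is refused rather than materialized.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Input is (count, byte) pairs; 'dst' must have room for 'max_out' bytes. */
static int rle_decompress_capped(const uint8_t *src, size_t src_len,
                                 uint8_t *dst, size_t max_out, size_t *ret_len) {
        size_t out = 0;

        if (src_len % 2 != 0)
                return -EBADMSG;

        for (size_t i = 0; i < src_len; i += 2) {
                size_t count = src[i];

                /* Stop as soon as the output would exceed the cap: two input
                 * bytes can claim 255 output bytes. */
                if (count > max_out - out)
                        return -EFBIG;

                memset(dst + out, src[i + 1], count);
                out += count;
        }

        *ret_len = out;
        return 0;
}
```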
Daan De Meyer [Mon, 22 Dec 2025 10:22:34 +0000 (11:22 +0100)]
nspawn: Add --restrict-address-families= option
Add a new --restrict-address-families= command line option and
corresponding RestrictAddressFamilies= setting for .nspawn files to
restrict which socket address families may be used inside a container.
Many address families such as AF_VSOCK and AF_NETLINK are not
network-namespaced, so restricting access to them in containers
improves isolation. The option supports allowlist and denylist modes
(via ~ prefix), as well as "none" to block all families, matching the
semantics of RestrictAddressFamilies= in unit files.
The address family parsing logic is extracted into a shared
parse_address_families() helper in parse-helpers.c, which is now also
used by config_parse_address_families() in load-fragment.c.
This is currently opt-in. In a future version, the default will be
changed to restrict address families to AF_INET, AF_INET6 and AF_UNIX.
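A sketch of the list semantics described above, with a hypothetical struct af_policy and parser that know only a handful of family names: a leading ~ turns the list into a denylist, and "none" is an empty allowlist. It does not reproduce parse_address_families().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical policy representation; Linux family numbers are below 64. */
struct af_policy {
        bool allow_list;   /* true: only listed families; false: all but listed */
        uint64_t families; /* bit N set means family N is listed */
};

static const struct { const char *name; int af; } af_table[] = {
        { "AF_UNIX", AF_UNIX },   { "AF_INET", AF_INET },
        { "AF_INET6", AF_INET6 }, { "AF_NETLINK", AF_NETLINK },
        { "AF_VSOCK", AF_VSOCK },
};

/* Returns 0 on success, -1 on an unknown name or overlong input. */
static int parse_af_policy(const char *s, struct af_policy *ret) {
        struct af_policy p = { .allow_list = true, .families = 0 };
        char buf[256], *saveptr = NULL;

        if (strcmp(s, "none") == 0) { /* empty allowlist: nothing permitted */
                *ret = p;
                return 0;
        }
        if (*s == '~') {              /* leading '~' makes it a denylist */
                p.allow_list = false;
                s++;
        }
        if ((size_t) snprintf(buf, sizeof(buf), "%s", s) >= sizeof(buf))
                return -1;

        for (char *w = strtok_r(buf, " ", &saveptr); w; w = strtok_r(NULL, " ", &saveptr)) {
                size_t i, n = sizeof(af_table) / sizeof(af_table[0]);

                for (i = 0; i < n; i++)
                        if (strcmp(w, af_table[i].name) == 0)
                                break;
                if (i == n)
                        return -1;
                p.families |= UINT64_C(1) << af_table[i].af;
        }

        *ret = p;
        return 0;
}

static bool af_allowed(const struct af_policy *p, int af) {
        bool listed = af >= 0 && af < 64 && (p->families & (UINT64_C(1) << af)) != 0;
        return p->allow_list ? listed : !listed;
}
```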
Daan De Meyer [Fri, 27 Mar 2026 22:03:14 +0000 (22:03 +0000)]
systemctl: replace kexec-tools dependency with direct kexec_file_load() syscall
Replace the fork+exec of /usr/bin/kexec in load_kexec_kernel() with a
direct kexec_file_load() syscall, removing the runtime dependency on
kexec-tools for systemctl kexec.
The kexec_file_load() syscall (available since Linux 3.17) accepts
kernel and initrd file descriptors directly, letting the kernel handle
image parsing, segment setup, and purgatory internally. This is much
simpler than the older kexec_load() syscall which requires complex
userspace setup of memory segments and boot protocol structures — that
complexity is the raison d'être of kexec-tools.
The implementation follows the established libc wrapper pattern: a
missing_kexec_file_load() fallback in src/libc/kexec.c calls the
syscall directly when glibc doesn't provide a wrapper (which is
currently always the case). The syscall is not available on all
architectures — alpha, i386, ia64, m68k, mips, sh, and sparc lack
__NR_kexec_file_load — so the wrapper and caller are guarded with
HAVE_KEXEC_FILE_LOAD_SYSCALL to compile cleanly everywhere.
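A sketch of such a fallback wrapper, assuming the five-argument kexec_file_load() signature documented in the Linux man page; my_kexec_file_load() is a hypothetical name, not the function in src/libc/kexec.c. Where __NR_kexec_file_load is not defined it fails with ENOSYS, leaving room for the kexec binary fallback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 0 on success, -1 with errno set on failure. */
static int my_kexec_file_load(int kernel_fd, int initrd_fd,
                              unsigned long cmdline_len, const char *cmdline,
                              unsigned long flags) {
#ifdef __NR_kexec_file_load
        return (int) syscall(__NR_kexec_file_load, kernel_fd, initrd_fd,
                             cmdline_len, cmdline, flags);
#else
        /* No syscall number on this architecture: let the caller fall back. */
        (void) kernel_fd; (void) initrd_fd; (void) cmdline_len;
        (void) cmdline; (void) flags;
        errno = ENOSYS;
        return -1;
#endif
}
```

Per the man page, cmdline_len counts the terminating NUL byte, and KEXEC_FILE_NO_INITRAMFS has to be set in flags when no initrd fd is passed.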
When kexec_file_load() rejects the kernel image with ENOEXEC (e.g. the
image is compressed or wrapped in a PE container that the kernel's kexec
handler doesn't understand natively), we attempt to unwrap/decompress
and retry. This is effectively the same decompression and extraction
logic that already lives in src/ukify/ukify.py (maybe_decompress() and
get_zboot_kernel()), now implemented in C so that systemctl can handle
it natively without shelling out to external tools:
- Compressed kernels (Image.gz, Image.zst, Image.xz, Image.lz4): the
format is detected by magic bytes (per RFC 1952, RFC 8878,
tukaani.org xz spec, and lz4 frame format spec) and decompressed to
a memfd using the existing decompress_stream_*() infrastructure plus
the new gzip support from the previous commit. This is primarily
needed on arm64 where kexec_file_load() only accepts raw Image files.
On x86_64, bzImage is already the native format and works directly.
- EFI ZBOOT PE images (vmlinuz.efi): detected by "MZ" + "zimg" magic
at the start of the file. The compressed payload offset, size, and
compression type are read from the ZBOOT header defined in
linux/drivers/firmware/efi/libstub/zboot-header.S.
- Unified Kernel Images (UKI): detected as PE files with a .linux
section via the existing pe_is_uki() infrastructure. The .linux
section (kernel) and optionally .initrd section are extracted to
memfds. When a UKI provides an embedded initrd and the boot entry
doesn't specify one separately, the embedded initrd is used.
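A sketch of the ZBOOT header read, using the offsets from zboot-header.S in the way ukify's get_zboot_kernel() reads them: "MZ" at 0, "zimg" at 4, a little-endian payload offset at 0x08 and size at 0x0C, and a NUL-terminated compression type at 0x18. The struct and function names are hypothetical, and decompression itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct zboot_info {
        uint32_t payload_offset;
        uint32_t payload_size;
        char compression[8]; /* e.g. "gzip"; always NUL-terminated here */
};

static uint32_t read_le32(const uint8_t *p) {
        return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
               (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* 'img' is the whole file. Returns false if it is not a ZBOOT image or if
 * the advertised payload does not fit inside the file. */
static bool zboot_parse(const uint8_t *img, size_t len, struct zboot_info *ret) {
        if (len < 0x20 || memcmp(img, "MZ", 2) != 0 || memcmp(img + 4, "zimg", 4) != 0)
                return false;

        uint32_t off = read_le32(img + 0x08), size = read_le32(img + 0x0C);
        if ((uint64_t) off + size > len)
                return false;

        ret->payload_offset = off;
        ret->payload_size = size;
        memcpy(ret->compression, img + 0x18, sizeof(ret->compression) - 1);
        ret->compression[sizeof(ret->compression) - 1] = '\0';
        return true;
}
```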
The try-first-then-decompress approach means we never decompress
unnecessarily: on x86_64 the first kexec_file_load() call succeeds
immediately with the raw bzImage, and on architectures where the
kernel's kexec handler natively understands PE (like LoongArch with
kexec_efi_ops), ZBOOT/UKI images work without decompression too.
If kexec_file_load() is unavailable (architectures without the syscall)
or all attempts fail, we fall back to forking+execing the kexec binary.
This preserves compatibility on architectures like i386 and mips where
only the older kexec_load() syscall exists and kexec-tools is needed to
handle the complex userspace setup.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
compress: rework decompressor_detect() on top of compression_detect_from_magic()
Replace the duplicated magic byte signatures in decompressor_detect()
with a call to the new compression_detect_from_magic() helper and use a
switch statement to initialize the appropriate decompression context.
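A sketch of magic-byte detection feeding a switch, with a hypothetical enum and helper names rather than the actual compression_detect_from_magic() signature. The signatures are gzip 1f 8b (RFC 1952), zstd 28 b5 2f fd (RFC 8878), xz fd 37 7a 58 5a 00, and the lz4 frame magic 04 22 4d 18; the switch returns a name where the real code would initialize a decompression context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
        COMPRESSION_NONE, COMPRESSION_GZIP, COMPRESSION_ZSTD,
        COMPRESSION_XZ, COMPRESSION_LZ4,
} Compression;

static Compression detect_from_magic(const void *buf, size_t len) {
        static const struct { Compression c; size_t len; const char *magic; } table[] = {
                { COMPRESSION_GZIP, 2, "\x1f\x8b" },
                { COMPRESSION_ZSTD, 4, "\x28\xb5\x2f\xfd" },
                { COMPRESSION_XZ,   6, "\xfd" "7zXZ\0" }, /* split so '7' is not a hex digit */
                { COMPRESSION_LZ4,  4, "\x04\x22\x4d\x18" },
        };

        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
                if (len >= table[i].len && memcmp(buf, table[i].magic, table[i].len) == 0)
                        return table[i].c;
        return COMPRESSION_NONE;
}

/* Placeholder for context setup: one switch over the detected type. */
static const char *decompressor_for(const void *buf, size_t len) {
        switch (detect_from_magic(buf, len)) {
        case COMPRESSION_GZIP: return "gzip";
        case COMPRESSION_ZSTD: return "zstd";
        case COMPRESSION_XZ:   return "xz";
        case COMPRESSION_LZ4:  return "lz4";
        case COMPRESSION_NONE: break;
        }
        return NULL;
}
```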
time-util: encode our assumption that clock_gettime() can never return 0 or USEC_INFINITY
We generally assume that valid times returned by clock_gettime() are > 0
and < USEC_INFINITY. If this didn't hold, all kinds of things would
break, because we couldn't distinguish our niche values from regular
values anymore.
Let's hence encode our assumptions in C, if only to help static
analyzers and LLMs.
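A sketch of encoding that assumption, with a hypothetical now_usec() and local definitions of usec_t and USEC_INFINITY (UINT64_MAX) standing in for systemd's own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t usec_t;
#define USEC_INFINITY ((usec_t) UINT64_MAX)
#define USEC_PER_SEC  ((usec_t) 1000000ULL)
#define NSEC_PER_USEC ((usec_t) 1000ULL)

/* 0 and USEC_INFINITY are reserved niche values, so a real clock reading
 * must never produce either; the assert documents that for readers and
 * static analyzers alike. */
static usec_t now_usec(clockid_t clock) {
        struct timespec ts;
        int r = clock_gettime(clock, &ts);
        assert(r == 0);
        (void) r;

        usec_t t = (usec_t) ts.tv_sec * USEC_PER_SEC + (usec_t) ts.tv_nsec / NSEC_PER_USEC;
        assert(t > 0 && t < USEC_INFINITY);
        return t;
}
```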
One more round, this time with the help of the claudebot, especially for
spelunking in git blame to find the original commits and writing commit
messages from the list of warnings exported from Coverity.
Co-developed-by: Claude <claude@anthropic.com>
core: varlink enum for io.systemd.Unit interface (#40972)
Convert string fields to varlink enums in io.systemd.Unit
Following
https://github.com/systemd/systemd/pull/39391#discussion_r2489599449,
convert all configuration setting fields in the io.systemd.Unit varlink
interface from bare SD_VARLINK_STRING to proper enum types, adding type
safety to the IDL.
This converts ~30 fields across ExecContext, CGroupContext, and
UnitContext, adding 25 new varlink enum types.
Weak compatibility breakage (per
https://github.com/systemd/systemd/pull/40972#issuecomment-4222294318):
Varlink enum identifiers cannot contain - or +, so affected values are
underscorified on the wire. For example, "tty-force" becomes tty_force,
"kmsg+console" becomes kmsg_console.
Michael Vogt [Sun, 12 Apr 2026 13:47:48 +0000 (15:47 +0200)]
coccinelle: add SIZEOF() macro to work-around sizeof(*private)
We have code like `size_t max_size = sizeof(*private)` in three
places. This is evaluated at compile time, so it's safe to use. However,
the new pointer-deref checker in coccinelle is not smart enough to know
this and flags those as errors. To avoid these false positives,
we have some options:
1. Reorder so that we do:
```C
size_t max_size = 0;
assert(private);
max_size = sizeof(*private);
```
2. Use something like `size_t max_size = sizeof(*ASSERT_PTR(private));`
3. Place the assert before the declaration
4. Work around coccinelle via SIZEOF(*private), which we can then hide
via parsing_hacks.h
5. Fix coccinelle (OCaml, hard)
6. ... something I missed?
None of these is very appealing. I went for (4), but I'm happy to hear
suggestions.
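Option (4) relies on the operand of sizeof not being evaluated (for non-VLA types), so a pass-through macro is exactly as safe as a bare sizeof. A sketch with a hypothetical definition of SIZEOF() and a made-up struct:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pass-through; the separate name is what parsing_hacks.h
 * could then hide from Coccinelle. */
#define SIZEOF(x) sizeof(x)

struct private_data { char buf[64]; int n; };

static size_t max_size_for(const struct private_data *private) {
        /* Not evaluated at run time, so 'private' is never dereferenced,
         * even when it is NULL. */
        size_t max_size = SIZEOF(*private);
        return max_size;
}
```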
Michael Vogt [Sat, 11 Apr 2026 17:52:33 +0000 (19:52 +0200)]
sd-varlink: make ret optional in sd_varlink_idl_parse()
We have a test failure where the testsuite calls
sd_varlink_idl_parse() with ret being NULL. This now triggers an
assertion failure, so we could either fix the test or fix the code.
Given that it seems genuinely useful to run sd_varlink_idl_parse()
without ret, e.g. just to check whether the IDL is valid, I opted to
fix the code.
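The "ret is optional" convention in miniature, with a hypothetical parse_name(): validation always runs, and a result is only allocated and handed out when the caller passed a non-NULL ret.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int parse_name(const char *s, char **ret) {
        if (!s || !*s || strchr(s, ' '))
                return -EINVAL;

        if (!ret)          /* validity check only: nothing to hand out */
                return 0;

        char *copy = strdup(s);
        if (!copy)
                return -ENOMEM;
        *ret = copy;
        return 0;
}
```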
test-json: add iszero_safe guards for float division at index 0 and 1
The existing iszero_safe guards at index 9 and 10 were added to
silence Coverity, but the same division-by-float-zero warning also
applies to the divisions at index 0 (DBL_MIN) and 1 (DBL_MAX).
debug-generator: assert breakpoint type is valid before bit shift
The BreakpointType enum includes _BREAKPOINT_TYPE_INVALID (-EINVAL),
so Coverity flags the bit shift as potentially using a negative shift
amount. Add an assert to verify the type is in valid range, since the
static table only contains valid entries.
uid-range: add assert to prevent underflow in coalesce loop
Coverity flags range->n_entries - j as a potential underflow
in the memmove size calculation. Add assert(range->n_entries > 0)
before decrementing n_entries, which holds since the loop condition
guarantees j < n_entries.
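A simplified sketch of such a coalesce loop (half-open [start, start + nr) entries, hypothetical type names, not the exact uid-range.c code), showing where the memmove size comes from and where the assert sits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t start, nr; } Entry;
typedef struct { Entry *entries; size_t n_entries; } Range;

static int entry_cmp(const void *a, const void *b) {
        const Entry *x = a, *y = b;
        return (x->start > y->start) - (x->start < y->start);
}

/* Merge overlapping or adjacent entries in place. */
static void coalesce(Range *range) {
        if (range->n_entries == 0)
                return;
        qsort(range->entries, range->n_entries, sizeof(Entry), entry_cmp);

        for (size_t i = 0; i < range->n_entries; i++) {
                Entry *x = range->entries + i;

                for (size_t j = i + 1; j < range->n_entries; j++) {
                        Entry *y = range->entries + j;
                        uint64_t x_end = (uint64_t) x->start + x->nr;
                        uint64_t y_end = (uint64_t) y->start + y->nr;

                        if (x_end < y->start)
                                break; /* sorted: nothing further touches x */

                        /* x->start <= y->start after sorting, so only the end grows. */
                        x->nr = (uint32_t) ((x_end > y_end ? x_end : y_end) - x->start);

                        /* Drop y by shifting the tail left; j < n_entries here. */
                        if (range->n_entries > j + 1)
                                memmove(y, y + 1, sizeof(Entry) * (range->n_entries - j - 1));
                        assert(range->n_entries > 0);
                        range->n_entries--;
                        j--; /* revisit the entry that moved into slot j */
                }
        }
}
```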
sd-varlink: scale down the limit of connections per UID to 128
1024 connections per UID is unnecessarily generous, so let's scale this
down a bit. D-Bus defaults to 256 connections per UID, but let's be even
more conservative and go with 128.
Michael Vogt [Tue, 31 Mar 2026 17:01:28 +0000 (19:01 +0200)]
tools: run check-coccinelle.sh with (updated) parsing_hacks.h
This commit runs the check-coccinelle checker scripts with
parsing_hacks.h. Because this was missing before, some issues did not
get flagged.
While at it, it also adds some missing cleanup attributes and
iterators to get better results. It's a bit sad that there is no
(easy/obvious) way to detect when new things are needed in
parsing_hacks.h.
Coverity was complaining that we were doing an integer division and then
casting the result to double. This was OK, but it was also a bit pointless.
In an operation on a double and an unsigned, the unsigned operand is
converted to double, so it's enough to have a double operand early enough
in the expression.
Drop no-op casts and parens to make the formulas easier to read.
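A small illustration of why the position of the double matters, with hypothetical helper names: in the first form the division happens in unsigned arithmetic before the cast, in the second the unsigned divisor is converted to double because the other operand already is one.

```c
#include <assert.h>

static double avg_truncated(unsigned total, unsigned n) {
        return (double) (total / n);   /* integer division first, then cast */
}

static double avg_exact(unsigned total, unsigned n) {
        return (double) total / n;     /* n is converted to double implicitly */
}
```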
Sometimes we need to diff two unsigned numbers, which is awkward
because we need to cast them to something signed first if we want
to use abs(). Let's add a helper that avoids the function call
altogether.
Also drop unnecessary parens around args which are delimited by commas.
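A sketch of such a helper; ABS_DIFF is a hypothetical name. Comparing first means the subtraction is always larger minus smaller, so unsigned operands never wrap and no signed cast or abs() call is needed. As a macro it evaluates its arguments more than once, so they should be free of side effects.

```c
#include <assert.h>
#include <stdint.h>

#define ABS_DIFF(a, b) ((a) > (b) ? (a) - (b) : (b) - (a))
```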
Coverity complains that r is overwritten. In fact it isn't, but
we shouldn't set it like this anyway. exec_with_listen_fds() already
logs, so we only need to call _exit() if it fails.