newa(t, n) already allocates sizeof(t) * n bytes, so previously we'd
actually allocate sizeof(t) * sizeof(t) * n bytes, which is ~16x more
(on x86_64) than we actually needed.
The shutdown interface is currently only exposed via dbus. This PR
adds a comparable varlink implementation. It is inspired by the existing
dbus methods and implements PowerOff, Reboot, Halt, Kexec, SoftReboot
as varlink methods on a new io.systemd.Shutdown interface.
It is (intentionally) simpler than the dbus implementation for now, i.e. no
Can* methods yet, mostly because I want to get feedback first (happy to do
that in a followup).
The only real difficulty here is what to do about
verify_shutdown_creds(),
as it is needed by both dbus and varlink but is currently dbus-only. I went for
an ugly but (hopefully) pragmatic choice (see the commit message for
details). But I can totally understand if a refactor instead is
preferred.
Help users with incorrect permission bits on / (#41431)
This error causes the computer to pass emergency.target and go on to
graphical.target.
Then, your window manager will have problems because it can't access any
directories, and your network manager won't start up the network. In my case,
the screen just goes black. Ideally, you'd get an error message
explaining this edge scenario that's occurring, and an emergency
shell that makes it easy to run the necessary chmod 0755 / to proceed
with booting. IDK, not sure if this is the correct way to implement this,
sorry, it's my first contribution.
I ran
`meson test -C build`
and got
Ok: 1806
Fail: 26
Skipped: 25
on my cloned systemd repo before any changes, and got the same result
after my commit ¯\_(ツ)_/¯
So I hope I did that right.
Thanks
report: add cgroup metrics in a separate varlink service (#41489)
Add CpuUsage, MemoryUsage, IOReadBytes, IOReadOperations, and
TasksCurrent in a standalone socket-activated varlink service. These
metrics are gathered from the kernel via cgroup files and PID1's only
role is mapping unit names to cgroup paths — a separate process can
query PID1 once for that mapping and then read the cgroup files
directly, minimizing PID1 involvement.
The new systemd-report-cgroup-metrics service listens at
/run/systemd/report/io.systemd.CGroup and exposes:
- io.systemd.CGroup.CpuUsage
- io.systemd.CGroup.IOReadBytes
- io.systemd.CGroup.IOReadOperations
- io.systemd.CGroup.MemoryUsage (with type=current/available/peak)
- io.systemd.CGroup.TasksCurrent
This is spun out of #41078 and based on top of it. Will rebase once
that's merged.
Milan Kyselica [Thu, 9 Apr 2026 17:43:14 +0000 (19:43 +0200)]
resolved: replace assert() with error return in DNSSEC verify functions
dnssec_rsa_verify_raw() asserts that RSA_size(key) matches the RRSIG
signature size, and dnssec_ecdsa_verify_raw() asserts that
EC_KEY_check_key() succeeds. Both conditions depend on parsed DNS
record content. Replace with proper error returns.
The actual crypto verify calls (EVP_PKEY_verify / ECDSA_do_verify)
handle mismatches fine on their own, so the asserts were also redundant.
While at it, fix the misleading "EC_POINT_bn2point failed" log message
that actually refers to an EC_KEY_set_public_key() failure.
claude-review: improve review quality for large PRs
Several issues were identified from analyzing logs of a large (52-commit) PR
review:
- Claude was batching multiple commits into a single review agent instead of
one per worktree. Strengthen the prompt to explicitly prohibit grouping.
- Claude was reading pr-context.json and commit messages before spawning
agents despite instructions not to, wasting time. Tighten the pre-spawn
rules to only allow listing worktrees/ and reading review-schema.json.
- Subagents were spawned with model "sonnet" instead of "opus". Add explicit
instruction to use opus.
- After agents returned, Claude spent 9 minutes re-verifying findings with
bash/grep/sed commands, duplicating the agents' work. Add instruction to
trust subagent findings and only read pr-context.json in phase 2.
- Subagents returned markdown-wrapped JSON instead of raw JSON arrays. Add
instruction requiring raw JSON output only.
- Each subagent was independently reading review-schema.json. Instead have
the main agent read it once and paste it into each subagent prompt.
- The "drop low-confidence findings" instruction was being used to justify
dropping findings that Claude itself acknowledged as valid ("solid cleanup
suggestions", "reasonable consistency improvement"). Remove the instruction.
- Simplify the deduplication instructions.
- Stop adding the severity to the body in the post-processing job, as Claude is
also adding it, so they end up duplicated.
azureuser [Tue, 3 Mar 2026 08:41:45 +0000 (08:41 +0000)]
resolved: skip cache flush on server switch/re-probe when StaleRetentionSec is set
manager_set_dns_server() and dns_server_flush_cache() call dns_cache_flush()
unconditionally, wiping the entire cache even when StaleRetentionSec is
configured. This defeats serve-stale by discarding cached records that should
remain available during server switches and feature-level re-probes.
The original serve-stale commit (5ed91481ab) added a stale_retention_usec
guard to link_set_dns_server(), and a later commit (7928c0e0a1) added the
same guard to dns_delegate_set_dns_server(), but these two call sites in
resolved-dns-server.c were missed.
This is particularly visible with DNSOverTLS, where TLS handshake failures
trigger frequent feature-level downgrades and re-probes via
dns_server_flush_cache(), flushing the cache each time.
Add the same stale_retention_usec guard to both call sites so that cache
entries are allowed to expire naturally via dns_cache_prune() when
serve-stale is enabled.
Fixes: #40781
This commit was prepared with assistance from an AI coding agent (GitHub
Copilot). All changes have been reviewed for correctness and adherence to the
systemd coding style.
Yaping Li [Fri, 3 Apr 2026 05:01:15 +0000 (22:01 -0700)]
report: add cgroup metrics in a separate varlink service
Add CpuUsage, MemoryUsage, IOReadBytes, IOReadOperations, and
TasksCurrent in a standalone socket-activated varlink service.
The new systemd-report-cgroup service listens at
/run/systemd/report/io.systemd.CGroup and exposes:
- io.systemd.CGroup.CpuUsage
- io.systemd.CGroup.IOReadBytes
- io.systemd.CGroup.IOReadOperations
- io.systemd.CGroup.MemoryUsage (with type=current/available/peak)
- io.systemd.CGroup.TasksCurrent
Multiple callers of cg_get_keyed_attribute() follow the same pattern of
reading a single keyed attribute and then parsing it as uint64_t with
safe_atou64(). Add a helper that combines both steps.
Convert all existing single-key + uint64 call sites in cgtop, cgroup.c,
and oomd-util.c to use the new helper.
With the old version there was a potential connection count leak if
either of the two hashmap operations in count_connection() failed. In
that case we'd return from sd_varlink_server_add_connection_pair()
_before_ attaching the sd_varlink_server object to an sd_varlink object,
and since varlink_detach_server() is the only place where the connection
counter is decremented (called through sd_varlink_close() in various
error paths later _if_ the "server" object is not null, i.e. attached to
the sd_varlink object) we'd "leak" a connection every time this
happened. However, the potential of abusing this is very theoretical,
as one would need to hit OOM every time either of the hashmap operations
was executed for a while before exhausting the connection limit.
Let's just increment the connection counter after any potential error
path, so we don't have to deal with potential rollbacks.
Milan Kyselica [Thu, 9 Apr 2026 17:45:19 +0000 (19:45 +0200)]
udev: fix bounds check in dev_if_packed_info()
The check compared bLength against (size - sizeof(descriptor)), which
is an absolute limit unrelated to the current buffer position. Since
bLength is uint8_t (max 255), this can never exceed size - 9 for any
realistic input, making the check dead code.
Use (size - pos) instead so the check actually catches descriptors
that extend past the end of the read data.
Daan De Meyer [Sat, 28 Mar 2026 23:21:18 +0000 (23:21 +0000)]
compress: consolidate all compression into compress.c with dlopen
Move the push-based streaming compression API from import-compress.c
into compress.c and delete import-compress.c/h. This consolidates all
compression code in one place and makes all compression libraries
(liblzma, liblz4, libzstd, libz, libbz2) runtime-loaded via dlopen
instead of directly linked.
Introduce opaque Compressor/Decompressor types backed by a heap-
allocated struct defined only in compress.c, keeping all third-party
library headers out of compress.h.
Rewrite the per-codec fd-to-fd stream functions as thin wrappers around
the push API via generic compress_stream()/decompress_stream() taking a
Compression type parameter. Integrate LZ4 into this framework using the
LZ4 Frame API, eliminating all LZ4 special-casing.
Extend the Compression enum with COMPRESSION_GZIP and COMPRESSION_BZIP2
and add the corresponding blob, startswith, and stream functions for
both.
Rename the ImportCompress types and functions: ImportCompressType becomes
the existing Compression enum, ImportCompress becomes Compressor (with
Decompressor typedef), and all import_compress_*/import_uncompress_*
become compressor_*/decompressor_*. Rename dlopen_lzma() to dlopen_xz()
for consistency. Make compression_to_string() return lowercase by
default.
Add INT_MAX/UINT_MAX overflow checks for LZ4, zlib, and bzip2 blob
functions where the codec API uses narrower integer types than our
uint64_t parameters.
Migrate test-compress.c and test-compress-benchmark.c to the TEST()
macro framework, new assertion macros, and codec-generic loops instead
of per-codec duplication.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Revert "mkosi: Mark minimal images as Incremental=relaxed"
The setting has fundamental flaws that can't be easily fixed
(see https://github.com/systemd/mkosi/pull/4273) so revert its
use as we're dropping it in systemd. Image builds will take a bit
longer again until I figure out a proper fix for this.
vconsole-setup: skip setfont(8) when the console driver lacks font support
Don't run setfont(8) on consoles that don't support
fonts. systemd-vconsole-setup neither fails nor reports errors on such consoles
unlike setfont(8) which emits the following error [1]:
systemd-vconsole-setup[169]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version
The check already existed in setup_remaining_vcs() but it was performed too
late.
Michael Vogt [Tue, 7 Apr 2026 15:54:28 +0000 (17:54 +0200)]
sd-varlink: use MSG_PEEK for protocol_upgrade connections
When there is a potential protocol upgrade we need to be careful that
we do not read beyond our json message as the custom protocol may be
anything. This was previously achieved via a byte-by-byte read, which
is of course very inefficient. So this commit moves to using MSG_PEEK
to find the boundary of the json message instead. This makes the
performance hit a lot smaller.
Michael Vogt [Tue, 7 Apr 2026 15:47:50 +0000 (17:47 +0200)]
varlink: use single byte reads on SD_VARLINK_SERVER_UPGRADABLE
When the server side of a varlink connection supports connection
upgrades we need to go into single-byte-read mode to avoid the
risk of reading past the json message when a client sends the json
requesting the protocol upgrade followed immediately by the custom
protocol payload. This commit implements this.
The next step is using MSG_PEEK to avoid the single-byte overhead.
Michael Vogt [Sun, 5 Apr 2026 08:05:30 +0000 (10:05 +0200)]
libsystemd,varlink: always return two fds in varlink upgrade API
This commit tweaks the API of sd_varlink_call_and_upgrade and
sd_varlink_reply_and_upgrade to return two independent fds even
if the internal {input,output}_fd are the same (e.g. a socket).
This makes the external API easier as there is no longer the risk
of double close. The sd_varlink_call_and_upgrade() is not in a
released version of systemd yet so I presume it is okay to update
it still.
This also allowed some simplifications in varlinkctl.c now that
the handling is easier.
Michael Vogt [Thu, 2 Apr 2026 07:38:41 +0000 (09:38 +0200)]
varlinkctl: add new `serve` verb to allow wrapping command in varlink
With the new protocol upgrade support in varlinkctl client we can
now do the equivalent for the server side. This commit adds a new
`serve` verb that will serve any command that speaks stdin/stdout
via varlink and its protocol upgrade feature. This is the
"inetd for varlink".
This is useful for various reasons:
1. Allows one to e.g. provide a heavily sandboxed io.myorg.xz.Decompress
varlink endpoint (c.f. the xz CVE-2024-3094)
2. Allows sftp over varlink, which is quite useful with the
varlink-http-bridge (which has a more flexible auth mechanism than
plain sftp).
3. Makes testing the varlinkctl client protocol upgrade simpler.
4. Because we can.
Extract the fd-handling logic from sd_varlink_call_and_upgrade() into a
shared static helper so that it can be reused by the upcoming server-side
sd_varlink_reply_and_upgrade().
compress: write sparse files when decompressing to regular files
Core dumps are often very sparse, containing large zero-filled regions
whose actual disk usage can be significantly reduced by preserving
holes. Previously, decompress_stream() always wrote dense output,
expanding all zero regions into allocated disk blocks.
Each decompression backend (xz, lz4, zstd) now auto-detects whether the
output fd is suitable for sparse writes via a shared should_sparse()
helper. The check requires both S_ISREG (regular file) and !O_APPEND,
since O_APPEND causes write() to ignore the file position set by
lseek(), which would collapse the holes and corrupt the output. For
pipes, sockets, and append-mode files, dense writes are preserved via
loop_write_full() with USEC_INFINITY timeout, matching the original
behavior. After sparse decompression, finalize_sparse() sets the final
file size to account for any trailing holes.
This is transparent to callers — all public signatures are unchanged.
coredumpctl benefits automatically:
- coredumpctl debug: temp file in /var/tmp is now sparse
- coredumpctl dump -o file: output file is now sparse
- coredumpctl dump > file: redirected stdout is now sparse
- coredumpctl dump | ...: pipe output unchanged (dense)
- coredumpctl dump >> file: append mode, falls back to dense
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-developed-by: Codex (GPT-5) <noreply@openai.com>
Daan De Meyer [Fri, 27 Mar 2026 13:26:16 +0000 (14:26 +0100)]
vmspawn: Support direct kernel boot without UEFI firmware
When --linux= specifies a non-PE kernel image, automatically disable
UEFI firmware loading (as if --firmware= was passed). If --firmware=
is explicitly set to a path in this case, fail with an error. Booting
a UKI with --firmware= is also rejected since UKIs require UEFI.
--firmware= (empty string) can also be used explicitly to disable
firmware loading for PE kernels.
Other changes:
- Extract OVMF pflash drive setup into cmdline_add_ovmf()
- Extract kernel image type detection into determine_kernel()
- Add smbios_supported() helper to centralize the SMBIOS availability
check (always available on x86, elsewhere requires firmware)
- Gate SMM, OVMF drives, SMBIOS11 and credential SMBIOS paths
on firmware/SMBIOS being available
- Beef up the credential logic to fall back to fw_cfg and kernel
command line in case SMBIOS is not available
coredumpctl: avoid unnecessary heap copy and decompression for field existence checks (#41520)
`print_list()` and `print_info()` used `RETRIEVE()` to `strndup()` the entire
`COREDUMP` field into a heap-allocated string, only to check whether it exists.
With `sd_journal_set_data_threshold(j, 0)` in `print_info()`, this copies the
full coredump binary (potentially hundreds of MB) to heap just to print
"Storage: journal".
This PR:
1. Makes `sd_journal_get_data()` output parameters optional (`NULL`-safe), so
callers can do pure existence checks without receiving the data.
2. Short-circuits `maybe_decompress_payload()` after `decompress_startswith()`
succeeds when neither output pointer is requested, skipping full blob
decompression for compressed journal entries.
3. Switches coredumpctl to pass `NULL, NULL` for the existence checks instead
of heap-copying via `RETRIEVE()`.
clangd: Strip GCC-only flags and silence unknown-attributes
Several GCC-only options in our compile_commands.json
(-fwide-exec-charset=UCS2, used by EFI boot code for UTF-16 string
literals, and -maccumulate-outgoing-args) cause clangd to emit
driver-level "unknown argument" errors. These can't be silenced through
Diagnostics.Suppress, so remove them via CompileFlags.Remove before
clang ever sees them.
Also suppress the -Wunknown-attributes warning that fires on every use
of _no_reorder_, since meson unconditionally expands it to the GCC-only
__no_reorder__ attribute when configured with GCC.
networkd-wwan: drop unreachable unknown-bearer fallback path
bearer_get_by_path() only succeeds when both modem and bearer are found.
On failure, trying bearer_new_and_initialize(modem, path) was
unreachable and relied on a modem value that is not returned on that
path.
Treat unknown bearers as no-op and rely on modem_map_bearers() for
association during initialization.
coredumpctl: use NULL outputs for COREDUMP existence checks
print_list() and print_info() used RETRIEVE() to strndup() the entire
COREDUMP field into a heap-allocated string, only to check whether it
exists. With sd_journal_set_data_threshold(j, 0) in print_info(),
this copies the full coredump binary (potentially hundreds of MB) to
heap just to print "Storage: journal".
Now that sd_journal_get_data() accepts NULL output pointers, use a
direct NULL/NULL existence check instead.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
sd-journal: skip full decompression when caller only checks field existence
When both ret_data and ret_size are NULL after decompress_startswith()
has confirmed the field matches, skip the decompress_blob() call.
This avoids decompressing potentially large payloads (e.g. inline
coredumps) just to discard the result.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
sd-journal: make sd_journal_get_data() output params optional
Allow callers to pass NULL for ret_data and/or ret_size when they only
need to check whether a field exists. Initialize provided output
pointers to safe defaults and update the manual page accordingly.
Propagate the NULL-ness through to journal_file_data_payload() so that
downstream helpers can optimize for the existence-check case.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
tmpfiles: skip redundant label writes to avoid unnecessary timestamp changes
When systemd-tmpfiles processes a 'z' (relabel) entry, fd_set_perms()
unconditionally calls label_fix_full() even when mode, owner, and group
already match. This causes setfilecon_raw() (SELinux) or xsetxattr() (SMACK)
to write the security label even if it is already correct, which on some
kernels updates the file's timestamps unnecessarily.
Fix this by comparing the current label with the desired label before
writing, and skipping the write when they already match. This is consistent
with how fd_set_perms() already skips chmod/chown when the values are
unchanged.
networkd-wwan: handle link_get_by_name() errors in modem_simple_connect()
modem_simple_connect() ignored the return value of link_get_by_name()
and then checked link for NULL. Since the helper only sets the output
pointer on success, that could read an indeterminate value.
Check and log the return code directly with log_debug_errno().
Timestamps are not guaranteed to be set by `statx()`, and their presence
should not be asserted as a proxy to judge the kernel version. In
particular, `STATX_ATIME` is omitted from the return when querying a
file on a `noatime` superblock, causing spurious errors from tmpfiles:
```console
# SYSTEMD_LOG_LEVEL=debug systemd-tmpfiles --clean
<...>
Running clean action for entry X /var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-*/tmp
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-prometheus-smartctl-exporter.service-GKguQK/tmp) failed: Protocol driver not attached
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-systemd-logind.service-k8j52T/tmp) failed: Protocol driver not attached
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-irqbalance.service-7RJkev/tmp) failed: Protocol driver not attached
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-chronyd.service-8hkO5G/tmp) failed: Protocol driver not attached
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-dbus-broker.service-6P6LVl/tmp) failed: Protocol driver not attached
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-nginx.service-B5HX8B/tmp) failed: Protocol driver not attached
Running clean action for entry x /var/tmp/systemd-private-94cc8a77688e497f96d5b9019e66ed6f-*
Running clean action for entry q /var/tmp
statx() does not support 'STATX_ATIME' mask (running on an old kernel?)
statx(/var/tmp) failed: Protocol driver not attached
<...>
```
Additionally, refactor `dir_cleanup()` slightly for self-consistency to
make it evident that the `NSEC_INFINITY` transformation is correct.
fstab-generator: support swap on network block devices
Teach swap units to support the _netdev option as well, which should
make swaps on iSCSI possible. This mirrors the logic we already have for
regular mounts in both the fstab-generator and the core
(mount.c/swap.c).
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
One more round, this time with the help of the claudebot, especially for
spelunking in git blame to find the original commit and writing commit
messages from the list of warnings exported from coverity
Co-developed-by: Claude <claude@anthropic.com>
sysext: provide systemd-{sysext,confext}-sysroot.service services (#41161)
This should pretty much close #38985
The new services are used to activate system and configuration
extensions for the main system from the initrd, this allows to overcome
the limitation that sysext/confext cannot be used to update the
resources which are required in the earliest boot of the system (before
systemd-sysext/systemd-confext start).
To make it possible to disable sysext/confext merging logic,
`systemd.sysext=0`, `systemd.confext=0`, `rd.systemd.sysext=0`,
`rd.systemd.confext=0` kernel cmdline options are introduced.
limits-util: use MUL_SAFE for physical memory calculation
Coverity flags (uint64_t)sc * (uint64_t)ps as a potential overflow.
Use MUL_SAFE which Coverity understands via __builtin_mul_overflow.
Physical page count times page size cannot realistically overflow
uint64_t, but this makes it provable to static analyzers.
Coverity flags si.ssi_signo as tainted data from read(), and warns
that casting it to signed could produce a negative value. Add an
explicit range check against INT_MAX before the SIGNAL_VALID check
to prove the cast is safe.
Coverity flags ALIGN(sizeof(sd_bus_message)) as potentially
returning SIZE_MAX, making the subsequent + sizeof(BusMessageHeader)
overflow. Store the ALIGN result in a local and assert it is not
SIZE_MAX.
sd-bus: use INC_SAFE and assert for message_from_header allocation
Coverity flags ALIGN() as potentially returning SIZE_MAX and the
subsequent a += label_sz + 1 as overflowing. Assert ALIGN result
is not SIZE_MAX and use INC_SAFE for the addition.
Coverity flags now() + 30 * USEC_PER_SEC as overflowing because
now() can return USEC_INFINITY. Use usec_add() which saturates
on overflow instead of wrapping.
Coverity flags sizeof(BusMessageHeader) + ALIGN8(m->fields_size)
as overflowing because ALIGN_TO can return SIZE_MAX as an overflow
sentinel. Assert that the aligned value is not SIZE_MAX to prove
the addition is safe.
recurse-dir: add assert for MALLOC_SIZEOF_SAFE lower bound
Coverity flags MALLOC_SIZEOF_SAFE(de) - offsetof(DirectoryEntries,
buffer) as a potential underflow when MALLOC_SIZEOF_SAFE returns 0.
After a successful malloc the return value is at least as large as
the requested size, but Coverity cannot trace this. Add an assert
to establish the lower bound.
Coverity flags range->n_entries - j - 1 and j-- as potential
underflows. Add an assert that j > 0 before decrementing, since
j starts at i + 1 >= 1 and is never decremented below its
initial value.
scsi_id: null-terminate serial after append_vendor_model
append_vendor_model() uses memcpy() to write VENDOR_LENGTH +
MODEL_LENGTH bytes without null-terminating. While the caller
zeroes the buffer beforehand, Coverity cannot trace this. Add
explicit null termination so the subsequent strlen() is provably
safe.
Uses stop_at_first_nonoption for POSIX-style option parsing.
Includes a fixup for b4df0a9ee62d553e21f3b70c28841cfd1b8736f1, where
global optarg was used instead of the function param. This made no
difference previously because they were always equal.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Michael Vogt [Tue, 24 Mar 2026 14:40:40 +0000 (15:40 +0100)]
logind: log peer ID when shutdown is called
The io.systemd.Manager.{PowerOff,SoftReboot,Halt,Kexec} manager
varlink and bus methods log the peer ID when calling shutdown.
The logind code is missing this, so this commit adds similar
logging now. The code is quite similar to the existing one in
src/core/manager.c but it's hard to share code, so this adds a bit
of duplication.
Michael Vogt [Fri, 20 Mar 2026 12:49:50 +0000 (13:49 +0100)]
logind: extend verify_shutdown_creds() to take `sd_varlink *link`
To properly support a shutdown interface with varlink we need
the functionality of verify_shutdown_creds(). This is currently
dbus only. There are some options:
1. refactor and abstract so that verify_shutdown_creds() is agnostic
2. provide an equivalent function with varlink
3. allow to call it with either a dbus or a varlink message.
The most elegant of course is (1) but it makes reviewing harder
and has a higher regression risk. It will also be more code.
Doing (2) has the risk of drift, i.e. we will need to keep
the two functions in sync and not forget about it ever.
So this commit opts for (3): allowing either dbus or varlink.
This is quite ugly, however the big advantage is that it's very
simple to review, as the dbus/varlink branches mirror each other.
And there is no risk of drift, since the dbus/varlink branches sit
close to each other. It's unlikely we get a third bus, so it will
most likely stay this way. It is still ugly though, so I can
understand if this is undesired, and I can look into (1) if it's
too ugly.
With this function available, logind-varlink.c is now updated to
use it.
Michael Vogt [Fri, 20 Mar 2026 12:42:30 +0000 (13:42 +0100)]
logind: move verify_shutdown_creds() to logind-shutdown.c
Move verify_shutdown_creds() and its helper have_multiple_sessions()
from logind-dbus.c to logind-shutdown.c so that they can be reused
by the varlink transport. No functional changes.
Also prefix both with `manager_` now that they are public.
The shutdown interface is currently only exposed via dbus. This commit
adds a comparable varlink implementation. It is inspired by the existing
dbus methods and implements PowerOff, Reboot, Halt, Kexec, SoftReboot.
It is (intentionally) simpler than the dbus implementation for now,
i.e. strictly root-only. To match dbus we will need the functionality
of verify_shutdown_creds(), which is dbus-ish right now and would
need some refactoring.
For the same reason it does not do the Can* methods - we will need
the verify_shutdown_creds() equivalent first.
Michael Vogt [Fri, 20 Mar 2026 12:37:42 +0000 (13:37 +0100)]
logind: move reset_scheduled_shutdown() to new logind-shutdown.c
This function operates on generic Manager state and will be needed
by the varlink shutdown interface too. Move it out of logind-dbus.c
into a new logind-shutdown.c, alongside the SHUTDOWN_SCHEDULE_FILE
define, and use `manager_reset_scheduled_shutdown()` as the new
name.
coredumpctl: use loop_write() for dumping inline journal coredumps
Replace the bare write() call with loop_write(), which handles short
writes and EINTR retries. This also drops the now-unnecessary ssize_t
variable and the redundant r = log_error_errno(r, ...) self-assignment,
since loop_write() already stores its result in r.
vmspawn: Always enable CXL on supported architectures
Drop the --cxl= option and unconditionally set cxl=on on the QEMU
machine type whenever the host architecture supports it (x86_64 and
aarch64). The flag was only added for testing parity with mkosi's CXL=
setting and there is no reason to leave it as an opt-in toggle: with no
pxb-cxl device or cxl-fmw window attached, enabling it on the machine
only reserves a small MMIO region and emits an empty CEDT, so the cost
is negligible while removing one knob users would otherwise have to
flip explicitly to exercise the CXL code paths in QEMU.
Reject entries once the configured maximum field count is reached.
The previous check used n > ENTRY_FIELD_COUNT_MAX before appending a new field,
which let one extra field through in boundary cases. Switch the check to
n >= ENTRY_FIELD_COUNT_MAX so an entry at the limit is rejected before adding
another property.
Jonas Rebmann [Tue, 7 Apr 2026 09:03:48 +0000 (11:03 +0200)]
test-specifier: update comment to moved file
src/partition/repart.c was renamed to src/repart/repart.c in commit
211d2f972dd1 ("Rename src/partition to src/repart"); update the comment
accordingly.
The -E short option previously used fallthrough into the --more case;
since macro-generated case labels don't support fallthrough (with some
older compilers), the --more logic is now duplicated inline in the -E
handler.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
shared/options: quote the metavar in --help output
imdsd uses --extra-header='NAME: VALUE'. We could include the quotes
in the metavar string, but I think it's nicer to only do that in the
printed output, so that later, when we add introspection, the value
there will not include the quotes.
Vitaly Kuznetsov [Thu, 19 Mar 2026 15:04:34 +0000 (16:04 +0100)]
sysext: provide a cmdline kill switch for the sysext/confext merging logic
While it is possible to disable sysext/confext merging in the main system
with 'systemctl disable', sysext/confext are always merged in the initrd,
both by systemd-{sys,conf}ext-initrd.service and by
systemd-{sys,conf}ext-sysroot.service and especially the latter can be
unexpected. Provide kernel cmdline options systemd.{sys,conf}ext=0 and
rd.systemd.{sys,conf}ext=0 covering all options.
Vitaly Kuznetsov [Wed, 18 Mar 2026 16:09:24 +0000 (17:09 +0100)]
sysext: provide systemd-{sysext,confext}-sysroot.service services
The new services are used to activate system and configuration extensions
for the main system from the initrd, this allows to overcome the limitation
that sysext/confext cannot be used to update the resources which are required
in the earliest boot of the system (before systemd-sysext/systemd-confext
start).
- Fix sd_json_variant_unsigned() dispatching to the wrong accessor
for json variant references.
- Fix a use-after-free of a borrowed varlink reply reference in
ssh-proxy.
vmspawn: use machine name in runtime directory path (#41530)
Replace the random hex suffix in the runtime directory with the machine
name, changing the layout from /run/systemd/vmspawn.<random> to
/run/systemd/vmspawn/<machine-name>/.
This makes runtime directories machine-discoverable from the filesystem
and groups all vmspawn instances under a shared parent directory,
similar to how nspawn uses /run/systemd/nspawn/.
Use runtime_directory_generic() instead of runtime_directory() since
vmspawn is not a service with RuntimeDirectory= set and the
$RUNTIME_DIRECTORY check in the latter never succeeds. The directory is
always created by vmspawn itself and cleaned up via
rm_rf_physical_and_freep on exit. The parent vmspawn/ directory is
intentionally left behind as a shared namespace.
Ivan Shapovalov [Fri, 20 Mar 2026 15:45:07 +0000 (16:45 +0100)]
tmpfiles: do not mandate `STATX_ATIME` and `STATX_MTIME`
Timestamps are not guaranteed to be set by `statx()`, and their presence
should not be asserted as a proxy to judge the kernel version. In
particular, `STATX_ATIME` is omitted from the return when querying a
file on a `noatime` superblock, causing spurious errors from tmpfiles.
Correctness analysis
====================
The timestamps produced by the `statx()` call in `opendir_and_stat()`
are only ever used once, in `clean_item_instance()` (lines 3148-3149)
as inputs to `dir_cleanup()`. Convert absent timestamps into
`NSEC_INFINITY` as per the previous commit.
Ivan Shapovalov [Fri, 20 Mar 2026 15:36:44 +0000 (16:36 +0100)]
tmpfiles: use `NSEC_INFINITY` consistently in dir_cleanup()
Correctness analysis
====================
The *time_nsec variables are used for a total of 2 or 3 times:
- twice in needs_cleanup() (lines 788, 839)
- once in a recursive dir_cleanup() (line 764) as self_*time_nsec
In needs_cleanup(), all passed timestamps are guarded against
NSEC_INFINITY (this does not fix any real bugs as a 0 value is also
older than any cutoff point and thus would not cause any deletions).
Recursively in dir_cleanup(), the self_* variables are used to reset
the toplevel directory utimes, where they are superficially compared
against NSEC_INFINITY as a guard, but subsequently mishandled in the
case when only one of the times is NSEC_INFINITY: in this case, it will
be a) logged as a bogus value and b) passed through directly to
timespec_store_nsec(), which does special-case it, but in a way that
is invalid for futimens(). This is further fixed up by explicitly
mapping NSEC_INFINITY to TIMESPEC_OMIT.
This constitutes a bugfix in theory, as a ~STATX_ATIME return from
statx() would have previously caused the corresponding utime to be
reset to the 0 epoch rather than being omitted from being set. However,
in a directory with ~STATX_ATIME, attempts to set atime would likely
be ignored as well.
Mostly this is a self-consistency fix that establishes that
dir_cleanup() should be called with NSEC_INFINITY in place of
absent timestamps.