strxcpyx: add a paranoia check for vsnprintf()'s return value
vsnprintf() can, under some circumstances, return a negative value, namely
during encoding errors when converting wchars to multi-byte characters.
This would then wreak havoc in the arithmetic we do following the
vsnprintf() call. However, since we never do any wchar shenanigans in
our code it should never happen.
Let's encode this assumption into the code as an assert(), similarly to how
we already do this in other places (like strextendf_with_separator()).
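A minimal sketch of the pattern, for illustration only. format_into() is a hypothetical helper and not the actual strxcpyx code; the point is just that the return value is asserted to be non-negative before it is used in size arithmetic.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not the real strxcpyx API: formats into dest
 * (capacity "size", including the trailing NUL) and returns how many
 * bytes are still free afterwards. Output is truncated if it does not fit. */
static size_t format_into(char *dest, size_t size, const char *fmt, ...) {
        va_list ap;
        int n;

        assert(dest);
        assert(size > 0);

        va_start(ap, fmt);
        n = vsnprintf(dest, size, fmt, ap);
        va_end(ap);

        /* A negative return value signals an output/encoding error. No wide
         * characters are formatted here, so treat it as a programming error
         * instead of letting it leak into the arithmetic below. */
        assert(n >= 0);

        if ((size_t) n >= size)
                return 0; /* truncated, nothing left */

        return size - (size_t) n - 1;
}
```

With a 16 byte buffer, formatting "%s-%d" with "ab" and 7 stores "ab-7" and leaves 11 bytes free.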
iovec-wrapper: rename iovw_append() to iovw_extend()
The naming is consistent with strv_extend().
This also
- introduces a tiny iovw_extend_iov() wrapper,
- refuses when the source and target point to the same object,
- checks the final count before extending in iovw_extend_iovw().
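A rough sketch of the two new checks, under stated assumptions: struct iovw_sketch, iovw_sketch_extend() and SKETCH_IOV_MAX are made-up names, the error codes are guesses, and the entries are copied shallowly, which may differ from what the real iovw_extend_iovw() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_IOV_MAX 1024U /* assumed upper bound, for illustration */

/* Hypothetical, simplified wrapper around an iovec array. */
struct iovw_sketch {
        struct iovec *iovec;
        size_t count;
};

static int iovw_sketch_extend(struct iovw_sketch *target, const struct iovw_sketch *source) {
        struct iovec *p;

        assert(target);
        assert(source);
        assert(target->count <= SKETCH_IOV_MAX);

        /* Refuse to extend an object with itself. */
        if (target == source)
                return -EINVAL;

        if (source->count == 0)
                return 0;

        /* Check the final count before touching the target. */
        if (source->count > SKETCH_IOV_MAX - target->count)
                return -E2BIG;

        p = realloc(target->iovec, (target->count + source->count) * sizeof(struct iovec));
        if (!p)
                return -ENOMEM;

        /* Shallow copy of the entries; the data they point to is not duplicated. */
        memcpy(p + target->count, source->iovec, source->count * sizeof(struct iovec));
        target->iovec = p;
        target->count += source->count;
        return 0;
}
```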
repart: add EncryptKDF= option for LUKS2 partitions
systemd-repart currently creates LUKS2 encrypted partitions using
libcryptsetup's default KDF (Argon2id), which requires ~1GB of memory
during key derivation. This is too much for memory-constrained
environments such as kdump with limited crashkernel memory, where
luksOpen fails due to insufficient memory.
Add an EncryptKDF= option to repart.d partition definitions that allows
selecting the KDF type. Supported values are:
- "argon2id" — Argon2id with libcryptsetup-benchmarked parameters
- "pbkdf2" — PBKDF2 with libcryptsetup-benchmarked parameters
- "minimal" — PBKDF2 with SHA-512, 1000 iterations, no benchmarking,
matching the existing cryptsetup_set_minimal_pbkdf() behaviour used
for TPM2-sealed keys
When not specified, the libcryptsetup default (argon2id) is used,
preserving existing behaviour.
The KDF type is applied via sym_crypt_set_pbkdf_type() after
sym_crypt_format() and before any keyslots are added.
These don't make too much sense on their own, but they also don't really
hurt. They are preparation for #41543, but in order to make things
easier to review I split these four commits out, since they are not
directly part of what the PR shall achieve.
The NOTES section in os-release(5) contains unusual formatting.
Switch function and ulink tags and remove a newline within ulink text to
keep the entry formatting in sync with others. Also, this preserves the
formatting within the text itself.
mountpoint-util: initialize mnt_id for name_to_handle_at(AT_HANDLE_MNT_ID_UNIQUE)
Suppress the following message:
```
$ sudo valgrind --leak-check=full build/networkctl dhcp-lease wlp59s0
==175708== Memcheck, a memory error detector
==175708== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==175708== Using Valgrind-3.26.0 and LibVEX; rerun with -h for copyright info
==175708== Command: build/networkctl status wlp59s0
==175708==
==175708== Conditional jump or move depends on uninitialised value(s)
==175708== at 0x4BC33D1: inode_same_at (stat-util.c:610)
==175708== by 0x4BF1972: inode_same (stat-util.h:86)
==175708== by 0x4BF48FE: running_in_chroot (virt.c:817)
==175708== by 0x4B16643: running_in_chroot_or_offline (verbs.c:37)
==175708== by 0x4B175CE: _dispatch_verb_with_args (verbs.c:136)
==175708== by 0x4B17868: dispatch_verb (verbs.c:160)
==175708== by 0x407CBB: networkctl_main (networkctl.c:249)
==175708== by 0x407D06: run (networkctl.c:263)
==175708== by 0x407D39: main (networkctl.c:266)
==175708==
```
Not sure if it is an issue in valgrind or glibc, but at least there is
nothing we can do except for working around it.
sleep: convert to "verbs", using the new option+verb macros
We had verb-like dispatch, but done in a manual way. We have fairly
heavy preparation steps that wrap all operations in the same way, so we
don't want to call the operation implementation functions directly. But
let's use the generic verb machinery and pass the state directly using
the userdata pointer and the recently added verb data pointer.
--help output is substantially the same, but options are now in a new
section below the verbs.
bootctl: make bootspec-util.c independent of bootctl.c
This changes boot_config_load_and_select() to also take the root path as
input, just like the ESP and XBOOTLDR path.
This has the benefit of making the whole file independent of bootctl.c,
which means we can link it into a separate test, and is preparatory work
for a follow-up commit.
If unprivileged_mode is false then verify_esp() will treat access errors
like any other and log about them. Here we set it to false, hence
there's no point to log a 2nd time.
boot: never auto-boot a menu entry with the non-default profile
When figuring out which menu entry to pick by default, let's not
consider any with a profile number > 0. This reflects the fact that
additional profiles are generally used for
debug/recovery/factory-reset/storage target mode boots, and those should
never be auto-selected. Hence do a simple check: if profile != 0, simply
do not consider the entry as a default.
We might eventually want to beef this up, and add a property one can set
in the profile metadata that controls this behaviour, but for now let's
just do this simple fix.
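The check can be sketched roughly like this. struct menu_entry and pick_default_entry() are hypothetical and heavily simplified; the real default selection in the boot loader considers much more than the profile number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified menu entry. */
struct menu_entry {
        const char *id;
        unsigned profile; /* 0 is the default profile */
};

/* Returns the index of the first entry eligible for auto-boot, or -1 if
 * there is none. */
static int pick_default_entry(const struct menu_entry *entries, size_t n) {
        assert(entries || n == 0);

        for (size_t i = 0; i < n; i++) {
                /* Additional profiles (debug, recovery, factory reset, ...)
                 * are never auto-selected. */
                if (entries[i].profile != 0)
                        continue;

                return (int) i;
        }

        return -1;
}
```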
namespace: don't log misleading error in the r > 0 path
fd_is_fs_type() returns < 0 for errors, 0 for false, and > 0 for true, so
in the r > 0 branch we'd most likely report EPERM together with the error
message which is misleading.
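To illustrate why a positive result must not be passed on as an error code: on Linux EPERM is 1, so a "true" result of 1 fed into an errno-style logging call reads as "Operation not permitted". The helper below is a hypothetical sketch, not the code in namespace.c.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: given a result following the "< 0 error, 0 false,
 * > 0 true" convention, returns the errno value that may accompany a log
 * message, or 0 if the result is not an error at all. */
static int errno_for_log(int r) {
        if (r < 0)
                return -r; /* genuine failure: r is a negative errno */

        return 0; /* 0 (false) or > 0 (true): not an errno */
}
```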
Allows appending kernel command line arguments, like
kexec-tool does. This is especially needed for the integration
tests, as mkosi adds a bunch of options that are needed for the
test suite to work, and it breaks without them.
The interface of this program was rather strange. It took an option that
specified what to do, but that option behaved exactly like a verb. Let's
change the interface to the more modern style with verbs. Since the
interface was documented in the man page, provide a compat shim to handle
the old options.
(In practice, I doubt anybody will notice the change. But since it was
documented, it's easier to provide the compat than to think too much
whether it is actually needed. I think we can drop it in a year or so.)
Extend fake-report-server.py with optional --cert, --key, --port
arguments for TLS support. Add a test case that generates a
self-signed certificate and tests HTTPS upload of metrics and facts.
Also exercise the --header param.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a fake HTTP server (fake-report-server.py) that accepts JSON POST
requests and validates the report structure, and test cases in
TEST-74-AUX-UTILS.report.sh that exercise plain HTTP upload of both
metrics and facts.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
timesync: verify the actual size of the received data
iov.iov_len doesn't change after calling recvmsg() so it remains set to
sizeof(ntpmsg), which makes the check for a short packet always false.
Let's fix that by checking the actual size of the received data instead.
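A minimal sketch of the corrected check. struct fake_ntp_msg and receive_full_message() are stand-ins invented for this example, not the timesyncd code; only the use of the recvmsg() return value matters.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-in for the real NTP message struct; only its size matters here. */
struct fake_ntp_msg {
        unsigned char data[48];
};

/* Receives one datagram from fd. Returns 1 if a full message arrived,
 * 0 if the packet was too short, and -1 on a receive error. */
static int receive_full_message(int fd, struct fake_ntp_msg *ret) {
        struct iovec iov = {
                .iov_base = ret,
                .iov_len = sizeof(*ret),
        };
        struct msghdr mh = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
        };
        ssize_t n;

        assert(ret);

        n = recvmsg(fd, &mh, 0);
        if (n < 0)
                return -1;

        /* iov.iov_len still equals sizeof(*ret) here, so comparing it with
         * the expected size can never detect a short packet. The return
         * value of recvmsg() is the number of bytes actually received. */
        if ((size_t) n < sizeof(*ret))
                return 0;

        return 1;
}
```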
ci: Switch PR review workflow to Opus 4.7 via Mantle endpoint
Opus 4.7 is in research preview on Bedrock and the Invoke API rejects
the beta headers Claude Code sends ("invalid beta flag"). Enable the
Mantle endpoint, which serves Claude via the native Anthropic API shape
and accepts those headers, and switch the model ID to the Mantle form
(no region prefix or version suffix).
All non-test users of iovec_wrapper define the struct as a field in a
bigger structure, so we never free it individually. Let's simplify the
code and assume it is never null.
journal-upload: require TLS 1.2 as the minimum version
RFC 8996 says:
> This document formally deprecates Transport Layer Security (TLS)
> versions 1.0 (RFC 2246) and 1.1 (RFC 4346). Accordingly, those
> documents have been moved to Historic status. These versions lack
> support for current and recommended cryptographic algorithms and
> mechanisms, and various government and industry profiles of
> applications using TLS now mandate avoiding these old TLS versions.
> TLS version 1.2 became the recommended version for IETF protocols in
> 2008 (subsequently being obsoleted by TLS version 1.3 in 2018),
> providing sufficient time to transition away from older versions.
> Removing support for older versions from implementations reduces the
> attack surface, reduces opportunity for misconfiguration, and
> streamlines library and product maintenance.
This code probably only talks to our own receiver which uses
libmicrohttpd. That in turn delegates to GnuTLS, which supports
1.2, 1.3, 3.0, etc.
Previously we compiled curl-util.c at least two times, and then also
shared it using the extract+object. Let's build a static "convenience lib"
for it.
(Using extract+object everywhere is not possible because the different
places where it is used are conditionalized independently so we don't
have a single "source" that is always available.)
cgls: fix/update/restore the handling of --xattr and --cgroup-id
This is a bit tricky. Previously, the --help string said
"-x --xattr=BOOL", which normally means that '-x BOOL' and '--xattr BOOL'
and '--xattr=BOOL' are all accepted and equivalent. But actually only
the third form was accepted. '-x' should have been and is now documented
as "Same as --xattr=true". The man page tried to explain this, but not
very strongly. So update the man page to have more emphasis and restore
the special behaviour for -x and -c. This is an old program, so I
think in this case, maintaining compatibility in behaviour is important.
arg_names is changed to be a normal strv. In the new parser code,
the argument is returned as a const char*. Dropping the const
to stuff it into the array would be too ugly.
The metavars for optional args are now shown in --help.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
shared/options: add equivalent of "-" in getopt_long
The parsing "mode" is specified as an exclusive mode, i.e. a combination
of "+" and "-" is not supported. In principle this could be supported,
but we don't use that in our code and are unlikely ever to do so.
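For reference, this is the getopt_long() behaviour being mirrored, shown with getopt_long() itself rather than the new shared/options code. With a leading "-" in the option string, each non-option argument is returned in place as if it were the argument of an option with code 1. parse_in_order() is a made-up helper for this example.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Records the order in which arguments are seen into out[]: 'x' for
 * -x/--xattr, 'A' for a positional argument, '?' for anything else.
 * Returns the number of items recorded. */
static size_t parse_in_order(int argc, char **argv, char *out, size_t out_size) {
        static const struct option options[] = {
                { "xattr", no_argument, NULL, 'x' },
                { NULL, 0, NULL, 0 },
        };
        size_t n = 0;
        int c;

        assert(argv);
        assert(out);
        assert(out_size > 0);

        optind = 0; /* glibc: fully reinitialize the parser state */

        while ((c = getopt_long(argc, argv, "-x", options, NULL)) >= 0) {
                if (n + 1 >= out_size)
                        break;

                if (c == 1)
                        out[n++] = 'A'; /* positional argument, text is in optarg */
                else if (c == 'x')
                        out[n++] = 'x';
                else
                        out[n++] = '?';
        }

        out[n] = '\0';
        return n;
}
```

Unlike the default mode, argv is not permuted, so the relative order of options and positional arguments is preserved.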
Kai Lüke [Thu, 16 Apr 2026 06:24:27 +0000 (15:24 +0900)]
sysupdated: don't crash when an mstack machine image is found
As soon as machinectl list-images has an mstack entry, updatectl fails
because systemd-sysupdated crashes on a failed assertion: the mstack
case was not handled.
For now mstack is not supported as an image for sysupdate to operate on
and we can skip it.
This will make it easier to add new modes of operation later.
But I'm happy with how this came out — I think the mode setting
is nicer to read than the old bool.
shared: move src/import/curl-util.h to src/shared/
Move more common definitions into the header file instead of repeating
them in a bunch of places. src/import/curl-util.[ch] is renamed so that
it's shared more naturally with other components.
sd-varlink: Don't log successful sentinel error dispatch as a failure
sd_varlink_error() deliberately returns a negative errno mapped from
the error id on success so callbacks can `return sd_varlink_error(...);`
to enqueue the reply and propagate a matching errno at once. When
varlink_dispatch_method() dispatches a configured error sentinel itself,
it doesn't need that mapping — but it was treating any negative return
as a dispatch failure and logging "Failed to process sentinel" even
though the error reply had been successfully enqueued.
Detect success via the state transition to VARLINK_PROCESSED_METHOD
instead, so only genuine enqueue failures are logged.
systemd-vmspawn: QMP-varlink bridge for VM runtime control (#41449)
systemd-vmspawn currently has zero runtime control over the VMs it
launches. It can kill QEMU (SIGTERM) or SSH in, but it cannot pause,
resume, request a graceful power-off, query status, or
react to VM events. QEMU exposes all of this via its QMP protocol;
systemd's native IPC is varlink. This series bridges the two.
machined stores the controlAddress but never connects to vmspawn.
machinectl discovers the address from Machine.List and connects
directly. Socket mode 0600 is the access-control boundary —
the socket is rooted in vmspawn's $RUNTIME_DIRECTORY, so only the UID
that launched the VM can talk to it.
QMP client library (src/shared/qmp-client.{c,h})
A small non-blocking QMP client modeled on sd-varlink's pump contract:
- Reference-counted QmpClient with an explicit five-state machine:
HANDSHAKE_INITIAL → HANDSHAKE_GREETING_RECEIVED →
HANDSHAKE_CAPABILITIES_SENT → RUNNING → DISCONNECTED.
- qmp_client_connect_fd() is non-blocking: it wraps the fd in a
JsonStream and returns immediately. The greeting + qmp_capabilities
handshake is driven lazily on the first
qmp_client_invoke() or by the event loop — whichever comes first — so
callers never block during connect.
- qmp_client_attach_event() attaches to sd_event for async operation;
qmp_client_process() performs one pump step (write → dispatch → parse →
read → disconnect) with the same contract as
sd_varlink_process(); qmp_client_wait() blocks until the next I/O event.
- qmp_client_invoke() sends an async command and fires the registered
qmp_command_callback_t with (result, error_desc, error, userdata) on
completion. Synchronous callers drive
process()/wait() in a loop until qmp_client_is_idle() is true.
- QmpClientArgs bundles the JSON arguments and an FD list for a single
command; the QMP_CLIENT_ARGS_FD() macro hands one fd to the callee for
SCM_RIGHTS passing. On partial-stage failure the
args list is narrowed so the caller's cleanup closes only the
untransferred tail.
- Event broadcast to a registered callback via qmp_client_bind_event();
transport loss surfaces through qmp_client_bind_disconnect().
- qmp_schema_has_member() walks the query-qmp-schema result for optional
runtime capability probes.
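The five-state machine from the list above could be sketched as follows. The enum and function names are hypothetical and not the identifiers used in qmp-client.c. It is also an assumption of this sketch that the handshake only advances one step at a time and that any live state may drop to DISCONNECTED on transport loss.

```c
#include <assert.h>
#include <stdbool.h>

/* State names follow the description above, with a made-up prefix. */
typedef enum SketchQmpState {
        SKETCH_HANDSHAKE_INITIAL,
        SKETCH_HANDSHAKE_GREETING_RECEIVED,
        SKETCH_HANDSHAKE_CAPABILITIES_SENT,
        SKETCH_RUNNING,
        SKETCH_DISCONNECTED,
        SKETCH_STATE_MAX,
} SketchQmpState;

/* Returns true if moving from "from" to "to" is allowed under the
 * assumptions stated above. */
static bool sketch_transition_allowed(SketchQmpState from, SketchQmpState to) {
        assert((int) from >= 0 && from < SKETCH_STATE_MAX);
        assert((int) to >= 0 && to < SKETCH_STATE_MAX);

        if (from == SKETCH_DISCONNECTED)
                return false; /* terminal state */

        if (to == SKETCH_DISCONNECTED)
                return true; /* transport loss can happen at any time */

        return (int) to == (int) from + 1; /* otherwise strictly one step forward */
}
```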
vmspawn device setup via QMP
vmspawn starts QEMU paused (-S), sets up devices via QMP, then resumes
with cont. The entire device plane moves off the legacy INI config path
and onto the bridge.
A new MachineConfig aggregate in vmspawn-qmp.h groups the per-device
info (DriveInfos, NetworkInfo, VirtiofsInfos, VsockInfo) with a single
machine_config_done() cleanup that chains the
sub-structure destructors; each conversion patch populates exactly the
field it owns.
What the conversion enables:
- FD-based device passing via add-fd / getfd + SCM_RIGHTS — vmspawn
opens every image file, TAP, VSOCK, and virtiofs socket itself and hands
the fd to QEMU. QEMU never needs filesystem
access.
- Ephemeral overlays via blockdev-create + async job-concluded
continuations on anonymous O_TMPFILE / memfd backings — no named overlay
files on disk.
- PCIe root-port pre-allocation for q35/virt machine types so
hotplug-capable slots exist at boot (NVMe, virtio-scsi, etc.).
- io_uring availability probing with automatic fallback to the default
AIO backend if QEMU's build doesn't support it.
Per-command callbacks call sd_event_exit() on setup failure so vmspawn
shuts down cleanly if any device can't be attached.
machinectl integration
- machinectl pause / resume / poweroff / reboot / terminate go through
the varlink control socket for VMs.
- D-Bus fallback for containers: poweroff sends SIGRTMIN+4, terminate
calls the existing TerminateMachine method — unchanged container
behavior.
- Multi-machine parallel dispatch via sd_event for bulk operations
(machinectl pause vm1 vm2 ...) so one slow VM doesn't serialize the
rest.
- SubscribeEvents streaming with per-subscriber event-name filters
(importd Pull-style pattern: initial {ready:true} notify, fan out via
varlink_many_notifybo(), lazy init — QMP event pump
runs only while subscribers exist).
Tests
- Unit test with a mock QMP server covering handshake, command/response,
events, and EOF.
- Integration test against real QEMU (-machine none) exercising
handshake + query-qmp-schema (~200 KB reply, validates the buffered
reader across multiple read()s) and query-status.
- Integration test for the machinectl verbs end-to-end: pause / resume /
describe / subscribe / terminate.
- Integration test for the multi-drive pipeline and ephemeral overlays
(blockdev-create async job continuations).
- Stress test: 5 cycles of start → 3× (pause/describe/resume/describe) →
terminate.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
vmspawn: add integration test for multi-drive and ephemeral QMP setup
Test the async QMP drive pipeline with real QEMU:
Test 1 (multi-drive): launches vmspawn with --image plus two
--extra-drive flags. This exercises multiple fdset allocations,
pipelined blockdev-add commands relying on FIFO ordering, io_uring
retry callbacks, and multiple device_add commands — all fired
without waiting for responses.
Test 2 (ephemeral): launches vmspawn with --image --ephemeral. This
exercises the most complex async path: blockdev-create fires a
background job, JOB_STATUS_CHANGE events are watched via the event
callback, and when the job concludes the deferred continuation fires
the overlay format node + device_add. If the continuation fails, the
root drive is never attached, the kernel panics, and vmspawn exits
without registering — so successful registration proves the pipeline
works.
Both tests use a raw ext4 image with a minimal init (sleep infinity)
and direct kernel boot. No virtiofsd needed.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
vmspawn: add integration test for machinectl VM control verbs
Add TEST-87-AUX-UTILS-VM.vmspawn.sh that validates the QMP-varlink
bridge end-to-end using a real QEMU instance:
- Launches vmspawn with --directory and --linux for direct kernel boot
(no UEFI firmware or bootable image needed)
- Waits for machine registration with machined
- Verifies varlinkAddress is exposed in Machine.List
- Tests machinectl pause, resume, poweroff
- Exercises MachineInstance varlink interface directly via varlinkctl:
QueryStatus state verification across pause/resume, Pause, Resume
Skipped automatically if vmspawn, QEMU, or a bootable kernel is not
available. Runs as part of TEST-87-AUX-UTILS-VM in the mkosi
integration test suite.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
vmspawn: add integration test for QMP client library against real QEMU
Add a test that launches QEMU with -machine none (no bootable image
needed) and exercises the QMP client library against the real QMP
implementation:
- test_qmp_client_qemu_handshake_and_schema: sends query-qmp-schema
(~200KB response that exercises the buffered multi-read() path)
via qmp_client_invoke(), then cleanly shuts down QEMU via quit.
The QMP handshake completes transparently inside invoke().
- test_qmp_client_qemu_query_status: validates query-status response
parsing, stop/cont command sequencing with id correlation, and state
verification between commands
The test is automatically skipped when QEMU is not installed.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
vmspawn: add integration test for QMP client library
Test the QMP client library using a mock QMP server over a socketpair:
- test_qmp_client_basic: Verifies full handshake, query-status with
response parsing, stop/cont commands, and asynchronous STOP event
delivery via the sd-event I/O callback
- test_qmp_client_eof: Verifies that the client properly detects
server disconnection (EOF) and returns a disconnect error
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
man: document machinectl pause/resume and update poweroff for VMs
Add manpage entries for the new pause and resume verbs. Update the
poweroff description to cover VMs (ACPI powerdown via QMP) in addition
to containers (SIGRTMIN+4).
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Each verb discovers the machine's varlinkAddress via machined's
Machine.List, connects directly to vmspawn's varlink socket, and
calls the corresponding io.systemd.MachineInstance method.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>