git.ipfire.org Git - thirdparty/systemd.git/log

vmspawn-varlink: drop AcquireQMP stub and QemuMachineInstance interface

The AcquireQMP() method was a placeholder that always returned
EOPNOTSUPP, reserving room for a future id-rewriting QMP multiplex
proxy. The broader direction is for systemd-vmspawn to remain the single
source of truth for VM control rather than exposing raw QMP to clients.

Since AcquireQMP was the only method on io.systemd.QemuMachineInstance
(and AlreadyAcquired was its only error), remove the whole interface
along with the stub, and update the controlAddress field comment in
io.systemd.Machine to stop referencing it.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-qmp: add vmspawn_qmp_remove_block_device

Hot-remove counterpart to vmspawn_qmp_add_block_device. Looks the
drive up in the bridge's block_devices registry by caller-supplied
id and dispatches device_del using the internal qmp_device_id; the
varlink link gets the immediate ack/error reply once QEMU completes
the request.

Concurrency: a second remove for the same id while the first is in
flight (between device_del dispatch and DEVICE_DELETED) would
otherwise reach QEMU and earn a confusing 'already in the process of
unplug' reply. Track the in-flight state with a new
BLOCK_DEVICE_REMOVE_PENDING bit on the existing rollback_mask, and
short-circuit duplicate calls with -EBUSY. The bit is cleared on
device_del failure (the drive is still attached, so retries make
sense) and naturally vanishes on success when the registry entry is
dropped.
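Below is a minimal sketch of the duplicate-remove guard, assuming `rollback_mask` is a plain unsigned bitmask. The struct, the function names (`Drive`, `drive_begin_remove()`, `drive_remove_failed()`) and the other stage bits are hypothetical. Only the idea of keeping `BLOCK_DEVICE_REMOVE_PENDING` in `rollback_mask` comes from the commit.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stage bits; only the REMOVE_PENDING idea is from the commit. */
enum {
        BLOCK_DEVICE_STAGE_ADD_FD     = 1u << 0,
        BLOCK_DEVICE_STAGE_BLOCKDEV   = 1u << 1,
        BLOCK_DEVICE_STAGE_DEVICE_ADD = 1u << 2,
        BLOCK_DEVICE_REMOVE_PENDING   = 1u << 3,
};

typedef struct Drive {
        unsigned rollback_mask;
} Drive;

/* Called when a remove request arrives: refuse duplicates while device_del is in flight. */
int drive_begin_remove(Drive *d) {
        assert(d);

        if (d->rollback_mask & BLOCK_DEVICE_REMOVE_PENDING)
                return -EBUSY;

        d->rollback_mask |= BLOCK_DEVICE_REMOVE_PENDING;
        return 0;   /* the caller would now dispatch device_del */
}

/* device_del failed: the drive is still attached, so allow a retry. */
void drive_remove_failed(Drive *d) {
        assert(d);
        d->rollback_mask &= ~BLOCK_DEVICE_REMOVE_PENDING;
}
```

On success the sketch needs no explicit clear: as in the commit, the bit goes away with the registry entry.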

DEVICE_DELETED handling: the actual blockdev-del + registry removal
+ pcie-port release is deferred to vmspawn_qmp_dispatch_device_deleted,
which fires from on_qmp_event in vmspawn-varlink.c when the guest
acks the eject. Hooking it from the existing QMP event dispatcher
keeps the cleanup local to vmspawn-qmp.{c,h}.

The function has no varlink callers in this PR — the
io.systemd.VirtualMachineInstance method handler that forwards into
it lands with the rest of the hotplug PR.

Signed-off-by: Christian Brauner <brauner@kernel.org>

vmspawn-qmp: add the hotplug-capable block-device add machinery

This is the bulk of the runtime block-device hotplug feature. The
boot-time qmp_setup_regular_drive() path is rewritten on top of a new
vmspawn_qmp_add_block_device() that owns the DriveInfo, drives a
four-command QMP pipeline (add-fd → blockdev-add → remove-fd → device_add)
through staged ref-counted callbacks, and registers the drive in a
per-bridge block-device registry on success.

New on the bridge:

  - block_devices: user-id → DriveInfo* registry (owned ref).
  - hotplug_port_owner[VMSPAWN_PCIE_HOTPLUG_SPARES]: per-port owner
    string for the spare pcie-root-ports, allocated/released through
    vmspawn_qmp_bridge_{allocate,release_pcie_port_by_idx}().
  - scsi_controller_port_idx / scsi_controller_created: track the
    on-demand virtio-scsi-pci controller so the first SCSI hotplug
    creates it (against an allocated spare port) and subsequent ones
    just attach.
  - next_block_counter: already in place from the previous commit, now
    actually consumed by add_block_device().

New on DriveInfo:

  - bridge (weak), id (varlink-visible — caller-supplied or auto-set
    to qmp_device_id), disk_type (for List replies later), counter,
    qmp_node_name / qmp_device_id (already added), fdset_path,
    pcie_port_idx (the hotplug port reserved by this drive — stays
    set across the add pipeline so drive_info_free releases it when
    the registry ref drops at DEVICE_DELETED time),
    rollback_mask (BlockDeviceAddStage bits of completed stages —
    plus a FAILED sentinel that suppresses cascading errors), and
    link (NULL ⇒ boot-time, sd_event_exit on fail; non-NULL ⇒
    hotplug, varlink reply on fail).

Helpers:

  - drive_info_unref() now releases any reserved hotplug port through
    the bridge and unrefs link.
  - drive_info_add_fail(): single failure entry point — fires the
    teardown for completed stages, sets the FAILED bit, and sends
    either the varlink error or sd_event_exit. Boot-time failures
    (link == NULL) always exit the loop, so late-arriving ephemeral
    continuation replies don't get silenced after cont.
  - vmspawn_qmp_block_device_teardown(): post-hoc blockdev-del when
    blockdev-add succeeded but a later stage failed.
  - reply_qmp_error: varlink reply helper — disconnect errors map to
    io.systemd.MachineInstance.NotConnected, everything else goes
    through sd_varlink_error_errno.
  - on_add_observe_stage / on_add_blockdev_stage /
    on_add_device_add_complete: the staged callbacks that drive the
    add pipeline, each holding one slot ref on the DriveInfo. The
    ephemeral blockdev-create continuation reuses the latter two so
    its post-cont replies go through drive_info_add_fail instead of
    the generic on_qmp_complete (which would silently log under
    setup_done).
  - on_scsi_controller_complete: handles the SCSI controller setup
    (releases the reserved port on failure, propagates the boot-time
    fatal error policy).
  - qmp_build_blockdev_add_inline(): single blockdev-add JSON that
    creates the file+format pair as one node, used by the hotplug
    path (the boot-time helpers stay separate so they can stack with
    the ephemeral path's base/overlay format nodes).
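The following hypothetical sketch shows how a single failure entry point with a FAILED sentinel can suppress cascading errors. The stage bit names and the `Drive` struct are illustrative. The counters stand in for sending blockdev-del and for the varlink reply / sd_event_exit(). None of this is vmspawn's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stage bits recorded as each pipeline step completes. */
#define STAGE_ADD_FD     (1u << 0)
#define STAGE_BLOCKDEV   (1u << 1)
#define STAGE_REMOVE_FD  (1u << 2)
#define STAGE_DEVICE_ADD (1u << 3)
#define STAGE_FAILED     (1u << 31)   /* sentinel: failure already reported */

typedef struct Drive {
        unsigned rollback_mask;
        void *link;        /* NULL: boot-time, non-NULL: hotplug request (unused here) */
        int n_teardowns;   /* counts would-be blockdev-del dispatches */
        int n_reports;     /* counts would-be varlink errors / sd_event_exit() calls */
} Drive;

/* Single failure entry point; returns true only for the call that reports. */
bool drive_add_fail(Drive *d, int error) {
        assert(d);
        assert(error < 0);

        if (d->rollback_mask & STAGE_FAILED)
                return false;   /* an earlier failure was already reported: stay quiet */

        if (d->rollback_mask & STAGE_BLOCKDEV)
                d->n_teardowns++;   /* blockdev-add succeeded, so it needs undoing */

        d->rollback_mask |= STAGE_FAILED;
        d->n_reports++;
        return true;
}
```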

EphemeralDriveCtx is trimmed down to a DriveInfo ref plus the two
ephemeral-local scratch node names (overlay-file, base-fmt). The
copies of disk_driver/serial/pcie_port/flags/qmp_node_name/
qmp_device_id go away — the continuation reads them straight off the
ref'd drive. qmp_setup_ephemeral_drive now sets drive->bridge /
drive->id / drive->counter up front (matching the hotplug path) and
folds the feature-dependent DISCARD_NO_UNREF into drive->flags so
qmp_build_blockdev_add_format picks it up.

vmspawn_qmp_bridge_free() unrefs the qmp client first — its pending
callbacks may still reach for the bridge's hotplug port table when
they drop their last DriveInfo ref — then tears down the hashmaps
and the port owner strings.

The boot-time qmp_setup_regular_drive() collapses to a thin wrapper
that asserts "caller hasn't pre-set drive->id", takes ownership and
calls vmspawn_qmp_add_block_device(); the dispatcher in
vmspawn_qmp_setup_drives() now hands ownership over with TAKE_PTR.
The previous qmp_setup_drive() dispatcher disappears (its body is
inlined into the loop). vmspawn_qmp_init() initialises
scsi_controller_port_idx to -1.

The cosmetic (*d) → DriveInfo *drive = *d; locals in
drives_need_scsi_controller (vmspawn-qmp.c) and assign_pcie_ports
(vmspawn.c) ride along — they live in code touched by this commit
and would otherwise produce churn.

vmspawn_qmp_remove_block_device() — the symmetric remove API — is
added in the next commit so this one stays focused on the add path.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-qmp: keep the event loop running on post-setup QMP failures

on_qmp_complete tears the event loop down on any QMP error, which is
the right behaviour while we're still building the VM (a missing
device means we'd boot a half-configured guest). Once the boot-time
setup is finished and the guest is running, killing the event loop on
a QMP error means a single failed runtime command (e.g. a hotplug
device_add that the guest rejects) takes the whole VM down.

Consult bridge->setup_done — already flipped at the end of boot
setup — and skip the event-loop exit once it's set; logging is
sufficient post-setup. The bridge is fetched from the qmp client
userdata, the same pointer the rest of the file uses.
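Below is a sketch of the boot-time vs. post-setup policy, assuming a simplified bridge where `loop_exited` stands in for a call to sd_event_exit(). `on_qmp_complete_sketch()` is a hypothetical name and does not match the real callback signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct Bridge {
        bool setup_done;    /* flipped once boot-time setup is finished */
        bool loop_exited;   /* stands in for sd_event_exit() */
} Bridge;

/* Completion callback shape: error is 0 or a negative errno. */
void on_qmp_complete_sketch(Bridge *bridge, int error, const char *label) {
        assert(bridge);

        if (error >= 0)
                return;

        fprintf(stderr, "QMP command '%s' failed: error %d\n", label, error);

        if (!bridge->setup_done)
                bridge->loop_exited = true;   /* boot-time: don't run a half-configured guest */
        /* post-setup: logging is sufficient, keep the VM running */
}
```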

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-qmp: pipeline remove-fd after each blockdev-add

QEMU keeps a monitor-side fd alive until either an explicit remove-fd
arrives or the fdset's last duplicate is closed. Today vmspawn issues
add-fd but never the matching remove-fd, so each fdset stays around for
the lifetime of the VM even after the consuming blockdev is torn down.
Pipelining a remove-fd directly after the blockdev-add that consumed
the fd hands ownership entirely to the blockdev: the fdset
auto-disposes when raw_close runs at blockdev-del time. This is the
shape needed by hotplug, where blockdev-del must clean up everything
without further coordination.

Mechanically:
- qmp_fdset_add() takes a callback/userdata pair (so callers control
  failure handling) and an optional out-param for the numeric fdset id.
  All boot-time callers keep using on_qmp_complete with a label.
- A new qmp_fdset_remove() helper sends remove-fd with caller-supplied
  callback/userdata.
- qmp_setup_ephemeral_drive captures both fdset ids and fires remove-fd
  immediately after each base/overlay file blockdev-add.
- qmp_setup_regular_drive does the same for its single file blockdev-add.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-qmp: derive QMP node and device ids from a bridge counter

Replace the caller-supplied DriveInfo.node_name with two QMP-internal
strings generated at setup time from a monotonic per-bridge counter:

qmp_node_name = "vmspawn-<N>-storage" (blockdev node-name)
qmp_device_id = "vmspawn-<N>-disk" (qdev id)

This is the naming scheme the upcoming runtime-hotplug add path needs:
unique across the lifetime of the VM, decoupled from any user-visible id,
and stable across the four QMP commands that make up an add (add-fd,
blockdev-add, remove-fd, device_add). The boot-time setups don't care
about uniqueness, but switching them now means the hotplug path can share
qmp_build_device_add() and EphemeralDriveCtx without a parallel naming
scheme.
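Deriving both names from one counter value could look like the following. `Bridge`, `make_qmp_names()` and the counter starting at 0 are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct Bridge {
        uint64_t next_block_counter;   /* monotonic, never reused */
} Bridge;

/* Derive the blockdev node-name and the qdev id from a single counter value. */
int make_qmp_names(Bridge *b, char *node, size_t node_sz, char *dev, size_t dev_sz) {
        uint64_t n = b->next_block_counter++;

        int r1 = snprintf(node, node_sz, "vmspawn-%" PRIu64 "-storage", n);
        int r2 = snprintf(dev, dev_sz, "vmspawn-%" PRIu64 "-disk", n);
        if (r1 < 0 || (size_t) r1 >= node_sz || r2 < 0 || (size_t) r2 >= dev_sz)
                return -ENAMETOOLONG;   /* output truncated */

        return 0;
}
```

Because both strings share `n`, every command in one add pipeline can refer to the same drive without a lookup table.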

vmspawn.c stops assigning node_name in prepare_primary_drive (which used
the literal "vmspawn") and prepare_extra_drives (which counted
"vmspawn_extra_%zu"); both are replaced by the bridge counter at the
point the drive is actually pushed into QEMU. Ephemeral helper-node
names follow the same vmspawn-<N>-{base-file,base-fmt,overlay-file}
convention; the blockdev-create job-id becomes
vmspawn-<N>-overlay-create.

EphemeralDriveCtx grows a qmp_device_id field (the qdev id is
independent of the format node-name now) and renames node_name →
qmp_node_name to match. No external behaviour change other than the new
internal names.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-qmp: convert DriveInfo to a refcounted object

In preparation for runtime block-device hotplug, where in-flight QMP
callbacks need to keep a slot reference on the DriveInfo while the bridge
also holds it in its block-device registry. Today each DriveInfo has
exactly one owner; switch the API from drive_info_free() /
drive_info_freep to drive_info_ref() / drive_info_unref() /
drive_info_unrefp so future code can take additional refs without the
caller losing track of ownership.

drive_info_new() initialises n_ref to 1 (one ref for the caller). The
existing drive_infos_done() and the prepare_*_drive() callers in
vmspawn.c are switched to the unref form. No behaviour change: each
DriveInfo still has exactly one ref at every point in this commit.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-qmp: pass bridge to on_cont_complete via invoke userdata

The callback already has the bridge available — but it was reaching for
it via qmp_client_get_userdata() instead of through its own userdata
parameter. Pass the bridge directly from vmspawn_qmp_start() so the
callback can read its argument the way the rest of the file does. No
behaviour change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-varlink: treat empty event subscription filter as catch-all

A client supplying "filter": [] previously matched no events at all,
because filter is checked with strv_contains() — an unintuitive corner
case. Treat an empty filter strv identically to a NULL filter (deliver
all events) by freeing the empty strv before it lands in the
subscription map. Brings the API closer to least-surprise.
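Here is a sketch of the resulting match rule. The commit normalizes an empty filter to NULL when the subscription is stored. This hypothetical `filter_matches()` instead applies the same rule at match time, which yields the same deliveries. The event names in the usage are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NULL-terminated string vector lookup, like strv_contains(). */
bool strv_has(char * const *l, const char *s) {
        for (; l && *l; l++)
                if (strcmp(*l, s) == 0)
                        return true;
        return false;
}

/* A NULL or empty filter means "deliver everything"; otherwise require a match. */
bool filter_matches(char * const *filter, const char *event) {
        if (!filter || !filter[0])
                return true;
        return strv_has(filter, event);
}
```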

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-varlink: extract notify_event_subscribers from on_qmp_event

Pure refactor: factor the subscriber-notification body out of on_qmp_event
into a static helper. on_qmp_event keeps the JOB_STATUS_CHANGE
short-circuit and otherwise delegates to the new helper. No behaviour
change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-varlink: simplify on_qmp_describe_complete result extraction

Lift the running/status extraction out of the inline ternaries inside
SD_JSON_BUILD_PAIR_*() into named local variables with explicit defaults.
Pure readability change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn-varlink: use error < 0 in async QMP completion callbacks

The QMP client always passes either 0 or a negative errno; the != 0 check
flagged values that cannot occur. Switch to the < 0 idiom used elsewhere
in the tree, and reorder on_qmp_simple_complete so the error path is the
first branch (the more conventional shape for callbacks). Equivalent in
behaviour.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: move VMSPAWN_PCIE_HOTPLUG_SPARES to vmspawn-qmp.h

Pure code motion, in preparation for the bridge-side hotplug machinery
that needs the same constant to size its hotplug_port_owner[] array. The
unsigned-suffix on the literal is dropped: the only consumer that compares
against unsigned (vmspawn.c's pcie-port assert) is happy with a plain
integer literal.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

qmp-client: widen next_fdset_id to uint64_t

The fdset id is a monotonic counter; an unsigned int is more than wide
enough today, but uint64_t matches the type of other QMP-internal counters
(e.g. job ids) and avoids any worry about wraparound on long-running
hosts. Update the storage in struct QmpClient, the qmp_client_next_fdset_id()
return type, and the corresponding caller in qmp_fdset_add(), switching the
sprintf format from %u to PRIu64.

No behavioural change.
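Assume the formatted string is the `/dev/fdset/N` path that QEMU accepts as a filename for fds passed via add-fd. Under that assumption, the PRIu64 switch looks roughly like this. `Client`, `client_next_fdset_id()` and `format_fdset_path()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct Client {
        uint64_t next_fdset_id;
} Client;

uint64_t client_next_fdset_id(Client *c) {
        return c->next_fdset_id++;
}

/* Format an fdset path; PRIu64 keeps the format string correct for a 64-bit id. */
int format_fdset_path(char *buf, size_t sz, uint64_t id) {
        int r = snprintf(buf, sz, "/dev/fdset/%" PRIu64, id);
        return (r < 0 || (size_t) r >= sz) ? -1 : 0;
}
```

Using `%u` with a `uint64_t` argument would be undefined behaviour, which is why the format string has to change together with the type.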

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: use qemu_device_driver_to_string() in resolve_disk_driver

Drop the inline DiskType → QEMU device driver switch and call the shared
helper instead. serial_max and the CD-ROM read-only flag stay inline
since they are vmspawn-local.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

shared: extract disk-spec parsing into machine-util

Move the ImageFormat / DiskType enums and their string tables out of
vmspawn's private settings header into a new src/shared/machine-util,
and add parse_disk_spec() — the colon-prefix loop that turns
"[FORMAT:][DISKTYPE:]PATH" into the two enums plus a normalized path.

No behavior change for vmspawn. A follow-up machinectl attach-disk
change accepts the same syntax and consumes the shared helper.
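The colon-prefix loop can be sketched as follows, with several assumptions. The enum members and their string tables are hypothetical subsets. Requiring the format prefix to come before the disk-type prefix is inferred from the `[FORMAT:][DISKTYPE:]PATH` syntax. Unlike parse_disk_spec(), this sketch does not normalize the path.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical subsets of the two enums; the real tables live in machine-util. */
typedef enum { IMAGE_FORMAT_RAW, IMAGE_FORMAT_QCOW2, _IMAGE_FORMAT_MAX, _IMAGE_FORMAT_INVALID = -1 } ImageFormat;
typedef enum { DISK_TYPE_VIRTIO_BLK, DISK_TYPE_NVME, _DISK_TYPE_MAX, _DISK_TYPE_INVALID = -1 } DiskType;

static const char *const image_format_table[] = { "raw", "qcow2" };
static const char *const disk_type_table[] = { "virtio-blk", "nvme" };

/* Returns the index of the table entry equal to s[0..n), or -1. */
int lookup_prefix(const char *const *table, int n_entries, const char *s, size_t n) {
        for (int i = 0; i < n_entries; i++)
                if (strlen(table[i]) == n && strncmp(table[i], s, n) == 0)
                        return i;
        return -1;
}

/* Peel "FORMAT:" then "DISKTYPE:" prefixes off spec; the rest is the path. */
int parse_disk_spec_sketch(const char *spec, ImageFormat *ret_format, DiskType *ret_type, const char **ret_path) {
        ImageFormat format = _IMAGE_FORMAT_INVALID;
        DiskType type = _DISK_TYPE_INVALID;
        const char *p = spec;

        for (;;) {
                const char *colon = strchr(p, ':');
                if (!colon)
                        break;

                size_t n = (size_t) (colon - p);
                int i;

                if (format < 0 && type < 0 &&
                    (i = lookup_prefix(image_format_table, _IMAGE_FORMAT_MAX, p, n)) >= 0)
                        format = (ImageFormat) i;
                else if (type < 0 &&
                         (i = lookup_prefix(disk_type_table, _DISK_TYPE_MAX, p, n)) >= 0)
                        type = (DiskType) i;
                else
                        break;   /* not a known prefix: the colon belongs to the path */

                p = colon + 1;
        }

        if (*p == '\0')
                return -EINVAL;

        *ret_format = format;
        *ret_type = type;
        *ret_path = p;
        return 0;
}
```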

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: heap-allocate each DriveInfo individually

Change DriveInfos from a contiguous array of DriveInfo structs to an
array of pointers to individually heap-allocated entries. Make all
DriveInfo string fields owned (strdup'd) and add drive_info_new()/
drive_info_free() constructors matching the cleanup pattern.

This prepares for the block device hotplug work where drives need to
be handed off to a hashmap via TAKE_PTR without copying.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: rename on_qmp_setup_complete() to on_qmp_complete()

Pure rename, no functional change. Prepares for making this callback
handle both boot-time (fatal) and runtime (non-fatal) errors based on
the bridge setup_done flag.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: add QmpClient userdata and VmspawnQmpBridge.setup_done flag

Add qmp_client_set_userdata()/qmp_client_get_userdata() accessors
mirroring the sd_varlink API, and wire up the VmspawnQmpBridge as
userdata on the QmpClient so that command callbacks can retrieve it.

Add a setup_done flag to VmspawnQmpBridge, set by on_cont_complete()
when the VM has booted and all boot-time device setup is finished.
This lets command callbacks differentiate boot-time errors (fatal —
exit event loop) from runtime errors (recoverable — log and continue).

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

test-qmp-client-qemu: exercise add-fd on the first invoke

Covers the SCM_RIGHTS fd-passing path end-to-end against a real QEMU: open an
eventfd, hand it off via QMP_CLIENT_ARGS_FD() on the very first qmp_client_invoke()
against a fresh client, and verify QEMU's add-fd reply carries the expected
fdset-id. Complements the mock-based unit test with an authoritative check that
QEMU actually consumes the fd from its FIFO receive queue when processing the
command: the AF_UNIX kernel behaviour of absorbing non-scm skbs forward into the
following scm-bearing recv depends on real traffic, which a mock cannot faithfully
reproduce.

qmp-client: eagerly enqueue qmp_capabilities on connect, drop the handshake state machine

QEMU's QMP greeting is an unsolicited, informational, server-initiated advertisement
— it doesn't gate commands. The server accepts (and pipelines) commands the instant
the socket is open. We were using it as a trigger to build qmp_capabilities and
blocking callers via qmp_client_ensure_running() until the reply came back.

Build qmp_capabilities inside qmp_client_connect_fd() instead and let the JsonStream
output queue preserve FIFO ordering between it and any user command a subsequent
invoke() enqueues. That satisfies QEMU's only ordering requirement (cap must precede
other commands) without any blocking in the send path.

Fallout:
  - Collapse QmpClientState to {RUNNING, DISCONNECTED}. The three HANDSHAKE_* states
    and the QMP_CLIENT_STATE_IS_HANDSHAKE() macro go away.
  - Drop qmp_client_dispatch_handshake(); fold greeting-drop into dispatch_reply as
    a one-line shape check.
  - Drop qmp_client_ensure_running() and its qmp_client_send() call site. send()
    now only refuses when state == DISCONNECTED.
  - The qmp_capabilities reply lands on an ordinary slot whose callback logs a
    protocol-level error and force-disconnects if cap negotiation failed, matching
    the old EPROTO behaviour at the same observable boundary.
  - qmp_client_phase() no longer special-cases the old handshake states; it maps
    directly to READING / AWAITING_REPLY based on whether slots are in flight.

Test updates:
  - qmp_client_first_invoke_with_fd → qmp_client_invoke_with_fd. The scenario it
    was pinned to (push_fd+invoke staging order) has been structurally impossible
    since the QmpClientArgs rework in 8ad4adcb6f; eager-cap removes it a second
    way. The test now covers end-to-end fd-passing on the first invoke, accepting
    either recv carrying the single SCM_RIGHTS fd (AF_UNIX absorbs non-scm skbs
    forward into the next scm-bearing skb's recv, so the fd may surface with cap
    or add-fd depending on kernel scheduling — QEMU's FIFO fd queue handles either).
  - qmp_client_invoke_failure_closes_fds restructured around the new invariant:
    invoke no longer blocks and no longer returns ENOTCONN for a dead peer, so
    the fd-leak assertion moves to "still open while the JsonStream queue owns it,
    closed on client teardown" and the nested block is flattened into an explicit
    qmp_client_unref().

json-stream: stop concatenating fd-bearing queue items with prior output-buffer bytes

json_stream_format_queue() drains queued output items into the output buffer and
stages their fds in n_output_fds, relying on the downstream sendmsg() to deliver
bytes-and-ancillary atomically as one SCM_RIGHTS message. If the output buffer
already holds bytes (from a prior fast-path enqueue that hasn't been sent yet or
from a partial write), concatenating a new fd-bearing item's JSON into it means
the next sendmsg() ships the combined bytes with those fds attached — violating
the per-message fd boundary on transports where that boundary is load-bearing.

Bail out of the drain loop when we would cross that boundary, so the next
write() first sends the buffered bytes with no ancillary, then pulls the
fd-bearing item into a clean buffer and ships it on its own sendmsg.
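The guard can be modelled with a fixed buffer and a queue of items, each carrying a byte string and an fd count. The drain loop stops before gluing an fd-bearing item onto bytes that are already buffered. `Item`, `Stream` and `format_queue()` are hypothetical and leave out the actual sendmsg().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Item {
        const char *json;
        size_t n_fds;      /* fds that must travel with exactly these bytes */
} Item;

typedef struct Stream {
        char buf[256];
        size_t buf_len;
        size_t n_output_fds;
        const Item *queue;
        size_t n_queue, q_head;
} Stream;

/* Move queued items into the output buffer, stopping before an fd-bearing item
 * would be concatenated onto bytes that are already buffered. */
void format_queue(Stream *s) {
        while (s->q_head < s->n_queue) {
                const Item *it = &s->queue[s->q_head];
                size_t len = strlen(it->json);

                if (it->n_fds > 0 && s->buf_len > 0)
                        break;   /* let the plain bytes go out in their own sendmsg() first */
                if (s->buf_len + len > sizeof s->buf)
                        break;   /* no room; wait for the next write */

                memcpy(s->buf + s->buf_len, it->json, len);
                s->buf_len += len;
                s->n_output_fds += it->n_fds;
                s->q_head++;
        }
}
```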

This only produces an observable difference for SOCK_SEQPACKET / SOCK_DGRAM
consumers: on those transports each sendmsg() is its own datagram with its own
SCM_RIGHTS cmsg, so whether we concatenate matters. On AF_UNIX SOCK_STREAM
(today's sole consumer shape, used by sd-varlink and the QMP client) the kernel
absorbs a preceding non-scm skb forward into the next scm-bearing skb's recv,
so per-sendmsg separation is invisible to the receiver anyway — the guard is
cheap defensive sender hygiene there, not a behaviour change. It becomes load-
bearing the moment a SEQPACKET/DGRAM consumer wires JsonStream up.

qmp-client: make QmpSlot a public, refcounted, cancellable handle

QmpSlot now stores a back-reference to its QmpClient (mirroring sd_bus_slot)
and is exposed as a public refcounted type via qmp_slot_ref/qmp_slot_unref.
qmp_client_invoke() gains an optional QmpSlot **ret_slot out-parameter
matching sd_bus_call_async(): passing non-NULL hands back a reference whose
unref cancels the pending call (the callback is deregistered; a late reply
is logged and discarded as unknown-id).

Internally slots come in two flavours, following sd_bus's model: floating
(owned by the client's pending set, used when ret_slot is NULL) and
non-floating (ref held by the caller, slot holds a ref on the client).
qmp_slot_disconnect() centralizes the teardown so the reply-dispatched,
explicit-cancel, and client-teardown paths all converge on the same
idempotent cleanup.

qmp_client_call()'s sync slot is now non-floating and observes completion
by watching slot->client go NULL instead of set_contains() on an id.

test-qmp-client: drive the mock QMP servers through JsonStream

The mock servers previously framed messages by hand: loop_write()+"\r\n" on
the way out, a single read(fd, buf, 4095)+sd_json_parse() on the way in,
and a custom recvmsg()+CMSG_FOREACH() for the fd-passing test. That only
works when each write-on-the-wire happens to be delivered in its own
recv(); the moment a test wants to issue back-to-back commands without
waiting for replies, the kernel coalesces them and sd_json_parse() chokes
on two concatenated objects.

Route the mock servers through the same JsonStream transport the client
uses: a tiny mock_qmp_init/recv/send/send_literal layer over json_stream
takes care of the CRLF delimiter, the output queue, and SCM_RIGHTS. The
recv helper loops parse→read→wait so coalesced inbound bytes get fed out
one complete message at a time.
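Reduced to its core, the framing the recv helper relies on is this: find the first CRLF-terminated message in a buffer that may hold several coalesced messages plus a partial tail. `next_message()` is a hypothetical helper, not part of the JsonStream API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the first complete CRLF-terminated message in buf[0..len)
 * (excluding the delimiter), or -1 if more bytes must be read first. */
ptrdiff_t next_message(const char *buf, size_t len, size_t *ret_consumed) {
        for (size_t i = 0; i + 1 < len; i++)
                if (buf[i] == '\r' && buf[i + 1] == '\n') {
                        *ret_consumed = i + 2;   /* message plus delimiter */
                        return (ptrdiff_t) i;
                }

        return -1;   /* incomplete: go back to read()/wait */
}
```

A caller loops on this until it returns -1, then reads more bytes. That is the parse→read→wait order described above.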

Drop the bespoke mock_qmp_recv_command() and replace it with
json_stream_set_allow_fd_passing_input() +
json_stream_{get_n,take,close}_input_fds() in the fd-first test. EOF
signalling moves from an explicit safe_close() to the
_cleanup_(json_stream_done) on the JsonStream.

A few more conversions to the option and verb macros (#41797)

sysupdate: convert to the new option and verb parsers

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

sysupdate: reorder verb functions and parse_argv cases to match --help

--transfer-source= is moved up to a better place.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

ptyfwd: convert to the new option parser

--help is the same except for common option strings and indentation.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

socket-activate: convert to the new option parser

--help is identical except for whitespace and common option strings.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

sysctl: convert to the new option parser

--help output is the same except for common strings and command
reordering.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

sysctl: rename local Option struct to SysctlOption

Avoid collision with Option struct from options.h (option parser).

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

tty-ask-password-agent: convert to the new option parser

--help is identical except for whitespace.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

keyutil: use OPTION_COMMON macros in a few places

Somehow those slipped through.

storagetm: convert to the new option parser

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

journal-gatewayd: convert to the new option parser

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

inhibit: convert to the new option parser

--help is the same, except for common options and a rewording of
the description of --what.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

sd-dhcp-client: several refactorings of the DHCP client state machine (#41796)

shared: find-esp fsroot check skip

When running with `SYSTEMD_RELAX_ESP_CHECKS=1`, the fsroot check is still
being run, preventing (for example) `bootctl` from operating on a tree,
as it expects a filesystem to be mounted where it finds the ESP (or
XBOOTLDR).

Expand the enum with an additional option to skip the fsroot checks and
enable it by default when `SYSTEMD_RELAX_ESP_CHECKS=1`.

See these RFEs [1], [2] for rationale.

[1]: https://github.com/systemd/systemd/issues/29871
[2]: https://github.com/systemd/systemd/issues/41707

Signed-off-by: Simon de Vlieger <cmdr@supakeen.com>

tree-wide: Load libcrypto and libssl via dlopen()

Until now OpenSSL was linked into every binary and library that needed
cryptography, pulling libcrypto (and, for resolved, libssl) into the
address space of services that never touch them at runtime. This commit
moves all OpenSSL usage behind the same dlopen helper pattern that we
already use for other optional libraries (libpam, libseccomp, libxz, …)
so libcrypto/libssl are only loaded on demand.

The bulk of the work lives in src/shared/crypto-util.{c,h} (libcrypto)
and src/shared/ssl-util.{c,h} (libssl), which replace the previous
src/shared/openssl-util.{c,h}:

- crypto-util.{c,h} declares every libcrypto function we call via
   DLSYM_PROTOTYPE() and resolves them inside dlopen_libcrypto().

- ssl-util.{c,h} holds the libssl-specific DLSYM_PROTOTYPEs,
   dlopen_libssl(), and the SSL_freep cleanup helper, so translation
   units that only need libcrypto do not pull in libssl declarations.

- Callers refer to the symbols through sym_* aliases rather than the
   original names.

- Convenience macros that used to be provided by the OpenSSL headers
   (OPENSSL_free, BN_num_bytes, the sk_TYPE_* helpers, …) are
   reimplemented as sym_* wrappers so no code path needs to fall back
   to the linker-resolved symbols.

- All _cleanup_ helpers are redefined in terms of the sym_* variants
   (EVP_PKEY_freep, X509_freep, BIO_freep, …) so cleanup attributes
   keep working without pulling in libcrypto symbols at link time.

- The public crypto-util.c entry points (openssl_pubkey_from_pem,
   openssl_digest_many, openssl_hmac_many, openssl_cipher_many,
   kdf_ss_derive, kdf_kb_hmac_derive, rsa_* / ecc_* helpers,
   pubkey_fingerprint, digest_and_sign, pkcs7_new, x509_fingerprint,
   openssl_extract_public_key, pkey_generate_volume_keys, the load_*
   helpers, …) now call dlopen_libcrypto() at entry before touching any
   sym_* pointer.

The call sites across the tree have been converted to call
dlopen_libcrypto()/dlopen_libssl() at the appropriate entry point
before their first sym_* use, and to use sym_* variants throughout:

- bootctl, sbsign, measure, pcrlock, pcrextend, tpm2-setup, repart,
   cryptsetup, cryptenroll, homectl, homed, homework, keyutil,
   sysupdate, creds, import, dissect-image, pe-binary, pkcs11-util,
   pkcs7-util, tpm2-util, creds-util. resolved additionally dlopens
   libssl for DoT.

The meson build files are updated to depend on libopenssl_cflags (a
new partial dependency that exposes include paths and compile flags
only, not the linker flags) instead of libopenssl for every target that
previously linked against OpenSSL. Nothing links against libcrypto or
libssl directly anymore.

A new src/sbsign/authenticode.c hosts the Authenticode ASN.1 type
definitions that used to live inline in sbsign.c. The OpenSSL
ASN1_SEQUENCE / ASN1_CHOICE / IMPLEMENT_ASN1_FUNCTIONS macros expand to
code that references libcrypto symbols directly, so to keep this
translation unit unlinked from libcrypto we redirect ASN1_item_* to
the sym_* variants via #define and wrap the ASN1_*_it() getters (which
appear as constant function pointers in static initializers) in small
trampoline functions that forward to the sym_* pointers at runtime.
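In isolation, the trampoline idea works like this. A static table needs a link-time constant address, so it points at a local forwarding function, and that function calls through a pointer filled in at runtime. All names here are hypothetical. `sym_example_it` stands in for a sym_* pointer that dlopen_libcrypto() would resolve, and a local function plays the role of the library's getter.

```c
#include <assert.h>
#include <stddef.h>

typedef const void *(*item_getter_t)(void);

/* Filled in at runtime; stands in for a sym_* pointer resolved via dlsym(). */
static item_getter_t sym_example_it = NULL;

/* Static initializers need an address known at link time, so they point at this
 * local trampoline instead of at a libcrypto symbol. */
static const void *example_it_trampoline(void) {
        assert(sym_example_it);   /* the library must be loaded before first use */
        return sym_example_it();
}

typedef struct Template {
        item_getter_t getter;
} Template;

static const Template templates[] = {
        { .getter = example_it_trampoline },   /* constant, no libcrypto reference */
};

/* Stand-in for the getter that would live in the shared library. */
static int real_item;
const void *fake_library_example_it(void) {
        return &real_item;
}
```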

test-dlopen-so gains assertions for dlopen_libcrypto and dlopen_libssl
so the dlopen contract is exercised in CI, and the openssl-specific
test was renamed from test-openssl.c to test-crypto-util.c to match
the new header naming.

sd-dhcp-client: add FIXME comment about the state callback

At least currently, it is a theoretical concern, as networkd does not
change the client state in the callback.

sd-dhcp-client: rework discover/request_attempts counter

discover_attempts should be reset only when
- the client is stopped, so that the counter starts from zero on
the next invocation.
- we acquire a bound lease, so that the counter starts from zero
when the lease expires.

request_attempts should be reset only when the client enters a new state
that sends DHCPREQUEST, that is, when it enters one of the REBOOTING,
REQUESTING, RENEWING, and REBINDING states.

This moves the counter resets into client_set_state(), as they should
happen only on state transitions.
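The reset rules above can be sketched as a state setter. The enum names and struct are hypothetical simplifications of sd-dhcp-client's internals. Modelling "the client is stopped" as entering a STOPPED state is an assumption.

```c
#include <assert.h>

typedef enum {
        STATE_STOPPED, STATE_INIT, STATE_SELECTING, STATE_INIT_REBOOT, STATE_REBOOTING,
        STATE_REQUESTING, STATE_BOUND, STATE_RENEWING, STATE_REBINDING,
} ClientState;

typedef struct Client {
        ClientState state;
        unsigned discover_attempts;
        unsigned request_attempts;
} Client;

void client_set_state_sketch(Client *c, ClientState s) {
        assert(c);

        if (c->state == s)
                return;   /* resets happen only on an actual transition */

        switch (s) {
        case STATE_STOPPED:
        case STATE_BOUND:
                c->discover_attempts = 0;   /* fresh count on next start / after expiry */
                break;
        case STATE_REBOOTING:
        case STATE_REQUESTING:
        case STATE_RENEWING:
        case STATE_REBINDING:
                c->request_attempts = 0;    /* entering a state that sends DHCPREQUEST */
                break;
        default:
                break;
        }

        c->state = s;
}
```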

sd-dhcp-client: notify SD_DHCP_CLIENT_EVENT_EXPIRED only when we already have a bound lease

Otherwise, if we emit the notification without a valid bound lease, networkd
may be confused (it should not be, but this is for safety).

Also, increment the delay before calling client_start_delayed().
Otherwise, the first reboot is done instantaneously.

sd-dhcp-client: propagate failure in setting timer and stop the client

If we fail to set up timer event sources for the lease lifetime or
T1/T2, the lease will never be updated, and the user (networkd) will
not receive any notification about the expiry, which is a bad
situation. Let's stop the client with an error code early, and notify
networkd of the failure.

sd-dhcp-client: simplify the implementation of IPv6 Only mode support

This drops the delay after ACK, as it has many problems. See the comment in
sd_dhcp_client_is_waiting_for_ipv6_connectivity() for more details.
This way, the logic becomes much simpler.

Also, do not restart the client from the sd_dhcp_client side when IPv6
connectivity is lost; let networkd restart it instead, as sd_dhcp_client
does not know whether the client can be started, e.g., the interface may
currently be down.

sd-dhcp-client: simply enter renewing/rebinding state and send DHCPREQUEST on T1/T2

It is not necessary to enable another timer event source to send
DHCPREQUEST from the T1/T2 timer event source. Just call the callback
function that sends the message.

Also, T1 fires only when we have a bound lease. Drop spurious conditions.

sd-dhcp-client: initialize event source and so on in client_start_delayed()

When we start the client, any previous state/configuration should be cleaned up.
Let's effectively do the same thing as client_initialize() in that
function.

This also moves several assertions from client_start_delayed() to
sd_dhcp_client_start(). These kinds of checks should be done earlier.

sd-dhcp-client: enter the SELECTING state before sending DHCPDISCOVER

Similarly, enter the REBOOTING state before sending DHCPREQUEST on reboot.

This also makes the DHCPREQUEST message be sent several times in the
REBOOTING state. Previously, we waited about 4 seconds after the
DHCPREQUEST on reboot, and entered the INIT state if there was no
response. Now we wait 1 second after the first DHCPREQUEST, resend it,
wait 2 seconds, and then enter the INIT state if there is still no
response. So, even in the worst case, there is a slight speedup.
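A minimal sketch of the REBOOTING retransmission timeline described above, assuming a plain doubling backoff without any random jitter a real client may add. The constant and function names are hypothetical, not taken from sd-dhcp-client.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: DHCPREQUEST transmissions in REBOOTING before falling
 * back to INIT. Two matches "wait 1 s, resend, wait 2 s". */
#define REBOOT_MAX_ATTEMPTS 2u

/* Seconds to wait after the n-th transmission (n starts at 0): 1, 2, 4, ... */
static uint64_t reboot_retransmit_delay_sec(unsigned n) {
        assert(n < 64);
        return UINT64_C(1) << n;
}

/* Worst-case time spent in REBOOTING before entering INIT. */
static uint64_t reboot_worst_case_sec(unsigned attempts) {
        uint64_t total = 0;

        for (unsigned n = 0; n < attempts; n++)
                total += reboot_retransmit_delay_sec(n);

        return total;
}
```

With two attempts the worst case is 1 + 2 = 3 seconds, compared with the roughly 4 seconds of the old single-request wait.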

sd-dhcp-client: replace max_request_attempts with constant macro

sd-dhcp-client: introduce client_send_discover()

No functional change, just refactoring.

sd-dhcp-client: open socket when necessary and close it when unnecessary

This makes us gracefully ignore unexpected messages from outside that
arrive at unexpected times. It potentially reduces the workload of
handling such messages, and slightly reduces the attack surface exposed
to malicious DHCP messages.

This also makes the socket fd owned by the relevant IO event source.

Apart from the performance optimization and security hardening, this
should not change any behavior. So, it is just refactoring.
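One way to picture "open when necessary, close when unnecessary" is a per-state predicate that the state machine consults on each transition. This sketch assumes the socket is only needed while a reply from a server is expected; the enum is modelled on RFC 2131 state names plus a STOPPED state, and is not sd-dhcp-client's own type.

```c
#include <assert.h>
#include <stdbool.h>

/* Client states modelled on RFC 2131, plus STOPPED (illustrative only). */
typedef enum {
        STATE_INIT,
        STATE_SELECTING,
        STATE_REQUESTING,
        STATE_INIT_REBOOT,
        STATE_REBOOTING,
        STATE_BOUND,
        STATE_RENEWING,
        STATE_REBINDING,
        STATE_STOPPED,
} ClientState;

/* True while a DHCPOFFER/DHCPACK/DHCPNAK may legitimately arrive. In all
 * other states any incoming message is unexpected, so the socket (and the
 * I/O event source owning it) can simply be dropped. */
static bool state_needs_socket(ClientState s) {
        switch (s) {
        case STATE_SELECTING:
        case STATE_REQUESTING:
        case STATE_REBOOTING:
        case STATE_RENEWING:
        case STATE_REBINDING:
                return true;
        default:
                return false;
        }
}
```

Because the fd is owned by its event source, the predicate only has to decide when; dropping the source is what closes the socket.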

sd-dhcp-client: add one missing assertion

Found and suggested by Claude. Nice!

sd-dhcp-client: move the object definition to the header

This allows us to split the long sd-dhcp-client.c into smaller pieces later.

This also drops a redundant typedef, which is already in sd-dhcp-client.h.

networkd: allow route table names for VRF.Table=

Allow `[VRF] Table=` to accept route table names in addition to
numeric table identifiers. These may be predefined route table names
or names configured with `networkd.conf` `RouteTable=`.

There was an earlier attempt to make `VRF.Table=` accept names in
f98dd1e707, but it wired the setting to
`config_parse_route_table()`. That parser was a `[Route]` section
parser, not a generic scalar parser for netdevs: it expected
network/route parser state and created a `Route` object. It was
therefore reverted by 40352cf0c1.

This commit replaces the uint32 parser with
`manager_get_route_table_from_string()`, the generic table parser
already used by route/rule, DHCP/RA `RouteTable=`, and WireGuard
`RouteTable=` in `.netdev` files. The VRF semantics stay
unchanged. The commit retains the existing behavior of the
deprecated `TableId=` field.
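A minimal sketch of the lookup order a name-or-number table parser might use: predefined names first, then user-configured names, then a plain uint32. It is not `manager_get_route_table_from_string()`; the `configured` table is a hypothetical stand-in for names from `networkd.conf` `RouteTable=`. The numeric IDs for default, main and local are the kernel's RT_TABLE_DEFAULT, RT_TABLE_MAIN and RT_TABLE_LOCAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct RouteTableName {
        const char *name;
        uint32_t table;
} RouteTableName;

/* Well-known kernel table IDs (see <linux/rtnetlink.h>). */
static const RouteTableName predefined[] = {
        { "default", 253 },
        { "main",    254 },
        { "local",   255 },
};

/* Hypothetical stand-in for names configured via RouteTable=. */
static const RouteTableName configured[] = {
        { "vrf-blue", 1000 },
};

static int lookup(const RouteTableName *t, size_t n, const char *s, uint32_t *ret) {
        for (size_t i = 0; i < n; i++)
                if (strcmp(t[i].name, s) == 0) {
                        *ret = t[i].table;
                        return 0;
                }
        return -ENOENT;
}

static int parse_route_table(const char *s, uint32_t *ret) {
        unsigned long long v;
        char *end;

        assert(s);
        assert(ret);

        if (lookup(predefined, sizeof predefined / sizeof predefined[0], s, ret) >= 0)
                return 0;
        if (lookup(configured, sizeof configured / sizeof configured[0], s, ret) >= 0)
                return 0;

        /* Fall back to a plain numeric ID; this sketch rejects 0 (RT_TABLE_UNSPEC). */
        if (*s < '0' || *s > '9')
                return -EINVAL;
        errno = 0;
        v = strtoull(s, &end, 10);
        if (errno != 0 || *end != '\0' || v == 0 || v > UINT32_MAX)
                return -EINVAL;

        *ret = (uint32_t) v;
        return 0;
}
```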

Co-developed-by: OpenAI Codex <codex@openai.com>

Add 'data' parameter to options and convert to programs where it is useful (#41786)

qmp-client: add synchronous qmp_client_call()

Add a synchronous counterpart to qmp_client_invoke() that pumps the
client's own process()/wait() loop until the reply for the issued
command id arrives, mirroring sd_varlink_call()'s contract: *ret_result
and *ret_error_desc are borrowed pointers into c->current, valid until
the next qmp_client_call(), and a QMP error surfaces as -EIO when the
caller doesn't ask for the description.

Factor the command-build + slot-insert + enqueue sequence shared with
qmp_client_invoke() into qmp_client_send(). A NULL callback marks the
slot as synchronous: dispatch_reply still matches on id (so unknown
ids continue to be logged and discarded, preserving async-only
robustness), but skips the TAKE_PTR + callback invocation and leaves
c->current pinned for qmp_client_call() to read out.

Cover the three paths in test-qmp-client: successful reply, QMP error
with ret_error_desc, and QMP error returned as -EIO.
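The synchronous-over-asynchronous pattern can be sketched with a simulated reply stream: issue a command with a fresh id, then keep running the dispatch step until a reply with that id has been pinned as `current`. Everything here (the `Client` and `Reply` types, `client_process()`, `client_call()`) is hypothetical and omits JSON, sockets and the real `wait()` step; a reply whose id does not match is counted as discarded, standing in for the "logged and discarded" path, and errors always map to -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Reply {
        uint64_t id;
        int error;      /* nonzero if the server answered with an error */
        int value;
} Reply;

typedef struct Client {
        const Reply *incoming;   /* simulated wire: replies in arrival order */
        size_t n_incoming, pos;
        const Reply *current;    /* reply pinned for the synchronous caller */
        uint64_t next_id;
        size_t n_discarded;      /* replies with ids nobody waits for */
} Client;

/* One dispatch step. Returns 1 if a reply was consumed, 0 if none is pending. */
static int client_process(Client *c, uint64_t wanted_id) {
        if (c->pos >= c->n_incoming)
                return 0;

        const Reply *r = &c->incoming[c->pos++];
        if (r->id != wanted_id) {
                c->n_discarded++;   /* unknown id: log and discard */
                return 1;
        }

        c->current = r;             /* leave it pinned, no callback */
        return 1;
}

static int client_call(Client *c, int *ret_value) {
        uint64_t id;

        assert(c);

        id = c->next_id++;
        c->current = NULL;
        /* (sending the command with this id is elided) */

        while (!c->current)
                if (client_process(c, id) == 0)
                        return -ECONNRESET;  /* real code would wait() here */

        if (c->current->error)
                return -EIO;
        if (ret_value)
                *ret_value = c->current->value;
        return 0;
}
```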

dhcp-protocol: introduce several constants, string table lookups, and so on (#41710)

various: use empty block not break after OPTION_GROUP

Use the same style everywhere.

TODO: add one more entry

This is something that should be fixed for usability, but it's something
between a missing feature and a bug. Since nobody has complained about
this, it probably can wait.

measure: also measure forgotten .efifw section

measure: convert to the new option and verb parsers

Previously, we had a nice third 'UKI PE Section' column with the section
names. This is now moved into the help strings, which means that the nice
alignment is lost. Previous behaviour could be restored by constructing
the table manually, but I'm not sure if this is worth the trouble.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

measure: reorder verb functions to match --help

repart: use parse_tristate_argument_with_auto in one more place

Use the new verb+option macros in pcrlock (#41669)

There's a lot of code movement, but the actual changes are
straightforward. Previously, the program was not marked as public, so
the --help/--version interface wasn't tested.

repart: convert to the new option parser

The metavars for a few options were changed to be shorter, so that the
automatic alignment works better. Overall, I think the new version is
at least as legible as the old one.

The synopsis for -S/-C/-P is fixed: they do not take an argument.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

sbsign: convert to the new option and verb parsers

The options --private-key, --private-key-source, --certificate,
--certificate-source are almost identical in sbsign, but are described
slightly differently. Add OPTION_COMMON_ macros that are parametrized
to keep the purpose of the --private-key and --certificate options
in the description.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

test-libudev: convert to the new option parser

The program now has a proper --help output. (Not on purpose. It's just
easier to do the same thing as everywhere else.)

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

network-generator: convert to the new option parser

--help is the same except for whitespace.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

shutdown: convert to the new option parser

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

stdio-bridge: convert to the new option parser

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

test-chase-manual: convert to the new option parser

--help now has help strings. --no_autofs is renamed to --no-autofs.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

chase: add explicit root_fd parameter to chaseat() and drop CHASE_AT_RESOLVE_IN_ROOT (#41652)

Split the single directory fd that chaseat() used to take into two
separate fds: a root_fd that sets the chroot boundary (symlinks may not
escape it, absolute symlinks resolve relative to it), and a dir_fd that
path resolution starts from. This makes the chroot semantics of
chaseat() explicit at every call site instead of encoding them in the
CHASE_AT_RESOLVE_IN_ROOT flag, which is removed. It also decouples the
starting directory from the root boundary, so callers can descend from
any inode inside the tree without having to reopen the root separately.

XAT_FDROOT passed as root_fd means "no containment" (host root); as
dir_fd it means "start at root_fd". For a smoother transition, AT_FDCWD
is also accepted as root_fd and treated as XAT_FDROOT. When root_fd
points to a directory that is actually the host root, it is normalized
to XAT_FDROOT up front so the existing shortcut path can kick in.

Absolute paths returned by chaseat() are now relative to root_fd, and
relative paths are relative to dir_fd. The result is absolute only when
there is no chroot boundary (root_fd is XAT_FDROOT), or when an absolute
symlink made resolution jump out of the dir_fd subtree; otherwise
callers get a relative path they can feed straight back into an
openat()-style call against dir_fd. Specifically, when dir_fd == root_fd
and we're not operating on the host's root directory, we return a
relative path even if we received an absolute path or resolved an
absolute symlink, so that the path can be passed directly to
openat()-style functions. We do this so that we don't have to modify
every caller of chaseat() to make sure they deal properly with any
absolute paths they might receive. Only when root_fd != dir_fd do we
have to return an absolute path to indicate that the path is relative
to root_fd and not dir_fd.

The shortcut that skips the per-component walk is reworked around a new
chase_xopenat() helper that funnels CHASE_NOFOLLOW, CHASE_MUST_BE_* and
CHASE_TRIGGER_AUTOFS through xopenat_full()'s O_NOFOLLOW, O_DIRECTORY,
XO_REGULAR, XO_SOCKET and XO_TRIGGER_AUTOMOUNT flags. As a result these
flags no longer force us off the shortcut and can be dropped from
CHASE_NO_SHORTCUT_MASK, and the old openat_opath_with_automount() helper
goes away. A CHASE_MUST_BE_ANY alias is introduced for shortcut callers
(stat/access paths) that don't go through xopenat_full() and still need
to bail on those flags locally.

All *_and_* helpers built on top of chaseat() (chase_and_openat,
chase_and_opendirat, chase_and_statat, chase_and_accessat,
chase_and_fopenat_unlocked, chase_and_unlinkat, chase_and_open_parent_at)
gain the same root_fd parameter, and every call site in the tree is
ported to the new signature.

Merge branch 'main' into inode-ref

mkosi: user and group bin needed for a test

* Fix the test TEST-02-UNITTESTS for the openSUSE environment.

test: enable check-{help,version}-systemd-pcrlock

This is a normal user-facing program, so it should be tested
in the usual fashion.

pcrlock: convert to the new option and verb parsers

The VERB definitions are done in order to retain the logical
presentation of verbs in lock+unlock pairs.

Previously the --help output was too wide; it now fits in 80 columns.
Cosmetic changes in --help output only.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

pcrlock: reorder function definitions to match --help

The order was random, different in the source code, different in the
verb list, different in --help. Order source code by --help to make
the next patch manageable.

man: clarify that /etc/verity.d only parses certificates with the .crt extension (#41790)

Exposed in the dracut testsuite while adding tests for sysexts:

```
[    2.972948] localhost (sd-merge)[510]: Validation of dm-verity signature failed via the kernel, trying userspace validation instead: Required key not available
[    2.972993] localhost (sd-merge)[510]: Skipping file '/etc/verity.d/dracut.pem', suffix is not '.crt'.
[    2.973045] localhost (sd-merge)[510]: No userspace dm-verity certificates found.
```

https://github.com/systemd/systemd/blob/658e5ac06f80ee2078b034f7cc483204d7f91c5e/src/shared/dissect-image.c#L3093

iovec-util: introduce iovec_equal(), and add overflow check (#41700)

man: clarify that /etc/verity.d only parses certificates with the .crt extension

Exposed in the dracut testsuite while adding tests for sysexts:

```
[    2.972948] localhost (sd-merge)[510]: Validation of dm-verity signature failed via the kernel, trying userspace validation instead: Required key not available
[    2.972993] localhost (sd-merge)[510]: Skipping file '/etc/verity.d/dracut.pem', suffix is not '.crt'.
[    2.973045] localhost (sd-merge)[510]: No userspace dm-verity certificates found.
```

dissect-image: fix typo in log message

Revert "resolve: refuse traffic from the local host only for queries"

This reverts commit 526f1594daec073269c3e70ee7914f6dd8740d5c.

This revert is necessary because the change breaks mDNS hostname stability
whenever a DNS-SD service calls UnregisterService. When a service
unregisters (e.g. on process restart), manager_refresh_rrs() clears and
re-adds all RRs in PROBING state, which sends a multicast announcement
(QR=1). The kernel reflects this back to resolved's own socket. Because
the local-address check was moved inside the query-only branch by the
reverted commit, the reply path in on_mdns_packet() is now unguarded.
The looped-back announcement matches the pending probe transaction and
completes it with DNS_TRANSACTION_SUCCESS. Since the zone item is still
in PROBING state (not ESTABLISHED), dns_zone_item_notify() sets
we_lost=true and calls dns_zone_item_conflict(), which invokes
manager_next_hostname() and renames the hostname (e.g. foo.local →
foo4.local). This happens reliably on every restart of any service using
RegisterService/UnregisterService (homebridge, avahi-compat wrappers,
etc.).

The top-level local-address check in on_mdns_packet() suppresses all
looped-back multicast traffic before the reply/query split. Restoring it
there is consistent with the overall design: dns_scope_check_conflicts()
already has its own manager_packet_from_local_address() guard and is
unaffected.

A more targeted long-term fix (e.g. guarding dns_transaction_process_reply()
for mDNS, or avoiding unnecessary re-probing of already-established records
in manager_refresh_rrs()) can be pursued separately.
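The ordering difference can be sketched as two tiny dispatch functions, one per behaviour. They only model the branch order described above (the real on_mdns_packet() does much more), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PACKET_IGNORED, PACKET_REPLY, PACKET_QUERY } PacketAction;

/* Restored behaviour: drop looped-back traffic before the reply/query split. */
static PacketAction dispatch_restored(bool is_reply, bool from_local_address) {
        if (from_local_address)
                return PACKET_IGNORED;
        return is_reply ? PACKET_REPLY : PACKET_QUERY;
}

/* Reverted behaviour: the local-address check only guarded queries. */
static PacketAction dispatch_reverted(bool is_reply, bool from_local_address) {
        if (is_reply)
                return PACKET_REPLY;   /* looped-back announcements end up here */
        if (from_local_address)
                return PACKET_IGNORED;
        return PACKET_QUERY;
}
```

For a looped-back announcement (reply, local address) only the reverted variant reaches the reply path, which is where the probe transaction was completed and the spurious conflict was declared.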

shared: drop redundant cryptsetup_enable_logging(NULL) calls (#41785)

These were only used to implicitly load libcryptsetup at startup.
dlopen_cryptsetup() now calls cryptsetup_enable_logging(NULL) itself,
and every code path that uses libcryptsetup calls dlopen_cryptsetup()
before doing so, so the upfront calls are no longer needed.

repart: raise log level to LOG_ERR if dlopen_fdisk() fails

libfdisk is required by systemd-repart, but repart silently exits if dlopen fails
(unless the debug log level is set):

```
$ SYSTEMD_LOG_LEVEL=debug systemd-repart
Shared library 'libfdisk.so.1' is not available: libfdisk.so.1: cannot open shared object file: No such file or directory
$ echo $?
1
```

Follow-up for d49f3f287a0bf72b5b473980cf435f0c0c2413d0

repart: trim NUL bytes from verity sig split artifact

The verity signature partition content is a bare JSON object. Repart
pads it with zeros to fill the GPT partition. But when splitting out
the content as an individual file, the padding remains, so it's not
a valid text file.

jq started rejecting files with NUL bytes to fix a security issue:
https://github.com/jqlang/jq/commit/6374ae0bcdfe33a18eb0ae6db28493b1f34a0a5b

Trim the output when writing these files out.
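A minimal sketch of the trimming step, assuming the split artifact is written from an in-memory buffer holding the JSON object followed by zero padding; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Length of buf without its trailing NUL padding. Interior NUL bytes are
 * left alone; only the zero fill at the end is dropped. */
static size_t trim_trailing_nuls(const char *buf, size_t size) {
        assert(buf || size == 0);

        while (size > 0 && buf[size - 1] == '\0')
                size--;

        return size;
}
```

The writer would then emit only the first `trim_trailing_nuls(buf, size)` bytes, so tools such as jq see plain text.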

dissect-image: fix path building for non-raw images (#41674)

If the passed-in image path didn't end with .raw, we'd return an empty
string + suffix instead of the intended image + suffix path.
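The intended behaviour can be sketched as: strip a trailing `.raw` if present, otherwise keep the whole image path, then append the suffix. The helper name is hypothetical, and `.roothash` is used only as an example suffix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed "<image minus .raw>" + suffix, or NULL on OOM. */
static char *build_sidecar_path(const char *image, const char *suffix) {
        size_t n, k;
        char *p;

        assert(image);
        assert(suffix);

        n = strlen(image);
        if (n >= 4 && strcmp(image + n - 4, ".raw") == 0)
                n -= 4;   /* "foo.raw" -> "foo" */
        /* else: keep the full path instead of ending up with an empty prefix */

        k = strlen(suffix);
        p = malloc(n + k + 1);
        if (!p)
                return NULL;

        memcpy(p, image, n);
        memcpy(p + n, suffix, k + 1);   /* includes the terminating NUL */
        return p;
}
```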

---

Also, fix two more nits that came up repeatedly in my searches.

gpt-auto-generator: do not fail on missing libcryptsetup when verity is not used

add_veritysetup() is called unconditionally from add_root_mount() and
add_usr_mount() whenever in_initrd() is true, to generate units that
only activate if verity devices appear. However, when compiled without
libcryptsetup, this function returned a hard error, causing the entire
generator to fail even when no verity protection is in use.

Change the #else fallback to log a debug message and return 0, matching
the pattern already used by add_root_cryptsetup().
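The fallback pattern can be sketched as follows, assuming a HAVE_LIBCRYPTSETUP-style build switch; the function name is hypothetical, and fprintf() stands in for systemd's debug logging.

```c
#include <assert.h>
#include <stdio.h>

/* Build switch, normally provided by the build system's config header. */
#ifndef HAVE_LIBCRYPTSETUP
#define HAVE_LIBCRYPTSETUP 0
#endif

/* Hypothetical generator step, called unconditionally in the initrd. */
static int add_veritysetup_sketch(const char *id) {
        assert(id);

#if HAVE_LIBCRYPTSETUP
        /* ... write units that only activate once the verity devices appear ... */
        return 0;
#else
        /* A hard error here would fail the whole generator even when no verity
         * protection is in use, so only note it and carry on. */
        fprintf(stderr, "Not generating veritysetup units for '%s': built without libcryptsetup.\n", id);
        return 0;
#endif
}
```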

shared/options: add a 'data' parameter to options

This mirrors a similar field in Verb. In some cases it is convenient
to pass a fixed value to the parser.
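A sketch of why a per-option `data` field helps: one parser can serve several options, with `data` telling it which value to store. The `Option` struct, `parse_mode()` and `dispatch_option()` here are hypothetical and not the shared/options API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef enum { MODE_NONE, MODE_LIST, MODE_SHOW } Mode;

typedef struct Option Option;
struct Option {
        const char *name;
        int (*parse)(const Option *o, const char *arg, void *userdata);
        const void *data;   /* fixed value handed to the parser */
};

/* One parser for several options: 'data' says which mode to select. */
static int parse_mode(const Option *o, const char *arg, void *userdata) {
        Mode *m = userdata;

        assert(o);
        assert(m);
        (void) arg;   /* these options take no argument */

        *m = *(const Mode *) o->data;
        return 0;
}

static const Mode mode_list = MODE_LIST, mode_show = MODE_SHOW;

static const Option options[] = {
        { "--list", parse_mode, &mode_list },
        { "--show", parse_mode, &mode_show },
};

static int dispatch_option(const char *name, const char *arg, void *userdata) {
        for (size_t i = 0; i < sizeof options / sizeof options[0]; i++)
                if (strcmp(options[i].name, name) == 0)
                        return options[i].parse(&options[i], arg, userdata);
        return -EINVAL;
}
```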

userdbctl: drop unused variable

measure: fix oom check

Pointed out in review.

meson: move fuzz-journald-util.c to fuzz-journal-audit

The .c file is shared between various fuzz-journal-* binaries. It
was added in 32bd43d768a4bdd54481c5e37ce9ea3d1009a824, but that is
somewhat ugly.

Let's add it to the alphabetically first fuzzer and share from there.

Follow-up for 32bd43d768a4bdd54481c5e37ce9ea3d1009a824 and
85b5acde869baa51f5618fa503eafac3dccbf379.

meson: concatenate donors specified in 'objects'

Previously, we'd only honour the last donor.

shared: drop redundant dlopen_cryptsetup() calls from cryptsetup_* helpers

cryptsetup_set_minimal_pbkdf(), cryptsetup_get_token_as_json() and
cryptsetup_add_token_json() each take a struct crypt_device *cd, which
can only be obtained by first calling sym_crypt_init*() — and that
already requires dlopen_cryptsetup() to have succeeded. The internal
calls here were only implicitly re-loading a library the caller is
guaranteed to have already loaded.

shared: drop redundant cryptsetup_enable_logging(NULL) calls

These were only used to implicitly load libcryptsetup at startup.
dlopen_cryptsetup() now calls cryptsetup_enable_logging(NULL) itself,
and every code path that uses libcryptsetup calls dlopen_cryptsetup()
before doing so, so the upfront calls are no longer needed.

cryptsetup: load libcryptsetup via dlopen in setup binaries

Convert systemd-cryptsetup, systemd-cryptenroll, systemd-veritysetup
and systemd-integritysetup to go through the existing dlopen wrapper
for libcryptsetup instead of linking the library directly. Each binary
calls dlopen_cryptsetup() at the start of its run() and uses the sym_*
variants for every libcryptsetup entry point.

Extend cryptsetup-util.{h,c} to cover the libcryptsetup symbols that
these binaries use and that the wrapper was missing:
crypt_activate_by_token_pin, crypt_deactivate, crypt_init_data_device,
crypt_keyslot_status, crypt_set_keyring_to_link (conditional on
HAVE_CRYPT_SET_KEYRING_TO_LINK), crypt_status and
crypt_token_external_path.

With no direct callers of crypt_free() left, drop the non-sym
crypt_freep cleanup variant and rename sym_crypt_freep back to
crypt_freep via DEFINE_TRIVIAL_CLEANUP_FUNC_FULL_RENAME, matching the
naming convention used by other dlopen wrappers (acl_freep,
xkb_context_unrefp, ...). Update the remaining users in src/shared,
src/repart, src/home and src/growfs to the new name.

The four affected meson targets switch from libcryptsetup to
libcryptsetup_cflags so they no longer record a DT_NEEDED entry for
libcryptsetup.so.12.

repart: Fix xopenat_full() error handling

shared: load libgnutls and libmicrohttpd via dlopen

Convert the GnuTLS and libmicrohttpd usage in journal-remote to the
dlopen pattern used by other optional shared libraries. A new
src/shared/gnutls-util.{h,c} declares the GnuTLS entry points via
DLSYM_PROTOTYPE and resolves them in dlopen_gnutls(); microhttpd-util
is moved from src/journal-remote to src/shared and gains analogous
DLSYM_PROTOTYPEs plus dlopen_microhttpd(). Callers in journal-gatewayd,
journal-remote-main and microhttpd-util itself call the sym_* wrappers
and invoke dlopen_gnutls()/dlopen_microhttpd() at their entry points.

setup_gnutls_logger() no longer fails when libgnutls is missing at
runtime; it logs a notice and returns 0 so journal-gatewayd starts up
without TLS dependencies installed.

The meson files gain libgnutls_cflags and libmicrohttpd_cflags partial
dependencies that expose include paths and compile flags only. Every
systemd-journal-{gatewayd,remote,upload} target switches to the cflags
variant, dropping the direct libgnutls/libmicrohttpd link. The
gatewayd->remote object-sharing dance for microhttpd-util.o goes away
since the code now lives in libshared.

test-dlopen-so gains assertions for dlopen_gnutls and dlopen_microhttpd.

ukify: fix default path for hwids

The documentation and commit that added this seem to suggest this should
be under /usr/lib/systemd.

fixes 117ec9db7e71357837190833d7731bc61ae54ecc

test: wrap mount/umount when running with sanitizers

On Fedora Rawhide mount/umount is linked against libsystemd, which then
breaks the binaries in sanitizer runs, as we try to run instrumented
code from an uninstrumented binary:

bash-5.3# ldd /usr/bin/mount
        linux-vdso.so.1 (0x00007fa757ef9000)
        libmount.so.1 => /lib64/libmount.so.1 (0x00007fa757e84000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fa757e51000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa757c56000)
        libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fa757c16000)
        libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007fa757400000)
        libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fa75734f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa757efb000)
        libclang_rt.asan.so => /usr/lib/clang/22/lib/x86_64-redhat-linux-gnu/libclang_rt.asan.so (0x00007fa756800000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fa7566e4000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa7566b7000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fa756400000)
bash-5.3# mount
==458==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.

This then breaks the whole machine, as mount is quite essential during
boot.

Let's just add mount/umount to the list of wrapped binaries to fix this.

nspawn: add --forward-journal= and --forward-journal-*= options

Add --forward-journal=FILE|DIR to forward the container's journal
entries to the host via systemd-journal-remote. When specified,
nspawn starts systemd-journal-remote listening on a Unix socket,
bind-mounts it into the container at /run/host/journal/socket, and
passes a journal.forward_to_socket credential pointing to it.

Add --forward-journal-max-use=, --forward-journal-keep-free=,
--forward-journal-max-file-size=, and --forward-journal-max-files=
to configure disk usage limits for the forwarded journal.

Consolidate nspawn's per-machine on-disk state under a single runtime
directory at /run/systemd/nspawn/<machine>/. The container rootdir
mount point moves from /tmp/nspawn-root-XXXXXX to <runtime_dir>/root,
the unix-export directory moves from
/run/systemd/nspawn/unix-export/<machine> to <runtime_dir>/unix-export,
and the journal-remote socket lives at
<runtime_dir>/journal-remote-socket. Update ssh-generator and
ssh-proxy to follow the new unix-export path layout.
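The consolidated layout can be sketched as plain path composition under one per-machine directory. The struct, helper names, the fixed 256-byte buffers and the minimal name check are assumptions for illustration, not the nspawn code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF 256

typedef struct NspawnRuntimePaths {
        char runtime_dir[PATH_BUF];
        char root[PATH_BUF];
        char unix_export[PATH_BUF];
        char journal_socket[PATH_BUF];
} NspawnRuntimePaths;

/* Write "a/b" into buf, failing instead of truncating. */
static int join2(char *buf, size_t size, const char *a, const char *b) {
        int n = snprintf(buf, size, "%s/%s", a, b);
        return n < 0 || (size_t) n >= size ? -ENAMETOOLONG : 0;
}

static int nspawn_runtime_paths(const char *machine, NspawnRuntimePaths *ret) {
        int r;

        assert(machine);
        assert(ret);

        /* Hypothetical minimal check: the name must be a single path component. */
        if (machine[0] == '\0' || strchr(machine, '/') ||
            strcmp(machine, ".") == 0 || strcmp(machine, "..") == 0)
                return -EINVAL;

        r = join2(ret->runtime_dir, sizeof ret->runtime_dir, "/run/systemd/nspawn", machine);
        if (r < 0)
                return r;

        /* Everything per-machine lives below runtime_dir. */
        r = join2(ret->root, sizeof ret->root, ret->runtime_dir, "root");
        if (r < 0)
                return r;
        r = join2(ret->unix_export, sizeof ret->unix_export, ret->runtime_dir, "unix-export");
        if (r < 0)
                return r;
        return join2(ret->journal_socket, sizeof ret->journal_socket, ret->runtime_dir, "journal-remote-socket");
}
```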

Extract fork_journal_remote() into fork-notify.{c,h} as a shared
helper used by both nspawn and vmspawn, replacing vmspawn's
start_systemd_journal_remote().

Extract runtime_directory_make() into path-lookup.{c,h} as a shared
helper used by both nspawn and vmspawn, replacing vmspawn's inline
runtime directory creation logic.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

vmspawn,journal-remote: add journal forwarding disk usage options

Add options to vmspawn to configure journal-remote disk usage limits
when forwarding journal entries from the VM. These are passed through
as --max-use=, --keep-free=, --max-file-size=, and --max-files=
command-line arguments to systemd-journal-remote.

Add --max-use=, --keep-free=, --max-file-size=, and --max-files=
command-line options to systemd-journal-remote to allow overriding the
corresponding settings from the configuration file.
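The pass-through can be sketched as turning each set limit into a `--key=value` argument and skipping unset ones. The `JournalRemoteLimits` struct and both helpers are hypothetical; only the option names come from the text above, and the values are treated as opaque strings.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the forwarded limits; NULL means "not set". */
typedef struct JournalRemoteLimits {
        const char *max_use;
        const char *keep_free;
        const char *max_file_size;
        const char *max_files;
} JournalRemoteLimits;

/* Append "key=value" if value is set; argv stays NULL-terminated. */
static int append_opt(char **argv, size_t *argc, const char *key, const char *value) {
        size_t n;
        char *s;

        if (!value)
                return 0;

        n = strlen(key) + 1 + strlen(value) + 1;   /* key '=' value NUL */
        s = malloc(n);
        if (!s)
                return -ENOMEM;
        snprintf(s, n, "%s=%s", key, value);

        argv[(*argc)++] = s;
        argv[*argc] = NULL;
        return 0;
}

/* Fill argv (room for at least 5 entries) with journal-remote options. */
static int limits_to_argv(const JournalRemoteLimits *l, char **argv, size_t *ret_argc) {
        size_t argc = 0;
        int r;

        argv[0] = NULL;
        if ((r = append_opt(argv, &argc, "--max-use", l->max_use)) < 0 ||
            (r = append_opt(argv, &argc, "--keep-free", l->keep_free)) < 0 ||
            (r = append_opt(argv, &argc, "--max-file-size", l->max_file_size)) < 0 ||
            (r = append_opt(argv, &argc, "--max-files", l->max_files)) < 0) {
                for (size_t i = 0; i < argc; i++)
                        free(argv[i]);
                argv[0] = NULL;
                return r;
        }

        *ret_argc = argc;
        return 0;
}
```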

Add $SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE environment variable support
to systemd-journal-remote. When set, the specified file is used
instead of the default configuration file and drop-in directories.
When set to the empty string or /dev/null, configuration file parsing
is skipped entirely. vmspawn sets this to /dev/null in the child
process to avoid inheriting the host's journal-remote configuration.
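The three cases can be sketched as a small decision function that takes the variable's value as a parameter, so the caller does the getenv() and the logic stays testable. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
        CONFIG_DEFAULT,   /* variable unset: default file plus drop-ins */
        CONFIG_FILE,      /* variable names a file: use only that file */
        CONFIG_NONE,      /* empty or /dev/null: skip parsing entirely */
} ConfigSource;

/* 'value' is what getenv("SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE") returned. */
static ConfigSource config_source(const char *value, const char **ret_path) {
        assert(ret_path);

        *ret_path = NULL;
        if (!value)
                return CONFIG_DEFAULT;
        if (value[0] == '\0' || strcmp(value, "/dev/null") == 0)
                return CONFIG_NONE;

        *ret_path = value;
        return CONFIG_FILE;
}
```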

Make the fork_notify() argv parameter optional. When NULL is passed,
fork_notify() returns 0 in the child (with $NOTIFY_SOCKET set) and
lets the caller run custom code before exec. Returns 1 in the parent.
This allows vmspawn to set environment variables in the child without
polluting the parent process.
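The optional-argv contract can be sketched with a hypothetical fork helper: with argv set, the child execs it; with NULL, the child gets 0 back and runs caller code (for example setting environment variables) before exec, while the parent always gets 1. This is not fork_notify() itself; the $NOTIFY_SOCKET setup is only indicated by a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* argv != NULL: the child execs argv[0]; the parent gets 1.
 * argv == NULL: the child gets 0 and continues in the caller.
 * Returns -errno if fork() fails. */
static int fork_maybe_exec(char *const *argv, pid_t *ret_pid) {
        pid_t pid;

        assert(ret_pid);

        pid = fork();
        if (pid < 0)
                return -errno;

        if (pid == 0) {
                /* Child. Real code would set $NOTIFY_SOCKET here first. */
                if (!argv)
                        return 0;

                execvp(argv[0], argv);
                _exit(EXIT_FAILURE);
        }

        *ret_pid = pid;
        return 1;
}
```

Environment changes made in the child branch stay out of the parent, which is the point of letting the caller run code before exec.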

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>