git.ipfire.org Git - thirdparty/systemd.git/log

vmspawn: multifunction-pack pcie-root-ports on pcie.0

The pre-allocated pcie-root-port block in run_virtual_machine() places
every port directly on pcie.0 with an auto-assigned PCI address. A
minimal VM already costs 4 builtin + 10 hotplug spares = 14 pcie.0
slots, on top of 3 implicit virtio devices (virtio-rng-pci,
virtio-balloon, virtio-serial-pci) for another 3.

pcie.0 has 32 device-numbers; q35 reserves 0x00 (host bridge) and 0x1f
(ICH9 LPC), leaving ~30 auto-assignable slots. TEST-64-UDEV-STORAGE-
nvme_basic pushes 20 '-device nvme' lines through
$SYSTEMD_VMSPAWN_QEMU_EXTRA, which vmspawn does not see — total demand
14 + 3 + 20 = 37 > 30. Bus realization fails after QEMU's chardev has
already emitted the QMP greeting, and the monitor socket POLLHUPs
while we are mid-feature-probe, reported as 'QMP connection dropped
during feature probing'.

Pack the root ports as multifunction devices, 8 per pcie.0 device-
number (QEMU docs/pcie.txt:84, 117-120, 255-258). Function 0 of each
group carries multifunction=on; functions 1-7 ride the same slot via
addr=N.F. Each function remains independently hot-pluggable so
vmspawn's QMP device_add machinery is unaffected. 14 ports collapse to
2 pcie.0 slots; the nvme_basic budget becomes 2 + 3 + 20 = 25.

The chassis/slot properties (used for ACPI hotplug identity) stay as
i+1 — they live in a uint8_t namespace independent of the PCI BDF and
are still unique. Base PCI slot 0x10 sits above the auto-assigned
virtio devices (which land at 0x01-0x03 in config order) and below
the q35 LPC reservation at 0x1f.
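
The packing scheme can be sketched as follows. This is an illustrative
model rather than vmspawn's code: the helper names are hypothetical, and
only the base slot 0x10, the grouping of 8 functions per device-number,
the placement of multifunction=on and the i+1 chassis/slot values are
taken from the description above.

```python
FUNCTIONS_PER_SLOT = 8   # a PCI device-number exposes functions 0-7
BASE_SLOT = 0x10         # first pcie.0 device-number used for root ports


def root_port_props(i):
    """Return pcie-root-port properties for port index i (0-based)."""
    slot, function = divmod(i, FUNCTIONS_PER_SLOT)
    props = {
        "bus": "pcie.0",
        "addr": f"{BASE_SLOT + slot:#04x}.{function}",
        # chassis/slot are the hotplug identity, independent of the PCI address
        "chassis": i + 1,
        "slot": i + 1,
    }
    if function == 0:
        # only function 0 of each group declares the device multifunction
        props["multifunction"] = "on"
    return props


def pcie0_slots_used(n_ports):
    """Number of pcie.0 device-numbers consumed by n_ports packed ports."""
    return -(-n_ports // FUNCTIONS_PER_SLOT)  # ceiling division
```

For 14 ports this yields addresses 0x10.0 through 0x11.5, that is, two
pcie.0 device-numbers.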

While here, rebuild the slot-count formula to match what
assign_pcie_ports() actually allocates. The +1 'SCSI controller' term
was bogus — virtio-scsi-pci comes from the hotplug-spares pool via
hotplug_port_owner[] in vmspawn-qmp.c, never from a builtin port (see
the comment in assign_pcie_ports()). The +1 'network' and +1 'vsock'
terms are now conditional on arg_network_stack and use_vsock. Bind
volumes were missing entirely. And the per-drive accounting now
mirrors assign_pcie_ports()'s skip-SCSI behaviour: non-SCSI drives
(root + extras + bind volumes) take one builtin port each, SCSI
drives take none — they share a controller drawn from the hotplug
pool at device-add time. Cap at 120 ports (15 device-numbers × 8) so
we cannot run off the end of the 5-bit PCI device-number space — the
usable range starting at 0x10 ends at 0x1e because ICH9 LPC sits at
0x1f.0 single-function, blocking the rest of that slot for
multifunction packing.
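
A rough sketch of the reworked accounting, under stated assumptions: the
function and parameter names are hypothetical, the real formula may
contain further terms, and applying the cap to the sum of builtin ports
and hotplug spares is a guess. Only the per-term rules and the 120-port
limit come from the text above.

```python
FUNCTIONS_PER_SLOT = 8
FIRST_SLOT = 0x10   # base device-number for the packed root ports
LAST_SLOT = 0x1E    # 0x1f is taken by the single-function ICH9 LPC
MAX_PORTS = (LAST_SLOT - FIRST_SLOT + 1) * FUNCTIONS_PER_SLOT  # 15 * 8


def builtin_ports(n_non_scsi_drives, has_network, use_vsock):
    """One port per non-SCSI drive, plus optional network and vsock ports.

    SCSI drives are deliberately not counted: they share a controller
    that is drawn from the hotplug pool.
    """
    return n_non_scsi_drives + int(has_network) + int(use_vsock)


def total_ports(n_non_scsi_drives, has_network, use_vsock, hotplug_spares=10):
    """Builtin ports plus hotplug spares, clamped to the packing capacity."""
    wanted = builtin_ports(n_non_scsi_drives, has_network, use_vsock)
    return min(wanted + hotplug_spares, MAX_PORTS)
```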

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

dhcp-message: introduce several more functions to parse/append DHCP options (#42063)

Convert loginctl to option and verb macros (#42066)

loginctl: convert to OPTION and VERB macros

--help output is the same, except for the expected formatting changes
and moving of --no-pager/--no-legend/--no-ask-password to the end.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

shared/verbs: allow all groups to be named

When verb groups were added, I assumed that the first group would always
be the unnamed group, or in other words, that a VERB_GROUP() line cannot
appear first. This provides an additional check on whether the verbs
have been reordered by the compiler or linker. But that check is weak
and we can do a better check anyway. And this limitation is unexpected,
since we allow that for OPTIONs. The code should all work without an
unnamed group, once this assertion is removed.

vmspawn: add io.systemd.MachineInstance.ReplaceStorage (#42017)

A follow-up to the AddStorage / RemoveStorage series. ReplaceStorage
swaps the *backing file* of an already-attached storage device on a
running vmspawn-managed VM, leaving the guest-visible device frontend
(virtio-blk, virtio-scsi, nvme, scsi-cd) and every other property of
the device untouched. The intended use is to point an existing disk
at a new image without the guest seeing a hot-unplug/hot-plug cycle.

The signature mirrors AddStorage minus the 'config' field: the
device frontend doesn't change, only the backing behind it. Read-
only / read-write is derived from the new fd's O_ACCMODE; scsi-cd is
forced read-only to match the boot-time policy. S_ISBLK on the new
fd selects host_device vs file driver, matching AddStorage.
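
How the access mode and node driver might be derived from the new fd can
be sketched like this. The helper name and the frontend strings are
hypothetical; the rules (O_ACCMODE decides read-only, scsi-cd is always
read-only, S_ISBLK picks host_device over file) are the ones stated
above.

```python
import fcntl
import os
import stat

ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR  # same bits as O_ACCMODE


def backing_options(fd, frontend):
    """Return (driver, read_only) for a backing fd and device frontend."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    read_only = (flags & ACCMODE) == os.O_RDONLY
    if frontend == "scsi-cd":
        read_only = True  # forced, matching the boot-time policy
    is_block = stat.S_ISBLK(os.fstat(fd).st_mode)
    return ("host_device" if is_block else "file"), read_only
```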

The QMP primitive is blockdev-reopen. It cannot change a file /
host_device node's 'filename' so we can't just point the existing
file node at a new fd, but it can swap a format node's 'file' child
to a different existing monitor-owned node by node-name reference
(case 3 in qemu/qapi/block-core.json:5034-5040). The chain is:

  add-fd          (host fd → new fdset)
  blockdev-add    (new file node, filename=/dev/fdset/N — fd-only)
  remove-fd       (release monitor's ref; new file holds the dup)
  blockdev-reopen (format node, file = new file node-name)
  blockdev-del    (old file node; its dup release frees old fdset)

The reopen options must restate every option the original blockdev-
add emitted on the format node — blockdev-reopen resets any
unspecified option to its driver default. The 'file' field is a
node-name string reference, never a path.
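
The chain can be written down as a list of QMP commands. This is a
sketch under assumptions, not the vmspawn implementation: the builder
function is hypothetical, the fdset ID is passed explicitly for
simplicity (the fd itself travels out of band with add-fd), and the
node names follow the "vmspawn-N-storage" / "vmspawn-N-file-G" pattern
mentioned elsewhere in this log.

```python
def replace_storage_commands(drive_index, old_generation, fdset_id,
                             format_options):
    """Build the QMP sequence that swaps a drive's backing file.

    format_options must restate every option originally given for the
    format node (driver, read-only, discard, ...), because
    blockdev-reopen resets omitted options to their defaults.
    """
    format_node = f"vmspawn-{drive_index}-storage"
    old_file = f"vmspawn-{drive_index}-file-{old_generation}"
    new_file = f"vmspawn-{drive_index}-file-{old_generation + 1}"

    reopen = dict(format_options)
    reopen["node-name"] = format_node
    reopen["file"] = new_file  # node-name reference, never a path

    return [
        {"execute": "add-fd", "arguments": {"fdset-id": fdset_id}},
        {"execute": "blockdev-add", "arguments": {
            "driver": "file",
            "node-name": new_file,
            "filename": f"/dev/fdset/{fdset_id}",
        }},
        {"execute": "remove-fd", "arguments": {"fdset-id": fdset_id}},
        {"execute": "blockdev-reopen", "arguments": {"options": [reopen]}},
        {"execute": "blockdev-del", "arguments": {"node-name": old_file}},
    ]
```

The format node keeps its name throughout, so only the 'file' reference
changes in the reopen step.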

No new errors and no new IDL types beyond the method itself;
everything is built on the existing NoSuchStorage / StorageImmutable
/ NotConnected / EBUSY vocabulary.

The series is:

  vmspawn: split blockdev-add into separate file and format calls
      Preparatory refactor. qemu/blockdev.c:3440 only marks the
      top-level BDS returned by blockdev-add as monitor-owned;
      inline children are NOT, so blockdev-del later rejects them
      with "Node X is not owned by the monitor". Split into two
      blockdev-add calls so the file node is independently
      deletable. DriveInfo gains qmp_file_node_name and a
      file_generation counter; the teardown helper deletes format
      then file (file-first is rejected as "node used as 'file'
      of Y"). The ephemeral path was already structured this way;
      only the regular add path changes. Drops the now-unused
      qmp_build_blockdev_add_inline().

  shared/varlink-io.systemd.MachineInstance: add ReplaceStorage method
      IDL only: ReplaceStorage(fileDescriptorIndex, name). No new
      errors.

  vmspawn: implement io.systemd.MachineInstance.ReplaceStorage
      vmspawn_qmp_replace_block_device() entry point, ReplaceCtx
      (refcounted, ReplaceCtxStateFlags for partial-state tracking)
      and four async callbacks plus an idempotent replace_fail.
      file_generation is bumped before issuing blockdev-add so
      retries don't collide on node-name.
      BLOCK_DEVICE_STATE_REPLACE_PENDING gates concurrent
      Replace / Remove on the same drive. On reopen success the
      trailing blockdev-del of the old file node fires from the
      reopen callback; its failure logs a warning and still replies
      success (the swap already committed; the orphan resolves at VM
      exit). QMP disconnect mid-replace routes via
      qmp_client_fail_pending → replace_fail → NotConnected.

  test: integration test for io.systemd.MachineInstance.ReplaceStorage
      TEST-87-AUX-UTILS-VM.replace-storage covers happy-path replace,
      successive replaces (file_generation rotation), StorageImmutable
      rejection on the boot-time drive, NoSuchStorage on unknown
      names, InvalidParameter on malformed names, and clean
      RemoveStorage after a replace (proves the new file node is
      monitor-owned and the teardown order works). Backing files are
      passed via 'varlinkctl --push-fd'; no machinectl front-end is
      added in this round.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

report-basic: expose os-release fields as a metric (#41988)

Add io.systemd.Basic.OSRelease metric family that reports all the fields
in os-release.

dhcp-message: introduce dhcp_message_get_option_dnr()

This is for DHCP option 162 (DNR).

dhcp-message: introduce dhcp_message_{append,get}_option_6rd()

These are for DHCP option 212 (6rd).

dhcp-message: introduce dhcp_message_{append,get}_option_routes()

These are for DHCP options 33 (static route), 121 (classless static
route), and 249 (private classless static route).
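
For reference, the wire format used by options 121 and 249 (RFC 3442)
can be decoded as below. This is an independent sketch, not the
dhcp-message code; the function name is hypothetical. Option 33 uses a
different layout (fixed 8-byte destination/router pairs) and is not
covered here.

```python
import ipaddress


def parse_classless_routes(data):
    """Decode RFC 3442 entries into (destination network, gateway) pairs."""
    routes = []
    pos = 0
    while pos < len(data):
        prefix_len = data[pos]
        pos += 1
        if prefix_len > 32:
            raise ValueError("invalid prefix length")
        n = (prefix_len + 7) // 8  # significant destination octets
        if pos + n + 4 > len(data):
            raise ValueError("truncated route entry")
        # the destination is sent without its trailing zero octets
        dest = ipaddress.IPv4Address(bytes(data[pos:pos + n]) + bytes(4 - n))
        pos += n
        gateway = ipaddress.IPv4Address(bytes(data[pos:pos + 4]))
        pos += 4
        network = ipaddress.IPv4Network((dest, prefix_len), strict=False)
        routes.append((network, gateway))
    return routes
```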

dhcp: move definition of sd_dhcp_route and related functions to dhcp-route.[ch]

This also renames arguments for storing results.
No functional change, just refactoring and preparation for later commits.

dhcp-message: add SIP server option support

DHCP option 120 (SIP server) takes a list of addresses or
domain names, and the first byte in the data classifies which type is
stored. Let's extend _addresses() and _domains() to make them support
the SIP server option.
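
The type byte works as follows per RFC 3361: 0 means a list of domain
names, 1 means a list of IPv4 addresses. A minimal decoder, written
independently of the dhcp-message helpers (function names are
hypothetical, and DNS name compression is assumed to be absent):

```python
import ipaddress


def parse_sip_servers(data):
    """Decode option 120 into ("domains" | "addresses", list)."""
    if not data:
        raise ValueError("empty option")
    encoding, payload = data[0], bytes(data[1:])
    if encoding == 1:
        if not payload or len(payload) % 4:
            raise ValueError("address list must be a multiple of 4 bytes")
        return "addresses", [ipaddress.IPv4Address(payload[i:i + 4])
                             for i in range(0, len(payload), 4)]
    if encoding == 0:
        return "domains", _parse_names(payload)
    raise ValueError(f"unknown encoding {encoding}")


def _parse_names(payload):
    """Decode a sequence of uncompressed RFC 1035 names."""
    names, labels, pos = [], [], 0
    while pos < len(payload):
        length = payload[pos]
        pos += 1
        if length == 0:  # the root label terminates one name
            names.append(".".join(labels))
            labels = []
            continue
        if length > 63 or pos + length > len(payload):
            raise ValueError("bad label")
        labels.append(payload[pos:pos + length].decode("ascii"))
        pos += length
    if labels:
        raise ValueError("unterminated name")
    return names
```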

dhcp-message: introduce dhcp_message_get_option_domains()

This is for e.g. DHCP option 119 (domain search).

dhcp-message: introduce dhcp_message_{append,get}_option_length_prefixed_data()

This is for e.g. User Class option.

dhcp-message: introduce dhcp_message_{append,get}_option_sub_tlv()

This is for e.g. Vendor-Specific Information option.

dhcp: random trivial cleanups (#42061)

sd-bus: handle non-string keys in dictionaries in JSON dump

JSON only supports string keys in objects, but D-Bus specification is a
bit more lenient and allows dict entries to have any basic type as key.
Let's stringify allowed non-string keys so that we can represent them as
JSON objects.

Relevant snippet from the D-Bus specification:

  A DICT_ENTRY works exactly like a struct, but rather than parentheses
  it uses curly braces, and it has more restrictions. The restrictions
  are: it occurs only as an array element type; it has exactly two
  single complete types inside the curly braces; the first single
  complete type (the "key") must be a basic type rather than a container
  type. Implementations must not accept dict entries outside of arrays,
  must not accept dict entries with zero, one, or more than two fields,
  and must not accept dict entries with non-basic-typed keys. A dict
  entry is always a key-value pair.
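
The idea can be illustrated with a small model. The exact string form
sd-bus chooses for each key type is not stated here, so the decimal and
"true"/"false" renderings below are assumptions, and the function names
are hypothetical.

```python
import json


def stringify_key(key):
    """Render a basic-typed dictionary key as a JSON object key."""
    if isinstance(key, str):
        return key
    if isinstance(key, bool):  # check bool before int: bool is an int subtype
        return "true" if key else "false"
    if isinstance(key, (int, float)):
        return str(key)
    raise TypeError("dict entry keys must be basic types")


def dict_to_json(entries):
    """Dump a dictionary with basic-typed keys as a JSON object."""
    return json.dumps({stringify_key(k): v for k, v in entries.items()},
                      sort_keys=True)
```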

Resolves: #32904

sd-dhcp-client: always set default broadcast hardware address when unspecified

The default value for InfiniBand is copied from dhcp-network.c.

logind: zero-initialize dispatch struct in vl_method_release_session()

The local struct passed to sd_varlink_dispatch() was not
zero-initialized. Since sd_json_dispatch_full() does not call handlers
for absent optional fields, p.id could be left indeterminate when
the client omits the Id parameter, leading to use of uninitialized
memory.

report-cgroup: use errno_or_else in one more place

Old gcc is confused about initialization:
In function ‘io_read_send’,
    inlined from ‘walk_cgroups’ at ../src/report/report-cgroup.c:288:24:
../src/report/report-cgroup.c:167:21: error: ‘values[0]’ may be used uninitialized [-Werror=maybe-uninitialized]
  167 |                 r = metric_build_send_unsigned(mf + i, link, unit, values[i], /* fields= */ NULL);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maybe this helps.

report-basic: expose os-release fields as metrics

Add io.systemd.Basic.OSRelease metric families that report select fields
from os-release.

$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="45 (Cloud Edition Prerelease)"
RELEASE_TYPE=development
ID=fedora
VERSION_ID=45
VERSION_CODENAME=""
PRETTY_NAME="Fedora Linux 45 (Cloud Edition Prerelease)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:45"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/rawhide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=rawhide
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=rawhide
SUPPORT_END=2027-11-24
VARIANT="Cloud Edition"
VARIANT_ID=cloud

$ varlinkctl call --more ./build/systemd-report-basic io.systemd.Metrics.List {} | jq --seq -c
...
{"name":"io.systemd.Basic.OSRelease.NAME","value":"Fedora Linux 44 (Workstation Edition)"}
{"name":"io.systemd.Basic.OSRelease.ID","value":"fedora"}
{"name":"io.systemd.Basic.OSRelease.CPE_NAME","value":"cpe:/o:fedoraproject:fedora:44"}
{"name":"io.systemd.Basic.OSRelease.VARIANT_ID","value":"workstation"}
{"name":"io.systemd.Basic.OSRelease.VERSION_ID","value":"44"}
{"name":"io.systemd.Basic.OSRelease.SUPPORT_END","value":"2027-05-19"}

I picked the fields that contain useful information about the specific
version/image/variant/experiment/flavour of the system. Also, either
NAME or PRETTY_NAME is included. This one is intended for human readers
to be able to identify the OS version easily.
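
A rough model of the mapping from os-release to metrics. The list of
selected keys is inferred from the sample output above and may not match
the real selection; the function names are hypothetical, and the
NAME/PRETTY_NAME choice is not modelled.

```python
import shlex

PREFIX = "io.systemd.Basic.OSRelease."
# Assumed selection, taken from the sample output.
SELECTED = ("NAME", "ID", "CPE_NAME", "VARIANT_ID", "VERSION_ID", "SUPPORT_END")


def parse_os_release(text):
    """Parse os-release text into a dict, removing shell-style quoting."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def os_release_metrics(text):
    """Emit one metric per selected, non-empty os-release field."""
    fields = parse_os_release(text)
    return [{"name": PREFIX + key, "value": fields[key]}
            for key in SELECTED if fields.get(key)]
```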

report: drop MetricsFamilyContext, CGroupContext, CGroupInfo

Previously, we passed around information about the MetricFamily entries
and the varlink connection in a helper structure. Having a hybrid of
const static and runtime stuff is iffy. Let's simplify things by passing
two separate parameters.

Also, in report-cgroup.c we built a cache of parsed values. This
requires additional storage and introduces complexity when
dealing with population of the cache at the appropriate time.
This cache is not useful: for each cgroup, we generate a list of
metrics, and we have all the information at hand. The only reason
why we'd create the cache and not generate all the relevant replies
at once was that the helper functions called the .generate function
for each MetricFamily separately.

The MetricFamily interface is changed, so that metrics can be
defined without a .generate function. This is understood to mean
that the preceding metric family's .generate function will also
generate this family. This allows us to define related metrics
nicely in a table:
  { METRIC_IO_SYSTEMD_CGROUP_PREFIX "CpuUsage", generate_func },
  { METRIC_IO_SYSTEMD_CGROUP_PREFIX "IOReadBytes", NULL },
  { METRIC_IO_SYSTEMD_CGROUP_PREFIX "IOReadOperations", NULL },
  { METRIC_IO_SYSTEMD_CGROUP_PREFIX "SomethingElse", generate_func2 },
  ...
When implementing .Describe, we list all the families. When implementing
.List, we only call those with .generate, and we get the same results
as before.

This allows the .generate functions to be simplified: instead of
keeping state, they just spit out all the metrics for a given
object in a tight loop.
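
The table convention can be modelled in a few lines. This is a sketch of
the idea only: the prefix string stands in for
METRIC_IO_SYSTEMD_CGROUP_PREFIX, whose real value is not shown here, and
the generator, field names and sample values are made up.

```python
PREFIX = "io.systemd.CGroup."  # assumed stand-in for the real prefix


def generate_cpu_and_io(cgroup):
    """Emit every metric for one cgroup in a single pass."""
    yield {"name": PREFIX + "CpuUsage", "value": cgroup["cpu"]}
    yield {"name": PREFIX + "IOReadBytes", "value": cgroup["rbytes"]}
    yield {"name": PREFIX + "IOReadOperations", "value": cgroup["rios"]}


# A family without a generator is produced by the preceding generator.
FAMILIES = [
    (PREFIX + "CpuUsage", generate_cpu_and_io),
    (PREFIX + "IOReadBytes", None),
    (PREFIX + "IOReadOperations", None),
]


def describe(families):
    """Describe lists every family, with or without a generator."""
    return [name for name, _ in families]


def list_metrics(families, obj):
    """List only calls the families that have a generator."""
    out = []
    for _, generate in families:
        if generate is not None:
            out.extend(generate(obj))
    return out
```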

varlink-io.systemd.MachineInstance,vmspawn: treat AddStorage/RemoveStorage name as opaque

The 'name' field on AddStorage and RemoveStorage was documented as
'<provider>:<volume>' and enforced via machine_storage_name_split() at
the varlink boundary. That form is only the convention machinectl
inherits from the StorageProvider routing path; the API itself only
needs a unique identifier the caller can re-use to detach the binding.

Drop the strict format check, require only a non-empty string, and
update the IDL docs to describe the field as a caller-supplied
identifier with machinectl's convention as a non-normative example.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: reject O_PATH and O_WRONLY fds in AddStorage

An fd opened O_PATH cannot be read, and an O_WRONLY fd cannot serve as
a backing file for a virtual disk image. Reject both at the bind-volume
entry point with -EBADF instead of letting the request proceed to QMP
where QEMU's file backend would fail to read from the fd. The
ReplaceStorage entry point grew the same checks in parallel.
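
The check itself is small. A sketch of the same test in Python, with a
hypothetical function name and the systemd-style convention of returning
a negative errno:

```python
import errno
import fcntl
import os

ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR  # same bits as O_ACCMODE
O_PATH = getattr(os, "O_PATH", 0)                # Linux-only flag


def check_backing_fd(fd):
    """Return 0 if fd can back a disk image, -EBADF otherwise."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if O_PATH and flags & O_PATH:
        return -errno.EBADF  # cannot be read at all
    if (flags & ACCMODE) == os.O_WRONLY:
        return -errno.EBADF  # QEMU needs to read the image
    return 0
```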

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

test: integration test for io.systemd.MachineInstance.ReplaceStorage

Modelled on TEST-87-AUX-UTILS-VM.bind-volume.sh. Boots vmspawn with
one boot-time bind-volume, hot-adds a runtime volume via machinectl
bind-volume, then exercises ReplaceStorage:

  1. happy-path replace of a runtime drive
  2. successive replace (verify file_generation rotation — no
     node-name collisions on the second swap)
  3. replace of the boot-time drive must fail with StorageImmutable
  4. replace of an unknown name must fail with NoSuchStorage
  5. invalid name (no provider:volume separator) must fail with
     InvalidParameter
  6. unbind-volume after replace must succeed — proves the new file
     node is monitor-owned and the format-then-file teardown order
     in vmspawn_qmp_block_device_teardown() correctly cleans up both
     blockdev nodes

Pushes the new backing file via varlinkctl --push-fd; the file is a
plain truncate'd image. Auto-discovered by run_subtests in
TEST-87-AUX-UTILS-VM.sh.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: implement io.systemd.MachineInstance.ReplaceStorage

Wire up the runtime hot-swap Varlink method. The signature mirrors
AddStorage minus 'config': the device frontend (virtio-blk,
virtio-scsi, nvme, scsi-cd) doesn't change, only the backing file
behind it. Read-only/read-write may flip based on the new fd's
O_ACCMODE; scsi-cd is forced read-only to match the boot-time policy.

QMP sequence (entry: vmspawn_qmp_replace_block_device):

  add-fd                          → on_replace_observe_stage
  blockdev-add (new file)         → on_replace_blockdev_add_complete
  remove-fd (new fdset)           → on_replace_observe_stage
  blockdev-reopen (format)        → on_replace_blockdev_reopen_complete
                                    [commit + fire trailing del]
  blockdev-del (old file)         → on_replace_old_blockdev_del_complete

The reopen options must be a superset of every option that
qmp_build_blockdev_add_format() may emit, otherwise reopen rejects
'Cannot reset option X to default'. The 'file' field is a string
reference to the new file node — case 3 of the schema in
qemu/qapi/block-core.json:5034-5040 ("the current child is replaced
with that other node"). The format node's qmp_node_name is preserved
so the device frontend's drive=<X> binding does not move.

ReplaceCtx tracks the per-call state with a refcount mirroring the
add-stage drive-info pattern. On any pre-commit failure replace_fail
tears down whatever new-side state we created on the wire and replies
on drive->link via reply_qmp_error (disconnect → NotConnected). On
post-commit del failure we log a warning, leak the orphan, and reply
success — the swap itself succeeded and the leak resolves at VM exit.

file_generation is bumped before issuing blockdev-add so failed
attempts cannot collide on node-name when the user retries.

Errors:
  NoSuchStorage     - drive not in the registry
  StorageImmutable  - drive lacks QMP_DRIVE_REMOVABLE (boot-time)
  EBUSY             - add still pending or another replace/remove in flight
  NotConnected      - QMP transport disconnect during the chain
  EIO               - QEMU rejected blockdev-reopen

Also gates RemoveStorage on REPLACE_PENDING so a device_del cannot
race a mid-flight blockdev-reopen on the same drive.
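
The gating rules from the error table can be modelled as a toy state
check. The class, field and function names are hypothetical, errors are
represented as plain strings, and the order in which the checks run is
an assumption.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    removable: bool = True        # False for boot-time drives
    add_pending: bool = False
    replace_pending: bool = False


def begin_replace(registry, name):
    """Return None on success, or the name of the error to reply with."""
    drive = registry.get(name)
    if drive is None:
        return "NoSuchStorage"
    if not drive.removable:
        return "StorageImmutable"
    if drive.add_pending or drive.replace_pending:
        return "EBUSY"
    drive.replace_pending = True  # cleared again when the chain finishes
    return None


def begin_remove(registry, name):
    """RemoveStorage is refused while a replace is in flight."""
    drive = registry.get(name)
    if drive is None:
        return "NoSuchStorage"
    if not drive.removable:
        return "StorageImmutable"
    if drive.add_pending or drive.replace_pending:
        return "EBUSY"
    return None
```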

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

shared/varlink-io.systemd.MachineInstance: add ReplaceStorage method

Define the IDL for io.systemd.MachineInstance.ReplaceStorage, a
runtime hot-swap of an already-attached storage volume's backing
file. The signature mirrors AddStorage minus the 'config' field
because the device frontend (virtio-blk, virtio-scsi, nvme, scsi-cd)
does not change — only the backing file behind it.

The implementation lives in vmspawn (next commit) and uses QMP
blockdev-reopen to swap the file child of the existing format node.
The reused error vocabulary (NoSuchStorage, StorageImmutable,
NotConnected, plus the generic errno path) covers every failure
mode; no new errors are added.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

vmspawn: split blockdev-add into separate file and format calls

The current vmspawn_qmp_add_block_device() emits a single blockdev-add
that combines the format-level node ("vmspawn-N-storage") with an
inline file child. QEMU's qmp_blockdev_add() only marks the top-level
returned BDS as monitor-owned (qemu/blockdev.c:3440); inline children
are NOT, so qmp_blockdev_del() rejects them with "Node X is not owned
by the monitor" (qemu/blockdev.c:3513-3517).

To prepare for ReplaceStorage — which needs to swap the file child of
an existing format node via blockdev-reopen, and then blockdev-del the
old file node — make the file node monitor-owned by issuing it as its
own blockdev-add call. The 4-stage add chain becomes 5 stages:

  add-fd
  blockdev-add (file)    → on_add_file_node_stage   sets FILE_NODE_ADDED
  blockdev-add (format)  → on_add_format_node_stage sets BLOCKDEV_ADDED
  remove-fd
  device_add

DriveInfo gains qmp_file_node_name ("vmspawn-N-file-G", G a generation
counter bumped on every replace), file_generation, and a stashed
fdset_id so future ReplaceStorage can target both for cleanup.

vmspawn_qmp_block_device_teardown() now deletes both nodes in order —
format first, then file — because the format holds a strong reference
to its file child and a file-first del is rejected with "Node X is
busy: node is used as 'file' of Y".

Folds bridge->features VMSPAWN_QMP_FEATURE_IO_URING into the file
node's flags so the new path inherits io_uring just like the old
inline form did. The format-level options (read-only, discard,
discard-no-unref) are unchanged.

The ephemeral path is structurally already separate file+format with
monitor-owned children; no behavioural change there beyond the
on_add_blockdev_stage → on_add_format_node_stage rename.

Drops the now-unused qmp_build_blockdev_add_inline() helper.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

loginctl: move options and verbs to match order in --help

First, "output modifier" options --no-pager/--no-legend/--no-ask-password are
moved to the end next to --output and --json. I think it makes sense to group
them. Then the implementing code is reordered to match the order in --help.

journalctl,analyze: use assert_cc in two more places

TODO: fix typo

repart: Add BtrfsReplace= (#41109)

This is a series of commits which adds a feature needed by GNOME OS'
installer. This was shown during an All Systems Go 2025 talk:
https://cfp.all-systems-go.io/all-systems-go-2025/talk/QRJVL3/

To sum up, this PR first changes systemd-repart to use BLKPG
partitions instead of loop devices when possible. We then need to always
rescan the partitions to try to remove them if the operation failed. We
allow encrypted partitions to stay activated, with a chosen name. And we
add a new partition setting, `BtrfsReplace=`.

Note that "replace" comes from the command `btrfs replace`. But in the
case of systemd-repart, maybe "inplace" or "move" would make more sense.
I am open to suggestions.

If it is better I can split this into several PRs.

The commits:

## repart: Reuse the backing fd for fdisk

Because fdisk_assign_device opens block devices with O_EXCL, it blocks
cryptsetup from using partition block devices on the same disk.

Since we already have a file descriptor for the device, we can just
share it and use fdisk_assign_device_by_fd instead.

## repart: Use blkpg partitions instead of loop devices when possible

We will want to allow future features to keep some devices mounted or
active. So in order to avoid leaving a mess of many loop devices, we can
just use the partition block device directly.

## repart: Rescan disk on failure if we create blkpg partitions on the fly

Since we did not write the partition table, the created partitions
should get removed on error.

## repart: Allow keeping luks2 volumes opened

## repart: Add BtrfsReplace=

BtrfsReplace=/mntpnt will move the btrfs filesystem from mount point to
the partition created. After moving, it will resize to take the whole
partition.
This is useful for OS installers that move a live system into a disk and
do not require a reboot.

## repart: Add VolumeName=

When a luks2 device mapper is to be kept alive after execution of
systemd-cryptsetup, the name of the volume will be taken from this
value.

## test: Add test for repart's BtrfsReplace

test: split unit tests (#42062)

vmspawn: Prefer systemd-journal-remote from $PATH

$PATH might point to a systemd checkout containing
a newer version of systemd-journal-remote which we
should use, hence prefer an executable from $PATH
over the one from /usr/lib/systemd.

Convert journalctl to option macros (#42051)

test: move test cases for client_id_{hash,compare}_func() to test-dhcp-client-id.c

test: move unit test for dhcp_identifier_set_iaid() to test-dhcp-duid.c

dhcp-network: make dhcp_network_send_{raw,udp}_socket() take iovec_wrapper

dhcp: introduce sd_dhcp_message object and several related functions (part 1) (#42047)

test: make TEST-75-RESOLVED robust against journald metadata race

Even after switching the wait loop to a polling `journalctl --grep`, the
test still fails intermittently because the very first messages emitted by
the freshly-spawned systemd-networkd-wait-online process can carry stale
journald metadata. journald associates `_SYSTEMD_UNIT=` (and friends) with
each entry by reading `/proc/$pid/cgroup` of the originating PID; if those
messages are produced before journald notices the cgroup migration into the
new service, they get tagged with `_SYSTEMD_UNIT=init.scope`. The
`-u $unit` filter then fails to match them.

Capture a journal cursor before launching the unit, and grep using
`--after-cursor=` plus `SYSLOG_IDENTIFIER=systemd-networkd-wait-online`
instead of `-u $unit`. SYSLOG_IDENTIFIER is set by the program itself, so
it's not subject to the cgroup-discovery race. The cursor bounds the search
to entries produced by this invocation, so prior wait-online runs in
earlier testcases don't interfere.

Logs from the failing run showing the messages exist but are tagged with
the wrong unit:

  [ 2570.948554] TEST-75-RESOLVED.sh[2178]: + unit=wait-online-dns-ede81407-b93b-459d-8e5d-69292b42d2ae.service
  [ 2571.023162] TEST-75-RESOLVED.sh[2178]: + systemd-run -u wait-online-dns-ede81407-b93b-459d-8e5d-69292b42d2ae.service ...
  [ 2571.049189] TEST-75-RESOLVED.sh[2178]: + timeout 30 bash -c 'until journalctl -b -u wait-online-dns-ede81407-b93b-459d-8e5d-69292b42d2ae.service --grep ...'
  [ 2571.964986] systemd-networkd-wait-online[2190]: dns0: No DNS server is accessible.
  [ 2601.051088] TEST-75-RESOLVED.sh[2178]: ++ cleanup

And for that 2571.964986 entry:

      _SYSTEMD_CGROUP=/init.scope
      _SYSTEMD_UNIT=init.scope
      _EXE=/usr/lib/systemd/systemd-executor
      _CMDLINE=/usr/lib/systemd/systemd-executor --deserialize 68 ...
      SYSLOG_IDENTIFIER=systemd-networkd-wait-online
      MESSAGE=dns0: No DNS server is accessible.

Follow-up for d4bc62713e09df09281f26f4bf385801a3ee2897

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

copy: fix typo and slightly update comment

option: fix typo

po: update Japanese translation

json-stream: tolerate truncated SCM_RIGHTS on inbound messages

When an LSM (e.g. SELinux) denies an fd transfer or the receiver hits
RLIMIT_NOFILE, the kernel drops the fd(s) from the SCM_RIGHTS cmsg and
sets MSG_CTRUNC on the recvmsg(). recvmsg_safe() turns that into
-ECHRNG, which causes json_stream_read() to discard the data bytes
that were nevertheless received and the varlink server to silently
tear down the connection — leaving the caller waiting for a reply
that never comes.

Inline the recvmsg() call instead and, on MSG_CTRUNC, drop the partial
fds but keep the message data. The method handler will surface a clean
-ENXIO when it tries to peek the missing fd, which sd-varlink wraps as
io.systemd.System for the peer, instead of a hang. This matches the
recent sd-bus fix in 6c8de404c9 ('sd-bus: allow receiving messages with
MSG_CTRUNC set').
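
The receive-side behaviour can be sketched with Python's socket module.
This is an analogue of the idea, not the json-stream code: the function
name and its parameters are hypothetical. On MSG_CTRUNC it closes any
fds that did arrive and still returns the data bytes.

```python
import array
import os
import socket


def recv_tolerating_ctrunc(sock, bufsize=4096, max_fds=16):
    """Receive data and fds; on MSG_CTRUNC keep the data, drop the fds."""
    fd_size = array.array("i").itemsize
    anc_size = socket.CMSG_LEN(max_fds * fd_size) if max_fds else 0
    data, ancdata, msg_flags, _ = sock.recvmsg(bufsize, anc_size)

    fds = array.array("i")
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # ignore a trailing partial fd, if any
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % fd_size)])

    truncated = bool(msg_flags & socket.MSG_CTRUNC)
    if truncated:
        for fd in fds:
            os.close(fd)  # a partial set is of no use to the handler
        fds = array.array("i")
    return data, list(fds), truncated
```

A caller would then process the message as usual and fail only when the
handler asks for an fd that is not there.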

update TODO

test: Add test for repart's BlockDeviceReplace

repart: Add VolumeName=

When a luks2 device mapper is to be kept alive after execution
of systemd-cryptsetup, the name of the volume will be taken
from this value.

repart: Add BlockDeviceReplace=

BlockDeviceReplace=/mntpnt will move the btrfs filesystem from mount point to
the partition created. After moving, it will resize to take the whole
partition.

This is useful for OS installers that move a live system into a disk and
do not require a reboot.

repart: Allow keeping luks2 volumes opened

repart: Rescan disk on failure if we create blkpg partitions on the fly

Since we did not write the partition table, the created partitions
should get removed on error.

repart: Use blkpg partitions instead of loop devices when possible

We will want to allow future features to keep some devices mounted or
active. So in order to avoid leaving a mess of many loop devices, we can
just use the partition block device directly.

repart: Reuse the backing fd for fdisk

Because fdisk_assign_device opens block devices with O_EXCL, it blocks
cryptsetup from using partition block devices on the same disk.

Since we already have a file descriptor for the device, we can just share it
and use fdisk_assign_device_by_fd instead.

This requires at least libfdisk 2.35 (part of util-linux) which was
released in 2020.

po: Translated using Weblate (Lao)

Currently translated at 100.0% (266 of 266 strings)

po: Added translation using Weblate (Lao)

Co-authored-by: Bone NI <bounkirdni@gmail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/lo/
Translation: systemd/main

sd-dhcp-server-lease: rename dhcp_server_lease_append_json() -> dhcp_server_lease_build_json()

It does not append, but builds a new JSON variant.

sd-dhcp-server: coding style fix

fuzz: modernize fuzz-dhcp-server

- Do not include .c file.
- Use ASSERT_OK() and friends.

dhcp-protocol: introduce more sub-options for DHCP Relay Agent Information option

These new values will be used later.

sd-dhcp-lease: drop sd_dhcp_lease.have_subnet_mask and have_broadcast

NULL address is invalid in both cases. Let's refuse to use them when NULL.

dhcp-message: introduce dhcp_message_{append,get}_option_hostname() and related functions

These are for DHCP options 12 (Hostname) and 81 (FQDN).

dhcp-message: introduce dhcp_message_{append,get}_option_parameter_request_list()

These are for DHCP option 55 (parameter request list).

dhcp-message: introduce dhcp_message_{append,get}_option_client_id()

These are for DHCP option 61 (client ID).

dhcp-message: introduce dhcp_message_{append,get}_option_string()

These are for DHCP options that take a string, e.g. DHCP options
17 (root path), 60 (vendor class identifier), and so on.

dhcp-message: introduce dhcp_message_{append,get}_option_addresses()

These functions support multiple addresses.

These are for e.g. DHCP options 6 (DNS servers), 42 (NTP servers) and so on.

dhcp-message: introduce dhcp_message_{append,get}_option_address()

These are equivalent to dhcp_message_{append,get}_option_be32(), but for
type safety.

These are for e.g. DHCP option 50 (requested IP address).

dhcp-message: introduce dhcp_message_{append,get}_option_sec()

These are for e.g. DHCP options 51 (lease time), 58 (renewal time), and
59 (rebinding time).

dhcp-message: introduce sd_dhcp_message object and several functions for the object

core: when skipping state deserializing units, also skip job subsections (#41957)

If a unit has active jobs, when it gets serialized there are job
subsections, each with their own empty line marker. The skipping
function ignores this and skips until the marker, but then leaves
the job in place, breaking deserialization.
Consume job subsections too.
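
The fix can be illustrated on a simplified model of the serialization
stream. The layout assumed here (a "job" line opening a subsection that
ends with its own empty line, inside a unit section that also ends with
an empty line) follows the description above; the function name and the
key names in any sample data are made up.

```python
def skip_unit_state(lines):
    """Consume one serialized unit section from an iterator of lines."""
    it = iter(lines)
    for line in it:
        if line == "":
            return  # end-of-unit marker
        if line == "job":
            # a job subsection carries its own empty-line terminator;
            # consume it so it is not mistaken for the end of the unit
            for job_line in it:
                if job_line == "":
                    break
```

Without the inner loop, the first empty line (the job's terminator)
would end the skip early and leave the rest of the unit in the stream.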

This shows up now that there's TEST-07-PID1.alias-corruption,
which occasionally fails when the aliased unit happens to
still have a job when the reexec happens.

```
[  967.551630] TEST-07-PID1.sh[179]: + echo 'Testing with: systemctl daemon-reexec'
[  967.551630] TEST-07-PID1.sh[179]: Testing with: systemctl daemon-reexec
[  968.405274] TEST-07-PID1.sh[179]: + echo '--- Attempt 1/3 ---'
[  968.405274] TEST-07-PID1.sh[179]: --- Attempt 1/3 ---
[  968.698641] TEST-07-PID1.sh[179]: + echo 'Running daemon-reexec...'
[  968.698641] TEST-07-PID1.sh[179]: Running daemon-reexec...
[  969.130261] TEST-07-PID1.sh[179]: + echo 'legit.service PID remains 1282. Attempt 1 passed.'
[  969.130261] TEST-07-PID1.sh[179]: legit.service PID remains 1282. Attempt 1 passed.
[  970.870456] TEST-07-PID1.sh[179]: + echo '--- Attempt 2/3 ---'
[  970.870456] TEST-07-PID1.sh[179]: --- Attempt 2/3 ---
[  971.267205] TEST-07-PID1.sh[179]: + echo 'Running daemon-reexec...'
[  971.267205] TEST-07-PID1.sh[179]: Running daemon-reexec...
[  971.715743] TEST-07-PID1.sh[179]: + echo 'legit.service PID changed from 1282 to 1643!'
[  971.715743] TEST-07-PID1.sh[179]: legit.service PID changed from 1282 to 1643!
```

https://github.com/systemd/systemd/actions/runs/25376867873/job/74414201255

network: use TLV and iovec to manage several DHCP options (#42045)

storagectl: add assert for args

coccinelle-storage check started failing…

Implement Path/Scope/Swap/Timer Context/Runtime for `io.systemd.Unit.List` (#41980)

The PR implements the following objects + tests for
io.systemd.Unit.List:
* PathContext
* PathRuntime
* ScopeContext
* ScopeRuntime
* SwapContext
* SwapRuntime
* TimerContext
* TimerRuntime

It's a continuation of the following PRs:
* https://github.com/systemd/systemd/pull/37432
* https://github.com/systemd/systemd/pull/37646
* https://github.com/systemd/systemd/pull/38032
* https://github.com/systemd/systemd/pull/38212
* https://github.com/systemd/systemd/pull/39391

btrfs-util: clear RDONLY flag on subvolume before destroy ioctl

Without CAP_SYS_ADMIN, btrfs_ioctl_snap_destroy() runs an
inode_permission(MAY_WRITE) check against the target subvolume root, which
btrfs_permission() rejects with EROFS for a read-only subvolume. As a
result, unprivileged removal of a read-only subvolume fails — both via
btrfs_subvol_remove_at() directly and via the recursive cleanup path used
by rm_rf_subvolume_and_freep(), which propagates the EROFS up.

Detect EROFS after the destroy ioctl, clear the RDONLY flag (only inode
ownership is required for BTRFS_IOC_SUBVOL_SETFLAGS), and retry once.
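
The retry logic can be sketched as follows. This is only an illustration of
the control flow: the callback type, the helper name and the in-memory
fake_subvol stand-in are hypothetical, and the real code issues the btrfs
ioctls named above instead of calling callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*subvol_op_t)(void *userdata);

static int destroy_with_rdonly_retry(subvol_op_t destroy, subvol_op_t clear_rdonly, void *userdata) {
        int r;

        assert(destroy);
        assert(clear_rdonly);

        r = destroy(userdata);
        if (r != -EROFS)
                return r;             /* success, or an error not handled here */

        r = clear_rdonly(userdata);   /* drop the read-only flag first */
        if (r < 0)
                return r;

        return destroy(userdata);     /* retry exactly once */
}

/* In-memory stand-in for a subvolume, used only to exercise the helper. */
struct fake_subvol {
        bool read_only;
        bool destroyed;
        unsigned destroy_calls;
};

static int fake_destroy(void *userdata) {
        struct fake_subvol *s = userdata;

        s->destroy_calls++;
        if (s->read_only)
                return -EROFS;
        s->destroyed = true;
        return 0;
}

static int fake_clear_rdonly(void *userdata) {
        struct fake_subvol *s = userdata;

        s->read_only = false;
        return 0;
}
```

Retrying only on EROFS keeps the common path at a single destroy call and
leaves all other errors untouched.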

While at it, fix the surrounding comments: BTRFS_IOC_SNAP_DESTROY drops the
entire subvolume tree, so regular files inside are irrelevant; ENOTEMPTY
from the ioctl indicates nested subvolumes (BTRFS_ROOT_REF_KEY entries) via
may_destroy_subvol(), not non-empty contents.

journalctl: move handling of --smart-relinquish-var to action logic

The help strings for --smart-relinquish-var and --relinquish-var
were in reversed order because of the _fallthrough_.

We would resolve the conditions for "smart relinquish" immediately
in parse_argv() and call 'return 0' if the conditions were wrong,
terminating option parsing and the program. It seems nicer to delay
action until later. This makes the logic flow more standard. This
also allows the option parsing cases to be exchanged, fixing the
issue with --help.

journalctl: drop some parentheses

journalctl: convert to OPTION macros

Two namespaces are used: "journalctl" and "journalctl-varlink". Help for
--user/--system in the latter is added, even though it is not used yet.
I think it'll be good to have this for introspection.

The four FSS-related options (--interval, --verify-key, --force,
--setup-keys) unfortunately each gain an inline #if HAVE_GCRYPT / #else;
the EOPNOTSUPP fallback is duplicated four times.

The metavar for --identifier/--exclude-identifier is changed to "ID"
to make the layout nicer. (And because that seems to make more sense.)

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

60-sensor.hwdb iio/accel fix for Advan Evo-X 13 (#42037)

Added fix for Advan Evo-X 13 2-in-1 laptop

Relevant output of `udevadm info --export-db`
```
P: /devices/pci0000:00/0000:00:15.2/i2c_designware.3/i2c-2/i2c-NSA2513:00/iio:device0
M: iio:device0
R: 0
J: +iio:iio:device0
U: iio
T: iio_device
E: DEVPATH=/devices/pci0000:00/0000:00:15.2/i2c_designware.3/i2c-2/i2c-NSA2513:00/iio:device0
E: DEVTYPE=iio_device
E: SUBSYSTEM=iio
E: USEC_INITIALIZED=5462149
E: ACCEL_MOUNT_MATRIX=-1,0,0;0,-1,0;0,0,1
E: IIO_SENSOR_PROXY_TYPE=iio-poll-accel
E: SYSTEMD_WANTS=iio-sensor-proxy.service
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:
```

shared/options: add OPTION_COMMON_{SYSTEM,USER}

We have different help strings for --user/--system in different places, so this
only covers a subset of --system/--user instances. But this particular help
seems to be the most widely used.

(In a few cases, the help string is fixed: it should be "system mode", not
"per-system mode".)

chase: Use openat2() if available

Let's make use of openat2() if we can in chaseat().

chase: Use ELOOP for CHASE_PROHIBIT_SYMLINKS error

Matches the behavior of openat2() with RESOLVE_NO_SYMLINKS
which makes introducing support for openat2() easier.

pe-binary: fix "systemd-sbsign calculates wrong PE checksum"

journalctl: reorder parse_argv() cases to match --help

Pure reordering. ARG_SMART_RELINQUISH_VAR is kept immediately before
ARG_RELINQUISH_VAR because of the existing _fallthrough_; that's the
only deviation from strict --help order.

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>

Introduce helper functions to parse and build length-prefixed data and TLV data (#41802)

These are not used yet, but will be used later for parsing and
building network packets such as DHCP messages.

dhcp: use struct iovec_wrapper to manage user class

sd-dhcp-option: drop unused sd_dhcp_option

dhcp: use TLV object to manage extra and vendor options

Note that previously a new option replaced any existing option with the same
option code. But a DHCP message can have multiple options with the same option
code. Hence, this makes the conf parser append the new option instead of
replacing the old one.

sd-dhcp-protocol: rename DHCP option 43, 124, and 125

There are four DHCP options with confusing names:
Option 43: Vendor-Specific Information
Option 60: Vendor Class Identifier
Option 124: Vendor-Identifying Vendor Class
Option 125: Vendor-Identifying Vendor-Specific Information

Let's use their full names for their corresponding enums.

btrfs-util: Make nested subvolume operations work unpriv

BTRFS_IOC_SEARCH is only available to root in the
initial userns. This means we fail to recursively
snapshot even if a subvolume has no nested subvolumes
at the moment.

Let's fix this by using the newer btrfs ioctls which
do work even if we don't have CAP_SYS_ADMIN in the initial
userns.

hwdb/keyboard: Map f21 key on Wareus B15

Addition to PR https://github.com/systemd/systemd/pull/41181
Plasma-workspace OSD notifications about turning the touchpad on
and off are driven by f21. When this match is specified,
KDE shows a notification on this laptop when the touchpad on/off
switch is pressed.
Fix dmesg:
atkbd serio0: Unknown key pressed (translated set 2, code 0xc1 on isa0060/serio0).

test: start systemd-report-basic.socket again

SUSE uses a different preset, so don't just assert in the test;
instead, start the socket in case it is not enabled.

TEST-74-AUX-UTILS.sh[1594]: ++ systemctl is-enabled systemd-report-basic.socket
TEST-74-AUX-UTILS.sh[1540]: + [[ disabled == enabled ]]
TEST-74-AUX-UTILS.sh[120]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.report.sh failed'

Follow-up for 4409e52494d803426a365b6636a66fd2dfc70b62

tlv-util: introduce tlv-util that handles Tag-Length-Value data format

In many network protocols e.g. DHCP, the TLV format is used.
Let's introduce a simple parser and builder of the data format.
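
For illustration, a minimal TLV walk could look like the sketch below. It is
not the tlv-util API: the function name is hypothetical, and it assumes a
one-byte tag followed by a one-byte length (the layout of regular DHCP
options), ignoring DHCP's single-byte pad and end options.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the first entry with the given tag.
 * Returns 0 and sets *ret / *ret_len on success, -ENOENT if the tag is
 * absent, -EBADMSG if the buffer is truncated. */
static int tlv_find_first(const uint8_t *buf, size_t size, uint8_t tag,
                          const uint8_t **ret, size_t *ret_len) {
        assert(buf || size == 0);
        assert(ret);
        assert(ret_len);

        for (size_t i = 0; i < size;) {
                if (size - i < 2)
                        return -EBADMSG;   /* truncated tag/length header */

                uint8_t t = buf[i];
                size_t l = buf[i + 1];

                if (size - i - 2 < l)
                        return -EBADMSG;   /* value runs past the buffer */

                if (t == tag) {
                        *ret = buf + i + 2;
                        *ret_len = l;
                        return 0;
                }

                i += 2 + l;                /* jump to the next entry */
        }

        return -ENOENT;
}
```

Both bounds checks compare against the remaining size rather than adding to
the index, so a bogus length cannot overflow the offset.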

iovec-wrapper: reintroduce iovw_free() and iovw_free_free()

They were dropped by the commit 267b16f33c5636617927f15d7ae6b945c862a587,
but will be used later. Hence, let's reintroduce them.

iovec-wrapper: introduce iovec_split() and iovw_merge()

In many network protocols, the length-prefixed data format is often
used. Let's add a simple parser and builder for the format.
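
As a rough sketch of the parsing direction, assuming a one-byte length
prefix per entry: the helper below is hypothetical and does not mirror the
signature of iovec_split(); it only shows how a buffer breaks down into
struct iovec entries, including zero-length ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper: split a buffer of [len][data...] entries into iovecs
 * that point into the original buffer. Returns 0 on success, -EBADMSG on a
 * truncated entry, -ENOBUFS if more than max entries are present. */
static int split_length_prefixed(const uint8_t *buf, size_t size,
                                 struct iovec *out, size_t max, size_t *ret_n) {
        size_t n = 0;

        assert(buf || size == 0);
        assert(out || max == 0);
        assert(ret_n);

        for (size_t i = 0; i < size;) {
                size_t l = buf[i++];       /* one-byte length prefix */

                if (size - i < l)
                        return -EBADMSG;   /* entry runs past the buffer */
                if (n >= max)
                        return -ENOBUFS;

                out[n++] = (struct iovec) {
                        .iov_base = (void*) (buf + i),
                        .iov_len = l,      /* may be zero */
                };
                i += l;
        }

        *ret_n = n;
        return 0;
}
```

The entries alias the input buffer, so the caller has to keep it alive for
as long as the iovecs are in use.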

iovec-wrapper: introduce iovw_put_full() and friends to make them accept zero length entry

These will be used later. Preparation for later commits.

iovec-wrapper: make iovw_size() take NULL again

This partially reverts 267b16f33c5636617927f15d7ae6b945c862a587.

We usually make xyz_size() take NULL, e.g. hashmap_size().

vmspawn: Add missing error logging

firstboot,sysinstall,hostnamed: always show FANCY_NAME=

This makes sure that whenever we want to show the OS name we can show
the fancy name. Thus this moves the escaping/validation of the fancy
name out of hostnamed into generic code, and then makes use of it in
sysinstall,firstboot,prompt-util.

mkosi: Drop CPUs= limit

Limiting VMs to 2 CPUs was cargo culting without any
actual data that this benefits performance. The host OS
has a scheduler, let's make use of it and give the VM access
to all the CPUs. This doesn't mean they become inaccessible to
the host, it just means the VM gets as many virtual CPUs as the
host has CPU cores (threads). How they get scheduled is still up
to the host OS.

units: pull in basic.target rather than sysinit.target from system-install.target

Many of our services are nowadays implemented via socket activation, and
hence require sockets.target to be active to be accessible. One of them
is mute-console.socket, which we typically want to use from
systemd-firstboot.service, systemd-sysinstall.service and other related
services. Hence let's pull in basic.target rather than sysinit.target
from system-install.target since it pulls sockets.target in too.

Effectively, this doesn't change much except for pulling in a bunch more
sockets, and frankly going for sysinit.target was really a bug to begin
with.

Add liburing to build image packages

vmspawn: Use builtin vdagent instead of spicevmc

The builtin one also makes the clipboard and such work. spicevmc
is only required for remote desktop use cases, so let's use the
builtin one instead.

boot,vconsole: Propagate UEFI HII keyboard layout to the OS

UEFI firmware can report the currently-active keyboard layout via
EFI_HII_DATABASE_PROTOCOL.GetKeyboardLayout(). The layout descriptor
includes an RFC 4646 / BCP 47 language tag (e.g. "en-US"). Query this
from sd-boot/sd-stub and write it to a new LoaderKeyboardLayout EFI
variable, advertised through a new EFI_LOADER_FEATURE_KEYBOARD_LAYOUT
feature bit.

On the OS side, systemd-vconsole-setup reads the variable as a
lowest-priority fallback for the console keymap. To map the BCP 47
tag to a vconsole keymap we extend /usr/share/systemd/kbd-model-map
with an optional sixth column listing the comma-separated BCP 47 tags
each row covers; a new find_vconsole_keymap_for_bcp47() helper walks
the file, preferring an exact tag match and otherwise falling back to
the row whose tag matches the input's primary subtag. Credentials,
/etc/vconsole.conf, and vconsole.keymap= on the kernel command line
continue to take precedence.
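
The lookup order can be sketched like this. It is an illustration only, not
find_vconsole_keymap_for_bcp47() itself: the row struct, the function name
and the in-memory table are hypothetical, tags are compared
case-insensitively, and the fallback is assumed to compare the primary
subtag (the part before the first "-") of both the input and the row's tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical in-memory form of one kbd-model-map row. */
struct keymap_row {
        const char *keymap;   /* vconsole keymap name */
        const char *tags;     /* comma-separated BCP 47 tags */
};

static const char *find_keymap_for_tag(const struct keymap_row *rows, size_t n, const char *tag) {
        const char *fallback = NULL;

        assert(rows || n == 0);
        assert(tag);

        size_t tl = strlen(tag), pl = strcspn(tag, "-");

        for (size_t i = 0; i < n; i++)
                for (const char *p = rows[i].tags; *p;) {
                        size_t l = strcspn(p, ",");     /* length of this tag */

                        if (l == tl && strncasecmp(p, tag, l) == 0)
                                return rows[i].keymap;  /* exact match wins immediately */

                        size_t ppl = strcspn(p, "-,");  /* length of its primary subtag */
                        if (!fallback && ppl == pl && strncasecmp(p, tag, pl) == 0)
                                fallback = rows[i].keymap;  /* remember the first primary-subtag match */

                        p += l;
                        if (*p == ',')
                                p++;
                }

        return fallback;      /* NULL if nothing matched */
}
```

With rows for "en-US,en", "en-GB" and "de-DE,de-AT", the tag "en-GB"
resolves to the second row by exact match even though the first row already
matched on the primary subtag, while "de-CH" falls back to the third row.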

bootctl status surfaces the new variable, printing the language tag
or "n/a (not reported by firmware)" when sd-boot advertises the
feature but the firmware HII database didn't expose a layout (common
on QEMU without a USB keyboard, since EDK2's PS/2 driver does not
register an HII keyboard layout).