git.ipfire.org Git - thirdparty/systemd.git/log

dissect-image: Drop blkid_probe_filter_superblocks_usage() call from probe_blkid_filter()

probe_blkid_filter() sets up a blkid superblock filter to restrict
filesystem detection to a known-safe set of types (btrfs, erofs, ext4,
f2fs, squashfs, vfat, xfs). It does so via two consecutive calls:

  1. blkid_probe_filter_superblocks_type(BLKID_FLTR_ONLYIN, ...)
  2. blkid_probe_filter_superblocks_usage(BLKID_FLTR_NOTIN, BLKID_USAGE_RAID)

However, both filter functions share the same internal bitmap in libblkid.
Each call goes through blkid_probe_get_filter(), which zeroes the entire
bitmap before applying the new filter. This means the second call (usage
filter) silently destroys the type filter set by the first call.

The result is that only RAID superblocks end up being filtered, while all
other filesystem types — including iso9660 — pass through unfiltered.

This causes ISO images (e.g. those with El Torito boot catalogs and GPT)
to be incorrectly dissected: blkid detects the iso9660 superblock on
the whole device (since iso9660 is marked BLKID_IDINFO_TOLERANT and can
coexist with partition tables), the code enters the unpartitioned
single-filesystem path, and then mounting fails because iso9660 is not
in the allowed filesystem list:

  "File system type 'iso9660' is not allowed to be mounted as result
   of automatic dissection."

Fix this by dropping the blkid_probe_filter_superblocks_usage() call.
The BLKID_FLTR_ONLYIN type filter already restricts probing to only
the listed types, which implicitly excludes RAID superblocks as well,
making the usage filter redundant.
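Below is a minimal, self-contained model of the failure mode, for illustration only. It does not call libblkid. The type table, `model_get_filter()` and both filter helpers are hypothetical stand-ins that mimic the reset-then-apply behaviour of blkid_probe_get_filter() described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of libblkid's superblock filter; not the real API. */
enum model_usage { MODEL_USAGE_FILESYSTEM, MODEL_USAGE_RAID };

struct model_idinfo {
        const char *name;
        enum model_usage usage;
};

static const struct model_idinfo model_idinfos[] = {
        { "btrfs",             MODEL_USAGE_FILESYSTEM },
        { "ext4",              MODEL_USAGE_FILESYSTEM },
        { "iso9660",           MODEL_USAGE_FILESYSTEM },
        { "linux_raid_member", MODEL_USAGE_RAID },
        { "vfat",              MODEL_USAGE_FILESYSTEM },
};

#define MODEL_N (sizeof(model_idinfos) / sizeof(model_idinfos[0]))

/* One bitmap shared by both filter kinds: true means "skip this type". */
static bool model_filter[MODEL_N];

/* Like blkid_probe_get_filter(): every filter call starts from a zeroed bitmap. */
static bool *model_get_filter(void) {
        memset(model_filter, 0, sizeof(model_filter));
        return model_filter;
}

static bool model_in_list(const char *name, const char *const *list) {
        for (; *list; list++)
                if (strcmp(name, *list) == 0)
                        return true;
        return false;
}

/* ONLYIN type filter: skip every type that is not in the NULL-terminated list. */
static void model_filter_type_onlyin(const char *const *names) {
        bool *f = model_get_filter();

        assert(names);
        for (size_t i = 0; i < MODEL_N; i++)
                f[i] = !model_in_list(model_idinfos[i].name, names);
}

/* NOTIN usage filter: skip every type with the given usage. */
static void model_filter_usage_notin(enum model_usage usage) {
        bool *f = model_get_filter();

        for (size_t i = 0; i < MODEL_N; i++)
                f[i] = model_idinfos[i].usage == usage;
}

static bool model_is_probed(const char *name) {
        for (size_t i = 0; i < MODEL_N; i++)
                if (strcmp(model_idinfos[i].name, name) == 0)
                        return !model_filter[i];
        return false;
}
```

With only the ONLYIN type filter applied, both iso9660 and the RAID member are already skipped. Applying the usage filter afterwards resets the bitmap and re-enables iso9660.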

Follow-up for 72bf86663c ("dissect: use blkid_probe filters to restrict
probing to supported FSes and no raid")

vmspawn: drop ICH9-LPC S3 disable and guard cfi.pflash01 for x86

The ICH9-LPC disable_s3 global QEMU config was a workaround for an
OVMF limitation where S3 resume didn't work with X64 PEI + SMM. SMM is
required for secure boot as it prevents the guest from writing directly
to the pflash, bypassing UEFI variable protections. With X64 PEI + SMM
enabled and S3 advertised, OVMF would hang on S3 resume. The
workaround was to tell QEMU not to advertise S3 support.

This limitation has been resolved in edk2 — the S3Verification() check
was removed in edk2 commit 098c5570 ("OvmfPkg/PlatformPei: drop
S3Verification()") after edk2 gained native X64 PEI + SMM + S3 resume
support. See https://github.com/tianocore/edk2/commit/098c5570.

Drop the now-unnecessary ICH9-LPC disable_s3 config entirely, and
guard the cfi.pflash01 secure=on setting with an x86 architecture
check since SMM is x86-specific and this option is invalid on ARM.

kernel-image: minor refactoring to inspect_kernel()

Let's make three arguments optional by splitting inspect_kernel() from
inspect_kernel_full().

Let's also downgrade logging to debug, so that this becomes more
library-like. Let's log on the call-site instead.

vmspawn: minor tweaklets (#41485)

core: Prevent corrupting units from stale alias state on daemon-reload (#39703)

During daemon-reload (or daemon-reexec), when a unit becomes an alias to
another unit, deserialising the alias's stale serialised state can
corrupt the canonical unit's live runtime state.

Consider this scenario:

1. Before reload:
   - a.service is running
   - b.service was stopped earlier and is dead
   - Both exist as independent units

2. User creates an alias to migrate from b -> a:

   - `ln -s /run/systemd/system/a.service /etc/systemd/system/b.service`

3. daemon-reload triggers serialisation. State file contains both units:
   - a.service -> state=running, cgroup=/system.slice/a.service,
     PID=1234, ...
   - b.service -> state=dead, cgroup=(empty), no PIDs, ...

4. During deserialisation:
   - Processes a.service: loads Unit A, deserialises -> state=RUNNING
   - Processes b.service: manager_load_unit() detects symlink, returns
     Unit A
   - unit_deserialize_state(Unit A, ...) overwrites with b's dead state

5. The result is that:
   - Unit A incorrectly shows state=dead despite PID 1234 still running
   - If a.service has Upholds= dependents, catch-up logic sees
     a.service should be running but is dead
   - systemd starts a.service again -> PID 5678
   - Two instances run: PID 1234 (left-over) and PID 5678 (new)

This bug is deterministic when serialisation orders a.service before
b.service.

The root cause is that manager_deserialize_one_unit() calls
manager_load_unit(name, &u) which resolves aliases via
unit_follow_merge(), returning the canonical Unit object. However, the
code doesn't distinguish between two cases when u->id differs from the
requested name from the state file. In the corruption case, we're
deserialising an alias entry and unit_deserialize_state() blindly
overwrites the canonical unit's fields with stale data from the old,
independent unit. The serialised b.service then overwrites Unit A's
correct live state.

This commit first scans the serialised unit names, then adds a check
after manager_load_unit():

    if (!streq(u->id, name) && set_contains(serialized_units, u->id))
        ...

This detects when the loaded unit's canonical ID (u->id) differs from
the serialised name, indicating the name is now an alias for a different
unit and the canonical unit also has its own serialised state entry.

If the canonical unit does not have its own serialised state entry, we
keep the state entry. That handles cases where the old name is really
just a rename, and thus the old name is the only serialised state for
the unit. In that case there is no bug, because there is no separate
canonical state entry for the stale alias entry to overwrite.
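The skip condition can be sketched as a standalone predicate. This is a simplified model under stated assumptions: the serialised unit names are a plain array instead of systemd's Set, and `should_skip_alias_entry()` and `name_in_list()` are hypothetical helpers, not functions in the manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for set_contains(serialized_units, id). */
static bool name_in_list(const char *const *names, size_t n, const char *id) {
        for (size_t i = 0; i < n; i++)
                if (strcmp(names[i], id) == 0)
                        return true;
        return false;
}

/* Returns true if the serialised entry 'name' should be skipped because it
 * now resolves to a different canonical unit 'canonical_id' that also has
 * its own serialised entry. */
static bool should_skip_alias_entry(const char *name, const char *canonical_id,
                                    const char *const *serialized, size_t n_serialized) {
        assert(name);
        assert(canonical_id);

        if (strcmp(canonical_id, name) == 0)
                return false; /* Not an alias: deserialise normally. */

        /* Alias: skip only if the canonical unit has its own state entry.
         * Otherwise this is a plain rename and the old entry is the only
         * state we have, so keep it. */
        return name_in_list(serialized, n_serialized, canonical_id);
}
```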

Skipping is safe because:

1. The canonical unit's own state entry will be correctly deserialised
   regardless of order. This fix only prevents other stale alias entries
   from corrupting it.
2. unit_merge() has already transferred the necessary data. When
   b.service became an alias during unit loading, unit_merge() already
   migrated dependencies and references to the canonical unit.
3. After merging, the alias doesn't have its own runtime state. The
   serialised data represents b.service when it was independent, which
   is now obsolete once the canonical unit also has its own serialised
   entry.
4. All fields are stale. unit_deserialize_state() would overwrite state,
   timestamps, cgroup paths, pids, etc. There's no scenario where we
   want this data applied on top of the canonical unit's own serialised
   state.

This fix also correctly handles unit precedence. For example, imagine
this scenario:

1. `b.service` is a valid, running unit defined in `/run`.
2. The sysadmin creates `ln -s .../a.service /etc/.../b.service`.
3. On reload, the new symlink in `/etc` overrides the unit in `/run`.

The new perspective from the manager side is that `b.service` is now an
alias for `a.service`.

In this case, systemd correctly abandons the old b.service unit, because
that's the intended general semantics of unit file precedence. We also
do that in other cases, like when a unit file in /etc/systemd/system/
masks a vendor-supplied unit file in /lib/systemd/system/, or when an
admin uses systemctl mask to explicitly disable a unit.

In all these scenarios, the configuration with the highest precedence
(in /etc/) is treated as the new source of truth. The old unit's
definition is discarded, and its running processes are (correctly)
abandoned. In that respect we are not doing anything new here.

Some may ask why we shouldn't just ignore the symlink if we think this
case will come up. I think there are multiple very strong reasons not to
do so:

1. It violates unit precedence. The unit design is built on a strict
   precedence list. When an admin puts any file in /etc, they are
   intentionally overriding everything else. If manager_load_unit were
   to "ignore" this file based on runtime state, it would break this
   fundamental principle.
2. It makes daemon-reload stateful. daemon-reload is supposed to be a
   simple, stateless operation, basically to read the files on disk and
   apply the new configuration. But doing this would make daemon-reload
   stateful, because we'd have to read the files on disk, but
   cross-reference the current runtime state, and... maybe ignore some
   files. This is complex and unpredictable.
3. It also completely ignores the user intent. The admin clearly has
   tried to replace the old service with an alias. Ignoring their
   instruction is the opposite of what they want.

Fixes: https://github.com/systemd/systemd/issues/38817
Fixes: https://github.com/systemd/systemd/issues/37482

vmspawn: simplify kernel_cmdline_maybe_append_root()

vmspawn: shorten find_virtiofsd() a bit

vmspawn: add a bunch of func param assert()s

ci: Drop base64 encoding in claude review workflow

Doesn't seem to work nearly as well as the previous solution, which
just told claude not to escape stuff.

manager: Add DefaultMemoryZSwapWriteback

Allow setting system-wide MemoryZSwapWriteback in system.conf

Resolves: #41320

vmspawn: Add --console-transport= option to select serial vs virtio-serial

Add a --console-transport= option that selects between virtio-serial
(the default, appearing as /dev/hvc0) and a regular serial port
(appearing as /dev/ttyS0 or /dev/ttyAMA0 depending on architecture).

This is primarily useful for testing purposes, for example to test
sd-stub's automatic console= kernel command line parameter handling. It
allows verifying that the guest OS correctly handles serial console
configurations without virtio.

When serial transport is selected, -serial chardev:console is used on
the QEMU command line to connect the chardev to the platform's default
serial device. This cannot be done via the QEMU config file as on some
platforms (e.g. ARM) the serial device is a sysbus device that can only
be connected via serial_hd() which is populated by -serial.

loop-util: create loop device for block devices with sector size mismatch

Previously, loop_device_make_internal() always used the block device
directly (via loop_device_open_from_fd()) for whole-device access,
regardless of sector size. This is incorrect when the GPT partition
table was written with a different sector size than the device reports,
as happens with CD-ROM/ISO boot via El Torito: the device has
2048-byte blocks but the GPT uses 512-byte sectors.

Restructure the sector size handling in loop_device_make_internal():

- Move GPT sector size probing (UINT32_MAX case) before the
  block-vs-regular-file split so both paths share the same logic and
  O_DIRECT handling. Check f_flags instead of loop_flags for O_DIRECT
  detection, since we're probing the original fd before any reopening.

- For block devices, get the device sector size and compare it against
  the resolved sector_size. Only use the block device directly when
  sector sizes match. When they differ (probed GPT mismatch or explicit
  sector size request), fall through to create a real loop device with
  the correct sector size.

- Default sector_size=0 to the device sector size for block devices
  (instead of always 512), so "no preference" correctly matches the
  device's sector size.
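One way to express the resulting decision for block devices is the hypothetical helper below; it is not the actual loop_device_make_internal() code. It assumes UINT32_MAX means "probe the GPT" and 0 means "no preference". It also assumes that a failed probe (no GPT found) is treated like "no preference", which the text above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decision helper.
 * requested:          0 = no preference, UINT32_MAX = probe GPT, else explicit.
 * probed_gpt:         sector size found by GPT probing, 0 if none found.
 * device_sector_size: logical sector size reported by the block device.
 * Returns true if the block device can be used directly, false if a real
 * loop device must be created; *ret_sector_size gets the resolved size. */
static bool block_device_usable_directly(uint32_t requested, uint32_t probed_gpt,
                                         uint32_t device_sector_size,
                                         uint32_t *ret_sector_size) {
        uint32_t s = requested;

        assert(device_sector_size > 0);
        assert(ret_sector_size);

        if (s == UINT32_MAX)
                s = probed_gpt;          /* Assumption: no GPT means "no preference". */
        if (s == 0)
                s = device_sector_size;  /* No preference: match the device. */

        *ret_sector_size = s;
        return s == device_sector_size;
}
```

For the El Torito case (2048-byte device, GPT written with 512-byte sectors), the helper returns false with a resolved size of 512, meaning a real loop device would be needed.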

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

test-loop-block: Migrate to new assertion macros and framework

While we're at it, rename to test-loop-util.c so it matches its
source file.

We also drop the root check and solely check for CAP_SYS_ADMIN since
that's sufficient to run the tests.

vmspawn: propagate $TERM from host into VM via kernel command line

When running in a console mode (interactive, native, or read-only),
propagate the host's $TERM into the VM by adding TERM= and
systemd.tty.term.hvc0= to the kernel command line.

TERM= is picked up by PID 1 and inherited by services on /dev/console
(such as emergency.service). systemd.tty.term.hvc0= is used by services
directly attached to /dev/hvc0 (such as serial-getty@hvc0.service) which
look up $TERM via the systemd.tty.term.<tty> kernel command line
parameter.

While systemd can auto-detect the terminal type via DCS XTGETTCAP, not
all terminal emulators implement this, so explicitly propagating $TERM
provides a more reliable experience. We skip propagation when $TERM is
unset or set to "unknown" (as is the case in GitHub Actions and some
other CI environments).
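Here is a minimal sketch of the skip rule and the resulting kernel command line fragment. It assumes the caller passes the value of getenv("TERM"), and `format_term_cmdline()` is a hypothetical name rather than vmspawn's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the kernel command line fragment for the
 * host's $TERM. Returns 0 (writing an empty string) when propagation is
 * skipped, the fragment length on success, or -1 if 'buf' is too small. */
static int format_term_cmdline(const char *term, char *buf, size_t size) {
        int n;

        assert(buf);
        assert(size > 0);

        buf[0] = '\0';
        if (!term || strcmp(term, "unknown") == 0)
                return 0; /* Unset or "unknown" (e.g. some CI environments). */

        n = snprintf(buf, size, "TERM=%s systemd.tty.term.hvc0=%s", term, term);
        if (n < 0 || (size_t) n >= size)
                return -1;
        return n;
}
```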

Previously this was handled by mkosi synthesizing the corresponding
kernel command line parameters externally.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

vmspawn: Warn about --grow-image= in combo with --image-disk-type=scsi-cd

Support `CopyBlocks=` for `Verity={hash,sig}` (#41393)

This enables deriving the minimum size of the `Verity=hash` partition
using the `Verity=` logic when the size of the `Verity=data` partition
is bigger than the `CopyBlocks=` target.

This enables using `Minimize=true` for an "installer image" and later
using sd-repart to install to a system with reserved space for future
updates by specifying `Size{Min,Max}Bytes=` only in the `Verity=data`
partition, without needing to hardcode the corresponding size for the
`Verity=hash` partition.

While not strictly necessary for `Verity=signature` partitions (since
they have a fixed size), there is little reason not to support it,
since then you can still specify `VerityMatchKey=` to indicate that the
partition is logically still part of that group of partitions.

---

Alternative to: https://github.com/systemd/systemd/pull/41156
Fixes https://github.com/systemd/systemd/issues/40995

report: implement facts interface (#41117)

This is a rebased version of #40936:
- the first few commits are #41003, which is greenlighted for merging
after the release
- then there's #40923
- and some changes on top

network-generator: support BOOTIF= and rd.bootif=0 options (#41028)

The network generator currently supports many of the options described
by dracut.cmdline(7), but not everything.

This commit adds support for the BOOTIF= option (and the related
rd.bootif= option) used in PXE setups.

This is implemented by treating BOOTIF as a special name/placeholder
when used as an interface name, and expecting a MAC address to be set in
the BOOTIF= parameter. The resulting .network file then uses MACAddress=
in the [Match] section, instead of Name=.
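The following sketch turns a BOOTIF= value into the MAC address used for MACAddress=. The accepted format is an assumption modelled on dracut's handling of PXELINUX-style values: an optional leading hardware-type byte such as "01-", with bytes separated by '-'. `bootif_to_mac()` is hypothetical and is not the network-generator's code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: normalises "01-52-54-00-ab-cd-ef" or
 * "52:54:00:ab:cd:ef" into "52:54:00:ab:cd:ef" (lowercase, ':'-separated).
 * Returns 0 on success, -1 if the value is malformed. */
static int bootif_to_mac(const char *value, char ret[static 18]) {
        size_t len;

        assert(value);

        len = strlen(value);
        if (len == 20) {
                /* Leading "tt-" hardware type field, e.g. "01-" for Ethernet. */
                if (!isxdigit((unsigned char) value[0]) || !isxdigit((unsigned char) value[1]) ||
                    (value[2] != '-' && value[2] != ':'))
                        return -1;
                value += 3;
        } else if (len != 17)
                return -1;

        for (size_t i = 0; i < 17; i++) {
                char c = value[i];

                if (i % 3 == 2) {               /* Separator position. */
                        if (c != '-' && c != ':')
                                return -1;
                        ret[i] = ':';
                } else {                        /* Hex digit position. */
                        if (!isxdigit((unsigned char) c))
                                return -1;
                        ret[i] = (char) tolower((unsigned char) c);
                }
        }
        ret[17] = '\0';
        return 0;
}
```

A value such as "01-52-54-00-ab-cd-ef" would then yield a [Match] section with MACAddress=52:54:00:ab:cd:ef.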

terminal-util: Protect errno in termios_reset()

basic/terminal-util: use getenv_terminal_is_dumb()

terminal_prepare_query() is called from terminal_get_size() which
operates on an explicitly passed fd — typically /dev/console opened
directly by PID 1 via reset_dev_console_fd(), or a service's TTY via
exec_context_apply_tty_size(). Using terminal_is_dumb() here is wrong
because it additionally checks on_tty(), which tests whether stderr is
a tty. PID 1's stderr may not be a tty (e.g. connected to kmsg or the
journal), causing terminal_is_dumb() to return true and skip the ANSI
query even though the fd we're operating on is a perfectly functional
terminal.

Use getenv_terminal_is_dumb() instead, which only checks $TERM, matching
what terminal_reset_ansi_seq() already does.

Also use it in terminal_get_cursor_position(), which also receives fds
to operate on.
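The distinction is easier to see in simplified, hypothetical re-implementations of the two checks. The real helpers in terminal-util.c may differ in detail. Only the second check depends on stderr, which is why it can misjudge an unrelated fd such as PID 1's /dev/console.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Checks only $TERM: suitable when operating on an explicitly passed fd. */
static bool sketch_getenv_terminal_is_dumb(void) {
        const char *e = getenv("TERM");
        return !e || strcmp(e, "dumb") == 0;
}

/* Additionally requires stderr to be a TTY, which says nothing about an
 * unrelated fd the caller actually wants to query. */
static bool sketch_terminal_is_dumb(void) {
        return !isatty(STDERR_FILENO) || sketch_getenv_terminal_is_dumb();
}
```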

basic/terminal-util: use non-blocking writes when sending ANSI sequences in terminal_get_size()

terminal_get_size() writes ANSI escape sequences (CSI 18 and DSR
queries) to the output fd to determine terminal dimensions. This is
called during early boot via reset_dev_console_fd() and from service
execution contexts via exec_context_apply_tty_size().

Previously, these writes used loop_write() on a blocking fd, which
could block indefinitely if the terminal is not consuming data — for
example on a serial console with flow control asserted, or a
disconnected terminal. This is the same problem that was solved for
terminal_reset_ansi_seq() in systemd/systemd#32369 by temporarily
setting the fd to non-blocking mode with a write timeout.

Apply the same pattern here: set the output fd to non-blocking in
terminal_get_size() before issuing the queries, and restore blocking
mode afterward. Change the loop_write() calls in
terminal_query_size_by_dsr() and terminal_query_size_by_csi18() to
loop_write_full() with a 100ms timeout so writes fail gracefully
instead of hanging.

Also introduce the CONSOLE_ANSI_SEQUENCE_TIMEOUT_USEC constant for all
timeouts used across all ANSI sequence writes and reads (vt_disallocate(),
terminal_reset_ansi_seq(), and the two size query functions). 333ms
is now used for all timeouts in terminal-util.c.

Also introduce a cleanup function for resetting a fd back to blocking
mode after it was made non-blocking.
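The pattern can be sketched in plain POSIX C. All names here are hypothetical. The cleanup handler relies on the GCC/Clang `cleanup` attribute, which systemd's `_cleanup_` macros are built on. The 333 ms value is taken from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Toggle O_NONBLOCK; returns whether it was previously set, or -errno. */
static int sketch_fd_nonblock(int fd, bool nonblock) {
        int flags = fcntl(fd, F_GETFL);
        if (flags < 0)
                return -errno;

        int nflags = nonblock ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
        if (nflags != flags && fcntl(fd, F_SETFL, nflags) < 0)
                return -errno;
        return !!(flags & O_NONBLOCK);
}

/* Cleanup handler: put the fd back into blocking mode on scope exit. */
static void sketch_fd_reset_blocking(int *fd) {
        if (*fd >= 0)
                (void) sketch_fd_nonblock(*fd, false);
}

/* Write everything; whenever the fd would block, wait at most timeout_ms
 * for it to become writable. Returns 0, -ETIMEDOUT, or -errno. */
static int sketch_write_all_timeout(int fd, const void *buf, size_t len, int timeout_ms) {
        const char *p = buf;

        while (len > 0) {
                ssize_t n = write(fd, p, len);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        if (errno != EAGAIN)
                                return -errno;

                        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                        int r = poll(&pfd, 1, timeout_ms);
                        if (r < 0 && errno != EINTR)
                                return -errno;
                        if (r == 0)
                                return -ETIMEDOUT;
                        continue;
                }
                p += n;
                len -= (size_t) n;
        }
        return 0;
}

/* Usage: non-blocking only for the duration of one query. */
static int sketch_send_query(int fd, const char *seq, size_t len) {
        __attribute__((cleanup(sketch_fd_reset_blocking))) int reset_fd = -1;

        int r = sketch_fd_nonblock(fd, true);
        if (r < 0)
                return r;
        reset_fd = fd;

        return sketch_write_all_timeout(fd, seq, len, 333);
}
```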

vmspawn: use fstab.extra credential for runtime mounts instead of kernel cmdline

Switch runtime virtiofs mount configuration from systemd.mount-extra=
kernel command line parameters to the fstab.extra credential. This
avoids consuming kernel command line space (which is limited) and
matches the approach used by mkosi.

Each mount is added as an fstab entry in the format:
{tag} {destination} virtiofs {ro|rw},x-initrd.mount

If the user already specified a fstab.extra credential via
--set-credential= or --load-credential=, the virtiofs mount entries
are appended to it rather than conflicting.
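Here is a minimal sketch of building the credential contents. It assumes any existing fstab.extra value is held in a NUL-terminated buffer. `fstab_extra_append()` and the tag name used in the test are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends one virtiofs mount line in the
 * "{tag} {destination} virtiofs {ro|rw},x-initrd.mount" format to a buffer
 * that may already hold user-supplied fstab.extra entries.
 * Returns 0 on success, -1 if the buffer is too small. */
static int fstab_extra_append(char *buf, size_t size, const char *tag,
                              const char *destination, bool read_only) {
        size_t used;
        int n;

        assert(buf && tag && destination);

        used = strlen(buf);
        /* Keep existing content intact; start our entry on a new line. */
        const char *sep = (used > 0 && buf[used - 1] != '\n') ? "\n" : "";

        n = snprintf(buf + used, size - used, "%s%s %s virtiofs %s,x-initrd.mount\n",
                     sep, tag, destination, read_only ? "ro" : "rw");
        if (n < 0 || (size_t) n >= size - used)
                return -1;
        return 0;
}
```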

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

vmspawn: add scsi-cd disk type for ISO/CD-ROM image support

Add DISK_TYPE_SCSI_CD to support attaching disk images as CD-ROM
drives, needed for testing El Torito ISO images built by
systemd-repart.

When --image-disk-type=scsi-cd is specified, the image is attached
with media=cdrom and readonly=on on the drive, using scsi-cd as the
device driver on the SCSI bus. This also works for --extra-drive=
with the scsi-cd: prefix.

The QEMU configuration matches the standard OVMF CD-ROM boot setup:
  -drive if=none,media=cdrom,format=raw,readonly=on
  -device virtio-scsi-pci
  -device scsi-cd

When direct kernel booting with scsi-cd, if the kernel command line
contains "rw", append "ro" to override it since CD-ROMs are
read-only.
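The sketch below shows one way to implement the rw check. It assumes whole-word matching on space-separated parameters, so "rw=1" does not count. It also relies on a later "ro" overriding an earlier "rw", as happens for the kernel's root mount flags. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if 'word' appears as a whole space-separated word in 'cmdline'. */
static bool cmdline_has_word(const char *cmdline, const char *word) {
        size_t wl = strlen(word);

        assert(cmdline && wl > 0);

        for (const char *p = cmdline; (p = strstr(p, word)); p++) {
                bool start_ok = p == cmdline || p[-1] == ' ';
                bool end_ok = p[wl] == '\0' || p[wl] == ' ';
                if (start_ok && end_ok)
                        return true;
        }
        return false;
}

/* For scsi-cd direct boots: the suffix to append ("" if none is needed). */
static const char *scsi_cd_cmdline_suffix(const char *cmdline) {
        return cmdline_has_word(cmdline, "rw") ? " ro" : "";
}
```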

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

core: Prevent corrupting units from stale alias state on daemon-reload

During daemon-reload (or daemon-reexec), when a unit becomes an alias to
another unit, deserialising the alias's stale serialised state can
corrupt the canonical unit's live runtime state.

Consider this scenario:

1. Before reload:
   - a.service is running
   - b.service was stopped earlier and is dead
   - Both exist as independent units

2. User creates an alias to migrate from b -> a:

   - `ln -s /run/systemd/system/a.service /etc/systemd/system/b.service`

3. daemon-reload triggers serialisation. State file contains both units:
   - a.service -> state=running, cgroup=/system.slice/a.service,
     PID=1234, ...
   - b.service -> state=dead, cgroup=(empty), no PIDs, ...

4. During deserialisation:
   - Processes a.service: loads Unit A, deserialises -> state=RUNNING
   - Processes b.service: manager_load_unit() detects symlink, returns
     Unit A
   - unit_deserialize_state(Unit A, ...) overwrites with b's dead state

5. The result is that:
   - Unit A incorrectly shows state=dead despite PID 1234 still running
   - If a.service has Upholds= dependents, catch-up logic sees
     a.service should be running but is dead
   - systemd starts a.service again -> PID 5678
   - Two instances run: PID 1234 (left-over) and PID 5678 (new)

This bug is deterministic when serialisation orders a.service before
b.service.

The root cause is that manager_deserialize_one_unit() calls
manager_load_unit(name, &u) which resolves aliases via
unit_follow_merge(), returning the canonical Unit object. However, the
code doesn't distinguish between two cases when u->id differs from the
requested name from the state file. In the corruption case, we're
deserialising an alias entry and unit_deserialize_state() blindly
overwrites the canonical unit's fields with stale data from the old,
independent unit. The serialised b.service then overwrites Unit A's
correct live state.

This commit first scans the serialised unit names, then adds a check
after manager_load_unit():

    if (!streq(u->id, name) && set_contains(serialized_units, u->id))
        ...

This detects when the loaded unit's canonical ID (u->id) differs from
the serialised name, indicating the name is now an alias for a different
unit and the canonical unit also has its own serialised state entry.

If the canonical unit does not have its own serialised state entry, we
keep the state entry. That handles cases where the old name is really
just a rename, and thus the old name is the only serialised state for
the unit. In that case there is no bug, because there is no separate
canonical state entry for the stale alias entry to overwrite.

Skipping is safe because:

1. The canonical unit's own state entry will be correctly deserialised
   regardless of order. This fix only prevents other stale alias entries
   from corrupting it.
2. unit_merge() has already transferred the necessary data. When
   b.service became an alias during unit loading, unit_merge() already
   migrated dependencies and references to the canonical unit.
3. After merging, the alias doesn't have its own runtime state. The
   serialised data represents b.service when it was independent, which
   is now obsolete once the canonical unit also has its own serialised
   entry.
4. All fields are stale. unit_deserialize_state() would overwrite state,
   timestamps, cgroup paths, pids, etc. There's no scenario where we
   want this data applied on top of the canonical unit's own serialised
   state.

This fix also correctly handles unit precedence. For example, imagine
this scenario:

1. `b.service` is a valid, running unit defined in `/run`.
2. The sysadmin creates `ln -s .../a.service /etc/.../b.service`.
3. On reload, the new symlink in `/etc` overrides the unit in `/run`.

The new perspective from the manager side is that `b.service` is now an
alias for `a.service`.

In this case, systemd correctly abandons the old b.service unit, because
that's the intended general semantics of unit file precedence. We also
do that in other cases, like when a unit file in /etc/systemd/system/
masks a vendor-supplied unit file in /lib/systemd/system/, or when an
admin uses systemctl mask to explicitly disable a unit.

In all these scenarios, the configuration with the highest precedence
(in /etc/) is treated as the new source of truth. The old unit's
definition is discarded, and its running processes are (correctly)
abandoned. In that respect we are not doing anything new here.

Some may ask why we shouldn't just ignore the symlink if we think this
case will come up. I think there are multiple very strong reasons not to
do so:

1. It violates unit precedence. The unit design is built on a strict
   precedence list. When an admin puts any file in /etc, they are
   intentionally overriding everything else. If manager_load_unit were
   to "ignore" this file based on runtime state, it would break this
   fundamental principle.
2. It makes daemon-reload stateful. daemon-reload is supposed to be a
   simple, stateless operation, basically to read the files on disk and
   apply the new configuration. But doing this would make daemon-reload
   stateful, because we'd have to read the files on disk, but
   cross-reference the current runtime state, and... maybe ignore some
   files. This is complex and unpredictable.
3. It also completely ignores the user intent. The admin clearly has
   tried to replace the old service with an alias. Ignoring their
   instruction is the opposite of what they want.

Fixes: https://github.com/systemd/systemd/issues/38817
Fixes: https://github.com/systemd/systemd/issues/37482

vmspawn: use PTY for native console to avoid QEMU O_NONBLOCK issue

QEMU's stdio chardev sets O_NONBLOCK on both stdin and stdout (see
chardev/char-stdio.c [1] and chardev/char-fd.c [2]). Since forked
processes share file descriptions, and on a terminal all three stdio
fds typically reference the same file description, this affects our
own stdio too.
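The underlying behaviour is easy to demonstrate. O_NONBLOCK is a file status flag kept in the open file description, so setting it through a duplicated fd is visible through the original. The hypothetical helper below uses dup() on a pipe and is not vmspawn code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Sets O_NONBLOCK on a dup() of 'fd' and reports whether the original fd
 * sees it too: 1 if it leaked through, 0 if not, -1 on error. The change is
 * undone before returning. */
static int nonblock_leaks_through_dup(int fd) {
        int copy, flags, r;

        assert(fd >= 0);

        copy = dup(fd);
        if (copy < 0)
                return -1;

        flags = fcntl(copy, F_GETFL);
        if (flags < 0 || fcntl(copy, F_SETFL, flags | O_NONBLOCK) < 0) {
                close(copy);
                return -1;
        }

        /* Query the *original* fd: it shares the open file description. */
        r = fcntl(fd, F_GETFL);
        r = r < 0 ? -1 : !!(r & O_NONBLOCK);

        /* Undo the change for everyone sharing the description. */
        (void) fcntl(copy, F_SETFL, flags);
        close(copy);
        return r;
}
```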

Avoid this by using a PTY with chardev serial instead of chardev
stdio for native console mode, matching the approach already used
for interactive and read-only modes. The PTY forwarder shovels bytes
transparently between our stdio and QEMU's PTY using the new
PTY_FORWARD_DUMB_TERMINAL and PTY_FORWARD_TRANSPARENT flags, which
disable terminal decoration (background tinting, window title, OSC
context) and escape sequence handling (Ctrl-] exit, hotkeys)
respectively.

The chardev is configured with mux=on so the QEMU monitor remains
accessible via Ctrl-a c.

Also dedup CONSOLE_NATIVE, CONSOLE_READ_ONLY, and CONSOLE_INTERACTIVE
handling by using fallthrough, with the only differences being the
ptyfwd flags, mux setting, and monitor section.

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/chardev/char-stdio.c
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/chardev/char-fd.c

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

ptyfwd: Add transparent flag

stub: Determine the correct serial console from the ACPI device path

Instead of requiring exactly one serial device and assuming ttyS0,
extract the COM port index from the ACPI device path and use the
uart I/O port address format for the console= kernel argument.

On x86, the ACPI UID for PNP0501 (16550 UART) maps directly to the
COM port number: UID 0 = COM1 (0x3F8), UID 1 = COM2 (0x2F8), etc.
The I/O port addresses are fixed in the kernel (see
arch/x86/include/asm/serial.h). Using the console=uart,io,<addr>
format (see Documentation/admin-guide/kernel-parameters.txt) addresses
the UART by I/O port directly rather than relying on ttyS naming,
and also provides early console output before the full serial driver
loads.

Restrict the entire serial console auto-detection to x86. On non-x86
(e.g. ARM with PL011 UARTs), displays may be available without GOP
(e.g. simple-framebuffer via device tree), serial device indices are
assigned dynamically during probe rather than being fixed to I/O port
addresses, and the kernel has its own console auto-detection via DT
stdout-path.

When ConOut has no device path (ConSplitter), all text output handles
are enumerated. If multiple handles have PNP0501 UART nodes with
different UIDs, bail out rather than guessing.

Add ACPI_DP device path subtype, ACPI_HID_DEVICE_PATH struct, and
EISA_PNP_ID() macro to device-path.h for parsing ACPI device path
nodes. Remove MSG_UART_DP, device_path_has_uart(), count_serial_devices()
and proto/serial-io.h (no longer needed).

Move all the console logic to console.c as well.
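The mapping described above can be sketched as follows. `EISA_PNP_ID()` here follows the UEFI/EDK2 encoding: 0x41D0 for the compressed "PNP" vendor ID in the low 16 bits, and the product number in the high 16 bits. `uart_console_arg()` and `unique_uart_uid()` are hypothetical helpers, not sd-stub's console.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define EISA_PNP_ID(id) ((uint32_t) (((uint32_t) (id) << 16) | 0x41D0))

/* Legacy COM1..COM4 base addresses, as in arch/x86/include/asm/serial.h. */
static const uint16_t com_io_base[] = { 0x3F8, 0x2F8, 0x3E8, 0x2E8 };

/* Formats "console=uart,io,0x3f8"-style arguments from an ACPI node's _HID
 * and _UID. Returns 0, or -1 if the node is not a 16550 UART (PNP0501), the
 * UID has no fixed I/O port, or the buffer is too small. */
static int uart_console_arg(uint32_t hid, uint32_t uid, char *buf, size_t size) {
        assert(buf);

        if (hid != EISA_PNP_ID(0x0501))
                return -1;
        if (uid >= sizeof(com_io_base) / sizeof(com_io_base[0]))
                return -1;

        int n = snprintf(buf, size, "console=uart,io,0x%x", (unsigned) com_io_base[uid]);
        return n < 0 || (size_t) n >= size ? -1 : 0;
}

/* UART UIDs collected from all ConOut handles: bail out if they disagree. */
static int unique_uart_uid(const uint32_t *uids, size_t n, uint32_t *ret_uid) {
        assert(uids || n == 0);
        assert(ret_uid);

        if (n == 0)
                return -1;
        for (size_t i = 1; i < n; i++)
                if (uids[i] != uids[0])
                        return -1; /* Different COM ports: don't guess. */

        *ret_uid = uids[0];
        return 0;
}
```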

stub: Do a single PCI read to get the vendor and device ID

stub: Drop needless enum value assignment

Follow-up for 45e4df9a33

update TODO

shared/gpt: add gpt_probe() for GPT header and partition entry reading

Add gpt_probe() which probes for a GPT partition table at various sector
sizes (512-4096) and optionally returns the header and partition entries.
Returns the detected sector size on success, 0 if no GPT was found, or
negative errno on error.

Refactor probe_sector_size() in dissect-image.c to be a thin wrapper
around gpt_probe().
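Here is a simplified sketch of the probing idea; it is not systemd's gpt_probe(). The GPT header sits at LBA 1, so its "EFI PART" signature appears at a byte offset equal to the sector size. The candidate list (512, 1024, 2048, 4096) is an assumption based on the 512-4096 range above. A real implementation would also verify the header CRC and fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 'disk' holds the first 'size' bytes of the device. Returns the detected
 * sector size, or 0 if no GPT signature was found at any candidate offset. */
static uint32_t sketch_probe_gpt_sector_size(const uint8_t *disk, size_t size) {
        static const uint32_t candidates[] = { 512, 1024, 2048, 4096 };

        assert(disk || size == 0);

        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
                uint32_t ss = candidates[i];

                if (size < (size_t) ss + 8)
                        break; /* Candidates are ascending; the rest won't fit either. */
                if (memcmp(disk + ss, "EFI PART", 8) == 0)
                        return ss;
        }
        return 0;
}
```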

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

mount-util: restore compat for kernels without MOUNT_ATTR_NOSYMFOLLOW (< 5.14)

Follow-up for 6753bd8a2f38bd77a4c8b973174db6ec8bcaf3ab

Replaces #41341

networkd: fix assert with IPFamily=both in MobileNetwork conf and add minimal mock for test coverage (#41402)

Initialize roothash when populating sig partition

This allows one to specify `CopyBlocks=` on both the `Verity=` data and
hash partition and the signature is recreated correctly.

test: tweak TEST-74-AUX-UTILS.varlinkctl.sh varlink test

This commit tweaks the TEST-74-AUX-UTILS.varlinkctl.sh code
to use `systemd-notify --fork $UPGRADE_SERVER` instead of
the (ugly) timeout.

This also fixes a stale comment around
`Test --upgrade with stdin redirected from a regular file`.

Thanks to Daan for suggesting this!

varlinkctl: simplify error handling in exec_with_listen_fds

Instead of exiting in exec_with_listen_fds() just return an error
and do the actual _exit() in the caller. Much nicer this way.

Thanks to Lennart for suggesting this.

sd-varlink: fix fd handling in upgrade code path

This commit fixes an issue with the fd handling in
sd_varlink_call_and_upgrade() when one direction of the output
FDs is unset.

Thanks to Lennart for spotting this and suggesting the fix.

vmspawn: Use qemu config file for smp and memory

Pass -no-user-config while we're at it to avoid loading qemu config
from /etc, which is more likely to cause hard-to-debug issues than to
do anything useful.

vmspawn: pass --log-level=error and --modcaps=-mknod to virtiofsd

Reduce virtiofsd log noise by setting --log-level=error, and drop
the unnecessary mknod capability with --modcaps=-mknod, matching
mkosi's virtiofsd invocation.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

docs: fix misleading VM/machined documentation

Fix two issues in WRITING_VM_AND_CONTAINER_MANAGERS.md:

1. The Host OS Integration section implied that the -M switch and
   machinectl shell/login work for VMs, but they currently only
   work for containers. Add a note clarifying this limitation.

2. The Guest OS Integration section said "there's only one" VM
   integration API (SMBIOS Product UUID), but VM_INTERFACE.md
   documents five. Replace the outdated single-API description
   with a reference to VM_INTERFACE.md listing all five.

Fixes #40935

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

Support `CopyBlocks=` for `Verity={hash,sig}`

This enables deriving the minimum size of the `Verity=hash` partition
using the `Verity=` logic when the size of the `Verity=data` partition
is bigger than the `CopyBlocks=` target.

This enables using `Minimize=true` for an "installer image" and later
using sd-repart to install to a system with reserved space for future
updates by specifying `Size{Min,Max}Bytes=` only in the `Verity=data`
partition, without needing to hardcode the corresponding size for the
`Verity=hash` partition.

While not strictly necessary for `Verity=signature` partitions (since
they have a fixed size), there is little reason not to support it,
since then you can still specify `VerityMatchKey=` to indicate that
the partition is logically still part of that group of partitions.

We ensure that if the hash partition uses `CopyBlocks=`, the data
partition does so as well. Similarly, if the signature partition does,
we check that the hash and data partitions do so as well. This is to
minimize the chance of accidental misconfiguration by mixing
`CopyBlocks=auto` and `CopyBlocks=<path>`, or by manually populating
partitions while hash/sig partitions are copied from existing sources.
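The consistency rule can be modelled as a small predicate. The enum and function below are hypothetical and are not systemd-repart's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Roles within one Verity= group (hypothetical model). */
enum verity_role { VERITY_DATA, VERITY_HASH, VERITY_SIG, _VERITY_ROLE_MAX };

/* copy_blocks[role] says whether that partition uses CopyBlocks=.
 * Returns true if the combination is allowed. */
static bool verity_copy_blocks_consistent(const bool copy_blocks[_VERITY_ROLE_MAX]) {
        assert(copy_blocks);

        /* Hash copies => data must copy too. */
        if (copy_blocks[VERITY_HASH] && !copy_blocks[VERITY_DATA])
                return false;

        /* Signature copies => hash and data must copy too. */
        if (copy_blocks[VERITY_SIG] && !(copy_blocks[VERITY_HASH] && copy_blocks[VERITY_DATA]))
                return false;

        return true;
}
```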

vmspawn: Pass extra cmdline via smbios when direct booting a UKI

-cmdline doesn't work when direct booting a UKI so use SMBIOS instead.

hwdb/keyboard: fix enter key for X+ piccolo

The main enter key gives a code for keypad one... Map it to
regular enter key.

Two followups for recent commits (#41457)

nspawn: move boot_id and kmsg backing file creation to outer child

Follow-up for af5126568af6 ("nspawn: keep backing files for boot_id and
kmsg bind mounts alive").

The backing files for the boot_id and kmsg bind mounts were previously
created in the inner child. However, /run/host/ is remounted read-only
by mount_all() in the inner child (via the MOUNT_IN_USERNS mount table
entry) before setup_boot_id() and setup_kmsg() run, so creating files
there would fail with EROFS.

Fix this by splitting the file creation into separate functions
(setup_boot_id_file() and setup_kmsg_fifo()) that run in the outer
child, where /run/host/ is still writable. The bind mounts onto /proc
remain in the inner child, since procfs is only mounted there.

Also move the backing files from /run/ to /run/host/ and drop the dot
prefix, since /run/host/ is the container-manager-owned namespace and
there is no need to hide these files there. Additionally, apply
userns_lchown() to the created files, matching the convention used by
all other outer child functions that create files in the container.

units: allow systemd-report-basic@.service to run in early boot

report-basic: lock down the service

The basic approach is copied from systemd-journal-gatewayd.service,
with some additions to lock down unneeded network access.

report: move facts generator out of PID1 into a separate varlink service

The collection of facts is entirely unprivileged and has very little to
do with PID1. PID1 is privileged and single-threaded and a point of
contention, so we shouldn't put things in PID1 that don't need to be
there. A separate service can be enabled/disabled/started/stopped at
will, is easy to sandbox, etc. If it turns out to be necessary to
collect some facts through PID1 in the future, we can always add a
smaller facts endpoint to PID1 again.

Two claude fixes (#41451)

basic/terminal-util: flush stray input when terminal query fails

Follow-up for da69848791d2b32dfb90946264fd632ac1d5c7de.

test-efi-string: add more cases

This exercises the patterns used in
45e4df9a331208d20ecb9f5ead8110eb50a5b86d.

ci: base64 encode multiline strings in structured output

Prevent Claude from trying to escape characters in the structured JSON
output by having it base64-encode multiline strings instead.

ci: Delay instructions to read pr-context.json until 2nd phase

The main agent doesn't need to read pr-context.json until all
reviews have finished. This should prevent it from passing unnecessary
data from pr-context.json in the prompt to its subagents, which can just
read that file themselves when needed.

shared/options: allow output ret_arg to be omitted

Sometimes we have a parser which would never use the argument.

report: rename variables/fields, use string table

Noop refactoring to make naming more consistent.

report: implement facts interface

report: downgrade message for org.varlink.service.MethodNotFound

We now have three kinds of endpoints under /run/systemd/report/:
those that implement facts, those that implement metrics, and those that
implement both. Let's downgrade the message to avoid pointless warnings.
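A minimal sketch of the downgrade, assuming the caller already has the Varlink error identifier as a string. `report_error_log_level()` is hypothetical, and the syslog priorities stand in for systemd's own logging calls.

```c
#include <assert.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical helper: choose the log priority for an error returned by a
 * report endpoint. Endpoints may implement only facts or only metrics, so a
 * missing method is expected and only logged at debug level. */
static int report_error_log_level(const char *error_id) {
        if (error_id && strcmp(error_id, "org.varlink.service.MethodNotFound") == 0)
                return LOG_DEBUG;
        return LOG_WARNING;
}
```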

stub: auto-detect console device and append console= to kernel command line

The Linux kernel does not reliably auto-detect serial consoles on
headless systems. While the docs claim serial is used as a fallback
when no VGA card is found, in practice CONFIG_VT's dummy console
(dummycon) registers early and satisfies the kernel's console
requirement, preventing the serial fallback from ever triggering. The
ACPI SPCR table can help on ARM/RISC-V where QEMU generates it, but
x86 QEMU does not produce SPCR, and SPCR cannot describe virtio
consoles at all. This means UKIs booted via sd-stub in headless VMs
produce no visible console output unless console= is explicitly
passed on the kernel command line.

Fix this by having sd-stub auto-detect the console type and append an
appropriate console= argument when one isn't already present.

Detection priority:

1. VirtIO console PCI device (vendor 0x1AF4, device 0x1003): if
   exactly one is found, append console=hvc0. This takes highest
   priority since a VirtIO console is explicitly configured by the
   VMM (e.g. systemd-vmspawn's virtconsole device). If multiple
   VirtIO console devices exist, we cannot determine which hvc index
   is correct, so we skip this path entirely.

2. EFI Graphics Output Protocol (GOP): if present, don't add any
   console= argument. The kernel will use the framebuffer console by
   default, and adding a serial console= would redirect the primary
   console away from the display.

3. Serial console: first, we count the total number of serial devices
   via EFI_SERIAL_IO_PROTOCOL. If there are zero or more than one,
   we bail out — with multiple UARTs, the kernel assigns ttyS indices
   based on its own enumeration order and we cannot determine which
   index the console UART will receive. Only when exactly one serial
   device exists (guaranteeing it will be ttyS0) do we proceed to
   verify it's actually used as a console by checking for UART device
   path nodes (MESSAGING_DEVICE_PATH + MSG_UART_DP). The firmware's
   ConOut handle is checked first; if it has no device path (common
   with OVMF's ConSplitter virtual handle when using -nographic
   -nodefaults), we fall back to enumerating all
   EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL handles and checking each one's
   device path. The architecture-specific console argument is then
   appended:
   - x86:     console=ttyS0
   - ARM:     console=ttyAMA0
   - Others:  console=ttyS0 (RISC-V, LoongArch, MIPS all use ttyS0)
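The detection priority boils down to a pure decision function once the firmware enumeration has been done. This is a sketch under stated assumptions, not sd-stub's code: `ConsoleProbe`, `PciId`, `Arch` and `pick_console_arg()` are hypothetical, the EFI protocol walking is assumed to have filled in the probe, and the check for an already present `console=` argument is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_VENDOR_ID         0x1AF4
#define VIRTIO_CONSOLE_DEVICE_ID 0x1003

typedef struct { uint16_t vendor_id, device_id; } PciId;
typedef enum { ARCH_X86, ARCH_ARM, ARCH_OTHER } Arch;

/* Hypothetical snapshot of what the firmware enumeration found. */
typedef struct {
        const PciId *pci;           /* vendor/device IDs read via EFI_PCI_IO_PROTOCOL */
        size_t n_pci;
        bool has_gop;               /* EFI Graphics Output Protocol present */
        size_t n_serial;            /* number of EFI_SERIAL_IO_PROTOCOL handles */
        bool serial_is_console;     /* UART device path found on ConOut or a text-output handle */
        Arch arch;
} ConsoleProbe;

/* Returns the console= argument to append, or NULL to append nothing. */
static const char *pick_console_arg(const ConsoleProbe *p) {
        size_t n_virtio = 0;

        for (size_t i = 0; i < p->n_pci; i++)
                if (p->pci[i].vendor_id == VIRTIO_VENDOR_ID &&
                    p->pci[i].device_id == VIRTIO_CONSOLE_DEVICE_ID)
                        n_virtio++;

        /* 1. Exactly one VirtIO console: hvc0 is unambiguous. With several,
         *    the hvc index is unknown, so fall through to the other checks. */
        if (n_virtio == 1)
                return "console=hvc0";

        /* 2. A framebuffer is available: leave the kernel's default alone. */
        if (p->has_gop)
                return NULL;

        /* 3. Only a single UART that is actually used as console maps to index 0. */
        if (p->n_serial != 1 || !p->serial_is_console)
                return NULL;

        return p->arch == ARCH_ARM ? "console=ttyAMA0" : "console=ttyS0";
}
```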

Note on OVMF's VirtioSerialDxe: it exposes virtio serial ports with
the same UART device path nodes as real serial ports (ACPI PNP 0x0501
+ MSG_UART_DP), making them indistinguishable from real UARTs via
device path inspection alone. This is why we check for the VirtIO
console PCI device via EFI_PCI_IO_PROTOCOL before falling back to
device path analysis.

Also add a minimal EFI_PCI_IO_PROTOCOL definition (proto/pci-io.h)
with just enough to call Pci.Read for vendor/device ID enumeration,
and add the MSG_UART_DP subtype to the device path header.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

build(deps): bump meson from 1.10.1 to 1.10.2 in /.github/workflows

Bumps [meson](https://github.com/mesonbuild/meson) from 1.10.1 to 1.10.2.
- [Release notes](https://github.com/mesonbuild/meson/releases)
- [Commits](https://github.com/mesonbuild/meson/compare/1.10.1...1.10.2)

---
updated-dependencies:
- dependency-name: meson
  dependency-version: 1.10.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

build(deps): bump the actions group with 3 updates

Bumps the actions group with 3 updates: [actions/upload-artifact](https://github.com/actions/upload-artifact), [redhat-plumbers-in-action/download-artifact](https://github.com/redhat-plumbers-in-action/download-artifact) and [softprops/action-gh-release](https://github.com/softprops/action-gh-release).

Updates `actions/upload-artifact` from 6 to 7
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v6...v7)

Updates `redhat-plumbers-in-action/download-artifact` from 1.1.5 to 1.1.6
- [Release notes](https://github.com/redhat-plumbers-in-action/download-artifact/releases)
- [Commits](https://github.com/redhat-plumbers-in-action/download-artifact/compare/103e5f882470b59e9d71c80ecb2d0a0b91a7c43b...03d5b806a9dca9928eb5628833fe81a0558f23bb)

Updates `softprops/action-gh-release` from 2.5.0 to 2.6.1
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](https://github.com/softprops/action-gh-release/compare/a06a81a03ee405af7f2048a818ed3f03bbf83c7b...153bb8e04406b158c6c84fc1615b65b24149a1fe)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: redhat-plumbers-in-action/download-artifact
  dependency-version: 1.1.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: softprops/action-gh-release
  dependency-version: 2.6.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>

hmac: add comments explaining why each buffer needs erasing

As requested in review: clarify that the padding arrays carry
key material (key XOR fixed constant, trivially reversible),
not just padding bytes.

hmac: erase key-derived stack buffers before returning

hmac_sha256() leaves four stack buffers containing key-derived material
(inner_padding, outer_padding, replacement_key, hash state) on the stack
after returning. The inner_padding and outer_padding arrays contain
key XOR 0x36 and key XOR 0x5c respectively, which are trivially
reversible to recover the original HMAC key.

This function is called with security-sensitive keys including the LUKS
volume key (cryptsetup-util.c), TPM2 PIN (tpm2-util.c), and boot secret
(tpm2-swtpm.c). The key material persists on the stack until overwritten
by later unrelated function calls.

Add CLEANUP_ERASE() to all four local buffers, following the same
pattern applied to tpm2-util.c in commit 6c80ce6 (PR #41394).
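Both points can be illustrated in a few lines: the inner padding gives the key back with a single XOR, and a scope-exit cleanup can erase it. This is a hedged sketch; `CLEANUP_ERASE_SKETCH()` is a hypothetical stand-in for systemd's `CLEANUP_ERASE()` built on the GCC/Clang `cleanup` attribute, and the volatile loop stands in for systemd's own erase helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HMAC_BLOCK_SIZE 64   /* SHA-256 block size in bytes */

/* Zero a buffer through a volatile pointer so the stores are not optimized away. */
static void erase_bytes(void *p, size_t n) {
        volatile uint8_t *v = p;
        while (n--)
                *v++ = 0;
}

struct eraser {
        void *p;
        size_t size;
};

static void erase_var(struct eraser *e) {
        erase_bytes(e->p, e->size);
}

/* Declares a helper variable whose cleanup handler erases `var` on scope exit. */
#define CLEANUP_ERASE_SKETCH(var)                                               \
        __attribute__((cleanup(erase_var), unused)) struct eraser eraser_##var = \
                { .p = &(var), .size = sizeof(var) }

/* Builds inner_padding = key XOR 0x36 and shows that XORing again yields the key. */
static void recover_key_from_inner_padding(const uint8_t key[HMAC_BLOCK_SIZE],
                                           uint8_t out[HMAC_BLOCK_SIZE]) {
        uint8_t inner_padding[HMAC_BLOCK_SIZE];
        CLEANUP_ERASE_SKETCH(inner_padding);

        for (size_t i = 0; i < HMAC_BLOCK_SIZE; i++)
                inner_padding[i] = (uint8_t) (key[i] ^ 0x36);

        /* Anyone who can read inner_padding recovers the key with one XOR. */
        for (size_t i = 0; i < HMAC_BLOCK_SIZE; i++)
                out[i] = (uint8_t) (inner_padding[i] ^ 0x36);
}       /* inner_padding is erased here, when the eraser goes out of scope */
```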

loop-util: work around kernel loop driver partition scan race

The kernel loop driver has a race condition in LOOP_CONFIGURE when
LO_FLAGS_PARTSCAN is set: it sends a KOBJ_CHANGE uevent (with
GD_NEED_PART_SCAN set) before calling loop_reread_partitions(). If
udev opens the device in response to the uevent before
loop_reread_partitions() runs, the kernel's blkdev_get_whole() sees
GD_NEED_PART_SCAN and triggers a first partition scan. Then
loop_reread_partitions() runs a second scan that drops all partitions
from the first scan (via blk_drop_partitions()) before re-adding them.
This causes partition devices to briefly disappear (plugged -> dead ->
plugged), which breaks systemd units with BindsTo= on the partition
device: systemd observes the dead transition, fails the dependent
units with 'dependency', and does not retry when the device reappears.

Work around this in loop_device_make_internal() by splitting the loop
device setup into two steps: first LOOP_CONFIGURE without
LO_FLAGS_PARTSCAN, then LOOP_SET_STATUS64 to enable partscan. This
avoids the race because:

1. LOOP_CONFIGURE without partscan: disk_force_media_change() sets
   GD_NEED_PART_SCAN, but GD_SUPPRESS_PART_SCAN remains set. If udev
   opens the device, blkdev_get_whole() calls bdev_disk_changed()
   which clears GD_NEED_PART_SCAN, but blk_add_partitions() returns
   early because disk_has_partscan() is false — no partitions appear,
   the flag is drained harmlessly.

2. Between the two ioctls, we open and close the device to ensure
   GD_NEED_PART_SCAN is drained regardless of whether udev processed
   the uevent yet.

3. LOOP_SET_STATUS64 with LO_FLAGS_PARTSCAN: clears
   GD_SUPPRESS_PART_SCAN and calls loop_reread_partitions() for a
   single clean scan. Crucially, loop_set_status() does not call
   disk_force_media_change(), so GD_NEED_PART_SCAN is never set again.

A proper kernel fix has been submitted:
https://lore.kernel.org/linux-block/20260330081819.652890-1-daan@amutable.com/T/#u

This workaround should be dropped once the fix is widely available.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

nspawn: keep backing files for boot_id and kmsg bind mounts alive

Both setup_boot_id() and setup_kmsg() previously created temporary files
in /run, bind mounted them over their respective /proc targets, and then
immediately unlinked the backing files. While the bind mount keeps the
inode alive, the kernel marks the dentry as deleted.

This is a problem because bind mounts backed by unlinked files cannot be
replicated: both the old mount API (mount(MS_BIND)) and the new mount
API (open_tree(OPEN_TREE_CLONE) + move_mount()) fail with ENOENT when
the source mount references a deleted dentry. This affects
mount_private_apivfs() in namespace.c, which needs to replicate these
submounts when setting up a fresh /proc instance for services with
ProtectProc= or similar sandboxing options — with an unlinked backing
file, the boot_id submount simply gets lost.

Fix this by using fixed paths (/run/proc-sys-kernel-random-boot-id and
/run/proc-kmsg) instead of randomized tempfiles, and not unlinking them
after the bind mount. The files live in /run which is cleaned up on
shutdown anyway.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

varlinkctl: add protocol upgrade support using new SocketForward (#41283)

hwdb: Silence spurious F23 key-press from Fn key on Thinkpad T14s

The Thinkpad T14s Gen 6 (Snapdragon) emits an F23 key press when
pressing the Fn key. Silence it, since the keyboard doesn't
actually have an F23 key.

ci: Rework Claude review workflow to use CLI directly

Replace claude-code-action with a direct claude CLI invocation. This
gives us explicit control over settings, permissions, and output
handling.

Other changes:
- Prepare per-commit git worktrees with pre-generated commit.patch and
  commit-message.txt files, replacing the pr-review branch approach.
- Use structured JSON output (--output-format stream-json --json-schema)
  instead of having Claude write review-result.json directly.
- Use jq instead of python3 for JSON prettification.
- Add timeout-minutes: 60 to the review job.
- List tool permissions explicitly instead of using a wildcard.
- Fix sandbox filesystem paths to use regular paths instead of the "//"
  prefix.

loop-util: use auto-detect open mode for loop device setup

When callers do not explicitly request read-only mode, pass open_flags
as -1 (auto-detect) instead of hardcoding O_RDWR. This enables the
existing O_RDWR-to-O_RDONLY retry logic in loop_device_make_by_path_at()
which falls back to O_RDONLY when opening the backing device with O_RDWR
fails with EROFS or similar errors.

Previously, callers passed O_RDWR explicitly when read-only mode was not
requested, which bypassed the retry logic entirely. This meant that
inherently read-only block devices (such as CD-ROMs) would fail to open
instead of gracefully falling back to read-only mode.

Also propagate the unresolved open_flags through
loop_device_make_by_path_at() into loop_device_make_internal() instead
of resolving it to O_RDWR early. For loop_device_make_by_path_memory(),
resolve to O_RDWR immediately since memfds are always writable.

In mstack, switch from loop_device_make() to
loop_device_make_by_path_at() with a NULL path, which reopens the
O_PATH file descriptor with the appropriate access mode. This is
necessary because the backing file descriptor is opened with O_PATH,
which prevents loop_device_make_internal() from auto-detecting the
access mode via fcntl(F_GETFL).
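The "-1 means auto-detect" convention can be sketched as a small open helper. This is a minimal sketch, assuming the fallback is taken on EROFS, EACCES or EPERM; `open_backing_auto()` and `fd_access_mode()` are hypothetical, and the exact errno set systemd treats as "write refused" may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* open_flags >= 0: honour the caller's choice. open_flags < 0: try O_RDWR and
 * fall back to O_RDONLY if write access is refused. Returns an fd or -errno. */
static int open_backing_auto(const char *path, int open_flags) {
        int fd;

        if (open_flags >= 0) {
                fd = open(path, open_flags | O_CLOEXEC);
                return fd >= 0 ? fd : -errno;
        }

        fd = open(path, O_RDWR | O_CLOEXEC);
        if (fd >= 0)
                return fd;
        if (errno != EROFS && errno != EACCES && errno != EPERM)
                return -errno;          /* not a "read-only" failure, don't retry */

        fd = open(path, O_RDONLY | O_CLOEXEC);
        return fd >= 0 ? fd : -errno;
}

/* Access mode (O_RDONLY/O_WRONLY/O_RDWR) of an open fd, or -errno. This does
 * not work for O_PATH fds, which is why mstack reopens them first. */
static int fd_access_mode(int fd) {
        int fl = fcntl(fd, F_GETFL);
        return fl < 0 ? -errno : (fl & O_ACCMODE);
}
```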

varlink: tweak exec_with_listen_fds() log generation

The previous code used strv_join() when it generated the log
message for `varlinkctl --exec`. However, this can lead to
inaccurate logging, since an argument containing a space cannot be
told apart from two separate arguments, so use `quote_command_line()`
instead.
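To illustrate the ambiguity: a plain space join maps `{"echo", "a b"}` and `{"echo", "a", "b"}` to the same string, while quoting keeps them distinct. The `join_argv()` helper below is hypothetical and much simpler than systemd's `quote_command_line()`, whose exact quoting style may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Empty arguments and arguments with shell-special characters need quoting. */
static bool needs_quoting(const char *s) {
        return *s == '\0' || strpbrk(s, " \t\n'\"\\$`") != NULL;
}

/* Join argv with spaces; if quote is true, double-quote arguments that need it.
 * Returns a malloc()ed string or NULL on allocation failure. */
static char *join_argv(char *const *argv, bool quote) {
        size_t size = 1;                                /* terminating NUL */
        for (char *const *a = argv; *a; a++)
                size += 2 * strlen(*a) + 3;             /* worst case: all escaped, two quotes, one space */

        char *buf = malloc(size), *p = buf;
        if (!buf)
                return NULL;

        for (char *const *a = argv; *a; a++) {
                if (a != argv)
                        *p++ = ' ';

                if (!quote || !needs_quoting(*a)) {
                        size_t len = strlen(*a);
                        memcpy(p, *a, len);
                        p += len;
                        continue;
                }

                *p++ = '"';
                for (const char *c = *a; *c; c++) {
                        if (strchr("\"\\$`", *c))       /* special even inside double quotes */
                                *p++ = '\\';
                        *p++ = *c;
                }
                *p++ = '"';
        }
        *p = '\0';
        return buf;
}
```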

Thanks to Lennart for suggesting this.

varlinkctl: add support for `--exec` with `--upgrade`

Having support for `--exec` when using `--upgrade` is useful, so this
commit adds it by extracting a shared helper called
`exec_with_listen_fds()` and using it in both `verb_call()`
and `varlink_call_and_upgrade()`.

varlinkctl: add protocol upgrade support

The varlink spec supports protocol upgrades and they are very
useful to e.g. transfer binary data directly via varlink. So
far varlinkctl/sd-varlink was not supporting this. This commit
adds support for it in varlinkctl by using the new code in
sd-varlink and the generalized socket-forward code.

shared: rename internal variables in SimplexForwarder

The SimplexForwarder was using the naming of the SocketForwarder
for bidirectional sockets. This was to keep the diff small and
to make it easier to follow what changed and what was reused.

However, the "client/server" naming no longer makes much sense
for the SimplexForwarder, which is really just about read/write
rather than client/server. So this commit adjusts the naming.

shared: extend socket-forward to support fd-pairs too

Now that the socket forward code is extracted we can
extend it to not just support bidirectional sockets
but also input/output fd-pairs. This will be needed
for e.g. the varlinkctl protocol upgrade support where
one side of the connection is a fd-pair (stdin/stdout).

This is done by creating two half-duplex forwarders
that operate independently. This also allows simplifying
some state tracking: because each fd serves only one
direction, we don't need to dynamically compute the event mask
with EPOLLIN etc.; it's enough to set it once. It also handles
non-pollable FDs transparently.
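The core of a half-duplex forwarder is "read from one fd, write everything to the other". This blocking sketch only illustrates that direction-per-forwarder idea; the real SimplexForwarder is event-driven, and `simplex_forward_once()`/`simplex_forward_all()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Forward one chunk from read_fd to write_fd, handling short writes.
 * Returns the number of bytes forwarded, 0 on EOF, or -errno. */
static ssize_t simplex_forward_once(int read_fd, int write_fd) {
        char buf[4096];
        ssize_t n;

        do
                n = read(read_fd, buf, sizeof(buf));
        while (n < 0 && errno == EINTR);
        if (n <= 0)
                return n < 0 ? -errno : 0;

        for (ssize_t done = 0; done < n; ) {
                ssize_t w = write(write_fd, buf + done, (size_t) (n - done));
                if (w < 0) {
                        if (errno == EINTR)
                                continue;
                        return -errno;
                }
                done += w;
        }
        return n;
}

/* Pump one direction until EOF on read_fd. Returns 0 or -errno. */
static int simplex_forward_all(int read_fd, int write_fd) {
        for (;;) {
                ssize_t r = simplex_forward_once(read_fd, write_fd);
                if (r <= 0)
                        return (int) r;
        }
}
```

A bidirectional forwarder then becomes two such pumps, for example socket to stdout and stdin to socket, each with a fixed direction.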

Thanks to Lennart for his excellent suggestions here.

sd-varlink: add sd_varlink_call_and_upgrade() for protocol upgrades

The varlink spec supports protocol upgrades and they are very
useful to e.g. transfer binary data directly via varlink. So
far sd-varlink was not supporting this.

This commit adds a new public sd_varlink_call_and_upgrade()
that sends a method call, waits for the reply, then steals
the connection fds for raw I/O. It returns separate input_fd
and output_fd to support both bidirectional sockets and pipe
pairs.

A helper is extracted and shared between sd_varlink_call_full()
and sd_varlink_call_and_upgrade(). A new `protocol_upgrade`
bool in `struct sd_varlink` ensures that on a protocol upgrade
request we only exactly read the varlink protocol bytes and
leave anything beyond that to the caller that speaks the upgraded
protocol.
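Each Varlink message on the wire is a JSON object terminated by a NUL byte, so "read only the varlink protocol bytes" means stopping exactly at that NUL. A minimal sketch, reading one byte at a time; `read_one_varlink_message()` is hypothetical and not how sd-varlink implements it.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly one NUL-terminated message into buf without consuming any byte
 * after the NUL. Returns the message length (excluding the NUL) or -errno:
 * -EPROTO on EOF before the NUL, -ENOBUFS if buf is too small. */
static ssize_t read_one_varlink_message(int fd, char *buf, size_t size) {
        size_t i = 0;

        while (i < size) {
                ssize_t n = read(fd, buf + i, 1);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -errno;
                }
                if (n == 0)
                        return -EPROTO;
                if (buf[i] == '\0')
                        return (ssize_t) i;     /* whatever follows belongs to the upgraded protocol */
                i++;
        }
        return -ENOBUFS;
}
```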

Note that this is only the client side of the library implementation
for now. The server side needs more work, but this is already
useful as it allows talking to varlink servers that speak protocol
upgrades (like the Rust implementations of varlink).

terminal-util: fix boot hang from ANSI terminal size queries

Since v257, terminal_fix_size() is called during early boot via
console_setup() → reset_dev_console_fd() to query terminal dimensions
via ANSI escape sequences. This has caused intermittent boot hangs
where the system gets stuck with a blinking cursor and requires a
keypress to continue (see systemd/systemd#35499).

The function tries CSI 18 first, then falls back to DSR if that fails.
Previously, each method independently opened a non-blocking fd, disabled
echo/icanon, ran its query, restored termios, and closed its fd. This
created two problems:

1. Echo window between CSI 18 and DSR fallback: After CSI 18 times out
   and restores termios (re-enabling ECHO and ICANON), there is a brief
   window before DSR disables them again. If the terminal's CSI 18
   response arrives during this window, it is echoed back to the
   terminal — where the terminal interprets \e[8;rows;cols t as a
   "resize text area" command — and the response bytes land in the
   canonical line buffer as stale input that can confuse the DSR
   response parser.

2. Cursor left at bottom-right on DSR timeout: The DSR method worked by
   sending two DSR queries — one to save the cursor position, then
   moving the cursor to (32766,32766) and sending another to read the
   clamped position. If neither response was received (timeout), the
   cursor restore was skipped (conditional on saved_row > 0), leaving
   the cursor at the bottom-right corner of the terminal. The
   subsequent terminal_reset_ansi_seq() then moved it to the beginning
   of the last line via \e[1G, making boot output appear at the bottom
   of the screen — giving the appearance of a hang even when the system
   was still booting.

This commit fixes both issues:

- terminal_fix_size() now opens the non-blocking fd and configures
  termios once for both query methods, so echo stays disabled for the
  entire CSI 18 → DSR fallback sequence with no gap. tcflush(TCIFLUSH)
  is called before each query to drain any stale input from the tty
  input queue.

- The DSR method now uses DECSC (\e7) / DECRC (\e8) to save and restore
  the cursor position via hardware, instead of querying it with a
  separate DSR round-trip. All four sequences (DECSC, CUP to
  bottom-right, DSR query, DECRC) are sent in a single write, so the
  terminal processes DECRC and restores the cursor regardless of whether
  userspace ever reads the DSR response. This eliminates the
  cursor-at-bottom-right artifact on timeout and simplifies the read
  loop to only need a single DSR response instead of two.

- The repeated setup boilerplate (dumb check, verify_same, fd_reopen,
  termios save/disable) is extracted into terminal_prepare_query(),
  shared by terminal_get_size_by_csi18(), terminal_get_size_by_dsr(),
  and terminal_fix_size().
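The single-write DSR query and the two reply formats can be sketched as below. Moving the cursor to (32766,32766) makes the terminal clamp it to its last row and column, so the position report equals the terminal size. The constant and parser names are hypothetical, not systemd's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sent in one write(): the terminal restores the cursor (DECRC) even if
 * nobody ever reads the DSR reply. */
static const char dsr_size_query[] =
        "\033" "7"              /* DECSC: save cursor */
        "\033[32766;32766H"     /* CUP: move far bottom-right, clamped by the terminal */
        "\033[6n"               /* DSR: request cursor position report */
        "\033" "8";             /* DECRC: restore cursor */

/* Parse a cursor position report "ESC [ rows ; cols R". Returns 0 or -1. */
static int parse_dsr_reply(const char *s, unsigned *ret_rows, unsigned *ret_cols) {
        unsigned rows, cols;
        int end = 0;

        if (sscanf(s, "\033[%u;%uR%n", &rows, &cols, &end) != 2 || end == 0 || s[end] != '\0')
                return -1;
        if (rows == 0 || cols == 0)
                return -1;
        *ret_rows = rows;
        *ret_cols = cols;
        return 0;
}

/* Parse a CSI 18 reply "ESC [ 8 ; rows ; cols t". Returns 0 or -1. */
static int parse_csi18_reply(const char *s, unsigned *ret_rows, unsigned *ret_cols) {
        unsigned rows, cols;
        int end = 0;

        if (sscanf(s, "\033[8;%u;%ut%n", &rows, &cols, &end) != 2 || end == 0 || s[end] != '\0')
                return -1;
        if (rows == 0 || cols == 0)
                return -1;
        *ret_rows = rows;
        *ret_cols = cols;
        return 0;
}
```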

Fixes: systemd/systemd#35499
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

terminal-util: add CLEANUP_TERMIOS_RESET() for automatic termios restore

Add TERMIOS_NULL sentinel, TermiosResetContext, and CLEANUP_TERMIOS_RESET()
macro (modeled after CLEANUP_ARRAY()) to automatically restore terminal
settings when leaving scope, replacing manual goto+tcsetattr patterns.

Migrate ask_string_full(), terminal_get_cursor_position(),
get_default_background_color(), terminal_get_terminfo_by_dcs(),
terminal_get_size_by_dsr() and terminal_get_size_by_csi18() to use the new
cleanup macro, removing the goto-based cleanup labels and replacing them
with direct returns.
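The pattern replaces a goto-based restore with a cleanup handler that fires on every return. This hedged sketch uses a hypothetical `TermiosRestore` struct and `CLEANUP_TERMIOS_RESTORE` macro built on the GCC/Clang `cleanup` attribute; the real `TermiosResetContext` and `TERMIOS_NULL` may be defined differently.

```c
#include <assert.h>
#include <errno.h>
#include <termios.h>
#include <unistd.h>

typedef struct {
        int fd;                 /* -1: nothing saved, nothing to restore */
        struct termios saved;
} TermiosRestore;

static void termios_restore(TermiosRestore *r) {
        if (r->fd >= 0)
                (void) tcsetattr(r->fd, TCSANOW, &r->saved);
}

#define CLEANUP_TERMIOS_RESTORE __attribute__((cleanup(termios_restore)))

/* Disable echo and canonical mode for the duration of a query. Every return
 * below restores the saved settings automatically, no goto needed. */
static int query_with_echo_off(int fd) {
        CLEANUP_TERMIOS_RESTORE TermiosRestore restore = { .fd = -1 };
        struct termios t;

        if (tcgetattr(fd, &t) < 0)
                return -errno;          /* nothing saved yet, so nothing is restored */
        restore.saved = t;
        restore.fd = fd;                /* arm the cleanup */

        t.c_lflag &= ~(ECHO | ICANON);
        if (tcsetattr(fd, TCSANOW, &t) < 0)
                return -errno;

        /* ... write the query and read the reply here ... */
        return 0;
}
```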

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

test-terminal-util: migrate to new assertion macros

Replace assert_se() calls with the more descriptive ASSERT_OK(),
ASSERT_OK_ZERO(), ASSERT_OK_ERRNO(), ASSERT_OK_POSITIVE(),
ASSERT_OK_EQ_ERRNO(), ASSERT_FAIL(), ASSERT_TRUE(), ASSERT_FALSE(),
ASSERT_EQ(), ASSERT_LE(), and ASSERT_NOT_NULL() macros throughout the
test file.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>

man: fix typos in some binary names

Refactor the option & verb table handling and convert a few more programs (#41335)

After having some more experience with how this works, I think some
changes are in order. So there are a handful of preparatory patches and
then conversion of a few progs that make use of the new functionality.

Only enable `NoAuto=true` for supported partitions

When `Format=empty` is set we need to check for `NoAuto` support for
the partition type, else we print a warning later in the build.

Followup for 381304a

cryptenroll: harden some variables with erasure on cleanup

This doesn't really matter as it runs in user contexts, but
follow good practice and mark all variables containing secrets
for erasure on cleanup.

Reported on yeswehack.com as YWH-PGM9780-170

More assorted coverity fixes (#41413)

sysupdate: Ignore resources that are not pending

`updatectl enable --now` always fails because the update phase finds
resources that are not pending. Ignore them instead.

Fixes #41254

discover-image: Ignore sysupdate temporary files

Sysupdate temporary file names do not match their extension-release names, so
they always fail validation. That makes enabling any other sysexts/confexts
fail too, which has catastrophic consequences. Unfortunately, since 260,
sysupdate leaves temporary files around for a long time instead of only
while downloading, so this kind of failure now happens much more often.

sysupdated: Accept "current+pending" key

Since 594d0345fa997446b4c2dcfbccf3f83257bb55a3 the key for
current version might be "current+pending". So in order not to fail
we need to accept it.

Fixes #41409

many: another set of checks for pointer access without NULL check (#41400)

Followup for https://github.com/systemd/systemd/pull/41370

Next set of pointer-deref coccinelle tweaks for:
- src/core/
- src/journal/
- src/network/
- src/nspawn/

network-generator: support BOOTIF= and rd.bootif=0 options

The network generator currently supports many of the options described
by dracut.cmdline(7), but not everything.

This commit adds support for the BOOTIF= option (and the related
rd.bootif= option) used in PXE setups.

This is implemented by treating BOOTIF as a special name/placeholder
when used as an interface name, and expecting a MAC address to be set in
the BOOTIF= parameter. The resulting .network file then uses MACAddress=
in the [Match] section, instead of Name=.
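A sketch of the BOOTIF handling, assuming the PXELINUX form of the value: `01-` (ARP hardware type 1, Ethernet) followed by the MAC address with dashes. The commit does not say which spellings systemd-network-generator accepts; `bootif_to_mac()` and `format_bootif_network()` are hypothetical, and the [Network] section that ip= would contribute is omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* "01-52-54-00-12-34-56" -> "52:54:00:12:34:56". Returns 0 or -1 if malformed. */
static int bootif_to_mac(const char *value, char ret[18]) {
        if (strncmp(value, "01-", 3) != 0 || strlen(value) != 3 + 17)
                return -1;
        value += 3;

        for (size_t i = 0; i < 17; i++) {
                char c = value[i];

                if (i % 3 == 2) {               /* separator position */
                        if (c != '-')
                                return -1;
                        ret[i] = ':';
                } else {
                        if (!isxdigit((unsigned char) c))
                                return -1;
                        ret[i] = (char) tolower((unsigned char) c);
                }
        }
        ret[17] = '\0';
        return 0;
}

/* Emit the [Match] section of a .network file matching on the BOOTIF MAC. */
static int format_bootif_network(const char *bootif, char *buf, size_t size) {
        char mac[18];

        if (bootif_to_mac(bootif, mac) < 0)
                return -1;
        int n = snprintf(buf, size, "[Match]\nMACAddress=%s\n", mac);
        return n < 0 || (size_t) n >= size ? -1 : 0;
}
```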

ether-addr-util: introduce hw_addr_is_valid()

Take this from udev, and adapt it to make it re-usable elsewhere.

core: make check-pointer-deref clean

Add the needed assert changes to make the code clean
for the new check-pointer-deref script.

journald: add assert for allocated buffer size

Coverity flags allocated - 1 as a potential underflow when
allocated is 0. After GREEDY_REALLOC succeeds the buffer is
guaranteed non-empty, but Coverity cannot trace through the
conditional. Add an assert to document this.

CID#1548053

Follow-up for ec20fe5ffb8a00469bab209fff6c069bb93c6db2

test-json: avoid divide-by-zero coverity warning for index 9

Same fix as d0a066a1a4a391f629f7f52b5005103f8daf411f did for
index 10: add iszero_safe() check before dividing by the
json variant real value.

CID#1587762

Follow-up for d0a066a1a4a391f629f7f52b5005103f8daf411f

network: make check-pointer-deref clean

Add the needed assert changes to make the code clean
for the new check-pointer-deref script.

journal: make check-pointer-deref clean

Add the needed assert changes to make the code clean
for the new check-pointer-deref script.

nspawn: make check-pointer-deref clean

Add the needed assert changes to make the code clean
for the new check-pointer-deref script.

cleanup: address review feedback from claude

Trivial ordering/modernizing change that got highlighted by
claude and refined by keszybz to move to the modern systemd
style.

Thanks to keszybz for suggesting this.

mkosi: add coccinelle to the debian tools tree too

It is already part of the Fedora/openSUSE tools trees. It must
have slipped through for Debian, so let's add it.

nspawn-oci: add asserts for UID/GID validity after dispatch

Coverity flags UINT32_MAX - data.container_id as an underflow
when container_id could be UID_INVALID (UINT32_MAX). After
successful sd_json_dispatch_uid_gid(), the values are guaranteed
valid, but Coverity cannot trace through the callback. Add
asserts to document this invariant.

CID#1548072

Follow-up for 91c4d1affdba02a323dc2c7caccabe240ccb8302

boot: clamp setup header copy size to sizeof(SetupHeader)

The setup_size field from the kernel image header is used as part
of the memcpy size. Clamp it to sizeof(SetupHeader) to ensure the
copy does not read beyond the struct bounds even if the kernel
image header contains an unexpected value.

CID#1549197

Follow-up for d62c1777568ff69034fd5b5d582a2889229f7e20

creds-util: add assert for output buffer size overflow safety

Coverity flags the multi-term output.iov_len accumulation as a
potential overflow. Add an assert after the calculation to verify
the result is at least as large as the input, catching wraparound.

CID#1548068

Follow-up for 21bc0b6fa1de44b520353b935bf14160f9f70591

calendarspec: use ADD_SAFE for repeat offset calculation

Use overflow-safe ADD_SAFE() instead of raw addition when
computing the next matching calendar component with repeat.
On overflow, skip the component instead of using a bogus value.
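The overflow-safe repeat step can be sketched as follows. `__builtin_add_overflow()` (GCC/Clang) is swapped in for systemd's `ADD_SAFE()`, and `next_repeat_match()` is a hypothetical helper, not the calendarspec code: it returns false on overflow so the caller can skip the component.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Overflow-checked addition: false if a + b does not fit in an int. */
static bool add_safe_int(int *ret, int a, int b) {
        return !__builtin_add_overflow(a, b, ret);
}

/* Smallest start + k*repeat (k >= 0) that is >= target. Returns false instead
 * of producing a bogus wrapped value, so the caller can skip the component. */
static bool next_repeat_match(int start, int repeat, int target, int *ret) {
        int c = start;

        while (c < target)
                if (repeat <= 0 || !add_safe_int(&c, c, repeat))
                        return false;

        *ret = c;
        return true;
}
```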

CID#1548052

Follow-up for a2eb5ea79c53620cfcf616e83bfac0c431247f86