firstboot,sysinstall,hostnamed: always show FANCY_NAME=
This makes sure that whenever we want to show the OS name we can show
the fancy name. Thus this moves the escaping/validation of the fancy
name out of hostnamed into generic code, and then makes use of it in
sysinstall,firstboot,prompt-util.
Daan De Meyer [Mon, 11 May 2026 19:58:24 +0000 (21:58 +0200)]
mkosi: Drop CPUs= limit
Limiting VMs to 2 cpus was cargo culting without any
actual data that this benefits performance. The host OS
has a scheduler, let's make use of it and give the VM access
to all the CPUs. This doesn't mean they become inaccessible to
the host, it just means the VM gets as many virtual CPUs as the
host has CPU cores (threads). How they get scheduled is still up
to the host OS.
units: pull in basic.target rather than sysinit.target from system-install.target
Many of our services are nowadays implemented via socket activation, and
hence require sockets.target to be active to be accessible. One of them
is mute-console.socket, which we typically want to use from
systemd-firstboot.service, systemd-sysinstall.service and other related
services. Hence let's pull in basic.target rather than sysinit.target
from system-install.target since it pulls sockets.target in too.
Effectively, this doesn't change much except for pulling in a bunch more
sockets, and frankly going for sysinit.target was really a bug to begin
with.
Daan De Meyer [Mon, 11 May 2026 13:03:49 +0000 (13:03 +0000)]
boot,vconsole: Propagate UEFI HII keyboard layout to the OS
UEFI firmware can report the currently-active keyboard layout via
EFI_HII_DATABASE_PROTOCOL.GetKeyboardLayout(). The layout descriptor
includes an RFC 4646 / BCP 47 language tag (e.g. "en-US"). Query this
from sd-boot/sd-stub and write it to a new LoaderKeyboardLayout EFI
variable, advertised through a new EFI_LOADER_FEATURE_KEYBOARD_LAYOUT
feature bit.
On the OS side, systemd-vconsole-setup reads the variable as a
lowest-priority fallback for the console keymap. To map the BCP 47
tag to a vconsole keymap we extend /usr/share/systemd/kbd-model-map
with an optional sixth column listing the comma-separated BCP 47 tags
each row covers; a new find_vconsole_keymap_for_bcp47() helper walks
the file, preferring an exact tag match and otherwise falling back to
the row whose tag matches the input's primary subtag. Credentials,
/etc/vconsole.conf, and vconsole.keymap= on the kernel command line
continue to take precedence.
bootctl status surfaces the new variable, printing the language tag
or "n/a (not reported by firmware)" when sd-boot advertises the
feature but the firmware HII database didn't expose a layout (common
on QEMU without a USB keyboard, since EDK2's PS/2 driver does not
register an HII keyboard layout).
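The lookup order described above (exact tag match first, then primary-subtag fallback) can be sketched in Python roughly as follows. This is an illustration only, not the C helper: the function name, the row layout and the sample rows are hypothetical, and "matches the input's primary subtag" is read here as "a row tag shares the same primary subtag", which is one possible interpretation.

```python
from typing import Iterable, List, Optional, Tuple


def find_keymap_for_bcp47(rows: Iterable[Tuple[str, List[str]]], tag: str) -> Optional[str]:
    """Return the keymap for `tag`: exact match first, else primary-subtag match."""
    wanted = tag.lower()
    primary = wanted.split("-", 1)[0]
    fallback = None
    for keymap, tags in rows:
        lowered = [t.lower() for t in tags]
        if wanted in lowered:
            return keymap  # an exact tag match wins immediately
        if fallback is None and any(t.split("-", 1)[0] == primary for t in lowered):
            fallback = keymap  # remember the first row sharing the primary subtag
    return fallback


# Hypothetical rows: (vconsole keymap, BCP 47 tags from the optional sixth column)
ROWS = [
    ("us", ["en-US", "en"]),
    ("uk", ["en-GB"]),
    ("de", ["de-DE", "de"]),
]
```

With these sample rows, "en-GB" resolves through the exact match, while a tag such as "en-AU" falls back to the first row whose tags share the "en" primary subtag.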
Daan De Meyer [Mon, 11 May 2026 13:00:19 +0000 (15:00 +0200)]
vmspawn: Attach a USB keyboard in GUI mode
EDK2's UsbKbDxe is the only driver that registers a default HII
keyboard layout via the HII database protocol; the PS/2 driver does
not. Adding a USB xHCI controller and usb-kbd in CONSOLE_GUI mode
gives us a layout to query, which systemd-boot exports through the
LoaderKeyboardLayout EFI variable — useful for exercising that
codepath end-to-end.
Michael Vogt [Fri, 8 May 2026 14:37:52 +0000 (16:37 +0200)]
units: enable systemd-report-basic.socket by default
In https://github.com/systemd/systemd/pull/41688 we merged metrics
and facts for systemd-report. However while some metric sources
are enabled by default (like `io.systemd.{Manager,Network}`) the
`io.systemd.Basic` service is not enabled by default.
This commit changes this and enables it by default.
We could also enable the systemd-report-cgroup.socket, but that sends
a lot more data; not sure that is a good default.
repart: make definitions varlink parameter actually optional
The Varlink interface said the definitions directory was mandatory, and
so did the dispatch table. But that's nonsense, the code is completely
fine to operate without (same as cmdline repart invocations): it will
just use the standard definitions dir.
Luca Boccassi [Mon, 11 May 2026 11:58:13 +0000 (12:58 +0100)]
TEST-67-INTEGRITY: pre-load crypto modules and skip unsupported algorithms
The test occasionally fails on GHA CI when formatting with xxhash64
because dm-integrity's crypto_alloc_shash() -> request_module() path
flakily fails to load the algorithm:
[ 29.172664] TEST-67-INTEGRITY.sh[447]: + for a in crc32c crc32 xxhash64 sha1 sha256
[ 29.172664] TEST-67-INTEGRITY.sh[447]: + [[ xxhash64 == crc32 ]]
[ 29.172664] TEST-67-INTEGRITY.sh[447]: + test_one xxhash64 0
[ 29.172664] TEST-67-INTEGRITY.sh[447]: + integritysetup format /dev/loop0 --batch-mode -I xxhash64 ''
[ 29.223383] TEST-67-INTEGRITY.sh[1220]: device-mapper: reload ioctl on temporary-cryptsetup-fa8bebe3-1d87-4796-91e8-abc02c487bb5 (254:0) failed: No such file or directory
[ 29.226916] kernel: device-mapper: table: 254:0: integrity: Invalid internal hash (-ENOENT)
[ 29.227415] kernel: device-mapper: ioctl: error adding target to table
[ 29.231586] TEST-67-INTEGRITY.sh[1220]: Cannot format integrity for device /dev/loop0.
Preload each algorithm's crypto module before use, and skip algorithms
that are not registered in /proc/crypto.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
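The "skip algorithms that are not registered in /proc/crypto" step can be sketched as below. The real test is a shell script; this Python version only illustrates the filtering idea. The function names and the sample text are hypothetical, and module preloading is not shown.

```python
def registered_algorithms(proc_crypto_text):
    """Collect the values of all 'name : <algorithm>' lines in /proc/crypto-style text."""
    names = set()
    for line in proc_crypto_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            names.add(value.strip())
    return names


def usable_algorithms(candidates, proc_crypto_text):
    """Keep only the candidates the kernel reports as registered, preserving order."""
    available = registered_algorithms(proc_crypto_text)
    return [a for a in candidates if a in available]


# Hypothetical excerpt in the /proc/crypto layout (blank-line separated records)
SAMPLE = """\
name         : crc32c
driver       : crc32c-generic
module       : kernel
priority     : 100

name         : sha256
driver       : sha256-generic
module       : kernel
priority     : 100
"""
```

On a real system the text would come from reading /proc/crypto after the modules have been loaded.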
Daan De Meyer [Sun, 10 May 2026 19:24:26 +0000 (21:24 +0200)]
clang-tidy: Drop unknown gcc compiler args
clang-tidy recently gained support to allow dropping
compiler args from the entries parsed from the compilation
database. Let's make use of this to drop the two compiler
args we use with gcc that clang doesn't support so we can
run clang-tidy on meson build trees configured to use gcc
without getting tons of false positives.
favilances [Sat, 9 May 2026 18:52:04 +0000 (21:52 +0300)]
test-path-util: add coverage for path edge cases
Path utility helpers are used throughout systemd for validation, comparison and manipulation of filesystem paths. Add coverage for additional corner cases around absolute path detection, normalization and prefix matching so regressions in these common helpers are easier to catch.
Luca Boccassi [Fri, 8 May 2026 19:25:56 +0000 (20:25 +0100)]
test: bump TEST-58-REPART timeouts with sanitizers
The test is flaky under sanitizers as the timeouts seem to be too short,
bump them like we do in other tests to try and make it more robust when
running with sanitizers.
Luca Boccassi [Fri, 8 May 2026 15:16:04 +0000 (16:16 +0100)]
test: fix flaky TEST-07-PID1.socket-defer.sh
The socket's SubState transitions from 'running' to 'listening' shortly
after the triggered service becomes inactive, so the assert can race and
observe the stale 'running' state:
Luca Boccassi [Fri, 8 May 2026 14:09:25 +0000 (15:09 +0100)]
test: work around flaky TEST-53-TIMER.restart-trigger against journald cgroup attribution race
The restart-trigger subtest occasionally fails on CI with:
+ assert_eq 0 1
FAIL: expected: '1' actual: '0'
even though the timer fires correctly and the echo message is in fact
written to the journal. The failure happens because the test relies on
`journalctl --unit=$UNIT_NAME` to find the message, and that filter is
based on the cgroup journald looks up for the writer PID at the time
the stdout message is received.
For very short-lived processes spawned via systemd-executor (like
`echo`), that lookup is racy: the writer's `/proc/$PID/cgroup` can
still resolve to `/init.scope` (systemd-executor's own cgroup) rather
than the service's cgroup, so the message ends up attributed to
`init.scope` and `--unit=` filtering misses it.
Note _SYSTEMD_UNIT=init.scope / _SYSTEMD_CGROUP=/init.scope on the
echo output: this is what causes `--unit=timer-restart-14362` to
return 0 hits. The test failure logs from the same run confirm this:
+ JOURNAL_TS=1778160292
+ journalctl -p info --since=@1778160292 --unit=timer-restart-14362 '--grep=Hello from timer 29581'
-- No entries --
+ systemctl restart timer-restart-14362.timer
...
+ date '--set=+2 hours'
Thu May 7 15:24:52 UTC 2026
+ sleep 1
...
echo[816]: Hello from timer 29581
...
++ journalctl -q -p info --since=@1778160292 --unit=timer-restart-14362 '--grep=Hello from timer 29581'
++ wc -l
+ assert_eq 0 1
FAIL: expected: '1' actual: '0'
For comparison, in a passing local run the same message is attributed
correctly to the service unit (_SYSTEMD_UNIT=timer-restart-24147.service),
so `--unit=` matches.
Work around the underlying journald race in the test by setting an
explicit `SyslogIdentifier=` on the service and matching with `-t` plus
the unique grep pattern: `SyslogIdentifier` is carried over the stdout
stream protocol and is not affected by the cgroup lookup race.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
The ITE keyboard controller firmware (version 0xAB83) is shared
between the Clevo PA70ES and the X+ piccolo series.
The piccolo's hwdb rule matches by input device ID
(evdev:input:b0011v0001p0001eAB83*) and remaps scan code 0x9c
(KP_Enter) to Enter, since the piccolo has no numpad and its
main Enter key sends the wrong scan code.
The Clevo PA70ES has a real numpad. The piccolo rule matches it
because both laptops use the same ITE controller firmware, which
breaks KP_Enter on the PA70ES.
Add a DMI-specific override that restores KEY_KPENTER for 0x9c
on the PA70ES.
The piccolo rule should ideally be narrowed to use DMI matching
instead of input device ID to avoid catching other laptops with
the same ITE controller firmware.
Daan De Meyer [Fri, 8 May 2026 19:28:36 +0000 (21:28 +0200)]
mkosi: drop libucontext again
Turns out it's possible to implement fibers without unnecessary
system calls and without ucontext.h so there's no need for libucontext
anymore, so drop it from the package list.
Ivan Kruglov [Thu, 7 May 2026 09:16:51 +0000 (02:16 -0700)]
test: add missing varlink IDL enum tests for Job and ServiceType
PR #41583 (io.systemd.Unit.StartTransient) introduced several new varlink IDL enum types without corresponding enum consistency tests:
- JobType, JobState, JobResult in the new io.systemd.Job interface
- ServiceType in the Unit interface's ServiceContext
Add a new test-varlink-idl-job test file covering all three Job enums, and add ServiceType coverage to the existing test-varlink-idl-unit test. Export vl_type_ServiceType (was static) so it can be referenced from the test.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
userdbctl: actually implement option parsing stop after --chain
The basic idea is that --chain should stop option parsing. But
previously this didn't work, so --chain could be specified anywhere
in the command line. To maintain compatibility with that,
allow --chain to be specified anywhere until the first positional
arg or option in the command string. This allows options to be passed
in the expected fashion:
userdbctl --chain ssh-authorized-keys user cmd --opt1 --opt2
userdbctl --chain ssh-authorized-keys user -- cmd --opt1 --opt2
but also allows the invocations which worked previously:
userdbctl ssh-authorized-keys user --chain cmd
userdbctl ssh-authorized-keys user cmd --chain
Daan De Meyer [Fri, 1 May 2026 09:08:35 +0000 (09:08 +0000)]
curl-util: bring CurlGlue/CurlSlot in line with sd-bus and qmp-client
Refactor curl-util to use the same per-request, refcounted, cancellable
slot model as sd-bus, sd-varlink and qmp-client.
CurlGlue becomes opaque and refcounted, and dispatches per-slot
completion callbacks through CURLOPT_PRIVATE instead of a single
g->on_finished demux that every caller had to switch on. The new
curl_glue_perform_async(g, easy, cb, userdata, &slot) replaces
curl_glue_add + the on_finished/userdata wiring.
CurlSlot is the per-request handle: it owns the easy handle,
curl_slot_unref does curl_multi_remove_handle + curl_easy_cleanup
(which doubles as cancel since remove aborts in-flight transfers
without queuing CURLMSG_DONE), and floating slots (ret_slot=NULL) are
kept alive in the glue's slot set until the callback fires. Drop the
userdata parameter from curl_glue_make: CURLOPT_PRIVATE is now used
internally to route completions to the slot.
Migrate pull-job and the pull-{oci,raw,tar} drivers, and imdsd, to the
new shape. PullJob.curl becomes PullJob.slot; pull_job_curl_on_finished
becomes a per-slot callback. imdsd routes its token-vs-data branch off
slot identity rather than easy-handle pointer comparison. Both daemons
drop the global on_finished/userdata wiring on the glue. pull_job_finish
and context_fail{,_full} now return int (always 0) so the callbacks
stay in the `return finish(...);` style.
Add test-curl-util covering glue lifecycle, easy-handle defaults,
floating and non-floating perform paths, cancel-via-slot-unref (verified
by a sentinel request that drives the loop to completion), and three
concurrent requests on a single glue. Tests fetch local files via
file:// URLs so no network is needed; libcurl availability is probed
once via dlopen_curl in intro().
The situation with --chain is complicated. The old code tried to use "+…"
in getopt_long() to stop option parsing. But it didn't actually work.
This logic was originally added in 8072a7e6a9eaf2de120797dd16c5e0baea606219. ef9c12b157a50d63e8a8eb710c013d16c2cea319 added a comment about 'optind=0'
which explains why the code doesn't work, but the code wasn't changed.
To wit:
$ userdbctl.old --no-pager --chain ssh-authorized-keys zbyszek -- /bin/echo --asdf
--asdf
$ userdbctl.old --no-pager --chain ssh-authorized-keys zbyszek /bin/echo -- --asdf
--asdf
$ userdbctl.old --no-pager --chain ssh-authorized-keys zbyszek /bin/echo --asdf
userdbctl.old: unrecognized option '--asdf'
(Basically, if "--" is used, it can be anywhere, since getopt_long() doesn't do
anything special after --chain and looks for the next option. There were some
tests of --chain, but they all used the username as the positional argument, so
it wasn't misinterpreted as an option.)
This behaviour is preserved in the conversion.
--help is generally the same except for expected formatting changes.
--json= is moved up, between --output= and -j. For some reason it was
further down.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Luca Boccassi [Wed, 6 May 2026 18:57:19 +0000 (19:57 +0100)]
test: try to make TEST-04-JOURNAL.journalctl-varlink less flaky
The io.systemd.JournalAccess server occasionally returns NoEntries for a
unit-filter query right after the unit logged its message, e.g. from a
failing CI run:
[ 1204.967910] TEST-04-JOURNAL.sh[15025]: ++ varlinkctl call --more /run/systemd/io.systemd.JournalAccess io.systemd.JournalAccess.GetEntries '{"units": ["test-journalctl-varlink-1-13583.service", "test-journalctl-varlink-2-25039.service"]}'
[ 1205.017361] journalctl[15026]: varlink-3-3: Received message: {"method":"io.systemd.JournalAccess.GetEntries","parameters":{"units":["test-journalctl-varlink-1-13583.service","test-journalctl-varlink-2-25039.service"]},"more":true}
[ 1205.017498] journalctl[15026]: Failed to open journal file /var/log/journal/ce54feb228124e639f3b7779beeaff60/system.journal: No data available
[ 1205.017823] journalctl[15026]: varlink-3-3: Sending message: {"error":"io.systemd.JournalAccess.NoEntries"}
[ 1205.017936] TEST-04-JOURNAL.sh[15025]: Method call failed: io.systemd.JournalAccess.NoEntries
[ 1205.499083] TEST-04-JOURNAL.sh[146]: Subtest /usr/lib/systemd/tests/testdata/units/TEST-04-JOURNAL.journalctl-varlink.sh failed
Wrap the calls that expect data in a helper that retries up to 3 times on
NoEntries, syncing the journal between attempts.
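The retry helper described above might look like the following in outline. The actual helper lives in a shell test; this Python sketch only shows the control flow. `NoEntries`, `call` and `sync` are hypothetical stand-ins for the Varlink error, the varlinkctl invocation and the journal sync.

```python
class NoEntries(Exception):
    """Stand-in for the io.systemd.JournalAccess.NoEntries error."""


def call_with_retry(call, sync, attempts=3):
    """Run `call`, retrying on NoEntries and syncing between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except NoEntries:
            if attempt == attempts:
                raise  # out of attempts: let the failure surface
            sync()  # flush the journal before trying again
```

Only NoEntries triggers a retry in this sketch; any other error propagates on the first attempt.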
parse_from_file doesn't set arg_from_file itself, but returns a
sd_json_variant ref to the caller. I think the change of arg_from_file
is more readable with this structure.
Dirga Yuza [Fri, 8 May 2026 00:10:40 +0000 (07:10 +0700)]
hwdb: add force-release to Nitro AN515-58 backlight keys
This fixes an incomplete mapping introduced in PR #39769 for the Acer
Nitro 5 AN515-58.
The previous PR mapped the physical keyboard backlight keys (scancodes
`0xef` and `0xf0`) to `kbdillumup` and `kbdillumdown` to prevent them
from dropping screen brightness.
However, the embedded controller on this Acer model only emits "make"
(press) scancodes and fails to emit "break" (release) scancodes for
these specific keys. Without a release event, the input subsystem
registers the keys as continuously held down (auto-repeat). In desktop
environments like KDE Plasma, pressing the key once causes the
brightness UI slider to get stuck in an infinite adjustment loop.
This issue previously went unnoticed because this model did not expose
any keyboard backlight control.
The fix is done by prepending the `!` (force-release) flag to the
keycodes. This instructs `evdev` to synthesize a key
release event.
The fix is verified locally on an Acer Nitro AN515-58. `evtest` now
correctly reports `value 1` immediately followed by `value 0`, and KDE
Plasma brightness OSD no longer gets stuck.
sd-dhcp-client: avoid taking and dropping a reference
The helper would create a new ref, even though we had one handy
and didn't need to create a new ref. So change the helper to
take an existing reference.
tree-wide: rename unref_and_replace_full to unref_and_replace_new_ref
We have a number of *_unref_and_replace macros. One could think that
they are like the various free_and_replace variants, but they actually
create a new ref to the passed object. The free_and_replace variants
take ownership of the argument. This inconsistency is surprising. Rename
all those functions to have "_new_ref" at the end to make the difference
clear.
Use OPTION_NAMESPACE() to keep the resolvectl and systemd-resolve
option sets separate. The resolvconf-compat path (resolvconf
invocation) keeps its own getopt-based parsing.
--help output has the expected changes to formatting. The synopsis
for [status] now shows that the verb is optional.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Luca Boccassi [Wed, 6 May 2026 17:04:51 +0000 (18:04 +0100)]
test: fix flaky testcase_15_wait_online_dns in TEST-75-RESOLVED
The test used `timeout 30 bash -c "journalctl -b -u $unit -f | grep -m1 ..."`
to wait for systemd-networkd-wait-online to log that no DNS server is
accessible. The expected message is actually emitted ~1s after the unit
starts, but `grep -m1` exiting doesn't tear down `journalctl -f`: journalctl
only notices the closed pipe on its next write, which may never happen for
an otherwise idle unit. The pipeline therefore hangs until the 30s timeout
fires, eventually causing the test to fail.
Replace the follow+pipe with a polling `journalctl --grep` loop, which
exits cleanly as soon as the message lands in the journal.
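The difference from the follow+pipe approach is that each poll is a short-lived query that exits on its own, so nothing is left blocked on a pipe. A minimal Python sketch of such a polling loop, with `query` as a hypothetical stand-in for one `journalctl --grep` invocation (the real test is shell):

```python
import time


def wait_for_message(query, timeout=30.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `query` until it reports a match or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if query():
            return True  # message found: stop immediately
        if clock() >= deadline:
            return False  # gave up after the deadline
        sleep(interval)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.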
Logs from the failing run:
[ 2650.871441] systemd-networkd-wait-online[2190]: dns0: No DNS configuration yet
[ 2651.723180] systemd-networkd-wait-online[2190]: dns0: No DNS server is accessible.
[ 2680.909048] systemd-networkd-wait-online[2190]: json-stream: Got POLLHUP from socket.
[ 2680.909092] systemd-networkd-wait-online[2190]: DNS configuration monitor disconnected, reconnecting...
[ 2680.914368] systemd-networkd-wait-online[2190]: Failed to connect to io.systemd.Resolve.Monitor: Connection refused
[ 2681.966674] systemd-networkd-wait-online[2190]: dns0: No DNS server is accessible.
[ 2681.969527] systemd-networkd-wait-online[2190]: Failed to connect to io.systemd.Resolve.Monitor: Connection refused
[ 2682.077032] systemd[1]: Stopping wait-online-dns-0f9e4f6d-8b34-4cff-b2da-03612ca731e8.service - [systemd-run] /usr/lib/systemd/systemd-networkd-wait-online --timeout=0 --dns --interface=dns0...
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Previously, we'd show a partial synopsis for systemd-mount
in --help for systemd-umount. I don't think it makes sense to do that.
So now the --help for systemd-umount is separate, with just its syntax
and a new blurb.
"transiently" is dropped from the description. Mount points generally
are transient, so no need to say that. (E.g. the man page for mount just
says "attach" and "detach".)
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
The order of options is changed (to what was present in parse_argv).
I think the order in --help was mostly random, as is the new one,
so I didn't try to preserve the old order. Some help strings are
reworded/adjusted.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Paul Meyer [Wed, 6 May 2026 15:35:48 +0000 (17:35 +0200)]
vmspawn: search XDG_DATA_DIRS for QEMU firmware
get_firmware_search_dirs() previously hardcoded /usr/share/qemu/firmware
as the only system-wide search path. That assumption breaks on
distributions that deliberately do not populate /usr/share, making
vmspawn fail: "Failed to find OVMF config: No such file or directory".
NixOS exposes those firmware locations through XDG_DATA_DIRS.
Extend the search list with XDG_DATA_HOME/XDG_DATA_DIRS. This is the
standard XDG mechanism and is already what QEMU itself uses for the same
descriptors, so behavior matches user expectations across tooling.
To avoid regressing setups where the user has set XDG_DATA_DIRS to a custom
value that omits /usr/share, keep /usr/share/qemu/firmware as an
unconditional fallback.
Precedence is unchanged: XDG_CONFIG_HOME/qemu/firmware still wins
over /etc/qemu/firmware, which still wins over any shared-data dir.
Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
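The resulting precedence can be sketched as an ordered, de-duplicated list. This is a Python illustration, not get_firmware_search_dirs() itself: the function name is hypothetical, and the defaults used for unset variables (taken from the XDG Base Directory specification) as well as the position of XDG_DATA_HOME ahead of XDG_DATA_DIRS are assumptions.

```python
import os


def firmware_search_dirs(env, home):
    """Build the ordered list of qemu/firmware descriptor directories."""
    dirs = []

    def add(path):
        if path not in dirs:  # keep the first (highest-priority) occurrence
            dirs.append(path)

    config_home = env.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    add(os.path.join(config_home, "qemu/firmware"))
    add("/etc/qemu/firmware")

    data_home = env.get("XDG_DATA_HOME") or os.path.join(home, ".local/share")
    add(os.path.join(data_home, "qemu/firmware"))
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    for d in data_dirs.split(":"):
        if d:
            add(os.path.join(d, "qemu/firmware"))

    add("/usr/share/qemu/firmware")  # unconditional fallback
    return dirs
```

A custom XDG_DATA_DIRS that omits /usr/share still ends with /usr/share/qemu/firmware, which is the regression guard the commit message describes.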
dbus: limit the number of env variables to something reasonable, vol. 3
Let's limit the number of environment variables when creating a
transient unit via StartTransientUnit as well, since validating the
environment variable names/assignments is expensive.
vmspawn: reject --bind-volume= duplicates at parse time (#41961)
bind_volume_parse() does not look at peers, so passing the same
PROVIDER:VOLUME twice on the command line silently produces two parsed
entries in arg_bind_volumes. vmspawn_bind_volume_acquire() then builds
two DriveInfo with identical d->id ("<provider>:<volume>"). At boot,
bridge_register_drive() puts d->id into the b->block_devices hashmap;
the second insert returns -EEXIST and the user sees a bare "File exists"
with no context for which volume is responsible.
Reject the collision at the parse site with a linear scan over the
existing array — n_items is small (one entry per --bind-volume on the
command line), and a clear error message naming the offending volume is
much more useful than the late EEXIST from the QMP setup loop.
vmspawn: reject --bind-volume= duplicates at parse time
bind_volume_parse() does not look at peers, so passing the same
PROVIDER:VOLUME twice on the command line silently produces two parsed
entries in arg_bind_volumes. vmspawn_bind_volume_acquire() then builds
two DriveInfo with identical d->id ("<provider>:<volume>"). At boot,
bridge_register_drive() puts d->id into the b->block_devices hashmap;
the second insert returns -EEXIST and the user sees a bare "File
exists" with no context for which volume is responsible.
Reject the collision at the parse site with a linear scan over the
existing array — n_items is small (one entry per --bind-volume on the
command line), and a clear error message naming the offending volume
is much more useful than the late EEXIST from the QMP setup loop.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
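The parse-time check amounts to a linear scan before appending. A small Python sketch of the idea follows; the function name and the error text are hypothetical, and the real code is C operating on arg_bind_volumes.

```python
def append_bind_volume(volumes, provider, volume):
    """Append '<provider>:<volume>' unless an identical entry was already parsed."""
    name = f"{provider}:{volume}"
    for existing in volumes:  # linear scan: one entry per --bind-volume= option
        if existing == name:
            raise ValueError(f"Duplicate --bind-volume= entry: {name}")
    volumes.append(name)
    return name
```

Raising here names the offending volume directly, instead of relying on the later -EEXIST from the hashmap insert.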
The order in --help is changed to have 'list', 'inspect' (query
operations), 'attach'/'detach'/'reattach' (main ops), and then the
other more specialized verbs.
PR #41776 introduced the io.systemd.StorageProvider Varlink interface and
two backends ('block' exposes host block devices, 'fs' exposes regular
files / dirs / subvolumes under /var/lib/storage), plus the storagectl(1)
CLI to enumerate them. The only consumer so far was mount.storage. This
series wires up the first of the three integrations called out in
TODO.md:
systemd-vmspawn --bind-volume=PROVIDER:VOLUME[:CONFIG][:K=V,...]
Boot-time attach. Drives added this way are immutable at runtime.
io.systemd.MachineInstance.AddStorage / .RemoveStorage
Two new generic methods on the per-machine control socket. vmspawn
implements them (this series); systemd-nspawn will reuse the same
methods later.
machinectl bind-volume MACHINE PROVIDER:VOLUME[:CONFIG][:K=V,...]
machinectl unbind-volume MACHINE PROVIDER:VOLUME
Runtime hotplug front-end: machinectl Acquire()s the fd locally and
pushes it across to the target machine's MachineInstance socket.
Volumes are identified by a user-visible name "<provider>:<volume>" (e.g.
"block:/dev/sda"). The 3rd 'config' field is opaque to the shared layer
and interpreted per backend: vmspawn maps it to a DiskType from
disk_type_table[] (virtio-blk default, virtio-scsi, nvme, scsi-cd; same
vocabulary as --extra-drive); future nspawn will read it as a mount path.
- Document the new --bind-volume= option in systemd-vmspawn(1) and
the new bind-volume / unbind-volume verbs in machinectl(1).
- Add an integration test
(TEST-87-AUX-UTILS-VM.bind-volume.sh) covering boot-time attach
via --bind-volume, runtime attach via 'machinectl bind-volume',
runtime detach via 'machinectl unbind-volume', the StorageImmutable
rejection of attempts to detach boot-time volumes, and the
NoSuchStorage rejection of detach on unknown names.
- Strike "hook-up in systemd-vmspawn" from TODO.md; the nspawn and
service-manager hookups remain.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
For bind-volume, machinectl parses the SPEC with the shared
bind_volume_parse(), Acquires the storage volume from the named
provider on the machinectl side, locates the target machine's
io.systemd.MachineInstance control socket via
machine_get_control_address(), pushes the fd across, and calls
io.systemd.MachineInstance.AddStorage with name='<provider>:<volume>'
and the user-supplied config string.
For unbind-volume, machinectl just forwards the name string to
io.systemd.MachineInstance.RemoveStorage.
Volumes attached at machine startup (e.g. via systemd-vmspawn's
--bind-volume=) are rejected with StorageImmutable when the user
attempts to unbind them at runtime.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Wire up the runtime hotplug Varlink methods on the per-VM control
socket:
AddStorage → take fd from the link, look up the DiskType from the
'config' field, build a DriveInfo flagged
QMP_DRIVE_REMOVABLE, dispatch to
vmspawn_qmp_add_block_device(). Reply delivered async
by on_add_device_add_complete() once the guest sees
the device.
RemoveStorage → forward the user-visible name to
vmspawn_qmp_remove_block_device(); the existing
device_del / DEVICE_DELETED / blockdev-del chain
replies on the link.
Add SD_VARLINK_SERVER_ALLOW_FD_PASSING_INPUT to the server flags so
clients can push storage fds across via sd_varlink_push_fd().
Maps -EEXIST → StorageExists and -EOPNOTSUPP/-EINVAL →
ConfigNotSupported in the AddStorage handler so callers see the
specific MachineInstance errors.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
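The errno-to-error mapping in the AddStorage handler can be pictured as a small lookup table. The Python sketch below is illustrative only: the function name is hypothetical, and returning None for unmapped codes is an assumption about how everything else would be handled.

```python
import errno

# Negative errno values returned by the attach path, mapped to Varlink error ids
ADD_STORAGE_ERRORS = {
    errno.EEXIST: "io.systemd.MachineInstance.StorageExists",
    errno.EOPNOTSUPP: "io.systemd.MachineInstance.ConfigNotSupported",
    errno.EINVAL: "io.systemd.MachineInstance.ConfigNotSupported",
}


def add_storage_error_for(r):
    """Map a negative errno to a specific error id, or None if there is none."""
    return ADD_STORAGE_ERRORS.get(-r)
```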
For each --bind-volume passed at startup, vmspawn calls Acquire() on
the named StorageProvider and attaches the resulting fd to the VM as
an additional drive. The drive is identified by the user-visible name
'<provider>:<volume>' on the bridge — that is also the handle used
later when machinectl unbind-volume detaches drives at runtime
(though boot-time drives like these are NOT removable; that is the
StorageImmutable behaviour added earlier).
The colon grammar is parsed by the shared bind_volume_parse() helper.
The 3rd 'config' field selects the guest device type from the
disk_type_table[] vocabulary (virtio-blk, virtio-scsi, nvme, scsi-cd);
empty defaults to virtio-blk per the TASK grammar.
Wiring lives next to the existing --extra-drive setup: parse_argv()
appends a parsed BindVolume to arg_bind_volumes, and prepare_device_info()
hands the array to vmspawn_bind_volume_prepare_boot() which Acquires
each volume and pushes a DriveInfo onto the existing drives array.
PCIe port assignment (assign_pcie_ports()) and the QMP setup loop pick
them up automatically.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
This is vmspawn's per-backend code for the StorageProvider integration.
Other backends (future systemd-nspawn, future service-manager
BindVolume=) consume the same shared parser and Acquire helper but
each provides its own attach/detach glue; this is vmspawn's.
- disk_type_from_bind_volume_config() turns the opaque BindVolume
'config' field (e.g. "scsi-cd") into a DiskType. Empty defaults to
virtio-blk to match the --bind-volume CLI grammar.
- vmspawn_bind_volume_acquire() takes a parsed BindVolume, calls
storage_acquire_volume() for the fd, and builds a DriveInfo ready
for vmspawn_qmp_setup_drives() (boot) or vmspawn_qmp_add_block_device()
(hotplug). Rejects directory-typed volumes (vmspawn block devices
need a regular file or a host block device).
- vmspawn_bind_volume_attach_fd() is the runtime path: takes a fd
that was already pushed across by an AddStorage caller plus the
name+config it specified, builds the DriveInfo with
QMP_DRIVE_REMOVABLE set and a varlink link, and dispatches to
vmspawn_qmp_add_block_device(). Reply is delivered asynchronously
by the existing on_add_device_add_complete() callback.
- vmspawn_bind_volume_prepare_boot() is a thin loop the boot-time
path uses to populate DriveInfos.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
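As a rough Python picture of what disk_type_from_bind_volume_config() is described as doing (empty config defaults to virtio-blk, otherwise the name must come from the disk type vocabulary). The function name and the error handling here are hypothetical.

```python
DISK_TYPES = ("virtio-blk", "virtio-scsi", "nvme", "scsi-cd")


def disk_type_from_config(config):
    """Translate the opaque 'config' field into a disk type name."""
    if not config:
        return "virtio-blk"  # empty config falls back to the default type
    if config not in DISK_TYPES:
        raise ValueError(f"Unsupported disk type: {config}")
    return config
```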
vmspawn: track removability as a QmpDriveFlags bit and expose add_block_device
Drives attached at boot via the existing CLI options (--image,
--extra-drive) must not be detachable at runtime via the upcoming
RemoveStorage Varlink method, while drives added at runtime via
AddStorage must be. Track this distinction with a new QMP_DRIVE_REMOVABLE
property flag — placed alongside QMP_DRIVE_BLOCK_DEVICE, not in the
transient BlockDeviceStateFlags state-machine, since "may be removed"
is a permanent property of the drive.
vmspawn_qmp_remove_block_device() now early-rejects unknown ids with
io.systemd.MachineInstance.NoSuchStorage and immutable drives with
io.systemd.MachineInstance.StorageImmutable.
vmspawn_qmp_add_block_device() loses its 'static' qualifier and gets a
declaration in the header, so the runtime hotplug path
(vmspawn-bind-volume.c, next) can dispatch into it directly.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
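The early-reject logic can be sketched with a flag enum: removability is a permanent property bit on the drive, checked before any removal is attempted. This is a Python illustration, not the C code; the class and function names and the sample drive table are hypothetical.

```python
import enum


class DriveFlags(enum.Flag):
    BLOCK_DEVICE = 1
    REMOVABLE = 2  # set only for drives added at runtime


def check_remove(drives, name):
    """Return the error id that rejects a removal, or None if it may proceed."""
    flags = drives.get(name)
    if flags is None:
        return "io.systemd.MachineInstance.NoSuchStorage"
    if not (flags & DriveFlags.REMOVABLE):
        return "io.systemd.MachineInstance.StorageImmutable"
    return None


# Hypothetical drive table: a boot-time drive and a hotplugged one
DRIVES = {
    "block:/dev/sda": DriveFlags.BLOCK_DEVICE,
    "fs:data": DriveFlags.REMOVABLE,
}
```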
shared: add AddStorage / RemoveStorage to io.systemd.MachineInstance
Define two new methods on the generic 'MachineInstance' Varlink
interface that systemd-vmspawn (this series) and (future)
systemd-nspawn implement on their per-machine control sockets:
AddStorage(fileDescriptorIndex, name, config?) -> ()
Attach a storage volume — the caller passes an fd previously
acquired from a StorageProvider, plus a unique name of the form
'<provider>:<volume>' that identifies this binding for later
removal, plus a backend-specific 'config' field (vmspawn: guest
device type; future nspawn: mount path).
RemoveStorage(name) -> ()
Detach a previously-added storage volume.
Plus errors NoSuchStorage, StorageExists, StorageImmutable (the volume
was attached at boot and cannot be removed), BadConfig, and
ConfigNotSupported. Names follow the io.systemd.StorageProvider
vocabulary (NoSuchVolume, BadTemplate, TypeNotSupported, etc.) so the
two interfaces are visually consistent.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
storagectl: refactor mount.storage helper to use storage_acquire_volume()
Drop the inline socket-build + sd_varlink_callbo() + reply-dispatch
+ take_fd block from run_as_mount_helper() in favour of the shared
helper. Preserves the type-fallback retry (TypeNotSupported / WrongType
re-tries with requestAs="blk") and the per-error-id message mapping;
the helper just reports the io.systemd.StorageProvider.* error name
back to the caller.
Net effect: ~50 lines of dedup, no functional change.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
storagectl's mount.storage helper bundles "open StorageProvider socket
+ Acquire() + dispatch reply + take fd" inline. Future consumers
(systemd-vmspawn boot-time --bind-volume, machinectl bind-volume) need
the same dance.
Factor it into a single libshared helper that takes the Acquire()
parameters by value and returns the fd plus the actual type/read-only
flags. Library code, so no logging — varlink errors are surfaced via
sd_varlink_error_to_errno() and the StorageProvider error_id is
returned to the caller via reterr_error_id (caller decides how to
format messages).
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Add a universal parser for the colon-separated grammar
'PROVIDER:VOLUME[:CONFIG][:K=V,K=V,…]' that backs --bind-volume on
systemd-vmspawn (next), machinectl bind-volume, and the future nspawn
+ service-manager BindVolume= integrations.
The 'config' field is opaque to shared code and interpreted per
backend (vmspawn: a DiskType name, future nspawn: a mount path). The
trailing key=value list is parsed into the io.systemd.StorageProvider
.Acquire() parameters (template, create, read-only/ro, size/create-size
and request-as), with values validated against the existing
storage-util enums and validators. Provider/volume names are checked
with storage_provider_name_is_valid() and storage_volume_name_is_valid();
the combined "<provider>:<volume>" string is also validated as
string_is_safe so it is safe to use as a QEMU device id.
Add a test-machine-util unit test covering the happy paths plus a
handful of malformed inputs.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
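A simplified Python sketch of the 'PROVIDER:VOLUME[:CONFIG][:K=V,K=V,…]' grammar follows. It is not the shared C parser: the names are hypothetical, value validation and the name validity checks are left out, and treating an empty third field as "no config" is an assumption.

```python
from dataclasses import dataclass, field

KNOWN_KEYS = {"template", "create", "read-only", "ro", "size", "create-size", "request-as"}


@dataclass
class BindVolume:
    provider: str
    volume: str
    config: str = ""
    options: dict = field(default_factory=dict)


def parse_bind_volume(spec):
    """Split a spec into provider, volume, optional config and key=value options."""
    parts = spec.split(":", 3)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Expected PROVIDER:VOLUME, got: {spec!r}")
    config = parts[2] if len(parts) > 2 else ""
    options = {}
    if len(parts) > 3 and parts[3]:
        for item in parts[3].split(","):
            key, sep, value = item.partition("=")
            if not sep or key not in KNOWN_KEYS:
                raise ValueError(f"Bad option: {item!r}")
            options[key] = value  # values are kept as raw strings in this sketch
    return BindVolume(parts[0], parts[1], config, options)
```

For example, "fs:data:nvme:ro=yes,size=1G" would yield provider "fs", volume "data", config "nvme" and two options.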
The storage backend providers (block, fs) and storagectl currently each
extract storage-util.c into their target. Several upcoming consumers
(machine-util's BindVolume parser, vmspawn's hotplug glue, machinectl's
new bind-volume verbs) need the StorageProvider type/string-table
helpers and a future shared Acquire client helper.
Move storage-util.{c,h} to src/shared so libshared exports the symbols
once and every consumer (storage providers, storagectl, libshared
itself) picks them up by linking libshared. Drop the now-redundant
'extract'/'objects' wiring in src/storage/meson.build.
No code changes; this is purely a relocation.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
The mount.storage helper open-codes the conventional 64K UID/GID
delegation block size as 0x10000 / 0x10000U in four places. Several
other places in the tree do the same (nspawn's arg_uid_range default,
homed's mount setup, …), but with no shared name.
Add USERNS_RANGE_SIZE in user-util.h alongside UID_NOBODY and friends,
and switch storagectl over to it. Other call sites can adopt it
incrementally.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
test-homectl-prompts: add manual test to exercise prompt functionality
The prompt for groups is nice. The prompt for a shell could use some
love. Looking at this is much easier if we can invoke the code
in isolation.
I wrote this when looking at https://github.com/systemd/systemd/pull/41947,
where I wanted to see how the homectl prompt works with the changes.