json-stream: hide JsonStreamQueueItem as an implementation detail
The json-stream API previously exposed JsonStreamQueueItem and several
functions operating on it (json_stream_make_queue_item(),
json_stream_enqueue_item(), json_stream_queue_item_free(),
json_stream_queue_item_get_data()). These existed solely to support
sd-varlink's "defer-and-modify" pattern for streaming replies, where a
reply is held back so its "continues" field can be set before
transmission. This is a varlink protocol concern that should not leak
into the generic transport layer.
Similarly, the fd pushing API (json_stream_push_fd(),
json_stream_reset_pushed_fds()) and the pushed_fds state lived inside
JsonStream, even though fd-to-message association is a protocol-level
concern managed entirely by sd-varlink.
Rework the API so that:
- JsonStreamQueueItem and all its functions become static to
json-stream.c. The only output API is now json_stream_enqueue_full()
(accepting explicit fds) and the inline json_stream_enqueue() wrapper
for the common no-fds case.
- The pushed_fds state moves from JsonStream into sd_varlink, where
sd_varlink_push_fd() and sd_varlink_reset_fds() manage it directly.
- The deferred reply in sd-varlink changes from a JsonStreamQueueItem*
to a plain sd_json_variant* plus a separate previous_fds/n_previous_fds
pair, keeping the protocol-specific bookkeeping in sd-varlink where it
belongs.
- A new varlink_enqueue() helper wraps json_stream_enqueue_full() with
the varlink connection's pushed fds, transferring fd ownership to the
queue item on success.
This reworks the ESP/XBOOTLDR logic to pin the ESP/XBOOTLDR via an fd,
and return that as an optional return parameter.
So far we only pinned the parent dir of the ESP/XBOOTLDR, which was
useful when verifying that ESP/XBOOTLDR is actually a mount point by
comparing mount ids. This however became obsolete with a98a6eb95cc980edab4b0f9c59e6573edc7ffe0c. Hence, let's clean this up,
and pin the inode we really care about and return it.
chase: tighten flags checks in chase_and_unlinkat()
Some flags don't reasonably apply to chase_and_unlinkat() (because we
open the parent inode of an inode to delete, which is always a dir),
hence let's catch these flags when misused.
(I ran into this, and it was very confusing to debug, hence let's make
it easier)
Newer tar started using openat2() via open_subdir() to address
CVE-2025-45582 [0]. Now, gnulib, which tar uses, provides the openat2()
syscall in two ways [1]:
1) If glibc doesn't provide openat2(), gnulib provides its own version in
openat2.c, which tries to call the openat2() syscall first, and if that
returns ENOSYS, it emulates the function in userspace.
2) If glibc provides openat2(), it uses that directly, without providing
any fallback on ENOSYS.
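The difference between the two variants can be illustrated with a minimal sketch. This is not gnulib's code: demo_open_fn, demo_open_enosys() and demo_open_with_fallback() are hypothetical names, the primary call is injected as a function pointer so that the ENOSYS case can be simulated, and the fallback is a plain openat() that ignores any resolve restrictions a real emulation would have to honour.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical signature standing in for "the primary openat2()-style call". */
typedef int (*demo_open_fn)(int dir_fd, const char *path, int flags);

/* Simulates a kernel or seccomp filter that refuses the call with ENOSYS. */
static int demo_open_enosys(int dir_fd, const char *path, int flags) {
        (void) dir_fd;
        (void) path;
        (void) flags;
        errno = ENOSYS;
        return -1;
}

/* Variant 1: try the primary call first, and emulate only on ENOSYS.
 * Variant 2 would simply return whatever primary() returned. */
static int demo_open_with_fallback(demo_open_fn primary, int dir_fd, const char *path, int flags) {
        int fd;

        assert(primary);
        assert(path);

        fd = primary(dir_fd, path, flags);
        if (fd >= 0 || errno != ENOSYS)
                return fd;

        /* Simplified emulation: a real one would also enforce the resolve flags. */
        return openat(dir_fd, path, flags);
}
```

Calling demo_open_with_fallback(demo_open_enosys, AT_FDCWD, "/", O_RDONLY | O_DIRECTORY) still yields a usable fd, whereas a caller without the fallback would see ENOSYS directly.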
Quite recently our test suite started calling nspawn with
--suppress-sync=yes. This means that we call seccomp_suppress_sync(),
which eventually calls block_open_flag(), that blocks the openat2()
syscall completely and refuses it with ENOSYS as this syscall can't be
sensibly filtered (see the openat2()-relevant comments in
block_open_flag() and seccomp_restrict_sxid()). And when glibc provides
openat2(), there's no fallback, so the ENOSYS bubbles up to the user as:
TEST-25-IMPORT.sh[163]: + tar xzf /var/tmp/scratch.tar.gz
TEST-25-IMPORT.sh[163]: tar: ./adirectory/athirdfile: Cannot open: Function not implemented
TEST-25-IMPORT.sh[163]: tar: Exiting with failure status due to previous errors
Let's mitigate this by re-enabling sync for TEST-25-IMPORT, at least for
now.
In nspawn.c's run_container() the child_netns_fd = receive_one_fd(...)
failure path logged 'r' instead of the negative errno returned in
child_netns_fd, so the actual error from receive_one_fd was being
overwritten by whatever 'r' happened to hold. The other receive_one_fd
call sites in the same function use the returned fd variable directly
(mntns_fd, etc.), so align this one.
In shared/nsresource.c's nsresource_add_cgroup() the cgroup_fd_idx =
sd_varlink_push_dup_fd(...) failure path logged userns_fd_idx, which
is the previous successful push's index, not the negative errno we
just got from pushing cgroup_fd. Log cgroup_fd_idx instead.
Both were flagged by static analysis (#41709) and match the immediately
preceding userns_fd-path pattern that was presumably copy-pasted.
compress: gracefully handle a truncated ZSTD frame
If a journal file contains a truncated ZSTD frame (i.e. a frame with
Frame_Content_Size > 0, but with not enough data in Data_Block),
ZSTD_decompressStream() would return a non-zero, non-error value. This
would then skip the error path in the ZSTD_isError() branch and we'd hit
the following assert:
Let's handle this situation gracefully and return EBADMSG instead.
Also, add another journalctl invocation to the corrupted-journals test
that goes through the sd_journal_get_data() -> decompress_startswith_zstd()
code path which, among other things, covers the issue when run on the
provided journal file.
strxcpyx: add a paranoia check for vsnprintf()'s return value
vsnprintf() can, under some circumstances, return a negative value, namely
during encoding errors when converting wchars to multi-byte characters.
This would then wreak havoc in the arithmetic we do following the
vsnprintf() call. However, since we never do any wchar shenanigans in
our code it should never happen.
Let's encode this assumption into the code as an assert(), similarly to how
we already do this in other places (like strextendf_with_separator()).
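As a rough illustration of the idea (not the actual strxcpyx code), the following sketch uses a hypothetical helper, demo_format(), that formats into a fixed-size buffer and returns the number of bytes still free. The assert() documents the assumption that vsnprintf() never fails here, so the size arithmetic after it never sees a negative value.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats into dest (capacity "size", including the
 * trailing NUL) and returns how many bytes of the buffer remain unused. */
static size_t demo_format(char *dest, size_t size, const char *fmt, ...) {
        va_list ap;
        int r;

        assert(dest);
        assert(size > 0);
        assert(fmt);

        va_start(ap, fmt);
        r = vsnprintf(dest, size, fmt, ap);
        va_end(ap);

        /* A negative result indicates an output or encoding error. We never
         * format wide characters, so treat this as a programming error. */
        assert(r >= 0);

        if ((size_t) r >= size)
                return 0; /* Output was truncated, the buffer is full. */

        return size - (size_t) r;
}
```

Without the check, a negative r converted to size_t would become a huge value and the truncation test would silently take the wrong branch.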
iovec-wrapper: rename iovw_append() to iovw_extend()
The naming is consistent with strv_extend().
This also
- introduces a tiny iovw_extend_iov() wrapper,
- refuses when the source and target point to the same object,
- checks the final count before extending in iovw_extend_iovw().
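The two new checks could look roughly like the sketch below. It is an assumption-laden illustration, not the real implementation: struct demo_iovw, DEMO_IOV_MAX, the chosen error codes and the shallow copy of the entries are all hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the real wrapper structure. */
struct demo_iovw {
        struct iovec *iovec;
        size_t count;
};

/* Hypothetical upper bound on the number of entries. */
#define DEMO_IOV_MAX 1024U

static int demo_iovw_extend_iovw(struct demo_iovw *target, const struct demo_iovw *source) {
        struct iovec *p;

        assert(target);
        assert(target->count <= DEMO_IOV_MAX);

        if (!source || source->count == 0)
                return 0;

        /* Extending an object with itself would read entries while the
         * array is being reallocated, hence refuse. */
        if (target == source)
                return -EINVAL;

        /* Check the final count before touching anything. */
        if (source->count > DEMO_IOV_MAX - target->count)
                return -E2BIG;

        p = realloc(target->iovec, (target->count + source->count) * sizeof(struct iovec));
        if (!p)
                return -ENOMEM;

        /* Shallow copy: the entries keep pointing at the source's buffers. */
        memcpy(p + target->count, source->iovec, source->count * sizeof(struct iovec));
        target->iovec = p;
        target->count += source->count;
        return 0;
}
```

Doing both checks up front means a failed call leaves the target untouched.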
repart: add EncryptKDF= option for LUKS2 partitions
systemd-repart currently creates LUKS2 encrypted partitions using
libcryptsetup's default KDF (Argon2id), which requires ~1GB of memory
during key derivation. This is too much for memory-constrained
environments such as kdump with limited crashkernel memory, where
luksOpen fails due to insufficient memory.
Add an EncryptKDF= option to repart.d partition definitions that allows
selecting the KDF type. Supported values are:
- "argon2id" — Argon2id with libcryptsetup-benchmarked parameters
- "pbkdf2" — PBKDF2 with libcryptsetup-benchmarked parameters
- "minimal" — PBKDF2 with SHA-512, 1000 iterations, no benchmarking,
matching the existing cryptsetup_set_minimal_pbkdf() behaviour used
for TPM2-sealed keys
When not specified, the libcryptsetup default (argon2id) is used,
preserving existing behaviour.
The KDF type is applied via sym_crypt_set_pbkdf_type() after
sym_crypt_format() and before any keyslots are added.
These don't make too much sense on their own, but they also don't really
hurt. They are preparation for #41543, but in order to make things
easier to review I split these four commits out, since they are not
directly part of what the PR shall achieve.
The NOTES section in os-release(5) contains unusual formatting.
Switch function and ulink tags and remove a newline within ulink text to
keep the entry formatting in sync with others. Also, this preserves the
formatting within the text itself.
mountpoint-util: initialize mnt_id for name_to_handle_at(AT_HANDLE_MNT_ID_UNIQUE)
Suppress the following message:
```
$ sudo valgrind --leak-check=full build/networkctl dhcp-lease wlp59s0
==175708== Memcheck, a memory error detector
==175708== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==175708== Using Valgrind-3.26.0 and LibVEX; rerun with -h for copyright info
==175708== Command: build/networkctl status wlp59s0
==175708==
==175708== Conditional jump or move depends on uninitialised value(s)
==175708== at 0x4BC33D1: inode_same_at (stat-util.c:610)
==175708== by 0x4BF1972: inode_same (stat-util.h:86)
==175708== by 0x4BF48FE: running_in_chroot (virt.c:817)
==175708== by 0x4B16643: running_in_chroot_or_offline (verbs.c:37)
==175708== by 0x4B175CE: _dispatch_verb_with_args (verbs.c:136)
==175708== by 0x4B17868: dispatch_verb (verbs.c:160)
==175708== by 0x407CBB: networkctl_main (networkctl.c:249)
==175708== by 0x407D06: run (networkctl.c:263)
==175708== by 0x407D39: main (networkctl.c:266)
==175708==
```
Not sure if it is an issue in valgrind or glibc, but at least there is
nothing we can do except for working around it.
sleep: convert to "verbs", using the new option+verb macros
We had verb-like dispatch, but done in a manual way. We have a fairly
heavy preparation step that wraps all operations in the same way, so we
don't want to call the operation implementation functions directly. But
let's use the generic verb machinery and pass the state directly using
the userdata pointer and the recently added verb data pointer.
--help output is substantially the same, but options are now in a new
section below the verbs.
bootctl: make bootspec-util.c independent of bootctl.c
This changes boot_config_load_and_select() to also take the root path as
input, just like the ESP and XBOOTLDR path.
This has the benefit of making the whole file independent of bootctl.c,
which means we can link it into a separate test, and is preparatory work
for a follow-up commit.
If unprivileged_mode is false then verify_esp() will treat access errors
like any other and log about them. Here we set it to false, hence
there's no point in logging a second time.
boot: never auto-boot a menu entry with the non-default profile
When figuring out which menu entry to pick by default, let's not
consider any with a profile number > 0. This reflects the fact that
additional profiles are generally used for
debug/recovery/factory-reset/storage target mode boots, and those should
never be auto-selected. Hence do a simple check: if profile != 0, simply
do not consider the entry as a default.
We might eventually want to beef this up, and add a property one can set
in the profile metadata that controls this behaviour, but for now let's
just do this simple fix.
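A minimal sketch of such a check, assuming a hypothetical struct demo_entry with a profile field and a selection that simply takes the first eligible entry (the real selection logic considers more than this):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified menu entry. */
struct demo_entry {
        const char *id;
        unsigned profile;
};

/* Returns the index of the first entry that may be auto-selected,
 * or -1 if there is none. */
static int demo_pick_default(const struct demo_entry *entries, size_t n_entries) {
        assert(entries || n_entries == 0);

        for (size_t i = 0; i < n_entries; i++) {
                /* Entries with a non-default profile are never auto-booted. */
                if (entries[i].profile != 0)
                        continue;

                return (int) i;
        }

        return -1;
}
```

With the entries { "debug", 2 }, { "recovery", 1 }, { "default", 0 } only the last one is eligible. If every entry has a non-zero profile, nothing is picked.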
namespace: don't log misleading error in the r > 0 path
fd_is_fs_type() returns < 0 for errors, 0 for false, and > 0 for true, so
in the r > 0 branch we'd most likely report EPERM together with the error
message which is misleading.
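To see why EPERM shows up: on Linux EPERM is 1, so handing a positive "true" result to an errno-formatting log call prints "Operation not permitted". The sketch below uses hypothetical names (demo_check(), demo_verify()) and a hypothetical -EINVAL for the refusal, and shows the three branches handled separately.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tri-state check such as fd_is_fs_type():
 * < 0 on error, 0 for "no", > 0 for "yes". */
static int demo_check(int simulated_result) {
        return simulated_result;
}

static int demo_verify(int simulated_result, char *msg, size_t msg_size) {
        int r;

        assert(msg);
        assert(msg_size > 0);

        r = demo_check(simulated_result);
        if (r < 0) {
                /* Only here does r carry an errno worth reporting. */
                snprintf(msg, msg_size, "Failed to check file system type: %s", strerror(-r));
                return r;
        }
        if (r > 0) {
                /* r is a boolean here. Formatting it as an errno would
                 * print the text for EPERM (== 1), so don't. */
                snprintf(msg, msg_size, "Refusing: unexpected file system type.");
                return -EINVAL; /* hypothetical error code for this sketch */
        }

        msg[0] = '\0';
        return 0;
}
```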
Allows appending kernel command line arguments, like
kexec-tool does. This is especially needed for the integration
tests, as mkosi adds a bunch of options that are needed for the
test suite to work, and it breaks without them.
The interface of this program was rather strange. It took an option that
specified what to do, but that option behaved exactly like a verb. Let's
change the interface to the more modern style with verbs. Since the
interface was documented in the man page, provide a compat shim to handle
the old options.
(In practice, I doubt anybody will notice the change. But since it was
documented, it's easier to provide the compat than to think too much
whether it is actually needed. I think we can drop it in a year or so.)
Extend fake-report-server.py with optional --cert, --key, --port
arguments for TLS support. Add a test case that generates a
self-signed certificate and tests HTTPS upload of metrics and facts.
Also exercise the --header param.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a fake HTTP server (fake-report-server.py) that accepts JSON POST
requests and validates the report structure, and test cases in
TEST-74-AUX-UTILS.report.sh that exercise plain HTTP upload of both
metrics and facts.
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
timesync: verify the actual size of the received data
iov.iov_len doesn't change after calling recvmsg() so it remains set to
sizeof(ntpmsg), which makes the check for a short packet always false.
Let's fix that by checking the actual size of the received data instead.
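The pattern can be sketched as follows. This is an illustration only: struct demo_msg (48 bytes, the size of a basic NTP packet), receive_full_packet() and the -EBADMSG return code are assumptions, not the timesyncd code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical fixed-size message. */
struct demo_msg {
        uint8_t payload[48];
};

static int receive_full_packet(int fd, struct demo_msg *ret) {
        struct iovec iov = { .iov_base = ret, .iov_len = sizeof(*ret) };
        struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n;

        assert(fd >= 0);
        assert(ret);

        n = recvmsg(fd, &mh, 0);
        if (n < 0)
                return -errno;

        /* iov.iov_len is still sizeof(*ret) at this point. Only the
         * return value of recvmsg() tells us how much data arrived. */
        if ((size_t) n < sizeof(*ret))
                return -EBADMSG; /* hypothetical error code for a short packet */

        return 0;
}
```

Sending a 10-byte datagram over a socketpair() and calling receive_full_packet() on the other end rejects the packet, while a check against iov.iov_len would have accepted it.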
ci: Switch PR review workflow to Opus 4.7 via Mantle endpoint
Opus 4.7 is in research preview on Bedrock and the Invoke API rejects
the beta headers Claude Code sends ("invalid beta flag"). Enable the
Mantle endpoint, which serves Claude via the native Anthropic API shape
and accepts those headers, and switch the model ID to the Mantle form
(no region prefix or version suffix).
All non-test users of iovec_wrapper define the struct as a field in a
bigger structure, so we never free it individually. Let's simplify the
code and assume it is never null.
journal-upload: require TLS 1.2 as the minimum version
RFC 8996 says:
> This document formally deprecates Transport Layer Security (TLS)
> versions 1.0 (RFC 2246) and 1.1 (RFC 4346). Accordingly, those
> documents have been moved to Historic status. These versions lack
> support for current and recommended cryptographic algorithms and
> mechanisms, and various government and industry profiles of
> applications using TLS now mandate avoiding these old TLS versions.
> TLS version 1.2 became the recommended version for IETF protocols in
> 2008 (subsequently being obsoleted by TLS version 1.3 in 2018),
> providing sufficient time to transition away from older versions.
> Removing support for older versions from implementations reduces the
> attack surface, reduces opportunity for misconfiguration, and
> streamlines library and product maintenance.
This code probably only talks to our own receiver which uses
libmicrohttpd. That in turn delegates to GnuTLS, which supports
1.2, 1.3, 3.0, etc.
Previously we compiled curl-util.c at least two times, and then also
shared it using the extract+object. Let's build a static "convenience lib"
for it.
(Using extract+object everywhere is not possible because the different
places where it is used are conditionalized independently so we don't
have a single "source" that is always available.)