Yu Watanabe [Sat, 15 Nov 2025 19:46:18 +0000 (04:46 +0900)]
nspawn: Prevent invalid UIDs propagating in bind mounts (#39729)
Commit 88fce090263ba8944cf491346eae2e8022dfd88d modified the
mount_bind() function, causing it to perform arithmetic on the uid_shift
parameter. However, it performs this arithmetic even when uid_shift was
UID_INVALID, which was not intended. This typically occurred when
mount_custom() was called for a simple bind mount without user
namespaces (and thus no rootidmap mount option).
This arithmetic (e.g., uid_shift + m->destination_uid) then wraps
around, resulting in the invalid ID 4294967295 ((uid_t)-1).
This bug manifests for users running systemd-nspawn with
--link-journal=host and --volatile=yes (but without --private-users),
causing systemd-tmpfiles to fail.
Make mount_bind() robust by checking if uid_shift is valid before using
it in arithmetic. If it is UID_INVALID, it defaults to a shift of 0 for
the ownership calculation, restoring correct behavior for plain bind
mounts while preserving the intended logic for ID-mapped mounts.
units: let's set a socket name for networkd rtnl socket
Let's make our networkd sockets recognizable purely by name. It hink
already for debugging it's a good idea to always set socket names, in
particular for services that have multiple sockets they listen on.
This adds a name to the rtnl socket, which so far missed one. Note that
the C code won't look for it, for compat with older versions, but at
least things are a bit more debuggable.
Let's expose local VMs/containers under ._dhcp by default. Let's also
expose WIFI AP clients under .home.arpa (i.e. the RFC8375 domain for
home networks).
This function doesn't "connect" to Varlink (i.e. it isn't a client) but
it binds a Varlink socket (i.e. it is server), hence let's remove the
verb "connect" from its name. let's copy how machined/resolved name the
counterpart for this function: manager_varlink_init()
Luca Boccassi [Sat, 15 Nov 2025 00:37:58 +0000 (00:37 +0000)]
test: always create networkd mock tmpfs for networkd-test.py
Match the behaviour of the other test classes that use sd-run and
always create the mock tmpfs runtime dirs.
This will be needed as the new resolve.hook directory won't exist
on boot but will be needed by the test case.
This basically implements nss-myhostname, but natively in
systemd-resolved, so that the logic becomes available also for clients
using the local DNS stub for resolution or the D-Bus or Varlink APIs.
This introduces /run/systemd/resolve.hook/ as a new directory that local
(privileged) programs can bind a Varlink socket into. If they do they'll
get a method call for each attempted resolved lookup, which they can
then either process themselves (and generate new records for, or return
errors to block stuff) or let pass so that the regular resolution is
done.
Usecase for this is primarily two things:
1. in machined we can add local resolution of machine names to their IP
addresses, similar in fashion to nss-mymachines, but working also if
the non-NSS interfaces to name resolution are used, i.e. the local
DNS responder. In fact, I think we should eventually remove
nss-mymachines from our tree, as soon as this code in resolved is
setlled.
2. in networkd we can add local resolution of names specified in DHCP
leases we hand out.
But beyond that there should be many other uses, for example people
could write "dns firewalls" with this if they like where they
dynamically block certain names from resolution.
Yu Watanabe [Sat, 15 Nov 2025 01:09:19 +0000 (10:09 +0900)]
Make new sd-path configuration search functionality generic (#39684)
Reverts systemd/systemd#38680
After taking a closer look I'm not convinced by the approach, see below.
First of all, all other SD_PATH_SEARCH_* are either somewhat generic,
i.e. encode the common prefix for configurations, binaries, etc., or are
subdirectories under systemd/ hence in our own "domain". The
tmpfiles/sysctl/binfmt we don't prefix with "systemd" precisely because
the concept is generic and there're actually other impls of them. A
specific SD_PATH_SEARCH_SYSCTL doesn't fit into our existing scheme.
Instead something along the lines of "SEARCH_SYSTEM_CONFIGURATION" shall
be introduced, and consumers will just suffix
sysctl.d/tmpfiles.d/binfmt.d for the final result.
And secondly, I don't grok why systemd-sysctl now unnecessarily calls
into sd-path to obtain the fixed search path. None of our other tools do
that.
-----------
An alternate approach, SD_PATH_SYSTEM_SEARCH_CONFIGURATION, which does
exactly above, will be introduced instead. It provides a universal
interface for querying any system config with our idiomatic
/etc/:/run/:/usr/local/lib/:/usr/lib/ hierarchy.
Luca Boccassi [Fri, 14 Nov 2025 21:27:24 +0000 (21:27 +0000)]
Try to make TEST-75-RESOLVED less flaky (#39733)
These tests unfortunately rely on polling in several areas. In some
cases, it appears the timeouts are too short (e.g. #39602, or
https://github.com/systemd/systemd/actions/runs/19369869943/job/55422626427?pr=39731#logs).
Try to adjust the timeouts to see if this makes things more reliable.
man: document that ConditionSecurity=tpm2 means full UEFI/PC Client profile support
TPM2 support is not too useful if the firmware doesn't actually use it
for the boot chain, hence we require the full PC client profile support.
Let's make that clear in the docs.
units: measure a separator event into PCR 9 after completing NvPCR initialization
We do this in a separate service (rather than inside of
systemd-tpm2-setup), since we want failures of this measurement to
result in an instant reboot, like for most our measurements.
Failures to initialize nvpcrs, or allocate an SRK are somewhat OK (and
more likely), as long as this separator communicates clearly where they
have to have taken place, if they worked.
tpm2-setup: measure information about NvPCR initialization to PCR 9
This locks down NvPCR initilization a bit more: we'll measure each
initialization of an NvPCR into PCR 9, thus chaining the NvPCRs to the
PCR set. After all NvPCRs are initialized we measure a barrier into PCR
9 as well.
This ensures that later additions of NvPCRs are clearly recognizable and
distuingishable from those done at boot.
Nick Rosbrook [Fri, 14 Nov 2025 19:13:07 +0000 (14:13 -0500)]
test: adjust timeouts for testcase_15_wait_online_dns
Do not set a timeout on the wait-online call, since there are timeout
calls later that will prevent the test from blocking forever. Increase
those timeout calls for slower CI runs.
Nick Rosbrook [Fri, 14 Nov 2025 14:37:21 +0000 (09:37 -0500)]
test: wait for interface to come online before checking DNS scopes
The current test is flaky because it creates a new interface definition,
calls networkctl reload, and then calls resolvectl show-cache. If
resolved has not received the changes and setup the DNS scopes for the
interface, show-cache will be empty for that interface.
Yu Watanabe [Thu, 13 Nov 2025 23:26:47 +0000 (08:26 +0900)]
musl: test-bus-error: drop ._need_free flag checks
Its value depends on how strerror_r() implemented, and the
implementations of the function in glibc and musl are actually
different. Let's drop the checks.
Yu Watanabe [Tue, 24 Jan 2023 07:39:46 +0000 (23:39 -0800)]
musl: introduce GNU specific version of strerror_r()
musl provides XSI compliant strerror_r(), and it is slightly different
from the one by glibc.
Let's introduce a tiny wrapper to convert XSI strerror_r() to GNU one.
The wrapper also patches musl's spurious catchall error message.
Daan De Meyer [Thu, 13 Nov 2025 21:15:01 +0000 (22:15 +0100)]
sd-event: Make sure iterations of defer and exit sources are updated
Defer and exit event sources are marked pending once when they are added
and never again afterwards. This means their pending_iteration is never
incremented after they are initially added, which breaks fairness among
event sources with equal priority which depend on the pending_iteration
variable getting updated in source_set_pending(). To fix this, let's assign
iterations for defer and exit sources in source_dispatch() instead so that
those get their pending_iteration updated as well.
Daan De Meyer [Wed, 12 Nov 2025 16:58:17 +0000 (17:58 +0100)]
sd-event: Add exit-on-idle support
Sometimes it's hard to assign responsibility to a specific event source
for exiting when there's no more work to be done. So let's add exit-on-idle
support where we exit when there are no more event sources.
Chris Down [Fri, 14 Nov 2025 10:04:24 +0000 (18:04 +0800)]
nspawn: Prevent invalid UIDs propagating in bind mounts
Commit 88fce09 modified the mount_bind() function, causing it to perform
arithmetic on the uid_shift parameter. However, it performs this
arithmetic even when uid_shift was UID_INVALID, which was not intended.
This typically occurred when mount_custom() was called for a simple bind
mount without user namespaces (and thus no rootidmap mount option).
This arithmetic (e.g., uid_shift + m->destination_uid) then wraps
around, resulting in the invalid ID 4294967295 ((uid_t)-1).
This bug manifests for users running systemd-nspawn with
--link-journal=host and --volatile=yes (but without --private-users),
causing systemd-tmpfiles to fail.
Make mount_bind() robust by checking if uid_shift is valid before using
it in arithmetic. If it is UID_INVALID, it defaults to a shift of 0 for
the ownership calculation, restoring correct behavior for plain bind
mounts while preserving the intended logic for ID-mapped mounts.
Daan De Meyer [Fri, 14 Nov 2025 08:10:18 +0000 (09:10 +0100)]
run0: Make --same-root-dir available for run0
This enables running something like
"mkosi box -- run0 --empower --same-root-dir -E PATH" to get an
empowered session as the current user within the "mkosi box" environment.
Daan De Meyer [Fri, 14 Nov 2025 09:28:43 +0000 (10:28 +0100)]
sd-event: Only register memory presure if write buffer size is zero
As documented in sd_event_add_memory_pressure(), we can only add
the memory pressure fd to epoll once we've written the watch string,
so make sure we don't register the memory pressure in
event_source_online() until we've written the watch string.
Daan De Meyer [Thu, 6 Nov 2025 09:20:49 +0000 (10:20 +0100)]
sd-event: Mark post sources as pending after dispatching
More post event sources might get added during dispatching, we want
to make sure those become pending as well if we're dispatching a non-post
event source.
Daan De Meyer [Thu, 6 Nov 2025 19:56:53 +0000 (20:56 +0100)]
test-cgroup-util: Skip test on ESTALE
The kernel converts a bunch of errors to ESTALE in the open_by_handle_at()
codepath so we treat it as missing privs but it could be absolutely
anything really.
Luca Boccassi [Fri, 14 Nov 2025 00:12:34 +0000 (00:12 +0000)]
integritysetup: Add support for hmac-sha512 and wrapped key HMAC algorithms phmac-sha256 and phmac-sha512 (#39719)
Currently the only supported integrity algorithm using HMAC is
`hmac-sha256`. Add `hmac-sha512` to the list of supported algorithms as
well.
Also add the `PHMAC` integrity algorithm to the list of supported
algorithms. The `PHMAC` algorithm is like the regular HMAC algorithm,
but it takes a wrapped key as input. A key for the `PHMAC` algorithm is
an opaque key blob, who's physical size has nothing to do with the
cryptographic size. Such a wrapped key can for example be a HSM
protected key. Currently PHMAC is only available for the s390x
architecture (Linux on IBM Z).
Support for PHMAC has just been added to the cryptsetup project via MR
https://gitlab.com/cryptsetup/cryptsetup/-/merge_requests/693 by commit
To allow automatic opening of integrity protected volumes that use PHMAC
via `/etc/integritytab`, this change in systemd's integritysetup tool is
needed as well.
Chris Down [Sun, 9 Nov 2025 16:59:59 +0000 (00:59 +0800)]
sd-dhcp-server: Add Hostname= option to static leases
This adds a new `Hostname=` option to the [DHCPServerStaticLease]
section in .network files, allowing an administrator to assign a
specific hostname to a client receiving a static lease.
We automatically select the correct DHCP option to use based on the
format of the provided string:
- Single DNS labels are sent as Option 12.
- Names with multiple DNS labels are sent as Option 81 in wire format.
Yu Watanabe [Sat, 21 Jun 2025 15:38:58 +0000 (00:38 +0900)]
musl: add several missing statx macros
glibc's sys/stat.h includes linux/stat.h, and we have copy of it from
the latest kernel, hence all new flags are always defined.
However, musl's sys/stat.h does not include linux/stat.h, and moreover,
they conflict with each other, hence we cannot include both header
simultaneously. Let's define missing macros to support musl.
musl: introduce dummy gshadow header file for userdb
Even 'gshadow' meson option is disabled, src/shared/userdb.c and
src/shared/user-record-nss.c include gshadow.h unconditionally.
Let's introduce dummy header to make them compiled gracefully.