tree-wide: make sure net/if.h is included before any linux/ header
The linux/ headers include linux/libc-compat.h that makes sure the
linux/ headers won't redeclare symbols already declared by net/if.h, but
glibc's net/if.h doesn't do that, so if the include order is reversed
we'll end up with a bunch of errors about redeclared symbols.
This also drops remaining workarounds from the last time this issue was
brought up (6f270e6bd8) since they shouldn't be needed anymore if the
order of the includes is the "correct" one. I also added a comment to
each affected include for when this is inevitably encountered again in
the future.
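A minimal sketch of the include order described above. The helper
ifreq_name_capacity() is hypothetical and only exists to show that the
declarations from net/if.h are usable once both headers are included:

```c
/* net/if.h goes first: glibc's header does not guard against symbols
 * that a linux/ header may already have declared. */
#include <net/if.h>

/* linux/ headers go afterwards: linux/libc-compat.h notices that
 * net/if.h was already included and skips the conflicting declarations. */
#include <linux/if.h>

#include <stddef.h>

/* Hypothetical helper: uses struct ifreq as declared by net/if.h. */
static size_t ifreq_name_capacity(void) {
        struct ifreq req;

        return sizeof(req.ifr_name);
}
```

With the two includes swapped, the same file is the kind of case that
produces the redeclaration errors this commit is about.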
Just like we already have $SYSTEMD_PACKAGES for systemd packages to
re-install in the main image, let's add $INITRD_PACKAGES for all
systemd packages to re-install in the initrd.
mkosi: Install openSUSE-release instead of distribution-release
distribution-release is a virtual package that is by default satisfied
by the openSUSE MicroOS-release package. Let's make sure we pull in the
generic openSUSE-release package instead by installing
patterns-base-minimal_base, which has a Suggests dependency on
openSUSE-release. That makes sure it takes priority over the MicroOS one.
We might want to run the build scripts outside of mkosi as well at
some point, e.g. to build an rpm after booting the image, so let's
make them more generic by using /usr/lib/os-release to figure out
which pkg specs we should use instead of $PKG_SUBDIR. To make ubuntu
use the debian pkg spec, we add a symlink pkg/ubuntu which points to
debian/ in the same directory.
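The build scripts themselves are not C. Purely to illustrate the lookup,
here is a sketch in C of pulling the ID= field out of os-release
formatted text. The function name os_release_id() and its return
convention are assumptions made for this example:

```c
#include <string.h>

/* Copies the value of the ID= line from os-release formatted text into
 * `ret`. Returns 0 on success, -1 if there is no ID= line or the value
 * is empty or does not fit. Hypothetical helper, not part of the tree. */
static int os_release_id(const char *text, char *ret, size_t size) {
        const char *p = text;

        while (p && *p) {
                const char *eol = strchr(p, '\n');
                size_t len = eol ? (size_t) (eol - p) : strlen(p);

                if (len > 3 && strncmp(p, "ID=", 3) == 0) {
                        const char *v = p + 3;
                        size_t n = len - 3;

                        /* Strip one pair of matching quotes, if present. */
                        if (n >= 2 && (v[0] == '"' || v[0] == '\'') && v[n - 1] == v[0]) {
                                v++;
                                n -= 2;
                        }

                        if (n == 0 || n >= size)
                                return -1;

                        memcpy(ret, v, n);
                        ret[n] = '\0';
                        return 0;
                }

                p = eol ? eol + 1 : NULL;
        }

        return -1;
}
```

The returned ID would then select the pkg/ subdirectory, with the
pkg/ubuntu symlink covering the Ubuntu case.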
dns_transaction_request_dnssec_rr was already adjusted in 400171036592,
to allow for the return parameter to be passed uninitialized. However
this codepath was missed, meaning this function could sometimes return
success without having actually set the parameter.
Fixes: 400171036592 ("resolved: minor dnssec fixups")
Fixes: 47690634f157 ("resolved: don't request the SOA for every dns label")
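The general pattern, sketched with hypothetical names (struct record
and lookup_record() are stand-ins, not the resolved code): every path
that returns success must also set the return parameter, including the
"nothing found" path that is easy to miss.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a resource record. */
struct record {
        int type;
};

/* Returns 1 and sets *ret if a matching record was found, 0 and sets
 * *ret to NULL if there is none, and a negative errno on failure (in
 * which case *ret is left untouched). */
static int lookup_record(struct record *table, size_t n, int type, struct record **ret) {
        if (!ret)
                return -EINVAL;

        for (size_t i = 0; i < n; i++)
                if (table[i].type == type) {
                        *ret = &table[i];
                        return 1;
                }

        *ret = NULL; /* success without a result still initializes *ret */
        return 0;
}
```

Callers can then pass an uninitialized pointer and rely on it being set
whenever the return value is non-negative.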
network/dhcp6: return earlier if no lease acquired
Previously, even if an interface had not acquired a DHCPv6 lease,
networkd logged a misleading message:
===
Apr 09 10:44:57 systemd-networkd[3970750]: veth99: DHCPv6 lease lost
===
The function should do nothing when no lease has been acquired. Let's
return early and suppress the log message.
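A minimal sketch of the early return, using hypothetical stand-in types
(struct link, struct lease) rather than the networkd ones. The counter
takes the place of the real log call so the effect is observable:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the networkd objects. */
struct lease {
        int id;
};

struct link {
        const char *ifname;
        struct lease *dhcp6_lease;
        unsigned n_lost_logged; /* stands in for "DHCPv6 lease lost" log lines */
};

/* Returns true if a lease was actually dropped. */
static bool dhcp6_lease_lost(struct link *link) {
        if (!link->dhcp6_lease)
                return false; /* nothing acquired: nothing to drop or log */

        link->n_lost_logged++;
        link->dhcp6_lease = NULL;
        return true;
}
```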
This allows us to build and install after booting without having to
build a new image. Together with
https://github.com/systemd/mkosi/pull/2601 and after enabling
RuntimeBuildSources=yes, after booting, "meson install -C /work/build"
can be used to do an incremental build and install. This won't build
proper packages, but will be invaluable for having a quick compile,
edit, test cycle without having to rebuild the image all the time.
networkd: report error if lease file cannot be loaded and ignore
On my system, networkd would report that interface ve-rawhide is "Failed"
without anything in the logs:
systemd-networkd[651095]: ve-rawhide: Trying to reconfigure the interface.
systemd-networkd[651095]: ve-rawhide: Gained IPv6LL
systemd-networkd[651095]: ve-rawhide: Link DOWN
systemd-networkd[651095]: ve-rawhide: Lost carrier
systemd-networkd[651095]: ve-rawhide: Configuring with /usr/lib/systemd/network/80-container-ve.network.
systemd-networkd[651095]: ve-rawhide: Link UP
systemd-networkd[651095]: ve-rawhide: Gained carrier
systemd-networkd[651095]: ve-rawhide: Failed
At debug level:
systemd-networkd[799993]: dhcp-server-lease/ve-rawhide:1:1: Missing object field 'Address'.
I'm not sure why "Address" is missing, but anyway, in this case, we should ignore the
lease file rather than refusing to configure the interface. Also, warn at the point
where we know what the filename is.
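The "warn and ignore" behaviour can be sketched as follows. All names
here are hypothetical; loader_missing_field() merely simulates a lease
file that fails to parse:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical loader that simulates a malformed lease file. */
static int loader_missing_field(const char *path) {
        (void) path;
        return -EINVAL;
}

/* Returns 1 if the leases were loaded, 0 if loading failed and the
 * file was ignored. The warning is emitted here, where the file name
 * is known. */
static int load_leases_or_ignore(const char *path, int (*load)(const char *path)) {
        int r = load(path);

        if (r < 0) {
                fprintf(stderr, "Failed to load lease file %s, ignoring: %s\n",
                        path, strerror(-r));
                return 0;
        }

        return 1;
}
```

The caller treats both return values as "keep configuring the
interface", so a bad lease file no longer ends in "Failed".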
test-execute: check for s390x first and duplicate test
s390x will define both s390x and s390, so exec-personality-s390.service is run
in both cases but fails on s390x, as the personality returned is s390x.
Split the test and check specifically for s390x.
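The ordering issue in preprocessor terms, as a small sketch (the
function name is hypothetical): because a 64-bit s390x compiler defines
both __s390x__ and __s390__, the more specific macro has to be tested
first.

```c
#include <stddef.h>

/* Returns the personality name expected on the build architecture, or
 * NULL when not building for s390 at all. Hypothetical helper. */
static const char *native_s390_personality(void) {
#if defined(__s390x__)
        return "s390x"; /* must come first: __s390__ is defined here too */
#elif defined(__s390__)
        return "s390";
#else
        return NULL;
#endif
}
```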
Mike Yuan [Sun, 7 Apr 2024 11:33:37 +0000 (19:33 +0800)]
systemctl-logind: auto soft-reboot only if /run/nextroot/ is mountpoint
Consider the following case: a user sets up a minimal rootfs for
file system maintenance work in /run/nextroot/ dir directly. When
they're done, they expect 'systemctl reboot' to perform a full reboot.
But they keep soft-rebooting back to the tmpfs root, until they
find out about $SYSTEMCTL_SKIP_AUTO_SOFT_REBOOT.
So currently, when /run/nextroot/ is a normal dir, pid1 automatically
turns it into a bind mount to soft-reboot into. This is good, but when
combined with automatic soft-reboot it has an arguably unexpected
behavior, since /run/nextroot/ can never go away in such a case.
OTOH, if /run/nextroot/ is a mountpoint in the first place, the mount
is *moved* so a second reboot would not trigger auto soft-reboot.
Let's just make things more friendly to users, and do auto soft-reboot
only if /run/nextroot/ is also a mountpoint.
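A rough sketch of such a check, assuming /proc/self/mountinfo is
readable. This is not how systemctl implements it; the helper names are
hypothetical, and the parser ignores octal-escaped characters in mount
paths and expects the path without a trailing slash:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `path` appears as a mount point in /proc/self/mountinfo,
 * 0 if it does not, and a negative errno if the file cannot be opened. */
static int path_is_listed_mount_point(const char *path) {
        char line[8192], mount_point[4096];
        int found = 0;
        FILE *f;

        f = fopen("/proc/self/mountinfo", "r");
        if (!f)
                return -errno;

        while (fgets(line, sizeof(line), f)) {
                /* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
                if (sscanf(line, "%*d %*d %*s %*s %4095s", mount_point) == 1 &&
                    strcmp(mount_point, path) == 0) {
                        found = 1;
                        break;
                }
        }

        fclose(f);
        return found;
}

/* Only soft-reboot automatically when the next root is a real mount. */
static int should_auto_soft_reboot(const char *nextroot) {
        return path_is_listed_mount_point(nextroot) > 0;
}
```

A plain directory at /run/nextroot then results in a regular reboot,
which matches what the user in the scenario above expects.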
With gcc-14.0.1-0.13.fc40, when compiling with -O2, the compiler doesn't understand
that sd_bus_error_setf() always returns negative on error when <name> is provided:
[28/576] Compiling C object systemd-resolved.p/src_resolve_resolved-bus.c.o
../src/resolve/resolved-bus.c: In function ‘call_link_method’:
../src/resolve/resolved-bus.c:1763:16: warning: ‘l’ may be used uninitialized [-Wmaybe-uninitialized]
1763 | return handler(message, l, error);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
../src/resolve/resolved-bus.c:1749:15: note: ‘l’ was declared here
1749 | Link *l;
| ^
../src/resolve/resolved-bus.c: In function ‘bus_method_get_link’:
../src/resolve/resolved-bus.c:1822:13: warning: ‘l’ may be used uninitialized [-Wmaybe-uninitialized]
1822 | p = link_bus_path(l);
| ^~~~~~~~~~~~~~~~
../src/resolve/resolved-bus.c:1810:15: note: ‘l’ was declared here
1810 | Link *l;
| ^
...
Let's make the assertion a bit more explicit. With this, the warning goes away,
but I think it's more obvious to a human reader too.
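One way to spell out such an assertion, sketched with hypothetical
stand-ins (set_error() plays the role of an error setter that returns
negative when a name is given; this is not the actual patch):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct link {
        int ifindex;
};

/* Hypothetical error setter: returns a negative errno whenever `name`
 * is non-NULL, and 0 otherwise. */
static int set_error(const char **error, const char *name) {
        *error = name;
        return name ? -ENXIO : 0;
}

static int get_link(struct link *links, size_t n, int ifindex,
                    struct link **ret, const char **error) {
        int r;

        for (size_t i = 0; i < n; i++)
                if (links[i].ifindex == ifindex) {
                        *ret = &links[i];
                        return 0;
                }

        r = set_error(error, "no-such-link");
        assert(r < 0); /* tells compiler and reader: this path never succeeds */
        return r;
}
```

With the assertion in place, a caller that uses *ret only after a
non-negative return no longer looks like it might read an unset pointer.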
core: silence gcc warning about uninitialized variable
When compiled with -O2, the compiler is not happy about dynamic_user_pop() and
would warn about the output variables not being set. It does have a point:
we were doing a cast from ssize_t to int, and theoretically there could be
wraparound. So let's add an explicit check that the cast to int is fine.
[540/2509] Compiling C object src/core/libsystemd-core-256.so.p/dynamic-user.c.o
../src/core/dynamic-user.c: In function ‘dynamic_user_close.isra’:
../src/core/dynamic-user.c:580:9: warning: ‘uid’ may be used uninitialized [-Wmaybe-uninitialized]
580 | unlink_uid_lock(lock_fd, uid, d->name);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/core/dynamic-user.c:560:15: note: ‘uid’ was declared here
560 | uid_t uid;
| ^~~
../src/core/dynamic-user.c: In function ‘dynamic_user_realize’:
../src/core/dynamic-user.c:476:29: warning: ‘new_uid’ may be used uninitialized [-Wmaybe-uninitialized]
476 | num = new_uid;
| ~~~~^~~~~~~~~
../src/core/dynamic-user.c:398:23: note: ‘new_uid’ was declared here
398 | uid_t new_uid;
| ^~~~~~~
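A sketch of an explicit range check before narrowing ssize_t to int.
The helper name and the choice of -ERANGE are assumptions for this
example, not necessarily what the patch uses:

```c
#include <errno.h>
#include <limits.h>
#include <sys/types.h>

/* Stores `n` in *ret if it fits into an int. Returns 0 on success and
 * -ERANGE otherwise, leaving *ret untouched. */
static int ssize_to_int(ssize_t n, int *ret) {
        if (n < INT_MIN || n > INT_MAX)
                return -ERANGE;

        *ret = (int) n;
        return 0;
}
```

After the check the compiler can see that a non-negative result implies
the output was written, which is what the warning was asking for.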
nsresourced: add new daemon for granting clients user namespaces and assigning resources to them
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transient, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistence cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowlisted.
BPF LSM program with contributions from Alexei Starovoitov.
dissect-image: make dissected_image_acquire_metadata() operate within a userns if possible
This opens the door for making the call work without privileges: if we
pass in a userns fd and DissectedImage that has mount fds then we can
acquire all information without privs.
lock-util: make global lock return parameter to image_path_lock() optional
When adding unprivileged nspawn support we don't really want a global
lock file, since we cannot even access the dir they are stored in, hence
make the concept optional.
uid-range: add new uid_range_load_userns_by_fd() helper
This is similar to uid_range_load_userns() but instead of reading the
uid_map off a process it reads it off a userns fd.
(Of course the kernel has no API for this right now, hence we fork off a
throw-away process which joins the user namespace, and then read off the
data from there.)
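The fork-and-join step is not shown here. As a sketch of the reading
step only, this parses one line in the uid_map format ("inside-start
outside-start count"). The struct and function names are hypothetical:

```c
#include <stdio.h>

/* One uid_map entry: `nr` UIDs starting at `start` inside the
 * namespace map to UIDs starting at `base` outside of it. */
struct uid_map_entry {
        unsigned start;
        unsigned base;
        unsigned nr;
};

/* Parses a single uid_map line. Returns 0 on success, -1 if the line
 * is malformed. */
static int parse_uid_map_line(const char *line, struct uid_map_entry *ret) {
        unsigned a, b, c;

        if (sscanf(line, "%u %u %u", &a, &b, &c) != 3)
                return -1;

        ret->start = a;
        ret->base = b;
        ret->nr = c;
        return 0;
}
```

The throw-away process would have its uid_map read line by line in
this manner after it has joined the namespace.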
image-policy: add a new image_policy_intersect() call
This new call takes two image policy objects and generates an
"intersection" policy, i.e. only allows what is allowed by both. Or in
other words it conceptually implements a binary AND of the policy flags.
(Except that it's a bit harder, due to normalization, and underspecified
flags).
We can use this later for mountfsd: a client can specify a policy, and
mountfsd can specify another policy, and we'll then apply only what both
allow.
Note that a policy generated like this might be invalid. For example, if
one policy says root must exist and be verity or luks protected, and the
other policy says root must be absent, then the intersection is invalid,
since one policy only allows what the other prohibits and vice versa.
We'll return a clear error code in that case (ENAVAIL). (This is because
we simply don't allow encoding such impossible policies in an
ImagePolicy structure, for good reasons.)
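The core idea for a single partition can be sketched as a bitwise AND
with an emptiness check. The flag names below are hypothetical, and the
sketch deliberately ignores the normalization and underspecified-flag
handling mentioned above:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-partition policy flags. */
enum {
        POLICY_VERITY      = 1 << 0,
        POLICY_ENCRYPTED   = 1 << 1,
        POLICY_UNPROTECTED = 1 << 2,
        POLICY_ABSENT      = 1 << 3,
};

/* Stores in *ret what both policies allow. Returns 0 on success and
 * -ENAVAIL if the two policies have nothing in common, in which case
 * *ret is left untouched. */
static int policy_flags_intersect(uint32_t a, uint32_t b, uint32_t *ret) {
        uint32_t both = a & b;

        if (both == 0)
                return -ENAVAIL; /* impossible policy: refuse to encode it */

        *ret = both;
        return 0;
}
```

Intersecting "verity or encrypted" with "absent" gives -ENAVAIL, while
intersecting it with "encrypted or absent" leaves just "encrypted".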
This new call is like varlink_peek_fd() (i.e. gets an fd out of the
connection while also leaving it in there), and combines it with
F_DUPFD_CLOEXEC to make a copy of it.
We previously already had varlink_dup_fd() which was a duplicating
version for pushing an fd *into* the connection. To reduce confusion,
let's rename that one varlink_push_dup_fd() to make the symmetry to
varlink_push_fd() clear, so that we now have:
varlink_push_fd() → put fd in without dup'ing
varlink_push_dup_fd() → same with F_DUPFD_CLOEXEC
varlink_peek_fd() → get fd out without dup'ing
varlink_peek_dup_fd() → same with F_DUPFD_CLOEXEC
Since e56a8790a0 debugging test-execute failures has been a royal PITA, since
we ditch all potentially useful output from the test units (that, for
the most part, run `sh -x ...`). Let's improve the situation a bit by
setting EXEC_OUTPUT_NULL only when running the single test case that
needs it, and inheriting stdout otherwise.
For example, with a purposefully introduced error we get this output
with this patch:
exec-personality-x86-64.service: About to execute: sh -x -c "c=\$\$(uname -m); test \"\$\$c\" = \"foo_bar\""
Serializing sd-executor-state to memfd.
...
Personality: x86-64
LockPersonality: no
SystemCallErrorNumber: kill
++ uname -m
+ c=x86_64
+ test x86_64 = foo_bar
Received SIGCHLD from PID 1520588 (sh).
Child 1520588 (sh) died (code=exited, status=1/FAILURE)
exec-personality-x86-64.service: Child 1520588 belongs to exec-personality-x86-64.service.
exec-personality-x86-64.service: Main process exited, code=exited, status=1/FAILURE
exec-personality-x86-64.service: Failed with result 'exit-code'.
...
Exit Status: 1
src/test/test-execute.c:456:test_exec_personality: exec-personality-x86-64.service: can_unshare=yes: exit status 1, expected 0
(test-execute-root) terminated by signal ABRT.
Assertion 'r >= 0' failed at src/test/test-execute.c:1433, function prepare_ns(). Aborting.
Aborted
But without it, we'd miss the most important part:
exec-personality-x86-64.service: About to execute: sh -x -c "c=\$\$(uname -m); test \"\$\$c\" = \"foo_bar\""
Serializing sd-executor-state to memfd.
...
Personality: x86-64
LockPersonality: no
SystemCallErrorNumber: kill
Received SIGCHLD from PID 1521365 (sh).
Child 1521365 (sh) died (code=exited, status=1/FAILURE)
exec-personality-x86-64.service: Child 1521365 belongs to exec-personality-x86-64.service.
exec-personality-x86-64.service: Main process exited, code=exited, status=1/FAILURE
exec-personality-x86-64.service: Failed with result 'exit-code'.
...
Exit Status: 1
src/test/test-execute.c:456:test_exec_personality: exec-personality-x86-64.service: can_unshare=yes: exit status 1, expected 0
(test-execute-root) terminated by signal ABRT.
Assertion 'r >= 0' failed at src/test/test-execute.c:1433, function prepare_ns(). Aborting.
Aborted