Yu Watanabe [Thu, 1 May 2025 06:28:34 +0000 (15:28 +0900)]
various: convert more readers of /proc/ to plain read_full_file() (#37299)
Continuation of #36734
Apparently I was wrong about everything under `/proc/` being seq_file,
but at least there're some more to convert and we can leverage our
helper func while doing so.
Mike Yuan [Thu, 13 Mar 2025 13:49:13 +0000 (14:49 +0100)]
fileio: modernize get_proc_field()
- Drop effectively unused "terminator" param, imply whitespace
- Make ret param optional
- Return ENODATA if the requested key is not found, rather than
ENOENT
- Turn ENOENT -> ENOSYS if /proc/ is not mounted
- Don't skip whitespaces before ':', nothing needs this handling
anyways
- Remove the special treatment for all "0"s. We don't actually
use this for capabilities given pidref_get_capability() exists
- Switch away from read_full_virtual_file() - files using "field"
scheme under /proc/ seem all to be "seq_file"s (refer to da65941c3ee03495541c3bffbccc9012c8d9a5f8 for details on file types)
sd-stub: fix assertion failure when cleaning up initrd pages
When linux_exec() fails, the initrd pages cleanup attempts to run,
and an assertion is triggered:
../src/boot/linux.c:125@linux_exec: Error loading kernel image: Security violation
../src/boot/util.h:81@cleanup_pages: Error freeing pages: Not found
../src/boot/log.c:30@efi_assert: systemd-boot: Assertion 'r == EFI_SUCCESS' failed at ../src/boot/util.h:82@cleanup_pages, halting.
(log message is new)
This was introduced by https://github.com/systemd/systemd/pull/36715
Before that change, given the argument to xmalloc_pages() was passed as EFI_SIZE_TO_PAGES(n_pages), that's
what ended up in Pages.n_pages. After this change, n_pages gets assigned without being transformed by
EFI_SIZE_TO_PAGES, so the cleanup can find them again. That change causes the assertion failure to trigger.
Changing this to .n_pages = EFI_SIZE_TO_PAGES(n_pages) fixes the assertion.
We were compiling the same resolved sources over and over again (up to
10 times) which had a substantial effect on build times. Let's make sure
we only compile the resolved sources once by having one static library
containing the objects for all the resolved sources.
While we're at it, get rid of unnecessary variables and includes in the
resolve meson file and generally clean things up a bit.
Before (recorded with ClangBuildAnalyzer):
**** Time summary:
Compilation (1823 times):
Parsing (frontend): 675.5 s
Codegen & opts (backend): 81.6 s
After:
**** Time summary:
Compilation (1585 times):
Parsing (frontend): 553.6 s
Codegen & opts (backend): 70.7 s
wait-online: handle varlink connection errors while waiting for DNS (#37283)
Currently, if systemd-networkd-wait-online is started with --dns, and
systemd-resolved is not running, it will exit with an error right away.
Similarly, if systemd-resolved is restarted while waiting for DNS
configuration, systemd-networkd-wait-online will not attempt to
re-connect, and will potentially never see subsequent DNS
configurations.
Improve this by adding socket units for the systemd-resolved varlink
servers, and re-establish the connection in systemd-networkd-wait-online
when we receive `SD_VARLINK_ERROR_DISCONNECTED`.
Nick Rosbrook [Mon, 28 Apr 2025 16:44:20 +0000 (12:44 -0400)]
test: add a test for resolved and wait-online interactions
Specifically, add a test case that ensures systemd-networkd-wait-online --dns
is robust against (a) systemd-resolved absence, and (b) systemd-resolved
restarts.
Nick Rosbrook [Tue, 29 Apr 2025 19:16:45 +0000 (15:16 -0400)]
wait-online: attempt to re-connect after varlink disconnects
Now that systemd-resolved has socket activation for it's varlink
sockets, this should should be enough to make the DNS configuration
logic robust against systemd-resolved stops and restarts.
Add logic to grab socket fds via sd_varlink_server_listen_name(), but
fallback to the existing sd_varlink_server_listen_address() calls if no
fds were given.
This will be used to make systemd-networkd-wait-online --dns more robust
against systemd-resolved restarts etc.
network/ndisc: drop only default gateway via the host when a neighbor announcement without router flag is received
A host can send Router Advertisements (RAs) without acting as a router.
In such cases, the lifetime of the RA header should be zero, but may
contain several options, and clients can configure addresses, routes,
and so on with the message. The host may (should?) send Neighbor
Announcements (NAs) without the router flag in that case.
So, when a NA without the router flag is received, let's not drop
configurations based on the previous RA options, but only drop the
default gateway configured based on the RA header.
See RFC 4861 Neighbor Discovery in IPv6, section 6.3.4:
https://www.rfc-editor.org/rfc/rfc4861#section-6.3.4:~:text=%2D%20The%20IsRouter%20flag,as%20a%20host.
> - The IsRouter flag in the cache entry MUST be set based on the Router
> flag in the received advertisement. In those cases where the IsRouter
> flag changes from TRUE to FALSE as a result of this update, the node
> MUST remove that router from the Default Router List and update the
> Destination Cache entries for all destinations using that neighbor as
> a router as specified in Section 7.3.3. This is needed to detect when
> a node that is used as a router stops forwarding packets due to being
> configured as a host.
sd-varlink: put a limit on queued outgoing messages
This is only a safety net for runaway programs: it puts a limit on
outgoing messages, i.e. not on resources accessible directly from
outside, but only on resources taken by trusted local code.
This was done by running a locally built clang-format with
https://github.com/llvm/llvm-project/pull/137617 and
https://github.com/llvm/llvm-project/pull/137840 applied on all .c
and .h files.
When a [SR-IOV] section has no setting, e.g.
```ini
[SR-IOV]
VirtualFunction=0
```
then the kernel previously replied -EINVAL, as we send a rtnl message
with an empty IFLA_VF_INFO container.
See See do_setvfinfo() in net/core/rtnetlink.c of the kernel.
When a [SR-IOV] section that has an unsupported settings by the
interface driver, then previously the kernel partially applied
settings and returned -EOPNOTSUPP. E.f.
```ini
[SR-IOV]
VirtualFunction=0
LinkState=auto
Trust=true
MACAddress=02:01:00:3e:61:34
```
and the interface does not support configuring the link state, then
the MAC address is assigned, but the trust is not applied:
```
enp3s0f0: Failed to configure SR-IOV virtual function 0, ignoring: Operation not supported
vf 0 link/ether 02:01:00:3e:61:34 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
```
To fix such issues, this makes networkd/udevd send each attribute
for VF one-by-one.
The configuration can easily fail when the target virtual function
does not exist, and there is nothing networkd can do in such case.
Also, it is overkill to make the physical interface entered to the
failed state in such case. Let's warn but ignore the failure.
These assertions impose an include order between <linux/fs.h> and
"missing_fs.h", specifically <linux/fs.h> can't be included before
"missing_fs.h". This makes automated include refactoring very painful,
so let's get rid of these assertions and instead assume that linux/fs.h
does the right thing.
basic: Add our own <netinet/in.h> and <net/if.h> headers
These glibc headers conflicts with the corresponding linux headers
(<linux/in.h> and <linux/if.h>) and impose an include order (the glibc one
has to be included before any linux header is included). This makes sorting
includes a royal pain so let's define our own versions of these headers using
various linux headers to do all the work and filling in the missing bits
ourselves.
The header requires struct sockaddr declared. So, otherwise, we need to
include sys/socket.h earlier than linux/vm_sockets.h.
Let's make the header includable at any place.
udev: Enable delegation without delegating any controllers
Delegation is enabled for udev so that it can mess around with the
cgroup hierarchy to avoid killing control processes when it calls
cg_kill in on_post() when it goes idle. We don't actually care about
any specific cgroup controllers in udev, so set Delegate= to enable
delegation without delegating any controllers
Follow up for https://github.com/systemd/systemd/pull/22752
various: do not use assert_se as a workaround in non-test code
This partially reverts 5332be60d3897c7b86d28cf7b9d61c5dc6847fd6. I expect that
there is no practical difference, but it seems philosophically wrong to use
assert_se(), i.e. for the generation of the code in non-debug builds, just to
suppress a warning. We have _unused_ for that, use it.
I verified that we don't get warnings with clang and -DNDEBUG=1 with this patch.
@DaanDeMeyer Obviously this doesn't fix nearly everything, so gradually
moving things over is probably a smart thing? It seems clang-tidy does
support drop in configs for example:
Its a bit strange that `WarningsAsErrors` isn't propagated, but dropping
this file in src/report/.clang-tiday invokes:
```
[1314/1543][1.5s] /usr/bin/clang-tidy --use-color -extra-arg=-fno-caret-diagnostics -p=/home/jelle/projects/systemd/build -quiet /home/jelle/projec
ts/systemd/src/repart/repart.c
../src/repart/repart.c:4715:41: error: argument name 'pubkey' in comment does not match parameter name 'public' [bugprone-argument-comment,-warning
s-as-errors]
4715 | /* pubkey= */ NULL, /* Turn this one off for the 2nd shard */
| ^
../src/shared/tpm2-util.h:281:108: note: 'public' declared here
281 | int tpm2_calculate_sealing_policy(const Tpm2PCRValue *pcr_values, size_t n_pcr_values, const TPM2B_PUBLIC *public, bool use_pin, const Tpm2
PCRLockPolicy *policy, TPM2B_DIGEST *digest);
| ^
```
So that seems to behave as intended :)
And in some cases I am not sure if switching to the correct argument is
an improvement ie.:
```
../src/bootctl/bootctl-reboot-to-firmware.c:66:51: [38;2;190;132;255m0;1;31merror: argument name 'dispatch_table' in comment does not match paramet
er name 'table' [bugprone-argument-comment,-warnings-as-errors]
66 | r = sd_varlink_dispatch(link, parameters, /* dispatch_table = */ NULL, /* userdata = */ NULL);
| [38;2;190;132;255m0;1;32m ^
../src/systemd/sd-varlink.h:187:98: [38;2;190;132;255m0;1;36mnote: 'table' declared here
187 | int sd_varlink_dispatch(sd_varlink *v, sd_json_variant *parameters, const sd_json_dispatch_field table[], void *userdata);
| [38;2;190;132;255m0;1;32m ^
```
or
```
../src/validatefs/validatefs.c:274:83: [38;2;190;132;255m0;1;31merror: argument name 'ret_len' in comment does not match parameter name 'len' [bugprone-argument-comment,-warnings-as-errors]
274 | (void) blkid_probe_lookup_value(b, "PART_ENTRY_TYPE", &v, /* ret_len= */ NULL);
| [38;2;190;132;255m0;1;32m ^
/usr/include/blkid/blkid.h:455:52: [38;2;190;132;255m0;1;36mnote: 'len' declared here
455 | const char **data, size_t *len)
| [38;2;190;132;255m0;1;32m ^
```
But that's also half a style thing with `len` winning over `ret_len`.
Notable changes are
- SIGTERM is the highest among others, to make not udevd queue too
many events, as we need to serialize them anyway.
- device monitor has the second highest priority, to make 'remove'
uevents received earlier than IN_IGNORED inotify events. Otherwise,
after IN_IGNORED is received, if there is no queued event,
/run/udev/queue file will be removed by the post-event source of the
inotify event, and 'udevadm settle' or friends may wrongly finish,
even we will soon queue 'remove' uevents for the device.
This change should fix the recent instability of TEST-64-UDEV-STORAGE.
udev: do not remove /run/udev/queue file when we are synthesizing events
Note, it should be safe even if we synthesize no event, e.g. when the
device has been already removed. In such case, the post event after
SIGCHLD will remove the file.
udev: try again to create /run/udev/queue when queueing the next event
This is mostly a paranoia, but if we failed to create /run/udev/queue
for some reasons on queueing an event, previously we would never create
the file until once the queue became empty. This makes in such case we
try to create the file again when queueing the next event.
networkd: reduce the default IPv4 DAD (ACD) timeout and make it configurable (#37138)
RFC 5227 specifies randomized intervals to avoid that a large number of
hosts powered up at the same time send their message simultaneously.
Performing the conflict detection takes a variable time between 4 and 7
seconds from the beginning to the first announcement, as shown by the
following diagram where P indicates a probe and A an announcement:
```
time(s) 0 1 2 3 4 5 6 7 8 9
+---+---+---+---+---+---+---+---+---+
SHORTEST P P P A A
LONGEST P P P A A
```
The host can't use the address until the first announcement is sent. 7
seconds is a very long time on modern computers especially considering
the fact that the round-trip time on current LAN technologies is at most
few milliseconds. Section 2.2 of the RFC addresses this matter and hints
that a future standard will adjust those timeouts; however that standard
doesn't exist yet.
Make the timeout configurable via a new
"IPv4DuplicateAddressDetectionTimeoutSec=" option. The intervals defined
in the RFC are then scaled proportionally so that the duration of the
conflict detection takes at most the given value. Interval happening
after the first announcement are not scaled, as recommended by the RFC.
Also reduce the default value from 7s to 200ms, which is a more suitable
value for today's technology.
The original timeout of 7 seconds is very long for today's networks. Reduce it
to 200ms. Note that this change also affects IPv4 link-local addressing.
RFC 5227 specifies randomized intervals to avoid that a large number of hosts
powered up at the same time send their message simultaneously. Performing the
conflict detection takes a variable time between 4 and 7 seconds from the
beginning to the first announcement, as shown by the following diagram where P
indicates a probe and A an announcement:
time(s) 0 1 2 3 4 5 6 7 8 9
+---+---+---+---+---+---+---+---+---+
SHORTEST P P P A A
LONGEST P P P A A
The host can't use the address until the first announcement is sent. 7 seconds
is a very long time on modern computers especially considering the fact that
the round-trip time on current LAN technologies is at most few milliseconds.
Section 2.2 of the RFC addresses this matter and hints that a future standard
will adjust those timeouts; however that standard doesn't exist yet.
Make the timeout configurable via a new IPv4DuplicateAddressDetectionTimeoutSec=
option. The intervals defined in the RFC are then scaled proportionally so that
the duration of the conflict detection takes at most the given value. Interval
happening after the first announcement are not scaled, as recommended by the
RFC.
man/systemd-nspawn: call dnf with --use-host-config
This is needed for dnf5. But dnf-4 doesn't know about it. So also add a hint to
skip the option with dnf-4. We can drop this later when dnf5 is the default
everywhere.
Also, s/vim-minimal/nano/. Nano is the default editor in Fedora since
https://fedoraproject.org/wiki/Changes/UseNanoByDefault.
selinux: Disable selinux logging in mac_init() as well
We currently only disable selinux logging in mac_selinux_setup(),
but not in mac_init(). We don't want libraries we use to log unless
we tell them to, so disable selinux's logging in mac_init() as well.
test: Add custom signal handlers to integration test wrapper script
meson will send SIGTERM if the test gets stuck and hits the timeout,
in which case we still want to do log saving and analysis, so let's
add some signal handlers which allow us to do that.
This won't be very useful until https://github.com/mesonbuild/meson/pull/14513
lands, since we only get half a second from meson to handle SIGTERM
before it sends SIGKILL, but let's land this already so we immediately
start taking advantage of the meson fix once it lands.