Julia Kartseva [Fri, 7 Jan 2022 23:02:57 +0000 (15:02 -0800)]
bpf: check if lsm link ptr is libbpf error
BPF_RAW_TRACEPOINT_OPEN is expected to work only on x86 and x86_64,
since BPF trampoline is implemented only on these architectures.
Attach probing by bpf_program__attach_lsm already happens in
`bpf_lsm_supported`. The resulting pointer can store libbpf error and
that is the case for unsupported architectures.
Add libbpf error check to `bpf_lsm_supported` so execution does not
reach the point where unit startup fails.
In the olden days systemd-resolved used dbus and it didn't make sense to start
it before dbus which is started fairly late. But we have mostly ported resolved
over to varlink. The queries from nss-resolve are done using varlink, so name
resolution can work without dbus. resolvectl still uses dbus, so e.g. 'resolvectl
query' will not work, but by starting systemd-resolved earlier we're not making this
any worse.
If systemd-resolved is started after dbus, it registers the name and everything
is fine. If it is started before dbus, it'll watch for the dbus socket and
connect later. So it should be fine to start systemd-resolved earlier. (If dbus
is stopped and restarted, unfortunately systemd-resolved does not reconnect.
This seems to be a small bug: since our daemons know how to watch for
dbus.socket, they could restart the watch if they ever lose the connection. But
this scenario shouldn't happen in normal boot, and restarting dbus is not
supported anyway.)
Moving the start earlier the following advantages:
- name resolution becomes availabe earlier, in particular for synthesized
hostnames even before the network is up.
- basic.target is part of initrd.target, so systemd-resolved will get started
in the initrd if installed. This is required for nfs-root when the server is
specified using a name (https://bugzilla.redhat.com/show_bug.cgi?id=2037311).
bpf: actually skip RestrictFileSystems= when not supported
Units would fail to start, incl. systemd-journald.service and systemd-udevd.service.
Since unit->manager->restrict_fs will be set if and only if we can use it,
we can just check for that and remove the other checks.
Follow-up for 299d9417238e0727a48ebaabb5a9de0c908ec5c8.
Luca Boccassi [Sun, 9 Jan 2022 14:00:25 +0000 (14:00 +0000)]
test: store empty files rather than symlinks for test-fstab-generator
Dangling symlinks get pruned when packaging up the installation
directory. Just store empty files instead, and compare the names
rather than the content for .requires/.wants - the filename is
what is important anyway, the content is ignored.
> [It is] called in the setup of ld-linux-x86-64.so.2 from _dl_sysdep_start.
> My local call stack (with LTO):
>
> #0 init_cpu_features.constprop.0 (/usr/lib64/ld-linux-x86-64.so.2)
> #1 _dl_sysdep_start (/usr/lib64/ld-linux-x86-64.so.2)
> #2 _dl_start (/usr/lib64/ld-linux-x86-64.so.2)
> #3 _start (/usr/lib64/ld-linux-x86-64.so.2)
>
> Looking through the source, I think it's this (links for glibc 2.34):
> - First dl_platform_init calls _dl_x86_init_cpu_features, a wrapper for init_cpu_features.
> - Then init_cpu_features calls get_cet_status.
> - At last, get_cet_status invokes arch_prctl.
Adam Williamson [Wed, 5 Jan 2022 22:07:14 +0000 (14:07 -0800)]
kernel-install: prefer /boot over /boot/efi for $BOOT_ROOT
This restores the preference order from before 9e82a74. The code
previous to that change 'preferred' /boot over /boot/efi; that
commit changed it to check /boot/efi before checking /boot.
Changing this precedence could (and did, for me) have unexpected
effects - it seems safer to leave it how it was.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Markus Weippert [Tue, 4 Jan 2022 12:56:11 +0000 (13:56 +0100)]
homed: stop before stopping dbus
Otherwise, systemd-homed-active.service will fail to deactivate all
homes because homectl can no longer talk to homed if dbus stops first.
As a result, /home cannot be umounted.
Doing this on systemd-homed-active.service instead works as well, but
systemd-homed will exit 1 if dbus is already shut down.
Julia Kartseva [Thu, 6 Jan 2022 00:34:56 +0000 (16:34 -0800)]
bpf: do not freeze if bpf lsm fails to set up
BPF LSM is cgroup unaware and it's set up is happening in core manager.
It occures that the current implementation is too restrictive and causes
pid 1 to freeze.
Instead:
* in bpf_lsm_setup set manager->restrict_fs pointer last,
so it is an indicator that the set up was successful
* check for manager->restrict_fs before applying unit options
If the action failed, we should log about the issue, and continue.
Exiting would bring the graphical session down, which of course is not
appreciated by users.
As documented in previous commits, a non-negative return from the callback
doesn't matter, so the callback is simplified a bit.
It was copy-pasted directly from OSS-Fuzz where it makes sense to
kind of strip binaries to get nice backtraces but when the fuzzers
are built and run locally with gdb it would be nice to have a little
bit more than that.
It was initially discovered in elfutils where I put the same flags
and was surprised when I couldn't run the fuzzer comfortably step
by step, which led to the same change there: https://github.com/google/oss-fuzz/pull/7092
:-)
network: move logging from tc .fill_message to the callers
Structured initialization is used a bit more.
There were two kinds of log messages: about failed size calculations and
about failed appends to the message. I "downgraded" the first type to log_debug,
and moved the latter to the caller. This way there should be at most one high-priority
message.
I also changed sizeof(<type>) to sizeof(var) — there is less chance of select-and-paste
error in the second form.
network: simplify logging in l2tp_create_session() and l2tp_create_tunnel()
All the detailed logging is replaced by a simple "Failed to create netlink message",
which should be enough for the user in the unlikely case that this ever fails.
network: replace detailed netlink append messages with a single generic message
This commit is the first in the series, and they generally follow the same
idea: we had very detailed logging for message append operations which would
only fail either with some type error or intrinsic limit (and then they would
fail everywhere, so this would be noticed during development or in CI), or they
would fail with ENOMEM, in which case the exact location is not very interesting
since this is not repeatable.
I am in general in favour of detailed logging messages, because it helps with
diagnosis of errors, but I think case is an exception. Despite not being very
useful, those messages required a lot of effort, because they were customized
for each and every append operation. In fact some of the messages contained copy
errors. The text of the messages (since they are generally unique) also added up
to a considerable size.
This removes the log messages after each sd_netlink_message_append_*() in
fill_message_create() with a single line in netdev_create(). As described
above, we are just appending fields to a message, so those calls would almost
never fail.
A forgotten 'return' was added in one place.
$ size build/systemd-networkd{.0,}
text data bss dec hex filename 1878634 394016 36 2272686 22adae build/systemd-networkd.0 1842450 394080 36 2236566 222096 build/systemd-networkd
random-seed: hash together old seed and new seed before writing out file
If we're consuming an on-disk seed, we usually write out a new one after
consuming it. In that case, we might be at early boot and the randomness
could be rather poor, and the kernel doesn't guarantee that it'll use
the new randomness right away for us. In order to prevent the new
entropy from getting any worse, hash together the old seed and the new
seed, and replace the final bytes of the new seed with the hash output.
This way, entropy strictly increases and never regresses.
Fixes: https://github.com/systemd/systemd/issues/21983 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
$ udevadm test /devices/pci0000:00/0000:00:14.0/usb2/2-4/2-4.1/2-4.1.3
...
2-4.1.3: /usr/lib/udev/rules.d/40-libgphoto2.rules:9 Importing properties from results of builtin command 'usb_id'
2-4.1.3: /usr/lib/udev/rules.d/50-udev-default.rules:13 Skipping builtin 'usb_id' in IMPORT key
2-4.1.3: /usr/lib/udev/rules.d/50-udev-default.rules:13 Importing properties from results of builtin command 'hwdb --subsystem=usb'
2-4.1.3: hwdb modalias key: "usb:v17EFp3054:OneLink+ Giga"
2-4.1.3: /usr/lib/udev/rules.d/50-udev-default.rules:15 Importing properties from results of builtin command 'hwdb 'usb:v17efp3054''
2-4.1.3: No entry found from hwdb.
2-4.1.3: /usr/lib/udev/rules.d/50-udev-default.rules:15 Failed to run builtin 'hwdb 'usb:v17efp3054'': No data available
2-4.1.3: /usr/lib/udev/rules.d/50-udev-default.rules:52 MODE 0664
except that the existing one was done with uppercase digits and the full match pattern,
and the second one was done with lowercase digits.
With the previous commit we only have uppercase digits in our match patterns, so we can
drop the duplicate import. (Some other projects might have rules that used the lowercase
match patterns, and people might have some local rules that did that too. But the second
import was only added recently so I think it's better to rip off the bandaid quickly.)
hwdb: fix check for uppercasedness of match patterns
The check was added in 77547d5313ea916d2fb64ca5a8812734e9b50f92, but
it doesn't work as expected. Because the second part is wrapped in Optional(),
it would silently "succeed" when the lowercase digits were in the second part:
Empty files and empty strings seem to have triggered various
issues in the past so it seems they shouldn't be ignore by the
fuzzers just because fmemopen can't handle them.
Prompted by https://github.com/systemd/systemd/pull/21939#issuecomment-1003113669
meson: generate better arch defines for clang bpf compilation
The code assume that meson's cpu_family can be mapped directly to
'-D__<cpu_family>__'. This works in a surprising number of cases, but not for a
few architectures. PPC uses "powerpc", and RISC-V omits the trailing underscores.
ARM and RISC-V require a second define too.
Fixes #21900.
(I don't think this matters too much: we need *something* so that gnu/stubs.h
can be successfully included. But we don't actually call syscalls or depend too
much on the host environment, so things should be fine as long as we don't get
a compilation error.)
When the support for "synthetic errno" was added, we started truncating
the errno value to just the least significant byte. This is generally OK,
because errno values are defined up to ~130.
The docs don't really say what the maximum value is. But at least in principle
higher values could be added in the future. So let's stop truncating
the values needlessly.
The kernel (or libbpf?) have an error where they return 524 as an errno
value (https://bugzilla.redhat.com/show_bug.cgi?id=2036145). We would
confusingly truncate this to 12 (ENOMEM). It seems much nicer to let
strerror() give us "Unknown error 524" rather than to print the bogus
message about ENOMEM.
coredump: do not crash if we failed to acquire exe path
The COREDUMP_EXE attribute is "optional", i.e. we continue to process the
crash even if we didn't acquire it. The coredump generation code assumed
that it is always available:
#5 endswith at ../src/fundamental/string-util-fundamental.c:41
[ endswith() is called with NULL here, and an assertion fails. ]
#6 submit_coredump at ../src/coredump/coredump.c:823
#7 process_socket at ../src/coredump/coredump.c:1038
#8 run at ../src/coredump/coredump.c:1413
We use the exe path for loop detection, and also (ultimately) pass it to
dwfl_core_file_report(). The latter seems to be fine will NULL, so let's just
change our code to look at COMM, which should be more reliable anyway.