Daan De Meyer [Thu, 25 May 2023 16:13:02 +0000 (18:13 +0200)]
units: Shut down networkd and resolved on switch-root
Let's explicitly order these against initrd-switch-root.target, so
that they are properly shut down before we switch root. Otherwise,
there's a race condition where networkd might only shut down after
switching root and after we've already we've loaded the unit graph,
meaning it won't be restarted in the rootfs.
Accel (Compute Acceleration) are new devices for AI/ML computation:
https://docs.kernel.org/accel/introduction.html
They are part of DRM subsystem. Add them to 'render' group since
no other appropriate group in standard linux systems exist. This
can be changed when proper common user-space components will emerge,
and new group for acceleration devices access will be established.
Daan De Meyer [Wed, 24 May 2023 13:32:17 +0000 (15:32 +0200)]
meson: Create credstore directories
Let's make the creds directories a bit more discoverable and make it
easier for users to use them. This also allows us to fix the
mode to 0700 for /etc instead of the usual 0755 which is what probably
would happen if users had to create this directory themselves.
Frantisek Sumsal [Wed, 24 May 2023 11:29:52 +0000 (13:29 +0200)]
tree-wide: check memstream buffer after closing the handle
When closing the FILE handle attached to a memstream, it may attempt to
do a realloc() that may fail during OOM situations, in which case we are
left with the buffer pointer pointing to NULL and buffer size > 0. For
example:
Malte Poll [Wed, 24 May 2023 09:01:25 +0000 (11:01 +0200)]
ukify: fix handling signed kernel as file
The .linux section would contain the path to the signed kernel (instead of the signed kernel itself), since the python type of the variable is used to determine how it is handled when adding the pe sections.
Co-authored-by: Otto Bittner <cobittner@posteo.net>
Luca Boccassi [Thu, 11 May 2023 23:55:58 +0000 (00:55 +0100)]
stub: allow loading and verifying cmdline addons
Files placed in /EFI/Linux/UKI.efi.extra.d/ and /loader/addons/ are
opened and verified using the LoadImage protocol, and will thus get
verified via shim/firmware.
If they are valid signed PE files, the .cmdline section will be
extracted and appended. If there are multiple addons in each directory,
they will be parsed in alphanumerical order.
Optionally the .uname sections are also matched if present, so
that they can be used to filter out addons as well if needed, and only
addons that correspond exactly to the UKI being loaded are used.
It is recommended to also always add a .sbat section to addons, so
that they can be mass-revoked with just a policy update.
The files must have a .addon.efi suffix.
Files in the per-UKI directory are parsed, sorted, measured and
appended first. Then, files in the generic directory are processed.
Mike Yuan [Mon, 22 May 2023 00:30:30 +0000 (08:30 +0800)]
core: get rid of unused Service.will_auto_restart logic
The announced new behavior for OnFailure= never worked properly,
and we've fixed the document instead in #27675.
Therefore, let's get rid of the unused logic completely. More at #27594.
The to-be-added RestartMode= option should cover the use case hopefully.
Luca Boccassi [Sun, 21 May 2023 13:32:39 +0000 (14:32 +0100)]
ukify: add default .sbat section for addons
In order to ensure addons can always be revoked via SBAT, and it is not
left out by mistake, have a default metadata entry if none is specified
by the caller.
https://github.com/rhboot/shim/blob/main/SBAT.md
Luca Boccassi [Thu, 11 May 2023 23:51:19 +0000 (00:51 +0100)]
efi: set EFIVAR to stop Shim from uninstalling its protocol
We'll use it from the stub to validate files. Requires Shim 5.18.
By default, Shim uninstalls its protocol when calling StartImage(),
so when loading systemd-boot via shim and then loading an UKI, the
UKI's sd-stub will no longer be able to use the shim verification
protocol by default.
mount: check right before invoking /bin/umount if it makes sense
Notifications from /proc/self/mountinfo are async, so if we stop a
service (and while doing so get rid of the credentials mount point of
it), then it will take a while until the notification reaches us and we
actually scan the table again. In particular as we nowadays ratelimit
notifications on the table, since it's so inefficient. And as I learnt
the ratelimiting is actually quite regularly hit during shutdown, where
a flurry of umount events are genreated. Hence, let's check if a mount
point is actually a mountpoint before trying to unmount it. And if it
isn't let's wait for the notification to come in.
(This race might be triggred not just by us on ourselves btw: there are
other daemons that unmount stuff when stopping where the race also
exists, but might simply be harder to trigger: if during service
shutdown these services remove some mount then they might collide with
us doing the same. After all, we have the rule to unmount everything
mounted automatically for you during shutdown.)
In the long run we should also start making us of this when it becomes
available: https://github.com/util-linux/util-linux/issues/2132 With
that we can make issues like this go away entirely from our side of
things at least.
core: split out default network dep generation into own function
Just some simple refactoring: let's split out network dep generation
into its own function mount_add_default_network_dependencies().
This way mount_add_default_dependencies() only does preparatory stuff,
and then calls both mount_add_default_network_dependencies() and
mount_add_default_ordering_dependencies() with that, making things
nicely symmetric.
core: suppress various defaults deps for credentials mounts
The per-unit credentials mounts might show up for any kind of service,
including very very early ones. Hence let's not order them after
local-fs-pre.target, because otherwise we might trigger cyclic deps of
services that want to plug before that but still use credentials.
Daan De Meyer [Tue, 23 May 2023 14:24:47 +0000 (16:24 +0200)]
core/timer: Always use inactive_exit_timestamp if it is set
If we're doing a daemon-reload, we'll be going from TIMER_DEAD => TIMER_WAITING,
so we won't use inactive_exit_timestamp because TIMER_DEAD != UNIT_ACTIVE, even
though inactive_exit_timestamp is serialized/deserialized and will be valid after
the daemon-reload.
This issue can lead to timers never firing as we'll always calculate the next
elapse based on the current realtime on daemon-reload, so if daemon-reload happens
often enough, the elapse interval will be moved into the future every time, which
means the timer will never trigger.
To fix the issue, let's always use inactive_exit_timestamp if it is set, and only
fall back to the current realtime if it is not set.
Luca Boccassi [Sun, 21 May 2023 14:18:21 +0000 (15:18 +0100)]
stub: measure SMBIOS kernel-cmdline-extra in PCR12
PCR1, where SMBIOS strings are measured, is filled with data that is not
under the control of the machine owner. Measure cmdline extensions in
PCR12 too, where we measure other optional addons that are loaded by
sd-stub.
Jan Janssen [Tue, 23 May 2023 17:00:52 +0000 (19:00 +0200)]
elf2efi: Do not emit an empty relocation section
At least shim will choke on an empty relocation section when loading the
binary. Note that the binary is still considered relocatable (just with
no base relocations to apply) as we do not set the
IMAGE_FILE_RELOCS_STRIPPED DLL characteristic.
msizanoen1 [Tue, 23 May 2023 11:46:26 +0000 (18:46 +0700)]
core: Do not check child freezability when thawing slice
We want thawing operations to still succeed even in the presence of an
unfreezable unit type (e.g. mount) appearing under a slice after the
slice was frozen. The appearance of such units should never cause the
slice thawing operation to fail to prevent potential future repeats of
https://github.com/systemd/systemd/issues/25356.
sd-boot,sd-stub: also print version after the address
The kernel, systemd, and many other things print their version during boot.
sd-boot and sd-stub are also important, so let's print the version if EFI_DEBUG.
(If !EFI_DEBUG, continue to be quiet.)
When updating the docs, I saw that that the text in HACKING.md was out of date.
Instead of trying to update the instructions there, make it shorter and refer
the reader to tools/debug-sd-boot.sh for details.
Daan De Meyer [Tue, 23 May 2023 11:25:58 +0000 (13:25 +0200)]
tree-wide: Fix false positives on newer gcc
Recent gcc versions have started to trigger false positive
maybe-uninitialized warnings. Let's make sure we initialize
variables annotated with _cleanup_ to avoid these.
Frantisek Sumsal [Tue, 23 May 2023 07:55:17 +0000 (09:55 +0200)]
json: correctly handle magic strings when parsing variant strv
We can't dereference the variant object directly, as it might be
a magic object (which has an address on a faulting page); use
json_variant_is_sensitive() instead that handles this case.
For example, with an empty array:
==1547789==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000023 (pc 0x7fd616ca9a18 bp 0x7ffcba1dc7c0 sp 0x7ffcba1dc6d0 T0)
==1547789==The signal is caused by a READ memory access.
==1547789==Hint: address points to the zero page.
SCARINESS: 10 (null-deref)
#0 0x7fd616ca9a18 in json_variant_strv ../src/shared/json.c:2190
#1 0x408332 in oci_args ../src/nspawn/nspawn-oci.c:173
#2 0x7fd616cc09ce in json_dispatch ../src/shared/json.c:4400
#3 0x40addf in oci_process ../src/nspawn/nspawn-oci.c:428
#4 0x7fd616cc09ce in json_dispatch ../src/shared/json.c:4400
#5 0x41fef5 in oci_load ../src/nspawn/nspawn-oci.c:2187
#6 0x4061e4 in LLVMFuzzerTestOneInput ../src/nspawn/fuzz-nspawn-oci.c:23
#7 0x40691c in main ../src/fuzz/fuzz-main.c:50
#8 0x7fd61564a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)
#9 0x7fd61564a5c8 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x275c8)
#10 0x405da4 in _start (/home/fsumsal/repos/@systemd/systemd/build-san/fuzz-nspawn-oci+0x405da4)
DEDUP_TOKEN: json_variant_strv--oci_args--json_dispatch
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ../src/shared/json.c:2190 in json_variant_strv
==1547789==ABORTING
Or with an empty string in an array:
../src/shared/json.c:2202:39: runtime error: member access within misaligned address 0x000000000007 for type 'struct JsonVariant', which requires 8 byte alignment
0x000000000007: note: pointer points here
<memory cannot be printed>
#0 0x7f35f4ca9bcf in json_variant_strv ../src/shared/json.c:2202
#1 0x408332 in oci_args ../src/nspawn/nspawn-oci.c:173
#2 0x7f35f4cc09ce in json_dispatch ../src/shared/json.c:4400
#3 0x40addf in oci_process ../src/nspawn/nspawn-oci.c:428
#4 0x7f35f4cc09ce in json_dispatch ../src/shared/json.c:4400
#5 0x41fef5 in oci_load ../src/nspawn/nspawn-oci.c:2187
#6 0x4061e4 in LLVMFuzzerTestOneInput ../src/nspawn/fuzz-nspawn-oci.c:23
#7 0x40691c in main ../src/fuzz/fuzz-main.c:50
#8 0x7f35f364a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)
#9 0x7f35f364a5c8 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x275c8)
#10 0x405da4 in _start (/home/fsumsal/repos/@systemd/systemd/build-san/fuzz-nspawn-oci+0x405da4)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/shared/json.c:2202:39 in
Note: this happens only if json_variant_copy() in json_variant_set_source() fails.
pid1: when taking possession of passed fds check O_CLOEXEC state first
So here's the thing. One library we use (libselinux) is opening fds
behind our back when we initialize it and keeps it open. On the other
hand we want to automatically pick up all fds passed in to us, so that
we can distribute them to our services and close the rest. We pick them
up very early in our code, to ensure that we don't get confused by open
fds at that point. Except that libselinux insists on being initialized
even earlier. So suddenly we might take possession of libselinux' fds,
and then close them later when we decide no service wants them. Then
during shutdown we close down selinux and selinux closes its fds, but
since already closed long ago this ight close our fds instead. Hilarity
ensues.
I wish low-level software wouldn't do such things behind our back, but
well, let's make the best of it.
This changes the fd pick-up logic to only pick up fds that have
O_CLOEXEC unset. O_CLOEXEC must be unset for any fds passed in to us
over execve() after all. And for all our own fds we should set O_CLOEXEC
since we generally don't want to litter fd tables for execve(). Also,
libselinux thankfully appears to set O_CLOEXEC correctly on its fds,
hence the filter works.
log: propagate max log level into glibc's setlogmask()
Follow-up for: #27734
It makes sense to propagate the select log level we maintain also into
glibc, so that any code that uses syslog() directly that ends up in our
processes (libraries and such) are affected by our settings the same way
as we are ourselves.