meson: add option to build systemd-executor "statically"
The new link-executor-shared option is similar to the existing
link-udev-shared: when set to false, we link to the static versions of our
internal libraries.
The resulting exuctor binary is fairly large, about as large as libsystemd-core
(14 MB without lto, 8 with lto).
This is intended as a workaround for the fuckup with the pinned executor
binary:
when an upgrade is performed, the package manager will install new version of
the libraries and new version of the code, and some time later reexecute the
managers. This creates a window when the pinned executor binary will fail to
execute. There are two factors which make the issue easier to hit:
- when the distribution uses a finely-grained shared-lib-tag. E.g. Fedora
uses version-release as the tag, which means that the issue occurs on
every package upgrade. This is the right thing to do, because the
ABI of our internal libraries is not stable at all, so replacing the
library from a different version in place creates a window where our
programs may crash or misbehave.
- when the distribution doesn't immediately reexec all the managers after
upgrade. In early versions of systemd, we used to hammer the machine during
upgrade, doing daemon-reexecs repeatedly. This works, but is ugly and
wasteful. Doing the reexecs while the upgrade is in progres also creates a
window where a mix of old and new configs or both is loaded. Users are
particularly annoyed by those reloads if there is some issue in the
configuration causing us to emit warnings on every reexec. Doing the
reexecs once after the new configuration and libraries have been put
in place is nicer.
The pinning of the executor binary breaks upgrades and in particular
it penalizes the distributions which make use of the features which
were previously added to avoid bugs and inefficiency during upgrades.
When the executor is linked statically, there is a smaller chance that it'll
fail to load libraries. The issue can still occur because other libraries, not
our own, are linked dynamically.
meson: build libsystemd-core via an intermediate static library
By itself, this is not useful. I'm making this a separate commit to
make debugging easier. It turns out that meson does static libraries
using references, so the "static library" a tiny stub stub that refers
to the object files on disk and this has negligible cost:
$ ls -lhd build/src/core/libsystemd-core-257.{a,so}
-rw-r--r-- 1 zbyszek zbyszek 36K Jul 3 16:54 build/src/core/libsystemd-core-257.a
-rwxr-xr-x 1 zbyszek zbyszek 6.1M Jul 3 16:54 build/src/core/libsystemd-core-257.so
Our variables for internal libraries are named 'libfoo' for the shared lib
variant, and 'libfoo_static' for the static lib variant. The only exception was
libbasic, because we didn't have a shared variant for it. But let's rename it
for consitency. This makes the build config easier to understand.
Anton Golubev [Wed, 3 Jul 2024 07:42:24 +0000 (10:42 +0300)]
hwdb: Add some HP IR cameras
Two very similar devices, with two functions - a regular camera and IR.
The peculiarity of their infrared camera is that it uses a color image
format (YUYV), although it is essentially black and white.
The IR camera interface differs from the regular camera interface by name:
"HP Wide Vision FHD Camera: HP W" for the regular camera and
"HP Wide Vision FHD Camera: HP I" for an infrared camera
Therefore, glob *I is used to separate the IR camera
* f9fe17dbde Use vmlinux.h from kernel-devel
* 9cbad936a6 Pull in openssl-devel-engine
* 8ae009f929 Only add Requires on python3-zstd on Fedora
* 750e910c7c Drop BuildRequires on python3-zstd
meson: Define __TARGET_ARCH macros required by bpf
These are required by the bpf_tracing.h header in libbpf, see
https://github.com/libbpf/libbpf/blob/master/src/bpf_tracing.h.
bpf_tracing.h does have a few fallbacks in case __TARGET_ARCH_XXX
is not defined but recommends using the __TARGET_ARCH macros instead
so let's do that.
coredump: correctly take tmpfs size into account for compression
We calculate the amount of uncompressed data we can write by taking the limits
into account and halving it to ensure there's room for switching to compression
on the fly when storing cores on a tmpfs (eg: due read-only rootfs).
But the logic is flawed, as taking into account the size of the tmpfs storage
was applied after the halving, so in practice when an uncompressed core file
was larger than the tmpfs, we fill it and then fail.
Rearrange the logic so that the halving is done after taking into account
the tmpfs size.
These functions after all write EFI UTF-16 strings, i.e. are relatively
high-level, hence give them a specific name indicating the type, to
match our other helpers that have similar type suffixes.
Daniel Rusek [Thu, 6 Jun 2024 21:44:38 +0000 (23:44 +0200)]
test: split the resolved test suite into separate test cases
Although being far from ideal and the first two test cases have to be run
before the setup phase otherwise they will fail, it still makes the test
suite look much better and easier to read
This addresses an issued introduced by 278e815bfa3e4c2e3914e00121c37fc844cb2025, which dropped the a dependency
from user@.service systemd-user-sessions.service without replacement.
While dropping that dependency does make sense, it should have been
replaced with the weaker dependency on systemd-logind.service, hence fix
that now.
user@.service is after all a logind concept, hence logind really should
be around for its lifetime.
systemd-user-sessions.service is a later milestone that only really
should apply to regular users (not root), hence it's too strong a
requirement.
Daan De Meyer [Fri, 28 Jun 2024 18:12:51 +0000 (20:12 +0200)]
Use read_full_file_full() in read_smbios11_field()
read_virtual_file() will only read up to page size bytes of data
from /sys/firmware/dmi/entries/.../raw so let's use read_full_file_full()
instead to make sure we read all data.
This should be safe since smbios11 data can be considered immutable
during the lifetime of the system.
Various of our tools operate on block devices, and it's not always
obvious to know which block devices are actually appropriate for use.
Hence, let's add a helper that allows to list block devices, and
supports some limited filtering.
mountpoint-util: use the FID stuff for detecting the root of mounts
In the unlikely event that sandboxes block statx() but let
name_to_handle_at() through it's a good way to determine the root inode
of the namespace, since its parent inode will have the same FID and
mnt_id.
mountpoint-util: add new helper name_to_handle_at_try_fid()
Newer kernels support a new flag for name_to_handle_at(): AT_HANDLE_FID.
This flag is supposed to return an identifier for an inode that we can
use for checking inode identity. It's supposed to be a replacement for
checking .st_ino which doesn't work anymore today because inode numbers
are no longer unique on file systems (not on overlayfs, and not on btrfs
for example). Hence, be a good citizen and add infrastructure to support
AT_HANDLE_FID. Unfortunately that doesn't work for old kernels, hence
add a fallback logic: if we can use the flag, use it. If we cannot use
name_to_handle_at() without it, which might give us a good ID too. But
of course tha tcan fail as well, which callers have to check.
rhellstrom [Thu, 27 Jun 2024 08:00:00 +0000 (11:00 +0300)]
Conditional PSI check to reflect changes done in 5.13
cpu.pressure 'full' is undefined for system-wide checks since 5.13 but still reported with values set to 0 for backwards compatibility. Made changes to reflect this for system-wide checks so that the conditional comparison is not made against the 0 value and instead fall back to 'some'.
Luca Boccassi [Sat, 29 Jun 2024 17:31:23 +0000 (18:31 +0100)]
core: try again bind mounting if the destination was already created
If the destination mount point is on a shared filesystem and is
missing on the first attempt, we try to create it, but then
fail with -EEXIST if something else created it in the meanwhile.
Enter the retry logic on EEXIST, as we can just use the mount
point if it was already created.
Daan De Meyer [Sat, 29 Jun 2024 13:27:02 +0000 (15:27 +0200)]
mkfs-util: Set sector size for btrfs as well
btrfs used to default the sector size to the page size and didn't
support anything else. Since 6.7, it defaults to 4K and using 4K
makes the filesystem compatible with all page sizes. So let's make
sure we use minimum 4K as well (lower causes failures on systems with
a 4K page size) but still allow larger sector sizes if specified by
the user.
Mike Yuan [Fri, 28 Jun 2024 13:32:33 +0000 (15:32 +0200)]
core/unit: follow merged units before updating SourcePath= timestamp too
Currently, we only follow merged units for unit_load_dropin() call.
But if the unit is an alias, we should always perform operations
on the "canonical" unit.
nscd is known to be racy [1] and it was already deprecated and later dropped in
Fedora a while back [1,2]. We don't need to support obsolete stuff in systemd,
and the cache in systemd-resolved provides a better solution anyway.
Daan De Meyer [Fri, 28 Jun 2024 14:22:15 +0000 (16:22 +0200)]
TEST-54-CREDS: Use UEFI firmware if available
On aarch64, SMBIOS is only available when using UEFI, so let's make
sure that the creds test uses UEFI when available so that it can
read creds from SMBIOS when running in a virtual machine.
Daan De Meyer [Fri, 28 Jun 2024 14:21:51 +0000 (16:21 +0200)]
TEST-18-FAILUREACTION: Set auto firmware
This test runs in nspawn by default but will still run in qemu when
tests are run unprivileged so make sure we use UEFI if available to
avoid hangs when using the linux firmware.
Daan De Meyer [Fri, 28 Jun 2024 14:19:38 +0000 (16:19 +0200)]
TEST-09-REBOOT: Set auto firmware
This test runs in nspawn by default but will still run in qemu when
tests are run unprivileged so make sure we use UEFI if available to
avoid hangs when using the linux firmware.
Daan De Meyer [Fri, 28 Jun 2024 13:28:16 +0000 (15:28 +0200)]
TEST-70-TPM2: Use UEFI firmware if available
On x86 this doesn't matter but on aarch64 we need to make sure UEFI
is used so that /sys/kernel/security/tpm0/binary_bios_measurements
is there which is required for TEST-70-TPM2.
Mike Yuan [Wed, 5 Jun 2024 18:45:12 +0000 (20:45 +0200)]
core: do not set up cgroup runtime on coldplug
Currently, unit_setup_cgroup_runtime() is called in
various _coldplug() functions if the unit is not inactive.
That seems unnecessary though, and kinda defeats the purpose
of CGroupRuntime. If we need to fork off a process for the unit
or got something during deserialization, the CGroupRuntime
would be automatically set up by unit_prepare_exec() /
cgroup_runtime_deserialize_one(). Otherwise it would mean
the cgroup doesn't exist and we don't need to allocate
that in the first place.
Plus, note that socket units might also carry a cgroup with
ExecStartPre=/ExecStartPost=/... Hence the existing code
is really inconsistent.
Mike Yuan [Sat, 1 Jun 2024 23:50:09 +0000 (07:50 +0800)]
core: unify reset_accounting handling
Since the introduction of CGroupRuntime, there's no need
to call *_reset_accounting in unit_new(), hence make those
static. While at it, refrain from hardcoding default values
in cgroup_runtime_new(), but call the corresponding funcs.
This also corrects the default value of io_accounting_base.
Mike Yuan [Wed, 19 Jun 2024 19:07:07 +0000 (21:07 +0200)]
core/cgroup: call bpf_firewall_close in cgroup_runtime_free
No functional change, just deduplicate default values
in cgroup_runtime_free() and remove pointless call in
unit_free() (at the time it's called the CGRuntime has
been destroyed already).
It turns out OverlayFS doesn't handle gracefully when the same source is
specified multiple times in lowerdir= and it fails with ELOOP:
Failed to mount overlay (type overlay) on /run/systemd/mount-rootfs/opt (MS_RDONLY "lowerdir=/run/systemd/unit-extensions/1/opt:/run/systemd/unit-extensions/0/opt:/run/systemd/mount-rootfs/opt"): Too many levels of symbolic links
This happens even if we mount each image in a different internal mount
path, as OverlayFS will resolve it and look for the backing device, which
will be the same device mapper entity, and return a hard error.
This error does not appear if dm-verity is not used, so it is very
confusing for users, and unnecessary.
When mounting ExtensionImages, check if an image is dm-veritied,
and drop duplicates if the root hashes match, to avoid this user-unfriendly
hard error.
James Coglan [Fri, 28 Jun 2024 12:58:22 +0000 (13:58 +0100)]
resolved: correct parsing of OPT extended RCODEs
The DNS_PACKET_RCODE() function works out the full RCODE by taking the
first octet from the OPT record TTL field and bitwise-OR-ing this with
the basic RCODE from the packet header. This results in RCODE values
being lower than they should be.
For example, if the first TTL octet is 0x7a and the basic RCODE is 3,
this function currently returns `0x7a | 3` = 123, rather than 0x7a3 =
1955.
The first TTL octet is supposed to form the upper 8 bits of a 12-bit
value, whereas the current implementation constraints the value to 8
bits and results in mis-interpreted RCODEs.
This fixes things by shifting the TTL 20 places instead of 24 and
masking off the low nibble that comes from the upper bits of the version
octet.
Note that dns_packet_append_opt() correctly converts the input RCODE
into the high octet of the OPT TTL field; this problem only affects
parsing of incoming packets.