* f9fe17dbde Use vmlinux.h from kernel-devel
* 9cbad936a6 Pull in openssl-devel-engine
* 8ae009f929 Only add Requires on python3-zstd on Fedora
* 750e910c7c Drop BuildRequires on python3-zstd
meson: Define __TARGET_ARCH macros required by bpf
These are required by the bpf_tracing.h header in libbpf, see
https://github.com/libbpf/libbpf/blob/master/src/bpf_tracing.h.
bpf_tracing.h does have a few fallbacks in case __TARGET_ARCH_XXX
is not defined but recommends using the __TARGET_ARCH macros instead
so let's do that.
coredump: correctly take tmpfs size into account for compression
We calculate the amount of uncompressed data we can write by taking the limits
into account and halving it to ensure there's room for switching to compression
on the fly when storing cores on a tmpfs (eg: due read-only rootfs).
But the logic is flawed, as taking into account the size of the tmpfs storage
was applied after the halving, so in practice when an uncompressed core file
was larger than the tmpfs, we fill it and then fail.
Rearrange the logic so that the halving is done after taking into account
the tmpfs size.
These functions after all write EFI UTF-16 strings, i.e. are relatively
high-level, hence give them a specific name indicating the type, to
match our other helpers that have similar type suffixes.
Daniel Rusek [Thu, 6 Jun 2024 21:44:38 +0000 (23:44 +0200)]
test: split the resolved test suite into separate test cases
Although being far from ideal and the first two test cases have to be run
before the setup phase otherwise they will fail, it still makes the test
suite look much better and easier to read
This addresses an issued introduced by 278e815bfa3e4c2e3914e00121c37fc844cb2025, which dropped the a dependency
from user@.service systemd-user-sessions.service without replacement.
While dropping that dependency does make sense, it should have been
replaced with the weaker dependency on systemd-logind.service, hence fix
that now.
user@.service is after all a logind concept, hence logind really should
be around for its lifetime.
systemd-user-sessions.service is a later milestone that only really
should apply to regular users (not root), hence it's too strong a
requirement.
Daan De Meyer [Fri, 28 Jun 2024 18:12:51 +0000 (20:12 +0200)]
Use read_full_file_full() in read_smbios11_field()
read_virtual_file() will only read up to page size bytes of data
from /sys/firmware/dmi/entries/.../raw so let's use read_full_file_full()
instead to make sure we read all data.
This should be safe since smbios11 data can be considered immutable
during the lifetime of the system.
Various of our tools operate on block devices, and it's not always
obvious to know which block devices are actually appropriate for use.
Hence, let's add a helper that allows to list block devices, and
supports some limited filtering.
mountpoint-util: use the FID stuff for detecting the root of mounts
In the unlikely event that sandboxes block statx() but let
name_to_handle_at() through it's a good way to determine the root inode
of the namespace, since its parent inode will have the same FID and
mnt_id.
mountpoint-util: add new helper name_to_handle_at_try_fid()
Newer kernels support a new flag for name_to_handle_at(): AT_HANDLE_FID.
This flag is supposed to return an identifier for an inode that we can
use for checking inode identity. It's supposed to be a replacement for
checking .st_ino which doesn't work anymore today because inode numbers
are no longer unique on file systems (not on overlayfs, and not on btrfs
for example). Hence, be a good citizen and add infrastructure to support
AT_HANDLE_FID. Unfortunately that doesn't work for old kernels, hence
add a fallback logic: if we can use the flag, use it. If we cannot use
name_to_handle_at() without it, which might give us a good ID too. But
of course tha tcan fail as well, which callers have to check.
rhellstrom [Thu, 27 Jun 2024 08:00:00 +0000 (11:00 +0300)]
Conditional PSI check to reflect changes done in 5.13
cpu.pressure 'full' is undefined for system-wide checks since 5.13 but still reported with values set to 0 for backwards compatibility. Made changes to reflect this for system-wide checks so that the conditional comparison is not made against the 0 value and instead fall back to 'some'.
Luca Boccassi [Sat, 29 Jun 2024 17:31:23 +0000 (18:31 +0100)]
core: try again bind mounting if the destination was already created
If the destination mount point is on a shared filesystem and is
missing on the first attempt, we try to create it, but then
fail with -EEXIST if something else created it in the meanwhile.
Enter the retry logic on EEXIST, as we can just use the mount
point if it was already created.
Daan De Meyer [Sat, 29 Jun 2024 13:27:02 +0000 (15:27 +0200)]
mkfs-util: Set sector size for btrfs as well
btrfs used to default the sector size to the page size and didn't
support anything else. Since 6.7, it defaults to 4K and using 4K
makes the filesystem compatible with all page sizes. So let's make
sure we use minimum 4K as well (lower causes failures on systems with
a 4K page size) but still allow larger sector sizes if specified by
the user.
Mike Yuan [Fri, 28 Jun 2024 13:32:33 +0000 (15:32 +0200)]
core/unit: follow merged units before updating SourcePath= timestamp too
Currently, we only follow merged units for unit_load_dropin() call.
But if the unit is an alias, we should always perform operations
on the "canonical" unit.
nscd is known to be racy [1] and it was already deprecated and later dropped in
Fedora a while back [1,2]. We don't need to support obsolete stuff in systemd,
and the cache in systemd-resolved provides a better solution anyway.
Daan De Meyer [Fri, 28 Jun 2024 14:22:15 +0000 (16:22 +0200)]
TEST-54-CREDS: Use UEFI firmware if available
On aarch64, SMBIOS is only available when using UEFI, so let's make
sure that the creds test uses UEFI when available so that it can
read creds from SMBIOS when running in a virtual machine.
Daan De Meyer [Fri, 28 Jun 2024 14:21:51 +0000 (16:21 +0200)]
TEST-18-FAILUREACTION: Set auto firmware
This test runs in nspawn by default but will still run in qemu when
tests are run unprivileged so make sure we use UEFI if available to
avoid hangs when using the linux firmware.
Daan De Meyer [Fri, 28 Jun 2024 14:19:38 +0000 (16:19 +0200)]
TEST-09-REBOOT: Set auto firmware
This test runs in nspawn by default but will still run in qemu when
tests are run unprivileged so make sure we use UEFI if available to
avoid hangs when using the linux firmware.
Daan De Meyer [Fri, 28 Jun 2024 13:28:16 +0000 (15:28 +0200)]
TEST-70-TPM2: Use UEFI firmware if available
On x86 this doesn't matter but on aarch64 we need to make sure UEFI
is used so that /sys/kernel/security/tpm0/binary_bios_measurements
is there which is required for TEST-70-TPM2.
Mike Yuan [Wed, 5 Jun 2024 18:45:12 +0000 (20:45 +0200)]
core: do not set up cgroup runtime on coldplug
Currently, unit_setup_cgroup_runtime() is called in
various _coldplug() functions if the unit is not inactive.
That seems unnecessary though, and kinda defeats the purpose
of CGroupRuntime. If we need to fork off a process for the unit
or got something during deserialization, the CGroupRuntime
would be automatically set up by unit_prepare_exec() /
cgroup_runtime_deserialize_one(). Otherwise it would mean
the cgroup doesn't exist and we don't need to allocate
that in the first place.
Plus, note that socket units might also carry a cgroup with
ExecStartPre=/ExecStartPost=/... Hence the existing code
is really inconsistent.
Mike Yuan [Sat, 1 Jun 2024 23:50:09 +0000 (07:50 +0800)]
core: unify reset_accounting handling
Since the introduction of CGroupRuntime, there's no need
to call *_reset_accounting in unit_new(), hence make those
static. While at it, refrain from hardcoding default values
in cgroup_runtime_new(), but call the corresponding funcs.
This also corrects the default value of io_accounting_base.
Mike Yuan [Wed, 19 Jun 2024 19:07:07 +0000 (21:07 +0200)]
core/cgroup: call bpf_firewall_close in cgroup_runtime_free
No functional change, just deduplicate default values
in cgroup_runtime_free() and remove pointless call in
unit_free() (at the time it's called the CGRuntime has
been destroyed already).
It turns out OverlayFS doesn't handle gracefully when the same source is
specified multiple times in lowerdir= and it fails with ELOOP:
Failed to mount overlay (type overlay) on /run/systemd/mount-rootfs/opt (MS_RDONLY "lowerdir=/run/systemd/unit-extensions/1/opt:/run/systemd/unit-extensions/0/opt:/run/systemd/mount-rootfs/opt"): Too many levels of symbolic links
This happens even if we mount each image in a different internal mount
path, as OverlayFS will resolve it and look for the backing device, which
will be the same device mapper entity, and return a hard error.
This error does not appear if dm-verity is not used, so it is very
confusing for users, and unnecessary.
When mounting ExtensionImages, check if an image is dm-veritied,
and drop duplicates if the root hashes match, to avoid this user-unfriendly
hard error.
James Coglan [Fri, 28 Jun 2024 12:58:22 +0000 (13:58 +0100)]
resolved: correct parsing of OPT extended RCODEs
The DNS_PACKET_RCODE() function works out the full RCODE by taking the
first octet from the OPT record TTL field and bitwise-OR-ing this with
the basic RCODE from the packet header. This results in RCODE values
being lower than they should be.
For example, if the first TTL octet is 0x7a and the basic RCODE is 3,
this function currently returns `0x7a | 3` = 123, rather than 0x7a3 =
1955.
The first TTL octet is supposed to form the upper 8 bits of a 12-bit
value, whereas the current implementation constraints the value to 8
bits and results in mis-interpreted RCODEs.
This fixes things by shifting the TTL 20 places instead of 24 and
masking off the low nibble that comes from the upper bits of the version
octet.
Note that dns_packet_append_opt() correctly converts the input RCODE
into the high octet of the OPT TTL field; this problem only affects
parsing of incoming packets.
James Coglan [Fri, 28 Jun 2024 12:41:31 +0000 (13:41 +0100)]
resolved: allow the full TTL to be used by OPT records
Whereas RFC 1035 says the TTL field takes the "positive values of a
signed 32 bit number", and RFC 2181 says "Implementations should treat
TTL values received with the most significant bit set as if the entire
value received was zero,", the dns_packet_read_rr() function sets
rr->ttl to zero if the MSB is set.
However, EDNS(0) as specified in RFC 6891 repurposes the TTL field's 4
octets to store other information, c.f.:
+0 (MSB) +1 (LSB)
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
0: | EXTENDED-RCODE | VERSION |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
2: | DO| Z |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The first octet extends the usual 4-bit RCODE from the packet header by
providing an additional 8 bits of space, extending the RCODE to 12 bits.
But, our handling of the TTL field means that the high bit in the first
octet is not actually usable, since setting it will mean these 4 octets
are replaced with 0. This may have the effect of making us believe a
server does not support DNSSEC when it actually set the DO bit in its
OPT record.
Here we change things so that the TTL is only set to zero for record
types other than OPT.
LICENSES/README: expand text to summarize state for binaries and libs
We would say how *sources* are licensed, but actually most user care about the
resulting binaries. So say how the *binaries* are licensed. I used the word
"effectively" because the permissive licenses don't set any requirements on the
binaries, so the license of sources is a complex mix, but the resulting
binaries have a simple effective license.
Also, make it clear that the GPLv2 license applies to udev programs, but not
the shared library. Based on private correspondence, there's some confusion
about this.