Daan De Meyer [Tue, 1 Oct 2024 12:30:15 +0000 (14:30 +0200)]
ukify: Rework multi-profile UKIs
The API introduced in https://github.com/systemd/systemd/pull/34295
is less than ideal:
- It doesn't consider signing at all (ukify can't sign separately yet)
- Measurement is completely broken (all profile sections are marked to
not be measured)
- It focuses on a very niche use case of extending existing UKIs and makes
the more common use case of building a UKI with several profiles included
much harder than needed.
Let's instead rework the API to focus on the primary use case of building
a UKI with multiple profiles added to it immediately. We require the profiles
to be built upfront as separate PE binaries with UKI. There's no need to sign
or measure these, they're solely vehicles for profile sections. This saves us
from having to complicate the command line and config parsing to support defining
multiple profiles.
To add the profiles when building a UKI, we introduce the new --add-profile
switch which takes a path to a PE binary describing a profile. The required
sections are read from each PE binary, measured and added as a profile.
The integration test is disabled until the new API is merged and exposed in
mkosi so that building a UKI with profiles can be left to mkosi and the integration
test will only test the switching between profiles and not the building of UKIs
with profiles.
Michael Ferrari [Thu, 3 Oct 2024 12:02:12 +0000 (14:02 +0200)]
repart: open target devices before UUID creation
This is to ensure that the UUIDs from the CopyBlocks= devices are copied
to the corresponding new partition instead of creating a new UUID for
it. With this verity partitions can be copied, keeping their UUIDs to
ensure that they still match up with what is specified in roothash=.
man/systemd-stub: reword descriptions of .dtb and .profile sections
- The text was clearly edited in variuos places to e.g. allow multiple
sections, so it first said that sections are singletons, and immediately
after that that some section are not.
- Replace "regardless of the kernel" with "regardless of the kernel version".
The kernel is very much involved e.g. in loading of the initrds.
- Various other small rewordings to make the text more legible.
Peter Hutterer [Tue, 12 Apr 2022 04:48:04 +0000 (14:48 +1000)]
logind: add support for hidraw devices
Add support for opening /dev/hidraw devices via logind's TakeDevice().
Same semantics as our support for evdev devices, but it requires the
HIDIOCREVOKE ioctl in the kernel.
IPE is a new LSM being introduced in 6.12. Like IMA, it works based on a
policy file that has to be loaded at boot, the earlier the better. So
like IMA, if such a policy is present, load it and activate it.
If there are any .p7b files in /etc/ipe/, load them as policies.
The files have to be inline signed in DER format as per IPE documentation.
Lukas Nykryn [Tue, 1 Oct 2024 09:30:18 +0000 (11:30 +0200)]
man: using WantedBy=default.target is not a good idea
We had several users, that wrote their unit files with
WantedBy=default.target because it should be started "every time".
But for example in Fedora/CentOS/RHEL, this often breaks for
example selinux relabels (where we just want to do a relabel and reboot).
Daan De Meyer [Wed, 2 Oct 2024 09:27:55 +0000 (11:27 +0200)]
mkosi: Stop installing bpftrace
bpftrace nudges the Fedora Rawhide images towards compiler-rt18 while the
sanitizer builds pull in clang19, leading to the sanitizer libraries
not being found at runtime. Let's drop bpftrace for now so that compiler-rt19
is pulled in in the main image.
Daan De Meyer [Wed, 2 Oct 2024 09:27:09 +0000 (11:27 +0200)]
mkosi: Pass ASAN_OPTIONS to subimages
systemd built with sanitizers is installed in subimages and tools
might get invoked in postinstall scripts so we have to disable ASAN
in the subimages as well during the image build.
man: soft deprecate use of ";" for separating multiple command lines in ExecStart=
So far we supported this syntax:
ExecStart=foo ; bar
as equivalent to:
ExecStart=foo
ExecStart=bar
With this change we'll "soft" deprecate the first syntax. i.e. it's
still supported in code, but not documented anymore.
The concept was originally added to make things easier for 3rd party
.ini readers, as it allowed writing unit files with a .ini framework
that doesn't allow multiple assignments for the same key. But frankly,
this is kinda pointless, as so many other of our knobs require the
double assignment.
Hence, let's just stop advertising the concept, let's simplify the docs,
by removing one entirely redundant feature from it.
sd-varlink: mark functions that can take 'more' flag in IDL structures with an explicit flag
Let's mark functions that accept the 'more' flag explicitly for that,
and validate for this explicitly.
This is preparation for
https://github.com/varlink/varlink.github.io/issues/26, if we get that
one day. Let's make sure that from day #1 we have this info available
even if we don't generate this in the IDL for now.
Also enables the two flags for all interfaces we export that use the
logic.
sd-varlink-idl: add some room for flags everywhere
Given this is supposed to be a public API now, let's add some concept
for extensions of these open-coded structures: let's make sure we have
flags fields on all structures (which we can use for extensions later).
Right now we only have this for varlink "fields" structures, this adds
the same for "symbols" and the "interface" as a whole.
There are no actual flags defined in either for now, this is just
future-safety preparation.
(But a later commit will add two flags to symbols)
tree-wide: always do dlopen() with RTLD_NOW + RTLD_NODELETE
Let's systematically use RTL_NOW|RLTD_NODELETE as flags passed to
dlopen(), across our codebase.
Various distros build with "-z now" anyway, hence it's weird to specify
RTLD_LAZY trying to override that (which it doesn't). Hence, let's
follow suit, and just do what everybody else does.
Also set RTLD_NODELETE, which is apparently what distros will probably
end up implying sooner or later anyway. Given that for pretty much all
our dlopen() calls we never call dlclose() anyway, let's just set this
everywhere too, to make things systematic.
This way, the flags we use by default match what distros such as fedora
do, there are no surprises, and read-only relocations can be a thing.
Helmut Grohne [Mon, 30 Sep 2024 15:56:18 +0000 (17:56 +0200)]
bpf: fix cross build failure on Debian
For compiling bpf code, the system include directory needs to be
constructed. On Debian-like systems, this requires passing a multiarch
directory. Since clang's -dump-machine prints something other that the
multiarch triplet, gcc was interrogated earlier, but that also yields a
wrong result for cross compilation and was thus skipped resulting in
clang not finding asm/types.h.
Rather than, -dump-machine we should ask for -print-multiarch (which
rarely differs). Whenever gcc is in use, this is right (even for cross
building). Since clang does not support -print-multiarch and its
-dump-machine never matches Debian's multiarch, we resort to asking gcc
when building natively. For cross builds using clang, we are out of
luck.
Add %posttrans versions of the systemd %postun scriptlets
On upgrades, only the %postun scriptlets of the old package version
run. This means that any changes related to restarting daemons require
two releases before they're actually used.
%postun is used because it runs after the old package has been removed,
which is important as it means any lingering dropins from the old package
will have been removed as well.
To allow deploying fixes in just a single release while still running after
the old package has been removed, let's introduce %posttrans versions of these
scriptlets as %posttrans of the new package runs on upgrade and install after
the old package has been removed.
This is the same as json_dispatch_user_group_name() but fills in the
string as "const char*" to the JSON field. Or in other words, it's what
sd_json_dispatch_const_string() is to sd_json_dispatch_string().
Note this drops the SD_JSON_STRICT flags from various dispatch tables
for these fields, and replaces this by SD_JSON_RELAX, i.e. the opposite
behaviour. As #34558 correctly suggests we should validate user names
in lookup functions using the lax rules, rather than the strict ones,
since clients not knowing the rules might ask us for arbitrary
resolution.
(SD_JSON_RELAX internally translates to valid_user_group_name() with the
VALID_USER_RELAX flag).
man: drop /var/spool/ mention from file-hierarchy(7) man page
Today it seems this is mostly used by mail and printer servers, and it's
not clear to me at all what the property is that makes
/var/spool/<package> the better place for the relevant data than
/var/lib/<package>.
Hence, in the interest of shortening the spec, let's not mention the dir
anymore. In particular as the dir really isn't used by us much, for
example we do not have a counterpart for RuntimeDirectory=,
StateDirectory=, … that would cover the spool.
Since most systems these days we care about probably come *without* a
printer or mail server, let's maybe no mention this in the man page that
is supposed to discuss the rough skeleton how things are set up. After
all, people are supposed to exend the skeleton with their stuff, and
this sounds more like a case for an extension of the skeleton instead of
being considered part of the skeleton itself.
man: drop mention of /usr/include/ from file-hierarchy(7) man page
The man page is supposed to provide a "generalized, though minimal and
modernized subset" (as per introductory pargapraghs), from a systemd
perspective. But the thing is that /usr/include/ really doesn't matter
to us. It's a development thing, and slightly weird (because it arguably
would be better places in /usr/share/include/ or so). It's not going to
be there on 95% of deployed systems, and we really don't want people to
bother with it on such systems.
We only define the skeleton of directories in this document, and it's
expected that people extend it, and I think this really should be one of
those dirs that is an extension of our skeleton, but not part of the
skeleton, if that makes any sense.
Now that we properly leave sufficient space for inline execution of
the .linux section, let's remove the special casing of the .linux
section as it doesn't need to be the last section anymore now.
ukify: Use SizeOfImage from linux image as virtual size of .linux section
The SizeOfImage is bigger than the image itself so that space is
guaranteed to be available for in place execution of the linux image. Let's
make sure we take this into account and use SizeOfImage as the section's virtual
size instead of the size of the image itself.
tpm2-util: show loaded libraries in 'systemd-analyze has-tpm2'
After 3b16e9f41983f697bc38c40bb8e7119c1bb4f7c8, even the libraries are
documented in the man page, it is useful to mention which libraries are
checked in the command output.
Of course, the dependencies are kind of implementation detail, and may
be changed in the future version, but that's especially why I think
showing the library deps in the output is useful.
systemd-analyze is a debugging tool, and already shows many internal
states. I think there is nothing to prevent from showing the deps.
repart: Shortcut copy if source or target starts with exclude path
If the source or target we're copying to is a subdirectory of any of the
directories specified in ExcludeFiles= or ExcludeFilesTarget=, shortcut the
entire copy operation.
Every services and containers should be able to protect their users and
limit the impact of security bugs thanks to the security syscalls
provided by seccomp and Landlock. The goal of these syscalls is to
improve security with additional restrictions. They are designed to be
safely used by unprivileged (and then potentially malicious) users.
Remove the now-redundant "seccomp" entry for nspawn.
Somebody wrapped the text, but whitespace is preserved in <programlisting>, so
the output was mangled. It also doesn't make sense to run systemd-path as root
(as indicated by '#'), so drop that. Also, this chunk should be a separate
paragraph.
Ivan Kruglov [Fri, 20 Sep 2024 10:20:53 +0000 (12:20 +0200)]
machine: resolve race condition in TEST-13-NSPAWN.machinectl.sh
I encountered this race condition while working on TEST-13-NSPAWN.varlinkctl.sh.
The long-running machine's init script sometimes does not have time to start and
register signals. As result, occasiounally failed tests.
units: Order ldconfig after systemd-tmpfiles-setup.service
tmpfiles might be linking the configuration for ldconfig into /etc
so make sure it runs after it so that the configuration is guaranteed
to be in place.
repart: Determine verity sig size based on partition designator
Verity= is an image build concept, not a first boot concept, whereas
a partition designator is always available, so let's do the size stuff
based on that.
Ivan Shapovalov [Fri, 20 Sep 2024 11:01:51 +0000 (13:01 +0200)]
core/cgroup: cache IO accounting data when pruning a cgroup
When removing a cgroup in unit_prune_cgroup(), read IO metrics to cache
them similar to the existing treatment of the CPU and memory usage data.
Note that we do not do this for the IP metrics as the firewall objects
are only destroyed in unit_free() and thus stay alive long enough to
be read out directly by all interested parties.