Luca Boccassi [Tue, 3 Aug 2021 14:00:40 +0000 (15:00 +0100)]
tree-wide: voidify unchecked close_nointr calls
These have ignored the return value forever. Two are public APIs so
we can't really change what they return anyway, and the other one is
a cleanup path and the existing error code is more important.
In general we almost never hit those asserts in production code, so users see
them very rarely, if ever. But either way, we just need something that users
can pass to the developers.
We have quite a few of those asserts, and some have fairly nice messages, but
many are like "WTF?" or "???" or "unexpected something". The error that is
printed includes the file location, and function name. In almost all functions
there's at most one assert, so the function name alone is enough to identify
the failure for a developer. So we don't get much extra from the message, and
we might just as well drop them.
Dropping them makes our code a tiny bit smaller, and most importantly, improves
development experience by making it easy to insert such an assert in the code
without thinking how to phrase the argument.
When copying large directory trees it should be a better idea to sync
the whole fs once when we are done instead of individually for each
file, hence add COPY_SYNCFS.
As opposed to COPY_FSYNC/COPY_FSYNC_FULL this only really applies to the
top-level directory, after completion of the whole copy.
As a safety precaution it makes sense to fsync() files after copying
them, and maybe even the directories they are contained in. Let's add a
flag for these two cases.
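As an illustration only, the difference between syncing the file and also
syncing its containing directory can be sketched in Python. The helper name
copy_file_fsync() and its sync_dir parameter are hypothetical and are not
part of the copy logic described here; this is a minimal sketch of the idea,
not the actual implementation.

```python
import os
import shutil


def copy_file_fsync(src, dst, sync_dir=False):
    """Copy src to dst and fsync() the result (hypothetical helper)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file out

    if sync_dir:
        # Syncing the containing directory persists the new directory entry.
        dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling it with sync_dir=False roughly corresponds to syncing only the file,
and sync_dir=True to syncing the file and its directory.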
copy: tighten destination checks when copying files
Let's make sure we only operate on regular files when copying files.
Also, make sure to only copy file attributes over if the target is a regular
file (so that copying a file to /dev/null won't alter the access
mode/ownership of that device node...)
(This might not look like a big improvement, but will shortly, when we
add fsync() support to the copy logic, at which point there are more
error paths we can unify that way.)
While we are at it, tweak a clean-up path: only unlink a copied file if
we are definitely the ones who created it, i.e. if O_EXCL is set.
PR #20176 broke building of the cryptsetup token logic. This wasn't
noticed before the PR was merged, because the only CIs new enough to be
able to build the token logic (the Fedora Rawhide ones) didn't actually
run at all on the PR.
Let's add the missing hookup for the TPM2 PCR bank logic also to the
token module, to make the CI pass again.
Previously, we'd encode PCR policies strictly with the SHA256 PCR bank
set. However, as it appears, not all hw implements that bank. Sad.
Let's add some minimal logic to auto-detect supported PCR banks: if
SHA256 is supported, use that. But if not, automatically fall back to
SHA1.
This then changes both the LUKS code, and the credentials code to
serialize the selected bank, along with the rest of the data in order to
make this robust.
This extends the LUKS2 JSON metadata in a compatible way. The credentials
encryption format is modified in an incompatible way however, but given
that this is not part of any official release yet, that should be OK.
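The bank selection described above can be sketched in a few lines of Python.
The function name pick_pcr_bank() and the string bank names are illustrative
assumptions, and raising an error when neither bank is present is a choice
made for this sketch only; the commit message does not say what happens in
that case.

```python
def pick_pcr_bank(supported_banks):
    """Return the preferred PCR bank: SHA256 if available, else SHA1.

    supported_banks is any collection of lower-case bank names
    (a hypothetical representation used only for this sketch).
    """
    for bank in ("sha256", "sha1"):
        if bank in supported_banks:
            return bank
    # Assumption of this sketch: treat "neither bank present" as an error.
    raise LookupError("neither SHA256 nor SHA1 PCR bank is available")
```

The selected bank would then be stored alongside the rest of the data, so
that the same bank is used again later.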
Boot loaders are software like any other, and hence must be updated in
regular intervals. Let's add a simple (optional) service that updates
sd-boot automatically from the host if it is found installed but
out-of-date in the ESP.
Note that traditional distros probably should invoke "bootctl update"
directly from the package scripts whenever they update the sd-boot
package. This new service is primarily intended for image-based update
systems, i.e. where the rootfs or /usr are atomically updated in A/B
style and where the current boot loader should be synced into the ESP
from the currently booted image every now and then. It can also act as
a safety net if the packaging scripts in classic systems aren't doing the
bootctl update stuff themselves.
Since updating boot loaders might be a tiny bit risky (even though we try
really hard to make them robust, by fsck'ing the ESP and mounting it only on
demand, by doing updates mostly as single file updates and by fsync()ing
heavily) this is an optional feature, i.e. subject to "systemctl
enable". However, since it's the right thing to do I think, it's enabled
by default via the preset logic.
Note that the updating logic is implemented gracefully: i.e. it's a NOP
if the boot loader is already new enough, or was never installed.
bootctl: tweak "bootctl update" to be a NOP when boot loader is already current and --graceful is given
Previously, the "bootctl update" logic would refrain from downgrading a
boot loader, but if the boot loader that is installed already matched
the version we could install we'd install it anyway, under the
assumption this was effectively without effect. This behaviour was handy
while developing boot loaders, since installing a modified boot loader
didn't require a version bump.
However, outside of the systems of boot loader developers I don't think
this behaviour makes much sense: we should always emphasize doing
minimal changes to the ESP, hence when an update is supposedly not
necessary, then don't do it. Only update if it really makes sense, to
minimize writes to the ESP. Updating the boot loader is a good thing
after all, but doing so redundantly is not.
Also, downgrade the message about this to LOG_NOTICE, given this
shouldn't be a reason to log.
Finally, exit cleanly in these cases (or if another boot loader is
detected).
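The resulting decision rule can be sketched as follows. This is a minimal
illustration, not bootctl's code: the names parse_version() and
should_update() are hypothetical, and versions are assumed to be dotted
integers for the sake of the example.

```python
def parse_version(text):
    """Turn a dotted version string such as '249.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def should_update(installed, candidate):
    """Update only if the candidate is strictly newer than what is installed.

    An equal version is a no-op (no redundant writes to the ESP), and an
    older candidate is never installed (no downgrades).
    """
    return parse_version(candidate) > parse_version(installed)
```

With this rule, should_update("249", "249") is False, whereas the previous
behaviour would have reinstalled the same version.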
Let's share some code between import_url_last_component() and
import_url_change_last_component(), and make sure we never eat up the
hostname component of the URL when parsing out the last component.
Let's also make import_url_change_last_component() more generic so that
we can also use it to append components to paths, instead of replacing
suffixes.
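To illustrate the hostname concern, here is a minimal Python sketch of
extracting the last path component of a URL. It is not the C implementation
of import_url_last_component(); the function name last_component() is
hypothetical, and the sketch simply relies on urllib.parse.urlsplit() to keep
the host separate from the path.

```python
from urllib.parse import urlsplit


def last_component(url):
    """Return the last path component of url, or None if there is none.

    Only the path part is inspected, so the hostname can never be
    returned by mistake.
    """
    path = urlsplit(url).path.rstrip("/")
    if not path:
        return None  # e.g. "http://example.com" or "http://example.com/"
    return path.rsplit("/", 1)[-1]
```

A naive "everything after the last slash" approach would return the hostname
for a URL such as http://example.com, which is the case the commit guards
against.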
util: add one more helper for generating colored check mark glyphs
This one is useful for outputs with a slightly more "positive"
outlook, i.e. where only the checkmarks are shown but the crossmarks are
replaced by spaces.
(Usecase: a larger table with many checkmarks, where the red crossmarks
might just be too much negative noise)
Now that CONST_MAX() is a bit more forgiving, let's stick to the native
return type of sizeof() everywhere, which is size_t, instead of casting
to "unsigned", so that on the common archs we don't unnecessarily lose
the upper 32bits.
This checked for strict type compatibility so far, which meant CONST_MAX()
couldn't be used on two differently sized integers, even though
conceptually there's nothing wrong with allowing that here, as C
correctly picks the larger type in the ternary op.
Hence, let's explicitly whitelist integer comparisons here, as long as
the signedness matches.
manager: reexecute on SIGRTMIN+25, user instances only
Before this patch, there was no way to ask all running user instances to
reexecute. However, this can be useful, especially during package updates;
otherwise user instances are never updated and keep running a potentially very
old version of the binaries.
Now, assuming that we have enough privileges, it's possible to request
reexecution of all user instances:
Note that this request is obviously asynchronous as it relies on a
signal. Keeping "systemctl kill" as the only interface should be good enough to
make this obvious, and that's the reason why another interface, such as
"systemctl --global daemon-reexec", has not been considered.
PID1 already uses SIGTERM for reexecuting, hence sending it SIGRTMIN+25 is a
nop.
man/systemctl: rework descriptions of bind and mount-image
The text used "unit's view" to mean mount namespace. But we talk about
mount namespaces in the later part of the paragraph anyway, so trying to
use an "approachable term" only makes the whole thing harder to understand.
Let's use the precise term.
Some paragraph-breaking and re-indentation is done too.
The output is similar to our hand-crafted status message, but it's nice to use
the built-in functionality. After all, it was amended during development to
support our use case.
This undoes part of 4c890ad3cc7b3445683d7b52bc00e4a58bef5e94: the
implementations of update-dbus-docs and update-man-rules are moved back to
man/meson.build, and alias_target() is used to keep the visible target names
unchanged.
The rules for man pages are reworked so that it's possible to invoke the
targets even if xsltproc is not available. After all, xsltproc is only needed
for the final formatted output, and not other processing.
As documented in /meson.build where the variable is defined,
meson.build_root() doesn't work as expected with project nesting. I have
no idea why anyone would want to embed systemd in another meson project,
but let's use the variable if we have it.
In general, we shouldn't blanket move syscalls like this into @default,
given that glibc actually does have fallbacks, afaics. However, as
long as the syscalls are "read-only" and thus benign, I figure it's a
safe thing to do. But we should probably stick to an "if in doubt, don't"
rule, and put these syscalls in @system-service as default, but not into
@default.
I think in the real world @system-service is the sensible group people
should use, and not @default actually.
meson: require 0.53.2 and drop some workarounds for old meson
Ubuntu Bionic 18.04 has 0.45, so it was below the previously required
minimum version already. Focal 20.04 has 0.53.2. Let's require that
and use various features that are available.
man: use title of docs/ pages when referring to them
There is some inconsistency, partially caused by the awkward naming
of the docs/ pages. But let's be consistent and use the "official" title.
If we ever change plural↔singular, we should use the same form everywhere.
> There are nothing we can configure in udevd for loopback interfaces;
> no ethertool configs can be applied, MAC address, interface name should
> introduced a regression for 'udevadm test-builtin net_setup_link /sys/class/net/lo/'.
> Prior to this commit this command would exit with 0 whereas after this commit
> it exists with 1. This causes cloud-init on Archlinux to fail as this command
> is run by it and likely also netplan to have networkd rescan and re-apply a
> bunch of things on NICs.
I think it's reasonable to keep returning 0 here: we are intentionally doing
nothing for the device, and that is not an error, but a (noop) success.
Ondrej Kozina [Tue, 16 Mar 2021 19:13:28 +0000 (20:13 +0100)]
Add support for systemd-tpm2 libcryptsetup plugin.
Add support for systemd-tpm2 based LUKS2 device activation
via libcryptsetup plugin. This makes the feature (tpm2 sealed
LUKS2 keyslot passphrase) usable from both systemd utilities
and cryptsetup cli.
The feature is configured via -Dlibcryptsetup-plugins combo
with default value set to 'auto'. It gets enabled automatically
when cryptsetup 2.4.0 or later is installed on the build system.
This is not called from the systemd.triggers or systemd.macros files. Instead,
it would be called from the scriptlets in systemd rpm package itself, at the
place where we call systemctl daemon-reexec.
See https://github.com/systemd/systemd/pull/20289#issuecomment-885622200 .
rpm: restart user services at the end of the transaction
This closes an important gap: so far we would reexecute the system manager and
restart system services that were configured to do so, but we wouldn't do the
same for user managers or user services.
The scheme used for user managers is very similar to the system one, except
that there can be multiple user managers running, so we query the system
manager to get a list of them, and then tell each one to do the equivalent
operations: daemon-reload, disable --now, set-property Markers=+needs-restart,
reload-or-restart --marked.
The total time that can be spent on this is bounded: we execute the commands in
parallel over user managers and units, and additionally set SYSTEMD_BUS_TIMEOUT
to a lower value (15 s by default). User managers should not have too many
units running, and they should be able to do all those operations very
quickly (<< 1s). The final restart operation may take longer, but it's done
asynchronously, so we only wait for the queuing to happen.
The advantage of doing this synchronously is that we can wait for each step to
happen, and for example daemon-reloads can finish before we execute the service
restarts, etc. We can also order various steps wrt. the phases in the rpm
transaction.
When this was initially proposed, we discussed a more relaxed scheme with bus
property notifications. Such an approach would be more complex because a bunch
of infrastructure would have to be added to the system manager to propagate
appropriate notifications to the user managers, and then the user managers
would have to wait for them. Instead, now there is no new code in the managers;
all new functionality is contained in src/rpm/. The ability to call 'systemctl
--user user@' makes this approach very easy. Also, it would be very hard to
order the user manager steps and the rpm transaction steps.
Note: 'systemctl --user disable' is only called for user managers that are
running. I don't see a nice way around this, and it shouldn't matter too much:
we'll just leave a dangling symlink in the case where the user enabled the
service manually.
Some rpms install a bunch of units… It seems nicer to invoke them all in
parallel. In particular, timeouts in systemctl also run in parallel, so if
there's some communication mishap, we will wait less.
rpm: use a helper script to actually invoke systemctl commands
Instead of embedding the commands to invoke directly in the macros,
let's use a helper script as indirection. This has a couple of advantages:
- the macro language is awkward, we need to suffix most commands by "|| :"
and "\", which is easy to get wrong. In the new scheme, the macro becomes
a single simple command.
- in the script we can use normal syntax highlighting, shellcheck, etc.
- it's also easier to test the invoked commands by invoking the helper
manually.
- most importantly, the logic is contained in the helper, i.e. we can
update systemd rpm and everything uses the new helper. Before, we would
have to rebuild all packages to update the macro definition.
This raises the question whether it makes sense to use the lua scriptlets when
the real work is done in a bash script. I think it's OK: we still have the
efficient lua scripts that handle the short scriptlets, and we use a single shared
implementation in bash to do the more complex stuff.
The meson version is raised to 0.47 because that's needed for install_mode.
We were planning to raise the required version anyway…
glibc master uses getrandom in malloc since
https://sourceware.org/git/?p=glibc.git;a=commit;h=fc859c304898a5ec72e0ba5269ed136ed0ea10e1 .
getrandom should be in the default set, so that non-trivial programs don't all
have to fall back to a PRNG.
Add variant of close_all_fds() that does not allocate and use it in freeze()
Even though it's just a fallback path, let's not be sloppy and allocate in
the crash handler.
> The deadlock happens because systemd crash in malloc() then in signal
> handler, it calls malloc() (close_all_fds()-> opendir()-> __alloc_dir())
> again. malloc() is not a signal-safe function, maybe we should re-think
> the logic here.
Currently it's only used in two places in src/shared/, so the function was
already included just once in compiled code. But it seems appropriate to
move it there anyway, because library code should have no need to fork
agents, so it doesn't belong in basic/.
We need a sorted list of fds to skip over when closing. We would allocate a
copy of the passed array to do the sort. But all callers construct a temporary
array to pass to us, so it is pointless to copy it again.
close_all_fds/safe_fork_full/namespace_fork/fork_agent are changed to pass
a non-const int array. I checked all users, and all callers are fine with
the array being sorted.
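The idea of sorting the caller's array in place and then skipping over it can
be sketched in Python. This is not the C code of close_all_fds(): the function
name ranges_to_close() and its first/last parameters are hypothetical, and
instead of closing anything the sketch only computes the fd ranges that would
be handed to something like close_range().

```python
def ranges_to_close(except_fds, first=3, last=1023):
    """Return inclusive (start, end) fd ranges to close, skipping except_fds.

    except_fds is sorted in place, so the caller's list is reordered;
    no copy is made.
    """
    except_fds.sort()
    ranges = []
    start = first
    for fd in except_fds:
        if fd < start:
            continue  # below the range, or a duplicate already handled
        if fd > last:
            break     # everything from here on is outside the range
        if fd > start:
            ranges.append((start, fd - 1))
        start = fd + 1  # skip over the fd we want to keep
    if start <= last:
        ranges.append((start, last))
    return ranges
```

Because the list is sorted in place, a single pass is enough to find the gaps
between the fds that must stay open.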
The function was returning some number (sometimes 1, sometimes the extent
of the range passed over to close_range(), ???). Anyway, all callers only
check for error, so let's return 0 on success.
man: stop recommending putting myhostname after dns
nss-resolve also looks in /etc/hosts, and has the same local hostname
resolving logic as nss-myhostname. We shouldn't recommend another order
than nss-resolve uses internally.
When nss-resolve is used, there's no possibility to override
nss-myhostname hosts via DNS *anyway*.
On top of that, it's not a good idea to allow DNS to override local
hostnames at all - at least not something we should advertise in the
docs.