1. The "entry-token" concept already introduced in kernel-install is now
made use of. i.e. specifically there's a new option --entry-token=
that can be used to explicitly select by which ID to identify boot
loader entries: the machine ID, or some OS ID (ID= or IMAGE_ID= from
/etc/os-release, or even some completely different string. The
selected string is then persisted to /etc/kernel/entry-token, so that
kernel-install can find it there.
2. The --make-machine-id-directory= switch is renamed to
--make-entry-directory= since after all it's not necessarily the
machine ID the dir is named after, but can be any other string as
selected by the entry token.
3. This drops all code to make automatic changes to /etc/machine-info.
Specifically, the KERNEL_INSTALL_MACHINE_ID= field is now more
generically implemented in /etc/kernel/entry-token described above,
hence no need to place it at two locations. And the
KERNEL_INSTALL_LAYOUT= field is not configurable by user switch or
similar anyway in bootctl, but only read from
/etc/kernel/install.conf, and hence copying it from one configuration
file to another appears unnecessary, the second copy is fully
redundant. Note that this just drops writing these fields, they'll
still be honoured when already set.
This drops documentation of KERNEL_INSTALL_MACHINE_ID as machine-info
field (though we'll still read it for compat).
This updates the kernel-install man page to always say "ENTRY-TOKEN"
instead of "MACHINE-ID" where appropriate, to clear the confusion up
between the two.
This also tries to fix how we denote env vars (always prefix with $ and
without = suffix), and other vars (without $ but with = suffix)
kernel-install: search harder for kernel image/initrd drop-in dir
If not explicitly configured, let's search a bit harder for the
ENTRY_TOKEN, and let's try the machine ID, the IMAGE_ID and ID fields of
/etc/os-release and finally "Default", all below potential $XBOOTLDR.
kernel-install: only generate systemd.boot_id= in kernel command line if used for naming the boot loader spec files/dirs
Now that we can distinguish the naming of the boot loader spec
dirs/files and the machine ID let's tweak the logic for suffixing the
kernel cmdline with systemd.boot_id=: let's only do that when we
actually need the boot ID for naming these dirs/files. If we don't,
let's not bother.
This should be beneficial for "golden" images that shall not carry any
machine IDs at all, i.e acquire their identity only once the final
userspace is actually reached.
kernel-install: add a new $ENTRY_TOKEN variable for naming boot entries
This cleans up naming of boot loader spec boot entries a bit (i.e. the
naming of the .conf snippet files, and the directory in $BOOT where the
kernel images and initrds are placed), and isolates it from the actual machine
ID concept.
Previously there was a sinlge concept for both things, because typically
the entries are just named after the machine ID. However one could also
use a different identifier, i.e. not a 128bit ID in which cases issues
pop up everywhere. For example, the "machine-id" field in the generated
snippets would not be a machine ID anymore, and the newly added
systemd.machine_id= kernel parameter would possibly get passed invalid
data.
Hence clean this up:
$MACHINE_ID → always a valid 128bit ID.
$ENTRY_TOKEN → usually the $MACHINE_ID but can be any other string too.
This is used to name the directory to put kernels/initrds in. It's also
used for naming the *.conf snippets that implement the Boot Loader Type
1 spec.
kernel-install: don't try to persist used machine ID locally
This reworks the how machine ID used by the boot loader spec snippet
generation logic. Instead of persisting it automatically to /etc/ we'll
append it via systemd.machined_id= to the kernel command line, and thus
persist it in the generated boot loader spec snippets instead. This has
nice benefits:
1. We do not collide with read-only root
2. The machine ID remains stable across factory reset, so that we can
safely recognize the path in $BOOT we drop our kernel images in
again, i.e. kernel updates will work correctly and safely across
kernel factory resets.
3. Previously regular systems had different machine IDs while in
initrd and after booting into the host system. With this change
they will now have the same.
This then drops implicit persisting of KERNEL_INSTALL_MACHINE_ID, as its
unnecessary then. The field is still honoured though, for compat
reasons.
This also drops the "Default" fallback previously used, as it actually
is without effect, the randomized ID generation already took precedence
in all cases. This means $MACHNE_ID/KERNEL_INSTALL_MACHINE_ID are now
guaranteed to look like a proper machine ID, which is useful for us,
given you need it that way to be able to pass it to the
systemd.machine_id= kernel command line option.
get_pretty_hostname() so far had semantics not in line with our usual
ones: the return parameter was actually freed before the return string
written into it, because that's what parse_env_file() does. Moreover,
when the value was not set it would return NULL but succeed.
Let's normalize this, and only fill in the return value if there's
something set, and never read from it, like we usually do with return
parameter, and in particular those named "ret_xyz".
The existing callers don't really care about the differences, but it's
nicer to normalize behaviour to minimize surprises.
Frantisek Sumsal [Thu, 10 Mar 2022 14:18:45 +0000 (15:18 +0100)]
cgls: mangle user-provided unit names
so the CLI interface is now similar to `systemctl`, i.e. if no unit name
suffix is provided, assume `.service`.
Fixes: #20492
Before:
```
$ systemd-cgls --unit user@1000
Failed to query unit control group path: Invalid argument
Failed to list cgroup tree: Invalid argument
```
Luca Boccassi [Thu, 10 Mar 2022 01:30:08 +0000 (01:30 +0000)]
core: support ExtensionDirectories in user manager
Unprivileged overlayfs is supported since Linux 5.11. The only
change needed to get ExtensionDirectories to work is to avoid
hard-coding the staging directory to the system manager runtime
directory, everything else just works (TM).
This change does two things: raise the default limit for nspawn
containers (where we try to mimic closely what the kernel does), and
bump it when running on old kernels which still have the lower setting.
The first ExecStartPre or the first ExecStart commands would get the metadata,
but not the subsequent ones. Also check that we do not pass it in
ExecStartPost.
manager: prevent cleanup of triggering units before we start the handler
This fixes the following case:
OnFailure= would be spawned correctly, but OnSuccess= would be
spawned without the MONITOR_* metadata, because we'd "collect" the unit
that started successfully. So let's block cleanup while we have a job
running for the handler. The job cannot last infinitely, so at some point
we'll be able to collect both.
We already logged what we are spawning, but not so much why. Let's
add this, so it's easier to distinguish execstartpre/execstart/execstartpost
and such.
The test would fail when the the same handler was used for multiple
*failing* units. We need to call 'reset-failed' to let the manager forget
about the earlier ones.
systemd-analyze log-target console is removed, because it's easier to follow
the logs if logging it to the journal.
Luca Boccassi [Wed, 9 Feb 2022 11:48:30 +0000 (11:48 +0000)]
core: split $MONITOR_METADATA and return it only if a single unit triggers OnFailure/OnSuccess
Remove the list logic, and simply skip passing metadata if more than one
unit triggered an OnFailure/OnSuccess handler.
Instead of a single env var to loop over, provide each separate item
as its own variable.
Luca Boccassi [Tue, 8 Mar 2022 22:13:37 +0000 (22:13 +0000)]
core: do not return 'skipped' when Condition*= fail with StartUnitWithFlags()
Backward incompatible change to avoid returning 'skipped' if a condition causes
a job activation to be skipped when using StartUnitWithFlags().
Job results are broadcasted, so it is theoretically possible that existing
software could get confused if they see this result.
Luca Boccassi [Wed, 9 Mar 2022 02:07:34 +0000 (02:07 +0000)]
core: support MountAPIVFS and RootDirectory in user manager
The only piece missing was to somehow make /proc appear in the
new user+mount namespace. It is not possible to mount a new
/proc instance, not even with hidepid=invisible,subset=pid, in
a user namespace unless a PID namespace is created too (and also
at the same time as the other namespaces, it is not possible to
mount a new /proc in a child process that creates a PID namespace
forked from a parent that created a user+mount namespace, it has
to happen at the same time).
Use the host's /proc with a bind-mount as a fallback for this
case. User session services would already run with it, so
nothing is lost.
Yu Watanabe [Mon, 7 Mar 2022 06:45:17 +0000 (15:45 +0900)]
network: refuse string which contains non-safe or non-ascii characters for Filename=
The string will be used when the client load additional config file to
boot, and it must be a valid path or url. Hence, let's refuse non-safe or
non-characters.
The function has nothing to do with any Manager object, hence drop that
from the name. And it actually looks something up by handle *action* not
by *handle*, hence the old name was a bit misnomer. Let's call it
handle_action_lookup(), as it queries handle action metainfo for a
handle action.
Also, let's make sure it behaves more like our usual functions that
lookup some fixed data from some enum value/int: let's return NULL if we
don't find it.
It stores meta-info about various HandleActions, hence let's name it
after that. The fact that it can be seen as stored inside some form of a
table is an implementation detail of logind-action.c, and should not
leak into other modules, hence let's focus on what it is, not how it is
stored.
logind: replace handle_action_valid() macro by inline function
The old macro will double evaluation and has no protection against
operator precedence issues. Let's fix that by using an inline func
instead, which also gives us typesafety.
random-util: unify RANDOM_ALLOW_INSECURE and !RANDOM_BLOCK and simplify
RANDOM_BLOCK has existed for a long time, but RANDOM_ALLOW_INSECURE was
added more recently, leading to an awkward relationship between the two.
It turns out that only one, RANDOM_BLOCK, is needed.
RANDOM_BLOCK means return cryptographically secure numbers no matter
what. If it's not set, it means try to do that, but if it fails, fall
back to using unseeded randomness.
This part of falling back to unseeded randomness is the intent of
GRND_INSECURE, which is what RANDOM_ALLOW_INSECURE previously aliased.
Rather than having an additional flag for that, it makes more sense to
just use it whenever RANDOM_BLOCK is not set. This saves us the overhead
of having to open up /dev/urandom.
Additionally, when getrandom returns too little data, but not zero data,
we currently fall back to using /dev/urandom if RANDOM_BLOCK is not set.
This doesn't quite make sense, because if getrandom returned seeded data
once, then it will forever after return the same thing as whatever
/dev/urandom does. So in that case, we should just loop again.
Since there's never really a time where /dev/urandom is able to return
some easily but more with difficulty, we can also get rid of
RANDOM_EXTEND_WITH_PSEUDO. Once the RNG is initialized, bytes
should just flow normally.
This also makes RANDOM_MAY_FAIL obsolete, because the only case this ran
was where we'd fall back to /dev/urandom on old kernels and return
GRND_INSECURE bytes on new kernels. So also get rid of that flag.
Finally, since we're always able to use GRND_INSECURE on newer kernels,
and we only fall back to /dev/urandom on older kernels, also only fall
back to using RDRAND on those older kernels. There, the only reason to
have RDRAND is to avoid a kmsg entry about unseeded randomness.
The result of this commit is that we now cascade like this:
- Use getrandom(0) if RANDOM_BLOCK.
- Use getrandom(GRND_INSECURE) if !RANDOM_BLOCK.
- Use /dev/urandom if !RANDOM_BLOCK and no GRND_INSECURE support.
- Use /dev/urandom if no getrandom() support.
- Use RDRAND if we would use /dev/urandom for any of the above reasons
and RANDOM_ALLOW_RDRAND is set.
Yu Watanabe [Tue, 8 Mar 2022 12:15:58 +0000 (21:15 +0900)]
test: skip TEST-17 on ubuntu ppc64el
On Ubuntu CI on ppc64el, the test randomly fails when /run/udev is not
synced before checking its contents (see #22357). But /run/udev is a
tmpfs and fsync on tmpfs is noop (see `struct shmem_file_operations` in
mm/shmem.c of the kernel), hence, it is not necessary to call fsync on
/run/udev in general. This should be a testing emvironment issue (I
guess it is an issue on nested KVM on ppc64el), instead of an issue on
udev.
licunlong [Tue, 8 Mar 2022 11:18:36 +0000 (19:18 +0800)]
main: log which process send SIGNAL to PID1
This can help users to figure out what makes systemd freeze.
1. Someone kills systemd accidentally, then the sender_pid won't be 1;
2. systemd triggers segfault or assert, then the sender_pid will be 1;
When writing docs for SD_BUS_VTABLE_CAPABILITY, I noticed that we have one use
of SD_BUS_VTABLE_CAPABILITY(CAP_SYS_ADMIN) in the tree. This is the default, so
it's not very useful to specify it. But if we're touching that, I think it's
better to use mac + polkit for this like for everything else.
We don't have a very good category for this, but I don't think it makes sense
to add a new one. I just reused the same as other similar calls.
David Bond [Tue, 8 Mar 2022 09:41:39 +0000 (10:41 +0100)]
udev: 60-persistent-storage-tape.rules: handle duplicate device ID
Some SCSI tape devices use the same device ID (NAA registered device
designator) for the SCSI tape changer device and the first actual tape
device. For example, this one:
You must connect the bridged drive to an HBA supporting multiple
LUNs (also referred to as LUN scanning). The SL150 Library uses a
single SCSI ID and two logical unit numbers (LUN). LUN 0 controls
the tape drive and LUN 1 which is configured as a SCSI medium
changer device controls the robotics. Data is sent to the remaining
LUN on the bridged drive or to LUNs on the other, unbridged drives
in the partition, all of which are configured as SCSI
sequential-access (tape) devices.
This may lead to errors because /dev/tape/by-id symlinks may sometimes
point to the st device representing the tape, and sometimes to the sg
device representing the changer.
Fix this by assigning an increased priority to the tape device, and creating
a separate -changer link for the SCSI tape changer.
Bastien Nocera [Mon, 7 Mar 2022 09:11:12 +0000 (10:11 +0100)]
memory-id: Work-around incorrect "Number of slots"
In some BIOSes, the "Number of slots or sockets available for Memory
Devices in this array" is incorrectly set to the number of memory array
that's populated.
Work-around this problem by outputting the number of sockets after
having parsed them so that consumers of this data can carry on expecting
an accurate number in this property.
This fixes the number of memory slots advertised for the HP Z600.
See https://gitlab.gnome.org/GNOME/gnome-control-center/-/issues/1686
portable: add return parameter to GetImageMetadataWithExtensions
The complaint was that the output array was used for two kinds of data, and the
input flag decided whether this extra data should be included. The flag is
removed, and instead the old method is changed to include the data always as
a separate parameter.
This breaks backward compatibility, but the old method is effectively broken
and does not appear to be used yet, at least in open source code, by
searching on codesearch.debian.net and github.com.
Daan De Meyer [Mon, 28 Feb 2022 10:12:04 +0000 (10:12 +0000)]
mkosi: Add CentOS Stream 9
The blocker causing Stream 9 builds to fail was fixed
(https://bugzilla.redhat.com/show_bug.cgi?id=2056276) so we can add
CentOS Stream 9 builds as well now.
IIUC, pipefail doesn't matter for a sequence of commands joined with &&, and we
don't have any pipes. And such a failing expression also does not trigger an
exit, so the set +e/set -e were noops.
4piu [Mon, 7 Mar 2022 08:38:08 +0000 (16:38 +0800)]
Add support for NEC VersaPro VG-S
The brightness control key (Fn+F7 Fn+F8) and touchpad toggle key (Fn + Space) do not work on the NEC VersaPro VG-S laptop. Add the keycode to fix the problem.
I think the current behaviour is stupid: 'x-systemd.automount,noauto' should
mean that we create the units, but don't add .mount or .automount to any targets.
Instead, we completely ignore 'noauto'. But let's at least describe the
implementation.
A description of SD_BUS_VTABLE_CAPABILITY is added, and the discussion
on SD_BUS_VTABLE_UNPRIVILEGED in expanded. I think it would be nice
to add longer description of how access is checked (maybe in sd-bus(3)),
but I'm leaving that for later. I think the text that was added here
describes everything, even if tersely.
The data type off_t can be 64 on 32 bit systems if they have large
file support. Since mmap expects a size_t with 32 bits as second
argument truncation could occur. At worst these huge files could
lead to mmaps smaller than the previous check for small files.
This in turn shouldn't have a lot of impact because mmap allocates
at page size boundaries. This also made the PAGE_ALIGN call in
open_mmap unneeded. In fact it was neither in sync with other mmap
calls nor with its own munmap counterpart in error path.
If such large files are encountered, which is very unlikely in these
code paths, treat them with the same error as if they are too small.