Frantisek Sumsal [Sun, 13 Mar 2022 13:45:03 +0000 (14:45 +0100)]
macro: account for negative values in DECIMAL_STR_WIDTH()
With negative numbers we wouldn't account for the minus sign, thus
returning a width one character too short and triggering buffer
overflows in certain situations.
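For illustration, the width computation can be sketched as a plain C function, assuming the macro's job is to return the number of characters needed to print a value in decimal, excluding the trailing NUL. The function name `decimal_str_width()` is hypothetical; the actual macro may be shaped differently.

```c
#include <stddef.h>

/* Hypothetical helper: number of characters needed to print x in
 * decimal, including a leading '-' for negative values, excluding
 * the trailing NUL. */
static size_t decimal_str_width(long long x) {
        size_t width = 1;          /* at least one digit, even for 0 */

        if (x < 0)
                width++;           /* the minus sign that was not accounted for */

        /* Divide instead of negating, so LLONG_MIN cannot overflow. */
        while (x / 10 != 0) {
                x /= 10;
                width++;
        }
        return width;
}
```

With this, -123 needs 4 characters ("-123") rather than 3, which is exactly the one byte that was missing.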
1. The "entry-token" concept already introduced in kernel-install is now
made use of. Specifically, there's a new option --entry-token= that can
be used to explicitly select which ID identifies boot loader entries:
the machine ID, some OS ID (ID= or IMAGE_ID= from /etc/os-release), or
even some completely different string. The selected string is then
persisted to /etc/kernel/entry-token, so that kernel-install can find
it there.
2. The --make-machine-id-directory= switch is renamed to
--make-entry-directory=, since the directory is not necessarily named
after the machine ID, but can be named after any other string selected
by the entry token.
3. This drops all code that makes automatic changes to /etc/machine-info.
Specifically, the KERNEL_INSTALL_MACHINE_ID= field is now more
generically implemented via /etc/kernel/entry-token described above,
hence there is no need to place it in two locations. And the
KERNEL_INSTALL_LAYOUT= field is not configurable by a user switch or
similar in bootctl anyway, but only read from
/etc/kernel/install.conf, hence copying it from one configuration
file to another is unnecessary: the second copy is fully redundant.
Note that this just drops writing these fields; they'll still be
honoured when already set.
This drops documentation of KERNEL_INSTALL_MACHINE_ID as machine-info
field (though we'll still read it for compat).
This updates the kernel-install man page to always say "ENTRY-TOKEN"
instead of "MACHINE-ID" where appropriate, to clear up the confusion
between the two.
This also tries to fix how we denote env vars (always prefixed with $ and
without a = suffix) and other vars (without $ but with a = suffix).
kernel-install: search harder for kernel image/initrd drop-in dir
If not explicitly configured, let's search a bit harder for the
ENTRY_TOKEN, and let's try the machine ID, the IMAGE_ID and ID fields of
/etc/os-release and finally "Default", all below potential $XBOOTLDR.
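kernel-install itself is a shell script, so the following C function is only a sketch of the search order described above, under stated assumptions: the directories found below $BOOT/$XBOOTLDR are passed in as a NULL-terminated list, and unset or empty candidates are skipped. The names `find_entry_token()` and `dir_listed()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is `token` one of the directory names found below
 * $BOOT/$XBOOTLDR? `existing` is a NULL-terminated list. */
static int dir_listed(const char *token, const char *const *existing) {
        for (; *existing; existing++)
                if (strcmp(*existing, token) == 0)
                        return 1;
        return 0;
}

/* Return the first candidate that has a directory, trying the machine ID,
 * IMAGE_ID=, ID= and finally "Default". Unset or empty candidates are
 * skipped. Returns NULL if none of them has a directory. */
static const char *find_entry_token(const char *machine_id,
                                    const char *image_id,
                                    const char *os_id,
                                    const char *const *existing) {
        const char *candidates[] = { machine_id, image_id, os_id, "Default" };

        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
                if (!candidates[i] || candidates[i][0] == '\0')
                        continue;
                if (dir_listed(candidates[i], existing))
                        return candidates[i];
        }
        return NULL;
}
```

Keeping the candidate order in one array makes the precedence explicit and easy to extend.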
kernel-install: only generate systemd.machine_id= in kernel command line if used for naming the boot loader spec files/dirs
Now that we can distinguish the naming of the boot loader spec
dirs/files from the machine ID, let's tweak the logic for suffixing the
kernel cmdline with systemd.machine_id=: let's only do that when we
actually need the machine ID for naming these dirs/files. If we don't,
let's not bother.
This should be beneficial for "golden" images that shall not carry any
machine IDs at all, i.e. that acquire their identity only once the final
userspace is actually reached.
kernel-install: add a new $ENTRY_TOKEN variable for naming boot entries
This cleans up the naming of boot loader spec boot entries a bit (i.e. the
naming of the .conf snippet files, and of the directory in $BOOT where the
kernel images and initrds are placed), and isolates it from the actual machine
ID concept.
Previously there was a single concept for both things, because typically
the entries are just named after the machine ID. However, one could also
use a different identifier, i.e. not a 128-bit ID, in which case issues
pop up everywhere. For example, the "machine-id" field in the generated
snippets would not be a machine ID anymore, and the newly added
systemd.machine_id= kernel parameter would possibly get passed invalid
data.
Hence clean this up:
$MACHINE_ID → always a valid 128bit ID.
$ENTRY_TOKEN → usually the $MACHINE_ID but can be any other string too.
This is used to name the directory to put kernels/initrds in. It's also
used for naming the *.conf snippets that implement the Boot Loader Type
1 spec.
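The split can be illustrated with a validity check: a hypothetical `looks_like_machine_id()` that accepts only the /etc/machine-id text format (32 lowercase hexadecimal characters). Anything else may still be a perfectly fine $ENTRY_TOKEN, but must not be passed as systemd.machine_id=. This is a sketch; systemd's own ID parsing may accept or reject more forms.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does s look like a machine ID, i.e. exactly 32
 * lowercase hexadecimal characters? */
static bool looks_like_machine_id(const char *s) {
        if (!s || strlen(s) != 32)
                return false;

        for (size_t i = 0; i < 32; i++)
                if (!strchr("0123456789abcdef", s[i]))
                        return false;   /* e.g. "Default" or an OS ID */

        return true;
}
```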
kernel-install: don't try to persist used machine ID locally
This reworks how the machine ID is used by the boot loader spec snippet
generation logic. Instead of persisting it automatically to /etc/, we'll
append it via systemd.machine_id= to the kernel command line, and thus
persist it in the generated boot loader spec snippets instead. This has
nice benefits:
1. We do not collide with read-only root
2. The machine ID remains stable across factory resets, so that we can
safely recognize the path in $BOOT we drop our kernel images in
again, i.e. kernel updates will work correctly and safely across
factory resets.
3. Previously regular systems had different machine IDs while in
initrd and after booting into the host system. With this change
they will now have the same.
This then drops implicit persisting of KERNEL_INSTALL_MACHINE_ID, as it's
unnecessary then. The field is still honoured though, for compat
reasons.
This also drops the "Default" fallback previously used, as it actually
has no effect: the randomized ID generation already took precedence
in all cases. This means $MACHINE_ID/KERNEL_INSTALL_MACHINE_ID are now
guaranteed to look like a proper machine ID, which is useful for us,
given it needs to be one to be passed to the
systemd.machine_id= kernel command line option.
Yu Watanabe [Fri, 11 Mar 2022 06:13:23 +0000 (15:13 +0900)]
meson: move to c_std=gnu11
Recently, the kernel community started to discuss moving to C11 (gnu11) [1],
and it seems likely to happen in the near future.
Let's also move to c_std=gnu11. Unlike the kernel, we already use
gnu99, hence hopefully we can move to C11 without changing anything.
Yu Watanabe [Mon, 28 Feb 2022 01:55:51 +0000 (10:55 +0900)]
network: re-design request queue
This makes the Request object take hash, compare, free, and process functions.
With this change, the logic in networkd-queue.c can be mostly
independent of the type of the request or of the object (e.g. Address) assigned
to the request, and it becomes simpler.
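A much simplified model of the idea, as a sketch: the generic queue code only ever calls the function pointers stored in the Request, so it never needs to know what the object is. The struct layout, `request_equal()`, `request_queue_process()` and the example "address as a string" type are all hypothetical and do not mirror networkd's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Request Request;
struct Request {
        void *object;
        uint64_t (*hash_func)(const void *object);
        int (*compare_func)(const void *a, const void *b);
        void (*free_func)(void *object);
        int (*process)(Request *req);
};

/* Generic deduplication check: same type-specific functions, equal objects. */
static int request_equal(const Request *a, const Request *b) {
        if (a->hash_func != b->hash_func || a->compare_func != b->compare_func)
                return 0;
        if (a->hash_func(a->object) != b->hash_func(b->object))
                return 0;
        return a->compare_func(a->object, b->object) == 0;
}

/* Generic processing: each request is handed to its own process function.
 * Returns how many requests reported progress (> 0). */
static size_t request_queue_process(Request **queue, size_t n) {
        size_t done = 0;
        for (size_t i = 0; i < n; i++)
                if (queue[i]->process(queue[i]) > 0)
                        done++;
        return done;
}

/* Example object type: an address stored as a string. FNV-1a hash. */
static uint64_t address_hash(const void *object) {
        uint64_t h = 14695981039346656037ULL;
        for (const unsigned char *p = object; *p; p++) {
                h ^= *p;
                h *= 1099511628211ULL;
        }
        return h;
}

static int address_compare(const void *a, const void *b) {
        return strcmp(a, b);
}

static int address_process(Request *req) {
        (void) req;
        return 1;   /* pretend the address was configured */
}
```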
Yu Watanabe [Mon, 28 Feb 2022 02:15:01 +0000 (11:15 +0900)]
network: refuse to configure link properties when in initialized state
The condition should be satisfied only when users request to reconfigure
the link, and in that case, all requests will be cancelled. Hence, it is
not necessary to process the request.
Yu Watanabe [Mon, 28 Feb 2022 00:20:42 +0000 (09:20 +0900)]
network: introduce request_call_netlink_async()
In most netlink handlers, we do the following:
1. decrease the message counter,
2. check the link state,
3. error handling,
4. update link state via e.g. link_check_ready().
The first two steps are mostly common, hence let's extract them.
Moreover, this is not only extracting the common logic, but also provides a
strong advantage: `request_call_netlink_async()` assigns the relevant
Request object to the userdata of the netlink slot, and the Request object
has full information about the message we sent. Hence, in the future,
netlink handlers can print more detailed error messages. E.g. when
an address fails to be configured, we currently only show that an
address failed to be configured, but with this commit we can potentially
show explicitly which address failed.
This does not change such error handling yet; let's do that later.
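The split between common and type-specific steps can be sketched as a single reply handler that performs steps 1 and 2 and then delegates steps 3 and 4. All types, states and names below (`request_netlink_reply()`, `example_handler()`, the reduced `LinkState`) are hypothetical simplifications, not networkd's real code.

```c
#include <stddef.h>

typedef enum { LINK_STATE_CONFIGURING, LINK_STATE_FAILED, LINK_STATE_LINGER } LinkState;

typedef struct Link {
        LinkState state;
        unsigned messages;          /* outstanding netlink messages */
} Link;

typedef struct Request Request;
struct Request {
        Link *link;
        unsigned *counter;          /* e.g. &link->messages */
        /* Type-specific part: error handling and link state update. */
        int (*netlink_handler)(Request *req, int error);
};

/* Common reply handler: steps 1 and 2 are done here once. The
 * type-specific handler receives the full Request, so it could report
 * exactly which object failed. */
static int request_netlink_reply(Request *req, int error) {
        /* 1. decrease the message counter */
        if (req->counter && *req->counter > 0)
                (*req->counter)--;

        /* 2. check the link state: ignore replies for failed/lingering links */
        if (req->link &&
            (req->link->state == LINK_STATE_FAILED || req->link->state == LINK_STATE_LINGER))
                return 0;

        /* 3. and 4. type-specific error handling and state update */
        return req->netlink_handler(req, error);
}

/* Example type-specific handler: propagate errors, report success as 1. */
static int example_handler(Request *req, int error) {
        (void) req;
        return error < 0 ? error : 1;
}
```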
Yu Watanabe [Sun, 27 Feb 2022 06:39:16 +0000 (15:39 +0900)]
network: make Request object take Manager*
Previously, even though all Request objects are owned by Manager, they
did not have a direct reference to Manager, only an indirect one through
the Link or NetDev object. But, as Link or NetDev can be NULL, we needed
to decide how to access Manager from a Request based on the type of the
request.
This makes it simpler, as now the Request object has a direct reference
to Manager.
This also renames request_drop() -> request_detach(), as in the previous
commit a reference counter was introduced, so even if the reference to
a Request object from Manager is dropped, the object may still be alive.
The name `request_drop()` suggests the object will be freed by the
function, but it may not be. `request_detach()` suggests the object
will no longer be managed by Manager, which I think is more
appropriate.
This is just a cleanup, and should not change any behavior.
Yu Watanabe [Sun, 27 Feb 2022 06:18:01 +0000 (15:18 +0900)]
network: introduce reference counter for Request object
Currently, all Request objects are always owned by Manager, and freed
when processed, typically soon after a netlink message is sent.
So far, a reference counter has therefore not been necessary.
In a later commit, the Request object will _not_ be freed at the time
when a netlink message is sent, but assigned to the relevant netlink
slot as userdata, and will be freed when a reply is received. So, the
owner of the Request object changes during its lifetime. In that case, it
is convenient for the object to have a reference counter, to avoid memory
leaks or double frees.
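A minimal sketch of the usual ref/unref pattern this refers to, assuming a single-threaded owner model: the Manager's queue and the netlink slot each hold one reference, and the object is freed only when the last one is dropped. The struct and function names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Request {
        unsigned n_ref;
        /* ... request payload ... */
} Request;

static Request *request_new(void) {
        Request *req = calloc(1, sizeof(Request));
        if (!req)
                return NULL;
        req->n_ref = 1;                 /* the creator owns the first reference */
        return req;
}

static Request *request_ref(Request *req) {
        if (!req)
                return NULL;
        assert(req->n_ref > 0);
        req->n_ref++;
        return req;
}

/* Always returns NULL, so callers can write `p = request_unref(p);`. */
static Request *request_unref(Request *req) {
        if (!req)
                return NULL;
        assert(req->n_ref > 0);
        if (--req->n_ref == 0)
                free(req);              /* last owner gone: free exactly once */
        return NULL;
}
```

Handing the request to a netlink slot then becomes `slot_userdata = request_ref(req)`, and each owner drops its own reference independently.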
Yu Watanabe [Sat, 26 Feb 2022 06:56:39 +0000 (15:56 +0900)]
network: make request_process_address() and friends take Link and corresponding object
This also renames e.g. request_process_address() -> address_process_request().
Also, this drops type checks such as `assert(req->type == REQUEST_TYPE_ADDRESS)`,
as in later commits, the function that processes a request, e.g.
`address_process_request()`, will be assigned to the Request object when
it is created. And the request type will be used to distinguish, and to
avoid deduplicating, requests which do not have any assigned objects,
like REQUEST_TYPE_DHCP4_CLIENT. Hence, the type checks in process functions
are mostly unnecessary and redundant.
This is mostly cleanups and preparation for later commits, and should
not change any behavior.
get_pretty_hostname() so far had semantics not in line with our usual
ones: the return parameter was actually freed before the return string
was written into it, because that's what parse_env_file() does. Moreover,
when the value was not set, it would return NULL but succeed.
Let's normalize this: only fill in the return value if there's
something set, and never read from it, like we usually do with return
parameters, in particular those named "ret_xyz".
The existing callers don't really care about the differences, but it's
nicer to normalize behaviour to minimize surprises.
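The "ret_xyz" convention can be sketched as follows: the return parameter is never read and is only written on success, and "not set" is reported as an error instead of a successful NULL. The function name `get_value()` and the choice of -ENXIO for "not set" are assumptions made for this illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical getter following the ret_ convention: *ret_value is
 * never read, and only written when 0 is returned. */
static int get_value(const char *configured, char **ret_value) {
        if (!configured || configured[0] == '\0')
                return -ENXIO;          /* not set: leave *ret_value untouched */

        size_t n = strlen(configured) + 1;
        char *copy = malloc(n);
        if (!copy)
                return -ENOMEM;
        memcpy(copy, configured, n);

        *ret_value = copy;              /* caller owns and frees the copy */
        return 0;
}
```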
Frantisek Sumsal [Thu, 10 Mar 2022 14:18:45 +0000 (15:18 +0100)]
cgls: mangle user-provided unit names
So the CLI interface is now similar to `systemctl`: if no unit name
suffix is provided, assume `.service`.
Fixes: #20492
Before:
```
$ systemd-cgls --unit user@1000
Failed to query unit control group path: Invalid argument
Failed to list cgroup tree: Invalid argument
```
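The mangling rule can be sketched as: if the name does not end in a known unit type suffix, append `.service`. This is a simplification under stated assumptions; the helper names are hypothetical, and systemd's real mangling logic does more (such as escaping invalid characters).

```c
#include <stdio.h>
#include <string.h>

/* Known unit type suffixes. */
static const char *const unit_suffixes[] = {
        ".service", ".socket", ".target", ".device", ".mount", ".automount",
        ".swap", ".timer", ".path", ".slice", ".scope",
};

static int has_unit_suffix(const char *name) {
        size_t len = strlen(name);

        for (size_t i = 0; i < sizeof(unit_suffixes) / sizeof(unit_suffixes[0]); i++) {
                size_t slen = strlen(unit_suffixes[i]);
                if (len > slen && strcmp(name + len - slen, unit_suffixes[i]) == 0)
                        return 1;
        }
        return 0;
}

/* Write the mangled name into buf; returns snprintf()'s result. */
static int mangle_unit_name(const char *name, char *buf, size_t size) {
        if (has_unit_suffix(name))
                return snprintf(buf, size, "%s", name);
        return snprintf(buf, size, "%s.service", name);
}
```

With this, `systemd-cgls --unit user@1000` would look up `user@1000.service`.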
Luca Boccassi [Thu, 10 Mar 2022 01:30:08 +0000 (01:30 +0000)]
core: support ExtensionDirectories in user manager
Unprivileged overlayfs has been supported since Linux 5.11. The only
change needed to get ExtensionDirectories to work is to avoid
hard-coding the staging directory to the system manager runtime
directory; everything else just works (TM).
This change does two things: raise the default limit for nspawn
containers (where we try to mimic closely what the kernel does), and
bump it when running on old kernels which still have the lower setting.
The first ExecStartPre or the first ExecStart command would get the metadata,
but not the subsequent ones. Also check that we do not pass it in
ExecStartPost.
manager: prevent cleanup of triggering units before we start the handler
This fixes the following case:
OnFailure= would be spawned correctly, but OnSuccess= would be
spawned without the MONITOR_* metadata, because we'd "collect" the unit
that started successfully. So let's block cleanup while we have a job
running for the handler. The job cannot last indefinitely, so at some point
we'll be able to collect both.
We already logged what we are spawning, but not so much why. Let's
add this, so it's easier to distinguish ExecStartPre/ExecStart/ExecStartPost
and such.
The test would fail when the same handler was used for multiple
*failing* units. We need to call 'reset-failed' to let the manager forget
about the earlier ones.
The systemd-analyze log-target console call is removed, because it's easier
to follow the logs if they go to the journal.
Luca Boccassi [Wed, 9 Feb 2022 11:48:30 +0000 (11:48 +0000)]
core: split $MONITOR_METADATA and return it only if a single unit triggers OnFailure/OnSuccess
Remove the list logic, and simply skip passing metadata if more than one
unit triggered an OnFailure/OnSuccess handler.
Instead of a single env var to loop over, provide each separate item
as its own variable.
Luca Boccassi [Tue, 8 Mar 2022 22:13:37 +0000 (22:13 +0000)]
core: do not return 'skipped' when Condition*= fail with StartUnitWithFlags()
Backward incompatible change to avoid returning 'skipped' if a condition causes
a job activation to be skipped when using StartUnitWithFlags().
Job results are broadcast, so it is theoretically possible that existing
software could get confused if it sees this result.
Luca Boccassi [Wed, 9 Mar 2022 02:07:34 +0000 (02:07 +0000)]
core: support MountAPIVFS and RootDirectory in user manager
The only piece missing was to somehow make /proc appear in the
new user+mount namespace. It is not possible to mount a new
/proc instance, not even with hidepid=invisible,subset=pid, in
a user namespace unless a PID namespace is created too (and it has to
be created at the same time as the other namespaces: it is not possible
to mount a new /proc in a child process that creates a PID namespace if
it was forked from a parent that created a user+mount namespace).
Use the host's /proc with a bind-mount as a fallback for this
case. User session services would already run with it, so
nothing is lost.
Yu Watanabe [Mon, 7 Mar 2022 06:45:17 +0000 (15:45 +0900)]
network: refuse string which contains non-safe or non-ascii characters for Filename=
The string will be used when the client loads an additional config file
to boot, and it must be a valid path or URL. Hence, let's refuse non-safe
or non-ASCII characters.
The function has nothing to do with any Manager object, hence drop that
from the name. And it actually looks something up by handle *action*, not
by *handle*, hence the old name was a bit of a misnomer. Let's call it
handle_action_lookup(), as it queries handle action metainfo for a
handle action.
Also, let's make sure it behaves more like our usual functions that
look up some fixed data from some enum value/int: let's return NULL if we
don't find it.
It stores meta-info about various HandleActions, hence let's name it
after that. The fact that it can be seen as stored inside some form of a
table is an implementation detail of logind-action.c, and should not
leak into other modules, hence let's focus on what it is, not how it is
stored.
logind: replace handle_action_valid() macro by inline function
The old macro does double evaluation and has no protection against
operator precedence issues. Let's fix that by using an inline function
instead, which also gives us type safety.
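The double-evaluation problem can be demonstrated with a stand-in enum and an argument that has a side effect. The enum values, the macro body and `next_action()` are hypothetical; they are not logind's actual definitions.

```c
/* Hypothetical stand-in for the HandleAction enum. */
typedef enum {
        HANDLE_IGNORE,
        HANDLE_POWEROFF,
        HANDLE_REBOOT,
        _HANDLE_ACTION_MAX,
} HandleAction;

/* Macro form: the argument is expanded, and thus evaluated, twice. */
#define HANDLE_ACTION_VALID_MACRO(x) ((x) >= 0 && (x) < _HANDLE_ACTION_MAX)

/* Inline function form: evaluated exactly once, and type-checked. */
static inline int handle_action_valid(HandleAction a) {
        return a >= 0 && a < _HANDLE_ACTION_MAX;
}

/* Argument with a side effect, to make the number of evaluations visible. */
static int calls;
static HandleAction next_action(void) {
        calls++;
        return HANDLE_REBOOT;
}
```

Passing `next_action()` to the macro calls it twice, while the inline function calls it once.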
random-util: unify RANDOM_ALLOW_INSECURE and !RANDOM_BLOCK and simplify
RANDOM_BLOCK has existed for a long time, but RANDOM_ALLOW_INSECURE was
added more recently, leading to an awkward relationship between the two.
It turns out that only one, RANDOM_BLOCK, is needed.
RANDOM_BLOCK means return cryptographically secure numbers no matter
what. If it's not set, it means try to do that, but if it fails, fall
back to using unseeded randomness.
This part of falling back to unseeded randomness is the intent of
GRND_INSECURE, which is what RANDOM_ALLOW_INSECURE previously aliased.
Rather than having an additional flag for that, it makes more sense to
just use it whenever RANDOM_BLOCK is not set. This saves us the overhead
of having to open up /dev/urandom.
Additionally, when getrandom returns too little data, but not zero data,
we currently fall back to using /dev/urandom if RANDOM_BLOCK is not set.
This doesn't quite make sense, because if getrandom returned seeded data
once, then it will forever after return the same thing as whatever
/dev/urandom does. So in that case, we should just loop again.
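Looping on short reads can be sketched with the Linux getrandom() call from `<sys/random.h>`: keep calling until the buffer is full instead of switching to /dev/urandom. The helper name `fill_random()` is hypothetical, and treating a zero-byte return as -EIO is an assumption made to avoid an endless loop in this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf completely from getrandom(), looping on short reads.
 * Returns 0 on success or a negative errno-style value. */
static int fill_random(void *buf, size_t n, unsigned flags) {
        unsigned char *p = buf;

        while (n > 0) {
                ssize_t l = getrandom(p, n, flags);
                if (l < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted: just retry */
                        return -errno;
                }
                if (l == 0)
                        return -EIO;            /* assumption: treat as error */

                /* Short read: the pool is seeded, so simply ask for the rest. */
                p += l;
                n -= (size_t) l;
        }
        return 0;
}
```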
Since there's never really a time where /dev/urandom returns some bytes
easily but more only with difficulty, we can also get rid of
RANDOM_EXTEND_WITH_PSEUDO. Once the RNG is initialized, bytes
should just flow normally.
This also makes RANDOM_MAY_FAIL obsolete, because the only case where it
mattered was where we'd fall back to /dev/urandom on old kernels and return
GRND_INSECURE bytes on new kernels. So also get rid of that flag.
Finally, since we're always able to use GRND_INSECURE on newer kernels,
and we only fall back to /dev/urandom on older kernels, also only fall
back to using RDRAND on those older kernels. There, the only reason to
have RDRAND is to avoid a kmsg entry about unseeded randomness.
The result of this commit is that we now cascade like this:
- Use getrandom(0) if RANDOM_BLOCK.
- Use getrandom(GRND_INSECURE) if !RANDOM_BLOCK.
- Use /dev/urandom if !RANDOM_BLOCK and no GRND_INSECURE support.
- Use /dev/urandom if no getrandom() support.
- Use RDRAND if we would use /dev/urandom for any of the above reasons
and RANDOM_ALLOW_RDRAND is set.
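The cascade can be modelled as a pure decision function, with kernel support passed in as booleans. This is only a sketch of the list above: the flag bit values and all other names are hypothetical, and the real code discovers kernel support by trying the syscalls rather than taking it as input.

```c
#include <stdbool.h>

/* Hypothetical flag bits; only the flag names come from the text. */
enum {
        RANDOM_BLOCK        = 1 << 0,
        RANDOM_ALLOW_RDRAND = 1 << 1,
};

typedef enum {
        SOURCE_GETRANDOM_BLOCKING,      /* getrandom(0) */
        SOURCE_GETRANDOM_INSECURE,      /* getrandom(GRND_INSECURE) */
        SOURCE_DEV_URANDOM,             /* /dev/urandom */
        SOURCE_RDRAND,                  /* RDRAND, only where /dev/urandom would be used */
} RandomSource;

static RandomSource pick_random_source(unsigned flags, bool have_getrandom, bool have_insecure) {
        if (have_getrandom) {
                if (flags & RANDOM_BLOCK)
                        return SOURCE_GETRANDOM_BLOCKING;
                if (have_insecure)
                        return SOURCE_GETRANDOM_INSECURE;
        }

        /* Reached when there is no getrandom() at all, or when !RANDOM_BLOCK
         * and GRND_INSECURE is unsupported: the /dev/urandom cases. */
        if (flags & RANDOM_ALLOW_RDRAND)
                return SOURCE_RDRAND;
        return SOURCE_DEV_URANDOM;
}
```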