These are wrappers around getpwuid_r() and friends, and will allocate the
right-sized buffer for this call.
We so far had multiple implementations of a buffer allocation loop
around getpwuid_r() and friends, and they all suck in some way. Let's
clean this up and add a common implementation, and use it everywhere.
Also, be more careful with error numbers, in particular systematically
turn ENOENT into ESRCH (the former is what is returned if /etc/passwd
is absent, which we want to consider identical to user not existing,
which is ESRCH). We so far did this at some invocations, but not all.
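The error-number normalization described above can be sketched as follows. This is only a small Python illustration of the mapping, not the C helper this commit adds; `normalize_lookup_errno` is a hypothetical name, and `ESRCH` is the errno constant for "no such user/process".

```python
import errno


def normalize_lookup_errno(err: int) -> int:
    """Map "user database missing" onto "no such user".

    ENOENT is what a lookup reports when /etc/passwd is absent; callers
    want to treat that exactly like a user that does not exist (ESRCH).
    All other error numbers pass through unchanged.
    """
    return errno.ESRCH if err == errno.ENOENT else err
```

Doing this in one shared place is what keeps individual call sites from forgetting the conversion.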
There are some invocations of getpwuid() left in the codebase. We really
should fix those too, and have a single unified implementation of the
logic, but those are not as trivial to convert, so left for another
time.
nspawn,vmspawn: let's add some terminal magic to the welcome text
Let's grey the text out, and prefix it with a vertical grey bar, to make
clear this is output from the host, not the payload, and make it clearly
distinguishable from what follows.
Let's also make the image name clickable (with new enough
shared-mime-info this should allow you to look into the image with
gnome-disk-utility or a similar tool).
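The kind of terminal output described above could look roughly like this. It is a sketch under assumptions: the grey shade, the bar glyph and the helper names are made up for illustration and are not taken from the nspawn/vmspawn sources. The hyperlink uses the widely supported OSC 8 escape sequence.

```python
GREY = "\x1b[38;5;245m"  # 256-colour grey; the exact shade is an assumption
RESET = "\x1b[0m"        # reset all attributes
BAR = "\u258c"           # left half block used as the vertical bar (assumption)


def hyperlink(url: str, text: str) -> str:
    """Wrap text in an OSC 8 terminal hyperlink pointing at url."""
    return f"\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\"


def welcome_line(text: str) -> str:
    """Prefix a host-side message with a grey bar and grey out the text."""
    return f"{GREY}{BAR} {text}{RESET}"
```

A caller might print `welcome_line("Spawning " + hyperlink("file:///path/to/image.raw", "image.raw"))`, where the path is a placeholder. Terminals without OSC 8 support simply show the plain text.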
vmspawn: make "-m" value formatting independent of locale
We cannot format the memory string via printf() %f format strings, since
that's locale dependent and qemu doesn't like that. Hence format this as
an integer. We'll lose sub-MiB accuracy, but systems with less than 1
MiB memory don't really make much sense anyway.
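To illustrate the idea: C's printf("%f") honours the locale's decimal separator, whereas an integer number of MiB contains no separator at all. A minimal sketch follows, in which the function name and the choice to round down are assumptions, not what vmspawn necessarily does.

```python
MIB = 1024 * 1024


def qemu_memory_arg(size_bytes: int) -> str:
    """Format a RAM size for qemu's -m option as whole MiB.

    Integer formatting never emits a decimal separator, so the result is
    the same under every locale. Rounding down is an assumption here.
    """
    return f"{size_bytes // MIB}M"
```

For example, 2 GiB becomes "2048M", and any sub-MiB remainder is dropped.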
vmspawn: close host vsock fd once we passed it to the child
Without this, qemu simply froze in a weird state for me when I killed it:
it was supposedly a zombie, but we'd get the pidfd POLLIN event for it
only once the fd was closed. Hence let's close it right away.
(Smells like a kernel issue actually, but too lazy to bother with this).
Daan De Meyer [Mon, 22 Jan 2024 11:04:45 +0000 (12:04 +0100)]
mkosi: Use authselect local profile if it exists
authselect 1.5.0 removed the "minimal" profile and added the "local"
profile instead. Let's modify our post-installation script to take
these changes into account.
Adrian Vovk [Sat, 30 Dec 2023 19:06:39 +0000 (14:06 -0500)]
core: path: Re-enter waiting if target is deactivating
Previously, path units would remain in the running state while their
target unit is deactivating. This left a window of time where the target
unit is no longer operational (i.e. it is busy deactivating/cleaning
up/etc) but the path unit would continue to ignore inotify events. In
short: any inotify event that occurs while the target unit deactivates
would be completely lost.
With this commit, the path will go back into a waiting state when the
target unit starts deactivating. This means that any inotify event that
occurs while the target unit deactivates will queue a start job.
ptyfwd: when leaving a session with tinted background, clear to end of screen
So if we tint the background of a ptyfwd session with a color and the
session ends, then so far we reset the bg color and clear till the end
of line.
Let's instead clear till the end of the screen. This is nicer since it
means that any follow-up output will not be affected by the changed
background color anymore.
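In terms of escape sequences, the difference is between "erase in line" and "erase in display". A sketch, assuming the background is reset with SGR 49 (ptyfwd may use a different reset sequence); the constant and function names are hypothetical.

```python
RESET_BACKGROUND = "\x1b[49m"      # SGR 49: back to the default background
ERASE_TO_END_OF_LINE = "\x1b[K"    # EL 0: what was emitted so far
ERASE_TO_END_OF_SCREEN = "\x1b[J"  # ED 0: what is emitted now


def leave_tinted_session() -> str:
    """Sequence written when a tinted session ends."""
    return RESET_BACKGROUND + ERASE_TO_END_OF_SCREEN
```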
Luca Boccassi [Thu, 18 Jan 2024 19:32:47 +0000 (19:32 +0000)]
portable: log structured message when attach/detach succeeds
Currently portabled is completely silent (when not using debug level), so
when the system state is changed (i.e. a portable is attached or detached)
there are no traces left in the journal. Log at info level when either of
those operations succeeds, as they are effectively changing the state of
the system.
Create new MESSAGE_IDs for these logs, and also append PORTABLE_ROOT=
(and PORTABLE_EXTENSION= if any), like the units themselves are
configured to do via LogExtraFields=, so that the same metadata can
be found in the attach/detach messages and in logs from the units
themselves.
Let's put the run queue really in the last spot, as we should only start
doing more work if we really have nothing else to do anymore.
Let's move the service watchdog after the rewatch PID logic for similar
reasons: it will possibly result in new jobs being enqueued to stop
things, and we should really have done all other work first.
manager: process exec_fd (i.e. Type=exec) events before SIGCHLD events
We want to make sure we don't confuse the case "process started
successfully but then failed quickly" with the case "process failed to
start". Hence we need to make sure we take notice of Type=exec before we
bother with SIGCHLD.
Hence move EVENT_PRIORITY_EXEC_FD to the front. In fact, let's move it
even further up than SIGCHLD, i.e. before sd_notify() handling, so that
we don't end up processing service state change notifications before we
even considered that the service is properly started.
This also gives the cgroup OOM handling and the exec_fd handling
different priorities. To improve robustness of the system, we should act
quickly on OOM, and it doesn't matter if a service started successfully
if we have to act on OOM anyway.
This is based on Andrew Onyshchuk <andryk.rv@gmail.com> work here:
core: maintain a single table with event source priorities
It's hard to keep an overview of the processing priorities assigned to the
various event sources we have. Let's unify them in a table (an enum), where
we can have a single consistent look at them, and then reference the table
entries by expressive symbols.
This doesn't change behaviour in any way: it just gives each priority a
nice label, without changing any of the priorities.
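A priority table of this kind can be sketched as an enum. The example below is illustrative only: the member names are shortened, the numeric values are invented, and only the relative order stated in the entries above (exec_fd before sd_notify() before SIGCHLD, run queue last) is reflected. As in sd-event, smaller values are dispatched first.

```python
from enum import IntEnum


class EventPriority(IntEnum):
    # Smaller values are dispatched first. The numbers are made up.
    EXEC_FD = -3
    NOTIFY = -2
    SIGCHLD = -1
    RUN_QUEUE = 10


def dispatch_order(pending):
    """Return the pending event sources in the order they would be handled."""
    return sorted(pending)
```

With all priorities in one place, reordering two event sources becomes a change to two adjacent lines rather than a hunt for scattered constants.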
Clayton Craft [Fri, 19 Jan 2024 00:20:55 +0000 (16:20 -0800)]
boot: don't print error if device tree fixup protocol isn't supported
This isn't a failure we care about, and it's somewhat alarming to see a
red error message flash up on the display when booting, so this just
simply returns EFI_SUCCESS and skips printing the "error" altogether.
Frantisek Sumsal [Thu, 18 Jan 2024 16:20:52 +0000 (17:20 +0100)]
journalctl: consider shut down namespaced sd-journald instance synced
If the namespaced systemd-journald instance was shut down due to
inactivity, we can consider it synchronized, so avoid throwing an error
in such a case.
This should help with the random TEST-44-LOG-NAMESPACE failures where we
might try to sync the namespace just after it was shut down:
[ 7.682941] H testsuite-44.sh[381]: + systemd-run --wait -p LogNamespace=foobaz echo 'hello world'
[ 7.693916] H systemd-journald[389]: Failed to open /dev/kmsg, ignoring: Operation not permitted
[ 7.693983] H systemd-journald[389]: Collecting audit messages is disabled.
[ 7.725511] H systemd[1]: Started systemd-journald@foobar.service.
[ 7.726496] H systemd[1]: Listening on systemd-journald-varlink@foobaz.socket.
[ 7.726808] H systemd[1]: Listening on systemd-journald@foobaz.socket.
[ 7.750774] H systemd[1]: Started run-u3.service.
[ 7.795122] H systemd[1]: run-u3.service: Deactivated successfully.
[ 7.842042] H testsuite-44.sh[390]: Running as unit: run-u3.service; invocation ID: 56380adeb36940a8a170d9ffd2e1e433
[ 7.842561] H systemd[1]: systemd-journald-varlink@foobaz.socket: Deactivated successfully.
[ 7.842762] H systemd[1]: Closed systemd-journald-varlink@foobaz.socket.
[ 7.846394] H systemd[1]: systemd-journald@foobaz.socket: Deactivated successfully.
[ 7.846566] H systemd[1]: Closed systemd-journald@foobaz.socket.
[ 7.852983] H testsuite-44.sh[390]: Finished with result: success
[ 7.852983] H testsuite-44.sh[390]: Main processes terminated with: code=exited/status=0
[ 7.852983] H testsuite-44.sh[390]: Service runtime: 44ms
[ 7.852983] H testsuite-44.sh[390]: CPU time consumed: 8ms
[ 7.852983] H testsuite-44.sh[390]: Memory peak: 880.0K
[ 7.852983] H testsuite-44.sh[390]: Memory swap peak: 0B
[ 7.853785] H testsuite-44.sh[381]: + journalctl --namespace=foobar --sync
[ 7.860095] H systemd-journald[389]: Received client request to sync journal.
[ 7.862119] H testsuite-44.sh[381]: + journalctl --namespace=foobaz --sync
[ 7.868381] H journalctl[396]: Failed to connect to /run/systemd/journal.foobaz/io.systemd.journal: Connection refused
[ 7.871498] H systemd[1]: testsuite-44.service: Main process exited, code=exited, status=1/FAILURE
[ 7.871642] H systemd[1]: testsuite-44.service: Failed with result 'exit-code'.
[ 7.930772] H systemd[1]: Failed to start testsuite-44.service.
Yu Watanabe [Tue, 16 Jan 2024 13:36:29 +0000 (22:36 +0900)]
network/route: convert route before requesting
Previously, we would:
1. use the passed Route object as is when a route is requested,
2. when the route becomes ready to configure, convert the Route object
if necessary, to resolve outgoing interface name, and split multipath
routes, and save them to the associated interfaces,
3. configure the route with the passed Route object.
However, there are several inconsistencies with what the kernel does:
- The kernel neither merges nor splits IPv4 multipath routes. However, we
unconditionally split multipath routes to manage them.
- The kernel does not set a gateway or similar attributes on a route if it
has a nexthop ID.
Fortunately, I have not found any issues caused by the inconsistencies. But
for safety, let's manage routes in a way consistent with the kernel.
With this change, we now:
1. when a route is requested, split IPv6 multipath routes, but keep IPv4
multipath routes as is, and queue (possibly multiple) requests for
the route.
2. when the route becomes ready to configure, resolve nexthop and interface
name, and requeue request if necessary.
3. configure the (possibly split) route.
By using this logic,
- Now we manage routes in a mostly consistent way with the kernel.
- We can drop ConvertedRoutes object.
- Hopefully the code becomes much simpler.
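The request-time splitting rule from step 1 above can be sketched like this. It is a simplified model, not networkd's data structures: the `Route` class, its fields and `routes_to_request` are hypothetical names for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    family: str                     # "ipv4" or "ipv6"
    destination: str
    nexthops: Tuple[str, ...] = ()  # gateway addresses of a multipath route


def routes_to_request(route: Route) -> List[Route]:
    """Split IPv6 multipath routes; keep everything else as is."""
    if route.family == "ipv6" and len(route.nexthops) > 1:
        # One request per nexthop, mirroring how the kernel tracks them.
        return [replace(route, nexthops=(nh,)) for nh in route.nexthops]
    return [route]
```

An IPv6 route with two nexthops thus yields two queued requests, while an IPv4 multipath route stays a single request.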
Yu Watanabe [Wed, 17 Jan 2024 01:07:19 +0000 (10:07 +0900)]
nspawn-network: also check alternative names
If the requested new name for a network interface is already assigned as an
alternative name, then it is neither necessary nor possible to rename the
interface.
Yu Watanabe [Wed, 17 Jan 2024 00:48:12 +0000 (09:48 +0900)]
nspawn-network: split out move_network_interface_one()
This also changes to use sd_device to get some attributes.
So, when moving interfaces back to the parent, we need to populate the sysfs
associated with the client netns.
That may look redundant and complicated, but it makes later changes
easier, and hopefully faster.
Although DocBook 4.5 states that `cmdsynopsis` can be used within `term` [1],
and `term` within `varlistentry`, `man` does not display the list of commands
after this change. FWIW, `cmdsynopsis` is used tree-wide within `refsynopsisdiv`
only.
Black-Hole1 [Fri, 19 Jan 2024 03:38:49 +0000 (11:38 +0800)]
virt: support detection of Apple Virtualization guests with cpuid
This is a supplement to #24419. On macOS Intel machines, detection needs to be done through cpuid.
In macOS, `dmi_vendors` detection is only applicable to M series.
Alberto Planas [Thu, 18 Jan 2024 14:38:30 +0000 (15:38 +0100)]
Measure empty PK and KEK EFI vars
The OVMF UEFI firmware is measuring PK and KEK when secure boot is
disabled, and those variables are absent. This can be checked via the
event log to see that there are extensions for PCR 7 associated with PK
and KEK events of type EV_EFI_VARIABLE_DRIVER_CONFIG.
When running the "lock-secureboot-policy" verb, pcrlock complains that
those variables are not found and refuses to generate the
240-secureboot-policy.pcrlock.d/generated.pcrlock file.
The "TCG PC Client Platform Firmware Profile Specification Version 1.05
Revision 23"[1] from May 7, 2021, states in section "3.3.4.8 PCR[7] - Secure
Boot Policy Measurements", point 10.b:
If reading a UEFI variable returns UEFI_NOT_FOUND, platform firmware
SHALL measure the absence of the variable. The
UEFI_VARIABLE_DATA.VariableDataLength field MUST be set to zero and
UEFI_VARIABLE_DATA.VariableData field will have a size of zero.
This patch marks those variables as "synthesize empty",
generating the correct hash for those variables.
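What "measuring the absence of the variable" amounts to can be sketched as follows. The layout follows the UEFI_VARIABLE_DATA structure as I understand it from the TCG specification (GUID, name length in characters, data length, UTF-16 name, data); the field order, the helper names and the use of SHA-256 are assumptions of this sketch and should be checked against the specification and the pcrlock sources.

```python
import hashlib
import struct
import uuid

# GUID of the EFI global variable namespace, which holds PK and KEK.
EFI_GLOBAL_VARIABLE = uuid.UUID("8be4df61-93ca-11d2-aa0d-00e098032b8c")


def empty_variable_event(name: str, guid: uuid.UUID = EFI_GLOBAL_VARIABLE) -> bytes:
    """Serialize UEFI_VARIABLE_DATA for a variable that does not exist."""
    encoded_name = name.encode("utf-16-le")
    return (
        guid.bytes_le                                # VariableName (GUID)
        + struct.pack("<Q", len(encoded_name) // 2)  # UnicodeNameLength, in characters
        + struct.pack("<Q", 0)                       # VariableDataLength is zero
        + encoded_name                               # UnicodeName
    )                                                # VariableData is empty


def empty_variable_digest(name: str) -> bytes:
    """SHA-256 digest of the synthesized event for an absent variable."""
    return hashlib.sha256(empty_variable_event(name)).digest()
```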
dissect-image: introduce new get_common_dissect_directory() helper
So far, if some component mounted a DDI in some local mount namespace we
created a temporary mountpoint in /tmp/ for that. Let's use the
same directory inode in /run/ instead. This is safe, since if everything
runs in a local mount namespace (with propagation on /run/ off) then
they shouldn't fight for the inode. And it relieves us from having to
clean up the directory after use. Moreover, it allows us to run without
/tmp/ mounted.
This only moves dissect-image.c and the dissect tool over. More stuff is
moved over later.
man: don't suggest using pam_unix.so's use_authtok switch
Our dumbed down example PAM stacks do not contain cracklib/pwq modules,
hence using use_authtok on the pam_unix.so password change stack won't
work, because it has the effect that pam_unix.so never asks for a
password on its own, expecting the cracklib/pwq modules to have
queried/validated them beforehand.
I noticed this issue because of #30969: Debian's PAM setup suffers from
the same issue, even though they don't actually use our suggested PAM
fragments at all.
mime: expose a mime type for encrypted credentials
Let's make things nice for desktops, and provide a mime type for
credential files.
This uses the 128-bit header identifier that our credential files start
with. However, the files are always base64 encoded, hence we have to
match the base64 string. Hence add a small test case that generates them
properly for us, and truncates them at the right place (since 128 is not
evenly divisible by 6).
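The truncation arithmetic works out as follows: each base64 character carries 6 bits, so a 128-bit header fully determines only the first 128 // 6 = 21 characters (126 bits). The 22nd character mixes the last 2 header bits with the data that follows. A small sketch, using a made-up header rather than the real credential identifier; the function name is hypothetical.

```python
import base64

HEADER_BITS = 128
STABLE_CHARS = HEADER_BITS // 6  # 21 characters depend on the header alone


def stable_base64_prefix(header: bytes) -> str:
    """Return the base64 characters that depend only on the 16-byte header."""
    if len(header) * 8 != HEADER_BITS:
        raise ValueError("expected a 128-bit header")
    return base64.b64encode(header).decode("ascii")[:STABLE_CHARS]
```

A MIME magic pattern can therefore match only on this 21-character prefix, regardless of what payload follows the header.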
Mike Yuan [Wed, 17 Jan 2024 11:52:40 +0000 (19:52 +0800)]
hibernate-util: log that we actually read /sys/power/resume* rather than cmdline
/sys/power/resume is always populated by the initrd, while
/sys/power/resume_offset might have been populated by
the kernel itself. Therefore, if the user is using an initrd
that doesn't include a resume hook, the hibernation would fail,
which is expected. However, it was hard to track down the real
problem, since the previous log message suggested that resume=
is not set through kernel cmdline.
varlink: introduce varlink_call_and_log() which calls and then logs an error
As it turns out we do this in a similar way at various times (and
sometimes incorrectly), hence add a common implementation to share the
code and fix the incorrect behaviour.
varlink: drop "ret_flags" parameter from varlink_call()
The parameter returns the flags field of the reply message. This is only
relevant in very few cases, hence drop it from the call, but keep it in
a more generic varlink_call_full() call for those who need it.