firstboot: process the root account after sysusers created it
We would create root account from sysusers or from firstboot, depending on
which one ran earlier. Since firstboot offers more options, in particular can
set the root password, we needed to order it earlier. This created an ugly
ordering requirement:
We want sysusers.service to create basic users, so we can create nodes in dev,
so we can operate on block devices and such, so that we can resize and remount
things. But at the same time, systemd-firstboot.service can only work if it is
run early, before systemd-sysusers.service has created /etc/passwd. We can't
have it both ways: the units that want to have a fully writable root file
system cannot be ordered before units which are required to do file system
preparation.
Instead of trying to order firstboot very early, let's let it do its thing even
if it is started later. Instead of refusing to create to the root account if
/etc/passwd and /etc/shadow exist, actually check if the account is configured.
Now sysusers writes root account with password PASSWORD_UNPROVISIONED
("!unprovisioned"), and then firstboot checks for this, and will configure root
in this case.
This allows sysusers to be executed earlier (or accounts to be set up earlier
in another way).
shared/condition: add envvar override for the check for first-boot
Before 7cd43e34c5a302ff323c013f437092d2ff5ccbbf, it was possible to use
SYSTEMD_PROC_CMDLINE=systemd.condition-first-boot to override autodetection.
But now this doesn't work anymore, and it's useful to be able to do that for
testing.
units: create /dev with --graceful first, allow sysusers to run later
We want to call systemd-tmpfiles-setup-dev.service to create /dev/fuse and
other device nodes so that module probing will work. But it is possible that
when we're in first boot, some users or groups need to be created by
systemd-sysusers first. But it is also possible that systemd-sysusers cannot
actually execute configuration because the root partition is not fully writable
yet. So let systemd-tmpfiles-setup-dev.service run earlier, possibly without
all users and groups in place. Since systemd-tmpfiles-setup-dev.service writes
to /dev only, it doesn't care how the root partition is mounted. In this early
run, some some nodes might be created with default permissions (i.e. not
accessible to non-root users or groups). This should be OK for the early boot
phase. Afterwards, we let systemd-tmpfiles-setup.service execute full
configuration. We will configure any files in /dev twice, but considering that
there's only a few of them and that the second run should only adjust ownership
and permissions, this should be OK. This way, we avoid the dependency loop.
If systemd-repart is running sufficiently early, /var/tmp might not be in place
yet. But if there is nothing to minimize, we won't even use it. Let's move the
check right before the first use.
systemd-repart[441]: Device '/' has no dm-crypt/dm-verity device, no need to look for…
systemd-repart[441]: Device /dev/sda opened and locked.
systemd-repart[441]: Sector size of device is 512 bytes. Using grain size of 4096.
systemd-repart[441]: Could not determine temporary directory: No such file or directory
systemd[1]: systemd-repart.service: Child 441 belongs to systemd-repart.service.
systemd[1]: systemd-repart.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: systemd-repart.service: Failed with result 'exit-code'.
firstboot: clarify that machine-id options are only offline, add missing docs
Let's flat out refuse to configure machine-id on a running system with
systemd-firstboot. It wouldn't work anyway, because by the time firstboot is
started, pid1 has created /etc/machine-id, possibly with "unitialized", so
firstboot wouldn't touch the file. (If --force is specified, it works. So
let's allow that in case people want to do crazy things.)
While at it, add missing descriptions of various things that were added over
time, and group descriptions of similar options together.
units: make sure proc-sys-binfmt_misc.automount is actually stopped
As with other units, stopping of the automount requires actual work,
and without the ordering dependency systemd might not execute the stop
job before shutdown.target is reached and units ordered after that are
executed.
units/systemd-repart: stop pretending that root config is executed in the initrd
I have a system with /usr/lib/repart.d/50-root.conf with GrowFileSystem=yes.
The partition wouldn't be resized in the initrd, because
ConditionDirectoryNotEmpty=|/sysusr/usr/lib/repart.d was evaluated very early,
before /sysroot was mounted. There was no ordering dependency between
systemd-repart.service and sysroot.mount. (There was After=initrd-usr-fs.target,
but it seems to be only referred to by systemd-fstab-generator, which in my
case doesn't even run, because there's no fstab.)
But in fact, we neeed to run systemd-repart in the initrd only in limited
circumstances: when we need to create the root device based on config under
sysusr.mount. If there is config on the root device, it can be executed in
the host system, early during boot. Thus, let's remove the condition on
/sysroot/…. Without an ordering dependency on sysroot.mount, it was subject to
a race condition anyway. (A race condition with a low probability of "winning",
because systemd-repart.service has no dependencies, but sysroot.mount requires
a device to be detected and the mount to happen.)
The other problem was that systemd-repart.service didn't have the ordering wrt.
initrd-switch-root.target, so it was subject to the same race condition that
was fixed for other units in 7c0e2b555968d70ac563a37e32a6931ee90961a6. (If the
systemd-repart.service/stop job is slow, we could end up not restarting
systemd-repart.service in the host system.)
With the changes here, I see systemd-repart.service/start running twice:
in the initrd it is skipped because the conditions fail, and then in the
host system it runs normally.
Note: support for /sysroot is retained in systemd-repart code. I don't see a
strong reason to remove it, since it may still be useful to people invoking
repart in the initrd in other circumstances.
> The block is reordered and split to have:
> 1. description + documentation
> 2. (optionally) conditions
> 3. all the dependencies
The dependencies for shutdown.target are listed separately because they are the
other deps are for startup, and shutdown.target only matter much later.
Michal Sekletar [Mon, 22 May 2023 15:44:30 +0000 (17:44 +0200)]
core/service: when resetting PID also reset known flag
Re-watching pids on cgroup v1 (needed because of unreliability of cgroup
empty notifications in containers) is handled bellow at the end of
service_sigchld_event() and depends on value main_pid_known flag.
In CentOS Stream 8 container on cgroup v1 the stop action would get stuck
indefinitely on unit like this,
However, upstream works "fine" because in upstream version of systemd we
actually never wait on processes killed in containers and proceed
immediately to sending SIGKILL hence re-watching of pids in the cgroup
is not necessary. But for the sake of correctness we should merge the
patch also upstream.
Daan De Meyer [Mon, 22 May 2023 11:33:01 +0000 (13:33 +0200)]
mkosi: Make sure persistent journal storage is enabled
We ship with empty /var, so /var/log/journal does not exist, which
means journald does not do persistent logging. Let's fix that by
setting the config to explicitly enable persistent logging.
Frantisek Sumsal [Mon, 22 May 2023 11:24:12 +0000 (13:24 +0200)]
test: prefix "internal" stuff with an underscore
Since bash has no namespaces, let's do the second best thing and prefix
all "internal" stuff with an underscore, to minimize the chance of a name
conflict in the future.
The original patch was reverted because of issue #25369. The issue was created
because it wrongly assumed that sd_journal_seek_tail() seeks to 'current' tail.
But in fact, only when a subsequent sd_journal_previous() is called that it's
pointing to the tail at that time. The concept of 'tail' in sd_journal_seek_tail()
only has a logical meaning, and a sd_journal_previous is needed. In fact, if we
look at the codes in journalctl, we can see sd_journal_seek_tail() is followed by
sd_journal_previous(). By contrary, a sd_journal_next() after a 'logical' tail does
not make much sense. So the original patch is correct, and projects that are
using 'sd_journal_next()' right after 'sd_journal_seek_tail()' should do fixes
as in https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2823#note_1637715.
Yu Watanabe [Sun, 21 May 2023 16:56:08 +0000 (01:56 +0900)]
test-journal-interleaving: extend tests to clarify the issue in sd_journal_next() or friends
This illustrates bug in sd_journal_next() or friends;
calling sd_jounral_next() followed by sd_journal_seek_tail() makes the
location saved in sd-journal something corrupted, and subsequent
sd_journal_previous() or friends may fail or provides unexpected result.
Dmitry V. Levin [Tue, 16 May 2023 08:00:00 +0000 (08:00 +0000)]
udevadm-verify: introduce --no-summary option
When udevadm verify is invoked by an analyzer tool like rpminspect
to verify individual udev rules files, the summary just clutters the
output, so provide an option to turn the summary off.
Mike Yuan [Sat, 1 Apr 2023 11:44:29 +0000 (19:44 +0800)]
networkctl: add verb edit and cat to operate on network configs
This adds two verbs, edit and cat, to networkctl for
operating on network configs (namely .network, .netdev
and .link files). Specially, if the config name is
prefixed by @, it will be treated as network interface
name, and operations will be performed on config files
associated with the link.
units: order sysinit.target, debug-shell.service after systemd-vconsole-setup
Previous patch to add an implicit dependency effectively orders various getty
services after systemd-vconsole-setup.service. But I think it's cleaner to also
order the service before sysinit.target, like it was before 8125e8d38e3aa099c7dce8b0161997b8842aebdc. There might be units which don't do
use TTYVHangup= but would like to have the console fully initialized.
Also, add a manual ordering to debug-shell.service, because it has
ImplicitDependencies=no. This might delay debug-shell.service a bit, but
systemd-vconsole-setup.service has no dependencies and should be very quick, so
this should not be noticable in practice. Without the ordering, the terminal
might not have a key map loaded, making debug-shell.service hard to use.
We started systemd-vconsole-setup in two ways: via a dbus call from localed to
do systemd-vconsole-setup.service/restart, and from udev, calling the binary
directly. This patch makes udev call
systemctl restart systemd-vconsole-setup.service
effectively implementing the same method as localed.
Ordering is implemented at the unit level, so we can use --no-block to not
block here.
pid1: order units using TTYVHangup= after vconsole setup
The goal of this change is to delay getty services until after
systemd-vconsole-setup has finished. systemd-vconsole-setup starts loadkeys,
and it seems that when loadkeys is interrupted by the TTY hangup call we do
when starting tty services [1], so that loadkeys starts getting EIO from the
ioctl("/dev/tty1", KDSKBENT) syscall it does.
Initially I wanted to add ordering dependencies to individual units, but
TTYVHangup= can be added to other various external units too. The solution with
an implicit dependency should cover those cases too.
namespace-util: use inode_same_at() instead of FORMAT_PROC_FD_PATH()
Doesn't matter much, but this makes it leas magic and independent of
/proc/ mounts. (Well, it actually doesn't, since the right-hand path is
also in /proc/, but still...
Let's be more accurate about what this function does: it checks whether
the underlying reported inode is the same. Internally, this already uses
a better named stat_inode_same() call, hence let's similarly name the
wrapping function following the same logic.
Similar for files_same_at() and path_equal_or_same_files().
manager: restrict Dump*() to privileged callers or ratelimit
Dump*() methods can take quite some time due to the amount of data to
serialize, so they can potentially stall the manager. Make them
privileged, as they are debugging tools anyway. Use a new 'dump'
capability for polkit, and the 'reload' capability for SELinux, as
that's also non-destructive but slow.
If the caller is not privileged, allow it but rate limited to 10 calls
every 10 minutes.
units: order getty units after getty-pre.target unconditionally
Those two units had this ordering conditionalized on HAVE_SYSV_COMPAT. This
seems strange. 45e27532971ac84e835a2879df510a581f933fcd added the ordering
differently for those two files without any comment, and I think it was just
pasted or scripted erroneously.
Frantisek Sumsal [Fri, 19 May 2023 09:45:11 +0000 (11:45 +0200)]
test: build the SELinux test module on the host
Let's save some time and build the SELinux test module on the host
instead of a possibly unaccelerated VM. This brings the runtime of
TEST-06-SELINUX from ~12 minutes down to a ~1 minute.
Frantisek Sumsal [Fri, 19 May 2023 09:07:07 +0000 (11:07 +0200)]
test: drop generated stuff from the final coverage report
Let's drop stuff from the current $BUILD_DIR from the final coverage
report, as it's all generated files (mostly gperf) which we don't
really care about and it makes the Coveralls report confusing, since it
reports "source not available" for all such files.
Frantisek Sumsal [Fri, 19 May 2023 08:48:15 +0000 (10:48 +0200)]
test: make the stress test slightly less stressful on slower machines
Without acceleration this part of the test takes over 10 minutes (!),
which feels quite unnecessary. Let's cut down the number of stuff we
dump to the journal in such case, but keep the original value if we run
with acceleration (since in that case it takes less than 10 seconds).
Dmitry V. Levin [Thu, 18 May 2023 17:00:00 +0000 (17:00 +0000)]
udev-rules: avoid issuing redundant diagnostics in verify mode
When udevadm verify is given an argument that doesn't point to an
existing file, there used to be two diagnostics messages, the first one
at a warning level, and the second one at an error level:
$ build/udevadm verify /no/such/directory
Failed to open /no/such/directory, ignoring: No such file or directory
Failed to parse rules file /no/such/directory: No such file or directory
Luca Boccassi [Thu, 18 May 2023 12:08:56 +0000 (13:08 +0100)]
integration test: pass 'noresume' to qemu
When running on Debian/Ubuntu, I get a minute delay or so on every boot
because the local initramfs tries to resume from hibernation. This is
not really useful here, so always skip it