This is closely related to the previous commit: if the credentials dir
is empty and nothing mounted on it, let's remove it again.
This will in particular happen if we decided to not actually install the
mount we prepared for the credentials because it is empty. In that case
the mount point inode is already there, and with this we'll remove it.
Primary effect, users will see ENOENT rather than EACCESS when trying to
access it, which should be preferable, given we already handle that
nicely in our credential consumption code.
This should also be useful on systems where we lack any privs to create
mounts, and thus operate on a regular dir anyway.
Let's avoid creating another mount in the system if it's empty anyway.
This is mostl a cosmetic thing in one (pretty common) special case: if
creds settings are used in a unit but no creds actually available to be
passed.
(While we are at it this also does one more minor optimization: it
adjusts the MS_RDONLY/MS_NOSUID/… flags of the source mount we are about
to MS_MOVE into the right place only if we actually really move it, and
if we instead unmount it again we won't bother with the flags either)
There's no need to fchdir() out of the rootfs and back into it around
the umount2(), hence don't.
This brings the logic closer to what the pivot_root() man page suggests.
While we are at it, always operate based on fds, once we opened the
original dir, and pass the path string along only for generating
messages (i.e. as "decoration").
Add tests for both code paths: the pivot_root() one and the MS_MOUNT.
notify: don't send EXIT_STATUS= notify message from systemd-notify
In 623a00020f116d8e9c70608a9e4f7cc978342441 code was added that our
various programs send a notification message with their exit status on
exit. This is great, but it becomes utterly confusing in systemd-notify,
whose primary purpose is to send such messages after all, and sending an
implicit one in addition to the primary one is particularly confusing,
when debugging things.
Let's hence just drop the implicit message. systemd-notify's exit status
is after all indicative primarily because sd_notify() failed, and hence
it's pretty pointless to then send that fact as another sd_notify()
message.
(Primary reason for this patch is simply that it confused the hell out
of me, when debugging sd_notify() issues)
base-filesystem: add new helper base_filesystem_create_fd() that operates on an fd, instead of a path
This also changes the open flags from
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC|O_NOFOLLOW to
O_DIRECTORY|O_CLOEXEC. O_RDONLY is redundant, since O_RDONLY is zero
anyway, and O_DIRECTORY pins the acces mode enough: it doesn't allow
read()/write() anyway when specified. O_NONBLOCK is also pointless given
that O_DIRECTORY is specified, it has no meaning on directories. (It is
useful if we don't know much about the inode we are opening, and could
be a device node or fifo, but the O_DIRECTORY excludes that case.)
O_NOFOLLOW is dropped since there's really no point in blocking out the
initial entrypoint being a symlink. Once we pinned the the root of the
tree it might make sense to restrict symlink use below it, but for the
entrypoint itself it doesn't matter.
switch-root: don't require /mnt/ when switching root into host OS
So far, we invoked pivot_root() specifying /mnt/ as second argument,
which then unmounted right-after. We'd create /mnt/ if needed. This
sucks, because it means /mnt/ must strictly be pre-created on immutable
images.
Remove this limitation, by using pivot_root() with "." as source and
target, which will result in two stacked mounts afterwards: the new one
underneath, the old one ontop. We can then simply unmount the top one,
and have what we want without needing any extra /mnt/ dir.
Since we don't need /mnt/ anymore we can get rid of the extra
unmount_old_root parameter and simply specify it as NULL if we don't
want the old mount to stick around.
Jan Janssen [Tue, 2 May 2023 17:41:58 +0000 (19:41 +0200)]
boot: Use correct memory type for allocations
We were using the wrong memory type when allocating pool memory. This
does not seem to cause a problem on x86, but the kernel will fail to
boot at least on ARM in QEMU.
This is caused by mixing different allocation types which ended up
breaking the kernel or EDK2 during boot services exit. Commit 2f3c3b0bee5534f2338439f04b0aa517479f8b76 appears to fix this boot
failure because it was replacing the gnu-efi xpool_print with xasprintf
thereby unifying the allocation type.
But this same issue can also happen without this fix somehow when the
random-seed logic is in use.
msizanoen1 [Tue, 2 May 2023 09:59:07 +0000 (16:59 +0700)]
core: check for SERVICE_RELOAD_NOTIFY in manager_dbus_is_running
This ensures that systemd won't erronously disconnect from the system
bus in case a bus recheck is triggered immediately after the bus service
emits `RELOADING=1`.
This fixes an issue where systemd-logind sometimes randomly stops
receiving `UnitRemoved` after a system update.
This also handles SERVICE_RELOAD_SIGNAL just in case somebody ever
creates a D-Bus broker implementation that uses `Type=notify-reload`.
generators: skip private tmpfs if /tmp does not exist
When spawning generators within a sandbox we want a private /tmp, but it
might not exist, and on some systems we might be unable to create it
because users want a BTRFS subvolume instead.
base-filesystem: create /proc, /sys, /dev mount points as 0555
These inodes are going to be overmounted anyway, hence let's create them
with access mode 555, so that they are as close to being immutable as
regular UNIX access modes allow them to be. In other words: this takes
the "w" mode away for root. This of course usually has little effect --
unless CAP_DAC_OVERRIDE is dropped. But at the very least it makes the
point clear that inodes should be considered immutable.
(I intended to make this 0000 originally, but that doesn't work, as many
tools – including our own – have fallback paths that when they see
ENOENT in /proc/ they can handle this gracefully. But changing the mode
to 000 would turn this to EACCES - something they usually have no
fallback path for)
This slightly extends the symbol file test and checks which symbols are
listed in one list but missing in the other. This is tremendously useful
to quickly determine which symbols wheren't exposed properly but should
have been.
(This is is implemented in pure C, no systemd helpers, to ensure we see
libsystemd.so API as any other tool would.)
mkosi: Switch to use mkosi presets with prebuilt initrds
Instead of building the initrds for the mkosi images with dracut,
let's switch to using mkosi presets to build the initrd with mkosi
as well.
This commit splits up our single image build into three separate
mkosi presets:
1. The "base" preset. This image contains systemd and all its runtime
dependencies. The sole purpose of this image is to serve as a base image
for the initrd and the final image. It's also responsible for building
systemd from source with the build script. The results are installed into
the base image. Note that we install the systemd and udev packages into this
image as well to prevent package managers from overriding the systemd we built
from source with the distro packaged systemd if it's pulled in as a dependency
by another package from the initrd or final profiles.
2. The "initrd" preset. This image provides the initrd. It's trivial and does
nothing more than packaging the base image up as a zstd compressed initramfs and
adds /init and /etc/initrd-release symlinks to the image.
3. The "final" preset. This image builds on top of the base image and adds
a kernel and extra packages that are useful for testing and debugging.
We also split out the optional kernel build into a separate set of config files
that are only included if a kernel to build is actually provided.
Note that this commit doesn't really change anything about how mkosi is used.
The commands remain the same, except that mkosi will now build all the presets
in order. "mkosi summary" will show the summary of all the presets. "mkosi qemu,
boot, shell" will always boot the final preset. With "-f", all presets will be
built and the final one is booted. "-i" makes a cache of each preset.
The only thing to keep in mind is that specifying config via the mkosi CLI will
apply to each of the presets. e.g. any extra packages added with "-p" will be
installed in both the initrd and the final image. To apply local configuration
to a single preset, create a file 00-local.conf in
mkosi.presets/<profile>/mkosi.conf.d and put all the preset specific configuration
in there.
Last month I monkey-patched journald to produce a small (64K) but valid
journal and used that as an input to four AFL fuzzers. After a month it
generated quite a nice corpora (4738 test cases) and after filtering
and minimizing it I was left with 619 unique journals with various
levels of corruption that probe the journal code.
It seems to detect past issues like systemd#26567, etc.
Yu Watanabe [Mon, 1 May 2023 05:18:08 +0000 (14:18 +0900)]
sd-journal: introduce simple loop detection for entry array objects
If .next_entry_array_offset points to one of the previous entry or the
self entry, then the loop for entry array objects may run infinitely.
Let's assume that the offsets of each entry array object are in
increasing order, and check that in loop.
don bright [Sun, 30 Apr 2023 03:33:13 +0000 (22:33 -0500)]
hwdb: add hardware rfkill key for Dell Latitude E6* models (#27462)
Hello
This pull req is adapting pull req #5772 (which fixed issue #5047), for the very similar computer Dell Latitude E6420 which has the same problem with the hardware switch to toggle wifi (aka rfkill). The symptom is the following repeated msgs in dmesg
[ 309.010284] atkbd serio0: Use 'setkeycodes e008 <keycode>' to make it known.
[ 309.016020] atkbd serio0: Unknown key pressed (translated set 2, code 0x88 on isa0060/serio0).
Adding this line to include E6 models causes these messages to stop showing in dmesg
As the systemd-pstore process is quite short lived, it might sometimes
lack the necessary metadata to make matching against a unit or a syslog
tag work. Since we already use a cursor file to make the matching window
small as possible, let's just drop the unit match completely and hope
for the best.
core/path: do not enqueue new job in .trigger_notify callback
Otherwise,
1. X.path triggered X.service, and the service has waiting start job,
2. systemctl stop X.service
3. the waiting start job is cancelled to install new stop job,
4. path_trigger_notify() is called, and may reinstall new start job,
5. the stop job cannot be installed, and triggeres assertion.
So, instead, let's add a defer event source, then enqueue the new start
job after the stop (or any other type) job finished.
pid1: unify implemenation of /run/ disk space safety check a bit
reload/reexec currently used a separate implementation of the /run/ disk
space check, different from the one used for switch-root, even though
the code is mostly the same. The one difference is that the former
checks are authoritative, the latter are just informational (that's
because refusing a reload/reexec is relatively benign, but refusing a
switch-root quite troublesome, since this code is entered when it's
already "too late" to turn turn back, i.e. when the preparatory
transaction to initiate the switch root are already fully executed.
Let's share some code, and unify codepaths.
(This is preparation for later addition of a "userspace reboot" concept)
core/systemctl: when switching root default to /sysroot/
We hardcode the path the initrd uses to prepare the final mount point at
so many places, let's also imply it in "systemctl switch-root" if not
specified.
This adds the fallback both to systemctl and to PID 1 (this is because
both to — different – checks on the path).