units: change Requires=systemd-networkd.service → BindsTo= one more time
Follow-up for da15f8406e9aeb7908e1d92c02d2ff5147c7788a which did the
change for systemd-networkd-wait-online.service, let's also do this for
systemd-networkd-wait-online@.service
This is a wrapper around fd_reopen() that will reopen an fd if the
F_GETFL flags indicate this is necessary, and otherwise not.
This is useful for various utility calls that shall be able to operate
on O_PATH and without it, and might need to convert between the two
depending on what's passed in.
Joerg Behrmann [Wed, 23 Nov 2022 15:43:19 +0000 (16:43 +0100)]
kernel-install: Add uki layout
Currently the kernel-install man page only documents the bls layout for use
with the boot loader spec type #1. 90-loaderentry.install uses this layout to
generate loader entries and copy the kernel image and initrd to $BOOT.
This commit documents a second layout "uki" and adds 90-uki-copy.install,
which copies a UKI "uki.efi" from the staging area or any file with the .efi
extension given on the command line to
$BOOT/EFI/Linux/$ENTRY_TOKEN-$KERNEl_VERSION(+$TRIES).efi
This allows for both locally generated and distro-provided UKIs to be handled
by kernel-install.
Nick Rosbrook [Tue, 22 Nov 2022 17:50:33 +0000 (12:50 -0500)]
test: handle Debian's /etc/default/locale in testsuite-74.firstboot.sh
This handles a Debian-specific quirk where /etc/default/locale is used
instead of /etc/locale.conf. There is currently special handling for
this in testsuite-73.sh, so the quirk should be handled here too for
consistency.
Daan De Meyer [Wed, 23 Nov 2022 11:00:01 +0000 (12:00 +0100)]
repart: Prefer using loop devices to populate filesystems when available
Let's make sure we use loop devices if we have access to them and
only fall back to regular files if we can't use loop devices. We
prefer loop devices because when using mkfs --root options, we have
to populate a temporary staging tree which means we're copying every
file twice instead of once when using loop devices.
Jan Janssen [Wed, 2 Nov 2022 09:25:32 +0000 (10:25 +0100)]
stub: Detect empty LoadOptions when run from EFI shell
The EFI shell will pass the entire command line to the application it
starts, which includes the file path of the stub binary. This prevents
us from using the built-in cmdline if the command line is otherwise
empty.
Fortunately, the EFI shell registers a protocol on any images it starts
this way. The protocol even lets us access the args individually, making
it easy to strip the stub path off.
Mike Yuan [Wed, 23 Nov 2022 18:39:15 +0000 (02:39 +0800)]
systemctl: deprecate passing positional argument to reboot completely
(follow-up of #15958)
In #15958 we deprecated passing positional argument to reboot by
generate a warning. It's been two years now and I believe it can
be dropped completely, as per requested in #15773.
Yu Watanabe [Sat, 26 Nov 2022 00:46:40 +0000 (09:46 +0900)]
sd-netlink: append instead of prepend multipart message
Previously, e.g., networkd enumerated network interfaces with ifindex
in a decreasing order, as sd-netlink inverses the order of the received
multipart messages.
Let's keep the order of the multipart messages. Hopefully this changes
no behavior, as our code do not depend on the order of the received
multipart messages.
Before:
===
Nov 26 09:35:10 systemd[1]: Starting Network Configuration...
Nov 26 09:35:11 systemd-networkd[36185]: wlp59s0: Saved new link: ifindex=3, iftype=ETHER(1), kind=n/a
Nov 26 09:35:12 systemd-networkd[36185]: enp0s31f6: Saved new link: ifindex=2, iftype=ETHER(1), kind=n/a
Nov 26 09:35:12 systemd-networkd[36185]: lo: Saved new link: ifindex=1, iftype=LOOPBACK(772), kind=n/a
After:
===
Nov 26 09:45:18 systemd[1]: Starting Network Configuration...
Nov 26 09:45:19 systemd-networkd[38372]: lo: Saved new link: ifindex=1, iftype=LOOPBACK(772), kind=n/a
Nov 26 09:45:19 systemd-networkd[38372]: enp0s31f6: Saved new link: ifindex=2, iftype=ETHER(1), kind=n/a
Nov 26 09:45:19 systemd-networkd[38372]: wlp59s0: Saved new link: ifindex=3, iftype=ETHER(1), kind=n/a
Yu Watanabe [Sat, 26 Nov 2022 00:35:53 +0000 (09:35 +0900)]
sd-netlink: do not link non-multipart messages
Previously, if a single packet contains multiple non-multipart messages,
then the messages were linked and saved as a single entry, especially
even if the messages has different serial numbers. Though, not sure if
the kernel sends such packet. But at least for safety, let's link only
multipart messages.
Yu Watanabe [Thu, 24 Nov 2022 18:36:39 +0000 (03:36 +0900)]
sd-netlink: fix possible use-after-free
When we receive a multi-part message and fail to parse it, then
the prviously received message is freed with the _cleanup_ attribute,
but still referenced by sd_netlink.rqueue_partial. That causes
use-after-free when we receive another multi-part message.
Daan De Meyer [Fri, 25 Nov 2022 14:09:53 +0000 (15:09 +0100)]
repart: Remove bogus check
The --empty option applies to the partition table of the block
device, not the number of definition files we've read. Also, even
if we don't find any definition files, let's not shortcut execution
so we can run repart on a device/loopback file to get information
on the partition table.
Michal Koutný [Fri, 25 Nov 2022 16:25:36 +0000 (17:25 +0100)]
logind: Properly unescape names of lingering users
Filenames to store user linger requests are created with C-escaping.
When we enumerate the files to acquire ligering users, we use the
filenames verbatim. In the case C-escaping is not an identity map (such
as "DOMAIN\User"), we won't be able to start user instances of
such mangled users.
Unescape filenames when we treat them as usernames again.
Daan De Meyer [Fri, 25 Nov 2022 13:32:20 +0000 (14:32 +0100)]
units: Use BindsTo=systemd-networkd in systemd-networkd-wait-online.service
We don't want systemd-networkd-wait-online to start if systemd-networkd
is skipped due to condition failures. This is only guaranteed by BindsTo=
and not Requires=, so let's use BindsTo=
resolved: in dns stub always report "lo" as interface for "localhost"
Previously, we'd return the ifindex the user asked on, and if none was
specified "lo". Let's always return "lo".
This should be a better choice usually, since localhost addresses are
typically not reachable over arbitrary interfaces once SO_BINDTODEVICE
or so is used. Hence, let's report the interface that is always right
for these addresses.
Daan De Meyer [Tue, 22 Nov 2022 13:27:30 +0000 (14:27 +0100)]
repart: Add --skip-partitions=
--include-partitions and --exclude-partitions now fully exclude
partitions from repart. Whenever a partition type is excluded, we
don't take any partitions of that type into account at all when
running systemd-repart.
--skip-partitions= is introduced to do what --exclude-partitions did
previously. Any skipped partitions are taken into acount when doing
size calculations, but are not yet populated.
Why do we need both concepts? Exclusion is needed so that we can
use shared repart definitions to generate bootable and non-bootable
images. When generating a non-bootable image, we use --exclude-partitions
to exclude the ESP partition. Skipping is needed so that we can
populate the root partition while skipping the ESP partition, get
the roothash of the root partition, use that to generate a UKI, and
finally populate the ESP partition with the UKI included.
A NULL Bitmap object is by all our code considered identical to an empty
bitmap. Hence let's remove the entirely unnecessary assert().
The assert() can be triggered if debug monitoring is used an an empty
NSEC or NSEC3 RR is included in an answer resolved returns.
it's not really a security issue since enabling debug monitoring is a
manual step requiring root privileges, that is off by default. Moreover,
it's a "clean" assert(), i.e. the worst that happens is tha a coredump
is generated and resolved restarted.
Daan De Meyer [Wed, 23 Nov 2022 13:12:38 +0000 (14:12 +0100)]
mkfs-util: Skip non files/directories when calling mcopy
Only files and directories are supported by vfat. When we pass a
symlink to mcopy, it will try to dereference them and copy what the
symlink points at into the vfat partition instead. Let's avoid this
by skipping all unsupported file types when establishing the list of
top level targets that mcopy should copy.
We also use RECURSE_DIR_SORT everywhere when iterating directories
to make things more reproducible.
Jan Janssen [Wed, 23 Nov 2022 12:57:34 +0000 (13:57 +0100)]
stub: Fix splash alpha blending
How to interpret the pixel format depends on the masks in the DIB header
(if present). Also, 16bpp (unlike 24bpp) can carry an alpha channel.
This was previously not accounted for.
Currently, services use mount_move_root() in order to setup the root
directory of services using a mount namespace. This relies on MS_MOVE
and chroot(). However, this has serious drawbacks even for relatively
simple mount propagation scenarios.
What systemd currently does is roughly equivalent to the following shell
code:
unshare --mount --propagation=shared
cd /
mount --make-rslave /
mkdir /new-root
mount --rbind / /new-root
cd /new-root
mount --move /new-root /
chroot .
This looks simple enough but has the consequence that two separate mount
trees exist for the lifetime of the service. The first one was created
when the mount namespace was created, and the second one when a new
mount for the rootfs was created. The first mount tree sticks around as
a shadow mount tree. Both mount trees are dependent mounts with the host
rootfs as their dominating mount.
Now, when mount propagation is triggered by the host by e.g.,
mount --bind /opt /mnt
it means that two propagation events are generated. I'm skipping over
the exact kernel details as they aren't that important. The gist is that
for every propagation event that is generated a second one is generated
for the shadow mount tree. In other words, the kernel creates two copies
for each mount that is propagated instead of one.
This isn't necessary. We can simply change the sequence above to:
unshare --mount --propagation=shared
cd /
mount --make-rslave /
mkdir /new-root
# stash fd to old rootfs
# stash fd to new rootfs
mount --rbind / /new-root
mkdir /new-root
cd /new-root
pivot_root . .
# new root is tucked under old root
# chdir into old rootfs via stashed fd
umount -l /old-root
The pivot_root allows us to get rid of the old mount tree that was
created when the mount namespace was created. So after this sequence
only one mount tree is alive. Plus, it's safer and nicer. Moving mounts
isn't pleasnt.
This patch doesn't convert nspawn yet as the requirements are more
tricky given that it wants to preserve the rootfs as a shared mount
which goes against pivot_root() requirements.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Luca Boccassi [Wed, 23 Nov 2022 16:06:48 +0000 (16:06 +0000)]
portable: add a few more useful debug log messages
When attaching and /etc/systemd/system.attached can't be created or used
(eg: dead symlink) the logs are pretty much useless as even at debug
level there's no indication of what is going wrong.
Add some debug logs, and return a more specific error string over D-Bus.
Nick Rosbrook [Tue, 22 Nov 2022 16:30:03 +0000 (11:30 -0500)]
oomd: fix unreachable test case in test-oomd-util
This conditional with !empty_or_root(ctx->path) always returns false
because the most recent oomd_cgroup_context_acquire() call was with the
root cgroup. Make sure this test case can be reached by checking cgroup
instead of ctx->path.
While here, use an unused uid (61183) instead of the nobody uid so the
test case does not fail in unprivileged LXD containers.
Nick Rosbrook [Tue, 22 Nov 2022 15:33:55 +0000 (10:33 -0500)]
oomd: always allow root-owned cgroups to set ManagedOOMPreference
Commit 652a4efb66a ("oomd: loosen the restriction on ManagedOOMPreference")
made the change to allow ManagedOOMPreference on a cgroup candidate when
the monitored cgroup and cgroup candidate are owned by the same user.
The commit assumed that this check was sufficient to continue allowing
ManagedOOMPreference on all cgroups owned by root. However, it caused a
regression for unprivileged LXD containers where e.g. /sys/fs/cgroup is
owned by nobody (uid=65534).
Fix this by explicitly allowing the ManagedOOMPreference if uid == 0 in
oomd_fetch_cgroup_oom_preference().
hwdb: remove fuzz and deadzone for Simucube wheel bases.
For these devices the axes are setup via a special
configuration tool. udev should not apply additional
fuzz or deadzone.
Reference for the product IDs:
https://granitedevices.com/wiki/Simucube_product_USB_interface_documentation
This also indicates that there are a total of 8 axes.