With certain fstabs we may propagate ENXIO from the $SYSTEMD_SYSFS_CHECK
check all the way up, making fstab-generator exit with a non-zero EC and
without any helpful message, which is really confusing.
We now modify our cmdline to use '=' for all arguments,
but didn't change early setup check to work with that.
So every daemon-reexec does a full setup, thus breaking
running user sessions.
Daan De Meyer [Thu, 22 Dec 2022 13:59:56 +0000 (14:59 +0100)]
find-esp: Add openat() like helpers that operate on fds
We also rework the internals of find-esp to work on directory file
descriptors instead of absolute paths and do a lot of general cleanups.
By passing the parent directory file descriptor to verify_fsroot_dir()
along with the name of the directory we're operating on, we can get rid
of the fallback that goes via path to open the parent directory if '..'
fails due to permission errors.
Daan De Meyer [Thu, 30 Mar 2023 08:39:53 +0000 (10:39 +0200)]
btrfs-util: Add btrfs_get_block_device_at()
Let's make btrfs_get_block_device_fd() more generic by renaming it
to btrfs_get_block_device_at() so it can operate on only paths, dir_fd
and path, or only on fd by using xopenat().
Let's always operate on paths without resolving the final component.
If the path is a symlink, it could point to a vendor default in /usr,
in which case we definitely do not want to modify the vendor defaults.
To avoid this from happening, we replace the symlink with our own file
instead of modifying the file the symlink points at.
Frantisek Sumsal [Fri, 31 Mar 2023 16:42:38 +0000 (18:42 +0200)]
test: set ReadWritePaths= for test-.services when built w/ coverage
Let's make the dropin, to make the build dir writable for gcov, a bit
more generic, so it can be used by all units starting with prefix test-.
This should help with a bunch of recent reports about missing coverage I
got, as well as with existing test units using DynamicUser=true.
This might feel a bit like a magic trick from behind the curtains, but I
want to touch the actual tests as little as possible, since it makes them
unnecessarily messy (see the various workarounds for sanitizers), and
the coverage reports are generated only in a specific CI job anyway.
core: skip deps on oomd if v2 or memory unavailable
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2055664 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2172146
User report that systemd repeatedly logs about not being able to start oomd
when booted with v1:
Feb 20 16:52:33 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[2067491]: Queued start job for default target default.target.
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
Feb 20 16:52:34 systemd[2067491]: Created slice app.slice - User Application Slice.
Feb 20 16:52:34 systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).
systemd-oomd.service that pulls systemd-oomd.socket in (because it requires
it); systemd-oomd.service itself is pulled by user@.service because
systemd-oomd package installs an override config file that sets
ManagedOOMMemoryPressure=kill.
Add a check to the code that adds the implicit dependency to skip the
dep if we cannot start it. The check is done exactly the same as in oomd
itself.
Thomas Blume [Tue, 14 Mar 2023 14:21:29 +0000 (15:21 +0100)]
test: use setpriv instead of su for user switch from root
systemd-repart needs to find mkfs.ext4 for the test.
This is located in the directory /usr/sbin on openSUSE Tumbleweed.
But since the variable ALWAYS_SET_PATH in /etc/login.defs is set to yes,
su re-initializes the $PATH variable and removes /usr/sbin.
Hence, mkfs.ext4 is not found and the test fails.
Using setpriv instead of su fixes this issue and is more appropriate to
do the switch user task from root.
[zjs: move setpriv to $BASICTOOLS and force-push to retrigger CI]
TODO: drop items regarding swap-for-hibernate-only-use
I doubt we should bother. Swap always makes sense, and having a swap
partition for hibernate only without using it all the time just makes
the system worse overall.
bootctl: clean up handling of files with no version information
get_file_version() would return:
- various negative errors if the file could not be accessed or if it was not a
regular file
- 0/NULL if the file was too small
- -ESRCH or -EINVAL if the file did not contain the marker
- -ENOMEM or permissions errors
- 1 if the marker was found
bootctl status iterates over /EFI/{systemd,BOOT}/*.efi and checks if the files
contain a systemd-boot version tag. Resource or permission errors should be
fatal, but lack of version information should be silently ignored.
OTOH, when updating or installing bootloader files, the version is expected
to be present.
get_file_version() is changed to return -ESRCH if the version is unavailable,
and other errnos for permission or resource errors.
The logging is reworked to always display an error if encountered, but also
to log the status at debug level what the result of the version inquiry is.
This makes it figure out what is going on:
/efi/EFI/systemd/systemd-bootx64.efi: EFI binary LoaderInfo marker: "systemd-boot 253-6.fc38"
/efi/EFI/BOOT/BOOTfbx64.efi: EFI binary has no LoaderInfo marker.
/efi/EFI/BOOT/BOOTIA32.EFI: EFI binary has no LoaderInfo marker.
/efi/EFI/BOOT/BOOTX64.EFI: EFI binary LoaderInfo marker: "systemd-boot 253-6.fc38"
Frantisek Sumsal [Thu, 30 Mar 2023 17:26:53 +0000 (19:26 +0200)]
coverage: add a wrapper for execveat()
gcov provides wrappers for the exec*() calls but there's none for execveat(),
which means we lose all coverage prior to the call. To mitigate this, let's
add a simple execveat() wrapper in gcov's style[0], which dumps and resets
the coverage data when needed.
This applies only when we're built with -Dfexecve=true.
We have three states:
- ENABLE_COREDUMP and systemd-coredump is installed,
- ENABLE_COREDUMP but systemd-coredump is not installed,
- !ENABLE_COREDUMP.
In the last case we would not do any coredumping-related setup in pid1, which
means that coredumps would go to to the working directory of the process, but
actually limits are set to 0. This is inherited by children of pid1.
As discussed extensively in https://github.com/systemd/systemd/pull/26607, this
default is bad: dumps are written to arbitrary directories and not cleaned up.
Nevertheless, the kernel cannot really fix it. It doesn't know where to write,
and it doesn't know when that place would become available. It is only the
userspace that can tell this to the kernel. So the only sensible change in the
kernel would be to default to '|/bin/false', i.e. do what we do now.
In the middle case, we disabled writing of coredumps via a pattern, but raise
the RLIMIT_CORE. We need to raise the limit because we can't raise it later
after processes have been forked off. This means we behave correctly, but allow
coredumping to be enabled at a later point without a reboot.
This patch makes the last case behave like the middle case. This means that
even if systemd is compiled with systemd-coredump, it still does the usual
setup. If users want to restore the kernel default, they need to provide two
drop-in files:
for sysctl.d, with 'kernel.core_pattern=core'
for systemd.conf, with 'DefaultLimitCORE=0'.
The general idea is that pid1 does the safe thing. A distro may want to use
something different than the systemd-coredump machinery, and then that would
could packaged together with the drop-ins to change the configuration.
Yu Watanabe [Mon, 27 Mar 2023 19:14:03 +0000 (04:14 +0900)]
test-kernel-install: several cleanups
- allow to run without $PROJECT_BUILD_ROOT,
- drop unnecessary export for bootctl,
- enable -x option to show commands,
- use 'test ! -e' to check the nonexistence of files,
- show more debugging logs.
units: let's establish the coredump socket before writting core_pattern sysctl
It's a bit nicer if we only write the sysctl core_pattern once the
coredump socket is established, since it's the backend for the handler.
Given the systemd-coredump.socket basically has no dependencies that run
before it this should not really make things slower or so, it just
removes the tiny window where core pattern is in effect that wants to
connect to the backend socket but cannot.
The status quo isn't terrible, and not too different in effect: either
way, until the socket unit is up we won't process coredumps. It's mostly
what kind of behaviour you get then: an error due to /bin/false being
invoked, or an error because systemd-coredump can't connect to its
socket. After this patch we'll exclusively see the former.
Let's use more _cleanup_ expressions. Various other modernizations. No
actual code changes, except for maybe a conversion to use heap memory
when generating an array of fds, instead of stack as before. Given that
fdstores are typically user controlled, that should be a wise idea.
pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service substates
When a service deactivates and is then automatically restarted via
Restart= we currently quickly transition through
SERVICE_DEAD/SERVICE_FAILED. Which is weird given it's not the
normal ("permanent") dead/failed state, but a transitory one we
immediately leave from again. We do this so that software that looks for
failures/successes can take notice, even if we restart as a consequence
of the deactivation.
Let's clean this up a bit: let's introduce two new states:
SERVICE_DEAD_BEFORE_AUTO_RESTART and SERVICE_FAILED_BEFORE_AUTO_RESTART
that are used for the transitory states. Both the SERVICE_DEAD and
SERVICE_DEAD_BEFORE_AUTO_RESTART will map to the high-level
UNIT_INACTIVE state though. (and similar for the respective failed
states). This means the high-level state machine won't change by this,
only the low-level one.
This clearly seperates the substates, which makes the state engine
cleaner, and allows clients to follow precisely whether we are in a
transitory dead/failed state, or a permanent one, by looking at the
service substate. Moreover it allows us to remove the 'n_keep_fd_store'
which so far we used to ensure the fdstore was not released during this
transitory dead/failed state but only during the permanent one. Since we
can now distinguish these states properly we can just use that.
This has been bugging me for a while. Let's clean this up.
Note that the unit restart logic is already nicely covered in the
testsiute, hence this adds no new tests for that.
And yes, this could be considered a compat break, but sofar we took the
liberty to make changes to the low-level state machine (i.e. SERVICE_xyz
states, sometimes called "substates") without considering this a bad
breakage – the high-level state machine (i.e. UNIT_xyz states) should
be considered API that cannot be changed.