Michal Sekletar [Wed, 22 May 2024 15:15:07 +0000 (17:15 +0200)]
libsystemd: link with '-z nodelete'
We want to avoid reinitialization of our global variables with static
storage duration in case we get dlopened multiple times by the same
application. This will avoid potential resource leaks that could have
happened otherwise (e.g. leaking journal socket fd).
varlinkctl: when operating in --more mode, fail correcly on Varlink method error
In varlink.c we generally do not make failing callback functions fatal,
since that should be up to the app. Hence, in case of varlinkctl (where
we want failures to be fatal), make sure to propagate the error back
explicitly.
Before this change a failing call to "varlinkctl --more call …" would result in
a zero exit code. With this it will correctly exit with a non-zero exit
code.
Luca Boccassi [Tue, 21 May 2024 00:43:24 +0000 (01:43 +0100)]
test: do not fail network namespace test with permission issues
When running in LXC with AppArmor we'll most likely get an error when creating
a network namespace due to a kernel regression in < v6.2 affecting AppArmor,
resulting in denials. Like other tests, avoid failing in case of permission
issues and handle it gracefully.
Yu Watanabe [Wed, 22 May 2024 15:03:42 +0000 (00:03 +0900)]
units: stop systemd-journald before systemd-soft-reboot.service
Typically, soft-reboot.target is never reached. So, without this change,
systemd-journald may be killed by PID1 on soft-reboot, and may cause
journal corruption.
Still I think this is the way to go. But the change was merged after -rc2,
and still discussion is continued. So, at least now let's revert it,
and do that after v256-final is released if approved.
F_OFD_SETLK is documented to only return EAGAIN, and F_SETLKW/F_OFD_SETLKW
are blocking operations so this logic doesn't apply to them in the
first place.
Hence, only automatically convert EACCES into EAGAIN for F_SETLK
operations, and propagate the original error in the other cases.
This is important because in some cases we catch permission errors
and gracefully fallback, which is not possible if the original error
is lost.
This is an issue in practice because, due to a kernel bug present
before v6.2, AppArmor denies locking on file descriptors to LXC
containers. We support all currently maintained LTS kernels,
including v6.1, where despite a lot of effort and attempts over almost
a year, the bugfix still hasn't been backported, as it is complex and
requires large changes to AppArmor.
On affected kernels, all services running with PrivateNetwork=yes
fail and do not recover, instead of the normal behaviour of gracefully
downgrading to PrivateNetwork=no.
The integration tests in the Debian CI fail due to this issue:
Yu Watanabe [Wed, 22 May 2024 03:26:58 +0000 (12:26 +0900)]
test: replace journal checkers with journalctl --follow + grep -m
Recently, for slow test environments, journalctl --sync was added to the
loop in the timeout. However, journalctl --sync may be slow in such systems,
and timeout easily triggered during syncing.
Hopefully, reading journal with --follow and grep the output with an expected
line should be efficient.
Yu Watanabe [Tue, 21 May 2024 20:24:05 +0000 (05:24 +0900)]
test: lock device during running cryptsetup
On running cryptsetup, udevd detects two inotify events for the
underlying device. Running the test on enough fast host, the expected
symlinks based on UUID and disk label are created by the second event.
During processing a uevent for a device, udevd disables the inotify
watch for the device. If the test runs on slow system, the second
inotify event may comes during a udev worker processing the synthesized
uevent triggered by the first inotify event. Hence, no synthesized
uevent for the second inotify event will be generated, and the expected
symlinks will be never created.
To prevent the issue, we need to lock the device during cryptsetup
command is running.
Luca Boccassi [Tue, 21 May 2024 17:19:04 +0000 (18:19 +0100)]
pkg/opensuse: switch to SHA1 fork
src.opensuse.org switched to SHA256, which means it can no longer be
used as a submodule in a SHA1 repository. Switch to a fork on Pagure
that gets synced across and is still SHA1:
Yu Watanabe [Tue, 21 May 2024 08:57:59 +0000 (17:57 +0900)]
test: wait a bit before stopping/killing service
Otherwise, when stopping the service, the last command may not be
started yet, and the service manager may not send SIGTERM signal to the
last command, but send SIGKILL on timeout.
===
May 21 08:23:24 test19-exit-cgroup.sh[437]: + disown
May 21 08:23:24 test19-exit-cgroup.sh[438]: + sleep infinity
May 21 08:23:24 test19-exit-cgroup.sh[437]: + systemd-notify --ready
May 21 08:23:24 test19-exit-cgroup.sh[437]: + sleep infinity
May 21 08:23:24 test19-exit-cgroup.sh[441]: + systemctl stop one
May 21 08:23:24 test19-exit-cgroup.sh[443]: + sleep infinity
(snip)
May 21 08:23:24 systemd[1]: one.service: Changed running -> stop-sigterm
May 21 08:23:24 systemd[1]: Stopping one.service - /tmp/test19-exit-cgroup.sh "systemctl stop one"...
May 21 08:23:24 systemd[1]: Received SIGCHLD from PID 441 (systemctl).
May 21 08:23:24 systemd[1]: Child 437 (bash) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 437 belongs to one.service.
May 21 08:23:24 systemd[1]: one.service: Main process exited, code=killed, status=15/TERM (success)
May 21 08:23:24 systemd[1]: Child 439 (bash) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 439 belongs to one.service.
May 21 08:23:24 systemd[1]: Child 441 (systemctl) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 441 belongs to one.service.
May 21 08:23:24 systemd[1]: Child 442 (bash) died (code=killed, status=15/TERM)
May 21 08:23:24 systemd[1]: one.service: Child 442 belongs to one.service.
(snip)
May 21 08:24:54 systemd[1]: one.service: State 'stop-sigterm' timed out. Killing.
May 21 08:24:54 systemd[1]: one.service: Killing process 443 (sleep) with signal SIGKILL.
May 21 08:24:54 systemd[1]: one.service: Changed stop-sigterm -> stop-sigkill
May 21 08:24:54 systemd[1]: Received SIGCHLD from PID 443 (sleep).
May 21 08:24:54 systemd[1]: Child 443 (sleep) died (code=killed, status=9/KILL)
May 21 08:24:54 systemd[1]: one.service: Child 443 belongs to one.service.
May 21 08:24:54 systemd[1]: one.service: Control group is empty.
May 21 08:24:54 systemd[1]: one.service: Failed with result 'timeout'.
May 21 08:24:54 systemd[1]: one.service: Service restart not allowed.
May 21 08:24:54 systemd[1]: one.service: Changed stop-sigkill -> failed
May 21 08:24:54 systemd[1]: one.service: Job 738 one.service/stop finished, result=done
May 21 08:24:54 systemd[1]: Stopped one.service - /tmp/test19-exit-cgroup.sh "systemctl stop one".
May 21 08:24:54 systemd[1]: one.service: Unit entered failed state.
May 21 08:24:54 systemd[1]: one.service: Releasing resources...
===
As requested in post-merge review
https://github.com/systemd/systemd/pull/32869#pullrequestreview-2068161094:
> NotInControl error is really about session controllers, but this here really
> is different.
Yu Watanabe [Mon, 20 May 2024 19:48:42 +0000 (04:48 +0900)]
test: wait for unit generated from /proc/self/mountinfo to be unloaded
Fixes https://github.com/systemd/systemd/issues/32680#issuecomment-2120974685.
===
May 21 02:45:08 TEST-74-AUX-UTILS.sh[2475]: + mountpoint /tmp/tmp.eaRV7lSbX2/mnt
May 21 02:45:08 TEST-74-AUX-UTILS.sh[2476]: /tmp/tmp.eaRV7lSbX2/mnt is not a mountpoint
May 21 02:45:08 TEST-74-AUX-UTILS.sh[2449]: + systemd-mount /dev/loop0 /tmp/tmp.eaRV7lSbX2/mnt
May 21 02:45:08 systemd-mount[2477]: Failed to start transient mount unit: Unit tmp-tmp.eaRV7lSbX2-mnt.mount was already loaded or has a fragment file.
===
The change in behavior was partly intentional, as I think
if both --wait and --pty are used, manually disconnecting
from PTY forwarder should not result in systemd-run exiting
with "Finished with ..." log. But we should check for
--wait here.
Frantisek Sumsal [Tue, 21 May 2024 13:04:22 +0000 (15:04 +0200)]
test: make TEST-65-ANALYZE happy when built with gcov
systemd-analyze runs the generators in a sandbox, which makes gcov
unhappy since it can't update its counters. Let's "silence" gcov in this
particular case by telling it to look for gcov note files in /tmp (where
shouldn't be any, so gcov won't try to update any counters).
Quoting https://github.com/systemd/systemd/issues/28514#issuecomment-1831781486:
> Whenever PAM is enabled for a service, we set up the PAM session and then
> fork off a process whose only job is to eventually close the PAM session when
> the service dies. That services we run with service privileges, both to
> minimize attack surface and because we want to use PR_SET_DEATHSIG to be get
> a notification via signal whenever the main process dies. But that only works
> if we have the same credentials as that main process.
>
> Now, if pam_systemd runs inside the PAM stack (which it normally does) it's
> session close hook will ask logind to synchronously end the session via a bus
> call. Currently that call is not accessible to unprivileged clients. And
> that's the part we need to relax: allow users to end their own sessions.
The check is implemented in a way that allows the kill if the sender is in
the target session.
I found 'sudo systemctl --user -M "zbyszek@" is-system-running' to
be a convenient reproducer.
Before:
May 16 16:25:26 x1c systemd[1]: run-u24754.service: Deactivated successfully.
May 16 16:25:26 x1c dbus-broker[1489]: A security policy denied :1.24757 to send method call /org/freedesktop/login1:org.freedesktop.login1.Manager.ReleaseSession to org.freedesktop.login1.
May 16 16:25:26 x1c (sd-pam)[3036470]: pam_systemd(login:session): Failed to release session: Access denied
May 16 16:25:26 x1c systemd[1]: Stopping session-114.scope...
May 16 16:25:26 x1c systemd[1]: session-114.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd[1]: Stopped session-114.scope.
May 16 16:25:26 x1c systemd[1]: session-c151.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd-logind[1513]: Session c151 logged out. Waiting for processes to exit.
May 16 16:25:26 x1c systemd-logind[1513]: Removed session c151.
After:
May 16 17:02:15 x1c systemd[1]: run-u24770.service: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopping session-115.scope...
May 16 17:02:15 x1c systemd[1]: session-c153.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: session-115.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopped session-115.scope.
May 16 17:02:15 x1c systemd-logind[1513]: Session c153 logged out. Waiting for processes to exit.
May 16 17:02:15 x1c systemd-logind[1513]: Removed session c153.
Edit: this seems to also fix https://github.com/systemd/systemd/issues/8598.
It seems that with the call to ReleaseSession, we wait for the pam session
close hooks to finish. I inserted a 'sleep(10)' after the call to ReleaseSession
in pam_systemd, and things block on that, nothing is killed prematurely.
analyze: do not print timestamps before "start of userspace"
We have the following timestamp status:
$ systemctl show systemd-fsck-root.service | grep InactiveExitTimestamp
InactiveExitTimestamp=Thu 2023-11-02 12:27:24 CET
InactiveExitTimestampMonotonic=15143158
$ systemctl show | grep UserspaceTimestamp
UserspaceTimestamp=Thu 2023-11-02 12:27:25 CET
UserspaceTimestampMonotonic=15804273
i.e. UserspaceTimestamp is before InactiveExit of systemd-fsck-root.service.
This is fine, but on display, we'd subtract those values and print a huge
negative value bogusly:
$ build/systemd-analyze critical-chain systemd-remount-fs.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
In fact, list_dependencies_print() already had a branch where the check that
'times->activating > boot->userspace_time', but it didn't cover all cases. So
make it cover both branches, and also change to '>=', since it's fine if
something happened with the same timestamp.
With the patch:
$ build/systemd-analyze critical-chain systemd-remount-fs.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
Luca Boccassi [Mon, 20 May 2024 12:12:03 +0000 (13:12 +0100)]
logind: do not fail creating a session when request is not from a unit
When running inside an LXC container the 'su' process will not be part of
any unit or slice.
manager_get_user_by_pid() which was used until v255 (included) does not fail
if it cannot find a unit/slice, but simply returns 'not found'. Do the same
in manager_get_session_by_pidref().
This was not detected as Semaphore CI does not reboot the testbed before
the logind test, so the session is started by the old logind from the base
distro, instead of the one being tested.
Yu Watanabe [Mon, 20 May 2024 00:53:26 +0000 (09:53 +0900)]
test-network: also set custom altternative name for netdevsim interface
Due to the bug in kernel 6.9 caused by
https://github.com/torvalds/linux/commit/8debcf5832c3e8a6baaea27c75ad8a6ba5077beb,
the net_id udev builtin does not work for netdevsim interface.
So, eni99np1 cannot be used with kernel 6.9 anymore.
Yu Watanabe [Sun, 19 May 2024 00:14:27 +0000 (09:14 +0900)]
machine-id-setup: acquire machine ID from /run/machine-id if possible
If machine ID is previously stored at /run/machine-id, then let's reuse
it. This is important on switching root and /etc/machine-id was previously
a mount point.
Note, this requires the previous two commits, and cannot backport without them.
Note, before the previous commit, the use-after-free could be triggered
only by Rename() DBus method, and could not by RenameImage(), as we did not
cache Image object when RenameImage() method is called. And machinectl
always uses RenameImage(). Hence, the issue could be triggered only when
Rename() DBus method is explicitly called by e.g. busctl.
With the previous commit, the Image object passed to the function is
always cached. Hence, the issue could be triggered even with machinectl
command, and this fix is important.
Yu Watanabe [Fri, 17 May 2024 20:33:48 +0000 (05:33 +0900)]
machine: also acquire Image object from cache when a dbus method in the main interface is called
Previously, Image objects were only cached when reading properties or
methods in the org.freedesktop.machine1.Image interface are called.
This makes that, when a method in the main interface (org.freedesktop.machine1)
for an image is called, also acquire the Image object from the cache,
and if not cached, create Image object and put into the cache, like we
do for org.freedesktop.machine1.Image.
Otherwise, if some properties of an image are updated by methods in the main
interface, e.g. MarkImageReadOnly(), the changes do not applied to the cached
Image object, and subsequent read of proerties through the interface for the
image, e.g. ReadOnly property, may provide outdated values.
Yu Watanabe [Fri, 17 May 2024 20:10:42 +0000 (05:10 +0900)]
discover-image: update Image.read_only flag in image_read_only()
Otherwise, ReadOnly DBus property in org.freedesktop.machine1.Image or
org.freedesktop.portable1.Image will not be updated by MarkReadOnly DBus
method.
Currently, we only pass TTYPath=/dev/pts/... to
the transient service spawned by systemd-run.
This is a bit problematic though, when ExecStartPre=
or ExecStopPost= is used. Since when these control
processes get to run, the main process is not yet
started/has already exited, hence the slave suffers
from the same vhangup problem as the mentioned commit.
By passing the slave fd in, the service manager will
hold the fd open as long as the service is alive.
Fixes the following unexpected skip:
```
[ 6.163670] TEST-64-UDEV-STORAGE.sh[596]: + modinfo btrfs
[ 6.164102] TEST-64-UDEV-STORAGE.sh[726]: /usr/lib/systemd/tests/testdata/units/TEST-64-UDEV-STORAGE.sh: line 726: modinfo: command not found
[ 6.164683] TEST-64-UDEV-STORAGE.sh[727]: + echo 'This test requires the btrfs kernel module but it is not installed, skipping the test'
[ 6.165069] TEST-64-UDEV-STORAGE.sh[728]: + tee --append /skipped
[ 6.166801] TEST-64-UDEV-STORAGE.sh[728]: This test requires the btrfs kernel module but it is not installed, skipping the test
[ 6.167177] TEST-64-UDEV-STORAGE.sh[596]: + exit 77
```
Mike Yuan [Fri, 17 May 2024 13:07:17 +0000 (21:07 +0800)]
man/soft-reboot: order surviving services before shutdown.target
Prompted by #32895
Rather than ordering with each power operation targets,
ordering against shutdown.target which is a valid
synchronization point. This has no effect if soft-reboot
is being performed.
Yu Watanabe [Fri, 17 May 2024 06:04:31 +0000 (15:04 +0900)]
test: wait for underlying .device unit being active before invoking systemd-mount
Fixes following failure:
===
May 17 04:12:04 TEST-74-AUX-UTILS.sh[2684]: + systemd-mount --owner=testuser /dev/loop0 /tmp/tmp.DVQdo2ou53/mnt
(snip)
May 17 04:15:04 systemd[1]: dev-loop0.device: Job dev-loop0.device/start timed out.
May 17 04:15:04 systemd[1]: dev-loop0.device: Job 5812 dev-loop0.device/start finished, result=timeout
May 17 04:15:04 systemd[1]: Timed out waiting for device dev-loop0.device - /dev/loop0.
May 17 04:15:04 systemd[1]: tmp-tmp.DVQdo2ou53-mnt.mount: Job 5804 tmp-tmp.DVQdo2ou53-mnt.mount/start finished, result=dependency
May 17 04:15:04 systemd[1]: Dependency failed for tmp-tmp.DVQdo2ou53-mnt.mount - /tmp/tmp.DVQdo2ou53/mnt.
May 17 04:15:04 systemd[1]: tmp-tmp.DVQdo2ou53-mnt.mount: Job tmp-tmp.DVQdo2ou53-mnt.mount/start failed with result 'dependency'.
May 17 04:15:04 systemd[1]: systemd-fsck@dev-loop0.service: Job 5805 systemd-fsck@dev-loop0.service/start finished, result=dependency
May 17 04:15:04 systemd[1]: Dependency failed for systemd-fsck@dev-loop0.service - File System Check on /dev/loop0.
May 17 04:15:04 systemd[1]: systemd-fsck@dev-loop0.service: Job systemd-fsck@dev-loop0.service/start failed with result 'dependency'.
May 17 04:15:04 systemd[1]: dev-loop0.device: Job dev-loop0.device/start failed with result 'timeout'.
(snip)
May 17 04:15:04 systemd-mount[2856]: A dependency job for tmp-tmp.DVQdo2ou53-mnt.mount failed. See 'journalctl -xe' for details.