TheHillBright [Wed, 21 May 2025 10:38:12 +0000 (18:38 +0800)]
journald: clarify doc for usage-related values cap (#37528)
The old description makes users wrongly assume that the cap of 4G
applied, even when the user specifies a value that will result in higher
than 4G. This commit avoids this misunderstanding.
coredump: restore compatibility with older patterns
This was broken in f45b8015513d38ee5f7cc361db9c5b88c9aae704. Unfortunately
the review does not talk about backward compatibility at all. There are
two places where it matters:
- During upgrades, the replacement of kernel.core_pattern is asynchronous.
For example, during rpm upgrades, it would be updated a post-transaction
file trigger. In other scenarios, the update might only happen after
reboot. We have a potentially long window where the old pattern is in
place. We need to capture coredumps during upgrades too.
- With --backtrace. The interface of --backtrace, in hindsight, is not
great. But there are users of --backtrace which were written to use
a specific set of arguments, and we can't just break compatiblity.
One example is systemd-coredump-python, but there are also reports of
users using --backtrace to generate coredump logs.
Thus, we require the original set of args, and will use the additional args if
found.
A test is added to verify that --backtrace works with and without the optional
args.
This returns to the original approach proposed in
https://github.com/systemd/systemd/pull/17270. After review, the approach was
changed to use sd_pid_get_owner_uid() instead. Back then, when running in a
typical graphical session, sd_pid_get_owner_uid() would usually return the user
UID, and when running under sudo, geteuid() would return 0, so we'd trigger the
secure path.
sudo may allocate a new session if is invoked outside of a session (depending
on the PAM config). Since nowadays desktop environments usually start the user
shell through user units, the typical shell in a terminal emulator is not part
of a session, and when sudo is invoked, a new session is allocated, and
sd_pid_get_owner_uid() returns 0 too. Technically, the code still works as
documented in the man page, but in the common case, it doesn't do the expected
thing.
$ build/test-sd-login |& rg 'get_(owner_uid|cgroup|session)'
sd_pid_get_session(0) → No data available
sd_pid_get_owner_uid(0) → 1000
sd_pid_get_cgroup(0) → /user.slice/user-1000.slice/user@1000.service/app.slice/app-ghostty-transient-5088.scope/surfaces/556FAF50BA40.scope
I think it's worth checking for sudo because it is a common case used by users.
There obviously are other mechanims, so the man page is extended to say that
only some common mechanisms are supported, and to (again) recommend setting
SYSTEMD_LESSSECURE explicitly. The other option would be to set "secure mode"
by default. But this would create an inconvenience for users doing the right
thing, running systemctl and other tools directly, because then they can't run
privileged commands from the pager, e.g. to save the output to a file. (Or the
user would need to explicitly set SYSTEMD_LESSSECURE. One option would be to
set it always in the environment and to rely on sudo and other tools stripping
it from the environment before running privileged code. But that is also fairly
fragile and it obviously relies on the user doing a complicated setup to
support a fairly common use case. I think this decreases usability of the
system quite a bit. I don't think we should build solutions that work in
priniciple, but are painfully inconvenient in common cases.)
man: rework the description of $SYSTEMD_PAGER and $PAGER
$PAGER wasn't documented, but actually we treat it same as $SYSTEMD_PAGER,
except for lower priority. And the two variables can be used to disable the
pager, even if $SYSTEMD_PAGERSECURE is not set.
Behaviour is (obviously) not changed by this patch, it intentionally just
updates the docs to match the code.
man: reword the description of "secure pager" handling
The existing description was not *wrong*, but it was a bit muddled. Let's
reorder the text to give a short intro and then describe what the options
actually do and the clear "true" and "false" cases first, and then describe
autodetection.
Related to https://yeswehack.com/vulnerability-center/reports/346802.
Todd C. Miller [Tue, 6 May 2025 22:39:14 +0000 (16:39 -0600)]
flush_ports: flush POSIX message queues properly
On Linux, read() on a message queue descriptor returns the message
queue statistics, not the actual message queue data. We need to use
mq_receive() to drain the queues instead.
Fixes a problem where a POSIX message queue socket unit with messages
in the queue at shutdown time could result in a hang on reboot/shutdown.
man/systemd.exec: reword description of SystemCallFilter=
The existing text grew organically as features were added and was
not very organized. Reorder it and break into paragraphs grouped
by topic. The description of the :errno syntax is replaced by a short
reference to the SystemCallErrorNumber= setting. This makes the
text shorter and makes it easier to explain how the two settings combine.
Debarshi Ray [Fri, 2 May 2025 19:08:55 +0000 (21:08 +0200)]
meson: Ensure that distribution packages own systemenvgeneratordir
Currently, Fedora's systemd RPM doesn't own systemenvgeneratordir
(ie., /usr/lib/systemd/system-environment-generators) [1] because it's
not created when systemd is installed. In contrast, userenvgeneratordir
(ie., /usr/lib/systemd/user-environment-generators) is created, unless
the environment-d Meson option is explicitly disabled.
While this can be worked around elsewhere, it's better if the upstream
build system created the directories consistently. It will avoid
repetition, and prevent silly bugs or deviations from creeping in.
Tim Small [Fri, 2 May 2025 12:40:00 +0000 (13:40 +0100)]
man/network: Note .link early boot caveat, and .network .netdev usage.
Document .link .network and .netdev file type distinctions in early
introductory text, and document distro-specific need to sync link files
with early-boot copies, see Debian bug 1005282:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005282 for an
example.
network/ndisc: drop only default gateway via the host when a neighbor announcement without router flag is received
A host can send Router Advertisements (RAs) without acting as a router.
In such cases, the lifetime of the RA header should be zero, but may
contain several options, and clients can configure addresses, routes,
and so on with the message. The host may (should?) send Neighbor
Announcements (NAs) without the router flag in that case.
So, when a NA without the router flag is received, let's not drop
configurations based on the previous RA options, but only drop the
default gateway configured based on the RA header.
See RFC 4861 Neighbor Discovery in IPv6, section 6.3.4:
https://www.rfc-editor.org/rfc/rfc4861#section-6.3.4:~:text=%2D%20The%20IsRouter%20flag,as%20a%20host.
> - The IsRouter flag in the cache entry MUST be set based on the Router
> flag in the received advertisement. In those cases where the IsRouter
> flag changes from TRUE to FALSE as a result of this update, the node
> MUST remove that router from the Default Router List and update the
> Destination Cache entries for all destinations using that neighbor as
> a router as specified in Section 7.3.3. This is needed to detect when
> a node that is used as a router stops forwarding packets due to being
> configured as a host.
sd-daemon: add fd array size safety check to sd_notify_with_fds()
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
man/sd-bus: Add at least one reference per sd-bus function man page
Some sd-bus man pages did not have any references on the main
sd-bus man page. Unless you accidentally stumbled on them from
other pages it was difficult to discover them.
man/sd_bus_emit_signal: Fix extra const for strv functions
The functions `sd_bus_emit_interfaces_added_strv`, `sd_bus_emit_interfaces_removed_strv`
and `sd_bus_emit_properties_changed_strv` take an `char **` not
`const char **` as last argument.
See `src/systemd/sd-bus.h` for the function definition.
shutdown: handle gracefully if a device disappears while we detach it
Let's gracefully handle cases where a device disappears in the time we
between our discovery and when we want to detach it, due to "auto-clear"
or a similar logic.
The loopback case already handled this quite OK, do the same for MD and
swap too.
Switch to ERRNO_IS_DEVICE_ABSENT() for all checks, just in case.
Also improve debug logging for all these cases, so we know exactly what
is going on.
This is inspired by #37160, but shouldn't really fix anything there, I
am pretty sure the ENODEV seen in that output stems from the STOP_ARRAY
call, not from the open().
Note that this does not change anything for the device mapper case,
because the DM subsystem does not return useful error codes to
userspace, hence everything is a complete mess there.
Christian Hesse [Wed, 9 Apr 2025 21:03:06 +0000 (23:03 +0200)]
man: mention special functionality for reload-or-restart with --marked (#37076)
We had a downstream discussion on what `systemctl reload-or-restart
--marked` does, until upstream chimed in and pointed on very special
behavior for that combination. 😜
The second references the first, but not vice versa. Let's fix this.
kmeaw [Sun, 30 Mar 2025 12:08:38 +0000 (13:08 +0100)]
shared/calendarspec: fix normalization when DST is negative
When trying to calculate the next firing of 'hourly', we'd lose the
tm_isdst value on the next iteration.
On most systems in Europe/Dublin it would cause a 100% cpu hang due to
timers restarting.
This happens in Europe/Dublin because Ireland defines the Irish Standard Time
as UTC+1, so winter time is encoded in tzdata as negative 1 hour of daylight
saving.
Before this patch:
$ env TZ=IST-1GMT-0,M10.5.0/1,M3.5.0/1 systemd-analyze calendar --base-time='Sat 2025-03-29 22:00:00 UTC' --iterations=5 'hourly'
Original form: hourly
Normalized form: *-*-* *:00:00
Next elapse: Sat 2025-03-29 23:00:00 GMT
(in UTC): Sat 2025-03-29 23:00:00 UTC
From now: 13h ago
Iteration #2: Sun 2025-03-30 00:00:00 GMT
(in UTC): Sun 2025-03-30 00:00:00 UTC
From now: 12h ago
Iteration #3: Sun 2025-03-30 00:00:00 GMT <-- note every next iteration having the same firing time
(in UTC): Sun 2025-03-30 00:00:00 UTC
From now: 12h ago
...
With this patch:
$ env TZ=IST-1GMT-0,M10.5.0/1,M3.5.0/1 systemd-analyze calendar --base-time='Sat 2025-03-29 22:00:00 UTC' --iterations=5 'hourly'
Original form: hourly
Normalized form: *-*-* *:00:00
Next elapse: Sat 2025-03-29 23:00:00 GMT
(in UTC): Sat 2025-03-29 23:00:00 UTC
From now: 13h ago
Iteration #2: Sun 2025-03-30 00:00:00 GMT
(in UTC): Sun 2025-03-30 00:00:00 UTC
From now: 12h ago
Iteration #3: Sun 2025-03-30 02:00:00 IST <-- the expected 1 hour jump
(in UTC): Sun 2025-03-30 01:00:00 UTC
From now: 11h ago
...
This bug isn't reproduced on Debian and Ubuntu because they mitigate it by
using the rearguard version of tzdata. ArchLinux and NixOS don't, so it would
cause pid1 to spin during DST transition.
This is how the affected tzdata looks like:
$ zdump -V -c 2024,2025 Europe/Dublin
Europe/Dublin Sun Mar 31 00:59:59 2024 UT = Sun Mar 31 00:59:59 2024 GMT isdst=1 gmtoff=0
Europe/Dublin Sun Mar 31 01:00:00 2024 UT = Sun Mar 31 02:00:00 2024 IST isdst=0 gmtoff=3600
Europe/Dublin Sun Oct 27 00:59:59 2024 UT = Sun Oct 27 01:59:59 2024 IST isdst=0 gmtoff=3600
Europe/Dublin Sun Oct 27 01:00:00 2024 UT = Sun Oct 27 01:00:00 2024 GMT isdst=1 gmtoff=0
Compare it to Europe/London:
$ zdump -V -c 2024,2025 Europe/London
Europe/London Sun Mar 31 00:59:59 2024 UT = Sun Mar 31 00:59:59 2024 GMT isdst=0 gmtoff=0
Europe/London Sun Mar 31 01:00:00 2024 UT = Sun Mar 31 02:00:00 2024 BST isdst=1 gmtoff=3600
Europe/London Sun Oct 27 00:59:59 2024 UT = Sun Oct 27 01:59:59 2024 BST isdst=1 gmtoff=3600
Europe/London Sun Oct 27 01:00:00 2024 UT = Sun Oct 27 01:00:00 2024 GMT isdst=0 gmtoff=0
There were some conflicts related to the skipping of 6f5cf41570776f489967d1a7de18260b2bc9acf9, but the tests pass with and the
example output above also looks good, so I think the backport is correct.
The arguments `(rd.)systemd.mount-extra` take a value that looks like
`WHAT:WHERE[:FSTYPE[:OPTIONS]]`. The `OPTIONS` were parsed into a nulstr
where a comma-separated c-string was expected. This leads to a bug where
only the first option was taken into account by the generator.
For example, if you passed `systemd.mount-extra=/x:/y:baz:ro,defaults`
to the kernel, `systemd-fstab-generator` would translate that into a
nulstr: `ro\0defaults\0`.
Since methods processing options in the generator expected a
comma-separated c-string, they would only see the first option, `ro` in
this case.
Some AMD systems have support for features like custom brightness
curve or adaptive backlight management. These features allow the
display driver to adjust the brightness based upon other factors
than just the user brightness request.
The user's brightness request is indicated in the 'brightness' file
but the effective result of the logic in the display driver is stored
in the 'actual_brightness' file.
This leads to problems when shutting the system down because the value
of 'actual_brightness' may be lower than 'brightness' and the wrong value
gets stored for the next boot.
For example if the brightness a user requested was 150, the actual_brightness
might be 130. So the next boot the brightness will be "set" to 130, but the
actual brightness might be 115. If the user reboots again it will be set to 115
for the next boot but the actual brightness might be 100. That is this gets worse
and worse each reboot cycle until the system eventually boots up at minimum
brightness.
Furthermore the kernel documentation indicates that the brightness and
actual_brightness files are not guaranteed to be the same values.
Due to this; drop the use of 'actual_brightness' when saving/restoring brightness
and instead rely only upon 'brightness'.
Mike Yuan [Tue, 8 Apr 2025 20:35:14 +0000 (22:35 +0200)]
run0: make sure we submit $SHELL to remote
Normally, the service manager sets $SHELL to the target user's
login shell, but run0 always overrides that with either
originating user's shell or value from --setenv=SHELL=. In both cases
$SHELL needs to be sent.
test-sd-device: limit the number of iterations when testing device parent/child functions
The test "hangs" and times out on some arm64 machines. It actually works as
expected, but the machine has 2016 children under /sys/devices/system/memory/,
and the tests do a double loop over this, which is slow enough to hit the 120 s
limit. Add a limit on the number of iterations.
Another option would be to exclude "memory" subsystem. But we may have other
subsystems which have the same problem in the future, so I think it'll be more
robust to not try to limit the fix to a specific subsystem.
mkosi: Fix arch build script version sed expression
Yours truly got rid of the _tag variable in the Arch Linux PKGBUILD
a while ago, so actually adapt the build script to that by changing
the pkgver= variable instead.
Daan De Meyer [Thu, 27 Mar 2025 14:49:06 +0000 (15:49 +0100)]
TEST-06-SELINUX: Only enable if meson was invoked as root
This test only works if the image was built as root. Since that's
impossible to check as meson generally runs before we build the image,
let's use whether meson is run as root as a proxy.
Daan De Meyer [Tue, 25 Mar 2025 09:37:32 +0000 (10:37 +0100)]
test: Disable pager in integration test units
Integration test units are now connected to the tty when running
interactively, so let's make sure we disable the pager to avoid tests
hanging in the pager.
Yu Watanabe [Thu, 13 Mar 2025 03:11:40 +0000 (12:11 +0900)]
TEST-73-LOCALE: do not unnecessarily restart systemd-localed
It is not necessary to clear previous keymap assignment, as
`localectl set-keymap` will anyway overwrite the previous assignment.
This drops the unnecessary restart of systemd-localed in the loop.
The mkosi test image contains about 500~700 keymaps. The test
performance is greatly improved by reducing the number of restarts,
especially when the test is running with sanitizers.
On Fedora 41 with sanitizers,
Before:
1/1 systemd:integration-tests / TEST-73-LOCALE OK 1157.50s
After:
1/1 systemd:integration-tests / TEST-73-LOCALE OK 104.43s
Yu Watanabe [Fri, 21 Mar 2025 00:54:45 +0000 (09:54 +0900)]
udev: make udevadm and friends not warn about unknown settings
Without this change, when e.g. event_timeout= is specified in udev.conf,
udevadm and friends which loads udev.conf warn about unknown key:
===
$ udevadm info /sys/class/net/lo
/run/udev/udev.conf.d/test-17.conf:1: Unknown key 'event_timeout', ignoring.
/run/udev/udev.conf.d/test-17.conf:2: Unknown key 'timeout_signal', ignoring.
===
We didn't check the number of arguments first, hence ended up outputting
some ugly complaints with `(null)` in a format string. And what's worse
accepted any number of arguments, where we'd ignore all but the first
two though.