vmspawn: do not preserve access permissions and xattrs of template OVMF vars
This makes vmspawn work when /usr/share/qemu/edk2-i386-vars.fd is on
disk with 0444 permissions as is the case on NixOS.
The nix package manager does not store any access permissions, ownership,
timestamps, or extended attributes in its package format to increase
reproducibility. The only meta-data that is stored is the executable bit.
Thus when unpacking a nix package, the executable bit is preserved, but no other
access permissions are preserved and all files in /nix/store end up as
read-only.
This causes the template OVMF vars file to have 0444 permissions. If we preserve
those permissions when copying the template file to /tmp that means QEMU can not
write to the file and fails.
So lets not preserve permissions and keep the 0600 permissions that are set by
default.
Alex [Mon, 2 Jun 2025 22:47:49 +0000 (18:47 -0400)]
network: fix a potential divide-by-zero (#37705)
In function `tc_init`, hz is parsed from the content of file
`"/proc/net/psched"` and can be 0.
In function `hierarchy_token_bucket_class_verify`, hz is directly used
as a divisor in
`htb->buffer = htb->rate / hz + htb->mtu;` without any check. This adds a check on hz before using it as a divisor.
mount-util: avoid unnecessary mount_setattr() call in make_fsmount()
If .attr_set is zero (and .att_clr, .propagation too), then there's no
point in calling mount_setattr().
Fixes: #37062
Note that this optimization is not precisely load-bearing anymore, since 3cc23a2c2345eb188551565349c89ec1fa8f650f got merged which removes the
only caller of make_fsmount() that might trigger it. But it's worth
fixing generic code anyway, in case it gets used like this later again.
A "string" is a concept in C. In a text-based API, this is implicit, especially
if we say that something was "formatted". So change occurences of "decimal
string" to just "decimal". Similarly, "numerics" is unclear, say "digits".
Also, a "timestamp is in a clock" just sounds wrong. Reword those sentences.
Yu Watanabe [Tue, 27 May 2025 17:09:52 +0000 (02:09 +0900)]
network/link: update state file when master ifindex is changed
If master ifindex is non-zero, then the carrier state and operational
state of the interface may be the enslaved state.
As the operational state is saved in link state file, and read by
wait-online, we need to update the state file when the master ifindex is
changed.
Yu Watanabe [Tue, 27 May 2025 14:17:40 +0000 (23:17 +0900)]
network/link: ENODATA from reading IFLA_MASTER when an interface has no master
When an interface leaved from the master interface, then reading
IFLA_MASTER attribute causes ENODATA. When the interface was previously
enslaved to another interface, we need to remove reference to the
interface from the previous master interface.
This is especially important when
```
ip link set dev eth0 nomaster
```
is called.
Adrian Vovk [Tue, 18 Feb 2025 20:59:03 +0000 (15:59 -0500)]
man/systemd.timer: Correct inaccuracy in man page
The docs previously stated that RandomizedDelaySec is applied onto the
next scheduled time, but after 9fa326b18aef0c1e5c80e23a5b41de02155e6f7e
this is no longer the case.
I also reworded FixedRandomDelay= slightly, to make it a bit clearer
Daan De Meyer [Sun, 11 May 2025 07:42:28 +0000 (09:42 +0200)]
meson: Stop doing nested build when fuzzers are enabled
Currently, when fuzzers are enabled, we run meson from within meson
to build the fuzzer executables with sanitizers. The idea is that
we can build the fuzzers with different kinds of sanitizers
independently from the main build.
The issue with this setup is that we don't actually make use of it.
We only build the fuzzers with one set of sanitizers (address,undefined)
so we're adding a bunch of extra complexity without any benefit as we
can just setup the top level meson build with these sanitizers and get
the same result.
The other issue with this setup is that we don't pass on all the options
passed to the top level meson build to the nested meson build. The only things
we pass on are extra compiler arguments and the value of the auto_features
option, but none of the individual feature options if overridden are passed on,
which can lead to very hard to debug issues as an option enabled in the top
level build is not enabled in the nested build.
Since we're not getting anything useful out of this setup, let's simplify
and get rid of the nested meson build. Instead, sanitizers should be enabled
for the top level meson.build. This currently didn't work as we were overriding
the sanitizers passed to the meson build with the fuzzer sanitizer, so we
fix that as well by making sure we combine the fuzzer sanitizer with the ones
passed in by the user.
We also drop support for looking up libFuzzer as a separate library as
it has been shipped builtin in clang since clang 6.0, so we can assume
that -fsanitize=fuzzer is available.
To make sure we still run the fuzzing tests, we enable the fuzz-tests option
by default now to make sure they still always run (without instrumentation unless
one of llvm-fuzz or oss-fuzz is enabled).
* 5e739ef1ed mkosi-initrd: Optionally match t64 suffix for tss2
libraries
* ec70393077 Merge pull request https://github.com/systemd/mkosi/pull/3742 from DaanDeMeyer/man
|\
| * 94cc136fbe mkosi-tools: Install man tool and pages as part of misc
profile
| * eda2ed533d Enforce C.UTF-8 locale for all commands we run
* | 9821e9a3e3 sandbox: Support using mkosi-sandbox as a library
* | 4145382edf Serialize pid in state and check if still exists on load
* | 3d119cba07 Merge pull request https://github.com/systemd/mkosi/pull/3736 from DaanDeMeyer/rpm-gpgkey
|\ \
| |/
|/|
| * 0a5d87b7bb Only pick up /etc/pki/tls and /etc/pki/ca-trust as
certificate dirs
| * c30eee187f Look for rpm gpg keys from inside the sandbox
|/
* ef2842dfea Fix version bump check if image version was passed on CLI
* 12b6251153 apt: Install apt sources if apt was installed via base tree
* a0b4e1af9a Make sure git doesn't fail when running as root
* 585a47705d repart: use --append-fstab=auto if available
* cec6ae1dda sandbox: handle case where dev node for tty doesn't exist
* a60dade823 initrd: shadow-utils removal is only necessary on old
Fedora
* ca11acbd5b Use SPDX identifier instead of file path for license in
pyproject.toml
* 4d031bc57d Revert license-files property
* c80dd09008 Merge pull request https://github.com/systemd/mkosi/pull/3722 from behrmann/versiontweaks
|\
| * c76e5dc4bc make version test more readable
| * 90ba99dde1 version: add __repr__ to GenericVersion
|/
* dd794ec832 Fix licenses path in pyproject.toml
* 7eeb749840 Merge pull request https://github.com/systemd/mkosi/pull/3702 from aafeijoo-suse/initrd-kmp
|\
| * 565b905aa1 mkosi-initrd: handle symlinks under weak-updates
| * a83ccc10c7 mkosi-initrd: perform basic checks on the kernel dir
before calling mkosi
| * 73cad79c9e mkosi-initrd: --kernel-modules-include ->
--kernel-modules
* bac76904c3 build(deps): bump github/codeql-action from 3.28.13 to
3.28.16
* 44161624a2 Supress ssh unit generation if sshd is not present
* b8758dac28 Partially revert 640000a861e9cd9a3807e4158e110a098c74d078
* 6f11937dc6 Don't use default value if optional settings are set to
none
* 640000a861 Use a default tools tree by default if mkosi.tools.conf
exists
* 63d91cc285 mkosi: Override misconfigured gitconfig HTTP/HTTPS proxy
with ProxyUrl
* a859b5eb13 Make sure we create the default workspace directory as well
mkosi: Run clangd within the tools tree instead of the build container
Running within the build sandbox has a number of disadvantages:
- We have a separate clangd cache for each distribution/release combo
- It requires to build the full image before clangd can be used
- It breaks every time the image becomes out of date and requires a
rebuild
- We can't look at system headers as we don't have the knowledge to map
them from inside the build sandbox to the corresponding path on the host
Instead, let's have mkosi.clangd run clangd within the tools tree. We
already require building systemd for both the host and the target anyway,
and all the dependencies to build systemd are installed in the tools tree
already for that, as well as clangd since it's installed together with the
other clang tooling we install in the tools tree. Unlike the previous approach,
this approach only requires the mkosi tools tree to be built upfront, which has
a much higher chance of not invalidating its cache. We can also trivially map
system header lookups from within the sandbox to the path within mkosi.tools
on the host so that starts working as well.
* dbb4020bee mkosi: Use tools tree by default in repository config
* a2407a305c dnf: Stop messing around with plugins
* eee382ebc6 Fix mkosi help
* 8d4f9969bb mkosi-obs: simplify generation of signed UEFI auth files
* 364dfc65eb Merge pull request #3661 from septatrix/ssh-runtime
|\
| * ab3b52841c Improve Ssh= documentation
| * 79878d7e6c Add new Ssh=auto and Ssh=runtime options
* 49036322c2 Merge pull request #3682 from DaanDeMeyer/history
|\
| * 96e512fe6e installer: Make sure package manager state is preserved in the image
| * b859a7cf0a Only copy repository metadata from specific subdirs from /var
| * c8bf8e4278 Rename cache_subdirs() to package_subdirs()
* | 54b59c4a2e Merge pull request #3696 from DaanDeMeyer/history-cli
|\ \
| * | 898d89e887 Rework version bumping
| * | cc45fe3bad Only write CLI arguments to history instead of full config
| * | 1def443097 Disallow using --rerun-build-scripts with --force again
| * | 87b03ee264 Rename get_configdir() to finalize_configdir()
| * | 9c1217a217 Get rid of to_json() methods on Args and Config
| |/
* | 124f551e77 mkosi-obs: do not publish roothash
* | fc86100e51 mkosi-obs: append certs from mkosi.uefi.db/ to 'db'
* | 8bee4cb8e2 Make sure sync scripts are executable
|/
* a7e90514fa Simplify tools tree out of date error
* f9956daba7 Fail if --rerun-build-scripts is used and tools is out of date
* d94bf56ae8 mkosi-initrd: add specific configuration for plymouth in Debian
* 8235ddbc5b Take shared lock in copy_ephemeral()
* 19c74d5ba5 Two follow ups for #3678
* 0d6f15e8c3 Merge pull request #3678 from DaanDeMeyer/history
|\
| * 5410c4c7af tests: Require genkey to be run once upfront
| * 86b8c611a1 tests: Drop unused tools field
| * c3d1bd0dde Rework history <=> sandbox integration
* fce4db970f zypper: display debugging output if ARG_DEBUG is set
* 2c052b9d45 Allow PCR signing settings to be overridden in sub-images
* 00c220225b zypper: do not fail if a package configured to be removed is not found
mkosi: Rename mkosi.prepare scripts for systemd deps to systemd.prepare
These scripts are reused by multiple images, so let's give them a
non-standard name to indicate that. Otherwise it's all too easy to add
something to mkosi.prepare for the main image and accidentally have it
included in all the subimages as well even though that's not desired.
mkosi: Reuse main image prepare scripts in subimages
In the subimages we also want to make sure all dependencies of the
systemd packages are cached so reuse the same prepare scripts from
the main image to do that.
We only want required dependencies in the subimages, not recommended
or suggested dependendencies, so add an environment variable
$SYSTEMD_REQUIRED_DEPS_ONLY which the prepare scripts can check for
and enable it for the subimages.
mkosi: Make sure coreutils is installed in initrd/exitrd
This is already installed but Fedora/CentOS systems are nudged towards
installing coreutils-single which then later causes issues when we try
to install coreutils as a dependency of systemd so let's make sure we
pick coreutils from the beginning.
mkosi: Move TEST-24-CRYPTSETUP files to mkosi/ directory
If the integration tests have been installed in the systemd-tests
package, the path to these in mkosi.postinst.chroot will be wrong.
Let's fix the issue by moving these files into the mkosi/ directory
as they're only used by mkosi regardless so they make more sense to
be there anyway.
Turns out makepkg sets $SOURCE_DATE_EPOCH= to the current time for
every build if not set explicitly which causes full rebuilds if we
don't set time-epoch explicitly ourselves, so let's do that everywhere
to avoid unnecessary rebuilds.
mkosi: drop os-release symlink for minimal-base image
[ 385s] ERROR: link target doesn't exist (neither in build root nor in installed system):
[ 385s] /usr/lib/systemd/tests/mkosi/mkosi.images/minimal-base/mkosi.extra/etc/os-release -> ../usr/lib/os-release
It shouldn't be even needed, everything should look in /usr/lib/os-release too
With the latest mkosi it's possible for MinimumVersion= to be a git
commit so let's start making use of that. This will make mkosi fail
if it's executed within the systemd repository and the checked out
commit is too old.
Putting the mkosi commit sha in mkosi/mkosi.conf also allows retrieving
it without having the full source tree available.
We also make a bunch of improvements to the fetch-mkosi.py script.
coredump: introduce an enum to wrap dumpable constants
Two constants are described in the man page, but are not defined by a header.
The third constant is described in the kernel docs. Use explicit values to
show that those are values are defined externally.
A new core_pattern specifier was added, %F, to provide a PIDFD
to the usermode helper process referring to the crashed process.
This removes all possible race conditions, ensuring only the
crashed process gets inspected by systemd-coredump.
The check looks plausible, but when I started checking whether it needs
to be lowered for the recent changes, I realized that it doesn't make
much sense.
context_parse_iovw() is called from a few places, e.g.:
- process_socket(), where the other side controls the contents of the
message. We already do other checks on the correctness of the message
and this assert is not needed.
- gather_pid_metadata_from_argv(), which is called after
inserting MESSAGE_ID= and PRIORITY= into the array, so there is no
direct relation between _META_ARGV_MAX and the number of args in the
iovw.
- gather_pid_metadata_from_procfs(), where we insert a bazillion fields,
but without any relation to _META_ARGV_MAX.
Since we already separately check if the required stuff was set, drop this
misleading check.
The kernel provides %d which is documented as
"dump mode—same as value returned by prctl(2) PR_GET_DUMPABLE".
We already query /proc/pid/auxv for this information, but unfortunately this
check is subject to a race, because the crashed process may be replaced by an
attacker before we read this data, for example replacing a SUID process that
was killed by a signal with another process that is not SUID, tricking us into
making the coredump of the original process readable by the attacker.
With this patch, we effectively add one more check to the list of conditions
that need be satisfied if we are to make the coredump accessible to the user.
No functional change. This change is done in preparation for future changes.
Currently, the list of fields which are received on the command line is a
strict subset of the fields which are always expected to be received on a
socket. But when we add new kernel args in the future, we'll have two
non-overlapping sets and this approach will not work. Get rid of the variable
and enumerate the required fields. This set will never change, so this is
actually more maintainable.
The message with the hint where to add new fields is switched with
_META_ARGV_MAX. The new order is more correct.
unit_gc_sweep() might try to add the unit to gc queue again.
While that becomes no-op as Unit.in_gc_queue is not cleared
yet, it induces minor inconsistency of states.
coredump: restore compatibility with older patterns
This was broken in f45b8015513d38ee5f7cc361db9c5b88c9aae704. Unfortunately
the review does not talk about backward compatibility at all. There are
two places where it matters:
- During upgrades, the replacement of kernel.core_pattern is asynchronous.
For example, during rpm upgrades, it would be updated a post-transaction
file trigger. In other scenarios, the update might only happen after
reboot. We have a potentially long window where the old pattern is in
place. We need to capture coredumps during upgrades too.
- With --backtrace. The interface of --backtrace, in hindsight, is not
great. But there are users of --backtrace which were written to use
a specific set of arguments, and we can't just break compatiblity.
One example is systemd-coredump-python, but there are also reports of
users using --backtrace to generate coredump logs.
Thus, we require the original set of args, and will use the additional args if
found.
A test is added to verify that --backtrace works with and without the optional
args.
machined: call pidref_verify() in some cases this was missing
We need to protect us from recycled PIDs here like everywhere else: once
we read data from /proc/$PID/ we need to validate that $PID still points
to the original pidfd.
TheHillBright [Wed, 21 May 2025 10:38:12 +0000 (18:38 +0800)]
journald: clarify doc for usage-related values cap (#37528)
The old description makes users wrongly assume that the cap of 4G
applied, even when the user specifies a value that will result in higher
than 4G. This commit avoids this misunderstanding.
Yu Watanabe [Fri, 9 May 2025 07:56:48 +0000 (16:56 +0900)]
integration-tests: adjust priorities
When running with sanitizers:
```
26/95 systemd:integration-tests / TEST-21-DFUZZER OK 1517.75s
40/95 systemd:integration-tests / TEST-85-NETWORK-NetworkdDHCPClientTests OK 779.18s
42/95 systemd:integration-tests / TEST-04-JOURNAL OK 716.17s
```
and without sanitizers:
```
44/95 systemd:integration-tests / TEST-85-NETWORK-NetworkdDHCPClientTests OK 730.33s
29/95 systemd:integration-tests / TEST-64-UDEV-STORAGE-simultaneous_events OK 701.49s
40/95 systemd:integration-tests / TEST-04-JOURNAL OK 348.05s
```
So, let's set higher priorities only on these tests.
sd-device: fix sysname check in sd_device_new_from_subsystem_sysname()
For example, consider the following device:
- syspath: /sys/bus/mdio_bus/drivers/Qualcomm Atheros AR8031!AR8033
- subsystem: "drivers"
- driver subsystem: "mdio_bus"
- sysname: "Qualcomm Atheros AR8031/AR8033" <-- '!' is replaced with '/'
When sd_device_new_from_subsystem_sysname() is called to get the device,
the arguments to sd_device_new_from_subsystem_sysname() should be
- subsystem: "drivers"
- sysname (concatenated with driver subsystem): "mdio_bus:Qualcomm Atheros AR8031/AR8033"
In that case, we need to pass to device_new_from_path_join() the following:
- subsystem: "drivers"
- driver subsystem: "mdio_bus"
- sysname: "Qualcomm Atheros AR8031/AR8033"
- a: "/sys/bus"
- b: "drivers"
- c: "mdio_bus"
- d: "Qualcomm Atheros AR8031!AR8033"
Here, the important point is that the `sysname` argument and the
last argument `d` are differnt: the `sysname` argument needs to match
the sysname obtained by `sd_device_get_sysname()`, but `d` must be
the last path component of the syspath.
Previously, we passed a wrong sysname to device_new_from_path_join().
This fixes the issue.
This returns to the original approach proposed in
https://github.com/systemd/systemd/pull/17270. After review, the approach was
changed to use sd_pid_get_owner_uid() instead. Back then, when running in a
typical graphical session, sd_pid_get_owner_uid() would usually return the user
UID, and when running under sudo, geteuid() would return 0, so we'd trigger the
secure path.
sudo may allocate a new session if is invoked outside of a session (depending
on the PAM config). Since nowadays desktop environments usually start the user
shell through user units, the typical shell in a terminal emulator is not part
of a session, and when sudo is invoked, a new session is allocated, and
sd_pid_get_owner_uid() returns 0 too. Technically, the code still works as
documented in the man page, but in the common case, it doesn't do the expected
thing.
$ build/test-sd-login |& rg 'get_(owner_uid|cgroup|session)'
sd_pid_get_session(0) → No data available
sd_pid_get_owner_uid(0) → 1000
sd_pid_get_cgroup(0) → /user.slice/user-1000.slice/user@1000.service/app.slice/app-ghostty-transient-5088.scope/surfaces/556FAF50BA40.scope
I think it's worth checking for sudo because it is a common case used by users.
There obviously are other mechanims, so the man page is extended to say that
only some common mechanisms are supported, and to (again) recommend setting
SYSTEMD_LESSSECURE explicitly. The other option would be to set "secure mode"
by default. But this would create an inconvenience for users doing the right
thing, running systemctl and other tools directly, because then they can't run
privileged commands from the pager, e.g. to save the output to a file. (Or the
user would need to explicitly set SYSTEMD_LESSSECURE. One option would be to
set it always in the environment and to rely on sudo and other tools stripping
it from the environment before running privileged code. But that is also fairly
fragile and it obviously relies on the user doing a complicated setup to
support a fairly common use case. I think this decreases usability of the
system quite a bit. I don't think we should build solutions that work in
priniciple, but are painfully inconvenient in common cases.)
man: rework the description of $SYSTEMD_PAGER and $PAGER
$PAGER wasn't documented, but actually we treat it same as $SYSTEMD_PAGER,
except for lower priority. And the two variables can be used to disable the
pager, even if $SYSTEMD_PAGERSECURE is not set.
Behaviour is (obviously) not changed by this patch, it intentionally just
updates the docs to match the code.
man: reword the description of "secure pager" handling
The existing description was not *wrong*, but it was a bit muddled. Let's
reorder the text to give a short intro and then describe what the options
actually do and the clear "true" and "false" cases first, and then describe
autodetection.
Related to https://yeswehack.com/vulnerability-center/reports/346802.
Todd C. Miller [Tue, 6 May 2025 22:39:14 +0000 (16:39 -0600)]
flush_ports: flush POSIX message queues properly
On Linux, read() on a message queue descriptor returns the message
queue statistics, not the actual message queue data. We need to use
mq_receive() to drain the queues instead.
Fixes a problem where a POSIX message queue socket unit with messages
in the queue at shutdown time could result in a hang on reboot/shutdown.
man/systemd.exec: reword description of SystemCallFilter=
The existing text grew organically as features were added and was
not very organized. Reorder it and break into paragraphs grouped
by topic. The description of the :errno syntax is replaced by a short
reference to the SystemCallErrorNumber= setting. This makes the
text shorter and makes it easier to explain how the two settings combine.
Debarshi Ray [Fri, 2 May 2025 19:08:55 +0000 (21:08 +0200)]
meson: Ensure that distribution packages own systemenvgeneratordir
Currently, Fedora's systemd RPM doesn't own systemenvgeneratordir
(ie., /usr/lib/systemd/system-environment-generators) [1] because it's
not created when systemd is installed. In contrast, userenvgeneratordir
(ie., /usr/lib/systemd/user-environment-generators) is created, unless
the environment-d Meson option is explicitly disabled.
While this can be worked around elsewhere, it's better if the upstream
build system created the directories consistently. It will avoid
repetition, and prevent silly bugs or deviations from creeping in.
Tim Small [Fri, 2 May 2025 12:40:00 +0000 (13:40 +0100)]
man/network: Note .link early boot caveat, and .network .netdev usage.
Document .link .network and .netdev file type distinctions in early
introductory text, and document distro-specific need to sync link files
with early-boot copies, see Debian bug 1005282:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005282 for an
example.