Justification is similar to that of the previous commit. The check is only
partially connected to the intended purpose and breaks backwards compat
without a sufficient reason.
The original change was done to clean up a situation where we added a
new group, but the group could already have been used for some other
purposes, and now some unexpected entity would own the device.
Unfortunately, this check doesn't really address the issue, since the
existing account might as well be a system account, which might be
equally bad. In addition, this change is a big compatibility break,
causing existing rules to stop working. Since quite a lot of systems
have local configuration to assign devices to users for various
purposes, this is very noticeable to users. In a way, the original change
to add a new group was the compat break, and the follow-up patch to change
the rule parsing turned a small compat break into a much bigger one.
There is merit to the change though, since device nodes shouldn't be
owned by users and groups and different mechanisms should be used
instead. To avoid breaking users' systems, and since the original goal
cannot be achieved by this patch, let's downgrade this to a warning
to guide users towards different solutions.
mountfsd: do not cross mount boundaries when looking for parent of foreign UID range owned dirs
This is primarily paranoia: it might be possible for unpriv users to set
up mount hierarchies in unexpected ways when using userns. Hence let's
make protections more rigid: when looking for a parent dir of a foreign
UID owned dir tree, refuse to cross mount boundaries.
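A minimal sketch of the "refuse to cross mount boundaries" rule, assuming a hypothetical helper `open_parent_no_xdev()` (not mountfsd's code). It compares `st_dev` of a directory and its parent, which only detects file system boundaries: a bind mount of the same file system would slip through, so a stricter version would compare mount IDs instead (e.g. via statx() with STATX_MNT_ID).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open the parent of dir_fd, but return -EXDEV instead
 * if the parent lives on a different file system. Simplification: st_dev
 * catches file system boundaries, not bind mounts of the same file system. */
int open_parent_no_xdev(int dir_fd) {
        struct stat st_child, st_parent;
        int parent_fd;

        if (fstat(dir_fd, &st_child) < 0)
                return -errno;

        parent_fd = openat(dir_fd, "..", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (parent_fd < 0)
                return -errno;

        if (fstat(parent_fd, &st_parent) < 0) {
                int r = -errno;
                close(parent_fd);
                return r;
        }

        if (st_child.st_dev != st_parent.st_dev) {
                /* Walking up would cross a mount boundary: refuse. */
                close(parent_fd);
                return -EXDEV;
        }

        return parent_fd;
}
```

Applied to "/", the parent is "/" itself, so the call succeeds; applied to the top of a separately mounted tree, it returns -EXDEV.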
When a cgroup is selected for termination, send varlink messages to
hooks registered in `/run/systemd/oomd.prekill-hooks/`.
oomd waits up to `PreKillTimeoutSec=` seconds for a response before
proceeding with the kill.
Matteo Croce [Mon, 25 Aug 2025 15:13:00 +0000 (17:13 +0200)]
oomd: implement a prekill varlink event
When a cgroup is selected for termination, send varlink messages
to hooks registered in `/run/systemd/oomd.prekill-hooks/`.
oomd waits up to `PreKillHookTimeoutSec=` seconds for a response
before proceeding with the kill.
The revert is needed because, with the PreKill hook, oomd_cgroup_kill()
no longer actually kills processes; it only sets up the callbacks.
Hence, the check is deferred to the actual kill.
udev: Introduce uaccess for remote graphical sessions (#38516)
When systemd is compiled with group-render-mode=0660, only the active
seat gets access to the render devices through uaccess. Remote desktop
sessions like gnome-remote-desktop would be left with no hardware
rendering, because those sessions are not associated with a seat.
We solve the issue by granting uaccess to specifically tagged devices on
session start, if the session is marked with
XDG_SESSION_EXTRA_DEVICE_ACCESS.
udev-builtin-uaccess is refactored to grant multiple users access to a
device, taking into account the device's seat and all the active
EXTRA_DEVICE_ACCESS sessions.
report: keep track of varlink connections inside of Context object
Let's also move the Varlink connection management into the Context
object. Let's also switch to Set* for it, so that we get
auto-expanding behaviour.
It's one of the primary objects that make up the program "context"
conceptually, hence it also should be part of the Context object. This
allows us to just have it available wherever the Context object is.
report: do not treat an empty report dir as an issue
We should permit the report varlink dir to be created on the fly when
the first socket is bound there. Hence, let's treat a non-existent dir
as equivalent to an empty one.
This is how we usually handle this elsewhere in our tree, so do it here too.
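A minimal sketch of the pattern, assuming a hypothetical `count_report_sockets()` helper: an ENOENT from opendir() is mapped to "zero entries" rather than an error, while other errors are still reported. It relies on d_type, which some file systems leave as DT_UNKNOWN; the sketch does not handle that case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>

/* Hypothetical: count sockets in a report dir; a missing dir counts as empty. */
int count_report_sockets(const char *path) {
        DIR *d = opendir(path);
        if (!d)
                return errno == ENOENT ? 0 : -errno; /* missing dir == empty dir */

        int n = 0;
        struct dirent *de;
        while ((de = readdir(d)))
                if (de->d_type == DT_SOCK)
                        n++;

        closedir(d);
        return n;
}
```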
Yu Watanabe [Fri, 6 Feb 2026 16:07:33 +0000 (01:07 +0900)]
daemon-util: downgrade log level on ECONNREFUSED and friends
This partially reverts 36c557f7d41441bbd98a8965348dfe8050fc9c98, which
introduced notify_remove_fd() that logs in LOG_DEBUG. However,
notify_remove_fd_warn() is still called by other library functions, e.g.
notify_push_fd(), and produces a warning message about the failure to
remove the fd from the fdstore on shutdown.
During the shutdown process, we get the following logs:
```
systemd-udevd[370]: Failed to send notify message to '/run/systemd/notify': Connection refused
systemd-udevd[370]: Failed to remove file descriptor "config-serialization" from the store, ignoring: Connection refused
systemd-udevd[370]: Failed to send notify message to '/run/systemd/notify': Connection refused
systemd-udevd[370]: Failed to push serialization fd to service manager: Connection refused
```
Here, the 1st, 3rd, and 4th messages are logged at LOG_DEBUG, but the 2nd one
was logged at LOG_WARNING before this commit; this commit downgrades it to LOG_DEBUG as well.
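A sketch of the idea, assuming a hypothetical `notify_failure_log_level()` helper: errors that just mean "the service manager is no longer listening" are logged at debug level, everything else stays a warning. The commit only says "ECONNREFUSED and friends"; including ENOENT here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <syslog.h>

/* Hypothetical: pick a log level for a failed notify/fdstore operation.
 * Takes a positive errno value. The exact errno set is an assumption. */
int notify_failure_log_level(int error) {
        switch (error) {
        case ECONNREFUSED: /* manager socket no longer accepting, e.g. during shutdown */
        case ENOENT:       /* notify socket path already gone (assumed member) */
                return LOG_DEBUG;
        default:
                return LOG_WARNING;
        }
}
```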
Nick Rosbrook [Fri, 6 Feb 2026 16:38:47 +0000 (11:38 -0500)]
resolvectl: include ifindex when printing link-local DNS server
Historically, resolvectl status has not included the interface
specification for DNS servers with an IPv6 link-local address, since it
is technically somewhat redundant. But, adding this extra bit of
information makes it easier to copy-and-paste to use elsewhere, etc.
For example, the previous output:
Link 2 (enp34s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: fe80::861e:a3ff:feb1:f8e7
       DNS Servers: 192.168.1.12 192.168.1.13 fe80::861e:a3ff:feb1:f8e7
        DNS Domain: lan
now becomes:
Link 2 (enp34s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: fe80::861e:a3ff:feb1:f8e7%2
       DNS Servers: 192.168.1.12 192.168.1.13 fe80::861e:a3ff:feb1:f8e7%2
        DNS Domain: lan
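A small sketch of how the "%ifindex" suffix can be produced with standard socket APIs, assuming a hypothetical `format_dns_server()` helper (not resolvectl's code): only IPv6 link-local addresses get the suffix, matching the output above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: format an IPv6 DNS server, appending "%<ifindex>" for
 * link-local addresses. Returns 0 on success, -1 on error/truncation. */
int format_dns_server(const struct in6_addr *a, int ifindex, char *buf, size_t size) {
        char s[INET6_ADDRSTRLEN];
        int r;

        if (!inet_ntop(AF_INET6, a, s, sizeof s))
                return -1;

        if (IN6_IS_ADDR_LINKLOCAL(a) && ifindex > 0)
                r = snprintf(buf, size, "%s%%%i", s, ifindex); /* "%%" emits a literal '%' */
        else
                r = snprintf(buf, size, "%s", s);

        return (r < 0 || (size_t) r >= size) ? -1 : 0;
}
```

With ifindex 2, fe80::861e:a3ff:feb1:f8e7 becomes fe80::861e:a3ff:feb1:f8e7%2, while global addresses are printed unchanged.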
bootctl: return recognizable Varlink error when we cannot determine the boot entry token
When running "bootctl install" on an empty --root= dir, we don't know
which token to use, and the operation will fail. Make sure to return an
explicit error about this.
This introduces a recognizable low-level error for this (EUNATCH), and
then turns this into a recognizable Varlink error.
(I made sure that the old low-level error EINVAL wasn't load-bearing,
so it is safe to change this.)
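A sketch of the two-layer pattern, under stated assumptions: a hypothetical low-level `determine_entry_token()` fails with the distinctive -EUNATCH, and a hypothetical `varlink_error_from_errno()` maps that errno to a dedicated Varlink error name. The error name below is a placeholder, not the one bootctl uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical low-level step: no usable token (e.g. empty --root= dir). */
int determine_entry_token(const char *candidate, const char **ret_token) {
        if (!candidate || candidate[0] == '\0')
                return -EUNATCH; /* recognizable: "cannot determine the token" */
        *ret_token = candidate;
        return 0;
}

/* Hypothetical mapping of recognizable errnos to Varlink error names. */
const char *varlink_error_from_errno(int r) {
        switch (r) {
        case -EUNATCH:
                return "com.example.BootControl.NoEntryToken"; /* placeholder name */
        default:
                return NULL; /* no specific mapping: report a generic error */
        }
}
```

Using an errno that nothing else in the code path returns is what makes the mapping unambiguous; a generic EINVAL could come from many places.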
bootctl: rework bootctl-install.c in preparation of varlinkification
This primarily introduces a context object for each operation, so that
we can later instantiate one for each varlink op we execute, and can
safely manage the lifecycle of all operation parameters for each subsequent call.
This also reworks the root dir handling to be fd based.
This drops explicit CHASE_TRIGGER_AUTOFS from a bunch of chase() calls
that operate within the ESP/XBOOTLDR, while it keeps them in place for the
chase() calls that find the top-level ESP/XBOOTLDR inode. This reflects
the fact that we explicitly support autofs for the ESP/XBOOTLDR itself,
but below it expect no further mounts, just plain VFAT.
This changes behaviour of the interaction of $KERNEL_INSTALL_CONF_ROOT
and --root=: the former will now be taken relative to the host root, and
will no longer be affected by --root=. This follows similar behaviour in
kernel-install, where it is very explicitly documented in the man page
(the bootctl man page does not document this). Strictly speaking, this is
a compat breakage, but I think a very minor, niche one, and I think the
pain inflicted by this change is probably negligible compared to the
pain of behaving inconsistently with kernel-install.
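A sketch of the new precedence, assuming a hypothetical `resolve_conf_root()` that receives the value of $KERNEL_INSTALL_CONF_ROOT (or NULL) and the --root= argument: an explicitly set $KERNEL_INSTALL_CONF_ROOT is used as a host path, and only the fallback is placed under --root=. The fallback /etc/kernel is for illustration; the real search path may cover more directories.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: pick the config root. env_conf_root is taken relative to the
 * host root and ignores --root=; only the default is prefixed with root. */
int resolve_conf_root(const char *env_conf_root, const char *root, char *buf, size_t size) {
        int r;

        if (env_conf_root)
                r = snprintf(buf, size, "%s", env_conf_root);
        else
                r = snprintf(buf, size, "%s/etc/kernel", root ? root : "");

        return (r < 0 || (size_t) r >= size) ? -1 : 0;
}
```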
CODING_STYLE: document how to handle kernel compat
Let's define a way to mark codepaths that are subject to
deletion once the kernel baseline reaches a certain version, to make it
easier to find these cases.
While we are at it, introduce a whole section in CODING_STYLE about
kernel version compat.
I followed the new scheme in #39621, but we can merge the coding style
guidelines on this already.
In my testing, I switched my locally run CI integration tests to build on
Arch Linux and realized that the default sizes no longer work for that:
the images are larger than the space allocated. Let's bump the
size by 50% for the relevant disk images.
When systemd is compiled with group-render-mode=0660, only the active seat
gets access to the render devices through uaccess. Remote desktop sessions
like gnome-remote-desktop would be left with no hardware rendering, because
those sessions are not associated with a seat.
Tag the render nodes with "xaccess" so that access is also granted to remote
sessions created with XDG_SESSION_EXTRA_DEVICE_ACCESS=1.
udev: Grant sessions access to devices tagged with xaccess
Grant access to devices tagged with "xaccess" on session start, if the session
was created with XDG_SESSION_EXTRA_DEVICE_ACCESS=1.
udev-builtin-uaccess is refactored to grant multiple users access to a device,
taking into account the device's seat and all the active EXTRA_DEVICE_ACCESS
sessions.
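A simplified model of the grant decision described above, assuming a hypothetical `Session` record (field names are illustrative, not logind's) and a hypothetical `collect_uaccess_uids()` helper: the active session on the device's seat gets access as before, and if the device is tagged "xaccess", so does every active session created with XDG_SESSION_EXTRA_DEVICE_ACCESS=1. Applying ACLs for the collected UIDs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical session record; field names are illustrative only. */
typedef struct Session {
        uid_t uid;
        const char *seat;          /* NULL for seatless sessions, e.g. remote desktop */
        bool active;
        bool extra_device_access;  /* created with XDG_SESSION_EXTRA_DEVICE_ACCESS=1 */
} Session;

static bool uid_listed(const uid_t *uids, size_t n, uid_t uid) {
        for (size_t i = 0; i < n; i++)
                if (uids[i] == uid)
                        return true;
        return false;
}

/* Collect the distinct UIDs that should be granted access to one device. */
size_t collect_uaccess_uids(const Session *sessions, size_t n_sessions,
                            const char *device_seat, bool device_tagged_xaccess,
                            uid_t *ret_uids, size_t max_uids) {
        size_t n = 0;

        for (size_t i = 0; i < n_sessions; i++) {
                const Session *s = &sessions[i];
                bool grant;

                if (!s->active)
                        continue;

                /* Classic uaccess: the active session on the device's seat. */
                grant = s->seat && device_seat && strcmp(s->seat, device_seat) == 0;

                /* xaccess-tagged devices also go to EXTRA_DEVICE_ACCESS sessions. */
                if (!grant && device_tagged_xaccess && s->extra_device_access)
                        grant = true;

                if (grant && n < max_uids && !uid_listed(ret_uids, n, s->uid))
                        ret_uids[n++] = s->uid;
        }

        return n;
}
```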
login: Add XDG_SESSION_EXTRA_DEVICE_ACCESS variable for additional access
A session created with XDG_SESSION_EXTRA_DEVICE_ACCESS will be granted
additional powers.
Exactly which powers are granted is going to be defined by udevd.
The previous matrix set the accelerometer values to follow the normal
device orientation, but the accelerometer values must match the panel
orientation, which in these devices is rotated 90 degrees CCW.
Indicate how the panel is mounted in the comment. It could be useful to
do this for other devices too, because when desktop environments handle
it correctly, users may be unaware of how the panel is mounted and may
think the monitor-sensor output is bogus.
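To illustrate what a mount matrix does (in the spirit of the hwdb ACCEL_MOUNT_MATRIX property), here is a sketch that multiplies a raw reading by a 3x3 matrix, using a standard 90 degree CCW rotation about the Z axis. The helper name is hypothetical, and the matrix is only an example of such a rotation: the actual values for these devices, and whether the hwdb convention needs this matrix or its transpose, are not taken from the commit.

```c
#include <assert.h>

/* Hypothetical: out = m * in, with m in row-major order. */
void apply_mount_matrix(const int m[3][3], const int in[3], int out[3]) {
        for (int row = 0; row < 3; row++)
                out[row] = m[row][0] * in[0] + m[row][1] * in[1] + m[row][2] * in[2];
}

/* Standard 90 degree counter-clockwise rotation about the Z axis. */
const int rot_ccw_90[3][3] = {
        { 0, -1, 0 },
        { 1,  0, 0 },
        { 0,  0, 1 },
};
```

With this matrix, a reading along +X ends up along +Y, and one along +Y ends up along -X.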
nsresourced: Ensure that all user namespaces are cleaned-up
The code here assumes that free_user_ns() is called for every single
user namespace. However, that has never been the case, and the logic of
free_user_ns() is a bit more involved.
A nested user namespace pins its parent user namespace. IOW, the
lifetime of the parent user namespace is at least as long as that of
the child user namespace.
If a parent user namespace becomes unused (no namespace file descriptors
or tasks using it anymore), it will stick around, and its lifetime is
still bound to the child user namespace.
free_user_ns() takes advantage of that behavior. If a child user
namespace is freed and its parent user namespace is already unused,
then free_user_ns() will free both the child and the parent user
namespace. This means a single free_user_ns() call frees two user namespaces.
Hence, the bpf program never sees the parent user namespace being freed.
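A toy userspace model (not kernel code) of why a hook on free_user_ns() undercounts: the freeing loop is modeled on the kernel's pattern of dropping the parent's reference and continuing while it reaches zero, so one call can free a whole chain. The two counters contrast a hook on free_user_ns() with a hook that runs once per freed namespace; all names besides free_user_ns() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: a user namespace pins its parent via a reference. */
typedef struct UserNs {
        struct UserNs *parent;
        int refs;
} UserNs;

int free_user_ns_calls = 0;     /* what a hook on free_user_ns() observes */
int per_ns_teardown_calls = 0;  /* what a once-per-namespace hook observes */

/* Freeing a child drops its reference on the parent; if that was the last
 * one, the parent is freed in the same call, and so on up the chain. */
void free_user_ns(UserNs *ns) {
        free_user_ns_calls++;
        do {
                UserNs *parent = ns->parent;
                per_ns_teardown_calls++;
                free(ns);
                ns = parent;
        } while (ns && --ns->refs == 0);
}

void put_user_ns(UserNs *ns) {
        if (ns && --ns->refs == 0)
                free_user_ns(ns);
}
```

Dropping the last external reference to a parent first, and then to its child, frees two namespaces with a single free_user_ns() call.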
We can fix this by piggy-backing on another function that is called for
every single user namespace being freed. This requires CONFIG_SYSCTL but
systemd doesn't work without that anyway.
The return type needs to change to a scalar type as required by libbpf.
Long-term what we need is appropriate LSM infrastructure for this
including hooks that get called on namespace destruction.
Thanks to Daan De Meyer for figuring out that the cast is needed.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Daan De Meyer [Sat, 24 Jan 2026 19:52:14 +0000 (20:52 +0100)]
mountfsd: Always open_tree() in mount namespace of peer
open_tree() will fail with EINVAL when passed a directory file descriptor
that comes from another mount namespace. While this should be fixed in a
future kernel, let's work around the issue for now by entering the mount
namespace of the peer if needed, calling open_tree() there, and then
passing the fd back to the mountfsd process.
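Passing an fd between processes is usually done with SCM_RIGHTS over an AF_UNIX socket. A sketch of just that last step, assuming the helper that entered the peer's mount namespace and mountfsd share a socketpair(); the setns()/open_tree() part is omitted, and `send_fd()`/`recv_fd()` are hypothetical names, not systemd's helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd as SCM_RIGHTS ancillary data, with a 1-byte payload. */
int send_fd(int sock, int fd) {
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } control;
        memset(&control, 0, sizeof control);
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = control.buf, .msg_controllen = sizeof control.buf,
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
        return sendmsg(sock, &msg, MSG_NOSIGNAL) < 0 ? -errno : 0;
}

/* Receive one fd; returns the new fd (with O_CLOEXEC) or -errno. */
int recv_fd(int sock) {
        char dummy;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } control;
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = control.buf, .msg_controllen = sizeof control.buf,
        };
        if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) < 0)
                return -errno;
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS ||
            cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
                return -EBADMSG;
        int fd;
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
}
```

The received fd refers to the same open file as the sent one, so a mount fd created inside the peer's namespace stays usable after it arrives in mountfsd.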
Mike Yuan [Thu, 5 Feb 2026 00:32:59 +0000 (01:32 +0100)]
mountpoint-util: rework name_to_handle_at() unique mount id handling
name_to_handle_at_try_unique_mntid_fid() in its current form is
ill-designed for various reasons:
* AT_HANDLE_FID requires file system support, while the unique mount ID
  is a VFS concept and hence always available if the kernel supports it.
  Hence, the fallback for AT_HANDLE_MNT_ID_UNIQUE should be independent
  of fid.
* Whether AT_HANDLE_MNT_ID_UNIQUE is requested can be inferred from
  whether ret_unique_mnt_id is specified; there is no need to expose this
  control to the caller (and currently the function simply doesn't handle
  a mismatch between ret params and flags).
* The caller cannot realistically tell whether the returned
  mount ID is actually unique.
* The path_get_unique_mnt_id() fallback did not handle AT_SYMLINK_FOLLOW.
Let's instead move the statx() fallback into name_to_handle_at_loop()
directly, and revamp the interaction of ret_mnt_id/ret_unique_mnt_id:
if both are set, the caller can handle both, hence
set what we have and return 0/1 to indicate whether we managed to acquire
the unique one.
The !ret_handle && ret_mnt_id logic is removed. Let's not rely on
undocumented, bizarre behavior, and it's unused anyway.
path_get_mnt_id_at() exists for a reason...
* 215a9497cc fedora: Use N-1 key as well when querying rawhide GPG key
* 842a37ed6c Add MakeScriptsExecutable= setting to optionally try to make scripts executable before bailing out
* 814f2004bb build(deps): bump github/codeql-action from 4.31.9 to 4.32.0
* d8f4f628bf build(deps): bump actions/checkout from 6.0.1 to 6.0.2
* 3e55361142 docs: remove superfluous definition colon
* 5901524c48 mkosi-tools: add libarchive-tools package.
* 968392f1b9 docs: Add information about gui mkosi-tools profile
* 0e2960c245 Add missing call to run_locale_gen()
* 41cd2067bc rpm: Set pkgverify_level to digest
* 86fe0f448a dnf: Give advanced users some control over plugins
* 50a1feee52 run: Improve sandbox command logging
* b1dffe1c3c Fix environment variable name for systemd-repart
* 07726068d9 Allow specifying "default" value for Initrds=
* 704f163ec0 Allow setting PORTABLE_PREFIXES= via Environment=
* e6588afb45 opensuse: More GPG key handling fixes
* c367f993dd opensuse: Fetch remote keys as well if RepositoryKeyFetch= is enabled
* 31852c9314 ci: Use mkosi box for unit test CI as well
* e4229f5bf5 Make sure we pass the right context to finalize_default_initrd()
* 9b431b783a tools: don't pull in virtiofsd in bookworm tools trees
* ae2d88d463 build(deps): bump github/codeql-action from 4.31.6 to 4.31.9
* 933401a8b6 build(deps): bump actions/checkout from 6.0.0 to 6.0.1
* 6bfeb4ac86 opensuse: Import GPG keys for all repositories
* 9829b9136f Add support for locale-gen
* 63ae86ec04 nixos: Use repository key fetching by default on nixos
* f01ca9904b docs: Reword dependencies vs tools tree requirement a bit
* ab47ba25ef docs: Minor correction on enabling unprivileged namespaces
* 7bd46a417e docs: Update unprivileged user namespace docs
* 14d2d37a19 sandbox: Make sure we're dumpable before writing uidmap files
gvenugo3 [Tue, 3 Feb 2026 03:57:30 +0000 (20:57 -0700)]
sleep: allow HibernateDelaySec and low-battery hibernation to work together
Previously, setting HibernateDelaySec= would disable ACPI battery trip
point (_BTP) alarms, forcing the system to rely solely on software
polling for battery checks. This could result in the battery draining
to 0% between polling intervals, causing data loss.
Now, when ACPI _BTP is available AND HibernateDelaySec= is set, both
mechanisms work together. The system will hibernate on whichever comes
first: low battery (instant hardware alarm) or the configured timeout.
This also properly respects HibernateOnACPower=no by resetting the
timer while on AC power, matching the documented behavior.
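A sketch of the combined decision, assuming a hypothetical `should_hibernate()` evaluated on each wakeup from suspend: the _BTP low-battery alarm and the HibernateDelaySec= timeout both lead to hibernation, whichever comes first, and while on AC power with HibernateOnACPower=no the delay timer is restarted instead. Struct and function names are illustrative, not systemd-sleep's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SleepState {
        uint64_t timer_start_usec; /* when the HibernateDelaySec= timer was (re)armed */
} SleepState;

/* Hypothetical decision step, run on every wakeup while suspended. */
bool should_hibernate(SleepState *s, uint64_t now_usec, uint64_t hibernate_delay_usec,
                      bool btp_alarm_fired, bool on_ac_power, bool hibernate_on_ac_power) {
        if (on_ac_power && !hibernate_on_ac_power) {
                /* HibernateOnACPower=no: restart the delay timer while on AC. */
                s->timer_start_usec = now_usec;
                return false;
        }

        if (btp_alarm_fired)
                return true; /* low battery: hardware alarm wins */

        /* Otherwise hibernate once the configured delay has elapsed. */
        return now_usec - s->timer_start_usec >= hibernate_delay_usec;
}
```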