Commit
38748596f078 ("core: Make DelegateNamespaces= work for user
managers with CAP_SYS_ADMIN") refactored the logic for when an
unprivileged process should create a new user namespace for sandboxing.
This refactor inadvertently removed a check (`params->runtime_scope !=
RUNTIME_SCOPE_USER`) that differentiated between system services and user
services.

This causes a regression in rootless containers where systemd runs
unprivileged. When starting a system service (like `dbus-broker`) that
uses sandboxing features (e.g. `PrivateTmp=yes`), systemd now
incorrectly creates a new, minimal `PRIVATE_USERS_SELF` namespace.

This new namespace only maps UID/GID 0. When `dbus-broker` attempts to
drop privileges to the `dbus` user (GID 81), the `setresgid(81, 81, 81)`
call fails because GID 81 is not mapped.
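As a rough illustration of why the call fails, the following C sketch checks whether an ID is covered by a `gid_map`-style table. The helper `id_is_mapped()` and the sample map contents used in its comments (host ID 1000, a 65536-wide range starting at 100000) are assumptions made for this example and are not part of systemd; only the `inside outside count` line layout is taken from `/proc/[pid]/gid_map`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a systemd function: returns true if `id` falls
 * inside any "inside outside count" range of a uid_map/gid_map-style text.
 *
 * With a minimal map such as "0 1000 1" only ID 0 is covered, so GID 81 is
 * unmapped. With a wider map such as "0 100000 65536", GID 81 is covered. */
static bool id_is_mapped(const char *map_text, unsigned long id) {
        const char *line = map_text;

        assert(map_text);

        while (line && *line) {
                unsigned long inside, outside, count;

                /* Each line has three fields: first ID inside the namespace,
                 * first ID outside of it, and the length of the range. */
                if (sscanf(line, "%lu %lu %lu", &inside, &outside, &count) == 3 &&
                    id >= inside && id - inside < count)
                        return true;

                /* Advance to the start of the next line, if there is one. */
                line = strchr(line, '\n');
                if (line)
                        line++;
        }

        return false;
}
```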
Restore the check to ensure that the special unprivileged sandboxing
logic is only applied to user services, as was the original intent.
System services in a rootless context will now correctly run in the
container's main user namespace, where all necessary UIDs/GIDs are
mapped.

Fixes: https://github.com/systemd/systemd/issues/39563
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2391343
 static bool exec_needs_cap_sys_admin(const ExecContext *context, const ExecParameters *params) {
         assert(context);
+        assert(params);
+
+        /* We only want to ever imply PrivateUsers= for user managers, as they're not expected to setuid() to
+         * other users, unlike the system manager which needs all users to be around. */
+        if (params->runtime_scope != RUNTIME_SCOPE_USER)
+                return false;
         return context->private_users != PRIVATE_USERS_NO ||
                context->private_tmp != PRIVATE_TMP_NO ||