<varlistentry>
<term><varname>PrivateUsers=</varname></term>
- <listitem><para>Takes a boolean argument or one of <literal>self</literal>, <literal>identity</literal>,
- or <literal>full</literal>. Defaults to false. If enabled, sets up a new user namespace for the
- executed processes and configures a user and group mapping. If set to a true value or
- <literal>self</literal>, a minimal user and group mapping is configured that maps the
- <literal>root</literal> user and group as well as the unit's own user and group to themselves and
- everything else to the <literal>nobody</literal> user and group. This is useful to securely detach
- the user and group databases used by the unit from the rest of the system, and thus to create an
- effective sandbox environment. All files, directories, processes, IPC objects and other resources
- owned by users/groups not equaling <literal>root</literal> or the unit's own will stay visible from
- within the unit but appear owned by the <literal>nobody</literal> user and group. </para>
+ <listitem><para>Takes a boolean argument or one of <literal>self</literal>,
+ <literal>identity</literal>, <literal>full</literal> or <literal>managed</literal>. Defaults to
+ false. If enabled, sets up a new user namespace for the executed processes and configures a user and
+ group mapping. If set to a true value or <literal>self</literal>, a minimal user and group mapping is
+ configured that maps the <literal>root</literal> user and group as well as the unit's own user and
+ group to themselves and everything else to the <literal>nobody</literal> user and group. This is
+ useful to securely detach the user and group databases used by the unit from the rest of the system,
+ and thus to create an effective sandbox environment. All files, directories, processes, IPC objects
+ and other resources owned by users/groups other than <literal>root</literal> or the unit's own will
+ stay visible from within the unit but appear owned by the <literal>nobody</literal> user and
+ group. </para>
<para>If the parameter is <literal>identity</literal>, user namespacing is set up with an identity
mapping for the first 65536 UIDs/GIDs. Any UIDs/GIDs above 65536 will be mapped to the
to call <function>setgroups()</function> system calls (by setting
<filename>/proc/<replaceable>pid</replaceable>/setgroups</filename> to <literal>allow</literal>).
Similar to <literal>identity</literal>, this does not provide UID/GID isolation, but it does provide
- process capability isolation.</para>
-
- <para>If this mode is enabled, all unit processes are run without privileges in the host user
- namespace (regardless of whether the unit's own user/group is <literal>root</literal> or not). Specifically
- this means that the process will have zero process capabilities on the host's user namespace, but
- full capabilities within the service's user namespace. Settings such as
- <varname>CapabilityBoundingSet=</varname> will affect only the latter, and there's no way to acquire
- additional capabilities in the host's user namespace.</para>
+ process capability isolation. If this mode is enabled, all unit processes are run without privileges
+ in the host user namespace (regardless of whether the unit's own user/group is
+ <literal>root</literal> or not). Specifically, this means that the process will have zero process
+ capabilities in the host's user namespace, but full capabilities within the service's user
+ namespace. Settings such as <varname>CapabilityBoundingSet=</varname> will affect only the latter,
+ and there's no way to acquire additional capabilities in the host's user namespace.</para>
+
+ <para>If the parameter is <literal>managed</literal>, a transient range of 65536 UIDs/GIDs is
+ dynamically allocated for the unit, and a UID/GID mapping is assigned to the unit's processes so that
+ UID/GID 0 inside the unit maps to the first UID/GID of the allocated range. Note that in this mode
+ the UID/GID the service process runs as differs depending on the vantage point: seen from the host
+ side it is a high, dynamically assigned UID, while seen from inside the unit it is 0. Also note that
+ this mode enables file system UID mapping for the file systems this service accesses, mapping the
+ "foreign" UID range on disk to the selected dynamic UID range at runtime.</para>
<para>When this setting is set up by a per-user instance of the service manager, the mapping of the
<literal>root</literal> user and group to itself is omitted (unless the user manager is root).
#include "mountpoint-util.h"
#include "namespace-util.h"
#include "nsflags.h"
+#include "nsresource.h"
#include "open-file.h"
#include "osc-context.h"
#include "pam-util.h"
static int setup_private_users(
PrivateUsers private_users,
- uid_t ouid,
- gid_t ogid,
- uid_t uid,
- gid_t gid,
+ uid_t ouid, /* service manager uid */
+ gid_t ogid, /* service manager gid */
+ uid_t uid, /* unit uid */
+ gid_t gid, /* unit gid */
bool allow_setgroups) {
_cleanup_free_ char *uid_map = NULL, *gid_map = NULL;
case PRIVATE_USERS_NO:
return 0; /* Early exit */
+ case PRIVATE_USERS_MANAGED: {
+ if (uid != 0 || gid != 0)
+ return log_debug_errno(SYNTHETIC_ERRNO(EPERM), "When allocating dynamic user namespace range, target UID/GID must be root, refusing.");
+
+ _cleanup_close_ int userns_fd = nsresource_allocate_userns(/* name= */ NULL, NSRESOURCE_UIDS_64K);
+ if (userns_fd < 0)
+ return userns_fd;
+
+ if (setns(userns_fd, CLONE_NEWUSER) < 0)
+ return log_debug_errno(errno, "Failed to join freshly allocated user namespace: %m");
+
+ /* In "managed" mode the originating UID is not mapped, hence we need to explicitly become root in the new userns now. */
+ r = reset_uid_gid();
+ if (r < 0)
+ return log_debug_errno(r, "Failed to reset UID/GID to root: %m");
+
+ return 1; /* Early exit */
+ }
+
case PRIVATE_USERS_IDENTITY:
uid_map = strdup("0 0 65536\n");
if (!uid_map)