.SH SYNOPSIS
.B unshare
[options]
-.I program
-.RI [ arguments ]
+.RI [ program
+.RI [ arguments ]]
.SH DESCRIPTION
Unshares the indicated namespaces from the parent process and then executes
-the specified \fIprogram\fR.
+the specified \fIprogram\fR. If \fIprogram\fR is not given, then ``${SHELL}'' is
+run (default: /bin/sh).
.PP
The namespaces can optionally be made persistent by bind mounting
/proc/\fIpid\fR/ns/\fItype\fR files to a filesystem path and entered with
.BR \%nsenter (1)
-even after the \fIprogram\fR terminates.
+even after the \fIprogram\fR terminates (except PID namespaces, where a
+permanently running init process is required).
Once a persistent \%namespace is no longer needed, it can be unpersisted with
.BR umount (8).
-See the \fBEXAMPLES\fR section for more details.
+See the \fBEXAMPLE\fR section for more details.
+.PP
+.B unshare
+since util-linux version 2.36 uses \fI/proc/[pid]/ns/pid_for_children\fP and \fI/proc/[pid]/ns/time_for_children\fP
+files for persistent PID and TIME namespaces. This change requires Linux kernel 4.17 or newer.
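.PP
As a rough illustration of the kernel requirement above, the following sketch
checks a release string against a minimum version. The function name
kernel_at_least is hypothetical and not part of util-linux, and the sketch
assumes the release string begins with MAJOR.MINOR, as the output of uname -r
normally does.

```shell
#!/bin/sh
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeed if RELEASE (default: output of "uname -r") is at least MAJOR.MINOR.
# Hypothetical helper for illustration only.
kernel_at_least() {
    release=${3:-$(uname -r)}
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}

if kernel_at_least 4 17; then
    echo "kernel is 4.17 or newer"
else
    echo "kernel is older than 4.17"
fi
```

Only the major and minor numbers are compared; distribution suffixes such as
"-generic" are ignored.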
.PP
The namespaces to be unshared are indicated via options. Unshareable namespaces are:
.TP
-.BR "mount namespace"
-Mounting and unmounting filesystems will not affect the rest of the system
-(\fBCLONE_NEWNS\fP flag), except for filesystems which are explicitly marked as
-shared (with \fBmount --make-shared\fP; see \fI/proc/self/mountinfo\fP or
-\fBfindmnt -o+PROPAGATION\fP for the \fBshared\fP flags).
+.B mount namespace
+Mounting and unmounting filesystems will not affect the rest of the system,
+except for filesystems which are explicitly marked as
+shared (with \fBmount \-\-make-shared\fP; see \fI/proc/self/mountinfo\fP or
+\fBfindmnt \-o+PROPAGATION\fP for the \fBshared\fP flags).
+For further details, see
+.BR mount_namespaces (7)
+and the discussion of the
+.B CLONE_NEWNS
+flag in
+.BR clone (2).
.sp
.B unshare
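since util-linux version 2.27 automatically sets propagation to \fBprivate\fP
in a new mount namespace to make sure that the new namespace is really
unshared. This feature can be disabled with the option
\fB\-\-propagation unchanged\fP.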
Note that \fBprivate\fP is the kernel default.
.TP
-.BR "UTS namespace"
+.B UTS namespace
Setting hostname or domainname will not affect the rest of the system.
-(\fBCLONE_NEWUTS\fP flag)
+For further details, see
+.BR uts_namespaces (7)
+and the discussion of the
+.B CLONE_NEWUTS
+flag in
+.BR clone (2).
.TP
-.BR "IPC namespace"
-The process will have an independent namespace for System V \%message queues,
-semaphore sets and shared memory segments. (\fBCLONE_NEWIPC\fP flag)
+.B IPC namespace
+The process will have an independent namespace for POSIX message queues
+as well as System V \%message queues,
+semaphore sets and shared memory segments.
+For further details, see
+.BR ipc_namespaces (7)
+and the discussion of the
+.B CLONE_NEWIPC
+flag in
+.BR clone (2).
.TP
-.BR "network namespace"
+.B network namespace
The process will have independent IPv4 and IPv6 stacks, IP routing tables,
firewall rules, the \fI/proc/net\fP and \fI/sys/class/net\fP directory trees,
-sockets, etc. (\fBCLONE_NEWNET\fP flag)
+sockets, etc.
+For further details, see
+.BR network_namespaces (7)
+and the discussion of the
+.B CLONE_NEWNET
+flag in
+.BR clone (2).
.TP
-.BR "pid namespace"
+.B PID namespace
Children will have a distinct set of PID-to-process mappings from their parent.
-(\fBCLONE_NEWPID\fP flag)
+For further details, see
+.BR pid_namespaces (7)
+and the discussion of the
+.B CLONE_NEWPID
+flag in
+.BR clone (2).
+.TP
+.B cgroup namespace
+The process will have a virtualized view of \fI/proc\:/self\:/cgroup\fP, and new
+cgroup mounts will be rooted at the namespace cgroup root.
+For further details, see
+.BR cgroup_namespaces (7)
+and the discussion of the
+.B CLONE_NEWCGROUP
+flag in
+.BR clone (2).
.TP
-.BR "user namespace"
+.B user namespace
The process will have a distinct set of UIDs, GIDs and capabilities.
-(\fBCLONE_NEWUSER\fP flag)
-.PP
-See \fBclone\fR(2) for the exact semantics of the flags.
+For further details, see
+.BR user_namespaces (7)
+and the discussion of the
+.B CLONE_NEWUSER
+flag in
+.BR clone (2).
+.TP
+.B time namespace
+The process can have a distinct view of
+.B CLOCK_MONOTONIC
+and/or
+.B CLOCK_BOOTTIME
+which can be changed using \fI/proc/self/timens_offsets\fP.
+For further details, see
+.BR time_namespaces (7).
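.PP
To see which namespaces a process currently belongs to, the links under
/proc/\fIpid\fR/ns can be read directly. The sketch below is a minimal
illustration; the function name list_ns is hypothetical, and it assumes a
readlink utility is available.

```shell
#!/bin/sh
# list_ns [DIR]
# Print "TYPE -> ID" for every namespace link found in DIR.
# DIR defaults to /proc/self/ns, the namespace directory of the calling shell.
# Hypothetical helper for illustration only.
list_ns() {
    dir=${1:-/proc/self/ns}
    for link in "$dir"/*; do
        # Skip anything that is not a symbolic link (or an unmatched glob).
        [ -L "$link" ] || continue
        echo "${link##*/} -> $(readlink "$link")"
    done
}

list_ns
```

Running the function once in the current shell and once in a shell started
under unshare should show a different identifier for each namespace type that
was unshared.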
.SH OPTIONS
.TP
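.BR \-i , " \-\-ipc" [ =\fIfile ]
Unshare the IPC namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-m , " \-\-mount" [ =\fIfile ]
Unshare the mount namespace. If \fIfile\fP is specified, then a persistent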
namespace is created by a bind mount.
Note that \fIfile\fP has to be located on a filesystem with the propagation
-flag set to \fBprivate\fP. Use the command \fBfindmnt -o+PROPAGATION\fP
+flag set to \fBprivate\fP. Use the command \fBfindmnt \-o+PROPAGATION\fP
when not sure about the current setting. See also the examples below.
.TP
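.BR \-n , " \-\-net" [ =\fIfile ]
Unshare the network namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.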
.TP
.BR \-p , " \-\-pid" [ =\fIfile ]
Unshare the PID namespace. If \fIfile\fP is specified, then a persistent
-namespace is created by a bind mount. See also the \fB--fork\fP and
-\fB--mount-proc\fP options.
+namespace is created by a bind mount. See also the \fB\-\-fork\fP and
+\fB\-\-mount-proc\fP options.
.TP
.BR \-u , " \-\-uts" [ =\fIfile ]
Unshare the UTS namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
.BR \-U , " \-\-user" [ =\fIfile ]
Unshare the user namespace. If \fIfile\fP is specified, then a persistent
namespace is created by a bind mount.
.TP
+.BR \-C , " \-\-cgroup" [ =\fIfile ]
+Unshare the cgroup namespace. If \fIfile\fP is specified, then a persistent
+namespace is created by a bind mount.
+.TP
+.BR \-T , " \-\-time" [ =\fIfile ]
+Unshare the time namespace. If \fIfile\fP is specified, then a persistent
+namespace is created by a bind mount. The \fB\-\-monotonic\fP and
+\fB\-\-boottime\fP options can be used to specify the corresponding
+offset in the time namespace.
+.TP
.BR \-f , " \-\-fork"
Fork the specified \fIprogram\fR as a child process of \fBunshare\fR rather than
running it directly. This is useful when creating a new PID namespace.
.TP
+.B \-\-keep\-caps
+When the \fB\-\-user\fP option is given, ensure that capabilities granted
+in the user namespace are preserved in the child process.
+.TP
+.BR \-\-kill\-child [ =\fIsigname ]
+When \fBunshare\fR terminates, have \fIsigname\fP be sent to the forked child process.
+Combined with \fB\-\-pid\fR this allows for an easy and reliable killing of the entire
+process tree below \fBunshare\fR.
+If not given, \fIsigname\fP defaults to \fBSIGKILL\fR.
+This option implies \fB\-\-fork\fR.
+.TP
.BR \-\-mount\-proc [ =\fImountpoint ]
Just before running the program, mount the proc filesystem at \fImountpoint\fP
(default is /proc). This is useful when creating a new PID namespace. It also
implies creating a new mount namespace since the /proc mount would otherwise
mess up existing programs on the system. The new proc filesystem is explicitly
mounted as private (with MS_PRIVATE|MS_REC).
.TP
+.BI \-\-map\-user= uid|name
+Run the program only after the current effective user ID has been mapped to \fIuid\fP.
+If this option is specified multiple times, the last occurrence takes precedence.
+This option implies \fB\-\-user\fR.
+.TP
+.BI \-\-map\-group= gid|name
+Run the program only after the current effective group ID has been mapped to \fIgid\fP.
+If this option is specified multiple times, the last occurrence takes precedence.
+This option implies \fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
+.TP
.BR \-r , " \-\-map\-root\-user"
Run the program only after the current effective user and group IDs have been mapped to
the superuser UID and GID in the newly created user namespace. This makes it possible to
conveniently gain capabilities needed to manage various aspects of the newly created
namespaces (such as configuring interfaces in the network namespace or mounting filesystems in
the mount namespace) even when run unprivileged. As a mere convenience feature, it does not support
more sophisticated use cases, such as mapping multiple ranges of UIDs and GIDs.
-This option implies \fB--setgroups=deny\fR.
+This option implies \fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
+This option is equivalent to \fB\-\-map-user=0 \-\-map-group=0\fR.
+.TP
+.BR \-c , " \-\-map\-current\-user"
+Run the program only after the current effective user and group IDs have been mapped to
+the same UID and GID in the newly created user namespace. This option implies
+\fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
+This option is equivalent to \fB\-\-map-user=$(id \-ru) \-\-map-group=$(id \-rg)\fR.
.TP
.BR "\-\-propagation private" | shared | slave | unchanged
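Recursively set the mount propagation flag in the new mount namespace. The default
is to set the propagation to \fIprivate\fP. This feature can be disabled with
the argument \fBunchanged\fR. The option is silently ignored when the mount
namespace (\fB\-\-mount\fP) is not requested.
.TP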
.BR "\-\-setgroups allow" | deny
Allow or deny the
.BR setgroups (2)
-syscall in user namespaces.
-
-.BR setgroups (2)
-is only callable with CAP_SETGID and CAP_SETGID in a user
-namespace. Linux kernel since 3.19 does not give you permission to call setgroups(2)
-until after GID map has been set. The GID map is writable by root when
-.BR setgroups (2)
-is enabled and the GID map becomes writable by unprivileged processes when
-.BR setgroups (2)
-is permanently disabled.
+system call in a user namespace.
+.sp
+To be able to call
+.BR setgroups (2),
+the calling process must at least have CAP_SETGID.
+But since Linux 3.19 a further restriction applies:
+the kernel gives permission to call
+.BR \%setgroups (2)
+only after the GID map (\fB/proc/\fIpid\fB/gid_map\fR) has been set.
+The GID map is writable by root when
+.BR \%setgroups (2)
+is enabled (i.e., \fBallow\fR, the default), and
+the GID map becomes writable by unprivileged processes when
+.BR \%setgroups (2)
+is permanently disabled (with \fBdeny\fR).
+.TP
+.BR \-R , " \-\-root=\fIdir"
+Run the command with the root directory set to \fIdir\fP.
+.TP
+.BR \-w , " \-\-wd=\fIdir"
+Change the working directory to \fIdir\fP.
+.TP
+.BR \-S , " \-\-setuid \fIuid"
+Set the user ID which will be used in the entered namespace.
+.TP
+.BR \-G , " \-\-setgid \fIgid"
+Set the group ID which will be used in the entered namespace and drop
+supplementary groups.
+.TP
+.BI \-\-monotonic " offset"
+Set the offset of
+.B CLOCK_MONOTONIC
+which will be used in the entered time namespace. This option requires
+unsharing a time namespace with \fB\-\-time\fP.
+.TP
+.BI \-\-boottime " offset"
+Set the offset of
+.B CLOCK_BOOTTIME
+which will be used in the entered time namespace. This option requires
+unsharing a time namespace with \fB\-\-time\fP.
.TP
.BR \-V , " \-\-version"
Display version information and exit.
.TP
.BR \-h , " \-\-help"
Display help text and exit.
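.PP
The mapping options above (\-\-map\-user, \-\-map\-group, \-\-map\-root\-user
and \-\-map\-current\-user) each map a single ID. As a sketch of what such a
mapping looks like, the following prints lines in the
"ID-inside-ns ID-outside-ns length" layout that
.BR user_namespaces (7)
documents for uid_map and gid_map. The function name id_map_line and the
example UID 1000 are hypothetical; that one such line is written per option is
an assumption made for illustration.

```shell
#!/bin/sh
# id_map_line INSIDE OUTSIDE [COUNT]
# Print one mapping line in the "ID-inside-ns ID-outside-ns length" layout.
# COUNT defaults to 1, i.e. a single ID is mapped.
# Hypothetical helper for illustration only.
id_map_line() {
    echo "$1 $2 ${3:-1}"
}

# Assumed caller with effective UID 1000 (illustrative value):
id_map_line 0 1000       # mapping comparable to --map-root-user
id_map_line 1000 1000    # mapping comparable to --map-current-user
```

The first call prints "0 1000 1" and the second prints "1000 1000 1".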
-.SH EXAMPLES
+.SH NOTES
+Mounting the proc and sysfs filesystems as root in a user namespace has to be
+restricted so that a less privileged user cannot gain access to sensitive
+files that a more privileged user made unavailable. In short, the rule for proc
+and sysfs is to behave as closely to a bind mount as possible.
+.SH EXAMPLE
.TP
-.B # unshare --fork --pid --mount-proc readlink /proc/self
+.B # unshare \-\-fork \-\-pid \-\-mount-proc readlink /proc/self
.TQ
1
.br
Establish a PID namespace, ensure we're PID 1 in it against a newly mounted
procfs instance.
.TP
-.B $ unshare --map-root-user --user sh -c whoami
+.B $ unshare \-\-map-root-user \-\-user sh \-c whoami
.TQ
root
.br
Establish a user namespace as an unprivileged user with a root user within it.
.TP
.B # touch /root/uts-ns
.TQ
-.B # unshare --uts=/root/uts-ns hostname FOO
+.B # unshare \-\-uts=/root/uts-ns hostname FOO
.TQ
-.B # nsenter --uts=/root/uts-ns hostname
+.B # nsenter \-\-uts=/root/uts-ns hostname
.TQ
FOO
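.TQ
.B # umount /root/uts-ns
.br
Establish a persistent UTS namespace, and modify the hostname. The namespace
is then entered with \fBnsenter\fR. The namespace is destroyed by unmounting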
the bind reference.
.TP
-.B # mount --bind /root/namespaces /root/namespaces
+.B # mount \-\-bind /root/namespaces /root/namespaces
.TQ
-.B # mount --make-private /root/namespaces
+.B # mount \-\-make-private /root/namespaces
.TQ
.B # touch /root/namespaces/mnt
.TQ
-.B # unshare --mount=/root/namespaces/mnt
+.B # unshare \-\-mount=/root/namespaces/mnt
.br
Establish a persistent mount namespace referenced by the bind mount
/root/namespaces/mnt. This example shows a portable solution, because it
makes sure that the bind mount is created on a filesystem whose propagation is \fBprivate\fP.
+.TP
+.B # unshare \-pf \-\-kill-child \-\- bash \-c "(sleep 999 &) && sleep 1000" &
+.TQ
+.B # pid=$!
+.TQ
+.B # kill $pid
+.br
+Reliable killing of subprocesses of the \fIprogram\fR.
+When \fBunshare\fR gets killed, everything below it gets killed as well.
+Without it, the children of \fIprogram\fR would have been orphaned and
+been re-parented to PID 1.
+.TP
+.B # unshare \-\-fork \-\-time \-\-boottime 100000000 uptime
+.TQ
+ 10:58:48 up 1158 days, 6:05, 1 user, load average: 0.00, 0.00, 0.00
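.PP
To relate the offset in the last example to the reported uptime, the offset
can be converted to days, hours and minutes. The sketch below assumes the
offset is given in seconds, which is consistent with the sample output; the
function name offset_to_uptime is hypothetical.

```shell
#!/bin/sh
# offset_to_uptime SECONDS
# Print SECONDS as "D days, H:MM", similar to the format used by uptime.
# Hypothetical helper for illustration only.
offset_to_uptime() {
    secs=$1
    days=$((secs / 86400))            # whole days
    hours=$((secs % 86400 / 3600))    # whole hours left within the last day
    mins=$((secs % 3600 / 60))        # whole minutes left within the last hour
    printf '%d days, %d:%02d\n' "$days" "$hours" "$mins"
}

offset_to_uptime 100000000
```

This prints "1157 days, 9:46". The sample output above shows 1158 days, 6:05,
so the difference of about 20 hours would be the real uptime of the host on
which the example was run.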
-.SH SEE ALSO
-.BR unshare (2),
-.BR clone (2),
-.BR mount (8)
.SH AUTHORS
.UR dottedmag@dottedmag.net
Mikhail Gusarov
.UE
.br
.UR kzak@redhat.com
Karel Zak
.UE
+.SH SEE ALSO
+.BR clone (2),
+.BR unshare (2),
+.BR namespaces (7),
+.BR mount (8)
.SH AVAILABILITY
The unshare command is part of the util-linux package and is available from
-ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
+https://www.kernel.org/pub/linux/utils/util-linux/.