Manual pages: nsenter.1, unshare.1: add a reference to time_namespaces(7)

[thirdparty/util-linux.git] / sys-utils / unshare.1
diff --git a/sys-utils/unshare.1 b/sys-utils/unshare.1

index c62d24c69d5c05cfe7b33f5f3a9552609b9ac05c..db67b0d4c23b5df350b5f8dbd148db10348fc203 100644 (file)
--- a/sys-utils/unshare.1
+++ b/sys-utils/unshare.1
@@ -4,27 +4,39 @@ unshare \- run program with some namespaces unshared from parent
  .SH SYNOPSIS
  .B unshare
  [options]
-.I program
-.RI [ arguments ]
+.RI [ program
+.RI [ arguments ]]
  .SH DESCRIPTION
  Unshares the indicated namespaces from the parent process and then executes
-the specified \fIprogram\fR.
+the specified \fIprogram\fR. If \fIprogram\fR is not given, then ``${SHELL}'' is
+run (default: /bin/sh).
  .PP
  The namespaces can optionally be made persistent by bind mounting
  /proc/\fIpid\fR/ns/\fItype\fR files to a filesystem path and entered with
  .BR \%nsenter (1)
-even after the \fIprogram\fR terminates.
+even after the \fIprogram\fR terminates (except PID namespaces where
+permanently running init process is required).
  Once a persistent \%namespace is no longer needed, it can be unpersisted with
  .BR umount (8).
-See the \fBEXAMPLES\fR section for more details.
+See the \fBEXAMPLE\fR section for more details.
+.PP
+.B unshare
+since util-linux version 2.36 uses /\fIproc/[pid]/ns/pid_for_children\fP and \fI/proc/[pid]/ns/time_for_children\fP
+files for persistent PID and TIME namespaces. This change requires Linux kernel 4.17 or newer.
  .PP
  The namespaces to be unshared are indicated via options.  Unshareable namespaces are:
  .TP
-.BR "mount namespace"
-Mounting and unmounting filesystems will not affect the rest of the system
-(\fBCLONE_NEWNS\fP flag), except for filesystems which are explicitly marked as
-shared (with \fBmount --make-shared\fP; see \fI/proc/self/mountinfo\fP or
-\fBfindmnt -o+PROPAGATION\fP for the \fBshared\fP flags).
+.B mount namespace
+Mounting and unmounting filesystems will not affect the rest of the system,
+except for filesystems which are explicitly marked as
+shared (with \fBmount \-\-make-shared\fP; see \fI/proc/self/mountinfo\fP or
+\fBfindmnt \-o+PROPAGATION\fP for the \fBshared\fP flags).
+For further details, see
+.BR mount_namespaces (7)
+and the discussion of the
+.B CLONE_NEWNS
+flag in
+.BR clone (2).
  .sp
  .B unshare
  since util-linux version 2.27 automatically sets propagation to \fBprivate\fP
@@ -33,28 +45,74 @@ unshared.  It's possible to disable this feature with option
  \fB\-\-propagation unchanged\fP.
  Note that \fBprivate\fP is the kernel default.
  .TP
-.BR "UTS namespace"
+.B UTS namespace
  Setting hostname or domainname will not affect the rest of the system.
-(\fBCLONE_NEWUTS\fP flag)
+For further details, see
+.BR uts_namespaces (7)
+and the discussion of the
+.B CLONE_NEWUTS
+flag in
+.BR clone (2).
  .TP
-.BR "IPC namespace"
-The process will have an independent namespace for System V \%message queues,
-semaphore sets and shared memory segments.  (\fBCLONE_NEWIPC\fP flag)
+.B IPC namespace
+The process will have an independent namespace for POSIX message queues
+as well as System V \%message queues,
+semaphore sets and shared memory segments.
+For further details, see
+.BR ipc_namespaces (7)
+and the discussion of the
+.B CLONE_NEWIPC
+flag in
+.BR clone (2).
  .TP
-.BR "network namespace"
+.B network namespace
  The process will have independent IPv4 and IPv6 stacks, IP routing tables,
  firewall rules, the \fI/proc/net\fP and \fI/sys/class/net\fP directory trees,
-sockets, etc.  (\fBCLONE_NEWNET\fP flag)
+sockets, etc.
+For further details, see
+.BR network_namespaces (7)
+and the discussion of the
+.B CLONE_NEWNET
+flag in
+.BR clone (2).
  .TP
-.BR "pid namespace"
+.B PID namespace
  Children will have a distinct set of PID-to-process mappings from their parent.
-(\fBCLONE_NEWPID\fP flag)
+For further details, see
+.BR pid_namespaces (7)
+and
+the discussion of the
+.B CLONE_NEWPID
+flag in
+.BR clone (2).
+.TP
+.B cgroup namespace
+The process will have a virtualized view of \fI/proc\:/self\:/cgroup\fP, and new
+cgroup mounts will be rooted at the namespace cgroup root.
+For further details, see
+.BR cgroup_namespaces (7)
+and the discussion of the
+.B CLONE_NEWCGROUP
+flag in
+.BR clone (2).
  .TP
-.BR "user namespace"
+.B user namespace
  The process will have a distinct set of UIDs, GIDs and capabilities.
-(\fBCLONE_NEWUSER\fP flag)
-.PP
-See \fBclone\fR(2) for the exact semantics of the flags.
+For further details, see
+.BR user_namespaces (7)
+and the discussion of the
+.B CLONE_NEWUSER
+flag in
+.BR clone (2).
+.TP
+.B time namespace
+The process can have a distinct view of
+.B CLOCK_MONOTONIC
+and/or
+.B CLOCK_BOOTTIME
+which can be changed using \fI/proc/self/timens_offsets\fP.
+For further details, see
+.BR time_namespaces (7).
  .SH OPTIONS
  .TP
  .BR \-i , " \-\-ipc" [ =\fIfile ]
@@ -65,7 +123,7 @@ namespace is created by a bind mount.
  Unshare the mount namespace.  If \fIfile\fP is specified, then a persistent
  namespace is created by a bind mount.
  Note that \fIfile\fP has to be located on a filesystem with the propagation
-flag set to \fBprivate\fP.  Use the command \fBfindmnt -o+PROPAGATION\fP
+flag set to \fBprivate\fP.  Use the command \fBfindmnt \-o+PROPAGATION\fP
  when not sure about the current setting.  See also the examples below.
  .TP
  .BR \-n , " \-\-net" [ =\fIfile ]
@@ -74,8 +132,8 @@ namespace is created by a bind mount.
  .TP
  .BR \-p , " \-\-pid" [ =\fIfile ]
  Unshare the PID namespace.  If \fIfile\fP is specified then persistent
-namespace is created by a bind mount.  See also the \fB--fork\fP and
-\fB--mount-proc\fP options.
+namespace is created by a bind mount.  See also the \fB\-\-fork\fP and
+\fB\-\-mount-proc\fP options.
  .TP
  .BR \-u , " \-\-uts" [ =\fIfile ]
  Unshare the UTS namespace.  If \fIfile\fP is specified, then a persistent
@@ -85,10 +143,31 @@ namespace is created by a bind mount.
  Unshare the user namespace.  If \fIfile\fP is specified, then a persistent
  namespace is created by a bind mount.
  .TP
+.BR \-C , " \-\-cgroup"[=\fIfile\fP]
+Unshare the cgroup namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount.
+.TP
+.BR \-T , " \-\-time"[=\fIfile\fP]
+Unshare the time namespace. If \fIfile\fP is specified then a persistent
+namespace is created by a bind mount. The \fB\-\-monotonic\fP and
+\fB\-\-boottime\fP options can be used to specify the corresponding
+offset in the time namespace.
+.TP
  .BR \-f , " \-\-fork"
  Fork the specified \fIprogram\fR as a child process of \fBunshare\fR rather than
  running it directly.  This is useful when creating a new PID namespace.
  .TP
+.B \-\-keep\-caps
+When the \fB\-\-user\fP option is given, ensure that capabilities granted
+in the user namespace are preserved in the child process.
+.TP
+.BR \-\-kill\-child [ =\fIsigname ]
+When \fBunshare\fR terminates, have \fIsigname\fP be sent to the forked child process.
+Combined with \fB\-\-pid\fR this allows for an easy and reliable killing of the entire
+process tree below \fBunshare\fR.
+If not given, \fIsigname\fP defaults to \fBSIGKILL\fR.
+This option implies \fB\-\-fork\fR.
+.TP
  .BR \-\-mount\-proc [ =\fImountpoint ]
  Just before running the program, mount the proc filesystem at \fImountpoint\fP
  (default is /proc).  This is useful when creating a new PID namespace.  It also
@@ -96,6 +175,16 @@ implies creating a new mount namespace since the /proc mount would otherwise
  mess up existing programs on the system.  The new proc filesystem is explicitly
  mounted as private (with MS_PRIVATE|MS_REC).
  .TP
+.BI \-\-map\-user= uid|name
+Run the program only after the current effective user ID has been mapped to \fIuid\fP.
+If this option is specified multiple times, the last occurrence takes precedence.
+This option implies \fB\-\-user\fR.
+.TP
+.BI \-\-map\-group= gid|name
+Run the program only after the current effective group ID has been mapped to \fIgid\fP.
+If this option is specified multiple times, the last occurrence takes precedence.
+This option implies \fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
+.TP
  .BR \-r , " \-\-map\-root\-user"
  Run the program only after the current effective user and group IDs have been mapped to
  the superuser UID and GID in the newly created user namespace.  This makes it possible to
@@ -103,7 +192,14 @@ conveniently gain capabilities needed to manage various aspects of the newly cre
  namespaces (such as configuring interfaces in the network namespace or mounting filesystems in
  the mount namespace) even when run unprivileged.  As a mere convenience feature, it does not support
  more sophisticated use cases, such as mapping multiple ranges of UIDs and GIDs.
-This option implies \fB--setgroups=deny\fR.
+This option implies \fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
+This option is equivalent to \fB\-\-map-user=0 \-\-map-group=0\fR.
+.TP
+.BR \-c , " \-\-map\-current\-user"
+Run the program only after the current effective user and group IDs have been mapped to
+the same UID and GID in the newly created user namespace. This option implies
+\fB\-\-setgroups=deny\fR and \fB\-\-user\fR.
+This option is equivalent to \fB\-\-map-user=$(id -ru) \-\-map-group=$(id -rg)\fR.
  .TP
  .BR "\-\-propagation private" | shared | slave | unchanged
  Recursively set the mount propagation flag in the new mount namespace.  The default
@@ -114,32 +210,67 @@ namespace (\fB\-\-mount\fP) is not requested.
  .BR "\-\-setgroups allow" | deny
  Allow or deny the
  .BR setgroups (2)
-syscall in user namespaces.
-
-.BR setgroups (2)
-is only callable with CAP_SETGID and CAP_SETGID in a user
-namespace.  Linux kernel since 3.19 does not give you permission to call setgroups(2)
-until after GID map has been set.  The GID map is writable by root when
-.BR setgroups (2)
-is enabled and the GID map becomes writable by unprivileged processes when
-.BR setgroups (2)
-is permanently disabled.
+system call in a user namespace.
+.sp
+To be able to call
+.BR setgroups (2),
+the calling process must at least have CAP_SETGID.
+But since Linux 3.19 a further restriction applies:
+the kernel gives permission to call
+.BR \%setgroups (2)
+only after the GID map (\fB/proc/\fIpid\fB/gid_map\fR) has been set.
+The GID map is writable by root when
+.BR \%setgroups (2)
+is enabled (i.e., \fBallow\fR, the default), and
+the GID map becomes writable by unprivileged processes when
+.BR \%setgroups (2)
+is permanently disabled (with \fBdeny\fR).
+.TP
+.BR \-R, "\-\-root=\fIdir"
+run the command with root directory set to \fIdir\fP.
+.TP
+.BR \-w, "\-\-wd=\fIdir"
+change working directory to \fIdir\fP.
+.TP
+.BR \-S, "\-\-setuid \fIuid"
+Set the user ID which will be used in the entered namespace.
+.TP
+.BR \-G, "\-\-setgid \fIgid"
+Set the group ID which will be used in the entered namespace and drop
+supplementary groups.
+.TP
+.BI \-\-monotonic " offset"
+Set the offset of
+.B CLOCK_MONOTONIC
+which will be used in the entered time namespace. This option requires
+unsharing a time namespace with \fB\-\-time\fP.
+.TP
+.BI \-\-boottime " offset"
+Set the offset of
+.B CLOCK_BOOTTIME
+which will be used in the entered time namespace. This option requires
+unsharing a time namespace with \fB\-\-time\fP.
  .TP
  .BR \-V , " \-\-version"
  Display version information and exit.
  .TP
  .BR \-h , " \-\-help"
  Display help text and exit.
-.SH EXAMPLES
+.SH NOTES
+The proc and sysfs filesystems mounting as root in a user namespace have to be
+restricted so that a less privileged user can not get more access to sensitive
+files that a more privileged user made unavailable. In short the rule for proc
+and sysfs is as close to a bind mount as possible.
+.SH EXAMPLE
  .TP
-.B # unshare --fork --pid --mount-proc readlink /proc/self
+.B # unshare \-\-fork \-\-pid \-\-mount-proc readlink /proc/self
  .TQ
  1
  .br
  Establish a PID namespace, ensure we're PID 1 in it against a newly mounted
  procfs instance.
  .TP
-.B $ unshare --map-root-user --user sh -c whoami
+.B $ unshare \-\-map-root-user \-\-user sh \-c whoami
  .TQ
  root
  .br
@@ -147,9 +278,9 @@ Establish a user namespace as an unprivileged user with a root user within it.
  .TP
  .B # touch /root/uts-ns
  .TQ
-.B # unshare --uts=/root/uts-ns hostname FOO
+.B # unshare \-\-uts=/root/uts-ns hostname FOO
  .TQ
-.B # nsenter --uts=/root/uts-ns hostname
+.B # nsenter \-\-uts=/root/uts-ns hostname
  .TQ
  FOO
  .TQ
@@ -159,22 +290,33 @@ Establish a persistent UTS namespace, and modify the hostname.  The namespace
  is then entered with \fBnsenter\fR.  The namespace is destroyed by unmounting
  the bind reference.
  .TP
-.B # mount --bind /root/namespaces /root/namespaces
+.B # mount \-\-bind /root/namespaces /root/namespaces
  .TQ
-.B # mount --make-private /root/namespaces
+.B # mount \-\-make-private /root/namespaces
  .TQ
  .B # touch /root/namespaces/mnt
  .TQ
-.B # unshare --mount=/root/namespaces/mnt
+.B # unshare \-\-mount=/root/namespaces/mnt
  .br
  Establish a persistent mount namespace referenced by the bind mount
  /root/namespaces/mnt.  This example shows a portable solution, because it
  makes sure that the bind mount is created on a shared filesystem.
+.TP
+.B # unshare \-pf \-\-kill-child \-\- bash \-c "(sleep 999 &) && sleep 1000" &
+.TQ
+.B # pid=$!
+.TQ
+.B # kill $pid
+.br
+Reliable killing of subprocesses of the \fIprogram\fR.
+When \fBunshare\fR gets killed, everything below it gets killed as well.
+Without it, the children of \fIprogram\fR would have orphaned and
+been re-parented to PID 1.
+.TP
+.B # unshare \-\-fork \-\-time \-\-boottime 100000000 uptime
+.TQ
+ 10:58:48 up 1158 days,  6:05,  1 user,  load average: 0.00, 0.00, 0.00
  
-.SH SEE ALSO
-.BR unshare (2),
-.BR clone (2),
-.BR mount (8)
  .SH AUTHORS
  .UR dottedmag@dottedmag.net
  Mikhail Gusarov
@@ -183,6 +325,11 @@ Mikhail Gusarov
  .UR kzak@redhat.com
  Karel Zak
  .UE
+.SH SEE ALSO
+.BR clone (2),
+.BR unshare (2),
+.BR namespaces (7),
+.BR mount (8)
  .SH AVAILABILITY
  The unshare command is part of the util-linux package and is available from
-ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
+https://www.kernel.org/pub/linux/utils/util-linux/.