git.ipfire.org Git - thirdparty/man-pages.git/blame - man2/clone.2
clone.2: Reword discussion of CLONE_NEWNS, removing text also in namespaces(7)
Commit Line Data
fea681da 1.\" Copyright (c) 1992 Drew Eckhardt <drew@cs.colorado.edu>, March 28, 1992
8c7b566c 2.\" and Copyright (c) Michael Kerrisk, 2001, 2002, 2005, 2013
2297bf0e 3.\"
fd0fc519 4.\" %%%LICENSE_START(GPL_NOVERSION_ONELINE)
fea681da 5.\" May be distributed under the GNU General Public License.
fd0fc519 6.\" %%%LICENSE_END
dccaff1e 7.\"
fea681da
MK
8.\" Modified by Michael Haardt <michael@moria.de>
9.\" Modified 24 Jul 1993 by Rik Faith <faith@cs.unc.edu>
10.\" Modified 21 Aug 1994 by Michael Chastain <mec@shell.portal.com>:
11.\" New man page (copied from 'fork.2').
12.\" Modified 10 June 1995 by Andries Brouwer <aeb@cwi.nl>
13.\" Modified 25 April 1998 by Xavier Leroy <Xavier.Leroy@inria.fr>
14.\" Modified 26 Jun 2001 by Michael Kerrisk
15.\" Mostly upgraded to 2.4.x
16.\" Added prototype for sys_clone() plus description
17.\" Added CLONE_THREAD with a brief description of thread groups
c13182ef 18.\" Added CLONE_PARENT and revised entire page to remove ambiguity
fea681da
MK
19.\" between "calling process" and "parent process"
20.\" Added CLONE_PTRACE and CLONE_VFORK
21.\" Added EPERM and EINVAL error codes
fd8a5be4 22.\" Renamed "__clone" to "clone" (which is the prototype in <sched.h>)
fea681da 23.\" various other minor tidy ups and clarifications.
c11b1abf 24.\" Modified 26 Jun 2001 by Michael Kerrisk <mtk.manpages@gmail.com>
d9bfdb9c 25.\" Updated notes for 2.4.7+ behavior of CLONE_THREAD
c11b1abf 26.\" Modified 15 Oct 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
fea681da
MK
27.\" Added description for CLONE_NEWNS, which was added in 2.4.19
28.\" Slightly rephrased, aeb.
29.\" Modified 1 Feb 2003 - added CLONE_SIGHAND restriction, aeb.
30.\" Modified 1 Jan 2004 - various updates, aeb
0967c11f 31.\" Modified 2004-09-10 - added CLONE_PARENT_SETTID etc. - aeb.
d9bfdb9c 32.\" 2005-04-12, mtk, noted the PID caching behavior of NPTL's getpid()
31830ef0 33.\" wrapper under BUGS.
fd8a5be4
MK
34.\" 2005-05-10, mtk, added CLONE_SYSVSEM, CLONE_UNTRACED, CLONE_STOPPED.
35.\" 2005-05-17, mtk, Substantially enhanced discussion of CLONE_THREAD.
4e836144 36.\" 2008-11-18, mtk, order CLONE_* flags alphabetically
82ee147a 37.\" 2008-11-18, mtk, document CLONE_NEWPID
43ce9dda 38.\" 2008-11-19, mtk, document CLONE_NEWUTS
667417b3 39.\" 2008-11-19, mtk, document CLONE_NEWIPC
cfdc761b 40.\" 2008-11-19, Jens Axboe, mtk, document CLONE_IO
fea681da 41.\"
8980a500 42.TH CLONE 2 2014-08-19 "Linux" "Linux Programmer's Manual"
fea681da 43.SH NAME
9b0e0996 44clone, __clone2 \- create a child process
fea681da 45.SH SYNOPSIS
c10859eb 46.nf
81f10dad
MK
47/* Prototype for the glibc wrapper function */
48
fea681da 49.B #include <sched.h>
c10859eb 50
ff929e3b
MK
51.BI "int clone(int (*" "fn" ")(void *), void *" child_stack ,
52.BI " int " flags ", void *" "arg" ", ... "
d3dbc9b1 53.BI " /* pid_t *" ptid ", struct user_desc *" tls \
ff929e3b 54", pid_t *" ctid " */ );"
81f10dad 55
e585064b 56/* Prototype for the raw system call */
81f10dad
MK
57
58.BI "long clone(unsigned long " flags ", void *" child_stack ,
59.BI " void *" ptid ", void *" ctid ,
60.BI " struct pt_regs *" regs );
c10859eb 61.fi
e73b3103
MK
62.sp
63.in -4n
81f10dad 64Feature Test Macro Requirements for glibc wrapper function (see
e73b3103
MK
65.BR feature_test_macros (7)):
66.in
67.sp
68.BR clone ():
69.ad l
70.RS 4
71.PD 0
72.TP 4
73Since glibc 2.14:
74_GNU_SOURCE
75.TP 4
bd297db0 76.\" See http://sources.redhat.com/bugzilla/show_bug.cgi?id=4749
e73b3103
MK
77Before glibc 2.14:
78_BSD_SOURCE || _SVID_SOURCE
79 /* _GNU_SOURCE also suffices */
80.PD
81.RE
82.ad b
fea681da 83.SH DESCRIPTION
edcc65ff
MK
84.BR clone ()
85creates a new process, in a manner similar to
fea681da 86.BR fork (2).
81f10dad
MK
87
88This page describes both the glibc
e511ffb6 89.BR clone ()
e585064b 90wrapper function and the underlying system call on which it is based.
81f10dad 91The main text describes the wrapper function;
e585064b 92the differences for the raw system call
81f10dad 93are described toward the end of this page.
fea681da
MK
94
95Unlike
96.BR fork (2),
81f10dad
MK
97.BR clone ()
98allows the child process to share parts of its execution context with
fea681da 99the calling process, such as the memory space, the table of file
c13182ef
MK
100descriptors, and the table of signal handlers.
101(Note that in this manual
102page, "calling process" normally corresponds to "parent process".
103But see the description of
104.B CLONE_PARENT
fea681da
MK
105below.)
106
107The main use of
edcc65ff 108.BR clone ()
fea681da
MK
109is to implement threads: multiple threads of control in a program that
110run concurrently in a shared memory space.
111
112When the child process is created with
c13182ef 113.BR clone (),
fea681da 114it executes the function
c13182ef 115.IR fn ( arg ).
fea681da 116(This differs from
c13182ef 117.BR fork (2),
fea681da 118where execution continues in the child from the point
c13182ef
MK
119of the
120.BR fork (2)
fea681da
MK
121call.)
122The
123.I fn
124argument is a pointer to a function that is called by the child
125process at the beginning of its execution.
126The
127.I arg
128argument is passed to the
129.I fn
130function.
131
c13182ef 132When the
fea681da 133.IR fn ( arg )
c13182ef
MK
134function application returns, the child process terminates.
135The integer returned by
fea681da 136.I fn
c13182ef
MK
137is the exit code for the child process.
138The child process may also terminate explicitly by calling
fea681da
MK
139.BR exit (2)
140or after receiving a fatal signal.
141
142The
143.I child_stack
c13182ef
MK
144argument specifies the location of the stack used by the child process.
145Since the child and calling process may share memory,
fea681da 146it is not possible for the child process to execute in the
c13182ef
MK
147same stack as the calling process.
148The calling process must therefore
fea681da
MK
149set up memory space for the child stack and pass a pointer to this
150space to
edcc65ff 151.BR clone ().
5fab2e7c 152Stacks grow downward on all processors that run Linux
fea681da
MK
153(except the HP PA processors), so
154.I child_stack
155usually points to the topmost address of the memory space set up for
156the child stack.
157
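The points above (the child runs fn(arg), the value returned by fn becomes
the exit code, and child_stack points to the top of the child's stack)
can be illustrated with a minimal sketch.
The names child_fn, run_child, and STACK_SIZE are hypothetical and chosen
only for this illustration; the sketch assumes a Linux system with glibc
and an architecture on which the stack grows downward.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)    /* hypothetical size for this sketch */

/* Hypothetical child function: its return value becomes the exit code. */
static int child_fn(void *arg)
{
    return *(int *) arg;
}

/* Run child_fn(&code) in a clone() child.
   Returns the child's exit code, or -1 on failure. */
static int run_child(int code)
{
    char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Assuming a downward-growing stack, pass the topmost address. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, &code);

    int status;
    int result = -1;
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    munmap(stack, STACK_SIZE);
    return result;
}
```

Because no sharing flags are given, the child works on a copy of the
caller's memory, so passing the address of the local variable code is safe here.
With CLONE_VM, the lifetime of arg would need more care.
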
158The low byte of
159.I flags
fd8a5be4
MK
160contains the number of the
161.I "termination signal"
162sent to the parent when the child dies.
163If this signal is specified as anything other than
fea681da
MK
164.BR SIGCHLD ,
165then the parent process must specify the
c13182ef
MK
166.B __WALL
167or
fea681da 168.B __WCLONE
c13182ef
MK
169options when waiting for the child with
170.BR wait (2).
fea681da
MK
171If no signal is specified, then the parent process is not signaled
172when the child terminates.
173
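As a sketch of the termination-signal rule, the following hypothetical
helper creates a child with no termination signal (the low byte of flags,
flags & CSIGNAL, is zero) and then waits for it twice.
On Linux, the plain waitpid() call is expected to fail with ECHILD,
while the call using __WALL collects the child.
The names child_fn, wait_without_sigchld, and STACK_SIZE are assumptions
made for this illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (256 * 1024)     /* hypothetical size for this sketch */

static int child_fn(void *arg)
{
    (void) arg;
    return 3;                       /* exit code of the child */
}

/* Create a child whose termination signal is 0 and wait for it.
   Stores the errno of a plain waitpid() in *plain_errno (0 if it succeeded).
   Returns the child's exit code as seen with __WALL, or -1 on failure. */
static int wait_without_sigchld(int *plain_errno)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = 0;                  /* (flags & CSIGNAL) == 0: no signal */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, flags, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    /* A plain wait considers only children whose termination
       signal is SIGCHLD. */
    *plain_errno = (waitpid(pid, &status, 0) == -1) ? errno : 0;

    /* __WALL waits for any child, whatever its termination signal. */
    pid_t w = waitpid(pid, &status, __WALL);

    free(stack);
    return (w == pid && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Passing __WCLONE instead of __WALL would restrict the wait to children
whose termination signal is not SIGCHLD.
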
174.I flags
fd8a5be4
MK
175may also be bitwise-or'ed with zero or more of the following constants,
176in order to specify what is shared between the calling process
fea681da 177and the child process:
fea681da 178.TP
f5dbc7c8
MK
179.BR CLONE_CHILD_CLEARTID " (since Linux 2.5.49)"
180Erase child thread ID at location
d3dbc9b1 181.I ctid
f5dbc7c8
MK
182in child memory when the child exits, and do a wakeup on the futex
183at that address.
184The address involved may be changed by the
185.BR set_tid_address (2)
186system call.
187This is used by threading libraries.
188.TP
189.BR CLONE_CHILD_SETTID " (since Linux 2.5.49)"
190Store child thread ID at location
d3dbc9b1 191.I ctid
f5dbc7c8
MK
192in child memory.
193.TP
1603d6a1 194.BR CLONE_FILES " (since Linux 2.0)"
fea681da 195If
f5dbc7c8
MK
196.B CLONE_FILES
197is set, the calling process and the child process share the same file
198descriptor table.
199Any file descriptor created by the calling process or by the child
200process is also valid in the other process.
201Similarly, if one of the processes closes a file descriptor,
202or changes its associated flags (using the
203.BR fcntl (2)
204.B F_SETFD
205operation), the other process is also affected.
fea681da
MK
206
207If
f5dbc7c8
MK
208.B CLONE_FILES
209is not set, the child process inherits a copy of all file descriptors
210opened in the calling process at the time of
211.BR clone ().
212(The duplicated file descriptors in the child refer to the
213same open file descriptions (see
214.BR open (2))
215as the corresponding file descriptors in the calling process.)
216Subsequent operations that open or close file descriptors,
217or change file descriptor flags,
218performed by either the calling
219process or the child process do not affect the other process.
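
The difference between the two cases can be observed with a small sketch
in which the child closes a file descriptor and the caller then checks
whether its own descriptor is still valid.
The helper names and STACK_SIZE are hypothetical; the sketch assumes
Linux with glibc.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)     /* hypothetical size for this sketch */

/* Child: close the descriptor whose number is stored at arg. */
static int child_close(void *arg)
{
    return close(*(int *) arg) == 0 ? 0 : 1;
}

/* Returns 1 if the caller's descriptor is still open after the child
   closed it, 0 if it was closed in the caller too, -1 on error.
   extra_flags is either 0 or CLONE_FILES. */
static int fd_open_after_child_close(int extra_flags)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    int still_open = -1;
    pid_t pid = clone(child_close, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, &p[0]);
    if (pid != -1 && waitpid(pid, NULL, 0) == pid)
        still_open = (fcntl(p[0], F_GETFD) != -1);

    if (still_open != 0)            /* not closed via a shared table */
        close(p[0]);
    close(p[1]);
    free(stack);
    return still_open;
}
```

With extra_flags set to 0 the child closes only its own copy;
with CLONE_FILES the close takes effect in the shared table.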
fea681da 220.TP
1603d6a1 221.BR CLONE_FS " (since Linux 2.0)"
fea681da
MK
222If
223.B CLONE_FS
9ee4a2b6 224is set, the caller and the child process share the same filesystem
c13182ef 225information.
9ee4a2b6 226This includes the root of the filesystem, the current
c13182ef
MK
227working directory, and the umask.
228Any call to
fea681da
MK
229.BR chroot (2),
230.BR chdir (2),
231or
232.BR umask (2)
edcc65ff 233performed by the calling process or the child process also affects the
fea681da
MK
234other process.
235
c13182ef 236If
fea681da 237.B CLONE_FS
9ee4a2b6 238is not set, the child process works on a copy of the filesystem
fea681da 239information of the calling process at the time of the
edcc65ff 240.BR clone ()
fea681da
MK
241call.
242Calls to
243.BR chroot (2),
244.BR chdir (2),
245.BR umask (2)
246performed later by one of the processes do not affect the other process.
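
A minimal sketch of this behavior: the child changes its working
directory, and the caller checks whether its own working directory
followed.
The helper names and STACK_SIZE are hypothetical, and the sketch assumes
that the directory /tmp exists and is not the root directory.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)     /* hypothetical size for this sketch */

/* Child: change the working directory to the root directory. */
static int child_chdir(void *arg)
{
    (void) arg;
    return chdir("/") == 0 ? 0 : 1;
}

/* Returns 1 if the caller's working directory became "/" after the
   child's chdir(), 0 if it did not, -1 on error.
   extra_flags is either 0 or CLONE_FS. */
static int cwd_follows_child(int extra_flags)
{
    char buf[PATH_MAX];
    int result = -1;

    int saved = open(".", O_RDONLY | O_DIRECTORY);  /* to restore later */
    if (saved == -1)
        return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack != NULL && chdir("/tmp") == 0) {      /* assumed to exist */
        pid_t pid = clone(child_chdir, stack + STACK_SIZE,
                          extra_flags | SIGCHLD, NULL);
        if (pid != -1 && waitpid(pid, NULL, 0) == pid
                && getcwd(buf, sizeof(buf)) != NULL)
            result = (strcmp(buf, "/") == 0);
    }

    if (fchdir(saved) == -1)        /* restore the original directory */
        result = -1;
    close(saved);
    free(stack);
    return result;
}
```

The same pattern could be used to observe the effect on the umask.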
fea681da 247.TP
a4cc375e 248.BR CLONE_IO " (since Linux 2.6.25)"
11f27a1c
JA
249If
250.B CLONE_IO
251is set, then the new process shares an I/O context with
252the calling process.
253If this flag is not set, then (as with
254.BR fork (2))
255the new process has its own I/O context.
256
257.\" The following based on text from Jens Axboe
a113945f 258The I/O context is the I/O scope of the disk scheduler (i.e.,
11f27a1c
JA
259what the I/O scheduler uses to model scheduling of a process's I/O).
260If processes share the same I/O context,
261they are treated as one by the I/O scheduler.
262As a consequence, they get to share disk time.
263For some I/O schedulers,
264.\" the anticipatory and CFQ scheduler
265if two processes share an I/O context,
266they will be allowed to interleave their disk access.
267If several threads are doing I/O on behalf of the same process
268.RB ( aio_read (3),
269for instance), they should employ
270.BR CLONE_IO
271to get better I/O performance.
272.\" with CFQ and AS.
273
274If the kernel is not configured with the
275.B CONFIG_BLOCK
276option, this flag is a no-op.
277.TP
8722311b 278.BR CLONE_NEWIPC " (since Linux 2.6.19)"
667417b3
MK
279If
280.B CLONE_NEWIPC
281is set, then create the process in a new IPC namespace.
282If this flag is not set, then (as with
06b30458 283.BR fork (2)),
667417b3
MK
284the process is created in the same IPC namespace as
285the calling process.
0236bea9 286This flag is intended for the implementation of containers.
667417b3 287
efbfd7ec 288An IPC namespace provides an isolated view of System\ V IPC objects (see
009a049e
MK
289.BR svipc (7))
290and (since Linux 2.6.30)
291.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
292.\" https://lwn.net/Articles/312232/
293POSIX message queues
294(see
295.BR mq_overview (7)).
19911fa5
MK
296The common characteristic of these IPC mechanisms is that IPC
297objects are identified by mechanisms other than filesystem
298pathnames.
009a049e 299
c440fe01 300Objects created in an IPC namespace are visible to all other processes
667417b3
MK
301that are members of that namespace,
302but are not visible to processes in other IPC namespaces.
303
83c1f4b5 304When an IPC namespace is destroyed
009a049e 305(i.e., when the last process that is a member of the namespace terminates),
83c1f4b5
MK
306all IPC objects in the namespace are automatically destroyed.
307
9343f8e7
MK
308Use of this flag requires
309that the process be privileged
667417b3
MK
310.RB ( CAP_SYS_ADMIN ).
311This flag can't be specified in conjunction with
312.BR CLONE_SYSVSEM .
9343f8e7
MK
313
314For further information on IPC namespaces, see
315.BR namespaces (7).
667417b3 316.TP
163bf178 317.BR CLONE_NEWNET " (since Linux 2.6.24)"
33a0ccb2 318(The implementation of this flag was completed only
9108d867 319by about kernel version 2.6.29.)
163bf178
MK
320
321If
322.B CLONE_NEWNET
323is set, then create the process in a new network namespace.
324If this flag is not set, then (as with
57ef8c39 325.BR fork (2))
163bf178
MK
326the process is created in the same network namespace as
327the calling process.
328This flag is intended for the implementation of containers.
329
330A network namespace provides an isolated view of the networking stack
331(network device interfaces, IPv4 and IPv6 protocol stacks,
332IP routing tables, firewall rules, the
333.I /proc/net
334and
335.I /sys/class/net
336directory trees, sockets, etc.).
337A physical network device can live in exactly one
338network namespace.
339A virtual network device ("veth") pair provides a pipe-like abstraction
bea08fec 340.\" FIXME . Add pointer to veth(4) page when it is eventually completed
163bf178
MK
341that can be used to create tunnels between network namespaces,
342and can be used to create a bridge to a physical network device
343in another namespace.
344
bf032425
SH
345When a network namespace is freed
346(i.e., when the last process in the namespace terminates),
347its physical network devices are moved back to the
348initial network namespace (not to the parent of the process).
73680728
MK
349For further information on network namespaces, see
350.BR namespaces (7).
bf032425 351
73680728
MK
352Use of this flag requires
353that the process be privileged
cae2ec15 354.RB ( CAP_SYS_ADMIN ).
3dd2331c 355
163bf178 356.TP
c10859eb 357.BR CLONE_NEWNS " (since Linux 2.4.19)"
3dd2331c
MK
358If
359.B CLONE_NEWNS
360is set, the cloned child is started in a new mount namespace,
361initialized with a copy of the namespace of the parent.
362If
fea681da 363.B CLONE_NEWNS
3dd2331c 364is not set, the child lives in the same mount
4df2eb09 365namespace as the parent.
fea681da 366
3dd2331c
MK
367For further information on mount namespaces, see
368.BR namespaces (7).
fea681da 369
0b9bdf82 370Only a privileged process (one having the \fBCAP_SYS_ADMIN\fP capability)
fea681da
MK
371may specify the
372.B CLONE_NEWNS
373flag.
374It is not permitted to specify both
375.B CLONE_NEWNS
376and
377.B CLONE_FS
378in the same
e511ffb6 379.BR clone ()
fea681da 380call.
3dd2331c 381
70d21f17 382.TP
06b30458
MK
383.BR CLONE_NEWUSER
384(This flag first became meaningful for
385.BR clone ()
4d2b3ed7
MK
386in Linux 2.6.23,
387the current
388.BR clone ()
389semantics were merged in Linux 3.5,
390and the final pieces to make the user namespaces completely usable were
391merged in Linux 3.8.)
392
70d21f17
EB
393If
394.B CLONE_NEWUSER
06b30458
MK
395is set, then create the process in a new user namespace.
396If this flag is not set, then (as with
57ef8c39 397.BR fork (2))
70d21f17
EB
398the process is created in the same user namespace as the calling process.
399
06b30458
MK
400A user namespace provides an isolated environment for
401security-related identifiers, in particular,
402user IDs, group IDs, keys (see
70d21f17
EB
403.BR keyctl (2)),
404and capabilities.
405
06b30458
MK
406When a user namespace is created,
407it starts out without a mapping of user IDs (group IDs)
408to the parent user namespace.
409The desired mapping of user IDs (group IDs) to the parent user namespace
410may be set by writing into
411.IR /proc/[pid]/uid_map
412.RI ( /proc/[pid]/gid_map );
413see
414.BR proc (5).
415
416The first process in a user namespace starts out with a complete set
417of capabilities with respect to the new user namespace.
418
419System calls that return user IDs (group IDs) will return
420either the user ID (group ID) mapped into the current
421user namespace if there is a mapping, or the overflow user ID (group ID);
422the default value for the overflow user ID (group ID) is 65534.
423See the descriptions of
424.IR /proc/sys/kernel/overflowuid
425and
426.IR /proc/sys/kernel/overflowgid
427in
428.BR proc (5).
429
642ce311 430Use of this flag requires a kernel configured with the
a0efdddb
MK
431.BR CONFIG_USER_NS
432option.
fefbcba8
MK
433Before Linux 3.8, use of
434.BR CLONE_NEWUSER
435required that the caller have three capabilities:
436.BR CAP_SYS_ADMIN ,
437.BR CAP_SETUID ,
438and
439.BR CAP_SETGID .
440.\" Before Linux 2.6.29, it appears that only CAP_SYS_ADMIN was needed
06b30458
MK
441Starting with Linux 3.8,
442no privileges are needed to create a user namespace,
443and mount, PID, IPC, network, and UTS namespaces can be created with just the
444.B CAP_SYS_ADMIN
445capability in the caller's user namespace.
446
730e9c01
MK
447If
448.BR CLONE_NEWUSER
449is specified along with other
450.B CLONE_NEW*
451flags in a single
452.BR clone ()
453call, the user namespace is guaranteed to be created first,
454giving the caller privileges over the remaining
455namespaces created by the call.
456Thus, it is possible for an unprivileged caller to specify this combination
457of flags.
458
06b30458
MK
459Over the years, many features have been added
460to the Linux kernel that are available only to privileged users
461because of their potential to confuse set-user-ID-root applications.
462In general, it becomes safe to allow the root user in a user namespace to
463use those features because it is impossible, while in a user namespace,
70d21f17
EB
464to gain more privilege than the root user of a user namespace has.
465
fea681da 466.TP
82ee147a
MK
467.BR CLONE_NEWPID " (since Linux 2.6.24)"
468.\" This explanation draws a lot of details from
469.\" http://lwn.net/Articles/259217/
470.\" Authors: Pavel Emelyanov <xemul@openvz.org>
471.\" and Kir Kolyshkin <kir@openvz.org>
472.\"
473.\" The primary kernel commit is 30e49c263e36341b60b735cbef5ca37912549264
474.\" Author: Pavel Emelyanov <xemul@openvz.org>
475If
5c95e5e8 476.B CLONE_NEWPID
82ee147a
MK
477is set, then create the process in a new PID namespace.
478If this flag is not set, then (as with
57ef8c39 479.BR fork (2))
82ee147a
MK
480the process is created in the same PID namespace as
481the calling process.
0236bea9 482This flag is intended for the implementation of containers.
82ee147a
MK
483
484A PID namespace provides an isolated environment for PIDs:
485PIDs in a new namespace start at 1,
486somewhat like a standalone system, and calls to
487.BR fork (2),
488.BR vfork (2),
489or
27d47e71 490.BR clone ()
5584229c 491will produce processes with PIDs that are unique within the namespace.
82ee147a
MK
492
493The first process created in a new namespace
494(i.e., the process created using the
495.BR CLONE_NEWPID
496flag) has the PID 1, and is the "init" process for the namespace.
497Children that are orphaned within the namespace will be reparented
498to this process rather than
499.BR init (8).
500Unlike the traditional
501.B init
502process, the "init" process of a PID namespace can terminate,
503and if it does, all of the processes in the namespace are terminated.
504
505PID namespaces form a hierarchy.