git.ipfire.org Git - thirdparty/man-pages.git/blame - man2/clone.2
clone.2, namespaces.7: Move some CLONE_NEWIPC text from clone.2 to namespaces.7
.\" Copyright (c) 1992 Drew Eckhardt <drew@cs.colorado.edu>, March 28, 1992
.\" and Copyright (c) Michael Kerrisk, 2001, 2002, 2005, 2013
.\"
.\" %%%LICENSE_START(GPL_NOVERSION_ONELINE)
.\" May be distributed under the GNU General Public License.
.\" %%%LICENSE_END
.\"
.\" Modified by Michael Haardt <michael@moria.de>
.\" Modified 24 Jul 1993 by Rik Faith <faith@cs.unc.edu>
.\" Modified 21 Aug 1994 by Michael Chastain <mec@shell.portal.com>:
.\" New man page (copied from 'fork.2').
.\" Modified 10 June 1995 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 25 April 1998 by Xavier Leroy <Xavier.Leroy@inria.fr>
.\" Modified 26 Jun 2001 by Michael Kerrisk
.\" Mostly upgraded to 2.4.x
.\" Added prototype for sys_clone() plus description
.\" Added CLONE_THREAD with a brief description of thread groups
.\" Added CLONE_PARENT and revised entire page remove ambiguity
.\" between "calling process" and "parent process"
.\" Added CLONE_PTRACE and CLONE_VFORK
.\" Added EPERM and EINVAL error codes
.\" Renamed "__clone" to "clone" (which is the prototype in <sched.h>)
.\" various other minor tidy ups and clarifications.
.\" Modified 26 Jun 2001 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Updated notes for 2.4.7+ behavior of CLONE_THREAD
.\" Modified 15 Oct 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Added description for CLONE_NEWNS, which was added in 2.4.19
.\" Slightly rephrased, aeb.
.\" Modified 1 Feb 2003 - added CLONE_SIGHAND restriction, aeb.
.\" Modified 1 Jan 2004 - various updates, aeb
.\" Modified 2004-09-10 - added CLONE_PARENT_SETTID etc. - aeb.
.\" 2005-04-12, mtk, noted the PID caching behavior of NPTL's getpid()
.\" wrapper under BUGS.
.\" 2005-05-10, mtk, added CLONE_SYSVSEM, CLONE_UNTRACED, CLONE_STOPPED.
.\" 2005-05-17, mtk, Substantially enhanced discussion of CLONE_THREAD.
.\" 2008-11-18, mtk, order CLONE_* flags alphabetically
.\" 2008-11-18, mtk, document CLONE_NEWPID
.\" 2008-11-19, mtk, document CLONE_NEWUTS
.\" 2008-11-19, mtk, document CLONE_NEWIPC
.\" 2008-11-19, Jens Axboe, mtk, document CLONE_IO
.\"
.TH CLONE 2 2014-08-19 "Linux" "Linux Programmer's Manual"
.SH NAME
clone, __clone2 \- create a child process
.SH SYNOPSIS
.nf
/* Prototype for the glibc wrapper function */

.B #include <sched.h>

.BI "int clone(int (*" "fn" ")(void *), void *" child_stack ,
.BI "          int " flags ", void *" "arg" ", ... "
.BI "          /* pid_t *" ptid ", struct user_desc *" tls \
", pid_t *" ctid " */ );"

/* Prototype for the raw system call */

.BI "long clone(unsigned long " flags ", void *" child_stack ,
.BI "          void *" ptid ", void *" ctid ,
.BI "          struct pt_regs *" regs );
.fi
.sp
.in -4n
Feature Test Macro Requirements for glibc wrapper function (see
.BR feature_test_macros (7)):
.in
.sp
.BR clone ():
.ad l
.RS 4
.PD 0
.TP 4
Since glibc 2.14:
_GNU_SOURCE
.TP 4
.\" See http://sources.redhat.com/bugzilla/show_bug.cgi?id=4749
Before glibc 2.14:
_BSD_SOURCE || _SVID_SOURCE
    /* _GNU_SOURCE also suffices */
.PD
.RE
.ad b
.SH DESCRIPTION
.BR clone ()
creates a new process, in a manner similar to
.BR fork (2).

This page describes both the glibc
.BR clone ()
wrapper function and the underlying system call on which it is based.
The main text describes the wrapper function;
the differences for the raw system call
are described toward the end of this page.

Unlike
.BR fork (2),
.BR clone ()
allows the child process to share parts of its execution context with
the calling process, such as the memory space, the table of file
descriptors, and the table of signal handlers.
(Note that on this manual
page, "calling process" normally corresponds to "parent process".
But see the description of
.B CLONE_PARENT
below.)

The main use of
.BR clone ()
is to implement threads: multiple threads of control in a program that
run concurrently in a shared memory space.

When the child process is created with
.BR clone (),
it executes the function
.IR fn ( arg ).
(This differs from
.BR fork (2),
where execution continues in the child from the point
of the
.BR fork (2)
call.)
The
.I fn
argument is a pointer to a function that is called by the child
process at the beginning of its execution.
The
.I arg
argument is passed to the
.I fn
function.

When the
.IR fn ( arg )
function application returns, the child process terminates.
The integer returned by
.I fn
is the exit code for the child process.
The child process may also terminate explicitly by calling
.BR exit (2)
or after receiving a fatal signal.

The
.I child_stack
argument specifies the location of the stack used by the child process.
Since the child and calling process may share memory,
it is not possible for the child process to execute in the
same stack as the calling process.
The calling process must therefore
set up memory space for the child stack and pass a pointer to this
space to
.BR clone ().
Stacks grow downward on all processors that run Linux
(except the HP PA processors), so
.I child_stack
usually points to the topmost address of the memory space set up for
the child stack.
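
The following is a minimal sketch of the calling sequence described above,
not a definitive implementation.
It assumes a processor on which stacks grow downward.
The names run_child_with_exit_code and child_fn and the STACK_SIZE value
are hypothetical, chosen only for this illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size chosen for this sketch */

/* Child entry point; its return value becomes the child's exit code. */
static int child_fn(void *arg)
{
    return *(int *) arg;
}

/* Run child_fn(&code) in a cloned child.
   Returns the child's exit code, or -1 on error. */
int run_child_with_exit_code(int code)
{
    char *stack;
    int status;
    pid_t pid, waited;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward, so pass the topmost address. */
    pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, &code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    waited = waitpid(pid, &status, 0);
    free(stack);

    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because no sharing flags are given here, the child works on its own copy
of the address space and the parent can free the stack once the child
has been waited for.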

The low byte of
.I flags
contains the number of the
.I "termination signal"
sent to the parent when the child dies.
If this signal is specified as anything other than
.BR SIGCHLD ,
then the parent process must specify the
.B __WALL
or
.B __WCLONE
options when waiting for the child with
.BR wait (2).
If no signal is specified, then the parent process is not signaled
when the child terminates.
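
As an illustration of the termination signal and the matching wait options,
the sketch below clones a child with a caller-chosen signal number in the
low byte of flags and waits for it with caller-chosen options.
The names wait_for_clone_child and exit_with_3 are hypothetical,
and the stack size is an arbitrary assumption.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size chosen for this sketch */

/* Child entry point: terminate with exit code 3. */
static int exit_with_3(void *arg)
{
    (void) arg;
    return 3;
}

/* Clone a child whose termination signal is term_signal (0 means that
   no signal is sent) and wait for it using wait_options.
   Returns the child's exit code, or -1 on error. */
int wait_for_clone_child(int term_signal, int wait_options)
{
    char *stack;
    int status;
    pid_t pid, waited;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Only the low byte of flags carries the termination signal. */
    pid = clone(exit_with_3, stack + STACK_SIZE, term_signal & 0xff, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    waited = waitpid(pid, &status, wait_options);
    free(stack);

    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A call such as wait_for_clone_child(0, __WCLONE) covers the case where no
signal is specified; wait_for_clone_child(SIGCHLD, 0) is the conventional
case that needs no extra wait option.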

.I flags
may also be bitwise-or'ed with zero or more of the following constants,
in order to specify what is shared between the calling process
and the child process:
.TP
.BR CLONE_CHILD_CLEARTID " (since Linux 2.5.49)"
Erase child thread ID at location
.I ctid
in child memory when the child exits, and do a wakeup on the futex
at that address.
The address involved may be changed by the
.BR set_tid_address (2)
system call.
This is used by threading libraries.
.TP
.BR CLONE_CHILD_SETTID " (since Linux 2.5.49)"
Store child thread ID at location
.I ctid
in child memory.
.TP
.BR CLONE_FILES " (since Linux 2.0)"
If
.B CLONE_FILES
is set, the calling process and the child process share the same file
descriptor table.
Any file descriptor created by the calling process or by the child
process is also valid in the other process.
Similarly, if one of the processes closes a file descriptor,
or changes its associated flags (using the
.BR fcntl (2)
.B F_SETFD
operation), the other process is also affected.

If
.B CLONE_FILES
is not set, the child process inherits a copy of all file descriptors
opened in the calling process at the time of
.BR clone ().
(The duplicated file descriptors in the child refer to the
same open file descriptions (see
.BR open (2))
as the corresponding file descriptors in the calling process.)
Subsequent operations that open or close file descriptors,
or change file descriptor flags,
performed by either the calling
process or the child process do not affect the other process.
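
The difference between the two cases can be observed with a small
experiment, sketched here under stated assumptions:
the child closes a descriptor that the parent opened, and the parent then
checks whether its own descriptor is still valid.
The names child_close_affects_parent and close_fd are hypothetical,
and /dev/null is used only as a convenient file to open.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size chosen for this sketch */

/* Child entry point: close the descriptor whose number is passed in arg. */
static int close_fd(void *arg)
{
    return close(*(int *) arg) == -1;
}

/* Clone a child with extra_flags (for example CLONE_FILES or 0).
   Returns 1 if the child's close() also closed the parent's descriptor,
   0 if the parent's descriptor is still open, or -1 on error. */
int child_close_affects_parent(int extra_flags)
{
    char *stack;
    int fd, status, still_open;
    pid_t pid;

    fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return -1;

    stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(fd);
        return -1;
    }

    pid = clone(close_fd, stack + STACK_SIZE, extra_flags | SIGCHLD, &fd);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        close(fd);
        return -1;
    }
    free(stack);

    /* F_GETFD fails with EBADF if fd is no longer open in this process. */
    still_open = fcntl(fd, F_GETFD) != -1;
    if (still_open)
        close(fd);
    return !still_open;
}
```

With CLONE_FILES the function is expected to return 1,
and without it 0, matching the two paragraphs above.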
.TP
.BR CLONE_FS " (since Linux 2.0)"
If
.B CLONE_FS
is set, the caller and the child process share the same filesystem
information.
This includes the root of the filesystem, the current
working directory, and the umask.
Any call to
.BR chroot (2),
.BR chdir (2),
or
.BR umask (2)
performed by the calling process or the child process also affects the
other process.

If
.B CLONE_FS
is not set, the child process works on a copy of the filesystem
information of the calling process at the time of the
.BR clone ()
call.
Calls to
.BR chroot (2),
.BR chdir (2),
or
.BR umask (2)
performed later by one of the processes do not affect the other process.
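
A sketch of how the shared working directory can be observed follows.
The child changes directory to the root, and the parent compares its own
working directory before and after.
The names child_chdir_affects_parent and chdir_to_root are hypothetical,
and the sketch assumes that /tmp exists and is accessible.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size chosen for this sketch */

/* Child entry point: change the working directory to the root. */
static int chdir_to_root(void *arg)
{
    (void) arg;
    return chdir("/") == -1;
}

/* Clone a child with extra_flags (for example CLONE_FS or 0).
   Returns 1 if the child's chdir() changed the parent's working
   directory, 0 if it did not, or -1 on error. */
int child_chdir_affects_parent(int extra_flags)
{
    char start[PATH_MAX], before[PATH_MAX], after[PATH_MAX];
    char *stack;
    int status, result;
    pid_t pid;

    /* Move to a known directory other than "/" (assumed to exist). */
    if (getcwd(start, sizeof(start)) == NULL || chdir("/tmp") == -1)
        return -1;

    result = -1;
    stack = malloc(STACK_SIZE);
    if (stack != NULL && getcwd(before, sizeof(before)) != NULL) {
        pid = clone(chdir_to_root, stack + STACK_SIZE,
                    extra_flags | SIGCHLD, NULL);
        if (pid != -1 && waitpid(pid, &status, 0) != -1 &&
                getcwd(after, sizeof(after)) != NULL)
            result = strcmp(before, after) != 0;
    }
    free(stack);

    /* Restore the caller's original working directory. */
    if (chdir(start) == -1)
        return -1;
    return result;
}
```

With CLONE_FS the function is expected to return 1, and without it 0.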
.TP
.BR CLONE_IO " (since Linux 2.6.25)"
If
.B CLONE_IO
is set, then the new process shares an I/O context with
the calling process.
If this flag is not set, then (as with
.BR fork (2))
the new process has its own I/O context.

.\" The following based on text from Jens Axboe
The I/O context is the I/O scope of the disk scheduler (i.e.,
what the I/O scheduler uses to model scheduling of a process's I/O).
If processes share the same I/O context,
they are treated as one by the I/O scheduler.
As a consequence, they get to share disk time.
For some I/O schedulers,
.\" the anticipatory and CFQ scheduler
if two processes share an I/O context,
they will be allowed to interleave their disk access.
If several threads are doing I/O on behalf of the same process
.RB ( aio_read (3),
for instance), they should employ
.BR CLONE_IO
to get better I/O performance.
.\" with CFQ and AS.

If the kernel is not configured with the
.B CONFIG_BLOCK
option, this flag is a no-op.
.TP
.BR CLONE_NEWIPC " (since Linux 2.6.19)"
If
.B CLONE_NEWIPC
is set, then create the process in a new IPC namespace.
If this flag is not set, then (as with
.BR fork (2)),
the process is created in the same IPC namespace as
the calling process.
This flag is intended for the implementation of containers.

An IPC namespace provides an isolated view of System\ V IPC objects (see
.BR svipc (7))
and (since Linux 2.6.30)
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
POSIX message queues
(see
.BR mq_overview (7)).
The common characteristic of these IPC mechanisms is that IPC
objects are identified by mechanisms other than filesystem
pathnames.

Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.

When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.

Use of this flag requires
that the process be privileged
.RB ( CAP_SYS_ADMIN ).
This flag can't be specified in conjunction with
.BR CLONE_SYSVSEM .

For further information on IPC namespaces, see
.BR namespaces (7).
.TP
.BR CLONE_NEWNET " (since Linux 2.6.24)"
(The implementation of this flag was completed only
by about kernel version 2.6.29.)

If
.B CLONE_NEWNET
is set, then create the process in a new network namespace.
If this flag is not set, then (as with
.BR fork (2))
the process is created in the same network namespace as
the calling process.
This flag is intended for the implementation of containers.

A network namespace provides an isolated view of the networking stack
(network device interfaces, IPv4 and IPv6 protocol stacks,
IP routing tables, firewall rules, the
.I /proc/net
and
.I /sys/class/net
directory trees, sockets, etc.).
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME . Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces,
and can be used to create a bridge to a physical network device
in another namespace.

When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).

Use of this flag requires a kernel configured with the
.B CONFIG_NET_NS
option and that the process be privileged
.RB ( CAP_SYS_ADMIN ).
.TP
.BR CLONE_NEWNS " (since Linux 2.4.19)"
Start the child in a new mount namespace.

Every process lives in a mount namespace.
The
.I namespace
of a process is the data (the set of mounts) describing the file hierarchy
as seen by that process.
After a
.BR fork (2)
or
.BR clone ()
where the
.B CLONE_NEWNS
flag is not set, the child lives in the same mount
namespace as the parent.
The system calls
.BR mount (2)
and
.BR umount (2)
change the mount namespace of the calling process, and hence affect
all processes that live in the same namespace, but do not affect
processes in a different mount namespace.

After a
.BR clone ()
where the
.B CLONE_NEWNS
flag is set, the cloned child is started in a new mount namespace,
initialized with a copy of the namespace of the parent.

Only a privileged process (one having the \fBCAP_SYS_ADMIN\fP capability)
may specify the
.B CLONE_NEWNS
flag.
It is not permitted to specify both
.B CLONE_NEWNS
and
.B CLONE_FS
in the same
.BR clone ()
call.
.TP
.BR CLONE_NEWUSER
(This flag first became meaningful for
.BR clone ()
in Linux 2.6.23,
the current
.BR clone ()
semantics were merged in Linux 3.5,
and the final pieces to make the user namespaces completely usable were
merged in Linux 3.8.)

If
.B CLONE_NEWUSER
is set, then create the process in a new user namespace.
If this flag is not set, then (as with
.BR fork (2))
the process is created in the same user namespace as the calling process.

A user namespace provides an isolated environment for
security-related identifiers, in particular,
user IDs, group IDs, keys (see
.BR keyctl (2)),
and capabilities.

When a user namespace is created,
it starts out without a mapping of user IDs (group IDs)
to the parent user namespace.
The desired mapping of user IDs (group IDs) to the parent user namespace
may be set by writing into
.IR /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map );
see
.BR proc (5).

The first process in a user namespace starts out with a complete set
of capabilities with respect to the new user namespace.

System calls that return user IDs (group IDs) will return
either the user ID (group ID) mapped into the current
user namespace if there is a mapping, or the overflow user ID (group ID);
the default value for the overflow user ID (group ID) is 65534.
See the descriptions of
.IR /proc/sys/kernel/overflowuid
and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).
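
The overflow user ID behavior can be sketched as follows,
assuming a kernel on which an unprivileged caller may create a user
namespace.
The child is created with CLONE_NEWUSER and, since no mapping has been
written yet, is expected to see the overflow user ID.
The names child_sees_overflow_uid and check_uid are hypothetical;
the function returns \-1 where the namespace cannot be created.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size chosen for this sketch */

/* Child entry point: exit code 0 if getuid() equals the expected value. */
static int check_uid(void *arg)
{
    return getuid() != *(const uid_t *) arg;
}

/* Returns 1 if a child in a new, unmapped user namespace sees the
   overflow user ID, 0 if it sees some other ID, or -1 if the check
   could not be performed (for example, user namespaces unavailable). */
int child_sees_overflow_uid(void)
{
    FILE *fp;
    unsigned int overflow;
    uid_t expected;
    char *stack;
    int status;
    pid_t pid, waited;

    /* Read the configured overflow user ID (65534 by default). */
    fp = fopen("/proc/sys/kernel/overflowuid", "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%u", &overflow) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    expected = (uid_t) overflow;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid = clone(check_uid, stack + STACK_SIZE,
                CLONE_NEWUSER | SIGCHLD, &expected);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    waited = waitpid(pid, &status, 0);
    free(stack);

    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Writing a mapping to uid_map before the child calls getuid()
would change the result; this sketch deliberately writes none.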

Use of this flag requires a kernel configured with the
.BR CONFIG_USER_NS
option.
Before Linux 3.8, use of
.BR CLONE_NEWUSER
required that the caller have three capabilities:
.BR CAP_SYS_ADMIN ,
.BR CAP_SETUID ,
and
.BR CAP_SETGID .
.\" Before Linux 2.6.29, it appears that only CAP_SYS_ADMIN was needed
Starting with Linux 3.8,
no privileges are needed to create a user namespace,
and mount, PID, IPC, network, and UTS namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.

If
.BR CLONE_NEWUSER
is specified along with other
.B CLONE_NEW*
flags in a single
.BR clone ()
call, the user namespace is guaranteed to be created first,
giving the caller privileges over the remaining
namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.

Over the years, many features have been added
to the Linux kernel that are available only to privileged users
because of their potential to confuse set-user-ID-root applications.
In general, it becomes safe to allow the root user in a user namespace to
use those features because it is impossible, while in a user namespace,
to gain more privilege than the root user of a user namespace has.
.TP
.BR CLONE_NEWPID " (since Linux 2.6.24)"
.\" This explanation draws a lot of details from
.\" http://lwn.net/Articles/259217/
.\" Authors: Pavel Emelyanov <xemul@openvz.org>
.\" and Kir Kolyshkin <kir@openvz.org>
.\"
.\" The primary kernel commit is 30e49c263e36341b60b735cbef5ca37912549264
.\" Author: Pavel Emelyanov <xemul@openvz.org>
If
.B CLONE_NEWPID
is set, then create the process in a new PID namespace.
If this flag is not set, then (as with
.BR fork (2))
the process is created in the same PID namespace as
the calling process.
This flag is intended for the implementation of containers.

A PID namespace provides an isolated environment for PIDs:
PIDs in a new namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone ()
will produce processes with PIDs that are unique within the namespace.

The first process created in a new namespace
(i.e., the process created using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace.
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (8).
Unlike the traditional
.B init
process, the "init" process of a PID namespace can terminate,
and if it does, all of the processes in the namespace are terminated.
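
The PID seen by the first process of a new PID namespace can be checked
with a sketch like the one below.
It combines CLONE_NEWPID with CLONE_NEWUSER so that, on kernels where
unprivileged user namespaces are available, no privilege is needed;
that combination is an assumption of this example, not a requirement
of the flag.
The names pid_seen_in_new_pid_namespace and report_pid are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size chosen for this sketch */

/* Child entry point: report its own PID through the exit code. */
static int report_pid(void *arg)
{
    (void) arg;
    /* Use the raw system call to sidestep any PID caching in a wrapper. */
    return (int) syscall(SYS_getpid);
}

/* Returns the PID that the child observes for itself inside the new
   PID namespace (truncated to an exit code), or -1 if the namespaces
   could not be created or the child could not be waited for. */
int pid_seen_in_new_pid_namespace(void)
{
    char *stack;
    int status;
    pid_t pid, waited;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid = clone(report_pid, stack + STACK_SIZE,
                CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    waited = waitpid(pid, &status, 0);
    free(stack);

    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The value returned by clone() in the parent is the child's PID in the
parent's namespace, which generally differs from the value the child
reports for itself.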

PID namespaces form a hierarchy.