.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright (c) 1992 Drew Eckhardt <drew@cs.colorado.edu>, March 28, 1992
.\" and Copyright (c) Michael Kerrisk, 2001, 2002, 2005, 2013
.\" May be distributed under the GNU General Public License.
.\" Modified by Michael Haardt <michael@moria.de>
.\" Modified 24 Jul 1993 by Rik Faith <faith@cs.unc.edu>
.\" Modified 21 Aug 1994 by Michael Chastain <mec@shell.portal.com>:
.\"     New man page (copied from 'fork.2').
.\" Modified 10 June 1995 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 25 April 1998 by Xavier Leroy <Xavier.Leroy@inria.fr>
.\" Modified 26 Jun 2001 by Michael Kerrisk
.\"     Mostly upgraded to 2.4.x
.\"     Added prototype for sys_clone() plus description
.\"     Added CLONE_THREAD with a brief description of thread groups
.\"     Added CLONE_PARENT and revised entire page remove ambiguity
.\"     between "calling process" and "parent process"
.\"     Added CLONE_PTRACE and CLONE_VFORK
.\"     Added EPERM and EINVAL error codes
.\"     Renamed "__clone" to "clone" (which is the prototype in <sched.h>)
.\"     various other minor tidy ups and clarifications.
.\" Modified 26 Jun 2001 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"     Updated notes for 2.4.7+ behavior of CLONE_THREAD
.\" Modified 15 Oct 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"     Added description for CLONE_NEWNS, which was added in 2.4.19
.\"     Slightly rephrased, aeb.
.\" Modified 1 Feb 2003 - added CLONE_SIGHAND restriction, aeb.
.\" Modified 1 Jan 2004 - various updates, aeb
.\" Modified 2004-09-10 - added CLONE_PARENT_SETTID etc. - aeb.
.\" 2005-04-12, mtk, noted the PID caching behavior of NPTL's getpid()
.\"     wrapper under BUGS.
.\" 2005-05-10, mtk, added CLONE_SYSVSEM, CLONE_UNTRACED, CLONE_STOPPED.
.\" 2005-05-17, mtk, Substantially enhanced discussion of CLONE_THREAD.
.\" 2008-11-18, mtk, order CLONE_* flags alphabetically
.\" 2008-11-18, mtk, document CLONE_NEWPID
.\" 2008-11-19, mtk, document CLONE_NEWUTS
.\" 2008-11-19, mtk, document CLONE_NEWIPC
.\" 2008-11-19, Jens Axboe, mtk, document CLONE_IO
.\"
.\" FIXME Document CLONE_NEWUSER, which is new in 2.6.23
.\"     (also supported for unshare()?)
.\"
.TH CLONE 2 2013-01-01 "Linux" "Linux Programmer's Manual"
.SH NAME
clone, __clone2 \- create a child process
.SH SYNOPSIS
.nf
.BR "#define _GNU_SOURCE" "    /* See feature_test_macros(7) */"
.\" Actually _BSD_SOURCE || _SVID_SOURCE
.\" FIXME See http://sources.redhat.com/bugzilla/show_bug.cgi?id=4749
.B #include <sched.h>

.BI "int clone(int (*" "fn" ")(void *), void *" child_stack ,
.BI "          int " flags ", void *" "arg" ", ... "
.BI "          /* pid_t *" ptid ", struct user_desc *" tls \
", pid_t *" ctid " */ );"
.fi
.SH DESCRIPTION
.BR clone ()
creates a new process, in a manner similar to
.BR fork (2).
It is actually a library function layered on top of the underlying
.BR clone ()
system call, hereinafter referred to as
.BR sys_clone .
A description of
.B sys_clone
is given toward the end of this page.

Unlike
.BR fork (2),
these calls
allow the child process to share parts of its execution context with
the calling process, such as the memory space, the table of file
descriptors, and the table of signal handlers.
(Note that on this manual
page, "calling process" normally corresponds to "parent process".
But see the description of
.B CLONE_PARENT
below.)

The main use of
.BR clone ()
is to implement threads: multiple threads of control in a program that
run concurrently in a shared memory space.

When the child process is created with
.BR clone (),
it executes the function
.IR fn ( arg ).
(This differs from
.BR fork (2),
where execution continues in the child from the point
of the
.BR fork (2)
call.)
The
.I fn
argument is a pointer to a function that is called by the child
process at the beginning of its execution.
The
.I arg
argument is passed to the
.I fn
function.

When the
.IR fn ( arg )
function application returns, the child process terminates.
The integer returned by
.I fn
is the exit code for the child process.
The child process may also terminate explicitly by calling
.BR exit (2)
or after receiving a fatal signal.

The
.I child_stack
argument specifies the location of the stack used by the child process.
Since the child and calling process may share memory,
it is not possible for the child process to execute in the
same stack as the calling process.
The calling process must therefore
set up memory space for the child stack and pass a pointer to this
space to
.BR clone ().
Stacks grow downward on all processors that run Linux
(except the HP PA processors), so
.I child_stack
usually points to the topmost address of the memory space set up for
the child stack.
.PP
The low byte of
.I flags
contains the number of the
.I "termination signal"
sent to the parent when the child dies.
If this signal is specified as anything other than
.BR SIGCHLD ,
then the parent process must specify the
.B __WALL
or
.B __WCLONE
options when waiting for the child with
.BR wait (2).
If no signal is specified, then the parent process is not signaled
when the child terminates.
.PP
.I flags
may also be bitwise-or'ed with zero or more of the following constants,
in order to specify what is shared between the calling process
and the child process:
.TP
.BR CLONE_CHILD_CLEARTID " (since Linux 2.5.49)"
Erase child thread ID at location
.I ctid
in child memory when the child exits, and do a wakeup on the futex
at that address.
The address involved may be changed by the
.BR set_tid_address (2)
system call.
This is used by threading libraries.
.TP
.BR CLONE_CHILD_SETTID " (since Linux 2.5.49)"
Store child thread ID at location
.I ctid
in child memory.
.TP
.BR CLONE_FILES " (since Linux 2.0)"
If
.B CLONE_FILES
is set, the calling process and the child process share the same file
descriptor table.
Any file descriptor created by the calling process or by the child
process is also valid in the other process.
Similarly, if one of the processes closes a file descriptor,
or changes its associated flags (using the
.BR fcntl (2)
.B F_SETFD
operation), the other process is also affected.

If
.B CLONE_FILES
is not set, the child process inherits a copy of all file descriptors
opened in the calling process at the time of
.BR clone ().
(The duplicated file descriptors in the child refer to the
same open file descriptions (see
.BR open (2))
as the corresponding file descriptors in the calling process.)
Subsequent operations that open or close file descriptors,
or change file descriptor flags,
performed by either the calling
process or the child process do not affect the other process.
.TP
.BR CLONE_FS " (since Linux 2.0)"
If
.B CLONE_FS
is set, the caller and the child process share the same file system
information.
This includes the root of the file system, the current
working directory, and the umask.
Any call to
.BR chroot (2),
.BR chdir (2),
or
.BR umask (2)
performed by the calling process or the child process also affects the
other process.

If
.B CLONE_FS
is not set, the child process works on a copy of the file system
information of the calling process at the time of the
.BR clone ()
call.
Calls to
.BR chroot (2),
.BR chdir (2),
or
.BR umask (2)
performed later by one of the processes do not affect the other process.
.TP
.BR CLONE_IO " (since Linux 2.6.25)"
If
.B CLONE_IO
is set, then the new process shares an I/O context with
the calling process.
If this flag is not set, then (as with
.BR fork (2))
the new process has its own I/O context.

.\" The following based on text from Jens Axboe
The I/O context is the I/O scope of the disk scheduler (i.e.,
what the I/O scheduler uses to model scheduling of a process's I/O).
If processes share the same I/O context,
they are treated as one by the I/O scheduler.
As a consequence, they get to share disk time.
For some I/O schedulers,
.\" the anticipatory and CFQ scheduler
if two processes share an I/O context,
they will be allowed to interleave their disk access.
If several threads are doing I/O on behalf of the same process
.RB ( aio_read (3),
for instance), they should employ
.B CLONE_IO
to get better I/O performance.
.\" with CFQ and AS.

If the kernel is not configured with the
.B CONFIG_BLOCK
option, this flag is a no-op.
.TP
.BR CLONE_NEWIPC " (since Linux 2.6.19)"
If
.B CLONE_NEWIPC
is set, then create the process in a new IPC namespace.
If this flag is not set, then (as with
.BR fork (2)),
the process is created in the same IPC namespace as
the calling process.
This flag is intended for the implementation of containers.

An IPC namespace consists of the set of identifiers for
System V IPC objects.
(These objects are created using
.BR msgget (2),
.BR semget (2),
and
.BR shmget (2).)
Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.

Since Linux 2.6.30,
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
.B CLONE_NEWIPC
also supports POSIX message queues, meaning that
.B CLONE_NEWIPC
causes a new instance of a POSIX message queue file system (see
.BR mq_overview (7))
to be created.

When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.

Use of this flag requires: a kernel configured with the
.B CONFIG_SYSVIPC
and
.B CONFIG_IPC_NS
options and that the process be privileged
.RB ( CAP_SYS_ADMIN ).
This flag can't be specified in conjunction with
.BR CLONE_SYSVSEM .
.TP
.BR CLONE_NEWNET " (since Linux 2.6.24)"
.\" FIXME Check when the implementation was completed
(The implementation of this flag was only completed
by about kernel version 2.6.29.)

If
.B CLONE_NEWNET
is set, then create the process in a new network namespace.
If this flag is not set, then (as with
.BR fork (2)),
the process is created in the same network namespace as
the calling process.
This flag is intended for the implementation of containers.

A network namespace provides an isolated view of the networking stack
(network device interfaces, IPv4 and IPv6 protocol stacks,
IP routing tables, firewall rules, the
.I /proc/net
and
.I /sys/class/net
directory trees, sockets, etc.).
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces,
and can be used to create a bridge to a physical network device
in another namespace.

When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).

Use of this flag requires: a kernel configured with the
.B CONFIG_NET_NS
option and that the process be privileged
.RB ( CAP_SYS_ADMIN ).
.TP
.BR CLONE_NEWNS " (since Linux 2.4.19)"
Start the child in a new mount namespace.

Every process lives in a mount namespace.
The
.I namespace
of a process is the data (the set of mounts) describing the file hierarchy
as seen by that process.
After a
.BR fork (2)
or
.BR clone ()
where the
.B CLONE_NEWNS
flag is not set, the child lives in the same mount
namespace as the parent.
The system calls
.BR mount (2)
and
.BR umount (2)
change the mount namespace of the calling process, and hence affect
all processes that live in the same namespace, but do not affect
processes in a different mount namespace.

After a
.BR clone ()
where the
.B CLONE_NEWNS
flag is set, the cloned child is started in a new mount namespace,
initialized with a copy of the namespace of the parent.

Only a privileged process (one having the \fBCAP_SYS_ADMIN\fP capability)
may specify the
.B CLONE_NEWNS
flag.
It is not permitted to specify both
.B CLONE_NEWNS
and
.B CLONE_FS
in the same
.BR clone ()
call.
.TP
.BR CLONE_NEWPID " (since Linux 2.6.24)"
.\" This explanation draws a lot of details from
.\" http://lwn.net/Articles/259217/
.\" Authors: Pavel Emelyanov <xemul@openvz.org>
.\" and Kir Kolyshkin <kir@openvz.org>
.\"
.\" The primary kernel commit is 30e49c263e36341b60b735cbef5ca37912549264
.\" Author: Pavel Emelyanov <xemul@openvz.org>
If
.B CLONE_NEWPID
is set, then create the process in a new PID namespace.
If this flag is not set, then (as with
.BR fork (2)),
the process is created in the same PID namespace as
the calling process.
This flag is intended for the implementation of containers.

A PID namespace provides an isolated environment for PIDs:
PIDs in a new namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone ()
will produce processes with PIDs that are unique within the namespace.

The first process created in a new namespace
(i.e., the process created using the
.B CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace.
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (8).
Unlike the traditional
.B init
process, the "init" process of a PID namespace can terminate,
and if it does, all of the processes in the namespace are terminated.

PID namespaces form a hierarchy.