.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH PID_NAMESPACES 7 2016-07-17 "Linux" "Linux Programmer's Manual"
.SH NAME
pid_namespaces \- overview of Linux PID namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).

PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to provide functionality
such as suspending/resuming the set of processes in the container and
migrating the container to a new host
while the processes inside the container maintain the same PIDs.

PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
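
The PID numbering described above can be observed with a small C sketch.
The function name first_pid_in_new_pidns() is hypothetical and used only for illustration.
The sketch assumes that the caller has the privilege needed to create a PID namespace (typically CAP_SYS_ADMIN); without it, the function reports failure rather than a PID.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library).
 * Returns 1 if the first process created in a new PID namespace saw
 * getpid() == 1, 0 if it saw some other PID, and -1 if the namespace
 * could not be set up (for example, EPERM without CAP_SYS_ADMIN).
 */
int first_pid_in_new_pidns(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;

    if (helper == 0) {
        /* Work in a helper so that the caller's own state is untouched. */
        if (unshare(CLONE_NEWPID) == -1)
            _exit(2);

        pid_t child = fork();   /* First process in the new namespace */
        if (child == -1)
            _exit(2);
        if (child == 0)
            _exit(getpid() == 1 ? 0 : 1);

        if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
            _exit(2);
        _exit(WEXITSTATUS(status));
    }

    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

A caller would treat a return value of \-1 as "could not test" rather than as a contradiction of the text above.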

Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.\"
.\" ============================================================
.\"
.SS The namespace "init" process
The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.BR CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
A child process that is orphaned within the namespace will be reparented
to this process rather than
.BR init (1)
(unless one of the ancestors of the child
in the same PID namespace employed the
.BR prctl (2)
.B PR_SET_CHILD_SUBREAPER
command to mark itself as the reaper of orphaned descendant processes).

If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.BR SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace will fail with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
Such scenarios can occur when, for example,
a process uses an open file descriptor for a
.I /proc/[pid]/ns/pid
file corresponding to a process that was in a namespace to
.BR setns (2)
into that namespace after the "init" process has terminated.
Another possible scenario can occur after a call to
.BR unshare (2):
if the first child subsequently created by a
.BR fork (2)
terminates, then subsequent calls to
.BR fork (2)
will fail with
.BR ENOMEM .
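
The unshare(2) scenario just described can be sketched in C as follows.
The function name fork_after_init_exit() is hypothetical, and the sketch assumes sufficient privilege to create a PID namespace; it reports \-1 when the namespace cannot be created.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library).
 * Returns 1 if fork() failed with ENOMEM after the namespace "init"
 * process had terminated, 0 if it did not, and -1 if the test could
 * not be set up (for example, EPERM from unshare()).
 */
int fork_after_init_exit(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;

    if (helper == 0) {
        if (unshare(CLONE_NEWPID) == -1)
            _exit(2);

        pid_t init = fork();    /* Becomes PID 1 of the new namespace */
        if (init == -1)
            _exit(2);
        if (init == 0)
            _exit(0);           /* "init" terminates immediately */
        if (waitpid(init, NULL, 0) == -1)
            _exit(2);

        pid_t second = fork();  /* Expected to fail with ENOMEM */
        if (second == 0)
            _exit(0);           /* Unexpected success: child just exits */
        if (second == -1 && errno == ENOMEM)
            _exit(0);
        if (second > 0)
            waitpid(second, NULL, 0);
        _exit(1);
    }

    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

The helper waits for the first child before calling fork() again, so that the "init" process has certainly terminated by the time of the second call.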

Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.

Likewise, a process in an ancestor namespace
can\(emsubject to the usual permission checks described in
.BR kill (2)\(emsend
signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
or
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).

Starting with Linux 3.4, the
.BR reboot (2)
system call causes a signal to be sent to the namespace "init" process.
See
.BR reboot (2)
for more details.
.\"
.\" ============================================================
.\"
.SS Nesting PID namespaces
PID namespaces can be nested:
each PID namespace has a parent,
except for the initial ("root") PID namespace.
The parent of a PID namespace is the PID namespace of the process that
created the namespace using
.BR clone (2)
or
.BR unshare (2).
PID namespaces thus form a tree,
with all namespaces ultimately tracing their ancestry to the root namespace.

A process is visible to other processes in its PID namespace,
and to the processes in each direct ancestor PID namespace
going back to the root PID namespace.
In this context, "visible" means that one process
can be the target of operations by another process using
system calls that specify a process ID.
Conversely, the processes in a child PID namespace can't see
processes in the parent and further removed ancestor namespaces.
More succinctly: a process can see (e.g., send signals with
.BR kill (2),
set nice values with
.BR setpriority (2),
etc.) only processes contained in its own PID namespace
and in descendants of that namespace.

A process has one process ID in each of the layers of the PID
namespace hierarchy in which it is visible,
walking back through each direct ancestor namespace
to the root PID namespace.
System calls that operate on process IDs always
operate using the process ID that is visible in the
PID namespace of the caller.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process was created.

Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e., the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
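
As an illustration of the getppid(2) behavior, the following C sketch checks what the "init" process of a new PID namespace sees as its parent.
The function name init_sees_ppid_zero() is hypothetical, and the sketch assumes sufficient privilege to call unshare(2) with CLONE_NEWPID.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library).
 * Returns 1 if the "init" process of a new PID namespace saw
 * getppid() == 0, 0 if it saw another value, and -1 if the namespace
 * could not be set up.
 */
int init_sees_ppid_zero(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;

    if (helper == 0) {
        if (unshare(CLONE_NEWPID) == -1)
            _exit(2);

        /* The child's parent (this helper) stays in the outer namespace. */
        pid_t init = fork();
        if (init == -1)
            _exit(2);
        if (init == 0)
            _exit(getppid() == 0 ? 0 : 1);

        if (waitpid(init, &status, 0) == -1 || !WIFEXITED(status))
            _exit(2);
        _exit(WEXITSTATUS(status));
    }

    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

The helper remains alive while the child runs, so a result of 0 from getppid(2) here reflects the namespace boundary and not reparenting.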

While processes may freely descend into child PID namespaces
(e.g., using
.BR setns (2)
with
.BR CLONE_NEWPID ),
they may not move in the other direction.
That is to say, processes may not enter any ancestor namespaces
(parent, grandparent, etc.).
Changing PID namespaces is a one-way operation.
.\"
.\" ============================================================
.\"
.SS setns(2) and unshare(2) semantics
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.BR CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid (2)),
which would break many applications and libraries.

To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
Among other things, this means that the parental relationship
between processes mirrors the parental relationship between PID namespaces:
the parent of a process is either in the same namespace
or resides in the immediate parent PID namespace.
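
That the caller keeps its own PID across unshare(2) can be checked with a short C sketch.
The function name pid_unchanged_by_unshare() is hypothetical, and the sketch assumes sufficient privilege to create a PID namespace.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library).
 * Returns 1 if getpid() gave the same value before and after
 * unshare(CLONE_NEWPID), 0 if the value changed, and -1 if
 * unshare() failed (for example, with EPERM).
 */
int pid_unchanged_by_unshare(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;

    if (helper == 0) {
        pid_t before = getpid();

        /* Affects only children created later, not this process. */
        if (unshare(CLONE_NEWPID) == -1)
            _exit(2);
        _exit(getpid() == before ? 0 : 1);
    }

    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

The call is made in a forked helper so that later children of the calling program are not placed in the new namespace.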
.SS Compatibility of CLONE_NEWPID with other CLONE_* flags
.BR CLONE_NEWPID
can't be combined with some other
.BR CLONE_*
flags:
.IP * 3
.B CLONE_THREAD
requires being in the same PID namespace in order that
the threads in a process can send signals to each other.
Similarly, it must be possible to see all of the threads
of a process in the
.BR proc (5)
filesystem.
.IP *
.BR CLONE_SIGHAND
requires being in the same PID namespace;
otherwise the process ID of the process sending a signal
could not be meaningfully encoded when a signal is sent
(see the description of the
.I siginfo_t
type in
.BR sigaction (2)).
A signal queue shared by processes in multiple PID namespaces
will defeat that.
.IP *
.BR CLONE_VM
requires all of the threads to be in the same PID namespace,
because, from the point of view of a core dump,
if two processes share the same address space then they are threads and will
be core dumped together.
When a core dump is written, the PID of each
thread is written into the core dump.
Writing the process IDs could not meaningfully succeed
if some of the process IDs were in a parent PID namespace.
.PP
To summarize: there is a technical requirement for each of
.BR CLONE_THREAD ,
.BR CLONE_SIGHAND ,
and
.BR CLONE_VM
to share a PID namespace.
(Note furthermore that
.BR clone (2)
requires
.BR CLONE_VM
to be specified if
.BR CLONE_THREAD
or
.BR CLONE_SIGHAND
is specified.)
Thus, call sequences such as the following will fail (with the error
.BR EINVAL ):

.nf
    unshare(CLONE_NEWPID);
    clone(..., CLONE_VM, ...);    /* Fails */

    setns(fd, CLONE_NEWPID);
    clone(..., CLONE_VM, ...);    /* Fails */

    clone(..., CLONE_VM, ...);
    setns(fd, CLONE_NEWPID);      /* Fails */

    clone(..., CLONE_VM, ...);
    unshare(CLONE_NEWPID);        /* Fails */
.fi
.\"
.\" ============================================================
.\"
.SS /proc and PID namespaces
A
.I /proc
filesystem shows (in the
.I /proc/[pid]
directories) only processes visible in the PID namespace
of the process that performed the mount, even if the
.I /proc
filesystem is viewed from processes in other namespaces.

After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
If a new mount namespace is simultaneously created by including
.BR CLONE_NEWNS
in the
.IR flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .

From a shell, the command to mount
.I /proc
is:

    $ mount -t proc proc /proc

Calling
.BR readlink (2)
on the path
.I /proc/self
yields the process ID of the caller in the PID namespace of the procfs mount
(i.e., the PID namespace of the process that mounted the procfs).
This can be useful for introspection purposes,
when a process wants to discover its PID in other namespaces.
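
The readlink(2) technique can be sketched in C as follows.
The function name proc_self_pid() is hypothetical; the sketch assumes that a procfs instance is mounted at /proc and returns \-1 otherwise.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library).
 * Returns the caller's PID as seen in the PID namespace of the procfs
 * instance mounted at /proc, or -1 if it cannot be determined.
 */
long proc_self_pid(void)
{
    char buf[32];
    char *end;
    long pid;
    ssize_t n;

    /* /proc/self is a symbolic link whose target is a decimal PID. */
    n = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';      /* readlink() does not null-terminate */

    errno = 0;
    pid = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0' || pid <= 0)
        return -1;

    return pid;
}
```

When the procfs instance was mounted from the caller's own PID namespace, the result is expected to match the value returned by getpid(2); a different value indicates that the mount belongs to another PID namespace.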
.\"
.\" ============================================================
.\"
.SS Miscellaneous
When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
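
The following C sketch shows how a receiver can read the sender's PID from an SCM_CREDENTIALS message.
The function name sender_pid_via_scm_credentials() is hypothetical.
For simplicity, the sketch sends and receives within a single process, so no translation across namespaces takes place and the reported PID is the caller's own; a receiver in a different PID namespace would use the same receiving code.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library).
 * Sends one byte over a UNIX domain socket pair and returns the
 * sender PID found in the SCM_CREDENTIALS ancillary data,
 * or -1 on error.
 */
long sender_pid_via_scm_credentials(void)
{
    int sv[2];
    int on = 1;
    char byte = 'x';
    long result = -1;
    struct ucred cred;
    struct cmsghdr *cmsg;
    struct iovec iov;
    struct msghdr msg;
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;   /* Ensures suitable alignment */
    } control;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    /* Ask the kernel to attach sender credentials on the receiver. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        write(sv[0], &byte, 1) == 1) {

        memset(&msg, 0, sizeof(msg));
        iov.iov_base = &byte;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);

        if (recvmsg(sv[1], &msg, 0) != -1) {
            for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
                 cmsg = CMSG_NXTHDR(&msg, cmsg)) {
                if (cmsg->cmsg_level == SOL_SOCKET &&
                    cmsg->cmsg_type == SCM_CREDENTIALS) {
                    memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
                    result = cred.pid;
                }
            }
        }
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

The PID in the received credentials is expressed in the receiver's PID namespace, which is why the receiver can pass it directly to system calls such as kill(2).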
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH EXAMPLE
See
.BR user_namespaces (7).
.SH SEE ALSO
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR credentials (7),
.BR namespaces (7),
.BR user_namespaces (7),
.BR switch_root (8)