.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH PID_NAMESPACES 7 2019-03-06 "Linux" "Linux Programmer's Manual"
.SH NAME
pid_namespaces \- overview of Linux PID namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.PP
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to provide functionality
such as suspending/resuming the set of processes in the container and
migrating the container to a new host
while the processes inside the container maintain the same PIDs.
.PP
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
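.PP
As a rough illustration of this numbering,
the following Python sketch creates a new PID namespace in a helper process
and reports the PID seen by the first child created afterwards.
It is not part of the original page and rests on several assumptions:
that unprivileged user namespaces are enabled (so that CLONE_NEWUSER can be
combined with CLONE_NEWPID), that the flag values match <sched.h>,
and that the hypothetical function name first_child_pid() suits the caller.
Where the kernel or a sandbox refuses the call, the function returns None.

```python
import ctypes
import os

# Flag values from <sched.h>; hard-coded because older Python versions
# do not export them.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000


def first_child_pid():
    """Return getpid() as seen by the first child of a new PID namespace,
    or None if the namespace could not be created."""
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process: unshare here so the caller itself is unaffected.
        os.close(read_fd)
        try:
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0:
                child = os.fork()
                if child == 0:
                    # First child: the "init" process of the new namespace.
                    os.write(write_fd, str(os.getpid()).encode())
                    os._exit(0)
                os.waitpid(child, 0)
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()
    os.waitpid(helper, 0)
    return int(reported) if reported else None
```

On a kernel that permits the call, first_child_pid() is expected to return 1.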
.PP
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.\"
.\" ============================================================
.\"
.SS The namespace "init" process
The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.BR CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
This process becomes the parent of any child processes that are orphaned
because a process that resides in this PID namespace terminated
(see below for further details).
.PP
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.BR SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace fails with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
Such scenarios can occur when, for example,
a process uses an open file descriptor for a
.I /proc/[pid]/ns/pid
file corresponding to a process that was in a namespace to
.BR setns (2)
into that namespace after the "init" process has terminated.
Another possible scenario can occur after a call to
.BR unshare (2):
if the first child subsequently created by a
.BR fork (2)
terminates, then subsequent calls to
.BR fork (2)
fail with
.BR ENOMEM .
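.PP
The second scenario can be reproduced with a short Python sketch.
It is an illustration only, under the assumptions that unprivileged
user namespaces are enabled and that the hard-coded flag values match
<sched.h>; the function name fork_result_after_init_exit() is hypothetical.
A helper process calls unshare(), lets its first child (the namespace "init")
exit, and then attempts a second fork().

```python
import ctypes
import errno
import os

CLONE_NEWUSER = 0x10000000  # value from <sched.h>
CLONE_NEWPID = 0x20000000   # value from <sched.h>


def fork_result_after_init_exit():
    """Return the errno name of a fork() attempted after the namespace
    "init" has terminated, "success" if that fork() unexpectedly worked,
    or None if the PID namespace could not be created."""
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process: its children go into the new PID namespace.
        os.close(read_fd)
        try:
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0:
                init = os.fork()
                if init == 0:
                    os._exit(0)  # the "init" process terminates at once
                os.waitpid(init, 0)
                try:
                    late = os.fork()
                except OSError as exc:
                    os.write(write_fd, str(exc.errno).encode())
                else:
                    if late == 0:
                        os._exit(0)
                    os.waitpid(late, 0)
                    os.write(write_fd, b"0")
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()
    os.waitpid(helper, 0)
    if not reported:
        return None
    code = int(reported)
    return "success" if code == 0 else errno.errorcode.get(code, str(code))
```

Where the namespace can be created, the expected result is the string "ENOMEM".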
.PP
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.
.PP
Likewise, a process in an ancestor namespace can,
subject to the usual permission checks described in
.BR kill (2),
send signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
and
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
.PP
Starting with Linux 3.4, the
.BR reboot (2)
system call causes a signal to be sent to the namespace "init" process.
See
.BR reboot (2)
for more details.
.\"
.\" ============================================================
.\"
.SS Nesting PID namespaces
PID namespaces can be nested:
each PID namespace has a parent,
except for the initial ("root") PID namespace.
The parent of a PID namespace is the PID namespace of the process that
created the namespace using
.BR clone (2)
or
.BR unshare (2).
PID namespaces thus form a tree,
with all namespaces ultimately tracing their ancestry to the root namespace.
Since Linux 3.7,
.\" commit f2302505775fd13ba93f034206f1e2a587017929
.\" The kernel constant MAX_PID_NS_LEVEL
the kernel limits the maximum nesting depth for PID namespaces to 32.
.PP
A process is visible to other processes in its PID namespace,
and to the processes in each direct ancestor PID namespace
going back to the root PID namespace.
In this context, "visible" means that one process
can be the target of operations by another process using
system calls that specify a process ID.
Conversely, the processes in a child PID namespace can't see
processes in the parent and further removed ancestor namespaces.
More succinctly: a process can see (e.g., send signals with
.BR kill (2),
set nice values with
.BR setpriority (2),
etc.) only processes contained in its own PID namespace
and in descendants of that namespace.
.PP
A process has one process ID in each of the layers of the PID
namespace hierarchy in which it is visible:
one in its own PID namespace and one in each direct ancestor namespace,
walking back through to the root PID namespace.
System calls that operate on process IDs always
operate using the process ID that is visible in the
PID namespace of the caller.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process was created.
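.PP
One way to observe these per-layer PIDs is the NSpid field of
.IR /proc/[pid]/status
(available since Linux 4.1 and described in
.BR proc (5)),
which lists a process's PID in each nested namespace, outermost first.
The field is not mentioned elsewhere on this page,
and the function name parse_nspid() below is hypothetical;
the sketch only parses text that the caller has already read.

```python
def parse_nspid(status_text):
    """Extract the list of PIDs from the NSpid line of the contents of a
    /proc/[pid]/status file; return an empty list if the line is absent."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # The first field is the label; the rest are PIDs, ordered
            # from the outermost visible namespace to the innermost.
            return [int(field) for field in line.split()[1:]]
    return []
```

For a process that is PID 4321 in the namespace of the procfs mount and
PID 1 in a child namespace, the line reads "NSpid: 4321 1" (tab-separated)
and the function returns [4321, 1].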
.PP
Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e., the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
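.PP
The zero parent PID can be checked with the following Python sketch,
offered as an illustration rather than a reference implementation.
It assumes that unprivileged user namespaces are enabled and that the
hard-coded flag values match <sched.h>;
the function name parent_pid_seen_by_init() is hypothetical.
The first child created after unshare() has a parent that lives in the
outer PID namespace, so the child reports what getppid() returns to it.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <sched.h>
CLONE_NEWPID = 0x20000000   # value from <sched.h>


def parent_pid_seen_by_init():
    """Return getppid() as seen by the first child of a new PID namespace,
    or None if the namespace could not be created."""
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process: stays in the outer PID namespace.
        os.close(read_fd)
        try:
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0:
                child = os.fork()
                if child == 0:
                    # The parent (the helper) is outside this namespace.
                    os.write(write_fd, str(os.getppid()).encode())
                    os._exit(0)
                os.waitpid(child, 0)
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()
    os.waitpid(helper, 0)
    return int(reported) if reported else None
```

Where the namespace can be created, the function is expected to return 0.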
.PP
While processes may freely descend into child PID namespaces
(e.g., using
.BR setns (2)
with a PID namespace file descriptor),
they may not move in the other direction.
That is to say, processes may not enter any ancestor namespaces
(parent, grandparent, etc.).
Changing PID namespaces is a one-way operation.
.PP
The
.BR NS_GET_PARENT
.BR ioctl (2)
operation can be used to discover the parental relationship
between PID namespaces; see
.BR ioctl_ns (2).
.\"
.\" ============================================================
.\"
.SS setns(2) and unshare(2) semantics
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.BR CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
(Since Linux 4.12, that PID namespace is shown via the
.IR /proc/[pid]/ns/pid_for_children
file, as described in
.BR namespaces (7).)
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid (2)),
which would break many applications and libraries.
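.PP
A small Python sketch can confirm that the caller stays where it is.
It is only an illustration, assuming that unprivileged user namespaces are
enabled, that a procfs is mounted at /proc,
and that the hard-coded flag values match <sched.h>;
the function name caller_unchanged_by_unshare() is hypothetical.
A helper process compares its PID and its
.I /proc/self/ns/pid
link before and after calling unshare().

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <sched.h>
CLONE_NEWPID = 0x20000000   # value from <sched.h>


def caller_unchanged_by_unshare():
    """Return True if a process keeps its PID and PID namespace across
    unshare(CLONE_NEWPID), False if either changed, or None if the call
    could not be made."""
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process: unshare here so the caller itself is unaffected.
        os.close(read_fd)
        try:
            before = (os.getpid(), os.readlink("/proc/self/ns/pid"))
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0:
                after = (os.getpid(), os.readlink("/proc/self/ns/pid"))
                os.write(write_fd, b"1" if before == after else b"0")
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()
    os.waitpid(helper, 0)
    return (reported == "1") if reported else None
```

Where the call is permitted, the function is expected to return True:
only children created afterwards land in the new namespace.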
.PP
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
Among other things, this means that the parental relationship
between processes mirrors the parental relationship between PID namespaces:
the parent of a process is either in the same namespace
or resides in the immediate parent PID namespace.
.PP
A process may call
.BR unshare (2)
with the
.B CLONE_NEWPID
flag only once.
After it has performed this operation, its
.IR /proc/[pid]/ns/pid_for_children
symbolic link will be empty until the first child is created in the namespace.
.\"
.\" ============================================================
.\"
.SS Adoption of orphaned children
When a child process becomes orphaned, it is reparented to the "init"
process in the PID namespace of its parent
(unless one of the nearer ancestors of the parent employed the
.BR prctl (2)
.B PR_SET_CHILD_SUBREAPER
command to mark itself as the reaper of orphaned descendant processes).
Note that because of the
.BR setns (2)
and
.BR unshare (2)
semantics described above, this may be the "init" process in the PID
namespace that is the
.I parent
of the child's PID namespace,
rather than the "init" process in the child's own PID namespace.
.\" Furthermore, by definition, the parent of the "init" process
.\" of a PID namespace resides in the parent PID namespace.
.\"
.\" ============================================================
.\"
.SS Compatibility of CLONE_NEWPID with other CLONE_* flags
In current versions of Linux,
.BR CLONE_NEWPID
can't be combined with
.BR CLONE_THREAD .
Threads are required to be in the same PID namespace so that
the threads in a process can send signals to each other.
Similarly, it must be possible to see all of the threads
of a process in the
.BR proc (5)
filesystem.
Additionally, if two threads were in different PID
namespaces, the process ID of the process sending a signal
could not be meaningfully encoded when a signal is sent
(see the description of the
.I siginfo_t
type in
.BR sigaction (2)).
Since this value is computed when a signal is enqueued,
a signal queue shared by processes in multiple PID namespaces
would defeat that.
.PP
.\" Note these restrictions were all introduced in
.\" 8382fcac1b813ad0a4e68a838fc7ae93fa39eda0
.\" when CLONE_NEWPID|CLONE_VM was disallowed
In earlier versions of Linux,
.BR CLONE_NEWPID
was additionally disallowed (failing with the error
.BR EINVAL )
in combination with
.BR CLONE_SIGHAND
.\" (restriction lifted in faf00da544045fdc1454f3b9e6d7f65c841de302)
(before Linux 4.3) as well as
.\" (restriction lifted in e79f525e99b04390ca4d2366309545a836c03bf1)
.BR CLONE_VM
(before Linux 3.12).
The changes that lifted these restrictions have also been ported to
earlier stable kernels.
.\"
.\" ============================================================
.\"
.SS /proc and PID namespaces
A
.I /proc
filesystem shows (in the
.I /proc/[pid]
directories) only processes visible in the PID namespace
of the process that performed the mount, even if the
.I /proc
filesystem is viewed from processes in other namespaces.
.PP
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
If a new mount namespace is simultaneously created by including
.BR CLONE_NEWNS
in the
.IR flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .
.PP
From a shell, the command to mount
.I /proc
is:
.PP
.in +4n
.EX
$ mount -t proc proc /proc
.EE
.in
.PP
Calling
.BR readlink (2)
on the path
.I /proc/self
yields the process ID of the caller in the PID namespace of the procfs mount
(i.e., the PID namespace of the process that mounted the procfs).
This can be useful for introspection purposes,
when a process wants to discover its PID in other namespaces.
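.PP
The lookup can be sketched in Python as follows.
The function name pid_in_procfs_namespace() and its proc_root parameter
are hypothetical;
the parameter merely lets a caller point at a procfs instance mounted
somewhere other than /proc.

```python
import os


def pid_in_procfs_namespace(proc_root="/proc"):
    """Return the caller's PID as seen from the PID namespace of the
    procfs mounted at proc_root, or None if the link cannot be read."""
    try:
        # The "self" symbolic link resolves to the caller's PID directory.
        return int(os.readlink(os.path.join(proc_root, "self")))
    except OSError:
        return None
```

When the procfs was mounted from the caller's own PID namespace,
the value matches what getpid() reports;
for a procfs mounted from an ancestor namespace it is the caller's PID
in that ancestor.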
.\"
.\" ============================================================
.\"
.SS /proc files
.TP
.BR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)"
.\" commit b8f566b04d3cddd192cfd2418ae6d54ac6353792
This file displays the last PID that was allocated in this PID namespace.
When the next PID is allocated,
the kernel will search for the lowest unallocated PID
that is greater than this value,
and when this file is subsequently read it will show that PID.
.IP
This file is writable by a process that has the
.B CAP_SYS_ADMIN
capability inside its user namespace.
.\" This ability is necessary to support checkpoint restore in user-space
This makes it possible to determine the PID that is allocated
to the next process that is created inside this PID namespace.
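.PP
Reading the file needs no special privilege, as this Python sketch shows.
The function name read_ns_last_pid() is hypothetical,
and the sketch deliberately covers only the read side;
it returns None on systems where the file is missing or unreadable.

```python
def read_ns_last_pid(path="/proc/sys/kernel/ns_last_pid"):
    """Return the last PID allocated in the caller's PID namespace,
    or None if the file cannot be read."""
    try:
        with open(path) as ns_last_pid:
            return int(ns_last_pid.read().strip())
    except OSError:
        return None
```

Because other processes may allocate PIDs at any moment,
the returned value is only a snapshot.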
.\"
.\" ============================================================
.\"
.SS Miscellaneous
When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
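.PP
The credential passing itself can be sketched in Python.
The example below is an illustration only: it sends a datagram between the
two ends of a socket pair inside a single process,
so sender and receiver share one PID namespace and no translation is visible.
The function names are hypothetical,
and the "iII" layout assumes the struct ucred described in
.BR unix (7)
(pid, uid, gid).

```python
import os
import socket
import struct

UCRED_FORMAT = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid


def receive_own_credentials():
    """Send one datagram over a UNIX socket pair and return the
    (pid, uid, gid) that the kernel attached for the receiver,
    or None if no credentials arrived."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with sender, receiver:
        # Ask the kernel to attach SCM_CREDENTIALS to incoming messages.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        sender.send(b"x")
        size = struct.calcsize(UCRED_FORMAT)
        _data, ancillary, _flags, _addr = receiver.recvmsg(
            1, socket.CMSG_SPACE(size))
        for level, kind, payload in ancillary:
            if level == socket.SOL_SOCKET and kind == socket.SCM_CREDENTIALS:
                return struct.unpack(UCRED_FORMAT, payload[:size])
    return None


def received_pid_matches_getpid():
    """Return True if the PID delivered with the message equals getpid()."""
    credentials = receive_own_credentials()
    return credentials is not None and credentials[0] == os.getpid()
```

If the receiver instead ran in a different PID namespace,
the first element would be the sender's PID as seen from the receiver's
namespace.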
a79bacf5
MK
388.SH CONFORMING TO
389Namespaces are a Linux-specific feature.
fa88d1a4
MK
390.SH EXAMPLE
391See
392.BR user_namespaces (7).
a79bacf5 393.SH SEE ALSO
a79bacf5 394.BR clone (2),
d64c7be5 395.BR reboot (2),
a79bacf5
MK
396.BR setns (2),
397.BR unshare (2),
398.BR proc (5),
a79bacf5 399.BR capabilities (7),
b10cb05c 400.BR credentials (7),
4bf43ba5 401.BR mount_namespaces (7),
8f29c47d 402.BR namespaces (7),
a79bacf5
MK
403.BR user_namespaces (7),
404.BR switch_root (8)