.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.\"
.TH pid_namespaces 7 (date) "Linux man-pages (unreleased)"
.SH NAME
pid_namespaces \- overview of Linux PID namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.P
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to provide functionality
such as suspending/resuming the set of processes in the container and
migrating the container to a new host
while the processes inside the container maintain the same PIDs.
.P
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
.P
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
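.P
As a rough illustration of the requirement above, the following Python sketch
probes for PID namespace support at run time.
It rests on an assumption that is not stated in this page: that a kernel built with PID namespace support
exposes the /proc/self/ns/pid link when procfs is mounted.
The function name and the proc_root parameter are hypothetical.

```python
import os


def pid_namespaces_supported(proc_root="/proc"):
    """Heuristic check for PID namespace support in the running kernel."""
    # Assumption: the ns/pid link is present only when the kernel provides
    # PID namespaces. A missing or unmounted procfs also yields False.
    return os.path.lexists(os.path.join(proc_root, "self", "ns", "pid"))
```

A False result is therefore ambiguous: it can mean either a kernel without the option or a system where procfs is not available at the given path.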
.\"
.\" ============================================================
.\"
.SS The namespace "init" process
The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.B CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.B CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
This process becomes the parent of any child processes that are orphaned
because a process that resides in this PID namespace terminated
(see below for further details).
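.P
The unshare(2) case described above can be sketched in Python as follows.
This is a minimal illustration, not a reference implementation:
it calls the C library unshare() wrapper through ctypes,
uses the CLONE_NEWPID value from the kernel headers,
and runs the experiment in a helper process so that the caller is unaffected.
The function name is hypothetical, and the call normally needs CAP_SYS_ADMIN,
so the sketch returns None when the namespace cannot be created.

```python
import ctypes
import os

CLONE_NEWPID = 0x20000000  # flag value as defined in <linux/sched.h>


def first_child_pid_after_unshare():
    """Return the PID seen by the first child created after
    unshare(CLONE_NEWPID), or None if the namespace could not be created."""
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper: performs the unshare() so the original caller is untouched.
        os.close(read_fd)
        status = 1
        try:
            if libc.unshare(CLONE_NEWPID) == 0:
                child = os.fork()
                if child == 0:
                    # First child after unshare(): the namespace "init".
                    os.write(write_fd, str(os.getpid()).encode())
                    os._exit(0)
                os.waitpid(child, 0)
                status = 0
        finally:
            os._exit(status)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(helper, 0)
    return int(data) if data else None
```

When the call is permitted, the child reports its own PID from inside the new namespace, while the helper, which stays in its original namespace, waits for the same child under an ordinary PID.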
.P
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.B SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace fails with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
Such scenarios can occur when, for example,
a process uses an open file descriptor for a
.IR /proc/ pid /ns/pid
file corresponding to a process that was in a namespace to
.BR setns (2)
into that namespace after the "init" process has terminated.
Another possible scenario can occur after a call to
.BR unshare (2):
if the first child subsequently created by a
.BR fork (2)
terminates, then subsequent calls to
.BR fork (2)
fail with
.BR ENOMEM .
.P
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.
.P
Likewise, a process in an ancestor namespace
can\[em]subject to the usual permission checks described in
.BR kill (2)\[em]send
signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
or
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
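.P
The signal rules above can be summarized as a small decision function.
The Python sketch below is only a model of the text, not kernel code;
the function and parameter names are hypothetical,
and permission checks from kill(2) are assumed to have passed already.

```python
import signal

# Signals that no process can catch, block, or ignore.
UNCATCHABLE = (signal.SIGKILL, signal.SIGSTOP)


def init_acts_on_signal(sig, handler_installed, sender_in_ancestor_ns):
    """Model whether a signal sent to a namespace "init" takes effect."""
    if sig in UNCATCHABLE:
        # No handler can exist for these signals, so senders inside the
        # namespace cannot deliver them; an ancestor namespace can force them.
        return sender_in_ancestor_ns
    # Any other signal is delivered only if "init" installed a handler,
    # whether the sender is inside the namespace or in an ancestor.
    return handler_installed
```

For example, the model returns False for SIGTERM without a handler from any sender, and True for SIGKILL only when the sender is in an ancestor namespace.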
.P
Starting with Linux 3.4, the
.BR reboot (2)
system call causes a signal to be sent to the namespace "init" process.
See
.BR reboot (2)
for more details.
.\"
.\" ============================================================
.\"
.SS Nesting PID namespaces
PID namespaces can be nested:
each PID namespace has a parent,
except for the initial ("root") PID namespace.
The parent of a PID namespace is the PID namespace of the process that
created the namespace using
.BR clone (2)
or
.BR unshare (2).
PID namespaces thus form a tree,
with all namespaces ultimately tracing their ancestry to the root namespace.
Since Linux 3.7,
.\" commit f2302505775fd13ba93f034206f1e2a587017929
.\" The kernel constant MAX_PID_NS_LEVEL
the kernel limits the maximum nesting depth for PID namespaces to 32.
.P
A process is visible to other processes in its PID namespace,
and to the processes in each direct ancestor PID namespace
going back to the root PID namespace.
In this context, "visible" means that one process
can be the target of operations by another process using
system calls that specify a process ID.
Conversely, the processes in a child PID namespace can't see
processes in the parent and further removed ancestor namespaces.
More succinctly: a process can see (e.g., send signals with
.BR kill (2),
set nice values with
.BR setpriority (2),
etc.) only processes contained in its own PID namespace
and in descendants of that namespace.
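.P
The visibility rule can be modeled as a walk up the namespace tree.
The Python sketch below is a toy model under stated assumptions:
namespaces are represented by made-up names,
and the parent_of mapping (root maps to None) is a hypothetical stand-in
for the kernel's parent links.

```python
def can_see(observer_ns, target_ns, parent_of):
    """Return True if processes in observer_ns can see processes in target_ns.

    A target is visible from its own namespace and from every ancestor,
    so walk from the target toward the root looking for the observer.
    """
    ns = target_ns
    while ns is not None:
        if ns == observer_ns:
            return True
        ns = parent_of[ns]
    return False


# Example tree: root has children "a" and "c"; "a" has child "b".
parent_of = {"root": None, "a": "root", "b": "a", "c": "root"}
```

With this tree, can_see("root", "b", parent_of) is True, whereas can_see("b", "a", parent_of) and can_see("a", "c", parent_of) are both False.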
.P
A process has one process ID in each of the layers of the PID
namespace hierarchy in which it is visible:
one in its own PID namespace and one in each direct ancestor namespace,
walking back through to the root PID namespace.
System calls that operate on process IDs always
operate using the process ID that is visible in the
PID namespace of the caller.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process was created.
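.P
One way to observe the per-layer process IDs is the NSpid field of
/proc/pid/status, which is described in proc(5) and is available since Linux 4.1;
it is not otherwise covered in this page.
The Python sketch below parses that field; the function names are hypothetical.

```python
def parse_nspid(status_text):
    """Return the PIDs listed in the NSpid field, outermost namespace first,
    or None if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return None


def own_nspid(proc_root="/proc"):
    """Read the NSpid list of the calling process, or None if unavailable."""
    try:
        with open(proc_root + "/self/status") as status_file:
            return parse_nspid(status_file.read())
    except OSError:
        return None
```

For a process nested two levels below the namespace of the procfs mount, the list has three entries, and the last one is the value that getpid(2) returns in that process.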
.P
Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e., the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
.P
While processes may freely descend into child PID namespaces
(e.g., using
.BR setns (2)
with a PID namespace file descriptor),
they may not move in the other direction.
That is to say, processes may not enter any ancestor namespaces
(parent, grandparent, etc.).
Changing PID namespaces is a one-way operation.
.P
The
.B NS_GET_PARENT
.BR ioctl (2)
operation can be used to discover the parental relationship
between PID namespaces; see
.BR ioctl_ns (2).
.\"
.\" ============================================================
.\"
.SS setns(2) and unshare(2) semantics
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.B CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
(Since Linux 4.12, that PID namespace is shown via the
.IR /proc/ pid /ns/pid_for_children
file, as described in
.BR namespaces (7).)
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid (2)),
which would break many applications and libraries.
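.P
The distinction between a process's own PID namespace and the one its future children will join
can be inspected by reading the two links mentioned above.
The Python sketch below is illustrative only;
the function name and the proc_root parameter are hypothetical,
and it returns None where either link cannot be read
(for example, on kernels before Linux 4.12).

```python
import os


def pid_namespace_links(pid="self", proc_root="/proc"):
    """Return (own, for_children) link targets for a process,
    or None if either link cannot be read."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    try:
        own = os.readlink(os.path.join(ns_dir, "pid"))
        for_children = os.readlink(os.path.join(ns_dir, "pid_for_children"))
    except OSError:
        return None
    return own, for_children
```

If the two targets differ, the process has arranged, through setns(2) or unshare(2), for its children to be created in another PID namespace.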
.P
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
Among other things, this means that the parental relationship
between processes mirrors the parental relationship between PID namespaces:
the parent of a process is either in the same namespace
or resides in the immediate parent PID namespace.
.P
A process may call
.BR unshare (2)
with the
.B CLONE_NEWPID
flag only once.
After it has performed this operation, its
.IR /proc/ pid /ns/pid_for_children
symbolic link will be empty until the first child is created in the namespace.
.\"
.\" ============================================================
.\"
.SS Adoption of orphaned children
When a child process becomes orphaned, it is reparented to the "init"
process in the PID namespace of its parent
(unless one of the nearer ancestors of the parent employed the
.BR prctl (2)
.B PR_SET_CHILD_SUBREAPER
command to mark itself as the reaper of orphaned descendant processes).
Note that because of the
.BR setns (2)
and
.BR unshare (2)
semantics described above, this may be the "init" process in the PID
namespace that is the
.I parent
of the child's PID namespace,
rather than the "init" process in the child's own PID namespace.
.\" Furthermore, by definition, the parent of the "init" process
.\" of a PID namespace resides in the parent PID namespace.
.\"
.\" ============================================================
.\"
.SS Compatibility of CLONE_NEWPID with other CLONE_* flags
In current versions of Linux,
.B CLONE_NEWPID
can't be combined with
.BR CLONE_THREAD .
Threads are required to be in the same PID namespace so that
the threads in a process can send signals to each other.
Similarly, it must be possible to see all of the threads
of a process in the
.BR proc (5)
filesystem.
Additionally, if two threads were in different PID
namespaces, the process ID of the process sending a signal
could not be meaningfully encoded when a signal is sent
(see the description of the
.I siginfo_t
type in
.BR sigaction (2)).
Since the sender's PID is computed when a signal is enqueued,
a signal queue shared by processes in multiple PID namespaces
would make it impossible to encode a PID that is valid for every receiver.
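.P
The current restriction can be expressed as a simple flag check.
The Python sketch below models only the rule stated above and is not the kernel's implementation:
the flag values are those defined in the kernel headers,
the function name is hypothetical,
and the use of EINVAL follows the error documented in clone(2) for this combination.

```python
import errno

# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_THREAD = 0x00010000
CLONE_NEWPID = 0x20000000


def check_newpid_flags(flags):
    """Return 0 if the combination is allowed by the rule in the text,
    otherwise errno.EINVAL."""
    if (flags & CLONE_NEWPID) and (flags & CLONE_THREAD):
        # Threads of one process must share a PID namespace.
        return errno.EINVAL
    return 0
```

Under this model, CLONE_NEWPID combined with CLONE_VM is accepted, which matches current kernels but not the earlier versions that rejected that pairing.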
.P
.\" Note these restrictions were all introduced in
.\" 8382fcac1b813ad0a4e68a838fc7ae93fa39eda0
.\" when CLONE_NEWPID|CLONE_VM was disallowed
In earlier versions of Linux,
.B CLONE_NEWPID
was additionally disallowed (failing with the error
.BR EINVAL )
in combination with
.B CLONE_SIGHAND
.\" (restriction lifted in faf00da544045fdc1454f3b9e6d7f65c841de302)
(before Linux 4.3) as well as
.\" (restriction lifted in e79f525e99b04390ca4d2366309545a836c03bf1)
.B CLONE_VM
(before Linux 3.12).
The changes that lifted these restrictions have also been ported to
earlier stable kernels.
.\"
.\" ============================================================
.\"
.SS /proc and PID namespaces
A
.I /proc
filesystem shows (in the
.IR /proc/ pid
directories) only processes visible in the PID namespace
of the process that performed the mount, even if the
.I /proc
filesystem is viewed from processes in other namespaces.
.P
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
If a new mount namespace is simultaneously created by including
.B CLONE_NEWNS
in the
.I flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .
.P
From a shell, the command to mount
.I /proc
is:
.P
.in +4n
.EX
$ mount \-t proc proc /proc
.EE
.in
.P
Calling
.BR readlink (2)
on the path
.I /proc/self
yields the process ID of the caller in the PID namespace of the procfs mount
(i.e., the PID namespace of the process that mounted the procfs).
This can be useful for introspection purposes,
when a process wants to discover its PID in other namespaces.
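.P
The readlink(2) technique can be sketched in Python as follows.
The function name and the proc_root parameter are hypothetical;
proc_root stands for wherever a procfs instance of interest is mounted.

```python
import os


def pid_according_to_procfs(proc_root="/proc"):
    """Return the caller's PID as seen by the procfs mounted at proc_root,
    or None if the link cannot be read."""
    try:
        # The "self" link resolves to the caller's PID in the PID namespace
        # of the process that mounted this procfs instance.
        return int(os.readlink(os.path.join(proc_root, "self")))
    except (OSError, ValueError):
        return None
```

When the procfs instance was mounted from the caller's own PID namespace, the result equals the value of getpid(2); for an instance mounted from an ancestor namespace, it is the caller's PID in that ancestor.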
.\"
.\" ============================================================
.\"
.SS /proc files
.TP
.BR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)"
.\" commit b8f566b04d3cddd192cfd2418ae6d54ac6353792
This file
(which is virtualized per PID namespace)
displays the last PID that was allocated in this PID namespace.
When the next PID is allocated,
the kernel will search for the lowest unallocated PID
that is greater than this value,
and when this file is subsequently read it will show that PID.
.IP
This file is writable by a process that has the
.B CAP_SYS_ADMIN
or (since Linux 5.9)
.B CAP_CHECKPOINT_RESTORE
capability inside the user namespace that owns the PID namespace.
.\" This ability is necessary to support checkpoint restore in user-space
This makes it possible to determine the PID that is allocated
to the next process that is created inside this PID namespace.
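.P
Reading the file is straightforward; the Python sketch below shows one way to do it.
The function name is hypothetical, and the path parameter exists only so that the sketch can be pointed at another location.
Writing the file is deliberately left out, since it requires the capabilities listed above.

```python
def read_ns_last_pid(path="/proc/sys/kernel/ns_last_pid"):
    """Return the last PID allocated in the caller's PID namespace,
    or None if the file is missing or unreadable."""
    try:
        with open(path) as last_pid_file:
            return int(last_pid_file.read().strip())
    except (OSError, ValueError):
        return None
```

Because other processes may be created at any time, the value is only a snapshot and can be stale by the time the caller acts on it.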
.\"
.\" ============================================================
.\"
.SS Miscellaneous
When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
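.P
The credential-passing mechanism can be exercised with a socket pair.
The Python sketch below is a minimal illustration under stated assumptions:
it relies on the SO_PASSCRED socket option described in unix(7) to have the kernel attach the sender's credentials,
it assumes the struct ucred layout of one signed and two unsigned 32-bit integers,
and it keeps both ends in one process, so no translation between namespaces takes place.
The function name is hypothetical.

```python
import socket
import struct


def pid_received_over_unix_socket():
    """Send one datagram over a socket pair and return the sender PID
    reported in the SCM_CREDENTIALS message, or None if unsupported."""
    if not hasattr(socket, "SO_PASSCRED"):
        return None
    ucred = struct.Struct("iII")  # pid, uid, gid
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with sender, receiver:
        try:
            # Ask the kernel to attach sender credentials to received data.
            receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
            sender.send(b"x")
            _data, ancdata, _flags, _addr = receiver.recvmsg(
                1, socket.CMSG_SPACE(ucred.size))
        except OSError:
            return None
        for level, ctype, cdata in ancdata:
            if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
                pid, _uid, _gid = ucred.unpack(cdata[:ucred.size])
                return pid
    return None
```

If the receiving end were held by a process in a different PID namespace, the PID it unpacks would be the sender's PID as expressed in the receiver's namespace.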
.SH STANDARDS
Linux.
.SH EXAMPLES
See
.BR user_namespaces (7).
.SH SEE ALSO
.BR clone (2),
.BR reboot (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR credentials (7),
.BR mount_namespaces (7),
.BR namespaces (7),
.BR user_namespaces (7),
.BR switch_root (8)