.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\"
.TH NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
.SH NAME
namespaces \- overview of Linux namespaces
.SH DESCRIPTION
A namespace wraps a global system resource in an abstraction that
makes it appear to the processes within the namespace that they
have their own isolated instance of the global resource.
Changes to the global resource are visible to other processes
that are members of the namespace, but are invisible to other processes.
One use of namespaces is to implement containers.

This page describes the various namespaces and the associated
.I /proc
files, and summarizes the APIs for working with namespaces.
.\"
.\" ==================== The namespaces API ====================
.\"
.SS The namespaces API
As well as various
.I /proc
files described below,
the namespaces API includes the following system calls:
.TP
.BR clone (2)
The
.BR clone (2)
system call creates a new process.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the child process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.TP
.BR setns (2)
The
.BR setns (2)
system call allows the calling process to join an existing namespace.
The namespace to join is specified via a file descriptor that refers to
one of the
.IR /proc/[pid]/ns
files described below.
.TP
.BR unshare (2)
The
.BR unshare (2)
system call moves the calling process to a new namespace.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the calling process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.PP
Creation of new namespaces using
.BR clone (2)
and
.BR unshare (2)
in most cases requires the
.BR CAP_SYS_ADMIN
capability.
User namespaces are the exception: since Linux 3.8,
no privilege is required to create a user namespace.
.\"
.\" ==================== The /proc/[pid]/ns/ directory ====================
.\"
.SS The /proc/[pid]/ns/ directory
Each process has a
.IR /proc/[pid]/ns/
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
subdirectory containing one entry for each namespace that
supports being manipulated by
.BR setns (2):

.in +4n
.nf
$ \fBls -l /proc/$$/ns\fP
total 0
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 net -> net:[4026531956]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 uts -> uts:[4026531838]
.fi
.in

Bind mounting (see
.BR mount (2))
one of the files in this directory
to somewhere else in the file system keeps
the corresponding namespace of the process specified by
.I pid
alive even if all processes currently in the namespace terminate.

Opening one of the files in this directory
(or a file that is bind mounted to one of these files)
returns a file handle for
the corresponding namespace of the process specified by
.IR pid .
As long as this file descriptor remains open,
the namespace will remain alive,
even if all processes in the namespace terminate.
The file descriptor can be passed to
.BR setns (2).
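.PP
The open-then-join sequence described above can be sketched in C as follows.
This is a minimal illustration, not a reference implementation:
join_namespace() is a hypothetical helper name, and passing 0 as the second
argument of setns(2) (meaning "accept any namespace type") is a detail taken
from setns(2) rather than from this page.
Joining most namespace types is expected to need privilege, so the call may
fail with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): open
 * /proc/<pid>/ns/<name> and pass the descriptor to setns(2).
 * Returns 0 on success, or -1 on failure with errno set.
 */
int join_namespace(pid_t pid, const char *name)
{
    char path[64];
    int fd, n, ret, saved_errno;

    assert(name != NULL);

    n = snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long) pid, name);
    if (n < 0 || (size_t) n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* While fd is open, the namespace it refers to stays alive. */
    ret = setns(fd, 0);         /* 0: accept any namespace type */

    /* Preserve the errno value from setns() across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return ret;
}
```

A caller might use join_namespace(target_pid, "net") and inspect errno
when the result is \-1.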
.PP
In Linux 3.7 and earlier, these files were visible as hard links.
Since Linux 3.8, they appear as symbolic links.
If two processes are in the same namespace, then the inode numbers of their
.IR /proc/[pid]/ns/xxx
symbolic links will be the same; an application can check this using the
.I stat.st_ino
field returned by
.BR stat (2).
The content of this symbolic link is a string containing
the namespace type and inode number as in the following example:

.in +4n
.nf
$ \fBreadlink /proc/$$/ns/uts\fP
uts:[4026531838]
.fi
.in
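.PP
The inode comparison and the link format described above can be sketched
in C as follows.
Both same_namespace() and parse_ns_link() are hypothetical helper names
introduced only for this illustration.
Comparing st_dev in addition to st_ino is an extra precaution assumed here;
the text above mentions only the inode number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if processes a and b are in the same
 * namespace of the given kind ("uts", "net", ...), 0 if they are not,
 * and -1 if either /proc/<pid>/ns/<name> entry could not be examined.
 */
int same_namespace(pid_t a, pid_t b, const char *name)
{
    char path_a[64], path_b[64];
    struct stat sa, sb;

    assert(name != NULL);

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long) a, name);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long) b, name);

    /* stat(2) follows the symbolic link to the namespace object. */
    if (stat(path_a, &sa) == -1 || stat(path_b, &sb) == -1)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Hypothetical helper: split link content such as "uts:[4026531838]"
 * into the namespace type and the inode number.
 * Returns 0 on success, or -1 if the string does not have that form.
 */
int parse_ns_link(const char *link, char *type, size_t type_size,
                  unsigned long long *ino)
{
    const char *colon;
    size_t len;
    int consumed = 0;

    assert(link != NULL && type != NULL && ino != NULL);

    colon = strchr(link, ':');
    if (colon == NULL)
        return -1;

    len = (size_t) (colon - link);
    if (len == 0 || len >= type_size)
        return -1;

    /* %n is stored only if the closing ']' was matched. */
    if (sscanf(colon, ":[%llu]%n", ino, &consumed) != 1 ||
        consumed == 0 || colon[consumed] != '\0')
        return -1;

    memcpy(type, link, len);
    type[len] = '\0';
    return 0;
}
```

The string passed to parse_ns_link() would typically come from
readlink(2) on one of the files in this directory.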
.PP
The files in this subdirectory are as follows:
.TP
.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
This file is a handle for the IPC namespace of the process.
.TP
.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
This file is a handle for the mount namespace of the process.
.TP
.IR /proc/[pid]/ns/net " (since Linux 3.0)"
This file is a handle for the network namespace of the process.
.TP
.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
This file is a handle for the PID namespace of the process.
.TP
.IR /proc/[pid]/ns/user " (since Linux 3.8)"
This file is a handle for the user namespace of the process.
.TP
.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
This file is a handle for the UTS namespace of the process.
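.PP
Each of the file names listed above corresponds to one of the
CLONE_NEW* flags named in the section headings of this page.
The following C sketch records that correspondence in a lookup table.
The helper name ns_flag_for_name() is hypothetical, and the sketch assumes
that <sched.h> exposes the CLONE_NEW* constants when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map an entry name from /proc/[pid]/ns/ to the
 * corresponding CLONE_NEW* flag, or return 0 for an unknown name.
 */
int ns_flag_for_name(const char *name)
{
    static const struct {
        const char *name;
        int flag;
    } table[] = {
        { "ipc",  CLONE_NEWIPC  },
        { "mnt",  CLONE_NEWNS   },
        { "net",  CLONE_NEWNET  },
        { "pid",  CLONE_NEWPID  },
        { "user", CLONE_NEWUSER },
        { "uts",  CLONE_NEWUTS  },
    };
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].flag;

    return 0;
}
```

Such a table can be useful when a program accepts namespace names on its
command line and needs the matching flag for clone(2) or unshare(2).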
.\"
.\" ==================== IPC namespaces ====================
.\"
.SS IPC namespaces (CLONE_NEWIPC)
IPC namespaces isolate certain IPC resources,
namely, System V IPC objects (see
.BR svipc (7))
and (since Linux 2.6.30)
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
POSIX message queues (see
.BR mq_overview (7)).
The common characteristic of these IPC mechanisms is that IPC
objects are identified by mechanisms other than file system
pathnames.

Each IPC namespace has its own set of System V IPC identifiers and
its own POSIX message queue file system.
Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.

When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.

Use of IPC namespaces requires a kernel that is configured with the
.B CONFIG_IPC_NS
option.
.\"
.\" ==================== Network namespaces ====================
.\"
.SS Network namespaces (CLONE_NEWNET)
Network namespaces provide isolation of the system resources associated
with networking: network devices, IP addresses, IP routing tables, the
.I /proc/net
directory, the
.I /sys/class/net
directory, port numbers, and so on.

A network namespace provides an isolated view of the networking stack
(network device interfaces, IPv4 and IPv6 protocol stacks,
IP routing tables, firewall rules, the
.I /proc/net
and
.I /sys/class/net
directory trees, sockets, etc.).
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces,
and can be used to create a bridge to a physical network device
in another namespace.

When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).

Use of network namespaces requires a kernel that is configured with the
.B CONFIG_NET_NS
option.
.\"
.\" ==================== Mount namespaces ====================
.\"
.SS Mount namespaces (CLONE_NEWNS)
Mount namespaces isolate the set of file system mount points,
meaning that processes in different mount namespaces can
have different views of the file system hierarchy.
The set of mounts in a mount namespace is modified using
.BR mount (2)
and
.BR umount (2).

The
.IR /proc/[pid]/mounts
file (present since Linux 2.4.19)
lists all the file systems currently mounted in the
process's mount namespace.
The format of this file is documented in
.BR fstab (5).
Since kernel version 2.6.15, this file is pollable:
after opening the file for reading, a change in this file
(i.e., a file system mount or unmount) causes
.BR select (2)
to mark the file descriptor as readable, and
.BR poll (2)
and
.BR epoll_wait (2)
mark the file as having an error condition.
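.PP
Waiting for a mount or unmount in this way might look like the following
C sketch.
The helper name wait_for_mount_change() is hypothetical.
The sketch checks for POLLERR, matching the error condition described above,
and additionally requests POLLPRI on the assumption that some kernel versions
report the change as a priority event; that second part is not stated on
this page.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds (-1 means
 * wait indefinitely) for a change in the caller's mount namespace.
 * Returns 1 if a change was signalled, 0 on timeout, -1 on error.
 */
int wait_for_mount_change(int timeout_ms)
{
    struct pollfd pfd;
    int fd, ready;

    assert(timeout_ms >= -1);

    fd = open("/proc/self/mounts", O_RDONLY);
    if (fd == -1)
        return -1;

    pfd.fd = fd;
    pfd.events = POLLPRI;   /* POLLERR is reported without being requested */
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    close(fd);

    if (ready == -1)
        return -1;
    if (ready == 0)
        return 0;

    return (pfd.revents & (POLLERR | POLLPRI)) ? 1 : 0;
}
```

A monitoring loop would typically reread the file after each change is
reported, since the helper only says that something changed.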
.PP
The
.IR /proc/[pid]/mountstats
file (present since Linux 2.6.17)
exports information (statistics, configuration information)
about the mount points in the process's mount namespace.
This file is only readable by the owner of the process.
Lines in this file have the form:
.RS
.in 12
.nf

device /dev/sda7 mounted on /home with fstype ext3 [statistics]
(       1      )            ( 2 )             (3 ) (4)
.fi
.in

The fields in each line are:
.TP 5
(1)
The name of the mounted device
(or "nodevice" if there is no corresponding device).
.TP
(2)
The mount point within the file system tree.
.TP
(3)
The file system type.
.TP
(4)
Optional statistics and configuration information.
Currently (as at Linux 2.6.26), only NFS file systems export
information via this field.
.RE
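.PP
The line format shown above can be split into its four fields with
sscanf(3), as in the following C sketch.
The structure mountstat_entry and the function parse_mountstats_line()
are hypothetical names, and the buffer sizes are arbitrary choices made
for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields of one mountstats line. */
struct mountstat_entry {
    char device[256];       /* field (1) */
    char mount_point[256];  /* field (2) */
    char fstype[64];        /* field (3) */
    const char *extra;      /* field (4): points into the input line,
                               or NULL if the field is absent */
};

/*
 * Hypothetical helper: parse a line of the form
 *   device DEV mounted on DIR with fstype TYPE [statistics]
 * Returns 0 on success, or -1 if the line does not match.
 */
int parse_mountstats_line(const char *line, struct mountstat_entry *e)
{
    int consumed = 0;

    assert(line != NULL && e != NULL);

    if (sscanf(line, "device %255s mounted on %255s with fstype %63s%n",
               e->device, e->mount_point, e->fstype, &consumed) != 3)
        return -1;

    /* Whatever follows the file system type is the optional field (4). */
    line += consumed;
    while (*line == ' ')
        line++;
    e->extra = (*line != '\0' && *line != '\n') ? line : NULL;

    return 0;
}
```

Because extra points into the caller's buffer, the buffer needs to outlive
any use of that field.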
.\"
.\" ==================== PID namespaces ====================
.\"
.SS PID namespaces (CLONE_NEWPID)
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to migrate to a new host
while the processes inside the container maintain the same PIDs.

PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.

The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.BR CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (1).

If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.BR SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace (e.g., from a process that has done a
.BR setns (2)
into the namespace using an open file descriptor for a
.I /proc/[pid]/ns/pid
file corresponding to a process that was in the namespace)
will fail with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
.PP
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.

Likewise, a process in an ancestor namespace
can\(emsubject to the usual permission checks described in
.BR kill (2)\(emsend
signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
or
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).

PID namespaces can be nested.
When a new PID namespace is created,
the processes in that namespace are visible
in the PID namespace of the process that created the new namespace;
analogously, if the parent PID namespace is itself
the child of another PID namespace,
then processes in the child and parent PID namespaces will both be
visible in the grandparent PID namespace.
Conversely, the processes in the "child" PID namespace do not see
the processes in the parent namespace.
More succinctly: a process can see (e.g., send signals with
.BR kill (2))
only processes contained in its own PID namespace
and the namespaces nested below that PID namespace.
.PP
A process will have one PID for each of the layers of the hierarchy
starting from the PID namespace in which it resides
through to the root PID namespace.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process resides.

Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e.,
the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
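.PP
The PID 1 assignment and the zero result from getppid(2) described in this
section can be observed with a C sketch along the following lines.
The helper name probe_new_pid_namespace() is hypothetical.
The sketch assumes the caller is permitted to create a PID namespace
(normally this needs CAP_SYS_ADMIN); without that, it simply reports failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a new PID namespace in a throwaway
 * process and report what the first process in that namespace sees
 * from getpid() and getppid().
 * Returns 0 on success, or -1 if the namespace could not be created
 * or the values could not be collected.
 */
int probe_new_pid_namespace(long *pid_seen, long *ppid_seen)
{
    int pfd[2];
    long vals[2];
    pid_t helper;
    ssize_t n;

    assert(pid_seen != NULL && ppid_seen != NULL);

    if (pipe(pfd) == -1)
        return -1;

    helper = fork();
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (helper == 0) {
        /* Throwaway process, so the caller's own state is untouched. */
        pid_t first;

        close(pfd[0]);
        if (unshare(CLONE_NEWPID) == -1)
            _exit(1);

        first = fork();     /* first child after unshare(): "init" */
        if (first == -1)
            _exit(1);
        if (first == 0) {
            long v[2];

            v[0] = (long) getpid();
            v[1] = (long) getppid();
            _exit(write(pfd[1], v, sizeof(v)) ==
                  (ssize_t) sizeof(v) ? 0 : 1);
        }
        waitpid(first, NULL, 0);
        _exit(0);
    }

    close(pfd[1]);
    n = read(pfd[0], vals, sizeof(vals));
    close(pfd[0]);
    waitpid(helper, NULL, 0);

    if (n != (ssize_t) sizeof(vals))
        return -1;

    *pid_seen = vals[0];
    *ppid_seen = vals[1];
    return 0;
}
```

When the call succeeds, the values stored are expected to be 1 and 0,
because the parent of the new "init" process remains in the outer namespace.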
.PP
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
.\" mount -t proc proc /proc
(If
.BR CLONE_NEWNS
is also included in the
.IR flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .)

Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.BR CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid (2)),
which would break many applications and libraries.
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
.PP
Every thread in a process must be in the same PID namespace.
For this reason, the following two call sequences will fail:

.nf
    unshare(CLONE_NEWPID);
    clone(..., CLONE_VM, ...);    /* Fails */

    setns(fd, CLONE_NEWPID);
    clone(..., CLONE_VM, ...);    /* Fails */
.fi

Because the above
.BR unshare (2)
and
.BR setns (2)
calls only change the PID namespace for created children, the
.BR clone (2)
calls necessarily put the new thread in a different PID namespace from
the calling thread.

When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
.\" FIXME Presumably, a similar thing happens with the UID and GID passed
.\" via a UNIX domain socket. That needs to be confirmed and documented
.\" under the "User namespaces" section.

Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.\"
.\" ==================== User namespaces ====================
.\"
.SS User namespaces (CLONE_NEWUSER)
See
.BR user_namespaces (7).
.\"
.\" ==================== UTS namespaces ====================
.\"
.SS UTS namespaces (CLONE_NEWUTS)
UTS namespaces provide isolation of two system identifiers:
the hostname and the NIS domain name.
These identifiers are set using
.BR sethostname (2)
and
.BR setdomainname (2),
and can be retrieved using
.BR uname (2),
.BR gethostname (2),
and
.BR getdomainname (2).
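.PP
The isolation of the hostname can be illustrated with the following C sketch,
in which a child process creates a new UTS namespace and changes the hostname
there.
The helper name uts_change_is_isolated() is hypothetical, and the sketch
assumes the child has the privilege needed for unshare(2) with CLONE_NEWUTS
and for sethostname(2); otherwise it reports failure without changing
anything.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the hostname to new_name inside a new UTS
 * namespace created by a child process, then check that the caller's
 * own hostname is unchanged.
 * Returns 1 if the caller's hostname is unchanged, 0 if it changed,
 * and -1 if the experiment could not be run (e.g., no privilege).
 */
int uts_change_is_isolated(const char *new_name)
{
    struct utsname before, after;
    pid_t child;
    int status;

    assert(new_name != NULL);

    if (uname(&before) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* The hostname is changed only after leaving the old namespace. */
        if (unshare(CLONE_NEWUTS) == -1)
            _exit(1);
        if (sethostname(new_name, strlen(new_name)) == -1)
            _exit(1);
        _exit(0);
    }

    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    if (uname(&after) == -1)
        return -1;

    return strcmp(before.nodename, after.nodename) == 0;
}
```

Running the change in a child keeps the calling process in its original
UTS namespace, so the comparison is made from outside the new namespace.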
.PP
Use of UTS namespaces requires a kernel that is configured with the
.B CONFIG_UTS_NS
option.
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH SEE ALSO
.BR nsenter (1),
.BR readlink (1),
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR credentials (7),
.BR capabilities (7),
.BR user_namespaces (7),
.BR switch_root (8)