1.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
2.\"
3.\" Permission is granted to make and distribute verbatim copies of this
4.\" manual provided the copyright notice and this permission notice are
5.\" preserved on all copies.
6.\"
7.\" Permission is granted to copy and distribute modified versions of this
8.\" manual under the conditions for verbatim copying, provided that the
9.\" entire resulting derived work is distributed under the terms of a
10.\" permission notice identical to this one.
11.\"
12.\" Since the Linux kernel and libraries are constantly changing, this
13.\" manual page may be incorrect or out-of-date. The author(s) assume no
14.\" responsibility for errors or omissions, or for damages resulting from
15.\" the use of the information contained herein. The author(s) may not
16.\" have taken the same level of care in the production of this manual,
17.\" which is licensed free of charge, as they might when working
18.\" professionally.
19.\"
20.\" Formatted or processed versions of this manual, if unaccompanied by
21.\" the source, must acknowledge the copyright and authors of this work.
22.\"
23.\"
24.TH NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
25.SH NAME
26namespaces \- overview of Linux namespaces
27.SH DESCRIPTION
28A namespace wraps a global system resource in an abstraction that
29makes it appear to the processes within the namespace that they
30have their own isolated instance of the global resource.
31Changes to the global resource are visible to other processes
32that are members of the namespace, but are invisible to other processes.
33One use of namespaces is to implement containers.
34
35This page describes the various namespaces and the associated
36.I /proc
37files, and summarizes the APIs for working with namespaces.
38
39.SS The namespaces API
40
41As well as various
42.I /proc
43files described below,
44the namespaces API comprises the following system calls:
45
46.TP
47.BR clone (2)
48The
49.BR clone (2)
50system call creates a new process.
51If the
52.I flags
53argument of the call specifies one or more of the
54.B CLONE_NEW*
55flags listed below, then new namespaces are created for each flag,
56and the child process is made a member of those namespaces.
57(This system call also implements a number of features
58unrelated to namespaces.)
59
60.TP
61.BR setns (2)
62The
63.BR setns (2)
64system call allows the calling process to join an existing namespace.
65The namespace to join is specified via a file descriptor that refers to
66one of the
67.IR /proc/[pid]/ns
68files described below.
69
70.TP
71.BR unshare (2)
72The
73.BR unshare (2)
74system call moves the calling process to a new namespace.
75If the
76.I flags
77argument of the call specifies one or more of the
78.B CLONE_NEW*
79flags listed below, then new namespaces are created for each flag,
80and the calling process is made a member of those namespaces.
81(This system call also implements a number of features
82unrelated to namespaces.)
83
84Leaving aside the other effects of the
85.BR clone (2)
86system call, the following call:
87
88 clone(..., CLONE_NEWXXX, ...);
89
90is equivalent in namespace terms to:
91
92    if (fork() == 0)        /* if child */
93        unshare(CLONE_NEWXXX);
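
The fork-then-unshare form can be sketched as a small helper.
This is a minimal illustration, not a reference implementation:
the name unshare_in_child() is hypothetical, and the choice to report
the child's errno through its exit status is an assumption made
to keep the example short.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child and have it call unshare(2) with the given CLONE_NEW*
 * flags. Returns 0 if the child entered the new namespace(s), the
 * child's errno value if unshare(2) failed, or -1 if fork(2) or
 * waitpid(2) failed. (Hypothetical helper, for illustration only.)
 */
int unshare_in_child(int flags)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;

    if (child == 0) {                 /* if child */
        if (unshare(flags) == -1)
            _exit(errno);             /* e.g., EPERM without privilege */
        /* The child is now a member of the new namespace(s). */
        _exit(0);
    }

    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller might use unshare_in_child(CLONE_NEWUTS) and treat a nonzero
result as "namespace not created".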
94
95.SS The /proc/[pid]/ns/ directory
96
97Each process has a
98.IR /proc/[pid]/ns/
99.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
100subdirectory containing one entry for each namespace that
101supports being manipulated by
102.BR setns (2):
103
104.in +4n
105.nf
106$ \fBls -l /proc/$$/ns\fP
107total 0
108lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 ipc -> ipc:[4026531839]
109lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 mnt -> mnt:[4026531840]
110lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 net -> net:[4026531956]
111lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 pid -> pid:[4026531836]
112lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 user -> user:[4026531837]
113lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 uts -> uts:[4026531838]
114.fi
115.in
116
117Bind mounting (see
118.BR mount (2))
119one of the files in this directory
120to somewhere else in the file system keeps
121the corresponding namespace of the process specified by
122.I pid
123alive even if all processes currently in the namespace terminate.
124
125Opening one of the files in this directory
126(or a file that is bind mounted to one of these files)
127returns a file handle for
128the corresponding namespace of the process specified by
129.IR pid .
130As long as this file descriptor remains open,
131the namespace will remain alive,
132even if all processes in the namespace terminate.
133The file descriptor can be passed to
134.BR setns (2).
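
As a rough sketch of the open-then-join sequence described above,
the following helpers open a namespace file and pass the descriptor to
.BR setns (2).
The helper names open_ns_handle() and join_ns() are hypothetical,
and the 64-byte path buffer is an arbitrary choice for the example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open /proc/[pid]/ns/[name] and return the file descriptor,
 * or -1 on error. While the descriptor stays open, the namespace
 * it refers to stays alive.
 */
int open_ns_handle(pid_t pid, const char *name)
{
    char path[64];

    snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long) pid, name);
    return open(path, O_RDONLY);
}

/*
 * Join the namespace that nsfd refers to. Passing 0 as the second
 * argument lets setns(2) accept a namespace of any type.
 * Returns 0 on success, -1 on error (for example, EPERM).
 */
int join_ns(int nsfd)
{
    return setns(nsfd, 0);
}
```

The caller remains responsible for closing the descriptor
once it is no longer needed.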
135
136In Linux 3.7 and earlier, these files were visible as hard links.
137Since Linux 3.8, they appear as symbolic links.
138If two processes are in the same namespace, then the inode numbers of their
139.IR /proc/[pid]/ns/xxx
140symbolic links will be the same; an application can check this using the
141.I stat.st_ino
142field returned by
143.BR stat (2).
144The content of this symbolic link is a string containing
145the namespace type and inode number as in the following example:
146
147.in +4n
148.nf
149$ \fBreadlink /proc/$$/ns/uts\fP
150uts:[4026531838]
151.fi
152.in
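
The inode comparison and the link format shown above could be handled
along the following lines.
This is a sketch under stated assumptions: the function names are
hypothetical, and comparing
.I st_dev
in addition to
.I st_ino
is an extra precaution of this example rather than
something the text above requires.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Split link text such as "uts:[4026531838]" into its type and inode
 * number. Returns 0 on success, -1 if the text does not start with
 * "type:[number]". Anything after the closing bracket is ignored.
 */
int parse_ns_link(const char *text, char *type, size_t type_size,
                  unsigned long *inode)
{
    const char *colon = strchr(text, ':');
    size_t len;
    char closing;

    if (colon == NULL)
        return -1;
    len = (size_t) (colon - text);
    if (len == 0 || len >= type_size)
        return -1;
    if (sscanf(colon, ":[%lu%c", inode, &closing) != 2 || closing != ']')
        return -1;
    memcpy(type, text, len);
    type[len] = '\0';
    return 0;
}

/*
 * Return 1 if both processes are in the same namespace of the given
 * kind (for example "uts"), 0 if they are not, and -1 on error.
 */
int same_namespace(pid_t a, pid_t b, const char *name)
{
    char path_a[64], path_b[64];
    struct stat sa, sb;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long) a, name);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long) b, name);
    if (stat(path_a, &sa) == -1 || stat(path_b, &sb) == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```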
153
154The files in this subdirectory are as follows:
155.TP
156.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
157This file is a handle for the IPC namespace of the process.
158
159.TP
160.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
161This file is a handle for the mount namespace of the process.
162
163.TP
164.IR /proc/[pid]/ns/net " (since Linux 3.0)"
165This file is a handle for the network namespace of the process.
166
167.TP
168.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
169This file is a handle for the PID namespace of the process.
170
171.TP
172.IR /proc/[pid]/ns/user " (since Linux 3.8)"
173This file is a handle for the user namespace of the process.
174
175.TP
176.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
177This file is a handle for the UTS namespace of the process.
178
179
180.SS IPC namespaces (CLONE_NEWIPC)
181
182IPC namespaces isolate certain IPC resources,
183namely, System V IPC objects (see
184.BR svipc (7))
185and (since Linux 2.6.30)
186.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
187.\" https://lwn.net/Articles/312232/
188POSIX message queues (see
189.BR mq_overview (7)).
190The common characteristic of these IPC mechanisms is that IPC
191objects are identified by mechanisms other than filesystem
192pathnames.
193
194Each IPC namespace has its own set of System V IPC identifiers and
195its own POSIX message queue file system.
196Objects created in an IPC namespace are visible to all other processes
197that are members of that namespace,
198but are not visible to processes in other IPC namespaces.
199
200When an IPC namespace is destroyed
201(i.e., when the last process that is a member of the namespace terminates),
202all IPC objects in the namespace are automatically destroyed.
203
204Use of IPC namespaces requires a kernel that is configured with the
205.B CONFIG_IPC_NS
206option.
207
208.SS Network namespaces (CLONE_NEWNET)
209
210Network namespaces provide isolation of the system resources associated
211with networking: network devices, IP addresses, IP routing tables,
212.I /proc/net
213directory,
214.I /sys/class/net
215directory, port numbers, and so on.
216
217A network namespace provides an isolated view of the networking stack
218(network device interfaces, IPv4 and IPv6 protocol stacks,
219IP routing tables, firewall rules, the
220.I /proc/net
221and
222.I /sys/class/net
223directory trees, sockets, etc.).
224A physical network device can live in exactly one
225network namespace.
226A virtual network device ("veth") pair provides a pipe-like abstraction
227.\" FIXME Add pointer to veth(4) page when it is eventually completed
228that can be used to create tunnels between network namespaces,
229and can be used to create a bridge to a physical network device
230in another namespace.
231
232When a network namespace is freed
233(i.e., when the last process in the namespace terminates),
234its physical network devices are moved back to the
235initial network namespace (not to the parent of the process).
236
237Use of network namespaces requires a kernel that is configured with the
238.B CONFIG_NET_NS
239option.
240
241.SS Mount namespaces (CLONE_NEWNS)
242
243Mount namespaces isolate the set of file system mount points,
244meaning that processes in different mount namespaces can
245have different views of the file system hierarchy.
246The set of mounts in a mount namespace is modified using
247.BR mount (2)
248and
249.BR umount (2).
250
251The
252.IR /proc/[pid]/mounts
253file (present since Linux 2.4.19)
254lists all the file systems currently mounted in the
255process's mount namespace.
256The format of this file is documented in
257.BR fstab (5).
258Since kernel version 2.6.15, this file is pollable:
259after opening the file for reading, a change in this file
260(i.e., a file system mount or unmount) causes
261.BR select (2)
262to mark the file descriptor as readable, and
263.BR poll (2)
264and
265.BR epoll_wait (2)
266mark the file as having an error condition.
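
A minimal sketch of waiting for such a change with
.BR poll (2)
follows.
The function name wait_for_mount_change() is hypothetical.
The example relies on the error-condition behavior described above and
leaves the requested event set empty, since
.BR poll (2)
reports POLLERR regardless of what was requested.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for a mount or unmount in the
 * caller's mount namespace. Returns 1 if a change was reported,
 * 0 on timeout, and -1 on error.
 */
int wait_for_mount_change(int timeout_ms)
{
    struct pollfd pfd;
    int fd, ready;

    fd = open("/proc/self/mounts", O_RDONLY);
    if (fd == -1)
        return -1;

    pfd.fd = fd;
    pfd.events = 0;         /* POLLERR is always reported in revents */
    pfd.revents = 0;
    ready = poll(&pfd, 1, timeout_ms);
    close(fd);

    if (ready == -1)
        return -1;
    return ready > 0 && (pfd.revents & POLLERR) != 0;
}
```

Only changes that occur after the file has been opened are reported,
so a long-running monitor would normally keep the descriptor open
between calls rather than reopening it each time as this sketch does.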
267
268The
269.IR /proc/[pid]/mountstats
270file (present since Linux 2.6.17)
271exports information (statistics, configuration information)
272about the mount points in the process's mount namespace.
273This file is only readable by the owner of the process.
274Lines in this file have the form:
275.RS
276.in 12
277.nf
278
279device /dev/sda7 mounted on /home with fstype ext3 [statistics]
280(       1      )            ( 2 )             (3 ) (4)
281.fi
282.in
283
284The fields in each line are:
285.TP 5
286(1)
287The name of the mounted device
288(or "nodevice" if there is no corresponding device).
289.TP
290(2)
291The mount point within the file system tree.
292.TP
293(3)
294The file system type.
295.TP
296(4)
297Optional statistics and configuration information.
298Currently (as at Linux 2.6.26), only NFS file systems export
299information via this field.
300.RE
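
The fixed part of such a line can be picked apart with
.BR sscanf (3),
as in the sketch below.
The structure and function names are hypothetical,
the buffer sizes are arbitrary,
and the optional statistics in field (4) are deliberately ignored.

```c
#include <stdio.h>

/* Fields (1) to (3) of one mountstats line. */
struct mountstat {
    char device[256];
    char mount_point[256];
    char fstype[64];
};

/*
 * Parse the fixed part of one /proc/[pid]/mountstats line.
 * Returns 0 on success, -1 if the line does not have the form
 * "device X mounted on Y with fstype Z".
 */
int parse_mountstats_line(const char *line, struct mountstat *out)
{
    int n = sscanf(line, "device %255s mounted on %255s with fstype %63s",
                   out->device, out->mount_point, out->fstype);

    return n == 3 ? 0 : -1;
}
```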
301
302.SS PID namespaces (CLONE_NEWPID)
303
304PID namespaces isolate the process ID number space,
305meaning that processes in different PID namespaces can have the same PID.
306PID namespaces allow containers to migrate to a new host
307while the processes inside the container maintain the same PIDs.
308
309PIDs in a new PID namespace start at 1,
310somewhat like a standalone system, and calls to
311.BR fork (2),
312.BR vfork (2),
313or
314.BR clone (2)
315will produce processes with PIDs that are unique within the namespace.
316
317The first process created in a new namespace
318(i.e., the process created using
319.BR clone (2)
320with the
321.BR CLONE_NEWPID
322flag, or the first child created by a process after a call to
323.BR unshare (2)
324using the
325.BR CLONE_NEWPID
326flag) has the PID 1, and is the "init" process for the namespace (see
327.BR init (1)).
328Children that are orphaned within the namespace will be reparented
329to this process rather than
330.BR init (8).
331Unlike the traditional
332.B init
333process, the "init" process of a PID namespace can terminate,
334and if it does, all of the processes in the namespace are terminated.
335
336PID namespaces can be nested.
337When a new PID namespace is created,
338the processes in that namespace are visible
339in the PID namespace of the process that created the new namespace;
340analogously, if the parent PID namespace is itself
341the child of another PID namespace,
342then processes in the child and parent PID namespaces will both be
343visible in the grandparent PID namespace.
344Conversely, the processes in the "child" PID namespace do not see
345the processes in the parent namespace.
346More succinctly: a process can see (e.g., send signals with
347.BR kill (2)
348to) only processes contained in its own PID namespace
349and the namespaces nested below that PID namespace.
350
351A process will have one PID for each of the layers of the hierarchy
352starting from the PID namespace in which it resides
353through to the root PID namespace.
354A call to
355.BR getpid (2)
356always returns the PID associated with the namespace in which
357the process resides.
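
The following sketch shows one way to observe this:
a child created with
.B CLONE_NEWPID
reports the value that
.BR getpid (2)
returns inside the new namespace.
The function names and the 1 MiB stack size are assumptions of the
example, and the call is expected to fail for a caller that lacks
the necessary privilege.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Runs in the child: report the PID as seen inside the new namespace. */
static int report_pid(void *arg)
{
    (void) arg;
    return (int) getpid();      /* becomes the child's exit status */
}

/*
 * Create a child in a new PID namespace and return the PID that the
 * child saw for itself, or -1 if the child could not be created
 * (for example, EPERM for an unprivileged caller).
 */
int pid_seen_in_new_pidns(void)
{
    char *stack = malloc(STACK_SIZE);
    int status, result = -1;
    pid_t child;

    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures: pass its top. */
    child = clone(report_pid, stack + STACK_SIZE,
                  CLONE_NEWPID | SIGCHLD, NULL);
    if (child != -1 && waitpid(child, &status, 0) != -1 &&
        WIFEXITED(status))
        result = WEXITSTATUS(status);

    free(stack);
    return result;
}
```

When the call succeeds, the reported value is 1,
while the value returned by
.BR clone (2)
in the parent is the child's PID in the parent's PID namespace.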
358
359After creating a new PID namespace,
360it is useful for the child to change its root directory
361and mount a new procfs instance at
362.I /proc
363so that tools such as
364.BR ps (1)
365work correctly.
366.\" mount -t proc proc /proc
367(If
368.BR CLONE_NEWNS
369is also included in the
370.IR flags
371argument of
372.BR clone (2)
373or
374.BR unshare (2),
375then it isn't necessary to change the root directory:
376a new procfs instance can be mounted directly over
377.IR /proc .)
378
379Use of PID namespaces requires a kernel that is configured with the
380.B CONFIG_PID_NS
381option.
382
383.SS User namespaces (CLONE_NEWUSER)
384
385User namespaces isolate
386security related identifiers, in particular,
387user IDs, group IDs, keys (see
388.BR keyctl (2)),
389and capabilities.
390In other words, a process's user and group IDs can be different
391inside and outside a user namespace.
392A process can have a normal unprivileged user ID outside a user namespace
393while at the same time having a user ID of 0 inside the namespace;
394in other words,
395the process has full privileges for operations inside the user namespace,
396but is unprivileged for operations outside the namespace.
397
398When a user namespace is created,
399it starts out without a mapping of user IDs (group IDs)
400to the parent user namespace.
401The desired mapping of user IDs (group IDs) to the parent user namespace
402may be set by writing into
403.IR /proc/[pid]/uid_map
404.RI ( /proc/[pid]/gid_map );
405see below.
406
407The first process in a user namespace starts out with a complete set
408of capabilities with respect to the new user namespace.
409
410System calls that return user IDs (group IDs) will return
411either the user ID (group ID) mapped into the current
412user namespace if there is a mapping, or the overflow user ID (group ID);
413the default value for the overflow user ID (group ID) is 65534.
414See the descriptions of
415.IR /proc/sys/kernel/overflowuid
416and
417.IR /proc/sys/kernel/overflowgid
418in
419.BR proc (5).
420
421Starting in Linux 3.8, unprivileged processes can create user namespaces,
422and mount, PID, IPC, network, and UTS namespaces can be created with just the
423.B CAP_SYS_ADMIN
424capability in the caller's user namespace.
425
426If
427.BR CLONE_NEWUSER
428is specified along with other
429.B CLONE_NEW*
430flags in a single
431.BR clone (2)
432or
433.BR unshare (2)
434call, the user namespace is guaranteed to be created first,
435giving the caller privileges over the remaining
436namespaces created by the call.
437Thus, it is possible for an unprivileged caller to specify this combination
438of flags.
439
440Use of user namespaces requires a kernel that is configured with the
441.B CONFIG_USER_NS
442option.
443
444Over the years, many features have been added
445to the Linux kernel that are available only to privileged users
446because of their potential to confuse set-user-ID-root applications.
447In general, it becomes safe to allow the root user in a user namespace to
448use those features because it is impossible, while in a user namespace,
449to gain more privilege than the root user of a user namespace has.
450
451The
452.IR /proc/[pid]/uid_map
453and
454.IR /proc/[pid]/gid_map
455files (available since Linux 3.5)
456.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
457expose the mappings for user and group IDs
458inside the user namespace for the process
459.IR pid .
460The description here explains the details for
461.IR uid_map ;
462.IR gid_map
463is exactly the same,
464but each instance of "user ID" is replaced by "group ID".
465
466The
467.I uid_map
468file exposes the mapping of user IDs from the user namespace
469of the process
470.IR pid
471to the user namespace of the process that opened
472.IR uid_map
473(but see a qualification to this point below).
474In other words, processes that are in different user namespaces
475will potentially see different values when reading from a particular
476.I uid_map
477file, depending on the user ID mappings for the user namespaces
478of the reading processes.
479
480Each line in the file specifies a 1-to-1 mapping of a range of contiguous
481user IDs between two user namespaces.
482The specification in each line takes the form of
483three numbers delimited by white space.
484The first two numbers specify the starting user ID in
485each user namespace.
486The third number specifies the length of the mapped range.
487In detail, the fields are interpreted as follows:
488.IP (1) 4
489The start of the range of user IDs in
490the user namespace of the process
491.IR pid .
492.IP (2)
493The start of the range of user
494IDs to which the user IDs specified by field one map.
495How field two is interpreted depends on whether the process that opened
496.I uid_map
497and the process
498.IR pid
499are in the same user namespace, as follows:
500.RS
501.IP a) 3
502If the two processes are in different user namespaces:
503field two is the start of a range of
504user IDs in the user namespace of the process that opened
505.IR uid_map .
506.IP b)
507If the two processes are in the same user namespace:
508field two is the start of the range of
509user IDs in the parent user namespace of the process
510.IR pid .
511(The "parent user namespace"
512is the user namespace of the process that created a user namespace
513via a call to
514.BR unshare (2)
515or
516.BR clone (2)
517with the
518.BR CLONE_NEWUSER
519flag.)
520This case enables the opener of
521.I uid_map
522(the common case here is opening
523.IR /proc/self/uid_map )
524to see the mapping of user IDs into the user namespace of the process
525that created this user namespace.
526.RE
527.IP (3)
528The length of the range of user IDs that is mapped between the two
529user namespaces.
530.PP
531After the creation of a new user namespace, the
532.I uid_map
533file may be written to exactly once to specify
534the mapping of user IDs in the new user namespace.
535(An attempt to write more than once to the file fails with the error
536.BR EPERM .)
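
To make the three-field format concrete, the sketch below translates
a user ID through a set of ranges in the way the field descriptions
above imply: an ID that falls inside a range keeps its offset from
the start of that range.
The structure and function names are hypothetical, and the code only
models the arithmetic; it does not read or write any
.I /proc
file.

```c
#include <stddef.h>

/* One uid_map line: the three fields described above. */
struct id_range {
    unsigned long inside;   /* field 1: start in the namespace of pid */
    unsigned long outside;  /* field 2: start in the other namespace */
    unsigned long length;   /* field 3: number of IDs in the range */
};

/*
 * Translate an ID from inside the namespace using the given ranges.
 * Returns 0 and stores the result in *mapped if a range covers the ID,
 * or -1 if the ID is unmapped.
 */
int map_id(const struct id_range *ranges, size_t count,
           unsigned long id, unsigned long *mapped)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (id >= ranges[i].inside &&
            id - ranges[i].inside < ranges[i].length) {
            *mapped = ranges[i].outside + (id - ranges[i].inside);
            return 0;
        }
    }
    return -1;
}
```

For example, with the two ranges "0 1000 1" and "1 100000 65536",
user ID 0 maps to 1000 and user ID 5 maps to 100004.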
537
538The lines written to
539.IR uid_map
540must conform to the following rules:
541.IP * 3
542The three fields must be valid numbers,
543and the last field must be greater than 0.
544.IP *
545Lines are terminated by newline characters.
546.IP *
547There is an (arbitrary) limit on the number of lines in the file.
548As at Linux 3.8, the limit is five lines.
549.IP *
550The range of user IDs specified in each line cannot overlap with the ranges
551in any other lines.
552In the current implementation (Linux 3.8), this requirement is
553satisfied by a simplistic implementation that imposes the further
554requirement that
555the values in both field 1 and field 2 of successive lines must be
556in ascending numerical order.
557.PP
558Writes that violate the above rules fail with the error
559.BR EINVAL .
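
A program could check its input against these rules before writing,
roughly as sketched below.
This is an illustration only: the function name is hypothetical,
the five-line limit is the one stated above for Linux 3.8,
and the check is not claimed to match the kernel's parser in every
detail (for instance, it rejects trailing spaces and does not guard
against arithmetic overflow).

```c
#include <stdio.h>
#include <string.h>

#define MAX_MAP_LINES 5     /* limit stated above for Linux 3.8 */

/*
 * Check uid_map text against the rules listed above.
 * Returns 0 if the text passes, -1 otherwise.
 */
int check_uid_map_text(const char *text)
{
    unsigned long prev_end1 = 0, prev_end2 = 0;
    int lines = 0;

    while (*text != '\0') {
        unsigned long f1, f2, len;
        int used = 0;
        const char *nl = strchr(text, '\n');

        if (nl == NULL)                 /* line not newline-terminated */
            return -1;
        if (++lines > MAX_MAP_LINES)
            return -1;
        /* Three numbers that end exactly at the newline, length > 0. */
        if (sscanf(text, "%lu %lu %lu%n", &f1, &f2, &len, &used) != 3 ||
            text + used != nl || len == 0)
            return -1;
        /* Ascending order without overlap, in both fields. */
        if (lines > 1 && (f1 < prev_end1 || f2 < prev_end2))
            return -1;
        prev_end1 = f1 + len;
        prev_end2 = f2 + len;
        text = nl + 1;
    }
    return lines > 0 ? 0 : -1;
}
```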
560
561In order for a process to write to the
562.I /proc/[pid]/uid_map
563.RI ( /proc/[pid]/gid_map )
564file, the following requirements must be met:
565.IP * 3
566The process must have the
567.BR CAP_SETUID
568.RB ( CAP_SETGID )
569capability in the user namespace of the process
570.IR pid .
571.IP *
572The process must have the
573.BR CAP_SETUID
574.RB ( CAP_SETGID )
575capability in the parent user namespace.
576.IP *
577The process must be in either the user namespace of the process
578.I pid
579or inside the parent user namespace of the process
580.IR pid .
581
582.SS UTS namespaces (CLONE_NEWUTS)
583
584UTS namespaces provide isolation of two system identifiers:
585the hostname and the NIS domain name.
586These identifiers are set using
587.BR sethostname (2)
588and
589.BR setdomainname (2),
590and can be retrieved using
591.BR uname (2),
592.BR gethostname (2),
593and
594.BR getdomainname (2).
595
596Use of UTS namespaces requires a kernel that is configured with the
597.B CONFIG_UTS_NS
598option.
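
The isolation can be demonstrated with a short experiment,
sketched here under a few assumptions: the function name is
hypothetical, "sandbox" is an arbitrary hostname, and the child changes
the hostname only if it actually managed to enter a new UTS namespace
(which normally requires privilege).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Change the hostname inside a new UTS namespace and check that the
 * caller's hostname is unaffected. Returns 1 if the caller's hostname
 * is unchanged afterwards, 0 if it changed, and -1 on error.
 */
int hostname_is_isolated(void)
{
    char before[256], after[256];
    int status;
    pid_t child;

    if (gethostname(before, sizeof(before)) == -1)
        return -1;
    before[sizeof(before) - 1] = '\0';

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Rename the host only after leaving the original namespace. */
        if (unshare(CLONE_NEWUTS) == 0)
            sethostname("sandbox", strlen("sandbox"));
        _exit(0);
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;

    if (gethostname(after, sizeof(after)) == -1)
        return -1;
    after[sizeof(after) - 1] = '\0';
    return strcmp(before, after) == 0;
}
```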
599
600.SH CONFORMING TO
601Namespaces are a Linux-specific feature.
602.SH SEE ALSO
603.BR readlink (1),
604.BR clone (2),
605.BR setns (2),
606.BR unshare (2),
607.BR proc (5),
608.BR credentials (7),
609.BR capabilities (7)