.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.TH NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
.SH NAME
namespaces \- overview of Linux namespaces
.SH DESCRIPTION
A namespace wraps a global system resource in an abstraction that
makes it appear to the processes within the namespace that they
have their own isolated instance of the global resource.
Changes to the global resource are visible to other processes
that are members of the namespace, but are invisible to other processes.
One use of namespaces is to implement containers.
.PP
This page describes the various namespaces and the associated
.I /proc
files, and summarizes the APIs for working with namespaces.
.SS The namespaces API
As well as the various
.I /proc
files described below,
the namespaces API includes the following system calls:
.TP
.BR clone (2)
The
.BR clone (2)
system call creates a new process.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the child process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.TP
.BR setns (2)
The
.BR setns (2)
system call allows the calling process to join an existing namespace.
The namespace to join is specified via a file descriptor that refers to
one of the
.IR /proc/[pid]/ns
files described below.
.TP
.BR unshare (2)
The
.BR unshare (2)
system call moves the calling process to a new namespace.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the calling process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.PP
Creation of new namespaces using
.BR clone (2)
and
.BR unshare (2)
in most cases requires the
.B CAP_SYS_ADMIN
capability.
User namespaces are the exception: since Linux 3.8,
no privilege is required to create a user namespace.
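.PP
The privilege rule above can be observed with a small probe.
The following C fragment is a minimal sketch; the function name
.I probe_unshare
is hypothetical and not part of any library.
It calls
.BR unshare (2)
in a forked child, so that the caller's own namespace memberships are
never changed, and reports the
.I errno
value that the child saw.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: try unshare(flags) in a throwaway child.
   Returns 0 if the child could create the requested namespaces,
   a positive errno value (e.g. EPERM) if unshare() failed,
   or -1 if fork() or waitpid() failed. */
int probe_unshare(int flags)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: report the outcome through the exit status. */
        _exit(unshare(flags) == 0 ? 0 : (errno & 0xff));
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in
.PP
For an unprivileged caller on Linux 3.8 or later,
.I probe_unshare(CLONE_NEWUTS)
would be expected to return
.BR EPERM ,
whereas
.I probe_unshare(CLONE_NEWUSER)
would be expected to return 0,
unless the system has been configured to restrict the creation of
user namespaces.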
.SS The /proc/[pid]/ns/ directory
Each process has a
.IR /proc/[pid]/ns/
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
subdirectory containing one entry for each namespace that
supports being manipulated by
.BR setns (2):
.in +4n
.nf

$ \fBls -l /proc/$$/ns\fP
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 net -> net:[4026531956]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 uts -> uts:[4026531838]
.fi
.in
.PP
Bind mounting (see
.BR mount (2))
one of the files in this directory
to somewhere else in the file system keeps
the corresponding namespace of the process specified by
.I pid
alive even if all processes currently in the namespace terminate.
.PP
Opening one of the files in this directory
(or a file that is bind mounted to one of these files)
returns a file descriptor for
the corresponding namespace of the process specified by
.IR pid .
As long as this file descriptor remains open,
the namespace will remain alive,
even if all processes in the namespace terminate.
The file descriptor can be passed to
.BR setns (2).
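.PP
Opening a namespace file and passing the resulting descriptor to
.BR setns (2)
can be sketched as follows.
This is an illustration under stated assumptions:
.I join_namespace
is a hypothetical helper, and the caller is assumed to hold whatever
privilege
.BR setns (2)
requires for the namespace type in question.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: open a namespace file such as
   "/proc/1234/ns/uts" and make the calling thread a member of that
   namespace.  nstype may be 0 (accept any namespace type) or a
   CLONE_NEW* constant that the file must match.
   Returns 0 on success or -1 with errno set. */
int join_namespace(const char *ns_path, int nstype)
{
    int fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;               /* e.g. ENOENT or EACCES */

    /* While fd is open, the namespace stays alive. */
    int ret = setns(fd, nstype);
    int saved_errno = errno;     /* close() must not clobber it */
    close(fd);
    errno = saved_errno;
    return ret;
}
```
.fi
.in
.PP
For example,
.I join_namespace("/proc/1234/ns/uts", CLONE_NEWUTS)
(where PID 1234 is a placeholder)
would move the calling thread into the UTS namespace of that process.
Passing 0 as
.I nstype
accepts a namespace of any type.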
.PP
In Linux 3.7 and earlier, these files were visible as hard links.
Since Linux 3.8, they appear as symbolic links.
If two processes are in the same namespace, then the inode numbers of their
.IR /proc/[pid]/ns/xxx
symbolic links will be the same; an application can check this using the
.I stat.st_ino
field returned by
.BR stat (2).
The content of this symbolic link is a string containing
the namespace type and inode number as in the following example:
.in +4n
.nf

$ \fBreadlink /proc/$$/ns/uts\fP
uts:[4026531838]
.fi
.in
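.PP
Both checks described above can be done programmatically.
In the following minimal C sketch, the helper names
.I same_namespace
and
.I parse_ns_link
are hypothetical.
The first compares the
.I st_ino
(and, as an extra precaution,
.IR st_dev )
fields that
.BR stat (2)
returns for two
.I /proc/[pid]/ns/xxx
links; the second splits link text of the form shown above into
the namespace type and the inode number.
.PP
.in +4n
.nf
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if pid_a and pid_b are in the same
   namespace of the given type ("uts", "net", ...), 0 if not, or -1 if
   either file could not be examined.  stat() follows the symbolic
   link, so st_ino identifies the namespace itself. */
int same_namespace(pid_t pid_a, pid_t pid_b, const char *type)
{
    char path_a[64], path_b[64];
    struct stat sa, sb;

    snprintf(path_a, sizeof path_a, "/proc/%ld/ns/%s", (long) pid_a, type);
    snprintf(path_b, sizeof path_b, "/proc/%ld/ns/%s", (long) pid_b, type);
    if (stat(path_a, &sa) == -1 || stat(path_b, &sb) == -1)
        return -1;
    return sa.st_ino == sb.st_ino && sa.st_dev == sb.st_dev;
}

/* Hypothetical helper: split link text such as "uts:[4026531838]"
   into its type and inode number.  Returns 0 on success, or -1 if the
   text is malformed or the type does not fit in type_buf. */
int parse_ns_link(const char *link, char *type_buf, size_t buf_size,
                  unsigned long long *inode)
{
    const char *colon = strchr(link, ':');
    if (colon == NULL || colon[1] != '[')
        return -1;

    size_t type_len = (size_t) (colon - link);
    if (type_len == 0 || type_len >= buf_size)
        return -1;

    const char *digits = colon + 2;
    if (*digits < '0' || *digits > '9')
        return -1;
    char *end;
    unsigned long long value = strtoull(digits, &end, 10);
    if (end[0] != ']' || end[1] != 0)
        return -1;                  /* expect "]" and then end of string */

    memcpy(type_buf, link, type_len);
    type_buf[type_len] = 0;         /* terminate the type name */
    *inode = value;
    return 0;
}
```
.fi
.in
.PP
Applied to the output shown above,
.I parse_ns_link
yields the type "uts" and the inode number 4026531838.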
.PP
The files in this subdirectory are as follows:
.TP
.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
This file is a handle for the IPC namespace of the process.
.TP
.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
This file is a handle for the mount namespace of the process.
.TP
.IR /proc/[pid]/ns/net " (since Linux 3.0)"
This file is a handle for the network namespace of the process.
.TP
.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
This file is a handle for the PID namespace of the process.
.TP
.IR /proc/[pid]/ns/user " (since Linux 3.8)"
This file is a handle for the user namespace of the process.
.TP
.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
This file is a handle for the UTS namespace of the process.
.SS IPC namespaces (CLONE_NEWIPC)
IPC namespaces isolate certain IPC resources,
namely, System V IPC objects (see
.BR svipc (7))
and (since Linux 2.6.30)
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
POSIX message queues (see
.BR mq_overview (7)).
The common characteristic of these IPC mechanisms is that IPC
objects are identified by mechanisms other than file system
pathnames.
.PP
Each IPC namespace has its own set of System V IPC identifiers and
its own POSIX message queue file system.
Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.
.PP
When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.
.PP
Use of IPC namespaces requires a kernel that is configured with the
.B CONFIG_IPC_NS
option.
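.PP
The isolation of System V IPC identifiers can be demonstrated as follows.
This minimal C sketch (the function name
.I ipc_namespace_hides_queue
is hypothetical) creates a message queue, then checks from a child
that has moved into a new IPC namespace whether the queue's identifier
still works there.
It assumes Linux 3.8 or later, so that combining
.B CLONE_NEWUSER
with
.B CLONE_NEWIPC
lets an unprivileged caller create the IPC namespace.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if the queue was invisible in the
   child's new IPC namespace, 0 if it was visible, or -1 if the check
   could not be run (e.g. namespace creation was not permitted). */
int ipc_namespace_hides_queue(void)
{
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        struct msqid_ds ds;
        /* Assumption: Linux 3.8+, where CLONE_NEWUSER is acted on first. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWIPC) == -1)
            _exit(2);
        if (msgctl(id, IPC_STAT, &ds) == -1 && errno == EINVAL)
            _exit(0);            /* identifier unknown here: isolated */
        _exit(1);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            if (WEXITSTATUS(status) == 0)
                result = 1;
            else if (WEXITSTATUS(status) == 1)
                result = 0;
        }
    }
    /* The queue belongs to the caller's IPC namespace; remove it there. */
    msgctl(id, IPC_RMID, NULL);
    return result;
}
```
.fi
.in
.PP
The queue is created, and later removed, in the caller's own IPC namespace.
In the child, the same identifier refers to nothing, so
.BR msgctl (2)
is expected to fail with
.BR EINVAL .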
.SS Network namespaces (CLONE_NEWNET)
Network namespaces provide isolation of the system resources associated
with networking: network devices, IP addresses, IP routing tables, the
.I /proc/net
directory, port numbers, and so on.
.PP
A network namespace provides an isolated view of the networking stack
(network device interfaces, IPv4 and IPv6 protocol stacks,
IP routing tables, firewall rules, the
.I /proc/net
and
.I /sys/class/net
directory trees, sockets, etc.).
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces,
and can be used to create a bridge to a physical network device
in another namespace.
.PP
When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).
.PP
Use of network namespaces requires a kernel that is configured with the
.B CONFIG_NET_NS
option.
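.PP
One way to see this isolation is to list the interfaces visible from
inside a new network namespace, which typically contains just the
loopback interface.
The following minimal C sketch (the function name is hypothetical)
does this in a child process using
.BR if_nameindex (3),
combining
.B CLONE_NEWUSER
with
.B CLONE_NEWNET
so that no privilege is assumed in the caller (Linux 3.8 or later).
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <net/if.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: count the interfaces a child sees after moving
   into new user and network namespaces.  Returns that count, or -1 if
   the namespaces could not be created or the interfaces not listed. */
int count_interfaces_in_new_netns(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) == -1)
            _exit(255);
        struct if_nameindex *ifs = if_nameindex();
        if (ifs == NULL)
            _exit(255);
        int n = 0;
        while (ifs[n].if_index != 0)   /* list ends with a zero entry */
            n++;
        if_freenameindex(ifs);
        _exit(n < 255 ? n : 254);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in
.PP
On some systems the count exceeds 1, because loaded tunnel drivers may
also create devices in each new network namespace.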
.SS Mount namespaces (CLONE_NEWNS)
Mount namespaces isolate the set of file system mount points,
meaning that processes in different mount namespaces can
have different views of the file system hierarchy.
The set of mounts in a mount namespace is modified using
.BR mount (2)
and
.BR umount (2).
.PP
The
.IR /proc/[pid]/mounts
file (present since Linux 2.4.19)
lists all the file systems currently mounted in the
process's mount namespace.
The format of this file is documented in
.BR fstab (5).
Since kernel version 2.6.15, this file is pollable:
after opening the file for reading, a change in this file
(i.e., a file system mount or unmount) causes
.BR select (2)
to mark the file descriptor as readable, and
.BR poll (2)
and
.BR epoll_wait (2)
mark the file as having an error condition.
.PP
The
.IR /proc/[pid]/mountstats
file (present since Linux 2.6.17)
exports information (statistics, configuration information)
about the mount points in the process's mount namespace.
This file is readable only by the owner of the process.
Lines in this file have the form:
.in +4n
.nf

device /dev/sda7 mounted on /home with fstype ext3 [statistics]
.fi
.in
.PP
The fields in each line are:
.TP 5
(1)
The name of the mounted device
(or "nodevice" if there is no corresponding device).
.TP
(2)
The mount point within the file system tree.
.TP
(3)
The file system type.
.TP
(4)
Optional statistics and configuration information.
Currently (as at Linux 2.6.26), only NFS file systems export
information via this field.
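.PP
Fields (1) to (3) of a
.I mountstats
line can be extracted with
.BR sscanf (3).
The following is a minimal C sketch; the structure and function names
are hypothetical, field (4) is ignored,
and device or mount point names containing white space are not handled.
.PP
.in +4n
.nf
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical structure: fields (1) to (3) of one mountstats line. */
struct mountstats_entry {
    char device[256];
    char mount_point[256];
    char fstype[64];
};

/* Hypothetical helper: parse
   "device DEV mounted on DIR with fstype TYPE [statistics]".
   Field (4) is ignored.  Returns 0 on success or -1 if the line
   does not have this form. */
int parse_mountstats_line(const char *line, struct mountstats_entry *e)
{
    int n = sscanf(line, "device %255s mounted on %255s with fstype %63s",
                   e->device, e->mount_point, e->fstype);
    return n == 3 ? 0 : -1;
}
```
.fi
.in
.PP
For the example line shown above, the parsed fields are
"/dev/sda7", "/home", and "ext3".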
.SS PID namespaces (CLONE_NEWPID)
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to migrate to a new host
while the processes inside the container maintain the same PIDs.
.PP
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
.PP
The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.B CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.B CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (1).
.PP
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.B SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace (e.g., from a process that has done a
.BR setns (2)
into the namespace using an open file descriptor for a
.I /proc/[pid]/ns/pid
file corresponding to a process that was in the namespace)
will fail with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
.PP
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.
.PP
Likewise, a process in an ancestor namespace
can, subject to the usual permission checks described in
.BR kill (2),
send signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
and
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
.PP
PID namespaces can be nested.
When a new PID namespace is created,
the processes in that namespace are visible
in the PID namespace of the process that created the new namespace;
analogously, if the parent PID namespace is itself
the child of another PID namespace,
then processes in the child and parent PID namespaces will both be
visible in the grandparent PID namespace.
Conversely, the processes in the "child" PID namespace do not see
the processes in the parent namespace.
More succinctly: a process can see (e.g., send signals with
.BR kill (2))
only processes contained in its own PID namespace
and the namespaces nested below that PID namespace.
.PP
A process will have one PID for each of the layers of the hierarchy
starting from the PID namespace in which it resides
through to the root PID namespace.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process resides.
.PP
Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e., the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
.PP
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
.\" mount -t proc proc /proc
(If
.B CLONE_NEWNS
is also included in the
.I flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .)
.PP
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.B CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid ()),
which would break many applications and libraries.
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
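.PP
The rule that only subsequently created children change PID namespace
can be illustrated with the following minimal C sketch
(the function name is hypothetical).
An intermediate child, used so that the caller itself is left untouched,
calls
.BR unshare (2)
with
.BR CLONE_NEWUSER " | " CLONE_NEWPID ,
confirms that its own PID is unchanged, and then forks;
the grandchild is the first process in the new PID namespace.
Including
.B CLONE_NEWUSER
assumes Linux 3.8 or later and lets an unprivileged caller run the sketch.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the PID that the first child created
   after unshare(CLONE_NEWPID) sees for itself, or -1 on failure. */
int pid_seen_by_first_child_in_new_pidns(void)
{
    pid_t mid = fork();
    if (mid == -1)
        return -1;
    if (mid == 0) {
        pid_t before = getpid();
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(255);
        if (getpid() != before)
            _exit(255);          /* not expected: the caller's PID is fixed */

        pid_t child = fork();    /* first process in the new PID namespace */
        if (child == -1)
            _exit(255);
        if (child == 0)
            _exit(getpid() < 255 ? getpid() : 254);

        int st;
        if (waitpid(child, &st, 0) == -1 || !WIFEXITED(st))
            _exit(255);
        _exit(WEXITSTATUS(st));  /* pass the grandchild's PID upward */
    }

    int status;
    if (waitpid(mid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in
.PP
The value returned is expected to be 1, because the grandchild is the
"init" process of the new PID namespace.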
.PP
Every thread in a process must be in the same PID namespace.
For this reason, the two following call sequences will fail:
.in +4n
.nf

unshare(CLONE_NEWPID);
clone(..., CLONE_VM, ...);    /* Fails */

setns(fd, CLONE_NEWPID);
clone(..., CLONE_VM, ...);    /* Fails */
.fi
.in
.PP
Because the above
.BR unshare (2)
and
.BR setns (2)
calls only change the PID namespace for created children, the
.BR clone (2)
calls necessarily put the new thread in a different PID namespace from
the calling thread.
.PP
When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
.\" FIXME Presumably, a similar thing happens with the UID and GID passed
.\" via a UNIX domain socket. That needs to be confirmed and documented
.\" under the "User namespaces" section.
.PP
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.SS User namespaces (CLONE_NEWUSER)
User namespaces isolate security-related identifiers, in particular,
user IDs, group IDs, keys (see
.BR keyctl (2)),
and capabilities.
A process's user and group IDs can be different
inside and outside a user namespace.
In particular,
a process can have a normal unprivileged user ID outside a user namespace
while at the same time having a user ID of 0 inside the namespace;
in other words,
the process has full privileges for operations inside the user namespace,
but is unprivileged for operations outside the namespace.
.PP
When a user namespace is created,
it starts out without a mapping of user IDs (group IDs)
to the parent user namespace.
The desired mapping of user IDs (group IDs) to the parent user namespace
may be set by writing into
.IR /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map );
see below.
.PP
The first process in a user namespace starts out with a complete set
of capabilities with respect to the new user namespace.
.PP
System calls that return user IDs (group IDs) will return
either the user ID (group ID) mapped into the current
user namespace if there is a mapping, or the overflow user ID (group ID);
the default value for the overflow user ID (group ID) is 65534.
See the descriptions of
.IR /proc/sys/kernel/overflowuid
and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).
.PP
Starting in Linux 3.8, unprivileged processes can create user namespaces,
and mount, PID, IPC, network, and UTS namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.
.PP
When
.B CLONE_NEWUSER
is specified along with other
.B CLONE_NEW*
flags in a single
.BR clone (2)
or
.BR unshare (2)
call, the user namespace is guaranteed to be created first,
giving the caller privileges over the remaining
namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.
.PP
Use of user namespaces requires a kernel that is configured with the
.B CONFIG_USER_NS
option.
.PP
Over the years, many features have been added to the Linux kernel
that are available only to privileged users
because of their potential to confuse set-user-ID-root applications.
In general, it becomes safe to allow the root user in a user namespace to
use those features because it is impossible, while in a user namespace,
to gain more privilege than the root user of a user namespace has.
.PP
The
.IR /proc/[pid]/uid_map
and
.IR /proc/[pid]/gid_map
files (available since Linux 3.5)
.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
expose the mappings for user and group IDs
inside the user namespace for the process
.IR pid .
The description here explains the details for
.IR uid_map ;
.IR gid_map
is exactly the same,
but each instance of "user ID" is replaced by "group ID".
.PP
The
.I uid_map
file exposes the mapping of user IDs from the user namespace
of the process
.IR pid
to the user namespace of the process that opened
.IR uid_map
(but see a qualification to this point below).
In other words, processes that are in different user namespaces
will potentially see different values when reading from a particular
.I uid_map
file, depending on the user ID mappings for the user namespaces
of the reading processes.
.PP
Each line in the
.I uid_map
file specifies a 1-to-1 mapping of a range of contiguous
user IDs between two user namespaces.
(When a user namespace is first created, this file is empty.)
The specification in each line takes the form of
three numbers delimited by white space.
The first two numbers specify the starting user ID in
each of the two user namespaces.
The third number specifies the length of the mapped range.
In detail, the fields are interpreted as follows:
.IP (1) 4
The start of the range of user IDs in
the user namespace of the process
.IR pid .
.IP (2)
The start of the range of user
IDs to which the user IDs specified by field one map.
How field two is interpreted depends on whether the process that opened
.I uid_map
and the process
.IR pid
are in the same user namespace, as follows:
.RS
.IP a) 3
If the two processes are in different user namespaces:
field two is the start of a range of
user IDs in the user namespace of the process that opened
.IR uid_map .
.IP b)
If the two processes are in the same user namespace:
field two is the start of the range of
user IDs in the parent user namespace of the process
.IR pid .
(The "parent user namespace"
is the user namespace of the process that created a user namespace
via a call to
.BR unshare (2)
or
.BR clone (2)
with the
.B CLONE_NEWUSER
flag.)
This case enables the opener of
.I uid_map
(the common case here is opening
.IR /proc/self/uid_map )
to see the mapping of user IDs into the user namespace of the process
that created this user namespace.
.RE
.IP (3)
The length of the range of user IDs that is mapped between the two
user namespaces.
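.PP
The interpretation of the three fields can be expressed as a lookup over
the lines of a map.
In this minimal C sketch, the structure and function names are
hypothetical, and "outside" denotes the user namespace named by field two.
An ID without a mapping is reported as the default overflow ID, 65534,
mirroring the behavior of system calls described earlier.
.PP
.in +4n
.nf
```c
#include <stddef.h>

/* Hypothetical structure: one line of a uid_map (or gid_map) file. */
struct id_extent {
    unsigned long inside;   /* field (1): start in the namespace of pid */
    unsigned long outside;  /* field (2): start in the other namespace */
    unsigned long count;    /* field (3): length of the range */
};

#define OVERFLOW_ID 65534UL  /* default overflow user/group ID */

/* Hypothetical helper: translate an outside ID into the namespace of
   pid; IDs with no mapping are reported as the overflow ID. */
unsigned long map_outside_to_inside(const struct id_extent *map, size_t n,
                                    unsigned long outside_id)
{
    for (size_t i = 0; i < n; i++) {
        if (outside_id >= map[i].outside &&
            outside_id - map[i].outside < map[i].count)
            return map[i].inside + (outside_id - map[i].outside);
    }
    return OVERFLOW_ID;
}

/* Hypothetical helper, inverse direction: returns 0 and stores the
   outside ID, or -1 if inside_id is not covered by any line. */
int map_inside_to_outside(const struct id_extent *map, size_t n,
                          unsigned long inside_id, unsigned long *outside_id)
{
    for (size_t i = 0; i < n; i++) {
        if (inside_id >= map[i].inside &&
            inside_id - map[i].inside < map[i].count) {
            *outside_id = map[i].outside + (inside_id - map[i].inside);
            return 0;
        }
    }
    return -1;
}
```
.fi
.in
.PP
With a hypothetical map consisting of the lines "0 1000 1" and
"1 100000 65536", outside user ID 1000 appears as 0 inside the namespace,
outside user ID 100005 appears as 6, and outside user ID 5 appears as 65534.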
.PP
After the creation of a new user namespace, the
.I uid_map
file of
.I one
of the processes in the namespace may be written to
.I once
to define the mapping of user IDs in the new user namespace.
(An attempt to write more than once to a
.I uid_map
file in a user namespace fails with the error
.BR EPERM .)
.PP
The lines written to
.I uid_map
must conform to the following rules:
.IP * 3
The three fields must be valid numbers,
and the last field must be greater than 0.
.IP *
Lines are terminated by newline characters.
.IP *
There is an (arbitrary) limit on the number of lines in the file.
As at Linux 3.8, the limit is five lines.
In addition, the number of bytes written to
the file must be less than the system page size,
.\" FIXME(Eric): the restriction "less than" rather than "less than or equal"
.\" seems strangely arbitrary. Furthermore, the comment does not agree
.\" with the code in kernel/user_namespace.c. Which is correct.
and the write must be performed at the start of the file (i.e.,
.BR lseek (2)
and
.BR pwrite (2)
can't be used to write to nonzero offsets in the file).
.IP *
The range of user IDs specified in each line cannot overlap with the ranges
in any other lines.
In the current implementation (Linux 3.8), this requirement is
satisfied by a simplistic implementation that imposes the further
requirement that
the values in both field 1 and field 2 of successive lines must be
in ascending numerical order.
.IP *
At least one line must be written to the file.
.PP
Writes that violate the above rules fail with the error
.BR EINVAL .
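.PP
The rules above can be checked before a write is attempted.
The following minimal C sketch (the function name
.I validate_id_map
is hypothetical) mirrors the documented rules as of Linux 3.8,
including the simplistic ascending-order requirement;
it is not the kernel's own parser, and the five-line limit is hard-coded
on that assumption.
.PP
.in +4n
.nf
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum { LF = 10 };             /* ASCII line feed: the line terminator */
enum { MAX_MAP_LINES = 5 };   /* assumption: limit as at Linux 3.8 */
#define MAX_ID 0xFFFFFFFFULL  /* user and group IDs are 32-bit values */

/* Hypothetical helper: return 0 if text obeys the uid_map/gid_map write
   rules, or -1 if the kernel would be expected to reject it (EINVAL). */
int validate_id_map(const char *text)
{
    size_t len = strlen(text);
    long page_size = sysconf(_SC_PAGESIZE);

    if (len == 0 || text[len - 1] != LF)
        return -1;              /* no lines, or last line not terminated */
    if (page_size > 0 && len >= (size_t) page_size)
        return -1;              /* must be less than the page size */

    unsigned long long prev[3] = { 0, 0, 0 };
    int lines = 0;
    const char *p = text;

    while (*p != 0) {
        unsigned long long f[3];
        for (int i = 0; i < 3; i++) {
            while (isblank((unsigned char) *p))
                p++;
            if (!isdigit((unsigned char) *p))
                return -1;      /* field missing or not a number */
            char *end;
            f[i] = strtoull(p, &end, 10);
            if (f[i] > MAX_ID)
                return -1;
            p = end;
        }
        while (isblank((unsigned char) *p))
            p++;
        if (*p != LF)
            return -1;          /* extra text on the line */
        p++;

        if (f[2] == 0 || ++lines > MAX_MAP_LINES)
            return -1;
        /* Simplistic no-overlap rule: fields 1 and 2 must each start at
           or after the end of the previous line's range. */
        if (lines > 1 && (f[0] < prev[0] + prev[2] || f[1] < prev[1] + prev[2]))
            return -1;
        memcpy(prev, f, sizeof prev);
    }
    return 0;
}
```
.fi
.in
.PP
For example, "0 1000 1" followed by a newline is accepted,
while a line whose length field is 0, a final line without a newline,
or a second line whose field 1 starts below the end of the previous range
is rejected.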
.PP
In order for a process to write to the
.I /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map )
file, all of the following requirements must be met:
.IP 1. 3
The writing process must have the
.B CAP_SETUID
.RB ( CAP_SETGID )
capability in the user namespace of the process
.IR pid .
.IP 2.
The writing process must be in either the user namespace of the process
.I pid
or inside the parent user namespace of the process
.IR pid .
.IP 3.
One of the following is true:
.RS
.IP * 3
The data written to
.I uid_map
.RI ( gid_map )
consists of a single line that maps the writing process's file system user ID
(group ID) in the parent user namespace to a user ID (group ID)
in the user namespace.
.IP *
The writing process has the
.B CAP_SETUID
.RB ( CAP_SETGID )
capability in the parent user namespace.
Thus, a privileged process can make mappings to arbitrary user IDs (group IDs)
in the parent user namespace.
.RE
.PP
Writes that violate the above rules fail with the error
.BR EPERM .
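.PP
The single-line case in requirement 3 is what lets an unprivileged process
map its own user ID to 0 in a new user namespace.
The following minimal C sketch (the function name is hypothetical)
does this in a child process.
It assumes Linux 3.8 or later and that the creation of user namespaces is
not restricted on the system; only
.I uid_map
is written.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { LF = 10 };  /* ASCII line feed */

/* Hypothetical helper: in a child, create a user namespace and write a
   single uid_map line mapping the child's outer effective (and, by
   assumption, file system) user ID to user ID 0 inside it.
   Returns the user ID getuid() then reports (expected: 0), or -1. */
int become_root_in_new_userns(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        uid_t outer = geteuid();   /* read before leaving the namespace */
        char line[64];
        if (unshare(CLONE_NEWUSER) == -1)
            _exit(255);

        int len = snprintf(line, sizeof line, "0 %ld 1%c", (long) outer, LF);
        int fd = open("/proc/self/uid_map", O_WRONLY | O_CLOEXEC);
        if (fd == -1 || write(fd, line, (size_t) len) != len)
            _exit(255);
        close(fd);

        uid_t inner = getuid();
        _exit(inner < 255 ? (int) inner : 254);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in
.PP
A return value of 0 indicates that, after the write,
.BR getuid (2)
in the child reported user ID 0 within the new namespace.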
.PP
When a process inside a user namespace executes
a set-user-ID (set-group-ID) program,
the process's effective user (group) ID inside the namespace is changed
to whatever value is mapped for the user (group) ID of the file.
However, if either the user
.I or
the group ID of the file has no mapping inside the namespace,
the set-user-ID (set-group-ID) bit is silently ignored:
the new program is executed,
but the process's effective user (group) ID is left unchanged.
(This mirrors the semantics of executing a set-user-ID or set-group-ID
program that resides on a file system that was mounted with the
.B MS_NOSUID
flag, as described in
.BR mount (2).)
.SS UTS namespaces (CLONE_NEWUTS)
UTS namespaces provide isolation of two system identifiers:
the hostname and the NIS domain name.
These identifiers are set using
.BR sethostname (2)
and
.BR setdomainname (2),
and can be retrieved using
.BR uname (2),
.BR gethostname (2),
and
.BR getdomainname (2).
.PP
Use of UTS namespaces requires a kernel that is configured with the
.B CONFIG_UTS_NS
option.
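.PP
The following minimal C sketch (the function name and the hostname
argument are hypothetical) shows a hostname change that stays inside
a new UTS namespace.
It combines
.B CLONE_NEWUSER
with
.B CLONE_NEWUTS
so that an unprivileged caller on Linux 3.8 or later can run it.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child sets new_name as its hostname inside new
   user and UTS namespaces.  Returns 1 if the child saw new_name while the
   caller's hostname stayed the same, 0 if isolation was not observed,
   or -1 if the child could not perform the change. */
int uts_isolation_demo(const char *new_name)
{
    struct utsname before, after;
    if (uname(&before) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct utsname inside;
        /* The user namespace is created first, giving the child
           CAP_SYS_ADMIN over the new UTS namespace. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) == -1 ||
            sethostname(new_name, strlen(new_name)) == -1 ||
            uname(&inside) == -1)
            _exit(2);
        _exit(strcmp(inside.nodename, new_name) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 2 || uname(&after) == -1)
        return -1;
    return WEXITSTATUS(status) == 0 &&
           strcmp(before.nodename, after.nodename) == 0;
}
```
.fi
.in
.PP
A call such as
.I uts_isolation_demo("ns-demo")
is expected to return 1: the child sees "ns-demo" from
.BR uname (2),
while the caller's hostname is unchanged.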
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH SEE ALSO
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),