.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\"
.TH NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
.SH NAME
namespaces \- overview of Linux namespaces
.SH DESCRIPTION
A namespace wraps a global system resource in an abstraction that
makes it appear to the processes within the namespace that they
have their own isolated instance of the global resource.
Changes to the global resource are visible to other processes
that are members of the namespace, but are invisible to processes
outside the namespace.
One use of namespaces is to implement containers.

This page describes the various namespaces and the associated
.I /proc
files, and summarizes the APIs for working with namespaces.
.\"
.\" ==================== The namespaces API ====================
.\"
.SS The namespaces API
As well as various
.I /proc
files described below,
the namespaces API includes the following system calls:
.TP
.BR clone (2)
The
.BR clone (2)
system call creates a new process.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the child process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.TP
.BR setns (2)
The
.BR setns (2)
system call allows the calling process to join an existing namespace.
The namespace to join is specified via a file descriptor that refers to
one of the
.IR /proc/[pid]/ns
files described below.
.TP
.BR unshare (2)
The
.BR unshare (2)
system call moves the calling process to a new namespace.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the calling process is made a member of those namespaces,
except that a new PID namespace is joined only by children
that the caller subsequently creates (see below).
(This system call also implements a number of features
unrelated to namespaces.)
.PP
Creation of new namespaces using
.BR clone (2)
and
.BR unshare (2)
in most cases requires the
.BR CAP_SYS_ADMIN
capability.
User namespaces are the exception: since Linux 3.8,
no privilege is required to create a user namespace.
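.PP
The privilege requirement can be observed at run time.
The following C sketch, which assumes glibc (where
.BR unshare ()
is declared when
.B _GNU_SOURCE
is defined), forks a child that calls
.BR unshare (2)
with the given flags and reports the outcome as an
.I errno
value, so that the caller's own namespaces are never changed.
The helper name
.I try_unshare_in_child
is hypothetical.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls unshare(flags) and
   report the result.  Returns 0 if unshare() succeeded in the child,
   the child's errno value if it failed, or -1 if fork() or waitpid()
   failed in the caller.  The caller's namespaces are left untouched. */
int try_unshare_in_child(int flags)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* Child: exit status 0 on success, otherwise the errno value. */
        _exit(unshare(flags) == 0 ? 0 : (errno & 0xff));
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in
.PP
Run without
.BR CAP_SYS_ADMIN ,
a call such as
.I try_unshare_in_child(CLONE_NEWUTS)
would be expected to return
.BR EPERM .
On systems that filter system calls (for example, with the seccomp
profiles used by some container runtimes), even a call with a
.I flags
value of 0 may fail with
.BR EPERM .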
.\"
.\" ==================== The /proc/[pid]/ns/ directory ====================
.\"
.SS The /proc/[pid]/ns/ directory
Each process has a
.IR /proc/[pid]/ns/
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
subdirectory containing one entry for each namespace that
supports being manipulated by
.BR setns (2):

.in +4n
.nf
$ \fBls -l /proc/$$/ns\fP
total 0
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 net -> net:[4026531956]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Jan 14 01:20 uts -> uts:[4026531838]
.fi
.in

Bind mounting (see
.BR mount (2))
one of the files in this directory
to somewhere else in the file system keeps
the corresponding namespace of the process specified by
.I pid
alive even if all processes currently in the namespace terminate.

Opening one of the files in this directory
(or a file that is bind mounted to one of these files)
returns a file descriptor for
the corresponding namespace of the process specified by
.IR pid .
As long as this file descriptor remains open,
the namespace will remain alive,
even if all processes in the namespace terminate.
The file descriptor can be passed to
.BR setns (2).
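.PP
Opening a namespace handle and passing it to
.BR setns (2)
can be sketched as follows.
The helper names
.I open_ns_handle
and
.I join_ns_of
are hypothetical.
An
.I nstype
of 0 accepts a handle for any namespace type, while a
.B CLONE_NEW*
value asks the kernel to check that the handle refers to that type.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the handle /proc/[pid]/ns/<name> for the
   given process and namespace type (e.g., "uts" or "net").
   Returns a file descriptor, or -1 with errno set. */
int open_ns_handle(pid_t pid, const char *name)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long) pid, name);

    if (n < 0 || (size_t) n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper: make the caller join the namespace of type
   <name> that process pid is a member of.  nstype is 0 (any type) or a
   CLONE_NEW* constant checked by the kernel against the handle.
   Returns 0 on success, or -1 with errno set; an unprivileged caller
   can expect EPERM for most namespace types. */
int join_ns_of(pid_t pid, const char *name, int nstype)
{
    int fd = open_ns_handle(pid, name);
    if (fd == -1)
        return -1;

    int ret = setns(fd, nstype);
    int saved_errno = errno;

    /* After a successful join the caller itself keeps the namespace
       alive, so the handle is no longer needed. */
    close(fd);
    errno = saved_errno;
    return ret;
}
```
.fi
.in
.PP
A process that wants to keep a namespace alive without joining it
would instead hold the descriptor returned by
.I open_ns_handle
open, as described above.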
.PP
In Linux 3.7 and earlier, these files were visible as hard links.
Since Linux 3.8, they appear as symbolic links.
If two processes are in the same namespace,
then the device IDs and inode numbers of their
.IR /proc/[pid]/ns/xxx
symbolic links will be the same; an application can check this using the
.I stat.st_dev
and
.I stat.st_ino
fields returned by
.BR stat (2).
The content of this symbolic link is a string containing
the namespace type and inode number as in the following example:

.in +4n
.nf
$ \fBreadlink /proc/$$/ns/uts\fP
uts:[4026531838]
.fi
.in
151 | .in | |
152 | ||
153 | The files in this subdirectory are as follows: | |
154 | .TP | |
155 | .IR /proc/[pid]/ns/ipc " (since Linux 3.0)" | |
156 | This file is a handle for the IPC namespace of the process. | |
cf8bfe6d MK |
157 | .TP |
158 | .IR /proc/[pid]/ns/mnt " (since Linux 3.8)" | |
159 | This file is a handle for the mount namespace of the process. | |
cf8bfe6d MK |
160 | .TP |
161 | .IR /proc/[pid]/ns/net " (since Linux 3.0)" | |
162 | This file is a handle for the network namespace of the process. | |
cf8bfe6d MK |
163 | .TP |
164 | .IR /proc/[pid]/ns/pid " (since Linux 3.8)" | |
165 | This file is a handle for the PID namespace of the process. | |
cf8bfe6d MK |
166 | .TP |
167 | .IR /proc/[pid]/ns/user " (since Linux 3.8)" | |
168 | This file is a handle for the user namespace of the process. | |
cf8bfe6d MK |
169 | .TP |
170 | .IR /proc/[pid]/ns/uts " (since Linux 3.0)" | |
171 | This file is a handle for the IPC namespace of the process. | |
.\"
.\" ==================== IPC namespaces ====================
.\"
.SS IPC namespaces (CLONE_NEWIPC)
IPC namespaces isolate certain IPC resources,
namely, System V IPC objects (see
.BR svipc (7))
and (since Linux 2.6.30)
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
POSIX message queues (see
.BR mq_overview (7)).
The common characteristic of these IPC mechanisms is that IPC
objects are identified by mechanisms other than file system
pathnames.

Each IPC namespace has its own set of System V IPC identifiers and
its own POSIX message queue file system.
Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.

When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.

Use of IPC namespaces requires a kernel that is configured with the
.B CONFIG_IPC_NS
option.
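.PP
The per-namespace scope of System V IPC identifiers can be shown with a
message queue: the sketch below creates a queue, then checks, from a
child that has created a new IPC namespace, whether the same identifier
still resolves.
The helper name is hypothetical, and creating the namespace requires
.BR CAP_SYS_ADMIN .
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a System V message queue, then check
   whether a child in a new IPC namespace can see it.  Returns 1 if the
   queue is invisible there, 0 if it is visible, and -1 if the check
   could not be run (e.g., unshare() failed with EPERM). */
int ipc_queue_hidden_in_new_ns(void)
{
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int result = -1;
    pid_t child = fork();
    if (child == 0) {
        struct msqid_ds ds;
        if (unshare(CLONE_NEWIPC) == -1)
            _exit(2);              /* could not create the namespace */
        /* The new namespace starts empty, so the id should not resolve. */
        _exit(msgctl(id, IPC_STAT, &ds) == -1 ? 1 : 0);
    }

    if (child > 0) {
        int status;
        if (waitpid(child, &status, 0) != -1 && WIFEXITED(status)
            && WEXITSTATUS(status) != 2)
            result = WEXITSTATUS(status);
    }

    msgctl(id, IPC_RMID, NULL);    /* remove the queue from our namespace */
    return result;
}
```
.fi
.in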
.\"
.\" ==================== Network namespaces ====================
.\"
.SS Network namespaces (CLONE_NEWNET)
Network namespaces provide isolation of the system resources associated
with networking: network devices, IPv4 and IPv6 protocol stacks,
IP routing tables, firewall rules, the
.I /proc/net
and
.I /sys/class/net
directory trees, sockets, port numbers, and so on.
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces,
and can be used to create a bridge to a physical network device
in another namespace.

When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).

Use of network namespaces requires a kernel that is configured with the
.B CONFIG_NET_NS
option.
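.PP
Device isolation can be observed by listing interfaces with
.BR if_nameindex (3)
in a child that has created a new network namespace.
A fresh namespace normally contains only the loopback device,
although kernels with some tunneling modules loaded add fallback tunnel
devices to every new namespace.
The helper names are hypothetical, and creating the namespace requires
.BR CAP_SYS_ADMIN .
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <net/if.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: count the interfaces visible in the current
   network namespace.  Returns -1 on error. */
int count_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    if (list == NULL)
        return -1;

    int n = 0;
    while (list[n].if_index != 0)   /* array ends with a zero entry */
        n++;
    if_freenameindex(list);
    return n;
}

/* Hypothetical helper: count the interfaces seen by a child placed in
   a new network namespace (typically 1, the loopback device).
   Returns -1 if unshare(CLONE_NEWNET) failed or on other errors. */
int interfaces_in_new_netns(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        if (unshare(CLONE_NEWNET) == -1)
            _exit(255);
        int n = count_interfaces();
        _exit(n < 0 || n > 254 ? 255 : n);
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in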
.\"
.\" ==================== Mount namespaces ====================
.\"
.SS Mount namespaces (CLONE_NEWNS)
Mount namespaces isolate the set of file system mount points,
meaning that processes in different mount namespaces can
have different views of the file system hierarchy.
The set of mounts in a mount namespace is modified using
.BR mount (2)
and
.BR umount (2).

The
.IR /proc/[pid]/mounts
file (present since Linux 2.4.19)
lists all the file systems currently mounted in the
process's mount namespace.
The format of this file is documented in
.BR fstab (5).
Since kernel version 2.6.15, this file is pollable:
after opening the file for reading, a change in this file
(i.e., a file system mount or unmount) causes
.BR select (2)
to mark the file descriptor as readable, and
.BR poll (2)
and
.BR epoll_wait (2)
mark the file as having an error condition.
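.PP
Waiting for a mount or unmount with
.BR poll (2)
might look like the following sketch; the helper name is hypothetical.
Because
.BR poll ()
reports
.B POLLERR
whenever it applies, the sketch does not need to request it; it also
requests
.B POLLPRI
on the assumption that the kernel may report that event as well.
.PP
.in +4n
.nf
```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for a mount
   or unmount in the caller's mount namespace.  POLLERR is reported by
   poll() even though it is not requested; POLLPRI is requested on the
   assumption that it may accompany a change.
   Returns 1 if a change was seen, 0 on timeout, -1 on error. */
int wait_for_mount_change(int timeout_ms)
{
    int fd = open("/proc/self/mounts", O_RDONLY);
    if (fd == -1)
        return -1;

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLPRI;
    pfd.revents = 0;

    int n = poll(&pfd, 1, timeout_ms);
    close(fd);

    if (n == -1)
        return -1;
    if (n == 0)
        return 0;
    return (pfd.revents & (POLLERR | POLLPRI)) ? 1 : 0;
}
```
.fi
.in
.PP
A long-running monitor would instead keep one descriptor open,
re-read the file from the beginning after each event, and poll again.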
.PP
The
.IR /proc/[pid]/mountstats
file (present since Linux 2.6.17)
exports information (statistics, configuration information)
about the mount points in the process's mount namespace.
This file is only readable by the owner of the process.
Lines in this file have the form:
.RS
.in 12
.nf

device /dev/sda7 mounted on /home with fstype ext3 [statistics]
       (   1   )            ( 2 )             (3 ) (4)
.fi
.in

The fields in each line are:
.TP 5
(1)
The name of the mounted device
(or "nodevice" if there is no corresponding device).
.TP
(2)
The mount point within the file system tree.
.TP
(3)
The file system type.
.TP
(4)
Optional statistics and configuration information.
Currently (as at Linux 2.6.26), only NFS file systems export
information via this field.
.RE
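.PP
A line of the form shown above can be split into its first three
fields with
.BR sscanf (3).
The structure and parser below are hypothetical; the sketch ignores
field (4) and assumes that, as in
.IR /proc/[pid]/mounts ,
the kernel escapes white space in device names and mount points,
which this code does not undo.
.PP
.in +4n
.nf
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for fields (1) to (3) of a mountstats line. */
struct mountstats_entry {
    char device[256];       /* field (1) */
    char mount_point[256];  /* field (2) */
    char fstype[64];        /* field (3) */
};

/* Hypothetical parser for one line of /proc/[pid]/mountstats.
   Field (4), the optional statistics, is ignored.
   Returns 0 on success, -1 if the line does not have the expected form. */
int parse_mountstats_line(const char *line, struct mountstats_entry *e)
{
    int n = sscanf(line, "device %255s mounted on %255s with fstype %63s",
                   e->device, e->mount_point, e->fstype);
    return n == 3 ? 0 : -1;
}
```
.fi
.in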
.\"
.\" ==================== PID namespaces ====================
.\"
.SS PID namespaces (CLONE_NEWPID)
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to migrate to a new host
while the processes inside the container maintain the same PIDs.

PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.

The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.BR CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (1).
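.PP
The following sketch reports the PID that the first child sees after
.BR unshare (2)
with
.BR CLONE_NEWPID ,
together with the value returned by
.BR getppid (2),
which is 0 because that child's parent lies outside the namespace.
An intermediate helper process performs the
.BR unshare ()
so that children the caller creates later stay in the caller's PID
namespace.
The function name is hypothetical, and
.B CAP_SYS_ADMIN
is required.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: report the PID and parent PID that the first
   child created after unshare(CLONE_NEWPID) sees for itself.
   On success stores them (expected: 1 and 0) and returns 0;
   returns -1 on failure (e.g., EPERM from unshare()). */
int first_child_ids(int *pid_out, int *ppid_out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t helper = fork();
    if (helper == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (helper == 0) {
        close(fds[0]);
        if (unshare(CLONE_NEWPID) == -1)
            _exit(1);

        pid_t init = fork();            /* PID 1 in the new namespace */
        if (init == -1)
            _exit(1);
        if (init == 0) {
            int ids[2];
            ids[0] = (int) getpid();    /* PID within the new namespace */
            ids[1] = (int) getppid();   /* parent is outside: 0 */
            _exit(write(fds[1], ids, sizeof(ids)) == (ssize_t) sizeof(ids) ? 0 : 1);
        }

        int status;
        if (waitpid(init, &status, 0) == -1 || !WIFEXITED(status))
            _exit(1);
        _exit(WEXITSTATUS(status));
    }

    /* Close our write end so read() sees EOF if nothing is written. */
    close(fds[1]);
    int ids[2];
    ssize_t n = read(fds[0], ids, sizeof(ids));
    close(fds[0]);

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0 || n != (ssize_t) sizeof(ids))
        return -1;

    *pid_out = ids[0];
    *ppid_out = ids[1];
    return 0;
}
```
.fi
.in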
.PP
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.BR SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace (e.g., from a process that has done a
.BR setns (2)
into the namespace using an open file descriptor for a
.I /proc/[pid]/ns/pid
file corresponding to a process that was in the namespace)
will fail with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
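.PP
The
.B ENOMEM
failure can also be reached without
.BR setns (2):
after
.BR unshare (2)
with
.BR CLONE_NEWPID ,
the first child becomes the namespace's "init", and once it has
terminated, a further
.BR fork (2)
by the same process targets a PID namespace whose "init" is gone.
The helper name below is hypothetical, and
.B CAP_SYS_ADMIN
is required.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a helper process calls unshare(CLONE_NEWPID),
   creates the namespace's "init" (which exits at once), then tries a
   second fork().  Returns that fork()'s errno (ENOMEM is expected),
   0 if it unexpectedly succeeded, or -1 if the experiment could not be
   set up (e.g., unshare() failed with EPERM). */
int fork_after_init_exit(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;

    if (helper == 0) {
        if (unshare(CLONE_NEWPID) == -1)
            _exit(255);

        pid_t init = fork();            /* PID 1 in the new namespace */
        if (init == -1)
            _exit(255);
        if (init == 0)
            _exit(0);                   /* "init" terminates immediately */
        waitpid(init, NULL, 0);

        pid_t again = fork();           /* expected to fail with ENOMEM */
        if (again == 0)
            _exit(0);
        if (again > 0) {
            waitpid(again, NULL, 0);
            _exit(0);
        }
        _exit(errno & 0xff);
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in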
.PP
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.

Likewise, a process in an ancestor namespace
can, subject to the usual permission checks described in
.BR kill (2),
send signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
and
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
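.PP
An "init" process that should be stoppable by other members of its
namespace therefore needs to establish handlers for the relevant
signals.
The sketch below installs a
.B SIGTERM
handler with
.B SA_SIGINFO
so that
.I si_pid
can be inspected; the helper and variable names are hypothetical.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t keeps the updates signal-safe. */
volatile sig_atomic_t term_received = 0;
volatile sig_atomic_t term_sender = -1;

/* Records that SIGTERM arrived and which PID sent it.  If the sender
   is in an ancestor PID namespace, si_pid is zero. */
void on_sigterm(int sig, siginfo_t *info, void *ucontext)
{
    (void) sig;
    (void) ucontext;
    term_received = 1;
    term_sender = info->si_pid;
}

/* Hypothetical helper for a namespace "init" process: with a handler
   in place, SIGTERM from other members of the namespace is delivered
   rather than discarded.  Returns 0 on success, -1 on error. */
int init_install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigterm;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}
```
.fi
.in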
.PP
PID namespaces can be nested.
When a new PID namespace is created,
the processes in that namespace are visible
in the PID namespace of the process that created the new namespace;
analogously, if the parent PID namespace is itself
the child of another PID namespace,
then processes in the child and parent PID namespaces will both be
visible in the grandparent PID namespace.
Conversely, the processes in the "child" PID namespace do not see
the processes in the parent namespace.
More succinctly: a process can see (e.g., send signals with
.BR kill (2))
only processes contained in its own PID namespace
and the namespaces nested below that PID namespace.

A process will have one PID for each of the layers of the hierarchy
starting from the PID namespace in which it resides
through to the root PID namespace.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process resides.

Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e.,
the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
.PP
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
.\" mount -t proc proc /proc
(If
.BR CLONE_NEWNS
is also included in the
.IR flags
argument of
.BR clone (2)
or
.BR unshare (2)),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .)
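.PP
Creating new PID and mount namespaces and mounting a fresh procfs
(the equivalent of
.IR "mount -t proc proc /proc" )
might be sketched as below.
The helper name is hypothetical.
The step that marks the mount tree private is an addition not described
on this page: on systems where
.I /
is a shared mount (as systemd arranges), it keeps the new
.I /proc
mount from propagating back into the parent mount namespace.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create new PID and mount namespaces and, in the
   first child (PID 1 of the new PID namespace), mount a fresh procfs
   over /proc.  Returns 0 if the child mounted procfs, -1 otherwise
   (e.g., EPERM without CAP_SYS_ADMIN). */
int mount_proc_in_new_pidns(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;

    if (helper == 0) {
        if (unshare(CLONE_NEWPID | CLONE_NEWNS) == -1)
            _exit(1);

        pid_t child = fork();
        if (child == -1)
            _exit(1);
        if (child == 0) {
            /* Assumption: make mounts private first so that the new
               /proc mount does not propagate to the parent namespace. */
            if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
                _exit(1);
            if (mount("proc", "/proc", "proc", 0, NULL) == -1)
                _exit(1);
            /* /proc now reflects the new PID namespace. */
            _exit(0);
        }

        int status;
        if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
            _exit(1);
        _exit(WEXITSTATUS(status));
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```
.fi
.in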
.PP
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.BR CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid (2)),
which would break many applications and libraries.
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.

Every thread in a process must be in the same PID namespace.
For this reason, the following two call sequences will fail:

.in +4n
.nf
unshare(CLONE_NEWPID);
clone(..., CLONE_VM, ...);    /* Fails */

setns(fd, CLONE_NEWPID);
clone(..., CLONE_VM, ...);    /* Fails */
.fi
.in

Because the above
.BR unshare (2)
and
.BR setns (2)
calls only change the PID namespace for created children, the
.BR clone (2)
calls necessarily put the new thread in a different PID namespace from
the calling thread.

When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
.\" FIXME Presumably, a similar thing happens with the UID and GID passed
.\" via a UNIX domain socket. That needs to be confirmed and documented
.\" under the "User namespaces" section.
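.PP
Passing a PID as
.B SCM_CREDENTIALS
ancillary data can be sketched as below; the helper names are
hypothetical.
When sender and receiver are in the same PID namespace, the receiver
sees the sender's own
.BR getpid ()
value; across PID namespaces, the kernel would deliver the translated
PID instead.
The receiving socket must have the
.B SO_PASSCRED
option enabled (see
.BR unix (7)).
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough and suitably aligned for one ucred. */
union cred_control {
    char buf[CMSG_SPACE(sizeof(struct ucred))];
    struct cmsghdr align;
};

/* Hypothetical helper: send one data byte plus the caller's own
   credentials (an unprivileged sender may only send its own PID, UID,
   and GID).  Returns 0 or -1. */
int send_creds(int sock)
{
    struct ucred cred;
    cred.pid = getpid();
    cred.uid = getuid();
    cred.gid = getgid();

    char data = 'x';                   /* at least one byte of real data */
    struct iovec iov;
    iov.iov_base = &data;
    iov.iov_len = 1;

    union cred_control control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one message and return the sender's PID
   as seen from the receiver's PID namespace, or -1 on error. */
pid_t recv_sender_pid(int sock)
{
    char data;
    struct iovec iov;
    iov.iov_base = &data;
    iov.iov_len = 1;

    union cred_control control;
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;
            memcpy(&cred, CMSG_DATA(c), sizeof(cred));
            return cred.pid;
        }
    }
    return -1;
}
```
.fi
.in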
.PP
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.\"
.\" ==================== User namespaces ====================
.\"
.SS User namespaces (CLONE_NEWUSER)
See
.BR user_namespaces (7).
.\"
.\" ==================== UTS namespaces ====================
.\"
.SS UTS namespaces (CLONE_NEWUTS)
UTS namespaces provide isolation of two system identifiers:
the hostname and the NIS domain name.
These identifiers are set using
.BR sethostname (2)
and
.BR setdomainname (2),
and can be retrieved using
.BR uname (2),
.BR gethostname (2),
and
.BR getdomainname (2).
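.PP
The following sketch sets the hostname inside a new UTS namespace
created by a child process and confirms the change with
.BR uname (2),
leaving the caller's hostname untouched.
The helper name is hypothetical, and creating the namespace requires
.BR CAP_SYS_ADMIN .
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: in a child placed in a new UTS namespace, set
   the hostname to new_name and check it with uname(2).  Returns 0 if
   the child saw the new name, -1 otherwise (e.g., unshare() failed
   with EPERM). */
int set_hostname_in_new_uts_ns(const char *new_name)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        struct utsname uts;
        if (unshare(CLONE_NEWUTS) == -1)
            _exit(1);
        if (sethostname(new_name, strlen(new_name)) == -1)
            _exit(1);
        if (uname(&uts) == -1)
            _exit(1);
        _exit(strcmp(uts.nodename, new_name) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```
.fi
.in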
.PP
Use of UTS namespaces requires a kernel that is configured with the
.B CONFIG_UTS_NS
option.
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH SEE ALSO
.BR nsenter (1),
.BR readlink (1),
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR credentials (7),
.BR capabilities (7),
.BR user_namespaces (7),
.BR switch_root (8)