.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH NAMESPACES 7 2016-12-12 "Linux" "Linux Programmer's Manual"
.SH NAME
namespaces \- overview of Linux namespaces
.SH DESCRIPTION
A namespace wraps a global system resource in an abstraction that
makes it appear to the processes within the namespace that they
have their own isolated instance of the global resource.
Changes to the global resource are visible to other processes
that are members of the namespace, but are invisible to other processes.
One use of namespaces is to implement containers.

Linux provides the following namespaces:
.TS
lB lB lB
l lB l.
Namespace	Constant	Isolates
Cgroup	CLONE_NEWCGROUP	Cgroup root directory
IPC	CLONE_NEWIPC	System V IPC, POSIX message queues
Network	CLONE_NEWNET	Network devices, stacks, ports, etc.
Mount	CLONE_NEWNS	Mount points
PID	CLONE_NEWPID	Process IDs
User	CLONE_NEWUSER	User and group IDs
UTS	CLONE_NEWUTS	Hostname and NIS domain name
.TE

This page describes the various namespaces and the associated
.I /proc
files, and summarizes the APIs for working with namespaces.
.\"
.\" ==================== The namespaces API ====================
.\"
.SS The namespaces API
As well as various
.I /proc
files described below,
the namespaces API includes the following system calls:
.TP
.BR clone (2)
The
.BR clone (2)
system call creates a new process.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the child process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.TP
.BR setns (2)
The
.BR setns (2)
system call allows the calling process to join an existing namespace.
The namespace to join is specified via a file descriptor that refers to
one of the
.IR /proc/[pid]/ns
files described below.
.TP
.BR unshare (2)
The
.BR unshare (2)
system call moves the calling process to a new namespace.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the calling process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.PP
Creation of new namespaces using
.BR clone (2)
and
.BR unshare (2)
in most cases requires the
.BR CAP_SYS_ADMIN
capability.
User namespaces are the exception: since Linux 3.8,
no privilege is required to create a user namespace.
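.PP
The following C fragment is a minimal sketch of how the
.I /proc/[pid]/ns
files and
.BR setns (2)
work together: it makes the calling thread join one namespace of another process.
The helper name
.I join_namespace()
and its argument conventions are illustrative assumptions and are not part of any library.
In most cases the
.BR setns (2)
call fails with
.B EPERM
unless the caller has
.BR CAP_SYS_ADMIN .
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move the calling thread into the namespace of
   type 'name' (for example, "uts" or "net") to which process 'pid'
   belongs.  'nstype' is the matching CLONE_NEW* constant, or 0 to
   accept a namespace of any type.  Returns 0 on success, or -1 with
   errno set. */
int
join_namespace(pid_t pid, const char *name, int nstype)
{
    char path[64];
    int fd, ret, saved_errno;

    snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long) pid, name);

    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;              /* e.g., ENOENT: no such process or entry */

    ret = setns(fd, nstype);    /* usually requires CAP_SYS_ADMIN */
    saved_errno = errno;
    close(fd);                  /* membership persists after close() */
    errno = saved_errno;
    return ret;
}
```
.fi
.in
.PP
Passing a specific
.B CLONE_NEW*
constant as the
.I nstype
argument of
.BR setns (2),
rather than 0, makes the call fail with
.B EINVAL
if the descriptor refers to a namespace of some other type.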
.\"
.\" ==================== The /proc/[pid]/ns/ directory ====================
.\"
.SS The /proc/[pid]/ns/ directory
Each process has a
.IR /proc/[pid]/ns/
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
subdirectory containing one entry for each namespace that
supports being manipulated by
.BR setns (2):

.in +4n
.nf
$ \fBls \-l /proc/$$/ns\fP
total 0
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup \-> cgroup:[4026531835]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc \-> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt \-> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net \-> net:[4026531969]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid \-> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user \-> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts \-> uts:[4026531838]
.fi
.in

Bind mounting (see
.BR mount (2))
one of the files in this directory
to somewhere else in the filesystem keeps
the corresponding namespace of the process specified by
.I pid
alive even if all processes currently in the namespace terminate.

Opening one of the files in this directory
(or a file that is bind mounted to one of these files)
returns a file descriptor for
the corresponding namespace of the process specified by
.IR pid .
As long as this file descriptor remains open,
the namespace will remain alive,
even if all processes in the namespace terminate.
The file descriptor can be passed to
.BR setns (2).

In Linux 3.7 and earlier, these files were visible as hard links.
Since Linux 3.8,
.\" commit bf056bfa80596a5d14b26b17276a56a0dcb080e5
they appear as symbolic links.
If two processes are in the same namespace, then the device IDs and
inode numbers of their
.IR /proc/[pid]/ns/xxx
symbolic links will be the same; an application can check this using the
.I stat.st_dev
and
.I stat.st_ino
fields returned by
.BR stat (2).
The content of this symbolic link is a string containing
the namespace type and inode number as in the following example:

.in +4n
.nf
$ \fBreadlink /proc/$$/ns/uts\fP
uts:[4026531838]
.fi
.in
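.PP
The two checks described above, comparing links with
.BR stat (2)
and decoding the link content, can be sketched in C as follows.
The helper names
.IR same_namespace() ,
.IR read_ns_link() ,
and
.I parse_ns_link()
are hypothetical.
The sketch assumes Linux 3.8 or later, where these entries are symbolic links, and it compares both
.I st_dev
and
.IR st_ino .
.PP
.in +4n
.nf
```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the two /proc/[pid]/ns/xxx links refer to the same
   namespace, 0 if they do not, or -1 (with errno set) on error.
   stat() follows the symbolic link; the device ID and inode number
   of the target together identify the namespace. */
int
same_namespace(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) == -1 || stat(path_b, &b) == -1)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Reads the content of a /proc/[pid]/ns/xxx link into 'buf' and
   terminates it, because readlink() does not.  Returns 0 on success,
   or -1 on error or if the content may have been truncated. */
int
read_ns_link(const char *path, char *buf, size_t size)
{
    ssize_t len = readlink(path, buf, size - 1);

    if (len == -1 || (size_t) len == size - 1)
        return -1;
    buf[len] = 0;
    return 0;
}

/* Splits link content such as "uts:[4026531838]" into the namespace
   type (at most 31 characters; 'type' needs 32 bytes) and the inode
   number.  Returns 0 on success, or -1 if the text has another form. */
int
parse_ns_link(const char *text, char *type, unsigned long long *ino)
{
    char close_bracket;

    if (sscanf(text, "%31[^:]:[%llu%c", type, ino, &close_bracket) != 3 ||
            close_bracket != ']')
        return -1;
    return 0;
}
```
.fi
.in
.PP
The inode number parsed from the link content is the same value that
.BR stat (2)
reports in
.I st_ino
for the link target.
Comparing with
.BR stat (2)
also takes the device ID into account.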

The symbolic links in this subdirectory are as follows:
.TP
.IR /proc/[pid]/ns/cgroup " (since Linux 4.6)"
This file is a handle for the cgroup namespace of the process.
.TP
.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
This file is a handle for the IPC namespace of the process.
.TP
.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
.\" commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
This file is a handle for the mount namespace of the process.
.TP
.IR /proc/[pid]/ns/net " (since Linux 3.0)"
This file is a handle for the network namespace of the process.
.TP
.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
.\" commit 57e8391d327609cbf12d843259c968b9e5c1838f
This file is a handle for the PID namespace of the process.
.TP
.IR /proc/[pid]/ns/user " (since Linux 3.8)"
.\" commit cde1975bc242f3e1072bde623ef378e547b73f91
This file is a handle for the user namespace of the process.
.TP
.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
This file is a handle for the UTS namespace of the process.
.PP
Permission to dereference or read
.RB ( readlink (2))
these symbolic links is governed by a ptrace access mode
.B PTRACE_MODE_READ_FSCREDS
check; see
.BR ptrace (2).
.\"
.\" ==================== Cgroup namespaces ====================
.\"
.SS Cgroup namespaces (CLONE_NEWCGROUP)
See
.BR cgroup_namespaces (7).
.\"
.\" ==================== IPC namespaces ====================
.\"
.SS IPC namespaces (CLONE_NEWIPC)
IPC namespaces isolate certain IPC resources,
namely, System V IPC objects (see
.BR svipc (7))
and (since Linux 2.6.30)
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
POSIX message queues (see
.BR mq_overview (7)).
The common characteristic of these IPC mechanisms is that IPC
objects are identified by mechanisms other than filesystem
pathnames.

Each IPC namespace has its own set of System V IPC identifiers and
its own POSIX message queue filesystem.
Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.
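.PP
The following is a minimal sketch of this isolation for System V message queues.
It assumes the caller may create message queues, and, for the isolation check itself, that it holds
.BR CAP_SYS_ADMIN .
The sketch shows that a queue identifier obtained in one IPC namespace no longer names a queue after a child process moves into a new IPC namespace.
The helper name is hypothetical.
If
.BR unshare (2)
fails (typically with
.BR EPERM ),
the helper returns \-1 instead of a result.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if a queue created in the caller's
   IPC namespace is invisible in a freshly created IPC namespace, 0 if
   it is still visible there, or -1 if the check could not be made
   (for example, because unshare() failed with EPERM). */
int
queue_hidden_by_new_ipc_ns(void)
{
    struct msqid_ds ds;
    int qid, status;
    pid_t pid;

    /* Create a private queue in the caller's IPC namespace */
    qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    pid = fork();
    if (pid == 0) {
        if (unshare(CLONE_NEWIPC) == -1)
            _exit(2);           /* typically EPERM: no CAP_SYS_ADMIN */
        /* Exit 1 if the identifier names no queue in the new namespace */
        _exit(msgctl(qid, IPC_STAT, &ds) == -1 ? 1 : 0);
    }

    status = -1;
    if (pid != -1 && waitpid(pid, &status, 0) == -1)
        status = -1;

    msgctl(qid, IPC_RMID, NULL);    /* remove the queue from our namespace */

    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);     /* 1 = hidden, 0 = still visible */
}
```
.fi
.in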

The following
.I /proc
interfaces are distinct in each IPC namespace:
.IP * 3
The POSIX message queue interfaces in
.IR /proc/sys/fs/mqueue .
.IP *
The System V IPC interfaces in
.IR /proc/sys/kernel ,
namely:
.IR msgmax ,
.IR msgmnb ,
.IR msgmni ,
.IR sem ,
.IR shmall ,
.IR shmmax ,
.IR shmmni ,
and
.IR shm_rmid_forced .
.IP *
The System V IPC interfaces in
.IR /proc/sysvipc .
.PP
When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.

Use of IPC namespaces requires a kernel that is configured with the
.B CONFIG_IPC_NS
option.
.\"
.\" ==================== Network namespaces ====================
.\"
.SS Network namespaces (CLONE_NEWNET)
Network namespaces provide isolation of the system resources associated
with networking: network devices, IPv4 and IPv6 protocol stacks,
IP routing tables, firewall rules, the
.I /proc/net
directory, the
.I /sys/class/net
directory, port numbers (sockets), and so on.
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME . Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces
and to create a bridge to a physical network device
in another namespace.
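.PP
The following sketch shows how the isolation of network devices can be observed.
It assumes the privilege needed to create a network namespace, and its helper name is hypothetical.
A newly created network namespace usually contains only the loopback device.
Some tunnel drivers may also create fallback devices there, so the sketch counts the interfaces rather than expecting an exact set.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <net/if.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that moves into a new network
   namespace and counts the interfaces visible there.  Returns that
   count, or -1 if the namespace could not be created or inspected. */
int
interfaces_in_new_netns(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        struct if_nameindex *ifs, *p;
        int count = 0;

        if (unshare(CLONE_NEWNET) == -1)
            _exit(255);         /* typically EPERM */
        ifs = if_nameindex();   /* interfaces of the new namespace */
        if (ifs == NULL)
            _exit(255);
        for (p = ifs; p->if_index != 0; p++)
            count++;
        if_freenameindex(ifs);
        _exit(count);           /* assumes fewer than 255 interfaces */
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
            WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in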

When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).

Use of network namespaces requires a kernel that is configured with the
.B CONFIG_NET_NS
option.
.\"
.\" ==================== Mount namespaces ====================
.\"
.SS Mount namespaces (CLONE_NEWNS)
See
.BR mount_namespaces (7).
.\"
.\" ==================== PID namespaces ====================
.\"
.SS PID namespaces (CLONE_NEWPID)
See
.BR pid_namespaces (7).
.\"
.\" ==================== User namespaces ====================
.\"
.SS User namespaces (CLONE_NEWUSER)
See
.BR user_namespaces (7).
.\"
.\" ==================== UTS namespaces ====================
.\"
.SS UTS namespaces (CLONE_NEWUTS)
UTS namespaces provide isolation of two system identifiers:
the hostname and the NIS domain name.
These identifiers are set using
.BR sethostname (2)
and
.BR setdomainname (2),
and can be retrieved using
.BR uname (2),
.BR gethostname (2),
and
.BR getdomainname (2).
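.PP
The following is a minimal sketch of UTS isolation, assuming the caller holds
.BR CAP_SYS_ADMIN .
A child process unshares its UTS namespace and changes the hostname there, while the parent's hostname stays unchanged.
The helper name
.I try_private_hostname()
and the example hostname are illustrative.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that unshares its UTS namespace,
   sets 'name' as its hostname, and checks the result with uname().
   The caller's own hostname is never touched.  Returns 0 if the child
   succeeded, the child's errno value (typically EPERM without
   CAP_SYS_ADMIN) if a call failed, or -1 if fork()/waitpid() failed. */
int
try_private_hostname(const char *name)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        struct utsname uts;

        if (unshare(CLONE_NEWUTS) == -1 ||
                sethostname(name, strlen(name)) == -1 ||
                uname(&uts) == -1)
            _exit(errno);       /* Linux errno values fit in an exit status */
        _exit(strcmp(uts.nodename, name) == 0 ? 0 : 255);
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
.fi
.in
.PP
Because the change happens in the child's new UTS namespace, a later
.BR gethostname (2)
call in the parent still returns the original hostname.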

Use of UTS namespaces requires a kernel that is configured with the
.B CONFIG_UTS_NS
option.
.\"
.\" ============================================================
.\"
.SS Discovering namespace relationships
Since Linux 4.9,
.\" commit bcac25a58bfc6bd79191ac5d7afb49bea96da8c9
.\" commit 6786741dbf99e44fb0c0ed85a37582b8a26f1c3b
.\" commit a7306ed8d94af729ecef8b6e37506a1c6fc14788
.\" commit 6ad92bf63e45f97e306da48cd1cbce6e4fef1e5d
two
.BR ioctl (2)
operations are provided to allow discovery of namespace relationships
(see
.BR user_namespaces (7)
and
.BR pid_namespaces (7)).
The form of the calls is:

.in +4n
.nf
new_fd = ioctl(fd, request);
.fi
.in

In each case,
.I fd
refers to a
.IR /proc/[pid]/ns/*
file.
Both operations return a new file descriptor on success.
.TP
.BR NS_GET_USERNS
Returns a file descriptor that refers to the owning user namespace
for the namespace referred to by
.IR fd .
.TP
.BR NS_GET_PARENT
Returns a file descriptor that refers to the parent namespace of
the namespace referred to by
.IR fd .
This operation is valid only for hierarchical namespaces
(i.e., PID and user namespaces).
For user namespaces,
.BR NS_GET_PARENT
is synonymous with
.BR NS_GET_USERNS .
.PP
The new file descriptor returned by these operations is opened with the
.BR O_RDONLY
and
.BR O_CLOEXEC
(close-on-exec; see
.BR fcntl (2))
flags.
.PP
By applying
.BR fstat (2)
to the returned file descriptor, one obtains a
.I stat
structure whose
.I st_dev
(resident device) and
.I st_ino
(inode number) fields together identify the owning/parent namespace.
This device ID and inode number can be matched with those of another
.IR /proc/[pid]/ns/{pid,user}
file to determine whether that is the owning/parent namespace.

Either of these
.BR ioctl (2)
operations can fail with the following errors:
.TP
.B EPERM
The requested namespace is outside of the caller's namespace scope.
This error can occur if, for example, the owning user namespace is an
ancestor of the caller's current user namespace.
It can also occur on attempts to obtain the parent of the initial
user or PID namespace.
.TP
.B ENOTTY
The operation is not supported by this kernel version.
.PP
Additionally, the
.B NS_GET_PARENT
operation can fail with the following error:
.TP
.B EINVAL
.I fd
refers to a nonhierarchical namespace.
.PP
See the EXAMPLE section for an example of the use of these operations.
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH EXAMPLE
For one example, see
.BR user_namespaces (7).

The example shown below uses the
.BR ioctl (2)
operations described above to perform simple
discovery of namespace relationships.
The following shell sessions show various examples of the use
of this program.

Trying to get the parent of the initial user namespace fails,
since it has no parent:

.nf
.in +4n
$ \fB./ns_introspect /proc/self/ns/user p\fP
The parent namespace is outside your namespace scope
.in
.fi

Create a process running
.BR sleep (1)
that resides in new user and UTS namespaces,
and show that the new UTS namespace is associated with the new user namespace:

.nf
.in +4n
$ \fBunshare \-Uu sleep 1000 &\fP
[1] 23235
$ \fB./ns_introspect /proc/23235/ns/uts u\fP
Device/Inode of owning user namespace is: [0,3] / 4026532448
$ \fBreadlink /proc/23235/ns/user\fP
user:[4026532448]
.in
.fi

Then show that the parent of the new user namespace in the preceding
example is the initial user namespace:

.nf
.in +4n
$ \fBreadlink /proc/self/ns/user\fP
user:[4026531837]
$ \fB./ns_introspect /proc/23235/ns/user p\fP
Device/Inode of parent namespace is: [0,3] / 4026531837
.in
.fi

Start a shell in a new user namespace, and show that from within
this shell, the parent user namespace can't be discovered.
Similarly, the owning user namespace of the shell's UTS namespace
(that is, the initial user namespace)
can't be discovered.

.nf
.in +4n
$ \fBPS1="sh2$ " unshare \-U bash\fP
sh2$ \fB./ns_introspect /proc/self/ns/user p\fP
The parent namespace is outside your namespace scope
sh2$ \fB./ns_introspect /proc/self/ns/uts u\fP
The owning user namespace is outside your namespace scope
.in
.fi
.SS Program source
\&
.nf
/* ns_introspect.c

   Licensed under the GNU General Public License v2 or later.
*/
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <errno.h>
#include <sys/sysmacros.h>

#ifndef NS_GET_USERNS
#define NSIO 0xb7
#define NS_GET_USERNS _IO(NSIO, 0x1)
#define NS_GET_PARENT _IO(NSIO, 0x2)
#endif

int
main(int argc, char *argv[])
{
    int fd, userns_fd, parent_fd;
    struct stat sb;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s /proc/[pid]/ns/[file] [p|u]\\n",
                argv[0]);
        fprintf(stderr, "\\nDisplay the result of one or both "
                "of NS_GET_USERNS (u) or NS_GET_PARENT (p)\\n"
                "for the specified /proc/[pid]/ns/[file]. If neither "
                "\(aqp\(aq nor \(aqu\(aq is specified,\\n"
                "NS_GET_USERNS is the default.\\n");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the \(aqns\(aq file specified
       in argv[1] */

    fd = open(argv[1], O_RDONLY);
    if (fd == \-1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Obtain a file descriptor for the owning user namespace and
       then obtain and display the inode number of that namespace */

    if (argc < 3 || strchr(argv[2], \(aqu\(aq)) {
        userns_fd = ioctl(fd, NS_GET_USERNS);

        if (userns_fd == \-1) {
            if (errno == EPERM)
                printf("The owning user namespace is outside "
                        "your namespace scope\\n");
            else
                perror("ioctl\-NS_GET_USERNS");
            exit(EXIT_FAILURE);
        }

        if (fstat(userns_fd, &sb) == \-1) {
            perror("fstat\-userns");
            exit(EXIT_FAILURE);
        }
        printf("Device/Inode of owning user namespace is: "
                "[%lx,%lx] / %ld\\n",
                (long) major(sb.st_dev), (long) minor(sb.st_dev),
                (long) sb.st_ino);

        close(userns_fd);
    }

    /* Obtain a file descriptor for the parent namespace and
       then obtain and display the inode number of that namespace */

    if (argc > 2 && strchr(argv[2], \(aqp\(aq)) {
        parent_fd = ioctl(fd, NS_GET_PARENT);

        if (parent_fd == \-1) {
            if (errno == EINVAL)
                printf("Can\(aqt get parent namespace of a "
                        "nonhierarchical namespace\\n");
            else if (errno == EPERM)
                printf("The parent namespace is outside "
                        "your namespace scope\\n");
            else
                perror("ioctl\-NS_GET_PARENT");
            exit(EXIT_FAILURE);
        }

        if (fstat(parent_fd, &sb) == \-1) {
            perror("fstat\-parentns");
            exit(EXIT_FAILURE);
        }
        printf("Device/Inode of parent namespace is: [%lx,%lx] / %ld\\n",
                (long) major(sb.st_dev), (long) minor(sb.st_dev),
                (long) sb.st_ino);

        close(parent_fd);
    }

    exit(EXIT_SUCCESS);
}
.fi
.SH SEE ALSO
.BR nsenter (1),
.BR readlink (1),
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR cgroup_namespaces (7),
.BR cgroups (7),
.BR credentials (7),
.BR pid_namespaces (7),
.BR user_namespaces (7),
.BR lsns (8),
.BR switch_root (8)