1 .\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
2 .\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
3 .\"
4 .\" %%%LICENSE_START(VERBATIM)
5 .\" Permission is granted to make and distribute verbatim copies of this
6 .\" manual provided the copyright notice and this permission notice are
7 .\" preserved on all copies.
8 .\"
9 .\" Permission is granted to copy and distribute modified versions of this
10 .\" manual under the conditions for verbatim copying, provided that the
11 .\" entire resulting derived work is distributed under the terms of a
12 .\" permission notice identical to this one.
13 .\"
14 .\" Since the Linux kernel and libraries are constantly changing, this
15 .\" manual page may be incorrect or out-of-date. The author(s) assume no
16 .\" responsibility for errors or omissions, or for damages resulting from
17 .\" the use of the information contained herein. The author(s) may not
18 .\" have taken the same level of care in the production of this manual,
19 .\" which is licensed free of charge, as they might when working
20 .\" professionally.
21 .\"
22 .\" Formatted or processed versions of this manual, if unaccompanied by
23 .\" the source, must acknowledge the copyright and authors of this work.
24 .\" %%%LICENSE_END
25 .\"
26 .\"
.TH NAMESPACES 7 2017-05-03 "Linux" "Linux Programmer's Manual"
.SH NAME
namespaces \- overview of Linux namespaces
.SH DESCRIPTION
A namespace wraps a global system resource in an abstraction that
makes it appear to the processes within the namespace that they
have their own isolated instance of the global resource.
Changes to the global resource are visible to other processes
that are members of the namespace, but are invisible to other processes.
One use of namespaces is to implement containers.

Linux provides the following namespaces:
.TS
lB lB lB
l lB l.
Namespace	Constant	Isolates
Cgroup	CLONE_NEWCGROUP	Cgroup root directory
IPC	CLONE_NEWIPC	System V IPC, POSIX message queues
Network	CLONE_NEWNET	Network devices, stacks, ports, etc.
Mount	CLONE_NEWNS	Mount points
PID	CLONE_NEWPID	Process IDs
User	CLONE_NEWUSER	User and group IDs
UTS	CLONE_NEWUTS	Hostname and NIS domain name
.TE

This page describes the various namespaces and the associated
.I /proc
files, and summarizes the APIs for working with namespaces.
.\"
.\" ==================== The namespaces API ====================
.\"
.SS The namespaces API
As well as various
.I /proc
files described below,
the namespaces API includes the following system calls:
.TP
.BR clone (2)
The
.BR clone (2)
system call creates a new process.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the child process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.TP
.BR setns (2)
The
.BR setns (2)
system call allows the calling process to join an existing namespace.
The namespace to join is specified via a file descriptor that refers to
one of the
.IR /proc/[pid]/ns
files described below.
.TP
.BR unshare (2)
The
.BR unshare (2)
system call moves the calling process to a new namespace.
If the
.I flags
argument of the call specifies one or more of the
.B CLONE_NEW*
flags listed below, then new namespaces are created for each flag,
and the calling process is made a member of those namespaces.
(This system call also implements a number of features
unrelated to namespaces.)
.PP
Creation of new namespaces using
.BR clone (2)
and
.BR unshare (2)
in most cases requires the
.BR CAP_SYS_ADMIN
capability.
User namespaces are the exception: since Linux 3.8,
no privilege is required to create a user namespace.
.\"
.\" ==================== The /proc/[pid]/ns/ directory ====================
.\"
.SS The /proc/[pid]/ns/ directory
Each process has a
.IR /proc/[pid]/ns/
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
subdirectory containing one entry for each namespace that
supports being manipulated by
.BR setns (2):

.in +4n
.nf
$ \fBls \-l /proc/$$/ns\fP
total 0
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup \-> cgroup:[4026531835]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc \-> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt \-> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net \-> net:[4026531969]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid \-> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid_for_children \-> pid:[4026531834]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user \-> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts \-> uts:[4026531838]
.fi
.in
.PP
Bind mounting (see
.BR mount (2))
one of the files in this directory
to somewhere else in the filesystem keeps
the corresponding namespace of the process specified by
.I pid
alive even if all processes currently in the namespace terminate.

Opening one of the files in this directory
(or a file that is bind mounted to one of these files)
returns a file descriptor for
the corresponding namespace of the process specified by
.IR pid .
As long as this file descriptor remains open,
the namespace will remain alive,
even if all processes in the namespace terminate.
The file descriptor can be passed to
.BR setns (2).
.PP
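As a minimal sketch of the preceding paragraph,
the following shell function opens a namespace file on a spare
file descriptor and prints what that descriptor refers to.
The function name
.I ns_fd_target
and the use of descriptor 3 are illustrative assumptions,
not part of any API.
On a typical system, the output is expected to match that of
.BR readlink (1)
applied to the namespace file itself.

.in +4n
.nf
```shell
# Hypothetical helper: open /proc/self/ns/TYPE on descriptor 3 and
# print the target of the corresponding /proc/self/fd/3 link.
# For as long as descriptor 3 stays open, the namespace is kept alive.
# Usage: ns_fd_target TYPE
ns_fd_target() {
    (
        exec 3< "/proc/self/ns/$1" || exit 1
        readlink /proc/self/fd/3
    )
}
```
.fi
.in
.PP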
In Linux 3.7 and earlier, these files were visible as hard links.
Since Linux 3.8,
.\" commit bf056bfa80596a5d14b26b17276a56a0dcb080e5
they appear as symbolic links.
If two processes are in the same namespace, then the inode numbers of their
.IR /proc/[pid]/ns/xxx
symbolic links will be the same; an application can check this using the
.I stat.st_ino
field returned by
.BR stat (2).
The content of this symbolic link is a string containing
the namespace type and inode number as in the following example:

.in +4n
.nf
$ \fBreadlink /proc/$$/ns/uts\fP
uts:[4026531838]
.fi
.in
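.PP
Because the link text embeds the inode number,
comparing the strings returned by
.BR readlink (1)
is a simple stand-in for comparing
.IR stat.st_ino .
The following is a sketch only; the function name
.I same_ns
is hypothetical, and reading the links of another process
is subject to the permission check described below.

.in +4n
.nf
```shell
# Hypothetical helper: succeed if processes PID1 and PID2 are in the
# same namespace of the given TYPE (for example: uts, net, mnt).
# Returns 2 if either link cannot be read.
# Usage: same_ns PID1 PID2 TYPE
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}
```
.fi
.in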
.PP
The symbolic links in this subdirectory are as follows:
.TP
.IR /proc/[pid]/ns/cgroup " (since Linux 4.6)"
This file is a handle for the cgroup namespace of the process.
.TP
.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
This file is a handle for the IPC namespace of the process.
.TP
.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
.\" commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
This file is a handle for the mount namespace of the process.
.TP
.IR /proc/[pid]/ns/net " (since Linux 3.0)"
This file is a handle for the network namespace of the process.
.TP
.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
.\" commit 57e8391d327609cbf12d843259c968b9e5c1838f
This file is a handle for the PID namespace of the process.
This handle is permanent for the lifetime of the process
(i.e., a process's PID namespace membership never changes).
.TP
.IR /proc/[pid]/ns/pid_for_children " (since Linux 4.12)"
.\" commit eaa0d190bfe1ed891b814a52712dcd852554cb08
This file is a handle for the PID namespace of
child processes created by this process.
This can change as a consequence of calls to
.BR unshare (2)
and
.BR setns (2)
(see
.BR pid_namespaces (7)),
so the file may differ from
.IR /proc/[pid]/ns/pid .
.TP
.IR /proc/[pid]/ns/user " (since Linux 3.8)"
.\" commit cde1975bc242f3e1072bde623ef378e547b73f91
This file is a handle for the user namespace of the process.
.TP
.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
This file is a handle for the UTS namespace of the process.
.PP
Permission to dereference or read
.RB ( readlink (2))
these symbolic links is governed by a ptrace access mode
.B PTRACE_MODE_READ_FSCREDS
check; see
.BR ptrace (2).
.\"
.\" ==================== The /proc/sys/user directory ====================
.\"
.SS The /proc/sys/user directory
The files in the
.I /proc/sys/user
directory (which is present since Linux 4.9) expose limits
on the number of namespaces of various types that can be created.
The files are as follows:
.TP
.IR max_cgroup_namespaces
The value in this file defines a per-user limit on the number of
cgroup namespaces that may be created in the user namespace.
.TP
.IR max_ipc_namespaces
The value in this file defines a per-user limit on the number of
IPC namespaces that may be created in the user namespace.
.TP
.IR max_mnt_namespaces
The value in this file defines a per-user limit on the number of
mount namespaces that may be created in the user namespace.
.TP
.IR max_net_namespaces
The value in this file defines a per-user limit on the number of
network namespaces that may be created in the user namespace.
.TP
.IR max_pid_namespaces
The value in this file defines a per-user limit on the number of
PID namespaces that may be created in the user namespace.
.TP
.IR max_user_namespaces
The value in this file defines a per-user limit on the number of
user namespaces that may be created in the user namespace.
.TP
.IR max_uts_namespaces
The value in this file defines a per-user limit on the number of
UTS namespaces that may be created in the user namespace.
.PP
Note the following details about these files:
.IP * 3
The values in these files are modifiable by privileged processes.
.IP *
The values exposed by these files are the limits for the user namespace
in which the opening process resides.
.IP *
The limits are per-user.
Each user in the same user namespace
can create namespaces up to the defined limit.
.IP *
The limits apply to all users, including UID 0.
.IP *
These limits apply in addition to any other per-namespace
limits (such as those for PID and user namespaces) that may be enforced.
.IP *
Upon encountering these limits,
.BR clone (2)
and
.BR unshare (2)
fail with the error
.BR ENOSPC .
.IP *
For the initial user namespace,
the default value in each of these files is half the limit on the number
of threads that may be created
.RI ( /proc/sys/kernel/threads-max ).
In all descendant user namespaces, the default value in each file is
.BR MAXINT .
.IP *
When a namespace is created, the object is also accounted
against ancestor namespaces.
More precisely:
.RS
.IP + 3
Each user namespace has a creator UID.
.IP +
When a namespace is created,
it is accounted against the creator UIDs in each of the
ancestor user namespaces,
and the kernel ensures that the corresponding namespace limit
for the creator UID in the ancestor namespace is not exceeded.
.IP +
The aforementioned point ensures that creating a new user namespace
cannot be used as a means to escape the limits in force
in the current user namespace.
.RE
.PP
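The limits that apply in the caller's user namespace can be
inspected by reading the files described above.
The following shell function is a minimal sketch; the name
.I show_ns_limits
is hypothetical, and the set of files that it finds depends on
the running kernel.

.in +4n
.nf
```shell
# Hypothetical helper: print "FILE VALUE" for each namespace limit
# exposed in /proc/sys/user for the caller's user namespace.
# Prints nothing if the directory is absent (kernels before 4.9).
show_ns_limits() {
    for f in /proc/sys/user/max_*_namespaces; do
        [ -r "$f" ] || continue
        echo "${f##*/} $(cat "$f")"
    done
}
```
.fi
.in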
.\"
.\" ==================== Cgroup namespaces ====================
.\"
.SS Cgroup namespaces (CLONE_NEWCGROUP)
See
.BR cgroup_namespaces (7).
.\"
.\" ==================== IPC namespaces ====================
.\"
.SS IPC namespaces (CLONE_NEWIPC)
IPC namespaces isolate certain IPC resources,
namely, System V IPC objects (see
.BR svipc (7))
and (since Linux 2.6.30)
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
.\" https://lwn.net/Articles/312232/
POSIX message queues (see
.BR mq_overview (7)).
The common characteristic of these IPC mechanisms is that IPC
objects are identified by mechanisms other than filesystem
pathnames.

Each IPC namespace has its own set of System V IPC identifiers and
its own POSIX message queue filesystem.
Objects created in an IPC namespace are visible to all other processes
that are members of that namespace,
but are not visible to processes in other IPC namespaces.

The following
.I /proc
interfaces are distinct in each IPC namespace:
.IP * 3
The POSIX message queue interfaces in
.IR /proc/sys/fs/mqueue .
.IP *
The System V IPC interfaces in
.IR /proc/sys/kernel ,
namely:
.IR msgmax ,
.IR msgmnb ,
.IR msgmni ,
.IR sem ,
.IR shmall ,
.IR shmmax ,
.IR shmmni ,
and
.IR shm_rmid_forced .
.IP *
The System V IPC interfaces in
.IR /proc/sysvipc .
.PP
When an IPC namespace is destroyed
(i.e., when the last process that is a member of the namespace terminates),
all IPC objects in the namespace are automatically destroyed.

Use of IPC namespaces requires a kernel that is configured with the
.B CONFIG_IPC_NS
option.
.\"
.\" ==================== Network namespaces ====================
.\"
.SS Network namespaces (CLONE_NEWNET)
Network namespaces provide isolation of the system resources associated
with networking: network devices, IPv4 and IPv6 protocol stacks,
IP routing tables, firewalls, the
.I /proc/net
directory, the
.I /sys/class/net
directory, port numbers (sockets), and so on.
A physical network device can live in exactly one
network namespace.
A virtual network device ("veth") pair provides a pipe-like abstraction
.\" FIXME . Add pointer to veth(4) page when it is eventually completed
that can be used to create tunnels between network namespaces,
and can be used to create a bridge to a physical network device
in another namespace.

When a network namespace is freed
(i.e., when the last process in the namespace terminates),
its physical network devices are moved back to the
initial network namespace (not to the parent of the process).

Use of network namespaces requires a kernel that is configured with the
.B CONFIG_NET_NS
option.
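.PP
The veth pair mentioned above can be set up with
.BR ip (8).
The following shell function is a sketch only: the name
.I link_to_ns
and the interface names
.I veth0
and
.I veth1
are arbitrary assumptions, and the commands require privilege
(typically
.BR CAP_NET_ADMIN ).

.in +4n
.nf
```shell
# Hypothetical helper: create a veth pair and move one end into the
# network namespace of process PID, leaving veth0 in the caller's
# namespace.  The interface names are arbitrary.
# Usage: link_to_ns PID
link_to_ns() {
    ip link add veth0 type veth peer name veth1 &&
        ip link set veth1 netns "$1"
}
```
.fi
.in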
.\"
.\" ==================== Mount namespaces ====================
.\"
.SS Mount namespaces (CLONE_NEWNS)
See
.BR mount_namespaces (7).
.\"
.\" ==================== PID namespaces ====================
.\"
.SS PID namespaces (CLONE_NEWPID)
See
.BR pid_namespaces (7).
.\"
.\" ==================== User namespaces ====================
.\"
.SS User namespaces (CLONE_NEWUSER)
See
.BR user_namespaces (7).
.\"
.\" ==================== UTS namespaces ====================
.\"
.SS UTS namespaces (CLONE_NEWUTS)
UTS namespaces provide isolation of two system identifiers:
the hostname and the NIS domain name.
These identifiers are set using
.BR sethostname (2)
and
.BR setdomainname (2),
and can be retrieved using
.BR uname (2),
.BR gethostname (2),
and
.BR getdomainname (2).

Use of UTS namespaces requires a kernel that is configured with the
.B CONFIG_UTS_NS
option.
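.PP
The isolation of the hostname can be observed with
.BR unshare (1).
The following shell function is a minimal sketch; the name
.I hostname_in_new_uts
is hypothetical, and the sketch assumes that
.BR unshare (1)
and
.BR hostname (1)
are installed and that unprivileged user namespaces are permitted.

.in +4n
.nf
```shell
# Hypothetical helper: set the hostname to NAME inside new user and
# UTS namespaces, then print the hostname seen there.  The hostname
# in the caller's own UTS namespace is left unchanged.
# Usage: hostname_in_new_uts NAME
hostname_in_new_uts() {
    script='hostname "$1" && hostname'
    unshare --user --map-root-user --uts sh -c "$script" sh "$1"
}
```
.fi
.in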
.fi
.SH EXAMPLE
See
.BR user_namespaces (7).
.SH SEE ALSO
.BR nsenter (1),
.BR readlink (1),
.BR unshare (1),
.BR clone (2),
.BR ioctl_ns (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR cgroup_namespaces (7),
.BR cgroups (7),
.BR credentials (7),
.BR pid_namespaces (7),
.BR user_namespaces (7),
.BR ip-netns (8),
.BR lsns (8),
.BR switch_root (8)
447 .BR ip-netns (8),
448 .BR lsns (8),
449 .BR switch_root (8)