1 .\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
2 .\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
3 .\"
4 .\" %%%LICENSE_START(VERBATIM)
5 .\" Permission is granted to make and distribute verbatim copies of this
6 .\" manual provided the copyright notice and this permission notice are
7 .\" preserved on all copies.
8 .\"
9 .\" Permission is granted to copy and distribute modified versions of this
10 .\" manual under the conditions for verbatim copying, provided that the
11 .\" entire resulting derived work is distributed under the terms of a
12 .\" permission notice identical to this one.
13 .\"
14 .\" Since the Linux kernel and libraries are constantly changing, this
15 .\" manual page may be incorrect or out-of-date. The author(s) assume no
16 .\" responsibility for errors or omissions, or for damages resulting from
17 .\" the use of the information contained herein. The author(s) may not
18 .\" have taken the same level of care in the production of this manual,
19 .\" which is licensed free of charge, as they might when working
20 .\" professionally.
21 .\"
22 .\" Formatted or processed versions of this manual, if unaccompanied by
23 .\" the source, must acknowledge the copyright and authors of this work.
24 .\" %%%LICENSE_END
25 .\"
26 .\"
27 .TH NAMESPACES 7 2016-03-15 "Linux" "Linux Programmer's Manual"
28 .SH NAME
29 namespaces \- overview of Linux namespaces
30 .SH DESCRIPTION
31 A namespace wraps a global system resource in an abstraction that
32 makes it appear to the processes within the namespace that they
33 have their own isolated instance of the global resource.
34 Changes to the global resource are visible to other processes
35 that are members of the namespace, but are invisible to other processes.
36 One use of namespaces is to implement containers.
37
38 Linux provides the following namespaces:
39 .TS
40 lB lB lB
41 l lB l.
42 Namespace	Constant	Isolates
43 Cgroup	CLONE_NEWCGROUP	Cgroup root directory
44 IPC	CLONE_NEWIPC	System V IPC, POSIX message queues
45 Network	CLONE_NEWNET	Network devices, stacks, ports, etc.
46 Mount	CLONE_NEWNS	Mount points
47 PID	CLONE_NEWPID	Process IDs
48 User	CLONE_NEWUSER	User and group IDs
49 UTS	CLONE_NEWUTS	Hostname and NIS domain name
50 .TE
51
52 This page describes the various namespaces and the associated
53 .I /proc
54 files, and summarizes the APIs for working with namespaces.
55 .\"
56 .\" ==================== The namespaces API ====================
57 .\"
58 .SS The namespaces API
59 As well as various
60 .I /proc
61 files described below,
62 the namespaces API includes the following system calls:
63 .TP
64 .BR clone (2)
65 The
66 .BR clone (2)
67 system call creates a new process.
68 If the
69 .I flags
70 argument of the call specifies one or more of the
71 .B CLONE_NEW*
72 flags listed below, then new namespaces are created for each flag,
73 and the child process is made a member of those namespaces.
74 (This system call also implements a number of features
75 unrelated to namespaces.)
76 .TP
77 .BR setns (2)
78 The
79 .BR setns (2)
80 system call allows the calling process to join an existing namespace.
81 The namespace to join is specified via a file descriptor that refers to
82 one of the
83 .IR /proc/[pid]/ns
84 files described below.
85 .TP
86 .BR unshare (2)
87 The
88 .BR unshare (2)
89 system call moves the calling process to a new namespace.
90 If the
91 .I flags
92 argument of the call specifies one or more of the
93 .B CLONE_NEW*
94 flags listed below, then new namespaces are created for each flag,
95 and the calling process is made a member of those namespaces.
96 (This system call also implements a number of features
97 unrelated to namespaces.)
98 .PP
99 Creation of new namespaces using
100 .BR clone (2)
101 and
102 .BR unshare (2)
103 in most cases requires the
104 .BR CAP_SYS_ADMIN
105 capability.
106 User namespaces are the exception: since Linux 3.8,
107 no privilege is required to create a user namespace.
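.PP
The way the
.B CLONE_NEW*
flags combine in a single
.BR unshare (2)
call can be sketched as follows.
This is a minimal illustration in Python, not part of the API itself:
the helper names are hypothetical, the flag values are those defined in
.IR <linux/sched.h> ,
and the call is made through the C library using ctypes.

```python
import ctypes
import os

# CLONE_NEW* flag values as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def combine_flags(names):
    """OR together the CLONE_NEW* flags for the given namespace names."""
    flags = 0
    for name in names:
        flags |= CLONE_FLAGS[name]
    return flags


def unshare_namespaces(names):
    """Call unshare(2) via the C library; raise OSError (e.g., EPERM) on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(combine_flags(names)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A call such as unshare_namespaces(["uts", "ipc"]) would be expected to fail
with EPERM in a process that lacks the required capability.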
108 .\"
109 .\" ==================== The /proc/[pid]/ns/ directory ====================
110 .\"
111 .SS The /proc/[pid]/ns/ directory
112 Each process has a
113 .IR /proc/[pid]/ns/
114 .\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
115 subdirectory containing one entry for each namespace that
116 supports being manipulated by
117 .BR setns (2):
118
119 .in +4n
120 .nf
121 $ \fBls -l /proc/$$/ns\fP
122 total 0
123 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup -> cgroup:[4026531835]
124 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc -> ipc:[4026531839]
125 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt -> mnt:[4026531840]
126 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net -> net:[4026531969]
127 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid -> pid:[4026531836]
128 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user -> user:[4026531837]
129 lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts -> uts:[4026531838]
130 .fi
131 .in
132
133 Bind mounting (see
134 .BR mount (2))
135 one of the files in this directory
136 to somewhere else in the filesystem keeps
137 the corresponding namespace of the process specified by
138 .I pid
139 alive even if all processes currently in the namespace terminate.
140
141 Opening one of the files in this directory
142 (or a file that is bind mounted to one of these files)
143 returns a file handle for
144 the corresponding namespace of the process specified by
145 .IR pid .
146 As long as this file descriptor remains open,
147 the namespace will remain alive,
148 even if all processes in the namespace terminate.
149 The file descriptor can be passed to
150 .BR setns (2).
151
152 In Linux 3.7 and earlier, these files were visible as hard links.
153 Since Linux 3.8, they appear as symbolic links.
154 If two processes are in the same namespace, then the inode numbers of their
155 .IR /proc/[pid]/ns/xxx
156 symbolic links will be the same; an application can check this using the
157 .I stat.st_ino
158 field returned by
159 .BR stat (2).
160 The content of this symbolic link is a string containing
161 the namespace type and inode number as in the following example:
162
163 .in +4n
164 .nf
165 $ \fBreadlink /proc/$$/ns/uts\fP
166 uts:[4026531838]
167 .fi
168 .in
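.PP
The inode comparison and the link format described above
can be sketched in Python as follows.
The function names are hypothetical, and the sketch assumes that
.I /proc
is mounted and that the caller has permission to inspect both processes.

```python
import os
import re


def parse_ns_link(target):
    """Split a link target such as 'uts:[4026531838]' into (type, inode)."""
    match = re.fullmatch(r"([a-z_]+):\[(\d+)\]", target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def same_namespace(pid_a, pid_b, ns_type):
    """Return True if both processes are in the same namespace of ns_type."""
    # stat(2) follows the symbolic link, so st_ino identifies the namespace.
    st_a = os.stat("/proc/%d/ns/%s" % (pid_a, ns_type))
    st_b = os.stat("/proc/%d/ns/%s" % (pid_b, ns_type))
    return st_a.st_ino == st_b.st_ino
```

For example, parse_ns_link(os.readlink("/proc/self/ns/uts")) yields the
namespace type and inode number of the caller's UTS namespace.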
169
170 The files in this subdirectory are as follows:
171 .TP
172 .IR /proc/[pid]/ns/cgroup " (since Linux 4.6)"
173 This file is a handle for the cgroup namespace of the process.
174 .TP
175 .IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
176 This file is a handle for the IPC namespace of the process.
177 .TP
178 .IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
179 This file is a handle for the mount namespace of the process.
180 .TP
181 .IR /proc/[pid]/ns/net " (since Linux 3.0)"
182 This file is a handle for the network namespace of the process.
183 .TP
184 .IR /proc/[pid]/ns/pid " (since Linux 3.8)"
185 This file is a handle for the PID namespace of the process.
186 .TP
187 .IR /proc/[pid]/ns/user " (since Linux 3.8)"
188 This file is a handle for the user namespace of the process.
189 .TP
190 .IR /proc/[pid]/ns/uts " (since Linux 3.0)"
191 This file is a handle for the UTS namespace of the process.
192 .\"
193 .\" ==================== Cgroup namespaces ====================
194 .\"
195 .SS Cgroup namespaces (CLONE_NEWCGROUP)
196 Cgroup namespaces virtualize the view of a process's cgroups as seen via
197 .IR /proc/[pid]/cgroup
198 (see
199 .BR cgroups (7)).
200
201 Each cgroup namespace has its own set of cgroup root directories,
202 which are the base points for the relative locations displayed in
203 .IR /proc/[pid]/cgroup .
204 When a process creates a new cgroup namespace using
205 .BR clone (2)
206 or
207 .BR unshare (2)
208 with the
209 .BR CLONE_NEWCGROUP
210 flag, the process's current cgroups directories become
211 the cgroup root directories of the new namespace.
212 (This applies both to the cgroups version 1 hierarchies
213 and to the cgroups version 2 unified hierarchy.)
213
214 When viewing
215 .IR /proc/[pid]/cgroup ,
216 the pathname shown in the third field of each record will be
217 relative to the reading process's cgroup root directory.
218 If the cgroup directory of the target process lies outside
219 the cgroup root directory for this namespace,
220 then the pathname will show
221 .I /..
222 entries for each ancestor level in the cgroup hierarchy.
223
224 The following shell session demonstrates the effect of creating
225 a new cgroup namespace.
226 First, we create a child cgroup in the
227 .I freezer
228 hierarchy, and put the shell into that cgroup:
229
230 .nf
231 .in +4n
232 $ \fBsudo mkdir \-p /sys/fs/cgroup/freezer/sub\fP
233 $ \fBecho $$\fP # Show PID of this shell
234 30655
235 $ \fBsudo sh \-c 'echo 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs'\fP
236 $ \fBcat /proc/self/cgroup | grep freezer\fP
237 7:freezer:/sub
238 .in
239 .fi
240
241 Next, we use
242 .BR unshare (1)
243 to create a process running a shell in new user and cgroup namespaces:
244
245 .nf
246 .in +4n
247 $ \fBunshare -U -C bash\fP
248 .in
249 .fi
250
251 We then inspect the
252 .IR /proc/[pid]/cgroup
253 files of, respectively, the new shell process started by the
254 .BR unshare (1)
255 command, a process that is in the original cgroup namespace
256 .RI ( init ,
257 with PID 1), and a process in a sibling cgroup:
258
259 .nf
260 .in +4n
261 $ \fBcat /proc/self/cgroup | grep freezer\fP
262 7:freezer:/
263 $ \fBcat /proc/1/cgroup | grep freezer\fP
264 7:freezer:/..
265 $ \fBcat /proc/20124/cgroup | grep freezer\fP
266 7:freezer:/../sub2
267 .in
268 .fi
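.PP
Records such as those shown in the session above
can be taken apart with a few lines of Python.
This is a sketch only: the function names are hypothetical,
and it assumes the three colon-separated fields
(hierarchy ID, controller list, pathname) described in
.BR cgroups (7).

```python
def parse_cgroup_record(line):
    """Parse one /proc/[pid]/cgroup record into (id, controllers, path)."""
    # Split on the first two colons only; the pathname may contain colons.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = controllers.split(",") if controllers else []
    return int(hierarchy_id), names, path


def levels_above_root(path):
    """Count the leading '/..' entries in a cgroup pathname."""
    count = 0
    for part in path.split("/")[1:]:
        if part != "..":
            break
        count += 1
    return count
```

A nonzero result from levels_above_root() indicates that the target process's
cgroup directory lies outside the reader's cgroup root directory.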
269
270 The virtualization provided by cgroup namespaces can be used to prevent
271 information leaks whereby cgroup directory paths outside of
272 a container would otherwise be visible to processes in the container.
273
274 Use of cgroup namespaces requires a kernel that is configured with the
275 .B CONFIG_CGROUPS
276 option.
277 .\"
278 .\" ==================== IPC namespaces ====================
279 .\"
280 .SS IPC namespaces (CLONE_NEWIPC)
281 IPC namespaces isolate certain IPC resources,
282 namely, System V IPC objects (see
283 .BR svipc (7))
284 and (since Linux 2.6.30)
285 .\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
286 .\" https://lwn.net/Articles/312232/
287 POSIX message queues (see
288 .BR mq_overview (7)).
289 The common characteristic of these IPC mechanisms is that IPC
290 objects are identified by mechanisms other than filesystem
291 pathnames.
292
293 Each IPC namespace has its own set of System V IPC identifiers and
294 its own POSIX message queue filesystem.
295 Objects created in an IPC namespace are visible to all other processes
296 that are members of that namespace,
297 but are not visible to processes in other IPC namespaces.
298
299 The following
300 .I /proc
301 interfaces are distinct in each IPC namespace:
302 .IP * 3
303 The POSIX message queue interfaces in
304 .IR /proc/sys/fs/mqueue .
305 .IP *
306 The System V IPC interfaces in
307 .IR /proc/sys/kernel ,
308 namely:
309 .IR msgmax ,
310 .IR msgmnb ,
311 .IR msgmni ,
312 .IR sem ,
313 .IR shmall ,
314 .IR shmmax ,
315 .IR shmmni ,
316 and
317 .IR shm_rmid_forced .
318 .IP *
319 The System V IPC interfaces in
320 .IR /proc/sysvipc .
321 .PP
322 When an IPC namespace is destroyed
323 (i.e., when the last process that is a member of the namespace terminates),
324 all IPC objects in the namespace are automatically destroyed.
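.PP
Because the System V IPC files listed above are distinct in each IPC namespace,
a process can read them to see the limits that apply in its own namespace.
The following Python sketch (with a hypothetical function name)
reads whichever of the listed files exist and skips the rest.

```python
import os

SYSV_IPC_FILES = ("msgmax", "msgmnb", "msgmni", "sem",
                  "shmall", "shmmax", "shmmni", "shm_rmid_forced")


def read_sysv_ipc_limits(base="/proc/sys/kernel"):
    """Return {name: list of values} for the System V IPC files that exist."""
    limits = {}
    for name in SYSV_IPC_FILES:
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = f.read().split()
        except OSError:
            continue  # file absent or unreadable; leave it out of the result
    return limits
```

Values are returned as lists of strings because some files, such as sem,
hold several numbers on one line.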
325
326 Use of IPC namespaces requires a kernel that is configured with the
327 .B CONFIG_IPC_NS
328 option.
329 .\"
330 .\" ==================== Network namespaces ====================
331 .\"
332 .SS Network namespaces (CLONE_NEWNET)
333 Network namespaces provide isolation of the system resources associated
334 with networking: network devices, IPv4 and IPv6 protocol stacks,
335 IP routing tables, firewalls, the
336 .I /proc/net
337 directory, the
338 .I /sys/class/net
339 directory, port numbers (sockets), and so on.
340 A physical network device can live in exactly one
341 network namespace.
342 A virtual network device ("veth") pair provides a pipe-like abstraction
343 .\" FIXME Add pointer to veth(4) page when it is eventually completed
344 that can be used to create tunnels between network namespaces,
345 and can be used to create a bridge to a physical network device
346 in another namespace.
347
348 When a network namespace is freed
349 (i.e., when the last process in the namespace terminates),
350 its physical network devices are moved back to the
351 initial network namespace (not to the parent of the process).
352
353 Use of network namespaces requires a kernel that is configured with the
354 .B CONFIG_NET_NS
355 option.
356 .\"
357 .\" ==================== Mount namespaces ====================
358 .\"
359 .SS Mount namespaces (CLONE_NEWNS)
360 Mount namespaces isolate the set of filesystem mount points,
361 meaning that processes in different mount namespaces can
362 have different views of the filesystem hierarchy.
363 The set of mounts in a mount namespace is modified using
364 .BR mount (2)
365 and
366 .BR umount (2).
367
368 The
369 .IR /proc/[pid]/mounts
370 file (present since Linux 2.4.19)
371 lists all the filesystems currently mounted in the
372 process's mount namespace.
373 The format of this file is documented in
374 .BR fstab (5).
375 Since kernel version 2.6.15, this file is pollable:
376 after opening the file for reading, a change in this file
377 (i.e., a filesystem mount or unmount) causes
378 .BR select (2)
379 to mark the file descriptor as readable, and
380 .BR poll (2)
381 and
382 .BR epoll_wait (2)
383 mark the file as having an error condition.
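.PP
The polling behavior described above can be sketched in Python using
.BR poll (2).
The function name and the default timeout are hypothetical;
the sketch registers only for the error and priority conditions,
so it waits until a mount or unmount occurs or the timeout expires.

```python
import select


def wait_for_mount_change(path="/proc/self/mounts", timeout_ms=1000):
    """Return True if a mount table change was reported within timeout_ms."""
    with open(path) as mounts:
        poller = select.poll()
        # A mount or unmount is reported as an error/priority condition.
        poller.register(mounts.fileno(), select.POLLERR | select.POLLPRI)
        return bool(poller.poll(timeout_ms))
```

After a change is reported, a caller would typically reread the file from
the start to obtain the new list of mounts.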
384
385 The
386 .IR /proc/[pid]/mountstats
387 file (present since Linux 2.6.17)
388 exports information (statistics, configuration information)
389 about the mount points in the process's mount namespace.
390 This file is readable only by the owner of the process.
391 Lines in this file have the form:
392 .RS
393 .in 12
394 .nf
395
396 device /dev/sda7 mounted on /home with fstype ext3 [statistics]
397 (       1      )            ( 2 )             (3 ) (4)
398 .fi
399 .in
400
401 The fields in each line are:
402 .TP 5
403 (1)
404 The name of the mounted device
405 (or "nodevice" if there is no corresponding device).
406 .TP
407 (2)
408 The mount point within the filesystem tree.
409 .TP
410 (3)
411 The filesystem type.
412 .TP
413 (4)
414 Optional statistics and configuration information.
415 Currently (as at Linux 2.6.26), only NFS filesystems export
416 information via this field.
417 .RE
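.PP
The four fields can be extracted with a regular expression,
as in the following Python sketch.
The names used here are hypothetical,
and the pattern assumes that each line has exactly the form shown above;
lines in any other form are rejected.

```python
import re

MOUNTSTATS_RE = re.compile(
    r"device (?P<device>\S+) mounted on (?P<mount_point>\S+)"
    r" with fstype (?P<fstype>\S+)(?: (?P<extra>.*))?$"
)


def parse_mountstats_line(line):
    """Return the four fields of a mountstats line as a dict."""
    match = MOUNTSTATS_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized mountstats line: %r" % line)
    return match.groupdict()
```

The extra field is None when a line carries no statistics
or configuration information.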
418 .\"
419 .\" ==================== PID namespaces ====================
420 .\"
421 .SS PID namespaces (CLONE_NEWPID)
422 See
423 .BR pid_namespaces (7).
424 .\"
425 .\" ==================== User namespaces ====================
426 .\"
427 .SS User namespaces (CLONE_NEWUSER)
428 See
429 .BR user_namespaces (7).
430 .\"
431 .\" ==================== UTS namespaces ====================
432 .\"
433 .SS UTS namespaces (CLONE_NEWUTS)
434 UTS namespaces provide isolation of two system identifiers:
435 the hostname and the NIS domain name.
436 These identifiers are set using
437 .BR sethostname (2)
438 and
439 .BR setdomainname (2),
440 and can be retrieved using
441 .BR uname (2),
442 .BR gethostname (2),
443 and
444 .BR getdomainname (2).
445
446 Use of UTS namespaces requires a kernel that is configured with the
447 .B CONFIG_UTS_NS
448 option.
449 .SH CONFORMING TO
450 Namespaces are a Linux-specific feature.
451 .SH EXAMPLE
452 See
453 .BR user_namespaces (7).
454 .SH SEE ALSO
455 .BR nsenter (1),
456 .BR readlink (1),
457 .BR unshare (1),
458 .BR clone (2),
459 .BR setns (2),
460 .BR unshare (2),
461 .BR proc (5),
462 .BR capabilities (7),
463 .BR cgroups (7),
464 .BR credentials (7),
465 .BR pid_namespaces (7),
466 .BR user_namespaces (7),
467 .BR lsns (8),
468 .BR switch_root (8)