.\" Copyright (c) 2016 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH CGROUP_NAMESPACES 7 2016-07-17 "Linux" "Linux Programmer's Manual"
.SH NAME
cgroup_namespaces \- overview of Linux cgroup namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).

Cgroup namespaces virtualize the view of a process's cgroups (see
.BR cgroups (7))
as seen via
.IR /proc/[pid]/cgroup
and
.IR /proc/[pid]/mountinfo .

Each cgroup namespace has its own set of cgroup root directories,
which are the base points for the relative locations displayed in
.IR /proc/[pid]/cgroup .
When a process creates a new cgroup namespace using
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag, its current cgroup directories become the cgroup root directories
of the new namespace.
(This applies both to the cgroups version 1 hierarchies
and to the cgroups version 2 unified hierarchy.)
With
.BR unshare (2),
the calling process itself enters the new namespace; with
.BR clone (2),
it is the new child process that is placed in it.
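
As a minimal illustration of the
.BR unshare (2)
case, the following C sketch moves the caller into a new cgroup namespace
and then copies
.I /proc/self/cgroup
to a stream.
It assumes a C library whose
.I <sched.h>
defines
.BR CLONE_NEWCGROUP ,
and a caller that holds the
.B CAP_SYS_ADMIN
capability; the function name
.I enter_new_cgroup_ns
is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: move the calling process into a new cgroup
   namespace, then copy /proc/self/cgroup to "out".
   Returns 0 on success, or -errno on failure (for example, -EPERM
   when the caller lacks CAP_SYS_ADMIN). */
int enter_new_cgroup_ns(FILE *out)
{
    FILE *in;
    int ch;

    if (unshare(CLONE_NEWCGROUP) == -1)
        return -errno;

    /* The caller's current cgroups are now the namespace roots,
       so each record's pathname field should read "/". */
    in = fopen("/proc/self/cgroup", "r");
    if (in == NULL)
        return -errno;
    while ((ch = fgetc(in)) != EOF)
        fputc(ch, out);
    fclose(in);
    return 0;
}
```

Without
.BR CAP_SYS_ADMIN ,
the
.BR unshare (2)
call fails with
.BR EPERM ,
which the sketch reports as \-EPERM.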
54 | ||
55 | When viewing | |
56 | .IR /proc/[pid]/cgroup , | |
57 | the pathname shown in the third field of each record will be | |
58 | relative to the reading process's cgroup root directory. | |
59 | If the cgroup directory of the target process lies outside | |
60 | the root directory of the reading process's cgroup namespace, | |
61 | then the pathname will show | |
62 | .I ../ | |
63 | entries for each ancestor level in the cgroup hierarchy. | |
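
The rule above can be modeled in user space.
The following C sketch is an illustration, not the kernel's implementation:
the function
.I cgns_relative_path
and its helpers are hypothetical, and both arguments are assumed to be absolute,
normalized cgroup pathnames (no
.I .
or
.I ..
components).
It skips the leading components that the namespace root and the target cgroup share,
emits one
.I /..
for each remaining component of the root,
and then appends the rest of the target's pathname.

```c
#include <stddef.h>
#include <string.h>

/* Skip leading slashes in *p and return the length of the next
   pathname component (0 at the end of the string). */
static size_t next_component(const char **p)
{
    while (**p == '/')
        (*p)++;
    return strcspn(*p, "/");
}

/* Append n bytes of s to out, keeping out NUL-terminated. */
static int append(char *out, size_t outlen, size_t *used,
                  const char *s, size_t n)
{
    if (*used + n + 1 > outlen)
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = 0;
    return 0;
}

/* Hypothetical model: how the cgroup pathname "cg" is displayed to a
   reader whose cgroup namespace root is "root".
   Returns 0 on success, or -1 if "out" is too small. */
int cgns_relative_path(const char *root, const char *cg,
                       char *out, size_t outlen)
{
    size_t used = 0, n;

    if (outlen == 0)
        return -1;
    out[0] = 0;

    /* Skip the leading components that both pathnames share. */
    for (;;) {
        const char *r = root, *c = cg;
        size_t rl = next_component(&r), cl = next_component(&c);

        if (rl == 0 || rl != cl || strncmp(r, c, rl) != 0)
            break;
        root = r + rl;
        cg = c + cl;
    }

    /* Each remaining root component is one ancestor level: "/..". */
    while ((n = next_component(&root)) != 0) {
        if (append(out, outlen, &used, "/..", 3) == -1)
            return -1;
        root += n;
    }

    /* Then descend to the target cgroup. */
    while ((n = next_component(&cg)) != 0) {
        if (append(out, outlen, &used, "/", 1) == -1 ||
            append(out, outlen, &used, cg, n) == -1)
            return -1;
        cg += n;
    }

    /* The namespace root itself is displayed as "/". */
    if (used == 0)
        return append(out, outlen, &used, "/", 1);
    return 0;
}
```

With a root of
.IR /sub ,
the targets
.IR /sub ,
.IR / ,
and
.I /sub2
yield
.IR / ,
.IR /.. ,
and
.IR /../sub2 ,
the same values that appear in the
.I freezer
example on this page.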
64 | ||
65 | The following shell session demonstrates the effect of creating | |
66 | a new cgroup namespace. | |
67 | First, (as superuser) we create a child cgroup in the | |
68 | .I freezer | |
69 | hierarchy, and put the shell into that cgroup: | |
70 | ||
71 | .nf | |
72 | .in +4n | |
73 | # \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP | |
74 | # \fBecho $$\fP # Show PID of this shell | |
75 | 30655 | |
76 | # \fBsh \-c 'echo 30655 > /sys/fs/cgroup/sub'\fP | |
77 | # \fBcat /proc/self/cgroup | grep freezer\fP | |
78 | 7:freezer:/sub | |
79 | .in | |
80 | .fi | |
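
Moving a process into a cgroup amounts to writing its PID to the
.I cgroup.procs
file in the cgroup's directory.
A C version of that step might look like the following sketch;
the function name
.I move_pid_to_cgroup
is hypothetical, and error reporting is reduced to a \-1 return.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: move process "pid" into the cgroup whose
   directory is "cgdir" by writing the PID to its cgroup.procs file.
   Returns 0 on success, or -1 on error. */
int move_pid_to_cgroup(const char *cgdir, pid_t pid)
{
    char path[4096];
    FILE *f;
    int n;

    n = snprintf(path, sizeof path, "%s/cgroup.procs", cgdir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    /* The write reaches the kernel when the stream is flushed,
       so both fprintf() and fclose() are checked for errors. */
    if (fprintf(f, "%ld", (long)pid) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For the session above, the arguments would be
.I /sys/fs/cgroup/freezer/sub
and 30655.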
81 | ||
82 | Next, we use | |
83 | .BR unshare (1) | |
84 | to create a process running a new shell in new cgroup and mount namespaces: | |
85 | ||
86 | .nf | |
87 | .in +4n | |
88 | # \fBunshare \-Cm bash\fP | |
89 | .in | |
90 | .fi | |
91 | ||
92 | We then inspect the | |
93 | .IR /proc/[pid]/cgroup | |
94 | files of, respectively, the new shell process started by the | |
95 | .BR unshare (1) | |
96 | command, a process that is in the original cgroup namespace | |
97 | .RI ( init , | |
98 | with PID 1), and a process in a sibling cgroup: | |
99 | ||
100 | .nf | |
101 | .in +4n | |
102 | $ \fBcat /proc/self/cgroup | grep freezer\fP | |
103 | 7:freezer:/ | |
104 | $ \fBcat /proc/1/cgroup | grep freezer\fP | |
105 | 7:freezer:/.. | |
106 | $ \fBcat /proc/20124/cgroup | grep freezer\fP | |
107 | 7:freezer:/../sub2 | |
108 | .in | |
109 | .fi | |
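
Each of these records has the form
.IR hierarchy-ID : controller-list : cgroup-path ,
and it is the last field that is shown relative to the reader's cgroup root.
The following C sketch splits such a record; the structure and function names
.RI ( cgroup_record ,
.IR parse_cgroup_record )
are hypothetical.
Everything after the second colon is treated as the pathname,
and trailing white space (such as the newline left by
.BR fgets (3))
is removed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One record of /proc/[pid]/cgroup:
   hierarchy-ID:controller-list:cgroup-path */
struct cgroup_record {
    long hierarchy_id;
    char controllers[256];
    char path[4096];
};

/* Hypothetical parser: split "line" into its three colon-separated
   fields.  Returns 0 on success, or -1 if the line is malformed or
   a field does not fit. */
int parse_cgroup_record(const char *line, struct cgroup_record *rec)
{
    const char *ctl, *path;
    char *end;
    size_t clen, plen;

    rec->hierarchy_id = strtol(line, &end, 10);
    if (end == line || *end != ':')
        return -1;

    ctl = end + 1;                 /* start of controller list */
    path = strchr(ctl, ':');       /* second colon */
    if (path == NULL)
        return -1;
    clen = (size_t)(path - ctl);
    path++;                        /* pathname is the rest of the line */

    plen = strlen(path);
    while (plen > 0 && isspace((unsigned char)path[plen - 1]))
        plen--;

    if (clen >= sizeof rec->controllers || plen >= sizeof rec->path)
        return -1;
    memcpy(rec->controllers, ctl, clen);
    rec->controllers[clen] = 0;
    memcpy(rec->path, path, plen);
    rec->path[plen] = 0;
    return 0;
}
```

For the cgroups version 2 hierarchy, records have the form
.IR 0::path ,
so the controller list comes back empty.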
110 | ||
111 | However, when we look in | |
112 | .IR /proc/self/mountinfo | |
113 | we see the following anomaly: | |
114 | ||
115 | .nf | |
116 | .in +4n | |
117 | # \fBcat /proc/self/mountinfo | grep freezer\fP | |
118 | 155 145 0:32 /.. /sys/fs/cgroup/freezer ... | |
119 | .in | |
120 | .fi | |
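
The fourth (root) field of a
.I mountinfo
record can be extracted as in the following C sketch.
The function name
.I mountinfo_root
is hypothetical.
The sketch treats runs of spaces as field separators and copies the field verbatim,
without decoding the octal escapes that the kernel uses for special characters
such as spaces (see
.BR proc (5)).

```c
#include <string.h>

/* Hypothetical helper: copy the fourth space-separated field ("root")
   of a /proc/[pid]/mountinfo record into "out".  Octal escapes are
   left undecoded.  Returns 0 on success, or -1 on a short line or a
   too-small buffer. */
int mountinfo_root(const char *line, char *out, size_t outlen)
{
    const char *p = line;
    size_t len;

    /* Skip fields 1 to 3: mount ID, parent ID, major:minor. */
    for (int field = 1; field < 4; field++) {
        p += strspn(p, " ");       /* separators before the field */
        p += strcspn(p, " ");      /* the field itself */
    }

    p += strspn(p, " ");
    len = strcspn(p, " ");
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, p, len);
    out[len] = 0;
    return 0;
}
```

Applied to the record above, it yields
.IR /.. ;
applied to the corresponding record after the remount shown below, it yields
.IR / .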
121 | ||
e1b70806 | 122 | The fourth field of this file should show the |
c736cecc MK |
123 | directory in the cgroup filesystem which forms the root of this mount. |
124 | Since by the definition of cgroup namespaces, the process's current | |
125 | freezer cgroup directory became its root freezer cgroup directory, | |
126 | we should see \(aq/\(aq in this field. | |
127 | The problem here is that we are seeing a mount entry for the cgroup | |
128 | filesystem corresponding to our initial shell process's cgroup namespace | |
129 | (whose cgroup filesystem is indeed rooted in the parent directory of | |
130 | .IR sub ). | |
131 | We need to remount the freezer cgroup filesystem | |
132 | inside this cgroup namespace, after which we see the expected results: | |
133 | ||
134 | .nf | |
135 | .in +4n | |
3011d629 | 136 | # \fBmount \-\-make\-rslave /\fP # Don't propagate mount events |
c736cecc | 137 | # to other namespaces |
3011d629 MK |
138 | # \fBumount /sys/fs/cgroup/freezer\fP |
139 | # \fBmount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer\fP | |
140 | # \fBcat /proc/self/mountinfo | grep freezer\fP | |
c736cecc MK |
141 | 155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ... |
142 | .in | |
143 | .fi | |
144 | ||
145 | Use of cgroup namespaces requires a kernel that is configured with the | |
146 | .B CONFIG_CGROUPS | |
147 | option. | |
148 | .\" | |
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH NOTES
Among the purposes served by the
virtualization provided by cgroup namespaces are the following:
.IP * 2
It prevents information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
Such leakages could, for example,
reveal information about the container framework
to containerized applications.
.IP *
It eases tasks such as container migration.
The virtualization provided by cgroup namespaces
allows containers to be isolated from knowledge of
the pathnames of ancestor cgroups.
Without such isolation, the full cgroup pathnames (displayed in
.IR /proc/self/cgroup )
would need to be replicated on the target system when migrating a container;
those pathnames would also need to be unique,
so that they don't conflict with other pathnames on the target system.
.IP *
It allows better confinement of containerized processes,
because it is possible to mount the container's cgroup filesystems such that
the container processes can't gain access to ancestor cgroup directories.
Consider, for example, the following scenario:
.RS 4
.IP \(bu 2
We have a cgroup directory,
.IR /cg/1 ,
that is owned by user ID 9000.
.IP \(bu
We have a process,
.IR X ,
also owned by user ID 9000,
that is namespaced under the cgroup
.IR /cg/1/2
(i.e.,
.I X
was placed in a new cgroup namespace via
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag).
.RE
.IP
In the absence of cgroup namespacing, because the cgroup directory
.IR /cg/1
is owned (and writable) by user ID 9000 and process
.I X
is also owned by user ID 9000, process
.I X
would be able to modify the contents of cgroup files
(i.e., change cgroup settings) not only in
.IR /cg/1/2
but also in the ancestor cgroup directory
.IR /cg/1 .
Namespacing process
.IR X
under the cgroup directory
.IR /cg/1/2 ,
in combination with suitable mount operations
for the cgroup filesystem (as shown above),
prevents it from modifying files in
.IR /cg/1 ,
since it cannot even see the contents of that directory
(or of further removed cgroup ancestor directories).
Combined with correct enforcement of hierarchical limits,
this prevents process
.I X
from escaping the limits imposed by ancestor cgroups.
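.PP
The visibility rule in this scenario can be modeled in user space.
In the following C sketch, the function name
.I cgroup_path_visible
is hypothetical, and pathnames are assumed to be absolute and normalized
(no trailing slash, no
.I .
or
.I ..
components).
A pathname counts as reachable only if it is the namespace root or lies below it;
components are compared whole, so that
.I /cg/1/20
is not treated as a descendant of
.IR /cg/1/2 .
This models only what the process can see once the cgroup filesystem
has been mounted inside its namespace; it is not the kernel's mechanism.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: with the cgroup filesystem mounted inside the
   namespace, "ns_root" is the top of the visible tree, so only that
   directory and its descendants can be reached. */
bool cgroup_path_visible(const char *ns_root, const char *path)
{
    size_t n = strlen(ns_root);

    /* A root of "/" makes the whole hierarchy visible. */
    if (strcmp(ns_root, "/") == 0)
        return true;
    if (strncmp(path, ns_root, n) != 0)
        return false;

    /* Match whole components: /cg/1/2 covers /cg/1/2/x, not /cg/1/20. */
    return path[n] == 0 || path[n] == '/';
}
```

For process
.IR X ,
with a namespace root of
.IR /cg/1/2 ,
the sketch reports
.I /cg/1/2
and its descendants as reachable, and
.I /cg/1
as not reachable.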
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR cgroups (7),
.BR credentials (7),
.BR namespaces (7),
.BR user_namespaces (7)