Commit | Line | Data |
---|---|---|
c736cecc MK |
1 | .\" Copyright (c) 2016 by Michael Kerrisk <mtk.manpages@gmail.com> |
2 | .\" | |
3 | .\" %%%LICENSE_START(VERBATIM) | |
4 | .\" Permission is granted to make and distribute verbatim copies of this | |
5 | .\" manual provided the copyright notice and this permission notice are | |
6 | .\" preserved on all copies. | |
7 | .\" | |
8 | .\" Permission is granted to copy and distribute modified versions of this | |
9 | .\" manual under the conditions for verbatim copying, provided that the | |
10 | .\" entire resulting derived work is distributed under the terms of a | |
11 | .\" permission notice identical to this one. | |
12 | .\" | |
13 | .\" Since the Linux kernel and libraries are constantly changing, this | |
14 | .\" manual page may be incorrect or out-of-date. The author(s) assume no | |
15 | .\" responsibility for errors or omissions, or for damages resulting from | |
16 | .\" the use of the information contained herein. The author(s) may not | |
17 | .\" have taken the same level of care in the production of this manual, | |
18 | .\" which is licensed free of charge, as they might when working | |
19 | .\" professionally. | |
20 | .\" | |
21 | .\" Formatted or processed versions of this manual, if unaccompanied by | |
22 | .\" the source, must acknowledge the copyright and authors of this work. | |
23 | .\" %%%LICENSE_END | |
24 | .\" | |
25 | .\" | |
f55a6d59 | 26 | .TH CGROUP_NAMESPACES 7 2017-07-13 "Linux" "Linux Programmer's Manual" |
c736cecc MK |
27 | .SH NAME |
28 | cgroup_namespaces \- overview of Linux cgroup namespaces | |
29 | .SH DESCRIPTION | |
30 | For an overview of namespaces, see | |
31 | .BR namespaces (7). | |
40749137 | 32 | .PP |
c736cecc MK |
33 | Cgroup namespaces virtualize the view of a process's cgroups (see |
34 | .BR cgroups (7)) | |
35 | as seen via | |
36 | .IR /proc/[pid]/cgroup | |
37 | and | |
38 | .IR /proc/[pid]/mountinfo . | |
40749137 | 39 | .PP |
aa864d82 MK |
40 | Each cgroup namespace has its own set of cgroup root directories. |
41 | These root directories are the base points for the relative | |
42 | locations displayed in the corresponding records in the | |
43 | .IR /proc/[pid]/cgroup | |
44 | file. | |
c736cecc MK |
45 | When a process creates a new cgroup namespace using |
46 | .BR clone (2) | |
47 | or | |
48 | .BR unshare (2) | |
49 | with the | |
50 | .BR CLONE_NEWCGROUP | |
29179416 MK |
51 | flag, it enters a new cgroup namespace in which its current |
52 | cgroup directories become the cgroup root directories | |
53 | of the new namespace. | |
c736cecc MK |
54 | (This applies both to the cgroups version 1 hierarchies |
55 | and the cgroups version 2 unified hierarchy.) | |
40749137 | 56 | .PP |
c736cecc MK |
57 | When viewing |
58 | .IR /proc/[pid]/cgroup , | |
59 | the pathname shown in the third field of each record will be | |
aa864d82 MK |
60 | relative to the reading process's root directory |
61 | for the corresponding cgroup hierarchy. | |
c736cecc MK |
62 | If the cgroup directory of the target process lies outside |
63 | the root directory of the reading process's cgroup namespace, | |
64 | then the pathname will show | |
65 | .I ../ | |
66 | entries for each ancestor level in the cgroup hierarchy. | |
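The display rule described above can be modeled in a few lines. This is only a sketch of the observable behavior, not kernel code; the function name `displayed_cgroup_path` and its parameters are hypothetical, and both arguments are assumed to be absolute paths within the same cgroup hierarchy.

```python
import posixpath


def displayed_cgroup_path(target: str, ns_root: str) -> str:
    """Return the path a reader would see for `target`.

    `target` is the absolute cgroup path of the inspected process and
    `ns_root` is the cgroup root of the reader's cgroup namespace
    (both names are illustrative assumptions).
    """
    rel = posixpath.relpath(target, ns_root)
    # relpath() yields "." when both paths name the same directory;
    # that cgroup is displayed as the namespace root, "/".
    return "/" if rel == "." else "/" + rel


if __name__ == "__main__":
    # Reader's namespace is rooted at /sub in some hierarchy.
    for target in ("/sub", "/", "/sub2"):
        print(target, "->", displayed_cgroup_path(target, "/sub"))
```

A target outside the namespace root produces one `..` component per ancestor level that has to be climbed, which matches the description in the preceding paragraph.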
40749137 | 67 | .PP |
c736cecc MK |
68 | The following shell session demonstrates the effect of creating |
69 | a new cgroup namespace. | |
70 | First (as superuser), we create a child cgroup in the | |
71 | .I freezer | |
72 | hierarchy, and put the shell into that cgroup: | |
40749137 | 73 | .PP |
32bc5a71 | 74 | .EX |
c736cecc | 75 | .in +4n |
e646a1ba | 76 | .nf |
c736cecc MK |
77 | # \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP |
78 | # \fBecho $$\fP # Show PID of this shell | |
79 | 30655 | |
3a9ef8b7 | 80 | # \fBsh \-c \(aqecho 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs\(aq\fP |
c736cecc MK |
81 | # \fBcat /proc/self/cgroup | grep freezer\fP |
82 | 7:freezer:/sub | |
c736cecc | 83 | .fi |
e646a1ba | 84 | .in |
32bc5a71 | 85 | .EE |
40749137 | 86 | .PP |
c736cecc MK |
87 | Next, we use |
88 | .BR unshare (1) | |
89 | to create a process running a new shell in new cgroup and mount namespaces: | |
40749137 | 90 | .PP |
c736cecc | 91 | .nf |
32bc5a71 | 92 | .EX |
c736cecc MK |
93 | .in +4n |
94 | # \fBunshare \-Cm bash\fP | |
95 | .in | |
32bc5a71 | 96 | .EE |
c736cecc | 97 | .fi |
40749137 | 98 | .PP |
c736cecc MK |
99 | We then inspect the |
100 | .IR /proc/[pid]/cgroup | |
101 | files of, respectively, the new shell process started by the | |
102 | .BR unshare (1) | |
103 | command, a process that is in the original cgroup namespace | |
104 | .RI ( init , | |
aa864d82 MK |
105 | with PID 1), and a process in a sibling cgroup |
106 | .RI ( sub2 ): | |
40749137 | 107 | .PP |
c736cecc | 108 | .nf |
32bc5a71 | 109 | .EX |
c736cecc MK |
110 | .in +4n |
111 | # \fBcat /proc/self/cgroup | grep freezer\fP | |
112 | 7:freezer:/ | |
113 | # \fBcat /proc/1/cgroup | grep freezer\fP | |
114 | 7:freezer:/.. | |
115 | # \fBcat /proc/20124/cgroup | grep freezer\fP | |
116 | 7:freezer:/../sub2 | |
117 | .in | |
32bc5a71 | 118 | .EE |
c736cecc | 119 | .fi |
89cbd279 MK |
120 | .PP |
121 | From the output of the first command, | |
122 | we see that the freezer cgroup membership of the new shell | |
123 | (which is in the same cgroup as the initial shell) | |
124 | is shown relative to the freezer cgroup root directory | |
125 | that was established when the new cgroup namespace was created. | |
126 | (In absolute terms, | |
127 | the new shell is in the | |
128 | .I /sub | |
129 | freezer cgroup, | |
130 | and the root directory of the freezer cgroup hierarchy | |
131 | in the new cgroup namespace is also | |
132 | .IR /sub . | |
133 | Thus, the new shell's cgroup membership is displayed as \(aq/\(aq.) | |
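For scripts that need to inspect these records, the three colon-separated fields (hierarchy ID, controller list, pathname) can be split as in the following sketch. The names `CgroupRecord` and `parse_cgroup_record` are hypothetical helpers introduced for illustration; the sample record is taken from the session above.

```python
from typing import List, NamedTuple


class CgroupRecord(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str  # relative to the reader's cgroup namespace root


def parse_cgroup_record(line: str) -> CgroupRecord:
    """Split one /proc/[pid]/cgroup record into its three fields."""
    # Split on the first two colons only, so that any colon in the
    # pathname stays part of the third field.
    hier, controllers, path = line.rstrip("\n").split(":", 2)
    # The controller list is empty for the cgroups v2 unified hierarchy.
    names = controllers.split(",") if controllers else []
    return CgroupRecord(int(hier), names, path)


if __name__ == "__main__":
    print(parse_cgroup_record("7:freezer:/../sub2"))
```

A path that begins with `/..` indicates that the inspected process lies outside the reader's cgroup namespace root for that hierarchy.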
134 | .PP | |
c736cecc MK |
135 | However, when we look in |
136 | .IR /proc/self/mountinfo | |
137 | we see the following anomaly: | |
40749137 | 138 | .PP |
c736cecc | 139 | .nf |
32bc5a71 | 140 | .EX |
c736cecc MK |
141 | .in +4n |
142 | # \fBcat /proc/self/mountinfo | grep freezer\fP | |
143 | 155 145 0:32 /.. /sys/fs/cgroup/freezer ... | |
144 | .in | |
32bc5a71 | 145 | .EE |
c736cecc | 146 | .fi |
40749137 | 147 | .PP |
aa864d82 MK |
148 | The fourth field of this line |
149 | .RI ( /.. ) | |
150 | should show the | |
c736cecc MK |
151 | directory in the cgroup filesystem which forms the root of this mount. |
152 | Since, by the definition of cgroup namespaces, the process's current | |
153 | freezer cgroup directory became its root freezer cgroup directory, | |
154 | we should see \(aq/\(aq in this field. | |
155 | The problem here is that we are seeing a mount entry for the cgroup | |
156 | filesystem corresponding to our initial shell process's cgroup namespace | |
157 | (whose cgroup filesystem is indeed rooted in the parent directory of | |
158 | .IR sub ). | |
159 | We need to remount the freezer cgroup filesystem | |
160 | inside this cgroup namespace, after which we see the expected results: | |
40749137 | 161 | .PP |
c736cecc | 162 | .nf |
32bc5a71 | 163 | .EX |
c736cecc | 164 | .in +4n |
3011d629 | 165 | # \fBmount \-\-make\-rslave /\fP # Don't propagate mount events |
c736cecc | 166 | # to other namespaces |
3011d629 MK |
167 | # \fBumount /sys/fs/cgroup/freezer\fP |
168 | # \fBmount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer\fP | |
169 | # \fBcat /proc/self/mountinfo | grep freezer\fP | |
c736cecc MK |
170 | 155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ... |
171 | .in | |
32bc5a71 | 172 | .EE |
c736cecc | 173 | .fi |
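The check performed by eye in the two sessions above could be scripted roughly as follows. This is a sketch under assumptions: `mount_root` and `rooted_outside_namespace` are hypothetical helper names, and treating a root of `/..` (or one starting with `/../`) as "mounted from an ancestor cgroup namespace" is a heuristic derived from the example, not a documented interface.

```python
def mount_root(mountinfo_line: str) -> str:
    """Return the fourth field (root of the mount) of a mountinfo line."""
    return mountinfo_line.split()[3]


def rooted_outside_namespace(root: str) -> bool:
    """Heuristic: does the mount root point above our cgroup root?"""
    return root == "/.." or root.startswith("/../")


if __name__ == "__main__":
    # Leading fields only; the rest of each line is omitted here.
    before = "155 145 0:32 /.. /sys/fs/cgroup/freezer"
    after = "155 145 0:32 / /sys/fs/cgroup/freezer"
    for line in (before, after):
        root = mount_root(line)
        print(root, rooted_outside_namespace(root))
```

If the heuristic reports a root outside the namespace, the cgroup filesystem was mounted from another cgroup namespace and would need to be remounted as shown above.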
40749137 | 174 | .PP |
c736cecc MK |
175 | Use of cgroup namespaces requires a kernel that is configured with the |
176 | .B CONFIG_CGROUPS | |
177 | option. | |
178 | .\" | |
e664450b MK |
179 | .SH CONFORMING TO |
180 | Namespaces are a Linux-specific feature. | |
c736cecc MK |
181 | .SH NOTES |
182 | Among the purposes served by the | |
183 | virtualization provided by cgroup namespaces are the following: | |
184 | .IP * 2 | |
185 | It prevents information leaks whereby cgroup directory paths outside of | |
186 | a container would otherwise be visible to processes in the container. | |
187 | Such leakages could, for example, | |
188 | reveal information about the container framework | |
189 | to containerized applications. | |
190 | .IP * | |
10b547c5 MK |
191 | It eases tasks such as container migration. |
192 | The virtualization provided by cgroup namespaces | |
193 | allows containers to be isolated from knowledge of | |
194 | the pathnames of ancestor cgroups. | |
0191a7b9 MK |
195 | Without such isolation, the full cgroup pathnames (displayed in |
196 | .IR /proc/self/cgroup ) | |
197 | would need to be replicated on the target system when migrating a container; | |
10b547c5 MK |
198 | those pathnames would also need to be unique, |
199 | so that they don't conflict with other pathnames on the target system. | |
200 | .IP * | |
a531b2cf | 201 | It allows better confinement of containerized processes, |
a2b7dba5 MK |
202 | because it is possible to mount the container's cgroup filesystems such that |
203 | the container processes can't gain access to ancestor cgroup directories. | |
c736cecc MK |
204 | Consider, for example, the following scenario: |
205 | .RS 4 | |
206 | .IP \(bu 2 | |
207 | We have a cgroup directory, | |
208 | .IR /cg/1 , | |
209 | that is owned by user ID 9000. | |
210 | .IP \(bu | |
211 | We have a process, | |
212 | .IR X , | |
213 | also owned by user ID 9000, | |
214 | that is namespaced under the cgroup | |
215 | .IR /cg/1/2 | |
216 | (i.e., | |
217 | .I X | |
218 | was placed in a new cgroup namespace via | |
219 | .BR clone (2) | |
220 | or | |
221 | .BR unshare (2) | |
222 | with the | |
223 | .BR CLONE_NEWCGROUP | |
224 | flag). | |
225 | .RE | |
226 | .IP | |
227 | In the absence of cgroup namespacing, because the cgroup directory | |
228 | .IR /cg/1 | |
ef6f9539 | 229 | is owned (and writable) by UID 9000 and process |
bcedc0c2 MK |
230 | .I X |
231 | is also owned by user ID 9000, process | |
232 | .I X | |
233 | would be able to modify the contents of cgroups files | |
234 | (i.e., change cgroup settings) not only in | |
c736cecc MK |
235 | .IR /cg/1/2 |
236 | but also in the ancestor cgroup directory | |
237 | .IR /cg/1 . | |
238 | Namespacing process | |
239 | .IR X | |
240 | under the cgroup directory | |
cc267b37 MK |
241 | .IR /cg/1/2 , |
242 | in combination with suitable mount operations | |
243 | for the cgroup filesystem (as shown above), | |
c736cecc MK |
244 | prevents it from modifying files in |
245 | .IR /cg/1 , | |
246 | since it cannot even see the contents of that directory | |
247 | (or of further removed cgroup ancestor directories). | |
248 | Combined with correct enforcement of hierarchical limits, | |
2a785d2a MK |
249 | this prevents process |
250 | .I X | |
251 | from escaping the limits imposed by ancestor cgroups. | |
c736cecc MK |
252 | .SH SEE ALSO |
253 | .BR unshare (1), | |
254 | .BR clone (2), | |
255 | .BR setns (2), | |
256 | .BR unshare (2), | |
257 | .BR proc (5), | |
258 | .BR cgroups (7), | |
259 | .BR credentials (7), | |
61256f9f | 260 | .BR namespaces (7), |
c736cecc | 261 | .BR user_namespaces (7) |