Commit | Line | Data |
---|---|---|
c736cecc MK |
1 | .\" Copyright (c) 2016 by Michael Kerrisk <mtk.manpages@gmail.com> |
2 | .\" | |
5fbde956 | 3 | .\" SPDX-License-Identifier: Linux-man-pages-copyleft |
c736cecc MK |
4 | .\" |
5 | .\" | |
4c1c5274 | 6 | .TH cgroup_namespaces 7 (date) "Linux man-pages (unreleased)" |
c736cecc MK |
7 | .SH NAME |
8 | cgroup_namespaces \- overview of Linux cgroup namespaces | |
9 | .SH DESCRIPTION | |
10 | For an overview of namespaces, see | |
11 | .BR namespaces (7). | |
40749137 | 12 | .PP |
c736cecc MK |
13 | Cgroup namespaces virtualize the view of a process's cgroups (see |
14 | .BR cgroups (7)) | |
15 | as seen via | |
1ae6b2c7 | 16 | .IR /proc/ pid /cgroup |
c736cecc | 17 | and |
1ae6b2c7 | 18 | .IR /proc/ pid /mountinfo . |
40749137 | 19 | .PP |
aa864d82 MK |
20 | Each cgroup namespace has its own set of cgroup root directories. |
21 | These root directories are the base points for the relative | |
22 | locations displayed in the corresponding records in the | |
1ae6b2c7 | 23 | .IR /proc/ pid /cgroup |
aa864d82 | 24 | file. |
c736cecc MK |
25 | When a process creates a new cgroup namespace using |
26 | .BR clone (2) | |
27 | or | |
28 | .BR unshare (2) | |
29 | with the | |
1ae6b2c7 | 30 | .B CLONE_NEWCGROUP |
ef129697 | 31 | flag, its current |
29179416 MK |
32 | cgroups directories become the cgroup root directories |
33 | of the new namespace. | |
c736cecc MK |
34 | (This applies to both the cgroups version 1 hierarchies |
35 | and the cgroups version 2 unified hierarchy.) | |
40749137 | 36 | .PP |
727e5609 | 37 | When reading the cgroup memberships of a "target" process from |
1ae6b2c7 | 38 | .IR /proc/ pid /cgroup , |
c736cecc | 39 | the pathname shown in the third field of each record will be |
aa864d82 MK |
40 | relative to the reading process's root directory |
41 | for the corresponding cgroup hierarchy. | |
c736cecc MK |
42 | If the cgroup directory of the target process lies outside |
43 | the root directory of the reading process's cgroup namespace, | |
44 | then the pathname will show | |
45 | .I ../ | |
46 | entries for each ancestor level in the cgroup hierarchy. | |
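The relative-path rule described above can be modeled with ordinary path arithmetic. The following is a minimal sketch, not the kernel's implementation: the function name cgroup_path_as_seen is hypothetical, and it assumes both arguments are absolute paths within a single cgroup hierarchy.

```python
import posixpath


def cgroup_path_as_seen(ns_root: str, target: str) -> str:
    """Model the third field of a /proc/pid/cgroup record.

    ns_root: cgroup root directory of the reading process's cgroup
             namespace (absolute path within the hierarchy).
    target:  cgroup directory of the target process (absolute path
             within the same hierarchy).

    Hypothetical helper for illustration only.
    """
    rel = posixpath.relpath(target, start=ns_root)
    # relpath() yields "." when the target sits exactly at the
    # namespace root; that case is displayed as "/".
    return "/" if rel == "." else "/" + rel


if __name__ == "__main__":
    # A reader whose namespace is rooted at /sub looks at three targets.
    for target in ("/sub", "/", "/sub2"):
        print(target, "->", cgroup_path_as_seen("/sub", target))
```

With a namespace rooted at /sub, a target in /sub is shown as "/", a target in the hierarchy root as "/..", and a target in the sibling cgroup /sub2 as "/../sub2".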
40749137 | 47 | .PP |
c736cecc MK |
48 | The following shell session demonstrates the effect of creating |
49 | a new cgroup namespace. | |
c9a35b01 | 50 | .PP |
727e5609 MK |
51 | First (as superuser), in a shell in the initial cgroup namespace, |
52 | we create a child cgroup in the | |
c736cecc | 53 | .I freezer |
c9a35b01 MK |
54 | hierarchy, and place a process in that cgroup that we will |
55 | use as part of the demonstration below: | |
56 | .PP | |
57 | .in +4n | |
58 | .EX | |
59 | # \fBmkdir \-p /sys/fs/cgroup/freezer/sub2\fP | |
60 | # \fBsleep 10000 &\fP # Create a process that lives for a while | |
61 | [1] 20124 | |
62 | # \fBecho 20124 > /sys/fs/cgroup/freezer/sub2/cgroup.procs\fP | |
63 | .EE | |
64 | .in | |
65 | .PP | |
66 | We then create another child cgroup in the | |
67 | .I freezer | |
68 | hierarchy and put the shell into that cgroup: | |
40749137 | 69 | .PP |
c736cecc | 70 | .in +4n |
b8302363 | 71 | .EX |
c736cecc MK |
72 | # \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP |
73 | # \fBecho $$\fP # Show PID of this shell | |
74 | 30655 | |
e39f614f | 75 | # \fBecho 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs\fP |
c736cecc MK |
76 | # \fBcat /proc/self/cgroup | grep freezer\fP |
77 | 7:freezer:/sub | |
b8302363 | 78 | .EE |
e646a1ba | 79 | .in |
40749137 | 80 | .PP |
c736cecc MK |
81 | Next, we use |
82 | .BR unshare (1) | |
83 | to create a process running a new shell in new cgroup and mount namespaces: | |
40749137 | 84 | .PP |
c736cecc | 85 | .in +4n |
146842f9 | 86 | .EX |
f3da99c4 | 87 | # \fBPS1="sh2# " unshare \-Cm bash\fP |
32bc5a71 | 88 | .EE |
146842f9 | 89 | .in |
40749137 | 90 | .PP |
727e5609 MK |
91 | From the new shell started by |
92 | .BR unshare (1), | |
93 | we then inspect the | |
1ae6b2c7 | 94 | .IR /proc/ pid /cgroup |
727e5609 MK |
95 | files of, respectively, the new shell, |
96 | a process that is in the initial cgroup namespace | |
c736cecc | 97 | .RI ( init , |
c9a35b01 | 98 | with PID 1), and the process in the sibling cgroup |
aa864d82 | 99 | .RI ( sub2 ): |
40749137 | 100 | .PP |
c736cecc | 101 | .in +4n |
146842f9 | 102 | .EX |
f3da99c4 | 103 | sh2# \fBcat /proc/self/cgroup | grep freezer\fP |
c736cecc | 104 | 7:freezer:/ |
f3da99c4 | 105 | sh2# \fBcat /proc/1/cgroup | grep freezer\fP |
c736cecc | 106 | 7:freezer:/.. |
f3da99c4 | 107 | sh2# \fBcat /proc/20124/cgroup | grep freezer\fP |
c736cecc | 108 | 7:freezer:/../sub2 |
32bc5a71 | 109 | .EE |
146842f9 | 110 | .in |
89cbd279 MK |
111 | .PP |
112 | From the output of the first command, | |
113 | we see that the freezer cgroup membership of the new shell | |
114 | (which is in the same cgroup as the initial shell) | |
115 | is shown relative to the freezer cgroup root directory | |
116 | that was established when the new cgroup namespace was created. | |
117 | (In absolute terms, | |
118 | the new shell is in the | |
119 | .I /sub | |
120 | freezer cgroup, | |
121 | and the root directory of the freezer cgroup hierarchy | |
122 | in the new cgroup namespace is also | |
123 | .IR /sub . | |
124 | Thus, the new shell's cgroup membership is displayed as \(aq/\(aq.) | |
125 | .PP | |
c736cecc | 126 | However, when we look in |
1ae6b2c7 | 127 | .I /proc/self/mountinfo |
c736cecc | 128 | we see the following anomaly: |
40749137 | 129 | .PP |
c736cecc | 130 | .in +4n |
146842f9 | 131 | .EX |
f3da99c4 | 132 | sh2# \fBcat /proc/self/mountinfo | grep freezer\fP |
c736cecc | 133 | 155 145 0:32 /.. /sys/fs/cgroup/freezer ... |
32bc5a71 | 134 | .EE |
146842f9 | 135 | .in |
40749137 | 136 | .PP |
aa864d82 MK |
137 | The fourth field of this line |
138 | .RI ( /.. ) | |
139 | should show the | |
c736cecc MK |
140 | directory in the cgroup filesystem which forms the root of this mount. |
141 | Since, by the definition of cgroup namespaces, the process's current | |
142 | freezer cgroup directory became its root freezer cgroup directory, | |
143 | we should see \(aq/\(aq in this field. | |
144 | The problem here is that we are seeing a mount entry for the cgroup | |
727e5609 MK |
145 | filesystem corresponding to the initial cgroup namespace |
146 | (whose cgroup filesystem is indeed rooted at the parent directory of | |
c736cecc | 147 | .IR sub ). |
727e5609 MK |
148 | To fix this problem, we must remount the freezer cgroup filesystem |
149 | from the new shell (i.e., perform the mount from a process that is in the | |
150 | new cgroup namespace), after which we see the expected results: | |
40749137 | 151 | .PP |
c736cecc | 152 | .in +4n |
146842f9 | 153 | .EX |
861d36ba | 154 | sh2# \fBmount \-\-make\-rslave /\fP # Don\(aqt propagate mount events |
f3da99c4 MK |
155 | # to other namespaces |
156 | sh2# \fBumount /sys/fs/cgroup/freezer\fP | |
157 | sh2# \fBmount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer\fP | |
158 | sh2# \fBcat /proc/self/mountinfo | grep freezer\fP | |
c736cecc | 159 | 155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ... |
32bc5a71 | 160 | .EE |
146842f9 | 161 | .in |
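The check performed by eye in the session above (is the fourth field of the mountinfo record "/"?) could be scripted. The following is a rough sketch under stated assumptions: the helper names mount_root and needs_remount are hypothetical, and the code assumes that fields in a mountinfo record are separated by whitespace, with the root of the mount in the fourth field as described above.

```python
def mount_root(mountinfo_line: str) -> str:
    """Return the fourth field (root of the mount within its
    filesystem) of one /proc/pid/mountinfo record."""
    fields = mountinfo_line.split()
    return fields[3]


def needs_remount(mountinfo_line: str) -> bool:
    """Hypothetical check: True if the cgroup mount is not rooted at
    the cgroup root directory of the viewing namespace, that is, the
    root field is something other than "/"."""
    return mount_root(mountinfo_line) != "/"


if __name__ == "__main__":
    # The two records shown in the shell session above.
    before = "155 145 0:32 /.. /sys/fs/cgroup/freezer ..."
    after = "155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ..."
    print(needs_remount(before), needs_remount(after))
```

Applied to the two records from the session, the sketch flags the first (root field "/..") and accepts the second (root field "/").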
c736cecc | 162 | .\" |
3113c7f3 | 163 | .SH STANDARDS |
e664450b | 164 | Namespaces are a Linux-specific feature. |
c736cecc | 165 | .SH NOTES |
d190902b MK |
166 | Use of cgroup namespaces requires a kernel that is configured with the |
167 | .B CONFIG_CGROUPS | |
168 | option. | |
169 | .PP | |
4d9b3039 | 170 | The virtualization provided by cgroup namespaces serves a number of purposes: |
22356d97 | 171 | .IP \(bu 3 |
c736cecc MK |
172 | It prevents information leaks whereby cgroup directory paths outside of |
173 | a container would otherwise be visible to processes in the container. | |
174 | Such leakages could, for example, | |
175 | reveal information about the container framework | |
176 | to containerized applications. | |
22356d97 | 177 | .IP \(bu |
10b547c5 MK |
178 | It eases tasks such as container migration. |
179 | The virtualization provided by cgroup namespaces | |
180 | allows containers to be isolated from knowledge of | |
181 | the pathnames of ancestor cgroups. | |
0191a7b9 MK |
182 | Without such isolation, the full cgroup pathnames (displayed in |
183 | .IR /proc/self/cgroup ) | |
184 | would need to be replicated on the target system when migrating a container; | |
10b547c5 MK |
185 | those pathnames would also need to be unique, |
186 | so that they don't conflict with other pathnames on the target system. | |
22356d97 | 187 | .IP \(bu |
a531b2cf | 188 | It allows better confinement of containerized processes, |
a2b7dba5 MK |
189 | because it is possible to mount the container's cgroup filesystems such that |
190 | the container processes can't gain access to ancestor cgroup directories. | |
c736cecc | 191 | Consider, for example, the following scenario: |
22356d97 AC |
192 | .RS |
193 | .IP \(bu 3 | |
c736cecc MK |
194 | We have a cgroup directory, |
195 | .IR /cg/1 , | |
196 | that is owned by user ID 9000. | |
197 | .IP \(bu | |
198 | We have a process, | |
199 | .IR X , | |
200 | also owned by user ID 9000, | |
201 | that is namespaced under the cgroup | |
1ae6b2c7 | 202 | .I /cg/1/2 |
c736cecc MK |
203 | (i.e., |
204 | .I X | |
205 | was placed in a new cgroup namespace via | |
206 | .BR clone (2) | |
207 | or | |
208 | .BR unshare (2) | |
209 | with the | |
1ae6b2c7 | 210 | .B CLONE_NEWCGROUP |
c736cecc MK |
211 | flag). |
212 | .RE | |
213 | .IP | |
214 | In the absence of cgroup namespacing, because the cgroup directory | |
1ae6b2c7 | 215 | .I /cg/1 |
ef6f9539 | 216 | is owned (and writable) by UID 9000 and process |
bcedc0c2 | 217 | .I X |
80c5b48d | 218 | is also owned by user ID 9000, process |
bcedc0c2 MK |
219 | .I X |
220 | would be able to modify the contents of cgroups files | |
221 | (i.e., change cgroup settings) not only in | |
1ae6b2c7 | 222 | .I /cg/1/2 |
c736cecc MK |
223 | but also in the ancestor cgroup directory |
224 | .IR /cg/1 . | |
225 | Namespacing process | |
1ae6b2c7 | 226 | .I X |
c736cecc | 227 | under the cgroup directory |
cc267b37 MK |
228 | .IR /cg/1/2 , |
229 | in combination with suitable mount operations | |
230 | for the cgroup filesystem (as shown above), | |
c736cecc MK |
231 | prevents it from modifying files in |
232 | .IR /cg/1 , | |
233 | since it cannot even see the contents of that directory | |
234 | (or of further removed cgroup ancestor directories). | |
235 | Combined with correct enforcement of hierarchical limits, | |
2a785d2a MK |
236 | this prevents process |
237 | .I X | |
238 | from escaping the limits imposed by ancestor cgroups. | |
c736cecc MK |
239 | .SH SEE ALSO |
240 | .BR unshare (1), | |
241 | .BR clone (2), | |
242 | .BR setns (2), | |
243 | .BR unshare (2), | |
244 | .BR proc (5), | |
245 | .BR cgroups (7), | |
246 | .BR credentials (7), | |
61256f9f | 247 | .BR namespaces (7), |
c736cecc | 248 | .BR user_namespaces (7) |