.\" ==================== Cgroup namespaces ====================
.\"
.SS Cgroup namespaces (CLONE_NEWCGROUP)
-Cgroup namespaces virtualize the view of a process's cgroups as seen via
-.IR /proc/[pid]/cgroup
-(see
-.BR cgroups (7)).
+Cgroup namespaces virtualize the view of a process's cgroups (see
+.BR cgroups (7))
+as seen via
+.IR /proc/[pid]/cgroup .
Each cgroup namespace has its own set of cgroup root directories,
which are the base points for the relative locations displayed in
.BR CLONE_NEWCGROUP
flag, then its current cgroups directories become its cgroup root directories.
(This applies both for the cgroups version 1 hierarchies
-as well as the cgroups version 2 unified hierarchy.)
+and the cgroups version 2 unified hierarchy.)
When viewing
.IR /proc/[pid]/cgroup ,
The following shell session demonstrates the effect of creating
a new cgroup namespace.
-First, we create child cgroup in the
+First, as superuser, we create a child cgroup in the
.I freezer
hierarchy, and put the shell into that cgroup:
.nf
.in +4n
-$ \fBsudo mkdir \-p /sys/fs/cgroup/freezer/sub\fP
-$ \fBecho $$\fP # Show PID of this shell
+# \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP
+# \fBecho $$\fP # Show PID of this shell
30655
-$ \fBsudo sh \-c 'echo 30655 > /sys/fs/cgroup/sub'\fP
-$ \fBcat /proc/self/cgroup | grep freezer\fP
+# \fBecho 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs\fP
+# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/sub
.in
.fi
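To make the /proc/[pid]/cgroup format concrete, here is a minimal C sketch of a parser for a single line such as 7:freezer:/sub. The line format (hierarchy-ID:controller-list:cgroup-path) is the one documented in cgroups(7). The names struct cgroup_entry and parse_cgroup_line are hypothetical, and the fixed buffer sizes are arbitrary assumptions, not kernel limits.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One entry of /proc/[pid]/cgroup, e.g. "7:freezer:/sub".
   Line format (see cgroups(7)): hierarchy-ID:controller-list:cgroup-path.
   The cgroups version 2 unified hierarchy appears as "0::/path".
   Buffer sizes below are arbitrary choices for this sketch. */
struct cgroup_entry {
    int  hierarchy_id;
    char controllers[256];  /* empty for the v2 unified hierarchy */
    char path[4096];        /* relative to the reader's cgroup namespace root */
};

/* Parse one line (trailing newline optional) into *e.
   Returns 0 on success, or -1 if the line is malformed or too long.
   Everything after the second colon is taken as the path. */
static int parse_cgroup_line(const char *line, struct cgroup_entry *e)
{
    const char *c1, *c2;
    char *end;
    size_t len;

    /* Hierarchy ID: decimal digits up to the first colon */
    if (!isdigit((unsigned char) line[0]))
        return -1;
    e->hierarchy_id = (int) strtol(line, &end, 10);
    c1 = end;
    if (*c1 != ':')
        return -1;

    /* Controller list: between the first and second colons */
    c2 = strchr(c1 + 1, ':');
    if (c2 == NULL)
        return -1;
    len = (size_t) (c2 - (c1 + 1));
    if (len >= sizeof(e->controllers))
        return -1;
    memcpy(e->controllers, c1 + 1, len);
    e->controllers[len] = '\0';

    /* Path: rest of the line, without any trailing newline */
    len = strcspn(c2 + 1, "\n");
    if (len >= sizeof(e->path))
        return -1;
    memcpy(e->path, c2 + 1, len);
    e->path[len] = '\0';
    return 0;
}
```

A caller could read /proc/self/cgroup with fgets(3) and pass each line to parse_cgroup_line; for the session above, the freezer entry would yield hierarchy ID 7, controller list freezer, and path /sub.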
Next, we use
.BR unshare (1)
-to create a process running a shell in new user and cgroup namespaces:
+to create a process running a shell in a new cgroup namespace:
.nf
.in +4n
-$ \fBunshare -U -C bash\fP
+# \fBunshare \-C bash\fP
.in
.fi
.in
.fi
-The virtualization provided by cgroup namespaces serves at least two purposes.
-First, it can be used to prevent
-information leaks whereby cgroup directory paths outside of
+Use of cgroup namespaces requires a kernel that is configured with the
+.B CONFIG_CGROUPS
+option.
+.PP
+Among the purposes served by the
+virtualization provided by cgroup namespaces are the following:
+.IP * 2
+It prevents information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
-More importantly, this allows easier and more flexible
+Such leaks could, for example,
+reveal information about the container framework
+to containerized applications.
+.IP *
+It allows easier and more flexible
confinement of container root tasks, because they can mount
-their own cgroup filesystems without needing to gain access to ancestor
+their own cgroup filesystems without gaining access to ancestor
cgroup directories.
-So, for example, even if
-.I /cg/1
-is owned by uid 100000, a task namespaced under
-.I /cg/1/2
-owned by UID 100000 can mount that cgroup but not change settings in
+Consider, for example, the following scenario:
+.RS 4
+.IP \(bu 2
+We have a cgroup directory,
+.IR /cg/1 ,
+that is owned by user ID 9000.
+.IP \(bu
+We have a process,
+.IR X ,
+also owned by user ID 9000,
+that is namespaced under the cgroup
+.I /cg/1/2
+(i.e.,
+.I X
+was placed in a new cgroup namespace via
+.BR clone (2)
+or
+.BR unshare (2)
+with the
+.B CLONE_NEWCGROUP
+flag).
+.RE
+.IP
+In the absence of cgroup namespacing, because the cgroup directory
+.I /cg/1
+is owned (and writable) by user ID 9000 and process
+.I X
+is also owned by user ID 9000, process
+.I X
+would be able to modify the contents
+of cgroup files (i.e., change cgroup settings) not only in
+.I /cg/1/2
+but also in the ancestor cgroup directory
.IR /cg/1 .
+Namespacing process
+.I X
+under the cgroup directory
+.I /cg/1/2
+prevents it from modifying files in
+.IR /cg/1 ,
+since it cannot even see the contents of that directory
+(or of more distant ancestor cgroup directories).
Combined with correct enforcement of hierarchical limits,
-this prevents that task from escaping its limits.
-
-Use of cgroup namespaces requires a kernel that is configured with the
-.B CONFIG_CGROUPS
-option.
+this prevents process
+.I X
+from escaping the limits imposed
+by ancestor cgroups.
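As a rough sketch of the mechanism described in this section, the following C functions place the caller in a new cgroup namespace with unshare(2) and then inspect /proc/self/cgroup. The helper names (show_cgroups, all_cgroup_paths_are_root, enter_new_cgroup_ns) are hypothetical. The fallback value 0x02000000 for CLONE_NEWCGROUP is the kernel's value and is used only if the C library headers lack the flag. The sketch assumes the caller has CAP_SYS_ADMIN; without it, unshare(2) is expected to fail, typically with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>      /* errno values reported by enter_new_cgroup_ns() */
#include <sched.h>
#include <stdio.h>
#include <string.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000   /* fallback: kernel's flag value */
#endif

/* Print /proc/self/cgroup, one "hierarchy-ID:controllers:path" per line. */
static void show_cgroups(const char *label)
{
    char line[4096];
    FILE *f = fopen("/proc/self/cgroup", "r");

    if (f == NULL) {
        perror("fopen /proc/self/cgroup");
        return;
    }
    printf("%s\n", label);
    while (fgets(line, sizeof(line), f) != NULL)
        printf("    %s", line);
    fclose(f);
}

/* Return 1 if every entry in /proc/self/cgroup has the path "/",
   0 if some entry does not, or -1 if the file cannot be opened. */
static int all_cgroup_paths_are_root(void)
{
    char line[4096];
    int result = 1;
    FILE *f = fopen("/proc/self/cgroup", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        /* The path is everything after the second colon */
        char *c1 = strchr(line, ':');
        char *c2 = (c1 != NULL) ? strchr(c1 + 1, ':') : NULL;
        if (c2 == NULL || strcmp(c2 + 1, "/") != 0)
            result = 0;
    }
    fclose(f);
    return result;
}

/* Place the caller in a new cgroup namespace (needs CAP_SYS_ADMIN).
   Returns 0 on success, or -1 with errno set (e.g., EPERM). */
static int enter_new_cgroup_ns(void)
{
    return unshare(CLONE_NEWCGROUP);
}
```

A caller could run show_cgroups("before"), then enter_new_cgroup_ns(), then show_cgroups("after"). On success, every entry should show the path /, because the caller's current cgroups have become the root directories of its new cgroup namespace; all_cgroup_paths_are_root() checks exactly that.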
.\"
.\" ==================== IPC namespaces ====================
.\"