.\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com>
.\" and Copyright (C) 2016, 2017 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH CGROUPS 7 2017-09-15 "Linux" "Linux Programmer's Manual"
.SH NAME
cgroups \- Linux control groups
.SH DESCRIPTION
Control groups, usually referred to as cgroups,
are a Linux kernel feature which allows processes to
be organized into hierarchical groups whose usage of
various types of resources can then be limited and monitored.
The kernel's cgroup interface is provided through
a pseudo-filesystem called cgroupfs.
Grouping is implemented in the core cgroup kernel code,
while resource tracking and limits are implemented in
a set of per-resource-type subsystems (memory, CPU, and so on).
.\"
40.SS Terminology
41A
42.I cgroup
43is a collection of processes that are bound to a set of
44limits or parameters defined via the cgroup filesystem.
a721e8b2 45.PP
46A
47.I subsystem
48is a kernel component that modifies the behavior of
49the processes in a cgroup.
50Various subsystems have been implemented, making it possible to do things
51such as limiting the amount of CPU time and memory available to a cgroup,
52accounting for the CPU time used by a cgroup,
53and freezing and resuming execution of the processes in a cgroup.
54Subsystems are sometimes also known as
55.IR "resource controllers"
56(or simply, controllers).
a721e8b2 57.PP
55f52de8 58The cgroups for a controller are arranged in a
59.IR hierarchy .
60This hierarchy is defined by creating, removing, and
61renaming subdirectories within the cgroup filesystem.
62At each level of the hierarchy, attributes (e.g., limits) can be defined.
63The limits, control, and accounting provided by cgroups generally have
64effect throughout the subhierarchy underneath the cgroup where the
65attributes are defined.
66Thus, for example, the limits placed on
67a cgroup at a higher level in the hierarchy cannot be exceeded
68by descendant cgroups.
176a4211 69.\"
70.SS Cgroups version 1 and version 2
71The initial release of the cgroups implementation was in Linux 2.6.24.
55f52de8 72Over time, various cgroup controllers have been added
43df1ab3 73to allow the management of various types of resources.
74However, the development of these controllers was largely uncoordinated,
75with the result that many inconsistencies arose between controllers
76and management of the cgroup hierarchies became rather complex.
77(A longer description of these problems can be found in
78the kernel source file
0a837899 79.IR Documentation/cgroup\-v2.txt .)
a721e8b2 80.PP
81Because of the problems with the initial cgroups implementation
82(cgroups version 1),
83starting in Linux 3.10, work began on a new,
84orthogonal implementation to remedy these problems.
85Initially marked experimental, and hidden behind the
86.I "\-o\ __DEVEL__sane_behavior"
87mount option, the new version (cgroups version 2)
88was eventually made official with the release of Linux 4.5.
89Differences between the two versions are described in the text below.
a721e8b2 90.PP
91Although cgroups v2 is intended as a replacement for cgroups v1,
92the older system continues to exist
93(and for compatibility reasons is unlikely to be removed).
94Currently, cgroups v2 implements only a subset of the controllers
95available in cgroups v1.
96The two systems are implemented so that both v1 controllers and
97v2 controllers can be mounted on the same system.
98Thus, for example, it is possible to use those controllers
99that are supported under version 2,
100while also using version 1 controllers
101where version 2 does not yet support those controllers.
102The only restriction here is that a controller can't be simultaneously
103employed in both a cgroups v1 hierarchy and in the cgroups v2 hierarchy.
43df1ab3 104.\"
5714ccee 105.SH CGROUPS VERSION 1
106Under cgroups v1, each controller may be mounted against a separate
107cgroup filesystem that provides its own hierarchical organization of the
108processes on the system.
It is also possible to comount multiple (or even all) cgroups v1 controllers
110against the same cgroup filesystem, meaning that the comounted controllers
111manage the same hierarchical organization of processes.
a721e8b2 112.PP
113For each mounted hierarchy,
114the directory tree mirrors the control group hierarchy.
115Each control group is represented by a directory, with each of its child
control groups represented as a child directory.
117For instance,
118.IR /user/joe/1.session
119represents control group
120.IR 1.session ,
121which is a child of cgroup
122.IR joe ,
123which is a child of
124.IR /user .
125Under each cgroup directory is a set of files which can be read or
126written to, reflecting resource limits and a few general cgroup
127properties.
a721e8b2 128.PP
8bff7140 129In addition, in cgroups v1,
55f52de8 130cgroups can be mounted with no bound controller, in which case
8bff7140 131they serve only to track processes.
59dabd75 132(See the discussion of release notification below.)
133An example of this is the
134.I name=systemd
135cgroup which is used by
136.BR systemd (1)
137to track services and user sessions.
138.\"
6398ca15 139.SS Tasks (threads) versus processes
140In cgroups v1, a distinction is drawn between
141.I processes
142and
143.IR tasks .
144In this view, a process can consist of multiple tasks
145(more commonly called threads, from a user-space perspective,
146and called such in the remainder of this man page).
0ec74e08 147In cgroups v1, it is possible to independently manipulate
6398ca15 148the cgroup memberships of the threads in a process.
149Because this ability caused certain problems,
150.\" FIXME Add some text describing why this was a problem.
151the ability to independently manipulate the cgroup memberships
6398ca15 152of the threads in a process has been removed in cgroups v2.
153Cgroups v2 allows manipulation of cgroup membership only for processes
154(which has the effect of changing the cgroup membership of
6398ca15 155all threads in the process).
c775bca2 156.\"
157.SS Mounting v1 controllers
158The use of cgroups requires a kernel built with the
.BR CONFIG_CGROUPS
160option.
161In addition, each of the v1 controllers has an associated
162configuration option that must be set in order to employ that controller.
a721e8b2 163.PP
164In order to use a v1 controller,
165it must be mounted against a cgroup filesystem.
166The usual place for such mounts is under a
167.BR tmpfs (5)
168filesystem mounted at
169.IR /sys/fs/cgroup .
170Thus, one might mount the
171.I cpu
172controller as follows:
.PP
.in +4n
.EX
mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu
.EE
.in
.PP
180It is possible to comount multiple controllers against the same hierarchy.
181For example, here the
182.IR cpu
21f0d132 183and
184.IR cpuacct
185controllers are comounted against a single hierarchy:
.PP
.in +4n
.EX
mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
.EE
.in
.PP
55f52de8 193Comounting controllers has the effect that a process is in the same cgroup for
77e0a626 194all of the comounted controllers.
55f52de8 195Separately mounting controllers allows a process to
196be in cgroup
197.I /foo1
55f52de8 198for one controller while being in
199.I /foo2/foo3
200for another.
a721e8b2 201.PP
77e0a626 202It is possible to comount all v1 controllers against the same hierarchy:
.PP
.in +4n
.EX
mount \-t cgroup \-o all cgroup /sys/fs/cgroup
.EE
.in
.PP
210(One can achieve the same result by omitting
211.IR "\-o all" ,
212since it is the default if no controllers are explicitly specified.)
a721e8b2 213.PP
214It is not possible to mount the same controller
215against multiple cgroup hierarchies.
216For example, it is not possible to mount both the
217.I cpu
218and
219.I cpuacct
220controllers against one hierarchy, and to mount the
221.I cpu
222controller alone against another hierarchy.
223It is possible to create multiple mount points with exactly
224the same set of comounted controllers.
225However, in this case all that results is multiple mount points
226providing a view of the same hierarchy.
a721e8b2 227.PP
228Note that on many systems, the v1 controllers are automatically mounted under
229.IR /sys/fs/cgroup ;
230in particular,
231.BR systemd (1)
232automatically creates such mount points.
21f0d132 233.\"
234.SS Unmounting v1 controllers
235A mounted cgroup filesystem can be unmounted using the
236.BR umount (8)
237command, as in the following example:
.PP
.in +4n
.EX
umount /sys/fs/cgroup/pids
.EE
.in
.PP
245.IR "But note well" :
246a cgroup filesystem is unmounted only if it is not busy,
247that is, it has no child cgroups.
248If this is not the case, then the only effect of the
249.BR umount (8)
250is to make the mount invisible.
251Thus, to ensure that the mount point is really removed,
252one must first remove all child cgroups,
253which in turn can be done only after all member processes
254have been moved from those cgroups to the root cgroup.
255.\"
256.SS Cgroups version 1 controllers
257Each of the cgroups version 1 controllers is governed
258by a kernel configuration option (listed below).
259Additionally, the availability of the cgroups feature is governed by the
260.BR CONFIG_CGROUPS
261kernel configuration option.
262.TP
263.IR cpu " (since Linux 2.6.24; " \fBCONFIG_CGROUP_SCHED\fP )
264Cgroups can be guaranteed a minimum number of "CPU shares"
265when a system is busy.
266This does not limit a cgroup's CPU usage if the CPUs are not busy.
267For further information, see
268.IR Documentation/scheduler/sched-design-CFS.txt .
a721e8b2 269.IP
270In Linux 3.2,
271this controller was extended to provide CPU "bandwidth" control.
272If the kernel is configured with
81ff7360 273.BR CONFIG_CFS_BANDWIDTH ,
274then within each scheduling period
275(defined via a file in the cgroup directory), it is possible to define
276an upper limit on the CPU time allocated to the processes in a cgroup.
277This upper limit applies even if there is no other competition for the CPU.
278Further information can be found in the kernel source file
279.IR Documentation/scheduler/sched\-bwc.txt .
280.TP
281.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
282This provides accounting for CPU usage by groups of processes.
a721e8b2 283.IP
284Further information can be found in the kernel source file
285.IR Documentation/cgroup\-v1/cpuacct.txt .
286.TP
287.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
This controller can be used to bind the processes in a cgroup to
289a specified set of CPUs and NUMA nodes.
a721e8b2 290.IP
291Further information can be found in the kernel source file
292.IR Documentation/cgroup\-v1/cpusets.txt .
293.TP
294.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
295The memory controller supports reporting and limiting of process memory, kernel
296memory, and swap used by cgroups.
a721e8b2 297.IP
298Further information can be found in the kernel source file
299.IR Documentation/cgroup\-v1/memory.txt .
300.TP
301.IR devices " (since Linux 2.6.26; " \fBCONFIG_CGROUP_DEVICE\fP )
302This supports controlling which processes may create (mknod) devices as
303well as open them for reading or writing.
304The policies may be specified as whitelists and blacklists.
305Hierarchy is enforced, so new rules must not
306violate existing rules for the target or ancestor cgroups.
a721e8b2 307.IP
308Further information can be found in the kernel source file
309.IR Documentation/cgroup-v1/devices.txt .
310.TP
311.IR freezer " (since Linux 2.6.28; " \fBCONFIG_CGROUP_FREEZER\fP )
312The
313.IR freezer
314cgroup can suspend and restore (resume) all processes in a cgroup.
315Freezing a cgroup
316.I /A
317also causes its children, for example, processes in
318.IR /A/B ,
319to be frozen.
a721e8b2 320.IP
321Further information can be found in the kernel source file
322.IR Documentation/cgroup-v1/freezer-subsystem.txt .
323.TP
324.IR net_cls " (since Linux 2.6.29; " \fBCONFIG_CGROUP_NET_CLASSID\fP )
325This places a classid, specified for the cgroup, on network packets
326created by a cgroup.
327These classids can then be used in firewall rules,
328as well as used to shape traffic using
329.BR tc (8).
330This applies only to packets
331leaving the cgroup, not to traffic arriving at the cgroup.
a721e8b2 332.IP
333Further information can be found in the kernel source file
334.IR Documentation/cgroup-v1/net_cls.txt .
335.TP
336.IR blkio " (since Linux 2.6.33; " \fBCONFIG_BLK_CGROUP\fP )
337The
338.I blkio
339cgroup controls and limits access to specified block devices by
340applying IO control in the form of throttling and upper limits against leaf
341nodes and intermediate nodes in the storage hierarchy.
a721e8b2 342.IP
343Two policies are available.
344The first is a proportional-weight time-based division
345of disk implemented with CFQ.
346This is in effect for leaf nodes using CFQ.
347The second is a throttling policy which specifies
348upper I/O rate limits on a device.
a721e8b2 349.IP
350Further information can be found in the kernel source file
351.IR Documentation/cgroup-v1/blkio-controller.txt .
352.TP
353.IR perf_event " (since Linux 2.6.39; " \fBCONFIG_CGROUP_PERF\fP )
354This controller allows
355.I perf
356monitoring of the set of processes grouped in a cgroup.
a721e8b2 357.IP
860573ad 358Further information can be found in the kernel source file
c174eb6a 359.IR tools/perf/Documentation/perf-record.txt .
360.TP
361.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
362This allows priorities to be specified, per network interface, for cgroups.
a721e8b2 363.IP
364Further information can be found in the kernel source file
365.IR Documentation/cgroup-v1/net_prio.txt .
366.TP
367.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
368This supports limiting the use of huge pages by cgroups.
a721e8b2 369.IP
370Further information can be found in the kernel source file
371.IR Documentation/cgroup-v1/hugetlb.txt .
372.TP
373.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
This controller permits limiting the number of processes that may be created
375in a cgroup (and its descendants).
a721e8b2 376.IP
377Further information can be found in the kernel source file
378.IR Documentation/cgroup-v1/pids.txt .
379.TP
380.IR rdma " (since Linux 4.11; " \fBCONFIG_CGROUP_RDMA\fP )
381The RDMA controller permits limiting the use of
382RDMA/IB-specific resources per cgroup.
383.IP
384Further information can be found in the kernel source file
385.IR Documentation/cgroup-v1/rdma.txt .
860573ad 386.\"
6398ca15 387.SS Creating cgroups and moving processes
9ed582ac 388A cgroup filesystem initially contains a single root cgroup, '/',
6398ca15 389which all processes belong to.
21f0d132 390A new cgroup is created by creating a directory in the cgroup filesystem:
.PP
.in +4n
.EX
mkdir /sys/fs/cgroup/cpu/cg1
.EE
.in
.PP
21f0d132 398This creates a new empty cgroup.
a721e8b2 399.PP
f524e7f8 400A process may be moved to this cgroup by writing its PID into the cgroup's
21f0d132 401.I cgroup.procs
21f0d132 402file:
.PP
.in +4n
.EX
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
.EE
.in
.PP
f524e7f8 410Only one PID at a time should be written to this file.
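.PP
As a minimal sketch of the one-PID-per-write rule,
the following shell function moves several processes into a cgroup
by issuing a separate write for each PID.
The function name move_pids is a hypothetical helper,
not part of any cgroup interface.

```shell
# Move each PID given after the first argument into the cgroup at $1,
# issuing one write per PID (never several PIDs in a single write).
# $1: path of the cgroup directory (hypothetical argument)
move_pids() {
    cg=$1
    shift
    for pid in "$@"; do
        echo "$pid" > "$cg/cgroup.procs" || return 1
    done
}
```

It might be invoked as: move_pids /sys/fs/cgroup/cpu/cg1 1234 5678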
a721e8b2 411.PP
412Writing the value 0 to a
413.IR cgroup.procs
414file causes the writing process to be moved to the corresponding cgroup.
a721e8b2 415.PP
When a PID is written to the
.I cgroup.procs
file,
all threads in the process are moved into the new cgroup at once.
a721e8b2 419.PP
420Within a hierarchy, a process can be a member of exactly one cgroup.
421Writing a process's PID to a
422.IR cgroup.procs
423file automatically removes it from the cgroup of
424which it was previously a member.
a721e8b2 425.PP
426The
427.I cgroup.procs
428file can be read to obtain a list of the processes that are
429members of a cgroup.
430The returned list of PIDs is not guaranteed to be in order.
431Nor is it guaranteed to be free of duplicates.
432(For example, a PID may be recycled while reading from the list.)
a721e8b2 433.PP
434In cgroups v1 (but not cgroups v2), an individual thread can be moved to
435another cgroup by writing its thread ID
436(i.e., the kernel thread ID returned by
437.BR clone (2)
438and
439.BR gettid (2))
440to the
441.IR tasks
442file in a cgroup directory.
443This file can be read to discover the set of threads
444that are members of the cgroup.
445This file is not present in cgroup v2 directories.
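.PP
Because the list read from cgroup.procs may be unordered and may contain duplicates,
a reader will usually want to normalize it.
The following is a minimal sketch;
the function name list_member_pids is a hypothetical helper.

```shell
# Print the PIDs listed in a cgroup's cgroup.procs file,
# sorted numerically and with duplicates removed.
# $1: path of the cgroup directory (hypothetical argument)
list_member_pids() {
    sort -n -u "$1/cgroup.procs"
}
```

The result is only a snapshot:
processes may join or leave the cgroup while the file is being read.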
446.\"
447.SS Removing cgroups
448To remove a cgroup,
449it must first have no child cgroups and contain no (nonzombie) processes.
450So long as that is the case, one can simply
451remove the corresponding directory pathname.
452Note that files in a cgroup directory cannot and need not be
453removed.
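.PP
Since a cgroup can be removed only once it has no child cgroups,
a subtree has to be removed from the bottom up.
The following sketch shows one way to do this with find(1);
the function name remove_cgroup_tree is a hypothetical helper,
and it assumes that all member processes have already been moved elsewhere.

```shell
# Remove a cgroup and all of its descendant cgroups, deepest first.
# An individual rmdir fails if that cgroup still has member processes.
# $1: path of the cgroup directory to remove (hypothetical argument)
remove_cgroup_tree() {
    find "$1" -depth -type d -exec rmdir {} \;
}
```

The -depth option makes find visit each directory's children before the directory itself,
which matches the order required for removal.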
454.\"
88afe701 455.SS Cgroups v1 release notification
456Two files can be used to determine whether the kernel provides
457notifications when a cgroup becomes empty.
458A cgroup is considered to be empty when it contains no child
459cgroups and no member processes.
a721e8b2 460.PP
23388d41 461A special file in the root directory of each cgroup hierarchy,
88afe701 462.IR release_agent ,
463can be used to register the pathname of a program that may be invoked when
464a cgroup in the hierarchy becomes empty.
465The pathname of the newly empty cgroup (relative to the cgroup mount point)
466is provided as the sole command-line argument when the
467.IR release_agent
468program is invoked.
469The
470.IR release_agent
471program might remove the cgroup directory,
or perhaps repopulate it with a process.
a721e8b2 473.PP
474The default value of the
475.IR release_agent
476file is empty, meaning that no release agent is invoked.
a721e8b2 477.PP
478Whether or not the
479.IR release_agent
480program is invoked when a particular cgroup becomes empty is determined
481by the value in the
88afe701 482.IR notify_on_release
483file in the corresponding cgroup directory.
484If this file contains the value 0, then the
485.IR release_agent
486program is not invoked.
487If it contains the value 1, the
488.IR release_agent
489program is invoked.
490The default value for this file in the root cgroup is 0.
491At the time when a new cgroup is created,
492the value in this file is inherited from the corresponding file
493in the parent cgroup.
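.PP
The core of a release agent that removes newly empty cgroups might look like the following sketch.
The variable CGROUP_MOUNT, its default value, and the function name release_cgroup
are assumptions made for illustration;
the kernel supplies only the pathname relative to the mount point.

```shell
# Hypothetical release-agent logic: remove the cgroup that became empty.
# CGROUP_MOUNT is an assumed variable naming the hierarchy's mount point;
# the default below is only an example.
: "${CGROUP_MOUNT:=/sys/fs/cgroup/systemd}"

# $1: pathname of the empty cgroup, relative to the mount point
release_cgroup() {
    # Strip any leading '/' so that the two parts join cleanly.
    rmdir "${CGROUP_MOUNT}/${1#/}"
}
```

A script registered via the release_agent file would end with a call such as
release_cgroup "$1",
and the cgroups of interest would need notify_on_release set to 1.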
88afe701 494.\"
5714ccee 495.SH CGROUPS VERSION 2
496In cgroups v2,
497all mounted controllers reside in a single unified hierarchy.
498While (different) controllers may be simultaneously
499mounted under the v1 and v2 hierarchies,
500it is not possible to mount the same controller simultaneously
501under both the v1 and the v2 hierarchies.
a721e8b2 502.PP
503The new behaviors in cgroups v2 are summarized here,
504and in some cases elaborated in the following subsections.
505.IP 1. 3
a15e0673 506Cgroups v2 provides a unified hierarchy against
507which all controllers are mounted.
508.IP 2.
509"Internal" processes are not permitted.
510With the exception of the root cgroup, processes may reside
511only in leaf nodes (cgroups that do not themselves contain child cgroups).
4f017a68 512The details are somewhat more subtle than this, and are described below.
dddb7ea1 513.IP 3.
Active controllers must be specified via the files
515.IR cgroup.controllers
516and
517.IR cgroup.subtree_control .
dddb7ea1 518.IP 4.
519The
520.I tasks
521file has been removed.
522In addition, the
523.I cgroup.clone_children
524file that is employed by the
525.I cpuset
526controller has been removed.
dddb7ea1 527.IP 5.
528An improved mechanism for notification of empty cgroups is provided by the
529.IR cgroup.events
530file.
531.PP
532For more changes, see the
533.I Documentation/cgroup-v2.txt
534file in the kernel source.
535.PP
536Some of the new behaviors listed above saw subsequent modification with
537the addition in Linux 4.14 of "thread mode" (described below).
2befa495 538.\"
539.SS Cgroups v2 unified hierarchy
540In cgroups v1, the ability to mount different controllers
541against different hierarchies was intended to allow great flexibility
542for application design.
In practice, though, the flexibility turned out to be less useful than expected,
544and in many cases added complexity.
545Therefore, in cgroups v2,
546all available controllers are mounted against a single hierarchy.
547The available controllers are automatically mounted,
548meaning that it is not necessary (or possible) to specify the controllers
549when mounting the cgroup v2 filesystem using a command such as the following:
.PP
.in +4n
.EX
mount \-t cgroup2 none /mnt/cgroup2
.EE
.in
.PP
557A cgroup v2 controller is available only if it is not currently in use
558via a mount against a cgroup v1 hierarchy.
559Or, to put things another way, it is not possible to employ
560the same controller against both a v1 hierarchy and the unified v2 hierarchy.
561This means that it may be necessary first to unmount a v1 controller
562(as described above) before that controller is available in v2.
563Since
564.BR systemd (1)
565makes heavy use of some v1 controllers by default,
566it can in some cases be simpler to boot the system with
567selected v1 controllers disabled.
568To do this, specify the
569.IR cgroup_no_v1=list
570option on the kernel boot command line;
571.I list
572is a comma-separated list of the names of the controllers to disable,
573or the word
574.I all
575to disable all v1 controllers.
576(This situation is correctly handled by
577.BR systemd (1),
578which falls back to operating without the specified controllers.)
579.PP
580Note that on many modern systems,
581.BR systemd (1)
582automatically mounts the
583.I cgroup2
584filesystem at
585.I /sys/fs/cgroup/unified
586during the boot process.
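.PP
To discover where (or whether) the cgroup2 filesystem is mounted,
one can scan the mount table for entries whose filesystem type is cgroup2.
The following is a minimal sketch that assumes the usual /proc/mounts layout
(device, mount point, filesystem type, options);
the function name find_cgroup2_mounts is a hypothetical helper.

```shell
# Print the mount point of each cgroup2 filesystem.
# $1: a file in /proc/mounts format (defaults to /proc/mounts)
find_cgroup2_mounts() {
    awk '$3 == "cgroup2" { print $2 }' "${1:-/proc/mounts}"
}
```

If the function prints nothing, no cgroup v2 hierarchy is currently mounted.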
dddb7ea1 587.\"
588.SS Cgroups v2 controllers
589The following controllers, documented in the kernel source file
590.IR Documentation/cgroup-v2.txt ,
591are supported in cgroups version 2:
592.TP
593.IR io " (since Linux 4.5)"
594This is the successor of the version 1
595.I blkio
596controller.
597.TP
598.IR memory " (since Linux 4.5)"
599This is the successor of the version 1
600.I memory
601controller.
602.TP
603.IR pids " (since Linux 4.5)"
604This is the same as the version 1
605.I pids
606controller.
607.TP
608.IR perf_event " (since Linux 4.11)"
f7286edc 609This is the same as the version 1
610.I perf_event
611controller.
612.TP
613.IR rdma " (since Linux 4.11)"
614This is the same as the version 1
615.I rdma
616controller.
617.TP
618.IR cpu " (since Linux 4.15)"
619This is the successor to the version 1
620.I cpu
621and
622.I cpuacct
623controllers.
624.\"
2befa495 625.SS Cgroups v2 subtree control
626Each cgroup in the v2 hierarchy contains the following two files:
627.TP
628.IR cgroup.controllers
629This is a list of the controllers that are
630.I available
631in this cgroup.
632The contents of this file match the contents of the
633.I cgroup.subtree_control
634file in the parent cgroup.
635.TP
636.I cgroup.subtree_control
637This is a list of controllers that are
638.IR active
639.RI ( enabled )
640in the cgroup.
641The set of controllers in this file is a subset of the set in the
21f0d132 642.IR cgroup.controllers
643of this cgroup.
644The set of active controllers is modified by writing strings to this file
645containing space-delimited controller names,
646each preceded by '+' (to enable a controller)
647or '\-' (to disable a controller), as in the following example:
.IP
.in +4n
.EX
echo '+pids \-memory' > x/y/cgroup.subtree_control
.EE
.in
.IP
655An attempt to enable a controller
656that is not present in
657.I cgroup.controllers
658leads to an
659.B ENOENT
660error when writing to the
661.I cgroup.subtree_control
662file.
663.PP
664Because the list of controllers in
665.I cgroup.subtree_control
is a subset of those in
667.IR cgroup.controllers ,
668a controller that has been disabled in one cgroup in the hierarchy
669can never be re-enabled in the subtree below that cgroup.
670.PP
671A cgroup's
672.I cgroup.subtree_control
673file determines the set of controllers that are exercised in the
674.I child
675cgroups.
676When a controller (e.g.,
677.IR pids )
678is present in the
679.I cgroup.subtree_control
680file of a parent cgroup,
681then the corresponding controller-interface files (e.g.,
682.IR pids.max )
683are automatically created in the children of that cgroup
684and can be used to exert resource control in the child cgroups.
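.PP
The relationship between the two files can be illustrated with a small sketch
that enables a controller for a cgroup's children only if that controller
is listed as available in the cgroup.
The function names controller_available and enable_controller are hypothetical helpers.

```shell
# Succeed if controller $2 is listed in $1/cgroup.controllers.
controller_available() {
    tr ' ' '\n' < "$1/cgroup.controllers" | grep -q -x -F -- "$2"
}

# Enable controller $2 for the children of the cgroup at $1,
# but only if the controller is available in that cgroup.
enable_controller() {
    if controller_available "$1" "$2"; then
        echo "+$2" > "$1/cgroup.subtree_control"
    else
        echo "controller '$2' is not available in $1" >&2
        return 1
    fi
}
```

Checking first gives a clearer diagnostic than the error
returned by a failed write to cgroup.subtree_control.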
21f0d132 685.\"
686.SS Cgroups v2 """no internal processes""" rule
687Cgroups v2 enforces a so-called "no internal processes" rule.
688Roughly speaking, this rule means that,
689with the exception of the root cgroup, processes may reside
690only in leaf nodes (cgroups that do not themselves contain child cgroups).
691This avoids the need to decide how to partition resources between
692processes which are members of cgroup A and processes in child cgroups of A.
693.PP
694For instance, if cgroup
695.I /cg1/cg2
696exists, then a process may reside in
697.IR /cg1/cg2 ,
698but not in
699.IR /cg1 .
700This is to avoid an ambiguity in cgroups v1
701with respect to the delegation of resources between processes in
702.I /cg1
703and its child cgroups.
704The recommended approach in cgroups v2 is to create a subdirectory called
705.I leaf
706for any nonleaf cgroup which should contain processes, but no child cgroups.
707Thus, processes which previously would have gone into
708.I /cg1
709would now go into
710.IR /cg1/leaf .
711This has the advantage of making explicit
712the relationship between processes in
713.I /cg1/leaf
714and
715.IR /cg1 's
716other children.
717.PP
718The "no internal processes" rule is in fact more subtle than stated above.
719More precisely, the rule is that a (nonroot) cgroup can't both
720(1) have member processes, and
721(2) distribute resources into child cgroups\(emthat is, have a nonempty
722.I cgroup.subtree_control
723file.
724Thus, it
725.I is
726possible for a cgroup to have both member processes and child cgroups,
727but before controllers can be enabled for that cgroup,
728the member processes must be moved out of the cgroup
729(e.g., perhaps into the child cgroups).
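.PP
The more precise form of the rule can be expressed as a simple check
made before enabling controllers in a cgroup.
The following is a sketch only:
the function names are hypothetical,
and it ignores the root cgroup and the thread-mode relaxations.

```shell
# Succeed if the cgroup at $1 has at least one member process.
has_member_processes() {
    # Read a line rather than relying on the reported file size,
    # which for pseudo-files need not reflect the content.
    [ -n "$(head -n 1 "$1/cgroup.procs")" ]
}

# Succeed if controllers could be enabled in $1/cgroup.subtree_control
# without first moving member processes out of the cgroup.
can_enable_controllers() {
    ! has_member_processes "$1"
}
```

If can_enable_controllers fails,
the member processes must first be moved elsewhere, for example into a child cgroup.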
730.PP
731With the Linux 4.14 addition of "thread mode" (described below),
732the "no internal processes" rule has been relaxed in some cases.
2468f14e 733.\"
734.SS Cgroups v2 cgroup.events file
735With cgroups v2, a new mechanism is provided to obtain notification
736about when a cgroup becomes empty.
737The cgroups v1
738.IR release_agent
739and
740.IR notify_on_release
741files are removed, and replaced by a new, more general-purpose file,
742.IR cgroup.events .
e5bd7e65 743This read-only file contains key-value pairs
744(delimited by newline characters, with the key and value separated by spaces)
745that identify events or state for a cgroup.
746Currently, only one key appears in this file,
747.IR populated ,
748which has either the value 0,
749meaning that the cgroup (and its descendants)
750contain no (nonzombie) processes,
or 1, meaning that the cgroup or at least one of its descendants contains member processes.
a721e8b2 752.PP
753The
754.IR cgroup.events
755file can be monitored, in order to receive notification when a cgroup
transitions between the populated and unpopulated states.
757When monitoring this file using
758.BR inotify (7),
759transitions generate
760.BR IN_MODIFY
761events, and when monitoring the file using
762.BR poll (2),
763transitions generate
764.B POLLPRI
765events.
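.PP
Once a notification arrives, the current state can be obtained by rereading the file.
The following is a minimal sketch of extracting the value of the populated key;
the function name cgroup_populated is a hypothetical helper.

```shell
# Print the value of the "populated" key (0 or 1) from a cgroup's
# cgroup.events file.
# $1: path of the cgroup directory (hypothetical argument)
cgroup_populated() {
    awk '$1 == "populated" { print $2 }' "$1/cgroup.events"
}
```

Matching on the key name, rather than on a line position,
keeps the sketch working if further keys appear in the file.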
a721e8b2 766.PP
767The cgroups v2 release-notification mechanism provided by the
768.I populated
769field of the
770.I cgroup.events
771file offers at least two advantages over the cgroups v1
772.IR release_agent
773mechanism.
774First, it allows for cheaper notification,
775since a single process can monitor multiple
776.IR cgroup.events
777files.
778By contrast, the cgroups v1 mechanism requires the creation
779of a process for each notification.
a15e0673 780Second, notification can be delegated to a process that lives inside
754f4cf5 781a container associated with the newly empty cgroup.
c91a9f8a 782.\"
783.SS Cgroups v2 cgroup.stat file
784.\" commit ec39225cca42c05ac36853d11d28f877fde5c42e
785Each cgroup in the v2 hierarchy contains a read-only
786.IR cgroup.stat
787file (first introduced in Linux 4.14)
788that consists of lines containing key-value pairs.
789The following keys currently appear in this file:
790.TP
791.I nr_descendants
792This is the total number of visible (i.e., living) descendant cgroups
793underneath this cgroup.
794.TP
795.I nr_dying_descendants
796This is the total number of dying descendant cgroups
797underneath this cgroup.
798A cgroup enters the dying state after being deleted.
799It remains in that state for an undefined period
800(which will depend on system load)
801while resources are freed before the cgroup is destroyed.
802Note that the presence of some cgroups in the dying state is normal,
803and is not indicative of any problem.
804.IP
805A process can't be made a member of a dying cgroup,
806and a dying cgroup can't be brought back to life.
.\"
.SS Limiting the number of descendant cgroups
Each cgroup in the v2 hierarchy contains the following files,
which can be used to view and set limits on the number
of descendant cgroups under that cgroup:
.TP
.IR cgroup.max.depth " (since Linux 4.14)"
.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
This file defines a limit on the depth of nesting of descendant cgroups.
A value of 0 in this file means that no descendant cgroups can be created.
An attempt to create a descendant whose nesting level exceeds
the limit fails
.RI ( mkdir (2)
fails with the error
.BR EAGAIN ).
.IP
Writing the string
.IR """max"""
to this file means that no limit is imposed.
The default value in this file is
.IR """max""" .
.TP
.IR cgroup.max.descendants " (since Linux 4.14)"
.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
This file defines a limit on the number of live descendant cgroups that
this cgroup may have.
An attempt to create more descendants than allowed by the limit fails
.RI ( mkdir (2)
fails with the error
.BR EAGAIN ).
.IP
Writing the string
.IR """max"""
to this file means that no limit is imposed.
The default value in this file is
.IR """max""" .
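.PP
As an illustration, both limits could be set with a small shell function
such as the following sketch.
The function name
.I set_descendant_limits
is hypothetical, and the caller is assumed to have write permission
on both files.
.PP
.in +4n
.EX
# Hypothetical helper: in cgroup directory $1, set the nesting-depth
# limit to $2 and the live-descendant limit to $3.
# Either value may be the string "max" (no limit).
set_descendant_limits() {
    echo "$2" > "$1/cgroup.max.depth" &&
    echo "$3" > "$1/cgroup.max.descendants"
}
.EE
.in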
.\"
.SS Cgroups v2 delegation: delegation to a less privileged user
In the context of cgroups,
delegation means passing management of some subtree
of the cgroup hierarchy to a nonprivileged process.
Cgroups v1 provides support for delegation that was
accidental and not fully secure.
Cgroups v2 supports delegation by explicit design.
.PP
Some terminology is required in order to describe delegation.
A
.I delegater
is a privileged user (i.e., root) who owns a parent cgroup.
A
.I delegatee
is a nonprivileged user who will be granted the permissions needed
to manage some subhierarchy under that parent cgroup,
known as the
.IR "delegated subtree" .
.PP
To perform delegation,
the delegater makes certain directories and files writable by the delegatee,
typically by changing the ownership of the objects to be the user ID
of the delegatee.
Assuming that we want to delegate the hierarchy rooted at (say)
.I /dlgt_grp
and that there are not yet any child cgroups under that cgroup,
the ownership of the following is changed to the user ID of the delegatee:
.TP
.IR /dlgt_grp
Changing the ownership of the root of the subtree means that any new
cgroups created under the subtree (and the files they contain)
will also be owned by the delegatee.
.TP
.IR /dlgt_grp/cgroup.procs
Changing the ownership of this file means that the delegatee
can move processes into the root of the delegated subtree.
.TP
.IR /dlgt_grp/cgroup.subtree_control
Changing the ownership of this file means that the delegatee
can enable controllers (that are present in
.IR /dlgt_grp/cgroup.controllers )
in order to further redistribute resources at lower levels in the subtree.
(As an alternative to changing the ownership of this file,
the delegater might instead add selected controllers to this file.)
.PP
The delegater should
.I not
change the ownership of any of the controller interface files (e.g.,
.IR pids.max ,
.IR memory.high )
in
.IR /dlgt_grp .
Those files are used from the next level above the delegated subtree
in order to distribute resources into the subtree,
and the delegatee should not have permission to change
the resources that are distributed into the delegated subtree.
.PP
See also the discussion of the
.IR /sys/kernel/cgroup/delegate
file in NOTES.
.PP
After the aforementioned steps have been performed,
the delegatee can create child cgroups within the delegated subtree
and move processes between cgroups in the subtree.
If some controllers are present in
.IR /dlgt_grp/cgroup.subtree_control ,
or the ownership of that file was passed to the delegatee,
the delegatee can also control the further redistribution
of the corresponding resources into the delegated subtree.
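.PP
The ownership changes described above might be scripted roughly as follows.
This is a sketch only: the function name
.I delegate_cgroup
is hypothetical,
the list of delegatable files is assumed to be available in
.I /sys/kernel/cgroup/delegate
(Linux 4.15 and later),
and the caller is assumed to be the delegater.
.PP
.in +4n
.EX
# Hypothetical helper: delegate the cgroup directory $1 to user $2.
# The names of the files to hand over are read from the file $3
# (default: /sys/kernel/cgroup/delegate).
delegate_cgroup() {
    list=${3:-/sys/kernel/cgroup/delegate}
    chown "$2" "$1" || return 1
    while read -r f; do
        # Skip names that do not exist in this cgroup.
        [ -e "$1/$f" ] || continue
        chown "$2" "$1/$f" || return 1
    done < "$list"
}
.EE
.in
.PP
Controller interface files such as
.I pids.max
are deliberately left untouched, as required by the rule above.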
.\"
.SS Cgroups v2 delegation: nsdelegate and cgroup namespaces
.\"
.\" To test this, it can be useful to boot the kernel with the options:
.\"
.\" cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
.\"
.\" The effect of the latter option is to prevent systemd from employing
.\" its "hybrid" cgroup mode, where it tries to make use of cgroups v2.
.\"
Starting with Linux 4.13,
.\" commit 5136f6365ce3eace5a926e10f16ed2a233db5ba9
there is a second way to perform cgroup delegation.
This is done by mounting the cgroup v2 filesystem with the
.I nsdelegate
mount option:
.PP
.in +4n
.EX
$ mount -t cgroup2 -o nsdelegate none /sys/fs/cgroup/unified
.EE
.in
.PP
The effect of this option is to cause cgroup namespaces
to automatically become delegation boundaries.
More specifically,
the following restrictions apply for processes inside the cgroup namespace:
.IP * 3
Writes to controller interface files in the root directory
will fail with the error
.BR EPERM .
Processes inside the cgroup namespace can still write to delegatable
files such as
.IR cgroup.procs
and
.IR cgroup.subtree_control ,
and can create subhierarchy underneath the root directory of
the cgroup namespace.
.IP *
Attempts to migrate processes across the namespace boundary are denied
(with the error
.BR ENOENT ).
Processes inside the cgroup namespace can still
(subject to the containment rules described below)
move processes between cgroups
.I within
the subhierarchy under the namespace root.
.PP
The ability to define cgroup namespaces as delegation boundaries
makes cgroup namespaces more useful.
To understand why, suppose that we already have one cgroup hierarchy
that has been delegated to a nonprivileged user,
.IR cecilia ,
using the older delegation technique described above.
Suppose further that
.I cecilia
wanted to further delegate a subhierarchy
under the existing delegated hierarchy.
(For example, the delegated hierarchy might be associated with
an unprivileged container run by
.IR cecilia .)
Even if a cgroup namespace was employed,
because both hierarchies are owned by the unprivileged user
.IR cecilia ,
the following illegitimate actions could be performed:
.IP * 3
A process in the inferior hierarchy could change the
resource controller settings in the root directory of that hierarchy.
(These resource controller settings are intended to allow control to
be exercised from the
.I parent
cgroup;
a process inside the child cgroup should not be allowed to modify them.)
.IP *
A process inside the inferior hierarchy could move processes
into and out of the inferior hierarchy if the cgroups in the
superior hierarchy were somehow visible.
.PP
Employing the
.I nsdelegate
mount option prevents both of these possibilities.
.PP
The
.I nsdelegate
mount option only has an effect when performed in
the initial mount namespace;
in other mount namespaces, the option is silently ignored.
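.PP
Since the option can be silently ignored,
it may be useful to check whether it is actually in effect.
The following sketch inspects the mount options recorded in
.IR /proc/mounts ;
the function name
.I cgroup2_has_nsdelegate
is hypothetical.
.PP
.in +4n
.EX
# Hypothetical helper: succeed if some cgroup2 mount listed in the
# mounts file $1 (default: /proc/mounts) has the nsdelegate option.
cgroup2_has_nsdelegate() {
    awk '$3 == "cgroup2" && $4 ~ /(^|,)nsdelegate(,|$)/ { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
.EE
.in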
.\"
.SS Cgroup v2 delegation containment rules
Some delegation
.IR "containment rules"
ensure that the delegatee can move processes between cgroups within the
delegated subtree,
but can't move processes from outside the delegated subtree into
the subtree or vice versa.
A nonprivileged process (i.e., the delegatee) can write the PID of
a "target" process into a
.IR cgroup.procs
file only if all of the following are true:
.IP * 3
The writer has write permission on the
.I cgroup.procs
file in the destination cgroup.
.IP *
The writer has write permission on the
.I cgroup.procs
file in the common ancestor of the source and destination cgroups.
(In some cases,
the common ancestor may be the source or destination cgroup itself.)
.IP *
If the cgroup v2 filesystem was mounted with the
.I nsdelegate
option, the writer must be able to see the source and destination cgroup
from its cgroup namespace.
.IP *
Before Linux 4.11:
.\" commit 576dd464505fc53d501bb94569db76f220104d28
the effective UID of the writer (i.e., the delegatee) matches the
real user ID or the saved set-user-ID of the target process.
(This was a historical requirement inherited from cgroups v1
that was later deemed unnecessary,
since the other rules suffice for containment in cgroups v2.)
.PP
.IR Note :
one consequence of these delegation containment rules is that the
unprivileged delegatee can't place the first process into
the delegated subtree;
instead, the delegater must place the first process
(a process owned by the delegatee) into the delegated subtree.
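.PP
The first two rules can be approximated from the shell, as sketched below.
The function names
.I common_ancestor
and
.I can_migrate
are hypothetical;
the sketch assumes absolute cgroup directory pathnames without
trailing slashes,
and it does not check the
.I nsdelegate
visibility rule.
.PP
.in +4n
.EX
# Hypothetical helper: print the deepest common ancestor of the
# cgroup directory pathnames $1 and $2.
common_ancestor() {
    a=$1
    while [ "$a" != / ] && [ "$a" != . ]; do
        case "$2/" in
            "$a"/*) break ;;
        esac
        a=$(dirname "$a")
    done
    echo "$a"
}

# Hypothetical helper: succeed if the caller has write permission on
# cgroup.procs in the destination cgroup $2 and in the common
# ancestor of the source cgroup $1 and the destination.
can_migrate() {
    [ -w "$2/cgroup.procs" ] &&
    [ -w "$(common_ancestor "$1" "$2")/cgroup.procs" ]
}
.EE
.in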
.\"
.SH CGROUPS VERSION 2 THREAD MODE
Among the restrictions imposed by cgroups v2 that were not present
in cgroups v1 are the following:
.IP * 3
.IR "No thread-granularity control" :
all of the threads of a process must be in the same cgroup.
.IP *
.IR "No internal processes" :
a cgroup can't both have member processes and
exercise controllers on child cgroups.
.PP
Both of these restrictions were added because
the lack of these restrictions had caused problems
in cgroups v1.
In particular, the cgroups v1 ability to allow thread-level granularity
for cgroup membership made no sense for some controllers.
(A notable example was the
.I memory
controller: since threads share an address space,
it made no sense to split threads across different
.I memory
cgroups.)
.PP
Notwithstanding the initial design decision in cgroups v2,
there were use cases for certain controllers, notably the
.IR cpu
controller,
for which thread-level granularity of control was meaningful and useful.
To accommodate such use cases, Linux 4.14 added
.I "thread mode"
for cgroups v2.
.PP
Thread mode allows the following:
.IP * 3
The creation of
.IR "threaded subtrees"
in which the threads of a process may
be spread across cgroups inside the tree.
(A threaded subtree may contain multiple multithreaded processes.)
.IP *
The concept of
.IR "threaded controllers" ,
which can distribute resources across the cgroups in a threaded subtree.
.IP *
A relaxation of the "no internal processes" rule,
so that, within a threaded subtree,
a cgroup can both contain member threads and
exercise resource control over child cgroups.
.PP
With the addition of thread mode,
each nonroot cgroup now contains a new file,
.IR cgroup.type ,
that exposes, and in some circumstances can be used to change,
the "type" of a cgroup.
This file contains one of the following type values:
.TP
.I "domain"
This is a normal v2 cgroup that provides process-granularity control.
If a process is a member of this cgroup,
then all threads of the process are (by definition) in the same cgroup.
This is the default cgroup type,
and provides the same behavior that was provided for
cgroups in the initial cgroups v2 implementation.
.TP
.I "threaded"
This cgroup is a member of a threaded subtree.
Threads can be added to this cgroup,
and controllers can be enabled for the cgroup.
.TP
.I "domain threaded"
This is a domain cgroup that serves as the root of a threaded subtree.
This cgroup type is also known as "threaded root".
.TP
.I "domain invalid"
This is a cgroup inside a threaded subtree
that is in an "invalid" state.
Processes can't be added to the cgroup,
and controllers can't be enabled for the cgroup.
The only thing that can be done with this cgroup (other than deleting it)
is to convert it to a
.IR threaded
cgroup by writing the string
.IR """threaded"""
to the
.I cgroup.type
file.
.\"
.SS Threaded versus domain controllers
With the addition of thread mode,
cgroups v2 now distinguishes two types of resource controllers:
.IP * 3
.I Threaded
controllers: these controllers support thread-granularity for
resource control and can be enabled inside threaded subtrees,
with the result that the corresponding controller-interface files
appear inside the cgroups in the threaded subtree.
As at Linux 4.15, the following controllers are threaded:
.IR cpu ,
.IR perf_event ,
and
.IR pids .
.IP *
.I Domain
controllers: these controllers support only process granularity
for resource control.
From the perspective of a domain controller,
all threads of a process are always in the same cgroup.
Domain controllers can't be enabled inside a threaded subtree.
.\"
.SS Creating a threaded subtree
There are two pathways that lead to the creation of a threaded subtree.
The first pathway proceeds as follows:
.IP 1. 3
We write the string
.IR """threaded"""
to the
.I cgroup.type
file of a cgroup
.IR y/z
that currently has the type
.IR domain .
This has the following effects:
.RS
.IP * 3
The type of the cgroup
.IR y/z
becomes
.IR threaded .
.IP *
The type of the parent cgroup,
.IR y ,
becomes
.IR "domain threaded" .
The parent cgroup is the root of a threaded subtree
(also known as the "threaded root").
.IP *
All other cgroups under
.IR y
that were not already of type
.IR threaded
(because they were inside already existing threaded subtrees
under the new threaded root)
are converted to type
.IR "domain invalid" .
Any subsequently created cgroups under
.I y
will also have the type
.IR "domain invalid" .
.RE
.IP 2.
We write the string
.IR """threaded"""
to each of the
.IR "domain invalid"
cgroups under
.IR y ,
in order to convert them to the type
.IR threaded .
As a consequence of this step, all cgroups under the threaded root
now have the type
.IR threaded
and the threaded subtree is now fully usable.
The requirement to write
.IR """threaded"""
to each of these cgroups is somewhat cumbersome,
but allows for possible future extensions to the thread-mode model.
.PP
The second way of creating a threaded subtree is as follows:
.IP 1. 3
In an existing cgroup,
.IR z ,
that currently has the type
.IR domain ,
we (1) enable one or more threaded controllers and
(2) make a process a member of
.IR z .
(These two steps can be done in either order.)
This has the following consequences:
.RS
.IP * 3
The type of
.I z
becomes
.IR "domain threaded" .
.IP *
All of the descendant cgroups of
.I z
that were not already of type
.IR threaded
are converted to type
.IR "domain invalid" .
.RE
.IP 2.
As before, we make the threaded subtree usable by writing the string
.IR """threaded"""
to each of the
.IR "domain invalid"
cgroups under
.IR z ,
in order to convert them to the type
.IR threaded .
.PP
One of the consequences of the above pathways to creating a threaded subtree
is that the threaded root cgroup can be a parent only to
.I threaded
(and
.IR "domain invalid" )
cgroups.
The threaded root cgroup can't be a parent of a
.I domain
cgroup, and a
.I threaded
cgroup
can't have a sibling that is a
.I domain
cgroup.
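.PP
Step 2 of either pathway can be automated with a loop such as the
following sketch, which visits parent cgroups before their children.
The function name
.I make_subtree_threaded
is hypothetical,
and the sketch assumes that
.BR find (1)
supports the
.I \-mindepth
option and that cgroup pathnames contain no newline characters.
.PP
.in +4n
.EX
# Step 1 of the first pathway would be, for example:
#     echo threaded > y/z/cgroup.type
#
# Hypothetical helper: convert every "domain invalid" cgroup below
# the threaded root $1 to type "threaded". find(1) lists each
# directory before its contents, so parents are converted first.
make_subtree_threaded() {
    find "$1" -mindepth 1 -type d | while read -r d; do
        if [ "$(cat "$d/cgroup.type")" = "domain invalid" ]; then
            echo threaded > "$d/cgroup.type"
        fi
    done
}
.EE
.in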
.\"
.SS Using a threaded subtree
Within a threaded subtree, threaded controllers can be enabled
in each subgroup whose type has been changed to
.IR threaded ;
upon doing so, the corresponding controller interface files
appear in the children of that cgroup.
.PP
A process can be moved into a threaded subtree by writing its PID to the
.I cgroup.procs
file in one of the cgroups inside the tree.
This has the effect of making all of the threads
in the process members of the corresponding cgroup
and makes the process a member of the threaded subtree.
The threads of the process can then be spread across
the threaded subtree by writing their thread IDs (see
.BR gettid (2))
to the
.I cgroup.threads
files in different cgroups inside the subtree.
The threads of a process must all reside in the same threaded subtree.
.PP
The
.I cgroup.threads
file is present in each cgroup (including
.I domain
cgroups) and can be read in order to discover the set of threads
that is present in the cgroup.
The set of thread IDs obtained when reading this file
is not guaranteed to be ordered or free of duplicates.
.PP
The
.I cgroup.procs
file in the threaded root shows the PIDs of all processes
that are members of the threaded subtree.
The
.I cgroup.procs
files in the other cgroups in the subtree are not readable.
.PP
Domain controllers can't be enabled in a threaded subtree;
no controller-interface files appear inside the cgroups underneath the
threaded root.
From the point of view of a domain controller,
threaded subtrees are invisible:
a multithreaded process inside a threaded subtree appears to a domain
controller as a process that resides in the threaded root cgroup.
.PP
Within a threaded subtree, the "no internal processes" rule does not apply:
a cgroup can both contain member processes (or threads)
and exercise controllers on child cgroups.
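.PP
The following sketch shows one way of working with the
.I cgroup.threads
file from the shell.
The function names
.I cgroup_threads
and
.I move_thread
are hypothetical;
sorting with duplicate removal compensates for the lack of ordering
and uniqueness guarantees noted above.
.PP
.in +4n
.EX
# Hypothetical helper: print the thread IDs in cgroup directory $1,
# sorted numerically with duplicates removed.
cgroup_threads() {
    sort -n -u "$1/cgroup.threads"
}

# Hypothetical helper: move the thread with ID $1 into the
# cgroup directory $2.
move_thread() {
    echo "$1" > "$2/cgroup.threads"
}
.EE
.in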
.\"
.SS Rules for writing to cgroup.type and creating threaded subtrees
A number of rules apply when writing to the
.I cgroup.type
file:
.IP * 3
Only the string
.IR """threaded"""
may be written.
In other words, the only explicit transition that is possible is to convert a
.I domain
cgroup to type
.IR threaded .
.IP *
The string
.IR """threaded"""
can be written only if the current value in
.IR cgroup.type
is one of the following:
.RS
.IP \(bu 3
.IR domain ,
to start the creation of a threaded subtree via
the first of the pathways described above;
.IP \(bu
.IR "domain\ invalid" ,
to convert one of the cgroups in a threaded subtree into a usable (i.e.,
.IR threaded )
state;
.IP \(bu
.IR threaded ,
which has no effect (a "no-op").
.RE
.IP *
We can't write to a
.I cgroup.type
file if the parent's type is
.IR "domain invalid" .
In other words, the cgroups of a threaded subtree must be converted to the
.I threaded
state in a top-down manner.
.PP
There are also some constraints that must be satisfied
in order to create a threaded subtree rooted at the cgroup
.IR x :
.IP * 3
There can be no member processes in the descendant cgroups of
.IR x .
(The cgroup
.I x
can itself have member processes.)
.IP *
No domain controllers may be enabled in
.IR x 's
.IR cgroup.subtree_control
file.
.PP
If any of the above constraints is violated, then an attempt to write
.IR """threaded"""
to a
.IR cgroup.type
file fails with the error
.BR ENOTSUP .
.\"
.SS The """domain threaded""" cgroup type
According to the pathways described above,
the type of a cgroup can change to
.IR "domain threaded"
in either of the following cases:
.IP * 3
The string
.IR """threaded"""
is written to a child cgroup.
.IP *
A threaded controller is enabled inside the cgroup and
a process is made a member of the cgroup.
.PP
A
.IR "domain threaded"
cgroup,
.IR x ,
can revert to the type
.IR domain
if the above conditions no longer hold true; that is, if all
.I threaded
child cgroups of
.I x
are removed and either
.I x
no longer has threaded controllers enabled or
no longer has member processes.
.PP
When a
.IR "domain threaded"
cgroup
.IR x
reverts to the type
.IR domain :
.IP * 3
All
.IR "domain invalid"
descendants of
.I x
that are not in lower-level threaded subtrees revert to the type
.IR domain .
.IP *
The root cgroups in any lower-level threaded subtrees revert to the type
.IR "domain threaded" .
.\"
.SS Exceptions for the root cgroup
The root cgroup of the v2 hierarchy is treated exceptionally:
it can be the parent of both
.I domain
and
.I threaded
cgroups.
If the string
.I """threaded"""
is written to the
.I cgroup.type
file of one of the children of the root cgroup, then
.IP * 3
The type of that cgroup becomes
.IR threaded .
.IP *
The type of any descendants of that cgroup that
are not part of lower-level threaded subtrees changes to
.IR "domain invalid" .
.PP
Note that in this case, there is no cgroup whose type becomes
.IR "domain threaded" .
(Notionally, the root cgroup can be considered as the threaded root
for the cgroup whose type was changed to
.IR threaded .)
.PP
The aim of this exceptional treatment for the root cgroup is to
allow a threaded cgroup that employs the
.I cpu
controller to be placed as high as possible in the hierarchy,
so as to minimize the (small) cost of traversing the cgroup hierarchy.
.\"
.SS The cgroups v2 """cpu""" controller and realtime processes
As at Linux 4.15, the cgroups v2
.I cpu
controller does not support control of realtime processes,
and the controller can be enabled in the root cgroup only
if all realtime threads are in the root cgroup.
(If there are realtime processes in nonroot cgroups, then a
.BR write (2)
of the string
.IR """+cpu"""
to the
.I cgroup.subtree_control
file fails with the error
.BR EINVAL .)
However, on some systems,
.BR systemd (1)
places certain realtime processes in nonroot cgroups in the v2 hierarchy.
On such systems,
these processes must first be moved to the root cgroup before the
.I cpu
controller can be enabled.
.\"
.SH ERRORS
The following errors can occur for
.BR mount (2):
.TP
.B EBUSY
An attempt to mount a cgroup version 1 filesystem specified neither the
.I name=
option (to mount a named hierarchy) nor a controller name (or
.IR all ).
.SH NOTES
A child process created via
.BR fork (2)
inherits its parent's cgroup memberships.
A process's cgroup memberships are preserved across
.BR execve (2).
.\"
.SS /proc files
.TP
.IR /proc/cgroups " (since Linux 2.6.24)"
This file contains information about the controllers
that are compiled into the kernel.
An example of the contents of this file (reformatted for readability)
is the following:
.IP
.in +4n
.EX
#subsys_name    hierarchy      num_cgroups    enabled
cpuset          4              1              1
cpu             8              1              1
cpuacct         8              1              1
blkio           6              1              1
memory          3              1              1
devices         10             84             1
freezer         7              1              1
net_cls         9              1              1
perf_event      5              1              1
net_prio        9              1              1
hugetlb         0              1              0
pids            2              1              1
.EE
.in
.IP
The fields in this file are, from left to right:
.RS
.IP 1. 3
The name of the controller.
.IP 2.
The unique ID of the cgroup hierarchy on which this controller is mounted.
If multiple cgroups v1 controllers are bound to the same hierarchy,
then each will show the same hierarchy ID in this field.
The value in this field will be 0 if:
.RS 5
.IP a) 3
the controller is not mounted on a cgroups v1 hierarchy;
.IP b)
the controller is bound to the cgroups v2 single unified hierarchy; or
.IP c)
the controller is disabled (see below).
.RE
.IP 3.
The number of control groups in this hierarchy using this controller.
.IP 4.
This field contains the value 1 if this controller is enabled,
or 0 if it has been disabled (via the
.IR cgroup_disable
kernel command-line boot parameter).
.RE
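.IP
As an illustration of the field layout,
the following sketch prints the names of the controllers whose
fourth field is 1.
The function name
.I enabled_controllers
is hypothetical.
.IP
.in +4n
.EX
# Hypothetical helper: print the names of the enabled controllers
# listed in the file $1 (default: /proc/cgroups).
enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
.EE
.in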
.TP
.IR /proc/[pid]/cgroup " (since Linux 2.6.24)"
This file describes control groups to which the process
with the corresponding PID belongs.
The displayed information differs for
cgroups version 1 and version 2 hierarchies.
.IP
For each cgroup hierarchy of which the process is a member,
there is one entry containing three colon-separated fields:
.IP
.in +4n
.EX
hierarchy-ID:controller-list:cgroup-path
.EE
.in
.IP
For example:
.IP
.in +4n
.EX
5:cpuacct,cpu,cpuset:/daemons
.EE
.in
.IP
The colon-separated fields are, from left to right:
.RS
.IP 1. 3
For cgroups version 1 hierarchies,
this field contains a unique hierarchy ID number
that can be matched to a hierarchy ID in
.IR /proc/cgroups .
For the cgroups version 2 hierarchy, this field contains the value 0.
.IP 2.
For cgroups version 1 hierarchies,
this field contains a comma-separated list of the controllers
bound to the hierarchy.
For the cgroups version 2 hierarchy, this field is empty.
.IP 3.
This field contains the pathname of the control group in the hierarchy
to which the process belongs.
This pathname is relative to the mount point of the hierarchy.
.RE
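.IP
The three-field format is easy to split in the shell.
The following sketch prints the cgroups v2 path of a process,
relying on the rule that the v2 entry has hierarchy ID 0 and
an empty controller list.
The function name
.I cgroup_v2_path
is hypothetical.
.IP
.in +4n
.EX
# Hypothetical helper: print the cgroups v2 path recorded in the
# /proc/[pid]/cgroup file $1 (default: /proc/self/cgroup).
cgroup_v2_path() {
    while IFS=: read -r id controllers path; do
        if [ "$id" = 0 ] && [ -z "$controllers" ]; then
            echo "$path"
        fi
    done < "${1:-/proc/self/cgroup}"
}
.EE
.in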
.\"
.SS /sys/kernel/cgroup files
.TP
.IR /sys/kernel/cgroup/delegate " (since Linux 4.15)"
.\" commit 01ee6cfb1483fe57c9cbd8e73817dfbf9bacffd3
This file exports a list of the cgroups v2 files
(one per line) that are delegatable
(i.e., whose ownership should be changed to the user ID of the delegatee).
In the future, the set of delegatable files may change or grow,
and this file provides a way for the kernel to inform
user-space applications of which files must be delegated.
As at Linux 4.15, one sees the following when inspecting this file:
.IP
.in +4n
.EX
$ \fBcat /sys/kernel/cgroup/delegate\fP
cgroup.procs
cgroup.subtree_control
.EE
.in
.TP
.IR /sys/kernel/cgroup/features " (since Linux 4.15)"
.\" commit 5f2e673405b742be64e7c3604ed4ed3ac14f35ce
Over time, the set of cgroups v2 features that are provided by the
kernel may change or grow,
or some features may not be enabled by default.
This file provides a way for user-space applications to discover what
features the running kernel supports or has enabled.
Features are listed one per line:
.IP
.in +4n
.EX
$ \fBcat /sys/kernel/cgroup/features\fP
nsdelegate
.EE
.in
.IP
The entries that can appear in this file are:
.RS
.TP
.IR nsdelegate " (since Linux 4.15)"
The kernel supports the
.I nsdelegate
mount option.
.RE
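.IP
A script might test for a feature before relying on it,
as in the following sketch.
The function name
.I cgroup_has_feature
is hypothetical.
.IP
.in +4n
.EX
# Hypothetical helper: succeed if feature $1 is listed in the
# file $2 (default: /sys/kernel/cgroup/features).
cgroup_has_feature() {
    grep -q -x -F "$1" "${2:-/sys/kernel/cgroup/features}"
}
.EE
.in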
.SH SEE ALSO
.BR prlimit (1),
.BR systemd (1),
.BR systemd-cgls (1),
.BR systemd-cgtop (1),
.BR clone (2),
.BR ioprio_set (2),
.BR perf_event_open (2),
.BR setrlimit (2),
.BR cgroup_namespaces (7),
.BR cpuset (7),
.BR namespaces (7),
.BR sched (7),
.BR user_namespaces (7)