git.ipfire.org Git - thirdparty/man-pages.git/blame - man7/cgroups.7
cgroups.7: Mention the existence of "thread mode" in Linux 4.14
Commit Line Data
014cb63b 1.\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com>
4242dfbe 2.\" and Copyright (C) 2016, 2017 Michael Kerrisk <mtk.manpages@gmail.com>
014cb63b
MK
3.\"
4.\" %%%LICENSE_START(VERBATIM)
5.\" Permission is granted to make and distribute verbatim copies of this
6.\" manual provided the copyright notice and this permission notice are
7.\" preserved on all copies.
8.\"
9.\" Permission is granted to copy and distribute modified versions of this
10.\" manual under the conditions for verbatim copying, provided that the
11.\" entire resulting derived work is distributed under the terms of a
12.\" permission notice identical to this one.
13.\"
14.\" Since the Linux kernel and libraries are constantly changing, this
15.\" manual page may be incorrect or out-of-date. The author(s) assume no
16.\" responsibility for errors or omissions, or for damages resulting from
17.\" the use of the information contained herein. The author(s) may not
18.\" have taken the same level of care in the production of this manual,
19.\" which is licensed free of charge, as they might when working
20.\" professionally.
21.\"
22.\" Formatted or processed versions of this manual, if unaccompanied by
23.\" the source, must acknowledge the copyright and authors of this work.
24.\" %%%LICENSE_END
25.\"
4b8c67d9 26.TH CGROUPS 7 2017-09-15 "Linux" "Linux Programmer's Manual"
21f0d132
MK
27.SH NAME
28cgroups \- Linux control groups
29.SH DESCRIPTION
30Control groups, usually referred to as cgroups,
a15e0673 31are a Linux kernel feature that allows processes to
8bff7140
MK
32be organized into hierarchical groups whose usage of
33various types of resources can then be limited and monitored.
34The kernel's cgroup interface is provided through
21f0d132 35a pseudo-filesystem called cgroupfs.
6398ca15 36Grouping is implemented in the core cgroup kernel code,
21f0d132 37while resource tracking and limits are implemented in
8bff7140 38a set of per-resource-type subsystems (memory, CPU, and so on).
21f0d132 39.\"
176a4211
MK
40.SS Terminology
41A
42.I cgroup
43is a collection of processes that are bound to a set of
44limits or parameters defined via the cgroup filesystem.
a721e8b2 45.PP
176a4211
MK
46A
47.I subsystem
48is a kernel component that modifies the behavior of
49the processes in a cgroup.
50Various subsystems have been implemented, making it possible to do things
51such as limiting the amount of CPU time and memory available to a cgroup,
52accounting for the CPU time used by a cgroup,
53and freezing and resuming execution of the processes in a cgroup.
54Subsystems are sometimes also known as
55.IR "resource controllers"
56(or simply, controllers).
a721e8b2 57.PP
55f52de8 58The cgroups for a controller are arranged in a
176a4211
MK
59.IR hierarchy .
60This hierarchy is defined by creating, removing, and
61renaming subdirectories within the cgroup filesystem.
8fc9db1e
MK
62At each level of the hierarchy, attributes (e.g., limits) can be defined.
63The limits, control, and accounting provided by cgroups generally have
64effect throughout the subhierarchy underneath the cgroup where the
65attributes are defined.
8bff7140
MK
66Thus, for example, the limits placed on
67a cgroup at a higher level in the hierarchy cannot be exceeded
68by descendant cgroups.
176a4211 69.\"
43df1ab3
MK
70.SS Cgroups version 1 and version 2
71The initial release of the cgroups implementation was in Linux 2.6.24.
55f52de8 72Over time, various cgroup controllers have been added
43df1ab3 73to allow the management of various types of resources.
55f52de8
MK
74However, the development of these controllers was largely uncoordinated,
75with the result that many inconsistencies arose between controllers
43df1ab3
MK
76and management of the cgroup hierarchies became rather complex.
77(A longer description of these problems can be found in
78the kernel source file
0a837899 79.IR Documentation/cgroup\-v2.txt .)
a721e8b2 80.PP
813d9220
MK
81Because of the problems with the initial cgroups implementation
82(cgroups version 1),
43df1ab3
MK
83starting in Linux 3.10, work began on a new,
84orthogonal implementation to remedy these problems.
85Initially marked experimental, and hidden behind the
86.I "\-o\ __DEVEL__sane_behavior"
87mount option, the new version (cgroups version 2)
88was eventually made official with the release of Linux 4.5.
89Differences between the two versions are described in the text below.
a721e8b2 90.PP
43df1ab3
MK
91Although cgroups v2 is intended as a replacement for cgroups v1,
92the older system continues to exist
93(and for compatibility reasons is unlikely to be removed).
94Currently, cgroups v2 implements only a subset of the controllers
95available in cgroups v1.
96The two systems are implemented so that both v1 controllers and
97v2 controllers can be mounted on the same system.
98Thus, for example, it is possible to use those controllers
99that are supported under version 2,
100while also using version 1 controllers
101where version 2 does not yet support those controllers.
1a90a85e
MK
102The only restriction here is that a controller can't be simultaneously
103employed in both a cgroups v1 hierarchy and in the cgroups v2 hierarchy.
43df1ab3 104.\"
8bff7140
MK
105.SS Cgroups version 1
106Under cgroups v1, each controller may be mounted against a separate
107cgroup filesystem that provides its own hierarchical organization of the
108processes on the system.
109It is also possible to comount multiple (or even all) cgroups v1 controllers
110against the same cgroup filesystem, meaning that the comounted controllers
111manage the same hierarchical organization of processes.
a721e8b2 112.PP
8bff7140
MK
113For each mounted hierarchy,
114the directory tree mirrors the control group hierarchy.
115Each control group is represented by a directory, with each of its child
116control groups represented as a child directory.
117For instance,
118.IR /user/joe/1.session
119represents control group
120.IR 1.session ,
121which is a child of cgroup
122.IR joe ,
123which is a child of
124.IR /user .
125Under each cgroup directory is a set of files which can be read or
126written to, reflecting resource limits and a few general cgroup
127properties.
a721e8b2 128.PP
8bff7140 129In addition, in cgroups v1,
55f52de8 130cgroups can be mounted with no bound controller, in which case
8bff7140 131they serve only to track processes.
59dabd75 132(See the discussion of release notification below.)
8bff7140
MK
133An example of this is the
134.I name=systemd
135cgroup which is used by
136.BR systemd (1)
137to track services and user sessions.
138.\"
6398ca15 139.SS Tasks (threads) versus processes
c775bca2
MK
140In cgroups v1, a distinction is drawn between
141.I processes
142and
143.IR tasks .
144In this view, a process can consist of multiple tasks
6398ca15
MK
145(more commonly called threads, from a user-space perspective,
146and called such in the remainder of this man page).
0ec74e08 147In cgroups v1, it is possible to independently manipulate
6398ca15 148the cgroup memberships of the threads in a process.
c775bca2
MK
149Because this ability caused certain problems,
150.\" FIXME Add some text describing why this was a problem.
151the ability to independently manipulate the cgroup memberships
6398ca15 152of the threads in a process has been removed in cgroups v2.
c775bca2
MK
153Cgroups v2 allows manipulation of cgroup membership only for processes
154(which has the effect of changing the cgroup membership of
6398ca15 155all threads in the process).
c775bca2 156.\"
77e0a626
MK
157.SS Mounting v1 controllers
158The use of cgroups requires a kernel built with the
8e6578f8
KF
159.BR CONFIG_CGROUPS
160option.
77e0a626
MK
161In addition, each of the v1 controllers has an associated
162configuration option that must be set in order to employ that controller.
a721e8b2 163.PP
77e0a626
MK
164In order to use a v1 controller,
165it must be mounted against a cgroup filesystem.
4e07c70f
MK
166The usual place for such mounts is under a
167.BR tmpfs (5)
168filesystem mounted at
77e0a626
MK
169.IR /sys/fs/cgroup .
170Thus, one might mount the
171.I cpu
172controller as follows:
a721e8b2 173.PP
77e0a626 174.in +4n
b8302363 175.EX
77e0a626 176mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu
b8302363 177.EE
e646a1ba 178.in
a721e8b2 179.PP
77e0a626
MK
180It is possible to comount multiple controllers against the same hierarchy.
181For example, here the
182.IR cpu
21f0d132 183and
77e0a626
MK
184.IR cpuacct
185controllers are comounted against a single hierarchy:
a721e8b2 186.PP
21f0d132 187.in +4n
b8302363 188.EX
77e0a626 189mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
b8302363 190.EE
e646a1ba 191.in
a721e8b2 192.PP
55f52de8 193Comounting controllers has the effect that a process is in the same cgroup for
77e0a626 194all of the comounted controllers.
55f52de8 195Separately mounting controllers allows a process to
21f0d132
MK
196be in cgroup
197.I /foo1
55f52de8 198for one controller while being in
21f0d132
MK
199.I /foo2/foo3
200for another.
a721e8b2 201.PP
77e0a626 202It is possible to comount all v1 controllers against the same hierarchy:
a721e8b2 203.PP
77e0a626 204.in +4n
b8302363 205.EX
77e0a626 206mount \-t cgroup \-o all cgroup /sys/fs/cgroup
b8302363 207.EE
e646a1ba 208.in
a721e8b2 209.PP
77e0a626
MK
210(One can achieve the same result by omitting
211.IR "\-o all" ,
212since it is the default if no controllers are explicitly specified.)
a721e8b2 213.PP
31ec2a5c
MK
214It is not possible to mount the same controller
215against multiple cgroup hierarchies.
216For example, it is not possible to mount both the
217.I cpu
218and
219.I cpuacct
220controllers against one hierarchy, and to mount the
221.I cpu
222controller alone against another hierarchy.
223It is possible to create multiple mount points with exactly
224the same set of comounted controllers.
225However, in this case all that results is multiple mount points
226providing a view of the same hierarchy.
a721e8b2 227.PP
77e0a626
MK
228Note that on many systems, the v1 controllers are automatically mounted under
229.IR /sys/fs/cgroup ;
230in particular,
231.BR systemd (1)
232automatically creates such mount points.
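The mount commands in this section follow a regular pattern, which the following sketch makes explicit. The function name
cgroup_v1_mount_cmd
is hypothetical, and the mount-point naming simply copies the convention used in the examples above. The function only prints the command, so it can be inspected before being run with suitable privileges.

```shell
# Print (do not run) the mount command for one or more v1 controllers.
# Assumption: mount points are named /sys/fs/cgroup/<comma-separated list>,
# as in the examples in this section.
cgroup_v1_mount_cmd() {
    # Join all arguments with commas, e.g. "cpu cpuacct" -> "cpu,cpuacct".
    list=$(IFS=,; printf '%s' "$*")
    printf 'mount -t cgroup -o %s none /sys/fs/cgroup/%s\n' "$list" "$list"
}

# Usage (prints the comount command for cpu and cpuacct):
#   cgroup_v1_mount_cmd cpu cpuacct
```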
21f0d132 233.\"
7409b54b
MK
234.SS Unmounting v1 controllers
235A mounted cgroup filesystem can be unmounted using the
236.BR umount (8)
237command, as in the following example:
238.PP
239.in +4n
240.EX
241umount /sys/fs/cgroup/pids
242.EE
243.in
244.PP
245.IR "But note well" :
246a cgroup filesystem is unmounted only if it is not busy,
247that is, it has no child cgroups.
248If this is not the case, then the only effect of the
249.BR umount (8)
250is to make the mount invisible.
251Thus, to ensure that the mount point is really removed,
252one must first remove all child cgroups,
253which in turn can be done only after all member processes
254have been moved from those cgroups to the root cgroup.
255.\"
860573ad
MK
256.SS Cgroups version 1 controllers
257Each of the cgroups version 1 controllers is governed
258by a kernel configuration option (listed below).
259Additionally, the availability of the cgroups feature is governed by the
260.BR CONFIG_CGROUPS
261kernel configuration option.
262.TP
263.IR cpu " (since Linux 2.6.24; " \fBCONFIG_CGROUP_SCHED\fP )
264Cgroups can be guaranteed a minimum number of "CPU shares"
265when a system is busy.
266This does not limit a cgroup's CPU usage if the CPUs are not busy.
4ad9a706
MK
267For further information, see
268.IR Documentation/scheduler/sched-design-CFS.txt .
a721e8b2 269.IP
4ad9a706
MK
270In Linux 3.2,
271this controller was extended to provide CPU "bandwidth" control.
272If the kernel is configured with
81ff7360 273.BR CONFIG_CFS_BANDWIDTH ,
4ad9a706
MK
274then within each scheduling period
275(defined via a file in the cgroup directory), it is possible to define
276an upper limit on the CPU time allocated to the processes in a cgroup.
277This upper limit applies even if there is no other competition for the CPU.
860573ad
MK
278Further information can be found in the kernel source file
279.IR Documentation/scheduler/sched\-bwc.txt .
280.TP
281.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
282This provides accounting for CPU usage by groups of processes.
a721e8b2 283.IP
860573ad
MK
284Further information can be found in the kernel source file
285.IR Documentation/cgroup\-v1/cpuacct.txt .
286.TP
287.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
288This controller can be used to bind the processes in a cgroup to
289a specified set of CPUs and NUMA nodes.
a721e8b2 290.IP
860573ad
MK
291Further information can be found in the kernel source file
292.IR Documentation/cgroup\-v1/cpusets.txt .
293.TP
294.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
295The memory controller supports reporting and limiting of process memory, kernel
296memory, and swap used by cgroups.
a721e8b2 297.IP
860573ad
MK
298Further information can be found in the kernel source file
299.IR Documentation/cgroup\-v1/memory.txt .
300.TP
301.IR devices " (since Linux 2.6.26; " \fBCONFIG_CGROUP_DEVICE\fP )
302This supports controlling which processes may create (mknod) devices as
303well as open them for reading or writing.
304The policies may be specified as whitelists and blacklists.
305Hierarchy is enforced, so new rules must not
306violate existing rules for the target or ancestor cgroups.
a721e8b2 307.IP
860573ad
MK
308Further information can be found in the kernel source file
309.IR Documentation/cgroup-v1/devices.txt .
310.TP
311.IR freezer " (since Linux 2.6.28; " \fBCONFIG_CGROUP_FREEZER\fP )
312The
313.IR freezer
314controller can suspend and restore (resume) all processes in a cgroup.
315Freezing a cgroup
316.I /A
317also causes its children, for example, processes in
318.IR /A/B ,
319to be frozen.
a721e8b2 320.IP
860573ad
MK
321Further information can be found in the kernel source file
322.IR Documentation/cgroup-v1/freezer-subsystem.txt .
323.TP
324.IR net_cls " (since Linux 2.6.29; " \fBCONFIG_CGROUP_NET_CLASSID\fP )
325This places a classid, specified for the cgroup, on network packets
326created by a cgroup.
327These classids can then be used in firewall rules,
328as well as used to shape traffic using
329.BR tc (8).
330This applies only to packets
331leaving the cgroup, not to traffic arriving at the cgroup.
a721e8b2 332.IP
860573ad
MK
333Further information can be found in the kernel source file
334.IR Documentation/cgroup-v1/net_cls.txt .
335.TP
336.IR blkio " (since Linux 2.6.33; " \fBCONFIG_BLK_CGROUP\fP )
337The
338.I blkio
339cgroup controls and limits access to specified block devices by
340applying IO control in the form of throttling and upper limits against leaf
341nodes and intermediate nodes in the storage hierarchy.
a721e8b2 342.IP
860573ad
MK
343Two policies are available.
344The first is a proportional-weight time-based division
345of disk implemented with CFQ.
346This is in effect for leaf nodes using CFQ.
347The second is a throttling policy which specifies
348upper I/O rate limits on a device.
a721e8b2 349.IP
860573ad
MK
350Further information can be found in the kernel source file
351.IR Documentation/cgroup-v1/blkio-controller.txt .
352.TP
353.IR perf_event " (since Linux 2.6.39; " \fBCONFIG_CGROUP_PERF\fP )
354This controller allows
355.I perf
356monitoring of the set of processes grouped in a cgroup.
a721e8b2 357.IP
860573ad 358Further information can be found in the kernel source file
c174eb6a 359.IR tools/perf/Documentation/perf-record.txt .
860573ad
MK
360.TP
361.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
362This allows priorities to be specified, per network interface, for cgroups.
a721e8b2 363.IP
860573ad
MK
364Further information can be found in the kernel source file
365.IR Documentation/cgroup-v1/net_prio.txt .
366.TP
367.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
368This supports limiting the use of huge pages by cgroups.
a721e8b2 369.IP
860573ad
MK
370Further information can be found in the kernel source file
371.IR Documentation/cgroup-v1/hugetlb.txt .
372.TP
373.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
374This controller permits limiting the number of processes that may be created
375in a cgroup (and its descendants).
a721e8b2 376.IP
860573ad
MK
377Further information can be found in the kernel source file
378.IR Documentation/cgroup-v1/pids.txt .
cfec905e
NB
379.TP
380.IR rdma " (since Linux 4.11; " \fBCONFIG_CGROUP_RDMA\fP )
d145c025
MK
381The RDMA controller permits limiting the use of
382RDMA/IB-specific resources per cgroup.
cfec905e
NB
383.IP
384Further information can be found in the kernel source file
385.IR Documentation/cgroup-v1/rdma.txt .
860573ad 386.\"
6398ca15 387.SS Creating cgroups and moving processes
9ed582ac 388A cgroup filesystem initially contains a single root cgroup, '/',
6398ca15 389which all processes belong to.
21f0d132 390A new cgroup is created by creating a directory in the cgroup filesystem:
a721e8b2 391.PP
4769a778
MK
392.in +4n
393.EX
394mkdir /sys/fs/cgroup/cpu/cg1
395.EE
396.in
a721e8b2 397.PP
21f0d132 398This creates a new empty cgroup.
a721e8b2 399.PP
f524e7f8 400A process may be moved to this cgroup by writing its PID into the cgroup's
21f0d132 401.I cgroup.procs
21f0d132 402file:
a721e8b2 403.PP
4769a778
MK
404.in +4n
405.EX
406echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
407.EE
408.in
a721e8b2 409.PP
f524e7f8 410Only one PID at a time should be written to this file.
a721e8b2 411.PP
f524e7f8
MK
412Writing the value 0 to a
413.IR cgroup.procs
414file causes the writing process to be moved to the corresponding cgroup.
a721e8b2 415.PP
6398ca15
MK
416When a PID is written to
417.IR cgroup.procs ,
87402a2e 418all threads in the process are moved into the new cgroup at once.
a721e8b2 419.PP
f524e7f8
MK
420Within a hierarchy, a process can be a member of exactly one cgroup.
421Writing a process's PID to a
422.IR cgroup.procs
423file automatically removes it from the cgroup of
424which it was previously a member.
a721e8b2 425.PP
f524e7f8
MK
426The
427.I cgroup.procs
428file can be read to obtain a list of the processes that are
429members of a cgroup.
430The returned list of PIDs is not guaranteed to be in order.
431Nor is it guaranteed to be free of duplicates.
432(For example, a PID may be recycled while reading from the list.)
a721e8b2 433.PP
87402a2e
MK
434In cgroups v1 (but not cgroups v2), an individual thread can be moved to
435another cgroup by writing its thread ID
436(i.e., the kernel thread ID returned by
437.BR clone (2)
438and
439.BR gettid (2))
440to the
441.IR tasks
442file in a cgroup directory.
443This file can be read to discover the set of threads
444that are members of the cgroup.
445This file is not present in cgroup v2 directories.
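The rules above (one PID per write, and a member list that may be unordered and contain duplicates) can be wrapped in two small helpers. This is a minimal sketch; the function names
cg_move
and
cg_members
are hypothetical, and the caller is assumed to have write permission on the cgroup directory.

```shell
# Move the process with the given PID into the cgroup at directory DIR.
# Usage: cg_move DIR PID
cg_move() {
    # Exactly one PID per write, as described above.
    printf '%s\n' "$2" > "$1/cgroup.procs"
}

# List the member PIDs of the cgroup at DIR.
# The kernel guarantees neither ordering nor uniqueness,
# so sort numerically and drop duplicates.
cg_members() {
    sort -n -u "$1/cgroup.procs"
}

# Usage:
#   cg_move /sys/fs/cgroup/cpu/cg1 $$
#   cg_members /sys/fs/cgroup/cpu/cg1
```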
b43be47e
MK
446.\"
447.SS Removing cgroups
448To remove a cgroup,
449it must first have no child cgroups and contain no (nonzombie) processes.
450So long as that is the case, one can simply
451remove the corresponding directory pathname.
452Note that files in a cgroup directory cannot and need not be
453removed.
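The two preconditions for removal can be checked before calling
rmdir.
The sketch below uses a hypothetical helper name,
cg_is_removable,
and makes no attempt to distinguish zombie processes from live ones. It reads the contents of
cgroup.procs
rather than testing the file size, on the assumption that sizes reported for pseudo-files are not meaningful.

```shell
# Succeed (exit status 0) if the cgroup at DIR has no child cgroups
# and no member processes listed in cgroup.procs.
# Usage: cg_is_removable DIR
cg_is_removable() {
    # Every subdirectory of a cgroup directory is a child cgroup.
    for child in "$1"/*/; do
        if [ -d "$child" ]; then
            return 1
        fi
    done
    # Read the member list; empty output means no member processes.
    [ -z "$(cat "$1/cgroup.procs" 2>/dev/null)" ]
}

# Usage:
#   cg_is_removable /sys/fs/cgroup/cpu/cg1 && rmdir /sys/fs/cgroup/cpu/cg1
```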
454.\"
88afe701 455.SS Cgroups v1 release notification
23388d41
MK
456Two files can be used to determine whether the kernel provides
457notifications when a cgroup becomes empty.
458A cgroup is considered to be empty when it contains no child
459cgroups and no member processes.
a721e8b2 460.PP
23388d41 461A special file in the root directory of each cgroup hierarchy,
88afe701 462.IR release_agent ,
23388d41
MK
463can be used to register the pathname of a program that may be invoked when
464a cgroup in the hierarchy becomes empty.
465The pathname of the newly empty cgroup (relative to the cgroup mount point)
466is provided as the sole command-line argument when the
467.IR release_agent
468program is invoked.
469The
470.IR release_agent
471program might remove the cgroup directory,
472or perhaps repopulate it with a process.
a721e8b2 473.PP
23388d41
MK
474The default value of the
475.IR release_agent
476file is empty, meaning that no release agent is invoked.
a721e8b2 477.PP
23388d41
MK
478Whether or not the
479.IR release_agent
480program is invoked when a particular cgroup becomes empty is determined
481by the value in the
88afe701 482.IR notify_on_release
23388d41
MK
483file in the corresponding cgroup directory.
484If this file contains the value 0, then the
485.IR release_agent
486program is not invoked.
487If it contains the value 1, the
488.IR release_agent
489program is invoked.
490The default value for this file in the root cgroup is 0.
491At the time when a new cgroup is created,
492the value in this file is inherited from the corresponding file
493in the parent cgroup.
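Putting the two files together, release notification for one cgroup might be set up as in the following sketch. The function name
cg_set_release_agent
and the agent pathname in the usage comment are hypothetical; the first argument is assumed to be the mount point (root directory) of a v1 hierarchy.

```shell
# Register AGENT as the release agent of the v1 hierarchy mounted at ROOT,
# and enable notification for the cgroup CGROUP (a path relative to ROOT).
# Usage: cg_set_release_agent ROOT CGROUP AGENT
cg_set_release_agent() {
    root=$1 cgroup=$2 agent=$3
    # release_agent exists only in the root directory of the hierarchy.
    printf '%s\n' "$agent" > "$root/release_agent" || return 1
    # notify_on_release is per cgroup; 1 enables invocation of the agent.
    printf '1\n' > "$root/$cgroup/notify_on_release"
}

# Usage (hypothetical agent path):
#   cg_set_release_agent /sys/fs/cgroup/cpu cg1 /usr/local/sbin/cg-agent
```

Because new cgroups inherit
notify_on_release
from their parent, enabling it on a parent before creating children avoids repeating the second write for each child.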
88afe701 494.\"
b43be47e
MK
495.SS Cgroups version 2
496In cgroups v2,
497all mounted controllers reside in a single unified hierarchy.
498While (different) controllers may be simultaneously
499mounted under the v1 and v2 hierarchies,
500it is not possible to mount the same controller simultaneously
501under both the v1 and the v2 hierarchies.
a721e8b2 502.PP
2befa495
MK
503The new behaviors in cgroups v2 are summarized here,
504and in some cases elaborated in the following subsections.
505.IP 1. 3
a15e0673 506Cgroups v2 provides a unified hierarchy against
dddb7ea1
MK
507which all controllers are mounted.
508.IP 2.
2befa495
MK
509"Internal" processes are not permitted.
510With the exception of the root cgroup, processes may reside
511only in leaf nodes (cgroups that do not themselves contain child cgroups).
4f017a68 512The details are somewhat more subtle than this, and are described below.
dddb7ea1 513.IP 3.
2befa495
MK
514Active cgroups must be specified via the files
515.IR cgroup.controllers
516and
517.IR cgroup.subtree_control .
dddb7ea1 518.IP 4.
2befa495
MK
519The
520.I tasks
521file has been removed.
522In addition, the
523.I cgroup.clone_children
524file that is employed by the
525.I cpuset
526controller has been removed.
dddb7ea1 527.IP 5.
2befa495
MK
528An improved mechanism for notification of empty cgroups is provided by the
529.IR cgroup.events
530file.
531.PP
532For more changes, see the
533.I Documentation/cgroup-v2.txt
534file in the kernel source.
e91d4f9e
MK
535.PP
536Some of the new behaviors listed above saw subsequent modification with
537the addition in Linux 4.14 of "thread mode" (described below).
2befa495 538.\"
dddb7ea1
MK
539.SS Cgroups v2 unified hierarchy
540In cgroups v1, the ability to mount different controllers
541against different hierarchies was intended to allow great flexibility
542for application design.
543In practice, though, the flexibility turned out to be less useful than expected,
544and in many cases added complexity.
545Therefore, in cgroups v2,
546all available controllers are mounted against a single hierarchy.
547The available controllers are automatically mounted,
548meaning that it is not necessary (or possible) to specify the controllers
549when mounting the cgroup v2 filesystem using a command such as the following:
a721e8b2 550.PP
4769a778
MK
551.in +4n
552.EX
553mount \-t cgroup2 none /mnt/cgroup2
554.EE
555.in
a721e8b2 556.PP
dddb7ea1
MK
557A cgroup v2 controller is available only if it is not currently in use
558via a mount against a cgroup v1 hierarchy.
559Or, to put things another way, it is not possible to employ
560the same controller against both a v1 hierarchy and the unified v2 hierarchy.
57cbb0db
MK
561This means that it may be necessary first to unmount a v1 controller
562(as described above) before that controller is available in v2.
563Since
564.BR systemd (1)
565makes heavy use of some v1 controllers by default,
566it can in some cases be simpler to boot the system with
567selected v1 controllers disabled.
568To do this, specify the
569.IR cgroup_no_v1=list
570option on the kernel boot command line;
571.I list
572is a comma-separated list of the names of the controllers to disable,
573or the word
574.I all
575to disable all v1 controllers.
576(This situation is correctly handled by
577.BR systemd (1),
578which falls back to operating without the specified controllers.)
03bb1264
MK
579.PP
580Note that on many modern systems,
581.BR systemd (1)
582automatically mounts the
583.I cgroup2
584filesystem at
585.I /sys/fs/cgroup/unified
586during the boot process.
dddb7ea1 587.\"
44c429ed
MK
588.SS Cgroups v2 controllers
589The following controllers, documented in the kernel source file
590.IR Documentation/cgroup-v2.txt ,
591are supported in cgroups version 2:
592.TP
593.IR io " (since Linux 4.5)"
594This is the successor of the version 1
595.I blkio
596controller.
597.TP
598.IR memory " (since Linux 4.5)"
599This is the successor of the version 1
600.I memory
601controller.
602.TP
603.IR pids " (since Linux 4.5)"
604This is the same as the version 1
605.I pids
606controller.
607.TP
608.IR perf_event " (since Linux 4.11)"
f7286edc 609This is the same as the version 1
44c429ed
MK
610.I perf_event
611controller.
612.TP
613.IR rdma " (since Linux 4.11)"
614This is the same as the version 1
615.I rdma
616controller.
617.TP
618.IR cpu " (since Linux 4.15)"
619This is the successor to the version 1
620.I cpu
621and
622.I cpuacct
623controllers.
624.\"
2befa495 625.SS Cgroups v2 subtree control
8d5f42dc
MK
626Each cgroup in the v2 hierarchy contains the following two files:
627.TP
628.IR cgroup.controllers
629This is a list of the controllers that are
630.I available
631in this cgroup.
632The contents of this file match the contents of the
633.I cgroup.subtree_control
634file in the parent cgroup.
635.TP
636.I cgroup.subtree_control
637This is a list of controllers that are
638.IR active
639.RI ( enabled )
640in the cgroup.
641The set of controllers in this file is a subset of the set in the
21f0d132 642.IR cgroup.controllers
8d5f42dc
MK
643of this cgroup.
644The set of active controllers is modified by writing strings to this file
645containing space-delimited controller names,
646each preceded by '+' (to enable a controller)
647or '\-' (to disable a controller), as in the following example:
648.IP
649.in +4n
650.EX
651echo '+pids \-memory' > x/y/cgroup.subtree_control
652.EE
653.in
654.IP
c9b101d1
MK
655An attempt to enable a controller
656that is not present in
657.I cgroup.controllers
658leads to an
659.B ENOENT
660error when writing to the
661.I cgroup.subtree_control
662file.
663.PP
8d5f42dc
MK
664Because the list of controllers in
665.I cgroup.subtree_control
666is a subset of those in
667.IR cgroup.controllers ,
668a controller that has been disabled in one cgroup in the hierarchy
669can never be re-enabled in the subtree below that cgroup.
670.PP
671A cgroup's
672.I cgroup.subtree_control
673file determines the set of controllers that are exercised in the
674.I child
675cgroups.
676When a controller (e.g.,
677.IR pids )
678is present in the
679.I cgroup.subtree_control
680file of a parent cgroup,
681then the corresponding controller-interface files (e.g.,
682.IR pids.max )
683are automatically created in the children of that cgroup
684and can be used to exert resource control in the child cgroups.
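Since enabling a controller that is absent from
cgroup.controllers
fails with
ENOENT,
a script may prefer to check availability first and report which controller is missing. The following is a minimal sketch; the function name
cg_enable_controllers
is hypothetical.

```shell
# Enable the named controllers for the children of the cgroup at DIR,
# after checking that each one is listed in DIR/cgroup.controllers.
# Usage: cg_enable_controllers DIR CONTROLLER...
cg_enable_controllers() {
    dir=$1
    shift
    # Pad with spaces so that whole-word matching works in the case below.
    available=" $(cat "$dir/cgroup.controllers") "
    request=
    for ctl in "$@"; do
        case $available in
            *" $ctl "*) request="$request +$ctl" ;;
            *) echo "controller not available: $ctl" >&2; return 1 ;;
        esac
    done
    # Write all requests in one string, e.g. "+pids +memory".
    printf '%s\n' "${request# }" > "$dir/cgroup.subtree_control"
}

# Usage:
#   cg_enable_controllers x/y pids memory
```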
21f0d132 685.\"
2468f14e
MK
686.SS Cgroups v2 """no internal processes""" rule
687Cgroups v2 enforces a so-called "no internal processes" rule.
688Roughly speaking, this rule means that,
689with the exception of the root cgroup, processes may reside
690only in leaf nodes (cgroups that do not themselves contain child cgroups).
691This avoids the need to decide how to partition resources between
692processes which are members of cgroup A and processes in child cgroups of A.
693.PP
694For instance, if cgroup
695.I /cg1/cg2
696exists, then a process may reside in
697.IR /cg1/cg2 ,
698but not in
699.IR /cg1 .
700This is to avoid an ambiguity in cgroups v1
701with respect to the delegation of resources between processes in
702.I /cg1
703and its child cgroups.
704The recommended approach in cgroups v2 is to create a subdirectory called
705.I leaf
706for any nonleaf cgroup which should contain processes, but no child cgroups.
707Thus, processes which previously would have gone into
708.I /cg1
709would now go into
710.IR /cg1/leaf .
711This has the advantage of making explicit
712the relationship between processes in
713.I /cg1/leaf
714and
715.IR /cg1 's
716other children.
717.PP
718The "no internal processes" rule is in fact more subtle than stated above.
719More precisely, the rule is that a (nonroot) cgroup can't both
720(1) have member processes, and
721(2) distribute resources into child cgroups\(emthat is, have a nonempty
722.I cgroup.subtree_control
723file.
724Thus, it
725.I is
726possible for a cgroup to have both member processes and child cgroups,
727but before controllers can be enabled for that cgroup,
728the member processes must be moved out of the cgroup
729(e.g., perhaps into the child cgroups).
e91d4f9e
MK
730.PP
731With the Linux 4.14 addition of "thread mode" (described below),
732the "no internal processes" rule has been relaxed in some cases.
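The
leaf
convention described in this section can be sketched as follows: before controllers are enabled for a cgroup, its member processes are moved into a child named
leaf.
The function name
cg_move_members_to_leaf
is hypothetical, and the sketch ignores thread mode and the special case of the root cgroup.

```shell
# Move every member process of the cgroup at DIR into DIR/leaf,
# creating the leaf cgroup if necessary.
# Usage: cg_move_members_to_leaf DIR
cg_move_members_to_leaf() {
    dir=$1
    mkdir -p "$dir/leaf" || return 1
    # Take a snapshot of the member list, since it changes as processes move.
    pids=$(cat "$dir/cgroup.procs") || return 1
    for pid in $pids; do
        # One PID per write. A process may exit before its turn comes,
        # so a failed write is not treated as fatal.
        printf '%s\n' "$pid" > "$dir/leaf/cgroup.procs" 2>/dev/null || :
    done
}

# Usage:
#   cg_move_members_to_leaf /mnt/cgroup2/cg1
```

Processes created while the loop is running are not covered by the snapshot, so a caller that needs the cgroup to be empty may have to repeat the call until
cgroup.procs
reads as empty.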
2468f14e 733.\"
754f4cf5
MK
734.SS Cgroups v2 cgroup.events file
735With cgroups v2, a new mechanism is provided to obtain notification
736about when a cgroup becomes empty.
737The cgroups v1
738.IR release_agent
739and
740.IR notify_on_release
741files are removed, and replaced by a new, more general-purpose file,
742.IR cgroup.events .
743This file contains key-value pairs
744(delimited by newline characters, with the key and value separated by spaces)
745that identify events or state for a cgroup.
746Currently, only one key appears in this file,
747.IR populated ,
748which has either the value 0,
749meaning that the cgroup (and its descendants)
750contain no (nonzombie) processes,
751or 1, meaning that the cgroup (or one of its descendants) contains member processes.
a721e8b2 752.PP
754f4cf5
MK
753The
754.IR cgroup.events
755file can be monitored, in order to receive notification when a cgroup
756transitions between the populated and unpopulated states (or vice versa).
757When monitoring this file using
758.BR inotify (7),
759transitions generate
760.BR IN_MODIFY
761events, and when monitoring the file using
762.BR poll (2),
763transitions generate
764.B POLLPRI
765events.
a721e8b2 766.PP
ccb1a262
MK
767The cgroups v2 release-notification mechanism provided by the
768.I populated
769field of the
770.I cgroup.events
771file offers at least two advantages over the cgroups v1
754f4cf5
MK
772.IR release_agent
773mechanism.
774First, it allows for cheaper notification,
775since a single process can monitor multiple
776.IR cgroup.events
777files.
778By contrast, the cgroups v1 mechanism requires the creation
779of a process for each notification.
a15e0673 780Second, notification can be delegated to a process that lives inside
754f4cf5 781a container associated with the newly empty cgroup.
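Reading the
populated
key is a matter of parsing the key-value lines described above. The sketch below uses a hypothetical helper name,
cg_populated,
and matches on the key name so that it keeps working if further keys are added to the file.

```shell
# Print the value of the "populated" key (0 or 1) for the cgroup at DIR.
# Usage: cg_populated DIR
cg_populated() {
    # Each line has the form "key value"; select the line by its key.
    awk '$1 == "populated" { print $2 }' "$1/cgroup.events"
}

# Usage:
#   if [ "$(cg_populated /mnt/cgroup2/cg1)" = 0 ]; then
#       echo "cg1 and its descendants are empty"
#   fi
```

A monitoring process would typically call such a helper again each time it receives an
IN_MODIFY
or
POLLPRI
event for the file.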
c91a9f8a 782.\"
5e071499
MK
783.SS Cgroups v2 cgroup.stat file
784.\" commit ec39225cca42c05ac36853d11d28f877fde5c42e
785Each cgroup in the v2 hierarchy contains a read-only
786.IR cgroup.stat
787file (first introduced in Linux 4.14)
788that consists of lines containing key-value pairs.
789The following keys currently appear in this file:
790.TP
791.I nr_descendants
792This is the total number of visible (i.e., living) descendant cgroups
793underneath this cgroup.
794.TP
795.I nr_dying_descendants
796This is the total number of dying descendant cgroups
797underneath this cgroup.
798A cgroup enters the dying state after being deleted.
799It remains in that state for an undefined period
800(which will depend on system load)
801before being destroyed.
802.IP
803A process can't be made a member of a dying cgroup,
804and a dying cgroup can't be brought back to life.
.\"
.SS Limiting the number of descendant cgroups
Each cgroup in the v2 hierarchy contains the following files,
which can be used to view and set limits on the number
of descendant cgroups under that cgroup:
.TP
.IR cgroup.max.depth " (since Linux 4.14)"
.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
This file defines a limit on the depth of nesting of descendant cgroups.
A value of 0 in this file means that no descendant cgroups can be created.
An attempt to create a descendant whose nesting level exceeds
the limit fails
.RI ( mkdir (2)
fails with the error
.BR EAGAIN ).
.IP
Writing the string
.IR """max"""
to this file means that no limit is imposed.
The default value in this file is
.IR """max""" .
.TP
.IR cgroup.max.descendants " (since Linux 4.14)"
.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
This file defines a limit on the number of live descendant cgroups that
this cgroup may have.
An attempt to create more descendants than allowed by the limit fails
.RI ( mkdir (2)
fails with the error
.BR EAGAIN ).
.IP
Writing the string
.IR """max"""
to this file means that no limit is imposed.
The default value in this file is
.IR """max""" .
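.PP
The following Python sketch shows one way that a program might set these
limits and detect the EAGAIN failure when creating a child cgroup.
The helper names (parse_limit, format_limit, set_limit, create_child)
are hypothetical, and the sketch assumes that the caller has permission
to write the limit files and to create directories in the cgroup.

```python
import errno
import os


def parse_limit(text):
    """Return None for "max" (no limit), otherwise the limit as an int."""
    value = text.strip()
    return None if value == "max" else int(value)


def format_limit(limit):
    """Return the string to write for a limit (None means no limit)."""
    return "max" if limit is None else str(limit)


def set_limit(cgroup_dir, name, limit):
    """Write a limit to cgroup.max.depth or cgroup.max.descendants."""
    with open(os.path.join(cgroup_dir, name), "w") as f:
        f.write(format_limit(limit))


def create_child(cgroup_dir, child):
    """Create a child cgroup; return False if mkdir() failed with EAGAIN."""
    try:
        os.mkdir(os.path.join(cgroup_dir, child))
    except OSError as e:
        if e.errno == errno.EAGAIN:
            return False  # a depth or descendant limit was exceeded
        raise
    return True
```

For example, set_limit(path, "cgroup.max.depth", 0) followed by
create_child(path, "sub") would be expected to return False
on a real cgroups v2 hierarchy.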
.\"
.SS Cgroups v2 delegation
In the context of cgroups,
delegation means passing management of some subtree
of the cgroup hierarchy to a nonprivileged process.
Cgroups v1 provides support for delegation that was
accidental and not fully secure.
Cgroups v2 supports delegation by explicit design.
.PP
Some terminology is required in order to describe delegation.
A
.I delegater
is a privileged user (i.e., root) who owns a parent cgroup.
A
.I delegatee
is a nonprivileged user who will be granted the permissions needed
to manage some subhierarchy under that parent cgroup,
known as the
.IR "delegated subtree" .
.PP
To perform delegation,
the delegater makes certain directories and files writable by the delegatee,
typically by changing the ownership of the objects to be the user ID
of the delegatee.
Assuming that we want to delegate the hierarchy rooted at
.I /grp1
and that there are not yet any child cgroups under that cgroup,
the ownership of the following is changed to the user ID of the delegatee:
.TP
.IR /grp1
Changing the ownership of the root of the subtree means that any new
cgroups created under the subtree (and the files they contain)
will also be owned by the delegatee.
.TP
.IR /grp1/cgroup.procs
Changing the ownership of this file means that the delegatee
can move processes into the root of the delegated subtree.
.TP
.IR /grp1/cgroup.subtree_control
Making this file owned by the delegatee is optional.
Doing so means that the delegatee can enable controllers
(that are present in
.IR /grp1/cgroup.controllers )
in order to further redistribute resources at lower levels in the subtree.
As an alternative to changing the ownership of this file,
the delegater might instead add selected controllers to this file.
.PP
The delegater should
.I not
change the ownership of any of the controller interface files (e.g.,
.IR pids.max ,
.IR memory.high )
in
.IR /grp1 .
Those files are used from the next level above the delegated subtree
in order to distribute resources into the subtree,
and the delegatee should not have permission to change
the resources that are distributed into the delegated subtree.
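.PP
The ownership changes described above could be scripted roughly as follows.
This Python sketch is illustrative only:
the function name delegate and its parameters are hypothetical,
and it assumes that the caller is privileged and that the cgroup
has no child cgroups yet.
It deliberately leaves controller interface files such as pids.max untouched.

```python
import os


def delegate(cgroup_dir, uid, gid=-1, subtree_control=False):
    """Give uid ownership of the objects needed to manage cgroup_dir.

    Returns the list of pathnames whose ownership was changed.
    """
    names = ["", "cgroup.procs"]  # "" stands for the cgroup directory itself
    if subtree_control:
        # Optional: lets the delegatee enable controllers further down.
        names.append("cgroup.subtree_control")
    changed = []
    for name in names:
        path = os.path.join(cgroup_dir, name) if name else cgroup_dir
        os.chown(path, uid, gid)  # a gid of -1 leaves the group unchanged
        changed.append(path)
    return changed
```

With the assumed mount point /sys/fs/cgroup, a call might look like
delegate("/sys/fs/cgroup/grp1", 1000), where 1000 is an example user ID.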
.PP
After the aforementioned steps have been performed,
the delegatee can create child cgroups within the delegated subtree
and move processes between cgroups in the subtree.
If some controllers are present in
.IR /grp1/cgroup.subtree_control ,
or the ownership of that file was passed to the delegatee,
the delegatee can also control the further redistribution
of the corresponding resources into the delegated subtree.
.PP
Some delegation
.IR "containment rules"
ensure that the delegatee can move processes between cgroups within the
delegated subtree,
but can't move processes from outside the delegated subtree into
the subtree or vice versa.
A nonprivileged process (i.e., the delegatee) can write the PID of
a "target" process into a
.IR cgroup.procs
file only if all of the following are true:
.IP * 3
The effective UID of the writer (i.e., the delegatee) matches the
real user ID or the saved set-user-ID of the target process.
.IP *
The writer has write permission on the
.I cgroup.procs
file in the destination cgroup.
.IP *
The writer has write permission on the
.I cgroup.procs
file in the common ancestor of the source and destination cgroups.
(In some cases,
the common ancestor may be the source or destination cgroup itself.)
.PP
.IR Note :
one consequence of these delegation containment rules is that the
delegater must place the first process (a process owned by the delegatee)
into the delegated subtree.
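.PP
The three containment conditions can be approximated from user space
before attempting a migration.
The following Python sketch is an advisory check only;
the kernel performs the authoritative check when the PID is written.
The function names are hypothetical, and the sketch assumes that
cgroups are identified by the absolute pathnames of their directories
and that the calling process is the writer.

```python
import os


def common_ancestor(src_dir, dst_dir):
    """Return the closest common ancestor directory of two cgroups."""
    return os.path.commonpath([src_dir, dst_dir])


def target_uids(pid):
    """Return (real, effective, saved, fs) UIDs from /proc/[pid]/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Uid:"):
                return tuple(int(x) for x in line.split()[1:5])
    raise ValueError("no Uid line found")


def may_move(target_ruid, target_suid, src_dir, dst_dir):
    """Check the three containment conditions for the calling process."""
    # 1. The writer's effective UID matches the target's real or saved UID.
    if os.geteuid() not in (target_ruid, target_suid):
        return False
    # 2. and 3. Write permission on cgroup.procs in the destination cgroup
    # and in the common ancestor of the source and destination cgroups.
    for d in (dst_dir, common_ancestor(src_dir, dst_dir)):
        procs = os.path.join(d, "cgroup.procs")
        if not os.access(procs, os.W_OK, effective_ids=True):
            return False
    return True
```

The real and saved UIDs of a target could be obtained with target_uids(pid)
and passed to may_move() together with the two cgroup directories.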
.\"
.SS /proc files
.TP
.IR /proc/cgroups " (since Linux 2.6.24)"
This file contains information about the controllers
that are compiled into the kernel.
An example of the contents of this file (reformatted for readability)
is the following:
.IP
.in +4n
.EX
#subsys_name    hierarchy      num_cgroups    enabled
cpuset          4              1              1
cpu             8              1              1
cpuacct         8              1              1
blkio           6              1              1
memory          3              1              1
devices         10             84             1
freezer         7              1              1
net_cls         9              1              1
perf_event      5              1              1
net_prio        9              1              1
hugetlb         0              1              0
pids            2              1              1
.EE
.in
.IP
The fields in this file are, from left to right:
.RS
.IP 1. 3
The name of the controller.
.IP 2.
The unique ID of the cgroup hierarchy on which this controller is mounted.
If multiple cgroups v1 controllers are bound to the same hierarchy,
then each will show the same hierarchy ID in this field.
The value in this field will be 0 if:
.RS 5
.IP a) 3
the controller is not mounted on a cgroups v1 hierarchy;
.IP b)
the controller is bound to the cgroups v2 single unified hierarchy; or
.IP c)
the controller is disabled (see below).
.RE
.IP 3.
The number of control groups in this hierarchy using this controller.
.IP 4.
This field contains the value 1 if this controller is enabled,
or 0 if it has been disabled (via the
.IR cgroup_disable
kernel command-line boot parameter).
.RE
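.IP
The four fields can be read programmatically.
The following Python sketch is one possible parser;
the names Controller and parse_proc_cgroups are hypothetical,
and the sketch assumes that fields are separated by white space
and that the header line begins with '#'.

```python
from collections import namedtuple

Controller = namedtuple("Controller", "name hierarchy num_cgroups enabled")


def parse_proc_cgroups(text):
    """Parse the contents of /proc/cgroups into a list of Controller tuples."""
    controllers = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers.append(
            Controller(name, int(hierarchy), int(num_cgroups), enabled == "1")
        )
    return controllers
```

The text to parse would typically be obtained with
open("/proc/cgroups").read().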
.TP
.IR /proc/[pid]/cgroup " (since Linux 2.6.24)"
This file describes control groups to which the process
with the corresponding PID belongs.
The displayed information differs for
cgroups version 1 and version 2 hierarchies.
.IP
For each cgroup hierarchy of which the process is a member,
there is one entry containing three colon-separated fields:
.IP
.in +4n
.EX
hierarchy-ID:controller-list:cgroup-path
.EE
.in
.IP
For example:
.IP
.in +4n
.EX
5:cpuacct,cpu,cpuset:/daemons
.EE
.in
.IP
The colon-separated fields are, from left to right:
.RS
.IP 1. 3
For cgroups version 1 hierarchies,
this field contains a unique hierarchy ID number
that can be matched to a hierarchy ID in
.IR /proc/cgroups .
For the cgroups version 2 hierarchy, this field contains the value 0.
.IP 2.
For cgroups version 1 hierarchies,
this field contains a comma-separated list of the controllers
bound to the hierarchy.
For the cgroups version 2 hierarchy, this field is empty.
.IP 3.
This field contains the pathname of the control group in the hierarchy
to which the process belongs.
This pathname is relative to the mount point of the hierarchy.
.RE
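.IP
The three-field format lends itself to a short parser.
The following Python sketch is one possible approach;
the function name parse_proc_pid_cgroup is hypothetical.
It splits each line on the first two colons only,
on the assumption that a cgroup pathname might itself contain a colon.

```python
def parse_proc_pid_cgroup(text):
    """Parse /proc/[pid]/cgroup contents.

    Returns a list of (hierarchy_id, controllers, path) tuples, where
    controllers is a list that is empty for the cgroups v2 hierarchy.
    """
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy_id, controller_list, path = line.split(":", 2)
        controllers = controller_list.split(",") if controller_list else []
        entries.append((int(hierarchy_id), controllers, path))
    return entries
```

An entry whose hierarchy ID is 0 and whose controller list is empty
corresponds to the cgroups v2 hierarchy.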
.SH ERRORS
The following errors can occur for
.BR mount (2):
.TP
.B EBUSY
An attempt to mount a cgroup version 1 filesystem specified neither the
.I name=
option (to mount a named hierarchy) nor a controller name (or
.IR all ).
.SH NOTES
A child process created via
.BR fork (2)
inherits its parent's cgroup memberships.
A process's cgroup memberships are preserved across
.BR execve (2).
.SH SEE ALSO
.BR prlimit (1),
.BR systemd (1),
.BR systemd-cgls (1),
.BR systemd-cgtop (1),
.BR clone (2),
.BR ioprio_set (2),
.BR perf_event_open (2),
.BR setrlimit (2),
.BR cgroup_namespaces (7),
.BR cpuset (7),
.BR namespaces (7),
.BR sched (7),
.BR user_namespaces (7)