014cb63b 1.\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com>
4242dfbe 2.\" and Copyright (C) 2016, 2017 Michael Kerrisk <mtk.manpages@gmail.com>
014cb63b
MK
3.\"
4.\" %%%LICENSE_START(VERBATIM)
5.\" Permission is granted to make and distribute verbatim copies of this
6.\" manual provided the copyright notice and this permission notice are
7.\" preserved on all copies.
8.\"
9.\" Permission is granted to copy and distribute modified versions of this
10.\" manual under the conditions for verbatim copying, provided that the
11.\" entire resulting derived work is distributed under the terms of a
12.\" permission notice identical to this one.
13.\"
14.\" Since the Linux kernel and libraries are constantly changing, this
15.\" manual page may be incorrect or out-of-date. The author(s) assume no
16.\" responsibility for errors or omissions, or for damages resulting from
17.\" the use of the information contained herein. The author(s) may not
18.\" have taken the same level of care in the production of this manual,
19.\" which is licensed free of charge, as they might when working
20.\" professionally.
21.\"
22.\" Formatted or processed versions of this manual, if unaccompanied by
23.\" the source, must acknowledge the copyright and authors of this work.
24.\" %%%LICENSE_END
25.\"
4b8c67d9 26.TH CGROUPS 7 2017-09-15 "Linux" "Linux Programmer's Manual"
21f0d132
MK
27.SH NAME
28cgroups \- Linux control groups
29.SH DESCRIPTION
30Control groups, usually referred to as cgroups,
a15e0673 31are a Linux kernel feature that allows processes to
8bff7140
MK
32be organized into hierarchical groups whose usage of
33various types of resources can then be limited and monitored.
34The kernel's cgroup interface is provided through
21f0d132 35a pseudo-filesystem called cgroupfs.
6398ca15 36Grouping is implemented in the core cgroup kernel code,
21f0d132 37while resource tracking and limits are implemented in
8bff7140 38a set of per-resource-type subsystems (memory, CPU, and so on).
21f0d132 39.\"
176a4211
MK
40.SS Terminology
41A
42.I cgroup
43is a collection of processes that are bound to a set of
44limits or parameters defined via the cgroup filesystem.
a721e8b2 45.PP
176a4211
MK
46A
47.I subsystem
48is a kernel component that modifies the behavior of
49the processes in a cgroup.
50Various subsystems have been implemented, making it possible to do things
51such as limiting the amount of CPU time and memory available to a cgroup,
52accounting for the CPU time used by a cgroup,
53and freezing and resuming execution of the processes in a cgroup.
54Subsystems are sometimes also known as
55.IR "resource controllers"
56(or simply, controllers).
a721e8b2 57.PP
55f52de8 58The cgroups for a controller are arranged in a
176a4211
MK
59.IR hierarchy .
60This hierarchy is defined by creating, removing, and
61renaming subdirectories within the cgroup filesystem.
8fc9db1e
MK
62At each level of the hierarchy, attributes (e.g., limits) can be defined.
63The limits, control, and accounting provided by cgroups generally have
64effect throughout the subhierarchy underneath the cgroup where the
65attributes are defined.
8bff7140
MK
66Thus, for example, the limits placed on
67a cgroup at a higher level in the hierarchy cannot be exceeded
68by descendant cgroups.
176a4211 69.\"
43df1ab3
MK
70.SS Cgroups version 1 and version 2
71The initial release of the cgroups implementation was in Linux 2.6.24.
55f52de8 72Over time, various cgroup controllers have been added
43df1ab3 73to allow the management of various types of resources.
55f52de8
MK
74However, the development of these controllers was largely uncoordinated,
75with the result that many inconsistencies arose between controllers
43df1ab3
MK
76and management of the cgroup hierarchies became rather complex.
77(A longer description of these problems can be found in
78the kernel source file
0a837899 79.IR Documentation/cgroup\-v2.txt .)
a721e8b2 80.PP
813d9220
MK
81Because of the problems with the initial cgroups implementation
82(cgroups version 1),
43df1ab3
MK
83starting in Linux 3.10, work began on a new,
84orthogonal implementation to remedy these problems.
85Initially marked experimental, and hidden behind the
86.I "\-o\ __DEVEL__sane_behavior"
87mount option, the new version (cgroups version 2)
88was eventually made official with the release of Linux 4.5.
89Differences between the two versions are described in the text below.
a721e8b2 90.PP
43df1ab3
MK
91Although cgroups v2 is intended as a replacement for cgroups v1,
92the older system continues to exist
93(and for compatibility reasons is unlikely to be removed).
94Currently, cgroups v2 implements only a subset of the controllers
95available in cgroups v1.
96The two systems are implemented so that both v1 controllers and
97v2 controllers can be mounted on the same system.
98Thus, for example, it is possible to use those controllers
99that are supported under version 2,
100while also using version 1 controllers
101where version 2 does not yet support those controllers.
1a90a85e
MK
102The only restriction here is that a controller can't be simultaneously
103employed in both a cgroups v1 hierarchy and in the cgroups v2 hierarchy.
43df1ab3 104.\"
5714ccee 105.SH CGROUPS VERSION 1
8bff7140
MK
106Under cgroups v1, each controller may be mounted against a separate
107cgroup filesystem that provides its own hierarchical organization of the
108processes on the system.
980f1827 109It is also possible to comount multiple (or even all) cgroups v1 controllers
8bff7140
MK
110against the same cgroup filesystem, meaning that the comounted controllers
111manage the same hierarchical organization of processes.
a721e8b2 112.PP
8bff7140
MK
113For each mounted hierarchy,
114the directory tree mirrors the control group hierarchy.
115Each control group is represented by a directory, with each of its child
116control groups represented as a child directory.
117For instance,
118.IR /user/joe/1.session
119represents control group
120.IR 1.session ,
121which is a child of cgroup
122.IR joe ,
123which is a child of
124.IR /user .
125Under each cgroup directory is a set of files which can be read or
126written to, reflecting resource limits and a few general cgroup
127properties.
a721e8b2 128.PP
8bff7140 129In addition, in cgroups v1,
55f52de8 130cgroups can be mounted with no bound controller, in which case
8bff7140 131they serve only to track processes.
59dabd75 132(See the discussion of release notification below.)
8bff7140
MK
133An example of this is the
134.I name=systemd
135cgroup which is used by
136.BR systemd (1)
137to track services and user sessions.
138.\"
6398ca15 139.SS Tasks (threads) versus processes
c775bca2
MK
140In cgroups v1, a distinction is drawn between
141.I processes
142and
143.IR tasks .
144In this view, a process can consist of multiple tasks
6398ca15
MK
145(more commonly called threads, from a user-space perspective,
146and called such in the remainder of this man page).
0ec74e08 147In cgroups v1, it is possible to independently manipulate
6398ca15 148the cgroup memberships of the threads in a process.
56769384
MK
149Because splitting threads across different cgroups
150caused problems in some cases,
c775bca2
MK
151.\" FIXME Add some text describing why this was a problem.
152the ability to independently manipulate the cgroup memberships
56769384
MK
153of the threads in a process was removed in the initial cgroups v2
154implementation, and subsequently restored in a more limited form
155(see the discussion of "thread mode" below).
c775bca2 156.\"
77e0a626
MK
157.SS Mounting v1 controllers
158The use of cgroups requires a kernel built with the
8e6578f8
KF
159.BR CONFIG_CGROUPS
160option.
77e0a626
MK
161In addition, each of the v1 controllers has an associated
162configuration option that must be set in order to employ that controller.
a721e8b2 163.PP
77e0a626
MK
164In order to use a v1 controller,
165it must be mounted against a cgroup filesystem.
4e07c70f
MK
166The usual place for such mounts is under a
167.BR tmpfs (5)
168filesystem mounted at
77e0a626
MK
169.IR /sys/fs/cgroup .
170Thus, one might mount the
171.I cpu
172controller as follows:
a721e8b2 173.PP
77e0a626 174.in +4n
b8302363 175.EX
77e0a626 176mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu
b8302363 177.EE
e646a1ba 178.in
a721e8b2 179.PP
77e0a626
MK
180It is possible to comount multiple controllers against the same hierarchy.
181For example, here the
182.IR cpu
21f0d132 183and
77e0a626
MK
184.IR cpuacct
185controllers are comounted against a single hierarchy:
a721e8b2 186.PP
21f0d132 187.in +4n
b8302363 188.EX
77e0a626 189mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
b8302363 190.EE
e646a1ba 191.in
a721e8b2 192.PP
55f52de8 193Comounting controllers has the effect that a process is in the same cgroup for
77e0a626 194all of the comounted controllers.
55f52de8 195Separately mounting controllers allows a process to
21f0d132
MK
196be in cgroup
197.I /foo1
55f52de8 198for one controller while being in
21f0d132
MK
199.I /foo2/foo3
200for another.
a721e8b2 201.PP
77e0a626 202It is possible to comount all v1 controllers against the same hierarchy:
a721e8b2 203.PP
77e0a626 204.in +4n
b8302363 205.EX
77e0a626 206mount \-t cgroup \-o all cgroup /sys/fs/cgroup
b8302363 207.EE
e646a1ba 208.in
a721e8b2 209.PP
77e0a626
MK
210(One can achieve the same result by omitting
211.IR "\-o all" ,
212since it is the default if no controllers are explicitly specified.)
a721e8b2 213.PP
31ec2a5c
MK
214It is not possible to mount the same controller
215against multiple cgroup hierarchies.
216For example, it is not possible to mount both the
217.I cpu
218and
219.I cpuacct
220controllers against one hierarchy, and to mount the
221.I cpu
222controller alone against another hierarchy.
223It is possible to create multiple mount points with exactly
224the same set of comounted controllers.
225However, in this case all that results is multiple mount points
226providing a view of the same hierarchy.
a721e8b2 227.PP
77e0a626
MK
228Note that on many systems, the v1 controllers are automatically mounted under
229.IR /sys/fs/cgroup ;
230in particular,
231.BR systemd (1)
232automatically creates such mount points.
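.PP
To check which v1 controllers are comounted where on a running system,
one can filter the mount table.
The following is a minimal sketch, not part of the cgroups interface itself:
it assumes the usual
/proc/self/mounts
layout (device, mount point, filesystem type, options),
and the function name is hypothetical.

```shell
# List cgroup filesystems found in a mount table.
# "$1" is a file in /proc/self/mounts format (an assumption of this sketch);
# it defaults to /proc/self/mounts itself.
# Output: one line per mount, as "<fstype> <mount point> <mount options>".
# For a v1 mount, the options include the names of the comounted controllers.
list_cgroup_mounts() {
    awk '$3 == "cgroup" || $3 == "cgroup2" { print $3, $2, $4 }' \
        "${1:-/proc/self/mounts}"
}
```

Calling list_cgroup_mounts with no argument inspects the live mount table;
two lines with the same controller set indicate two views of one hierarchy.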
21f0d132 233.\"
7409b54b
MK
234.SS Unmounting v1 controllers
235A mounted cgroup filesystem can be unmounted using the
236.BR umount (8)
237command, as in the following example:
238.PP
239.in +4n
240.EX
241umount /sys/fs/cgroup/pids
242.EE
243.in
244.PP
245.IR "But note well" :
246a cgroup filesystem is unmounted only if it is not busy,
247that is, it has no child cgroups.
248If this is not the case, then the only effect of the
249.BR umount (8)
250is to make the mount invisible.
251Thus, to ensure that the mount point is really removed,
252one must first remove all child cgroups,
253which in turn can be done only after all member processes
254have been moved from those cgroups to the root cgroup.
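.PP
The "not busy" condition can be checked before unmounting.
The sketch below is illustrative only; both function names are hypothetical,
and the check relies on the fact stated above that
child cgroups appear as subdirectories.
It does not see directories whose names begin with a dot.

```shell
# Return 0 (true) if the cgroup directory "$1" has at least one child cgroup,
# that is, at least one subdirectory.
has_child_cgroups() {
    for entry in "$1"/*/; do
        # If the glob matches nothing it stays literal and fails this test.
        [ -d "$entry" ] && return 0
    done
    return 1
}

# Unmount the v1 hierarchy mounted at "$1" only if it has no child cgroups,
# so that umount(8) really removes the mount rather than just hiding it.
safe_umount_cgroup() {
    if has_child_cgroups "$1"; then
        echo "$1: child cgroups still exist; remove them first" >&2
        return 1
    fi
    umount "$1"
}
```

For example, "safe_umount_cgroup /sys/fs/cgroup/pids" refuses to unmount
while any child cgroup remains.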
255.\"
860573ad
MK
256.SS Cgroups version 1 controllers
257Each of the cgroups version 1 controllers is governed
258by a kernel configuration option (listed below).
259Additionally, the availability of the cgroups feature is governed by the
260.BR CONFIG_CGROUPS
261kernel configuration option.
262.TP
263.IR cpu " (since Linux 2.6.24; " \fBCONFIG_CGROUP_SCHED\fP )
264Cgroups can be guaranteed a minimum number of "CPU shares"
265when a system is busy.
266This does not limit a cgroup's CPU usage if the CPUs are not busy.
4ad9a706
MK
267For further information, see
268.IR Documentation/scheduler/sched-design-CFS.txt .
a721e8b2 269.IP
4ad9a706
MK
270In Linux 3.2,
271this controller was extended to provide CPU "bandwidth" control.
272If the kernel is configured with
81ff7360 273.BR CONFIG_CFS_BANDWIDTH ,
4ad9a706
MK
274then within each scheduling period
275(defined via a file in the cgroup directory), it is possible to define
276an upper limit on the CPU time allocated to the processes in a cgroup.
277This upper limit applies even if there is no other competition for the CPU.
860573ad
MK
278Further information can be found in the kernel source file
279.IR Documentation/scheduler/sched\-bwc.txt .
280.TP
281.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
282This provides accounting for CPU usage by groups of processes.
a721e8b2 283.IP
860573ad
MK
284Further information can be found in the kernel source file
285.IR Documentation/cgroup\-v1/cpuacct.txt .
286.TP
287.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
288This controller can be used to bind the processes in a cgroup to
289a specified set of CPUs and NUMA nodes.
a721e8b2 290.IP
860573ad
MK
291Further information can be found in the kernel source file
292.IR Documentation/cgroup\-v1/cpusets.txt .
293.TP
294.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
295The memory controller supports reporting and limiting of process memory, kernel
296memory, and swap used by cgroups.
a721e8b2 297.IP
860573ad
MK
298Further information can be found in the kernel source file
299.IR Documentation/cgroup\-v1/memory.txt .
300.TP
301.IR devices " (since Linux 2.6.26; " \fBCONFIG_CGROUP_DEVICE\fP )
302This supports controlling which processes may create (mknod) devices as
303well as open them for reading or writing.
304The policies may be specified as whitelists and blacklists.
305Hierarchy is enforced, so new rules must not
306violate existing rules for the target or ancestor cgroups.
a721e8b2 307.IP
860573ad
MK
308Further information can be found in the kernel source file
309.IR Documentation/cgroup-v1/devices.txt .
310.TP
311.IR freezer " (since Linux 2.6.28; " \fBCONFIG_CGROUP_FREEZER\fP )
312The
313.IR freezer
314cgroup can suspend and restore (resume) all processes in a cgroup.
315Freezing a cgroup
316.I /A
317also causes its children, for example, processes in
318.IR /A/B ,
319to be frozen.
a721e8b2 320.IP
860573ad
MK
321Further information can be found in the kernel source file
322.IR Documentation/cgroup-v1/freezer-subsystem.txt .
323.TP
324.IR net_cls " (since Linux 2.6.29; " \fBCONFIG_CGROUP_NET_CLASSID\fP )
325This places a classid, specified for the cgroup, on network packets
326created by a cgroup.
327These classids can then be used in firewall rules,
328as well as used to shape traffic using
329.BR tc (8).
330This applies only to packets
331leaving the cgroup, not to traffic arriving at the cgroup.
a721e8b2 332.IP
860573ad
MK
333Further information can be found in the kernel source file
334.IR Documentation/cgroup-v1/net_cls.txt .
335.TP
336.IR blkio " (since Linux 2.6.33; " \fBCONFIG_BLK_CGROUP\fP )
337The
338.I blkio
339cgroup controls and limits access to specified block devices by
340applying IO control in the form of throttling and upper limits against leaf
341nodes and intermediate nodes in the storage hierarchy.
a721e8b2 342.IP
860573ad
MK
343Two policies are available.
344The first is a proportional-weight time-based division
345of disk implemented with CFQ.
346This is in effect for leaf nodes using CFQ.
347The second is a throttling policy which specifies
348upper I/O rate limits on a device.
a721e8b2 349.IP
860573ad
MK
350Further information can be found in the kernel source file
351.IR Documentation/cgroup-v1/blkio-controller.txt .
352.TP
353.IR perf_event " (since Linux 2.6.39; " \fBCONFIG_CGROUP_PERF\fP )
354This controller allows
355.I perf
356monitoring of the set of processes grouped in a cgroup.
a721e8b2 357.IP
860573ad 358Further information can be found in the kernel source file
c174eb6a 359.IR tools/perf/Documentation/perf-record.txt .
860573ad
MK
360.TP
361.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
362This allows priorities to be specified, per network interface, for cgroups.
a721e8b2 363.IP
860573ad
MK
364Further information can be found in the kernel source file
365.IR Documentation/cgroup-v1/net_prio.txt .
366.TP
367.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
368This supports limiting the use of huge pages by cgroups.
a721e8b2 369.IP
860573ad
MK
370Further information can be found in the kernel source file
371.IR Documentation/cgroup-v1/hugetlb.txt .
372.TP
373.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
374This controller permits limiting the number of processes that may be created
375in a cgroup (and its descendants).
a721e8b2 376.IP
860573ad
MK
377Further information can be found in the kernel source file
378.IR Documentation/cgroup-v1/pids.txt .
cfec905e
NB
379.TP
380.IR rdma " (since Linux 4.11; " \fBCONFIG_CGROUP_RDMA\fP )
d145c025
MK
381The RDMA controller permits limiting the use of
382RDMA/IB-specific resources per cgroup.
cfec905e
NB
383.IP
384Further information can be found in the kernel source file
385.IR Documentation/cgroup-v1/rdma.txt .
860573ad 386.\"
6398ca15 387.SS Creating cgroups and moving processes
9ed582ac 388A cgroup filesystem initially contains a single root cgroup, '/',
6398ca15 389which all processes belong to.
21f0d132 390A new cgroup is created by creating a directory in the cgroup filesystem:
a721e8b2 391.PP
4769a778
MK
392.in +4n
393.EX
394mkdir /sys/fs/cgroup/cpu/cg1
395.EE
396.in
a721e8b2 397.PP
21f0d132 398This creates a new empty cgroup.
a721e8b2 399.PP
f524e7f8 400A process may be moved to this cgroup by writing its PID into the cgroup's
21f0d132 401.I cgroup.procs
21f0d132 402file:
a721e8b2 403.PP
4769a778
MK
404.in +4n
405.EX
406echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
407.EE
408.in
a721e8b2 409.PP
f524e7f8 410Only one PID at a time should be written to this file.
a721e8b2 411.PP
f524e7f8
MK
412Writing the value 0 to a
413.IR cgroup.procs
414file causes the writing process to be moved to the corresponding cgroup.
a721e8b2 415.PP
6398ca15
MK
416When writing a PID into the
417.IR cgroup.procs " file,"
87402a2e 418all threads in the process are moved into the new cgroup at once.
a721e8b2 419.PP
f524e7f8
MK
420Within a hierarchy, a process can be a member of exactly one cgroup.
421Writing a process's PID to a
422.IR cgroup.procs
423file automatically removes it from the cgroup of
424which it was previously a member.
a721e8b2 425.PP
f524e7f8
MK
426The
427.I cgroup.procs
428file can be read to obtain a list of the processes that are
429members of a cgroup.
430The returned list of PIDs is not guaranteed to be in order.
431Nor is it guaranteed to be free of duplicates.
432(For example, a PID may be recycled while reading from the list.)
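.PP
The rules above (one PID per write; reads may be unordered and contain
duplicates) can be wrapped in two small helpers.
This is a sketch under those stated rules;
the function names are hypothetical.

```shell
# Print the PIDs listed in the cgroup.procs file of the cgroup directory "$1",
# sorted numerically and with duplicates removed.
cgroup_pids() {
    sort -n -u "$1/cgroup.procs"
}

# Move each PID given after the cgroup directory into that cgroup.
# Only one PID is written per write operation.
move_pids() {
    cgdir=$1
    shift
    for pid in "$@"; do
        echo "$pid" > "$cgdir/cgroup.procs" || return 1
    done
}
```

For example, "move_pids /sys/fs/cgroup/cpu/cg1 $$" moves the current shell,
and "cgroup_pids /sys/fs/cgroup/cpu/cg1" then lists the members.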
a721e8b2 433.PP
56769384 434In cgroups v1, an individual thread can be moved to
87402a2e
MK
435another cgroup by writing its thread ID
436(i.e., the kernel thread ID returned by
437.BR clone (2)
438and
439.BR gettid (2))
440to the
441.IR tasks
442file in a cgroup directory.
443This file can be read to discover the set of threads
444that are members of the cgroup.
b43be47e
MK
445.\"
446.SS Removing cgroups
447To remove a cgroup,
448it must first have no child cgroups and contain no (nonzombie) processes.
449So long as that is the case, one can simply
450remove the corresponding directory pathname.
451Note that files in a cgroup directory cannot and need not be
452removed.
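.PP
Because only directories need to be removed, a whole subtree of empty cgroups
can be removed bottom-up with rmdir.
The following sketch assumes that all member processes
have already been moved elsewhere;
the function name is hypothetical.

```shell
# Remove the cgroup at "$1" together with all of its descendant cgroups.
# Directories are visited depth-first (children before parents) and removed
# with rmdir; the files inside a cgroup directory are never touched.
remove_cgroup_tree() {
    find "$1" -depth -type d -exec rmdir {} \; || return 1
    # Report success only if the top-level directory is really gone.
    [ ! -e "$1" ]
}
```

If any cgroup in the subtree still contains processes,
the corresponding rmdir fails and the function returns a nonzero status.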
453.\"
88afe701 454.SS Cgroups v1 release notification
23388d41
MK
455Two files can be used to determine whether the kernel provides
456notifications when a cgroup becomes empty.
457A cgroup is considered to be empty when it contains no child
458cgroups and no member processes.
a721e8b2 459.PP
23388d41 460A special file in the root directory of each cgroup hierarchy,
88afe701 461.IR release_agent ,
23388d41
MK
462can be used to register the pathname of a program that may be invoked when
463a cgroup in the hierarchy becomes empty.
464The pathname of the newly empty cgroup (relative to the cgroup mount point)
465is provided as the sole command-line argument when the
466.IR release_agent
467program is invoked.
468The
469.IR release_agent
470program might remove the cgroup directory,
980f1827 471or perhaps repopulate it with a process.
a721e8b2 472.PP
23388d41
MK
473The default value of the
474.IR release_agent
475file is empty, meaning that no release agent is invoked.
a721e8b2 476.PP
23388d41
MK
477Whether or not the
478.IR release_agent
479program is invoked when a particular cgroup becomes empty is determined
480by the value in the
88afe701 481.IR notify_on_release
23388d41
MK
482file in the corresponding cgroup directory.
483If this file contains the value 0, then the
484.IR release_agent
485program is not invoked.
486If it contains the value 1, the
487.IR release_agent
488program is invoked.
489The default value for this file in the root cgroup is 0.
490At the time when a new cgroup is created,
491the value in this file is inherited from the corresponding file
492in the parent cgroup.
88afe701 493.\"
5714ccee 494.SH CGROUPS VERSION 2
b43be47e
MK
495In cgroups v2,
496all mounted controllers reside in a single unified hierarchy.
497While (different) controllers may be simultaneously
498mounted under the v1 and v2 hierarchies,
499it is not possible to mount the same controller simultaneously
500under both the v1 and the v2 hierarchies.
a721e8b2 501.PP
2befa495
MK
502The new behaviors in cgroups v2 are summarized here,
503and in some cases elaborated in the following subsections.
504.IP 1. 3
a15e0673 505Cgroups v2 provides a unified hierarchy against
dddb7ea1
MK
506which all controllers are mounted.
507.IP 2.
2befa495
MK
508"Internal" processes are not permitted.
509With the exception of the root cgroup, processes may reside
510only in leaf nodes (cgroups that do not themselves contain child cgroups).
4f017a68 511The details are somewhat more subtle than this, and are described below.
dddb7ea1 512.IP 3.
2befa495
MK
513Active controllers must be specified via the files
514.IR cgroup.controllers
515and
516.IR cgroup.subtree_control .
dddb7ea1 517.IP 4.
2befa495
MK
518The
519.I tasks
520file has been removed.
521In addition, the
522.I cgroup.clone_children
523file that is employed by the
524.I cpuset
525controller has been removed.
dddb7ea1 526.IP 5.
2befa495
MK
527An improved mechanism for notification of empty cgroups is provided by the
528.IR cgroup.events
529file.
530.PP
531For more changes, see the
532.I Documentation/cgroup-v2.txt
533file in the kernel source.
e91d4f9e
MK
534.PP
535Some of the new behaviors listed above saw subsequent modification with
536the addition in Linux 4.14 of "thread mode" (described below).
2befa495 537.\"
dddb7ea1
MK
538.SS Cgroups v2 unified hierarchy
539In cgroups v1, the ability to mount different controllers
540against different hierarchies was intended to allow great flexibility
541for application design.
542In practice, though, the flexibility turned out to be less useful than expected,
543and in many cases added complexity.
544Therefore, in cgroups v2,
545all available controllers are mounted against a single hierarchy.
546The available controllers are automatically mounted,
547meaning that it is not necessary (or possible) to specify the controllers
548when mounting the cgroup v2 filesystem using a command such as the following:
a721e8b2 549.PP
4769a778
MK
550.in +4n
551.EX
552mount -t cgroup2 none /mnt/cgroup2
553.EE
554.in
a721e8b2 555.PP
dddb7ea1
MK
556A cgroup v2 controller is available only if it is not currently in use
557via a mount against a cgroup v1 hierarchy.
558Or, to put things another way, it is not possible to employ
559the same controller against both a v1 hierarchy and the unified v2 hierarchy.
57cbb0db
MK
560This means that it may be necessary first to unmount a v1 controller
561(as described above) before that controller is available in v2.
562Since
563.BR systemd (1)
564makes heavy use of some v1 controllers by default,
565it can in some cases be simpler to boot the system with
566selected v1 controllers disabled.
567To do this, specify the
568.IR cgroup_no_v1=list
569option on the kernel boot command line;
570.I list
571is a comma-separated list of the names of the controllers to disable,
572or the word
573.I all
574to disable all v1 controllers.
575(This situation is correctly handled by
576.BR systemd (1),
577which falls back to operating without the specified controllers.)
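.PP
To see which v1 controllers were disabled at boot,
one can look for the option on the kernel command line.
The sketch below assumes that the command line is available in
/proc/cmdline
as a single space-separated line;
the function name is hypothetical.

```shell
# Print the controllers named in the cgroup_no_v1= boot option,
# one per line (or the single word "all").
# "$1" is a file holding the kernel command line; it defaults to
# /proc/cmdline.  Nothing is printed if the option is absent.
cgroup_no_v1_list() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" |
        sed -n 's/^cgroup_no_v1=//p' |
        tr ',' '\n'
}
```

An empty result means that no v1 controllers were disabled via this option.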
03bb1264
MK
578.PP
579Note that on many modern systems,
580.BR systemd (1)
581automatically mounts the
582.I cgroup2
583filesystem at
584.I /sys/fs/cgroup/unified
585during the boot process.
dddb7ea1 586.\"
44c429ed
MK
587.SS Cgroups v2 controllers
588The following controllers, documented in the kernel source file
589.IR Documentation/cgroup-v2.txt ,
590are supported in cgroups version 2:
591.TP
592.IR io " (since Linux 4.5)"
593This is the successor of the version 1
594.I blkio
595controller.
596.TP
597.IR memory " (since Linux 4.5)"
598This is the successor of the version 1
599.I memory
600controller.
601.TP
602.IR pids " (since Linux 4.5)"
603This is the same as the version 1
604.I pids
605controller.
606.TP
607.IR perf_event " (since Linux 4.11)"
f7286edc 608This is the same as the version 1
44c429ed
MK
609.I perf_event
610controller.
611.TP
612.IR rdma " (since Linux 4.11)"
613This is the same as the version 1
614.I rdma
615controller.
616.TP
617.IR cpu " (since Linux 4.15)"
618This is the successor of the version 1
619.I cpu
620and
621.I cpuacct
622controllers.
623.\"
2befa495 624.SS Cgroups v2 subtree control
8d5f42dc
MK
625Each cgroup in the v2 hierarchy contains the following two files:
626.TP
627.IR cgroup.controllers
628This is a list of the controllers that are
629.I available
630in this cgroup.
631The contents of this file match the contents of the
632.I cgroup.subtree_control
633file in the parent cgroup.
634.TP
635.I cgroup.subtree_control
636This is a list of controllers that are
637.IR active
638.RI ( enabled )
639in the cgroup.
640The set of controllers in this file is a subset of the set in the
21f0d132 641.IR cgroup.controllers
8d5f42dc
MK
642of this cgroup.
643The set of active controllers is modified by writing strings to this file
644containing space-delimited controller names,
645each preceded by '+' (to enable a controller)
646or '\-' (to disable a controller), as in the following example:
647.IP
648.in +4n
649.EX
650echo '+pids -memory' > x/y/cgroup.subtree_control
651.EE
652.in
653.IP
c9b101d1
MK
654An attempt to enable a controller
655that is not present in
656.I cgroup.controllers
657leads to an
658.B ENOENT
659error when writing to the
660.I cgroup.subtree_control
661file.
662.PP
8d5f42dc
MK
663Because the list of controllers in
664.I cgroup.subtree_control
665is a subset of those in
666.IR cgroup.controllers ,
667a controller that has been disabled in one cgroup in the hierarchy
668can never be re-enabled in the subtree below that cgroup.
669.PP
670A cgroup's
671.I cgroup.subtree_control
672file determines the set of controllers that are exercised in the
673.I child
674cgroups.
675When a controller (e.g.,
676.IR pids )
677is present in the
678.I cgroup.subtree_control
679file of a parent cgroup,
680then the corresponding controller-interface files (e.g.,
681.IR pids.max )
682are automatically created in the children of that cgroup
683and can be used to exert resource control in the child cgroups.
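.PP
The relationship between the two files can be checked in a script
before enabling controllers.
The following is a minimal sketch with a hypothetical function name;
it assumes that
cgroup.controllers
holds space-separated controller names.

```shell
# Enable the named controllers for the children of the cgroup directory "$1".
# Each name is first checked against cgroup.controllers, so that a request
# for an unavailable controller is rejected before anything is written.
enable_controllers() {
    cgdir=$1
    shift
    request=
    for ctrl in "$@"; do
        if ! tr ' ' '\n' < "$cgdir/cgroup.controllers" |
                grep -qxF -- "$ctrl"; then
            echo "$ctrl: not available in $cgdir" >&2
            return 1
        fi
        # Build a space-delimited list such as "+pids +memory".
        request="$request${request:+ }+$ctrl"
    done
    [ -n "$request" ] || return 0
    echo "$request" > "$cgdir/cgroup.subtree_control"
}
```

For example, "enable_controllers x/y pids" writes "+pids" to
x/y/cgroup.subtree_control,
after which pids.max should appear in the children of x/y.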
21f0d132 684.\"
2468f14e
MK
685.SS Cgroups v2 """no internal processes""" rule
686Cgroups v2 enforces a so-called "no internal processes" rule.
687Roughly speaking, this rule means that,
688with the exception of the root cgroup, processes may reside
689only in leaf nodes (cgroups that do not themselves contain child cgroups).
690This avoids the need to decide how to partition resources between
691processes which are members of cgroup A and processes in child cgroups of A.
692.PP
693For instance, if cgroup
694.I /cg1/cg2
695exists, then a process may reside in
696.IR /cg1/cg2 ,
697but not in
698.IR /cg1 .
699This is to avoid an ambiguity in cgroups v1
700with respect to the delegation of resources between processes in
701.I /cg1
702and its child cgroups.
703The recommended approach in cgroups v2 is to create a subdirectory called
704.I leaf
705for any nonleaf cgroup which should contain processes, but no child cgroups.
706Thus, processes which previously would have gone into
707.I /cg1
708would now go into
709.IR /cg1/leaf .
710This has the advantage of making explicit
711the relationship between processes in
712.I /cg1/leaf
713and
714.IR /cg1 's
715other children.
716.PP
717The "no internal processes" rule is in fact more subtle than stated above.
718More precisely, the rule is that a (nonroot) cgroup can't both
719(1) have member processes, and
720(2) distribute resources into child cgroups\(emthat is, have a nonempty
721.I cgroup.subtree_control
722file.
723Thus, it
724.I is
725possible for a cgroup to have both member processes and child cgroups,
726but before controllers can be enabled for that cgroup,
727the member processes must be moved out of the cgroup
728(e.g., perhaps into the child cgroups).
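.PP
A script can test the first half of this condition
before trying to enable controllers in a nonroot cgroup.
The helper below is a sketch with a hypothetical name;
it reads the file instead of testing its size,
on the assumption that pseudo-files report a size of zero.

```shell
# Return 0 (true) if the cgroup directory "$1" has at least one member
# process, judged by whether cgroup.procs yields a first line.
has_member_processes() {
    first_pid=
    read -r first_pid < "$1/cgroup.procs" || :
    [ -n "$first_pid" ]
}
```

If has_member_processes succeeds for a nonroot cgroup,
its member processes must be moved out (for example, into a child cgroup)
before writing to that cgroup's cgroup.subtree_control file.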
e91d4f9e
MK
729.PP
730With the Linux 4.14 addition of "thread mode" (described below),
731the "no internal processes" rule has been relaxed in some cases.
2468f14e 732.\"
754f4cf5
MK
733.SS Cgroups v2 cgroup.events file
734With cgroups v2, a new mechanism is provided to obtain notification
735about when a cgroup becomes empty.
736The cgroups v1
737.IR release_agent
738and
739.IR notify_on_release
740files are removed, and replaced by a new, more general-purpose file,
741.IR cgroup.events .
e5bd7e65 742This read-only file contains key-value pairs
754f4cf5
MK
743(delimited by newline characters, with the key and value separated by spaces)
744that identify events or state for a cgroup.
745Currently, only one key appears in this file,
746.IR populated ,
747which has either the value 0,
748meaning that the cgroup (and its descendants)
749contain no (nonzombie) processes,
750or 1, meaning that the cgroup (or one of its descendants) contains member processes.
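.PP
Since the file consists of lines holding a key and a value
separated by spaces, it can be parsed with a one-line awk program.
The following is a sketch; both function names are hypothetical.

```shell
# Print the value stored under key "$2" in the key-value file "$1".
# The exit status is nonzero if the key does not appear in the file.
cgroup_kv() {
    awk -v key="$2" '
        $1 == key { print $2; found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Return 0 (true) if the "populated" key in the cgroup.events file of the
# cgroup directory "$1" has the value 1.
cgroup_is_populated() {
    [ "$(cgroup_kv "$1/cgroup.events" populated)" = "1" ]
}
```

For example, "cgroup_is_populated /sys/fs/cgroup/unified/x" could be used
to decide whether the cgroup x can be removed yet.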
a721e8b2 751.PP
754f4cf5
MK
752The
753.IR cgroup.events
754file can be monitored, in order to receive notification when a cgroup
755transitions between the populated and unpopulated states (or vice versa).
756When monitoring this file using
757.BR inotify (7),
758transitions generate
759.BR IN_MODIFY
760events, and when monitoring the file using
761.BR poll (2),
762transitions generate
763.B POLLPRI
764events.
a721e8b2 765.PP
ccb1a262
MK
766The cgroups v2 release-notification mechanism provided by the
767.I populated
768field of the
769.I cgroup.events
770file offers at least two advantages over the cgroups v1
754f4cf5
MK
771.IR release_agent
772mechanism.
773First, it allows for cheaper notification,
774since a single process can monitor multiple
775.IR cgroup.events
776files.
777By contrast, the cgroups v1 mechanism requires the creation
778of a process for each notification.
a15e0673 779Second, notification can be delegated to a process that lives inside
754f4cf5 780a container associated with the newly empty cgroup.
c91a9f8a 781.\"
5e071499
MK
782.SS Cgroups v2 cgroup.stat file
783.\" commit ec39225cca42c05ac36853d11d28f877fde5c42e
784Each cgroup in the v2 hierarchy contains a read-only
785.IR cgroup.stat
786file (first introduced in Linux 4.14)
787that consists of lines containing key-value pairs.
788The following keys currently appear in this file:
789.TP
790.I nr_descendants
791This is the total number of visible (i.e., living) descendant cgroups
792underneath this cgroup.
793.TP
794.I nr_dying_descendants
795This is the total number of dying descendant cgroups
796underneath this cgroup.
797A cgroup enters the dying state after being deleted.
798It remains in that state for an undefined period
799(which will depend on system load)
c7f63e74
MK
800while resources are freed before the cgroup is destroyed.
801Note that the presence of some cgroups in the dying state is normal,
802and is not indicative of any problem.
5e071499
MK
803.IP
804A process can't be made a member of a dying cgroup,
805and a dying cgroup can't be brought back to life.
806.\"
807.SS Limiting the number of descendant cgroups
808Each cgroup in the v2 hierarchy contains the following files,
809which can be used to view and set limits on the number
810of descendant cgroups under that cgroup:
811.TP
812.IR cgroup.max.depth " (since Linux 4.14)"
813.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
814This file defines a limit on the depth of nesting of descendant cgroups.
815A value of 0 in this file means that no descendant cgroups can be created.
816An attempt to create a descendant whose nesting level exceeds
817the limit fails
818.RI ( mkdir (2)
819fails with the error
820.BR EAGAIN ).
821.IP
822Writing the string
823.IR """max"""
824to this file means that no limit is imposed.
825The default value in this file is
826.IR """max""" .
827.TP
828.IR cgroup.max.descendants " (since Linux 4.14)"
829.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
830This file defines a limit on the number of live descendant cgroups that
831this cgroup may have.
832An attempt to create more descendants than allowed by the limit fails
833.RI ( mkdir (2)
834fails with the error
835.BR EAGAIN ).
836.IP
837Writing the string
838.IR """max"""
839to this file means that no limit is imposed.
840The default value in this file is
841.IR """max""" .
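.PP
Setting both limits amounts to writing a number, or the string "max",
to each of the two files.
The following shell sketch shows one way this might be wrapped;
the function name
.I cgroup_set_limits
is hypothetical, and the caller is assumed to have write permission
on the files in the given cgroup directory.
.PP
```shell
# cgroup_set_limits DIR DEPTH DESCENDANTS
# Write DEPTH to DIR/cgroup.max.depth and DESCENDANTS to
# DIR/cgroup.max.descendants. Either value may be a number or "max".
cgroup_set_limits() {
    echo "$2" > "$1/cgroup.max.depth" &&
    echo "$3" > "$1/cgroup.max.descendants"
}
```
.PP
The function returns a nonzero status if either write fails.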
842.\"
148e0800 843.SS Cgroups v2 delegation: delegation to a less privileged user
844In the context of cgroups,
845delegation means passing management of some subtree
846of the cgroup hierarchy to a nonprivileged process.
847Cgroups v1 provides support for delegation that was
848accidental and not fully secure.
849Cgroups v2 supports delegation by explicit design.
850.PP
851Some terminology is required in order to describe delegation.
852A
853.I delegater
854is a privileged user (i.e., root) who owns a parent cgroup.
855A
856.I delegatee
857is a nonprivileged user who will be granted the permissions needed
858to manage some subhierarchy under that parent cgroup,
859known as the
860.IR "delegated subtree" .
861.PP
862To perform delegation,
863the delegater makes certain directories and files writable by the delegatee,
864typically by changing the ownership of the objects to be the user ID
865of the delegatee.
866Assuming that we want to delegate the hierarchy rooted at (say)
867.I /dlgt_grp
868and that there are not yet any child cgroups under that cgroup,
869the ownership of the following is changed to the user ID of the delegatee:
870.TP
0735069b 871.IR /dlgt_grp
872Changing the ownership of the root of the subtree means that any new
873cgroups created under the subtree (and the files they contain)
874will also be owned by the delegatee.
875.TP
0735069b 876.IR /dlgt_grp/cgroup.procs
f7286edc 877Changing the ownership of this file means that the delegatee
878can move processes into the root of the delegated subtree.
879.TP
0735069b 880.IR /dlgt_grp/cgroup.subtree_control
Changing the ownership of this file means that the delegatee
882can enable controllers (that are present in
0735069b 883.IR /dlgt_grp/cgroup.controllers )
4242dfbe 884in order to further redistribute resources at lower levels in the subtree.
885(As an alternative to changing the ownership of this file,
886the delegater might instead add selected controllers to this file.)
887.PP
888The delegater should
889.I not
change the ownership of any of the controller interface files (e.g.,
891.IR pids.max ,
892.IR memory.high )
893in
0735069b 894.IR dlgt_grp .
895Those files are used from the next level above the delegated subtree
896in order to distribute resources into the subtree,
897and the delegatee should not have permission to change
898the resources that are distributed into the delegated subtree.
899.PP
900See also the discussion of the
901.IR /sys/kernel/cgroup/delegate
902file in NOTES.
903.PP
904After the aforementioned steps have been performed,
905the delegatee can create child cgroups within the delegated subtree
906and move processes between cgroups in the subtree.
907If some controllers are present in
0735069b 908.IR dlgt_grp/cgroup.subtree_control ,
4242dfbe 909or the ownership of that file was passed to the delegatee,
f7286edc 910the delegatee can also control the further redistribution
4242dfbe 911of the corresponding resources into the delegated subtree.
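.PP
The ownership changes described in this subsection can be sketched
as a small shell function.
This is an illustration only:
the function name
.I delegate_subtree
is hypothetical,
the function must be run by the delegater with sufficient privilege,
and it deliberately leaves the controller interface files untouched.
.PP
```shell
# delegate_subtree DIR USER
# Hand the cgroup directory DIR, its cgroup.procs file, and its
# cgroup.subtree_control file over to USER (a user name or user ID).
# Controller interface files such as pids.max are not changed.
delegate_subtree() {
    chown "$2" "$1" "$1/cgroup.procs" "$1/cgroup.subtree_control"
}
```
.PP
The list of files is hard-coded here for brevity;
a more careful implementation might instead consult the
.I /sys/kernel/cgroup/delegate
file on kernels that provide it.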
27b086e9 912.\"
913.SS Cgroups v2 delegation: nsdelegate and cgroup namespaces
914.\"
915.\" To test this, it can be useful to boot the kernel with the options:
916.\"
917.\" cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
918.\"
919.\" The effect of the latter option is to prevent systemd from employing
920.\" its "hybrid" cgroup mode, where it tries to make use of cgroups v2.
921.\"
922Starting with Linux 4.13,
923.\" commit 5136f6365ce3eace5a926e10f16ed2a233db5ba9
924there is a second way to perform cgroup delegation.
925This is done by mounting the cgroup v2 filesystem with the
926.I nsdelegate
927mount option:
928.PP
929.in +4n
930.EX
931$ mount -t cgroup2 -o nsdelegate none /sys/fs/cgroup/unified
932.EE
933.in
934.PP
935The effect of this option is to cause cgroup namespaces
936to automatically become delegation boundaries.
937More specifically,
938the following restrictions apply for processes inside the cgroup namespace:
939.IP * 3
940Writes to controller interface files in the root directory
941will fail with the error
942.BR EPERM .
943Processes inside the cgroup namespace can still write to delegatable
944files such as
945.IR cgroup.procs
946and
947.IR cgroup.subtree_control ,
948and can create subhierarchy underneath the root directory of
949the cgroup namespace.
950.IP *
951Attempts to migrate processes across the namespace boundary are denied
952(with the error
953.BR ENOENT ).
954Processes inside the cgroup namespace can still
955(subject to the containment rules described below)
956move processes between cgroups
957.I within
958the subhierarchy under the namespace root.
959.PP
960The ability to define cgroup namespaces as delegation boundaries
961makes cgroup namespaces more useful.
962To understand why, suppose that we already have one cgroup hierarchy
963that has been delegated to a nonprivileged user,
964.IR cecilia ,
965using the older delegation technique described above.
966Suppose further that
967.I cecilia
968wanted to further delegate a subhierarchy
969under the existing delegated hierarchy.
970(For example, the delegated hierarchy might be associated with
971an unprivileged container run by
972.IR cecilia .)
973Even if a cgroup namespace was employed,
974because both hierarchies are owned by the unprivileged user
975.IR cecilia ,
976the following illegitimate actions could be performed:
977.IP * 3
978A process in the inferior hierarchy could change the
resource controller settings in the root directory of that hierarchy.
980(These resource controller settings are intended to allow control to
981be exercised from the
982.I parent
983cgroup;
984a process inside the child cgroup should not be allowed to modify them.)
985.IP *
986A process inside the inferior hierarchy could move processes
987into and out of the inferior hierarchy if the cgroups in the
988superior hierarchy were somehow visible.
989.PP
990Employing the
991.I nsdelegate
992mount option prevents both of these possibilities.
993.PP
994The
995.I nsdelegate
996mount option only has an effect when performed in
997the initial mount namespace;
998in other mount namespaces, the option is silently ignored.
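.PP
Since the option can be silently ignored,
it may be useful to check whether it actually took effect.
The following shell sketch assumes that an active
.I nsdelegate
option is reported in the mount options field of
.IR /proc/mounts ;
the function name
.I cgroup2_has_nsdelegate
is hypothetical.
.PP
```shell
# cgroup2_has_nsdelegate [MOUNTS_FILE]
# Succeed if MOUNTS_FILE (default: /proc/mounts) lists a cgroup2
# filesystem whose mount options include "nsdelegate".
cgroup2_has_nsdelegate() {
    awk '$3 == "cgroup2" && $4 ~ /(^|,)nsdelegate(,|$)/ { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```
.PP
The function produces no output;
the result is conveyed by its exit status.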
999.\"
27b086e9 1000.SS Cgroup v2 delegation containment rules
1001Some delegation
1002.IR "containment rules"
1003ensure that the delegatee can move processes between cgroups within the
1004delegated subtree,
1005but can't move processes from outside the delegated subtree into
1006the subtree or vice versa.
1007A nonprivileged process (i.e., the delegatee) can write the PID of
1008a "target" process into a
1009.IR cgroup.procs
1010file only if all of the following are true:
1011.IP * 3
1012The writer has write permission on the
1013.I cgroup.procs
1014file in the destination cgroup.
1015.IP *
1016The writer has write permission on the
1017.I cgroup.procs
1018file in the common ancestor of the source and destination cgroups.
1019(In some cases,
1020the common ancestor may be the source or destination cgroup itself.)
28f612ea 1021.IP *
1022If the cgroup v2 filesystem was mounted with the
1023.I nsdelegate
1024option, the writer must be able to see the source and destination cgroup
1025from its cgroup namespace.
1026.IP *
1027Before Linux 4.11:
1028.\" commit 576dd464505fc53d501bb94569db76f220104d28
1029the effective UID of the writer (i.e., the delegatee) matches the
1030real user ID or the saved set-user-ID of the target process.
1031(This was a historical requirement inherited from cgroups v1
1032that was later deemed unnecessary,
1033since the other rules suffice for containment in cgroups v2.)
1034.PP
1035.IR Note :
1036one consequence of these delegation containment rules is that the
1037unprivileged delegatee can't place the first process into
1038the delegated subtree;
1039instead, the delegater must place the first process
1040(a process owned by the delegatee) into the delegated subtree.
4242dfbe 1041.\"
75e83bc2 1042.SH CGROUPS VERSION 2 THREAD MODE
1043Among the restrictions imposed by cgroups v2 that were not present
1044in cgroups v1 are the following:
1045.IP * 3
1046.IR "No thread-granularity control" :
1047all of the threads of a process must be in the same cgroup.
1048.IP *
1049.IR "No internal processes" :
1050a cgroup can't both have member processes and
1051exercise controllers on child cgroups.
1052.PP
1053Both of these restrictions were added because
1054the lack of these restrictions had caused problems
1055in cgroups v1.
1056In particular, the cgroups v1 ability to allow thread-level granularity
1057for cgroup membership made no sense for some controllers.
1058(A notable example was the
1059.I memory
1060controller: since threads share an address space,
1061it made no sense to split threads across different
1062.I memory
1063cgroups.)
1064.PP
1065Notwithstanding the initial design decision in cgroups v2,
1066there were use cases for certain controllers, notably the
1067.IR cpu
1068controller,
1069for which thread-level granularity of control was meaningful and useful.
1070To accommodate such use cases, Linux 4.14 added
1071.I "thread mode"
1072for cgroups v2.
1073.PP
1074Thread mode allows the following:
1075.IP * 3
1076The creation of
1077.IR "threaded subtrees"
1078in which the threads of a process may
1079be spread across cgroups inside the tree.
1080(A threaded subtree may contain multiple multithreaded processes.)
1081.IP *
1082The concept of
.IR "threaded controllers" ,
1084which can distribute resources across the cgroups in a threaded subtree.
1085.IP *
A relaxation of the "no internal processes" rule,
1087so that, within a threaded subtree,
1088a cgroup can both contain member threads and
1089exercise resource control over child cgroups.
1090.PP
1091With the addition of thread mode,
1092each nonroot cgroup now contains a new file,
1093.IR cgroup.type ,
1094that exposes, and in some circumstances can be used to change,
1095the "type" of a cgroup.
1096This file contains one of the following type values:
1097.TP
1098.I "domain"
1099This is a normal v2 cgroup that provides process-granularity control.
1100If a process is a member of this cgroup,
1101then all threads of the process are (by definition) in the same cgroup.
1102This is the default cgroup type,
1103and provides the same behavior that was provided for
1104cgroups in the initial cgroups v2 implementation.
1105.TP
1106.I "threaded"
1107This cgroup is a member of a threaded subtree.
1108Threads can be added to this cgroup,
1109and controllers can be enabled for the cgroup.
1110.TP
1111.I "domain threaded"
1112This is a domain cgroup that serves as the root of a threaded subtree.
1113This cgroup type is also known as "threaded root".
1114.TP
1115.I "domain invalid"
1116This is a cgroup inside a threaded subtree
1117that is in an "invalid" state.
1118Processes can't be added to the cgroup,
1119and controllers can't be enabled for the cgroup.
1120The only thing that can be done with this cgroup (other than deleting it)
1121is to convert it to a
1122.IR threaded
1123cgroup by writing the string
1124.IR """threaded"""
1125to the
1126.I cgroup.type
1127file.
1128.\"
1129.SS Threaded versus domain controllers
With the addition of thread mode,
1131cgroups v2 now distinguishes two types of resource controllers:
1132.IP * 3
1133.I Threaded
1134controllers: these controllers support thread-granularity for
1135resource control and can be enabled inside threaded subtrees,
1136with the result that the corresponding controller-interface files
1137appear inside the cgroups in the threaded subtree.
1138As at Linux 4.15, the following controllers are threaded:
1139.IR cpu ,
1140.IR perf_event ,
1141and
1142.IR pids .
1143.IP *
1144.I Domain
1145controllers: these controllers support only process granularity
1146for resource control.
1147From the perspective of a domain controller,
1148all threads of a process are always in the same cgroup.
1149Domain controllers can't be enabled inside a threaded subtree.
1150.\"
1151.SS Creating a threaded subtree
1152There are two pathways that lead to the creation of a threaded subtree.
1153The first pathway proceeds as follows:
1154.IP 1. 3
1155We write the string
1156.IR """threaded"""
1157to the
1158.I cgroup.type
1159file of a cgroup
1160.IR y/z
1161that currently has the type
1162.IR domain .
1163This has the following effects:
1164.RS
1165.IP * 3
1166The type of the cgroup
1167.IR y/z
1168becomes
1169.IR threaded .
1170.IP *
1171The type of the parent cgroup,
1172.IR y ,
1173becomes
1174.IR "domain threaded" .
1175The parent cgroup is the root of a threaded subtree
1176(also known as the "threaded root").
1177.IP *
1178All other cgroups under
1179.IR y
1180that were not already of type
1181.IR threaded
1182(because they were inside already existing threaded subtrees
1183under the new threaded root)
1184are converted to type
1185.IR "domain invalid" .
1186Any subsequently created cgroups under
1187.I y
1188will also have the type
1189.IR "domain invalid" .
1190.RE
1191.IP 2.
1192We write the string
1193.IR """threaded"""
1194to each of the
1195.IR "domain invalid"
1196cgroups under
1197.IR y ,
1198in order to convert them to the type
1199.IR threaded .
As a consequence of this step, all cgroups under the threaded root
1201now have the type
1202.IR threaded
1203and the threaded subtree is now fully usable.
1204The requirement to write
1205.IR """threaded"""
1206to each of these cgroups is somewhat cumbersome,
1207but allows for possible future extensions to the thread-mode model.
1208.PP
1209The second way of creating a threaded subtree is as follows:
1210.IP 1. 3
1211In an existing cgroup,
1212.IR z ,
1213that currently has the type
1214.IR domain ,
1215we (1) enable one or more threaded controllers and
1216(2) make a process a member of
1217.IR z .
1218(These two steps can be done in either order.)
1219This has the following consequences:
1220.RS
1221.IP * 3
1222The type of
1223.I z
1224becomes
1225.IR "domain threaded" .
1226.IP *
1227All of the descendant cgroups of
.I z
that were not already of type
1230.IR threaded
1231are converted to type
1232.IR "domain invalid" .
1233.RE
1234.IP 2.
1235As before, we make the threaded subtree usable by writing the string
1236.IR """threaded"""
1237to each of the
1238.IR "domain invalid"
1239cgroups under
.IR z ,
1241in order to convert them to the type
1242.IR threaded .
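.PP
The final step of either pathway,
converting every
.I "domain invalid"
cgroup under the threaded root, can be sketched in shell as follows.
The function name
.I make_subtree_threaded
is hypothetical.
The sketch relies on
.BR find (1)
listing each directory before its subdirectories,
so that the cgroups are converted in a top-down manner.
.PP
```shell
# make_subtree_threaded ROOT
# Write "threaded" to the cgroup.type file of each cgroup below ROOT
# whose current type is "domain invalid". Parents are visited before
# their children. ROOT itself (the threaded root) is left alone.
make_subtree_threaded() {
    find "$1" -mindepth 1 -type d | while IFS= read -r dir; do
        case "$(cat "$dir/cgroup.type")" in
        "domain invalid") echo threaded > "$dir/cgroup.type" ;;
        esac
    done
}
```
.PP
Cgroups that already have the type
.I threaded
are skipped, although writing to them again would be harmless.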
1243.PP
1244One of the consequences of the above pathways to creating a threaded subtree
1245is that the threaded root cgroup can be a parent only to
1246.I threaded
1247(and
1248.IR "domain invalid" )
1249cgroups.
1250The threaded root cgroup can't be a parent of a
1251.I domain
cgroup, and a
1253.I threaded
1254cgroup
1255can't have a sibling that is a
1256.I domain
1257cgroup.
1258.\"
1259.SS Using a threaded subtree
1260Within a threaded subtree, threaded controllers can be enabled
1261in each subgroup whose type has been changed to
1262.IR threaded ;
1263upon doing so, the corresponding controller interface files
1264appear in the children of that cgroup.
1265.PP
1266A process can be moved into a threaded subtree by writing its PID to the
1267.I cgroup.procs
1268file in one of the cgroups inside the tree.
1269This has the effect of making all of the threads
1270in the process members of the corresponding cgroup
1271and makes the process a member of the threaded subtree.
1272The threads of the process can then be spread across
1273the threaded subtree by writing their thread IDs (see
1274.BR gettid (2))
1275to the
.I cgroup.threads
1277files in different cgroups inside the subtree.
1278The threads of a process must all reside in the same threaded subtree.
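.PP
The two kinds of move described above differ only in the file
that is written.
The following shell sketch makes that explicit;
the function names
.I cgroup_move_process
and
.I cgroup_move_thread
are hypothetical, and the caller is assumed to have
the necessary write permissions.
.PP
```shell
# cgroup_move_process PID DIR
# Move the whole process PID (all of its threads) into cgroup DIR.
cgroup_move_process() {
    echo "$1" > "$2/cgroup.procs"
}

# cgroup_move_thread TID DIR
# Move the single thread TID into cgroup DIR, which must lie in the
# same threaded subtree as the other threads of the process.
cgroup_move_thread() {
    echo "$1" > "$2/cgroup.threads"
}
```
.PP
A typical sequence would first move the process into the subtree with
.IR cgroup_move_process ,
and then distribute selected threads with
.IR cgroup_move_thread .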
1279.PP
1280The
.I cgroup.threads
1282file is present in each cgroup (including
1283.I domain
1284cgroups) and can be read in order to discover the set of threads
1285that is present in the cgroup.
1286The set of thread IDs obtained when reading this file
1287is not guaranteed to be ordered or free of duplicates.
1288.PP
1289The
1290.I cgroup.procs
1291file in the threaded root shows the PIDs of all processes
1292that are members of the threaded subtree.
1293The
1294.I cgroup.procs
1295files in the other cgroups in the subtree are not readable.
1296.PP
1297Domain controllers can't be enabled in a threaded subtree;
1298no controller-interface files appear inside the cgroups underneath the
1299threaded root.
1300From the point of view of a domain controller,
1301threaded subtrees are invisible:
1302a multithreaded process inside a threaded subtree appears to a domain
1303controller as a process that resides in the threaded root cgroup.
1304.PP
1305Within a threaded subtree, the "no internal processes" rule does not apply:
a cgroup can both contain member processes (or threads)
1307and exercise controllers on child cgroups.
1308.\"
1309.SS Rules for writing to cgroup.type and creating threaded subtrees
1310A number of rules apply when writing to the
1311.I cgroup.type
1312file:
1313.IP * 3
1314Only the string
1315.IR """threaded"""
1316may be written.
1317In other words, the only explicit transition that is possible is to convert a
1318.I domain
1319cgroup to type
1320.IR threaded .
1321.IP *
1322The string
1323.IR """threaded"""
1324can be written only if the current value in
1325.IR cgroup.type
is one of the following:
1327.RS
1328.IP \(bu 3
1329.IR domain ,
1330to start the creation of a threaded subtree via
1331the first of the pathways described above;
1332.IP \(bu
1333.IR "domain\ invalid" ,
1334to convert one of the cgroups in a threaded subtree into a usable (i.e.,
1335.IR threaded )
1336state;
1337.IP \(bu
1338.IR threaded ,
1339which has no effect (a "no-op").
1340.RE
1341.IP *
1342We can't write to a
1343.I cgroup.type
1344file if the parent's type is
1345.IR "domain invalid" .
1346In other words, the cgroups of a threaded subtree must be converted to the
1347.I threaded
1348state in a top-down manner.
1349.PP
00c27092 1350There are also some constraints that must be satisfied
1351in order to create a threaded subtree rooted at the cgroup
1352.IR x :
1353.IP * 3
1354There can be no member processes in the descendant cgroups of
1355.IR x .
1356(The cgroup
1357.I x
1358can itself have member processes.)
1359.IP *
1360No domain controllers may be enabled in
1361.IR x 's
1362.IR cgroup.subtree_control
1363file.
1364.PP
1365If any of the above constraints is violated, then an attempt to write
1366.IR """threaded"""
1367to a
1368.IR cgroup.type
1369file fails with the error
1370.BR ENOTSUP .
1371.\"
1372.SS The """domain threaded""" cgroup type
1373According to the pathways described above,
1374the type of a cgroup can change to
1375.IR "domain threaded"
1376in either of the following cases:
1377.IP * 3
1378The string
1379.IR """threaded"""
1380is written to a child cgroup.
1381.IP *
1382A threaded controller is enabled inside the cgroup and
1383a process is made a member of the cgroup.
1384.PP
1385A
1386.IR "domain threaded"
1387cgroup,
1388.IR x ,
1389can revert to the type
1390.IR domain
1391if the above conditions no longer hold true\(emthat is, if all
1392.I threaded
1393child cgroups of
1394.I x
1395are removed and either
1396.I x
1397no longer has threaded controllers enabled or
1398no longer has member processes.
1399.PP
1400When a
1401.IR "domain threaded"
1402cgroup
1403.IR x
1404reverts to the type
1405.IR domain :
1406.IP * 3
1407All
1408.IR "domain invalid"
1409descendants of
1410.I x
1411that are not in lower-level threaded subtrees revert to the type
1412.IR domain .
1413.IP *
1414The root cgroups in any lower-level threaded subtrees revert to the type
1415.IR "domain threaded" .
1416.\"
1417.SS Exceptions for the root cgroup
1418The root cgroup of the v2 hierarchy is treated exceptionally:
1419it can be the parent of both
1420.I domain
1421and
1422.I threaded
1423cgroups.
1424If the string
1425.I """threaded"""
1426is written to the
1427.I cgroup.type
1428file of one of the children of the root cgroup, then
1429.IP * 3
1430The type of that cgroup becomes
1431.IR threaded .
1432.IP *
1433The type of any descendants of that cgroup that
1434are not part of lower-level threaded subtrees changes to
1435.IR "domain invalid" .
1436.PP
1437Note that in this case, there is no cgroup whose type becomes
1438.IR "domain threaded" .
1439(Notionally, the root cgroup can be considered as the threaded root
1440for the cgroup whose type was changed to
1441.IR threaded .)
1442.PP
1443The aim of this exceptional treatment for the root cgroup is to
1444allow a threaded cgroup that employs the
1445.I cpu
1446controller to be placed as high as possible in the hierarchy,
1447so as to minimize the (small) cost of traversing the cgroup hierarchy.
1448.\"
1449.SS The cgroups v2 """cpu""" controller and realtime processes
1450As at Linux 4.15, the cgroups v2
1451.I cpu
1452controller does not support control of realtime processes,
1453and the controller can be enabled in the root cgroup only
1454if all realtime threads are in the root cgroup.
1455(If there are realtime processes in nonroot cgroups, then a
1456.BR write (2)
1457of the string
1458.IR """+cpu"""
1459to the
1460.I cgroup.subtree_control
1461file fails with the error
.BR EINVAL .)
1463However, on some systems,
1464.BR systemd (1)
1465places certain realtime processes in nonroot cgroups in the v2 hierarchy.
1466On such systems,
1467these processes must first be moved to the root cgroup before the
1468.I cpu
1469controller can be enabled.
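.PP
Enabling a controller is a single write to the
.I cgroup.subtree_control
file, as the following shell sketch shows.
The function name
.I cgroup_enable_controller
is hypothetical.
If the kernel rejects the write
(for example, with
.B EINVAL
in the realtime case described above),
the function returns a nonzero status.
.PP
```shell
# cgroup_enable_controller DIR NAME
# Enable controller NAME (for example, cpu) for the children of the
# cgroup DIR by writing "+NAME" to DIR/cgroup.subtree_control.
cgroup_enable_controller() {
    echo "+$2" > "$1/cgroup.subtree_control"
}
```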
1470.\"
1471.SH ERRORS
1472The following errors can occur for
1473.BR mount (2):
1474.TP
1475.B EBUSY
1476An attempt to mount a cgroup version 1 filesystem specified neither the
1477.I name=
1478option (to mount a named hierarchy) nor a controller name (or
1479.IR all ).
1480.SH NOTES
1481A child process created via
1482.BR fork (2)
1483inherits its parent's cgroup memberships.
1484A process's cgroup memberships are preserved across
1485.BR execve (2).
1486.\"
1487.SS /proc files
1488.TP
34eb3340 1489.IR /proc/cgroups " (since Linux 2.6.24)"
92bb6d36 1490This file contains information about the controllers
1a4f7d59 1491that are compiled into the kernel.
1492An example of the contents of this file (reformatted for readability)
1493is the following:
a721e8b2 1494.IP
34eb3340 1495.in +4n
b8302363 1496.EX
1497#subsys_name hierarchy num_cgroups enabled
1498cpuset 4 1 1
1499cpu 8 1 1
1500cpuacct 8 1 1
1501blkio 6 1 1
1502memory 3 1 1
1503devices 10 84 1
1504freezer 7 1 1
1505net_cls 9 1 1
1506perf_event 5 1 1
1507net_prio 9 1 1
1508hugetlb 0 1 0
1509pids 2 1 1
b8302363 1510.EE
e646a1ba 1511.in
a721e8b2 1512.IP
1513The fields in this file are, from left to right:
1514.RS
1515.IP 1. 3
1516The name of the controller.
1517.IP 2.
92bb6d36 1518The unique ID of the cgroup hierarchy on which this controller is mounted.
11c0797f 1519If multiple cgroups v1 controllers are bound to the same hierarchy,
34eb3340 1520then each will show the same hierarchy ID in this field.
1521The value in this field will be 0 if:
1522.RS 5
1523.IP a) 3
1524the controller is not mounted on a cgroups v1 hierarchy;
1525.IP b)
1526the controller is bound to the cgroups v2 single unified hierarchy; or
1527.IP c)
1528the controller is disabled (see below).
1529.RE
1530.IP 3.
1531The number of control groups in this hierarchy using this controller.
1532.IP 4.
1533This field contains the value 1 if this controller is enabled,
1534or 0 if it has been disabled (via the
1535.IR cgroup_disable
1536kernel command-line boot parameter).
1537.RE
1538.TP
5c2181ad 1539.IR /proc/[pid]/cgroup " (since Linux 2.6.24)"
1540This file describes control groups to which the process
1541with the corresponding PID belongs.
5f8a7eb2 1542The displayed information differs for
2c4fbe35 1543cgroups version 1 and version 2 hierarchies.
a721e8b2 1544.IP
5f8a7eb2 1545For each cgroup hierarchy of which the process is a member,
2e33b59e 1546there is one entry containing three colon-separated fields:
a721e8b2 1547.IP
1548.in +4n
1549.EX
1550hierarchy-ID:controller-list:cgroup-path
1551.EE
1552.in
a721e8b2 1553.IP
5f8a7eb2 1554For example:
1555.IP
1556.in +4n
1557.EX
15585:cpuacct,cpu,cpuset:/daemons
1559.EE
1560.in
1561.IP
1562The colon-separated fields are, from left to right:
5f8a7eb2 1563.RS
5c2181ad 1564.IP 1. 3
1565For cgroups version 1 hierarchies,
1566this field contains a unique hierarchy ID number
1567that can be matched to a hierarchy ID in
1568.IR /proc/cgroups .
1569For the cgroups version 2 hierarchy, this field contains the value 0.
5c2181ad 1570.IP 2.
5f8a7eb2 1571For cgroups version 1 hierarchies,
55f52de8 1572this field contains a comma-separated list of the controllers
1573bound to the hierarchy.
1574For the cgroups version 2 hierarchy, this field is empty.
5c2181ad 1575.IP 3.
1576This field contains the pathname of the control group in the hierarchy
1577to which the process belongs.
1578This pathname is relative to the mount point of the hierarchy.
5c2181ad 1579.RE
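.PP
Both of the files described above are easy to parse with
.BR awk (1).
The following shell sketch is illustrative only;
the function names
.I enabled_controllers
and
.I v2_cgroup_path
are hypothetical.
Each function takes an optional file argument,
which defaults to the corresponding file under
.IR /proc .
.PP
```shell
# enabled_controllers [FILE]
# Print the name of each controller whose "enabled" field (the fourth
# field) is 1 in FILE (default: /proc/cgroups).
enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# v2_cgroup_path [FILE]
# Print the cgroup pathname from the cgroups version 2 entry of FILE
# (default: /proc/self/cgroup), that is, the entry whose hierarchy ID
# is 0 and whose controller list is empty.
v2_cgroup_path() {
    awk '/^0::/ { print substr($0, 4) }' "${1:-/proc/self/cgroup}"
}
```
.PP
If the process is not a member of a cgroups version 2 hierarchy,
.I v2_cgroup_path
prints nothing.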
1580.\"
1581.SS /sys/kernel/cgroup files
1582.TP
1583.IR /sys/kernel/cgroup/delegate " (since Linux 4.15)"
1584.\" commit 01ee6cfb1483fe57c9cbd8e73817dfbf9bacffd3
1585This file exports a list of the cgroups v2 files
1586(one per line) that are delegatable
1587(i.e., whose ownership should be changed to the user ID of the delegatee).
1588In the future, the set of delegatable files may change or grow,
1589and this file provides a way for the kernel to inform
1590user-space applications of which files must be delegated.
1591As at Linux 4.15, one sees the following when inspecting this file:
1592.IP
1593.EX
1594.in +4n
1595$ \fBcat /sys/kernel/cgroup/delegate\fP
1596cgroup.procs
1597cgroup.subtree_control
1598.in
1599.EE
1600.TP
1601.IR /sys/kernel/cgroup/features " (since Linux 4.15)"
1602.\" commit 5f2e673405b742be64e7c3604ed4ed3ac14f35ce
1603Over time, the set of cgroups v2 features that are provided by the
1604kernel may change or grow,
1605or some features may not be enabled by default.
1606This file provides a way for user-space applications to discover what
fcf115f5 1607features the running kernel supports and has enabled.
1608Features are listed one per line:
1609.IP
.in +4n
.EX
$ \fBcat /sys/kernel/cgroup/features\fP
nsdelegate
.EE
.in
1616.IP
1617The entries that can appear in this file are:
1618.RS
1619.TP
1620.IR nsdelegate " (since Linux 4.15)"
1621The kernel supports the
1622.I nsdelegate
1623mount option.
1624.RE
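.PP
An application can test for a particular feature
by looking for an exact line match in this file.
The following shell sketch shows one way to do so;
the function name
.I cgroup_has_feature
is hypothetical, and the optional second argument
(an alternative file to read) exists only to ease testing.
.PP
```shell
# cgroup_has_feature NAME [FILE]
# Succeed if FILE (default: /sys/kernel/cgroup/features) contains a
# line that is exactly NAME.
cgroup_has_feature() {
    grep -qxF -- "$1" "${2:-/sys/kernel/cgroup/features}"
}
```
.PP
On kernels that lack the file,
.BR grep (1)
fails, and the function reports the feature as absent.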
bbfdf727 1640.SH SEE ALSO
ebbc83be 1641.BR prlimit (1),
f60a5da2 1642.BR systemd (1),
1643.BR systemd-cgls (1),
1644.BR systemd-cgtop (1),
325b7eb0 1645.BR clone (2),
1646.BR ioprio_set (2),
1647.BR perf_event_open (2),
1648.BR setrlimit (2),
cff6de30 1649.BR cgroup_namespaces (7),
69c47536 1650.BR cpuset (7),
1651.BR namespaces (7),
1652.BR sched (7),
1653.BR user_namespaces (7)