.\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com>
.\" and Copyright (C) 2016, 2017 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH CGROUPS 7 2018-02-02 "Linux" "Linux Programmer's Manual"
.SH NAME
cgroups \- Linux control groups
.SH DESCRIPTION
Control groups, usually referred to as cgroups,
are a Linux kernel feature which allows processes to
be organized into hierarchical groups whose usage of
various types of resources can then be limited and monitored.
The kernel's cgroup interface is provided through
a pseudo-filesystem called cgroupfs.
Grouping is implemented in the core cgroup kernel code,
while resource tracking and limits are implemented in
a set of per-resource-type subsystems (memory, CPU, and so on).
.\"
.SS Terminology
A
.I cgroup
is a collection of processes that are bound to a set of
limits or parameters defined via the cgroup filesystem.
.PP
A
.I subsystem
is a kernel component that modifies the behavior of
the processes in a cgroup.
Various subsystems have been implemented, making it possible to do things
such as limiting the amount of CPU time and memory available to a cgroup,
accounting for the CPU time used by a cgroup,
and freezing and resuming execution of the processes in a cgroup.
Subsystems are sometimes also known as
.IR "resource controllers"
(or simply, controllers).
.PP
The cgroups for a controller are arranged in a
.IR hierarchy .
This hierarchy is defined by creating, removing, and
renaming subdirectories within the cgroup filesystem.
At each level of the hierarchy, attributes (e.g., limits) can be defined.
The limits, control, and accounting provided by cgroups generally have
effect throughout the subhierarchy underneath the cgroup where the
attributes are defined.
Thus, for example, the limits placed on
a cgroup at a higher level in the hierarchy cannot be exceeded
by descendant cgroups.
.\"
.SS Cgroups version 1 and version 2
The initial release of the cgroups implementation was in Linux 2.6.24.
Over time, various cgroup controllers have been added
to allow the management of various types of resources.
However, the development of these controllers was largely uncoordinated,
with the result that many inconsistencies arose between controllers
and management of the cgroup hierarchies became rather complex.
(A longer description of these problems can be found in
the kernel source file
.IR Documentation/cgroup\-v2.txt .)
.PP
Because of the problems with the initial cgroups implementation
(cgroups version 1),
starting in Linux 3.10, work began on a new,
orthogonal implementation to remedy these problems.
Initially marked experimental, and hidden behind the
.I "\-o\ __DEVEL__sane_behavior"
mount option, the new version (cgroups version 2)
was eventually made official with the release of Linux 4.5.
Differences between the two versions are described in the text below.
.PP
Although cgroups v2 is intended as a replacement for cgroups v1,
the older system continues to exist
(and for compatibility reasons is unlikely to be removed).
Currently, cgroups v2 implements only a subset of the controllers
available in cgroups v1.
The two systems are implemented so that both v1 controllers and
v2 controllers can be mounted on the same system.
Thus, for example, it is possible to use those controllers
that are supported under version 2,
while also using version 1 controllers
where version 2 does not yet support those controllers.
The only restriction here is that a controller can't be simultaneously
employed in both a cgroups v1 hierarchy and in the cgroups v2 hierarchy.
.\"
.SH CGROUPS VERSION 1
Under cgroups v1, each controller may be mounted against a separate
cgroup filesystem that provides its own hierarchical organization of the
processes on the system.
It is also possible to comount multiple (or even all) cgroups v1 controllers
against the same cgroup filesystem, meaning that the comounted controllers
manage the same hierarchical organization of processes.
.PP
For each mounted hierarchy,
the directory tree mirrors the control group hierarchy.
Each control group is represented by a directory, with each of its child
control groups represented as a child directory.
For instance,
.IR /user/joe/1.session
represents control group
.IR 1.session ,
which is a child of cgroup
.IR joe ,
which is a child of
.IR /user .
Under each cgroup directory is a set of files which can be read or
written to, reflecting resource limits and a few general cgroup
properties.
.\"
.SS Tasks (threads) versus processes
In cgroups v1, a distinction is drawn between
.I processes
and
.IR tasks .
In this view, a process can consist of multiple tasks
(more commonly called threads, from a user-space perspective,
and called such in the remainder of this man page).
In cgroups v1, it is possible to independently manipulate
the cgroup memberships of the threads in a process.
.PP
The cgroups v1 ability to split threads across different cgroups
caused problems in some cases.
For example, it made no sense for the
.I memory
controller,
since all of the threads of a process share a single address space.
Because of these problems,
the ability to independently manipulate the cgroup memberships
of the threads in a process was removed in the initial cgroups v2
implementation, and subsequently restored in a more limited form
(see the discussion of "thread mode" below).
.\"
.SS Mounting v1 controllers
The use of cgroups requires a kernel built with the
.BR CONFIG_CGROUPS
option.
In addition, each of the v1 controllers has an associated
configuration option that must be set in order to employ that controller.
.PP
In order to use a v1 controller,
it must be mounted against a cgroup filesystem.
The usual place for such mounts is under a
.BR tmpfs (5)
filesystem mounted at
.IR /sys/fs/cgroup .
Thus, one might mount the
.I cpu
controller as follows:
.PP
.in +4n
.EX
mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu
.EE
.in
.PP
It is possible to comount multiple controllers against the same hierarchy.
For example, here the
.IR cpu
and
.IR cpuacct
controllers are comounted against a single hierarchy:
.PP
.in +4n
.EX
mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
.EE
.in
.PP
Comounting controllers has the effect that a process is in the same cgroup for
all of the comounted controllers.
Separately mounting controllers allows a process to
be in cgroup
.I /foo1
for one controller while being in
.I /foo2/foo3
for another.
.PP
It is possible to comount all v1 controllers against the same hierarchy:
.PP
.in +4n
.EX
mount \-t cgroup \-o all cgroup /sys/fs/cgroup
.EE
.in
.PP
(One can achieve the same result by omitting
.IR "\-o all" ,
since it is the default if no controllers are explicitly specified.)
.PP
It is not possible to mount the same controller
against multiple cgroup hierarchies.
For example, it is not possible to mount both the
.I cpu
and
.I cpuacct
controllers against one hierarchy, and to mount the
.I cpu
controller alone against another hierarchy.
It is possible to create multiple mount points with exactly
the same set of comounted controllers.
However, in this case all that results is multiple mount points
providing a view of the same hierarchy.
.PP
Note that on many systems, the v1 controllers are automatically mounted under
.IR /sys/fs/cgroup ;
in particular,
.BR systemd (1)
automatically creates such mount points.
.\"
.SS Unmounting v1 controllers
A mounted cgroup filesystem can be unmounted using the
.BR umount (8)
command, as in the following example:
.PP
.in +4n
.EX
umount /sys/fs/cgroup/pids
.EE
.in
.PP
.IR "But note well" :
a cgroup filesystem is unmounted only if it is not busy,
that is, it has no child cgroups.
If this is not the case, then the only effect of the
.BR umount (8)
is to make the mount invisible.
Thus, to ensure that the mount point is really removed,
one must first remove all child cgroups,
which in turn can be done only after all member processes
have been moved from those cgroups to the root cgroup.
.\"
.SS Cgroups version 1 controllers
Each of the cgroups version 1 controllers is governed
by a kernel configuration option (listed below).
Additionally, the availability of the cgroups feature is governed by the
.BR CONFIG_CGROUPS
kernel configuration option.
.TP
.IR cpu " (since Linux 2.6.24; " \fBCONFIG_CGROUP_SCHED\fP )
Cgroups can be guaranteed a minimum number of "CPU shares"
when a system is busy.
This does not limit a cgroup's CPU usage if the CPUs are not busy.
For further information, see
.IR Documentation/scheduler/sched-design-CFS.txt .
.IP
In Linux 3.2,
this controller was extended to provide CPU "bandwidth" control.
If the kernel is configured with
.BR CONFIG_CFS_BANDWIDTH ,
then within each scheduling period
(defined via a file in the cgroup directory), it is possible to define
an upper limit on the CPU time allocated to the processes in a cgroup.
This upper limit applies even if there is no other competition for the CPU.
Further information can be found in the kernel source file
.IR Documentation/scheduler/sched\-bwc.txt .
.TP
.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
This provides accounting for CPU usage by groups of processes.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup\-v1/cpuacct.txt .
.TP
.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
This controller can be used to bind the processes in a cgroup to
a specified set of CPUs and NUMA nodes.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup\-v1/cpusets.txt .
.TP
.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
The memory controller supports reporting and limiting of process memory, kernel
memory, and swap used by cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup\-v1/memory.txt .
.TP
.IR devices " (since Linux 2.6.26; " \fBCONFIG_CGROUP_DEVICE\fP )
This supports controlling which processes may create (mknod) devices as
well as open them for reading or writing.
The policies may be specified as whitelists and blacklists.
Hierarchy is enforced, so new rules must not
violate existing rules for the target or ancestor cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/devices.txt .
.TP
.IR freezer " (since Linux 2.6.28; " \fBCONFIG_CGROUP_FREEZER\fP )
The
.IR freezer
controller can suspend and restore (resume) all processes in a cgroup.
Freezing a cgroup
.I /A
also causes its children, for example, processes in
.IR /A/B ,
to be frozen.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/freezer-subsystem.txt .
.TP
.IR net_cls " (since Linux 2.6.29; " \fBCONFIG_CGROUP_NET_CLASSID\fP )
This places a classid, specified for the cgroup, on network packets
created by a cgroup.
These classids can then be used in firewall rules,
as well as used to shape traffic using
.BR tc (8).
This applies only to packets
leaving the cgroup, not to traffic arriving at the cgroup.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/net_cls.txt .
.TP
.IR blkio " (since Linux 2.6.33; " \fBCONFIG_BLK_CGROUP\fP )
The
.I blkio
controller controls and limits access to specified block devices by
applying I/O control in the form of throttling and upper limits against leaf
nodes and intermediate nodes in the storage hierarchy.
.IP
Two policies are available.
The first is a proportional-weight time-based division
of disk implemented with CFQ.
This is in effect for leaf nodes using CFQ.
The second is a throttling policy which specifies
upper I/O rate limits on a device.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/blkio-controller.txt .
.TP
.IR perf_event " (since Linux 2.6.39; " \fBCONFIG_CGROUP_PERF\fP )
This controller allows
.I perf
monitoring of the set of processes grouped in a cgroup.
.IP
Further information can be found in the kernel source file
.IR tools/perf/Documentation/perf-record.txt .
.TP
.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
This allows priorities to be specified, per network interface, for cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/net_prio.txt .
.TP
.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
This supports limiting the use of huge pages by cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/hugetlb.txt .
.TP
.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
This controller permits limiting the number of processes that may be created
in a cgroup (and its descendants).
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/pids.txt .
.TP
.IR rdma " (since Linux 4.11; " \fBCONFIG_CGROUP_RDMA\fP )
The RDMA controller permits limiting the use of
RDMA/IB-specific resources per cgroup.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/rdma.txt .
.\"
.SS Creating cgroups and moving processes
A cgroup filesystem initially contains a single root cgroup, '/',
which all processes belong to.
A new cgroup is created by creating a directory in the cgroup filesystem:
.PP
.in +4n
.EX
mkdir /sys/fs/cgroup/cpu/cg1
.EE
.in
.PP
This creates a new empty cgroup.
.PP
A process may be moved to this cgroup by writing its PID into the cgroup's
.I cgroup.procs
file:
.PP
.in +4n
.EX
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
.EE
.in
.PP
Only one PID at a time should be written to this file.
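.PP
The one-PID-per-write rule can be wrapped in a small shell function.
The following is a minimal sketch, not part of any cgroup tooling;
the name
.I move_pids_to_cgroup
is hypothetical,
and the first argument is assumed to be the pathname of a cgroup directory:

```shell
# move_pids_to_cgroup DIR PID...   (hypothetical helper)
# Write each PID to DIR/cgroup.procs in a separate write,
# since only one PID at a time should be written to that file.
move_pids_to_cgroup() {
    dir=$1
    shift
    for pid in "$@"; do
        # Stop at the first failed write (for example, no permission).
        echo "$pid" > "$dir/cgroup.procs" || return 1
    done
}
```

For instance, the command "move_pids_to_cgroup /sys/fs/cgroup/cpu/cg1 $$"
would have the same effect as the echo command shown above.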
.PP
Writing the value 0 to a
.IR cgroup.procs
file causes the writing process to be moved to the corresponding cgroup.
.PP
When writing a PID into the
.I cgroup.procs
file,
all threads in the process are moved into the new cgroup at once.
.PP
Within a hierarchy, a process can be a member of exactly one cgroup.
Writing a process's PID to a
.IR cgroup.procs
file automatically removes it from the cgroup of
which it was previously a member.
.PP
The
.I cgroup.procs
file can be read to obtain a list of the processes that are
members of a cgroup.
The returned list of PIDs is not guaranteed to be in order.
Nor is it guaranteed to be free of duplicates.
(For example, a PID may be recycled while reading from the list.)
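.PP
Because the list read from
.I cgroup.procs
may be unordered and may contain duplicates,
a reader may want to normalize it.
A minimal sketch, assuming a hypothetical helper named
.I list_cgroup_pids
that takes the pathname of a cgroup directory as its argument:

```shell
# list_cgroup_pids DIR   (hypothetical helper)
# Print the PIDs found in DIR/cgroup.procs in ascending numeric order,
# with duplicate entries removed.
list_cgroup_pids() {
    sort -n -u "$1/cgroup.procs"
}
```

Sorting does not make the result a consistent snapshot:
processes may still enter or leave the cgroup while the file is being read.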
.PP
In cgroups v1, an individual thread can be moved to
another cgroup by writing its thread ID
(i.e., the kernel thread ID returned by
.BR clone (2)
and
.BR gettid (2))
to the
.IR tasks
file in a cgroup directory.
This file can be read to discover the set of threads
that are members of the cgroup.
.\"
.SS Removing cgroups
To remove a cgroup,
it must first have no child cgroups and contain no (nonzombie) processes.
So long as that is the case, one can simply
remove the corresponding directory pathname.
Note that files in a cgroup directory cannot and need not be
removed.
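.PP
The two preconditions can be checked before calling
.BR rmdir (1).
The following is a minimal sketch;
the helper name
.I remove_cgroup
is hypothetical, and the checks are only advisory,
since the state of the cgroup can change between the check and the removal:

```shell
# remove_cgroup DIR   (hypothetical helper)
# Remove the cgroup directory DIR, after checking that it has
# no child cgroups and no member processes.
remove_cgroup() {
    dir=$1
    # Child cgroups appear as subdirectories of the cgroup directory.
    for child in "$dir"/*/; do
        if [ -d "$child" ]; then
            echo "remove_cgroup: $dir still has child cgroups" >&2
            return 1
        fi
    done
    # Any line in cgroup.procs is the PID of a member process.
    if grep -q . "$dir/cgroup.procs" 2>/dev/null; then
        echo "remove_cgroup: $dir still has member processes" >&2
        return 1
    fi
    # The files in the directory need not (and cannot) be removed first.
    rmdir "$dir"
}
```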
.\"
.SS Cgroups v1 release notification
Two files can be used to determine whether the kernel provides
notifications when a cgroup becomes empty.
A cgroup is considered to be empty when it contains no child
cgroups and no member processes.
.PP
A special file in the root directory of each cgroup hierarchy,
.IR release_agent ,
can be used to register the pathname of a program that may be invoked when
a cgroup in the hierarchy becomes empty.
The pathname of the newly empty cgroup (relative to the cgroup mount point)
is provided as the sole command-line argument when the
.IR release_agent
program is invoked.
The
.IR release_agent
program might remove the cgroup directory,
or perhaps repopulate it with a process.
.PP
The default value of the
.IR release_agent
file is empty, meaning that no release agent is invoked.
.PP
The content of the
.I release_agent
file can also be specified via a mount option when the
cgroup filesystem is mounted:
.PP
.in +4n
.EX
mount -o release_agent=pathname ...
.EE
.in
.PP
Whether or not the
.IR release_agent
program is invoked when a particular cgroup becomes empty is determined
by the value in the
.IR notify_on_release
file in the corresponding cgroup directory.
If this file contains the value 0, then the
.IR release_agent
program is not invoked.
If it contains the value 1, the
.IR release_agent
program is invoked.
The default value for this file in the root cgroup is 0.
At the time when a new cgroup is created,
the value in this file is inherited from the corresponding file
in the parent cgroup.
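.PP
The two files can be set together.
The following is a minimal sketch under stated assumptions:
the helper name
.I enable_release_agent
is hypothetical,
its first argument is the mount point of a v1 hierarchy,
its second is the pathname of an agent program supplied by the caller,
and its third is a cgroup pathname relative to the mount point:

```shell
# enable_release_agent MOUNTPOINT AGENT CGROUP   (hypothetical helper)
# Register the program AGENT in the release_agent file at the root of
# the hierarchy, then turn on notification for the cgroup CGROUP.
enable_release_agent() {
    echo "$2" > "$1/release_agent" &&
    echo 1 > "$1/$3/notify_on_release"
}
```

Cgroups created under CGROUP afterward inherit the value 1,
as described above.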
.\"
.SS Cgroup v1 named hierarchies
In cgroups v1,
it is possible to mount a cgroup hierarchy that has no attached controllers:
.PP
.in +4n
.EX
mount -t cgroup -o none,name=somename none /some/mount/point
.EE
.in
.PP
Multiple instances of such hierarchies can be mounted;
each hierarchy must have a unique name.
The only purpose of such hierarchies is to track processes.
(See the discussion of release notification above.)
An example of this is the
.I name=systemd
cgroup hierarchy that is used by
.BR systemd (1)
to track services and user sessions.
.\"
.SH CGROUPS VERSION 2
In cgroups v2,
all mounted controllers reside in a single unified hierarchy.
While (different) controllers may be simultaneously
mounted under the v1 and v2 hierarchies,
it is not possible to mount the same controller simultaneously
under both the v1 and the v2 hierarchies.
.PP
The new behaviors in cgroups v2 are summarized here,
and in some cases elaborated in the following subsections.
.IP 1. 3
Cgroups v2 provides a unified hierarchy against
which all controllers are mounted.
.IP 2.
"Internal" processes are not permitted.
With the exception of the root cgroup, processes may reside
only in leaf nodes (cgroups that do not themselves contain child cgroups).
The details are somewhat more subtle than this, and are described below.
.IP 3.
Active controllers must be specified via the files
.IR cgroup.controllers
and
.IR cgroup.subtree_control .
.IP 4.
The
.I tasks
file has been removed.
In addition, the
.I cgroup.clone_children
file that is employed by the
.I cpuset
controller has been removed.
.IP 5.
An improved mechanism for notification of empty cgroups is provided by the
.IR cgroup.events
file.
.PP
For more changes, see the
.I Documentation/cgroup-v2.txt
file in the kernel source.
.PP
Some of the new behaviors listed above saw subsequent modification with
the addition in Linux 4.14 of "thread mode" (described below).
.\"
.SS Cgroups v2 unified hierarchy
In cgroups v1, the ability to mount different controllers
against different hierarchies was intended to allow great flexibility
for application design.
In practice, though, the flexibility turned out to be less useful than expected,
and in many cases added complexity.
Therefore, in cgroups v2,
all available controllers are mounted against a single hierarchy.
The available controllers are automatically mounted,
meaning that it is not necessary (or possible) to specify the controllers
when mounting the cgroup v2 filesystem using a command such as the following:
.PP
.in +4n
.EX
mount -t cgroup2 none /mnt/cgroup2
.EE
.in
.PP
A cgroup v2 controller is available only if it is not currently in use
via a mount against a cgroup v1 hierarchy.
Or, to put things another way, it is not possible to employ
the same controller against both a v1 hierarchy and the unified v2 hierarchy.
This means that it may be necessary first to unmount a v1 controller
(as described above) before that controller is available in v2.
Since
.BR systemd (1)
makes heavy use of some v1 controllers by default,
it can in some cases be simpler to boot the system with
selected v1 controllers disabled.
To do this, specify the
.IR cgroup_no_v1=list
option on the kernel boot command line;
.I list
is a comma-separated list of the names of the controllers to disable,
or the word
.I all
to disable all v1 controllers.
(This situation is correctly handled by
.BR systemd (1),
which falls back to operating without the specified controllers.)
.PP
Note that on many modern systems,
.BR systemd (1)
automatically mounts the
.I cgroup2
filesystem at
.I /sys/fs/cgroup/unified
during the boot process.
.\"
.SS Cgroups v2 controllers
The following controllers, documented in the kernel source file
.IR Documentation/cgroup-v2.txt ,
are supported in cgroups version 2:
.TP
.IR io " (since Linux 4.5)"
This is the successor of the version 1
.I blkio
controller.
.TP
.IR memory " (since Linux 4.5)"
This is the successor of the version 1
.I memory
controller.
.TP
.IR pids " (since Linux 4.5)"
This is the same as the version 1
.I pids
controller.
.TP
.IR perf_event " (since Linux 4.11)"
This is the same as the version 1
.I perf_event
controller.
.TP
.IR rdma " (since Linux 4.11)"
This is the same as the version 1
.I rdma
controller.
.TP
.IR cpu " (since Linux 4.15)"
This is the successor to the version 1
.I cpu
and
.I cpuacct
controllers.
.\"
.SS Cgroups v2 subtree control
Each cgroup in the v2 hierarchy contains the following two files:
.TP
.IR cgroup.controllers
This read-only file exposes a list of the controllers that are
.I available
in this cgroup.
The contents of this file match the contents of the
.I cgroup.subtree_control
file in the parent cgroup.
.TP
.I cgroup.subtree_control
This is a list of controllers that are
.IR active
.RI ( enabled )
in the cgroup.
The set of controllers in this file is a subset of the set in the
.IR cgroup.controllers
file of this cgroup.
The set of active controllers is modified by writing strings to this file
containing space-delimited controller names,
each preceded by '+' (to enable a controller)
or '\-' (to disable a controller), as in the following example:
.IP
.in +4n
.EX
echo '+pids -memory' > x/y/cgroup.subtree_control
.EE
.in
.IP
An attempt to enable a controller
that is not present in
.I cgroup.controllers
leads to an
.B ENOENT
error when writing to the
.I cgroup.subtree_control
file.
.PP
Because the list of controllers in
.I cgroup.subtree_control
is a subset of those in
.IR cgroup.controllers ,
a controller that has been disabled in one cgroup in the hierarchy
can never be re-enabled in the subtree below that cgroup.
.PP
A cgroup's
.I cgroup.subtree_control
file determines the set of controllers that are exercised in the
.I child
cgroups.
When a controller (e.g.,
.IR pids )
is present in the
.I cgroup.subtree_control
file of a parent cgroup,
then the corresponding controller-interface files (e.g.,
.IR pids.max )
are automatically created in the children of that cgroup
and can be used to exert resource control in the child cgroups.
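.PP
The ENOENT error described above can be anticipated by consulting
.I cgroup.controllers
before writing to
.IR cgroup.subtree_control .
The following is a minimal sketch;
the helper name
.I enable_controllers
is hypothetical,
and the sketch assumes that
.I cgroup.controllers
holds a single line of space-separated controller names:

```shell
# enable_controllers DIR CONTROLLER...   (hypothetical helper)
# Enable each named controller for the children of the cgroup DIR,
# after checking that it is listed in DIR/cgroup.controllers.
enable_controllers() {
    dir=$1
    shift
    # Pad with spaces so that whole names can be matched.
    available=" $(cat "$dir/cgroup.controllers") "
    request=
    for c in "$@"; do
        case $available in
            *" $c "*) request="$request +$c" ;;
            *) echo "enable_controllers: $c is not available in $dir" >&2
               return 1 ;;
        esac
    done
    # A single write of space-delimited "+name" entries.
    echo "${request# }" > "$dir/cgroup.subtree_control"
}
```

For example, "enable_controllers x/y pids memory" would write
the string "+pids +memory" to the file.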
.\"
.SS Cgroups v2 """no internal processes""" rule
Cgroups v2 enforces a so-called "no internal processes" rule.
Roughly speaking, this rule means that,
with the exception of the root cgroup, processes may reside
only in leaf nodes (cgroups that do not themselves contain child cgroups).
This avoids the need to decide how to partition resources between
processes which are members of cgroup A and processes in child cgroups of A.
.PP
For instance, if cgroup
.I /cg1/cg2
exists, then a process may reside in
.IR /cg1/cg2 ,
but not in
.IR /cg1 .
This is to avoid an ambiguity in cgroups v1
with respect to the delegation of resources between processes in
.I /cg1
and its child cgroups.
The recommended approach in cgroups v2 is to create a subdirectory called
.I leaf
for any nonleaf cgroup which should contain processes, but no child cgroups.
Thus, processes which previously would have gone into
.I /cg1
would now go into
.IR /cg1/leaf .
This has the advantage of making explicit
the relationship between processes in
.I /cg1/leaf
and
.IR /cg1 's
other children.
.PP
The "no internal processes" rule is in fact more subtle than stated above.
More precisely, the rule is that a (nonroot) cgroup can't both
(1) have member processes, and
(2) distribute resources into child cgroups\(emthat is, have a nonempty
.I cgroup.subtree_control
file.
Thus, it
.I is
possible for a cgroup to have both member processes and child cgroups,
but before controllers can be enabled for that cgroup,
the member processes must be moved out of the cgroup
(e.g., perhaps into the child cgroups).
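.PP
The precise form of the rule suggests a simple check
before enabling controllers in a nonroot cgroup.
The following is a minimal sketch that ignores thread mode;
the helper name
.I can_enable_controllers
is hypothetical:

```shell
# can_enable_controllers DIR   (hypothetical helper, nonroot cgroups)
# Succeed if the cgroup DIR has no member processes, so that
# writing to DIR/cgroup.subtree_control would not conflict with
# the "no internal processes" rule.
can_enable_controllers() {
    # Return 2 if the membership list cannot be read at all.
    [ -r "$1/cgroup.procs" ] || return 2
    # Any nonempty line is the PID of a member process.
    if grep -q . "$1/cgroup.procs"; then
        return 1
    fi
    return 0
}
```

The result is only advisory, since a process may join the cgroup
between the check and the subsequent write.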
.PP
With the Linux 4.14 addition of "thread mode" (described below),
the "no internal processes" rule has been relaxed in some cases.
.\"
.SS Cgroups v2 cgroup.events file
With cgroups v2, a new mechanism is provided to obtain notification
about when a cgroup becomes empty.
The cgroups v1
.IR release_agent
and
.IR notify_on_release
files are removed, and replaced by a new, more general-purpose file,
.IR cgroup.events .
This read-only file contains key-value pairs
(delimited by newline characters, with the key and value separated by spaces)
that identify events or state for a cgroup.
Currently, only one key appears in this file,
.IR populated ,
which has either the value 0,
meaning that the cgroup (and its descendants)
contain no (nonzombie) processes,
or 1, meaning that the cgroup (or one of its descendants)
contains member processes.
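.PP
The key-value layout makes the file easy to query from the shell.
A minimal sketch, assuming a hypothetical helper named
.I cgroup_populated
that takes the pathname of a cgroup directory as its argument:

```shell
# cgroup_populated DIR   (hypothetical helper)
# Print the value of the "populated" key (0 or 1) found in
# DIR/cgroup.events; lines with other keys are ignored.
cgroup_populated() {
    awk '$1 == "populated" { print $2 }' "$1/cgroup.events"
}
```

Matching on the key name, rather than on the line position,
keeps the helper working if further keys are added to the file.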
.PP
The
.IR cgroup.events
file can be monitored, in order to receive notification when a cgroup
transitions between the populated and unpopulated states.
When monitoring this file using
.BR inotify (7),
transitions generate
.BR IN_MODIFY
events, and when monitoring the file using
.BR poll (2),
transitions cause the bits
.B POLLPRI
and
.B POLLERR
to be returned in the
.IR revents
field.
.PP
The cgroups v2 release-notification mechanism provided by the
.I populated
field of the
.I cgroup.events
file offers at least two advantages over the cgroups v1
.IR release_agent
mechanism.
First, it allows for cheaper notification,
since a single process can monitor multiple
.IR cgroup.events
files.
By contrast, the cgroups v1 mechanism requires the creation
of a process for each notification.
Second, notification can be delegated to a process that lives inside
a container associated with the newly empty cgroup.
.\"
5e071499
MK
812.SS Cgroups v2 cgroup.stat file
813.\" commit ec39225cca42c05ac36853d11d28f877fde5c42e
814Each cgroup in the v2 hierarchy contains a read-only
815.IR cgroup.stat
816file (first introduced in Linux 4.14)
817that consists of lines containing key-value pairs.
818The following keys currently appear in this file:
819.TP
820.I nr_descendants
821This is the total number of visible (i.e., living) descendant cgroups
822underneath this cgroup.
823.TP
824.I nr_dying_descendants
825This is the total number of dying descendant cgroups
826underneath this cgroup.
827A cgroup enters the dying state after being deleted.
828It remains in that state for an undefined period
829(which will depend on system load)
830while resources are freed before the cgroup is destroyed.
831Note that the presence of some cgroups in the dying state is normal,
832and is not indicative of any problem.
833.IP
834A process can't be made a member of a dying cgroup,
835and a dying cgroup can't be brought back to life.
836.\"
837.SS Limiting the number of descendant cgroups
838Each cgroup in the v2 hierarchy contains the following files,
839which can be used to view and set limits on the number
840of descendant cgroups under that cgroup:
841.TP
842.IR cgroup.max.depth " (since Linux 4.14)"
843.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
844This file defines a limit on the depth of nesting of descendant cgroups.
845A value of 0 in this file means that no descendant cgroups can be created.
846An attempt to create a descendant whose nesting level exceeds
847the limit fails
848.RI ( mkdir (2)
849fails with the error
850.BR EAGAIN ).
851.IP
852Writing the string
853.IR """max"""
854to this file means that no limit is imposed.
855The default value in this file is
856.IR """max""" .
857.TP
858.IR cgroup.max.descendants " (since Linux 4.14)"
859.\" commit 1a926e0bbab83bae8207d05a533173425e0496d1
860This file defines a limit on the number of live descendant cgroups that
861this cgroup may have.
862An attempt to create more descendants than allowed by the limit fails
863.RI ( mkdir (2)
864fails with the error
865.BR EAGAIN ).
866.IP
867Writing the string
868.IR """max"""
869to this file means that no limit is imposed.
870The default value in this file is
871.IR """max""" .
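.PP
The following Python sketch shows one way a program might read these limits
and react to the
.B EAGAIN
error when creating a child cgroup.
The helper names are hypothetical;
only the file format and the error code come from the description above.

```python
import errno
import os


def read_limit(path):
    """Return the limit in a cgroup.max.* file, or None if it is "max"."""
    with open(path) as f:
        value = f.read().strip()
    return None if value == "max" else int(value)


def try_create_child(parent, name):
    """Create a child cgroup directory.

    Returns False if mkdir(2) failed with EAGAIN (a depth or descendant
    limit was hit); any other error is re-raised.
    """
    try:
        os.mkdir(os.path.join(parent, name))
        return True
    except OSError as e:
        if e.errno == errno.EAGAIN:
            return False
        raise
```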
872.\"
148e0800 873.SS Cgroups v2 delegation: delegation to a less privileged user
874In the context of cgroups,
875delegation means passing management of some subtree
876of the cgroup hierarchy to a nonprivileged process.
877Cgroups v1 provides support for delegation that was
878accidental and not fully secure.
879Cgroups v2 supports delegation by explicit design.
880.PP
881Some terminology is required in order to describe delegation.
882A
883.I delegater
884is a privileged user (i.e., root) who owns a parent cgroup.
885A
886.I delegatee
887is a nonprivileged user who will be granted the permissions needed
888to manage some subhierarchy under that parent cgroup,
889known as the
890.IR "delegated subtree" .
891.PP
892To perform delegation,
893the delegater makes certain directories and files writable by the delegatee,
894typically by changing the ownership of the objects to be the user ID
895of the delegatee.
896Assuming that we want to delegate the hierarchy rooted at (say)
897.I /dlgt_grp
898and that there are not yet any child cgroups under that cgroup,
899the ownership of the following is changed to the user ID of the delegatee:
900.TP
0735069b 901.IR /dlgt_grp
902Changing the ownership of the root of the subtree means that any new
903cgroups created under the subtree (and the files they contain)
904will also be owned by the delegatee.
905.TP
0735069b 906.IR /dlgt_grp/cgroup.procs
f7286edc 907Changing the ownership of this file means that the delegatee
908can move processes into the root of the delegated subtree.
909.TP
0735069b 910.IR /dlgt_grp/cgroup.subtree_control
Changing the ownership of this file means that the delegatee
912can enable controllers (that are present in
0735069b 913.IR /dlgt_grp/cgroup.controllers )
4242dfbe 914in order to further redistribute resources at lower levels in the subtree.
915(As an alternative to changing the ownership of this file,
916the delegater might instead add selected controllers to this file.)
917.TP
918.IR /dlgt_grp/cgroup.threads
919Changing the ownership of this file is necessary if a threaded subtree
920is being delegated (see the description of "thread mode", below).
7b327dd5 921This permits the delegatee to write thread IDs to the file.
922(The ownership of this file can also be changed when delegating
923a domain subtree, but currently this serves no purpose,
924since, as described below, it is not possible to move a thread between
925domain cgroups by writing its thread ID to the
.IR cgroup.threads
927file.)
928.PP
929The delegater should
930.I not
change the ownership of any of the controller interface files (e.g.,
932.IR pids.max ,
933.IR memory.high )
934in
.IR /dlgt_grp .
936Those files are used from the next level above the delegated subtree
937in order to distribute resources into the subtree,
938and the delegatee should not have permission to change
939the resources that are distributed into the delegated subtree.
940.PP
941See also the discussion of the
942.IR /sys/kernel/cgroup/delegate
943file in NOTES.
944.PP
945After the aforementioned steps have been performed,
946the delegatee can create child cgroups within the delegated subtree
947(the cgroup subdirectories and the files they contain
948will be owned by the delegatee)
949and move processes between cgroups in the subtree.
950If some controllers are present in
.IR /dlgt_grp/cgroup.subtree_control ,
4242dfbe 952or the ownership of that file was passed to the delegatee,
f7286edc 953the delegatee can also control the further redistribution
4242dfbe 954of the corresponding resources into the delegated subtree.
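.PP
The ownership changes described above could be scripted roughly as follows.
This is a minimal Python sketch, not a complete delegation tool:
the function name and the fixed file list are illustrative assumptions,
and controller interface files are deliberately left untouched.

```python
import os

# Files named in the text above; cgroup.threads matters only when a
# threaded subtree is being delegated.
DELEGATED_FILES = ("cgroup.procs", "cgroup.subtree_control", "cgroup.threads")


def delegate(cgroup_dir, uid, gid=-1):
    """Give 'uid' ownership of cgroup_dir and its delegatable files.

    A gid of -1 leaves the group ownership unchanged.
    Returns the list of paths whose ownership was changed.
    """
    candidates = [cgroup_dir]
    candidates += [os.path.join(cgroup_dir, name) for name in DELEGATED_FILES]
    changed = []
    for path in candidates:
        if os.path.exists(path):
            os.chown(path, uid, gid)
            changed.append(path)
    return changed
```

On a real system this would be run by the delegater, for example as
delegate("/sys/fs/cgroup/unified/dlgt_grp", 1000),
where both the mount point and the user ID are placeholders.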
27b086e9 955.\"
ed3f4f34 956.SS Cgroups v2 delegation: nsdelegate and cgroup namespaces
957Starting with Linux 4.13,
958.\" commit 5136f6365ce3eace5a926e10f16ed2a233db5ba9
959there is a second way to perform cgroup delegation.
07361828 960This is done by mounting or remounting the cgroup v2 filesystem with the
ed3f4f34 961.I nsdelegate
962mount option.
963For example, if the cgroup v2 filesystem has already been mounted,
964we can remount it with the
965.I nsdelegate
966option as follows:
967.PP
.in +4n
.EX
mount -t cgroup2 -o remount,nsdelegate \\
                 none /sys/fs/cgroup/unified
.EE
.in
974.\"
.\" Alternatively, we could boot the kernel with the options:
976.\"
977.\" cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
978.\"
979.\" The effect of the latter option is to prevent systemd from employing
980.\" its "hybrid" cgroup mode, where it tries to make use of cgroups v2.
ed3f4f34 981.PP
dc581e07 982The effect of this mount option is to cause cgroup namespaces
983to automatically become delegation boundaries.
984More specifically,
985the following restrictions apply for processes inside the cgroup namespace:
986.IP * 3
446d1643 987Writes to controller interface files in the root directory of the namespace
988will fail with the error
989.BR EPERM .
990Processes inside the cgroup namespace can still write to delegatable
446d1643 991files in the root directory of the cgroup namespace such as
992.IR cgroup.procs
993and
994.IR cgroup.subtree_control ,
and can create a subhierarchy underneath the root directory.
996.IP *
997Attempts to migrate processes across the namespace boundary are denied
998(with the error
999.BR ENOENT ).
1000Processes inside the cgroup namespace can still
1001(subject to the containment rules described below)
1002move processes between cgroups
1003.I within
1004the subhierarchy under the namespace root.
1005.PP
1006The ability to define cgroup namespaces as delegation boundaries
1007makes cgroup namespaces more useful.
1008To understand why, suppose that we already have one cgroup hierarchy
1009that has been delegated to a nonprivileged user,
1010.IR cecilia ,
1011using the older delegation technique described above.
1012Suppose further that
1013.I cecilia
1014wanted to further delegate a subhierarchy
1015under the existing delegated hierarchy.
1016(For example, the delegated hierarchy might be associated with
1017an unprivileged container run by
1018.IR cecilia .)
1019Even if a cgroup namespace was employed,
1020because both hierarchies are owned by the unprivileged user
1021.IR cecilia ,
1022the following illegitimate actions could be performed:
1023.IP * 3
1024A process in the inferior hierarchy could change the
resource controller settings in the root directory of that hierarchy.
1026(These resource controller settings are intended to allow control to
1027be exercised from the
1028.I parent
1029cgroup;
1030a process inside the child cgroup should not be allowed to modify them.)
1031.IP *
1032A process inside the inferior hierarchy could move processes
1033into and out of the inferior hierarchy if the cgroups in the
1034superior hierarchy were somehow visible.
1035.PP
1036Employing the
1037.I nsdelegate
1038mount option prevents both of these possibilities.
1039.PP
1040The
1041.I nsdelegate
1042mount option only has an effect when performed in
1043the initial mount namespace;
1044in other mount namespaces, the option is silently ignored.
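.PP
To check whether a mounted cgroup v2 filesystem is using this option,
a program might inspect the mount options listed in
.IR /proc/mounts .
The Python sketch below assumes that the option appears there under the name
.I nsdelegate
and uses hypothetical function names.

```python
def cgroup2_mounts(mounts_text):
    """Yield (mount_point, options) for each cgroup2 line in /proc/mounts format."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup2":
            yield fields[1], fields[3].split(",")


def has_nsdelegate(mounts_text):
    """Return True if any cgroup2 mount lists the nsdelegate option."""
    return any("nsdelegate" in opts for _, opts in cgroup2_mounts(mounts_text))
```

A caller would typically pass in the result of reading
.IR /proc/mounts .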
1045.PP
1046.IR Note :
1047On some systems,
1048.BR systemd (1)
1049automatically mounts the cgroup v2 filesystem.
1050In order to experiment with the
1051.I nsdelegate
2cd9bbfa 1052operation, it may be desirable to
ed3f4f34 1053.\"
27b086e9 1054.SS Cgroup v2 delegation containment rules
1055Some delegation
1056.IR "containment rules"
1057ensure that the delegatee can move processes between cgroups within the
1058delegated subtree,
1059but can't move processes from outside the delegated subtree into
1060the subtree or vice versa.
1061A nonprivileged process (i.e., the delegatee) can write the PID of
1062a "target" process into a
1063.IR cgroup.procs
1064file only if all of the following are true:
1065.IP * 3
1066The writer has write permission on the
1067.I cgroup.procs
1068file in the destination cgroup.
1069.IP *
1070The writer has write permission on the
1071.I cgroup.procs
396761ee 1072file in the nearest common ancestor of the source and destination cgroups.
4242dfbe 1073(In some cases,
396761ee 1074the nearest common ancestor may be the source or destination cgroup itself.)
28f612ea 1075.IP *
1076If the cgroup v2 filesystem was mounted with the
1077.I nsdelegate
7b574df5 1078option, the writer must be able to see the source and destination cgroups
1079from its cgroup namespace.
1080.IP *
1081Before Linux 4.11:
1082.\" commit 576dd464505fc53d501bb94569db76f220104d28
1083the effective UID of the writer (i.e., the delegatee) matches the
1084real user ID or the saved set-user-ID of the target process.
1085(This was a historical requirement inherited from cgroups v1
1086that was later deemed unnecessary,
1087since the other rules suffice for containment in cgroups v2.)
1088.PP
1089.IR Note :
1090one consequence of these delegation containment rules is that the
1091unprivileged delegatee can't place the first process into
1092the delegated subtree;
1093instead, the delegater must place the first process
1094(a process owned by the delegatee) into the delegated subtree.
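.PP
The first two containment rules can be approximated in user space,
for example to predict whether a migration is likely to succeed.
The Python sketch below is only an approximation under stated assumptions:
the function names are hypothetical,
.BR access (2)
checks against the real (not effective) user ID,
and the namespace-visibility rule is not modeled at all.

```python
import os


def nearest_common_ancestor(src, dst):
    """Return the deepest directory that contains both cgroup paths."""
    return os.path.commonpath([src, dst])


def may_migrate(src, dst):
    """Check write permission on cgroup.procs in the destination cgroup
    and in the nearest common ancestor of source and destination."""
    ancestor = nearest_common_ancestor(src, dst)
    return all(
        os.access(os.path.join(d, "cgroup.procs"), os.W_OK)
        for d in (dst, ancestor)
    )
```

The kernel remains the final arbiter;
a write that this check predicts will succeed can still fail.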
4242dfbe 1095.\"
75e83bc2 1096.SH CGROUPS VERSION 2 THREAD MODE
1097Among the restrictions imposed by cgroups v2 that were not present
1098in cgroups v1 are the following:
1099.IP * 3
1100.IR "No thread-granularity control" :
1101all of the threads of a process must be in the same cgroup.
1102.IP *
1103.IR "No internal processes" :
1104a cgroup can't both have member processes and
1105exercise controllers on child cgroups.
1106.PP
1107Both of these restrictions were added because
1108the lack of these restrictions had caused problems
1109in cgroups v1.
1110In particular, the cgroups v1 ability to allow thread-level granularity
1111for cgroup membership made no sense for some controllers.
1112(A notable example was the
1113.I memory
1114controller: since threads share an address space,
1115it made no sense to split threads across different
1116.I memory
1117cgroups.)
1118.PP
1119Notwithstanding the initial design decision in cgroups v2,
1120there were use cases for certain controllers, notably the
1121.IR cpu
1122controller,
1123for which thread-level granularity of control was meaningful and useful.
1124To accommodate such use cases, Linux 4.14 added
1125.I "thread mode"
1126for cgroups v2.
1127.PP
1128Thread mode allows the following:
1129.IP * 3
1130The creation of
1131.IR "threaded subtrees"
1132in which the threads of a process may
1133be spread across cgroups inside the tree.
1134(A threaded subtree may contain multiple multithreaded processes.)
1135.IP *
1136The concept of
.IR "threaded controllers" ,
1138which can distribute resources across the cgroups in a threaded subtree.
1139.IP *
A relaxation of the "no internal processes" rule,
1141so that, within a threaded subtree,
1142a cgroup can both contain member threads and
1143exercise resource control over child cgroups.
1144.PP
1145With the addition of thread mode,
1146each nonroot cgroup now contains a new file,
1147.IR cgroup.type ,
1148that exposes, and in some circumstances can be used to change,
1149the "type" of a cgroup.
1150This file contains one of the following type values:
1151.TP
1152.I "domain"
1153This is a normal v2 cgroup that provides process-granularity control.
1154If a process is a member of this cgroup,
1155then all threads of the process are (by definition) in the same cgroup.
1156This is the default cgroup type,
1157and provides the same behavior that was provided for
1158cgroups in the initial cgroups v2 implementation.
1159.TP
1160.I "threaded"
1161This cgroup is a member of a threaded subtree.
1162Threads can be added to this cgroup,
1163and controllers can be enabled for the cgroup.
1164.TP
1165.I "domain threaded"
1166This is a domain cgroup that serves as the root of a threaded subtree.
1167This cgroup type is also known as "threaded root".
1168.TP
1169.I "domain invalid"
1170This is a cgroup inside a threaded subtree
1171that is in an "invalid" state.
1172Processes can't be added to the cgroup,
1173and controllers can't be enabled for the cgroup.
1174The only thing that can be done with this cgroup (other than deleting it)
1175is to convert it to a
1176.IR threaded
1177cgroup by writing the string
1178.IR """threaded"""
1179to the
1180.I cgroup.type
1181file.
1182.IP
1183The rationale for the existence of this "interim" type
1184during the creation of a threaded subtree
1185(rather than the kernel simply immediately converting all cgroups
1186under the threaded root to the type
1187.IR threaded )
1188is to allow for
possible future extensions to the thread mode model.
1190.\"
1191.SS Threaded versus domain controllers
With the addition of thread mode,
1193cgroups v2 now distinguishes two types of resource controllers:
1194.IP * 3
1195.I Threaded
2cd9bbfa 1196.\" In the kernel source, look for ".threaded[ \t]*= true" in
218eadf4 1197.\" initializations of struct cgroup_subsys
1198controllers: these controllers support thread-granularity for
1199resource control and can be enabled inside threaded subtrees,
1200with the result that the corresponding controller-interface files
1201appear inside the cgroups in the threaded subtree.
1202As at Linux 4.15, the following controllers are threaded:
1203.IR cpu ,
1204.IR perf_event ,
1205and
1206.IR pids .
1207.IP *
1208.I Domain
1209controllers: these controllers support only process granularity
1210for resource control.
1211From the perspective of a domain controller,
1212all threads of a process are always in the same cgroup.
1213Domain controllers can't be enabled inside a threaded subtree.
1214.\"
1215.SS Creating a threaded subtree
1216There are two pathways that lead to the creation of a threaded subtree.
1217The first pathway proceeds as follows:
1218.IP 1. 3
1219We write the string
1220.IR """threaded"""
1221to the
1222.I cgroup.type
1223file of a cgroup
1224.IR y/z
1225that currently has the type
1226.IR domain .
1227This has the following effects:
1228.RS
1229.IP * 3
1230The type of the cgroup
1231.IR y/z
1232becomes
1233.IR threaded .
1234.IP *
1235The type of the parent cgroup,
1236.IR y ,
1237becomes
1238.IR "domain threaded" .
1239The parent cgroup is the root of a threaded subtree
1240(also known as the "threaded root").
1241.IP *
1242All other cgroups under
1243.IR y
1244that were not already of type
1245.IR threaded
1246(because they were inside already existing threaded subtrees
1247under the new threaded root)
1248are converted to type
1249.IR "domain invalid" .
1250Any subsequently created cgroups under
1251.I y
1252will also have the type
1253.IR "domain invalid" .
1254.RE
1255.IP 2.
1256We write the string
1257.IR """threaded"""
1258to each of the
1259.IR "domain invalid"
1260cgroups under
1261.IR y ,
1262in order to convert them to the type
1263.IR threaded .
As a consequence of this step, all cgroups under the threaded root
1265now have the type
1266.IR threaded
1267and the threaded subtree is now fully usable.
1268The requirement to write
1269.IR """threaded"""
1270to each of these cgroups is somewhat cumbersome,
1271but allows for possible future extensions to the thread-mode model.
1272.PP
1273The second way of creating a threaded subtree is as follows:
1274.IP 1. 3
1275In an existing cgroup,
1276.IR z ,
1277that currently has the type
1278.IR domain ,
1279we (1) enable one or more threaded controllers and
1280(2) make a process a member of
1281.IR z .
1282(These two steps can be done in either order.)
1283This has the following consequences:
1284.RS
1285.IP * 3
1286The type of
1287.I z
1288becomes
1289.IR "domain threaded" .
1290.IP *
1291All of the descendant cgroups of
.I z
that were not already of type
1294.IR threaded
1295are converted to type
1296.IR "domain invalid" .
1297.RE
1298.IP 2.
1299As before, we make the threaded subtree usable by writing the string
1300.IR """threaded"""
1301to each of the
1302.IR "domain invalid"
1303cgroups under
.IR z ,
1305in order to convert them to the type
1306.IR threaded .
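.PP
The final step of either pathway, converting every
.I "domain invalid"
cgroup to
.IR threaded ,
can be automated.
The Python sketch below is one possible approach, with a hypothetical function name;
it visits parents before children,
on the assumption that a cgroup can be converted only after its parent has been.

```python
import os


def make_subtree_threaded(root):
    """Write "threaded" to each "domain invalid" cgroup below 'root'.

    Directories are visited top-down, so a parent is always converted
    before its children. Returns the paths that were converted.
    """
    converted = []
    for dirpath, dirnames, _ in os.walk(root, topdown=True):
        dirnames.sort()                # deterministic traversal order
        if dirpath == root:
            continue                   # the threaded root itself is left alone
        type_file = os.path.join(dirpath, "cgroup.type")
        with open(type_file) as f:
            current = f.read().strip()
        if current == "domain invalid":
            with open(type_file, "w") as f:
                f.write("threaded")
            converted.append(dirpath)
    return converted
```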
1307.PP
1308One of the consequences of the above pathways to creating a threaded subtree
1309is that the threaded root cgroup can be a parent only to
1310.I threaded
1311(and
1312.IR "domain invalid" )
1313cgroups.
1314The threaded root cgroup can't be a parent of a
1315.I domain
cgroup, and a
1317.I threaded
1318cgroup
1319can't have a sibling that is a
1320.I domain
1321cgroup.
1322.\"
1323.SS Using a threaded subtree
1324Within a threaded subtree, threaded controllers can be enabled
1325in each subgroup whose type has been changed to
1326.IR threaded ;
1327upon doing so, the corresponding controller interface files
1328appear in the children of that cgroup.
1329.PP
1330A process can be moved into a threaded subtree by writing its PID to the
1331.I cgroup.procs
1332file in one of the cgroups inside the tree.
1333This has the effect of making all of the threads
1334in the process members of the corresponding cgroup
1335and makes the process a member of the threaded subtree.
1336The threads of the process can then be spread across
1337the threaded subtree by writing their thread IDs (see
1338.BR gettid (2))
1339to the
b2c3e720 1340.I cgroup.threads
1341files in different cgroups inside the subtree.
1342The threads of a process must all reside in the same threaded subtree.
1343.PP
1344As with writing to
1345.IR cgroup.procs ,
1346some containment rules apply when writing to the
b2c3e720 1347.I cgroup.threads
1348file:
1349.IP * 3
1350The writer must have write permission on the
.I cgroup.threads
1352file in the destination cgroup.
1353.IP *
1354The writer must have write permission on the
1355.I cgroup.procs
1356file in the common ancestor of the source and destination cgroups.
1357(In some cases,
1358the common ancestor may be the source or destination cgroup itself.)
1359.IP *
1360The source and destination cgroups must be in the same threaded subtree.
1361(Outside a threaded subtree, an attempt to move a thread by writing
1362its thread ID to the
1363.I cgroup.threads
1364file in a different
1365.I domain
1366cgroup fails with the error
1367.BR EOPNOTSUPP .)
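.PP
A program that moves threads might treat
.B EOPNOTSUPP
as an expected outcome rather than a fatal error.
The following Python sketch, with a hypothetical function name,
writes a thread ID to the
.I cgroup.threads
file of a destination cgroup and reports that case separately.

```python
import errno
import os


def move_thread(tid, dst_cgroup):
    """Write 'tid' to cgroup.threads in dst_cgroup.

    Returns False if the kernel rejects the move with EOPNOTSUPP
    (source and destination are not in the same threaded subtree);
    any other error is re-raised.
    """
    try:
        with open(os.path.join(dst_cgroup, "cgroup.threads"), "w") as f:
            f.write(str(tid))
        return True
    except OSError as e:
        if e.errno == errno.EOPNOTSUPP:
            return False
        raise
```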
1368.PP
1369The
1370.I cgroup.threads
1371file is present in each cgroup (including
1372.I domain
1373cgroups) and can be read in order to discover the set of threads
1374that is present in the cgroup.
1375The set of thread IDs obtained when reading this file
1376is not guaranteed to be ordered or free of duplicates.
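.PP
Since the list may be unordered and may contain duplicates,
a reader will usually want to normalize it.
A minimal Python sketch (the function name is hypothetical):

```python
def read_thread_ids(path):
    """Return the thread IDs in a cgroup.threads file, sorted and deduplicated."""
    with open(path) as f:
        return sorted({int(token) for token in f.read().split()})
```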
1377.PP
1378The
1379.I cgroup.procs
1380file in the threaded root shows the PIDs of all processes
1381that are members of the threaded subtree.
1382The
1383.I cgroup.procs
1384files in the other cgroups in the subtree are not readable.
1385.PP
1386Domain controllers can't be enabled in a threaded subtree;
1387no controller-interface files appear inside the cgroups underneath the
1388threaded root.
1389From the point of view of a domain controller,
1390threaded subtrees are invisible:
1391a multithreaded process inside a threaded subtree appears to a domain
1392controller as a process that resides in the threaded root cgroup.
1393.PP
1394Within a threaded subtree, the "no internal processes" rule does not apply:
a cgroup can both contain member processes (or threads)
1396and exercise controllers on child cgroups.
1397.\"
1398.SS Rules for writing to cgroup.type and creating threaded subtrees
1399A number of rules apply when writing to the
1400.I cgroup.type
1401file:
1402.IP * 3
1403Only the string
1404.IR """threaded"""
1405may be written.
1406In other words, the only explicit transition that is possible is to convert a
1407.I domain
1408cgroup to type
1409.IR threaded .
1410.IP *
6c9aa5ad 1411The effect of writing
c8902e25 1412.IR """threaded"""
1413depends on the current value in
1414.IR cgroup.type ,
1415as follows:
1416.RS
1417.IP \(bu 3
1418.IR domain
1419or
1420.IR "domain threaded" :
1421start the creation of a threaded subtree
1422(whose root is the parent of this cgroup) via
1423the first of the pathways described above;
1424.IP \(bu
6c9aa5ad 1425.IR "domain\ invalid" :
4644794c 1426convert this cgroup (which is inside a threaded subtree) to a usable (i.e.,
1427.IR threaded )
1428state;
1429.IP \(bu
1430.IR threaded :
1431no effect (a "no-op").
1432.RE
1433.IP *
1434We can't write to a
1435.I cgroup.type
1436file if the parent's type is
1437.IR "domain invalid" .
1438In other words, the cgroups of a threaded subtree must be converted to the
1439.I threaded
1440state in a top-down manner.
1441.PP
00c27092 1442There are also some constraints that must be satisfied
1443in order to create a threaded subtree rooted at the cgroup
1444.IR x :
1445.IP * 3
1446There can be no member processes in the descendant cgroups of
1447.IR x .
1448(The cgroup
1449.I x
1450can itself have member processes.)
1451.IP *
1452No domain controllers may be enabled in
1453.IR x 's
1454.IR cgroup.subtree_control
1455file.
1456.PP
1457If any of the above constraints is violated, then an attempt to write
1458.IR """threaded"""
1459to a
1460.IR cgroup.type
1461file fails with the error
.BR EOPNOTSUPP .
1463.\"
1464.SS The """domain threaded""" cgroup type
1465According to the pathways described above,
1466the type of a cgroup can change to
1467.IR "domain threaded"
1468in either of the following cases:
1469.IP * 3
1470The string
1471.IR """threaded"""
1472is written to a child cgroup.
1473.IP *
1474A threaded controller is enabled inside the cgroup and
1475a process is made a member of the cgroup.
1476.PP
1477A
1478.IR "domain threaded"
1479cgroup,
1480.IR x ,
1481can revert to the type
1482.IR domain
1483if the above conditions no longer hold true\(emthat is, if all
1484.I threaded
1485child cgroups of
1486.I x
1487are removed and either
1488.I x
1489no longer has threaded controllers enabled or
1490no longer has member processes.
1491.PP
1492When a
1493.IR "domain threaded"
1494cgroup
1495.IR x
1496reverts to the type
1497.IR domain :
1498.IP * 3
1499All
1500.IR "domain invalid"
1501descendants of
1502.I x
1503that are not in lower-level threaded subtrees revert to the type
1504.IR domain .
1505.IP *
1506The root cgroups in any lower-level threaded subtrees revert to the type
1507.IR "domain threaded" .
1508.\"
1509.SS Exceptions for the root cgroup
1510The root cgroup of the v2 hierarchy is treated exceptionally:
1511it can be the parent of both
1512.I domain
1513and
1514.I threaded
1515cgroups.
1516If the string
1517.I """threaded"""
1518is written to the
1519.I cgroup.type
1520file of one of the children of the root cgroup, then
1521.IP * 3
1522The type of that cgroup becomes
1523.IR threaded .
1524.IP *
1525The type of any descendants of that cgroup that
1526are not part of lower-level threaded subtrees changes to
1527.IR "domain invalid" .
1528.PP
1529Note that in this case, there is no cgroup whose type becomes
1530.IR "domain threaded" .
1531(Notionally, the root cgroup can be considered as the threaded root
1532for the cgroup whose type was changed to
1533.IR threaded .)
1534.PP
1535The aim of this exceptional treatment for the root cgroup is to
1536allow a threaded cgroup that employs the
1537.I cpu
1538controller to be placed as high as possible in the hierarchy,
1539so as to minimize the (small) cost of traversing the cgroup hierarchy.
1540.\"
edc90967 1541.SS The cgroups v2 """cpu""" controller and realtime threads
1542As at Linux 4.15, the cgroups v2
1543.I cpu
1544controller does not support control of realtime threads
1545(specifically threads scheduled under any of the policies
1546.BR SCHED_FIFO ,
1547.BR SCHED_RR ,
and
1549.BR SCHED_DEADLINE ;
1550see
1551.BR sched (7)).
1552Therefore, the
1553.I cpu
1554controller can be enabled in the root cgroup only
c8902e25 1555if all realtime threads are in the root cgroup.
edc90967 1556(If there are realtime threads in nonroot cgroups, then a
1557.BR write (2)
1558of the string
1559.IR """+cpu"""
1560to the
1561.I cgroup.subtree_control
1562file fails with the error
c2df7694 1563.BR EINVAL .)
1564However, on some systems,
1565.BR systemd (1)
edc90967 1566places certain realtime threads in nonroot cgroups in the v2 hierarchy.
c8902e25 1567On such systems,
edc90967 1568these threads must first be moved to the root cgroup before the
1569.I cpu
1570controller can be enabled.
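.PP
To locate the threads that would have to be moved,
one option is to scan
.I /proc
for threads running under a realtime policy.
The Python sketch below makes several assumptions:
the function names are hypothetical,
the numeric value 6 for
.B SCHED_DEADLINE
is taken from the kernel headers because Python's os module does not export it,
and os.sched_getscheduler() is assumed to accept a thread ID,
as the underlying system call does on Linux.

```python
import os

SCHED_DEADLINE = 6   # assumption: value from <linux/sched.h>

REALTIME_POLICIES = {os.SCHED_FIFO, os.SCHED_RR, SCHED_DEADLINE}


def is_realtime_policy(policy):
    """Return True for SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE,
    ignoring the SCHED_RESET_ON_FORK flag bit."""
    return (policy & ~os.SCHED_RESET_ON_FORK) in REALTIME_POLICIES


def realtime_threads(proc="/proc"):
    """Return the IDs of all threads currently using a realtime policy."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            tids = os.listdir(os.path.join(proc, pid, "task"))
        except OSError:
            continue                   # the process exited during the scan
        for tid in tids:
            try:
                if is_realtime_policy(os.sched_getscheduler(int(tid))):
                    found.append(int(tid))
            except OSError:
                pass                   # the thread exited during the scan
    return found
```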
1571.\"
1572.SH ERRORS
1573The following errors can occur for
1574.BR mount (2):
1575.TP
1576.B EBUSY
1577An attempt to mount a cgroup version 1 filesystem specified neither the
1578.I name=
1579option (to mount a named hierarchy) nor a controller name (or
1580.IR all ).
1581.SH NOTES
1582A child process created via
1583.BR fork (2)
1584inherits its parent's cgroup memberships.
1585A process's cgroup memberships are preserved across
1586.BR execve (2).
1587.\"
1588.SS /proc files
1589.TP
34eb3340 1590.IR /proc/cgroups " (since Linux 2.6.24)"
92bb6d36 1591This file contains information about the controllers
1a4f7d59 1592that are compiled into the kernel.
1593An example of the contents of this file (reformatted for readability)
1594is the following:
a721e8b2 1595.IP
.in +4n
.EX
#subsys_name    hierarchy      num_cgroups    enabled
cpuset          4              1              1
cpu             8              1              1
cpuacct         8              1              1
blkio           6              1              1
memory          3              1              1
devices         10             84             1
freezer         7              1              1
net_cls         9              1              1
perf_event      5              1              1
net_prio        9              1              1
hugetlb         0              1              0
pids            2              1              1
.EE
.in
a721e8b2 1613.IP
1614The fields in this file are, from left to right:
1615.RS
1616.IP 1. 3
1617The name of the controller.
1618.IP 2.
92bb6d36 1619The unique ID of the cgroup hierarchy on which this controller is mounted.
11c0797f 1620If multiple cgroups v1 controllers are bound to the same hierarchy,
34eb3340 1621then each will show the same hierarchy ID in this field.
1622The value in this field will be 0 if:
1623.RS 5
1624.IP a) 3
1625the controller is not mounted on a cgroups v1 hierarchy;
1626.IP b)
1627the controller is bound to the cgroups v2 single unified hierarchy; or
1628.IP c)
1629the controller is disabled (see below).
1630.RE
1631.IP 3.
1632The number of control groups in this hierarchy using this controller.
1633.IP 4.
1634This field contains the value 1 if this controller is enabled,
1635or 0 if it has been disabled (via the
1636.IR cgroup_disable
1637kernel command-line boot parameter).
1638.RE
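.IP
The four fields can be parsed with a few lines of code.
The following Python sketch uses hypothetical names
and assumes whitespace-separated fields with a leading comment line,
as in the example above.

```python
from collections import namedtuple

Controller = namedtuple("Controller", "name hierarchy num_cgroups enabled")


def parse_proc_cgroups(text):
    """Parse the contents of /proc/cgroups into a list of Controller tuples."""
    controllers = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                   # skip blank lines and the header
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers.append(
            Controller(name, int(hierarchy), int(num_cgroups), enabled == "1"))
    return controllers
```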
1639.TP
5c2181ad 1640.IR /proc/[pid]/cgroup " (since Linux 2.6.24)"
1641This file describes control groups to which the process
1642with the corresponding PID belongs.
5f8a7eb2 1643The displayed information differs for
2c4fbe35 1644cgroups version 1 and version 2 hierarchies.
a721e8b2 1645.IP
5f8a7eb2 1646For each cgroup hierarchy of which the process is a member,
2e33b59e 1647there is one entry containing three colon-separated fields:
a721e8b2 1648.IP
.in +4n
.EX
hierarchy-ID:controller-list:cgroup-path
.EE
.in
a721e8b2 1654.IP
5f8a7eb2 1655For example:
.IP
.in +4n
.EX
5:cpuacct,cpu,cpuset:/daemons
.EE
.in
1662.IP
1663The colon-separated fields are, from left to right:
5f8a7eb2 1664.RS
5c2181ad 1665.IP 1. 3
1666For cgroups version 1 hierarchies,
1667this field contains a unique hierarchy ID number
1668that can be matched to a hierarchy ID in
1669.IR /proc/cgroups .
1670For the cgroups version 2 hierarchy, this field contains the value 0.
5c2181ad 1671.IP 2.
5f8a7eb2 1672For cgroups version 1 hierarchies,
55f52de8 1673this field contains a comma-separated list of the controllers
1674bound to the hierarchy.
1675For the cgroups version 2 hierarchy, this field is empty.
5c2181ad 1676.IP 3.
1677This field contains the pathname of the control group in the hierarchy
1678to which the process belongs.
1679This pathname is relative to the mount point of the hierarchy.
5c2181ad 1680.RE
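.IP
A parser for this format needs to split on only the first two colons,
since the pathname in the third field may itself contain colons.
A minimal Python sketch, with a hypothetical function name:

```python
def parse_pid_cgroup(text):
    """Parse /proc/[pid]/cgroup contents.

    Returns a list of (hierarchy_id, controllers, path) tuples; the
    controller list is empty for the cgroups version 2 hierarchy.
    """
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append((
            int(hierarchy_id),
            controllers.split(",") if controllers else [],
            path,
        ))
    return entries
```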
1681.\"
1682.SS /sys/kernel/cgroup files
1683.TP
1684.IR /sys/kernel/cgroup/delegate " (since Linux 4.15)"
1685.\" commit 01ee6cfb1483fe57c9cbd8e73817dfbf9bacffd3
1686This file exports a list of the cgroups v2 files
1687(one per line) that are delegatable
1688(i.e., whose ownership should be changed to the user ID of the delegatee).
1689In the future, the set of delegatable files may change or grow,
1690and this file provides a way for the kernel to inform
1691user-space applications of which files must be delegated.
1692As at Linux 4.15, one sees the following when inspecting this file:
1693.IP
.in +4n
.EX
$ \fBcat /sys/kernel/cgroup/delegate\fP
cgroup.procs
cgroup.subtree_control
cgroup.threads
.EE
.in
1702.TP
1703.IR /sys/kernel/cgroup/features " (since Linux 4.15)"
1704.\" commit 5f2e673405b742be64e7c3604ed4ed3ac14f35ce
1705Over time, the set of cgroups v2 features that are provided by the
1706kernel may change or grow,
1707or some features may not be enabled by default.
1708This file provides a way for user-space applications to discover what
fcf115f5 1709features the running kernel supports and has enabled.
1710Features are listed one per line:
1711.IP
.in +4n
.EX
$ \fBcat /sys/kernel/cgroup/features\fP
nsdelegate
.EE
.in
1718.IP
1719The entries that can appear in this file are:
1720.RS
1721.TP
1722.IR nsdelegate " (since Linux 4.15)"
1723The kernel supports the
1724.I nsdelegate
1725mount option.
1726.RE
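.PP
Both files under
.I /sys/kernel/cgroup
use the same one-entry-per-line format,
so a single helper can read either.
The Python sketch below uses hypothetical function names
and treats a missing file (as on kernels before Linux 4.15) as an empty list.

```python
def read_entries(path):
    """Return the non-empty lines of a one-entry-per-line file,
    or an empty list if the file does not exist."""
    try:
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return []


def kernel_supports(feature, path="/sys/kernel/cgroup/features"):
    """Return True if 'feature' is listed in the features file."""
    return feature in read_entries(path)
```

For example, kernel_supports("nsdelegate") could guard an attempt
to remount with that option, and
read_entries("/sys/kernel/cgroup/delegate")
yields the files whose ownership a delegater should change.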
bbfdf727 1727.SH SEE ALSO
ebbc83be 1728.BR prlimit (1),
f60a5da2 1729.BR systemd (1),
1730.BR systemd-cgls (1),
1731.BR systemd-cgtop (1),
325b7eb0 1732.BR clone (2),
1733.BR ioprio_set (2),
1734.BR perf_event_open (2),
1735.BR setrlimit (2),
cff6de30 1736.BR cgroup_namespaces (7),
69c47536 1737.BR cpuset (7),
1738.BR namespaces (7),
1739.BR sched (7),
1740.BR user_namespaces (7)