1 .\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com>
2 .\" and Copyright (C) 2016 Michael Kerrisk <mtk.manpages@gmail.com>
3 .\"
4 .\" %%%LICENSE_START(VERBATIM)
5 .\" Permission is granted to make and distribute verbatim copies of this
6 .\" manual provided the copyright notice and this permission notice are
7 .\" preserved on all copies.
8 .\"
9 .\" Permission is granted to copy and distribute modified versions of this
10 .\" manual under the conditions for verbatim copying, provided that the
11 .\" entire resulting derived work is distributed under the terms of a
12 .\" permission notice identical to this one.
13 .\"
14 .\" Since the Linux kernel and libraries are constantly changing, this
15 .\" manual page may be incorrect or out-of-date. The author(s) assume no
16 .\" responsibility for errors or omissions, or for damages resulting from
17 .\" the use of the information contained herein. The author(s) may not
18 .\" have taken the same level of care in the production of this manual,
19 .\" which is licensed free of charge, as they might when working
20 .\" professionally.
21 .\"
22 .\" Formatted or processed versions of this manual, if unaccompanied by
23 .\" the source, must acknowledge the copyright and authors of this work.
24 .\" %%%LICENSE_END
25 .\"
26 .TH CGROUPS 7 2016-07-17 "Linux" "Linux Programmer's Manual"
27 .SH NAME
28 cgroups \- Linux control groups
29 .SH DESCRIPTION
Control groups, usually referred to as cgroups,
31 are a Linux kernel feature which provides for grouping of tasks and
32 resource tracking and limitations for those groups.
33 While several systems have been introduced to help in configuring and
34 managing cgroups, the kernel's cgroup interface is provided through
35 a pseudo-filesystem called cgroupfs.
36 Task grouping is implemented in the core cgroup kernel code,
37 while resource tracking and limits are implemented in
38 a set of per-resource-type subsystems (memory, CPU, and so on) which may be
39 enabled as separate hierarchies, or joined into comounted hierarchies.
40
41 Each hierarchy constitutes a separate mount of the cgroup filesystem,
42 with the subsystems enabled in that hierarchy listed in the mount options.
43 For each mounted hierarchy,
44 the directory tree mirrors the control group hierarchy.
45 Each control group is represented by a directory, with each of its child
control groups represented as a child directory.
47 For instance,
48 .IR /user/joe/1.session
49 represents control group
50 .IR 1.session ,
51 which is a child of cgroup
52 .IR joe ,
53 which is a child of
54 .IR /user .
55 Under each cgroup directory is a set of files which can be read or
56 written to, reflecting resource limits and a few general cgroup
57 properties.
58
59 In general, cgroup limits are hierarchical, meaning that the limits placed on
60 .IR /user/joe
61 cannot be exceeded by
.IR /user/joe/1.session .
63 There are currently exceptions to this rule,
64 but stricter adherence is a goal as cgroups are being largely reworked.
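
The hierarchical rule above can be modeled in a few lines.
This is only an illustrative sketch, not kernel code: the function name
effective_limit and the dictionary of limits are hypothetical,
and the model ignores the exceptions mentioned above.
It treats the effective limit of a cgroup as the smallest limit
set on the cgroup itself or on any of its ancestors.

```python
from pathlib import PurePosixPath


def effective_limit(limits, cgroup_path):
    """Return the tightest limit that applies to cgroup_path.

    limits maps cgroup paths (for example "/user/joe") to numeric limits.
    This is a simplified model: a child can never exceed the limit
    of any ancestor, so the effective value is the minimum along the path.
    """
    path = PurePosixPath(cgroup_path)
    # The cgroup itself, followed by each ancestor up to the root "/".
    candidates = [path, *path.parents]
    values = [limits[str(p)] for p in candidates if str(p) in limits]
    return min(values) if values else None
```

With a limit of 100 on /user/joe and 500 on /user/joe/1.session,
this model reports 100 for the session cgroup,
because the parent's limit is the tighter of the two.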
65
66 In addition, cgroups can be mounted with no bound subsystem, in which case
67 they serve only to track processes.
68 An example of this is the
69 .I name=systemd
70 cgroup which is used by
71 .BR systemd (1)
72 to track services and user sessions.
73 .\"
74 .SS Terminology
75 A
76 .I cgroup
77 is a collection of processes that are bound to a set of
78 limits or parameters defined via the cgroup filesystem.
79
80 A
81 .I subsystem
82 is a kernel component that modifies the behavior of
83 the processes in a cgroup.
84 Various subsystems have been implemented, making it possible to do things
85 such as limiting the amount of CPU time and memory available to a cgroup,
86 accounting for the CPU time used by a cgroup,
87 and freezing and resuming execution of the processes in a cgroup.
88 Subsystems are sometimes also known as
89 .IR "resource controllers"
90 (or simply, controllers).
91
92 The cgroups for a subsystem are arranged in a
93 .IR hierarchy .
94 This hierarchy is defined by creating, removing, and
95 renaming subdirectories within the cgroup filesystem.
96 At each level of the hierarchy, attributes (e.g., limits) can be defined;
97 these attributes may govern or propagate
98 to child cgroups and their descendants in the hierarchy.
99 .\"
100 .SS Cgroups version 1 and version 2
101 The initial release of the cgroups implementation was in Linux 2.6.24.
102 Over time, various cgroup subsystems have been added
103 to allow the management of various types of resources.
104 However, the development of these subsystems was largely uncoordinated,
105 with the result that many inconsistencies arose between subsystems
106 and management of the cgroup hierarchies became rather complex.
107 (A longer description of these problems can be found in
108 the kernel source file
109 .IR Documentation/cgroup\-v2.txt .)
110
111 Because of the problems with the initial cgroups implementation
112 (cgroups version 1),
113 starting in Linux 3.10, work began on a new,
114 orthogonal implementation to remedy these problems.
115 Initially marked experimental, and hidden behind the
116 .I "\-o\ __DEVEL__sane_behavior"
117 mount option, the new version (cgroups version 2)
118 was eventually made official with the release of Linux 4.5.
119 Differences between the two versions are described in the text below.
120
121 Although cgroups v2 is intended as a replacement for cgroups v1,
122 the older system continues to exist
123 (and for compatibility reasons is unlikely to be removed).
124 Currently, cgroups v2 implements only a subset of the controllers
125 available in cgroups v1.
126 The two systems are implemented so that both v1 controllers and
127 v2 controllers can be mounted on the same system.
128 Thus, for example, it is possible to use those controllers
129 that are supported under version 2,
130 while also using version 1 controllers
131 where version 2 does not yet support those controllers.
132 .\"
133 .SS Tasks versus processes
134 In cgroups v1, a distinction is drawn between
135 .I processes
136 and
137 .IR tasks .
138 In this view, a process can consist of multiple tasks
139 (more commonly called threads, from a user-space perspective).
In cgroups v1, it is possible to independently manipulate
141 the cgroup memberships of the tasks in a process.
142 Because this ability caused certain problems,
143 .\" FIXME Add some text describing why this was a problem.
144 the ability to independently manipulate the cgroup memberships
145 of the tasks in a process has been removed in cgroups v2.
146 Cgroups v2 allows manipulation of cgroup membership only for processes
147 (which has the effect of changing the cgroup membership of
148 all tasks in the process).
149 .\"
150 .SS Mounting
151 To be available, a given cgroup subsystem must be compiled into the
152 kernel.
153 Since they are exposed through a virtual filesystem, subsystems
154 must be mounted before they can be controlled.
155 The usual place for this is under
156 .IR /sys/fs/cgroup .
157 If all the desired subsystems can be comounted,
158 then one can do so with the following command:
159
    mount \-t cgroup \-o all cgroup /sys/fs/cgroup
161
162 (One can achieve the same result by omitting
163 .IR "\-o all" ,
since it is the default if no subsystems are explicitly specified.)
165
166 If multiple, separately mounted subsystems are desired, then this is
167 usually done in per-subsystem subdirectories.
168 This requires first mounting a tmpfs under
169 .I /sys/fs/cgroup
170 so that subdirectories can be created.
171 For instance, one could mount
172 .IR cpu ,
173 .IR memory ,
174 and
175 .I devices
176 cgroups as follows:
177
178 .nf
179 .in +4n
mount \-t tmpfs \-o size=100000,mode=755 cgroups /sys/fs/cgroup
for s in cpu memory devices; do
    mkdir /sys/fs/cgroup/$s
    mount \-t cgroup \-o $s $s /sys/fs/cgroup/$s
done
185 .in
186 .fi
187
188 Comounting subsystems has the effect that a task is in the same cgroup for
189 all comounted subsystems.
190 Separately mounting subsystems allows a task to
191 be in cgroup
192 .I /foo1
193 for one subsystem while being in
194 .I /foo2/foo3
195 for another.
196 .\"
197 .SS Introspection
198 The list of subsystems compiled into the kernel can be seen in the file
199 .IR /proc/cgroups .
200 The file
.I /proc/[pid]/cgroup
202 lists the task's current cgroup
203 membership for each mounted hierarchy.
204 .\"
205 .SS Creating cgroups and moving tasks
206 The system begins with a single root cgroup (per hierarchy), '/', which all tasks belong to.
207 A new cgroup is created by creating a directory in the cgroup filesystem:
208
    mkdir /sys/fs/cgroup/cpu/cg1
210
211 This creates a new empty cgroup.
212 Tasks may be moved to this cgroup by writing
213 their PIDs into the cgroup's
214 .I cgroup.procs
215 or
216 .I tasks
217 (deprecated)
218 file:
219
    echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
221
222 The same file can be read to obtain a list of the processes currently in
223 .IR cg1 .
224 By using the
225 .I cgroup.procs
226 file instead of the
227 .I tasks
228 file, all tasks in the
229 thread group are moved into the new cgroup at once.
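
The same two steps can be sketched programmatically.
The helper names create_cgroup and move_process are hypothetical,
and the sketch assumes a hierarchy is already mounted at the directory
passed in (for example /sys/fs/cgroup/cpu) and that the caller
has permission to modify it.

```python
import os


def create_cgroup(hierarchy_root, name):
    """Create a new, empty cgroup by making a directory in the hierarchy."""
    path = os.path.join(hierarchy_root, name)
    os.mkdir(path)
    return path


def move_process(cgroup_dir, pid):
    """Move a process into a cgroup by writing its PID to cgroup.procs.

    Writing to cgroup.procs (rather than tasks) moves every task
    in the thread group at once.
    """
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")
```

A call such as move_process(create_cgroup("/sys/fs/cgroup/cpu", "cg1"), os.getpid())
would mirror the mkdir and echo commands shown above.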
230
231 On
232 .BR fork (2),
233 the new child is created as a member of the parent's cgroup,
234 leading to implicit grouping of process hierarchies.
235
Note: in the cgroups v2 unified hierarchy, a new restriction is imposed such
237 that tasks may exist only in leaf cgroups.
238 For instance, if cgroup
239 .I /cg1/cg2
240 exists, then a task may exist in
241 .IR /cg1/cg2 ,
242 but not in
243 .IR /cg1 .
244 This is to avoid the current ambiguity in the delegation of resources
245 between tasks in
246 .I /cg1
247 and its child cgroups.
248 The recommended workaround is to create a subdirectory called
249 .I leaf
250 for any non-leaf cgroup which should contain tasks, and make sure not to
251 create child cgroups of it.
252 In the above example, tasks which previously would have gone into
253 .I /cg1
254 would now go into
255 .IR /cg1/leaf .
256 This has the advantage of making explicit the relationship between tasks in
257 .I /cg1/leaf
258 and
259 .IR /cg1 's
260 other children.
261 .\"
262 .SS Removing cgroups
A cgroup can be removed only if it has no child cgroups and contains no tasks.
264 So long as that is the case, one can simply
265 remove the corresponding directory pathname.
266 Note that files in a cgroup directory cannot and need not be
267 removed.
268
269 A special file in each cgroup hierarchy,
270 .IR release_agent ,
271 can be used to register a program to handle cgroups which become newly empty.
272 The program will be called each time a cgroup marked for
273 autoremove becomes empty and childless.
274 The cgroup path will be provided as the first command-line argument.
275 The cgroup must be marked as eligible for autoremove by writing '1' into its
276 .IR notify_on_release
277 file;
278 this value is inherited by newly created child cgroups.
279
280 A new feature in cgroups v2 is the
281 .I cgroup.populated
282 file.
283 This reads 0 if there are no tasks in the cgroup or its descendants,
284 and 1 otherwise.
285 It can be watched for changes using
286 .BR inotify (7).
287 This allows user-space applications to efficiently watch cgroups
288 for autoremove conditions.
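
As a minimal sketch, the file can be read as follows.
The function name is_populated is hypothetical, and the sketch
reads the file only once; watching it for changes with
inotify is not shown, since the Python standard library
has no inotify binding.

```python
import os


def is_populated(cgroup_dir):
    """Return True if cgroup.populated reads 1, False if it reads 0.

    A value of 0 means there are no tasks in the cgroup
    or in any of its descendants.
    """
    with open(os.path.join(cgroup_dir, "cgroup.populated")) as f:
        return f.read().strip() == "1"
```

A manager could call this after being notified of a change and
remove the cgroup directory once the function returns False.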
289 .\"
290 .SS Cgroups version 2
291 In cgroups v2,
292 all mounted controllers reside in a single unified hierarchy.
293 While (different) controllers may be simultaneously
294 mounted under the v1 and v2 hierarchies,
295 it is not possible to mount the same controller simultaneously
296 under both the v1 and the v2 hierarchies.
297
298 The new behaviors in cgroups v2 are summarized below:
299 .TP 3
300 1. Tasks only in leaf nodes
301 With the exception of the root cgroup, tasks may reside only in leaf nodes.
302 This avoids the need to decide how to partition resources between tasks which
303 are members of cgroup A and tasks in child cgroups of A.
304 .TP
305 2. Active cgroups must be specified
306 The unified hierarchy presents two new files,
307 .IR cgroup.controllers
308 and
309 .IR cgroup.subtree_control .
310 When a cgroup
311 .I A/b
312 is created, its
313 .IR cgroup.controllers
314 file contains the list of controllers which were active in its parent, A.
315 This is the list of controllers which are available to this cgroup.
316 No controllers are active until they are enabled through the
317 .IR cgroup.subtree_control
318 file, by writing the list of space-delimited names of the controllers,
319 each preceded by '+' (to enable) or '\-' (to disable).
320 If the
321 .I freezer
322 controller is not enabled in
323 .IR /A/B ,
324 then it cannot be enabled in
325 .IR /A/B/C .
326 .TP
327 3. No "tasks" or "cgroup.clone_children" files
328 .TP
329 4. Empty cgroup notification
330 A new file,
331 .IR cgroup.populated ,
under each cgroup contains 0 when the
cgroup is empty, and 1 when it is populated.
334 It therefore may be watched to detect when a cgroup becomes (non-)empty.
335 This replaces the original notify-on-release mechanism.
336
337 For more changes, please see the
.I Documentation/cgroup\-v2.txt
339 file in the kernel source.
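
The string written to cgroup.subtree_control, as described in item 2 above,
can be assembled with a small helper.
This is a sketch only: the function name subtree_control_request
is hypothetical, and the controller names passed to it are placeholders
for whatever appears in the cgroup's cgroup.controllers file.

```python
def subtree_control_request(enable=(), disable=()):
    """Build the space-delimited string for cgroup.subtree_control.

    Each controller to enable is prefixed with '+',
    and each controller to disable is prefixed with '-'.
    """
    parts = ["+" + name for name in enable]
    parts += ["-" + name for name in disable]
    return " ".join(parts)
```

The returned string would then be written to the
cgroup.subtree_control file of the parent cgroup.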
340 .\"
341 .SS Cgroups version 1 subsystems
342 Each of the cgroups version 1 subsystems is governed
343 by a kernel configuration option (listed below).
344 Additionally, the availability of the cgroups feature is governed by the
345 .BR CONFIG_CGROUPS
346 kernel configuration option.
347 .TP
348 .IR cpu " (since Linux 2.6.24; " \fBCONFIG_CGROUP_SCHED\fP )
349 Cgroups can be guaranteed a minimum number of "CPU shares"
350 when a system is busy.
351 This does not limit a cgroup's CPU usage if the CPUs are not busy.
352
353 Further information can be found in the kernel source file
354 .IR Documentation/scheduler/sched\-bwc.txt .
355 .TP
356 .IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
357 This provides accounting for CPU usage by groups of tasks.
358
359 Further information can be found in the kernel source file
360 .IR Documentation/cgroup\-v1/cpuacct.txt .
361 .TP
362 .IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
363 This cgroup can be used to bind the tasks in a cgroup to
364 a specified set of CPUs and NUMA nodes.
365
366 Further information can be found in the kernel source file
367 .IR Documentation/cgroup\-v1/cpusets.txt .
368 .TP
369 .IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
370 The memory controller supports reporting and limiting of process memory, kernel
371 memory, and swap used by cgroups.
372
373 Further information can be found in the kernel source file
374 .IR Documentation/cgroup\-v1/memory.txt .
375 .TP
376 .IR devices " (since Linux 2.6.26; " \fBCONFIG_CGROUP_DEVICE\fP )
377 This supports controlling which tasks may create (mknod) devices as
378 well as open them for reading or writing.
379 The policies may be specified as whitelists and blacklists.
380 Hierarchy is enforced, so new rules must not
381 violate existing rules for the target or ancestor cgroups.
382
383 Further information can be found in the kernel source file
384 .IR Documentation/cgroup-v1/devices.txt .
385 .TP
386 .IR freezer " (since Linux 2.6.28; " \fBCONFIG_CGROUP_FREEZER\fP )
387 The
388 .IR freezer
389 cgroup can suspend and restore (resume) all tasks in a cgroup.
390 Freezing a cgroup
391 .I /A
392 also causes its children, for example, tasks in
393 .IR /A/B ,
394 to be frozen.
395
396 Further information can be found in the kernel source file
397 .IR Documentation/cgroup-v1/freezer-subsystem.txt .
398 .TP
399 .IR net_cls " (since Linux 2.6.29; " \fBCONFIG_CGROUP_NET_CLASSID\fP )
400 This places a classid, specified for the cgroup, on network packets
401 created by a cgroup.
402 These classids can then be used in firewall rules,
403 as well as used to shape traffic using
404 .BR tc (8).
405 This applies only to packets
406 leaving the cgroup, not to traffic arriving at the cgroup.
407
408 Further information can be found in the kernel source file
409 .IR Documentation/cgroup-v1/net_cls.txt .
410 .TP
411 .IR blkio " (since Linux 2.6.33; " \fBCONFIG_BLK_CGROUP\fP )
412 The
413 .I blkio
414 cgroup controls and limits access to specified block devices by
415 applying IO control in the form of throttling and upper limits against leaf
416 nodes and intermediate nodes in the storage hierarchy.
417
418 Two policies are available.
419 The first is a proportional-weight time-based division
420 of disk implemented with CFQ.
421 This is in effect for leaf nodes using CFQ.
422 The second is a throttling policy which specifies
423 upper I/O rate limits on a device.
424
425 Further information can be found in the kernel source file
426 .IR Documentation/cgroup-v1/blkio-controller.txt .
427 .TP
428 .IR perf_event " (since Linux 2.6.39; " \fBCONFIG_CGROUP_PERF\fP )
429 This controller allows
430 .I perf
431 monitoring of the set of processes grouped in a cgroup.
432
433 Further information can be found in the kernel source file
.IR tools/perf/Documentation/perf\-record.txt .
435 .TP
436 .IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
437 This allows priorities to be specified, per network interface, for cgroups.
438
439 Further information can be found in the kernel source file
440 .IR Documentation/cgroup-v1/net_prio.txt .
441 .TP
442 .IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
443 This supports limiting the use of huge pages by cgroups.
444
445 Further information can be found in the kernel source file
446 .IR Documentation/cgroup-v1/hugetlb.txt .
447 .TP
448 .IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
This controller permits limiting the number of processes that may be created
450 in a cgroup (and its descendants).
451
452 Further information can be found in the kernel source file
453 .IR Documentation/cgroup-v1/pids.txt .
454 .SS /proc files
455 .TP
456 .IR /proc/cgroups " (since Linux 2.6.24)"
457 This file contains information about the controllers
458 that are available on the system.
459 An example of the contents of this file (reformatted for readability)
460 is the following:
461
462 .nf
463 .in +4n
#subsys_name    hierarchy      num_cgroups    enabled
cpuset          4              1              1
cpu             8              1              1
cpuacct         8              1              1
blkio           6              1              1
memory          3              1              1
devices         10             84             1
freezer         7              1              1
net_cls         9              1              1
perf_event      5              1              1
net_prio        9              1              1
hugetlb         0              1              0
pids            2              1              1
477 .in
478 .fi
479
480 The fields in this file are, from left to right:
481 .RS
482 .IP 1. 3
483 The name of the controller.
484 .IP 2.
485 The unique ID of the cgroup hierarchy on which this controller is mounted.
486 If multiple cgroups v1 controllers are bound to the same hierarchy,
487 then each will show the same hierarchy ID in this field.
488 The value in this field will be 0 if:
489 .RS 5
490 .IP a) 3
491 the controller is not mounted on a cgroups v1 hierarchy;
492 .IP b)
493 the controller is bound to the cgroups v2 single unified hierarchy; or
494 .IP c)
495 the controller is disabled (see below).
496 .RE
497 .IP 3.
498 The number of control groups in this hierarchy using this controller.
499 .IP 4.
500 This field contains the value 1 if this controller is enabled,
501 or 0 if it has been disabled (via the
502 .IR cgroup_disable
503 kernel command-line boot parameter).
504 .RE
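
The four fields described above can be parsed with a short routine.
This is a sketch under the assumption that the file keeps the
whitespace-separated layout shown in the example;
the names Controller and parse_proc_cgroups are hypothetical.

```python
from collections import namedtuple

# One record per controller line in /proc/cgroups.
Controller = namedtuple("Controller", "name hierarchy_id num_cgroups enabled")


def parse_proc_cgroups(text):
    """Parse the contents of /proc/cgroups into Controller records."""
    controllers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#subsys_name ..." header.
        if not line or line.startswith("#"):
            continue
        name, hierarchy_id, num_cgroups, enabled = line.split()
        controllers.append(
            Controller(name, int(hierarchy_id), int(num_cgroups), enabled == "1")
        )
    return controllers
```

On a live system the input would come from reading /proc/cgroups;
a hierarchy_id of 0 then indicates one of the three cases listed under field 2.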
505 .TP
506 .IR /proc/[pid]/cgroup " (since Linux 2.6.24)"
507 This file describes control groups to which the process
508 with the corresponding PID belongs.
509 The displayed information differs for
510 cgroups version 1 and version 2 hierarchies.
511
512 For each cgroup hierarchy of which the process is a member,
513 there is one entry containing three
514 colon-separated fields of the form:
515
    hierarchy-ID:subsystem-list:cgroup-path
517
518 For example:
519 .nf
520 .ft CW
521
    5:cpuacct,cpu,cpuset:/daemons
523 .ft
524 .fi
525 .IP
526 The colon-separated fields are, from left to right:
527 .RS
528 .IP 1. 3
529 For cgroups version 1 hierarchies,
530 this field contains a unique hierarchy ID number
531 that can be matched to a hierarchy ID in
532 .IR /proc/cgroups .
533 For the cgroups version 2 hierarchy, this field contains the value 0.
534 .IP 2.
535 For cgroups version 1 hierarchies,
536 this field contains a comma-separated list of the subsystems
537 bound to the hierarchy.
538 For the cgroups version 2 hierarchy, this field is empty.
539 .IP 3.
540 This field contains the pathname of the control group in the hierarchy
541 to which the process belongs.
542 This pathname is relative to the mount point of the hierarchy.
543 .RE
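
A single entry in this format can be split into its three fields as follows.
The function name parse_cgroup_line is hypothetical;
the sketch splits on the first two colons only,
so that a cgroup path containing a colon is kept intact.

```python
def parse_cgroup_line(line):
    """Split one /proc/[pid]/cgroup entry into its three fields.

    Returns (hierarchy_id, subsystems, path). For the cgroups v2
    hierarchy the ID is 0 and the subsystem list is empty.
    """
    hierarchy_id, subsystems, path = line.rstrip("\n").split(":", 2)
    names = subsystems.split(",") if subsystems else []
    return int(hierarchy_id), names, path
```

Applied to the example entry above, this yields the hierarchy ID 5,
the subsystems cpuacct, cpu, and cpuset, and the path /daemons.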
544 .SH ERRORS
545 The following errors can occur for
546 .BR mount (2):
547 .TP
548 .B EBUSY
549 An attempt to mount a cgroup version 1 filesystem specified neither the
550 .I name=
551 option (to mount a named hierarchy) nor a controller name (or
.IR all ).
553 .SH SEE ALSO
554 .BR prlimit (1),
555 .BR systemd (1),
556 .BR clone (2),
557 .BR ioprio_set (2),
558 .BR perf_event_open (2),
559 .BR setrlimit (2),
560 .BR cgroup_namespaces (7),
561 .BR cpuset (7),
562 .BR namespaces (7),
563 .BR sched (7),
564 .BR user_namespaces (7)