1 <?xml version='1.0'?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
3 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
4 <!-- SPDX-License-Identifier: LGPL-2.1-or-later -->
5
6 <refentry id="systemd.resource-control" xmlns:xi="http://www.w3.org/2001/XInclude">
7 <refentryinfo>
8 <title>systemd.resource-control</title>
9 <productname>systemd</productname>
10 </refentryinfo>
11
12 <refmeta>
13 <refentrytitle>systemd.resource-control</refentrytitle>
14 <manvolnum>5</manvolnum>
15 </refmeta>
16
17 <refnamediv>
18 <refname>systemd.resource-control</refname>
19 <refpurpose>Resource control unit settings</refpurpose>
20 </refnamediv>
21
22 <refsynopsisdiv>
23 <para>
24 <filename><replaceable>slice</replaceable>.slice</filename>,
25 <filename><replaceable>scope</replaceable>.scope</filename>,
26 <filename><replaceable>service</replaceable>.service</filename>,
27 <filename><replaceable>socket</replaceable>.socket</filename>,
28 <filename><replaceable>mount</replaceable>.mount</filename>,
29 <filename><replaceable>swap</replaceable>.swap</filename>
30 </para>
31 </refsynopsisdiv>
32
33 <refsect1>
34 <title>Description</title>
35
36 <para>Unit configuration files for services, slices, scopes, sockets, mount points, and swap devices share a subset
37 of configuration options for resource control of spawned processes. Internally, this relies on the Linux Control
38 Groups (cgroups) kernel concept for organizing processes in a hierarchical tree of named groups for the purpose of
39 resource management.</para>
40
41 <para>This man page lists the configuration options shared by
42 those six unit types. See
43 <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>
44 for the common options of all unit configuration files, and
45 <citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
46 <citerefentry><refentrytitle>systemd.scope</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
47 <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
48 <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
49 <citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
50 and
51 <citerefentry><refentrytitle>systemd.swap</refentrytitle><manvolnum>5</manvolnum></citerefentry>
52 for more information on the specific unit configuration files. The
53 resource control configuration options are configured in the
54 [Slice], [Scope], [Service], [Socket], [Mount], or [Swap]
55 sections, depending on the unit type.</para>
56
57 <para>In addition, options which control resources available to programs
58 <emphasis>executed</emphasis> by systemd are listed in
59 <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
60 Those options complement options listed here.</para>
61
62 <refsect2>
63 <title>Enabling and disabling controllers</title>
64
65 <para>Controllers in the cgroup hierarchy are hierarchical, and resource control is realized by
66 distributing resource assignments between siblings in branches of the cgroup hierarchy. There is no
67 need to explicitly <emphasis>enable</emphasis> a cgroup controller for a unit.
68 <command>systemd</command> will instruct the kernel to enable a controller for a given unit when this
69 unit has configuration for a given controller. For example, when <varname>CPUWeight=</varname> is set,
70 the <option>cpu</option> controller will be enabled, and when <varname>TasksMax=</varname> is set, the
71 <option>pids</option> controller will be enabled. In addition, various controllers may also be
72 enabled explicitly via the
73 <varname>MemoryAccounting=</varname>/<varname>TasksAccounting=</varname>/<varname>IOAccounting=</varname>
74 settings. Because of how the cgroup hierarchy works, controllers will be automatically enabled for all
75 parent units and for any sibling units starting with the lowest level at which a controller is enabled.
76 Units for which a controller is enabled may be subject to resource control even if they don't have any
77 explicit configuration.</para>
78
79 <para>Setting <varname>Delegate=</varname> enables any delegated controllers for that unit (see below).
80 The delegatee may then enable controllers for its children as appropriate. In particular, if the
81 delegatee is <command>systemd</command> (in the <filename>user@.service</filename> unit), it will
82 repeat the same logic as the system instance and enable controllers for user units which have resource
83 limits configured, and their siblings and parents and parents' siblings.</para>
84
85 <para>Controllers may be <emphasis>disabled</emphasis> for parts of the cgroup hierarchy with
86 <varname>DisableControllers=</varname> (see below).</para>
87
88 <example>
89 <title>Enabling and disabling controllers</title>
90
91 <programlisting>
92 -.slice
93 / \
94 /-----/ \--------------\
95 / \
96 system.slice user.slice
97 / \ / \
98 / \ / \
99 / \ user@42.service user@1000.service
100 / \ Delegate= Delegate=yes
101 a.service b.slice / \
102 CPUWeight=20 DisableControllers=cpu / \
103 / \ app.slice session.slice
104 / \ CPUWeight=100 CPUWeight=100
105 / \
106 b1.service b2.service
107 CPUWeight=1000
108 </programlisting>
109
110 <para>In this hierarchy, the <option>cpu</option> controller is enabled for all units shown except
111 <filename>b1.service</filename> and <filename>b2.service</filename>. Because there is no explicit
112 configuration for <filename>system.slice</filename> and <filename>user.slice</filename>, CPU
113 resources will be split equally between them. Similarly, resources are allocated equally between
114 children of <filename>user.slice</filename> and between the child slices beneath
115 <filename>user@1000.service</filename>. Assuming that there is no further configuration of resources
116 or delegation below slices <filename>app.slice</filename> or <filename>session.slice</filename>, the
117 <option>cpu</option> controller would not be enabled for units in those slices and CPU resources
118 would be further allocated using other mechanisms, e.g. based on nice levels. The manager for user
119 42 has delegation enabled without any controllers, i.e. it can manipulate its subtree of the cgroup
120 hierarchy, but without resource control.</para>
121
122 <para>In the slice <filename>system.slice</filename>, service <filename>a.service</filename> receives
123 1/6 of the CPU resources and slice <filename>b.slice</filename> receives 5/6 (a weight ratio of 20:100),
124 because slice <filename>b.slice</filename> gets the default value of 100 for <filename>cpu.weight</filename> when
125 <varname>CPUWeight=</varname> is not set.</para>
126
127 <para>The <varname>CPUWeight=</varname> setting in service <filename>b2.service</filename> is neutralized
128 by <varname>DisableControllers=</varname> in slice <filename>b.slice</filename>, so the
129 <option>cpu</option> controller would not be enabled for services <filename>b1.service</filename> and
130 <filename>b2.service</filename>, and CPU resources would be further allocated using other mechanisms,
131 e.g. based on nice levels.</para>
132 </example>
133 </refsect2>
134
135 <refsect2>
136 <title>Setting resource controls for a group of related units</title>
137
138 <para>As described in
139 <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>, the
140 settings listed here may be set through the main file of a unit and drop-in snippets in
141 <filename index="false">*.d/</filename> directories. The list of directories searched for drop-ins
142 includes names formed by repeatedly truncating the unit name after all dashes. This is particularly
143 convenient to set resource limits for a group of units with similar names.</para>
144
145 <para>For example, every user gets their own slice
146 <filename>user-<replaceable>nnn</replaceable>.slice</filename>. Drop-ins with local configuration that
147 affect user 1000 may be placed in
148 <filename index="false">/etc/systemd/system/user-1000.slice</filename>,
149 <filename index="false">/etc/systemd/system/user-1000.slice.d/*.conf</filename>, but also
150 <filename index="false">/etc/systemd/system/user-.slice.d/*.conf</filename>. This last directory
151 applies to all user slices.</para>
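<example>
<title>Limiting all user slices with one drop-in</title>

<para>As a minimal sketch of the truncation rule described above, a single drop-in placed in the
<filename index="false">user-.slice.d/</filename> directory would apply to every
<filename>user-<replaceable>nnn</replaceable>.slice</filename> unit. The file name
<filename index="false">50-limits.conf</filename> and the values shown are illustrative assumptions,
not recommendations.</para>

<programlisting>
# /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical file name)
[Slice]
# Illustrative values only
MemoryHigh=4G
TasksMax=2048
</programlisting>
</example>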
152 </refsect2>
153
154 <refsect2>
155 <title/>
156 <para>See the <ulink
157 url="https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface">New
158 Control Group Interfaces</ulink> for an introduction on how to make
159 use of resource control APIs from programs.</para>
160 </refsect2>
161 </refsect1>
162
163 <refsect1>
164 <title>Implicit Dependencies</title>
165
166 <para>The following dependencies are implicitly added:</para>
167
168 <itemizedlist>
169 <listitem><para>Units with the <varname>Slice=</varname> setting set automatically acquire
170 <varname>Requires=</varname> and <varname>After=</varname> dependencies on the specified
171 slice unit.</para></listitem>
172 </itemizedlist>
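<para>For illustration, assume a hypothetical service that is assigned to a hypothetical
<filename>batch.slice</filename>. The <varname>Slice=</varname> line alone is sufficient; the unit
then behaves as if <varname>Requires=batch.slice</varname> and <varname>After=batch.slice</varname>
had also been specified:</para>

<programlisting>
# example-batch.service (hypothetical unit and binary path)
[Service]
ExecStart=/usr/bin/example-batch-job
# Implies Requires=batch.slice and After=batch.slice
Slice=batch.slice
</programlisting>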
173 </refsect1>
174
175 <!-- We don't have any default dependency here. -->
176
177 <refsect1>
178 <title>Options</title>
179
180 <para>Units of the types listed above can have settings for resource control configuration:</para>
181
182 <refsect2><title>CPU Accounting and Control</title>
183
184 <variablelist class='unit-directives'>
185
186 <varlistentry>
187 <term><varname>CPUAccounting=</varname></term>
188
189 <listitem>
190 <para>Turn on CPU usage accounting for this unit. Takes a
191 boolean argument. Note that turning on CPU accounting for
192 one unit will also implicitly turn it on for all units
193 contained in the same slice and for all its parent slices
194 and the units contained therein. The system default for this
195 setting may be controlled with
196 <varname>DefaultCPUAccounting=</varname> in
197 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
198
199 <para>Under the unified cgroup hierarchy, CPU accounting is available for all units and this
200 setting has no effect.</para>
201
202 <xi:include href="version-info.xml" xpointer="v208"/>
203 </listitem>
204 </varlistentry>
205
206 <varlistentry>
207 <term><varname>CPUWeight=<replaceable>weight</replaceable></varname></term>
208 <term><varname>StartupCPUWeight=<replaceable>weight</replaceable></varname></term>
209
210 <listitem>
211 <para>These settings control the <option>cpu</option> controller in the unified hierarchy.</para>
212
213 <para>These options accept an integer value or the special string "idle":</para>
214 <itemizedlist>
215 <listitem>
216 <para>If set to an integer value, assign the specified CPU time weight to the processes
217 executed, if the unified control group hierarchy is used on the system. These options control
218 the <literal>cpu.weight</literal> control group attribute. The allowed range is 1 to 10000.
219 Defaults to unset, but the kernel default is 100. For details about this control group
220 attribute, see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups
221 v2</ulink> and <ulink url="https://docs.kernel.org/scheduler/sched-design-CFS.html">CFS
222 Scheduler</ulink>. The available CPU time is split up among all units within one slice
223 relative to their CPU time weight. A higher weight means more CPU time, a lower weight means
224 less.</para>
225 </listitem>
226 <listitem>
227 <para>If set to the special string "idle", mark the cgroup for "idle scheduling", which means
228 that it will get CPU resources only when no non-idle processes are waiting to run in this
229 cgroup or its siblings. This setting corresponds to the <literal>cpu.idle</literal> cgroup attribute.</para>
230
231 <para>Note that this value only has an effect on cgroup-v2; on cgroup-v1 it is equivalent to the minimum weight.</para>
232 </listitem>
233 </itemizedlist>
234
235 <para>While <varname>StartupCPUWeight=</varname> applies to the startup and shutdown phases of the system,
236 <varname>CPUWeight=</varname> applies to normal runtime of the system, and if the former is not set also to
237 the startup and shutdown phases. Using <varname>StartupCPUWeight=</varname> allows prioritizing specific services at
238 boot-up and shutdown differently than during normal runtime.</para>
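<para>As a sketch of how weights translate into shares, assume two hypothetical services in the same
slice that both try to use as much CPU time as they can. With the weights below, the first would be
expected to receive about 200/(200+100), i.e. two thirds, of the CPU time available to the slice
during normal runtime, and the second about one third. When only one of them is busy, it may use all
of the CPU time available to the slice.</para>

<programlisting>
# heavy.service (hypothetical unit)
[Service]
CPUWeight=200
# Optional: a lower weight during boot-up and shutdown only
StartupCPUWeight=50

# light.service (hypothetical unit)
[Service]
CPUWeight=100
</programlisting>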
239
240 <para>In addition to the resource allocation performed by the <option>cpu</option> controller, the
241 kernel may automatically divide resources based on session-id grouping, see "The autogroup feature"
242 in <citerefentry
243 project='man-pages'><refentrytitle>sched</refentrytitle><manvolnum>7</manvolnum></citerefentry>.
244 The effect of this feature is similar to the <option>cpu</option> controller with no explicit
245 configuration, so users should be careful to not mistake one for the other.</para>
246
247 <xi:include href="version-info.xml" xpointer="v232"/>
248 </listitem>
249 </varlistentry>
250
251 <varlistentry>
252 <term><varname>CPUQuota=</varname></term>
253
254 <listitem>
255 <para>This setting controls the <option>cpu</option> controller in the unified hierarchy.</para>
256
257 <para>Assign the specified CPU time quota to the processes executed. Takes a percentage value, suffixed with
258 "%". The percentage specifies how much CPU time the unit shall get at maximum, relative to the total CPU time
259 available on one CPU. Use values &gt; 100% for allotting CPU time on more than one CPU. This controls the
260 <literal>cpu.max</literal> attribute on the unified control group hierarchy and
261 <literal>cpu.cfs_quota_us</literal> on legacy. For details about these control group attributes, see <ulink
262 url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and <ulink
263 url="https://docs.kernel.org/scheduler/sched-bwc.html">CFS Bandwidth Control</ulink>.
264 Setting <varname>CPUQuota=</varname> to an empty value unsets the quota.</para>
265
266 <para>Example: <varname>CPUQuota=20%</varname> ensures that the executed processes will never get more than
267 20% CPU time on one CPU.</para>
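<para>A value above 100% follows the same pattern. In the sketch below (illustrative value), the
unit may use up to one and a half CPUs' worth of CPU time. Assuming the default period of 100ms,
this would be expected to correspond to <literal>150000 100000</literal> in
<literal>cpu.max</literal> (quota and period in microseconds).</para>

<programlisting>
[Service]
# At most 1.5 CPUs' worth of CPU time in total, on any number of CPUs
CPUQuota=150%
</programlisting>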
268
269 <xi:include href="version-info.xml" xpointer="v213"/>
270
271 </listitem>
272 </varlistentry>
273
274 <varlistentry>
275 <term><varname>CPUQuotaPeriodSec=</varname></term>
276
277 <listitem>
278 <para>This setting controls the <option>cpu</option> controller in the unified hierarchy.</para>
279
280 <para>Assign the duration over which the CPU time quota specified by <varname>CPUQuota=</varname> is measured.
281 Takes a time duration value in seconds, with an optional suffix such as "ms" for milliseconds (or "s" for seconds).
282 The default setting is 100ms. The period is clamped to the range supported by the kernel, which is [1ms, 1000ms].
283 Additionally, the period is adjusted up so that the quota interval is also at least 1ms.
284 Setting <varname>CPUQuotaPeriodSec=</varname> to an empty value resets it to the default.</para>
285
286 <para>This controls the second field of <literal>cpu.max</literal> attribute on the unified control group hierarchy
287 and <literal>cpu.cfs_period_us</literal> on legacy. For details about these control group attributes, see
288 <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and
289 <ulink url="https://docs.kernel.org/scheduler/sched-design-CFS.html">CFS Scheduler</ulink>.</para>
290
291 <para>Example: <varname>CPUQuotaPeriodSec=10ms</varname> to request that the CPU quota is measured in periods of 10ms.</para>
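<para>Combined with <varname>CPUQuota=</varname>, the period determines the absolute time budget per
interval. As a sketch with illustrative values: a quota of 50% over a period of 10ms allows 5ms of
CPU time in each 10ms window, which would be expected to appear as <literal>5000 10000</literal> in
<literal>cpu.max</literal>.</para>

<programlisting>
[Service]
CPUQuota=50%
CPUQuotaPeriodSec=10ms
</programlisting>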
292
293 <xi:include href="version-info.xml" xpointer="v242"/>
294 </listitem>
295 </varlistentry>
296
297 <varlistentry>
298 <term><varname>AllowedCPUs=</varname></term>
299 <term><varname>StartupAllowedCPUs=</varname></term>
300
301 <listitem>
302 <para>These settings control the <option>cpuset</option> controller in the unified hierarchy.</para>
303
304 <para>Restrict processes to be executed on specific CPUs. Takes a list of CPU indices or ranges separated by either
305 whitespace or commas. CPU ranges are specified by the lower and upper CPU indices separated by a dash.</para>
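<para>For example, either of the following spellings (illustrative values) would restrict the unit
to CPUs 0, 1, 2, 3 and 8:</para>

<programlisting>
# Whitespace-separated
AllowedCPUs=0-3 8
# Equivalent, comma-separated
AllowedCPUs=0-3,8
</programlisting>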
306
307 <para>Setting <varname>AllowedCPUs=</varname> or <varname>StartupAllowedCPUs=</varname> doesn't guarantee that all
308 of the CPUs will be used by the processes as it may be limited by parent units. The effective configuration is
309 reported as <varname>EffectiveCPUs=</varname>.</para>
310
311 <para>While <varname>StartupAllowedCPUs=</varname> applies to the startup and shutdown phases of the system,
312 <varname>AllowedCPUs=</varname> applies to normal runtime of the system, and if the former is not set also to
313 the startup and shutdown phases. Using <varname>StartupAllowedCPUs=</varname> allows prioritizing specific services at
314 boot-up and shutdown differently than during normal runtime.</para>
315
316 <para>This setting is supported only with the unified control group hierarchy.</para>
317
318 <xi:include href="version-info.xml" xpointer="v244"/>
319 </listitem>
320 </varlistentry>
321
322 </variablelist>
323
324 </refsect2><refsect2><title>Memory Accounting and Control</title>
325
326 <variablelist class='unit-directives'>
327
328 <varlistentry>
329 <term><varname>MemoryAccounting=</varname></term>
330
331 <listitem>
332 <para>This setting controls the <option>memory</option> controller in the unified hierarchy.</para>
333
334 <para>Turn on process and kernel memory accounting for this
335 unit. Takes a boolean argument. Note that turning on memory
336 accounting for one unit will also implicitly turn it on for
337 all units contained in the same slice and for all its parent
338 slices and the units contained therein. The system default
339 for this setting may be controlled with
340 <varname>DefaultMemoryAccounting=</varname> in
341 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
342
343 <xi:include href="version-info.xml" xpointer="v208"/>
344 </listitem>
345 </varlistentry>
346
347 <varlistentry>
348 <term><varname>MemoryMin=<replaceable>bytes</replaceable></varname>, <varname>MemoryLow=<replaceable>bytes</replaceable></varname></term>
349 <term><varname>StartupMemoryLow=<replaceable>bytes</replaceable></varname>, <varname>DefaultStartupMemoryLow=<replaceable>bytes</replaceable></varname></term>
350
351 <listitem>
352 <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>
353
354 <para>Specify the memory usage protection of the executed processes in this unit.
355 When reclaiming memory, the unit is treated as if it were using less memory, so that memory is
356 preferentially reclaimed from unprotected units.
357 Using <varname>MemoryLow=</varname> results in a weaker protection where memory may still
358 be reclaimed to avoid invoking the OOM killer in case there is no other reclaimable memory.</para>
359 <para>
360 For a protection to be effective, it is generally required to set a corresponding
361 allocation on all ancestors, which is then distributed between children
362 (with the exception of the root slice).
363 Any <varname>MemoryMin=</varname> or <varname>MemoryLow=</varname> allocation that is not
364 explicitly distributed to specific children is used to create a shared protection for all children.
365 As this is a shared protection, the children will freely compete for the memory.</para>
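<para>As a sketch of distributing protection down the hierarchy, assume a hypothetical slice
<filename>apps.slice</filename> that contains a hypothetical <filename>db.service</filename>. The
slice reserves 2G, of which 1G is explicitly passed on to the service; following the rule above, the
remaining 1G would act as shared protection for all children of the slice.</para>

<programlisting>
# apps.slice (hypothetical unit)
[Slice]
MemoryLow=2G

# db.service (hypothetical unit)
[Service]
Slice=apps.slice
MemoryLow=1G
</programlisting>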
366
367 <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
368 parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
369 percentage value may be specified, which is taken relative to the installed physical memory on the
370 system. If assigned the special value <literal>infinity</literal>, all available memory is protected, which may be
371 useful in order to always inherit all of the protection afforded by ancestors.
372 This controls the <literal>memory.min</literal> or <literal>memory.low</literal> control group attribute.
373 For details about this control group attribute, see <ulink
374 url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>
375
376 <para>Units may have their children use a default <literal>memory.min</literal> or
377 <literal>memory.low</literal> value by specifying <varname>DefaultMemoryMin=</varname> or
378 <varname>DefaultMemoryLow=</varname>, which has the same semantics as
379 <varname>MemoryMin=</varname> and <varname>MemoryLow=</varname>, or <varname>DefaultStartupMemoryLow=</varname>
380 which has the same semantics as <varname>StartupMemoryLow=</varname>.
381 This setting does not affect <literal>memory.min</literal> or <literal>memory.low</literal>
382 in the unit itself.
383 Using it to set a default child allocation is only useful on kernels older than 5.7,
384 which do not support the <literal>memory_recursiveprot</literal> cgroup2 mount option.</para>
385
386 <para>While <varname>StartupMemoryLow=</varname> applies to the startup and shutdown phases of the system,
387 <varname>MemoryLow=</varname> applies to normal runtime of the system, and if the former is not set also to
388 the startup and shutdown phases. Using <varname>StartupMemoryLow=</varname> allows prioritizing specific services at
389 boot-up and shutdown differently than during normal runtime.</para>
390
391 <xi:include href="version-info.xml" xpointer="v240"/>
392 </listitem>
393 </varlistentry>
394
395 <varlistentry>
396 <term><varname>MemoryHigh=<replaceable>bytes</replaceable></varname></term>
397 <term><varname>StartupMemoryHigh=<replaceable>bytes</replaceable></varname></term>
398
399 <listitem>
400 <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>
401
402 <para>Specify the throttling limit on memory usage of the executed processes in this unit. Memory usage may go
403 above the limit if unavoidable, but the processes are heavily slowed down and memory is taken away
404 aggressively in such cases. This is the main mechanism to control memory usage of a unit.</para>
405
406 <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
407 parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
408 percentage value may be specified, which is taken relative to the installed physical memory on the
409 system. If assigned the
410 special value <literal>infinity</literal>, no memory throttling is applied. This controls the
411 <literal>memory.high</literal> control group attribute. For details about this control group attribute, see
412 <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.
413 The effective configuration is reported as <varname>EffectiveMemoryHigh=</varname>
414 (see also <varname>EffectiveMemoryMax=</varname>).</para>
415
416 <para>While <varname>StartupMemoryHigh=</varname> applies to the startup and shutdown phases of the system,
417 <varname>MemoryHigh=</varname> applies to normal runtime of the system, and if the former is not set also to
418 the startup and shutdown phases. Using <varname>StartupMemoryHigh=</varname> allows prioritizing specific services at
419 boot-up and shutdown differently than during normal runtime.</para>
420
421 <xi:include href="version-info.xml" xpointer="v231"/>
422 </listitem>
423 </varlistentry>
424
425 <varlistentry>
426 <term><varname>MemoryMax=<replaceable>bytes</replaceable></varname></term>
427 <term><varname>StartupMemoryMax=<replaceable>bytes</replaceable></varname></term>
428
429 <listitem>
430 <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>
431
432 <para>Specify the absolute limit on memory usage of the executed processes in this unit. If memory usage
433 cannot be contained under the limit, the out-of-memory killer is invoked inside the unit. It is recommended to
434 use <varname>MemoryHigh=</varname> as the main control mechanism and use <varname>MemoryMax=</varname> as the
435 last line of defense.</para>
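<para>A minimal sketch of that recommendation, with illustrative values: throttling begins at 1G,
and the hard limit sits somewhat above it, so that the out-of-memory killer acts only if throttling
cannot contain the usage.</para>

<programlisting>
[Service]
# 1G = 1024^3 = 1073741824 bytes: throttle and reclaim aggressively above this
MemoryHigh=1G
# 1536M = 1536 * 1024^2 = 1610612736 bytes: absolute limit
MemoryMax=1536M
</programlisting>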
436
437 <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
438 parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
439 percentage value may be specified, which is taken relative to the installed physical memory on the system. If
440 assigned the special value <literal>infinity</literal>, no memory limit is applied. This controls the
441 <literal>memory.max</literal> control group attribute. For details about this control group attribute, see
442 <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.
443 The effective configuration is reported as <varname>EffectiveMemoryMax=</varname> (the value is
444 the most stringent limit of the unit and parent slices and it is capped by physical memory).</para>
445
446 <para>While <varname>StartupMemoryMax=</varname> applies to the startup and shutdown phases of the system,
447 <varname>MemoryMax=</varname> applies to normal runtime of the system, and if the former is not set also to
448 the startup and shutdown phases. Using <varname>StartupMemoryMax=</varname> allows prioritizing specific services at
449 boot-up and shutdown differently than during normal runtime.</para>
450
451 <xi:include href="version-info.xml" xpointer="v231"/>
452 </listitem>
453 </varlistentry>
454
455 <varlistentry>
456 <term><varname>MemorySwapMax=<replaceable>bytes</replaceable></varname></term>
457 <term><varname>StartupMemorySwapMax=<replaceable>bytes</replaceable></varname></term>
458
459 <listitem>
460 <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>
461
462 <para>Specify the absolute limit on swap usage of the executed processes in this unit.</para>
463
464 <para>Takes a swap size in bytes. If the value is suffixed with K, M, G or T, the specified swap size is
465 parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. If assigned the
466 special value <literal>infinity</literal>, no swap limit is applied. These settings control the
467 <literal>memory.swap.max</literal> control group attribute. For details about this control group attribute,
468 see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>
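<para>For example, a limit of zero would be expected to keep the processes of the unit from using
swap space at all, without affecting other units. This is only a sketch; whether it is desirable
depends on the workload.</para>

<programlisting>
[Service]
MemorySwapMax=0
</programlisting>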
469
470 <para>While <varname>StartupMemorySwapMax=</varname> applies to the startup and shutdown phases of the system,
471 <varname>MemorySwapMax=</varname> applies to normal runtime of the system, and if the former is not set also to
472 the startup and shutdown phases. Using <varname>StartupMemorySwapMax=</varname> allows prioritizing specific services at
473 boot-up and shutdown differently than during normal runtime.</para>
474
475 <xi:include href="version-info.xml" xpointer="v232"/>
476 </listitem>
477 </varlistentry>
478
479 <varlistentry>
480 <term><varname>MemoryZSwapMax=<replaceable>bytes</replaceable></varname></term>
481 <term><varname>StartupMemoryZSwapMax=<replaceable>bytes</replaceable></varname></term>
482
483 <listitem>
484 <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>
485
486 <para>Specify the absolute limit on zswap usage of the processes in this unit. Zswap is a lightweight compressed
487 cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a
488 dynamically allocated RAM-based memory pool. If the limit specified is hit, no entries from this unit will be
489 stored in the pool until existing entries are faulted back or written out to disk. See the kernel's
490 <ulink url="https://docs.kernel.org/admin-guide/mm/zswap.html">Zswap</ulink> documentation for more details.</para>
491
492 <para>Takes a size in bytes. If the value is suffixed with K, M, G or T, the specified size is
493 parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. If assigned the
494 special value <literal>infinity</literal>, no limit is applied. These settings control the
495 <literal>memory.zswap.max</literal> control group attribute. For details about this control group attribute,
496 see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>
497
498 <para>While <varname>StartupMemoryZSwapMax=</varname> applies to the startup and shutdown phases of the system,
499 <varname>MemoryZSwapMax=</varname> applies to normal runtime of the system, and if the former is not set also to
500 the startup and shutdown phases. Using <varname>StartupMemoryZSwapMax=</varname> allows prioritizing specific services at
501 boot-up and shutdown differently than during normal runtime.</para>
502
503 <xi:include href="version-info.xml" xpointer="v253"/>
504 </listitem>
505 </varlistentry>
506
507 <varlistentry>
508 <term><varname>MemoryZSwapWriteback=</varname></term>
509
510 <listitem>
511 <para>This setting controls the <option>memory</option> controller in the unified hierarchy.</para>
512
513 <para>Takes a boolean argument. When true, pages stored in the Zswap cache are permitted to be
514 written to the backing storage, false otherwise. Defaults to true. This allows disabling
515 writeback of swap pages for IO-intensive applications, while retaining the ability to store
516 compressed pages in Zswap. See the kernel's
517 <ulink url="https://docs.kernel.org/admin-guide/mm/zswap.html">Zswap</ulink> documentation
518 for more details.</para>
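<para>As a sketch with illustrative values, an IO-sensitive service could keep compressed pages in
Zswap without ever having them written back to the backing storage, while also capping its use of
the Zswap pool with <varname>MemoryZSwapMax=</varname>:</para>

<programlisting>
[Service]
MemoryZSwapWriteback=no
MemoryZSwapMax=256M
</programlisting>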
519
520 <xi:include href="version-info.xml" xpointer="v256"/>
521 </listitem>
522 </varlistentry>
523
524 <varlistentry>
525 <term><varname>AllowedMemoryNodes=</varname></term>
526 <term><varname>StartupAllowedMemoryNodes=</varname></term>
527
528 <listitem>
529 <para>These settings control the <option>cpuset</option> controller in the unified hierarchy.</para>
530
531 <para>Restrict processes to be executed on specific memory NUMA nodes. Takes a list of memory NUMA node indices
532 or ranges separated by either whitespace or commas. Memory NUMA node ranges are specified by the lower and upper
533 NUMA node indices separated by a dash.</para>
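<para>For example, the following setting (illustrative values) refers to NUMA nodes 0, 1 and 3:</para>

<programlisting>
[Service]
AllowedMemoryNodes=0-1 3
</programlisting>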
534
535 <para>Setting <varname>AllowedMemoryNodes=</varname> or <varname>StartupAllowedMemoryNodes=</varname> doesn't
536 guarantee that all of the memory NUMA nodes will be used by the processes as it may be limited by parent units.
537 The effective configuration is reported as <varname>EffectiveMemoryNodes=</varname>.</para>
538
539 <para>While <varname>StartupAllowedMemoryNodes=</varname> applies to the startup and shutdown phases of the system,
540 <varname>AllowedMemoryNodes=</varname> applies to normal runtime of the system, and if the former is not set also to
541 the startup and shutdown phases. Using <varname>StartupAllowedMemoryNodes=</varname> allows prioritizing specific services at
542 boot-up and shutdown differently than during normal runtime.</para>
543
544 <para>This setting is supported only with the unified control group hierarchy.</para>
545
546 <xi:include href="version-info.xml" xpointer="v244"/>
547 </listitem>
548 </varlistentry>
549
550 </variablelist>
551
552 </refsect2><refsect2><title>Process Accounting and Control</title>
553
554 <variablelist class='unit-directives'>
555
556 <varlistentry>
557 <term><varname>TasksAccounting=</varname></term>
558
559 <listitem>
560 <para>This setting controls the <option>pids</option> controller in the unified hierarchy.</para>
561
562 <para>Turn on task accounting for this unit. Takes a boolean argument. If enabled, the kernel will
563 keep track of the total number of tasks in the unit and its children. This number includes both
564 kernel threads and userspace processes, with each thread counted individually. Note that turning on
565 tasks accounting for one unit will also implicitly turn it on for all units contained in the same
566 slice and for all its parent slices and the units contained therein. The system default for this
567 setting may be controlled with <varname>DefaultTasksAccounting=</varname> in
568 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
569
570 <xi:include href="version-info.xml" xpointer="v227"/>
571 </listitem>
572 </varlistentry>
573
574 <varlistentry>
575 <term><varname>TasksMax=<replaceable>N</replaceable></varname></term>
576
577 <listitem>
578 <para>This setting controls the <option>pids</option> controller in the unified hierarchy.</para>
579
580 <para>Specify the maximum number of tasks that may be created in the unit. This ensures that the
581 number of tasks accounted for the unit (see above) stays below a specific limit. This either takes
582 an absolute number of tasks or a percentage value that is taken relative to the configured maximum
583 number of tasks on the system. If assigned the special value <literal>infinity</literal>, no tasks
584 limit is applied. This controls the <literal>pids.max</literal> control group attribute. For
585 details about this control group attribute, see the
586 <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#pid">pids controller</ulink>
587 documentation.
588 The effective configuration is reported as <varname>EffectiveTasksMax=</varname>.</para>
589
590 <para>The system default for this setting may be controlled with
591 <varname>DefaultTasksMax=</varname> in
592 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
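<para>Example (a sketch with an illustrative value, not a recommended default):
<programlisting>[Service]
# Absolute limit on the number of tasks that may exist in the unit
TasksMax=512</programlisting>
A percentage such as <literal>TasksMax=10%</literal> may be used instead, in which case the limit is taken
relative to the configured maximum number of tasks on the system.</para>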
593
594 <xi:include href="version-info.xml" xpointer="v227"/>
595 </listitem>
596 </varlistentry>
597
598 </variablelist>
599
600 </refsect2><refsect2><title>IO Accounting and Control</title>
601
602 <variablelist class='unit-directives'>
603
604 <varlistentry>
605 <term><varname>IOAccounting=</varname></term>
606
607 <listitem>
608 <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para>
609
610 <para>Turn on Block I/O accounting for this unit, if the unified control group hierarchy is used on the
611 system. Takes a boolean argument. Note that turning on block I/O accounting for one unit will also implicitly
612 turn it on for all units contained in the same slice and for all its parent slices and the units contained
613 therein. The system default for this setting may be controlled with <varname>DefaultIOAccounting=</varname>
614 in
615 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
616
617 <xi:include href="version-info.xml" xpointer="v230"/>
618 </listitem>
619 </varlistentry>
620
621 <varlistentry>
622 <term><varname>IOWeight=<replaceable>weight</replaceable></varname></term>
623 <term><varname>StartupIOWeight=<replaceable>weight</replaceable></varname></term>
624
625 <listitem>
626 <para>These settings control the <option>io</option> controller in the unified hierarchy.</para>
627
628 <para>Set the default overall block I/O weight for the executed processes, if the unified control
629 group hierarchy is used on the system. Takes a single weight value (between 1 and 10000) to set the
630 default block I/O weight. This controls the <literal>io.weight</literal> control group attribute,
631 which defaults to 100. For details about this control group attribute, see <ulink
632 url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO
633 Interface Files</ulink>. The available I/O bandwidth is split up among all units within one slice
634 relative to their block I/O weight. A higher weight means more I/O bandwidth, a lower weight means
635 less.</para>
636
637 <para>While <varname>StartupIOWeight=</varname> applies
638 to the startup and shutdown phases of the system,
639 <varname>IOWeight=</varname> applies to the later runtime of
640 the system, and if the former is not set also to the startup
641 and shutdown phases. This allows prioritizing specific services at boot-up
642 and shutdown differently than during runtime.</para>
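<para>Example (a minimal sketch; the weights are illustrative and only meaningful relative to the weights
of other units in the same slice):
<programlisting>[Service]
# Larger share of I/O bandwidth than siblings left at the default of 100
IOWeight=200
# Even larger share while the system is booting up or shutting down
StartupIOWeight=500</programlisting>
</para>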
643
644 <xi:include href="version-info.xml" xpointer="v230"/>
645 </listitem>
646 </varlistentry>
647
648 <varlistentry>
649 <term><varname>IODeviceWeight=<replaceable>device</replaceable> <replaceable>weight</replaceable></varname></term>
650
651 <listitem>
652 <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para>
653
654 <para>Set the per-device overall block I/O weight for the executed processes, if the unified control group
655 hierarchy is used on the system. Takes a space-separated pair of a file path and a weight value to specify
656 the device specific weight value, between 1 and 10000. (Example: <literal>/dev/sda 1000</literal>). The file
657 path may be specified as path to a block device node or as any other file, in which case the backing block
658 device of the file system of the file is determined. This controls the <literal>io.weight</literal> control
659 group attribute, which defaults to 100. Use this option multiple times to set weights for multiple devices.
660 For details about this control group attribute, see <ulink
661 url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.</para>
662
663 <para>The specified device node should reference a block device that has an I/O scheduler
664 associated, i.e. should not refer to partition or loopback block devices, but to the originating,
665 physical device. When a path to a regular file or directory is specified, an attempt is made to
666 discover the correct originating device backing the file system of the specified path. This works
667 correctly only for simpler cases, where the file system is directly placed on a partition or
668 physical block device, or where simple 1:1 encryption using dm-crypt/LUKS is used. This discovery
669 does not cover complex storage and in particular RAID and volume management storage devices.</para>
670
671 <xi:include href="version-info.xml" xpointer="v230"/>
672 </listitem>
673 </varlistentry>
674
675 <varlistentry>
676 <term><varname>IOReadBandwidthMax=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname></term>
677 <term><varname>IOWriteBandwidthMax=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname></term>
678
679 <listitem>
680 <para>These settings control the <option>io</option> controller in the unified hierarchy.</para>
681
682 <para>Set the per-device overall block I/O bandwidth maximum limit for the executed processes, if the unified
683 control group hierarchy is used on the system. This limit is not work-conserving and the executed processes
684 are not allowed to use more even if the device has idle capacity. Takes a space-separated pair of a file
685 path and a bandwidth value (in bytes per second) to specify the device specific bandwidth. The file path may
686 be a path to a block device node, or any other file, in which case the backing block device of the file
687 system of the file is used. If the bandwidth is suffixed with K, M, G, or T, the specified bandwidth is
688 parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the base of 1000. (Example:
689 "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This controls the <literal>io.max</literal> control
690 group attributes. Use this option multiple times to set bandwidth limits for multiple devices. For details
691 about this control group attribute, see <ulink
692 url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.
693 </para>
694
695 <para>Similar restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para>
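<para>Example (a sketch; <filename>/dev/sda</filename> and the limits are assumptions chosen for
illustration):
<programlisting>[Service]
# Cap reads at 5 megabytes per second and writes at 1 megabyte per second (base 1000)
IOReadBandwidthMax=/dev/sda 5M
IOWriteBandwidthMax=/dev/sda 1M</programlisting>
</para>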
696
697 <xi:include href="version-info.xml" xpointer="v230"/>
698 </listitem>
699 </varlistentry>
700
701 <varlistentry>
702 <term><varname>IOReadIOPSMax=<replaceable>device</replaceable> <replaceable>IOPS</replaceable></varname></term>
703 <term><varname>IOWriteIOPSMax=<replaceable>device</replaceable> <replaceable>IOPS</replaceable></varname></term>
704
705 <listitem>
706 <para>These settings control the <option>io</option> controller in the unified hierarchy.</para>
707
708 <para>Set the per-device overall block I/O IOs-Per-Second maximum limit for the executed processes, if the
709 unified control group hierarchy is used on the system. This limit is not work-conserving and the executed
710 processes are not allowed to use more even if the device has idle capacity. Takes a space-separated pair of
711 a file path and an IOPS value to specify the device specific IOPS. The file path may be a path to a block
712 device node, or any other file, in which case the backing block device of the file system of the file is
713 used. If the IOPS is suffixed with K, M, G, or T, the specified IOPS is parsed as KiloIOPS, MegaIOPS,
714 GigaIOPS, or TeraIOPS, respectively, to the base of 1000. (Example:
715 "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This controls the <literal>io.max</literal> control
716 group attributes. Use this option multiple times to set IOPS limits for multiple devices. For details about
717 this control group attribute, see <ulink
718 url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.
719 </para>
720
721 <para>Similar restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para>
722
723 <xi:include href="version-info.xml" xpointer="v230"/>
724 </listitem>
725 </varlistentry>
726
727 <varlistentry>
728 <term><varname>IODeviceLatencyTargetSec=<replaceable>device</replaceable> <replaceable>target</replaceable></varname></term>
729
730 <listitem>
731 <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para>
732
733 <para>Set the per-device average target I/O latency for the executed processes, if the unified control group
734 hierarchy is used on the system. Takes a file path and a timespan separated by a space to specify
735 the device specific latency target. (Example: "/dev/sda 25ms"). The file path may be specified
736 as path to a block device node or as any other file, in which case the backing block device of the file
737 system of the file is determined. This controls the <literal>io.latency</literal> control group
738 attribute. Use this option multiple times to set latency targets for multiple devices. For details about this
739 control group attribute, see <ulink
740 url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.</para>
741
742 <para>Implies <literal>IOAccounting=yes</literal>.</para>
743
744 <para>This setting is supported only if the unified control group hierarchy is used.</para>
745
746 <para>Similar restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para>
747
748 <xi:include href="version-info.xml" xpointer="v240"/>
749 </listitem>
750 </varlistentry>
751
752 </variablelist>
753
754 </refsect2><refsect2><title>Network Accounting and Control</title>
755
756 <variablelist class='unit-directives'>
757
758 <varlistentry>
759 <term><varname>IPAccounting=</varname></term>
760
761 <listitem>
762 <para>Takes a boolean argument. If true, turns on IPv4 and IPv6 network traffic accounting for packets sent
763 or received by the unit. When this option is turned on, all IPv4 and IPv6 sockets created by any process of
764 the unit are accounted for.</para>
765
766 <para>When this option is used in socket units, it applies to all IPv4 and IPv6 sockets
767 associated with it (including both listening and connection sockets where this applies). Note that for
768 socket-activated services, this configuration setting and the accounting data of the service unit and the
769 socket unit are kept separate, and displayed separately. No propagation of the setting and the collected
770 statistics is done, in either direction. Moreover, any traffic sent or received on any of the socket unit's
771 sockets is accounted to the socket unit — and never to the service unit it might have activated, even if the
772 socket is used by it.</para>
773
774 <para>The system default for this setting may be controlled with <varname>DefaultIPAccounting=</varname> in
775 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
776
777 <para>Note that this functionality is currently only available for system services, not for
778 per-user services.</para>
779
780 <xi:include href="version-info.xml" xpointer="v235"/>
781 </listitem>
782 </varlistentry>
783
784 <varlistentry>
785 <term><varname>IPAddressAllow=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term>
786 <term><varname>IPAddressDeny=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term>
787
788 <listitem>
789 <para>Turn on network traffic filtering for IP packets sent and received over
790 <constant>AF_INET</constant> and <constant>AF_INET6</constant> sockets. Both directives take a
791 space separated list of IPv4 or IPv6 addresses, each optionally suffixed with an address prefix
792 length in bits after a <literal>/</literal> character. If the suffix is omitted, the address is
793 considered a host address, i.e. the filter covers the whole address (32 bits for IPv4, 128 bits for
794 IPv6).</para>
795
796 <para>The access lists configured with this option are applied to all sockets created by processes
797 of this unit (or in the case of socket units, associated with it). The lists are implicitly
798 combined with any lists configured for any of the parent slice units this unit might be a member
799 of. By default both access lists are empty. Both ingress and egress traffic is filtered by these
800 settings. In case of ingress traffic the source IP address is checked against these access lists,
801 in case of egress traffic the destination IP address is checked. The following rules are applied in
802 turn:</para>
803
804 <itemizedlist>
805 <listitem><para>Access is granted when the checked IP address matches an entry in the
806 <varname>IPAddressAllow=</varname> list.</para></listitem>
807
808 <listitem><para>Otherwise, access is denied when the checked IP address matches an entry in the
809 <varname>IPAddressDeny=</varname> list.</para></listitem>
810
811 <listitem><para>Otherwise, access is granted.</para></listitem>
812 </itemizedlist>
813
814 <para>In order to implement an allow-listing IP firewall, it is recommended to use a
815 <varname>IPAddressDeny=</varname><constant>any</constant> setting on an upper-level slice unit
816 (such as the root slice <filename>-.slice</filename> or the slice containing all system services
817 <filename>system.slice</filename> – see
818 <citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry>
819 for details on these slice units), plus individual per-service <varname>IPAddressAllow=</varname>
820 lines permitting network access to relevant services, and only them.</para>
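<para>The allow-listing setup described above might be sketched as follows. The drop-in file names, the
service name, and the <literal>192.0.2.0/24</literal> network are hypothetical and only serve as
illustration:
<programlisting># /etc/systemd/system/system.slice.d/10-ip-deny.conf (hypothetical drop-in)
[Slice]
IPAddressDeny=any

# /etc/systemd/system/example.service.d/10-ip-allow.conf (hypothetical drop-in)
[Service]
IPAddressAllow=localhost 192.0.2.0/24</programlisting>
Because the allow list is checked before the deny list, the service can still reach the listed addresses
while all other traffic of units in the slice is denied.</para>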
821
822 <para>Note that for socket-activated services, the IP access list configured on the socket unit
823 applies to all sockets associated with it directly, but not to any sockets created by the
824 ultimately activated services for it. Conversely, the IP access list configured for the service is
825 not applied to any sockets passed into the service via socket activation. Thus, it is usually a
826 good idea to replicate the IP access lists on both the socket and the service unit. Nevertheless,
827 it may make sense to maintain one list more open and the other one more restricted, depending on
828 the use case.</para>
829
830 <para>If these settings are used multiple times in the same unit the specified lists are combined. If an
831 empty string is assigned to these settings the specific access list is reset and all previous settings undone.</para>
832
833 <para>In place of explicit IPv4 or IPv6 address and prefix length specifications a small set of symbolic
834 names may be used. The following names are defined:</para>
835
836 <table>
837 <title>Special address/network names</title>
838
839 <tgroup cols='3'>
840 <colspec colname='name'/>
841 <colspec colname='definition'/>
842 <colspec colname='meaning'/>
843
844 <thead>
845 <row>
846 <entry>Symbolic Name</entry>
847 <entry>Definition</entry>
848 <entry>Meaning</entry>
849 </row>
850 </thead>
851
852 <tbody>
853 <row>
854 <entry><constant>any</constant></entry>
855 <entry>0.0.0.0/0 ::/0</entry>
856 <entry>Any host</entry>
857 </row>
858
859 <row>
860 <entry><constant>localhost</constant></entry>
861 <entry>127.0.0.0/8 ::1/128</entry>
862 <entry>All addresses on the local loopback</entry>
863 </row>
864
865 <row>
866 <entry><constant>link-local</constant></entry>
867 <entry>169.254.0.0/16 fe80::/64</entry>
868 <entry>All link-local IP addresses</entry>
869 </row>
870
871 <row>
872 <entry><constant>multicast</constant></entry>
873 <entry>224.0.0.0/4 ff00::/8</entry>
874 <entry>All IP multicasting addresses</entry>
875 </row>
876 </tbody>
877 </tgroup>
878 </table>
879
880 <para>Note that these settings might not be supported on some systems (for example if eBPF control group
881 support is not enabled in the underlying kernel or container manager). These settings will have no effect in
882 that case. If compatibility with such systems is desired it is hence recommended to not exclusively rely on
883 them for IP security.</para>
884
885 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
886
887 <xi:include href="version-info.xml" xpointer="v235"/>
888 </listitem>
889 </varlistentry>
890
891 <varlistentry>
892 <term><varname>SocketBindAllow=<replaceable>bind-rule</replaceable></varname></term>
893 <term><varname>SocketBindDeny=<replaceable>bind-rule</replaceable></varname></term>
894
895 <listitem>
896 <para>Configures restrictions on the ability of unit processes to invoke <citerefentry
897 project='man-pages'><refentrytitle>bind</refentrytitle><manvolnum>2</manvolnum></citerefentry> on a
898 socket. Both allow and deny rules may be defined that restrict which addresses a socket may be bound
899 to.</para>
900
901 <para><replaceable>bind-rule</replaceable> describes socket properties such as <replaceable>address-family</replaceable>,
902 <replaceable>transport-protocol</replaceable> and <replaceable>ip-ports</replaceable>.</para>
903
904 <para><replaceable>bind-rule</replaceable> :=
905 { [<replaceable>address-family</replaceable><constant>:</constant>][<replaceable>transport-protocol</replaceable><constant>:</constant>][<replaceable>ip-ports</replaceable>] | <constant>any</constant> }</para>
906
907 <para><replaceable>address-family</replaceable> := { <constant>ipv4</constant> | <constant>ipv6</constant> }</para>
908
909 <para><replaceable>transport-protocol</replaceable> := { <constant>tcp</constant> | <constant>udp</constant> }</para>
910
911 <para><replaceable>ip-ports</replaceable> := { <replaceable>ip-port</replaceable> | <replaceable>ip-port-range</replaceable> }</para>
912
913 <para>An optional <replaceable>address-family</replaceable> expects <constant>ipv4</constant> or <constant>ipv6</constant> values.
914 If not specified, a rule will be matched for both IPv4 and IPv6 addresses and applied depending on other socket fields, e.g. <replaceable>transport-protocol</replaceable>,
915 <replaceable>ip-port</replaceable>.</para>
916
917 <para>An optional <replaceable>transport-protocol</replaceable> expects <constant>tcp</constant> or <constant>udp</constant> transport protocol names.
918 If not specified, a rule will be matched for any transport protocol.</para>
919
920 <para>An optional <replaceable>ip-port</replaceable> value must lie within the 1…65535 interval inclusively, i.e.
921 dynamic port <constant>0</constant> is not allowed. A range of sequential ports is described by
922 <replaceable>ip-port-range</replaceable> := <replaceable>ip-port-low</replaceable><constant>-</constant><replaceable>ip-port-high</replaceable>,
923 where <replaceable>ip-port-low</replaceable> is smaller than or equal to <replaceable>ip-port-high</replaceable>
924 and both are within 1…65535 inclusively.</para>
925
926 <para>A special value <constant>any</constant> can be used to apply a rule to any address family, transport protocol and any port with a positive value.</para>
927
928 <para>To allow multiple rules assign <varname>SocketBindAllow=</varname> or <varname>SocketBindDeny=</varname> multiple times.
929 To clear the existing assignments pass an empty <varname>SocketBindAllow=</varname> or <varname>SocketBindDeny=</varname>
930 assignment.</para>
931
932 <para>For each of <varname>SocketBindAllow=</varname> and <varname>SocketBindDeny=</varname>, the maximum allowed number of assignments is
933 <constant>128</constant>.</para>
934
935 <itemizedlist>
936 <listitem><para>Binding to a socket is allowed when a socket address matches an entry in the
937 <varname>SocketBindAllow=</varname> list.</para></listitem>
938
939 <listitem><para>Otherwise, binding is denied when the socket address matches an entry in the
940 <varname>SocketBindDeny=</varname> list.</para></listitem>
941
942 <listitem><para>Otherwise, binding is allowed.</para></listitem>
943 </itemizedlist>
944
945 <para>The feature is implemented with <constant>cgroup/bind4</constant> and <constant>cgroup/bind6</constant> cgroup-bpf hooks.</para>
946
947 <para>Note that these settings apply to any <citerefentry
948 project='man-pages'><refentrytitle>bind</refentrytitle><manvolnum>2</manvolnum></citerefentry>
949 system call invocation by the unit processes, regardless of which network namespace they are
950 placed in. Or in other words: changing the network namespace is not a suitable mechanism for escaping
951 these restrictions on <function>bind()</function>.</para>
952
953 <para>Examples:<programlisting>
954 # Allow binding IPv6 socket addresses with a port greater than or equal to 10000.
955 [Service]
956 SocketBindAllow=ipv6:10000-65535
957 SocketBindDeny=any
958
959 # Allow binding IPv4 and IPv6 socket addresses with 1234 and 4321 ports.
960 [Service]
961 SocketBindAllow=1234
962 SocketBindAllow=4321
963 SocketBindDeny=any
964
965 # Deny binding IPv6 socket addresses.
966 [Service]
967 SocketBindDeny=ipv6
968
969 # Deny binding IPv4 and IPv6 socket addresses.
970 [Service]
971 SocketBindDeny=any
972
973 # Allow binding only over TCP
974 [Service]
975 SocketBindAllow=tcp
976 SocketBindDeny=any
977
978 # Allow binding only over IPv6/TCP
979 [Service]
980 SocketBindAllow=ipv6:tcp
981 SocketBindDeny=any
982
983 # Allow binding ports within 10000-65535 range over IPv4/UDP.
984 [Service]
985 SocketBindAllow=ipv4:udp:10000-65535
986 SocketBindDeny=any
987</programlisting></para>
988
989 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
990
991 <xi:include href="version-info.xml" xpointer="v249"/>
992 </listitem>
993 </varlistentry>
994
995 <varlistentry>
996 <term><varname>RestrictNetworkInterfaces=</varname></term>
997
998 <listitem>
999 <para>Takes a list of space-separated network interface names. This option restricts the network
1000 interfaces that processes of this unit can use. By default processes can only use the network interfaces
1001 listed (allow-list). If the first character of the rule is <literal>~</literal>, the effect is inverted:
1002 the processes can only use network interfaces not listed (deny-list).
1003 </para>
1004
1005 <para>This option can appear multiple times, in which case the network interface names are merged. If the
1006 empty string is assigned the set is reset, all prior assignments will have no effect.
1007 </para>
1008
1009 <para>If you specify both types of this option (i.e. allow-listing and deny-listing), the first encountered
1010 will take precedence and will dictate the default action (allow vs deny). Then the next occurrences of this
1011 option will add or delete the listed network interface names from the set, depending on its type and the
1012 default action.
1013 </para>
1014
1015 <para>The loopback interface ("lo") is not treated in any special way; you have to configure it explicitly
1016 in the unit file.
1017 </para>
1018 <para>Example 1: allow-list
1019 <programlisting>
1020 RestrictNetworkInterfaces=eth1
1021 RestrictNetworkInterfaces=eth2</programlisting>
1022 Programs in the unit will be only able to use the eth1 and eth2 network
1023 interfaces.
1024 </para>
1025
1026 <para>Example 2: deny-list
1027 <programlisting>
1028 RestrictNetworkInterfaces=~eth1 eth2</programlisting>
1029 Programs in the unit will be able to use any network interface but eth1 and eth2.
1030 </para>
1031
1032 <para>Example 3: mixed
1033 <programlisting>
1034 RestrictNetworkInterfaces=eth1 eth2
1035 RestrictNetworkInterfaces=~eth1</programlisting>
1036 Programs in the unit will be only able to use the eth2 network interface.
1037 </para>
1038
1039 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
1040
1041 <xi:include href="version-info.xml" xpointer="v250"/>
1042 </listitem>
1043 </varlistentry>
1044
1045 <varlistentry>
1046 <term><varname>NFTSet=</varname><replaceable>family</replaceable>:<replaceable>table</replaceable>:<replaceable>set</replaceable></term>
1047 <listitem>
1048 <para>This setting provides a method for integrating dynamic cgroup, user and group IDs into
1049 firewall rules with <ulink url="https://netfilter.org/projects/nftables/index.html">NFT</ulink>
1050 sets. The benefit of using this setting is to be able to use the IDs as selectors in firewall rules
1051 easily and this in turn allows more fine grained filtering. NFT rules for cgroup matching use
1052 numeric cgroup IDs, which change every time a service is restarted, making them hard to use in a
1053 systemd environment otherwise. Dynamic and random IDs used by <varname>DynamicUser=</varname> can
1054 also be integrated with this setting.</para>
1055
1056 <para>This option expects a whitespace separated list of NFT set definitions. Each definition
1057 consists of a colon-separated tuple of source type (one of <literal>cgroup</literal>,
1058 <literal>user</literal> or <literal>group</literal>), NFT address family (one of
1059 <literal>arp</literal>, <literal>bridge</literal>, <literal>inet</literal>, <literal>ip</literal>,
1060 <literal>ip6</literal>, or <literal>netdev</literal>), table name and set name. The names of tables
1061 and sets must conform to lexical restrictions of NFT table names. The type of the element used in
1062 the NFT filter must match the type implied by the directive (<literal>cgroup</literal>,
1063 <literal>user</literal> or <literal>group</literal>) as shown in the table below. When a control
1064 group or a unit is realized, the corresponding ID will be appended to the NFT sets and it will be
1065 removed when the control group or unit is removed. <command>systemd</command> only inserts
1066 elements to (or removes from) the sets, so the related NFT rules, tables and sets must be prepared
1067 elsewhere in advance. Failures to manage the sets will be ignored.</para>
1068
1069 <table>
1070 <title>Defined <varname>source type</varname> values</title>
1071 <tgroup cols='3'>
1072 <colspec colname='source type'/>
1073 <colspec colname='description'/>
1074 <colspec colname='NFT type name'/>
1075 <thead>
1076 <row>
1077 <entry>Source type</entry>
1078 <entry>Description</entry>
1079 <entry>Corresponding NFT type name</entry>
1080 </row>
1081 </thead>
1082
1083 <tbody>
1084 <row>
1085 <entry><literal>cgroup</literal></entry>
1086 <entry>control group ID</entry>
1087 <entry><literal>cgroupsv2</literal></entry>
1088 </row>
1089 <row>
1090 <entry><literal>user</literal></entry>
1091 <entry>user ID</entry>
1092 <entry><literal>meta skuid</literal></entry>
1093 </row>
1094 <row>
1095 <entry><literal>group</literal></entry>
1096 <entry>group ID</entry>
1097 <entry><literal>meta skgid</literal></entry>
1098 </row>
1099 </tbody>
1100 </tgroup>
1101 </table>
1102
1103 <para>If the firewall rules are reinstalled so that the contents of NFT sets are destroyed, command
1104 <command>systemctl daemon-reload</command> can be used to refill the sets.</para>
1105
1106 <para>Example:
1107 <programlisting>[Unit]
1108 NFTSet=cgroup:inet:filter:my_service user:inet:filter:serviceuser
1109 </programlisting>
1110 Corresponding NFT rules:
1111 <programlisting>table inet filter {
1112 set my_service {
1113 type cgroupsv2
1114 }
1115 set serviceuser {
1116 typeof meta skuid
1117 }
1118 chain x {
1119 socket cgroupv2 level 2 @my_service accept
1120 drop
1121 }
1122 chain y {
1123 meta skuid @serviceuser accept
1124 drop
1125 }
1126 }</programlisting>
1127 </para>
1128 <xi:include href="version-info.xml" xpointer="v255"/></listitem>
1129 </varlistentry>
1130
1131 </variablelist>
1132
1133 </refsect2><refsect2><title>BPF Programs</title>
1134
1135 <variablelist class='unit-directives'>
1136
1137 <varlistentry>
1138 <term><varname>IPIngressFilterPath=<replaceable>BPF_FS_PROGRAM_PATH</replaceable></varname></term>
1139 <term><varname>IPEgressFilterPath=<replaceable>BPF_FS_PROGRAM_PATH</replaceable></varname></term>
1140
1141 <listitem>
1142 <para>Add custom network traffic filters implemented as BPF programs, applying to all IP packets
1143 sent and received over <constant>AF_INET</constant> and <constant>AF_INET6</constant> sockets.
1144 Takes an absolute path to a pinned BPF program in the BPF virtual filesystem (<filename>/sys/fs/bpf/</filename>).
1145 </para>
1146
1147 <para>The filters configured with this option are applied to all sockets created by processes
1148 of this unit (or in the case of socket units, associated with it). The filters are loaded in addition
1149 to the filters of any of the parent slice units this unit might be a member of as well as any
1150 <varname>IPAddressAllow=</varname> and <varname>IPAddressDeny=</varname> filters in any of these units.
1151 By default there are no filters specified.</para>
1152
1153 <para>If these settings are used multiple times in the same unit all the specified programs are attached. If an
1154 empty string is assigned to these settings the program list is reset and all previously specified programs are ignored.</para>
1155
1156 <para>If the path <replaceable>BPF_FS_PROGRAM_PATH</replaceable> in <varname>IPIngressFilterPath=</varname> assignment
1157 is already being handled by <varname>BPFProgram=</varname> ingress hook, e.g.
1158 <varname>BPFProgram=</varname><constant>ingress</constant>:<replaceable>BPF_FS_PROGRAM_PATH</replaceable>,
1159 the assignment will still be considered valid and the program will be attached to a cgroup. The same applies to the
1160 <varname>IPEgressFilterPath=</varname> path and the <constant>egress</constant> hook.</para>
1161
1162 <para>Note that for socket-activated services, the IP filter programs configured on the socket unit apply to
1163 all sockets associated with it directly, but not to any sockets created by the ultimately activated services
1164 for it. Conversely, the IP filter programs configured for the service are not applied to any sockets passed into
1165 the service via socket activation. Thus, it is usually a good idea to replicate the IP filter programs on both
1166 the socket and the service unit. Nevertheless, it often makes sense to maintain one configuration more open and the other
1167 one more restricted, depending on the use case.</para>
1168
1169 <para>Note that these settings might not be supported on some systems (for example if eBPF control group
1170 support is not enabled in the underlying kernel or container manager). These settings will fail the service in
1171 that case. If compatibility with such systems is desired it is hence recommended to attach your filter manually
1172 (requires <varname>Delegate=</varname><constant>yes</constant>) instead of using this setting.</para>
1173
1174 <xi:include href="version-info.xml" xpointer="v243"/>
1175 </listitem>
1176 </varlistentry>
1177
1178 <varlistentry>
1179 <term><varname>BPFProgram=<replaceable>type</replaceable>:<replaceable>program-path</replaceable></varname></term>
1180 <listitem>
1181 <para><varname>BPFProgram=</varname> allows attaching custom BPF programs to the cgroup of a
1182 unit. (This generalizes the functionality exposed via <varname>IPEgressFilterPath=</varname> and
1183 <varname>IPIngressFilterPath=</varname> for other hooks.) Cgroup-bpf hooks in the form of BPF
1184 programs loaded to the BPF filesystem are attached with cgroup-bpf attach flags determined by the
1185 unit. For details about attachment types and flags see <ulink
1186 url="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h"><filename>bpf.h</filename></ulink>. Also
1187 refer to the general <ulink url="https://docs.kernel.org/bpf/">BPF documentation</ulink>.</para>
1188
1189 <para>The specification of a BPF program consists of a pair of BPF program type and program path in
1190 the file system, with <literal>:</literal> as the separator:
1191 <replaceable>type</replaceable>:<replaceable>program-path</replaceable>.</para>
1192
1193 <para>The BPF program type is equivalent to the BPF attach type used in
1194 <citerefentry project='mankier'><refentrytitle>bpftool</refentrytitle><manvolnum>8</manvolnum></citerefentry>.
1195 It may be one of
1196 <constant>egress</constant>,
1197 <constant>ingress</constant>,
1198 <constant>sock_create</constant>,
1199 <constant>sock_ops</constant>,
1200 <constant>device</constant>,
1201 <constant>bind4</constant>,
1202 <constant>bind6</constant>,
1203 <constant>connect4</constant>,
1204 <constant>connect6</constant>,
1205 <constant>post_bind4</constant>,
1206 <constant>post_bind6</constant>,
1207 <constant>sendmsg4</constant>,
1208 <constant>sendmsg6</constant>,
1209 <constant>sysctl</constant>,
1210 <constant>recvmsg4</constant>,
1211 <constant>recvmsg6</constant>,
1212 <constant>getsockopt</constant>,
1213 or <constant>setsockopt</constant>.
1214 </para>
1215
1216 <para>The specified program path must be an absolute path referencing a BPF program inode in the
1217 bpffs file system (which generally means it must begin with <filename>/sys/fs/bpf/</filename>). If
1218 a specified program does not exist (i.e. has not been uploaded to the BPF subsystem of the kernel
1219 yet), it will not be installed but unit activation will continue (a warning will be printed to the
1220 logs).</para>
1221
1222 <para>Setting <varname>BPFProgram=</varname> to an empty value makes previous assignments
1223 ineffective.</para>
1224
1225 <para>Multiple assignments of the same program type/path pair have the same effect as a single
1226 assignment: the program will be attached just once.</para>
1227
1228 <para>If a BPF <constant>egress</constant> program pinned to <replaceable>program-path</replaceable> is already being
1229 handled by <varname>IPEgressFilterPath=</varname>, the <varname>BPFProgram=</varname>
1230 assignment will still be considered valid and the program will be attached to the cgroup.
1231 The same applies to the <constant>ingress</constant> hook and an <varname>IPIngressFilterPath=</varname> assignment.</para>
1232
1233 <para>BPF programs passed with <varname>BPFProgram=</varname> are attached to the cgroup of a unit
1234 with the BPF attach flag <constant>multi</constant>, which allows further attachments of the same
1235 <replaceable>type</replaceable> within the cgroup hierarchy topped by the unit cgroup.</para>
1236
1237 <para>Examples:<programlisting>BPFProgram=egress:/sys/fs/bpf/egress-hook
1238 BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook
1239 </programlisting></para>
1240
1241 <xi:include href="version-info.xml" xpointer="v249"/>
1242 </listitem>
1243 </varlistentry>
1244
1245 </variablelist>
1246
1247 </refsect2><refsect2><title>Device Access</title>
1248
1249 <variablelist class='unit-directives'>
1250
1251 <varlistentry>
1252 <term><varname>DeviceAllow=</varname></term>
1253
1254 <listitem>
1255 <para>Control access to specific device nodes by the executed processes. Takes two space-separated
1256 strings: a device node specifier followed by a combination of <constant>r</constant>,
1257 <constant>w</constant>, <constant>m</constant> to control <emphasis>r</emphasis>eading,
1258 <emphasis>w</emphasis>riting, or creation of the specific device nodes by the unit
1259 (<emphasis>m</emphasis>knod), respectively. This functionality is implemented using eBPF
1260 filtering.</para>
1261
1262 <para>When access to <emphasis>all</emphasis> physical devices should be disallowed,
1263 <varname>PrivateDevices=</varname> may be used instead. See
1264 <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
1265 </para>
1266
1267 <para>The device node specifier is either a path to a device node in the file system, starting with
1268 <filename>/dev/</filename>, or a string starting with either <literal>char-</literal> or
1269 <literal>block-</literal> followed by a device group name, as listed in
1270 <filename>/proc/devices</filename>. The latter is useful to allow-list all current and future
1271 devices belonging to a specific device group at once. The device group is matched according to
1272 filename globbing rules; you may hence use the <literal>*</literal> and <literal>?</literal>
1273 wildcards. (Note that such globbing wildcards are not available for device node path
1274 specifications!) In order to match device nodes by numeric major/minor, use device node paths in
1275 the <filename>/dev/char/</filename> and <filename>/dev/block/</filename> directories. However,
1276 matching devices by major/minor is generally not recommended as assignments are neither stable nor
1277 portable between systems or different kernel versions.</para>
1278
1279 <para>Examples: <filename>/dev/sda5</filename> is a path to a device node, referring to an ATA or
1280 SCSI block device. <literal>char-pts</literal> and <literal>char-alsa</literal> are specifiers for
1281 all pseudo TTYs and all ALSA sound devices, respectively. <literal>char-cpu/*</literal> is a
1282 specifier matching all CPU related device groups.</para>
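
 <para>A minimal sketch of how such specifiers might be combined in a unit file; the access modes chosen
 for each entry are illustrative assumptions, not recommendations:<programlisting>[Service]
# Read-only access to one block device node
DeviceAllow=/dev/sda5 r
# Read and write access to all pseudo TTYs
DeviceAllow=char-pts rw
# Read, write and mknod access to all ALSA sound devices
DeviceAllow=char-alsa rwm
</programlisting></para>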
1283
1284 <para>Note that allow lists defined this way should only reference device groups which are
1285 resolvable at the time the unit is started. Any device groups not resolvable then are not added to
1286 the device allow list. In order to work around this limitation, consider extending service units
1287 with a pair of <command>After=modprobe@xyz.service</command> and
1288 <command>Wants=modprobe@xyz.service</command> lines that load the necessary kernel module
1289 implementing the device group if missing.
1290 Example: <programlisting>
1291 [Unit]
1292 Wants=modprobe@loop.service
1293 After=modprobe@loop.service
1294
1295 [Service]
1296 DeviceAllow=block-loop
1297 DeviceAllow=/dev/loop-control
1298</programlisting></para>
1299
1300 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
1301
1302 <xi:include href="version-info.xml" xpointer="v208"/>
1303 </listitem>
1304 </varlistentry>
1305
1306 <varlistentry>
1307 <term><varname>DevicePolicy=auto|closed|strict</varname></term>
1308
1309 <listitem>
1310 <para>
1311 Control the policy for allowing device access:
1312 </para>
1313 <variablelist>
1314 <varlistentry>
1315 <term><option>strict</option></term>
1316 <listitem>
1317 <para>means to only allow types of access that are
1318 explicitly specified.</para>
1319
1320 <xi:include href="version-info.xml" xpointer="v208"/>
1321 </listitem>
1322 </varlistentry>
1323
1324 <varlistentry>
1325 <term><option>closed</option></term>
1326 <listitem>
1327 <para>in addition, allows access to standard pseudo
1328 devices including
1329 <filename>/dev/null</filename>,
1330 <filename>/dev/zero</filename>,
1331 <filename>/dev/full</filename>,
1332 <filename>/dev/random</filename>, and
1333 <filename>/dev/urandom</filename>.
1334 </para>
1335
1336 <xi:include href="version-info.xml" xpointer="v208"/>
1337 </listitem>
1338 </varlistentry>
1339
1340 <varlistentry>
1341 <term><option>auto</option></term>
1342 <listitem>
1343 <para>
1344 in addition, allows access to all devices if no
1345 explicit <varname>DeviceAllow=</varname> is present.
1346 This is the default.
1347 </para>
1348
1349 <xi:include href="version-info.xml" xpointer="v208"/>
1350 </listitem>
1351 </varlistentry>
1352 </variablelist>
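
 <para>For illustration, a sketch that combines <option>closed</option> with one explicit allow-list entry,
 so that the standard pseudo devices listed above and the named device node are accessible (the choice of
 device node is an assumption made for this example):<programlisting>[Service]
DevicePolicy=closed
DeviceAllow=/dev/loop-control rw
</programlisting></para>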
1353
1354 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
1355
1356 <xi:include href="version-info.xml" xpointer="v208"/>
1357 </listitem>
1358 </varlistentry>
1359
1360 </variablelist>
1361
1362 </refsect2><refsect2><title>Control Group Management</title>
1363
1364 <variablelist class='unit-directives'>
1365
1366 <varlistentry>
1367 <term><varname>Slice=</varname></term>
1368
1369 <listitem>
1370 <para>The name of the slice unit to place the unit
1371 in. Defaults to <filename>system.slice</filename> for all
1372 non-instantiated units of all unit types (except for slice
1373 units themselves, see below). Instance units are by default
1374 placed in a subslice of <filename>system.slice</filename>
1375 that is named after the template name.</para>
1376
1377 <para>This option may be used to arrange systemd units in a
1378 hierarchy of slices each of which might have resource
1379 settings applied.</para>
1380
1381 <para>For units of type slice, the only accepted value for
1382 this setting is the parent slice. Since the name of a slice
1383 unit implies the parent slice, it is hence redundant to ever
1384 set this parameter directly for slice units.</para>
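
 <para>A minimal sketch of placing a service in a custom slice; the slice name
 <filename>workload-batch.slice</filename> is hypothetical. Its name implies that its parent is
 <filename>workload.slice</filename>, so no <varname>Slice=</varname> setting is needed in the slice
 unit itself:<programlisting>[Service]
# Hypothetical slice name, used for illustration only
Slice=workload-batch.slice
</programlisting></para>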
1385
1386 <para>Special care should be taken when relying on the default slice assignment in templated service units
1387 that have <varname>DefaultDependencies=no</varname> set, see
1388 <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>, section
1389 "Default Dependencies" for details.</para>
1390
1391 <xi:include href="version-info.xml" xpointer="v208"/>
1392
1393 </listitem>
1394 </varlistentry>
1395
1396 <varlistentry>
1397 <term><varname>Delegate=</varname></term>
1398
1399 <listitem>
1400 <para>Turns on delegation of further resource control partitioning to processes of the unit. Units
1401 where this is enabled may create and manage their own private subhierarchy of control groups below
1402 the control group of the unit itself. For unprivileged services (i.e. those using the
1403 <varname>User=</varname> setting) the unit's control group will be made accessible to the relevant
1404 user.</para>
1405
1406 <para>When enabled the service manager will refrain from manipulating control groups or moving
1407 processes below the unit's control group, so that a clear concept of ownership is established: the
1408 control group tree at the level of the unit's control group and above (i.e. towards the root
1409 control group) is owned and managed by the service manager of the host, while the control group
1410 tree below the unit's control group is owned and managed by the unit itself.</para>
1411
1412 <para>Takes either a boolean argument or a (possibly empty) list of control group controller names.
1413 If true, delegation is turned on, and all supported controllers are enabled for the unit, making
1414 them available to the unit's processes for management. If false, delegation is turned off entirely
1415 (and no additional controllers are enabled). If set to a list of controllers, delegation is turned
1416 on, and the specified controllers are enabled for the unit. Assigning the empty string will enable
1417 delegation, but reset the list of controllers, and all assignments prior to this will have no
1418 effect. Note that additional controllers other than the ones specified might be made available as
1419 well, depending on configuration of the containing slice unit or other units contained in it.
1420 Defaults to false.</para>
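
 <para>As a sketch, the list form might look as follows; the selection of controllers is an assumption
 made for this example, and which controllers are actually available depends on the kernel and the
 hierarchy in use:<programlisting>[Service]
# Turn on delegation and enable only these controllers for the unit
Delegate=cpu memory pids
</programlisting></para>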
1421
1422 <para>Note that controller delegation to less privileged code is only safe on the unified control
1423 group hierarchy. Accordingly, access to the specified controllers will not be granted to
1424 unprivileged services on the legacy hierarchy, even when requested.</para>
1425
1426 <xi:include href="supported-controllers.xml" xpointer="controllers-text" />
1427
1428 <para>Not all of these controllers are available on all kernels however, and some are specific to
1429 the unified hierarchy while others are specific to the legacy hierarchy. Also note that the kernel
1430 might support further controllers, which aren't covered here yet as delegation is either not
1431 supported at all for them or not defined cleanly.</para>
1432
1433 <para>Note that because of the hierarchical nature of the cgroup hierarchy, any controllers that are
1434 delegated will be enabled for the parent and sibling units of the unit with delegation.</para>
1435
1436 <para>For further details on the delegation model consult <ulink
1437 url="https://systemd.io/CGROUP_DELEGATION">Control Group APIs and Delegation</ulink>.</para>
1438
1439 <xi:include href="version-info.xml" xpointer="v218"/>
1440 </listitem>
1441 </varlistentry>
1442
1443 <varlistentry>
1444 <term><varname>DelegateSubgroup=</varname></term>
1445
1446 <listitem>
1447 <para>Place unit processes in the specified subgroup of the unit's control group. Takes a valid
1448 control group name (not a path!) as parameter, or an empty string to turn this feature
1449 off. Defaults to off. The control group name must be usable as filename and avoid conflicts with
1450 the kernel's control group attribute files (i.e. <filename>cgroup.procs</filename> is not an
1451 acceptable name, since the kernel exposes a native control group attribute file by that name). This
1452 option has no effect unless control group delegation is turned on via <varname>Delegate=</varname>,
1453 see above. Note that this setting only applies to "main" processes of a unit, i.e. for services to
1454 <varname>ExecStart=</varname>, but not for <varname>ExecReload=</varname> and similar. If
1455 delegation is enabled, the latter are always placed inside a subgroup named
1456 <filename>.control</filename>. The specified subgroup is automatically created (and potentially
1457 ownership is passed to the unit's configured user/group) when a process is started in it.</para>
1458
1459 <para>This option is useful to avoid manually moving the invoked process into a subgroup after it
1460 has been started. Since no processes should live in inner nodes of the control group tree it's
1461 almost always necessary to run the main ("supervising") process of a unit that has delegation
1462 turned on in a subgroup.</para>
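
 <para>A minimal sketch of a delegating service whose main process is started directly in a subgroup; the
 binary path and the subgroup name <literal>supervisor</literal> are hypothetical:<programlisting>[Service]
# Hypothetical program that manages its own cgroup subtree
ExecStart=/usr/bin/my-container-manager
Delegate=yes
# The ExecStart= process is placed in this subgroup of the unit's cgroup
DelegateSubgroup=supervisor
</programlisting></para>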
1463
1464 <xi:include href="version-info.xml" xpointer="v254"/>
1465 </listitem>
1466 </varlistentry>
1467
1468 <varlistentry>
1469 <term><varname>DisableControllers=</varname></term>
1470
1471 <listitem>
1472 <para>Disables controllers from being enabled for a unit's children. If a controller listed is
1473 already in use in its subtree, the controller will be removed from the subtree. This can be used to
1474 avoid configuration in child units from being able to implicitly or explicitly enable a controller.
1475 Defaults to empty.</para>
1476
1477 <para>Multiple controllers may be specified, separated by spaces. You may also pass
1478 <varname>DisableControllers=</varname> multiple times, in which case each new instance adds another controller
1479 to disable. Passing <varname>DisableControllers=</varname> by itself with no controller name present resets
1480 the disabled controller list.</para>
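
 <para>For illustration, the following sketch disables two controllers for the children of a slice. The
 controllers picked here are an arbitrary assumption, and the two assignments add up to the same list as
 the single line <literal>DisableControllers=cpu io</literal>:<programlisting>[Slice]
DisableControllers=cpu
DisableControllers=io
</programlisting></para>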
1481
1482 <para>It may not be possible to disable a controller after units have been started, if the unit or
1483 any child of the unit in question delegates controllers to its children, as any delegated subtree
1484 of the cgroup hierarchy is unmanaged by systemd.</para>
1485
1486 <xi:include href="supported-controllers.xml" xpointer="controllers-text" />
1487
1488 <xi:include href="version-info.xml" xpointer="v240"/>
1489 </listitem>
1490 </varlistentry>
1491
1492 </variablelist>
1493
1494 </refsect2><refsect2><title>Memory Pressure Control</title>
1495
1496 <variablelist class='unit-directives'>
1497
1498 <varlistentry>
1499 <term><varname>ManagedOOMSwap=auto|kill</varname></term>
1500 <term><varname>ManagedOOMMemoryPressure=auto|kill</varname></term>
1501
1502 <listitem>
1503 <para>Specifies how
1504 <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
1505 will act on this unit's cgroups. Defaults to <option>auto</option>.</para>
1506
1507 <para>When set to <option>kill</option>, the unit becomes a candidate for monitoring by
1508 <command>systemd-oomd</command>. If the cgroup passes the limits set by
1509 <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or
1510 the unit configuration, <command>systemd-oomd</command> will select a descendant cgroup and send
1511 <constant>SIGKILL</constant> to all of the processes under it. You can find more details on
1512 candidates and kill behavior at
1513 <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
1514 and
1515 <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
1516
1517 <para>Setting either of these properties to <option>kill</option> will also result in
1518 <varname>After=</varname> and <varname>Wants=</varname> dependencies on
1519 <filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.</para>
1520
1521 <para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this
1522 cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these
1523 properties set to <option>kill</option>, a unit with <option>auto</option> can still be a candidate
1524 for <command>systemd-oomd</command> to terminate.</para>
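
 <para>A minimal sketch of opting a slice into swap-based monitoring, for example in a drop-in for a slice
 unit of your choosing (which slice to apply it to is left as an assumption of this example):
 <programlisting>[Slice]
ManagedOOMSwap=kill
</programlisting></para>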
1525
1526 <xi:include href="version-info.xml" xpointer="v247"/>
1527 </listitem>
1528 </varlistentry>
1529
1530 <varlistentry>
1531 <term><varname>ManagedOOMMemoryPressureLimit=</varname></term>
1532
1533 <listitem>
1534 <para>Overrides the default memory pressure limit set by
1535 <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
1536 this unit (cgroup). Takes a percentage value between 0% and 100%, inclusive. This property is
1537 ignored unless <varname>ManagedOOMMemoryPressure=</varname><option>kill</option>. Defaults to 0%,
1538 which means to use the default set by
1539 <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
1540 </para>
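
 <para>As a sketch, a per-unit limit only takes effect together with
 <varname>ManagedOOMMemoryPressure=</varname><option>kill</option>; the value of 50% is an arbitrary
 example, not a recommended threshold:<programlisting>[Slice]
ManagedOOMMemoryPressure=kill
# Example value only; overrides the default from oomd.conf for this unit
ManagedOOMMemoryPressureLimit=50%
</programlisting></para>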
1541
1542 <xi:include href="version-info.xml" xpointer="v247"/>
1543 </listitem>
1544 </varlistentry>
1545
1546 <varlistentry>
1547 <term><varname>ManagedOOMPreference=none|avoid|omit</varname></term>
1548
1549 <listitem>
1550 <para>Allows deprioritizing or omitting this unit's cgroup as a candidate when
1551 <command>systemd-oomd</command> needs to act. Requires support for extended attributes (see
1552 <citerefentry project='man-pages'><refentrytitle>xattr</refentrytitle><manvolnum>7</manvolnum></citerefentry>)
1553 in order to use <option>avoid</option> or <option>omit</option>.</para>
1554
1555 <para>When calculating candidates to relieve swap usage, <command>systemd-oomd</command> will
1556 only respect these extended attributes if the unit's cgroup is owned by root.</para>
1557
1558 <para>When calculating candidates to relieve memory pressure, <command>systemd-oomd</command>
1559 will only respect these extended attributes if the unit's cgroup is owned by root, or if the
1560 unit's cgroup owner and the owner of the monitored ancestor cgroup are the same. For example,
1561 if <command>systemd-oomd</command> is calculating candidates for <filename>-.slice</filename>,
1562 then extended attributes set on descendants of <filename>/user.slice/user-1000.slice/user@1000.service/</filename>
1563 will be ignored because the descendants are owned by UID 1000, and <filename>-.slice</filename>
1564 is owned by UID 0. But, if calculating candidates for
1565 <filename>/user.slice/user-1000.slice/user@1000.service/</filename>, then extended attributes set
1566 on the descendants would be respected.</para>
1567
1568 <para>If this property is set to <option>avoid</option>, the service manager will convey this to
1569 <command>systemd-oomd</command>, which will only select this cgroup if there are no other viable
1570 candidates.</para>
1571
1572 <para>If this property is set to <option>omit</option>, the service manager will convey this to
1573 <command>systemd-oomd</command>, which will ignore this cgroup as a candidate and will not perform
1574 any actions on it.</para>
1575
1576 <para>It is recommended to use <option>avoid</option> and <option>omit</option> sparingly, as they
1577 can adversely affect <command>systemd-oomd</command>'s kill behavior. Also note that these extended
1578 attributes are not applied recursively to cgroups under this unit's cgroup.</para>
1579
1580 <para>Defaults to <option>none</option> which means <command>systemd-oomd</command> will rank this
1581 unit's cgroup as defined in
1582 <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
1583 and <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
1584 </para>
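
 <para>A minimal sketch of deprioritizing a service as a kill candidate, assuming the service is one
 that should only be selected when no other viable candidates exist:<programlisting>[Service]
ManagedOOMPreference=avoid
</programlisting></para>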
1585
1586 <xi:include href="version-info.xml" xpointer="v248"/>
1587 </listitem>
1588 </varlistentry>
1589
1590 <varlistentry>
1591 <term><varname>MemoryPressureWatch=</varname></term>
1592
1593 <listitem><para>Controls memory pressure monitoring for invoked processes. Takes one of
1594 <literal>off</literal>, <literal>on</literal>, <literal>auto</literal> or <literal>skip</literal>.
1595 <literal>off</literal> tells the service not to watch for memory pressure events, by setting the
1596 <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable to the literal string
1597 <filename>/dev/null</filename>. <literal>on</literal> tells the service to watch for memory
1598 pressure events. This enables memory accounting for the service, and ensures the
1599 <filename>memory.pressure</filename> cgroup attribute file is accessible for reading and writing by the
1600 service's user. It then sets the <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable for
1601 processes invoked by the unit to the file system path to this file. The threshold information
1602 configured with <varname>MemoryPressureThresholdSec=</varname> is encoded in the
1603 <varname>$MEMORY_PRESSURE_WRITE</varname> environment variable. If set to <literal>auto</literal>,
1604 the protocol is enabled if memory accounting is enabled for the unit anyway, and disabled
1605 otherwise. If set to <literal>skip</literal>, the logic is neither enabled nor disabled, and the two
1606 environment variables are not set.</para>
1607
1608 <para>Note that services are free to use the two environment variables, but it's unproblematic if
1609 they ignore them. Memory pressure handling must be implemented individually in each service, and
1610 usually means different things for different software. For further details on memory pressure
1611 handling see <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure Handling in
1612 systemd</ulink>.</para>
1613
1614 <para>Services implemented using
1615 <citerefentry><refentrytitle>sd-event</refentrytitle><manvolnum>3</manvolnum></citerefentry> may use
1616 <citerefentry><refentrytitle>sd_event_add_memory_pressure</refentrytitle><manvolnum>3</manvolnum></citerefentry>
1617 to watch for and handle memory pressure events.</para>
1618
1619 <para>If not explicitly set, defaults to the <varname>DefaultMemoryPressureWatch=</varname> setting in
1620 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
1621
1622 <xi:include href="version-info.xml" xpointer="v254"/></listitem>
1623 </varlistentry>
1624
1625 <varlistentry>
1626 <term><varname>MemoryPressureThresholdSec=</varname></term>
1627
1628 <listitem><para>Sets the memory pressure threshold time for the memory pressure monitor as configured via
1629 <varname>MemoryPressureWatch=</varname>. Specifies the maximum allocation latency before a memory
1630 pressure event is signalled to the service, per 2s window. If not specified defaults to the
1631 <varname>DefaultMemoryPressureThresholdSec=</varname> setting in
1632 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
1633 (which in turn defaults to 200ms). The specified value expects a time unit such as
1634 <literal>ms</literal> or <literal>μs</literal>, see
1635 <citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry> for
1636 details on the permitted syntax.</para>
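
 <para>A minimal sketch that enables memory pressure monitoring with a threshold below the 200ms default
 mentioned above; the value of 100ms is an arbitrary example:<programlisting>[Service]
MemoryPressureWatch=on
# Example value only; signal an event after 100ms of allocation latency per 2s window
MemoryPressureThresholdSec=100ms
</programlisting></para>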
1637
1638 <xi:include href="version-info.xml" xpointer="v254"/></listitem>
1639 </varlistentry>
1640 </variablelist>
1641
1642 </refsect2><refsect2><title>Coredump Control</title>
1643
1644 <variablelist class='unit-directives'>
1645
1646 <varlistentry>
1647 <term><varname>CoredumpReceive=</varname></term>
1648
1649 <listitem><para>Takes a boolean argument. This setting is used to enable coredump forwarding for containers
1650 that belong to this unit's cgroup. Units with <varname>CoredumpReceive=yes</varname> must also be configured
1651 with <varname>Delegate=yes</varname>. Defaults to false.</para>
1652
1653 <para>When <command>systemd-coredump</command> is handling a coredump for a process from a container,
1654 if the container's leader process is a descendant of a cgroup with <varname>CoredumpReceive=yes</varname>
1655 and <varname>Delegate=yes</varname>, then <command>systemd-coredump</command> will attempt to forward
1656 the coredump to <command>systemd-coredump</command> within the container.</para>
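
 <para>As a sketch, a unit that runs containers and wants their coredumps forwarded would set both
 options together, since <varname>CoredumpReceive=yes</varname> requires <varname>Delegate=yes</varname>:
 <programlisting>[Service]
Delegate=yes
CoredumpReceive=yes
</programlisting></para>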
1657
1658 <xi:include href="version-info.xml" xpointer="v255"/></listitem>
1659 </varlistentry>
1660
1661 </variablelist>
1662 </refsect2>
1663 </refsect1>
1664
1665 <refsect1>
1666 <title>History</title>
1667
1668 <variablelist>
1669 <varlistentry>
1670 <term>systemd 252</term>
1671 <listitem><para> Options for controlling the Legacy Control Group Hierarchy (<ulink
1672 url="https://docs.kernel.org/admin-guide/cgroup-v1/index.html">Control Groups version 1</ulink>)
1673 are now fully deprecated:
1674 <varname>CPUShares=<replaceable>weight</replaceable></varname>,
1675 <varname>StartupCPUShares=<replaceable>weight</replaceable></varname>,
1676 <varname>MemoryLimit=<replaceable>bytes</replaceable></varname>,
1677 <varname>BlockIOAccounting=</varname>,
1678 <varname>BlockIOWeight=<replaceable>weight</replaceable></varname>,
1679 <varname>StartupBlockIOWeight=<replaceable>weight</replaceable></varname>,
1680 <varname>BlockIODeviceWeight=<replaceable>device</replaceable>
1681 <replaceable>weight</replaceable></varname>,
1682 <varname>BlockIOReadBandwidth=<replaceable>device</replaceable>
1683 <replaceable>bytes</replaceable></varname>,
1684 <varname>BlockIOWriteBandwidth=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname>.
1685 Please switch to the unified cgroup hierarchy.</para>
1686
1687 <xi:include href="version-info.xml" xpointer="v252"/></listitem>
1688 </varlistentry>
1689 </variablelist>
1690 </refsect1>
1691
1692 <refsect1>
1693 <title>See Also</title>
1694 <para><simplelist type="inline">
1695 <member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
1696 <member><citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1697 <member><citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1698 <member><citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1699 <member><citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1700 <member><citerefentry><refentrytitle>systemd.scope</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1701 <member><citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1702 <member><citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1703 <member><citerefentry><refentrytitle>systemd.swap</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1704 <member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member>
1705 <member><citerefentry><refentrytitle>systemd.directives</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
1706 <member><citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
1707 <member><citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member>
1708 <member>The documentation for control groups and specific controllers in the Linux kernel:
1709 <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink></member>
1710 </simplelist></para>
1711 </refsect1>
1712 </refentry>