<?xml version='1.0'?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->

<refentry id="systemd.resource-control" xmlns:xi="http://www.w3.org/2001/XInclude">
  <refentryinfo>
    <title>systemd.resource-control</title>
    <productname>systemd</productname>
  </refentryinfo>

  <refmeta>
    <refentrytitle>systemd.resource-control</refentrytitle>
    <manvolnum>5</manvolnum>
  </refmeta>

  <refnamediv>
    <refname>systemd.resource-control</refname>
    <refpurpose>Resource control unit settings</refpurpose>
  </refnamediv>

  <refsynopsisdiv>
    <para>
      <filename><replaceable>slice</replaceable>.slice</filename>,
      <filename><replaceable>scope</replaceable>.scope</filename>,
      <filename><replaceable>service</replaceable>.service</filename>,
      <filename><replaceable>socket</replaceable>.socket</filename>,
      <filename><replaceable>mount</replaceable>.mount</filename>,
      <filename><replaceable>swap</replaceable>.swap</filename>
    </para>
  </refsynopsisdiv>

  <refsect1>
    <title>Description</title>

    <para>Unit configuration files for services, slices, scopes, sockets, mount points, and swap devices share a subset
    of configuration options for resource control of spawned processes. Internally, this relies on the Linux Control
    Groups (cgroups) kernel concept for organizing processes in a hierarchical tree of named groups for the purpose of
    resource management.</para>

    <para>This man page lists the configuration options shared by
    those six unit types. See
    <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>
    for the common options of all unit configuration files, and
    <citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
    <citerefentry><refentrytitle>systemd.scope</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
    <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
    <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
    <citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
    and
    <citerefentry><refentrytitle>systemd.swap</refentrytitle><manvolnum>5</manvolnum></citerefentry>
    for more information on the specific unit configuration files. The
    resource control configuration options are configured in the
    [Slice], [Scope], [Service], [Socket], [Mount], or [Swap]
    sections, depending on the unit type.</para>

    <para>In addition, options which control resources available to programs
    <emphasis>executed</emphasis> by systemd are listed in
    <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
    Those options complement the options listed here.</para>

    <refsect2>
      <title>Enabling and disabling controllers</title>

      <para>Controllers in the cgroup hierarchy are hierarchical, and resource control is realized by
      distributing resource assignments between siblings in branches of the cgroup hierarchy. There is no
      need to explicitly <emphasis>enable</emphasis> a cgroup controller for a unit.
      <command>systemd</command> will instruct the kernel to enable a controller for a given unit when this
      unit has configuration for that controller. For example, when <varname>CPUWeight=</varname> is set,
      the <option>cpu</option> controller will be enabled, and when <varname>TasksMax=</varname> is set, the
      <option>pids</option> controller will be enabled. In addition, various controllers may also be
      enabled explicitly via the
      <varname>MemoryAccounting=</varname>/<varname>TasksAccounting=</varname>/<varname>IOAccounting=</varname>
      settings. Because of how the cgroup hierarchy works, controllers will be automatically enabled for all
      parent units and for any sibling units starting with the lowest level at which a controller is enabled.
      Units for which a controller is enabled may be subject to resource control even if they don't have any
      explicit configuration.</para>

      <para>Setting <varname>Delegate=</varname> enables any delegated controllers for that unit (see below).
      The delegatee may then enable controllers for its children as appropriate. In particular, if the
      delegatee is <command>systemd</command> (in the <filename>user@.service</filename> unit), it will
      repeat the same logic as the system instance and enable controllers for user units which have resource
      limits configured, and for their siblings, parents, and parents' siblings.</para>
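
      <para>As a sketch, a service that manages its own cgroup subtree could request delegation of
      selected controllers as shown below. The unit is hypothetical and the controller list is only an
      illustration; the exact semantics of <varname>Delegate=</varname> are described with that option
      below.</para>

      <programlisting># Hypothetical service that manages its own cgroup subtree.
[Service]
Delegate=cpu memory pids</programlisting>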

      <para>Controllers may be <emphasis>disabled</emphasis> for parts of the cgroup hierarchy with
      <varname>DisableControllers=</varname> (see below).</para>

      <example>
        <title>Enabling and disabling controllers</title>

        <programlisting>
                  -.slice
                 /       \
          /-----/         \--------------\
         /                                \
  system.slice                       user.slice
    /       \                          /      \
   /         \                        /        \
  /           \              user@42.service  user@1000.service
 /             \             Delegate=        Delegate=yes
a.service       b.slice                             /        \
CPUWeight=20   DisableControllers=cpu              /          \
                 /  \                      app.slice      session.slice
                /    \                     CPUWeight=100  CPUWeight=100
               /      \
       b1.service   b2.service
                    CPUWeight=1000
        </programlisting>

        <para>In this hierarchy, the <option>cpu</option> controller is enabled for all units shown except
        <filename>b1.service</filename> and <filename>b2.service</filename>. Because there is no explicit
        configuration for <filename>system.slice</filename> and <filename>user.slice</filename>, CPU
        resources will be split equally between them. Similarly, resources are allocated equally between
        children of <filename>user.slice</filename> and between the child slices beneath
        <filename>user@1000.service</filename>. Assuming that there is no further configuration of resources
        or delegation below slices <filename>app.slice</filename> or <filename>session.slice</filename>, the
        <option>cpu</option> controller would not be enabled for units in those slices and CPU resources
        would be further allocated using other mechanisms, e.g. based on nice levels. The manager for user
        42 has delegation enabled without any controllers, i.e. it can manipulate its subtree of the cgroup
        hierarchy, but without resource control.</para>

        <para>In the slice <filename>system.slice</filename>, CPU resources are split 1:5 between service
        <filename>a.service</filename> and slice <filename>b.slice</filename> (i.e. 1/6 and 5/6 of the
        available CPU time), because slice <filename>b.slice</filename> gets the default value of 100 for
        <filename>cpu.weight</filename> when <varname>CPUWeight=</varname> is not set, compared to the weight
        of 20 configured for <filename>a.service</filename>.</para>

        <para>The <varname>CPUWeight=</varname> setting in service <filename>b2.service</filename> is
        neutralized by <varname>DisableControllers=</varname> in slice <filename>b.slice</filename>, so the
        <option>cpu</option> controller would not be enabled for services <filename>b1.service</filename> and
        <filename>b2.service</filename>, and CPU resources would be further allocated using other mechanisms,
        e.g. based on nice levels.</para>
      </example>
    </refsect2>

    <refsect2>
      <title>Setting resource controls for a group of related units</title>

      <para>As described in
      <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>, the
      settings listed here may be set through the main file of a unit and drop-in snippets in
      <filename index="false">*.d/</filename> directories. The list of directories searched for drop-ins
      includes names formed by repeatedly truncating the unit name after all dashes. This is particularly
      convenient for setting resource limits for a group of units with similar names.</para>

      <para>For example, every user gets their own slice
      <filename>user-<replaceable>nnn</replaceable>.slice</filename>. Local configuration that affects
      user 1000 may be placed in the unit file
      <filename index="false">/etc/systemd/system/user-1000.slice</filename> or in drop-ins in
      <filename index="false">/etc/systemd/system/user-1000.slice.d/*.conf</filename>, but also in
      <filename index="false">/etc/systemd/system/user-.slice.d/*.conf</filename>. This last directory
      applies to all user slices.</para>
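
      <para>As a sketch, a drop-in such as the following, saved under the hypothetical name
      <filename index="false">/etc/systemd/system/user-.slice.d/50-limits.conf</filename>, would apply to
      every user slice. The file name and values are arbitrary illustrations, not recommendations.</para>

      <programlisting># /etc/systemd/system/user-.slice.d/50-limits.conf (hypothetical file name)
# Illustrative limits applied to every user-nnn.slice.
[Slice]
MemoryHigh=4G
TasksMax=2048</programlisting>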
    </refsect2>

    <para>See the <ulink
    url="https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface">New
    Control Group Interfaces</ulink> for an introduction to using resource control APIs from
    programs.</para>
  </refsect1>

  <refsect1>
    <title>Implicit Dependencies</title>

    <para>The following dependencies are implicitly added:</para>

    <itemizedlist>
      <listitem><para>Units with the <varname>Slice=</varname> setting set automatically acquire
      <varname>Requires=</varname> and <varname>After=</varname> dependencies on the specified
      slice unit.</para></listitem>
    </itemizedlist>
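
    <para>For example, with the fragment below (both unit names are hypothetical), the service behaves as
    if <varname>Requires=example.slice</varname> and <varname>After=example.slice</varname> had also been
    set in its [Unit] section.</para>

    <programlisting># example.service (hypothetical unit name)
[Service]
Slice=example.slice
# Implicitly added dependencies, equivalent to:
#   [Unit]
#   Requires=example.slice
#   After=example.slice</programlisting>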
  </refsect1>

  <!-- We don't have any default dependency here. -->

  <refsect1>
    <title>Options</title>

    <para>Units of the types listed above can have settings for resource control configuration:</para>

    <refsect2><title>CPU Accounting and Control</title>

    <variablelist class='unit-directives'>

      <varlistentry>
        <term><varname>CPUAccounting=</varname></term>

        <listitem>
          <para>Turn on CPU usage accounting for this unit. Takes a
          boolean argument. Note that turning on CPU accounting for
          one unit will also implicitly turn it on for all units
          contained in the same slice and for all its parent slices
          and the units contained therein. The system default for this
          setting may be controlled with
          <varname>DefaultCPUAccounting=</varname> in
          <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>

          <para>Under the unified cgroup hierarchy, CPU accounting is available for all units and this
          setting has no effect.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>CPUWeight=<replaceable>weight</replaceable></varname></term>
        <term><varname>StartupCPUWeight=<replaceable>weight</replaceable></varname></term>

        <listitem>
          <para>These settings control the <option>cpu</option> controller in the unified hierarchy.</para>

          <para>These options accept an integer value or the special string "idle":</para>
          <itemizedlist>
            <listitem>
              <para>If set to an integer value, assign the specified CPU time weight to the processes
              executed, if the unified control group hierarchy is used on the system. These options control
              the <literal>cpu.weight</literal> control group attribute. The allowed range is 1 to 10000.
              Defaults to unset, but the kernel default is 100. For details about this control group
              attribute, see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups
              v2</ulink> and <ulink url="https://docs.kernel.org/scheduler/sched-design-CFS.html">CFS
              Scheduler</ulink>. The available CPU time is split up among all units within one slice
              relative to their CPU time weight. A higher weight means more CPU time, a lower weight means
              less.</para>
            </listitem>
            <listitem>
              <para>If set to the special string "idle", mark the cgroup for "idle scheduling", which means
              that it only gets CPU time when no non-idle processes in this cgroup or its siblings are
              waiting to run. This setting corresponds to the <literal>cpu.idle</literal> cgroup
              attribute.</para>

              <para>Note that this value only has an effect on cgroup-v2; for cgroup-v1 it is equivalent to
              the minimum weight.</para>
            </listitem>
          </itemizedlist>

          <para>While <varname>StartupCPUWeight=</varname> applies to the startup and shutdown phases of the system,
          <varname>CPUWeight=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupCPUWeight=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>
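
          <para>As a sketch, the hypothetical batch service below gets one fifth of the CPU time of a
          contending sibling with the default weight of 100 during normal runtime, and only receives CPU
          time when nothing else wants it during boot-up and shutdown. The values are illustrative.</para>

          <programlisting># Hypothetical low-priority batch service.
[Service]
# One fifth of the kernel default weight of 100 during normal runtime.
CPUWeight=20
# Idle scheduling during startup and shutdown.
StartupCPUWeight=idle</programlisting>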

          <para>In addition to the resource allocation performed by the <option>cpu</option> controller, the
          kernel may automatically divide resources based on session-ID grouping; see "The autogroup feature"
          in <citerefentry
          project='man-pages'><refentrytitle>sched</refentrytitle><manvolnum>7</manvolnum></citerefentry>.
          The effect of this feature is similar to the <option>cpu</option> controller with no explicit
          configuration, so users should be careful not to mistake one for the other.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>CPUQuota=</varname></term>

        <listitem>
          <para>This setting controls the <option>cpu</option> controller in the unified hierarchy.</para>

          <para>Assign the specified CPU time quota to the processes executed. Takes a percentage value, suffixed with
          "%". The percentage specifies how much CPU time the unit shall get at maximum, relative to the total CPU time
          available on one CPU. Use values &gt; 100% for allotting CPU time on more than one CPU. This controls the
          <literal>cpu.max</literal> attribute on the unified control group hierarchy and
          <literal>cpu.cfs_quota_us</literal> on legacy. For details about these control group attributes, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and <ulink
          url="https://docs.kernel.org/scheduler/sched-bwc.html">CFS Bandwidth Control</ulink>.
          Setting <varname>CPUQuota=</varname> to an empty value unsets the quota.</para>

          <para>Example: <varname>CPUQuota=20%</varname> ensures that the executed processes will never get more than
          20% CPU time on one CPU.</para>
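
          <para>A further sketch, for a hypothetical multi-threaded service: a quota above 100% spreads the
          allowance over several CPUs. With the default period of 100ms (see
          <varname>CPUQuotaPeriodSec=</varname>), 150% corresponds to 150ms of CPU time per 100ms period, i.e.
          the equivalent of one and a half CPUs. On the unified hierarchy, <literal>cpu.max</literal> would then
          be expected to contain a quota of 150000 and a period of 100000 microseconds.</para>

          <programlisting># Hypothetical multi-threaded service.
[Service]
# At most 1.5 CPUs worth of CPU time per 100ms period (the default period).
CPUQuota=150%</programlisting>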

        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>CPUQuotaPeriodSec=</varname></term>

        <listitem>
          <para>This setting controls the <option>cpu</option> controller in the unified hierarchy.</para>

          <para>Assign the duration over which the CPU time quota specified by <varname>CPUQuota=</varname> is measured.
          Takes a time duration value in seconds, with an optional suffix such as "ms" for milliseconds (or "s" for seconds).
          The default setting is 100ms. The period is clamped to the range supported by the kernel, which is [1ms, 1000ms].
          Additionally, the period is adjusted up so that the quota interval is also at least 1ms.
          Setting <varname>CPUQuotaPeriodSec=</varname> to an empty value resets it to the default.</para>

          <para>This controls the second field of the <literal>cpu.max</literal> attribute on the unified control group hierarchy
          and <literal>cpu.cfs_period_us</literal> on legacy. For details about these control group attributes, see
          <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and
          <ulink url="https://docs.kernel.org/scheduler/sched-design-CFS.html">CFS Scheduler</ulink>.</para>

          <para>Example: <varname>CPUQuotaPeriodSec=10ms</varname> requests that the CPU quota is measured in periods of 10ms.</para>
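
          <para>Combining both settings, the sketch below (values chosen only for illustration) limits a
          hypothetical service to 20% of one CPU accounted in 10ms periods, so its processes may run for at
          most 2ms in every 10ms window rather than 20ms in every 100ms window. Since 2ms is above the 1ms
          minimum quota interval, no period adjustment is needed. On the unified hierarchy,
          <literal>cpu.max</literal> would then be expected to contain a quota of 2000 and a period of 10000
          microseconds.</para>

          <programlisting># Hypothetical service.
[Service]
# 20% of one CPU: at most 2ms of CPU time in every 10ms period.
CPUQuota=20%
CPUQuotaPeriodSec=10ms</programlisting>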
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>AllowedCPUs=</varname></term>
        <term><varname>StartupAllowedCPUs=</varname></term>

        <listitem>
          <para>These settings control the <option>cpuset</option> controller in the unified hierarchy.</para>

          <para>Restrict processes to be executed on specific CPUs. Takes a list of CPU indices or ranges separated by either
          whitespace or commas. CPU ranges are specified by the lower and upper CPU indices separated by a dash.</para>

          <para>Setting <varname>AllowedCPUs=</varname> or <varname>StartupAllowedCPUs=</varname> doesn't guarantee that all
          of the CPUs will be used by the processes, as the set may be further restricted by parent units. The effective
          configuration is reported as <varname>EffectiveCPUs=</varname>.</para>

          <para>While <varname>StartupAllowedCPUs=</varname> applies to the startup and shutdown phases of the system,
          <varname>AllowedCPUs=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupAllowedCPUs=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>

          <para>This setting is supported only with the unified control group hierarchy.</para>
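
          <para>As an illustration, the hypothetical service below is confined to CPUs 0 to 3 and 6 during
          normal runtime, but may use CPUs 0 to 7 during boot-up and shutdown. Because parent units may narrow
          the set further, the CPUs actually in effect can be inspected via the
          <varname>EffectiveCPUs=</varname> property, e.g. with <command>systemctl show -p EffectiveCPUs</command>
          on the unit.</para>

          <programlisting># Hypothetical service; CPU indices are illustrative.
[Service]
AllowedCPUs=0-3,6
StartupAllowedCPUs=0-7</programlisting>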
        </listitem>
      </varlistentry>

    </variablelist>

    </refsect2><refsect2><title>Memory Accounting and Control</title>

    <variablelist class='unit-directives'>

      <varlistentry>
        <term><varname>MemoryAccounting=</varname></term>

        <listitem>
          <para>This setting controls the <option>memory</option> controller in the unified hierarchy.</para>

          <para>Turn on process and kernel memory accounting for this
          unit. Takes a boolean argument. Note that turning on memory
          accounting for one unit will also implicitly turn it on for
          all units contained in the same slice and for all its parent
          slices and the units contained therein. The system default
          for this setting may be controlled with
          <varname>DefaultMemoryAccounting=</varname> in
          <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemoryMin=<replaceable>bytes</replaceable></varname>, <varname>MemoryLow=<replaceable>bytes</replaceable></varname></term>
        <term><varname>StartupMemoryLow=<replaceable>bytes</replaceable></varname>, <varname>DefaultStartupMemoryLow=<replaceable>bytes</replaceable></varname></term>

        <listitem>
          <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>

          <para>Specify the memory usage protection of the executed processes in this unit.
          When reclaiming memory, the unit is treated as if it were using less memory, so that memory is
          preferentially reclaimed from unprotected units.
          Using <varname>MemoryLow=</varname> results in a weaker protection where memory may still
          be reclaimed to avoid invoking the OOM killer in case there is no other reclaimable memory.</para>
          <para>
          For a protection to be effective, it is generally required to set a corresponding
          allocation on all ancestors, which is then distributed between children
          (with the exception of the root slice).
          Any <varname>MemoryMin=</varname> or <varname>MemoryLow=</varname> allocation that is not
          explicitly distributed to specific children is used to create a shared protection for all children.
          As this is a shared protection, the children will freely compete for the memory.</para>

          <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
          parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
          percentage value may be specified, which is taken relative to the installed physical memory on the
          system. If assigned the special value <literal>infinity</literal>, all available memory is protected, which may be
          useful in order to always inherit all of the protection afforded by ancestors.
          This controls the <literal>memory.min</literal> or <literal>memory.low</literal> control group attribute.
          For details about this control group attribute, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>

          <para>Units may have their children use a default <literal>memory.min</literal> or
          <literal>memory.low</literal> value by specifying <varname>DefaultMemoryMin=</varname> or
          <varname>DefaultMemoryLow=</varname>, which have the same semantics as
          <varname>MemoryMin=</varname> and <varname>MemoryLow=</varname>, or <varname>DefaultStartupMemoryLow=</varname>,
          which has the same semantics as <varname>StartupMemoryLow=</varname>.
          These settings do not affect <literal>memory.min</literal> or <literal>memory.low</literal>
          in the unit itself.
          Using them to set a default child allocation is only useful on kernels older than 5.7,
          which do not support the <literal>memory_recursiveprot</literal> cgroup2 mount option.</para>

          <para>While <varname>StartupMemoryLow=</varname> applies to the startup and shutdown phases of the system,
          <varname>MemoryLow=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupMemoryLow=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>
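
          <para>A sketch of the ancestor requirement described above, using hypothetical unit names: the slice,
          a direct child of the root slice, carries a protected allocation, and the service placed in it claims
          part of that allocation. Without the <varname>MemoryLow=</varname> line on the slice, the protection
          requested by the service would generally not be effective. The sizes are illustrative.</para>

          <programlisting># /etc/systemd/system/example.slice (hypothetical)
[Slice]
MemoryLow=2G

# /etc/systemd/system/example-db.service (hypothetical)
[Service]
Slice=example.slice
MemoryLow=1G</programlisting>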
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemoryHigh=<replaceable>bytes</replaceable></varname></term>
        <term><varname>StartupMemoryHigh=<replaceable>bytes</replaceable></varname></term>

        <listitem>
          <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>

          <para>Specify the throttling limit on memory usage of the executed processes in this unit. Memory usage may go
          above the limit if unavoidable, but the processes are heavily slowed down and memory is taken away
          aggressively in such cases. This is the main mechanism to control memory usage of a unit.</para>

          <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
          parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
          percentage value may be specified, which is taken relative to the installed physical memory on the
          system. If assigned the special value <literal>infinity</literal>, no memory throttling is applied. This controls the
          <literal>memory.high</literal> control group attribute. For details about this control group attribute, see
          <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>

          <para>While <varname>StartupMemoryHigh=</varname> applies to the startup and shutdown phases of the system,
          <varname>MemoryHigh=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupMemoryHigh=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>
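
          <para>For instance, the hypothetical service below is throttled once it uses more than a quarter of
          the installed physical memory during normal runtime, while no throttling is applied during boot-up and
          shutdown. The percentage form may be convenient when the same unit file is deployed on machines with
          different amounts of RAM; the values are only illustrative.</para>

          <programlisting># Hypothetical service.
[Service]
# Throttle above 25% of the installed physical memory at runtime.
MemoryHigh=25%
# No throttling during startup and shutdown.
StartupMemoryHigh=infinity</programlisting>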
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemoryMax=<replaceable>bytes</replaceable></varname></term>
        <term><varname>StartupMemoryMax=<replaceable>bytes</replaceable></varname></term>

        <listitem>
          <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>

          <para>Specify the absolute limit on memory usage of the executed processes in this unit. If memory usage
          cannot be contained under the limit, the out-of-memory killer is invoked inside the unit. It is recommended to
          use <varname>MemoryHigh=</varname> as the main control mechanism and use <varname>MemoryMax=</varname> as the
          last line of defense.</para>

          <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
          parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
          percentage value may be specified, which is taken relative to the installed physical memory on the system. If
          assigned the special value <literal>infinity</literal>, no memory limit is applied. This controls the
          <literal>memory.max</literal> control group attribute. For details about this control group attribute, see
          <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>

          <para>While <varname>StartupMemoryMax=</varname> applies to the startup and shutdown phases of the system,
          <varname>MemoryMax=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupMemoryMax=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>
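
          <para>Following the recommendation above, a sketch for a hypothetical service pairs a throttling limit
          with a somewhat higher hard limit, so that the out-of-memory killer only acts if throttling fails to
          contain the processes. With base 1024, <literal>1536M</literal> is 1536 MiB, i.e. 1610612736 bytes.
          The sizes are illustrative.</para>

          <programlisting># Hypothetical service.
[Service]
# Main control: heavy throttling above 1G.
MemoryHigh=1G
# Last line of defense: OOM killer inside the unit above 1536M (1.5G).
MemoryMax=1536M</programlisting>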
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemorySwapMax=<replaceable>bytes</replaceable></varname></term>
        <term><varname>StartupMemorySwapMax=<replaceable>bytes</replaceable></varname></term>

        <listitem>
          <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>

          <para>Specify the absolute limit on swap usage of the executed processes in this unit.</para>

          <para>Takes a swap size in bytes. If the value is suffixed with K, M, G or T, the specified swap size is
          parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. If assigned the
          special value <literal>infinity</literal>, no swap limit is applied. These settings control the
          <literal>memory.swap.max</literal> control group attribute. For details about this control group attribute,
          see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>

          <para>While <varname>StartupMemorySwapMax=</varname> applies to the startup and shutdown phases of the system,
          <varname>MemorySwapMax=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupMemorySwapMax=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>
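
          <para>As a sketch, the hypothetical service below may use at most 512M of swap during normal runtime,
          with no swap limit while the system boots up or shuts down. Setting the runtime value to 0 instead
          would be expected to keep the unit's memory out of swap entirely. The values are illustrative.</para>

          <programlisting># Hypothetical service.
[Service]
# At most 512M of swap during normal runtime.
MemorySwapMax=512M
# No swap limit during startup and shutdown.
StartupMemorySwapMax=infinity</programlisting>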
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemoryZSwapMax=<replaceable>bytes</replaceable></varname></term>
        <term><varname>StartupMemoryZSwapMax=<replaceable>bytes</replaceable></varname></term>

        <listitem>
          <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para>

          <para>Specify the absolute limit on zswap usage of the processes in this unit. Zswap is a lightweight compressed
          cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a
          dynamically allocated RAM-based memory pool. If the limit specified is hit, no entries from this unit will be
          stored in the pool until existing entries are faulted back or written out to disk. See the kernel's
          <ulink url="https://www.kernel.org/doc/html/latest/admin-guide/mm/zswap.html">Zswap</ulink> documentation for more details.</para>

          <para>Takes a size in bytes. If the value is suffixed with K, M, G or T, the specified size is
          parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. If assigned the
          special value <literal>infinity</literal>, no limit is applied. These settings control the
          <literal>memory.zswap.max</literal> control group attribute. For details about this control group attribute,
          see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para>

          <para>While <varname>StartupMemoryZSwapMax=</varname> applies to the startup and shutdown phases of the system,
          <varname>MemoryZSwapMax=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupMemoryZSwapMax=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>
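
          <para>For example, following the description above, a zero limit should mean that no pages of the
          hypothetical service below are stored in the zswap pool, so pages being swapped out bypass the
          compressed cache and go to the swap device, to the extent that swap usage is allowed at all.</para>

          <programlisting># Hypothetical service.
[Service]
# Keep this unit's pages out of the compressed zswap pool.
MemoryZSwapMax=0</programlisting>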
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>AllowedMemoryNodes=</varname></term>
        <term><varname>StartupAllowedMemoryNodes=</varname></term>

        <listitem>
          <para>These settings control the <option>cpuset</option> controller in the unified hierarchy.</para>

          <para>Restrict processes to allocating memory on specific NUMA nodes. Takes a list of memory NUMA node indices
          or ranges separated by either whitespace or commas. Memory NUMA node ranges are specified by the lower and upper
          NUMA node indices separated by a dash.</para>

          <para>Setting <varname>AllowedMemoryNodes=</varname> or <varname>StartupAllowedMemoryNodes=</varname> doesn't
          guarantee that all of the memory NUMA nodes will be used by the processes, as the set may be further restricted
          by parent units. The effective configuration is reported as <varname>EffectiveMemoryNodes=</varname>.</para>

          <para>While <varname>StartupAllowedMemoryNodes=</varname> applies to the startup and shutdown phases of the system,
          <varname>AllowedMemoryNodes=</varname> applies to normal runtime of the system and, if the former is not set, also to
          the startup and shutdown phases. Using <varname>StartupAllowedMemoryNodes=</varname> allows prioritizing specific services at
          boot-up and shutdown differently than during normal runtime.</para>

          <para>This setting is supported only with the unified control group hierarchy.</para>
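
          <para>As an illustration, assuming a machine on which CPUs 0 to 7 belong to NUMA node 0 (the mapping is
          hardware specific and only assumed here), the following sketch keeps both the execution and the memory
          allocations of a hypothetical service local to that node.</para>

          <programlisting># Hypothetical service.
# Assumption: CPUs 0-7 are attached to NUMA node 0 on this machine.
[Service]
AllowedCPUs=0-7
AllowedMemoryNodes=0</programlisting>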
499 </listitem>
500 </varlistentry>
501
502 </variablelist>
503
504 </refsect2><refsect2><title>Process Accounting and Control</title>
505
506 <variablelist class='unit-directives'>
507
03a7b521
LP
508 <varlistentry>
509 <term><varname>TasksAccounting=</varname></term>
510
511 <listitem>
253d0d59
ZJS
512 <para>This setting controls the <option>pids</option> controller in the unified hierarchy.</para>
513
396d298d
ZJS
514 <para>Turn on task accounting for this unit. Takes a boolean argument. If enabled, the kernel will
515 keep track of the total number of tasks in the unit and its children. This number includes both
516 kernel threads and userspace processes, with each thread counted individually. Note that turning on
517 tasks accounting for one unit will also implicitly turn it on for all units contained in the same
518 slice and for all its parent slices and the units contained therein. The system default for this
519 setting may be controlled with <varname>DefaultTasksAccounting=</varname> in
520 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
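          <para>A minimal sketch, assuming a hypothetical unit <filename>example.service</filename>: task
          accounting may be enabled for that unit alone in a drop-in, or for all units at once through the
          system default:
          <programlisting># Drop-in for a hypothetical unit: /etc/systemd/system/example.service.d/tasks.conf
[Service]
TasksAccounting=yes

# Alternatively, the system-wide default in /etc/systemd/system.conf:
[Manager]
DefaultTasksAccounting=yes</programlisting></para>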
521 </listitem>
522 </varlistentry>
523
524 <varlistentry>
525 <term><varname>TasksMax=<replaceable>N</replaceable></varname></term>
526
527 <listitem>
528 <para>This setting controls the <option>pids</option> controller in the unified hierarchy.</para>
529
          <para>Specify the maximum number of tasks that may be created in the unit. This ensures that the
          number of tasks accounted for the unit (see above) does not exceed a specific limit. This either takes
          an absolute number of tasks or a percentage value that is taken relative to the configured maximum
          number of tasks on the system. If assigned the special value <literal>infinity</literal>, no tasks
          limit is applied. This controls the <literal>pids.max</literal> control group attribute. For
          details about this control group attribute, see the
          <ulink url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#pid">pids controller</ulink>
          documentation.</para>
          <para>The system default for this setting may be controlled with
540 <varname>DefaultTasksMax=</varname> in
541 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
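          <para>For example, a minimal sketch for a service unit. The limit of 512 tasks and the percentage below
          are arbitrary values chosen for illustration:
          <programlisting>[Service]
# Absolute limit (arbitrary example value): at most 512 tasks, processes and threads alike.
TasksMax=512
# Alternatively, a limit relative to the system-wide maximum number of tasks (arbitrary example value):
#TasksMax=10%</programlisting></para>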
542 </listitem>
543 </varlistentry>
544
545 </variablelist>
546
547 </refsect2><refsect2><title>IO Accounting and Control</title>
548
549 <variablelist class='unit-directives'>
550
551 <varlistentry>
552 <term><varname>IOAccounting=</varname></term>
553
554 <listitem>
555 <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para>
556
          <para>Turn on block I/O accounting for this unit, if the unified control group hierarchy is used on the
          system. Takes a boolean argument. Note that turning on block I/O accounting for one unit will also implicitly
          turn it on for all units contained in the same slice and for all its parent slices and the units contained
          therein. The system default for this setting may be controlled with <varname>DefaultIOAccounting=</varname>
          in
          <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
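          <para>A minimal sketch, assuming a hypothetical unit <filename>example.service</filename>, of enabling
          block I/O accounting for it with a drop-in:
          <programlisting># Drop-in for a hypothetical unit: /etc/systemd/system/example.service.d/io.conf
[Service]
IOAccounting=yes</programlisting></para>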
563 </listitem>
564 </varlistentry>
565
566 <varlistentry>
567 <term><varname>IOWeight=<replaceable>weight</replaceable></varname></term>
568 <term><varname>StartupIOWeight=<replaceable>weight</replaceable></varname></term>
569
570 <listitem>
571 <para>These settings control the <option>io</option> controller in the unified hierarchy.</para>
572
573 <para>Set the default overall block I/O weight for the executed processes, if the unified control
574 group hierarchy is used on the system. Takes a single weight value (between 1 and 10000) to set the
575 default block I/O weight. This controls the <literal>io.weight</literal> control group attribute,
576 which defaults to 100. For details about this control group attribute, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO
578 Interface Files</ulink>. The available I/O bandwidth is split up among all units within one slice
579 relative to their block I/O weight. A higher weight means more I/O bandwidth, a lower weight means
580 less.</para>
          <para>While <varname>StartupIOWeight=</varname> applies to the startup and shutdown phases of the system,
          <varname>IOWeight=</varname> applies to the later runtime of the system, and, if the former is not set,
          also to the startup and shutdown phases. This allows prioritizing specific services at boot-up and
          shutdown differently than during runtime.</para>
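          <para>For example, a sketch for a service that should get a larger share of the block I/O bandwidth
          than sibling units left at the default weight of 100, and a larger share still during boot-up and
          shutdown. The weights are arbitrary illustration values:
          <programlisting>[Service]
# Arbitrary example weights; siblings left at the default weight of 100 get proportionally less.
IOWeight=200
StartupIOWeight=1000</programlisting></para>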
588 </listitem>
589 </varlistentry>
590
591 <varlistentry>
592 <term><varname>IODeviceWeight=<replaceable>device</replaceable> <replaceable>weight</replaceable></varname></term>
593
594 <listitem>
595 <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para>
596
          <para>Set the per-device overall block I/O weight for the executed processes, if the unified control group
          hierarchy is used on the system. Takes a space-separated pair of a file path and a weight value between 1 and
          10000 to specify the device-specific weight (example: <literal>/dev/sda 1000</literal>). The file path may be
          specified as a path to a block device node or as any other file, in which case the backing block device of
          the file system of that file is determined. This controls the <literal>io.weight</literal> control group
          attribute, which defaults to 100. Use this option multiple times to set weights for multiple devices. For
          details about this control group attribute, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.</para>
          <para>The specified device node should reference a block device that has an I/O scheduler
          associated, i.e. it should not refer to a partition or loopback block device, but to the originating
          physical device. When a path to a regular file or directory is specified, an attempt is made to
          discover the originating device backing the file system of the specified path. This works
          correctly only in simple cases, where the file system is placed directly on a partition or
          physical block device, or where simple 1:1 encryption using dm-crypt/LUKS is used. This discovery
          does not cover complex storage setups, in particular RAID and volume management storage devices.</para>
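          <para>A minimal sketch with per-device weights. The device names are hypothetical, and each refers to a
          whole physical disk rather than a partition, as described above:
          <programlisting>[Service]
# Hypothetical whole-disk device nodes (not partitions).
IODeviceWeight=/dev/sda 1000
IODeviceWeight=/dev/nvme0n1 50</programlisting></para>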
613 </listitem>
614 </varlistentry>
615
616 <varlistentry>
617 <term><varname>IOReadBandwidthMax=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname></term>
618 <term><varname>IOWriteBandwidthMax=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname></term>
619
620 <listitem>
621 <para>These settings control the <option>io</option> controller in the unified hierarchy.</para>
622
          <para>Set the per-device overall block I/O bandwidth maximum limit for the executed processes, if the unified
          control group hierarchy is used on the system. This limit is not work-conserving: the executed processes
          are not allowed to use more bandwidth even if the device has idle capacity. Takes a space-separated pair of a
          file path and a bandwidth value (in bytes per second) to specify the device-specific bandwidth. The file path
          may be a path to a block device node, or any other file, in which case the backing block device of the file
          system of that file is used. If the bandwidth is suffixed with K, M, G, or T, the specified bandwidth is
          parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the base of 1000 (example:
          <literal>/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M</literal>). This controls the
          <literal>io.max</literal> control group attribute. Use this option multiple times to set bandwidth limits
          for multiple devices. For details about this control group attribute, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.
          </para>
635
          <para>The same restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para>
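          <para>For example, a sketch that caps reads from one disk at 20 megabytes (base 1000) per second and writes
          to the device backing a directory at 5 megabytes per second. The device name and the directory
          <filename>/var/lib/example</filename> are hypothetical:
          <programlisting>[Service]
# Reads from a hypothetical disk: at most 20 megabytes (20000000 bytes) per second.
IOReadBandwidthMax=/dev/sda 20M
# Writes to the device backing a hypothetical directory: at most 5 megabytes per second.
IOWriteBandwidthMax=/var/lib/example 5M</programlisting></para>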
637 </listitem>
638 </varlistentry>
639
640 <varlistentry>
641 <term><varname>IOReadIOPSMax=<replaceable>device</replaceable> <replaceable>IOPS</replaceable></varname></term>
642 <term><varname>IOWriteIOPSMax=<replaceable>device</replaceable> <replaceable>IOPS</replaceable></varname></term>
643
644 <listitem>
645 <para>These settings control the <option>io</option> controller in the unified hierarchy.</para>
646
          <para>Set the per-device overall block I/O IOs-Per-Second maximum limit for the executed processes, if the
          unified control group hierarchy is used on the system. This limit is not work-conserving: the executed
          processes are not allowed to use more IOPS even if the device has idle capacity. Takes a space-separated pair
          of a file path and an IOPS value to specify the device-specific IOPS. The file path may be a path to a block
          device node, or any other file, in which case the backing block device of the file system of that file is
          used. If the IOPS value is suffixed with K, M, G, or T, it is parsed as KiloIOPS, MegaIOPS, GigaIOPS, or
          TeraIOPS, respectively, to the base of 1000 (example:
          <literal>/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K</literal>). This controls the
          <literal>io.max</literal> control group attribute. Use this option multiple times to set IOPS limits for
          multiple devices. For details about this control group attribute, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.
          </para>
659
          <para>The same restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para>
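          <para>A minimal sketch that limits a hypothetical <filename>/dev/sda</filename> to 1000 read operations
          per second (written as <literal>1K</literal>) and 500 write operations per second:
          <programlisting>[Service]
# Hypothetical device; 1K is parsed as 1000 IOPS.
IOReadIOPSMax=/dev/sda 1K
IOWriteIOPSMax=/dev/sda 500</programlisting></para>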
        </listitem>
662 </varlistentry>
663
664 <varlistentry>
665 <term><varname>IODeviceLatencyTargetSec=<replaceable>device</replaceable> <replaceable>target</replaceable></varname></term>
666
667 <listitem>
668 <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para>
669
          <para>Set the per-device average target I/O latency for the executed processes, if the unified control group
          hierarchy is used on the system. Takes a file path and a timespan separated by a space to specify
          the device-specific latency target (example: <literal>/dev/sda 25ms</literal>). The file path may be specified
          as a path to a block device node or as any other file, in which case the backing block device of the file
          system of that file is determined. This controls the <literal>io.latency</literal> control group
          attribute. Use this option multiple times to set latency targets for multiple devices. For details about this
          control group attribute, see <ulink
          url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.</para>
          <para>Implies <literal>IOAccounting=yes</literal>.</para>
680
          <para>This setting is supported only if the unified control group hierarchy is used.</para>
682
          <para>The same restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para>
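          <para>For example, a sketch that sets latency targets for two hypothetical devices. Since this setting
          implies <literal>IOAccounting=yes</literal>, accounting does not need to be enabled separately:
          <programlisting>[Service]
# Hypothetical devices and example targets.
IODeviceLatencyTargetSec=/dev/sda 25ms
IODeviceLatencyTargetSec=/dev/nvme0n1 5ms</programlisting></para>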
        </listitem>
685 </varlistentry>
686
687 </variablelist>
688
689 </refsect2><refsect2><title>Network Accounting and Control</title>
690
691 <variablelist class='unit-directives'>
692
693 <varlistentry>
694 <term><varname>IPAccounting=</varname></term>
695
696 <listitem>
697 <para>Takes a boolean argument. If true, turns on IPv4 and IPv6 network traffic accounting for packets sent
698 or received by the unit. When this option is turned on, all IPv4 and IPv6 sockets created by any process of
699 the unit are accounted for.</para>
700
701 <para>When this option is used in socket units, it applies to all IPv4 and IPv6 sockets
702 associated with it (including both listening and connection sockets where this applies). Note that for
703 socket-activated services, this configuration setting and the accounting data of the service unit and the
704 socket unit are kept separate, and displayed separately. No propagation of the setting and the collected
705 statistics is done, in either direction. Moreover, any traffic sent or received on any of the socket unit's
706 sockets is accounted to the socket unit — and never to the service unit it might have activated, even if the
707 socket is used by it.</para>
708
709 <para>The system default for this setting may be controlled with <varname>DefaultIPAccounting=</varname> in
710 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
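          <para>Because the accounting data of a socket unit and of the service it activates are kept separate, a
          sketch for a hypothetical socket-activated <filename>example.service</filename> enables accounting on
          both units:
          <programlisting># example.socket (hypothetical unit)
[Socket]
IPAccounting=yes

# example.service (hypothetical unit)
[Service]
IPAccounting=yes</programlisting></para>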
711 </listitem>
712 </varlistentry>
713
714 <varlistentry>
        <term><varname>IPAddressAllow=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term>
716 <term><varname>IPAddressDeny=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term>
717
718 <listitem>
          <para>Turn on network traffic filtering for IP packets sent and received over
          <constant>AF_INET</constant> and <constant>AF_INET6</constant> sockets. Both directives take a
          space-separated list of IPv4 or IPv6 addresses, each optionally suffixed with an address prefix
          length in bits after a <literal>/</literal> character. If the suffix is omitted, the address is
          considered a host address, i.e. the filter covers the whole address (32 bits for IPv4, 128 bits for
          IPv6).</para>
725
          <para>The access lists configured with this option are applied to all sockets created by processes
          of this unit (or in the case of socket units, associated with it). The lists are implicitly
          combined with any lists configured for any of the parent slice units this unit might be a member
          of. By default both access lists are empty. Both ingress and egress traffic is filtered by these
          settings: for ingress traffic the source IP address is checked against these access lists, and for
          egress traffic the destination IP address is checked. The following rules are applied in
          turn:</para>
733
734 <itemizedlist>
          <listitem><para>Access is granted when the checked IP address matches an entry in the
          <varname>IPAddressAllow=</varname> list.</para></listitem>

          <listitem><para>Otherwise, access is denied when the checked IP address matches an entry in the
          <varname>IPAddressDeny=</varname> list.</para></listitem>

          <listitem><para>Otherwise, access is granted.</para></listitem>
742 </itemizedlist>
743
          <para>In order to implement an allow-listing IP firewall, it is recommended to use an
          <varname>IPAddressDeny=</varname><constant>any</constant> setting on an upper-level slice unit,
          such as the root slice <filename>-.slice</filename> or the slice containing all system services,
          <filename>system.slice</filename> (see
          <citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry>
          for details on these slice units), plus individual per-service <varname>IPAddressAllow=</varname>
          lines permitting network access only to the relevant services.</para>
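          <para>A minimal sketch of such a setup, assuming a hypothetical <filename>example.service</filename>
          that should only communicate over the local loopback and with a hypothetical local network
          <literal>192.168.1.0/24</literal>. The drop-in file names are illustrative. Other services in
          <filename>system.slice</filename> without their own <varname>IPAddressAllow=</varname> entries are
          left without IP network access:
          <programlisting># /etc/systemd/system/system.slice.d/ip-deny.conf (illustrative file name)
[Slice]
IPAddressDeny=any

# /etc/systemd/system/example.service.d/ip-allow.conf (hypothetical unit)
[Service]
IPAddressAllow=localhost
# Hypothetical local network.
IPAddressAllow=192.168.1.0/24</programlisting></para>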
752 <para>Note that for socket-activated services, the IP access list configured on the socket unit
753 applies to all sockets associated with it directly, but not to any sockets created by the
754 ultimately activated services for it. Conversely, the IP access list configured for the service is
755 not applied to any sockets passed into the service via socket activation. Thus, it is usually a
756 good idea to replicate the IP access lists on both the socket and the service unit. Nevertheless,
757 it may make sense to maintain one list more open and the other one more restricted, depending on
          the use case.</para>
759
          <para>If these settings are used multiple times in the same unit, the specified lists are combined. If an
          empty string is assigned to these settings, the specific access list is reset and all previous settings are undone.</para>
762
763 <para>In place of explicit IPv4 or IPv6 address and prefix length specifications a small set of symbolic
764 names may be used. The following names are defined:</para>
765
766 <table>
767 <title>Special address/network names</title>
768
769 <tgroup cols='3'>
770 <colspec colname='name'/>
771 <colspec colname='definition'/>
772 <colspec colname='meaning'/>
773
774 <thead>
775 <row>
776 <entry>Symbolic Name</entry>
777 <entry>Definition</entry>
778 <entry>Meaning</entry>
779 </row>
780 </thead>
781
782 <tbody>
783 <row>
784 <entry><constant>any</constant></entry>
785 <entry>0.0.0.0/0 ::/0</entry>
786 <entry>Any host</entry>
787 </row>
788
789 <row>
790 <entry><constant>localhost</constant></entry>
791 <entry>127.0.0.0/8 ::1/128</entry>
792 <entry>All addresses on the local loopback</entry>
793 </row>
794
795 <row>
796 <entry><constant>link-local</constant></entry>
797 <entry>169.254.0.0/16 fe80::/64</entry>
798 <entry>All link-local IP addresses</entry>
799 </row>
800
801 <row>
802 <entry><constant>multicast</constant></entry>
803 <entry>224.0.0.0/4 ff00::/8</entry>
804 <entry>All IP multicasting addresses</entry>
805 </row>
806 </tbody>
807 </tgroup>
808 </table>
809
810 <para>Note that these settings might not be supported on some systems (for example if eBPF control group
811 support is not enabled in the underlying kernel or container manager). These settings will have no effect in
812 that case. If compatibility with such systems is desired it is hence recommended to not exclusively rely on
813 them for IP security.</para>
814
815 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
816 </listitem>
817 </varlistentry>
818
819 <varlistentry>
820 <term><varname>SocketBindAllow=<replaceable>bind-rule</replaceable></varname></term>
821 <term><varname>SocketBindDeny=<replaceable>bind-rule</replaceable></varname></term>
822
823 <listitem>
824 <para>Allow or deny binding a socket address to a socket by matching it with the <replaceable>bind-rule</replaceable> and
825 applying a corresponding action if there is a match.</para>
826
827 <para><replaceable>bind-rule</replaceable> describes socket properties such as <replaceable>address-family</replaceable>,
828 <replaceable>transport-protocol</replaceable> and <replaceable>ip-ports</replaceable>.</para>
830 <para><replaceable>bind-rule</replaceable> :=
831 { [<replaceable>address-family</replaceable><constant>:</constant>][<replaceable>transport-protocol</replaceable><constant>:</constant>][<replaceable>ip-ports</replaceable>] | <constant>any</constant> }</para>
          <para><replaceable>address-family</replaceable> := { <constant>ipv4</constant> | <constant>ipv6</constant> }</para>

          <para><replaceable>transport-protocol</replaceable> := { <constant>tcp</constant> | <constant>udp</constant> }</para>
837 <para><replaceable>ip-ports</replaceable> := { <replaceable>ip-port</replaceable> | <replaceable>ip-port-range</replaceable> }</para>
838
839 <para>An optional <replaceable>address-family</replaceable> expects <constant>ipv4</constant> or <constant>ipv6</constant> values.
840 If not specified, a rule will be matched for both IPv4 and IPv6 addresses and applied depending on other socket fields, e.g. <replaceable>transport-protocol</replaceable>,
841 <replaceable>ip-port</replaceable>.</para>
842
843 <para>An optional <replaceable>transport-protocol</replaceable> expects <constant>tcp</constant> or <constant>udp</constant> transport protocol names.
844 If not specified, a rule will be matched for any transport protocol.</para>
845
          <para>An optional <replaceable>ip-port</replaceable> value must lie within the interval 1…65535, inclusive, i.e.
          the dynamic port <constant>0</constant> is not allowed. A range of sequential ports is described by
          <replaceable>ip-port-range</replaceable> := <replaceable>ip-port-low</replaceable><constant>-</constant><replaceable>ip-port-high</replaceable>,
          where <replaceable>ip-port-low</replaceable> is smaller than or equal to <replaceable>ip-port-high</replaceable>
          and both are within 1…65535 inclusive.</para>
851
852 <para>A special value <constant>any</constant> can be used to apply a rule to any address family, transport protocol and any port with a positive value.</para>
853
          <para>To specify multiple rules, assign <varname>SocketBindAllow=</varname> or <varname>SocketBindDeny=</varname> multiple times.
          To clear the existing assignments, pass an empty <varname>SocketBindAllow=</varname> or <varname>SocketBindDeny=</varname>
          assignment.</para>
857
          <para>For each of <varname>SocketBindAllow=</varname> and <varname>SocketBindDeny=</varname>, the maximum allowed number of assignments is
          <constant>128</constant>.</para>
860
861 <itemizedlist>
862 <listitem><para>Binding to a socket is allowed when a socket address matches an entry in the
863 <varname>SocketBindAllow=</varname> list.</para></listitem>
864
865 <listitem><para>Otherwise, binding is denied when the socket address matches an entry in the
866 <varname>SocketBindDeny=</varname> list.</para></listitem>
867
868 <listitem><para>Otherwise, binding is allowed.</para></listitem>
869 </itemizedlist>
870
871 <para>The feature is implemented with <constant>cgroup/bind4</constant> and <constant>cgroup/bind6</constant> cgroup-bpf hooks.</para>
872 <para>Examples:<programlisting>…
# Allow binding IPv6 socket addresses with a port greater than or equal to 10000.
[Service]
SocketBindAllow=ipv6:10000-65535
SocketBindDeny=any

# Allow binding IPv4 and IPv6 socket addresses with 1234 and 4321 ports.
[Service]
SocketBindAllow=1234
SocketBindAllow=4321
SocketBindDeny=any

# Deny binding IPv6 socket addresses.
[Service]
SocketBindDeny=ipv6

# Deny binding IPv4 and IPv6 socket addresses.
[Service]
SocketBindDeny=any

# Allow binding only over TCP
[Service]
SocketBindAllow=tcp
SocketBindDeny=any

# Allow binding only over IPv6/TCP
[Service]
SocketBindAllow=ipv6:tcp
SocketBindDeny=any

# Allow binding ports within 10000-65535 range over IPv4/UDP.
[Service]
SocketBindAllow=ipv4:udp:10000-65535
SocketBindDeny=any
…</programlisting></para>
907
908 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
909 </listitem>
910 </varlistentry>
911
912 <varlistentry>
913 <term><varname>RestrictNetworkInterfaces=</varname></term>
914
915 <listitem>
916 <para>Takes a list of space-separated network interface names. This option restricts the network
917 interfaces that processes of this unit can use. By default processes can only use the network interfaces
918 listed (allow-list). If the first character of the rule is <literal>~</literal>, the effect is inverted:
919 the processes can only use network interfaces not listed (deny-list).
920 </para>
921
          <para>This option can appear multiple times, in which case the network interface names are merged. If the
          empty string is assigned, the set is reset and all prior assignments have no effect.
          </para>
925
          <para>If you specify both types of this option (i.e. allow-listing and deny-listing), the first encountered
          will take precedence and will dictate the default action (allow vs. deny). Subsequent occurrences of this
          option will then add or remove the listed network interface names from the set, depending on their type and the
          default action.
          </para>
931
          <para>The loopback interface (<literal>lo</literal>) is not treated in any special way; you have to configure it explicitly
          in the unit file.
          </para>
935 <para>Example 1: allow-list
936 <programlisting>
937RestrictNetworkInterfaces=eth1
938RestrictNetworkInterfaces=eth2</programlisting>
          Programs in the unit will only be able to use the eth1 and eth2 network
          interfaces.
941 </para>
942
943 <para>Example 2: deny-list
944 <programlisting>
945RestrictNetworkInterfaces=~eth1 eth2</programlisting>
946 Programs in the unit will be able to use any network interface but eth1 and eth2.
947 </para>
948
949 <para>Example 3: mixed
950 <programlisting>
951RestrictNetworkInterfaces=eth1 eth2
952RestrictNetworkInterfaces=~eth1</programlisting>
          Programs in the unit will only be able to use the eth2 network interface.
954 </para>
955
956 <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
957 </listitem>
958 </varlistentry>
959
960 </variablelist>
961
962 </refsect2><refsect2><title>BPF Programs</title>
963
964 <variablelist class='unit-directives'>
965
966 <varlistentry>
967 <term><varname>IPIngressFilterPath=<replaceable>BPF_FS_PROGRAM_PATH</replaceable></varname></term>
968 <term><varname>IPEgressFilterPath=<replaceable>BPF_FS_PROGRAM_PATH</replaceable></varname></term>
969
970 <listitem>
971 <para>Add custom network traffic filters implemented as BPF programs, applying to all IP packets
972 sent and received over <constant>AF_INET</constant> and <constant>AF_INET6</constant> sockets.
973 Takes an absolute path to a pinned BPF program in the BPF virtual filesystem (<filename>/sys/fs/bpf/</filename>).
974 </para>
975
          <para>The filters configured with this option are applied to all sockets created by processes
          of this unit (or in the case of socket units, associated with it). The filters are loaded in addition
          to the filters of any parent slice units this unit might be a member of, as well as any
          <varname>IPAddressAllow=</varname> and <varname>IPAddressDeny=</varname> filters in any of these units.
          By default there are no filters specified.</para>
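          <para>A minimal sketch, assuming two BPF programs have already been loaded and pinned (e.g. with
          <command>bpftool</command>) at the hypothetical paths below:
          <programlisting>[Service]
# Hypothetical paths of programs pinned beforehand in the BPF file system.
IPIngressFilterPath=/sys/fs/bpf/example-ingress
IPEgressFilterPath=/sys/fs/bpf/example-egress</programlisting></para>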
981
          <para>If these settings are used multiple times in the same unit, all the specified programs are attached. If an
          empty string is assigned to these settings, the program list is reset and all previously specified programs are ignored.</para>
984
          <para>If the path <replaceable>BPF_FS_PROGRAM_PATH</replaceable> in an <varname>IPIngressFilterPath=</varname> assignment
          is already being handled by a <varname>BPFProgram=</varname> ingress hook, e.g.
          <varname>BPFProgram=</varname><constant>ingress</constant>:<replaceable>BPF_FS_PROGRAM_PATH</replaceable>,
          the assignment will still be considered valid and the program will be attached to a cgroup. The same applies to an
          <varname>IPEgressFilterPath=</varname> path and the <constant>egress</constant> hook.</para>
990
991 <para>Note that for socket-activated services, the IP filter programs configured on the socket unit apply to
992 all sockets associated with it directly, but not to any sockets created by the ultimately activated services
993 for it. Conversely, the IP filter programs configured for the service are not applied to any sockets passed into
          the service via socket activation. Thus, it is usually a good idea to replicate the IP filter programs on both
          the socket and the service unit; however, it often makes sense to maintain one configuration more open and the other
          one more restricted, depending on the use case.</para>
997
998 <para>Note that these settings might not be supported on some systems (for example if eBPF control group
999 support is not enabled in the underlying kernel or container manager). These settings will fail the service in
1000 that case. If compatibility with such systems is desired it is hence recommended to attach your filter manually
1001 (requires <varname>Delegate=</varname><constant>yes</constant>) instead of using this setting.</para>
1002 </listitem>
1003 </varlistentry>
1004
1005 <varlistentry>
        <term><varname>BPFProgram=<replaceable>type</replaceable>:<replaceable>program-path</replaceable></varname></term>
        <listitem>
1008 <para><varname>BPFProgram=</varname> allows attaching custom BPF programs to the cgroup of a
          unit. (This generalizes the functionality exposed via <varname>IPEgressFilterPath=</varname>
          and <varname>IPIngressFilterPath=</varname> to other hooks.) Cgroup-bpf hooks in the form of BPF
1011 programs loaded to the BPF filesystem are attached with cgroup-bpf attach flags determined by the
1012 unit. For details about attachment types and flags see <ulink
1013 url="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h"><filename>bpf.h</filename></ulink>. Also
1014 refer to the general <ulink url="https://docs.kernel.org/bpf/">BPF documentation</ulink>.</para>
1015
          <para>The specification of a BPF program consists of a pair of BPF program type and program path in
          the file system, with <literal>:</literal> as the separator:
          <replaceable>type</replaceable>:<replaceable>program-path</replaceable>.</para>
1019
1020 <para>The BPF program type is equivalent to the BPF attach type used in
1021 <command>bpftool</command>. It may be one of <constant>egress</constant>,
1022 <constant>ingress</constant>, <constant>sock_create</constant>, <constant>sock_ops</constant>,
1023 <constant>device</constant>, <constant>bind4</constant>, <constant>bind6</constant>,
1024 <constant>connect4</constant>, <constant>connect6</constant>, <constant>post_bind4</constant>,
1025 <constant>post_bind6</constant>, <constant>sendmsg4</constant>, <constant>sendmsg6</constant>,
1026 <constant>sysctl</constant>, <constant>recvmsg4</constant>, <constant>recvmsg6</constant>,
1027 <constant>getsockopt</constant>, <constant>setsockopt</constant>.</para>
1028
1029 <para>The specified program path must be an absolute path referencing a BPF program inode in the
1030 bpffs file system (which generally means it must begin with <filename>/sys/fs/bpf/</filename>). If
1031 a specified program does not exist (i.e. has not been uploaded to the BPF subsystem of the kernel
1032 yet), it will not be installed but unit activation will continue (a warning will be printed to the
1033 logs).</para>
1034
1035 <para>Setting <varname>BPFProgram=</varname> to an empty value makes previous assignments
1036 ineffective.</para>
1037
1038 <para>Multiple assignments of the same program type/path pair have the same effect as a single
1039 assignment: the program will be attached just once.</para>
1040
          <para>If the BPF <constant>egress</constant> program pinned at <replaceable>program-path</replaceable> is already being
          handled by <varname>IPEgressFilterPath=</varname>, the <varname>BPFProgram=</varname>
          assignment will still be considered valid and the program will be attached to a cgroup.
          The same applies to the <constant>ingress</constant> hook and an <varname>IPIngressFilterPath=</varname> assignment.</para>
1045
          <para>BPF programs passed with <varname>BPFProgram=</varname> are attached to the cgroup of a unit
          with the BPF attach flag <constant>multi</constant>, which allows further attachments of the same
          <replaceable>type</replaceable> within the cgroup hierarchy topped by the unit cgroup.</para>
1049
          <para>Examples:<programlisting>BPFProgram=egress:/sys/fs/bpf/egress-hook
BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook
</programlisting></para>
1053 </listitem>
1054 </varlistentry>
1055
1056 </variablelist>
1057
1058 </refsect2><refsect2><title>Device Access</title>
1059
1060 <variablelist class='unit-directives'>
1061
1062 <varlistentry>
1063 <term><varname>DeviceAllow=</varname></term>
1064
1065 <listitem>
3ff668cb
LP
1066 <para>Control access to specific device nodes by the executed processes. Takes two space-separated
1067 strings: a device node specifier followed by a combination of <constant>r</constant>,
1068 <constant>w</constant>, <constant>m</constant> to control <emphasis>r</emphasis>eading,
0923b425 1069 <emphasis>w</emphasis>riting, or creation of the specific device nodes by the unit
6d48c7cf
LP
1070 (<emphasis>m</emphasis>knod), respectively. This functionality is implemented using eBPF
1071 filtering.</para>
3ff668cb 1072
a14e028e
ZJS
1073 <para>When access to <emphasis>all</emphasis> physical devices should be disallowed,
1074 <varname>PrivateDevices=</varname> may be used instead. See
1075 <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
1076 </para>
1077
3ff668cb
LP
1078 <para>The device node specifier is either a path to a device node in the file system, starting with
1079 <filename>/dev/</filename>, or a string starting with either <literal>char-</literal> or
1080 <literal>block-</literal> followed by a device group name, as listed in
6b000af4 1081 <filename>/proc/devices</filename>. The latter is useful to allow-list all current and future
3ff668cb
LP
1082 devices belonging to a specific device group at once. The device group is matched according to
1083 filename globbing rules, you may hence use the <literal>*</literal> and <literal>?</literal>
1084 wildcards. (Note that such globbing wildcards are not available for device node path
1085 specifications!) In order to match device nodes by numeric major/minor, use device node paths in
1086 the <filename>/dev/char/</filename> and <filename>/dev/block/</filename> directories. However,
1087 matching devices by major/minor is generally not recommended as assignments are neither stable nor
1088 portable between systems or different kernel versions.</para>
1089
1090 <para>Examples: <filename>/dev/sda5</filename> is a path to a device node, referring to an ATA or
1091 SCSI block device. <literal>char-pts</literal> and <literal>char-alsa</literal> are specifiers for
1092 all pseudo TTYs and all ALSA sound devices, respectively. <literal>char-cpu/*</literal> is a
1093 specifier matching all CPU related device groups.</para>
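
          <para>A sketch combining both forms of device node specifier with different access modes (the
          devices chosen are illustrative only):<programlisting>[Service]
# Read-only access to one specific block device node
DeviceAllow=/dev/sda5 r
# Read and write access to all pseudo TTYs, matched by device group
DeviceAllow=char-pts rw
# Read, write and mknod access to all ALSA sound devices
DeviceAllow=char-alsa rwm
</programlisting></para>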

          <para>Note that allow lists defined this way should only reference device groups which are
          resolvable at the time the unit is started. Any device groups not resolvable then are not added to
          the device allow list. In order to work around this limitation, consider extending service units
          with a pair of <command>After=modprobe@xyz.service</command> and
          <command>Wants=modprobe@xyz.service</command> lines that load the necessary kernel module
          implementing the device group if missing.
          Example: <programlisting>…
[Unit]
Wants=modprobe@loop.service
After=modprobe@loop.service

[Service]
DeviceAllow=block-loop
DeviceAllow=/dev/loop-control
…</programlisting></para>

          <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>DevicePolicy=auto|closed|strict</varname></term>

        <listitem>
          <para>
            Control the policy for allowing device access:
          </para>
          <variablelist>
            <varlistentry>
              <term><option>strict</option></term>
              <listitem>
                <para>means to only allow types of access that are
                explicitly specified.</para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term><option>closed</option></term>
              <listitem>
                <para>in addition, allows access to standard pseudo
                devices including
                <filename>/dev/null</filename>,
                <filename>/dev/zero</filename>,
                <filename>/dev/full</filename>,
                <filename>/dev/random</filename>, and
                <filename>/dev/urandom</filename>.
                </para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term><option>auto</option></term>
              <listitem>
                <para>
                  in addition, allows access to all devices if no
                  explicit <varname>DeviceAllow=</varname> is present.
                  This is the default.
                </para>
              </listitem>
            </varlistentry>
          </variablelist>
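
          <para>For example, the following sketch restricts a service to the standard pseudo devices listed
          above plus one explicitly allowed device (the serial device path is hypothetical):<programlisting>[Service]
DevicePolicy=closed
# Hypothetical USB serial adapter; all other physical devices stay inaccessible
DeviceAllow=/dev/ttyUSB0 rw
</programlisting></para>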

          <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/>
        </listitem>
      </varlistentry>

    </variablelist>

    </refsect2><refsect2><title>Control Group Management</title>

    <variablelist class='unit-directives'>

      <varlistentry>
        <term><varname>Slice=</varname></term>

        <listitem>
          <para>The name of the slice unit to place the unit
          in. Defaults to <filename>system.slice</filename> for all
          non-instantiated units of all unit types (except for slice
          units themselves, see below). Instance units are by default
          placed in a subslice of <filename>system.slice</filename>
          that is named after the template name.</para>

          <para>This option may be used to arrange systemd units in a
          hierarchy of slices, each of which might have resource
          settings applied.</para>

          <para>For units of type slice, the only accepted value for
          this setting is the parent slice. Since the name of a slice
          unit implies the parent slice, it is hence redundant to ever
          set this parameter directly for slice units.</para>

          <para>Special care should be taken when relying on the default slice assignment in templated service units
          that have <varname>DefaultDependencies=no</varname> set. See
          <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>, section
          "Default Dependencies", for details.</para>

        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>Delegate=</varname></term>

        <listitem>
          <para>Turns on delegation of further resource control partitioning to processes of the unit. Units
          where this is enabled may create and manage their own private subhierarchy of control groups below
          the control group of the unit itself. For unprivileged services (i.e. those using the
          <varname>User=</varname> setting) the unit's control group will be made accessible to the relevant
          user.</para>

          <para>When enabled, the service manager will refrain from manipulating control groups or moving
          processes below the unit's control group, so that a clear concept of ownership is established: the
          control group tree at the level of the unit's control group and above (i.e. towards the root
          control group) is owned and managed by the service manager of the host, while the control group
          tree below the unit's control group is owned and managed by the unit itself.</para>

          <para>Takes either a boolean argument or a (possibly empty) list of control group controller names.
          If true, delegation is turned on, and all supported controllers are enabled for the unit, making
          them available to the unit's processes for management. If false, delegation is turned off entirely
          (and no additional controllers are enabled). If set to a list of controllers, delegation is turned
          on, and the specified controllers are enabled for the unit. Assigning the empty string will enable
          delegation, but reset the list of controllers, and all assignments prior to this will have no
          effect. Note that additional controllers other than the ones specified might be made available as
          well, depending on configuration of the containing slice unit or other units contained in it.
          Defaults to false.</para>
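
          <para>For example, a service that manages its own control group subtree but only needs the memory and
          process-count controllers might use the following (a sketch; the binary path is hypothetical, and
          whether the listed controllers are actually available depends on the kernel and on the hierarchy
          in use):<programlisting>[Service]
# Hypothetical program that creates and manages its own sub-cgroups
ExecStart=/usr/bin/my-container-manager
# Enable delegation, limited to the memory and pids controllers
Delegate=memory pids
</programlisting></para>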

          <para>Note that controller delegation to less privileged code is only safe on the unified control
          group hierarchy. Accordingly, access to the specified controllers will not be granted to
          unprivileged services on the legacy hierarchy, even when requested.</para>

          <xi:include href="supported-controllers.xml" xpointer="controllers-text" />

          <para>Not all of these controllers are available on all kernels, however, and some are specific to
          the unified hierarchy while others are specific to the legacy hierarchy. Also note that the kernel
          might support further controllers, which aren't covered here yet as delegation is either not
          supported at all for them or not defined cleanly.</para>

          <para>Note that because of the hierarchical nature of the cgroup hierarchy, any controllers that are
          delegated will be enabled for the parent and sibling units of the unit with delegation.</para>

          <para>For further details on the delegation model consult <ulink
          url="https://systemd.io/CGROUP_DELEGATION">Control Group APIs and Delegation</ulink>.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>DelegateSubgroup=</varname></term>

        <listitem>
          <para>Place unit processes in the specified subgroup of the unit's control group. Takes a valid
          control group name (not a path!) as parameter, or an empty string to turn this feature
          off. Defaults to off. The control group name must be usable as a filename and avoid conflicts with
          the kernel's control group attribute files (e.g. <filename>cgroup.procs</filename> is not an
          acceptable name, since the kernel exposes a native control group attribute file by that name). This
          option has no effect unless control group delegation is turned on via <varname>Delegate=</varname>,
          see above. Note that this setting only applies to "main" processes of a unit, i.e. for services to
          <varname>ExecStart=</varname>, but not for <varname>ExecReload=</varname> and similar. If
          delegation is enabled, the latter are always placed inside a subgroup named
          <filename>.control</filename>. The specified subgroup is automatically created (and potentially
          ownership is passed to the unit's configured user/group) when a process is started in it.</para>

          <para>This option is useful to avoid manually moving the invoked process into a subgroup after it
          has been started. Since no processes should live in inner nodes of the control group tree, it's
          almost always necessary to run the main ("supervising") process of a unit that has delegation
          turned on in a subgroup.</para>
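
          <para>A minimal sketch, assuming a hypothetical supervisor binary, that starts the main process
          directly in a subgroup named <filename>supervisor</filename> below the unit's control
          group:<programlisting>[Service]
# Hypothetical supervising process that spawns its own workers
ExecStart=/usr/bin/my-supervisor
# DelegateSubgroup= has no effect without delegation
Delegate=yes
# The main process is started in the "supervisor" subgroup, keeping the unit's
# own control group (an inner node) free of processes
DelegateSubgroup=supervisor
</programlisting></para>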

          <xi:include href="version-info.xml" xpointer="v254"/>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>DisableControllers=</varname></term>

        <listitem>
          <para>Disables controllers from being enabled for a unit's children. If a controller listed is
          already in use in its subtree, the controller will be removed from the subtree. This can be used to
          prevent configuration in child units from implicitly or explicitly enabling a controller.
          Defaults to empty.</para>

          <para>Multiple controllers may be specified, separated by spaces. You may also pass
          <varname>DisableControllers=</varname> multiple times, in which case each new instance adds another controller
          to disable. Passing <varname>DisableControllers=</varname> by itself with no controller name present resets
          the disabled controller list.</para>
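
          <para>For example, a slice could keep the units inside it from enabling certain controllers (a
          sketch; which controllers make sense to disable depends on the workload):<programlisting>[Slice]
# Units below this slice may not enable the cpu or io controllers
DisableControllers=cpu io
# A second assignment adds cpuset to the list
DisableControllers=cpuset
</programlisting></para>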

          <para>It may not be possible to disable a controller after units have been started, if the unit or
          any child of the unit in question delegates controllers to its children, as any delegated subtree
          of the cgroup hierarchy is unmanaged by systemd.</para>

          <xi:include href="supported-controllers.xml" xpointer="controllers-text" />
        </listitem>
      </varlistentry>

    </variablelist>

    </refsect2><refsect2><title>Memory Pressure Control</title>

    <variablelist class='unit-directives'>

      <varlistentry>
        <term><varname>ManagedOOMSwap=auto|kill</varname></term>
        <term><varname>ManagedOOMMemoryPressure=auto|kill</varname></term>

        <listitem>
          <para>Specifies how
          <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
          will act on this unit's cgroups. Defaults to <option>auto</option>.</para>

          <para>When set to <option>kill</option>, the unit becomes a candidate for monitoring by
          <command>systemd-oomd</command>. If the cgroup exceeds the limits set by
          <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or
          the unit configuration, <command>systemd-oomd</command> will select a descendant cgroup and send
          <constant>SIGKILL</constant> to all of the processes under it. You can find more details on
          candidates and kill behavior in
          <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
          and
          <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>

          <para>Setting either of these properties to <option>kill</option> will also result in
          <varname>After=</varname> and <varname>Wants=</varname> dependencies on
          <filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.</para>

          <para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this
          cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these
          properties set to <option>kill</option>, a unit with <option>auto</option> can still be a candidate
          for <command>systemd-oomd</command> to terminate.</para>
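
          <para>For instance, a slice whose processes should be eligible for termination when the swap limits
          configured in <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
          are exceeded could carry the following (a sketch; the slice itself is hypothetical):<programlisting>[Slice]
# Make this slice a candidate for swap-based monitoring by systemd-oomd; unless
# DefaultDependencies=no, this also adds After= and Wants= on systemd-oomd.service
ManagedOOMSwap=kill
</programlisting></para>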
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>ManagedOOMMemoryPressureLimit=</varname></term>

        <listitem>
          <para>Overrides the default memory pressure limit set by
          <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
          this unit (cgroup). Takes a percentage value between 0% and 100%, inclusive. This property is
          ignored unless <varname>ManagedOOMMemoryPressure=</varname><option>kill</option>. Defaults to 0%,
          which means to use the default set by
          <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
          </para>
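
          <para>A sketch of a per-unit override (the slice is hypothetical and 50% is an arbitrary example
          value):<programlisting>[Slice]
ManagedOOMMemoryPressure=kill
# Overrides the oomd.conf default for this unit only; ignored unless
# ManagedOOMMemoryPressure=kill is set as above
ManagedOOMMemoryPressureLimit=50%
</programlisting></para>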
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>ManagedOOMPreference=none|avoid|omit</varname></term>

        <listitem>
          <para>Allows deprioritizing or omitting this unit's cgroup as a candidate when
          <command>systemd-oomd</command> needs to act. Requires support for extended attributes (see
          <citerefentry project='man-pages'><refentrytitle>xattr</refentrytitle><manvolnum>7</manvolnum></citerefentry>)
          in order to use <option>avoid</option> or <option>omit</option>.</para>

          <para>When calculating candidates to relieve swap usage, <command>systemd-oomd</command> will
          only respect these extended attributes if the unit's cgroup is owned by root.</para>

          <para>When calculating candidates to relieve memory pressure, <command>systemd-oomd</command>
          will only respect these extended attributes if the unit's cgroup is owned by root, or if the
          unit's cgroup owner and the owner of the monitored ancestor cgroup are the same. For example,
          if <command>systemd-oomd</command> is calculating candidates for <filename>-.slice</filename>,
          then extended attributes set on descendants of <filename>/user.slice/user-1000.slice/user@1000.service/</filename>
          will be ignored because the descendants are owned by UID 1000, and <filename>-.slice</filename>
          is owned by UID 0. But if calculating candidates for
          <filename>/user.slice/user-1000.slice/user@1000.service/</filename>, then extended attributes set
          on the descendants would be respected.</para>

          <para>If this property is set to <option>avoid</option>, the service manager will convey this to
          <command>systemd-oomd</command>, which will only select this cgroup if there are no other viable
          candidates.</para>

          <para>If this property is set to <option>omit</option>, the service manager will convey this to
          <command>systemd-oomd</command>, which will ignore this cgroup as a candidate and will not perform
          any actions on it.</para>

          <para>It is recommended to use <option>avoid</option> and <option>omit</option> sparingly, as they
          can adversely affect <command>systemd-oomd</command>'s kill behavior. Also note that these extended
          attributes are not applied recursively to cgroups under this unit's cgroup.</para>
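
          <para>For example, a service that should only be killed as a last resort might set the following (a
          sketch, assuming a system service whose cgroup is owned by root, so that the preference is respected
          as described above):<programlisting>[Service]
# Ask systemd-oomd to pick this cgroup only if no other viable candidate exists
ManagedOOMPreference=avoid
</programlisting></para>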

          <para>Defaults to <option>none</option>, which means <command>systemd-oomd</command> will rank this
          unit's cgroup as defined in
          <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
          and <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemoryPressureWatch=</varname></term>

        <listitem><para>Controls memory pressure monitoring for invoked processes. Takes one of
        <literal>off</literal>, <literal>on</literal>, <literal>auto</literal> or <literal>skip</literal>. If
        set to <literal>off</literal>, the service is told not to watch for memory pressure events, by
        setting the <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable to the literal string
        <filename>/dev/null</filename>. If set to <literal>on</literal>, the service is told to watch for
        memory pressure events. This enables memory accounting for the service, and ensures the
        <filename>memory.pressure</filename> cgroup attribute file is accessible for read and write to the
        service's user. It then sets the <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable for
        processes invoked by the unit to the file system path to this file. The threshold information
        configured with <varname>MemoryPressureThresholdSec=</varname> is encoded in the
        <varname>$MEMORY_PRESSURE_WRITE</varname> environment variable. If set to <literal>auto</literal>,
        the protocol is enabled if memory accounting is enabled for the unit anyway, and disabled
        otherwise. If set to <literal>skip</literal>, the logic is neither enabled nor disabled, and the two
        environment variables are not set.</para>
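
        <para>For example, a daemon that implements memory pressure handling could opt in explicitly (a
        sketch; the binary path is hypothetical):<programlisting>[Service]
# Hypothetical daemon that reads $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE
ExecStart=/usr/bin/example-daemon
# Enables memory accounting and exports the two environment variables
MemoryPressureWatch=on
</programlisting></para>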

        <para>Note that services are free to use the two environment variables, but it's unproblematic if
        they ignore them. Memory pressure handling must be implemented individually in each service, and
        usually means different things for different software. For further details on memory pressure
        handling see <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure Handling in
        systemd</ulink>.</para>

        <para>Services implemented using
        <citerefentry><refentrytitle>sd-event</refentrytitle><manvolnum>3</manvolnum></citerefentry> may use
        <citerefentry><refentrytitle>sd_event_add_memory_pressure</refentrytitle><manvolnum>3</manvolnum></citerefentry>
        to watch for and handle memory pressure events.</para>

        <para>If not explicitly set, defaults to the <varname>DefaultMemoryPressureWatch=</varname> setting in
        <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>

        <xi:include href="version-info.xml" xpointer="v254"/></listitem>
      </varlistentry>

      <varlistentry>
        <term><varname>MemoryPressureThresholdSec=</varname></term>

        <listitem><para>Sets the memory pressure threshold time for the memory pressure monitor as configured via
        <varname>MemoryPressureWatch=</varname>. Specifies the maximum allocation latency before a memory
        pressure event is signalled to the service, per 2s window. If not specified, defaults to the
        <varname>DefaultMemoryPressureThresholdSec=</varname> setting in
        <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
        (which in turn defaults to 200ms). The specified value expects a time unit such as
        <literal>ms</literal> or <literal>μs</literal>; see
        <citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry> for
        details on the permitted syntax.</para>
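
        <para>A sketch that makes the service more sensitive to memory pressure than the built-in default
        (150ms is an arbitrary example value):<programlisting>[Service]
MemoryPressureWatch=on
# Signal a memory pressure event once allocation latency exceeds 150ms within
# a 2s window; without this line DefaultMemoryPressureThresholdSec= applies
MemoryPressureThresholdSec=150ms
</programlisting></para>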

        <xi:include href="version-info.xml" xpointer="v254"/></listitem>
      </varlistentry>
    </variablelist>
    </refsect2>
  </refsect1>

  <refsect1>
    <title>History</title>

    <variablelist>
      <varlistentry>
        <term>systemd 252</term>
        <listitem><para>Options for controlling the Legacy Control Group Hierarchy (<ulink
        url="https://docs.kernel.org/admin-guide/cgroup-v1/index.html">Control Groups version 1</ulink>)
        are now fully deprecated:
        <varname>CPUShares=<replaceable>weight</replaceable></varname>,
        <varname>StartupCPUShares=<replaceable>weight</replaceable></varname>,
        <varname>MemoryLimit=<replaceable>bytes</replaceable></varname>,
        <varname>BlockIOAccounting=</varname>,
        <varname>BlockIOWeight=<replaceable>weight</replaceable></varname>,
        <varname>StartupBlockIOWeight=<replaceable>weight</replaceable></varname>,
        <varname>BlockIODeviceWeight=<replaceable>device</replaceable>
        <replaceable>weight</replaceable></varname>,
        <varname>BlockIOReadBandwidth=<replaceable>device</replaceable>
        <replaceable>bytes</replaceable></varname>,
        <varname>BlockIOWriteBandwidth=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname>.
        Please switch to the unified cgroup hierarchy.</para>

        <xi:include href="version-info.xml" xpointer="v252"/></listitem>
      </varlistentry>
    </variablelist>
  </refsect1>

  <refsect1>
    <title>See Also</title>
    <para>
      <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.scope</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.swap</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.directives</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
      <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>,
      the documentation for control groups and specific controllers in the Linux kernel:
      <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink>.
    </para>
  </refsect1>
</refentry>