Commit | Line | Data |
---|---|---|
514094f9 | 1 | <?xml version='1.0'?> |
3a54a157 | 2 | <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" |
eea10b26 | 3 | "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"> |
db9ecf05 | 4 | <!-- SPDX-License-Identifier: LGPL-2.1-or-later --> |
d868475a | 5 | |
5403e153 | 6 | <refentry id="systemd.resource-control" xmlns:xi="http://www.w3.org/2001/XInclude"> |
d868475a | 7 | <refentryinfo> |
3fde5f30 | 8 | <title>systemd.resource-control</title> |
d868475a | 9 | <productname>systemd</productname> |
d868475a ZJS |
10 | </refentryinfo> |
11 | ||
12 | <refmeta> | |
3fde5f30 | 13 | <refentrytitle>systemd.resource-control</refentrytitle> |
d868475a ZJS |
14 | <manvolnum>5</manvolnum> |
15 | </refmeta> | |
16 | ||
17 | <refnamediv> | |
3fde5f30 LP |
18 | <refname>systemd.resource-control</refname> |
19 | <refpurpose>Resource control unit settings</refpurpose> | |
d868475a ZJS |
20 | </refnamediv> |
21 | ||
22 | <refsynopsisdiv> | |
23 | <para> | |
24 | <filename><replaceable>slice</replaceable>.slice</filename>, | |
25 | <filename><replaceable>scope</replaceable>.scope</filename>, | |
26 | <filename><replaceable>service</replaceable>.service</filename>, | |
27 | <filename><replaceable>socket</replaceable>.socket</filename>, | |
28 | <filename><replaceable>mount</replaceable>.mount</filename>, | |
29 | <filename><replaceable>swap</replaceable>.swap</filename> | |
30 | </para> | |
31 | </refsynopsisdiv> | |
32 | ||
33 | <refsect1> | |
34 | <title>Description</title> | |
35 | ||
c7458f93 LP |
36 | <para>Unit configuration files for services, slices, scopes, sockets, mount points, and swap devices share a subset |
37 | of configuration options for resource control of spawned processes. Internally, this relies on the Linux Control | |
38 | Groups (cgroups) kernel concept for organizing processes in a hierarchical tree of named groups for the purpose of | |
39 | resource management.</para> | |
9365b048 | 40 | |
d868475a ZJS |
41 | <para>This man page lists the configuration options shared by |
42 | those six unit types. See | |
43 | <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> | |
44 | for the common options of all unit configuration files, and | |
45 | <citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry>, | |
46 | <citerefentry><refentrytitle>systemd.scope</refentrytitle><manvolnum>5</manvolnum></citerefentry>, | |
47 | <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>, | |
48 | <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>, | |
49 | <citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry>, | |
50 | and | |
51 | <citerefentry><refentrytitle>systemd.swap</refentrytitle><manvolnum>5</manvolnum></citerefentry> | |
52 | for more information on the specific unit configuration files. The | |
3fde5f30 | 53 | resource control configuration options are configured in the |
d868475a ZJS |
54 | [Slice], [Scope], [Service], [Socket], [Mount], or [Swap] |
55 | sections, depending on the unit type.</para> | |
ea021cc3 | 56 | |
74b47bbd ZJS |
57 | <para>In addition, options which control resources available to programs |
58 | <emphasis>executed</emphasis> by systemd are listed in | |
59 | <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>. | |
60 | Those options complement options listed here.</para> | |
61 | ||
253d0d59 ZJS |
62 | <refsect2> |
63 | <title>Enabling and disabling controllers</title> | |
64 | ||
65 | <para>Controllers in the cgroup hierarchy are hierarchical, and resource control is realized by | |
66 | distributing resource assignments between siblings in branches of the cgroup hierarchy. There is no | |
67 | need to explicitly <emphasis>enable</emphasis> a cgroup controller for a unit. | |
68 | <command>systemd</command> will instruct the kernel to enable a controller for a given unit when this | |
69 | unit has configuration for a given controller. For example, when <varname>CPUWeight=</varname> is set, | |
70 | the <option>cpu</option> controller will be enabled, and when <varname>TasksMax=</varname> is set, the | |
71 | <option>pids</option> controller will be enabled. In addition, various controllers may also be | |
72 | enabled explicitly via the | |
73 | <varname>MemoryAccounting=</varname>/<varname>TasksAccounting=</varname>/<varname>IOAccounting=</varname> | |
74 | settings. Because of how the cgroup hierarchy works, controllers will be automatically enabled for all | |
75 | parent units and for any sibling units starting with the lowest level at which a controller is enabled. | |
76 | Units for which a controller is enabled may be subject to resource control even if they don't have any | |
77 | explicit configuration.</para> | |
78 | ||
79 | <para>Setting <varname>Delegate=</varname> enables any delegated controllers for that unit (see below). | |
80 | The delegatee may then enable controllers for its children as appropriate. In particular, if the | |
81 | delegatee is <command>systemd</command> (in the <filename>user@.service</filename> unit), it will | |
82 | repeat the same logic as the system instance and enable controllers for user units which have resource | |
83 | limits configured, and their siblings and parents and parents' siblings.</para> | |
84 | ||
85 | <para>Controllers may be <emphasis>disabled</emphasis> for parts of the cgroup hierarchy with | |
86 | <varname>DisableControllers=</varname> (see below).</para> | |
87 | ||
88 | <example> | |
89 | <title>Enabling and disabling controllers</title> | |
90 | ||
91 | <programlisting> | |
92 | -.slice | |
93 | / \ | |
94 | /-----/ \--------------\ | |
95 | / \ | |
96 | system.slice user.slice | |
97 | / \ / \ | |
98 | / \ / \ | |
449172f9 ZJS |
99 | / \ user@42.service user@1000.service |
100 | / \ Delegate= Delegate=yes | |
253d0d59 ZJS |
101 | a.service b.slice / \ |
102 | CPUWeight=20 DisableControllers=cpu / \ | |
103 | / \ app.slice session.slice | |
104 | / \ CPUWeight=100 CPUWeight=100 | |
105 | / \ | |
106 | b1.service b2.service | |
107 | CPUWeight=1000 | |
108 | </programlisting> | |
109 | ||
110 | <para>In this hierarchy, the <option>cpu</option> controller is enabled for all units shown except | |
111 | <filename>b1.service</filename> and <filename>b2.service</filename>. Because there is no explicit | |
112 | configuration for <filename>system.slice</filename> and <filename>user.slice</filename>, CPU | |
113 | resources will be split equally between them. Similarly, resources are allocated equally between | |
114 | children of <filename>user.slice</filename> and between the child slices beneath | |
94d82b59 | 115 | <filename>user@1000.service</filename>. Assuming that there is no further configuration of resources |
253d0d59 ZJS |
116 | or delegation below slices <filename>app.slice</filename> or <filename>session.slice</filename>, the |
117 | <option>cpu</option> controller would not be enabled for units in those slices and CPU resources | |
449172f9 ZJS |
118 | would be further allocated using other mechanisms, e.g. based on nice levels. The manager for user |
119 | 42 has delegation enabled without any controllers, i.e. it can manipulate its subtree of the cgroup | |
120 | hierarchy, but without resource control.</para> | |
253d0d59 ZJS |
121 | |
122 | <para>In the slice <filename>system.slice</filename>, CPU resources are split in a 1:5 ratio: service | |
123 | <filename>a.service</filename> gets 1/6 (weight 20 out of a total of 120) and slice <filename>b.slice</filename> gets 5/6, because slice | |
124 | <filename>b.slice</filename> gets the default value of 100 for <filename>cpu.weight</filename> when | |
125 | <varname>CPUWeight=</varname> is not set.</para> | |
126 | ||
127 | <para>The <varname>CPUWeight=</varname> setting in service <filename>b2.service</filename> is neutralized | |
128 | by <varname>DisableControllers=</varname> in slice <filename>b.slice</filename>, so the | |
129 | <option>cpu</option> controller would not be enabled for services <filename>b1.service</filename> and | |
130 | <filename>b2.service</filename>, and CPU resources would be further allocated using other mechanisms, | |
131 | e.g. based on nice levels.</para> | |
132 | </example> | |
133 | </refsect2> | |
a8136f1b ZJS |
134 | |
135 | <refsect2> | |
136 | <title>Setting resource controls for a group of related units</title> | |
137 | ||
138 | <para>As described in | |
139 | <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>, the | |
140 | settings listed here may be set through the main file of a unit and drop-in snippets in | |
141 | <filename index="false">*.d/</filename> directories. The list of directories searched for drop-ins | |
142 | includes names formed by repeatedly truncating the unit name after all dashes. This is particularly | |
143 | convenient for setting resource limits for a group of units with similar names.</para> | |
144 | ||
145 | <para>For example, every user gets their own slice | |
146 | <filename>user-<replaceable>nnn</replaceable>.slice</filename>. Drop-ins with local configuration that | |
147 | affect user 1000 may be placed in | |
148 | <filename index="false">/etc/systemd/system/user-1000.slice</filename>, | |
149 | <filename index="false">/etc/systemd/system/user-1000.slice.d/*.conf</filename>, but also | |
150 | <filename index="false">/etc/systemd/system/user-.slice.d/*.conf</filename>. This last directory | |
151 | applies to all user slices.</para> | |
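<para>As a minimal sketch of the last case, a drop-in like the following would apply to every user slice. The file name <filename index="false">50-memory.conf</filename> and the limit value are illustrative assumptions, not defaults:</para>

<programlisting># /etc/systemd/system/user-.slice.d/50-memory.conf (hypothetical file name)
[Slice]
# Illustrative value: throttle each user slice above 2G of memory usage
MemoryHigh=2G
</programlisting>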
152 | </refsect2> | |
253d0d59 | 153 | |
d2c0c05f DT |
154 | <refsect2> |
155 | <title/> | |
156 | <para>See the <ulink | |
157 | url="https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface">New | |
158 | Control Group Interfaces</ulink> for an introduction on how to make | |
159 | use of resource control APIs from programs.</para> | |
160 | </refsect2> | |
d868475a ZJS |
161 | </refsect1> |
162 | ||
c129bd5d | 163 | <refsect1> |
45f09f93 | 164 | <title>Implicit Dependencies</title> |
c129bd5d | 165 | |
45f09f93 JL |
166 | <para>The following dependencies are implicitly added:</para> |
167 | ||
168 | <itemizedlist> | |
169 | <listitem><para>Units with the <varname>Slice=</varname> setting set automatically acquire | |
170 | <varname>Requires=</varname> and <varname>After=</varname> dependencies on the specified | |
171 | slice unit.</para></listitem> | |
172 | </itemizedlist> | |
c129bd5d LP |
173 | </refsect1> |
174 | ||
45f09f93 JL |
175 | <!-- We don't have any default dependency here. --> |
176 | ||
d868475a ZJS |
177 | <refsect1> |
178 | <title>Options</title> | |
179 | ||
5cbfbf2a LP |
180 | <para>Units of the types listed above can have settings for resource control configuration:</para> |
181 | ||
182 | <refsect2><title>CPU Accounting and Control</title> | |
d868475a ZJS |
183 | |
184 | <variablelist class='unit-directives'> | |
d868475a ZJS |
185 | |
186 | <varlistentry> | |
61ad59b1 | 187 | <term><varname>CPUAccounting=</varname></term> |
d868475a ZJS |
188 | |
189 | <listitem> | |
61ad59b1 LP |
190 | <para>Turn on CPU usage accounting for this unit. Takes a |
191 | boolean argument. Note that turning on CPU accounting for | |
03a7b521 | 192 | one unit will also implicitly turn it on for all units |
085afe36 LP |
193 | contained in the same slice and for all its parent slices |
194 | and the units contained therein. The system default for this | |
03a7b521 | 195 | setting may be controlled with |
085afe36 LP |
196 | <varname>DefaultCPUAccounting=</varname> in |
197 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> | |
695e39dd ZJS |
198 | |
199 | <para>Under the unified cgroup hierarchy, CPU accounting is available for all units and this | |
200 | setting has no effect.</para> | |
aefdc112 AK |
201 | |
202 | <xi:include href="version-info.xml" xpointer="v208"/> | |
d868475a ZJS |
203 | </listitem> |
204 | </varlistentry> | |
205 | ||
66ebf6c0 TH |
206 | <varlistentry> |
207 | <term><varname>CPUWeight=<replaceable>weight</replaceable></varname></term> | |
208 | <term><varname>StartupCPUWeight=<replaceable>weight</replaceable></varname></term> | |
209 | ||
210 | <listitem> | |
253d0d59 ZJS |
211 | <para>These settings control the <option>cpu</option> controller in the unified hierarchy.</para> |
212 | ||
c8340822 | 213 | <para>These options accept an integer value or the special string "idle":</para> |
214 | <itemizedlist> | |
215 | <listitem> | |
396d298d ZJS |
216 | <para>If set to an integer value, assign the specified CPU time weight to the processes |
217 | executed, if the unified control group hierarchy is used on the system. These options control | |
218 | the <literal>cpu.weight</literal> control group attribute. The allowed range is 1 to 10000. | |
219 | Defaults to unset, but the kernel default is 100. For details about this control group | |
220 | attribute, see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups | |
221 | v2</ulink> and <ulink url="https://docs.kernel.org/scheduler/sched-design-CFS.html">CFS | |
222 | Scheduler</ulink>. The available CPU time is split up among all units within one slice | |
223 | relative to their CPU time weight. A higher weight means more CPU time, a lower weight means | |
224 | less.</para> | |
c8340822 | 225 | </listitem> |
226 | <listitem> | |
227 | <para>If set to the special string "idle", mark the cgroup for "idle scheduling", which means | |
228 | that it will get CPU resources only when there are no processes not marked in this way to execute in this | |
229 | cgroup or its siblings. This setting corresponds to the <literal>cpu.idle</literal> cgroup attribute.</para> | |
230 | ||
231 | <para>Note that this value only has an effect on cgroup-v2; for cgroup-v1 it is equivalent to the minimum weight.</para> | |
232 | </listitem> | |
233 | </itemizedlist> | |
66ebf6c0 | 234 | |
058a2d8f | 235 | <para>While <varname>StartupCPUWeight=</varname> applies to the startup and shutdown phases of the system, |
66ebf6c0 | 236 | <varname>CPUWeight=</varname> applies to normal runtime of the system, and if the former is not set also to |
058a2d8f PM |
237 | the startup and shutdown phases. Using <varname>StartupCPUWeight=</varname> allows prioritizing specific services at |
238 | boot-up and shutdown differently than during normal runtime.</para> | |
dca031d2 ZJS |
239 | |
240 | <para>In addition to the resource allocation performed by the <option>cpu</option> controller, the | |
241 | kernel may automatically divide resources based on session-id grouping, see "The autogroup feature" | |
242 | in <citerefentry | |
243 | project='man-pages'><refentrytitle>sched</refentrytitle><manvolnum>7</manvolnum></citerefentry>. | |
244 | The effect of this feature is similar to the <option>cpu</option> controller with no explicit | |
245 | configuration, so users should be careful to not mistake one for the other.</para> | |
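<para>For illustration, assume two hypothetical services that are the only busy units in the same slice, one with <varname>CPUWeight=200</varname> and the other left at the kernel default of 100. Under contention, the first would be expected to receive roughly two thirds of the CPU time (200 out of a total weight of 300). The values below are assumptions chosen for the example:</para>

<programlisting># Hypothetical drop-in for the higher-priority service
[Service]
# Twice the kernel default weight of 100 during normal runtime
CPUWeight=200
# A lower weight that applies only during boot-up and shutdown
StartupCPUWeight=50
</programlisting>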
aefdc112 AK |
246 | |
247 | <xi:include href="version-info.xml" xpointer="v232"/> | |
b2f8b02e LP |
248 | </listitem> |
249 | </varlistentry> | |
250 | ||
251 | <varlistentry> | |
252 | <term><varname>CPUQuota=</varname></term> | |
253 | ||
254 | <listitem> | |
253d0d59 ZJS |
255 | <para>This setting controls the <option>cpu</option> controller in the unified hierarchy.</para> |
256 | ||
66ebf6c0 TH |
257 | <para>Assign the specified CPU time quota to the processes executed. Takes a percentage value, suffixed with |
258 | "%". The percentage specifies how much CPU time the unit shall get at maximum, relative to the total CPU time | |
259 | available on one CPU. Use values > 100% for allotting CPU time on more than one CPU. This controls the | |
260 | <literal>cpu.max</literal> attribute on the unified control group hierarchy and | |
261 | <literal>cpu.cfs_quota_us</literal> on legacy. For details about these control group attributes, see <ulink | |
0e685823 | 262 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and <ulink |
263 | url="https://docs.kernel.org/scheduler/sched-bwc.html">CFS Bandwidth Control</ulink>. | |
17cfd6f9 | 264 | Setting <varname>CPUQuota=</varname> to an empty value unsets the quota.</para> |
b2f8b02e | 265 | |
66ebf6c0 TH |
266 | <para>Example: <varname>CPUQuota=20%</varname> ensures that the executed processes will never get more than |
267 | 20% CPU time on one CPU.</para> | |
b2f8b02e | 268 | |
aefdc112 AK |
269 | <xi:include href="version-info.xml" xpointer="v213"/> |
270 | ||
b2f8b02e LP |
271 | </listitem> |
272 | </varlistentry> | |
273 | ||
10f28641 FB |
274 | <varlistentry> |
275 | <term><varname>CPUQuotaPeriodSec=</varname></term> | |
276 | ||
277 | <listitem> | |
253d0d59 ZJS |
278 | <para>This setting controls the <option>cpu</option> controller in the unified hierarchy.</para> |
279 | ||
10f28641 FB |
280 | <para>Assign the duration over which the CPU time quota specified by <varname>CPUQuota=</varname> is measured. |
281 | Takes a time duration value in seconds, with an optional suffix such as "ms" for milliseconds (or "s" for seconds). | |
282 | The default setting is 100ms. The period is clamped to the range supported by the kernel, which is [1ms, 1000ms]. | |
283 | Additionally, the period is adjusted up so that the quota interval is also at least 1ms. | |
284 | Setting <varname>CPUQuotaPeriodSec=</varname> to an empty value resets it to the default.</para> | |
285 | ||
286 | <para>This controls the second field of <literal>cpu.max</literal> attribute on the unified control group hierarchy | |
287 | and <literal>cpu.cfs_period_us</literal> on legacy. For details about these control group attributes, see | |
0e685823 | 288 | <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink> and |
289 | <ulink url="https://docs.kernel.org/scheduler/sched-design-CFS.html">CFS Scheduler</ulink>.</para> | |
10f28641 FB |
290 | |
291 | <para>Example: <varname>CPUQuotaPeriodSec=10ms</varname> requests that the CPU quota be measured in periods of 10ms.</para> | |
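<para>As a sketch of how the two settings combine, the following hypothetical fragment allows 20% of one CPU, measured over 10ms periods. On the unified hierarchy this would be expected to appear in <literal>cpu.max</literal> as a quota of 2000 and a period of 10000, both in microseconds. The values are illustrative only:</para>

<programlisting># Hypothetical service fragment
[Service]
# At most 20% of one CPU ...
CPUQuota=20%
# ... accounted in 10ms periods, i.e. 2ms of CPU time per period
CPUQuotaPeriodSec=10ms
</programlisting>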
aefdc112 AK |
292 | |
293 | <xi:include href="version-info.xml" xpointer="v242"/> | |
10f28641 FB |
294 | </listitem> |
295 | </varlistentry> | |
047f5d63 PH |
296 | |
297 | <varlistentry> | |
298 | <term><varname>AllowedCPUs=</varname></term> | |
c93a7d4a | 299 | <term><varname>StartupAllowedCPUs=</varname></term> |
047f5d63 PH |
300 | |
301 | <listitem> | |
253d0d59 ZJS |
302 | <para>This setting controls the <option>cpuset</option> controller in the unified hierarchy.</para> |
303 | ||
047f5d63 PH |
304 | <para>Restrict processes to be executed on specific CPUs. Takes a list of CPU indices or ranges separated by either |
305 | whitespace or commas. CPU ranges are specified by the lower and upper CPU indices separated by a dash.</para> | |
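<para>For example, the following hypothetical fragment restricts a service to CPUs 0 through 3 and CPU 8. The CPU indices are assumptions chosen for illustration and must exist on the system to have any effect:</para>

<programlisting># Hypothetical service fragment
[Service]
# A range (0-3) and a single index (8), separated by a comma
AllowedCPUs=0-3,8
</programlisting>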
306 | ||
c93a7d4a PM |
307 | <para>Setting <varname>AllowedCPUs=</varname> or <varname>StartupAllowedCPUs=</varname> doesn't guarantee that all |
308 | of the CPUs will be used by the processes as it may be limited by parent units. The effective configuration is | |
309 | reported as <varname>EffectiveCPUs=</varname>.</para> | |
310 | ||
058a2d8f | 311 | <para>While <varname>StartupAllowedCPUs=</varname> applies to the startup and shutdown phases of the system, |
c93a7d4a | 312 | <varname>AllowedCPUs=</varname> applies to normal runtime of the system, and if the former is not set also to |
058a2d8f PM |
313 | the startup and shutdown phases. Using <varname>StartupAllowedCPUs=</varname> allows prioritizing specific services at |
314 | boot-up and shutdown differently than during normal runtime.</para> | |
047f5d63 PH |
315 | |
316 | <para>This setting is supported only with the unified control group hierarchy.</para> | |
aefdc112 AK |
317 | |
318 | <xi:include href="version-info.xml" xpointer="v244"/> | |
047f5d63 PH |
319 | </listitem> |
320 | </varlistentry> | |
321 | ||
5cbfbf2a | 322 | </variablelist> |
c93a7d4a | 323 | |
5cbfbf2a | 324 | </refsect2><refsect2><title>Memory Accounting and Control</title> |
047f5d63 | 325 | |
5cbfbf2a | 326 | <variablelist class='unit-directives'> |
10f28641 | 327 | |
61ad59b1 LP |
328 | <varlistentry> |
329 | <term><varname>MemoryAccounting=</varname></term> | |
330 | ||
331 | <listitem> | |
253d0d59 ZJS |
332 | <para>This setting controls the <option>memory</option> controller in the unified hierarchy.</para> |
333 | ||
61ad59b1 LP |
334 | <para>Turn on process and kernel memory accounting for this |
335 | unit. Takes a boolean argument. Note that turning on memory | |
03a7b521 LP |
336 | accounting for one unit will also implicitly turn it on for |
337 | all units contained in the same slice and for all its parent | |
338 | slices and the units contained therein. The system default | |
339 | for this setting may be controlled with | |
085afe36 LP |
340 | <varname>DefaultMemoryAccounting=</varname> in |
341 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> | |
aefdc112 AK |
342 | |
343 | <xi:include href="version-info.xml" xpointer="v208"/> | |
61ad59b1 LP |
344 | </listitem> |
345 | </varlistentry> | |
346 | ||
48422635 | 347 | <varlistentry> |
29bb3d7f | 348 | <term><varname>MemoryMin=<replaceable>bytes</replaceable></varname>, <varname>MemoryLow=<replaceable>bytes</replaceable></varname></term> |
f72dcb92 | 349 | <term><varname>StartupMemoryLow=<replaceable>bytes</replaceable></varname>, <varname>DefaultStartupMemoryLow=<replaceable>bytes</replaceable></varname></term> |
48422635 TH |
350 | |
351 | <listitem> | |
253d0d59 ZJS |
352 | <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para> |
353 | ||
29bb3d7f BB |
354 | <para>Specify the memory usage protection of the executed processes in this unit. |
355 | When reclaiming memory, the unit is treated as if it were using less memory, so that memory | |
356 | is preferentially reclaimed from unprotected units. | |
357 | Using <varname>MemoryLow=</varname> results in a weaker protection where memory may still | |
358 | be reclaimed to avoid invoking the OOM killer in case there is no other reclaimable memory.</para> | |
359 | <para> | |
360 | For a protection to be effective, it is generally required to set a corresponding | |
361 | allocation on all ancestors, which is then distributed between children | |
362 | (with the exception of the root slice). | |
363 | Any <varname>MemoryMin=</varname> or <varname>MemoryLow=</varname> allocation that is not | |
364 | explicitly distributed to specific children is used to create a shared protection for all children. | |
365 | As this is a shared protection, the children will freely compete for the memory.</para> | |
48422635 TH |
366 | |
367 | <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is | |
368 | parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a | |
369 | percentage value may be specified, which is taken relative to the installed physical memory on the | |
b62087d4 CD |
370 | system. If assigned the special value <literal>infinity</literal>, all available memory is protected, which may be |
371 | useful in order to always inherit all of the protection afforded by ancestors. | |
29bb3d7f BB |
372 | This controls the <literal>memory.min</literal> or <literal>memory.low</literal> control group attribute. |
373 | For details about this control group attribute, see <ulink | |
0e685823 | 374 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para> |
48422635 | 375 | |
29bb3d7f BB |
376 | <para>Units may have their children use a default <literal>memory.min</literal> or |
377 | <literal>memory.low</literal> value by specifying <varname>DefaultMemoryMin=</varname> or | |
378 | <varname>DefaultMemoryLow=</varname>, which have the same semantics as | |
96f321b6 LB |
379 | <varname>MemoryMin=</varname> and <varname>MemoryLow=</varname>, or <varname>DefaultStartupMemoryLow=</varname> |
380 | which has the same semantics as <varname>StartupMemoryLow=</varname>. | |
29bb3d7f BB |
381 | This setting does not affect <literal>memory.min</literal> or <literal>memory.low</literal> |
382 | in the unit itself. | |
383 | Using it to set a default child allocation is only useful on kernels older than 5.7, | |
384 | which do not support the <literal>memory_recursiveprot</literal> cgroup2 mount option.</para> | |
53fda560 LB |
385 | |
386 | <para>While <varname>StartupMemoryLow=</varname> applies to the startup and shutdown phases of the system, | |
387 | <varname>MemoryLow=</varname> applies to normal runtime of the system, and if the former is not set also to | |
388 | the startup and shutdown phases. Using <varname>StartupMemoryLow=</varname> allows prioritizing specific services at | |
389 | boot-up and shutdown differently than during normal runtime.</para> | |
aefdc112 AK |
390 | |
391 | <xi:include href="version-info.xml" xpointer="v240"/> | |
da4d897e TH |
392 | </listitem> |
393 | </varlistentry> | |
394 | ||
395 | <varlistentry> | |
396 | <term><varname>MemoryHigh=<replaceable>bytes</replaceable></varname></term> | |
53fda560 | 397 | <term><varname>StartupMemoryHigh=<replaceable>bytes</replaceable></varname></term> |
da4d897e TH |
398 | |
399 | <listitem> | |
253d0d59 ZJS |
400 | <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para> |
401 | ||
ba79e19c | 402 | <para>Specify the throttling limit on memory usage of the executed processes in this unit. Memory usage may go |
da4d897e TH |
403 | above the limit if unavoidable, but the processes are heavily slowed down and memory is taken away |
404 | aggressively in such cases. This is the main mechanism to control memory usage of a unit.</para> | |
405 | ||
406 | <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is | |
875ae566 LP |
407 | parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a |
408 | percentage value may be specified, which is taken relative to the installed physical memory on the | |
409 | system. If assigned the | |
ba79e19c | 410 | special value <literal>infinity</literal>, no memory throttling is applied. This controls the |
da4d897e | 411 | <literal>memory.high</literal> control group attribute. For details about this control group attribute, see |
4fb0d2dc MK |
412 | <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>. |
413 | The effective configuration is reported as <varname>EffectiveMemoryHigh=</varname> | |
414 | (see also <varname>EffectiveMemoryMax=</varname>).</para> | |
53fda560 LB |
415 | |
416 | <para>While <varname>StartupMemoryHigh=</varname> applies to the startup and shutdown phases of the system, | |
417 | <varname>MemoryHigh=</varname> applies to normal runtime of the system, and if the former is not set also to | |
418 | the startup and shutdown phases. Using <varname>StartupMemoryHigh=</varname> allows prioritizing specific services at | |
419 | boot-up and shutdown differently than during normal runtime.</para> | |
aefdc112 AK |
420 | |
421 | <xi:include href="version-info.xml" xpointer="v231"/> | |
da4d897e TH |
422 | </listitem> |
423 | </varlistentry> | |
424 | ||
425 | <varlistentry> | |
426 | <term><varname>MemoryMax=<replaceable>bytes</replaceable></varname></term> | |
53fda560 | 427 | <term><varname>StartupMemoryMax=<replaceable>bytes</replaceable></varname></term> |
da4d897e TH |
428 | |
429 | <listitem> | |
253d0d59 ZJS |
430 | <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para> |
431 | ||
da4d897e TH |
432 | <para>Specify the absolute limit on memory usage of the executed processes in this unit. If memory usage |
433 | cannot be contained under the limit, the out-of-memory killer is invoked inside the unit. It is recommended to | |
434 | use <varname>MemoryHigh=</varname> as the main control mechanism and use <varname>MemoryMax=</varname> as the | |
435 | last line of defense.</para> | |
436 | ||
437 | <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is | |
875ae566 LP |
438 | parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a |
439 | percentage value may be specified, which is taken relative to the installed physical memory on the system. If | |
440 | assigned the special value <literal>infinity</literal>, no memory limit is applied. This controls the | |
da4d897e | 441 | <literal>memory.max</literal> control group attribute. For details about this control group attribute, see |
4fb0d2dc MK |
442 | <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>. |
443 | The effective configuration is reported as <varname>EffectiveMemoryMax=</varname> (the value is | |
93f8e88d | 444 | the most stringent limit of the unit and parent slices and it is capped by physical memory).</para> |
53fda560 LB |
445 | |
446 | <para>While <varname>StartupMemoryMax=</varname> applies to the startup and shutdown phases of the system, | |
447 | <varname>MemoryMax=</varname> applies to normal runtime of the system, and if the former is not set also to | |
448 | the startup and shutdown phases. Using <varname>StartupMemoryMax=</varname> allows prioritizing specific services at | |
449 | boot-up and shutdown differently than during normal runtime.</para> | |
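<para>A minimal sketch of the recommended pairing, with <varname>MemoryHigh=</varname> as the main control and <varname>MemoryMax=</varname> as the last line of defense. The sizes are illustrative assumptions and should be tuned to the workload:</para>

<programlisting># Hypothetical service fragment
[Service]
# Throttle and reclaim aggressively above 1G
MemoryHigh=1G
# Hard limit: the out-of-memory killer is invoked inside the unit beyond this
MemoryMax=1536M
</programlisting>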
aefdc112 AK |
450 | |
451 | <xi:include href="version-info.xml" xpointer="v231"/> | |
da4d897e TH |
452 | </listitem> |
453 | </varlistentry> | |
454 | ||
96e131ea WC |
455 | <varlistentry> |
456 | <term><varname>MemorySwapMax=<replaceable>bytes</replaceable></varname></term> | |
53fda560 | 457 | <term><varname>StartupMemorySwapMax=<replaceable>bytes</replaceable></varname></term> |
96e131ea WC |
458 | |
459 | <listitem> | |
253d0d59 ZJS |
460 | <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para> |
461 | ||
6ee27eb3 | 462 | <para>Specify the absolute limit on swap usage of the executed processes in this unit.</para> |
96e131ea WC |
463 | |
464 | <para>Takes a swap size in bytes. If the value is suffixed with K, M, G or T, the specified swap size is | |
465 | parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. If assigned the | |
d7fe0a67 | 466 | special value <literal>infinity</literal>, no swap limit is applied. These settings control the |
6ee27eb3 AZ |
467 | <literal>memory.swap.max</literal> control group attribute. For details about this control group attribute, |
468 | see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para> | |
53fda560 LB |
469 | |
470 | <para>While <varname>StartupMemorySwapMax=</varname> applies to the startup and shutdown phases of the system, | |
471 | <varname>MemorySwapMax=</varname> applies to normal runtime of the system, and if the former is not set also to | |
472 | the startup and shutdown phases. Using <varname>StartupMemorySwapMax=</varname> allows prioritizing specific services at | |
473 | boot-up and shutdown differently than during normal runtime.</para> | |
aefdc112 AK |
474 | |
475 | <xi:include href="version-info.xml" xpointer="v232"/> | |
6ee27eb3 AZ |
476 | </listitem> |
477 | </varlistentry> | |
478 | ||
479 | <varlistentry> | |
480 | <term><varname>MemoryZSwapMax=<replaceable>bytes</replaceable></varname></term> | |
53fda560 | 481 | <term><varname>StartupMemoryZSwapMax=<replaceable>bytes</replaceable></varname></term> |
6ee27eb3 AZ |
482 | |
483 | <listitem> | |
253d0d59 ZJS |
484 | <para>These settings control the <option>memory</option> controller in the unified hierarchy.</para> |
485 | ||
6ee27eb3 AZ |
486 | <para>Specify the absolute limit on zswap usage of the processes in this unit. Zswap is a lightweight compressed |
487 | cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a | |
488 | dynamically allocated RAM-based memory pool. If the limit specified is hit, no entries from this unit will be | |
489 | stored in the pool until existing entries are faulted back or written out to disk. See the kernel's | |
93428875 | 490 | <ulink url="https://docs.kernel.org/admin-guide/mm/zswap.html">Zswap</ulink> documentation for more details.</para> |
6ee27eb3 AZ |
491 | |
492 | <para>Takes a size in bytes. If the value is suffixed with K, M, G or T, the specified size is | |
493 | parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. If assigned the | |
494 | special value <literal>infinity</literal>, no limit is applied. These settings control the | |
495 | <literal>memory.zswap.max</literal> control group attribute. For details about this control group attribute, | |
0e685823 | 496 | see <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-interface-files">Memory Interface Files</ulink>.</para> |
53fda560 LB |
497 | |
498 | <para>While <varname>StartupMemoryZSwapMax=</varname> applies to the startup and shutdown phases of the system, | |
499 | <varname>MemoryZSwapMax=</varname> applies to normal runtime of the system, and if the former is not set also to | |
500 | the startup and shutdown phases. Using <varname>StartupMemoryZSwapMax=</varname> allows prioritizing specific services at | |
501 | boot-up and shutdown differently than during normal runtime.</para> | |
aefdc112 AK |
502 | |
503 | <xi:include href="version-info.xml" xpointer="v253"/> | |
d868475a ZJS |
504 | </listitem> |
505 | </varlistentry> | |
506 | ||
5cbfbf2a LP |
507 | <varlistentry> |
508 | <term><varname>AllowedMemoryNodes=</varname></term> | |
509 | <term><varname>StartupAllowedMemoryNodes=</varname></term> | |
510 | ||
511 | <listitem> | |
512 | <para>These settings control the <option>cpuset</option> controller in the unified hierarchy.</para> | |
513 | ||
514 | <para>Restrict processes to be executed on specific memory NUMA nodes. Takes a list of memory NUMA node indices | |
515 | or ranges separated by either whitespace or commas. Memory NUMA node ranges are specified by the lower and upper | |
516 | NUMA node indices separated by a dash.</para> | |
517 | ||
518 | <para>Setting <varname>AllowedMemoryNodes=</varname> or <varname>StartupAllowedMemoryNodes=</varname> doesn't | |
519 | guarantee that all of the memory NUMA nodes will be used by the processes as it may be limited by parent units. | |
520 | The effective configuration is reported as <varname>EffectiveMemoryNodes=</varname>.</para> | |
521 | ||
522 | <para>While <varname>StartupAllowedMemoryNodes=</varname> applies to the startup and shutdown phases of the system, | |
523 | <varname>AllowedMemoryNodes=</varname> applies to normal runtime of the system, and if the former is not set also to | |
524 | the startup and shutdown phases. Using <varname>StartupAllowedMemoryNodes=</varname> allows prioritizing specific services at | |
525 | boot-up and shutdown differently than during normal runtime.</para> | |
526 | ||
527 | <para>This setting is supported only with the unified control group hierarchy.</para> | |
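      <para>For illustration, the list syntax described above might be used as follows. The node indices are
      assumptions about a hypothetical machine with at least four memory NUMA nodes:
      <programlisting>
[Service]
# Nodes 0 and 1 (given as a range) plus node 3, separated by a comma.
AllowedMemoryNodes=0-1,3
# During the startup and shutdown phases of the system, allow node 0 only.
StartupAllowedMemoryNodes=0</programlisting>
      </para>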
aefdc112 AK |
528 | |
529 | <xi:include href="version-info.xml" xpointer="v244"/> | |
5cbfbf2a LP |
530 | </listitem> |
531 | </varlistentry> | |
532 | ||
533 | </variablelist> | |
534 | ||
535 | </refsect2><refsect2><title>Process Accounting and Control</title> | |
536 | ||
537 | <variablelist class='unit-directives'> | |
538 | ||
03a7b521 LP |
539 | <varlistentry> |
540 | <term><varname>TasksAccounting=</varname></term> | |
541 | ||
542 | <listitem> | |
253d0d59 ZJS |
543 | <para>This setting controls the <option>pids</option> controller in the unified hierarchy.</para> |
544 | ||
396d298d ZJS |
545 | <para>Turn on task accounting for this unit. Takes a boolean argument. If enabled, the kernel will |
546 | keep track of the total number of tasks in the unit and its children. This number includes both | |
547 | kernel threads and userspace processes, with each thread counted individually. Note that turning on | |
548 | tasks accounting for one unit will also implicitly turn it on for all units contained in the same | |
549 | slice and for all its parent slices and the units contained therein. The system default for this | |
550 | setting may be controlled with <varname>DefaultTasksAccounting=</varname> in | |
03a7b521 | 551 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> |
aefdc112 AK |
552 | |
553 | <xi:include href="version-info.xml" xpointer="v227"/> | |
03a7b521 LP |
554 | </listitem> |
555 | </varlistentry> | |
556 | ||
557 | <varlistentry> | |
558 | <term><varname>TasksMax=<replaceable>N</replaceable></varname></term> | |
559 | ||
560 | <listitem> | |
253d0d59 ZJS |
561 | <para>This setting controls the <option>pids</option> controller in the unified hierarchy.</para> |
562 | ||
6d48c7cf LP |
563 | <para>Specify the maximum number of tasks that may be created in the unit. This ensures that the |
564 | number of tasks accounted for the unit (see above) stays below a specific limit. This either takes | |
565 | an absolute number of tasks or a percentage value that is taken relative to the configured maximum | |
566 | number of tasks on the system. If assigned the special value <literal>infinity</literal>, no tasks | |
567 | limit is applied. This controls the <literal>pids.max</literal> control group attribute. For | |
568 | details about this control group attribute, see the | |
93428875 | 569 | <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html#pid">pids controller |
4fb0d2dc MK |
570 | </ulink>. |
571 | The effective configuration is reported as <varname>EffectiveTasksMax=</varname>.</para> | |
03a7b521 | 572 | |
bb6d563a | 573 | <para>The system default for this setting may be controlled with |
0af20ea2 LP |
574 | <varname>DefaultTasksMax=</varname> in |
575 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> | |
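      <para>A short sketch of the two accepted value forms, with numbers chosen purely for illustration:
      <programlisting>
[Service]
TasksAccounting=yes
# Absolute limit on the number of tasks in this unit.
TasksMax=512
# Alternatively, a percentage of the configured maximum number of tasks on the system:
# TasksMax=10%</programlisting>
      </para>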
aefdc112 AK |
576 | |
577 | <xi:include href="version-info.xml" xpointer="v227"/> | |
03a7b521 LP |
578 | </listitem> |
579 | </varlistentry> | |
580 | ||
5cbfbf2a LP |
581 | </variablelist> |
582 | ||
583 | </refsect2><refsect2><title>IO Accounting and Control</title> | |
584 | ||
585 | <variablelist class='unit-directives'> | |
586 | ||
13c31542 TH |
587 | <varlistentry> |
588 | <term><varname>IOAccounting=</varname></term> | |
589 | ||
590 | <listitem> | |
253d0d59 ZJS |
591 | <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para> |
592 | ||
0069a0dd LP |
593 | <para>Turn on Block I/O accounting for this unit, if the unified control group hierarchy is used on the |
594 | system. Takes a boolean argument. Note that turning on block I/O accounting for one unit will also implicitly | |
595 | turn it on for all units contained in the same slice and for all its parent slices and the units contained | |
596 | therein. The system default for this setting may be controlled with <varname>DefaultIOAccounting=</varname> | |
597 | in | |
13c31542 | 598 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> |
aefdc112 AK |
599 | |
600 | <xi:include href="version-info.xml" xpointer="v230"/> | |
13c31542 TH |
601 | </listitem> |
602 | </varlistentry> | |
603 | ||
604 | <varlistentry> | |
605 | <term><varname>IOWeight=<replaceable>weight</replaceable></varname></term> | |
606 | <term><varname>StartupIOWeight=<replaceable>weight</replaceable></varname></term> | |
607 | ||
608 | <listitem> | |
253d0d59 ZJS |
609 | <para>These settings control the <option>io</option> controller in the unified hierarchy.</para> |
610 | ||
7dbc38db LP |
611 | <para>Set the default overall block I/O weight for the executed processes, if the unified control |
612 | group hierarchy is used on the system. Takes a single weight value (between 1 and 10000) to set the | |
613 | default block I/O weight. This controls the <literal>io.weight</literal> control group attribute, | |
614 | which defaults to 100. For details about this control group attribute, see <ulink | |
0e685823 | 615 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO |
7dbc38db LP |
616 | Interface Files</ulink>. The available I/O bandwidth is split up among all units within one slice |
617 | relative to their block I/O weight. A higher weight means more I/O bandwidth, a lower weight means | |
618 | less.</para> | |
13c31542 | 619 | |
058a2d8f PM |
620 | <para>While <varname>StartupIOWeight=</varname> applies |
621 | to the startup and shutdown phases of the system, | |
13c31542 TH |
622 | <varname>IOWeight=</varname> applies to the later runtime of |
623 | the system, and if the former is not set also to the startup | |
058a2d8f PM |
624 | and shutdown phases. This allows prioritizing specific services at boot-up |
625 | and shutdown differently than during runtime.</para> | |
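      <para>As an example under assumed values, the following gives a unit a larger share of I/O bandwidth than
      the default weight of 100 during normal runtime, and a still larger share during boot-up and shutdown:
      <programlisting>
[Service]
# Must be between 1 and 10000; shares are relative to other units in the same slice.
IOWeight=200
# Used instead of IOWeight= during the startup and shutdown phases of the system.
StartupIOWeight=1000</programlisting>
      </para>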
aefdc112 AK |
626 | |
627 | <xi:include href="version-info.xml" xpointer="v230"/> | |
13c31542 TH |
628 | </listitem> |
629 | </varlistentry> | |
630 | ||
631 | <varlistentry> | |
632 | <term><varname>IODeviceWeight=<replaceable>device</replaceable> <replaceable>weight</replaceable></varname></term> | |
633 | ||
634 | <listitem> | |
253d0d59 ZJS |
635 | <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para> |
636 | ||
0069a0dd LP |
637 | <para>Set the per-device overall block I/O weight for the executed processes, if the unified control group |
638 | hierarchy is used on the system. Takes a space-separated pair of a file path and a weight value to specify | |
6ae4283c TH |
639 | the device specific weight value, between 1 and 10000. (Example: <literal>/dev/sda 1000</literal>). The file |
640 | path may be specified as path to a block device node or as any other file, in which case the backing block | |
641 | device of the file system of the file is determined. This controls the <literal>io.weight</literal> control | |
642 | group attribute, which defaults to 100. Use this option multiple times to set weights for multiple devices. | |
643 | For details about this control group attribute, see <ulink | |
0e685823 | 644 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.</para> |
13c31542 | 645 | |
f27a21d4 LP |
646 | <para>The specified device node should reference a block device that has an I/O scheduler |
647 | associated, i.e. should not refer to partition or loopback block devices, but to the originating, | |
648 | physical device. When a path to a regular file or directory is specified, an attempt is made to | |
649 | discover the correct originating device backing the file system of the specified path. This works | |
650 | correctly only for simpler cases, where the file system is directly placed on a partition or | |
651 | physical block device, or where simple 1:1 encryption using dm-crypt/LUKS is used. This discovery | |
652 | does not cover complex storage and in particular RAID and volume management storage devices.</para> | |
aefdc112 AK |
653 | |
654 | <xi:include href="version-info.xml" xpointer="v230"/> | |
13c31542 TH |
655 | </listitem> |
656 | </varlistentry> | |
657 | ||
658 | <varlistentry> | |
659 | <term><varname>IOReadBandwidthMax=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname></term> | |
660 | <term><varname>IOWriteBandwidthMax=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname></term> | |
661 | ||
662 | <listitem> | |
253d0d59 ZJS |
663 | <para>These settings control the <option>io</option> controller in the unified hierarchy.</para> |
664 | ||
0069a0dd LP |
665 | <para>Set the per-device overall block I/O bandwidth maximum limit for the executed processes, if the unified |
666 | control group hierarchy is used on the system. This limit is not work-conserving and the executed processes | |
667 | are not allowed to use more even if the device has idle capacity. Takes a space-separated pair of a file | |
668 | path and a bandwidth value (in bytes per second) to specify the device specific bandwidth. The file path may | |
669 | be a path to a block device node, or any other file, in which case the backing block device of the file | |
670 | system of the file is used. If the bandwidth is suffixed with K, M, G, or T, the specified bandwidth is | |
671 | parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively, to the base of 1000. (Example: | |
672 | "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 5M"). This controls the <literal>io.max</literal> control | |
673 | group attributes. Use this option multiple times to set bandwidth limits for multiple devices. For details | |
674 | about this control group attribute, see <ulink | |
0e685823 | 675 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>. |
13c31542 TH |
676 | </para> |
677 | ||
f27a21d4 | 678 | <para>Similar restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para> |
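      <para>A minimal sketch of per-device bandwidth limits follows. The device path and the limits are
      illustrative assumptions:
      <programlisting>
[Service]
# Suffixes are parsed to the base of 1000, so 50M means 50000000 bytes per second.
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 10M</programlisting>
      </para>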
aefdc112 AK |
679 | |
680 | <xi:include href="version-info.xml" xpointer="v230"/> | |
13c31542 TH |
681 | </listitem> |
682 | </varlistentry> | |
683 | ||
ac06a0cf TH |
684 | <varlistentry> |
685 | <term><varname>IOReadIOPSMax=<replaceable>device</replaceable> <replaceable>IOPS</replaceable></varname></term> | |
686 | <term><varname>IOWriteIOPSMax=<replaceable>device</replaceable> <replaceable>IOPS</replaceable></varname></term> | |
687 | ||
688 | <listitem> | |
253d0d59 ZJS |
689 | <para>These settings control the <option>io</option> controller in the unified hierarchy.</para> |
690 | ||
ac06a0cf TH |
691 | <para>Set the per-device overall block I/O IOs-Per-Second maximum limit for the executed processes, if the |
692 | unified control group hierarchy is used on the system. This limit is not work-conserving and the executed | |
693 | processes are not allowed to use more even if the device has idle capacity. Takes a space-separated pair of | |
694 | a file path and an IOPS value to specify the device specific IOPS. The file path may be a path to a block | |
695 | device node, or any other file, in which case the backing block device of the file system of the file is | |
696 | used. If the IOPS is suffixed with K, M, G, or T, the specified IOPS is parsed as KiloIOPS, MegaIOPS, | |
697 | GigaIOPS, or TeraIOPS, respectively, to the base of 1000. (Example: | |
698 | "/dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0 1K"). This controls the <literal>io.max</literal> control | |
699 | group attributes. Use this option multiple times to set IOPS limits for multiple devices. For details about | |
700 | this control group attribute, see <ulink | |
0e685823 | 701 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>. |
ac06a0cf TH |
702 | </para> |
703 | ||
f27a21d4 | 704 | <para>Similar restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para> |
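      <para>The following sketch shows IOPS limits for two devices. The device paths and values are assumptions
      chosen for illustration only:
      <programlisting>
[Service]
# 1K is parsed to the base of 1000, i.e. 1000 I/O operations per second.
IOReadIOPSMax=/dev/sda 1K
IOWriteIOPSMax=/dev/sda 500
# Repeat the option to set a limit for another device.
IOReadIOPSMax=/dev/sdb 2K</programlisting>
      </para>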
aefdc112 AK |
705 | |
706 | <xi:include href="version-info.xml" xpointer="v230"/> | |
d868475a | 707 | </listitem> |
6ae4283c TH |
708 | </varlistentry> |
709 | ||
710 | <varlistentry> | |
711 | <term><varname>IODeviceLatencyTargetSec=<replaceable>device</replaceable> <replaceable>target</replaceable></varname></term> | |
712 | ||
713 | <listitem> | |
253d0d59 ZJS |
714 | <para>This setting controls the <option>io</option> controller in the unified hierarchy.</para> |
715 | ||
6ae4283c TH |
716 | <para>Set the per-device average target I/O latency for the executed processes, if the unified control group |
717 | hierarchy is used on the system. Takes a file path and a timespan separated by a space to specify | |
718 | the device specific latency target. (Example: "/dev/sda 25ms"). The file path may be specified | |
719 | as path to a block device node or as any other file, in which case the backing block device of the file | |
720 | system of the file is determined. This controls the <literal>io.latency</literal> control group | |
721 | attribute. Use this option multiple times to set latency targets for multiple devices. For details about this | |
722 | control group attribute, see <ulink | |
0e685823 | 723 | url="https://docs.kernel.org/admin-guide/cgroup-v2.html#io-interface-files">IO Interface Files</ulink>.</para> |
6ae4283c | 724 | |
964c4eda | 725 | <para>Implies <literal>IOAccounting=yes</literal>.</para> |
6ae4283c TH |
726 | |
727 | <para>This setting is supported only if the unified control group hierarchy is used.</para> | |
f27a21d4 LP |
728 | |
729 | <para>Similar restrictions on block device discovery as for <varname>IODeviceWeight=</varname> apply, see above.</para> | |
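      <para>As a sketch, latency targets for more than one device can be set by repeating the option. The second
      device path and both timespans are assumptions for illustration:
      <programlisting>
[Service]
IODeviceLatencyTargetSec=/dev/sda 25ms
IODeviceLatencyTargetSec=/dev/sdb 50ms</programlisting>
      </para>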
aefdc112 AK |
730 | |
731 | <xi:include href="version-info.xml" xpointer="v240"/> | |
6ae4283c | 732 | </listitem> |
d868475a ZJS |
733 | </varlistentry> |
734 | ||
5cbfbf2a LP |
735 | </variablelist> |
736 | ||
737 | </refsect2><refsect2><title>Network Accounting and Control</title> | |
738 | ||
739 | <variablelist class='unit-directives'> | |
740 | ||
8d8631d4 DM |
741 | <varlistentry> |
742 | <term><varname>IPAccounting=</varname></term> | |
743 | ||
744 | <listitem> | |
745 | <para>Takes a boolean argument. If true, turns on IPv4 and IPv6 network traffic accounting for packets sent | |
746 | or received by the unit. When this option is turned on, all IPv4 and IPv6 sockets created by any process of | |
2f75b05c ZJS |
747 | the unit are accounted for.</para> |
748 | ||
749 | <para>When this option is used in socket units, it applies to all IPv4 and IPv6 sockets | |
8d8631d4 DM |
750 | associated with it (including both listening and connection sockets where this applies). Note that for |
751 | socket-activated services, this configuration setting and the accounting data of the service unit and the | |
752 | socket unit are kept separate, and displayed separately. No propagation of the setting and the collected | |
753 | statistics is done, in either direction. Moreover, any traffic sent or received on any of the socket unit's | |
754 | sockets is accounted to the socket unit — and never to the service unit it might have activated, even if the | |
2f75b05c ZJS |
755 | socket is used by it.</para> |
756 | ||
757 | <para>The system default for this setting may be controlled with <varname>DefaultIPAccounting=</varname> in | |
8d8631d4 | 758 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> |
aefdc112 AK |
759 | |
760 | <xi:include href="version-info.xml" xpointer="v235"/> | |
8d8631d4 DM |
761 | </listitem> |
762 | </varlistentry> | |
763 | ||
764 | <varlistentry> | |
dcfaecc7 | 765 | <term><varname>IPAddressAllow=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term> |
8d8631d4 DM |
766 | <term><varname>IPAddressDeny=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term> |
767 | ||
768 | <listitem> | |
e1a04232 ZJS |
769 | <para>Turn on network traffic filtering for IP packets sent and received over |
770 | <constant>AF_INET</constant> and <constant>AF_INET6</constant> sockets. Both directives take a | |
ef81ce6e | 771 | space separated list of IPv4 or IPv6 addresses, each optionally suffixed with an address prefix |
e1a04232 ZJS |
772 | length in bits after a <literal>/</literal> character. If the suffix is omitted, the address is |
773 | considered a host address, i.e. the filter covers the whole address (32 bits for IPv4, 128 bits for | |
774 | IPv6).</para> | |
ef81ce6e LP |
775 | |
776 | <para>The access lists configured with this option are applied to all sockets created by processes | |
777 | of this unit (or in the case of socket units, associated with it). The lists are implicitly | |
778 | combined with any lists configured for any of the parent slice units this unit might be a member | |
e1a04232 | 779 | of. By default both access lists are empty. Both ingress and egress traffic is filtered by these |
ef81ce6e | 780 | settings. In case of ingress traffic the source IP address is checked against these access lists, |
e1a04232 ZJS |
781 | in case of egress traffic the destination IP address is checked. The following rules are applied in |
782 | turn:</para> | |
8d8631d4 DM |
783 | |
784 | <itemizedlist> | |
e1a04232 ZJS |
785 | <listitem><para>Access is granted when the checked IP address matches an entry in the |
786 | <varname>IPAddressAllow=</varname> list.</para></listitem> | |
8d8631d4 | 787 | |
e1a04232 ZJS |
788 | <listitem><para>Otherwise, access is denied when the checked IP address matches an entry in the |
789 | <varname>IPAddressDeny=</varname> list.</para></listitem> | |
8d8631d4 | 790 | |
e1a04232 | 791 | <listitem><para>Otherwise, access is granted.</para></listitem> |
8d8631d4 DM |
792 | </itemizedlist> |
793 | ||
6b000af4 LP |
794 | <para>In order to implement an allow-listing IP firewall, it is recommended to use a |
795 | <varname>IPAddressDeny=</varname><constant>any</constant> setting on an upper-level slice unit | |
796 | (such as the root slice <filename>-.slice</filename> or the slice containing all system services | |
8d8631d4 | 797 | <filename>system.slice</filename> – see |
6b000af4 LP |
798 | <citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry> |
799 | for details on these slice units), plus individual per-service <varname>IPAddressAllow=</varname> | |
800 | lines permitting network access to relevant services, and only them.</para> | |
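      <para>The allow-listing setup described above might look roughly as follows. This is a sketch only: the
      drop-in file names, the service name <filename>example.service</filename> and the permitted networks are
      hypothetical:
      <programlisting>
# /etc/systemd/system/system.slice.d/10-ip-deny.conf (hypothetical drop-in)
[Slice]
IPAddressDeny=any

# /etc/systemd/system/example.service.d/10-ip-allow.conf (hypothetical drop-in)
[Service]
# Checked before the deny list inherited from the slice, so these peers stay reachable.
IPAddressAllow=localhost 192.0.2.0/24 2001:db8::/32</programlisting>
      </para>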
8d8631d4 | 801 | |
e1a04232 ZJS |
802 | <para>Note that for socket-activated services, the IP access list configured on the socket unit |
803 | applies to all sockets associated with it directly, but not to any sockets created by the | |
804 | ultimately activated services for it. Conversely, the IP access list configured for the service is | |
805 | not applied to any sockets passed into the service via socket activation. Thus, it is usually a | |
806 | good idea to replicate the IP access lists on both the socket and the service unit. Nevertheless, | |
807 | it may make sense to maintain one list more open and the other one more restricted, depending on | |
7227dd81 | 808 | the use case.</para> |
8d8631d4 DM |
809 | |
810 | <para>If these settings are used multiple times in the same unit the specified lists are combined. If an | |
811 | empty string is assigned to these settings the specific access list is reset and all previous settings undone.</para> | |
812 | ||
813 | <para>In place of explicit IPv4 or IPv6 address and prefix length specifications a small set of symbolic | |
814 | names may be used. The following names are defined:</para> | |
815 | ||
816 | <table> | |
817 | <title>Special address/network names</title> | |
818 | ||
819 | <tgroup cols='3'> | |
820 | <colspec colname='name'/> | |
821 | <colspec colname='definition'/> | |
822 | <colspec colname='meaning'/> | |
823 | ||
824 | <thead> | |
825 | <row> | |
826 | <entry>Symbolic Name</entry> | |
827 | <entry>Definition</entry> | |
828 | <entry>Meaning</entry> | |
829 | </row> | |
830 | </thead> | |
831 | ||
832 | <tbody> | |
833 | <row> | |
834 | <entry><constant>any</constant></entry> | |
835 | <entry>0.0.0.0/0 ::/0</entry> | |
836 | <entry>Any host</entry> | |
837 | </row> | |
838 | ||
839 | <row> | |
840 | <entry><constant>localhost</constant></entry> | |
841 | <entry>127.0.0.0/8 ::1/128</entry> | |
842 | <entry>All addresses on the local loopback</entry> | |
843 | </row> | |
844 | ||
845 | <row> | |
846 | <entry><constant>link-local</constant></entry> | |
847 | <entry>169.254.0.0/16 fe80::/64</entry> | |
848 | <entry>All link-local IP addresses</entry> | |
849 | </row> | |
850 | ||
851 | <row> | |
852 | <entry><constant>multicast</constant></entry> | |
853 | <entry>224.0.0.0/4 ff00::/8</entry> | |
854 | <entry>All IP multicasting addresses</entry> | |
855 | </row> | |
856 | </tbody> | |
857 | </tgroup> | |
858 | </table> | |
859 | ||
860 | <para>Note that these settings might not be supported on some systems (for example if eBPF control group | |
861 | support is not enabled in the underlying kernel or container manager). These settings will have no effect in | |
862 | that case. If compatibility with such systems is desired it is hence recommended to not exclusively rely on | |
863 | them for IP security.</para> | |
f2af682c LB |
864 | |
865 | <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/> | |
aefdc112 AK |
866 | |
867 | <xi:include href="version-info.xml" xpointer="v235"/> | |
8d8631d4 DM |
868 | </listitem> |
869 | </varlistentry> | |
870 | ||
63598110 JK |
871 | <varlistentry> |
872 | <term><varname>SocketBindAllow=<replaceable>bind-rule</replaceable></varname></term> | |
873 | <term><varname>SocketBindDeny=<replaceable>bind-rule</replaceable></varname></term> | |
874 | ||
875 | <listitem> | |
876 | <para>Allow or deny binding a socket address to a socket by matching it with the <replaceable>bind-rule</replaceable> and | |
877 | applying a corresponding action if there is a match.</para> | |
878 | ||
120338ae JK |
879 | <para><replaceable>bind-rule</replaceable> describes socket properties such as <replaceable>address-family</replaceable>, |
880 | <replaceable>transport-protocol</replaceable> and <replaceable>ip-ports</replaceable>.</para> | |
63598110 | 881 | |
120338ae JK |
882 | <para><replaceable>bind-rule</replaceable> := |
883 | { [<replaceable>address-family</replaceable><constant>:</constant>][<replaceable>transport-protocol</replaceable><constant>:</constant>][<replaceable>ip-ports</replaceable>] | <constant>any</constant> }</para> | |
63598110 | 884 | |
f80a206a | 885 | <para><replaceable>address-family</replaceable> := { <constant>ipv4</constant> | <constant>ipv6</constant> }</para> |
63598110 | 886 | |
120338ae | 887 | <para><replaceable>transport-protocol</replaceable> := { <constant>tcp</constant> | <constant>udp</constant> }</para> |
63598110 | 888 | |
120338ae JK |
889 | <para><replaceable>ip-ports</replaceable> := { <replaceable>ip-port</replaceable> | <replaceable>ip-port-range</replaceable> }</para> |
890 | ||
891 | <para>An optional <replaceable>address-family</replaceable> expects <constant>ipv4</constant> or <constant>ipv6</constant> values. | |
892 | If not specified, a rule will be matched for both IPv4 and IPv6 addresses and applied depending on other socket fields, e.g. <replaceable>transport-protocol</replaceable>, | |
63598110 JK |
893 | <replaceable>ip-port</replaceable>.</para> |
894 | ||
120338ae JK |
895 | <para>An optional <replaceable>transport-protocol</replaceable> expects <constant>tcp</constant> or <constant>udp</constant> transport protocol names. |
896 | If not specified, a rule will be matched for any transport protocol.</para> | |
897 | ||
898 | <para>An optional <replaceable>ip-port</replaceable> value must lie within the 1…65535 interval, inclusive, i.e. | |
63598110 JK |
899 | dynamic port <constant>0</constant> is not allowed. A range of sequential ports is described by |
900 | <replaceable>ip-port-range</replaceable> := <replaceable>ip-port-low</replaceable><constant>-</constant><replaceable>ip-port-high</replaceable>, | |
901 | where <replaceable>ip-port-low</replaceable> is smaller than or equal to <replaceable>ip-port-high</replaceable> | |
120338ae JK |
902 | and both are within 1…65535 inclusively.</para> |
903 | ||
904 | <para>A special value <constant>any</constant> can be used to apply a rule to any address family, transport protocol and any port with a positive value.</para> | |
63598110 JK |
905 | |
906 | <para>To allow multiple rules assign <varname>SocketBindAllow=</varname> or <varname>SocketBindDeny=</varname> multiple times. | |
907 | To clear the existing assignments pass an empty <varname>SocketBindAllow=</varname> or <varname>SocketBindDeny=</varname> | |
908 | assignment.</para> | |
909 | ||
910 | <para>For each of <varname>SocketBindAllow=</varname> and <varname>SocketBindDeny=</varname>, the maximum allowed number of assignments is | |
911 | <constant>128</constant>.</para> | |
912 | ||
913 | <itemizedlist> | |
914 | <listitem><para>Binding to a socket is allowed when a socket address matches an entry in the | |
915 | <varname>SocketBindAllow=</varname> list.</para></listitem> | |
916 | ||
917 | <listitem><para>Otherwise, binding is denied when the socket address matches an entry in the | |
918 | <varname>SocketBindDeny=</varname> list.</para></listitem> | |
919 | ||
920 | <listitem><para>Otherwise, binding is allowed.</para></listitem> | |
921 | </itemizedlist> | |
922 | ||
923 | <para>The feature is implemented with <constant>cgroup/bind4</constant> and <constant>cgroup/bind6</constant> cgroup-bpf hooks.</para> | |
924 | <para>Examples:<programlisting>… | |
925 | # Allow binding IPv6 socket addresses with a port greater than or equal to 10000. | |
926 | [Service] | |
f80a206a | 927 | SocketBindAllow=ipv6:10000-65535 |
63598110 JK |
928 | SocketBindDeny=any |
929 | … | |
930 | # Allow binding IPv4 and IPv6 socket addresses with 1234 and 4321 ports. | |
931 | [Service] | |
932 | SocketBindAllow=1234 | |
933 | SocketBindAllow=4321 | |
934 | SocketBindDeny=any | |
935 | … | |
936 | # Deny binding IPv6 socket addresses. | |
937 | [Service] | |
120338ae | 938 | SocketBindDeny=ipv6 |
63598110 JK |
939 | … |
940 | # Deny binding IPv4 and IPv6 socket addresses. | |
941 | [Service] | |
942 | SocketBindDeny=any | |
120338ae JK |
943 | … |
944 | # Allow binding only over TCP | |
945 | [Service] | |
946 | SocketBindAllow=tcp | |
947 | SocketBindDeny=any | |
948 | … | |
949 | # Allow binding only over IPv6/TCP | |
950 | [Service] | |
951 | SocketBindAllow=ipv6:tcp | |
952 | SocketBindDeny=any | |
953 | … | |
954 | # Allow binding ports within 10000-65535 range over IPv4/UDP. | |
955 | [Service] | |
956 | SocketBindAllow=ipv4:udp:10000-65535 | |
957 | SocketBindDeny=any | |
63598110 | 958 | …</programlisting></para> |
f2af682c LB |
959 | |
960 | <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/> | |
aefdc112 AK |
961 | |
962 | <xi:include href="version-info.xml" xpointer="v249"/> | |
63598110 JK |
963 | </listitem> |
964 | </varlistentry> | |
965 | ||
795ccb03 MV |
966 | <varlistentry> |
967 | <term><varname>RestrictNetworkInterfaces=</varname></term> | |
968 | ||
969 | <listitem> | |
970 | <para>Takes a list of space-separated network interface names. This option restricts the network | |
971 | interfaces that processes of this unit can use. By default processes can only use the network interfaces | |
972 | listed (allow-list). If the first character of the rule is <literal>~</literal>, the effect is inverted: | |
973 | the processes can only use network interfaces not listed (deny-list). | |
974 | </para> | |
975 | ||
976 | <para>This option can appear multiple times, in which case the network interface names are merged. If the | |
d4e30ad1 | 977 | empty string is assigned, the set is reset and all prior assignments will have no effect. |
795ccb03 MV |
978 | </para> |
979 | ||
980 | <para>If you specify both types of this option (i.e. allow-listing and deny-listing), the first encountered | |
981 | will take precedence and will dictate the default action (allow vs deny). Then the next occurrences of this | |
982 | option will add or delete the listed network interface names from the set, depending on its type and the | |
983 | default action. | |
984 | </para> | |
985 | ||
986 | <para>The loopback interface ("lo") is not treated in any special way; you have to configure it explicitly | |
987 | in the unit file. | |
988 | </para> | |
989 | <para>Example 1: allow-list | |
990 | <programlisting> | |
991 | RestrictNetworkInterfaces=eth1 | |
992 | RestrictNetworkInterfaces=eth2</programlisting> | |
993 | Programs in the unit will only be able to use the eth1 and eth2 network | |
994 | interfaces. | |
995 | </para> | |
996 | ||
997 | <para>Example 2: deny-list | |
998 | <programlisting> | |
999 | RestrictNetworkInterfaces=~eth1 eth2</programlisting> | |
1000 | Programs in the unit will be able to use any network interface but eth1 and eth2. | |
1001 | </para> | |
1002 | ||
1003 | <para>Example 3: mixed | |
1004 | <programlisting> | |
1005 | RestrictNetworkInterfaces=eth1 eth2 | |
1006 | RestrictNetworkInterfaces=~eth1</programlisting> | |
1007 | Programs in the unit will only be able to use the eth2 network interface. | |
1008 | </para> | |
f2af682c LB |
1009 | |
1010 | <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/> | |
aefdc112 AK |
1011 | |
1012 | <xi:include href="version-info.xml" xpointer="v250"/> | |
795ccb03 MV |
1013 | </listitem> |
1014 | </varlistentry> | |
1015 | ||
a90f80c7 NR |
1016 | <varlistentry> |
1017 | <term><varname>NFTSet=</varname><replaceable>family</replaceable>:<replaceable>table</replaceable>:<replaceable>set</replaceable></term> | |
1018 | <listitem> | |
1019 | <para>This setting provides a method for integrating dynamic cgroup, user and group IDs into | |
1020 | firewall rules with <ulink url="https://netfilter.org/projects/nftables/index.html">NFT</ulink> | |
1021 | sets. The benefit of using this setting is to be able to use the IDs as selectors in firewall rules | |
1022 | easily and this in turn allows more fine grained filtering. NFT rules for cgroup matching use | |
1023 | numeric cgroup IDs, which change every time a service is restarted, making them hard to use in | |
1024 | a systemd environment otherwise. Dynamic and random IDs used by <varname>DynamicUser=</varname> can | |
1025 | also be integrated with this setting.</para> | |
1026 | ||
1027 | <para>This option expects a whitespace separated list of NFT set definitions. Each definition | |
1028 | consists of a colon-separated tuple of source type (one of <literal>cgroup</literal>, | |
1029 | <literal>user</literal> or <literal>group</literal>), NFT address family (one of | |
1030 | <literal>arp</literal>, <literal>bridge</literal>, <literal>inet</literal>, <literal>ip</literal>, | |
1031 | <literal>ip6</literal>, or <literal>netdev</literal>), table name and set name. The names of tables | |
1032 | and sets must conform to lexical restrictions of NFT table names. The type of the element used in | |
1033 | the NFT filter must match the type implied by the directive (<literal>cgroup</literal>, | |
1034 | <literal>user</literal> or <literal>group</literal>) as shown in the table below. When a control | |
1035 | group or a unit is realized, the corresponding ID will be appended to the NFT sets and it will be | |
1036 | removed when the control group or unit is removed. <command>systemd</command> only inserts | |
1037 | elements into (or removes them from) the sets, so the related NFT rules, tables and sets must be prepared | |
1038 | elsewhere in advance. Failures to manage the sets will be ignored.</para> | |
1039 | ||
1040 | <table> | |
1041 | <title>Defined <varname>source type</varname> values</title> | |
1042 | <tgroup cols='3'> | |
1043 | <colspec colname='source type'/> | |
1044 | <colspec colname='description'/> | |
1045 | <colspec colname='NFT type name'/> | |
1046 | <thead> | |
1047 | <row> | |
1048 | <entry>Source type</entry> | |
1049 | <entry>Description</entry> | |
1050 | <entry>Corresponding NFT type name</entry> | |
1051 | </row> | |
1052 | </thead> | |
1053 | ||
1054 | <tbody> | |
1055 | <row> | |
1056 | <entry><literal>cgroup</literal></entry> | |
1057 | <entry>control group ID</entry> | |
1058 | <entry><literal>cgroupsv2</literal></entry> | |
1059 | </row> | |
1060 | <row> | |
1061 | <entry><literal>user</literal></entry> | |
1062 | <entry>user ID</entry> | |
1063 | <entry><literal>meta skuid</literal></entry> | |
1064 | </row> | |
1065 | <row> | |
1066 | <entry><literal>group</literal></entry> | |
1067 | <entry>group ID</entry> | |
1068 | <entry><literal>meta skgid</literal></entry> | |
1069 | </row> | |
1070 | </tbody> | |
1071 | </tgroup> | |
1072 | </table> | |
1073 | ||
1074 | <para>If the firewall rules are reinstalled so that the contents of NFT sets are destroyed, command | |
1075 | <command>systemctl daemon-reload</command> can be used to refill the sets.</para> | |
1076 | ||
1077 | <para>Example: | |
1078 | <programlisting>[Service] | |
1079 | NFTSet=cgroup:inet:filter:my_service user:inet:filter:serviceuser | |
1080 | </programlisting> | |
1081 | Corresponding NFT rules: | |
1082 | <programlisting>table inet filter { | |
1083 | set my_service { | |
1084 | type cgroupsv2 | |
1085 | } | |
1086 | set serviceuser { | |
1087 | typeof meta skuid | |
1088 | } | |
1089 | chain x { | |
1090 | socket cgroupv2 level 2 @my_service accept | |
1091 | drop | |
1092 | } | |
1093 | chain y { | |
1094 | meta skuid @serviceuser accept | |
1095 | drop | |
1096 | } | |
1097 | }</programlisting> | |
1098 | </para> | |
1099 | <xi:include href="version-info.xml" xpointer="v255"/></listitem> | |
1100 | </varlistentry> | |
1101 | ||
5cbfbf2a LP |
1102 | </variablelist> |
1103 | ||
1104 | </refsect2><refsect2><title>BPF Programs</title> | |
1105 | ||
1106 | <variablelist class='unit-directives'> | |
1107 | ||
1108 | <varlistentry> | |
1109 | <term><varname>IPIngressFilterPath=<replaceable>BPF_FS_PROGRAM_PATH</replaceable></varname></term> | |
1110 | <term><varname>IPEgressFilterPath=<replaceable>BPF_FS_PROGRAM_PATH</replaceable></varname></term> | |
1111 | ||
1112 | <listitem> | |
1113 | <para>Add custom network traffic filters implemented as BPF programs, applying to all IP packets | |
1114 | sent and received over <constant>AF_INET</constant> and <constant>AF_INET6</constant> sockets. | |
1115 | Takes an absolute path to a pinned BPF program in the BPF virtual filesystem (<filename>/sys/fs/bpf/</filename>). | |
1116 | </para> | |
1117 | ||
1118 | <para>The filters configured with this option are applied to all sockets created by processes | |
1119 | of this unit (or in the case of socket units, associated with it). The filters are loaded in addition | |
1120 | to the filters of any of the parent slice units this unit might be a member of as well as any | |
1121 | <varname>IPAddressAllow=</varname> and <varname>IPAddressDeny=</varname> filters in any of these units. | |
1122 | By default there are no filters specified.</para> | |
1123 | ||
1124 | <para>If these settings are used multiple times in the same unit all the specified programs are attached. If an | |
1125 | empty string is assigned to these settings the program list is reset and all previously specified programs are ignored.</para> | |
1126 | ||
1127 | <para>If the path <replaceable>BPF_FS_PROGRAM_PATH</replaceable> in <varname>IPIngressFilterPath=</varname> assignment | |
1128 | is already being handled by <varname>BPFProgram=</varname> ingress hook, e.g. | |
1129 | <varname>BPFProgram=</varname><constant>ingress</constant>:<replaceable>BPF_FS_PROGRAM_PATH</replaceable>, | |
1130 | the assignment will still be considered valid and the program will be attached to a cgroup. The same applies to the | |
1131 | <varname>IPEgressFilterPath=</varname> path and the <constant>egress</constant> hook.</para> | |
1132 | ||
1133 | <para>Note that for socket-activated services, the IP filter programs configured on the socket unit apply to | |
1134 | all sockets associated with it directly, but not to any sockets created by the ultimately activated services | |
1135 | for it. Conversely, the IP filter programs configured for the service are not applied to any sockets passed into | |
1136 | the service via socket activation. Thus, it is usually a good idea to replicate the IP filter programs on both | |
1137 | the socket and the service unit. However, it often makes sense to maintain one configuration more open and the other | |
7227dd81 | 1138 | one more restricted, depending on the use case.</para> |
5cbfbf2a LP |
1139 | |
1140 | <para>Note that these settings might not be supported on some systems (for example if eBPF control group | |
1141 | support is not enabled in the underlying kernel or container manager). These settings will fail the service in | |
1142 | that case. If compatibility with such systems is desired it is hence recommended to attach your filter manually | |
1143 | (requires <varname>Delegate=</varname><constant>yes</constant>) instead of using this setting.</para> | |
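<para>Example (a minimal sketch; the pinned program paths are hypothetical and are assumed to already exist
in the BPF file system before the unit is started):
<programlisting>[Service]
IPIngressFilterPath=/sys/fs/bpf/example-ingress
IPEgressFilterPath=/sys/fs/bpf/example-egress</programlisting>
To also cover sockets passed in via socket activation, the same two assignments would presumably be repeated
in the <literal>[Socket]</literal> section of the corresponding socket unit.</para>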
aefdc112 AK |
1144 | |
1145 | <xi:include href="version-info.xml" xpointer="v243"/> | |
5cbfbf2a LP |
1146 | </listitem> |
1147 | </varlistentry> | |
1148 | ||
1149 | <varlistentry> | |
a27e6fb7 | 1150 | <term><varname>BPFProgram=<replaceable>type</replaceable>:<replaceable>program-path</replaceable></varname></term> |
5cbfbf2a | 1151 | <listitem> |
a27e6fb7 LP |
1152 | <para><varname>BPFProgram=</varname> allows attaching custom BPF programs to the cgroup of a |
1153 | unit. (This generalizes the functionality exposed via <varname>IPEgressFilterPath=</varname> and | |
bf63dadb | 1154 | <varname>IPIngressFilterPath=</varname> for other hooks.) Cgroup-bpf hooks in the form of BPF |
a27e6fb7 LP |
1155 | programs loaded to the BPF filesystem are attached with cgroup-bpf attach flags determined by the |
1156 | unit. For details about attachment types and flags see <ulink | |
1157 | url="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/bpf.h"><filename>bpf.h</filename></ulink>. Also | |
1158 | refer to the general <ulink url="https://docs.kernel.org/bpf/">BPF documentation</ulink>.</para> | |
1159 | ||
1160 | <para>The specification of a BPF program consists of a pair of BPF program type and program path in | |
1161 | the file system, with <literal>:</literal> as the separator: | |
1162 | <replaceable>type</replaceable>:<replaceable>program-path</replaceable>.</para> | |
1163 | ||
1164 | <para>The BPF program type is equivalent to the BPF attach type used in | |
bf63dadb ZJS |
1165 | <citerefentry project='mankier'><refentrytitle>bpftool</refentrytitle><manvolnum>8</manvolnum></citerefentry>. | |
1166 | It may be one of | |
1167 | <constant>egress</constant>, | |
1168 | <constant>ingress</constant>, | |
1169 | <constant>sock_create</constant>, | |
1170 | <constant>sock_ops</constant>, | |
1171 | <constant>device</constant>, | |
1172 | <constant>bind4</constant>, | |
1173 | <constant>bind6</constant>, | |
1174 | <constant>connect4</constant>, | |
1175 | <constant>connect6</constant>, | |
1176 | <constant>post_bind4</constant>, | |
1177 | <constant>post_bind6</constant>, | |
1178 | <constant>sendmsg4</constant>, | |
1179 | <constant>sendmsg6</constant>, | |
1180 | <constant>sysctl</constant>, | |
1181 | <constant>recvmsg4</constant>, | |
1182 | <constant>recvmsg6</constant>, | |
1183 | <constant>getsockopt</constant>, | |
1184 | or <constant>setsockopt</constant>. | |
1185 | </para> | |
5cbfbf2a | 1186 | |
a27e6fb7 LP |
1187 | <para>The specified program path must be an absolute path referencing a BPF program inode in the |
1188 | bpffs file system (which generally means it must begin with <filename>/sys/fs/bpf/</filename>). If | |
1189 | a specified program does not exist (i.e. has not been uploaded to the BPF subsystem of the kernel | |
1190 | yet), it will not be installed but unit activation will continue (a warning will be printed to the | |
1191 | logs).</para> | |
1192 | ||
1193 | <para>Setting <varname>BPFProgram=</varname> to an empty value makes previous assignments | |
1194 | ineffective.</para> | |
1195 | ||
1196 | <para>Multiple assignments of the same program type/path pair have the same effect as a single | |
1197 | assignment: the program will be attached just once.</para> | |
1198 | ||
5cbfbf2a LP |
1199 | <para>If a BPF <constant>egress</constant> program pinned at <replaceable>program-path</replaceable> is already being | |
1200 | handled by <varname>IPEgressFilterPath=</varname>, the <varname>BPFProgram=</varname> | |
1201 | assignment will still be considered valid and the program will be attached to a cgroup. | |
1202 | The same applies to the <constant>ingress</constant> hook and <varname>IPIngressFilterPath=</varname> assignments.</para> | |
1203 | ||
a27e6fb7 LP |
1204 | <para>BPF programs passed with <varname>BPFProgram=</varname> are attached to the cgroup of a unit |
1205 | with the BPF attach flag <constant>multi</constant>, which allows further attachments of the same | |
5cbfbf2a LP |
1206 | <replaceable>type</replaceable> within the cgroup hierarchy topped by the unit cgroup.</para> | |
1207 | ||
a27e6fb7 | 1208 | <para>Examples:<programlisting>BPFProgram=egress:/sys/fs/bpf/egress-hook |
5cbfbf2a LP |
1209 | BPFProgram=bind6:/sys/fs/bpf/sock-addr-hook |
1210 | </programlisting></para> | |
aefdc112 AK |
1211 | |
1212 | <xi:include href="version-info.xml" xpointer="v249"/> | |
5cbfbf2a LP |
1213 | </listitem> |
1214 | </varlistentry> | |
1215 | ||
1216 | </variablelist> | |
1217 | ||
1218 | </refsect2><refsect2><title>Device Access</title> | |
1219 | ||
1220 | <variablelist class='unit-directives'> | |
1221 | ||
d868475a ZJS |
1222 | <varlistentry> |
1223 | <term><varname>DeviceAllow=</varname></term> | |
1224 | ||
1225 | <listitem> | |
3ff668cb LP |
1226 | <para>Control access to specific device nodes by the executed processes. Takes two space-separated |
1227 | strings: a device node specifier followed by a combination of <constant>r</constant>, | |
1228 | <constant>w</constant>, <constant>m</constant> to control <emphasis>r</emphasis>eading, | |
0923b425 | 1229 | <emphasis>w</emphasis>riting, or creation of the specific device nodes by the unit |
6d48c7cf LP |
1230 | (<emphasis>m</emphasis>knod), respectively. This functionality is implemented using eBPF |
1231 | filtering.</para> | |
3ff668cb | 1232 | |
a14e028e ZJS |
1233 | <para>When access to <emphasis>all</emphasis> physical devices should be disallowed, |
1234 | <varname>PrivateDevices=</varname> may be used instead. See | |
1235 | <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>. | |
1236 | </para> | |
1237 | ||
3ff668cb LP |
1238 | <para>The device node specifier is either a path to a device node in the file system, starting with |
1239 | <filename>/dev/</filename>, or a string starting with either <literal>char-</literal> or | |
1240 | <literal>block-</literal> followed by a device group name, as listed in | |
6b000af4 | 1241 | <filename>/proc/devices</filename>. The latter is useful to allow-list all current and future |
3ff668cb LP |
1242 | devices belonging to a specific device group at once. The device group is matched according to |
1243 | filename globbing rules, you may hence use the <literal>*</literal> and <literal>?</literal> | |
1244 | wildcards. (Note that such globbing wildcards are not available for device node path | |
1245 | specifications!) In order to match device nodes by numeric major/minor, use device node paths in | |
1246 | the <filename>/dev/char/</filename> and <filename>/dev/block/</filename> directories. However, | |
1247 | matching devices by major/minor is generally not recommended as assignments are neither stable nor | |
1248 | portable between systems or different kernel versions.</para> | |
1249 | ||
1250 | <para>Examples: <filename>/dev/sda5</filename> is a path to a device node, referring to an ATA or | |
1251 | SCSI block device. <literal>char-pts</literal> and <literal>char-alsa</literal> are specifiers for | |
1252 | all pseudo TTYs and all ALSA sound devices, respectively. <literal>char-cpu/*</literal> is a | |
1253 | specifier matching all CPU related device groups.</para> | |
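<para>As an illustrative sketch, the specifiers above could be combined with access flags as follows (the
particular access combinations are assumptions chosen for demonstration only):
<programlisting>[Service]
# Read-only access to a single block device node
DeviceAllow=/dev/sda5 r
# Read and write access to all pseudo TTYs
DeviceAllow=char-pts rw
# Read, write and mknod access to all ALSA sound devices
DeviceAllow=char-alsa rwm</programlisting></para>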
00d85bbb | 1254 | |
6b000af4 | 1255 | <para>Note that allow lists defined this way should only reference device groups which are |
00d85bbb | 1256 | resolvable at the time the unit is started. Any device groups not resolvable then are not added to |
6b000af4 | 1257 | the device allow list. In order to work around this limitation, consider extending service units |
3a827125 LP |
1258 | with a pair of <command>After=modprobe@xyz.service</command> and |
1259 | <command>Wants=modprobe@xyz.service</command> lines that load the necessary kernel module | |
1260 | implementing the device group if missing. | |
1261 | Example: <programlisting>… | |
1262 | [Unit] | |
1263 | Wants=modprobe@loop.service | |
1264 | After=modprobe@loop.service | |
1265 | ||
00d85bbb | 1266 | [Service] |
00d85bbb LP |
1267 | DeviceAllow=block-loop |
1268 | DeviceAllow=/dev/loop-control | |
1269 | …</programlisting></para> | |
1270 | ||
f2af682c | 1271 | <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/> |
aefdc112 AK |
1272 | |
1273 | <xi:include href="version-info.xml" xpointer="v208"/> | |
d868475a ZJS |
1274 | </listitem> |
1275 | </varlistentry> | |
1276 | ||
1277 | <varlistentry> | |
1278 | <term><varname>DevicePolicy=auto|closed|strict</varname></term> | |
1279 | ||
1280 | <listitem> | |
1281 | <para> | |
1282 | Control the policy for allowing device access: | |
1283 | </para> | |
1284 | <variablelist> | |
1285 | <varlistentry> | |
1286 | <term><option>strict</option></term> | |
1287 | <listitem> | |
1288 | <para>means to only allow types of access that are | |
1289 | explicitly specified.</para> | |
aefdc112 AK |
1290 | |
1291 | <xi:include href="version-info.xml" xpointer="v208"/> | |
d868475a ZJS |
1292 | </listitem> |
1293 | </varlistentry> | |
1294 | ||
1295 | <varlistentry> | |
1296 | <term><option>closed</option></term> | |
1297 | <listitem> | |
6a75304e | 1298 | <para>in addition, allows access to standard pseudo |
d868475a ZJS |
1299 | devices including |
1300 | <filename>/dev/null</filename>, | |
1301 | <filename>/dev/zero</filename>, | |
1302 | <filename>/dev/full</filename>, | |
1303 | <filename>/dev/random</filename>, and | |
1304 | <filename>/dev/urandom</filename>. | |
1305 | </para> | |
aefdc112 AK |
1306 | |
1307 | <xi:include href="version-info.xml" xpointer="v208"/> | |
d868475a ZJS |
1308 | </listitem> |
1309 | </varlistentry> | |
1310 | ||
1311 | <varlistentry> | |
1312 | <term><option>auto</option></term> | |
1313 | <listitem> | |
1314 | <para> | |
6a75304e | 1315 | in addition, allows access to all devices if no |
d868475a ZJS |
1316 | explicit <varname>DeviceAllow=</varname> is present. |
1317 | This is the default. | |
1318 | </para> | |
aefdc112 AK |
1319 | |
1320 | <xi:include href="version-info.xml" xpointer="v208"/> | |
d868475a ZJS |
1321 | </listitem> |
1322 | </varlistentry> | |
1323 | </variablelist> | |
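<para>Example (a sketch; the serial device path is a hypothetical placeholder):
<programlisting>[Service]
DevicePolicy=closed
DeviceAllow=/dev/ttyUSB0 rw</programlisting>
With this configuration, processes of the unit can use the standard pseudo devices listed for
<option>closed</option> plus the explicitly allowed device node, and no other devices.</para>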
f2af682c LB |
1324 | |
1325 | <xi:include href="cgroup-sandboxing.xml" xpointer="singular"/> | |
aefdc112 AK |
1326 | |
1327 | <xi:include href="version-info.xml" xpointer="v208"/> | |
d868475a ZJS |
1328 | </listitem> |
1329 | </varlistentry> | |
61ad59b1 | 1330 | |
5cbfbf2a LP |
1331 | </variablelist> |
1332 | ||
1333 | </refsect2><refsect2><title>Control Group Management</title> | |
1334 | ||
1335 | <variablelist class='unit-directives'> | |
1336 | ||
61ad59b1 LP |
1337 | <varlistentry> |
1338 | <term><varname>Slice=</varname></term> | |
1339 | ||
1340 | <listitem> | |
1341 | <para>The name of the slice unit to place the unit | |
1342 | in. Defaults to <filename>system.slice</filename> for all | |
dc7adf20 LP |
1343 | non-instantiated units of all unit types (except for slice |
1344 | units themselves, see below). Instance units are by default | |
1345 | placed in a subslice of <filename>system.slice</filename> | |
1346 | that is named after the template name.</para> | |
1347 | ||
1348 | <para>This option may be used to arrange systemd units in a | |
1349 | hierarchy of slices each of which might have resource | |
1350 | settings applied.</para> | |
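<para>Example (a sketch; <filename>workers.slice</filename> is a hypothetical slice name):
<programlisting>[Service]
Slice=workers.slice</programlisting>
Resource settings configured on <filename>workers.slice</filename> then apply collectively to all units
placed in it.</para>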
61ad59b1 | 1351 | |
fbce1139 | 1352 | <para>For units of type slice, the only accepted value for |
61ad59b1 | 1353 | this setting is the parent slice. Since the name of a slice |
fbce1139 | 1354 | unit implies the parent slice, it is hence redundant to ever |
61ad59b1 | 1355 | set this parameter directly for slice units.</para> |
ae0a5fb1 LP |
1356 | |
1357 | <para>Special care should be taken when relying on the default slice assignment in templated service units | |
1358 | that have <varname>DefaultDependencies=no</varname> set, see | |
1359 | <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>, section | |
45f09f93 | 1360 | "Default Dependencies" for details.</para> |
ae0a5fb1 | 1361 | |
aefdc112 AK |
1362 | <xi:include href="version-info.xml" xpointer="v208"/> |
1363 | ||
61ad59b1 LP |
1364 | </listitem> |
1365 | </varlistentry> | |
1366 | ||
a931ad47 LP |
1367 | <varlistentry> |
1368 | <term><varname>Delegate=</varname></term> | |
1369 | ||
1370 | <listitem> | |
a8b993dc LP |
1371 | <para>Turns on delegation of further resource control partitioning to processes of the unit. Units |
1372 | where this is enabled may create and manage their own private subhierarchy of control groups below | |
1373 | the control group of the unit itself. For unprivileged services (i.e. those using the | |
1374 | <varname>User=</varname> setting) the unit's control group will be made accessible to the relevant | |
1375 | user.</para> | |
253d0d59 ZJS |
1376 | |
1377 | <para>When enabled the service manager will refrain from manipulating control groups or moving | |
1378 | processes below the unit's control group, so that a clear concept of ownership is established: the | |
449172f9 ZJS |
1379 | control group tree at the level of the unit's control group and above (i.e. towards the root |
1380 | control group) is owned and managed by the service manager of the host, while the control group | |
1381 | tree below the unit's control group is owned and managed by the unit itself.</para> | |
1382 | ||
1383 | <para>Takes either a boolean argument or a (possibly empty) list of control group controller names. | |
1384 | If true, delegation is turned on, and all supported controllers are enabled for the unit, making | |
1385 | them available to the unit's processes for management. If false, delegation is turned off entirely | |
1386 | (and no additional controllers are enabled). If set to a list of controllers, delegation is turned | |
1387 | on, and the specified controllers are enabled for the unit. Assigning the empty string will enable | |
253d0d59 | 1388 | delegation, but reset the list of controllers, and all assignments prior to this will have no |
449172f9 ZJS |
1389 | effect. Note that additional controllers other than the ones specified might be made available as |
1390 | well, depending on configuration of the containing slice unit or other units contained in it. | |
1391 | Defaults to false.</para> | |
253d0d59 ZJS |
1392 | |
1393 | <para>Note that controller delegation to less privileged code is only safe on the unified control | |
1394 | group hierarchy. Accordingly, access to the specified controllers will not be granted to | |
1395 | unprivileged services on the legacy hierarchy, even when requested.</para> | |
a9f01ad1 | 1396 | |
5403e153 AZ |
1397 | <xi:include href="supported-controllers.xml" xpointer="controllers-text" /> |
1398 | ||
253d0d59 ZJS |
1399 | <para>Not all of these controllers are available on all kernels however, and some are specific to |
1400 | the unified hierarchy while others are specific to the legacy hierarchy. Also note that the kernel | |
1401 | might support further controllers, which aren't covered here yet as delegation is either not | |
1402 | supported at all for them or not defined cleanly.</para> | |
1403 | ||
1404 | <para>Note that because of the hierarchical nature of the cgroup hierarchy, any controllers that are | |
1405 | delegated will be enabled for the parent and sibling units of the unit with delegation.</para> | |
077c40bc LP |
1406 | |
1407 | <para>For further details on the delegation model consult <ulink | |
1408 | url="https://systemd.io/CGROUP_DELEGATION">Control Group APIs and Delegation</ulink>.</para> | |
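<para>Example (a minimal sketch; the choice of controllers is an assumption for illustration):
<programlisting>[Service]
# Turn on delegation and enable only the listed controllers
Delegate=cpu memory pids</programlisting>
Use <varname>Delegate=</varname><constant>yes</constant> instead to enable all supported controllers.</para>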
aefdc112 AK |
1409 | |
1410 | <xi:include href="version-info.xml" xpointer="v218"/> | |
a931ad47 LP |
1411 | </listitem> |
1412 | </varlistentry> | |
1413 | ||
a8b993dc LP |
1414 | <varlistentry> |
1415 | <term><varname>DelegateSubgroup=</varname></term> | |
1416 | ||
1417 | <listitem> | |
1418 | <para>Place unit processes in the specified subgroup of the unit's control group. Takes a valid | |
1419 | control group name (not a path!) as parameter, or an empty string to turn this feature | |
1420 | off. Defaults to off. The control group name must be usable as filename and avoid conflicts with | |
1421 | the kernel's control group attribute files (i.e. <filename>cgroup.procs</filename> is not an | |
1422 | acceptable name, since the kernel exposes a native control group attribute file by that name). This | |
1423 | option has no effect unless control group delegation is turned on via <varname>Delegate=</varname>, | |
1424 | see above. Note that this setting only applies to "main" processes of a unit, i.e. for services to | |
1425 | <varname>ExecStart=</varname>, but not for <varname>ExecReload=</varname> and similar. If | |
1426 | delegation is enabled, the latter are always placed inside a subgroup named | |
1427 | <filename>.control</filename>. The specified subgroup is automatically created (and potentially | |
1428 | ownership is passed to the unit's configured user/group) when a process is started in it.</para> | |
1429 | ||
1430 | <para>This option is useful to avoid manually moving the invoked process into a subgroup after it | |
1431 | has been started. Since no processes should live in inner nodes of the control group tree it's | |
1432 | almost always necessary to run the main ("supervising") process of a unit that has delegation | |
1433 | turned on in a subgroup.</para> | |
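<para>Example (a sketch; <literal>supervisor</literal> is a hypothetical subgroup name):
<programlisting>[Service]
Delegate=yes
DelegateSubgroup=supervisor</programlisting>
With these settings, the process started via <varname>ExecStart=</varname> is placed in the
<filename>supervisor</filename> subgroup below the unit's control group rather than in the unit's control
group itself.</para>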
ec07c3c8 AK |
1434 | |
1435 | <xi:include href="version-info.xml" xpointer="v254"/> | |
a8b993dc LP |
1436 | </listitem> |
1437 | </varlistentry> | |
1438 | ||
c72703e2 CD |
1439 | <varlistentry> |
1440 | <term><varname>DisableControllers=</varname></term> | |
1441 | ||
1442 | <listitem> | |
253d0d59 ZJS |
1443 | <para>Disables controllers from being enabled for a unit's children. If a controller listed is |
1444 | already in use in its subtree, the controller will be removed from the subtree. This can be used to | |
1445 | avoid configuration in child units from being able to implicitly or explicitly enable a controller. | |
1446 | Defaults to empty.</para> | |
c72703e2 CD |
1447 | |
1448 | <para>Multiple controllers may be specified, separated by spaces. You may also pass | |
1449 | <varname>DisableControllers=</varname> multiple times, in which case each new instance adds another controller | |
1450 | to disable. Passing <varname>DisableControllers=</varname> by itself with no controller name present resets | |
1451 | the disabled controller list.</para> | |
1452 | ||
253d0d59 ZJS |
1453 | <para>It may not be possible to disable a controller after units have been started, if the unit or |
1454 | any child of the unit in question delegates controllers to its children, as any delegated subtree | |
1455 | of the cgroup hierarchy is unmanaged by systemd.</para> | |
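<para>Example (a sketch; the controllers are chosen for illustration only):
<programlisting>[Slice]
DisableControllers=cpu
DisableControllers=io</programlisting>
Because repeated assignments accumulate, this is equivalent to a single
<varname>DisableControllers=cpu io</varname> line.</para>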
1456 | ||
5403e153 | 1457 | <xi:include href="supported-controllers.xml" xpointer="controllers-text" /> |
aefdc112 AK |
1458 | |
1459 | <xi:include href="version-info.xml" xpointer="v240"/> | |
c72703e2 CD |
1460 | </listitem> |
1461 | </varlistentry> | |
cf3e5788 | 1462 | |
5cbfbf2a LP |
1463 | </variablelist> |
1464 | ||
1465 | </refsect2><refsect2><title>Memory Pressure Control</title> | |
1466 | ||
1467 | <variablelist class='unit-directives'> | |
1468 | ||
cf3e5788 AZ |
1469 | <varlistentry> |
1470 | <term><varname>ManagedOOMSwap=auto|kill</varname></term> | |
1471 | <term><varname>ManagedOOMMemoryPressure=auto|kill</varname></term> | |
1472 | ||
1473 | <listitem> | |
1474 | <para>Specifies how | |
1475 | <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> | |
1476 | will act on this unit's cgroups. Defaults to <option>auto</option>.</para> | |
1477 | ||
6f83ea60 ZJS |
1478 | <para>When set to <option>kill</option>, the unit becomes a candidate for monitoring by |
1479 | <command>systemd-oomd</command>. If the cgroup passes the limits set by | |
1480 | <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or | |
1481 | the unit configuration, <command>systemd-oomd</command> will select a descendant cgroup and send | |
1482 | <constant>SIGKILL</constant> to all of the processes under it. You can find more details on | |
1483 | candidates and kill behavior at | |
cf3e5788 | 1484 | <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> |
6f83ea60 ZJS |
1485 | and |
1486 | <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> | |
1487 | ||
1488 | <para>Setting either of these properties to <option>kill</option> will also result in | |
cf3e5788 | 1489 | <varname>After=</varname> and <varname>Wants=</varname> dependencies on |
6f83ea60 | 1490 | <filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.</para> |
cf3e5788 | 1491 | |
6f83ea60 ZJS |
1492 | <para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this |
1493 | cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these | |
1494 | properties set to <option>kill</option>, a unit with <option>auto</option> can still be a candidate | |
1495 | for <command>systemd-oomd</command> to terminate.</para> | |
aefdc112 AK |
1496 | |
1497 | <xi:include href="version-info.xml" xpointer="v247"/> | |
cf3e5788 AZ |
1498 | </listitem> |
1499 | </varlistentry> | |
1500 | ||
1501 | <varlistentry> | |
0a9f9344 | 1502 | <term><varname>ManagedOOMMemoryPressureLimit=</varname></term> |
cf3e5788 AZ |
1503 | |
1504 | <listitem> | |
1505 | <para>Overrides the default memory pressure limit set by | |
75909cc7 ZJS |
1506 | <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> for |
1507 | this unit (cgroup). Takes a percentage value between 0% and 100%, inclusive. This property is | |
1508 | ignored unless <varname>ManagedOOMMemoryPressure=</varname><option>kill</option>. Defaults to 0%, | |
1509 | which means to use the default set by | |
1510 | <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>. | |
cf3e5788 | 1511 | </para> |
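<para>Example (a sketch; the 50% threshold is an arbitrary illustrative value, not a recommendation):
<programlisting>[Service]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%</programlisting></para>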
aefdc112 AK |
1512 | |
1513 | <xi:include href="version-info.xml" xpointer="v247"/> | |
cf3e5788 AZ |
1514 | </listitem> |
1515 | </varlistentry> | |
d8a4d64b AZ |
1516 | |
1517 | <varlistentry> | |
1518 | <term><varname>ManagedOOMPreference=none|avoid|omit</varname></term> | |
1519 | ||
1520 | <listitem> | |
326152af ZJS |
1521 | <para>Allows deprioritizing or omitting this unit's cgroup as a candidate when |
1522 | <command>systemd-oomd</command> needs to act. Requires support for extended attributes (see | |
d8a4d64b | 1523 | <citerefentry project='man-pages'><refentrytitle>xattr</refentrytitle><manvolnum>7</manvolnum></citerefentry>) |
58b2f0d1 NR |
1524 | in order to use <option>avoid</option> or <option>omit</option>.</para> |
1525 | ||
1526 | <para>When calculating candidates to relieve swap usage, <command>systemd-oomd</command> will | |
1527 | only respect these extended attributes if the unit's cgroup is owned by root.</para> | |
1528 | ||
1529 | <para>When calculating candidates to relieve memory pressure, <command>systemd-oomd</command> | |
3b44e33f NR |
1530 | will only respect these extended attributes if the unit's cgroup is owned by root, or if the |
1531 | unit's cgroup owner and the owner of the monitored ancestor cgroup are the same. For example, | |
1532 | if <command>systemd-oomd</command> is calculating candidates for <filename>-.slice</filename>, | |
1533 | then extended attributes set on descendants of <filename>/user.slice/user-1000.slice/user@1000.service/</filename> | |
58b2f0d1 NR |
1534 | will be ignored because the descendants are owned by UID 1000, and <filename>-.slice</filename> |
1535 | is owned by UID 0. However, if calculating candidates for | |
1536 | <filename>/user.slice/user-1000.slice/user@1000.service/</filename>, then extended attributes set | |
1537 | on the descendants would be respected.</para> | |
d8a4d64b | 1538 | |
34507fa9 ZJS |
1539 | <para>If this property is set to <option>avoid</option>, the service manager will convey this to |
1540 | <command>systemd-oomd</command>, which will only select this cgroup if there are no other viable | |
1541 | candidates.</para> | |
1542 | ||
1543 | <para>If this property is set to <option>omit</option>, the service manager will convey this to | |
1544 | <command>systemd-oomd</command>, which will ignore this cgroup as a candidate and will not perform | |
1545 | any actions on it.</para> | |
326152af ZJS |
1546 | |
1547 | <para>It is recommended to use <option>avoid</option> and <option>omit</option> sparingly, as they | |
1548 | can adversely affect <command>systemd-oomd</command>'s kill behavior. Also note that these extended | |
1549 | attributes are not applied recursively to cgroups under this unit's cgroup.</para> | |
1550 | ||
34507fa9 ZJS |
1551 | <para>Defaults to <option>none</option>, which means <command>systemd-oomd</command> will rank this |
1552 | unit's cgroup as defined in | |
d8a4d64b | 1553 | <citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> |
34507fa9 ZJS |
1554 | and <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>. |
1555 | </para> | |
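
          <para>As an illustrative sketch, assuming a hypothetical service that should only be selected
          as a last resort, the setting could be applied as follows. Because the attribute is not applied
          recursively, child cgroups of this unit would not inherit it:</para>

          <programlisting># Hypothetical drop-in for example.service
[Service]
# systemd-oomd selects this cgroup only if there are no other viable candidates.
ManagedOOMPreference=avoid</programlisting>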
aefdc112 AK |
1556 | |
1557 | <xi:include href="version-info.xml" xpointer="v248"/> | |
d8a4d64b AZ |
1558 | </listitem> |
1559 | </varlistentry> | |
6bb00842 LP |
1560 | |
1561 | <varlistentry> | |
1562 | <term><varname>MemoryPressureWatch=</varname></term> | |
1563 | ||
1564 | <listitem><para>Controls memory pressure monitoring for invoked processes. Takes one of | |
1565 | <literal>off</literal>, <literal>on</literal>, <literal>auto</literal> or <literal>skip</literal>. | |
1566 | <literal>off</literal> tells the service not to watch for memory pressure events, by setting the | |
1567 | <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable to the literal string | |
1568 | <filename>/dev/null</filename>. <literal>on</literal> tells the service to watch for memory | |
1569 | pressure events. This enables memory accounting for the service, and ensures the | |
bf63dadb | 1570 | <filename>memory.pressure</filename> cgroup attribute file is accessible for reading and writing by the |
6bb00842 LP |
1571 | service's user. It then sets the <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable for |
1572 | processes invoked by the unit to the file system path to this file. The threshold information | |
1573 | configured with <varname>MemoryPressureThresholdSec=</varname> is encoded in the | |
1574 | <varname>$MEMORY_PRESSURE_WRITE</varname> environment variable. If set to <literal>auto</literal>, | |
1575 | the protocol is enabled if memory accounting is enabled for the unit anyway, and disabled | |
1576 | otherwise. If set to <literal>skip</literal>, the logic is neither enabled nor disabled, and the two | |
1577 | environment variables are not set.</para> | |
1578 | ||
1579 | <para>Note that services are free to use the two environment variables, but it's unproblematic if | |
1580 | they ignore them. Memory pressure handling must be implemented individually in each service, and | |
1581 | usually means different things for different software. For further details on memory pressure | |
1582 | handling see <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure Handling in | |
1583 | systemd</ulink>.</para> | |
1584 | ||
1585 | <para>Services implemented using | |
1586 | <citerefentry><refentrytitle>sd-event</refentrytitle><manvolnum>3</manvolnum></citerefentry> may use | |
1587 | <citerefentry><refentrytitle>sd_event_add_memory_pressure</refentrytitle><manvolnum>3</manvolnum></citerefentry> | |
1588 | to watch for and handle memory pressure events.</para> | |
1589 | ||
1590 | <para>If not explicitly set, defaults to the <varname>DefaultMemoryPressureWatch=</varname> setting in | |
ec07c3c8 AK |
1591 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> |
1592 | ||
1593 | <xi:include href="version-info.xml" xpointer="v254"/></listitem> | |
6bb00842 LP |
1594 | </varlistentry> |
1595 | ||
1596 | <varlistentry> | |
1597 | <term><varname>MemoryPressureThresholdSec=</varname></term> | |
1598 | ||
1599 | <listitem><para>Sets the memory pressure threshold time for the memory pressure monitoring configured via | |
1600 | <varname>MemoryPressureWatch=</varname>. Specifies the maximum allocation latency before a memory | |
a6170074 | 1601 | pressure event is signalled to the service, per 2s window. If not specified, defaults to the |
6bb00842 LP |
1602 | <varname>DefaultMemoryPressureThresholdSec=</varname> setting in |
1603 | <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> | |
a6170074 | 1604 | (which in turn defaults to 200ms). The specified value expects a time unit such as |
e503019b | 1605 | <literal>ms</literal> or <literal>μs</literal>; see |
6bb00842 | 1606 | <citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry> for |
ec07c3c8 AK |
1607 | details on the permitted syntax.</para> |
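
        <para>As a minimal sketch, assuming a hypothetical service that implements memory pressure
        handling itself, the two settings could be combined as follows. The 100ms threshold is an
        illustrative value; with it, an event would be signalled once allocation latency exceeds 100ms
        within a 2s window:</para>

        <programlisting># Hypothetical drop-in for example.service
[Service]
# Enables memory accounting and exports $MEMORY_PRESSURE_WATCH and $MEMORY_PRESSURE_WRITE.
MemoryPressureWatch=on
# Illustrative threshold; if unset, DefaultMemoryPressureThresholdSec= applies.
MemoryPressureThresholdSec=100ms</programlisting>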
1608 | ||
1609 | <xi:include href="version-info.xml" xpointer="v254"/></listitem> | |
6bb00842 | 1610 | </varlistentry> |
d868475a | 1611 | </variablelist> |
cfc015f0 NR |
1612 | |
1613 | </refsect2><refsect2><title>Coredump Control</title> | |
1614 | ||
1615 | <variablelist class='unit-directives'> | |
1616 | ||
1617 | <varlistentry> | |
1618 | <term><varname>CoredumpReceive=</varname></term> | |
1619 | ||
1620 | <listitem><para>Takes a boolean argument. This setting is used to enable coredump forwarding for containers | |
1621 | that belong to this unit's cgroup. Units with <varname>CoredumpReceive=yes</varname> must also be configured | |
1622 | with <varname>Delegate=yes</varname>. Defaults to false.</para> | |
1623 | ||
1624 | <para>When <command>systemd-coredump</command> is handling a coredump for a process from a container, | |
1625 | if the container's leader process is a descendant of a cgroup with <varname>CoredumpReceive=yes</varname> | |
1626 | and <varname>Delegate=yes</varname>, then <command>systemd-coredump</command> will attempt to forward | |
1627 | the coredump to <command>systemd-coredump</command> within the container.</para> | |
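
        <para>As an illustrative sketch, assuming a hypothetical service that launches a container, the
        two settings required for forwarding could be set together like this:</para>

        <programlisting># Hypothetical drop-in for example-container.service
[Service]
# CoredumpReceive=yes requires cgroup delegation to be enabled as well.
Delegate=yes
CoredumpReceive=yes</programlisting>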
1628 | ||
1629 | <xi:include href="version-info.xml" xpointer="v255"/></listitem> | |
1630 | </varlistentry> | |
1631 | ||
1632 | </variablelist> | |
5cbfbf2a | 1633 | </refsect2> |
d868475a ZJS |
1634 | </refsect1> |
1635 | ||
7a9e0bd0 ZJS |
1636 | <refsect1> |
1637 | <title>History</title> | |
1638 | ||
1639 | <variablelist> | |
1640 | <varlistentry> | |
1641 | <term>systemd 252</term> | |
1642 | <listitem><para> Options for controlling the Legacy Control Group Hierarchy (<ulink | |
8b9f0921 ZJS |
1643 | url="https://docs.kernel.org/admin-guide/cgroup-v1/index.html">Control Groups version 1</ulink>) |
1644 | are now fully deprecated: | |
1645 | <varname>CPUShares=<replaceable>weight</replaceable></varname>, | |
7a9e0bd0 ZJS |
1646 | <varname>StartupCPUShares=<replaceable>weight</replaceable></varname>, |
1647 | <varname>MemoryLimit=<replaceable>bytes</replaceable></varname>, | |
1648 | <varname>BlockIOAccounting=</varname>, | |
1649 | <varname>BlockIOWeight=<replaceable>weight</replaceable></varname>, | |
1650 | <varname>StartupBlockIOWeight=<replaceable>weight</replaceable></varname>, | |
1651 | <varname>BlockIODeviceWeight=<replaceable>device</replaceable> | |
1652 | <replaceable>weight</replaceable></varname>, | |
1653 | <varname>BlockIOReadBandwidth=<replaceable>device</replaceable> | |
1654 | <replaceable>bytes</replaceable></varname>, | |
8b9f0921 | 1655 | <varname>BlockIOWriteBandwidth=<replaceable>device</replaceable> <replaceable>bytes</replaceable></varname>. |
ec07c3c8 AK |
1656 | Please switch to the unified cgroup hierarchy.</para> |
1657 | ||
1658 | <xi:include href="version-info.xml" xpointer="v252"/></listitem> | |
7a9e0bd0 ZJS |
1659 | </varlistentry> |
1660 | </variablelist> | |
1661 | </refsect1> | |
1662 | ||
d868475a ZJS |
1663 | <refsect1> |
1664 | <title>See Also</title> | |
13a69c12 DT |
1665 | <para><simplelist type="inline"> |
1666 | <member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member> | |
1667 | <member><citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1668 | <member><citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1669 | <member><citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1670 | <member><citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1671 | <member><citerefentry><refentrytitle>systemd.scope</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1672 | <member><citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1673 | <member><citerefentry><refentrytitle>systemd.mount</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1674 | <member><citerefentry><refentrytitle>systemd.swap</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1675 | <member><citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry></member> | |
1676 | <member><citerefentry><refentrytitle>systemd.directives</refentrytitle><manvolnum>7</manvolnum></citerefentry></member> | |
1677 | <member><citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry></member> | |
1678 | <member><citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry></member> | |
1679 | <member>The documentation for control groups and specific controllers in the Linux kernel: | |
cfcdee7c | 1680 | <ulink url="https://docs.kernel.org/admin-guide/cgroup-v2.html">Control Groups v2</ulink></member> |
13a69c12 | 1681 | </simplelist></para> |
d868475a ZJS |
1682 | </refsect1> |
1683 | </refentry> |