-<?xml version='1.0'?> <!--*- Mode: nxml; nxml-child-indent: 2; indent-tabs-mode: nil -*-->
+<?xml version='1.0'?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!--
- This file is part of systemd.
-
- Copyright 2013 Zbigniew Jędrzejewski-Szmek
-
- systemd is free software; you can redistribute it and/or modify it
- under the terms of the GNU Lesser General Public License as published by
- the Free Software Foundation; either version 2.1 of the License, or
- (at your option) any later version.
-
- systemd is distributed in the hope that it will be useful, but
- WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public License
- along with systemd; If not, see <http://www.gnu.org/licenses/>.
+ SPDX-License-Identifier: LGPL-2.1+
-->
<refentry id="systemd.resource-control">
<refentryinfo>
<title>systemd.resource-control</title>
<productname>systemd</productname>
-
- <authorgroup>
- <author>
- <contrib>Developer</contrib>
- <firstname>Lennart</firstname>
- <surname>Poettering</surname>
- <email>lennart@poettering.net</email>
- </author>
- </authorgroup>
</refentryinfo>
<refmeta>
</refsect1>
<refsect1>
- <title>Automatic Dependencies</title>
+ <title>Implicit Dependencies</title>
- <para>Units with the <varname>Slice=</varname> setting set automatically acquire <varname>Requires=</varname> and
- <varname>After=</varname> dependencies on the specified slice unit.</para>
+ <para>The following dependencies are implicitly added:</para>
+
+ <itemizedlist>
+ <listitem><para>Units with the <varname>Slice=</varname> setting set automatically acquire
+ <varname>Requires=</varname> and <varname>After=</varname> dependencies on the specified
+ slice unit.</para></listitem>
+ </itemizedlist>
</refsect1>
+ <!-- We don't have any default dependency here. -->
+
<refsect1>
<title>Unified and Legacy Control Group Hierarchies</title>
<varlistentry>
<term><option>CPU</option></term>
<listitem>
- <para>Due to the lack of consensus in the kernel community, the CPU controller support on the unified
- control group hierarchy requires out-of-tree kernel patches. See <ulink
- url="https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu">cgroup-v2-cpu.txt</ulink>.</para>
-
<para><varname>CPUWeight=</varname> and <varname>StartupCPUWeight=</varname> replace
<varname>CPUShares=</varname> and <varname>StartupCPUShares=</varname>, respectively.</para>
<varlistentry>
<term><option>IO</option></term>
<listitem>
- <para><varname>IO</varname> prefixed settings are superset of and replace <varname>BlockIO</varname>
+ <para><varname>IO</varname> prefixed settings are a superset of and replace <varname>BlockIO</varname>
prefixed ones. On unified hierarchy, IO resource control also applies to buffered writes.</para>
</listitem>
</varlistentry>
the startup phase. Using <varname>StartupCPUWeight=</varname> allows prioritizing specific services at
boot-up differently than during normal runtime.</para>
- <para>Implies <literal>CPUAccounting=true</literal>.</para>
-
<para>These settings replace <varname>CPUShares=</varname> and <varname>StartupCPUShares=</varname>.</para>
</listitem>
</varlistentry>
<literal>cpu.max</literal> attribute on the unified control group hierarchy and
<literal>cpu.cfs_quota_us</literal> on legacy. For details about these control group attributes, see <ulink
url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink> and <ulink
- url="https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt">sched-design-CFS.txt</ulink>.</para>
+ url="https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt">sched-bwc.txt</ulink>.</para>
<para>Example: <varname>CPUQuota=20%</varname> ensures that the executed processes will never get more than
20% CPU time on one CPU.</para>
- <para>Implies <literal>CPUAccounting=true</literal>.</para>
</listitem>
</varlistentry>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>MemoryMin=<replaceable>bytes</replaceable></varname></term>
+
+ <listitem>
+ <para>Specify the memory usage protection of the executed processes in this unit. If the memory usage of
+ this unit and all its ancestors is below their minimum boundaries, this unit's memory won't be reclaimed.</para>
+
+ <para>Takes a memory size in bytes. If the value is suffixed with K, M, G or T, the specified memory size is
+ parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Alternatively, a
+ percentage value may be specified, which is taken relative to the installed physical memory on the
+ system. This controls the <literal>memory.min</literal> control group attribute. For details about this
+ control group attribute, see <ulink
+ url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
+
+ <para>This setting is supported only if the unified control group hierarchy is used and disables
+ <varname>MemoryLimit=</varname>.</para>
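+
+ <para>As a minimal sketch, assuming a hypothetical service whose memory should be protected from reclaim,
+ the setting could be used as follows. The value <literal>256M</literal> is an arbitrary illustration, not
+ a recommendation:</para>
+
+ <programlisting># Hypothetical unit file fragment; the size is illustrative only
+[Service]
+MemoryMin=256M</programlisting>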
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><varname>MemoryLow=<replaceable>bytes</replaceable></varname></term>
control group attribute, see <ulink
url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
- <para>Implies <literal>MemoryAccounting=true</literal>.</para>
-
<para>This setting is supported only if the unified control group hierarchy is used and disables
<varname>MemoryLimit=</varname>.</para>
</listitem>
<literal>memory.high</literal> control group attribute. For details about this control group attribute, see
<ulink url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
- <para>Implies <literal>MemoryAccounting=true</literal>.</para>
-
<para>This setting is supported only if the unified control group hierarchy is used and disables
<varname>MemoryLimit=</varname>.</para>
</listitem>
<literal>memory.max</literal> control group attribute. For details about this control group attribute, see
<ulink url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
- <para>Implies <literal>MemoryAccounting=true</literal>.</para>
-
<para>This setting replaces <varname>MemoryLimit=</varname>.</para>
</listitem>
</varlistentry>
<literal>memory.swap.max</literal> control group attribute. For details about this control group attribute,
see <ulink url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
- <para>Implies <literal>MemoryAccounting=true</literal>.</para>
-
<para>This setting is supported only if the unified control group hierarchy is used and disables
<varname>MemoryLimit=</varname>.</para>
</listitem>
the <literal>pids.max</literal> control group attribute. For details about this control group attribute, see
<ulink url="https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt">pids.txt</ulink>.</para>
- <para>Implies <literal>TasksAccounting=true</literal>. The
+ <para>The
system default for this setting may be controlled with
<varname>DefaultTasksMax=</varname> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
phase. This allows prioritizing specific services at boot-up
differently than during runtime.</para>
- <para>Implies <literal>IOAccounting=true</literal>.</para>
-
<para>These settings replace <varname>BlockIOWeight=</varname> and <varname>StartupBlockIOWeight=</varname>
and disable settings prefixed with <varname>BlockIO</varname> or <varname>StartupBlockIO</varname>.</para>
</listitem>
<listitem>
<para>Set the per-device overall block I/O weight for the executed processes, if the unified control group
hierarchy is used on the system. Takes a space-separated pair of a file path and a weight value to specify
- the device specific weight value, between 1 and 10000. (Example: "/dev/sda 1000"). The file path may be
- specified as path to a block device node or as any other file, in which case the backing block device of the
- file system of the file is determined. This controls the <literal>io.weight</literal> control group
- attribute, which defaults to 100. Use this option multiple times to set weights for multiple devices. For
- details about this control group attribute, see <ulink
+ the device specific weight value, between 1 and 10000. (Example: <literal>/dev/sda 1000</literal>). The file
+ path may be specified as path to a block device node or as any other file, in which case the backing block
+ device of the file system of the file is determined. This controls the <literal>io.weight</literal> control
+ group attribute, which defaults to 100. Use this option multiple times to set weights for multiple devices.
+ For details about this control group attribute, see <ulink
url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
- <para>Implies <literal>IOAccounting=true</literal>.</para>
-
<para>This setting replaces <varname>BlockIODeviceWeight=</varname> and disables settings prefixed with
<varname>BlockIO</varname> or <varname>StartupBlockIO</varname>.</para>
</listitem>
url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.
</para>
- <para>Implies <literal>IOAccounting=true</literal>.</para>
-
<para>These settings replace <varname>BlockIOReadBandwidth=</varname> and
<varname>BlockIOWriteBandwidth=</varname> and disable settings prefixed with <varname>BlockIO</varname> or
<varname>StartupBlockIO</varname>.</para>
url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.
</para>
- <para>Implies <literal>IOAccounting=true</literal>.</para>
-
<para>These settings are supported only if the unified control group hierarchy is used and disable settings
prefixed with <varname>BlockIO</varname> or <varname>StartupBlockIO</varname>.</para>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>IODeviceLatencyTargetSec=<replaceable>device</replaceable> <replaceable>target</replaceable></varname></term>
+
+ <listitem>
+ <para>Set the per-device average target I/O latency for the executed processes, if the unified control group
+ hierarchy is used on the system. Takes a file path and a timespan separated by a space to specify
+ the device specific latency target. (Example: <literal>/dev/sda 25ms</literal>). The file path may be specified
+ as path to a block device node or as any other file, in which case the backing block device of the file
+ system of the file is determined. This controls the <literal>io.latency</literal> control group
+ attribute. Use this option multiple times to set latency targets for multiple devices. For details about this
+ control group attribute, see <ulink
+ url="https://www.kernel.org/doc/Documentation/cgroup-v2.txt">cgroup-v2.txt</ulink>.</para>
+
+ <para>Implies <literal>IOAccounting=yes</literal>.</para>
+
+ <para>These settings are supported only if the unified control group hierarchy is used.</para>
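+
+ <para>A minimal sketch of setting latency targets for two devices, assuming a hypothetical service and
+ device nodes. The second device and both timespans are illustrative only:</para>
+
+ <programlisting># Hypothetical unit file fragment; one line per device
+[Service]
+IODeviceLatencyTargetSec=/dev/sda 25ms
+IODeviceLatencyTargetSec=/dev/sdb 50ms</programlisting>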
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>IPAccounting=</varname></term>
+
+ <listitem>
+ <para>Takes a boolean argument. If true, turns on IPv4 and IPv6 network traffic accounting for packets sent
+ or received by the unit. When this option is turned on, all IPv4 and IPv6 sockets created by any process of
+ the unit are accounted for.</para>
+
+ <para>When this option is used in socket units, it applies to all IPv4 and IPv6 sockets
+ associated with it (including both listening and connection sockets where this applies). Note that for
+ socket-activated services, this configuration setting and the accounting data of the service unit and the
+ socket unit are kept separate, and displayed separately. No propagation of the setting and the collected
+ statistics is done, in either direction. Moreover, any traffic sent or received on any of the socket unit's
+ sockets is accounted to the socket unit — and never to the service unit it might have activated, even if the
+ socket is used by it.</para>
+
+ <para>The system default for this setting may be controlled with <varname>DefaultIPAccounting=</varname> in
+ <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>IPAddressAllow=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term>
+ <term><varname>IPAddressDeny=<replaceable>ADDRESS[/PREFIXLENGTH]…</replaceable></varname></term>
+
+ <listitem>
+ <para>Turn on address range network traffic filtering for packets sent and received over AF_INET and AF_INET6
+ sockets. Both directives take a space-separated list of IPv4 or IPv6 addresses, each optionally suffixed
+ with an address prefix length (separated by a <literal>/</literal> character). If the latter is omitted, the
+ address is considered a host address, i.e. the prefix covers the whole address (32 for IPv4, 128 for IPv6).
+ </para>
+
+ <para>The access lists configured with this option are applied to all sockets created by processes of this
+ unit (or in the case of socket units, associated with it). The lists are implicitly combined with any lists
+ configured for any of the parent slice units this unit might be a member of. By default all access lists are
+ empty. When configured the lists are enforced as follows:</para>
+
+ <itemizedlist>
+ <listitem><para>Access will be granted in case its destination/source address matches any entry in the
+ <varname>IPAddressAllow=</varname> setting.</para></listitem>
+
+ <listitem><para>Otherwise, access will be denied in case its destination/source address matches any entry
+ in the <varname>IPAddressDeny=</varname> setting.</para></listitem>
+
+ <listitem><para>Otherwise, access will be granted.</para></listitem>
+ </itemizedlist>
+
+ <para>In order to implement a whitelisting IP firewall, it is recommended to use an
+ <varname>IPAddressDeny=</varname><constant>any</constant> setting on an upper-level slice unit (such as the
+ root slice <filename>-.slice</filename> or the slice containing all system services
+ <filename>system.slice</filename> – see
+ <citerefentry><refentrytitle>systemd.special</refentrytitle><manvolnum>7</manvolnum></citerefentry> for
+ details on these slice units), plus individual per-service <varname>IPAddressAllow=</varname> lines
+ permitting network access to relevant services, and only them.</para>
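+
+ <para>Such a setup could be sketched as follows, assuming a hypothetical drop-in for
+ <filename>system.slice</filename> and a hypothetical service unit. The file names and the
+ <literal>192.0.2.0/24</literal> documentation range are placeholders chosen for illustration:</para>
+
+ <programlisting># Hypothetical drop-in, e.g. /etc/systemd/system/system.slice.d/10-ip-deny.conf
+[Slice]
+IPAddressDeny=any
+
+# Hypothetical service unit fragment; only the listed ranges remain reachable
+[Service]
+IPAddressAllow=localhost
+IPAddressAllow=192.0.2.0/24</programlisting>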
+
+ <para>Note that for socket-activated services, the IP access list configured on the socket unit applies to
+ all sockets associated with it directly, but not to any sockets created by the ultimately activated services
+ for it. Conversely, the IP access list configured for the service is not applied to any sockets passed into
+ the service via socket activation. Thus, it is usually a good idea to replicate the IP access lists on both
+ the socket and the service unit. However, it often makes sense to maintain one list more open and the other
+ one more restricted, depending on the use case.</para>
+
+ <para>If these settings are used multiple times in the same unit, the specified lists are combined. If an
+ empty string is assigned to these settings, the specific access list is reset and all previous settings are undone.</para>
+
+ <para>In place of explicit IPv4 or IPv6 address and prefix length specifications a small set of symbolic
+ names may be used. The following names are defined:</para>
+
+ <table>
+ <title>Special address/network names</title>
+
+ <tgroup cols='3'>
+ <colspec colname='name'/>
+ <colspec colname='definition'/>
+ <colspec colname='meaning'/>
+
+ <thead>
+ <row>
+ <entry>Symbolic Name</entry>
+ <entry>Definition</entry>
+ <entry>Meaning</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><constant>any</constant></entry>
+ <entry>0.0.0.0/0 ::/0</entry>
+ <entry>Any host</entry>
+ </row>
+
+ <row>
+ <entry><constant>localhost</constant></entry>
+ <entry>127.0.0.0/8 ::1/128</entry>
+ <entry>All addresses on the local loopback</entry>
+ </row>
+
+ <row>
+ <entry><constant>link-local</constant></entry>
+ <entry>169.254.0.0/16 fe80::/64</entry>
+ <entry>All link-local IP addresses</entry>
+ </row>
+
+ <row>
+ <entry><constant>multicast</constant></entry>
+ <entry>224.0.0.0/4 ff00::/8</entry>
+ <entry>All IP multicasting addresses</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>Note that these settings might not be supported on some systems (for example if eBPF control group
+ support is not enabled in the underlying kernel or container manager). These settings will have no effect in
+ that case. If compatibility with such systems is desired it is hence recommended to not exclusively rely on
+ them for IP security.</para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><varname>DeviceAllow=</varname></term>
<filename>/proc/devices</filename>. The latter is useful to
whitelist all current and future devices belonging to a
specific device group at once. The device group is matched
- according to file name globbing rules, you may hence use the
+ according to filename globbing rules; you may hence use the
<literal>*</literal> and <literal>?</literal>
wildcards. Examples: <filename>/dev/sda5</filename> is a
path to a device node, referring to an ATA or SCSI block
<para>Special care should be taken when relying on the default slice assignment in templated service units
that have <varname>DefaultDependencies=no</varname> set, see
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>, section
- "Automatic Dependencies" for details.</para>
+ "Default Dependencies" for details.</para>
</listitem>
</varlistentry>
<term><varname>Delegate=</varname></term>
<listitem>
- <para>Turns on delegation of further resource control
- partitioning to processes of the unit. For unprivileged
- services (i.e. those using the <varname>User=</varname>
- setting), this allows processes to create a subhierarchy
- beneath its control group path. For privileged services and
- scopes, this ensures the processes will have all control
- group controllers enabled.</para>
+ <para>Turns on delegation of further resource control partitioning to processes of the unit. Units where this
+ is enabled may create and manage their own private subhierarchy of control groups below the control group of
+ the unit itself. For unprivileged services (i.e. those using the <varname>User=</varname> setting) the unit's
+ control group will be made accessible to the relevant user. When enabled the service manager will refrain
+ from manipulating control groups or moving processes below the unit's control group, so that a clear concept
+ of ownership is established: the control group tree above the unit's control group (i.e. towards the root
+ control group) is owned and managed by the service manager of the host, while the control group tree below
+ the unit's control group is owned and managed by the unit itself. Takes either a boolean argument or a list
+ of control group controller names. If true, delegation is turned on, and all supported controllers are
+ enabled for the unit, making them available to the unit's processes for management. If false, delegation is
+ turned off entirely (and no additional controllers are enabled). If set to a list of controllers, delegation
+ is turned on, and the specified controllers are enabled for the unit. Note that controllers other than
+ the ones specified might be made available as well, depending on configuration of the containing slice unit
+ or other units contained in it. Note that assigning the empty string will enable delegation, but reset the
+ list of controllers; all assignments prior to this will have no effect. Defaults to false.</para>
+
+ <para>Note that controller delegation to less privileged code is only safe on the unified control group
+ hierarchy. Accordingly, access to the specified controllers will not be granted to unprivileged services on
+ the legacy hierarchy, even when requested.</para>
+
+ <para>The following controller names may be specified: <option>cpu</option>, <option>cpuacct</option>,
+ <option>io</option>, <option>blkio</option>, <option>memory</option>, <option>devices</option>,
+ <option>pids</option>. Not all of these controllers are available on all kernels however, and some are
+ specific to the unified hierarchy while others are specific to the legacy hierarchy. Also note that the
+ kernel might support further controllers, which aren't covered here yet as delegation is either not supported
+ at all for them or not defined cleanly.</para>
+
+ <para>For further details on the delegation model consult <ulink
+ url="https://systemd.io/CGROUP_DELEGATION">Control Group APIs and Delegation</ulink>.</para>
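+
+ <para>As a minimal sketch, assuming a hypothetical unprivileged service that manages its own control
+ group subhierarchy on the unified hierarchy, delegation of a specific set of controllers might be requested
+ like this. The user name and the choice of controllers are illustrative only:</para>
+
+ <programlisting># Hypothetical unit file fragment
+[Service]
+User=exampleuser
+Delegate=memory pids</programlisting>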
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>DisableControllers=</varname></term>
+
+ <listitem>
+ <para>Disables controllers from being enabled for a unit's children. If a controller listed is already in use
+ in its subtree, the controller will be removed from the subtree. This can be used to avoid child units being
+ able to implicitly or explicitly enable a controller. Defaults to not disabling any controllers.</para>
+
+ <para>It may not be possible to successfully disable a controller if the unit or any child of the unit in
+ question delegates controllers to its children, as any delegated subtree of the cgroup hierarchy is unmanaged
+ by systemd.</para>
+
+ <para>Multiple controllers may be specified, separated by spaces. You may also pass
+ <varname>DisableControllers=</varname> multiple times, in which case each new instance adds another controller
+ to disable. Passing <varname>DisableControllers=</varname> by itself with no controller name present resets
+ the disabled controller list.</para>
+
+ <para>Valid controllers are <option>cpu</option>, <option>cpuacct</option>, <option>io</option>,
+ <option>blkio</option>, <option>memory</option>, <option>devices</option>, and <option>pids</option>.</para>
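+
+ <para>A minimal sketch, assuming a hypothetical slice unit whose children should not be able to enable the
+ <option>cpu</option> and <option>io</option> controllers. The choice of controllers is illustrative
+ only:</para>
+
+ <programlisting># Hypothetical slice unit fragment; the two lines accumulate
+[Slice]
+DisableControllers=cpu
+DisableControllers=io</programlisting>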
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
the startup phase. Using <varname>StartupCPUShares=</varname> allows prioritizing specific services at
boot-up differently than during normal runtime.</para>
- <para>Implies <literal>CPUAccounting=true</literal>.</para>
+ <para>Implies <literal>CPUAccounting=yes</literal>.</para>
<para>These settings are deprecated. Use <varname>CPUWeight=</varname> and
<varname>StartupCPUWeight=</varname> instead.</para>
attribute, see <ulink
url="https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt">memory.txt</ulink>.</para>
- <para>Implies <literal>MemoryAccounting=true</literal>.</para>
+ <para>Implies <literal>MemoryAccounting=yes</literal>.</para>
<para>This setting is deprecated. Use <varname>MemoryMax=</varname> instead.</para>
</listitem>
boot-up differently than during runtime.</para>
<para>Implies
- <literal>BlockIOAccounting=true</literal>.</para>
+ <literal>BlockIOAccounting=yes</literal>.</para>
<para>These settings are deprecated. Use <varname>IOWeight=</varname> and <varname>StartupIOWeight=</varname>
instead.</para>
url="https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt">blkio-controller.txt</ulink>.</para>
<para>Implies
- <literal>BlockIOAccounting=true</literal>.</para>
+ <literal>BlockIOAccounting=yes</literal>.</para>
<para>This setting is deprecated. Use <varname>IODeviceWeight=</varname> instead.</para>
</listitem>
</para>
<para>Implies
- <literal>BlockIOAccounting=true</literal>.</para>
+ <literal>BlockIOAccounting=yes</literal>.</para>
<para>These settings are deprecated. Use <varname>IOReadBandwidthMax=</varname> and
<varname>IOWriteBandwidthMax=</varname> instead.</para>
<ulink url="https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt">cpuacct.txt</ulink>,
<ulink url="https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt">memory.txt</ulink>,
<ulink url="https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt">blkio-controller.txt</ulink>.
+ <ulink url="https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt">sched-bwc.txt</ulink>.
</para>
</refsect1>
</refentry>