execution specific configuration options are configured in the
[Service], [Socket], [Mount], or [Swap] sections, depending on the
unit type.</para>
+
+ <para>In addition, options which control resources through Linux Control Groups (cgroups) are listed in
+ <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
+ Those options complement options listed here.</para>
</refsect1>
<refsect1>
<para>A few execution parameters result in additional, automatic
dependencies to be added.</para>
- <para>Units with <varname>WorkingDirectory=</varname> or
- <varname>RootDirectory=</varname> set automatically gain
- dependencies of type <varname>Requires=</varname> and
- <varname>After=</varname> on all mount units required to access
- the specified paths. This is equivalent to having them listed
- explicitly in <varname>RequiresMountsFor=</varname>.</para>
+ <para>Units with <varname>WorkingDirectory=</varname>, <varname>RootDirectory=</varname> or
+ <varname>RootImage=</varname> set automatically gain dependencies of type <varname>Requires=</varname> and
+ <varname>After=</varname> on all mount units required to access the specified paths. This is equivalent to having
+ them listed explicitly in <varname>RequiresMountsFor=</varname>.</para>
- <para>Similar, units with <varname>PrivateTmp=</varname> enabled
- automatically get mount unit dependencies for all mounts
- required to access <filename>/tmp</filename> and
- <filename>/var/tmp</filename>.</para>
+ <para>Similar, units with <varname>PrivateTmp=</varname> enabled automatically get mount unit dependencies for all
+ mounts required to access <filename>/tmp</filename> and <filename>/var/tmp</filename>. They will also gain an
+ automatic <varname>After=</varname> dependency on
+ <citerefentry><refentrytitle>systemd-tmpfiles-setup.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>.</para>
<para>Units whose standard output or error output is connected to <option>journal</option>, <option>syslog</option>
or <option>kmsg</option> (or their combinations with console output, see below) automatically acquire dependencies
<varlistentry>
<term><varname>WorkingDirectory=</varname></term>
- <listitem><para>Takes a directory path relative to the service's root
- directory specified by <varname>RootDirectory=</varname>, or the
- special value <literal>~</literal>. Sets the working directory
- for executed processes. If set to <literal>~</literal>, the
- home directory of the user specified in
- <varname>User=</varname> is used. If not set, defaults to the
- root directory when systemd is running as a system instance
- and the respective user's home directory if run as user. If
- the setting is prefixed with the <literal>-</literal>
- character, a missing working directory is not considered
- fatal. If <varname>RootDirectory=</varname> is not set, then
- <varname>WorkingDirectory=</varname> is relative to the root of
- the system running the service manager.
- Note that setting this parameter might result in
- additional dependencies to be added to the unit (see
+ <listitem><para>Takes a directory path relative to the service's root directory specified by
+ <varname>RootDirectory=</varname>, or the special value <literal>~</literal>. Sets the working directory for
+ executed processes. If set to <literal>~</literal>, the home directory of the user specified in
+ <varname>User=</varname> is used. If not set, defaults to the root directory when systemd is running as a
+ system instance and the respective user's home directory if run as user. If the setting is prefixed with the
+ <literal>-</literal> character, a missing working directory is not considered fatal. If
+ <varname>RootDirectory=</varname>/<varname>RootImage=</varname> is not set, then
+ <varname>WorkingDirectory=</varname> is relative to the root of the system running the service manager. Note
+ that setting this parameter might result in additional dependencies to be added to the unit (see
above).</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>RootDirectory=</varname></term>
- <listitem><para>Takes a directory path relative to the host's root directory
- (i.e. the root of the system running the service manager). Sets the
- root directory for executed processes, with the <citerefentry
- project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- system call. If this is used, it must be ensured that the
- process binary and all its auxiliary files are available in
- the <function>chroot()</function> jail. Note that setting this
- parameter might result in additional dependencies to be added
- to the unit (see above).</para></listitem>
+ <listitem><para>Takes a directory path relative to the host's root directory (i.e. the root of the system
+ running the service manager). Sets the root directory for executed processes, with the <citerefentry
+ project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry> system
+ call. If this is used, it must be ensured that the process binary and all its auxiliary files are available in
+ the <function>chroot()</function> jail. Note that setting this parameter might result in additional
+ dependencies to be added to the unit (see above).</para>
+
+ <para>The <varname>MountAPIVFS=</varname> and <varname>PrivateUsers=</varname> settings are particularly useful
+ in conjunction with <varname>RootDirectory=</varname>. For details, see below.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>RootImage=</varname></term>
+ <listitem><para>Takes a path to a block device node or regular file as argument. This call is similar to
+ <varname>RootDirectory=</varname> however mounts a file system hierarchy from a block device node or loopback
+ file instead of a directory. The device node or file system image file needs to contain a file system without a
+ partition table, or a file system within an MBR/MS-DOS or GPT partition table with only a single
+ Linux-compatible partition, or a set of file systems within a GPT partition table that follows the <ulink
+ url="https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/">Discoverable Partitions
+ Specification</ulink>.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>MountAPIVFS=</varname></term>
+
+ <listitem><para>Takes a boolean argument. If on, a private mount namespace for the unit's processes is created
+ and the API file systems <filename>/proc</filename>, <filename>/sys</filename>, and <filename>/dev</filename>
+ are mounted inside of it, unless they are already mounted. Note that this option has no effect unless used in
+ conjunction with <varname>RootDirectory=</varname>/<varname>RootImage=</varname> as these three mounts are
+ generally mounted in the host anyway, and unless the root directory is changed, the private mount namespace
+ will be a 1:1 copy of the host's, and include these three mounts. Note that the <filename>/dev</filename> file
+ system of the host is bind mounted if this option is used without <varname>PrivateDevices=</varname>. To run
+ the service with a private, minimal version of <filename>/dev/</filename>, combine this option with
+ <varname>PrivateDevices=</varname>.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>User=</varname></term>
<term><varname>Group=</varname></term>
- <listitem><para>Sets the Unix user or group that the processes
- are executed as, respectively. Takes a single user or group
- name or ID as argument. If no group is set, the default group
- of the user is chosen. These do not affect commands prefixed with <literal>!</literal>.</para></listitem>
+ <listitem><para>Set the UNIX user or group that the processes are executed as, respectively. Takes a single
+ user or group name, or numeric ID as argument. For system services (services run by the system service manager,
+ i.e. managed by PID 1) and for user services of the root user (services managed by root's instance of
+ <command>systemd --user</command>), the default is <literal>root</literal>, but <varname>User=</varname> may be
+ used to specify a different user. For user services of any other user, switching user identity is not
+ permitted, hence the only valid setting is the same user the user's service manager is running as. If no group
+ is set, the default group of the user is used. This setting does not affect commands whose command line is
+ prefixed with <literal>+</literal>.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>DynamicUser=</varname></term>
+
+ <listitem><para>Takes a boolean parameter. If set, a UNIX user and group pair is allocated dynamically when the
+ unit is started, and released as soon as it is stopped. The user and group will not be added to
+ <filename>/etc/passwd</filename> or <filename>/etc/group</filename>, but are managed transiently during
+ runtime. The <citerefentry><refentrytitle>nss-systemd</refentrytitle><manvolnum>8</manvolnum></citerefentry>
+ glibc NSS module provides integration of these dynamic users/groups into the system's user and group
+ databases. The user and group name to use may be configured via <varname>User=</varname> and
+ <varname>Group=</varname> (see above). If these options are not used and dynamic user/group allocation is
+ enabled for a unit, the name of the dynamic user/group is implicitly derived from the unit name. If the unit
+ name without the type suffix qualifies as valid user name it is used directly, otherwise a name incorporating a
+ hash of it is used. If a statically allocated user or group of the configured name already exists, it is used
+ and no dynamic user/group is allocated. Dynamic users/groups are allocated from the UID/GID range
+ 61184…65519. It is recommended to avoid this range for regular system or login users. At any point in time
+ each UID/GID from this range is only assigned to zero or one dynamically allocated users/groups in
+ use. However, UID/GIDs are recycled after a unit is terminated. Care should be taken that any processes running
+ as part of a unit for which dynamic users/groups are enabled do not leave files or directories owned by these
+ users/groups around, as a different unit might get the same UID/GID assigned later on, and thus gain access to
+ these files or directories. If <varname>DynamicUser=</varname> is enabled, <varname>RemoveIPC=</varname>,
+ <varname>PrivateTmp=</varname> are implied. This ensures that the lifetime of IPC objects and temporary files
+ created by the executed processes is bound to the runtime of the service, and hence the lifetime of the dynamic
+ user/group. Since <filename>/tmp</filename> and <filename>/var/tmp</filename> are usually the only
+ world-writable directories on a system this ensures that a unit making use of dynamic user/group allocation
+ cannot leave files around after unit termination. Moreover <varname>ProtectSystem=strict</varname> and
+ <varname>ProtectHome=read-only</varname> are implied, thus prohibiting the service to write to arbitrary file
+ system locations. In order to allow the service to write to certain directories, they have to be whitelisted
+ using <varname>ReadWritePaths=</varname>, but care must be taken so that UID/GID recycling doesn't
+ create security issues involving files created by the service. Use <varname>RuntimeDirectory=</varname> (see
+ below) in order to assign a writable runtime directory to a service, owned by the dynamic user/group and
+ removed automatically when the unit is terminated. Defaults to off.</para></listitem>
</varlistentry>
<varlistentry>
this one will have no effect. In any way, this option does not
override, but extends the list of supplementary groups
configured in the system group database for the
- user. This does not affect commands prefixed with <literal>!</literal>.</para></listitem>
+ user. This does not affect commands prefixed with <literal>+</literal>.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>RemoveIPC=</varname></term>
+
+ <listitem><para>Takes a boolean parameter. If set, all System V and POSIX IPC objects owned by the user and
+ group the processes of this unit are run as are removed when the unit is stopped. This setting only has an
+ effect if at least one of <varname>User=</varname>, <varname>Group=</varname> and
+ <varname>DynamicUser=</varname> are used. It has no effect on IPC objects owned by the root user. Specifically,
+ this removes System V semaphores, as well as System V and POSIX shared memory segments and message queues. If
+ multiple units use the same user or group the IPC objects are removed when the last of these units is
+ stopped. This setting is implied if <varname>DynamicUser=</varname> is set.</para></listitem>
</varlistentry>
<varlistentry>
<option>null</option>,
<option>tty</option>,
<option>tty-force</option>,
- <option>tty-fail</option> or
- <option>socket</option>.</para>
+ <option>tty-fail</option>,
+ <option>socket</option> or
+ <option>fd</option>.</para>
<para>If <option>null</option> is selected, standard input
will be connected to <filename>/dev/null</filename>, i.e. all
<citerefentry project='freebsd'><refentrytitle>inetd</refentrytitle><manvolnum>8</manvolnum></citerefentry>
daemon.</para>
+ <para>The <option>fd</option> option connects
+ the input stream to a single file descriptor provided by a socket unit.
+ A custom named file descriptor can be specified as part of this option,
+ after a <literal>:</literal> (e.g. <literal>fd:<replaceable>foobar</replaceable></literal>).
+ If no name is specified, <literal>stdin</literal> is assumed
+ (i.e. <literal>fd</literal> is equivalent to <literal>fd:stdin</literal>).
+ At least one socket unit defining such name must be explicitly provided via the
+ <varname>Sockets=</varname> option, and file descriptor name may differ
+ from the name of its containing socket unit.
+ If multiple matches are found, the first one will be used.
+ See <varname>FileDescriptorName=</varname> in
+ <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>
+ for more details about named descriptors and ordering.</para>
+
<para>This setting defaults to
<option>null</option>.</para></listitem>
</varlistentry>
<option>kmsg</option>,
<option>journal+console</option>,
<option>syslog+console</option>,
- <option>kmsg+console</option> or
- <option>socket</option>.</para>
+ <option>kmsg+console</option>,
+ <option>socket</option> or
+ <option>fd</option>.</para>
<para><option>inherit</option> duplicates the file descriptor
of standard input for standard output.</para>
similar to the same option of
<varname>StandardInput=</varname>.</para>
+ <para>The <option>fd</option> option connects
+ the output stream to a single file descriptor provided by a socket unit.
+ A custom named file descriptor can be specified as part of this option,
+ after a <literal>:</literal> (e.g. <literal>fd:<replaceable>foobar</replaceable></literal>).
+ If no name is specified, <literal>stdout</literal> is assumed
+ (i.e. <literal>fd</literal> is equivalent to <literal>fd:stdout</literal>).
+ At least one socket unit defining such name must be explicitly provided via the
+ <varname>Sockets=</varname> option, and file descriptor name may differ
+ from the name of its containing socket unit.
+ If multiple matches are found, the first one will be used.
+ See <varname>FileDescriptorName=</varname> in
+ <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>
+ for more details about named descriptors and ordering.</para>
+
<para>If the standard output (or error output, see below) of a unit is connected to the journal, syslog or the
kernel log buffer, the unit will implicitly gain a dependency of type <varname>After=</varname> on
<filename>systemd-journald.socket</filename> (also see the automatic dependencies section above).</para>
<listitem><para>Controls where file descriptor 2 (STDERR) of
the executed processes is connected to. The available options
are identical to those of <varname>StandardOutput=</varname>,
- with one exception: if set to <option>inherit</option> the
+ with some exceptions: if set to <option>inherit</option> the
file descriptor used for standard output is duplicated for
- standard error. This setting defaults to the value set with
+ standard error, while <option>fd</option> operates on the error
+ stream and will look by default for a descriptor named
+ <literal>stderr</literal>.</para>
+
+ <para>This setting defaults to the value set with
<option>DefaultStandardError=</option> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
which defaults to <option>inherit</option>. Note that setting
<varname>MemoryLimit=</varname> is a more powerful (and
working) replacement for <varname>LimitRSS=</varname>.</para>
+ <para>For system units these resource limits may be chosen freely. For user units however (i.e. units run by a
+ per-user instance of
+ <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>), these limits are
+ bound by (possibly more restrictive) per-user limits enforced by the OS.</para>
+
+ <para>Resource limits not configured explicitly for a unit default to the value configured in the various
+ <varname>DefaultLimitCPU=</varname>, <varname>DefaultLimitFSIZE=</varname>, … options available in
+ <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>, and –
+ if not configured there – the kernel or per-user defaults, as defined by the OS (the latter only for user
+ services, see above).</para>
+
<table>
- <title>Limit directives and their equivalent with ulimit</title>
+ <title>Resource limit directives, their equivalent <command>ulimit</command> shell commands and the unit used</title>
<tgroup cols='3'>
<colspec colname='directive' />
<thead>
<row>
<entry>Directive</entry>
- <entry>ulimit equivalent</entry>
+ <entry><command>ulimit</command> equivalent</entry>
<entry>Unit</entry>
</row>
</thead>
<varlistentry>
<term><varname>PAMName=</varname></term>
- <listitem><para>Sets the PAM service name to set up a session
- as. If set, the executed process will be registered as a PAM
- session under the specified service name. This is only useful
- in conjunction with the <varname>User=</varname> setting. If
- not set, no PAM session will be opened for the executed
- processes. See
- <citerefentry project='man-pages'><refentrytitle>pam</refentrytitle><manvolnum>8</manvolnum></citerefentry>
- for details.</para></listitem>
+ <listitem><para>Sets the PAM service name to set up a session as. If set, the executed process will be
+ registered as a PAM session under the specified service name. This is only useful in conjunction with the
+ <varname>User=</varname> setting, and is otherwise ignored. If not set, no PAM session will be opened for the
+ executed processes. See <citerefentry
+ project='man-pages'><refentrytitle>pam</refentrytitle><manvolnum>8</manvolnum></citerefentry> for
+ details.</para>
+
+ <para>Note that for each unit making use of this option a PAM session handler process will be maintained as
+ part of the unit and stays around as long as the unit is active, to ensure that appropriate actions can be
+ taken when the unit and hence the PAM session terminates. This process is named <literal>(sd-pam)</literal> and
+ is an immediate child process of the unit's main process.</para></listitem>
</varlistentry>
<varlistentry>
<listitem><para>Controls which capabilities to include in the capability bounding set for the executed
process. See <citerefentry
project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> for
- details. Takes a whitespace-separated list of capability names as read by <citerefentry
- project='mankier'><refentrytitle>cap_from_name</refentrytitle><manvolnum>3</manvolnum></citerefentry>,
- e.g. <constant>CAP_SYS_ADMIN</constant>, <constant>CAP_DAC_OVERRIDE</constant>,
- <constant>CAP_SYS_PTRACE</constant>. Capabilities listed will be included in the bounding set, all others are
- removed. If the list of capabilities is prefixed with <literal>~</literal>, all but the listed capabilities
- will be included, the effect of the assignment inverted. Note that this option also affects the respective
- capabilities in the effective, permitted and inheritable capability sets. If this option is not used, the
- capability bounding set is not modified on process execution, hence no limits on the capabilities of the
- process are enforced. This option may appear more than once, in which case the bounding sets are merged. If the
- empty string is assigned to this option, the bounding set is reset to the empty capability set, and all prior
- settings have no effect. If set to <literal>~</literal> (without any further argument), the bounding set is
- reset to the full set of available capabilities, also undoing any previous settings. This does not affect
- commands prefixed with <literal>!</literal>.</para></listitem>
+ details. Takes a whitespace-separated list of capability names, e.g. <constant>CAP_SYS_ADMIN</constant>,
+ <constant>CAP_DAC_OVERRIDE</constant>, <constant>CAP_SYS_PTRACE</constant>. Capabilities listed will be
+ included in the bounding set, all others are removed. If the list of capabilities is prefixed with
+ <literal>~</literal>, all but the listed capabilities will be included, the effect of the assignment
+ inverted. Note that this option also affects the respective capabilities in the effective, permitted and
+ inheritable capability sets. If this option is not used, the capability bounding set is not modified on process
+ execution, hence no limits on the capabilities of the process are enforced. This option may appear more than
+ once, in which case the bounding sets are merged. If the empty string is assigned to this option, the bounding
+ set is reset to the empty capability set, and all prior settings have no effect. If set to
+ <literal>~</literal> (without any further argument), the bounding set is reset to the full set of available
+ capabilities, also undoing any previous settings. This does not affect commands prefixed with
+ <literal>+</literal>.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>AmbientCapabilities=</varname></term>
- <listitem><para>Controls which capabilities to include in the
- ambient capability set for the executed process. Takes a
- whitespace-separated list of capability names as read by
- <citerefentry project='mankier'><refentrytitle>cap_from_name</refentrytitle><manvolnum>3</manvolnum></citerefentry>,
- e.g. <constant>CAP_SYS_ADMIN</constant>,
- <constant>CAP_DAC_OVERRIDE</constant>,
- <constant>CAP_SYS_PTRACE</constant>. This option may appear more than
- once in which case the ambient capability sets are merged.
- If the list of capabilities is prefixed with <literal>~</literal>, all
- but the listed capabilities will be included, the effect of the
- assignment inverted. If the empty string is
- assigned to this option, the ambient capability set is reset to
- the empty capability set, and all prior settings have no effect.
- If set to <literal>~</literal> (without any further argument), the
- ambient capability set is reset to the full set of available
- capabilities, also undoing any previous settings. Note that adding
- capabilities to ambient capability set adds them to the process's
- inherited capability set.
- </para><para>
- Ambient capability sets are useful if you want to execute a process
- as a non-privileged user but still want to give it some capabilities.
- Note that in this case option <constant>keep-caps</constant> is
- automatically added to <varname>SecureBits=</varname> to retain the
- capabilities over the user change. <varname>AmbientCapabilities=</varname> does not affect
- commands prefixed with <literal>!</literal>.</para></listitem>
+ <listitem><para>Controls which capabilities to include in the ambient capability set for the executed
+ process. Takes a whitespace-separated list of capability names, e.g. <constant>CAP_SYS_ADMIN</constant>,
+ <constant>CAP_DAC_OVERRIDE</constant>, <constant>CAP_SYS_PTRACE</constant>. This option may appear more than
+ once in which case the ambient capability sets are merged. If the list of capabilities is prefixed with
+ <literal>~</literal>, all but the listed capabilities will be included, the effect of the assignment
+ inverted. If the empty string is assigned to this option, the ambient capability set is reset to the empty
+ capability set, and all prior settings have no effect. If set to <literal>~</literal> (without any further
+ argument), the ambient capability set is reset to the full set of available capabilities, also undoing any
+ previous settings. Note that adding capabilities to ambient capability set adds them to the process's inherited
+ capability set. </para><para> Ambient capability sets are useful if you want to execute a process as a
+ non-privileged user but still want to give it some capabilities. Note that in this case option
+ <constant>keep-caps</constant> is automatically added to <varname>SecureBits=</varname> to retain the
+ capabilities over the user change. <varname>AmbientCapabilities=</varname> does not affect commands prefixed
+ with <literal>+</literal>.</para></listitem>
</varlistentry>
<varlistentry>
<option>noroot-locked</option>.
This option may appear more than once, in which case the secure
bits are ORed. If the empty string is assigned to this option,
- the bits are reset to 0. This does not affect commands prefixed with <literal>!</literal>.
+ the bits are reset to 0. This does not affect commands prefixed with <literal>+</literal>.
See <citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry>
for details.</para></listitem>
</varlistentry>
<varlistentry>
- <term><varname>ReadWriteDirectories=</varname></term>
- <term><varname>ReadOnlyDirectories=</varname></term>
- <term><varname>InaccessibleDirectories=</varname></term>
-
- <listitem><para>Sets up a new file system namespace for
- executed processes. These options may be used to limit access
- a process might have to the main file system hierarchy. Each
- setting takes a space-separated list of directory paths relative to
- the host's root directory (i.e. the system running the service manager).
- Directories listed in
- <varname>ReadWriteDirectories=</varname> are accessible from
- within the namespace with the same access rights as from
- outside. Directories listed in
- <varname>ReadOnlyDirectories=</varname> are accessible for
- reading only, writing will be refused even if the usual file
- access controls would permit this. Directories listed in
- <varname>InaccessibleDirectories=</varname> will be made
- inaccessible for processes inside the namespace, and may not
- countain any other mountpoints, including those specified by
- <varname>ReadWriteDirectories=</varname> or
- <varname>ReadOnlyDirectories=</varname>.
- Note that restricting access with these options does not extend
- to submounts of a directory that are created later on. These
- options may be specified more than once, in which case all
- directories listed will have limited access from within the
- namespace. If the empty string is assigned to this option, the
- specific list is reset, and all prior assignments have no
- effect.</para>
- <para>Paths in
- <varname>ReadOnlyDirectories=</varname>
- and
- <varname>InaccessibleDirectories=</varname>
- may be prefixed with
- <literal>-</literal>, in which case
- they will be ignored when they do not
- exist. Note that using this
- setting will disconnect propagation of
- mounts from the service to the host
- (propagation in the opposite direction
- continues to work). This means that
- this setting may not be used for
- services which shall be able to
- install mount points in the main mount
- namespace.</para></listitem>
+ <term><varname>ReadWritePaths=</varname></term>
+ <term><varname>ReadOnlyPaths=</varname></term>
+ <term><varname>InaccessiblePaths=</varname></term>
+
+ <listitem><para>Sets up a new file system namespace for executed processes. These options may be used to limit
+ access a process might have to the file system hierarchy. Each setting takes a space-separated list of paths
+ relative to the host's root directory (i.e. the system running the service manager). Note that if paths
+ contain symlinks, they are resolved relative to the root directory set with
+ <varname>RootDirectory=</varname>/<varname>RootImage=</varname>.</para>
+
+ <para>Paths listed in <varname>ReadWritePaths=</varname> are accessible from within the namespace with the same
+ access modes as from outside of it. Paths listed in <varname>ReadOnlyPaths=</varname> are accessible for
+ reading only, writing will be refused even if the usual file access controls would permit this. Nest
+ <varname>ReadWritePaths=</varname> inside of <varname>ReadOnlyPaths=</varname> in order to provide writable
+ subdirectories within read-only directories. Use <varname>ReadWritePaths=</varname> in order to whitelist
+ specific paths for write access if <varname>ProtectSystem=strict</varname> is used. Paths listed in
+ <varname>InaccessiblePaths=</varname> will be made inaccessible for processes inside the namespace (along with
+ everything below them in the file system hierarchy).</para>
+
+ <para>Note that restricting access with these options does not extend to submounts of a directory that are
+ created later on. Non-directory paths may be specified as well. These options may be specified more than once,
+ in which case all paths listed will have limited access from within the namespace. If the empty string is
+ assigned to this option, the specific list is reset, and all prior assignments have no effect.</para>
+
+ <para>Paths in <varname>ReadWritePaths=</varname>, <varname>ReadOnlyPaths=</varname> and
+ <varname>InaccessiblePaths=</varname> may be prefixed with <literal>-</literal>, in which case they will be
+ ignored when they do not exist. If prefixed with <literal>+</literal> the paths are taken relative to the root
+ directory of the unit, as configured with <varname>RootDirectory=</varname>/<varname>RootImage=</varname>,
+ instead of relative to the root directory of the host (see above). When combining <literal>-</literal> and
+ <literal>+</literal> on the same path make sure to specify <literal>-</literal> first, and <literal>+</literal>
+ second.</para>
+
+ <para>Note that using this setting will disconnect propagation of mounts from the service to the host
+ (propagation in the opposite direction continues to work). This means that this setting may not be used for
+ services which shall be able to install mount points in the main mount namespace. Note that the effect of these
+ settings may be undone by privileged processes. In order to set up an effective sandboxed environment for a
+ unit it is thus recommended to combine these settings with either
+ <varname>CapabilityBoundingSet=~CAP_SYS_ADMIN</varname> or
+ <varname>SystemCallFilter=~@mount</varname>.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>BindPaths=</varname></term>
+ <term><varname>BindReadOnlyPaths=</varname></term>
+
+ <listitem><para>Configures unit-specific bind mounts. A bind mount makes a particular file or directory
+ available at an additional place in the unit's view of the file system. Any bind mounts created with this
+ option are specific to the unit, and are not visible in the host's mount table. This option expects a
+ whitespace separated list of bind mount definitions. Each definition consists of a colon-separated triple of
+ source path, destination path and option string, where the latter two are optional. If only a source path is
+ specified the source and destination is taken to be the same. The option string may be either
+ <literal>rbind</literal> or <literal>norbind</literal> for configuring a recursive or non-recursive bind
+ mount. If the destination path is omitted, the option string must be omitted too.</para>
+
+ <para><varname>BindPaths=</varname> creates regular writable bind mounts (unless the source file system mount
+ is already marked read-only), while <varname>BindReadOnlyPaths=</varname> creates read-only bind mounts. These
+ settings may be used more than once, each usage appends to the unit's list of bind mounts. If the empty string
+ is assigned to either of these two options the entire list of bind mounts defined prior to this is reset. Note
+ that in this case both read-only and regular bind mounts are reset, regardless which of the two settings is
+ used.</para>
+
+ <para>This option is particularly useful when <varname>RootDirectory=</varname>/<varname>RootImage=</varname>
+ is used. In this case the source path refers to a path on the host file system, while the destination path
+ refers to a path below the root directory of the unit.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>PrivateTmp=</varname></term>
- <listitem><para>Takes a boolean argument. If true, sets up a
- new file system namespace for the executed processes and
- mounts private <filename>/tmp</filename> and
- <filename>/var/tmp</filename> directories inside it that is
- not shared by processes outside of the namespace. This is
- useful to secure access to temporary files of the process, but
- makes sharing between processes via <filename>/tmp</filename>
- or <filename>/var/tmp</filename> impossible. If this is
- enabled, all temporary files created by a service in these
- directories will be removed after the service is stopped.
- Defaults to false. It is possible to run two or more units
- within the same private <filename>/tmp</filename> and
- <filename>/var/tmp</filename> namespace by using the
+ <listitem><para>Takes a boolean argument. If true, sets up a new file system namespace for the executed
+ processes and mounts private <filename>/tmp</filename> and <filename>/var/tmp</filename> directories inside it
+ that is not shared by processes outside of the namespace. This is useful to secure access to temporary files of
+ the process, but makes sharing between processes via <filename>/tmp</filename> or <filename>/var/tmp</filename>
+ impossible. If this is enabled, all temporary files created by a service in these directories will be removed
+ after the service is stopped. Defaults to false. It is possible to run two or more units within the same
+ private <filename>/tmp</filename> and <filename>/var/tmp</filename> namespace by using the
<varname>JoinsNamespaceOf=</varname> directive, see
- <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>
- for details. Note that using this setting will disconnect
- propagation of mounts from the service to the host
- (propagation in the opposite direction continues to work).
- This means that this setting may not be used for services
- which shall be able to install mount points in the main mount
- namespace.</para></listitem>
+ <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
+ details. This setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same
+ restrictions regarding mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and
+ related calls, see above. Enabling this setting has the side effect of adding <varname>Requires=</varname> and
+ <varname>After=</varname> dependencies on all mount units necessary to access <filename>/tmp</filename> and
+ <filename>/var/tmp</filename>. Moreover an implicitly <varname>After=</varname> ordering on
+ <citerefentry><refentrytitle>systemd-tmpfiles-setup.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
+ is added.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>PrivateDevices=</varname></term>
- <listitem><para>Takes a boolean argument. If true, sets up a
- new /dev namespace for the executed processes and only adds
- API pseudo devices such as <filename>/dev/null</filename>,
- <filename>/dev/zero</filename> or
- <filename>/dev/random</filename> (as well as the pseudo TTY
- subsystem) to it, but no physical devices such as
- <filename>/dev/sda</filename>. This is useful to securely turn
- off physical device access by the executed process. Defaults
- to false. Enabling this option will also remove
- <constant>CAP_MKNOD</constant> from the capability bounding
- set for the unit (see above), and set
- <varname>DevicePolicy=closed</varname> (see
+ <listitem><para>Takes a boolean argument. If true, sets up a new /dev namespace for the executed processes and
+ only adds API pseudo devices such as <filename>/dev/null</filename>, <filename>/dev/zero</filename> or
+ <filename>/dev/random</filename> (as well as the pseudo TTY subsystem) to it, but no physical devices such as
+ <filename>/dev/sda</filename>, system memory <filename>/dev/mem</filename>, system ports
+ <filename>/dev/port</filename> and others. This is useful to securely turn off physical device access by the
+ executed process. Defaults to false. Enabling this option will install a system call filter to block low-level
+ I/O system calls that are grouped in the <varname>@raw-io</varname> set, will also remove
+ <constant>CAP_MKNOD</constant> and <constant>CAP_SYS_RAWIO</constant> from the capability bounding set for
+ the unit (see above), and set <varname>DevicePolicy=closed</varname> (see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
- for details). Note that using this setting will disconnect
- propagation of mounts from the service to the host
- (propagation in the opposite direction continues to work).
- This means that this setting may not be used for services
- which shall be able to install mount points in the main mount
- namespace. The /dev namespace will be mounted read-only and 'noexec'.
- The latter may break old programs which try to set up executable
- memory by using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- of <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>.</para></listitem>
+ for details). Note that using this setting will disconnect propagation of mounts from the service to the host
+ (propagation in the opposite direction continues to work). This means that this setting may not be used for
+ services which shall be able to install mount points in the main mount namespace. The /dev namespace will be
+ mounted read-only and 'noexec'. The latter may break old programs which try to set up executable memory by
+ using <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> of
+ <filename>/dev/zero</filename> instead of using <constant>MAP_ANON</constant>. This setting is implied if
+ <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding mount propagation and
+ privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see above.
+ If turned on and if running in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant>
+ capability (e.g. setting <varname>User=</varname>), <varname>NoNewPrivileges=yes</varname>
+ is implied.
+ </para></listitem>
</varlistentry>
<varlistentry>
accessible).</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>PrivateUsers=</varname></term>
+
+ <listitem><para>Takes a boolean argument. If true, sets up a new user namespace for the executed processes and
+ configures a minimal user and group mapping, that maps the <literal>root</literal> user and group as well as
+ the unit's own user and group to themselves and everything else to the <literal>nobody</literal> user and
+ group. This is useful to securely detach the user and group databases used by the unit from the rest of the
+ system, and thus to create an effective sandbox environment. All files, directories, processes, IPC objects and
+ other resources owned by users/groups not equaling <literal>root</literal> or the unit's own will stay visible
+ from within the unit but appear owned by the <literal>nobody</literal> user and group. If this mode is enabled,
+ all unit processes are run without privileges in the host user namespace (regardless if the unit's own
+ user/group is <literal>root</literal> or not). Specifically this means that the process will have zero process
+ capabilities on the host's user namespace, but full capabilities within the service's user namespace. Settings
+ such as <varname>CapabilityBoundingSet=</varname> will affect only the latter, and there's no way to acquire
+ additional capabilities in the host's user namespace. Defaults to off.</para>
+
+ <para>This setting is particularly useful in conjunction with
+ <varname>RootDirectory=</varname>/<varname>RootImage=</varname>, as the need to synchronize the user and group
+ databases in the root directory and on the host is reduced, as the only users and groups who need to be matched
+ are <literal>root</literal>, <literal>nobody</literal> and the unit's own user and group.</para></listitem>
+ </varlistentry>
+
<varlistentry>
<term><varname>ProtectSystem=</varname></term>
- <listitem><para>Takes a boolean argument or
- <literal>full</literal>. If true, mounts the
- <filename>/usr</filename> and <filename>/boot</filename>
- directories read-only for processes invoked by this unit. If
- set to <literal>full</literal>, the <filename>/etc</filename>
- directory is mounted read-only, too. This setting ensures that
- any modification of the vendor-supplied operating system (and
- optionally its configuration) is prohibited for the service.
- It is recommended to enable this setting for all long-running
- services, unless they are involved with system updates or need
- to modify the operating system in other ways. Note however
- that processes retaining the CAP_SYS_ADMIN capability can undo
- the effect of this setting. This setting is hence particularly
- useful for daemons which have this capability removed, for
- example with <varname>CapabilityBoundingSet=</varname>.
- Defaults to off.</para></listitem>
+ <listitem><para>Takes a boolean argument or the special values <literal>full</literal> or
+ <literal>strict</literal>. If true, mounts the <filename>/usr</filename> and <filename>/boot</filename>
+ directories read-only for processes invoked by this unit. If set to <literal>full</literal>, the
+ <filename>/etc</filename> directory is mounted read-only, too. If set to <literal>strict</literal> the entire
+ file system hierarchy is mounted read-only, except for the API file system subtrees <filename>/dev</filename>,
+ <filename>/proc</filename> and <filename>/sys</filename> (protect these directories using
+ <varname>PrivateDevices=</varname>, <varname>ProtectKernelTunables=</varname>,
+ <varname>ProtectControlGroups=</varname>). This setting ensures that any modification of the vendor-supplied
+ operating system (and optionally its configuration, and local mounts) is prohibited for the service. It is
+ recommended to enable this setting for all long-running services, unless they are involved with system updates
+ or need to modify the operating system in other ways. If this option is used,
+ <varname>ReadWritePaths=</varname> may be used to exclude specific directories from being made read-only. This
+ setting is implied if <varname>DynamicUser=</varname> is set. For this setting the same restrictions regarding
+ mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
+ above. Defaults to off.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>ProtectHome=</varname></term>
- <listitem><para>Takes a boolean argument or
- <literal>read-only</literal>. If true, the directories
- <filename>/home</filename>, <filename>/root</filename> and
- <filename>/run/user</filename>
- are made inaccessible and empty for processes invoked by this
- unit. If set to <literal>read-only</literal>, the three
- directories are made read-only instead. It is recommended to
- enable this setting for all long-running services (in
- particular network-facing ones), to ensure they cannot get
- access to private user data, unless the services actually
- require access to the user's private data. Note however that
- processes retaining the CAP_SYS_ADMIN capability can undo the
- effect of this setting. This setting is hence particularly
- useful for daemons which have this capability removed, for
- example with <varname>CapabilityBoundingSet=</varname>.
- Defaults to off.</para></listitem>
+ <listitem><para>Takes a boolean argument or <literal>read-only</literal>. If true, the directories
+ <filename>/home</filename>, <filename>/root</filename> and <filename>/run/user</filename> are made inaccessible
+ and empty for processes invoked by this unit. If set to <literal>read-only</literal>, the three directories are
+ made read-only instead. It is recommended to enable this setting for all long-running services (in particular
+ network-facing ones), to ensure they cannot get access to private user data, unless the services actually
+ require access to the user's private data. This setting is implied if <varname>DynamicUser=</varname> is
+ set. For this setting the same restrictions regarding mount propagation and privileges apply as for
+ <varname>ReadOnlyPaths=</varname> and related calls, see above.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>ProtectKernelTunables=</varname></term>
+
+ <listitem><para>Takes a boolean argument. If true, kernel variables accessible through
+ <filename>/proc/sys</filename>, <filename>/sys</filename>, <filename>/proc/sysrq-trigger</filename>,
+ <filename>/proc/latency_stats</filename>, <filename>/proc/acpi</filename>,
+ <filename>/proc/timer_stats</filename>, <filename>/proc/fs</filename> and <filename>/proc/irq</filename> will
+ be made read-only to all processes of the unit. Usually, tunable kernel variables should be initialized only at
+ boot-time, for example with the
+ <citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry> mechanism. Few
+ services need to write to these at runtime; it is hence recommended to turn this on for most services. For this
+ setting the same restrictions regarding mount propagation and privileges apply as for
+ <varname>ReadOnlyPaths=</varname> and related calls, see above. Defaults to off. If turned on and if running
+ in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant> capability (e.g. services
+ for which <varname>User=</varname> is set), <varname>NoNewPrivileges=yes</varname> is implied. Note that this
+ option does not prevent indirect changes to kernel tunables effected by IPC calls to other processes. However,
+ <varname>InaccessiblePaths=</varname> may be used to make relevant IPC file system objects inaccessible. If
+ <varname>ProtectKernelTunables=</varname> is set, <varname>MountAPIVFS=yes</varname> is
+ implied.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>ProtectKernelModules=</varname></term>
+
+ <listitem><para>Takes a boolean argument. If true, explicit module loading will
+ be denied. This allows to turn off module load and unload operations on modular
+ kernels. It is recommended to turn this on for most services that do not need special
+ file systems or extra kernel modules to work. Default to off. Enabling this option
+ removes <constant>CAP_SYS_MODULE</constant> from the capability bounding set for
+ the unit, and installs a system call filter to block module system calls,
+ also <filename>/usr/lib/modules</filename> is made inaccessible. For this
+ setting the same restrictions regarding mount propagation and privileges
+ apply as for <varname>ReadOnlyPaths=</varname> and related calls, see above.
+ Note that limited automatic module loading due to user configuration or kernel
+ mapping tables might still happen as side effect of requested user operations,
+ both privileged and unprivileged. To disable module auto-load feature please see
+ <citerefentry><refentrytitle>sysctl.d</refentrytitle><manvolnum>5</manvolnum></citerefentry>
+ <constant>kernel.modules_disabled</constant> mechanism and
+ <filename>/proc/sys/kernel/modules_disabled</filename> documentation.
+ If turned on and if running in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant>
+ capability (e.g. setting <varname>User=</varname>), <varname>NoNewPrivileges=yes</varname>
+ is implied.
+ </para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>ProtectControlGroups=</varname></term>
+
+ <listitem><para>Takes a boolean argument. If true, the Linux Control Groups (<citerefentry
+ project='man-pages'><refentrytitle>cgroups</refentrytitle><manvolnum>7</manvolnum></citerefentry>) hierarchies
+ accessible through <filename>/sys/fs/cgroup</filename> will be made read-only to all processes of the
+ unit. Except for container managers no services should require write access to the control groups hierarchies;
+ it is hence recommended to turn this on for most services. For this setting the same restrictions regarding
+ mount propagation and privileges apply as for <varname>ReadOnlyPaths=</varname> and related calls, see
+ above. Defaults to off. If <varname>ProtectControlGroups=</varname> is set, <varname>MountAPIVFS=yes</varname> is
+ implied.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>MountFlags=</varname></term>
- <listitem><para>Takes a mount propagation flag:
- <option>shared</option>, <option>slave</option> or
- <option>private</option>, which control whether mounts in the
- file system namespace set up for this unit's processes will
- receive or propagate mounts or unmounts. See
- <citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- for details. Defaults to <option>shared</option>. Use
- <option>shared</option> to ensure that mounts and unmounts are
- propagated from the host to the container and vice versa. Use
- <option>slave</option> to run processes so that none of their
- mounts and unmounts will propagate to the host. Use
- <option>private</option> to also ensure that no mounts and
- unmounts from the host will propagate into the unit processes'
- namespace. Note that <option>slave</option> means that file
- systems mounted on the host might stay mounted continuously in
- the unit's namespace, and thus keep the device busy. Note that
- the file system namespace related options
- (<varname>PrivateTmp=</varname>,
- <varname>PrivateDevices=</varname>,
- <varname>ProtectSystem=</varname>,
- <varname>ProtectHome=</varname>,
- <varname>ReadOnlyDirectories=</varname>,
- <varname>InaccessibleDirectories=</varname> and
- <varname>ReadWriteDirectories=</varname>) require that mount
- and unmount propagation from the unit's file system namespace
- is disabled, and hence downgrade <option>shared</option> to
+ <listitem><para>Takes a mount propagation flag: <option>shared</option>, <option>slave</option> or
+ <option>private</option>, which control whether mounts in the file system namespace set up for this unit's
+ processes will receive or propagate mounts and unmounts. See <citerefentry
+ project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry> for
+ details. Defaults to <option>shared</option>. Use <option>shared</option> to ensure that mounts and unmounts
+ are propagated from systemd's namespace to the service's namespace and vice versa. Use <option>slave</option>
+ to run processes so that none of their mounts and unmounts will propagate to the host. Use <option>private</option>
+ to also ensure that no mounts and unmounts from the host will propagate into the unit processes' namespace.
+ If this is set to <option>slave</option> or <option>private</option>, any mounts created by spawned processes
+ will be unmounted after the completion of the current command line of <varname>ExecStartPre=</varname>,
+ <varname>ExecStartPost=</varname>, <varname>ExecStart=</varname>,
+ and <varname>ExecStopPost=</varname>. Note that
+ <option>slave</option> means that file systems mounted on the host might stay mounted continuously in the
+ unit's namespace, and thus keep the device busy. Note that the file system namespace related options
+ (<varname>PrivateTmp=</varname>, <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>,
+ <varname>ProtectHome=</varname>, <varname>ProtectKernelTunables=</varname>,
+ <varname>ProtectControlGroups=</varname>, <varname>ReadOnlyPaths=</varname>,
+ <varname>InaccessiblePaths=</varname>, <varname>ReadWritePaths=</varname>) require that mount and unmount
+ propagation from the unit's file system namespace is disabled, and hence downgrade <option>shared</option> to
<option>slave</option>. </para></listitem>
</varlistentry>
domain transition. However, the policy still needs to
authorize the transition. This directive is ignored if SELinux
is disabled. If prefixed by <literal>-</literal>, all errors
- will be ignored. This does not affect commands prefixed with <literal>!</literal>.
+ will be ignored. This does not affect commands prefixed with <literal>+</literal>.
See <citerefentry project='die-net'><refentrytitle>setexeccon</refentrytitle><manvolnum>3</manvolnum></citerefentry>
for details.</para></listitem>
</varlistentry>
Profiles must already be loaded in the kernel, or the unit
will fail. This result in a non operation if AppArmor is not
enabled. If prefixed by <literal>-</literal>, all errors will
- be ignored. This does not affect commands prefixed with <literal>!</literal>.</para></listitem>
+ be ignored. This does not affect commands prefixed with <literal>+</literal>.</para></listitem>
</varlistentry>
<varlistentry>
<para>The value may be prefixed by <literal>-</literal>, in
which case all errors will be ignored. An empty value may be
specified to unset previous assignments. This does not affect
- commands prefixed with <literal>!</literal>.</para>
+ commands prefixed with <literal>+</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>NoNewPrivileges=</varname></term>
- <listitem><para>Takes a boolean argument. If true, ensures
- that the service process and all its children can never gain
- new privileges. This option is more powerful than the
- respective secure bits flags (see above), as it also prohibits
- UID changes of any kind. This is the simplest, most effective
- way to ensure that a process and its children can never
- elevate privileges again.</para></listitem>
+ <listitem><para>Takes a boolean argument. If true, ensures that the service process and all its children can
+ never gain new privileges through <function>execve()</function> (e.g. via setuid or setgid bits, or filesystem
+ capabilities). This is the simplest and most effective way to ensure that a process and its children can never
+ elevate privileges again. Defaults to false, but certain settings force
+ <varname>NoNewPrivileges=yes</varname>, ignoring the value of this setting. This is the case when
+ <varname>SystemCallFilter=</varname>, <varname>SystemCallArchitectures=</varname>,
+ <varname>RestrictAddressFamilies=</varname>, <varname>RestrictNamespaces=</varname>,
+ <varname>PrivateDevices=</varname>, <varname>ProtectKernelTunables=</varname>,
+ <varname>ProtectKernelModules=</varname>, <varname>MemoryDenyWriteExecute=</varname>, or
+ <varname>RestrictRealtime=</varname> are specified.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>SystemCallFilter=</varname></term>
- <listitem><para>Takes a space-separated list of system call
- names. If this setting is used, all system calls executed by
- the unit processes except for the listed ones will result in
- immediate process termination with the
- <constant>SIGSYS</constant> signal (whitelisting). If the
- first character of the list is <literal>~</literal>, the
- effect is inverted: only the listed system calls will result
- in immediate process termination (blacklisting). If running in
- user mode, or in system mode, but without the
- <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting
- <varname>User=nobody</varname>),
- <varname>NoNewPrivileges=yes</varname> is implied. This
- feature makes use of the Secure Computing Mode 2 interfaces of
- the kernel ('seccomp filtering') and is useful for enforcing a
- minimal sandboxing environment. Note that the
- <function>execve</function>,
- <function>rt_sigreturn</function>,
- <function>sigreturn</function>,
- <function>exit_group</function>, <function>exit</function>
- system calls are implicitly whitelisted and do not need to be
- listed explicitly. This option may be specified more than once,
- in which case the filter masks are merged. If the empty string
- is assigned, the filter is reset, all prior assignments will
- have no effect. This does not affect commands prefixed with <literal>!</literal>.</para>
+ <listitem><para>Takes a space-separated list of system call names. If this setting is used, all system calls
+ executed by the unit processes except for the listed ones will result in immediate process termination with the
+ <constant>SIGSYS</constant> signal (whitelisting). If the first character of the list is <literal>~</literal>,
+ the effect is inverted: only the listed system calls will result in immediate process termination
+ (blacklisting). If running in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant>
+ capability (e.g. setting <varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is
+ implied. This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp filtering')
+ and is useful for enforcing a minimal sandboxing environment. Note that the <function>execve</function>,
+ <function>exit</function>, <function>exit_group</function>, <function>getrlimit</function>,
+ <function>rt_sigreturn</function>, <function>sigreturn</function> system calls and the system calls for
+ querying time and sleeping are implicitly whitelisted and do not need to be listed explicitly. This option may
+ be specified more than once, in which case the filter masks are merged. If the empty string is assigned, the
+ filter is reset, all prior assignments will have no effect. This does not affect commands prefixed with
+ <literal>+</literal>.</para>
+
+ <para>Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off
+ alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
+ option. Specifically, it is recommended to combine this option with
+ <varname>SystemCallArchitectures=native</varname> or similar.</para>
+
+ <para>Note that strict system call filters may impact execution and error handling code paths of the service
+ invocation. Specifically, access to the <function>execve</function> system call is required for the execution
+ of the service binary — if it is blocked service invocation will necessarily fail. Also, if execution of the
+ service binary fails for some reason (for example: missing service executable), the error handling logic might
+ require access to an additional set of system calls in order to process and log this failure correctly. It
+ might be necessary to temporarily disable system call filters in order to simplify debugging of such
+ failures.</para>
<para>If you specify both types of this option (i.e.
whitelisting and blacklisting), the first encountered will
</row>
</thead>
<tbody>
+ <row>
+ <entry>@basic-io</entry>
+ <entry>System calls for basic I/O: reading, writing, seeking, file descriptor duplication and closing (<citerefentry project='man-pages'><refentrytitle>read</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>write</refentrytitle><manvolnum>2</manvolnum></citerefentry>, and related calls)</entry>
+ </row>
<row>
<entry>@clock</entry>
<entry>System calls for changing the system clock (<citerefentry project='man-pages'><refentrytitle>adjtimex</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>settimeofday</refentrytitle><manvolnum>2</manvolnum></citerefentry>, and related calls)</entry>
<entry>@debug</entry>
<entry>Debugging, performance monitoring and tracing functionality (<citerefentry project='man-pages'><refentrytitle>ptrace</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>perf_event_open</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry>
</row>
+ <row>
+ <entry>@file-system</entry>
+ <entry>File system operations: opening, creating files and directories for read and write, renaming and removing them, reading file properties, or creating hard and symbolic links.</entry>
+ </row>
<row>
<entry>@io-event</entry>
<entry>Event loop system calls (<citerefentry project='man-pages'><refentrytitle>poll</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>select</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>epoll</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>eventfd</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry>
</row>
<row>
<entry>@ipc</entry>
- <entry>SysV IPC, POSIX Message Queues or other IPC (<citerefentry project='man-pages'><refentrytitle>mq_overview</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>svipc</refentrytitle><manvolnum>7</manvolnum></citerefentry>)</entry>
+ <entry>Pipes, SysV IPC, POSIX Message Queues and other IPC (<citerefentry project='man-pages'><refentrytitle>mq_overview</refentrytitle><manvolnum>7</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>svipc</refentrytitle><manvolnum>7</manvolnum></citerefentry>)</entry>
</row>
<row>
<entry>@keyring</entry>
</row>
<row>
<entry>@module</entry>
- <entry>Kernel module control (<citerefentry project='man-pages'><refentrytitle>init_module</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>delete_module</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry>
+ <entry>Loading and unloading of kernel modules (<citerefentry project='man-pages'><refentrytitle>init_module</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>delete_module</refentrytitle><manvolnum>2</manvolnum></citerefentry> and related calls)</entry>
</row>
<row>
<entry>@mount</entry>
- <entry>File system mounting and unmounting (<citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry>, and related calls)</entry>
+ <entry>Mounting and unmounting of file systems (<citerefentry project='man-pages'><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>2</manvolnum></citerefentry>, and related calls)</entry>
</row>
<row>
<entry>@network-io</entry>
</row>
<row>
<entry>@process</entry>
- <entry>Process control, execution, namespaces (<citerefentry project='man-pages'><refentrytitle>execve</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>kill</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>, …</entry>
+ <entry>Process control, execution, namespaceing operations (<citerefentry project='man-pages'><refentrytitle>clone</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>kill</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>, …</entry>
</row>
<row>
<entry>@raw-io</entry>
- <entry>Raw I/O port access (<citerefentry project='man-pages'><refentrytitle>ioperm</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>iopl</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <function>pciconfig_read()</function>, …</entry>
+ <entry>Raw I/O port access (<citerefentry project='man-pages'><refentrytitle>ioperm</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>iopl</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <function>pciconfig_read()</function>, …)</entry>
+ </row>
+ <row>
+ <entry>@reboot</entry>
+ <entry>System calls for rebooting and reboot preparation (<citerefentry project='man-pages'><refentrytitle>reboot</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <function>kexec()</function>, …)</entry>
+ </row>
+ <row>
+ <entry>@resources</entry>
+ <entry>System calls for changing resource limits, memory and scheduling parameters (<citerefentry project='man-pages'><refentrytitle>setrlimit</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setpriority</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry>
+ </row>
+ <row>
+ <entry>@swap</entry>
+ <entry>System calls for enabling/disabling swap devices (<citerefentry project='man-pages'><refentrytitle>swapon</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>swapoff</refentrytitle><manvolnum>2</manvolnum></citerefentry>)</entry>
</row>
</tbody>
</tgroup>
</table>
- Note, that as new system calls are added to the kernel, additional system calls might be added to the groups
- above, so the contents of the sets may change between systemd versions.</para></listitem>
+ Note, that as new system calls are added to the kernel, additional system calls might be
+ added to the groups above. Contents of the sets may also change between systemd
+ versions. In addition, the list of system calls depends on the kernel version and
+ architecture for which systemd was compiled. Use
+ <command>systemd-analyze syscall-filter</command> to list the actual list of system calls in
+ each filter.
+ </para>
+
+ <para>It is recommended to combine the file system namespacing related options with
+ <varname>SystemCallFilter=~@mount</varname>, in order to prohibit the unit's processes to undo the
+ mappings. Specifically these are the options <varname>PrivateTmp=</varname>,
+ <varname>PrivateDevices=</varname>, <varname>ProtectSystem=</varname>, <varname>ProtectHome=</varname>,
+ <varname>ProtectKernelTunables=</varname>, <varname>ProtectControlGroups=</varname>,
+ <varname>ReadOnlyPaths=</varname>, <varname>InaccessiblePaths=</varname> and
+ <varname>ReadWritePaths=</varname>.</para></listitem>
</varlistentry>
<varlistentry>
<varlistentry>
<term><varname>SystemCallArchitectures=</varname></term>
- <listitem><para>Takes a space-separated list of architecture
- identifiers to include in the system call filter. The known
- architecture identifiers are <constant>x86</constant>,
- <constant>x86-64</constant>, <constant>x32</constant>,
- <constant>arm</constant> as well as the special identifier
- <constant>native</constant>. Only system calls of the
- specified architectures will be permitted to processes of this
- unit. This is an effective way to disable compatibility with
- non-native architectures for processes, for example to
- prohibit execution of 32-bit x86 binaries on 64-bit x86-64
- systems. The special <constant>native</constant> identifier
- implicitly maps to the native architecture of the system (or
- more strictly: to the architecture the system manager is
- compiled for). If running in user mode, or in system mode,
- but without the <constant>CAP_SYS_ADMIN</constant>
- capability (e.g. setting <varname>User=nobody</varname>),
- <varname>NoNewPrivileges=yes</varname> is implied. Note
- that setting this option to a non-empty list implies that
- <constant>native</constant> is included too. By default, this
- option is set to the empty list, i.e. no architecture system
- call filtering is applied.</para></listitem>
+ <listitem><para>Takes a space-separated list of architecture identifiers to include in the system call
+ filter. The known architecture identifiers are the same as for <varname>ConditionArchitecture=</varname>
+ described in <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
+ as well as <constant>x32</constant>, <constant>mips64-n32</constant>, <constant>mips64-le-n32</constant>, and
+ the special identifier <constant>native</constant>. Only system calls of the specified architectures will be
+ permitted to processes of this unit. This is an effective way to disable compatibility with non-native
+ architectures for processes, for example to prohibit execution of 32-bit x86 binaries on 64-bit x86-64
+ systems. The special <constant>native</constant> identifier implicitly maps to the native architecture of the
+ system (or more strictly: to the architecture the system manager is compiled for). If running in user mode, or
+ in system mode, but without the <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting
+ <varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is implied. Note that setting this
+ option to a non-empty list implies that <constant>native</constant> is included too. By default, this option is
+ set to the empty list, i.e. no system call architecture filtering is applied.</para>
+
+ <para>Note that system call filtering is not equally effective on all architectures. For example, on x86
+ filtering of network socket-related calls is not possible, due to ABI limitations — a limitation that x86-64
+ does not have, however. On systems supporting multiple ABIs at the same time — such as x86/x86-64 — it is hence
+ recommended to limit the set of permitted system call architectures so that secondary ABIs may not be used to
+ circumvent the restrictions applied to the native ABI of the system. In particular, setting
+ <varname>SystemCallFilter=native</varname> is a good choice for disabling non-native ABIs.</para>
+
+ <para>System call architectures may also be restricted system-wide via the
+ <varname>SystemCallArchitectures=</varname> option in the global configuration. See
+ <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
+ details.</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>RestrictAddressFamilies=</varname></term>
- <listitem><para>Restricts the set of socket address families
- accessible to the processes of this unit. Takes a
- space-separated list of address family names to whitelist,
- such as
- <constant>AF_UNIX</constant>,
- <constant>AF_INET</constant> or
- <constant>AF_INET6</constant>. When
- prefixed with <constant>~</constant> the listed address
- families will be applied as blacklist, otherwise as whitelist.
- Note that this restricts access to the
- <citerefentry project='man-pages'><refentrytitle>socket</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- system call only. Sockets passed into the process by other
- means (for example, by using socket activation with socket
- units, see
- <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>)
- are unaffected. Also, sockets created with
- <function>socketpair()</function> (which creates connected
- AF_UNIX sockets only) are unaffected. Note that this option
- has no effect on 32-bit x86 and is ignored (but works
- correctly on x86-64). If running in user mode, or in system
- mode, but without the <constant>CAP_SYS_ADMIN</constant>
- capability (e.g. setting <varname>User=nobody</varname>),
- <varname>NoNewPrivileges=yes</varname> is implied. By
- default, no restriction applies, all address families are
- accessible to processes. If assigned the empty string, any
- previous list changes are undone.</para>
-
- <para>Use this option to limit exposure of processes to remote
- systems, in particular via exotic network protocols. Note that
- in most cases, the local <constant>AF_UNIX</constant> address
- family should be included in the configured whitelist as it is
- frequently used for local communication, including for
+ <listitem><para>Restricts the set of socket address families accessible to the processes of this unit. Takes a
+ space-separated list of address family names to whitelist, such as <constant>AF_UNIX</constant>,
+ <constant>AF_INET</constant> or <constant>AF_INET6</constant>. When prefixed with <constant>~</constant> the
+ listed address families will be applied as blacklist, otherwise as whitelist. Note that this restricts access
+ to the <citerefentry
+ project='man-pages'><refentrytitle>socket</refentrytitle><manvolnum>2</manvolnum></citerefentry> system call
+ only. Sockets passed into the process by other means (for example, by using socket activation with socket
+ units, see <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>)
+ are unaffected. Also, sockets created with <function>socketpair()</function> (which creates connected AF_UNIX
+ sockets only) are unaffected. Note that this option has no effect on 32-bit x86, s390, s390x, mips, mips-le,
+ ppc, ppc-le, pcc64, ppc64-le and is ignored (but works correctly on other ABIs, including x86-64). Note that on
+ systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off alternative ABIs for
+ services, so that they cannot be used to circumvent the restrictions of this option. Specifically, it is
+ recommended to combine this option with <varname>SystemCallArchitectures=native</varname> or similar. If
+ running in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant> capability
+ (e.g. setting <varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is implied. By default,
+ no restrictions apply, all address families are accessible to processes. If assigned the empty string, any
+ previous address familiy restriction changes are undone. This setting does not affect commands prefixed with
+ <literal>+</literal>.</para>
+
+ <para>Use this option to limit exposure of processes to remote access, in particular via exotic and sensitive
+ network protocols, such as <constant>AF_PACKET</constant>. Note that in most cases, the local
+ <constant>AF_UNIX</constant> address family should be included in the configured whitelist as it is frequently
+ used for local communication, including for
<citerefentry><refentrytitle>syslog</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- logging. This does not affect commands prefixed with <literal>!</literal>.</para></listitem>
+ logging.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>RestrictNamespaces=</varname></term>
+
+ <listitem><para>Restricts access to Linux namespace functionality for the processes of this unit. For details
+ about Linux namespaces, see
+ <citerefentry><refentrytitle>namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>. Either takes a
+ boolean argument, or a space-separated list of namespace type identifiers. If false (the default), no
+ restrictions on namespace creation and switching are made. If true, access to any kind of namespacing is
+ prohibited. Otherwise, a space-separated list of namespace type identifiers must be specified, consisting of
+ any combination of: <constant>cgroup</constant>, <constant>ipc</constant>, <constant>net</constant>,
+ <constant>mnt</constant>, <constant>pid</constant>, <constant>user</constant> and <constant>uts</constant>. Any
+ namespace type listed is made accessible to the unit's processes, access to namespace types not listed is
+ prohibited (whitelisting). By prepending the list with a single tilda character (<literal>~</literal>) the
+ effect may be inverted: only the listed namespace types will be made inaccessible, all unlisted ones are
+ permitted (blacklisting). If the empty string is assigned, the default namespace restrictions are applied,
+ which is equivalent to false. Internally, this setting limits access to the
+ <citerefentry><refentrytitle>unshare</refentrytitle><manvolnum>2</manvolnum></citerefentry>,
+ <citerefentry><refentrytitle>clone</refentrytitle><manvolnum>2</manvolnum></citerefentry> and
+ <citerefentry><refentrytitle>setns</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls, taking
+ the specified flags parameters into account. Note that — if this option is used — in addition to restricting
+ creation and switching of the specified types of namespaces (or all of them, if true) access to the
+ <function>setns()</function> system call with a zero flags parameter is prohibited. This setting is only
+ supported on x86, x86-64, s390 and s390x, and enforces no restrictions on other architectures. If running in user
+ mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting
+ <varname>User=</varname>), <varname>NoNewPrivileges=yes</varname> is implied. </para></listitem>
</varlistentry>
<varlistentry>
<term><varname>MemoryDenyWriteExecute=</varname></term>
<listitem><para>Takes a boolean argument. If set, attempts to create memory mappings that are writable and
- executable at the same time, or to change existing memory mappings to become executable are prohibited.
- Specifically, a system call filter is added that rejects
- <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- system calls with both <constant>PROT_EXEC</constant> and <constant>PROT_WRITE</constant> set
- and <citerefentry><refentrytitle>mprotect</refentrytitle><manvolnum>2</manvolnum></citerefentry>
- system calls with <constant>PROT_EXEC</constant> set. Note that this option is incompatible with programs
- that generate program code dynamically at runtime, such as JIT execution engines, or programs compiled making
- use of the code "trampoline" feature of various C compilers. This option improves service security, as it makes
- harder for software exploits to change running code dynamically.
- </para></listitem>
+ executable at the same time, or to change existing memory mappings to become executable, or mapping shared
+ memory segments as executable are prohibited. Specifically, a system call filter is added that rejects
+ <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls with both
+ <constant>PROT_EXEC</constant> and <constant>PROT_WRITE</constant> set,
+ <citerefentry><refentrytitle>mprotect</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls with
+ <constant>PROT_EXEC</constant> set and
+ <citerefentry><refentrytitle>shmat</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls with
+ <constant>SHM_EXEC</constant> set. Note that this option is incompatible with programs that generate program
+ code dynamically at runtime, such as JIT execution engines, or programs compiled making use of the code
+ "trampoline" feature of various C compilers. This option improves service security, as it makes harder for
+ software exploits to change running code dynamically. Note that this feature is fully available on x86-64, and
+ partially on x86. Specifically, the <function>shmat()</function> protection is not available on x86. Note that
+ on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off alternative ABIs for
+ services, so that they cannot be used to circumvent the restrictions of this option. Specifically, it is
+ recommended to combine this option with <varname>SystemCallArchitectures=native</varname> or similar. If
+ running in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant> capability
+ (e.g. setting <varname>User=</varname>), <varname>NoNewPrivileges=yes</varname> is implied. </para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>RestrictRealtime=</varname></term>
+
+ <listitem><para>Takes a boolean argument. If set, any attempts to enable realtime scheduling in a process of
+ the unit are refused. This restricts access to realtime task scheduling policies such as
+ <constant>SCHED_FIFO</constant>, <constant>SCHED_RR</constant> or <constant>SCHED_DEADLINE</constant>. See
+ <citerefentry project='man-pages'><refentrytitle>sched</refentrytitle><manvolnum>7</manvolnum></citerefentry> for details about
+ these scheduling policies. If running in user mode, or in system mode, but
+ without the <constant>CAP_SYS_ADMIN</constant> capability
+ (e.g. setting <varname>User=</varname>), <varname>NoNewPrivileges=yes</varname>
+ is implied. Realtime scheduling policies may be used to monopolize CPU time for longer periods
+ of time, and may hence be used to lock up or otherwise trigger Denial-of-Service situations on the system. It
+ is hence recommended to restrict access to realtime scheduling to the few programs that actually require
+ them. Defaults to off.</para></listitem>
</varlistentry>
</variablelist>
</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>$INVOCATION_ID</varname></term>
+
+ <listitem><para>Contains a randomized, unique 128bit ID identifying each runtime cycle of the unit, formatted
+ as 32 character hexadecimal string. A new ID is assigned each time the unit changes from an inactive state into
+ an activating or active state, and may be used to identify this specific runtime cycle, in particular in data
+ stored offline, such as the journal. The same ID is passed to all processes run as part of the
+ unit.</para></listitem>
+ </varlistentry>
+
<varlistentry>
<term><varname>$XDG_RUNTIME_DIR</varname></term>
<varlistentry>
<term><varname>$MAINPID</varname></term>
- <listitem><para>The PID of the units main process if it is
+ <listitem><para>The PID of the unit's main process if it is
known. This is only set for control processes as invoked by
<varname>ExecReload=</varname> and similar. </para></listitem>
</varlistentry>
<citerefentry project='man-pages'><refentrytitle>termcap</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
</para></listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><varname>$JOURNAL_STREAM</varname></term>
+
+ <listitem><para>If the standard output or standard error output of the executed processes are connected to the
+ journal (for example, by setting <varname>StandardError=journal</varname>) <varname>$JOURNAL_STREAM</varname>
+ contains the device and inode numbers of the connection file descriptor, formatted in decimal, separated by a
+ colon (<literal>:</literal>). This permits invoked processes to safely detect whether their standard output or
+ standard error output are connected to the journal. The device and inode numbers of the file descriptors should
+ be compared with the values set in the environment variable to determine whether the process output is still
+ connected to the journal. Note that it is generally not sufficient to only check whether
+ <varname>$JOURNAL_STREAM</varname> is set at all as services might invoke external processes replacing their
+ standard output or standard error output, without unsetting the environment variable.</para>
+
+ <para>This environment variable is primarily useful to allow services to optionally upgrade their used log
+ protocol to the native journal protocol (using
+ <citerefentry><refentrytitle>sd_journal_print</refentrytitle><manvolnum>3</manvolnum></citerefentry> and other
+ functions) if their standard output or standard error output is connected to the journal anyway, thus enabling
+ delivery of structured metadata along with logged messages.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>$SERVICE_RESULT</varname></term>
+
+ <listitem><para>Only defined for the service unit type, this environment variable is passed to all
+ <varname>ExecStop=</varname> and <varname>ExecStopPost=</varname> processes, and encodes the service
+ "result". Currently, the following values are defined: <literal>protocol</literal> (in case of a protocol
+ violation; if a service did not take the steps required by its unit configuration), <literal>timeout</literal>
+ (in case of an operation timeout), <literal>exit-code</literal> (if a service process exited with a non-zero
+ exit code; see <varname>$EXIT_CODE</varname> below for the actual exit code returned), <literal>signal</literal>
+ (if a service process was terminated abnormally by a signal; see <varname>$EXIT_CODE</varname> below for the
+ actual signal used for the termination), <literal>core-dump</literal> (if a service process terminated
+ abnormally and dumped core), <literal>watchdog</literal> (if the watchdog keep-alive ping was enabled for the
+ service but it missed the deadline), or <literal>resources</literal> (a catch-all condition in case a system
+ operation failed).</para>
+
+ <para>This environment variable is useful to monitor failure or successful termination of a service. Even
+ though this variable is available in both <varname>ExecStop=</varname> and <varname>ExecStopPost=</varname>, it
+ is usually a better choice to place monitoring tools in the latter, as the former is only invoked for services
+ that managed to start up correctly, and the latter covers both services that failed during their start-up and
+ those which failed during their runtime.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>$EXIT_CODE</varname></term>
+ <term><varname>$EXIT_STATUS</varname></term>
+
+ <listitem><para>Only defined for the service unit type, these environment variables are passed to all
+ <varname>ExecStop=</varname>, <varname>ExecStopPost=</varname> processes and contain exit status/code
+ information of the main process of the service. For the precise definition of the exit code and status, see
+ <citerefentry><refentrytitle>wait</refentrytitle><manvolnum>2</manvolnum></citerefentry>. <varname>$EXIT_CODE</varname>
+ is one of <literal>exited</literal>, <literal>killed</literal>,
+ <literal>dumped</literal>. <varname>$EXIT_STATUS</varname> contains the numeric exit code formatted as string
+ if <varname>$EXIT_CODE</varname> is <literal>exited</literal>, and the signal name in all other cases. Note
+ that these environment variables are only set if the service manager succeeded to start and identify the main
+ process of the service.</para>
+
+ <table>
+ <title>Summary of possible service result variable values</title>
+ <tgroup cols='3'>
+ <colspec colname='result' />
+ <colspec colname='code' />
+ <colspec colname='status' />
+ <thead>
+ <row>
+ <entry><varname>$SERVICE_RESULT</varname></entry>
+ <entry><varname>$EXIT_CODE</varname></entry>
+ <entry><varname>$EXIT_STATUS</varname></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry morerows="1" valign="top"><literal>protocol</literal></entry>
+ <entry valign="top">not set</entry>
+ <entry>not set</entry>
+ </row>
+ <row>
+ <entry><literal>exited</literal></entry>
+ <entry><literal>0</literal></entry>
+ </row>
+
+ <row>
+ <entry morerows="1" valign="top"><literal>timeout</literal></entry>
+ <entry valign="top"><literal>killed</literal></entry>
+ <entry><literal>TERM</literal>, <literal>KILL</literal></entry>
+ </row>
+ <row>
+ <entry valign="top"><literal>exited</literal></entry>
+ <entry><literal>0</literal>, <literal>1</literal>, <literal>2</literal>, <literal
+ >3</literal>, …, <literal>255</literal></entry>
+ </row>
+
+ <row>
+ <entry valign="top"><literal>exit-code</literal></entry>
+ <entry valign="top"><literal>exited</literal></entry>
+ <entry><literal>0</literal>, <literal>1</literal>, <literal>2</literal>, <literal
+ >3</literal>, …, <literal>255</literal></entry>
+ </row>
+
+ <row>
+ <entry valign="top"><literal>signal</literal></entry>
+ <entry valign="top"><literal>killed</literal></entry>
+ <entry><literal>HUP</literal>, <literal>INT</literal>, <literal>KILL</literal>, …</entry>
+ </row>
+
+ <row>
+ <entry valign="top"><literal>core-dump</literal></entry>
+ <entry valign="top"><literal>dumped</literal></entry>
+ <entry><literal>ABRT</literal>, <literal>SEGV</literal>, <literal>QUIT</literal>, …</entry>
+ </row>
+
+ <row>
+ <entry morerows="2" valign="top"><literal>watchdog</literal></entry>
+ <entry><literal>dumped</literal></entry>
+ <entry><literal>ABRT</literal></entry>
+ </row>
+ <row>
+ <entry><literal>killed</literal></entry>
+ <entry><literal>TERM</literal>, <literal>KILL</literal></entry>
+ </row>
+ <row>
+ <entry><literal>exited</literal></entry>
+ <entry><literal>0</literal>, <literal>1</literal>, <literal>2</literal>, <literal
+ >3</literal>, …, <literal>255</literal></entry>
+ </row>
+
+ <row>
+ <entry><literal>resources</literal></entry>
+ <entry>any of the above</entry>
+ <entry>any of the above</entry>
+ </row>
+
+ <row>
+ <entry namest="results" nameend="code">Note: the process may be also terminated by a signal not sent by systemd. In particular the process may send an arbitrary signal to itself in a handler for any of the non-maskable signals. Nevertheless, in the <literal>timeout</literal> and <literal>watchdog</literal> rows above only the signals that systemd sends have been included.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ </listitem>
+ </varlistentry>
</variablelist>
<para>Additional variables may be configured by the following
<para>
<citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
+ <citerefentry><refentrytitle>systemd-analyze</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>journalctl</refentrytitle><manvolnum>8</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
</para>
</refsect1>
+
</refentry>