<xi:include href="system-only.xml" xpointer="singular"/></listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>RootImagePolicy=</varname></term>
+ <term><varname>MountImagePolicy=</varname></term>
+ <term><varname>ExtensionImagePolicy=</varname></term>
+
+ <listitem><para>Takes an image policy string as per
+ <citerefentry><refentrytitle>systemd.image-policy</refentrytitle><manvolnum>7</manvolnum></citerefentry>
+ to use when mounting the disk images (DDIs) specified in <varname>RootImage=</varname>,
+ <varname>MountImage=</varname> and <varname>ExtensionImage=</varname>, respectively. If not
+ specified, the following policy string is the default for <varname>RootImagePolicy=</varname> and
+ <varname>MountImagePolicy=</varname>:</para>
+
+ <programlisting>root=verity+signed+encrypted+unprotected+absent: \
+ usr=verity+signed+encrypted+unprotected+absent: \
+ home=encrypted+unprotected+absent: \
+ srv=encrypted+unprotected+absent: \
+ tmp=encrypted+unprotected+absent: \
+ var=encrypted+unprotected+absent</programlisting>
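+
+ <para>To illustrate how such a policy string is structured, here is a minimal Python sketch. It
+ assumes a simplified subset of the syntax: colon-separated <literal>PARTITION=FLAG+FLAG</literal>
+ clauses only, with the string above joined onto a single line. The helpers
+ <function>parse_image_policy()</function> and <function>is_allowed()</function> are hypothetical and
+ not part of systemd. The authoritative grammar, including special policies and partition flags, is
+ described in systemd.image-policy(7).</para>

```python
# Hypothetical helpers: a simplified reading of the image policy syntax.
# Only "partition=flag+flag:..." clauses are handled; special policies and
# partition flags described in systemd.image-policy(7) are not.


def parse_image_policy(policy: str) -> dict[str, frozenset[str]]:
    """Map each partition designator to its set of allowed protections."""
    rules = {}
    for clause in policy.split(":"):
        clause = clause.strip()
        if not clause:
            continue
        partition, sep, flags = clause.partition("=")
        if not sep:
            raise ValueError(f"malformed clause: {clause!r}")
        rules[partition] = frozenset(f for f in flags.split("+") if f)
    return rules


def is_allowed(rules: dict[str, frozenset[str]], partition: str, protection: str) -> bool:
    """Return True if the policy permits `partition` with `protection`.

    Partitions not named in the policy are treated as disallowed here,
    which is a simplification of the real semantics.
    """
    return protection in rules.get(partition, frozenset())


# The RootImagePolicy=/MountImagePolicy= default, joined into one string.
DEFAULT_ROOT_MOUNT_POLICY = (
    "root=verity+signed+encrypted+unprotected+absent:"
    "usr=verity+signed+encrypted+unprotected+absent:"
    "home=encrypted+unprotected+absent:"
    "srv=encrypted+unprotected+absent:"
    "tmp=encrypted+unprotected+absent:"
    "var=encrypted+unprotected+absent"
)

rules = parse_image_policy(DEFAULT_ROOT_MOUNT_POLICY)
print(is_allowed(rules, "root", "verity"))  # True
print(is_allowed(rules, "home", "verity"))  # False
```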
+
+ <para>The default policy for <varname>ExtensionImagePolicy=</varname> is:</para>
+
+ <programlisting>root=verity+signed+encrypted+unprotected+absent: \
+ usr=verity+signed+encrypted+unprotected+absent</programlisting></listitem>
+ </varlistentry>
+
<varlistentry>
<term><varname>MountAPIVFS=</varname></term>
<term><varname>Personality=</varname></term>
<listitem><para>Controls which kernel architecture <citerefentry
- project='man-pages'><refentrytitle>uname</refentrytitle><manvolnum>2</manvolnum></citerefentry> shall report,
- when invoked by unit processes. Takes one of the architecture identifiers <constant>x86</constant>,
- <constant>x86-64</constant>, <constant>arm64</constant>, <constant>arm64-be</constant>, <constant>arm</constant>,
- <constant>arm-be</constant>, <constant>alpha</constant>, <constant>arc</constant>, <constant>arc-be</constant>,
- <constant>cris</constant>, <constant>ia64</constant>, <constant>loongarch64</constant>, <constant>m68k</constant>,
- <constant>mips64-le</constant>, <constant>mips64</constant>, <constant>mips-le</constant>, <constant>mips</constant>,
- <constant>nios2</constant>, <constant>parisc64</constant>, <constant>parisc</constant>, <constant>ppc64-le</constant>,
- <constant>ppc64</constant>, <constant>ppc</constant>, <constant>ppc-le</constant>, <constant>riscv32</constant>,
- <constant>riscv64</constant>, <constant>s390x</constant>, <constant>s390</constant>, <constant>sh64</constant>,
- <constant>sh</constant>, <constant>sparc64</constant>, <constant>sparc</constant> or <constant>tilegx</constant>.
- Which personality architectures are supported depends on the system architecture. Usually the 64bit versions of the various
- system architectures support their immediate 32bit personality architecture counterpart, but no others. For
- example, <constant>x86-64</constant> systems support the <constant>x86-64</constant> and
- <constant>x86</constant> personalities but no others. The personality feature is useful when running 32-bit
- services on a 64-bit host system. If not specified, the personality is left unmodified and thus reflects the
- personality of the host system's kernel.</para></listitem>
+ project='man-pages'><refentrytitle>uname</refentrytitle><manvolnum>2</manvolnum></citerefentry> shall
+ report, when invoked by unit processes. Takes one of the architecture identifiers
+ <constant>arm64</constant>, <constant>arm64-be</constant>, <constant>arm</constant>,
+ <constant>arm-be</constant>, <constant>x86</constant>, <constant>x86-64</constant>,
+ <constant>ppc</constant>, <constant>ppc-le</constant>, <constant>ppc64</constant>,
+ <constant>ppc64-le</constant>, <constant>s390</constant> or <constant>s390x</constant>. Which
+ personality architectures are supported depends on the kernel's native architecture. Usually the
+ 64-bit versions of the various system architectures support their immediate 32-bit personality
+ architecture counterpart, but no others. For example, <constant>x86-64</constant> systems support the
+ <constant>x86-64</constant> and <constant>x86</constant> personalities but no others. The personality
+ feature is useful when running 32-bit services on a 64-bit host system. If not specified, the
+ personality is left unmodified and thus reflects the personality of the host system's kernel. This
+ option is not useful on architectures for which only one native word width was ever available, such
+ as <constant>m68k</constant> (32-bit only) or <constant>alpha</constant> (64-bit only).</para></listitem>
</varlistentry>
<varlistentry>
not available), and the unit should be written in a way that does not solely rely on this setting for
security.</para>
+ <para>When this option is enabled, <varname>PrivateMounts=</varname> is implied unless it is
+ explicitly disabled, and <filename>/sys</filename> will be remounted to associate it with the new
+ network namespace.</para>
+
<para>When this option is used on a socket unit any sockets bound on behalf of this unit will be
bound within a private network namespace. This may be combined with
<varname>JoinsNamespaceOf=</varname> to listen on sockets inside of network namespaces of other
<varname>NetworkNamespacePath=</varname> configured, as otherwise the network namespace of those
units is reused.</para>
+ <para>When this option is enabled, <varname>PrivateMounts=</varname> is implied unless it is
+ explicitly disabled, and <filename>/sys</filename> will be remounted to associate it with the new
+ network namespace.</para>
+
<para>When this option is used on a socket unit any sockets bound on behalf of this unit will be
bound within the specified network namespace.</para>
<varlistentry>
<term><varname>ProtectClock=</varname></term>
- <listitem><para>Takes a boolean argument. If set, writes to the hardware clock or system clock will be denied.
- It is recommended to turn this on for most services that do not need modify the clock. Defaults to off. Enabling
- this option removes <constant>CAP_SYS_TIME</constant> and <constant>CAP_WAKE_ALARM</constant> from the
- capability bounding set for this unit, installs a system call filter to block calls that can set the
- clock, and <varname>DeviceAllow=char-rtc r</varname> is implied. This ensures <filename>/dev/rtc0</filename>,
- <filename>/dev/rtc1</filename>, etc. are made read-only to the service. See
+ <listitem><para>Takes a boolean argument. If set, writes to the hardware clock or system clock will
+ be denied. Defaults to off. Enabling this option removes <constant>CAP_SYS_TIME</constant> and
+ <constant>CAP_WAKE_ALARM</constant> from the capability bounding set for this unit, installs a system
+ call filter to block calls that can set the clock, and <varname>DeviceAllow=char-rtc r</varname> is
+ implied. Note that the system calls are blocked altogether; the filter does not take into account
+ that some of the calls can be used to read the clock state with some parameter combinations.
+ Effectively, <filename>/dev/rtc0</filename>, <filename>/dev/rtc1</filename>, etc. are made read-only
+ to the service. See
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
- for the details about <varname>DeviceAllow=</varname>. If this setting is on, but the unit
- doesn't have the <constant>CAP_SYS_ADMIN</constant> capability (e.g. services for which
+ for the details about <varname>DeviceAllow=</varname>. If this setting is on, but the unit doesn't
+ have the <constant>CAP_SYS_ADMIN</constant> capability (e.g. services for which
<varname>User=</varname> is set), <varname>NoNewPrivileges=yes</varname> is implied.</para>
+ <para>It is recommended to turn this on for most services that do not need to modify the clock or
+ check its state.</para>
+
<xi:include href="system-or-user-ns.xml" xpointer="singular"/></listitem>
</varlistentry>
<listitem><para>Takes a boolean argument. If set, attempts to create memory mappings that are writable and
executable at the same time, or to change existing memory mappings to become executable, or mapping shared
- memory segments as executable, are prohibited. Specifically, a system call filter is added that rejects
- <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls with both
- <constant>PROT_EXEC</constant> and <constant>PROT_WRITE</constant> set,
+ memory segments as executable, are prohibited. Specifically, a system call filter is added (or
+ preferably, an equivalent kernel check is enabled with
+ <citerefentry><refentrytitle>prctl</refentrytitle><manvolnum>2</manvolnum></citerefentry>) that
+ rejects <citerefentry><refentrytitle>mmap</refentrytitle><manvolnum>2</manvolnum></citerefentry>
+ system calls with both <constant>PROT_EXEC</constant> and <constant>PROT_WRITE</constant> set,
<citerefentry><refentrytitle>mprotect</refentrytitle><manvolnum>2</manvolnum></citerefentry> or
<citerefentry><refentrytitle>pkey_mprotect</refentrytitle><manvolnum>2</manvolnum></citerefentry> system calls
with <constant>PROT_EXEC</constant> set and
<entry>@obsolete</entry>
<entry>Unusual, obsolete or unimplemented (<citerefentry project='man-pages'><refentrytitle>create_module</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>gtty</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry>
</row>
+ <row>
+ <entry>@pkey</entry>
+ <entry>System calls that deal with memory protection keys (<citerefentry project='man-pages'><refentrytitle>pkeys</refentrytitle><manvolnum>7</manvolnum></citerefentry>)</entry>
+ </row>
<row>
<entry>@privileged</entry>
<entry>All system calls which need super-user capabilities (<citerefentry project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry>)</entry>
<entry>@resources</entry>
<entry>System calls for changing resource limits, memory and scheduling parameters (<citerefentry project='man-pages'><refentrytitle>setrlimit</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setpriority</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry>
</row>
+ <row>
+ <entry>@sandbox</entry>
+ <entry>System calls for sandboxing programs (<citerefentry project='man-pages'><refentrytitle>seccomp</refentrytitle><manvolnum>2</manvolnum></citerefentry>, Landlock system calls, …)</entry>
+ </row>
<row>
<entry>@setuid</entry>
<entry>System calls for changing user ID and group ID credentials, (<citerefentry project='man-pages'><refentrytitle>setuid</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setgid</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setresuid</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry>
<term><varname>LogExtraFields=</varname></term>
<listitem><para>Configures additional log metadata fields to include in all log records generated by
- processes associated with this unit. This setting takes one or more journal field assignments in the
- format <literal>FIELD=VALUE</literal> separated by whitespace. See
+ processes associated with this unit, including messages that systemd itself logs about the unit. This
+ setting takes one or more journal field assignments in the format <literal>FIELD=VALUE</literal>
+ separated by whitespace. See
<citerefentry><refentrytitle>systemd.journal-fields</refentrytitle><manvolnum>7</manvolnum></citerefentry>
for details on the journal field concept. Even though the underlying journal implementation permits
binary field values, this setting accepts only valid UTF-8 values. To include space characters in a
<term><varname>LogRateLimitIntervalSec=</varname></term>
<term><varname>LogRateLimitBurst=</varname></term>
- <listitem><para>Configures the rate limiting that is applied to log messages generated by this
- unit. If, in the time interval defined by <varname>LogRateLimitIntervalSec=</varname>, more messages
- than specified in <varname>LogRateLimitBurst=</varname> are logged by a service, all further messages
+ <listitem><para>Configures the rate limiting that is applied to log messages generated by this unit.
+ If, in the time interval defined by <varname>LogRateLimitIntervalSec=</varname>, more messages than
+ specified in <varname>LogRateLimitBurst=</varname> are logged by a service, all further messages
within the interval are dropped until the interval is over. A message about the number of dropped
messages is generated. The time specification for <varname>LogRateLimitIntervalSec=</varname> may be
- specified in the following units: "s", "min", "h", "ms", "us" (see
+ specified in the following units: "s", "min", "h", "ms", "us". See
<citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry> for
- details). The default settings are set by <varname>RateLimitIntervalSec=</varname> and
+ details. The default settings are set by <varname>RateLimitIntervalSec=</varname> and
<varname>RateLimitBurst=</varname> configured in
- <citerefentry><refentrytitle>journald.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>. Note
- that this only applies to log messages that are processed by the logging subsystem, i.e. by
- <filename>systemd-journald.service</filename>. This means, if you connect a service's stderr directly
- to a file via <varname>StandardOutput=file:…</varname> or a similar setting the rate limiting will
- not be applied to messages written that way (but they will be enforced for messages generated via
- <function>syslog()</function> or similar).</para></listitem>
+ <citerefentry><refentrytitle>journald.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
+ Note that this only applies to log messages that are processed by the logging subsystem, i.e. by
+ <citerefentry><refentrytitle>systemd-journald.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>.
+ This means that if you connect a service's stderr directly to a file via
+ <varname>StandardOutput=file:…</varname> or a similar setting, the rate limiting will not be applied
+ to messages written that way (but it will be enforced for messages generated via
+ <citerefentry project='man-pages'><refentrytitle>syslog</refentrytitle><manvolnum>3</manvolnum></citerefentry>
+ and similar functions).</para></listitem>
</varlistentry>
<varlistentry>
authenticated credentials improves security as credentials are not stored in plaintext and only
authenticated and decrypted into plaintext the moment a service requiring them is started. Moreover,
credentials may be bound to the local hardware and installations, so that they cannot easily be
- analyzed offline, or be generated externally.</para>
+ analyzed offline, or be generated externally. When <varname>DevicePolicy=</varname> is set to
+ <literal>closed</literal> or <literal>strict</literal>, or set to <literal>auto</literal> and
+ <varname>DeviceAllow=</varname> is set, or <varname>PrivateDevices=</varname> is set, then this
+ setting adds <filename>/dev/tpmrm0</filename> with <constant>rw</constant> mode to
+ <varname>DeviceAllow=</varname>. See
+ <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
+ for the details about <varname>DevicePolicy=</varname> or <varname>DeviceAllow=</varname>.</para>
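+ <para>The device rule above combines three conditions, which can be restated as a small decision
+ function. This is a hypothetical Python sketch that only models the documented condition, assuming the
+ credential setting described here is in use; <function>implied_device_allow()</function> is an
+ illustrative name, not a systemd interface, and <varname>DeviceAllow=</varname> entries are
+ represented as plain strings.</para>

```python
def implied_device_allow(
    device_policy: str, device_allow: list[str], private_devices: bool
) -> list[str]:
    """Return DeviceAllow= entries after applying the TPM device rule.

    Models the documented condition only: when device access is restricted
    (DevicePolicy=closed or strict, DevicePolicy=auto with a non-empty
    DeviceAllow=, or PrivateDevices=yes), /dev/tpmrm0 is added with
    read-write access.
    """
    restricted = (
        device_policy in ("closed", "strict")
        or (device_policy == "auto" and len(device_allow) > 0)
        or private_devices
    )
    entries = list(device_allow)
    if restricted:
        entries.append("/dev/tpmrm0 rw")
    return entries


print(implied_device_allow("auto", [], False))               # []
print(implied_device_allow("auto", ["/dev/null r"], False))  # ['/dev/null r', '/dev/tpmrm0 rw']
```

+ <para>Under <literal>auto</literal> with no <varname>DeviceAllow=</varname> entries, device access is
+ not restricted, so nothing needs to be added.</para>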
<para>The credential files/IPC sockets must be accessible to the service manager, but don't have to
be directly accessible to the unit's processes: the credential data is read and copied into separate,
<varlistentry>
<term><varname>$NOTIFY_SOCKET</varname></term>
- <listitem><para>The socket
- <function>sd_notify()</function> talks to. See
+ <listitem><para>The socket <function>sd_notify()</function> talks to. See
<citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>.
</para></listitem>
</varlistentry>
system.</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>$REMOTE_ADDR</varname></term>
+ <term><varname>$REMOTE_PORT</varname></term>
+
+ <listitem><para>If this is a unit started via per-connection socket activation (i.e. via a socket
+ unit with <varname>Accept=yes</varname>), these environment variables contain the IP address and
+ port number of the remote peer of the socket connection.</para></listitem>
+ </varlistentry>
+
<varlistentry>
<term><varname>$TRIGGER_UNIT</varname></term>
<term><varname>$TRIGGER_PATH</varname></term>
</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>$MEMORY_PRESSURE_WATCH</varname></term>
+ <term><varname>$MEMORY_PRESSURE_WRITE</varname></term>
+
+ <listitem><para>If memory pressure monitoring is enabled for this service unit, the path to watch
+ and the data to write into it. See <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure
+ Handling</ulink> for details about these variables and the service protocol data they
+ convey.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>$FDSTORE</varname></term>
+
+ <listitem><para>If the file descriptor store is enabled for a service
+ (i.e. <varname>FileDescriptorStoreMax=</varname> is set to a non-zero value, see
+ <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>
+ for details), this environment variable will be set to the maximum number of permitted entries, as
+ per the setting. Applications may check this environment variable before sending file descriptors
+ to the service manager via <function>sd_pid_notify_with_fds()</function> (see
+ <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry> for
+ details).</para></listitem>
+ </varlistentry>
+
</variablelist>
<para>For system services, when <varname>PAMName=</varname> is enabled and <command>pam_systemd</command> is part