X-Git-Url: http://git.ipfire.org/?p=thirdparty%2Fsystemd.git;a=blobdiff_plain;f=man%2Fsystemd.exec.xml;h=c339f3b88582737c0eb25bdc58ff884b64934ec2;hp=8f57cc8bfb964fe31a7d34063f9acac2efe98c34;hb=a6991726f80c299ac7275f4570e310e1dd5bce96;hpb=b3d15d90c0ea163ddea1de82cc8e6f2f1aaefa4b diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml index 8f57cc8bfb9..c339f3b8858 100644 --- a/man/systemd.exec.xml +++ b/man/systemd.exec.xml @@ -145,6 +145,70 @@ + + RootImageOptions= + + Takes a comma-separated list of mount options that will be used on disk images specified by + RootImage=. Optionally a partition number can be prefixed, followed by colon, in + case the image has multiple partitions, otherwise partition number 0 is implied. + Options for multiple partitions can be specified in a single line with space separators. Assigning an empty + string removes previous assignments. For a list of valid mount options, please refer to + mount8. + + + + + + RootHash= + + Takes a data integrity (dm-verity) root hash specified in hexadecimal, or the path to a file + containing a root hash in ASCII hexadecimal format. This option enables data integrity checks using dm-verity, + if the used image contains the appropriate integrity data (see above) or if RootVerity= is used. + The specified hash must match the root hash of integrity data, and is usually at least 256 bits (and hence 64 + formatted hexadecimal characters) long (in case of SHA256 for example). If this option is not specified, but + the image file carries the user.verity.roothash extended file attribute (see xattr7), then the root + hash is read from it, also as formatted hexadecimal characters. 
If the extended file attribute is not found (or + is not supported by the underlying file system), but a file with the .roothash suffix is + found next to the image file, bearing otherwise the same name (except if the image has the + .raw suffix, in which case the root hash file must not have it in its name), the root hash + is read from it and automatically used, also as formatted hexadecimal characters. + + + + + + RootHashSignature= + + Takes a PKCS7 formatted binary signature of the RootHash= option as a path + to a DER encoded signature file or as an ASCII base64 string encoding of the DER encoded signature, prefixed + by base64:. The dm-verity volume will only be opened if the signature of the root hash + is valid and created by a public key present in the kernel keyring. If this option is not specified, + but a file with the .roothash.p7s suffix is found next to the image file, bearing otherwise + the same name (except if the image has the .raw suffix, in which case the signature file + must not have it in its name), the signature is read from it and automatically used. + + + + + + RootVerity= + + Takes the path to a data integrity (dm-verity) file. This option enables data integrity checks + using dm-verity, if RootImage= is used and a root-hash is passed and if the used image itself + does not contain the integrity data. The integrity data must be matched by the root hash. If this option is not + specified, but a file with the .verity suffix is found next to the image file, bearing otherwise + the same name (except if the image has the .raw suffix, in which case the verity data file must + not have it in its name), the verity data is read from it and automatically used. + + This option is supported only for disk images that contain a single file system, without an + enveloping partition table.
Images that contain a GPT partition table should instead include both + root file system and matching Verity data in the same image, implementing the Discoverable Partition Specification. + + + + MountAPIVFS= @@ -197,6 +261,42 @@ + + MountImages= + + This setting is similar to RootImage= in that it mounts a file + system hierarchy from a block device node or loopback file, but the destination directory can be + specified as well as mount options. This option expects a whitespace separated list of mount + definitions. Each definition consists of a colon-separated tuple of source path and destination + directory. Each mount definition may be prefixed with -, in which case it will be + ignored when its source path does not exist. The source argument is a path to a block device node or + regular file. If source or destination contain a :, it needs to be escaped as + \:. + The device node or file system image file needs to follow the same rules as specified + for RootImage=. Any mounts created with this option are specific to the unit, and + are not visible in the host's mount table. + + These settings may be used more than once, each usage appends to the unit's list of mount + paths. If the empty string is assigned, the entire list of mount paths defined prior to this is + reset. + + Note that the destination directory must exist or systemd must be able to create it. Thus, it + is not possible to use those options for mount points nested underneath paths specified in + InaccessiblePaths=, or under /home/ and other protected + directories if ProtectHome=yes is specified. + + When DevicePolicy= is set to closed or + strict, or set to auto and DeviceAllow= is + set, then this setting adds /dev/loop-control with rw mode, + block-loop and block-blkext with rwm mode + to DeviceAllow=. See + systemd.resource-control5 + for the details about DevicePolicy= or DeviceAllow=. Also, see + PrivateDevices= below, as it may change the setting of + DevicePolicy=. 
+ + + @@ -281,7 +381,7 @@ files or directories. Moreover ProtectSystem=strict and ProtectHome=read-only are implied, thus prohibiting the service to write to arbitrary file system locations. In order to allow the service to write to certain directories, they - have to be whitelisted using ReadWritePaths=, but care must be taken so that + have to be allow-listed using ReadWritePaths=, but care must be taken so that UID/GID recycling doesn't create security issues involving files created by the service. Use RuntimeDirectory= (see below) in order to assign a writable runtime directory to a service, owned by the dynamic user/group and removed automatically when the unit is terminated. Use @@ -460,10 +560,11 @@ CapabilityBoundingSet=~CAP_B CAP_C AppArmorProfile= - Takes a profile name as argument. The process executed by the unit will switch to this profile - when started. Profiles must already be loaded in the kernel, or the unit will fail. This result in a non - operation if AppArmor is not enabled. If prefixed by -, all errors will be ignored. This - does not affect commands prefixed with +. + Takes a profile name as argument. The process executed by the unit will switch to + this profile when started. Profiles must already be loaded in the kernel, or the unit will fail. If + prefixed by -, all errors will be ignored. This setting has no effect if AppArmor + is not enabled. This setting does not affect commands prefixed with +. + @@ -681,9 +782,9 @@ CapabilityBoundingSet=~CAP_B CAP_C kernel default of private-anonymous shared-anonymous elf-headers private-huge). See - core5 for the - meaning of the mapping types. When specified multiple times, all specified masks are ORed. When not - set, or if the empty value is assigned, the inherited value is not changed. + core5 + for the meaning of the mapping types. When specified multiple times, all specified masks are + ORed. When not set, or if the empty value is assigned, the inherited value is not changed.
Add DAX pages to the dump filter @@ -829,7 +930,7 @@ CapabilityBoundingSet=~CAP_B CAP_C in NUMAMask=. For more details on each policy please see, set_mempolicy2. For overall overview of NUMA support in Linux see, - numa7 + numa7. @@ -1016,14 +1117,16 @@ CapabilityBoundingSet=~CAP_B CAP_C RootDirectory= or RootImage= these paths always reside on the host and are mounted from there into the unit's file system namespace. - If DynamicUser= is used in conjunction with StateDirectory=, - CacheDirectory= and LogsDirectory= is slightly altered: the directories - are created below /var/lib/private, /var/cache/private and + If DynamicUser= is used in conjunction with + StateDirectory=, the logic for CacheDirectory= and + LogsDirectory= is slightly altered: the directories are created below + /var/lib/private, /var/cache/private and /var/log/private, respectively, which are host directories made inaccessible to - unprivileged users, which ensures that access to these directories cannot be gained through dynamic user ID - recycling. Symbolic links are created to hide this difference in behaviour. Both from perspective of the host - and from inside the unit, the relevant directories hence always appear directly below - /var/lib, /var/cache and /var/log. + unprivileged users, which ensures that access to these directories cannot be gained through dynamic + user ID recycling. Symbolic links are created to hide this difference in behaviour. Both from + perspective of the host and from inside the unit, the relevant directories hence always appear + directly below /var/lib, /var/cache and + /var/log. Use RuntimeDirectory= to manage one or more runtime directories for the unit and bind their lifetime to the daemon runtime. This is particularly useful for unprivileged daemons that cannot create @@ -1098,8 +1201,8 @@ StateDirectory=aaa/bbb ccc clean …, see systemctl1 for details. Takes the usual time values and defaults to infinity, i.e. by default - no time-out is applied. 
If a time-out is configured the clean operation will be aborted forcibly when - the time-out is reached, potentially leaving resources on disk. + no timeout is applied. If a timeout is configured the clean operation will be aborted forcibly when + the timeout is reached, potentially leaving resources on disk. @@ -1113,12 +1216,13 @@ StateDirectory=aaa/bbb ccc contain symlinks, they are resolved relative to the root directory set with RootDirectory=/RootImage=. - Paths listed in ReadWritePaths= are accessible from within the namespace with the same - access modes as from outside of it. Paths listed in ReadOnlyPaths= are accessible for - reading only, writing will be refused even if the usual file access controls would permit this. Nest - ReadWritePaths= inside of ReadOnlyPaths= in order to provide writable - subdirectories within read-only directories. Use ReadWritePaths= in order to whitelist - specific paths for write access if ProtectSystem=strict is used. + Paths listed in ReadWritePaths= are accessible from within the namespace + with the same access modes as from outside of it. Paths listed in ReadOnlyPaths= + are accessible for reading only, writing will be refused even if the usual file access controls would + permit this. Nest ReadWritePaths= inside of ReadOnlyPaths= in + order to provide writable subdirectories within read-only directories. Use + ReadWritePaths= in order to allow-list specific paths for write access if + ProtectSystem=strict is used. Paths listed in InaccessiblePaths= will be made inaccessible for processes inside the namespace along with everything below them in the file system hierarchy. This may be more restrictive than @@ -1186,8 +1290,8 @@ BindReadOnlyPaths=/var/lib/systemd PrivateTmp= Takes a boolean argument. If true, sets up a new file system namespace for the executed - processes and mounts private /tmp and /var/tmp directories inside it - that is not shared by processes outside of the namespace. 
This is useful to secure access to temporary files of + processes and mounts private /tmp/ and /var/tmp/ directories inside it + that are not shared by processes outside of the namespace. This is useful to secure access to temporary files of the process, but makes sharing between processes via /tmp or /var/tmp impossible. If this is enabled, all temporary files created by a service in these directories will be removed after the service is stopped. Defaults to false. It is possible to run two or more units within the same @@ -1347,7 +1451,7 @@ BindReadOnlyPaths=/var/lib/systemd this option removes CAP_SYS_TIME and CAP_WAKE_ALARM from the capability bounding set for this unit, installs a system call filter to block calls that can set the clock, and DeviceAllow=char-rtc r is implied. This ensures /dev/rtc0, - /dev/rtc1, etc are made read only to the service. See + /dev/rtc1, etc. are made read-only to the service. See systemd.resource-control5 for the details about DeviceAllow=. @@ -1432,29 +1536,31 @@ BindReadOnlyPaths=/var/lib/systemd RestrictAddressFamilies= - Restricts the set of socket address families accessible to the processes of this unit. Takes a - space-separated list of address family names to whitelist, such as AF_UNIX, - AF_INET or AF_INET6. When prefixed with ~ the - listed address families will be applied as blacklist, otherwise as whitelist. Note that this restricts access - to the socket2 system call - only. Sockets passed into the process by other means (for example, by using socket activation with socket - units, see systemd.socket5) - are unaffected. Also, sockets created with socketpair() (which creates connected AF_UNIX - sockets only) are unaffected. Note that this option has no effect on 32-bit x86, s390, s390x, mips, mips-le, - ppc, ppc-le, pcc64, ppc64-le and is ignored (but works correctly on other ABIs, including x86-64). 
Note that on - systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off alternative ABIs for - services, so that they cannot be used to circumvent the restrictions of this option. Specifically, it is - recommended to combine this option with SystemCallArchitectures=native or similar. If - running in user mode, or in system mode, but without the CAP_SYS_ADMIN capability - (e.g. setting User=nobody), NoNewPrivileges=yes is implied. By default, - no restrictions apply, all address families are accessible to processes. If assigned the empty string, any - previous address family restriction changes are undone. This setting does not affect commands prefixed with - +. + Restricts the set of socket address families accessible to the processes of this + unit. Takes a space-separated list of address family names to allow-list, such as + AF_UNIX, AF_INET or AF_INET6. When + prefixed with ~ the listed address families will be applied as deny list, + otherwise as allow list. Note that this restricts access to the socket2 + system call only. Sockets passed into the process by other means (for example, by using socket + activation with socket units, see + systemd.socket5) + are unaffected. Also, sockets created with socketpair() (which creates connected + AF_UNIX sockets only) are unaffected. Note that this option has no effect on 32-bit x86, s390, s390x, + mips, mips-le, ppc, ppc-le, ppc64, ppc64-le and is ignored (but works correctly on other ABIs, + including x86-64). Note that on systems supporting multiple ABIs (such as x86/x86-64) it is + recommended to turn off alternative ABIs for services, so that they cannot be used to circumvent the + restrictions of this option. Specifically, it is recommended to combine this option with + SystemCallArchitectures=native or similar. If running in user mode, or in system + mode, but without the CAP_SYS_ADMIN capability (e.g. setting + User=nobody), NoNewPrivileges=yes is implied. 
By default, no + restrictions apply, all address families are accessible to processes. If assigned the empty string, + any previous address family restriction changes are undone. This setting does not affect commands + prefixed with +. Use this option to limit exposure of processes to remote access, in particular via exotic and sensitive network protocols, such as AF_PACKET. Note that in most cases, the local - AF_UNIX address family should be included in the configured whitelist as it is frequently + AF_UNIX address family should be included in the configured allow list as it is frequently used for local communication, including for syslog2 logging. @@ -1472,9 +1578,9 @@ BindReadOnlyPaths=/var/lib/systemd any combination of: cgroup, ipc, net, mnt, pid, user and uts. Any namespace type listed is made accessible to the unit's processes, access to namespace types not listed is - prohibited (whitelisting). By prepending the list with a single tilde character (~) the + prohibited (allow-listing). By prepending the list with a single tilde character (~) the effect may be inverted: only the listed namespace types will be made inaccessible, all unlisted ones are - permitted (blacklisting). If the empty string is assigned, the default namespace restrictions are applied, + permitted (deny-listing). If the empty string is assigned, the default namespace restrictions are applied, which is equivalent to false. This option may appear more than once, in which case the namespace types are merged by OR, or by AND if the lines are prefixed with ~ (see examples below). Internally, this setting limits access to the @@ -1642,7 +1748,7 @@ RestrictNamespaces=~cgroup net mount propagation is used, but — as mentioned — as is applied first, propagation from the unit's processes to the host is still turned off. 
- It is not recommended to to use mount propagation for units, as this means + It is not recommended to use mount propagation for units, as this means temporary mounts (such as removable media) of the host will stay mounted and thus indefinitely busy in forked off processes, as unmount propagation events won't be received by the file system namespace of the unit. @@ -1664,15 +1770,15 @@ RestrictNamespaces=~cgroup net Takes a space-separated list of system call names. If this setting is used, all system calls executed by the unit processes except for the listed ones will result in immediate - process termination with the SIGSYS signal (whitelisting). (See + process termination with the SIGSYS signal (allow-listing). (See SystemCallErrorNumber= below for changing the default action). If the first character of the list is ~, the effect is inverted: only the listed system calls - will result in immediate process termination (blacklisting). Blacklisted system calls and system call + will result in immediate process termination (deny-listing). Deny-listed system calls and system call groups may optionally be suffixed with a colon (:) and errno error number (between 0 and 4095) or errno name such as EPERM, EACCES or EUCLEAN (see errno3 for a - full list). This value will be returned when a blacklisted system call is triggered, instead of + full list). This value will be returned when a deny-listed system call is triggered, instead of terminating the processes immediately. This value takes precedence over the one given in SystemCallErrorNumber=, see below. If running in user mode, or in system mode, but without the CAP_SYS_ADMIN capability (e.g. setting @@ -1681,7 +1787,7 @@ RestrictNamespaces=~cgroup net for enforcing a minimal sandboxing environment. 
Note that the execve, exit, exit_group, getrlimit, rt_sigreturn, sigreturn system calls and the system calls - for querying time and sleeping are implicitly whitelisted and do not need to be listed + for querying time and sleeping are implicitly allow-listed and do not need to be listed explicitly. This option may be specified more than once, in which case the filter masks are merged. If the empty string is assigned, the filter is reset, all prior assignments will have no effect. This does not affect commands prefixed with +. @@ -1699,12 +1805,13 @@ RestrictNamespaces=~cgroup net might be necessary to temporarily disable system call filters in order to simplify debugging of such failures. - If you specify both types of this option (i.e. whitelisting and blacklisting), the first encountered - will take precedence and will dictate the default action (termination or approval of a system call). Then the - next occurrences of this option will add or delete the listed system calls from the set of the filtered system - calls, depending of its type and the default action. (For example, if you have started with a whitelisting of - read and write, and right after it add a blacklisting of - write, then write will be removed from the set.) + If you specify both types of this option (i.e. allow-listing and deny-listing), the first + encountered will take precedence and will dictate the default action (termination or approval of a + system call). Then the next occurrences of this option will add or delete the listed system calls + from the set of the filtered system calls, depending on its type and the default action. (For + example, if you have started with an allow list rule for read and + write, and right after it add a deny list rule for write, + then write will be removed from the set.) As the number of possible system calls is large, predefined sets of system calls are provided. A set starts with @ character, followed by name of the set.
@@ -1748,7 +1855,7 @@ RestrictNamespaces=~cgroup net @file-system - File system operations: opening, creating files and directories for read and write, renaming and removing them, reading file properties, or creating hard and symbolic links. + File system operations: opening, creating files and directories for read and write, renaming and removing them, reading file properties, or creating hard and symbolic links @io-event @@ -1764,7 +1871,7 @@ RestrictNamespaces=~cgroup net @memlock - Locking of memory into RAM (mlock2, mlockall2 and related calls) + Locking of memory in RAM (mlock2, mlockall2 and related calls) @module @@ -1788,7 +1895,7 @@ RestrictNamespaces=~cgroup net @process - Process control, execution, namespaceing operations (clone2, kill2, namespaces7, … + Process control, execution, namespacing operations (clone2, kill2, namespaces7, …) @raw-io @@ -1816,11 +1923,11 @@ RestrictNamespaces=~cgroup net @sync - Synchronizing files and memory to disk: (fsync2, msync2, and related calls) + Synchronizing files and memory to disk (fsync2, msync2, and related calls) @system-service - A reasonable set of system calls used by common system services, excluding any special purpose calls. This is the recommended starting point for whitelisting system calls for system services, as it contains what is typically needed by system services, but excludes overly specific interfaces. For example, the following APIs are excluded: @clock, @mount, @swap, @reboot. + A reasonable set of system calls used by common system services, excluding any special purpose calls. This is the recommended starting point for allow-listing system calls for system services, as it contains what is typically needed by system services, but excludes overly specific interfaces. For example, the following APIs are excluded: @clock, @mount, @swap, @reboot. @timer @@ -1836,9 +1943,10 @@ RestrictNamespaces=~cgroup net systemd-analyze syscall-filter to list the actual list of system calls in each filter.
- Generally, whitelisting system calls (rather than blacklisting) is the safer mode of operation. It is - recommended to enforce system call whitelists for all long-running system services. Specifically, the - following lines are a relatively safe basic choice for the majority of system services: + Generally, allow-listing system calls (rather than deny-listing) is the safer mode of + operation. It is recommended to enforce system call allow lists for all long-running system + services. Specifically, the following lines are a relatively safe basic choice for the majority of + system services: [Service] SystemCallFilter=@system-service @@ -1849,9 +1957,9 @@ SystemCallErrorNumber=EPERM call may be used to execute operations similar to what can be done with the older kill() system call, hence blocking the latter without the former only provides weak protection. Since new system calls are added regularly to the kernel as development progresses, - keeping system call blacklists comprehensive requires constant work. It is thus recommended to use - whitelisting instead, which offers the benefit that new system calls are by default implicitly - blocked until the whitelist is updated. + keeping system call deny lists comprehensive requires constant work. It is thus recommended to use + allow-listing instead, which offers the benefit that new system calls are by default implicitly + blocked until the allow list is updated. Also note that a number of system calls are required to be accessible for the dynamic linker to work. The dynamic linker is required for running most regular programs (specifically: all dynamic ELF @@ -1893,7 +2001,7 @@ SystemCallErrorNumber=EPERM manager is compiled for). If running in user mode, or in system mode, but without the CAP_SYS_ADMIN capability (e.g. setting User=nobody), NoNewPrivileges=yes is implied. By default, this option is set to the empty list, i.e. no - system call architecture filtering is applied. + filtering is applied. 
If this setting is used, processes of this unit will only be permitted to call native system calls, and system calls of the specified architectures. For the purposes of this option, the x32 architecture is treated @@ -2104,13 +2212,7 @@ SystemCallErrorNumber=EPERM systemd.socket5 for more details about named file descriptors and their ordering. - This setting defaults to . - - Note that services which specify and use - StandardInput= or StandardOutput= with - //, should specify - , to make sure that the tty initialization is - finished before they start. + This setting defaults to . @@ -2157,8 +2259,9 @@ SystemCallErrorNumber=EPERM AF_UNIX socket in the file system, as in that case only a single stream connection is created for both input and output. - is similar to above, but it opens the file in append mode. + is similar to + above, but it opens the file in append mode. + connects standard output to a socket acquired via socket activation. The semantics are similar to the same option of StandardInput=, see above. @@ -2324,7 +2427,9 @@ StandardInputData=SWNrIHNpdHplIGRhIHVuJyBlc3NlIEtsb3BzLAp1ZmYgZWVtYWwga2xvcHAncy so that they are automatically established prior to the unit starting up. Note that when this option is used log output of this service does not appear in the regular journalctl1 - output, unless the option is used. + output, unless the option is used. + + @@ -2495,7 +2600,7 @@ StandardInputData=SWNrIHNpdHplIGRhIHVuJyBlc3NlIEtsb3BzLAp1ZmYgZWVtYWwga2xvcHAncy UnsetEnvironment= are removed again from the compiled environment variable list, immediately before it is passed to the executed process. 
- The following select environment variables are set or propagated by the service manager for each invoked + The following environment variables are set or propagated by the service manager for each invoked process: @@ -2566,7 +2671,7 @@ StandardInputData=SWNrIHNpdHplIGRhIHVuJyBlc3NlIEtsb3BzLAp1ZmYgZWVtYWwga2xvcHAncy $LOGS_DIRECTORY $CONFIGURATION_DIRECTORY - Contains and absolute paths to the directories defined with + Absolute paths to the directories defined with RuntimeDirectory=, StateDirectory=, CacheDirectory=, LogsDirectory=, and ConfigurationDirectory= when those settings are used. @@ -3172,7 +3277,7 @@ StandardInputData=SWNrIHNpdHplIGRhIHVuJyBlc3NlIEtsb3BzLAp1ZmYgZWVtYWwga2xvcHAncy 242 EXIT_NUMA_POLICY - Failed to set up unit's NUMA memory policy. See NUMAPolicy= and NUMAMask=above. + Failed to set up unit's NUMA memory policy. See NUMAPolicy= and NUMAMask= above.