From: Zbigniew Jędrzejewski-Szmek
Date: Tue, 6 May 2025 19:04:00 +0000 (+0200)
Subject: man/systemd.exec: reword description of SystemCallFilter=
X-Git-Tag: v258-rc1~675^2~1
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=802d23fcfbcacd3c33f421e0fb1bd372658beeef;p=thirdparty%2Fsystemd.git

man/systemd.exec: reword description of SystemCallFilter=

The existing text grew organically as features were added and was not very organized. Reorder it and break into paragraphs grouped by topic. The description of the :errno syntax is replaced by a short reference to the SystemCallErrorNumber= setting. This makes the text shorter and makes it easier to explain how the two settings combine.
---

diff --git a/man/systemd.exec.xml b/man/systemd.exec.xml
index c57fd2e5d3b..02b83a060f5 100644
--- a/man/systemd.exec.xml
+++ b/man/systemd.exec.xml
@@ -2589,40 +2589,42 @@ RestrictNamespaces=~cgroup net
  SystemCallFilter=
- Takes a space-separated list of system call names. If this setting is used, all
- system calls executed by the unit processes except for the listed ones will result in immediate
- process termination with the SIGSYS signal (allow-listing). (See
- SystemCallErrorNumber= below for changing the default action). If the first
- character of the list is ~, the effect is inverted: only the listed system calls
- will result in immediate process termination (deny-listing). Deny-listed system calls and system call
- groups may optionally be suffixed with a colon (:) and errno
- error number (between 0 and 4095) or errno name such as EPERM,
- EACCES or EUCLEAN (see errno(3) for a
- full list). This value will be returned when a deny-listed system call is triggered, instead of
- terminating the processes immediately. Special setting kill can be used to
- explicitly specify killing. This value takes precedence over the one given in
- SystemCallErrorNumber=, see below.
- This feature makes use of the Secure Computing Mode 2
- interfaces of the kernel ('seccomp filtering') and is useful for enforcing a minimal sandboxing environment.
- Note that the execve(), exit(), exit_group(),
- getrlimit(), rt_sigreturn(), sigreturn()
- system calls and the system calls for querying time and sleeping are implicitly allow-listed and do not
- need to be listed explicitly. This option may be specified more than once, in which case the filter masks are
+ Takes a space-separated list of system call names or system call groups. If this
+ setting is used, system calls executed by the unit processes except for the listed ones will result
+ in the system call being denied (allow-listing). If the first character of the list is
+ ~, the effect is inverted: only the listed system calls will be denied
+ (deny-listing). This option may be specified more than once, in which case the filter masks are
  merged. If the empty string is assigned, the filter is reset, all prior assignments will have no
- effect. This does not affect commands prefixed with +.
-
- Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off
- alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
+ effect.
+
+ Commands prefixed with + are not subject to filtering. The
+ execve(), exit(), exit_group(),
+ getrlimit(), rt_sigreturn(),
+ sigreturn() system calls and the system calls for querying time and sleeping are
+ implicitly allow-listed and do not need to be listed explicitly.
+
+ The default action when a system call is denied is to terminate the processes with a
+ SIGSYS signal. This can be changed using SystemCallErrorNumber=,
+ see below. In addition, deny-listed system calls and system call groups may optionally be suffixed
+ with a colon (:) and an argument in the same format as
+ SystemCallErrorNumber=, to take this action when the matching system call is made.
+ This takes precedence over the action specified in SystemCallErrorNumber=.
+
+ This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp
+ filtering') and is useful for enforcing a minimal sandboxing environment.
+
+ Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn
+ off alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
  option. Specifically, it is recommended to combine this option with SystemCallArchitectures=native or similar.
- Note that strict system call filters may impact execution and error handling code paths of the service
- invocation. Specifically, access to the execve() system call is required for the execution
- of the service binary — if it is blocked service invocation will necessarily fail. Also, if execution of the
- service binary fails for some reason (for example: missing service executable), the error handling logic might
- require access to an additional set of system calls in order to process and log this failure correctly. It
- might be necessary to temporarily disable system call filters in order to simplify debugging of such
- failures.
+ Note that strict system call filters may impact execution and error handling code paths of the
+ service invocation. Specifically, access to the execve() system call is required
+ for the execution of the service binary: if it is blocked, service invocation will necessarily fail.
+ Also, if execution of the service binary fails for some reason (for example: missing service
+ executable), the error handling logic might require access to an additional set of system calls in
+ order to process and log this failure correctly. It might be necessary to temporarily disable system
+ call filters in order to allow debugging of such failures.

  If you specify both types of this option (i.e. allow-listing and deny-listing), the first encountered will take precedence and will dictate the default action (termination or approval of a
@@ -2632,8 +2634,8 @@ RestrictNamespaces=~cgroup net
  write(), and right after it add a deny list rule for write(), then write() will be removed from the set.)
- As the number of possible system calls is large, predefined sets of system calls are provided. A set
- starts with @ character, followed by name of the set.
+ As the number of possible system calls is large, predefined groups of system calls are
+ provided. A group starts with the @ character, followed by the name of the group.

  Currently predefined system call sets
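The reworded text describes several rules for how SystemCallFilter= assignments combine. The first assignment fixes allow- versus deny-listing. A later assignment of the opposite type removes names from the set. An empty assignment resets the filter. A per-call `:errno`/`:kill` suffix on a deny-listed call takes precedence over SystemCallErrorNumber=. The small Python model below traces these rules.

It is a hypothetical illustration, not systemd's implementation, and it makes several simplifying assumptions:

- The class `SyscallFilterModel` and its methods are invented for this sketch.
- `@group` names are treated as opaque tokens and are not expanded.
- The time and sleep calls that are also implicitly allowed are omitted.
- The implicit allow-list is assumed to matter only in allow-listing mode.

```python
from dataclasses import dataclass, field
from typing import Optional

# Calls the man page describes as implicitly allow-listed.
# Assumption: the time/sleep calls that are also implicit are omitted here.
IMPLICIT_ALLOW = {"execve", "exit", "exit_group", "getrlimit", "rt_sigreturn", "sigreturn"}


@dataclass
class SyscallFilterModel:
    """Hypothetical toy model of combined SystemCallFilter= assignments (not systemd code)."""

    # None until the first non-empty assignment; True = allow-listing, False = deny-listing.
    allow_list: Optional[bool] = None
    # Names in the filter set, mapped to an optional per-call action from a
    # ":errno" / ":kill" suffix (None means "use the default deny action").
    entries: dict = field(default_factory=dict)

    def assign(self, value: str) -> None:
        if value == "":
            # An empty assignment resets the filter; prior assignments have no effect.
            self.allow_list = None
            self.entries.clear()
            return
        deny = value.startswith("~")
        items = value[1:].split() if deny else value.split()
        if self.allow_list is None:
            # The first assignment dictates the default action.
            self.allow_list = not deny
        same_type = self.allow_list == (not deny)
        for item in items:
            name, _, action = item.partition(":")
            if same_type:
                # Same type as the first assignment: add the name to the set.
                self.entries[name] = action or None
            else:
                # Opposite type: remove the name from the set.
                self.entries.pop(name, None)

    def decide(self, syscall: str, error_number: Optional[str] = None) -> str:
        """Return "allow", "kill", or an errno name for a single system call.

        error_number stands in for SystemCallErrorNumber=; None means the
        default action (terminate with SIGSYS), reported here as "kill".
        """
        default_deny = error_number or "kill"
        if self.allow_list is None:
            return "allow"  # no filter configured
        if self.allow_list:
            if syscall in self.entries or syscall in IMPLICIT_ALLOW:
                return "allow"
            return default_deny
        if syscall in self.entries:
            # A per-call suffix takes precedence over SystemCallErrorNumber=.
            return self.entries[syscall] or default_deny
        return "allow"
```

For example, assigning `read write` followed by `~write` leaves only `read` (plus the implicit calls) allowed. This mirrors the `write()` example in the text. In the real system, systemd compiles the filter into a seccomp program in the kernel. This sketch only models which action a given call name would receive.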