Make fopen_temporary and fopen_temporary_label unlocked
This is partially a refactoring, but also makes many more places use
unlocked operations implicitly, i.e. all users of fopen_temporary().
AFAICT, the uses are always for short-lived files which are not shared
externally, and are just used within the same context. Locking is not
necessary.
seccomp: check more error codes from seccomp_load()
We noticed in our tests that occasionally SystemCallFilter= would
fail to set and the service would run with no syscall filtering.
Most of the time the same tests would apply the filter and fail
the service as expected. While it's not totally clear why this happens,
we noticed seccomp_load() in the systemd code base would fail open for
all errors except EPERM and EACCES.
ENOMEM, EINVAL, and EFAULT seem like reasonable values to add to the
error set based on what I gather from libseccomp code and man pages:
-ENOMEM: out of memory, failed to allocate space for a libseccomp structure, or would exceed a defined constant
-EINVAL: kernel isn't configured to support the operations, args are invalid (to seccomp_load(), seccomp(), or prctl())
-EFAULT: addresses passed as args are invalid
The comment explains that $PATH might not be set in certain circumstances and
takes steps to handle this case. If we do that, let's assume that $PATH indeed
might be unset and not call setenv("PATH", NULL, 1). It is not clear from the
man page if that is allowed.
We had all kinds of indentation: 2 sp, 3 sp, 4 sp, 8 sp, and mixed.
4 sp was the most common, in particular the majority of scripts under test/
used that. Let's standarize on 4 sp, because many commandlines are long and
there's a lot of nesting, and with 8sp indentation less stuff fits. 4 sp
also seems to be the default indentation, so this will make it less likely
that people will mess up if they don't load the editor config. (I think people
often use vi, and vi has no support to load project-wide configuration
automatically. We distribute a .vimrc file, but it is not loaded by default,
and even the instructions in it seem to discourage its use for security
reasons.)
Also remove the few vim config lines that were left. We should either have them
on all files, or none.
Also remove some strange stuff like '#!/bin/env bash', yikes.
Media Access Control Security (MACsec) is an 802.1AE IEEE
industry-standard security technology that provides secure
communication for all traffic on Ethernet links.
MACsec provides point-to-point security on Ethernet links between
directly connected nodes and is capable of identifying and preventing
most security threats, including denial of service, intrusion,
man-in-the-middle, masquerading, passive wiretapping, and playback attacks.
Jonathan Lebon [Wed, 10 Apr 2019 21:28:15 +0000 (17:28 -0400)]
fstab-generator: use DefaultDependencies=no for /sysroot mounts
Otherwise we can end up with an ordering cycle. Since d54bab90, all
local mounts now gain a default `Before=local-fs.target` dependency.
This doesn't make sense for `/sysroot` mounts in the initrd though,
since those happen later in the boot process.
bus-message: validate signature in gvariant messages
We would accept a message with 40k signature and spend a lot of time iterating
over the nested arrays. Let's just reject it early, as we do for !gvariant
messages.
This is modelled after the existing ERRNO_IS_RESOURCES() and in
particular ERRNO_IS_DISCONNECT(). It returns true for all transient
network errors that should be handled like EAGAIN whenever we call
accept() or accept4(). This is per documentation in the accept(2) man
page that explicitly says to do so in the its "RETURN VALUE" section.
The error list we cover is a bit more comprehensive, and based on
existing code of ours. For example EINTR is included too (since we need
that to cover cases where we call accept()/accept4() on a blocking
socket), and of course ERRNO_IS_DISCONNECT() is a bit more comprehensive
than the list in the man page too.
errno-util: rework ERRNO_IS_RESOURCE() from macro into static inline function
No technical reason, except that later on we want to add a new
ERRNO_IS() which uses the parameter twice and where we want to avoid
double evaluation, and where we'd like to keep things in the same style.
Let's experiment with this in systemd upstream first. Downstreams and
users can after all still comment this easily.
Besides compat concern the most often heard issue with such high PIDs is
usability, since they are potentially hard to type. I am not entirely sure though
whether 4194304 (as largest new PID) is that much worse to type or to
copy than 65563.
This should also simplify management of per system tasks limits as by
this move the sysctl /proc/sys/kernel/threads-max becomes the primary
knob to control how many processes to have in parallel.
core: implement OOMPolicy= and watch cgroups for OOM killings
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).
On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".
All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
Let's rename the .cgroup_inotify_wd field of the Unit object to
.cgroup_control_inotify_wd. Let's similarly rename the hashmap
.cgroup_inotify_wd_unit of the Manager object to
.cgroup_control_inotify_wd_unit.
Why? As preparation for a later commit that allows us to watch the
"memory.events" cgroup attribute file in addition to the "cgroup.events"
file we already watch with the fields above. In that later commit we'll
add new fields "cgroup_memory_inotify_wd" to Unit and
"cgroup_memory_inotify_wd_unit" to Manager, that are used to watch these
other events file.
So far the priorities for cgroup empty event handling were pretty weird.
The raw events (on cgroupsv2 from inotify, on cgroupsv1 from the agent
dgram socket) where scheduled at a lower priority than the cgroup empty
queue dispatcher. Let's swap that and ensure that we can coalesce events
more agressively: let's process the raw events at higher priority than
the cgroup empty event (which remains at the same prio).