sd-event: refuse sd_event_add_child() if SIGCHLD is not blocked
We already refuse sd_event_add_signal() if the specified signal is not
blocked, let's do this also for sd_event_add_child(), since we might
need signalfd() to implement this, and this means the signal needs to be
blocked.
This adds support for watching for process exits via Linux new pidfd
concept. This makes watching processes and killing them race-free if
properly used, fixing a long-standing UNIX misdesign.
This patch adds implicit and explicit pidfd support to sd-event: if a
process shall be watched and is specified by PID we will now internally
create a pidfd for it and use that, if available. Alternatively a new
constructor for child process event sources is added that takes pidfds
as input.
Besides mere watching of child processes via pidfd two additional
features are added:
→ sd_event_source_send_child_signal() allows sending a signal to the
process being watched in the safest way possible (wrapping
the new pidfd_send_signal() syscall).
→ sd_event_source_set_child_process_own() allows marking a process
watched for destruction as soon as the event source is freed. This is
currently implemented in userspace, but hopefully will become a kernel
feature eventually.
Altogether this means an sd_event_source object is now a safe and stable
concept for referencing processes in race-free way, with automatic
fallback to pre-pidfd kernels.
Note that this patch adds support for this only to sd-event, not to PID
1. That's because PID 1 needs to use waitid(P_ALL) for reaping any
process that might get reparented to it. This currently semantically
conflicts with pidfd use for watching processes since we P_ALL is
undirected and thus might reap process earlier than the pidfd notifies
process end, which is hard to handle. The kernel will likely gain a
concept for excluding specific pidfds from P_ALL watching, as soon as
that is around we can start making use of this in PID 1 too.
This is not a new system call at all (since kernel 2.2), however it's
not exposed in glibc (a wrapper is exposed however in sigqueue(), but it
substantially simplifies the system call). Since we want a nice fallback
for sending signals on non-pidfd systems for pidfd_send_signal() let's
wrap rt_sigqueueinfo() since it takes the same siginfo_t parameter.
Paul Davey [Tue, 26 Nov 2019 23:51:59 +0000 (12:51 +1300)]
udev: Ensure udev_event_spawn reads stdout
When running the program with udev_event_spawn it is possible to miss
output in stdout when the program exits causing the result to be empty
which can cause rules using the result to not function correctly.
This is due to the on_spawn_sigchld callback being processed while IO is
still pending and causing the event loop to exit.
To correct this the sigchld event source is made a lower priority than
the other event sources to ensure it is processed after IO. This
requires changing the IO event source to oneshot and re-enabling it when
valid data is read but not for EOF, this prevents the empty pipes
constantly generating IO events.
Yu Watanabe [Sat, 30 Nov 2019 06:01:06 +0000 (15:01 +0900)]
sd-netlink: support NLMSGERR_ATTR_MSG
From v4.12 the kernel appends some attributes to netlink acks
containing a textual description of the error and other fields.
This makes sd-netlink parse the attributes.
pid1: use initrd.target in the initramfs by default
This makes the code do what the documentation says. The code had no inkling
about initrd.target, so I think this change is fairly risky. As a fallback,
default.target will be loaded, so initramfses which relied on current behaviour
will still work, as along as they don't have a different initrd.target.
In an initramfs created with recent dracut:
$ ls -l usr/lib/systemd/system/{default.target,initrd.target}
lrwxrwxrwx. usr/lib/systemd/system/default.target -> initrd.target
-rw-r--r--. usr/lib/systemd/system/initrd.target
So at least for dracut, there should be no difference.
Let's make sure we get back to 100% man page documentation coverage of
our sd-event APIs. We are bad enough at the others, let's get these ones
right at least.
man: drop reference to machined, add one for journald instead
We dropped documentation from sd_journal_open_container() long ago
(since the call is obsolete), hence drop the reference to machined. But
add one in for journald instead.
test-functions: make sure we use the right library path for binaries without RPATH
Meson appears to set the rpath only for some binaries it builds, but not
all. (The rules are not clear to me, but that's besides the point of
this commit).
Let's make sure if our test script operates on a binary that has no
rpath set we fall back preferably to the BUILD_DIR rather than directly
to the host.
This matters if a test uses a libsystemd symbol introduced in a version
newer than the one on the host. In that case "ldd" will not work on the
test binary if rpath is not set. With this fix that behaviour is
corrected, and "ldd" works correctly even in this case.
(Or in other words: before this fix on binaries lacking rpath we'd base
dependency info on the libraries of the host, not the buidl tree, if
they exist in both.)
Kevin Kuehler [Thu, 28 Nov 2019 00:35:15 +0000 (16:35 -0800)]
shared/ask-password-api: modify keyctl break value
We can break if KEYCTL_READ return value is equal to our buffer size.
From keyctl(2):
On a successful return, the return value is always the total size of
the payload data. To determine whether the buffer was of sufficient
size, check to see that the return value is less than or equal to the
value supplied in arg4.
Kevin Kuehler [Tue, 26 Nov 2019 19:20:14 +0000 (11:20 -0800)]
execute: Call capability_ambient_set_apply even if ambient set is 0
The function capability_ambient_set_apply() now drops capabilities not
in the capability_ambient_set(), so it is necessary to call it when
the ambient set is empty.
Kevin Kuehler [Tue, 26 Nov 2019 01:52:42 +0000 (17:52 -0800)]
test-capability: Modify ambient capability tests to test clearing caps
Change test_set_ambient_caps() to test_apply_ambient_caps(), since the
function capability_ambient_set_apply() not only sets ambient
capabilities, but clears inherited capabilities that are not explicitly
requested by the caller.
In that commit the output got moved a 2 chars to the right, hence make
sure to also shift the cgroup tree to the right, so that it gets
properly aligned under the cgroup path again.
Michal Sekletár [Wed, 27 Nov 2019 13:27:58 +0000 (14:27 +0100)]
cryptsetup: reduce the chance that we will be OOM killed
cryptsetup introduced optional locking scheme that should serialize
unlocking keyslots which use memory hard key derivation
function (argon2). Using the serialization should prevent OOM situation
in early boot while unlocking encrypted volumes.
This partially reverts a07a7324adf504381e9374d1f1a5db6edaa46435.
We have two pieces of information: the value and a boolean.
config_parse_timeout_abort() added in the reverted commit would write
the boolean to the usec_t value, making a mess.
The code is reworked to have just one implementation and two wrappers
which pass two pointers.
Martin Wilck [Tue, 26 Nov 2019 17:39:09 +0000 (18:39 +0100)]
udevd: don't use monitor after manager_exit()
If udevd receives an exit signal, it releases its reference on the udev
monitor in manager_exit(). If at this time a worker is hanging, and if
the event timeout for this worker expires before udevd exits, udevd
crashes in on_sigchld()->udev_monitor_send_device(), because the monitor
has already been freed.
Fix this by testing the validity of manager->monitor in on_sigchld().
test-fileio: cast EOF to (char) before comparing with char explicitly
EOF is defined to -1, hence on platforms that have "char" unsigned we
can't compare it as-is, except if we accept an implicit cast. let's make
it an explicit cast, acknowledging the issue.