seccomp: mmap test results depend on kernel/libseccomp/glibc
Like with shmat already the actual results of the test
test_memory_deny_write_execute_mmap depend on kernel/libseccomp/glibc
of the platform it is running on.
There are known-good platforms, but on the others do not assert success
(which implies test has actually failed as no seccomp blocking was achieved),
but instead make the check dependent to the success of the mmap call
on that platforms.
Finally the assert of the munmap on that valid pointer should return ==0,
so that is what the check should be for in case of p != MAP_FAILED.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
At the beginning of seccomp_memory_deny_write_execute architectures
can set individual filter_syscall, block_syscall, shmat_syscall values.
The former two are then used in the call to add_seccomp_syscall_filter
but shmat_syscall is not.
Right now all shmat_syscall values are the same, so the change is a
no-op, but if ever an architecture is added/modified this would be a
subtle source for a mistake so fix it by using shmat_syscall later.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
seccomp: ensure rules are loaded in seccomp_memory_deny_write_execute
If seccomp_memory_deny_write_execute was fatally failing to load rules it
already returned a bad retval.
But if any adding filters failed it skipped the subsequent seccomp_load and
always returned an rc of 0 even if no rule was loaded at all.
Lets fix this requiring to (non fatally-failing) load at least one rule set.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Since libseccomp 2.4.2 more architectures have shmat handled as multiplexed
call. Those will fail to be added due to seccomp_rule_add_exact failing
on them since they'd need to add multiple rules [1].
See the discussion at https://github.com/seccomp/libseccomp/issues/193
After discussions about the options rejected [2][3] the initial thought of
a fallback to the non '_exact' version of the seccomp rule adding the next
option is to handle those now affected (i386, s390, s390x) the same way as
ppc which ignores and does not block shmat.
Yu Watanabe [Mon, 2 Dec 2019 15:51:44 +0000 (00:51 +0900)]
nspawn: do not fail if udev is not running
If /sys is read only filesystem, e.g., nspawn is running in container,
then usually udev is not running. In such a case, let's assume that
the interface is already initialized. Also, this makes nspawn refuse
to use the network interface which is under renaming.
sd-bus: don't include properties maked as "emit-invalidation" in InterfacesAdded signals
Properties marked this way really shouldn't be sent around willy-nilly,
that's what the flag is about, hence exclude it from InterfacesAdded
signals (and in fact anything that is a signal).
sd-bus: add new call sd_bus_message_sensitive() and SD_BUS_VTABLE_SENSITIVE
This allows marking messages that contain "sensitive" data with a flag.
If it's set then the messages are erased from memory when the message is
freed.
Similar, a flag may be set on vtable entries: incoming/outgoing message
matching the entry will then automatically be flagged this way.
This is supposed to be an easy method to mark messages containing
potentially sensitive data (such as passwords) for proper destruction.
(Note that this of course is only is as safe as the broker in between is
doing something similar. But let's at least not be the ones at fault
here.)
Stochastic Fairness Queueing is a classless queueing discipline.
SFQ does not shape traffic but only schedules the transmission of packets, based on 'flows'.
The goal is to ensure fairness so that each flow is able to send data in turn,
thus preventing any single flow from drowning out the rest.
main-func: send main exit code to parent via sd_notify() on exit
So far we silently convert negative return values from run() as
EXIT_FAILURE, which is how UNIX expects it. In many cases it would be
very useful for the caller to retrieve the actual error number we exit
with. Let's generically return that via sd_notify()'s ERRNO= attribute.
This means callers can set $NOTIFY_SOCKET and get the actual error
number delivered at their doorstep just like that.
process-util: add new safe_fork() flag for connecting stdout to stderr
This adds a new safe_fork() flag. If set the child process' fd 1 becomes
fd 2 of the caller. This is useful for invoking tools (such as various
mkfs/fsck implementations) that output status messages to stdout, but
which we invoke and don't want to pollute stdout with their output.
The actual burst limit is modified by the remaining disk space. This
isn't mentioned anywhere in the available documentation and might be a
source of surprise for an end user expecting certain behaviors.
Leonid Bloch [Sun, 1 Dec 2019 23:05:02 +0000 (01:05 +0200)]
sd-boot: Add a 0.1 second delay before key-probing for showing menu
If there is no boot menu timeout, pressing a key during boot should get
the boot menu displayed. However, on some systems the keyboard is not
initialized right away, which causes the menu to be inaccessible if no
timeout is specified.
To resolve this, if the error is "not ready" after the initial attempt of
detection, wait for 0.1 second and retry. This solves the problem
described above on all the tested systems.
The reason for just a single retry, and not retrying while "not ready",
is that some firmwares continue to return the "not ready" error on
every probe attempt if no key is pressed.
Signed-off-by: Leonid Bloch <lb.workbox@gmail.com>
sd-event: refuse sd_event_add_child() if SIGCHLD is not blocked
We already refuse sd_event_add_signal() if the specified signal is not
blocked, let's do this also for sd_event_add_child(), since we might
need signalfd() to implement this, and this means the signal needs to be
blocked.
This adds support for watching for process exits via Linux new pidfd
concept. This makes watching processes and killing them race-free if
properly used, fixing a long-standing UNIX misdesign.
This patch adds implicit and explicit pidfd support to sd-event: if a
process shall be watched and is specified by PID we will now internally
create a pidfd for it and use that, if available. Alternatively a new
constructor for child process event sources is added that takes pidfds
as input.
Besides mere watching of child processes via pidfd two additional
features are added:
→ sd_event_source_send_child_signal() allows sending a signal to the
process being watched in the safest way possible (wrapping
the new pidfd_send_signal() syscall).
→ sd_event_source_set_child_process_own() allows marking a process
watched for destruction as soon as the event source is freed. This is
currently implemented in userspace, but hopefully will become a kernel
feature eventually.
Altogether this means an sd_event_source object is now a safe and stable
concept for referencing processes in race-free way, with automatic
fallback to pre-pidfd kernels.
Note that this patch adds support for this only to sd-event, not to PID
1. That's because PID 1 needs to use waitid(P_ALL) for reaping any
process that might get reparented to it. This currently semantically
conflicts with pidfd use for watching processes since we P_ALL is
undirected and thus might reap process earlier than the pidfd notifies
process end, which is hard to handle. The kernel will likely gain a
concept for excluding specific pidfds from P_ALL watching, as soon as
that is around we can start making use of this in PID 1 too.
This is not a new system call at all (since kernel 2.2), however it's
not exposed in glibc (a wrapper is exposed however in sigqueue(), but it
substantially simplifies the system call). Since we want a nice fallback
for sending signals on non-pidfd systems for pidfd_send_signal() let's
wrap rt_sigqueueinfo() since it takes the same siginfo_t parameter.
Paul Davey [Tue, 26 Nov 2019 23:51:59 +0000 (12:51 +1300)]
udev: Ensure udev_event_spawn reads stdout
When running the program with udev_event_spawn it is possible to miss
output in stdout when the program exits causing the result to be empty
which can cause rules using the result to not function correctly.
This is due to the on_spawn_sigchld callback being processed while IO is
still pending and causing the event loop to exit.
To correct this the sigchld event source is made a lower priority than
the other event sources to ensure it is processed after IO. This
requires changing the IO event source to oneshot and re-enabling it when
valid data is read but not for EOF, this prevents the empty pipes
constantly generating IO events.
Yu Watanabe [Sat, 30 Nov 2019 06:01:06 +0000 (15:01 +0900)]
sd-netlink: support NLMSGERR_ATTR_MSG
From v4.12 the kernel appends some attributes to netlink acks
containing a textual description of the error and other fields.
This makes sd-netlink parse the attributes.
pid1: use initrd.target in the initramfs by default
This makes the code do what the documentation says. The code had no inkling
about initrd.target, so I think this change is fairly risky. As a fallback,
default.target will be loaded, so initramfses which relied on current behaviour
will still work, as along as they don't have a different initrd.target.
In an initramfs created with recent dracut:
$ ls -l usr/lib/systemd/system/{default.target,initrd.target}
lrwxrwxrwx. usr/lib/systemd/system/default.target -> initrd.target
-rw-r--r--. usr/lib/systemd/system/initrd.target
So at least for dracut, there should be no difference.
Let's make sure we get back to 100% man page documentation coverage of
our sd-event APIs. We are bad enough at the others, let's get these ones
right at least.