journald: allow restarting journald without losing stream connections
Making use of the fd storage capability of the previous commit, allow
restarting journald by serilizing stream state to /run, and pushing open
fds to PID 1.
core: add new logic for services to store file descriptors in PID 1
With this change it is possible to send file descriptors to PID 1, via
sd_pid_notify_with_fds() which PID 1 will store individually for each
service, and pass via the usual fd passing logic on next invocation.
This is useful for enable daemon reload schemes where daemons serialize
their state to /run, push their fds into PID 1 and terminate, restoring
their state on next start from the data in /run and passed in from PID
1.
The fds are kept by PID 1 as long as no POLLHUP or POLLERR is seen on
them, and the service they belong to are either not dead or failed, or
have a job queued.
When systemd starts a service, it first opened /run/systemd/journal/stdout
socket, and only later switched to the right user.group (if they are
specified). Later on, journald looked at the credentials, and saw
root.root, because credentials are stored at the time the socket is
opened. As a result, all messages passed over _TRANSPORT=stdout were
logged with _UID=0, _GID=0.
Drop real uid and gid temporarily to fix the issue.
machine: add reference to machine-dbus.h to Makefile.am
Commit 003dffde2c1b93 ("machined: Move image discovery logic into src/shared,
so that we can make use of it from nspawn") moved some definitions from
machine.h to a new machine-dbus.h, but did not include it in Makefile.am
Tested that `make distcheck` works after this fix.
fstab-generator: use more appropriate checks for swap and device availability
We always should use the same checks when deciding whether swap support
and mounting of devices is supported. Hence, let's make
fstab-generator's logic more similar to the usual logic we follow:
a) Look for /proc/swaps and no container support before activating
swaps.
b) Look for /sys being writable befire supporting device mounts.
journald: add some additional checks before we divide by values read from journal file headers
Since the file headers might be replaced by zeroed pages now due to
sigbus we should make sure we don't end up dividing by zero because we
don't check values read from journal file headers for changes.
journal: install sigbus handler for journal tools too
This makes them robust regarding truncation. Ideally, we'd export this
as an API, but given how messy SIGBUS handling is, and the uncertain
ownership logic of signal handlers we should not do this (unless libc
one day invents a scheme how to sanely install SIGBUS handlers for
specific memory areas only). However, for now we can still make all our
own tools robust.
Note that external tools will only have read-access to the journal
anyway, where SIGBUS is much more unlikely, given that only writes are
subject to disk full problems.
journald: process SIGBUS for the memory maps we set up
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.
This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.
Peter Hutterer [Sun, 4 Jan 2015 21:26:18 +0000 (07:26 +1000)]
hwdb: revert Logitech Optical USB Mouse
Reporter says he incorrectly measured the data but the device is not available
anymore to correct it. We'll have to wait for someone else to submit the data.
-n is only allowed for root. /etc/mtab is nowadays almost always a link to /proc/,
so in practice this does not really matter too much, but should allow .mount units
to work in --user mode.
Chris Atkinson [Thu, 1 Jan 2015 02:59:16 +0000 (21:59 -0500)]
man: Clarify effect when both calendar day and date are listed in timer
See bug 87859 (https://bugs.freedesktop.org/show_bug.cgi?id=87859). Bug
reporter found the language describing the effect of specifying both a
day and date unclear; hopefully the attached patch will clarify and
allow the bug to be closed.
David Herrmann [Wed, 31 Dec 2014 15:28:48 +0000 (16:28 +0100)]
lldp: fix sd_lldp_save()
Fix a bunch of needless memzero() calls, a bunch of use-after-free
regarding _cleanup_free_ and drop unused variables.
Hint: Do NOT use _cleanup_free_ for temporary strappend() helpers that are
freed multiple times. All you safe is the last free() call, which is
really not worth the trouble resetting it to NULL all the time.
The commit in question changed a binary file. I didn't look at the diff in
particular, so I have no idea what exactly was changed. However, the file
is generated and it looked highly suspiciuous. Therefore, I reverted that
part.
Note that this is generated by "make update-unifont" so really no reason
to touch at all.
If some sleep operation was not possible (e.g. because swap is missing),
we would try twice: once through logind, which would result in a clean error:
Failed to execute operation: Sleep verb not supported
and then second time by starting the appropriate unit directly, which is
more messy. If logind tells us that something is not possible (or already
in progress), report that to the user and quit. If logind is present and working
we should not try to work around it.
Loosely based on https://bugs.freedesktop.org/show_bug.cgi?id=87832.
bus: replace ENOSYS return codes with EBADR/ENOTSUP
ENOSYS is used to signify compiled-out functionality. Using it for
different kinds of error is misleading.
For BUS_ERROR_SLEEP_VERB_NOT_SUPPORTED, logind-action.c uses ENOTSUP
already, so changing it to ENOTSUP makes the dbus and action paths
behave the same.
David Herrmann [Tue, 30 Dec 2014 10:37:35 +0000 (11:37 +0100)]
bus: add sd_bus_emit_object_{added/removed}()
This implements two new helpers, discussed on systemd-devel about 1 year
ago:
sd_bus_emit_object_added()
sd_bus_emit_object_removed()
Both calls are equivalent to their respective counterpart
sd_bus_emit_interfaces_{added/removed}(), but can figure out the list of
interfaces themselves, instead of requiring the caller to provide them.
Furthermore, both calls properly deal with builtin interfaces provided via
org.freedesktop.DBus.* and alike.
Both calls simply traverse a node and all its parent nodes to figure out a
list of all interfaces registered as vtable or fallback. It then appends
each of them, similar to the interfaces_{added/removed}() helpers.
Note that interfaces_{added/removed}() runs a parent traversal for *each*
passed interface. Therefore, it can simply bail out, once it found a
parent node that implements a given interface.
With object_{added/removed}() we cannot know the registered interfaces in
advance, thus, we cannot run one traversal per node. Instead, we run a
single traversal and remember all interfaces that we added. Therefore, a
child-interface overrides all conflicting parent-interfaces. We keep a
"Set *s" context to track those while climbing up the tree.
David Herrmann [Tue, 30 Dec 2014 08:09:41 +0000 (09:09 +0100)]
bus: fix capabilities on big-endian
The kernel provides capabilities as a u32 array, sd-bus uses an u8 array.
This works fine on little-endian as both are encoded the same way.
However, this fails on big-endian if we do not perform sufficient
byte-swapping on each u32 entry.
This patch makes sd-bus use u32, too. We avoid changing any kernel
provided data so we can keep pointing into kdbus pool buffers which
contain u32 arrays.
David Herrmann [Tue, 30 Dec 2014 07:42:53 +0000 (08:42 +0100)]
bus: drop creds->capability_size
The number of available caps can be read from
/proc/sys/kernel/cap_last_cap during runtime. Our helper cap_last_cap()
does that, so there's no reason to remember the size of any capability
cache. We can just pre-allocate arrays with a suitable size for all
available caps and reject any higher caps.
The kernel capability API uses u32 as base so make sure we do the same.
Note that this is specified by POSIX, so it's unlikely to change.
David Herrmann [Mon, 29 Dec 2014 16:51:36 +0000 (17:51 +0100)]
macro: add DIV_ROUND_UP()
This macro calculates A / B but rounds up instead of down. We explicitly
do *NOT* use:
(A + B - 1) / A
as it suffers from an integer overflow, even though the passed values are
properly tested against overflow. Our test-cases show this behavior.
Instead, we use:
A / B + !!(A % B)
Note that on "Real CPUs" this does *NOT* result in two divisions. Instead,
instructions like idivl@x86 provide both, the quotient and the remainder.
Therefore, both algorithms should perform equally well (I didn't verify
this, though).
We actually want to allow shutting down containers that use
RegisterMachine() rather than CreateMachine() to register their own
unit. It should be safe to do so, since the primary usecase for
RegisterMachine() are container managers that run only a single
container within their own unit, such as systemd-nspawn.