Alan Jenkins [Sat, 27 Jan 2018 18:02:06 +0000 (18:02 +0000)]
systemd-shutdown: use log_set_prohibit_ipc(true)
Now we have log_set_prohibit_ipc(), let's use it to clarify that
systemd-shutdown is not expected to try and log via journald (which it is
about to kill). We avoided ever asking systemd-shutdown to do this, but
it's more convenient for the reader if they don't have to think about that.
In that sense, it's similar to using assert() to validate a function's
arguments.
Alan Jenkins [Fri, 26 Jan 2018 13:42:53 +0000 (13:42 +0000)]
rationalize interface for opening/closing logging
log_open_console() did not switch from stderr to /dev/console, when
"always_reopen_console" was set. It was necessary to call
log_close_console() first.
By contrast, log_open() did switch between e.g. journald and kmsg according
to the value of "prohibit_ipc".
Let's fix log_open() to respect the values of all the log options, and we
can make log_close_*() private.
Also log_close_console() is changed. There was some precaution, avoiding
closing the console fd if we are not PID 1. I think commit 48a601fe made
a little mistake in leaving this in, and it only served to confuse
readers :).
Also I changed systemd-shutdown. Now we have log_set_prohibit_ipc(), let's
use it to clarify that systemd-shutdown is not expected to try and log via
journald (which it is about to kill). We avoided ever asking it to, but
it's more convenient for the reader if they don't have to think about that.
In that sense, it's similar to using assert() to validate a function's
arguments.
Alan Jenkins [Fri, 26 Jan 2018 22:47:16 +0000 (22:47 +0000)]
pid1: when we can't log to journal, remember our fallback log target
If we have to force the logging to close the journal fd, then we can open
any fallback log target. E.g. kmsg, if the target was the default
JOURNAL_OR_KMSG.
This is the behaviour I would expect from the documentation. I couldn't
find any justification in the code, for why we would want to start dropping
log messages instead of sending them to the fallback target.
This means we will match the behaviour of processes which we fork and which
set `open_when_needed`, and with generators - which use
log_set_prohibit_ipc(true) - which we fork+exec during a reload.
IMO this illustrates that the log_open/log_close interface is too clunky.
So with the behaviour settled, I will refactor the interface in the next
commit :).
Björn Esser [Thu, 25 Jan 2018 14:30:15 +0000 (15:30 +0100)]
firstboot: Include <crypt.h> for declaration of crypt() if needed (#7944)
Not every target system may provide a crypt() function in its stdlibc
and may use an external or replacement library, like libxcrypt, for
providing such functions.
See https://fedoraproject.org/wiki/Changes/Replace_glibc_libcrypt_with_libxcrypt.
systemctl: load unit if needed in "systemctl is-active"
Previously, we'd explicitly use "GetUnit()" on the server side to
convert a unit name into a bus path, as that function will return an
error if the unit is not currently loaded. If we'd convert the path on
the client side, and access the unit this way directly the unit would be
loaded automatically in the background.
The old logic was done in order to minimize the effect of "is-active" on
the system, i.e. that a monoitoring command does not itself alter the
state of the system.
however, this is problematic as this can lead to confusing results if
the queried unit name is an alias that currently is not loaded: we'd
claim the unit wasn't active even though this isn't strictly true: the
unit the name is an alias for might be.
Hence, let's simplify the code, and accept that we might end up loading
a unit briefly here, and let's make "systemctl is-active" skip the
GetUnit() thing and calculate the unit path right away.
Yu Watanabe [Thu, 25 Jan 2018 08:45:53 +0000 (17:45 +0900)]
bus-util: fix format of NextElapseUSecRealtime= and LastTriggerUSec=
Before this, `systemctl show` for calendar type timer unit outputs
something like below.
```
NextElapseUSecRealtime=48y 3w 3d 15h
NextElapseUSecMonotonic=0
LastTriggerUSec=48y 3w 3d 3h 41min 44.093095s
LastTriggerUSecMonotonic=0
```
As both NextElapseUSecRealtime= and LastTriggerUSec= are not timespan
but timestamp, this makes format these values by `format_timestamp()`.
core: rework how we count the n_on_console counter
Let's add a per-unit boolean that tells us whether our unit is currently
counted or not. This way it's unlikely we get out of sync again and
things are generally more robust.
This also allows us to remove the counting logic specific to service
units (which was in fact mostly a copy from the generic implementation),
in favour of fully generic code.
This call determines whether a specific unit currently needs access to
the console. It's a fancy wrapper around
exec_context_may_touch_console() ultimately, however for service units
we'll explicitly exclude the SERVICE_EXITED state from when we report
true.
manager: minor manager_get_show_status() simplification
Since the the whole function ultimately is just a fancy getter for the
show_status field, let's actually return it as last step literally
without an extra needless "if".
This removes LOG_TARGET_SAFE. It's made redundant by the new
"prohibit-ipc" logging flag, as it used to have a similar effect: avoid
logging to the journal/syslog, i.e. any local services in order to avoid
deadlocks when we lock from PID 1 or its utility processes (such as
generators).
All previous users of LOG_TARGET_SAFE are switched over to the new
setting. This makes things a bit safer for all, as not even the
SYSTEMD_LOG_TARGET env var can be used to accidentally log to the
journal anymore in these programs.
log: add new "prohibit_ipc" flag to logging system
If set, we'll avoid logging to any IPC log targets, i.e. syslog or the
journal, but allow stderr, kmsg, console logging.
This is useful as PID 1 wants to turn this off explicitly as long as the
journal is not up.
Previously we'd open/close the log stream to these services whenever
needed but this is incompatible with the "open_when_needed" logic
introduced in #6915, which might open the log streams whenever it likes,
including possibly inside of the child process we fork off that'll
become journald later on. Hence, let's make this all explicit, and
instead of managing when we open/close log streams add a boolean that
clearly prohibits the IPC targets when needed, so that opening can be
done at any time, but will honour this.
log: make log_set_upgrade_syslog_to_journal() take effect immediately
This doesn't matter much, and we don't rely on it, but I think it's much
nicer if we log_set_target() and log_set_upgrade_syslog_to_journal() can
be called in either order and have the same effect.
It is often the case that a file descriptor and its corresponding IO
sd_event_source share a life span. When this is the case, developers will
have to unref the event source and close the file descriptor. Instead, we
can just have the event source take ownership of the file descriptor and
close it when the event source is freed. This is especially useful when
combined with cleanup attributes and sd_event_source_unrefp().
The time-related functions in sd-event.h take as inputs constants (CLOCK_*)
defined in time.h. By including time.h in sd-event.h, we free the developer
from having to do this manually.
Apparently O_NONBLOCK is the modern name used in most documentation and
for most cases in our sources. Let's hence replace the old alias
O_NDELAY and stick to O_NONBLOCK everywhere.
tmpfiles: make "f" lines behaviour match what the documentation says
CHANGE OF BEHAVIOUR — with this commit "f" line's behaviour is altered
to match what the documentation says: if an "argument" string is
specified it is written to the file only when the file didn't exist
before. Previously, it would be appended to the file each time
systemd-tmpfiles was invoked — which is not a particularly useful
behaviour as the tool is not idempotent then and the indicated files
grow without bounds each time the tool is invoked.
I did some spelunking whether this change in behaviour would break
things, but afaics nothing relies on the previous O_APPEND behaviour of
this line type, hence I think it's relatively safe to make "f" lines
work the way the docs say, rather than adding a new modifier for it or
so.
The left side of the || expression is conditionalized on SERVICE_START,
but SERVICE_START is blanket listed on the right side anyway, hence we
can drop the left side entirely without any change in behaviour.
Moreover, if main_pid is initialized, it should be watched, hence this
is even the safe and right thing to do.
core: rework how we track which PIDs to watch for a unit
Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit
interested in SIGCHLD events for them. This scheme allowed a specific
PID to be watched by exactly 0, 1 or 2 units.
With this rework this is replaced by a single hashmap which is primarily
keyed by the PID and points to a Unit interested in it. However, it
optionally also keyed by the negated PID, in which case it points to a
NULL terminated array of additional Unit objects also interested. This
scheme means arbitrary numbers of Units may now watch the same PID.
Runtime and memory behaviour should not be impact by this change, as for
the common case (i.e. each PID only watched by a single unit) behaviour
stays the same, but for the uncommon case (a PID watched by more than
one unit) we only pay with a single additional memory allocation for the
array.
Why this all? Primarily, because allowing exactly two units to watch a
specific PID is not sufficient for some niche cases, as processes can
belong to more than one unit these days:
1. sd_notify() with MAINPID= can be used to attach a process from a
different cgroup to multiple units.
2. Similar, the PIDFile= setting in unit files can be used for similar
setups,
3. By creating a scope unit a main process of a service may join a
different unit, too.
4. On cgroupsv1 we frequently end up watching all processes remaining in
a scope, and if a process opens lots of scopes one after the other it
might thus end up being watch by many of them.
This patch hence removes the 2-unit-per-PID limit. It also makes a
couple of other changes, some of them quite relevant:
- manager_get_unit_by_pid() (and the bus call wrapping it) when there's
ambiguity will prefer returning the Unit the process belongs to based on
cgroup membership, and only check the watch-pids hashmap if that
fails. This change in logic is probably more in line with what people
expect and makes things more stable as each process can belong to
exactly one cgroup only.
- Every SIGCHLD event is now dispatched to all units interested in its
PID. Previously, there was some magic conditionalization: the SIGCHLD
would only be dispatched to the unit if it was only interested in a
single PID only, or the PID belonged to the control or main PID or we
didn't dispatch a signle SIGCHLD to the unit in the current event loop
iteration yet. These rules were quite arbitrary and also redundant as
the the per-unit handlers would filter the PIDs anyway a second time.
With this change we'll hence relax the rules: all we do now is
dispatch every SIGCHLD event exactly once to each unit interested in
it, and it's up to the unit to then use or ignore this. We use a
generation counter in the unit to ensure that we only invoke the unit
handler once for each event, protecting us from confusion if a unit is
both associated with a specific PID through cgroup membership and
through the "watch_pids" logic. It also protects us from being
confused if the "watch_pids" hashmap is altered while we are
dispatching to it (which is a very likely case).
- sd_notify() message dispatching has been reworked to be very similar
to SIGCHLD handling now. A generation counter is used for dispatching
as well.
This also adds a new test that validates that "watch_pid" registration
and unregstration works correctly.
core: unify call we use to synthesize cgroup empty events when we stopped watching any unit PIDs
This code is very similar in scope and service units, let's unify it in
one function. This changes little for service units, but for scope units
makes sure we go through the cgroup queue, which is something we should
do anyway.
service: don't send out dbus change notifications spuriously on SIGCHLD
Let's send them out only if the main or control processe exited and we
recorded a new exit status that is worth reporting. But if any other
service process died this is nothing to report since we don't expose any
properties about that anyway.
core: fix manager_get_unit_by_pid() special casing of manager PID
Previously, we'd hard map PID 1 to the manager scope unit. That's wrong
however when we are run in --user mode, as the PID 1 is outside of the
subtree we manage and the manager PID might be very differently. Correct
that by checking for getpid() rather than hardcoding 1.
tmpfiles: create parent directories if they are missing for more line types
Currently, we create leading directories implicitly for all lines that
create directory or directory-like nodes.
With this, we also do the same for a number of other lines: f/F, C, p,
L, c/b (that is regular files, pipes, symlinks, device nodes as well as
file trees we copy).
The leading directories are created with te default access mode of 0755.
If something else is desired, users should simply declare appropriate
"d" lines.
pid1: rework how we dispatch SIGCHLD and other signals
This fundamentally makes one change: we never process more than one
signal or more than one waitid() event per event loop. We'll never tight
loop around waitid() or around read() on our signalfd instead, but
always return to the main event loop after processing one event.
By doing this we put the event priorization handling into full power
again, as we'll always check for higher priority events before looking
at the next signal or waitid() again.
This introduces a new "defer" event source "sigchld_event". It's enabled
as soon as we see SIGCHLD, and disabled as soon as waitid() reported no
further children pending. It's running at a relatively high priority,
one step higher than signal handling itself, but lower than
/proc/self/mountinfo event handling, so that the latter always takes
precedence.
Since we want to process sd_notify() events at an even higher priority
than SIGCHLD (as before) it is moved one priority step up, too.
mount,swap: write event loop priority as "SD_EVENT_PRIORITY_NORMAL-x"
We do that in all other cases, let's do it here too. Since
SD_EVENT_PRIORITY_NORMAL evaluates to zero there's zero effective
difference, but it makes things easier to grok and grep for if we always
express relative priorities within PID 1 only.
manager: split out send_ready and basic.target checking into functions of their own
Let's shorten manager_check_finished() a bit by splitting out checking
of basic.target and the two things we do when we reach it.
This should not change behaviour, except for one thing: we now check
basic.target's actual state for figuring out whether it is up, instead
of generically checking whether it has any job queued. This is arguably
more correct, and is what other code does too for similar purposes, for
example manager_state()
Currently, sd-bus supports the ability to have thread-local default busses.
However, this is less useful than it can be since all functions which
require an sd_bus* as input require the caller to pass it. This patch adds
a new macro which allows the developer to pass a constant SD_BUS_DEFAULT,
SD_BUS_DEFAULT_USER or SD_BUS_DEFAULT_SYSTEM instead. This reduces work for
the caller.
For example:
r = sd_bus_default(&bus);
r = sd_bus_call_method(bus, ...);
sd_bus_unref(bus);
Becomes:
r = sd_bus_call_method(SD_BUS_DEFAULT, ...);
If the specified thread-local default bus does not exist, the function
calls will return -ENOPKG. No bus will ever be implicitly created.
Currently, sd-event supports the ability to have a thread-local default
event loop. However, this is less useful than it can be since all functions
which require an sd_event* as input require the caller to pass it. This
patch adds a new macro which allows the developer to pass a constant
SD_EVENT_DEFAULT instead. This reduces work for the caller.
For example:
r = sd_event_default(&e);
r = sd_event_add_io(e, ...);
sd_event_unref(e);
Becomes:
r = sd_event_add_io(SD_EVENT_DEFAULT, ...);
If no thread-local default event loop exists, the function calls will
return -ENOPKG. No event loop will ever be implicitly created.
The DHCPv6 client will set its state to DHCP6_STATE_STOPPED if
an error occurs or when receiving an Information Reply DHCPv6
message. Once in DHCP6_STATE_STOPPED, the DHCPv6 client needs
to be restarted by calling sd_dhcp6_client_start().
As of pull request #7796 client_reset() no longer closes the
network socket, thus a call to sd_dhcp6_client_start() needs to
check whether the file descriptor already exists in order not to
create a new one. Likewise, a call to sd_dhcp6_client_unref()
must now close the network socket as client_reset() is not
closing it.
Alan Jenkins [Sat, 20 Jan 2018 20:12:09 +0000 (20:12 +0000)]
mount: don't consider activated until /sbin/mount returns
So far, we considered mount units activated as soon as the mount
appeared. This avoided seeing a difference between mounts started by
systemd, and e.g. by running `mount` from a terminal.
(`umount` was not handled this way).
However in some cases, options passed to `mount` require additional
system calls after the mount is successfully created. E.g. the
`private` mount option, or the `ro` option on bind mounts.
It seems best to wait for mount to finish doing that. E.g. in
the `private` case, the current behaviour could theoretically cause
non-deterministic results, as child mounts inherit the
private/shared propagation setting from their parent.
This also avoids a special case in mount_reload().
Alan Jenkins [Mon, 22 Jan 2018 17:42:25 +0000 (17:42 +0000)]
mount: clarify that umount retries do not (anymore) allow multiple timeouts
It _looks_ as if, back when we used to retry unsuccessful calls to umount,
this would have inflated the effective timeout. Multiplying it by
RETRY_UMOUNT_MAX. Which is set to 32.
I'm surprised if it's true: I would have expected it to be noticed during
the work on NFS timeouts. But I can't see what would have stopped it.
Clarify that I do not expect this to happen anymore. I think each
individual umount call is allowed up to the full timeout, but if umount
ever exited with a signal status, we would stop retrying.
To be extra clear, make sure that we do not retry in the event that umount
perversely returned EXIT_SUCCESS after receiving SIGTERM.
Alan Jenkins [Sat, 20 Jan 2018 20:05:52 +0000 (20:05 +0000)]
mount: mountinfo event is supposed to always arrive before SIGCHLD
"Due to the io event priority logic we can be sure the new mountinfo is
loaded before we process the SIGCHLD for the mount command."
I think this is a reasonable expectation. But if it works, then the
other comment must be false:
"Note that mount(8) returning and the kernel sending us a mount table
change event might happen out-of-order."
Therefore we can clean up the code for the latter.
If this is working as advertised, then we can make sure that mount units
fail if the mount we thought we were creating did not actually appear,
due to races or trickery (or because /sbin/mount did something unexpected
despite returning EXIT_SUCCESS).
Include a specific warning message for this failure.
If we give up when the mount point is still mounted after 32 successful
calls to /sbin/umount, that seems a fairly similar case. So make that
message a LOG_WARN as well (not LOG_DEBUG). Also, this was recently changed to only
retry while umount is returning EXIT_SUCCESS; in that case in particular
there would be no other messages in the log to suggest what had happened.
Martin Pitt [Mon, 22 Jan 2018 20:17:08 +0000 (21:17 +0100)]
hwdb: map zoomin/out keys to up/down
Some keyboards come with a zoom see-saw or rocker which until now got
mapped to the Linux "zoomin/out" keys in hwdb. However, these keycodes
are not recognized by any major desktop. They now produce Up/Down key
events so that they can be used for scrolling.
The internet is full of instructions how to "unbreak" these keys, e. g.
ott [Tue, 23 Jan 2018 00:53:31 +0000 (01:53 +0100)]
resolve: Adjust and unify D-Bus call timeout (#7847)
DNS queries have a timeout of DNS_TRANSACTION_ATTEMPTS_MAX *
DNS_TIMEOUT_MAX_USEC = 120 s. Calls to the ResolveHostname method of
the org.freedesktop.resolve1.Manager interface have various call
timeouts that are smaller than 120 s. So it seems correct to adjust
the call timeout to the maximum query timeout and to unify the call
timeout among all callers.
A timeout of 120 s might seem large, in particular since BIND does seem
to have a query timeout of 10 s. However, it seems match the timeout
value of 120 s of Unbound. Moreover, the query and timeout handling of
resolve have problems and might be improved in the future, so this
change is at best an interim solution.