Michael Vogt [Thu, 1 Feb 2018 04:47:50 +0000 (05:47 +0100)]
sysusers: allow force reusing existing user/group IDs (#8037)
On Debian/Ubuntu systems the default passwd/group files use a
slightly strange mapping. E.g. in passwd:
```
man:x:6:12::/var/cache/man:/sbin/nologin
```
and in group:
```
disk:x:6:
man:x:12:
```
This is not supported in systemd-sysusers right now because
sysusers will not re-use an existing uid/gid in its normal
mode of operation. Unfortunately this reuse is needed to
replicate the default Debian/Ubuntu users/groups.
This commit enforces reuse when the "uid:gid" syntax is used
to fix this.
I also added a test that replicates the Debian base-passwd
passwd/group file to ensure things are ok.
Technically, `data` is a sequence of bytes without a trailing zero,
so the use of `memcmp` seems to be logical here. Besides, this helps get
around a bug that makes `asan` report the false positive mentioned in
#8052.
journalctl: add highlighting for matched substring
Red is used for highligting, the same as grep does. Except when the line is
highlighted red already, because it has high priority, in which case plain ansi
highlight is used for the matched substring.
Coloring is implemented for short and cat outputs, and not for other types.
I guess we could also add it for verbose output in the future.
journalctl: make matching optionally case sensitive
Case sensitive or case insensitive matching can be requested using
--case-sensitive[=yes|no].
Unless specified, matching is case sensitive if the pattern contains any
uppercase letters, and case insensitive otherwise. This matches what
forward-search does in emacs, and recently also --ignore-case in less. This
works surprisingly well, because usually when one is wants to do case-sensitive
matching, the pattern is usually camel-cased. In the less frequent case when
case-sensitive matching is required with an all-lowercase pattern,
--case-sensitive can be used to override the automatic logic.
Alan Jenkins [Sat, 27 Jan 2018 18:02:06 +0000 (18:02 +0000)]
systemd-shutdown: use log_set_prohibit_ipc(true)
Now we have log_set_prohibit_ipc(), let's use it to clarify that
systemd-shutdown is not expected to try and log via journald (which it is
about to kill). We avoided ever asking systemd-shutdown to do this, but
it's more convenient for the reader if they don't have to think about that.
In that sense, it's similar to using assert() to validate a function's
arguments.
Alan Jenkins [Fri, 26 Jan 2018 13:42:53 +0000 (13:42 +0000)]
rationalize interface for opening/closing logging
log_open_console() did not switch from stderr to /dev/console, when
"always_reopen_console" was set. It was necessary to call
log_close_console() first.
By contrast, log_open() did switch between e.g. journald and kmsg according
to the value of "prohibit_ipc".
Let's fix log_open() to respect the values of all the log options, and we
can make log_close_*() private.
Also log_close_console() is changed. There was some precaution, avoiding
closing the console fd if we are not PID 1. I think commit 48a601fe made
a little mistake in leaving this in, and it only served to confuse
readers :).
Also I changed systemd-shutdown. Now we have log_set_prohibit_ipc(), let's
use it to clarify that systemd-shutdown is not expected to try and log via
journald (which it is about to kill). We avoided ever asking it to, but
it's more convenient for the reader if they don't have to think about that.
In that sense, it's similar to using assert() to validate a function's
arguments.
journal: losen restrictions on journal file suffix (#8013)
Previously, we'd refuse open journal files with suffixes that aren't
either .journal or .journal~. With this change we only care when we are
creating the journal file.
I looked over the sources to see whether we ever pass files discovered
by directory enumeration to journal_file_open() without first checking
the suffix (in which case the old check made sense), but I couldn't find
any. hence I am pretty sure removing this check is safe.
meson: use env object instead of string in tags targets
I used 'tags' before because this way we avoided a unnecessary
line about 'env' detection. But we cannot use 'env' in test(), so
previous commit added 'env' detection. We might just as well use
it in custom_target().
This is a bit painful because a separate build of systemd is necessary. The
tests are guarded by tests!=false and slow-tests==true. Running them is not
slow, but compilation certainly is. If this proves unwieldy, we can add a
separate option controlling those builds later.
The build for each sanitizer has its own directory, and we build all fuzzer
tests there, and then pull them out one-by-one by linking into the target
position as necessary. It would be nicer to just build the desired fuzzer, but
we need to build the whole nested build as one unit.
[I also tried making systemd and nested meson subproject. This would work
nicely, but meson does not allow that because the nested target names are the
same as the outer project names. If that is ever fixed, that would be the way
to go.]
v2:
- make sure things still work if memory sanitizer is not available
v3:
- switch to syntax which works with meson 0.42.1 found in Ubuntu
Alan Jenkins [Fri, 26 Jan 2018 22:47:16 +0000 (22:47 +0000)]
pid1: when we can't log to journal, remember our fallback log target
If we have to force the logging to close the journal fd, then we can open
any fallback log target. E.g. kmsg, if the target was the default
JOURNAL_OR_KMSG.
This is the behaviour I would expect from the documentation. I couldn't
find any justification in the code, for why we would want to start dropping
log messages instead of sending them to the fallback target.
This means we will match the behaviour of processes which we fork and which
set `open_when_needed`, and with generators - which use
log_set_prohibit_ipc(true) - which we fork+exec during a reload.
IMO this illustrates that the log_open/log_close interface is too clunky.
So with the behaviour settled, I will refactor the interface in the next
commit :).
networkd: assume no link local addresses for where it isn't used
It turns out that link local doesn't make much sense in its context.
Since link local is disabled by the kernel driver, it's important that
networkd assumes it's off too, so that the link can reach the
"configured" stage, without waiting indefinitely for link local
addresses which will never come.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Björn Esser [Thu, 25 Jan 2018 14:30:15 +0000 (15:30 +0100)]
firstboot: Include <crypt.h> for declaration of crypt() if needed (#7944)
Not every target system may provide a crypt() function in its stdlibc
and may use an external or replacement library, like libxcrypt, for
providing such functions.
See https://fedoraproject.org/wiki/Changes/Replace_glibc_libcrypt_with_libxcrypt.
systemctl: load unit if needed in "systemctl is-active"
Previously, we'd explicitly use "GetUnit()" on the server side to
convert a unit name into a bus path, as that function will return an
error if the unit is not currently loaded. If we'd convert the path on
the client side, and access the unit this way directly the unit would be
loaded automatically in the background.
The old logic was done in order to minimize the effect of "is-active" on
the system, i.e. that a monoitoring command does not itself alter the
state of the system.
however, this is problematic as this can lead to confusing results if
the queried unit name is an alias that currently is not loaded: we'd
claim the unit wasn't active even though this isn't strictly true: the
unit the name is an alias for might be.
Hence, let's simplify the code, and accept that we might end up loading
a unit briefly here, and let's make "systemctl is-active" skip the
GetUnit() thing and calculate the unit path right away.
Yu Watanabe [Thu, 25 Jan 2018 08:45:53 +0000 (17:45 +0900)]
bus-util: fix format of NextElapseUSecRealtime= and LastTriggerUSec=
Before this, `systemctl show` for calendar type timer unit outputs
something like below.
```
NextElapseUSecRealtime=48y 3w 3d 15h
NextElapseUSecMonotonic=0
LastTriggerUSec=48y 3w 3d 3h 41min 44.093095s
LastTriggerUSecMonotonic=0
```
As both NextElapseUSecRealtime= and LastTriggerUSec= are not timespan
but timestamp, this makes format these values by `format_timestamp()`.
Michael Vogt [Wed, 24 Jan 2018 10:18:46 +0000 (11:18 +0100)]
test: add TEST-21-SYSUSERS test
This test tests the systemd-sysuser binary via the --root=$TESTDIR
option and ensures that for the given inputs the expected passwd
and group files will be generated.
Michael Vogt [Wed, 24 Jan 2018 08:26:51 +0000 (09:26 +0100)]
sysuser: use OrderedHashmap
This means we have more predicable behavior for "u foo uid:gid" lines
and also makes the generated files appear in the same order as the
inputs. So e.g.
```
u root 0 - /root
u daemon 1 - /usr/sbin
u games 5:60 - /usr/games
```
will generate
```
root:x:0:0::/root:/bin/sh
daemon:x:1:1::/usr/sbin:/sbin/nologin
games:x:5:60::/usr/games:/sbin/nologin
```
Michael Vogt [Tue, 23 Jan 2018 18:53:08 +0000 (19:53 +0100)]
sysusers: allow uid:gid in sysusers.conf files
This PR allows to write sysuser.conf lines like:
```
u games 5:60 -
```
This will create an a "games" user with uid 5 and games group with
gid 60. This is arguable ugly, however it is required to represent
certain configurations like the default passwd file on Debian and
Ubuntu.
When the ":" syntax is used and there is a group with the given
gid already then no new group is created. This allows writing the
following:
```
g unrelated 60
u games 5:60 -
```
which will create a "games" user with the uid 5 and the primary
gid 60. No group games is created here (might be useful for [1]).
core: rework how we count the n_on_console counter
Let's add a per-unit boolean that tells us whether our unit is currently
counted or not. This way it's unlikely we get out of sync again and
things are generally more robust.
This also allows us to remove the counting logic specific to service
units (which was in fact mostly a copy from the generic implementation),
in favour of fully generic code.
This call determines whether a specific unit currently needs access to
the console. It's a fancy wrapper around
exec_context_may_touch_console() ultimately, however for service units
we'll explicitly exclude the SERVICE_EXITED state from when we report
true.
manager: minor manager_get_show_status() simplification
Since the the whole function ultimately is just a fancy getter for the
show_status field, let's actually return it as last step literally
without an extra needless "if".
This removes LOG_TARGET_SAFE. It's made redundant by the new
"prohibit-ipc" logging flag, as it used to have a similar effect: avoid
logging to the journal/syslog, i.e. any local services in order to avoid
deadlocks when we lock from PID 1 or its utility processes (such as
generators).
All previous users of LOG_TARGET_SAFE are switched over to the new
setting. This makes things a bit safer for all, as not even the
SYSTEMD_LOG_TARGET env var can be used to accidentally log to the
journal anymore in these programs.
log: add new "prohibit_ipc" flag to logging system
If set, we'll avoid logging to any IPC log targets, i.e. syslog or the
journal, but allow stderr, kmsg, console logging.
This is useful as PID 1 wants to turn this off explicitly as long as the
journal is not up.
Previously we'd open/close the log stream to these services whenever
needed but this is incompatible with the "open_when_needed" logic
introduced in #6915, which might open the log streams whenever it likes,
including possibly inside of the child process we fork off that'll
become journald later on. Hence, let's make this all explicit, and
instead of managing when we open/close log streams add a boolean that
clearly prohibits the IPC targets when needed, so that opening can be
done at any time, but will honour this.
log: make log_set_upgrade_syslog_to_journal() take effect immediately
This doesn't matter much, and we don't rely on it, but I think it's much
nicer if we log_set_target() and log_set_upgrade_syslog_to_journal() can
be called in either order and have the same effect.
It is often the case that a file descriptor and its corresponding IO
sd_event_source share a life span. When this is the case, developers will
have to unref the event source and close the file descriptor. Instead, we
can just have the event source take ownership of the file descriptor and
close it when the event source is freed. This is especially useful when
combined with cleanup attributes and sd_event_source_unrefp().
The time-related functions in sd-event.h take as inputs constants (CLOCK_*)
defined in time.h. By including time.h in sd-event.h, we free the developer
from having to do this manually.
Apparently O_NONBLOCK is the modern name used in most documentation and
for most cases in our sources. Let's hence replace the old alias
O_NDELAY and stick to O_NONBLOCK everywhere.
tmpfiles: make "f" lines behaviour match what the documentation says
CHANGE OF BEHAVIOUR — with this commit "f" line's behaviour is altered
to match what the documentation says: if an "argument" string is
specified it is written to the file only when the file didn't exist
before. Previously, it would be appended to the file each time
systemd-tmpfiles was invoked — which is not a particularly useful
behaviour as the tool is not idempotent then and the indicated files
grow without bounds each time the tool is invoked.
I did some spelunking whether this change in behaviour would break
things, but afaics nothing relies on the previous O_APPEND behaviour of
this line type, hence I think it's relatively safe to make "f" lines
work the way the docs say, rather than adding a new modifier for it or
so.
The left side of the || expression is conditionalized on SERVICE_START,
but SERVICE_START is blanket listed on the right side anyway, hence we
can drop the left side entirely without any change in behaviour.
Moreover, if main_pid is initialized, it should be watched, hence this
is even the safe and right thing to do.
core: rework how we track which PIDs to watch for a unit
Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit
interested in SIGCHLD events for them. This scheme allowed a specific
PID to be watched by exactly 0, 1 or 2 units.
With this rework this is replaced by a single hashmap which is primarily
keyed by the PID and points to a Unit interested in it. However, it
optionally also keyed by the negated PID, in which case it points to a
NULL terminated array of additional Unit objects also interested. This
scheme means arbitrary numbers of Units may now watch the same PID.
Runtime and memory behaviour should not be impact by this change, as for
the common case (i.e. each PID only watched by a single unit) behaviour
stays the same, but for the uncommon case (a PID watched by more than
one unit) we only pay with a single additional memory allocation for the
array.
Why this all? Primarily, because allowing exactly two units to watch a
specific PID is not sufficient for some niche cases, as processes can
belong to more than one unit these days:
1. sd_notify() with MAINPID= can be used to attach a process from a
different cgroup to multiple units.
2. Similar, the PIDFile= setting in unit files can be used for similar
setups,
3. By creating a scope unit a main process of a service may join a
different unit, too.
4. On cgroupsv1 we frequently end up watching all processes remaining in
a scope, and if a process opens lots of scopes one after the other it
might thus end up being watch by many of them.
This patch hence removes the 2-unit-per-PID limit. It also makes a
couple of other changes, some of them quite relevant:
- manager_get_unit_by_pid() (and the bus call wrapping it) when there's
ambiguity will prefer returning the Unit the process belongs to based on
cgroup membership, and only check the watch-pids hashmap if that
fails. This change in logic is probably more in line with what people
expect and makes things more stable as each process can belong to
exactly one cgroup only.
- Every SIGCHLD event is now dispatched to all units interested in its
PID. Previously, there was some magic conditionalization: the SIGCHLD
would only be dispatched to the unit if it was only interested in a
single PID only, or the PID belonged to the control or main PID or we
didn't dispatch a signle SIGCHLD to the unit in the current event loop
iteration yet. These rules were quite arbitrary and also redundant as
the the per-unit handlers would filter the PIDs anyway a second time.
With this change we'll hence relax the rules: all we do now is
dispatch every SIGCHLD event exactly once to each unit interested in
it, and it's up to the unit to then use or ignore this. We use a
generation counter in the unit to ensure that we only invoke the unit
handler once for each event, protecting us from confusion if a unit is
both associated with a specific PID through cgroup membership and
through the "watch_pids" logic. It also protects us from being
confused if the "watch_pids" hashmap is altered while we are
dispatching to it (which is a very likely case).
- sd_notify() message dispatching has been reworked to be very similar
to SIGCHLD handling now. A generation counter is used for dispatching
as well.
This also adds a new test that validates that "watch_pid" registration
and unregstration works correctly.
core: unify call we use to synthesize cgroup empty events when we stopped watching any unit PIDs
This code is very similar in scope and service units, let's unify it in
one function. This changes little for service units, but for scope units
makes sure we go through the cgroup queue, which is something we should
do anyway.
service: don't send out dbus change notifications spuriously on SIGCHLD
Let's send them out only if the main or control processe exited and we
recorded a new exit status that is worth reporting. But if any other
service process died this is nothing to report since we don't expose any
properties about that anyway.
core: fix manager_get_unit_by_pid() special casing of manager PID
Previously, we'd hard map PID 1 to the manager scope unit. That's wrong
however when we are run in --user mode, as the PID 1 is outside of the
subtree we manage and the manager PID might be very differently. Correct
that by checking for getpid() rather than hardcoding 1.