Christian Hesse [Fri, 15 Sep 2017 19:28:24 +0000 (21:28 +0200)]
fix path in btrfs rule (#6844)
Commit 0e8856d2 (assemble multidevice btrfs volumes without external
tools (#6607)) introduced a call to udevadm. That lives in @rootbindir@,
not @rootlibexecdir@. So fix the path.
cryptsetup: make sure we invoke the cryptsetup tools with a shared keyring
We want that cryptsetup can cache keys between multiple invocations, and
it does so via the root user's user keyring, hence let's share it among
services.
core: add new per-unit setting KeyringMode= for controlling kernel keyring setup
Usually, it's a good thing that we isolate the kernel session keyring
for the various services and disconnect them from the user keyring.
However, in case of the cryptsetup key caching we actually want that
multiple instances of the cryptsetup service can share the keys in the
root user's user keyring, hence we need to be able to disable this logic
for them.
This adds KeyringMode=inherit|private|shared:
inherit: don't do any keyring magic (this is the default in systemd --user)
private: a private keyring as before (default in systemd --system)
shared: the new setting
Patrik Flykt [Mon, 21 Aug 2017 10:41:20 +0000 (13:41 +0300)]
sd-radv: Add Router Advertisement DNS Search List option
Add Router Advertisement DNS Search List option as specified
in RFC 8106. The search list option uses and identical option
header as the RDNSS option and therefore the option header
structure can be reused.
If systemd is compiled with IDNA support, internationalization
of the provided search domain is applied, after which the search
list is written in wire format into the DNSSL option.
Martin Pitt [Fri, 15 Sep 2017 07:21:49 +0000 (09:21 +0200)]
Revert "device : reload when udev generates a "changed" event" (#6836)
This reverts commit 0ffddc6e2c6e19e5dc81812aee9fbe964059f3aa. That
causes a rather severe disruption of D-Bus and other services when e. g.
restarting local-fs.target (as spotted by the "storage" test regression).
core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824)
If two separate log streams are connected to stdout and stderr, let's
make sure $JOURNAL_STREAM points to the latter, as that's the preferred
log destination, and the environment variable has been created in order
to permit services to automatically upgrade from stderr based logging to
native journal logging.
Martin Pitt [Fri, 15 Sep 2017 05:32:50 +0000 (07:32 +0200)]
cryptsetup: fix unused variable (#6833)
When building without veracrypt, gcc warns
../src/cryptsetup/cryptsetup.c:55:13: warning: ‘arg_tcrypt_veracrypt’ defined but not used [-Wunused-variable]
static bool arg_tcrypt_veracrypt = false;
networkd: add support to configure IP Rule (#5725)
Routing Policy rule manipulates rules in the routing policy database control the
route selection algorithm.
This work supports to configure Rule
```
[RoutingPolicyRule]
TypeOfService=0x08
Table=7
From= 192.168.100.18
```
```
ip rule show
0: from all lookup local
0: from 192.168.100.18 tos 0x08 lookup 7
```
V2 changes:
1. Added logic to handle duplicate rules.
2. If rules are changed or deleted and networkd restarted
then those are deleted when networkd restarts next time
Alan Jenkins [Thu, 14 Sep 2017 19:43:43 +0000 (20:43 +0100)]
units: don't kill the emergency shell when sysinit.target is triggered (#6765)
Why
---
The advantage of this is that starting sysinit.target from the emergency
shell will no longer kill the emergency shell and lock you out of the
system. Our docs already claimed that emergency.target was useful for
"starting individual units in order to continue the boot process in steps".
This resolves #6509 for my purposes.
Remaining limitation
--------------------
Starting getty.target will still kill the shell, and if you don't have a
root password you will then be locked out at that point. This is relevant
to distributions which patch the sulogin system to permit logins when the
root password is locked. Both Debian and RedHat used to follow this
behaviour! Debian have been discussing what they could replace it with at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=806852
So this doesn't quite achieve perfection, but I think it's a worthwhile
change. It should be easier to understand the logic now it doesn't have
such a big hole in it. Repairing the sysinit stage of the boot is the main
reason we have emergency.target. And as discussed in the issue,
sysinit.target gets pulled in implicitly as soon as any DefaultDependencies
service is activated.
How
---
sysinit.target only needs to conflict with emergency.target. It didn't
need to conflict with emergency.service as well. In theory the conflicts
are pointless, we could just change the dependency of sysinit.target on
local-fs.target from Wants to Requires. However, doing so would mean that
when local-fs fails, the screen is flooded with yellow [DEPEND] failures.
That would hinder the poor unfortunate admin, so let's not do that.
There is no additional ordering requirement against emergency. If the
failure happens, the job for sysinit will be cancelled instantly. We don't
need to worry about when sysinit.target and its dependents would be
stopped, because sysinit waits for local-fs before it starts.
emergency.target is still necessarily stopped once we reach sysinit
(you can't express a one-way conflict in pure unit directives).
This is largely cosmetic... though perhaps it symbolizes that you're no
longer in Emergency Mode if System Initialization is successful ;-).
As a secondary advantage, the getty's which conflict on rescue.service now
need to conflict on emergency.service as well. This makes the system more
uniform and simpler to understand.
The only other effect this should have is that
`systemctl start emergency.target` is now practically the same as
`systemctl start rescue.target`. The only units this command will stop are
the conflicting getty units. Neither of those commands should ever be
used. E.g. they will not stop the gdm.service unit on Fedora 26.
Felipe Sateler [Thu, 14 Sep 2017 17:51:20 +0000 (14:51 -0300)]
shared: end string with % if one was found at the end of a expandible string (#6828)
Current behavior is that %X where X is an unidentified specifier, then the result is
the same %X string. This was not the case when the string ended with a stray %, where
the character would have not been output. Lets add that missing character.
core/manager: when running in test mode, use a temp dir for generated stuff
When running through systemd-analyze verify or with --test, we would
not run generators (environment or unit). But at the end, we would nuke
the generator dirs anyway.
Simplify things by actually running generators of both types, but redirecting
their output to a temporary directory. This has the advantage that we test more
code, and the verification is more complete.
Since now we are not touching the real generator directories, we also don't
delete them, which fixes #5609.
pid1: improve the check guarding unit_file_preset_all()
When running in systemd-analyze verify, first_boot was initialized to -1
and never changed, so we'd try to run unit_file_preset_all(). Change the
check to > 0 which is more correct. Also, add a separate test for !test_run,
since we wouldn't want to run presets even if we were in first boot
(or /etc was empty for whatever other reason).
timer: don't use persietent file timestamps from the future (#6823)
Also, use the mtime rather than the atime of the timestamp file. While
the atime is not completely wrong, the mtime appears more appropriate
as that's what we actually explicitly change, and is not effected by
mere reading.
core: don't synthesize empty list when empty string is read in config_parse_strv()
This was added to make
https://bugs.freedesktop.org/show_bug.cgi?id=62558 work, which has long
been removed, hence let's revert to the original behaviour and fully
flush out the list when an empty string is assigned.
Let's lock things down a bit, and maintain a list of what's permitted
rather than a list of what's prohibited in nspawn (also to make things a
bit more like Docker and friends).
Note that this slightly alters the effect of --system-call-filter=, as
now the negative list now takes precedence over the positive list.
However, given that the option is just a few days old and not included
in any released version it should be fine to change it at this point in
time.
Note that the whitelist is good chunk more restrictive thatn the
previous blacklist. Specifically:
- fanotify is not permitted (given the buffer size issues it's
problematic in containers)
- nfsservctl is not permitted (NFS server support is not virtualized)
- pkey_xyz stuff is not permitted (really new stuff I don't grok)
- @cpu-emulation is prohibited (untested legacy stuff mostly, and if
people really want to run dosemu in nspawn, they should use
--system-call-filter=@cpu-emulation and all should be good)
Include the waid syscalls. If we permit forking then we should also
permit waiting for a process.
Similar to that: also permit determining the usage counters for
processes.
Include calls to determine process/thread identity. They have little
impact security-wise, but are very likely used when process management
of any form is done.
Also, add rt_sigqueueinfo + rt_tgsigqueueinfo as they are similar to
kill() and friends, but permit passing along a userdata ptr.
Let's add fremovexattr which was the only xattr syscall so far missing
from the group, even though lremovexattr and friends where included.
Add inotify_init, which is an older (but still supported) version of
inotify_init1.
Add oldfstat, oldlstat, oldstat which are old versions of the stat
syscalls on some archs.
Add utime, which is an older more limited version of utimes and
utimensat.
Enclose the "statx" entry in some ifdeffery to ensure libseccomp
actually knows the syscall. If libseccomp doesn't know it, then we'd get
EINVAL rather than EDOM (which is what is returned if a syscall is known
but not available on the local system) when resolving the syscall name
and we really don't want that, as we use the EDOM vs. EINVAL check for
determining whether a syscall makes sense at all.
Let's add _llseek which is the syscall name on some archs that on others
is simply lseek (due to 64bit vs 32bit off_t confusion). Also, let's
sort things alphabetically.
Let's add more of the most basic operations to "@default" as absolute
baseline needed by glibc and such to operate. Specifically:
futex, get_robust_list, get_thread_area, membarrier, set_robust_list,
set_thread_area, set_tid_address are all required to properly implement
mutexes and other thread synchronization logic. Given that a ton of
datastructures are protected by mutexes (such as stdio and such), let's
just whitelist this by default, so that things can just work.
restart_syscall is used to implement EAGAIN SA_RESTART stuff in some
archs, and synthesized by the kernel without any explicit user logic,
hence let's make this work out of the box.
core: rework how we treat specifiers in Environment= of transient units
Let's validate the data passed in after resolving specifiers, but let's
write out to the unit snippet the list without specifiers applied. This
way the pre-existing comment actually starts matching what is actually
implemented.
core: support specifier expansion in PassEnvironment=
I can't come up with any usecase for this, but let's add this here, to
match what we support for Environment=. It's kind surprising if we
support specifier expansion for some environment related settings, but
not for others.
core: add new UnsetEnvironment= setting for unit files
With this setting we can explicitly unset specific variables for
processes of a unit, as last step of assembling the environment block
for them. This is useful to fix #6407.
While we are at it, greatly expand the documentation on how the
environment block for forked off processes is assembled.
man: unify titling, fix description of precedence in sysusers.d(5)
Fixes #6639.
(This behaviour of systemd-sysusers is long established, so it's better
to adjust the documentation rather than change the code. If there are any
situations out there where it matters, users must have adjusted to the
current behaviour.)
nss-systemd,sysusers: make sure sysusers doesn't get confused by nss-systemd (#6812)
In nss-systemd we synthesize user entries for "nobody" and "root", as
fallback if we boot up with an entirely empty /etc. This is supposed to
be a fallback only though, and it's intended that both users exists
regularly in /etc/passwd + /etc/group. Before this patch
systemd-sysusers would never create the entries however as it notices
the synthetic entries. Let's add a way how systemd-sysusers can tell
nss-systemd not to synthesize the entries for itself.
man: rework grammatical form of sentences in a table in systemd.exec(5)
"Currently, the following values are defined: xxx: in case <condition>" is
awkward because "xxx" is always defined unconditionally. It is _used_ in case
<condition> is true. Correct this and a bunch of other places where the
sentence structure makes it unclear what is the subject of the sentence.
As it turns out the authentication phase times out too often than is
good, mostly due to PRNG pools not being populated during boot. Hence,
let's increase the authentication timeout from 25s to 90s, to cover for
that.
(Note that we leave the D-Bus method call timeout at 25s, matching the
reference implementation's value. And if the auth phase managed to
complete then the pools should be populated enough and mehtod calls
shouldn't take needlessly long anymore).
pager: let's create pager fds with O_CLOEXEC first
We make copies (without O_CLOEXEC) of the fds anyway before using them,
hence let's be safe and create them with O_CLOEXEC first, so that we
don't run into issues should pager_open() be called in a threaded
environment where another thread fork()s at the wrong time and ends up
with fds not marked O_CLOEXEC.
main: skip many initialization steps when running in --test mode
Most importantly, don't collect open socket activation fds when in
--test mode. This specifically created a problem because we invoke
pager_open() beforehand (which these days makes copies of the original
stdout/stderr in order to be able to restore them when the pager goes
away) and we might mistakenly the fd copies it creates as socket
activation fds.
run: add new --pipe option for including "systemd-run" commands in shell pipelines
In this mode, we'll directly connect stdin/stdout/stderr of the invoked
service with whatever systemd-run itself is invoked on. This allows
inclusion of "systemd-run" commands in shell pipelines, as unlike
"--pty" this means EOF of stdin/stdout/stderr are propagated
independently.
If --pty and --pipe are combined systemd-run will automatically pick the
right choice for the context it is invoked in, i.e. --pty when invoked
on a TTY, and --pipe otherwise.
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.
This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.
(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)
Previously we were loading kernel modules on all device events save
for "remove". With the introduction of KOBJ_BIND/KOBJ_UNBIND this causes
issues, as driver modules that have devices bound to their drivers get
immediately reloaded, and it appears to the user that module unloading
does not work.
Let's change the rules to only load modules on "add" events instead.
seccomp: split out inner loop code of seccomp_add_syscall_filter_set()
Let's add a new helper function seccomp_add_syscall_filter_item() that
contains the inner loop code of seccomp_add_syscall_filter_set(). This
helper function we can then export and make use of elsewhere.