nspawn: add support for executing OCI runtime bundles with nspawn
This is a pretty large patch, and adds support for OCI runtime bundles
to nspawn. A new switch --oci-bundle= is added that takes a path to an
OCI bundle. The JSON file included therein is read similar to a .nspawn
settings files, however with a different feature set.
Implementation-wise this mostly extends the pre-existing Settings object
to carry additional properties for OCI. However, OCI supports some
concepts .nspawn files did not support yet, which this patch also adds:
1. Support for "masking" files and directories. This functionatly is now
also available via the new --inaccesible= cmdline command, and
Inaccessible= in .nspawn files.
2. Support for mounting arbitrary file systems. (not exposed through
nspawn cmdline nor .nspawn files, because probably not a good idea)
3. Ability to configure the console settings for a container. This
functionality is now also available on the nspawn cmdline in the new
--console= switch (not added to .nspawn for now, as it is something
specific to the invocation really, not a property of the container)
4. Console width/height configuration. Not exposed through
.nspawn/cmdline, but this may be controlled through $COLUMNS and
$LINES like in most other UNIX tools.
5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on
the cmdline, since containers likely have different user tables, and
the existing --user= switch appears to be the better option)
6. OCI hook commands (no exposed in .nspawn/cmdline, as very specific to
OCI)
7. Creation of additional devices nodes in /dev. Most likely not a good
idea, hence not exposed in .nspawn/cmdline. There's already --bind=
to achieve the same, which is the better alternative.
8. Explicit syscall filters. This is not a good idea, due to the skewed
arch support, hence not exposed through .nspawn/cmdline.
9. Configuration of some sysctls on a whitelist. Questionnable, not
supported in .nspawn/cmdline for now.
10. Configuration of all 5 types of capabilities. Not a useful concept,
since the kernel will reduce the caps on execve() anyway. Not
exposed through .nspawn/cmdline as this is not very useful hence.
Note that this only implements the OCI runtime logic itself. It does not
provide a runc-compatible command line tool. This is left for a later
PR. Only with that in place tools such as "buildah" can use the OCI
support in nspawn as drop-in replacement.
Currently still missing is OCI hook support, but it's already parsed and
everything, and should be easy to add. Other than that it's OCI is
implemented pretty comprehensively.
There's a list of incompatibilities in the nspawn-oci.c file. In a later
PR I'd like to convert this into proper markdown and add it to the
documentation directory.
capability: let's protect against the kernel eventually doing more than 64 caps
Everyone will be in trouble then (as quite widely caps are store in
64bit fields). But let's protect ourselves at least to the point that we
ignore all higher caps for now.
capability: keep CAP_SETPCAP while dropping bounding caps
The kernel only allows dropping bounding caps as long as we have
CAP_SETPCAP. Hence, let's keep that before dropping the bounding caps,
and afterwards drop them too.
man: reorder and add examples to systemd-analyze(1)
The number of verbs supported by systemd-analyze has grown quite a bit, and the
man page has become an unreadable wall of text. Let's put each verb in a
separate subsection, grouping similar verbs together, and add a lot of examples
to guide the user.
Frantisek Sumsal [Fri, 15 Mar 2019 09:05:33 +0000 (10:05 +0100)]
test: use PBKDF2 instead of Argon2 in cryptsetup...
to reduce memory requirements for volume manipulation. Also,
to further improve the test performance, reduce number of PBKDF
iterations to 1000 (allowed minimum).
tests: install /usr/bin/dbus-broker when using dbus-broker
We'd install the service file, and then dbus-broker-launcher because it is
mentioned in ExecStart=, but not the main executable, so nothing would work.
Let's just install dbus-broker executables if found. They are small, so this
doesn't matter much, and is much easier than figuring the exact conditions
under which dbus-broker will be used instead of dbus-daemon.
The "include" files had type "book" for some raeason. I don't think this
is meaningful. Let's just use the same everywhere.
$ perl -i -0pe 's^..DOCTYPE (book|refentry) PUBLIC "-//OASIS//DTD DocBook XML V4.[25]//EN"\s+"http^<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"\n "http^gms' man/*.xml
David Rheinsberg [Thu, 14 Mar 2019 12:34:13 +0000 (13:34 +0100)]
sd-bus: skip sending formatted UIDs via SASL
The dbus external authentication takes as optional argument the UID the
sender wants to authenticate as. This uid is purely optional. The
AF_UNIX socket already conveys the same information through the
auxiliary socket data, so we really don't have to provide that
information.
Unfortunately, there is no way to send empty arguments, since they are
interpreted as "missing argument", which has a different meaning. The
SASL negotiation thus changes from:
AUTH EXTERNAL <uid>
NEGOTIATE_UNIX_FD (optional)
BEGIN
to:
AUTH EXTERNAL
DATA
NEGOTIATE_UNIX_FD (optional)
BEGIN
And thus the replies we expect as a client change from:
OK <server-id>
AGREE_UNIX_FD (optional)
to:
DATA
OK <server-id>
AGREE_UNIX_FD (optional)
Since the old sd-bus server implementation used the wrong reply for
"AUTH" requests that do not carry the arguments inlined, we decided to
make sd-bus clients accept this as well. Hence, sd-bus now allows
"OK <server-id>\r\n" replies instead of "DATA\r\n" replies.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
David Rheinsberg [Thu, 14 Mar 2019 12:33:28 +0000 (13:33 +0100)]
sd-bus: fix SASL reply to empty AUTH
The correct way to reply to "AUTH <protocol>" without any payload is to
send "DATA" rather than "OK". The "DATA" reply triggers the client to
respond with the requested payload.
In fact, adding the data as hex-encoded argument like
"AUTH <protocol> <hex-data>" is an optimization that skips the "DATA"
roundtrip. The standard way to perform an authentication is to send the
"DATA" line.
This commit fixes sd-bus to properly send the "DATA" line. Surprisingly
no existing implementation depends on this, as they all pass the data
directly as argument to "AUTH". This will not work if we want to pass
an empty argument, though.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Jonathan Lebon [Tue, 12 Mar 2019 19:23:25 +0000 (15:23 -0400)]
units: update catalog after systemd-tmpfiles runs
`systemd-journal-catalog-update.service` writes to `/var`. However, it's
not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means
that it may run before or after.
This is an issue for Fedora CoreOS, which uses Ignition. We want to be
able to prepare `/var` on first boot from the initrd, where the SELinux
policy is not loaded yet. This means that the hierarchy under `/var` is
not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it
gets relabeled once `/var` gets mounted post-switchroot.
So any service that tries to access `/var` before `systemd-tmpfiles`
relabels it is likely to hit `EACCES`.
Fix this by simply ordering `systemd-journal-catalog-update.service`
after `systemd-tmpfiles-setup.service`. This is also clearer since the
tmpfiles entries are the canonical source of how `/var` should be
populated.
For more context on this, see:
https://github.com/coreos/ignition/issues/635#issuecomment-446620297
Benjamin Berg [Fri, 8 Mar 2019 16:42:23 +0000 (17:42 +0100)]
hwdb: Fix airplane mode triggering when resuming HP Spectre x360 13
On these devices the key randomly fires during/after suspend/resume
triggering spurious airplane mode changes. The scancode simply needs to
be ignored.
Yu Watanabe [Tue, 12 Mar 2019 02:35:23 +0000 (11:35 +0900)]
network: automatically pick an address on link when L2TP.Local= is not specified
This makes L2TP.Local= support an empty string, 'auto', 'static', and
'dynamic'. When one of the values are specified, a local address is
automatically picked from the local interface of the tunnel.
Frantisek Sumsal [Wed, 13 Mar 2019 09:07:44 +0000 (10:07 +0100)]
test: avoid double-fsck'ing of the rootfs on Arch
Since systemd 206 the combination of systemd and mkinitcpio
causes, under certain conditions, the rootfs to be double fsck'd.
Symptoms:
```
:: performing fsck on '/dev/sda1'
systemd: clean, 3523/125488 files, 141738/501760 blocks
********************** WARNING **********************
* *
* The root device is not configured to be mounted *
* read-write! It may be fsck'd again later. *
* *
*****************************************************
<snip>
[ OK ] Started File System Check on Root Device
```
This occurs when neither 'ro' or 'rw', or only 'ro' is present
on the kernel command line. The solution is to mount the roofs
as read-write on the kernel command line, so systemd knows to not fsck
it again.
Adam Jackson [Tue, 12 Mar 2019 19:22:13 +0000 (20:22 +0100)]
login: mark nomodeset fb devices as master-of-seat
When 'nomodeset' is specified, there's no DRM driver to take over from
efifb. This means no device will be marked as a seat master, so gdm will
never find a sufficiently active seat to start on.
I'm not aware of an especially good way to detect this through a proper
kernel API, so check for the word 'nomodeset' on the command line and
allow fbdev devices to be seat masters if found.
For https://bugzilla.redhat.com/show_bug.cgi?id=1683197.