nspawn: optionally, automatically allocated --bind=/--overlay source from /var/tmp
This extends the --bind= and --overlay= syntax so that an empty string as source/upper
directory is taken as request to automatically allocate a temporary directory
below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination
with the "+" path extension this permits a switch "--overlay=+/var::/var" in
order to use the container's shipped /var, combine it with a writable temporary
directory and mount it to the runtime /var of the container.
nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.
This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
nspawn: make use of CHASE_NON_EXISTING when locking image
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
fs-util: add new CHASE_NON_EXISTING flag to chase_symlinks()
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
fs-util: change chase_symlinks() behaviour in regards to escaping the root dir
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
fs-util: add chase_symlinks_prefix() and extend comments
chase_symlinks() currently expects a fully qualified, absolute path, relative
to the host's root as first argument. Which is useful in many ways, and similar
to the paths unlink(), rename(), open(), … expect. Sometimes it's however
useful to first prefix the specified path with the specified root directory.
Add a new call chase_symlinks_prefix() for this, that is a simple wrapper.
nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory=
As suggested in PR #3667.
This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.
This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.
nspawn: properly handle image/directory paths that are symlinks
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.
tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
Martin Pitt [Thu, 24 Nov 2016 09:38:01 +0000 (10:38 +0100)]
hwdb/parse_hwdb.py: open files with UTF-8 mode
pyparsing uses the system locale by default, which in the case of 'C' (in lots
of build environment) will fail with a UnicodeDecodeError. Explicitly open it
with UTF-8 encoding to guard against this.
Martin Pitt [Wed, 30 Nov 2016 08:20:15 +0000 (09:20 +0100)]
parse_hwdb: fix to work with pyparsing 2.1.10
pyparsing 2.1.10 fixed the handling of LineStart to really just apply to line
starts and not ignore whitespace and comments any more. Adjust EMPTYLINE to
this.
Many thanks to Paul McGuire for pointing this out!
Martin Pitt [Wed, 30 Nov 2016 07:02:49 +0000 (08:02 +0100)]
test: retry checking for transient hostname in hostnamectl (#4769)
Sometimes setting the transient hostname does not happen synchronously, so
retry up to five times. It is not yet clear whether this is legitimate
behaviour or an underlying bug, but this will at least show whether the wrong
transient hostname is just a race condition or permanently wrong.
Dongsu Park [Tue, 29 Nov 2016 19:16:55 +0000 (20:16 +0100)]
cgroup: support prefix "-" in cgroups whitelisting entries (#4687)
So far systemd-nspawn container has been creating files under
/run/systemd/inaccessible, no matter whether it's running in user
namespace or not. That's fine for regular files, dirs, socks, fifos.
However, it's not for block and character devices, because kernel
doesn't allow them to be created under user namespace. It results
in warnings at booting like that:
====
Couldn't stat device /run/systemd/inaccessible/chr
Couldn't stat device /run/systemd/inaccessible/blk
====
Thus we need to have the cgroups whitelisting handler to silently ignore
a file, when the device path is prefixed with "-". That's exactly the
same convention used in directives like ReadOnlyPaths=. Also insert the
prefix "-" to inaccessible entries.
Stefan Berger [Tue, 29 Nov 2016 15:47:20 +0000 (10:47 -0500)]
ima: Write the policy filename into IMA's sysfs policy file (#4766)
IMA validates file signatures based on the security.ima xattr. As of
Linux-4.7, instead of copying the IMA policy into the securityfs policy,
the IMA policy pathname can be written, allowing the IMA policy file
signature to be validated.
This patch modifies the existing code to first attempt to write the
pathname, but on failure falls back to copying the IMA policy contents.
Jouke Witteveen [Thu, 24 Nov 2016 20:05:47 +0000 (21:05 +0100)]
service: only fail notify services on empty cgroup during start
We stay in the SERVICE_START while no READY=1 notification message has
been received. When we are in the SERVICE_START_POST state, we have
already received a ready notification. Hence we should not fail when the
cgroup becomes empty in that state.
units: add system-update-cleanup.service to guard against offline-update loops
Note: the name is "system-update-cleanup.service" rather than
"system-update-done.service", because it should not run normally, and also
because there's already "systemd-update-done.service", and having them named
so similarly would be confusing.
In https://bugzilla.redhat.com/show_bug.cgi?id=1395686 the system repeatedly
entered system-update.target on boot. Because of a packaging issue, the tool
that created the /system-update symlink could be installed without the service
unit that was supposed to perform the upgrade (and remove the symlink). In
fact, if there are no units in system-update.target, and /system-update symlink
is created, systemd always "hangs" in system-update.target. This is confusing
for users, because there's no feedback what is happening, and fixing this
requires starting an emergency shell somehow, and also knowing that the symlink
must be removed. We should be more resilient in this case, and remove the
symlink automatically ourselves, if there are no upgrade service to handle it.
This adds a service which is started after system-update.target is reached and
the symlink still exists. It nukes the symlink and reboots the machine. It
should subsequently boot into the default default.target.
This is a more general fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1395686 (the packaging issue was
already fixed).
- use "service" instead of "script", because various offline updaters that we have
aren't really scripts, e.g. dnf-plugin-system-upgrade, packagekit-offline-update,
fwupd-offline-update.
- strongly recommend After=sysinit.target, Wants=sysinit.target
- clarify a bit what should happen when multiple update services are started
- replace links to the wiki with refs to the man page that replaced it.
Tom Gundersen [Mon, 28 Nov 2016 19:42:40 +0000 (20:42 +0100)]
networkd: move event loop handling out of the manager (#4723)
This will allow us to have several managers sharing an event loop
and running in parallel, as if they were running in separate processes.
The long term-aim is to allow networkd to be split into separate
processes, so restructure the code to make this simpler.
For now we drop the exit-on-idle logic, as this was anyway severely
restricted at the moment. Once split, we will revisit this as it may
then make more sense again.
Martin Pitt [Mon, 28 Nov 2016 11:35:49 +0000 (12:35 +0100)]
test: make transient hostname tests fail verbosely (#4754)
This test fails sometimes but it is hard to reproduce, so we need more
information what happens. Set journal log level to "debug" for the entirety of
networkd-test.py, and show networkd's and hostnamed's journals and the DHCP
server log on failure of the two test_transient_hostname* tests. Also sync the
journal before querying it to get more precise output.
Dave Reisner [Sun, 27 Nov 2016 22:05:39 +0000 (17:05 -0500)]
device: Avoid calling unit_free(NULL) in device setup logic (#4748)
Since a581e45ae8f9bb5c, there's a few function calls to
unit_new_for_name which will unit_free on failure. Prior to this commit,
a failure would result in calling unit_free with a NULL unit, and hit an
assertion failure, seen at least via device_setup_unit:
Assertion 'u' failed at src/core/unit.c:519, function unit_free(). Aborting.
fix journald startup problem when code is compiled with -DNDEBUG (#4735)
Similar to this patch from here:
http://systemd-devel.freedesktop.narkive.com/AvfCbi6c/patch-0-3-using-assert-se-on-actions-with-side-effects-on-test-cases
If the code is compiled with -DNDEBUG which is the default for
some embedded buildsystems, systemd-journald does not startup
and silently fails.
calendarspec: reject strings with spurious spaces and signs
strtoul() parses leading whitespace and an optional sign;
check that the first character is a digit to prevent odd
specifications like "00: 00: 00" and "-00:+00/-1".
Martin Pitt [Thu, 24 Nov 2016 09:38:01 +0000 (10:38 +0100)]
hwdb/parse_hwdb.py: open files with UTF-8 mode
pyparsing uses the system locale by default, which in the case of 'C' (in lots
of build environment) will fail with a UnicodeDecodeError. Explicitly open it
with UTF-8 encoding to guard against this.
Martin Pitt [Tue, 22 Nov 2016 07:41:51 +0000 (08:41 +0100)]
networkd: allow networkd to start in early boot
With the previous improvements, networkd.service's "After=dbus.service" can now
be dropped. That ordering effectively forced networkd.service to run in late
boot only (dbus.service was rejected to run in early boot in
https://bugs.freedesktop.org/show_bug.cgi?id=98254).
Martin Pitt [Tue, 22 Nov 2016 07:36:20 +0000 (08:36 +0100)]
networkd: set DHCP-acquired timezone and hostname after connecting to D-Bus
If setting the received timezone or transient hostname fails because D-Bus is
not (yet) up, store the data in the Manager object and try again after
connecting to D-Bus.
Martin Pitt [Tue, 22 Nov 2016 07:05:18 +0000 (08:05 +0100)]
networkd: allow networkd to set the timezone in timedated
systemd-networkd runs as user "systemd-network" and thus is not privileged to
set the timezone acquired from DHCP:
systemd-networkd[4167]: test_eth42: Could not set timezone: Interactive authentication required.
Similarly to commit e8c0de912, add a polkit rule to grant
org.freedesktop.timedate1.set-timezone to the "systemd-network" system user.
Move the polkit rules from src/hostname/ to src/network/ to avoid too many
small distributed policy snippets (there might be more in the future), as it's
easier to specify the privileges for a particular subject in this case.
Add NetworkdClientTest.test_dhcp_timezone() test case to verify this (for
all people except those in Pacific/Honolulu, there the test doesn't prove
anything -- sorry ☺ ).
Franck Bui [Wed, 23 Nov 2016 15:31:24 +0000 (16:31 +0100)]
core: consider SIGTERM as a clean exit status for initrd-switch-root.service (#4713)
Since commit 1f0958f640b8717, systemd considers SIGTERM for short-running
services (aka Type=oneshot) as a failure.
This can be an issue with initrd-switch-root.service as the command run by this
service (in order to switch to the new rootfs) may still be running when
systemd does the switch.
However PID1 sends SIGTERM to all remaining processes right before
switching and initrd-switch-root.service can be one of those.
After systemd is reexecuted and its previous state is deserialized, systemd
notices that initrd-switch-root.service was killed with SIGTERM and considers
this as a failure which leads to the emergency shell.
To prevent this, this patch teaches systemd to consider a SIGTERM exit as a
clean one for this service.
It also removes "KillMode=none" since this is pretty useless as the service is
never stopped by systemd but it either exits normally or it's killed by a
SIGTERM as described previously.
build-sys: check for lz4 in the old and new numbering scheme (#4717)
lz4 upstream decided to switch to an incompatible numbering scheme
(1.7.3 follows 131, to match the so version).
PKG_CHECK_MODULES does not allow two version matches for the same package,
so e.g. lz4 < 10 || lz4 >= 125 cannot be used. Check twice, once for
"new" numbers (anything below 10 is assume to be new), once for the "old"
numbers (anything above >= 125). This assumes that the "new" versioning
will not get to 10 to quickly. I think that's a safe assumption, lz4 is a
mature project.
Janne Heß [Wed, 23 Nov 2016 04:19:56 +0000 (05:19 +0100)]
Document an edge-case with resume and mounting (#4581)
When trying to read keyfiles from an encrypted partition to unlock the swap,
a cyclic dependency is generated because systemd can not mount the
filesystem before it has checked if there is a swap to resume from.
Jouke Witteveen [Sat, 1 Oct 2016 12:06:48 +0000 (14:06 +0200)]
service: fix main processes exit behavior for type notify services
Before this commit, when the main process of a Type=notify service exits the
service would enter a running state without passing through the startup post
state. This meant ExecStartPost= from being executed and allowed follow-up
units to start too early (before the ready notification).
Additionally, when RemainAfterExit=yes is used on a Type=notify service, the
exit status of the main process would be disregarded.
After this commit, an unsuccessful exit of the main process of a Type=notify
service puts the unit in a failed state. A successful exit is inconsequential
in case RemainAfterExit=yes. Otherwise, when no ready notification has been
received, the unit is put in a failed state because it has never been active.
When all processes in the cgroup of a Type=notify service are gone and no ready
notification has been received yet, the unit is also put in a failed state.
Jouke Witteveen [Tue, 22 Nov 2016 16:39:56 +0000 (17:39 +0100)]
service: introduce protocol error type
Introduce a SERVICE_FAILURE_PROTOCOL error type for when a service does
not follow the protocol.
This error type is used when a pid file is expected, but not delivered.
nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.
This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.
Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
When mountint a loopback image, we need a temporary root directory we can mount
stuff to. Make sure to actually remove it when exiting, so that we don't leave
stuff around in /tmp unnecessarily.
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.
As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
This changes the return value a bit: 1 will be returned if the value is
changed. But the return value was not documented, and the change should
be for the good anyway. Current callers don't care.