units: add system-update-cleanup.service to guard against offline-update loops
Note: the name is "system-update-cleanup.service" rather than
"system-update-done.service", because it should not run normally, and also
because there's already "systemd-update-done.service", and having them named
so similarly would be confusing.
In https://bugzilla.redhat.com/show_bug.cgi?id=1395686 the system repeatedly
entered system-update.target on boot. Because of a packaging issue, the tool
that created the /system-update symlink could be installed without the service
unit that was supposed to perform the upgrade (and remove the symlink). In
fact, if there are no units in system-update.target, and /system-update symlink
is created, systemd always "hangs" in system-update.target. This is confusing
for users, because there's no feedback what is happening, and fixing this
requires starting an emergency shell somehow, and also knowing that the symlink
must be removed. We should be more resilient in this case, and remove the
symlink automatically ourselves, if there are no upgrade service to handle it.
This adds a service which is started after system-update.target is reached and
the symlink still exists. It nukes the symlink and reboots the machine. It
should subsequently boot into the default default.target.
This is a more general fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1395686 (the packaging issue was
already fixed).
- use "service" instead of "script", because various offline updaters that we have
aren't really scripts, e.g. dnf-plugin-system-upgrade, packagekit-offline-update,
fwupd-offline-update.
- strongly recommend After=sysinit.target, Wants=sysinit.target
- clarify a bit what should happen when multiple update services are started
- replace links to the wiki with refs to the man page that replaced it.
Tom Gundersen [Mon, 28 Nov 2016 19:42:40 +0000 (20:42 +0100)]
networkd: move event loop handling out of the manager (#4723)
This will allow us to have several managers sharing an event loop
and running in parallel, as if they were running in separate processes.
The long term-aim is to allow networkd to be split into separate
processes, so restructure the code to make this simpler.
For now we drop the exit-on-idle logic, as this was anyway severely
restricted at the moment. Once split, we will revisit this as it may
then make more sense again.
Martin Pitt [Mon, 28 Nov 2016 11:35:49 +0000 (12:35 +0100)]
test: make transient hostname tests fail verbosely (#4754)
This test fails sometimes but it is hard to reproduce, so we need more
information what happens. Set journal log level to "debug" for the entirety of
networkd-test.py, and show networkd's and hostnamed's journals and the DHCP
server log on failure of the two test_transient_hostname* tests. Also sync the
journal before querying it to get more precise output.
Dave Reisner [Sun, 27 Nov 2016 22:05:39 +0000 (17:05 -0500)]
device: Avoid calling unit_free(NULL) in device setup logic (#4748)
Since a581e45ae8f9bb5c, there's a few function calls to
unit_new_for_name which will unit_free on failure. Prior to this commit,
a failure would result in calling unit_free with a NULL unit, and hit an
assertion failure, seen at least via device_setup_unit:
Assertion 'u' failed at src/core/unit.c:519, function unit_free(). Aborting.
fix journald startup problem when code is compiled with -DNDEBUG (#4735)
Similar to this patch from here:
http://systemd-devel.freedesktop.narkive.com/AvfCbi6c/patch-0-3-using-assert-se-on-actions-with-side-effects-on-test-cases
If the code is compiled with -DNDEBUG which is the default for
some embedded buildsystems, systemd-journald does not startup
and silently fails.
calendarspec: reject strings with spurious spaces and signs
strtoul() parses leading whitespace and an optional sign;
check that the first character is a digit to prevent odd
specifications like "00: 00: 00" and "-00:+00/-1".
Martin Pitt [Thu, 24 Nov 2016 09:38:01 +0000 (10:38 +0100)]
hwdb/parse_hwdb.py: open files with UTF-8 mode
pyparsing uses the system locale by default, which in the case of 'C' (in lots
of build environment) will fail with a UnicodeDecodeError. Explicitly open it
with UTF-8 encoding to guard against this.
Martin Pitt [Tue, 22 Nov 2016 07:41:51 +0000 (08:41 +0100)]
networkd: allow networkd to start in early boot
With the previous improvements, networkd.service's "After=dbus.service" can now
be dropped. That ordering effectively forced networkd.service to run in late
boot only (dbus.service was rejected to run in early boot in
https://bugs.freedesktop.org/show_bug.cgi?id=98254).
Martin Pitt [Tue, 22 Nov 2016 07:36:20 +0000 (08:36 +0100)]
networkd: set DHCP-acquired timezone and hostname after connecting to D-Bus
If setting the received timezone or transient hostname fails because D-Bus is
not (yet) up, store the data in the Manager object and try again after
connecting to D-Bus.
Martin Pitt [Tue, 22 Nov 2016 07:05:18 +0000 (08:05 +0100)]
networkd: allow networkd to set the timezone in timedated
systemd-networkd runs as user "systemd-network" and thus is not privileged to
set the timezone acquired from DHCP:
systemd-networkd[4167]: test_eth42: Could not set timezone: Interactive authentication required.
Similarly to commit e8c0de912, add a polkit rule to grant
org.freedesktop.timedate1.set-timezone to the "systemd-network" system user.
Move the polkit rules from src/hostname/ to src/network/ to avoid too many
small distributed policy snippets (there might be more in the future), as it's
easier to specify the privileges for a particular subject in this case.
Add NetworkdClientTest.test_dhcp_timezone() test case to verify this (for
all people except those in Pacific/Honolulu, there the test doesn't prove
anything -- sorry ☺ ).
Franck Bui [Wed, 23 Nov 2016 15:31:24 +0000 (16:31 +0100)]
core: consider SIGTERM as a clean exit status for initrd-switch-root.service (#4713)
Since commit 1f0958f640b8717, systemd considers SIGTERM for short-running
services (aka Type=oneshot) as a failure.
This can be an issue with initrd-switch-root.service as the command run by this
service (in order to switch to the new rootfs) may still be running when
systemd does the switch.
However PID1 sends SIGTERM to all remaining processes right before
switching and initrd-switch-root.service can be one of those.
After systemd is reexecuted and its previous state is deserialized, systemd
notices that initrd-switch-root.service was killed with SIGTERM and considers
this as a failure which leads to the emergency shell.
To prevent this, this patch teaches systemd to consider a SIGTERM exit as a
clean one for this service.
It also removes "KillMode=none" since this is pretty useless as the service is
never stopped by systemd but it either exits normally or it's killed by a
SIGTERM as described previously.
build-sys: check for lz4 in the old and new numbering scheme (#4717)
lz4 upstream decided to switch to an incompatible numbering scheme
(1.7.3 follows 131, to match the so version).
PKG_CHECK_MODULES does not allow two version matches for the same package,
so e.g. lz4 < 10 || lz4 >= 125 cannot be used. Check twice, once for
"new" numbers (anything below 10 is assume to be new), once for the "old"
numbers (anything above >= 125). This assumes that the "new" versioning
will not get to 10 to quickly. I think that's a safe assumption, lz4 is a
mature project.
Janne Heß [Wed, 23 Nov 2016 04:19:56 +0000 (05:19 +0100)]
Document an edge-case with resume and mounting (#4581)
When trying to read keyfiles from an encrypted partition to unlock the swap,
a cyclic dependency is generated because systemd can not mount the
filesystem before it has checked if there is a swap to resume from.
Jouke Witteveen [Sat, 1 Oct 2016 12:06:48 +0000 (14:06 +0200)]
service: fix main processes exit behavior for type notify services
Before this commit, when the main process of a Type=notify service exits the
service would enter a running state without passing through the startup post
state. This meant ExecStartPost= from being executed and allowed follow-up
units to start too early (before the ready notification).
Additionally, when RemainAfterExit=yes is used on a Type=notify service, the
exit status of the main process would be disregarded.
After this commit, an unsuccessful exit of the main process of a Type=notify
service puts the unit in a failed state. A successful exit is inconsequential
in case RemainAfterExit=yes. Otherwise, when no ready notification has been
received, the unit is put in a failed state because it has never been active.
When all processes in the cgroup of a Type=notify service are gone and no ready
notification has been received yet, the unit is also put in a failed state.
Jouke Witteveen [Tue, 22 Nov 2016 16:39:56 +0000 (17:39 +0100)]
service: introduce protocol error type
Introduce a SERVICE_FAILURE_PROTOCOL error type for when a service does
not follow the protocol.
This error type is used when a pid file is expected, but not delivered.
nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.
This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.
Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
When mountint a loopback image, we need a temporary root directory we can mount
stuff to. Make sure to actually remove it when exiting, so that we don't leave
stuff around in /tmp unnecessarily.
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.
As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
This changes the return value a bit: 1 will be returned if the value is
changed. But the return value was not documented, and the change should
be for the good anyway. Current callers don't care.
networkd: do not automatically propagate bogus DNS/NTP servers
Never propagate DNS/NTP servers on the local link via the DHCP server. The
DNS/NTP servers 0.0.0.0 and 127.0.0.1 only make sense in the local context,
hence never propagate them automatically to other hosts.
networkd: store DNS servers configured per-network as parsed addresses
DNS servers must be specified as IP addresses, hence let's store them as that
internally, so that they are guaranteed to be fully normalized always, and
invalid data cannot be stored.
Let's reorder them a bit, so that stuff that belongs together semantically is
placed together (in particular, move the various DHCP "use" booleans together).
This adds in4_addr_is_localhost() and in4_addr_is_link_local() that only take
an IPv4 "struct in_addr", to match in_addr_is_localhost() and
in_addr_is_link_local() that that a "union in_addr_union".
This matches the existing in4_addr_is_null() call that already exists.
For IPv6 glibc already exports a set of macros, hence we don't add similar
functions in6_addr_is_localhost(). We also drop in6_addr_is_null() as
IN6_IS_ADDR_UNSPECIFIED() already provides that.
Martin Pitt [Fri, 18 Nov 2016 15:17:01 +0000 (16:17 +0100)]
hostnamed: allow networkd to set the transient hostname
systemd-networkd runs as user "systemd-network" and thus is not privileged to
set the transient hostname:
systemd-networkd[516]: ens3: Could not set hostname: Interactive authentication required.
Standard polkit *.policy files do not have a syntax for granting privileges to
a user, so ship a pklocalauthority (for polkit < 106) and a JavaScript rules
file (for polkit >= 106) that grants the "systemd-network" system user that
privilege.
Add DnsmasqClientTest.test_transient_hostname() test to networkd-test.py to
cover this. Make do_test() a bit more flexible by interpreting "coldplug==None"
as "test sets up the interface by itself". Change DnsmasqClientTest to set up
test_eth42 with a fixed MAC address so that we can configure dnsmasq to send a
special host name for that.
Add MSI VR420 (model MS-1422) to the list of MSI models which need to
ignore brightness hotkey presses, as these are already reported through
the acpi-video interface.
This commit adds the possibility to leave /sys, and /proc/sys read-write.
It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE
to enable this feature.
If set to "yes", /sys, and /proc/sys will be read-write.
If set to "no", /sys, and /proc/sys will be read-only.
If set to "network" /proc/sys/net will be read-write. This is useful in
use-cases, where systemd-nspawn is used in an external network
namespace.
This adds the possibility to start privileged containers which need more
control over settings in the /proc, and /sys filesystem.
This is also a follow-up on the discussion from
https://github.com/systemd/systemd/pull/4018#r76971862 where an
introduction of a simple env var to enable R/W support for those
directories was already discussed.
basic/process-util: we need to take the shorter of two strings
==30496== Conditional jump or move depends on uninitialised value(s)
==30496== at 0x489F654: memcmp (vg_replace_strmem.c:1091)
==30496== by 0x49BF203: getenv_for_pid (process-util.c:678)
==30496== by 0x4993ACB: detect_container (virt.c:442)
==30496== by 0x182DFF: test_get_process_comm (test-process-util.c:98)
==30496== by 0x185847: main (test-process-util.c:368)
==30496==
Franck Bui [Mon, 7 Nov 2016 16:14:59 +0000 (17:14 +0100)]
core: limit the length of the confirmation question
When "confirmation_spawn=1", the confirmation question can look like:
Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/kmod.conf? [Yes, No, Skip]
which is pretty verbose and might not fit in the console width size (which is
usually 80 chars) and thus question will be splitted into 2 consecutive lines.
However since the question is now refreshed every 2 secs, the reprinted
question will overwrite the second line of the previous one...
To prevent this, this patch makes sure that the command line won't be longer
than 60 chars by ellipsizing it if the command is longer:
Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/ru…nf? [Yes, No, View, Skip]
A following patch will introduce a new choice that will allow the user to get
details on the command to be executed so it will still be possible to see the
full command line.
Franck Bui [Mon, 7 Nov 2016 16:14:59 +0000 (17:14 +0100)]
core: reprint the question every 2 sec in ask_char()
ask_char() now reprints the question every 2sec automatically.
It prefixes its output with '\r' to to bring the cursor to the
beginning of the terminal line, and then print the message, redoing it
every 2sec.
As long as nothing interferes with out output this logic will have no
visible effect as we constantly overprint the visible text with the
exact same text.
However, if something is dumped in the middle, then our question won't
get lost, as we'll ask soon again.
This is useful if the question is asked to a terminal that is also
used to dump some other status messages/logs. For example when
confirmation messages are enabled during the boot
(systemd.confirm_spawn=1), the question can easily be lost if the
kernel logs are also enabled and both use the same console.