machined: when renaming/removing/cloning images, always take care of .roothash file too
Since nspawn looks for them, importd now downloads them, and mkosi
generates them, let's make sure they are also processed correctly on all
machined operations.
Handle properly if /etc is a symlink (i.e. make sure we don't follow the
symlink outside the image). Also follow /etc/resolv.conf if it is a
symlink, and use the resolved path when creating a mount point and
mounting (as both of these operations follow symlinks and really
shouldn't).
Handle more types of read-only errors as debug-level issues.
gpt-auto-generator: enable auto-discovery logic also for verity root file systems
verity block devices have two backing devices: the data partition and
the hash partition. Previously the gpt auto-discovery logic would refuse
to work on devices with multiple backing devices. Loosen this up a bit,
to permit them as long as the backing devices are all located on the
same physical media.
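
As a rough illustration of the relaxed check, here is a minimal Python sketch.
It is not the generator's code: the function name and the "disk_of" mapping
are hypothetical, and a real implementation would look the parent disk up in
sysfs rather than in a dictionary.

```python
def common_backing_disk(backing_devices, disk_of):
    """Return the single disk all backing devices live on, or None.

    backing_devices: partition names such as ["sda2", "sda3"]
    disk_of: hypothetical mapping from a partition name to its whole disk
    """
    disks = {disk_of[dev] for dev in backing_devices}
    # Accept the device only if every backing partition shares one disk.
    return disks.pop() if len(disks) == 1 else None
```

With a data partition and a hash partition on the same disk the sketch yields
that disk; as soon as the two live on different disks it yields None.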
verity: add support for setting up verity-protected root disks in the initrd
This adds a generator and a small service that will look for "roothash="
on the kernel command line and use it for setting up a verity partition
for the root device.
This provides similar functionality to nspawn's existing --roothash=
switch.
gpt-auto-discovery: port to dissect-image.c dissector
Change the gpt auto discovery generator to use the same dissector as
nspawn and the rest of the tools. This removes the separate dissector
code that the generator previously had and unifies the relevant code.
dissect: make using a generic partition as root partition optional
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.
In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
This adds support for a new kernel command line option "systemd.volatile=" that
provides the same functionality that systemd-nspawn's --volatile= switch
provides, but for host systems (i.e. systems booting with a kernel).
It takes the same parameter and has the same effect.
In order to implement systemd.volatile=yes a new service
systemd-volatile-root.service is introduced that only runs in the initrd and
rearranges the root directory as needed to become a tmpfs instance. Note that
systemd.volatile=state is implemented differently: it simply generates a
var.mount unit file that is part of the normal boot and has no effect on the
initrd execution.
The way this is implemented ensures that other explicit configuration for /var
can always override the effect of these options. Specifically, the var.mount
unit is generated in the "late" generator directory, so that it only is in
effect if nothing else overrides it.
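
The override behaviour can be pictured with a small Python sketch. The
directory list is a simplified subset of the system unit search path, written
down here as an assumption for illustration, and effective_unit_dir() is a
hypothetical helper, not a systemd function.

```python
# Simplified search order, highest priority first (assumed subset).
UNIT_DIRS = [
    "/run/systemd/generator.early",
    "/etc/systemd/system",
    "/run/systemd/system",
    "/run/systemd/generator",
    "/usr/lib/systemd/system",
    "/run/systemd/generator.late",
]

def effective_unit_dir(unit, present):
    """Return the directory whose copy of `unit` wins, or None.

    present: mapping from directory to the set of unit names found there.
    """
    for directory in UNIT_DIRS:
        if unit in present.get(directory, ()):
            return directory
    return None
```

A var.mount placed in the "late" directory is only picked when no other
directory provides one, which is the property described above.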
Let's follow symlinks before invoking mount() on arbitrary paths, so that we
won't get confused if directories are prepared with absolute symlinks.
Use FOREACH_STRING() instead of NULSTR_FOREACH() as it is more readable.
Don't use snprintf() for concatenating strings, let chase_symlinks() do that.
Replace homegrown mount check with path_is_mount_point(). Also, change the
behaviour when we encounter this: instead of unmounting the old mount point,
simply leave it around and don't replace it, so that initrds can mount stuff
there with different settings than we would apply. This is in-line with how we
handle automatic mounts in nspawn for example.
Use umount_recursive() instead of a simple umount2() for unmounting the old
root, so that we actually cover really all mounts, not just the top-level one.
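
To show why a single umount2() on the top-level path is not enough, here is a
short Python sketch that lists every mount at or below a prefix, deepest
first. It only illustrates the ordering problem and is not how
umount_recursive() is implemented; mounts_below() is a hypothetical name.

```python
def mounts_below(mount_points, prefix):
    """Return the mount points at or below `prefix`, deepest first."""
    prefix = prefix.rstrip("/") or "/"
    lead = prefix if prefix == "/" else prefix + "/"

    def inside(path):
        # Match the prefix itself or anything in the tree below it.
        return path == prefix or path.startswith(lead)

    # Deeper paths have more "/" separators and must be unmounted first.
    return sorted((p for p in mount_points if inside(p)),
                  key=lambda p: p.count("/"), reverse=True)
```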
journald: don't flush to /var/log/journal before we get asked to
This changes journald to not write to /var/log/journal until it received
SIGUSR1 for the first time, thus having been requested to flush the runtime
journal to disk.
This makes the journal work nicer with systems which have the root file system
writable early, but still need to rearrange /var before journald should start
writing and creating files to it, for example because ACLs need to be applied
first, or because /var is to be mounted from another file system, NFS or tmpfs
(as is the case for systemd.volatile=state).
Before this change we required setups with /var split out to mount the root
disk read-only early on, and ship an /etc/fstab that remounted it writable only
after having placed /var at the right place. But even that was racy for various
preparations as journald might end up accessing the file system before it was
entirely set up, as soon as it was writable.
With this change we make scheduling when to start writing to /var/log/journal
explicit. This means persistent mode now requires
systemd-journal-flush.service in the mix to work, as otherwise journald would
never write to the directory.
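
The gating described here amounts to a small state machine. The Python sketch
below is an illustration only, with a hypothetical class name; it models just
the decision of where to write and nothing else journald does.

```python
class JournalTarget:
    """Tracks whether writing to /var/log/journal has been requested yet."""

    def __init__(self, persistent):
        self.persistent = persistent  # persistent storage is configured
        self.flush_requested = False  # no SIGUSR1 seen so far

    def on_sigusr1(self):
        # systemd-journal-flush.service asks for the flush via SIGUSR1.
        self.flush_requested = True

    def destination(self):
        if self.persistent and self.flush_requested:
            return "/var/log/journal"
        # Until asked, stay on the runtime journal.
        return "/run/log/journal"
```

Without the flush request the sketch never leaves the runtime journal, which
mirrors why persistent mode now needs systemd-journal-flush.service.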
This was broken by 19caffac75a2590a0c5ebc2a0214960f8188aec7 which remounted the
root directory to MS_SHARED before applying the volatile mount logic. This
broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we
need MS_MOVE in the volatile mount logic to rearrange the directory tree.
Simply swap the order here, apply the volatile logic before we switch to
MS_SHARED.
util-lib: various improvements to kernel command line parsing
This improves kernel command line parsing in a number of ways:
a) A kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar=xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_", which
was a source of confusion for users (well, at least of one user: myself, I
just couldn't remember that it's systemd.debug-shell, not
systemd.debug_shell). Considering both as equivalent is inspired by how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd.debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message, e.g. passing just
"systemd.unit" will now result in a complaint that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
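
Points a) and c) can be sketched in a few lines of Python. This is an
illustration under assumptions, not the C implementation: the function names
and the accepted boolean spellings are made up for the example, and letting
the last occurrence win is an assumption as well.

```python
import shlex

TRUE_WORDS = {"1", "yes", "y", "true", "t", "on"}    # assumed spellings
FALSE_WORDS = {"0", "no", "n", "false", "f", "off"}  # assumed spellings

def normalize_key(key):
    """Treat "-" and "_" as equivalent, in option names only."""
    return key.replace("-", "_")

def parse_cmdline(line):
    """Split a command line into (key, value) pairs; value is None if absent."""
    items = []
    for word in shlex.split(line):
        key, sep, value = word.partition("=")
        items.append((key, value if sep else None))
    return items

def get_bool(line, wanted):
    """Return True or False for a boolean option, None if it is not given."""
    result = None
    for key, value in parse_cmdline(line):
        if normalize_key(key) != normalize_key(wanted):
            continue
        if value is None or value.lower() in TRUE_WORDS:
            result = True   # bare "foobar" behaves like "foobar=1"
        elif value.lower() in FALSE_WORDS:
            result = False
        else:
            raise ValueError("not a boolean: %r" % value)
    return result
```

Note that only the key goes through normalize_key(); values such as paths or
unit names are left untouched, as stated in a).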
build-sys: don't make use of "sushell" automatically
"sushell" is a Fedora-specific concept, shipped as part of
"initscripts". We shouldn't actively search for it if we can avoid it.
Hence, let's now default to /bin/sh as debug shell on all systems, and
permit Fedora to override that for their RPMs via --with-debug-shell= at
configure time.
dissect: optionally, only look for GPT partition tables, nothing else
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
Let's log more verbose error messages when validating the input parameters fails.
Also, call path_is_os_tree() properly, as it doesn't return a boolean, but
possibly also an error. Finally, check for the existence of the new init
process with chase_symlinks() to properly handle possible symlinks on the init
binary (which might actually be pretty likely).
Let's use chase_symlinks() when looking for /etc/os-release and
/usr/lib/os-release as these files might be symlinks (and actually are IRL on
some distros).
util-lib: accept invoking chase_symlinks() with a NULL return parameter
Let's permit invoking chase_symlinks() with a NULL return parameter. If so, the
resolved name is not returned, and the call is useful for checking for existence of
a file, without actually returning its ultimate path.
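
The idea of resolving a path inside a root directory, and of using the same
walk purely as an existence check, can be sketched in Python as follows. This
is a simplified model and not the C function: the Python signature, the
follow limit and exists_in_root() are assumptions made for the example.

```python
import errno
import os

def chase_symlinks(path, root, max_follow=32):
    """Resolve `path` inside `root` without ever leaving `root`."""
    root = os.path.realpath(root)
    todo = [c for c in path.split("/") if c]
    done = []  # resolved components, relative to root
    follows = 0
    while todo:
        comp = todo.pop(0)
        if comp == ".":
            continue
        if comp == "..":
            if done:
                done.pop()  # ".." at the top of the root stays at the root
            continue
        candidate = os.path.join(root, *done, comp)
        if os.path.islink(candidate):
            follows += 1
            if follows > max_follow:
                raise OSError(errno.ELOOP, "too many symlinks", candidate)
            target = os.readlink(candidate)
            if target.startswith("/"):
                done = []  # absolute targets restart at the image root
            todo = [c for c in target.split("/") if c] + todo
            continue
        if not os.path.lexists(candidate):
            raise FileNotFoundError(errno.ENOENT, "no such file", candidate)
        done.append(comp)
    return os.path.join(root, *done)

def exists_in_root(path, root):
    """Existence check that simply discards the resolved path."""
    try:
        chase_symlinks(path, root)
    except OSError:
        return False
    return True
```

exists_in_root() plays the role of the call with a NULL return parameter: the
walk is the same, only the result is thrown away.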
This moves the VolatileMode enum and its helper functions to src/shared/. This
is useful to then reuse them to implement systemd.volatile= in a later commit.
systemctl: do not segfault when we cannot find template unit (#4915)
Core was generated by `systemctl cat test@.target test@.service'.
Program terminated with signal SIGSEGV, Segmentation fault.
32 movdqu (%rdi), %xmm0
(gdb) bt
-0 strrchr () at ../sysdeps/x86_64/strrchr.S:32
-1 0x00007f57fdf837fe in __GI___basename (filename=0x0) at basename.c:24
-2 0x000055b8a77d0d91 in unit_find_paths (bus=0x55b8a9242f90, unit_name=0x55b8a92428f0 "test@.service", lp=0x7ffdc9070400, fragment_path=0x7ffdc90703e0, dropin_paths=0x7ffdc90703e8) at src/systemctl/systemctl.c:2584
-3 0x000055b8a77dbae5 in cat (argc=3, argv=0x7ffdc9070678, userdata=0x0) at src/systemctl/systemctl.c:5324
-4 0x00007f57fe55fc6b in dispatch_verb (argc=5, argv=0x7ffdc9070668, verbs=0x55b8a77f1c60 <verbs>, userdata=0x0) at src/basic/verbs.c:92
-5 0x000055b8a77e477f in systemctl_main (argc=5, argv=0x7ffdc9070668) at src/systemctl/systemctl.c:8141
-6 0x000055b8a77e5572 in main (argc=5, argv=0x7ffdc9070668) at src/systemctl/systemctl.c:8412
The right behaviour is not easy in this case. Implement some "sensible" logic.
Jörg Thalheim [Mon, 19 Dec 2016 14:34:07 +0000 (15:34 +0100)]
networkd-ndisc: handle missing mtu gracefully (#4913)
At least bird's implementation of router advertisement does not
set MTU option by default (instead it supplies an option to the user).
In this case just leave MTU as it is.
We currently don't expect any warnings about format strings, on any
architecture (#4612 removed the last few warnings). Turn those warnings into
errors in the future.
As requested by Martin Pitt.
gcc documentation says that -Wformat=2 includes -Wformat-security and
-Wformat-nonliteral so don't include them explicitly.
core: downgrade "Time has been changed" to debug (#4906)
That message is emitted by every systemd instance on every resume:
Dec 06 08:03:38 laptop systemd[1]: Time has been changed
Dec 06 08:03:38 laptop systemd[823]: Time has been changed
Dec 06 08:03:38 laptop systemd[916]: Time has been changed
Dec 07 08:00:32 laptop systemd[1]: Time has been changed
Dec 07 08:00:32 laptop systemd[823]: Time has been changed
Dec 07 08:00:32 laptop systemd[916]: Time has been changed
-- Reboot --
Dec 07 08:02:46 laptop systemd[836]: Time has been changed
Dec 07 08:02:46 laptop systemd[1]: Time has been changed
Dec 07 08:02:46 laptop systemd[926]: Time has been changed
Dec 07 19:48:12 laptop systemd[1]: Time has been changed
Dec 07 19:48:12 laptop systemd[836]: Time has been changed
Dec 07 19:48:12 laptop systemd[926]: Time has been changed
...
Franck Bui [Sat, 17 Dec 2016 14:49:17 +0000 (15:49 +0100)]
coredumpctl: let gdb handle the SIGINT signal (#4901)
Even if pressing Ctrl-c after spawning gdb with "coredumpctl gdb" is not really
useful, we should let gdb handle the signal entirely otherwise the user can be
surprised to see a different behavior when gdb is started by coredumpctl vs when
it's started directly.
Indeed in the former case, gdb exits due to coredumpctl being killed by the
signal.
So this patch makes coredumpctl ignore SIGINT as long as gdb is running.
We should also mention this in NEWS before release. Suggested text:
> DBus policy files are now installed into /usr rather than /etc. Make sure
> your system has dbus >= 1.9.18 running before upgrading to this version, or
> override the install path with --with-dbuspolicydir=
Franck Bui [Fri, 16 Dec 2016 16:13:58 +0000 (17:13 +0100)]
core: make mount units from /proc/self/mountinfo possibly bind to a device (#4515)
Since commit 9d06297, mount units from mountinfo are not bound to their devices
anymore (they use the "Requires" dependency instead).
This has the following drawback: if a media is mounted and the eject button is
pressed then the media is unconditionally ejected leaving some inconsistent
states.
Since udev is the component that is reacting (no matter if the device is used
or not) to the eject button, users expect that udev at least try to unmount the
media properly.
This patch introduces a new property "SYSTEMD_MOUNT_DEVICE_BOUND". When set on
a block device, all units that require this device will see their "Requires"
dependency upgraded to a "BindsTo" one. This is currently only used by cdrom
devices.
This patch also gives the possibility to the user to restore the previous
behavior, that is, binding a mount unit to a device. This is achieved by passing
the "x-systemd.device-bound" option to mount(8). Please note that currently this
is not working because libmount treats the x-* options as comments, therefore
they're not available in utab for later application retrievals.
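
The dependency selection described in this commit can be summarized with a
short Python sketch. It is illustrative only: device_dependency() is a
hypothetical name, and treating the mere presence of the property as "set" is
an assumption of the example.

```python
def device_dependency(udev_properties, mount_options):
    """Pick the dependency type a mount unit gets on its backing device."""
    bound = ("SYSTEMD_MOUNT_DEVICE_BOUND" in udev_properties
             or "x-systemd.device-bound" in mount_options)
    # Bound mounts go away together with the device; others merely require it.
    return "BindsTo" if bound else "Requires"
```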
socket_find_symlink_target() returns a pointer to
p->address.sockaddr.un.sun_path when the first byte is non-zero without
checking that this is AF_UNIX socket. Since sockaddr is a union this
byte could be non-zero for AF_INET sockets.
Existing callers happen to be safe but this is an accident waiting to happen.
Use socket_address_get_path() since it checks for AF_UNIX.
Daniel Drake [Thu, 15 Dec 2016 22:11:11 +0000 (16:11 -0600)]
rules: identify internal sound cards on platform bus (#4893)
We have a system which has the HDMI audio capability internally,
but pulseaudio is not giving it a very high priority compared
to e.g. USB sound cards.
The sound device appears on the platform bus and it is not
currently tagged with any form factor information.
It seems safe to assume that any sound card that is directly on the
platform bus is of internal form factor, but we must be careful because
udev rules will match all parent devices, not just the immediate parent,
and you will frequently encounter setups such as:
Platform bus -> USB host controller -> USB sound card
In that case, SUBSYSTEMS==platform would match even though we're
clearly working with an external USB sound card.
In order to detect true platform devices here, we rely on the observation
that if any parent devices of the sound card are PCI, USB or firewire
devices, then this sound card cannot be directly connected to the platform
bus. Otherwise, if we find a parent device on the platform bus, we assume
this is an internal sound card connected directly to the platform bus.
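
Expressed outside of udev rule syntax, the heuristic looks roughly like the
Python sketch below. The function name and the plain list of parent
subsystem names are assumptions for illustration; the actual change is a udev
rule, not code like this.

```python
def is_internal_platform_sound(parent_subsystems):
    """Decide whether a sound card sits directly on the platform bus.

    parent_subsystems: subsystem names of all parent devices, nearest first.
    """
    # Any PCI, USB or firewire parent means the card is behind another bus.
    if any(s in ("pci", "usb", "firewire") for s in parent_subsystems):
        return False
    # Otherwise a platform parent marks the card as internal.
    return "platform" in parent_subsystems
```

For the "Platform bus -> USB host controller -> USB sound card" chain the
parents include "usb", so the sketch answers False even though "platform"
also appears further up.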
Let's start placing our D-Bus policy files in /usr rather than /etc. D-Bus
supports this since 1.9.18, and moving our files over means we continue to work
even if /etc is flushed out entirely (for example if systemd-nspawn's
--volatile= switch is used).
Since 1.9.18 was released summer 2015 it should be fine to require a newer
version like this for our builds.
build-sys: include the builddir in $PATH while testing
udev-test.pl shells out systemd-detect-virt, and it really should invoke the
version from the build tree instead of one supplied by the installed system,
hence let's add the builddir to $PATH while building.
util-lib: rework rename_process() to be able to make use of PR_SET_MM_ARG_START
PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
nspawn: flush out environment block of the -a stub init process
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.
Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
Previously, systemd-detect-virt was unable to detect "systemd-nspawn -a"
container environments, i.e. where PID 1 is a stub process running in host
context, as in that case /proc/1/environ was inherited from the host. Let's
improve that, and add an additional check for container environments where
/proc/1/environ is not cleaned up and does not contain the $container
environment variable:
The /proc/1/sched file shows the host PID in the first line. If this is not
1, we know we are running in a PID namespace (but not which implementation).
With these changes we should be able to detect container environments that
don't set $container at all.
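
A minimal Python sketch of the /proc/1/sched check follows. It assumes the
first line has the form "NAME (PID, #threads: N)" and uses hypothetical
function names; it takes the line as a string so it can be tried without
reading /proc.

```python
import re

def pid_from_sched_header(first_line):
    """Extract the PID from a "NAME (PID, #threads: N)" header line."""
    match = re.match(r"^.*\((\d+), #threads: \d+\)\s*$", first_line)
    return int(match.group(1)) if match else None

def looks_like_pid_namespace(first_line):
    """True if PID 1's sched header reports a PID other than 1."""
    pid = pid_from_sched_header(first_line)
    return pid is not None and pid != 1
```

An unparsable header is treated as "don't know" rather than as a positive
detection, which keeps the check conservative.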
core: rework logic to determine when we decide to add automatic deps for mounts
This adds a concept of "extrinsic" mounts. If mounts are extrinsic we consider
them managed by something else and do not add automatic ordering against
umount.target, local-fs.target, remote-fs.target.
Extrinsic mounts are considered:
- All mounts if we are running in --user mode
- API mounts such as everything below /proc, /sys, /dev, which exist from
earliest boot to latest shutdown.
- All mounts marked as initrd mounts, if we run on the host
- The initrd's private directory /run/initramfs that should survive until last
reboot.
This primarily merges a couple of different exclusion lists into a single
concept.
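
The list above can be read as a single predicate. The Python sketch below is
an illustration with hypothetical names and parameters, and it covers only
the three API prefixes named explicitly.

```python
API_PREFIXES = ("/proc", "/sys", "/dev")

def path_startswith(path, prefix):
    """True if `path` equals `prefix` or lies below it."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")

def mount_is_extrinsic(where, user_mode=False, in_initrd=False,
                       marked_initrd=False):
    """Decide whether a mount is managed by something other than us."""
    if user_mode:
        return True   # all mounts, when running in --user mode
    if any(path_startswith(where, p) for p in API_PREFIXES):
        return True   # API mounts live from earliest boot to shutdown
    if marked_initrd and not in_initrd:
        return True   # initrd mounts, seen from the host
    if where == "/run/initramfs":
        return True   # the initrd's private directory
    return False
```

Mounts for which this returns True would get no automatic ordering against
umount.target, local-fs.target or remote-fs.target.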
core: make sure targets that get a default Conflicts=shutdown.target are also ordered against it
Let's tweak the automatic dependency generation of target units: not only add a
Conflicts= towards shutdown.target but also an After= line for it, so that we
can be sure the new target is not started when the old target is still up.
Discovered in the context of #4733
(Also, exclude dependency generation for shutdown.target itself. This is
strictly speaking redundant, as unit_add_two_dependencies_by_name() detects
that and becomes a NOP, but let's make this explicit for readability.)
Let's be a bit more careful when detecting chroot() environments, so that we
can discern them from namespaced environments.
Previously this would simply check if the root directory of PID 1 matches our
own root directory. With this commit, we also check whether the namespaces of
PID 1 and ourselves are the same. If not we assume we are running inside of a
namespaced environment instead of a chroot() environment.
This has the benefit that systemctl (which uses running_in_chroot()) will work
as usual when invoked in a namespaced service.
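
The refined decision can be written as a tiny Python function. It is a sketch
with assumed inputs: each argument is some comparable identity, for example a
(st_dev, st_ino) pair obtained by stat()ing the respective /proc entries, and
the way those values are collected is left out.

```python
def running_in_chroot(our_root, pid1_root, our_namespace, pid1_namespace):
    """Distinguish a chroot() environment from a namespaced one."""
    if our_namespace != pid1_namespace:
        # Different namespaces: assume a namespaced environment, not a chroot.
        return False
    # Same namespaces: a differing root directory indicates a chroot().
    return our_root != pid1_root
```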