build-sys: include the builddir in $PATH while testing
udev-test.pl shells out systemd-detect-virt, and it really should invoke the
version from the build tree instead of one supplied by the installed system,
hence let's add the builddir to $PATH while building.
util-lib: rework rename_process() to be able to make use of PR_SET_MM_ARG_START
PR_SET_MM_ARG_START allows us to relatively cleanly implement process renaming.
However, it's only available with privileges. Hence, let's try to make use of
it, and if we can't fall back to the traditional way of overriding argv[0].
This removes size restrictions on the process name shown in argv[] at least for
privileged processes.
nspawn: flush out environment block of the -a stub init process
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.
Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
Previously, systemd-detect-virt was unable to detect "systemd-nspawn -a"
container environments, i.e. where PID 1 is a stub process running in host
context, as in that case /proc/1/environ was inherited from the host. Let's
improve that, and add an additional check for container environments where
/proc/1/environ is not cleaned up and does not contain the $container
environment variable:
The /proc/1/sched file shows the host PID in the first line. if this is not
1, we know we are running in a PID namespace (but not which implementation).
With these changes we should be able to detect container environments that
don't set $container at all.
core: rework logic to determine when we decide to add automatic deps for mounts
This adds a concept of "extrinsic" mounts. If mounts are extrinsic we consider
them managed by something else and do not add automatic ordering against
umount.target, local-fs.target, remote-fs.target.
Extrinsic mounts are considered:
- All mounts if we are running in --user mode
- API mounts such as everything below /proc, /sys, /dev, which exist from
earliest boot to latest shutdown.
- All mounts marked as initrd mounts, if we run on the host
- The initrd's private directory /run/initrams that should survive until last
reboot.
This primarily merges a couple of different exclusion lists into a single
concept.
core: make sure targets that get a default Conflicts=shutdown.target are also ordered against it
Let's tweak the automatic dependency generation of target units: not only add a
Conflicts= towards shutdown.target but also an After= line for it, so that we
can be sure the new target is not started when the old target is still up.
Discovered in the context of #4733
(Also, exclude dependency generation if for shutdown.target itself. — This is
strictly speaking redundant, as unit_add_two_dependencies_by_name() detects
that and becomes a NOP, but let's make this explicit for readability.)
Let's be a bit more careful when detecting chroot() environments, so that we
can discern them from namespaced environments.
Previously this would simply check if the root directory of PID 1 matches our
own root directory. With this commit, we also check whether the namespaces of
PID 1 and ourselves are the same. If not we assume we are running inside of a
namespaced environment instead of a chroot() environment.
This has the benefit that systemctl (which uses running_in_chroot()) will work
as usual when invoked in a namespaced service.
Fixes:
```
$ ./libtool --mode=execute valgrind --leak-check=full ./test-fs-util
...
==22871==
==22871== 27 bytes in 1 blocks are definitely lost in loss record 1 of 1
==22871== at 0x4C2FC47: realloc (vg_replace_malloc.c:785)
==22871== by 0x4E86D05: strextend (string-util.c:726)
==22871== by 0x4E8F347: chase_symlinks (fs-util.c:712)
==22871== by 0x109EBF: test_chase_symlinks (test-fs-util.c:75)
==22871== by 0x10C381: main (test-fs-util.c:305)
==22871==
```
Closes #4888
core: add ability to define arbitrary bind mounts for services
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.
The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).
namespace: instead of chasing mount symlinks a priori, do so as-we-go
This is relevant as many of the mounts we try to establish only can be followed
when some other prior mount that is a prefix of it is established. Hence: move
the symlink chasing into the actual mount functions, so that we do it as late
as possibly but as early as necessary.
After all, these don#t strictly encapsulate bind mounts anymore, and we are
preparing this for adding arbitrary user-defined bind mounts in a later commit,
at which point this would become really confusing. Let's clean this up, rename
the BindMount structure to MountEntry, so that it is clear that it can contain
information about any kind of mount.
This reworks handling of the read-only management for mount points. This will
become handy as soon as we add arbitrary bind mount support (which comes in a
later commit).
We want that systemd --user gets its own keyring as usual, even if the
barebones PAM snippet we ship upstream is used. If we don't do this we get the
basic keyring systemd --system sets up for us.
core: store the invocation ID in the per-service keyring
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as they key cannot be overidden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.
The invocation ID is now passed in three different ways to a service:
- As environment variable $INVOCATION_ID. This is easy to use, but may be
overriden by unprivileged code (which might be a bad or a good thing), which
means it's incompatible with SUID code (see above).
- As extended attribute on the service cgroup. This cannot be overriden by
unprivileged code, and may be queried safely from "outside" of a service.
However, it is incompatible with containers right now, as unprivileged
containers generally cannot set xattrs on cgroupfs.
- As "invocation_id" key in the kernel keyring. This has the benefit that the
key cannot be changed by unprivileged service code, and thus is safe to
access from SUID code (see above). But do note that service code can replace
the session keyring with a fresh one that lacks the key. However in that case
the key will not be owned by root, which is easily detectable. The keyring is
also incompatible with containers right now, as it is not properly namespace
aware (but this is being worked on), and thus most container managers mask
the keyring-related system calls.
Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.
How to verify the new invocation ID in the keyring:
# systemd-run -t /bin/sh
Running as unit: run-rd917366c04f847b480d486017f7239d6.service
Press ^] three times within 1s to disconnect TTY.
# keyctl show
Session Keyring 680208392 --alswrv 0 0 keyring: _ses 250926536 ----s-rv 0 0 \_ user: invocation_id
# keyctl request user invocation_id 250926536
# keyctl read 250926536
16 bytes of data in key: 9c96317cac64495aa42b9cd74f3ff96b
# echo $INVOCATION_ID 9c96317cac64495aa42b9cd74f3ff96b
# ^D
This creates a new transient service runnint a shell. Then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
core: run each system service with a fresh session keyring
This patch ensures that each system service gets its own session kernel keyring
automatically, and implicitly. Without this a keyring is allocated for it
on-demand, but is then linked with the user's kernel keyring, which is OK
behaviour for logged in users, but not so much for system services.
With this change each service gets a session keyring that is specific to the
service and ceases to exist when the service is shut down. The session keyring
is not linked up with the user keyring and keys hence only search within the
session boundaries by default.
(This is useful in a later commit to store per-service material in the keyring,
for example the invocation ID)
Andrey Ulanov [Tue, 13 Dec 2016 01:38:18 +0000 (17:38 -0800)]
nspawn: when getting SIGCHLD make sure it's from the first child (#4855)
When getting SIGCHLD we should not assume that it was the first
child forked from system-nspawn that has died as it may also be coming
from an orphan process. This change adds a signal handler that ignores
SIGCHLD unless it came from the first containerized child - the real
child.
Before this change the problem can be reproduced as follows:
$ sudo systemd-nspawn --directory=/container-root --share-system
Press ^] three times within 1s to kill container.
[root@andreyu-coreos ~]# { true & } &
[1] 22201
[root@andreyu-coreos ~]#
Container root-fedora-latest terminated by signal KILL
As requested in
https://github.com/systemd/systemd/pull/4864#pullrequestreview-12372557.
docbook will substitute triple dots for the ellipsis in man output, so this has
no effect on the troff output, only on HTML, making it infinitesimally nicer.
In some places we show output from programs, which use dots, and those places
should not be changed. In some tables, the alignment would change if dots were
changed to the ellipsis which is only one character. Since docbook replaces the
ellipsis automatically, we should leave those be. This patch changes all other
places.
systemd.journal-fields(7) documents CODE_FUNC=. Internally, we were
inconsistent: sd_journal_print uses CODE_FUNC=, log.h has CODE_FUNCTION=,
python-systemd and bootchart also used CODE_FUNC=, when they were internal.
Most external projects use sd_journal_* functions, so CODE_FUNC=,
python-systemd still uses CODE_FUNC=, as does systemd-bootchart, and
independent reimplementations in golang-github-coreos-go-systemd, qtbase,
network manager, glib, pulseaudio. Hence, I don't think there's much
choice.
share/log: change log_syntax from "[a:b] " to "a:b: "
Those square brackets don't fit how our other messages look like; we use colons
everywhere else. The "[a:b]" format was originally added in ed5bcfbe3c3b68e59242c03649eea03a9707d318, and remained unchanged for 7 years,
but in the meantime other conventions evolved.
The new version is also one character shorter.
[/etc/systemd/system/systemd-networkd.service.d/override.conf:2] Failed to parse sec value, ignoring: ...
↓
/etc/systemd/system/systemd-networkd.service.d/override.conf:2: Failed to parse sec value, ignoring: ...
tools/catalog-report.py: a script to scour the journal for bad catalog entries
I think it can be a useful tool to find such issues.
SD_MESSAGE_UNIT_STARTING 7d4958e842da4a758f6c1cdc7b36dcc5: no field UNIT
../src/core/unit.c:1239 unit_status_log_starting_stopping_reloading
Starting Paths.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
USER_UNIT=paths.target
SD_MESSAGE_UNIT_STARTED 39f53479d3a045ac8e11786248231fbf: no field UNIT
../src/core/job.c:721 job_log_status_message
Reached target Paths.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
RESULT=done
USER_UNIT=paths.target
SD_MESSAGE_STARTUP_FINISHED b07a249cd024414a82dd00cd181378ff: no field KERNEL_USEC
../src/core/manager.c:2532 manager_check_finished
Startup finished in 19ms.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
USERSPACE_USEC=19670
SD_MESSAGE_STARTUP_FINISHED b07a249cd024414a82dd00cd181378ff: no field INITRD_USEC
../src/core/manager.c:2532 manager_check_finished
Startup finished in 19ms.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
USERSPACE_USEC=19670
unknown 0ce153587afa4095832d233c17a88001: no catalog entry
gsm-manager.c:1366 start_phase
Entering running state
SYSLOG_IDENTIFIER=gnome-session
PRIORITY=5
SD_MESSAGE_UNIT_STOPPING de5b426a63be47a7b6ac3eaac82e2f6f: no field UNIT
../src/core/unit.c:1239 unit_status_log_starting_stopping_reloading
Stopping Default.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
USER_UNIT=default.target
SD_MESSAGE_UNIT_STOPPED 9d1aaa27d60140bd96365438aad20286: no field UNIT
../src/core/job.c:729 job_log_status_message
Stopped target Default.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
RESULT=done
USER_UNIT=default.target
SD_MESSAGE_TIME_CHANGE c7a787079b354eaaa9e77b371893cd27: no field REALTIME
src/core/manager.c:2049 manager_dispatch_time_change_fd
Time has been changed
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
PRIORITY=6
unknown f3ea493c22934e26811cd62abe8e203a: no catalog entry
shell-global.c:1375 shell_global_log_structured
GNOME Shell started at Sat Jun 11 2016 12:37:46 GMT-0400 (EDT)
SYSLOG_IDENTIFIER=gnome-shell
SD_MESSAGE_UNIT_FAILED be02cf6855d2428ba40df7e9d022f03d: no field UNIT
src/core/job.c:803 job_log_status_message
Failed to start GNOME Terminal Server.
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
RESULT=failed
PRIORITY=3
USER_UNIT=gnome-terminal-server.service
SD_MESSAGE_LID_CLOSED b72ea4a2881545a0b50e200e55b9b070: no catalog entry
src/login/logind-button.c:198 button_dispatch
Lid closed.
SYSLOG_FACILITY=4
SYSLOG_IDENTIFIER=systemd-logind
PRIORITY=6
SD_MESSAGE_LID_OPENED b72ea4a2881545a0b50e200e55b9b06f: no catalog entry
src/login/logind-button.c:219 button_dispatch
Lid opened.
SYSLOG_FACILITY=4
SYSLOG_IDENTIFIER=systemd-logind
PRIORITY=6
unknown fef1cc509d5047268b83a3a553f54b43: no catalog entry
/usr/lib/python3.5/site-packages/dnf-plugins/system_upgrade.py:422 log_status
Rebooting to perform upgrade.
SYSLOG_IDENTIFIER=python3
DNF_VERSION=1.1.10
TARGET_RELEASEVER=25
SYSTEM_RELEASEVER=24
PRIORITY=5
unknown 3e0a5636d16b4ca4bbe5321d06c6aa62: no catalog entry
/usr/lib/python3.5/site-packages/dnf-plugins/system_upgrade.py:422 log_status
Starting system upgrade. This will take a while.
SYSLOG_IDENTIFIER=python3
DNF_VERSION=1.1.10
SYSTEM_RELEASEVER=24
PRIORITY=5
TARGET_RELEASEVER=25
unknown 0123456789abcdef0123456789abcdef: no catalog entry
<doctest systemd.journal.JournalHandler[9]>:1 <module>
Message with ID
SYSLOG_IDENTIFIER=/usr/lib/python2.7/site-packages/py/test.py
LOGGER=custom_logger_name
PRIORITY=4
THREAD_NAME=MainThread
pid1,catalog: use a different MESSAGE_ID for user manager startup
This add a new message id for the end of user instance startup.
User manager startup is a different beast then the system startup.
Their descriptions are completely different too. Let's just separate
them.
Partially fixes #3351.
Also remove "successful" from the description, since we don't know if
the startup was successful or not.
hwdb_parser: make sure that our patterns match the full property
We would catch stuff like:
ACCEL_MOUNT_MATRIX=0, -1, 0; -1, 0, 0; 0, 0.0., 0
but not
ACCEL_MOUNT_MATRIX=0, -1, 0; -1, 0, 0; 0, 0, 0.0.
because the match would stop at the next-to-last char. Fix that
by requiring a line end.
Bastien Nocera [Tue, 6 Dec 2016 16:16:43 +0000 (17:16 +0100)]
udev: Add rules for accelerometer orientation quirks
This commit adds a rules file to extract the properties from hwdb
to set on i2c IIO devices. This is used to set the ACCEL_MOUNT_MATRIX
property on IIO devices, to be consumed by iio-sensor-proxy or
equivalent daemon.
The hwdb file contains documentation on how to write quirks. Note
however that mount information is usually exported in:
- the device-tree for ARM devices
- the ACPI DSDT for Intel-compatible devices
but currently not extracted by the kernel.
Also note that some devices have the framebuffer rotation that changes
between the bootloader and the main system, which might mean that the
accelerometer is then wrongly oriented. This is a missing feature in the
i915 kernel driver: https://bugs.freedesktop.org/show_bug.cgi?id=94894
which needs to be fixed, and won't require quirks.
man: make the examples in systemd.network(5) more useful
We shouldn't just have snippets of configuration, but instead
examples which show all the parts necessary to build a certain kind
of setup, with short explanations.
networkd: check that VTI/VTI6 tunnels have a local address
Otherwise we'd fail with an assertion:
Assertion 't->family == AF_INET' failed at ../src/network/netdev/tunnel.c:244, function netdev_vti_fill_message_create(). Aborting.
When assigning addresses, we'd set the family, and later
verify that the address on the other end has the same family.
But when the address was specified as "any", we'd simply unset
the family. Instead, only unset the family if both addresses
are wiped.
Also, don't bother setting family = AF_UNSPEC, since it's the default (0).
%c and %r rely on settings made in the unit files themselves and hence resolve
to different values depending on whether they are used before or after Slice=.
Let's simply deprecate them and drop them from the documentation, as that's not
really possible to fix. Moreover they are actually redundant, as the same
information may always be queried from /proc/self/cgroup and /proc/1/cgroup.
(Accurately speaking, %R is actually not broken like this as it is constant.
However, let's remove all cgroup-related specifiers at once, as it is also
redundant, and doesn't really make much sense alone.)
core: turn on specifier expansion for more unit file settings
Let's permit specifier expansion at a numbre of additional fields, where
arbitrary strings might be passed where this might be useful one day. (Or at
least where there's no clear reason where it wouldn't make sense to have.)
core: use unit_full_printf() at a couple of locations we used unit_name_printf() before
For settings that are not taking unit names there's no reason to use
unit_name_printf(). Use unit_full_printf() instead, as the names are validated
anyway in one form or another after expansion.
core: resolve more specifiers in unit_name_printf()
unit_name_printf() is usually what we use when the resulting string shall
qualify as unit name, and it hence avoids resolving specifiers that almost
certainly won't result in valid unit names.
Add a couple of more specifiers that unit_full_printf() resolves also to the
list unit_name_printf() resolves, as they are likely to be useful in valid unit
names too. (Note that there might be cases where this doesn't hold, but we
should still permit this, as more often than not they are safe, and if people
want to use them that way, they should be able to.)
core: move specifier expansion out of service.c/socket.c
This monopolizes unit file specifier expansion in load-fragment.c, and removes
it from socket.c + service.c. This way expansion becomes an operation done exclusively at time of loading unit files.
Previously specifiers were resolved for all settings during loading of unit
files with the exception of ExecStart= and friends which were resolved in
socket.c and service.c. With this change the latter is also moved to the
loading of unit files.
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.
Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.
mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:
```
# mkdir test
# cd test
# mkosi --verity
# systemd-nspawn -i ./image.raw -bn
```
Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.
It's pretty basic: only support for passphrase-based encryption.
In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.