man: add systemd.environment-generator(7) with two examples
v2:
- add example files to EXTRA_DIST
v3:
- rework for the new scheme where nothing is written to disk
v4:
- use separate dirs for system and user env generators
Environment file generators are a lot like unit file generators, but not
exactly:
1. environment file generators are run for each manager instance, and their
output is (or at least can be) individualized.
The generators themselves are system-wide, the same for all users.
2. environment file generators are run sequentially, in priority order.
Thus, the lifetime of those files is tied to lifecycle of the manager
instance. Because generators are run sequentially, later generators can use or
modify the output of earlier generators.
Each generator is run with no arguments, and the whole state is stored in the
environment variables. The generator can echo a set of variable assignments to
standard output:
VAR_A=something
VAR_B=something else
This output is parsed, and the next and subsequent generators run with those
updated variables in the environment. After the last generator is done, the
environment that the manager itself exports is updated.
Each generator must return 0, otherwise the output is ignored.
The generators in */user-env-generator are for the user session managers,
including root, and the ones in */system-env-generator are for pid1.
env-util,fileio: immediately replace variables in load_env_file_push()
strv_env_replace was calling env_match(), which in effect allowed multiple
values for the same key to be inserted into the environment block. That's
pointless, because APIs to access variables only return a single value (the
latest entry), so it's better to keep the block clean, i.e. with just a single
entry for each key.
Add a new helper function that simply tests if the part before '=' is equal in
two strings and use that in strv_env_replace.
In load_env_file_push, use strv_env_replace to immediately replace the previous
assignment with a matching name.
Afaict, none of the callers are materially affected by this change, but it
seems like some pointless work was being done, if the same value was set
multiple times. We'd go through parsing and assigning the value for each
entry. With this change, we handle just the last one.
core/manager: move environment serialization out to basic/env-util.c
This protocol is generally useful, we might just as well reuse it for the
env. generators.
The implementation is changed a bit: instead of making a new strv and freeing
the old one, just mutate the original. This is much faster with larger arrays,
while in fact atomicity is preserved, since we only either insert the new
entry or not, without being in inconsistent state.
basic/exec-util: add support for synchronous (ordered) execution
The output of processes can be gathered, and passed back to the callee.
(This commit just implements the basic functionality and tests.)
After the preparation in previous commits, the change in functionality is
relatively simple. For coding convenience, alarm is prepared *before* any
children are executed, and not before. This shouldn't matter usually, since
just forking of the children should be pretty quick. One could also argue that
this is more correct, because we will also catch the case when (for whatever
reason), forking itself is slow.
Three callback functions and three levels of serialization are used:
- from individual generator processes to the generator forker
- from the forker back to the main process
- deserialization in the main process
v2:
- replace an structure with an indexed array of callbacks
core/manager: split out creation of serialization fd out to a helper
There is a slight change in behaviour: the user manager for root will create a
temporary file in /run/systemd, not /tmp. I don't think this matters, but
simplifies implementation.
basic/exec-util: use conf_files_list_strv to list executables
Essentially the same logic as in conf_files_list() was independently implemented in
do_execute(). With previous commit, do_execute() can just call conf_files_list() to
get a list of executable paths.
After generating the template name we can shortcut things and just call
unit_file_find_dirs() from inside itself, just with the new name and
save a good number of duplicate lines.
man: improve documentation on seccomp regarding alternative ABIs
Let's clarify that RestrictAddressFamilies= and MemoryDenyWriteExecute=
are only fully effective if non-native system call architectures are
disabled, since they otherwise may be used to circumvent the filters, as
the filters aren't equally effective on all ABIs.
execute: set working directory to /root if User= is not set, but WorkingDirectory=~ is
Or actually, try to to do the right thing depending on what is
available:
- If we know $HOME from User=, then use that.
- If the UID for the service is 0, hardcode that WorkingDirectory=~ means WorkingDirectory=/root
- In any other case (which will be the unprivileged --user case), use
get_home_dir() to find the $HOME of the user we are running as.
- Otherwise fail.
David Glasser [Wed, 8 Feb 2017 23:12:36 +0000 (15:12 -0800)]
man: fix docs for swap's DefaultDependencies= (#5278)
There was a missing dependency and one with the wrong type. Additionally, refer
to DefaultDependencies= once instead of twice, without a vague reference in the
first one that doesn't mention that the value matters.
seccomp: on s390 the clone() parameters are reversed
Add a bit of code that tries to get the right parameter order in place
for some of the better known architectures, and skips
restrict_namespaces for other archs.
This also bypasses the test on archs where we don't know the right
order.
In this case I didn't bother with testing the case where no filter is
applied, since that is hopefully just an issue for now, as there's
nothing stopping us from supporting more archs, we just need to know
which order is right.
Franck Bui [Wed, 8 Feb 2017 19:56:22 +0000 (20:56 +0100)]
sd-event: "when exiting no signal event are pending" is a wrong assertion (#5271)
The code make the following assertion: when freeing a event loop object
(usually it's done after exiting from the main event loop), no signal events
are still queued and are pending.
This assertion can be found in event_unmask_signal_data() with
"assert(!d->current);" assertion.
It appears that this assertion can be wrong at least in a specific case
described below.
Consider the following example which is inspired from udev: a process defines 3
source events: 2 are created by sd_event_add_signal() and 1 is created by
sd_event_add_post().
1. the process receives the 2 signals consecutively so that signal 'A' source
event is queued and pending. Consequently the post source event is also
queued and pending. This is done by sd_event_wait().
2. The callback for signal 'A' is called by sd_event_dispatch().
3. The next call to sd_event_wait() will queue signal 'B' source event.
4. The callback for the post source event is called and calls sd_event_exit().
5. the event loop is exited.
6. freeing the event loop object will lead to the assertion failure in
event_unmask_signal_data().
This patch simply removes this assertion as it doesn't seem to be a
bug if the signal data still reference a signal source at this point.
Philip Withnall [Wed, 8 Feb 2017 15:54:31 +0000 (15:54 +0000)]
nspawn: Add support for sysroot pivoting (#5258)
Add a new --pivot-root argument to systemd-nspawn, which specifies a
directory to pivot to / inside the container; while the original / is
pivoted to another specified directory (if provided). This adds
support for booting container images which may contain several bootable
sysroots, as is common with OSTree disk images. When these disk images
are booted on real hardware, ostree-prepare-root is run in conjunction
with sysroot.mount in the initramfs to achieve the same results.
Philip Withnall [Wed, 8 Feb 2017 15:53:01 +0000 (15:53 +0000)]
test: Fix a maybe-uninitialised compiler warning (#5269)
The compiler warning is a false positive, since n_addresses is always
initialised on the success path from parse_argv(), but the compiler
obviously can’t work that out.
Fixes:
src/test/test-nss.c:426:9: warning: 'n_addresses' may be used uninitialized in this function [-Wmaybe-uninitialized]
dissect: don't honour NOAUTO flags when looking for ESP (#5224)
The flag is originally defined for "basic data partitions", but not for the
ESP. We reuse it for the various partitions defined by the Discoverable
Partitions Spec, but it isn't defined for the ESP, hence don't check for
it. Instead, do check for GPT_FLAG_NO_BLOCK_IO_PROTOCOL, as that flag
actually is defined for all partition types, and recommended to use by
the UEFI spec.
Franck Bui [Fri, 27 Jan 2017 15:02:22 +0000 (16:02 +0100)]
tests: add dropin dependency tests
[zj: tests assertions adjusted to the different logic in which masking
of a dependency through one name, does not forbid the dependency
being added through another name.]
core/load-dropin: add more sanity checks on .wants/.requires symlinks
Feb 04 22:35:42 systemd[1462]: foo.service: Wants dependency dropin /home/zbyszek/.config/systemd/user/foo.service.wants/diffname.service target ../barbar.service has different name
Feb 04 22:35:42 systemd[1462]: foo.service: Wants dependency dropin /home/zbyszek/.config/systemd/user/foo.service.wants/wrongname is not a valid unit name, ignoring
core: implement masking of .wants/.requires symlinks
Fixes #1169.
Fixes #4830.
Example log errors:
Feb 04 22:13:28 systemd[1462]: foo.service: Wants dependency on empty_file.service is masked by /home/zbyszek/.config/systemd/user/foo.service.wants/empty_file.service, ignoring
Feb 04 22:13:28 systemd[1462]: foo.service: Wants dependency on masked.service is masked by /home/zbyszek/.config/systemd/user/foo.service.wants/masked.service, ignoring
core: when loading .wants and .requires, follow the same logic as .d conf dropins
Essentially, instead of sequentially adding deps based on all symlinks
encountered in .wants and .requires dirs for each name and each unit file load
path, iteratate over the load paths and unit names gathering symlinks, then
order them based on priority, and then iterate over the final list, adding
dependencies.
This patch doesn't change the logic too much, except that the order in which
dependencies are applied might be different. It wasn't defined before, so that
not really a change. Adding filtering on the symlinks is left for later
patches.
The --help text currently uses the "--umount" spelling, hence to the
same in the man page too.
And let's settle on "umount" instead of "unmount" here, since most folks
probably expect that when typing in a command, as util-linux' tool is
called "umount" after all, and so is the symlink "systemd-umount" we
install.
lewo [Tue, 7 Feb 2017 23:56:55 +0000 (00:56 +0100)]
tmpfiles.d: set primary group rights to r-w (#5265)
If the /var/log/journal directory is created with rigths 700, the application
of an ACL rules without any primary group right sets it to 0. A chmod 755 on
this file will then only set the ACL mask and let the ACL primary group right
to 0. The directory is then unreadable for the primary group.
This patch explicitly sets the primary group to avoid this problem.
install: when a template unit is instantiated via a /usr symlink, consider it enabled
If a unit foobar@.service stored below /usr is instantiated via a
symlink foobar@quux.service also below /usr, then we should consider the
instance statically enabled, while the template itself should continue
to be considered enabled/disabled/static depending on its [Install]
section.
In order to implement this we'll now look for enablement symlinks in all
unit search paths, not just in the config and runtime dirs.
install: don't enter loop when traversing a template symlinks
Before this patch, if we'd encounter an instance or template symlink
while traversing a chain of symlinks we'd fill in the instance name and
retry the iteration. This makes no sense if the resulting name is
actually the same as we are coming from, as we'd just spin a couple of
times in the loop, until the UNIT_FILE_FOLLOW_SYMLINK_MAX iteration
limit is hit.
Fix this, by accepted the symlink as it is, if it identical to what we
filled in.
The third paragraph of the Description already linked to
systemd.resource-control(5), but it was missing from the list of
additional options for the [Service] section.
dissect: try to read roothash value off user.verity.roothash xattr of image file
This slightly extends the roothash loading logic to first check for a
user.verity.roothash extended attribute on the image file. If it exists,
it is used as Verity root hash and the ".roothash" file is not used.
This should improve the chance that the roothash is retained when the
file is moved around, as the data snippet is attached directly to the
image file. The field is still detached from the file payload however,
in order to make sure it may be trusted independently.
This does not replace the ".roothash" file loading, it simply adds a
second way to retrieve the data.
Extended attributes are often a poor choice for storing metadata like
this as it is usually difficult to discover for admins and users, and
hard to fix if it ever gets out of sync. However, in this case I think
it's safe as verity implies read-only access, and thus there's little
chance of it to get out of sync.
core,nspawn,dissect: make nspawn's .roothash file search reusable
This makes nspawn's logic of automatically discovering the root hash of
an image file generic, and then reuses it in systemd-dissect and in
PID1's RootImage= logic, so that verity is automatically set up whenever
we can.
dissect: make sure to manually follow symlinks when mounting dissected image
If the dissected image contains symlinks for the mount points we need we
need to make sure to follow this with chase_symlinks() so that we don't
leave the image.
core: actually make "+" prefix in ReadOnlyPaths=, InaccessiblePaths=, ReadWritablePaths= work
5327c910d2fc1ae91bd0b891be92b30379c7467b claimed to add support for "+"
for prefixing paths with the configured RootDirectory=. But actually it
only implemented it in the backend, it did not add support for it to the
configuration file parsers. Fix that now.
core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory=
This adds a boolean unit file setting MountAPIVFS=. If set, the three
main API VFS mounts will be mounted for the service. This only has an
effect on RootDirectory=, which it makes a ton times more useful.
(This is basically the /dev + /proc + /sys mounting code posted in the
original #4727, but rebased on current git, and with the automatic logic
replaced by explicit logic controlled by a unit file setting)
If we can, use a memfd for serializing state during a daemon reload or
reexec. Fall back to a file in /run/systemd or /tmp only if memfds are
not available.