Andreas Rammhold [Tue, 26 Sep 2017 23:54:20 +0000 (01:54 +0200)]
man/systemd.network: Updated documentation about VRF traffic redirection
Starting with kernel version 4.8 the kernel has a single `l3mdev` rule
that handles this. This rule will be created when the first VRF device
is added.
When an interface has been enslaved to a VRF the received routes should
be added to the VRFs RT instead of the main table.
This change modifies the default behaviour of routes in the case where a
network belongs to an VRF. When the user does not configure a
`DHCP.RouteTable` in a `systemd.network` file and the interface belongs
to a VRF, the VRFs routing table is used instead of RT_TABLE_MAIN.
When the user has configured a custom routing table for DHCP the VRFs
table is ignored and the users preference takes precedence.
core: log unit failure with type-specific result code
This slightly changes how we log about failures. Previously,
service_enter_dead() would log that a service unit failed along with its
result code, and unit_notify() would do this again but without the
result code. For other unit types only the latter would take effect.
This cleans this up: we keep the message in unit_notify() only for debug
purposes, and add type-specific log lines to all our unit types that can
fail, and always place them before unit_notify() is invoked.
Or in other words: the duplicate log message for service units is
removed, and all other unit types get a more useful line with the
precise result code.
cgroup: after determining that a cgroup is empty, asynchronously dispatch this
This makes sure that if we learn via inotify or another event source
that a cgroup is empty, and we checked that this is indeed the case (as
we might get spurious notifications through inotify, as the inotify
logic through the "cgroups.event" is pretty unspecific and might be
trigger for a variety of reasons), then we'll enqueue a defer event for
it, at a priority lower than SIGCHLD handling, so that we know for sure
that if there's waitid() data for a process we used it before
considering the cgroup empty notification.
We are about to add second cgroup-related queue, called
"cgroup_empty_queue", hence let's rename "cgroup_queue" to
"cgroup_realize_queue" (as that is its purpose) to minimize confusion
about the two queues.
network: change log level when sd_rtnl_message_get_family() returns invalid family (#6923)
From bce67bbee359eec19e6778619b6651100a1c1477, systemd-networkd always shows
```
rtnl: received address with invalid family type 32, ignoring.
```
during boot-up. In the code, there are log_warning() and log_debug() for the
same situation, and the log_debug() is never called. So, let's lower the
log level and remove never called function.
socket: if RemoveOnStop= is turned on for a socket, try to unlink() pre-existing symlinks
Normally, Symlinks= failing is not considered fatal nor destructive.
Let's slightly alter behaviour here if RemoveOnStop= is turned on. In
that case the use in a way opted for destructive behaviour and we do
unlink all sockets and symlinks when the socket unit goes down. And that
means we might as well unlink any pre-existing if this mode is selected.
Yeah, it's a bit of a stretch to do this, but @OhNoMoreGit is right: if
RemoveOnStop= is on we are destructive regarding any pre-existing
symlinks on stop, and it would be quite weird if we wouldn't be on
start.
socket: make sure we warn loudly about symlinks we can't create
Note that this change does not make symlink creation failing fatal. I am
not entirely sure about whether it should be, but I am leaning towards
not making it fatal for two reasons: symlinks like this tend to be a
compatibility feature, and hence unlikely to be essential for operation,
in a way this breaks compatibility, and while doing that is not off the
table, we should probably avoid it if we are not entirely sure it's a
good thing.
Note that this also changes plain symlink() to symlink_idempotent() so
that existing symlinks with the right destination are nothing we log
about.
Let's use proc_cmdline_get_key() instead of some strstr() logic to find
a kernel command line key. Using strstr() gets confused by similarly
named keys, and we should reuse our own code as much as we can anyway...
# systemctl emergency
Failed to start emergency.target: Transaction order is cyclic. See syste...
See system logs and 'systemctl status emergency.target' for details.
# systemctl status emergency.target
● emergency.target - Emergency Mode
Loaded: loaded (/usr/lib/systemd/system/emergency.target; static; vendor preset: disabled)
Active: inactive (dead) since Mon 2017-09-25 10:43:02 BST; 2h 42min ago
Docs: man:systemd.special(7)
systemd[1]: sysinit.target: Found dependency on sysinit.target/stop
sysinit.target: Unable to break cycle starting with sysinit.target/stop
network.target: Found ordering cycle on wpa_supplicant.service/stop
network.target: Found dependency on sysinit.target/stop
network.target: Found dependency on emergency.target/start
network.target: Found dependency on emergency.service/start
network.target: Found dependency on serial-getty@ttyS0.service/stop
network.target: Found dependency on systemd-user-sessions.service/stop
network.target: Found dependency on network.target/stop
network.target: Unable to break cycle starting with network.target/stop
IMO #6509 is ugly enough that we should aim to answer it. But it could
take some time to investigate, so let's re-open the issue as a first step.
Just in case something opened them, let's make sure glibc invalidates
them too.
Thankfully so far no library opened log channels behind our back, at
least as far as I know, hence this is actually a NOP, but let's better
be safe than sorry.
Now that logging can implicitly reopen the log streams when needed we
can log errors without any special magic, hence let's normalize things,
and log the same way we do everywhere else.
log: add a mode where we open the log fds for every single log message
This we can then make use in execute.c to make error logging a bit less
special when preparing for process execution, as we can still log but
don't have any fds open continously.
swap: adjust swap.c in a similar way to what we just did to mount.c
Also drop the redundant states and make all similar changes too.
Thankfully the swap.c state engine is much simpler than mount.c's, hence
this should be easier to digest.
The function returns true for all states that have a control process
running, and each time we call it that's what we want to know, hence
let's rename it accordingly. Moreover, the more generic unit states have
an ACTIVE state, and it is defined quite differently from the set of
states this function returns true for, hence let's avoid confusion and
not reuse the word "ACTIVE" here in a different context.
Finally, let's uppercase this, since in most ways it's pretty much
identical to a macro
This changes the mount unit state engine in the following ways:
1. The MOUNT_MOUNTING_SIGTERM and MOUNT_MOUNTING_SIGKILL are removed.
They have been pretty much equivalent to MOUNT_UNMOUNTING_SIGTERM and
MOUNT_UNMOUNTING_SIGKILL in what they do, and the outcome has been
the same as well: the unit is stopped. Hence, let's simplify things a
bit, and merge them. Note that we keep
MOUNT_REMOUNTING_{SIGTERM|SIGKILL} however, as those states have a
different outcome: the unit remains started.
2. mount_enter_signal() will now honour the SendSIGKILL= option of the
mount unit if it was set. This was previously done already when we
entered the signal states through a timeout, and was simply missing
here.
3. A new helper function mount_enter_dead_or_mounted() is added that
places the mount unit in either MOUNT_DEAD or MOUNT_MOUNTED,
depending on what the kernel thinks about the mount's state. This
function is called at various places now, wherever we finished an
operation, and want to make sure our own state reflects again what
the kernel thinks. Previously we had very similar code in a number of
places and in other places didn't recheck the kernel state. Let's do
that with the same logic and function at all relevant places now.
4. Rework mount_stop(): never forget about running control processes.
Instead: when we have a start (i.e. a /bin/mount) process running,
and are asked to stop, then enter the kill states for it, so that it
gets cleaned up. This fixes #6048. Moreover, when we have a reload
process running convert the possible states into the relevant
unmounting states, so that we can properly execute the requested
operation.
Let's only collect the first failure in the load result, and let's clear
it explicitly when we are about to enter a new reload operation. This
makes it more alike the handling of the main result value (which also
only stores the first failure), and also the handling of service.c's
reload state.
This variable is not set by meson, so let's not try to use it.
We could use some more elaborate scheme (e.g. based on $MESON_BUILD_ROOT and
$MESON_SUBDIR) to find the path to systemd-sysv-generator, but it seems
that plain ./systemd-sysv-generator works just as well and has the advantage
that it's easy to invoke the test by hand (as long as one cd's to the
meson build dir).
fileio: try to read one byte too much in read_full_stream()
Let's read one byte more than the file size we read from stat() on the
first fread() invocation. That way, the first read() will already be
short and indicate eof to fread().
As it turns out LINE_MAX at 4096 is too short for many usecases. Since
the general concept of having a common maximum line length limit makes
sense let's add our own, and make it larger (1MB for now).
fileio: add new helper call read_line() as bounded getline() replacement
read_line() is much like getline(), and returns a line read from a
FILE*, of arbitrary sizes. In contrast to gets() it will grow the buffer
dynamically, and in contrast to getline() it will place a user-specified
boundary on the line.
socket: assign socket units to a default slice unconditionally
Due to the chown() logic socket units might end up with processes even
if no explicit command is defined for them, hence let's make sure these
processes are in the right cgroup, and that means within a slice.
Mount, swap and service units unconditionally are assigned to a slice
already, let's do the same here, too.
(This becomes more important as soon as the ebpf/firewall stuff is
merged, as there'll be another reason to fork off processes then)
cgroup: rework which files we chown() on delegation
On cgroupsv2 we should also chown()/chmod() the subtree_control file,
so that children can use controllers the way they like.
On cgroupsv1 we should also chown()/chmod() cgroups.clone_children, as
not setting this for new cgroups makes little sense, and hence delegated
clients should be able to write to it.
Note that error handling for both cases is different. subtree_control
matters so we check for errors, but the clone_children/tasks stuff
doesn't really, as it's legacy stuff. Hence we only log errors and
proceed.
cgroup-util: downgrade log messages from library code to LOG_DEBUG
These errors don't really matter, that's why we log and proceed in the
current code. However, we currently log at LOG_WARNING, but we really
shouldn't given that this is library code. Hence downgrade this to
LOG_DEBUG.
install: consider globally enabled units as "enabled" for the user
We would not consider symlinks in /etc/systemd/user/*.{wants,requires}/
towards the user unit being "enabled", because the symlinks were not
located in "config" paths. But this is confusing to users, since those units
are clearly enabled and will be started. So let's muddle the definition of
enablement a bit to include the paths only accessible to root when looking for
enabled user units.