tree-wide: introduce new SOCKADDR_UN_LEN() macro, and use it everywhere
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification
dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On
overloaded systems this means that only 30 connections may be queued without
dbus-daemon processing them before further connection attempts fail. Our
cgroups-agent binary so far used D-Bus for its messaging, and hitting this
limit hence may result in us losing cgroup empty messages.
This patch adds a seperate cgroup agent socket of type AF_UNIX/SOCK_DGRAM.
Since sockets of these types need no connection set up, no listen() backlog
applies. Our cgroup-agent binary will hence simply block as long as it can't
enqueue its datagram message, so that we won't lose cgroup empty messages as
likely anymore.
This also rearranges the ordering of the processing of SIGCHLD signals, service
notification messages (sd_notify()...) and the two types of cgroup
notifications (inotify for the unified hierarchy support, and agent for the
classic hierarchy support). We now always process events for these in the
following order:
1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7)
2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)
This is because when receiving SIGCHLD we invalidate PID information, which we
need to process the service notification messages which are bound to PIDs.
Hence the order between the first two items. And we want to process SIGCHLD
metadata to detect whether a service is gone, before using cgroup
notifications, to decide when a service is gone, since the former carries more
useful metadata.
Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
https://github.com/systemd/systemd/issues/1961
Lubomir Rintel [Tue, 3 May 2016 20:15:49 +0000 (22:15 +0200)]
strbuf: set the proper character when creating new nodes
Commit 82501b3fc added an early break when a terminal node is found to
incorrect place -- before setting c. This caused trie to be built that
does not correspond to what it points to in buffer, causing incorrect
deduplications:
Alex Crawford [Thu, 28 Apr 2016 06:59:20 +0000 (23:59 -0700)]
install: cache the presets before evaluating
The previous implementation traversed the various config directories,
walking the preset files and parsing each line to determine if a service
should be enabled or disabled. It did this for every service which
resulted in many more file operations than neccessary.
This approach parses each of the preset entries into an array which is
then used to check if each service should be enabled or disabled.
Susant Sahani [Tue, 3 May 2016 17:48:21 +0000 (23:18 +0530)]
networkd: add support to set route table
networkd: add support to set route table
1. add support to configure the table id.
if id is less than 256 we can fit this in the header of route as
netlink property is a char. But in kernel this proepty is a
unsigned 32. Hence if greater that 256 add this as RTA_TABLE
attribute.
2. we are not setting the address family now. Now set this property.
man: add a description of DUIDType and DUIDRawData
This is essentially a revert of f38e0cce75ff2ffbd99f7e382ed39c160bb7d799 (which
removed the documentation of DUIDType on purpose). The description is heavily
updated for the new semantics.
Second second duid type field is removed. The first field was used to carry
the result of DUIDType= configuration, and the second was either a copy of
this, or contained the type extracted from DuidRawData. The semantics are changed
so that the type specified in DUIDType is always used. DUIDRawData= no longer
overrides the type setting.
The networkd code is now more constrained than the sd-dhcp code:
DUIDRawData cannot have 0 length, length 0 is treated the same as unsetting.
Likewise, it is not possible to set a DUIDType=0. If it ever becomes necessary
to set type=0 or a zero-length duid, the code can be changed to support that.
Nevertheless, I think that's unlikely.
This addresses #3127 § 1 and 3.
v2:
- rename DUID.duid, DUID.duid_len to DUID.raw_data, DUID.raw_data_len
dh-dhcp{,6}-client: change the semantics of DUID setting
Both versions of the code are changed to allow the caller to override
DUID using simple rules: duid type and value may be specified, in
which case the caller is responsible to providing the contents,
or just duid type may be specified as DUID_TYPE_EN, in which case we
we fill in the values. In the future more support for other types may
be added, e.g. DUID_TYPE_LLT.
There still remains and ugly discrepancy between dhcp4 and dhcp6 code:
dhcp6 has sd_dhcp6_client_set_duid and sd_dhcp6_client_set_iaid and
requires client->state to be DHCP6_STATE_STOPPED, while dhcp4 has
sd_dhcp_client_set_iaid_duid and will reconfigure the client if it
is not stopped. This commit doesn't touch that part.
After all it is used in more than one place and is not that short.
Also tweak the test a bit:
- do not check that duid_len > 0, because we want to allow unknown
duid types, and there might be some which are fine with 0 length data,
(also assert should not be called from library code),
- always check that duid_len <= MAX_DUID_LEN, because we could overwrite
available buffer space otherwise.
resolved: work around broken DNS zones set up by incapdns.net
incapdns.net returns NXDOMAIN for the SOA of the zone itself but is not a
terminal. This is against the specs, but we really should be able to deal with
this.
Previously, when verifying whether an NXDOMAIN response for a SOA/NS lookup is
rightfully unsigned we'd issue a SOA lookup for the parent's domain, to derive
the state from that. If the parent SOA would get an NXDOMAIN, we'd continue
upwards, until we hit a signed top-level domain, which suggests that the domain
actually exists.
With this change whenver we need to authenticate an NXDOMAIN SOA reply, we'll
request the DS RR for the zone first, and use for validation, since that this
must be from the parent's zone, not the incorrect lower zone.
automount: rework propagation between automount and mount units
Port the progagation logic to the generic Unit->trigger_notify() callback logic
in the unit vtable, that is called for a unit not only when the triggered unit
of it changes state but also when a job for that unit finishes. This, firstly
allows us to make the code a bit cleaner and more generic, but more
importantly, allows us to notice correctly when a mount job fails, and
propagate that back to autofs client processes.
And let's make it more accurate: if we have acquire the list of unit drop-ins,
then let's do a full comparison against the old list we already have, and if
things differ in any way, we know we have to reload.
This makes sure we detect changes to drop-in directories in more cases.
Until that commit, do determine whether a daemon reload was required we compare
the mtime of the main unit file we loaded with the mtime of it on disk for
equality, but for drop-ins we only stored the newest mtime of all of them and
then did a "newer-than" comparison. This was brokeni with the above commit,
when all checks where changed to be for equality.
With this change all checks are now done as "newer-than", fixing the drop-in
mtime case. Strictly speaking this will not detect a number of changes that the
code before above commit detected, but given that the mtime is unlikely to go
backwards, and this is just intended to be a helpful hint anyway, this looks OK
in order to keep things simple.
core: move enforcement of the start limit into per-unit-type code again
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in clal like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right before everything has been verified
to be in order.
The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.
Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.
Note that restores the "start-limit-hit" result code that existed before 6bf0f408e4833152197fb38fb10a9989c89f3a59 already in the service code. However,
it's not introduced for all units that have a result code concept.
machined: also make image removal operation asynchronous
If we remove a directory image (i.e. not a btrfs snapshot) then things might
get quite expensive, hence run this asynchronous in a forked off process, too.
machined: support non-btrfs file systems with "machinectl clone"
Fall back to a normal copy operation when the backing file system isn't btrfs,
and hence doesn't support cheap snapshotting. Of course, this will be slow, but
given that the execution is asynchronous now, this should be OK.
copy: adjust directory times after writing to the directory
When recursively copying a directory tree, fix up the file times after having
created all contents in it, so that our changes don't end up altering any of
the directory times.
util: rework sigkill_wait() to not require pid_t pointer
Let's make sigkill_wait() take a normal pid_t, and add sigkill_waitp() that
takes a pointer (which is useful for usage in _cleanup_), following the usual
logic we have for this.
machined: run clone operation asynchronously in the background
Cloning an image can be slow, if the image is not on a btrfs subvolume, hence
let's make sure we do this asynchronously in a child process, so that machined
isn't blocked as long as we process the client request.
This adds a new, generic "Operation" object to machined, that is used to track
these kind of background processes.
This is inspired by the MachineOperation object that already exists to make
copy operations asynchronous. A later patch will rework the MachineOperation
logic to use the generic Operation instead.
shared/install: ignore Alias in [Install] of units which don't allow aliases
A downside is that a warning about missing [Install] is printed:
$ systemctl --root=/ enable mnt-test.mount
[/etc/systemd/system/mnt-test.mount:5] Aliases are not allowed for mount units, ignoring.
The unit files have no installation config (WantedBy, RequiredBy, Also, Alias
settings in the [Install] section, and DefaultInstance for template units).
This means they are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
4) In case of template units, the unit is meant to be enabled with some
instance name specified.
That's a bit misleading, but I don't see an easy way to fix this. But
the situation is similar for many other parsing errors, so maybe that's
OK.
Alex Crawford [Sun, 1 May 2016 16:32:17 +0000 (09:32 -0700)]
test: ensure presets are evaluated in order
This tests to make sure that preset patterns are checked in the order
they were declared. Both "prefix-1.service" and "prefix-2.service" match
against two rules: their exact name (which enables the service) and
"prefix-*.service" (which disables the service). Because of the
ordering, only "prefix-1.service" should be enabled.
networkd: rework headers to avoid circular includes
Header files were organized in a way where the includer would add various
typedefs used by the includee before including it, resulting in a tangled
web of dependencies between files.
core: refuse merging on units when the unit type does not support alias
The concept of merging units exists so that we can create Unit objects for a
number of names early, and then load them only later, possibly merging units
which then turn out to be symlinked to other names. This of course only makes
sense for unit types where multiple names per unit are supported. For all
others, let's refuse the merge operation early.
core: merge service_connection_unref() into service_close_socket_fd()
We always call one after the other anyway, and this way service_set_socket_fd()
and service_close_socket_fd() nicely match each other as one undoes the effect
of the other.
There's no need to set the no_gc bit for service units that socket units
prepare, as we always keep a proper reference (as maintained by unit_ref_set())
on them, and such references are honoured by the GC logic anyway. Moreover,
explicitly setting the no_gc bit is problematic if the socket gets GC'ed for a
reason, as the service might then leak with the bit set.
In service_set_socket_fd(), let's make sure that if we can't add the requested
dependencies we take no possession of the passed connection fd.
This way, we follow the strict rule: we take possession of the passed fd on
success, but on failure we don't, and the fd remains in possession of the
caller.
core: rename StartLimitInterval= to StartLimitIntervalSec=
We generally follow the rule that for time settings we suffix the setting name
with "Sec" to indicate the default unit if none is specified. The only
exception was the rate limiting interval settings. Fix this, and keep the old
names for compatibility.
Do the same for journald's RateLimitInterval= setting
core: move start ratelimiting check after condition checks
With #2564 unit start rate limiting was moved from after the condition checks
are to before they are made, in an attempt to fix #2467. This however resulted
in #2684. However, with a previous commit a concept of per socket unit trigger
rate limiting has been added, to fix #2467 more comprehensively, hence the
start limit can be moved after the condition checks again, thus fixing #2684.
core: introduce activation rate limiting for socket units
This adds two new settings TriggerLimitIntervalSec= and TriggerLimitBurst= that
define a rate limit for activation of socket units. When the limit is hit, the
socket is is put into a failure mode. This is an alternative fix for #2467,
since the original fix resulted in issue #2684.
In a later commit the StartLimitInterval=/StartLimitBurst= rate limiter will be
changed to be applied after any start conditions checks are made. This way,
there are two separate rate limiters enforced: one at triggering time, before
any jobs are queued with this patch, as well as the start limit that is moved
again to be run immediately before the unit is activated. Condition checks are
done in between the two, and thus no longer affect the start limit.
build-sys: improve compat with older kernel headers
In 4.2 kernel headers, some netlink defines are missing that we need. missing.h
already can add them in, but currently makes this dependent on a definition
that these kernels already have. Change the check hence to check for the newest
definition in the table, so that the whole bunch of definitions as added in on
all kernels lacking this.
path-util: also support ".old" and ".new" suffixes and recommend them
~ suffix works fine, but looks to much like it the file is supposed to be
automatically cleaned up. For new versions of configuration files installers
might want to using something that looks more permanent like foobar.new.
So let's add treat ".old" and ".new" as special.