network: allow LLDP packets to cross non-customer bridges for container network interfaces
This changes the default .network files we ship for nspawn containers to set
EmitLLDP=customer-bridge in order to allow propagation of the LLDP packets
across bridges. This is useful so that "networkctl status" shows all peers
connected to a virtual container network, collecting this data via LLDP. This
is safe since the default configuration for these interfaces does not bridge
these links to external interfaces, but relies on IP routing for this.
networkd: reworkd LLDP emission to allow control of propagation level
This allows selecting the propagation level of emitted LLDP packets
(specifically: the destination MAC address of the packets). This is useful
because it allows generating LLDP packets that optionally cross certain types
of bridges.
nspawn: add new --network-zone= switch for automatically managed bridge devices
This adds a new concept of network "zones", which are little more than bridge
devices that are automatically managed by nspawn: when the first container
referencing a bridge is started, the bridge device is created, when the last
container referencing it is removed the bridge device is removed again. Besides
this logic --network-zone= is pretty much identical to --network-bridge=.
The usecase for this is to make it easy to run multiple related containers
(think MySQL in one and Apache in another) in a common, named virtual Ethernet
broadcast zone, that only exists as long as one of them is running, and fully
automatically managed otherwise.
In this test /etc/fstab is replaced by -.mount unit. This causes
systemd-remount-fs.service to not remount / rw, which in turn causes various
failures becuase /var is not writable. In particular
systemd-tmpfiles-setup.service reports many failures. This is something
to possibly fix on its own (see https://github.com/systemd/systemd/issues/791);
in the meanwhile let's fix this test so that it doesn't fail, since the
point of the test is to check aliases on mount units, and not a ro root.
tests: enable logging for pid1, disable for other systemd services
systemd-udev generated an insane amount of log output at debug level.
It would break TEST-02-CRYPTSETUP by filling the overflowing the disk
(which seems to be a bug in itself!).
tests: specify format=raw for qemu to avoid warning
WARNING: Image format was not specified for
'/var/tmp/systemd-test.tGi3od/rootdisk.img' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write
operations on block 0 will be restricted. Specify the 'raw' format
explicitly to remove the restrictions.
Also use unsafe caching mode, we don't care about data integrity here.
core: rework how we flush incoming traffic when a socket unit goes down
Previously, we'd simply close and reopen the socket file descriptors. This is
problematic however, as we won't transition through the SOCKET_CHOWN state
then, and thus the file ownership won't be correct for the sockets.
Rework the flushing logic, and actually read any queued data from the sockets
for flushing, and accept any queued messages and disconnect them.
core: don't implicit open missing socket fds on daemon reload
Previously, when the daemon was reloaded and the configuration of a socket unit
file was changed so that a different set of socket ports was defined for the
socket we'd simply reopen the socket fds not yet open. This is problematic
however, as this means the SOCKET_CHOWN state is not run for them, and thus
their UID/GID is not corrected.
With this change, don't open the missing file descriptors, but log about this
issue, and ask the user to restart the socket explicit, to make sure all
missing fds are opened.
For similar reasons as the recent addition of a limit on sessions.
Note that we don't enforce a limit on inhibitors per-user currently, but
there's an implicit one, since each inhibitor takes up one fd, and fds are
limited via RLIMIT_NOFILE, and the limit on the number of processes per user.
logind: don't include session lists in PropertyChanged messages
If we have a lot of simultaneous sessions we really shouldn't send the full
list of active sessions with each PropertyChanged message for user and seat
objects, as that can become quite substantial data, we probably shouldn't dump
on the bus on each login and logout.
Note that the global list of sessions doesn't send out changes like this
either, it only supports requesting the session list with ListSessions().
If cients want to get notified about sessions coming and going they should
subscribe to SessionNew and SessionRemoved signals, and clients generally do
that already.
This is kind of an API break, but then again the fact that this was included
was never documented.
logind: process session/inhibitor fds at higher priority
Let's make sure we process session and inhibitor pipe fds (that signal
sessions/inhibtors going away) at a higher priority
than new bus calls that might create new sessions or inhibitors. This helps
ensuring that the number of open sessions stays minimal.
We really should put limits on all resources we manage, hence add one to the
number of concurrent sessions, too. This was previously unbounded, hence set a
relatively high limit of 8K by default.
Note that most PAM setups will actually invoke pam_systemd prefixed with "-",
so that the return code of pam_systemd is ignored, and the login attempt
succeeds anyway. On systems like this the session will be created but is not
tracked by systemd.
tree-wide: introduce new SOCKADDR_UN_LEN() macro, and use it everywhere
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification
dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On
overloaded systems this means that only 30 connections may be queued without
dbus-daemon processing them before further connection attempts fail. Our
cgroups-agent binary so far used D-Bus for its messaging, and hitting this
limit hence may result in us losing cgroup empty messages.
This patch adds a seperate cgroup agent socket of type AF_UNIX/SOCK_DGRAM.
Since sockets of these types need no connection set up, no listen() backlog
applies. Our cgroup-agent binary will hence simply block as long as it can't
enqueue its datagram message, so that we won't lose cgroup empty messages as
likely anymore.
This also rearranges the ordering of the processing of SIGCHLD signals, service
notification messages (sd_notify()...) and the two types of cgroup
notifications (inotify for the unified hierarchy support, and agent for the
classic hierarchy support). We now always process events for these in the
following order:
1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7)
2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)
This is because when receiving SIGCHLD we invalidate PID information, which we
need to process the service notification messages which are bound to PIDs.
Hence the order between the first two items. And we want to process SIGCHLD
metadata to detect whether a service is gone, before using cgroup
notifications, to decide when a service is gone, since the former carries more
useful metadata.
Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
https://github.com/systemd/systemd/issues/1961
Tejun Heo [Wed, 4 May 2016 21:43:13 +0000 (17:43 -0400)]
core: fix segfault on "systemctl --set-property UNIT BlockIODeviceWeight=WEIGHT"
bus_append_unit_property_assignment() was missing an argument for
sd_bus_message_append() when processing BlockIODeviceWeight leading to
segfault. Fix it.
Lubomir Rintel [Tue, 3 May 2016 20:15:49 +0000 (22:15 +0200)]
strbuf: set the proper character when creating new nodes
Commit 82501b3fc added an early break when a terminal node is found to
incorrect place -- before setting c. This caused trie to be built that
does not correspond to what it points to in buffer, causing incorrect
deduplications:
Alex Crawford [Thu, 28 Apr 2016 06:59:20 +0000 (23:59 -0700)]
install: cache the presets before evaluating
The previous implementation traversed the various config directories,
walking the preset files and parsing each line to determine if a service
should be enabled or disabled. It did this for every service which
resulted in many more file operations than neccessary.
This approach parses each of the preset entries into an array which is
then used to check if each service should be enabled or disabled.
Susant Sahani [Tue, 3 May 2016 17:48:21 +0000 (23:18 +0530)]
networkd: add support to set route table
networkd: add support to set route table
1. add support to configure the table id.
if id is less than 256 we can fit this in the header of route as
netlink property is a char. But in kernel this proepty is a
unsigned 32. Hence if greater that 256 add this as RTA_TABLE
attribute.
2. we are not setting the address family now. Now set this property.
man: add a description of DUIDType and DUIDRawData
This is essentially a revert of f38e0cce75ff2ffbd99f7e382ed39c160bb7d799 (which
removed the documentation of DUIDType on purpose). The description is heavily
updated for the new semantics.
Second second duid type field is removed. The first field was used to carry
the result of DUIDType= configuration, and the second was either a copy of
this, or contained the type extracted from DuidRawData. The semantics are changed
so that the type specified in DUIDType is always used. DUIDRawData= no longer
overrides the type setting.
The networkd code is now more constrained than the sd-dhcp code:
DUIDRawData cannot have 0 length, length 0 is treated the same as unsetting.
Likewise, it is not possible to set a DUIDType=0. If it ever becomes necessary
to set type=0 or a zero-length duid, the code can be changed to support that.
Nevertheless, I think that's unlikely.
This addresses #3127 § 1 and 3.
v2:
- rename DUID.duid, DUID.duid_len to DUID.raw_data, DUID.raw_data_len
dh-dhcp{,6}-client: change the semantics of DUID setting
Both versions of the code are changed to allow the caller to override
DUID using simple rules: duid type and value may be specified, in
which case the caller is responsible to providing the contents,
or just duid type may be specified as DUID_TYPE_EN, in which case we
we fill in the values. In the future more support for other types may
be added, e.g. DUID_TYPE_LLT.
There still remains and ugly discrepancy between dhcp4 and dhcp6 code:
dhcp6 has sd_dhcp6_client_set_duid and sd_dhcp6_client_set_iaid and
requires client->state to be DHCP6_STATE_STOPPED, while dhcp4 has
sd_dhcp_client_set_iaid_duid and will reconfigure the client if it
is not stopped. This commit doesn't touch that part.
After all it is used in more than one place and is not that short.
Also tweak the test a bit:
- do not check that duid_len > 0, because we want to allow unknown
duid types, and there might be some which are fine with 0 length data,
(also assert should not be called from library code),
- always check that duid_len <= MAX_DUID_LEN, because we could overwrite
available buffer space otherwise.