topimiettinen [Mon, 16 May 2016 02:34:05 +0000 (02:34 +0000)]
namespace: Make private /dev noexec and readonly (#3263)
Private /dev will not be managed by udev or others, so we can make it
noexec and readonly after we have made all device nodes. As /dev/shm
needs to be writable, we can't use bind_remount_recursive().
Susant Sahani [Sun, 15 May 2016 13:45:30 +0000 (19:15 +0530)]
networkd: do not generate a mac address for vlan interfaces (#3221)
While creating a VLAN the mac address should be copied from the parent interface, so that
the VLANs inherit the MAC address of the physical interface.
Before:
```
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
...
6: vlan1@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 22:07:73:9d:43:59 brd ff:ff:ff:ff:ff:ff
7: vlan2@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 46:30:76:33:35:d4 brd ff:ff:ff:ff:ff:ff
```
After:
```
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
...
11: vlan1@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
12: vlan2@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
```
Lars Uebernickel [Sat, 14 May 2016 20:10:22 +0000 (22:10 +0200)]
busctl: use Monitoring interface (#3245)
This is now the recommended way to do monitoring by upstream D-Bus.
It's also allowed in the default policy, whereas eavesdrop is not
anymore, which effectively broke busctl on many systems.
Tejun Heo [Sat, 14 May 2016 19:56:53 +0000 (15:56 -0400)]
core: allow slice to be overriden if cgroups aren't realized (#3246)
unit_set_slice() fails with -EBUSY if the unit already has a slice associated
with it. This makes it impossible to override slice through dropin config or
over dbus. There's no reason to disallow slice changes as long as cgroups
aren't realized. Fix it.
kayrus [Thu, 12 May 2016 16:58:59 +0000 (18:58 +0200)]
core: added ListUnitsByNames dbus method (#3182)
This new method returns information by unit names. Instead of ListUnitsByPatterns
this method returns information of inactive and even unexisting units.
Moved dbus unit reply logic into a separate shared function.
Resolves https://github.com/coreos/fleet/pull/1418
Daniel Drake [Thu, 12 May 2016 16:42:39 +0000 (10:42 -0600)]
Create initrd-root-device.target synchronization point (#3239)
Add a synchronization point so that custom initramfs units can run
after the root device becomes available, before it is fsck'd and
mounted.
This is useful for custom initramfs units that may modify the
root disk partition table, where the root device is not known in
advance (it's dynamically selected by the generators).
_const_ means that the caller can assume that the function will return the same
result every time (and will not modify global memory). special_glyph() meets
this: even though it depends on global memory, that part of global memory is
not expected to change. This allows the calls to special_glyph() to be
optimized, even if -flto is not used.
tree-wide: rename draw_special_char to special_glyph
That function doesn't draw anything on it's own, just returns a string, which
sometimes is more than one character. Also remove "DRAW_" prefix from character
names, TREE_* and ARROW and BLACK_CIRCLE are unambigous on their own, don't
draw anything, and are always used as an argument to special_glyph().
Rename "DASH" to "MDASH", as there's more than one type of dash.
shared/install: use "→" instead of "pointing to" for a symlink
It's quite a bit shorter and just as readable.
(The full sentence with "pointing to" was added to replace a text that used
"ln -s %s %s". Using the "ln" syntax is indeed unclear, because it's not
obvious which is the source and which is the target, and because symlink(2)
uses the opposite order to ln(1). But with the unicode arrow there should
be no ambiguity.)
shared/install: do not print warning when a unit is already enabled
Executing 'systemctl enable' on the same unit twice would cause
a warning about a missing [Install] section to be printed. To avoid
this, count all symlinks that "would" be created, and return 1
no matter if we actually created a symlink or skipped creation because
it already exists.
shared/install: handle dangling aliases as an explicit case, report nicely
This fixes 'preset-all' with a unit that is a dangling symlink.
$ systemctl --root=/ preset-all
Unit syslog.service is an alias to a unit that is not present, ignoring.
Unit auditd.service is masked, ignoring.
Unit NetworkManager.service is masked, ignoring.
shared/install: add some more debug messages and comments
$ systemctl --root=/ preset foobar.service
Cannot find unit foobar.service.
Failed to preset: No such file or directory.
$ systemctl --root=/ preset foobar@.service
Cannot find unit foobar@.service.
Failed to preset: No such file or directory.
$ systemctl --root=/ preset foobar@blah.service
Cannot find unit foobar@blah.service or foobar@.service.
Failed to preset: No such file or directory.
network: allow LLDP packets to cross non-customer bridges for container network interfaces
This changes the default .network files we ship for nspawn containers to set
EmitLLDP=customer-bridge in order to allow propagation of the LLDP packets
across bridges. This is useful so that "networkctl status" shows all peers
connected to a virtual container network, collecting this data via LLDP. This
is safe since the default configuration for these interfaces does not bridge
these links to external interfaces, but relies on IP routing for this.
networkd: reworkd LLDP emission to allow control of propagation level
This allows selecting the propagation level of emitted LLDP packets
(specifically: the destination MAC address of the packets). This is useful
because it allows generating LLDP packets that optionally cross certain types
of bridges.
nspawn: add new --network-zone= switch for automatically managed bridge devices
This adds a new concept of network "zones", which are little more than bridge
devices that are automatically managed by nspawn: when the first container
referencing a bridge is started, the bridge device is created, when the last
container referencing it is removed the bridge device is removed again. Besides
this logic --network-zone= is pretty much identical to --network-bridge=.
The usecase for this is to make it easy to run multiple related containers
(think MySQL in one and Apache in another) in a common, named virtual Ethernet
broadcast zone, that only exists as long as one of them is running, and fully
automatically managed otherwise.
In this test /etc/fstab is replaced by -.mount unit. This causes
systemd-remount-fs.service to not remount / rw, which in turn causes various
failures becuase /var is not writable. In particular
systemd-tmpfiles-setup.service reports many failures. This is something
to possibly fix on its own (see https://github.com/systemd/systemd/issues/791);
in the meanwhile let's fix this test so that it doesn't fail, since the
point of the test is to check aliases on mount units, and not a ro root.
tests: enable logging for pid1, disable for other systemd services
systemd-udev generated an insane amount of log output at debug level.
It would break TEST-02-CRYPTSETUP by filling the overflowing the disk
(which seems to be a bug in itself!).
tests: specify format=raw for qemu to avoid warning
WARNING: Image format was not specified for
'/var/tmp/systemd-test.tGi3od/rootdisk.img' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write
operations on block 0 will be restricted. Specify the 'raw' format
explicitly to remove the restrictions.
Also use unsafe caching mode, we don't care about data integrity here.
core: rework how we flush incoming traffic when a socket unit goes down
Previously, we'd simply close and reopen the socket file descriptors. This is
problematic however, as we won't transition through the SOCKET_CHOWN state
then, and thus the file ownership won't be correct for the sockets.
Rework the flushing logic, and actually read any queued data from the sockets
for flushing, and accept any queued messages and disconnect them.
core: don't implicit open missing socket fds on daemon reload
Previously, when the daemon was reloaded and the configuration of a socket unit
file was changed so that a different set of socket ports was defined for the
socket we'd simply reopen the socket fds not yet open. This is problematic
however, as this means the SOCKET_CHOWN state is not run for them, and thus
their UID/GID is not corrected.
With this change, don't open the missing file descriptors, but log about this
issue, and ask the user to restart the socket explicit, to make sure all
missing fds are opened.
For similar reasons as the recent addition of a limit on sessions.
Note that we don't enforce a limit on inhibitors per-user currently, but
there's an implicit one, since each inhibitor takes up one fd, and fds are
limited via RLIMIT_NOFILE, and the limit on the number of processes per user.
logind: don't include session lists in PropertyChanged messages
If we have a lot of simultaneous sessions we really shouldn't send the full
list of active sessions with each PropertyChanged message for user and seat
objects, as that can become quite substantial data, we probably shouldn't dump
on the bus on each login and logout.
Note that the global list of sessions doesn't send out changes like this
either, it only supports requesting the session list with ListSessions().
If cients want to get notified about sessions coming and going they should
subscribe to SessionNew and SessionRemoved signals, and clients generally do
that already.
This is kind of an API break, but then again the fact that this was included
was never documented.
logind: process session/inhibitor fds at higher priority
Let's make sure we process session and inhibitor pipe fds (that signal
sessions/inhibtors going away) at a higher priority
than new bus calls that might create new sessions or inhibitors. This helps
ensuring that the number of open sessions stays minimal.
We really should put limits on all resources we manage, hence add one to the
number of concurrent sessions, too. This was previously unbounded, hence set a
relatively high limit of 8K by default.
Note that most PAM setups will actually invoke pam_systemd prefixed with "-",
so that the return code of pam_systemd is ignored, and the login attempt
succeeds anyway. On systems like this the session will be created but is not
tracked by systemd.
tree-wide: introduce new SOCKADDR_UN_LEN() macro, and use it everywhere
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification
dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On
overloaded systems this means that only 30 connections may be queued without
dbus-daemon processing them before further connection attempts fail. Our
cgroups-agent binary so far used D-Bus for its messaging, and hitting this
limit hence may result in us losing cgroup empty messages.
This patch adds a seperate cgroup agent socket of type AF_UNIX/SOCK_DGRAM.
Since sockets of these types need no connection set up, no listen() backlog
applies. Our cgroup-agent binary will hence simply block as long as it can't
enqueue its datagram message, so that we won't lose cgroup empty messages as
likely anymore.
This also rearranges the ordering of the processing of SIGCHLD signals, service
notification messages (sd_notify()...) and the two types of cgroup
notifications (inotify for the unified hierarchy support, and agent for the
classic hierarchy support). We now always process events for these in the
following order:
1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7)
2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6)
3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5)
This is because when receiving SIGCHLD we invalidate PID information, which we
need to process the service notification messages which are bound to PIDs.
Hence the order between the first two items. And we want to process SIGCHLD
metadata to detect whether a service is gone, before using cgroup
notifications, to decide when a service is gone, since the former carries more
useful metadata.
Related to this:
https://bugs.freedesktop.org/show_bug.cgi?id=95264
https://github.com/systemd/systemd/issues/1961
Tejun Heo [Wed, 4 May 2016 21:43:13 +0000 (17:43 -0400)]
core: fix segfault on "systemctl --set-property UNIT BlockIODeviceWeight=WEIGHT"
bus_append_unit_property_assignment() was missing an argument for
sd_bus_message_append() when processing BlockIODeviceWeight leading to
segfault. Fix it.