Tejun Heo [Thu, 19 May 2016 00:35:12 +0000 (17:35 -0700)]
core: translate between IO and BlockIO settings to ease transition
Due to the substantial interface changes in cgroup unified hierarchy, new IO
settings are introduced. Currently, IO settings apply only to unified
hierarchy and BlockIO to legacy. While the transition is necessary, it's
painful for users to have to provide configs for both. This patch implements
translation from one config set to another for configs which make sense.
* The translation takes place during application of the configs. Users won't
see IO or BlockIO settings appearing without being explicitly created.
* The translation takes place only if there is no config for the matching
cgroup hierarchy type at all.
While this doesn't provide comprehensive compatibility, it should considerably
ease transition to the new IO settings which are a superset of BlockIO
settings.
v2:
- Update test-cgroup-mask.c so that it accounts for the fact that
CGROUP_MASK_IO and CGROUP_MASK_BLKIO move together. Also, test/parent.slice
now sets IOWeight instead of BlockIOWeight.
Tejun Heo [Wed, 18 May 2016 20:51:46 +0000 (13:51 -0700)]
core: update CGroupBlockIODeviceBandwidth to record both rbps and wbps
CGroupBlockIODeviceBandwith is used to keep track of IO bandwidth limits for
legacy cgroup hierarchies. Unlike the unified hierarchy counterpart
CGroupIODeviceLimit, a CGroupBlockIODeviceBandwiddth records either a read or
write limit and has a couple issues.
* There's no way to clear specific config entry.
* When configs are cleared for an IO direction of a unit, the kernel settings
aren't cleared accordingly creating discrepancies.
This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to
CGroupIODeviceLimit - each entry records both rbps and wbps limits and is
cleared if both are at default values after kernel settings are updated.
Tejun Heo [Wed, 18 May 2016 20:50:56 +0000 (13:50 -0700)]
core: add support for IOReadIOPSMax and IOWriteIOPSMax
cgroup IO controller supports maximum limits for both bandwidth and IOPS but
systemd resource control currently only supports bandwidth limits. This patch
adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy
is in use.
It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy
hierarchies but IO control on legacy hierarchies is half-broken anyway, so
let's leave it alone for now.
Tejun Heo [Wed, 18 May 2016 20:50:56 +0000 (13:50 -0700)]
core: introduce CGroupIOLimitType enums
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places. This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.
This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
Susant Sahani [Wed, 18 May 2016 02:59:56 +0000 (08:29 +0530)]
networkd: do not update state or IPv6LL address if link is failed or lingering
This is partial fix for #2228 and #2977, #3204.
bridge-test: netdev ready
docker0: Gained IPv6LL
wlan0: Gained IPv6LL
eth0: Gained IPv6LL
Enumeration completed
bridge-test: netdev exists, using existing without changing its
parameters
vboxnet0: IPv6 enabled for interface: Success
lo: Configured
docker0: Could not drop address: No such process
vboxnet0: Gained carrier
wlan0: Could not drop address: No such process
eth0: Could not drop address: No such process
eth0: Could not drop address: No such process
eth0: Could not drop address: No such process
vboxnet0: Gained IPv6LL
vboxnet0: Could not set NDisc route or address: Invalid argument
vboxnet0: Failed
[New Thread 0x7ffff6505700 (LWP 1111)]
[Thread 0x7ffff6505700 (LWP 1111) exited]
Assertion 'link->state == LINK_STATE_SETTING_ROUTES' failed at
src/network/networkd-link.c:672, function link_enter_configured().
Aborting.
Program received signal SIGABRT, Aborted.
0x00007ffff6dc6a98 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install
iptables-1.4.21-15.fc23.x86_64 libattr-2.4.47-14.fc23.x86_64
libidn-1.32-1.fc23.x86_64 pcre-8.38-7.fc23.x86_64
Debugging
(gdb) bt
"link->state == LINK_STATE_SETTING_ROUTES", file=0x5555556a34c8
"src/network/networkd-link.c", line=672,
func=0x5555556a56d0 <__PRETTY_FUNCTION__.14850>
"link_enter_configured") at src/basic/log.c:788
src/network/networkd-link.c:672
src/network/networkd-link.c:720
flags=0 '\000', scope=0 '\000', cinfo=0x7fffffffe020) at
src/network/networkd-address.c:344
(rtnl=0x5555556eded0, message=0x55555570ff20, userdata=0x5555556ec590)
at src/network/networkd-manager.c:604
m=0x55555570ff20) at src/libsystemd/sd-netlink/sd-netlink.c:365
at src/libsystemd/sd-netlink/sd-netlink.c:395
ret=0x0) at src/libsystemd/sd-netlink/sd-netlink.c:429
revents=1, userdata=0x5555556eded0) at
src/libsystemd/sd-netlink/sd-netlink.c:723
src/libsystemd/sd-event/sd-event.c:2268
src/libsystemd/sd-event/sd-event.c:2629
timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2688
bus=0x5555556eeba0, name=0x55555568a2f5 "org.freedesktop.network1",
timeout=30000000,
check_idle=0x55555556adb6 <manager_check_idle>,
userdata=0x5555556ec590) at src/shared/bus-util.c:134
src/network/networkd-manager.c:1130
src/network/networkd.c:127
(gdb) f 3
src/network/networkd-link.c:672
672 assert(link->state == LINK_STATE_SETTING_ROUTES);
(gdb) p link->state
$1 = LINK_STATE_FAILED
We should not be in this state .
even if vboxnet0 failed we went into this state.
vboxnet0: Could not set NDisc route or address: Invalid argument
vboxnet0: Failed
Clemens Gruber [Tue, 17 May 2016 23:34:25 +0000 (01:34 +0200)]
networkd: Add EmitRouter= option for DHCP Server (#3251)
Add an option to disable appending DHCP option 3 (Router) to the DHCP
OFFER and ACK packets.
This commit adds the boolean option EmitRouter= for the [DHCPServer]
section in .network files.
Rationale: On embedded devices, it is very useful to have a DHCP server
running on an USB OTG ethernet gadget interface to avoid manual setup on
the client PCs, but it should only serve IP addresses, no route(r)s.
Otherwise, Windows clients experience network connectivity issues, due
to them using the address set in DHCP option 3 as default gateway.
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
man: clarify that IOXyz= only applies to the unified hierarchy, and BlockIOXyz= to the legacy hierarchy
With this change for each setting we say which hierarachy it applies to briefly
in the first sentence of the description, plus in longer form in an extra
pargraph at the end, with a recommendation for the counterpart of the option in
the other hierarchy.
Also adds markup and the "=" suffix to all mentioned settings.
Michal Sekletar [Mon, 16 May 2016 15:24:51 +0000 (17:24 +0200)]
core: don't log job status message in case job was effectively NOP (#3199)
We currently generate log message about unit being started even when
unit was started already and job didn't do anything. This is because job
was requested explicitly and hence became anchor job of the transaction
thus we could not eliminate it. That is fine but, let's not pollute
journal with useless log messages.
Current state:
$ journalctl -u systemd-resolved | grep Started
May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution.
After patch applied:
$ journalctl -u systemd-resolved | grep Started
May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution.
topimiettinen [Mon, 16 May 2016 02:34:05 +0000 (02:34 +0000)]
namespace: Make private /dev noexec and readonly (#3263)
Private /dev will not be managed by udev or others, so we can make it
noexec and readonly after we have made all device nodes. As /dev/shm
needs to be writable, we can't use bind_remount_recursive().
Susant Sahani [Sun, 15 May 2016 13:45:30 +0000 (19:15 +0530)]
networkd: do not generate a mac address for vlan interfaces (#3221)
While creating a VLAN the mac address should be copied from the parent interface, so that
the VLANs inherit the MAC address of the physical interface.
Before:
```
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
...
6: vlan1@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 22:07:73:9d:43:59 brd ff:ff:ff:ff:ff:ff
7: vlan2@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 46:30:76:33:35:d4 brd ff:ff:ff:ff:ff:ff
```
After:
```
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
...
11: vlan1@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
12: vlan2@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
```
It is just an alias for route_free which requires that route is not null,
but it was only used in one place where it was checked that route is not
null anyway. Let's just call route_free instead.
Lars Uebernickel [Sat, 14 May 2016 20:10:22 +0000 (22:10 +0200)]
busctl: use Monitoring interface (#3245)
This is now the recommended way to do monitoring by upstream D-Bus.
It's also allowed in the default policy, whereas eavesdrop is not
anymore, which effectively broke busctl on many systems.
Tejun Heo [Sat, 14 May 2016 19:56:53 +0000 (15:56 -0400)]
core: allow slice to be overriden if cgroups aren't realized (#3246)
unit_set_slice() fails with -EBUSY if the unit already has a slice associated
with it. This makes it impossible to override slice through dropin config or
over dbus. There's no reason to disallow slice changes as long as cgroups
aren't realized. Fix it.
kayrus [Thu, 12 May 2016 16:58:59 +0000 (18:58 +0200)]
core: added ListUnitsByNames dbus method (#3182)
This new method returns information by unit names. Instead of ListUnitsByPatterns
this method returns information of inactive and even unexisting units.
Moved dbus unit reply logic into a separate shared function.
Resolves https://github.com/coreos/fleet/pull/1418
Daniel Drake [Thu, 12 May 2016 16:42:39 +0000 (10:42 -0600)]
Create initrd-root-device.target synchronization point (#3239)
Add a synchronization point so that custom initramfs units can run
after the root device becomes available, before it is fsck'd and
mounted.
This is useful for custom initramfs units that may modify the
root disk partition table, where the root device is not known in
advance (it's dynamically selected by the generators).
_const_ means that the caller can assume that the function will return the same
result every time (and will not modify global memory). special_glyph() meets
this: even though it depends on global memory, that part of global memory is
not expected to change. This allows the calls to special_glyph() to be
optimized, even if -flto is not used.
tree-wide: rename draw_special_char to special_glyph
That function doesn't draw anything on it's own, just returns a string, which
sometimes is more than one character. Also remove "DRAW_" prefix from character
names, TREE_* and ARROW and BLACK_CIRCLE are unambigous on their own, don't
draw anything, and are always used as an argument to special_glyph().
Rename "DASH" to "MDASH", as there's more than one type of dash.
shared/install: use "→" instead of "pointing to" for a symlink
It's quite a bit shorter and just as readable.
(The full sentence with "pointing to" was added to replace a text that used
"ln -s %s %s". Using the "ln" syntax is indeed unclear, because it's not
obvious which is the source and which is the target, and because symlink(2)
uses the opposite order to ln(1). But with the unicode arrow there should
be no ambiguity.)
shared/install: do not print warning when a unit is already enabled
Executing 'systemctl enable' on the same unit twice would cause
a warning about a missing [Install] section to be printed. To avoid
this, count all symlinks that "would" be created, and return 1
no matter if we actually created a symlink or skipped creation because
it already exists.
shared/install: handle dangling aliases as an explicit case, report nicely
This fixes 'preset-all' with a unit that is a dangling symlink.
$ systemctl --root=/ preset-all
Unit syslog.service is an alias to a unit that is not present, ignoring.
Unit auditd.service is masked, ignoring.
Unit NetworkManager.service is masked, ignoring.
shared/install: add some more debug messages and comments
$ systemctl --root=/ preset foobar.service
Cannot find unit foobar.service.
Failed to preset: No such file or directory.
$ systemctl --root=/ preset foobar@.service
Cannot find unit foobar@.service.
Failed to preset: No such file or directory.
$ systemctl --root=/ preset foobar@blah.service
Cannot find unit foobar@blah.service or foobar@.service.
Failed to preset: No such file or directory.
network: allow LLDP packets to cross non-customer bridges for container network interfaces
This changes the default .network files we ship for nspawn containers to set
EmitLLDP=customer-bridge in order to allow propagation of the LLDP packets
across bridges. This is useful so that "networkctl status" shows all peers
connected to a virtual container network, collecting this data via LLDP. This
is safe since the default configuration for these interfaces does not bridge
these links to external interfaces, but relies on IP routing for this.
networkd: reworkd LLDP emission to allow control of propagation level
This allows selecting the propagation level of emitted LLDP packets
(specifically: the destination MAC address of the packets). This is useful
because it allows generating LLDP packets that optionally cross certain types
of bridges.
nspawn: add new --network-zone= switch for automatically managed bridge devices
This adds a new concept of network "zones", which are little more than bridge
devices that are automatically managed by nspawn: when the first container
referencing a bridge is started, the bridge device is created, when the last
container referencing it is removed the bridge device is removed again. Besides
this logic --network-zone= is pretty much identical to --network-bridge=.
The usecase for this is to make it easy to run multiple related containers
(think MySQL in one and Apache in another) in a common, named virtual Ethernet
broadcast zone, that only exists as long as one of them is running, and fully
automatically managed otherwise.
In this test /etc/fstab is replaced by -.mount unit. This causes
systemd-remount-fs.service to not remount / rw, which in turn causes various
failures becuase /var is not writable. In particular
systemd-tmpfiles-setup.service reports many failures. This is something
to possibly fix on its own (see https://github.com/systemd/systemd/issues/791);
in the meanwhile let's fix this test so that it doesn't fail, since the
point of the test is to check aliases on mount units, and not a ro root.
tests: enable logging for pid1, disable for other systemd services
systemd-udev generated an insane amount of log output at debug level.
It would break TEST-02-CRYPTSETUP by filling the overflowing the disk
(which seems to be a bug in itself!).
tests: specify format=raw for qemu to avoid warning
WARNING: Image format was not specified for
'/var/tmp/systemd-test.tGi3od/rootdisk.img' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write
operations on block 0 will be restricted. Specify the 'raw' format
explicitly to remove the restrictions.
Also use unsafe caching mode, we don't care about data integrity here.
core: rework how we flush incoming traffic when a socket unit goes down
Previously, we'd simply close and reopen the socket file descriptors. This is
problematic however, as we won't transition through the SOCKET_CHOWN state
then, and thus the file ownership won't be correct for the sockets.
Rework the flushing logic, and actually read any queued data from the sockets
for flushing, and accept any queued messages and disconnect them.
core: don't implicit open missing socket fds on daemon reload
Previously, when the daemon was reloaded and the configuration of a socket unit
file was changed so that a different set of socket ports was defined for the
socket we'd simply reopen the socket fds not yet open. This is problematic
however, as this means the SOCKET_CHOWN state is not run for them, and thus
their UID/GID is not corrected.
With this change, don't open the missing file descriptors, but log about this
issue, and ask the user to restart the socket explicit, to make sure all
missing fds are opened.