Djalal Harouni [Thu, 26 May 2016 20:42:29 +0000 (22:42 +0200)]
nspawn: split out seccomp call into nspawn-seccomp.[ch]
Split seccomp into nspawn-seccomp.[ch]. Currently there are no changes,
but this will make it easy in the future to share or use the seccomp logic
from systemd core.
Djalal Harouni [Thu, 26 May 2016 10:59:49 +0000 (12:59 +0200)]
nspawn: rename is_procfs_sysfs_or_suchlike() to is_fs_fully_userns_compatible()
Rename is_procfs_sysfs_or_suchlike() to is_fs_fully_userns_compatible()
to give it the real meaning. This may prevent future modifications that
may introduce bugs.
Christian Hesse [Thu, 26 May 2016 13:57:37 +0000 (15:57 +0200)]
{machine,system}ctl: always pass &changes and &n_changes (#3350)
We have to pass addresses of changes and n_changes to
bus_deserialize_and_dump_unit_file_changes(). Otherwise we are hit by
missing information (subsequent calls to unit_file_changes_add() to
not add anything).
Also prevent null pointer dereference in
bus_deserialize_and_dump_unit_file_changes() by asserting.
Werner Fink [Wed, 18 Nov 2015 11:28:30 +0000 (12:28 +0100)]
ask-password: ask for passphrases not only on the first console of /dev/console
but also on all other consoles. This does help on e.g. mainframes
where often a serial console together with other consoles are
used. Even rack based servers attachted to both a serial console
as well as having a virtual console do sometimes miss a connected
monitor.
To be able to ask on all terminal devices of /dev/console the devices
are collected. If more than one device are found, then on each of the
terminals a inquiring task for passphrase is forked and do not return
to the caller.
Every task has its own session and its own controlling terminal.
If one of the tasks does handle a password, the remaining tasks
will be terminated.
Also let contradictory options on the command of
systemd-tty-ask-password-agent fail.
Spwan for each device of the system console /dev/console a own process.
Replace the system call wait() with with system call waitid().
Use SIGTERM instead of SIGHUP to get unresponsive childs down.
Port the collect_consoles() function forward to a pulbic and strv
based function "get_kernel_consoles()" in terminal-util.c and use this
in tty-ask-password-agent.c.
In [1] Michel Dänzer and Daniel Vetter wrote:
>> The scenario you describe isn't possible if the Wayland compositor
>> directly uses the KMS API of /dev/dri/card*, but it may be possible if
>> the Wayland compositor uses the fbdev API of /dev/fb* instead (e.g. if
>> weston uses its fbdev backend).
>
> Yeah, if both weston and your screen grabber uses native fbdev API you can
> now screenshot your desktop. And since fbdev has no concept of "current
> owner of the display hw" like the drm master, I think this is not fixable.
> At least not just in userspace. Also even with native KMS compositors
> fbdev still doesn't have the concept of ownership, which is why it doesn't
> bother clearing it's buffer before KMS takes over. I agree that this
> should be reverted or at least hidden better.
TBH, I think that privilege separation between processes running under the same
UID is tenuous. Even with drm, in common setups any user process can ptrace the
"current owner of the display" and call DROP_MASTER or do whatever. It *is*
possible to prevent that, e.g. by disabling ptrace using yama.ptrace_scope, or
selinux, and so on, but afaik this is not commonly done. E.g. all Fedora
systems pull in elfutils-default-yama-scope.rpm through dependencies which sets
yama.ptrace_scope=0. And even assuming that ptrace was disabled, it is trivial
to modify files on disk, communicate through dbus, etc; there is just to many
ways for a non-sandboxed process to interact maliciously with the display shell
to close them all off. To achieve real protection, some sort of sandboxing
must be implemented, and in that case there is no need to rely on access mode
on the device files, since much more stringent measures have to be implemented
anyway.
The situation is similar for framebuffer devices. It is common to add
framebuffer users to video group to allow them unlimited access to /dev/fb*.
Using uaccess would be better solution in that case. Also, since there is no
"current owner" limitation like in DRM, processes running under the same UID
should be able to access /proc/<pid-of-display-server>/fd/* and gain access to
the devices. Nevertheless, weston implements a suid wrapper to access the
devices and then drop privileges, and this patch would make this daemon
pointless. So if the weston developers feel that this change reduces security,
I prefer to revert it.
Tom Gundersen [Mon, 23 May 2016 23:34:29 +0000 (01:34 +0200)]
sd-device: udev-db - handle properties with empty value (#3330)
The statemachine was unable to parse properties with empty values,
reported in [0].
When reaching the start of the KEY, we would unconditionally read
one more character before starting to look for the end-of-line.
Simply look for the end-of-line from the first character.
Susant Sahani [Mon, 23 May 2016 09:13:57 +0000 (14:43 +0530)]
networkd: networkd: ndisc set SO_BINDTODEVICE on socket (#3294)
From the issue #2004 we are receiving packet even if this
packet is not intended for this interface.
This can be reproduced.
lp3s0: Updating address: 2001:db8:1:0:7e7a:91ff:fe6d:ffe2/64 (valid for 1d)
wlp3s0: Updating address: fe80::7e7a:91ff:fe6d:ffe2/64 (valid forever)
NDisc CLIENT: Received RA from non-link-local address ::. Ignoring.
NDisc CLIENT: Received RA on wrong interface: 2 != 6. Ignoring.
NDisc CLIENT: Received RA on wrong interface: 2 != 3. Ignoring.
enp0s25: Updating address: 2001:db8:1:0:2ad2:44ff:fe6a:ae07/64 (valid for 1d)
enp0s25: Updating address: fe80::2ad2:44ff:fe6a:ae07/64 (valid forever)
NDisc CLIENT: Sent Router Solicitation
NDisc CLIENT: Sent Router Solicitation
NDisc CLIENT: Sent Router Solicitation
NDisc CLIENT: Received RA on wrong interface: 3 != 2. Ignoring.
NDisc CLIENT: Received RA on wrong interface: 3 != 6. Ignoring.
NDisc CLIENT: Received RA from non-link-local address ::. Ignoring.
NDisc CLIENT: Received RA on wrong interface: 2 != 6. Ignoring.
NDisc CLIENT: Received RA on wrong interface: 2 != 3. Ignoring.
enp0s25: Updating address: 2001:db8:1:0:2ad2:44ff:fe6a:ae07/64 (valid for 1d)
enp0s25: Updating address: fe80::2ad2:44ff:fe6a:ae07/64 (valid forever)
run: do not try to use reply after freeing it (#3318)
We'd call sd_bus_message_unref and then proceed to use
variables pointing into the reply buffer (fd and char*).
dup the fd and copy the string before destorying the reply.
units: restore ConditionNeesUpdate=/etc in ldconfig.service (#3311)
In order to support stateless systems that support offline /usr updates
properly, let's restore the ConditionNeesUpdate=/etc line that makes sure we
are run when /usr is updated and this update needs to be propagated to the
/etc/ld.so.conf file stored in /etc.
This reverts part of #2859, which snuck this change in, but really shouldn't
have.
Tom Gundersen [Sat, 21 May 2016 21:00:32 +0000 (23:00 +0200)]
libsystemd-network: use recv(..., 0) instead of read(...) (#3317)
According to recv(2) these should be the same, but that is not true.
Passing a buffer of length 0 to read is defined to be a noop according
to read(2), but passing a buffer of length 0 to recv will discard the
pending pacet.
We can easily hit this as we allocate our buffer size depending on
the size of the incoming packet (using FIONREAD). As pointed out in
issue #3299 simply sending an empty UDP packet to the DHCP client
port will trigger a busy loop in networkd as we are polling on the
socket but never discarding the empty packet.
Make the fix for net/if.h fuckup even worse (#3287)
The original conflict is fixed in the kernel in v4.6-rc7-40-g4a91cb61bb,
but now our work-around causes a compilation failure.
Keep the workaround to support 4.5 kernels for now, and layer
more ugliness on top.
resolved: fix accounting of dns serves on a link (#3291)
After a few link up/down events I got this warning:
May 17 22:05:10 laptop systemd-resolved[2983]: Failed to read DNS servers for interface wlp3s0, ignoring: Argument list too long
tomty89 [Fri, 20 May 2016 10:28:30 +0000 (18:28 +0800)]
[networkd-dhcp6] do not call sd_dhcp6_client_start() from dhcp6_request_address()
Starting the DHCP client doesn't seem like dhcp6_request_address()'s responsibility anyway. Whenever it's called, sd_dhcp6_client_start() is unconditionally called outside of it as well. See ndisc_router_handler() and ndisc_handler() in networkd-ndisc.c.
Franck Bui [Thu, 19 May 2016 14:37:04 +0000 (16:37 +0200)]
systemctl: reload configuration when enabling sysv units too (#3297)
After enabling/disabling a unit, the daemon configuration is expected
to be unless '--no-reload' option is passed.
However this is not done when enabling a sysv units. This can lead to
the following scenario:
$ cp /etc/init.d/named /etc/init.d/foo
$ systemctl enable foo
foo.service is not a native service, redirecting to systemd-sysv-install
Executing /usr/lib/systemd/systemd-sysv-install enable foo
$ systemctl start foo
Failed to start foo.service: Unit foo.service failed to load: No such file or directory.
This can also be seen after installing a package providing a sysv
service: the service can't be started unless 'daemon-reload' is called
manually. This shouldn't be needed and this patch will fix this case
too since during package installation, the service is expected to be
enabled/disabled.
tblume [Thu, 19 May 2016 14:35:27 +0000 (16:35 +0200)]
systemctl: restore the no-sync option for legacy halt (#3249)
The sync() call on shutdown had been removed with commit 57371e5829a61e5ee6c9f98404dfc729d6c62608
together with the no-sync option for the shutdown commands.
The sync call was restored in commit 4a3ad39957399c4a30fc472a804e72907ecaa4f9 but the no-sync option
wasn't re-added.
I think we should restore this option at least for the legacy halt command.
rules: add /dev/disk/by-partuuid symlinks also for dos partition tables
blkid reports PARTUUID values also for partitions that are defined by a
dos partitioning scheme. Instead of limiting the partitioning scheme to
"gpt or dos" just drop the test for the partitioning scheme and trust
blkid to do the right thing.
Tejun Heo [Thu, 19 May 2016 00:35:12 +0000 (17:35 -0700)]
core: translate between IO and BlockIO settings to ease transition
Due to the substantial interface changes in cgroup unified hierarchy, new IO
settings are introduced. Currently, IO settings apply only to unified
hierarchy and BlockIO to legacy. While the transition is necessary, it's
painful for users to have to provide configs for both. This patch implements
translation from one config set to another for configs which make sense.
* The translation takes place during application of the configs. Users won't
see IO or BlockIO settings appearing without being explicitly created.
* The translation takes place only if there is no config for the matching
cgroup hierarchy type at all.
While this doesn't provide comprehensive compatibility, it should considerably
ease transition to the new IO settings which are a superset of BlockIO
settings.
v2:
- Update test-cgroup-mask.c so that it accounts for the fact that
CGROUP_MASK_IO and CGROUP_MASK_BLKIO move together. Also, test/parent.slice
now sets IOWeight instead of BlockIOWeight.
Tejun Heo [Wed, 18 May 2016 20:51:46 +0000 (13:51 -0700)]
core: update CGroupBlockIODeviceBandwidth to record both rbps and wbps
CGroupBlockIODeviceBandwith is used to keep track of IO bandwidth limits for
legacy cgroup hierarchies. Unlike the unified hierarchy counterpart
CGroupIODeviceLimit, a CGroupBlockIODeviceBandwiddth records either a read or
write limit and has a couple issues.
* There's no way to clear specific config entry.
* When configs are cleared for an IO direction of a unit, the kernel settings
aren't cleared accordingly creating discrepancies.
This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to
CGroupIODeviceLimit - each entry records both rbps and wbps limits and is
cleared if both are at default values after kernel settings are updated.
Tejun Heo [Wed, 18 May 2016 20:50:56 +0000 (13:50 -0700)]
core: add support for IOReadIOPSMax and IOWriteIOPSMax
cgroup IO controller supports maximum limits for both bandwidth and IOPS but
systemd resource control currently only supports bandwidth limits. This patch
adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy
is in use.
It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy
hierarchies but IO control on legacy hierarchies is half-broken anyway, so
let's leave it alone for now.
Tejun Heo [Wed, 18 May 2016 20:50:56 +0000 (13:50 -0700)]
core: introduce CGroupIOLimitType enums
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places. This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.
This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
Susant Sahani [Wed, 18 May 2016 12:49:40 +0000 (18:19 +0530)]
networkd: Drop IPv6LL address when link is down
Now we are not dropping the IPv6LL address when link is down.
So next time when link is up and before kernel acquired this address
we are using the old address.
When the link is down kernel tells us that this address is no longer
valid . Let's remove this address and again when kernel tells us
that the address is added let's use it.
Susant Sahani [Wed, 18 May 2016 02:59:56 +0000 (08:29 +0530)]
networkd: do not update state or IPv6LL address if link is failed or lingering
This is partial fix for #2228 and #2977, #3204.
bridge-test: netdev ready
docker0: Gained IPv6LL
wlan0: Gained IPv6LL
eth0: Gained IPv6LL
Enumeration completed
bridge-test: netdev exists, using existing without changing its
parameters
vboxnet0: IPv6 enabled for interface: Success
lo: Configured
docker0: Could not drop address: No such process
vboxnet0: Gained carrier
wlan0: Could not drop address: No such process
eth0: Could not drop address: No such process
eth0: Could not drop address: No such process
eth0: Could not drop address: No such process
vboxnet0: Gained IPv6LL
vboxnet0: Could not set NDisc route or address: Invalid argument
vboxnet0: Failed
[New Thread 0x7ffff6505700 (LWP 1111)]
[Thread 0x7ffff6505700 (LWP 1111) exited]
Assertion 'link->state == LINK_STATE_SETTING_ROUTES' failed at
src/network/networkd-link.c:672, function link_enter_configured().
Aborting.
Program received signal SIGABRT, Aborted.
0x00007ffff6dc6a98 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install
iptables-1.4.21-15.fc23.x86_64 libattr-2.4.47-14.fc23.x86_64
libidn-1.32-1.fc23.x86_64 pcre-8.38-7.fc23.x86_64
Debugging
(gdb) bt
"link->state == LINK_STATE_SETTING_ROUTES", file=0x5555556a34c8
"src/network/networkd-link.c", line=672,
func=0x5555556a56d0 <__PRETTY_FUNCTION__.14850>
"link_enter_configured") at src/basic/log.c:788
src/network/networkd-link.c:672
src/network/networkd-link.c:720
flags=0 '\000', scope=0 '\000', cinfo=0x7fffffffe020) at
src/network/networkd-address.c:344
(rtnl=0x5555556eded0, message=0x55555570ff20, userdata=0x5555556ec590)
at src/network/networkd-manager.c:604
m=0x55555570ff20) at src/libsystemd/sd-netlink/sd-netlink.c:365
at src/libsystemd/sd-netlink/sd-netlink.c:395
ret=0x0) at src/libsystemd/sd-netlink/sd-netlink.c:429
revents=1, userdata=0x5555556eded0) at
src/libsystemd/sd-netlink/sd-netlink.c:723
src/libsystemd/sd-event/sd-event.c:2268
src/libsystemd/sd-event/sd-event.c:2629
timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2688
bus=0x5555556eeba0, name=0x55555568a2f5 "org.freedesktop.network1",
timeout=30000000,
check_idle=0x55555556adb6 <manager_check_idle>,
userdata=0x5555556ec590) at src/shared/bus-util.c:134
src/network/networkd-manager.c:1130
src/network/networkd.c:127
(gdb) f 3
src/network/networkd-link.c:672
672 assert(link->state == LINK_STATE_SETTING_ROUTES);
(gdb) p link->state
$1 = LINK_STATE_FAILED
We should not be in this state .
even if vboxnet0 failed we went into this state.
vboxnet0: Could not set NDisc route or address: Invalid argument
vboxnet0: Failed
Clemens Gruber [Tue, 17 May 2016 23:34:25 +0000 (01:34 +0200)]
networkd: Add EmitRouter= option for DHCP Server (#3251)
Add an option to disable appending DHCP option 3 (Router) to the DHCP
OFFER and ACK packets.
This commit adds the boolean option EmitRouter= for the [DHCPServer]
section in .network files.
Rationale: On embedded devices, it is very useful to have a DHCP server
running on an USB OTG ethernet gadget interface to avoid manual setup on
the client PCs, but it should only serve IP addresses, no route(r)s.
Otherwise, Windows clients experience network connectivity issues, due
to them using the address set in DHCP option 3 as default gateway.
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
This isn't quite symmetrical to in_addr_from_string() because it also returns
an offset indicating how much of the string was consumed by the matched
pattern. This offset reporting is needed for either of the following use
cases:
* verifying the lack of trailing garbage after such an address
* parsing subsequent data from the same string
networkd currently silently accepts some strings as MAC addresses that it
probably shouldn't (like "ab:cd:ef:12:34:56:78" and "ab:cd:ef:12:3 4:56").
Add tests to MAC address parsing to ensure that we only accept valid MAC
addresses, and that we accept the three most common forms of MAC address
(colon-delimited hex, IEEE, and Cisco)
Several of these tests currently fail, but another commit in this series will
resolve them.
man: clarify that IOXyz= only applies to the unified hierarchy, and BlockIOXyz= to the legacy hierarchy
With this change for each setting we say which hierarachy it applies to briefly
in the first sentence of the description, plus in longer form in an extra
pargraph at the end, with a recommendation for the counterpart of the option in
the other hierarchy.
Also adds markup and the "=" suffix to all mentioned settings.
Michal Sekletar [Mon, 16 May 2016 15:24:51 +0000 (17:24 +0200)]
core: don't log job status message in case job was effectively NOP (#3199)
We currently generate log message about unit being started even when
unit was started already and job didn't do anything. This is because job
was requested explicitly and hence became anchor job of the transaction
thus we could not eliminate it. That is fine but, let's not pollute
journal with useless log messages.
Current state:
$ journalctl -u systemd-resolved | grep Started
May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution.
After patch applied:
$ journalctl -u systemd-resolved | grep Started
May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution.
topimiettinen [Mon, 16 May 2016 02:34:05 +0000 (02:34 +0000)]
namespace: Make private /dev noexec and readonly (#3263)
Private /dev will not be managed by udev or others, so we can make it
noexec and readonly after we have made all device nodes. As /dev/shm
needs to be writable, we can't use bind_remount_recursive().
Susant Sahani [Sun, 15 May 2016 13:45:30 +0000 (19:15 +0530)]
networkd: do not generate a mac address for vlan interfaces (#3221)
While creating a VLAN the mac address should be copied from the parent interface, so that
the VLANs inherit the MAC address of the physical interface.
Before:
```
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
...
6: vlan1@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 22:07:73:9d:43:59 brd ff:ff:ff:ff:ff:ff
7: vlan2@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 46:30:76:33:35:d4 brd ff:ff:ff:ff:ff:ff
```
After:
```
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
...
11: vlan1@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
12: vlan2@wlp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:26:c6:85:a3:c2 brd ff:ff:ff:ff:ff:ff
```
It is just an alias for route_free which requires that route is not null,
but it was only used in one place where it was checked that route is not
null anyway. Let's just call route_free instead.
Lars Uebernickel [Sat, 14 May 2016 20:10:22 +0000 (22:10 +0200)]
busctl: use Monitoring interface (#3245)
This is now the recommended way to do monitoring by upstream D-Bus.
It's also allowed in the default policy, whereas eavesdrop is not
anymore, which effectively broke busctl on many systems.
Tejun Heo [Sat, 14 May 2016 19:56:53 +0000 (15:56 -0400)]
core: allow slice to be overriden if cgroups aren't realized (#3246)
unit_set_slice() fails with -EBUSY if the unit already has a slice associated
with it. This makes it impossible to override slice through dropin config or
over dbus. There's no reason to disallow slice changes as long as cgroups
aren't realized. Fix it.
kayrus [Thu, 12 May 2016 16:58:59 +0000 (18:58 +0200)]
core: added ListUnitsByNames dbus method (#3182)
This new method returns information by unit names. Instead of ListUnitsByPatterns
this method returns information of inactive and even unexisting units.
Moved dbus unit reply logic into a separate shared function.
Resolves https://github.com/coreos/fleet/pull/1418
Daniel Drake [Thu, 12 May 2016 16:42:39 +0000 (10:42 -0600)]
Create initrd-root-device.target synchronization point (#3239)
Add a synchronization point so that custom initramfs units can run
after the root device becomes available, before it is fsck'd and
mounted.
This is useful for custom initramfs units that may modify the
root disk partition table, where the root device is not known in
advance (it's dynamically selected by the generators).