resolved: never allow _gateway lookups to go to the network
Make them rather fail than go to the network.
Previously we'd filter them on LLMNR (explicitly) and MDNS (implicitly,
because it doesn't have .local suffix), but not on DNS.
In order to make _gateway truly reliable, let's not allow it to go to
DNS either, and keep it local.
This is particular relevant, as clients can now request lookups without
local RR synthesis, where we'd rather have NXDOMAIN returned for
_gateway than have it hit the network.
resolved: lower SERVFAIL cache timeout from 30s to 10s
Apparently 30s is a bit too long for some cases, see #5552. But not
caching SERVFAIL at all also breaks stuff, see explanation in 201d99584ed7af8078bb243ce2587e5455074713.
Let's try to find some middle ground, by lowering the cache timeout to
10s. This should be ample for the problem 201d99584ed7af8078bb243ce2587e5455074713 attackes, but not as long as
half a miute, as #5552 complains.
virt: Properly detect nested UML inside another hypervisor
UML runs as a user-process so it can quite easily be ran inside of
another hypervisor, for instance inside a KVM instance. UML passes
through the CPUID from the host machine so in this case detect_vm
incorrectly identifies as running under KVM. So check we are running
a UML kernel first, before we check any other hypervisors.
Franck Bui [Mon, 30 Nov 2020 14:26:15 +0000 (15:26 +0100)]
scope: on unified, make sure to unwatch all PIDs once they've been moved to the cgroup scope
Commit 428a9f6f1d0396b9eacde2b38d667cbe3f15eb55 freed u->pids which is
problematic since the references to this unit in m->watch_pids were no more
removed when the unit was freed.
This patch makes sure to clean all this refs up before freeing u->pids by
calling unit_unwatch_all_pids().
Quoting #17559:
> libseccomp 2.5 added socket syscall multiplexing on ppc64(el):
> https://github.com/seccomp/libseccomp/pull/229
>
> Like with i386, s390 and s390x this breaks socket argument filtering, so
> RestrictAddressFamilies doesn't work.
>
> This causes the unit test to fail:
> /* test_restrict_address_families */
> Operating on architecture: ppc
> Failed to install socket family rules for architecture ppc, skipping: Operation canceled
> Operating on architecture: ppc64
> Failed to add socket() rule for architecture ppc64, skipping: Invalid argument
> Operating on architecture: ppc64-le
> Failed to add socket() rule for architecture ppc64-le, skipping: Invalid argument
> Assertion 'fd < 0' failed at src/test/test-seccomp.c:424, function test_restrict_address_families(). Aborting.
>
> The socket filters can't be added so `socket(AF_UNIX, SOCK_DGRAM, 0);` still
> works, triggering the assertion.
Only some small changes, because we updated recently. As usual, it seems that there are mostly
additions with a smaller amount of corrections, no big removals.
Dan Streetman [Wed, 25 Nov 2020 20:22:24 +0000 (15:22 -0500)]
test: use cap_last_cap() for max supported cap number, not capability_list_length()
This test assumes capability_list_length() is an invalid cap number,
but that isn't true if the running kernel supports more caps than we were
compiled with, which results in the test failing.
Instead use cap_last_cap() + 1.
If cap_last_cap() is 63, there are no more 'invalid' cap numbers to test with,
so the invalid cap number test part is skipped.
specifiers: introduce common macros for generating specifier tables
In many cases the tables are largely the same, hence define a common set
of macros to generate the common parts.
This adds in a couple of missing specifiers here and there, so is more
thant just refactoring: it actually fixes accidental omissions.
Note that some entries that look like they could be unified under these
macros can't really be unified, since they are slightly different. For
example in the DNSSD service logic we want to use the DNSSD hostname for
%H rather than the unmodified kernel one.
INSUN PYO [Thu, 19 Nov 2020 01:49:04 +0000 (10:49 +0900)]
sd-device-enumerator: do not return error when a device is removed
If /sys/class/OOO node is created and destroyed during booting (kernle driver initialization fails),
systemd-udev-trigger.service fails due to race condition.
3. device_enumerator_scan_devices() => enumerator_scan_devices_all() => enumerator_scan_dir("class") =>
opendir("/sys/class") and iterate all subdirs ==> enumerator_scan_dir_and_add_devices("/sys/class/OOO")
4. kernel driver fails and destroy /sys/class/OOO
5. enumerator_scan_dir_and_add_devices("/sys/class/OOO") fails in opendir("/sys/class/OOO")
6. "systemd-udev-trigger.service" fails
7. udev coldplug fails and some device units not ready
8. mount units asociated with device units fail
9. local-fs.target fails
10. enters emergency mode
********************************************************************************************************
***** status of systemd-udev-trigger.service unit ******************************************************
$ systemctl status systemd-udev-trigger.service
systemd-udev-trigger.service - udev Coldplug all Devices
Loaded: loaded (/usr/lib/systemd/system/systemd-udev-trigger.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2020-01-02 13:16:54 KST; 22min ago
Docs: man:udev(7)
man:systemd-udevd.service(8)
Process: 2162 ExecStart=/usr/bin/udevadm trigger --type=subsystems --action=add (code=exited, status=0/SUCCESS)
Process: 2554 ExecStart=/usr/bin/udevadm trigger --type=devices --action=add (code=exited, status=1/FAILURE)
Main PID: 2554 (code=exited, status=1/FAILURE)
Jan 02 13:16:54 localhost udevadm[2554]: Failed to scan devices: No such file or directory
Jan 02 13:16:54 localhost systemd[1]: systemd-udev-trigger.service: Main process exited, code=exited, status=1/FAILURE
Jan 02 13:16:54 localhost systemd[1]: systemd-udev-trigger.service: Failed with result 'exit-code'.
Jan 02 13:16:54 localhost systemd[1]: Failed to start udev Coldplug all Devices.
*******************************************************************************************************
***** journal log with Environment=SYSTEMD_LOG_LEVEL=debug in systemd-udev-trigger.service ***********
Jan 01 21:57:20 localhost udevadm[2039]: sd-device-enumerator: Scanning /sys/bus
Jan 01 21:57:20 localhost udevadm[2522]: sd-device-enumerator: Scan all dirs
Jan 01 21:57:20 localhost udevadm[2522]: sd-device-enumerator: Scanning /sys/bus
Jan 01 21:57:21 localhost udevadm[2522]: sd-device-enumerator: Scanning /sys/class
Jan 01 21:57:21 localhost udevadm[2522]: sd-device-enumerator: Failed to scan /sys/class: No such file or directory
Jan 01 21:57:21 localhost udevadm[2522]: Failed to scan devices: No such file or directory
*******************************************************************************************************
After the commit 1cdbff1c844ce46f1d84d8feeed426ebfd550988, each entry .conf contains
redundant slash like the following:
```
$ cat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-5.9.8-200.fc33.x86_64.conf
title Fedora 33 (Thirty Three)
version 5.9.8-200.fc33.x86_64
machine-id xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
options root=/dev/nvme0n1p2 ro rootflags=subvol=system/fedora selinux=0 audit=0
linux //xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/5.9.8-200.fc33.x86_64/linux
initrd //xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/5.9.8-200.fc33.x86_64/initrd
```
Jörg Thalheim [Sat, 14 Nov 2020 13:50:39 +0000 (14:50 +0100)]
networkd/dhcp6: allow layer3 devices without MAC
Devices with multicast but without mac addresses i.e. tun devices
are not getting setuped correctly:
$ ip tuntap add mode tun dev tun0
$ ip addr show tun0
16: tun0: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
$ cat /etc/systemd/network/tun0.network
[Match]
Name = tun0
[Network]
Address=192.168.1.1/32
$ ./systemd-networkd
tun0: DHCP6 CLIENT: Failed to set identifier: Invalid argument
tun0: Failed
Franck Bui [Fri, 20 Nov 2020 10:52:36 +0000 (11:52 +0100)]
units: restore sysfs conditions in sys-fs-fuse-connections.mount and sys-kernel-config.mount
Commit 42cc2855ba2fe4c6f5d incorrectly removed the condition on sysfs in both
sys-fs-fuse-connections.mount and sys-kernel-config.mount. However there are
still needed in case modprobe of one of these modules is intentionally skipped
(due to lack of privs for example).
This patch restores the 2 conditions which should be safe for the common case,
since all conditions are only checked after all deps ordered before are
complete.
Franck Bui [Mon, 16 Nov 2020 14:12:21 +0000 (15:12 +0100)]
core: serialize u->pids until the processes have been moved to the scope cgroup
Otherwise if a daemon-reload happens somewhere between the enqueue of the job
start for the scope unit and scope_start() then u->pids might be lost and none
of the processes specified by "PIDs=" will be moved into the scope cgroup.
seccomp: move brk+mmap+mmap2 into @default syscall filter set
These three syscalls are internally used by libc's memory allocation
logic, i.e. ultimately back malloc(). Allocating a bit of memory is so
basic, it should just be in the default set.
This fixes a couple of issues with asan/msan and the seccomp tests: when
asan/msan is used some additional, large memory allocations take place
in the background, and unless mmap/mmap2/brk are allowlisted these will
fail, aborting the test prematurely.
Franck Bui [Thu, 19 Nov 2020 08:17:19 +0000 (09:17 +0100)]
units: wait until some fs modules are entirely loaded before mounting their corresponding filesystem
udev requests to start the fs mount units when their respective module is
loaded. For that it monitors uevents of type "ADD" for the relevant fs modules.
However the uevent is sent by the kernel too early, ie before the init() of the
module is called hence before directories in /sys/fs/ are created.
This patch workarounds adds "Requires/After=modprobe@<fs-module>.service" to
the mount unit, which means that modprobe(8) will be called once the fs module
is announced to be loaded. This sounds pointless, but given that modprobe only
returns after the initialization of the module is complete, it should
workaround the issue.
As a side effect, the module will be automatically loaded if the mount unit is
started manually.
The presence of /sys/module/%I directory can't be used to assert that the load
of a given module is complete and therefore the call to modprobe(8) can be
skipped. Indeed this directory is created before the init() function of the
module is called.
Users of modprobe@.service needs to be sure that once this service returns the
module is fully operational.