This makes sure nspawn's --volatile=yes switch works again: there we
have a read-only image that is overmounted by a tmpfs (with the
exception of /usr). This we need to mkdir all mount points even though
the image is read-only.
Hence, let's drop the optimizatio of avoiding mkdir() on images that are
read-only, it's wrong and misleading here, since the image itself might
be read-only but our mounts are not.
dissect-image: clean up meaning of DISSECT_IMAGE_MKDIR
Previously handling of DISSECT_IMAGE_MKDIR was pretty weird and broken:
it would control both if we create the top-level mount point when
mounting an image, and the inner mount points for images that consist of
multiple file systems. However, the latter is redundant, since 1f0f82f1311e4c52152b8e2b6f266258709c137d does this too, a few lines
further up – unconditionally!
Hence, let's make the meaning of DISSECT_IMAGE_MKDIR more strict: it
shall be only about the top-level mount point, not about the inner ones
(where we'll continue to create what is missing alwayway). Having a
separate flag for the top-level mount point is relevant, since the mount
point dir created by it will remain on the host fs – unlike the
directories we create inside the image, which will stay within the
image.
This slightly change of meaning is actually inline with what the flag is
actually used for and documented in systemd-dissect.
This creates a "well known" for sgx_enclave ownership. By doing this here we
avoid the risk that various projects making use of the device will provide
similar-but-slightly-incompatible installation instructions, in particular
using different group names.
ACLs are actually a better approach to grant access to users, but not in all
cases, so we want to provide a standard group anyway.
Mode is 0o660, not 0o666 because this is very new code and distributions are
likely to not want to give full access to all users. This might change in the
future, but being conservative is a good default in the beginning.
Rules for /dev/sgx_provision will be provided by libsg-ae-pce:
https://github.com/intel/linux-sgx/issues/678.
Frantisek Sumsal [Wed, 10 Mar 2021 15:41:35 +0000 (16:41 +0100)]
coredump: omit coredump info when -q is used with the `debug` verb
Skip printing the coredump info table when using the `debug` verb in
combination with the `-q/--quiet` option. Useful when trying to gather
coredump info non-interactively via scripted gdb commands.
Frantisek Sumsal [Wed, 10 Mar 2021 11:30:04 +0000 (12:30 +0100)]
test: fix permissions of the ASan udev workaround
otherwise udev complains about the file being world-writable:
systemd-udevd[228]: Configuration file /etc/udev/rules.d/00-set-LD_PRELOAD.rules is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
The patch seems to cause usb devices to get some attributes set from the parent
PCI device. 'hwdb' builtin has support for breaking iteration upwards on usb
devices. But when '--subsystem=foo' is specified, iteration is continued. I'm
sure it *could* be figured out, but it seems hard to get all the combinations
correct. So let's revert to functional status quo ante, even if does the lookup
more than once unnecessarily.
When running TEST-22 under ASan, there's a chain of events which causes
`stat` to output an extraneous ASan error message, causing following
fail:
```
+ test -d /tmp/d/1
++ stat -c %U:%G:%a /tmp/d/1
==82==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
+ test = daemon:daemon:755
.//usr/lib/systemd/tests/testdata/units/testsuite-22.02.sh: line 24: test: =: unary operator expected
```
This is caused by `stat` calling nss which in Arch's configuration calls
the nss-systemd module, that pulls in libasan which causes the $LD_PRELOAD
error message, since `stat` is an uninstrumented binary.
The $LD_PRELOAD variable is explicitly unset for all testsuite-* services
since it causes various issues when calling uninstrumented libraries, so
setting it globally is not an option. Another option would be to set
$LD_PRELOAD for each `stat` call, but that would unnecessarily clutter
the test code.
socket-util: refuse ifnames with embedded '%' as invalid
So Linux has this (insane — in my opinion) "feature" that if you name a
network interface "foo%d" then it will automatically look for the
interface starting with "foo…" with the lowest number that is not used
yet and allocates that.
We should never clash with this "magic" handling of ifnames, hence
refuse this, since otherwise we never know what the name is we end up
with.
We should probably switch things from a deny list to an allow list
sooner or later and be much stricter. Since the kernel directly enforces
only very few rules on the names, we'd need to do some research what is
safe and what is not first, though.
sd-netlink: use setsockopt_int() also for NETLINK_ADD/DROP_MEMBERSHIP
We use 'unsigned' as the type, but netlink(7) says the type is 'int'.
It doesn't really matter, since they are both the same size. Let's use
our helper to shorten the code a bit.
timedated: fix skipping of comments in config file
Reading file '/usr/lib/systemd/ntp-units.d/80-systemd-timesync.list'
Failed to add NTP service "# This file is part of systemd.", ignoring: Invalid argument
Failed to add NTP service "# See systemd-timedated.service(8) for more information.", ignoring: Invalid argument
efi-loader: make efi_loader_entry_name_valid() check a bit stricter
Previously we'd just check if the ID was no-empty an no longer than
FILENAME_MAX. The latter was probably a mistake, given the comment next
to it. Instead of fixing that to check for NAME_MAX let's instead just
switch over to filename_is_valid() which odes a similar check, plus a
some minor additional checks. After all we do want that valid EFI boot
menu entry ids are usable as filenames.
This fixes two checks where we compare string sizes when validating with
FILENAME_MAX. In both cases the check apparently wants to check if the
name fits in a filename, but that's not actually what FILENAME_MAX can
be used for, as it — in contrast to what the name suggests — actually
encodes the maximum length of a path.
In both cases the stricter change doesn't actually change much, but the
use of FILENAME_MAX is still misleading and typically wrong.
Yu Watanabe [Mon, 8 Mar 2021 03:00:32 +0000 (12:00 +0900)]
seccomp: do not ignore deny-listed syscalls with errno when list is allow-list
Previously, if the hashmap is allow-list and a new deny-listed syscall
is added, seccomp_parse_syscall_filter() simply drop the new syscall
from hashmap even if error number is specified.
This makes 'allow-list' hashmap store two types of entries:
- allow-listed syscalls, which are stored with negative value (-1).
- deny-listed syscalls, which are stored with specified errno.
Yu Watanabe [Mon, 8 Mar 2021 02:54:05 +0000 (11:54 +0900)]
core: drop meaningless parse_syscall_and_errno() calls
parse_syscall_and_errno() does not check the validity of syscall name or
syscall group name, but it just split into syscall name and errno.
So, it is not necessary to call it for SystemCallLog=.
homed: unref the sd_event object after the sources
Shouldn't make any difference, but let's first flush any pending messages, then
unref the reference-counted stuff, and only at the end do the direct free calls.
We'd crash when trying to access an already-deallocated object:
Thread no. 1 (7 frames)
#2 log_assert_failed_realm at ../src/basic/log.c:844
#3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
#4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
#5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
#6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
#7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
#8 run at ../src/home/homed.c:47
The source in question is an inotify source, and the messages are:
systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.
on_home_inotify() got called, then manager_watch_home(), which unrefs the
existing inotify_event_source. I assume that the source gets dispatched again
because it was still in the pending queue.
I can't reproduce the issue (timing?), but this should
fix #17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.
Dell new Privacy feature provide new hardware level privacy
protect for users
This patch remaps scancode 0x120001 to key code F20 micmute
The old matching string cannot cover some other Dell products
which have the privacy feature,expand the string to all the system
that can load the privacy driver,privacy driver already detect the
system if it can support this feature. So here we can safely just
map the micmute key to scancode 0x120001
This test would normally get stuck when trying to mount the verity image
due to:
systemd-udevd[299]: dm-0: '/usr/sbin/dmsetup udevflags 6293812'(err) '==371==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.'
systemd-udevd[299]: dm-0: Process '/usr/sbin/dmsetup udevflags 6293812' failed with exit code 1
...
systemd-udevd[299]: dm-0: '/usr/sbin/dmsetup udevcomplete 6293812'(err) '==372==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.'
systemd-udevd[299]: dm-0: Process '/usr/sbin/dmsetup udevcomplete 6293812' failed with exit code 1.
systemd-udevd[299]: dm-0: Command "/usr/sbin/dmsetup udevcomplete 6293812" returned 1 (error), ignoring.
so let's add a simple udev rule which sets $LD_PRELOAD for the block
subsystem.
Also, install the ASan library along with necessary dependencies into
the verity minimal image, to get rid of the annoying (yet harmless)
errors about missing library from $LD_LIBRARY.
resolved: when synthesizing stub replies from multiple upstream packet, let's avoid RR duplicates
If we synthesize a stub reply from multiple upstream packet (i.e. a
series of CNAME/DNAME redirects), it might happen that we add the same
RR to a different reply section at a different CNAME/DNAME redirect
chain element. Let's clean this up once we are about to send the reply
message to the client: let's remove sections from "lower-priority"
sections when they are already listed in a "higher-priority" section.
resolved: fully follow CNAMEs in the DNS stub after all
In 2f4d8e577ca7bc51fb054b8c2c8dd57c2e188a41 I argued that following
CNAMEs in the stub is not necessary anymore. However, I think it' better
to revert to the status quo ante and follow it after all, given it is
easy for us and makes sure our D-Bus/varlink replies are more similar to
our DNS stub replies that way, and we save clients potential roundtrips.
Hence, whenever we hit a CNAME/DNAME redirect, let's restart the query
like we do for the D-Bus/Varlink case, and collect replies as we go.
resolved: handle multiple CNAME redirects in a single reply from upstream
www.netflix.com responds with a chain of CNAMEs in the same packet.
Let's handle that properly (so far we only followed CNAMEs a single step
when in the same packet)
resolved: tighten checks in dns_resource_record_get_cname_target()
Let's refuse to consider CNAME/DNAME replies matching for RR types where
that is not really conceptually allow (i.e. on CNAME/DNAME lookups
themselves).
(And add a similar check to dns_resource_key_match_cname_or_dname() too,
which implements a smilar match)
Also use structued initialization in one more place, use '\0' for NUL bytes,
and move variable to the right block (the code was OK, but it is strange to
have 'char *value' defined in a different scope then 'size_t value_allocated').
The fuzzer seems to have no trouble with this sample. It seems that the
problem reported in the bug is not caused by the match parsing code. But
let's add the sample just in case.
This fuzzer is based on test-bus-match. Even the initial corpus is
derived entirely from it.
https://bugzilla.redhat.com/show_bug.cgi?id=1935084 shows an crash
in bus_match_parse(). I checked the coverage stats on oss-fuzz, and
sadly existing fuzzing did not cover this code at all.
==305970== Invalid free() / delete / delete[] / realloc()
==305970== at 0x483E9F1: free (vg_replace_malloc.c:538)
==305970== by 0x4012CD: mfree (alloc-util.h:48)
==305970== by 0x4012EF: freep (alloc-util.h:83)
==305970== by 0x4017F4: LLVMFuzzerTestOneInput (fuzz-bus-match.c:58)
==305970== by 0x401A58: main (fuzz-main.c:39)
==305970== Address 0x59972f0 is 0 bytes inside a block of size 8,192 free'd
==305970== at 0x483FCE4: realloc (vg_replace_malloc.c:834)
==305970== by 0x4C986F7: _IO_mem_finish (in /usr/lib64/libc-2.33.so)
==305970== by 0x4C8F5E0: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.33.so)
==305970== by 0x49D2CDB: fclose_nointr (fd-util.c:108)
==305970== by 0x49D2D3D: safe_fclose (fd-util.c:124)
==305970== by 0x4A4BCCC: fclosep (fd-util.h:41)
==305970== by 0x4A4E00F: bus_match_to_string (bus-match.c:859)
==305970== by 0x4016C2: LLVMFuzzerTestOneInput (fuzz-bus-match.c:58)
==305970== by 0x401A58: main (fuzz-main.c:39)
==305970== Block was alloc'd at
==305970== at 0x483FAE5: calloc (vg_replace_malloc.c:760)
==305970== by 0x4C98787: open_memstream (in /usr/lib64/libc-2.33.so)
==305970== by 0x49D56D6: open_memstream_unlocked (fileio.c:97)
==305970== by 0x4A4DEC5: bus_match_to_string (bus-match.c:859)
==305970== by 0x4016C2: LLVMFuzzerTestOneInput (fuzz-bus-match.c:58)
==305970== by 0x401A58: main (fuzz-main.c:39)
==305970==
So the fclose() which is called from _cleanup_fclose_ clearly reallocates the
buffer (maybe to save memory?). open_memstream(3) says:
The locations referred to by these pointers are updated each time the
stream is flushed (fflush(3)) and when the stream is closed (fclose(3)).
This seems to mean that we should close the stream first before grabbing the
buffer pointer.