Mike Gilbert [Tue, 9 Mar 2021 22:57:37 +0000 (17:57 -0500)]
cg_unified_cached: return ENOMEDIUM if we cannot find a known hierarchy
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.
Yu Watanabe [Mon, 8 Mar 2021 06:39:53 +0000 (15:39 +0900)]
sd-event: re-check new epoll events when a child event is queued
Previously, when a process outputs something and exit just after
epoll_wait() but before process_child(), then the IO event is ignored
even if the IO event has higher priority. See #18190.
This can be solved by checking epoll event again after process_child().
However, there exists a possibility that another process outputs and
exits just after process_child() but before the second epoll_wait().
When the IO event has lower priority than the child event, still IO
event is processed.
So, this makes new epoll events and child events are checked in a loop
until no new event is detected. To prevent an infinite loop, the number
of maximum trial is set to 10.
resolved: don't flush answer RRs on CNAME redirect too early
When doing a CNAME/DNAME redirect let's first check if the answer we
already have fully answers the redirected question already. If so, let's
use that. If not, let's properly restart things.
This simply removes one call to dns_answer_reset() that was placed too
early: instead of resetting when we detect a CNAME/DNAME redirect, do so
only after checking if the answer we already have doesn't match the
reply, and then decide to *actually* follow it. Or in other words: rely
on the dns_answer_reset() call in dns_query_go() which we'll call to
actually begin with the redirected question.
(This doesn't really matter as much as one might think, since our cache
stepped in anyway and answered the questions before going back to the
network. However, this adds noise if RRs with very short TTLs are cached
– which some CDNs do – and is of course relavant when people turn off
the local cache.)
Previously by mistake we'd always match every single reply we get in a
CNAME chain to the original question from the stub client. That's
broken, we need to test it against the CNAME query we are currently
looking at.
The effect of this incorrect matching was that we'd assign the RRs to
the wrong section since we'd assume they'd be auxiliary answers instead
of primary answers.
When responding from DNS cache, let's slightly tweak how the TTL is
lowered: as before let's round down when converting from our internal µs
to the external seconds. (This is preferable, since records should
better be cached too short instead of too long.) Let's avoid rounding
down to zero though, since that has special semantics in many cases (in
particular mDNS). Let's just use 1s in that case.
resolved: take shortest TTL of all of RRs in answer as cache lifetime
We nowadays cache full answer RRset combinations instead of just the
exact matching rrset. This means we should not cache RRs that are not
immediate answers to our question for longer then their own RRs. Or in
other words: let's determine the shortest TTL of all RRs in the whole
answer, and use that as cache lifetime.
Luca Boccassi [Sun, 14 Mar 2021 12:36:15 +0000 (12:36 +0000)]
man: specify that ProtectProc= does not work with root/cap_sys_ptrace
When using hidepid=invisible on procfs, the kernel will check if the
gid of the process trying to access /proc is the same as the gid of
the process that mounted the /proc instance, or if it has the ptrace
capability:
Given we set up the /proc instance as root for system services,
The same restriction applies to CAP_SYS_PTRACE, if a process runs with
it then hidepid=invisible has no effect.
ProtectProc effectively can only be used with User= or DynamicUser=yes,
without CAP_SYS_PTRACE.
Update the documentation to explicitly state these limitations.
Daan De Meyer [Fri, 12 Mar 2021 22:09:44 +0000 (22:09 +0000)]
boot: Move console declarations to missing_efi.h
These were added to eficonex.h in gnu-efi 3.0.13. Let's move them
to missing_efi.h behind an appropriate guard to fix the build with
recent versions of gnu-efi.
Kevin Backhouse [Fri, 12 Mar 2021 17:00:56 +0000 (18:00 +0100)]
ask-password-api: fix error handling on invalid unicode character
The integer overflow happens when utf8_encoded_valid_unichar() returns an error
code. The error code is a negative number: -22. This overflows when it is
assigned to `z` (type `size_t`). This can cause an infinite loop if the value
of `q` is 22 or larger.
To reproduce the bug, you need to run `systemd-ask-password` and enter an
invalid unicode character, followed by a backspace character.
Yu Watanabe [Mon, 8 Mar 2021 06:39:53 +0000 (15:39 +0900)]
sd-event: re-check new epoll events when a child event is queued
Previously, when a process outputs something and exit just after
epoll_wait() but before process_child(), then the IO event is ignored
even if the IO event has higher priority. See #18190.
This can be solved by checking epoll event again after process_child().
However, there exists a possibility that another process outputs and
exits just after process_child() but before the second epoll_wait().
When the IO event has lower priority than the child event, still IO
event is processed.
So, this makes new epoll events and child events are checked in a loop
until no new event is detected. To prevent an infinite loop, the number
of maximum trial is set to 10.
Frantisek Sumsal [Thu, 11 Mar 2021 11:49:00 +0000 (12:49 +0100)]
repart: fix the loop dev support check
Since f17bdf8264e231fa31c769bff2475ef698487d0b the test-repart was
effectively disabled, since `/dev/loop-control` is a character special
file, whereas `-f` works only on regular files. Even though we could use
`-c` to check specifically for character special files, let's use `-e`
just in case.
Michal Sekletar [Tue, 9 Mar 2021 16:22:32 +0000 (17:22 +0100)]
install: refactor find_symlinks() and don't search for symlinks recursively
After all we are only interested in symlinks either in top-level config
directory or in .wants and .requires sub-directories.
As a bonus this should speed up ListUnitFiles() roughly 3-4x on systems
with a lot of units that use drop-ins (e.g. SSH jump hosts with a lot of
user session scopes).
Tables with only one column aren't really tables, they are lists. And if
each cell only consists of a single word, they are probably better
written in a single line. Hence, shorten the man page a bit, and list
boot loader spec partition types in a simple sentence.
Also, drop "root-secondary" from the list. When dissecting images we'll
upgrade "root-secondary" to "root" if we mount it, and do so only if
"root" doesn't exist. Hence never mention "root-secondary" as we never
will mount a partition under that id.
This makes sure nspawn's --volatile=yes switch works again: there we
have a read-only image that is overmounted by a tmpfs (with the
exception of /usr). This we need to mkdir all mount points even though
the image is read-only.
Hence, let's drop the optimizatio of avoiding mkdir() on images that are
read-only, it's wrong and misleading here, since the image itself might
be read-only but our mounts are not.
dissect-image: clean up meaning of DISSECT_IMAGE_MKDIR
Previously handling of DISSECT_IMAGE_MKDIR was pretty weird and broken:
it would control both if we create the top-level mount point when
mounting an image, and the inner mount points for images that consist of
multiple file systems. However, the latter is redundant, since 1f0f82f1311e4c52152b8e2b6f266258709c137d does this too, a few lines
further up – unconditionally!
Hence, let's make the meaning of DISSECT_IMAGE_MKDIR more strict: it
shall be only about the top-level mount point, not about the inner ones
(where we'll continue to create what is missing alwayway). Having a
separate flag for the top-level mount point is relevant, since the mount
point dir created by it will remain on the host fs – unlike the
directories we create inside the image, which will stay within the
image.
This slightly change of meaning is actually inline with what the flag is
actually used for and documented in systemd-dissect.
shared/fstab-util: teach fstab_filter_options() a mode where all values are returned
Apart from tests, the new argument isn't used anywhere, so there should be no
functional change. Note that the two arms of the big conditional are switched, so the
diff is artificially inflated. The actual code change is rather small. I dropped the
path which extracts ret_value manually, because it wasn't supporting unescaping of the
escape character properly.
shared/fstab-util: pass through the escape character
… when not used to escape the separator (,) or the escape character (\).
This mostly restores behaviour from before 0645b83a40d1c782f173c4d8440ab2fc82a75006,
but still allows "," to be escaped.
basic/extract-word: allow escape character to be escaped
With EXTRACT_UNESCAPE_SEPARATORS, backslash is used to escape the separator.
But it wasn't possible to insert the backslash itself. Let's allow this and
add test.
basic/extract_word: try to explain what the various options do
A test for stripping of escaped backslashes without any flags was explicitly
added back in 4034a06ddb82ec9868cd52496fef2f5faa25575f. So it seems to be on
purpose, though I would say that this is at least surprising and hence deserves
a comment.
In test-extract-word, add tests for standalone EXTRACT_UNESCAPE_SEPARATORS.
Only behaviour combined with EXTRACT_CUNESCAPE was tested.
shared/fstab-util: immediately drop empty options again
In the conversion from strv_split() to strv_split_full() done in 7bb553bb98a57b4e03804f8192bdc5a534325582, EXTRACT_DONT_COALESCE_SEPARATORS was
added. I think this was just by mistake… We never look for "empty options", so
whether we immediately ignore the extra separator or store the empty string in
strv, should make no difference.
generators: warn but ignore failure to write timeouts
When we failed to split the options (because of disallowed quoting syntax, which
might be a bug in its own), we would silently fail. Instead, let's emit a warning.
Since we ignore the value if we cannot parse it anyway, let's ignore this error
too.
This creates a "well known" for sgx_enclave ownership. By doing this here we
avoid the risk that various projects making use of the device will provide
similar-but-slightly-incompatible installation instructions, in particular
using different group names.
ACLs are actually a better approach to grant access to users, but not in all
cases, so we want to provide a standard group anyway.
Mode is 0o660, not 0o666 because this is very new code and distributions are
likely to not want to give full access to all users. This might change in the
future, but being conservative is a good default in the beginning.
Rules for /dev/sgx_provision will be provided by libsg-ae-pce:
https://github.com/intel/linux-sgx/issues/678.
Frantisek Sumsal [Wed, 10 Mar 2021 15:41:35 +0000 (16:41 +0100)]
coredump: omit coredump info when -q is used with the `debug` verb
Skip printing the coredump info table when using the `debug` verb in
combination with the `-q/--quiet` option. Useful when trying to gather
coredump info non-interactively via scripted gdb commands.
Frantisek Sumsal [Wed, 10 Mar 2021 11:30:04 +0000 (12:30 +0100)]
test: fix permissions of the ASan udev workaround
otherwise udev complains about the file being world-writable:
systemd-udevd[228]: Configuration file /etc/udev/rules.d/00-set-LD_PRELOAD.rules is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
The patch seems to cause usb devices to get some attributes set from the parent
PCI device. 'hwdb' builtin has support for breaking iteration upwards on usb
devices. But when '--subsystem=foo' is specified, iteration is continued. I'm
sure it *could* be figured out, but it seems hard to get all the combinations
correct. So let's revert to functional status quo ante, even if does the lookup
more than once unnecessarily.
When running TEST-22 under ASan, there's a chain of events which causes
`stat` to output an extraneous ASan error message, causing following
fail:
```
+ test -d /tmp/d/1
++ stat -c %U:%G:%a /tmp/d/1
==82==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
+ test = daemon:daemon:755
.//usr/lib/systemd/tests/testdata/units/testsuite-22.02.sh: line 24: test: =: unary operator expected
```
This is caused by `stat` calling nss which in Arch's configuration calls
the nss-systemd module, that pulls in libasan which causes the $LD_PRELOAD
error message, since `stat` is an uninstrumented binary.
The $LD_PRELOAD variable is explicitly unset for all testsuite-* services
since it causes various issues when calling uninstrumented libraries, so
setting it globally is not an option. Another option would be to set
$LD_PRELOAD for each `stat` call, but that would unnecessarily clutter
the test code.
socket-util: refuse ifnames with embedded '%' as invalid
So Linux has this (insane — in my opinion) "feature" that if you name a
network interface "foo%d" then it will automatically look for the
interface starting with "foo…" with the lowest number that is not used
yet and allocates that.
We should never clash with this "magic" handling of ifnames, hence
refuse this, since otherwise we never know what the name is we end up
with.
We should probably switch things from a deny list to an allow list
sooner or later and be much stricter. Since the kernel directly enforces
only very few rules on the names, we'd need to do some research what is
safe and what is not first, though.
sd-netlink: use setsockopt_int() also for NETLINK_ADD/DROP_MEMBERSHIP
We use 'unsigned' as the type, but netlink(7) says the type is 'int'.
It doesn't really matter, since they are both the same size. Let's use
our helper to shorten the code a bit.
PID1 already logs about the service being started, so this line isn't necessary
in normal use. Also, by the time it is emitted, the service has already
signalled readiness, so let's not say "starting" but "started".
We drop the reference stored in Manager.managed_oom_varlink_request in two code paths:
vl_disconnect() which is installed as a disconnect callback, and in manager_varlink_done().
But we also make a disconnect from manager_varlink_done(). So we end up with the following
call stack:
(gdb) bt
0 vl_disconnect (s=0x112c7b0, link=0xea0070, userdata=0xe9bcc0) at ../src/core/core-varlink.c:414
1 0x00007f1366e9d5ac in varlink_detach_server (v=0xea0070) at ../src/shared/varlink.c:1210
2 0x00007f1366e9d664 in varlink_close (v=0xea0070) at ../src/shared/varlink.c:1228
3 0x00007f1366e9d6b5 in varlink_close_unref (v=0xea0070) at ../src/shared/varlink.c:1240
4 0x0000000000524629 in manager_varlink_done (m=0xe9bcc0) at ../src/core/core-varlink.c:479
5 0x000000000048ef7b in manager_free (m=0xe9bcc0) at ../src/core/manager.c:1357
6 0x000000000042602c in main (argc=5, argv=0x7fff439c43d8) at ../src/core/main.c:2909
When we enter vl_disconnect(), m->managed_oom_varlink_request.n_ref==1.
When we exit from vl_discconect(), m->managed_oom_varlink_request==NULL. But
varlink_close_unref() has a copy of the pointer in *v. When we continue executing
varlink_close_unref(), this pointer is dangling, and the call to varlink_unref()
is done with an invalid pointer.
timedated: fix skipping of comments in config file
Reading file '/usr/lib/systemd/ntp-units.d/80-systemd-timesync.list'
Failed to add NTP service "# This file is part of systemd.", ignoring: Invalid argument
Failed to add NTP service "# See systemd-timedated.service(8) for more information.", ignoring: Invalid argument