Ronan Pigott [Wed, 6 Mar 2024 02:03:16 +0000 (19:03 -0700)]
resolved: don't cache NXDOMAIN for SUDN resolver.arpa
The name resolver.arpa is reserved for RFC9462 "Discovery of Designated
Resolvers" (DDR). This relies on regular dns queries for SVCB records at
the special use domain name _dns.resolver.arpa. Unfortunately, older
nameservers (or broken ones) won't know about this SUDN and will likely
return NXDOMAIN. If this is cached, the cache entry will become an
impediment for any clients trying to discover designated resolvers
through the stub-resolver, or potentially even sd-resolved itself, were
it to implement DDR.
The RFC recommendation is that "clients MUST NOT perform A or AAAA
queries for resolver.arpa", and "resolvers SHOULD respond to queries of
any type other than SVCB for _dns.resolver.arpa. with NODATA and queries
of any type for any domain name under resolver.arpa with NODATA." which
should help avoid potential compatibility issues. This enforces that
condition within sd-resolved, and avoids caching any such erroneous
NXDOMAIN.
The RFC also recommends requests for this domain should never be
forwarded, to prevent authentication failures. Since there isn't much
point in establishing secure communication to the local stub, we still
allow SVCB to be forwarded from the stub, in case the client cares to
implement some other authentication method and understands the
consequences of skipping the local stub. Normal clients are not
expected to implement DDR, but this change will protect sd-resolved's
own caches in case they try.
Although A and AAAA are prohibited, I think validating resolvers
might reasonably query for dnssec records, even though the resolver.arpa
zone does not exist (it is declared to be a locally served zone). For
this reason, I have also added resolver.arpa to the builtin dnssec NTA.
kernel-install: support full set of config files and drop-ins
This brings the handling of config for kernel-install in line with most of
systemd, i.e. we search the set of paths for the main config file, and the full
set of drop-in paths for drop-ins.
various: use new config loader instead of config_parse_config_file()
This means the main config file is loaded also from /run and /usr.
We should load the main config file from all the places where we load drop-ins.
I realize I had a giant blind spot: I always assumed that we load config files
from /etc, /run, /usr/local/lib, /usr/lib. But it turns out that we only used
those paths for drop-ins. For the main config file, we only looked in /etc. The
docs actually partially described this behaviour, i.e. most SYNOPSIS sections
and some parts of the text, but not others.
This is strange, because 6495361c7d5e8bf640841d1292ef6cfe1ea244cf was completely
bogus with the behaviour before this patch. We had a huge discussion before it
was merged, and clearly nobody noticed this. Similarly, in the previous version
of the current pull request, we had a long discussion about the appropriate
order of directories, and apparently nobody noticed that there was no order,
because only looked in one directory. So the blind spot seems to have been
shared.
Also, systemd-analyze cat-config behaved incorrectly, i.e. its behaviour matches
the new behaviour.
Possibly, in the future it'll make it easier to add support for --root.
shared/conf-parser: use chase() in config_parse_many_files()
The function was partially implementing chroot lookups. It would be given
file names that were prefixed with the chroot, so it would mostly work.
But if any of those files were symlinks, fopen() would do the wrong thing.
Also we don't need locking.
So give 'root' as the argument and use chase_and_fopen_unlocked() to get
proper chroot-aware lookups.
Also, use the more correct type of 'const char* const*' for the input strv.
This requires adding the cast in a few places, but also allows to remove some
casts in others.
shared/conf-parser: collapse pkgdir and conf_file args into one
This essentially reverts 5656cdfeeabc16b5489f5ec7a0a36025a2ec1f23. I find it
much easier to understand what is going on when the
path-relative-to-the-search-path is passed in full, instead of being constructed
from two parts, with one of the parts being implicit in some places.
Also, we call 'systemd-analyze cat-config <path>' with <path> with the same
meaning, so this makes the internal and external APIs more consistent.
Xiaotian Wu [Wed, 27 Dec 2023 08:25:22 +0000 (16:25 +0800)]
loongarch64: disable simd when build efi
LoongArch does not yet support the `-mgeneral-regs-only` option, so when
compiling for EFI, we need to use the `-mno-lsx` and `-mno-lasx` options
to disable SIMD instructions.
Daan De Meyer [Thu, 25 Jan 2024 21:48:55 +0000 (22:48 +0100)]
Build distribution packages in mkosi
Instead of running meson install and hoping for the best, let's build
distribution packages from the downstream packaging specs. This gets
us the following:
- Vastly simplified mkosi scripts since we don't need a separate initrd
image anymore but can just reuse the default mkosi initrd.
- Almost everything can move to the base image as its not the basis
anymore for the initrd and as such we don't need to care about the
size anymore.
- The systemd packages that get pulled in as dependencies of other
packages get properly uninstalled and replaced with our packages that
we built instead of just installing on top of an existing systemd
installation with no guarantee that everything from that previous
installation was removed.
- Much better testing coverage as what we're testing is much closer
to what will actually be deployed in distributions.
- Immediate feedback if something we change breaks distribution packaging
- We get integration with the distribution for free as we'll automatically
use the proper directories and such instead of having to hack this
into a mkosi build script.
- ...
Also, before this commit, the 'X' is only applied if
the owner has execute bit set. Now it takes group and
other into consideration too. setfacl(1) also has
the same behavior.
Ronan Pigott [Wed, 6 Mar 2024 01:05:57 +0000 (18:05 -0700)]
resolved: decrease mdns/llmnr priority for the reverse mapping domains
Previously all queries to the reverse mapping domains (in-addr.arpa and
ip6.arpa) were considered to be in-scope for mdns and llmnr at the same
priority as DNS. This caused sd-resolved to ignore NXDOMAIN responses
from dns in favor of lengthy timeouts.
This narrows the scope of mdns and llmnr so they are not invariably
considered as fallbacks for these domains. Now, mdns/llmnr on a link
will only be used as a fallback when there is no suitable DNS scope, and
when that link is DefaultRoute.
ci: explicitly change oom-{score}-adj before running tests
For some reason root in GH actions is able to _decrease_ its oom score
even after dropping all capabilities (including CAP_SYS_RESOURCE), until
the oom score is changed explicitly after sudo:
$ systemd-detect-virt
microsoft
$ sudo su -
~# capsh --drop=all -- -c 'capsh --print; grep -H . /proc/self/oom*; choom -p $$ -n -101'
Current: =
Bounding set =
Ambient set =
Current IAB: !cap_chown,!cap_dac_override,!cap_dac_read_search,...,!cap_sys_resource,...,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
/proc/self/oom_adj:8
/proc/self/oom_score:1000
/proc/self/oom_score_adj:500
pid 22180's OOM score adjust value changed from 500 to -101
~# choom -p $$ -n 500
pid 22027's OOM score adjust value changed from 500 to 500
~# capsh --drop=all -- -c 'capsh --print; grep -H . /proc/self/oom*; choom -p $$ -n -101'
Current: =
Bounding set =
Ambient set =
...
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
/proc/self/oom_adj:8
/proc/self/oom_score:1000
/proc/self/oom_score_adj:500
choom: failed to set score adjust value: Permission denied
I have no idea what's going on, but it breaks
exec-oomscoreadjust-negative.service from test-execute when running
unprivileged.
ci: make the build dir accessible when running w/o privileges
Otherwise the unprivileged part of test-execute gets silently skipped:
/* test_run_tests_unprivileged */
Successfully forked off '(test-execute-unprivileged)' as PID 20998.
...
pin_callout_binary: build dir binary: /home/runner/work/systemd/systemd/build/systemd-executor
pin_callout_binary: open(/home/runner/work/systemd/systemd/build/systemd-executor)=-13
Failed to pin executor binary: No such file or directory
(test-execute-unprivileged): manager_new, skipping tests: No such file or directory
(test-execute-unprivileged) succeeded.
Daan De Meyer [Wed, 6 Mar 2024 14:15:55 +0000 (15:15 +0100)]
Use VERSION_TAG instead of GIT_VERSION in kernel-install scripts
GIT_VERSION only makes sense for C files as it depends on C preprocessor
macro expansion now so let's use VERSION_TAG instead of GIT_VERSION
for the two remaining usages of GIT_VERSION that are not in C files.
meson/man: allow man pages to use multiple conditions
This way the man pages are installed only when the corresponding binary is
installed. The conditions in man pages and man/rules/meson.build are adjusted to
match the conditions for units in units/meson.build.
test: use 'ahost' instead of 'hosts' where applicable
As explained in [0] the 'hosts' database uses deprecated
gethostbyname2() which uses AF_INET6 instead of AF_UNSPEC for IPv6
lookups which is broken and makes the test fail with disabled IPv6.
test: bump the timeout for test-execute subtests if running w/ QEMU
Bump the timeout for test-execute subtests if running with plain QEMU
(as part of TEST-02-UNITTESTS), since we might start hitting the default
2m timeout with some more involved subtests, especially when the AWS
region we're running in is under heavy load. I see this regularly in the
CentOS Stream 9 nightly cron job with exec-dynamicuser-statedir.service
which has a lot of ExecStart's.
resolved: remove entry from cache when goodbye packet received
RFC6762 10.1 says that queriers receiving a Multicast DNS response with a TTL
of zero SHOULD record a TTL of 1 and then delete the record one second later.
Added a timer event to trigger a callback to clean-up the cache one second after
a goodbye packet is received. The callback also checks for any cache entries
expiring within the next one second and schedules follow-up cleanup callbacks
accordingly.
Adrian Vovk [Sat, 23 Dec 2023 23:00:48 +0000 (18:00 -0500)]
homework: Lock/Unlock: Freeze/Thaw user session
Whenever a home directory is in a locked state, accessing the files of
the home directory is extremely likely to cause the thread to hang. This
will put the session in a strange state, where some threads are hanging
due to file access and others are not hanging because they are not
trying to access any of the user's files.
This can lead to a whole slew of consequences. For example, imagine a
likely situation where the Wayland compositor is not hanging, but the
user's open apps are. Eventually, the compositor will detect that none
of the apps are responding to its pings, assume that they're frozen
(which they are), and kill them. The systemd user instance can end up in
a similarly confused state and start killing user services. In the worst
case, killing an app at an unexpected moment can lead to data loss.
The solution is to suspend execution of the whole user session by
freezing the user's slice.
Adrian Vovk [Sat, 23 Dec 2023 22:03:42 +0000 (17:03 -0500)]
sleep: Always freeze user.slice
Previously, we'd only freeze user.slice in the case of s2h, because we
didn't want the user session to resume while systemd was transitioning
from suspend to hibernate.
This commit extends this freezing behavior to all sleep modes.
We also have an environment variable to disable the freezing behavior
outright. This is a necessary workaround for someone that has hooks
in /usr/lib/systemd/system-sleep/ which communicate with some
process running under user.slice, or if someone is using the proprietary
NVIDIA driver which breaks when user.slice is frozen (issue #27559)
Adrian Vovk [Sat, 23 Dec 2023 21:57:47 +0000 (16:57 -0500)]
bus-unit-util: Add utility to freeze/thaw units
This utility lets us freeze units, and then automatically thaw them
when via a _cleanup_ handler. For example, you can now write something
like:
```
_cleanup_(unit_freezer_thaw) UnitFreezer freezer = UNIT_FREEZER_NULL;
r = unit_freezer_freeze("myunit.service", &freezer);
if (r < 0)
return r;
// Freeze is thawed once this scope ends
```
Aside from the basic _freeze and _thaw methods, there's also
_cancel and _restore. Cancel destroys the UnitFreezer without
thawing the unit. Restore creates a UnitFreezer without freezing it.
The idea of these two methods is that it allows the freeze/thaw to
be separated from each other (i.e. done in response to two separate
DBus method calls). For example:
```
_cleanup_(unit_freezer_thaw) UnitFreezer freezer = UNIT_FREEZER_NULL;
r = unit_freezer_freeze("myunit.service", &freezer);
if (r < 0)
return r;
// Freeze is thawed once this scope ends
r = do_something()
if (r < 0)
return r; // Freeze is thawed
unit_freezer_cancel(&freezer); // Thaw is canceled.
```
Then in another scope:
```
// Bring back a UnitFreezer object for the already-frozen service
_cleanup_(unit_freezer_thaw) UnitFreezer freezer = UNIT_FREEZER_NULL;
r = unit_freezer_restore("myunit.service", &freezer);
if (r < 0)
return r;
// Freeze is thawed once this scope ends
```
resolved: make resolved authoritative in resolveing our local host name
This is a kinda a follow-up for ce266330fc3bd6767451ac3400336cd9acebe9c1: it
makes resolved authoritative on our local hostname, and never contacts
DNS anymore for it.
We effectively already were authoritative for it, except if the user
queried for other RR types than just A/AAAA. This closes the gap and
refuses routing other RR type queries to DNS.
resolved: make outselves authoritative for /etc/hosts entries in full
If you query for an MX RR of a host listed in /etc/hosts, let's return
an empty reply rather than NXDOMAIN, i.e. indicate that the name exists
but has no MX RR assigned, thus making ourselves authoritative.
The venerable "host" tool by default sends requests for A + AAAA + MX
and ensures we never propagate queries further on.
Te variables indicate what kind of RRs we are looking for, but the name
so far suggests it was about what we already found. Let's rename the
variables to make the purpose clearer.
So far we only looked at the domain name when routing requests to
specific scopes. With this we'll also take the DNS RR type into account.
This takes benefit of the fact that lookups for RRs such as SOA or NS or
the various DNSSEC RR types never really make sense to be routed to
LLMNR or mDNS, since they don't have concepts there.
This hence refuses to route requests for those RR types to the
LLMNR/mDNS scopes, which hence means they'll likely be routed to classic
DNS instead.
This should improve behaviour of tools that assumes it speaks to classic
DNS only via 127.0.0.53, since it will now usually do that.
dig question with DNSSEC on will now be proxied upstream, i.e. to the
test knot server. This leads to different results, but the result isn't
tha tinteresting since we don't want to test knot, but resolved. Hence
comment this test.
There seems to be something wrong with the test though, as the upstream
server refused recursion, but if so it is not suitable as an upstream
server really, as resolved can only be client to a recursive resolver.
resolved: enable DNS proxy mode if client wants DNSSEC
So far we disabled DNSSEC if local clients asked for it via DO flag if
DNSSEC=no is set. Let's instead switch to proxy mode in this case, and
thus treat client requested DO mode as a way to force proxy mode.
This means DNSSEC=no just controls whether resolved will do validation
for regular looups, but it has no effect anymore on lookups from clients
that indicated they want to do their own DNSSEC anyway.
resolved: use relaxed single label rules when proxying DNS queries
When we use proxy mode when propagating DNS queries to upstream DNS
servers, let's use the relaxed single label rules. This has the benefit
that tools such "delv" work on the proxy stub 127.0.0.54.