Ronan Pigott [Wed, 6 Mar 2024 01:05:57 +0000 (18:05 -0700)]
resolved: decrease mdns/llmnr priority for the reverse mapping domains
Previously all queries to the reverse mapping domains (in-addr.arpa and
ip6.arpa) were considered to be in-scope for mdns and llmnr at the same
priority as DNS. This caused sd-resolved to ignore NXDOMAIN responses
from dns in favor of lengthy timeouts.
This narrows the scope of mdns and llmnr so they are not invariably
considered as fallbacks for these domains. Now, mdns/llmnr on a link
will only be used as a fallback when there is no suitable DNS scope, and
when that link is DefaultRoute.
ci: explicitly change oom-{score}-adj before running tests
For some reason root in GH actions is able to _decrease_ its oom score
even after dropping all capabilities (including CAP_SYS_RESOURCE), until
the oom score is changed explicitly after sudo:
$ systemd-detect-virt
microsoft
$ sudo su -
~# capsh --drop=all -- -c 'capsh --print; grep -H . /proc/self/oom*; choom -p $$ -n -101'
Current: =
Bounding set =
Ambient set =
Current IAB: !cap_chown,!cap_dac_override,!cap_dac_read_search,...,!cap_sys_resource,...,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
/proc/self/oom_adj:8
/proc/self/oom_score:1000
/proc/self/oom_score_adj:500
pid 22180's OOM score adjust value changed from 500 to -101
~# choom -p $$ -n 500
pid 22027's OOM score adjust value changed from 500 to 500
~# capsh --drop=all -- -c 'capsh --print; grep -H . /proc/self/oom*; choom -p $$ -n -101'
Current: =
Bounding set =
Ambient set =
...
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
/proc/self/oom_adj:8
/proc/self/oom_score:1000
/proc/self/oom_score_adj:500
choom: failed to set score adjust value: Permission denied
I have no idea what's going on, but it breaks
exec-oomscoreadjust-negative.service from test-execute when running
unprivileged.
ci: make the build dir accessible when running w/o privileges
Otherwise the unprivileged part of test-execute gets silently skipped:
/* test_run_tests_unprivileged */
Successfully forked off '(test-execute-unprivileged)' as PID 20998.
...
pin_callout_binary: build dir binary: /home/runner/work/systemd/systemd/build/systemd-executor
pin_callout_binary: open(/home/runner/work/systemd/systemd/build/systemd-executor)=-13
Failed to pin executor binary: No such file or directory
(test-execute-unprivileged): manager_new, skipping tests: No such file or directory
(test-execute-unprivileged) succeeded.
Daan De Meyer [Wed, 6 Mar 2024 14:15:55 +0000 (15:15 +0100)]
Use VERSION_TAG instead of GIT_VERSION in kernel-install scripts
GIT_VERSION only makes sense for C files as it depends on C preprocessor
macro expansion now so let's use VERSION_TAG instead of GIT_VERSION
for the two remaining usages of GIT_VERSION that are not in C files.
meson/man: allow man pages to use multiple conditions
This way the man pages are installed only when the corresponding binary is
installed. The conditions in man pages and man/rules/meson.build are adjusted to
match the conditions for units in units/meson.build.
test: use 'ahost' instead of 'hosts' where applicable
As explained in [0] the 'hosts' database uses deprecated
gethostbyname2() which uses AF_INET6 instead of AF_UNSPEC for IPv6
lookups which is broken and makes the test fail with disabled IPv6.
test: bump the timeout for test-execute subtests if running w/ QEMU
Bump the timeout for test-execute subtests if running with plain QEMU
(as part of TEST-02-UNITTESTS), since we might start hitting the default
2m timeout with some more involved subtests, especially when the AWS
region we're running in is under heavy load. I see this regularly in the
CentOS Stream 9 nightly cron job with exec-dynamicuser-statedir.service
which has a lot of ExecStart's.
resolved: remove entry from cache when goodbye packet received
RFC6762 10.1 says that queriers receiving a Multicast DNS response with a TTL
of zero SHOULD record a TTL of 1 and then delete the record one second later.
Added a timer event to trigger a callback to clean-up the cache one second after
a goodbye packet is received. The callback also checks for any cache entries
expiring within the next one second and schedules follow-up cleanup callbacks
accordingly.
Adrian Vovk [Sat, 23 Dec 2023 23:00:48 +0000 (18:00 -0500)]
homework: Lock/Unlock: Freeze/Thaw user session
Whenever a home directory is in a locked state, accessing the files of
the home directory is extremely likely to cause the thread to hang. This
will put the session in a strange state, where some threads are hanging
due to file access and others are not hanging because they are not
trying to access any of the user's files.
This can lead to a whole slew of consequences. For example, imagine a
likely situation where the Wayland compositor is not hanging, but the
user's open apps are. Eventually, the compositor will detect that none
of the apps are responding to its pings, assume that they're frozen
(which they are), and kill them. The systemd user instance can end up in
a similarly confused state and start killing user services. In the worst
case, killing an app at an unexpected moment can lead to data loss.
The solution is to suspend execution of the whole user session by
freezing the user's slice.
Adrian Vovk [Sat, 23 Dec 2023 22:03:42 +0000 (17:03 -0500)]
sleep: Always freeze user.slice
Previously, we'd only freeze user.slice in the case of s2h, because we
didn't want the user session to resume while systemd was transitioning
from suspend to hibernate.
This commit extends this freezing behavior to all sleep modes.
We also have an environment variable to disable the freezing behavior
outright. This is a necessary workaround for someone that has hooks
in /usr/lib/systemd/system-sleep/ which communicate with some
process running under user.slice, or if someone is using the proprietary
NVIDIA driver which breaks when user.slice is frozen (issue #27559)
Adrian Vovk [Sat, 23 Dec 2023 21:57:47 +0000 (16:57 -0500)]
bus-unit-util: Add utility to freeze/thaw units
This utility lets us freeze units, and then automatically thaw them
when via a _cleanup_ handler. For example, you can now write something
like:
```
_cleanup_(unit_freezer_thaw) UnitFreezer freezer = UNIT_FREEZER_NULL;
r = unit_freezer_freeze("myunit.service", &freezer);
if (r < 0)
return r;
// Freeze is thawed once this scope ends
```
Aside from the basic _freeze and _thaw methods, there's also
_cancel and _restore. Cancel destroys the UnitFreezer without
thawing the unit. Restore creates a UnitFreezer without freezing it.
The idea of these two methods is that it allows the freeze/thaw to
be separated from each other (i.e. done in response to two separate
DBus method calls). For example:
```
_cleanup_(unit_freezer_thaw) UnitFreezer freezer = UNIT_FREEZER_NULL;
r = unit_freezer_freeze("myunit.service", &freezer);
if (r < 0)
return r;
// Freeze is thawed once this scope ends
r = do_something()
if (r < 0)
return r; // Freeze is thawed
unit_freezer_cancel(&freezer); // Thaw is canceled.
```
Then in another scope:
```
// Bring back a UnitFreezer object for the already-frozen service
_cleanup_(unit_freezer_thaw) UnitFreezer freezer = UNIT_FREEZER_NULL;
r = unit_freezer_restore("myunit.service", &freezer);
if (r < 0)
return r;
// Freeze is thawed once this scope ends
```
resolved: make resolved authoritative in resolveing our local host name
This is a kinda a follow-up for ce266330fc3bd6767451ac3400336cd9acebe9c1: it
makes resolved authoritative on our local hostname, and never contacts
DNS anymore for it.
We effectively already were authoritative for it, except if the user
queried for other RR types than just A/AAAA. This closes the gap and
refuses routing other RR type queries to DNS.
resolved: make outselves authoritative for /etc/hosts entries in full
If you query for an MX RR of a host listed in /etc/hosts, let's return
an empty reply rather than NXDOMAIN, i.e. indicate that the name exists
but has no MX RR assigned, thus making ourselves authoritative.
The venerable "host" tool by default sends requests for A + AAAA + MX
and ensures we never propagate queries further on.
Te variables indicate what kind of RRs we are looking for, but the name
so far suggests it was about what we already found. Let's rename the
variables to make the purpose clearer.
So far we only looked at the domain name when routing requests to
specific scopes. With this we'll also take the DNS RR type into account.
This takes benefit of the fact that lookups for RRs such as SOA or NS or
the various DNSSEC RR types never really make sense to be routed to
LLMNR or mDNS, since they don't have concepts there.
This hence refuses to route requests for those RR types to the
LLMNR/mDNS scopes, which hence means they'll likely be routed to classic
DNS instead.
This should improve behaviour of tools that assumes it speaks to classic
DNS only via 127.0.0.53, since it will now usually do that.
dig question with DNSSEC on will now be proxied upstream, i.e. to the
test knot server. This leads to different results, but the result isn't
tha tinteresting since we don't want to test knot, but resolved. Hence
comment this test.
There seems to be something wrong with the test though, as the upstream
server refused recursion, but if so it is not suitable as an upstream
server really, as resolved can only be client to a recursive resolver.
resolved: enable DNS proxy mode if client wants DNSSEC
So far we disabled DNSSEC if local clients asked for it via DO flag if
DNSSEC=no is set. Let's instead switch to proxy mode in this case, and
thus treat client requested DO mode as a way to force proxy mode.
This means DNSSEC=no just controls whether resolved will do validation
for regular looups, but it has no effect anymore on lookups from clients
that indicated they want to do their own DNSSEC anyway.
resolved: use relaxed single label rules when proxying DNS queries
When we use proxy mode when propagating DNS queries to upstream DNS
servers, let's use the relaxed single label rules. This has the benefit
that tools such "delv" work on the proxy stub 127.0.0.54.
Matteo Croce [Tue, 27 Feb 2024 20:28:14 +0000 (21:28 +0100)]
dynamically load compression libraries
Dynamically load liblz4, libzstd and liblzma with dlopen().
This helps to reduce the size of the initrd image when these libraries
are not really needed.
Matteo Croce [Tue, 27 Feb 2024 06:36:46 +0000 (07:36 +0100)]
move dlfcn-util into basic
I'm going to dlopen_many_sym_or_warn() in src/basic/compress.c, this
will introduce a circular dependency because libshared already depends
from libbasic.
To avoid this, move dlfcn-util.c from libshared to libbasic.
Nick Rosbrook [Mon, 4 Mar 2024 20:43:57 +0000 (15:43 -0500)]
test: check for kernel.apparmor_restrict_unprivileged_userns
Some tests in test-execute are already skipped if we do not have
unprivileged user namespaces. Extend this check to look for an apparmor
specific sysctl indicating that unprivileged userns creation is
restricted.
Mike Yuan [Sat, 2 Mar 2024 13:22:51 +0000 (21:22 +0800)]
core/service: don't transition to start-post on cgroup empty event
with ExitType=cgroup
It's not clear to me what the rationale of the logic was
when ExitType=cgroup got introduced. But similar to
the previous commit, I think we should not transition to
'start-post' on cgroup empty event. This is especially
important for Type=dbus/notify services.
Luca Boccassi [Sun, 3 Mar 2024 18:14:31 +0000 (18:14 +0000)]
test: fix test-resolved-stream unit test failure
On Noble setting this ioctl fails:
1570s 819/1330 systemd:resolve / test-resolved-stream FAIL 0.14s killed by signal 6 SIGABRT
1570s Successfully forked off '(usernstest)' as PID 27737.
1570s Skipping PR_SET_MM, as we don't have privileges.
1570s (usernstest) succeeded.
1570s Assertion 'ioctl(socket_fd, SIOCSIFFLAGS, &req) >= 0' failed at src/resolve/test-resolved-stream.c:372, function try_isolate_network(). Aborting.
Luca Boccassi [Sun, 3 Mar 2024 18:15:26 +0000 (18:15 +0000)]
test: fix test-loopback failure when lacking privileges
Setting up the loopback might fail due to lack of privileges, as it
happens when running unit tests in the Noble CI environment. Skip
the test when it happens.
1584s 862/1330 systemd:test / test-loopback FAIL 0.01s exit status 1
1584s /* test_loopback_setup */
1584s Failed to configure loopback network device, ignoring: Operation not permitted
1584s loopback: Operation not permitted
hostnamectl: gracefully handle old hostnamed replies to GetHardwareSerial()
Old versions of hostnamed used to propagate ENODEV/ENOENT as-is. Bad
idea. This was fixed in 171ddae1a122e9c97b4ef12ccb2d29e1ba7a318a, but
let's handle this gracefully in hostnamectl.
test: explicitly set TERM=linux for TEST-69-SHUTDOWN
sulogin from the latest util-linux started falling back to vt102 instead
of linux, which makes screen sad (because we install only the linux
terminfo into the test image) and expect trips over the unexpected
warning. Let's just explicitly set TERM=linux before invoking screen to
avoid this.
+ make -C TEST-69-SHUTDOWN setup run
...
INFO:test-shutdown:log in and start screen
root
root
Last login: Sun Mar 3 13:19:31 from 18.191.105.60
-bash-5.2# screen
screen
Cannot find terminfo entry for 'vt102'.
-bash-5.2# ERROR:test-shutdown:Timeout exceeded.
Mike Yuan [Sun, 3 Mar 2024 10:37:36 +0000 (18:37 +0800)]
man/sd_notify: be explicit that FDPOLL= is not a global setting
"submitted" is already used in the description of FDNAME=.
Let's use that instead of "stored" for FDPOLL= too, to make
it more clear that it's a per-submission/per-fdset setting.