Luca Boccassi [Thu, 28 Jan 2021 17:02:33 +0000 (17:02 +0000)]
namespace: store and use original MountEntry paths when prefixing
Some paths (eg: mount_tmpfs) simply assumed that prefixing always
happens and it always stores the original path in path_const, and
the prefixed path in path_malloc.
But if a MountEntry is set up in a helper function and thus uses
only _malloc struct members, this assumption doesn't hold and there's
a crash.
Refactor so that prefixing is done with a helper which stores the
original path in a separate struct member, and accessing it also
uses a helper which does the right thing.
resolved: log process info of clients requesting resolution via D-Bus
Let's make things more debuggable: when debug logging is on, let's
say which client is asking for our services.
This is helpful for easily figuring out which local process might
interfere with your debugging sessions by issuing additional requests
while you try to debug a request (I am looking at you, geoclue!).
resolved: add "confidential" flag for replies passed to clients
Let's introduce a new flag that indicates whether the response was
acquired in "confidential" mode, i.e. via encrypted DNS-over-TLS, or
synthesized locally.
resolved: replace "answer_authenticated" bool by uint64_t query_flags field
Let's use the same flags type we use for client communication, i.e.
instead of "bool answer_authenticated", let's use "uint64_t
answer_query_flags", with the SD_RESOLVED_AUTHENTICATED flag.
This is mostly just search/replace, i.e. a refactoring, no change in
behaviour.
This becomes useful once in a later commit SD_RESOLVED_CONFIDENTIAL is
added to indicate resolution that either were encrypted (DNS-over-TLS)
or never left the local system.
resolvectl: clarify IDNA and search path logic in combination with "resolvectl query --type="
When low-level RR resolution is requested from "resolvectl query" via
"--type=" or "--class=" no search domain logic is applied and no IDNA
translation.
Explain this in detail in the documentation, and also mentions this when
users attempt to resolve single-label names or names with international
characters in the output.
I believe the current behaviour is correct, but it is indeed surprising.
Hence the documentation and output improvement.
resolved: instead of closing DNS UDP transaction fds right-away, add them to a socket "graveyard"
The "socket graveyard" shall contain sockets we have sent a question out
of, but not received a reply. If we'd close thus sockets immediately
when we are not interested anymore, we'd trigger ICMP port unreachable
messages once we after all *do* get a reply. Let's avoid that, by
leaving the fds open for a bit longer, until a timeout is reached or a
reply datagram received.
Numeric ifnames should be acceptable only if that's enabled by flag, and
refused otherwise. Hence, let's parse as ifindex first, and if that
works decide. Finally, let's refuse any numeric ifnames that are not
valid ifindexs, but look like them.
resolved: don't redundantly switch DNS servers because of transaction failures
When a transaction fails and we decide to switch DNS servers, don#t do
so unconditionally. Check if the current DNS server is still the same as
when the transaction was initiated. And if not, do not do anything.
That should reduce the number of redundant DNS server switches if many
parallel transactions fail simultaneously (which is pretty likely if
DNSSEC is on).
resolved: reuse check for link-local IP address lookups
Let's reuse accept_link_local_reverse_lookups() at one more place, where
we check for the list of link local reverase address domains. Since we
don't actually accept the domains here (but rather the opposite, not
accept), let's rename the function a bit more generically with accept_ →
match_.
While we are at it invert the if branches, to make things more easily
understandable: filter out the unwatnted stuff and have the "all good"
state as main codepath.
This fixes a long-standing issue in packaging scriptlets: daemon-reload
was moved to the end of the transaction, but restarting services was still
straightaway after package installation.
Note that daemon-reload is called twice. This wouldn't be hardly noticable,
except that now a bunch of units (at least in Fedora) generate very verbose
warnings about deprecated features. So we get those warnings twice…
reload-or-restart --needing-restart is also called twice, but the second call
is usually a noop, because the first clears the flag for restarted units. The
second call is necessary for the case where we only uninstall packages, and the
%transfiletriggerpostun trigger fires, but not the %transfiletriggerin
scriptlet.
Also note that this assumes that units are marked only for restart if paths
under @systemunitdir@ or /etc/systemd/system have been touched. I would prefer
make the trigger that does 'restart --needing-restart' fire always, but it
seems rpm doesn't have such functionality. (Except as a %transfiletrigger that
would trigger on "/*" to catch all transactions, but that seems ineffiecient
and ugly.)
rpm: order sysctl/sysusers/tmpfiles execution before package scriptlets
P>1000000 is *before* "normal" scriptlets, P<1000000 is *after*. I think it
makes sense to do stuff like execution of sysctl/sysusers/tmpfiles configuration
before package scriptlets. I think that was the intent, but a single digit got
dropped ;(
Also, let's reorder the scriptlets in the file to match execution order, to
make it easier to see what is going on.
Most of those may happen in any order, but there are some exceptions:
tmpfiles should be after sysusers,
udevadm --reload should be after hwdb.
The trigger was initially written to use %transfiletriggerun instead
of %transfiletriggerpostun because the latter would not fire. It turned
out to a buffer overread in rpm that since has been long fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=1284645
https://github.com/rpm-software-management/rpm/commit/f6521c50f6836374a0f7995f8f393aaf36e178ea
rpm: sync the shell version of triggers.systemd with the lua version
Note that this goes both ways: in particular the lua version had udev
scriptlets in the wrong package, fixed in
https://src.fedoraproject.org/rpms/systemd/c/3c9433d7cf4afc8d76660402f6c3d9d991596b83.
rpm: pull in the alternative trigger implementation in sh
From https://src.fedoraproject.org/rpms/systemd/blob/master/f/triggers.systemd.
In 12dde791d519bc80d5cca4ab6f088763cd481015 scriptlets were converted to lua.
This is not only faster and cleaner, but also avoids a nasty dependency loop:
rpm implements the lua scripting internally, so we don't need a working shell
for the scriplets. This is nice and all, but unfortunately ostree wants to
capture scriptlets and execute them at a later time and does not support lua.
So in Fedora we ended up with a revert back to a shell-based implementation
[1]. At the time I hoped this would only be a temporary workaround, but three
years later I think it's fair to assume that this will not happen any time
soon. But carrying the upstream lua version and the downstream sh version is
error prone. So let's import the other version into our tree too so that they
can be kept in sync.
This is almost equivalent to 'busctl call-method org.freedesktop.systemd1
/org/freedesktop/systemd1 org.freedesktop.systemd1.Manager EnqueueMarkedJobs',
but waits for the jobs to finish.
core: add EnqueueMarkedJobs method to reload/restart marked units
We support two return types for methods that start jobs. EnqueueJob support the
full-monty mode with affected jobs. I didn't do this here, since it seems
unlikely to be used. In the common case there'd be a huge list of jobs and
affected jobs. EnqueueMarkedJobs() just returns a list of jobs that we can wait
upon.
The name of the method is generic in case we decide to add something other than
just reload/restart later on.
When errors occur, resource errors are treated as fatal, but for other error
types we queue up other jobs, and only return an error at the end. The
assumption is that the caller will ignore the result error anyway, so it's
better to try to reload/restart as much as possible.
The property is never set by systemd, only reset after a stop or restart or
reload. It may externally be set to mark the unit for a later restart/reload.
I wasn't sure whether to configure the property only for the types where this
makes sense (Service, Swap, etc). But Restart() method is defined on the unit,
and also having this always under the same property name is more convenient.
The ELN composes are quite unstable and take a while to refresh. Let's
drop them again and revisit this once they get more mature to reduce
the CI noise.
Let's suppress repeated stub queries coming in, to minimize resource
usage. Many DNS clients are pretty aggressive regarding repeating DNS
requests, hence let's find them and suppress the follow-ups should we
need more time to fulfill the queries.
Luca Boccassi [Sun, 14 Feb 2021 19:29:42 +0000 (19:29 +0000)]
test: install binaries from local d/control file
The source package in the apt cache might be older than the
packaging from salsa.debian.org/systemd-team/systemd so it might not
list all the current binary packages.
This is currently the case for systemd-timesyncd, so TEST-30 fails.
Simply grep the control file rather than using apt-cache when iterating
over the packages contents.
systemctl: use argv[0] not program_invocation_short_name for arg dispatch
The immediate motivation is to allow fuzz-systemctl-parse-argv to cover also
the other code paths. p_i_s_n is not getting set (and it probably shouldn't),
so the fuzzer would only cover the paths for ./systemctl, and not ./reboot,
etc. Looking at argv[0] instead, which is passed as part of the fuzzer data,
fixes that.
But I think in general it's more correct to look at argv[0] here: after all we
have all the information available through local variables and shouldn't go out
of our way to look at a global.
This lists numerical signal values:
$ systemctl --signal list
SIGNAL NAME
1 SIGHUP
2 SIGINT
3 SIGQUIT
...
62 SIGRTMIN+28
63 SIGRTMIN+29
64 SIGRTMIN+30
This is useful when trying to kill e.g. systemd with a specific signal number
using kill. kill doesn't accept our fancy signal names like RTMIN+4, so one
would have to calculate that value somehow. Doing
systemctl --signal list | grep -F RTMIN+4
is a nice way of doing that.
resolved: refuse sending packets to our own stub listeners
A previous commit made sure that when one of our own packets is looped
back to us, we ignore it. But let's go one step further, and refuse
operation if we notice the server we talk to is our own. This way we
won't generate unnecessary traffic and can return a cleaner error.
Let's be more precise in naming this function, after all this doesn#t
actually check if the packet is really ours, but just that the source IP
address is a local one. Hence name it that way.
(This is preparation to add a helper that checks if packet belongs to
local transaction later on)
Let's add some overflow checks. Also, if 0 records are reserved, use
this as indication that a copy shall be done and do not grow the answer
beyond the current size.
resolved: gracefully handle with packets with too large RR count
Apparently, there are plenty routers in place that report an incorrect
RR count in the packets: they declare more RRs than are actually
included.
Let's accept these responses, but let's downgrade them to baseline, i.e.
let's suppress OPT in this case: if they don't even get the RR count
right, let's operate on the absolute baseline, and not bother with
anything fancier such as EDNS.
Prompted-by: https://github.com/systemd/systemd/issues/12841#issuecomment-724063973 Fixes: #3980
Most likely fixes: #12841
units: turn off DNSSEC validation when timesyncd resolves hostnames
We have a chicken and egg problem: validation of DNSSEC signatures
doesn't work without a correct clock, but to set the correct clock we
need to contact NTP servers which requires resolving a hostname, which
would normally require DNSSEC validation.
Let's break the cycle by excluding NTP hostname resolution from
validation for now.
Of course, this leaves NTP traffic unprotected. To cover that we need
NTPSEC support, which we can add later.
basic/unit-file: when loading linked unit files, use link source as "fragment path"
The general idea is that when a unit file is "linked" (i.e. installed by
symlinking from outside of the search paths), the *destination* name is
irrelevant. It doesn't even have to be a valid unit name, or to match the type
or instance value. The obvious collorary is that we shouldn't look at the
symlink destination name to derive the unit name, instance value, or anything
else at all.
When building the name map, when we find a linked unit (possibly at the end
of a series of alias redirects), store the *source* of the final symlink as the
fragment path. This has two effects:
- we stop looking at the *target* file name to derive unit info, i.e. actually
implement the stuff described in the first paragraph.
- we load the unit fragment through the symlink. If someone were to remove the
symlink, we'll not load the unit. This seems like the right thing.
Fixes #18058.
Before this change, we were generally quite confused about unit alises for
linked units. Fortunately most poeple use the same symlink source and target,
so in practice we wouldn't hit this too often.
In unit_load_fragment() a comment is added to explain what we're doing there.
core: slightly improve error message on load errors
Let's be a bit more helpful when refusing jobs on units that failed to
load properly. We already have explicit D-Bus errors for the error
conditions that are common and expected (such as "not found"), but for
the rest we so far generate a fairly cryptic message.
Let's try to be friendlier towards users and suggest what to do on such
errors.