With all the preparatory work in previous PRs, we can now call static destructors
repeatedly without issue. We need to do it here so that global variables allocated
during parsing are properly freed.
tree-wide: reset the cleaned-up variable in cleanup functions
If the cleanup function returns the appropriate type, use that to reset the
variable. For other functions (usually the foreign ones which return void), add
an explicit value to reset to.
This causes a bit of code churn, but I think it might be worth it. In a
following patch static destructors will be called from a fuzzer, and this
change allows them to be called multiple times. But I think such a change might
help with detecting unitialized code reuse too. We hit various bugs like this,
and things are more obvious when a pointer has been set to NULL.
I was worried whether this change increases text size, but it doesn't seem to:
-Dbuildtype=debug:
before "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 4117672 Feb 16 14:36 build/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 4494520 Feb 16 15:06 build/systemd*
after "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 4117672 Feb 16 14:36 build/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 4494576 Feb 16 15:10 build/systemd*
now:
-rwxrwxr-x 1 zbyszek zbyszek 4117672 Feb 16 14:36 build/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 4494640 Feb 16 15:15 build/systemd*
-Dbuildtype=release:
before "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 5252256 Feb 14 14:47 build-rawhide/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 1834184 Feb 16 15:09 build-rawhide/systemd*
after "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 5252256 Feb 14 14:47 build-rawhide/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 1834184 Feb 16 15:10 build-rawhide/systemd*
now:
-rwxrwxr-x 1 zbyszek zbyszek 5252256 Feb 14 14:47 build-rawhide/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 1834184 Feb 16 15:16 build-rawhide/systemd*
I would expect that the compiler would be able to elide the setting of a
variable if the variable is never used again. And this seems to be the case:
in optimized builds there is no change in size whatsoever. And the change in
size in unoptimized build is negligible.
Something strange is happening with size of libsystemd: it's bigger in
optimized builds. Something to figure out, but unrelated to this patch.
I started working on this because I wanted to change how
DEFINE_TRIVIAL_CLEANUP_FUNC is defined. Even independently of that change, it's
nice to make make things more consistent and predictable.
resolved: make dns_transaction_gc return a pointer
_gc() does cleanup if it is possible. So far it returned a bool to
signal if it succeeded (false on success). When working on the resolved
code I had to look at the definition every time, because the (arguably
reversed) calling convention is unobvious. So let's return a pointer
(non-NULL: gc has not been done, NULL: gc has been done).
This fits nicely with the standard to return a pointer from all free
functions obviously.
The function to cleanup IPv6Token was defined using freep, i.e. the macro
generated a freepp function. The correct way would be to do something like
#define ipv6_token_free mfree
DEFINE_TRIVIAL_CLEANUP_FUNC(IPv6Token *, ipv6_token_free);
which would create ipv6_token_freep().
But since the cleanup function is unused, let's just drop it.
fuzz-systemctl-parse-argv: avoid "leak" of bus object
Memory sanitizer would report leaked memory from --boot-load-entry=help.
Maybe we should disable all bus connections from the fuzzer? It seems not
appropriate to communicate with logind. OTOH, in a real fuzzing environment
this call should just fail, so maybe that's OK.
resolved: don't redundantly switch DNS servers because of transaction failures
When a transaction fails and we decide to switch DNS servers, don#t do
so unconditionally. Check if the current DNS server is still the same as
when the transaction was initiated. And if not, do not do anything.
That should reduce the number of redundant DNS server switches if many
parallel transactions fail simultaneously (which is pretty likely if
DNSSEC is on).
resolved: reuse check for link-local IP address lookups
Let's reuse accept_link_local_reverse_lookups() at one more place, where
we check for the list of link local reverase address domains. Since we
don't actually accept the domains here (but rather the opposite, not
accept), let's rename the function a bit more generically with accept_ →
match_.
While we are at it invert the if branches, to make things more easily
understandable: filter out the unwatnted stuff and have the "all good"
state as main codepath.
This fixes a long-standing issue in packaging scriptlets: daemon-reload
was moved to the end of the transaction, but restarting services was still
straightaway after package installation.
Note that daemon-reload is called twice. This wouldn't be hardly noticable,
except that now a bunch of units (at least in Fedora) generate very verbose
warnings about deprecated features. So we get those warnings twice…
reload-or-restart --needing-restart is also called twice, but the second call
is usually a noop, because the first clears the flag for restarted units. The
second call is necessary for the case where we only uninstall packages, and the
%transfiletriggerpostun trigger fires, but not the %transfiletriggerin
scriptlet.
Also note that this assumes that units are marked only for restart if paths
under @systemunitdir@ or /etc/systemd/system have been touched. I would prefer
make the trigger that does 'restart --needing-restart' fire always, but it
seems rpm doesn't have such functionality. (Except as a %transfiletrigger that
would trigger on "/*" to catch all transactions, but that seems ineffiecient
and ugly.)
rpm: order sysctl/sysusers/tmpfiles execution before package scriptlets
P>1000000 is *before* "normal" scriptlets, P<1000000 is *after*. I think it
makes sense to do stuff like execution of sysctl/sysusers/tmpfiles configuration
before package scriptlets. I think that was the intent, but a single digit got
dropped ;(
Also, let's reorder the scriptlets in the file to match execution order, to
make it easier to see what is going on.
Most of those may happen in any order, but there are some exceptions:
tmpfiles should be after sysusers,
udevadm --reload should be after hwdb.
The trigger was initially written to use %transfiletriggerun instead
of %transfiletriggerpostun because the latter would not fire. It turned
out to a buffer overread in rpm that since has been long fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=1284645
https://github.com/rpm-software-management/rpm/commit/f6521c50f6836374a0f7995f8f393aaf36e178ea
rpm: sync the shell version of triggers.systemd with the lua version
Note that this goes both ways: in particular the lua version had udev
scriptlets in the wrong package, fixed in
https://src.fedoraproject.org/rpms/systemd/c/3c9433d7cf4afc8d76660402f6c3d9d991596b83.
rpm: pull in the alternative trigger implementation in sh
From https://src.fedoraproject.org/rpms/systemd/blob/master/f/triggers.systemd.
In 12dde791d519bc80d5cca4ab6f088763cd481015 scriptlets were converted to lua.
This is not only faster and cleaner, but also avoids a nasty dependency loop:
rpm implements the lua scripting internally, so we don't need a working shell
for the scriplets. This is nice and all, but unfortunately ostree wants to
capture scriptlets and execute them at a later time and does not support lua.
So in Fedora we ended up with a revert back to a shell-based implementation
[1]. At the time I hoped this would only be a temporary workaround, but three
years later I think it's fair to assume that this will not happen any time
soon. But carrying the upstream lua version and the downstream sh version is
error prone. So let's import the other version into our tree too so that they
can be kept in sync.
This is almost equivalent to 'busctl call-method org.freedesktop.systemd1
/org/freedesktop/systemd1 org.freedesktop.systemd1.Manager EnqueueMarkedJobs',
but waits for the jobs to finish.
core: add EnqueueMarkedJobs method to reload/restart marked units
We support two return types for methods that start jobs. EnqueueJob support the
full-monty mode with affected jobs. I didn't do this here, since it seems
unlikely to be used. In the common case there'd be a huge list of jobs and
affected jobs. EnqueueMarkedJobs() just returns a list of jobs that we can wait
upon.
The name of the method is generic in case we decide to add something other than
just reload/restart later on.
When errors occur, resource errors are treated as fatal, but for other error
types we queue up other jobs, and only return an error at the end. The
assumption is that the caller will ignore the result error anyway, so it's
better to try to reload/restart as much as possible.
The property is never set by systemd, only reset after a stop or restart or
reload. It may externally be set to mark the unit for a later restart/reload.
I wasn't sure whether to configure the property only for the types where this
makes sense (Service, Swap, etc). But Restart() method is defined on the unit,
and also having this always under the same property name is more convenient.
The ELN composes are quite unstable and take a while to refresh. Let's
drop them again and revisit this once they get more mature to reduce
the CI noise.
Let's suppress repeated stub queries coming in, to minimize resource
usage. Many DNS clients are pretty aggressive regarding repeating DNS
requests, hence let's find them and suppress the follow-ups should we
need more time to fulfill the queries.
Luca Boccassi [Sun, 14 Feb 2021 19:29:42 +0000 (19:29 +0000)]
test: install binaries from local d/control file
The source package in the apt cache might be older than the
packaging from salsa.debian.org/systemd-team/systemd so it might not
list all the current binary packages.
This is currently the case for systemd-timesyncd, so TEST-30 fails.
Simply grep the control file rather than using apt-cache when iterating
over the packages contents.
systemctl: use argv[0] not program_invocation_short_name for arg dispatch
The immediate motivation is to allow fuzz-systemctl-parse-argv to cover also
the other code paths. p_i_s_n is not getting set (and it probably shouldn't),
so the fuzzer would only cover the paths for ./systemctl, and not ./reboot,
etc. Looking at argv[0] instead, which is passed as part of the fuzzer data,
fixes that.
But I think in general it's more correct to look at argv[0] here: after all we
have all the information available through local variables and shouldn't go out
of our way to look at a global.
This lists numerical signal values:
$ systemctl --signal list
SIGNAL NAME
1 SIGHUP
2 SIGINT
3 SIGQUIT
...
62 SIGRTMIN+28
63 SIGRTMIN+29
64 SIGRTMIN+30
This is useful when trying to kill e.g. systemd with a specific signal number
using kill. kill doesn't accept our fancy signal names like RTMIN+4, so one
would have to calculate that value somehow. Doing
systemctl --signal list | grep -F RTMIN+4
is a nice way of doing that.
resolved: refuse sending packets to our own stub listeners
A previous commit made sure that when one of our own packets is looped
back to us, we ignore it. But let's go one step further, and refuse
operation if we notice the server we talk to is our own. This way we
won't generate unnecessary traffic and can return a cleaner error.
Let's be more precise in naming this function, after all this doesn#t
actually check if the packet is really ours, but just that the source IP
address is a local one. Hence name it that way.
(This is preparation to add a helper that checks if packet belongs to
local transaction later on)
Let's add some overflow checks. Also, if 0 records are reserved, use
this as indication that a copy shall be done and do not grow the answer
beyond the current size.
resolved: gracefully handle with packets with too large RR count
Apparently, there are plenty routers in place that report an incorrect
RR count in the packets: they declare more RRs than are actually
included.
Let's accept these responses, but let's downgrade them to baseline, i.e.
let's suppress OPT in this case: if they don't even get the RR count
right, let's operate on the absolute baseline, and not bother with
anything fancier such as EDNS.
Prompted-by: https://github.com/systemd/systemd/issues/12841#issuecomment-724063973 Fixes: #3980
Most likely fixes: #12841
units: turn off DNSSEC validation when timesyncd resolves hostnames
We have a chicken and egg problem: validation of DNSSEC signatures
doesn't work without a correct clock, but to set the correct clock we
need to contact NTP servers which requires resolving a hostname, which
would normally require DNSSEC validation.
Let's break the cycle by excluding NTP hostname resolution from
validation for now.
Of course, this leaves NTP traffic unprotected. To cover that we need
NTPSEC support, which we can add later.
basic/unit-file: when loading linked unit files, use link source as "fragment path"
The general idea is that when a unit file is "linked" (i.e. installed by
symlinking from outside of the search paths), the *destination* name is
irrelevant. It doesn't even have to be a valid unit name, or to match the type
or instance value. The obvious collorary is that we shouldn't look at the
symlink destination name to derive the unit name, instance value, or anything
else at all.
When building the name map, when we find a linked unit (possibly at the end
of a series of alias redirects), store the *source* of the final symlink as the
fragment path. This has two effects:
- we stop looking at the *target* file name to derive unit info, i.e. actually
implement the stuff described in the first paragraph.
- we load the unit fragment through the symlink. If someone were to remove the
symlink, we'll not load the unit. This seems like the right thing.
Fixes #18058.
Before this change, we were generally quite confused about unit alises for
linked units. Fortunately most poeple use the same symlink source and target,
so in practice we wouldn't hit this too often.
In unit_load_fragment() a comment is added to explain what we're doing there.
core: slightly improve error message on load errors
Let's be a bit more helpful when refusing jobs on units that failed to
load properly. We already have explicit D-Bus errors for the error
conditions that are common and expected (such as "not found"), but for
the rest we so far generate a fairly cryptic message.
Let's try to be friendlier towards users and suggest what to do on such
errors.