Tony Finch [Thu, 6 Apr 2023 10:30:00 +0000 (11:30 +0100)]
A SET_IF_NOT_NULL() macro for optional return values
The SET_IF_NOT_NULL() macro avoids a fair amount of tedious boilerplate,
checking pointer parameters to see if they're non-NULL and updating
them if they are. The macro was already in the dns_zone unit, and this
commit moves it to the <isc/util.h> header.
I have included a Coccinelle semantic patch to use SET_IF_NOT_NULL()
where appropriate. The patch needs an #include in `openssl_shim.c`
in order to work.
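A minimal sketch of how a SET_IF_NOT_NULL()-style macro removes the boilerplate, assuming a `do { } while (0)` wrapper; the exact definition in <isc/util.h> may differ. The `lookup_ttl()` function is hypothetical and only shows optional out-parameters that callers may pass as NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: store 'val' through 'obj' if the caller supplied a pointer. */
#define SET_IF_NOT_NULL(obj, val)     \
	do {                          \
		if ((obj) != NULL) {  \
			*(obj) = (val); \
		}                     \
	} while (0)

/*
 * Hypothetical example: both results are optional, so each pointer
 * parameter may be NULL when the caller does not need that value.
 */
static bool
lookup_ttl(const char *name, unsigned int *ttlp, bool *foundp) {
	bool found = (name != NULL && name[0] != '\0');
	unsigned int ttl = found ? 3600 : 0;

	/* Replaces: if (ttlp != NULL) { *ttlp = ttl; } and so on. */
	SET_IF_NOT_NULL(ttlp, ttl);
	SET_IF_NOT_NULL(foundp, found);
	return found;
}
```

Wrapping the `if` in `do { } while (0)` keeps the macro safe inside an unbraced `if`/`else`; that is a common C idiom, not necessarily what BIND uses.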
Mark Andrews [Wed, 2 Aug 2023 06:16:30 +0000 (16:16 +1000)]
Add sleeps so that the modification time changes
The mkeys system test could fail because the root zone was re-signed
within the same second as it was previously signed, causing reloads
to fail. Add delays to the test to prevent this.
Revert the commit that always used the OpenSSL 3.0 API when available.
The new APIs should always work, but OpenSSL has non-obvious
omissions in the automatic mappings it provides.
Mark Andrews [Mon, 7 Aug 2023 08:22:29 +0000 (18:22 +1000)]
Fix 'addr', 'ckresult' and 'drop' functions
'addr', 'ckresult' and 'drop' should return 0 rather than 1 after
calling 'setret', as the error has already been logged and these
functions are not expected to fail.
Michal Nowak [Mon, 7 Aug 2023 16:28:34 +0000 (18:28 +0200)]
Exclude dupsigs and keymgr2kasp from cross-version-config-tests
pytest should not schedule the dupsigs and keymgr2kasp system tests,
which were removed in BIND 9 mainline but are still present in the
BIND 9 baseline version (v9.19.15). (This can be dropped once the
v9.19.16 tag is present.)
Michal Nowak [Wed, 25 Jan 2023 20:38:56 +0000 (21:38 +0100)]
Cross-version testing with named configurations
In #3381 (and #3385), we committed a backward-incompatible change to
BIND 9.19.5, 9.18.7, and 9.16.33, explicitly requiring "inline-signing"
for every "dnssec-policy".
We made this backward-incompatible change deliberately, knowing the
consequences for users and their configurations. But had we been
unaware that this was a backward-incompatible change and fixed the
failing system tests by "tweaking a knob to make the CI pass", nobody
would have taken a second look before the change hit user
configurations.
The "cross-version-config-tests" CI job provides such a second look. It
runs the system tests from the latest release tag specific to the
particular branch (e.g., v9.19.12 for the "main" branch) with BIND 9
binaries from the current "HEAD" (the future v9.19.13). This
Frankenstein build is assembled by altering the "TOP_BUILDDIR" variable
in "bin/tests/system/conf.sh".
Caveats:
- Only system test configurations are tested; no actual test code is
run.
- Problems with namedN.conf configurations are not identified.
When a backward-incompatible change is introduced, the CI job is
expected to fail. If the change is deliberate, the job will keep failing
until the version with the backward-incompatible change is tagged and
the minor version in configure.ac is bumped.
Timo Teräs [Fri, 28 Jul 2023 10:15:48 +0000 (13:15 +0300)]
Fix OpenSSL 3.0 API EC curve names
The OpenSSL man page examples use the NIST curve names, which are
supported. However, when the name is queried, the native OpenSSL name
is returned. Use these native names to pass the curve type checks for
engine/provider objects.
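A sketch of the mismatch, assuming the reported name has already been obtained (in OpenSSL 3.0 this can be done with EVP_PKEY_get_utf8_string_param() and OSSL_PKEY_PARAM_GROUP_NAME). The helpers `native_curve_name()` and `curve_matches()` are hypothetical; the point is that a check against "P-256" fails when OpenSSL reports "prime256v1".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NIST curve names and OpenSSL's native names for the same curves. */
static const struct {
	const char *nist;
	const char *native;
} curve_aliases[] = {
	{ "P-256", "prime256v1" },
	{ "P-384", "secp384r1" },
};

/* Hypothetical helper: map a NIST name to the native OpenSSL name. */
static const char *
native_curve_name(const char *nist) {
	size_t n = sizeof(curve_aliases) / sizeof(curve_aliases[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(curve_aliases[i].nist, nist) == 0) {
			return curve_aliases[i].native;
		}
	}
	return NULL;
}

/* Compare the name OpenSSL reports against the expected curve. */
static bool
curve_matches(const char *reported, const char *expected_nist) {
	const char *native = native_curve_name(expected_nist);

	return native != NULL && strcmp(reported, native) == 0;
}
```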
Michał Kępień [Tue, 11 Jul 2023 13:56:31 +0000 (15:56 +0200)]
Convert setup.pl into static configurations
The setup.pl script has been replaced with static BIND configurations,
and in the course of this change, the unused ns1 server was removed.
This greatly improves the test's overall readability.
Michal Nowak [Tue, 9 May 2023 17:11:00 +0000 (19:11 +0200)]
Rewrite stress test to pytest
The shell version of the test completed only after all DNS zone
updates had been sent, even if the BIND server crashed while processing
them, leading to prolonged execution and potential hangs in the CI
environment. The Python rewrite of the test ensures that DNS update
tasks finish within five minutes of starting, regardless of whether
BIND crashes or the DNS zone updates fail to finish in time.
Michał Kępień [Mon, 7 Aug 2023 09:26:58 +0000 (11:26 +0200)]
Lower the minimum expected dnstap output file size
Lower the size requirement for the dnstap output file produced during
the "dnstap" system test from 454 to 450 bytes; while files of that size
are not generated in any GitLab CI job, they are in other environments
where the test passes.
Michał Kępień [Mon, 7 Aug 2023 09:26:58 +0000 (11:26 +0200)]
Wait until fstrm_capture is ready
The fstrm_capture utility is started in the background during the
"dnstap" system test. Consequently, "rndc dnstap-reopen" and similar
commands may be executed before fstrm_capture starts listening on the
Unix domain socket it is configured to receive dnstap data on. As a
result, dnstap data sent to that socket in the meantime is lost. While
the fstrm writer thread is able to recover from such a scenario within
a couple of seconds (by reopening the configured dnstap destination
itself), only one write attempt is made for data successfully queued to
the writer thread, so dnstap frames can still be lost in the process.
This may happen during the "dnstap" system test, leaving the dnstap
output file empty, which in turn causes the test to fail.
Fix by waiting until fstrm_capture starts listening on the Unix domain
socket it is configured to use before asking named to reopen the
configured dnstap destination. Since various fstrm_capture versions log
different messages when the listening socket is set up, wait for a
common string that works for all fstrm_capture versions released to
date. Add a few extra debug messages indicating test progress and make
the test fail if the expected fstrm_capture log message is not generated
within 10 seconds.
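The waiting step can be sketched as polling a log file for a marker string with a timeout. The real test is shell code, and the common fstrm_capture message it waits for is not named here, so `wait_for_log()` and the marker text are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return true if any line of the file at 'path' contains 'needle'. */
static bool
file_contains(const char *path, const char *needle) {
	char line[1024];
	bool found = false;
	FILE *fp = fopen(path, "r");

	if (fp == NULL) {
		return false; /* log not created yet: keep waiting */
	}
	while (!found && fgets(line, sizeof(line), fp) != NULL) {
		found = (strstr(line, needle) != NULL);
	}
	fclose(fp);
	return found;
}

/* Poll once per second; give up after 'timeout_secs' seconds. */
static bool
wait_for_log(const char *path, const char *needle,
	     unsigned int timeout_secs) {
	for (unsigned int i = 0; i <= timeout_secs; i++) {
		if (file_contains(path, needle)) {
			return true;
		}
		if (i < timeout_secs) {
			sleep(1);
		}
	}
	return false; /* caller should fail the test */
}
```

A call such as `wait_for_log("fstrm_capture.out", marker, 10)` would correspond to the 10-second limit described above.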
Michał Kępień [Mon, 7 Aug 2023 09:26:58 +0000 (11:26 +0200)]
Capture all fstrm_capture output
The fstrm_capture.out file is overwritten when the fstrm_capture utility
is restarted during the "dnstap" system test. Use a separate output
file for each fstrm_capture instance to ensure all output produced by
that tool during the "dnstap" system test is preserved for forensic
purposes.
Mark Andrews [Sun, 6 Aug 2023 23:38:56 +0000 (09:38 +1000)]
Set ret=1 if _wait_for_stats does not succeed
Errors getting transfer statistics from named.run were not detected,
as ret was not set to 1 if there had been no success after looping
for a while.
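The bug pattern is a retry loop that falls through without recording failure. A C sketch of the corrected shape, with a hypothetical `wait_for()` helper standing in for the shell function: it returns 0 on the first success and 1 only after every attempt has failed.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*probe_fn)(void *arg);

/* Retry 'probe' up to 'attempts' times; 0 on success, 1 on failure. */
static int
wait_for(probe_fn probe, void *arg, unsigned int attempts) {
	for (unsigned int i = 0; i < attempts; i++) {
		if (probe(arg)) {
			return 0;
		}
	}
	/* The case this commit fixes: report that no attempt succeeded. */
	return 1;
}

/* Example probes for illustration. */
static bool
succeed_on_third(void *arg) {
	int *calls = arg;
	return ++(*calls) >= 3;
}

static bool
never(void *arg) {
	(void)arg;
	return false;
}
```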
This means that if you use TTL values larger than 1 day in your zone,
your zone runs the risk of going bogus before it moves safely to
insecure.
Most resolvers cap the maximum TTL for which they cache RRsets at one
day by default (Unbound, Knot, PowerDNS), so that is fine. However,
BIND 9's default is one week.
Change the default TTLsig to one week, so that BIND 9 resolvers with
default settings will also not evaluate responses for zones that are
going insecure as bogus.
This change does mean that when unsigning your zone, it will take six
days longer to safely go insecure, regardless of what TTL values you
use in the zone.
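Assuming the previous default TTLsig was one day (P1D), as the remark about TTL values larger than one day implies, the extra delay when unsigning works out as:

```latex
\Delta t = \mathrm{TTLsig}_{\text{new}} - \mathrm{TTLsig}_{\text{old}}
         = \mathrm{P1W} - \mathrm{P1D}
         = 604800\,\mathrm{s} - 86400\,\mathrm{s}
         = 518400\,\mathrm{s} = 6\ \text{days}
```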
These options concentrate zone maintenance actions into bursts for
the benefit of servers with intermittent connections. That's no longer
something we really need to optimize.
Mark Andrews [Thu, 11 May 2023 02:09:26 +0000 (12:09 +1000)]
Use a subshell to isolate environment changes
'HOME=value command' should only change HOME for command, but on
some platforms this occasionally sets HOME for the rest of the
test. Explicitly isolate the environment change using a subshell.
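A subshell is a forked copy of the shell, so an environment change made inside it cannot leak back to the parent. The same isolation can be sketched in C with fork(); `run_with_home()` and `home_is_sandbox()` are hypothetical names for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run 'fn' in a child with HOME set; the parent's HOME is untouched. */
static int
run_with_home(const char *home, int (*fn)(void)) {
	int status;
	pid_t pid = fork();

	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		/* Child: like "(HOME=value; command)" in the shell. */
		if (setenv("HOME", home, 1) != 0) {
			_exit(127);
		}
		_exit(fn());
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
		return -1;
	}
	return WEXITSTATUS(status);
}

/* Example child action: exit 0 if HOME has the value set above. */
static int
home_is_sandbox(void) {
	const char *h = getenv("HOME");

	return (h != NULL && strcmp(h, "/tmp/sandbox") == 0) ? 0 : 1;
}
```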
Allow larger TTL values in zones that go insecure. This is necessary
because otherwise the zone will not be loaded due to the max-zone-ttl
of P1D that is part of the current insecure policy.
In the keymgr.c code, default back to P1D if the max-zone-ttl is set
to zero.
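The keymgr.c fallback amounts to treating zero as "use P1D". A sketch with a hypothetical `effective_max_zone_ttl()` helper, not the actual keymgr.c code:

```c
#include <assert.h>
#include <stdint.h>

#define TTL_P1D UINT32_C(86400) /* P1D: one day in seconds */

/* Hypothetical helper: a max-zone-ttl of zero falls back to P1D. */
static uint32_t
effective_max_zone_ttl(uint32_t configured) {
	return (configured == 0) ? TTL_P1D : configured;
}
```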
When using automated DNSSEC management, the zone is required to be
dynamic, or to have inline-signing enabled (or both). Update the
checkconf code to also allow inline-signing to be enabled within
dnssec-policy.
Add an option to enable/disable inline-signing inside the
dnssec-policy clause. The existing inline-signing option that is
set in the zone clause takes priority, but if it is omitted, then the
value that is set in dnssec-policy is taken.
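The precedence rule and the checkconf requirement can be sketched with a tri-state zone option. The names `opt_bool_t`, `effective_inline_signing()` and `dnssec_config_ok()` are hypothetical, not BIND's configuration API.

```c
#include <assert.h>
#include <stdbool.h>

/* The zone-level option may be omitted, so it needs three states. */
typedef enum { OPT_UNSET, OPT_NO, OPT_YES } opt_bool_t;

/* Zone clause wins when present; otherwise use the dnssec-policy value. */
static bool
effective_inline_signing(opt_bool_t zone_setting, bool policy_setting) {
	if (zone_setting != OPT_UNSET) {
		return zone_setting == OPT_YES;
	}
	return policy_setting;
}

/* Automated DNSSEC needs a dynamic zone, inline-signing, or both. */
static bool
dnssec_config_ok(bool is_dynamic, opt_bool_t zone_setting,
		 bool policy_setting) {
	return is_dynamic ||
	       effective_inline_signing(zone_setting, policy_setting);
}
```

For example, a zone using the default policy (inline-signing on) with "inline-signing no;" in the zone clause resolves to no inline-signing.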
The built-in policies use inline-signing.
This means that if you want to use the default policy without
inline-signing you either have to set it explicitly in the zone
clause:
zone "example" {
    ...
    dnssec-policy default;
    inline-signing no;
};
Or create a new policy, only overriding the inline-signing option:
Use cds_lfht for updatenotify mechanism in dns_db unit
The updatenotify mechanism in dns_db relied on an unlocked ISC_LIST for
adding and removing the "listeners". The mechanism relied on the
exclusive mode: it should have been updated only during reconfiguration
of the server. This turned out not to be true anymore in dns_catz, where
the updatenotify list could be updated during offloaded work, as the
offloaded threads are not subject to the exclusive mode.
Change update_listeners to be a cds_lfht (lock-free hash table), and
slightly refactor how the callbacks are registered and unregistered: the
calls are now idempotent (the register call already was, and the return
value of the unregister function was mostly ignored by the callers).
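Idempotent registration means registering the same (callback, argument) pair twice, or unregistering a pair that is not registered, is a harmless no-op. A simplified sketch using a plain unlocked array instead of cds_lfht; all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*update_cb)(void *arg);

#define MAX_LISTENERS 16

typedef struct {
	update_cb fn;
	void *arg;
} listener_t;

typedef struct {
	listener_t items[MAX_LISTENERS];
	size_t count;
} listeners_t;

/* Index of the (fn, arg) pair, or l->count if it is not registered. */
static size_t
listener_find(const listeners_t *l, update_cb fn, void *arg) {
	for (size_t i = 0; i < l->count; i++) {
		if (l->items[i].fn == fn && l->items[i].arg == arg) {
			return i;
		}
	}
	return l->count;
}

/* Registering an already registered pair succeeds without a duplicate. */
static bool
listener_register(listeners_t *l, update_cb fn, void *arg) {
	if (listener_find(l, fn, arg) < l->count) {
		return true;
	}
	if (l->count == MAX_LISTENERS) {
		return false;
	}
	l->items[l->count++] = (listener_t){ fn, arg };
	return true;
}

/* Unregistering an unknown pair does nothing, so no result is needed. */
static void
listener_unregister(listeners_t *l, update_cb fn, void *arg) {
	size_t i = listener_find(l, fn, arg);

	if (i < l->count) {
		l->items[i] = l->items[--l->count]; /* order not preserved */
	}
}

/* Example callback for illustration. */
static void
noop_cb(void *arg) {
	(void)arg;
}
```

The lock-free hash table additionally makes these operations safe from threads outside the exclusive mode, which this single-threaded sketch does not attempt.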
Ondřej Surý [Thu, 30 Mar 2023 08:08:52 +0000 (10:08 +0200)]
Add rwlock unit test
Add simple rwlock unit test and rwlock benchmark. The benchmark
compares the pthread rwlock with the isc rwlock implementation, so it's
mainly useful when developing a new isc rwlock implementation.
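A minimal rwlock unit-test sketch using the pthread rwlock (not the isc rwlock API): writers increment a shared counter, readers snapshot it, and the final count must equal the total number of writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS	 10000

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static long counter = 0;

/* Each worker alternates a write (increment) and a read (snapshot). */
static void *
worker(void *arg) {
	long last = 0;

	(void)arg;
	for (int i = 0; i < NITERS; i++) {
		pthread_rwlock_wrlock(&lock);
		counter++;
		pthread_rwlock_unlock(&lock);

		pthread_rwlock_rdlock(&lock);
		long snapshot = counter;
		pthread_rwlock_unlock(&lock);

		/* The counter only grows, so reads never go backwards. */
		assert(snapshot >= last);
		last = snapshot;
	}
	return NULL;
}

static long
run_rwlock_test(void) {
	pthread_t threads[NTHREADS];

	counter = 0;
	for (int i = 0; i < NTHREADS; i++) {
		int r = pthread_create(&threads[i], NULL, worker, NULL);
		assert(r == 0);
		(void)r;
	}
	for (int i = 0; i < NTHREADS; i++) {
		pthread_join(threads[i], NULL);
	}
	return counter; /* all workers joined, no lock needed */
}
```

A benchmark would time the same loop for each lock implementation; this sketch only checks correctness.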
Ondřej Surý [Tue, 27 Jun 2023 06:26:12 +0000 (08:26 +0200)]
Call rcu_barrier() five times in the isc__mem_destroy()
Because rcu_barrier() needs to be called as many times as the number of
nested call_rcu() calls (call_rcu() calls made from call_rcu thread),
and currently there's no mechanism to detect whether there are more
call_rcu callbacks scheduled, we simply call rcu_barrier() multiple
times. The overhead is negligible and it prevents rare assertion
failures caused by the check for memory leaks in isc__mem_destroy().
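Why one barrier per nesting level is needed can be shown with a single-threaded model (not liburcu): `defer()` stands in for call_rcu(), and `barrier()` stands in for rcu_barrier(), running only the callbacks queued before it started. Callbacks deferred from inside a callback wait for the next barrier.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 64

typedef void (*deferred_fn)(int depth);

static deferred_fn queue_fn[QUEUE_MAX];
static int queue_arg[QUEUE_MAX];
static size_t queued = 0;
static int freed = 0;

/* Model of call_rcu(): queue 'fn' for a later barrier. */
static void
defer(deferred_fn fn, int depth) {
	assert(queued < QUEUE_MAX);
	queue_fn[queued] = fn;
	queue_arg[queued] = depth;
	queued++;
}

/* Model of rcu_barrier(): run only the callbacks queued so far. */
static void
barrier(void) {
	deferred_fn batch_fn[QUEUE_MAX];
	int batch_arg[QUEUE_MAX];
	size_t n = queued;

	for (size_t i = 0; i < n; i++) {
		batch_fn[i] = queue_fn[i];
		batch_arg[i] = queue_arg[i];
	}
	queued = 0; /* callbacks deferred below start a new batch */
	for (size_t i = 0; i < n; i++) {
		batch_fn[i](batch_arg[i]);
	}
}

/* A callback that defers another one until 'depth' reaches 1. */
static void
nested_free(int depth) {
	if (depth > 1) {
		defer(nested_free, depth - 1);
	} else {
		freed++;
	}
}
```

With nesting depth 3, the final free happens only on the third barrier, so calling the barrier five times covers nesting up to depth five.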
Ondřej Surý [Thu, 22 Jun 2023 13:43:04 +0000 (15:43 +0200)]
Don't clean up the dns_message_checksig fuzzer in atexit handler
After the dns_badcache refactoring, dns_badcache_destroy() calls
call_rcu(). The dns_message_checksig cleanup, which calls
dns_view_detach(), happens in the atexit handler, so call_rcu threads
might be started very late in the process. liburcu registers a library
destructor that destroys the data structures internal to liburcu, and
this clashes with the call_rcu thread that has just been started in the
atexit() handler, causing (depending on timing) either:
- a normal run
- a straight segfault
- an assertion failure from liburcu
Instead of trying to clean up the dns_message_checksig unit, ignore the
leaked memory as we do with all the other fuzzing tests.
Ondřej Surý [Wed, 21 Jun 2023 12:10:28 +0000 (14:10 +0200)]
Make the load-names benchmark multithreaded
The load-names benchmark originally measured only the single-thread
performance of the data structures. As this is not how they are used
in real life, it was refactored to be multi-threaded with proper
protections in place (rwlock for ht, hashmap and rbt; transactions for
qp).
The qp test has been extended to show the effect of dns_qp_compact() and
rcu_barrier() on the overall speed and memory consumption.
Ondřej Surý [Mon, 19 Jun 2023 13:43:02 +0000 (15:43 +0200)]
Refactor dns_badcache to use cds_lfht lock-free hashtable
The dns_badcache unit had (yet another) locked hashtable implementation
of its own. Replace the hashtable used by dns_badcache with the
lock-free cds_lfht implementation from liburcu.
When a dns_request was canceled via dns_requestmgr_shutdown(), the
cancel event would be propagated on a different loop (loop 0) than the
loop the request was created on. In turn, this would propagate down to
isc_netmgr, where we require all the events to be called from the
matching isc_loop.
Pin the dns_requests to the loops and ensure that all the events are
called on the associated loop. This in turn allows us to remove the
hashed locks on the requests and change the single .requests list to a
per-loop list for the request accounting.
Additionally, do some extra cleanup, because some race conditions are
no longer possible as all events on the dns_request are serialized.
With ThreadSanitizer support added to the Userspace RCU, we no longer
need to wrap the call_rcu and caa_container_of with
__tsan_{acquire,release} hints. Remove the direct calls to
__tsan_{acquire,release} and the isc_urcu_{container,cleanup} macros.
Ondřej Surý [Thu, 22 Jun 2023 10:25:45 +0000 (12:25 +0200)]
Workaround AddressSanitizer overzealous check
The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros
contained code that operated on a NULL pointer: at the end of the list,
they called caa_container_of() on NULL in the init-clause and the
iteration-expression. The result was never actually used, because the
cond-expression in the for loop prevented the loop-statement from
executing, but AddressSanitizer noticed the invalid operation and
rightfully complained.
This was reported upstream and fixed there. Pull the upstream fix into
our <isc/urcu.h> header, so our CI checks pass.
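A sketch of the general problem and one NULL-safe iteration shape, using a hypothetical intrusive list rather than the liburcu macros or the upstream patch. The unsafe form computes `container_of(NULL, ...)` (pointer arithmetic on NULL) before testing for the end of the list; the safe form tests the node pointer first and converts it only when it is non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct node {
	struct node *next;
};

struct entry {
	int value;
	struct node node;
};

/*
 * Unsafe shape: pos = container_of(head, ...) in the init-clause runs
 * even when head is NULL. Safe shape: check the node first; the &&
 * short-circuits, so container_of() never sees NULL.
 */
#define for_each_entry(pos, head)                                        \
	for (struct node *n_ = (head);                                     \
	     n_ != NULL && ((pos) = container_of(n_, struct entry, node), 1); \
	     n_ = n_->next)

static int
sum_entries(struct node *head) {
	struct entry *e = NULL;
	int sum = 0;

	for_each_entry(e, head) {
		sum += e->value;
	}
	return sum;
}
```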
Free struct stub_glue_request in stub_glue_response() callback
When stub_glue_response() is called, the associated data is stored in a
newly allocated struct stub_glue_request. The allocated structure is
never freed in the callback, so we leak a little bit of memory.
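The fix follows the usual ownership rule for asynchronous callbacks: the callback owns the per-request allocation and must release it on every path. A sketch with hypothetical stand-ins (`glue_request_t`, `glue_response_cb()`), not BIND's struct stub_glue_request; `live_requests` is a counter added only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-request state handed to the response callback. */
typedef struct {
	int zone_id;
	int name_id;
} glue_request_t;

static int live_requests = 0; /* outstanding allocations */

static glue_request_t *
glue_request_new(int zone_id, int name_id) {
	glue_request_t *req = malloc(sizeof(*req));

	if (req != NULL) {
		req->zone_id = zone_id;
		req->name_id = name_id;
		live_requests++;
	}
	return req;
}

/* Response callback: it owns 'arg' and frees it on success or error. */
static void
glue_response_cb(int result, void *arg) {
	glue_request_t *req = arg;

	if (req == NULL) {
		return;
	}
	if (result == 0) {
		/* Placeholder for processing the response for req->zone_id. */
		(void)req->name_id;
	}
	/* The missing step: release the request on every path. */
	free(req);
	live_requests--;
}
```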