Tom Krizek [Thu, 8 Jun 2023 13:34:30 +0000 (15:34 +0200)]
Use arithmetic expansion in system tests
Change the way arithmetic operations are performed in system test shell
scripts from using `expr` to `$(())`. This ensures that updating the
variable won't end up with a non-zero exit code, which would case the
script to exit prematurely when `set -e` is in effect.
The following replacements were performed using sed in all text files
(git grep -Il '' | xargs sed -i):
Tom Krizek [Wed, 7 Jun 2023 13:35:57 +0000 (15:35 +0200)]
Run system tests with set -e
Ensure all shell system tests are executed with the errexit option set.
This prevents unchecked return codes from commands in the test from
interfering with the tests, since any failures need to be handled
explicitly.
Mark Andrews [Thu, 6 Jul 2023 06:58:53 +0000 (16:58 +1000)]
Return BADCOOKIE on validly formed bad SERVER COOKIES
The server was previously tolerant of out-of-date or otherwise bad
DNS SERVER COOKIES that where well formed unless require-cookie was
set. BADCOOKIE is now return for these conditions.
Add shutdown checks in dns_catz_dbupdate_callback()
When a zone database update callback is called, the 'catzs' object,
extracted from the callback argument, might be already shutting down,
in which case the 'catzs->zones' can be NULL and cause an assertion
failure when calling isc_ht_find().
Add an early return from the callback if 'catzs->shuttingdown' is true.
Also check the validity of 'catzs->zones' after locking 'catzs' in
case there is a race with dns_catz_shutdown_catzs() running in another
thread.
Matthijs Mekking [Tue, 27 Jun 2023 14:29:50 +0000 (16:29 +0200)]
Add test for "three is a crowd" bug (GL #2375)
Add this test scenario for a bug fixed a while ago. When a third key is
introduced while the previous rollover hasn't finished yet, the keymgr
could decide to remove the first two keys, because it was not checking
for an indirect dependency on the keys.
In other words, the previous bug behavior was that the first two keys
were removed from the zone too soon.
This test case checks that all three keys stay in the zone, and no keys
are removed premature after another new key has been introduced.
Matthijs Mekking [Tue, 27 Jun 2023 14:27:35 +0000 (16:27 +0200)]
Check all keys despite early failure
In the kasp script, if one expected key is not found, continue checking
the other key ids, even if there is no match for the first one. This
provides a bit more information which keys mismatch and makes for
easier debugging test failures.
in the past there was overlap between the fields used
as resolver fetch options and ADB addrinfo flags. this has
mostly been eliminated; now we can clean up the rest of
it and remove some confusing comments.
Tom Krizek [Mon, 26 Jun 2023 13:19:35 +0000 (15:19 +0200)]
Use timeout for rndc status in shutdown test
Pass 5 second timeout to the rndc status command(s) to avoid hitting the
hard 10 second timeout from subprocess.call, which would result in an
unwanted exception that would only mask the real issue: if the rndc
status times out in this test, it is likely due to the server not
stopping as it should.
Tom Krizek [Mon, 26 Jun 2023 13:18:45 +0000 (15:18 +0200)]
Split shutdown test into separate test cases
The shutdown test attempts to shut down the server using two different
methods - rndc and sigterm. Use pytest.mark.parametrize to run these as
separate test cases for easier identification of failures.
Evan Hunt [Wed, 28 Jun 2023 21:52:53 +0000 (14:52 -0700)]
fix a TSAN bug in "rndc fetchlimit"
fctx counters could be accessed without locking when
"rndc fetchlimit" is called; while this is probably harmless
in production, it triggered TSAN reports in system tests.
Evan Hunt [Wed, 28 Jun 2023 16:28:01 +0000 (09:28 -0700)]
minor refactoring of resume_qmin() for clarity
make the code flow clearer by enumerating the result codes that
are treated as success conditions for an intermediate minimized
query (ISC_R_SUCCESS, DNS_R_DELEGATION, DNS_R_NXRRSET, etc), rather
than just folding them all into the 'default' branch of a switch
statement.
Tom Krizek [Mon, 26 Jun 2023 15:14:16 +0000 (17:14 +0200)]
Fix checking for executables in shell conditions in tests
Surround the variables which are checked whether they're executable in
double quotes. Without them, empty paths won't be properly interpreted
as not executable.
Tom Krizek [Mon, 26 Jun 2023 14:40:03 +0000 (16:40 +0200)]
Disable delv tests under TSAN
Since delv can occasionally hang in system tests when running with TSAN
(see GL#4119), disable these tests as a workaround. Otherwise, the hung
delv process will just waste CI resources and prevent any meaningful
output from the rest of the test suite.
Mark Andrews [Mon, 19 Jun 2023 07:49:05 +0000 (17:49 +1000)]
Test legacy HMAC key files with dig
tsig-keygen is now used to generate key files for TSIG. These have
a different format to those that were generated by dnssec-keygen.
Test that dig can still read these files.
Mark Andrews [Mon, 19 Jun 2023 04:14:39 +0000 (14:14 +1000)]
Test support with legacy HMAC K files with nsupdate
tsig-keygen generates key files that are different to those that
where generated by dnssec-keygen. Check that nsupdate can still
read those old format files.
Mark Andrews [Mon, 19 Jun 2023 04:17:14 +0000 (14:17 +1000)]
Restore the ability to read legacy K*+157+* files
The ability to read legacy HMAC-MD5 K* keyfile pairs using algorithm
number 157 was accidentally lost when the algorithm numbers were
consolidated into a single block, in commit 09f7e0607a34d90eae53f862954e98c31b5ae532.
The assumption was that these algorithm numbers were only known
internally, but they were also used in key files. But since HMAC-MD5
got renumbered from 157 to 160, legacy HMAC-MD5 key files no longer
work.
Move HMAC-MD5 back to 157 and GSSAPI back to 160. Add exception for
GSSAPI to list_hmac_algorithms.
Tony Finch [Tue, 6 Jun 2023 14:20:44 +0000 (15:20 +0100)]
Check for overflow in jemalloc_shim
When compiled using a malloc that lacks an equivalent to sallocx(),
the jemalloc_shim adds a size prefix to each allocation. We must check
that this does not overflow.
Tony Finch [Tue, 6 Jun 2023 14:11:13 +0000 (15:11 +0100)]
Add <isc/overflow.h> for checked mul, add, and sub
The `ISC_OVERFLOW_XXX()` macros are usually wrappers around
`__builtin_xxx_overflow()`, with alternative implementations
for compilers that lack the builtins.
Replace the overflow checks in `isc/time.c` with the new macros.
Ondřej Surý [Wed, 7 Jun 2023 13:23:08 +0000 (15:23 +0200)]
Use per-loop memory contexts for dns_resolver child objects
The dns_resolver creates a lot of smaller objects (fetch context, fetch
counter, query, response, ...) and those are all loop-bound.
Previously, those objects were allocated from the a single resolver
context, which in turn increases contention between threads - remember
"dead by thousand atomic paper cuts". Instead of using a single memory
context, use the per-loop memory contexts that are bound to a specific
loop and thus there's no contention between them when doing the memory
accounting.
Ondřej Surý [Mon, 26 Jun 2023 13:58:48 +0000 (15:58 +0200)]
Remove the explicit call_rcu thread creating and destruction
The free_all_cpu_call_rcu_data() call can consume hundreds of
milliseconds on shutdown. Don't try to be smart and let the RCU library
handle this internally.
Evan Hunt [Fri, 2 Jun 2023 00:14:49 +0000 (17:14 -0700)]
explicitly set dnssec-validation in system tests
the default value of dnssec-validation is 'auto', which causes
a server to send a key refresh query to the root zone when starting
up. this is undesirable behavior in system tests, so this commit
sets dnssec-validation to either 'yes' or 'no' in all tests where
it had not previously been set.
this change had the mostly-harmless side effect of changing the cached
trust level of unvalidated answer data from 'answer' to 'authanswer',
which caused a few test cases in which dumped cache data was examined in
the serve-stale system test to fail. those test cases have now been
updated to expect 'authanswer'.
Tom Krizek [Thu, 22 Jun 2023 16:08:17 +0000 (18:08 +0200)]
Check for proper file size output in dnstap test
Previously, the first check silently failed, as 454 is apparently (in my
local setup) the minimum output size for the dnstap output, rather than
470 which the test was expecting. Effectively, the check served as a 5
second sleep rather than waiting for the proper file size.
Additionally, check the expected file sizes and fail if expectations
aren't met.
Tom Krizek [Thu, 22 Jun 2023 15:57:22 +0000 (17:57 +0200)]
Check for proper log message in kasp test
The log message is supposed to contain the zone name which was
erroneously omitted, but didn't pop up during tests, since return code
was silently ignored.
Now it actually waits for the proper log message rather than being an
equivalent of 3 second sleep (which was also sufficient to make the test
pass, thus we detected no failure).