Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Add reconfiguration support to NamedInstance
Reconfiguring named using RNDC is a common action in BIND 9 system
tests. It involves sending the "reconfig" RNDC command to a named
instance and waiting until it is fully processed. Add a reconfigure()
method to the NamedInstance class in order to simplify and standardize
named reconfiguration using RNDC in Python-based system tests.
TODO:
- full reconfiguration support (w/templating *.in files)
- add an "rndc null" before every reconfiguration to show which file
is used (NamedInstance.add_mark_to_log() as it may be generically
useful?)
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Run mypy checks on Python helpers in GitLab CI
Ensure the type hints provided in helper code for Python-based system
tests are correct by continuously checking them using mypy in GitLab CI.
Check bin/tests/system/isctest.py exclusively for the time being because
it is the only Python file in the source tree which uses static typing
at the moment and working around the issues reported by mypy for other
(non-statically-typed) Python files present in the source tree would be
cumbersome.
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Clean up the "checkds" system test
The "checkds" system test contains a lot of duplicated code despite
carrying out the same set of actions for every tested scenario
(zone_check() → wait for logs to appear → keystate_check()). Extract
the parts of the code shared between all tests into a new function,
test_checkds(), and use pytest's test parametrization capabilities to
pass distinct sets of test parameters to this new function, in an
attempt to cleanly separate the fixed parts of this system test from the
variable ones. Replace format() calls with f-strings.
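The split between fixed and variable parts described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the Scenario fields, the placeholder zone names and log lines, and the run_checkds() signature are all assumptions. The sketch drives the scenarios with a plain loop so that it runs without pytest; in a real test module the same list would typically be passed to pytest.mark.parametrize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Variable part of one test case (hypothetical fields)."""
    zone: str
    expected_log: str
    ds_published: bool


# Placeholder values; the real test uses its own zones and log messages.
SCENARIOS = [
    Scenario("zone-a.example", "expected log line A", True),
    Scenario("zone-b.example", "expected log line B", False),
]


def run_checkds(scenario, zone_check, wait_for_log, keystate_check):
    """Fixed part: the same three steps for every scenario."""
    zone_check(scenario.zone)
    wait_for_log(scenario.expected_log)
    keystate_check(scenario.zone, scenario.ds_published)


def trace_scenario(scenario):
    """Run one scenario with stub steps and record the call order."""
    calls = []
    run_checkds(
        scenario,
        lambda zone: calls.append("zone_check"),
        lambda line: calls.append("wait_for_log"),
        lambda zone, published: calls.append("keystate_check"),
    )
    return calls


# With pytest, this loop would be replaced by decorating the test with
# @pytest.mark.parametrize("scenario", SCENARIOS).
traces = [trace_scenario(s) for s in SCENARIOS]
```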
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Drop use of dns.resolver.Resolver from "checkds"
The "checkds" system test only uses dns.resolver.Resolver objects to
access their 'nameservers' and 'port' attributes. Instances of the
NamedInstance class also expose that information via their attributes,
so only pass NamedInstance objects around instead of needlessly
depending on dns.resolver.Resolver.
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Use helper Python classes for watching log files
Make log file watching in Python-based system tests consistent by
employing the helper Python classes designed for that purpose. Drop the
custom code currently used.
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Add helper Python classes for watching log files
Waiting for a specific log line to appear in a named.run file is a
common action in BIND 9 system tests. Implement a set of Python classes
which intend to simplify and standardize this task in Python-based
system tests.
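As an illustration of the task these classes address, a log watcher can be sketched roughly as below. The class name LogLineWatcher, its parameters, and the polling approach are assumptions made for this sketch; the helper classes added by the commit may be structured differently.

```python
import tempfile
import time
from pathlib import Path


class LogLineWatcher:
    """Poll a log file until a line containing a given string appears."""

    def __init__(self, path, timeout=10.0, interval=0.1):
        self.path = Path(path)
        self.timeout = timeout
        self.interval = interval

    def wait_for_line(self, needle):
        deadline = time.monotonic() + self.timeout
        while True:
            # The file may not exist yet if the server is still starting.
            if self.path.exists():
                with self.path.open(encoding="utf-8", errors="replace") as log:
                    for line in log:
                        if needle in line:
                            return line.rstrip("\n")
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{needle!r} not found in {self.path}")
            time.sleep(self.interval)


# Usage with a throwaway file standing in for named.run.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "named.run"
    log_path.write_text("starting up\nall zones loaded\nrunning\n", encoding="utf-8")
    found = LogLineWatcher(log_path, timeout=1.0).wait_for_line("all zones loaded")
```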
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Simplify use of RNDC in Python-based tests
The "addzone" and "shutdown" system tests currently invoke rndc using
test-specific helper code. Rework the relevant bits of those tests so
that they use the helper classes from bin/tests/system/isctest.py.
Michał Kępień [Tue, 25 Jul 2023 12:37:05 +0000 (14:37 +0200)]
Implement Python helpers for using RNDC in tests
Controlling named instances using RNDC is a common action in BIND 9
system tests. However, there is currently no standardized way of doing
that from Python-based system tests, which leads to code duplication.
Add a set of Python classes and pytest fixtures which intend to simplify
and standardize use of RNDC in Python-based system tests.
For now, RNDC commands are sent to servers by invoking the rndc binary.
However, a switch to a native Python module able to send RNDC commands
without executing external binaries is expected to happen soon. Even
when that happens, though, having the capability to invoke the rndc
binary (in order to test it) will remain useful. Define a common Python
interface that such "RNDC executors" should implement (RNDCExecutor), in
order to make switching between them convenient.
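A rough sketch of what such a common interface might look like is given below. Only the RNDCExecutor name comes from the commit message; the call() signature, the RNDCBinaryExecutor and RecordingExecutor classes, and the default paths are assumptions for illustration. The rndc options used (-c, -s, -p) are the standard ones for selecting the configuration file, server, and port.

```python
import abc
import subprocess
from typing import List


class RNDCExecutor(abc.ABC):
    """Common interface: send an RNDC command, return the response text."""

    @abc.abstractmethod
    def call(self, ip: str, port: int, command: List[str]) -> str:
        ...


class RNDCBinaryExecutor(RNDCExecutor):
    """Executor that invokes the rndc binary (paths are assumptions)."""

    def __init__(self, rndc_path: str = "rndc", conf_path: str = "rndc.conf"):
        self.rndc_path = rndc_path
        self.conf_path = conf_path

    def build_argv(self, ip: str, port: int, command: List[str]) -> List[str]:
        return [self.rndc_path, "-c", self.conf_path, "-s", ip, "-p", str(port)] + command

    def call(self, ip: str, port: int, command: List[str]) -> str:
        return subprocess.check_output(
            self.build_argv(ip, port, command),
            stderr=subprocess.STDOUT,
            timeout=10,
            encoding="utf-8",
        )


class RecordingExecutor(RNDCExecutor):
    """Stand-in executor that records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def call(self, ip: str, port: int, command: List[str]) -> str:
        self.sent.append((ip, port, command))
        return ""


argv = RNDCBinaryExecutor().build_argv("10.53.0.1", 9953, ["reconfig"])
recorder = RecordingExecutor()
recorder.call("10.53.0.1", 9953, ["reconfig"])
```

Because both executors share one interface, test code written against RNDCExecutor does not need to change when a different executor is swapped in.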
Evan Hunt [Wed, 20 Dec 2023 08:32:57 +0000 (00:32 -0800)]
prevent an infinite loop in fix_iterator()
it was possible for fix_iterator() to get stuck in a loop while
trying to find the predecessor of a missing node. this has been
fixed and a regression test has been added.
Evan Hunt [Wed, 20 Dec 2023 20:38:12 +0000 (12:38 -0800)]
fix_iterator() could produce incoherent iterator stacks
the fix_iterator() function moves an iterator so that it points
to the predecessor of the searched-for name when that name doesn't
exist in the database. the tests only checked the correctness of
the top of the stack, however, and missed some cases where interior
branches in the stack could be missing or duplicated. in these
cases, the iterator would produce inconsistent results when walked.
the predecessors test case in qp_test has been updated to walk
each iterator to the end and ensure that the expected number of
nodes are found.
Matthijs Mekking [Tue, 19 Dec 2023 12:23:44 +0000 (13:23 +0100)]
Regression check for NSEC3 to NSEC3 conversion
When changing the NSEC3 chain, the new NSEC3 chain must be built before
the old NSEC3PARAM is removed. Check each delta in the conversion to
ensure this ordering is met.
Mark Andrews [Mon, 18 Dec 2023 00:23:21 +0000 (11:23 +1100)]
Regression check for NSEC3 to NSEC conversion
When transitioning from NSEC3 to NSEC the NSEC3 must be built before
the NSEC3PARAM is removed. Check each delta in the conversion to
ensure this ordering is met.
Mark Andrews [Wed, 20 Dec 2023 02:07:51 +0000 (13:07 +1100)]
Update the NSEC3PARAM TTL to match the SOA minimum
When building NSEC3 chains update the NSEC3PARAM TTL to match
the SOA minimum. Delete all records using the old TTL then
re-add them using the new TTL.
Evan Hunt [Thu, 16 Nov 2023 02:42:43 +0000 (18:42 -0800)]
disable checks by default in named-compilezone
Zone content integrity checks can significantly slow the conversion
of zones from raw to text. As this is more properly a job for
named-checkzone anyway, we now disable all zone checks by
default in named-compilezone.
Users relying on named-compilezone for integrity checks as
well as format conversion can run named-checkzone separately,
or re-enable the checks in named-compilezone by using:
"named-compilezone -n fail -k fail -r warn -T warn -W warn".
Mark Andrews [Wed, 6 Dec 2023 00:34:52 +0000 (11:34 +1100)]
dns_request_cancel needs to be callable from any thread
Check the tid and cancel the request immediately or pass it to the
appropriate loop for processing. Call request->cb directly from
req_sendevent as it is now always called with the correct tid.
Michał Kępień [Wed, 20 Dec 2023 16:21:14 +0000 (17:21 +0100)]
Do not destroy IXFR journal in xfrin_end()
The xfrin_end() function is run when a zone transfer is finished or
canceled. One of the actions it takes for incremental transfers (IXFR)
is calling dns_journal_destroy() on the zone journal structure that is
stored in the relevant zone transfer context (xfr->ixfr.journal). That
immediately invalidates that structure as it is not reference-counted.
However, since the changes present in the IXFR stream are applied to the
journal asynchronously (via isc_work_enqueue()), it is possible that
some zone changes may still be in the process of being written to the
journal by the time xfrin_end() destroys the relevant structure. Such a
scenario leads to crashes.
Fix by not destroying the zone journal structure until the entire zone
transfer context is destroyed. xfrin_destroy() already conditionally
calls dns_journal_destroy() and when the former is called, all
asynchronous work for a given zone transfer process is guaranteed to be
complete.
Matthijs Mekking [Wed, 13 Dec 2023 08:38:17 +0000 (09:38 +0100)]
Remove kasp mutex lock
Multiple zones should be able to read the same key and signing policy
at the same time. Since writing the kasp lock only happens during
reconfiguration, and the complete kasp list is being replaced, there
is actually no need for a lock. Reference counting ensures that a kasp
structure is not destroyed when still being attached to one or more
zones.
This significantly improves the load configuration time.
Mark Andrews [Mon, 18 Dec 2023 00:23:21 +0000 (11:23 +1100)]
Regression check for missing RRSIGs
When transitioning from NSEC3 to NSEC the added records were not
being signed because the wrong time was being used to determine if
a key should be used or not. Check that these records are actually
signed.
Mark Andrews [Thu, 14 Dec 2023 22:42:10 +0000 (09:42 +1100)]
Use 'now' rather than 'inception' in 'add_sigs'
When kasp support was added 'inception' was used as a proxy for
'now' and resulted in signatures not being generated or the wrong
signatures being generated. 'inception' is the time to be set
in the signatures being generated and is usually in the past to
allow for clock skew. 'now' determines what keys are to be used
for signing.
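The difference between the two timestamps can be illustrated with a simplified model. The Key fields, the one-hour skew, and the signing_keys() helper are assumptions for this sketch and do not mirror the actual add_sigs() code.

```python
from dataclasses import dataclass
from typing import List

CLOCK_SKEW = 3600  # assumed: signatures are back-dated by one hour


@dataclass
class Key:
    tag: int
    active_from: int    # time the key becomes usable for signing
    inactive_from: int  # time the key stops being usable


def signing_keys(keys: List[Key], when: int) -> List[Key]:
    """Keys whose active period covers the given time."""
    return [k for k in keys if k.active_from <= when < k.inactive_from]


now = 1_700_000_000
inception = now - CLOCK_SKEW  # value written into the generated signatures

keys = [
    Key(tag=1, active_from=now - 86400, inactive_from=now + 86400),
    Key(tag=2, active_from=now - 600, inactive_from=now + 86400),  # active 10 min ago
]

# Correct: choose keys based on the current time.
tags_by_now = [k.tag for k in signing_keys(keys, now)]
# Incorrect: using the back-dated inception misses the recently activated key.
tags_by_inception = [k.tag for k in signing_keys(keys, inception)]
```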
Michał Kępień [Mon, 18 Dec 2023 14:11:39 +0000 (15:11 +0100)]
"trust-anchor-telemetry" is no longer experimental
Remove the CFG_CLAUSEFLAG_EXPERIMENTAL flag from the
"trust-anchor-telemetry" statement as the behavior of the latter has not
been changed since its initial implementation and there are currently no
plans to do so. This silences a relevant log message that was emitted
even when the feature was explicitly disabled.
Michał Kępień [Mon, 18 Dec 2023 10:33:43 +0000 (11:33 +0100)]
Fix reference counting in do_nsfetch()
Each function queuing a do_nsfetch() call using isc_async_run() is
expected to increase the given zone's internal reference count
(zone->irefs), which is then correspondingly decreased in either
do_nsfetch() itself (when the dns_resolver_createfetch() fails) or in
nsfetch_done() (when recursion is finished).
However, do_nsfetch() can also return early if either the zone itself or
the relevant view's resolver object is being shut down. In that case,
do_nsfetch() simply returns without decreasing the internal reference
count for the zone. This leaves a dangling zone reference around, which
leads to hangs during named shutdown.
Fix by executing the same cleanup code for early returns from
do_nsfetch() as for a failed dns_resolver_createfetch() call in that
function as the reference count will not be decreased in nsfetch_done()
in any of these cases.
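The reference-counting rule above can be modelled in a few lines. This is a simplified sketch, not the C code: the Zone class, the queue_nsfetch() helper, and the boolean return values are assumptions made for illustration.

```python
class Zone:
    def __init__(self):
        self.irefs = 0       # internal reference count
        self.exiting = False


def do_nsfetch(zone, create_fetch):
    """The caller has already incremented zone.irefs on our behalf."""

    def release():
        zone.irefs -= 1

    if zone.exiting:
        # Early return: the completion callback will never run, so the
        # reference has to be released here (this was the missing step).
        release()
        return False
    if not create_fetch():
        # Same cleanup as above for a failed fetch creation.
        release()
        return False
    # Success: the reference is released later, when the fetch completes.
    return True


def queue_nsfetch(zone, create_fetch):
    zone.irefs += 1
    return do_nsfetch(zone, create_fetch)


shutting_down = Zone()
shutting_down.exiting = True
queue_nsfetch(shutting_down, lambda: True)

failing = Zone()
queue_nsfetch(failing, lambda: False)

running = Zone()
queue_nsfetch(running, lambda: True)
```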
Michał Kępień [Mon, 18 Dec 2023 10:07:04 +0000 (11:07 +0100)]
Prevent an infinite loop in shutdown_listener()
The loop in shutdown_listener() assumes that the reference count for
every controlconnection_t object on the listener->connections linked
list will drop down to zero after the conn_shutdown() call in the loop's
body. However, when the timing is just right, some netmgr callbacks for
a given control connection may still be awaiting processing by the same
event loop that executes shutdown_listener() when the latter is run.
Since these netmgr callbacks must be run in order for the reference
count for the relevant controlconnection_t objects to drop to zero, when
the scenario described above happens, shutdown_listener() runs into an
infinite loop due to one of the controlconnection_t objects on the
listener->connections linked list never going away from the head of that
list.
Fix by safely iterating through the listener->connections list and
initiating shutdown for all controlconnection_t objects found. This
allows any pending netmgr callbacks to be run by the same event loop in
due course, i.e. after shutdown_listener() returns.
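The difference between the two loop shapes can be shown with a small model. The Listener class below, its pending queue, and run_pending() are assumptions standing in for the event loop; only the function names conn_shutdown() and shutdown_listener() are taken from the commit message.

```python
from collections import deque


class Listener:
    def __init__(self, connections):
        self.connections = list(connections)
        self.pending = deque()  # callbacks the event loop has not run yet

    def conn_shutdown(self, conn):
        # The connection only leaves the list once its queued callback runs.
        self.pending.append(lambda: self.connections.remove(conn))

    def shutdown_listener(self):
        # A loop such as "while self.connections: shut down the head" would
        # spin forever here, because the head is not removed until the
        # pending callbacks run, and they cannot run before we return.
        # Instead, visit a snapshot of the list exactly once.
        for conn in list(self.connections):
            self.conn_shutdown(conn)

    def run_pending(self):
        while self.pending:
            self.pending.popleft()()


listener = Listener(["conn-a", "conn-b"])
listener.shutdown_listener()
left_before_callbacks = len(listener.connections)
listener.run_pending()
left_after_callbacks = len(listener.connections)
```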
Aram Sargsyan [Tue, 12 Dec 2023 14:54:40 +0000 (14:54 +0000)]
Fix a statschannel system test zone loadtime issue
The check_loaded() function compares the zone's loadtime value and
an expected loadtime value, which is based on the zone file's mtime
extracted from the filesystem.
For secondary zones, there may be cases when the zone file is not
ready yet, because the zone transfer has not completed and the zone
file has not been dumped to disk, so a zero mtime value is retrieved.
In such cases wait one second and retry until timeout. Also modify
the affected check to allow a possible difference of the same amount
of seconds as the chosen timeout value.
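The retry logic and the relaxed comparison might look roughly like the sketch below. The function names, the default timeout, and the use of os.stat() are assumptions; the actual statschannel test helpers may differ.

```python
import os
import time


def wait_for_mtime(path, timeout=10.0, interval=1.0):
    """Return the file's mtime in whole seconds, retrying while it is zero.

    A missing file is treated as mtime 0. If the file has still not
    appeared when the timeout expires, 0 is returned.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            mtime = int(os.stat(path).st_mtime)
        except FileNotFoundError:
            mtime = 0
        if mtime != 0 or time.monotonic() >= deadline:
            return mtime
        time.sleep(interval)


def loadtime_matches(loadtime, mtime, tolerance):
    """Accept a difference of up to `tolerance` seconds (the timeout value)."""
    return abs(loadtime - mtime) <= tolerance


missing = wait_for_mtime("/nonexistent/zone.db", timeout=0.0, interval=0.0)
```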
Aram Sargsyan [Fri, 15 Dec 2023 09:43:36 +0000 (09:43 +0000)]
Use atomic store operations instead of atomic initialize
The atomic_init() function makes sense to use with a structure's
members when creating a new instance of the structure. In other
places, use atomic store operations instead, in order to avoid
data races.
Move the code to find the predecessor into one function, as the two
cases share many similarities: in both cases we first need to find the
immediate predecessor/successor, then we need to find the immediate
predecessor if the iterator is not already pointing at it.
This one is similar to the bug when searching for a key, reaching a
dead-end branch that doesn't match, because the branch offset point
is after the point where the search key differs.
This fixes the case where we are multiple levels deep. In other
words, there was more than one match *after* the point where the
search key differs.
If searching for a key "monky", we would reach the branch with
twigs "moo[k]" and "moo[n]". The key matches on the 'k' on offset=4,
and reaches the branch with twigs "mook[e]" and "mook[o]". This time
we cannot find a twig that matches our key at offset=5, there is no
twig for 'y'. The closest name we found was "mooker".
Note that the lookup cannot detect that it is on a dead-end branch,
because the key is not encapsulated in a branch node.
In the previous code we considered "mooker" to be the successor of
"monky" and so we needed to find the predecessor of "mooker" to find the
predecessor for "monky". However, since the search key already differed
before entering this branch, this is not enough. We would be left with
"moog" as the predecessor of "monky", while in this example "a.b.c.d.e"
is the actual predecessor.
Instead, we need to go up a level, find the predecessor and check
again if we are on the right branch, and repeat the process until we
are.
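The expected result for this example can be cross-checked against a simple reference model: a sorted list searched with bisect. This sketch assumes plain string ordering, matching the simplified names used in the text; it does not reproduce the qp-trie key encoding or DNS name ordering, and the predecessor() helper is hypothetical.

```python
import bisect


def predecessor(sorted_names, key):
    """Return the largest name strictly smaller than key, or None."""
    index = bisect.bisect_left(sorted_names, key)
    return sorted_names[index - 1] if index > 0 else None


# Names taken from the example above.
names = sorted(["a.b.c.d.e", "moog", "mooker"])

# "monky" differs from every "moo..." name at the third character and
# sorts before all of them, so none of them can be its predecessor.
result = predecessor(names, "monky")
```

A model like this is only useful as a test oracle: it gives the answer an iterator should end up at, independently of how the trie is walked.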
There was yet another edge case in which an iterator could be
positioned at the wrong node after dns_qp_lookup(). When searching for
a key, it's possible to reach a leaf that matches at the given offset,
but because the offset point is *after* the point where the search key
differs from the leaf's contents, we are now at the wrong leaf.
In other words, the fix made in the previous commit for dead-end branches
must also be applied to matched leaves.
For example, if searching for the key "monpop", we could reach a branch
containing "moop" and "moor". The branch offset point - i.e., the point
after which the branch's leaves differ from each other - is the
fourth character ("p" or "r"). The search key matches the fourth
character "p", and takes that twig to the next node (which can be
a branch for names starting with "moop", or could be a leaf node for
"moop").
The old code failed to detect this condition, and would have
incorrectly left the iterator pointing at some successor, and not
at the predecessor of "moop".
To find the right predecessor in this case, we need to go back to the
previous branch and find the predecessor from there.
This has been fixed and the unit test now includes several new
scenarios for testing search names that either match or do not match at
the offset but have a different character before the offset.