Many of our test cases only use a single NamedInstance from the
`servers` fixture. Introduce `nsX` helper fixtures to simplify these
tests and reduce boilerplate code further.
Specifically, the test no longer has to either define its own variable
to extract a single server from the list, or use the longer
servers["nsX"] syntax. While this may seem minor, the number of times it
is repeated across the tests justifies the change. It also promotes
using more explicit server identification, i.e. `nsX`, rather than
generic `server`. This also improves the clarity of the tests and may be
helpful in traceback during debugging as well.
Prior to this change, there was a single `rollover` test directory,
containing 8 tests. These contained even more test scenarios, which were
mostly unrelated to each other. This made debugging or even
comprehending the tests difficult, as you'd often have to grasp the
importance (or rather lack of it) of thousands of lines of setup,
configuration and test code, and debug logs.
The tests have now been split up into 14 different test directories,
containing 67 tests in total. This makes it much easier to understand
what's going on in any single one of these test cases, as there is no
unrelated code. The improved granularity also allows better
parallelization and debugging of individual test cases.
Merge branch 'nicki/split-rollover-test-cases' into 'main'
Nicki Křížek [Tue, 10 Jun 2025 14:03:26 +0000 (16:03 +0200)]
Use common test functions for three-is-a-crowd test
Previously, a lot of the checking was re-implemented and duplicated from
check_rollover_step(). Use that function where possible and only
override the needed checks.
Nicki Křížek [Fri, 30 May 2025 15:21:36 +0000 (17:21 +0200)]
Use a single named.conf template in rollover test
Rather than using multiple slightly modified named.conf files, use a
single template which can be rendered differently based on an input
argument -- in this case, csk_roll.
- Use WatchLog.wait_for_sequence() for the configloading test.
- Omit artifacts check, as it seems quite useless for this test case.
- Join all the tests together. The test case is fairly simple here and
this is the easiest way to ensure the log will be in a predictable
state for all tests. Previously, there was no way to ensure
test_configloading_loading() won't be executed after the other tests,
which would render the check moot. It could also be separated into
its own module, but that seems excessive for a simple test case like
this.
- Use jinja2 template for named.conf and remove setup.sh.
- Remove README and put the relevant comment directly next to the test.
- Remove _sh_ from the test filename to uphold the naming convention.
Merge branch 'nicki/refactor-configloading-test' into 'main'
Nicki Křížek [Thu, 26 Jun 2025 15:28:11 +0000 (17:28 +0200)]
Refactor configloading test
- Use WatchLog.wait_for_sequence() for the configloading test.
- Omit artifacts check, as it seems quite useless for this test case.
- Join all the tests together. The test case is fairly simple here and
this is the easiest way to ensure the log will be in a predictable
state for all tests. Previously, there was no way to ensure
test_configloading_loading() won't be executed after the other tests,
which would render the check moot. It could also be separated into
its own module, but that seems excessive for a simple test case like
this.
- Use jinja2 template for named.conf and remove setup.sh.
- Remove README and put the relevant comment directly next to the test.
- Remove _sh_ from the test filename to uphold the naming convention.
- Refactor and extend the `WatchLog.wait_for_line()` API:
1. To allow for usage of one or more FlexPatterns, i.e. either plain
strings to be matched verbatim, or regular expressions. Both can be
used interchangeably, letting the caller write simple and readable
test code while still allowing more complex patterns for special
cases.
2. Always return the regex match, which allows the caller to identify
which line was matched, as well as to extract any additional
information, such as individual regex groups.
- Add `WatchLog.wait_for_sequence()` and `WatchLog.wait_for_all()` helper functions
Merge branch 'nicki/watchlog-improvements' into 'main'
Nicki Křížek [Mon, 23 Jun 2025 15:36:08 +0000 (17:36 +0200)]
Change NamedInstance.rndc() doctest into doc example
The test is troublesome, because NamedInstance(identifier) expects that
a directory with such a name exists. While it'd be possible to mock
those directories as well, it'd make the doctest overly long and
complex, which isn't justified, given that it's only testing a couple of
options. Turn it into regular documentation instead.
Nicki Křížek [Mon, 23 Jun 2025 12:37:09 +0000 (14:37 +0200)]
Split up waiting for match to a separate WatchLog method
To allow re-use in upcoming functions, isolate the line matching logic
into a separate function. Use an instance-wide deadline attribute, which
is set by the calling function.
Nicki Křížek [Mon, 16 Jun 2025 16:39:56 +0000 (18:39 +0200)]
Unify the WatchLog.wait_for_line/s() API
Rather than using two distinct functions for matching either one pattern
(wait_for_line()), or any of multiple patterns (wait_for_lines()), use a
single function that handles both in the same way.
Extend the wait_for_line() API:
1. To allow for usage of one or more FlexPatterns, i.e. either plain
strings to be matched verbatim, or regular expressions. Both can be
used interchangeably, letting the caller write simple and readable
test code while still allowing more complex patterns for special
cases.
2. Always return the regex match, which allows the caller to identify
which line was matched, as well as to extract any additional
information, such as individual regex groups.
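The matching behaviour described above can be illustrated with a minimal sketch. It is not the WatchLog implementation: it works on an in-memory list of lines rather than a log file with a timeout, and the `FlexPattern` alias, `compile_patterns()` and `first_match()` are names assumed here for illustration only.

```python
import re
from typing import Iterable, List, Optional, Union

# FlexPattern mirrors the name used in the commit message; this exact
# definition is an assumption made for the sketch.
FlexPattern = Union[str, re.Pattern]


def compile_patterns(
    patterns: Union[FlexPattern, Iterable[FlexPattern]]
) -> List[re.Pattern]:
    """Normalize one or more FlexPatterns into compiled regular expressions."""
    if isinstance(patterns, (str, re.Pattern)):
        patterns = [patterns]
    compiled = []
    for pattern in patterns:
        if isinstance(pattern, str):
            # Plain strings are matched verbatim, so escape metacharacters.
            pattern = re.compile(re.escape(pattern))
        compiled.append(pattern)
    return compiled


def first_match(lines: Iterable[str], patterns) -> Optional[re.Match]:
    """Return the regex match for the first line matching any pattern."""
    compiled = compile_patterns(patterns)
    for line in lines:
        for regex in compiled:
            match = regex.search(line)
            if match:
                return match
    return None


log = [
    "loading configuration from 'named.conf'",
    "zone example.com/IN: loaded serial 2025061601",
    "all zones loaded",
]
# A plain string and a regular expression can be mixed in one call.
match = first_match(log, ["running", re.compile(r"loaded serial (\d+)")])
```

Because a match object is always returned, the caller can both tell which line matched (`match.string`) and pull out groups such as the serial number (`match.group(1)`).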
Nicki Křížek [Mon, 16 Jun 2025 13:35:43 +0000 (15:35 +0200)]
Abstract WatchLog line buffering to a separate function
Move the line buffering functionality into _readline() to improve the
readability of code. This also allows reading the file contents from
other functions, since the line buffer is now an attribute of the class.
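A rough sketch of this kind of line buffering follows. Only the `_readline()` name comes from the commit message; the `LineReader` class and the `_linebuf` attribute are hypothetical, and the real WatchLog code may differ.

```python
import io


class LineReader:
    """Return only complete lines, keeping a partial trailing line buffered."""

    def __init__(self, stream):
        self._stream = stream
        # Kept as an attribute so other methods could read it as well.
        self._linebuf = ""

    def _readline(self):
        """Return the next complete line, or None if only a fragment exists."""
        self._linebuf += self._stream.readline()
        if not self._linebuf.endswith("\n"):
            return None  # incomplete line; the writer has not finished it yet
        line, self._linebuf = self._linebuf, ""
        return line


# Simulate a log file whose last line is still being written.
stream = io.StringIO("first line\nsecond")
reader = LineReader(stream)
first = reader._readline()
partial = reader._readline()

# The writer now finishes the second line.
position = stream.tell()
stream.write(" half\n")
stream.seek(position)
second = reader._readline()
```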
Colin Vidal [Wed, 11 Jun 2025 13:45:52 +0000 (15:45 +0200)]
fix watchlog.py doctest
Fix some broken doctests in watchlog.py (no semantic error, but the API
changed slightly and broke some output messages). Also add a test for a
missing failure case.
Michał Kępień [Wed, 16 Jul 2025 05:06:09 +0000 (07:06 +0200)]
Fix broken markup in doc/arm/dlz.inc.rst
Commit a6cce753e2b1096c4db64555d2aee096ba8236ae erroneously used
Markdown syntax in doc/arm/dlz.inc.rst. Replace it with proper
reStructuredText so that the relevant section of the ARM is rendered
correctly.
Michał Kępień [Wed, 16 Jul 2025 05:06:09 +0000 (07:06 +0200)]
Update broken reference to dlz_minimal.h
Commit a6cce753e2b1096c4db64555d2aee096ba8236ae missed a spot in
lib/dns/include/dns/clientinfo.h. Replace the outdated file reference
with the URL used in all similar cases.
The DLZ modules have been moved to a separate Git repository in commit
a6cce753e2b1096c4db64555d2aee096ba8236ae. Remove leftover references to
the contrib/dlz/ directory from the main BIND 9 repository.
Michał Kępień [Wed, 16 Jul 2025 05:24:00 +0000 (07:24 +0200)]
fix: pkg: Fix plugin loading
Loading plugins specified using just the shared library name (i.e.
without using an absolute path or a relative path) did not work. This
has been fixed.
See #5379
Merge branch '5379-fix-plugin-loading' into 'main'
Michał Kępień [Wed, 16 Jul 2025 05:22:53 +0000 (07:22 +0200)]
Fix plugin loading
Plugins are built as shared libraries and are therefore installed into
$libdir/bind. Meanwhile, the build system sets the NAMED_PLUGINDIR
preprocessor variable to $datadir/bind instead. This prevents loading
plugins specified in the configuration file using just the shared
library name (i.e. without using an absolute path or a relative path).
Fix by setting NAMED_PLUGINDIR to the path that plugins are actually
installed into.
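The effect of the mismatch can be sketched as follows. This is an illustration only: the directory values are hypothetical examples of $libdir/bind and $datadir/bind, and `resolve_plugin_path()` is an assumed helper that models the lookup rule described above, not named's actual code.

```python
import posixpath

# Hypothetical example values for $libdir/bind and $datadir/bind.
LIBDIR_BIND = "/usr/lib/bind"
DATADIR_BIND = "/usr/share/bind"


def resolve_plugin_path(name, plugindir):
    """Model of the lookup: bare library names are searched in plugindir."""
    if "/" in name:
        # Absolute or relative paths are used as given.
        return name
    return posixpath.join(plugindir, name)


# Before the fix: the bare name resolves to where plugins are NOT installed.
before = resolve_plugin_path("filter-aaaa.so", DATADIR_BIND)
# After the fix: NAMED_PLUGINDIR matches the installation directory.
after = resolve_plugin_path("filter-aaaa.so", LIBDIR_BIND)
# Explicit paths were never affected.
explicit = resolve_plugin_path("./filter-aaaa.so", DATADIR_BIND)
```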
Mark Andrews [Tue, 15 Jul 2025 14:38:53 +0000 (00:38 +1000)]
chg: usr: Add deprecation warnings for RSASHA1, RSASHA1-NSEC3-SHA1 and DS digest type 1
RSASHA1 and RSASHA1-NSEC3-SHA1 DNSKEY algorithms have been deprecated
by the IETF and should no longer be used for DNSSEC. DS digest type
1 (SHA1) has also been deprecated. Validators are now expected
to treat these algorithms and digest as unknown, resulting in
some zones being treated as insecure when they were previously treated
as secure. Warnings have been added to named and tools when these
algorithms and this digest are being used for signing.
Zones signed with RSASHA1 or RSASHA1-NSEC3-SHA1 should be migrated
to a different DNSKEY algorithm.
Zones with DS or CDS records with digest type 1 (SHA1) should be
updated to use a different digest type (e.g. SHA256) and the digest
type 1 records should be removed.
Related to #5358
Merge branch '5358-add-sha1-deprecation-warnings' into 'main'
Mark Andrews [Thu, 5 Jun 2025 04:49:10 +0000 (14:49 +1000)]
Warn about deprecated DNSKEY and DS algorithms / digest types
DNSKEY algorithms RSASHA1 and RSASHA1-NSEC3-SHA1 and DS digest type
SHA1 are deprecated. Log when these are present in primary zone
files and when generating new DNSKEYs, DS and CDS records.
chg: test: Use isctest.asyncserver in the "tsig" test
Replace the custom DNS server used in the "tsig" system test with
new code based on the isctest.asyncserver module.
Changes to isctest.asyncserver were required, as previously it did not
handle TSIG-signed queries at all. Now, with some hacking around
a [dnspython bug](https://github.com/rthalley/dnspython/issues/1205), it does.
Merge branch 'stepan/tsig-asyncserver' into 'main'
Štěpán Balážik [Mon, 23 Jun 2025 14:43:56 +0000 (16:43 +0200)]
Let queries with TSIG parse in isctest.asyncserver.AsyncDnsServer
Previously, upon receiving a query with TSIG, the server would log
an error and timeout. As there is no way to set up the keyring in the
class anyway (and I believe we don't need it), this commit lets such
queries parse but logs the fact that the query has TSIG.
However, there is a bug [1] in dnspython, which causes `make_response`
and `to_wire` to crash on messages constructed by `from_wire` with
`keyring=False`, so the hack with `message.__class__` is needed to work
around this.
This makes just enough changes for the tsig system test to work with
dnspython >= 2.0.0. On older versions the server gives up.
chg: test: Check for FEATURETEST before running pytest
When compiling with meson, it may be easy to forget to compile system
test dependencies before running the tests. In that case, the test
results would be quite inconsistent and unpredictable, with some tests
ending up with ERROR, some with FAILURE and others PASS, without a clear
indication that something is off before running the entire machinery.
Add a check to fail early on if the FEATURETEST binary isn't available,
indicating that system test dependencies were most likely not compiled.
Merge branch 'nicki/system-test-check-featuretest' into 'main'
When compiling with meson, it may be easy to forget to compile system
test dependencies before running the tests. In that case, the test
results would be quite inconsistent and unpredictable, with some tests
ending up with ERROR, some with FAILURE and others PASS, without a clear
indication that something is off before running the entire machinery.
Add a check to fail early on if the FEATURETEST binary isn't available,
indicating that system test dependencies were most likely not compiled.
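A fail-early check of this kind might look like the sketch below. It assumes FEATURETEST is an environment variable holding the path to the binary; `check_featuretest()` and its error message are hypothetical and not taken from the actual test framework.

```python
import os
import sys


def check_featuretest(env=None):
    """Return the FEATURETEST path, or raise if the binary is not usable."""
    env = os.environ if env is None else env
    path = env.get("FEATURETEST", "")
    if not path or not os.path.isfile(path) or not os.access(path, os.X_OK):
        raise RuntimeError(
            "FEATURETEST binary not found; "
            "were the system test dependencies compiled?"
        )
    return path


# Demonstration with explicit environments instead of the real one.
try:
    check_featuretest({})
    error = None
except RuntimeError as exc:
    error = str(exc)

# The Python interpreter stands in for an existing, executable binary.
found = check_featuretest({"FEATURETEST": sys.executable})
```

Running such a check once, before any test is collected, turns a confusing mix of ERROR, FAILURE and PASS results into a single clear message.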
Michał Kępień [Thu, 10 Jul 2025 14:57:32 +0000 (16:57 +0200)]
fix: pkg: Fix cross builds
Cross-compilation did not work even when the ``-Ddoc=disabled`` build
option was passed to Meson due to the build targets used for generating
documentation depending on a non-native executable. This has been fixed.
Michał Kępień [Thu, 10 Jul 2025 14:56:15 +0000 (16:56 +0200)]
Fix cross builds
Commit 5c9b4f3163e05f64b97d04cba2c17ef59d682830 inadvertently broke
cross builds by making Meson process the doc/misc/meson.build file even
when sphinx-build is not found in PATH. The doc/misc/meson.build file
defines targets that require a non-native executable, cfg_test, in order
to be built.
Fix by reverting to only processing the doc/misc/ subdirectory when
sphinx-build is found in PATH and moving the relevant alias_target()
method call so that the build targets depending on a non-native
executable are only defined if sphinx-build is found in PATH.
Is there a time when new_qp(c|z)node() would not be followed by
assignment of the namespace? No, so let's add the assignment to the
function that creates the node.
Matthijs Mekking [Tue, 27 May 2025 08:57:11 +0000 (10:57 +0200)]
Rename dns_qp_lookup2 back to dns_qp_lookup
Now that we have the code working, rename 'dns_qp_lookup2' back to
'dns_qp_lookup' and adjust all remaining 'dns_qp_lookup' occurrences
to take a space=0 parameter.
Matthijs Mekking [Mon, 26 May 2025 09:34:16 +0000 (11:34 +0200)]
Fix the dbiterator to assume only one qp-trie
The dbiterator can take three modes: full, nsec3only and nonsec3.
Previously, in full mode the dbiterator required special logic to jump
from one qp-trie to the other. Now that everything is in one trie,
different special logic is needed.
The qp-trie is now sorted in such a way that all the normal nodes come
first, followed by NSEC nodes, and finally the NSEC3 nodes. NSEC nodes
are empty nodes and need to be skipped when iterating.
We add an additional auxiliary node to the trie, an NSEC origin, so
we can easily find the point in the trie where we need to continue
iterating.
Matthijs Mekking [Tue, 13 May 2025 10:47:02 +0000 (11:47 +0100)]
Update qp unit tests merging denial and zone data
If zone and denial data are going to be stored in the same qp storage,
the unit tests need to be updated to reflect this change. The code
changes mainly affect name to qpkey conversion, lookups, and
predecessors.
A note on predecessors: since the denial and zone data are now in the
same qp storage, the predecessor of the first name in the zone data will
consequently be the last name in the denial data.
In preparation to merge the three qp tries (tree, nsec, nsec3) into
one, add the piece of information into the qpkey. This is the most
significant bit of information, so prepend the denial type to the qpkey.
This means we need to pass on the denial type when constructing the
qpkey from a name, or doing a lookup.
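The effect of prepending the denial type can be sketched with a simplified key. The real qpkey is a compact byte encoding; here a tuple stands in for it, the numeric values are illustrative rather than copied from the DNS_DB_NSEC_* definitions, and `make_key()` is a hypothetical helper.

```python
# Illustrative values only; the real DNS_DB_NSEC_* constants are defined
# in the BIND 9 headers.
NORMAL, NSEC, NSEC3 = 0, 2, 3


def make_key(name, space):
    """Build a sortable key: denial type first, then labels from the root."""
    labels = name.rstrip(".").lower().split(".")
    return (space, tuple(reversed(labels)))


entries = [
    ("0p9mhaveqvm6t7vbl5lop2u3t2rp3tom.example.", NSEC3),
    ("www.example.", NORMAL),
    ("a.example.", NSEC),
    ("example.", NORMAL),
    ("a.example.", NORMAL),
]

# Because the denial type is the most significant part of the key, a single
# ordered structure keeps the three kinds of nodes in separate ranges.
ordered = sorted(entries, key=lambda entry: make_key(*entry))
spaces = [space for _, space in ordered]
```

The same name (`a.example.` above) can therefore exist once per denial type without the keys colliding.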
Reuse the DNS_DB_NSEC_* values. For most qp tries in the code we just
pass 0 (nta, rpz, zt, etc.), because there is no need for denial of
existence, but for qpzone and qpcache we must pass the right value.
Change the code, so that node->nsec no longer can have the value
DNS_DB_NSEC_HAS_NSEC, instead track this in a new attribute 'havensec'.
Since we use node->nsec to convert names to keys, the value MUST be set
before inserting the node into the qp-trie.
Update the fuzzing and unit tests accordingly. This only adds a few
extra test cases, more are needed.
In the qp_test.c we can remove test code for empty keys as this is
no longer possible.
When used with the ``+keepopen`` option with a TCP connection, :iscman:`dig`
could terminate unexpectedly in rare situations. Additionally, :iscman:`dig`
could hang and fail to shut down properly when interrupted during a query.
These have been fixed.
Closes #5381
Merge branch '5381-dig-keepalive-crash' into 'main'
Fix a possible hang in dig if a send is interrupted/canceled
When send_done() is called with an ISC_R_CANCELED status (e.g. because
of a signal from ctrl+c), dig can fail to shut down because
check_if_done() is not called in that branch. Add a check_if_done()
call.
When reusing a TCP connection (because of the '+keepopen' option),
dig detaches from the query after launching it. This can cause a
crash in dig in rare cases when the "receive" callback is called
earlier than the "send" callback.
The '_cancel_lookup()' function detaches a query only if it's
found in the 'lookup->q' list. Before this commit, with one
additional detach happening before recv_done() -> _cancel_lookup()
is called, it didn't cause problems because an earlier _query_detach()
was unlinking the query from 'lookup->q' (because it was the last
reference), so the additional detach and the skipped detach were
undoing each other.
That is unless the "receive" callback was called earlier than the
"send" callback, in which case the additional detach wasn't destroying
the query (and wasn't unlinking it from 'lookup->q') because the "send"
callback's attachment was still there, and so _cancel_lookup() was
trying to "steal" the "send" callback's attachment and causing an
assertion on 'INSIST(query->sendhandle == NULL);'.
Delete the detachment which caused the described situation.
Michał Kępień [Thu, 10 Jul 2025 09:21:04 +0000 (11:21 +0200)]
fix: nil: Do not hardcode release date in man pages
The util/meson-dist-package.sh script hardcodes the date it is run on
into the man pages it creates in the dist tarball. This causes pkgdiff
to report discrepancies if the util/release-tarball-comparison.sh script
is run on a different day than the one the dist tarball was generated
on.
Fix by using the exact same solution as in BIND 9.20: generating the man
page stubs with a @RELEASE_DATE@ placeholder instead of a specific date
and only replacing that placeholder with a specific date during the
build process.
Closes #5412
Merge branch '5412-do-not-hardcode-release-date-in-man-pages' into 'main'
Michał Kępień [Thu, 10 Jul 2025 09:20:46 +0000 (11:20 +0200)]
Do not hardcode release date in man pages
The util/meson-dist-package.sh script hardcodes the date it is run on
into the man pages it creates in the dist tarball. This causes pkgdiff
to report discrepancies if the util/release-tarball-comparison.sh script
is run on a different day than the one the dist tarball was generated
on.
Fix by using the exact same solution as in BIND 9.20: generating the man
page stubs with a @RELEASE_DATE@ placeholder instead of a specific date
and only replacing that placeholder with a specific date during the
build process.
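The placeholder approach amounts to a late substitution, roughly as sketched below. The stub line and the `render_man_page()` helper are hypothetical; only the @RELEASE_DATE@ placeholder comes from the commit message.

```python
def render_man_page(stub, release_date):
    """Replace the placeholder with a concrete date at build time."""
    return stub.replace("@RELEASE_DATE@", release_date)


# Hypothetical man page stub as it would be stored in the dist tarball.
stub = '.TH "NAMED" "8" "@RELEASE_DATE@" "" "BIND 9"\n'

# The tarball contents stay identical regardless of the day it is generated;
# the date only appears once the build substitutes it.
rendered = render_man_page(stub, "2025-07-10")
```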
fix: usr: Log dropped or slipped responses in the query-errors category
Responses which were dropped or slipped because of RRL (Response Rate
Limiting) were logged in the ``rate-limit`` category instead of the
``query-errors`` category, as documented in ARM. This has been fixed.
Closes #5388
Merge branch '5388-rrl-log-category-fix' into 'main'
Aram Sargsyan [Mon, 30 Jun 2025 13:12:09 +0000 (13:12 +0000)]
Log dropped or slipped responses in the query-errors category
As mentioned in the comment block before the changed code block,
the dropped or slipped responses should be logged in the query
category (or rather query-errors category as done in lib/ns/client.c),
so that requests are not silently lost.
Also fix a couple of errors/typos in the code comments.
The ns_client_t struct is reset and zeroed out on every query,
but some fields (query, message, manager) are preserved.
We observe two things:
- The sendbuf field is going to be overwritten anyway, there's
no need to zero it out.
- The fields are copied out when the struct is zero-ed out, and
then copied back in. For the query field (which is 896 bytes)
this is very inefficient.
This commit makes the reset more efficient by avoiding the unnecessary
zeroing and copying.
Merge branch 'alessio/experimental-ns-client-noinit' into 'main'
Alessio Podda [Wed, 14 May 2025 13:32:53 +0000 (15:32 +0200)]
Improve efficiency of ns_client_t reset
The ns_client_t struct is reset and zero-ed out on every query,
but some fields (query, message, manager) are preserved.
We observe two things:
- The sendbuf field is going to be overwritten anyway, there's
no need to zero it out.
- The fields are copied out when the struct is zero-ed out, and
then copied back in. For the query field (which is 896 bytes)
this is very inefficient.
This commit makes the reset more efficient by avoiding the unnecessary
zeroing and copying.
This MR reduces lock contention and increases scalability in the ADB by:
a) Using the SIEVE algorithm instead of classical LRU;
b) Replacing the rwlocked isc_hashmap with an RCU cds_lfht table;
c) Replacing the single LRU table per-object with per-loop LRU tables per-object.
Merge branch 'ondrej/use-urcu-lfht-for-ADB-tables' into 'main'
Ondřej Surý [Wed, 25 Jun 2025 17:02:37 +0000 (19:02 +0200)]
Use cds_lfht for lock-free hashtables in dns_adb
Replace the read-write locked isc_hashmap with a lock-free cds_lfht
hashtable and replace the singular LRU tables for ADB names and entries
with per-thread LRU tables. These changes made it possible to remove all
the read-write locking on the names and entries tables.
Use regular reference counting macro for isc_nm_t structure
Instead of having hand-crafted attach/detach/destroy functions, replace
them with the standard ISC_REFCOUNT macro. This also has the advantage
that a delayed netmgr detach (from dns_dispatch) no longer causes an
assertion failure. This can happen with delayed (call_rcu) shutdown of
dns_adb.
Ondřej Surý [Thu, 12 Jun 2025 09:14:51 +0000 (11:14 +0200)]
Rewrite dns_adb LRU to SIEVE
The dns_adb cleaning is a little bit muddled as it mixes the "TTL"
based cleaning (.expire_v4 and .expire_v6 for adbname, .expires for
adbentry) with overmem cleaning.
Rewrite the LRU-based cleaning to use the SIEVE algorithm and to be
overmem cleaning only, with a requirement to always clean up at least
twice the size of the newly added entry.
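For readers unfamiliar with SIEVE, the following is a minimal, generic sketch of the eviction policy. It is not the dns_adb code: it evicts by entry count rather than by memory size, it does not model the "twice the size of the new entry" requirement, and the `SieveCache` class is purely illustrative.

```python
class _Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.visited = False
        self.prev = None  # neighbour towards the head (newer entries)
        self.next = None  # neighbour towards the tail (older entries)


class SieveCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}
        self.head = self.tail = self.hand = None

    def get(self, key):
        node = self.nodes.get(key)
        if node is None:
            return None
        node.visited = True  # a hit only sets a bit; nothing is moved
        return node.value

    def put(self, key, value):
        node = self.nodes.get(key)
        if node is not None:
            node.value, node.visited = value, True
            return
        if len(self.nodes) >= self.capacity:
            self._evict()
        node = _Node(key, value)
        node.next = self.head  # new entries are always inserted at the head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.nodes[key] = node

    def _evict(self):
        # The hand sweeps from the tail towards the head, clearing visited
        # bits, and evicts the first entry that was not visited.
        node = self.hand or self.tail
        while node.visited:
            node.visited = False
            node = node.prev or self.tail
        self.hand = node.prev
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        del self.nodes[node.key]


cache = SieveCache(3)
for key, value in (("a", 1), ("b", 2), ("c", 3)):
    cache.put(key, value)
cache.get("a")     # marks "a" as visited
cache.put("d", 4)  # the hand skips "a" and evicts "b"
```

Unlike LRU, a cache hit does not reorder the list, which is one reason the policy is attractive when many threads read the same table.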
chg: dev: Replace per-zone lock buckets with global buckets
Qpzone employs a locking strategy where rwlocks are grouped into
buckets, and each zone gets 17 buckets.
This strategy is suboptimal in two ways:
- If named is serving a single zone or a zone is the majority of the
traffic, this strategy pretty much guarantees contention when using
more than a dozen threads.
- If named is serving many small zones, it causes substantial memory
usage.
This commit switches the locking to a global table initialized at start
time. This should have three effects:
- Performance should improve in the single zone case, since now we are
selecting from a bigger pool of locks.
- Memory consumption should go down significantly in the many zone
cases.
- Performance should not degrade substantially in the many zone cases.
The reason for this is that, while we could have substantially more
zones than locks, we can query/edit only O(num threads) at the same
time. So by making the global table much bigger than the expected
number of threads, we can limit contention.
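The idea of a single, global lock table can be sketched as follows. This is only an illustration under stated assumptions: plain `threading.Lock` objects stand in for rwlocks, the thread count, table size and hash function are hypothetical choices, and `lock_for()` is not a function from the BIND 9 code.

```python
import threading
import zlib

NUM_THREADS = 8                # hypothetical number of worker threads
TABLE_SIZE = NUM_THREADS * 64  # hypothetical: much bigger than thread count

# One table shared by all zones, created once at start time.
GLOBAL_LOCKS = [threading.Lock() for _ in range(TABLE_SIZE)]


def lock_for(zone, name):
    """Pick the lock bucket for a node from the global table."""
    index = zlib.crc32(f"{zone}/{name}".encode()) % TABLE_SIZE
    return GLOBAL_LOCKS[index]


# The same node always maps to the same lock ...
same = lock_for("example.", "www") is lock_for("example.", "www")
# ... while nodes of a single busy zone spread over many buckets instead of
# a small fixed number of per-zone buckets.
used = {id(lock_for("example.", f"host{i}")) for i in range(100)}

with lock_for("example.", "www"):
    pass  # the node would be read or modified here
```

Memory use is now fixed by TABLE_SIZE instead of growing with the number of zones.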
Merge branch 'alessio/global-qpzone-lock-table' into 'main'