Colin Vidal [Mon, 21 Jul 2025 12:38:30 +0000 (14:38 +0200)]
Export plugin extension in config.h
Dynamically loadable libraries all use the `.so` extension on
BIND9-supported platforms, except for macOS. Export the dynamic library
extension of the current build platform in the generated `config.h`
file, in order to let the plugin code building plugin path based on a
simple plugin name. (which then would be platform-independent)
Fix one-definition-rule violation in the loop unit test
Locally, clang reported following odr-violation:
=================================================================
==1132009==ERROR: AddressSanitizer: odr-violation (0x555555589280):
[1] size=8 'isc__loopmgr' ../lib/isc/loop.c:52:16 in /home/ondrej/Projects/bind9/build/tests/isc/loop
[2] size=8 'isc__loopmgr' ../lib/isc/loop.c:52:16 in /home/ondrej/Projects/bind9/build/tests/isc/../../libisc.so
These globals were registered at these points:
[1]:
#0 0x7ffff785306f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
#1 0x55555556abce in _sub_I_00099_1 (/home/ondrej/Projects/bind9/build/tests/isc/loop+0x16bce) (BuildId: e7c586e966e6986532a3da40df41223ae16e55c9)
#2 0x7ffff702a303 in call_init ../csu/libc-start.c:145
#3 0x7ffff702a303 in __libc_start_main_impl ../csu/libc-start.c:347
#4 0x5555555622e4 in _start (/home/ondrej/Projects/bind9/build/tests/isc/loop+0xe2e4) (BuildId: e7c586e966e6986532a3da40df41223ae16e55c9)
[2]:
#0 0x7ffff785306f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
#1 0x7ffff75335b9 in _sub_I_00099_1 (/home/ondrej/Projects/bind9/build/tests/isc/../../libisc.so+0x1335b9) (BuildId: 33ab72bc676e9ef9111b3db1fc4347595069cd29)
#2 0x7ffff7fca71e in call_init elf/dl-init.c:74
#3 0x7ffff7fca823 in call_init elf/dl-init.c:120
#4 0x7ffff7fca823 in _dl_init elf/dl-init.c:121
#5 0x7ffff7fe459f (/lib64/ld-linux-x86-64.so.2+0x1f59f) (BuildId: 281ac1521b4102509b1c7ac7004db7c1efb81796)
==1132009==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'isc__loopmgr' at ../lib/isc/loop.c:52:16 in /home/ondrej/Projects/bind9/build/tests/isc/loop
==1132009==ABORTING
Aborted (core dumped)
Rename isc__loopmgr when including the loop.c into loop_test.c to
prevent odr-violation over isc__loopmgr.
Michał Kępień [Wed, 23 Jul 2025 10:16:25 +0000 (12:16 +0200)]
Account for idle timeouts in the "dispatch" test
When the tests-connreset.py module was initially implemented in commit 5c17919019ef0af8226e5bb61214b805bb3e2451, the dispatch code did not
properly apply the idle timeout to TCP connections. This allowed the
check in that test module to reset the TCP connection after 5 seconds as
named did not attempt to tear the connection down earlier than that.
However, as the dispatch code was improved, the idle timeout started
being enforced for TCP dispatches; the exact value it is set to in the
current code depends on a given server's SRTT, but it defaults to about
1.2 seconds for responsive servers. This means that the code paths
triggered by the "dispatch" system test are now different than the ones
it was originally supposed to trigger because it is now named itself
that shuts the TCP connection down cleanly before the ans3 server gets a
chance to reset it.
Account for the above by lowering the amount of time after which the
ans3 server in the "dispatch" system test resets TCP connections to just
1 second, so that the test actually does what its name implies.
Michał Kępień [Wed, 23 Jul 2025 10:16:25 +0000 (12:16 +0200)]
Enable resetting TCP connections
Add a TCP connection handler, ConnectionReset, which enables closing TCP
connections without emptying the client socket buffer, causing the
kernel to send an RST segment to the client. This relies on a horrible
asyncio hack that can break at any point in the future due to abusing
implementation details in the Python Standard Library. Despite the eye
bleeding this code may cause, the approach it takes was still deemed
preferable to implementing an asyncio transport from scratch just to
enable triggering connection resets.
Michał Kępień [Tue, 15 Jul 2025 10:16:32 +0000 (12:16 +0200)]
Enable requesting TCP connections to be closed
In response to client queries, AsyncDnsServer users can currently only
make the server either send a reply or silently ignore the query. In
the case of TCP queries, neither of these actions causes the client's
connection to be closed - the onus of doing that is on the client.
However, in some cases the server may be required to close the
connection on its own, so AsyncDnsServer users need to have some way of
requesting such an action.
Add a new ResponseAction subclass, ResponseDropAndCloseConnection, which
enables AsyncDnsServer users to conveniently request TCP connections to
be closed. Instead of returning the response to send,
ResponseDropAndCloseConnection raises a custom exception that
AsyncDnsServer._handle_tcp() handles accordingly.
=================================================================
==588371==ERROR: AddressSanitizer: odr-violation (0x55555556a060):
[1] size=256 'client_addrs' ../tests/ns/netmgr_wrap.c:36:18 in /home/ondrej/Projects/bind9/build/tests/ns/query
[2] size=256 'client_addrs' ../tests/ns/netmgr_wrap.c:36:18 in /home/ondrej/Projects/bind9/build/tests/ns/../libbindtest.so
These globals were registered at these points:
[1]:
#0 0x7ffff785306f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
#1 0x7ffff6a2a303 in call_init ../csu/libc-start.c:145
#2 0x7ffff6a2a303 in __libc_start_main_impl ../csu/libc-start.c:347
#3 0x55555555a084 in _start (/home/ondrej/Projects/bind9/build/tests/ns/query+0x6084) (BuildId: fbe4a3fcf1a249c7d7da69ee8b255a1dbb610c7a)
[2]:
#0 0x7ffff785306f in __asan_register_globals ../../../../src/libsanitizer/asan/asan_globals.cpp:350
#1 0x7ffff7fca71e in call_init elf/dl-init.c:74
#2 0x7ffff7fca823 in call_init elf/dl-init.c:120
#3 0x7ffff7fca823 in _dl_init elf/dl-init.c:121
#4 0x7ffff7fe459f (/lib64/ld-linux-x86-64.so.2+0x1f59f) (BuildId: 281ac1521b4102509b1c7ac7004db7c1efb81796)
==588371==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'client_addrs' at ../tests/ns/netmgr_wrap.c:36:18 in /home/ondrej/Projects/bind9/build/tests/ns/query
==588371==ABORTING
Move the client_addrs and client_refs to libtest to prevent this.
There is only a single network manager running on top of the loop
manager (except for tests). Refactor the network manager to be a
singleton (a single instance) and change the unit tests, so that the
shorter read timeouts apply only to a specific handle, not the whole
extra 'connect_nm' network manager instance.
All the applications built on top of the loop manager were required to
create a single instance of the loop manager. Refactor the loop
manager not to expose this instance to the callers, and keep the loop
manager object internal to the `isc_loop` compilation unit.
This significantly simplifies a number of data structures and calls to
the `isc_loop` API.
Merge branch 'ondrej/refactor-isc_loopmgr-to-be-singleton' into 'main'
All the applications built on top of the loop manager were required to
create just a single instance of the loop manager. Refactor the loop
manager to not expose this instance to the callers and keep the loop
manager object internal to the isc_loop compilation unit.
This significantly simplifies a number of data structures and calls to
the isc_loop API.
chg: usr: Reword the 'shut down hung fetch while resolving' message
The log message 'shut down hung fetch while resolving' may be confusing
because no detection of hung fetches actually takes place, but rather
the timer on the fetch context expires and the resolver gives up.
Change the log message to actually say that instead of the original
cryptic message about hung fetch.
Closes #3148
Merge branch '3148-rename-shut-down-hung-fetch' into 'main'
Reword the 'shut down hung fetch while resolving' message
The log message 'shut down hung fetch while resolving' may be confusing
because no detection of hung fetches actually takes place, but rather
the timer on the fetch context expires and the resolver gives up.
Change the log message to actually say that instead of the original
cryptic message about hung fetch.
chg: test: Use isctest.asyncserver in the "zero" test
The original `ans.pl` server was a copy of the one in `fetchlimit`, so
there are some changes:
- The server now only responds with A replies (which is the only thing
needed).
- The incrementing of the IP address goes beyond the least significant
octet (so, after 192.0.2.255 it will yield 192.0.3.0).
Merge branch 'stepan/zero-asyncserver' into 'main'
Štěpán Balážik [Wed, 18 Jun 2025 10:13:37 +0000 (12:13 +0200)]
Use isctest.asyncserver in the "zero" test
The original `ans.pl` server was based on a copy of the one in
`fetchlimit`, so there are some changes:
- The server now only responds with A replies (which is the only thing
needed).
- The incrementing of the IP address goes beyond the least significant
octet (so, after 192.0.2.255 it will yield 192.0.3.0).
fix: usr: Stale RRsets in a CNAME chain were not always refreshed
With serve-stale enabled, a CNAME chain that contains a stale RRset, the refresh query doesn't always properly refresh the stale RRsets. This has been fixed.
Closes #5243
Merge branch '5243-stale-refresh-as-prefetch' into 'main'
A serve-stale refresh is similar to a prefetch, the only difference
is when it triggers. Where a prefetch is done when an RRset is about
to expire, a serve-stale refresh is done when the RRset is already
stale.
This means that the check for the stale-refresh window needs to
move into query_stale_refresh(). We need to clear the
DNS_DBFIND_STALEENABLED option at the same places as where we clear
DNS_DBFIND_STALETIMEOUT.
Now that serve-stale refresh acts the same as prefetch, there is no
worry that the same rdataset is added to the message twice. This makes
some code obsolete, specifically where we need to clear rdatasets from
the message.
Rename 'free' variable to 'nfree' to not clash with free()
The beauty and horrors of the C - the compiler properly detects variable
shadowing, but you can freely shadow a standard function 'free()' with
variable called 'free'. And if you reference 'free()' just as 'free'
you get the function pointer which means you can do also pointer
arithmetics, so 'free > 0' is always valid even when you delete the
local variable.
Replace the local variables 'free' with a name that doesn't shadow the
'free()' function to prevent future hard to detect bugs.
Mark Andrews [Tue, 15 Jul 2025 05:17:46 +0000 (15:17 +1000)]
test synth-from-dnssec with no cached parent NSECs
Add \007.no-apex-covering as an owner name so that the cache does
not get primed with a parent NSEC RRset to test the case where
dns_qp_lookup returns ISC_R_NOTFOUND.
Mark Andrews [Tue, 15 Jul 2025 05:14:23 +0000 (15:14 +1000)]
Fix find_coveringnsec in qpcache.c
dns_qp_lookup was returning ISC_R_NOTFOUND rather than DNS_R_PARTIALMATCH
when there wasn't a parent with a NSEC record in the cache. This was
causing find_coveringnsec to fail rather than returing the covering NSEC.
fix: test: Add wait_for_keymgr_done() util function to tests
The kasp test cases assume that keymgr operations on the zone under test
have been completed before the test is executed. These are typically
quite fast, but the logs need to be explicitly checked for the messages,
otherwise there's a possibility of race conditions causing the
kasp/rollover tests to become unstable.
Call the wait function in all the kasp/rollover tests where it is
expected (which is generally in each test, unless we're dealing with
unsigned zones).
Closes #5371
Merge branch '5371-wait-keymgr-done-rollover-kasp-tests' into 'main'
The kasp test cases assume that keymgr operations on the zone under test
have been completed before the test is executed. These are typically
quite fast, but the logs need to be explicitly checked for the messages,
otherwise there's a possibility of race conditions causing the
kasp/rollover tests to become unstable.
Call the wait function in all the kasp/rollover tests where it is
expected (which is generally in each test, unless we're dealing with
unsigned zones).
Many of our test cases only use a single NamedInstance from the
`servers` fixture. Introduce `nsX` helper fixtures to simplify these
tests and reduce boilterplate code further.
Specifically, the test no longer has to either define its own variable
to extract a single server from the list, or use the longer
servers["nsX"] syntax. While this may seem minor, the amount of times it
is repeated across the tests justifies the change. It also promotes
using more explicit server identification, i.e. `nsX`, rather than
generic `server`. This also improves the clarity of the tests and may be
helpful in traceback during debugging as well.
Prior to this change, there was a single `rollover` test directory, containing 8 tests. These contained even more test scenarios, that were mostly unrelated to each other. This made debugging or even comprehending the tests difficult, as you'd often have to grasp the importance (or rather lack of it) of thousands of lines of setup, configuration and test code, and debug logs.
Now the tests were split up into 14 different test directories, containing 67 tests in total. This makes it much more comprehensible to understand what's going on in any single of these test cases, as there is no unrelated code. It also allows better parallelization and debugging of individual test cases, because of the improved granularity.
Merge branch 'nicki/split-rollover-test-cases' into 'main'
Nicki Křížek [Tue, 10 Jun 2025 14:03:26 +0000 (16:03 +0200)]
Use common test functions for three-is-a-crowd test
Previously, a lot of the checking was re-implemented and duplicated from
check_rollover_step(). Use that function where possible and only
override the needed checks.
Nicki Křížek [Fri, 30 May 2025 15:21:36 +0000 (17:21 +0200)]
Use a single named.conf template in rollover test
Rather than using multiple slightly modified named.conf files, use a
single template which can be rendered differently based on an input
argument -- in this case, csk_roll.
- Use WatchLog.wait_for_sequence() for the configloading test.
- Omit artifacts check, as it seems quite useless for this test case.
- Join all the tests together. The test case is fairly simple here and
this is the easiest way to ensure the log will be in a predictable
state for all tests. Previously, there was no way to ensure
test_configloading_loading() won't be executed after the other tests,
which would render the check moot. It could also be separated into
its own module, but that seems excessive for a simple test case like
this.
- Use jinja2 template for named.conf and remove setup.sh.
- Remove README and put the relevent comment directly next to the test.
- Remove _sh_ from the test filename to uphold the naming convention.
Merge branch 'nicki/refactor-configloading-test' into 'main'
Nicki Křížek [Thu, 26 Jun 2025 15:28:11 +0000 (17:28 +0200)]
Refactor configloading test
- Use WatchLog.wait_for_sequence() for the configloading test.
- Omit artifacts check, as it seems quite useless for this test case.
- Join all the tests together. The test case is fairly simple here and
this is the easiest way to ensure the log will be in a predictable
state for all tests. Previously, there was no way to ensure
test_configloading_loading() won't be executed after the other tests,
which would render the check moot. It could also be separated into
its own module, but that seems excessive for a simple test case like
this.
- Use jinja2 template for named.conf and remove setup.sh.
- Remove README and put the relevent comment directly next to the test.
- Remove _sh_ from the test filename to uphold the naming convention.
- Refactor and extend the `WatchLog.wait_for_line()` API:
1. To allow for usage of one or more FlexPatterns, i.e. either plain
strings to be matched verbatim, or regular expressions. Both can be
used interchangeably to provide the caller to write simple and
readable test code, while allowing for increased complexity to allow
special cases.
2. Always return the regex match, which allows the caller to identify
which line was matched, as well as to extract any additional
information, such as individual regex groups.
- Add `WatchLog.wait_for_sequence()` and `WatchLog.wait_for_all()` helper functions
Merge branch 'nicki/watchlog-improvements' into 'main'
Nicki Křížek [Mon, 23 Jun 2025 15:36:08 +0000 (17:36 +0200)]
Change NamedInstance.rndc() doctest into doc example
The test is troublesome, because NamedInstance(identifier) expects that
a directory with such a name exists. While it'd be possible to mock
those directories as well, it'd make the doctest overly long and
complex, which isn't justified, given that it's only testing a couple of
options. Turn it into regular documentation instead.
Nicki Křížek [Mon, 23 Jun 2025 12:37:09 +0000 (14:37 +0200)]
Split up waiting for match to a separate WatchLog method
To allow re-use in upcoming functions, isolate the line matching logic
into a separate function. Use an instance-wide deadline attribute, which
is set by the calling function.
Nicki Křížek [Mon, 16 Jun 2025 16:39:56 +0000 (18:39 +0200)]
Unify the WatchLog.wait_for_line/s() API
Rather than using two distinct functions for matching either one pattern
(wait_for_line()), or any of multiple patterns (wait_for_lines()), use a
single function that handles both in the same way.
Extend the wait_for_line() API:
1. To allow for usage of one or more FlexPatterns, i.e. either plain
strings to be matched verbatim, or regular expressions. Both can be
used interchangeably to provide the caller to write simple and
readable test code, while allowing for increased complexity to allow
special cases.
2. Always return the regex match, which allows the caller to identify
which line was matched, as well as to extract any additional
information, such as individual regex groups.
Nicki Křížek [Mon, 16 Jun 2025 13:35:43 +0000 (15:35 +0200)]
Abstract WatchLog line buffering to a separate function
Move the line buffering functionality into _readline() to improve the
readability of code. This also allows reading the file contents from
other functions, since the line buffer is now an attribute of the class.
Colin Vidal [Wed, 11 Jun 2025 13:45:52 +0000 (15:45 +0200)]
fix watchlog.py doctest
Fix some broken doctest in watchlog.py (no semantic error, but API
slightly changed and broke some output messags). Also add a test for a
missing failure case.
Michał Kępień [Wed, 16 Jul 2025 05:06:09 +0000 (07:06 +0200)]
Fix broken markup in doc/arm/dlz.inc.rst
Commit a6cce753e2b1096c4db64555d2aee096ba8236ae erroneously used
Markdown syntax in doc/arm/dlz.inc.rst. Replace it with proper
reStructuredText so that the relevant section of the ARM is rendered
correctly.
Michał Kępień [Wed, 16 Jul 2025 05:06:09 +0000 (07:06 +0200)]
Update broken reference to dlz_minimal.h
Commit a6cce753e2b1096c4db64555d2aee096ba8236ae missed a spot in
lib/dns/include/dns/clientinfo.h. Replace the outdated file reference
with the URL used in all similar cases.
The DLZ modules have been moved to a separate Git repository in commit a6cce753e2b1096c4db64555d2aee096ba8236ae. Remove leftover references to
the contrib/dlz/ directory from the main BIND 9 repository.
Michał Kępień [Wed, 16 Jul 2025 05:24:00 +0000 (07:24 +0200)]
fix: pkg: Fix plugin loading
Loading plugins specified using just the shared library name (i.e.
without using an absolute path or a relative path) did not work. This
has been fixed.
See #5379
Merge branch '5379-fix-plugin-loading' into 'main'
Michał Kępień [Wed, 16 Jul 2025 05:22:53 +0000 (07:22 +0200)]
Fix plugin loading
Plugins are built as shared libraries and are therefore installed into
$libdir/bind. Meanwhile, the build system sets the NAMED_PLUGINDIR
preprocessor variable to $datadir/bind instead. This prevents loading
plugins specified in the configuration file using just the shared
library name (i.e. without using an absolute path or a relative path).
Fix by setting NAMED_PLUGINDIR to the path that plugins are actually
installed into.
Mark Andrews [Tue, 15 Jul 2025 14:38:53 +0000 (00:38 +1000)]
chg: usr: Add deprecation warnings for RSASHA1, RSASHA1-NSEC3SHA1 and DS digest type 1
RSASHA1 and RSASHA1-NSEC-SHA1 DNSKEY algorithms have been deprecated
by the IETF and should no longer be used for DNSSEC. DS digest type
1 (SHA1) has also been deprecated. Validators are now expected
to treat these algorithms and digest as unknown, resulting in
some zones being treated as insecure when they were previously treated
as secure. Warnings have been added to named and tools when these
algorithms and this digest are being used for signing.
Zones signed with RSASHA1 or RSASHA1-NSEC-SHA1 should be migrated
to a different DNSKEY algorithm.
Zones with DS or CDS records with digest type 1 (SHA1) should be
updated to use a different digest type (e.g. SHA256) and the digest
type 1 records should be removed.
Related to #5358
Merge branch '5358-add-sha1-deprecation-warnings' into 'main'
Mark Andrews [Thu, 5 Jun 2025 04:49:10 +0000 (14:49 +1000)]
Warn about deprecated DNSKEY and DS algorithms / digest types
DNSKEY algorithms RSASHA1 and RSASHA-NSEC3-SHA1 and DS digest type
SHA1 are deprecated. Log when these are present in primary zone
files and when generating new DNSKEYs, DS and CDS records.
chg: test: Use isctest.asyncserver in the "tsig" test
Replace the custom DNS server used in the "tsig" system test with
new code based on the isctest.asyncserver module.
Changes to isctest.asyncserver are required, previously it did not
handle TSIG signed queries at all. Now, with some hacking around
a [dnspython bug](https://github.com/rthalley/dnspython/issues/1205) it does.
Merge branch 'stepan/tsig-asyncserver' into 'main'
Štěpán Balážik [Mon, 23 Jun 2025 14:43:56 +0000 (16:43 +0200)]
Let queries with TSIG parse in isctest.asyncserver.AsyncDnsServer
Previously, upon receiving a query with TSIG, the server would log
an error and timeout. As there is no way to set up the keyring in the
class anyway (and I believe we don't need it), this commit lets such
queries parse but logs the fact that the query has TSIG.
However, there is a bug [1] in dnspython, which causes `make_response`
and `to_wire` to crash on messages constructed by `from_wire` with
`keyring=False`, so the hack with `message.__class__` is needed to work
around this.
This makes just enough changes for the tsig system test to work with
dnspython >= 2.0.0. On older version the server gives up.
chg: test: Check for FEATURETEST before running pytest
When compiling with meson, it may be easy to forget to compile system
test dependencies before running the tests. In that case, the test
results would be quite incosistent and unpredictable, with some tests
ending up with ERROR, some with FAILURE and others PASS, without a clear
indication that something is off before running the entire machinery.
Add a check to fail early on if the FEATURETEST binary isn't available,
indicating that system test dependencies were most likely not compiled.
Merge branch 'nicki/system-test-check-featuretest' into 'main'