Michal Nowak [Fri, 5 Jun 2026 14:33:17 +0000 (16:33 +0200)]
chg: ci: Build unit tests in the unit test job
Building the unit tests in the build job ships them in the CI artifact
(+200 MB) and transfers them over the network. Build them in the unit
test job instead.
Git checks the sources out newer than the build tree restored from the
artifact, which would make meson rebuild all of BIND 9 in the unit test
job. Age the sources so the build is treated as up to date and only the
unit tests get compiled.
Assisted-by: Claude:claude-opus-4-8
Merge branch 'mnowak/build-unit-tests-in-unit-job' into 'main'
Michal Nowak [Wed, 3 Jun 2026 13:53:51 +0000 (13:53 +0000)]
Build unit tests in the unit test job
Building the unit tests in the build job ships them in the CI artifact
(+200 MB) and transfers them over the network. Build them in the unit
test job instead.
When Git checks out the sources, their modification times are newer than
the build tree restored from the artifact, so meson would rebuild all of
BIND 9 in the unit test job. Age the tracked sources so the build is
treated as up to date and only the unit tests get compiled.
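A minimal sketch of the ageing step, assuming a hypothetical helper
`age_sources()` and a caller-supplied list of tracked files (for example, the
output of `git ls-files`). The actual CI job may implement this differently:

```python
import os
import tempfile
import time


def age_sources(paths, reference_mtime, margin=1.0):
    """Set each file's mtime to just before reference_mtime.

    Hypothetical helper: `paths` would be the tracked sources and
    `reference_mtime` the mtime of the restored build tree.
    """
    aged = reference_mtime - margin
    for path in paths:
        os.utime(path, (aged, aged))
    return aged


# Demonstration with a throwaway file standing in for a source file.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.c")
    with open(source, "w") as f:
        f.write("int main(void) { return 0; }\n")
    build_mtime = time.time() - 3600  # pretend the build tree is an hour old
    age_sources([source], build_mtime)
    source_is_older = os.path.getmtime(source) < build_mtime
```

Because the source now looks older than the build outputs, a timestamp-based
build tool has no reason to recompile it.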
Nicki Křížek [Thu, 4 Jun 2026 17:16:12 +0000 (19:16 +0200)]
new: test: pytest helpers for dnssec and zone setup
- Create `isctest.zone.Zone` helper for zone setup (including signing).
- Add `ZoneKey` helpers for both dnssec-keygen managed keys and python-based keys.
- Add `dnssec_py` shared test setup for DNSSEC tests.
- Add the first example - refactor `nsec3_delegations` into a `dnssec_py` test module.
Add ZoneKey helpers for key operations in isctest.zone
Introduce an abstract ZoneKey base class with two concrete
implementations:
- FileZoneKey wraps a dnssec-keygen-managed key file (kasp.Key).
- PythonZoneKey holds a Python-native keypair for dnspython-based
signing and key operations.
Both share ZoneKey.into_ta() and ZoneKey.is_ksk(). The ZoneKey
abstraction lets Zone.copy_dssets() and Zone.trust_anchors() handle
pure-Python keys without callers needing to know how the key was made.
Rewrite nsec3_delegation/tests_excessive_nsec3_iterations.py as
dnssec_py/tests_nsec3_iter_too_many.py using the isctest.zone helpers.
The test is a reproducer for CVE-2026-1519 [GL#5708]. It sets up a
delegation from nsec3-iter-too-many. (ns2) to an unsigned sub zone
(ns3), signing the parent with NSEC3 at 51 iterations. A validating
resolver (ns9) must use NSEC3 to prove the sub zone is insecure; the
excessive iteration count is logged as a warning. The test verifies that
the query still resolves successfully (insecure, not SERVFAIL) despite
the high iteration count.
Add a new system test directory for DNSSEC tests written in Python,
using the isctest.zone helpers for zone setup rather than shell sign
scripts.
Set up four nameservers:
- ns1: authoritative for the signed root zone
- ns2: authoritative for test zones (primary)
- ns3: authoritative for additional test zones (typically delegations)
- ns9: validating resolver
Zone configuration for ns2 and ns3 is driven by the ``zones`` template
variable via _common/zones.conf.j2, so each test module's bootstrap()
controls which zones those servers load without touching named.conf.
Individual test modules will be added in subsequent commits.
System tests that set up zones — especially DNSSEC tests — require a
chain of common operations: rendering zone files from templates,
generating keys, signing, and propagating DS records to parent zones.
Implement these as methods on isctest.zone.Zone so individual tests
don't need to repeat the logic in shell or ad-hoc Python.
isctest.zone.Zone is a plain class that holds the zone's data and
accumulated state (delegations, keys) alongside the methods that operate
on it. It is intentionally separate from isctest.template.Zone, which
remains a dumb data container for jinja2 template rendering.
Key design points:
- zone.Zone.name is the text form without trailing dot ("." for root);
zone.Zone.dname holds the dns.name.Name for DNS-level operations;
zone.Zone.basename is the filesystem-safe name ("root" for ".").
- filepath_unsigned / filepath_signed are both always available.
filepath returns the appropriate one based on zone.Zone.signed.
- The zones/ subdirectory is the default (subdir="zones"); old-style
tests that place zone files directly in the ns workdir can pass
subdir=None.
- Signing is opt-in via signed=True; configure() auto-detects whether to
generate keys and sign based on this flag, so the same method handles
both signed and unsigned zones.
- delegations and keys are mutable list attributes; callers append to
them before calling configure() rather than threading them through
every call.
Also:
- Add isctest.template.zones() as a bridge from a list of zone.Zone to a
{name: template.Zone} dict suitable for use as the ``zones`` template
variable. template.zones() resolves filepath to the actual zone file
so templates don't need to know whether a zone is signed.
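The naming and path rules above can be sketched as follows. `ZoneNames` is a
hypothetical stand-in, not the real isctest.zone.Zone, and the `.db` and
`.db.signed` file suffixes are assumptions made only for illustration:

```python
class ZoneNames:
    """Hypothetical sketch of the naming rules; not the real helper."""

    def __init__(self, name, signed=False, subdir="zones"):
        # Text form without the trailing dot; the root zone stays ".".
        self.name = "." if name == "." else name.rstrip(".")
        self.signed = signed
        self.subdir = subdir

    @property
    def basename(self):
        # Filesystem-safe name: "." cannot be used as a file name.
        return "root" if self.name == "." else self.name

    def _path(self, filename):
        return f"{self.subdir}/{filename}" if self.subdir else filename

    @property
    def filepath_unsigned(self):
        return self._path(f"{self.basename}.db")  # suffix is an assumption

    @property
    def filepath_signed(self):
        return self._path(f"{self.basename}.db.signed")  # also an assumption

    @property
    def filepath(self):
        # Pick the signed or unsigned file based on the signed flag.
        return self.filepath_signed if self.signed else self.filepath_unsigned


root = ZoneNames(".", signed=True)
legacy = ZoneNames("example.", subdir=None)  # old-style: no zones/ subdir
```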
Ondřej Surý [Thu, 4 Jun 2026 13:55:29 +0000 (15:55 +0200)]
fix: dev: Fix a possible crash when cleaning up a view's caches
In rare cases named could crash while a view was being removed, for example
during reconfiguration or shutdown, as its internal caches were torn down.
This has been fixed.
Closes #6119
Merge branch '6119-fix-possible-uaf-when-destroying-dns_badcache' into 'main'
Ondřej Surý [Wed, 3 Jun 2026 09:27:14 +0000 (11:27 +0200)]
Fix use-after-free when destroying the bad and unreachable caches
Eviction of an entry owned by another loop was bounced to that loop via
isc_async_run(), so a queued list removal could run after the cache had
freed its LRU lists. Use a single mutex-guarded LRU list instead, removing
entries synchronously under the lock, and let each entry hold its own
memory-context reference so the RCU free never touches a gone loop.
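As an illustration of the single mutex-guarded list, here is a minimal Python
model. `LockedLRU` is a hypothetical class, not the C implementation; it only
shows eviction happening synchronously under one lock:

```python
import threading
from collections import OrderedDict


class LockedLRU:
    """Hypothetical model: one LRU list guarded by one mutex."""

    def __init__(self, capacity):
        self._lock = threading.Lock()
        self._entries = OrderedDict()
        self._capacity = capacity

    def put(self, key, value):
        with self._lock:
            self._entries[key] = value
            self._entries.move_to_end(key)
            # Evict synchronously while holding the lock, so no removal
            # is left queued for later.
            while len(self._entries) > self._capacity:
                self._entries.popitem(last=False)

    def keys(self):
        with self._lock:
            return list(self._entries)


cache = LockedLRU(capacity=2)
for key in ("a", "b", "c"):
    cache.put(key, True)
remaining = cache.keys()
```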
Colin Vidal [Thu, 4 Jun 2026 13:09:46 +0000 (15:09 +0200)]
new: dev: Add DTrace support for resolver queries
When `fctx_query()` is called, a DTrace probe (if enabled) prints the
fetch context address, the upstream server address and port, and the
latest known SRTT for the server.
Merge branch 'colin/dtrace-resolver-query' into 'main'
Colin Vidal [Wed, 13 May 2026 07:53:35 +0000 (09:53 +0200)]
Add DTrace support for resolver queries
When `fctx_query()` is called, a DTrace probe (if enabled) prints the
fetch context address, the upstream server address and port, and the
latest known SRTT for the server.
Colin Vidal [Thu, 4 Jun 2026 11:53:39 +0000 (13:53 +0200)]
fix: usr: Do not assert on synthrecord reverse mode with huge prefix
When using the `synthrecord` plugin in reverse mode, if a very long
prefix is configured by the operator such that there is no room to fit
the reversed IP address into a DNS name, `named` could assert. This has
now been fixed. In such situations, an error is logged so the operator
is aware of the problem, and `NXDOMAIN` is answered.
Closes #6115
Merge branch '6115-synthrecord-prefix' into 'main'
Colin Vidal [Wed, 3 Jun 2026 14:08:57 +0000 (16:08 +0200)]
Do not assert on synthrecord reverse mode with huge prefix
When using the `synthrecord` plugin in reverse mode, if a very long
prefix is configured by the operator such that there is no room to fit
the reversed IP address into a DNS name, `named` could assert. This has
now been fixed. In such situations, an error is logged so the operator
is aware of the problem, and `NXDOMAIN` is answered.
Colin Vidal [Wed, 3 Jun 2026 14:09:12 +0000 (16:09 +0200)]
Add synthrecord systest with long prefix
Add a system test covering the synthrecord plugin in reverse mode with a (too)
long prefix. If the prefix size doesn't leave room to add the reversed
IP address, the attempt to generate a name is aborted, and `NXDOMAIN` is
returned.
Ondřej Surý [Thu, 4 Jun 2026 11:25:09 +0000 (13:25 +0200)]
chg: dev: Simplify the delegation database memory management
This is an internal simplification of the delegation database's memory
management, replacing the per-thread eviction lists and deferred,
cross-thread record cleanup with a single shared eviction list and
immediate cleanup. There is no change to how delegations are cached or
resolved.
Merge branch 'ondrej/delegdb-shared-sieve-lru' into 'main'
Ondřej Surý [Wed, 3 Jun 2026 17:58:03 +0000 (19:58 +0200)]
Simplify the delegation database LRU to a single shared SIEVE
The delegation database kept one SIEVE LRU list per loop so that node
eviction could run lock-free on each node's owning loop; this required
every node to hold a loop reference and to defer its own destruction to
that loop via isc_async_run(). Move the SIEVE unlink into the QP write
transaction, taking the evicted node directly from dns_qp_deletename(),
which serialises every list mutation under the qpmulti writer lock and
lets a single shared list replace the per-loop arrays. Node and database
teardown are now synchronous.
The QP trie and the SIEVE list are wrapped in a reference-counted holder.
Each node keeps a reference to the holder so it (and its memory context)
stays valid until the node is destroyed, while shutdown drains the SIEVE
and destroys the trie from an RCU callback and frees the holder once the
last node drops its reference. Reuse across a reconfiguration now moves
ownership of the holder to the new view instead of sharing it through a
separate owners counter, so dns_delegdb_reuse() is removed.
Ondřej Surý [Thu, 4 Jun 2026 09:58:12 +0000 (11:58 +0200)]
Only update the global tid_count once
Normally, the tid_count is initialized only once at the beginning of the
application. The only exception is the pattern in the unit test where
isc_loopmgr is repeatedly created and torn down and each creation of
isc_loopmgr_t calls isc__tid_initcount() with the previous value.
ThreadSanitizer sees that as a write operation on unprotected memory and
reports this as a data race even though the value has not really changed.
This has been fixed by skipping the tid_count value update on repeated
calls.
A previous commit introduced a latent bug where the wrong popcount
definition was used when overriding the compilation mode to C23.
This commit fixes it.
Michal Nowak [Mon, 1 Jun 2026 20:56:50 +0000 (22:56 +0200)]
fix: ci: Disable dnstap in reproducible-build CI job
Commit 515ff3763c ("Simplify reproducible-build CI job") dropped the
-Ddnstap=disabled option from the "meson reprotest" invocation, which
re-introduced a known reproducibility failure:
The job builds with CFLAGS=${CFLAGS_COMMON}, which enables LTO with
-ffat-lto-objects. Fat LTO objects embed GIMPLE bytecode keyed by a
per-compilation random LTO hash, so they are not reproducible run to
run. libdnstap.a is the only static archive in the build, and meson
treats every .a as a final, checked artifact, so the two reprotest
builds disagree on its contents. The shared libraries are unaffected
because final LTO linking re-emits and strips the bytecode.
Restore the -Ddnstap=disabled workaround, along with a comment
explaining the instability. The unrelated -Ddoc=disabled and
-Doptimization=1 options are left dropped, as they were only build-time
speedups and not related to reproducibility.
Assisted-by: Claude:claude-opus-4-8
Merge branch 'mnowak/reprotest-disable-dnstap-lto' into 'main'
Michal Nowak [Mon, 1 Jun 2026 20:38:50 +0000 (20:38 +0000)]
Disable dnstap in reproducible-build CI job
Commit 515ff3763c ("Simplify reproducible-build CI job") dropped the
-Ddnstap=disabled option from the "meson reprotest" invocation, which
re-introduced a known reproducibility failure:
The job builds with CFLAGS=${CFLAGS_COMMON}, which enables LTO with
-ffat-lto-objects. Fat LTO objects embed GIMPLE bytecode keyed by a
per-compilation random LTO hash, so they are not reproducible run to
run. libdnstap.a is the only static archive in the build, and meson
treats every .a as a final, checked artifact, so the two reprotest
builds disagree on its contents. The shared libraries are unaffected
because final LTO linking re-emits and strips the bytecode.
Restore the -Ddnstap=disabled workaround, along with a comment
explaining the instability. The unrelated -Ddoc=disabled and
-Doptimization=1 options are left dropped, as they were only build-time
speedups and not related to reproducibility.
Michal Nowak [Sun, 24 May 2026 18:29:55 +0000 (18:29 +0000)]
Remove redundant Python 3.7 skip markers from system tests
The test framework already requires Python 3.10+ (conftest.py raises
RuntimeError if version < 3.10), so skipif(sys.version_info < (3, 7))
can never trigger. Remove the dead markers and now-unused sys imports.
Michal Nowak [Mon, 25 May 2026 13:00:43 +0000 (13:00 +0000)]
Fix nzd2nzf test always being skipped
When LMDB was made a required dependency (929eccdfdc), the "LMDB" entry
was removed from features.py and the --with-lmdb flag was removed from
feature-test.c. However, the with_lmdb skip marker in mark.py and its
usage in nzd2nzf were not cleaned up. Since FEATURE_LMDB was no longer
being set, the skip condition became permanently true, silently skipping
the test on every run.
Remove the dead skip marker and update other stale references that still
described LMDB as optional (build docs, addzone test comments).
Michal Nowak [Wed, 27 May 2026 21:25:27 +0000 (21:25 +0000)]
Increase ans5 NS response delay in rpzrecurse test
The nsip-wait-recurse and nsdname-wait-recurse timing tests
compare query times with wait-recurse yes vs no. With a
1-second NS response delay in ans5, the timing difference is
too small to reliably measure with whole-second granularity,
causing intermittent failures when both cases round to the
same integer.
Increase the delay from 1 to 3 seconds and add explicit dig
timeout options (+time=30 +tries=1) so that dig does not time
out or retry during the slow wait-recurse yes queries.
Michal Nowak [Mon, 1 Jun 2026 14:50:50 +0000 (16:50 +0200)]
fix: dev: Fix wrong variable in named_server_sync() log message
named_server_sync() logged isc_result_totext(result) but returns
tresult. The loop accumulates errors into tresult, so result only
holds the last iteration's value. If the last view succeeded but an
earlier one failed, the log would incorrectly say "success".
Merge branch 'mnowak/fix-server-sync-log' into 'main'
Michal Nowak [Mon, 25 May 2026 06:52:31 +0000 (06:52 +0000)]
Fix wrong variable in named_server_sync() log message
named_server_sync() logged isc_result_totext(result) but returns
tresult. The loop accumulates errors into tresult, so result only
holds the last iteration's value. If the last view succeeded but an
earlier one failed, the log would incorrectly say "success".
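A small Python model of the mismatch, assuming per-view outcomes are plain
strings and that the first failure is the one kept (both are assumptions for
illustration; `sync_views()` is hypothetical):

```python
def sync_views(view_results):
    """Return (last_result, accumulated_result) for per-view outcomes.

    Hypothetical model: each entry is "success" or an error string.
    """
    result = "success"
    tresult = "success"
    for result in view_results:
        # Keep the first failure, mirroring an accumulated return value.
        if result != "success" and tresult == "success":
            tresult = result
    return result, tresult


# The first view fails, the last one succeeds.
last, accumulated = sync_views(["failure", "success"])
```

Logging `last` here would report success even though the function as a whole
returns the failure held in `accumulated`.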
Michal Nowak [Mon, 1 Jun 2026 13:59:18 +0000 (15:59 +0200)]
fix: test: Increase timeout for reload-based kasp signing checks
```
______________________________ test_kasp_default _______________________________
[gw0] freebsd15 -- Python 3.11.15 /usr/local/bin/python3.11
/home/ec2-user/builds/isc-private/bind9/bin/tests/system/kasp/tests_kasp.py:910: in test_kasp_default
isctest.run.retry_with_timeout(update_is_signed, timeout=5)
/home/ec2-user/builds/isc-private/bind9/bin/tests/system/isctest/run.py:164: in retry_with_timeout
assert False, msg
E AssertionError: tests_kasp.test_kasp_default.<locals>.update_is_signed() timed out after 5 s
E assert False
```
Merge branch 'mnowak/kasp-default-update-is-signed-timeout' into 'main'
Michal Nowak [Mon, 1 Jun 2026 12:30:23 +0000 (12:30 +0000)]
Increase timeout for reload-based kasp signing checks
After reloading an inline-signed zone from file, named must re-read it,
detect the deltas and generate RRSIGs before the answer is signed, which
can take longer than 5 seconds on a loaded CI host and cause spurious
update_is_signed() timeouts. Bump these reload-based checks to 10
seconds, matching cb_ixfr_is_signed.
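To illustrate why the timeout value matters for a polling check, here is a
minimal sketch of a retry loop. `retry_until()` is a hypothetical stand-in
written for this example, not the isctest.run.retry_with_timeout helper:

```python
import time


def retry_until(func, timeout, delay=0.01):
    """Call func() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if func():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


# A check that only succeeds on the third attempt.
attempts = {"count": 0}


def signed_after_three_polls():
    attempts["count"] += 1
    return attempts["count"] >= 3


too_short = retry_until(signed_after_three_polls, timeout=0)
attempts["count"] = 0
long_enough = retry_until(signed_after_three_polls, timeout=5)
```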
Michal Nowak [Mon, 1 Jun 2026 13:56:36 +0000 (15:56 +0200)]
fix: test: Bump edns-expire refresh timeout to 30 seconds
Rarely, RNDC fails to refresh the zone on FreeBSD in the default 10
seconds, causing test_edns_expire_refresh to fail with a TimeoutExpired
on the "rndc refresh edns-expire." call. Give it more time, the same
way the reconfigure timeout was bumped in
test_reconfiguration_when_zone_transfer_is_in_the_middle_of_soa_query.
Assisted-by: Claude:claude-opus-4-8
Merge branch 'mnowak/bump-edns-expire-refresh-rndc-timeout' into 'main'
Michal Nowak [Mon, 1 Jun 2026 12:33:59 +0000 (12:33 +0000)]
Bump edns-expire refresh timeout to 30 seconds
Rarely, RNDC fails to refresh the zone on FreeBSD in the default 10
seconds, causing test_edns_expire_refresh to fail with a TimeoutExpired
on the "rndc refresh edns-expire." call. Give it more time, the same
way the reconfigure timeout was bumped in
test_reconfiguration_when_zone_transfer_is_in_the_middle_of_soa_query.
Nicki Křížek [Thu, 28 May 2026 16:13:20 +0000 (16:13 +0000)]
Avoid rndc loadkeys race in checkds system test
The wait loop in test_checkds() called "rndc loadkeys" once per
second while polling ns9.log for expected parental-agent response
lines. Under load (notably the rbt CI job), responses to one query
batch could land after a subsequent loadkeys had already reset the
per-key DSPUBCOUNT counter in lib/dns/zone.c without cancelling the
in-flight requests. Stragglers from the earlier round then bumped the
new round's counter to parentalscnt and BIND finalized DSPublish for
zones where one parental-agent legitimately serves no DS, spuriously
failing the !DSPublish keystate assertion.
Trigger at most one loadkeys per test case and wait passively via
watch_log_from_start() / wait_for_all(). Watching from the start
of the log preserves the original implicit semantics for zones
whose DS state was already finalized by BIND's automatic checkds
polling at zone-load time -- the expected lines are already
present and the watcher returns immediately.
Ondřej Surý [Sat, 30 May 2026 03:59:42 +0000 (05:59 +0200)]
chg: dev: Consolidate the validator's DS fetches into one helper
Internal cleanup with no change in resolution behaviour. The DNSSEC
validator started DS record lookups from three separate places, each
set up slightly differently; they now go through a single helper.
Merge branch 'ondrej/validator-ds-fetch-zonecut' into 'main'
Ondřej Surý [Fri, 29 May 2026 10:10:29 +0000 (12:10 +0200)]
Funnel the validator's DS fetches through a single helper
The validator starts a DS fetch from three places while building or
proving a trust chain. Only validate_dnskey() handed the resolver the
parent zone cut as a delegation hint; the other two started the fetch
with no hint at all.
Factor the shared setup into create_ds_fetch() and route all three
through it, so every validator DS fetch is created identically and
carries the parent zone cut. DS is an at-parent type, so the resolver
already anchors such a query at the parent on its own; supplying the
zone cut explicitly also lets the resolver's fetch loop detection match
the fetch by domain, which it cannot do for a fetch with no hint.
Ondřej Surý [Fri, 29 May 2026 20:34:46 +0000 (22:34 +0200)]
chg: usr: Fix a resolver stall on a CNAME response to a DS query
A validating resolver could stall for about twelve seconds and then return
SERVFAIL when an authoritative server answered a DS query with a CNAME. Such
responses are now rejected promptly, so the query fails fast instead of
hanging.
Closes #5878
Merge branch '5878-reject-cname-at-dnssec-types' into 'main'
Ondřej Surý [Fri, 29 May 2026 09:32:52 +0000 (11:32 +0200)]
Add a system test for CNAME answers to DNSSEC meta-type queries
Two authoritative zones drive the cases. 'example.' answers DNSKEY,
NSEC, NSEC3 and RRSIG queries with a CNAME: a direct recursive query for
one of these must not crash the resolver, and the validator's own DNSKEY
fetch for a signed name must fail as a broken trust chain and return
SERVFAIL promptly.
'secure.' is served faithfully but answers DS queries with an unsigned
CNAME -- the input that drove the validator's insecurity proof into a
self-join. The resolver must return SERVFAIL within a couple of seconds
instead of stalling for twelve.
Ondřej Surý [Fri, 29 May 2026 15:43:54 +0000 (17:43 +0200)]
Fail promptly on an RRSIG answer with no usable record
A query for an RRSIG is handled as a subset of ANY, so rctx_answer_any()
filters out records that do not match the queried type. When every
record was filtered out (an answer carrying only unrelated types), the
function still returned success with nothing cached, and the fetch then
waited for a validator that was never started until the backstop fetch
timer fired ~12s later. Treat an all-filtered answer as a broken
response, matching how non-meta types already reject a reply with no
usable record.
Ondřej Surý [Fri, 29 May 2026 09:32:44 +0000 (11:32 +0200)]
Detect non-advancing alias chains in the validator
The resolver turned a CNAME response to an RRSIG or NSEC query into
FORMERR inside rctx_answer_cname(). That is redundant -- every caller
already copes with a DNS_R_CNAME or DNS_R_DNAME result -- and it is the
wrong layer, because the resolver cannot tell a legitimate alias from a
broken one. Drop it; a CNAME for one of these types now flows back as
an ordinary alias.
The case that must be stopped lives in the validator. While proving an
unsigned CNAME insecure, proveunsecure() fetches the DS for the CNAME's
own name; because fetches are shared, that fetch re-enters and stalls on
the in-flight fetch the validator is waiting for, deadlocking for about
twelve seconds (GL#5878). Unlike the resolver, the validator knows it
is validating an alias, so check_chaining() now aborts a fetch whose
name matches the chaining rdataset's owner: it cannot advance the chain
and would only self-join.
Ondřej Surý [Fri, 29 May 2026 19:25:39 +0000 (21:25 +0200)]
fix: dev: Refine resolver fetch loop detection
The resolver's fetch loop detection now triggers only when a new
fetch would join an already in-flight fetch that is also one of
its own ancestors, which is the actual loop condition. Previously
the check ran against the original request before the fetch was
set up.
Merge branch 'ondrej/improve-resolver-loop-detection' into 'main'
Ondřej Surý [Fri, 29 May 2026 14:36:45 +0000 (16:36 +0200)]
Detect resolver fetch loops only when joining an in-flight fetch
dns_resolver_createfetch() guarded against fetch loops by comparing the
raw request name/type/domain before any fetch context existed. Move the
check after the context is obtained and run it against the context
itself, and only when we joined an already in-flight context
(!new_fctx) that is also an ancestor in the parent chain. That is the
real loop condition: the new fetch would block waiting on a fetch that
is itself waiting on us. A newly created context waits on nothing, so it
proceeds, bounded by the fetch depth limit and the complementary ADB
loop detection.
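The loop condition can be sketched in Python as follows. `Fetch` and
`would_loop()` are hypothetical names used only for this illustration, and
the parent chain is modelled as a simple linked list:

```python
class Fetch:
    """Hypothetical stand-in for a fetch context with a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def would_loop(joined, new_fctx, parent):
    """True if joining `joined` from under `parent` would self-block."""
    if new_fctx:
        # A freshly created context waits on nothing.
        return False
    ancestor = parent
    while ancestor is not None:
        if ancestor is joined:
            return True
        ancestor = ancestor.parent
    return False


root = Fetch("example.")
child = Fetch("ns.example.", parent=root)
other = Fetch("other.")

# Joining our own ancestor: the real loop condition.
loop = would_loop(root, new_fctx=False, parent=child)
# A new context, or an in-flight one outside our chain, is not a loop.
no_loop_new = would_loop(Fetch("fresh."), new_fctx=True, parent=child)
no_loop_unrelated = would_loop(other, new_fctx=False, parent=child)
```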
Alessio Podda [Fri, 29 May 2026 08:43:51 +0000 (08:43 +0000)]
fix: dev: Bound memory use during incoming zone transfers
During an incoming zone transfer, an optimization could let
the batch of pending records grow without bound for a large
zone, raising memory usage. It gave no measurable performance
benefit, so it has been removed.
Alessio Podda [Fri, 22 May 2026 15:58:10 +0000 (17:58 +0200)]
Remove name boundary optimization
In MR !9740, we introduced an optimization that reduces memory usage
by processing rdatas in batches during AXFR.
The maximum batch size is 128, but the batch size was allowed to grow
beyond that limit if all rdatas in a batch were for the same name, as
that allows a more efficient optimization.
This optimization could theoretically allow the batch size to grow
arbitrarily for a sufficiently large zone transfer. Since synthetic tests
don't show any performance improvement from the optimization, this MR
removes it.
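A simplified Python model of the batching rule, assuming records are reduced
to their owner names and a batch is reset once it is full (the function name
and the reset behaviour are assumptions, not the actual transfer code):

```python
MAX_BATCH = 128  # limit stated above


def max_batch_size(names, allow_same_name_growth):
    """Return the largest batch seen while consuming `names` in order."""
    batch = []
    largest = 0
    for name in names:
        full = len(batch) >= MAX_BATCH
        same_name = all(n == name for n in batch)
        # The removed optimization kept growing a full batch as long as
        # every record in it shared the incoming record's name.
        if full and not (allow_same_name_growth and same_name):
            batch = []
        batch.append(name)
        largest = max(largest, len(batch))
    return largest


# 1000 records that all share one owner name.
records = ["big.example."] * 1000
with_optimization = max_batch_size(records, allow_same_name_growth=True)
without_optimization = max_batch_size(records, allow_same_name_growth=False)
```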
Michal Nowak [Thu, 28 May 2026 16:01:39 +0000 (18:01 +0200)]
fix: test: Fix pytest-xdist loadscope splitting on "::" in params
LoadScopeScheduling._split_scope() uses rsplit("::", 1) to
extract the test file scope from a node ID. When parametrized
test values contain "::" (IPv6 addresses like "cafe:cafe::cafe"
or "::1"), the split lands inside the parameter instead of at
the .py:: boundary. This creates spurious scopes that get
assigned to different workers, each triggering a full fixture
setup (starting named instances).
Override _split_scope() in conftest.py to split on ".py::"
which is unambiguous.
Six tests in synthrecord/tests_synthrecord.py are affected.
A verification script is included in util/.
Assisted-by: Claude:claude-opus-4-7
Merge branch 'mnowak/fix-xdist-loadscope-split' into 'main'
Michal Nowak [Tue, 26 May 2026 16:09:21 +0000 (16:09 +0000)]
Fix pytest-xdist loadscope splitting on "::" in params
LoadScopeScheduling._split_scope() uses rsplit("::", 1) to
extract the test file scope from a node ID. When parametrized
test values contain "::" (IPv6 addresses like "cafe:cafe::cafe"
or "::1"), the split lands inside the parameter instead of at
the .py:: boundary. This creates spurious scopes that get
assigned to different workers, each triggering a full fixture
setup (starting named instances).
Override _split_scope() in conftest.py to split on ".py::"
which is unambiguous.
Six tests in synthrecord/tests_synthrecord.py are affected.
A verification script is included in util/.
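The difference between the two splits can be shown with a short Python
example. The node ID and test name below are made up for illustration, and
`split_scope_py()` is a sketch of the idea rather than the exact override:

```python
def split_scope_rsplit(nodeid):
    # Behaviour described above: split at the last "::" in the node ID.
    return nodeid.rsplit("::", 1)[0]


def split_scope_py(nodeid):
    # Split at the ".py::" boundary instead, keeping the ".py" suffix.
    head, sep, _ = nodeid.partition(".py::")
    return head + ".py" if sep else nodeid


# Hypothetical node ID with an IPv6 parameter.
nodeid = "synthrecord/tests_synthrecord.py::test_reverse[cafe:cafe::cafe]"
broken = split_scope_rsplit(nodeid)
fixed = split_scope_py(nodeid)
```

Here `broken` still contains part of the parameter, so each such test gets
its own scope, while `fixed` is just the test file path.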
Michal Nowak [Thu, 28 May 2026 14:54:58 +0000 (16:54 +0200)]
chg: test: Prioritize the 10 slowest system test scopes
Update PRIORITY_TESTS with the 10 longest-running test
scopes measured from CI (job 7468217). These get scheduled
first so that with --dist=loadscope they land on separate
workers instead of piling up at the end.
Also fix "serve-stale/" to "serve_stale/" to match the
actual directory name, and add a startup check that fails
if any PRIORITY_TESTS entry does not match an existing
directory.
Assisted-by: Claude:claude-opus-4-7
Merge branch 'mnowak/prioritize-slow-system-tests' into 'main'
Michal Nowak [Tue, 26 May 2026 16:40:13 +0000 (16:40 +0000)]
Prioritize the 10 slowest system test scopes
Update PRIORITY_TESTS with the 10 longest-running test
scopes measured from CI (job 7468217). These get scheduled
first so that with --dist=loadscope they land on separate
workers instead of piling up at the end.
Also fix "serve-stale/" to "serve_stale/" to match the
actual directory name, and add a startup check that fails
if any PRIORITY_TESTS entry does not match an existing
directory.
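A minimal sketch of such a startup check, assuming entries are directory
names with a trailing slash relative to the system test root
(`missing_priority_dirs()` is a hypothetical helper):

```python
import os
import tempfile


def missing_priority_dirs(priority_tests, root):
    """Return the entries that do not name a directory under root."""
    return [
        entry
        for entry in priority_tests
        if not os.path.isdir(os.path.join(root, entry.rstrip("/")))
    ]


# Demonstration against a temporary tree with one test directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "serve_stale"))
    ok = missing_priority_dirs(["serve_stale/"], root)
    stale = missing_priority_dirs(["serve-stale/"], root)
```

A caller could raise an error whenever the returned list is not empty, which
would have caught the "serve-stale/" entry.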
Matthijs Mekking [Thu, 28 May 2026 14:26:03 +0000 (14:26 +0000)]
fix: dev: Check options in templates that must be non-zero
`named-checkconf` should reject a template that sets any option that must be non-zero
(`max-refresh-time`, `max-retry-time`, `min-refresh-time`, `min-retry-time`) to zero.
`rndc addzone` with a zone that refers to such a template should fail cleanly.
Closes #6041
Merge branch '6041-check-nonzero-skips-templates' into 'main'
Colin Vidal [Thu, 28 May 2026 11:59:46 +0000 (13:59 +0200)]
fix: usr: Restore delegdb size after `rndc flush`
When the delegation database was flushed using `rndc flush`, its size was also reset but not restored. As a result, after `rndc flush` was used at least once, the delegation database size could grow unbounded. This has now been fixed.
Colin Vidal [Wed, 27 May 2026 08:50:42 +0000 (10:50 +0200)]
Add system test for delegdb size preservation across `rndc flush`
Test that flushing the delegdb via `rndc flush` preserves its
configured size limit. The test checks delegdb watermarks after
`named` startup, flushes caches, and verifies that the delegdb
watermarks are correctly restored afterwards.
To distinguish between the previous `delegdb` memory contexts and the
new ones, we need to know exactly when the previous `delegdb` memory
contexts are removed (this is not immediate, since those are removed
during RCU reclamation phase). A trace is therefore added when a memory
context is destroyed, if `ISC_MEM_DEBUGTRACE` is set.
Colin Vidal [Tue, 26 May 2026 16:13:44 +0000 (18:13 +0200)]
Fix delegdb flush API
The `rndc flush` command flushes the delegdb by deleting the
existing database and creating a new one. In the process, the
delegdb was losing its configured size limit; as a result, once
flushed, the delegdb size became unbounded.
This is now fixed by using `dns_delegdb_getconfig()` to back up the
current configuration before instantiating a new delegdb, then
restoring it with `dns_delegdb_setconfig()`.
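The back-up-and-restore pattern can be modelled in a few lines of Python.
`DelegDB` and `flush()` below are hypothetical stand-ins that only mirror the
get/set configuration idea, not the C API:

```python
class DelegDB:
    """Hypothetical model of a database with a configurable size limit."""

    def __init__(self):
        self.config = {"size": 0}  # 0 stands for "no limit" in this sketch
        self.entries = {}

    def getconfig(self):
        return dict(self.config)

    def setconfig(self, config):
        self.config = dict(config)


def flush(db):
    """Replace db with an empty one that keeps the old configuration."""
    saved = db.getconfig()
    fresh = DelegDB()
    fresh.setconfig(saved)
    return fresh


db = DelegDB()
db.setconfig({"size": 1024})
db.entries["example."] = "delegation"
flushed = flush(db)
```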
Colin Vidal [Tue, 26 May 2026 16:11:12 +0000 (18:11 +0200)]
Add delegdb configuration struct
Instead of having independent APIs to configure various aspects of the
delegdb (i.e. cache size, other settings that may come up later), a
single configuration struct is passed to `dns_delegdb_setconfig()`, which
internally does all the plumbing. To avoid relying on
atomics/synchronization, `dns_delegdb_setconfig()` must be called from
exclusive mode (for now).
The configuration can be retrieved at any time (not necessarily from
exclusive mode) using `dns_delegdb_getconfig()`. This is useful, for
instance, to flush the delegdb without losing its parameters.
Ondřej Surý [Thu, 28 May 2026 11:21:07 +0000 (13:21 +0200)]
rem: usr: Remove legacy special handling for SIG, NXT, and KEY records
BIND no longer applies legacy RFC 2535 handling to the obsolete ``SIG``, ``NXT``
and ``KEY`` record types; they are now served as plain zone data. Zones with
both a ``CNAME`` and a ``KEY`` and/or ``NXT`` at the same name, which is invalid
under :rfc:`2181`, will now fail to load and must be corrected.
Closes #6007
Merge branch '6007-remove-SIG-and-NXT-special-handling' into 'main'
Ondřej Surý [Tue, 19 May 2026 13:58:54 +0000 (15:58 +0200)]
Drop RFC 2535 special-casing of the KEY record type
After SIG and NXT lost their special handling, KEY remained the only
RFC 2535-era type still receiving coexistence allowances: KEY
alongside CNAME at the same owner, KEY answered from the parent side
of a zone cut, KEY kept across CNAME eviction in the cache. RFC 3755
retains type 25 only for SIG(0) and TKEY transaction signatures, and
neither relies on those allowances in practice. The in-tree comment
that flagged the RFC 3007 parent-side carve-out as "unclear" predicted
this cleanup.
Zones that publish CNAME and KEY at the same owner — already invalid
under RFC 2181 — now fail to load. System test fixtures are updated
accordingly, and a new test asserts that SIG, NXT, and KEY records
pick up covering RRSIGs when their zone is signed.
Ondřej Surý [Tue, 19 May 2026 13:38:28 +0000 (15:38 +0200)]
Stop treating SIG and NXT records specially
RFC 3755 retired SIG and NXT in favour of RRSIG and NSEC. BIND still
warned about them at zone load, refused them in dynamic updates,
parsed SIG with a non-zero "type covered" field as a signature on an
RRset, and tracked them via dns_rdatatype_issig(). Those carve-outs
were the sole path that made the GL#5818 crash class reachable.
Treat both types as ordinary unknown rdata: they load, transfer, sign
and answer like any other record, and dynamic updates carry them
through the generic path. SIG(0) is unaffected; its message-parsing
carve-out is preserved.
Nicki Křížek [Wed, 27 May 2026 15:28:07 +0000 (15:28 +0000)]
Add isctest.mark.with_developer pytest mark
Tests that exercise instrumentation, log output, or other behaviour
that only exists in developer builds (the gcc:almalinux9:amd64 CI job
sets -Ddeveloper=disabled to guard against such accidental coupling)
can now decorate themselves with isctest.mark.with_developer to skip on
non-developer builds.
Nicki Křížek [Wed, 27 May 2026 15:26:47 +0000 (15:26 +0000)]
Add --enable-developer probe to feature-test
System tests that depend on log output, instrumentation, or other
behaviour only present in developer builds can use this probe to detect
the build configuration at runtime.
Nicki Křížek [Wed, 27 May 2026 15:26:15 +0000 (15:26 +0000)]
Define DEVELOPER_MODE in developer-mode builds
So that build-time consumers (e.g. feature-test) can detect developer
mode through a single dedicated symbol rather than proxying through
implementation-detail defines like ISC_MEM_TRACKLINES.
Ondřej Surý [Thu, 28 May 2026 09:11:57 +0000 (11:11 +0200)]
fix: usr: Fix nxdomain-redirect combined with dns64
When a resolver was configured with both `nxdomain-redirect` and `dns64`
in the same view, an AAAA query for a nonexistent name could abort
`named`. The combination failed whenever the redirect zone held A
records but no AAAA records. The server now serves the empty AAAA
response from the redirect zone as-is, instead of attempting DNS64
synthesis on top of it.
Closes #5789
Merge branch '5789-fix-nxdomain-redirect-dns64-assert' into 'main'
Ondřej Surý [Wed, 20 May 2026 16:28:15 +0000 (18:28 +0200)]
Skip DNS64 synthesis when answering a redirected response
redirect2() swaps qctx->db to the redirect zone before
query_nodata() runs. The DNS64 fallback there issues an A lookup
for the original query name, which is out of zone for the
redirect db, and the resulting query_notfound() trips
INSIST(!is_zone). The cached NCACHENXRRSET variant trips a
REQUIRE in dns_rdataset_first() on a disassociated rdataset.
The synth-from-dnssec entry reaches the same fallback via
query_coveringnsec(). Guarding the fallback with
!qctx->redirected leaves the nxdomain-redirect NXRRSET answer to
be served as-is.
Ondřej Surý [Wed, 20 May 2026 16:28:15 +0000 (18:28 +0200)]
System test for nxdomain-redirect combined with dns64
An AAAA query for a non-existent name into a view that combines
nxdomain-redirect with dns64 used to abort named via the DNS64
fallback in query_nodata(). The new module exercises all three
documented entry paths into query_redirect(): the authoritative
NXDOMAIN path (ns7, tripping INSIST(!is_zone) in
query_notfound()), the recursive NCACHENXRRSET path (ns8,
tripping REQUIRE in dns_rdataset_first() on a disassociated
rdataset), and the synth-from-dnssec path (ns10 validating
against ns9's signed root, with a primer A query so the second
AAAA reaches query_redirect() via query_coveringnsec()). ns9
serves as a neutral upstream so the cached and synthesized
negatives land real NXRRSETs.
Nicki Křížek [Wed, 27 May 2026 15:54:34 +0000 (17:54 +0200)]
chg: test: Improve pytest jinja2 templates
- Enable rendering ns-specific data in jinja2 templates using the `ns` variable.
- Add common zone/config snippets as `_common` templates.
- Allow jinja2 imports from `_common`.
- Improve the `_common/controls.conf.j2` snippet to render the ns-specific IP rather than a hardcoded one.
Merge branch 'nicki/pytest-template-improvements' into 'main'