Michał Kępień [Thu, 12 Mar 2026 11:27:36 +0000 (12:27 +0100)]
Fix a typo in job name
As hinted at by the comment preceding it, the job preparing packager
notifications was (rather unsurprisingly) supposed to be called
"prepare-packager-notification". Fix the typo in its name.
Petr Špaček [Tue, 10 Mar 2026 17:04:51 +0000 (18:04 +0100)]
Delete early access token when code is published
Technically this is not necessary because the token expires in one week
after creation, and new code would have got there only one week before
the next public release, but better be safe than sorry.
Catch is, after_script gets executed even if a job fails or is
canceled. Delete distros token only if publication succeeded.
Ondřej Surý [Tue, 10 Mar 2026 17:38:37 +0000 (18:38 +0100)]
fix: dev: Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference. When dns_dispatch_connect() fails
synchronously on TCP (e.g. from dns_transport_get_tlsctx() failing
in tcp_dispatch_connect()), the connect callback is never scheduled,
so the extra reference is never consumed. This has been fixed.
Merge branch 'ondrej/fix-resquery-refcount' into 'main'
Ondřej Surý [Fri, 6 Mar 2026 16:06:24 +0000 (17:06 +0100)]
Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference.
When dns_dispatch_connect() fails synchronously on TCP (e.g. from
dns_transport_get_tlsctx() failing in tcp_dispatch_connect()), the
connect callback is never scheduled, so the extra reference is never
consumed. The error path then tears down the query via manual cleanup
(isc_mem_put) without going through the refcount destructor, leaving
the reference imbalanced.
Fix by dropping the extra reference on the error path, just after
dns_dispatch_done() which cleans up the dispatch entry.
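
The imbalance can be illustrated with a toy model. This is a minimal sketch, not the resolver code: `Query`, `dispatch_connect()` and `start_query()` are hypothetical stand-ins, and the real error path does more cleanup than shown here.

```python
class Query:
    """Toy stand-in for a reference-counted resquery object."""

    def __init__(self):
        self.refs = 1  # reference held by the creator

    def ref(self):
        self.refs += 1

    def unref(self):
        assert self.refs > 0
        self.refs -= 1


def dispatch_connect(query, fail_synchronously, on_connected):
    """Hypothetical connect: on synchronous failure the callback never runs."""
    if fail_synchronously:
        return False
    on_connected(query)  # the callback consumes the extra reference
    return True


def start_query(fail_synchronously):
    """Return the reference count left over after the query is finished."""
    query = Query()
    query.ref()  # taken in anticipation of the connect callback
    ok = dispatch_connect(query, fail_synchronously, lambda q: q.unref())
    if not ok:
        query.unref()  # the fix: drop the reference the callback never consumed
    query.unref()  # creator's reference
    return query.refs
```

Without the `unref()` on the error path, `start_query(True)` would leave one reference behind.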
Nicki Křížek [Tue, 10 Mar 2026 15:07:54 +0000 (16:07 +0100)]
chg: test: Disable statschannel RTT tests on FreeBSD
These tests rely on somewhat precise timing, as they test that answers
arrive in a particular latency bucket within the statschannel stats.
These tests are affected by various timing and network issues on our
FreeBSD CI runners and the results are very unstable. Skip these on
FreeBSD entirely.
Merge branch 'nicki/disable-statschannel-rtt-on-freebsd' into 'main'
Nicki Křížek [Tue, 10 Mar 2026 12:35:56 +0000 (13:35 +0100)]
Disable statschannel RTT tests on FreeBSD
These tests rely on somewhat precise timing, as they test that answers
arrive in a particular latency bucket within the statschannel stats.
These tests are affected by various timing and network issues on our
FreeBSD CI runners and the results are very unstable. Skip these on
FreeBSD entirely.
Nicki Křížek [Mon, 9 Mar 2026 15:48:24 +0000 (16:48 +0100)]
chg: ci: Re-enable shotgun runs for nightlies and tags
The recent rewrite of DNS Shotgun infrastructure might've improved the
prior instability. In order to evaluate, re-enable the regular shotgun
pipelines to gather data.
Merge branch 'nicki/ci-shotgun-enable' into 'main'
Nicki Křížek [Thu, 29 Jan 2026 10:10:10 +0000 (11:10 +0100)]
Re-enable shotgun runs
Make the shotgun pipelines on-demand with 5 samples (and no retry) by
default. MRs are compared to their base, while other sources (triggers,
web, schedule...) are compared against the latest released version.
For schedules, run the shotgun pipelines on Monday morning only, but
with the increased number of samples. This should provide useful data
without too many false positives.
Nicki Křížek [Mon, 9 Mar 2026 12:12:14 +0000 (13:12 +0100)]
chg: test: Log dnspython queries after .to_wire() is called
Some dns message modifications like TSIG happen only after .to_wire() is
called on the message. To ensure there isn't a discrepancy between what
has been logged and what has been sent, log the query after
dns.query.udp() is executed (which calls .to_wire() on the message).
Merge branch 'nicki/pytest-log-querymsg' into 'main'
Nicki Křížek [Tue, 3 Mar 2026 12:37:14 +0000 (13:37 +0100)]
Log dnspython queries after .to_wire() is called
Some dns message modifications like TSIG happen only after .to_wire() is
called on the message. To ensure there isn't a discrepancy between what
has been logged and what has been sent, log the query after
dns.query.udp() is executed (which calls .to_wire() on the message).
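
The ordering issue can be sketched with a toy message class. This is an illustration only: `ToyMessage`, `send()` and `query_and_log()` are hypothetical names and do not reproduce the dnspython API or the test framework's logging code.

```python
class ToyMessage:
    """Hypothetical stand-in for a DNS message; not the dnspython API."""

    def __init__(self, qname):
        self.qname = qname
        self.tsig = None

    def to_wire(self):
        # Mimics a modification that is only applied at serialization time.
        self.tsig = "signed"
        return f"{self.qname}|{self.tsig}".encode()

    def describe(self):
        return f"{self.qname} tsig={self.tsig}"


def send(message):
    """Stand-in for the network call, which serializes the message."""
    return message.to_wire()


def query_and_log(message, log):
    wire = send(message)
    log.append(message.describe())  # logged after sending, so it matches the wire
    return wire
```

Logging before `send()` would record `tsig=None` even though the bytes sent carry the signature.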
Alessio Podda [Fri, 6 Mar 2026 14:06:13 +0000 (14:06 +0000)]
chg: dev: Replace lock keyfile hashmap with lock pool
Kasp used a lock per zone origin in order to prevent concurrent access
to keyfiles. This led to substantial memory consumption in the case of
authoritative servers with many small zones, as lots of locks need to be
allocated.
Since the number of keyfile locks taken cannot exceed the number of
helper threads, it makes more sense to use a lock pool of fixed size
keyed by the hash of the origin name, leading to memory savings.
Merge branch 'alessio/keyfile-lock-pool' into 'main'
Alessio Podda [Fri, 27 Feb 2026 12:33:55 +0000 (13:33 +0100)]
Replace lock keyfile hashmap with lock pool
Kasp used a lock per zone origin in order to prevent concurrent access
to keyfiles. This led to substantial memory consumption in the case of
authoritative servers with many small zones, as lots of locks need to be
allocated.
Since the number of keyfile locks taken cannot exceed the number of
helper threads, it makes more sense to use a lock pool of fixed size
keyed by the hash of the origin name, leading to memory savings.
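
A fixed-size lock pool keyed by a name hash might look roughly like the following. This is a sketch under assumptions: the `LockPool` class, the pool size of 8 and the use of CRC32 are all illustrative choices, not what the C implementation uses.

```python
import threading
import zlib


class LockPool:
    """Fixed number of locks shared by all zones (hypothetical sketch)."""

    def __init__(self, size):
        self._locks = [threading.Lock() for _ in range(size)]

    def lock_for(self, origin):
        # Normalize the name so equivalent spellings map to the same lock.
        key = origin.lower().rstrip(".").encode()
        return self._locks[zlib.crc32(key) % len(self._locks)]


pool = LockPool(size=8)  # assumption: sized to the number of helper threads
```

Two unrelated zones may hash to the same lock and occasionally wait on each other; that is the price paid for memory use that no longer grows with the number of zones.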
This commit adds a new CI job to update the BIND9 version in the
isc-projects/bind9-docker project, which will cause the docker images
to be rebuilt for release. Previously a manual step.
A notification is sent to the relevant Mattermost channel.
fix: usr: Fix setting retire in dns_keymgr_key_init
A wrong-variable bug in `dns_keymgr_key_init()` causes the DNSSEC key inactive
time to never be read. This means the key state is not retracting zone signatures
when it should, delaying the key rollover.
ISC would like to thank Naresh Kandula Parmar (Nottiboy) for reporting this.
Closes #5774
Merge branch '5774-fix-setting-retire' into 'main'
Make the maximum number of processed delegation nameservers configurable
via the new 'max-delegation-servers' option (default: 13), replacing the
hardcoded NS_PROCESSING_LIMIT (20).
The default is reduced to 13 to precisely match the maximum number of
root servers that can fit into a classic 512-byte UDP payload. This
provides a natural, historically sound cap that mitigates resource
exhaustion and amplification attacks from artificially inflated or
misconfigured delegations.
The configuration option is strictly bounded between 1 and 100 to ensure
resolver stability.
Merge branch 'ondrej/make-NS_PROCESSING_LIMIT-configurable' into 'main'
Make the maximum number of processed delegation nameservers configurable
via the new 'max-delegation-servers' option (default: 13), replacing the
hardcoded NS_PROCESSING_LIMIT (20).
The default is reduced to 13 to precisely match the maximum number of
root servers that can fit into a classic 512-byte UDP payload. This
provides a natural, historically sound cap that mitigates resource
exhaustion and amplification attacks from artificially inflated or
misconfigured delegations.
The configuration option is strictly bounded between 1 and 100 to ensure
resolver stability.
Štěpán Balážik [Tue, 3 Mar 2026 06:50:22 +0000 (06:50 +0000)]
fix: ci: Fix .respdiff-recent-named anchor to work when the ABI changes
Previously, on 9.20 and 9.18, both builds (reference and the version
being tested) would use the same .so files which led to a crash if the
ABI changed.
Use `git worktree` to get completely separate build environment for the
reference version.
This is not a problem on 9.21 as Meson is smart and covers this mistake,
but apply the fix to it as well for consistency.
This also is not a problem on non-MR pipelines: the latest released version
was used as a reference there, so the .so versions would differ.
Štěpán Balážik [Mon, 2 Mar 2026 14:54:53 +0000 (15:54 +0100)]
Fix .respdiff-recent-named anchor to work when the ABI changes
Previously, on 9.20 and 9.18, both builds (reference and the version
being tested) would use the same .so files which led to a crash if the
ABI changed.
Use `git worktree` to get completely separate build environment for the
reference version.
This is not a problem on 9.21 as Meson is smart and covers this mistake,
but apply the fix to it as well for consistency.
Colin Vidal [Sun, 1 Mar 2026 08:21:03 +0000 (09:21 +0100)]
fix: usr: Resolve "key defined in view is not found"
A recent change in `2956e4fc45b3c2142a3351682d4200647448f193` hardened the `key` name check when used in `primaries` to immediately reject the configuration if the key was not defined (rather than only checking whether the key name was correctly formed). However, the change introduced a regression that prevented the use of a `key` defined in a view. This is now fixed.
Colin Vidal [Mon, 23 Feb 2026 18:36:19 +0000 (19:36 +0100)]
checkconf: check key existence in views
Commit `2956e4fc45b3c2142a3351682d4200647448f193` hardened the `key`
name check when used in `primaries` to reject the configuration if
the key was not defined, rather than simply checking whether the
key name was correctly formed.
However, the key name check didn't include the view configuration,
causing keys not to be recognized if they were defined inside the
view and not at the global level. This regression is now fixed.
Michał Kępień [Fri, 27 Feb 2026 15:52:20 +0000 (16:52 +0100)]
chg: doc: Update Sphinx-related Python packages
Update Sphinx-related Python packages to their current versions pulled
in by "pip install sphinx-rtd-theme" run in a fresh Debian "bookworm"
container.
Merge branch 'michal/update-sphinx-related-python-packages' into 'main'
Michał Kępień [Fri, 27 Feb 2026 13:10:26 +0000 (14:10 +0100)]
Update Sphinx-related Python packages
Update Sphinx-related Python packages to their current versions pulled
in by "pip install sphinx-rtd-theme" run in a fresh Debian "bookworm"
container.
Aram Sargsyan [Thu, 26 Feb 2026 17:21:24 +0000 (17:21 +0000)]
new: usr: Provide response round-trip time (RTT) counters via statistics channel
Previously, :iscman:`named` provided RTT counters for outgoing
queries performed by itself during name resolutions. Now this
has been improved to provide more granular counters (histogram),
and to also provide RTT counters for the incoming queries.
Closes #5279
Merge branch '5279-query-rtt-isc_histo_t-statistics' into 'main'
Aram Sargsyan [Thu, 15 Jan 2026 14:46:06 +0000 (14:46 +0000)]
Replace the outgoing queries RTT histogram code with isc_histomulti
The granularity of the simple histogram with fixed number of ranges
sometimes isn't good enough. As there's a need to implement a new
histogram statistics for the incoming query times (RTT), it was decided
to also update the existing RTT statistics of the outgoing queries
so that they look similar and use common code.
Remove the old histogram code from the resolver and from the statistics
channel. Reimplement the outgoing queries RTT histogram using the
isc_histomulti module, and prepare the necessary base for implementing
the incoming queries RTT histogram. The statistics channel will be
updated to expose the new histograms in an upcoming commit.
Aram Sargsyan [Thu, 15 Jan 2026 14:38:44 +0000 (14:38 +0000)]
Use standard reference counting for isc_histomulti
Use reference counting for isc_histomulti module so that it's
possible to attach/detach to/from the objects when used in the
statistics channel in the coming commits.
Ondřej Surý [Thu, 26 Feb 2026 06:33:29 +0000 (07:33 +0100)]
chg: dev: Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound partial
Fisher-Yates shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
introduces fctx_getaddresses_nsorder() to perform an in-place
randomization of indices into a bounded, stack-allocated lookup array
(nsorder) representing the "winning" fetch slots.
The nameserver dataset is now traversed in exactly one sequential pass:
1. Every nameserver is evaluated for local cached data.
2. If the current nameserver's sequential index exists in the randomized
nsorder array, it is permitted to launch an outgoing network fetch.
3. If not, it is restricted to local lookups via DNS_ADBFIND_NOFETCH.
This guarantees a fair random distribution for outbound queries while
maximizing local cache hits, entirely within O(1) memory and without
the overhead of linked-list pointer shuffling or dynamic allocation.
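
The single-pass scheme above can be sketched as follows. This is a minimal illustration, not the resolver code: `pick_fetch_slots()` and `classify()` are hypothetical names, and the Python list stands in for the bounded, stack-allocated nsorder array.

```python
import random


def pick_fetch_slots(n_servers, n_fetches, rng=random):
    """Return the set of server indices allowed to launch a fetch."""
    k = min(n_servers, n_fetches)
    order = list(range(n_servers))
    # Partial Fisher-Yates: only the first k positions need to be settled.
    for i in range(k):
        j = rng.randrange(i, n_servers)  # uniform pick from the unshuffled tail
        order[i], order[j] = order[j], order[i]
    return set(order[:k])


def classify(n_servers, n_fetches, rng=random):
    """One sequential pass: each server is marked 'fetch' or 'nofetch'."""
    winners = pick_fetch_slots(n_servers, n_fetches, rng)
    return ["fetch" if i in winners else "nofetch" for i in range(n_servers)]
```

For example, `classify(23, 20)` marks exactly 20 of the 23 servers as allowed to fetch; the remaining 3 would be limited to cached data.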
Closes #5695
Merge branch '5695-refactor-the-random-NS-selection' into 'main'
Colin Vidal [Wed, 25 Feb 2026 18:01:22 +0000 (19:01 +0100)]
Add test coverage for nameserver processing limits
Introduce a new system test (nsprocessinglimit) to verify that the
resolver strictly respects outgoing network fetch quotas when presented
with heavily delegated, unresponsive zones.
This test acts as a regression check for the recent Fisher-Yates nameserver
selection refactor. It sets up an authoritative server delegating a zone
to 23 distinct nameservers (all pointing to unresponsive loopback IPs).
Using dnstap, the test forces a resolution failure and verifies that:
1. The resolver successfully traverses the zone delegation path.
2. The resolver caps the outgoing network queries to the delegated
nameservers exactly at the processing limit (20 fetches), ensuring
array boundaries and dynamic fetch quotas are strictly enforced without
crashing or hanging.
Ondřej Surý [Wed, 25 Feb 2026 15:46:40 +0000 (16:46 +0100)]
Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.
This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.
This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
Ondřej Surý [Wed, 25 Feb 2026 09:05:55 +0000 (10:05 +0100)]
fix: usr: Remove deterministic selection of nameserver
When selecting nameserver addresses to be looked up we were
always selecting them in dnssec name order from the start of
the nameserver rrset. This could lead to resolution failure
despite there being addresses that could be resolved for the
other names. Use a random starting point when selecting which
names to look up.
Closes #5695
Closes #5745
Merge branch '5695-add-random-server-selection' into 'main'
Colin Vidal [Tue, 24 Feb 2026 16:30:56 +0000 (17:30 +0100)]
system test covering NS randomization
Add randomizens system test which ensures that NS are randomly selected.
The test relies on the fact that `getaddresses_allowed()` logic won't
allow querying more than 3 NS at the top level. The `example.` zone has
4 NS and the first 3 are lame. As a result, if the resolver doesn't
randomize the NS selection, it will only query the first 3, which
won't give an answer, and fail. With randomization enabled, there is a
chance that the resolver queries the fourth NS and gets the result.
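
The size of that chance can be estimated with a small enumeration. The sketch below assumes a uniformly random starting point and a contiguous wrap-around selection of 3 out of 4 NS; `reaches_working_ns()` is a hypothetical helper, not part of the test.

```python
def reaches_working_ns(start, n_servers=4, allowed=3, working=3):
    """True if the window of `allowed` servers from `start` includes the working one."""
    picked = [(start + i) % n_servers for i in range(allowed)]
    return working in picked


# Enumerate every possible starting point and count the successful ones.
chance = sum(reaches_working_ns(s) for s in range(4)) / 4
```

Under these assumptions only a start at index 0 misses the fourth NS, so a single resolution attempt succeeds with probability 0.75.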
Mark Andrews [Fri, 19 Dec 2025 07:12:06 +0000 (18:12 +1100)]
Remove deterministic selection of nameserver
When selecting nameserver addresses to be looked up we were
always selecting them in dnssec name order from the start of
the nameserver rrset. This could lead to resolution failure
despite there being addresses that could be resolved for the
other names. Use a random starting point when selecting which
names to look up.
Ondřej Surý [Wed, 25 Feb 2026 06:29:23 +0000 (07:29 +0100)]
sec: usr: Remove purged adb names and entries from SIEVE list immediately
Both expire_name() and expire_entry() use isc_async mechanism to remove
the names and entries from the SIEVE-LRU lists on the matching isc_loop.
Under certain circumstances, this could lead to double counting the
purged names/entries when purging the SIEVE-LRU lists under the overmem
condition. This would cause not enough memory to be cleaned up and the
ADB would then never recover from the overmem condition, leading to an OOM
crash of named.
Merge branch 'ondrej/fix-runaway-memory-in-adb' into 'main'
Ondřej Surý [Tue, 10 Feb 2026 05:16:31 +0000 (06:16 +0100)]
Remove purged adb names and entries from SIEVE list immediately
Both `expire_name()` and `expire_entry()` use the isc_async mechanism to
remove names and entries from the SIEVE-LRU lists on the matching
isc_loop.
Under heavy load when the cleaning mechanism didn't have the chance to
kick in yet, this delay could lead to double-counting the purged names
and entries when purging the SIEVE-LRU lists during an overmem
condition. This would result in insufficient memory being cleaned up,
causing the ADB to never recover from the overmem condition and leading
to an OOM crash of `named`.
This patch resolves the issue by bypassing the async queue and executing
the removal synchronously if the target loop matches the current
isc_loop().
If a BIND 9 administrator imports an invalid SKR file, a local stack buffer
in the import function might overflow. This could lead to
memory corruption on the stack and ultimately a server crash.
This has been fixed.
ISC would like to thank mcsky23 for bringing this bug to our attention.
Closes #5758
Merge branch '5758-fix-stack-overflow-via-rndc-skr-import' into 'main'
Ondřej Surý [Sun, 22 Feb 2026 05:37:33 +0000 (06:37 +0100)]
Importing invalid SKR file might overflow the stack buffer
If an invalid SKR file is imported, reading the time from the token
buffer might overflow the buffer on the local stack. This has been
fixed by removing the intermediate buffer and parsing the lexer token
directly.
Mark Andrews [Tue, 24 Feb 2026 02:35:07 +0000 (13:35 +1100)]
Test maximum length NSEC3 hash detection
Adds text and wire format unit tests to verify the newly enforced
maximum NSEC3 hash length constraints. These tests ensure that hash
lengths up to the 39-byte maximum are accepted, while larger sizes
correctly fail.
Ondřej Surý [Fri, 20 Feb 2026 14:44:14 +0000 (15:44 +0100)]
Add tests for NSEC3 invalid length
Adds a static system test that fails to load an NSEC3 record with an
invalid next part length. Additionally, introduces a dynamic test using
a crafted authoritative DNS proxy to inject invalid NSEC3 records on the
fly to test runtime behavior.
Mark Andrews [Wed, 18 Feb 2026 01:30:22 +0000 (12:30 +1100)]
Enforce NSEC3 record consistency
NSEC3 hashes are required to fit within a single DNS label. Since there
are 5 bits per label byte without pad characters, the maximum hash size
is floor(63*5/8) (39 bytes).
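
The arithmetic can be checked directly. In this sketch the constant and function names are illustrative; the only inputs are the 63-octet label limit and the 5 bits carried by each base32 character.

```python
MAX_LABEL_LENGTH = 63  # octets in a single DNS label
BITS_PER_BASE32_CHAR = 5

# Largest whole number of bytes whose bits fit into 63 base32 characters.
max_hash_bytes = (MAX_LABEL_LENGTH * BITS_PER_BASE32_CHAR) // 8


def base32_length(nbytes):
    """Characters needed to encode nbytes in unpadded base32 (rounded up)."""
    return (nbytes * 8 + BITS_PER_BASE32_CHAR - 1) // BITS_PER_BASE32_CHAR
```

A 39-byte hash needs exactly 63 characters and still fits in one label, a 40-byte hash would need 64, and a 20-byte SHA-1 digest needs 32.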
This patch enforces this maximum length for unknown algorithms, while
strictly enforcing the exact expected digest length for known algorithms
like SHA-1.
Ondřej Surý [Sat, 14 Feb 2026 13:43:41 +0000 (14:43 +0100)]
Invalid NSEC3 can cause OOB read of the isdelegation() stack
When .next_length is longer than NSEC3_MAX_HASH_LENGTH, it causes a
harmless out-of-bound read of the isdelegation() stack. This patch
fixes the issue by skipping NSEC3 records with an oversized hash length
during validation.
Ondřej Surý [Mon, 23 Feb 2026 19:57:50 +0000 (20:57 +0100)]
fix: usr: Fail DNSKEY validation when supported but invalid DS is found
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms. When the parent has both supported and
unsupported algorithm in the DS record, the validator would treat the
supported DS algorithm as insecure when validating DNSKEY records
instead of BOGUS. This has no security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
Closes #5757
Merge branch '5757-fix-mixed-algorithm-DS-handling' into 'main'
Ondřej Surý [Mon, 23 Feb 2026 10:17:40 +0000 (11:17 +0100)]
Add test for mixed unsupported DS records
Add a system test that has one invalid DS record with supported
algorithm and one unsupported DS record. Both DNSKEY and A queries must
fail with SERVFAIL.
Ondřej Surý [Mon, 23 Feb 2026 05:13:59 +0000 (06:13 +0100)]
Fail DNSKEY validation when supported but invalid DS is found
A regression was introduced when adding the EDE code for unsupported
DNSKEY and DS algorithms. When the parent has both supported and
unsupported algorithm in the DS record, the validator would treat the
supported DS algorithm as insecure when validating DNSKEY records
instead of BOGUS. This has no security impact as the rest of the child
zone correctly ends with BOGUS status, but it is incorrect and thus the
regression has been fixed.
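
The intended decision can be sketched as a small function. This is a simplified model, not the validator: `dnskey_status()`, the `(algorithm, digest)` tuples and the set of supported algorithm numbers are assumptions made for illustration, and a real "secure" result also depends on signature checks not shown here.

```python
SUPPORTED_DS_ALGORITHMS = {8, 13}  # assumption: whatever the validator supports


def dnskey_status(ds_records, matches_dnskey):
    """ds_records: list of (algorithm, digest); matches_dnskey: predicate on a DS."""
    supported = [ds for ds in ds_records if ds[0] in SUPPORTED_DS_ALGORITHMS]
    if not supported:
        return "insecure"  # only unsupported algorithms: treat the zone as unsigned
    if any(matches_dnskey(ds) for ds in supported):
        return "secure"
    return "bogus"  # a supported DS exists, but no DNSKEY matches it
```

The regression corresponds to returning "insecure" in the last case whenever an unsupported DS was also present.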
Matthijs Mekking [Thu, 19 Feb 2026 11:06:14 +0000 (12:06 +0100)]
Test serve-stale with upstream zones and CNAMEs
Three variants of YWH-PGM40640-56: Stale/Wrong DNS Data Served via
CNAME Flag Leak (DNS_DBFIND_STALEOK persistence) are presented in
GitLab issue #5751. All these variants have been converted to system
tests.
Variant 1 forwards source.stale to another server, that provides a
CNAME record, while the resolver is authoritative for target.stale.
The CNAME points to a non-existing name. A stale CNAME record should
result in a stale NXDOMAIN (instead of SERVFAIL).
Variant 2 forwards both source.stale and target.stale to other servers.
This time the CNAME points to an A RRset. If the source.stale server
is not available (and stale-answer-client-timeout is off), the cached
CNAME should be followed and pick up the fresh RRset (instead of the
stale A RRset).
Variant 3 is similar to variant 2, but this time the CNAME points to
a non-existing name again. After flushing the target, BIND should
return a stale NXDOMAIN (instead of SERVFAIL).
Ondřej Surý [Mon, 23 Feb 2026 06:23:25 +0000 (07:23 +0100)]
new: doc: Provide guidelines for tool-generated content
In the last few years, the capabilities of coding tools have exploded.
As those capabilities have expanded, contributors and maintainers have
more and more questions about how and when to apply those capabilities.
Add new documentation to guide contributors on how to best use BIND 9
development tools, new and old.
In short: Please show your work and make sure your contribution is
easy to review.
This has been adopted from the Linux Kernel guidelines.
Merge branch 'ondrej/clarify-the-use-of-tools' into 'main'
Ondřej Surý [Mon, 12 Jan 2026 10:06:32 +0000 (11:06 +0100)]
Provide guidelines for tool-generated content
In the last few years, the capabilities of coding tools have exploded.
As those capabilities have expanded, contributors and maintainers have
more and more questions about how and when to apply those capabilities.
Add new documentation to guide contributors on how to best use BIND 9
development tools, new and old.
In short: Please show your work and make sure your contribution is
easy to review.
This has been adopted from the Linux Kernel guidelines.
Julia Evans [Fri, 20 Feb 2026 15:43:26 +0000 (10:43 -0500)]
Add examples to the dig man page
The goal here is to help new or infrequent users figure out the most
basic ways to use dig.
Notes on the choice of examples:
* I wrote examples that users can copy and paste exactly as is, without
having to come up with an appropriate IP address or domain name to use.
The one exception is the `dig -x` example which uses an IP from the
example range.
* `dig +noall +answer` here is because learning about `+noall +answer`
was lifechanging for me when I learned about it, I've heard from
others that they find it helpful too, and it's pretty hard to infer
from the man page as is that it might be useful
* I thought about adding `+trace` but left it out because 5 examples was
already starting to feel like a lot.
Štěpán Balážik [Fri, 20 Feb 2026 14:59:10 +0000 (14:59 +0000)]
chg: ci: Rework linting of Python code
With the Python version bumped to 3.10 and the dependency situation cleared with !11415 it is now time to run linters and formatters on more parts of the Python code that was previously skipped or ignored.
Switch configuration of the various Python-adjacent tools to `pyproject.toml` to ensure that the same configuration is used in CI and locally.
See the individual commits for details on settings changed and linters added.
Tweaks to type checking and enabling more `ruff` lints will come in subsequent MRs.
Štěpán Balážik [Mon, 9 Feb 2026 18:22:44 +0000 (19:22 +0100)]
Clean up imports of dnspython modules
Add a pylint plugin that enforces:
- There is no bare `import dns` statement.
- All `dns.<module>` used are explicitly imported.
- There are no unused `dns.<module>` imports.
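
The three rules can be approximated with the standard `ast` module. Note that this is not the pylint plugin from the commit: `check_dns_imports()` is a hypothetical, simplified checker that only handles plain `import dns.<module>` statements and attribute access on the name `dns`.

```python
import ast


def check_dns_imports(source):
    """Return a sorted list of problems found in the given Python source."""
    tree = ast.parse(source)
    imported, used, problems = set(), set(), []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "dns":
                    problems.append("bare 'import dns'")
                elif alias.name.startswith("dns."):
                    imported.add(alias.name.split(".")[1])
        elif (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "dns"
        ):
            used.add(node.attr)  # e.g. "message" in dns.message.make_query
    problems += [f"dns.{m} used but not imported" for m in used - imported]
    problems += [f"dns.{m} imported but unused" for m in imported - used]
    return sorted(problems)
```

Run on a file that imports `dns.message` and `dns.name` but calls `dns.query.udp()`, it reports `dns.name` as unused and `dns.query` as missing.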
Štěpán Balážik [Fri, 20 Feb 2026 14:03:16 +0000 (15:03 +0100)]
Replace Optional["T"] with "T | None"
In Python 3.10 strings don't support the | operator, so ruff doesn't
attempt to fix these. Quote the entire type specification to avoid the
typing.Optional import.
Alternatives I considered:
- leaving it as is (only use of Optional in the code base)
- using `from __future__ import annotations` (replacing one import with
another one)