Ondřej Surý [Fri, 15 May 2026 07:51:02 +0000 (09:51 +0200)]
[9.20] fix: test: Fix flaky reclimit test
The max-types-per-name cache eviction tests were flaky because two test steps were missing a sleep between queries, causing TTL-based cache verification to fail when both queries completed within the same second.
Backport of MR !11782
Merge branch 'backport-ondrej/fix-flaky-reclimit-9.20' into 'bind-9.20'
The cache verification in steps 11 and 15 checks that the TTL has
decreased from its initial value to confirm the response was served
from cache, but the sleep between the two queries was missing. Both
queries could complete within the same second, leaving the TTL
unchanged and causing the test to incorrectly conclude the entry was
not cached.
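The failure mode can be sketched with a toy cache entry (illustrative names, not BIND's actual cache code): because DNS TTLs have one-second granularity, two queries issued within the same second observe an identical remaining TTL, so a "TTL has decreased" check needs a sleep between them.

```python
import time

class CacheEntry:
    """Toy cache entry with second-granularity TTL, mirroring how the
    test infers "served from cache" from a decreasing TTL."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.inserted = time.monotonic()

    def remaining_ttl(self):
        # DNS TTLs are whole seconds, so the visible TTL only drops
        # once at least one full second has elapsed.
        return self.ttl - int(time.monotonic() - self.inserted)

entry = CacheEntry(ttl=3600)
first = entry.remaining_ttl()
second = entry.remaining_ttl()   # same second: TTL looks unchanged
assert first == second           # why the test needed a sleep
time.sleep(1.1)
assert entry.remaining_ttl() < first
```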
Ondřej Surý [Fri, 15 May 2026 06:49:26 +0000 (08:49 +0200)]
[9.20] chg: usr: Fall back to TCP on a UDP response with a mismatched query id
BIND used to wait silently for the correct DNS message id on a UDP fetch
even after receiving a response from the expected server with the wrong
id, leaving room for off-path spoofing attempts to keep guessing within
that window. The resolver now retries the fetch over TCP on the first
such response, and a new MismatchTCP statistics counter tracks how
often the fallback fires.
Closes #5449
Backport of MR !12023
Merge branch 'backport-5449-immediate-tcp-fallback-on-id-mismatch-9.20' into 'bind-9.20'
Ondřej Surý [Thu, 14 May 2026 10:20:19 +0000 (12:20 +0200)]
Switch UDP fetches to TCP on the first response with a wrong query id
Until now, the dispatcher silently dropped UDP responses from the
expected peer that carried the wrong DNS message id and kept listening
for the correct id to arrive within the read timeout. An off-path
attacker who knows the destination address and source port of an
outgoing fetch could exploit that quiet retry window to flood the
resolver with guessed responses; with a gigabit link the per-query
success probability grows linearly with the number of guesses that
arrive before the legitimate answer or the timeout.
Treat any such mismatch as a possible spoofing attempt and let the
resolver immediately retry the same query over TCP, the same control
path the truncation handler already uses.
Add a resolver statistics counter, exposed as 'queries retried over TCP
after a response with mismatched query id' in rndc stats and as
'MismatchTCP' in the statistics channel.
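The policy change above can be sketched as a small decision function (illustrative only, not BIND's dispatcher code): a UDP response from the expected server with the wrong message id now triggers an immediate retry over TCP instead of a silent wait for the correct id.

```python
import enum

class Action(enum.Enum):
    DELIVER = "deliver"
    RETRY_OVER_TCP = "retry-over-tcp"

def handle_udp_response(expected_id, response_id):
    """Sketch of the new behavior: treat a mismatched query id from
    the expected peer as a possible spoofing attempt and fall back to
    TCP, where off-path injection is not feasible."""
    if response_id == expected_id:
        return Action.DELIVER
    return Action.RETRY_OVER_TCP

assert handle_udp_response(0x1A2B, 0x1A2B) is Action.DELIVER
assert handle_udp_response(0x1A2B, 0x0000) is Action.RETRY_OVER_TCP
```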
Mark Andrews [Thu, 14 May 2026 00:00:21 +0000 (10:00 +1000)]
Disable output escaping in bind9.xsl
The statistics charts were not displaying on some browsers (e.g. Chrome)
due to '>' being escaped as '&gt;'. Use disable-output-escaping="yes" to
turn this off.
Ondřej Surý [Thu, 14 May 2026 07:37:52 +0000 (09:37 +0200)]
[9.20] fix: dev: Fix data race during rndc dumpdb or zone load
Running 'rndc dumpdb' against a server with zones, as well as
asynchronous zone loads, had a timing window in which the operation's
completion callback could fire before the server had finished
registering the operation, occasionally leading to a crash. The
completion is now delivered only after the registration is in place.
Closes #5952
Backport of MR !11991
Merge branch 'backport-5952-fix-masterdump-async-ctx-race-9.20' into 'bind-9.20'
Ondřej Surý [Fri, 8 May 2026 05:46:03 +0000 (07:46 +0200)]
Fix data race in async master dump/load context publication
Bouncing the offload itself to the target loop let the after-work
callback fire on the target thread and run the user's done callback
before the calling thread had published *dctxp / *lctxp. Enqueue on
the calling loop and bounce only the done callback instead, so the
publish is sequenced before the cross-thread hand-off by construction
and cannot be reintroduced by reordering the entry-point body.
The global RUNNER_SCRIPT_TIMEOUT: 55m in the parent pipeline was being
forwarded to the stress and tsan:stress child pipelines, where forwarded
yaml variables outrank job-level variables. That caused stress jobs with
BIND_STRESS_TESTS_RUN_TIME >= 60 to be killed at 55 minutes, regardless
of the per-job RUNNER_SCRIPT_TIMEOUT set in the generated child config.
Set forward:yaml_variables: false on both trigger jobs; the generated
configs already declare every variable they need.
Assisted-by: Claude:claude-opus-4-7
Backport of MR !12012
Merge branch 'backport-mnowak/fix-stress-test-script-timeout-9.20' into 'bind-9.20'
Michal Nowak [Wed, 13 May 2026 09:44:26 +0000 (11:44 +0200)]
Selectively inherit yaml vars in stress trigger jobs
The parent's global RUNNER_SCRIPT_TIMEOUT: 55m was reaching the stress
and tsan:stress child pipelines via inherited yaml variables, where
inherited values outrank the child's job-level variables. That caused
stress jobs with BIND_STRESS_TESTS_RUN_TIME >= 60 to be killed at 55
minutes, regardless of the per-job RUNNER_SCRIPT_TIMEOUT set in the
generated child config.
Use inherit:variables with a positive list on both trigger jobs:
inherit only CI_REGISTRY_IMAGE so the parent's registry override
(needed for image pulls in the child) flows through, while keeping
RUNNER_SCRIPT_TIMEOUT (and other globals) out of the child pipeline's
variable scope. The per-job RUNNER_SCRIPT_TIMEOUT values set by the
generated child config now take effect.
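Conceptually, the trigger-job change looks like the fragment below (a sketch with hypothetical job and artifact names, not the actual .gitlab-ci.yml contents): `inherit:variables` with a positive list keeps the parent's global `RUNNER_SCRIPT_TIMEOUT` out of the child pipeline while still passing through `CI_REGISTRY_IMAGE`.

```yaml
# Illustrative sketch; job and artifact names are hypothetical.
stress:
  inherit:
    variables:              # positive list: only this global crosses over
      - CI_REGISTRY_IMAGE
  trigger:
    include:
      - artifact: stress-config.yml
        job: generate-stress-config
```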
Michal Nowak [Wed, 25 Mar 2026 12:31:49 +0000 (13:31 +0100)]
Set RUNNER_SCRIPT_TIMEOUTs
Sometimes jobs can get stuck and be terminated by GitLab, leaving us
without artefacts that could contain useful information about why the
job got stuck.
Ondřej Surý [Tue, 12 May 2026 14:19:05 +0000 (16:19 +0200)]
chg: usr: Limit the number of glue records cached from a referral
When a delegation response contained many glue addresses per listed
nameserver, all of them were cached without a per-nameserver bound,
inflating resolver cache memory beyond what resolution could ever use.
The cache now keeps at most 20 IPv4 and 20 IPv6 glue addresses per
nameserver from a delegation.
Closes #5701
Merge branch '5701-limit-the-number-of-GLUE-records-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 6 May 2026 12:31:19 +0000 (14:31 +0200)]
Cap glue records cached from a referral
The resolver marked every NS RR's glue from a referral for caching with
no aggregate bound, so a parent server returning many NS RRs and many
glue addresses per NS could inflate cache memory long beyond what
resolution can ever use.
Truncate each glue rdataset to DELEG_MAX_GLUES_PER_NS (20) A and 20 AAAA
records before marking it for caching. The NS RRset itself is still
cached in full, bounded by max-records-per-type.
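The truncation policy can be sketched like this (illustrative Python, not BIND's rdataslab code), keeping at most 20 addresses per family per nameserver:

```python
DELEG_MAX_GLUES_PER_NS = 20  # per address family, as described above

def cap_glue(addresses, limit=DELEG_MAX_GLUES_PER_NS):
    """Keep at most `limit` A and `limit` AAAA glue addresses for one
    nameserver before marking the rdatasets for caching."""
    a = [addr for addr in addresses if ":" not in addr][:limit]
    aaaa = [addr for addr in addresses if ":" in addr][:limit]
    return a + aaaa

# 30 IPv4 glue addresses plus one IPv6: only 20 + 1 survive.
glue = [f"192.0.2.{i}" for i in range(1, 31)] + ["2001:db8::1"]
kept = cap_glue(glue)
assert len(kept) == 21
```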
Ondřej Surý [Wed, 6 May 2026 12:30:01 +0000 (14:30 +0200)]
Remove the disabled CHECK_FOR_GLUE_IN_ANSWER code
The CHECK_FOR_GLUE_IN_ANSWER macro defaulted to 0 and was never enabled
by the build system, leaving check_answer() and the answer-section glue
scan in rctx_referral() as dead code. Drop them so the surrounding
referral-cache path is easier to reason about.
Michał Kępień [Mon, 11 May 2026 15:46:30 +0000 (17:46 +0200)]
[9.20] chg: ci: Add commit link and diff to RPM build job logs
The output of update_rpms.py is terse, making it difficult to verify its
actions. Add a commit link and "git show" output to the log of every CI
job running the update_rpms.py script in "build" mode to facilitate
double-checking its actions.
Backport of MR !11828
Merge branch 'backport-michal/add-commit-link-and-diff-to-rpm-build-job-logs-9.20' into 'bind-9.20'
Michał Kępień [Mon, 11 May 2026 15:41:50 +0000 (17:41 +0200)]
Add commit link and diff to RPM build job logs
The output of update_rpms.py is terse, making it difficult to verify its
actions. Add a commit link and "git show" output to the log of every CI
job running the update_rpms.py script in "build" mode to facilitate
double-checking its actions.
Michal Nowak [Mon, 11 May 2026 15:20:25 +0000 (17:20 +0200)]
[9.20] new: test: Add isctest.transfer.transfer_message() helper and convert tests
Add a new helper function, `isctest.transfer.transfer_message()`, to
`bin/tests/system/isctest/transfer.py` that generates the log message
produced by `xfrin_log()` in `lib/dns/xfrin.c` for an incoming zone
transfer:
transfer of '<zone>/IN' from <source_ns>#<port>: <msg>
The explicit use of `port` matches current shell system usage.
- zone - zone name without class (e.g. "example.com")
- source_ns - IP string, or None to wildcard the source address
- msg - the transfer-level message
(e.g. "Transfer status: success")
- port - integer source port, or None to wildcard the port number
When both source_ns and port are concrete values a plain str is returned
and `wait_for_line()` treats it as a literal substring match. Whenever
either is `None` a compiled `re.Pattern` is returned, with the unknown part
replaced by a constrained wildcard:
- source_ns=None, port=None -> from .*#[0-9]+:
- source_ns=None, port=53 -> from .*#53:
- source_ns="1.2.3.4", port=None -> from 1.2.3.4#[0-9]+:
- source_ns="1.2.3.4", port=N -> "from 1.2.3.4#N:" (plain str)
The port wildcard is [0-9]+ (not .*) because a port is always numeric.
Convert all hard-coded transfer log patterns in the Python system tests
to use transfer_message().
Notable cases:
- `mirror_root_zone`: source_ns=None (live internet, any root server),
port=53.
- `cipher_suites`: source_ns="10.53.0.1", port=None (each zone transfers
over a different TLS port).
- `test_under_signed_transfer`: parametrize gains a boolean xfrin_msg
flag to distinguish messages that go through xfrin_log() from
lower-level TSIG errors that do not.
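A minimal sketch of the helper described above (the real implementation lives in bin/tests/system/isctest/transfer.py and handles more cases): both arguments concrete yields a plain substring, any None yields a compiled pattern with a constrained wildcard.

```python
import re

def transfer_message(zone, source_ns, msg, port):
    """Sketch: build the xfrin_log() line
    "transfer of '<zone>/IN' from <source_ns>#<port>: <msg>",
    returning a plain str when both source_ns and port are concrete,
    otherwise a compiled re.Pattern with wildcards."""
    if source_ns is not None and port is not None:
        # plain substring: wait_for_line() matches it literally
        return f"transfer of '{zone}/IN' from {source_ns}#{port}: {msg}"
    src = re.escape(source_ns) if source_ns is not None else ".*"
    prt = str(port) if port is not None else "[0-9]+"  # ports are numeric
    return re.compile(
        rf"transfer of '{re.escape(zone)}/IN' from {src}#{prt}: {re.escape(msg)}"
    )

literal = transfer_message("example.com", "10.53.0.1",
                           "Transfer status: success", 5300)
assert isinstance(literal, str)

pattern = transfer_message("example.com", None,
                           "Transfer status: success", 53)
assert pattern.search(
    "transfer of 'example.com/IN' from 10.53.0.2#53: Transfer status: success"
)
```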
Testing
-------
All system tests pass under `pytest -n auto`. The `mirror_root_zone`
live-internet test was also verified separately with
`CI_ENABLE_LIVE_INTERNET_TESTS=1`.
LLM usage
---------
This commit was produced in an interactive session with Claude Code
(Claude Sonnet 4.6), guided step by step by a human reviewer.
Closes #5735
Backport of MR !11796
Merge branch 'backport-5735-make-transfer-message-formatter-9.20' into 'bind-9.20'
Michal Nowak [Mon, 11 May 2026 11:24:22 +0000 (13:24 +0200)]
Add isctest.transfer.transfer_message() helper and convert tests
Add a new helper function, isctest.transfer.transfer_message(), to
bin/tests/system/isctest/transfer.py that generates the log message
produced by xfrin_log() in lib/dns/xfrin.c for an incoming zone
transfer:
transfer of '<zone>/IN' from <source_ns>#<port>: <msg>
The helper always returns a compiled re.Pattern. source_ns and port
each accept None to match any source address / port. msg accepts
either a plain str (regex-escaped automatically) or a compiled
re.Pattern (spliced into the regex as-is), so callers that need regex
syntax in the message part can pass Re(r"...") without having to
wrap the whole result.
source_ns is passed through re.escape() when provided, so dots in
IPv4 addresses (e.g. "10.53.0.1") match a literal dot rather than
any character.
Convert the existing call sites across the system tests to use the
new helper.
Michał Kępień [Mon, 11 May 2026 14:26:03 +0000 (16:26 +0200)]
[9.20] fix: ci: Increase GIT_DEPTH for the "assign-milestones" job
Cloning tags with the default GIT_DEPTH of 1 prevents the milestone
assignment script from identifying any merge requests that are included
in a given release. Fix by increasing GIT_DEPTH to an arbitrary value
that is high enough for practical purposes.
The GIT_DEPTH CI variable defaults to 1 for all jobs through the
top-level "variables" key. Explicitly setting it to 1 in job
definitions is unnecessary and may cause confusion. Remove these
redundant assignments.
Backport of MR !11996
Merge branch 'backport-michal/fix-assign-milestones-job-9.20' into 'bind-9.20'
Michał Kępień [Mon, 11 May 2026 14:07:47 +0000 (16:07 +0200)]
Remove redundant "GIT_DEPTH: 1" assignments
The GIT_DEPTH CI variable defaults to 1 for all jobs through the
top-level "variables" key. Explicitly setting it to 1 in job
definitions is unnecessary and may cause confusion. Remove these
redundant assignments.
Michał Kępień [Mon, 11 May 2026 14:07:47 +0000 (16:07 +0200)]
Increase GIT_DEPTH for the "assign-milestones" job
Cloning tags with the default GIT_DEPTH of 1 prevents the milestone
assignment script from identifying any merge requests that are included
in a given release. Fix by increasing GIT_DEPTH to an arbitrary value
that is high enough for practical purposes.
Michał Kępień [Mon, 11 May 2026 08:14:11 +0000 (10:14 +0200)]
[9.20] fix: ci: Fix triggering rules for the "publish-cleanup" job
The "publish-cleanup" tag pipeline job is currently created for all
security releases, including BIND -S releases, but it depends on the
"publish" job, which is only created for open source releases. This
breaks CI configuration for BIND -S tags, preventing pipelines from
getting created for such tags altogether. Fix by only creating the
"publish-cleanup" job in tag pipelines for open source security
releases.
Backport of MR !11992
Merge branch 'backport-michal/fix-triggering-rules-for-the-publish-cleanup-job-9.20' into 'bind-9.20'
Michał Kępień [Mon, 11 May 2026 08:07:38 +0000 (10:07 +0200)]
Fix triggering rules for the "publish-cleanup" job
The "publish-cleanup" tag pipeline job is currently created for all
security releases, including BIND -S releases, but it depends on the
"publish" job, which is only created for open source releases. This
breaks CI configuration for BIND -S tags, preventing pipelines from
getting created for such tags altogether. Fix by only creating the
"publish-cleanup" job in tag pipelines for open source security
releases.
Michał Kępień [Thu, 7 May 2026 16:08:33 +0000 (18:08 +0200)]
[9.20] chg: ci: Mark merged security fixes as "Not released yet"
Adjust the triggering rules for the "merged-metadata" CI job so that
merge requests merged into security-* branches are automatically
assigned to the "Not released yet" milestone, just like merge requests
targeting public branches. This enables merge requests containing
security fixes to be correctly processed by release automation scripts.
Backport of MR !11984
Merge branch 'backport-pspacek/extend-not-released-yet-milestone-9.20' into 'bind-9.20'
Petr Špaček [Tue, 5 May 2026 13:04:36 +0000 (15:04 +0200)]
Mark merged security fixes as "Not released yet"
Adjust the triggering rules for the "merged-metadata" CI job so that
merge requests merged into security-* branches are automatically
assigned to the "Not released yet" milestone, just like merge requests
targeting public branches. This enables merge requests containing
security fixes to be correctly processed by release automation scripts.
Michał Kępień [Thu, 7 May 2026 15:55:28 +0000 (17:55 +0200)]
[9.20] chg: ci: Enable automatic backports for security fixes
Ensure the "backports" CI job is created when new changes are merged
into security-* branches. This enables using backport automation for
security fixes.
Backport of MR !11938
Merge branch 'backport-michal/extend-automatic-backports-9.20' into 'bind-9.20'
Michał Kępień [Thu, 7 May 2026 15:45:35 +0000 (17:45 +0200)]
Enable automatic backports for security fixes
Ensure the "backports" CI job is created when new changes are merged
into security-* branches. This enables using backport automation for
security fixes.
Mark Andrews [Thu, 7 May 2026 02:01:04 +0000 (12:01 +1000)]
[9.20] fix: dev: Check validator name when adding EDE text
When a validator is being shut down, the associated name
`val->name` is set to NULL. This could cause a crash if a worker
thread subsequently added an EDE code with `val->name` in the
extra text.
`validator_addede()` now checks whether the name is NULL before
trying to add it to the extra text.
Closes #5613
Backport of MR !11945
Merge branch 'backport-each-validator-log-after-shutdown-9.20' into 'bind-9.20'
Evan Hunt [Fri, 1 May 2026 18:12:54 +0000 (11:12 -0700)]
check for val->name == NULL when adding EDE text
When a validator is being shut down, the associated name
`val->name` is set to NULL. This could cause a crash if a worker
thread subsequently added an EDE code to the response containing
val->name in the extra text.
`validator_addede()` now checks whether the name is NULL before
trying to add it to the extra text.
Aram Sargsyan [Wed, 6 May 2026 21:02:40 +0000 (21:02 +0000)]
[9.20] fix: usr: Fix a bug in allow-query/allow-transfer catalog zone custom properties
The :iscman:`named` process could terminate unexpectedly when
processing a catalog zone with an invalid ``allow-query`` or
``allow-transfer`` custom property (i.e. having a non-APL type)
coexisting with the valid property. This has been fixed.
Closes #5941
Backport of MR !11954
Merge branch 'backport-5941-catz-catz_process_apl-bug-fix-9.20' into 'bind-9.20'
Aram Sargsyan [Mon, 4 May 2026 22:34:01 +0000 (22:34 +0000)]
Fix a bug in catz_process_apl()
The allow-transfer/allow-query catalog zone custom properties support
only APL RRtypes. All other types are correctly rejected by the
catz_process_apl() function. However, after an APL RRtype has been
processed by that function, an attempt to process another (non-APL)
RRtype triggers an assertion failure in the function's prologue,
because `*aclbp != NULL` (i.e. an APL has already been processed).
Move the type-checking code before the affected REQUIRE assertion.
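The ordering fix can be sketched as follows (illustrative Python, not the C code): rejecting non-APL types before the REQUIRE-style invariant means a valid APL followed by a bogus type yields a clean error instead of an assertion failure.

```python
def process_apl(aclb, rdata_type, rdata):
    """Sketch of catz_process_apl() after the fix: the type check
    runs before the "no APL processed yet" invariant."""
    if rdata_type != "APL":          # type check moved up front
        return "failure: unexpected type"
    assert aclb["acl"] is None       # the REQUIRE-style invariant
    aclb["acl"] = rdata
    return "success"

aclb = {"acl": None}
assert process_apl(aclb, "APL", "1:10.0.0.0/8") == "success"
# Previously this second call hit the assertion; now it fails cleanly.
assert process_apl(aclb, "TXT", "bogus") == "failure: unexpected type"
```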
Aram Sargsyan [Wed, 6 May 2026 19:35:18 +0000 (19:35 +0000)]
[9.20] fix: usr: Fix a memory leak issue in the catalog zones
The :iscman:`named` process could leak small amounts of memory
when processing a catalog zone entry that defined custom
primary servers with TSIG keys using both the regular ``primaries``
custom property syntax and the legacy alternative syntax (``masters``)
at the same time. This has been fixed.
Closes #5943
Backport of MR !11951
Merge branch 'backport-5943-catz-primaries-tsig-key-name-leak-fix-9.20' into 'bind-9.20'
Aram Sargsyan [Wed, 6 May 2026 14:36:43 +0000 (14:36 +0000)]
[9.20] fix: dev: Make BIND9 compatible with OpenSSL 4
OPENSSL_cleanup() in OpenSSL 4 doesn't free the memory, and that is
not compatible with BIND 9's memory leak detection code. Don't use
custom allocation/deallocation functions for OpenSSL's internal memory
management.
See https://github.com/openssl/openssl/pull/29721
Closes #5808
Backport of MR !11865
Merge branch 'backport-5808-openssl4-compat-fix-9.20' into 'bind-9.20'
Remove OpenSSL memory tracking support from the tls.c module
OPENSSL_cleanup() in OpenSSL 4 doesn't free the memory, and that is
not compatible with BIND 9's memory leak detection code. Don't use
custom allocation/deallocation functions for OpenSSL's internal memory
management in the tls.c module.
The resolver can and will reuse outgoing TCP connections to the same host, as recommended by RFC 7766. This prevents a whole class of attacks that abuse the fact that establishing a TCP connection is expensive, and that it is fairly easy to deplete the pool of outgoing TCP ports by putting them into the TIME_WAIT state.
The number of pipelined queries per connection is capped at 256 to limit the impact of a connection drop.
Backport of MR !11845
Merge branch 'backport-3741-reuse-tcp-connections-9.20' into 'bind-9.20'
Include disptype and transport in dispatch hash key
Move disptype and transport into dispatch_hash() and dispatch_match()
so that the match function is the single source of truth for whether
two TCP dispatches are interchangeable. This replaces the post-loop
disptype filter in dispatch_gettcp() and makes the disptype field in
struct dispatch_key actually used.
Ondřej Surý [Sun, 15 Mar 2026 06:52:34 +0000 (07:52 +0100)]
Use sequential per-dispatch message IDs for TCP
TCP dispentries no longer use the global QID hash table at all.
Responses are matched by scanning disp->active, and sequential
per-dispatch IDs (bounded by the pipelining limit) are unique
within a single dispatch by construction. Since TCP delivers
only data we asked for on a specific connection, the per-peer
uniqueness that the global table enforced was never actually
needed for TCP.
DNS_DISPATCHOPT_FIXEDID is plumbed through dns_request_createraw
-> get_dispatch -> dns_dispatch_createtcp so FIXEDID TCP requests
always get a fresh isolated dispatch — the caller-supplied ID
then cannot collide with any other in-flight query either.
Ondřej Surý [Sun, 15 Mar 2026 06:23:33 +0000 (07:23 +0100)]
Limit TCP pipelining per shared dispatch
Cap the number of in-flight queries on a single shared TCP dispatch.
When the limit is reached, the dispatch is removed from the hash
table so subsequent queries get a fresh connection. The existing
dispatch continues serving its queries until they complete.
This bounds the blast radius of a connection drop: at most N queries
fail simultaneously instead of all queries to that server.
The default limit is 256. It can be overridden for testing via
'named -T tcppipelining=N'.
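The sharing policy can be sketched like this (illustrative Python, not dns_dispatch): the shared dispatch is reused until it carries the limit of in-flight queries, then removed from the table so the next query opens a fresh connection while the full one drains on its own.

```python
TCP_PIPELINING_LIMIT = 256  # default described above

def pick_dispatch(shared, key, limit=TCP_PIPELINING_LIMIT):
    """Sketch: hand out the shared dispatch for `key` until it has
    `limit` in-flight queries, then stop sharing it and create a
    fresh one, bounding the blast radius of a connection drop."""
    disp = shared.get(key)
    if disp is None or disp["inflight"] >= limit:
        if disp is not None:
            del shared[key]          # stop handing it to new queries
        disp = {"inflight": 0}
        shared[key] = disp
    disp["inflight"] += 1
    return disp

shared = {}
first = pick_dispatch(shared, "192.0.2.1#53", limit=2)
assert pick_dispatch(shared, "192.0.2.1#53", limit=2) is first
third = pick_dispatch(shared, "192.0.2.1#53", limit=2)
assert third is not first            # cap reached: fresh connection
```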
Ondřej Surý [Sun, 15 Mar 2026 07:57:26 +0000 (08:57 +0100)]
Disable TCP pipelining in tcp and masterformat system test
Set tcppipelining=1 on recursive servers in the system tests to
restore one-query-per-connection behavior. The tests relies on
specific connection and query counting that breaks with TCP
connection sharing.
Ondřej Surý [Tue, 17 Feb 2026 10:05:33 +0000 (11:05 +0100)]
Implement seamless TCP connection reuse in dns_dispatch
Previously, the user of dns_dispatch API had to first call
dns_dispatch_gettcp() and if that failed create a new TCP dispatch with
dns_dispatch_createtcp(). This has been changed and the TCP connection
reuse happens transparently inside dns_dispatch_createtcp(). There are
separate buckets for dns_resolver, dns_request and dns_xfrin units, so
these don't get mixed together.
Ondřej Surý [Wed, 6 May 2026 07:33:18 +0000 (09:33 +0200)]
[9.20] fix: usr: Fix a crash when reconfiguring while an NTA is being rechecked
When named was reconfigured or shut down while a negative trust anchor
was being rechecked against authoritative servers, the in-flight recheck
could outlive the view that owned it and cause `named` to crash. This
has been fixed.
Closes #5938
Backport of MR !11948
Merge branch 'backport-5938-ref-ntatable-9.20' into 'bind-9.20'
Evan Hunt [Mon, 4 May 2026 07:05:27 +0000 (00:05 -0700)]
Hold a reference to the NTA table for the lifetime of each NTA
Each dns__nta_t now references its parent ntatable in nta_create() and
releases it in dns__nta_destroy(). This avoids a use-after-free in
fetch_done() and other callbacks that dereference nta->ntatable: the
ntatable could otherwise be released by view destruction while an
in-flight resolver fetch still holds a reference to the NTA.
Ondřej Surý [Wed, 6 May 2026 05:53:51 +0000 (07:53 +0200)]
[9.20] fix: usr: Prevent a crash when using both dns64 and filter-aaaa
An assertion failure could be triggered if both `dns64` and the `filter-aaaa` plugin were in use simultaneously. This happened if the plugin triggered a second recursion process, which then attempted to store DNS64 state information in a pointer that had already been set by the original recursion process. This has been fixed.
Closes #5854
Backport of MR !11949
Merge branch 'backport-5854-dns64-aaaaok-9.20' into 'bind-9.20'
Evan Hunt [Mon, 4 May 2026 05:00:39 +0000 (22:00 -0700)]
Clear dns64_aaaaok immediately after use
The DNS64 state information stored in client->query.dns64_aaaaok
could cause an assertion failure in query_respond() if the server
was configured in such a way as to trigger a new recursion before
the query had been reset - for example, by using the filter-aaaa
plugin, which may need to recurse to find out whether an A record
exists.
This has been addressed by clearing DNS64 state information
immediately after the call to query_filter64().
Evan Hunt [Wed, 6 May 2026 00:01:08 +0000 (00:01 +0000)]
[9.20] fix: dev: Fix a stack use-after-free in qpzone
In previous_closest_nsec(), a new qpreader was opened to search the NSEC
tree. It was possible for that to be used to update a QP iterator object
owned by the caller, and then be destroyed when the function returned.
This has been addressed by having the caller open the NSEC qpreader
instead.
Closes #5942
Merge branch '5942-qpiter-fix-bind-9.20' into 'bind-9.20'
Evan Hunt [Mon, 4 May 2026 23:23:42 +0000 (16:23 -0700)]
Fix a stack use-after-free in qpzone
In previous_closest_nsec(), a new qpreader was opened to search the NSEC
tree. It was possible for that to be used to update a QP iterator object
owned by the caller, and then be destroyed when the function returned.
This has been addressed by having the caller open the NSEC qpreader
instead.
Ondřej Surý [Tue, 5 May 2026 19:18:27 +0000 (21:18 +0200)]
[9.20] fix: usr: Reject record sets too large to serve in DNS
When BIND was asked to store a record set whose total size exceeds
what fits in a DNS message, it would allocate memory and build the
structure, then fail later at response time. Such oversized record
sets are now rejected at the time of storage with an error, avoiding
wasted work on data that can never be served.
Backport of MR !11963
Merge branch 'backport-ondrej/harden-buflen-overflow-9.20' into 'bind-9.20'
dns_rdataslab_fromrdataset(), dns_rdataslab_merge() and
dns_rdataslab_subtract() summed per-record storage into an
unsigned int with no upper-bound check. An RRset whose total
encoded size exceeds DNS_RDATA_MAXLENGTH cannot fit in a DNS
message and is unservable; building its in-memory representation
only burns memory on data that will fail at response time, and at
the upper bound the running sum could in theory wrap.
Cap the running total at DNS_RDATA_MAXLENGTH and return ISC_R_NOSPACE
when exceeded. Update the qpdb cache memory-purge test to use a
record size that fits within the new limit.
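The bounded summation can be sketched as follows (illustrative Python, not the C code): cap the running total at the largest size a DNS message can carry and report "no space" instead of building an RRset that can never be served.

```python
DNS_RDATA_MAXLENGTH = 65535  # largest size a DNS message can carry

def sum_slab_size(record_sizes, limit=DNS_RDATA_MAXLENGTH):
    """Sketch: sum per-record storage with an upper-bound check,
    so an unservable, oversized RRset is rejected up front and the
    running total can never wrap."""
    total = 0
    for size in record_sizes:
        total += size
        if total > limit:
            return None, "ISC_R_NOSPACE"
    return total, "ISC_R_SUCCESS"

assert sum_slab_size([1000] * 60) == (60000, "ISC_R_SUCCESS")
assert sum_slab_size([1000] * 70)[1] == "ISC_R_NOSPACE"
```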
Ondřej Surý [Tue, 5 May 2026 12:24:06 +0000 (14:24 +0200)]
[9.20] rem: dev: Remove obsolete KEY record EXTENDED flag deprecated by RFC 3445
KEY resource records originally defined an EXTENDED flag that was removed
by RFC 3445 back in 2002. BIND still carried code to parse and emit it,
including the additional two-octet flags field that followed when the
EXTENDED bit was set. That handling has been removed and the affected
bit positions are now reserved.
Dropping the extended-flags handling also eliminates a possible crash
that could be reached when signing a zone containing an invalid key.
Closes #5900
Partial backport of MR !11961
Merge branch 'backport-5900-remove-keyflag-extended-9.20' into 'bind-9.20'
The DNS_KEYFLAG_EXTENDED flag was only legitimate for type KEY
and was eliminated by RFC 3445. Dropping the extended-flags
handling in pub_compare() also fixes a possible crash when
signing a zone whose journal contains a crafted DNSKEY: a
6-byte record with the EXTENDED bit set produced a memmove()
length that underflowed and ran off a stack buffer.
Ondřej Surý [Tue, 5 May 2026 06:20:19 +0000 (08:20 +0200)]
[9.20] fix: dev: Tidy up the cleanup path in check_signer()
When check_signer() processed a DNSKEY whose public-key data could not
be parsed, the early return on the parse error skipped the cleanup of
the cloned signature rdataset. In every code path that currently
reaches this function the cloned rdataset holds no resources, so no
memory was actually leaked, but the cleanup is restructured so the
parse and the iteration cannot diverge again.
Closes #5869
Merge branch '5869-fix-memory-leak-in-check_signer-9.20' into 'bind-9.20'
The cloned signature rdataset was not disassociated on the early
return taken when dns_dnssec_keyfromrdata() fails to parse the DNSKEY
public-key data. In every current caller val->sigrdataset reaches
check_signer() rdatalist-backed, so dns_rdataset_clone() copies the
struct without taking any reference and dns_rdataset_disassociate()
is a no-op -- no memory is actually leaked today. Hoist the key
parse out of the per-RRSIG loop and let the function fall through
to a single cleanup path, so the parse and the iteration cannot
diverge again.
Ondřej Surý [Tue, 5 May 2026 05:07:28 +0000 (07:07 +0200)]
[9.20] fix: usr: Prevent crafted queries from degrading RRL performance
With response rate limiting enabled, an attacker sending queries from many
spoofed source addresses could steer entries into the same slot of the
internal rate-limit table and slow down query processing on the affected
server. The table now uses a per-process keyed hash so the placement of
entries cannot be predicted or influenced from the network.
Closes #5906
Backport of MR !11950
Merge branch 'backport-5906-rrl-hash-collision-dos-9.20' into 'bind-9.20'
The previous hash_key() was a deterministic, unkeyed (<<1) + add over the
key words. An off-path attacker could invert it offline and submit
queries whose source /24, qname hash, and qtype map to a single bucket;
under chaining this turns every lookup into an O(N) walk under
rrl->lock and starves legitimate query processing on the very feature
deployed to mitigate DoS.
Replace it with isc_hash32(), which is HalfSipHash-2-4 keyed by a
per-process random seed, so collision sets cannot be precomputed.
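The difference can be sketched in a few lines (illustrative Python; the real fix uses HalfSipHash-2-4 via isc_hash32): the unkeyed scheme lets anyone compute colliding inputs offline, while a per-process seed makes bucket placement unpredictable across processes.

```python
import random

def unkeyed_hash(words):
    """The weak scheme described above: a deterministic
    (h << 1) + word fold, invertible offline by an attacker."""
    h = 0
    for w in words:
        h = ((h << 1) + w) & 0xFFFFFFFF
    return h

SEED = random.getrandbits(32)  # per-process, as with isc_hash32()

def keyed_hash(words, seed=SEED):
    """Stand-in for a seeded hash (FNV-style mixing here, purely for
    illustration): the same inputs hash differently in every process,
    so collision sets cannot be precomputed."""
    h = seed
    for w in words:
        h = (h * 0x01000193 ^ w) & 0xFFFFFFFF
    return h

# Distinct inputs, same bucket, found with no secret knowledge:
assert unkeyed_hash([2, 0]) == unkeyed_hash([1, 2])  # both fold to 4
assert 0 <= keyed_hash([2, 0]) <= 0xFFFFFFFF
```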
Michał Kępień [Thu, 30 Apr 2026 20:38:25 +0000 (22:38 +0200)]
[9.20] fix: ci: Use "git push --force-with-lease" for autorebases
If a merge request is merged to an autorebased branch while it is
getting rebased, the "git push -f" command at the end of the autorebase
job will cause the contents of that merge request to be silently deleted
from Git history even though the merge request will still be (correctly)
shown as "merged" by GitLab.
Use "git push --force-with-lease" instead to prevent force-pushing the
rebased version of the branch if it is pushed to after its pre-rebase
version is fetched by the autorebase job. Report such an event
accordingly. For simplicity, no retries are attempted as the problem is
expected to be resolved by the next autorebase and the chances of this
scenario happening in practice are already low to begin with.
Backport of MR !11939
Merge branch 'backport-michal/use-git-push-force-with-lease-for-autorebases-9.20' into 'bind-9.20'
Michał Kępień [Thu, 30 Apr 2026 20:19:59 +0000 (22:19 +0200)]
Use "git push --force-with-lease" for autorebases
If a merge request is merged to an autorebased branch while it is
getting rebased, the "git push -f" command at the end of the autorebase
job will cause the contents of that merge request to be silently deleted
from Git history even though the merge request will still be (correctly)
shown as "merged" by GitLab.
Use "git push --force-with-lease" instead to prevent force-pushing the
rebased version of the branch if it is pushed to after its pre-rebase
version is fetched by the autorebase job. Report such an event
accordingly. For simplicity, no retries are attempted as the problem is
expected to be resolved by the next autorebase and the chances of this
scenario happening in practice are already low to begin with.
Michał Kępień [Thu, 30 Apr 2026 11:27:56 +0000 (13:27 +0200)]
[9.20] new: ci: Set up automatic rebasing for security-* branches
Introduce a set of private branches containing only security fixes that
are automatically rebased onto the corresponding open source branches
whenever new changes are merged. Each rebase triggers a basic build,
failing the CI job if the build breaks.
When a security-* branch is rebased, create a CI pipeline for its new
revision and rebase its corresponding bind-9.x-sub branch (if it exists)
on top of it, creating a rebase chain.
Report any failures in the process via Mattermost.
These changes enable treating security fixes similarly to other code
changes, without deferring merges all the way until release prep.
Backport of MR !11930
Merge branch 'backport-michal/autorebase-chain-9.20' into 'bind-9.20'
Michał Kępień [Thu, 30 Apr 2026 09:58:55 +0000 (11:58 +0200)]
Set up automatic rebasing for security-* branches
Introduce a set of private branches containing only security fixes that
are automatically rebased onto the corresponding open source branches
whenever new changes are merged. Each rebase triggers a basic build,
failing the CI job if the build breaks.
When a security-* branch is rebased, create a CI pipeline for its new
revision and rebase its corresponding bind-9.x-sub branch (if it exists)
on top of it, creating a rebase chain.
Report any failures in the process via Mattermost.
These changes enable treating security fixes similarly to other code
changes, without deferring merges all the way until release prep.
[9.20] fix: usr: prevent malicious DNSSEC zones from exhausting validator CPU
A DNSSEC-signed zone could publish a DNSKEY with an unusually large
RSA public exponent and force any validator resolving names in that
zone to spend disproportionate CPU verifying signatures. The
validator now rejects such DNSKEYs, matching the limit already
applied to keys read from files or HSMs.
Closes #5881
Backport of MR !11917
Merge branch 'backport-5881-rsa-exponent-keytrap-cpu-amplification-9.20' into 'bind-9.20'
Reject RSA DNSKEYs with oversize public exponents at parse time
The wire-format RSA DNSKEY parser was the only key path with no upper
bound on the public exponent — opensslrsa_parse and opensslrsa_fromlabel
already cap at RSA_MAX_PUBEXP_BITS. An attacker-controlled DNSKEY could
therefore force a validator to compute s^e mod n with e up to ~|n| bits,
amplifying every verify by ~120x for typical 2048-bit moduli (OpenSSL
itself only caps the exponent for moduli above 3072 bits). Apply the
same bit-count cap to wire-format keys.
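The bit-count cap described above can be sketched in a few lines; bit_length() stands in for OpenSSL's BN_num_bits(), and the 35-bit limit is an assumption mirroring the RSA_MAX_PUBEXP_BITS constant the commit names (comfortably large enough for F4 = 65537):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of RSA_MAX_PUBEXP_BITS; the real constant lives in
 * BIND's OpenSSL glue. 35 bits still admits F4 = 65537 (17 bits). */
#define RSA_MAX_PUBEXP_BITS 35

/* Stand-in for BN_num_bits(): position of the highest set bit. */
static unsigned int
bit_length(uint64_t e) {
	unsigned int bits = 0;
	while (e != 0) {
		bits++;
		e >>= 1;
	}
	return bits;
}

/* The check the wire-format parser was missing: reject any public
 * exponent wider than the cap before it can reach verification. */
static bool
exponent_within_cap(uint64_t e) {
	return bit_length(e) <= RSA_MAX_PUBEXP_BITS;
}
```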
[9.20] fix: usr: Stop delv from aborting on a malformed query name
delv aborts with SIGABRT instead of exiting cleanly when given a query
name that fails wire-format conversion (e.g. a label longer than 63
octets). After this change delv prints the parse error and exits with
a normal failure code.
Closes #5916
Backport of MR !11921
Merge branch 'backport-5916-delv-run-resolve-null-detach-abort-9.20' into 'bind-9.20'
run_resolve allocates dns_client_t late, but the cleanup epilogue
called dns_client_detach() unconditionally. When convert_name() or
dns_client_create() failed first, the detach hit a NULL client and
the REQUIRE(DNS_CLIENT_VALID) inside it aborted the process with
SIGABRT instead of a clean error exit.
Guard the detach with a NULL check. Add a digdelv test that runs
delv on a query name whose first label exceeds 63 octets and
asserts the process does not exit 134.
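The guarded-detach pattern can be modeled with a mock client; mock_client and its magic-number check are illustrative stand-ins for dns_client_t and REQUIRE(DNS_CLIENT_VALID), not the real API:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for dns_client_t; real BIND objects carry a
 * magic number that REQUIRE(DNS_CLIENT_VALID) checks on every call. */
#define CLIENT_MAGIC 0x434c4e54 /* "CLNT" */
typedef struct {
	unsigned int magic;
} mock_client;

static int detach_calls = 0;

static void
mock_client_detach(mock_client **clientp) {
	/* Models the REQUIRE() that aborted delv when *clientp was NULL. */
	assert(*clientp != NULL && (*clientp)->magic == CLIENT_MAGIC);
	free(*clientp);
	*clientp = NULL;
	detach_calls++;
}

/* The fixed epilogue: detach only if the client was actually created,
 * so early failures in name conversion or client creation exit cleanly. */
static void
cleanup(mock_client **clientp) {
	if (*clientp != NULL) {
		mock_client_detach(clientp);
	}
}
```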
[9.20] fix: usr: prevent rare named crash when notifies are cancelled
Under heavy load, named could occasionally crash when a queued
outbound notify or zone refresh was cancelled at the moment it
was being sent — for example, while a zone was being reloaded or
removed. The race that caused the crash is now prevented.
Closes #5915
Backport of MR !11918
Merge branch 'backport-5915-ratelimiter-dequeue-tick-uaf-9.20' into 'bind-9.20'
isc__ratelimiter_tick() and isc_ratelimiter_shutdown() each pulled
events out of rl->pending into a function-local list, dropped the
mutex, and then iterated. ISC_LIST_APPEND leaves the link in the
LINKED state, so a concurrent isc_ratelimiter_dequeue() saw an
event as still queued, called ISC_LIST_UNLINK against rl->pending —
which patched the prev/next of the local list — and freed the
event before dispatch finished, producing either an INSIST in the
unlink macro or a use-after-free in the dispatch loop.
isc_async_run() is a non-blocking wfcq enqueue, so there is no
benefit to dropping the mutex around it. Unlink each event and
hand it to isc_async_run() while still holding rl->lock; the
existing ISC_LINK_LINKED check in dequeue then correctly
distinguishes "still queued and cancellable" from "already taken".
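The ownership confusion can be modeled single-threaded; the struct and flags below are illustrative sketches, with `linked` standing in for what ISC_LINK_LINKED() inspects:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative event: "linked" is what ISC_LINK_LINKED() inspects,
 * "taken" marks that tick has claimed the event for dispatch. */
typedef struct {
	bool linked;
	bool taken;
} mock_event;

/* Buggy drain: ISC_LIST_APPEND to a function-local list left the
 * link in the LINKED state, so the event still looked cancellable. */
static void
drain_to_local_list(mock_event *ev) {
	ev->taken = true;
	/* ev->linked stays true: the source of the race */
}

/* Fixed drain: unlink while still holding rl->lock, then hand the
 * event to isc_async_run(); dequeue now sees it as already taken. */
static void
drain_and_unlink(mock_event *ev) {
	ev->linked = false; /* ISC_LIST_UNLINK under the lock */
	ev->taken = true;
}

/* dequeue may only cancel (and free) an event it still sees queued. */
static bool
dequeue_would_cancel(const mock_event *ev) {
	return ev->linked; /* the existing ISC_LINK_LINKED check */
}
```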
[9.20] fix: dev: free per-command rndc state when response serialisation fails
When isccc_cc_towire failed while building an rndc reply,
control_respond returned without releasing the per-command request,
response, HMAC secret copy, and text buffer. They were eventually
freed when the connection closed, but until then the HMAC key copy
stayed in named's memory. The failure path now goes through the
same cleanup label as every other error.
Closes #5913
Backport of MR !11915
Merge branch 'backport-5913-controlconf-control-respond-cleanup-leak-9.20' into 'bind-9.20'
Run conn_cleanup on isccc_cc_towire failure in control_respond
The bare return left conn->secret, conn->response, conn->request, and
conn->text pinned until the connection itself was torn down — every
other error in the function reaches conn_cleanup via goto, and the
success path falls into the same label, so the towire-failure return
was the lone outlier. Send it through the existing cleanup path.
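The single-cleanup-label convention reads roughly as follows; respond() and its one resource are a sketch, not the real control_respond():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the convention: every exit path, success or failure,
 * funnels through one cleanup label so per-command resources (the
 * request, response, HMAC secret copy, text buffer) are always freed. */
static int
respond(bool fail_towire, int *secret_freed) {
	int result = 0;
	unsigned char *secret = malloc(32); /* stands in for conn->secret */
	assert(secret != NULL);
	memset(secret, 0xaa, 32);

	if (fail_towire) {
		result = -1;
		goto cleanup; /* was a bare "return": the leak */
	}
	/* ... serialize and send the reply ... */

cleanup:
	memset(secret, 0, 32); /* scrub the key material */
	free(secret);
	*secret_freed = 1;
	return result;
}
```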
[9.20] fix: dev: Fix swapped arguments in redirect2() single-label branch
On a recursive resolver with nxdomain-redirect configured, an
NXDOMAIN result for a query whose qname is the root could corrupt
the view's nxdomain-redirect target, after which the redirect
feature stopped working for every subsequent query in that view
until named was restarted.
Closes #5908
Backport of MR !11908
Merge branch 'backport-5908-query-redirect2-name-copy-arg-swap-9.20' into 'bind-9.20'
Fix swapped arguments in redirect2() single-label branch
For a query whose qname is the root, the labels==1 branch in
redirect2() called dns_name_copy(redirectname, view->redirectzone)
with arguments reversed, overwriting the view-global
nxdomain-redirect target with the empty redirectname rather than
copying the configured target into the per-query lookup name. After
the corruption, view->redirectzone names the root, so
dns_name_issubdomain() makes redirect2() short-circuit for every
subsequent query and the nxdomain-redirect feature stops working
until named is restarted.
Triggering this needs the resolver to receive an NXDOMAIN for the
root from upstream, which does not happen in normal DNS operation.
Swap the arguments to match the dns_name_copy(source, dest)
signature. Add a system test that issues a root query through the
nxdomain-redirect resolver and verifies the redirect feature still
works for a normal NXDOMAIN-producing query afterwards.
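The effect of the swap can be shown with a string stand-in; name_copy() models the dns_name_copy(source, dest) signature, and the concrete names below are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for dns_name_copy(source, dest); the real code copies
 * dns_name_t structures, but plain strings suffice to show the swap. */
static void
name_copy(const char *source, char *dest) {
	strcpy(dest, source);
}

/* view->redirectzone: the configured nxdomain-redirect target
 * (a hypothetical value for illustration). */
static char redirectzone[64] = "redirect.example";
/* redirectname: the per-query lookup name, empty for a root qname. */
static char redirectname[64] = "";
```

With the arguments reversed, the empty per-query name would overwrite the view-global target; copying in the fixed direction leaves the configuration intact.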
`rndc-confgen -A hmac-sha384` and `-A hmac-sha512` documented a `-b`
range of 1..1024, but any value above 512 aborted on hardened builds
instead of producing a key. The full advertised range now works.
Closes #5903
Backport of MR !11903
Merge branch 'backport-5903-hmac-generate-stack-overflow-9.20' into 'bind-9.20'
Size HMAC key generation buffers to the maximum block size
hmac_generate() declared its on-stack nonce buffer as
unsigned char data[ISC_MAX_MD_SIZE], i.e. 64 bytes. That is the maximum
digest size, but the buffer is filled up to the algorithm's HMAC block
size, which is 128 bytes for SHA-384 and SHA-512. Asking rndc-confgen
for an HMAC-SHA-384 or HMAC-SHA-512 key with -b > 512 (the documented
range allows up to 1024) wrote past the end of the stack buffer; on
hardened builds this aborted with a stack-smash detector firing
instead of producing a key.
Use the existing ISC_MAX_BLOCK_SIZE (128) for the buffer so the full
1..1024 range advertised by -A hmac-sha{384,512} works as documented.
The matching key_rawsecret[64] in confgen's generate_key() is enlarged
the same way so the generated key fits when dumped to the buffer.
Add a system test that exercises rndc-confgen across the previously
overflowing keysizes; with -Db_sanitize=address it caught the abort
before the fix.
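The size relationship is small enough to state in code; the constants mirror ISC_MAX_MD_SIZE and ISC_MAX_BLOCK_SIZE as described above, and hmac_key_bytes() is a sketch of the -b-to-bytes arithmetic:

```c
#include <assert.h>

/* Largest digest (SHA-512): what the buggy buffer was sized to. */
#define MAX_MD_SIZE 64
/* Largest HMAC block (SHA-384/512): what the buffer is filled to. */
#define MAX_BLOCK_SIZE 128

/* Bytes of key material for an rndc-confgen -b value given in bits. */
static unsigned int
hmac_key_bytes(unsigned int bits) {
	return (bits + 7) / 8;
}
```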
[9.20] fix: usr: Fix suppressed missing-glue check in named-checkzone
named-checkzone and named-checkconf -z silently skipped the
missing-glue check for any NS name that had already triggered an
extra-AAAA-glue warning, so zones missing required A glue could pass
validation and be deployed with broken delegations.
Backport of MR !11899
Merge branch 'backport-ondrej/check-tool-err-glue-code-collision-9.20' into 'bind-9.20'
Resolve ERR_MISSING_GLUE / ERR_EXTRA_AAAA value collision
Both constants were defined as 5. The symbol table used by checkns() to
deduplicate log messages keys on (name, error_code), so logging an
extra-AAAA error caused logged() to also return true for the
missing-glue check, silently skipping the entire missing-glue block for
the same name in named-checkzone and named-checkconf -z.
Convert the ERR_* defines to an auto-numbered enum so the compiler
guarantees the values stay pairwise distinct.
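A sketch of the conversion; aside from ERR_EXTRA_AAAA and ERR_MISSING_GLUE, the member names are hypothetical, since the commit only states that those two collided at 5:

```c
#include <assert.h>

/* Before: manually numbered #defines, where two drifted to the same
 * value and defeated the (name, error_code) dedup key in checkns():
 *
 *   #define ERR_EXTRA_AAAA   5
 *   #define ERR_MISSING_GLUE 5
 *
 * After: an auto-numbered enum; the compiler assigns consecutive
 * values, so members can no longer silently collide. */
enum check_error {
	ERR_NO_ADDRESSES = 1, /* hypothetical first member */
	ERR_EXTRA_A,          /* hypothetical */
	ERR_EXTRA_AAAA,
	ERR_MISSING_GLUE,
};
```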
[9.20] new: doc: Add AI coding assistants guidance to CONTRIBUTING.md
Adapted from the Linux kernel's Documentation/process/coding-assistants.rst
to the BIND 9 context. Adds three subsections under the existing
"Guidelines for Tool-Generated Content" section:
- Licensing and legal requirements (MPL-2.0, SPDX identifiers).
- Signed-off-by and Developer Certificate of Origin: AI agents must
not add Signed-off-by trailers; only the human submitter may
certify the DCO.
- Attribution: the Assisted-by: AGENT_NAME:MODEL_VERSION trailer
for recording AI involvement, with an explicit prohibition on
AI-added Co-Authored-By trailers (Co-Authored-By designates a
human co-author who shares responsibility).
Backport of MR !11888
Merge branch 'backport-ondrej/coding-assistants-doc-9.20' into 'bind-9.20'
Add AI coding assistants guidance to CONTRIBUTING.md
Adapted from the Linux kernel's Documentation/process/coding-assistants.rst
to the BIND 9 context. Adds three subsections under the existing
"Guidelines for Tool-Generated Content" section:
- Licensing and legal requirements (MPL-2.0, SPDX identifiers).
- Signed-off-by and Developer Certificate of Origin: AI agents must
not add Signed-off-by trailers; only the human submitter may
certify the DCO.
- Attribution: the Assisted-by: AGENT_NAME:MODEL_VERSION trailer
for recording AI involvement, with an explicit prohibition on
AI-added Co-Authored-By trailers (Co-Authored-By designates a
human co-author who shares responsibility).
[9.20] fix: usr: Fix named crash when processing SIG records in dynamic updates
Previously, :iscman:`named` could abort if a client sent a dynamic
update containing a SIG record (the legacy signature type) to a zone
configured with an update-policy. The `dns_db_findrdataset` function
had an overly strict REQUIRE() precondition that prevented SIG records
from being looked up; it was triggered while processing an UPDATE
request and could be hit remotely by any client permitted to send
updates. This has been fixed by ensuring that SIG records are handled
consistently with RRSIG records during update processing.
Closes #5818
Backport of MR !11864
Merge branch 'backport-5818-fix-update-of-sig-9.20' into 'bind-9.20'
Make sure the nameserver correctly handles SIG records in the
prerequisites of a dynamic update. The first check ensures that the
prerequisites are not examined before the credentials are checked.
The second test case checks that the SIG-present prerequisite is
examined and that the update is therefore refused. This should also
not trigger an assertion failure in dns__db_findrdataset() (whose
REQUIRE() only accepted dns_rdatatype_rrsig when the covers parameter
was set).
Add AXFR regression test for SIG covers preservation
diff.c rdata_covers() runs on both dns_diff_apply (IXFR, ns/update.c
dynamic updates) and dns_diff_load (AXFR). After the previous commit
refused SIG and NXT in dynamic updates, the AXFR path remains the
most natural way to drive legacy SIG records into a secondary's zone
DB and regression-gate the rdata_covers() fix.
The test adds ans11 as an AsyncDnsServer primary for a small zone
whose AXFR carries two SIG rdatas at the same owner with different
covered types (A, MX) and different TTLs (600, 1200), and declares
ns6 a secondary of that zone. With the bug present, dns_diff_load
groups both tuples at typepair (SIG, 0) and the MX-covering record
inherits the first-seen TTL (600); the fix keeps them at (SIG, A)
and (SIG, MX) with their original TTLs.
rndc dumpdb -zones on the secondary is used to inspect stored state
directly, because the wire-level SIG query response merges
same-(owner,type,class) RRs and masks the per-rdataset TTLs.
SIG (24) and NXT (30) are obsolete DNSSEC record types, superseded by
RRSIG and NSEC in RFC 3755. Allowing them through dynamic update
exposes two distinct bugs that the surrounding GL#5818 work already
fixes as defense-in-depth:
- dns__db_findrdataset() used to REQUIRE that (covers == 0 ||
type == RRSIG), which aborts named when a SIG update reaches the
prescan foreach_rr() call. Fixed to accept dns_rdatatype_issig().
- diff.c rdata_covers() used to test only RRSIG, dropping the
covered-type field for SIG rdatas; the zone DB then filed every
SIG rdataset under typepair (SIG, 0) instead of
(SIG, covered_type) and follow-up adds collided at that bucket.
Fixed to use dns_rdatatype_issig().
Both underlying bugs are still reachable via inbound zone transfer
(diff.c rdata_covers() runs from both dns_diff_apply on the IXFR path
and dns_diff_load on the AXFR path), so the type-helper fixes above
remain necessary. For the dynamic-update path, the simplest and
safest posture is to refuse SIG and NXT outright at the front door in
ns/update.c, alongside the existing NSEC/NSEC3/non-apex-RRSIG
refusals. KEY remains permitted because it is still used to carry
public keys for SIG(0) transaction authentication.
The existing tcp-self SIG regression test is repointed to assert
REFUSED on the SIG add, a symmetric NXT test is added, and the
SIG-via-dyn-update covers-bucket test is removed because it is no
longer reachable through this entry point; AXFR-based coverage of
diff.c rdata_covers() follows in a separate commit.
Add regression test for SIG covers being dropped in dns_diff_apply
rdata_covers() in lib/dns/diff.c tests `type == dns_rdatatype_rrsig`
instead of dns_rdatatype_issig(), so for a legacy SIG (24) rdata it
returns 0 and the covered type is discarded on the dynamic-update /
IXFR path. The zone DB then files every SIG rdataset under typepair
(SIG, 0) instead of (SIG, covered_type), and a follow-up add with a
different covers field and a different TTL collides at that bucket,
trips DNS_DBADD_EXACTTTL in qpzone, returns DNS_R_NOTEXACT, and comes
back to the client as SERVFAIL.
The new test adds a PTR to establish the node (tcp-self requires the
client IP's reverse form to equal the owner), then two SIG updates
with different covers and different TTLs; on a buggy build the second
update is SERVFAIL and named logs `dns_diff_apply: .../SIG/IN: add
not exact`. The test is expected to pass once rdata_covers() is
switched to dns_rdatatype_issig(), matching the fix already adopted
for dns__db_findrdataset() on this branch and the helper pattern used
in master.c, xfrout.c, and qpcache.c.
Fix dropped covers field for SIG records in dns_diff_apply
rdata_covers() in lib/dns/diff.c discriminated only on
dns_rdatatype_rrsig (46) and returned 0 for the legacy SIG (24), so
the covered-type field was silently discarded on the dynamic-update
and IXFR paths. Every SIG rdataset was then filed in the zone DB
under typepair (SIG, 0) instead of (SIG, covered_type); a second SIG
add with a different covers and a different TTL collided at that
bucket, tripped DNS_DBADD_EXACTTTL in qpzone, returned
DNS_R_NOTEXACT, and came back to the client as SERVFAIL.
Use dns_rdatatype_issig() here so both SIG and RRSIG carry their
covers through the diff, matching the helper pattern already used in
lib/dns/master.c, lib/ns/xfrout.c, lib/dns/qpcache.c, and the
dns__db_findrdataset() REQUIRE that the surrounding merge request
just relaxed.
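The helper-based fix can be condensed to a few lines; the type codes are the real RFC values, while issig() and rdata_covers() below are simplified sketches of dns_rdatatype_issig() and diff.c's rdata_covers():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t rdatatype_t;
#define TYPE_A     1
#define TYPE_MX    15
#define TYPE_SIG   24 /* legacy signature, RFC 2535 */
#define TYPE_RRSIG 46 /* RFC 4034 */

/* Sketch of dns_rdatatype_issig(): true for both signature types. */
static bool
issig(rdatatype_t type) {
	return type == TYPE_SIG || type == TYPE_RRSIG;
}

/* Fixed rdata_covers(): both SIG and RRSIG carry their covered type
 * through the diff. The buggy version tested only TYPE_RRSIG, so SIG
 * rdatas returned 0 and all landed in typepair (SIG, 0). */
static rdatatype_t
rdata_covers(rdatatype_t type, rdatatype_t covered) {
	return issig(type) ? covered : 0;
}
```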
Mark Andrews [Tue, 7 Apr 2026 14:39:57 +0000 (16:39 +0200)]
Fix assertion failure in dns_db_findrdataset() for SIG records
dns__db_findrdataset() had a REQUIRE() that only accepted
dns_rdatatype_rrsig when the covers parameter was set. A dynamic
update containing a SIG record (type 24) would trigger this
assertion, crashing named. Use dns_rdatatype_issig() to accept
both SIG and RRSIG.
[9.20] fix: dev: Fix inverted gethostname() check in rndc status
The replacement of named_os_gethostname() with raw gethostname()
inverted the success check: the "localhost" fallback runs on success,
and on failure the uninitialized hostname buffer is read by snprintf(),
leaking stack memory via the rndc status reply.
Closes #5889
Backport of MR !11879
Merge branch 'backport-5889-fix-gethostname-inverted-check-9.20' into 'bind-9.20'
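The correct sense of the check follows from POSIX: gethostname() returns 0 on success, so the "localhost" fallback belongs on the nonzero branch and the local buffer must never be read on failure. A minimal sketch (status_hostname() is illustrative, not named's actual function):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fill "out" with the hostname, or "localhost" when gethostname()
 * fails; on failure the local buffer is indeterminate and is never
 * passed to snprintf(). */
static void
status_hostname(char *out, size_t outlen) {
	char hostname[256];

	if (gethostname(hostname, sizeof(hostname)) != 0) {
		snprintf(out, outlen, "%s", "localhost");
		return;
	}
	hostname[sizeof(hostname) - 1] = '\0'; /* POSIX may not terminate */
	snprintf(out, outlen, "%s", hostname);
}
```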