Mark Andrews [Fri, 10 Apr 2026 08:08:15 +0000 (18:08 +1000)]
[9.20] fix: usr: Fix zone verification of NSEC3 signed zones
Previously, when computing the compressed bitmap during verification of an NSEC3-signed zone, an undersized buffer was used, which resulted in an out-of-bounds write if the bitmap had too many active windows. This affected NSEC3-signed mirror zones, `dnssec-signzone`, and `dnssec-verify`. This has been fixed.
Closes #5834
Backport of MR !11804
Merge branch 'backport-5834-fix-cbm-size-9.20' into 'bind-9.20'
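The sizing problem can be illustrated with a minimal Python sketch of the RFC 4034 (section 4.1.2) type bitmap encoding, which NSEC3 records also use. This is not BIND's C code; `encode_type_bitmap` and `worst_case_size` are hypothetical names used only for illustration. Each active window costs up to 34 bytes (window number, length, and up to 32 bitmap bytes), so a buffer has to be sized for the number of windows that can be active.

```python
def encode_type_bitmap(rr_types):
    """Encode RR type numbers as an RFC 4034 section 4.1.2 type bitmap."""
    windows = {}
    for rr_type in sorted(set(rr_types)):
        if not 0 <= rr_type <= 0xFFFF:
            raise ValueError("RR type out of range: %r" % (rr_type,))
        window, low = divmod(rr_type, 256)
        bitmap = windows.setdefault(window, bytearray(32))
        # Bit 0 is the most significant bit of the first bitmap octet.
        bitmap[low // 8] |= 0x80 >> (low % 8)

    out = bytearray()
    for window in sorted(windows):
        # Trailing zero octets are omitted; at least one bit is always set.
        bitmap = bytes(windows[window]).rstrip(b"\x00")
        out += bytes([window, len(bitmap)]) + bitmap
    return bytes(out)


def worst_case_size(active_windows):
    """Upper bound in bytes: 1 window octet + 1 length octet + 32 bitmap octets."""
    return active_windows * (1 + 1 + 32)
```

With all 256 windows active, the bound is 256 * 34 = 8704 bytes, which is far larger than what a bitmap with one or two windows needs.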
Michał Kępień [Thu, 9 Apr 2026 11:48:45 +0000 (13:48 +0200)]
[9.20] fix: ci: Purge distros token in a separate CI job
The "publish" job runs on a dedicated, locked-down runner that lacks the
Python modules necessary to execute the manage_distros_token.py script.
Instead of deleting the token within the "publish" job, purge it in a
separate job that automatically runs on the "base" image after the
"publish" job succeeds. Define "rules" for the new job so that the
token is only deleted for security releases, as it should have been
initially.
Backport of MR !11817
Merge branch 'backport-michal/purge-distros-token-in-a-separate-ci-job-9.20' into 'bind-9.20'
Michał Kępień [Thu, 9 Apr 2026 11:23:57 +0000 (13:23 +0200)]
Purge distros token in a separate CI job
The "publish" job runs on a dedicated, locked-down runner that lacks the
Python modules necessary to execute the manage_distros_token.py script.
Instead of deleting the token within the "publish" job, purge it in a
separate job that automatically runs on the "base" image after the
"publish" job succeeds. Define "rules" for the new job so that the
token is only deleted for security releases, as it should have been
initially.
Mark Andrews [Thu, 9 Apr 2026 02:07:26 +0000 (12:07 +1000)]
[9.20] fix: doc: nsupdate does not handle zero length RDATA well
Nsupdate does not distinguish between a non-existing RDATA field
and an empty RDATA field when determining which action is desired
when the RDATA field is empty. This only affects a few data types,
like APL, which allow an empty RDATA field. Document a workaround
of using the '\# 0' form for entering these specific records, e.g.:
# delete the APL RRset
update delete IN APL
# delete the APL record with a zero length rdata
update delete IN APL \# 0
Closes #5835
Backport of MR !11775
Merge branch 'backport-5835-nsupdate-doc-zero-length-rdata-how-to-9.20' into 'bind-9.20'
Mark Andrews [Tue, 31 Mar 2026 01:26:42 +0000 (12:26 +1100)]
nsupdate does not handle zero length RDATA well
Nsupdate does not distinguish between a non-existing RDATA field
and an empty RDATA field when determining which action is desired
when the RDATA field is empty. This only affects a few data types,
like APL, which allow an empty RDATA field. Document a workaround
of using the '\# 0' form for entering these specific records, e.g.:
# delete the APL RRset
update delete IN APL
# delete the APL record with a zero length rdata
update delete IN APL \# 0
Mark Andrews [Tue, 7 Apr 2026 21:58:22 +0000 (07:58 +1000)]
[9.20] fix: test: Check exit status of dig and nsupdate in nsupdate system test
Add missing failure checks to six dig and nsupdate invocations in nsupdate system test so that command failures are properly caught instead of silently ignored.
Backport of MR !11811
Merge branch 'backport-marka/check-return-codes-in-nsupdate-test-9.20' into 'bind-9.20'
The system test was subject to the same off-by-one bug that
existed in the code. That is: if the inception time of the signature
is exactly equal to the inactive time of the key, we still have to
expect the signature.
This specific test case triggered a bug where the SKR included bundles
with unsigned DNSKEY RRsets (signatures were omitted because the
inception time was equal to the inactive time of the key).
If the inception time of the signature is exactly equal to the
inactive time of the key, still include the signature. Otherwise there
may be corner cases where signatures are omitted erroneously.
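The boundary condition described above can be sketched in Python as an inclusive comparison. The function and parameter names are hypothetical, and the real check in the C code may be structured differently; the point is only that the upper bound uses `<=` rather than `<`.

```python
def key_signs_at(active, inactive, inception):
    """Return True if a key should produce a signature with this inception time.

    All arguments are timestamps in seconds. The upper bound is inclusive:
    an inception time exactly equal to the key's inactive time still counts.
    """
    return active <= inception <= inactive
```

With `active=100` and `inactive=200`, an inception time of 200 is still covered, while 201 is not.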
The name 'isdelegation()' was confusing. This function is not checking
whether this message is a delegation, but whether the denial of
existence proofs in this message is a proof of a referral to an
unsigned zone.
The name 'is_unsecure_referral()' is more appropriate.
Revert isdelegation() to return boolean value again
The isdelegation() was changed to return an isc_result_t because the
idea was to have a separate return value DNS_R_NSEC3ITERRANGE to signal
to the caller we could not verify the proof because of too many
iterations in the NSEC3 record, or perhaps ISC_R_UNEXPECTED for a more
generic cause that verification was not done.
But this would make error handling more fragile and all we care about
is whether we can reliably say the NS bit was not set.
If we cannot reliably say so, we have to treat it as an insecure
referral.
Since the answer is either yes or no, we can revert to returning
a boolean value.
Replace the hand-rolled threaded socket server with the standard
AsyncDnsServer framework used by other ans.py servers in the test suite.
The DNS wire-format message builders (IXFR diff, AXFR, SOA, SERVFAIL)
are retained unchanged since they produce carefully crafted messages
needed to trigger the IXFR->AXFR race condition. The server
infrastructure is replaced:
- Manual TCP/UDP socket management and threading replaced by
  AsyncDnsServer, which handles both protocols, pidfile lifecycle,
  and signal handling.
- Query parsing replaced by the framework's dns.message-based parser;
  query dispatch moved into IxfrRaceHandler.get_responses().
- The axfr_done_event threading.Event replaced by a boolean instance
  variable on IxfrRaceHandler, safe within the single asyncio event
  loop.
- For IXFR over TCP, the handler yields two BytesResponseSend actions
  (msg1 then msg2) so the framework sends both with TCP length prefixes,
  preserving the race-triggering sequence.
- For IXFR over UDP, the TC flag is set on the response to force TCP
  retry.
- Unused encode_name_compressed() and parse_dns_query() removed.
Also fix a timing issue that might result in the initial transfer not
being done by the time the test is executed -- since ns11 is started
after ns6. Ensure the initial transfer has happened before running the
ixfr_race test.
Aram Sargsyan [Wed, 4 Mar 2026 16:25:33 +0000 (16:25 +0000)]
Fix a race condition in xfrin_recv_done() when calling xfrin_reset()
When the xfrin_recv_done() function decides to retry the transfer
using AXFR because of a previous error, it calls the xfrin_reset()
function which calls dns_db_closeversion() on 'xfr->ver'. The problem
is that the IXFR processing of a previous message could still be
in progress in a worker thread, which can then use the freed 'xfr->ver'.
If there is an ongoing worker thread, delay the AXFR retry until after
the worker thread has finished its work.
Aram Sargsyan [Thu, 5 Mar 2026 11:15:38 +0000 (11:15 +0000)]
Add a test to check for IXFR->AXFR race-condition
The test initiates a zone transfer with IXFR, which produces
a large number of differences and then generates an error. The
secondary should be able to gracefully shut down the ongoing
IXFR transfer and retry with AXFR without race conditions
between them.
This test checks for an issue (GL#5767) but since a race
condition is usually time-sensitive it might require several
attempts before it reproduces the issue.
[9.20] new: test: Add regression test for NSEC proof after unsigned-to-signed IXFR
Test that a secondary receiving an IXFR transitioning a zone from
unsigned to NSEC-signed returns the correct covering NSEC record
for empty non-terminal names.
Backport of MR !11786
Merge branch 'backport-ondrej/fix-nsec-ixfr-9.20' into 'bind-9.20'
Add regression test for NSEC proof after unsigned-to-signed IXFR
Test that a secondary receiving an IXFR transitioning a zone from
unsigned to NSEC-signed returns the correct covering NSEC record
for empty non-terminal names.
Add isctest.query.wait_for_serial() shared helper for waiting until
a server has a specific SOA serial.
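A helper of this kind is typically a bounded polling loop. The sketch below is not the isctest.query.wait_for_serial() implementation; `wait_until_serial` and its `get_serial` callback are hypothetical, and the callback stands in for whatever SOA query the real helper performs.

```python
import time


def wait_until_serial(get_serial, expected, timeout=10.0, interval=0.5):
    """Poll get_serial() until it returns `expected` or `timeout` seconds pass.

    get_serial is a caller-supplied callable (hypothetical here) that returns
    the SOA serial currently served. Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_serial() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the query as a callable keeps the waiting logic independent of how the serial is fetched, which also makes it easy to exercise with a fake.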
[9.20] fix: usr: Use the zone file's basename as origin in DNSSEC tools
In `dnssec-signzone` and `dnssec-verify`, when the zone origin is not specified using the `-o` parameter, the default behavior is to try to sign using the zone's file name as the origin. So, for example, `dnssec-signzone -S example.com` will work, so long as the file name matches the zone name.
This now also works if the zone is in a different directory. For example, `dnssec-signzone -S zones/example.com` will set the origin value to `example.com`.
Closes #5678
Backport of MR !11360
Merge branch 'backport-5678-signzone-basename-9.20' into 'bind-9.20'
Evan Hunt [Wed, 10 Dec 2025 00:52:44 +0000 (16:52 -0800)]
use the zone file's basename as origin in dnssec tools
In dnssec-signzone and dnssec-verify, if the zone origin is not
specified using the `-o` parameter, the default behavior is to try
to use the zone's file name as the origin. So, for example,
`dnssec-signzone -S example.com` or 'dnssec-verify example.com'
will work, so long as the file name matches the zone name.
This now also works if the zone is in a different directory.
For example, `dnssec-signzone -S zones/example.com` or
'dnssec-verify zones/example.com' will set the origin value
to `example.com`.
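The default described above amounts to taking the last path component of the zone file name. A minimal Python illustration follows; the tools themselves are written in C, and `default_origin` is a hypothetical name.

```python
import os.path


def default_origin(zone_file):
    """Derive the default zone origin from a zone file path.

    Only the final path component is used, so the directory part
    of the path no longer leaks into the origin.
    """
    return os.path.basename(zone_file)
```

Both `example.com` and `zones/example.com` yield `example.com` under this rule.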
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.
Fix the second pair to check RADIX_V6.
Backport of MR !11664
Merge branch 'backport-ondrej/fix-copy-paste-error-checking-RADIX_V4-instead-of-RADIX_V6-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 11 Mar 2026 12:17:56 +0000 (13:17 +0100)]
Fix INSIST copy-paste error checking RADIX_V4 instead of RADIX_V6
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.
Ondřej Surý [Mon, 30 Mar 2026 17:01:19 +0000 (19:01 +0200)]
[9.20] fix: usr: Count temporal problems with DNSSEC validation as attempts
After KeyTrap, temporal DNSSEC errors were originally hard errors that
caused validation failures even if the records had another valid
signature. This has been changed and RRSIGs outside of the
inception and expiration time are not counted as hard errors. However,
these errors were not even counted as validation attempts, so an excessive
number of expired RRSIGs would cause some non-cryptographic extra work
for the validator. This has been fixed and the temporal errors are
correctly counted as validation attempts.
Closes #5760
Backport of MR !11589
Merge branch 'backport-5760-count-DNSSEC-temporal-errors-as-validation-attempts-9.20' into 'bind-9.20'
Ondřej Surý [Mon, 23 Feb 2026 18:42:49 +0000 (19:42 +0100)]
Count temporal problems with DNSSEC validation as attempts
After KeyTrap, temporal DNSSEC errors were originally hard errors that
caused validation failures even if the records had another valid
signature. This has been changed and RRSIGs outside of the
inception and expiration time are not counted as hard errors. However,
these errors were not even counted as validation attempts, so an excessive
number of expired RRSIGs would cause some non-cryptographic extra work
for the validator. This has been fixed and the temporal errors are
correctly counted as validation attempts.
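The accounting change can be sketched as follows. This is a simplified Python model, not the validator's C code: the `Sig` type, the `validate` function, the result strings, and the `max_attempts` limit are all assumptions made for illustration. What matters is that the attempt counter is advanced before the temporal check, so expired signatures consume the budget too.

```python
from dataclasses import dataclass


@dataclass
class Sig:
    inception: int
    expiration: int
    valid: bool  # stands in for the result of the cryptographic check


def validate(sigs, now, max_attempts):
    """Try signatures in order, charging every one against the attempt budget."""
    attempts = 0
    for sig in sigs:
        if attempts >= max_attempts:
            return "quota-exceeded"
        attempts += 1  # counted even if the signature fails the temporal check
        if not (sig.inception <= now <= sig.expiration):
            continue  # temporal problem: soft failure, try the next signature
        if sig.valid:
            return "secure"
    return "bogus"
```

A long run of expired RRSIGs now hits the limit instead of being skipped for free.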
Ondřej Surý [Mon, 30 Mar 2026 10:31:31 +0000 (12:31 +0200)]
[9.20] fix: dev: Backport test for update-policy per-type max quota bypass via crafted UPDATE messages
An authenticated DDNS client could bypass update-policy per-type record limits
(e.g. TXT(3)) by including padding records in the UPDATE message that are
silently skipped during processing in the main branch.
As BIND 9.20 is not affected, only backport the test.
Closes #5799
Backport of MR !11708
Merge branch 'backport-5799-fix-counter-desync-in-SSU-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 18 Mar 2026 09:33:06 +0000 (10:33 +0100)]
Fix update-policy per-type max quota bypass via counter desynchronization
The prescan and main update loops in DNS UPDATE processing both used the
same counter to index the maxbytype[] quota array. The prescan loop
always incremented the counter, but the main loop had 14 continue paths
that skipped the increment. This allowed an authenticated DDNS client to
craft an UPDATE message with padding records (e.g. CNAME+A pairs that
trigger CNAME-conflict skips) to shift the counter and read wrong quota
entries, bypassing per-type record limits entirely.
Fix by incrementing the counter unconditionally at the start of each
iteration in the main loop.
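The desynchronization can be reproduced in a small Python model. It is a sketch under simplifying assumptions, not the UPDATE processing code: `slots_consulted` is a hypothetical function, and each record is reduced to a name plus a flag saying whether the main loop would take one of its "continue" paths.

```python
def slots_consulted(records, increment_first):
    """Return (name, quota slot) for every record the main loop fully processes.

    records is a list of (name, skipped) pairs. With increment_first=False the
    counter only advances for records that are not skipped (the buggy shape);
    with increment_first=True it advances at the top of every iteration.
    """
    consulted = []
    counter = 0
    for name, skipped in records:
        slot = counter
        if increment_first:
            counter += 1  # fixed: advance before any early "continue"
        if skipped:
            continue  # one of the early-exit paths
        consulted.append((name, slot))
        if not increment_first:
            counter += 1  # buggy: never reached for skipped records
    return consulted
```

With two skipped padding records ahead of a TXT record, the buggy shape consults slot 0 for the TXT record while the prescan stored its quota in slot 2; the fixed shape consults slot 2.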
Arаm Sаrgsyаn [Fri, 27 Mar 2026 14:34:35 +0000 (14:34 +0000)]
[9.20] fix: usr: Fix the processing of empty catalog zone ACLs
The :iscman:`named` process could terminate unexpectedly when
processing a catalog zone ACL in an APL resource record that
was completely empty. This has been fixed.
Closes #5801
Backport of MR !11740
Merge branch 'backport-5801-catz-empty-apl-rr-bug-fix-9.20' into 'bind-9.20'
Aram Sargsyan [Mon, 23 Mar 2026 15:15:18 +0000 (15:15 +0000)]
Allow empty APL records
Allow empty APL records because RFC 3123 (Section 4) says "zero or
more items". This fixes processing of a catalog zone ACL (which is
based on APL records) when the zone contains an empty APL record or
when a zone update arrives which creates an empty APL record.
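For illustration, here is a minimal Python parser for APL RDATA following the RFC 3123 wire layout (address family, prefix, negation bit plus AFD length, AFD part). It is a sketch, not BIND's parser, and `parse_apl` is a hypothetical name. An empty RDATA simply produces an empty item list, matching "zero or more items".

```python
import struct


def parse_apl(rdata):
    """Parse APL RDATA into a list of (family, prefix, negate, afdpart) tuples."""
    items = []
    offset = 0
    while offset < len(rdata):
        if len(rdata) - offset < 4:
            raise ValueError("truncated APL item header")
        family, prefix, flags = struct.unpack_from("!HBB", rdata, offset)
        offset += 4
        negate = bool(flags & 0x80)  # top bit is the negation flag
        afd_length = flags & 0x7F    # low 7 bits are the AFD part length
        if len(rdata) - offset < afd_length:
            raise ValueError("truncated APL address part")
        items.append((family, prefix, negate, rdata[offset:offset + afd_length]))
        offset += afd_length
    return items
```

The loop body never runs for zero-length input, so no special case is needed for the empty record.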
Michał Kępień [Wed, 25 Mar 2026 17:09:02 +0000 (18:09 +0100)]
Prevent unscheduled release publication
The "publish" job has no dependencies on other jobs, so nothing prevents
it from being accidentally started before the scheduled publication
date. Although publication still requires confirmation via an SSH
connection to a dedicated, locked-down runner, performing that action
prematurely may have drastic consequences. Therefore, it is worth
implementing additional safeguards.
Add an extra check to the "publish" job to ensure it can only be run on
the scheduled publication day. In exceptional circumstances, this check
can be overridden by setting the FORCE_PUBLICATION CI variable to any
non-empty value.
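The guard logic can be sketched in Python as below. The actual check lives in the CI configuration and may look quite different; `may_publish` and its parameters are hypothetical, and only the FORCE_PUBLICATION variable name comes from the commit message.

```python
import datetime


def may_publish(today, scheduled, environ):
    """Allow publication only on the scheduled day, unless explicitly forced.

    environ is a mapping of CI variables; any non-empty FORCE_PUBLICATION
    value overrides the date check.
    """
    if environ.get("FORCE_PUBLICATION"):
        return True
    return today == scheduled
```

An empty string for FORCE_PUBLICATION is treated the same as the variable being unset.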
Michał Kępień [Wed, 25 Mar 2026 17:09:02 +0000 (18:09 +0100)]
Tighten dependencies for tag-related jobs
The "merge-tag" and "update-stable-tag" jobs currently use the
"manual_release_job_qa" YAML anchor, which makes them depend on the
"staging" job. Meanwhile, both of these jobs require the tag they were
created for to be public for them to work. While this is harmless, as
these jobs will simply fail if they are run too early, it still makes
sense for them to depend on the "publish" job instead, if only to reduce
confusion in the pipeline view. Adjust the "needs" key for the
"merge-tag" and "update-stable-tag" jobs accordingly.
Michał Kępień [Wed, 25 Mar 2026 17:09:02 +0000 (18:09 +0100)]
Extend artifact lifetime for Cloudsmith build jobs
The commit.txt file produced by each Cloudsmith build job is required to
run the corresponding publication job. Therefore, the artifact lifetime
for the former must be long enough to prevent the file from expiring
before the publication job is run. Set the lifetime of the artifacts
created by Cloudsmith build jobs to one month to ensure that the
publication jobs can access them.
Michał Kępień [Wed, 25 Mar 2026 17:09:02 +0000 (18:09 +0100)]
Fix building EVN & -S Cloudsmith packages
Setting "artifacts: false" for the dependency on the "publish-private"
job prevents the url-*.txt files produced by that job from being pulled
from GitLab when the jobs that build EVN & -S Cloudsmith packages are
run, effectively breaking the latter. Fix by making these jobs depend
on the artifacts of the "publish-private" job.
Michał Kępień [Wed, 25 Mar 2026 17:03:23 +0000 (18:03 +0100)]
[9.20] chg: test: Rename "nsec3-delegation" to "nsec3_delegation"
The "nsec3-delegation" test was added in a release branch, before commit e40db975d9917f07da81ae01bfbe8f9cf3b0cab2 introduced the current system
test naming convention. Rename the test to comply with that convention.
Backport of MR !11753
Merge branch 'backport-michal/rename-nsec3-delegation-test-9.20' into 'bind-9.20'
Michał Kępień [Wed, 25 Mar 2026 14:36:17 +0000 (15:36 +0100)]
Rename "nsec3-delegation" to "nsec3_delegation"
The "nsec3-delegation" test was added in a release branch, before commit e40db975d9917f07da81ae01bfbe8f9cf3b0cab2 introduced the current system
test naming convention. Rename the test to comply with that convention.
Ondřej Surý [Wed, 25 Mar 2026 16:06:46 +0000 (17:06 +0100)]
[9.20] sec: usr: Fix crash when reconfiguring zone update policy during active updates
Fixed a crash that could occur when running rndc reconfig to change a zone's update policy (e.g., from allow-update to update-policy) while DNS UPDATE requests were being processed for that zone.
ISC would like to thank Vitaly Simonovich for bringing this issue to our attention.
Fixes #5817
Backport of MR !11707
Merge branch 'backport-5817-fix-crash-via-SSU-table-desynchronization-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 18 Mar 2026 03:09:50 +0000 (04:09 +0100)]
Add regression test for TOCTOU race in DNS UPDATE SSU handling
Race rndc reconfig (toggling between allow-update and update-policy)
against a stream of DNS UPDATEs for 5 seconds and verify that named
does not crash.
Before the fix, the race between send_update() and update_action()
reading the SSU table independently could trigger an assertion
failure (INSIST) when the zone's update policy changed between the
two reads.
Ondřej Surý [Wed, 18 Mar 2026 02:55:51 +0000 (03:55 +0100)]
Fix TOCTOU race in DNS UPDATE SSU table handling
Pass the SSU table through the update event struct from
send_update() to update_action() instead of reading it from the
zone twice. If rndc reconfig changed the zone's update policy
between the two reads (e.g., from allow-update to update-policy),
send_update() would skip the maxbytype allocation but
update_action() would see a non-NULL ssutable, triggering
INSIST(ssutable == NULL || maxbytype != NULL) and crashing named.
The ssutable reference is now taken once in send_update() and
transferred to update_action() via the event struct, ensuring
both functions see the same value.
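The shape of the fix, reading shared state once and carrying it in the event, can be modelled in a few lines of Python. This is an illustration only: the `Zone` and `UpdateEvent` classes and the `rule_count` parameter are hypothetical, while the function names and the invariant mirror those in the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Zone:
    ssutable: Optional[object] = None


@dataclass
class UpdateEvent:
    ssutable: Optional[object]
    maxbytype: Optional[List[int]]


def send_update(zone, rule_count=4):
    """Read the zone's SSU table exactly once and carry it in the event."""
    ssutable = zone.ssutable  # the only read of the zone's table
    maxbytype = [0] * rule_count if ssutable is not None else None
    return UpdateEvent(ssutable=ssutable, maxbytype=maxbytype)


def update_action(event):
    """Use only the values captured in the event, never the zone itself."""
    assert event.ssutable is None or event.maxbytype is not None
    return event.ssutable is not None
```

Even if the zone's policy changes between the two calls, the invariant holds because both values were derived from the same read.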
Michal Nowak [Wed, 25 Mar 2026 11:38:20 +0000 (12:38 +0100)]
[9.20] fix: ci: Set User-Agent for Sphinx to fix gitlab.gnome.org
The linkcheck started to fail because of a new check on gitlab.gnome.org
that now forbids the Sphinx User-Agent, returning a 406 HTTP status.
( chapter10: line 115) broken https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home - 406 Client Error: Not Acceptable for url: https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home
Backport of MR !11747
Merge branch 'backport-mnowak/linkcheck-set-user-agent-9.20' into 'bind-9.20'
Michal Nowak [Wed, 25 Mar 2026 09:39:15 +0000 (10:39 +0100)]
Set User-Agent for Sphinx to fix gitlab.gnome.org
The linkcheck started to fail because of a new check on gitlab.gnome.org
that now forbids the Sphinx User-Agent, returning a 406 HTTP status.
( chapter10: line 115) broken https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home - 406 Client Error: Not Acceptable for url: https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home
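Sphinx exposes a `user_agent` configuration value that linkcheck uses for its HTTP requests. Assuming the fix relies on that option, the conf.py change would look roughly like the fragment below; the string shown is a placeholder, not the value chosen in the actual commit.

```python
# Sphinx conf.py fragment. The value is a placeholder for illustration only.
user_agent = "Mozilla/5.0 (compatible; documentation link checker)"
```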
Matthijs Mekking [Wed, 25 Mar 2026 09:14:49 +0000 (09:14 +0000)]
[9.20] fix: usr: Fix a crash triggered by rndc modzone on zone that already existed in NZF file
Calling `rndc modzone` didn't work properly for a zone that was configured in
the configuration file. It could crash if BIND 9 was built without LMDB or if
there was already an NZF file for the zone. In addition, `rndc modzone` failed
in subsequent attempts. These problems are now fixed.
Closes #5826
Merge branch '5826-fix-modzone-issues-ytatuya' into 'bind-9.20'
JINMEI Tatuya [Mon, 23 Mar 2026 16:58:39 +0000 (09:58 -0700)]
ensure rndc modzone succeeds twice for a zone in named.conf
If a zone is in named.conf, not originally added by rndc addzone,
rndc modzone for that zone succeeds once, but subsequent modzone
attempts fail. This is because do_modzone removes the zone config
from global or view options, but it would fail due to 'not found'
once the config is removed.
The fix is to ensure re-adding the updated zone config to the
global or view options. This also works as a more complete fix
for the issue 85453d3 attempted to solve, ensuring rndc showzone
shows the latest config: it now works for multiple attempts of
modzone, and with named that is not built with LMDB.
The change in this commit relies on UNCONST in a few places.
That's not clean, but 'add/mod/delzone' generally seems to
need it (for example, delete_zoneconf uses it to modify the list
of zones). In that sense, this change follows the convention
(in the longer term, there may have to be a better API so that we
can modify config options that were once parsed).
This commit doesn't seem to be a complete solution of what
it appears to fix: showzone succeeds and shows the modified
config after first modzone, but subsequent attempts of modzone
fail (though not because of the commit being reverted), let
alone showing the correct new config.
Reverting the change for now, and will provide a more comprehensive
fix in the next commit.
JINMEI Tatuya [Sat, 21 Mar 2026 06:33:04 +0000 (23:33 -0700)]
prevent named crash on rndc modzone for a zone in named.conf
If named is built without LMDB and has a zone in named.conf,
then rndc modzone for that zone triggers an assertion failure
unless there's already an NZF file. This is because load_nzf
doesn't create 'nzf_config' when NZF is missing, while a valid
nzf_config is assumed in do_modzone when it tries to add the
modified zone config to add_parser.
The crash is fixed by skipping the call to cfg_parser_mapadd when
nzf_config is NULL. Skipping it should be okay since the config stored
in add_parser would be needed only for subsequently deleting a zone by
rndc delzone when the zone was originally added by rndc addzone, but
in this case the zone was not 'added'. Checking if nzf_config is NULL
before using it also seems to be consistent with other parts of the
implementation.
Ondřej Surý [Mon, 23 Mar 2026 11:05:40 +0000 (12:05 +0100)]
[9.20] new: dev: Add MOVE_OWNERSHIP() macro for transferring pointer ownership
A helper macro that returns the current value of a pointer and sets
it to NULL in one expression, useful for transferring ownership in
designated initializers.
Backport of MR !11724
Merge branch 'backport-ondrej/TAKE_OWNERSHIP-macro-9.20' into 'bind-9.20'
Ondřej Surý [Fri, 20 Mar 2026 01:15:17 +0000 (02:15 +0100)]
Add MOVE_OWNERSHIP() macro for transferring pointer ownership
A helper macro that returns the current value of a pointer and sets
it to NULL in one expression, useful for transferring ownership in
designated initializers.
Ondřej Surý [Mon, 23 Mar 2026 10:08:04 +0000 (11:08 +0100)]
[9.20] chg: dev: Skip cache flush ordering on NTA expiry
dns_view_flushnode() was called in the delete_expired() async
callback, which runs after the query that detected the NTA expiry.
This created a race: the query would proceed with stale cached data
from the NTA period before the flush had a chance to run, resulting
in transient SERVFAIL with EDE 22 (No Reachable Authority).
Skip dns_view_flushnode() in the older branches as the solutions for
older branches are too complicated and this was not a critical bug.
Also simplify the expiry comparison in delete_expired() to a direct
pointer comparison (nta == pval) instead of comparing expiry
timestamps.
Backport of MR !11729
Merge branch 'backport-ondrej/refactor-nta-using-RCU-delete-order-fix-9.20' into 'bind-9.20'
Ondřej Surý [Fri, 20 Mar 2026 22:56:02 +0000 (23:56 +0100)]
Replace existing NTA instead of reusing it in dns_ntatable_add()
When an NTA already exists for a name, the old code retrieved
and reused the existing NTA object, then reset its timer via
settimer(). This is incorrect because isc_timer_start() and
isc_timer_stop() require the timer to be manipulated from its
owning loop (enforced by REQUIRE(timer->loop == isc_loop()) in
lib/isc/timer.c), and the caller may be running on a different
loop than the one that created the original NTA.
Instead, delete the old NTA (shutting down its timer on the
correct loop) and insert a fresh one that is owned by the
current loop.
Ondřej Surý [Fri, 20 Mar 2026 13:29:57 +0000 (14:29 +0100)]
Skip cache flush ordering on NTA expiry
dns_view_flushnode() was called in the delete_expired() async
callback, which runs after the query that detected the NTA expiry.
This created a race: the query would proceed with stale cached data
from the NTA period before the flush had a chance to run, resulting
in transient SERVFAIL with EDE 22 (No Reachable Authority).
Skip dns_view_flushnode() in the older branches as the solutions for
older branches are too complicated and this was not a critical bug.
Ondřej Surý [Fri, 20 Mar 2026 02:25:21 +0000 (03:25 +0100)]
[9.20] fix: usr: Fix NTA (Negative Trust Anchor) expiration issue
When a configured NTA for a name expired, any possibly cached
data for the name (with "insecure" DNSSEC validation result)
was not flushed from the resolver's cache. This has been fixed.
Closes #5747
Backport of MR !11597
Merge branch 'backport-5747-nta-expiry-cache-flush-bug-fix-9.20' into 'bind-9.20'
Ondřej Surý [Fri, 20 Mar 2026 02:23:20 +0000 (03:23 +0100)]
[9.20] fix: dev: Fix data race on fctx->vresult in validated()
Move the write to fctx->vresult after LOCK(&fctx->lock). The field was
being set before acquiring the lock, but dns_resolver_logfetch() reads
it under the same lock from another thread.
Backport of MR !11717
Merge branch 'backport-ondrej/fix-data-race-on-fctx-result-in-validated-9.20' into 'bind-9.20'
Ondřej Surý [Thu, 19 Mar 2026 02:42:08 +0000 (03:42 +0100)]
Fix data race on fctx->vresult in validated()
Move the write to fctx->vresult after LOCK(&fctx->lock). The field was
being set before acquiring the lock, but dns_resolver_logfetch() reads
it under the same lock from another thread.
Ondřej Surý [Fri, 20 Mar 2026 01:47:55 +0000 (02:47 +0100)]
[9.20] fix: dev: Fix data race in server round-trip time tracking
The SRTT (Smoothed Round-Trip Time) update for remote servers was not
atomic: concurrent callers could each read the same value and one
update would be silently lost. Additionally, the aging decay applied
once per second could run multiple times if several threads entered the
function simultaneously.
Use compare-and-swap loops for the SRTT update and for the aging
timestamp to ensure no updates are lost.
Backport of MR !11718
Merge branch 'backport-ondrej/fix-non-atomic-srtt-aging-9.20' into 'bind-9.20'
Ondřej Surý [Thu, 19 Mar 2026 03:17:45 +0000 (04:17 +0100)]
Fix non-atomic read-modify-write on entry->srtt in adjustsrtt()
The SRTT update loaded the old value, computed a new one, and stored it
back as separate operations. Two concurrent callers could each read the
same old value and one update would be silently lost.
Use a CAS loop for the read-modify-write on entry->srtt. For the aging
path, also CAS on entry->lastage to prevent multiple threads from aging
the same entry in the same second.
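The compare-and-swap retry pattern can be shown in Python, with the caveat that Python has no user-level CAS primitive, so the sketch emulates one with a lock. `AtomicInt` and `cas_update` are hypothetical names; the C code uses real atomic operations on entry->srtt and entry->lastage, and the update function passed in below is a plain increment rather than the SRTT formula.

```python
import threading


class AtomicInt:
    """Lock-based stand-in for an atomic integer with compare-exchange."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        """Store `desired` only if the current value equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = desired
            return True


def cas_update(cell, compute):
    """Read-modify-write that retries until no other writer interfered."""
    while True:
        old = cell.load()
        new = compute(old)
        if cell.compare_exchange(old, new):
            return new
```

If another thread changes the value between the load and the exchange, the exchange fails and the loop recomputes from the fresh value, so no update is lost.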
Arаm Sаrgsyаn [Wed, 18 Mar 2026 17:44:23 +0000 (17:44 +0000)]
[9.20] fix: dev: Take dns_dtenv_t reference before an async function call
A 'dns_dtenv_t' pointer is passed to an async function without taking
a reference first, which can potentially cause a use-after-free error.
Take a reference, then detach in the async function.
Closes #5820
Backport of MR !11705
Merge branch 'backport-5820-dns_dtenv-reference-bug-fix-9.20' into 'bind-9.20'
Aram Sargsyan [Tue, 17 Mar 2026 11:23:22 +0000 (11:23 +0000)]
Take 'env' reference before async calling perform_reopen()
The 'env' pointer is passed to an async function without taking
a reference first, which can potentially cause a use-after-free
error. Take a reference, then detach in the async function.
Nicki Křížek [Wed, 18 Mar 2026 15:12:03 +0000 (16:12 +0100)]
[9.20] chg: dev: Use underscore for system test names
Change the convention for system test directory names to always use an
underscore rather than a hyphen. Names using underscore are valid python
package names and can be used with standard `import` facilities in
python, which allows easier code reuse.
Backport of MR !11710
Merge branch 'backport-nicki/system-test-dir-underscore-names-9.20' into 'bind-9.20'
Nicki Křížek [Tue, 17 Mar 2026 16:18:48 +0000 (17:18 +0100)]
Rename all system tests to use underscores
All system tests previously using a hyphen have been renamed to use
underscore instead. A couple of symlinks were corrected and one path in
`nsec3-answer` adjusted accordingly.
Nicki Křížek [Tue, 17 Mar 2026 16:08:15 +0000 (17:08 +0100)]
Use underscore for system test names
Change the convention for system test directory names to always use an
underscore rather than a hyphen. Names using underscore are valid python
package names and can be used with standard `import` facilities in
python, which allows easier code reuse.
The temporary directories for test execution and their convenience
symlinks have been switched to using hyphens rather than underscores to
keep the pytest collection, filtering and .gitignore working as
expected.
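The reason hyphenated names cannot be imported is that a Python package name must be a valid identifier. A quick check, using a hypothetical helper name:

```python
import keyword


def is_importable_name(name):
    """Return True if `name` can be used as a Python package or module name."""
    return name.isidentifier() and not keyword.iskeyword(name)
```

`nsec3_delegation` passes this check, while `nsec3-delegation` does not, because the hyphen would be parsed as a minus sign in an `import` statement.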
Ondřej Surý [Wed, 18 Mar 2026 14:26:29 +0000 (15:26 +0100)]
[9.20] fix: dev: Fix isc_buffer_init capacity mismatch in DoH data chunk callback
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated. This makes the buffer
believe it has more space than actually allocated. A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.
Pass h2->content_length as the capacity to match the allocation.
Backport of MR !11662
Merge branch 'backport-ondrej/fix-isc_buffer_init-capacity-mismatch-in-DoH-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 11 Mar 2026 12:17:45 +0000 (13:17 +0100)]
Fix isc_buffer_init capacity mismatch in DoH data chunk callback
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated. This makes the buffer
believe it has more space than actually allocated. A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.
Pass h2->content_length as the capacity to match the allocation.