Mark Andrews [Wed, 4 Mar 2026 06:51:09 +0000 (17:51 +1100)]
Clear errno before calling strtol
The previous code cleared errno after calling strtol, just before
testing the result, instead of clearing it first and then calling
strtol. Clearing it beforehand is required so that any change to
errno can be correctly attributed to strtol.
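A minimal sketch of the corrected pattern, assuming a hypothetical helper parse_long() (not the function changed by this commit): errno is cleared immediately before strtol(), so a non-zero errno afterwards can only have come from that call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * parse_long() is a hypothetical helper, not BIND code: it parses a
 * base-10 integer and rejects empty input, trailing garbage, and
 * out-of-range values.
 */
static bool
parse_long(const char *s, long *out) {
	char *end = NULL;
	long val;

	assert(s != NULL && out != NULL);

	errno = 0; /* clear errno first ... */
	val = strtol(s, &end, 10);
	if (errno != 0) { /* ... so any error set now came from strtol */
		return false;
	}
	if (end == s || *end != '\0') { /* no digits, or trailing junk */
		return false;
	}
	*out = val;
	return true;
}
```

The end-pointer checks are still needed because strtol() is not required to set errno when no digits are found.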
Ondřej Surý [Mon, 16 Mar 2026 11:00:13 +0000 (12:00 +0100)]
[9.20] fix: dev: Fix use-after-free in xfrin_recv_done
Move the LIBDNS_XFRIN_RECV_DONE probe execution before dns_xfrin_detach
in xfrin_recv_done.
Previously, dns_xfrin_detach was called before the trace probe, which
could free the xfr object. Because the accessed member xfr->info is an
embedded array, the expression evaluates via pointer arithmetic rather
than a direct memory dereference. Although this means it does not
reliably crash in practice, it is still technically a use-after-free.
Reorder the statements to ensure the transfer context is fully valid
when the probe executes.
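The ordering rule can be sketched in C as follows. xfr_t, probe_recv_done() and xfr_detach() are hypothetical stand-ins for the transfer context, the LIBDNS_XFRIN_RECV_DONE probe and dns_xfrin_detach(); the point is only that the probe reads xfr->info while the caller's reference still keeps the object alive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the transfer context: a reference count
 * plus an embedded array, like xfr->info. */
typedef struct xfr {
	unsigned int references;
	char info[64];
} xfr_t;

/* Hypothetical trace probe: copies the string it is given. */
static char probe_buf[64];

static void
probe_recv_done(const char *info) {
	strncpy(probe_buf, info, sizeof(probe_buf) - 1);
	probe_buf[sizeof(probe_buf) - 1] = '\0';
}

/* Drop one reference; the last one frees the object. */
static void
xfr_detach(xfr_t **xfrp) {
	xfr_t *xfr = *xfrp;

	*xfrp = NULL;
	assert(xfr->references > 0);
	if (--xfr->references == 0) {
		free(xfr);
	}
}

static void
recv_done(xfr_t *xfr) {
	/* Fire the probe while our reference still keeps xfr alive ... */
	probe_recv_done(xfr->info);
	/* ... and only then drop the reference, which may free xfr. */
	xfr_detach(&xfr);
}
```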
Closes #5786
Backport of MR !11632
Merge branch 'backport-5786-fix-dtrace-after-free-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 4 Mar 2026 16:08:50 +0000 (17:08 +0100)]
Fix use-after-free in xfrin_recv_done
Move the LIBDNS_XFRIN_RECV_DONE probe execution before dns_xfrin_detach
in xfrin_recv_done.
Previously, dns_xfrin_detach was called before the trace probe, which
could free the xfr object. Because the accessed member xfr->info is an
embedded array, the expression evaluates via pointer arithmetic rather
than a direct memory dereference. Although this means it does not
reliably crash in practice, it is still technically a use-after-free.
Reorder the statements to ensure the transfer context is fully valid
when the probe executes.
Colin Vidal [Mon, 16 Mar 2026 10:59:01 +0000 (11:59 +0100)]
[9.20] chg: dev: Exclude named.args.j2 and system test README files from license header checks
Exclude named.args.j2 files from license header checks so named.args can
be generated from Jinja templates. Also exclude system test README files
from the license header checks.
Backport of MR !11690
Merge branch 'backport-colin/reuse-namedargs-9.20' into 'bind-9.20'
Starting with OpenSSL 4, the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
Closes #5807
Backport of MR !11676
Merge branch 'backport-5807-openssl-4-X509_get_subject_name-compat-fix-9.20' into 'bind-9.20'
Aram Sargsyan [Thu, 12 Mar 2026 13:10:38 +0000 (13:10 +0000)]
OpenSSL 4 compatibility fix
Starting with OpenSSL 4, the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
Ondřej Surý [Sat, 14 Mar 2026 11:53:51 +0000 (12:53 +0100)]
Simplify checkds_create() to return void
Since memory allocation never fails in BIND 9, checkds_create() cannot
fail. Change it to return void and use designated initializers,
removing error handling at all call sites.
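A sketch of the resulting shape, assuming a hypothetical xmalloc() that aborts on failure in place of BIND's allocator and a hypothetical, reduced checkds_t: with no failure path, the constructor returns void, and a compound literal with designated initializers zeroes every field it does not name.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator following the "allocation never fails"
 * policy: on failure, abort instead of returning NULL. */
static void *
xmalloc(size_t size) {
	void *p = malloc(size);

	if (p == NULL) {
		abort();
	}
	return p;
}

#define CHECKDS_MAGIC 0x43686b44U /* hypothetical value */

typedef struct checkds {
	unsigned int magic;
	unsigned int flags;
	void *zone; /* left NULL by the designated initializer */
} checkds_t;

/* Cannot fail, so there is no result code to return or check. */
static void
checkds_create(unsigned int flags, checkds_t **checkdsp) {
	checkds_t *checkds = NULL;

	assert(checkdsp != NULL && *checkdsp == NULL);

	checkds = xmalloc(sizeof(*checkds));
	*checkds = (checkds_t){
		.magic = CHECKDS_MAGIC,
		.flags = flags,
	};
	*checkdsp = checkds;
}
```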
Ondřej Surý [Sat, 14 Mar 2026 11:53:03 +0000 (12:53 +0100)]
Fix TSIG key and transport leaks in zone_notify() error paths
Two 'goto next' paths in zone_notify() skipped detaching the TSIG
key and transport, leaking them on TLS configuration failure and
when the destination address is disabled.
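One way to keep such loops leak-free is a single cleanup point under the label that every 'goto next' jumps to. The sketch below is illustrative only: obj_t, obj_attach(), obj_detach() and notify_all() are hypothetical, standing in for the TSIG key, the transport and zone_notify().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for both the
 * TSIG key and the transport. */
typedef struct obj {
	int refs;
} obj_t;

static void
obj_attach(obj_t *src, obj_t **dstp) {
	src->refs++;
	*dstp = src;
}

static void
obj_detach(obj_t **objp) {
	assert((*objp)->refs > 0);
	(*objp)->refs--;
	*objp = NULL;
}

/* Hypothetical notify loop: returns how many destinations were used. */
static int
notify_all(obj_t *key, obj_t *transport, const bool *disabled, size_t n) {
	int sent = 0;

	for (size_t i = 0; i < n; i++) {
		obj_t *k = NULL, *t = NULL;

		obj_attach(key, &k);
		obj_attach(transport, &t);

		if (disabled[i]) {
			goto next; /* skipped, but k and t must still be released */
		}
		sent++; /* the real code would send the NOTIFY here */

	next:
		/* Every path through the body, including 'goto next', ends here. */
		if (k != NULL) {
			obj_detach(&k);
		}
		if (t != NULL) {
			obj_detach(&t);
		}
	}
	return sent;
}
```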
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference. When dns_dispatch_connect() fails
synchronously on TCP (e.g. from dns_transport_get_tlsctx() failing
in tcp_dispatch_connect()), the connect callback is never scheduled,
so the extra reference is never consumed. This has been fixed.
Backport of MR !11640
Merge branch 'backport-ondrej/fix-resquery-refcount-9.20' into 'bind-9.20'
Ondřej Surý [Fri, 6 Mar 2026 16:06:24 +0000 (17:06 +0100)]
Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference.
When dns_dispatch_connect() fails synchronously on TCP (e.g. from
dns_transport_get_tlsctx() failing in tcp_dispatch_connect()), the
connect callback is never scheduled, so the extra reference is never
consumed. The error path then tears down the query via manual cleanup
(isc_mem_put) without going through the refcount destructor, leaving
the reference imbalanced.
Fix by dropping the extra reference on the error path, just after
dns_dispatch_done() which cleans up the dispatch entry.
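The balancing rule can be sketched with a hypothetical query_t and connect_start(), standing in for the resquery and dns_dispatch_connect(): a reference taken on behalf of a callback must be released by the caller whenever that callback is never going to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical query with a plain reference count. */
typedef struct query {
	int refs;
} query_t;

static void
query_ref(query_t *q) {
	q->refs++;
}

static void
query_unref(query_t *q) {
	assert(q->refs > 0);
	q->refs--;
}

/* Hypothetical connect: returns false on a synchronous failure, in
 * which case the connected callback is never scheduled. */
static bool
connect_start(query_t *q, bool fail_sync) {
	(void)q;
	return !fail_sync;
}

static bool
query_send(query_t *q, bool fail_sync) {
	/* Taken on behalf of the connected callback, which releases it. */
	query_ref(q);
	if (!connect_start(q, fail_sync)) {
		/* The callback will never run, so release its reference here. */
		query_unref(q);
		return false;
	}
	return true;
}
```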
Ondřej Surý [Sat, 14 Mar 2026 09:56:16 +0000 (10:56 +0100)]
[9.20] fix: dev: Fix memory leak in dns_catz_options_setdefault() for zonedir
When defaults->zonedir is set, opts->zonedir is unconditionally
overwritten without freeing the previous value. This leaks memory
on every catalog zone update when zonedir defaults are configured.
Free the existing opts->zonedir before replacing it.
Backport of MR !11660
Merge branch 'backport-ondrej/fix-memory-leak-in-dns_catz_options_setdefault-9.20' into 'bind-9.20'
Ondřej Surý [Wed, 11 Mar 2026 12:17:32 +0000 (13:17 +0100)]
Fix memory leak in dns_catz_options_setdefault() for zonedir
When defaults->zonedir is set, opts->zonedir is unconditionally
overwritten without freeing the previous value. This leaks memory
on every catalog zone update when zonedir defaults are configured.
Free the existing opts->zonedir before replacing it.
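A minimal sketch of the fix, using a hypothetical catz_options_t with only a zonedir field and a hypothetical xstrdup() in place of BIND's memory context API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the catalog zone options. */
typedef struct catz_options {
	char *zonedir;
} catz_options_t;

static char *
xstrdup(const char *s) {
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p == NULL) {
		abort();
	}
	memcpy(p, s, len);
	return p;
}

static void
options_setdefault(const catz_options_t *defaults, catz_options_t *opts) {
	assert(defaults != NULL && opts != NULL);

	if (defaults->zonedir != NULL) {
		/* Release the previous value first; free(NULL) is a no-op. */
		free(opts->zonedir);
		opts->zonedir = xstrdup(defaults->zonedir);
	}
}
```

Calling options_setdefault() repeatedly, as happens on each catalog zone update, now replaces the string instead of leaking the previous copy.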
Ondřej Surý [Sat, 14 Mar 2026 08:43:54 +0000 (09:43 +0100)]
[9.20] fix: usr: Fix intermittent named crashes during asynchronous zone operations
Asynchronous zone loading and dumping operations occasionally dispatched tasks
to the wrong internal event loop. This threading violation triggered internal
safety assertions that abruptly terminated named. Strict loop affinity is now
enforced for these tasks, ensuring they execute on their designated threads
and preventing the crashes.
Closes #4882
Backport of MR !11655
Merge branch 'backport-4882-run-rndc-zone-commands-on-correct-loop-9.20' into 'bind-9.20'
Ondřej Surý [Tue, 10 Mar 2026 17:25:54 +0000 (18:25 +0100)]
Dispatch async work jobs from the correct loop
Refactor dns_loadctx_t and dns_dumpctx_t to use standard
ISC_REFCOUNT_DECL and ISC_REFCOUNT_IMPL macros, retiring the
redundant manual attach and detach implementations.
Introduce dns_loadctx_enqueue() and dns_dumpctx_enqueue() to
ensure compliance with the new strict loop affinity in
isc_work_enqueue(). If the current loop does not match the
target loop, the enqueue operation is safely bounced to the
correct thread via isc_async_run().
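The bounce-to-owning-loop pattern can be modelled single-threaded as below. Everything here is hypothetical: current_loop stands in for isc_loop(), async_run() for isc_async_run(), work_enqueue() for isc_work_enqueue() with its affinity assertion, and ctx_enqueue() for dns_loadctx_enqueue()/dns_dumpctx_enqueue(). Real loops run on separate threads; here a one-slot queue is drained manually.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of per-thread event loops. */
typedef struct loop {
	void (*pending_cb)(void *); /* one-slot async queue, for brevity */
	void *pending_arg;
	int work_items;
} loop_t;

typedef struct ctx {
	loop_t *loop; /* the loop this load/dump context belongs to */
} ctx_t;

static loop_t *current_loop; /* models isc_loop() */

/* Models the strict affinity check in isc_work_enqueue(). */
static void
work_enqueue(loop_t *loop) {
	assert(current_loop == loop);
	loop->work_items++;
}

/* Models isc_async_run(): schedule cb to run on the given loop. */
static void
async_run(loop_t *loop, void (*cb)(void *), void *arg) {
	assert(loop->pending_cb == NULL);
	loop->pending_cb = cb;
	loop->pending_arg = arg;
}

static void
ctx_enqueue(void *arg) {
	ctx_t *ctx = arg;

	if (current_loop != ctx->loop) {
		/* Wrong thread: bounce this call to the owning loop. */
		async_run(ctx->loop, ctx_enqueue, ctx);
		return;
	}
	work_enqueue(ctx->loop);
}

/* Run a loop's pending callback as if on that loop's thread. */
static void
loop_run_pending(loop_t *loop) {
	loop_t *saved = current_loop;
	void (*cb)(void *) = loop->pending_cb;

	loop->pending_cb = NULL;
	current_loop = loop;
	if (cb != NULL) {
		cb(loop->pending_arg);
	}
	current_loop = saved;
}
```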
Ondřej Surý [Tue, 10 Mar 2026 17:25:37 +0000 (18:25 +0100)]
Enforce isc_work enqueue loop affinity
Add a REQUIRE(isc_loop() == loop) assertion to isc_work_enqueue()
to strictly enforce that work is enqueued from the loop it is
assigned to. This loudly prohibits cross-thread queue manipulation
before it inevitably turns into a concurrency debugging nightmare.
Michał Kępień [Thu, 12 Mar 2026 11:27:36 +0000 (12:27 +0100)]
Fix a typo in job name
As hinted at by the comment preceding it, the job preparing packager
notifications was (rather unsurprisingly) supposed to be called
"prepare-packager-notification". Fix the typo in its name.
Petr Špaček [Tue, 10 Mar 2026 17:04:51 +0000 (18:04 +0100)]
Delete early access token when code is published
Technically this is not necessary, because the token expires one week
after creation and new code would only have landed there one week
before the next public release, but it is better to be safe than sorry.
The catch is that after_script is executed even if the job fails or is
canceled, so delete the distros token only if publication succeeded.
Nicki Křížek [Mon, 9 Mar 2026 17:02:50 +0000 (18:02 +0100)]
[9.20] chg: ci: Re-enable shotgun runs for nightlies and tags
The recent rewrite of the DNS Shotgun infrastructure may have reduced
the earlier instability. To evaluate this, re-enable the regular shotgun
pipelines to gather data.
Backport of MR !11506
Merge branch 'backport-nicki/ci-shotgun-enable-9.20' into 'bind-9.20'
Nicki Křížek [Thu, 29 Jan 2026 10:10:10 +0000 (11:10 +0100)]
Re-enable shotgun runs
Make the shotgun pipelines on-demand with 5 samples (and no retry) by
default. MRs are compared to their base, while other sources (triggers,
web, schedule...) are compared against the latest released version.
For schedules, run the shotgun pipelines on Monday morning only, but
with the increased number of samples. This should provide useful data
without too many false positives.
Nicki Křížek [Mon, 9 Mar 2026 14:54:39 +0000 (15:54 +0100)]
[9.20] chg: test: Log dnspython queries after .to_wire() is called
Some DNS message modifications, such as TSIG, happen only when .to_wire()
is called on the message. To ensure there is no discrepancy between what
is logged and what is sent, log the query after dns.query.udp() has been
executed (which calls .to_wire() on the message).
Backport of MR !11623
Merge branch 'backport-nicki/pytest-log-querymsg-9.20' into 'bind-9.20'
Nicki Křížek [Tue, 3 Mar 2026 12:37:14 +0000 (13:37 +0100)]
Log dnspython queries after .to_wire() is called
Some DNS message modifications, such as TSIG, happen only when .to_wire()
is called on the message. To ensure there is no discrepancy between what
is logged and what is sent, log the query after dns.query.udp() has been
executed (which calls .to_wire() on the message).
This commit adds a new CI job to update the BIND9 version in the
isc-projects/bind9-docker project, which will cause the docker images
to be rebuilt for release. This was previously a manual step.
A notification is sent to the relevant Mattermost channel.
Štěpán Balážik [Tue, 3 Mar 2026 08:03:40 +0000 (08:03 +0000)]
[9.20] fix: ci: Fix .respdiff-recent-named anchor to work when the ABI changes
Previously, on 9.20 and 9.18, both builds (the reference and the version
being tested) would use the same .so files, which led to a crash if the
ABI changed.
Use `git worktree` to get a completely separate build environment for
the reference version.
This is not a problem on 9.21, as Meson is smart enough to cover this
mistake, but apply the fix there as well for consistency.
This is also not a problem on non-MR pipelines: the latest released
version was used as the reference there, so the .so versions would differ.
Štěpán Balážik [Mon, 2 Mar 2026 14:54:53 +0000 (15:54 +0100)]
Fix .respdiff-recent-named anchor to work when the ABI changes
Previously, on 9.20 and 9.18, both builds (the reference and the version
being tested) would use the same .so files, which led to a crash if the
ABI changed.
Use `git worktree` to get a completely separate build environment for
the reference version.
This is not a problem on 9.21, as Meson is smart enough to cover this
mistake, but apply the fix there as well for consistency.
Colin Vidal [Sun, 1 Mar 2026 19:01:20 +0000 (20:01 +0100)]
[9.20] fix: usr: Resolve "key defined in view is not found"
Commit `2956e4fc` hardened the `key` name check when used in `primaries` to reject the configuration if the key was not defined, rather than simply checking whether the key name was correctly formed.
However, the key name check didn't include the view configuration, causing keys not to be recognized if they were defined inside the view and not at the global level. This regression is now fixed.
Backport of MR !11588
Closes #5761
Merge branch 'backport-5761-key-view-9.20' into 'bind-9.20'
Colin Vidal [Mon, 23 Feb 2026 18:36:19 +0000 (19:36 +0100)]
checkconf: check key existence in views
Commit `2956e4fc45b3c2142a3351682d4200647448f193` hardened the `key`
name check when used in `primaries` to reject the configuration if
the key was not defined, rather than simply checking whether the
key name was correctly formed.
However, the key name check didn't include the view configuration,
causing keys not to be recognized if they were defined inside the
view and not at the global level. This regression is now fixed.
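The corrected lookup rule can be sketched as follows; key_defined() and the plain string arrays are hypothetical simplifications of the configuration tree that named-checkconf actually walks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
name_in(const char *const *keys, size_t nkeys, const char *name) {
	for (size_t i = 0; i < nkeys; i++) {
		if (strcmp(keys[i], name) == 0) {
			return true;
		}
	}
	return false;
}

/* A key referenced from `primaries` inside a view is accepted if it
 * is defined in that view or at the global level. */
static bool
key_defined(const char *name, const char *const *viewkeys, size_t nview,
	    const char *const *globalkeys, size_t nglobal) {
	assert(name != NULL);
	return name_in(viewkeys, nview, name) ||
	       name_in(globalkeys, nglobal, name);
}
```

Passing an empty view list reproduces the regression: a key defined only inside the view is then reported as missing.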
Ondřej Surý [Thu, 26 Feb 2026 08:13:34 +0000 (09:13 +0100)]
[9.20] chg: dev: Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound partial
Fisher-Yates shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
introduces fctx_getaddresses_nsorder() to perform an in-place
randomization of indices into a bounded, stack-allocated lookup array
(nsorder) representing the "winning" fetch slots.
The nameserver dataset is now traversed in exactly one sequential pass:
1. Every nameserver is evaluated for local cached data.
2. If the current nameserver's sequential index exists in the randomized
nsorder array, it is permitted to launch an outgoing network fetch.
3. If not, it is restricted to local lookups via DNS_ADBFIND_NOFETCH.
This guarantees a fair random distribution for outbound queries while
maximizing local cache hits, entirely within O(1) memory and without
the overhead of linked-list pointer shuffling or dynamic allocation.
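A sketch of a partial Fisher-Yates shuffle over nameserver indices, followed by the membership test used during the single sequential pass. ns_order(), ns_may_fetch() and the MAX_NS bound are hypothetical, and rand() is used only so the example is self-contained; it is not the random source BIND uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NS 64 /* hypothetical bound on nameservers considered */

/* Index in [0, bound). rand() % bound has slight modulo bias and is
 * used only to keep the sketch self-contained. */
static size_t
random_below(size_t bound) {
	return (size_t)rand() % bound;
}

/*
 * Partial Fisher-Yates: choose min(n, k) distinct indices from [0, n)
 * uniformly at random and store them in order[]. Only the first
 * `picked` swap steps of a full shuffle are performed.
 */
static size_t
ns_order(size_t n, size_t k, size_t order[MAX_NS]) {
	size_t idx[MAX_NS];
	size_t picked;

	assert(n <= MAX_NS);
	picked = (k < n) ? k : n;

	for (size_t i = 0; i < n; i++) {
		idx[i] = i;
	}
	for (size_t i = 0; i < picked; i++) {
		size_t j = i + random_below(n - i); /* j in [i, n) */
		size_t tmp = idx[i];

		idx[i] = idx[j];
		idx[j] = tmp;
		order[i] = idx[i];
	}
	return picked;
}

/* During the single sequential pass: may nameserver i start a fetch? */
static bool
ns_may_fetch(const size_t *order, size_t picked, size_t i) {
	for (size_t o = 0; o < picked; o++) {
		if (order[o] == i) {
			return true;
		}
	}
	return false;
}
```

Nameservers for which ns_may_fetch() returns false would still be checked, but only against local data, which is the role DNS_ADBFIND_NOFETCH plays in the description above.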
Closes #5695
Backport of MR !11604
Merge branch 'backport-5695-refactor-the-random-NS-selection-9.20' into 'bind-9.20'
Colin Vidal [Wed, 25 Feb 2026 18:01:22 +0000 (19:01 +0100)]
Add test coverage for nameserver processing limits
Introduce a new system test (nsprocessinglimit) to verify that the
resolver strictly respects outgoing network fetch quotas when presented
with heavily delegated, unresponsive zones.
This test acts as a regression check for the recent Fisher-Yates nameserver
selection refactor. It sets up an authoritative server delegating a zone
to 23 distinct nameservers (all pointing to unresponsive loopback IPs).
Using dnstap, the test forces a resolution failure and verifies that:
1. The resolver successfully traverses the zone delegation path.
2. The resolver caps the outgoing network queries to the delegated
nameservers exactly at the processing limit (20 fetches), ensuring
array boundaries and dynamic fetch quotas are strictly enforced without
crashing or hanging.
Ondřej Surý [Wed, 25 Feb 2026 15:46:40 +0000 (16:46 +0100)]
Implement Fisher-Yates shuffle for nameserver selection
Replace the two-pass "random start index and wrap around" logic in
fctx_getaddresses_nameservers() with a statistically sound Fisher-Yates
shuffle.
The previous implementation picked a random starting node and did two
passes over the linked list to find query candidates. The new logic
extracts the available nameservers into a bounded, stack-allocated array
of dns_rdata_t structures.
This array is then randomized in-place using a Fisher-Yates shuffle.
Finally, the shuffled array is traversed sequentially to launch fetches
until the dynamic quota (fctx->pending_running >= fetches_allowed) is
reached.
This guarantees a fair random distribution for outbound queries while
properly respecting dynamic query limits, entirely within O(1) memory
and without the overhead of linked-list pointer shuffling or multiple
dataset traversals.
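For comparison, the approach described in this commit (shuffle the whole candidate array, then walk it until the quota is hit) can be sketched with a standard in-place Fisher-Yates (Durstenfeld) shuffle. Plain ints stand in for the dns_rdata_t entries, launch_until_quota() is hypothetical, and rand() is again used only for self-containment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* In-place Fisher-Yates shuffle; rand() % i has slight modulo bias. */
static void
shuffle(int *a, size_t n) {
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)rand() % i; /* j in [0, i) */
		int tmp = a[i - 1];

		a[i - 1] = a[j];
		a[j] = tmp;
	}
}

/* Launch fetches in shuffled order until the quota is reached.
 * launched[] receives the chosen entries; returns how many. */
static size_t
launch_until_quota(int *ns, size_t n, size_t allowed, int *launched) {
	size_t pending = 0;

	assert(ns != NULL && launched != NULL);

	shuffle(ns, n);
	for (size_t i = 0; i < n; i++) {
		if (pending >= allowed) {
			break; /* cf. fctx->pending_running >= fetches_allowed */
		}
		launched[pending++] = ns[i]; /* stands in for starting a fetch */
	}
	return pending;
}
```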
Štěpán Balážik [Wed, 25 Feb 2026 13:49:25 +0000 (13:49 +0000)]
[9.20] chg: ci: Rework linting of Python code
With the Python version bumped to 3.10 and the dependency situation cleared with !11415, it is now time to run linters and formatters on more of the Python code that was previously skipped or ignored.
Switch configuration of the various Python-adjacent tools to `pyproject.toml` to ensure that the same configuration is used in CI and locally.
See the individual commits for details on settings changed and linters added.
Tweaks to type checking and enabling more `ruff` lints will come in subsequent MRs.
Prerequisites:
- bind9-qa!160
- images!442
Backport of MR !11499
Merge branch 'backport-stepan/python-tooling-9.20' into 'bind-9.20'
Štěpán Balážik [Mon, 9 Feb 2026 18:22:44 +0000 (19:22 +0100)]
Clean up imports of dnspython modules
Add a pylint plugin that enforces:
- There is no bare `import dns` statement.
- All `dns.<module>` used are explicitly imported.
- There are no unused `dns.<module>` imports.
Štěpán Balážik [Fri, 20 Feb 2026 14:03:16 +0000 (15:03 +0100)]
Replace Optional["T"] with "T | None"
In Python 3.10 strings don't support the | operator, so ruff doesn't
attempt to fix these. Quote the entire type specification to avoid the
typing.Optional import.
Alternatives I considered:
- leaving it as is (only use of Optional in the code base)
- using `from __future__ import annotations` (replacing one import with
another one)
Štěpán Balážik [Wed, 4 Feb 2026 17:17:17 +0000 (18:17 +0100)]
Make default_algorithm accessible through a fixture and method
Importing a pytest fixture trips up static analysis tools, so move
default_algorithm to conftest.py and use it instead of os.environ
accesses in various system tests.
For use outside a test function, use Algorithm.default().