Arаm Sаrgsyаn [Wed, 18 Mar 2026 17:04:31 +0000 (17:04 +0000)]
fix: dev: Take dns_dtenv_t reference before an async function call
A 'dns_dtenv_t' pointer is passed to an async function without taking
a reference first, which can potentially cause a use-after-free error.
Take a reference, then detach in the async function.
Closes #5820
Merge branch '5820-dns_dtenv-reference-bug-fix' into 'main'
Aram Sargsyan [Tue, 17 Mar 2026 11:23:22 +0000 (11:23 +0000)]
Take 'env' reference before async calling perform_reopen()
The 'env' pointer is passed to an async function without taking
a reference first, which can potentially cause a use-after-free
error. Take a reference, then detach in the async function.
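The attach-before-scheduling pattern described above can be sketched as follows. This is a minimal illustration, not the BIND 9 code: env_t, env_attach(), env_detach(), job_t and schedule_reopen() are hypothetical stand-ins for dns_dtenv_t, its reference helpers and the event loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object such as dns_dtenv_t. */
typedef struct env {
	int refs;
	int reopened;
} env_t;

static env_t *
env_create(void) {
	env_t *env = calloc(1, sizeof(*env));
	assert(env != NULL);
	env->refs = 1; /* the creator holds the first reference */
	return env;
}

static void
env_attach(env_t *src, env_t **dstp) {
	src->refs++;
	*dstp = src;
}

/* Drops one reference; returns 1 if the object was freed. */
static int
env_detach(env_t **envp) {
	env_t *env = *envp;
	*envp = NULL;
	if (--env->refs == 0) {
		free(env);
		return 1;
	}
	return 0;
}

/* The async callback owns the reference passed in 'arg' and releases it. */
static int
perform_reopen_cb(void *arg) {
	env_t *env = arg;
	env->reopened = 1;
	return env_detach(&env);
}

/* A single pending job, standing in for an event loop queue. */
typedef struct job {
	int (*cb)(void *);
	void *arg;
} job_t;

/* Take a reference before handing the pointer to the async job. */
static void
schedule_reopen(env_t *env, job_t *job) {
	env_t *ref = NULL;
	env_attach(env, &ref);
	job->cb = perform_reopen_cb;
	job->arg = ref;
}
```

With this shape the caller may drop its own reference at any time after scheduling; the object stays alive until the callback has run and detached.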
Nicki Křížek [Wed, 18 Mar 2026 14:10:30 +0000 (15:10 +0100)]
chg: dev: Use underscore for system test names
Change the convention for system test directory names to always use an
underscore rather than a hyphen. Names using underscore are valid python
package names and can be used with standard `import` facilities in
python, which allows easier code reuse.
Merge branch 'nicki/system-test-dir-underscore-names' into 'main'
Nicki Křížek [Tue, 17 Mar 2026 16:18:48 +0000 (17:18 +0100)]
Rename all system tests to use underscore
All system tests previously using a hyphen have been renamed to use
underscore instead. A couple of symlinks were corrected and one path in
`nsec3-answer` adjusted accordingly.
Nicki Křížek [Tue, 17 Mar 2026 16:08:15 +0000 (17:08 +0100)]
Use underscore for system test names
Change the convention for system test directory names to always use an
underscore rather than a hyphen. Names using underscore are valid python
package names and can be used with standard `import` facilities in
python, which allows easier code reuse.
The temporary directories for test execution and their convenience
symlinks have been switched to using hyphens rather than underscores to
keep the pytest collection, filtering and .gitignore working as
expected.
Ondřej Surý [Wed, 18 Mar 2026 10:39:16 +0000 (11:39 +0100)]
fix: dev: Fix isc_buffer_init capacity mismatch in DoH data chunk callback
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated. This makes the buffer
believe it has more space than actually allocated. A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.
Pass h2->content_length as the capacity to match the allocation.
Merge branch 'ondrej/fix-isc_buffer_init-capacity-mismatch-in-DoH' into 'main'
Ondřej Surý [Wed, 11 Mar 2026 12:17:45 +0000 (13:17 +0100)]
Fix isc_buffer_init capacity mismatch in DoH data chunk callback
isc_buffer_init() is given MAX_DNS_MESSAGE_SIZE (65535) as capacity but
only h2->content_length bytes are allocated. This makes the buffer
believe it has more space than actually allocated. A secondary bounds
check (new_bufsize <= h2->content_length) prevents actual overflow, but
the buffer invariant is violated.
Pass h2->content_length as the capacity to match the allocation.
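The invariant in question (declared capacity equals allocated size) can be sketched with a toy buffer. buf_t, buf_init(), buf_put() and body_buf_create() are hypothetical and only mimic the idea behind isc_buffer_init(); they are not the real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal buffer; not the real isc_buffer_t. */
typedef struct buf {
	unsigned char *base;
	size_t length; /* capacity the buffer believes it has */
	size_t used;
} buf_t;

static void
buf_init(buf_t *b, void *base, size_t length) {
	b->base = base;
	b->length = length;
	b->used = 0;
}

/* Append only if the data fits in the declared capacity. */
static bool
buf_put(buf_t *b, const void *data, size_t len) {
	if (len > b->length - b->used) {
		return false;
	}
	memcpy(b->base + b->used, data, len);
	b->used += len;
	return true;
}

/* Allocate content_length bytes and declare exactly that capacity. */
static bool
body_buf_create(buf_t *b, size_t content_length) {
	void *mem = malloc(content_length);
	if (mem == NULL) {
		return false;
	}
	buf_init(b, mem, content_length);
	return true;
}
```

Because the capacity passed to buf_init() matches the allocation, the buffer's own bounds check is sufficient and no secondary check is needed to stay inside the allocation.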
Ondřej Surý [Wed, 18 Mar 2026 10:37:31 +0000 (11:37 +0100)]
rem: usr: Remove NZF file support in favor of NZD (New Zone Database)
The NZF (New Zone File) backend for storing rndc addzone configurations
has been removed; LMDB-based NZD is now the only storage backend and
LMDB is now a required build dependency.
Existing NZF files are automatically migrated to NZD on startup, so no manual
intervention is required when upgrading.
Merge branch 'ondrej/drop-nzf-support' into 'main'
Ondřej Surý [Sun, 15 Mar 2026 04:05:09 +0000 (05:05 +0100)]
Split NZD functions into a separate compilation unit
Move all LMDB-based new zone database functions from server.c into
nzd.c to reduce the size of server.c and isolate the NZD/LMDB
interface. Rename load_nzf() to nzd_load_nzf() to match the nzd_
namespace.
Ondřej Surý [Sun, 15 Mar 2026 03:42:45 +0000 (04:42 +0100)]
Remove dead NZF writer parameter and simplify newzone locking
Now that NZF write support is gone, remove the unused nzfwriter_t
typedef and nzfwriter parameter from delete_zoneconf(). Remove the
bool locked parameter and simplify the locking in do_modzone() and
rmzone() to unconditional lock/unlock pairs.
Ondřej Surý [Sun, 15 Mar 2026 03:25:12 +0000 (04:25 +0100)]
Remove NZF support, make LMDB required for new zone storage
Drop the NZF (New Zone File) fallback for persisting runtime zone
configurations, making LMDB (NZD) the only storage backend. This
removes all #ifdef HAVE_LMDB conditionals, the meson 'lmdb' option,
and the NZF-related functions. LMDB is now a mandatory build
dependency.
Ondřej Surý [Tue, 17 Mar 2026 15:40:29 +0000 (16:40 +0100)]
fix: usr: Fix potential resource leak during resolver error handling
Under specific error conditions during query processing, resources were not
being properly released, which could eventually lead to unnecessary memory
consumption for the server. A potential resource leak in the resolver
has been fixed.
Merge branch 'ondrej/fix-pthread-primitives-usage' into 'main'
Ondřej Surý [Tue, 10 Mar 2026 10:30:54 +0000 (11:30 +0100)]
Fix missing mutex destroy and ede invalidate on fctx_create() error paths
The error cleanup in fctx_create() was missing isc_mutex_destroy() and
dns_ede_invalidate() calls. When error paths (cleanup_nameservers,
cleanup_fcount, cleanup_qmessage, cleanup_adb) were taken after the
mutex and edectx were initialized, the fctx memory was freed without
properly destroying these resources first.
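A minimal sketch of the cleanup ordering, assuming a single late failure point. fctx_new() and the two counters are hypothetical; the counters stand in for an initialized mutex and EDE context so that a missed teardown becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Counters standing in for live mutex and EDE context resources. */
static int live_mutexes = 0;
static int live_edectx = 0;

typedef struct fctx {
	int placeholder;
} fctx_t;

/*
 * Hypothetical constructor: 'fail_late' simulates an error that occurs
 * after the mutex and the EDE context have been initialized.
 */
static fctx_t *
fctx_new(bool fail_late) {
	fctx_t *fctx = calloc(1, sizeof(*fctx));
	if (fctx == NULL) {
		return NULL;
	}

	live_mutexes++; /* stands in for initializing the mutex */
	live_edectx++;	/* stands in for initializing the EDE context */

	if (fail_late) {
		goto cleanup;
	}
	return fctx;

cleanup:
	/* Tear down in reverse order before freeing the memory. */
	live_edectx--;	/* stands in for dns_ede_invalidate() */
	live_mutexes--; /* stands in for isc_mutex_destroy() */
	free(fctx);
	return NULL;
}
```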
Mark Andrews [Wed, 4 Mar 2026 06:51:09 +0000 (17:51 +1100)]
Clear errno before calling strtol
The previous code incorrectly cleared errno after calling strtol but
before testing the result. Clear errno first and then call strtol, so
that any change strtol makes to errno can be detected reliably.
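The usual idiom looks like the sketch below. parse_long() is a hypothetical helper written for illustration; only errno and strtol() are standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a base-10 long. errno is cleared *before* strtol() so that a
 * non-zero value afterwards can only have been set by this call.
 */
static bool
parse_long(const char *s, long *out) {
	char *end = NULL;

	errno = 0;
	long value = strtol(s, &end, 10);
	if (errno != 0) {
		return false; /* out of range */
	}
	if (end == s || *end != '\0') {
		return false; /* no digits, or trailing garbage */
	}
	*out = value;
	return true;
}
```

Clearing errno after the call instead would erase the very error the following test is meant to detect, and a stale errno left over from earlier code would go unnoticed only by accident.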
We return DNS_R_NOVALIDSIG if we detected a deadlock. Then in
'validate_async_done()', this result value is used to check if we
need to fall back to insecure. As part of that we create a new fetch
but that fails because of the detected deadlock. This results in a loop
of deadlock detected, fallback to insecure, deadlock detected, ...
Add a new result value, ISC_R_DEADLOCK, and return this instead when
we have detected a deadlock. This will be treated as a generic error,
as there is no special handling for this result value.
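The loop and its fix can be modelled in a few lines. This is a toy model under stated assumptions: result_t, run_validation() and the attempt cap are hypothetical, and every attempt is assumed to hit the same deadlock.

```c
#include <assert.h>

/* Hypothetical result codes mirroring the two cases in the text. */
typedef enum { R_NOVALIDSIG, R_DEADLOCK } result_t;

/*
 * Run the validate/fallback cycle. 'deadlock_result' is the value the
 * validator returns when it detects a deadlock. Returns the number of
 * validation attempts; 'max_attempts' only keeps the demo finite.
 */
static int
run_validation(result_t deadlock_result, int max_attempts) {
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		result_t r = deadlock_result; /* each attempt deadlocks */
		if (r != R_NOVALIDSIG) {
			break; /* generic error: give up */
		}
		/*
		 * R_NOVALIDSIG triggers the fallback to insecure, which
		 * starts a new fetch and ends up here again.
		 */
	}
	return attempts;
}
```

Reusing the "no valid signature" code for a deadlock keeps the cycle going until the cap, while a dedicated code stops after the first attempt.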
Ondřej Surý [Mon, 16 Mar 2026 11:17:16 +0000 (12:17 +0100)]
chg: nil: Cleanup the duplicate logic and comments around add into NSEC tree
After merging the NORMAL, NSEC and NSEC3 trees into a single QP tree, there
were some comments still speaking about the auxiliary NSEC tree. These were
cleaned up and the logic when we pass the qp tree (write transaction) to
qpzone_addrdataset_inner() was changed to make it more obvious that this is
needed only when we are adding NSEC records.
Merge branch 'ondrej/additional-cleanups-around-NSEC-namespace' into 'main'
Ondřej Surý [Sat, 31 Jan 2026 06:32:08 +0000 (07:32 +0100)]
Cleanup weird syntax defining struct dns_ixfr
The struct dns_ixfr was defined as part of struct dns_xfrin, probably
because at some point it was an anonymous struct and then it was changed
to a named struct with a typedef at the top. Move the definition out of
struct dns_xfrin and fold it into the typedef ... dns_ixfr_t.
Ondřej Surý [Sat, 31 Jan 2026 06:24:49 +0000 (07:24 +0100)]
Cleanup the duplicate logic and comments around add into NSEC tree
After merging the NORMAL, NSEC and NSEC3 trees into a single QP tree, there
were some comments still speaking about the auxiliary NSEC tree. These were
cleaned up and the logic when we pass the qp tree (write transaction) to
qpzone_addrdataset_inner() was changed to make it more obvious that this is
needed only when we are adding NSEC records.
Colin Vidal [Mon, 16 Mar 2026 10:36:25 +0000 (11:36 +0100)]
chg: dev: Exclude named.args.j2 and system test README files from license header checks
Exclude named.args.j2 files from license header checks so named.args can
be generated from Jinja templates. Also exclude system test README files
from the license header checks.
Ondřej Surý [Mon, 16 Mar 2026 10:06:28 +0000 (11:06 +0100)]
fix: dev: Fix use-after-free in xfrin_recv_done
Move the LIBDNS_XFRIN_RECV_DONE probe execution before dns_xfrin_detach
in xfrin_recv_done.
Previously, dns_xfrin_detach was called before the trace probe, which
could free the xfr object. Because the accessed member xfr->info is an
embedded array, the expression evaluates via pointer arithmetic rather
than a direct memory dereference. Although this prevents a reliable
crash in practice, it technically remains a use-after-free issue.
Reorder the statements to ensure the transfer context is fully valid
when the probe executes.
Closes #5786
Merge branch '5786-fix-dtrace-after-free' into 'main'
Ondřej Surý [Wed, 4 Mar 2026 16:08:50 +0000 (17:08 +0100)]
Fix use-after-free in xfrin_recv_done
Move the LIBDNS_XFRIN_RECV_DONE probe execution before dns_xfrin_detach
in xfrin_recv_done.
Previously, dns_xfrin_detach was called before the trace probe, which
could free the xfr object. Because the accessed member xfr->info is an
embedded array, the expression evaluates via pointer arithmetic rather
than a direct memory dereference. Although this prevents a reliable
crash in practice, it technically remains a use-after-free issue.
Reorder the statements to ensure the transfer context is fully valid
when the probe executes.
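The reordering amounts to reading from the object before dropping what may be its last reference. A minimal sketch, with xfr_t, trace_probe(), xfr_detach() and recv_done() as hypothetical stand-ins for the transfer context, the trace probe and dns_xfrin_detach:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transfer context with an embedded info array. */
typedef struct xfr {
	int refs;
	char info[32];
} xfr_t;

/* Records what the probe observed, so the ordering can be checked. */
static char probe_seen[32];

static void
trace_probe(const char *info) {
	strncpy(probe_seen, info, sizeof(probe_seen) - 1);
	probe_seen[sizeof(probe_seen) - 1] = '\0';
}

static void
xfr_detach(xfr_t **xfrp) {
	xfr_t *xfr = *xfrp;
	*xfrp = NULL;
	if (--xfr->refs == 0) {
		free(xfr);
	}
}

/* Fire the probe while the context is still valid, then detach. */
static void
recv_done(xfr_t *xfr) {
	trace_probe(xfr->info);
	xfr_detach(&xfr);
}
```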
Arаm Sаrgsyаn [Mon, 16 Mar 2026 10:01:32 +0000 (10:01 +0000)]
fix: dev: Fix OpenSSL 4 compatibility issue when calling X509_get_subject_name()
Starting from OpenSSL 4 the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
Closes #5807
Merge branch '5807-openssl-4-X509_get_subject_name-compat-fix' into 'main'
Aram Sargsyan [Thu, 12 Mar 2026 13:10:38 +0000 (13:10 +0000)]
OpenSSL 4 compatibility fix
Starting from OpenSSL 4 the X509_get_subject_name() function
returns a 'const' pointer to a name instead of a regular pointer.
Duplicate the name before operating on it, then free it.
Ondřej Surý [Sat, 14 Mar 2026 11:53:51 +0000 (12:53 +0100)]
Simplify checkds_create() to return void
Since memory allocation never fails in BIND 9, checkds_create() cannot
fail. Change it to return void and use designated initializers,
removing error handling at all call sites.
Ondřej Surý [Sat, 14 Mar 2026 11:53:03 +0000 (12:53 +0100)]
Fix TSIG key and transport leaks in zone_notify() error paths
Two 'goto next' paths in zone_notify() skipped detaching the TSIG
key and transport, leaking them on TLS configuration failure and
when the destination address is disabled.
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.
Fix the second pair to check RADIX_V6.
Merge branch 'ondrej/fix-copy-paste-error-checking-RADIX_V4-instead-of-RADIX_V6' into 'main'
Ondřej Surý [Wed, 11 Mar 2026 12:17:56 +0000 (13:17 +0100)]
Fix INSIST copy-paste error checking RADIX_V4 instead of RADIX_V6
The INSIST in isc_radix_insert() checks node->data[RADIX_V4] and
node->node_num[RADIX_V4] twice due to a copy-paste error, never
verifying the RADIX_V6 fields.
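The effect of the copy-paste error can be shown on a toy node. The layout, the index values and the use of -1 as the "unused" marker are assumptions made for this sketch, not taken from the radix code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: one slot per address family. */
enum { RADIX_V4 = 0, RADIX_V6 = 1, RADIX_FAMILIES = 2 };

typedef struct radix_node {
	void *data[RADIX_FAMILIES];
	int node_num[RADIX_FAMILIES];
} radix_node_t;

/* Copy-paste version: the second pair repeats the RADIX_V4 checks. */
static bool
node_unused_buggy(const radix_node_t *n) {
	return n->data[RADIX_V4] == NULL && n->node_num[RADIX_V4] == -1 &&
	       n->data[RADIX_V4] == NULL && n->node_num[RADIX_V4] == -1;
}

/* Corrected version: the second pair checks the RADIX_V6 slots. */
static bool
node_unused_fixed(const radix_node_t *n) {
	return n->data[RADIX_V4] == NULL && n->node_num[RADIX_V4] == -1 &&
	       n->data[RADIX_V6] == NULL && n->node_num[RADIX_V6] == -1;
}
```

A node whose IPv6 slot is already populated passes the buggy check but fails the fixed one, which is exactly the case the assertion was meant to catch.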
Ondřej Surý [Sat, 14 Mar 2026 10:02:10 +0000 (11:02 +0100)]
fix: dev: Fix port validation rejecting valid port 65535
Three port validation checks use >= UINT16_MAX instead of > UINT16_MAX,
incorrectly rejecting port 65535 as out of range. Port 65535 is a valid
TCP/UDP port number. Other port checks in the same file already use the
correct > comparison.
Merge branch 'ondrej/fix-port-validation-rejecting-valid-port-65535' into 'main'
Ondřej Surý [Wed, 11 Mar 2026 12:18:01 +0000 (13:18 +0100)]
Fix port validation rejecting valid port 65535
A few port validation checks use >= UINT16_MAX instead of > UINT16_MAX,
incorrectly rejecting port 65535 as out of range. Port 65535 is a valid
TCP/UDP port number. Other port checks in the same file already use the
correct > comparison.
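The off-by-one is easy to see side by side. Both helpers below are hypothetical and only illustrate the comparison; they are not the functions touched by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy range check: treats UINT16_MAX (65535) itself as out of range. */
static bool
port_in_range_buggy(unsigned long value) {
	if (value >= UINT16_MAX) {
		return false;
	}
	return true;
}

/* Correct range check: only values above UINT16_MAX are rejected. */
static bool
port_in_range(unsigned long value) {
	if (value > UINT16_MAX) {
		return false;
	}
	return true;
}
```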
Ondřej Surý [Sat, 14 Mar 2026 09:10:37 +0000 (10:10 +0100)]
fix: dev: Fix memory leak in dns_catz_options_setdefault() for zonedir
When defaults->zonedir is set, opts->zonedir is unconditionally
overwritten without freeing the previous value. This leaks memory
on every catalog zone update when zonedir defaults are configured.
Free the existing opts->zonedir before replacing it.
Merge branch 'ondrej/fix-memory-leak-in-dns_catz_options_setdefault' into 'main'
Ondřej Surý [Wed, 11 Mar 2026 12:17:32 +0000 (13:17 +0100)]
Fix memory leak in dns_catz_options_setdefault() for zonedir
When defaults->zonedir is set, opts->zonedir is unconditionally
overwritten without freeing the previous value. This leaks memory
on every catalog zone update when zonedir defaults are configured.
Free the existing opts->zonedir before replacing it.
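A minimal sketch of the free-before-replace fix. options_t, options_setdefault(), str_dup() and str_free() are hypothetical; the live_allocs counter exists only to make a leak observable in the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of strings currently allocated through str_dup(). */
static int live_allocs = 0;

static char *
str_dup(const char *s) {
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);
	assert(copy != NULL);
	memcpy(copy, s, n);
	live_allocs++;
	return copy;
}

static void
str_free(char **sp) {
	if (*sp != NULL) {
		free(*sp);
		*sp = NULL;
		live_allocs--;
	}
}

/* Hypothetical options structure with an owned string member. */
typedef struct options {
	char *zonedir;
} options_t;

/* Release the previous value before installing a copy of the default. */
static void
options_setdefault(options_t *opts, const options_t *defaults) {
	if (defaults->zonedir != NULL) {
		str_free(&opts->zonedir);
		opts->zonedir = str_dup(defaults->zonedir);
	}
}
```

Without the str_free() call, each invocation would leave one more string allocated, which matches the "leak on every update" behaviour described above.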
Ondřej Surý [Sat, 14 Mar 2026 06:45:57 +0000 (07:45 +0100)]
fix: usr: Fix intermittent named crashes during asynchronous zone operations
Asynchronous zone loading and dumping operations occasionally dispatched tasks
to the wrong internal event loop. This threading violation triggered internal
safety assertions that abruptly terminated named. Strict loop affinity is now
enforced for these tasks, ensuring they execute on their designated threads
and preventing the crashes.
Closes #4882
Merge branch '4882-run-rndc-zone-commands-on-correct-loop' into 'main'
Ondřej Surý [Tue, 10 Mar 2026 17:25:54 +0000 (18:25 +0100)]
Dispatch async work jobs from the correct loop
Refactor dns_loadctx_t and dns_dumpctx_t to use standard
ISC_REFCOUNT_DECL and ISC_REFCOUNT_IMPL macros, retiring the
redundant manual attach and detach implementations.
Introduce dns_loadctx_enqueue() and dns_dumpctx_enqueue() to
ensure compliance with the new strict loop affinity in
isc_work_enqueue(). If the current loop does not match the
target loop, the enqueue operation is safely bounced to the
correct thread via isc_async_run().
Ondřej Surý [Tue, 10 Mar 2026 17:25:37 +0000 (18:25 +0100)]
Enforce isc_work enqueue loop affinity
Add a REQUIRE(isc_loop() == loop) assertion to isc_work_enqueue()
to strictly enforce that work is enqueued from the loop it is
assigned to. This loudly prohibits cross-thread queue manipulation
before it inevitably turns into a concurrency debugging nightmare.
Michał Kępień [Thu, 12 Mar 2026 11:27:36 +0000 (12:27 +0100)]
Fix a typo in job name
As hinted at by the comment preceding it, the job preparing packager
notifications was (rather unsurprisingly) supposed to be called
"prepare-packager-notification". Fix the typo in its name.
Petr Špaček [Tue, 10 Mar 2026 17:04:51 +0000 (18:04 +0100)]
Delete early access token when code is published
Technically this is not necessary because the token expires in one week
after creation, and new code would have got there only one week before
the next public release, but better be safe than sorry.
The catch is that after_script gets executed even if a job fails or is
canceled. Delete the distros token only if publication succeeded.
Ondřej Surý [Tue, 10 Mar 2026 17:38:37 +0000 (18:38 +0100)]
fix: dev: Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference. When dns_dispatch_connect() fails
synchronously on TCP (e.g. from dns_transport_get_tlsctx() failing
in tcp_dispatch_connect()), the connect callback is never scheduled,
so the extra reference is never consumed. This has been fixed.
Merge branch 'ondrej/fix-resquery-refcount' into 'main'
Ondřej Surý [Fri, 6 Mar 2026 16:06:24 +0000 (17:06 +0100)]
Fix resquery reference imbalance on TCP connect failure
In fctx_query(), resquery_ref(query) is called before
dns_dispatch_connect() in anticipation of the resquery_connected()
callback consuming the reference.
When dns_dispatch_connect() fails synchronously on TCP (e.g. from
dns_transport_get_tlsctx() failing in tcp_dispatch_connect()), the
connect callback is never scheduled, so the extra reference is never
consumed. The error path then tears down the query via manual cleanup
(isc_mem_put) without going through the refcount destructor, leaving
the reference imbalanced.
Fix by dropping the extra reference on the error path, just after
dns_dispatch_done() which cleans up the dispatch entry.
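The balance rule can be sketched as: whoever takes a reference on behalf of a callback must drop it when the callback will never run. query_t, connect_start(), query_start() and connected_cb() below are hypothetical simplifications of the resolver code.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct query {
	int refs;
	bool callback_pending;
} query_t;

/* Hypothetical connect: on synchronous failure no callback is scheduled. */
static bool
connect_start(query_t *q, bool fail_sync) {
	if (fail_sync) {
		return false;
	}
	q->callback_pending = true;
	return true;
}

static bool
query_start(query_t *q, bool fail_sync) {
	q->refs++; /* reference the connect callback is expected to consume */
	if (!connect_start(q, fail_sync)) {
		q->refs--; /* no callback will run, so drop it here */
		return false;
	}
	return true;
}

/* The connect callback consumes the reference taken in query_start(). */
static void
connected_cb(query_t *q) {
	q->callback_pending = false;
	q->refs--;
}
```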
Nicki Křížek [Tue, 10 Mar 2026 15:07:54 +0000 (16:07 +0100)]
chg: test: Disable statschannel RTT tests on FreeBSD
These tests rely on somewhat precise timing, as they test that answers
arrive in a particular latency bucket within the statschannel stats.
These tests are affected by various timing and network issues on our
FreeBSD CI runners and the results are very unstable. Skip these on
FreeBSD entirely.
Merge branch 'nicki/disable-statschannel-rtt-on-freebsd' into 'main'
Nicki Křížek [Tue, 10 Mar 2026 12:35:56 +0000 (13:35 +0100)]
Disable statschannel RTT tests on FreeBSD
These tests rely on somewhat precise timing, as they test that answers
arrive in a particular latency bucket within the statschannel stats.
These tests are affected by various timing and network issues on our
FreeBSD CI runners and the results are very unstable. Skip these on
FreeBSD entirely.
Nicki Křížek [Mon, 9 Mar 2026 15:48:24 +0000 (16:48 +0100)]
chg: ci: Re-enable shotgun runs for nightlies and tags
The recent rewrite of DNS Shotgun infrastructure might've improved the
prior instability. In order to evaluate, re-enable the regular shotgun
pipelines to gather data.
Merge branch 'nicki/ci-shotgun-enable' into 'main'
Nicki Křížek [Thu, 29 Jan 2026 10:10:10 +0000 (11:10 +0100)]
Re-enable shotgun runs
Make the shotgun pipelines on-demand with 5 samples (and no retry) by
default. MRs are compared to their base, while other sources (triggers,
web, schedule...) are compared against the latest released version.
For schedules, run the shotgun pipelines on Monday morning only, but
with the increased number of samples. This should provide useful data
without too many false positives.
Nicki Křížek [Mon, 9 Mar 2026 12:12:14 +0000 (13:12 +0100)]
chg: test: Log dnspython queries after .to_wire() is called
Some dns message modifications like TSIG happen only after .to_wire() is
called on the message. To ensure there isn't a discrepancy between what
has been logged and what has been sent, log the query after
dns.query.udp() is executed (which calls .to_wire() on the message).
Merge branch 'nicki/pytest-log-querymsg' into 'main'
Nicki Křížek [Tue, 3 Mar 2026 12:37:14 +0000 (13:37 +0100)]
Log dnspython queries after .to_wire() is called
Some dns message modifications like TSIG happen only after .to_wire() is
called on the message. To ensure there isn't a discrepancy between what
has been logged and what has been sent, log the query after
dns.query.udp() is executed (which calls .to_wire() on the message).
Alessio Podda [Fri, 6 Mar 2026 14:06:13 +0000 (14:06 +0000)]
chg: dev: Replace lock keyfile hashmap with lock pool
Kasp used a lock per zone origin in order to prevent concurrent access
to keyfiles. This led to substantial memory consumption in the case of
authoritative servers with many small zones, as lots of locks need to be
allocated.
Since the number of keyfile locks taken cannot exceed the number of
helper threads, it makes more sense to use a lock pool of fixed size
keyed by the hash of the origin name, leading to memory savings.
Merge branch 'alessio/keyfile-lock-pool' into 'main'
Alessio Podda [Fri, 27 Feb 2026 12:33:55 +0000 (13:33 +0100)]
Replace lock keyfile hashmap with lock pool
Kasp used a lock per zone origin in order to prevent concurrent access
to keyfiles. This led to substantial memory consumption in the case of
authoritative servers with many small zones, as lots of locks need to be
allocated.
Since the number of keyfile locks taken cannot exceed the number of
helper threads, it makes more sense to use a lock pool of fixed size
keyed by the hash of the origin name, leading to memory savings.
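A fixed-size lock pool keyed by a name hash might look like the sketch below. The pool size, the FNV-1a hash, the ASCII case folding and all identifiers are assumptions made for illustration; the actual implementation may differ in each of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LOCKPOOL_SIZE 16 /* hypothetical fixed pool size */

typedef struct lockpool {
	pthread_mutex_t locks[LOCKPOOL_SIZE];
} lockpool_t;

static void
lockpool_init(lockpool_t *pool) {
	for (size_t i = 0; i < LOCKPOOL_SIZE; i++) {
		int r = pthread_mutex_init(&pool->locks[i], NULL);
		assert(r == 0);
		(void)r;
	}
}

/* FNV-1a over the ASCII-lowercased name, so case variants hash alike. */
static uint32_t
name_hash(const char *name) {
	uint32_t h = 2166136261u;
	const unsigned char *p = (const unsigned char *)name;

	for (; *p != '\0'; p++) {
		unsigned char c = *p;
		if (c >= 'A' && c <= 'Z') {
			c = (unsigned char)(c - 'A' + 'a');
		}
		h ^= c;
		h *= 16777619u;
	}
	return h;
}

/* Map a zone origin to one of the pooled locks. */
static pthread_mutex_t *
lockpool_get(lockpool_t *pool, const char *origin) {
	return &pool->locks[name_hash(origin) % LOCKPOOL_SIZE];
}
```

Two different origins may map to the same lock. That only adds occasional contention and does not affect correctness, while memory use stays constant regardless of the number of zones.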
This commit adds a new CI job to update the BIND9 version in the
isc-projects/bind9-docker project, which will cause the docker images
to be rebuilt for release. This was previously a manual step.
A notification is sent to the relevant Mattermost channel.
fix: usr: Fix setting retire in dns_keymgr_key_init
A wrong-variable bug in `dns_keymgr_key_init()` causes the DNSSEC key inactive
time to never be read. This means the key state is not retracting zone signatures
when it should, delaying the key rollover.
ISC would like to thank Naresh Kandula Parmar (Nottiboy) for reporting this.
Closes #5774
Merge branch '5774-fix-setting-retire' into 'main'
Make the maximum number of processed delegation nameservers configurable
via the new 'max-delegation-servers' option (default: 13), replacing the
hardcoded NS_PROCESSING_LIMIT (20).
The default is reduced to 13 to precisely match the maximum number of
root servers that can fit into a classic 512-byte UDP payload. This
provides a natural, historically sound cap that mitigates resource
exhaustion and amplification attacks from artificially inflated or
misconfigured delegations.
The configuration option is strictly bounded between 1 and 100 to ensure
resolver stability.
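The bounds and the capping behaviour can be sketched as below. The macro and function names are hypothetical, and whether an out-of-range value is rejected at configuration time (as assumed here) is not stated above; only the default of 13 and the range 1 to 100 come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DELEGATION_SERVERS_DEFAULT 13
#define MAX_DELEGATION_SERVERS_MIN     1
#define MAX_DELEGATION_SERVERS_MAX     100

/* Accept only values inside the documented 1..100 range. */
static bool
max_delegation_servers_valid(unsigned int value) {
	return value >= MAX_DELEGATION_SERVERS_MIN &&
	       value <= MAX_DELEGATION_SERVERS_MAX;
}

/* Number of nameservers from a delegation that will be processed. */
static size_t
servers_to_process(size_t ns_count, unsigned int limit) {
	return ns_count < limit ? ns_count : (size_t)limit;
}
```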
Merge branch 'ondrej/make-NS_PROCESSING_LIMIT-configurable' into 'main'