Evan Hunt [Fri, 21 Nov 2025 21:18:25 +0000 (21:18 +0000)]
chg: dev: add dns_message functions to set EDNS options
The new `dns_message_ednsinit()` and `dns_message_ednsaddopt()` functions
allow EDNS options to be added to a message one at a time; it is no
longer necessary to construct a full array of EDNS options and set
them all at once.
This allows us to simplify EDNS option handling code, and in the
future it will allow plugins to add EDNS options to existing
messages.
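The incremental pattern described here works like this: initialize the EDNS state once, then append options one at a time instead of building the full array up front. The sketch below is minimal and hypothetical. The `example_` names, the fixed capacity, and the struct layout are all assumptions. They are not the real `dns_message_ednsinit()`/`dns_message_ednsaddopt()` signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed capacity; the real message code may differ. */
#define EXAMPLE_MAX_EDNSOPTS 8

typedef struct {
	uint16_t code;        /* EDNS option code, e.g. 3 = NSID, 10 = COOKIE */
	uint16_t length;      /* length of value in bytes */
	const uint8_t *value; /* borrowed pointer; caller keeps it alive */
} example_ednsopt_t;

typedef struct {
	bool edns;        /* true once EDNS has been initialized */
	uint16_t udpsize; /* advertised UDP payload size */
	size_t count;     /* number of options added so far */
	example_ednsopt_t opts[EXAMPLE_MAX_EDNSOPTS];
} example_message_t;

/* Initialize the EDNS state; options are added afterwards. */
void
example_ednsinit(example_message_t *msg, uint16_t udpsize) {
	memset(msg, 0, sizeof(*msg));
	msg->edns = true;
	msg->udpsize = udpsize;
}

/* Append one option; fails if EDNS is not initialized or the array is full. */
bool
example_ednsaddopt(example_message_t *msg, uint16_t code,
		   const uint8_t *value, uint16_t length) {
	if (!msg->edns || msg->count == EXAMPLE_MAX_EDNSOPTS) {
		return false;
	}
	msg->opts[msg->count++] = (example_ednsopt_t){
		.code = code, .length = length, .value = value
	};
	return true;
}
```

With this shape, any later caller (for example a plugin) can keep calling `example_ednsaddopt()` on an already-initialized message. It does not need to rebuild the whole option set.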
Merge branch 'each-refactor-message-edns' into 'main'
Evan Hunt [Fri, 21 Nov 2025 07:33:29 +0000 (23:33 -0800)]
remove dns_message_buildopt
now that the EDNS state is stored within dns_message_t, it's no longer
necessary to have a public API call to build an opt rdataset; we can
just have dns_message_setopt() build the opt record internally.
Evan Hunt [Wed, 19 Nov 2025 07:29:12 +0000 (23:29 -0800)]
add dns_message API to add EDNS options
The new dns_message_ednsinit() and dns_message_ednsaddopt() functions
allow EDNS options to be added to a message one at a time; it is no
longer necessary to construct a full array of EDNS options and set
them all at once.
This allows us to simplify EDNS option handling code, and in the
future it will allow plugins to add EDNS options to existing
messages.
Nicki Křížek [Thu, 20 Nov 2025 17:09:58 +0000 (18:09 +0100)]
Only render required zones in config for nsec3 tests
When all zones are configured, regardless of whether the test module
actually uses them, it makes debugging the logs needlessly more
complicated, as there is a bunch of stuff going on that is completely
unrelated to the test.
Define a list of tested zones in each test module and only render the
named.conf with those zones defined.
1. A zone that previously failed to load is now fixed. Make sure the
zone is signed correctly with the right NSEC3 parameters.
2. Test case to ensure the salt is the same after a restart, i.e. no
re-salting takes place. Previously we only tested with salt length
0; this commit adds a test case for salt length 8 as well.
This converts the nsec3 system test cases that run after reconfiguring
the name server.
The extra test for nsec3-change.kasp is updated. It depends on the
zone being updated, and on a reconfig. This test code is moved to
tests_nsec3_reconfig.py.
Furthermore, an additional 'rndc signing -nsec3param' error test
case has been added.
Change the named.conf templating to make use of jinja template
rendering. The ns2 server is trivial. The ns3 server configuration
structure has changed:
The common configuration is moved out of named-fips.conf.
The main named.conf file is in named.conf.j2. It always includes the
common part, named-common.conf.j2, and the FIPS part,
named-fips.conf.j2.
The named-fips.conf.j2 and named-rsasha1.conf.j2 templates are
rendered differently depending on the reconfigured status. Mainly the
dnssec-policy for zones is different after reconfiguration, but there
are some other changes too; for example, some zones change their
inline-signing setting.
Some zones only exist before or after the reconfiguration.
Finally, this is a bit hackish: If RSASHA1 is supported, named.conf
includes "named-rsasha1.conf", otherwise it includes the deliberately
empty "named-rsasha0.conf".
This converts all the nsec3 system test cases prior to reconfiguring the
name server. There are two main classes, one that tests the zone is
correctly signed with NSEC, the other with NSEC3.
Two extra tests for nsec3-dynamic-update-inline.kasp and
nsec3-change.kasp are also rewritten. For the former, we need to
change the 'nsupdate' definition to be able to set the expected RCODE.
The merging of the user options and defaults into the effective configuration broke the mutual inheritance of the `allow-recursion`, `allow-query`, and `allow-query-cache` ACLs, and of the `allow-recursion-on` and `allow-query-cache-on` ACLs. This has been fixed.
Closes #5647
Merge branch '5647-allow-recursion-inheritance' into 'main'
Evan Hunt [Thu, 20 Nov 2025 06:09:53 +0000 (22:09 -0800)]
fix ACL settings when merging views
when merging view objects into the effective configuration, add
allow-query-cache, allow-recursion, allow-query-cache-on and
allow-recursion-on ACLs as needed to reflect the way those
options inherit from each other.
this means the effective configuration is now correct for each
view. ACLs no longer need to be corrected when applying the
configuration, and the actual effective ACL values will be
displayed in "rndc showconf" and "named-checkconf -pe".
Evan Hunt [Thu, 20 Nov 2025 01:52:39 +0000 (17:52 -0800)]
fix allow-recursion/allow-query-cache inheritance
the merging of options and defaults into the effective configuration
broke the mutual inheritance of the allow-recursion, allow-query, and
allow-query-cache ACLs, and of the allow-recursion-on and
allow-query-cache-on ACLs.
this has been corrected by adding a 'cloned' flag to the cfg_obj
structure to indicate whether it was configured explicitly or
cloned from the defaults during parsing. we can then adjust the
ACLs while configuring a view, favoring user-configured values
when they're available over cloned defaults.
currently the adjustments to the ACLs are done in configure_view();
later they'll be moved into the effective configuration and this
special handling can be removed.
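The sketch below models the precedence described above. Values the user configured explicitly win over cloned defaults, and each ACL falls back to the related ones. It is simplified and hypothetical: the `example_` types stand in for the `cloned` flag on `cfg_obj`. The fallback order (own value, then the sibling ACL, then allow-query, each looked up in the view before options) is assumed from the BIND ARM. Other factors such as `recursion no;` are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One ACL after merging; 'cloned' marks a value copied from the defaults. */
typedef struct {
	const char *value; /* e.g. "any", "none"; NULL if absent */
	bool cloned;
} example_acl_t;

typedef struct {
	example_acl_t allow_query;
	example_acl_t allow_recursion;
	example_acl_t allow_query_cache;
} example_acls_t;

/* View value if set by the user, else options value if set by the user.
 * Cloned defaults never win here. */
static const char *
user_value(const example_acl_t *view, const example_acl_t *options) {
	if (view->value != NULL && !view->cloned) {
		return view->value;
	}
	if (options->value != NULL && !options->cloned) {
		return options->value;
	}
	return NULL;
}

/* allow-query-cache: own value, then allow-recursion, then allow-query. */
const char *
effective_query_cache(const example_acls_t *view, const example_acls_t *opts,
		      const char *dflt) {
	const char *v = user_value(&view->allow_query_cache,
				   &opts->allow_query_cache);
	if (v == NULL) {
		v = user_value(&view->allow_recursion, &opts->allow_recursion);
	}
	if (v == NULL) {
		v = user_value(&view->allow_query, &opts->allow_query);
	}
	return v != NULL ? v : dflt;
}

/* allow-recursion: own value, then allow-query-cache, then allow-query. */
const char *
effective_recursion(const example_acls_t *view, const example_acls_t *opts,
		    const char *dflt) {
	const char *v = user_value(&view->allow_recursion,
				   &opts->allow_recursion);
	if (v == NULL) {
		v = user_value(&view->allow_query_cache,
			       &opts->allow_query_cache);
	}
	if (v == NULL) {
		v = user_value(&view->allow_query, &opts->allow_query);
	}
	return v != NULL ? v : dflt;
}
```

The test scenario from the commit below maps directly onto this sketch. allow-recursion is "none" in options and "any" in the view, and allow-query-cache in the view is only a cloned default. The effective allow-query-cache therefore comes out as "any".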
Evan Hunt [Thu, 20 Nov 2025 00:35:31 +0000 (16:35 -0800)]
add a test for allow-recursion/allow-query-cache inheritance
allow-recursion is set to "none" in the options block and to
"any" in the view. allow-query-cache in the view should inherit
the "any", not the "none". (currently this test does not pass.)
Colin Vidal [Thu, 20 Nov 2025 17:52:29 +0000 (18:52 +0100)]
fix: dev: Attach socket before async streamdns_resume_processing
The call to `streamdns_resume_processing` is asynchronous, but the socket
passed as an argument is not attached when scheduling the call.
While there is (so far) no reproducible way to drive the socket reference
count down to 0 before `streamdns_resume_processing` is called, attach
the socket before scheduling the call. This guards against a hypothetical
case where, for some reason, the socket refcount would reach 0 and the
socket would be freed by the time `streamdns_resume_processing` is called.
Closes #5620
Merge branch '5620-attach-socket-streamdns_resume_processing' into 'main'
Colin Vidal [Tue, 18 Nov 2025 09:31:24 +0000 (10:31 +0100)]
attach socket before async streamdns_resume_processing
The call to `streamdns_resume_processing` is asynchronous, but the socket
passed as an argument is not attached when scheduling the call.
While there is (so far) no reproducible way to drive the socket reference
count down to 0 before `streamdns_resume_processing` is called, attach
the socket before scheduling the call. This guards against a hypothetical
case where, for some reason, the socket refcount would reach 0 and the
socket would be freed by the time `streamdns_resume_processing` is called.
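The fix follows the usual "attach before scheduling, detach in the callback" pattern. The scheduled call owns its own reference, so the object stays alive even if every other holder drops theirs first. This is a minimal sketch: the one-slot "event loop" and all names are hypothetical stand-ins. The real socket code uses atomic reference counts and libuv callbacks.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
	int references; /* plain int for brevity; real code uses atomics */
	int resumed;
} example_sock_t;

static void
sock_attach(example_sock_t *sock, example_sock_t **targetp) {
	assert(sock->references > 0);
	sock->references++;
	*targetp = sock;
}

static void
sock_detach(example_sock_t **sockp) {
	example_sock_t *sock = *sockp;
	*sockp = NULL;
	assert(sock->references > 0);
	if (--sock->references == 0) {
		free(sock);
	}
}

/* Hypothetical one-slot queue standing in for an asynchronous call. */
static void (*pending_cb)(void *) = NULL;
static void *pending_arg = NULL;

static void
schedule(void (*cb)(void *), void *arg) {
	pending_cb = cb;
	pending_arg = arg;
}

static void
run_pending(void) {
	if (pending_cb != NULL) {
		void (*cb)(void *) = pending_cb;
		pending_cb = NULL;
		cb(pending_arg);
	}
}

/* The callback owns the reference taken at scheduling time and drops it. */
static void
resume_processing_cb(void *arg) {
	example_sock_t *sock = arg;
	sock->resumed++;
	sock_detach(&sock);
}

void
resume_processing_async(example_sock_t *sock) {
	example_sock_t *tmp = NULL;
	sock_attach(sock, &tmp); /* keeps sock alive until the callback runs */
	schedule(resume_processing_cb, tmp);
}
```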
Ondřej Surý [Thu, 20 Nov 2025 12:32:42 +0000 (13:32 +0100)]
chg: usr: Reduce the number of outgoing queries
Reduces the number of outgoing queries when resolving the nameservers
for delegation points. This helps the DNS resolver with cold cache
resolve client queries with complex delegation chains and redirections.
Merge branch 'ondrej/fctx_getaddresses' into 'main'
Ondřej Surý [Wed, 19 Nov 2025 07:57:09 +0000 (08:57 +0100)]
Refactor fctx_getaddresses() into a couple of smaller functions
The fctx_getaddresses() function was lengthy and a little bit confusing,
with goto statements. Split the single function into smaller parts:
one for forwarders, one for nameservers and one for alternates.
Ondřej Surý [Thu, 23 Oct 2025 11:11:45 +0000 (13:11 +0200)]
Reduce the number of outgoing queries
The dns_resolver mode of operation is to resolve all the domains as it
iterates the DNS tree, to fill up the cache as quickly as possible.
This commit reduces the number of outgoing queries by limiting the
number of remote fetches started via dns_adb_createfind() to resolve
nameserver addresses, with a smaller limit at each level of recursion
below the delegation point (3, 2, 1, 0), where 0 means a fetch is only
created on demand if we don't have any addresses yet.
Mark Andrews [Thu, 20 Nov 2025 08:46:10 +0000 (19:46 +1100)]
fix: usr: AMTRELAY type 0 presentation format handling was wrong
RFC 8777 specifies a placeholder value of "." for the gateway field when the
gateway type is 0 (no gateway). This was not being checked for nor emitted
when displaying the record. This has been corrected.
Instances of this record will need the placeholder period added to them when upgrading.
Closes #5639
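RFC 8777 gives the AMTRELAY presentation format as "precedence D-bit type relay". For relay type 0 the relay field is the placeholder ".". The formatter below is a standalone sketch of that rule. It is not BIND's rdata code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format AMTRELAY RDATA as "precedence D-bit type relay". */
int
format_amtrelay(char *buf, size_t size, unsigned int precedence,
		bool discovery, unsigned int type, const char *relay) {
	if (type == 0) {
		relay = "."; /* placeholder: type 0 has no relay */
	} else if (relay == NULL) {
		return -1; /* types 1-3 carry an IPv4, IPv6 or name relay */
	}
	int n = snprintf(buf, size, "%u %d %u %s", precedence,
			 discovery ? 1 : 0, type, relay);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* When parsing type 0, accept only the "." placeholder. */
bool
amtrelay_type0_relay_ok(const char *field) {
	return field != NULL && strcmp(field, ".") == 0;
}
```

Existing type 0 records written without the trailing "." would be rejected by a parser following this rule. That is why the note above says the placeholder must be added when upgrading.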
Merge branch '5639-fix-atmrelay-type-0-support' into 'main'
Colin Vidal [Tue, 18 Nov 2025 11:16:39 +0000 (12:16 +0100)]
chg: dev: Remove exclusive mode when scheduling zone load
Remove exclusive mode when scheduling the zone load, as it is no longer necessary;
data that can be read or written by multiple threads are locked or atomic.
The logic that detects when zone DB loading has completed has been
refactored to account for the fact that zone databases may finish loading
before the function scheduling the loads returns.
Merge branch 'colin/remove-exclusive-zone-load' into 'main'
Colin Vidal [Mon, 10 Nov 2025 14:14:44 +0000 (15:14 +0100)]
refactor detection of zone DB load completion
Because the asynchronous loading logic expected all jobs to be scheduled
before any of them ran (as scheduling used to happen in exclusive
mode), and because the jobs are scheduled on various threads, there were
random situations where load_zones() would return after the scheduled
zone DB loads had already run. In such cases, the zl->refs reference
counter in view_loaded() wouldn't go down to 0, and the remaining work
to do once all zones were loaded was never run. In particular,
server->reload_status stayed in the NAMED_RELOAD_PENDING state.
This problem is fixed by handling zoneload_t as a reference-counted
object, shared between load_zones() and each instance of scheduled zone
DB loading. Its destructor runs what view_loaded() used to do when
zl->refs went to 0. This ensures the post-loading routine is called
once the last load is done.
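The fix rests on one property: completion work runs in the destructor of a shared reference-counted object. It therefore runs exactly once, by whichever holder drops the last reference, whether that is the scheduler or a load job. The sketch below uses C11 atomics, and every name in it is a hypothetical stand-in for zoneload_t and view_loaded().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
	atomic_uint refs;
	void (*all_loaded)(void *arg); /* run once, by the last holder */
	void *arg;
} example_zoneload_t;

example_zoneload_t *
zoneload_new(void (*all_loaded)(void *), void *arg) {
	example_zoneload_t *zl = malloc(sizeof(*zl));
	if (zl == NULL) {
		return NULL;
	}
	atomic_init(&zl->refs, 1); /* the scheduler's own reference */
	zl->all_loaded = all_loaded;
	zl->arg = arg;
	return zl;
}

/* Taken once per scheduled load. */
void
zoneload_attach(example_zoneload_t *zl) {
	atomic_fetch_add(&zl->refs, 1);
}

/* Dropping the last reference runs the post-load work and frees zl. */
void
zoneload_detach(example_zoneload_t *zl) {
	unsigned int prev = atomic_fetch_sub(&zl->refs, 1);
	assert(prev > 0);
	if (prev == 1) {
		zl->all_loaded(zl->arg);
		free(zl);
	}
}

/* Example completion callback: counts how many times it ran. */
static void
count_done(void *arg) {
	int *counter = arg;
	(*counter)++;
}
```

Loads that finish before scheduling ends no longer matter. Until the scheduler drops its own reference, the count cannot reach zero.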
Colin Vidal [Mon, 10 Nov 2025 11:07:18 +0000 (12:07 +0100)]
harden configloading system test
The configloading system test script issues multiple `rndc
{reconfig,reload}` commands without ensuring that the server has left
exclusive mode, which would normally raise an RNDC error because the
server is already reloading. This used to work because the request was
queued while the server was in exclusive mode, and was processed
after the server `reload_status` was reset to `NAMED_RELOAD_DONE`.
Because exclusive mode is no longer re-entered by `load_zones()` after
`apply_configuration()`, the scheduling of pending tasks has changed,
and the RNDC command sent by the test is regularly processed before
`NAMED_RELOAD_DONE` is set. This is the same kind of issue the views
system test had, solved by commit
`4b2dcb3128fbd5af4609a5a73aeeee1f93bde237`.
Fix the problem by waiting for a log line marking the end of
the reloading phase.
Colin Vidal [Mon, 10 Nov 2025 12:11:05 +0000 (13:11 +0100)]
set `reload_status` to fail before logging it
The `reload_status` was set to `NAMED_RELOAD_FAILED` after the log line
about this change was printed. Update `reload_status` first, to avoid the
(unlikely) case where a test waiting for this log line sends an RNDC
reload request that `named` processes before the status is updated.
Colin Vidal [Mon, 10 Nov 2025 08:49:06 +0000 (09:49 +0100)]
remove exclusive mode when scheduling zone load
Remove the exclusive mode when scheduling the zone load right after
(re)loading `named` configuration, as there is no reason anymore to
schedule zone loading while the exclusive lock is held. Data which can
be read or written by multiple threads are locked or atomic.
Colin Vidal [Tue, 18 Nov 2025 10:04:49 +0000 (11:04 +0100)]
chg: usr: Enforce bounds of prefetch configuration option
The prefetch configuration option now enforces boundaries. The configuration (including when using `named-checkconf`) now fails if the trigger (first value) is above 10, or if the eligibility (second optional value) isn't at least six seconds greater than the trigger value.
Merge branch 'colin/prefetch-enforcebounds' into 'main'
Colin Vidal [Mon, 17 Nov 2025 11:33:48 +0000 (12:33 +0100)]
enforces bounds of prefetch statement
The prefetch statement now enforces its bounds. The configuration
(including `named-checkconf`) now fails if the trigger (first value) is
above 10, or if the eligibility (second optional value) isn't at least
six seconds more than the trigger value.
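The two bounds amount to a simple predicate over the statement's values. The sketch below assumes a validator that receives the parsed trigger and the optional eligibility. The function name and parameters are hypothetical, not BIND's configuration-checking code.

```c
#include <assert.h>
#include <stdbool.h>

/* Validate "prefetch <trigger> [<eligible>];" against the stated bounds. */
bool
prefetch_config_ok(unsigned int trigger, bool has_eligible,
		   unsigned int eligible) {
	if (trigger > 10) {
		return false; /* trigger must not be above 10 */
	}
	if (has_eligible && eligible < trigger + 6) {
		return false; /* eligibility must be >= trigger + 6 seconds */
	}
	return true;
}
```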
Colin Vidal [Tue, 18 Nov 2025 09:08:57 +0000 (10:08 +0100)]
chg: usr: Enforce that catalog-zone cannot be used in non-IN views
Catalog zones can't be used in a view whose class is not IN.
This is now enforced: the server won't load (instead of loading
without the catalog zone) if such a configuration is detected. This
configuration error is now also caught by `named-checkconf`.
Merge branch 'colin/catz-enforce-non-in' into 'main'
Colin Vidal [Mon, 17 Nov 2025 16:00:27 +0000 (17:00 +0100)]
enforce that catalog-zone can't be used in non-IN views
Catalog zones can't be used in views whose class is not IN.
This is now enforced: the server won't load (instead of loading
without the catalog zone). This configuration error is now also caught
by `named-checkconf`.
Colin Vidal [Mon, 17 Nov 2025 14:23:58 +0000 (15:23 +0100)]
remove need_hints parameter from configure_view
The `need_hints` parameter of `configure_view()` is removed, as this
function was always called with the value `true`.
`need_hints` wasn't even used in the function. The only thing it was
actually used for was to emit a warning, which can be done simply in an
`else` branch.
Moreover, in the case of catalog zones and response-policy, this fixes a
possible bug affecting root zones, which wouldn't be reverted to their
previous version if the view failed to load (during a server
reconfiguration).
Colin Vidal [Tue, 18 Nov 2025 08:16:57 +0000 (09:16 +0100)]
chg: dev: No effective config as text if allow-new-zones is yes
Do not save the text version of the effective configuration when
`allow-new-zones` is enabled, as in that case the object tree can
be printed on demand, reducing unnecessary memory consumption.
Merge branch 'colin/no-effective-config-as-text-allownewzones' into 'main'
Colin Vidal [Mon, 17 Nov 2025 10:06:34 +0000 (11:06 +0100)]
no effective config as text if allow-new-zones is yes
Do not save the textual version of the effective configuration when
`allow-new-zones` is enabled, as it can be printed on demand. This
reduces the memory footprint by ~70MB on huge configurations
(1M zones).
Colin Vidal [Thu, 13 Nov 2025 14:33:02 +0000 (15:33 +0100)]
fix: dev: Remove holes in `dns_zoneflg_t` enum
The `dns_zoneflg_t` enum defines multiple possible flags for a zone, but
contained numerous holes (likely from flags removed in the past). This
removes the holes, and uses bit-shift and decimal notation to make holes
easier to spot.
Merge branch 'colin/remove-zoneflag-holes' into 'main'
Colin Vidal [Fri, 31 Oct 2025 09:32:53 +0000 (10:32 +0100)]
remove holes in `dns_zoneflg_t` enum
The `dns_zoneflg_t` enum defines multiple possible flags for a zone, but
contained numerous holes (likely from flags removed in the past). This
removes the holes, and uses bit-shift and decimal notation to make holes
easier to spot.
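Bit-shift notation makes a hole visible at a glance: the shift counts should run 0, 1, 2, ... with no number skipped. The flag names below are hypothetical and only illustrate the notation. They do not reproduce the actual `dns_zoneflg_t` members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One flag per bit; a skipped shift count would be an obvious hole. */
typedef enum {
	EXAMPLE_ZONEFLG_REFRESH = 1U << 0,
	EXAMPLE_ZONEFLG_NEEDDUMP = 1U << 1,
	EXAMPLE_ZONEFLG_LOADED = 1U << 2,
	EXAMPLE_ZONEFLG_EXITING = 1U << 3,
	EXAMPLE_ZONEFLG_EXPIRED = 1U << 4,
} example_zoneflg_t;

static inline void
zoneflg_set(uint32_t *flags, example_zoneflg_t f) {
	*flags |= (uint32_t)f;
}

static inline bool
zoneflg_test(uint32_t flags, example_zoneflg_t f) {
	return (flags & (uint32_t)f) != 0;
}
```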
Colin Vidal [Wed, 12 Nov 2025 10:40:33 +0000 (11:40 +0100)]
fix: dev: Save configuration as text
A `cfg_obj_t` object tree structure takes up considerably more space than the equivalent canonical text. If `allow-new-zones` is disabled and catalog zones are not in use, then we don't need the object tree. By storing the configuration in text format, we can use less memory, and `rndc showconf` and `rndc showzone` still work.
Evan Hunt [Wed, 12 Nov 2025 02:50:23 +0000 (18:50 -0800)]
save effective configuration as text
the effective configuration tree is now detached if allow-new-zones
or catalog-zones aren't enabled in any views. this reduces memory
consumption while still allowing "rndc showconf -effective" to work.
Evan Hunt [Tue, 11 Nov 2025 23:46:23 +0000 (15:46 -0800)]
save zone configuration as text
as previously mentioned in commit c65b2868ab, a cfg_obj_t
configuration tree structure takes up considerably more space than
the canonical text. since the zone configuration saved in the zone
object using dns_zone_setcfg() is only currently used for "rndc
showzone", it can be saved as text more efficiently than as an
object tree. (and, if a tree were needed, the text could be
re-parsed quickly; zone configuration text is generally small.)
Colin Vidal [Wed, 15 Oct 2025 13:35:59 +0000 (15:35 +0200)]
check-cocci fails if WARNING is found on stderr
As the implicit cast check prints "WARNING: ..." on stderr, add a pattern
to make sure that check-cocci fails if such a warning is found on
stderr. This is generic (unlike the existing, specific "parse error"
check), so it should also catch future Coccinelle spatch warnings.
Colin Vidal [Tue, 14 Oct 2025 11:31:44 +0000 (13:31 +0200)]
mdig: fix implicit bool to int cast
`display_rrcomments` is a tri-state (-1, 0, 1) which is (in some
cases) initialized from `state`, a boolean, through an implicit cast.
This was spotted by Coccinelle. Remove the implicit cast by explicitly
assigning 0 or 1 to `display_rrcomments` based on the `state` value.
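The bool-to-int conversion is well defined in C either way. Writing the assignment explicitly just makes the tri-state intent visible and silences the Coccinelle check. In this minimal sketch, -1 is assumed to mean "not set by the user", and `set_rrcomments()` is a hypothetical helper, not mdig's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state: -1 = not set (assumed meaning), 0 = off, 1 = on. */
static int display_rrcomments = -1;

/* Explicit 0/1 instead of the implicit 'display_rrcomments = state;'. */
void
set_rrcomments(bool state) {
	display_rrcomments = state ? 1 : 0;
}
```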
Nicki Křížek [Mon, 10 Nov 2025 15:21:52 +0000 (16:21 +0100)]
new: test: Add isctest.check.ede() helper for pytest
Add a utility function to check for EDE codes present in the DNS
message. The primary benefit of this helper function is that it
handles the compatibility issues with different dnspython versions
and the actual test code doesn't have to deal with that any more.
Merge branch 'nicki/isctest-check-ede-helper' into 'main'
Nicki Křížek [Thu, 30 Oct 2025 17:12:25 +0000 (18:12 +0100)]
Use new EDE helper in existing system tests
Previously, hasattr("extended_errors") was used to detect a minimum
required dnspython version, so that the EDE check was only performed if
a new-enough dnspython was present. This is now abstracted into
isctest.check.ede().
In order to support dnspython<2.2.0, use isctest.compat.EDECode rather
than using dns.edns.EDECode directly.
Nicki Křížek [Thu, 30 Oct 2025 17:08:01 +0000 (18:08 +0100)]
Add isctest.check.ede() helper for pytest
Add a utility function to check for EDE options present in the DNS
message. The primary benefit of this helper function is that it
handles the compatibility issues with different dnspython versions
and the actual test code doesn't have to deal with that any more.
Rather than using the convenience .extended_errors() method
introduced in dnspython 2.7.0, iterate over the options and find
EDEOption types, which is supported from 2.2.0 onwards.
To work around the issue of using dns.edns.EDECode to specify EDE codes
in our tests, create an isctest.compat.EDECode wrapper. This can be used
even with dnspython versions prior to 2.2.0, where it simply results in a
no-op, since EDE isn't supported in older dnspython anyway.
Colin Vidal [Fri, 7 Nov 2025 14:46:15 +0000 (15:46 +0100)]
fix: test: Rewrite views/addzone in loop system test
A part of the `views` system test attempts to add multiple zones in a
loop, reconfiguring the server after each zone is added.
However, the test didn't take into account the fact that the server
might take longer to reload than the script takes to move to the next
iteration, and in some cases the test requested another reload while
the server was still reloading.
Since `b49f83a3`, `named` explicitly fails to reload when a load/reload
is pending, which is (unless proven otherwise) the reason the test
was now failing randomly.
That part of the test now waits for the server log message saying
the server has added the new zone and is running. Also, that part of the
test has been rewritten in Python.
Closes #5617
Merge branch '5617-rewrite-reload-view-test' into 'main'
Colin Vidal [Fri, 7 Nov 2025 09:45:09 +0000 (10:45 +0100)]
rewrite views/addzone in loop system test
A part of the `views` system test attempts to add multiple zones in a
loop, reconfiguring the server after each zone is added.
However, the test didn't take into account the fact that the server
might take longer to reload than the script takes to move to the next
iteration, and in some cases the test requested another reload while
the server was still reloading.
Since `b49f83a3`, `named` explicitly fails to reload when a load/reload
is pending, which is (unless proven otherwise) the reason the test
was now failing randomly.
That part of the test now waits for the server log message saying
the server has added the new zone and is running. Also, that part of the
test has been rewritten in Python.
Colin Vidal [Thu, 6 Nov 2025 15:13:29 +0000 (16:13 +0100)]
fix: test: Harden EDE 24 system tests
Harden the `ede24` system test to avoid random failures, likely caused by timing issues. Also remove expiration-related dead code (which should have been removed in the original ede24 changes), and print the query ID, as this should help debug further flaky system test issues (in particular this one, if the changes made here are not enough).
Colin Vidal [Thu, 6 Nov 2025 13:35:33 +0000 (14:35 +0100)]
split ede24 system test into separate modules
Because the ede24 system tests require stopping/restarting the server,
there is always the risk that a test ends (with a failure) with the
server in a wrong and unpredictable state. This would make the other
tests fail in strange ways as well.
To avoid this problem, split the test into separate modules, so that if
one module fails, the other modules are not impacted, as they use
separate server instances.
Colin Vidal [Wed, 5 Nov 2025 14:08:51 +0000 (15:08 +0100)]
harden ede24 system test
There was a random failure of the ede24 system test. While this is still
a bit speculative, the likely reasons were:
- in the case of `test_ede24_noloaded`, the test might query ns2 too
early (before the zone was actually transferred to the secondary server).
- still in the case of `test_ede24_noloaded`, even after waiting for the
transfer-succeeded log, if the CI machine is slow, the zone could
expire before the query checking the secondary zone is processed,
because the expiration time of the zone was very short (1s). Raising
this expiration time to 3 seconds should be enough (without making the
test take much longer when waiting for the zone to expire).
- in the case of `test_ede24_expired`, the zone expired flag is flipped
and the log message is printed immediately after. However, because the
flag is set using a relaxed atomic operation, another thread may process
the query and still read the previous (non-expired) value of the flag.
To work around this, the test now also waits for another log line
written after the zone expires (stop timers) on the next UV tick.
There is code duplication between `keyfetch` and `nsfetch`; refactor so that common code paths can be shared while still differentiating between them. This is in preparation for supporting generalized DNS notifications, which will require fetching DSYNC records.
Merge branch 'matthijs-refactor-zone-fetch' into 'main'
Scheduling and rescheduling a zonefetch are also similar. Refactor them
into zonefetch functions. This also increments and decrements the zone's
internal reference counter in the same module, which may be less
confusing when reading the code.
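One plausible shape for this refactor is a single fetch object with a type tag. Common code handles scheduling and reference counting in one place, and branches on the tag only where behaviour differs. A future DSYNC fetch would then be one more tag value. Everything below is a hypothetical sketch, not the zone.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { ZONEFETCH_KEY, ZONEFETCH_NS } zonefetch_type_t;

typedef struct {
	unsigned int irefs; /* the zone's internal reference counter */
} example_zone_t;

typedef struct {
	zonefetch_type_t type;
	example_zone_t *zone;
	bool scheduled;
} zonefetch_t;

/* Shared scheduling path: the internal reference is taken here... */
void
zonefetch_schedule(zonefetch_t *fetch, example_zone_t *zone,
		   zonefetch_type_t type) {
	zone->irefs++;
	fetch->zone = zone;
	fetch->type = type;
	fetch->scheduled = true;
}

/* ...and released here, in the same module. */
void
zonefetch_done(zonefetch_t *fetch) {
	assert(fetch->scheduled && fetch->zone->irefs > 0);
	fetch->zone->irefs--;
	fetch->scheduled = false;
}

/* Common code paths branch on the tag only where behaviour differs. */
const char *
zonefetch_name(const zonefetch_t *fetch) {
	switch (fetch->type) {
	case ZONEFETCH_KEY:
		return "keyfetch";
	case ZONEFETCH_NS:
		return "nsfetch";
	}
	return "unknown";
}
```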
Ondřej Surý [Wed, 5 Nov 2025 11:39:43 +0000 (12:39 +0100)]
Fix parser test (missing string termination)
The parser test could crash because the `dumpb2` buffer was not
explicitly NUL-terminated after the configuration tree was dumped into
it; `cfg_printx` does not do this by default.
Fix the test by comparing only the bytes written, using strncmp.
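Calling strcmp() on a buffer that was never NUL-terminated can read past the bytes actually written. Bounding the comparison by the number of bytes used avoids that. The helper below is hypothetical. The extra length check, which stops a mere prefix from comparing equal, is an addition for illustration and is not described in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compare 'used' bytes of an unterminated buffer with a C string. */
bool
buffer_equals(const char *buf, size_t used, const char *expected) {
	return strlen(expected) == used && strncmp(buf, expected, used) == 0;
}
```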
Ondřej Surý [Tue, 4 Nov 2025 19:30:08 +0000 (20:30 +0100)]
fix: usr: Skip unsupported algorithms when looking for signing key
A mix of supported and unsupported DNSSEC algorithms in the same zone could have caused validation failures. DNSSEC keys with unsupported algorithms are now ignored when looking for the signing key.
Closes #5622
Merge branch '5622-dont-fail-on-unsupported-algorithms' into 'main'
Ondřej Surý [Tue, 4 Nov 2025 01:09:38 +0000 (02:09 +0100)]
Skip unsupported algorithms when looking for signing key
When looking for a signing key in select_signing_key(), a result code
indicating an unsupported algorithm would abort the search. Instead, skip
such keys and continue searching for the right key.
Co-Authored-By: Aram Sargsyan <aram@isc.org>
Co-Authored-By: Petr Menšík <pemensik@redhat.com>
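The behaviour change is that an "unsupported algorithm" result now means "skip this key" rather than "abort the search". The loop below sketches that. The result codes, the key struct, and `select_key()` are hypothetical stand-ins, not the real select_signing_key() or isc_result_t values.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { R_SUCCESS, R_NOTFOUND, R_NOTIMPLEMENTED } result_t;

typedef struct {
	unsigned int alg;
	unsigned int keyid;
	int supported; /* nonzero if this build supports alg */
} example_key_t;

static result_t
check_key(const example_key_t *key, unsigned int alg, unsigned int keyid) {
	if (!key->supported) {
		return R_NOTIMPLEMENTED;
	}
	if (key->alg != alg || key->keyid != keyid) {
		return R_NOTFOUND;
	}
	return R_SUCCESS;
}

/* Unsupported keys are skipped instead of ending the search early. */
result_t
select_key(const example_key_t *keys, size_t n, unsigned int alg,
	   unsigned int keyid, size_t *indexp) {
	for (size_t i = 0; i < n; i++) {
		if (check_key(&keys[i], alg, keyid) == R_SUCCESS) {
			*indexp = i;
			return R_SUCCESS;
		}
		/* R_NOTIMPLEMENTED or R_NOTFOUND: try the next key */
	}
	return R_NOTFOUND;
}
```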
Ondřej Surý [Tue, 4 Nov 2025 18:53:09 +0000 (19:53 +0100)]
fix: dev: Only unlink from SIEVE LRU if it is still linked
Under overmem conditions, the header could get unlinked from the
SIEVE LRU via a different path. This could lead to a double unlink,
which causes an assertion failure. Add a guard to ISC_SIEVE_UNLINK() so
that it only unlinks headers that are still linked.
Closes #5606
Merge branch '5606-fix-assertion-failure-in-overmem-cleaning' into 'main'
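The guard amounts to checking that a node is still linked before unlinking it, so a second removal path becomes a no-op instead of an assertion failure. The sketch below uses a minimal intrusive list with an explicit `linked` flag. It is a hypothetical stand-in, not the ISC_SIEVE_UNLINK() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
	struct node *prev, *next;
	bool linked;
	int value;
} node_t;

typedef struct {
	node_t *head, *tail;
} list_t;

void
list_append(list_t *list, node_t *node) {
	assert(!node->linked);
	node->prev = list->tail;
	node->next = NULL;
	if (list->tail != NULL) {
		list->tail->next = node;
	} else {
		list->head = node;
	}
	list->tail = node;
	node->linked = true;
}

/* Unguarded unlink: unlinking twice is a bug, so it asserts. */
void
list_unlink(list_t *list, node_t *node) {
	assert(node->linked);
	if (node->prev != NULL) {
		node->prev->next = node->next;
	} else {
		list->head = node->next;
	}
	if (node->next != NULL) {
		node->next->prev = node->prev;
	} else {
		list->tail = node->prev;
	}
	node->prev = node->next = NULL;
	node->linked = false;
}

/* Guarded unlink: a no-op if another path already removed the node. */
void
list_unlink_if_linked(list_t *list, node_t *node) {
	if (node->linked) {
		list_unlink(list, node);
	}
}
```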