Nicki Křížek [Mon, 5 Jan 2026 13:45:06 +0000 (14:45 +0100)]
[CVE-2025-8677] sec: test: Test that DNSSEC validation is aborted on malformed DNSKEY
Create a signed zone file that contains malformed ZSKs with colliding
key tags. The ZSKs don't represent valid ECDSA keys and will cause a
crypto failure when attempting to use them. Sign the zone with KSK, with
the exception of one record which is "signed" with the invalid ZSKs.
Check that the resolver aborts the DNSSEC verification after
encountering the first crypto failure, indicating malformed DNSKEY.
Closes #5343
Merge branch '5343-count-invalid-keys-into-validation-fails-test' into 'main'
Nicki Křížek [Mon, 13 Oct 2025 16:35:33 +0000 (18:35 +0200)]
Test zone with truncated revoked DNSKEY
Ensure that named can handle a situation where the zone is signed with a
truncated, self-signed revoked DNSKEY. The signatures are inevitably
bogus and a SERVFAIL is expected. However, prior to the CVE-2025-8677 fix,
this could trigger an assertion failure.
Test that DNSSEC validation is aborted on malformed DNSKEY
Create a signed zone file that contains malformed ZSKs with colliding
key tags. The ZSKs don't represent valid ECDSA keys and will cause a
crypto failure when attempting to use them. Sign the zone with KSK, with
the exception of one record which is "signed" with the invalid ZSKs.
Check that the resolver aborts the DNSSEC verification after
encountering the first crypto failure, indicating malformed DNSKEY.
Štěpán Balážik [Mon, 5 Jan 2026 13:03:11 +0000 (13:03 +0000)]
fix: test: Set default_aa on AsyncDnsServer to False by default
In !11179 I mistakenly set the default for `default_aa` for
`AsyncDnsServer()` to `True` and then explicitly set it to True in
cases where all the `ResponseHandlers` said
`yield DnsResponseSend(..., authoritative=True)` as if the default was
`False`.
Also the rest of `AsyncDnsServer` code (namely `_prepare_responses`)
reads like `default_aa` is `False` by default.
This accidentally changed the behavior of servers which don't set the
`default_aa` and where AA is not set from the zone data
(e.g. `dispatch/ans3`).
Merge branch 'stepan/set-asyncdnsserver-dafault-aa-to-false-by-default' into 'main'
Štěpán Balážik [Fri, 2 Jan 2026 18:05:33 +0000 (19:05 +0100)]
Set default_aa on AsyncDnsServer to False by default
In 6e684d44 I mistakenly set the default for `default_aa` for
`AsyncDnsServer()` to `True` and then explicitly set it to True in
cases where all the `ResponseHandlers` said
`yield DnsResponseSend(..., authoritative=True)` as if the default was
`False`.
Also the rest of `AsyncDnsServer` code (namely `_prepare_responses`)
reads like `default_aa` is `False` by default.
This accidentally changed the behavior of servers which don't set the
`default_aa` and where AA is not set from the zone data
(e.g. `dispatch/ans3`).
Ondřej Surý [Sun, 4 Jan 2026 20:46:05 +0000 (21:46 +0100)]
fix: nil: Fix building on uclibc
While building on uclibc this error is thrown:
In file included from ./include/dns/log.h:20,
from callbacks.c:19:
../../lib/isc/include/isc/log.h:141:9: error: unknown type name ‘off_t’
141 | off_t maximum_size;
| ^~~~~
This is due to a missing include of unistd.h, so let's add it at the top of
isc/log.h
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Merge branch 'fix/uclibc-off_t-main' into 'main'
Giulio Benetti [Sat, 3 Jan 2026 21:59:39 +0000 (22:59 +0100)]
Fix building on uclibc
While building on uclibc this error is thrown:
In file included from ./include/dns/log.h:20,
from callbacks.c:19:
../../lib/isc/include/isc/log.h:141:9: error: unknown type name ‘off_t’
141 | off_t maximum_size;
| ^~~~~
This is due to a missing include of unistd.h, so let's add it at the top of
isc/log.h
Matthijs Mekking [Wed, 31 Dec 2025 10:40:42 +0000 (11:40 +0100)]
Wait for "sending notifies" for step3.zsk-prepub
Commit c17ac426082b2eca802dd1b2e1bb9b4b4b291199 changed some tests to
wait for "zone_needdump" messages instead of "sending notifies", because
notifies are rate limited and "zone_needdump" happens on every change.
However, inspecting the logs, the "zone_needdump" changes happen more
than once (likely because the re-signing is done in batches):
received control channel command 'sign step3.zsk-prepub.manual'
zone_journal: zone step3.zsk-prepub.manual/IN (signed): enter
zone_needdump: zone step3.zsk-prepub.manual/IN (signed): enter
zone_journal: zone step3.zsk-prepub.manual/IN (signed): enter
zone_needdump: zone step3.zsk-prepub.manual/IN (signed): enter
zone_journal: zone step3.zsk-prepub.manual/IN (signed): enter
zone_needdump: zone step3.zsk-prepub.manual/IN (signed): enter
zone step3.zsk-prepub.manual/IN (signed): sending notifies
This means we are running the rollover step checks too fast in some
test runs.
Revert the wait for log change for the rollover-zsk-prepub test.
Matthijs Mekking [Tue, 16 Dec 2025 16:31:24 +0000 (17:31 +0100)]
Change zone set/get options related to notify
Add a type to all dns_zone_(get|set) functions that apply to sending
notifies, so the options can be set and retrieved separately per type.
This affects dns_zone_setnotifydefer, dns_zone_getnotifydefer,
dns_zone_setnotifydelay, dns_zone_getnotifydelay,
dns_zone_setnotifysrc4, and dns_zone_setnotifysrc6.
The functions dns_zone_getnotifysrc4 and dns_zone_getnotifysrc6 are
unused and can be removed.
Petr Špaček [Mon, 28 Jul 2025 09:33:14 +0000 (11:33 +0200)]
Test that spoofed DNAME is not accepted via spoofable transport
A single spoofed DNAME answer can impact many names, and because of the
nature of DNAME, the attacker can use randomized query names to get
an unlimited number of tries to spoof the answer. To limit impact, we
should not be accepting DNAME over insecure transport, like UDP without
cookies etc.
In short, the attacker tries to spoof at least one answer that has the
following form:
opcode QUERY
rcode NOERROR
flags QR AA
;QUESTION
trigger$RANDOM.test. IN A
;ANSWER
trigger$RANDOM.test. 3600 IN CNAME trigger$RANDOM.attacker.net.
test. 3600 IN DNAME attacker.net.
;AUTHORITY
;ADDITIONAL
Petr Špaček [Wed, 23 Jul 2025 18:26:43 +0000 (20:26 +0200)]
Test that fake child delegation cannot overwrite parent's glue RR
In short, the attacker tries to spoof at least one answer that has the
following form:
rcode NOERROR
flags QR
;QUESTION
trigger$RANDOM.victim. IN TXT
;ANSWER
;AUTHORITY
trigger$RANDOM.victim. 3600 IN NS ns.victim.
;ADDITIONAL
ns.victim. 3600 IN A 10.53.0.3
This attack was originally reported as "test case 2".
Petr Špaček [Wed, 23 Jul 2025 15:25:18 +0000 (17:25 +0200)]
Test that unsolicited NS in positive answer cannot overwrite current NS
Before the fixes for CVE-2025-40778, an unsolicited in-bailiwick NS
record was accepted from a (spoofed) answer, enabling a single spoofed A
query/response to redirect traffic for a whole delegation.
In short, the attacker tries to spoof at least one answer that has the
following form:
rcode NOERROR
flags QR AA
;QUESTION
trigger$RANDOM.victim. IN TXT
;ANSWER
trigger$RANDOM.victim. 3600 IN TXT "spoofed answer with extra NS"
;AUTHORITY
victim. 3600 IN NS ns.attacker.
;ADDITIONAL
This attack was originally reported as "test case 1".
Petr Špaček [Fri, 11 Jul 2025 16:37:57 +0000 (18:37 +0200)]
Test that positive answer cannot overwrite sibling NS RRs
Before the fixes for CVE-2025-40778, a positive answer was allowed to
overwrite sibling NS RRs. The answer had to be a positive AA=1 answer
with a fake NS along with it. This combination of conditions avoided
the code path with "unrelated <RRTYPE>" detection logic.
If it were some other answer, named from the main branch would detect
the attempt and log:
DNS format error from 10.53.0.1#16386 resolving trigger/A for <unknown>: unrelated NS victim in trigger authority section
In short, the attacker tries to spoof at least one answer that has the
following form:
opcode QUERY
rcode NOERROR
flags QR AA
;QUESTION
trigger$RANDOM. IN A
;ANSWER
trigger$RANDOM. 3600 IN A 10.53.0.3
;AUTHORITY
victim. 3600 IN NS ns.attacker.
;ADDITIONAL
ns.attacker. 3600 IN A 10.53.0.3
This attack was originally reported as "test case 1c".
Michał Kępień [Mon, 22 Dec 2025 10:58:39 +0000 (11:58 +0100)]
Add a reusable, bare-bones AsyncDnsServer
Add bin/tests/system/ans.py, a bare-bones DNS server that can be used in
system tests instead of full-blown named instances when a server is only
required to return zone-based data. Where applicable, this reduces load
on the test host and the amount of generated logs.
Mark Andrews [Mon, 22 Dec 2025 02:31:09 +0000 (13:31 +1100)]
Tidy up (fixed)names in dsyncfetch_start
Use a static dns_name_t for the "_dsync" label. Remove some
unnecessary dns_fixedname_t variables. Remove unnecessary dsyncname
dns_name_t from dns_dsyncfetch and rename dns_fixedname_t fname to
dsyncname.
Due to the way various asyncio-related objects (tasks, streams,
transports, selectors) are referencing each other, pausing reads for a
TCP transport (which in practice means removing the client socket from
the set of descriptors monitored by a selector) can cause the client
task (AsyncDnsServer._handle_tcp()) to be prematurely garbage-collected,
causing asyncio code to raise a "Task was destroyed but it is pending!"
exception. Who knew that solutions as elegant as the one introduced by
e4078885073a6c5b59729f4313108e3e7637efdb could cause unexpected
trouble?
Fix by making a horrible hack even more horrible, specifically by
keeping a reference to each incoming TCP connection to protect its
related asyncio objects from getting garbage-collected. This prevents
AsyncDnsServer from closing any of the ignored TCP connections
indefinitely, which is obviously a pretty brain-dead idea for a
production-grade DNS server, but AsyncDnsServer was never meant to be
one and this hack reliably solves the problem at hand.
Only apply this change for the IgnoreAllConnections handler as the
ConnectionReset handler triggers a connection reset immediately after
pausing reads for an incoming TCP connection.
As pointed out in e4078885073a6c5b59729f4313108e3e7637efdb, the proper
solution would require implementing a custom asyncio transport from
scratch and that is still deemed to be too much work for the purpose at
hand. Let's see how much longer we can limp along with the existing
approach.
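The effect of keeping a reference can be illustrated with a minimal sketch. It does not reproduce the AsyncDnsServer code: `Connection` and `IgnoredConnections` are hypothetical stand-ins, and the addresses are illustrative. It only shows that an object held in a long-lived container survives garbage collection, while an otherwise identical object with no remaining references does not.

```python
import gc
import weakref


class Connection:
    """Hypothetical stand-in for the asyncio objects tied to one TCP client."""

    def __init__(self, peer: str) -> None:
        self.peer = peer


class IgnoredConnections:
    """Keeps strong references so ignored connections are never collected."""

    def __init__(self) -> None:
        self._kept = []

    def ignore(self, conn: Connection) -> None:
        # Deliberately never removed: the connection stays referenced for
        # the lifetime of this object.
        self._kept.append(conn)


def demo():
    keeper = IgnoredConnections()

    kept = Connection("10.53.0.1")
    kept_ref = weakref.ref(kept)
    keeper.ignore(kept)

    dropped = Connection("10.53.0.2")
    dropped_ref = weakref.ref(dropped)

    # Drop the local names; only the keeper's reference remains for "kept".
    del kept, dropped
    gc.collect()
    return kept_ref() is not None, dropped_ref() is None
```

The trade-off is the one described above: nothing ever removes entries from the container, which is acceptable for a short-lived test server only.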
Michał Kępień [Sun, 21 Dec 2025 05:25:56 +0000 (06:25 +0100)]
Make exception/signal handlers idempotent
Calling asyncio.Future.set_exception() or asyncio.Future.set_result()
more than once for a given Future object raises an
asyncio.InvalidStateError exception.
In the case of AsyncServer:
- it is enough to capture the first exception raised by higher-level
logic as no exceptions at all are expected to be raised in the first
place,
- no distinction is made between SIGINT and SIGTERM; the only purpose
of the signal handler is to make the server exit cleanly.
Given the above, make both AsyncServer._handle_exception() and
AsyncServer._signal_done() idempotent by ignoring
asyncio.InvalidStateError exceptions raised by the relevant
asyncio.Future.set_*() calls.
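A minimal sketch of the idea, assuming a single "done" Future shared by both handlers. The `DoneSignal` class and its method names are hypothetical and only mirror the behavior described above; they are not the AsyncServer code.

```python
import asyncio


class DoneSignal:
    """Hypothetical stand-in for the Future-handling part of a test server."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.done = loop.create_future()

    def handle_exception(self, exc: BaseException) -> None:
        # Only the first exception is recorded; later calls are ignored.
        try:
            self.done.set_exception(exc)
        except asyncio.InvalidStateError:
            pass

    def signal_done(self) -> None:
        # SIGINT and SIGTERM are treated alike: just mark the server as done.
        try:
            self.done.set_result(None)
        except asyncio.InvalidStateError:
            pass


def demo() -> str:
    loop = asyncio.new_event_loop()
    try:
        signal = DoneSignal(loop)
        signal.handle_exception(ValueError("first"))
        signal.handle_exception(KeyError("second"))  # ignored
        signal.signal_done()  # ignored as well
        return str(signal.done.exception())
    finally:
        loop.close()
```

Calling `demo()` returns the message of the first exception only; the later calls leave the Future untouched instead of raising.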
Štěpán Balážik [Fri, 19 Dec 2025 19:02:17 +0000 (19:02 +0000)]
chg: ci: Use CMocka generated JUnit reports where possible
Where applicable, use the more detailed CMocka generated JUnit
reports which include subtest results and timings instead of the
one generated by Meson.
Prerequisites:
- bind9-qa!137
Closes #5511
Merge branch '5511-cmocka-junit-ouput' into 'main'
Štěpán Balážik [Wed, 15 Oct 2025 17:23:59 +0000 (19:23 +0200)]
Use CMocka generated JUnit reports where possible
Where applicable, use the more detailed CMocka generated JUnit
reports which include subtest results and timings instead of the
one generated by Meson.
Flaky tests also require retrying, so use a wrapper and mark them
with an environment variable. This is done to avoid the need to compute
an intersection of suites in Meson which is not supported out-of-the-box
(`meson test --suite=foo,bar` runs the union of foo and bar).
Matthijs Mekking [Fri, 19 Dec 2025 16:33:53 +0000 (16:33 +0000)]
fix: usr: Reconfigure NSEC3 opt-out zone to NSEC causes zone to be invalid
Reconfiguring a zone signed with NSEC3 and opt-out enabled to use NSEC caused the zone to be published with missing NSEC records. This has been fixed.
Closes #5679
Merge branch '5679-nsec3-optout-to-nsec' into 'main'
When switching from NSEC3 opt-out to NSEC, add NSEC records if we saw an
RR. This corrects a mistake in style cleanups done in commit 308ab1b4a5c5239860ca06c64b0def9b98ae4b17.
If we change from NSEC3 to NSEC we should not produce a zone with
missing NSEC records.
The code only considered having seen a record if there was previously
a signature present at the owner name. However with opt-out, insecure
delegations don't have a RRSIG record. Reconfiguring to NSEC causes
all insecure delegations to have a missing NSEC record.
Add a DNAME record to the test zone to also cover DNAME delegations.
In a sense, the ans6 black hole server, based on asyncserver, "does
nothing". In our case, it won't respond to any query, and if the
IgnoreAllConnections connection handler was installed, it would not read
anything from the client socket.
Previously, sending notifications to an unconfigured address resulted in
no communication from the target (10.53.10.53); hence, the ns3
configuration comment requested a "non-responsive notify recipient (no
reply, no ICMP errors)".
However, examining the PCAP of ans6 reveals some communication from the
10.53.0.6 server to the 10.53.0.3 client, including ICMP Destination
Unreachable (Port Unreachable), and TCP SYN/ACK.
The ans6 communication seems to be sufficiently different to touch
different code paths in named, resulting in the BIND 9.20 backport
failing in the "checking notify retries expire within 30 seconds" test.
But we had better revert it from "main" as well.
Matthijs Mekking [Fri, 19 Dec 2025 14:46:23 +0000 (14:46 +0000)]
new: usr: Add support for Generalized DNS Notifications
A new configuration option, ``notify-cfg CDS``, is added to enable Generalized DNS Notifications for CDS and/or CDNSKEY RRset changes, as specified in RFC 9859.
Closes #5611
Merge branch '5611-generalized-dns-notifications-rfc-9859' into 'main'
Matthijs Mekking [Fri, 12 Dec 2025 14:49:19 +0000 (15:49 +0100)]
Test invalid DSYNC RRset is rejected
The RFC says there MUST NOT be more than one DSYNC record for each
combination of RRtype and Scheme. If we encounter more we should drop
the response, as the DSYNC RRset is invalid.
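The uniqueness rule can be sketched as a small check. This is an illustration only, not the code in named: the `Dsync` tuple and `dsync_rrset_is_valid()` are hypothetical names, and the record fields follow the DSYNC RDATA layout (RRtype, Scheme, Port, Target).

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Dsync(NamedTuple):
    """Hypothetical in-memory form of one DSYNC record."""

    rrtype: str
    scheme: int
    port: int
    target: str


def dsync_rrset_is_valid(rrset: Iterable[Dsync]) -> bool:
    """Return False if any (RRtype, Scheme) combination occurs more than once."""
    counts = Counter((rr.rrtype.upper(), rr.scheme) for rr in rrset)
    return all(n == 1 for n in counts.values())
```

For example, an RRset with one CDS record and one CSYNC record for the same scheme passes, while two CDS records with the same scheme make the whole RRset invalid.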
When doing rollover and the CDS/CDNSKEY RRset is updated, test that a
NOTIFY(CDS) message is sent. For other steps in the rollover, prohibit
any dsyncfetch activity.
Matthijs Mekking [Tue, 25 Nov 2025 07:56:32 +0000 (08:56 +0100)]
Test sending NOTIFY(CDS) messages
When starting up the services, send notifies for the existing CDS RRset.
This requires setting up a chain of trust for the test, so the DSYNC
records can be retrieved and validated.
This feature requires enabling 'notify-cds' and 'dnssec-validation'.
In this test, the scanner is pointed to ns2. Since there is no code
for receiving NOTIFY(CDS) messages for delegations, this is treated
as "not authoritative". Checking for this log message assures us that
the NOTIFY(CDS) message was actually sent.
Matthijs Mekking [Thu, 30 Oct 2025 08:48:35 +0000 (09:48 +0100)]
Implement NOTIFY(CDS) logic
When the CDS/CDNSKEY RRset gets updated, schedule a NOTIFY(CDS) to be
sent to the parental agent. The parental agent is published in the
parent zone as a DSYNC RRset, so first we need to figure out the
parent owner name. This is done by finding the zonecut (querying for
NS RRset until we find a positive answer).
In nsfetch_dsync, we then schedule a zone fetch for the DSYNC record
at <child-labels>._dsync.<parent-labels>. Then we queue the notify
for each target in the DSYNC records that matches the NOTIFY scheme
and CDS RRtype.
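The name construction can be sketched as follows. This is a simplified illustration working on presentation-format strings, not the dns_name_t logic used in named: `dsync_owner_name()` is a hypothetical helper, and it assumes the parent zone name is already known and that labels contain no escaped dots.

```python
def dsync_owner_name(child: str, parent: str) -> str:
    """Build <child-labels>._dsync.<parent-labels> for a child zone."""
    child = child.rstrip(".").lower()
    parent = parent.rstrip(".").lower()
    if parent == "":
        # Parent is the root zone: every label of the child is a child label.
        child_labels = child
    elif child.endswith("." + parent):
        # Strip the parent suffix and the dot that separates it.
        child_labels = child[: -(len(parent) + 1)]
    else:
        raise ValueError(f"{child!r} is not below {parent!r}")
    if not child_labels:
        raise ValueError("child zone must have at least one label below parent")
    if parent:
        return f"{child_labels}._dsync.{parent}."
    return f"{child_labels}._dsync."
```

With this helper, `dsync_owner_name("example.test.", "test.")` yields `example._dsync.test.`, which is the name the DSYNC fetch would be issued for.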
Now that we log the type of the notify, some expected log messages
in the system tests need to be adjusted accordingly.
The bin/tests/system/nsec3/tests_nsec3_retransfer.py log is changed
to zone_needdump because it is more reliable. Other tests were
adjusted similarly in MR !11265, but !11226 introduced a new
"sending notify" log line.
Matthijs Mekking [Tue, 18 Nov 2025 08:56:34 +0000 (09:56 +0100)]
Add type parameter to dns_notify_create()
With Generalized DNS Notifications, a zone may need to send different
types of NOTIFY messages for different reasons. When creating a new
notify, allow for specifying the type.
Matthijs Mekking [Tue, 28 Oct 2025 14:25:29 +0000 (15:25 +0100)]
Add port parameter to dns_notify_create()
The DSYNC record has a Port rdata field, so NOTIFY(CDS) messages may be
configured at different ports. When creating a new notify, allow for
specifying the port.
Matthijs Mekking [Tue, 28 Oct 2025 07:30:05 +0000 (08:30 +0100)]
Maintain separate notify contexts for SOA and CDS
With Generalized DNS Notifications, a zone may need to send different
NOTIFY messages for different reasons. Introduce a method to
initialize a notify context and maintain a notify context per RRtype.
Matthijs Mekking [Fri, 28 Nov 2025 12:42:28 +0000 (13:42 +0100)]
rollover-zsk-prepub: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
Since the bootstrapping is done before the templates are rendered
automatically, replace @DEFAULT_ALGORITHM@ in ns3/kasp.conf.j2 with
ecdsa256 and rename to ns3/kasp.conf.
Matthijs Mekking [Fri, 28 Nov 2025 10:38:06 +0000 (11:38 +0100)]
rollover-ksk-doubleksk: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
Since the bootstrapping is done before the templates are rendered
automatically, replace @DEFAULT_ALGORITHM@ in ns3/kasp.conf.j2 with
ecdsa256 and rename to ns3/kasp.conf.
Matthijs Mekking [Fri, 28 Nov 2025 09:43:42 +0000 (10:43 +0100)]
rollover-going-insecure: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
Since the bootstrapping is done before the templates are rendered
automatically, replace @DEFAULT_ALGORITHM@ in ns3/kasp.conf.j2 with
ecdsa256 and rename to ns3/kasp.conf.
Now we have to fake different lifetimes, so adjust fake_lifetime
to update a single key.
Note that we have changed the setup slightly: We also sign the
step2 zones, but with post validation disabled. This is more
accurate because we need to test that the public keys and signatures
are being removed from the zone.
Matthijs Mekking [Fri, 28 Nov 2025 08:59:51 +0000 (09:59 +0100)]
rollover-enable-dnssec: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
Since the bootstrapping is done before the templates are rendered
automatically, replace @DEFAULT_ALGORITHM_NUMBER@ in ns3/kasp.conf.j2 with
13 and rename to ns3/kasp.conf.
This test introduces an unsigned delegation, adjust render_and_sign_zone
and configure_tld accordingly.
Matthijs Mekking [Thu, 27 Nov 2025 13:01:28 +0000 (14:01 +0100)]
rollover-csk-roll1: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
Since the bootstrapping is done before the templates are rendered
automatically, replace @DEFAULT_ALGORITHM@ in ns3/kasp.conf.j2 with
ecdsa256 and rename to ns3/kasp.conf.
Write a python method to set the key predecessor/successor relationship
into the key state files.
Matthijs Mekking [Thu, 27 Nov 2025 11:11:35 +0000 (12:11 +0100)]
rollover-algo-ksk-zsk: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
The RSASHA256 keys are generated with dnssec-keygen, without a policy
provided. Thus we have to fake the lifetime for these keys.
Signing has to be done without the -z option, because the KSK should
not sign all records in case of a KSK/ZSK split. Update the signing
code to allow for extra options when signing with CSK only.
Matthijs Mekking [Thu, 27 Nov 2025 09:37:22 +0000 (10:37 +0100)]
rollover-algo-csk: From setup.sh to pytest bootstrap
Symlink ns1 and ns2 to rollover/ns1 and rollover/ns2.
Symlink ns3/template.db.j2.manual to rollover/ns3/template.db.j2.manual.
Since the bootstrapping is done before the templates are rendered
automatically, replace @DEFAULT_ALGORITHM@ in ns3/csk2.conf.j2 with
ecdsa256 and rename to ns3/csk2.conf.
Matthijs Mekking [Tue, 25 Nov 2025 10:17:40 +0000 (11:17 +0100)]
rollover: From setup.sh to pytest bootstrap
Introduce rollover/setup.py for all setup related test code.
Introduce rollover/ns1 and rollover/ns2 to create a chain of trust to
all rollover related test zones. The tld zones in rollover/ns2 contain
a DSYNC record that at a later time will be used for testing Generalized
DNS Notifications.
Write a python version of private_type_record so we can put such
records in the zone via jinja2 templating.
Matthijs Mekking [Tue, 25 Nov 2025 09:54:57 +0000 (10:54 +0100)]
Move ns6 to ns3 in rollover tests
There is no difference, so we are going to make it consistent. This will
make it easier to add a chain of trust for these zones (to be done in
a future commit).
Štěpán Balážik [Wed, 12 Nov 2025 15:19:25 +0000 (16:19 +0100)]
Allow ResponseHandlers to roll back changes made to a response
Previously, this was only possible by making a new response by calling
make_response on qctx.query. This however ignored the `default_aa` and
`default_rcode` parameters of AsyncDnsServer.
Add prepare_new_response and save_initialized_response methods to
QueryContext.
Štěpán Balážik [Wed, 12 Nov 2025 15:02:48 +0000 (16:02 +0100)]
Add TSIG keyring support to AsyncDnsServer
Previously, ResponseHandlers had to reparse the queries themselves if
they wanted to use TSIG. This led to `default_aa` and `default_rcode`
information being lost from the newly created messages.
Add support for TSIG keyrings to the AsyncDnsServer class directly.
Štěpán Balážik [Wed, 29 Oct 2025 17:59:31 +0000 (18:59 +0100)]
Allow users of AsyncDnsServer to set AA bit for all responses
Previously, all responses had to be set as authoritative explicitly
using DnsResponseSend(..., authoritative=True). After using this,
it became obvious that this is obnoxious.
Add an optional keyword-only parameter to AsyncDnsServer that sets the
default value of the AA bit on outgoing responses.
Make all the other parameters keyword-only as well.
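The shape of such an interface can be sketched as below. `SketchServer` and `effective_aa()` are hypothetical and are not the AsyncDnsServer implementation. The sketch assumes that a per-response `authoritative` value of `None` means "use the server default", while an explicit `True` or `False` overrides it.

```python
from typing import Optional


class SketchServer:
    """Hypothetical server with keyword-only defaults for outgoing responses."""

    def __init__(self, *, default_aa: bool = False, default_rcode: int = 0) -> None:
        # The bare "*" forces callers to name every parameter explicitly.
        self.default_aa = default_aa
        self.default_rcode = default_rcode

    def effective_aa(self, authoritative: Optional[bool] = None) -> bool:
        # An explicit per-response value wins; otherwise fall back to the default.
        return self.default_aa if authoritative is None else authoritative
```

Keyword-only parameters keep call sites readable (`SketchServer(default_aa=True)`) and make a positional call such as `SketchServer(True)` fail with a `TypeError`.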
Štěpán Balážik [Thu, 30 Oct 2025 12:41:23 +0000 (13:41 +0100)]
Refactor ControllableAsyncDnsServer setup
When this class was introduced, the constructor of its base class had no
parameters. This was changed in the meantime and these parameters were
not accessible by users of the subclass.
Don't override the constructor.
Move command setup to methods.
Move subclass-specific storage to cached properties.
Take instances of Command instead of the classes themselves for
symmetry with install_response_handler.
Arаm Sаrgsyаn [Wed, 17 Dec 2025 14:55:43 +0000 (14:55 +0000)]
fix: usr: Fix a possible catalog zone issue during reconfiguration
The :iscman:`named` process could terminate unexpectedly during
reconfiguration when a catalog zone update was taking place at
the same time. This has been fixed.
Merge branch 'aram/catz-reconfig-crash-fix' into 'main'
Aram Sargsyan [Fri, 5 Dec 2025 10:06:28 +0000 (10:06 +0000)]
Lock the catalog zone when reconfiguring it
A catalog zone is updated in an offloaded thread, which is not
stopped during a reconfiguration in an exclusive mode, and so
can cause a race condition with it.
Waiting for the offloaded threads to complete their work before
entering into the exclusive mode can potentially cause unwanted
delays, because offloaded threads are generally "allowed" to take
a longer amount of time before they complete.
Add a dns_catz_zone_prereconfig()/dns_catz_zone_postreconfig() pair
of functions which currently just lock the catalog zone when
reconfiguring it. The change should eliminate the race.
As a side note, there was already a similar pair of functions,
dns_catz_prereconfig() and dns_catz_postreconfig() which are called
before and after reconfiguring a 'dns_catz_zones_t' object.
Below are the stack traces of the reconfiguration thread which has
asserted, and a catalog zone update thread which was caught in the
middle of its work despite the fact that the exclusive mode is
turned on.