Mark Andrews [Fri, 14 Mar 2025 00:45:20 +0000 (00:45 +0000)]
new: usr: dig can now display the received BADVERS message during negotiation
Dig +showbadvers now displays the received BADVERS message and
continues the EDNS version negotiation. Previously to see the
BADVERS message +noednsneg had to be specified which terminated the
EDNS negotiation. Additionally the specified EDNS value (+edns=value)
is now used when making all the initial queries with +trace. i.e EDNS
version negotiation will be performed with each server when performing
the trace.
Closes #5234
Merge branch '5234-have-dig-display-the-badvers-message' into 'main'
Mark Andrews [Tue, 11 Mar 2025 23:02:05 +0000 (10:02 +1100)]
Add "+showbadvers" to dig and reset EDNS version
Add "+showbadvers" to display the BADVERS response similarly
to "+showbadcookie". Additionally reset the EDNS version to
the requested version in "dig +trace" so that EDNS version
negotiation can be tested at all levels of the trace rather
that just when requesting the root nameservers.
Matthijs Mekking [Thu, 13 Mar 2025 08:28:37 +0000 (09:28 +0100)]
Raise max-clients-per-query to be at least
In the case where 'clients-per-query' is larger than
'max-clients-per-query', raise 'max-clients-per-query' so that
'clients-per-query' equals 'max-clients-per-query' and log a warning
that this is what happened.
Colin Vidal [Thu, 13 Mar 2025 11:56:37 +0000 (11:56 +0000)]
new: usr: Add support for EDE 20 (Not Authoritative)
Support was added for EDE codes 20 (Not Authoritative) when client requests recursion (RD) but the server has recursion disabled.
RFC 8914 mention EDE 20 should also be returned if the client doesn't have the RD bit set (and recursion is needed) but it doesn't apply for
BIND as BIND would try to resolve from the "deepest" referral in AUTHORITY section. For example, if the client asks for "www.isc.org/A" but the server only knows the root domain, it will return NOERROR but no answer for "www.isc.og/A", just the list of other servers to ask.
Colin Vidal [Wed, 12 Mar 2025 09:28:27 +0000 (10:28 +0100)]
add support for EDE 20 (Not Authoritative)
Extended DNS Error message EDE 20 (Not Authoritative) is now sent when
client request recursion (RD) but the server has recursion disabled.
RFC 8914 mention EDE 20 should also be returned if the client doesn't
have the RD bit set (and recursion is needed) but it doesn't apply for
BIND as BIND would try to resolve from the "deepest" referral in
AUTHORITY section. For example, if the client asks for "www.isc.org/A"
but the server only knows the root domain, it will returns NOERROR but
no answer for "www.isc.og/A", just the list of other servers to ask.
Colin Vidal [Wed, 12 Mar 2025 09:53:11 +0000 (10:53 +0100)]
add support for EDE 7 and 8
Extended DNS Error messages EDE 7 (expired key) and EDE 8 (validity
period of the key not yet started) are now sent in case of such DNSSEC
validation failures.
Refactor the existing validator extended error APIs in order to make it
easy to have a consisdent extra info (with domain/type) in the various
use case (i.e. when the EDE depends on validator state,
validate_extendederror or when the EDE doesn't depend of any state but
can be called directly in a specific flow).
Matthijs Mekking [Wed, 12 Mar 2025 15:14:52 +0000 (16:14 +0100)]
ksr: Take into account key collisions
When generating new key pairs, one test checks if existing keys that
match the time bundle are selected, rather than extra keys being
generated. Part of the test is to check the verbose output, counting
the number of "Selecting" and "Generating" occurences. But if there
is a key collision, the ksr tool will output that the key already
exists and includes the substring "already exists, or might collide
with another key upon revokation. Generating a new key".
So substract by one the generated counter if there is a "collide"
occurrence.
Ondřej Surý [Wed, 5 Mar 2025 11:20:21 +0000 (11:20 +0000)]
chg: dev: Cleanup parts of the isc_mem API
This MR changes custom attach/detach implementation with refcount macros, replaces isc_mem_destroy() with isc_mem_detach(), and does various small cleanups.
Merge branch 'ondrej/cleanup-isc_mem-api' into 'main'
Replace attach/detach in isc_mem with refcount implementation
The isc_mem API is one of the most commonly used APIs that didn't
used ISC_REFCOUNT_DECL and ISC_REFCOUNT_IMPL macros. Replace the
implementation of isc_mem_attach(), isc_mem_detach() and
isc_mem_destroy() with the respective macros.
This also removes the legacy isc_mem_destroy() functionality that would
check whether all references had been detached from the memory context
as it doesn't work reliably when using the call_rcu() API. Instead of
doing this individually, call isc_mem_checkdestroyed(stderr) from the
isc_mem_destroy() macro to keep the extra check that all contexts were
freed when the program is exiting.
Mark Andrews [Wed, 23 Jun 2021 09:51:51 +0000 (19:51 +1000)]
Implement digest_sig and digest_rrsig for ZONEMD
ZONEMD needs to be able to digest SIG and RRSIG records. The signer
field can be compressed in SIG so we need to call dns_name_digest().
While for RRSIG the records the signer field is not compressed the
canonical form has the signer field downcased (RFC 4034, 6.2). This
also implies that compare_rrsig needs to downcase the signer field
during comparison.
Ondřej Surý [Wed, 5 Mar 2025 06:49:59 +0000 (06:49 +0000)]
fix: dev: Fix the foundname vs dcname madness in qpcache_findzonecut()
The qpcache_findzonecut() accepts two "foundnames": 'foundname' and
'dcname' could be NULL. Originally, when 'dcname' would be NULL, the
'dcname' would be set to 'foundname' which basically means that we were
copying the .ndata over itself for no apparent reason.
Merge branch 'ondrej/refactor-qpcache_findzonecut' into 'main'
Ondřej Surý [Sun, 2 Feb 2025 14:50:15 +0000 (15:50 +0100)]
Fix the foundname vs dcname madness in qpcache_findzonecut()
The qpcache_findzonecut() accepts two "foundnames": 'foundname' and
'dcname' could be NULL. Originally, when 'dcname' would be NULL, the
'dcname' would be set to 'foundname'. Then code like this was present:
Evan Hunt [Sun, 2 Mar 2025 05:03:51 +0000 (21:03 -0800)]
when recording an rr trace, use libtool
when running a system test with the USE_RR environment
variable set to 1, an rr trace is generated for named.
because rr wasn't run using libtool --mode=execute, the
trace would actually be generated for the wrapper script
generated by libtool, not for the actual named binary.
Artem Boldariev [Mon, 3 Mar 2025 10:10:33 +0000 (10:10 +0000)]
fix: dev: Post [CVE-2024-12705] Performance Drop Fixes, Part 2
This merge request addresses several key performance bottlenecks in the DoH (DNS over HTTPS) implementation by introducing significant optimizations and improvements.
### Key Improvements
1. **Simplification and Optimisation of `http_do_bio()` Function**:
- The code flow in the `http_do_bio()` function has been significantly simplified.
2. **Flushing HTTP Write Buffer on Outgoing DNS Messages**:
- The buffer is flushed and a send operation is performed when there is an outgoing DNS message.
3. **Bumping Active Streams Processing Limit**:
- The total number of active streams has been increased to 60% of the total streams limit.
These changes collectively enhance the performance and reliability of the DoH implementation, making it more efficient and robust for handling high-load scenarios, particularly noticeable in long runs (>= 1h) of `stress:long:rpz:doh+udp:linux:*` tests. It improves perf. for tests for BIND 9.18, but it likely will have a positive but less pronounced effect on newer versions as well.
In essence, the merge request fixes three bottlenecks stacked upon each other.
*It is a logical continuation of the merge requests !10109.* !10109, unfortunately, did not completely [address the performance drop in 9.18](https://gitlab.isc.org/isc-projects/bind9/-/pipelines/221545) for longer runs of the stress test. This merge request [addresses that](https://gitlab.isc.org/isc-projects/bind9/-/pipelines/223661).
**P.S.**
The origin of the fixes is, in fact, the branch in !10193. So this MR is a ... *forward port* of them.
Merge branch 'artem-doh-performance-drop-post-fix' into 'main'
Artem Boldariev [Tue, 25 Feb 2025 17:58:24 +0000 (19:58 +0200)]
DoH: Bump the active streams processing limit
This commit bumps the total number of active streams (= the opened
streams for which a request is received, but response is not ready) to
60% of the total streams limit.
The previous limit turned out to be too tight as revealed by
longer (≥1h) runs of "stress:long:rpz:doh+udp:linux:*" tests.
Artem Boldariev [Tue, 25 Feb 2025 07:52:19 +0000 (09:52 +0200)]
DoH: Flush HTTP write buffer on an outgoing DNS message
Previously, the code would try to avoid sending any data regardless of
what it is unless:
a) The flush limit is reached;
b) There are no sends in flight.
This strategy is used to avoid too numerous send requests with little
amount of data. However, it has been proven to be too aggressive and,
in fact, harms performance in some cases (e.g., on longer (≥1h) runs
of "stress:long:rpz:doh+udp:linux:*").
Now, additionally to the listed cases, we also:
c) Flush the buffer and perform a send operation when there is an
outgoing DNS message passed to the code (which is indicated by the
presence of a send callback).
That helps improve performance for "stress:long:rpz:doh+udp:linux:*"
tests.
Artem Boldariev [Mon, 24 Feb 2025 16:32:23 +0000 (18:32 +0200)]
DoH: Limit the number of delayed IO processing requests
Previously, a function for continuing IO processing on the next UV
tick was introduced (http_do_bio_async()). The intention behind this
function was to ensure that http_do_bio() is eventually called at
least once in the future. However, the current implementation allows
queueing multiple such delayed requests needlessly. There is currently
no need for these excessive requests as http_do_bio() can requeue them
if needed. At the same time, each such request can lead to a memory
allocation, particularly in BIND 9.18.
This commit ensures that the number of enqueued delayed IO processing
requests never exceeds one in order to avoid potentially bombarding IO
threads with the delayed requests needlessly.
Artem Boldariev [Thu, 20 Feb 2025 20:08:01 +0000 (22:08 +0200)]
DoH: Simplify http_do_bio()
This commit significantly simplifies the code flow in the
http_do_bio() function, which is responsible for processing incoming
and outgoing HTTP/2 data. It seems that the way it was structured
before was indirectly caused by the presence of the missing callback
calls bug, fixed in 8b8f4d500d9c1d41d95d34a79c8935823978114c.
The change introduced by this commit is known to remove a bottleneck
and allows reproducible and measurable performance improvement for
long runs (>= 1h) of "stress:long:rpz:doh+udp:linux:*" tests.
Additionally, it fixes a similar issue with potentially missing send
callback calls processing and hardens the code against use-after-free
errors related to the session object (they can potentially occur).
Ondřej Surý [Sat, 1 Mar 2025 06:35:58 +0000 (06:35 +0000)]
rem: dev: Cleanup isc/util.h header and friends
Cleanup short list macros from <isc/util.h>, remove two unused headers, move locking macros to respective headers and use only the C11 static assertion.
Merge branch 'ondrej/cleanup-short-macros' into 'main'
Ondřej Surý [Fri, 28 Feb 2025 20:51:18 +0000 (21:51 +0100)]
Remove STATIC_ASSERT variants in favor of the C11 variant
Previously, a gcc < 4.6 shim for _Static_assert() was included. Such an
old compiler is not supported now anyway, so the macro variant has been
removed in favor of a single definition using _Static_assert().
Ondřej Surý [Fri, 28 Feb 2025 20:49:48 +0000 (21:49 +0100)]
Move locking macros into individual headers
Previously, the LOCK()/UNLOCK() and friends macros were defined in the
isc/util.h header. Those macros were moved to their respective headers
as those would have to be included anyway if that particular lock was in
use.
Ondřej Surý [Fri, 28 Feb 2025 20:40:50 +0000 (21:40 +0100)]
Remove superflous header includes from isc/util.h header
Formerly, isc/util.h would pull a few extra headers (isc/list.h,
isc/attributes.h, isc/result.h and errno.h). These includes were
removed in favor of including them directly when used.
Ondřej Surý [Fri, 28 Feb 2025 20:04:21 +0000 (21:04 +0100)]
Remove convenience list macros from isc/util.h
The short convenience list macros were used very sparingly and
inconsistenly in the code base. As the consistency is prefered over
the convenience, all shortened list macro were removed in favor of
their ISC_LIST API targets.
Arаm Sаrgsyаn [Fri, 28 Feb 2025 15:34:05 +0000 (15:34 +0000)]
fix: usr: Fix a bug in the statistics channel when querying zone transfers information
When querying zone transfers information from the statistics channel there was a rare possibility that `named` could terminate unexpectedly if a zone transfer was in a state when transferring from all the available primary servers had failed earlier. This has been fixed.
Closes #5198
Merge branch '5198-dns_remote_curraddr-bug-fix' into 'main'
Aram Sargsyan [Thu, 27 Feb 2025 11:52:30 +0000 (11:52 +0000)]
Fix a bug in dns_zone_getprimaryaddr()
When all the addresses were already iterated over, the
dns_remote_curraddr() function asserts. So before calling it,
dns_zone_getprimaryaddr() now checks the address list using the
dns_remote_done() function. This also means that instead of
returning 'isc_sockaddr_t' it now returns 'isc_result_t' and
writes the primary's address into the provided pointer only when
returning success.
Michal Nowak [Fri, 28 Feb 2025 10:15:06 +0000 (10:15 +0000)]
fix: ci: Fix Clang TSAN reports
Disabling dynamic tags ensures the Clang symbolizer creates a valid TSAN
report. For consistency, also add the option to gcc:tsan so they are
both on the same footing.
Michal Nowak [Thu, 27 Feb 2025 10:32:53 +0000 (11:32 +0100)]
Fix Clang TSAN reports
Disabling new dynamic ELF tags ensures the Clang symbolizer creates
valid TSAN reports. For consistency, also add the option to gcc:tsan so
they are both on the same footing.
Evan Hunt [Thu, 27 Feb 2025 19:01:02 +0000 (19:01 +0000)]
fix: nil: Complete the fix for the import_rdataset() crash
The fix in !10172 was incomplete; there were still some code paths in the resolver that could set the dns_fetchresponse `result` field to the wrong value when the rdataset type was CNAME or DNAME.
Aydın Mercan [Thu, 27 Feb 2025 15:29:07 +0000 (15:29 +0000)]
chg: dev: unify fips handling to isc_crypto and make the toggle one way
Since algorithm fetching is handled purely in libisc, FIPS mode toggling
can be purely done in within the library instead of provider fetching in
the binary for OpenSSL >=3.0.
Disabling FIPS mode isn't a realistic requirement and isn't done
anywhere in the codebase. Make the FIPS mode toggle enable-only to
reflect the situation.
Aydın Mercan [Mon, 16 Dec 2024 12:31:15 +0000 (15:31 +0300)]
unify fips handling to isc_crypto and make the toggle one way
Since algorithm fetching is handled purely in libisc, FIPS mode toggling
can be purely done in within the library instead of provider fetching in
the binary for OpenSSL >=3.0.
Disabling FIPS mode isn't a realistic requirement and isn't done
anywhere in the codebase. Make the FIPS mode toggle enable-only to
reflect the situation.
Nicki Křížek [Thu, 27 Feb 2025 13:30:26 +0000 (13:30 +0000)]
new: ci: Run shotgun tests on MRs
Execute DNS Shotgun performance tests on the regular MRs and compare the changes they introduce against the MR diff base. The results are evaluated automatically - the shotgun jobs will fail if thresholds for CPU/memory/latency difference is exceeded.
Nicki Křížek [Tue, 25 Feb 2025 16:48:05 +0000 (17:48 +0100)]
Replace deprecated only/except with rules in .gitlab-ci.yml
The keyword rules allows more flexible and complex conditions when
deciding whether to create the job and also makes it possible run tweak
variables or job properties depending on arbitraty rules. Since it's
not possible to combine only/except and rules together, replace all
uses of only/except to avoid any potential future issues.
Nicki Křížek [Wed, 19 Feb 2025 15:06:22 +0000 (16:06 +0100)]
Run shotgun tests on MRs
If the shotgun tests are executed for MRs, compare it against the MR's
base rather than the previous release. Only fail the job in case the
performance drops (pass on performance improvements).
Note that start_in optimization was removed, since it isn't properly
supported with rules as of February 2025
(https://gitlab.com/gitlab-org/gitlab/-/issues/424203). Without this
optimization, container test images are likely to be re-built
unnecessarily when testing different protocols. A workaround for the
.gitlab-ci.yml exists, but the extra complexity doesn't seem justified.
The container image builds might change or be optimized in the future,
so let's just go with the build duplication for now.
Arаm Sаrgsyаn [Thu, 27 Feb 2025 09:19:12 +0000 (09:19 +0000)]
fix: usr: Fix TTL issue with ANY queries processed through RPZ "passthru"
Answers to an "ANY" query which were processed by the RPZ "passthru"
policy had the response-policy's `max-policy-ttl` value unexpectedly
applied. This has been fixed.
Closes #5187
Merge branch '5187-rpz-passthru-any-type-ttl-bug-fix' into 'main'
Aram Sargsyan [Wed, 26 Feb 2025 13:32:20 +0000 (13:32 +0000)]
Fix TTL issue with ANY queries processed through RPZ "passthru"
Answers to an "ANY" query which are processed by the RPZ "passthru"
policy have the response-policy's 'max-policy-ttl' value unexpectedly
applied. Do not change the records' TTL when RPZ uses a policy which
does not alter the answer.
Evan Hunt [Wed, 26 Feb 2025 20:34:27 +0000 (20:34 +0000)]
fix: dev: Validating ADB fetches could cause a crash in import_rdataset()
Previously, in some cases, the resolver could return rdatasets of type CNAME or DNAME without the result code being set to `DNS_R_CNAME` or `DNS_R_DNAME`. This could trigger an assertion failure in the ADB. The resolver error has been fixed.
Evan Hunt [Tue, 25 Feb 2025 22:41:41 +0000 (14:41 -0800)]
set eresult based on the type in ncache_adderesult()
when the caching of a negative record failed because of the
presence of a positive one, ncache_adderesult() could override
this to ISC_R_SUCCESS. this could cause CNAME and DNAME responses
to be handled incorrectly. ncache_adderesult() now sets the result
code correctly in such cases.
Evan Hunt [Wed, 26 Feb 2025 00:43:46 +0000 (00:43 +0000)]
fix: dev: Remove 'target' from dns_adb
When a server name turns out to be a CNAME or DNAME, the ADB does not use it, but the `dns_adbname` structure still stored a copy of the target name. This is unnecessary and the code has been removed.
Evan Hunt [Fri, 21 Feb 2025 09:24:42 +0000 (01:24 -0800)]
remove 'target' from dns_adb
the target name parameter to dns_adb_createfind() was always passed as
NULL, so we can safely remove it.
relatedly, the 'target' field in the dns_adbname structure was never
referenced after being set. the 'expire_target' field was used, but
only as a way to check whether an ADB name represents a CNAME or DNAME,
and that information can be stored as a single flag.
The dual-stack-servers configuration option was not working as expected; the specified servers were not being used when they should have been, leading to resolution failures. This has been fixed.
Closes #5019
Merge branch '5019-dual-stack-servers-wasn-t-working-in-all-cases' into 'main'
Mark Andrews [Fri, 1 Nov 2024 03:45:43 +0000 (14:45 +1100)]
Removing now unneeded priming queries
Now that fctx_try is being called when adb returns DNS_ADB_NOMOREADDRESSES
we don't need these priming queries for the dual-stack-servers test
to succeed.
Mark Andrews [Fri, 1 Nov 2024 02:54:52 +0000 (13:54 +1100)]
Fix dual-stack-servers
Named was stopping nameserver address resolution attempts too soon
when dual stack servers are configured. Dual stack servers are
used when there are *not* addresses for the server in a particular
address family so find->status == DNS_ADB_NOMOREADDRESSES is not a
sufficient stopping condition when dual stack servers are available.
Call fctx_try to see if the alternate servers can be used.
Mark Andrews [Tue, 25 Feb 2025 23:39:40 +0000 (23:39 +0000)]
fix: usr: Relax private DNSKEY and RRSIG constraints
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow empty key and signature material after the algorithm identifier for PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within the expected use of these types as no key material is shared and the signatures are ineffective but these are private algorithms and they can be totally insecure.
Closes #5167
Merge branch '5167-relax-private-dnskey-constraints' into 'main'
Mark Andrews [Thu, 6 Feb 2025 23:01:57 +0000 (10:01 +1100)]
Relax private DNSKEY and RRSIG constraints
DNSKEY, KEY, RRSIG and SIG constraints have been relaxed to allow
empty key and signature material after the algorithm identifier for
PRIVATEOID and PRIVATEDNS. It is arguable whether this falls within
the expected use of these types as no key material is shared and
the signatures are ineffective but these are private algorithms and
they can be totally insecure.
Evan Hunt [Tue, 25 Feb 2025 22:40:55 +0000 (22:40 +0000)]
fix: dev: Prevent a reference leak when using plugins
The `NS_QUERY_DONE_BEGIN` and `NS_QUERY_DONE_SEND` plugin hooks could cause a reference leak if they returned `NS_HOOK_RETURN` without cleaning up the query context properly.
Closes #2094
Merge branch '2094-plugin-reference-leak' into 'main'
Evan Hunt [Wed, 22 Jan 2025 01:57:00 +0000 (17:57 -0800)]
prevent a reference leak from the ns_query_done hooks
if the NS_QUERY_DONE_BEGIN or NS_QUERY_DONE_SEND hook is
used in a plugin and returns NS_HOOK_RETURN, some of the
cleanup in ns_query_done() can be skipped over, leading
to reference leaks that can cause named to hang on shut
down.
this has been addressed by adding more housekeeping
code after the cleanup: tag in ns_query_done().
Evan Hunt [Tue, 25 Feb 2025 21:34:31 +0000 (21:34 +0000)]
fix: dev: Simplify some dns_name API calls
Several functions in the `dns_name` module have had parameters removed, that were rarely or never used:
- `dns_name_fromtext()` and `dns_name_concatenate()` no longer take a target buffer.
- `dns_name_towire()` no longer takes a compression offset pointer; this is now part of the compression context.
- `dns_name_towire()` with a `NULL` compression context will copy name data directly into a buffer with no processing.
Evan Hunt [Sat, 22 Feb 2025 08:11:38 +0000 (00:11 -0800)]
simplify dns_name_fromtext() interface
previously, dns_name_fromtext() took both a target name and an
optional target buffer parameter, which could override the name's
dedicated buffer. this interface is unnecessarily complex.
we now have two functions, dns_name_fromtext() to convert text
into a dns_name that has a dedicated buffer, and dns_name_wirefromtext()
to convert text into uncompressed DNS wire format and append it to a
target buffer.
in cases where it really is necessary to have both, we can use
dns_name_fromtext() to load the dns_name, then dns_name_towire()
to append the wire format to the target buffer.
Evan Hunt [Fri, 21 Feb 2025 21:36:57 +0000 (13:36 -0800)]
avoid the 'target' buffer in dns_name_fromtext()
dns_name_fromtext() stores the converted name in the 'name'
passed to it, and optionally also copies it in wire format to
a buffer 'target'. this makes the interface unnecessarily
complex, and could be simplified by having a different function
for each purpose. as a first step, remove uses of the target
buffer in calls to dns_name_fromtext() where it wasn't actually
needed.