fix: usr: Fix a statistics channel counter bug when 'forward only' zones are used
When resolving a zone with a 'forward only' policy, and
finding out that all the forwarders are marked as "bad",
the 'ServerQuota' counter of the statistics channel was
incorrectly increased. This has been fixed.
Closes #1793
Merge branch '1793-serverquota-counter-bug-with-forward-only' into 'main'
Add a statistics channel check in the forward system test
Check that the fix in the previous commit works and that the
'ServerQuota' counter in the statistics channel is still unset
after a SERVFAIL result in a 'forward only' zone.
The 'all_spilled' local variable in resolver.c:fctx_getaddresses()
is 'true' by default, and only becomes false when there is at least
one successfully found NS address. However, when a 'forward only;'
configuration is used, the code jumps over the part where it looks
for NS addresses and doesn't reset the 'all_spilled' to false, which
results in incorretly increased 'serverquota' statistics variable,
and also in invalid return error code from the function. The result
code error didn't make any differences, because all codes other than
'ISC_R_SUCCESS' or 'DNS_R_WAIT' were treated in the same way, and
the result code was never logged anywhere.
Set the default value of 'all_spilled' to 'false', and only make it
'true' before actually starting to look up NS addresses.
Mark Andrews [Mon, 16 Sep 2024 02:49:11 +0000 (02:49 +0000)]
chg: dev: Remove statslock from dnssec-signzone
Silence Coverity CID 468757 and 468767 (DATA RACE read not locked) by converting dnssec-signzone to use atomics for statistics counters rather than using a lock.
Closes #4939
Merge branch '4939-remove-stats-lock-from-dnssec-signzone' into 'main'
Mark Andrews [Fri, 13 Sep 2024 03:30:34 +0000 (13:30 +1000)]
Remove 'statslock' from dnssec-signzone
Silence Coverity CID 468757 and 468767 (DATA RACE read not locked)
by converting dnssec-signzone to use atomics for statistics counters
rather than using a lock. This should be marginally faster than
using the lock as well when statistics are requested.
fix: usr: Separate DNSSEC validation from the long-running tasks
As part of the KeyTrap \[CVE-2023-50387\] mitigation, the DNSSEC CPU-intensive operations were offloaded to a separate threadpool that we use to run other tasks that could affect the networking latency.
If that threadpool is running some long-running tasks like RPZ, catalog zone processing, or zone file operations, it would delay DNSSEC validations to a point where the resolving signed DNS records would fail.
Split the CPU-intensive and long-running tasks into separate threadpools in a way that the long-running tasks don't block the CPU-intensive operations.
Closes #4898
Merge branch '4898-move-offloaded-DNSSEC-to-own-threads' into 'main'
Move offloaded DNSSEC operations to different helper threads
Currently, the isc_work API is overloaded. It runs both the
CPU-intensive operations like DNSSEC validations and long-term tasks
like RPZ processing, CATZ processing, zone file loading/dumping and few
others.
Under specific circumstances, when many large zones are being loaded, or
RPZ zones processed, this stops the CPU-intensive tasks and the DNSSEC
validation is practically stopped until the long-running tasks are
finished.
As this is undesireable, this commit moves the CPU-intensive operations
from the isc_work API to the isc_helper API that only runs fast memory
cleanups now.
Add isc_helper API that adds 1:1 thread for each loop
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.
Each isc_loop now has matching isc_helper thread that also built on top
of uv_loop. In fact, it matches most of the isc_loop functionality, but
only the `isc_helper_run()` asynchronous call is exposed.
fix: dev: Fix data race in offloaded dns_message_checksig()
When verifying a message in an offloaded thread there is a race with
the worker thread which writes to the same buffer. Clone the message
buffer before offloading.
Closes #4929
Merge branch '4929-data-race-in-dns_dnssec_verifymessage-memmove' into 'main'
When verifying a message in an offloaded thread there is a race with
the worker thread which writes to the same buffer. Clone the message
buffer before offloading.
rem: usr: Remove "port" from source address options
Remove the use of "port" when configuring query-source(-v6),
transfer-source(-v6), notify-source(-v6), parental-source(-v6),
etc. Remove the use of source ports for parental-agents.
Also remove the deprecated options use-{v4,v6}-udp-ports and
avoid-{v4,v6}udp-ports.
Closes #3843
Merge branch '3843-remove-deprecated-source-port-options' into 'main'
Remove the use of "port" when configuring query-source(-v6),
transfer-source(-v6), notify-source(-v6), parental-source(-v6),
etc. Remove the use of source ports for parental-agents.
Also remove the deprecated options use-{v4,v6}-udp-ports and
avoid-{v4,v6}udp-ports.
Mark Andrews [Thu, 12 Sep 2024 03:27:10 +0000 (03:27 +0000)]
fix: usr: Don't allow statistics-channel if libxml2 and libjson-c are unsupported
When the libxml2 and libjson-c libraries are not supported, the statistics channel can't return anything useful, so it is now disabled. Use of `statistics-channel` in `named.conf` is a fatal error.
Closes #4895
Merge branch '4895-link-style-sheet-to-libxml2-support' into 'main'
Mark Andrews [Wed, 11 Sep 2024 23:05:34 +0000 (23:05 +0000)]
fix: test: The statschannel tests fails if one of libxml2 or json-c is configured
The `statschannel` system test failed if only one of `libxml2` or `json-c` is
available / configured as checks were being run against the non available
statistics page.
Closes #4919
Merge branch '4919-fix-statschannel-system-test' into 'main'
chg: usr: allow IXFR-to-AXFR fallback on DNS_R_TOOMANYRECORDS
This change allows fallback from an IXFR failure to AXFR when the reason is `DNS_R_TOOMANYRECORDS`. This is because this error condition could be temporary only in an intermediate version of IXFR transactions and it's possible that the latest version of the zone doesn't have that condition. In such a case, the secondary would never be able to update the zone (even if it could) without this fallback.
This fallback behavior is particularly useful with the recently introduced `max-records-per-type` and `max-types-per-name` options: the primary may not have these limitations and may temporarily introduce "too many" records, breaking IXFR. If the primary side subsequently deletes these records, this fallback will help recover the zone transfer failure automatically; without it, the secondary side would first need to increase the limit, which requires more operational overhead and has its own adverse effect.
Closes #4928
Merge branch 'fallback-ixfr-to-axfr-on-toomanyrecords' into 'main'
JINMEI Tatuya [Fri, 16 Aug 2024 07:53:38 +0000 (16:53 +0900)]
allow IXFR-to-AXFR fallback on DNS_R_TOOMANYRECORDS
This change allows fallback from an IXFR failure to AXFR when the
reason is DNS_R_TOOMANYRECORDS. This is because this error condition
could be temporary only in an intermediate version of IXFR
transactions and it's possible that the latest version of the zone
doesn't have that condition. In such a case, the secondary would never
be able to update the zone (even if it could) without this fallback.
This fallback behavior is particularly useful with the recently
introduced max-records-per-type and max-types-per-name options:
the primary may not have these limitations and may temporarily
introduce "too many" records, breaking IXFR. If the primary side
subsequently deletes these records, this fallback will help recover
the zone transfer failure automatically; without it, the secondary
side would first need to increase the limit, which requires more
operational overhead and has its own adverse effect.
This change also fixes a minor glitch that DNS_R_TOOMANYRECORDS wasn't
logged in xfrin_fail.
The rcu_xchg_pointer() function can be used outside of a critical
section, and usually must be followed by a synchronize_rcu() or
call_rcu() call to detach from the resource, unless if there are
some guarantees in place because of our own reference counting.
Mark Andrews [Tue, 10 Sep 2024 00:08:51 +0000 (00:08 +0000)]
new: usr: Add flag to named-checkconf to ignore "not configured" errors
`named-checkconf` now takes "-n" to ignore "not configured" errors. This allows named-checkconf to check the syntax of configurations from other builds which have support for more options.
Merge branch '4913-add-option-to-named-checkconf-to-override-notconfigured-flag' into 'main'
Mark Andrews [Mon, 2 Sep 2024 06:03:17 +0000 (16:03 +1000)]
Add flag to named-checkconf to ignore "not configured" errors
named-checkconf now takes "-n" to ignore "not configured" errors.
This allows named-checkconf to check the syntax of configurations
from other builds which have support for more options.
This file was initially created for unit testing, but later code was added to generate the file. The static file should have been removed from the git repo.
Closes #4916
Merge branch '4916-skr-unit-test-rm-test-file' into 'main'
This file was initially created for unit testing, but later code was
added to generate the file. The static file should have been removed
from the git repo.
fix: usr: Fix bug in Offline KSK that is using ZSK with unlimited lifetime
If the ZSK has unlimited lifetime, the timing metadata "Inactive" and "Delete" cannot be found and is treated as an error, preventing the zone to be signed. This has been fixed.
Closes #4914
Merge branch '4914-offline-ksk-zsk-lifetime-unlimited-bug' into 'main'
If the ZSK has lifetime unlimited, the timing metadata "Inactive" and
"Delete" cannot be found and is treated as an error. Fix by allowing
these metadata to not exist.
Evan Hunt [Thu, 29 Aug 2024 18:11:15 +0000 (18:11 +0000)]
fix: usr: Delay release of root privileges until after configuring controls
Delay relinquishing root privileges until the control channel has been configured, for the benefit of systems that require root to use privileged port numbers. This mostly affects systems without fine-grained privilege systems (i.e., other than Linux).
Closes #4793
Merge branch '4793-bind-9-19-24-not-listening-to-rndc-port-953-on-localhost' into 'main'
Delay release of root privileges until after configuring controls
On systems where root access is needed to configure privileged
ports, we don't want to fully relinquish root privileges until
after the control channel (which typically runs on port 953) has
been established.
named_os_changeuser() now takes a boolean argument 'permanent'.
This allows us to switch the effective userid temporarily with
named_os_changeuser(false) and restore it with named_os_restoreuser(),
before permanently dropping privileges with named_os_changeuser(true).
Ondřej Surý [Thu, 29 Aug 2024 14:43:34 +0000 (14:43 +0000)]
chg: usr: Follow the number of CPU set by taskset/cpuset
Administrators may wish to constrain the set of cores that BIND 9 runs on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on other O/S).
If the admin has used taskset, the `named` will now follow to automatically use the given number of CPUs rather than the system wide count.
Closes #4884
Merge branch '4884-use-cpuset-to-get-number-of-cpus' into 'main'
Ondřej Surý [Thu, 22 Aug 2024 15:23:09 +0000 (17:23 +0200)]
Follow the number of CPU set by taskset/cpuset
Administrators may wish to constrain the set of cores that BIND 9 runs
on via the 'taskset', 'cpuset' or 'numactl' programs (or equivalent on
other O/S), for example to achieve higher (or more stable) performance
by more closely associating threads with individual NIC rx queues. If
the admin has used taskset, it follows that BIND ought to
automatically use the given number of CPUs rather than the system wide
count.
Michal Nowak [Thu, 29 Aug 2024 14:38:06 +0000 (14:38 +0000)]
chg: test: Bump max-recursion-queries to 100 in resolver system test
With max-recursion-queries set to 50 the resolver system test was
unstable in the "checking query resolution for a domain with a valid
glueless delegation chain" check as ns1 replied with SERVFAIL.
Closes #4897
Merge branch '4897-resolver-ns1-max-recursion-queries-100' into 'main'
Michal Nowak [Mon, 26 Aug 2024 15:56:56 +0000 (17:56 +0200)]
Bump max-recursion-queries to 100 in resolver system test
With max-recursion-queries set to 50 the resolver system test was
unstable in the "checking query resolution for a domain with a valid
glueless delegation chain" check as ns1 replied with SERVFAIL.
Mark Andrews [Thu, 29 Aug 2024 13:24:09 +0000 (13:24 +0000)]
fix: chg: Improve performance when looking for the closest encloser when returning NSEC3 proofs
Use the fact that the database returns the longest matching part of the requested name to find the required NSEC3 record. If there are multiple versions present in the database we may have to search further.
Closes #4460
Merge branch '4460-auth-nsec3-many-labels' into 'main'
Mark Andrews [Mon, 4 Dec 2023 06:15:41 +0000 (17:15 +1100)]
Return partial match when requested
Return partial match from dns_db_find/dns_db_find when requested
to short circuit the closest encloser discover process. Most of the
time this will be the actual closest encloser but may not be when
there yet to be committed / cleaned up versions of the zone with
names below the actual closest encloser.
Mark Andrews [Thu, 29 Aug 2024 12:46:12 +0000 (12:46 +0000)]
fix: Accessing fctx->state without holding lock
Move lock earlier in the call sequence to address access without lock report.
```
1559 /*
1560 * Caller must be holding the fctx lock.
1561 */
CID 468796: (#1 of 1): Data race condition (MISSING_LOCK)
1. missing_lock: Accessing fctx->state without holding lock fetchctx.lock. Elsewhere, fetchctx.state is written to with fetchctx.lock held 2 out of 2 times.
1562 REQUIRE(fctx->state == fetchstate_done);
1563
1564 FCTXTRACE("sendevents");
1565
1566 LOCK(&fctx->lock);
1567
```
Closes #4902
Merge branch '4902-accessing-fctx-state-without-holding-lock' into 'main'
Mark Andrews [Wed, 28 Aug 2024 03:07:54 +0000 (13:07 +1000)]
Move lock earlier in the call sequence
fctx->state should be read with the lock held.
1559 /*
1560 * Caller must be holding the fctx lock.
1561 */
CID 468796: (#1 of 1): Data race condition (MISSING_LOCK)
1. missing_lock: Accessing fctx->state without holding lock fetchctx.lock.
Elsewhere, fetchctx.state is written to with fetchctx.lock held 2 out of 2 times.
1562 REQUIRE(fctx->state == fetchstate_done);
1563
1564 FCTXTRACE("sendevents");
1565
1566 LOCK(&fctx->lock);
1567
Arаm Sаrgsyаn [Mon, 26 Aug 2024 15:50:50 +0000 (15:50 +0000)]
chg: usr: Exempt prefetches from the fetches-per-zone and fetches-per-server quotas
Fetches generated automatically as a result of 'prefetch' are now
exempt from the 'fetches-per-zone' and 'fetches-per-server' quotas.
This should help in maintaining the cache from which query responses
can be given.
Closes #4219
Merge branch '4219-exempt-good-queries-from-fetch-limits' into 'main'
Aram Sargsyan [Fri, 7 Jun 2024 16:24:00 +0000 (16:24 +0000)]
Exempt prefetches from the fetches-per-server quota
Give prefetches a free pass through the quota so that the cache
entries for popular zones could be updated successfully even if the
quota for is already reached.
Aram Sargsyan [Fri, 7 Jun 2024 16:19:40 +0000 (16:19 +0000)]
Exempt prefetches from the fetches-per-zone quota
Give prefetches a free pass through the quota so that the cache entry
for a popular zone could be updated successfully even if the quota for
it is already reached.
Ondřej Surý [Mon, 26 Aug 2024 15:01:03 +0000 (15:01 +0000)]
fix: dev: Stop using malloc_usable_size and malloc_size
The `malloc_usable_size()` can return size larger than originally allocated and when these sizes disagree the fortifier enabled by `_FORTIFY_SOURCE=3` detects overflow and stops the `named` execution abruptly. Stop using these convenience functions as they are primary used for introspection-only.
Closes #4880
Merge branch '4880-dont-use-malloc_usable_size' into 'main'
Ondřej Surý [Fri, 23 Aug 2024 04:02:00 +0000 (06:02 +0200)]
Stop using malloc_usable_size and malloc_size
Although the nanual page of malloc_usable_size says:
Although the excess bytes can be over‐written by the application
without ill effects, this is not good programming practice: the
number of excess bytes in an allocation depends on the underlying
implementation.
it looks like the premise is broken with _FORTIFY_SOURCE=3 on newer
systems and it might return a value that causes program to stop with
"buffer overflow" detected from the _FORTIFY_SOURCE. As we do have own
implementation that tracks the allocation size that we can use to track
the allocation size, we can stop relying on this introspection function.
Also the newer manual page for malloc_usable_size changed the NOTES to:
The value returned by malloc_usable_size() may be greater than the
requested size of the allocation because of various internal
implementation details, none of which the programmer should rely on.
This function is intended to only be used for diagnostics and
statistics; writing to the excess memory without first calling
realloc(3) to resize the allocation is not supported. The returned
value is only valid at the time of the call.
Remove usage of both malloc_usable_size() and malloc_size() to be on the
safe size and only use the internal size tracking mechanism when
jemalloc is not available.
Michal Nowak [Mon, 26 Aug 2024 14:28:47 +0000 (14:28 +0000)]
chg: ci: Drop removed system tests from cross-version-config-tests
The cross-version-config-tests job fails when a system test is removed
from the upcoming release. To avoid this, remove the system test also
from the $BIND_BASELINE_VERSION.
See the failure mode at https://gitlab.isc.org/isc-projects/bind9/-/jobs/4668947.
Merge branch 'mnowak/remove-dialup-from-cross-version-config-tests-job' into 'main'
Michal Nowak [Mon, 26 Aug 2024 11:41:47 +0000 (13:41 +0200)]
Drop removed system tests from $BIND_BASELINE_VERSION
The cross-version-config-tests job fails when a system test is removed
from the upcoming release. To avoid this, remove the system test also
from the $BIND_BASELINE_VERSION.
James Addison [Sun, 25 Feb 2024 21:10:36 +0000 (21:10 +0000)]
Preserve de-duplicated tag order in documentation
The 'set' datatype in Python does not provide iteration-order
guarantees related to insertion-order. That means that its
usage in the 'split_csv' helper function during documentation
build can produce nondeterministic results.
That is non-desirable for two reasons: it means that the
documentation output may appear to vary unnecessarily between
builds, and secondly there could be loss-of-information in cases
where tag order in the source documentation is significant.
This patch implements order-preserving de-duplication of tags,
allowing authors to specify tags using intentional priority
ordering, while also removing tags that appear more than once.
Petr Špaček [Mon, 5 Aug 2024 08:48:34 +0000 (10:48 +0200)]
Automatically adjust MR metadata after merge
1. Set milestone to 'Not released yet' after merge
We will set milestone to actual version number when we actually tag a
particular version. This will get rid of mass MR reassignment when we
do last minute changes to a release plan etc.
2. Adjust No CHANGES and Release Notes MR labels to match gitchangelog
workflow.
Petr Špaček [Mon, 5 Aug 2024 08:21:46 +0000 (10:21 +0200)]
Mark backports CI job as non-interruptible
Previously CI job for the autobackport bot inherited "interruptible:
true" global configuration. This caused premature termination of the job
when another merge was finished before the autobackport job ran to
completion.
Arаm Sаrgsyаn [Thu, 22 Aug 2024 15:33:17 +0000 (15:33 +0000)]
new: usr: implement the 'request-ixfr-max-diffs' configuration option
The new 'request-ixfr-max-diffs' configuration option sets the
maximum number of incoming incremental zone transfer (IXFR) differences,
exceeding which triggers a full zone transfer (AXFR).
Closes #4389
Merge branch '4389-request-ixfr-max-diffs' into 'main'
Aram Sargsyan [Fri, 7 Jun 2024 14:49:59 +0000 (14:49 +0000)]
Test the 'request-ixfr-max-diffs' configuration option
Configure a maximum of 3 allowed differences and add 5 new records.
Check that named detected that the differences exceed the allowed
limit and successfully retries with AXFR.
Aram Sargsyan [Fri, 7 Jun 2024 14:47:55 +0000 (14:47 +0000)]
Implement the 'request-ixfr-max-diffs' configuration option
This limits the maximum number of received incremental zone
transfer differences for a secondary server. Upon reaching the
confgiured limit, the secondary aborts IXFR and initiates a full
zone transfer (AXFR).
Mark Andrews [Thu, 22 Aug 2024 12:55:46 +0000 (12:55 +0000)]
new: usr: Support restricted key tag range when generating new keys
It is useful when multiple signers are being used
to sign a zone to able to specify a restricted
range of range of key tags that will be used by an
operator to sign the zone. This adds controls to
named (dnssec-policy), dnssec-signzone, dnssec-keyfromlabel and
dnssec-ksr (dnssec-policy) to specify such ranges.
Closes #4830
Merge branch '4830-support-restricted-key-tag-range-when-generating-new-keys' into 'main'