Since the changes aren't tracked in the single changelog.rst file,
generate the changelog to stdout instead, so it can be easily redirected
to the proper file.
Use libuv functions to get memory available to BIND 9
This change uses uv_get_total_memory() to get the memory available to
BIND 9 with possible modification by uv_get_constrained_memory() if the
libuv version is recent enough to honour constraints created by
f.e. cgroups.
chg: ci: Increase the load TCP/DoT shotgun perf tests
Due to the recent improvements to the TCP processing, much higher loads
can be handled by BIND9 without causing client timeouts. The updated
parameters give us useful data for both cold and hot cache testing.
Merge branch 'nicki/increase-tcp-dot-shotgun-load' into 'main'
Due to the recent improvements to the TCP processing, much higher loads
can be handled by BIND9 without causing client timeouts. The updated
parameters give us useful data for both cold and hot cache testing.
Michal Nowak [Mon, 23 Sep 2024 15:39:16 +0000 (15:39 +0000)]
chg: test: Downgrade "timeout" and "attempts" arguments in shutdown
The shutdown system test sends queries when named is shutting down, not
in an attempt to get answers but to destabilize the server into a crash.
With isctest.query.udp() defaulting to try up to ten times with a
ten-second timeout to get a response we don't care about from a likely
terminated server, we make the test run much longer than needed because
of retries and long timeouts.
Also, see isc-projects/bind9#4943.
Merge branch 'mnowak/shutdown-downgrade-timeout-and-attempts-arguments' into 'main'
Michal Nowak [Mon, 16 Sep 2024 12:55:06 +0000 (14:55 +0200)]
Downgrade "timeout" and "attempts" arguments in shutdown
The shutdown system test sends queries when named is shutting down, not
in an attempt to get answers but to destabilize the server into a crash.
With isctest.query.udp() defaulting to try up to ten times with a
ten-second timeout to get a response we don't care about from a likely
terminated server, we make the test run much longer than needed because
of retries and long timeouts.
chg: dev: Use uv_available_parallelism() if available
Instead of cooking up our own code for getting the number of available
CPUs for named to use, make use of uv_available_parallelism() from
libuv >= 1.44.0.
Merge branch 'ondrej/use-uv_available_parallelism-if-available' into 'main'
Add support to read number of online CPUs on OpenBSD
The OpenBSD doesn't have sysctlbyname(), but sysctl() can be used to
read the number of online/available CPUs by reading following MIB(s):
[CTL_HW, HW_NCPUONLINE] with fallback to [CTL_HW, HW_NCPU].
Cleanup the sysctlbyname and friends configure checks and ifdefs
Cleanup various checks and cleanups that are available on the all
platforms like sysctlbyname() and various related <sys/*.h> headers
that are either defined in POSIX or available on Linux and all BSDs.
Instead of cooking up our own code for getting the number of available
CPUs for named to use, make use of uv_available_parallelism() from
libuv >= 1.44.0.
Incoming transfers that took longer than 30 seconds would stop reading from the TCP stream and the incoming transfer would be indefinitely stuck causing BIND 9 to hang during shutdown.
This has been fixed and the `max-transfer-time-in` and `max-transfer-idle-in` timeouts are now honoured.
Closes #4949
Merge branch '4949-fix-ignored-and-invalid-dispatch-timeout-in-dns_xfrin' into 'main'
Don't enable timeouts in dns_dispatch for incoming transfers
The dns_dispatch_add() call in the dns_xfrin unit had hardcoded 30
second limit. This meant that any incoming transfer would be stopped in
it didn't finish within 30 seconds limit. Additionally, dns_xfrin
callback was ignoring the return value from dns_dispatch_getnext() when
restarting the reading from the TCP stream; this could cause transfers
to get stuck waiting for a callback that would never come due to the
dns_dispatch having already been shut down.
Call the dns_dispatch_add() without a timeout and properly handle the
result code from the dns_dispatch_getnext().
The dns_dispatch_add() has timeout parameter that could not be 0 (for
not timeout). Modify the dns_dispatch implementation to accept a zero
timeout for cases where the timeouts are undesirable because they are
managed externally.
chg: dev: Restore the number of threadpool threads back to original value
The issue of long-running operations potentially blocking query resolution has been fixed. Revert this temporary workaround and restore the number of threadpool threads.
Related #4898
Merge branch '4898-remove-workaround-and-note' into 'main'
Petr Menšík [Wed, 6 Oct 2021 11:53:33 +0000 (13:53 +0200)]
Move common flags logging to shared functions
Query and response log shares the same flags. Move flags logging out of
log_query to share it with log_response. Use buffer instead of snprintf
to fill flags a bit faster.
Petr Menšík [Tue, 13 Jul 2021 18:12:11 +0000 (20:12 +0200)]
Make responselog flags similar to querylog
Remove answer flag from log, log instead count of records for each
message section. Include EDNS version and few flags of response. Add
also status of result.
when the QP cache was adapted from the RBT database, some names
weren't changed. this could be confusing, so let's change them now.
also, we no longer need to include rbt.h.
rem: usr: Remove DNSRPS implementation from the open-source version
DNSRPS was the API for a commercial implementation of Response-Policy
Zones that was supposedly better. However, it was never open-sourced
and has only ever been available from a single vendor. This goes against
the principle that the open-source edition of BIND 9 should contain only
features that are generally available and universal.
This commit removes the DNSRPS implementation from BIND 9. It may be
reinstated in the subscription edition if there's enough interest from
customers, but it would have to be rewritten as a plugin (hook) instead
of hard-wiring it again in so many places.
Merge branch 'ondrej/remove-DNSRPS-from-open-source-edition' into 'main'
Ondřej Surý [Mon, 19 Aug 2024 15:19:21 +0000 (17:19 +0200)]
Remove DNSRPS implementation
DNSRPS was the API for a commercial implementation of Response-Policy
Zones that was supposedly better. However, it was never open-sourced
and has only ever been available from a single vendor. This goes against
the principle that the open-source edition of BIND 9 should contain only
features that are generally available and universal.
This commit removes the DNSRPS implementation from BIND 9. It may be
reinstated in the subscription edition if there's enough interest from
customers, but it would have to be rewritten as a plugin (hook) instead
of hard-wiring it again in so many places.
Evan Hunt [Thu, 22 Aug 2024 23:01:50 +0000 (16:01 -0700)]
use uv_dlopen() instead of dlopen() when linking DNSRPZ
take advantage of libuv's shared library handling capability
when linking to a DNSRPS library. (see b396f555861 and 37b9511ce1d
for prior related work.)
If the operating system UDP queue gets full and the outgoing UDP sending
starts to be delayed, BIND 9 could exhibit memory spikes as it tries to
enqueue all the outgoing UDP messages. Try a bit harder to deliver the
outgoing UDP messages synchronously and if that fails, drop the outgoing
DNS message that would get queued up and then timeout on the client side.
Closes #4930
Merge branch '4930-limit-the-UDP-send-queue' into 'main'
If the operating system UDP queue gets full and the outgoing UDP sending
starts to be delayed, BIND 9 could exhibit memory spikes as it tries to
enqueue all the outgoing UDP messages. As those are not going to be
delivered anyway (as we argued when we stopped enlarging the operating
system send and receive buffers), try to send the UDP messages directly
using `uv_udp_try_send()` and if that fails, drop the outgoing UDP
message.
We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej
shows that fixing the issue by setting affinities is worse than letting
the kernel schedule threads without constraints. So we should not set
SO_INCOMING_CPU anymore.
Closes #4936
Merge branch '4936-remove-so-incoming-cpu' into 'main'
We currently set SO_INCOMING_CPU incorrectly, and testing by Ondrej
shows that fixing the issue and setting affinities is worse than letting
the kernel schedule threads without constraints. So we should not set
SO_INCOMING_CPU anymore.
fix: usr: Fix a statistics channel counter bug when 'forward only' zones are used
When resolving a zone with a 'forward only' policy, and
finding out that all the forwarders are marked as "bad",
the 'ServerQuota' counter of the statistics channel was
incorrectly increased. This has been fixed.
Closes #1793
Merge branch '1793-serverquota-counter-bug-with-forward-only' into 'main'
Add a statistics channel check in the forward system test
Check that the fix in the previous commit works and that the
'ServerQuota' counter in the statistics channel is still unset
after a SERVFAIL result in a 'forward only' zone.
The 'all_spilled' local variable in resolver.c:fctx_getaddresses()
is 'true' by default, and only becomes false when there is at least
one successfully found NS address. However, when a 'forward only;'
configuration is used, the code jumps over the part where it looks
for NS addresses and doesn't reset the 'all_spilled' to false, which
results in incorretly increased 'serverquota' statistics variable,
and also in invalid return error code from the function. The result
code error didn't make any differences, because all codes other than
'ISC_R_SUCCESS' or 'DNS_R_WAIT' were treated in the same way, and
the result code was never logged anywhere.
Set the default value of 'all_spilled' to 'false', and only make it
'true' before actually starting to look up NS addresses.
Mark Andrews [Mon, 16 Sep 2024 02:49:11 +0000 (02:49 +0000)]
chg: dev: Remove statslock from dnssec-signzone
Silence Coverity CID 468757 and 468767 (DATA RACE read not locked) by converting dnssec-signzone to use atomics for statistics counters rather than using a lock.
Closes #4939
Merge branch '4939-remove-stats-lock-from-dnssec-signzone' into 'main'
Mark Andrews [Fri, 13 Sep 2024 03:30:34 +0000 (13:30 +1000)]
Remove 'statslock' from dnssec-signzone
Silence Coverity CID 468757 and 468767 (DATA RACE read not locked)
by converting dnssec-signzone to use atomics for statistics counters
rather than using a lock. This should be marginally faster than
using the lock as well when statistics are requested.
fix: usr: Separate DNSSEC validation from the long-running tasks
As part of the KeyTrap \[CVE-2023-50387\] mitigation, the DNSSEC CPU-intensive operations were offloaded to a separate threadpool that we use to run other tasks that could affect the networking latency.
If that threadpool is running some long-running tasks like RPZ, catalog zone processing, or zone file operations, it would delay DNSSEC validations to a point where the resolving signed DNS records would fail.
Split the CPU-intensive and long-running tasks into separate threadpools in a way that the long-running tasks don't block the CPU-intensive operations.
Closes #4898
Merge branch '4898-move-offloaded-DNSSEC-to-own-threads' into 'main'
Move offloaded DNSSEC operations to different helper threads
Currently, the isc_work API is overloaded. It runs both the
CPU-intensive operations like DNSSEC validations and long-term tasks
like RPZ processing, CATZ processing, zone file loading/dumping and few
others.
Under specific circumstances, when many large zones are being loaded, or
RPZ zones processed, this stops the CPU-intensive tasks and the DNSSEC
validation is practically stopped until the long-running tasks are
finished.
As this is undesireable, this commit moves the CPU-intensive operations
from the isc_work API to the isc_helper API that only runs fast memory
cleanups now.
Add isc_helper API that adds 1:1 thread for each loop
Add an extra thread that can be used to offload operations that would
affect latency, but are not long-running tasks; those are handled by
isc_work API.
Each isc_loop now has matching isc_helper thread that also built on top
of uv_loop. In fact, it matches most of the isc_loop functionality, but
only the `isc_helper_run()` asynchronous call is exposed.
fix: dev: Fix data race in offloaded dns_message_checksig()
When verifying a message in an offloaded thread there is a race with
the worker thread which writes to the same buffer. Clone the message
buffer before offloading.
Closes #4929
Merge branch '4929-data-race-in-dns_dnssec_verifymessage-memmove' into 'main'
When verifying a message in an offloaded thread there is a race with
the worker thread which writes to the same buffer. Clone the message
buffer before offloading.
rem: usr: Remove "port" from source address options
Remove the use of "port" when configuring query-source(-v6),
transfer-source(-v6), notify-source(-v6), parental-source(-v6),
etc. Remove the use of source ports for parental-agents.
Also remove the deprecated options use-{v4,v6}-udp-ports and
avoid-{v4,v6}udp-ports.
Closes #3843
Merge branch '3843-remove-deprecated-source-port-options' into 'main'
Remove the use of "port" when configuring query-source(-v6),
transfer-source(-v6), notify-source(-v6), parental-source(-v6),
etc. Remove the use of source ports for parental-agents.
Also remove the deprecated options use-{v4,v6}-udp-ports and
avoid-{v4,v6}udp-ports.
Mark Andrews [Thu, 12 Sep 2024 03:27:10 +0000 (03:27 +0000)]
fix: usr: Don't allow statistics-channel if libxml2 and libjson-c are unsupported
When the libxml2 and libjson-c libraries are not supported, the statistics channel can't return anything useful, so it is now disabled. Use of `statistics-channel` in `named.conf` is a fatal error.
Closes #4895
Merge branch '4895-link-style-sheet-to-libxml2-support' into 'main'
Mark Andrews [Wed, 11 Sep 2024 23:05:34 +0000 (23:05 +0000)]
fix: test: The statschannel tests fails if one of libxml2 or json-c is configured
The `statschannel` system test failed if only one of `libxml2` or `json-c` is
available / configured as checks were being run against the non available
statistics page.
Closes #4919
Merge branch '4919-fix-statschannel-system-test' into 'main'
chg: usr: allow IXFR-to-AXFR fallback on DNS_R_TOOMANYRECORDS
This change allows fallback from an IXFR failure to AXFR when the reason is `DNS_R_TOOMANYRECORDS`. This is because this error condition could be temporary only in an intermediate version of IXFR transactions and it's possible that the latest version of the zone doesn't have that condition. In such a case, the secondary would never be able to update the zone (even if it could) without this fallback.
This fallback behavior is particularly useful with the recently introduced `max-records-per-type` and `max-types-per-name` options: the primary may not have these limitations and may temporarily introduce "too many" records, breaking IXFR. If the primary side subsequently deletes these records, this fallback will help recover the zone transfer failure automatically; without it, the secondary side would first need to increase the limit, which requires more operational overhead and has its own adverse effect.
Closes #4928
Merge branch 'fallback-ixfr-to-axfr-on-toomanyrecords' into 'main'
JINMEI Tatuya [Fri, 16 Aug 2024 07:53:38 +0000 (16:53 +0900)]
allow IXFR-to-AXFR fallback on DNS_R_TOOMANYRECORDS
This change allows fallback from an IXFR failure to AXFR when the
reason is DNS_R_TOOMANYRECORDS. This is because this error condition
could be temporary only in an intermediate version of IXFR
transactions and it's possible that the latest version of the zone
doesn't have that condition. In such a case, the secondary would never
be able to update the zone (even if it could) without this fallback.
This fallback behavior is particularly useful with the recently
introduced max-records-per-type and max-types-per-name options:
the primary may not have these limitations and may temporarily
introduce "too many" records, breaking IXFR. If the primary side
subsequently deletes these records, this fallback will help recover
the zone transfer failure automatically; without it, the secondary
side would first need to increase the limit, which requires more
operational overhead and has its own adverse effect.
This change also fixes a minor glitch that DNS_R_TOOMANYRECORDS wasn't
logged in xfrin_fail.
The rcu_xchg_pointer() function can be used outside of a critical
section, and usually must be followed by a synchronize_rcu() or
call_rcu() call to detach from the resource, unless if there are
some guarantees in place because of our own reference counting.
Mark Andrews [Tue, 10 Sep 2024 00:08:51 +0000 (00:08 +0000)]
new: usr: Add flag to named-checkconf to ignore "not configured" errors
`named-checkconf` now takes "-n" to ignore "not configured" errors. This allows named-checkconf to check the syntax of configurations from other builds which have support for more options.
Merge branch '4913-add-option-to-named-checkconf-to-override-notconfigured-flag' into 'main'
Mark Andrews [Mon, 2 Sep 2024 06:03:17 +0000 (16:03 +1000)]
Add flag to named-checkconf to ignore "not configured" errors
named-checkconf now takes "-n" to ignore "not configured" errors.
This allows named-checkconf to check the syntax of configurations
from other builds which have support for more options.