Evan Hunt [Fri, 31 Jan 2025 04:45:07 +0000 (04:45 +0000)]
chg: dev: Refactor decref() in both QPDB
Clean up the pattern in the newref() and decref() functions in QP databases. Replace the `db_nodelock_t` structure with plain reference counting for every active database node in QPDB.
Related to #5134
Merge branch '5134-refactor-decref-functions-in-qpdb' into 'main'
Evan Hunt [Thu, 30 Jan 2025 22:42:57 +0000 (14:42 -0800)]
Clarify reference counting in QP databases
Change the names of the node reference counting functions
and add comments to make the mechanism easier to understand:
- newref() and decref() are now called qpcnode_acquire()/
qpznode_acquire() and qpcnode_release()/qpznode_release()
respectively; this reflects the fact that they modify both
the internal and external reference counters for a node.
- qpcnode_newref() and qpznode_newref() are now called
qpcnode_erefs_increment() and qpznode_erefs_increment(), and
qpcnode_decref() and qpznode_decref() are now called
qpcnode_erefs_decrement() and qpznode_erefs_decrement(),
to reflect that they only increase and decrease the node's
external reference counters, not internal.
Ondřej Surý [Mon, 27 Jan 2025 20:07:11 +0000 (21:07 +0100)]
Remove db_nodelock_t in favor of reference counted qpdb
This removes the db_nodelock_t structure and changes the node_locks
array to be composed only of isc_rwlock_t pointers. The .reference
member has been moved to qpdb->references in addition to
common.references that's external to dns_db API users. The .exiting
members has been completely removed as it has no use when the reference
counting is used correctly.
Ondřej Surý [Mon, 27 Jan 2025 17:13:38 +0000 (18:13 +0100)]
Remove origin_node from qpcache
The origin_node in qpcache was always NULL, so we can remove the
getoriginode() function and origin_node pointer as the
dns_db_getoriginnode() correctly returns ISC_R_NOTFOUND when the
function is not implemented.
Ondřej Surý [Mon, 27 Jan 2025 17:06:17 +0000 (18:06 +0100)]
Refactor decref() in both qpcache.c and qpzone.c
Cleanup the pattern in the decref() functions in both qpcache.c and
qpzone.c, so it follows the similar patter as we already have in
newref() function.
Colin Vidal [Thu, 30 Jan 2025 13:07:18 +0000 (13:07 +0000)]
fix: dev: DNSSEC EDE system tests on FIPS platform
Changes introducing the support of extended DNS error code 1 and 2 uses
SHA-1 digest for some tests which break FIPS platform. The digest itself
was irrelevant, another digest is used.
Colin Vidal [Mon, 27 Jan 2025 11:52:19 +0000 (12:52 +0100)]
fix DNSSEC EDE system tests on FIPS platform
Changes !9948 introducing the support of extended DNS error code 1 and 2
uses SHA-1 digest for some tests which break FIPS platform. The digest
itself was irrelevant, another digest is used.
Ondřej Surý [Thu, 30 Jan 2025 11:29:53 +0000 (11:29 +0000)]
fix: dev: Split and simplify the use of EDE list implementation
Instead of mixing the dns_resolver and dns_validator units directly with
the EDE code, split-out the dns_ede functionality into own separate
compilation unit and hide the implementation details behind abstraction.
Additionally, the new dns_edelist_t doesn't have to be copied into all
responses as those are attached to the fetch context, but it could be
only passed by reference.
This makes the dns_ede implementation simpler to use, although sligtly
more complicated on the inside.
Colin Vidal [Wed, 29 Jan 2025 22:27:34 +0000 (23:27 +0100)]
detect dup EDE with bitmap and store next pos
In order to avoid to loop to find the next position to store an EDE in
a dns_edectx_t, add a "nextede" state which holds the next available
position.
Also, in order ot avoid to loop to find if an EDE is already existing in
a dns_edectx_t, and avoid a duplicate, use a bitmap to immediately know
if the EDE is there or not.
Those both changes applies for adding or copying EDE.
Also make the direction of dns_ede_copy more explicit/avoid errors by
making "edectx_from" a const pointer.
Colin Vidal [Wed, 29 Jan 2025 17:34:51 +0000 (18:34 +0100)]
Refactor test covering dns_ede API
Migrate tests cases in client_test code which were exclusively testing
code which is now all wrapped inside ede compilation unit. Those are
testing maximum number of EDE, duplicate EDE as well as truncation of
text of an EDE.
Also add coverage for the copy of EDE from an edectx to another one, as
well as checking the assertion of the maximum EDE info code which can be
used.
Ondřej Surý [Wed, 29 Jan 2025 10:11:32 +0000 (11:11 +0100)]
Split and simplify the use of EDE list implementation
Instead of mixing the dns_resolver and dns_validator units directly with
the EDE code, split-out the dns_ede functionality into own separate
compilation unit and hide the implementation details behind abstraction.
Additionally, the EDE codes are directly copied into the ns_client
buffers by passing the EDE context to dns_resolver_createfetch().
This makes the dns_ede implementation simpler to use, although sligtly
more complicated on the inside.
Co-authored-by: Colin Vidal <colin@isc.org> Co-authored-by: Ondřej Surý <ondrej@isc.org>
Andoni Duarte [Thu, 30 Jan 2025 10:02:18 +0000 (10:02 +0000)]
fix: ci: remove allow failure in cross version config tests
From https://gitlab.isc.org/isc-projects/bind9/-/issues/5087, the relevant MRs have been merged in the January 2025 release. Hence this MR removes `allow_failure: true` in CI.
Merge branch 'andoni/remove-allow-failure-in-cross-version-config-tests' into 'main'
Nicki Křížek [Wed, 29 Jan 2025 14:11:13 +0000 (14:11 +0000)]
chg: ci: Use make clean to reduce artifacts in successful jobs
Reduce the amount of artifacts stored by running make clean at the end
of unit and system test run. If any of the previous commands fail, the
runner will stop executing the commands in `script` immediately, so the
cleanup only happens if none of the previous commands failed.
The build artifacts from unit and system tests are re-used anywhere and
should be safe to throw away immediately.
Merge branch 'nicki/reduce-ci-artifacts' into 'main'
Nicki Křížek [Tue, 28 Jan 2025 14:23:01 +0000 (15:23 +0100)]
Use make clean to reduce artifacts in successful jobs
Reduce the amount of artifacts stored by running make clean at the end
of unit and system test run. If any of the previous commands fail, the
runner will stop executing the commands in `script` immediately, so the
cleanup only happens if none of the previous commands failed.
The build artifacts from unit and system tests are re-used anywhere and
should be safe to throw away immediately. Same for respdiff.
Nicki Křížek [Tue, 28 Jan 2025 13:35:07 +0000 (13:35 +0000)]
fix: ci: Run merged-metadata job for release branches in private repo
The prior regex didn't match the actual names we use for release
branches in the private repo. This caused the merged-metadata job to not
be created upon merging to a release branch, resulting in the private MR
not being properly milestoned.
Use the correct regex along with protecting the v9.*-release branches in
the gitlab UI so that they have access to the token used to perform the
required API operations.
Merge branch 'nicki/ci-fix-post-merge-in-private-repo' into 'main'
Nicki Křížek [Mon, 27 Jan 2025 14:30:39 +0000 (15:30 +0100)]
Run merged-metadata job for release branches in private repo
The prior regex didn't match the actual names we use for release
branches in the private repo. This caused the merged-metadata job to not
be created upon merging to a release branch, resulting in the private MR
not being properly milestoned.
Use the correct regex along with protecting the v9.*-release branches in
the gitlab UI so that they have access to the token used to perform the
required API operations.
Things to consider:
- FreeBSD DoH jobs are not added because Flamethrower queries always timeout.
- This adds 15 more CI jobs:
- Linux (AWS autoscaler): `(auth + recursive + RPZ) * (DoH + DoT) * (amd64 + arm64) = 12`
- FreeBSD (one FreeBSD runner): `(auth + recursive + RPZ) * (DoT) * (amd64) = 3`
- Autoscaler is not yet present on FreeBSD. Adding 3 CI jobs (i.e., DoT) run serially adds 3 hours to the pipeline runtime. Should we add just one FreeBSD DoT job to limit the runtime?
- DoH/DoT performance is slightly lower than pure TCP, so the threshold for the test to pass must be lowered by 5-10% (see isc-private/bind-qa!40).
Merge branch 'mnowak/stress-test-with-doh-dot' into 'main'
Michal Nowak [Mon, 19 Feb 2024 14:55:00 +0000 (15:55 +0100)]
Add DoH and DoT stress tests, generate test configurations
Add DoH and DoT stress test jobs. The DoH scenario on FreeBSD is omitted
because all Flamethrower's DoH queries timeout on this platform.
Since the response rate of DoT queries is lower than that of DoH and
TCP, the expected TCP response rate is 80%.
Due to the large number of similar stress test configurations, the
"util/generate-stress-test-configs.py" script now generates them as part
of a downstream pipeline. The script is expected to be run exclusively
within the CI environment, which sources all environmental variables and
files.
This refactoring brought the following changes:
- To start a stress test immediately and not wait for artifacts of the
autoreconf job, run the "autoreconf -fi" command as part of every job.
- Drop the BIND_STRESS_TEST_* variables as they were rarely used and
conflicted with mode and platform selection in the configuration
generator.
- Most pipelines now include a few short, randomly selected stress test
jobs. To schedule all stress tests, set the ALL_BIND_STRESS_TESTS
environmental variable, push a tag to CI, or run a scheduled pipeline.
- Set the BIND_STRESS_TESTS_RUN_TIME environmental variable to pick the
stress test runtime of your choosing, set the BIND_STRESS_TESTS_RATE
environmental variable to set different than the default query rate.
- Job timeout is set to 30 minutes plus stress test runtime in minutes.
Colin Vidal [Mon, 27 Jan 2025 11:48:49 +0000 (11:48 +0000)]
fix: dev: fix EDE 22 time out detection
Extended DNS error 22 (No reachable authority) was previously detected when `fctx_expired` fired. It turns out this function is used as a "safety net" and the timeout detection should be caught earlier.
It was working though, because of another issue fixed by !9927. But then, the recursive request timed out detection occurs before `fctx_expired` making impossible to raise the EDE 22 error.
This fixes the problem by triggering the EDE 22 in the part of the code detecting the (TCP or UDP) time out and taking the decision to cancel the whole fetch (i.e. There is no other server to attempt to contact).
Note this is not targeting users (no release note) because there is no release versions of BIND between !9927 and this changes. Thus a release note would be confusing.
Colin Vidal [Fri, 24 Jan 2025 16:27:09 +0000 (17:27 +0100)]
fix byte order in EDE logging
When an EDE code is added to a message, the code is converted early in a
big-endian order so it can be memcpy-ed directly in the EDE buffer that
will go on the wire.
This previous change forget to update debug logs which still assume the
EDE code was in host byte order. Add a separate variable to
differentiate both and avoid ambiguities
Colin Vidal [Fri, 24 Jan 2025 10:23:43 +0000 (11:23 +0100)]
update serve-stale test to support EDE 22
When EDE 3 (stale answer) was added the serve-stale tests were checking
for those exclusively, i.e. grepping for no "EDE" in the dig output when
no stale answer was expected.
However, some stale tests disable stale answers and make the
authoritative server unresponsive, effectively triggering a timed out
request thus an EDE 22. Update those tests so they still tests the
absence of EDE 3 error, but also the presence of EDE 22.
Colin Vidal [Thu, 23 Jan 2025 15:43:53 +0000 (16:43 +0100)]
add new EDE 22 system tests
This re-do a previously existing EDE 22 system test as well as add
another one making sure the timed out flow detection works also on UDP
when the resolver is contacting the authoritative server. (the existing
test was using TCP to contact the authoritative servers).
Colin Vidal [Thu, 23 Jan 2025 15:38:35 +0000 (16:38 +0100)]
fix EDE 22 time out detection
Extended DNS error 22 (No reachable authority) was previously detected
when `fctx_expired` fired. It turns out this function is used as a
"safety net" and the timeout detection should be caught earlier.
It was working though, because of another issue fixed by !9927. Since
this change, the recursive request timed out detection occurs before
`fctx_expired` so EDE 22 is not added to the response message anymore.
The fix of the problem is to add the EDE 22 code in two situations:
- When the dispatch code timed out (rctx_timedout) the resolver code
checks various properties to figure out if it needs to make another
fetch attempt. One of the paramters if the fetch expiration time. If
it expires, the whole recursion is canceled, so it now adds the EDE 22
code.
- If the fetch expiration time doesn't expires in the case above (and
other parameters allows it) a new fetch attempt is made (fctx_query).
But before the new request is actually made, the fetch expiration time
is re-checked. It might then has elapsed, and the whole recursion is
canceled. So it now also adds the EDE 22 code here as well.
Aydın Mercan [Sat, 25 Jan 2025 12:53:38 +0000 (12:53 +0000)]
new: usr: add a rndc command to toggle jemalloc profiling
The new command is `rndc memprof`. The memory profiling status is also
reported inside `rndc status`. The status also shows whether named can
toggle memory profiling or not and if the server is built with jemalloc.
Closes #4759
Merge branch '4759-add-a-trigger-to-dump-jeprof-data-or-memory-statistics' into 'main'
Aydın Mercan [Mon, 12 Aug 2024 15:17:05 +0000 (18:17 +0300)]
add a rndc command to toggle jemalloc profiling
The new command is `rndc memprof`. The memory profiling status is also
reported inside `rndc status`. The status also shows whether named can
toggle memory profiling or not and if the server is built with jemalloc.
Nicki Křížek [Fri, 24 Jan 2025 18:10:32 +0000 (18:10 +0000)]
chg: ci: Ensure changelog job builds docs with the new entry
The changelog job is supposed to test that the text from GitLab MR
title&description is valid rst syntax and can be built with sphinx. In 49128fc1, the way gitchangelog generates entries was changed - it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated notes is actually written to (some)
rendered file which is part of the docs so that the subsequent sphinx
build attempts to render the note.
Merge branch 'nicki/ci-fix-changelog-job' into 'main'
Nicki Křížek [Mon, 2 Dec 2024 14:31:53 +0000 (15:31 +0100)]
Ensure changelog job builds docs with the new entry
The changelog job is supposed to test that the text from GitLab MR
title&description is valid rst syntax and can be built with sphinx. In 49128fc1, the way gitchangelog generates entries was changed - it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated notes is actually written to (some)
rendered file which is part of the docs so that the subsequent sphinx
build attempts to render the note.
Colin Vidal [Mon, 20 Jan 2025 19:59:23 +0000 (20:59 +0100)]
add DNSSEC EDE test for unsupported digest and alg
A DNSSEC validation can fail in the case where multiple DNSKEY are
available for a zone and none of them are supported, but for different
reasons: one has a DS record in the parent zone using an unsupported
digest while the other one uses an unsupported encryption algorithm.
Add a specific test case covering this flow and making sure that two
extended DNS error are provided: code 1 and 2, each of them highlighting
unsupported algorithm and digest.
Colin Vidal [Mon, 13 Jan 2025 13:50:01 +0000 (14:50 +0100)]
add support for EDE code 1 and 2
Add support for EDE codes 1 (Unsupported DNSKEY Algorithm) and 2
(Unsupported DS Digest Type) which might occurs during DNSSEC
validation in case of unsupported DNSKEY algorithm or DS digest type.
Because DNSSEC internally kicks off various fetches, we need to copy
all encountered extended errors from fetch responses to the fetch
context. Upon an event, the errors from the fetch context are copied
to the client response.
Nicki Křížek [Wed, 28 Aug 2024 13:00:02 +0000 (15:00 +0200)]
Make servers fixture in pytest module-wide
The servers are setup and torn down once per each test module. All the
logs and server state persists between individual tests within the same
module. The servers fixture representing these servers should be
module-wide as well.
Michal Nowak [Thu, 10 Oct 2024 17:46:22 +0000 (19:46 +0200)]
Use Debian "sid" for pylint and mypy jobs to get recent dnspython
The base image tends to have a rather old dnspython version and when
used with pylint and mypy it produces errors about newer dnspython
features the old version does not know about.
$ mypy "bin/tests/system/isctest/"
bin/tests/system/isctest/query.py:55: error: Unexpected keyword argument "verify" for "tls" [call-arg]
/usr/lib/python3/dist-packages/dns/query.py:958: note: "tls" defined here
Michal Nowak [Wed, 17 Jan 2024 19:47:42 +0000 (20:47 +0100)]
Add isctest.query.tls() function
When explicitly set to True, the "verify" argument lets dnspython verify
certificates used for the connection. As most certificates in the system
test will inevitably be self-signed, the "verify" argument defaults to
False.
The "verify" argument is present in dnspython since the version 2.5.0.
Evan Hunt [Fri, 24 Jan 2025 00:58:06 +0000 (00:58 +0000)]
rem: dev: Clean up unused result codes
A number of result codes are obsolete and can be removed. Others, including `ISC_R_NOMEMORY`, are still checked in various places even though they can't occur any longer. These have been cleaned up.
Evan Hunt [Thu, 9 Jan 2025 04:13:34 +0000 (20:13 -0800)]
deduplicate result codes
ISCCC_R_SYNTAX, ISCCC_R_EXPIRED, and ISCCC_R_CLOCKSKEW have the
same usage and text formats as DNS_R_SYNTAX, DNS_R_EXPIRED and
DNS_R_CLOCKSCREW respectively. this was originally done because
result codes were defined in separate libraries, and some tool
might be linked with libisccc but not libdns. as the result codes
are now defined in only one place, there's no need to retain the
duplicates.
Evan Hunt [Wed, 8 Jan 2025 21:50:53 +0000 (13:50 -0800)]
clean up uses of DST_R_NOCRYPTO
building BIND without crypto support is no longer possible.
consequently this result code is never sent, and therefore we
don't need code in calling functions to handle it.
Evan Hunt [Wed, 8 Jan 2025 03:03:07 +0000 (19:03 -0800)]
clean up uses of ISC_R_NOMEMORY
the isc_mem allocation functions can no longer fail; as a result,
ISC_R_NOMEMORY is now rarely used: only when an external library
such as libjson-c or libfstrm could return NULL. (even in
these cases, arguably we should assert rather than returning
ISC_R_NOMEMORY.)
code and comments that mentioned ISC_R_NOMEMORY have been
cleaned up, and the following functions have been changed to
type void, since (in most cases) the only value they could
return was ISC_R_SUCCESS:
(the exception is dns_view_create(), which could have returned
other error codes in the event of a crypto library failure when
calling isc_file_sanitize(), but that should be a RUNTIME_CHECK
anyway.)
Nicki Křížek [Thu, 23 Jan 2025 17:26:40 +0000 (17:26 +0000)]
chg: ci: Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.
The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with ocassional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.
The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.
Exceeding either of those values should be quite clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
Merge branch 'nicki/ci-respdiff-limits' into 'main'
Nicki Křížek [Mon, 13 Jan 2025 13:29:24 +0000 (14:29 +0100)]
Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.
The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with ocassional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.
The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.
Exceeding either of those values should be quite clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
Matthijs Mekking [Thu, 23 Jan 2025 11:12:33 +0000 (11:12 +0000)]
fix: doc: Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.
Add text to clarify that the interval is calculated from the new
validity period.
Closes #5128
Merge branch '5128-clarify-dnssec-signzone-interval' into 'main'
Matthijs Mekking [Wed, 15 Jan 2025 12:47:48 +0000 (13:47 +0100)]
Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.
Add text to clarify that the interval is calculated from the new
validity period.
Matthijs Mekking [Thu, 23 Jan 2025 10:36:15 +0000 (10:36 +0000)]
fix: usr: Fix a bug in dnssec-signzone related to keys being offline
In the case when `dnssec-signzone` is called on an already signed zone, and the private key file is unavailable, a signature that needs to be refreshed may be dropped without being able to generate a replacement. This has been fixed.
Closes #5126
Merge branch '5126-dnssec-signzone-retain-rrsig-if-key-is-offline' into 'main'
Matthijs Mekking [Tue, 14 Jan 2025 14:18:40 +0000 (15:18 +0100)]
dnssec-signzone retain signature if key is offline
Track inside the dns_dnsseckey structure whether we have seen the
private key, or if this key only has a public key file.
If the key only has a public key file, or a DNSKEY reference in the
zone, mark the key 'pubkey'. In dnssec-signzone, if the key only
has a public key available, consider the key to be offline. Any
signatures that should be refreshed for which the key is not available,
retain the signature.
So in the code, 'expired' becomes 'refresh', and the new 'expired'
is only used to determine whether we need to keep the signature if
the corresponding key is not available (retaining the signature if
it is not expired).
In the 'keysthatsigned' function, we can remove:
- key->force_publish = false;
- key->force_sign = false;
because they are redundant ('dns_dnsseckey_create' already sets these
values to false).
Matthijs Mekking [Tue, 14 Jan 2025 13:10:20 +0000 (14:10 +0100)]
Test dnssec-signzone with private key file missing
Add a test case for the scenario below.
There is a case when signing a zone with dnssec-signzone where the
private key file is moved outside the key directory (for offline
ksk purposes), and then the zone is resigned. The signature of the
DNSKEY needs refreshing, but is not expired.
Rather than removing the signature without having a valid replacement,
leave the signature in the zone (despite it needs to be refreshed).
Colin Vidal [Wed, 22 Jan 2025 21:32:28 +0000 (21:32 +0000)]
new: usr: Add support for multiple extended DNS errors
Extended DNS error mechanism (EDE) may have several errors raised during a DNS resolution. `named` is now able to add up to three EDE codes in a DNS response. In the case of duplicate error codes, only the first one will be part of the DNS response.
Colin Vidal [Tue, 14 Jan 2025 16:13:42 +0000 (17:13 +0100)]
add support for multiple EDE
Extended DNS error mechanism (EDE) enables to have several EDE raised
during a DNS resolution (typically, a DNSSEC query will do multiple
fetches which each of them can have an error). Add support to up to 3
EDE errors in an DNS response. If duplicates occur (two EDEs with the
same code, the extra text is not compared), only the first one will be
part of the DNS answer.
Because the maximum number of EDE is statically fixed, `ns_client_t`
object own a static vector of `DNS_DE_MAX_ERRORS` (instead of a linked
list, for instance). The array can be fully filled (all slots point to
an allocated `dns_ednsopt_t` object) or partially filled (or
empty). In such case, the first NULL slot means there is no more EDE
objects.
Arаm Sаrgsyаn [Wed, 22 Jan 2025 13:41:25 +0000 (13:41 +0000)]
chg: dev: Use a suitable response in tcp_connected() when initiating a read
When 'ISC_R_TIMEDOUT' is received in 'tcp_recv()', it times out the
oldest response in the active responses queue, and only after that it
checks whether other active responses have also timed out. So when
setting a timeout value for a read operation after a successful
connection, it makes sense to take the timeout value from the oldest
response in the active queue too, because, theoretically, the responses
can have different timeout values, e.g. when the TCP dispatch is shared.
Currently 'resp' is always NULL. Previously when connect and read timeouts
were not separated in dispatch this affected only logging, but now since
we are setting a new timeout after a successful connection, we need to
choose a suitable response from the active queue.
Merge branch 'aram/dispatch-tcp_connected-fix' into 'main'
Aram Sargsyan [Sat, 21 Dec 2024 09:10:20 +0000 (09:10 +0000)]
Clean up fctx->next_timeout
Since the support for non-zero values of stale-answer-client-timeout
was removed in bd7463914fe6375e3e9157f305c60d0172f2b312, 'next_timeout'
is unused. Clean it up.
Aram Sargsyan [Fri, 20 Dec 2024 10:39:26 +0000 (10:39 +0000)]
Adjust the resolver-query-timeout test
Since the read timeout now works, the resolver time outs from the
dispatch level instead of from the "hung fetch" timer, and so the
EDE value in 'fctx_expired()' is not being set. Remove the expected
EDE value from the test.
Aram Sargsyan [Fri, 20 Dec 2024 08:41:03 +0000 (08:41 +0000)]
Fix rtt calculation bug for TCP in the resolver
When TCP is used, 'fctx_query()' adds one second to the rtt
(round-trip time) value, but there's a bug when the decision
about using TCP is made already after the calculation. Move the
block of the code which looks up the peers list to decide
whether to use TCP into a place that's before the rtt calculation
is performed. This commit doesn't add or remove any code, it just
moves the code and adds a comment block.
Aram Sargsyan [Thu, 19 Dec 2024 14:22:54 +0000 (14:22 +0000)]
Use a suitable response in tcp_connected() when initiating a read
When 'ISC_R_TIMEDOUT' is received in 'tcp_recv()', it times out the
oldest response in the active responses queue, and only after that it
checks whether other active responses have also timed out. So when
setting a timeout value for a read operation after a successful
connection, it makes sense to take the timeout value from the oldest
response in the active queue too, because, theoretically, the responses
can have different timeout values, e.g. when the TCP dispatch is shared.
Currently 'resp' is always NULL. Previously when connect and read
timeouts were not separated in dispatch this affected only logging, but
now since we are setting a new timeout after a successful connection,
we need to choose a suitable response from the active queue.
Ondřej Surý [Wed, 22 Jan 2025 13:27:40 +0000 (13:27 +0000)]
fix: usr: Avoid unnecessary locking in the zone/cache database
Prevent lock contention among many worker threads referring to the same database node at the same time. This would improve zone and cache database performance for the heavily contended database nodes.
Closes #5130
Merge branch '5130-reduce-lock-contention-in-decrement-reference' into 'main'
JINMEI Tatuya [Sat, 18 Jan 2025 00:54:19 +0000 (16:54 -0800)]
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquires a node write lock if it
might need node cleanup in case the reference decrements to 0. In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change would prevent noticeable performance
drop like query timeout for such cases.
Ondřej Surý [Wed, 15 Jan 2025 12:02:20 +0000 (13:02 +0100)]
Shutdown the fetch context after canceling the last fetch
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
Ondřej Surý [Wed, 22 Jan 2025 13:14:40 +0000 (13:14 +0000)]
fix: usr: Apply the memory limit only to ADB database items
Resolver under heavy-load could exhaust the memory available for storing
the information in the Address Database (ADB) effectively evicting already
stored information in the ADB. The memory used to retrieve and provide
information from the ADB is now not a subject of the same memory limits
that are applied for storing the information in the Address Database.
Closes #5127
Merge branch '5127-change-ADB-memory-split' into 'main'
Ondřej Surý [Wed, 15 Jan 2025 09:36:33 +0000 (10:36 +0100)]
Remove memory limit on ADB finds and fetches
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks). This could lead to a situation
where the resolver-heavy load would force evict ADB objects from the
database to point where ADB is completely empty, leading to even more
resolver-heavy load.
Make the short lived ADB objects use the other memory context that we
already created for the hashmaps. This makes the ADB overmem condition
to not be triggered by the ongoing resolver fetches.
Arаm Sаrgsyаn [Wed, 22 Jan 2025 12:58:29 +0000 (12:58 +0000)]
chg: dev: Separate the connect and the read TCP timeouts in dispatch
The network manager layer has two different timers with their
own timeout values for TCP connections: connect timeout and read
timeout. Separate the connect and the read TCP timeouts in the
dispatch module too.
Closes #5009
Merge branch '5009-dispatch-separate-connect-and-read-timeouts' into 'main'
Aram Sargsyan [Wed, 30 Oct 2024 13:31:34 +0000 (13:31 +0000)]
Remove dispatch timeout INT16_MAX limitation
In some places there was a limitation of the maximum timeout
value of INT16_MAX, which is only about 32 seconds. Refactor
the code to remove the limitation.
Aram Sargsyan [Wed, 30 Oct 2024 09:23:33 +0000 (09:23 +0000)]
Separate the connect and the read timeouts in dispatch
The network manager layer has two different timers with their
own timeout values for TCP connections: connect timeout and read
timeout. Separate the connect and the read TCP timeouts in the
dispatch module too.
Colin Vidal [Thu, 9 Jan 2025 11:01:28 +0000 (12:01 +0100)]
remove validator link form fetchctx
struct fetchctx does have a list of pending validators as well as a
pointer to the HEAD validator. Remove the validator pointer to avoid
confusion, as there is no perticular reasons to have it directly
accessible outside of the list.