in QP keys, characters that are not common in DNS names are
encoded as two-octet sequences. this caused a glitch in iterator
positioning when some lookups failed.
consider the case where we're searching for "\009" (represented
in a QP key as {0x03, 0x0c}) and a branch exists for "\000"
(represented as {0x03, 0x03}). we match on the 0x03, and continue
to search down. at the point where we find we have no match,
we need to pop back up to the branch before the 0x03 - which may
be multiple levels up the stack - before we position the iterator.
there were some structure names used in qpcache.c and qpzone.c that
were too similar to each other and could be confusing when debugging.
they have been changed as follows:
in qcache.c:
- changed_t was unused, and has been removed
- search_t -> qpc_search_t
- qpdb_rdatasetiter_t -> qpc_rditer_t
- qpdb_dbiterator_t -> qpc_dbiter_t
when calling dns_qp_lookup() from qpcache, instead of passing
'foundname' so that a name would be constructed from the QP key,
we now just use the name field in the node data. this makes
dns_qp_lookup() run faster.
the same optimization has also been added to qpzone.
the documentation for dns_qp_lookup() has been updated to
discuss this performance consideration.
Evan Hunt [Thu, 14 Mar 2024 23:46:52 +0000 (16:46 -0700)]
include the nodenames when calculating memory to purge
when the cache is over memory, we purge from the LRU list until
we've freed the approximate amount of memory to be added. this
approximation could fail because the memory allocated for nodenames
wasn't being counted.
add a dns_name_size() function so we can look up the size of nodenames,
then add that to the purgesize calculation.
Evan Hunt [Wed, 13 Mar 2024 05:19:47 +0000 (22:19 -0700)]
simplify qpcache iterators
in a cache database, unlike zones, NSEC3 records are stored in
the main tree. it is not necessary to maintain a separate 'nsec3'
tree, nor to have code in the dbiterator implementation to traverse
from one tree to another.
(if we ever implement synth-from-dnssec using NSEC3 records, we'll
need to revert this change. in the meantime, simpler code is better.)
clean up unnecessary dbiterator code related to origin
the QP database doesn't support relative names as the RBTDB did, so
there's no need for a 'new_origin' flag or to handle `DNS_R_NEWORIGIN`
result codes.
Evan Hunt [Tue, 12 Mar 2024 08:05:07 +0000 (01:05 -0700)]
more cleanups in qpcache.c
- remove unneeded struct members and misleading comments.
- remove unused parameters for static functions.
- rename 'find_callback' to 'delegating' for consistency with qpzone;
the find callback mechanism is not used in QP databases.
- change dns_qpdata_t to qpcnode_t (QP cache node), and dns_qpdb_t to
qpcache_t, as these types are only accessed locally.
- also change qpdata_t in qpzone.c to qpznode_t (QP zone node), for
consistency.
- make the refcount declarations for qpcnode_t and qpznode_t static,
using the new ISC_REFCOUNT_STATIC macros.
Ondřej Surý [Tue, 26 Mar 2024 13:13:24 +0000 (14:13 +0100)]
Improve the reference counting checks in newref()
In qpcache (and rbtdb), there are some functions that acquire
neither the tree lock nor the node lock when calling newref().
In theory, this could lead to a race in which a new reference
is added to a node that was about to be deleted.
We now detect this condition by passing the current tree and node
lock status to newref(). If the node was previously unreferenced
and we don't hold at least one read lock, we will assert.
Michal Nowak [Thu, 4 Apr 2024 10:59:50 +0000 (12:59 +0200)]
Use FreeBSD autoscaler for "stress" tests
The FreeBSD autoscaler has been configured to utilize the new "instance"
GitLab Runner executor to spawn "stress" test CI jobs on AWS EC2
dynamically. A shared GitLab Runner named "freebsd-instance-autoscaler"
has been set up in GitLab CI/CD to communicate with EC2, provisioning VM
instances on demand based on a FreeBSD 13 AMI image. This image is the
same as the one previously used for FreeBSD "stress" tests before the
implementation of autoscaling (specifically, the
"freebsd13-amd64-bind9stress.aws.lab.isc.org" GitLab Runner in CI/CD).
Michał Kępień [Fri, 26 Apr 2024 16:43:07 +0000 (18:43 +0200)]
Update URLs and paths for the BIND 9 QA repository
Since the BIND 9 QA repository has been made public, adjust the relevant
URLs and paths used in .gitlab-ci.yml so that they work with the public
version of that repository.
Aydın Mercan [Thu, 15 Feb 2024 10:30:42 +0000 (13:30 +0300)]
Provide an early escape hatch for ns_client_transport_type
Because some tests don't have a legtimate handle, provide a temporary
return early that should be fixed and removed before squashing. This
short circuiting is still correct until DoQ/DoH3 support is introduced.
Aydın Mercan [Wed, 7 Feb 2024 08:35:59 +0000 (11:35 +0300)]
Add fallback to ns_client_get_type despite unreachable
GCC might fail to compile because it expects a return after UNREACHABLE.
It should ideally just work anyway since UNREACHABLE is either a
noreturn or UB (__builtin_unreachable / C23 unreachable).
Either way, it should be optimized almost always so the fallback is
free or basically free anyway when it isn't optimized out.
Ondřej Surý [Tue, 26 Mar 2024 00:23:19 +0000 (17:23 -0700)]
avoid a race in the qpzone getsigningtime() implementation
the previous commit introduced a possible race in getsigningtime()
where the rdataset header could change between being found on the
heap and being bound.
getsigningtime() now looks at the first element of the heap, gathers the
locknum, locks the respective lock, and retrieves the header from the
heap again. If the locknum has changed, it will rinse and repeat.
Theoretically, this could spin forever, but practically, it almost never
will as the heap changes on the zone are very rare.
we simplify matters further by changing the dns_db_getsigningtime()
API call. instead of passing back a bound rdataset, we pass back the
information the caller actually needed: the resigning time, owner name
and type of the rdataset that was first on the heap.
Evan Hunt [Wed, 13 Mar 2024 22:59:53 +0000 (15:59 -0700)]
simplify qpzone database by using only one heap for resigning
in RBTDB, the heap was used by zone databases for resigning, and
by the cache for TTL-based cache cleaning. the cache use case required
very frequent updates, so there was a separate heap for each of the
node lock buckets.
qpzone is for zones only, so it doesn't need to support the cache
use case; the heap will only be touched when the zone is updated or
incrementally signed. we can simplify the code by using only a single
heap.
simplify code by removing return values where possible
fix_iterator() and related functions are quite difficult to read.
perhaps it would be a little clearer if we didn't assign values
to variables that won't subsequently be used, or unnecessarily
pop the stack and then push the same value back onto it.
also, in dns_qp_lookup() we previously called fix_iterator(),
removed the leaf from the top of the iterator stack, and then
added it back on. this would be clearer if we just push the leaf
onto the stack when we need to, but leave the stack alone when
it's already complete.
under some circumstances it was possible for the iterator to
be set to the first leaf in a set of twigs, when it should have
been set to the last.
a unit test has been added to test this scenario. if there is a
a tree containing the following values: {".", "abb.", "abc."}, and
we query for "acb.", previously the iterator would be positioned at
"abb." instead of "abc.".
the tree structure is:
branch (offset 1, ".")
branch (offset 3, ".ab")
leaf (".abb")
leaf (".abc")
we find the branch with offset 3 (indicating that its twigs differ
from each other in the third position of the label, "abB" vs "abC").
but the search key differs from the found keys at position 2
("aC" vs "aB"). we look up the bit value in position 3 of the
search key ("B"), and incorrectly follow it onto the wrong twig
("abB").
to correct for this, we need to check for the case where the search
key is greater than the found key in a position earlier than the
branch offset. if it is, then we need to pop from the current leaf
to its parent, and get the greatest leaf from there.
a further change is needed to ensure that we don't do this twice;
when we've moved to a new leaf and the point of difference between
it and the search key even earlier than before, then we're definitely
at a predecessor node and there's no need to continue the loop.
The previous value of 30 minutes used to cache the ADB names and entries
was quite long. Change the value to 60 seconds for faster recovery
after cached intermittent failure of the remote nameservers.
Always set ADB entry expiration to now + ADB_ENTRY_WINDOW
When ADB entry was created it was set to never expire. If we never
called any of the functions that adjust the expiration, it could linger
in the ADB forever.
Set the expiration (.expires) to now + ADB_ENTRY_WINDOW when creating
the new ADB entry to ensure the ADB entry will always expire.
Michal Nowak [Fri, 19 Apr 2024 09:58:19 +0000 (11:58 +0200)]
Drop respdiff-short CI jobs
In the past, our CI infrastructure was more sensitive to the number of
CI jobs running on it. We tried to limit long-running jobs in merge
request-triggered pipelines, as there are many of them, and spawned them
only in daily scheduled ones. Moving most of the CI infrastructure to
AWS has made it way better to run jobs in parallel, and the existence of
short respdiff jobs has lost its original merit. It can also be harmful
as some problems are detected only by the longer respdiff variant when a
faulty merge request has already been merged. We should run all long
respdiff tests in merge request-triggered pipelines.
Also, move the former respdiff-long job (now just "respdiff") to AWS as
old instance memory constraints (see f09cf69594c6aab4d0c5608226424c566b833f3c) are no longer an issue.
We now have ctx.kskflag, ctx.zskflag, and ctx.revflag, but zskflag is
not quite like the other two, as it doesn't have a special bit in the
DNS packet, and is used as a boolean.
This patch changes so that we use booleans for all three, and
construct the flags based on which ones are set.
Matthijs Mekking [Tue, 15 Aug 2023 09:56:06 +0000 (11:56 +0200)]
Test dnssec-ksr request
Add test cases for the 'request' command. Reuse the earlier
pregenerated ZSKs. We also need to set up some KSK files, that can
be done with 'dnssec-keygen -k <policy> -fK' now.
The 'check_keys()' function is adjusted such that the expected active
time of the successor key is set to the inactive time of the
predecessor. Some additional information is saved to make 'request'
testing easier.
Matthijs Mekking [Tue, 15 Aug 2023 09:46:53 +0000 (11:46 +0200)]
Refactor dnssec-ksr keygen
Create some helper functions for code that is going to be reused by the
other commands (request, sign), such as setting and checking the context
parameters, and retrieving the dnssec-policy/kasp.
Matthijs Mekking [Tue, 15 Aug 2023 08:30:36 +0000 (10:30 +0200)]
dnssec-keygen: allow -f and -k together
The 'dnssec-keygen' tool now allows the options '-k <dnssec-policy>'
and '-f <flags>' together to create keys from a DNSSEC policy that only
match the given role. Allow setting '-fZ' to only create ZSKs, while
'-fK' will only create KSKs.
Add a system test for testing dnssec-ksr, initally for the keygen
command. This should be able to create or select key files given a
DNSSEC policy and a time window.
Introduce a new DNSSEC tool, dnssec-ksr, for creating signed key
response (SKR) files, given one or more key signing requests (KSRs).
For now it is just a dummy tool, but the future purpose of this utility
is to pregenerate ZSKs and signed RRsets for DNSKEY, CDNSKEY, and CDS
for a given period that a KSK is to be offline.
Rework isccc_ccmsg to support multiple messages per tcp read
Previously, only a single controlconf message would be processed from a
single TCP read even if the TCP read buffer contained multiple messages.
Refactor the isccc_ccmsg unit to store the extra buffer in the internal
buffer and use the already read data first before reading from the
network again.