Ondřej Surý [Thu, 12 Oct 2023 15:48:56 +0000 (17:48 +0200)]
Remove the logic that applies differences when over limit
The ixfr_putdata() and axfr_putdata() had a logic to apply dns_diff when
the number of pending tuples went over 100. Since we are going to
offload the XFR data processing, we don't need to do that anymore.
Ondřej Surý [Thu, 19 Oct 2023 09:39:53 +0000 (11:39 +0200)]
Disable OpenSSL memory contexts for OpenSSL < 3.0.0
OpenSSL 1.1 has already reached end-of-life and since we are
experiencing a weird memory leak in the mirror system test on just
Ubuntu 20.04 (Focal) with OpenSSL 1.1, we disable the legacy code for
enabling memory contexts for OpenSSL < 3.0.0 in this commit.
Aram Sargsyan [Thu, 19 Oct 2023 08:46:58 +0000 (08:46 +0000)]
Fix an error in the qp_test.c unit test
In order to check whether there are enough inserted values the
code uses the 'tests' variable (loop counter), which is unreliable,
because the loop sometimes removes an item instead of inserting
one (when the randomly generated item already exists).
Instead of the loop counter, use the existing variable 'inserted',
which should indicate the correct number of the inserted items.
Mark Andrews [Wed, 16 Aug 2023 04:40:12 +0000 (14:40 +1000)]
Adjust UDP timeouts used in zone maintenance
Drop timeout before resending a UDP request from 15 seconds to 5
seconds and add 1 second to the total time to allow for the reply
to the third request to arrive. This will speed up the time it
takes for named to recover from a lost packet when refreshing a
zone and for it to determine that a primary is down.
Matthijs Mekking [Thu, 12 Oct 2023 11:56:46 +0000 (13:56 +0200)]
Update addzone test
Now that inline-signing is ignored when there is no dnssec-policy,
add 'dnssec-policy default;' to the zones when attempting to add them
via 'rndc addzone'.
Matthijs Mekking [Thu, 12 Oct 2023 10:04:30 +0000 (12:04 +0200)]
Update inline-signing documentation
Add the missing documentation for 'dnssec-policy/inline-signing'.
Update the zone-only option 'inline-signing' to indicate that the
use of inline signing should be set in 'dnssec-policy' and that this
is merely a way to override the value for the given zone.
Matthijs Mekking [Thu, 12 Oct 2023 10:02:02 +0000 (12:02 +0200)]
Ignore inline-signing by default
Ignore the option 'inline-signing' unless there is a 'dnssec-policy'
configured for the zone. Having inline signing enabled while the zone
is not DNSSEC signed does not make sense.
If there is a 'dnssec-policy' the 'inline-signing' zone-only option
can be used to override the value for the given zone.
Matthijs Mekking [Mon, 16 Oct 2023 09:08:59 +0000 (11:08 +0200)]
Two minor fixes in the kasp system test
The 'dynamic-signed-inline-signing.kasp' zone was set up with
the environment variable 'ksktimes', but that should be 'csktimes'
which is set one line above. Since the values are currently the same
the behavior is identical, but of course it should use the correct
variable.
The 'step4.enable-dnssec.autosign' zone was set up twice. This is
unnecessary.
Matthijs Mekking [Fri, 13 Oct 2023 09:46:05 +0000 (11:46 +0200)]
Don't resign raw version of the zone
Update the function 'set_resigntime()' so that raw versions of
inline-signing zones are not scheduled to be resigned.
Also update the check in the same function for zone is dynamic, there
exists a function 'dns_zone_isdynamic()' that does a similar thing
and is more complete.
Also in 'zone_postload()' check whether the zone is not the raw
version of an inline-signing zone, preventing calculating the next
resign time.
Ondřej Surý [Fri, 13 Oct 2023 06:59:41 +0000 (08:59 +0200)]
Convert rwlock in dns_acl to RCU
The dns_aclenv_t contains two dns_acl_t - localhost and localnets that
can be swapped with a different ACLs as we configure BIND 9. Instead of
protecting those two pointers with heavyweight read-write lock, use RCU
mechanism to dereference and swap the pointers.
Petr Špaček [Tue, 10 Oct 2023 09:27:16 +0000 (11:27 +0200)]
Describe BIND threat model
Basically all local data is considered trusted, and proper ACLs and
limits need to be explicitly configured. We are also free to let
protocol non-compliant servers burn in flames.
Michał Kępień [Thu, 12 Oct 2023 12:24:42 +0000 (14:24 +0200)]
Remove PDF-related bits from the build system
Read the Docs is capable of building the PDF version of the BIND 9 ARM
using just the contents of the doc/arm/ directory - it does not need the
build system to facilitate that. Since the BIND 9 ARM is also built in
other formats when "make doc" is run, drop the parts of the build system
that enable building the PDF version as they pull in complexity without
bringing much added value in return. Update related files accordingly.
Ondřej Surý [Thu, 12 Oct 2023 07:20:42 +0000 (09:20 +0200)]
Use mul and div instead of bitshifts to calculate srtt
There was a microoptimization for smoothing srtt with bitshifts. Revert
the code to use * 98 / 100, it doesn't really make that difference on
modern CPUs, for comparison here:
muldiv:
imul eax, edi, 98
imul rax, rax, 1374389535
shr rax, 37
ret
shift:
mov eax, edi
sal eax, 9
sub eax, edi
shr eax, 9
ret
Ondřej Surý [Thu, 12 Oct 2023 07:17:40 +0000 (09:17 +0200)]
Skip the no-op code in adjustsrtt()
If factor == DNS_ADB_RTTADJAGE and addr->entry->lastage == now we would
load value into new_srtt and then immediatelly store it back which
triggers the synchronization between threads using .srtt values.
Replace some ADB entry locking with atomics to reduce ADB contention
Use atomics on couple of ADB entry members (.srtt, .flags, .expires, and
.lastage) to remove ADB entry locking from couple of hot spots. The
most prominent place is copy_namehook_lists() that gets called under ADB
name lock and if the namehook list is long it acquires-releases quite a
few ADB entry locks. Changing those ADB entry members to atomics
allowed us to new_adbaddrinfo() not require locked ADB entry and since
adbentry_overquota() already used atomics and handling lame information
was dropped in the previous commit, we could not make the
copy_namehook_lists() lockless.
The other hotspot is dns_adb_adjustsrtt() and dns_adb_agesrtt() that can
now use atomics because .srtt is already atomic_uint.
And the last place that could now use atomics is dns_adb_changeflags().
Keeping the information about lame server in the ADB was done in !322 to
fix following security issue:
[CVE-2021-25219] Disable "lame-ttl" cache
The handling of the lame servers needs to be redesigned and it is not
going to be enabled any time soon, and the current code is just dead
code that takes up space, code and stands in the way of making ADB work
faster.
Remove all the internals needed for handling the lame servers in the ADB
for now. It might get reintroduced later if and when we redesign ADB.
now that we're using qpmulti for the summary database, we
no longer need to hold search_lock for it. we do still need
it for the radix tree and the trigger counts.
depending on how the QP trie is traversed during a lookup, it is
possible for a search to terminate on a leaf which is a partial
match, without that leaf being added to the chain. to ensure the
chain is correct in this case, when a partial match condition is
detected via qpkey_compare(), we will call add_link() again, just
in case. (add_link() will check for a duplicated node, so it will
be harmless if it was already done.)
Ondřej Surý [Fri, 6 Oct 2023 08:30:21 +0000 (10:30 +0200)]
Add base testing set of names for load-names benchmark
This was generated from dnsperf queryfile with following script:
#!/usr/bin/env python3
names = {}
import sys
i = 0
for line in iter(sys.stdin.readline, ''):
name = line.rstrip('\n')
if not name in names:
names[name] = line
print(f"{i},{name}")
i += 1
if i >= 1024*1024:
break
Ondřej Surý [Fri, 6 Oct 2023 08:28:32 +0000 (10:28 +0200)]
Fix hashmap part of load-names benchmark
The name_match() was errorneously converting struct item into dns_name
pointer. Correctly retype void *node to struct item * first and then
use item.fixed.name to pass the name to dns_name_equal() function.
Ondřej Surý [Fri, 6 Oct 2023 08:06:36 +0000 (10:06 +0200)]
Use read number of items instead of raw array size in load_names
The load_names benchmark expected the input CSV with domains would fill
the whole item array and it would crash when the number of lines would
be less than that.
Fix the expectations by using the real number or lines read to calculate
the array start and end position for each benchmark thread.
Michał Kępień [Fri, 6 Oct 2023 11:07:55 +0000 (13:07 +0200)]
Move Linux "stress" tests to autoscaled instances
The autoscaling GitLab CI runners currently used for most GitLab CI jobs
spin up AWS EC2 instances that are at least as powerful as the dedicated
instances used for running "stress" tests. Move all Linux-based
"stress" tests to autoscaling GitLab CI runners to enable deprovisioning
Linux AWS instances reserved for running "stress" tests. Leave FreeBSD
"stress" tests intact as there is currently no support for autoscaling
BSD instances.
Michal Nowak [Wed, 16 Aug 2023 14:19:30 +0000 (16:19 +0200)]
Report hung system tests
At times, a problem might occur where a test is not responding,
especially in the CI, determining the specific test responsible can be
difficult. Fortunately, when running tests with the pytest runner,
pytest sets the PYTEST_CURRENT_TEST environment variable to the current
test nodeid and stage. Afterward, the variable can be examined to
identify the test that has stopped responding.
The monitoring script needs to be started in the background. Still, the
shell executor used for BSD and FIPS testing can't handle the background
process cleanly, and the script step will wait for the background
process for the entire duration of the background process (currently
3000 seconds). Therefore, run the monitoring script only when the Docker
executor is used where this is not a problem.
Move the block on the error path, where the link is checked, to a place
where it makes sense, to avoid accessing an unitialized link when
jumping to the 'cleanup_query' label from 4 different places. The link
is initialized only after those jumps happen.
In addition, initilize the link when creating the object, to avoid
similar errors.
get predecessor name in dns_qp_findname_ancestor()
dns_qp_findname_ancestor() now takes an optional 'predecessor'
parameter, which if non-NULL is updated to contain the DNSSEC
predecessor of the name searched for. this is done by constructing
an iterator stack while carrying out the search, so it can be used
to step backward if needed.
since dns_qp_findname_ancestor() can now return a chain object, it is no
longer necessary to provide a _NOEXACT search option. if we want to look
up the closest ancestor of a name, we can just do a normal search, and
if successful, retrieve the second-to-last node from the QP chain.
this makes ancestor lookups slightly more complicated for the caller,
but allows us to simplify the code in dns_qp_findname_ancestor(), making
it easier to ensure correctness. this was a fairly rare use case:
outside of unit tests, DNS_QPFIND_NOEXACT was only used in the zone
table, which has now been updated to use the QP chain. the equivalent
RBT feature is only used by the resolver for cache lookups of 'atparent'
types (i.e, DS records).
- make iterators reversible: refactor dns_qpiter_next() and add a new
dns_qpiter_prev() function to support iterating both forwards and
backwards through a QP trie.
- added a 'name' parameter to dns_qpiter_next() (as well as _prev())
to make it easier to retrieve the nodename while iterating, without
having to construct it from pointer value data.
- the helper functions for accessing twigs beneath a branch
(branch_twig_pos(), branch_twig_ptr(), etc) were somewhat confusing
to read, since several of them were implemented by calling other
helper functions. they now all show what they're really doing.
- branch_twigs_vector() has been renamed to simply branch_twigs().
- revised some unrelated comments in qp_p.h for clarity.
dns_qp_findname_ancestor() now takes an optional 'chain' parameter;
if set, the dns_qpchain object it points to will be updated with an
array of pointers to the populated nodes between the tree root and the
requested name. the number of nodes in the chain can then be accessed
using dns_qpchain_length() and the individual nodes using
dns_qpchain_node().
modify dns_qp_findname_ancestor() to return found name
add a 'foundname' parameter to dns_qp_findname_ancestor(),
and use it to set the found name in dns_nametree.
this required adding a dns_qpkey_toname() function; that was
done by moving qp_test_keytoname() from the test library to qp.c.
added some more test cases and fixed bugs with the handling of
relative and empty names.
this loads a file containing DNS names and measures the time it takes to:
1) iterate it,
2) look up each name with dns_qp_getname()
3) look up each name with dns_qp_findname_ancestor()
4) look up a modified name based on the name, to check performance
when the name is not found.
the refactoring of isc_job_run() and isc_async_run() in 9.19.12
intefered with the way the qpmulti benchmark uses uv_idle.
it has now been modified to use isc_job/isc_async instead.
When the given zone is not associated with a zone manager, the function
currently returns ISC_R_NOTFOUND, which is documented as the return
value for the case in which no incoming zone transfer is found. Make
the function return ISC_R_FAILURE in such a case instead.
Also update the description of the function as the value it returns is
not meant to indicate whether an ongoing incoming transfer for the given
zone exists. The boolean variables that the function sets via the
pointers provided as its parameters, combined with either keeping
'*xfrp' set to NULL or updating it to a valid pointer, can be used by
the caller to infer all the necessary information.
The TRY0 macro doesn't set the 'result' variable, so the error
log message is never printed. Remove the 'result' variable and
modify the function's control flow to be similar to the the
zone_xmlrender() function, with a separate error returning path.
Workaround compiler bug that optimizes setting .free_pools
The .free_pools bitfield would not be set on some levels of
optimizations - workaround the compiler bug by reordering the setting
the .freepools in the initializer.