Michał Kępień [Thu, 2 Nov 2023 06:22:20 +0000 (07:22 +0100)]
Add a release signing job to GitLab CI
Add a GitLab CI job that is only run for tags and makes signing BIND 9
releases more convenient by utilizing a signing VM that is registered as
a GitLab CI runner. This pulls the signing process into the release
pipelines in GitLab CI, resulting in job artifacts containing the
signatures for BIND 9 releases, which in turns simplifies the subsequent
release publication steps.
Evan Hunt [Mon, 14 Aug 2023 21:28:53 +0000 (14:28 -0700)]
if GLUEOK is set, and glue is found in a zone DB, don't check the cache
EXPERIMENT: when DNS_DB_GLUEOK is set, dns_view_find() will now return
glue if it is found it a local zone database, without checking to see
if a better answer has been cached previously.
Mark Andrews [Mon, 14 Aug 2023 00:26:18 +0000 (10:26 +1000)]
Also look for additional records in dns_adb_find
If a child zone is served by the same servers as a parent zone and
a NS query is made for the zone name then the addresses of the
nameservers are returned in the additional section are tagged as
trust additional.
Evan Hunt [Tue, 31 Oct 2023 11:00:14 +0000 (12:00 +0100)]
restore isc_mem_setwater() call in the cache
Commit 4db150437e14b28c5b50ae466af9ce502fd73185 incorrectly removed the
call to isc_mem_setwater() from dns_cache_setcachesize(). The water()
function is a no-op, but we still need to set high- and low-water marks
in the memory context, otherwise overmem conditions will not be
detected.
Tom Krizek [Wed, 27 Sep 2023 13:48:31 +0000 (15:48 +0200)]
ci: trigger a DNS Shotgun performance test
Run comparative performance tests against the latest released version of
the same branch. This is done for different protocols with an
appropriate load the server is expected to be able to handle.
Currently, the results need to be inspected manually, since a success of
the job doesn't indicate there is no issue. Instead, the job provides an
URL to an overview with latency, memory and CPU charts which display the
test results with the current code against the reference version. There
should be no major unexplained and reproducible differences in the
charts.
Tom Krizek [Wed, 27 Sep 2023 15:41:26 +0000 (17:41 +0200)]
util: script to get DNS Shotgun pipeline results
The shotgun performance tests are executed in a different repository, in
a couple of different pipelines. To hide away the complexity, this
script takes the pipeline ID of the triggered pipeline and then takes
care of the rest - waits for the pipeline to finish, locates the child
pipeline and the relevant results. The output from this script is a
convenient link to the charts with the results once they're available.
GitLab also has a mechanism which can wait for another pipeline.
However, it can't be utilized here, since there are variables which
need to be passed in when the pipeline is triggered (like protocol to be
tested, load, runtime etc.). This isn't currently supported by the
GitLab feature.
Tom Krizek [Wed, 27 Sep 2023 13:26:10 +0000 (15:26 +0200)]
ci: move baseline version detection into separate job
Multiple CI jobs may utilize a baseline version, i.e. the version that
the current code should be tested against when doing comparative
testing. To avoid repeating the non-trivial detection of the baseline
version, move it into a separate job which creates an environment file
that subsequent jobs may require via `needs` option. It is then possible
to use the variable(s) defined in the script section of the new job.
Matthijs Mekking [Mon, 30 Oct 2023 18:33:19 +0000 (19:33 +0100)]
Don't ignore auth zones when in serve-stale mode
When serve-stale is enabled and recursive resolution fails, the fallback
to lookup stale data always happens in the cache database. Any
authoritative data is ignored, and only information learned through
recursive resolution is examined.
If there is data in the cache that could lead to an answer, and this can
be just the root delegation, the resolver will iterate further, getting
closer to the answer that can be found by recursing down the root, and
eventually puts the final response in the cache.
Change the fallback to serve-stale to use 'query_getdb()', that finds
out the best matching database for the given query.
Matthijs Mekking [Mon, 23 Oct 2023 11:52:12 +0000 (13:52 +0200)]
Test case for issue #4355
Add a test case where serve-stale is enabled on a server that also
servers a local authoritative zone.
The particular case tests a lame delegation and checks if falling
back to serving stale data does not attempt to retrieve the query
by recursing from the root down.
Ondřej Surý [Fri, 27 Oct 2023 06:54:59 +0000 (08:54 +0200)]
Bump the mempool sizes in dns_message
Increasing the initial and freemax sizes for dns_message memory pools
restores the root zone performance. The former sizes were suited for
per-dns_message memory pools and we need to bump the sizes up for
per-thread memory pools.
Michał Kępień [Fri, 27 Oct 2023 11:19:03 +0000 (13:19 +0200)]
Always use default RCU variant in pairwise builds
Commit 42d43aa0758513a45b54e0fd0bff4381fdc4d803 made --with-liburcu
depend on --enable-developer. This broke pairwise testing as this new
dependency was not codified in configure.ac. Since the --with-liburcu
option is currently just a convenience for developers, there is no need
to test building against all possible RCU variants in GitLab CI until
they actually work with BIND 9. Update the pairwise testing
"configuration" in configure.ac so that builds with non-standard RCU
variants are not tested.
Ondřej Surý [Fri, 27 Oct 2023 09:44:02 +0000 (11:44 +0200)]
Bump the timeouts in the dispatch_test
The client connection timeout was set to just one second, which might
not be enough on busy systems (and the CI machines are oh-boy-busy).
Bump the server timeouts to 10 seconds and client timeouts to 5 seconds,
this will make the unit test run a little bit longer, but it should be
more reliable.
Ondřej Surý [Fri, 27 Oct 2023 08:16:13 +0000 (10:16 +0200)]
Call isccc_ccmsg_invalidate() when shutting down the connection
Previously, the isccc_ccmsg_invalidate() was called from conn_free() and
this could lead to netmgr calling control_recvmessage() after we
detached the reading controlconnection_t reference, but it wouldn't be
the last reference because controlconnection_t is also attached/detached
when sending response or running command asynchronously.
Instead, move the isccc_ccmsg_invalidate() call to control_recvmessage()
error handling path to make sure that control_recvmessage() won't be
ever called again from the netmgr.
Ondřej Surý [Thu, 26 Oct 2023 08:47:03 +0000 (10:47 +0200)]
Replace mutex for listener->connections with TID check
The controlconf channel runs single-threaded on the main thread.
Replace the listener->connections locking with check that we are still
running on the thread with TID 0.
Ondřej Surý [Thu, 26 Oct 2023 09:55:54 +0000 (11:55 +0200)]
Remove the lock-file configuration and -X argument to named
The lock-file configuration (both from configuration file and -X
argument to named) has better alternatives nowadays. Modern process
supervisor should be used to ensure that a single named process is
running on a given configuration.
Alternatively, it's possible to wrap the named with flock(1).
Ondřej Surý [Thu, 26 Oct 2023 09:08:49 +0000 (11:08 +0200)]
Mark the lock-file configuration option as deprecated
This is first step in removing the lock-file configuration option, it
marks both the `lock-file` configuration directive and -X option to
named as deprecated.
Aram Sargsyan [Thu, 26 Oct 2023 12:28:25 +0000 (12:28 +0000)]
Do not warn about lock-file option change when -X is used
When -X is used the 'lock-file' option change detection condition
is invalid, because it compares the 'lock-file' option's value to
the '-X' argument's value instead of the older 'lock-file' option
value (which was ignored because of '-X').
Don't warn about changing 'lock-file' option if '-X' is used.
Aram Sargsyan [Thu, 26 Oct 2023 12:24:17 +0000 (12:24 +0000)]
Fix an invalid condition check when detecting a lock-file change
It is obvious that the '!cfg_obj_asstring(obj)' check should be
'cfg_obj_asstring(obj)' instead, because it is an AND logic chain
which further uses 'obj' as a string.
Aram Sargsyan [Thu, 26 Oct 2023 12:21:57 +0000 (12:21 +0000)]
Fix assertion failure when using -X none and lock-file in configuration
When 'lock-file <lockfile>' is used in configuration at the same time
as using '-X none' in 'named' invocation, there is an invalid
logic that would lead to a isc_mem_strdup() call on a NULL value.
Also, contradicting to ARM, 'lock-file none' is overriding the '-X'
argument.
Fix the overall logic, and make sure that the '-X' takes precedence to
'lock-file'.
Ondřej Surý [Thu, 26 Oct 2023 08:54:28 +0000 (10:54 +0200)]
Fix assertion failure when using -X and lock-file in configuration
When 'lock-file <lockfile1>' was used in configuration at the same time
as using `-X <lockfile2>` in `named` invocation, there was an invalid
logic that would lead to a double isc_mem_strdup() call on the
<lockfile2> value.
Skip the second allocation if `lock-file` is being used in
configuration, so the <lockfile2> is used only single time.
Ondřej Surý [Thu, 26 Oct 2023 07:14:10 +0000 (09:14 +0200)]
Allowing changing Userspace-RCU variant only in developer mode
The Userspace-RCU variants other than membarrier is untested and at
least in QSBR case it's broken. Allow changing the Userspace-RCU
variant only in the developer's mode.
Evan Hunt [Wed, 25 Oct 2023 21:59:55 +0000 (14:59 -0700)]
Prevent a possible race in dns_qpmulti_query() and _snapshot()
The `.reader` member of dns_qpmulti_t was accessed without RCU
protection; reader_open() calls rcu_dereference() on it, and this
call needs to be inside an RCU critical section.
A similar problem was identified in the dns_qpmulti_snapshot() - the
RCU critical section was completely missing.
These are relicts of the isc_qsbr - in the QSBR mode the rcu_read_lock()
and rcu_read_unlock() are no-ops and whole event loop is a critical section.
Mark Andrews [Thu, 26 Oct 2023 04:07:58 +0000 (15:07 +1100)]
Check that the lock file was not removed too early
When named fails to starts due to not being able to obtain
a lock on the lock file that lock file should remain. Check
that the lock file exists before and after the attempt to
start a second instance of named.
Mark Andrews [Thu, 26 Oct 2023 03:50:43 +0000 (14:50 +1100)]
Only remove the lock file if we managed to lock it
The lock file was being removed when we hadn't successfully locked
it which defeated the purpose of the lockfile. Adjust cleanup_lockfile
such that it only unlinks the lockfile if we have successfully locked
the lockfile and it is still active (lockfile != NULL).
Ondřej Surý [Thu, 19 Oct 2023 08:22:59 +0000 (10:22 +0200)]
Refactor dns_message using ISC_LIST_FOREACH macros
Do a light refactoring and cleanups that replaces common list walking
patterns with ISC_LIST_FOREACH macros and split some nested loops into
separate static functions to reduce the nesting depth.
Ondřej Surý [Thu, 19 Oct 2023 08:21:20 +0000 (10:21 +0200)]
Add ISC_LIST_FOREACH_REV(_SAFE) macros
Add complementary macros to ISC_LIST_FOREACH(_SAFE) that walk the lists
in reverse.
* ISC_LIST_FOREACH_REV(list, elt, link) - walk the static list from
tail to head
* ISC_LIST_FOREACH_REV_SAFE(list, elt, link, next) - walk the list
from tail to head in a manner that's safe against list member
deletions
Ondřej Surý [Mon, 23 Oct 2023 10:26:50 +0000 (12:26 +0200)]
Add dispatch_getcp and dispatch_newtcp tests
Refactor the dispatch unit test to use more local variables (previously
dispatchmgr, dispatch and dispentry were all global), and add two new
tests:
* dispatch_getcp - test whether the TCP connection will get reused
* dispatch_newtcp - test that the TCP connection will not get reused
when DNS_DISPATCHOPT_UNSHARED is in effect
Ondřej Surý [Fri, 20 Oct 2023 06:14:27 +0000 (08:14 +0200)]
Add option to mark TCP dispatch as unshared
The current dispatch code could reuse the TCP connection when
dns_dispatch_gettcp() would be used first. This is problematic as the
dns_resolver doesn't use TCP connection sharing, but dns_request could
get the TCP stream that was created outside of the dns_request.
Add new DNS_DISPATCHOPT_UNSHARED option to dns_dispatch_createtcp() that
would prevent the TCP stream to be reused. Use that option in the
dns_resolver call to dns_dispatch_createtcp() to prevent dns_request
from reusing the TCP connections created by dns_resolver.
Additionally, the dns_xfrin unit added TCP connection sharing for
incoming transfers. While interleaving *xfr streams on a TCP connection
should work this should be a deliberate change and be property of the
server that can be controlled. Additionally some level of parallel TCP
streams is desirable. Revert to the old behaviour by removing the
dns_dispatch_gettcp() calls from dns_xfrin and use the new option to
prevent from sharing the transfer streams with dns_request.
Ondřej Surý [Fri, 20 Oct 2023 05:58:26 +0000 (07:58 +0200)]
Don't set the offloaded work result from main thread
The xfrin_recv_done() was accessing xfr->result where we stored the
result of the offloaded work from a thread that could receive data while
processing the transfer on the offloaded thread.
Completely remove the offloaded result from the dns_xfrin_t structure
and keep it local for *xfr_apply() and *xfr_apply_done() as the failure
is already recorded in .shutdown_result and we now that the processing
has failed because .shuttingdown has been already set.
Aram Sargsyan [Thu, 19 Oct 2023 12:57:13 +0000 (12:57 +0000)]
sd_notify(3): set the MONOTONIC_USEC field with RELOADING=1
When using sd_notify(3) to send a message to the service manager
about named being reloaded, systemd also requires the MONOTONIC_USEC
field to be set to the current monotonic time in microseconds,
otherwise the 'systemctl reload' command fails.
Aram Sargsyan [Fri, 20 Oct 2023 10:45:35 +0000 (10:45 +0000)]
Fix shutdown races in catzs
The dns__catz_update_cb() does not expect that 'catzs->zones'
can become NULL during shutdown.
Add similar checks in the dns__catz_update_cb() and dns_catz_zone_get()
functions to protect from such a case. Also add an INSIST in the
dns_catz_zone_add() function to explicitly state that such a case
is not expected there, because that function is called only during a
reconfiguration.
Mark Andrews [Tue, 17 Oct 2023 23:45:41 +0000 (10:45 +1100)]
Suppress reporting upcoming changes in root hints
To reduce the amount of log spam when root servers change their
addresses keep a table of upcoming changes by expected date and time
and suppress reporting differences for them until then.
Add initial entry for B.ROOT-SERVERS.NET, Nov 27, 2023.