Ondřej Surý [Mon, 6 Nov 2023 19:19:20 +0000 (20:19 +0100)]
Make sure we shutdown the controlconf listeners and connections once
It was possible that controlconf connections could be shutdown twice
when shutting down the server, because they would receive the
signal (ISC_R_SHUTTINGDOWN result) from netmgr and then the shutdown
procedure would be called second time via controls_shutdown().
Split the shutdown procedure from control_recvmessage(), so we can call
it independently from netmgr callbacks and make sure it will be called
only once. Do the similar thing for the listeners.
Michał Kępień [Thu, 16 Nov 2023 10:39:51 +0000 (11:39 +0100)]
Remove steps related to the post-mortem meeting
The post-mortem meeting is now considered an on-demand event. The past
few security release cycles proved that there is rarely a need to
discuss things in this form, so there is little point in carrying out
the relevant steps for every single vulnerability - which does not
prevent us from doing so if the actual need arises.
Michał Kępień [Thu, 16 Nov 2023 10:39:51 +0000 (11:39 +0100)]
Rebase -S branches after version bumps
Applying version bumps in open source branches breaks automatic rebasing
of the bind-9.x-sub branches. Ensure the latter are manually rebased
after each version bump to prevent the "rebase" job in GitLab CI from
failing.
Michał Kępień [Thu, 16 Nov 2023 10:39:51 +0000 (11:39 +0100)]
Prepare the patches/ subdirectory earlier
The patches/ subdirectory needs to be present in each prerelease
directory before the ASN releases get pre-published or else the latter
will not contain standalone patches.
Evan Hunt [Sat, 11 Nov 2023 21:15:27 +0000 (13:15 -0800)]
set loadtime during initial transfer of a secondary zone
when transferring in a non-inline-signing secondary for the first time,
we previously never set the value of zone->loadtime, so it remained
zero. this caused a test failure in the statschannel system test,
and that test case was temporarily disabled. the value is now set
correctly and the test case has been reinstated.
Mark Andrews [Thu, 16 Nov 2023 00:15:49 +0000 (11:15 +1100)]
Check that buffer length in dns_message_renderbegin
The maximum DNS message size is 65535 octets. Check that the buffer
being passed to dns_message_renderbegin does not exceed this as the
compression code assumes that all offsets are no bigger than this.
Ondřej Surý [Tue, 7 Nov 2023 13:42:33 +0000 (14:42 +0100)]
Remove AES algorithm for DNS cookies
The AES algorithm for DNS cookies was being kept for legacy reasons, and
it can be safely removed in the next major release. Remove both the AES
usage for DNS cookies and the AES implementation itself.
Aram Sargsyan [Thu, 9 Nov 2023 08:33:35 +0000 (08:33 +0000)]
Fix catz db update callback registration logic error (take two)
Please see the 998765fea536daacfba96d8ed0a4855668d2e242 commit for
the description of the original issue. The commit had fixed the
logic error, but it was reintroduced again later with the a1afa31a5a7d318508efe5a32001104d094be057 commit, where the check of
the 'db_registered' flag was removed in dns__catz_update_cb(). The
check was removed, because the registration function was made
idempotent, so double registration is not an issue, but the check
also prevented from unneeded registration, on which the original
fix relied.
This commit just removes the update callback registration code from
the dns__catz_update_cb() function instead of bringing back the check,
because after code flow analysis, it is now clear that it's not required
at all. The "call onupdate() artificially" comment (which was mentioned
by the removed code) is speaking about the dns_catz_dbupdate_callback()
function, which is called by server.c on (re)configuration, and that
function already takes care of update callback's registration since the 998765fea536daacfba96d8ed0a4855668d2e242 commit was applied, so there
is no need to do that here again.
Aram Sargsyan [Tue, 7 Nov 2023 10:21:36 +0000 (10:21 +0000)]
Use atomics for the iterators number in isc_hashmap_t
Concurrent threads can access a hashmap for reading by creating and
then destroying an iterator, in which case the integer number of the
active iterators is increased or decreased from different threads,
introducing a data race. Use atomic operations to protect the variable.
Ondřej Surý [Tue, 7 Nov 2023 14:17:10 +0000 (15:17 +0100)]
Deprecate AES algorithm for DNS cookies
The AES algorithm for DNS cookies was being kept for legacy reasons,
and it can be safely removed in the next major release. Mark is as
deprecated, so the `named-checkconf` prints a warning when in use.
Aram Sargsyan [Tue, 7 Nov 2023 10:02:57 +0000 (10:02 +0000)]
Use a read lock when iterating over a hashmap
The 'dns_tsigkeyring_t' structure has a read/write lock to protect
its 'keys' member, which is a 'isc_hashmap_t' pointer and needs to
be protected.
The dns_tsigkeyring_dump() function, however, doesn't use the lock,
which can introduce a race with another thread, if the other thread
tries to modify the hashmap.
Add a read lock around the code, which iterates over the hashmap.
Michał Kępień [Thu, 2 Nov 2023 06:22:20 +0000 (07:22 +0100)]
Add a release signing job to GitLab CI
Add a GitLab CI job that is only run for tags and makes signing BIND 9
releases more convenient by utilizing a signing VM that is registered as
a GitLab CI runner. This pulls the signing process into the release
pipelines in GitLab CI, resulting in job artifacts containing the
signatures for BIND 9 releases, which in turns simplifies the subsequent
release publication steps.
Evan Hunt [Mon, 14 Aug 2023 21:28:53 +0000 (14:28 -0700)]
if GLUEOK is set, and glue is found in a zone DB, don't check the cache
EXPERIMENT: when DNS_DB_GLUEOK is set, dns_view_find() will now return
glue if it is found it a local zone database, without checking to see
if a better answer has been cached previously.
Mark Andrews [Mon, 14 Aug 2023 00:26:18 +0000 (10:26 +1000)]
Also look for additional records in dns_adb_find
If a child zone is served by the same servers as a parent zone and
a NS query is made for the zone name then the addresses of the
nameservers are returned in the additional section are tagged as
trust additional.
Evan Hunt [Tue, 31 Oct 2023 11:00:14 +0000 (12:00 +0100)]
restore isc_mem_setwater() call in the cache
Commit 4db150437e14b28c5b50ae466af9ce502fd73185 incorrectly removed the
call to isc_mem_setwater() from dns_cache_setcachesize(). The water()
function is a no-op, but we still need to set high- and low-water marks
in the memory context, otherwise overmem conditions will not be
detected.
Tom Krizek [Wed, 27 Sep 2023 13:48:31 +0000 (15:48 +0200)]
ci: trigger a DNS Shotgun performance test
Run comparative performance tests against the latest released version of
the same branch. This is done for different protocols with an
appropriate load the server is expected to be able to handle.
Currently, the results need to be inspected manually, since a success of
the job doesn't indicate there is no issue. Instead, the job provides an
URL to an overview with latency, memory and CPU charts which display the
test results with the current code against the reference version. There
should be no major unexplained and reproducible differences in the
charts.
Tom Krizek [Wed, 27 Sep 2023 15:41:26 +0000 (17:41 +0200)]
util: script to get DNS Shotgun pipeline results
The shotgun performance tests are executed in a different repository, in
a couple of different pipelines. To hide away the complexity, this
script takes the pipeline ID of the triggered pipeline and then takes
care of the rest - waits for the pipeline to finish, locates the child
pipeline and the relevant results. The output from this script is a
convenient link to the charts with the results once they're available.
GitLab also has a mechanism which can wait for another pipeline.
However, it can't be utilized here, since there are variables which
need to be passed in when the pipeline is triggered (like protocol to be
tested, load, runtime etc.). This isn't currently supported by the
GitLab feature.
Tom Krizek [Wed, 27 Sep 2023 13:26:10 +0000 (15:26 +0200)]
ci: move baseline version detection into separate job
Multiple CI jobs may utilize a baseline version, i.e. the version that
the current code should be tested against when doing comparative
testing. To avoid repeating the non-trivial detection of the baseline
version, move it into a separate job which creates an environment file
that subsequent jobs may require via `needs` option. It is then possible
to use the variable(s) defined in the script section of the new job.
Matthijs Mekking [Mon, 30 Oct 2023 18:33:19 +0000 (19:33 +0100)]
Don't ignore auth zones when in serve-stale mode
When serve-stale is enabled and recursive resolution fails, the fallback
to lookup stale data always happens in the cache database. Any
authoritative data is ignored, and only information learned through
recursive resolution is examined.
If there is data in the cache that could lead to an answer, and this can
be just the root delegation, the resolver will iterate further, getting
closer to the answer that can be found by recursing down the root, and
eventually puts the final response in the cache.
Change the fallback to serve-stale to use 'query_getdb()', that finds
out the best matching database for the given query.
Matthijs Mekking [Mon, 23 Oct 2023 11:52:12 +0000 (13:52 +0200)]
Test case for issue #4355
Add a test case where serve-stale is enabled on a server that also
servers a local authoritative zone.
The particular case tests a lame delegation and checks if falling
back to serving stale data does not attempt to retrieve the query
by recursing from the root down.
Ondřej Surý [Fri, 27 Oct 2023 06:54:59 +0000 (08:54 +0200)]
Bump the mempool sizes in dns_message
Increasing the initial and freemax sizes for dns_message memory pools
restores the root zone performance. The former sizes were suited for
per-dns_message memory pools and we need to bump the sizes up for
per-thread memory pools.
Michał Kępień [Fri, 27 Oct 2023 11:19:03 +0000 (13:19 +0200)]
Always use default RCU variant in pairwise builds
Commit 42d43aa0758513a45b54e0fd0bff4381fdc4d803 made --with-liburcu
depend on --enable-developer. This broke pairwise testing as this new
dependency was not codified in configure.ac. Since the --with-liburcu
option is currently just a convenience for developers, there is no need
to test building against all possible RCU variants in GitLab CI until
they actually work with BIND 9. Update the pairwise testing
"configuration" in configure.ac so that builds with non-standard RCU
variants are not tested.
Ondřej Surý [Fri, 27 Oct 2023 09:44:02 +0000 (11:44 +0200)]
Bump the timeouts in the dispatch_test
The client connection timeout was set to just one second, which might
not be enough on busy systems (and the CI machines are oh-boy-busy).
Bump the server timeouts to 10 seconds and client timeouts to 5 seconds,
this will make the unit test run a little bit longer, but it should be
more reliable.
Ondřej Surý [Fri, 27 Oct 2023 08:16:13 +0000 (10:16 +0200)]
Call isccc_ccmsg_invalidate() when shutting down the connection
Previously, the isccc_ccmsg_invalidate() was called from conn_free() and
this could lead to netmgr calling control_recvmessage() after we
detached the reading controlconnection_t reference, but it wouldn't be
the last reference because controlconnection_t is also attached/detached
when sending response or running command asynchronously.
Instead, move the isccc_ccmsg_invalidate() call to control_recvmessage()
error handling path to make sure that control_recvmessage() won't be
ever called again from the netmgr.
Ondřej Surý [Thu, 26 Oct 2023 08:47:03 +0000 (10:47 +0200)]
Replace mutex for listener->connections with TID check
The controlconf channel runs single-threaded on the main thread.
Replace the listener->connections locking with check that we are still
running on the thread with TID 0.
Ondřej Surý [Thu, 26 Oct 2023 09:55:54 +0000 (11:55 +0200)]
Remove the lock-file configuration and -X argument to named
The lock-file configuration (both from configuration file and -X
argument to named) has better alternatives nowadays. Modern process
supervisor should be used to ensure that a single named process is
running on a given configuration.
Alternatively, it's possible to wrap the named with flock(1).
Ondřej Surý [Thu, 26 Oct 2023 09:08:49 +0000 (11:08 +0200)]
Mark the lock-file configuration option as deprecated
This is first step in removing the lock-file configuration option, it
marks both the `lock-file` configuration directive and -X option to
named as deprecated.
Aram Sargsyan [Thu, 26 Oct 2023 12:28:25 +0000 (12:28 +0000)]
Do not warn about lock-file option change when -X is used
When -X is used the 'lock-file' option change detection condition
is invalid, because it compares the 'lock-file' option's value to
the '-X' argument's value instead of the older 'lock-file' option
value (which was ignored because of '-X').
Don't warn about changing 'lock-file' option if '-X' is used.
Aram Sargsyan [Thu, 26 Oct 2023 12:24:17 +0000 (12:24 +0000)]
Fix an invalid condition check when detecting a lock-file change
It is obvious that the '!cfg_obj_asstring(obj)' check should be
'cfg_obj_asstring(obj)' instead, because it is an AND logic chain
which further uses 'obj' as a string.
Aram Sargsyan [Thu, 26 Oct 2023 12:21:57 +0000 (12:21 +0000)]
Fix assertion failure when using -X none and lock-file in configuration
When 'lock-file <lockfile>' is used in configuration at the same time
as using '-X none' in 'named' invocation, there is an invalid
logic that would lead to a isc_mem_strdup() call on a NULL value.
Also, contradicting to ARM, 'lock-file none' is overriding the '-X'
argument.
Fix the overall logic, and make sure that the '-X' takes precedence to
'lock-file'.