Michał Kępień [Wed, 29 Dec 2021 08:58:48 +0000 (09:58 +0100)]
Clarify use of the "today" Sphinx variable
Add a comment explaining the purpose of setting the "today" variable in
Sphinx invocations to prevent confusion caused by the absence of that
variable from reStructuredText sources.
Drop the -A command-line option from the sphinx-build invocation for
EPUB output as "today" is already set in the ALLSPHINXOPTS variable.
Michał Kępień [Wed, 29 Dec 2021 08:58:48 +0000 (09:58 +0100)]
Set version and release variables in conf.py
Some Sphinx variables used in the ARM are only set in Makefile.docs.
This works fine when building the ARM using "make", but does not work
with Read the Docs, which only looks at conf.py files.
Since Read the Docs does not run ./configure, renaming conf.py to
conf.py.in and using Autoconf output variables is not a feasible
solution.
Instead, extend doc/arm/conf.py with some Python code which processes
configure.ac using regular expressions and sets the relevant Sphinx
variables accordingly. As this solution also works fine when building
the ARM using "make", drop the relevant -D options from the list of
sphinx-build options used for building the ARM in Makefile.docs.
Note that the man_SPHINXOPTS counterparts of the removed -D switches are
left intact because doc/man/conf.py is a separate Sphinx project which
is only processed using "make" and duplicating the Python code added to
doc/arm/conf.py by this commit would be inelegant.
Artem Boldariev [Thu, 23 Dec 2021 14:08:41 +0000 (16:08 +0200)]
Use the TLS context cache for client-side contexts (XoT)
This commit enables client-side TLS contexts re-use for zone transfers
over TLS. That, in turn, makes it possible to use the internal session
cache associated with the contexts, allowing the TLS connections to be
established faster and requiring fewer resources by not going through
the full TLS handshake procedure.
Previously that would recreate the context on every connection, making
TLS session resumption impossible.
Also, this change lays down a foundation for Strict TLS (when the
client validates a server certificate), as the TLS context cache can
be extended to store additional data required for validation (like
intermediates CA chain).
Artem Boldariev [Thu, 23 Dec 2021 10:01:34 +0000 (12:01 +0200)]
Use the TLS context cache for server-side contexts
Using the TLS context cache for server-side contexts could reduce the
number of contexts to initialise in the configurations when e.g. the
same 'tls' entry is used in multiple 'listen-on' statements for the
same DNS transport, binding to multiple IP addresses.
In such a case, only one TLS context will be created, instead of a
context per IP address, which could reduce the initialisation time, as
initialising even a non-ephemeral TLS context introduces some delay,
which can be *visually* noticeable by log activity.
Also, this change lays down a foundation for Mutual TLS (when the
server validates a client certificate, additionally to a client
validating the server), as the TLS context cache can be extended to
store additional data required for validation (like intermediates CA
chain).
Additionally to the above, the change ensures that the contexts are
not being changed after initialisation, as such a practice is frowned
upon. Previously we would set the supported ALPN tags within
isc_nm_listenhttp() and isc_nm_listentlsdns(). We do not do that for
client-side contexts, so that appears to be an overlook. Now we set
the supported ALPN tags right after server-side contexts creation,
similarly how we do for client-side ones.
Artem Boldariev [Wed, 22 Dec 2021 15:11:11 +0000 (17:11 +0200)]
Add TLS context cache
This commit adds a TLS context object cache implementation. The
intention of having this object is manyfold:
- In the case of client-side contexts: allow reusing the previously
created contexts to employ the context-specific TLS session resumption
cache. That will enable XoT connection to be reestablished faster and
with fewer resources by not going through the full TLS handshake
procedure.
- In the case of server-side contexts: reduce the number of contexts
created on startup. That could reduce startup time in a case when
there are many "listen-on" statements referring to a smaller amount of
`tls` statements, especially when "ephemeral" certificates are
involved.
- The long-term goal is to provide in-memory storage for additional
data associated with the certificates, like runtime
representation (X509_STORE) of intermediate CA-certificates bundle for
Strict TLS/Mutual TLS ("ca-file").
Michał Kępień [Tue, 28 Dec 2021 14:09:50 +0000 (15:09 +0100)]
Fix error codes passed to connection callbacks
Commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c erroneously introduced
duplicate conditions to several existing conditional statements
responsible for determining error codes passed to connection callbacks
upon failure. Fix the affected expressions to ensure connection
callbacks are invoked with:
- the ISC_R_SHUTTINGDOWN error code when a global netmgr shutdown is
in progress,
- the ISC_R_CANCELED error code when a specific operation has been
canceled.
This does not fix any known bugs, it only adjusts the changes introduced
by commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c so that they match
its original intent.
Michał Kępień [Mon, 27 Dec 2021 14:41:04 +0000 (15:41 +0100)]
Fix rare control channel socket reference leak
Commit 9ee60e7a17bf34c7ef7f4d79e6a00ca45444ec8c enabled netmgr shutdown
to cause read callbacks for active control channel sockets to be invoked
with the ISC_R_SHUTTINGDOWN result code. However, control channel code
only recognizes ISC_R_CANCELED as an indicator of an in-progress netmgr
shutdown (which was correct before the above commit). This discrepancy
enables the following scenario to happen in rare cases:
1. A control channel request is received and responded to. libuv
manages to write the response to the TCP socket, but the completion
callback (control_senddone()) is yet to be invoked.
2. Server shutdown is initiated. All TCP sockets are shut down, which
i.a. causes control_recvmessage() to be invoked with the
ISC_R_SHUTTINGDOWN result code. As the result code is not
ISC_R_CANCELED, control_recvmessage() does not set
listener->controls->shuttingdown to 'true'.
3. control_senddone() is called with the ISC_R_SUCCESS result code. As
neither listener->controls->shuttingdown is 'true' nor is the result
code ISC_R_CANCELED, reading is resumed on the control channel
socket. However, this read can never be completed because the read
callback on that socket was cleared when the TCP socket was shut
down. This causes a reference on the socket's handle to be held
indefinitely, leading to a hang upon shutdown.
Ensure listener->controls->shuttingdown is also set to 'true' when
control_recvmessage() is invoked with the ISC_R_SHUTTINGDOWN result
code. This ensures the send completion callback does not resume reading
after the control channel socket is shut down.
Michal Nowak [Wed, 25 Aug 2021 16:11:43 +0000 (18:11 +0200)]
Make bullseye the base image
"buster" jobs are now only going to be run in scheduled pipelines.
"--without-gssapi" ./configure option of "bullseye" before it became
the base image is dropped from "bullseye"-the-base-image because it
reduces gcov coverage by 0.38 % (651 lines) and is used in Debian 9
"stretch".
Michał Kępień [Wed, 22 Dec 2021 17:17:26 +0000 (18:17 +0100)]
Set up default logging for SSLKEYLOGFILE
A customary method of exporting TLS pre-master secrets used by a piece
of software (for debugging purposes, e.g. to examine decrypted traffic
in a packet sniffer) is to set the SSLKEYLOGFILE environment variable to
the path to the file in which this data should be logged.
In order to enable writing any data to a file using the logging
framework provided by libisc, a logging channel needs to be defined and
the relevant logging category needs to be associated with it. Since the
SSLKEYLOGFILE variable is only expected to contain a path, some defaults
for the logging channel need to be assumed. Add a new function,
named_log_setdefaultsslkeylogfile(), for setting up those implicit
defaults, which are equivalent to the following logging configuration:
This ensures TLS pre-master secrets do not use up more than about 1 GB
of disk space, which should be enough to hold debugging data for the
most recent 1 million TLS connections.
As these values are arguably not universally appropriate for all
deployment environments, a way for overriding them needs to exist.
Suppress creation of the default logging channel for TLS pre-master
secrets when the SSLKEYLOGFILE variable is set to the string "config".
This enables providing custom logging configuration for the relevant
category via the "logging" stanza. (Note that it would have been
simpler to only skip setting up the default logging channel for TLS
pre-master secrets if the SSLKEYLOGFILE environment variable is not set
at all. However, libisc only logs pre-master secrets if that variable
is set. Detecting a "magic" string enables the SSLKEYLOGFILE
environment variable to serve as a single control for both enabling TLS
pre-master secret collection and potentially also indicating where and
how they should be exported.)
Michał Kępień [Wed, 22 Dec 2021 17:17:26 +0000 (18:17 +0100)]
Check for SSL_CTX_set_keylog_callback() support
The SSL_CTX_set_keylog_callback() function is a fairly recent OpenSSL
addition, having first appeared in version 1.1.1. Add a configure.ac
check for the availability of that function to prevent build errors on
older platforms. Sort similar checks alphabetically.
This makes the SSLKEYLOGFILE mechanism a silent no-op on unsupported
platforms, which is considered acceptable for a debugging feature.
Michał Kępień [Wed, 22 Dec 2021 17:17:26 +0000 (18:17 +0100)]
Log TLS pre-master secrets when requested
Generate log messages containing TLS pre-master secrets when the
SSLKEYLOGFILE environment variable is set. This only ensures such
messages are prepared using the right logging category and passed to
libisc for further processing.
The TLS pre-master secret logging callback needs to be set on a
per-context basis, so ensure it happens for both client-side and
server-side TLS contexts.
Michał Kępień [Wed, 22 Dec 2021 17:17:26 +0000 (18:17 +0100)]
Add a logging category for TLS pre-master secrets
TLS pre-master secrets will be dumped to disk using the logging
framework provided by libisc. Add a new logging category for this type
of debugging data in order to enable exporting it to a dedicated
channel. Derive the name of the new category from the name of the
relevant environment variable, SSLKEYLOGFILE.
Michal Nowak [Wed, 22 Dec 2021 10:14:56 +0000 (11:14 +0100)]
Execute respdiff jobs out-of-order
Commit 2ececf2c dropped dependency of "respdiff" and
"respdiff-third-party" jobs on "tarball-create" job because these jobs
don't need to depend on in (e.g., for its artifacts). This, however,
caused that respdiff jobs weren't started out-of-order and artifacts
from all the "Build" stage jobs plus "unit:gcc:buster:amd64" job were
downloaded to project directory and caused problems with compilation:
Originally, the dependency on "tarball-create" has been added in 04f8b65a to indicate that respdiff "is meant to operate on two different
BIND versions". It seems that the intent didn't work out, and we better
make it obvious that respdiff jobs don't depend on any other job and
should be run out-of-order.
Artem Boldariev [Fri, 17 Dec 2021 12:03:56 +0000 (14:03 +0200)]
doth system test: reduce number of contexts in ns3
This commit removes unused listen-on statements from the ns3 instance
in order to reduce the startup time. That should help with occasional
system test initialisation hiccups in the CI which happen because the
required instances cannot initialise in time.
Artem Boldariev [Fri, 17 Dec 2021 11:57:42 +0000 (13:57 +0200)]
Fix flakiness in the doth reconfig test
Due to the fact that the primary nameserver creates a lot of TLS
contexts, its reconfiguration could take too much time on the CI,
leading to spurious test failures, while in reality it works just
fine.
This commit adds a separate instance for this test which does not use
ephemeral keys (these are costly to generate) and creates minimal
amount of TLS contexts.
Aram Sargsyan [Wed, 8 Dec 2021 16:04:15 +0000 (16:04 +0000)]
Use ECDSA P-256 instead of 4096-bit RSA for 'tls ephemeral'
ECDSA P-256 performs considerably better than the previously used
4096-bit RSA (can be observed using `openssl speed`), and, according
to RFC 6605, provides a security level comparable to 3072-bit RSA.
Ondřej Surý [Wed, 15 Dec 2021 11:07:22 +0000 (12:07 +0100)]
Simplify Address Sanitizer tweaks in mem.c
Previously, whole isc_mempool_get() and isc_mempool_set() would be
replaced by simpler version when run with address sanitizer.
Change the code to limit the fillcount to 1 and freemax to 0. This
change will make isc_mempool_get() to always allocate and use a single
new item and isc_mempool_put() will always return the item to the
allocator.
Mark Andrews [Wed, 15 Dec 2021 10:27:49 +0000 (21:27 +1100)]
Pass the digest buffer length to EVP_DigestSignFinal
OpenSSL 3.0.1 does not accept 0 as a digest buffer length when
calling EVP_DigestSignFinal as it now checks that the digest buffer
length is large enough for the digest. Pass the digest buffer
length instead.
Ondřej Surý [Wed, 15 Dec 2021 16:48:28 +0000 (17:48 +0100)]
Reduce freemax values for dns_message mempools
It was discovered that NAME_FREEMAX and RDATASET_FREEMAX was based on
the NAME_FILLCOUNT and RDATASET_FILLCOUNT respectively multiplied by 8
and then when used in isc_mempool_setfreemax, the value would be again
multiplied by 32.
Keep the 8 multiplier in the #define and remove the 32 multiplier as it
was kept in error. The default fillcount can fit 99.99% of the requests
under normal circumstances, so we don't need to keep that many free
items on the mempool.
Evan Hunt [Wed, 15 Dec 2021 16:27:28 +0000 (08:27 -0800)]
remove ns_interface reference counting
reference counting of ns_interface objects has not been used
since the clientmgr cleanup in #2433, and it no longer really
makes sense now - when we want to destroy an interface on a
rescan, we want it to be destroyed, not kept active by some
other caller. so ns_interface_attach() has been removed,
ns_interface_detach() has been replaced with a static
interface_destroy(), and do_scan() has been simplified
accordingly.
Evan Hunt [Wed, 15 Dec 2021 00:51:02 +0000 (16:51 -0800)]
keep track of non-listening interfaces
previously, if "listen-on-v6" was set to "none", then every
time a scan saw an IPv6 address it would appear to be a new
one. this commit retains all known interfaces in a list
and sets a flag in the ones that are listening, so that
configured interfaces that have been seen before will be
recognized as such.
as an incidental fix, the ns__interfacemgr_getif() and _nextif()
functions have been removed since they were never used.
This commit modifies the NetLink handling code in such a way
that the contents of the messages we are interested in is checked
for the local addresses changes only. This helps to avoid spurious
interface re-scans.
The 'route_recv' log messages are also reduced from DEBUG(3) to
DEBUG(9).
Michal Nowak [Tue, 30 Nov 2021 12:52:49 +0000 (13:52 +0100)]
Drop cppcheck CI job
Every cppcheck update brings the cost of addressing new false positives
in the BIND 9 source code while not reaping any benefits in case of
identified issues with the code.
Aram Sargsyan [Tue, 14 Dec 2021 09:28:01 +0000 (09:28 +0000)]
Recreate HTTPS and TLS interfaces only during reconfiguration
The 850e9e59bf8c29f895a981211c72c0b3c294bcfd commit intended to recreate
the HTTPS and TLS interfaces during reconfiguration, but they are being
recreated also during regular interface re-scans.
Make sure the HTTPS and TLS interfaces are being recreated only during
reconfiguration.
Aram Sargsyan [Thu, 9 Dec 2021 14:49:51 +0000 (14:49 +0000)]
Recreate TLS interfaces during reconfiguration
For DoH and DoT listeners, a reconfiguration event triggers a creation
of a new 'SSL_CTX' TLS context, and a destruction of the old one.
The network manager, though, keeps using the old context which causes
errors.
During interface scanning, when a matching existing interface is found,
reuse it only when it doesn't have a TLS context, otherwise shut it down
and recreate with a new TLS context.
Petr Menšík [Mon, 29 Nov 2021 23:04:35 +0000 (00:04 +0100)]
Improve error message when directory name is given
Surprising error IO error is returned when directory name
is given instead of named.conf file. It can be passed to named-checkconf
or include statement. Make a simple change to return Invalid file
instead. Still not precise, but much better error message is returned.
Michał Kępień [Thu, 9 Dec 2021 13:02:36 +0000 (14:02 +0100)]
Remove mutex debugging code
Mutex debugging code (used when the ISC_MUTEX_DEBUG preprocessor macro
is set to 1 and PTHREAD_MUTEX_ERRORCHECK is defined) has been broken for
the past 3 years (since commit 2f3eee5a4fdad6606135116c70875b3180c7ed83)
and nobody complained, which is a strong indication that this code is
not being used these days any more. External tools for detecting
locking issues are already wired into various GitLab CI checks. Drop
all code depending on the ISC_MUTEX_DEBUG preprocessor macro being set.
Michał Kępień [Thu, 9 Dec 2021 11:24:12 +0000 (12:24 +0100)]
Remove mutex profiling code
Mutex profiling code (used when the ISC_MUTEX_PROFILE preprocessor macro
is set to 1) has been broken for the past 3 years (since commit 0bed9bfc28a204cde57c6f68170ecc89ebfa6dc8) and nobody complained, which
is a strong indication that this code is not being used these days any
more. External tools for both measuring performance and detecting
locking issues are already wired into various GitLab CI checks. Drop
all code depending on the ISC_MUTEX_PROFILE preprocessor macro being
set.
Evan Hunt [Thu, 2 Dec 2021 20:04:42 +0000 (12:04 -0800)]
prevent a shutdown hang on non-matching TCP responses
When a non-matching DNS response is received by the resolver,
it calls dns_dispatch_getnext() to resume reading. This is necessary
for UDP but not for TCP, because TCP connections automatically
resume reading after any valid DNS response.
This commit adds a 'tcpreading' flag to TCP dispatches, so that
`dispatch_getnext()` can be called multiple times without subsequent
calls having any effect.
Ondřej Surý [Mon, 6 Dec 2021 10:10:17 +0000 (11:10 +0100)]
Stop leaking mutex in nmworker and cond in nm socket
On FreeBSD, the pthread primitives are not solely allocated on stack,
but part of the object lives on the heap. Missing pthread_*_destroy
causes the heap memory to grow and in case of fast lived object it's
possible to run out-of-memory.
Properly destroy the leaking mutex (worker->lock) and
the leaking condition (sock->cond).
Ondřej Surý [Tue, 7 Dec 2021 19:35:58 +0000 (20:35 +0100)]
Reduce the number of hazard pointers
Previously, we set the number of the hazard pointers to be 4 times the
number of workers because the dispatch ran on the old socket code.
Since the old socket code was removed there's a smaller number of
threads, namely:
Ondřej Surý [Tue, 7 Dec 2021 10:15:27 +0000 (11:15 +0100)]
Fix the isc_hp initialization and memory usage
Previously, the isc_hp_init() could not lower the value of
isc__hp_max_threads, but because of a mistake the isc__hp_max_threads
would be set to HP_MAX_THREADS (e.g. 128 threads) thus it would be
always set to 128. This would result in increased memory usage even
when small number of workers were in use.
Change the default value of isc__hp_max_threads to be 1.
Additionally, enforce the max_hps value in isc_hp_new() to be smaller or
equal to HP_MAX_HPS. The only user is isc_queue which uses just 1
hazard pointer, so it's only theoretical issue.