Move the block on the error path, where the link is checked, to a place
where it makes sense, to avoid accessing an unitialized link when
jumping to the 'cleanup_query' label from 4 different places. The link
is initialized only after those jumps happen.
In addition, initilize the link when creating the object, to avoid
similar errors.
Mark Andrews [Tue, 19 Sep 2023 04:06:15 +0000 (14:06 +1000)]
Wait for the test zone to finish re-loading
'rndc thaw' initiates asynchrous loading of all the zones
similar to 'rndc load'. Wait for the test zone's load to
complete before testing that it is updatable again.
Explicitly cast chars to unsigned chars for <ctype.h> functions
Apply the semantic patch to catch all the places where we pass 'char' to
the <ctype.h> family of functions (isalpha() and friends, toupper(),
tolower()).
Add semantic patch to explicitly cast chars to unsigned for ctype.h
Add a semantic patch to catch all the places where we pass 'char' to the
<ctype.h> family of functions (isalpha() and friends, toupper(),
tolower()). While it generally works because the way how these
functions are constructed in the libc, it's safer to do the explicit
cast.
Michal Nowak [Thu, 31 Aug 2023 16:55:36 +0000 (18:55 +0200)]
Add a Sphinx role for linking CVEs to the ISC Knowledgebase
The new :cve: Sphinx role takes a CVE number as an argument and creates
a hyperlink to the relevant ISC Knowledgebase document that might have
more up-to-date or verbose information than the relevant release note.
This makes reaching ISC Knowledgebase pages directly from the release
notes easier.
Make all CVE references in the release notes use the new Sphinx role.
Tom Krizek [Tue, 19 Sep 2023 15:20:32 +0000 (17:20 +0200)]
Rename convenience symlink to pytest artifacts
The previous symlink name convention was prone to name collisions If a
system test contained both a shell test and a pytest module of the same
name (e.g. dnstap test has both tests.sh and tests_dnstap.py), then
these would have the same convenience symlink, which could cause test
setup issues as well as confusion when examining test artifacts.
Update the naming convention to include the full pytest module name.
This results in a slightly more verbose names for shell tests (e.g.
dnstap_sh_dnstap instead of the previous dnstap_dnstap), but it removes
the chance of a collision.
Tom Krizek [Tue, 15 Aug 2023 11:55:56 +0000 (13:55 +0200)]
Use integers for ports fixtures in pytest
Reorganize individual port fixtures and re-use the ports fixture to
obtain their number. Store it as integer and only cast it to string when
setting it as environment variable.
Tom Krizek [Thu, 7 Sep 2023 13:21:54 +0000 (15:21 +0200)]
Remove legacy runner support from conftest.py
Remove code fork for legacy runner, reorganize imports and move a
pylint-silencing snippet to the top of the file. The rest of the code
was just unindented.
Tom Krizek [Tue, 15 Aug 2023 11:40:13 +0000 (13:40 +0200)]
Remove pytest invocation from legacy runner
In order to python system tests, pytest (runner) has to be used
directly. This makes it possible to simplify the pytest runner and make
its behavior simpler and easier to extend.
The legacy runner can still be used to run shell system tests.
Tom Krizek [Wed, 20 Sep 2023 11:45:41 +0000 (13:45 +0200)]
Use 0 exit code for skipped tests in legacy runner
Since the legacy runner is no longer used in the automake test suite,
don't use the special GNU exit code indicating a skipped tests. Instead,
use 0 to avoid considering skipped tests as failed when using simpler
mechanism (such as xargs -P) to run the tests with the legacy runner.
Tom Krizek [Wed, 20 Sep 2023 08:49:44 +0000 (10:49 +0200)]
ci: make sure to use legacy test runner on EL7
EL7 doesn't have the required dependencies for the newer pytest runner.
Since make check now invokes the pytest runner, ensure that the legacy
runner will be used instead.
Tom Krizek [Fri, 8 Sep 2023 10:44:08 +0000 (12:44 +0200)]
Remove make check invocation from legacy.run.sh
The legacy runner no longer uses make check. Ensure the legacy runner
script doesn't interact with that automake target in any way. The legacy
runner script remains available to execute the legacy runner, but there
is no out-of-the box support for running tests in parallel. Other tools
such as xargs can be utilized for that.
Tom Krizek [Wed, 6 Sep 2023 11:43:18 +0000 (13:43 +0200)]
ci: switch OpenBSD job to use make check
Invoking pytest directly provides a better formatted output and more
flexibility. However, it's prudent to verify that `make check` keeps
working as expected. Use it in the OpenBSD job which isn't executed as
frequently and its output is of least concern.
Tom Krizek [Tue, 5 Sep 2023 08:29:13 +0000 (10:29 +0200)]
Modify custom-test-driver to interpret JUnit results
Pytest provides JUnit output and uses different exit codes from
Automake. Use the conversion script to interpret the JUnit test results
from python rather than relying on the status code.
Tom Krizek [Tue, 5 Sep 2023 14:16:20 +0000 (16:16 +0200)]
Convert JUnit XML from pytest into Automake .trs files
It's important to parse the JUnit result file rather than relying on the
exit code from pytest, which has a different meaning. Include a .trs test
result for each test case and set an exit code which is most appropriate
as the aggregate result (e.g. it will be set to 77 (SKIP) if there's at
least one test case that was skipped).
Tom Krizek [Mon, 4 Sep 2023 11:41:09 +0000 (13:41 +0200)]
Remove redundant dependency checks for system tests
Dependencies for these tests are already checked in prereq.sh - if the
dependencies are missing, these tests will be skipped. The extra
dependency check in Makefile.am is extraneous and only applied for the
legacy test runner.
Tom Krizek [Thu, 31 Aug 2023 11:18:17 +0000 (13:18 +0200)]
Fix pytest module detection for run.sh
To allow concurrent invocations of pytest, it is necessary to assign
ports properly to avoid conflicts. In order to do that, pytest needs to
know a complete list of all test modules.
When pytest is invoked from run.sh, the current working directory is the
system test directory. To properly detect other tests, the conftest.py
has to look in the bin/tests/system directory, rather than the current
working directory.
Tom Krizek [Wed, 30 Aug 2023 11:51:05 +0000 (13:51 +0200)]
danger: check system test convetions for pytest runner
When adding a new system test, it might easy to forget to add the
required files for the pytest runner or break a naming convention. Add
danger checks to cover these cases.
Tom Krizek [Wed, 30 Aug 2023 11:37:09 +0000 (13:37 +0200)]
Rename allow-query pytest glue file
To conform with the expected naming convention, the pytest glue file for
the `allow-query` test should use underscore as the word separator in
the python file name: allow-query/tests_sh_allow_query.py
Tom Krizek [Mon, 18 Sep 2023 15:20:01 +0000 (17:20 +0200)]
Treat bin/tests/system/_common as non-temp directory
The _common directory is a special case directory which contains shared
files for other system test directories. Make sure it's tracked in git
and not deleted during temporary directory cleanup.
Tom Krizek [Mon, 18 Sep 2023 15:25:17 +0000 (17:25 +0200)]
Rename system test directory with common files to _common
The old name "common" clashes with the convention of system test
directory naming. It appears as a system test directory, but it only
contains helper files.
To reduce confusion and to allow automatic detection of issues with
possibly missing test files, rename the helper directory to "_common".
The leading underscore indicates the directory is different and the its
name can no longer be confused with regular system test directories.
Don't warn about subject line length for the fixup commits
The fixup commits' subject line has a prefix which has its own
length, so warning about the exceeding length is not accurate.
Given that the fixup commits can not be merged, because they
cause a danger failure, it's safe to ignore the length check
for them.
In the ns__client_put_cb() callback function the 'client->manager'
pointer is guaranteed to be non-NULL, because in ns__client_request(),
before setting up the callback, the ns__client_setup() function is
called for the 'client', which makes sure that 'client->manager' is set.
Removing the NULL-check resolves the following static analyzer warning:
/lib/ns/client.c: 1675 in ns__client_put_cb()
1669 dns_message_puttemprdataset(client->message, &client->opt);
1670 }
1671 client_extendederror_reset(client);
1672
1673 dns_message_detach(&client->message);
1674
>>> CID 465168: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "client->manager" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
1675 if (client->manager != NULL) {
1676 ns_clientmgr_detach(&client->manager);
1677 }
1678
1679 /*
1680 * Detaching the task must be done after unlinking from
Artem Boldariev [Mon, 7 Aug 2023 10:40:58 +0000 (13:40 +0300)]
TLS DNS: fix 'isc__nm_uvreq_t' send request use after free error
In TLS DNS, it was possible for an 'isc__nm_uvreq_t' object to be used
twice when it was not implied, leading to use after free errors,
leading to abort()'s and/or crashes.
In particular, in the older version of the code, the following was
happening:
* 'tlsdns_send_direct()' is eventually called from
'isc__nm_async_tlsdnssend()' when sending data;
* 'tlsdns_send_direct()' could fail. If it happens early closer to the
beginning of the function, then the 'isc__nm_uvreq_t' object is passed
to 'isc__nm_failed_send_cb()' and eventually gets destroyed normally
without any negative consequences;
* However, if 'tlsdns_send_direct()' fails due to the call to
'tls_cycle()', then the send request object is re-queued to be
processed asynchronously later by a call to 'tlsdns_send_enqueue()'.
* The error processing code in 'tlsdns_send_direct()' is written
without an assumption that the send request object could be re-queued
in 'tlsdns_send_direct()' and, thus, it was passed to
'isc__nm_failed_send_cb()' for asynchronous error processing.
* As a result of asynchronous processing initiated by
`tlsdns_send_enqueue()` the 'isc__nm_uvreq_t' send request object gets
properly destroyed eventually.
* Then, as a result of asynchronous error processing initiated by the
'isc__nm_failed_send_cb()' call, the send request object is used a
second time. However, it was destroyed at the previous step.
So we have a (relatively) obvious logical mistake here. Why
'tls_cycle()' was failing is of no particular importance, as I can see
many ways for it to fail both on TLS and networking levels. In
particular, during testing with 'dnsperf', I have encountered the
following two cases:
1. 'result == ISC_R_TLSERROR' - the remote side was wrapping up,
causing errors on TLS level (e.g. handshake has not been completed
yet);
2. 'result == ISC_R_CONNECTIONRESET' - the remote side closed the
connection after we have initiated the asynchronous send request
operation and reached the point when its processing started.
One could ask: why haven't we encountered the error before? That
happened because 'send()' is, in general, a faster operation than
'read()' so we were lucky enough not to encounter this, as it took
dnsperf to make our code fail in a particular way to trigger the
problematic code path. It requires closing the connection shortly
after we have made an asynchronous 'send()' attempt and passed the
data for processing by the libuv code (or, to be precise, down our
technologies stack).
Mark Andrews [Fri, 1 Sep 2023 00:17:00 +0000 (10:17 +1000)]
Adjust level of log messages when transferring in a zone
This raises the log level of messages treated as FORMERR to NOTICE
when transfering in a zone. This also adds a missing log message
for TYPE0 and meta types received during a zone transfer.
Artem Boldariev [Wed, 23 Aug 2023 20:36:16 +0000 (23:36 +0300)]
TLS DNS: take into account partial writes by SSL_write_ex()
This commit changes TLS DNS so that partial writes by the
SSL_write_ex() function are taken into account properly. Now, before
doing encryption, we are flushing the buffers for outgoing encrypted
data.
The problem is fairly complicated and originates from the fact that it
is somewhat hard to understand by reading the documentation if and
when partial writes are supported/enabled or not, and one can get a
false impression that they are not supported or enabled by
default (https://www.openssl.org/docs/man3.1/man3/SSL_write_ex.html). I
have added a lengthy comment about that into the code because it will
be more useful there. The documentation on this topic is vague and
hard to follow.
The main point is that when SSL_write_ex() fails with
SSL_ERROR_WANT_WRITE, the OpenSSL code tells us that we need to flush
the outgoing buffers and then call SSL_write_ex() again with exactly
the same arguments in order to continue as partial write could have
happened on the previous call to SSL_write_ex() (that is not hard to
verify by calling BIO_pending(sock->tls.app_rbio) before and after the
call to SSL_write_ex() and comparing the returned values). This aspect
was not taken into account in the code.
Now, one can wonder how that could have led to the behaviour that we
saw in the #4255 bug report. In particular, how could we lose one
message and duplicate one twice? That is where things get interesting.
One needs to keep two things in mind (that is important):
Firstly, the possibility that two (or more) subsequent SSL_write_ex()
calls will be done with exactly the same arguments is very high (the
code does not guarantee that in any way, but in practice, that happens
a lot).
Secondly, the dnsperf (the software that helped us to trigger the bug)
bombed the test server with messages that contained exactly the same
data. The only difference in the responses is message IDs, which can
be found closer to the start of a message.
So, that is what was going on in the older version of the code:
1. During one of the isc_nm_send() calls, the SSL_write_ex() call
fails with SSL_ERROR_WANT_WRITE. Partial writing has happened, though,
and we wrote a part of the message with the message
ID (e.g. 2014). Nevertheless, we have rescheduled the complete send
operation asynchronously by a call to tlsdns_send_enqueue().
2. While the asynchronous request has not been completed, we try to
send the message (e.g. with ID 2015). The next isc_nm_send() or
re-queued send happens with a call to SSL_write_ex() with EXACTLY the
same arguments as in the case of the previous call. That is, we are
acting as if we want to complete the previously failed SSL_write_ex()
attempt (according to the OpenSSL documentation:
https://www.openssl.org/docs/man3.1/man3/SSL_write_ex.html, the
"Warnings" section). This way, we already have a start of the message
containing the previous ID (2014 in our case) but complete the write
request with the rest of the data given in the current write
attempt. However, as responses differ only in message ID, we end up
sending a valid (properly structured) DNS message but with the ID of
the previous one. This way, we send a message with ID from the
previous isc_nm_send() attempt. The message with the ID from the send
request from this attempt will never be sent, as the code thinks that
it is sending it now (that is how we send the message with ID 2014
instead of 2015, as in our example, thus making the message with ID
2015 never to be sent).
3. At some point later, the asynchronous send request (the rescheduled
on the first step) completes without an error, sending a second
message with the same ID (2014).
It took exhausting SSL write buffers (so that a data encryption
attempt cannot be completed in one operation) via long DoT streams in
order to exhibit the behaviour described above. The exhaustion
happened because we have not been trying to flush the buffers often
enough (especially in the case of multiple subsequent writes).
In my opinion, the origin of the problem can be described as follows:
It happened due to making wrong guesses caused by poorly written
documentation.
Artem Boldariev [Thu, 10 Aug 2023 20:08:25 +0000 (23:08 +0300)]
Allocate DNS send buffers using dedicated per-worker memory arenas
This commit ensures that memory allocations related to DNS send
buffers are routed through dedicated per-worker memory arenas in order
to decrease memory usage on high load caused by TCP-based DNS
transports.
We do that by following jemalloc developers suggestions:
Artem Boldariev [Thu, 10 Aug 2023 14:02:43 +0000 (17:02 +0300)]
Make it possible to create memory contexts backed by jemalloc arenas
This commit extends the internal memory management middleware code in
BIND so that memory contexts backed by dedicated jemalloc arenas can
be created. A new function (isc_mem_create_arena()) is added for that.
Moreover, it extends the existing code so that specialised memory
contexts can be created easily, should we need that functionality for
other future purposes. We have achieved that by passing the flags to
the underlying jemalloc-related calls. See the above
isc_mem_create_arena(), which can serve as an example of this.
Having this opens up possibilities for creating memory contexts tuned
for specific needs.
Artem Boldariev [Mon, 16 Jan 2023 16:31:08 +0000 (18:31 +0200)]
Fix building BIND on DragonFly BSD (on both older an newer versions)
This commit ensures that BIND and supplementary tools still can be
built on newer versions of DragonFly BSD. It used to be the case, but
somewhere between versions 6.2 and 6.4 the OS developers rearranged
headers and moved some function definitions around.
Before that the fact that it worked was more like a coincidence, this
time we, at least, looked at the related man pages included with the
OS.
No in depth testing has been done on this OS as we do not really
support this platform - so it is more like a goodwill act. We can,
however, use this platform for testing purposes, too. Also, we know
that the OS users do use BIND, as it is included in its ports
directory.
Building with './configure' and './configure --without-jemalloc' have
been fixed and are known to work at the time the commit is made.
Michał Kępień [Mon, 4 Sep 2023 09:54:57 +0000 (11:54 +0200)]
Move security-related information to SECURITY.md
To follow current best practices, create a short SECURITY.md file in the
root of the repository that contains information about the project's
security policy and guidelines for reporting potential security issues.
Replace the relevant bits of text in other files with references to the
new SECURITY.md file, so that the relevant information only needs to be
maintained in one place.
Replace all occurrences of the generic security-officer@isc.org email
with a dedicated address for reporting BIND 9 security issues,
bind-security@isc.org.