Mark Andrews [Thu, 6 Oct 2022 06:31:40 +0000 (17:31 +1100)]
Test named's check-svcb behaviour with UPDATE
Checks that malformed _dns SVCB records are rejected unless
check-svcb is set to no, in which case they are accepted. Both
missing ALPN and missing DOHPATH are checked for.
Mark Andrews [Wed, 5 Oct 2022 06:25:21 +0000 (17:25 +1100)]
Add checking of _dns SVCB records constraints to nsupdate
_dns SVBC records have additional constrains which should be checked
when records are being added. This adds those constraint checks but
allows the user to override them using 'check-svcb no'.
Mark Andrews [Wed, 5 Oct 2022 06:18:24 +0000 (17:18 +1100)]
Add dns_rdata_checksvcb
dns_rdata_checksvcb performs data entry checks on SVCB records.
In particular that _dns SVBC record have an 'alpn' and if that 'alpn'
parameter indicates HTTP is in use that 'dophath' is present.
Matthijs Mekking [Wed, 26 Oct 2022 08:02:36 +0000 (10:02 +0200)]
Fix update forwarding bug
The wrong tls configuration was picked here. It should be of the
primary that is selected by forward->which, not zone->curprimary.
This bug may cause BIND to select the wrong primary when retrieving
the TLS settings, or cause a crash in case the wrongly selected primary
has no TLS settings.
Matthijs Mekking [Wed, 26 Oct 2022 07:51:21 +0000 (09:51 +0200)]
Add new upforwd system test
Add a new upforwd system test that checks if update forwarding still
works if the first primary is badly configured.
We cannot reuse the 'example.' zone for this test because that
checks if update forwarding works for DoT. What transport is used
in the new test is of no relevance.
Update the system test to use different known good file names for
the different zones that are being tested.
Tom Krizek [Wed, 26 Oct 2022 14:20:57 +0000 (16:20 +0200)]
Randomize algorithm selection for mkeys test
Use the ALGORITHM_SET option to use randomly selected default algorithm
in this test. Make sure the test works by using variables instead of
hard-coding values.
Tom Krizek [Tue, 25 Oct 2022 15:45:16 +0000 (17:45 +0200)]
Script for random algorithm selection in system tests
Multiple algorithm sets can be defined in this script. These can be
selected via the ALGORITHM_SET environment variable. For compatibility
reasons, "stable" set contains the currently used algorithms, since our
system tests need some changes before being compatible with randomly
selected algorithms.
The script operation is similar to the get_ports.py - environment
variables are created and then printed out as `export NAME=VALUE`
commands, to be interpreted by shell. Once we support pytest runner for
system tests, this should be a fixture instead.
Tom Krizek [Tue, 25 Oct 2022 16:00:27 +0000 (18:00 +0200)]
Export env variables in system tests
Certain variables have to be exported in order for the system tests to
work. It makes little sense to export the variables in one place/script
while they're defined in another place.
Since it makes no harm, export all the variables to make the behaviour
more predictable and consistent. Previously, some variables were
exported as environment variables, while others were just shell
variables which could be used once the configuration was sourced from
another script. However, they wouldn't be exposed to spawned processes.
For simplicity sake (and for the upcoming effort to run system tests
with pytest), export all variables that are used. TESTS, PARALLEL_UNIX
and SUBDIRS variables are automake-specific, aren't used anywhere else
and thus not exported.
Tom Krizek [Tue, 25 Oct 2022 12:05:07 +0000 (14:05 +0200)]
Support testcrypto.sh usage without including conf.sh
The only variable really needed for the script to work is the path to
the $KEYGEN binary. Allow setting this via an environment variable to
avoid loading conf.sh (and causing a chicken-egg problem). Also make
testcrypto.sh executable to allow its use from conf.sh.
Matthijs Mekking [Wed, 26 Oct 2022 07:55:55 +0000 (09:55 +0200)]
Fix config bug related to port setting
There are three levels there for the port value, with increasing
priority:
1. The default ports, defined by 'port' and 'tls-port' config options.
2. The primaries-level default port: primaries port <number> { ... };
3. The primaries element-level port: primaries { <address> port
<number>; ... };"
In 'named_config_getipandkeylist()', the 'def_port' and 'def_tlsport'
variables are extracted from level 1. The 'port' variable is extracted
from the level 2. Currently if that is unset, it defaults to the
default port ('def_port' or 'def_tlsport' depending on the transport
used), but overrides the level 2 port setting for the next primaries in
the list.
Update the code such that we inherit the port only if the level 3 port
is not set, and inherit from the default ports if the level 2 port is
also not set.
Matthijs Mekking [Wed, 26 Oct 2022 14:55:05 +0000 (16:55 +0200)]
Add xfer system test case
Add a test case that if the first primary fails, the fallback of a
second primary on plain DNS works. This is mainly to test that the port
configuration inheritance works correctly.
Evan Hunt [Mon, 24 Oct 2022 20:54:39 +0000 (13:54 -0700)]
Fix an error when building with --disable-doh
The netievent handler for isc_nmsocket_set_tlsctx() was inadvertently
ifdef'd out when BIND was built with --disable-doh, resulting in an
assertion failure on startup when DoT was configured.
Michał Kępień [Mon, 24 Oct 2022 09:05:02 +0000 (11:05 +0200)]
Bump Sphinx version to 5.3.0
Make the Sphinx version listed in doc/arm/requirements.txt match the
version currently used in GitLab CI, so that Read the Docs builds the
documentation using the same Python software versions as those used in
GitLab CI.
Aram Sargsyan [Thu, 20 Oct 2022 10:38:50 +0000 (10:38 +0000)]
Getting the "prefetch" setting from the configuration cannot fail
The "prefetch" setting is in "defaultconf" so it cannot fail, use
INSIST to confirm that.
The 'trigger' and 'eligible' variables are now prefixed with
'prefetch_' and their declaration moved to an upper level, because
there is no more additional code block after this change.
Aram Sargsyan [Tue, 18 Oct 2022 13:21:01 +0000 (13:21 +0000)]
Fix prefetch "trigger" value's documentation in ARM
For the prefetch "trigger" parameter ARM states that when a cache
record with a lower TTL value is encountered during query processing,
it is refreshed. But in reality, the record is refreshed when the TTL
value is lower or equal to the configured "trigger" value.
Fix the documentation to make it match with with the code.
Aram Sargsyan [Tue, 18 Oct 2022 12:59:47 +0000 (12:59 +0000)]
Match prefetch eligibility behavior with ARM
ARM states that the "eligibility" TTL is the smallest original TTL
value that is accepted for a record to be eligible for prefetching,
but the code, which implements the condition doesn't behave in that
manner for the edge case when the TTL is equal to the configured
eligibility value.
Fix the code to check that the TTL is greater than, or equal to the
configured eligibility value, instead of just greater than it.
Aram Sargsyan [Tue, 18 Oct 2022 13:06:19 +0000 (13:06 +0000)]
Add another prefetch check in the resolver system test
The test triggers a prefetch, but fails to check if it acutally
happened, which prevented it from catching a bug when the record's
TTL value matches the configured prefetch eligibility value.
Check that prefetch happened by comparing the TTL values.
Tony Finch [Wed, 19 Oct 2022 10:20:24 +0000 (11:20 +0100)]
Delete the `render` benchmark
Instead of fixing a Coverity complaint (and other style nits),
delete it because it needs input data that can't be generated
with the tools that ship with BIND.
Aram Sargsyan [Fri, 21 Oct 2022 08:08:55 +0000 (08:08 +0000)]
Call dns_adb_endudpfetch() on error path, if required
For UDP queries, after calling dns_adb_beginudpfetch() in fctx_query(),
make sure that dns_adb_endudpfetch() is also called on error path, in
order to adjust the quota back.
Aram Sargsyan [Fri, 21 Oct 2022 08:08:47 +0000 (08:08 +0000)]
Always call dns_adb_endudpfetch() in fctx_cancelquery() for UDP queries
It is currently possible that dns_adb_endudpfetch() is not
called in fctx_cancelquery() for a UDP query, which results
in quotas not being adjusted back.
Always call dns_adb_endudpfetch() for UDP queries.
Artem Boldariev [Wed, 19 Oct 2022 12:26:48 +0000 (15:26 +0300)]
Fix named failing to start on Solaris systems with hundreds of CPUs
This commit fixes a startup issue on Solaris systems with
many (reportedly > 510) CPUs by bumping RLIMIT_NOFILE. This appears to
be a regression from 9.11.
Ondřej Surý [Tue, 18 Oct 2022 19:46:16 +0000 (21:46 +0200)]
Replace some raw nc usage in statschannel system test with curl
For tests where the TCP connection might get interrupted abruptly,
replace the nc with curl as the data sent from server to client might
get lost because of abrupt TCP connection. This happens when the TCP
connection gets closed during sending the large request to the server.
As we already require curl for other system tests, replace the nc usage
in the statschannel test with curl that actually understands the
HTTP/1.1 protocol, so the same connection is reused for sending the
consequtive requests, but without client-side "pipelining".
For the record, the server doesn't support parallel processing of the
pipelined request, so it's a bit misnomer here, because what we are
actually testing is that we process all requests received in a single
TCP read callback.
Evan Hunt [Tue, 18 Oct 2022 20:48:52 +0000 (13:48 -0700)]
ensure RPZ lookups handle CD=1 correctly
RPZ rewrites called dns_db_findext() without passing through the
client database options; as as result, if the client set CD=1,
DNS_DBFIND_PENDINGOK was not used as it should have been, and
cache lookups failed, resulting in failure of the rewrite.
Ondřej Surý [Tue, 18 Oct 2022 19:46:16 +0000 (21:46 +0200)]
Serialize the HTTP/1.1 statschannel requests
The statschannel truncated test still terminates abruptly sometimes and
it doesn't return the answer for the first query. This might happen
when the second process_request() discovers there's not enough space
before the sending is complete and the connection is terminated before
the client gets the data.
Change the isc_http, so it pauses the reading when it receives the data
and resumes it only after the sending has completed or there's
incomplete request waiting for more data.
This makes the request processing slightly less efficient, but also less
taxing for the server, because previously all requests that has been
received via single TCP read would be processed in the loop and the
sends would be queued after the read callback has processed a full
buffer.
Ondřej Surý [Wed, 19 Oct 2022 12:13:26 +0000 (14:13 +0200)]
Fix the non-developer build with OpenSSL 1.0.2
In non-developer build, a wrong condition prevented the
isc__tls_malloc_ex, isc__tls_realloc_ex and isc__tls_free_ex to be
defined. This was causing FTBFS on platforms with OpenSSL 1.0.2.
Ondřej Surý [Wed, 19 Oct 2022 11:52:09 +0000 (13:52 +0200)]
Remove the time requirement for the statschannel truncated test
The 5 seconds requirement to finish the 'pipelined with truncated
stream' was causing spurious failures in the CI because the job runners
might be very busy and sending 128k of data might simply take some time.
Remove the time requirement altogether, there's actually no reason why
the test SHOULD or even MUST finish under 5 seconds.
Update dupsigs test for dnssec-dnskey-kskonly default. Since v9.17.20,
the dnssec-dnskey-kskonly is set to yes. Update the test to not expect
the additional RRSIG with ZSK for DNSKEY.
Speed up the test from 20 minutes to 2.5 minutes and make it part of the
default test suite executed in CI.
- decrease number of records to sign from 2000 to 500
- decrease the signing interval by a factor of 6
- shorten the final part of the test after last signing (since nothing
new happens there)
Finally, clarify misleading comments about (in)sufficient time for zone
re-signing. The time used in the test is in fact sufficient for the
re-signing to happen. If it wasn't, the previous ZSK would end up being
deleted while its signatures would still be present, which is a
situation where duplicate signatures can still happen.
Tom Krizek [Fri, 14 Oct 2022 14:59:50 +0000 (16:59 +0200)]
Revive the stress system test
Ensure the port numbers are dynamically filled in with copy_setports.
Clarify test fail condition.
Make the stress test part of the default test suite since it doesn't
seem to run too long or interfere with other tests any more (the
original note claiming so is more than 20 years old).
Tom Krizek [Mon, 10 Oct 2022 15:21:41 +0000 (17:21 +0200)]
Revive dialup system test
Properly template the port number in config files with copy_setports.
The test takes two minutes on my machine which doesn't seem like a
proper justification to exclude it from the test suite, especially
considering we run these tests in parallel nowadays. The resource usage
doesn't seems significantly increased so it shouldn't interfere with
other system tests.
There also exists a precedent for longer running system tests that are
already part of the default system test suite (e.g. serve-stale takes
almost three minutes on the same machine).
Tom Krizek [Wed, 5 Oct 2022 13:59:13 +0000 (15:59 +0200)]
Make digdelv test work in different network envs
When a target server is unreachable, the varying network conditions may
cause different ICMP message (or no message). The host unreachable
message was discovered when attempting to run the test locally while
connected to a VPN network which handles all traffic.
Extend the dig output check with "host unreachable" message to avoid a
false negative test result in certain network environments.
Michał Kępień [Tue, 6 Sep 2022 11:36:44 +0000 (13:36 +0200)]
Add tests for CVE-2022-2795
Add a test ensuring that the amount of work fctx_getaddresses() performs
for any encountered delegation is limited: delegate example.net to a set
of 1,000 name servers in the redirect.com zone, the names of which all
resolve to IP addresses that nothing listens on, and query for a name in
the example.net domain, checking the number of times the findname()
function gets executed in the process; fail if that count is excessively
large.
Since the size of the referral response sent by ans3 is about 20 kB, it
cannot be sent back over UDP (EMSGSIZE) on some operating systems in
their default configuration (e.g. FreeBSD - see the
net.inet.udp.maxdgram sysctl). To enable reliable reproduction of
CVE-2022-2795 (retry patterns vary across BIND 9 versions) and avoid
false positives at the same time (thread scheduling - and therefore the
number of fetch context restarts - vary across operating systems and
across test runs), extend bin/tests/system/resolver/ans3/ans.pl so that
it also listens on TCP and make "ns1" in the "resolver" system test
always use TCP when communicating with "ans3".
Also add a test (foo.bar.sub.tld1/TXT) that ensures the new limitations
imposed on the resolution process by the mitigation for CVE-2022-2795 do
not prevent valid, glueless delegation chains from working properly.
Artem Boldariev [Tue, 18 Oct 2022 11:42:10 +0000 (14:42 +0300)]
TLS Stream: handle successful TLS handshake after listener shutdown
It was possible that accept callback can be called after listener
shutdown. In such a case the callback pointer equals NULL, leading to
segmentation fault. This commit fixes that.
Evan Hunt [Wed, 17 Aug 2022 19:46:02 +0000 (12:46 -0700)]
test for growth of compressed pipelined responses
add a test to compare the Content-Length of successive compressed
messages on a single HTTP connection that should contain the same
data; fail if the size grows by more than 100 bytes from one query
to the next.
Matthijs Mekking [Fri, 14 Oct 2022 14:38:25 +0000 (16:38 +0200)]
Change log level when doing rekey
This log happens when BIND checks the parental-agents if the DS has
been published. But if you don't have parental-agents set up, the list
of keys to check will be empty and the result will be ISC_R_NOTFOUND.
This is not an error, so change the log level to debug in this case.
Artem Boldariev [Fri, 14 Oct 2022 17:45:40 +0000 (20:45 +0300)]
Synchronise stop listening operation for multi-layer transports
This commit introduces a primitive isc__nmsocket_stop() which performs
shutting down on a multilayered socket ensuring the proper order of
the operations.
The shared data within the socket object can be destroyed after the
call completed, as it is guaranteed to not be used from within the
context of other worker threads.
Tony Finch [Fri, 14 Oct 2022 16:37:47 +0000 (17:37 +0100)]
Include the function name when reporting unexpected errors
I.e. print the name of the function in BIND that called the system
function that returned an error. Since it was useful for pthreads
code, it seems worthwhile doing so everywhere.
Aram Sargsyan [Mon, 17 Oct 2022 08:45:45 +0000 (08:45 +0000)]
Handle large numbers when parsing/printing a duration
The isccfg_duration_fromtext() function is truncating large numbers
to 32 bits instead of capping or rejecting them, i.e. 64424509445,
which is 0xf00000005, gets parsed as 32-bit value 5 (0x00000005).
Fail parsing a duration if any of its components is bigger than
32 bits. Using those kind of big numbers has no practical use case
for a duration.
The isccfg_duration_toseconds() function can overflow the 32 bit
seconds variable when calculating the duration from its component
parts.
To avoid that, use 64-bit calculation and return UINT32_MAX if the
calculated value is bigger than UINT32_MAX. Again, a number this big
has no practical use case anyway.
The buffer for the generated duration string is limited to 64 bytes,
which, in theory, is smaller than the longest possible generated
duration string.
Use 80 bytes instead, calculated by the '7 x (10 + 1) + 3' formula,
where '7' is the count of the duration's parts (year, month, etc.), '10'
is their maximum length when printed as a decimal number, '1' is their
indicator character (Y, M, etc.), and 3 is two more indicators (P and T)
and the terminating NUL character.
Aram Sargsyan [Mon, 17 Oct 2022 08:45:26 +0000 (08:45 +0000)]
Fix an off-by-one error in cfg_print_duration()
The cfg_print_duration() checks added previously in the 'duration_test'
unit test uncovered a bug in cfg_print_duration().
When calculating the current 'str' pointer of the generated text in the
buffer 'buf', it erroneously adds 1 byte to compensate for that part's
indicator character. For example, to add 12 minutes, it needs to add
2 + 1 = 3 characters, where 2 is the length of "12", and 1 is the length
of "M" (for minute). The mistake was that the length of the indicator
is already included in 'durationlen[i]', so there is no need to
calculate it again.
In the result of this mistake the current pointer can advance further
than needed and end up after the zero-byte instead of right on it, which
essentially cuts off any further generated text. For example, for a
5 minutes and 30 seconds duration, instead of having this:
'P', 'T', '5', 'M', '3', '0', 'S', '\0'
The function generates this:
'P', 'T', '5', 'M', '\0', '3', '0', 'S', '\0'
Fix the bug by adding to 'str' just 'durationlen[i]' instead of
'durationlen[i] + 1'.