Ondřej Surý [Tue, 1 Dec 2020 17:53:26 +0000 (18:53 +0100)]
Change the default value for nocookie-udp-size back to 4096
The DNS Flag Day 2020 reduced all the EDNS buffer sizes to 1232. In
this commit, we revert the default value for nocookie-udp-size back to
4096 because the option is too obscure and most people don't realize
that they also need to change this configuration option in addition to
max-udp-size.
Mark Andrews [Thu, 26 Nov 2020 04:59:14 +0000 (15:59 +1100)]
Adjust default value of "max-recursion-queries"
Since the queries sent towards root and TLD servers are now included in
the count (as a result of the fix for CVE-2020-8616),
"max-recursion-queries" has a higher chance of being exceeded by
non-attack queries. Increase its default value from 75 to 100.
Mark Andrews [Thu, 12 Nov 2020 22:45:47 +0000 (09:45 +1100)]
Tighten DNS COOKIE response handling
Fallback to TCP when we have already seen a DNS COOKIE response
from the given address and don't have one in this UDP response. This
could be a server that has turned off DNS COOKIE support, a
misconfigured anycast server with partial DNS COOKIE support, or a
spoofed response. Falling back to TCP is the correct behaviour in
all 3 cases.
Michal Nowak [Tue, 24 Nov 2020 16:39:23 +0000 (17:39 +0100)]
Write traceback file to the same directory as core file
The traceback files could overwrite each other on systems which do not
use different core dump file names for different processes. Prevent
that by writing the traceback file to the same directory as the core
dump file.
These changes still do not prevent the operating system from overwriting
a core dump file if the same binary crashes multiple times in the same
directory and core dump files are named identically for different
processes.
Diego Fronza [Wed, 25 Nov 2020 19:39:19 +0000 (16:39 -0300)]
Silence coverity warnings in query.c
Return value of dns_db_getservestalerefresh() and
dns_db_getservestalettl() functions were previously unhandled.
This commit purposefully ignore those return values since there is
no side effect if those results are != ISC_R_SUCCESS, it also supress
Coverity warnings.
Matthijs Mekking [Fri, 13 Nov 2020 11:26:05 +0000 (12:26 +0100)]
Add NSEC3PARAM unit test, refactor zone.c
Add unit test to ensure the right NSEC3PARAM event is scheduled in
'dns_zone_setnsec3param()'. To avoid scheduling and managing actual
tasks, split up the 'dns_zone_setnsec3param()' function in two parts:
1. 'dns__zone_lookup_nsec3param()' that will check if the requested
NSEC3 parameters already exist, and if a new salt needs to be
generated.
2. The actual scheduling of the new NSEC3PARAM event (if needed).
When generating a new salt, compare it with the previous NSEC3
paremeters to ensure the new parameters are different from the
previous ones.
This moves the salt generation call from 'bin/named/*.s' to
'lib/dns/zone.c'. When setting new NSEC3 parameters, you can set a new
function parameter 'resalt' to enforce a new salt to be generated. A
new salt will also be generated if 'salt' is set to NULL.
Logging salt with zone context can now be done with 'dnssec_log',
removing the need for 'dns_nsec3_log_salt'.
Matthijs Mekking [Fri, 23 Oct 2020 13:02:19 +0000 (15:02 +0200)]
Change nsec3param salt config to saltlen
Upon request from Mark, change the configuration of salt to salt
length.
Introduce a new function 'dns_zone_checknsec3aram' that can be used
upon reconfiguration to check if the existing NSEC3 parameters are
in sync with the configuration. If a salt is used that matches the
configured salt length, don't change the NSEC3 parameters.
Matthijs Mekking [Tue, 13 Oct 2020 15:48:22 +0000 (17:48 +0200)]
Check nsec3param configuration values
Check 'nsec3param' configuration for the number of iterations. The
maximum number of iterations that are allowed are based on the key
size (see https://tools.ietf.org/html/rfc5155#section-10.3).
Check 'nsec3param' configuration for correct salt. If the string is
not "-" or hex-based, this is a bad salt.
Matthijs Mekking [Tue, 13 Oct 2020 12:52:02 +0000 (14:52 +0200)]
Don't use 'rndc signing' with kasp
The 'rndc signing' command allows you to manipulate the private
records that are used to store signing state. Don't use these with
'dnssec-policy' as such manipulations may violate the policy (if you
want to change the NSEC3 parameters, change the policy and reconfig).
Matthijs Mekking [Tue, 13 Oct 2020 12:48:04 +0000 (14:48 +0200)]
Fix a reconfig bug wrt inline-signing
When doing 'rndc reconfig', named may complain about a zone not being
reusable because it has a raw version of the zone, and the new
configuration has not set 'inline-signing'. However, 'inline-signing'
may be implicitly true if a 'dnssec-policy' is used for the zone, and
the zone is not dynamic.
Improve the check in 'named_zone_reusable'. Create a new function for
checking 'inline-signing' configuration that matches existing code in
'bin/named/server.c'.
Matthijs Mekking [Tue, 13 Oct 2020 12:39:21 +0000 (14:39 +0200)]
Support for NSEC3 in dnssec-policy
Implement support for NSEC3 in dnssec-policy. Store the configuration
in kasp objects. When configuring a zone, call 'dns_zone_setnsec3param'
to queue an nsec3param event. This will ensure that any previous
chains will be removed and a chain according to the dnssec-policy is
created.
Add tests for dnssec-policy zones that uses the new 'nsec3param'
option, as well as changing to new values, changing to NSEC, and
changing from NSEC.
Michał Kępień [Thu, 26 Nov 2020 12:10:40 +0000 (13:10 +0100)]
Use proper cmocka macros for pointer checks
Make sure pointer checks in unit tests use cmocka assertion macros
dedicated for use with pointers instead of those dedicated for use with
integers or booleans.
Michał Kępień [Wed, 25 Nov 2020 11:45:47 +0000 (12:45 +0100)]
Convert add_quota() to a function
cppcheck 2.2 reports the following false positive:
lib/isc/tests/quota_test.c:71:21: error: Array 'quotas[101]' accessed at index 110, which is out of bounds. [arrayIndexOutOfBounds]
isc_quota_t *quotas[110];
^
The above is not even an array access, so this report is obviously
caused by a cppcheck bug. Yet, it seems to be triggered by the presence
of the add_quota() macro, which should really be a function. Convert
the add_quota() macro to a function in order to make the code cleaner
and to prevent the above cppcheck 2.2 false positive from being
triggered.
Michał Kępień [Wed, 25 Nov 2020 11:45:47 +0000 (12:45 +0100)]
Silence cppcheck 2.2 false positive in udp_recv()
cppcheck 2.2 reports the following false positive:
lib/dns/dispatch.c:1241:14: warning: Either the condition 'resp==NULL' is redundant or there is possible null pointer dereference: resp. [nullPointerRedundantCheck]
if (disp != resp->disp) {
^
lib/dns/dispatch.c:1212:11: note: Assuming that condition 'resp==NULL' is not redundant
if (resp == NULL) {
^
lib/dns/dispatch.c:1241:14: note: Null pointer dereference
if (disp != resp->disp) {
^
Apparently this version of cppcheck gets confused about conditional
"goto" statements because line 1241 can never be reached if 'resp' is
NULL.
Move a code block to prevent the above false positive from being
reported without affecting the processing logic.
Michał Kępień [Wed, 25 Nov 2020 11:45:47 +0000 (12:45 +0100)]
Teach cppcheck that fatal() does not return
cppcheck is not aware that the bin/dnssec/dnssectool.c:fatal() function
does not return. This triggers certain cppcheck 2.2 false positives,
for example:
bin/dnssec/dnssec-signzone.c:3470:13: warning: Either the condition 'ndskeys==8' is redundant or the array 'dskeyfile[8]' is accessed at index 8, which is out of bounds. [arrayIndexOutOfBoundsCond]
dskeyfile[ndskeys++] = isc_commandline_argument;
^
bin/dnssec/dnssec-signzone.c:3467:16: note: Assuming that condition 'ndskeys==8' is not redundant
if (ndskeys == MAXDSKEYS) {
^
bin/dnssec/dnssec-signzone.c:3470:13: note: Array index out of bounds
dskeyfile[ndskeys++] = isc_commandline_argument;
^
bin/dnssec/dnssec-signzone.c:771:20: warning: Either the condition 'l->hashbuf==NULL' is redundant or there is pointer arithmetic with NULL pointer. [nullPointerArithmeticRedundantCheck]
memset(l->hashbuf + l->entries * l->length, 0, l->length);
^
bin/dnssec/dnssec-signzone.c:767:18: note: Assuming that condition 'l->hashbuf==NULL' is not redundant
if (l->hashbuf == NULL) {
^
bin/dnssec/dnssec-signzone.c:771:20: note: Null pointer addition
memset(l->hashbuf + l->entries * l->length, 0, l->length);
^
Instead of suppressing all such warnings individually, conditionally
define a preprocessor macro which prevents them from being triggered.
Michał Kępień [Wed, 25 Nov 2020 11:45:47 +0000 (12:45 +0100)]
Remove cppcheck 2.0 false positive workarounds
The cppcheck bug which commit 4c2c93c821631e0411d3df91f1b7f2ae75535bd7
works around was fixed in cppcheck 2.2. Drop the relevant hack from the
definition of the cppcheck GitLab CI job.
Michał Kępień [Mon, 23 Nov 2020 10:46:50 +0000 (11:46 +0100)]
Enable "stress" tests to be run on demand
The "stress" test can be run in different ways, depending on:
- the tested scenario (authoritative, recursive),
- the operating system used (Linux, FreeBSD),
- the architecture used (amd64, arm64).
Currently, all supported "stress" test variants are automatically
launched for all scheduled pipelines and for pipelines started for tags;
there is no possibility of running these tests on demand, which could be
useful in certain circumstances.
Employ the "only:variables" key to enable fine-grained control over the
list of "stress" test jobs to be run for a given pipeline. Three CI
variables are used to specify the list of "stress" test jobs to create:
- BIND_STRESS_TEST_MODE: specifies the test mode to use; must be
explicitly set in order for any "stress" test job to be created;
allowed values are: "authoritative", "recursive",
- BIND_STRESS_TEST_OS: specifies the operating system to run the test
on; allowed values are: "linux", "freebsd"; defaults to "linux", may
be overridden at pipeline creation time,
- BIND_STRESS_TEST_ARCH: specifies the architecture to run the test
on; allowed values are: "amd64", "arm64"; defaults to "amd64", may
be overridden at pipeline creation time.
Since case-insensitive regular expressions are used for determining
which jobs to run, every variable described above may contain multiple
values. For example, setting the BIND_STRESS_TEST_MODE variable to
"authoritative,recursive" will cause the "stress" test to be run in both
supported scenarios (either on the default OS/architecture combination,
i.e. Linux/amd64, or, if the relevant variables are explicitly
specified, the requested OS/architecture combinations).
Matthijs Mekking [Tue, 10 Nov 2020 13:48:24 +0000 (14:48 +0100)]
Change serve-stale test stale-answer-ttl
Using a 'stale-answer-ttl' the same value as the authoritative ttl
value makes it hard to differentiate between a response from the
stale cache and a response from the authoritative server.
Change the stale-answer-ttl from 2 to 4, so that it differs from the
authoritative ttl.
Diego Fronza [Tue, 20 Oct 2020 19:07:56 +0000 (16:07 -0300)]
Wait for multiple parallel dig commands to fully finish
The strategy of running many dig commands in parallel and
waiting for the respective output files to be non empty was
resulting in random test failures, hard to reproduce, where
it was possible that the subsequent reading of the files could
have been failing due to the file's content not being fully flushed.
Instead of checking if output files are non empty, we now wait
for the dig processes to finish.
Diego Fronza [Tue, 20 Oct 2020 00:25:34 +0000 (21:25 -0300)]
Added system test for stale-refresh-time
This test works as follow:
- Query for data.example rrset.
- Sleep until its TTL expires (2 secs).
- Disable authoritative server.
- Query for data.example again.
- Since server is down, answer come from stale cache, which has
a configured stale-answer-ttl of 3 seconds.
- Enable authoritative server.
- Query for data.example again
- Since last query before activating authoritative server failed, and
since 'stale-refresh-time' seconds hasn't elapsed yet, answer should
come from stale cache and not from the authoritative server.
Diego Fronza [Tue, 20 Oct 2020 00:24:38 +0000 (21:24 -0300)]
Adjusted ancient rrset system test
Before the stale-refresh-time feature, the system test for ancient rrset
was somewhat based on the average time the previous tests and queries
were taking, thus not very precise.
After the addition of stale-refresh-time the system test for ancient
rrset started to fail since the queries for stale records (low
max-stale-ttl) were not taking the time to do a full resolution
anymore, since the answers now were coming from the cache (because the
rrset were stale and within stale-refresh-time window after the
previous resolution failure).
To handle this, the correct time to wait before rrset become ancient is
calculated from max-stale-ttl configuration plus the TTL set in the
rrset used in the tests (ans2/ans.pl).
Then before sending queries for ancient rrset, we check if we need to
sleep enough to ensure those rrset will be marked as ancient.
Diego Fronza [Mon, 19 Oct 2020 20:02:03 +0000 (17:02 -0300)]
Add stale-refresh-time option
Before this update, BIND would attempt to do a full recursive resolution
process for each query received if the requested rrset had its ttl
expired. If the resolution fails for any reason, only then BIND would
check for stale rrset in cache (if 'stale-cache-enable' and
'stale-answer-enable' is on).
The problem with this approach is that if an authoritative server is
unreachable or is failing to respond, it is very unlikely that the
problem will be fixed in the next seconds.
A better approach to improve performance in those cases, is to mark the
moment in which a resolution failed, and if new queries arrive for that
same rrset, try to respond directly from the stale cache, and do that
for a window of time configured via 'stale-refresh-time'.
Only when this interval expires we then try to do a normal refresh of
the rrset.
The logic behind this commit is as following:
- In query.c / query_gotanswer(), if the test of 'result' variable falls
to the default case, an error is assumed to have happened, and a call
to 'query_usestale()' is made to check if serving of stale rrset is
enabled in configuration.
- If serving of stale answers is enabled, a flag will be turned on in
the query context to look for stale records:
query.c:6839
qctx->client->query.dboptions |= DNS_DBFIND_STALEOK;
- A call to query_lookup() will be made again, inside it a call to
'dns_db_findext()' is made, which in turn will invoke rbdb.c /
cache_find().
- In rbtdb.c / cache_find() the important bits of this change is the
call to 'check_stale_header()', which is a function that yields true
if we should skip the stale entry, or false if we should consider it.
- In check_stale_header() we now check if the DNS_DBFIND_STALEOK option
is set, if that is the case we know that this new search for stale
records was made due to a failure in a normal resolution, so we keep
track of the time in which the failured occured in rbtdb.c:4559:
header->last_refresh_fail_ts = search->now;
- In check_stale_header(), if DNS_DBFIND_STALEOK is not set, then we
know this is a normal lookup, if the record is stale and the query
time is between last failure time + stale-refresh-time window, then
we return false so cache_find() knows it can consider this stale
rrset entry to return as a response.
The last additions are two new methods to the database interface:
- setservestale_refresh
- getservestale_refresh
Those were added so rbtdb can be aware of the value set in configuration
option, since in that level we have no access to the view object.
Mark Andrews [Tue, 13 Oct 2020 02:00:36 +0000 (13:00 +1100)]
Address TSAN error between dns_rbt_findnode() and subtractrdataset().
Having dns_rbt_findnode() in previous_closest_nsec() check of
node->data is a optimisation that triggers a TSAN error with
subtractrdataset(). find_closest_nsec() still needs to check if
the NSEC record are active or not and look for a earlier NSEC records
if it isn't. Set DNS_RBTFIND_EMPTYDATA so node->data isn't referenced
without the node lock being held.