By inspecting the code, it was discovered that the .sendbuf member of
isc__nm_networker_t was unused and was just consuming ~64k per worker.
Remove the member and the associated allocation/deallocation.
This makes any debugging of the unit tests too hard. Future attempts
to fix #3980 should add a custom automake test harness (log driver) that
would kill the unit test after configured timeout.
Refactor the isc_quota code and fix the quota in TCP accept code
In e18541287231b721c9cdb7e492697a2a80fd83fc, the TCP accept quota code
became broken in a subtle way - the quota would get initialized on the
first accept for the server socket and then deleted from the server
socket, so it would never get applied again.
Properly fixing this required a bigger refactoring of the isc_quota API
code to make it much simpler. The new code decouples the ownership of
the quota and acquiring/releasing the quota limit.
After (during) the refactoring it became more clear that we need to use
the callback from the child side of the accepted connection, and not the
server side.
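A minimal sketch of the decoupling described above, assuming a quota
that is only a counter with a limit: whoever owns the object (here, the
caller) is separate from whoever acquires and releases a slot. All
sketch_* names are hypothetical and are not the isc_quota API; the real
code also queues a callback when over quota, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical quota: the owner embeds or allocates it, users only
 * acquire and release slots. */
typedef struct sketch_quota {
	atomic_uint used;
	unsigned int max;
} sketch_quota_t;

static void
sketch_quota_init(sketch_quota_t *quota, unsigned int max) {
	atomic_init(&quota->used, 0);
	quota->max = max;
}

/* Returns true if a slot was taken, false if the quota is exhausted. */
static bool
sketch_quota_acquire(sketch_quota_t *quota) {
	unsigned int used = atomic_load(&quota->used);
	do {
		if (used >= quota->max) {
			return false;
		}
	} while (!atomic_compare_exchange_weak(&quota->used, &used, used + 1));
	return true;
}

/* Gives a slot back; the quota object itself stays with its owner. */
static void
sketch_quota_release(sketch_quota_t *quota) {
	unsigned int prev = atomic_fetch_sub(&quota->used, 1);
	assert(prev > 0);
	(void)prev;
}
```

In this sketch the accepted (child) connection would call
sketch_quota_release() when it closes, while the server socket keeps the
quota object alive for every later accept.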
Run closehandle_cb on run queue instead of async queue
Instead of using isc_async_run() when closing a StreamDNS handle, add an
isc_job_t member to the isc_nmhandle_t structure and use isc_job_run()
to avoid allocation/deallocation on the StreamDNS hot-path.
Accept overquota TCP connection on local thread if possible
If the quota callback is called on a thread matching the socket, call
the TCP accept function directly instead of using isc_async_run() which
allocates-deallocates memory.
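The dispatch decision can be sketched as follows. This is an
illustration only: sketch_sock_t, sketch_accept() and sketch_async_run()
are hypothetical stand-ins (the last one for isc_async_run()), and the
counters exist only so the two paths can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket: remembers which thread owns it. */
typedef struct sketch_sock {
	int tid;      /* thread the socket belongs to */
	int accepted; /* direct accepts performed */
	int queued;   /* accepts deferred to the owning thread */
} sketch_sock_t;

static void
sketch_accept(sketch_sock_t *sock) {
	sock->accepted++;
}

/* Stand-in for isc_async_run(): the real call allocates a job and
 * hands it to the owning loop. */
static void
sketch_async_run(sketch_sock_t *sock) {
	sock->queued++;
}

/* Quota callback: accept directly when already on the right thread. */
static void
sketch_quota_cb(sketch_sock_t *sock, int calling_tid) {
	assert(sock != NULL);
	if (sock->tid == calling_tid) {
		sketch_accept(sock);
	} else {
		sketch_async_run(sock);
	}
}
```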
The isc_tid() function is often called on the hot-path and its only
purpose is to return a thread_local variable. Make isc_tid() a
header-only function to save several function calls during
query-response processing.
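A sketch of what a header-only accessor can look like, assuming a C11
compiler. The sketch_* names and the "unknown" sentinel are
hypothetical; in a real header the variable would be declared extern
and defined in exactly one .c file, whereas it is static here to keep
the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TID_UNKNOWN UINT32_MAX

/* Per-thread identifier; every thread sees its own copy. */
static _Thread_local uint32_t sketch_tid_v = SKETCH_TID_UNKNOWN;

/* Inline accessor: the compiler can reduce this to a plain TLS load. */
static inline uint32_t
sketch_tid(void) {
	return sketch_tid_v;
}

/* Called once by each thread when it starts. */
static inline void
sketch_tid_set(uint32_t tid) {
	assert(tid != SKETCH_TID_UNKNOWN);
	sketch_tid_v = tid;
}
```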
Michal Nowak [Wed, 5 Apr 2023 13:55:09 +0000 (15:55 +0200)]
Do not retry in resolution_fails() on timeout
At the time of test number (19), there were 10 "sending packet to
10.53.0.7" lines in the "legacy/ns1/named.run" file; usually, only seven
are present:
I:legacy:checking recursive lookup to edns 512 + no tcp server does not cause query loops (19)
I:legacy:ns1 sent 10 queries to ns7, expected less than 10
I:legacy:failed
The three extra lines can be attributed to tests "8", "10", and "18",
where the dig in "resolution_fails()" retried after a timeout and
subsequently succeeded with "status: SERVFAIL", as seen in each of the
dig.out.test{8,10,18} files:
;; communications error to 10.53.0.1#13093: timed out
Tony Finch [Wed, 5 Apr 2023 12:42:52 +0000 (13:42 +0100)]
Correct value of DNS_NAME_MAXLABELS
It should be floor(DNS_NAME_MAXWIRE / 2) + 1 == 128
The mistake was introduced in c6bf51492dbd because:
* I was refactoring an existing `DNS_MAX_LABELS` defined as 127
* There was a longstanding bug in `dns_name_isvalid()` which
checked the number of labels against 127U instead of 128
* I mistakenly thought `dns_name_isvalid()` was correct and
`dns_name_countlabels()` was incorrect, but the reverse was true.
After this commit, occurrences of `DNS_NAME_MAXLABELS` with value
128 are consistent with the use of 127 or 128 before commit
c6bf51492dbd, except for the mistake in `dns_name_isvalid()`.
This commit adds a test case that checks the MAXLABELS case
in `dns_name_fromtext()` and `dns_name_isvalid()`.
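The arithmetic behind the value can be checked with a short sketch. It
assumes the maximum wire length is 255 octets, that every non-root
label needs at least two octets (a length octet plus one character),
and that the root label needs one. The SKETCH_* names are hypothetical
and only mirror the real macros.

```c
#include <assert.h>

#define SKETCH_NAME_MAXWIRE   255
#define SKETCH_NAME_MAXLABELS (SKETCH_NAME_MAXWIRE / 2 + 1)

/* Integer division gives floor(255 / 2) == 127, plus the root label. */
static_assert(SKETCH_NAME_MAXLABELS == 128, "127 labels plus the root");

/* Shortest possible wire length of a name with `labels` labels,
 * counting the root label as one of them. */
static int
sketch_min_wire_len(int labels) {
	return (labels - 1) * 2 + 1;
}
```

With 128 labels the shortest name is exactly 255 octets, so a 129th
label cannot fit.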
Tony Finch [Tue, 14 Feb 2023 16:13:16 +0000 (16:13 +0000)]
Use a qp-trie for the zone table
This change makes the zone table lock-free for reads. Previously, the
zone table used a red-black tree, which is not thread safe, so the hot
read path acquired both the per-view mutex and the per-zonetable
rwlock. (The double locking was to fix cleanup races on shutdown.)
One visible difference is that zones are not necessarily shut down
promptly: it depends on when the qp-trie garbage collector cleans up
the zone table. The `catz` system test checks several times that zones
have been deleted; the test now checks for zones to be removed from
the server configuration, instead of being fully shut down. The catz
test does not churn through enough zones to trigger a gc, so the zones
are not fully detached until the server exits.
After this change, it is still possible to improve the way we handle
changes to the zone table, for instance, batching changes, or better
compaction heuristics.
Tony Finch [Fri, 3 Mar 2023 12:05:51 +0000 (12:05 +0000)]
Compact more in dns_qp_compact(DNS_QPGC_ALL)
Commit 0858514ae8 enriched dns_qp_compact() to give callers more
control over how thoroughly the trie should be compacted.
In the DNS_QPGC_ALL case, if the trie is small it might be compacted
to a new position in the same memory chunk. In this situation it will
still be holding references to old leaf objects which have been
removed from the trie but will not be completely detached until the
chunk containing the references is freed.
This change resets the qp-trie allocator to a fresh chunk before a
DNS_QPGC_ALL compaction, so all the old memory chunks will be
evacuated and old leaf objects can be detached sooner.
Tony Finch [Thu, 2 Mar 2023 13:30:24 +0000 (13:30 +0000)]
Support for off-loop read-only qp-trie transactions
It is sometimes necessary to access a qp-trie outside an isc_loop,
such as in tests or an isc_work callback. The best option was to use
a `dns_qpmulti_write()` transaction, but that has overheads that are
not necessary for read-only access, such as committing a new version
of the trie even when nothing changed.
So this commit adds a `dns_qpmulti_read()` transaction, which is
nearly as lightweight as a query transaction, but it takes the mutex
like a write transaction.
Tony Finch [Fri, 10 Feb 2023 16:53:31 +0000 (16:53 +0000)]
Support for finding the longest parent domain in a qp-trie
This is the first of the "fancy" searches that know how the DNS
namespace maps on to the structure of a qp-trie. For example, it will
find the closest enclosing zone in the zone tree.
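To illustrate what "longest parent domain" means, here is a sketch that
works on text names and a flat table instead of a qp-trie; it is not
the trie algorithm. It assumes lowercase names without a trailing dot
and without escaped dots, and sketch_closest_zone() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the longest zone name that equals `name` or is a parent of
 * it, or NULL if no zone encloses the name. */
static const char *
sketch_closest_zone(const char *name, const char *const zones[],
		    size_t nzones) {
	assert(name != NULL);
	/* Try the full name first, then strip one leading label at a
	 * time, so the first hit is the longest match. */
	for (const char *suffix = name; suffix != NULL;) {
		for (size_t i = 0; i < nzones; i++) {
			if (strcmp(suffix, zones[i]) == 0) {
				return zones[i];
			}
		}
		const char *dot = strchr(suffix, '.');
		suffix = (dot != NULL) ? dot + 1 : NULL;
	}
	return NULL;
}
```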
In the check_algorithm() function openssleddsa_alg_info() is
called with two known variants of the 'algorithm' argument, and
both are expected to return a non-NULL value.
Add an INSIST to suppress the following GCC 12 analyzer report:
openssleddsa_link.c: In function 'raw_key_to_ossl':
openssleddsa_link.c:92:13: error: dereference of NULL 'alginfo' [CWE-476] [-Werror=analyzer-null-dereference]
92 | int pkey_type = alginfo->pkey_type;
| ^~~~~~~~~
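The pattern can be sketched like this, with assert() standing in for
INSIST. The sketch_* names and the pkey_type values are hypothetical
placeholders; 15 and 16 are the DNSSEC algorithm numbers for ED25519
and ED448.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_alginfo {
	int pkey_type; /* placeholder value, not an OpenSSL constant */
} sketch_alginfo_t;

/* Returns NULL for anything but the two known EdDSA algorithms. */
static const sketch_alginfo_t *
sketch_alg_info(unsigned int algorithm) {
	static const sketch_alginfo_t ed25519 = { 1 };
	static const sketch_alginfo_t ed448 = { 2 };

	switch (algorithm) {
	case 15:
		return &ed25519;
	case 16:
		return &ed448;
	default:
		return NULL;
	}
}

static int
sketch_pkey_type(unsigned int algorithm) {
	const sketch_alginfo_t *alginfo = sketch_alg_info(algorithm);
	/* Callers only pass known algorithms; stating that here lets the
	 * analyzer rule out the NULL dereference below. */
	assert(alginfo != NULL);
	return alginfo->pkey_type;
}
```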
Mark Andrews [Tue, 4 Apr 2023 01:01:36 +0000 (11:01 +1000)]
Remove 'inst != NULL' from cleanup check in plugin_register
'inst' is guaranteed to be non-NULL at this point.
358 *instp = inst;
359
360cleanup:
CID 281450 (#2 of 2): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking inst suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
361 if (result != ISC_R_SUCCESS && inst != NULL) {
362 plugin_destroy((void **)&inst);
363 }
364
365 return (result);
Bump the requirement in the shutdown test to dnspython 2.0.0
The dnspython.Resolve.resolve() requires dnspython >= 2.0.0, but this
wasn't enforced in the shutdown system test, leading to an infinite
loop waiting for the server to start due to the failing resolve() call.
Ondřej Surý [Wed, 22 Mar 2023 14:11:17 +0000 (15:11 +0100)]
Add test for RPZ in multiple views
This adds a rudimentary test for response-policy zones in multiple
views. Different combinations are tested:
- two views with response-policy inherited from options {};
- two views with explicit response-policy using same RPZ zone name
- two views with explicit response-policy using secondary RPZ zone
Ondřej Surý [Thu, 30 Mar 2023 19:19:17 +0000 (21:19 +0200)]
Change dns_adbentry_overquota() to dns_adb_overquota()
The dns_adbentry_overquota() was violating the layering by accessing the
adbentry struct members directly. Change it to dns_adb_overquota() to
match the dns_adb API.
Attach catzs to catz instead of doing this explicitly
Instead of explicitly adding a reference to catzs (catalog zones) when
calling the update callback, attach the catzs to the catz (catalog zone)
object to keep it referenced for the whole time the catz exists.
As we are now using dispatch instead of netmgr for XFR TCP connection,
the xfrin_recv_done() will be called when cancelling the dispatch with
ISC_R_CANCELED. This could lead to a double detach from the dns_xfrin_t,
one in the xfrin_recv_done() and one in the dns_xfrin_shutdown().
Remove the extra detach from the dns_xfrin_shutdown() and rely on the
dispatch read callback to be always called.
use ISC_REFCOUNT_IMPL for external dns_zone references
use the ISC_REFCOUNT implementation for dns_zone_attach() and
_detach(). (this applies only to external zone references, not
to dns_zone_iattach() and dns_zone_idetach().)
use dns_zone_ref() where previously a dummy zone object had been
used to increment the reference count.
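For readers unfamiliar with the pattern, here is a sketch of
reference-counted ref/unref/attach/detach functions using C11 atomics.
It is hand-written for illustration and is not the expansion of the
ISC_REFCOUNT macros; the sketch_zone_* names and the destroy counter
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct sketch_zone {
	atomic_uint_fast32_t references;
} sketch_zone_t;

/* Counts destroyed objects so the behavior can be observed. */
static int sketch_zone_destroyed = 0;

static sketch_zone_t *
sketch_zone_create(void) {
	sketch_zone_t *zone = malloc(sizeof(*zone));
	if (zone != NULL) {
		atomic_init(&zone->references, 1);
	}
	return zone;
}

/* Bump the count without needing a dummy target pointer. */
static sketch_zone_t *
sketch_zone_ref(sketch_zone_t *zone) {
	assert(zone != NULL);
	atomic_fetch_add(&zone->references, 1);
	return zone;
}

/* Drop one reference; the last one frees the object. */
static void
sketch_zone_unref(sketch_zone_t *zone) {
	assert(zone != NULL);
	if (atomic_fetch_sub(&zone->references, 1) == 1) {
		sketch_zone_destroyed++;
		free(zone);
	}
}

static void
sketch_zone_attach(sketch_zone_t *source, sketch_zone_t **targetp) {
	assert(targetp != NULL && *targetp == NULL);
	*targetp = sketch_zone_ref(source);
}

static void
sketch_zone_detach(sketch_zone_t **zonep) {
	assert(zonep != NULL && *zonep != NULL);
	sketch_zone_t *zone = *zonep;
	*zonep = NULL;
	sketch_zone_unref(zone);
}
```

A plain sketch_zone_ref() call covers the case where a reference must
be held but there is no natural pointer to attach it to.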
Mark Andrews [Thu, 24 Nov 2022 03:18:20 +0000 (14:18 +1100)]
Reduce the number of verifications required
In selfsigned_dnskey only call dns_dnssec_verify if the signature's
key id matches a revoked key, the trust is pending and the key
matches a trust anchor. Previously named was calling dns_dnssec_verify
unconditionally, which resulted in busy work.
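The guard amounts to checking three cheap conditions before doing the
expensive signature verification. A sketch, with a hypothetical struct
holding the already-computed conditions (the real code derives them
from the rdatasets and the trust anchors):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the checks made for one signature. */
typedef struct sketch_sigcheck {
	bool keyid_matches_revoked_key;
	bool trust_pending;
	bool key_matches_trust_anchor;
} sketch_sigcheck_t;

/* Only when all three hold is the costly verification worth doing. */
static bool
sketch_should_verify(const sketch_sigcheck_t *check) {
	assert(check != NULL);
	return check->keyid_matches_revoked_key && check->trust_pending &&
	       check->key_matches_trust_anchor;
}
```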