Be more aggressive when throttling the reading - when we can't send the
outgoing TCP synchronously with uv_try_write(), we start throttling the
reading immediately instead of waiting for the send buffers to fill up.
This should not affect behaved clients that read the data from the TCP
on the other end.
Ondřej Surý [Mon, 17 Jun 2024 09:40:40 +0000 (11:40 +0200)]
Be smarter about refusing to add many RR types to the database
Instead of outright refusing to add new RR types to the cache, be a bit
smarter:
1. If the new header type is in our priority list, we always add either
positive or negative entry at the beginning of the list.
2. If the new header type is negative entry, and we are over the limit,
we mark it as ancient immediately, so it gets evicted from the cache
as soon as possible.
3. Otherwise add the new header after the priority headers (or at the
head of the list).
4. If we are over the limit, evict the last entry on the normal header
list.
Ondřej Surý [Mon, 17 Jun 2024 09:40:40 +0000 (11:40 +0200)]
Expand the list of the priority types
Add HTTPS, SVCB, SRV, PTR, NAPTR, DNSKEY and TXT records to the list of
the priority types that are put at the beginning of the slabheader list
for faster access and to avoid eviction when there are more types than
the max-types-per-name limit.
Artem Boldariev [Tue, 18 Jun 2024 08:43:49 +0000 (11:43 +0300)]
Use smaller pools of requests and handles for sockets
This commit ensures that socket objects use smaller sizes for its
internal requests and handles pools. That prevents a memory allocator
from thrashing.
Artem Boldariev [Thu, 13 Jun 2024 11:34:20 +0000 (14:34 +0300)]
Avoid indefinite send re-scheduling in TLS DNS
When a peer is not reading the data we are sending it was for the TLS
DNS code to end up in a situation when it would indefinitely
reschedule send requests, effectively turning the 'uv_loop' into a
busy loop that would consume CPU cycles in endless efforts to send
outgoing data.
The main reason for that was only one send buffer dedicated for sends:
the code would re-queue sends until it is empty - that would never
happen when the remote side is not reading data.
That seems like an omission from the older day of the Network Manager
as it is quiet simple to make the code use multiple buffers for
sends. That ultimately breaks the cycle of futile send request
rescheduling.
As a side effect, this commit also gets rid of one memory copying on a
hot path.
Artem Boldariev [Wed, 12 Jun 2024 11:28:38 +0000 (14:28 +0300)]
Introduce TCP throttling into TLS DNS code
Throttling functionality was omitted from the c6f13f12cd862ffae071e56ee7e1fa9998fc23c3. This commit fixes that,
taking into account the latest developments in this area.
Artem Boldariev [Tue, 11 Jun 2024 14:20:22 +0000 (17:20 +0300)]
Do not un-throttle TCP connections on isc_nm_read()
Due to omission it was possible to un-throttle a TCP connection
previously throttled due to the peer not reading back data we are
sending.
In particular, that affected DoH code, but it could also affect other
transports (the current or future ones) that pause/resume reading
according to its internal state.
Mark Andrews [Tue, 16 Jan 2024 03:25:27 +0000 (14:25 +1100)]
Clear qctx->zversion
Clear qctx->zversion when clearing qctx->zrdataset et al in
lib/ns/query.c:qctx_freedata. The uncleared pointer could lead to
an assertion failure if zone data needed to be re-saved which could
happen with stale data support enabled.
By default we log a rekey failure on debug level. We should probably
change the log level to error. We make an exception for when the zone
is not loaded yet, it often happens at startup that a rekey is
run before the zone is fully loaded.
Evan Hunt [Sat, 1 Jun 2024 00:16:29 +0000 (17:16 -0700)]
fix a memory leak that could occur when signing
when signatures were not added because of too many types already
existing at a node, the diff was not being cleaned up; this led to
a memory leak being reported at shutdown.
Matthijs Mekking [Fri, 31 May 2024 11:08:38 +0000 (13:08 +0200)]
Add new test cases with DNSSEC signing
kasp-max-types-per-name (named2.conf.in):
An unsigned zone with RR type count on a name right below the
configured limit. Then sign the zone using KASP. Adding a RRSIG would
push it over the RR type limit per name. Signing should fail, but
the server should not crash, nor end up in infinite resign-attempt loop.
kasp-max-records-per-type-dnskey (named1.conf.in):
Test with low max-record-per-rrset limit and a DNSSEC policy requiring
more than the limit. Signing should fail.
kasp-max-types-per-name (named1.conf.in):
Each RRSIG(covered type) is counted as an individual RR type. Test the
corner case where a signed zone, which is just below the limit-1,
adds a new type - doing so would trigger signing for the new type and
thus increase the number of "types" by 2, pushing it over the limit
again.
Matthijs Mekking [Thu, 30 May 2024 10:26:03 +0000 (12:26 +0200)]
Add test cases that use DNSSEC signing
Add two new masterformat tests that use signing. In the case of
'under-limit-kasp', the signing will keep the number of records in the
RRset under the limit. In the case of 'on-limit-kasp', the signing
will push the number of records in the RRset over the limit, because
of the added RRSIG record.
Ondřej Surý [Tue, 28 May 2024 13:23:24 +0000 (15:23 +0200)]
Add a test for not caching large number of RRsets
Send a recursive query for a large number of RRsets, which should
fail when using the default max-types-per-name setting of 100, but
succeed when the cap is disabled.
Ondřej Surý [Sat, 25 May 2024 09:46:56 +0000 (11:46 +0200)]
Add a limit to the number of RR types for single name
Previously, the number of RR types for a single owner name was limited
only by the maximum number of the types (64k). As the data structure
that holds the RR types for the database node is just a linked list, and
there are places where we just walk through the whole list (again and
again), adding a large number of RR types for a single owner named with
would slow down processing of such name (database node).
Add a configurable limit to cap the number of the RR types for a single
owner. This is enforced at the database (rbtdb, qpzone, qpcache) level
and configured with new max-types-per-name configuration option that
can be configured globally, per-view and per-zone.
Evan Hunt [Fri, 24 May 2024 02:07:34 +0000 (19:07 -0700)]
Add a test for not caching large RRset
Send a recursive query for a large (2500 record) RRset, which should
fail when using the default max-records-per-type setting of 100, but
succeed when the cap is disabled.
Ondřej Surý [Thu, 23 May 2024 17:12:40 +0000 (19:12 +0200)]
Add test for not-loading and not-transfering huge RRSets
Add two new masterformat tests - the 'huge' zone fits within the ns1
limit and loads on the primary ns1 server, but must not transfer to the
ns2 secondary, and the 'uber' zone should not even load on the primary
ns1 server.
Ondřej Surý [Fri, 1 Mar 2024 07:26:07 +0000 (08:26 +0100)]
Add a limit to the number of RRs in RRSets
Previously, the number of RRs in the RRSets were internally unlimited.
As the data structure that holds the RRs is just a linked list, and
there are places where we just walk through all of the RRs, adding an
RRSet with huge number of RRs inside would slow down processing of said
RRSets.
Add a configurable limit to cap the number of the RRs in a single RRSet.
This is enforced at the database (rbtdb, qpzone, qpcache) level and
configured with new max-records-per-type configuration option that can
be configured globally, per-view and per-zone.
Ondřej Surý [Wed, 5 Jun 2024 07:15:39 +0000 (09:15 +0200)]
Limit the number of DNS message processed from a single TCP read
The single TCP read can create as much as 64k divided by the minimum
size of the DNS message. This can clog the processing thread and trash
the memory allocator because we need to do as much as ~20k allocations in
a single UV loop tick.
Limit the number of the DNS messages processed in a single UV loop tick
to just single DNS message and limit the number of the outstanding DNS
messages back to 23. This effectively limits the number of pipelined
DNS messages to that number (this is the limit we already had before).
Ondřej Surý [Tue, 4 Jun 2024 06:38:35 +0000 (08:38 +0200)]
Replace the tcp_buffers memory pool with static per-loop buffer
As a single thread can process only one TCP send at the time, we don't
really need a memory pool for the TCP buffers, but it's enough to have
a single per-loop (client manager) static buffer that's being used to
assemble the DNS message and then it gets copied into own sending
buffer.
In the future, this should get optimized by exposing the uv_try API
from the network manager, and first try to send the message directly
and allocate the sending buffer only if we need to send the data
asynchronously.
Aram Sargsyan [Tue, 12 Mar 2024 15:29:51 +0000 (15:29 +0000)]
ns_client: reuse TCP send buffers
Constantly allocating, reallocating and deallocating 64K TCP send
buffers by 'ns_client' instances takes too much CPU time.
There is an existing mechanism to reuse the ns_clent_t structure
associated with the handle using 'isc_nmhandle_getdata/_setdata'
(see ns_client_request()), but it doesn't work with TCP, because
every time ns_client_request() is called it gets a new handle even
for the same TCP connection, see the comments in
streamdns_on_complete_dnsmessage().
To solve the problem, we introduce an array of available (unused)
TCP buffers stored in ns_clientmgr_t structure so that a 'client'
working via TCP can have a chance to reuse one (if there is one)
instead of allocating a new one every time.
Ondřej Surý [Thu, 18 Jan 2024 16:24:22 +0000 (17:24 +0100)]
Throttle reading from TCP if the sends are not getting through
When TCP client would not read the DNS message sent to them, the TCP
sends inside named would accumulate and cause degradation of the
service. Throttle the reading from the TCP socket when we accumulate
enough DNS data to be sent. Currently this is limited in a way that a
single largest possible DNS message can fit into the buffer.
Artem Boldariev [Wed, 13 Mar 2024 16:04:46 +0000 (18:04 +0200)]
Keep the endpoints set reference within an HTTP/2 socket
This commit ensures that an HTTP endpoints set reference is stored in
a socket object associated with an HTTP/2 stream instead of
referencing the global set stored inside a listener.
This helps to prevent an issue like follows:
1. BIND is configured to serve DoH clients;
2. A client is connected and one or more HTTP/2 stream is
created. Internal pointers are now pointing to the data on the
associated HTTP endpoints set;
3. BIND is reconfigured - the new endpoints set object is created and
promoted to all listeners;
4. The old pointers to the HTTP endpoints set data are now invalid.
Instead referencing a global object that is updated on
re-configurations we now store a local reference which prevents the
endpoints set objects to go out of scope prematurely.
Artem Boldariev [Fri, 8 Dec 2023 12:26:46 +0000 (14:26 +0200)]
DoH: avoid potential use after free for HTTP/2 session objects
It was reported that HTTP/2 session might get closed or even deleted
before all async. processing has been completed.
This commit addresses that: now we are avoiding using the object when
we do not need it or specifically check if the pointers used are not
'NULL' and by ensuring that there is at least one reference to the
session object while we are doing incoming data processing.
This commit makes the code more resilient to such issues in the
future.
Mark Andrews [Wed, 3 Apr 2024 06:37:14 +0000 (17:37 +1100)]
Clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT
When calling dns_resolver_createfetch in resolver.c with a callback
of resume_dslookup, clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT from
options as DNS_EVENT_TRYSTALE is not an expected event type and
triggers a REQUIRE.
Mark Andrews [Fri, 16 Feb 2024 00:40:26 +0000 (11:40 +1100)]
Use a new memory context when flushing the cache
When the cache's memory context was in over memory state when the
cache was flushed it resulted in LRU cleaning removing newly entered
data in the new cache straight away until the old cache had been
destroyed enough to take it out of over memory state. When flushing
the cache create a new memory context for the new db to prevent this.
Mark Andrews [Wed, 29 Nov 2023 03:29:05 +0000 (14:29 +1100)]
Check that no primaries is logged with -4 or -6
When in -4 mode check that "IPv6 disabled and no IPv4 primaries"
is logged and when in -6 mode check that "IPv4 disabled and no IPv6
primaries" is logged.
Michał Kępień [Mon, 3 Jun 2024 11:07:21 +0000 (13:07 +0200)]
Fail for merge requests with "Affects v9.x" labels
Setting "Affects v9.x" labels on a merge request duplicates information
already present on the GitLab issue associated with that merge request.
For trivial merge requests that are not associated with any GitLab
issue, setting the "Affects v9.x" label(s) is considered unnecessary.
Trigger a failure for every merge request marked with at least one
"Affects v9.x" label.
Michał Kępień [Mon, 3 Jun 2024 11:07:21 +0000 (13:07 +0200)]
Warn about auto-generated merge request titles
Merge request titles auto-generated by GitLab are often a source of
confusion regarding the actual contents of a given merge request. Warn
for merge requests containing titles that look like auto-generated ones.
Michał Kępień [Mon, 3 Jun 2024 11:07:21 +0000 (13:07 +0200)]
Fail for branches using old-style version suffixes
Using "-v9_x" and "-v9.x" version suffixes for branch names is now
deprecated since some automation logic does not handle these. Fail for
any merge request using such old-style version suffixes.
Michał Kępień [Mon, 3 Jun 2024 11:07:21 +0000 (13:07 +0200)]
Fail for backports with "Affects v9.x" labels set
Backports are not expected to have any "Affects v9.x" labels set since
those are only meant to be set for merge requests that should have
backports created for them.
Aydın Mercan [Fri, 24 May 2024 11:56:03 +0000 (14:56 +0300)]
increase TCP4Clients/TCP6Clients after point of no failure
Failing to accept TCP/TLS connections in 9.18 detaches the quota in
isc__nm_failed_accept_cb, causing TCP4Clients and TCP6Clients statistics
to not decrease inside cleanup.
Fix by increasing the counter after the point of no failure but before
handling statistics through the client's socket is no longer valid.
Ondřej Surý [Tue, 28 May 2024 14:13:31 +0000 (16:13 +0200)]
Create the new database for AXFR from the dns_zone API
The `axfr_makedb()` didn't set the loop on the newly created database,
effectively killing delayed cleaning on such database. Move the
database creation into dns_zone API that knows all the gory details of
creating new database suitable for the zone.