Fix DNS-over-HTTP(S) implementation issues that arise under heavy
query load. Optimize resource usage for :iscman:`named` instances
that accept queries over DNS-over-HTTP(S).
Previously, :iscman:`named` would process all incoming HTTP/2 data
at once, which could overwhelm the server, especially when dealing
with clients that send requests but don't wait for responses. That
has been fixed. Now, :iscman:`named` handles HTTP/2 data in smaller
chunks and throttles reading until the remote side reads the
response data. It also throttles clients that send too many requests
at once.
Additionally, :iscman:`named` now carefully processes data sent by
some clients, which can be considered "flooding." It logs these
clients and drops connections from them.
:gl:`#4795`
In some cases, :iscman:`named` could leave DNS-over-HTTP(S)
connections in the `CLOSE_WAIT` state indefinitely. That also has
been fixed. ISC would like to thank JF Billaud for thoroughly
investigating the issue and verifying the fix.
:gl:`#5083`
See https://gitlab.isc.org/isc-projects/bind9/-/issues/4795
We started using isc_nm_bad_request() more actively throughout
codebase. In the case of HTTP/2 it can lead to a large count of
useless "Bad Request" messages in the BIND log, as often we attempt to
send such request over effectively finished HTTP/2 sessions.
This commit introduces manual read timer control as used by StreamDNS
and its underlying transports. Before that, DoH code would rely on the
timer control provided by TCP, which would reset the timer any time
some data arrived. Now, the timer is restarted only when a full DNS
message is processed in line with other DNS transports.
That change is required because we should not stop the timer when
reading from the network is paused due to throttling. We need a way to
drop timed-out clients, particularly those who refuse to read the data
we send.
This commit adds logic to make code better protected against clients
that send valid HTTP/2 data that is useless from a DNS server
perspective.
Firstly, it adds logic that protects against clients who send too
little useful (=DNS) data. We achieve that by adding a check that
eventually detects such clients with a nonfavorable useful to
processed data ratio after the initial grace period. The grace period
is limited to processing 128 KiB of data, which should be enough for
sending the largest possible DNS message in a GET request and then
some. This is the main safety belt that would detect even flooding
clients that initially behave well in order to fool the checks server.
Secondly, in addition to the above, we introduce additional checks to
detect outright misbehaving clients earlier:
The code will treat clients that open too many streams (50) without
sending any data for processing as flooding ones; The clients that
managed to send 1.5 KiB of data without opening a single stream or
submitting at least some DNS data will be treated as flooding ones.
Of course, the behaviour described above is nothing else but
heuristical checks, so they can never be perfect. At the same time,
they should be reasonable enough not to drop any valid clients,
realatively easy to implement, and have negligible computational
overhead.
DoH: process data chunk by chunk instead of all at once
Initially, our DNS-over-HTTP(S) implementation would try to process as
much incoming data from the network as possible. However, that might
be undesirable as we might create too many streams (each effectively
backed by a ns_client_t object). That is too forgiving as it might
overwhelm the server and trash its memory allocator, causing high CPU
and memory usage.
Instead of doing that, we resort to processing incoming data using a
chunk-by-chunk processing strategy. That is, we split data into small
chunks (currently 256 bytes) and process each of them
asynchronously. However, we can process more than one chunk at
once (up to 4 currently), given that the number of HTTP/2 streams has
not increased while processing a chunk.
That alone is not enough, though. In addition to the above, we should
limit the number of active streams: these streams for which we have
received a request and started processing it (the ones for which a
read callback was called), as it is perfectly fine to have more opened
streams than active ones. In the case we have reached or surpassed the
limit of active streams, we stop reading AND processing the data from
the remote peer. The number of active streams is effectively decreased
only when responses associated with the active streams are sent to the
remote peer.
Overall, this strategy is very similar to the one used for other
stream-based DNS transports like TCP and TLS.
This commit adds isc__nm_async_run() which is very similar to
isc_async_run() in newer versions of BIND: it allows calling a
callback asynchronously.
Potentially, it can be used to replace some other async operations in
other networking code, in particular the delayed I/O calls in TLS a
TCP DNS transports to name a few and remove quiet a lot of code, but
it we are unlikely to do that for the strictly maintenance only
branch, so it is protected with DoH-related #ifdefs.
It is implemented in a "universal" way mainly because doing it in the
specific code requires the same amount of code and is not simpler.
Implement TCP manual read timer control functionality
This commit adds a manual TCP read timer control mode which is
supposed to override automatic resetting of the timer when any data is
received. That can be accomplished by
`isc__nmhandle_set_manual_timer()`.
This functionality is supposed to be used by multilevel networking
transports which require finer grained control over the read
timer (TLS Stream, DoH).
The commit is essentially an implementation of the functionality from
newer versions of BIND.
Andoni Duarte [Wed, 15 Jan 2025 13:27:08 +0000 (13:27 +0000)]
[9.18] [CVE-2024-11187] sec: usr: Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if the answer has more than 13 names in the RDATA. This limits the number of lookups into the database(s) during a single client query, reducing query processing load.
Backport of MR !750
See isc-projects/bind9#5034
Merge branch '5034-security-limit-additional-9.18' into 'v9.18.33-release'
Ondřej Surý [Thu, 14 Nov 2024 09:37:29 +0000 (10:37 +0100)]
Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if
the answer has more than 13 names in the RDATA. This limits the
number of lookups into the database(s) during a single client query,
reducing query processing load.
Also, don't append any additional data to type=ANY queries. The
answer to ANY is already big enough.
Ondřej Surý [Tue, 7 Jan 2025 14:22:40 +0000 (15:22 +0100)]
Isolate using the -T noaa flag only for part of the resolver test
Instead of running the whole resolver/ns4 server with -T noaa flag,
use it only for the part where it is actually needed. The -T noaa
could interfere with other parts of the test because the answers don't
have the authoritative-answer bit set, and we could have false
positives (or false negatives) in the test because the authoritative
server doesn't follow the DNS protocol for all the tests in the resolver
system test.
Arаm Sаrgsyаn [Wed, 8 Jan 2025 10:29:14 +0000 (10:29 +0000)]
fix: dev: Fix a bug in isc_rwlock_trylock()
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.
Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.
Closes #5121
Merge branch 'aram/isc_rwlock_trylock-bugfix-9.18' into 'bind-9.18'
Aram Sargsyan [Tue, 7 Jan 2025 13:30:26 +0000 (13:30 +0000)]
Fix a bug in isc_rwlock_trylock()
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.
Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.
Mark Andrews [Wed, 11 Dec 2024 02:32:18 +0000 (13:32 +1100)]
Fix startup notify rate test
The terminating conditions for the startup notify test would
occasionally get ~20 records or get +10 seconds of records due to
a bad terminating condition. Additionally 20 samples lead to test
failures. Fix the terminating condition to use the correct conditional
(-eq -> -ge) and increase the minimum number of log entries to
average over to 22.
Michal Nowak [Thu, 12 Dec 2024 12:51:46 +0000 (12:51 +0000)]
[9.18] fix: test: Wait for "all zones loaded" after rndc reload in "database" test
After the rndc reload command finished, we might have queried the
database zone sooner than it was reloaded because rndc reloads zones
asynchronously if no specific zone was provided. We should wait for "all
zones loaded" in the ns1 log to be sure.
Closes #5075
Backport of MR !9829
Merge branch 'backport-5075-database-rndc-reload-ensure-all-zones-loaded-9.18' into 'bind-9.18'
Michal Nowak [Thu, 5 Dec 2024 10:58:12 +0000 (11:58 +0100)]
Wait for "all zones loaded" after rndc reload in "database" test
After the rndc reload command finished, we might have queried the
database zone sooner than it was reloaded because rndc reloads zones
asynchronously if no specific zone was provided. We should wait for "all
zones loaded" in the ns1 log to be sure.
Evan Hunt [Wed, 11 Dec 2024 15:53:26 +0000 (15:53 +0000)]
[9.18] fix: nil: update style guideline to reflect current practice
The style guide now mentions clang-format, doesn't parenthesize return values, and no longer calls for backward compatibility in public function names.
Backport of MR !9892
Merge branch 'backport-each-style-update-9.18' into 'bind-9.18'
Michal Nowak [Thu, 5 Dec 2024 14:50:40 +0000 (15:50 +0100)]
Set cross-version-config-tests to allow_failure in CI
The December releases suffer from the ns2/managed1.conf file not being
in the mkeys extra_artifacts. This manifests only when pytest is run
with the --setup-only option, which is the case in the
cross-version-config-tests CI job. The original issue is fixed in !9815,
but the fix will be effective only when subsequent releases are out.
Mark Andrews [Tue, 10 Dec 2024 03:36:40 +0000 (03:36 +0000)]
[9.18] fix: usr: Unknown directive in resolv.conf not handled properly
The line after an unknown directive in resolv.conf could accidentally be skipped, potentially affecting dig, host, nslookup, nsupdate, or delv. This has been fixed.
Closes #5084
Backport of MR !9865
Merge branch 'backport-5084-plain-unknown-keyword-in-resolv-conf-not-handled-propely-9.18' into 'bind-9.18'
Mark Andrews [Mon, 9 Dec 2024 03:45:38 +0000 (14:45 +1100)]
Fix parsing of unknown directives in resolv.conf
Only call eatline() to skip to the next line if we're not
already at the end of a line when parsing an unknown directive.
We were accidentally skipping the next line when there was only
a single unknown directive on the current line.
[9.18] chg: dev: Use query counters in validator code
Commit af7db8951364a89c468eda1535efb3f53adc2c1f as part of #4141 was supposed to apply the 'max-recursion-queries' quota to validator queries, but the counter was never actually passed on to 'dns_resolver_createfetch()'. This has been fixed, and the global query counter ('max-query-count', per client request) is now also added.
Related to #4980
Backport of MR !9856
Merge branch 'backport-4980-pass-counters-in-validator-createfetch-9.18' into 'bind-9.18'
Commit af7db8951364a89c468eda1535efb3f53adc2c1f as part of #4141 was
supposed to apply the 'max-recursion-queries' quota to validator
queries, but the counter was never actually passed on to
dns_resolver_createfetch(). This has been fixed, and the global query
counter ('max-query-count', per client request) is now also added.
Ondřej Surý [Fri, 6 Dec 2024 17:29:39 +0000 (18:29 +0100)]
Update picohttpparser.{c,h} with upstream repository
Upstream code doesn't do regular releases, so we need to regularly
sync the code from the upstream repository. This is synchronization up
to the commit f8d0513 from Jan 29, 2024.
While implementing the global limit 'max-query-count', initially I
thought adding the variable to the resolver structure. But the limit
is per client request so it was moved to the view structure (and
counter in ns_query structure). However, I forgot to remove the
variable from the resolver structure again. This commit fixes that.
[9.18] new: usr: Add a new option to configure the maximum number of outgoing queries per client request
The configuration option 'max-query-count' sets how many outgoing queries per client request is allowed. The existing 'max-recursion-queries' is the number of permissible queries for a single name and is reset on every CNAME redirection. This new option is a global limit on the client request. The default is 200.
This allows us to send a bit more queries while looking up a single name. The default for 'max-recursion-queries' is changed from 32 to 50.
Closes #4980 Closes #4921
Backport of MR !9737
Merge branch 'backport-4980-global-limit-outgoing-queries-9.18' into 'bind-9.18'
Changing the default for max-recursion-queries from 100 to 32 was too
strict in some cases, especially lookups in reverse IPv6 trees started
to fail more frequently. From issue #4921 it looks like 50 is a better
default.
Now that we have 'max-query-count' as a global limit of outgoing queries
per client request, we can increase the default for
'max-recursion-queries' again, as the number of recursive queries is
no longer bound by the multiple of 'max-recursion-queries' and
'max-query-restarts'.
Matthijs Mekking [Mon, 25 Nov 2024 15:27:21 +0000 (16:27 +0100)]
Add a CAMP test case
This adds a new test directory specifically for CAMP attacks. This first
test in this test directory follows multiple CNAME chains, restarting
the max-recursion-queries counter, but should bail when the global
maximum quota max-query-count is reached.
Add another option to configure how many outgoing queries per
client request is allowed. The existing 'max-recursion-queries' is
per restart, this one is a global limit.
[9.18] fix: usr: Fix nsupdate hang when processing a large update
To mitigate DNS flood attacks over a single TCP connection, we throttle the connection when the other side does not read the data. Throttling should only occur on server-side sockets, but erroneously also happened for nsupdate, which acts as a client. When nsupdate started throttling the connection, it never attempts to read again. This has been fixed.
Closes #4910
Backport of MR !9709
Merge branch 'backport-4910-nsupdate-hangs-when-processing-large-update-9.18' into 'bind-9.18'
The root cause is the fix for CVE-2024-0760 (part 3), which resets
the TCP connection on a failed send. Specifically commit 4b7c6138 stops reading on the socket
because the TCP connection is throttling.
When the tcpdns_send_cb callback thinks about restarting reading
on the socket, this fails because the socket is a client socket.
And nsupdate is a client and is using the same netmgr code.
This commit removes the requirement that the socket must be a server
socket, allowing reading on the socket again after being throttled.
Michal Nowak [Thu, 5 Dec 2024 10:47:59 +0000 (10:47 +0000)]
[9.18] fix: ci: Add ns2/managed1.conf to mkeys extra_artifacts
The ns2/managed1.conf file is created by the setup.sh script. Then, in
the tests.sh script it is moved to ns2/managed.conf. The latter file
name is in mkeys extra_artifacts, but the former one is not. This is a
problem when pytest is started with the --setup-only option as it only
runs the setup.sh script (e.g., in the cross-version-config-tests CI
job) and thus failing the "Unexpected files found" assertion.
Backport of MR !9815
Merge branch 'backport-mnowak/mkeys-add-ns2-managed1-conf-to-extra-artifacts-9.18' into 'bind-9.18'
Michal Nowak [Wed, 4 Dec 2024 17:17:40 +0000 (18:17 +0100)]
Add ns2/managed1.conf to mkeys extra_artifacts
The ns2/managed1.conf file is created by the setup.sh script. Then, in
the tests.sh script it is moved to ns2/managed.conf. The latter file
name is in mkeys extra_artifacts, but the former one is not. This is a
problem when pytest is started with the --setup-only option as it only
runs the setup.sh script (e.g., in the cross-version-config-tests CI
job) and thus failing the "Unexpected files found" assertion.
Mark Andrews [Tue, 19 Nov 2024 14:20:42 +0000 (01:20 +1100)]
Keep a local copy of the update rules to prevent UAF
Previously, the update policy rules check was moved earlier in the
sequence, and the keep rule match pointers were kept to maintain the
ability to verify maximum records by type.
However, these pointers can become invalid if server reloading
or reconfiguration occurs before update completion. To prevent
this issue, extract the maximum records by type value immediately
during processing and only keep the copy of the values instead of the
full ssurule.
Petr Špaček [Mon, 2 Dec 2024 14:01:48 +0000 (14:01 +0000)]
[9.18] chg: doc: gitchangelog: don't break lines on hyphens in relnotes
When release notes are generated, the text is wrapped and line breaks
are inserted into each paragraph (sourced from the commit message's
body). Prevent line breaks after hyphens, as these are often used for
option names. This makes it possible to easily find the options
afterwards.
Backport of MR !9801
Merge branch 'backport-nicki/gitchangelog-dont-break-on-hyphens-9.18' into 'bind-9.18'
Nicki Křížek [Mon, 2 Dec 2024 10:10:01 +0000 (11:10 +0100)]
gitchangelog: don't break lines on hyphens in relnotes
When release notes are generated, the text is wrapped and line breaks
are inserted into each paragraph (sourced from the commit message's
body). Prevent line breaks after hyphens, as these are often used for
option names. This makes it possible to easily find the options
afterwards.
Arаm Sаrgsyаn [Wed, 27 Nov 2024 15:54:08 +0000 (15:54 +0000)]
[9.18] fix: test: Fix the nslookup system test
The nslookup system test checks the count of resolved addresses in
the CNAME tests using a 'grep' match on the hostname, and ignoring
lines containing the 'canonical name' string. In order to protect
the check from intermittent failures like the 'address in use' warning
message, which then automatically resolves after a retry, edit the
'grep' matching string to also ignore the comments (as the mentioned
warning message is a comment which contains the hostname).
Closes #4948
Backport of MR !9523
Merge branch 'backport-4948-nslookup-test-fix-9.18' into 'bind-9.18'
The nslookup system test checks the count of resolved addresses in
the CNAME tests using a 'grep' match on the hostname, and ignoring
lines containing the 'canonical name' string. In order to protect
the check from intermittent failures like the 'address in use' warning
message, which then automatically resolves after a retry, edit the
'grep' matching string to also ignore the comments (as the mentioned
warning message is a comment which contains the hostname).
Mark Andrews [Wed, 27 Nov 2024 03:22:49 +0000 (03:22 +0000)]
[9.18] chg: usr: emit more helpful log for exceeding max-records-per-type
The new log message is emitted when adding or updating an RRset
fails due to exceeding the max-records-per-type limit. The log includes
the owner name and type, corresponding zone name, and the limit value.
It will be emitted on loading a zone file, inbound zone transfer
(both AXFR and IXFR), handling a DDNS update, or updating a cache DB.
It's especially helpful in the case of zone transfer, since the
secondary side doesn't have direct access to the offending zone data.
It could also be used for max-types-per-name, but this change
doesn't implement it yet as it's much less likely to happen
in practice.
Backport of MR !9509
Merge branch 'backport-helpful-log-on-toomanyrecords-9.18' into 'bind-9.18'
use more generic log module name for 'logtoomanyrecords'
DNS_LOGMODULE_RBTDB was simply inappropriate, and this
log message is actually dependent on db implementation
details, so DNS_LOGMODULE_DB would be the best choice.
JINMEI Tatuya [Thu, 29 Aug 2024 07:24:48 +0000 (16:24 +0900)]
emit more helpful log for exceeding max-records-per-type
The new log message is emitted when adding or updating an RRset
fails due to exceeding the max-records-per-type limit. The log includes
the owner name and type, corresponding zone name, and the limit value.
It will be emitted on loading a zone file, inbound zone transfer
(both AXFR and IXFR), handling a DDNS update, or updating a cache DB.
It's especially helpful in the case of zone transfer, since the
secondary side doesn't have direct access to the offending zone data.
It could also be used for max-types-per-name, but this change
doesn't implement it yet as it's much less likely to happen
in practice.