Andoni Duarte [Thu, 30 Jan 2025 12:31:14 +0000 (12:31 +0000)]
[9.18] fix: ci: remove allow failure in cross version config tests
From https://gitlab.isc.org/isc-projects/bind9/-/issues/5087, the relevant MRs have been merged in the January 2025 release. Hence this MR removes `allow_failure: true` in CI.
Backport of MR !10026
Merge branch 'backport-andoni/remove-allow-failure-in-cross-version-config-tests-9.18' into 'bind-9.18'
Michał Kępień [Thu, 30 Jan 2025 06:44:18 +0000 (07:44 +0100)]
Fix "rndc flushname" for longer name server names
dns_adb_flushname() calls dns_name_hash() to determine the ADB bucket
number to search for the given name. Meanwhile, all other functions in
lib/dns/adb.c call dns_name_fullhash() for determining the bucket number
instead. This discrepancy causes dns_adb_flushname() to have virtually
no chances of actually removing the given name from the ADB if the
name is longer than 16 bytes (since dns_name_hash() only hashes the
first 16 bytes of the name provided to it) - more specifically, the
probability of success for names longer than 16 bytes is inversely
proportional to the number of ADB buckets in use, i.e. 1:1021 at best.
Fix by using dns_name_fullhash() instead of dns_name_hash() in
dns_adb_flushname(), so that the logic for determining the bucket number
that a given name belongs to is consistent throughout lib/dns/adb.c.
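A minimal sketch of why mixing the two hash functions breaks the lookup. The hash functions below are stand-ins built on zlib.crc32, not the real dns_name_hash() or dns_name_fullhash(), and names are plain text rather than wire format; only the bucket count (1021) comes from the message above.

```python
import zlib

NBUCKETS = 1021  # bucket count implied by the "1:1021" figure above


def prefix_hash(name: bytes) -> int:
    """Stand-in for a hash that only covers the first 16 bytes."""
    return zlib.crc32(name[:16].lower())


def full_hash(name: bytes) -> int:
    """Stand-in for a hash that covers the whole name."""
    return zlib.crc32(name.lower())


class ToyADB:
    """Toy bucketed store; every insert uses the full hash."""

    def __init__(self):
        self.buckets = [set() for _ in range(NBUCKETS)]

    def add(self, name: bytes) -> None:
        self.buckets[full_hash(name) % NBUCKETS].add(name)

    def flush(self, name: bytes, hashfn) -> bool:
        """Remove the name from the bucket chosen by hashfn, if present."""
        bucket = self.buckets[hashfn(name) % NBUCKETS]
        if name in bucket:
            bucket.remove(name)
            return True
        return False
```

With this model, flush(name, full_hash) always finds a name that was added, while flush(name, prefix_hash) is only guaranteed to work when the name is 16 bytes or shorter, because only then do the two functions hash the same input.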
Nicki Křížek [Wed, 29 Jan 2025 14:51:51 +0000 (14:51 +0000)]
[9.18] chg: ci: Use make clean to reduce artifacts in successful jobs
Reduce the amount of artifacts stored by running make clean at the end
of unit and system test run. If any of the previous commands fail, the
runner will stop executing the commands in `script` immediately, so the
cleanup only happens if none of the previous commands failed.
The build artifacts from unit and system tests are not re-used anywhere and
should be safe to throw away immediately.
Backport of MR !10015
Merge branch 'backport-nicki/reduce-ci-artifacts-9.18' into 'bind-9.18'
Nicki Křížek [Tue, 28 Jan 2025 14:23:01 +0000 (15:23 +0100)]
Use make clean to reduce artifacts in successful jobs
Reduce the amount of artifacts stored by running make clean at the end
of unit and system test run. If any of the previous commands fail, the
runner will stop executing the commands in `script` immediately, so the
cleanup only happens if none of the previous commands failed.
The build artifacts from unit and system tests are not re-used anywhere and
should be safe to throw away immediately. Same for respdiff.
Nicki Křížek [Tue, 28 Jan 2025 13:40:43 +0000 (13:40 +0000)]
[9.18] fix: ci: Run merged-metadata job for release branches in private repo
The prior regex didn't match the actual names we use for release
branches in the private repo. This caused the merged-metadata job to not
be created upon merging to a release branch, resulting in the private MR
not being properly milestoned.
Use the correct regex along with protecting the v9.*-release branches in
the gitlab UI so that they have access to the token used to perform the
required API operations.
Backport of MR !10003
Merge branch 'backport-nicki/ci-fix-post-merge-in-private-repo-9.18' into 'bind-9.18'
Nicki Křížek [Mon, 27 Jan 2025 14:30:39 +0000 (15:30 +0100)]
Run merged-metadata job for release branches in private repo
The prior regex didn't match the actual names we use for release
branches in the private repo. This caused the merged-metadata job to not
be created upon merging to a release branch, resulting in the private MR
not being properly milestoned.
Use the correct regex along with protecting the v9.*-release branches in
the gitlab UI so that they have access to the token used to perform the
required API operations.
Michal Nowak [Mon, 19 Feb 2024 14:55:00 +0000 (15:55 +0100)]
Add DoH and DoT stress tests, generate test configurations
Add DoH and DoT stress test jobs. The DoH scenario on FreeBSD is omitted
because all of Flamethrower's DoH queries time out on this platform.
Since the response rate of DoT queries is lower than that of DoH and
TCP, the expected TCP response rate is 80%.
Due to the large number of similar stress test configurations, the
"util/generate-stress-test-configs.py" script now generates them as part
of a downstream pipeline. The script is expected to be run exclusively
within the CI environment, which sources all environmental variables and
files.
This refactoring brought the following changes:
- To start a stress test immediately and not wait for artifacts of the
autoreconf job, run the "autoreconf -fi" command as part of every job.
- Drop the BIND_STRESS_TEST_* variables as they were rarely used and
conflicted with mode and platform selection in the configuration
generator.
- Most pipelines now include a few short, randomly selected stress test
jobs. To schedule all stress tests, set the ALL_BIND_STRESS_TESTS
environmental variable, push a tag to CI, or run a scheduled pipeline.
- Set the BIND_STRESS_TESTS_RUN_TIME environmental variable to pick the
stress test runtime of your choosing; set the BIND_STRESS_TESTS_RATE
environmental variable to use a query rate different from the default.
- Job timeout is set to 30 minutes plus stress test runtime in minutes.
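The timeout rule in the last item can be sketched as follows. This is not the code from util/generate-stress-test-configs.py; the function name, the default runtime of 60, and the assumption that BIND_STRESS_TESTS_RUN_TIME is expressed in minutes are all illustrative.

```python
import os

BASE_TIMEOUT_MINUTES = 30  # fixed part of the job timeout, per the text


def job_timeout_minutes(env=os.environ, default_runtime=60):
    """Return the job timeout: 30 minutes plus the stress test runtime.

    default_runtime is a made-up fallback used only for this sketch.
    """
    runtime = int(env.get("BIND_STRESS_TESTS_RUN_TIME", default_runtime))
    return BASE_TIMEOUT_MINUTES + runtime
```

For example, a runtime of 15 minutes would give a job timeout of 45 minutes.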
Nicki Křížek [Mon, 27 Jan 2025 09:35:45 +0000 (09:35 +0000)]
[9.18] chg: ci: Ensure changelog job builds docs with the new entry
The changelog job is supposed to test that the text from GitLab MR
title&description is valid rst syntax and can be built with sphinx. In 49128fc1, the way gitchangelog generates entries was changed - it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated note is actually written to (some)
rendered file which is part of the docs so that the subsequent sphinx
build attempts to render the note.
Backport of MR !9804
Merge branch 'backport-nicki/ci-fix-changelog-job-9.18' into 'bind-9.18'
Nicki Křížek [Mon, 2 Dec 2024 14:31:53 +0000 (15:31 +0100)]
Ensure changelog job builds docs with the new entry
The changelog job is supposed to test that the text from GitLab MR
title&description is valid rst syntax and can be built with sphinx. In 49128fc1, the way gitchangelog generates entries was changed - it no
longer writes to the changelog file, but generates output on stdout
instead. Ensure the generated note is actually written to (some)
rendered file which is part of the docs so that the subsequent sphinx
build attempts to render the note.
Nicki Křížek [Thu, 23 Jan 2025 17:48:18 +0000 (17:48 +0000)]
[9.18] chg: ci: Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.
The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with occasional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.
The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.
Exceeding either of those values should be a quite clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
Backport of MR !9950
Merge branch 'backport-nicki/ci-respdiff-limits-9.18' into 'bind-9.18'
Nicki Křížek [Mon, 13 Jan 2025 13:29:24 +0000 (14:29 +0100)]
Set stricter limits for respdiff testing
Adjust the limit of maximum disagreements in respdiff results based on
recent pipeline results.
The respdiff and respdiff:asan seem to have almost identical results,
typically around 0.07 % of differences with occasional spikes up to
around 0.11 %. Similar results are for respdiff:tsan, perhaps with more
common spikes with values up to around 0.12 %. Set the limit to 0.15 %
to allow for some tolerance due to network conditions, time of day etc.
The respdiff:third-party has a slightly higher disagreements average,
with typical values being around 0.12 %. Set the limit to 0.2 %.
Exceeding either of those values should be a quite clear indication that
some resolution behaviour has changed, since the values appear to be
very stable within the newly configured limits.
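A sketch of how such limits could be checked, assuming the disagreement rate is simply disagreements divided by total answers and that a value equal to the limit still passes. The job names and percentages come from the message above; the function names are illustrative and this is not the actual respdiff tooling.

```python
# Maximum allowed disagreements, in percent, per CI job.
LIMITS_PCT = {
    "respdiff": 0.15,
    "respdiff:asan": 0.15,
    "respdiff:tsan": 0.15,
    "respdiff:third-party": 0.2,
}


def disagreement_pct(disagreements: int, total: int) -> float:
    """Share of answers that disagree, as a percentage."""
    return 100.0 * disagreements / total


def within_limit(job: str, disagreements: int, total: int) -> bool:
    """True when the job's disagreement rate does not exceed its limit."""
    return disagreement_pct(disagreements, total) <= LIMITS_PCT[job]
```

With 10,000 answers, 7 disagreements (0.07 %) pass for respdiff, while 20 disagreements (0.2 %) fail.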
Ondřej Surý [Thu, 23 Jan 2025 17:25:37 +0000 (17:25 +0000)]
fix: nil: Stop the timer when canceling the last fetch
When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer.
Closes #5136
Merge branch '5136-stop-timer-when-canceling-last-fetch-9.18' into 'bind-9.18'
Ondřej Surý [Thu, 23 Jan 2025 16:04:24 +0000 (17:04 +0100)]
Stop the timer when shutting down the fetch context
When canceling the last fetch, we also need to stop the fctx_expired
timer from possibly firing between the fctx_shutdown() call and the
fetch being actually destroyed along with the timer. As there are
multiple places where fctx_shutdown() is being called without stopping
the timer, move the fctx_stoptimer() to fctx_shutdown() and cleanup the
explicit usage.
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.
Add text to clarify that the interval is calculated from the new
validity period.
Closes #5128
Backport of MR !9955
Merge branch 'backport-5128-clarify-dnssec-signzone-interval-9.18' into 'bind-9.18'
Matthijs Mekking [Wed, 15 Jan 2025 12:47:48 +0000 (13:47 +0100)]
Clarify dnssec-signzone interval option
There was confusion about whether the interval was calculated from
the validity period provided on the command line (with -s and -e),
or from the signature being replaced.
Add text to clarify that the interval is calculated from the new
validity period.
Ondřej Surý [Wed, 22 Jan 2025 14:30:05 +0000 (14:30 +0000)]
[9.18] fix: usr: Apply the memory limit only to ADB database items
A resolver under heavy load could exhaust the memory available for storing
the information in the Address Database (ADB), effectively evicting
information already stored in the ADB. The memory used to retrieve and provide
information from the ADB is no longer subject to the memory limits
that are applied to storing the information in the Address Database.
Closes #5127
Backport of MR !9954
Merge branch 'backport-5127-change-ADB-memory-split-9.18' into 'bind-9.18'
Ondřej Surý [Wed, 15 Jan 2025 09:36:33 +0000 (10:36 +0100)]
Remove memory limit on ADB finds and fetches
Address Database (ADB) shares the memory for the short lived ADB
objects (finds, fetches, addrinfo) and the long lived ADB
objects (names, entries, namehooks). This could lead to a situation
where heavy resolver load would force the eviction of ADB objects from the
database to the point where the ADB is completely empty, leading to even
heavier resolver load.
Make the short lived ADB objects use the other memory context that we
already created for the hashmaps. This prevents the ADB overmem condition
from being triggered by the ongoing resolver fetches.
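The split can be pictured with a toy model of two memory contexts, only one of which carries a limit. The class and attribute names here are hypothetical and do not mirror the isc_mem API; the sketch only shows that allocations for short-lived objects no longer count toward the overmem condition.

```python
class MemContext:
    """Toy memory accounting context with an optional high-water mark."""

    def __init__(self, name, hi_water=None):
        self.name = name
        self.hi_water = hi_water  # None means "no limit applied"
        self.inuse = 0

    def alloc(self, nbytes):
        self.inuse += nbytes

    def free(self, nbytes):
        self.inuse -= nbytes

    @property
    def overmem(self):
        return self.hi_water is not None and self.inuse > self.hi_water


class ToyADB:
    def __init__(self, limit):
        # Long-lived objects (names, entries, namehooks): limited.
        self.db_mctx = MemContext("adb-database", hi_water=limit)
        # Short-lived objects (finds, fetches, addrinfo): not limited.
        self.transient_mctx = MemContext("adb-transient")

    def new_find(self, size):
        self.transient_mctx.alloc(size)

    def new_name(self, size):
        self.db_mctx.alloc(size)
```

In this model, a burst of finds grows only the transient context, so stored names are not evicted because of it.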
Ondřej Surý [Wed, 22 Jan 2025 14:27:44 +0000 (14:27 +0000)]
[9.18] fix: usr: Improve the resolver performance under attack
A remote client can force the DNS resolver component to consume memory faster than the resources for resolver fetches canceled due to the `recursive-clients` limit are cleaned up. If such a traffic pattern is sustained for a long period of time, the DNS server might eventually run out of available memory. This has been fixed.
It should be noted that under such a heavy attack, both with and without the fix, no outgoing DNS queries will be successful, as the generated traffic pattern will consume all the available slots for the recursive clients.
Merge branch '5110-backport-the-hashtable-use-for-fetchcontexts-9.18' into 'bind-9.18'
Ondřej Surý [Wed, 15 Jan 2025 12:02:20 +0000 (13:02 +0100)]
Replace linked lists with the hashtables to hold fetch contexts
When the recursive-clients value is too large, the linked lists holding
the fetch contexts can also grow large and since the algorithm to merge
outgoing queries is quadratic, named can get slow.
Replace the linked list with hashtable for faster lookups. This also
allows us to reduce the number of tasks (buckets) in the resolver.
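A sketch of the difference, assuming a fetch context is identified by name and type (the real key may include more fields). With a list, every new query scans all existing contexts to find one to join, so N queries cost on the order of N squared comparisons; with a hashtable, each lookup is a single keyed access. All names below are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class FetchContext:
    name: str
    rdtype: str
    waiting_clients: list = field(default_factory=list)


def find_in_list(fctxs, name, rdtype):
    """Linear scan: cost grows with the number of fetch contexts."""
    for fctx in fctxs:
        if fctx.name == name.lower() and fctx.rdtype == rdtype:
            return fctx
    return None


class FetchTable:
    """Hashtable keyed by (name, type): constant-time lookup on average."""

    def __init__(self):
        self._table = {}

    def get_or_create(self, name, rdtype):
        key = (name.lower(), rdtype)
        fctx = self._table.get(key)
        created = fctx is None
        if created:
            fctx = FetchContext(key[0], rdtype)
            self._table[key] = fctx
        return fctx, created

    def remove(self, fctx):
        self._table.pop((fctx.name, fctx.rdtype), None)
```

Two queries for the same name and type end up sharing one fetch context, which is the merging of outgoing queries that the message refers to.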
Ondřej Surý [Wed, 22 Jan 2025 13:31:39 +0000 (13:31 +0000)]
[9.18] fix: usr: Avoid unnecessary locking in the zone/cache database
Prevent lock contention among many worker threads referring to the same database node at the same time. This would improve zone and cache database performance for the heavily contended database nodes.
Backport of !9963
Closes #5130
Merge branch '5130-reduce-lock-contention-in-decrement-reference-9.18' into 'bind-9.18'
JINMEI Tatuya [Sat, 18 Jan 2025 00:54:19 +0000 (16:54 -0800)]
Optimize database decref by avoiding locking with refs > 1
Previously, this function always acquired a node write lock if it
might need node cleanup in case the reference decrements to 0. In
fact, the lock is unnecessary if the reference is larger than 1 and it
can be optimized as an "easy" case. This optimization could even be
"necessary". In some extreme cases, many worker threads could repeat
acquiring and releasing the reference on the same node, resulting in
severe lock contention for nothing (as the ref wouldn't decrement to 0
in most cases). This change prevents a noticeable performance
drop, such as query timeouts, in such cases.
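The "easy" case can be sketched as below. Python has no atomic compare-and-swap, so a small internal lock stands in for the atomic instructions; the node lock is modeled separately and its acquisitions are counted. This is an illustration of the idea under those assumptions, not the database code itself.

```python
import threading


class Node:
    def __init__(self):
        self.refs = 1
        self._atomic = threading.Lock()  # stand-in for atomic operations
        self.node_lock = threading.Lock()  # the contended "node lock"
        self.node_lock_acquisitions = 0
        self.cleaned_up = False

    def ref(self):
        with self._atomic:
            self.refs += 1

    def _decrement_if_above_one(self):
        """Atomically decrement only when the result stays above zero."""
        with self._atomic:
            if self.refs > 1:
                self.refs -= 1
                return True
            return False

    def decref(self):
        # Easy case: the count cannot reach 0, so no cleanup and no lock.
        if self._decrement_if_above_one():
            return
        # Slow path: this may be the last reference, so take the node lock.
        with self.node_lock:
            self.node_lock_acquisitions += 1
            with self._atomic:
                self.refs -= 1
                last = self.refs == 0
            if last:
                self.cleaned_up = True
```

Threads that repeatedly take and drop references on a busy node stay on the first branch and never touch the node lock.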
Ondřej Surý [Wed, 15 Jan 2025 12:02:20 +0000 (13:02 +0100)]
Shutdown the fetch context after canceling the last fetch
Currently, the fetch context will continue running even when the last
fetch (response) has been removed from the context, so named can process
and cache the answer. This can lead to a situation where the number of
outgoing recursing clients exceeds the configured number for
recursive-clients.
Be more stringent about the recursive-clients limit and shutdown the
fetch context immediately after the last fetch has been canceled from
that particular fetch context.
The last remaining tuning value was RESOLVER_NTASKS and instead of
having a variable number of tasks per CPU in named and in
dns_client, set the number of the resolver tasks to 523 (number taken
from dns_client unit) to accommodate most of the recursive-clients
values.
Reduce struct isc__nm_uvreq size from 1560 to 560 bytes
The uv_req union member of struct isc__nm_uvreq contained libuv request
types that we don't use. Turns out that uv_getnameinfo_t is 1000 bytes
big and unnecessarily enlarged the whole structure. Remove all the
unused members from the uv_req union.
After removing sockaddr_unix from isc_sockaddr, we can also remove
sockaddr_storage and reduce the isc_sockaddr size from 152 bytes to just
48 bytes needed to hold IPv6 addresses.
Fix DNS-over-HTTP(S) implementation issues that arise under heavy
query load. Optimize resource usage for :iscman:`named` instances
that accept queries over DNS-over-HTTP(S).
Previously, :iscman:`named` would process all incoming HTTP/2 data
at once, which could overwhelm the server, especially when dealing
with clients that send requests but don't wait for responses. That
has been fixed. Now, :iscman:`named` handles HTTP/2 data in smaller
chunks and throttles reading until the remote side reads the
response data. It also throttles clients that send too many requests
at once.
Additionally, :iscman:`named` now carefully processes data sent by
some clients, which can be considered "flooding." It logs these
clients and drops connections from them.
:gl:`#4795`
In some cases, :iscman:`named` could leave DNS-over-HTTP(S)
connections in the `CLOSE_WAIT` state indefinitely. That also has
been fixed. ISC would like to thank JF Billaud for thoroughly
investigating the issue and verifying the fix.
:gl:`#5083`
See https://gitlab.isc.org/isc-projects/bind9/-/issues/4795
We started using isc_nm_bad_request() more actively throughout the
codebase. In the case of HTTP/2 it can lead to a large number of
useless "Bad Request" messages in the BIND log, as we often attempt to
send such requests over effectively finished HTTP/2 sessions.
This commit introduces manual read timer control as used by StreamDNS
and its underlying transports. Before that, DoH code would rely on the
timer control provided by TCP, which would reset the timer any time
some data arrived. Now, the timer is restarted only when a full DNS
message is processed in line with other DNS transports.
That change is required because we should not stop the timer when
reading from the network is paused due to throttling. We need a way to
drop timed-out clients, particularly those who refuse to read the data
we send.
This commit adds logic to make code better protected against clients
that send valid HTTP/2 data that is useless from a DNS server
perspective.
Firstly, it adds logic that protects against clients who send too
little useful (=DNS) data. We achieve that by adding a check that
eventually detects clients with an unfavorable ratio of useful to
processed data after the initial grace period. The grace period
is limited to processing 128 KiB of data, which should be enough for
sending the largest possible DNS message in a GET request and then
some. This is the main safety belt that would detect even flooding
clients that initially behave well in order to fool the server's checks.
Secondly, in addition to the above, we introduce additional checks to
detect outright misbehaving clients earlier:
The code will treat clients that open too many streams (50) without
sending any data for processing as flooding ones; The clients that
managed to send 1.5 KiB of data without opening a single stream or
submitting at least some DNS data will be treated as flooding ones.
Of course, the behaviour described above is nothing but a set of
heuristic checks, so they can never be perfect. At the same time,
they should be reasonable enough not to drop any valid clients,
relatively easy to implement, and have negligible computational
overhead.
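The three checks described above can be sketched roughly as follows. The 128 KiB grace period, the 50-stream threshold, and the 1.5 KiB threshold come from the text; the minimum useful-data ratio is not stated there, so MIN_USEFUL_RATIO is a made-up value, and the exact comparison operators are one possible reading of the description.

```python
from dataclasses import dataclass

GRACE_BYTES = 128 * 1024    # grace period, in processed bytes
MAX_IDLE_STREAMS = 50       # streams opened without any data to process
MAX_BYTES_NO_PROGRESS = 1536  # 1.5 KiB
MIN_USEFUL_RATIO = 0.01     # hypothetical; not given in the text


@dataclass
class SessionStats:
    processed: int = 0       # all HTTP/2 bytes processed so far
    useful: int = 0          # bytes that were DNS message data
    streams_opened: int = 0


def is_flooding(s: SessionStats) -> bool:
    # Early check: many streams opened, nothing useful sent.
    if s.useful == 0 and s.streams_opened >= MAX_IDLE_STREAMS:
        return True
    # Early check: 1.5 KiB sent with no stream opened or no DNS data.
    if s.processed >= MAX_BYTES_NO_PROGRESS and (
        s.streams_opened == 0 or s.useful == 0
    ):
        return True
    # Main check: poor useful-to-processed ratio after the grace period.
    if s.processed > GRACE_BYTES:
        return s.useful / s.processed < MIN_USEFUL_RATIO
    return False
```

A session that sends 200 KiB of valid HTTP/2 traffic carrying only 100 bytes of DNS data would be flagged by the last check even if it passed the two early ones.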
DoH: process data chunk by chunk instead of all at once
Initially, our DNS-over-HTTP(S) implementation would try to process as
much incoming data from the network as possible. However, that might
be undesirable as we might create too many streams (each effectively
backed by a ns_client_t object). That is too forgiving as it might
overwhelm the server and thrash its memory allocator, causing high CPU
and memory usage.
Instead of doing that, we resort to processing incoming data using a
chunk-by-chunk processing strategy. That is, we split data into small
chunks (currently 256 bytes) and process each of them
asynchronously. However, we can process more than one chunk at
once (up to 4 currently), given that the number of HTTP/2 streams has
not increased while processing a chunk.
That alone is not enough, though. In addition to the above, we should
limit the number of active streams: those streams for which we have
received a request and started processing it (the ones for which a
read callback was called), as it is perfectly fine to have more opened
streams than active ones. In the case we have reached or surpassed the
limit of active streams, we stop reading AND processing the data from
the remote peer. The number of active streams is effectively decreased
only when responses associated with the active streams are sent to the
remote peer.
Overall, this strategy is very similar to the one used for other
stream-based DNS transports like TCP and TLS.
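One processing pass of the strategy described above might look like this sketch. The chunk size (256 bytes) and the per-pass cap (4 chunks) come from the text; the callback names and the max_active parameter are assumptions, and in the real code the remainder would be scheduled asynchronously rather than returned.

```python
CHUNK_SIZE = 256        # bytes per chunk
MAX_CHUNKS_AT_ONCE = 4  # chunks handled in one pass


def process_pass(buf, feed, stream_count, active_streams, max_active):
    """Process up to MAX_CHUNKS_AT_ONCE chunks; return unprocessed bytes.

    feed(chunk) hands data to the HTTP/2 layer, stream_count() returns
    the number of open streams, active_streams() the number of streams
    with a request in progress.
    """
    processed = 0
    while buf and processed < MAX_CHUNKS_AT_ONCE:
        if active_streams() >= max_active:
            break  # stop reading and processing until responses go out
        before = stream_count()
        chunk, buf = buf[:CHUNK_SIZE], buf[CHUNK_SIZE:]
        feed(chunk)
        processed += 1
        if stream_count() > before:
            break  # a new stream appeared; yield before continuing
    return buf
```

The early exit on a new stream is what keeps a single read from creating an unbounded number of streams in one go.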
This commit adds isc__nm_async_run() which is very similar to
isc_async_run() in newer versions of BIND: it allows calling a
callback asynchronously.
Potentially, it can be used to replace some other async operations in
other networking code (in particular, the delayed I/O calls in the TLS and
TCP DNS transports, to name a few) and remove quite a lot of code, but
we are unlikely to do that for the strictly maintenance-only
branch, so it is protected with DoH-related #ifdefs.
It is implemented in a "universal" way mainly because doing it in the
specific code requires the same amount of code and is not simpler.
Implement TCP manual read timer control functionality
This commit adds a manual TCP read timer control mode which is
supposed to override automatic resetting of the timer when any data is
received. That can be accomplished by
`isc__nmhandle_set_manual_timer()`.
This functionality is supposed to be used by multilevel networking
transports which require finer grained control over the read
timer (TLS Stream, DoH).
The commit is essentially an implementation of the functionality from
newer versions of BIND.
Andoni Duarte [Wed, 15 Jan 2025 13:27:08 +0000 (13:27 +0000)]
[9.18] [CVE-2024-11187] sec: usr: Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if the answer has more than 13 names in the RDATA. This limits the number of lookups into the database(s) during a single client query, reducing query processing load.
Backport of MR !750
See isc-projects/bind9#5034
Merge branch '5034-security-limit-additional-9.18' into 'v9.18.33-release'
Ondřej Surý [Thu, 14 Nov 2024 09:37:29 +0000 (10:37 +0100)]
Limit the additional processing for large RDATA sets
When answering queries, don't add data to the additional section if
the answer has more than 13 names in the RDATA. This limits the
number of lookups into the database(s) during a single client query,
reducing query processing load.
Also, don't append any additional data to type=ANY queries. The
answer to ANY is already big enough.
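The rule can be summarized in a few lines. The threshold of 13 and the ANY exception come from the message above; the function and parameter names are illustrative, and the way the names are counted here (one entry per name found in the answer's RDATA) is an assumption.

```python
MAX_ADDITIONAL_NAMES = 13  # limit stated in the commit message


def should_do_additional(qtype: str, rdata_names: list) -> bool:
    """Decide whether to look up additional-section data for an answer."""
    if qtype == "ANY":
        return False  # ANY answers never get additional data appended
    return len(rdata_names) <= MAX_ADDITIONAL_NAMES
```

For example, an NS answer listing 14 name servers would be returned without glue lookups for the additional section, while one listing 13 would still get them.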
Ondřej Surý [Tue, 7 Jan 2025 14:22:40 +0000 (15:22 +0100)]
Isolate using the -T noaa flag only for part of the resolver test
Instead of running the whole resolver/ns4 server with -T noaa flag,
use it only for the part where it is actually needed. The -T noaa
could interfere with other parts of the test because the answers don't
have the authoritative-answer bit set, and we could have false
positives (or false negatives) in the test because the authoritative
server doesn't follow the DNS protocol for all the tests in the resolver
system test.
Arаm Sаrgsyаn [Wed, 8 Jan 2025 10:29:14 +0000 (10:29 +0000)]
fix: dev: Fix a bug in isc_rwlock_trylock()
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.
Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.
Closes #5121
Merge branch 'aram/isc_rwlock_trylock-bugfix-9.18' into 'bind-9.18'
Aram Sargsyan [Tue, 7 Jan 2025 13:30:26 +0000 (13:30 +0000)]
Fix a bug in isc_rwlock_trylock()
When isc_rwlock_trylock() fails to get a read lock because another
writer was faster, it should wake up other waiting writers in case
there are no other readers, but the current code forgets about
the currently active writer when evaluating 'cntflag'.
Unset the WRITER_ACTIVE bit in 'cntflag' before checking to see if
there are other readers, otherwise the waiting writers, if they exist,
might not wake up.
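A sketch of the flag arithmetic behind this fix. It assumes 'cntflag' packs a writer-active bit in the lowest bit and counts readers in steps of READER_INCR, and that the value being examined is the one observed just before the failed reader backs out; the constant values and function names are assumptions for illustration.

```python
WRITER_ACTIVE = 0x1  # assumed: lowest bit marks an active writer
READER_INCR = 0x2    # assumed: each reader adds this to the counter


def wake_needed_before_fix(cntflag: int, writers_waiting: bool) -> bool:
    """Compares the raw value, so the writer bit spoils the test."""
    return cntflag == READER_INCR and writers_waiting


def wake_needed_after_fix(cntflag: int, writers_waiting: bool) -> bool:
    """Clears the writer bit first, then checks for other readers."""
    cntflag &= ~WRITER_ACTIVE
    return cntflag == READER_INCR and writers_waiting
```

With one active writer and the failed reader as the only reader, the observed value is WRITER_ACTIVE | READER_INCR (0x3). The first function compares 0x3 against 0x2 and never signals the waiting writers; the second masks the writer bit and does.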
Mark Andrews [Wed, 11 Dec 2024 02:32:18 +0000 (13:32 +1100)]
Fix startup notify rate test
The terminating conditions for the startup notify test would
occasionally get ~20 records or get +10 seconds of records due to
a bad terminating condition. Additionally 20 samples lead to test
failures. Fix the terminating condition to use the correct conditional
(-eq -> -ge) and increase the minimum number of log entries to
average over to 22.
Michal Nowak [Thu, 12 Dec 2024 12:51:46 +0000 (12:51 +0000)]
[9.18] fix: test: Wait for "all zones loaded" after rndc reload in "database" test
After the rndc reload command finished, we might have queried the
database zone sooner than it was reloaded because rndc reloads zones
asynchronously if no specific zone was provided. We should wait for "all
zones loaded" in the ns1 log to be sure.
Closes #5075
Backport of MR !9829
Merge branch 'backport-5075-database-rndc-reload-ensure-all-zones-loaded-9.18' into 'bind-9.18'
Michal Nowak [Thu, 5 Dec 2024 10:58:12 +0000 (11:58 +0100)]
Wait for "all zones loaded" after rndc reload in "database" test
After the rndc reload command finished, we might have queried the
database zone sooner than it was reloaded because rndc reloads zones
asynchronously if no specific zone was provided. We should wait for "all
zones loaded" in the ns1 log to be sure.
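A generic way to wait for such a log line is to poll the file from a known offset, so that an earlier "all zones loaded" message from startup is not mistaken for the one produced by the reload. The helper below is a standalone sketch, not the helper used by the BIND system test framework; the function name and parameters are illustrative.

```python
import time
from pathlib import Path


def wait_for_line(path, needle, offset=0, timeout=10.0, interval=0.1):
    """Poll a log file until needle appears after byte offset.

    Returns True when found, False when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    pattern = needle.encode()
    while True:
        try:
            data = Path(path).read_bytes()[offset:]
        except FileNotFoundError:
            data = b""  # the log may not exist yet
        if pattern in data:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would record the log size before running rndc reload, pass it as offset, and only query the zone once the function returns True.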
Evan Hunt [Wed, 11 Dec 2024 15:53:26 +0000 (15:53 +0000)]
[9.18] fix: nil: update style guideline to reflect current practice
The style guide now mentions clang-format, doesn't parenthesize return values, and no longer calls for backward compatibility in public function names.
Backport of MR !9892
Merge branch 'backport-each-style-update-9.18' into 'bind-9.18'
Michal Nowak [Thu, 5 Dec 2024 14:50:40 +0000 (15:50 +0100)]
Set cross-version-config-tests to allow_failure in CI
The December releases suffer from the ns2/managed1.conf file not being
in the mkeys extra_artifacts. This manifests only when pytest is run
with the --setup-only option, which is the case in the
cross-version-config-tests CI job. The original issue is fixed in !9815,
but the fix will be effective only when subsequent releases are out.
Mark Andrews [Tue, 10 Dec 2024 03:36:40 +0000 (03:36 +0000)]
[9.18] fix: usr: Unknown directive in resolv.conf not handled properly
The line after an unknown directive in resolv.conf could accidentally be skipped, potentially affecting dig, host, nslookup, nsupdate, or delv. This has been fixed.
Closes #5084
Backport of MR !9865
Merge branch 'backport-5084-plain-unknown-keyword-in-resolv-conf-not-handled-propely-9.18' into 'bind-9.18'
Mark Andrews [Mon, 9 Dec 2024 03:45:38 +0000 (14:45 +1100)]
Fix parsing of unknown directives in resolv.conf
Only call eatline() to skip to the next line if we're not
already at the end of a line when parsing an unknown directive.
We were accidentally skipping the next line when there was only
a single unknown directive on the current line.
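The bug can be reproduced with a toy parser. The getword() and eatline() functions below are simplified stand-ins that only borrow the eatline() name from the message above; the parser handles just the nameserver keyword and is not the real resolv.conf code.

```python
import io


def getword(f):
    """Read one word; return (word, character that ended it)."""
    word = ""
    while True:
        ch = f.read(1)
        if ch in ("", "\n"):
            return word, ch
        if ch in " \t":
            if word:
                return word, ch
            continue  # skip leading whitespace
        word += ch


def eatline(f):
    """Consume the rest of the current line, including the newline."""
    while f.read(1) not in ("", "\n"):
        pass


def parse(text, fixed=True):
    f = io.StringIO(text)
    nameservers = []
    while True:
        word, stop = getword(f)
        if word == "" and stop == "":
            return nameservers  # end of file
        if word == "":
            continue  # blank line
        at_eol = stop in ("", "\n")
        if word == "nameserver" and not at_eol:
            addr, stop = getword(f)
            nameservers.append(addr)
            if stop not in ("", "\n"):
                eatline(f)
        elif word != "nameserver":
            # Unknown directive: skip its arguments. The old behaviour
            # called eatline() even when the word already ended the line.
            if not fixed or not at_eol:
                eatline(f)
```

With the input "unknown\nnameserver 10.0.0.1\nnameserver 10.0.0.2\n", parse(text, fixed=False) loses the first name server because eatline() swallows the line after the bare unknown directive, while parse(text) keeps both.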
[9.18] chg: dev: Use query counters in validator code
Commit af7db8951364a89c468eda1535efb3f53adc2c1f as part of #4141 was supposed to apply the 'max-recursion-queries' quota to validator queries, but the counter was never actually passed on to 'dns_resolver_createfetch()'. This has been fixed, and the global query counter ('max-query-count', per client request) is now also added.
Related to #4980
Backport of MR !9856
Merge branch 'backport-4980-pass-counters-in-validator-createfetch-9.18' into 'bind-9.18'
Commit af7db8951364a89c468eda1535efb3f53adc2c1f as part of #4141 was
supposed to apply the 'max-recursion-queries' quota to validator
queries, but the counter was never actually passed on to
dns_resolver_createfetch(). This has been fixed, and the global query
counter ('max-query-count', per client request) is now also added.