In HTTP/1.0 and HTTP/1.1, RFC 9112 section 9.6 says the last response
in a connection should include a `Connection: close` header, but the
statschannel server omitted it.
In an HTTP/1.0 response, the statschannel server can sometimes send a
`Connection: keep-alive` header when it is about to close the
connection. There are two ways:
If the first request on a connection is keep-alive and the second
request is not, then _both_ responses have `Connection: keep-alive`
but the connection is (correctly) closed after the second response.
If a single request contains
Connection: close
Connection: keep-alive
then RFC 9112 section 9.3 says the keep-alive header is ignored, but
the statschannel sends a spurious keep-alive in its response, though
it correctly closes the connection.
To fix these bugs, make it more clear that the `httpd->flags` are part
of the per-request-response state. The Connection: flags are now
described in terms of the effect they have instead of what causes them
to be set.
Michał Kępień [Thu, 15 Jun 2023 14:17:14 +0000 (16:17 +0200)]
Fix entity renumbering in util/parse_tsan.py
util/parse_tsan.py builds tables of mutexes, threads, and pointers it
finds in the TSAN report provided to it as a command-line argument and
then replaces all mentions of each of these entities so that they are
numbered sequentially in the processed report. For example, this line:
Cycle in lock order graph: M0 (...) => M5 (...) => M9 (...) => M0
is expected to become:
Cycle in lock order graph: M1 (...) => M2 (...) => M3 (...) => M1
Problems arise when the gaps between mutex/thread identifiers present on
a single line are smaller than the total number of mutexes/threads found
by the script so far. For example, the following line:
Cycle in lock order graph: M0 (...) => M1 (...) => M2 (...) => M0
first gets turned into:
Cycle in lock order graph: M1 (...) => M1 (...) => M2 (...) => M1
and then into:
Cycle in lock order graph: M2 (...) => M2 (...) => M2 (...) => M2
In other words, lines like this become garbled due to information loss.
The problem stems from the fact that the numbering scheme the script
uses for identifying mutexes and threads is exactly the same as the one
used by TSAN itself. Update util/parse_tsan.py so that it uses
zero-padded numbers instead, making the "overlapping" demonstrated above
impossible.
Ondřej Surý [Thu, 15 Jun 2023 08:24:21 +0000 (10:24 +0200)]
Make isc_result tables smaller
The isc_result_t enum was to sparse when each library code would skip to
next << 16 as a base. Remove the huge holes in the isc_result_t enum to
make the isc_result tables more compact.
This change required a rewrite how we map dns_rcode_t to isc_result_t
and back, so we don't ever return neither isc_result_t value nor
dns_rcode_t out of defined range.
Ondřej Surý [Thu, 15 Jun 2023 08:24:21 +0000 (10:24 +0200)]
Refactor how we map isc_result_t <-> dns_rcode_t
The mapping functions between isc_result_t and dns_rcode_t could return
both isc_result_t values not defined in the header and dns_rcode_t
values not defined in the header because it blindly maps anything
withing full 12-bits defined for RCODEs to isc_result_t and back.
Refactor the dns_result_{from,to}rcode() functions to always return
valid isc_result_t and dns_rcode_t values by explicitly mapping the
values to each other and returning DNS_R_SERVFAIL (dns_rcode_servfail)
when encountering value out of the defined range.
Tom Krizek [Thu, 15 Jun 2023 08:48:21 +0000 (10:48 +0200)]
Mark CI failure of system:clang:tsan as an error again
Both the issues causing frequent failures have been resolved. The job
seems to have stabilized and there's no longer a need to mark the
failure as a mere warnings.
Aram Sargsyan [Fri, 9 Jun 2023 07:13:27 +0000 (07:13 +0000)]
Fix a data race between the dns_zone and dns_catz modules
The dns_zone_catz_enable_db() and dns_zone_catz_disable_db()
functions can race with similar operations in the catz module
because there is no synchronization between the threads.
Add catz functions which use the view's catalog zones' lock
when registering/unregistering the database update notify callback,
and use those functions in the dns_zone module, instead of doing it
directly.
Tony Finch [Fri, 9 Jun 2023 07:22:47 +0000 (08:22 +0100)]
CHANGES note for [GL #4134]
[cleanup] Report "permission denied" instead of "unexpected error"
when trying to update a zone file is on a read-only file
system. Thanks to Midnight Veil. [GL #4134]
Mark Andrews [Thu, 8 Jun 2023 06:58:45 +0000 (16:58 +1000)]
Use RCU for view->adb access
view->adb may be referenced while the view is shutting down as the
zone uses a weak reference to the view and examines view->adb but
dns_view_detach call dns_adb_detach to clear view->adb.
- style fixes and general tidying-up in tkey.c
- remove the unused 'intoken' parameter from dns_tkey_buildgssquery()
- remove an unnecessary call to dns_tkeyctx_create() in ns_server_create()
(the TKEY context that was created there would soon be destroyed and
another one created when the configuration was loaded).
purely to assuage my desire for consistency across modules,
result variables have been renamed to 'result' as they are
throughout most of BIND. there are no other changes.
this function was no longer needed, because the algorithm name is no
longer copied into the tsigkey object by dns_tsigkey_createfromkey();
it's always just a pointer to a statically defined name.
use algorithm number instead of name to create TSIG keys
the prior practice of passing a dns_name containing the
expanded name of an algorithm to dns_tsigkey_create() and
dns_tsigkey_createfromkey() is unnecessarily cumbersome;
we can now pass the algorithm number instead.
- remove the 'ring' parameter from dns_tsigkey_createfromkey(),
and use dns_tsigkeyring_add() to add key objects to a keyring instead.
- add a magic number to dns_tsigkeyring_t
- change dns_tsigkeyring_dumpanddetach() to dns_tsigkeyring_dump();
we now call dns_tsigkeyring_detach() separately.
- remove 'maxgenerated' from dns_tsigkeyring_t since it never changes.
- style cleanups.
- simplify the function parameters to dns_tsigkey_create():
+ remove 'restored' and 'generated', they're only ever set to false.
+ remove 'creator' because it's only ever set to NULL.
+ remove 'inception' and 'expiry' because they're only ever set to
(0, 0) or (now, now), and either way, this means "never expire".
+ remove 'ring' because we can just use dns_tsigkeyring_add() instead.
- rename dns_keyring_restore() to dns_tsigkeyring_restore() to match the
rest of the functions operating on dns_tsigkeyring objects.
Matthijs Mekking [Wed, 14 Jun 2023 07:05:31 +0000 (09:05 +0200)]
Update findzonekeys function name in log message
The "dns_dnssec_findzonekeys2" log message is a leftover from when that
was the name of the function. Rename to match the current name of the
function.
Matthijs Mekking [Tue, 13 Jun 2023 15:45:08 +0000 (17:45 +0200)]
Add dynamic update prepub and doubleksk test case
Add two test cases for zones that use auto-dnssec, but not
inline-signing, and make sure that the change for find_zone_keys()
do not affect introducing a new key that is intended for signing.
See note https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/7638#note_355944
The find_zone_keys() function was not working properly for
inline-signed zones. It only worked if the DNSKEY records were also
published in the unsigned version of the zone. But this is not the
case when you use dnssec-policy, the DNSKEY records will only occur
in the signed version of the zone. Therefor, when looking for keys
to sign the zone, only the newly added keys in the dynamic update
were found (which could be zero), ignoring existing keys.
Also, if a DNSKEY was added, it would try to sign the zone with just
this new key, and this would only work if the key files for that key
were imported into the key-directory.
This is a design error, because the goal is to sign the zone with the
keys for which we actually have key files for. So instead of looking
for DNSKEY records to then search for the matching key files, call
dns_dnssec_findmatchingkeys() which just looks for the keys we have
on disk for the given zone. It will also set the correct DNSSEC
signing hints.
Matthijs Mekking [Tue, 13 Jun 2023 13:59:53 +0000 (15:59 +0200)]
Add log check in multisigner system test
When we add DNSKEY records via dynamic update, this should no longer
trigger signing the zone with these keys. This currently happens when
'find_zone_keys()' looks up the keys by inspecting the DNSKEY RRset,
then attempting to read the corresponding key files.
Add checks that inspect the logs whether an attempt to read the key
files for the newly added keys was done (and failed because these files
are not available).
Aram Sargsyan [Tue, 13 Jun 2023 09:58:29 +0000 (09:58 +0000)]
Fix catz db update callback registration logic error
When a catalog zone is updated using AXFR, the zone database is changed,
so it is required to unregister the update notification callback from
the old database, and register it for the new one.
Currently, here is the order of the steps happening in such scenario:
1. The zone.c:zone_startload() function registers the notify callback
on the new database using dns_zone_catz_enable_db()
2. The callback, when called, notices that the new 'db' is different
than 'catz->db', and unregisters the old callback for 'catz->db',
marks that it's unregistered by setting 'catz->db_registered' to
false, then it schedules an update if it isn't already scheduled.
3. The offloaded update process, after completing its job, notices that
'catz->db_registered' is false, and (re)registers the update callback
for the current database it is working on. There is no harm here even
if it was registered also on step 1, and we can't skip it, because
this function can also be called "artificially" during a
reconfiguration, and in that case the registration step is required
here.
A problem arises when before step 1 an update process was already
in a running state, operating on the old database, and finishing its
work only after step 2. As described in step 3, dns__catz_update_cb()
notices that 'catz->db_registered' is false and registers the callback
on the current database it is working on, which, at that state, is
already obsolete and unused by the zone. When it detaches the database,
the function which is responsible for its cleanup (e.g. free_rbtdb())
asserts because there is a registered update notify callback there.
To fix the problem, instead of delaying the (re)registration to step 3,
make sure that the new callback is registered and 'catz->db_registered'
is accordingly marked on step 2.
Tom Krizek [Tue, 13 Jun 2023 08:52:01 +0000 (10:52 +0200)]
Avoid false positive in serve-stale system test check
The purpose of the check is to verify the server has survived the
previous barrage of queries. This is done by sending a query and
checking we get a NOERROR response back.
Previously, that query could've been affected by a servfail cache - the
server would return a SERVFAIL answer, thus failing the check, despite
being up and running. Use version.bind txt ch query to avoid the
interference of servfail cache.
Ondřej Surý [Tue, 30 May 2023 06:46:17 +0000 (08:46 +0200)]
Improve RBT overmem cache cleaning
When cache memory usage is over the configured cache size (overmem) and
we are cleaning unused entries, it might not be enough to clean just two
entries if the entries to be expired are smaller than the newly added
rdata. This could be abused by an attacker to cause a remote Denial of
Service by possibly running out of the operating system memory.
Currently, the addrdataset() tries to do a single TTL-based cleaning
considering the serve-stale TTL and then optionally moves to overmem
cleaning if we are in that condition. Then the overmem_purge() tries to
do another single TTL based cleaning from the TTL heap and then continue
with LRU-based cleaning up to 2 entries cleaned.
Squash the TTL-cleaning mechanism into single call from addrdataset(),
but ignore the serve-stale TTL if we are currently overmem.
Then instead of having a fixed number of entries to clean, pass the size
of newly added rdatasetheader to the overmem_purge() function and
cleanup at least the size of the newly added data. This prevents the
cache going over the configured memory limit (`max-cache-size`).
Additionally, refactor the overmem_purge() function to reduce for-loop
nesting for readability.
Ondřej Surý [Tue, 6 Jun 2023 10:48:23 +0000 (12:48 +0200)]
Fix extra detach when dns_validator create_fetch() detects deadlock
When create_fetch() in the dns_validator unit detects deadlock, it
returns DNS_R_NOVALIDSIG, but it didn't attach to the validator. The
other condition to returning result != ISC_R_SUCCESS would be error from
dns_resolver_createfetch(). The caller (in two places out of three)
would detect the error condition and always detach from the validator.
Move the dns_validator_detach() on dns_resolver_createfetch() error
condition to create_fetch() function and cleanup the extra detaches in
seek_dnskey() and get_dsset().
Artem Boldariev [Fri, 2 Jun 2023 11:28:50 +0000 (14:28 +0300)]
Use appropriately sized send buffers for DNS messages over TCP
This commit changes send buffers allocation strategy for stream based
transports. Before that change we would allocate a dynamic buffers
sized at 64Kb even when we do not need that much. That could lead to
high memory usage on server. Now we resize the send buffer to match
the size of the actual data, freeing the memory at the end of the
buffer for being reused later.
Mark Andrews [Wed, 31 May 2023 02:40:37 +0000 (12:40 +1000)]
Use rcu methods to lock access view->zonetable
dns_view_find* may be called after the final call to dns_view_detach
is made which detaches view->zonetable to permit the server to
shutdown. We need to detect if view->zonetable is NULL during this
stage and appropriately recover.
Ondřej Surý [Thu, 1 Jun 2023 11:38:42 +0000 (13:38 +0200)]
Disable URCU inlining if inlined rcu_dereference() fails to compile
In some cases, the inlined version rcu_dereference() would not compile
when working on pointer to opaque struct (namely Ubuntu Jammy). Detect
such condition in the autoconf and disable the inlining of the small
functions if it breaks the build.
Aram Sargsyan [Sat, 27 May 2023 11:01:28 +0000 (11:01 +0000)]
Fix a clients-per-query miscalculation bug
The number of clients per query is calculated using the pending
fetch responses in the list. The dns_resolver_createfetch() function
includes every item in the list when deciding whether the limit is
reached (i.e. fctx->spilled is true). Then, when the limit is reached,
there is another calculation in fctx_sendevents(), when deciding
whether it is needed to increase the limit, but this time the TRYSTALE
responses are not included in the calculation (because of early break
from the loop), and because of that the limit is never increased.
A single client can have more than one associated response/event in the
list (currently max. two), and calculating them as separate "clients"
is unexpected. E.g. if 'stale-answer-enable' is enabled and
'stale-answer-client-timeout' is enabled and is larger than 0, then
each client will have two events, which will effectively halve the
clients-per-query limit.
Fix the dns_resolver_createfetch() function to calculate only the
regular FETCHDONE responses/events.
Change the fctx_sendevents() function to also calculate only FETCHDONE
responses/events. Currently, this second change doesn't have any impact,
because the TRYSTALE events were already skipped, but having the same
condition in both places will help prevent similar bugs in the future
if a new type of response/event is ever added.
Aram Sargsyan [Mon, 29 May 2023 17:34:13 +0000 (17:34 +0000)]
Light refactoring of the fetchlimit system test
Prepare the fetchlimit system test for adding a clients-per-query
check. Change some functions and commands to accept a destination
NS IP address instead of using the hardcoded 10.53.0.3.