Replace the HTTP/2 session's ad-hoc buffer with isc_buffer_t
This commit replaces a static ad-hoc HTTP/2 session's temporary buffer
with a realloc-able isc_buffer_t object, which is being allocated on
as needed basis, lowering the memory consumption somewhat. The buffer
is needed in very rare cases, so allocating it prematurely is not
wise.
Also, it fixes a bug in http_readcb() where the ad-hoc buffer appeared
to be improperly used, leading to a situation when the processed data
from the receiving regions can be processed twice, while unprocessed
data will never be processed.
Michal Nowak [Tue, 11 May 2021 16:06:59 +0000 (18:06 +0200)]
Fix handling of restart option in run.sh
The support for stat.pl's --restart option was incomplete in run.sh.
This change makes sure it's handled properly and that named.run file is
not being removed by clean.sh when the --restart option is used.
Michal Nowak [Tue, 4 May 2021 10:58:23 +0000 (12:58 +0200)]
Process core dump from named which failed to start
When named failed to start and produced core dump, the core file wasn't
processed by GDB because of run.sh script exiting immediately. This
remedies the limitation, simplifies the surrounding code, and makes the
script shellcheck clean.
Artem Boldariev [Mon, 14 Jun 2021 13:40:27 +0000 (16:40 +0300)]
Add a system test that tests connections quota for DoH
The system tests stress out the DoH quota by opening many TCP
connections and then running dig instances against the "overloaded"
server to perform some queries. The processes cannot make any
resolutions because the quota is exceeded. Then the opened connections
are getting closed in random order allowing the queries to proceed.
Artem Boldariev [Tue, 18 May 2021 09:03:58 +0000 (12:03 +0300)]
Make max number of HTTP/2 streams configurable
This commit makes number of concurrent HTTP/2 streams per connection
configurable as a mean to fight DDoS attacks. As soon as the limit is
reached, BIND terminates the whole session.
The commit adds a global configuration
option (http-streams-per-connection) which can be overridden in an
http <name> {...} statement like follows:
For now the default value is 100, which should be enough (e.g. NGINX
uses 128, but it is a full-featured WEB-server). When using lower
numbers (e.g. ~70), it is possible to hit the limit with
e.g. flamethrower.
This way we have ability to specify per-listener active connections
quota globally and then override it when required. This is exactly
what AT&T requested us: they wanted a functionality to specify quota
globally and then override it for specific IPs. This change
functionality makes such a configuration possible.
It makes sense: for example, one could have different quotas for
internal and external clients. Or, for example, one could use BIND's
internal ability to serve encrypted DoH with some sane quota value for
internal clients, while having un-encrypted DoH listener without quota
to put BIND behind a load balancer doing TLS offloading for external
clients.
Moreover, the code no more shares the quota with TCP, which makes
little sense anyway (see tcp-clients option), because of the nature of
interaction of DoH clients: they tend to keep idle opened connections
for longer periods of time, preventing the TCP and TLS client from
being served. Thus, the need to have a separate, generally larger,
quota for them.
Also, the change makes any option within "http <name> { ... };"
statement optional, making it easier to override only required default
options.
By default, the DoH connections are limited to 300 per listener. I
hope that it is a good initial guesstimate.
Artem Boldariev [Wed, 19 May 2021 15:03:11 +0000 (18:03 +0300)]
Verify HTTP paths both in incoming requests and in config file
This commit adds the code (and some tests) which allows verifying
validity of HTTP paths both in incoming HTTP requests and in BIND's
configuration file.
Michał Kępień [Fri, 16 Jul 2021 05:20:15 +0000 (07:20 +0200)]
Extend tests for signed, CNAME-sourced delegations
Extend the "chain" system test with AUTHORITY section checks for signed,
secure delegations. This complements the checks for signed, insecure
delegations added by commit 26ec4b9a89720e6d5630ad8cf5da6fe357dee32a.
Extend the existing AUTHORITY section checks for signed, insecure
delegations to ensure nonexistence of DS RRsets in such responses.
Adjust comments accordingly.
Ensure dig failures cause the "chain" system test to fail.
Michał Kępień [Fri, 16 Jul 2021 05:20:15 +0000 (07:20 +0200)]
Tweak query_addds() comments to avoid confusion
It has been noticed that commit 7a87bf468b9e092bf65db55a8e9234853c7db63d
did not only fix NSEC record handling in signed, insecure delegations
prepared using both wildcard expansion and CNAME chaining - it also
inadvertently fixed DS record handling in signed, secure delegations
of that flavor. This is because the 'rdataset' variable in the relevant
location in query_addds() can be either a DS RRset or an NSEC RRset.
Update a code comment in query_addds() to avoid confusion.
Update the comments describing the purpose of query_addds() so that they
also mention NSEC(3) records.
Add tests to the nsupdate system test to make sure that CDS and/or
CDNSKEY that match an algorithm in the DNSKEY RRset are allowed. Also
add tests that updates are rejected if the algorithm does not match.
Remove the now redundant test cases from the dnssec system test.
Update the checkzone system test: Change the algorithm of the CDS and
CDNSKEY records so that the zone is still rejected.
Fix crash in DoH on empty query string in GET requests
An unhandled code path left GET query string data uninitialised (equal
to NULL) and led to a crash during the requests' base64 data
decoding. This commit fixes that.
As we don't set the thread affinity, the cpu test would consistently
fail. Disable it, but don't remove it as we might restore setting the
affinity in the future versions of BIND 9.
It was discovered that setting the thread affinity on both the netmgr
and netthread threads lead to inconsistent recursive performance because
sometimes the netmgr and netthread threads would compete over single
resource and sometimes not.
Removing setting the affinity causes a slight dip in the authoritative
performance around 5% (the measured range was from 3.8% to 7.8%), but
the recursive performance is now consistently good.
Ondrej Sury [Tue, 13 Jul 2021 10:35:52 +0000 (12:35 +0200)]
Use max_align_t for memory sizeinfo alignment on OpenBSD
On OpenBSD and more generally on platforms without either jemalloc or
malloc_(usable_)size, we need to increase the alignment for the memory
to sizeof(max_align_t) as with plain sizeof(void *), the compiled code
would be crashing when accessing the returned memory.
Ondrej Sury [Mon, 12 Jul 2021 12:21:21 +0000 (14:21 +0200)]
Cache the isc_os_ncpu() result
It was discovered that on some platforms (f.e. Alpine Linux with MUSL)
the result of isc_os_ncpus() call differ when called before and after we
drop privileges. This commit changes the isc_os_ncpus() call to cache
the result from the first call and thus always return the same value
during the runtime of the named. The first call to isc_os_ncpus() is
made as soon as possible on the library initalization.
Remove nonnull attribute from isc_mem_{get,allocate,reallocate}
The isc_mem_get(), isc_mem_allocate() and isc_mem_reallocate() can
return NULL ptr in case where the allocation size is NULL. Remove the
nonnull attribute from the functions' declarations.
This stems from the following definition in the C11 standard:
> If the size of the space requested is zero, the behavior is
> implementation-defined: either a null pointer is returned, or the
> behavior is as if the size were some nonzero value, except that the
> returned pointer shall not be used to access an object.
In this case, we return NULL as it's easier to detect errors when
accessing pointer from zero-sized allocation which should obviously
never happen.
Fix the real allocation size in OpenBSD rallocx shim
In the rallocx() shim for OpenBSD (that's the only platform that doesn't
have malloc_size() or malloc_usable_size() equivalent), the newly
allocated size was missing the extra size_t member for storing the
allocation size leading to size_t sized overflow at the end of the
reallocated memory chunk.
CID 281425 (#1 of 1): Untrusted loop bound (TAINTED_SCALAR)
4. tainted_data: Passing tainted expression nsec3param.iterations to dns_nsec3_hashname, which uses it as a loop boundary. [show details]
Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range.
635 result = dns_nsec3_hashname(&fixed, rawhash, &rhsize, vctx->origin,
636 vctx->origin, nsec3param.hash,
637 nsec3param.iterations, nsec3param.salt,
638 nsec3param.salt_length);
Mark Andrews [Wed, 7 Jul 2021 04:08:04 +0000 (14:08 +1000)]
Silence tainted scalar on rdlen
2042 ttl = isc_buffer_getuint32(&j->it.source);
13. tainted_data_transitive: Call to function isc_buffer_getuint16 with tainted argument *j->it.source.base returns tainted data. [show details]
14. var_assign: Assigning: rdlen = isc_buffer_getuint16(&j->it.source), which taints rdlen.
2043 rdlen = isc_buffer_getuint16(&j->it.source);
2044
2045 /*
2046 * Parse the rdata.
2047 */
15. Condition j->it.source.used - j->it.source.current != rdlen, taking false branch.
2048 if (isc_buffer_remaininglength(&j->it.source) != rdlen) {
2049 FAIL(DNS_R_FORMERR);
2050 }
16. var_assign_var: Assigning: j->it.source.active = j->it.source.current + rdlen. Both are now tainted.
2051 isc_buffer_setactive(&j->it.source, rdlen);
2052 dns_rdata_reset(&j->it.rdata);
17. lower_bounds: Checking lower bounds of unsigned scalar j->it.source.active by taking the true branch of j->it.source.active > j->it.source.current.
CID 316506 (#1 of 1): Untrusted loop bound (TAINTED_SCALAR)
18. tainted_data: Passing tainted expression j->it.source.active to dns_rdata_fromwire, which uses it as a loop boundary. [show details]
Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range.
2053 CHECK(dns_rdata_fromwire(&j->it.rdata, rdclass, rdtype, &j->it.source,
2054 &j->it.dctx, 0, &j->it.target));
Mark Andrews [Wed, 7 Jul 2021 02:09:31 +0000 (12:09 +1000)]
Silence use of tainted scalar
2607
43. tainted_argument: Calling function journal_read_xhdr taints argument xhdr.size. [show details]
2608 result = journal_read_xhdr(j1, &xhdr);
44. Condition rewrite, taking true branch.
45. Condition result == 29, taking false branch.
2609 if (rewrite && result == ISC_R_NOMORE) {
2610 break;
2611 }
46. Condition result != 0, taking false branch.
2612 CHECK(result);
2613
47. var_assign_var: Assigning: size = xhdr.size. Both are now tainted.
2614 size = xhdr.size;
CID 331088 (#3 of 3): Untrusted allocation size (TAINTED_SCALAR)
48. tainted_data: Passing tainted expression size to isc__mem_get, which uses it as an allocation size. [show details]
Ensure that tainted values are properly sanitized, by checking that their values are within a permissible range.
2615 buf = isc_mem_get(mctx, size);
Revert the allocate/free -> get/put change from jemalloc change
In the jemalloc merge request, we missed the fact that ah_frees and ah_handles
are reallocated which is not compatible with using isc_mem_get() for allocation
and isc_mem_put() for deallocation. This commit reverts that part and restores
use of isc_mem_allocate() and isc_mem_free().
The proper way how to disable the water limit in the isc_mem context is
to call:
isc_mem_setwater(ctx, NULL, NULL, 0, 0);
this ensures that the old water callback is called with ISC_MEM_LOWATER
if the callback was called with ISC_MEM_HIWATER before.
Historically, there were some places where the limits were disabled by
calling:
isc_mem_setwater(ctx, water, water_arg, 0, 0);
which would also call the old callback, but it also causes the water_t
to be allocated and extra check to be executed because water callback is
not NULL.
This commits unifies the calls to disable water to the preferred form.
Disable jemalloc for Address and Thread Sanitizers
The Address and Thread Sanitizers both intercept the malloc calls and
using the extended jemalloc API interferes with that. This commit
disables the use of jemalloc for both ASAN and TSAN enabled builds to
eliminate both false positives and false negatives.
Use isc_mem_get() and isc_mem_put() in isc_mem_total test
Previously, the isc_mem_allocate() and isc_mem_free() would be used for
isc_mem_total test, but since we now use the real allocation
size (sallocx, malloc_size, malloc_usable_size) to track the allocation
size, it's impossible to get the test value right. Changing the test to
use isc_mem_get() and isc_mem_put() will use the exact size provided, so
the test would work again on all the platforms even when jemalloc is not
being used.
It was discovered that softhsm2.4 has a bug that causes invalid free()
call to be called when unloading libsofthsm.so.2 library. The native
PKCS#11 API is scheduled to removed in the 9.17+ release, we could
safely just disable jemalloc for this particular build.
Rewrite isc_mem water to use single atomic exchange operation
This commit refactors the water mechanism in the isc_mem API to use
single pointer to a water_t structure that can be swapped with
atomic_exchange operation instead of having four different
values (water, water_arg, hi_water, lo_water) in the flat namespace.
This reduces the need for locking and prevents a race when water and
water_arg could be desynchronized.
Ondřej Surý [Thu, 10 Jun 2021 08:18:24 +0000 (10:18 +0200)]
Allow size == 0 in isc_mem_{get,allocate,reallocate}
Calls to jemalloc extended API with size == 0 ends up in undefined
behaviour. This commit makes the isc_mem_get() and friends calls
more POSIX aligned:
If size is 0, either a null pointer or a unique pointer that can be
successfully passed to free() shall be returned.
We picked the easier route (which have been already supported in the old
code) and return NULL on calls to the API where size == 0.
Ondřej Surý [Tue, 25 May 2021 10:46:00 +0000 (12:46 +0200)]
Use system allocator when jemalloc is unavailable
This commit adds support for systems where the jemalloc library is not
available as a package, here's the quick summary:
* On Linux - the jemalloc is usually available as a package, if
configured --without-jemalloc, the shim would be used around
malloc(), free(), realloc() and malloc_usable_size()
* On macOS - the jemalloc is available from homebrew or macports, if
configured --without-jemalloc, the shim would be used around
malloc(), free(), realloc() and malloc_size()
* On FreeBSD - the jemalloc is *the* system allocator, we just need
to check for <malloc_np.h> header to get access to non-standard API
* On NetBSD - the jemalloc is *the* system allocator, we just need to
check for <jemalloc/jemalloc.h> header to get access to non-standard
API
* On a system hostile to users and developers (read OpenBSD) - the
jemalloc API is emulated by using ((size_t *)ptr)[-1] field to hold
the size information. The OpenBSD developers care only for
themselves, so why should we care about speed on OpenBSD?
Ondřej Surý [Wed, 12 May 2021 22:29:11 +0000 (00:29 +0200)]
Clean up isc_mempool API
- isc_mempool_get() can no longer fail; when there are no more objects
in the pool, more are always allocated. checking for NULL return is
no longer necessary.
- the isc_mempool_setmaxalloc() and isc_mempool_getmaxalloc() functions
are no longer used and have been removed.
Ondřej Surý [Tue, 11 May 2021 17:54:05 +0000 (19:54 +0200)]
Add debug tracing capability to isc_mempool_create/destroy
Previously, we only had capability to trace the mempool gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory pools get created and destroyed. This commit
adds such tracking capability.
Ondřej Surý [Tue, 11 May 2021 12:37:18 +0000 (14:37 +0200)]
Replace isc_mem_allocate() usage with isc_mem_get() in netmgr.c
The isc_mem_allocate() comes with additional cost because of the memory
tracking. In this commit, we replace the usage with isc_mem_get()
because we track the allocated sizes anyway, so it's possible to also
replace isc_mem_free() with isc_mem_put().
Ondřej Surý [Tue, 11 May 2021 12:00:12 +0000 (14:00 +0200)]
Replace internal memory calls with non-standard jemalloc API
The jemalloc non-standard API fits nicely with our memory contexts, so
just rewrite the memory context internals to use the non-public API.
There's just one caveat - since we no longer track the size of the
allocation for isc_mem_allocate/isc_mem_free combination, we need to use
sallocx() to get real allocation size in both allocator and deallocator
because otherwise the sizes would not match.
Ondřej Surý [Tue, 11 May 2021 10:59:35 +0000 (12:59 +0200)]
Remove ISC_MEM_DEBUGSIZE and ISC_MEM_DEBUGRECORD
The ISC_MEM_DEBUGSIZE and ISC_MEM_DEBUGCTX did sanity checks on matching
size and memory context on the memory returned to the allocator. Those
will no longer needed when most of the allocator will be replaced with
jemalloc.
* metadata_thp:auto - allow jemalloc to use transparent huge page
(THP) for internal metadata initially, but may begin to do so when
metadata usage reaches certain level.
* dirty_decay_ms:30000 - Approximate time in milliseconds from the
creation of a set of unused dirty pages until an equivalent set of
unused dirty pages is purged and/or reused.
* muzzy_decay_ms:30000 - Approximate time in milliseconds from the
creation of a set of unused muzzy pages until an equivalent set of
unused muzzy pages is purged and/or reused.
More information about the specific meaning can be found in the jemalloc
manpage or online at http://jemalloc.net/jemalloc.3.html
Ondřej Surý [Tue, 11 May 2021 10:29:57 +0000 (12:29 +0200)]
Compile with jemalloc to reduce memory allocator contention
The jemalloc allocator is scalable high performance allocator, this is
the first in the series of commits that will add jemalloc as a memory
allocator for BIND 9.
This commit adds configure.ac check and Makefile modifications to use
jemalloc as BIND 9 allocator.
Ondřej Surý [Tue, 11 May 2021 10:18:56 +0000 (12:18 +0200)]
Add debug tracing capability to isc_mem_create/isc_mem_destroy
Previously, we only had capability to trace the memory gets and puts,
but for debugging, it's sometimes also important to keep track how many
and where do the memory contexts get created and destroyed. This commit
adds such tracking capability.
Return HTTP status code for small/malformed requests
This commit makes BIND return HTTP status codes for malformed or too
small requests.
DNS request processing code would ignore such requests. Such an
approach works well for other DNS transport but does not make much
sense for HTTP, not allowing it to complete the request/response
sequence.
Suppose execution has reached the point where DNS message handling
code has been called. In that case, it means that the HTTP request has
been successfully processed, and, thus, we are expected to respond to
it either with a message containing some DNS payload or at least to
return an error status code. This commit ensures that BIND behaves
this way.