Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Remove redundant recursion quota pointer checks
When the client->recursionquota pointer was overloaded by different
features, each of those features had to be aware of that fact and handle
any updates of that pointer gracefully. Example: prefetch code
initiates recursion, attaching to client->recursionquota, then query
processing restarts due to a CNAME being encountered, then that CNAME is
not found in the cache, so another recursion is triggered, but
client->recursionquota is already attached to; even though it is not
CNAME chaining code that attached to that pointer, that code still has
to handle such a situation gracefully.
However, all features that can initiate recursion have now been updated
to use separate slots in the 'recursions' array, so keeping the old
checks in place means masking future programming errors that could
otherwise be caught - and should be caught because each feature needs to
properly maintain its own quota reference.
Remove outdated recursion quota pointer checks to enable the assertions
in isc_quota_*() functions to detect programming errors in code paths
that can start recursion. Remove an outdated comment to prevent
confusion.
Ondřej Surý [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Add recursionquota_detach()
Add a new helper function for detaching from the recursion quota in
order to reduce code duplication and to ensure that detaching from that
quota is always accompanied by decreasing the recursive clients counter.
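A minimal sketch of the idea behind such a helper, using simplified stand-in types rather than BIND's real isc_quota_t and statistics counters. The quota_t structure, the recursive_clients counter, and the *_sketch function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a quota object; not BIND's isc_quota_t. */
typedef struct quota {
	int used; /* references currently attached */
	int max;  /* configured limit */
} quota_t;

/* Hypothetical stand-in for the "recursive clients" statistics counter. */
static int recursive_clients = 0;

/*
 * Attach to the quota and bump the counter.
 * Returns 0 on success, -1 if the quota is exhausted.
 */
static int
recursionquota_attach_sketch(quota_t *quota, quota_t **quotap) {
	assert(quotap != NULL && *quotap == NULL);
	if (quota->used >= quota->max) {
		return -1;
	}
	quota->used++;
	recursive_clients++;
	*quotap = quota;
	return 0;
}

/*
 * Detach from the quota and decrease the counter in one place, so that
 * callers cannot do one without the other.
 */
static void
recursionquota_detach_sketch(quota_t **quotap) {
	assert(quotap != NULL && *quotap != NULL);
	(*quotap)->used--;
	recursive_clients--;
	*quotap = NULL;
}
```

Keeping both updates in a single function means every cleanup path only has to make one call.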
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Add helper function for recursive-clients logging
Reduce code duplication in check_recursionquota() by extracting its
parts responsible for logging to a separate helper function.
Remove result text from the "no more recursive clients" message because
it always says "quota reached" (as the relevant branch is only evaluated
when 'result' is set to ISC_R_QUOTA) and therefore brings no additional
value.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Check recursions at the end of request processing
ns_client_endrequest() currently contains code that looks for
outstanding quota references and cleans them up if necessary. This
approach masks programming errors because ns_client_endrequest() is only
called from ns__client_reset_cb(), which in turn is only called when all
references to the client's netmgr handle are released, which in turn
only happens after all recursion completion callbacks are invoked
(because isc_nmhandle_attach() is called before every call to
dns_resolver_createfetch() in lib/ns/query.c and the completion callback
is expected to detach from the handle), which in turn is expected to
happen for all recursion attempts, even those that get canceled.
Furthermore, declaring the prototype of ns_client_endrequest() at the
top of lib/ns/client.c is redundant because the definition of that
function is placed before its first use in that file. Remove the
redundant function prototype.
Finally, remove INSIST assertions ensuring quota pointers are NULL in
ns__client_reset_cb() because the latter calls ns_client_endrequest() a
few lines earlier.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Make async hooks code use the 'recursions' array
Async hooks are the last feature using the client->fetchhandle and
client->query.fetch pointers. Update ns_query_hookasync() and
query_hookresume() so that they use a dedicated slot in the 'recursions'
array. Note that async hooks are still not expected to initiate
recursion if one was already started by a prior ns_query_recurse() call,
so the REQUIRE assertion in ns_query_hookasync() needs to check the
RECTYPE_NORMAL slot rather than the RECTYPE_HOOK one.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Enable ns_query_t to track multiple recursions
When a client waits for a prefetch- or RPZ-triggered recursion to
complete, its netmgr handle is attached to client->prefetchhandle and a
reference to the resolver fetch is stored in client->query.prefetch.
Both of these features use the same fields mentioned above. This makes
the code fragile and hard to follow as its logically distinct parts
become intertwined for no obvious reason.
Furthermore, storing pointers related to a specific recursion process in
two different structures makes their purpose harder to grasp than it has
to be.
To alleviate the problem, extend ns_query_t with an array of structures
containing recursion-related pointers. Each feature able to initiate
recursion is supposed to use its own slot in that array, allowing
logically unrelated code paths to be untangled. Prefetch and RPZ will
be the first users of that array.
Define helper macros for accessing specific recursion-related pointers
in order to improve code readability.
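The layout described above could look roughly like the following sketch. Only the 'recursions' array, RECTYPE_NORMAL, RECTYPE_HOOK, and macro names of the form HANDLE_RECTYPE_NORMAL()/FETCH_RECTYPE_NORMAL() appear in this log; every other name here (the stand-in types, RECTYPE_PREFETCH, RECTYPE_RPZ, RECTYPE_COUNT, recursion_active()) is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the netmgr handle and the resolver fetch. */
typedef struct nmhandle {
	int refs;
} nmhandle_t;
typedef struct fetch {
	int id;
} fetch_t;

/* One slot per feature able to initiate recursion. */
typedef enum {
	RECTYPE_NORMAL,
	RECTYPE_PREFETCH, /* assumed name */
	RECTYPE_RPZ,      /* assumed name */
	RECTYPE_HOOK,
	RECTYPE_COUNT     /* assumed name; number of slots */
} recursion_type_t;

/* Recursion-related pointers kept together in one structure. */
typedef struct recursion {
	nmhandle_t *handle;
	fetch_t *fetch;
} recursion_t;

typedef struct query {
	recursion_t recursions[RECTYPE_COUNT];
} query_t;

typedef struct client {
	query_t query;
} client_t;

/* Helper macros for accessing a specific slot. */
#define HANDLE_RECTYPE_NORMAL(c) \
	((c)->query.recursions[RECTYPE_NORMAL].handle)
#define FETCH_RECTYPE_NORMAL(c) \
	((c)->query.recursions[RECTYPE_NORMAL].fetch)
#define HANDLE_RECTYPE_PREFETCH(c) \
	((c)->query.recursions[RECTYPE_PREFETCH].handle)
#define FETCH_RECTYPE_PREFETCH(c) \
	((c)->query.recursions[RECTYPE_PREFETCH].fetch)

/* Hypothetical helper: is a recursion of the given type in progress? */
static bool
recursion_active(const client_t *client, recursion_type_t type) {
	return client->query.recursions[type].fetch != NULL;
}
```

With separate slots, a prefetch in progress no longer affects what the normal recursion path sees in its own slot.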
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Adjust recursion quota when starting a fetch fails
Some functions fail to detach from the recursion quota if an error
occurs while initiating recursion. This causes the recursive client
counter to be off. Add missing recursionquota_detach() calls, reworking
cleanup code where appropriate.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Make resolver glue code use the 'recursions' array
With prefetch and RPZ code updated to use separate slots in the
'recursions' array, the code responsible for starting recursion in
ns_query_recurse() and resuming query handling in fetch_callback()
should follow suit, so that it does not need to explicitly cooperate
with other code paths that may initiate recursion.
Replace:
- client->fetchhandle with HANDLE_RECTYPE_NORMAL(client)
- client->query.fetch with FETCH_RECTYPE_NORMAL(client)
Also update other functions using client->fetchhandle and
client->query.fetch (ns_query_cancel(), query_usestale()) so that those
two fields can shortly be dropped altogether.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Simplify client->query initialization
Initialize client->query using a compound literal in order to make the
ns_query_init() function shorter and more readable. This also prevents
the need to explicitly initialize any newly added fields in the future.
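A minimal sketch of the technique, using a made-up structure rather than the real ns_query_t. The field names, QUERY_ATTR_DEFAULT, and query_init_sketch() are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a query structure. */
typedef struct query {
	unsigned int attributes;
	bool isreferral;
	void *fetch;
	int restarts;
} query_t;

/* Hypothetical default attribute value. */
#define QUERY_ATTR_DEFAULT 0x0001U

static void
query_init_sketch(query_t *query) {
	/*
	 * Assigning a compound literal overwrites the whole structure.
	 * Members without an explicit initializer are set to zero/NULL,
	 * so fields added later need no extra initialization code.
	 */
	*query = (query_t){
		.attributes = QUERY_ATTR_DEFAULT,
	};
}
```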
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Attach to separate recursion quota pointers
Similarly to how different code paths reused common client handle
pointers and fetch references despite being logically unrelated, they
also reuse client->recursionquota, the field in which a reference to the
recursion quota is stored. This unnecessarily forces all code using
that field to be aware of the fact that it is overloaded by different
features.
Overloading client->recursionquota also causes inconsistent behavior.
For example, if prefetch code triggers recursion and then delegation
handling code also triggers recursion, only one of these code paths will
be able to attach to the recursion quota, but both recursions will be
started anyway. In other words, each code path only checks whether the
recursion quota has not been exceeded if the quota has not yet been
attached to by another code path. This behavior theoretically allows
the configured recursion quota to be slightly exceeded; while it is not
expected to be a real-world operational issue, it is still confusing and
should therefore be fixed.
Extend the structures comprising the 'recursions' array with a new field
holding a pointer to the recursion quota that a given recursion process
attached to. Update all code paths using client->recursionquota so that
they use the appropriate slot in the 'recursions' array. Drop the
'recursionquota' field from ns_client_t.
Michał Kępień [Tue, 14 Jun 2022 11:13:32 +0000 (13:13 +0200)]
Use common code to start prefetches & RPZ fetches
query_prefetch() and query_rpzfetch() contain a lot of duplicated code.
Extract the common bits into a separate function whose name clearly
suggests its purpose.
Ondřej Surý [Tue, 14 Jun 2022 07:17:08 +0000 (09:17 +0200)]
Gracefully handle uv_read_start() failures
Under specific rare timing circumstances, uv_read_start() could
fail with UV_EINVAL when the connection is reset between the connect (or
accept) and the uv_read_start() call on the nmworker loop. Handle such a
situation gracefully by propagating the errors from uv_read_start() to
the upper layers, so that the socket can be closed internally.
Michal Nowak [Fri, 3 Jun 2022 11:12:22 +0000 (13:12 +0200)]
Fix statistics system test on Oracle Linux 7
The statistics system test fails on Oracle Linux 7 when libxml2, Curl,
and xsltproc are present:
I:statistics:checking bind9.xsl vs xml (17)
diff: curl.out.17.xsl: No such file or directory
tests.sh: line 183: curl.out.17.xml: No such file or directory
cp: cannot stat 'curl.out.17.xml': No such file or directory
grep: xsltproc.out.17: No such file or directory
This is because the Oracle Linux 7 Curl does not know about the
--http1.1 option and silently fails with:
+ /usr/bin/curl --http1.1 http://10.53.0.3:7252
curl: option --http1.1: is unknown
curl: try 'curl --help' or 'curl --manual' for more information
The following test "checking bind9.xml socket statistics" then needs to
check for the existence of the stats.xml.out file, which is an artifact
of the previous test.
Evan Hunt [Fri, 3 Jun 2022 23:22:01 +0000 (16:22 -0700)]
don't keep stale NXDOMAIN cache entries
when serve-stale is enabled, NXDOMAIN cache entries are no longer
preserved after the normal negative cache TTL, in order to reduce
unnecessary cache memory consumption.
query.c:9394:26: warning: Dereference of null pointer [core.NullDereference]
if (!qctx->nxrewrite || qctx->rpz_st->m.rpz->addsoa) {
^~~~~~~~~~~~~~~~~~~
1 warning generated.
The warning above is for qctx->rpz_st potentially being a NULL pointer
when query_nxdomain() is called from query_resume(). This is a false
positive because none of the database lookup result codes currently
causing query_nxdomain() to be called (DNS_R_EMPTYWILD, DNS_R_NXDOMAIN)
can be returned by a database lookup following a recursive resolution
attempt. Add a NULL check nevertheless in order to future-proof the
code and silence Clang Static Analyzer.
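The kind of guard described above can be sketched as follows. The structures are reduced stand-ins that only mirror the member names visible in the warning; the function name and the exact shape of the added check are assumptions, not the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins mirroring only the members seen in the warning. */
typedef struct rpz {
	bool addsoa;
} rpz_t;

typedef struct rpz_st {
	struct {
		rpz_t *rpz;
	} m;
} rpz_st_t;

typedef struct qctx {
	bool nxrewrite;
	rpz_st_t *rpz_st;
} qctx_t;

/*
 * Hypothetical helper evaluating the condition from the warning with a
 * NULL check added before rpz_st is dereferenced.
 */
static bool
nxdomain_condition_sketch(const qctx_t *qctx) {
	return !qctx->nxrewrite ||
	       (qctx->rpz_st != NULL && qctx->rpz_st->m.rpz->addsoa);
}
```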
Michał Kępień [Fri, 10 Jun 2022 12:30:23 +0000 (14:30 +0200)]
Remove NULL checks for ns_client_getnamebuf()
ns_client_getnamebuf() cannot fail (i.e. return NULL) since commit
e31cc1eeb436095490c7caa120de148df82ecd6c. Remove redundant NULL checks
performed on the pointer returned by ns_client_getnamebuf().
Petr Špaček [Thu, 12 May 2022 07:20:46 +0000 (09:20 +0200)]
Refactor and unite internal data structures for iscconf Sphinx extension
It turns out it is easier to regenerate Sphinx-mandated structure in
get_objects than to maintain two separate data structures. I should have
realized that before.
Petr Špaček [Tue, 10 May 2022 12:50:34 +0000 (14:50 +0200)]
Add table generator into Sphinx config extension
New directive .. statementlist:: generates a table of statements in
the given domain (named.conf or rndc.conf). The table contains a link to
the definition, a short description, and a list of tags.
Short description and tags have to be provided by user using optional
parameters. E.g.:
.. statement:: max-cache-size
:tags: resolver, cache
:short: Short description
.. statementlist:: is currently not parametrized.
This modification is based on Sphinx "tutorial" extension "TODO".
The main trick is to use placeholder node for .. statementlist:: and
replace it with table at later stage, when all source files were
processed and all cross-references can be resolved.
Beware, some details in Sphinx docs are not up-to-date; it's better
to read Sphinx and docutils sources.
Petr Špaček [Tue, 10 May 2022 13:00:06 +0000 (15:00 +0200)]
Add Sphinx extension to help with ARM maintenance and cross-linking
The extension provides a "Sphinx domain factory". Each new Sphinx domain
defines a namespace for configuration statements so named.conf and
rndc.conf do not clash. Currently the Sphinx domains are instantiated
twice and the resulting domains are named "namedconf" and "rndcconf".
This commit adds a single new directive:
.. statement:: max-cache-size
It is namespaced like this:
.. namedconf:statement:: max-cache-size
This directive generates a new anchor for a configuration statement and
it can be referenced like :any:`max-cache-size` (if the identifier is
unique), or more specifically as :namedconf:ref:`max-cache-size`.
It is based on Sphinx "tutorial" extension "recipe".
Beware, some details in Sphinx docs are not up-to-date; it's better
to read Sphinx and docutils sources.
Aram Sargsyan [Wed, 4 May 2022 09:43:49 +0000 (09:43 +0000)]
Cleanup dns_fwdtable_delete()
The conversion of `DNS_R_PARTIALMATCH` into `DNS_R_NOTFOUND` is done
in the `dns_rbt_deletename()` function so there is no need to do that
in `dns_fwdtable_delete()`.
Add a possible return value of `ISC_R_NOSPACE` into the header file's
function description comment.
Aram Sargsyan [Tue, 3 May 2022 22:28:45 +0000 (22:28 +0000)]
Convert some catz error messages from ISC_LOG_INFO to ISC_LOG_WARNING
There is no reason for these two messages to be `ISC_LOG_INFO` while all
the other similar messages in `catz_addmodzone_taskaction()` and
`catz_delzone_taskaction()` functions are logged as `ISC_LOG_WARNING`.
Aram Sargsyan [Tue, 3 May 2022 22:24:32 +0000 (22:24 +0000)]
Check that catz member zone is not a configured forward zone
When processing a catalog zone member zone make sure that there is no
configured pre-existing forward zone with that name.
Refactor the `dns_fwdtable_find()` function to not alter the
`DNS_R_PARTIALMATCH` result (coming from `dns_rbt_findname()`) into
`DNS_R_SUCCESS`, so that now the caller can differentiate partial
and exact matches. Patch the calling sites to expect and process
the new return value.
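A minimal sketch of why callers benefit from seeing the partial-match result. The result codes, the flat list of zone names, and all function names below are stand-ins for illustration; they are not BIND's red-black tree API, and treating only an exact match as a conflict is an assumption based on the wording "with that name".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in result codes. */
typedef enum {
	R_SUCCESS,      /* exact match */
	R_PARTIALMATCH, /* only an ancestor of the name matched */
	R_NOTFOUND
} result_t;

/* True if 'name' is a proper subdomain of 'zone' (label-aligned). */
static bool
is_subdomain(const char *name, const char *zone) {
	size_t nlen = strlen(name);
	size_t zlen = strlen(zone);

	if (nlen <= zlen) {
		return false;
	}
	return name[nlen - zlen - 1] == '.' &&
	       strcmp(name + nlen - zlen, zone) == 0;
}

/* Lookup that reports exact and partial matches separately. */
static result_t
fwdtable_find_sketch(const char *const *zones, size_t nzones,
		     const char *name) {
	result_t result = R_NOTFOUND;

	for (size_t i = 0; i < nzones; i++) {
		if (strcmp(name, zones[i]) == 0) {
			return R_SUCCESS;
		}
		if (is_subdomain(name, zones[i])) {
			result = R_PARTIALMATCH;
		}
	}
	return result;
}

/* Caller that only cares about a forward zone with exactly that name. */
static bool
conflicts_with_forward_zone(const char *const *zones, size_t nzones,
			    const char *name) {
	return fwdtable_find_sketch(zones, nzones, name) == R_SUCCESS;
}
```

A caller that wants the old behavior can still treat both R_SUCCESS and R_PARTIALMATCH as a hit.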
Tom Krizek [Tue, 7 Jun 2022 15:27:25 +0000 (17:27 +0200)]
Move pylint CI job to precheck stage
Historically, some *.py files were generated, so Python checks required
running ./configure beforehand. This is no longer the case since v9_18,
so let's run the job ASAP without the unnecessary extra dependency on
the autoconf job.
Tom Krizek [Tue, 7 Jun 2022 14:29:52 +0000 (16:29 +0200)]
Remove flake8 linter for Python from CI
Python codestyle is now handled by black and other issues are checked by
pylint. Flake8 checking has been made redundant and is thus removed as
obsolete.
Aram Sargsyan [Mon, 6 Jun 2022 15:12:46 +0000 (15:12 +0000)]
Remove unneeded NULL-checking
Fix an issue reported by Coverity by removing the unneeded check.
*** CID 352554: Null pointer dereferences (REVERSE_INULL)
/bin/dig/dighost.c: 3056 in start_tcp()
3050
3051 if (ISC_LINK_LINKED(query, link)) {
3052 next = ISC_LIST_NEXT(query, link);
3053 } else {
3054 next = NULL;
3055 }
>>> CID 352554: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "connectquery" suggests that it may be null, but it
has already been dereferenced on all paths leading to the check.
3056 if (connectquery != NULL) {
3057 query_detach(&connectquery);
3058 }
3059 query_detach(&query);
3060 if (next == NULL) {
3061 clear_current_lookup();
In the cases where we test SOA serial updates and TTL updates, we check
for "all zones loaded" to ensure the new zone content is loaded. But
this is the unsigned zone; the signed zone still needs to be produced.
There is thus a timing issue where the dig request comes in before
the signing process has finished.
Petr Špaček [Tue, 15 Mar 2022 10:55:36 +0000 (11:55 +0100)]
Flag new user-visible log messages for review
Messages with log levels INFO or higher are flagged for manual review.
The purpose of this check is to prevent debug logs from being released
with a too-high log level.
Petr Špaček [Fri, 6 May 2022 16:44:15 +0000 (18:44 +0200)]
ARM style change: render literals in black color
After an enormous amount of bikeshedding about colors we decided to
override the ReadTheDocs default style for literals (``literal`` in the
RST markup).
Justification:
- The default RTD "light red literal on white background" is hard to
read. https://webaim.org/resources/contrastchecker/ reports that text
colored as rgb(231, 76, 60) on white background has insufficient
contrast.
- The ARM has an enormous amount of literals all over the place and thus
one sentence can contain several black/red/black color changes. This
is distracting. As a consequence, the ARM looks like a Geronimo
Stilton book.
What we experimented with as replacements for red:
- Green - way too distracting
- Blue - too similar to "usual clickable link"
- Violet - too Geronimo Stilton style
- Brown - better but still distracting
After all the bikeshedding we settled on black, i.e. the same as all
"normal" text. The color is now the same, and literals are denoted
by a monospaced font and a box around the literal. This has the best
contrast and is way less distracting than it used to be.
This led to a new problem: Internal references to "term definitions"
defined using directives like .. option:: were rendered almost the same
as literals:
- References: monospaced + box + bold + clickable
- Literals: monospaced + box
To distinguish these two we added a black dotted underline to clickable
references.
Petr Špaček [Tue, 10 May 2022 14:53:40 +0000 (16:53 +0200)]
Allow wrapping for ARM table content
RTD style default never wraps <th> and <td> elements and that just does
not work for real sentences or any other long lines.
We can reconsider styling some tables separately, but at the moment we
do not have use for tables with long but unwrappable lines so it's
easier to allow wrapping globally.
Aram Sargsyan [Wed, 1 Jun 2022 08:51:55 +0000 (08:51 +0000)]
Don't process DNSSEC-related and ZONEMD records in catz
When processing a catalog zone update, skip processing records with
DNSSEC-related and ZONEMD types, because we are not interested in them
in the context of a catalog zone, and processing them will fail and
produce an unnecessary warning message.
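The filtering step can be sketched as a simple type check performed before a record is handed to catalog zone processing. The numeric values are the IANA-assigned RR type codes; which types the actual change skips, and the names used here, are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IANA-assigned RR type codes for the record types of interest. */
enum {
	TYPE_DS = 43,
	TYPE_RRSIG = 46,
	TYPE_NSEC = 47,
	TYPE_DNSKEY = 48,
	TYPE_NSEC3 = 50,
	TYPE_NSEC3PARAM = 51,
	TYPE_CDS = 59,
	TYPE_CDNSKEY = 60,
	TYPE_ZONEMD = 63
};

/*
 * Hypothetical helper: true if a record of this type should be ignored
 * when processing a catalog zone update.
 */
static bool
catz_skip_type_sketch(uint16_t type) {
	switch (type) {
	case TYPE_DS:
	case TYPE_RRSIG:
	case TYPE_NSEC:
	case TYPE_DNSKEY:
	case TYPE_NSEC3:
	case TYPE_NSEC3PARAM:
	case TYPE_CDS:
	case TYPE_CDNSKEY:
	case TYPE_ZONEMD:
		return true;
	default:
		return false;
	}
}
```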
Ondřej Surý [Wed, 1 Jun 2022 11:10:37 +0000 (13:10 +0200)]
Properly adjust the srcdir vs builddir paths
Affected unit tests load testdata from the srcdir. Previously, there
was a kludge that chdir()ed to the tests srcdir, but that got removed
during refactoring. Instead of introducing the kludge again, the paths
were fixed to be properly prefixed with TESTS_DIR as needed.
Ondřej Surý [Wed, 1 Jun 2022 07:02:10 +0000 (09:02 +0200)]
Don't list libtest.la headers in HEADERS variable
The libtest.la headers were installed in a very weird place. In fact, we
don't need to list them in the HEADERS variable; listing them in SOURCES
is enough for autotools to figure out how to compile the convenience
library.
Artem Boldariev [Wed, 25 May 2022 11:49:32 +0000 (14:49 +0300)]
Increase server start timeout for system tests
This commit increases server start timeout from 60 to 90 seconds in
order to avoid system test failures on some platforms due to inability
to initialise TLS contexts in time.
Tony Finch [Fri, 6 May 2022 07:19:54 +0000 (08:19 +0100)]
CHANGES note for [GL !6270]
[cleanup] Simplify BIND's internal DNS name compression API. As
RFC 6891 explains, it isn't practical to deploy new
label types or compression methods, so it isn't
necessary to have an API designed to support them.
Remove compression terminology that refers to Internet
Drafts that expired in the 1990s.
Tony Finch [Thu, 5 May 2022 17:36:48 +0000 (18:36 +0100)]
Clean up remaining references to global compression
It is simply called "compression" now, without any qualifiers. Also,
improve some variable names in dns_name_towire2() so they are not two
letter abbreviations for global something.
Tony Finch [Thu, 5 May 2022 15:36:52 +0000 (16:36 +0100)]
Shrink decompression contexts
It's wasteful to use 20 bytes and a pointer indirection to represent
two bits of information, so turn the struct into an enum. And change
the names of the enumeration constants to make the intent more clear.
This change introduces some inline functions into another header,
which confuses `gcovr` when it is trying to collect code coverage
statistics. So, in the CI job, copy more header files into a directory
where `gcovr` looks for them.
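A sketch of how two bits of state can be carried by value in an enum with small inline accessors instead of a heap- or stack-allocated struct. The enumerator and function names are hypothetical and chosen for illustration; they are not taken from this log.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Four states are enough to encode two independent yes/no facts:
 * whether the setting is forced, and whether compression is permitted.
 */
typedef enum {
	DECOMPRESS_DEFAULT,   /* not forced, compression not permitted */
	DECOMPRESS_PERMITTED, /* not forced, compression permitted */
	DECOMPRESS_NEVER,     /* forced off */
	DECOMPRESS_ALWAYS     /* forced on */
} decompress_t;

/* Return an updated context; forced states are left unchanged. */
static inline decompress_t
decompress_setpermitted(decompress_t dctx, bool permitted) {
	if (dctx == DECOMPRESS_NEVER || dctx == DECOMPRESS_ALWAYS) {
		return dctx;
	}
	return permitted ? DECOMPRESS_PERMITTED : DECOMPRESS_DEFAULT;
}

/* Is compression permitted in this context? */
static inline bool
decompress_getpermitted(decompress_t dctx) {
	return dctx == DECOMPRESS_ALWAYS || dctx == DECOMPRESS_PERMITTED;
}
```

Because the context is a plain value, it can be passed and returned without any pointer indirection or init/invalidate calls.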
Tony Finch [Wed, 4 May 2022 16:35:39 +0000 (17:35 +0100)]
DNS name compression does not depend on the EDNS version
There was a proposal in the late 1990s that it might, but it turned
out to be unworkable. See RFC 6891, Extension Mechanisms for
DNS (EDNS(0)), section 5, Extended Label Types.
The remnants of the code that supported this in BIND are redundant.
Tony Finch [Wed, 4 May 2022 08:38:54 +0000 (09:38 +0100)]
Remove obsolete notes on name compression
These notes describe the initial compression design for BIND 9 in
1998/1999, when the IETF had some over-optimistic plans for using EDNS
to change the wire format of domain names. (Another example was
bitstring labels for IPv6 reverse DNS.) By the end of 2000 the EDNS
name compression schemes had been abandoned, and BIND 9's compression
code was rewritten to use a hash table.
There is nothing left of the implementation described here, and the
API functions are better described in `compress.h`, so these notes are
more misleading than helpful. Those who are interested in the past can
look at the version control history.