Fix the clang 12 warnings with multi-line strings in string arrays
The clang 12 has a new warning that warns when using multi-line strings
in the string arrays, f.e.:
{ "aa",
"b"
"b",
"cc" }
would generate warning like this:
private_test.c:162:7: error: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Werror,-Wstring-concatenation]
"33333/RSASHA1" };
^
private_test.c:161:7: note: place parentheses around the string literal to silence warning
"Done removing signatures for key "
^
private_test.c:197:7: error: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Werror,-Wstring-concatenation]
"NSEC chain",
^
private_test.c:196:7: note: place parentheses around the string literal to silence warning
"Removing NSEC3 chain 1 0 30 DEAF / creating "
^
2 errors generated.
As the query_prefetch() or query_rpzfetch() could be called during
"regular" fetch, we need to introduce separate storage for attaching
the nmhandle during prefetching the records. The query_prefetch()
and query_rpzfetch() are guarded for re-entrance by .query.prefetch
member of ns_client_t, so we can reuse the same .prefetchhandle for
both.
Refactor the pausing/unpausing and finishing the nm_thread
The isc_nm_pause(), isc_nm_resume() and finishing the nm_thread() from
nm_destroy() has been refactored, so all use the netievents instead of
directly touching the worker structure members. This allows us to
remove most of the locking as the .paused and .finished members are
always accessed from the matching nm_thread.
When shutting down the nm_thread(), instead of issuing uv_stop(), we
just shutdown the .async handler, so all uv_loop_t events are properly
finished first and uv_run() ends gracefully with no outstanding active
handles in the loop.
If NETMGR_TRACE is defined, we now maintain a list of active sockets
in the netmgr object and a list of active handles in each socket
object; by walking the list and printing `backtrace` in a debugger
we can see where they were created, to assist in in debugging of
reference counting errors.
On shutdown, if netmgr finds there are still active sockets after
waiting, isc__nm_dump_active() will be called to log the list of
active sockets and their underlying handles, along with some details
about them.
if more than 10 seconds pass while we wait for netmgr events to
finish running on shutdown, something is almost certainly wrong
and we should assert and crash.
change from isc_nmhandle_ref/unref to isc_nmhandle attach/detach
Attaching and detaching handle pointers will make it easier to
determine where and why reference counting errors have occurred.
A handle needs to be referenced more than once when multiple
asynchronous operations are in flight, so callers must now maintain
multiple handle pointers for each pending operation. For example,
ns_client objects now contain:
- reqhandle: held while waiting for a request callback (query,
notify, update)
- sendhandle: held while waiting for a send callback
- fetchhandle: held while waiting for a recursive fetch to
complete
- updatehandle: held while waiting for an update-forwarding
task to complete
Witold Kręcicki [Wed, 10 Jun 2020 09:32:39 +0000 (11:32 +0200)]
assorted small netmgr-related changes
- rename isc_nmsocket_t->tcphandle to statichandle
- cancelread functions now take handles instead of sockets
- add a 'client' flag in socket objects, currently unused, to
indicate whether it is to be used as a client or server socket
Each worker has a receive buffer with space for 20 DNS messages of up
to 2^16 bytes each, and the allocator function passed to uv_read_start()
or uv_udp_recv_start() will reserve a portion of it for use by sockets.
UDP can use recvmmsg() and so it needs that entire space, but TCP reads
one message at a time.
This commit introduces separate allocator functions for TCP and UDP
setting different buffer size limits, so that libuv will provide the
correct buffer sizes to each of them.
netmgr: retry binding with IP_FREEBIND when EADDRNOTAVAIL is returned.
When a new IPv6 interface/address appears it's first in a tentative
state - in which we cannot bind to it, yet it's already being reported
by the route socket. Because of that BIND9 is unable to listen on any
newly detected IPv6 addresses. Fix it by setting IP_FREEBIND option (or
equivalent option on other OSes) and then retrying bind() call.
Don't destroy a non-closed socket, wait for all the callbacks.
We erroneously tried to destroy a socket after issuing
isc__nm_tcp{,dns}_close. Under some (race) circumstances we could get
nm_socket_cleanup to be called twice for the same socket, causing an
access to a dead memory.
Witold Kręcicki [Mon, 29 Jun 2020 06:43:54 +0000 (08:43 +0200)]
Fix possible race in isc__nm_tcpconnect.
There's a possibility of race in isc__nm_tcpconnect if the asynchronous
connect operation finishes with all the callbacks before we exit the
isc__nm_tcpconnect itself we might access an already freed memory.
Fix it by creating an additional reference to the socket freed at the
end of isc__nm_tcpconnect.
Evan Hunt [Wed, 17 Jun 2020 19:09:10 +0000 (12:09 -0700)]
restore "blackhole" functionality
the blackhole ACL was accidentally disabled with respect to client
queries during the netmgr conversion.
in order to make this work for TCP, it was necessary to add a return
code to the accept callback functions passed to isc_nm_listentcp() and
isc_nm_listentcpdns().
Evan Hunt [Mon, 22 Jun 2020 23:45:47 +0000 (16:45 -0700)]
Make netmgr tcpdns send calls asynchronous
isc__nm_tcpdns_send() was not asynchronous and accessed socket
internal fields in an unsafe manner, which could lead to a race
condition and subsequent crash. Fix it by moving tcpdns processing
to a proper netmgr thread.
Witold Kręcicki [Mon, 22 Jun 2020 22:46:11 +0000 (15:46 -0700)]
Fix a shutdown race in netmgr udp
We need to mark the socket as inactive early (and synchronously)
in the stoplistening process; otherwise we might destroy the
callback argument before we actually stop listening, and call
the callback on bad memory.
Evan Hunt [Sat, 6 Jun 2020 00:32:36 +0000 (17:32 -0700)]
implement isc_nm_cancelread()
The isc_nm_cancelread() function cancels reading on a connected
socket and calls its read callback function with a 'result'
parameter of ISC_R_CANCELED.
when isc_nm_destroy() is called, there's a loop that waits for
other references to be detached, pausing and unpausing the netmgr
to ensure that all the workers' events are run, followed by a
1-second sleep. this caused a delay on shutdown which will be
noticeable when netmgr is used in tools other than named itself,
so the delay has now been reduced to a hundredth of a second.
Evan Hunt [Tue, 17 Dec 2019 02:24:55 +0000 (18:24 -0800)]
implement isc_nm_tcpconnect()
the isc_nm_tcpconnect() function establishes a client connection via
TCP. once the connection is esablished, a callback function will be
called with a newly created network manager handle.
Witold Kręcicki [Wed, 10 Jun 2020 00:07:16 +0000 (17:07 -0700)]
allow tcpdns sockets to self-reference while connected
A TCPDNS socket creates a handle for each complete DNS message.
Previously, when all the handles were disconnected, the socket
would be closed, but the wrapped TCP socket might still have
more to read.
Now, when a connection is established, the TCPDNS socket creates
a reference to itself by attaching itself to sock->self. This
reference isn't cleared until the connection is closed via
EOF, timeout, or server shutdown. This allows the socket to remain
open even when there are no active handles for it.
Evan Hunt [Fri, 5 Jun 2020 06:13:54 +0000 (23:13 -0700)]
modify reference counting within netmgr
- isc__nmhandle_get() now attaches to the sock in the nmhandle object.
the caller is responsible for dereferencing the original socket
pointer when necessary.
- tcpdns listener sockets attach sock->outer to the outer tcp listener
socket. tcpdns connected sockets attach sock->outerhandle to the handle
for the tcp connected socket.
- only listener sockets need to be attached/detached directly. connected
sockets should only be accessed and reference-counted via their
associated handles.
Evan Hunt [Thu, 4 Jun 2020 21:54:36 +0000 (14:54 -0700)]
make isc_nmsocket_{attach,detach}{} functions private
there is no need for a caller to reference-count socket objects.
they need tto be able tto close listener sockets (i.e., those
returned by isc_nm_listen{udp,tcp,tcpdns}), and an isc_nmsocket_close()
function has been added for that. other sockets are only accessed via
handles.
Remove the .key from the beginning of the line in rst file
The handling of . (dot) characted at the beginning of the line has
changed between the sphinx-doc versions, and it was constantly giving us
trouble when generating man pages when using different sphinx-doc. This
commit just changes the source rst file, so there's no more . (dot) the
beginning of the line.
The dns_message_create() cannot fail, change the return to void
The dns_message_create() function cannot soft fail (as all memory
allocations either succeed or cause abort), so we change the function to
return void and cleanup the calls.
Diego Fronza [Mon, 21 Sep 2020 21:19:49 +0000 (18:19 -0300)]
cocci: Add semantic patch to refactor dns_message_destroy()
dns_message_t objects are now being handled using reference counting
semantics, so now dns_message_destroy() is not called directly anymore,
dns_message_detach must be called instead.
Diego Fronza [Mon, 21 Sep 2020 20:44:29 +0000 (17:44 -0300)]
Properly handling dns_message_t shared references
This commit fix the problems that arose when moving the dns_message_t
object from fetchctx_t to the query structure.
Since the lifetime of query objects are different than that of a
fetchctx and the dns_message_t object held by the query may be being
used by some external module, e.g. validator, even after the query
may have been destroyed, propery handling of the references to the
message were added in this commit to avoid accessing an already
destroyed object.
Specifically, in rctx_done(), a reference to the message is attached
at the beginning of the function and detached at the end, since a
possible call to fctx_cancelquery() would release the dns_message_t
object, and in the next lines of code a call to rctx_nextserver()
or rctx_chaseds() would require a valid pointer to the same object.
In valcreate() a new reference is attached to the message object,
this ensures that if the corresponding query object is destroyed
before the validator attempts to access it, no invalid pointer
access occurs.
In validated() we have to attach a new reference to the message,
since we destroy the validator object at the beginning of the
function, and we need access to the message in the next lines of
the same function.
rctx_nextserver() and rctx_chaseds() functions were adapted to
receive a new parameter of dns_message_t* type, this was so they
could receive a valid reference to a dns_message_t since using the
response context respctx_t to access the message through
rctx->query->rmessage could lead to an already released reference
due to the query being canceled.
Diego Fronza [Mon, 21 Sep 2020 20:32:39 +0000 (17:32 -0300)]
Fix invalid dns message state in resolver's logic
The assertion failure REQUIRE(msg->state == DNS_SECTION_ANY), caused
by calling dns_message_setclass within function resquery_response()
in resolver.c, was happening due to wrong management of dns message_t
objects used to process responses to the queries issued by the
resolver.
Before the fix, a resolver's fetch context (fetchctx_t) would hold
a pointer to the message, this same reference would then be used
over all the attempts to resolve the query, trying next server,
etc... for this to work the message object would have it's state
reset between each iteration, marking it as ready for a new processing.
The problem arose in a scenario with many different forwarders
configured, managing the state of the dns_message_t object was
lacking better synchronization, which have led it to a invalid
dns_message_t state in resquery_response().
Instead of adding unnecessarily complex code to synchronize the
object, the dns_message_t object was moved from fetchctx_t structure
to the query structure, where it better belongs to, since each query
will produce a response, this way whenever a new query is created
an associated dns_messate_t is also created.
This commit deals mainly with moving the dns_message_t object from
fetchctx_t to the query structure.
Michał Kępień [Mon, 28 Sep 2020 07:21:59 +0000 (09:21 +0200)]
Make native PKCS#11 require dlopen() support
PKCS#11 support in BIND requires dlopen() support from the operating
system and thus building with "--enable-native-pkcs11 --without-dlopen"
should not be possible. Add an Autoconf check which enforces that
constraint. Adjust the pairwise testing model accordingly.
Michał Kępień [Mon, 28 Sep 2020 07:16:48 +0000 (09:16 +0200)]
Fix function overrides in unit tests on macOS
Since Mac OS X 10.1, Mach-O object files are by default built with a
so-called two-level namespace which prevents symbol lookups in BIND unit
tests that attempt to override the implementations of certain library
functions from working as intended. This feature can be disabled by
passing the "-flat_namespace" flag to the linker. Fix unit tests
affected by this issue on macOS by adding "-flat_namespace" to LDFLAGS
used for building all object files on that operating system (it is not
enough to only set that flag for the unit test executables).
Michał Kępień [Mon, 28 Sep 2020 07:16:48 +0000 (09:16 +0200)]
Clean up use of function wrapping
Currently, building BIND using "--without-dlopen" universally breaks
building unit tests which employ the --wrap linker option (because the
replacement functions are put in a shared library and building shared
objects requires "--with-dlopen"). Fix by moving the overridden symbol,
isc_nmhandle_unref(), to lib/ns/tests/nstest.c and dropping
lib/ns/tests/wrap.c altogether. This makes lib/ns/tests/Makefile.in
simpler and prevents --without-dlopen from messing with the process of
building unit tests.
Remove parts of configure.ac which are made redundant by the above
changes.
Put the replacement definition of isc_nmhandle_unref() inside an #ifdef
block, so that the build does not break for non-libtool builds (see
below).
These changes allow the broadest possible set of build variants to work
while also simplifying the build process:
- for libtool builds, overriding isc_nmhandle_unref() is done by
placing that symbol directly in lib/ns/tests/nstest.c and relying on
the dynamic linker to perform symbol resolution in the expected way
when the test binary is run,
- for non-libtool builds, overriding isc_nmhandle_unref() is done
using the --wrap linker option (the libtool approach cannot be used
in this case as multiple strong symbols with the same name cannot
coexist in the same binary),
- the "--without-dlopen" option no longer affects building unit tests.
Mark Andrews [Fri, 25 Sep 2020 07:42:41 +0000 (17:42 +1000)]
Wait for 'rpz: policy: reload done' to signalled before proceeding.
RPZ rules cannot be fully relied upon until the summary RPZ database is
updated after an "rndc reload". Wait until the relevant message is
logged after an "rndc reload" to prevent false positives in the
"rpzrecurse" system test caused by the RPZ rules not yet being in effect
by the time ns3 is queried.
Evan Hunt [Wed, 22 May 2019 08:58:41 +0000 (10:58 +0200)]
Purge memory pool upon plugin destruction
The typical sequence of events for AAAA queries which trigger recursion
for an A RRset at the same name is as follows:
1. Original query context is created.
2. An AAAA RRset is found in cache.
3. Client-specific data is allocated from the filter-aaaa memory pool.
4. Recursion is triggered for an A RRset.
5. Original query context is torn down.
6. Recursion for an A RRset completes.
7. A second query context is created.
8. Client-specific data is retrieved from the filter-aaaa memory pool.
9. The response to be sent is processed according to configuration.
10. The response is sent.
11. Client-specific data is returned to the filter-aaaa memory pool.
12. The second query context is torn down.
However, steps 6-12 are not executed if recursion for an A RRset is
canceled. Thus, if named is in the process of recursing for A RRsets
when a shutdown is requested, the filter-aaaa memory pool will have
outstanding allocations which will never get released. This in turn
leads to a crash since every memory pool must not have any outstanding
allocations by the time isc_mempool_destroy() is called.
Fix by creating a stub query context whenever fetch_callback() is called,
including cancellation events. When the qctx is destroyed, it will ensure
the client is detached and the plugin memory is freed.
Matthijs Mekking [Thu, 13 Aug 2020 05:47:27 +0000 (07:47 +0200)]
Include expired rdatasets in iteration functions
By changing the check in 'rdatasetiter_first' and 'rdatasetiter_next'
from "now > header->rdh_ttl" to "now - RBDTB_VIRTUAL > header->rdh_ttl"
we include expired rdataset entries so that they can be used for
"rndc dumpdb -expired".
Matthijs Mekking [Thu, 13 Aug 2020 05:58:42 +0000 (07:58 +0200)]
Minor changes to serve-stale tests
Minor changes are:
- Replace the "$RNDCCMD dumpdb" logic with "rndc_dumpdb" from
conf.sh.common (it does the same thing).
- Update a comment to match the grep calls below it (comment said the
rest should be expired, while the grep calls indicate that they
are still in the cache, the comment now explains why).
Mark Andrews [Fri, 18 Sep 2020 05:00:35 +0000 (15:00 +1000)]
Clone the saved / query message buffers
The message buffer passed to ns__client_request is only valid for
the life of the the ns__client_request call. Save a copy of it
when we recurse or process a update as ns__client_request will
return before those operations complete.
Mark Andrews [Tue, 22 Sep 2020 05:22:34 +0000 (15:22 +1000)]
Break lock order loop by sending TAT in an event
The dotat() function has been changed to send the TAT
query asynchronously, so there's no lock order loop
because we initialize the data first and then we schedule
the TAT send to happen asynchronously.
This breaks following lock-order loops:
zone->lock (dns_zone_setviewcommit) while holding view->lock
(dns_view_setviewcommit)
keytable->lock (dns_keytable_find) while holding zone->lock
(zone_asyncload)
view->lock (dns_view_findzonecut) while holding keytable->lock
(dns_keytable_forall)
Michal Nowak [Wed, 1 Jul 2020 08:29:36 +0000 (10:29 +0200)]
Add pairwise testing
Pairwise testing is a test case generation technique based on the
observation that most faults are caused by interactions of at most two
factors. For BIND, its configure options can be thought of as such
factors.
Process BIND configure options into a model that is subsequently
processed by the PICT tool in order to find an effective test vector.
That test vector is then used for configuring and building BIND using
various combinations of configure options.