Evan Hunt [Fri, 4 Jun 2021 02:35:01 +0000 (19:35 -0700)]
WIP: non-blocking zone lookup
- add a DNS_DBFIND_NONBLOCK option, which causes dns_db_find/findext()
to return ISC_R_LOCKBUSY if unable to acquire a read lock on the
database.
- use the asynchronous hook mechanism to postpone query_lookup() if
this occurs.
Mark Andrews [Wed, 3 Feb 2021 00:10:20 +0000 (11:10 +1100)]
Add a system test checking a malformed IXFR
Make sure an incoming IXFR containing an SOA record which is not placed
at the apex of the transferred zone does not result in a broken version
of the zone being served by named and/or a subsequent crash.
Ondřej Surý [Wed, 2 Jun 2021 09:23:36 +0000 (11:23 +0200)]
Cleanup the remaining of HAVE_UV_<func> macros
While cleaning up the usage of HAVE_UV_<func> macros, we forgot to
cleanup the HAVE_UV_UDP_CONNECT in the actual code and
HAVE_UV_TRANSLATE_SYS_ERROR and this was causing Windows build to fail
on uv_udp_send() because the socket was already connected and we were
falsely assuming that it was not.
The platforms with autoconf support were not affected, because we were
still checking for the functions from the configure.
Artem Boldariev [Mon, 24 May 2021 11:14:17 +0000 (14:14 +0300)]
HTTP/2 write buffering
This commit adds the ability to consolidate HTTP/2 write requests if
there is already one in flight. If it is the case, the code will
consolidate multiple subsequent write request into a larger one
allowing to utilise the network in a more efficient way by creating
larger TCP packets as well as by reducing TLS records overhead (by
creating large TLS records instead of multiple small ones).
This optimisation is especially efficient for clients, creating many
concurrent HTTP/2 streams over a transport connection at once. This
way, the code might create a small amount of multi-kilobyte requests
instead of many 50-120 byte ones.
In fact, it turned out to work so well that I had to add a work-around
to the code to ensure compatibility with the flamethrower, which, at
the time of writing, does not support TLS records larger than two
kilobytes. Now the code tries to flush the write buffer after 1.5
kilobyte, which is still pretty adequate for our use case.
Essentially, this commit implements a recommendation given by nghttp2
library:
Ondřej Surý [Thu, 27 May 2021 09:04:37 +0000 (11:04 +0200)]
Indicate to the kernel that we won't be needing the zone dumps
Add a call to posix_fadvise() to indicate to the kernel, that `named`
won't be needing the dumped zone files any time soon with:
* POSIX_FADV_DONTNEED - The specified data will not be accessed in the
near future.
Notes:
POSIX_FADV_DONTNEED attempts to free cached pages associated with the
specified region. This is useful, for example, while streaming large
files. A program may periodically request the kernel to free cached
data that has already been used, so that more useful cached pages are
not discarded instead.
Ondřej Surý [Thu, 27 May 2021 07:45:07 +0000 (09:45 +0200)]
Refactor zone dumping code to use netmgr async threadpools
Previously, dumping the zones to the files were quantized, so it doesn't
slow down network IO processing. With the introduction of network
manager asynchronous threadpools, we can move the IO intensive work to
use that API and we don't have to quantize the work anymore as it the
file IO won't block anything except other zone dumping processes.
Ondřej Surý [Thu, 27 May 2021 07:45:07 +0000 (09:45 +0200)]
Add asynchronous work API to the network manager
The libuv has a support for running long running tasks in the dedicated
threadpools, so it doesn't affect networking IO.
This commit adds isc_nm_work_enqueue() wrapper that would wraps around
the libuv API and runs it on top of associated worker loop.
The only limitation is that the function must be called from inside
network manager thread, so the call to the function should be wrapped
inside a (bound) task.
Ondřej Surý [Thu, 27 May 2021 11:38:21 +0000 (13:38 +0200)]
Use UV_VERSION_HEX to decide whether we need libuv shim functions
Instead of having a configure check for every missing function that has
been added in later version of libuv, we now use UV_VERSION_HEX to
decide whether we need the shim or not.
Ondřej Surý [Thu, 27 May 2021 09:46:38 +0000 (11:46 +0200)]
Add uv_req_get_data() and uv_req_set_data() compatibility shims
The uv_req_get_data() and uv_req_set_data() functions were introduced in
libuv >= 1.19.0, so we need to add compatibility shims with older libuv
versions.
However, sphinx-rtd-theme 0.5.2 is incompatible with versions 0.17+ of
the docutils package. This problem was addressed in the Docker image
used for building man pages by downgrading the docutils package to
version 0.16.
Matthijs Mekking [Thu, 27 May 2021 07:43:21 +0000 (09:43 +0200)]
Reuse rdatset->ttl when dumping ancient RRsets
Rather than having an expensive 'expired' (fka 'stale_ttl') in the
rdataset structure, that is only used to be printed in a comment on
ancient RRsets, reuse the TTL field of the RRset.
Kevin Chen [Thu, 11 Mar 2021 14:45:44 +0000 (15:45 +0100)]
Several serve-stale improvements
Commit a83c8cb0afd88d54b9cf67239f2495c9b0391e97 updated masterdump so
that stale records in "rndc dumpdb" output no longer shows 0 TTLs. In
this commit we change the name of the `rdataset->stale_ttl` field to
`rdataset->expired` to make its purpose clearer, and set it to zero in
cases where it's unused.
Add 'rbtdb->serve_stale_ttl' to various checks so that stale records
are not purged from the cache when they've been stale for RBTDB_VIRTUAL
(300) seconds.
Increment 'ns_statscounter_usedstale' when a stale answer is used.
Note: There was a question of whether 'overmem_purge' should be
purging ancient records, instead of stale ones. It is left as purging
stale records, since stale records could take up the majority of the
cache.
This submission is copyrighted Akamai Technologies, Inc. and provided
under an MPL 2.0 license.
This commit was originally authored by Kevin Chen, and was updated by
Matthijs Mekking to match recent serve-stale developments.
Matthijs Mekking [Thu, 27 May 2021 13:54:13 +0000 (15:54 +0200)]
Reset DNS_FETCHOPT_TRYSTALE_ONTIMEOUT on resume
Once we resume a query, we should clear DNS_FETCHOPT_TRYSTALE_ONTIMEOUT
from the options to prevent triggering the stale-answer-client-timeout
on subsequent fetches.
If we don't this may cause a crash when for example when prefetch is
triggered after a query restart.
Matthijs Mekking [Thu, 27 May 2021 13:44:09 +0000 (15:44 +0200)]
Test with stale timeout cache miss, then fetch completes
Add a test case where a client request is received and the stale
timeout occurs, but it is not served stale data because there is no entry
in the cache, then is served an authoritative answer once the background
fetch completes. This ensures that a stale timeout only affects a
subsequent response if the client was answered.
Evan Hunt [Wed, 26 May 2021 22:35:18 +0000 (15:35 -0700)]
clean up query correctly if already answered by serve-stale
when a serve-stale answer has been sent, the client continues waiting
for a proper answer. if a final completion event for the client does
arrive, it can just be cleaned up without sending a response, similar
to a canceled fetch.
Evan Hunt [Wed, 26 May 2021 21:10:50 +0000 (14:10 -0700)]
add a test of DNS64 processing with a stale negative response
- send a query for an AAAA which will be resolved as a mapped A
- disable authoritative responses
- wait for the negative AAAA response to become stale
- send another query, wait for the stale answer
- re-enable authorative responses so that a real answer arrives
- currently, this triggers an assertion in query.c
Mark Andrews [Wed, 19 May 2021 06:38:33 +0000 (16:38 +1000)]
Remove priority from attribute constructor/destructor
On some platforms, the __attribute__ constructor and destructor won't
take priorities and the compilation failed. On such platform would be
macOS. For this reason, the constructor/destructor in the libisc was
reworked to not use priorities, but have a single constructor and
destructor that calls the appropriate routines in correct order.
This commit removes the extra priority because it's now not needed and
it also breaks a compilation on macOS with GCC 10.
Diego Fronza [Tue, 20 Apr 2021 15:59:46 +0000 (12:59 -0300)]
Handling NoNameservers exception
In the shutdown system test multiple queries are sent to a resolver
instance, in the meantime we terminate the same resolver process for
which the queries were sent to, either via rndc stop or a SIGTERM
signal, that means the resolver may not be able to answer all those
queries, since it has initiated the shutdown process.
The dnspython library raises a dns.resolver.NoNameservers exception when
a resolver object fails to receive an answer from the specified list
of nameservers (resolver.nameservers list), we need to handle this
exception as this is something that may happen since we asked the
resolver to terminate, as a result it may not answer clients even if
an answer is available, as the operation will be canceled.
Mark Andrews [Wed, 26 May 2021 07:55:43 +0000 (17:55 +1000)]
Add missing initialisations
configuring with --enable-mutex-atomics flagged these incorrectly
initialised variables on systems where pthread_mutex_init doesn't
just zero out the structure.
Ondřej Surý [Wed, 26 May 2021 08:01:30 +0000 (10:01 +0200)]
Fix the sizeof() for array holding the pointers to clientmgr
The size of the array holding the pointers to clientmgr was created so
big it could hold the actual clientmgr objects, not just the pointer.
This commit fixes the size to be just the ncpus * sizeof(pointer).
Ondřej Surý [Wed, 26 May 2021 06:15:34 +0000 (08:15 +0200)]
Refactor the interface handling in the netmgr
The isc_nmiface_t type was holding just a single isc_sockaddr_t,
so we got rid of the datatype and use plain isc_sockaddr_t in place
where isc_nmiface_t was used before. This means less type-casting and
shorter path to access isc_sockaddr_t members.
At the same time, instead of keeping the reference to the isc_sockaddr_t
that was passed to us when we start listening, we will keep a local
copy. This prevents the data race on destruction of the ns_interface_t
objects where pending nmsockets could reference the sockaddr of already
destroyed ns_interface_t object.
Mark Andrews [Tue, 11 May 2021 06:39:41 +0000 (16:39 +1000)]
Correct size calculation in dns_journal_iter_init()
* dns_journal_next() leaves the read point in the journal after the
transaction header so journal_seek() should be inside the loop.
* we need to recover from transaction header inconsistencies
Additionally when correcting for <size, serial0, serial1, 0> the
correct consistency check is isc_serial_gt() rather than
isc_serial_ge(). All instances updated.
Michal Nowak [Tue, 13 Apr 2021 16:58:22 +0000 (18:58 +0200)]
Install BIND with "make DESTDIR=<PATH> install"
BIND installation should be done by setting DESTDIR during "make
install" not by setting prefix via ./configure.
Make sure that installation with DESTDIR=<PATH> works by checking that
named binary and it's respective man page were installed and that
well-known BIND9 directories - and only them - are present in DESTDIR.
Also rename install path variable from BIND_INSTALL_PATH to
INSTALL_PATH to avoid namespace clash in stress tests which use
BIND_INSTALL_PATH variable to configure path to BIND9 binaries.
Ondřej Surý [Fri, 12 Mar 2021 13:02:57 +0000 (14:02 +0100)]
Replace Ubuntu 16.04 LTS with Ubuntu 18.04 LTS in the GitLab CI
Ubuntu 16.04 (Xenial Xerus) is reaching End of Standard Support in April
2021 thus we are removing it from the list of supported platforms and
replacing it with Ubuntu 18.04 LTS (Bionic Beaver).
Ondřej Surý [Sun, 23 May 2021 13:36:06 +0000 (15:36 +0200)]
Adjust the fillcount and freemax for dns_message mempools
According to the measurements (recorded on GL!5085), the fillcount of 2
for namepool and fillcount of 4 for rdspool can fit 99.99% of request
for tested scenarios.
This was discovered by perf recording the single second recursive test
using flamethrower where the initial malloc lit up like a flare.
Ondřej Surý [Tue, 11 May 2021 10:09:15 +0000 (12:09 +0200)]
Don't create per bucket memory contexts in resolver
Similarly, the resolver code would create hundreds of memory contexts
just on the resolver setup. The contention will be reduced directly in
the allocator, so for now just attach to the view memory instead of
creating separate memory context for each bucket.
Ondřej Surý [Tue, 18 May 2021 17:44:31 +0000 (19:44 +0200)]
Reduce the number of client tasks and bind them to netmgr queues
Since a client object is bound to a netmgr handle, each client
will always be processed by the same netmgr worker, so we can
simplify the code by binding client->task to the same thread as
the client. Since ns__client_request() now runs in the same event
loop as client->task events, is no longer necessary to pause the
task manager before launching them.
Also removed some functions in isc_task that were not used.
The original goal was to reduce the contention when allocating the
memory, but after a while nobody noticed that the amount of memory
context allocated would not reduce contention at all.
This commit removes the whole mctxpool and just uses the mctx from
clientmgr as the contention will be reduced directly in the allocator.
Michal Nowak [Thu, 20 May 2021 16:00:28 +0000 (18:00 +0200)]
Run gcc:tarball CI job for merge requests
Running gcc:tarball CI job for merge requests is consistent with how we
run gcc:out-of-tree CI job and should help identify problems with the
build system during the review process, not once merged during daily
runs. For the sake of time, unit and system tests associated with the
gcc:tarball CI job are excluded from merge requests.
Michal Nowak [Thu, 20 May 2021 08:56:12 +0000 (10:56 +0200)]
Create an anchor for schedules, tags, and web rules
It's a common pattern to spawn CI jobs only for pipelines triggered by
schedules, tags, and web. There should be an anchor so that the rules
are not repeated.
Evan Hunt [Sat, 22 May 2021 02:33:54 +0000 (19:33 -0700)]
extend rndc timeout to 60 seconds
the idle timeout for rndc connections was set to 10 seconds, but this
caused intermittent system failures of the 'rndc' system test on slow
platforms, since 'rndc reconfig' could time out before reconfiguration
was complete.
this commit restores the original timeout value of 60 seconds, which was
changed inadvertently after rndc was updated to use the network manager.
even with this change, however, the test can still time out under
TSAN because loading the huge zone can take a very long time (upwards
of two minutes). so the test is modified here to generate a smaller zone
file when running under TSAN.
Evan Hunt [Sat, 22 May 2021 00:10:59 +0000 (17:10 -0700)]
remove the remaining uses of dns_name_copy()
dns_name_copy() has been replaced nearly everywhere with
dns_name_copynf(). this commit changes the last two uses of
the original function. afterward, we can remove the old
dns_name_copy() implementation, and replace it with _copynf().
Ondřej Surý [Fri, 21 May 2021 13:30:00 +0000 (15:30 +0200)]
Use dns_name_copynf() with dns_message_gettempname() when needed
dns_message_gettempname() returns an initialized name with a dedicated
buffer, associated with a dns_fixedname object. Using dns_name_copynf()
to write a name into this object will actually copy the name data
from a source name. dns_name_clone() merely points target->ndata to
source->ndata, so it is faster, but it can lead to a use-after-free if
the source is freed before the target object is released via
dns_message_puttempname().
In a few places, clone was being used where copynf should have been;
this is now fixed.
As a side note, no memory was lost, because the ndata buffer used in
the dns_fixedname_t is internal to the structure, and is freed when
the dns_fixedname_t is freed regardless of the .ndata contents.