Tony Finch [Thu, 30 Jun 2022 16:19:54 +0000 (17:19 +0100)]
CHANGES note for [GL !6517]
[performance] A new algorithm for DNS name compression based on a
hash set of message offsets. Name compression is now
more complete as well as being generally faster, and
the implementation is less complicated and requires
much less memory.
Tony Finch [Fri, 16 Sep 2022 17:55:48 +0000 (18:55 +0100)]
Test compression context hash set collisions
Check that names are correctly added and deleted in the compression
context. Use many names with differing numerical prefixes to make it
relatively easy to identify and debug problems.
Tony Finch [Thu, 23 Jun 2022 20:53:46 +0000 (21:53 +0100)]
Simplify and speed up DNS name compression
All we need for compression is a very small hash set of compression
offsets, because most of the information we need (the previously added
names) can be found in the message using the compression offsets.
This change combines dns_compress_find() and dns_compress_add() into
one function dns_compress_name() that both finds any existing suffix,
and adds any new prefix to the table. The old split led to performance
problems caused by duplicate names in the compression context.
Compression contexts are now either small or large, which the caller
chooses depending on the expected size of the message. There is no
dynamic resizing.
There is a behaviour change: compression now acts on all the labels in
each name, instead of just the last few.
A small benchmark suggests this is about 2x faster.
Artem Boldariev [Fri, 14 Oct 2022 18:36:51 +0000 (21:36 +0300)]
Fix isc_nmsocket_set_tlsctx()
During loop manager refactoring isc_nmsocket_set_tlsctx() was not
properly adapted. The function is expected to broadcast the new TLS
context for every worker, but this behaviour was accidentally broken.
Ondřej Surý [Fri, 14 Oct 2022 14:10:41 +0000 (16:10 +0200)]
Improve reporting for pthread_once errors
Replace all uses of RUNTIME_CHECK() in lib/isc/include/isc/once.h with
PTHEADS_RUNTIME_CHECK(), in order to improve error reporting for any
once-related run-time failures (by augmenting error messages with
file/line/caller information and the error string corresponding to
errno).
Tom Krizek [Mon, 10 Oct 2022 11:28:29 +0000 (13:28 +0200)]
Remove system test delzone
There are multiple reasons to remove this test as obsolete:
- The test may not possibly work for over 2.5 years, since 98b3b93791777218c04a67ddaef22619162249f7 removed the rndc.py python
tool on which this test relies.
- It isn't part of the test suite either in CI or locally unless it is
explicitly enabled. As a result, there are many issues which prevent
the test from being executed caused by various refactoring efforts
accumulated over time.
- Even if the test could be executed, it has no clear failure condition.
If the python script(s) fail, the test still passes.
Ondřej Surý [Thu, 6 Oct 2022 10:56:25 +0000 (12:56 +0200)]
Replace the statschannel truncated tests with two new tests
Now that the artificial limit on the recv buffer has been removed, the
current system test always fails because it tests if the truncation has
happened.
Add test that sending more than 10 headers makes the connection to
closed; and add test that sending huge HTTP request makes the connection
to be closed.
Ondřej Surý [Thu, 6 Oct 2022 10:56:25 +0000 (12:56 +0200)]
Rewrite isc_httpd using picohttpparser and isc_url_parse
Rewrite the isc_httpd to be more robust.
1. Replace the hand-crafted HTTP request parser with picohttpparser for
parsing the whole HTTP/1.0 and HTTP/1.1 requests. Limit the number
of allowed headers to 10 (arbitrary number).
2. Replace the hand-crafted URL parser with isc_url_parse for parsing
the URL from the HTTP request.
3. Increase the receive buffer to match the isc_netmgr buffers, so we
can at least receive two full isc_nm_read()s. This makes the
truncation processing much simpler.
4. Process the received buffer from single isc_nm_read() in a single
loop and schedule the sends to be independent of each other.
The first two changes makes the code simpler and rely on already
existing libraries that we already had (isc_url based on nodejs) or are
used elsewhere (picohttpparser).
The second two changes remove the artificial "truncation" limit on
parsing multiple request. Now only a request that has too many
headers (currently 10) or is too big (so, the receive buffer fills up
without reaching end of the request) will end the connection.
We can be benevolent here with the limites, because the statschannel
channel is by definition private and access must be allowed only to
administrators of the server. There are no timers, no rate-limiting, no
upper limit on the number of requests that can be served, etc.
Ondřej Surý [Fri, 7 Oct 2022 13:48:26 +0000 (15:48 +0200)]
Add picohttpparser.{c.h} from https://github.com/h2o/picohttpparser
PicoHTTPParser is a tiny, primitive, fast HTTP request/response parser.
Unlike most parsers, it is stateless and does not allocate memory by
itself. All it does is accept pointer to buffer and the output
structure, and setups the pointers in the latter to point at the
necessary portions of the buffer.
Petr Špaček [Thu, 13 Oct 2022 12:33:52 +0000 (14:33 +0200)]
Add list of meaningless commits to .git-blame-ignore-revs
Works nicely together with:
git config --add blame.ignoreRevsFile .git-blame-ignore-revs
The list was generated by hand-picking from git log --oneline augmented
with:
--author=tbox
--grep=clang-format
--grep=copyright
--grep=reformat
--grep=whitespace
plus
git log --format='commit %H %s' --stat | grep -E 'commit|changed' | grep -B1 '[0-9][0-9][0-9] files changed'
plus some sanity checking.
Comments were added with:
for COMMIT in $(cat .git-blame-ignore-revs)
do git log -1 --format="# %s" "$COMMIT"
echo $COMMIT
done
Petr Špaček [Thu, 13 Oct 2022 08:33:41 +0000 (10:33 +0200)]
Replace #define DNS_NAMEATTR_ with struct of bools
sizeof(dns_name_t) did not change but the boolean attributes are now
separated as one-bit structure members. This allows debuggers to
pretty-print dns_name_t attributes without any special hacks, plus we
got rid of manual bit manipulation code.
Petr Špaček [Thu, 13 Oct 2022 11:08:22 +0000 (13:08 +0200)]
Fix latent bug in RBT node attributes handling
Originally RBT node stored three lowest bits from dns_name_t attributes.
This had a curious side-effect noticed by Tony Finch:
If you create an rbt node from a DYNAMIC name then the flag will be
propagated through dns_rbt_namefromnode() ... if you subsequently call
dns_name_free() it will try to isc_mem_put() a piece of an rbt node ...
but dns_name_free() REQUIRE()s that the name is dynamic so in the usual
case where rbt nodes are created from non-dynamic names, this kind of
code will fail an assertion.
This is a bug it dates back to june 1999 when NAMEATTR_DYNAMIC was
invented.
Apparently it does not happen often :-)
I'm planning to get rid of DNS_NAMEATTR_ definitions and bit operations,
so removal of this "three-bit-subset" assignment is a first step.
We can keep only the ABSOLUTE flag in RBT node and nothing else because
names attached to rbt nodes are always readonly: The internal node_name()
function always sets the NAMEATTR_READONLY when making a dns_name that
refers to the node's name, so the READONLY flag will be set in the name
returned by dns_rbt_namefromnode().
Artem Boldariev [Wed, 24 Aug 2022 17:44:47 +0000 (20:44 +0300)]
doth system test: increase transfers-in/out limits
Sometimes doth test could intermittently fail shortly after start due
to inability to complete a zone transfer in time. As it turned out, it
could happen due to transfers-in/out limits. Initially the defaults
were fine, but over time, especially when adding Strict/Mutual TLS, we
added more than 10 zones so it became possible to hit the limits.
This commit takes care of that by bumping the limits.
Artem Boldariev [Wed, 12 Oct 2022 15:46:54 +0000 (18:46 +0300)]
doth system test - decrease HTTP listener quota size
This commit reduces the size of HTTP listener quota from 300 (default)
to 100 so that it would make hitting any global limits in case of
running multiple tests in parallel in multiple containers unlikely.
This way the need in opening many file descriptors of different
kinds (e.g. client side connections and pipes) gets significantly
reduced while the required code paths are still verified.
Ondřej Surý [Tue, 11 Oct 2022 10:29:48 +0000 (12:29 +0200)]
Gracefully handle ISC_R_SHUTTINGDOWN in udp__send_cb
The ISC_R_SHUTTINGDOWN should be handled the same as ISC_R_CANCELED in
the udp__send_cb(), as we might be sending the data while the
loopmgr/netmgr shutdown has been initiated.
Ondřej Surý [Tue, 11 Oct 2022 10:03:17 +0000 (12:03 +0200)]
Make sure the unit test listening and connecting ports are different
In rare circumstances, the UDP port for the listening socket and the UDP
port for the connecting socket might be the same. Because we use the
"reuse" port socket option, this isn't caught when binding the socket,
and thus the connected client socket could send a datagram to itself,
completely bypassing the server. This doesn't happen under normal
operation mode because `named` is listening on a privileged port (53),
and even if not, it doesn't usually talk to itself as the tests do.
Pick an arbitrary port for listening (9153-9156) that is outside the
ephemeral port range for the network manager related unit tests (except
the `doh_test).
Ondřej Surý [Tue, 11 Oct 2022 07:10:16 +0000 (09:10 +0200)]
Don't set load-balancing socket option on the UDP connect sockets
The isc_nm_udpconnect() erroneously set the reuse port with
load-balancing on the outgoing connected UDP sockets. This socket
option makes only sense for the listening sockets. Don't set the
load-balancing reuse port option on the outgoing UDP sockets.
Ondřej Surý [Wed, 12 Oct 2022 07:43:56 +0000 (09:43 +0200)]
Retry on timeout in the UDP recv_one, recv_two and double_read tests
Since we are testing UDP on the localhost and the same interface, the
UDP datagrams can't get lost. Change the connect read callback, so it
starts reading again on the timeout instead of just getting stuck, and
fail when any other result codes than ISC_R_SUCCESS and ISC_R_TIMEDOUT
are received because we don't expect them to happen in these simple
tests.
Artem Boldariev [Wed, 12 Oct 2022 12:26:06 +0000 (15:26 +0300)]
DoH unit test: remove broken remnants of slowdown logic
This commit removes broken remnants of unit test slowdown logic, which
caused unit test hangs on platforms susceptible to "too many open
files" error, notably OpenBSD.
Artem Boldariev [Tue, 4 Oct 2022 18:03:23 +0000 (21:03 +0300)]
TLS: clear error queue before doing IO or calling SSL_get_error()
Ensure that TLS error is empty before calling SSL_get_error() or doing
SSL I/O so that the result will not get affected by prior error
statuses.
In particular, the improper error handling led to intermittent unit
test failure and, thus, could be responsible for some of the system
test failures and other intermittent TLS-related issues.
Ondřej Surý [Wed, 12 Oct 2022 12:40:49 +0000 (14:40 +0200)]
Ignore additional return codes in the netmgr unit tests
There was inconsistency in which error codes would get accepted and
ignored in the network manager unit test callbacks. Add following
results, so we just detach the handle instead of causing assertion
failure:
* ISC_R_SHUTTINGDOWN - when the network manager is shutting down
* ISC_R_CANCELED - the socket has been shut down
* ISC_R_EOF - the (TCP) communication has ended on the other side
* ISC_R_CONNECTIONRESET - the TCP connection was reset
This should fix some of the spurious unit test failures.
Aram Sargsyan [Wed, 12 Oct 2022 08:21:35 +0000 (08:21 +0000)]
Remove a superfluous check of sock->fd against -1
The check is left from when tcp_connect_direct() called isc__nm_socket()
and it was uncertain whether it had succeeded, but now isc__nm_socket()
is called before tcp_connect_direct(), so sock->fd cannot be -1.
*** CID 357292: (REVERSE_NEGATIVE)
/lib/isc/netmgr/tcp.c: 309 in isc_nm_tcpconnect()
303
304 atomic_store(&sock->active, true);
305
306 result = tcp_connect_direct(sock, req);
307 if (result != ISC_R_SUCCESS) {
308 atomic_store(&sock->active, false);
>>> CID 357292: (REVERSE_NEGATIVE)
>>> You might be using variable "sock->fd" before verifying that it is >= 0.
309 if (sock->fd != (uv_os_sock_t)(-1)) {
310 isc__nm_tcp_close(sock);
311 }
312 isc__nm_connectcb(sock, req, result, true);
313 }
314
Ondřej Surý [Tue, 11 Oct 2022 07:06:37 +0000 (09:06 +0200)]
Handle double timeout in udp_cancel_read test
If sending took too long the isc_nm_read() could timeout twice, leading
to extra 'cread' counter in the udp_cancel_read test. Increase the
cread counter only on ISC_R_EOF (canceled read) and deal with the
multiple ISC_R_TIMEOUTS gracefully.
Michał Kępień [Tue, 11 Oct 2022 09:54:57 +0000 (11:54 +0200)]
Fix startup detection after restart in start.pl
The bin/tests/system/start.pl script waits until a "running" message is
logged by a given name server instance before attempting to send a
version.bind/CH/TXT query to it. The idea behind this was to make the
script wait until named loads all the zones it is configured to serve
before telling the system test framework that a given server is ready to
use; this prevents the need to add boilerplate code that waits for a
specific zone to be loaded to each test expecting that.
The problem is that when it looks for "running" messages, the
bin/tests/system/start.pl script assumes that the existence of any such
message in the named.run file indicates that a given named instance has
already finished loading all zones. Meanwhile, some system tests
restart all the named instances they use throughout their lifetime (some
even do that a few times), for example to run Python-based tests. The
bin/tests/system/start.pl script handles such a scenario incorrectly: as
soon as it finds any "running" message in the named.run file it inspects
and it gets a response to a version.bind/CH/TXT query, it tells the
system test framework that a given server is ready to use, which might
not be true - it is possible that only the "version.bind" zone is loaded
at that point and the "running" message found was logged by a
previously-shutdown named instance. This triggers intermittent failures
for Python-based tests.
Fix by improving the logic that the bin/tests/system/start.pl script
uses to detect server startup: check how many "running" lines are
present in a given named.run file before attempting to start a named
instance and only proceed with version.bind/CH/TXT queries when the
number of "running" lines found in that named.run file increases after
the server is started.
Michał Kępień [Tue, 11 Oct 2022 09:54:57 +0000 (11:54 +0200)]
Do not truncate ns2 logs in the "rrsetorder" test
In the "rrsetorder" system test, the ns2 named instance is restarted
without passing the --restart option to bin/tests/system/start.pl. This
causes the log file for that named instance to be needlessly truncated.
Prevent this from happening by restarting the affected named instance
in the same way as all the other named instances used in system tests.
Ondřej Surý [Tue, 4 Oct 2022 15:07:19 +0000 (17:07 +0200)]
Record the 'edns-udp-size' in the view, not in the resolver
Getting the recorded value of 'edns-udp-size' from the resolver requires
strong attach to the dns_view because we are accessing `view->resolver`.
This is not the case in places (f.e. dns_zone unit) where `.udpsize` is
accessed. By moving the .udpsize field from `struct dns_resolver` to
`struct dns_view`, we can access the value directly even with weakly
attached dns_view without the need to lock the view because `.udpsize`
can be accessed after the dns_view object has been shut down.
Ondřej Surý [Mon, 3 Oct 2022 13:54:12 +0000 (15:54 +0200)]
Resolve violation of weak referencing dns_view
The dns_view implements weak and strong reference counting. When strong
reference counting reaches zero, the adb, ntatable and resolver objects
are shut down and detached.
In dns_zone and dns_nta the dns_view was weakly attached, but the
view->resolver reference was accessed directly leading to dereferencing
the NULL pointer.
Add dns_view_getresolver() method which attaches to view->resolver
object under the lock (if it still exists) ensuring the dns_resolver
will be kept referenced until not needed.
Tony Finch [Wed, 5 Oct 2022 11:11:41 +0000 (12:11 +0100)]
Avoid dead code warning when using a constant boolean
The value of `sign_bit` is platform-dependent but constant at compile
time. Use a cast to convert the boolean `sign_bit` to 0 or 1 instead of
ternary `?:` because one branch of the conditional is dead code. (We
could leave out the cast to `size_t` but our style prefers to handle
booleans more explicitly, hence the `?:` that caused the issue.)
*** CID 358310: Possible Control flow issues (DEADCODE)
/lib/isc/resource.c: 118 in isc_resource_setlimit()
112 * rlim_t, and whether rlim_t has a sign bit.
113 */
114 isc_resourcevalue_t rlim_max = UINT64_MAX;
115 size_t wider = sizeof(rlim_max) - sizeof(rlim_t);
116 bool sign_bit = (double)(rlim_t)-1 < 0;
117
>>> CID 358310: Possible Control flow issues (DEADCODE)
>>> Execution cannot reach the expression "1" inside this statement: "rlim_max >>= 8UL * wider + ...".
118 rlim_max >>= CHAR_BIT * wider + (sign_bit ? 1 : 0);
119 rlim_value = ISC_MIN(value, rlim_max);
120 }
121
122 rl.rlim_cur = rl.rlim_max = rlim_value;
123 unixresult = setrlimit(unixresource, &rl);
Ondřej Surý [Fri, 26 Aug 2022 09:58:51 +0000 (11:58 +0200)]
Use isc_mem_regetx() when appropriate
While refactoring the isc_mem_getx(...) usage, couple places were
identified where the memory was resized manually. Use the
isc_mem_reget(...) that was introduced in [GL !5440] to resize the
arrays via function rather than a custom code.
Ondřej Surý [Fri, 26 Aug 2022 09:58:51 +0000 (11:58 +0200)]
Use designated initializers instead of memset()/MEM_ZERO for structs
In several places, the structures were cleaned with memset(...)) and
thus the semantic patch converted the isc_mem_get(...) to
isc_mem_getx(..., ISC_MEM_ZERO). Use the designated initializer to
initialized the structures instead of zeroing the memory with
ISC_MEM_ZERO flag as this better matches the intended purpose.
Ondřej Surý [Fri, 3 Jun 2022 10:23:49 +0000 (12:23 +0200)]
Replace isc_mem_*_aligned(..., alignment) with isc_mem_*x(..., flags)
Previously, the isc_mem_get_aligned() and friends took alignment size as
one of the arguments. Replace the specific function with more generic
extended variant that now accepts ISC_MEM_ALIGN(alignment) for aligned
allocations and ISC_MEM_ZERO for allocations that zeroes
the (re-)allocated memory before returning the pointer to the caller.
Petr Špaček [Tue, 4 Oct 2022 09:00:54 +0000 (11:00 +0200)]
Remove manually defined anchors pointing to statement definitions
This is hopefully end of duplication. This batch did not cause clashes
in Sphinx but it was pointless nonetheless as we have auto-generated
anchors for all statements.
Petr Špaček [Fri, 30 Sep 2022 11:57:17 +0000 (13:57 +0200)]
Deduplicate link anchors in the ARM
Some statement names like "allow-query" had manually defined link anchor
_allow-query and also implicit anchor created by
.. namedconf:statement:: syntax. This causes warnings if a ambiguous
reference is made using :any:`allow-query` syntax.
Remove (hopefully all) manually defined anchors which pointed to
identical place as the implicit anchor. This allows :any: to work.
In rare cases where manual anchor points to descriptive text separated
from statement definition the reference was disamguated by replacing
:any:`notify` with :ref:`notify` (for manual anchor)
vs. :namedconf:ref:`notify` (for statement definition).
Please note that `options` statement is a trap: It is ambiguous even
without manual anchor because rndc.conf has its own `options`. Use
:namedconf:ref:`options` vs. :rndcconf:ref:`options` to select
appropriate target.
We forgot to update TSAN paths when moving all the unit tests to
/tests/. Let's remove paths from find to make it less dependent on
exact location, and store all untracked files as we do in the normal
unit test template.
If refresh stale RRset times out, start stale-refresh-time
The previous commit failed some tests because we expect that if a
fetch fails and we have stale candidates in cache, the
stale-refresh-time window is started. This means that if we hit a stale
entry in cache and answering stale data is allowed, we don't bother
resolving it again for as long we are within the stale-refresh-time
window.
This is useful for two reasons:
- If we failed to fetch the RRset that we are looking for, we are not
hammering the authoritative servers.
- Successor clients don't need to wait for stale-answer-client-timeout
to get their DNS response, only the first one to query will take
the latency penalty.
The latter is not useful when stale-answer-client-timeout is 0 though.
So this exception code only to make sure we don't try to refresh the
RRset again if it failed to do so recently.
Refreshing a stale RRset is similar to prefetching an RRset, so
reuse the existing code. When refreshing an RRset we need to clear
all db options related to serve-stale so that stale RRsets in cache
are ignored during the refresh.
We no longer need to set the "nodetach" flag, because the refresh
fetch is now a "fetch and forget". So we can detach from the client
in the query_send().
This code will break some serve-stale test cases, this will be fixed
in the successor commit.
TODO: add explanation why the serve-stale test cases fail.
Ondřej Surý [Mon, 27 Jun 2022 08:13:19 +0000 (10:13 +0200)]
Add a case-insensitive option directly to siphash 2-4 implementation
Formerly, the isc_hash32() would have to change the key in a local copy
to make it case insensitive. Change the isc_siphash24() and
isc_halfsiphash24() functions to lowercase the input directly when
reading it from the memory and converting the uint8_t * array to
64-bit (respectively 32-bit numbers).
Mark Andrews [Thu, 15 Sep 2022 06:04:43 +0000 (16:04 +1000)]
Add support for 'dohpath' to SVCB (and HTTPS)
dohpath is specfied in draft-ietf-add-svcb-dns and has a value
of 7. It must be a relative path (start with a /), be encoded
as UTF8 and contain the variable dns ({?dns}).
Tony Finch [Tue, 20 Sep 2022 13:32:01 +0000 (14:32 +0100)]
Avoid signed integer overflow in isc_resource_setlimit()
On systems with signed rlim_t the old code calculated its maximum
value by shifting 1 into the sign bit, which is undefined behaviour.
Avoid the bug by using an unsigned shift.
Be more patient when stopping servers in the system tests
When the TCP test is run on the busy server, the server might take a
while to wind the server down because it might still be processing all
that 300k invalid XFR requests.
Increate the rncd wait time to 120 seconds, the SIGTERM time to 300
seconds, and reduce the time to wait for ans servers from 1200 second
to just 120 seconds.
The dns__nta_shutdown() could be run from different threads and it was
accessing nta->timer unlocked. Don't check and stop the timer from
dns__nta_shutdown() directly, but leave it for the async callback.
Because the dns_zonemgr_create() was run before the loopmgr was started,
the isc_ratelimiter API was more complicated that it had to be. Move
the dns_zonemgr_create() to run_server() task which is run on the main
loop, and simplify the isc_ratelimiter API implementation.
The isc_timer is now created in the isc_ratelimiter_create() and
starting the timer is now separate async task as is destroying the timer
in case it's not launched from the loop it was created on. The
ratelimiter tick now doesn't have to create and destroy timer logic and
just stops the timer when there's no more work to do.
This should also solve all the races that were causing the
isc_ratelimiter to be left dangling because the timer was stopped before
the last reference would be detached.