Remi Gacogne [Wed, 7 May 2025 08:52:56 +0000 (10:52 +0200)]
dnsdist: Only set the proxy protocol payload size when actually added
I can think of two cases where we got this wrong:
- the query was initially assigned to a backend using the proxy protocol
payload, then later restarted and assigned to a backend not using it.
The proxy protocol payload size was then kept from the first assignment.
- we failed to actually prepend the proxy protocol payload but the payload
size was updated.
Both cases could cause a corrupted payload to be sent, or an exception to
be raised if the size of the proxy protocol payload was larger than the
size of the initial query.
Remi Gacogne [Wed, 7 May 2025 14:18:25 +0000 (16:18 +0200)]
dnsdist: Gracefully handle missing v6 in backend discovery test
This test has been randomly failing on GH actions lately, and it looks
like it is because we sometimes do not get the IPv6 addresses when
resolving `dns.quad9.net` via the system resolver.
Remi Gacogne [Mon, 5 May 2025 15:03:06 +0000 (17:03 +0200)]
dnsdist: Improve scalability of custom metrics
This commit improves the scalability of custom metrics by:
- being optimistic about the existence of a given metric (including labels):
since most of the time a given metric, even with labels, will be increased
more than once, we can take read-only lock and only fallback to taking a
write lock if we actually have to add a new entry. This is especially
useful when using custom metrics with per-thread Lua, since there is no
global lock involved in this case.
- optimizing the "no label" case, since the Lua FFI interface does not
use anyway: skip the creation (and destruction) of an empty labels
map whenever possible, return an empty string early when combining
empty labels.
It already yields a noticeable improvement when a single thread is used,
but really shines when several threads are processing queries simultaneously.
Remi Gacogne [Mon, 5 May 2025 08:42:20 +0000 (10:42 +0200)]
Fix building with GCC 15.1: missing `cstdint` include
GCC 15.1 complains about a missing `cstdint` include when building
`DNSdist`:
```
In file included from ../doh3.hh:29,
from ../doh3.cc:23:
../noinitvector.hh:67:35: error: ‘uint8_t’ was not declared in this scope
67 | using PacketBuffer = NoInitVector<uint8_t>;
| ^~~~~~~
../noinitvector.hh:7:1: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
6 | #include <vector>
+++ |+#include <cstdint>
7 |
../noinitvector.hh:67:42: error: template argument 1 is invalid
67 | using PacketBuffer = NoInitVector<uint8_t>;
| ^
In file included from ../dnsdist-idstate.hh:27,
from ../doh3.hh:48:
../dnscrypt.hh:247:20: error: ‘PacketBuffer’ has not been declared
247 | void parsePacket(PacketBuffer& packet, bool tcp, time_t now);
| ^~~~~~~~~~~~
../dnscrypt.hh:248:31: error: ‘PacketBuffer’ has not been declared
248 | void getDecrypted(bool tcp, PacketBuffer& packet);
| ^~~~~~~~~~~~
../dnscrypt.hh:249:43: error: ‘PacketBuffer’ has not been declared
249 | void getCertificateResponse(time_t now, PacketBuffer& response) const;
| ^~~~~~~~~~~~
../dnscrypt.hh:250:23: err
```
dnsdist: Fix memory corruption when using `getAddressInfo`
The object holding the callback function, which is translated into
a `LuaContext::LuaFunctionCaller`, needs to be destroyed while holding
the Lua mutex because it will unregister itself from the Lua context,
causing a corruption if a different thread is accessing the Lua context
at the same time.
Oliver Chen [Wed, 30 Apr 2025 03:40:22 +0000 (03:40 +0000)]
Use atomic type for potential read/write race condition
Only a few numerical healthcheck parameters are selected,
and changed to use atomic type for those parameters so as to
avoid potential read/write race conditions.
dnsdist: Gracefully handle timeout/response for a closed HTTP stream
The remote end might very well have already closed the HTTP stream
corresponding to the timeout or response we are processing. While
this means we need to discard the event we were processing, it is
not an unexpected event and we should thus not raise an exception
since the caller cannot do anything about it.
dnsdist: Fix a crash when processing timeouts for incoming DoH queries
This commit fixes a double-free triggered by an exception being raised
while we are processing a timeout for an incoming DoH query. The exception
bypasses the call releasing the smart pointer, and thus the destructor
is called when we reach the end of the function since we own the smart
pointer, but unfortunately it has already been destroyed by the function
that raised the exception. The fix is to release the pointer first,
then call the function, so even if an exception is raised we no longer
own the pointer, and it's clear that the function has taken ownership of it.
dnsdist: Be consistent with regard to health-check modes transition
Calling `setAuto()` on a backend used to set the health-check mode
to `active`, even if it had been set to `lazy` before, which was
quite confusing.
This commit introduces a new method, `setAutoActive()` which can
be used to change the health-check mode to `active`, and alters the
behaviour of `setAuto()` to restore the previous health-check mode
instead. This is a breaking change but since the default health-check
mode is `active` I don't expect to break any existing configurations.
It also introduces a new method, `getHealthCheckMode()`, to inspect
the current mode.
dnsdist: Fix spurious failure of the TCP limits regression tests
The "maximum duration" test used to trigger the maximum number of
TCP read IOs, preventing the next test from being run. This commit
sets the maximum number of TCP read IOs to "unlimited" for this test.
dnsdist: Fix a confusion about contexts/frontends in `getDNSCryptBind`
We internally keep two different frontends (UDP and TCP) for DNSCrypt
configuration binds, but the frontends should not be exposed to the user.
`getDNSCryptBind` should return distinct DNSCrypt contexts, one per
DNSCrypt configuration bind. This was broken during the refactoring
of how frontends are internally kept.
dnsdist: Reduce memory usage with fast-changing dynamic backends
The Lua code in `dnsdist-resolver.lua` periodically check whether
the IP addresses associated with configured backend names have changed,
and if they have it does potentially remove backends whose IP addresses
are no longer present, and adds backends for new IP addresses.
Each backend configured in DNSdist uses a fair amount of memory
because of the states we pre-allocate for UDP queries, and unfortunately
it looks like Lua holds to removed backends for quite a while by default,
causing the removed backends to actually co-exist in memory with the new
ones for a long time, leading to a higher than necessary memory consumption.
This is particularly problematic when you realize that the memory allocator
usually hold onto memory even when the program releases it.
So in environments where IP addresses are updated very often, DNSdist can
consume several times the memory it actually needs, which is not nice.
This commit forces Lua to do a garbage collection cycle after refreshing
servers, thus releasing removed servers much more quickly. This is far
from ideal, but I don't have a better idea yet.
dnsdist: Properly handle buffering in the "max read IOs" test
It is completely possible that the entire query will be sent before
the dnsdist process notices that the number of IOs is larger than the
limit, closes the connection, and the test process is notified of the
socket being closed (for example because of buffering).
So we need to detect that the connection is closed during our attempt
to read the response, rather than while we are sending the query.
This commit does that, and also introduces a slight delay after sending
each byte of the query, increasing the likelihood of the dnsdist process
actually reading the query bytes one by one.