Samir Aguiar [Sat, 17 May 2025 19:40:09 +0000 (19:40 +0000)]
Patch 5: support setting NSID option (SNA-20345)
To ease debugging behind anycast, add support to dnsdist for
returning an identifier of the POP that is handling the request.
This commit introduces a new ``SetEDNSOptionResponseAction``
action that works similarly to ``SetEDNSOptionAction``, but that
can be used for responses.
Remi Gacogne [Tue, 13 May 2025 13:50:21 +0000 (15:50 +0200)]
dnsdist: Fix a crash when TCP queries and responses keep coming
It happens when we keep finding queries waiting for us on the incoming
TCP socket from the client, and responses waiting for us on the TCP
socket to the backend after forwarding a new query. This is quite
unlikely but not impossible to happen, as reported by Renaud Allard
(many thanks for taking the time to investigate the issue!).
Remi Gacogne [Wed, 7 May 2025 08:52:56 +0000 (10:52 +0200)]
dnsdist: Only set the proxy protocol payload size when actually added
I can think of two cases where we got this wrong:
- the query was initially assigned to a backend using the proxy protocol
payload, then later restarted and assigned to a backend not using it.
The proxy protocol payload size was then kept from the first assignment.
- we failed to actually prepend the proxy protocol payload but the payload
size was updated.
Both cases could cause a corrupted payload to be sent, or an exception to
be raised if the size of the proxy protocol payload was larger than the
size of the initial query.
Remi Gacogne [Mon, 5 May 2025 08:42:20 +0000 (10:42 +0200)]
Fix building with GCC 15.1: missing `cstdint` include
GCC 15.1 complains about a missing `cstdint` include when building
`DNSdist`:
```
In file included from ../doh3.hh:29,
from ../doh3.cc:23:
../noinitvector.hh:67:35: error: ‘uint8_t’ was not declared in this scope
67 | using PacketBuffer = NoInitVector<uint8_t>;
| ^~~~~~~
../noinitvector.hh:7:1: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
6 | #include <vector>
+++ |+#include <cstdint>
7 |
../noinitvector.hh:67:42: error: template argument 1 is invalid
67 | using PacketBuffer = NoInitVector<uint8_t>;
| ^
In file included from ../dnsdist-idstate.hh:27,
from ../doh3.hh:48:
../dnscrypt.hh:247:20: error: ‘PacketBuffer’ has not been declared
247 | void parsePacket(PacketBuffer& packet, bool tcp, time_t now);
| ^~~~~~~~~~~~
../dnscrypt.hh:248:31: error: ‘PacketBuffer’ has not been declared
248 | void getDecrypted(bool tcp, PacketBuffer& packet);
| ^~~~~~~~~~~~
../dnscrypt.hh:249:43: error: ‘PacketBuffer’ has not been declared
249 | void getCertificateResponse(time_t now, PacketBuffer& response) const;
| ^~~~~~~~~~~~
../dnscrypt.hh:250:23: err
```
Remi Gacogne [Wed, 7 May 2025 14:18:25 +0000 (16:18 +0200)]
dnsdist: Gracefully handle missing v6 in backend discovery test
This test has been randomly failing on GH actions lately, and it looks
like it is because we sometimes do not get the IPv6 addresses when
resolving `dns.quad9.net` via the system resolver.
dnsdist: Fix memory corruption when using `getAddressInfo`
The object holding the callback function, which is translated into
a `LuaContext::LuaFunctionCaller`, needs to be destroyed while holding
the Lua mutex because it will unregister itself from the Lua context,
causing a corruption if a different thread is accessing the Lua context
at the same time.
Remi Gacogne [Mon, 5 May 2025 09:11:00 +0000 (11:11 +0200)]
dnsdist-1.9.x: Disable code coverage
It seems to be broken for now:
```
🚀 Posting coverage data to https://coveralls.io/api/v1/jobs
HTTP error:
---
Error: Payment Required (402)
Message: {"message":"Repo was paused. Check your subscription.","error":true}
---
```
and I don't really care about coverage on stable branches anyway.
Remi Gacogne [Thu, 6 Mar 2025 08:44:30 +0000 (09:44 +0100)]
dnsdist: Limit # of proxy protocol-enabled outgoing TCP connections
TCP worker threads keep a cache of outgoing TCP connections to a
backend to be able to reuse them for subsequent queries. Proxy
protocol-enabled outgoing TCP connections are trickier because the
proxy protocol payload is sent only once at the beginning of the
TCP connection, contains the source and destination addresses and
ports, and thus the connections can only be reused with the exact
same incoming TCP connection. For this reason these connections are
stored in a specific structure of the incoming connection, instead
of the TCP worker connection cache. However, we can only reuse a
given proxy protocol-enabled outgoing TCP connection for a subsequent
query if the TLV values contained in the proxy-protocol payload
associated to the new query are exactly the same than the ones
associated to the existing query. Up until now, we would keep an
unbounded amount of proxy protocol-enabled connections around if
the TLV values were, for example, randomly assigned per query.
This commit sets a limit on the number of such connections we will
keep around: we will keep at most N connections, where N is the
ratio between the number of concurrent queries on a single TCP
connection supported by the backend and the number of concurrent
queries on a single TCP connection supported by the frontend, with
a hard cap to 5.
Wouter de Vries [Tue, 21 Jan 2025 14:18:57 +0000 (15:18 +0100)]
Adjust Content-Type header for Prometheus endpoint to include version
Prometheus v3 will, by default, be more strict about the content-types
returned from scrape endpoints. With the current value (just
`text/plain`), it would fail to scrape.
In this commit the value is changed from `text/plain` to `text/plain;
version=0.0.4`.
See also [1] and [2]
[1] https://prometheus.io/docs/instrumenting/exposition_formats/
[2] https://prometheus.io/docs/prometheus/3.0/migration/
Remi Gacogne [Mon, 30 Dec 2024 14:55:33 +0000 (15:55 +0100)]
dnsdist: Fix regression tests with Python 3.13
The CA certificates that we are generating as par of our regression tests
were lacking the X.509 `Key Usage` extension, causing TLS validation with
Python 3.13 to fail with:
> certificate verify failed: CA cert does not include key usage extension
It appears that Python 3.13 enables `VERIFY_X509_STRICT` by default, which makes OpenSSL stricter, and thus it chokes on our invalid CA.
dnsdist: Gracefully handle timeout/response for a closed HTTP stream
The remote end might very well have already closed the HTTP stream
corresponding to the timeout or response we are processing. While
this means we need to discard the event we were processing, it is
not an unexpected event and we should thus not raise an exception
since the caller cannot do anything about it.
dnsdist: Fix a crash when processing timeouts for incoming DoH queries
This commit fixes a double-free triggered by an exception being raised
while we are processing a timeout for an incoming DoH query. The exception
bypasses the call releasing the smart pointer, and thus the destructor
is called when we reach the end of the function since we own the smart
pointer, but unfortunately it has already been destroyed by the function
that raised the exception. The fix is to release the pointer first,
then call the function, so even if an exception is raised we no longer
own the pointer, and it's clear that the function has taken ownership of it.
Remi Gacogne [Mon, 10 Feb 2025 10:24:28 +0000 (11:24 +0100)]
dnsdist-1.9.x: Fix compatibility with boost::lockfree >= 1.87.0
In https://github.com/boostorg/lockfree/pull/90 `boost::lockfree::spsc_queue`
introduced moved semantics, which is great, but added restrictions
to the callback functor that did not exist before, breaking the API.
This PR fixes that by updating our callbacks to expect an object
instead of a reference.
Remi Gacogne [Fri, 13 Dec 2024 14:45:31 +0000 (15:45 +0100)]
dnsdist: Fix ECS zero-scope with incoming DoH queries
The zero-scope feature involves a first cache lookup before the ECS
information has been added to the query, then on a miss a second,
regular lookup is done. When we get a response from the backend that
contains an ECS scope set to 0, we can insert it into the cache in a
way that allows using it for all clients, but we must be careful to
use the key that was computed during the first lookup, and not the
second one.
Incoming DoH queries make that even more interesting because while
they are received over TCP, they are initially forwarded to the
backend over UDP but can be retried over TCP if a TC=1 answer is
received. In that case we must be very careful not to insert the
answer into the cache using the wrong protocol, as we don't want to
serve a TC=1 answer to a client contacting us over TCP, for example.
The computation of the cache key and protocol was unfortunately broken
for the incoming query received over DoH, forwarded over UDP and
response has a zero scope case. This commit fixes it.
Remi Gacogne [Wed, 4 Dec 2024 13:39:56 +0000 (14:39 +0100)]
dnsdist: Allow resetting `setWeightedBalancingFactor()` to zero
Zero is the initial value, but until now it was only possible to pass
a value greater than or equal to 1.0 to `setWeightedBalancingFactor()`
so it was not possible to reset it to the default value.
Remi Gacogne [Tue, 26 Nov 2024 08:42:47 +0000 (09:42 +0100)]
Merge pull request #14878 from rgacogne/ddist19-backport-14768
dnsdist-1.9.x: Backport of #14768 - setTicketsKeyAddedHook: pass a std::string to the hook to avoid luawrapper to truncate content at potential null chars