Remi Gacogne [Wed, 7 May 2025 14:18:25 +0000 (16:18 +0200)]
dnsdist: Gracefully handle missing v6 in backend discovery test
This test has been randomly failing on GH actions lately, and it looks
like it is because we sometimes do not get the IPv6 addresses when
resolving `dns.quad9.net` via the system resolver.
dnsdist: Fix memory corruption when using `getAddressInfo`
The object holding the callback function, which is translated into
a `LuaContext::LuaFunctionCaller`, needs to be destroyed while holding
the Lua mutex because it will unregister itself from the Lua context,
causing a corruption if a different thread is accessing the Lua context
at the same time.
Remi Gacogne [Mon, 5 May 2025 09:11:00 +0000 (11:11 +0200)]
dnsdist-1.9.x: Disable code coverage
It seems to be broken for now:
```
🚀 Posting coverage data to https://coveralls.io/api/v1/jobs
HTTP error:
---
Error: Payment Required (402)
Message: {"message":"Repo was paused. Check your subscription.","error":true}
---
```
and I don't really care about coverage on stable branches anyway.
Remi Gacogne [Thu, 6 Mar 2025 08:44:30 +0000 (09:44 +0100)]
dnsdist: Limit # of proxy protocol-enabled outgoing TCP connections
TCP worker threads keep a cache of outgoing TCP connections to a
backend to be able to reuse them for subsequent queries. Proxy
protocol-enabled outgoing TCP connections are trickier because the
proxy protocol payload is sent only once at the beginning of the
TCP connection, contains the source and destination addresses and
ports, and thus the connections can only be reused with the exact
same incoming TCP connection. For this reason these connections are
stored in a specific structure of the incoming connection, instead
of the TCP worker connection cache. However, we can only reuse a
given proxy protocol-enabled outgoing TCP connection for a subsequent
query if the TLV values contained in the proxy-protocol payload
associated to the new query are exactly the same than the ones
associated to the existing query. Up until now, we would keep an
unbounded amount of proxy protocol-enabled connections around if
the TLV values were, for example, randomly assigned per query.
This commit sets a limit on the number of such connections we will
keep around: we will keep at most N connections, where N is the
ratio between the number of concurrent queries on a single TCP
connection supported by the backend and the number of concurrent
queries on a single TCP connection supported by the frontend, with
a hard cap to 5.
Wouter de Vries [Tue, 21 Jan 2025 14:18:57 +0000 (15:18 +0100)]
Adjust Content-Type header for Prometheus endpoint to include version
Prometheus v3 will, by default, be more strict about the content-types
returned from scrape endpoints. With the current value (just
`text/plain`), it would fail to scrape.
In this commit the value is changed from `text/plain` to `text/plain;
version=0.0.4`.
See also [1] and [2]
[1] https://prometheus.io/docs/instrumenting/exposition_formats/
[2] https://prometheus.io/docs/prometheus/3.0/migration/
Remi Gacogne [Mon, 30 Dec 2024 14:55:33 +0000 (15:55 +0100)]
dnsdist: Fix regression tests with Python 3.13
The CA certificates that we are generating as par of our regression tests
were lacking the X.509 `Key Usage` extension, causing TLS validation with
Python 3.13 to fail with:
> certificate verify failed: CA cert does not include key usage extension
It appears that Python 3.13 enables `VERIFY_X509_STRICT` by default, which makes OpenSSL stricter, and thus it chokes on our invalid CA.
dnsdist: Gracefully handle timeout/response for a closed HTTP stream
The remote end might very well have already closed the HTTP stream
corresponding to the timeout or response we are processing. While
this means we need to discard the event we were processing, it is
not an unexpected event and we should thus not raise an exception
since the caller cannot do anything about it.
dnsdist: Fix a crash when processing timeouts for incoming DoH queries
This commit fixes a double-free triggered by an exception being raised
while we are processing a timeout for an incoming DoH query. The exception
bypasses the call releasing the smart pointer, and thus the destructor
is called when we reach the end of the function since we own the smart
pointer, but unfortunately it has already been destroyed by the function
that raised the exception. The fix is to release the pointer first,
then call the function, so even if an exception is raised we no longer
own the pointer, and it's clear that the function has taken ownership of it.
Remi Gacogne [Mon, 10 Feb 2025 10:24:28 +0000 (11:24 +0100)]
dnsdist-1.9.x: Fix compatibility with boost::lockfree >= 1.87.0
In https://github.com/boostorg/lockfree/pull/90 `boost::lockfree::spsc_queue`
introduced moved semantics, which is great, but added restrictions
to the callback functor that did not exist before, breaking the API.
This PR fixes that by updating our callbacks to expect an object
instead of a reference.
Remi Gacogne [Fri, 13 Dec 2024 14:45:31 +0000 (15:45 +0100)]
dnsdist: Fix ECS zero-scope with incoming DoH queries
The zero-scope feature involves a first cache lookup before the ECS
information has been added to the query, then on a miss a second,
regular lookup is done. When we get a response from the backend that
contains an ECS scope set to 0, we can insert it into the cache in a
way that allows using it for all clients, but we must be careful to
use the key that was computed during the first lookup, and not the
second one.
Incoming DoH queries make that even more interesting because while
they are received over TCP, they are initially forwarded to the
backend over UDP but can be retried over TCP if a TC=1 answer is
received. In that case we must be very careful not to insert the
answer into the cache using the wrong protocol, as we don't want to
serve a TC=1 answer to a client contacting us over TCP, for example.
The computation of the cache key and protocol was unfortunately broken
for the incoming query received over DoH, forwarded over UDP and
response has a zero scope case. This commit fixes it.
Remi Gacogne [Wed, 4 Dec 2024 13:39:56 +0000 (14:39 +0100)]
dnsdist: Allow resetting `setWeightedBalancingFactor()` to zero
Zero is the initial value, but until now it was only possible to pass
a value greater than or equal to 1.0 to `setWeightedBalancingFactor()`
so it was not possible to reset it to the default value.
Remi Gacogne [Tue, 26 Nov 2024 08:42:47 +0000 (09:42 +0100)]
Merge pull request #14878 from rgacogne/ddist19-backport-14768
dnsdist-1.9.x: Backport of #14768 - setTicketsKeyAddedHook: pass a std::string to the hook to avoid luawrapper to truncate content at potential null chars
Remi Gacogne [Thu, 3 Oct 2024 07:10:09 +0000 (09:10 +0200)]
dnsdist: Disable eBPF filtering on QUIC (DoQ, DoH3) sockets
The current eBPF code tries to parse the beginning of the DNS payload
to extract the qname for all UDP datagrams, which is not course
not working correctly for QUIC packets. I don't immediately see a way
to identify QUIC packets from our eBPF code, so for now this commit
disables the eBPF filtering feature on QUIC sockets.
dnsdist: Add EDNS to responses generated from raw record data
My reasoning is that it makes sense to add EDNS to responses generated
from DNSdist provided that:
- the initial query had EDNS
- `setAddEDNSToSelfGeneratedResponses` has not been set to `false`
- we are only provided part of the response and not a full response
packet