dnsdist: Fix a data race in the AF_XDP/XSK dnsdist <-> backend code
The existing code was sharing the same XskWorker between the thread
handling incoming queries (possibly replying right away for
self-answered and cache hit responses) and the one handling responses
coming from a backend (without XSK), which is wrong since the internal
queues are single-producer (and single consumer, but a worker is only
assigned to a single XskRouter which is OK).
This commit introduces a new, separate worker for the threads handling
responses coming from a backend without XSK (it was already the case
for responses coming from a backend via XSK). The new worker is marked
"outgoing-only" to ensure we are not confused about what it can be used
for, which is only sending packets, not receiving any.
dnsdist: Fix a race in the XSK/AF_XDP backend handling code
For performance reasons we used to keep a local list of available frames
in our `XskWorker` object, like we are doing in the `XskSocket` one,
to avoid having to go to the shared list which is protected by a lock.
Unfortunately, while it works well for the `XskSocket` because it is
accessed by a single `XskRouter` thread, the `XskWorker` object can
be accessed by multiple threads at once: `XskResponderThread`,
`responderThread`, `XskClientThread` and `XskRouter`. Most of the
time these threads do not acquire nor release frames to the local
list, but `responderThread` does acquire one when a response frame
is punted to the regular networking stack, and all of them release
frames when an unexpected condition occurs, for example when a queue
is full. This leads to memory corruption and to a crash.
This commit gets rid of the local list of frames in the `XskWorker`
object, acquiring and releasing them to the shared list instead, since
performance in these cases is likely not as critical. If it turns out
to be too slow, we can look into caching a few frames in a thread-local
list, but then we need to be careful not to hold on them indefinitely
which might be tricky.
Remi Gacogne [Mon, 17 Jun 2024 10:42:02 +0000 (12:42 +0200)]
dnsdist: Fix a race condition with custom Lua web handlers
Custom web handlers written in Lua modify the global Lua context,
but until now they did not take the lock protecting it so a data
race condition was possible.
Reported by TSAN while running our unit tests.
Remi Gacogne [Tue, 4 Jun 2024 14:28:31 +0000 (16:28 +0200)]
dnsdist: Edit the systemd unit file, `CAP_BPF` is no longer enough
We used to be able to use only `CAP_BPF` since kernel 5.8, but the
eBPF verifier has been made more strict a few versions later and we
now require `CAP_SYS_ADMIN` again.
Your Name [Mon, 15 Apr 2024 13:45:38 +0000 (15:45 +0200)]
YaHTTP: Enforce max # of request fields and max request line size
The default values, 8192 bytes for the maximum request line size and
100 fields, are taken from the default settings of Apache HTTPd:
- https://httpd.apache.org/docs/2.2/mod/core.html#limitrequestline
- https://httpd.apache.org/docs/2.2/mod/core.html#limitrequestfields
Reported by OSS-Fuzz as a timeout in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67993
Remi Gacogne [Fri, 3 May 2024 12:28:12 +0000 (14:28 +0200)]
dnsdist: Reply to HTTP/2 PING frames immediately
We usually buffer a bit to avoid sending a lot of small data chunks
on the wire (or to the kernel anyway), but for `HTTP/2 PING` frames
that are not followed by anything else calling for a response, this
causes an issue as these frames are designed to measure the latency
between a client and a server, and are used by HTTP/2 proxies to
ensure that a connection can be reused.
We did not properly handle incoming XFR requests received over DoH
When a TCP-only or DoT backend was configured, and the nghttp2 provider
used.
This commit fixes the assertion failure and makes sure that XFR
requests are denied with `NOTIMP` when received over DNS over HTTPS,
including DNS over HTTP/3. It also denies them when received over
DNS over QUIC as this is not properly handled at the moment, although
it does not cause a crash.
This fixes an issue in the code dealing with incoming DNS over HTTPS
queries with the nghttp2 provider. In some rare cases, if the incoming
query is forwarded to the backend over TCP and the response comes back
immediately (the `read()` call done just after the `write()` call sending
the query must succeed and yield a complete response), the processing
of the response might end up calling `IncomingHTTP2Connection::readHTTPData()`
down the line, via the `nghttp2` callbacks, while we were already
inside this function. This does not actually work because
`nghttp2_session_mem_recv` is not reentrant, so the internal state of
the `nghttp2_session` object might become inconsistent and trigger
an assertion, for example:
```
nghttp2_session.c:6854: nghttp2_session_mem_recv2: Assertion `iframe->state == NGHTTP2_IB_IGN_ALL' failed.
```
This results in a call to `abort()` and very unlikely to be exploitable,
because there is no memory corruption occurring. It would also be quite
difficult for an attacker to trigger the conditions leading to this event
remotely.
Reported by Daniel Stirnimann from Switch and Stephane Bortzmeyer, many thanks to them.
dnsdist: Fix "C++ One Definition Rule" warnings in XSK
It turns out we need to include the linux specific headers AFTER the
regular ones, because it then detects that some types have already been
defined (`sockaddr_in6` for example) and does not attempt to re-define
them, which otherwise breaks the C++ One Definition Rule
Remi Gacogne [Fri, 29 Mar 2024 13:22:40 +0000 (14:22 +0100)]
dnsdist: Release incoming TCP connection right away on backend failure
We used to keep a shared pointer to the incoming TCP connection around
in `TCPConnectionToBackend::d_currentQuery.d_sender` even after all queries
sent to the backend failed, which prevented the incoming TCP connection
from being closed as soon as it should have.
Remi Gacogne [Fri, 29 Mar 2024 14:14:55 +0000 (15:14 +0100)]
FDWrapper: Do not try to close negative file descriptors
It turns out that some of the BPF helper functions return
a negative `errno` value in case of failure, and since we
wrap the return value into a `FDWrapper` right away this
led to a warning from Valgrind about trying to close an
invalid file descriptor.
Remi Gacogne [Thu, 28 Mar 2024 09:27:15 +0000 (10:27 +0100)]
dnsdist: Increase the HTTP/1.1 query counter when DoH with 1.1 ALPN
This way we can keep track of how many HTTP/1.1 connections attempt
we see. We will not actually process the DNS over HTTP/1.1 payload
anyway when the `nghttp2` provider is used.
Remi Gacogne [Fri, 29 Mar 2024 13:12:29 +0000 (14:12 +0100)]
dnsdist: Fix a null-deref in incoming DoH w/ nghttp2
When an incoming DoH connection using the `nghttp2` provider is waiting
for a response from a backend that results in a I/O error or timeout,
and the incoming connection also fails due to a I/O error or timeout,
dnsdist could in some cases try to dereference a null pointer, leading
to a crash.