dnsdist: Fix handling of proxy protocol payload outside of TLS for DoT
After reading the proxy protocol payload from the I/O buffer
we were clearing the buffer but failed to properly reset the
position, leading to an exception when trying to read the DNS
payload after processing the TLS handshake:
```
Got an exception while handling (reading) TCP query from 127.0.0.1:59426: Calling tryRead() with a too small buffer (2) for a read of 18446744073709551566 bytes starting at 52
```
The huge value comes from the fact that the position (52 here)
is larger than the size of the buffer (2 at this point to read
the size of the incoming DNS payload), leading to an unsigned
underflow. The code is properly detecting that the value makes
no sense in this context, but the connection is then dropped
because we cannot recover.
It turns out we had a end-to-end test for the "proxy protocol
outside of TLS" case but only over incoming DoH, and the DoH
case avoids this specific issue because the buffer is always
properly resized, and the position updated.
Remi Gacogne [Thu, 27 Jun 2024 14:07:20 +0000 (16:07 +0200)]
dnsdist: Handle Quiche >= 0.22.0
Quiche broke its existing API in 0.22.0: https://github.com/cloudflare/quiche/pull/1726
This pull request adds m4 code to detect whether the Quiche version
we are building against is >= 0.22.0, and if it is defines
`HAVE_QUICHE_STREAM_ERROR_CODES` which is later used by the code
using Quiche to know which version of the API to use.
Otto Moerbeek [Wed, 19 Jun 2024 11:10:15 +0000 (13:10 +0200)]
dns.cc: use pdns::views::UnsignedCharView
Includes minor cleanup and additions to make UnsignedCharView usable for this use case.
Supersedes #14356
Fixes
/usr/include/c++/v1/__fwd/string_view.h:22:41: warning: 'char_traits<unsigned char>' is deprecated: char_traits<T> for T not equal to char, wchar_t, char8_t, char16_t or char32_t is non-standard and is provided for a temporary period. It will be removed in LLVM 19, so please migrate off of it. [-Wdeprecated-declarations]
dnsdist: Fix a data race in the AF_XDP/XSK dnsdist <-> backend code
The existing code was sharing the same XskWorker between the thread
handling incoming queries (possibly replying right away for
self-answered and cache hit responses) and the one handling responses
coming from a backend (without XSK), which is wrong since the internal
queues are single-producer (and single consumer, but a worker is only
assigned to a single XskRouter which is OK).
This commit introduces a new, separate worker for the threads handling
responses coming from a backend without XSK (it was already the case
for responses coming from a backend via XSK). The new worker is marked
"outgoing-only" to ensure we are not confused about what it can be used
for, which is only sending packets, not receiving any.
dnsdist: Fix a race in the XSK/AF_XDP backend handling code
For performance reasons we used to keep a local list of available frames
in our `XskWorker` object, like we are doing in the `XskSocket` one,
to avoid having to go to the shared list which is protected by a lock.
Unfortunately, while it works well for the `XskSocket` because it is
accessed by a single `XskRouter` thread, the `XskWorker` object can
be accessed by multiple threads at once: `XskResponderThread`,
`responderThread`, `XskClientThread` and `XskRouter`. Most of the
time these threads do not acquire nor release frames to the local
list, but `responderThread` does acquire one when a response frame
is punted to the regular networking stack, and all of them release
frames when an unexpected condition occurs, for example when a queue
is full. This leads to memory corruption and to a crash.
This commit gets rid of the local list of frames in the `XskWorker`
object, acquiring and releasing them to the shared list instead, since
performance in these cases is likely not as critical. If it turns out
to be too slow, we can look into caching a few frames in a thread-local
list, but then we need to be careful not to hold on them indefinitely
which might be tricky.
Remi Gacogne [Mon, 17 Jun 2024 10:42:02 +0000 (12:42 +0200)]
dnsdist: Fix a race condition with custom Lua web handlers
Custom web handlers written in Lua modify the global Lua context,
but until now they did not take the lock protecting it so a data
race condition was possible.
Reported by TSAN while running our unit tests.
Remi Gacogne [Tue, 4 Jun 2024 14:28:31 +0000 (16:28 +0200)]
dnsdist: Edit the systemd unit file, `CAP_BPF` is no longer enough
We used to be able to use only `CAP_BPF` since kernel 5.8, but the
eBPF verifier has been made more strict a few versions later and we
now require `CAP_SYS_ADMIN` again.
Your Name [Mon, 15 Apr 2024 13:45:38 +0000 (15:45 +0200)]
YaHTTP: Enforce max # of request fields and max request line size
The default values, 8192 bytes for the maximum request line size and
100 fields, are taken from the default settings of Apache HTTPd:
- https://httpd.apache.org/docs/2.2/mod/core.html#limitrequestline
- https://httpd.apache.org/docs/2.2/mod/core.html#limitrequestfields
Reported by OSS-Fuzz as a timeout in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67993
Remi Gacogne [Fri, 3 May 2024 12:28:12 +0000 (14:28 +0200)]
dnsdist: Reply to HTTP/2 PING frames immediately
We usually buffer a bit to avoid sending a lot of small data chunks
on the wire (or to the kernel anyway), but for `HTTP/2 PING` frames
that are not followed by anything else calling for a response, this
causes an issue as these frames are designed to measure the latency
between a client and a server, and are used by HTTP/2 proxies to
ensure that a connection can be reused.
We did not properly handle incoming XFR requests received over DoH
When a TCP-only or DoT backend was configured, and the nghttp2 provider
used.
This commit fixes the assertion failure and makes sure that XFR
requests are denied with `NOTIMP` when received over DNS over HTTPS,
including DNS over HTTP/3. It also denies them when received over
DNS over QUIC as this is not properly handled at the moment, although
it does not cause a crash.
This fixes an issue in the code dealing with incoming DNS over HTTPS
queries with the nghttp2 provider. In some rare cases, if the incoming
query is forwarded to the backend over TCP and the response comes back
immediately (the `read()` call done just after the `write()` call sending
the query must succeed and yield a complete response), the processing
of the response might end up calling `IncomingHTTP2Connection::readHTTPData()`
down the line, via the `nghttp2` callbacks, while we were already
inside this function. This does not actually work because
`nghttp2_session_mem_recv` is not reentrant, so the internal state of
the `nghttp2_session` object might become inconsistent and trigger
an assertion, for example:
```
nghttp2_session.c:6854: nghttp2_session_mem_recv2: Assertion `iframe->state == NGHTTP2_IB_IGN_ALL' failed.
```
This results in a call to `abort()` and very unlikely to be exploitable,
because there is no memory corruption occurring. It would also be quite
difficult for an attacker to trigger the conditions leading to this event
remotely.
Reported by Daniel Stirnimann from Switch and Stephane Bortzmeyer, many thanks to them.
dnsdist: Fix "C++ One Definition Rule" warnings in XSK
It turns out we need to include the linux specific headers AFTER the
regular ones, because it then detects that some types have already been
defined (`sockaddr_in6` for example) and does not attempt to re-define
them, which otherwise breaks the C++ One Definition Rule