else {
mac = getMACAddress(ret->d_config.remote);
if (mac.size() != ret->d_config.destMACAddr.size()) {
- throw runtime_error("Field 'MACAddr' is not set on 'newServer' directive for '" + ret->d_config.remote.toStringWithPort() + "' and cannot be retriever from the system either!");
+ throw runtime_error("Field 'MACAddr' is not set on 'newServer' directive for '" + ret->d_config.remote.toStringWithPort() + "' and cannot be retrieved from the system either!");
}
memcpy(ret->d_config.destMACAddr.data(), mac.data(), ret->d_config.destMACAddr.size());
}
tls-sessions-management
internal-design
asynchronous-processing
+ xsk
When dealing with a large traffic load, it might happen that the internal pipe used to pass queries between the threads handling the incoming connections and the one getting a response from the backend become full too quickly, degrading performance and causing timeouts. This can be prevented by increasing the size of the internal pipe buffer, via the `internalPipeBufferSize` option of :func:`addDOHLocal`. Setting a value of `1048576` is known to yield good results on Linux.
+UDP buffer sizes
+----------------
+
+The operating system usually maintains buffers of incoming and outgoing datagrams for UDP sockets, to deal with short spikes where packets are received or emitted faster than the network layer can process them. On medium to large setups, it is usually useful to increase these buffers to deal with large spikes. This can be done via the :func:`setUDPSocketBufferSizes`.
+
Outgoing DoH
------------
+---------------------------------+-----------------------------+
| DoH (w/ releaseBuffers) | 15 kB |
+---------------------------------+-----------------------------+
+
+Firewall connection tracking
+----------------------------
+
+When dealing with a lot of queries per second, dnsdist puts a severe stress on any stateful (connection tracking) firewall, so much so that the firewall may fail.
+
+Specifically, many Linux distributions run with a connection tracking firewall configured. For high load operation (thousands of queries/second), it is advised to either turn off ``iptables`` and ``nftables`` completely, or use the ``NOTRACK`` feature to make sure client DNS traffic bypasses the connection tracking.
+
+Network interface receive queues
+--------------------------------
+
+Most high-speed (>= 10 Gbps) network interfaces support multiple queues to offer better performance, using hashing to dispatch incoming packets into a specific queue.
+
+Unfortunately the default hashing algorithm is very often considering the source and destination addresses only, which might be an issue when dnsdist is placed behind a frontend, for example.
+
+On Linux it is possible to inspect the current network flow hashing policy via ``ethtool``::
+
+ $ sudo ethtool -n enp1s0 rx-flow-hash udp4
+ UDP over IPV4 flows use these fields for computing Hash flow key:
+ IP SA
+ IP DA
+
+In this example only the source (``IP SA``) and destination (``IP DA``) addresses are indeed used, meaning that all packets coming from the same source address to the same destination address will end up in the same receive queue, which is not optimal. To take the source and destination ports into account as well::
+
+ $ sudo ethtool -N enp1s0 rx-flow-hash udp4 sdfn
+ $
--- /dev/null
+``AF_XDP`` / ``XSK``
+====================
+
+Since 1.9.0, :program:`dnsdist` can use `AF_XDP <https://www.kernel.org/doc/html/v4.18/networking/af_xdp.html>`_ for high performance UDP packet processing recent Linux kernels (4.18+). It requires :program:`dnsdist` to have the ``CAP_NET_ADMIN`` and ``CAP_SYS_ADMIN`` capabilities at startup, and to have been compiled with the ``--with-xsk`` configure option.
+
+.. note::
+ To retain the required capabilities it is necessary to call :func:`addCapabilitiesToRetain` during startup, as :program:`dnsdist` drops capabilities after startup.
+
+.. note::
+ ``AppArmor`` users might need to update their policy to allow :program:`dnsdist` to keep the capabilities. Adding ``capability sys_admin,`` (for ``CAP_SYS_ADMIN``) and ``capability net_admin,`` (for ``CAP_NET_ADMIN``) lines to the policy file is usually enough.
+
+The way ``AF_XDP`` works is that :program:`dnsdist` allocates a number of frames in a memory area called a ``UMEM``, which is accessible both by the program, in userspace, and by the kernel. Using in-memory ring buffers, the receive (``RX``), transmit (``TX``), completion (``cq``) and fill (``fq``) rings, the kernel can very efficiently pass raw incoming packets to :program:`dnsdist`, which can in return pass raw outgoing packets to the kernel.
+In addition to these, an ``eBPF`` ``XDP`` program needs to be loaded to decide which packets to distribute via the ``AF_XDP`` socket (and to which, as there are usually more than one). This program uses a ``BPF`` map of type ``XSKMAP`` (located at ``/sys/fs/bpf/dnsdist/xskmap`` by default) that is populated by :program:``dnsdist` at startup to locate the ``AF_XDP`` socket to use. :program:`dnsdist` also sets up two additional ``BPF`` maps (located at ``/sys/fs/bpf/dnsdist/xsk-destinations-v4`` and ``/sys/fs/bpf/dnsdist/xsk-destinations-v6``) to let the ``XDP`` program know which IP destinations are to be routed to the ``AF_XDP`` sockets and which are to be passed to the regular network stack (health-checks queries and responses, for example). A ready-to-use `XDP program <https://github.com/PowerDNS/pdns/blob/master/contrib/xdp.py>`_ can be found in the ``contrib`` directory of the PowerDNS Git repository. The name of the network interface to use might have to be updated, though. Once it has been updated, the ``XDP`` program can be started::
+
+ $ python xdp.py
+
+Then :program:`dnsdist` needs to be configured to use ``AF_XDP``, first by creating a :class:`XskSocket` object that are tied to a specific queue of a specific network interface:
+
+.. code-block:: lua
+
+ xsk = newXsk({ifName="enp1s0", NIC_queue_id=0, frameNums=65536, xskMapPath="/sys/fs/bpf/dnsdist/xskmap"})
+
+This ties the new object to the first receive queue on ``enp1s0``, allocating 65536 frames and populating the map located at ``/sys/fs/bpf/dnsdist/xskmap``.
+
+Then we can tell :program:`dnsdist` to listen for ``AF_XDP`` packets to ``108.61.103.88:53``, in addition to packets coming via the regular network stack:
+
+.. code-block:: lua
+
+ addLocal("192.0.2.1:53", {xskSocket=xsk})
+
+In practice most high-speed (>= 10 Gbps) network interfaces support multiple queues to offer better performance, so we need to allocate one :class:`XskSocket` per queue. We can retrieve the number of queues for a given interface via::
+
+ $ sudo ethtool -l enp1s0
+ Channel parameters for enp1s0:
+ Pre-set maximums:
+ RX: n/a
+ TX: n/a
+ Other: 1
+ Combined: 8
+ Current hardware settings:
+ RX: n/a
+ TX: n/a
+ Other: 1
+ Combined: 8
+
+The ``Combined`` lines tell us that the interface supports 8 queues, so we can do something like this:
+
+.. code-block:: lua
+
+ for i=1,8 do
+ xsk = newXsk({ifName="enp1s0", NIC_queue_id=i-1, frameNums=65536, xskMapPath="/sys/fs/bpf/dnsdist/xskmap"})
+ addLocal("192.0.2.1:53", {xskSocket=xsk, reusePort=true})
+ end
+
+We can also instructs :program:`dnsdist` to use ``AF_XDP`` to send and receive UDP packets to a backend:
+
+.. code-block:: lua
+
+ newServer("192.0.2.2:53", {xskSocket=xsk})
+
+We are not passing the MAC address of the backend (or the gateway to reach it) directly, so :program:`dnsdist` will try to fetch it from the system MAC address cache. This may not work, in which case we might need to pass explicitly:
+
+.. code-block:: lua
+
+ newServer("192.0.2.2:53", {xskSocket=xsk, MACAddr='00:11:22:33:44:55'})
+
+
+Performance
+-----------
+
+Using `kxdpgun <https://www.knot-dns.cz/docs/latest/html/man_kxdpgun.html>`_, we can compare the performance of :program:`dnsdist` using the regular network stack and ``AF_XDP``.
+
+This test was realized using two Intel E3-1270 with 4 cores (8 threads) running at 3.8 Ghz, using 10 Gbps network cards. On both the injector running ``kxdpgun`` and the box running :program:`dnsdist` there was no firewall, the governor was set to ``performance``, the UDP buffers were raised to ``16777216`` and the receive queue hash policy set to use the IP addresses and ports (see :doc:`tuning`).
+
+:program:`dnsdist` was configured to immediately respond to incoming queries with ``REFUSED``:
+
+.. code-block:: lua
+
+ addAction(AllRule(), RCodeAction(DNSRCode.REFUSED))
+
+On the injector box we executed::
+
+ $ sudo kxdpgun -Q 2500000 -p 53 -i random_1M 192.0.2.1 -t 60
+ using interface enp1s0, XDP threads 8, UDP, native mode
+ [...]
+
+We first ran without ``AF_XDP``:
+
+.. code-block:: lua
+
+ for i=1,8 do
+ addLocal("192.0.2.1:53", {reusePort=true})
+ end
+
+then with:
+
+.. code-block:: lua
+
+ for i=1,8 do
+ xsk = newXsk({ifName="enp1s0", NIC_queue_id=i-1, frameNums=65536, xskMapPath="/sys/fs/bpf/dnsdist/xskmap"})
+ addLocal("192.0.2.1:53", {xskSocket=xsk, reusePort=true})
+ end
+
+.. figure:: ../imgs/af_xdp_refused_qps.png
+ :align: center
+ :alt: AF_XDP QPS
+
+.. figure:: ../imgs/af_xdp_refused_cpu.png
+ :align: center
+ :alt: AF_XDP CPU
+
+The first run handled roughly 1 million QPS, the second run 2.5 millions, with the CPU usage being much lower in the ``AF_XDP`` case.
Added ``maxInFlight`` and ``maxConcurrentTCPConnections`` parameters.
.. versionchanged:: 1.9.0
- Added ``enableProxyProtocol`` parameter, which was always ``true`` before 1.9.0.
+ Added the ``enableProxyProtocol`` parameter, which was always ``true`` before 1.9.0, and the``xskSocket`` one.
Add to the list of listen addresses. Note that for IPv6 link-local addresses, it might be necessary to specify the interface to use: ``fe80::1%eth0``. On recent Linux versions specifying the interface via the ``interface`` parameter should work as well.
* ``maxInFlight=0``: int - Maximum number of in-flight queries. The default is 0, which disables out-of-order processing.
* ``maxConcurrentTCPConnections=0``: int - Maximum number of concurrent incoming TCP connections. The default is 0 which means unlimited.
* ``enableProxyProtocol=true``: str - Whether to expect a proxy protocol v2 header in front of incoming queries coming from an address in :func:`setProxyProtocolACL`. Default is ``true``, meaning that queries are expected to have a proxy protocol payload if they come from an address present in the :func:`setProxyProtocolACL` ACL.
+ * ``xskSocket``: :class:`XskSocket` - A socket to enable ``XSK`` / ``AF_XDP`` support for this frontend. See :doc:`../advanced/xsk` for more information.
.. code-block:: lua
``additionalAddresses``, ``ignoreTLSConfigurationErrors`` and ``keepIncomingHeaders`` options added.
.. versionchanged:: 1.9.0
- ``enableProxyProtocol``, ``library``, ``readAhead`` and ``proxyProtocolOutsideTLS`` options added.
+ ``enableProxyProtocol``, ``library``, ``proxyProtocolOutsideTLS`` and ``readAhead`` options added.
Listen on the specified address and TCP port for incoming DNS over HTTPS connections, presenting the specified X.509 certificate.
If no certificate (or key) files are specified, listen for incoming DNS over HTTP connections instead.
* ``library``: str - Which underlying HTTP2 library should be used, either h2o or nghttp2. Until 1.9.0 only h2o was available, but the use of this library is now deprecated as it is no longer maintained. nghttp2 is the new default since 1.9.0.
* ``readAhead``: bool - When the TLS provider is set to OpenSSL, whether we tell the library to read as many input bytes as possible, which leads to better performance by reducing the number of syscalls. Default is true.
* ``proxyProtocolOutsideTLS``: bool - When the use of incoming proxy protocol is enabled, whether the payload is prepended after the start of the TLS session (so inside, meaning it is protected by the TLS layer providing encryption and authentication) or not (outside, meaning it is in clear-text). Default is false which means inside. Note that most third-party software like HAproxy expect the proxy protocol payload to be outside, in clear-text.
- * ``enableProxyProtocol=true``: str - Whether to expect a proxy protocol v2 header in front of incoming queries coming from an address in :func:`setProxyProtocolACL`. Default is ``true``, meaning that queries are expected to have a proxy protocol payload if they come from an address present in the :func:`setProxyProtocolACL` ACL.
+ * ``enableProxyProtocol=true``: bool - Whether to expect a proxy protocol v2 header in front of incoming queries coming from an address in :func:`setProxyProtocolACL`. Default is ``true``, meaning that queries are expected to have a proxy protocol payload if they come from an address present in the :func:`setProxyProtocolACL` ACL.
.. function:: addDOH3Local(address, certFile(s), keyFile(s) [, options])
Added ``autoUpgrade``, ``autoUpgradeDoHKey``, ``autoUpgradeInterval``, ``autoUpgradeKeep``, ``autoUpgradePool``, ``maxConcurrentTCPConnections``, ``subjectAddr``, ``lazyHealthCheckSampleSize``, ``lazyHealthCheckMinSampleCount``, ``lazyHealthCheckThreshold``, ``lazyHealthCheckFailedInterval``, ``lazyHealthCheckMode``, ``lazyHealthCheckUseExponentialBackOff``, ``lazyHealthCheckMaxBackOff``, ``lazyHealthCheckWhenUpgraded``, ``healthCheckMode`` and ``ktls`` to server_table.
.. versionchanged:: 1.9.0
- Added ``proxyProtocolAdvertiseTLS`` to server_table.
+ Added ``MACAddr``, ``proxyProtocolAdvertiseTLS`` and ``xskSocket`` to server_table.
:param str server_string: A simple IP:PORT string.
:param table server_table: A table with at least an ``address`` key
``lazyHealthCheckWhenUpgraded`` ``bool`` "Whether the auto-upgraded version of this backend (see ``autoUpgrade``) should use the lazy health-checking mode. Default is false, which means it will use the regular health-checking mode."
``ktls`` ``bool`` "Whether to enable the experimental kernel TLS support on Linux, if both the kernel and the OpenSSL library support it. Default is false. Currently both DoT and DoH backend support this option."
``proxyProtocolAdvertiseTLS`` ``bool`` "Whether to set the SSL Proxy Protocol TLV in the proxy protocol payload sent to the backend if the query was received over an encrypted channel (DNSCrypt, DoQ, DoH or DoT). Requires ``useProxyProtocol=true``. Default is false."
+ ``xskSocket`` :class:`XskSocket` "A socket to enable ``XSK`` / ``AF_XDP`` support for this backend. See :doc:`../advanced/xsk` for more information."
+ ``MACAddr`` ``str`` "When the ``xskSocket`` option is set, this parameter can be used to specify the destination MAC address to use to reach the backend. If this options is not specified, dnsdist will try to get it from the IP of the backend by looking into the system's MAC address table, but it will fail if the corresponding MAC address is not present."
.. function:: getServer(index) -> Server
web
svc
custommetrics
+ xsk
Set the size of the receive (``SO_RCVBUF``) and send (``SO_SNDBUF``) buffers for incoming UDP sockets. On Linux the default
values correspond to ``net.core.rmem_default`` and ``net.core.wmem_default`` , and the maximum values are restricted
by ``net.core.rmem_max`` and ``net.core.wmem_max``.
+ Since 1.9.0, on Linux, dnsdist will automatically try to raise the buffer sizes to the maximum value allowed by the system (``net.core.rmem_max`` and ``net.core.wmem_max``) if :func:`setUDPSocketBufferSizes` is not set.
:param int recv: ``SO_RCVBUF`` value. Default is 0, meaning the system value will be kept.
:param int send: ``SO_SNDBUF`` value. Default is 0, meaning the system value will be kept.
--- /dev/null
+XSK / AF_XDP functions and objects
+==================================
+
+These are all the functions, objects and methods related to :doc:`../advanced/xsk`.
+
+.. function:: newXSK(options)
+
+ .. versionadded:: 1.9.0
+
+ This function creates a new :class:`XskSocket` object, tied to a network interface and queue, to accept ``XSK`` / ``AF_XDP`` packet from the Linux kernel. The returned object can be passed as a parameter to :func:`addLocal` to use XSK for ``UDP`` packets between clients and dnsdist. It can also be passed to ``newServer`` to use XSK for ``UDP`` packets between dnsdist a backend.
+
+ :param table options: A table with key: value pairs with listen options.
+
+ Options:
+
+ * ``ifName``: str - The name of the network interface this object will be tied to.
+ * ``NIC_queue_id``: int - The queue of the network interface this object will be tied to.
+ * ``frameNums``: int - The number of ``UMEM`` frames to allocate for this socket. More frames mean that a higher number of packets can be processed at the same time. 65535 is a good choice for maximum performance.
+ * ``xskMapPath``: str - The path of the BPF map used to communicate with the kernel space XDP program, usually ``/sys/fs/bpf/dnsdist/xskmap``.
+
+.. class:: XskSocket
+
+ .. versionadded:: 1.9.0
+
+ Represents a ``XSK`` / ``AF_XDP`` socket tied to a specific network interface and queue. This object can be created via :func:``newXSK`` and passed to :func:`addLocal` to use XSK for ``UDP`` packets between clients and dnsdist. It can also be passed to ``newServer`` to use XSK for ``UDP`` packets between dnsdist a backend.
+
+ .. method:: XskSocket:getMetrics() -> str
+
+ Returns a string containing ``XSK`` / ``AF_XDP`` metrics for this object, as reported by the Linux kernel.