From: Frantisek Sumsal <frantisek@sumsal.cz>
Date: Tue, 19 May 2026 12:51:56 +0000 (+0200)
Subject: resolve: cap pre-allocation for questions/RRs
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=e7cd836dcffb5f85d66a156904fc68f8b654a290;p=thirdparty%2Fsystemd.git

resolve: cap pre-allocation for questions/RRs

Since [0] and [1], questions and answer RRs from incoming packets are
parsed into a hashmap to speed things up. The hashmaps are also
pre-allocated to speed things up even more, but there's one caveat - the
size of the pre-allocation comes from one or more fields of the
incoming packet that are under the sender's control.

This can be abused by a malicious DNS server which can send a packet
with a spoofed QDCOUNT (for question packets) or ANCOUNT/NSCOUNT/ARCOUNT
(for answer packets). The limit of the final value in both cases is 64K.
This value is then used to pre-allocate the hashmap (via
set_reserve()/ordered_set_reserve(), where the caller also multiplies
the input value by 2 in both cases), which in turn calls
resize_buckets() that memzero()s the pre-allocated area, so all the
pages are faulted in and show up in the process's RSS. Each such spoofed
packet can then translate into a ~4 MiB allocation in the
systemd-resolved process, which doesn't sound that bad.

However, this can be further amplified if the spoofed packet ends up in
resolved's cache. So, if the spoofed packet contains one valid A record
and then an OPT record with a spoofed ARCOUNT, the whole packet ends up
in the cache, which can hold 4K entries. This can eventually cause
resolved to keep up to 16 GiB of memory just for the cache (and thanks
to the memzero() above it's all RSS). Note that all this requires
someone with enough privileges to configure resolved to actually point
to such a malicious DNS server, or the server address could come from a
malicious DHCP server on the network. This could also be exploited via
LLMNR, but in that case an attacker would have to match the ID of a
valid transaction for the packet to end up in resolved's cache.
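
The worst-case figure above is plain multiplication. A minimal sketch of
the arithmetic, assuming "4K" means 4096 cache entries and taking the
approximate 4 MiB per spoofed packet at face value (the helper and macro
names are hypothetical, not systemd identifiers):

```c
#include <stdint.h>

/* Assumptions taken from the text: "4K" cache entries is read as 4096,
 * and each cached spoofed packet pins roughly 4 MiB of zeroed memory. */
#define CACHE_ENTRIES_MAX UINT64_C(4096)
#define BYTES_PER_SPOOFED_PACKET (UINT64_C(4) * 1024 * 1024)

/* Hypothetical helper: bytes held by a cache filled with spoofed packets. */
static inline uint64_t worst_case_cache_bytes(uint64_t entries, uint64_t bytes_per_entry) {
        return entries * bytes_per_entry;
}
```

worst_case_cache_bytes(CACHE_ENTRIES_MAX, BYTES_PER_SPOOFED_PACKET)
evaluates to 17179869184 bytes, i.e. 16 GiB.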

For example, with a malicious DNS server already in resolved's configuration:

$ resolvectl dns eth0
Link 2 (eth0): 192.168.99.1:5354

Filling resolved's cache:

$ for i in {0..4200}; do resolvectl query test-$i.example.com; done
...
test-4200.example.com: 192.0.2.1               -- link: dummy0

-- Information acquired via protocol DNS in 1.6ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

This yields the following memory increase:

$ while :; do grep VmRSS /proc/$(pidof systemd-resolved)/status; sleep 1; done
VmRSS:     14280 kB
VmRSS:     14280 kB
...
VmRSS:    403352 kB
VmRSS:   1017976 kB
VmRSS:   1603876 kB
VmRSS:   2202028 kB
...
VmRSS:  16795724 kB
VmRSS:  16795724 kB

In my testing I also noticed one annoyance - after a certain threshold
the RSS increase persisted even after the malicious entries were evicted
from the cache (or flushed via `resolvectl flush-caches`). This was most
likely due to mmap_threshold getting bumped to > 4 MiB, combined with
the fact that neither cache eviction nor flush-caches calls
malloc_trim(0) (via sd_event_trim_memory() or similar).

To mitigate this, let's cap the pre-allocation to the maximum number of
records the given packet body can realistically contain. If the capped
size turned out, for whatever unlikely reason, to be too small, nothing
serious would happen - the hashmap would still get resized
automatically by resize_buckets(), it'd just be slightly slower.
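
The cap can be sketched in isolation as follows. This is an illustrative
stand-alone helper, not the code in the patch: cap_record_count() and the
two macros are hypothetical names, while the minimum record sizes (5 and
11 bytes) are the ones given in the patch comments.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum wire sizes, as given in the patch comments. */
#define MIN_QUESTION_SIZE 5U /* 1 (QNAME) + 2 (QTYPE) + 2 (QCLASS) */
#define MIN_RR_SIZE 11U      /* 1 (NAME) + 2 (TYPE) + 2 (CLASS) + 4 (TTL) + 2 (RDLENGTH) */

/* Hypothetical helper, not part of systemd: return the smaller of the
 * record count claimed by the packet header and the number of
 * minimum-sized records that fit into the unread part of the packet. */
static inline size_t cap_record_count(size_t claimed, size_t size, size_t rindex, size_t min_record_size) {
        assert(min_record_size > 0);
        assert(rindex <= size);

        size_t fit = (size - rindex) / min_record_size;
        return claimed < fit ? claimed : fit;
}
```

For a 512-byte packet with 12 bytes already read and a claimed count of
65535, cap_record_count() returns 100 for questions and 45 for RRs; an
honest claimed count of 2 is passed through unchanged.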

[0] ae45e1a3832fbb6c96707687e42f0b4aaab52c9b
[1] 2d34cf0c16dd8fa71fb593e65ce4734cb61d9170
---

diff --git a/src/shared/dns-packet.c b/src/shared/dns-packet.c
index c8c54e7988a..4fdef2b570a 100644
--- a/src/shared/dns-packet.c
+++ b/src/shared/dns-packet.c
@@ -2463,8 +2463,15 @@ static int dns_packet_extract_question(DnsPacket *p, DnsQuestion **ret_question)
                 if (!keys)
                         return log_oom();
 
-                r = set_reserve(keys, n * 2); /* Higher multipliers give slightly higher efficiency through
-                                               * hash collisions, but the gains quickly drop off after 2. */
+                /* Pre-allocate the question hashmap, but cap the pre-allocation to a number of questions the
+                 * packet can realistically contain. That is, pick the minimal value from the claimed number
+                 * of questions (n) and a maximum number of potential questions the remaining packet data can
+                 * actually contain: p->size - p->rindex are the remaining unread bytes in the packet, and 5U
+                 * is the minimum size of each question - 1 (QNAME) + 2 (QTYPE) + 2 (QCLASS).
+                 *
+                 * Note for the multiplication: higher multipliers give slightly higher efficiency through
+                 * hash collisions, but the gains quickly drop off after 2. */
+                r = set_reserve(keys, MIN(n, (p->size - p->rindex) / 5U) * 2);
                 if (r < 0)
                         return r;
 
@@ -2510,7 +2517,12 @@ static int dns_packet_extract_answer(DnsPacket *p, DnsAnswer **ret_answer) {
         if (n == 0)
                 return 0;
 
-        answer = dns_answer_new(n);
+        /* Pre-allocate the answer hashmap, but cap the pre-allocation to a number of RRs the packet can
+         * realistically contain. That is, pick the minimal value from the claimed number of RRs (n) and a
+         * maximum number of potential RRs the remaining packet data can actually contain: p->size -
+         * p->rindex are the remaining unread bytes in the packet, and the 11U is the minimum size of each RR
+         * - 1 (NAME) + 2 (TYPE) + 2 (CLASS) + 4 (TTL) + 2 (RDLENGTH). */
+        answer = dns_answer_new(MIN(n, (p->size - p->rindex) / 11U));
         if (!answer)
                 return -ENOMEM;