From: Frantisek Sumsal
Date: Tue, 19 May 2026 12:51:56 +0000 (+0200)
Subject: resolve: cap pre-allocation for questions/RRs
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=e7cd836dcffb5f85d66a156904fc68f8b654a290;p=thirdparty%2Fsystemd.git

resolve: cap pre-allocation for questions/RRs

Since [0] and [1], questions and answer RRs from incoming packets are parsed
into a hashmap to speed things up. The hashmaps are even pre-allocated to
speed things up even more, but there's one caveat - the size of the
pre-allocation comes from one or more fields of the incoming packet that are
under the sender's control.

This can be abused by a malicious DNS server, which can send a packet with a
spoofed QDCOUNT (for question packets) or ANCOUNT/NSCOUNT/ARCOUNT (for answer
packets). The limit of the final value in both cases is 64K. This value is
then used to pre-allocate the hashmap (via set_reserve()/ordered_set_reserve(),
where the caller also multiplies the input value by 2 in both cases), which in
turn calls resize_buckets() that memzero()s the pre-allocated area, so all the
pages are faulted in and show up in the process' RSS.

Each such spoofed packet can then translate into a ~4 MiB allocation in the
systemd-resolved process, which doesn't sound that bad. However, this can be
further amplified if the spoofed packet ends up in resolved's cache. So, if
the spoofed packet contains one valid A record and then an OPT record with a
spoofed ARCOUNT, the whole packet ends up in the cache, which can hold 4K
entries. This can eventually cause resolved to keep up to 16 GiB of memory
just for the cache (and thanks to the memzero() above it's all RSS).

Note that all this requires someone with enough privileges to configure
resolved to actually point to such a malicious DNS server, or the
configuration could come from a malicious DHCP server on the network.
This could also get exploited via LLMNR, but in that case an attacker would
have to match the ID of a valid transaction for the packet to end up in
resolved's cache.

For example, with a malicious DNS server already in resolved's configuration:

$ resolvectl dns eth0
Link 2 (eth0): 192.168.99.1:5354

Filling resolved's cache:

$ for i in {0..4200}; do resolvectl query test-$i.example.com; done
...
test-4200.example.com: 192.0.2.1 -- link: dummy0

-- Information acquired via protocol DNS in 1.6ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

Yields the following memory increase:

$ while :; do grep VmRSS /proc/$(pidof systemd-resolved)/status; sleep 1; done
VmRSS: 14280 kB
VmRSS: 14280 kB
...
VmRSS: 403352 kB
VmRSS: 1017976 kB
VmRSS: 1603876 kB
VmRSS: 2202028 kB
...
VmRSS: 16795724 kB
VmRSS: 16795724 kB

In my testing I also noticed one annoyance - after a certain threshold the RSS
increase persisted even after the malicious entries were evicted from the
cache (or flushed via `resolvectl flush-caches`). This was most likely because
mmap_threshold got bumped to > 4 MiB and neither cache eviction nor
flush-caches calls malloc_trim(0) (via sd_event_trim_memory() or similar).

To mitigate this, let's cap the pre-allocation to the maximum number of
records the given packet body can realistically contain. If the capped size
were, for whatever unlikely reason, not enough, nothing serious would happen -
the hashmap would still get resized automatically by resize_buckets(), it'd
just be slightly slower.
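
As a rough illustration of the capping described above (not part of the
patch), the arithmetic can be sketched in standalone C. The helper name
capped_prealloc() and the two macro names are hypothetical and exist only for
this sketch; the patch itself applies MIN() inline to p->size and p->rindex.
The minimum record sizes are the ones given in the patch comments.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum wire sizes, as stated in the patch comments. */
#define MIN_QUESTION_SIZE 5U /* 1 (QNAME) + 2 (QTYPE) + 2 (QCLASS) */
#define MIN_RR_SIZE 11U      /* 1 (NAME) + 2 (TYPE) + 2 (CLASS) + 4 (TTL) + 2 (RDLENGTH) */

/* Hypothetical helper (not a systemd function): returns how many entries to
 * pre-allocate for, i.e. the smaller of the count claimed in the packet header
 * and the number of minimum-size records that fit into the unread bytes. */
static size_t capped_prealloc(size_t claimed, size_t size, size_t rindex, size_t min_record_size) {
        assert(rindex <= size);
        assert(min_record_size > 0);

        /* Upper bound on the records the remaining packet data can hold */
        size_t fit = (size - rindex) / min_record_size;

        return claimed < fit ? claimed : fit;
}
```

For instance, assuming a 512-byte packet whose read index sits right after
the 12-byte DNS header, a claimed count of 65535 RRs would be capped to
500 / 11 = 45 entries, and a claimed count of 65535 questions to
500 / 5 = 100 entries. A count that is already smaller than the cap is
passed through unchanged.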

[0] ae45e1a3832fbb6c96707687e42f0b4aaab52c9b
[1] 2d34cf0c16dd8fa71fb593e65ce4734cb61d9170
---

diff --git a/src/shared/dns-packet.c b/src/shared/dns-packet.c
index c8c54e7988a..4fdef2b570a 100644
--- a/src/shared/dns-packet.c
+++ b/src/shared/dns-packet.c
@@ -2463,8 +2463,15 @@ static int dns_packet_extract_question(DnsPacket *p, DnsQuestion **ret_question)
         if (!keys)
                 return log_oom();
 
-        r = set_reserve(keys, n * 2); /* Higher multipliers give slightly higher efficiency through
-                                       * hash collisions, but the gains quickly drop off after 2. */
+        /* Pre-allocate the question hashmap, but cap the pre-allocation to a number of questions the
+         * packet can realistically contain. That is, pick the minimal value from the claimed number
+         * of questions (n) and a maximum number of potential questions the remaining packet data can
+         * actually contain: p->size - p->rindex are the remaining unread bytes in the packet, and 5U
+         * is the minimum size of each question - 1 (QNAME) + 2 (QTYPE) + 2 (QCLASS).
+         *
+         * Note for the multiplication: higher multipliers give slightly higher efficiency through
+         * hash collisions, but the gains quickly drop off after 2. */
+        r = set_reserve(keys, MIN(n, (p->size - p->rindex) / 5U) * 2);
         if (r < 0)
                 return r;
 
@@ -2510,7 +2517,12 @@ static int dns_packet_extract_answer(DnsPacket *p, DnsAnswer **ret_answer) {
         if (n == 0)
                 return 0;
 
-        answer = dns_answer_new(n);
+        /* Pre-allocate the answer hashmap, but cap the pre-allocation to a number of RRs the packet can
+         * realistically contain. That is, pick the minimal value from the claimed number of RRs (n) and a
+         * maximum number of potential RRs the remaining packet data can actually contain: p->size -
+         * p->rindex are the remaining unread bytes in the packet, and the 11U is the minimum size of each RR
+         * - 1 (NAME) + 2 (TYPE) + 2 (CLASS) + 4 (TTL) + 2 (RDLENGTH). */
+        answer = dns_answer_new(MIN(n, (p->size - p->rindex) / 11U));
         if (!answer)
                 return -ENOMEM;
 