This improves cache population quite a bit and therefore helps when
dealing with large rulesets. A simple use case that is otherwise hard
to improve is listing the last rule in a large chain. These are the
average program run times by number of rules:

rule count | legacy | nft old | nft new
-----------+--------+---------+--------
    50,000 |  .052s |   .611s |   .406s
   100,000 |  .115s |   2.12s |   1.24s
   150,000 |  .265s |   7.63s |   4.14s
   200,000 |  .411s |   21.0s |   10.6s

So while legacy iptables is still roughly an order of magnitude faster,
this simple change speeds up iptables-nft by a factor of 1.5 to 2 in
this test.

Note that using a buffer larger than 32KB doesn't improve performance
any further, since the Linux kernel won't transmit more data at once.
This limit was set (actually extended from 16KB) in kernel commit
d35c99ff77ecb ("netlink: do not enter direct reclaim from
netlink_dump()").
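
To illustrate why growing the buffer past 32KB gains nothing, here is a
minimal sketch of the arithmetic. It is a simplified model, not code
from this patch: it assumes each read returns as many bytes as fit and
ignores that netlink messages are not split at arbitrary byte
boundaries. KERNEL_DUMP_CAP, bytes_per_read() and reads_needed() are
hypothetical names used for illustration only.

```c
#include <stddef.h>

/* Assumption: per-read cap on dump data, as described above (32KB). */
#define KERNEL_DUMP_CAP ((size_t)32 * 1024)

/* Bytes a single read can return: limited by both the caller's
 * buffer and the kernel-side cap, whichever is smaller. */
static size_t bytes_per_read(size_t user_buf_size)
{
	return user_buf_size < KERNEL_DUMP_CAP ? user_buf_size
					       : KERNEL_DUMP_CAP;
}

/* Number of reads needed to drain dump_size bytes, rounded up.
 * Returns 0 for a zero-sized buffer to avoid dividing by zero. */
static size_t reads_needed(size_t dump_size, size_t user_buf_size)
{
	size_t chunk = bytes_per_read(user_buf_size);

	if (chunk == 0)
		return 0;
	return (dump_size + chunk - 1) / chunk;
}
```

Under this model, a 1MB dump takes half as many reads with a 32KB
buffer as with a 16KB one, while a 64KB buffer needs exactly as many
reads as the 32KB one.
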
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>