From: Michael Tremer Date: Tue, 1 Mar 2022 12:44:21 +0000 (+0000) Subject: ipset: Optimise hash table size X-Git-Tag: 0.9.11~10 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=47de14b01a0fae69994131e4fb9239738570ab38;p=location%2Flibloc.git ipset: Optimise hash table size ipset uses a hash table internally which can be dynamically sized to choose whether more space efficiency or performance is required. Prior to this patch, we always set the size of the hash table to 1024 buckets. Having large sets with almost half a million entries, this does not perform well since we will spend a lot of time searching the linked list. This will probably perform even slower on systems with smaller cache sizes like the IPFire Mini Appliance. Having more buckets that are sparsely filled will result in fewer memory fetches at the cost of more wastage. Throughout the whole IPv4 set, this ranges from about 50 MB for a factor of 4, to about 100 MB for a factor of 0.75. Since memory of this quantity is cheap and since we want to increase throughput, I have chosen to set the fill factor to 0.75. Logistically, it is a little bit complicated to know this in advance when we have to write the header, so we will write the entire file first, and then come back to write the header again. This is required to keep memory consumption down during the export. 
Signed-off-by: Michael Tremer --- diff --git a/src/python/export.py b/src/python/export.py index be33847..f7401fb 100644 --- a/src/python/export.py +++ b/src/python/export.py @@ -20,6 +20,7 @@ import io import ipaddress import logging +import math import os import socket @@ -43,9 +44,18 @@ class OutputWriter(object): def __init__(self, f, prefix=None): self.f, self.prefix = f, prefix + # Call any custom initialization + self.init() + # Immediately write the header self._write_header() + def init(self): + """ + To be overwritten by anything that inherits from this + """ + pass + @classmethod def open(cls, filename, **kwargs): """ @@ -89,13 +99,64 @@ class IpsetOutputWriter(OutputWriter): """ suffix = "ipset" + # The value is being used if we don't know any better + DEFAULT_HASHSIZE = 64 + + # We aim for this many networks in a bucket on average. This allows us to choose + # how much memory we want to sacrifice to gain better performance. The lower the + # factor, the faster a lookup will be, but it will use more memory. + # We will aim for only using three quarters of all buckets to avoid any searches + # through the linked lists. + HASHSIZE_FACTOR = 0.75 + + def init(self): + # Count all networks + self.networks = 0 + + @property + def hashsize(self): + """ + Calculates an optimized hashsize + """ + # Return the default value if we don't know the size of the set + if not self.networks: + return self.DEFAULT_HASHSIZE + + # Find the nearest power of two that is larger than the number of networks + # divided by the hashsize factor. + exponent = math.log(self.networks / self.HASHSIZE_FACTOR, 2) + + # Return the size of the hash + return 2 ** math.ceil(exponent) + + @property + def maxelem(self): + """ + Tells ipset how large the set will be. + + Since these are considered immutable, we will use the total number of networks. 
+ """ + return self.networks + def _write_header(self): - self.f.write("create %s hash:net family inet hashsize 1024 maxelem 65536 -exist\n" % self.prefix) + # This must have a fixed size, because we will write the header again in the end + self.f.write("create %s hash:net family inet " + "hashsize %8d maxelem %8d -exist\n" % (self.prefix, self.hashsize, self.maxelem)) self.f.write("flush %s\n" % self.prefix) def write(self, network): self.f.write("add %s %s\n" % (self.prefix, network)) + # Increment network counter + self.networks += 1 + + def _write_footer(self): + # Jump back to the beginning of the file + self.f.seek(0) + + # Rewrite the header with better configuration + self._write_header() + class NftablesOutputWriter(OutputWriter): """