From: Willy Tarreau
Date: Sun, 9 Apr 2023 09:50:15 +0000 (+0200)
Subject: IMPORT: slz: use a better hash for machines with a fast multiply
X-Git-Tag: v3.2-dev17~33
X-Git-Url: http://git.ipfire.org/gitweb/gitweb.cgi?a=commitdiff_plain;h=411b04c7d39619ae36842396888ad75f6c3ba5cf;p=thirdparty%2Fhaproxy.git

IMPORT: slz: use a better hash for machines with a fast multiply

The current hash involves 3 simple shifts and additions so that it can
be mapped to a multiply on architectures having a fast multiply. This is
indeed what the compiler does on x86_64. A large range of values was
scanned to try to find more optimal factors on machines supporting such
a fast multiply, and it turned out that the new factor 0x1af42f resulted
in smoother hashes that provided on average 0.4% better compression on
both the Silesia corpus and an mbox file composed of very compressible
emails and uncompressible attachments. It's even slightly better than
CRC32C while being faster on Skylake. This patch enables this factor on
archs with a fast multiply.

This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.
---

diff --git a/include/import/slz.h b/include/import/slz.h
index 901a79027..5ff756c35 100644
--- a/include/import/slz.h
+++ b/include/import/slz.h
@@ -38,6 +38,7 @@
 #define UNALIGNED_LE_OK
 #define UNALIGNED_FASTER
 #define USE_64BIT_QUEUE
+#define HAVE_FAST_MULT
 #elif defined(__i386__) || defined(__i486__) || defined(__i586__) || defined(__i686__)
 #define UNALIGNED_LE_OK
 //#define UNALIGNED_FASTER
@@ -47,6 +48,7 @@
 #elif defined(__ARM_ARCH_8A) || defined(__ARM_FEATURE_UNALIGNED)
 #define UNALIGNED_LE_OK
 #define UNALIGNED_FASTER
+#define HAVE_FAST_MULT
 #endif
 
 /* Log2 of the size of the hash table used for the references table. */
diff --git a/src/slz.c b/src/slz.c
index a41df72ae..1b0b13c57 100644
--- a/src/slz.c
+++ b/src/slz.c
@@ -388,6 +388,9 @@ static inline uint32_t slz_hash(uint32_t a)
 	// but provides a slightly smoother hash
 	__asm__ volatile("crc32l %1,%0" : "+r"(a) : "r"(0));
 	return a >> (32 - HASH_BITS);
+#elif defined(HAVE_FAST_MULT)
+	// optimal factor for HASH_BITS=12 and HASH_BITS=13 among 48k tested: 0x1af42f
+	return (a * 0x1af42f) >> (32 - HASH_BITS);
 #else
 	return ((a << 19) + (a << 6) - a) >> (32 - HASH_BITS);
 #endif
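
For reference, a minimal standalone C sketch (an illustration only, not part of the
patch or of slz itself) of why the previous shift/add hash maps to a single multiply
by 0x8003f (2^19 + 2^6 - 1), and of what the new HAVE_FAST_MULT path computes. The
HASH_BITS value of 13 is assumed here for illustration; slz.h defines the real value.

/* sketch: old shift/add hash vs. new multiply-based hash */
#include <stdint.h>
#include <stdio.h>

#define HASH_BITS 13  /* assumed for illustration */

/* previous form: three shifts/adds, i.e. a * (2^19 + 2^6 - 1) = a * 0x8003f */
static uint32_t hash_shift_add(uint32_t a)
{
	return ((a << 19) + (a << 6) - a) >> (32 - HASH_BITS);
}

/* new form enabled by HAVE_FAST_MULT: a plain multiply by factor 0x1af42f */
static uint32_t hash_mult(uint32_t a)
{
	return (a * 0x1af42f) >> (32 - HASH_BITS);
}

int main(void)
{
	uint32_t a = 0x12345678;

	/* the two values on the first line are identical: shift/add == * 0x8003f */
	printf("%u %u\n", hash_shift_add(a),
	       (uint32_t)((a * 0x8003fu) >> (32 - HASH_BITS)));
	printf("%u\n", hash_mult(a));
	return 0;
}

On an architecture with a cheap 32-bit multiply the compiler emits a single mul for
either form, so the only question is which factor spreads the input bits better; the
commit message above reports that 0x1af42f gave slightly smoother hashes than both
the old factor and CRC32C.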