From: Willy Tarreau
Date: Sun, 9 Apr 2023 09:50:15 +0000 (+0200)
Subject: IMPORT: slz: use a better hash for machines with a fast multiply
X-Git-Tag: v3.2-dev17~33
X-Git-Url: http://git.ipfire.org/gitweb/gitweb.cgi?a=commitdiff_plain;h=411b04c7d39619ae36842396888ad75f6c3ba5cf;p=thirdparty%2Fhaproxy.git

IMPORT: slz: use a better hash for machines with a fast multiply

The current hash involves 3 simple shifts and additions so that it can
be mapped to a multiply on architectures having a fast multiply. This is
indeed what the compiler does on x86_64. A large range of values was
scanned to try to find more optimal factors on machines supporting such
a fast multiply, and it turned out that the new factor 0x1af42f resulted
in smoother hashes that provided on average 0.4% better compression on
both the Silesia corpus and an mbox file composed of very compressible
emails and uncompressible attachments. It's even slightly better than
CRC32C while being faster on Skylake. This patch enables this factor on
archs with a fast multiply.

This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.
---

diff --git a/include/import/slz.h b/include/import/slz.h
index 901a79027..5ff756c35 100644
--- a/include/import/slz.h
+++ b/include/import/slz.h
@@ -38,6 +38,7 @@
 #define UNALIGNED_LE_OK
 #define UNALIGNED_FASTER
 #define USE_64BIT_QUEUE
+#define HAVE_FAST_MULT
 #elif defined(__i386__) || defined(__i486__) || defined(__i586__) || defined(__i686__)
 #define UNALIGNED_LE_OK
 //#define UNALIGNED_FASTER
@@ -47,6 +48,7 @@
 #elif defined(__ARM_ARCH_8A) || defined(__ARM_FEATURE_UNALIGNED)
 #define UNALIGNED_LE_OK
 #define UNALIGNED_FASTER
+#define HAVE_FAST_MULT
 #endif
 
 /* Log2 of the size of the hash table used for the references table. */
diff --git a/src/slz.c b/src/slz.c
index a41df72ae..1b0b13c57 100644
--- a/src/slz.c
+++ b/src/slz.c
@@ -388,6 +388,9 @@ static inline uint32_t slz_hash(uint32_t a)
 	// but provides a slightly smoother hash
 	__asm__ volatile("crc32l %1,%0" : "+r"(a) : "r"(0));
 	return a >> (32 - HASH_BITS);
+#elif defined(HAVE_FAST_MULT)
+	// optimal factor for HASH_BITS=12 and HASH_BITS=13 among 48k tested: 0x1af42f
+	return (a * 0x1af42f) >> (32 - HASH_BITS);
 #else
 	return ((a << 19) + (a << 6) - a) >> (32 - HASH_BITS);
 #endif
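
For reference, a minimal standalone C sketch (an illustration only, not part of the
patch or of slz itself) of why the previous shift/add hash maps to a single multiply
by 0x8003f (2^19 + 2^6 - 1), and of what the new HAVE_FAST_MULT path computes. The
HASH_BITS value of 13 is assumed here for illustration; slz.h defines the real value.

/* sketch: old shift/add hash vs. new multiply-based hash */
#include <stdint.h>
#include <stdio.h>

#define HASH_BITS 13  /* assumed for illustration */

/* previous form: three shifts/adds, i.e. a * (2^19 + 2^6 - 1) = a * 0x8003f */
static uint32_t hash_shift_add(uint32_t a)
{
	return ((a << 19) + (a << 6) - a) >> (32 - HASH_BITS);
}

/* new form enabled by HAVE_FAST_MULT: a plain multiply by factor 0x1af42f */
static uint32_t hash_mult(uint32_t a)
{
	return (a * 0x1af42f) >> (32 - HASH_BITS);
}

int main(void)
{
	uint32_t a = 0x12345678;

	/* the two values on the first line are identical: shift/add == * 0x8003f */
	printf("%u %u\n", hash_shift_add(a),
	       (uint32_t)((a * 0x8003fu) >> (32 - HASH_BITS)));
	printf("%u\n", hash_mult(a));
	return 0;
}

On an architecture with a cheap 32-bit multiply the compiler emits a single mul for
either form, so the only question is which factor spreads the input bits better; the
commit message above reports that 0x1af42f gave slightly smoother hashes than both
the old factor and CRC32C.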