git.ipfire.org Git - thirdparty/zlib-ng.git/commit
Explicit SSE2 vectorization of Chorba CRC method
author Adam Stylinski <kungfujesus06@gmail.com>
Sun, 16 Feb 2025 17:13:00 +0000 (12:13 -0500)
committer Hans Kristian Rosbach <hk-github@circlestorm.org>
Fri, 28 Mar 2025 19:43:59 +0000 (20:43 +0100)
commit 724dc0cfb4805dfd57983080ec4d2b3c53262e87
tree b9bd4347f3059cb5976ec37c9ad25535bd56b9e8
parent 2bba7e8468e808b7a7d5c1045d339eb5ffd12591

The version currently in the generic implementation for 32768-byte
buffers leverages the stack. It manages to autovectorize, but
unfortunately the trips to the stack hurt its performance on the CPUs
that need this the most. This version is explicitly SIMD vectorized and
makes no trips to the stack. In my testing it's ~10% faster than the
"small" variant, and about 42% faster than the "32768" variant.
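
To illustrate what "explicitly SIMD vectorized" means here, below is a minimal sketch, not the code from arch/x86/chorba_sse2.c. It assumes a hot loop dominated by XORs and shifts of 64-bit words, and it keeps every intermediate in `__m128i` values instead of a scratch array on the stack. The function names and the shift amounts (`SHL`, `SHR`) are hypothetical and chosen only for illustration; the intrinsics are standard SSE2 from `<emmintrin.h>`.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shift amounts, for illustration only. */
#define SHL 17
#define SHR 3

/* Scalar reference: dst[i] = a[i] ^ (b[i] << SHL) ^ (b[i] >> SHR). */
static void xor_shift_scalar(uint64_t *dst, const uint64_t *a,
                             const uint64_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ (b[i] << SHL) ^ (b[i] >> SHR);
}

/* Explicit SSE2 version: two 64-bit lanes per iteration, with all
 * intermediates held in __m128i values rather than a stack buffer. */
static void xor_shift_sse2(uint64_t *dst, const uint64_t *a,
                           const uint64_t *b, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vl = _mm_slli_epi64(vb, SHL); /* per-lane left shift */
        __m128i vr = _mm_srli_epi64(vb, SHR); /* per-lane logical right shift */
        __m128i v  = _mm_xor_si128(va, _mm_xor_si128(vl, vr));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
    /* Scalar tail for an odd element count. */
    for (; i < n; i++)
        dst[i] = a[i] ^ (b[i] << SHL) ^ (b[i] >> SHR);
}

/* Self-check: both variants must produce identical output. */
static void check_variants_agree(void) {
    uint64_t a[5], b[5], s[5], v[5];
    for (size_t i = 0; i < 5; i++) {
        a[i] = 0x0123456789abcdefULL * (i + 1);
        b[i] = 0xfedcba9876543210ULL ^ (i * 0x9e3779b97f4a7c15ULL);
    }
    xor_shift_scalar(s, a, b, 5);
    xor_shift_sse2(v, a, b, 5);
    for (size_t i = 0; i < 5; i++)
        assert(s[i] == v[i]);
}
```

Calling `check_variants_agree()` on an x86 target with SSE2 enabled compares the two variants over a five-element input, which exercises both the vector loop and the scalar tail.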
CMakeLists.txt
arch/x86/Makefile.in
arch/x86/chorba_sse2.c [new file with mode: 0644]
arch/x86/x86_functions.h
arch/x86/x86_intrins.h
configure
crc32.h
functable.c
test/benchmarks/benchmark_crc32.cc
test/test_crc32.cc
win32/Makefile.msc