git.ipfire.org Git - thirdparty/zlib-ng.git/commit
Make an AVX512 inflate fast with low cost masked writes
author Adam Stylinski <kungfujesus06@gmail.com>
Wed, 25 Sep 2024 21:56:36 +0000 (17:56 -0400)
committer Hans Kristian Rosbach <hk-github@circlestorm.org>
Wed, 20 Nov 2024 21:14:44 +0000 (22:14 +0100)
commit 0ed5ac8289e029ceff68150c0a6eb57d0da1148b
tree 960a6bc5445f2f4e92eabcc9bf3f745901c2cb93
parent 94aacd8bd69b7bfafce14fbe7639274e11d92d51

This takes advantage of the fact that on AVX512 architectures, masked
moves are incredibly cheap. There are many places where we have to
fall back to the safe C implementation of chunkcopy_safe because of the
assumed overwriting that occurs. We're able to sidestep most of the
branching needed here by simply controlling the bounds of our writes
with a mask.
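
As a rough illustration of the idea above (a sketch only, not the code
added by this commit): a byte mask with the low `len` bits set lets one
vector store write exactly `len` bytes, so no separate tail loop or
bounds branch is needed. The helper name copy_upto32_masked is
hypothetical, and the 32-byte (256-bit) width is an assumption made for
the example. The intrinsic path is only compiled when the compiler
targets AVX512BW and AVX512VL; otherwise a plain memcpy stands in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of zlib-ng): copy len bytes, 0 <= len <= 32,
 * from src to dst without reading or writing any byte at index >= len. */
static void copy_upto32_masked(uint8_t *dst, const uint8_t *src, size_t len) {
    assert(len <= 32);
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    /* Low len bits set; the 64-bit shift keeps len == 32 well defined. */
    __mmask32 mask = (__mmask32)((1ULL << len) - 1);
    /* Masked-off lanes are neither loaded nor stored. */
    __m256i chunk = _mm256_maskz_loadu_epi8(mask, src);
    _mm256_mask_storeu_epi8(dst, mask, chunk);
#else
    /* Portable stand-in when AVX512BW/VL is not enabled at compile time. */
    memcpy(dst, src, len);
#endif
}
```

Because the mask bounds the write, a caller near the end of the output
window could use the same store it uses everywhere else instead of
branching to a byte-wise safe copy.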
13 files changed:
CMakeLists.txt
arch/x86/Makefile.in
arch/x86/avx2_tables.h [new file with mode: 0644]
arch/x86/chunkset_avx2.c
arch/x86/chunkset_avx512.c [new file with mode: 0644]
arch/x86/x86_features.c
arch/x86/x86_features.h
arch/x86/x86_functions.h
chunkset_tpl.h
cmake/detect-intrinsics.cmake
configure
functable.c
inffast_tpl.h