git.ipfire.org Git - thirdparty/zlib-ng.git/commit
Adding avx512_vnni inline + copy elision
author Adam Stylinski <kungfujesus06@gmail.com>
Fri, 8 Apr 2022 17:24:21 +0000 (13:24 -0400)
committer Hans Kristian Rosbach <hk-github@circlestorm.org>
Mon, 23 May 2022 14:13:39 +0000 (16:13 +0200)
commit d79984b5bcaccab15e6cd13d7d1edea32ac36977
tree 7b8e0053bfc6d237bb3ff493e0ad580923ef2526
parent b8269bb7d4702f8e694441112bb4ba7c59ff2362

One interesting finding while benchmarking all of this is that our
chunkmemset_avx seems to be slower than chunkmemset_sse in a lot of use
cases.  That will be an interesting function to attempt to optimize.

Right now though, we're basically beating Google in all PNG decode and
encode benchmarks.  Some combinations of flags have us trading blows,
but we're as much as 14% faster than Chromium's zlib patches.

While we're here, add a more direct benchmark of the folded copy method
versus the explicit copy + checksum.
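
For readers unfamiliar with the distinction, here is a minimal scalar
sketch of the two approaches being compared.  It is not the code in this
commit: the function names are hypothetical, and the real
implementations are vectorized (SSE4.2, AVX2, AVX512, AVX512-VNNI).  It
only illustrates the idea, assuming a plain byte-at-a-time Adler-32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552   /* bytes that can be summed before a modulo is required */

/* Explicit copy + checksum: the source is traversed twice,
 * once by memcpy and once by the checksum loop. */
static uint32_t copy_then_adler32(uint32_t adler, uint8_t *dst,
                                  const uint8_t *src, size_t len) {
    uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;
    memcpy(dst, src, len);
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *src++;
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}

/* Folded copy (hypothetical name): each byte is loaded once,
 * then both stored to dst and added to the running sums. */
static uint32_t adler32_fold_copy_sketch(uint32_t adler, uint8_t *dst,
                                         const uint8_t *src, size_t len) {
    uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            uint8_t byte = *src++;
            *dst++ = byte;
            a += byte;
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Both functions return the same checksum and leave the same bytes in
dst; the folded variant just avoids the second pass over the source,
which is the effect the new benchmark is meant to measure.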
20 files changed:
adler32_fold.c
adler32_fold.h
adler32_p.h
arch/x86/adler32_avx2.c
arch/x86/adler32_avx2_p.h
arch/x86/adler32_avx2_tpl.h
arch/x86/adler32_avx512.c
arch/x86/adler32_avx512_tpl.h
arch/x86/adler32_avx512_vnni.c
arch/x86/adler32_sse42.c
arch/x86/adler32_ssse3_tpl.h [deleted file]
cpu_features.h
deflate.c
deflate.h
functable.c
inflate.c
inflate.h
test/benchmarks/CMakeLists.txt
test/benchmarks/benchmark_adler32_copy.cc [new file with mode: 0644]
win32/Makefile.msc