git.ipfire.org Git - thirdparty/zstd.git/commit
Improvements in zstd decode performance
author    mgrice <mgrice@fb.com>
          Tue, 27 Aug 2019 21:49:23 +0000 (14:49 -0700)
committer mgrice <mgrice@fb.com>
          Thu, 29 Aug 2019 19:25:56 +0000 (12:25 -0700)
commit    b83059958246dfcb5b91af9c187fad8c706869a0
tree      c0aa8f99e3cc6b47c440b292a1c0da8a7f90866d
parent    d944197e7945fe7319b4ac13f5be2a9de6ab0fba

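Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes than needed (say, 8) than to copy exactly the requested number (say, 3): it copies in fixed-size chunks and lets the last chunk run past the end of the requested range, so there is no length-dependent tail to handle. This change takes that further by exploiting two properties:
1. It is almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions and fewer branches.
2. With a 16-byte chunk size, ~90% of wildcopy invocations (85-92% depending on level, per the table below) have a trip count of 1, so the loop branch becomes much easier to predict.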
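The chunked-copy idea can be illustrated with a minimal sketch. `wildcopy_sketch` and `CHUNK` are hypothetical names; this is not zstd's `ZSTD_wildcopy`, it assumes non-overlapping buffers with at least `CHUNK` bytes of slack past the end of each, and it returns the loop trip count only so the example can show it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size: 16 bytes, as in the change described above (was 8). */
#define CHUNK 16

/*
 * wildcopy_sketch: a hypothetical illustration, NOT zstd's ZSTD_wildcopy.
 * Copies `length` bytes from src to dst in fixed CHUNK-byte steps. Because it
 * always copies whole chunks, it may read and write up to CHUNK bytes past
 * src + length and dst + length, so the caller must provide that much slack
 * in both buffers. Assumes src and dst do not overlap.
 * Returns the loop trip count, only so the example can show it.
 */
static size_t wildcopy_sketch(unsigned char *dst, const unsigned char *src,
                              size_t length)
{
    unsigned char *const end = dst + length;
    size_t trips = 0;
    do {
        /* Fixed-size memcpy: compilers typically lower this to a single
         * 16-byte load and store instead of a byte-by-byte loop. */
        memcpy(dst, src, CHUNK);
        dst += CHUNK;
        src += CHUNK;
        trips++;
    } while (dst < end);
    return trips;
}
```

With `CHUNK` set to 8, a 12-byte copy would take two iterations; with 16 it takes one, which is why lengths of 9 to 16 bytes move into the single-iteration case.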

Decompression speedup on a Xeon E5-2680v4 is 3-5%.

Measured wildcopy length distributions on silesia.tar (share of wildcopy calls per length bucket in bytes, by compression level):

level      <=8     9-16    17-24      >24
    1   78.05%   11.49%    3.52%    6.94%
    3   82.14%    8.99%    2.44%    6.43%
    6   85.81%    6.51%    2.92%    4.76%
    8   83.02%    7.31%    3.64%    6.03%
   10   84.13%    6.67%    3.29%    5.91%
   15   77.58%    7.55%    5.21%    9.66%
   16   80.07%    7.20%    3.98%    8.75%
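As a quick arithmetic check on the ~90% figure, the sketch below recomputes from the table above the share of wildcopy calls that would finish in one iteration with 8-byte versus 16-byte chunks. It assumes a call of length n with chunk size c runs exactly once when n <= c, which holds for a do-while copy loop that advances one chunk per iteration but is an assumption about zstd's actual code. `level_dist`, `single_trip_share`, and `print_single_trip_shares` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Wildcopy length distribution per compression level, copied from the
 * silesia.tar measurements above (percent of wildcopy calls). */
struct level_dist {
    int level;
    double le8;  /* length <= 8 bytes  */
    double le16; /* length 9-16 bytes  */
    double le24; /* length 17-24 bytes */
    double gt24; /* length > 24 bytes  */
};

static const struct level_dist dists[] = {
    { 1, 78.05, 11.49, 3.52, 6.94},
    { 3, 82.14,  8.99, 2.44, 6.43},
    { 6, 85.81,  6.51, 2.92, 4.76},
    { 8, 83.02,  7.31, 3.64, 6.03},
    {10, 84.13,  6.67, 3.29, 5.91},
    {15, 77.58,  7.55, 5.21, 9.66},
    {16, 80.07,  7.20, 3.98, 8.75},
};
#define NUM_LEVELS (sizeof dists / sizeof dists[0])

/* Percent of calls that finish in a single loop iteration for a given chunk
 * size. Only chunk sizes of 8, 16 or 24 line up with the measured buckets. */
static double single_trip_share(const struct level_dist *d, int chunk)
{
    double share = d->le8;             /* chunk >= 8: lengths <= 8 fit   */
    if (chunk >= 16) share += d->le16; /* chunk >= 16: lengths 9-16 fit  */
    if (chunk >= 24) share += d->le24; /* chunk >= 24: lengths 17-24 fit */
    return share;
}

/* Print the single-iteration share for 8-byte vs 16-byte chunks per level. */
static void print_single_trip_shares(void)
{
    printf("level  8-byte  16-byte\n");
    for (size_t i = 0; i < NUM_LEVELS; i++) {
        printf("%5d  %5.2f%%  %6.2f%%\n", dists[i].level,
               single_trip_share(&dists[i], 8),
               single_trip_share(&dists[i], 16));
    }
}
```

On this data the single-iteration share rises from 77.58-85.81% with 8-byte chunks to 85.13-92.32% with 16-byte chunks, consistent with the ~90% figure in the summary.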

Test Plan: benchmark silesia, make check
Files changed:
lib/common/zstd_internal.h
lib/compress/zstd_compress_internal.h
lib/decompress/zstd_decompress_block.c