From: Yann Collet
Date: Wed, 5 May 2021 17:04:03 +0000 (-0700)
Subject: deeper prefetching pipeline for decompressSequencesLong
X-Git-Tag: v1.5.0^2~34^2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=7ef6d7b36ca34eb4adef6f9780b0953d51643bb7;p=thirdparty%2Fzstd.git

deeper prefetching pipeline for decompressSequencesLong

Pipeline depth increased from 4 to 8 slots.
This change substantially improves decompression speed
when there are long-distance offsets.

Example with enwik9 compressed at level 22:
gcc-9   : 947 -> 1039 MB/s
clang-10: 884 ->  946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, of about 2%
(measurements are noisier for this scenario).
---

diff --git a/lib/decompress/zstd_decompress_block.c b/lib/decompress/zstd_decompress_block.c
index b980339a1..5419724da 100644
--- a/lib/decompress/zstd_decompress_block.c
+++ b/lib/decompress/zstd_decompress_block.c
@@ -1254,9 +1254,9 @@ ZSTD_decompressSequencesLong_body(
 
     /* Regen sequences */
     if (nbSeq) {
-#define STORED_SEQS 4
+#define STORED_SEQS 8
 #define STORED_SEQS_MASK (STORED_SEQS-1)
-#define ADVANCED_SEQS 4
+#define ADVANCED_SEQS STORED_SEQS
         seq_t sequences[STORED_SEQS];
         int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
         seqState_t seqState;
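The three macros above control a software pipeline: sequences are decoded `ADVANCED_SEQS` steps ahead of execution and parked in a ring buffer of `STORED_SEQS` slots, indexed with `STORED_SEQS_MASK`. Each decoded sequence's match source is prefetched immediately, so a cache miss on a long-distance offset can overlap with decoding the following sequences. Raising the depth from 4 to 8 gives each prefetch more time to land before the copy needs the data. The sketch below illustrates that pattern under simplifying assumptions. It is not zstd's code: `sketch_seq_t`, `decode_next`, `exec_seq` and `decode_long` are hypothetical, sequences are match-only (no literals), and "decoding" just reads from an array. It relies on the GCC/Clang builtin `__builtin_prefetch` when available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same pipeline shape as the patched code: 8 slots, decode 8 ahead. */
#define STORED_SEQS 8
#define STORED_SEQS_MASK (STORED_SEQS - 1)
#define ADVANCED_SEQS STORED_SEQS
#define MIN(a, b) ((a) < (b) ? (a) : (b)) /* local helper for this sketch */

/* Slot indexing uses a bitmask, so the slot count must be a power of two. */
_Static_assert((STORED_SEQS & STORED_SEQS_MASK) == 0,
               "STORED_SEQS must be a power of two");

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_L1(p) __builtin_prefetch((const void *)(p), 0, 3)
#else
#define PREFETCH_L1(p) ((void)(p)) /* no-op fallback */
#endif

/* Hypothetical, simplified sequence: match-only, no literals. */
typedef struct {
    size_t offset;      /* distance back from the write position */
    size_t matchLength; /* number of bytes to copy */
} sketch_seq_t;

/* Hypothetical "decoder": zstd would FSE-decode from a bitstream here. */
static sketch_seq_t decode_next(const sketch_seq_t *src, int n) { return src[n]; }

/* Byte-by-byte copy so overlapping matches (offset < length) work as in LZ77. */
static void exec_seq(unsigned char *out, size_t *op, sketch_seq_t s) {
    const unsigned char *match = out + *op - s.offset;
    for (size_t i = 0; i < s.matchLength; i++) out[*op + i] = match[i];
    *op += s.matchLength;
}

/* Decode ahead, prefetch each match source, execute with a lag of
 * ADVANCED_SEQS sequences. Returns the final write position. */
static size_t decode_long(unsigned char *out, size_t op,
                          const sketch_seq_t *src, int nbSeq) {
    sketch_seq_t sequences[STORED_SEQS];
    int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
    size_t decodePos = op; /* where the next decoded sequence will write */
    int seqNb;

    /* 1. Fill the pipeline. */
    for (seqNb = 0; seqNb < seqAdvance; seqNb++) {
        sketch_seq_t const s = decode_next(src, seqNb);
        assert(s.offset <= decodePos);
        PREFETCH_L1(out + decodePos - s.offset);
        decodePos += s.matchLength;
        sequences[seqNb] = s;
    }
    /* 2. Steady state: decode one sequence, execute the oldest queued one. */
    for (; seqNb < nbSeq; seqNb++) {
        sketch_seq_t const s = decode_next(src, seqNb);
        assert(s.offset <= decodePos);
        /* With ADVANCED_SEQS == STORED_SEQS this reads the slot that is
         * overwritten just below, so execution must come first. */
        exec_seq(out, &op, sequences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK]);
        PREFETCH_L1(out + decodePos - s.offset);
        decodePos += s.matchLength;
        sequences[seqNb & STORED_SEQS_MASK] = s;
    }
    /* 3. Drain the sequences still queued. */
    for (seqNb -= seqAdvance; seqNb < nbSeq; seqNb++)
        exec_seq(out, &op, sequences[seqNb & STORED_SEQS_MASK]);
    return op;
}
```

Because execution order is unchanged, the pipelined loop must produce exactly the same output as executing the sequences one by one. Only the timing of memory accesses differs. Setting `ADVANCED_SEQS` to `STORED_SEQS`, as the patch does, uses every slot of the ring buffer for look-ahead.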