From: W. Felix Handte
Date: Fri, 20 Aug 2021 19:56:14 +0000 (-0400)
Subject: Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)
X-Git-Tag: v1.5.1~1^2~116^2~5
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=24fcccd05c6a3609715b9d9d1020129105c55116;p=thirdparty%2Fzstd.git

Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)

Unrolling the loop to handle 2 positions in each iteration allows us to
reduce the frequency of some operations that don't need to happen at
every position. One such operation is the step calculation, which is a
very rough heuristic anyway; it's fine if we do it a position later. The
other operation is the repcode check. Since the repcode check already
tries expanding back one position, we're not missing much of importance
by only trying it at every other position.

This commit also slightly reorders some operations.
---

diff --git a/lib/compress/zstd_fast.c b/lib/compress/zstd_fast.c
index ebbef4919..9b40558e1 100644
--- a/lib/compress/zstd_fast.c
+++ b/lib/compress/zstd_fast.c
@@ -247,6 +247,7 @@ ZSTD_compressBlock_fast_generic_pipelined(
     const BYTE* ip0 = istart;
     const BYTE* ip1;
     const BYTE* ip2;
+    const BYTE* ip3;
     U32 current0;
 
     U32 rep_offset1 = rep[0];
@@ -284,8 +285,9 @@ _start: /* Requires: ip0 */
     /* calculate positions, ip0 - anchor == 0, so we skip step calc */
     ip1 = ip0 + stepSize;
     ip2 = ip1 + stepSize;
+    ip3 = ip2 + stepSize;
 
-    if (ip2 >= ilimit) {
+    if (ip3 >= ilimit) {
         goto _cleanup;
     }
 
@@ -298,9 +300,8 @@ _start: /* Requires: ip0 */
         /* load repcode match for ip[2]*/
         const U32 rval = MEM_read32(ip2 - rep_offset1);
 
-        current0 = ip0 - base;
-
         /* write back hash table entry */
+        current0 = ip0 - base;
         hashTable[hash0] = current0;
 
         /* check repcode at ip[2] */
@@ -328,16 +329,45 @@ _start: /* Requires: ip0 */
             goto _offset;
         }
 
-        hash0 = hash1;
+        /* lookup ip[1] */
+        idx = hashTable[hash1];
 
         /* hash ip[2] */
+        hash0 = hash1;
         hash1 = ZSTD_hashPtr(ip2, hlog, mls);
 
+        /* advance to next positions */
+        ip0 = ip1;
+        ip1 = ip2;
+        ip2 = ip3;
+        ip3 += step;
+
+        /* write back hash table entry */
+        current0 = ip0 - base;
+        hashTable[hash0] = current0;
+
+        /* load match for ip[0] */
+        if (idx >= prefixStartIndex) {
+            mval = MEM_read32(base + idx);
+        } else {
+            mval = MEM_read32(ip0) ^ 1; /* guaranteed to not match. */
+        }
+
+        /* check match at ip[0] */
+        if (MEM_read32(ip0) == mval) {
+            /* found a match! */
+            goto _offset;
+        }
+
         /* lookup ip[1] */
-        idx = hashTable[hash0];
+        idx = hashTable[hash1];
+
+        /* hash ip[2] */
+        hash0 = hash1;
+        hash1 = ZSTD_hashPtr(ip2, hlog, mls);
 
         /* calculate step */
-        if (ip1 >= nextStep) {
+        if (ip2 >= nextStep) {
             PREFETCH_L1(ip1 + 64);
             PREFETCH_L1(ip1 + 128);
             step++;
@@ -347,8 +377,9 @@ _start: /* Requires: ip0 */
         /* advance to next positions */
         ip0 = ip1;
         ip1 = ip2;
-        ip2 += step;
-    } while (ip2 < ilimit);
+        ip2 = ip3;
+        ip3 += step;
+    } while (ip3 < ilimit);
 
 _cleanup:
     /* Note that there are probably still a couple positions we could search.
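
As an illustration of the transform (separate from the zstd sources
above), the shape of the unrolled loop looks roughly like the sketch
below. `search_one`, `scan`, and the step-growth constant are
hypothetical stand-ins for zstd's hash-table probe and `kStepIncr`
logic, not the real API:

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the per-position work (hash the bytes
     * at p, probe the table, compare); the real logic is in
     * zstd_fast.c. */
    static int search_one(const unsigned char* p)
    {
        return p[0] == 0x42;
    }

    /* Sketch of the unrolled control flow: two positions are handled
     * per iteration, so the step recalculation (and, in the real code,
     * the repcode check) runs half as often as in a loop that handles
     * one position per iteration. */
    static size_t scan(const unsigned char* buf, size_t len, size_t step)
    {
        size_t i0 = 0;
        size_t i1 = step;
        size_t nextStep = 256; /* hypothetical analog of kStepIncr */
        size_t hits = 0;

        while (i1 + step < len) {
            /* per-position work still happens at both positions... */
            hits += (size_t)search_one(buf + i0);
            hits += (size_t)search_one(buf + i1);

            /* ...but this rough heuristic only runs once per pair;
             * being a position late is fine, as the message notes */
            if (i1 >= nextStep) {
                step++;
                nextStep += 256;
            }

            /* advance both positions past the pair just searched */
            i0 = i1 + step;
            i1 = i0 + step;
        }
        return hits;
    }

    int main(void)
    {
        unsigned char data[4096] = {0};
        printf("hits: %zu\n", scan(data, sizeof data, 1));
        return 0;
    }

The actual patch additionally pipelines hashes one position ahead
(hence the four pointers ip0 through ip3 in the diff), which is why its
advance logic shuffles four pointers rather than the two indices used
in this simplified sketch.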