I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)
Since we aren't splaying `ip0` and `ip1` apart (which would look like `0_1_2_3_`,
as opposed to the `01__23__` we were actually doing), it's a bit ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit of
slightly improving compression. Speed remains pretty much unchanged.
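The two probe layouts mentioned above can be illustrated with a small C sketch. This is not zstd code. `paired_pattern()` and `splayed_pattern()` are hypothetical helpers, and the sketch assumes `step` is held constant. In the paired layout, each loop iteration probes the adjacent pair (`ip0`, `ip0 + 1`) and then `ip0` jumps ahead by `step`. The splayed layout spreads the same number of probes evenly across the window. Probed offsets are printed as digits and skipped offsets as `_`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not zstd code: the offsets probed when `step`
 * is held constant and each iteration probes the adjacent pair
 * (ip0, ip0 + 1) before ip0 jumps ahead by `step`.
 * Probed offsets get a digit (probe index mod 10), skipped ones get '_'.
 * `out` must have room for len + 1 bytes (the string is NUL-terminated). */
static void paired_pattern(size_t step, char *out, size_t len)
{
    size_t probe = 0;
    assert(step >= 2); /* smaller steps would make the pairs overlap */
    memset(out, '_', len);
    out[len] = '\0';
    for (size_t ip0 = 0; ip0 < len; ip0 += step) {
        out[ip0] = (char)('0' + probe++ % 10);
        if (ip0 + 1 < len)
            out[ip0 + 1] = (char)('0' + probe++ % 10);
    }
}

/* Hypothetical "splayed" alternative: the same number of probes per `step`
 * bytes (two), spread evenly as one probe every step / 2 bytes.
 * This sketch only handles even steps. */
static void splayed_pattern(size_t step, char *out, size_t len)
{
    size_t probe = 0;
    size_t spacing = step / 2;
    assert(step >= 2 && step % 2 == 0);
    memset(out, '_', len);
    out[len] = '\0';
    for (size_t pos = 0; pos < len; pos += spacing)
        out[pos] = (char)('0' + probe++ % 10);
}
```

With an 8-byte window, `paired_pattern(4, buf, 8)` yields `01__23__` and `splayed_pattern(4, buf, 8)` yields `0_1_2_3_`. `paired_pattern(3, buf, 8)`, which corresponds to growing the step by 1 instead of 2 from a step of 2, yields `01_23_45`. The paired layout leaves a gap of `step - 2` skipped bytes between pairs, so a smaller increment keeps those gaps from widening as quickly.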
hash0 = hash1;
hash1 = ZSTD_hashPtr(ip2, hlog, mls);
+ /* advance to next positions */
+ ip0 = ip1;
+ ip1 = ip2;
+ ip2 = ip0 + step;
+ ip3 = ip1 + step;
+
/* calculate step */
if (ip2 >= nextStep) {
+ step++;
PREFETCH_L1(ip1 + 64);
PREFETCH_L1(ip1 + 128);
- step += 2;
nextStep += kStepIncr;
}
-
- /* advance to next positions */
- ip0 = ip1;
- ip1 = ip2;
- ip2 = ip0 + step;
- ip3 = ip1 + step;
} while (ip3 < ilimit);
_cleanup: