Nick Terrell [Wed, 1 Dec 2021 19:49:58 +0000 (11:49 -0800)]
[CircleCI] Fix short-tests-0
The short-tests-0 job was silently failing, I think because of the `&& make clean` construction. Switch to `;` instead.
Also fix all the test failures that were exposed.
`make all` is failing on CircleCI because the environment is missing Docker. Move that test
to GitHub Actions, and switch the pedantic CircleCI test to `make allmost`.
Nick Terrell [Wed, 1 Dec 2021 02:26:11 +0000 (18:26 -0800)]
[contrib][pzstd] Fix build issue with gcc-5
gcc-5 didn't like the l-value overload for defaulted operator=. There is
no reason it needs to be l-value overloaded, so just remove it.
I'm not sure why the build broke for @mckaygerhard in Issue #2811, since
this code hasn't changed since it was added. But, there is no harm in
fixing it.
Nick Terrell [Tue, 30 Nov 2021 20:29:55 +0000 (12:29 -0800)]
[test] Test that the exec-stack bit isn't set on libzstd.so
Add a check, using readelf, that libzstd.so doesn't have the exec-stack
bit set. If the stack is marked executable, systemd will refuse to link
against zstd. The check now runs on every PR.
Nick Terrell [Wed, 1 Dec 2021 01:43:28 +0000 (17:43 -0800)]
[bmi2] Add lzcnt and bmi target attributes
* When dynamically dispatching to bmi2, add lzcnt and bmi to the
TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition into
BMI2_TARGET_ATTRIBUTE so we can change it in the future (a sketch
follows this list).
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
be any case where bmi2 is supported but bmi1 isn't, but since we are
using bmi1 instructions as well, we should check for bmi1 too.
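A minimal sketch, assuming GCC/Clang function attributes, of what the centralized macro can look like. BMI2_TARGET_ATTRIBUTE and TARGET_ATTRIBUTE come from this change; the DYNAMIC_BMI2 guard shown here and decodeKernel_bmi2 are illustrative, and the exact definition in the zstd sources may differ:

```c
/* Hypothetical sketch of a centralized target-attribute macro. When the bmi2
 * path is selected by dynamic dispatch, the specialized function is allowed to
 * use lzcnt/bmi/bmi2 instructions; otherwise the attribute expands to nothing. */
#if defined(__GNUC__) && defined(DYNAMIC_BMI2) && (DYNAMIC_BMI2 == 1)
#  define BMI2_TARGET_ATTRIBUTE __attribute__((__target__("lzcnt,bmi,bmi2")))
#else
#  define BMI2_TARGET_ATTRIBUTE
#endif

/* Usage: the bmi2 specialization of a hot function. */
BMI2_TARGET_ATTRIBUTE
static unsigned decodeKernel_bmi2(unsigned bits)
{
    return bits * 2654435761u;   /* placeholder body */
}
```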
W. Felix Handte [Wed, 17 Nov 2021 23:09:18 +0000 (18:09 -0500)]
Determinism: Avoid Mapping Window into Reserved Indices during Reduction
PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It
succeeded in addressing that source of non-determinism, but introduced a new
one: it was possible, when index reduction occurred, to map indices in the
window to the reserved value, which would cause them to be zeroed, potentially
altering parsing of the input.
This PR addresses this issue. It makes sure that the bottom of the window is
always `>= ZSTD_WINDOW_START_INDEX`.
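A minimal sketch of that invariant, using a hypothetical helper (the real correction logic in zstd's window-overflow handling is more involved):

```c
#include <assert.h>

#define ZSTD_WINDOW_START_INDEX 2   /* reserved low indices; value illustrative */

/* Clamp a proposed index correction so that, after subtracting it from every
 * live index, the lowest live index never drops below ZSTD_WINDOW_START_INDEX.
 * Reduced indices therefore never land in the reserved range, so they are
 * never zeroed and the parse stays deterministic. */
static unsigned clampCorrection(unsigned lowestLiveIndex, unsigned correction)
{
    assert(lowestLiveIndex >= ZSTD_WINDOW_START_INDEX);
    if (correction > lowestLiveIndex - ZSTD_WINDOW_START_INDEX)
        correction = lowestLiveIndex - ZSTD_WINDOW_START_INDEX;
    assert(lowestLiveIndex - correction >= ZSTD_WINDOW_START_INDEX);
    return correction;
}
```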
I'm not sure if this makes #2850 redundant. I think it's probably still
valuable to have that protection as well.
Nick Terrell [Tue, 16 Nov 2021 22:25:18 +0000 (14:25 -0800)]
[linux-kernel] Don't add -O3 to CFLAGS
-O3 is no longer necessary for good performance: there is only a small
speed difference between -O2 and -O3, so just stick to the default of
-O2. I've measured neutral compression speed and a ~3% decompression
speed loss in userspace with clang & gcc. I've also measured neutral
compression speed and a ~1% decompression speed loss in the kernel
benchmarks.
This also fixes excessive stack usage on parisc, where the compiler
misbehaved at -O3 and used ~3KB of stack space for several functions. With
-O2 the problem is completely resolved, and stack usage is back to a few
hundred bytes.
Additionally, we get a large code size win on gcc.
Nick Terrell [Tue, 16 Nov 2021 00:57:00 +0000 (16:57 -0800)]
[linux-kernel] Don't inline function in zstd_opt.c
The optimal parser is unlikely to be used in the Linux kernel in
practice. There is no reason these functions should be force-inlined:
force inlining gains us nothing here and costs code size.
Nick Terrell [Tue, 16 Nov 2021 01:25:24 +0000 (17:25 -0800)]
Reduce function size in fast & dfast
Take the same approach as in PR #2828 [0]: remove the functions that
force-inline many function bodies behind a `switch`. Instead, create one
function per "template" combination, and then switch between those
functions. This allows the compiler to break one large function into many
small functions, which generally helps codegen.
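A minimal sketch of the resulting shape (type and function names are illustrative, not the actual zstd symbols):

```c
#include <stddef.h>

typedef struct { int unused; } matchState_t;   /* stand-in for the real match state */

/* One plain static function per "template" combination (here: minimum match
 * lengths 4/5/6). Each body would be stamped out from the shared template. */
static size_t compressFast_mls4(matchState_t* ms, const void* src, size_t srcSize)
{ (void)ms; (void)src; return srcSize; }
static size_t compressFast_mls5(matchState_t* ms, const void* src, size_t srcSize)
{ (void)ms; (void)src; return srcSize; }
static size_t compressFast_mls6(matchState_t* ms, const void* src, size_t srcSize)
{ (void)ms; (void)src; return srcSize; }

/* The outer entry point is a small switch between the specializations, rather
 * than one force-inlined mega-function that contains the switch. */
static size_t compressFast_dispatch(matchState_t* ms, const void* src,
                                    size_t srcSize, int mls)
{
    switch (mls) {
        default:
        case 4: return compressFast_mls4(ms, src, srcSize);
        case 5: return compressFast_mls5(ms, src, srcSize);
        case 6: return compressFast_mls6(ms, src, srcSize);
    }
}
```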
Also, in the `extDict` modes, when there is no ext-dict, call the
top-level function instead of the force-inlined one, to save on code size.
I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.
ko-zu [Sat, 13 Nov 2021 13:48:33 +0000 (22:48 +0900)]
Remove executable flag from GNU_STACK section
A stack-marking note needs to be added to every assembly file to indicate
that the stack does not need to be executable.
An executable stack conflicts with some security measures, for example
systemd's MemoryDenyWriteExecute=yes.
W. Felix Handte [Tue, 9 Nov 2021 01:03:52 +0000 (20:03 -0500)]
Avoid Reducing Indices to Reserved Values
Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.
So this commit changes index reduction to avoid mapping any index onto
`ZSTD_DUBT_UNSORTED_MARK`.
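A hypothetical sketch of the resulting behavior (not the actual reduction code in zstd):

```c
#define ZSTD_DUBT_UNSORTED_MARK 1   /* reserved marker value in the binary tree */

/* Reduce an index by `reducerValue`, zeroing anything that would fall out of
 * the window. An index that would land exactly on ZSTD_DUBT_UNSORTED_MARK is
 * zeroed as well, so a reduced index can never alias the reserved marker. */
static unsigned reduceIndex(unsigned idx, unsigned reducerValue)
{
    if (idx <= reducerValue + ZSTD_DUBT_UNSORTED_MARK)
        return 0;
    return idx - reducerValue;   /* always > ZSTD_DUBT_UNSORTED_MARK here */
}
```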
Nick Terrell [Thu, 21 Oct 2021 19:52:26 +0000 (12:52 -0700)]
[lazy] Speed up compilation times
Speed up compilation times by splitting each specialized search
implementation out into its own standalone function. This is faster because
compilers can compile many smaller functions much faster than one gigantic
function. The previous approach generated one giant function with `switch`
statements and inlining to select the implementation.
The compression strategy is unmodified in this PR, so the compressed size
should be exactly the same. I may have a follow up PR to slightly improve
the compression ratio, if it doesn't cost too much speed.
Nick Terrell [Fri, 8 Oct 2021 18:45:30 +0000 (11:45 -0700)]
[binary-tree] Fix underflow of nbCompares
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.
The bug was reported by Dan Carpenter and found by the Smatch static
checker.
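For illustration, a minimal standalone example of the underflow class being guarded against (hypothetical, not the zstd loop itself):

```c
#include <stdio.h>

int main(void)
{
    /* Buggy shape: the post-decrement in the condition still executes when the
     * counter is already 0, and an unsigned counter wraps around to UINT_MAX. */
    unsigned nbCompares = 0;
    if (nbCompares--) { /* not entered */ }
    printf("after wrap: %u\n", nbCompares);   /* a huge value, not -1 */

    /* Fixed shape: a signed counter with an explicit `> 0` test cannot wrap. */
    int nbComparesFixed = 0;
    for (; nbComparesFixed > 0; nbComparesFixed--) { /* never entered */ }
    return 0;
}
```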
W. Felix Handte [Wed, 8 Sep 2021 16:45:42 +0000 (12:45 -0400)]
Write Back Advanced Hash in Long Matches as Well (+Ratio)
Since we now hash the position one ahead even if we find a long match and
don't search that next position, we can write that hash entry back into the
hash table even on long matches. This seems to cost us no speed, and improves
compression ratio slightly!
W. Felix Handte [Thu, 2 Sep 2021 16:25:08 +0000 (12:25 -0400)]
Hash Long One Position Ahead (+2.5% Speed)
Aside from maybe a latency win in the loop, this means that when we find a
short match, we've already computed the hash we need to check for the next
long match.
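A rough sketch of the idea behind both of these changes (hash function, table layout, and names are illustrative, not the actual double-fast code):

```c
#include <stddef.h>
#include <stdint.h>

#define LONG_TABLE_SIZE (1u << 12)              /* illustrative size */

static size_t hashLong(const uint8_t* p)        /* stand-in hash, not zstd's */
{
    return (size_t)p[0] & (LONG_TABLE_SIZE - 1);
}

static void scanLoop(const uint8_t* base, const uint8_t* ip, const uint8_t* iend,
                     uint32_t* longTable)
{
    while (ip + 1 < iend) {
        size_t const h0 = hashLong(ip);         /* hash for the current position */
        size_t const h1 = hashLong(ip + 1);     /* hash one position ahead, up front */
        uint32_t const longCandidate = longTable[h0];
        (void)longCandidate;                    /* ... try a long match at ip ... */

        /* The ip+1 hash is already paid for, so:
         *  - on the short-match path it is the probe for a long match at ip+1;
         *  - even when a long match is taken at ip, write the entry back. */
        longTable[h1] = (uint32_t)(ip + 1 - base);
        ip++;
    }
}
```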