Nick Terrell [Wed, 5 May 2021 19:18:47 +0000 (12:18 -0700)]
[lib] Add ZSTD_c_deterministicRefPrefix
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.
A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.
Question: Should this be the default behavior? It isn't in this PR.
Nick Terrell [Tue, 4 May 2021 05:33:22 +0000 (22:33 -0700)]
[lib] Always load the dictionary in one go
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indicies. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.
This simplifes the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
Yann Collet [Tue, 4 May 2021 17:54:34 +0000 (10:54 -0700)]
allow jobSize to be as low as 512 KB
previous lower limit was 1 MB.
Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.
Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
Nick Terrell [Mon, 3 May 2021 21:32:15 +0000 (14:32 -0700)]
[LDM] Speed optimization on repetitive data
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`. Either because the `stopMask == 0` or
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.
```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
21.187881087 seconds time elapsed
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
1.149707921 seconds time elapsed
Nick Terrell [Mon, 3 May 2021 23:29:11 +0000 (16:29 -0700)]
[tests] Reduce memory usage of MT CLI tests
Switch from `-T0` to the default `-T1` which significantly reduces
memory usage for level 19 when there are many cores. This fixes
32-bit issues of running out of address space.
Nick Terrell [Mon, 3 May 2021 21:37:06 +0000 (14:37 -0700)]
Bug fix & run overflow correction much more frequently in tests
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
got the correction wrong in this case, and our chain tables and binary
trees would be corrupted. Now, we work as long as `maxDist` is a power
of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
run overflow correction as frequently as allowed without impacting
compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
`zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
speed penalty at most, which seems reasonable.
Nick Terrell [Fri, 30 Apr 2021 22:02:12 +0000 (15:02 -0700)]
[1.5.0] Move `zstd_errors.h` and `zdict.h` to `lib/` root
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
Nick Terrell [Mon, 26 Apr 2021 23:05:39 +0000 (16:05 -0700)]
[trace] Remove default definitions of weak symbols
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
Niclas Rosenvik [Sun, 28 Mar 2021 12:36:13 +0000 (14:36 +0200)]
Stop complaining about hash tool not found
If build_dir is set the zstd build complains about md5sum not being found.
Fix this by checking if build_dir is set before checking and using the hash tool
just like in lib/Makefile .
Nick Terrell [Mon, 29 Mar 2021 21:23:36 +0000 (14:23 -0700)]
[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`