Nick Terrell [Tue, 6 Oct 2020 02:14:19 +0000 (19:14 -0700)]
[zstdmt] Fix determinism issue with rsyncable mode
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attempt to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
for the next sync point.
The fix is to detect whether we're currently paused at a sync point and,
if we are, to not load any more input.
Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed afterward. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
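
A minimal sketch of the scenario above, assuming a simplified job table.
`ToyMTCtx`, the `pausedAtSyncPoint` flag, and the `toy*` helpers are
hypothetical names for illustration only; the only part taken from the
commit message is the table-full condition.

```c
/* Hypothetical, simplified stand-in for the zstdmt context. */
typedef struct {
    unsigned nextJobID;     /* ID of the next job to create */
    unsigned doneJobID;     /* ID of the oldest job not yet completed */
    unsigned jobIDMask;     /* table size - 1 (table size is a power of 2) */
    int pausedAtSyncPoint;  /* assumption: remembers a pending sync point */
} ToyMTCtx;

/* Same condition as quoted in the commit message. */
static int toyJobTableFull(const ToyMTCtx* ctx)
{
    return ctx->nextJobID > ctx->doneJobID + ctx->jobIDMask;
}

/* Called once a sync point has been found. Returns 1 if the job was
 * created, 0 if the table is full and we stay paused at the sync point. */
static int toyTryCreateJob(ToyMTCtx* ctx)
{
    if (toyJobTableFull(ctx)) {
        ctx->pausedAtSyncPoint = 1;
        return 0;
    }
    ctx->nextJobID++;
    ctx->pausedAtSyncPoint = 0;
    return 1;
}

/* While paused at a sync point, no more input may be loaded. Otherwise the
 * job boundary would depend on when the table happened to have room. */
static int toyMayLoadInput(const ToyMTCtx* ctx)
{
    return !ctx->pausedAtSyncPoint;
}
```

With `jobIDMask == 3`, four jobs fit in the table. A fifth call to
`toyTryCreateJob()` fails and `toyMayLoadInput()` returns 0 until
`doneJobID` advances and the job can be created.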
Nick Terrell [Tue, 6 Oct 2020 00:37:19 +0000 (17:37 -0700)]
[zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed, an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.
The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
Nick Terrell [Thu, 1 Oct 2020 22:02:15 +0000 (15:02 -0700)]
[zstreamtest] Add compression determinism tests
* Run compression twice and check the compressed data is byte-identical.
The compression loop had to be rewritten to ensure determinism. It is
guaranteed by always making maximal forward progress.
* When nbWorkers > 0, change the number of workers 1/8 of the time.
* Run in single-pass mode 1/4 of the time.
I've run a few hundred thousand iterations of zstreamtest and have seen
no determinism issues so far. Before the zstdmt fix that skips the
single-pass shortcut, non-determinism showed up within a few hundred
iterations.
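
A minimal sketch of the "compress twice and compare" check, assuming a
generic compressor callback. `toy_compress_fn`, `toyRleCompress()`, and
`toyCompressesDeterministically()` are hypothetical names, and the toy
run-length encoder only stands in for a real compressor; this is not the
zstreamtest code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compressor signature: returns the compressed size,
 * or 0 if dstCapacity is too small. */
typedef size_t (*toy_compress_fn)(unsigned char* dst, size_t dstCapacity,
                                  const unsigned char* src, size_t srcSize);

/* Toy run-length encoder emitting (runLength, byte) pairs. */
static size_t toyRleCompress(unsigned char* dst, size_t dstCapacity,
                             const unsigned char* src, size_t srcSize)
{
    size_t ip = 0, op = 0;
    while (ip < srcSize) {
        unsigned char const byte = src[ip];
        size_t run = 1;
        while (ip + run < srcSize && src[ip + run] == byte && run < 255) run++;
        if (op + 2 > dstCapacity) return 0;
        dst[op++] = (unsigned char)run;
        dst[op++] = byte;
        ip += run;
    }
    return op;
}

/* Run the compressor twice on the same input and require that both the
 * sizes and the bytes match. Returns 1 if they do, 0 otherwise. */
static int toyCompressesDeterministically(toy_compress_fn compress,
                                          const unsigned char* src, size_t srcSize,
                                          unsigned char* out1, unsigned char* out2,
                                          size_t capacity)
{
    size_t const size1 = compress(out1, capacity, src, srcSize);
    size_t const size2 = compress(out2, capacity, src, srcSize);
    if (size1 != size2) return 0;
    return memcmp(out1, out2, size1) == 0;
}
```

The same pattern applies when the two runs deliberately differ in
something that should not affect the output, such as the number of
workers.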
Nick Terrell [Fri, 2 Oct 2020 01:47:54 +0000 (18:47 -0700)]
[zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
Nick Terrell [Thu, 1 Oct 2020 21:29:13 +0000 (14:29 -0700)]
[zstdmt] Remove single-pass shortcut
Simplifies the code and removes blocking from zstdmt.
At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However, I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.
Nick Terrell [Thu, 24 Sep 2020 23:04:21 +0000 (16:04 -0700)]
Allow user to override ASAN/MSAN detection
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
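
A minimal sketch of the 0/1 pattern, assuming detection via
`__has_feature` or `__SANITIZE_ADDRESS__`; the exact detection logic in
the library may differ, and `toyAsanEnabled()` is a hypothetical helper.
A user can pre-define the macro (e.g. `-DZSTD_ADDRESS_SANITIZER=0`) to
bypass detection.

```c
/* Only auto-detect when the user has not already chosen a value. */
#ifndef ZSTD_ADDRESS_SANITIZER
#  if defined(__has_feature)
#    if __has_feature(address_sanitizer)
#      define ZSTD_ADDRESS_SANITIZER 1
#    endif
#  elif defined(__SANITIZE_ADDRESS__)
#    define ZSTD_ADDRESS_SANITIZER 1
#  endif
#  ifndef ZSTD_ADDRESS_SANITIZER
#    define ZSTD_ADDRESS_SANITIZER 0
#  endif
#endif

/* Because the macro is always defined to 0 or 1, code tests its value
 * with #if rather than its existence with #ifdef. */
static int toyAsanEnabled(void)
{
#if ZSTD_ADDRESS_SANITIZER
    return 1;
#else
    return 0;
#endif
}
```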
Nick Terrell [Thu, 24 Sep 2020 03:34:44 +0000 (20:34 -0700)]
[lib] Wrap customMem xor checks in parens for readability
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
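
A minimal sketch of the parenthesized xor check, assuming a struct shaped
like a custom allocator pair. `toy_customMem`, `toyCustomMemIsValid()`,
`toyAlloc()`, and `toyFree()` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void* (*toy_allocFunction)(void* opaque, size_t size);
typedef void  (*toy_freeFunction)(void* opaque, void* address);

/* Hypothetical stand-in for a custom allocator description. */
typedef struct {
    toy_allocFunction customAlloc;
    toy_freeFunction  customFree;
    void*             opaque;
} toy_customMem;

/* Valid only if both callbacks are set or both are NULL.
 * The xor is true when exactly one of the two is NULL. The parentheses
 * do not change the result ('!' already binds tighter than '^'), they
 * just make the grouping explicit. */
static int toyCustomMemIsValid(toy_customMem customMem)
{
    if ((!customMem.customAlloc) ^ (!customMem.customFree)) return 0;
    return 1;
}

static void* toyAlloc(void* opaque, size_t size) { (void)opaque; return malloc(size); }
static void  toyFree(void* opaque, void* address) { (void)opaque; free(address); }
```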
W. Felix Handte [Thu, 17 Sep 2020 16:15:33 +0000 (12:15 -0400)]
Use ZSTD_CCtxParams_init() to Init CCtxParams, not memset()
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
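
A minimal sketch of why a bare `memset()` can drift from a dedicated
initializer, assuming a parameter struct whose defaults are not all zero.
`ToyParams`, its fields, `TOY_DEFAULT_LEVEL`, and `ToyParams_init()` are
hypothetical and make no claim about the actual contents of
`ZSTD_CCtxParams_init()`.

```c
#include <string.h>

#define TOY_DEFAULT_LEVEL 3   /* hypothetical non-zero default */

typedef struct {
    int compressionLevel;
    int contentSizeFlag;
} ToyParams;

/* The single initializer: zero everything, then apply non-zero defaults. */
static void ToyParams_init(ToyParams* params)
{
    memset(params, 0, sizeof(*params));
    params->compressionLevel = TOY_DEFAULT_LEVEL;
    params->contentSizeFlag = 1;
}

/* The duplicated approach: leaves the non-zero defaults unset, and
 * silently falls further behind whenever a new default is added. */
static void ToyParams_zeroOnly(ToyParams* params)
{
    memset(params, 0, sizeof(*params));
}
```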
W. Felix Handte [Tue, 15 Sep 2020 18:06:58 +0000 (14:06 -0400)]
Fall Back if Derived CParams are Incompatible with DDSS; Refactor CDict Creation
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If the resolved cparams are incompatible with DDSS, fall
back.
W. Felix Handte [Fri, 11 Sep 2020 03:35:42 +0000 (23:35 -0400)]
Print More During Fuzzer Test to Avoid CI Killing it Due to Timeout
This is kind of hacky. And maybe this test doesn't need to be permanently as
exhaustive as it is now. But while we're actively developing the DDSS, we
should ensure it's compatible across many different modes.
W. Felix Handte [Fri, 4 Sep 2020 04:11:44 +0000 (00:11 -0400)]
Use All Available Space in the Hash Table to Extend Chain Table Reach
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.