Nick Terrell [Tue, 20 Aug 2019 18:33:33 +0000 (11:33 -0700)]
[fuzz] Improve fuzzer build script and docs
* Remove the `make libFuzzer` target since it is broken and obsoleted
by `CC=clang CXX=clang++ ./fuzz.py build all --enable-fuzzer`. The
new `-fsanitize=fuzzer` is much better because it works with MSAN
by default.
* Improve the `./fuzz.py gen` command by making the input type explicit
when creating a new target.
* Update the `README` for `--enable-fuzzer`.
Yann Collet [Thu, 15 Aug 2019 14:41:34 +0000 (16:41 +0200)]
fixed very minor inefficiency (nbSeq==127)
The nbSeq "short" format (1-byte)
is compatible with any value < 128.
However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.
Yann Collet [Fri, 2 Aug 2019 15:34:53 +0000 (17:34 +0200)]
fixed datagen
to produce same content on both 32 and 64-bit platforms
by removing floating from literal table determination.
also : added checksum trace in compression control test,
so that it's easier to determine if test fails
as a consequence of compressing a different sample.
Nick Terrell [Mon, 22 Jul 2019 20:05:09 +0000 (13:05 -0700)]
[legacy] Fix bug in zstd-0.5 decoder
The match length and literal length extra bytes could either
by 2 bytes or 3 bytes in version 0.5. All earlier verions were
always 3 bytes, and later version didn't have dumps.
The bug, introduced by commit 0fd322f812211e653a83492c0c114b933f8b6bc5,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.
I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.
Qin Li [Thu, 18 Jul 2019 18:44:59 +0000 (11:44 -0700)]
fix compiling errors with clang-8
Compiling with clang-8 fails with the following errors:
largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
U64 const dTime_ns = result.nanoSecPerRun;
~~~~~~~~ ~~~~~~~^~~~~~~~~~~~~
zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 1 means that cover parameters were correct
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 0 in case of malformed parameters
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
W. Felix Handte [Wed, 17 Jul 2019 21:30:09 +0000 (17:30 -0400)]
[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content
This changes the size limit on compressed blocks to match those of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.
Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as an `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.
Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code:
Nick Terrell [Tue, 2 Jul 2019 22:45:47 +0000 (15:45 -0700)]
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.