Nick Terrell [Tue, 27 Aug 2019 01:19:29 +0000 (18:19 -0700)]
[dictBuilder] Fix deadlock in *COVER error case
The COVER and FASTCOVER dictionary builders can deadlock when
dictionary construction fails, typically because there are too few
samples or too few distinct dmers. The deadlock only occurs on the
error path.
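For context, a minimal sketch of the error path in question, assuming
caller-side names like `trainCoverDict` (before this fix, a failure
inside the builder could leave its worker threads deadlocked instead
of returning an error code):

```c
#define ZDICT_STATIC_LINKING_ONLY  /* ZDICT_trainFromBuffer_cover is an advanced API */
#include <stdio.h>
#include <string.h>
#include "zdict.h"

size_t trainCoverDict(void* dictBuffer, size_t dictCapacity,
                      const void* samples, const size_t* sampleSizes,
                      unsigned nbSamples)
{
    ZDICT_cover_params_t params;
    memset(&params, 0, sizeof(params));
    params.k = 200;  /* segment size; values chosen for illustration */
    params.d = 8;    /* dmer size */
    {   size_t const dictSize = ZDICT_trainFromBuffer_cover(
                dictBuffer, dictCapacity,
                samples, sampleSizes, nbSamples, params);
        /* too few samples or too few distinct dmers surfaces here */
        if (ZDICT_isError(dictSize)) {
            fprintf(stderr, "COVER training failed: %s\n",
                    ZDICT_getErrorName(dictSize));
            return 0;
        }
        return dictSize;
    }
}
```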
Nick Terrell [Tue, 20 Aug 2019 18:33:33 +0000 (11:33 -0700)]
[fuzz] Improve fuzzer build script and docs
* Remove the `make libFuzzer` target since it is broken and obsoleted
by `CC=clang CXX=clang++ ./fuzz.py build all --enable-fuzzer`. The
new `-fsanitize=fuzzer` is much better because it works with MSAN
by default (a minimal target sketch follows this list).
* Improve the `./fuzz.py gen` command by making the input type explicit
when creating a new target.
* Update the `README` for `--enable-fuzzer`.
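For reference, `-fsanitize=fuzzer` builds targets against libFuzzer's
standard entry point. A minimal sketch of such a target (the
decompression body is illustrative, not the repo's actual fuzzer
code):

```c
#include <stdint.h>
#include <stddef.h>
#include "zstd.h"

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    static char out[1 << 16];
    /* Arbitrary input must produce an error code, never a crash. */
    size_t const ret = ZSTD_decompress(out, sizeof(out), data, size);
    (void)ZSTD_isError(ret);
    return 0;
}
```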
Yann Collet [Thu, 15 Aug 2019 14:41:34 +0000 (16:41 +0200)]
fixed very minor inefficiency (nbSeq==127)
The nbSeq "short" format (1-byte)
is compatible with any value < 128.
However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.
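A sketch of the encoder-side choice this commit touches, simplified
from the format specification (`op` points into the output buffer;
the helper name is illustrative):

```c
#include <stddef.h>

typedef unsigned char BYTE;

static size_t writeNbSeqHeader(BYTE* op, size_t nbSeq)
{
    if (nbSeq < 128) {     /* 1-byte short format: was used only for nbSeq < 127 */
        op[0] = (BYTE)nbSeq;
        return 1;
    }
    if (nbSeq < 0x7F00) {  /* 2-byte format: first byte in 128..254 */
        op[0] = (BYTE)((nbSeq >> 8) + 0x80);
        op[1] = (BYTE)nbSeq;
        return 2;
    }
    op[0] = 0xFF;          /* 3-byte format for nbSeq >= 0x7F00 */
    op[1] = (BYTE)((nbSeq - 0x7F00) & 0xFF);
    op[2] = (BYTE)((nbSeq - 0x7F00) >> 8);
    return 3;
}
```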
Yann Collet [Fri, 2 Aug 2019 15:34:53 +0000 (17:34 +0200)]
fixed datagen
to produce the same content on both 32- and 64-bit platforms,
by removing floating-point arithmetic from the literal table
determination.
Also: added a checksum trace in the compression control test,
so that it's easier to determine whether a test fails
as a consequence of compressing a different sample.
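An illustrative sketch of the idea (not the actual datagen code):
scale symbol frequencies with integer arithmetic only, so the literal
table comes out bit-identical everywhere, instead of multiplying by a
`double` whose rounding can differ between 32- and 64-bit builds:

```c
#include <stddef.h>

/* freq256[s] is symbol s's share of the table, in 1/256ths. */
static void fillLitTable(unsigned char* table, size_t tableSize,
                         const unsigned* freq256, unsigned nbSymbols)
{
    size_t pos = 0;
    unsigned s;
    for (s = 0; s < nbSymbols; s++) {
        /* integer scaling replaces `(size_t)(tableSize * (double)p)` */
        size_t slots = (tableSize * freq256[s]) >> 8;
        while (slots > 0 && pos < tableSize) {
            table[pos++] = (unsigned char)s;
            slots--;
        }
    }
    while (pos < tableSize) table[pos++] = 0;  /* deterministic padding */
}
```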
Nick Terrell [Mon, 22 Jul 2019 20:05:09 +0000 (13:05 -0700)]
[legacy] Fix bug in zstd-0.5 decoder
The match length and literal length extra bytes could be either
2 bytes or 3 bytes in version 0.5. All earlier versions always
used 3 bytes, and later versions didn't have dumps.
The bug, introduced by commit 0fd322f812211e653a83492c0c114b933f8b6bc5,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.
I've tested this fix with every zstd version < 1.0.0 on the buggy
file, and it now always decompresses successfully with the right
checksum.
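To illustrate the pitfall (the field layout below is schematic, not
the exact zstd-0.5 bitstream, and the helper is hypothetical): the
buggy check demanded 3 readable bytes for every dump, so a valid
2-byte dump at the very end of the dumps area was misreported as an
over-read.

```c
#include <stddef.h>

/* Returns the number of bytes consumed, or -1 on a genuine over-read. */
static int readDump(const unsigned char* dumps,
                    const unsigned char* dumpsEnd, size_t* value)
{
    if (dumps + 2 > dumpsEnd) return -1;  /* not even a 2-byte dump fits */
    if (dumps[0] != 0xFF) {               /* schematic 2-byte dump */
        *value = ((size_t)dumps[0] << 8) + dumps[1];
        return 2;
    }
    if (dumps + 3 > dumpsEnd) return -1;  /* only now is a 3rd byte needed */
    *value = ((size_t)dumps[1] << 8) + dumps[2];
    return 3;
}
```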
Qin Li [Thu, 18 Jul 2019 18:44:59 +0000 (11:44 -0700)]
fix compile errors with clang-8
Compiling with clang-8 fails with the following errors:
largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
U64 const dTime_ns = result.nanoSecPerRun;
~~~~~~~~ ~~~~~~~^~~~~~~~~~~~~
zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 1 means that cover parameters were correct
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 0 in case of malformed parameters
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
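A sketch of the fixes the two diagnostics call for (type shapes and
the documented declaration are assumed for illustration):

```c
typedef unsigned long long U64;                          /* as in mem.h */
typedef struct { double nanoSecPerRun; } BMK_runTime_t;  /* shape assumed */

static U64 resultNanoseconds(BMK_runTime_t result)
{
    /* -Wfloat-conversion: make the float -> integer narrowing explicit */
    U64 const dTime_ns = (U64)result.nanoSecPerRun;
    return dTime_ns;
}

/* -Wdocumentation: a comment using @return must be attached to a
 * function declaration (the name below is a stand-in): */
/**
 * @return 1 means that cover parameters were correct
 * @return 0 in case of malformed parameters
 */
unsigned parseCoverParameters(const char* stringPtr, void* params);
```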
W. Felix Handte [Wed, 17 Jul 2019 21:30:09 +0000 (17:30 -0400)]
[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content
This changes the size limit on compressed blocks to match those of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.
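Concretely, the bound as now specified, sketched from the decoder's
point of view (`ZSTD_BLOCKSIZE_MAX` is the 128 KB constant from
`zstd.h`; the helper name is illustrative):

```c
#include <stddef.h>

#define ZSTD_BLOCKSIZE_MAX (128 * 1024)  /* 128 KB, matching zstd.h */

/* A block of any type may decode to at most this many bytes; the
 * compressed size of a Compressed_Block is no longer separately capped
 * by the size of the content it represents. */
static size_t blockMaximumDecompressedSize(size_t windowSize)
{
    return (windowSize < ZSTD_BLOCKSIZE_MAX) ? windowSize
                                             : ZSTD_BLOCKSIZE_MAX;
}
```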
Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as a `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.
Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.