fixed timespec_get() initialization bug on some targets
Not sure why, but msan fires an "uninitialized variable" error
even when time gets properly initialized by timespec_get().
Maybe in some cases not all bytes of the structure are initialized?
Or maybe msan fails to detect the initialization?
Either way, pre-initializing the variable before passing it to timespec_get() works.
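A minimal sketch of the workaround (the wrapper name is illustrative; the
zero-initialization is the actual fix):

    #include <time.h>

    static struct timespec getTimeSafe(void)
    {
        /* Pre-initialize every member: msan then sees the value as
         * initialized even if it cannot track timespec_get()'s writes. */
        struct timespec ts = { 0 };
        (void)timespec_get(&ts, TIME_UTC);
        return ts;
    }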
C11 mandates the definition of timespec_get() and TIME_UTC.
However, FreeBSD 11 announces C11 compliance but does not provide timespec_get(),
breaking the link stage for benchfn.
Since it does not provide TIME_UTC either, which is also required by C11,
test this macro: this automatically rules out FreeBSD 11 for this code path
(it will use the backup C90 path instead, based on clock_t).
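A sketch of the resulting gate (only <time.h>, TIME_UTC and timespec_get()
are standard names here; the rest is illustrative):

    #include <time.h>   /* defines TIME_UTC when the C11 clock is present */

    #if defined(TIME_UTC)
        /* C11 path: use timespec_get(&ts, TIME_UTC) */
    #else
        /* backup C90 path, based on clock_t and clock() */
    #endif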
Nick Terrell [Wed, 10 Apr 2019 19:34:21 +0000 (12:34 -0700)]
[libzstd] Fix decompression dictionary bugs and clean up initialization
Bugs:
* `ZSTD_DCtx_refPrefix()` didn't clear the dictionary after the first
use. Fix and add a test case.
* `ZSTD_DCtx_reset()` always cleared the dictionary. Fix and add a test
case.
* After calling `ZSTD_resetDStream()` you could no longer load a
dictionary, since the stage was set to `zdss_loadHeader`. Fix and add
a test case.
Cleanup:
* Make `ZSTD_initDStream*()` and `ZSTD_resetDStream()` wrap the new
advanced API, and add test cases.
* Document the equivalent of these functions in the advanced API
(sketched below) and document the unstable functions as deprecated.
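For reference, `ZSTD_initDStream()` is documented as roughly equivalent to
this sequence in the advanced API:

    ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only);
    ZSTD_DCtx_refDDict(dctx, NULL);   /* clear any previously set dictionary */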
benchfn used to rely on mem.h and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required bringing in all these dependencies.
Now the dependency is reduced to timefn only.
This required creating a separate timefn out of util,
and rewriting benchfn and timefn to no longer need mem.h.
Separating timefn from util has a wide effect across the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
Nick Terrell [Wed, 10 Apr 2019 00:59:27 +0000 (17:59 -0700)]
[libzstd] Fix ZSTD_decompressDCtx() with a dictionary
* `ZSTD_decompressDCtx()` did not use the dictionary loaded by
`ZSTD_DCtx_loadDictionary()` (see the usage sketch below).
* Add a unit test.
* A stacked diff uses `ZSTD_decompressDCtx()` in the
`dictionary_round_trip` and `dictionary_decompress` fuzzers.
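A usage sketch of the fixed path (error-code checks elided; buffers are
assumed to exist):

    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    ZSTD_DCtx_loadDictionary(dctx, dict, dictSize);   /* now honored below */
    size_t const dSize = ZSTD_decompressDCtx(dctx, dst, dstCapacity,
                                             src, srcSize);
    ZSTD_freeDCtx(dctx);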
Nick Terrell [Tue, 9 Apr 2019 23:47:59 +0000 (16:47 -0700)]
[fuzzer] Sometimes fuzz with one less output byte
Zstd compression takes different code paths depending on whether it has
at least `ZSTD_compressBound()` output bytes available or not. Half of
the time, fuzz with `ZSTD_compressBound() - 1` output bytes. Ensure that
we have at least one byte of overhead by disabling either the dictionary
ID or the checksum.
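A sketch of the idea (the helper and its `coinFlip` input are illustrative,
driven by fuzzer entropy):

    #include <zstd.h>

    static size_t pickDstCapacity(size_t srcSize, int coinFlip)
    {
        /* half the time, give one byte less than the worst-case bound to
         * exercise both the roomy and the tight output-buffer paths */
        return ZSTD_compressBound(srcSize) - (coinFlip ? 1 : 0);
    }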
Nick Terrell [Tue, 9 Apr 2019 23:24:17 +0000 (16:24 -0700)]
[libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.
Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()` (a drain loop is sketched below).
* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
`ZSTD_compressBound()` output space.
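With that guarantee, draining a frame becomes a simple loop; a sketch,
assuming `dst` is large enough to hold the whole frame:

    ZSTD_inBuffer in = { src, srcSize, 0 };
    ZSTD_outBuffer out = { dst, dstCapacity, 0 };
    size_t remaining;
    do {
        /* with ZSTD_e_end, every call now makes forward progress;
         * a return value of 0 means the frame is fully flushed */
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    } while (remaining != 0);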
Nick Terrell [Tue, 9 Apr 2019 02:57:41 +0000 (19:57 -0700)]
[libzstd] Don't check the dictID in fuzzing mode
When `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined, don't check
the dictID. This check makes the fuzzers' job harder, and it sits at the
very beginning of decompression.
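A sketch of the pattern (the helper and its names are illustrative, not
the exact zstd internals):

    static int dictIDMismatch(unsigned loadedID, unsigned frameID)
    {
    #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
        (void)loadedID; (void)frameID;
        return 0;   /* skip the early check so fuzzers get past it */
    #else
        return (frameID != 0) && (loadedID != frameID);   /* 1 on mismatch */
    #endif
    }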
Nick Terrell [Fri, 29 Mar 2019 01:04:32 +0000 (19:04 -0600)]
Stabilize advanced API
This commit moves the candidate advanced API to the stable section.
It makes some minor whitespace changes, but it doesn't change any
of the wording of the documentation.
I'll put up a separate PR that tweaks some of the documentation
once this lands, so that it is easier to review.
NOTE: Even though these functions are now in the stable section, they
aren't stable until the next release (in under 1 month). It is possible
that they will change until then.
We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.
I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:
Roughly, the negative levels now scale half as quickly. E.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller. If you don't think this is the right trade-off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.
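For context, negative levels are requested like any other compression
level; a minimal sketch (the level value is arbitrary):

    /* each step below 0 trades ratio for speed; after this change a
     * step costs roughly half as much ratio as it used to */
    size_t const cSize = ZSTD_compress(dst, dstCapacity, src, srcSize, -8);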
Nick Terrell [Tue, 2 Apr 2019 00:25:26 +0000 (17:25 -0700)]
[examples] Update streaming_decompression.c
Update to use the new streaming API. Making progress on Issue #1548.
Tested that it can decompress files produced by `streaming_compression`.
Tested that it can decompress two frames concatenated together.
Tested that it fails on corrupted data.
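The heart of the updated example is a `ZSTD_decompressStream()` loop; a
simplified sketch (error handling and output consumption elided):

    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    ZSTD_inBuffer in = { srcBuffer, srcSize, 0 };
    while (in.pos < in.size) {
        ZSTD_outBuffer out = { dstBuffer, dstCapacity, 0 };
        /* returns 0 at each frame boundary, so concatenated
         * frames are handled by just continuing the loop */
        size_t const ret = ZSTD_decompressStream(dctx, &out, &in);
        (void)ret;   /* check with ZSTD_isError(ret) in real code */
    }
    ZSTD_freeDCtx(dctx);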
Nick Terrell [Tue, 2 Apr 2019 00:51:28 +0000 (17:51 -0700)]
Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must resolve the
actual compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.
Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.
I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
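A usage sketch of the fixed estimator (current parameter-setter names
assumed; error checks elided):

    ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
    ZSTD_CCtxParams_setParameter(params, ZSTD_c_compressionLevel, 19);
    /* the estimate now resolves the real cParams (e.g. the window log)
     * from the level, instead of reading unset raw fields */
    size_t const estimate = ZSTD_estimateCStreamSize_usingCCtxParams(params);
    ZSTD_freeCCtxParams(params);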
Update to use the new streaming API. Making progress on Issue #1548.
Tested that multiple files could be compressed, and that the output
is the same as calling `streaming_compression` multiple times with
the same compression level, and that it can be decompressed.
Nick Terrell [Fri, 22 Mar 2019 19:28:55 +0000 (12:28 -0700)]
[cover] Improvements for small or homogeneous data
* The algorithm would bail as soon as it found one epoch that
contained no new segments. Change it so it now has to fail
>= 10 times in a row (10 for fastcover, 10-100 for cover).
* The algorithm uses the `maxDict` size to decide the epoch size.
When this size is absurdly large, it causes tiny epochs. Lower
bound the epoch size at 10x the segment size (sketched below), and
warn the user that their training set is too small.
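A hedged sketch of the lower bound (names are illustrative, not the exact
cover internals):

    #include <stddef.h>

    static size_t boundedEpochSize(size_t rawEpochSize, size_t segmentSize)
    {
        /* an absurdly large maxDict would otherwise shrink epochs to
         * almost nothing; clamp to 10 segments and let the caller warn
         * that the training set is too small */
        size_t const minEpochSize = 10 * segmentSize;
        return rawEpochSize < minEpochSize ? minEpochSize : rawEpochSize;
    }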
Nick Terrell [Thu, 21 Mar 2019 22:17:41 +0000 (15:17 -0700)]
[lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary, only create the cdict once we've started the
compression job. This allows the user to pass the dictionary before they
set other settings, and is in line with the rest of the API (see the
usage sketch below).
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
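A usage sketch of the now-supported call order (error checks elided):

    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_loadDictionary(cctx, dict, dictSize);    /* dictionary first */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);  /* then params */
    /* the cdict is only created once the compression job starts,
     * so the level set above is applied to it */
    size_t const cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
    ZSTD_freeCCtx(cctx);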