Yann Collet [Wed, 13 Dec 2017 19:48:30 +0000 (11:48 -0800)]
Improved tests
- building cli from /tests preserves potential flags in MOREFLAGS (such as asan/usan)
- MT dictionary tests check for MT capability (MT is not enabled by default for zstd32)
Yann Collet [Thu, 7 Dec 2017 07:52:50 +0000 (02:52 -0500)]
fix #942 : streaming interface does not compress after ZSTD_initCStream()
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.
This patch fixes the issue.
Todo : add a test which traps the issue.
Yann Collet [Mon, 4 Dec 2017 23:57:01 +0000 (15:57 -0800)]
fix #911 : changed detection macro for clock_gettime()
The new macro might be a bit too restrictive.
Systems which do not support new test will simply default to <time.h>'s `clock_t clock()`,
suffering lesser benchmark accuracy.
Should it matter, the detection macro will have to be upgraded.
Yann Collet [Sat, 2 Dec 2017 05:17:09 +0000 (21:17 -0800)]
setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
Yann Collet [Fri, 1 Dec 2017 18:30:53 +0000 (10:30 -0800)]
removed -ftrapv from tests/ debug flags
-ftrapv is apparently buggy for gcc.
versions >= 5 are supposed to work better,
but even then, some complaints say it's still flaky when optimizations are enabled.
I even saw a post saying it only works if one creates its own signal handler,
which would make this flag no longer transparent.
on clang, it seems to work correctly.
But we would need to add a method to selectively add flags depending on compiler.
That's too much troubles for the intended benefit
(just catch integer overflows, which we can also do using ubsan).
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
Yann Collet [Tue, 28 Nov 2017 22:07:03 +0000 (14:07 -0800)]
zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.
Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.
The "proper" formula makes it possible to remove this heuristic.
Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).
It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
Yann Collet [Tue, 28 Nov 2017 20:32:24 +0000 (12:32 -0800)]
removed a bunch of code related to cached literal price
optState was used both to evaluate price
and to cache cost of previously calculated literals.
This created a strong dependency, forcing parser to request cost in a strict order.
This limitation is forbids future parser with skipping capabilities.
After this patch, caching literals price still exists,
but is now explicit, in a stack structure.
W. Felix Handte [Tue, 17 Oct 2017 05:19:29 +0000 (01:19 -0400)]
Fix LZ4 Compression Buffer Overflow
Fixes issue where, when `zstd --format=lz4` is fed an input larger than 128KB,
the read overruns the input buffer. This changes Zstd to use LZ4 with chained
64KB blocks. This is technically a breaking change in that some third party
LZ4 implementations may not support linked blocks. However, progress should not
be allowed to be stopped by such petty concerns as backwards compatibility!
Yann Collet [Sun, 19 Nov 2017 21:39:12 +0000 (13:39 -0800)]
slightly improved ratio at -22
merging of repcode search into btsearch introduced a small compression ratio regressio at max level :
1.3.2 : 52728769
after repMerge patch : 52760789 (+32020)
A few minor changes have produced this difference.
They can be hard to spot.
This patch buys back about half of the difference,
by no longer inserting position at hc3 when a long match is found there.
It feels strangely counter-intuitive, but works :
after this patch : 52742555 (-18234)
Yann Collet [Sat, 18 Nov 2017 23:54:32 +0000 (15:54 -0800)]
bench: slightly adjusted display format
adapt accuracy depending on value.
makes it possible to have higher accuracy for small value,
notably small compression speed.
This capability is expected to be useful while modifying optimal parser.
Yann Collet [Fri, 17 Nov 2017 08:22:55 +0000 (00:22 -0800)]
bench: added cli command `-S` to benchmark multiple files separately
Currently, all files are joined by default,
they are compressed separately but benchmarked together,
providing a single final result.
Benchmarking files separately make it possible to accurately measure difference for each file.
This is expected to be useful while tuning optimal parser.
Yann Collet [Thu, 16 Nov 2017 23:02:28 +0000 (15:02 -0800)]
fixed some complex scenarios
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
Yann Collet [Thu, 16 Nov 2017 20:18:56 +0000 (12:18 -0800)]
Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.
Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.
It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.
Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.
Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.
- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.