]>
git.ipfire.org Git - thirdparty/zstd.git/log
Yann Collet [Fri, 25 May 2018 21:52:21 +0000 (14:52 -0700)]
slightly nudge choices towards less sequences
also slightly improve some strange detrimental corner cases.
Yann Collet [Thu, 24 May 2018 23:21:02 +0000 (16:21 -0700)]
Merge branch 'dev' into staticDictCost
Yann Collet [Thu, 24 May 2018 22:56:09 +0000 (15:56 -0700)]
Merge pull request #1149 from terrelln/fuzz-py
Small fixes to fuzz.py
Nick Terrell [Thu, 24 May 2018 01:46:38 +0000 (18:46 -0700)]
Improve compiler detection to work on Mac
Nick Terrell [Thu, 24 May 2018 01:25:26 +0000 (18:25 -0700)]
Define BIT_DEBUG for --debug
Nick Terrell [Thu, 24 May 2018 01:22:32 +0000 (18:22 -0700)]
Increase the maximum file size
Nick Terrell [Thu, 24 May 2018 01:04:52 +0000 (18:04 -0700)]
Small fixes to fuzz.py
Yann Collet [Thu, 24 May 2018 21:19:30 +0000 (14:19 -0700)]
Merge pull request #1150 from facebook/fracFse
fix corner case when requiring cost of an FSE symbol
Yann Collet [Thu, 24 May 2018 21:09:49 +0000 (14:09 -0700)]
Merge branch 'dev' into fracFse
Yann Collet [Thu, 24 May 2018 20:59:11 +0000 (13:59 -0700)]
fix corner case when requiring cost of an FSE symbol
ensure that, when frequency[symbol]==0,
result is (tableLog + 1) bits
with both upper-bit and fractional-bit estimates.
Also : enable BIT_DEBUG in /tests
Yann Collet [Thu, 24 May 2018 02:32:25 +0000 (19:32 -0700)]
Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict
ZSTD_fast: Support Searching the Dictionary Context In-Place
Nick Terrell [Thu, 24 May 2018 01:02:30 +0000 (18:02 -0700)]
Work around bug in zstd decoder (#1147)
Work around bug in zstd decoder
Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
Yann Collet [Wed, 23 May 2018 23:41:42 +0000 (16:41 -0700)]
Merge pull request #1146 from terrelln/fse-fix
[zstd] Fix decompression edge case
Nick Terrell [Wed, 23 May 2018 21:58:58 +0000 (14:58 -0700)]
Variable declarations
W. Felix Handte [Wed, 23 May 2018 20:00:17 +0000 (16:00 -0400)]
Assert that Dict and Current Window are Adjacent in Index Space
W. Felix Handte [Tue, 22 May 2018 00:12:11 +0000 (20:12 -0400)]
Make loadedDictEnd an Index, not the Dict Len
W. Felix Handte [Mon, 21 May 2018 22:27:08 +0000 (18:27 -0400)]
Fixes in re Comments
W. Felix Handte [Tue, 15 May 2018 21:23:16 +0000 (17:23 -0400)]
Don't Attach Empty Dict Contents
In weird corner cases, they produce unexpected results...
W. Felix Handte [Tue, 15 May 2018 19:45:37 +0000 (15:45 -0400)]
Avoid Undefined Behavior in Match Ptr Calculation
W. Felix Handte [Tue, 15 May 2018 19:41:37 +0000 (15:41 -0400)]
Remove Out-of-Date Comment
W. Felix Handte [Tue, 15 May 2018 17:16:50 +0000 (13:16 -0400)]
Moar Renames
W. Felix Handte [Tue, 15 May 2018 17:13:19 +0000 (13:13 -0400)]
Also Attach Dict When Source Size is Unknown
W. Felix Handte [Tue, 15 May 2018 17:08:03 +0000 (13:08 -0400)]
Clear the Dictionary When Sliding the Window
W. Felix Handte [Tue, 15 May 2018 05:15:33 +0000 (01:15 -0400)]
Refine ip Initialization to Avoid ARM Weirdness
W. Felix Handte [Thu, 10 May 2018 21:18:08 +0000 (17:18 -0400)]
Use New Index Invariant to Simplify Conditionals
W. Felix Handte [Thu, 10 May 2018 21:17:10 +0000 (17:17 -0400)]
Force Working Context Indices Greater than Dict Indices
W. Felix Handte [Thu, 10 May 2018 17:46:19 +0000 (13:46 -0400)]
Whitespace Fix
W. Felix Handte [Wed, 9 May 2018 22:40:23 +0000 (18:40 -0400)]
Switch to Original Match Calc for noDict Repcode Check
W. Felix Handte [Wed, 9 May 2018 19:14:12 +0000 (15:14 -0400)]
Rename 'hasDict' to 'dictMode'
W. Felix Handte [Wed, 9 May 2018 17:14:20 +0000 (13:14 -0400)]
Respond to PR Comments; Formatting/Style/Lint Fixes
W. Felix Handte [Fri, 4 May 2018 17:08:07 +0000 (13:08 -0400)]
Rename and Reformat
W. Felix Handte [Thu, 3 May 2018 02:28:29 +0000 (22:28 -0400)]
Change Cut-Off to 8 KB
W. Felix Handte [Thu, 3 May 2018 00:30:03 +0000 (20:30 -0400)]
Fix Rep Code Initialization
W. Felix Handte [Wed, 2 May 2018 21:34:34 +0000 (17:34 -0400)]
Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff
W. Felix Handte [Wed, 2 May 2018 21:10:51 +0000 (17:10 -0400)]
Split Wrapper Functions to Cause Inlining
W. Felix Handte [Wed, 2 May 2018 19:12:18 +0000 (15:12 -0400)]
Add bounds check in repcode tests
W. Felix Handte [Tue, 1 May 2018 20:21:18 +0000 (16:21 -0400)]
Initial Repcode Check Support for Ext Dict Ctx
W. Felix Handte [Sat, 28 Apr 2018 04:42:37 +0000 (00:42 -0400)]
Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx
W. Felix Handte [Fri, 27 Apr 2018 22:46:59 +0000 (18:46 -0400)]
Refer to the Dictionary Match State In-Place (Sometimes)
Nick Terrell [Wed, 23 May 2018 21:47:20 +0000 (14:47 -0700)]
Error if reported size is too large in edge case
Nick Terrell [Wed, 23 May 2018 19:16:00 +0000 (12:16 -0700)]
[zstd] Fix decompression edge case
This edge case is only possible with the new optimal encoding selector,
since before zstd would always choose `set_basic` for small numbers of
sequences.
Fix `FSE_readNCount()` to support buffers < 4 bytes.
Credit to OSS-Fuzz
Yann Collet [Wed, 23 May 2018 02:25:37 +0000 (19:25 -0700)]
Merge pull request #1144 from terrelln/fse-entropy
Approximate FSE encoding costs for selection
Yann Collet [Tue, 22 May 2018 23:21:40 +0000 (16:21 -0700)]
Merge pull request #1145 from terrelln/spec
Clarify what happens when Number_of_Sequences == 0
Nick Terrell [Tue, 22 May 2018 23:12:33 +0000 (16:12 -0700)]
Clarify what happens when Number_of_Sequences == 0
Nick Terrell [Tue, 22 May 2018 23:06:33 +0000 (16:06 -0700)]
Fixes
Yann Collet [Tue, 22 May 2018 22:10:05 +0000 (15:10 -0700)]
Merge branch 'dev' into staticDictCost
Yann Collet [Tue, 22 May 2018 22:06:36 +0000 (15:06 -0700)]
disable 2-passes strategy
Nick Terrell [Mon, 16 Apr 2018 22:37:27 +0000 (15:37 -0700)]
Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.
* The cost of `set_basic` is computed using the cross-entropy cost
function `ZSTD_crossEntropyCost()`, using the normalized default count
and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
`ZSTD_entropyCost()`, together with the cost of writing the normalized
count `ZSTD_NCountCost()`.
Yann Collet [Sat, 19 May 2018 21:40:37 +0000 (14:40 -0700)]
Merge pull request #1143 from facebook/tableLevels
Update table of compression levels
Yann Collet [Sat, 19 May 2018 01:23:40 +0000 (18:23 -0700)]
Merge branch 'tableLevels' of github.com:facebook/zstd into tableLevels
Yann Collet [Sat, 19 May 2018 00:17:45 +0000 (17:17 -0700)]
Merge branch 'dev' into tableLevels
Yann Collet [Sat, 19 May 2018 00:19:13 +0000 (17:19 -0700)]
Merge pull request #1142 from terrelln/better-dict
[cover] Small compression ratio improvement
Yann Collet [Sat, 19 May 2018 00:17:45 +0000 (17:17 -0700)]
Merge branch 'dev' into tableLevels
Yann Collet [Sat, 19 May 2018 00:17:12 +0000 (17:17 -0700)]
updated compression levels for blocks of 256KB
Nick Terrell [Fri, 18 May 2018 22:25:10 +0000 (15:25 -0700)]
[cover] Small compression ratio improvement
The cover algorithm selects one segment per epoch, and it selects the epoch
size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs
gives the algorithm more candidates to choose from for each segment it
selects, and then it will loop back to the first epoch when it hits the
last one.
The trade off is that now it takes longer to select each segment, since it
has to look at more data before making a choice.
I benchmarked on the following data sets using this command:
```sh
$ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR | wc -c
```
| Data set | k (approx) | Before | After | % difference |
|--------------|------------|----------|----------|--------------|
| GitHub | ~1000 | 738138 | 746610 | +1.14% |
| hg-changelog | ~90 |
4295156 |
4285336 | -0.23% |
| hg-commands | ~500 |
1095580 |
1079814 | -1.44% |
| hg-manifest | ~400 |
16559892 |
16504346 | -0.34% |
There is some noise in the measurements, since small changes to `k` can
have large differences, which is why I'm using `steps=256`, to try to
minimize the noise. However, the GitHub data set still has some noise.
If I run the GitHub data set on my Mac, which presumably lists directory
entries in a different order, so the dictionary builder sees the files in
a different order, or I use `steps=1024` I see these results.
| Run | Before | After | % difference |
|------------|--------|--------|--------------|
| steps=1024 | 738138 | 734470 | -0.50% |
| MacBook | 738451 | 737132 | -0.18% |
Question: Should we expose this as a parameter? I don't think it is
necessary. Someone might want to turn it up to exchange a much longer
dictionary building time in exchange for a slightly better dictionary.
I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16`
with a faster running time.
Yann Collet [Fri, 18 May 2018 23:03:06 +0000 (16:03 -0700)]
Merge branch 'dev' into staticDictCost
Yann Collet [Fri, 18 May 2018 21:09:42 +0000 (14:09 -0700)]
adding some debug functions to observe statistics
Yann Collet [Fri, 18 May 2018 20:23:35 +0000 (13:23 -0700)]
Merge pull request #1139 from fbrosson/prefetch
__builtin_prefetch did probably not exist before gcc 3.1.
fbrosson [Fri, 18 May 2018 18:40:11 +0000 (18:40 +0000)]
__builtin_prefetch did probably not exist before gcc 3.1.
Yann Collet [Fri, 18 May 2018 17:32:16 +0000 (10:32 -0700)]
Merge pull request #1140 from fbrosson/cpu-asm
Drop colon in asm snippet to make old versions of gcc happy.
fbrosson [Fri, 18 May 2018 17:05:36 +0000 (17:05 +0000)]
Drop colon in asm snippet to make old versions of gcc happy.
Yann Collet [Fri, 18 May 2018 00:27:27 +0000 (17:27 -0700)]
fixed minor conversion warning
Yann Collet [Thu, 17 May 2018 23:13:53 +0000 (16:13 -0700)]
fixed a pretty complex bug when combining ldm + btultra
Yann Collet [Thu, 17 May 2018 19:19:37 +0000 (12:19 -0700)]
collect statistics for first block in ultra mode
this patch makes btultra do 2 passes on the first block,
the first one being dedicated to collecting statistics
so that the 2nd pass is more accurate.
It translates into a very small compression ratio gain :
enwik7, level 20:
blocks 4K : 2.142 -> 2.153
blocks 16K : 2.447 -> 2.457
blocks 64K : 2.716 -> 2.726
On the other hand, the cpu cost is doubled.
The trade off looks bad.
Though, that's ultimately a price to pay to reach better compression ratio.
So it's only enabled when setting btultra.
Yann Collet [Thu, 17 May 2018 18:19:05 +0000 (11:19 -0700)]
slightly improved weight calculation
translating into a tiny compression ratio improvement
Yann Collet [Wed, 16 May 2018 23:13:37 +0000 (16:13 -0700)]
update table levels for blocks <= 16K
also : allow hlog to be slighly larger than windowlog,
as it's apparently good for both speed and compression ratio.
Yann Collet [Wed, 16 May 2018 21:53:35 +0000 (14:53 -0700)]
introduced bit-fractional cost evaluation
this improves compression ratio by a *tiny* amount.
It also reduces speed by a small amount.
Consequently, bit-fractional evaluation is only turned on for btultra.
Yann Collet [Tue, 15 May 2018 18:02:53 +0000 (11:02 -0700)]
Merge pull request #1135 from facebook/frameCSize
decompress: changed error code when input is too large
Yann Collet [Tue, 15 May 2018 18:02:01 +0000 (11:02 -0700)]
Merge pull request #1136 from terrelln/fix
Fix failing Travis tests
Nick Terrell [Tue, 15 May 2018 16:46:20 +0000 (09:46 -0700)]
Fix failing Travis tests
Yann Collet [Tue, 15 May 2018 01:09:26 +0000 (18:09 -0700)]
Merge branch 'dev' into staticDictCost
Yann Collet [Tue, 15 May 2018 01:04:08 +0000 (18:04 -0700)]
opt: removed static prices
after testing, it's actually always better to use dynamic prices
albeit initialised from dictionary.
Yann Collet [Tue, 15 May 2018 00:45:50 +0000 (17:45 -0700)]
Merge pull request #1127 from facebook/staticDictCost
Improved optimal parser with dictionary
Yann Collet [Mon, 14 May 2018 22:32:28 +0000 (15:32 -0700)]
decompress: changed error code when input is too large
ZSTD_decompress() can decompress multiple frames sent as a single input.
But the input size must be the exact sum of all compressed frames, no more.
In the case of a mistake on srcSize, being larger than required,
ZSTD_decompress() will try to decompress a new frame after current one, and fail.
As a consequence, it will issue an error code, ERROR(prefix_unknown).
While the error is technically correct
(the decoder could not recognise the header of _next_ frame),
it's confusing, as users will believe that the first header of the first frame is wrong,
which is not the case (it's correct).
It makes it more difficult to understand that the error is in the source size, which is too large.
This patch changes the error code provided in such a scenario.
If (at least) a first frame was successfully decoded,
and then following bytes are garbage values,
the decoder assumes the provided input size is wrong (too large),
and issue the error code ERROR(srcSize_wrong).
Yann Collet [Mon, 14 May 2018 18:55:52 +0000 (11:55 -0700)]
Merge branch 'dev' into tableLevels
Yann Collet [Mon, 14 May 2018 18:53:58 +0000 (11:53 -0700)]
Merge pull request #1131 from facebook/zstdcli
minor: control numeric argument overflow
Yann Collet [Mon, 14 May 2018 18:52:53 +0000 (11:52 -0700)]
Merge pull request #1130 from facebook/man
fix #1115
Yann Collet [Mon, 14 May 2018 18:52:41 +0000 (11:52 -0700)]
Merge pull request #1129 from facebook/paramgrill
Paramgrill refactoring
Yann Collet [Mon, 14 May 2018 18:52:05 +0000 (11:52 -0700)]
Merge branch 'dev' into tableLevels
Yann Collet [Mon, 14 May 2018 16:59:43 +0000 (09:59 -0700)]
Merge pull request #1133 from felixhandte/travis-fix
Make Travis CI Run `apt-get update`
W. Felix Handte [Mon, 14 May 2018 15:55:21 +0000 (11:55 -0400)]
Travis CI Runs apt-get Update
Yann Collet [Mon, 14 May 2018 00:25:53 +0000 (17:25 -0700)]
paramgrill: use NB_LEVELS_TRACKED in loop
make it easier to generate/track more levels
than ZSTD_maxClevel()
Yann Collet [Mon, 14 May 2018 00:15:07 +0000 (17:15 -0700)]
update table for 128 KB blocks
Yann Collet [Sun, 13 May 2018 08:53:38 +0000 (01:53 -0700)]
update compression levels for large inputs
Yann Collet [Sat, 12 May 2018 21:29:33 +0000 (14:29 -0700)]
cli: control numeric argument overflow
exit on overflow
backported from paramgrill
added associated test case
Yann Collet [Sat, 12 May 2018 19:34:34 +0000 (12:34 -0700)]
minor : factor out errorOut()
Yann Collet [Sat, 12 May 2018 17:21:30 +0000 (10:21 -0700)]
fix #1115
Yann Collet [Sat, 12 May 2018 16:40:04 +0000 (09:40 -0700)]
paramgrill: subtle change in level spacing
distance between levels is slightly increased
to compensate for level 1 speed improvements
and the will to have stronger level 19
extending the range of speed to cover.
Yann Collet [Sat, 12 May 2018 02:43:08 +0000 (19:43 -0700)]
added programmable constraints
Yann Collet [Sat, 12 May 2018 00:32:26 +0000 (17:32 -0700)]
generalized use of readU32FromChar()
and check input overflow
Yann Collet [Fri, 11 May 2018 22:54:06 +0000 (15:54 -0700)]
replaced FSE_count by FSE_count_simple
to reduce usage of stack memory.
Also : tweaked a few comments, as suggested by @terrelln
Yann Collet [Fri, 11 May 2018 18:47:59 +0000 (11:47 -0700)]
Merge pull request #1128 from facebook/libdir
minor Makefile patch
Yann Collet [Fri, 11 May 2018 18:31:48 +0000 (11:31 -0700)]
minor patch, ensuring LIBDIR is created before installation
follow-up from #1123
Yann Collet [Fri, 11 May 2018 02:48:09 +0000 (19:48 -0700)]
fixed wrong assertion
base can actually overflow
Yann Collet [Fri, 11 May 2018 01:17:21 +0000 (18:17 -0700)]
fixed c++ conversion warning
Yann Collet [Fri, 11 May 2018 01:13:48 +0000 (18:13 -0700)]
minor adjusting of weights
Yann Collet [Fri, 11 May 2018 00:59:12 +0000 (17:59 -0700)]
opt: init statistics from dictionary
instead of starting from fake "default" statistics.
Yann Collet [Thu, 10 May 2018 23:32:36 +0000 (16:32 -0700)]
btopt : minor adjustment of update frequencies
Yann Collet [Thu, 10 May 2018 17:38:19 +0000 (10:38 -0700)]
Merge pull request #1123 from baruchsiach/fix-install-pc
lib/Makefile: create include directory before headers installation
Yann Collet [Wed, 9 May 2018 22:46:11 +0000 (15:46 -0700)]
opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionay)