Yann Collet [Sat, 26 May 2018 00:41:16 +0000 (17:41 -0700)]
changed dynamic fse threshold for offset
recent experiments showed that
the default distribution table for offsets
can become a poor fit fairly quickly as the number of symbols grows,
while it remains a reasonable choice much longer for length symbols.
Changed the formula,
so that the dynamic threshold is now 32 symbols for offsets.
It remains at 64 symbols for lengths.
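A minimal sketch of the resulting rule, assuming a simple per-code-type
threshold check; the names here are illustrative, not zstd's internal code:

```c
/* Below the per-type symbol-count threshold, keep the predefined FSE
 * distribution table; at or above it, build a dynamic one. */
typedef enum { kOffsetCodes, kLengthCodes } codeType_e;

static int useDynamicFSE(codeType_e type, unsigned nbSymbols)
{
    const unsigned threshold = (type == kOffsetCodes) ? 32 : 64;
    return nbSymbols >= threshold;
}
```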
Nick Terrell [Wed, 23 May 2018 19:16:00 +0000 (12:16 -0700)]
[zstd] Fix decompression edge case
This edge case is only possible with the new optimal encoding selector,
since zstd previously always chose `set_basic` for small numbers of
sequences.
Fix `FSE_readNCount()` to support buffers < 4 bytes.
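A hedged sketch of one way to support such buffers: copy them into a
zero-padded scratch buffer, parse that, and reject results that claim to
have consumed more bytes than really existed. The wrapper below is for
illustration only; the actual patch changes `FSE_readNCount()` itself,
and the `(size_t)-1` error sentinel stands in for zstd's internal error
macros.

```c
#include <string.h>
#include "fse.h"   /* FSE_readNCount(), FSE_isError() */

static size_t readNCount_anySize(short* normalizedCounter,
                                 unsigned* maxSVPtr, unsigned* tableLogPtr,
                                 const void* rBuffer, size_t rBuffSize)
{
    if (rBuffSize < 4) {
        char scratch[4] = { 0 };              /* zero padding */
        memcpy(scratch, rBuffer, rBuffSize);
        {   size_t const r = FSE_readNCount(normalizedCounter, maxSVPtr,
                                            tableLogPtr, scratch, sizeof(scratch));
            if (FSE_isError(r)) return r;
            if (r > rBuffSize) return (size_t)-1;  /* header needed bytes we never had */
            return r;
        }
    }
    return FSE_readNCount(normalizedCounter, maxSVPtr, tableLogPtr,
                          rBuffer, rBuffSize);
}
```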
Nick Terrell [Mon, 16 Apr 2018 22:37:27 +0000 (15:37 -0700)]
Approximate FSE encoding costs for selection
Estimate the cost of using the FSE modes `set_basic`, `set_compressed`,
and `set_repeat`, and select the one with the lowest cost (see the sketch
after this list).
* The cost of `set_basic` is computed with the cross-entropy cost
function `ZSTD_crossEntropyCost()`, applied to the normalized default
count and the actual count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
`ZSTD_entropyCost()`, together with the cost of writing the normalized
count `ZSTD_NCountCost()`.
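A hedged sketch of the comparison, using simplified floating-point
stand-ins for zstd's fixed-point internals; `crossEntropyBits()`
approximates the role of `ZSTD_crossEntropyCost()`, and all signatures
here are assumptions:

```c
#include <math.h>

/* Bits needed to encode histogram `count` (symbols 0..maxSymbol) with a
 * table normalized to `norm`, whose entries sum to 1<<accuracyLog. */
static double crossEntropyBits(const short* norm, unsigned accuracyLog,
                               const unsigned* count, unsigned maxSymbol)
{
    const double total = (double)(1u << accuracyLog);
    double bits = 0.0;
    unsigned s;
    for (s = 0; s <= maxSymbol; s++) {
        double p;
        if (count[s] == 0) continue;
        /* a normalized count of -1 marks a low-probability symbol; price it as 1 */
        p = (norm[s] <= 0 ? 1.0 : (double)norm[s]) / total;
        bits += (double)count[s] * -log2(p);
    }
    return bits;
}

typedef enum { set_basic, set_compressed, set_repeat } encType_e;

/* Pick the cheapest mode; set_compressed must also pay for writing its
 * normalized count header (the ZSTD_NCountCost() part). */
static encType_e selectCheapest(double basicBits, double repeatBits,
                                double compressedBits, double ncountBits)
{
    encType_e best = set_basic;
    double bestBits = basicBits;
    if (repeatBits < bestBits) { best = set_repeat; bestBits = repeatBits; }
    if (compressedBits + ncountBits < bestBits) best = set_compressed;
    return best;
}
```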
Nick Terrell [Fri, 18 May 2018 22:25:10 +0000 (15:25 -0700)]
[cover] Small compression ratio improvement
The cover algorithm selects one segment per epoch, and it selects the
epoch size such that `epochs * segmentSize ~= dictSize`. Selecting fewer
epochs gives the algorithm more candidates to choose from for each
segment it selects; it then loops back to the first epoch when it hits
the last one.
The trade-off is that each segment now takes longer to select, since the
algorithm has to look at more data before making a choice.
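A hedged sketch of that schedule, assuming the reduction factor of 4 that
the tests below ended up favoring; the names and structure are
illustrative, not the actual COVER code, and `selectSegment()` is a stub:

```c
#include <stddef.h>

typedef struct { size_t begin, end; } Segment;

/* Stub for COVER's real scoring of the best segment within one epoch's
 * slice of dmers; here it simply returns the whole slice. */
static Segment selectSegment(size_t begin, size_t end)
{
    Segment s; s.begin = begin; s.end = end;
    return s;
}

static size_t buildDict(size_t nbDmers, size_t dictCapacity, size_t k)
{
    size_t epochs = dictCapacity / (k * 4);   /* previously ~dictCapacity / k */
    size_t epochSize, tail = dictCapacity, epoch;
    if (epochs == 0) epochs = 1;
    epochSize = nbDmers / epochs;
    if (epochSize == 0) return 0;             /* not enough data */
    /* one segment per epoch, wrapping back to epoch 0 after the last */
    for (epoch = 0; tail > 0; epoch = (epoch + 1) % epochs) {
        Segment const s = selectSegment(epoch * epochSize, (epoch + 1) * epochSize);
        size_t segSize = s.end - s.begin;
        if (segSize > tail) segSize = tail;
        tail -= segSize;                      /* copying segment bytes omitted */
    }
    return dictCapacity - tail;               /* bytes of dictionary built */
}
```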
I benchmarked on the following data sets using this command:
There is some noise in the measurements, since small changes to `k` can
make a large difference, which is why I'm using `steps=256` to try to
minimize the noise. However, the GitHub data set still has some noise.
If I run the GitHub data set on my Mac, which presumably lists directory
entries in a different order (so the dictionary builder sees the files in
a different order), or if I use `steps=1024`, I see these results:
| Run | Before | After | % difference |
|------------|--------|--------|--------------|
| steps=1024 | 738138 | 734470 | -0.50% |
| MacBook | 738451 | 737132 | -0.18% |
Question: Should we expose this as a parameter? I don't think it is
necessary. Someone might want to turn it up to trade a much longer
dictionary-building time for a slightly better dictionary.
I tested `2`, `4`, and `16`; `4` got most of the benefit of `16`
with a faster running time.
Yann Collet [Mon, 14 May 2018 22:32:28 +0000 (15:32 -0700)]
decompress: changed error code when input is too large
ZSTD_decompress() can decompress multiple frames sent as a single input.
But the input size must be the exact sum of all compressed frames, no more.
If srcSize is mistakenly larger than required,
ZSTD_decompress() will try to decompress another frame after the current one, and fail.
As a consequence, it will issue the error code ERROR(prefix_unknown).
While that error is technically correct
(the decoder could not recognise the header of the _next_ frame),
it's confusing: users will believe the header of the _first_ frame is wrong,
which is not the case (it's correct).
That makes it harder to understand that the error lies in the source size, which is too large.
This patch changes the error code provided in such a scenario.
If (at least) a first frame was successfully decoded,
and the following bytes are garbage values,
the decoder assumes the provided input size is wrong (too large),
and issues the error code ERROR(srcSize_wrong).
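A hedged caller-side illustration of the scenario, using only zstd's
public API; the buffer sizes and the 8 trailing garbage bytes are
arbitrary:

```c
#include <stdio.h>
#include <string.h>
#include <zstd.h>

int main(void)
{
    const char* msg = "hello zstd";
    char cBuf[256], dBuf[256];
    size_t const cSize = ZSTD_compress(cBuf, sizeof(cBuf), msg, strlen(msg), 1);
    if (ZSTD_isError(cSize)) return 1;

    memset(cBuf + cSize, 0xAA, 8);   /* garbage after the only frame */
    {   size_t const r = ZSTD_decompress(dBuf, sizeof(dBuf), cBuf, cSize + 8);
        if (ZSTD_isError(r))
            /* previously reported as prefix_unknown; with this patch,
             * the code is srcSize_wrong */
            printf("error: %s\n", ZSTD_getErrorName(r));
    }
    return 0;
}
```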
Yann Collet [Sat, 12 May 2018 16:40:04 +0000 (09:40 -0700)]
paramgrill: subtle change in level spacing
distance between levels is slightly increased,
to compensate for level 1 speed improvements
and the desire for a stronger level 19,
extending the range of speeds to cover.