git.ipfire.org Git - thirdparty/zstd.git/log
5 years agoFix merge conflicts 1733/head
Nick Magerko [Thu, 22 Aug 2019 18:51:41 +0000 (11:51 -0700)] 
Fix merge conflicts

5 years agoMerge pull request #1726 from nmagerko/stream-size
Nick Terrell [Thu, 22 Aug 2019 18:31:15 +0000 (11:31 -0700)] 
Merge pull request #1726 from nmagerko/stream-size

Add --stream-size=# option

5 years agoDifferentiate --stream-size from --size-hint 1726/head
Nick Magerko [Thu, 22 Aug 2019 16:37:47 +0000 (09:37 -0700)] 
Differentiate --stream-size from --size-hint

5 years agoMinor documentation update
Nick Magerko [Thu, 22 Aug 2019 16:13:28 +0000 (09:13 -0700)] 
Minor documentation update

5 years agoRemove bc from play tests
Nick Magerko [Wed, 21 Aug 2019 17:27:54 +0000 (10:27 -0700)] 
Remove bc from play tests

5 years agoMerge pull request #1737 from terrelln/legacy-fix
Nick Terrell [Wed, 21 Aug 2019 17:10:24 +0000 (10:10 -0700)] 
Merge pull request #1737 from terrelln/legacy-fix

[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression

5 years agoMerge pull request #1736 from terrelln/fuzz-fix
Nick Terrell [Wed, 21 Aug 2019 17:09:38 +0000 (10:09 -0700)] 
Merge pull request #1736 from terrelln/fuzz-fix

[fuzz] Improve fuzzer build script and docs

5 years agoMerge pull request #1724 from facebook/blockSize
Yann Collet [Wed, 21 Aug 2019 12:19:43 +0000 (05:19 -0700)] 
Merge pull request #1724 from facebook/blockSize

clarifications on field `Block_Size`

5 years agoMerge pull request #1725 from emaste/dev
Yann Collet [Wed, 21 Aug 2019 12:19:30 +0000 (05:19 -0700)] 
Merge pull request #1725 from emaste/dev

remove extraneous doubled ;s

5 years agoMerge pull request #1721 from facebook/seq127
Yann Collet [Wed, 21 Aug 2019 12:19:12 +0000 (05:19 -0700)] 
Merge pull request #1721 from facebook/seq127

fixed very minor inefficiency (nbSeq==127)

5 years ago[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression 1737/head
Nick Terrell [Wed, 21 Aug 2019 00:13:04 +0000 (17:13 -0700)] 
[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression

Extends the fix in PR#1722 to v0.2 and v0.4. These aren't built into
zstd by default, and v0.5 onward are not affected.

I only add the `srcSize > BLOCKSIZE` check to v0.4 because the comments
say that it must hold, but the equivalent comment isn't present in v0.2.

Credit to OSS-Fuzz.
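
The kind of guard described in this commit can be sketched as follows. This is a minimal illustration, not the legacy decoder's code: `copy_raw_literals`, `SKETCH_BLOCKSIZE`, and the `-1` error convention are hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size limit (128 KB); the legacy decoders define
 * their own BLOCKSIZE constant. */
#define SKETCH_BLOCKSIZE ((size_t)128 * 1024)

/* Copies litSize raw literal bytes from src to dst.
 * Returns 0 on success, or -1 when the sizes are inconsistent. */
static int copy_raw_literals(void* dst, size_t dstCapacity,
                             const void* src, size_t srcSize,
                             size_t litSize)
{
    assert(dst != NULL && src != NULL);
    if (srcSize > SKETCH_BLOCKSIZE) return -1; /* block larger than allowed */
    if (litSize > srcSize) return -1;          /* would read past the input */
    if (litSize > dstCapacity) return -1;      /* would overflow the output */
    memcpy(dst, src, litSize);
    return 0;
}
```

The point of the sketch is only that every length taken from untrusted input is compared against both buffers before the copy happens.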

5 years ago[fuzz] Improve fuzzer build script and docs 1736/head
Nick Terrell [Tue, 20 Aug 2019 18:33:33 +0000 (11:33 -0700)] 
[fuzz] Improve fuzzer build script and docs

* Remove the `make libFuzzer` target since it is broken and obsoleted
  by `CC=clang CXX=clang++ ./fuzz.py build all --enable-fuzzer`. The
  new `-fsanitize=fuzzer` is much better because it works with MSAN
  by default.
* Improve the `./fuzz.py gen` command by making the input type explicit
  when creating a new target.
* Update the `README` for `--enable-fuzzer`.

Fixes #1727.

5 years agoDocument --size-hint
Nick Magerko [Tue, 20 Aug 2019 21:08:26 +0000 (14:08 -0700)] 
Document --size-hint

5 years agoFix ZSTD_SRCSIZEHINT_MIN typo
Nick Magerko [Tue, 20 Aug 2019 20:07:51 +0000 (13:07 -0700)] 
Fix ZSTD_SRCSIZEHINT_MIN typo

5 years agoDefine ZSTD_SRCSIZEHINT_MIN as 0
Nick Magerko [Tue, 20 Aug 2019 20:06:15 +0000 (13:06 -0700)] 
Define ZSTD_SRCSIZEHINT_MIN as 0

5 years agoRemove unnecessary test case
Nick Magerko [Tue, 20 Aug 2019 00:20:46 +0000 (17:20 -0700)] 
Remove unnecessary test case

5 years agoFix typo in test
Nick Magerko [Mon, 19 Aug 2019 23:53:02 +0000 (16:53 -0700)] 
Fix typo in test

5 years agoRevert change to zstd manual
Nick Magerko [Mon, 19 Aug 2019 23:50:26 +0000 (16:50 -0700)] 
Revert change to zstd manual

5 years agoUse int for srcSizeHint when sensible
Nick Magerko [Mon, 19 Aug 2019 23:49:25 +0000 (16:49 -0700)] 
Use int for srcSizeHint when sensible

5 years agoFix playTests and add additional cases
Nick Magerko [Mon, 19 Aug 2019 23:48:35 +0000 (16:48 -0700)] 
Fix playTests and add additional cases

5 years agoAdd size-hint to fuzz tests
Nick Magerko [Mon, 19 Aug 2019 22:12:24 +0000 (15:12 -0700)] 
Add size-hint to fuzz tests

5 years agoAdd mention of regression with poor size hints
Nick Magerko [Mon, 19 Aug 2019 20:08:41 +0000 (13:08 -0700)] 
Add mention of regression with poor size hints

5 years agoMake upper bound INT_MAX
Nick Magerko [Mon, 19 Aug 2019 19:58:54 +0000 (12:58 -0700)] 
Make upper bound INT_MAX

5 years agoFix fall-through case
Nick Magerko [Mon, 19 Aug 2019 19:32:43 +0000 (12:32 -0700)] 
Fix fall-through case

5 years agoAdd --size-hint=# option
Nick Magerko [Mon, 19 Aug 2019 15:52:08 +0000 (08:52 -0700)] 
Add --size-hint=# option

5 years agoKeep content size flag set in stream size mode
Nick Magerko [Mon, 19 Aug 2019 18:20:28 +0000 (11:20 -0700)] 
Keep content size flag set in stream size mode

5 years agoRemove extraneous variables
Nick Magerko [Mon, 19 Aug 2019 18:14:56 +0000 (11:14 -0700)] 
Remove extraneous variables

5 years agoRemove extraneous parameter
Nick Magerko [Mon, 19 Aug 2019 18:07:43 +0000 (11:07 -0700)] 
Remove extraneous parameter

5 years agoUpdate man page
Nick Magerko [Mon, 19 Aug 2019 16:11:22 +0000 (09:11 -0700)] 
Update man page

5 years agoSet pledged size just before compression
Nick Magerko [Mon, 19 Aug 2019 16:01:31 +0000 (09:01 -0700)] 
Set pledged size just before compression

5 years ago`number` instead of `nb` 1724/head
Yann Collet [Sat, 17 Aug 2019 06:04:42 +0000 (08:04 +0200)] 
`number` instead of `nb`

suggested by @terrelln

5 years agoTweak stdout, stderr redirection in new playTests
Nick Magerko [Fri, 16 Aug 2019 19:49:21 +0000 (12:49 -0700)] 
Tweak stdout, stderr redirection in new playTests

5 years agoAdd --stream-size=# command
Nick Magerko [Fri, 16 Aug 2019 06:57:55 +0000 (23:57 -0700)] 
Add --stream-size=# command

5 years agoclarifications on the meaning of field `Block_Size`
Yann Collet [Fri, 16 Aug 2019 13:13:42 +0000 (15:13 +0200)] 
clarifications on the meaning of field `Block_Size`

following comments from Intel's Smita Kumar.

5 years agoremove extraneous doubled ;s 1725/head
Ed Maste [Fri, 16 Aug 2019 01:17:06 +0000 (21:17 -0400)] 
remove extraneous doubled ;s

5 years agoMerge pull request #1722 from felixhandte/legacy-decompression-fix 1730/head
Felix Handte [Thu, 15 Aug 2019 19:55:46 +0000 (15:55 -0400)] 
Merge pull request #1722 from felixhandte/legacy-decompression-fix

Fix Buffer Overflow in Legacy (v0.3) Raw Literals Decompression

5 years agoAdd to CHANGELOG for Upcoming Release 1722/head
W. Felix Handte [Thu, 15 Aug 2019 18:42:38 +0000 (14:42 -0400)] 
Add to CHANGELOG for Upcoming Release

5 years agoFix Buffer Overflow in Legacy (v0.3) Raw Literals Decompression
W. Felix Handte [Thu, 15 Aug 2019 18:24:45 +0000 (14:24 -0400)] 
Fix Buffer Overflow in Legacy (v0.3) Raw Literals Decompression

5 years agofixed very minor inefficiency (nbSeq==127) 1721/head
Yann Collet [Thu, 15 Aug 2019 14:41:34 +0000 (16:41 +0200)] 
fixed very minor inefficiency (nbSeq==127)

The nbSeq "short" format (1-byte)
is compatible with any value < 128.

However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-byte format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.

Spotted by Intel's Smita Kumar.
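
For illustration, the size selection discussed above can be sketched like this. It follows the `Number_of_Sequences` layout as I understand it from the Zstandard format specification (1 byte below 128, 2 bytes below 0x7F00, 3 bytes otherwise); `write_nb_seq` is a hypothetical helper, not the library's function.

```c
#include <assert.h>
#include <stddef.h>

/* Threshold at which the 3-byte form becomes necessary. */
#define LONGNBSEQ 0x7F00u

/* Writes the number of sequences into dst (at least 3 bytes available).
 * Returns the number of bytes written: 1, 2, or 3. */
static size_t write_nb_seq(unsigned char* dst, unsigned nbSeq)
{
    assert(dst != NULL);
    assert(nbSeq <= 0xFFFFu + LONGNBSEQ);
    if (nbSeq < 128) {                 /* short form: any value < 128 fits */
        dst[0] = (unsigned char)nbSeq;
        return 1;
    }
    if (nbSeq < LONGNBSEQ) {           /* general 2-byte form */
        dst[0] = (unsigned char)((nbSeq >> 8) + 0x80);
        dst[1] = (unsigned char)(nbSeq & 0xFF);
        return 2;
    }
    dst[0] = 0xFF;                     /* 3-byte form */
    dst[1] = (unsigned char)((nbSeq - LONGNBSEQ) & 0xFF);
    dst[2] = (unsigned char)((nbSeq - LONGNBSEQ) >> 8);
    return 3;
}
```

With a `< 127` test in the first branch, the value 127 would fall through to the 2-byte form: still decodable, just one byte larger than needed.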

6 years agoMerge pull request #1711 from felixhandte/changelog-v1.4.3
Felix Handte [Tue, 6 Aug 2019 21:02:37 +0000 (17:02 -0400)] 
Merge pull request #1711 from felixhandte/changelog-v1.4.3

Update Changelog for v1.4.3

6 years agoUpdate Changelog for v1.4.3 1711/head
W. Felix Handte [Tue, 6 Aug 2019 17:44:05 +0000 (13:44 -0400)] 
Update Changelog for v1.4.3

6 years agobumped version number
Yann Collet [Mon, 5 Aug 2019 15:17:16 +0000 (17:17 +0200)] 
bumped version number

to v1.4.3

6 years agoMerge pull request #1705 from josepho0918/dev
Yann Collet [Mon, 5 Aug 2019 13:57:28 +0000 (15:57 +0200)] 
Merge pull request #1705 from josepho0918/dev

Add support for IAR C/C++ Compiler for Arm

6 years agoMerge pull request #1706 from LeeYoung624/dev
Yann Collet [Mon, 5 Aug 2019 13:56:50 +0000 (15:56 +0200)] 
Merge pull request #1706 from LeeYoung624/dev

add NULL pointer check in util.c

6 years agoMerge pull request #1709 from facebook/fix1624
Yann Collet [Mon, 5 Aug 2019 13:54:59 +0000 (15:54 +0200)] 
Merge pull request #1709 from facebook/fix1624

Fix compression ratio inefficiency

6 years agofactored the logic selecting lowest match index 1709/head
Yann Collet [Mon, 5 Aug 2019 13:18:43 +0000 (15:18 +0200)] 
factored the logic selecting lowest match index

as suggested by @terrelln

6 years agofix test 122
Yann Collet [Sat, 3 Aug 2019 14:43:34 +0000 (16:43 +0200)] 
fix test 122

it's an unsupported scenario.

6 years agominor test refactoring
Yann Collet [Fri, 2 Aug 2019 17:31:19 +0000 (19:31 +0200)] 
minor test refactoring

just for clarity, for the currently failing unit test

6 years agofixed minor conversion warning in datagen
Yann Collet [Fri, 2 Aug 2019 16:02:54 +0000 (18:02 +0200)] 
fixed minor conversion warning in datagen

6 years agofixed datagen
Yann Collet [Fri, 2 Aug 2019 15:34:53 +0000 (17:34 +0200)] 
fixed datagen

to produce the same content on both 32 and 64-bit platforms
by removing floating-point arithmetic from literal table determination.

also : added checksum trace in compression control test,
so that it's easier to determine if test fails
as a consequence of compressing a different sample.

6 years agoregenerate sample to compress
Yann Collet [Fri, 2 Aug 2019 13:31:00 +0000 (15:31 +0200)] 
regenerate sample to compress

to reduce chances of differences between 32 and 64-bit fuzzer tests

6 years agofixed strategies btopt+
Yann Collet [Fri, 2 Aug 2019 12:42:53 +0000 (14:42 +0200)] 
fixed strategies btopt+

6 years agofixed strategy btlazy2
Yann Collet [Fri, 2 Aug 2019 12:26:26 +0000 (14:26 +0200)] 
fixed strategy btlazy2

6 years agofixed strategies greedy, lazy & lazy2
Yann Collet [Fri, 2 Aug 2019 12:21:39 +0000 (14:21 +0200)] 
fixed strategies greedy, lazy & lazy2

restore dictionary compression ratio

6 years agominor : fixed ptr arithmetic
Yann Collet [Thu, 1 Aug 2019 15:12:26 +0000 (17:12 +0200)] 
minor : fixed ptr arithmetic

invalid on void ptr

6 years agoadded efficiency test
Yann Collet [Thu, 1 Aug 2019 14:59:22 +0000 (16:59 +0200)] 
added efficiency test

to detect gross CR variations after a patch.

Tests normal and dictionary compression.

6 years agofixed compression ratio regression when dictionary-compressing medium-size inputs...
Yann Collet [Thu, 1 Aug 2019 13:58:17 +0000 (15:58 +0200)] 
fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3

6 years agoMerge pull request #1707 from felixhandte/travis-versions-test
Yann Collet [Wed, 31 Jul 2019 11:43:00 +0000 (13:43 +0200)] 
Merge pull request #1707 from felixhandte/travis-versions-test

Run `versionsTest` in CI

6 years agoRun `versionsTest` in CI 1707/head
W. Felix Handte [Wed, 31 Jul 2019 00:11:25 +0000 (20:11 -0400)] 
Run `versionsTest` in CI

6 years agobug fix : NULL pointer 1706/head
LeeYoung624 [Mon, 29 Jul 2019 09:05:50 +0000 (17:05 +0800)] 
bug fix : NULL pointer

6 years agoAdd support for IAR C/C++ Compiler for Arm 1705/head
Joseph Chen [Mon, 29 Jul 2019 07:20:37 +0000 (15:20 +0800)] 
Add support for IAR C/C++ Compiler for Arm

6 years agoMerge pull request #1701 from LeeYoung624/dev 1700/head
Felix Handte [Thu, 25 Jul 2019 15:56:37 +0000 (11:56 -0400)] 
Merge pull request #1701 from LeeYoung624/dev

memory leak fix

6 years agomemory leak fix 1701/head
LeeYoung624 [Thu, 25 Jul 2019 13:07:57 +0000 (21:07 +0800)] 
memory leak fix

6 years agoMerge pull request #1699 from felixhandte/seekable-gitignore
Felix Handte [Wed, 24 Jul 2019 23:07:55 +0000 (19:07 -0400)] 
Merge pull request #1699 from felixhandte/seekable-gitignore

Add New Seekable Compression Example to .gitignore

6 years agoupdated man page
Yann Collet [Wed, 24 Jul 2019 23:04:37 +0000 (16:04 -0700)] 
updated man page

6 years agoMerge pull request #1698 from felixhandte/bump-version-to-1.4.2
Yann Collet [Wed, 24 Jul 2019 23:03:01 +0000 (16:03 -0700)] 
Merge pull request #1698 from felixhandte/bump-version-to-1.4.2

Bump Library Version Number to 1.4.2

6 years agoMerge pull request #1690 from piguin/dev
Yann Collet [Wed, 24 Jul 2019 22:37:05 +0000 (15:37 -0700)] 
Merge pull request #1690 from piguin/dev

fix compiling errors with clang-8

6 years agoMerge pull request #1697 from Tyler-Tran/dev
Yann Collet [Wed, 24 Jul 2019 22:35:11 +0000 (15:35 -0700)] 
Merge pull request #1697 from Tyler-Tran/dev

Adding documentation for --shrink flag

6 years agoAdd New Seekable Compression Example to .gitignore 1699/head
W. Felix Handte [Wed, 24 Jul 2019 22:22:20 +0000 (18:22 -0400)] 
Add New Seekable Compression Example to .gitignore

6 years agoUpdate Manual 1698/head
W. Felix Handte [Wed, 24 Jul 2019 22:21:11 +0000 (18:21 -0400)] 
Update Manual

6 years agoUpdate CHANGELOG
W. Felix Handte [Wed, 24 Jul 2019 21:35:52 +0000 (17:35 -0400)] 
Update CHANGELOG

6 years agoBump Library Version Number to 1.4.2
W. Felix Handte [Wed, 24 Jul 2019 21:28:04 +0000 (17:28 -0400)] 
Bump Library Version Number to 1.4.2

6 years agoprevious commit did not undo all changes 1697/head
Tyler Tran [Wed, 24 Jul 2019 20:53:50 +0000 (13:53 -0700)] 
previous commit did not undo all changes

6 years agoremoving changes to zstd.1
Tyler Tran [Wed, 24 Jul 2019 20:52:34 +0000 (13:52 -0700)] 
removing changes to zstd.1

6 years agomodifying minor nit
Tyler Tran [Mon, 22 Jul 2019 23:36:44 +0000 (16:36 -0700)] 
modifying minor nit

6 years agoAdding documentation for shrink flag PR #1656
Tyler Tran [Mon, 22 Jul 2019 23:33:22 +0000 (16:33 -0700)] 
Adding documentation for shrink flag PR #1656

6 years agoMerge pull request #1695 from iburinoc/seekable-buff
Yann Collet [Mon, 22 Jul 2019 22:34:32 +0000 (15:34 -0700)] 
Merge pull request #1695 from iburinoc/seekable-buff

Fix seekable decompression in-memory api

6 years agoMerge pull request #1696 from terrelln/legacy-fix
Nick Terrell [Mon, 22 Jul 2019 22:06:18 +0000 (18:06 -0400)] 
Merge pull request #1696 from terrelln/legacy-fix

[legacy] Fix bug in zstd-0.5 decoder

6 years ago[legacy] Fix bug in zstd-0.5 decoder 1696/head
Nick Terrell [Mon, 22 Jul 2019 20:05:09 +0000 (13:05 -0700)] 
[legacy] Fix bug in zstd-0.5 decoder

The match length and literal length extra bytes could either
be 2 bytes or 3 bytes in version 0.5. All earlier versions were
always 3 bytes, and later versions didn't have dumps.

The bug, introduced by commit 0fd322f812211e653a83492c0c114b933f8b6bc5,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.

I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.

Fixes #1693.

6 years agoFix seekable decompression in-memory api 1695/head
Sean Purcell [Mon, 22 Jul 2019 03:22:25 +0000 (23:22 -0400)] 
Fix seekable decompression in-memory api

6 years agoMerge pull request #1679 from ephiepark/dev
Yann Collet [Fri, 19 Jul 2019 22:29:07 +0000 (15:29 -0700)] 
Merge pull request #1679 from ephiepark/dev

Restructure the source files

6 years agoMerge pull request #1685 from vivekmig/dev
Yann Collet [Fri, 19 Jul 2019 22:22:29 +0000 (15:22 -0700)] 
Merge pull request #1685 from vivekmig/dev

Add Check if Block Size Exceeds Maximum

6 years agoMerge pull request #1692 from felixhandte/v1.4.1-changelog 1691/head
Yann Collet [Fri, 19 Jul 2019 16:10:39 +0000 (09:10 -0700)] 
Merge pull request #1692 from felixhandte/v1.4.1-changelog

Update CHANGELOG with v1.4.1 Changes

6 years agoUpdate CHANGELOG with v1.4.1 Changes 1692/head
W. Felix Handte [Fri, 19 Jul 2019 15:18:10 +0000 (11:18 -0400)] 
Update CHANGELOG with v1.4.1 Changes

6 years agofix compiling errors with clang-8 1690/head
Qin Li [Thu, 18 Jul 2019 18:44:59 +0000 (11:44 -0700)] 
fix compiling errors with clang-8

Compiling with clang-8 fails with the following errors:

largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
        U64 const dTime_ns = result.nanoSecPerRun;
                  ~~~~~~~~   ~~~~~~~^~~~~~~~~~~~~

zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
 * @return 1 means that cover parameters were correct
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
 * @return 0 in case of malformed parameters
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
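
A minimal sketch of how a `-Wfloat-conversion` error like the first one is usually silenced: make the conversion explicit. The actual change in the commit is not shown here, so treat this as an assumption; `ns_to_u64` is a hypothetical helper and `U64` is a stand-in for the project's typedef.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;  /* stand-in for the project's U64 typedef */

/* Converts a duration in nanoseconds, stored as a double, to U64.
 * The explicit cast documents that truncation toward zero is intended. */
static U64 ns_to_u64(double nanoSecPerRun)
{
    assert(nanoSecPerRun >= 0.0);
    assert(nanoSecPerRun < 18446744073709551616.0); /* below 2^64 */
    return (U64)nanoSecPerRun;
}
```

The two `-Wdocumentation` errors are a different matter: they concern where a `@return` comment is placed, not any conversion.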

6 years agoFixing decodecorpus test issue 1685/head
Vivek Miglani [Thu, 18 Jul 2019 21:32:09 +0000 (14:32 -0700)] 
Fixing decodecorpus test issue

6 years ago[doc] Bump Format Spec Version
W. Felix Handte [Wed, 17 Jul 2019 21:55:15 +0000 (17:55 -0400)] 
[doc] Bump Format Spec Version

6 years ago[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content
W. Felix Handte [Wed, 17 Jul 2019 21:30:09 +0000 (17:30 -0400)] 
[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content

This changes the size limit on compressed blocks to match those of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.

Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as a `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.

Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.
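
The bound described above can be written down as a short sketch. The function names are hypothetical; only the rule itself (the smaller of `Window_Size` and 128 KB, applied to every block type) comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE_LIMIT ((size_t)128 * 1024)  /* 128 KB */

/* Block_Maximum_Decompressed_Size: the smaller of Window_Size and 128 KB. */
static size_t block_maximum_size(size_t windowSize)
{
    return windowSize < BLOCK_SIZE_LIMIT ? windowSize : BLOCK_SIZE_LIMIT;
}

/* Returns 1 when a block of blockSize bytes respects the bound, else 0.
 * No comparison against the decompressed size is made. */
static int block_size_is_valid(size_t blockSize, size_t windowSize)
{
    assert(windowSize > 0);
    return blockSize <= block_maximum_size(windowSize);
}
```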

6 years agoFixing compressed block size checks
Vivek Miglani [Wed, 17 Jul 2019 19:53:15 +0000 (12:53 -0700)] 
Fixing compressed block size checks

6 years agoRestructure the source files 1679/head 1687/head
Ephraim Park [Wed, 3 Jul 2019 20:40:37 +0000 (13:40 -0700)] 
Restructure the source files

6 years agoMerge pull request #1684 from terrelln/regression
Nick Terrell [Mon, 15 Jul 2019 19:39:52 +0000 (15:39 -0400)] 
Merge pull request #1684 from terrelln/regression

[regression] Update results for ZSTD_double_fast update

6 years agoReturn error if block size exceeds maximum
Vivek Miglani [Mon, 15 Jul 2019 19:10:21 +0000 (12:10 -0700)] 
Return error if block size exceeds maximum

6 years ago[regression] Update results for ZSTD_double_fast update 1684/head
Nick Terrell [Mon, 15 Jul 2019 18:25:22 +0000 (11:25 -0700)] 
[regression] Update results for ZSTD_double_fast update

6 years agoMerge branch 'master' of https://github.com/vivekmig/zstd into dev
Vivek Miglani [Mon, 15 Jul 2019 17:47:09 +0000 (10:47 -0700)] 
Merge branch 'master' of https://github.com/vivekmig/zstd into dev

6 years agoMerge pull request #1681 from facebook/level3
Yann Collet [Fri, 12 Jul 2019 23:16:06 +0000 (16:16 -0700)] 
Merge pull request #1681 from facebook/level3

updated double_fast complementary insertion

6 years ago[ldm] Fix bug in overflow correction with large job size (#1678)
Nick Terrell [Fri, 12 Jul 2019 22:45:18 +0000 (18:45 -0400)] 
[ldm] Fix bug in overflow correction with large job size (#1678)

* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
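
Respecting a job size ceiling amounts to a clamp, sketched below. `SKETCH_JOBSIZE_MAX` and `clamp_job_size` are hypothetical names; the 1 GB figure is the 64-bit value quoted above, and no 32-bit value is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* 1 GB ceiling, standing in for ZSTDMT_JOBSIZE_MAX in 64-bit mode. */
#define SKETCH_JOBSIZE_MAX ((size_t)1 << 30)

/* Returns the requested job size, limited to the ceiling. */
static size_t clamp_job_size(size_t requested)
{
    size_t const clamped =
        requested > SKETCH_JOBSIZE_MAX ? SKETCH_JOBSIZE_MAX : requested;
    assert(clamped <= SKETCH_JOBSIZE_MAX);
    return clamped;
}
```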

6 years agoupdated the _extDict variant of double fast 1681/head
Yann Collet [Fri, 12 Jul 2019 21:17:17 +0000 (14:17 -0700)] 
updated the _extDict variant of double fast

6 years agodouble-fast: changed the trade-off for a smaller positive change
Yann Collet [Fri, 12 Jul 2019 18:34:53 +0000 (11:34 -0700)] 
double-fast: changed the trade-off for a smaller positive change

same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).

6 years agoperf improvements for zstd decode (#1668)
mgrice [Thu, 11 Jul 2019 22:31:07 +0000 (15:31 -0700)] 
perf improvements for zstd decode (#1668)

* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc, I noticed that even though gcc vectorizes the loop in wildcopy, it does not do so as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c that invokes wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off via attribute and flag. I found that neither the attribute nor the command-line flag was entirely successful in turning off vectorization on its own, which is why both are used.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressed(ratio)   encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s          1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s          1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s          1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s          1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s          1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s          1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s           662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s           592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s           549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s           624.2MB/s     6.15%
    2#ooffice    6152192->   3323931(1.851),         155MB/s           564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s           521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s          1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s          1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s          1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s          1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s          1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s            1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
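
The wildcopy idea and the vectorization guard discussed in this commit can be sketched as below. This is a simplified illustration under stated assumptions, not the code from the diff: `wildcopy8` and `SKETCH_DONT_VECTORIZE` are hypothetical names, and the macro applies the attribute only on gcc, since clang also defines `__GNUC__`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ask gcc (but not clang) to skip auto-vectorization of one function. */
#if defined(__GNUC__) && !defined(__clang__)
#  define SKETCH_DONT_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define SKETCH_DONT_VECTORIZE
#endif

/* Copies at least `length` bytes in 8-byte steps.
 * May read and write up to 7 bytes past the requested length, so both
 * buffers need that much slack, and they must not overlap. */
SKETCH_DONT_VECTORIZE
static void wildcopy8(void* dst, const void* src, size_t length)
{
    unsigned char* op = (unsigned char*)dst;
    const unsigned char* ip = (const unsigned char*)src;
    unsigned char* const oend = op + length;
    assert(length > 0);
    do {
        memcpy(op, ip, 8);  /* fixed-size copy, typically a single load/store */
        op += 8;
        ip += 8;
    } while (op < oend);
}
```

Because the loop usually runs once, any setup cost added in front of it (such as a vectorizer's startup check) is paid on nearly every call, which is the situation the commit message describes.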

6 years agoupdated double_fast complementary insertion
Yann Collet [Thu, 11 Jul 2019 22:25:22 +0000 (15:25 -0700)] 
updated double_fast complementary insertion

in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).

More details in the PR.