git.ipfire.org Git - thirdparty/zstd.git/log

Fix merge conflicts

Merge pull request #1726 from nmagerko/stream-size

Add --stream-size=# option

Differentiate --stream-size from --size-hint

Minor documentation update

Remove bc from play tests

Merge pull request #1737 from terrelln/legacy-fix

[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression

Merge pull request #1736 from terrelln/fuzz-fix

[fuzz] Improve fuzzer build script and docs

Merge pull request #1724 from facebook/blockSize

clarifications on field `Block_Size`

Merge pull request #1725 from emaste/dev

remove extraneous doubled ;s

Merge pull request #1721 from facebook/seq127

fixed very minor inefficiency (nbSeq==127)

[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression

Extends the fix in PR#1722 to v0.2 and v0.4. These aren't built into
zstd by default, and v0.5 onward are not affected.

I only add the `srcSize > BLOCKSIZE` check to v0.4 because the comments
say that it must hold, but the equivalent comment isn't present in v0.2.

Credit to OSS-Fuzz.

[fuzz] Improve fuzzer build script and docs

* Remove the `make libFuzzer` target since it is broken and obsoleted
  by `CC=clang CXX=clang++ ./fuzz.py build all --enable-fuzzer`. The
  new `-fsanitize=fuzzer` is much better because it works with MSAN
  by default.
* Improve the `./fuzz.py gen` command by making the input type explicit
  when creating a new target.
* Update the `README` for `--enable-fuzzer`.

Fixes #1727.

Document --size-hint

Fix ZSTD_SRCSIZEHINT_MIN typo

Define ZSTD_SRCSIZEHINT_MIN as 0

Remove unnecessary test case

Fix typo in test

Revert change to zstd manual

Use int for srcSizeHint when sensible

Fix playTests and add additional cases

Add size-hint to fuzz tests

Add mention of regression with poor size hints

Make upper bound INT_MAX

Fix fall-through case

Add --size-hint=# option

Keep content size flag set in stream size mode

Remove extraneous variables

Remove extraneous parameter

Update man page

Set pledged size just before compression

`number` instead of `nb`

suggested by @terrelln

Tweak stdout, stderr redirection in new playTests

Add --stream-size=# command

clarifications on the meaning of field `Block_Size`

following comments from Intel's Smita Kumar.

remove extraneous doubled ;s

Merge pull request #1722 from felixhandte/legacy-decompression-fix

Fix Buffer Overflow in Legacy (v0.3) Raw Literals Decompression

Add to CHANGELOG for Upcoming Release

Fix Buffer Overflow in Legacy (v0.3) Raw Literals Decompression

fixed very minor inefficiency (nbSeq==127)

The nbSeq "short" format (1-byte)
is compatible with any value < 128.

However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.

Spotted by Intel's Smita Kumar.

Merge pull request #1711 from felixhandte/changelog-v1.4.3

Update Changelog for v1.4.3

Update Changelog for v1.4.3

bumped version number

to v1.4.3

Merge pull request #1705 from josepho0918/dev

Add support for IAR C/C++ Compiler for Arm

Merge pull request #1706 from LeeYoung624/dev

add NULL pointer check in util.c

Merge pull request #1709 from facebook/fix1624

Fix compression ratio inefficiency

factored the logic selecting lowest match index

as suggested by @terrelln

fix test 122

it's an unsupported scenario.

minor test refactoring

just for clarity, for the currently failing unit test

fixed minor conversion warning in datagen

fixed datagen

to produce same content on both 32 and 64-bit platforms
by removing floating from literal table determination.

also : added checksum trace in compression control test,
so that it's easier to determine if test fails
as a consequence of compressing a different sample.

regenerate sample to compress

to reduce chances of differences between 32 and 64-bit fuzzer tests

fixed strategies btopt+

fixed strategy btlazy2

fixed strategies greedy, lazy & lazy2

restore dictionary compression ratio

minor : fixed ptr arithmetic

invalid on void ptr

added efficiency test

to detect gross CR variations after a patch.

Tests normal and dictionary compression.

fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3

Merge pull request #1707 from felixhandte/travis-versions-test

Run `versionsTest` in CI

Run `versionsTest` in CI

bug fix : NULL pointer

Add support for IAR C/C++ Compiler for Arm

Merge pull request #1701 from LeeYoung624/dev

memory leak fix

memory leak fix

Merge pull request #1699 from felixhandte/seekable-gitignore

Add New Seekable Compression Example to .gitignore

updated man page

Merge pull request #1698 from felixhandte/bump-version-to-1.4.2

Bump Library Version Number to 1.4.2

Merge pull request #1690 from piguin/dev

fix compiling errors with clang-8

Merge pull request #1697 from Tyler-Tran/dev

Adding documentation for --shrink flag

Add New Seekable Compression Example to .gitignore

Update Manual

Update CHANGELOG

Bump Library Version Number to 1.4.2

previous commit did not undo all changes

removing changes to zstd.1

modifying minor nit

Adding documentation for shrink flag PR #1656

Merge pull request #1695 from iburinoc/seekable-buff

Fix seekable decompression in-memory api

Merge pull request #1696 from terrelln/legacy-fix

[legacy] Fix bug in zstd-0.5 decoder

[legacy] Fix bug in zstd-0.5 decoder

The match length and literal length extra bytes could either
by 2 bytes or 3 bytes in version 0.5. All earlier verions were
always 3 bytes, and later version didn't have dumps.

The bug, introduced by commit 0fd322f812211e653a83492c0c114b933f8b6bc5,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.

I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.

Fixes #1693.

Fix seekable decompression in-memory api

Merge pull request #1679 from ephiepark/dev

Restructure the source files

Merge pull request #1685 from vivekmig/dev

Add Check if Block Size Exceeds Maximum

Merge pull request #1692 from felixhandte/v1.4.1-changelog

Update CHANGELOG with v1.4.1 Changes

Update CHANGELOG with v1.4.1 Changes

fix compiling errors with clang-8

Compiling with clang-8 fails with the following errors:

largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
        U64 const dTime_ns = result.nanoSecPerRun;
                  ~~~~~~~~   ~~~~~~~^~~~~~~~~~~~~

zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 1 means that cover parameters were correct
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 0 in case of malformed parameters
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixing decodecorpus test issue

[doc] Bump Format Spec Version

[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content

This changes the size limit on compressed blocks to match those of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.

Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as an `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.

Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.

Fixing compressed block size checks

Restructure the source files

Merge pull request #1684 from terrelln/regression

[regression] Update results for ZSTD_double_fast update

Return error if block size exceeds maximum

[regression] Update results for ZSTD_double_fast update

Merge branch 'master' of https://github.com/vivekmig/zstd into dev

Merge pull request #1681 from facebook/level3

updated double_fast complementary insertion

[ldm] Fix bug in overflow correction with large job size (#1678)

* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.

updated the _extDict variant of double fast

double-fast: changed the trade-off for a smaller positive change

same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).

perf improvements for zstd decode (#1668)

* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
via attribute and flag.  I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressedratio     encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s ,        1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s ,        1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s ,        1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s ,        1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s ,        1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s ,        1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s ,         662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s ,         592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s ,         549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s ,         624.2MB/s     6.15%
    2#ooffice    6152192->   3323931 (1.851),        155MB/s ,         564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s ,         521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s ,        1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s ,        1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s ,        1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s ,        1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s ,        1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s ,          1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* <Replace this line with a title. Use 1 line only, 67 chars or less>

Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase

updated double_fast complementary insertion

in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).

More details in the PR.