GCC 4.x releases older than 4.4 don't understand the "optimize" attribute.
Fix the build on platforms with GCC 4.x < 4.4 by limiting the DONT_VECTORIZE
definition to GCC 5 and greater.
Noticed and patch proposed by Warner Losh <imp@FreeBSD.org>.
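A minimal sketch of such a version guard, assuming DONT_VECTORIZE wraps GCC's
"optimize" attribute as in lib/common/compiler.h (the real macro body may differ):
```c
/* Sketch: use the "optimize" attribute only on GCC 5+; older GCC and
 * other compilers get an empty definition so the build still works. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 5)
#  define DONT_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define DONT_VECTORIZE
#endif
```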
Nick Terrell [Mon, 22 Jul 2019 20:05:09 +0000 (13:05 -0700)]
[legacy] Fix bug in zstd-0.5 decoder
The match length and literal length extra bytes (dumps) could be
either 2 bytes or 3 bytes in version 0.5. All earlier versions always
used 3 bytes, and later versions didn't have dumps.
The bug, introduced by commit 0fd322f812211e653a83492c0c114b933f8b6bc5,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.
I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.
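A hypothetical sketch of the distinction the fix draws; the names and bit
layout are illustrative, not the real zstd_v05.c code:
```c
/* Sketch: a length "dump" is either 2 or 3 bytes. The bug was a bounds
 * check that always required 3 bytes, so a valid 2-byte dump ending
 * exactly at the end of the dump area was rejected as an over-read. */
static int readLengthDump(const unsigned char** dumpsPtr,
                          const unsigned char* dumpsEnd,
                          unsigned* length)
{
    const unsigned char* dumps = *dumpsPtr;
    if (dumps + 2 > dumpsEnd) return -1;      /* need at least 2 bytes */
    if (dumps[0] & 1) {                       /* low bit set: 3-byte dump */
        if (dumps + 3 > dumpsEnd) return -1;
        *length = ((unsigned)dumps[0] | ((unsigned)dumps[1] << 8)
                   | ((unsigned)dumps[2] << 16)) >> 1;
        *dumpsPtr = dumps + 3;
    } else {                                  /* 2-byte dump */
        *length = ((unsigned)dumps[0] | ((unsigned)dumps[1] << 8)) >> 1;
        *dumpsPtr = dumps + 2;
    }
    return 0;
}
```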
Qin Li [Thu, 18 Jul 2019 18:44:59 +0000 (11:44 -0700)]
fix compilation errors with clang-8
Compiling with clang-8 fails with the following errors:
largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
U64 const dTime_ns = result.nanoSecPerRun;
~~~~~~~~ ~~~~~~~^~~~~~~~~~~~~
zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 1 means that cover parameters were correct
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
* @return 0 in case of malformed parameters
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
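The float-conversion error is fixed by making the narrowing explicit; a sketch
of the kind of change involved (abbreviated from largeNbDicts.c):
```c
/* An explicit cast satisfies -Wfloat-conversion. */
U64 const dTime_ns = (U64)result.nanoSecPerRun;
```
The -Wdocumentation errors go away once the `@return` comments are either
attached to a declaration or demoted from Doxygen style (`/** ... */`) to
plain comments (`/* ... */`).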
W. Felix Handte [Wed, 17 Jul 2019 21:30:09 +0000 (17:30 -0400)]
[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content
This changes the size limit on compressed blocks to match that of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.
Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as a `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.
Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c to invoke wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied (left-hand side is the good one, right is with the vectorizer on):
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off
via attribute and flag. I found that neither the attribute nor the command-line flag alone was entirely successful in turning off vectorization, which is why both are used.
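A minimal sketch of the hand-optimized copy loop, assuming a 16-byte granule;
the real ZSTD_wildcopy differs in details such as overlap handling:
```c
#include <stddef.h>
#include <string.h>

#ifndef DONT_VECTORIZE
#  define DONT_VECTORIZE   /* see the version guard sketched earlier */
#endif

/* Sketch: copy 16 bytes per iteration with no vectorizer startup check.
 * Callers guarantee slack in the destination past `length`, so the last
 * iteration may intentionally write up to 15 extra bytes. */
DONT_VECTORIZE
static void wildcopy_sketch(void* dst, const void* src, ptrdiff_t length)
{
    const char* ip = (const char*)src;
    char* op = (char*)dst;
    char* const oend = op + length;
    do {
        memcpy(op, ip, 16);
        op += 16;
        ip += 16;
    } while (op < oend);
}
```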
silesia benchmark data - second triad of each file is with the original code:
Nick Terrell [Tue, 2 Jul 2019 22:45:47 +0000 (15:45 -0700)]
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Let's assert that it doesn't happen.
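A sketch of the invariant, using the `op`/`oend` names from the commit message:
```c
#include <assert.h>
#include <stddef.h>

/* Sketch: computing the remaining destination capacity is only safe
 * while op <= oend; otherwise oend - op underflows to a huge size_t,
 * permitting further writes past the end of the buffer. */
static size_t remainingCapacity(const unsigned char* op,
                                const unsigned char* oend)
{
    assert(op <= oend);   /* the invariant the new assert enforces */
    return (size_t)(oend - op);
}
```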
Yann Collet [Mon, 24 Jun 2019 21:39:29 +0000 (14:39 -0700)]
changed naming to ZSTD_indexTooCloseToMax()
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.
Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
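A sketch of what such a predicate can look like; the types and constants are
illustrative, not the actual definition in zstd_compress.c:
```c
#include <stddef.h>

typedef struct {
    const unsigned char* nextSrc;  /* next position to be loaded */
    const unsigned char* base;     /* index 0 maps to this pointer */
} Window;  /* illustrative stand-in for ZSTD_window_t */

#define CURRENT_MAX     ((size_t)3 << 29)  /* illustrative index ceiling */
#define OVERFLOW_MARGIN ((size_t)1 << 20)  /* illustrative safety margin */

/* The index is "too close to max" when the next index approaches the
 * point where overflow correction becomes necessary. */
static int indexTooCloseToMax(Window w)
{
    return (size_t)(w.nextSrc - w.base) > (CURRENT_MAX - OVERFLOW_MARGIN);
}
```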
Nick Terrell [Fri, 21 Jun 2019 22:39:43 +0000 (15:39 -0700)]
[zstd] Fix data corruption in niche use case
* Extract the overflow correction into a helper function.
* Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time
and overflow correct between each chunk.
Data corruption could happen when all these conditions are true:
* You are using multithreading mode
* Your overlap size is >= 512 MB (implies window size >= 512 MB)
* You are using a strategy >= ZSTD_btlazy
* You are compressing more than 4 GB
The problem is that when loading a large dictionary we don't do
overflow correction. The fix is to load at most 512 MB at a time and,
when needed, do overflow correction before each chunk.
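A sketch of the chunked loading loop described above; the helper names are
hypothetical:
```c
#include <stddef.h>

#define CHUNKSIZE_MAX ((size_t)512 * 1024 * 1024)  /* as in ZSTD_CHUNKSIZE_MAX */

typedef struct MatchState MatchState;  /* opaque, illustrative */
void overflowCorrectIfNeeded(MatchState* ms,
                             const unsigned char* beg,
                             const unsigned char* end);
void loadChunk(MatchState* ms, const unsigned char* src, size_t size);

/* Sketch: feed the dictionary to the match state at most 512 MB at a
 * time, running overflow correction before each chunk if required. */
static void loadLargeDictionary(MatchState* ms,
                                const unsigned char* dict, size_t dictSize)
{
    const unsigned char* ip = dict;
    const unsigned char* const iend = dict + dictSize;
    while (ip < iend) {
        size_t const chunk = (size_t)(iend - ip) < CHUNKSIZE_MAX
                           ? (size_t)(iend - ip) : CHUNKSIZE_MAX;
        overflowCorrectIfNeeded(ms, ip, ip + chunk);  /* hypothetical */
        loadChunk(ms, ip, chunk);                     /* hypothetical */
        ip += chunk;
    }
}
```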
As a principle, static libs should not dllexport methods; that should only be done when building DLLs.
Case in point: when static libs with dllexport directives are linked into DLLs created with a .def file, the VC++ compiler exports the dllexported methods into the DLL, in addition to the exports listed in the .def file. This will result in undesired link dependencies and is not the correct thing to do.
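A sketch of the conventional guard, with macro names modeled on (but not
necessarily identical to) the ones in zstd.h:
```c
/* Sketch: dllexport only when actually building the DLL, dllimport when
 * consuming it, and no decoration at all for static library builds. */
#if defined(_WIN32) && defined(ZSTD_DLL_EXPORT) && (ZSTD_DLL_EXPORT == 1)
#  define ZSTDLIB_API __declspec(dllexport)
#elif defined(_WIN32) && defined(ZSTD_DLL_IMPORT) && (ZSTD_DLL_IMPORT == 1)
#  define ZSTDLIB_API __declspec(dllimport)
#else
#  define ZSTDLIB_API
#endif
```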
Mike Swanson [Sun, 9 Jun 2019 04:54:02 +0000 (21:54 -0700)]
[programs] set chmod 600 after opening destination file
This resolves a race condition where zstd or unzstd may expose read
permissions beyond what the original file allowed. Mode 600 is used
temporarily during the compression and decompression write stage
and the new file inherits the original file’s mode at the end.
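A sketch of the pattern in POSIX terms; the program's actual file handling
differs:
```c
#include <sys/stat.h>
#include <fcntl.h>

/* Sketch: create the destination readable/writable only by the owner,
 * and only widen permissions to the source file's mode once writing is
 * complete, so the output is never more readable than the source. */
static int openPrivateThenInherit(const char* dstName, const char* srcName)
{
    int const fd = open(dstName, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    /* ... write compressed or decompressed data to fd ... */
    {   struct stat st;
        if (stat(srcName, &st) == 0)
            fchmod(fd, st.st_mode & 07777);  /* inherit the original mode */
    }
    return fd;
}
```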
Nick Terrell [Thu, 6 Jun 2019 03:20:55 +0000 (20:20 -0700)]
[libzstd] Optimize ZSTD_insertBt1() for repetitive data
Before this diff, we would skip at most 192 bytes at a time.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.
This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.
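An illustrative sketch of the "take the max" idea, not the real
ZSTD_insertBt1() body:
```c
/* Sketch: keep the long-match skip (previously capped at 192 bytes per
 * call) and also compute a skip over runs of repetitive data, returning
 * whichever lets the caller advance further. Names are illustrative. */
static size_t positionsToSkip(size_t longMatchSkip, size_t repetitionSkip)
{
    return (longMatchSkip > repetitionSkip) ? longMatchSkip : repetitionSkip;
}
```
The /dev/zero benchmark below shows the effect across strategies: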
```
> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35
```
At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes.
In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`.