Nick Terrell [Sat, 15 Apr 2023 00:06:24 +0000 (17:06 -0700)]
Add ZSTD_d_maxBlockSize parameter
Reduces memory when blocks are guaranteed to be smaller than allowed by
the format. This is useful for streaming compression in conjunction with
ZSTD_c_maxBlockSize.
This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming.
Once it is rebased on top of PR #3616 it will save
3 * (formatMaxBlockSize - paramMaxBlockSize).
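As a rough illustration of the arithmetic above, the savings can be sketched as follows. This assumes the format-level maximum block size is 128 KB (the figure quoted elsewhere in this log); `streaming_savings` and `FORMAT_MAX_BLOCK_SIZE` are hypothetical names for illustration, not zstd APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed format-level maximum block size: 128 KB. */
#define FORMAT_MAX_BLOCK_SIZE ((size_t)128 * 1024)

/* Hypothetical helper (not a zstd API): bytes saved when `nbBuffers`
 * block-sized buffers shrink from the format maximum to `paramMaxBlockSize`.
 * Per the commit message, nbBuffers is 2 for this PR and 3 once it is
 * rebased on top of PR #3616. */
size_t streaming_savings(size_t paramMaxBlockSize, unsigned nbBuffers)
{
    assert(paramMaxBlockSize <= FORMAT_MAX_BLOCK_SIZE);
    return (size_t)nbBuffers * (FORMAT_MAX_BLOCK_SIZE - paramMaxBlockSize);
}
```

For example, with a 32 KB maximum block size, `streaming_savings(32 * 1024, 2)` evaluates to 192 KB.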
Nick Terrell [Fri, 14 Apr 2023 21:27:06 +0000 (14:27 -0700)]
Reduce streaming decompression memory by 128KB
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.
The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
for compressed blocks of size exactly 128 KB,
which used to be disallowed by the spec
but have become allowed in more recent versions of the spec.
While this limitation is fixed in decoders v1.5.4+,
implementers should refrain from generating such blocks with their custom encoders,
as they could be misclassified as corrupted by older decoder versions.
Nick Terrell [Wed, 12 Apr 2023 23:00:28 +0000 (16:00 -0700)]
[oss-fuzz] Fix simple_round_trip fuzzer with overlapping decompression
When `ZSTD_c_maxBlockSize` is set, we weren't computing the
decompression margin correctly, leading to `dstSize_tooSmall` errors.
Fix that computation.
This is just a bug in the fuzzer, not a bug in the library itself.
detected by @terrelln,
these issues could be triggered in specific scenarios,
namely decompression of certain invalid magic-less frames,
or requesting properties from certain invalid skippable frames.
W. Felix Handte [Mon, 3 Apr 2023 19:00:05 +0000 (15:00 -0400)]
Rename/Restructure Windows Release Artifact
https://github.com/facebook/zstd/releases/tag/v1.5.0 describes the structure
we want to adhere to. This commit tries to accomplish that automatically, so
we can avoid manual fixups on future releases.
Yann Collet [Fri, 31 Mar 2023 18:13:52 +0000 (11:13 -0700)]
fix decompression with -o writing into a block device
decompression features automatic support of sparse files,
aka a form of "compression" where entire blocks consist only of zeroes.
This only works for some compatible file systems (like ext4),
others simply ignore it (like afs).
Triggering this feature relies on `fseek()`.
But `fseek()` is not compatible with non-seekable devices, such as pipes.
Therefore it's disabled for pipes.
However, there are other objects which are not compatible with `fseek()`, such as block devices.
Changed the logic, so that `fseek()` (and therefore sparse write) is only automatically enabled on regular files.
Note that this automatic behavior can always be overridden by explicit commands `--sparse` and `--no-sparse`.
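The "regular files only" rule described above could be checked along these lines on a POSIX system. This is a minimal sketch, not the actual zstd CLI code; `sparse_write_auto_enabled` is a hypothetical name, and explicit `--sparse` / `--no-sparse` overrides are left out.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not the zstd CLI implementation): returns 1 when
 * `dst` refers to a regular file, the only case where fseek()-based sparse
 * writing would be enabled automatically. Returns 0 for pipes, block or
 * character devices, a NULL handle, or when fstat() fails. */
int sparse_write_auto_enabled(FILE* dst)
{
    struct stat st;
    if (dst == NULL) return 0;
    if (fstat(fileno(dst), &st) != 0) return 0;  /* cannot inspect: stay safe */
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

Checking the file type rather than listing the non-seekable cases one by one means that any destination type not considered here also falls back to plain writes.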
Yoni Gilad [Tue, 22 Mar 2022 16:24:09 +0000 (18:24 +0200)]
seekable_format: Add unit test for multiple decompress calls
This does the following:
1. Compress test data into multiple frames
2. Perform a series of small decompressions and seeks forward, checking
that compressed data wasn't reread unnecessarily.
3. Perform some seeks forward and backward to ensure correctness.
Yoni Gilad [Thu, 17 Feb 2022 17:46:29 +0000 (19:46 +0200)]
seekable_format: Prevent rereading frame when seeking forward
When decompressing a seekable file, if seeking forward within
a frame (by issuing multiple ZSTD_seekable_decompress calls
with a small gap between them), the frame will be unnecessarily
reread from the beginning. This patch makes it continue using
the current frame data and simply skip over the unneeded bytes.
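The decision described above can be modeled as follows. This is a simplified sketch under stated assumptions: `seek_state` and `bytes_to_skip` are hypothetical and do not mirror the real `ZSTD_seekable` internals.

```c
#include <stddef.h>

/* Hypothetical model of the decoder's position; not the real ZSTD_seekable struct. */
typedef struct {
    unsigned frameIndex;      /* frame currently being decoded */
    size_t   decodedInFrame;  /* bytes of that frame already produced */
} seek_state;

/* Returns how many decompressed bytes must be produced and discarded to reach
 * `targetOffset` inside `targetFrame`. Sets *restart to 1 when the frame has
 * to be decoded again from its beginning (different frame, or a backward
 * seek), and to 0 when decoding can continue from the current position. */
size_t bytes_to_skip(const seek_state* s, unsigned targetFrame,
                     size_t targetOffset, int* restart)
{
    if (s->frameIndex == targetFrame && targetOffset >= s->decodedInFrame) {
        *restart = 0;                              /* forward seek, same frame */
        return targetOffset - s->decodedInFrame;
    }
    *restart = 1;                                  /* must start the frame over */
    return targetOffset;
}
```

With the decoder 1000 bytes into frame 3, a request for offset 1500 of the same frame skips 500 bytes without a restart, whereas a request for offset 400 restarts the frame.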
Han Zhu [Tue, 28 Mar 2023 21:33:50 +0000 (14:33 -0700)]
Remove clang-only branch hints from ZSTD_decodeSequence
Looking at the __builtin_expect in ZSTD_decodeSequence:
    {   size_t offset;
    #if defined(__clang__)
        if (LIKELY(ofBits > 1)) {
    #else
        if (ofBits > 1) {
    #endif
            ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much lower than the 99% likelihood
recommended for using __builtin_expect. As a result, clang moved the
else block further away, which hurts cache locality. Removing this
__builtin_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest removing these branch hints and
relying on PGO, which leverages runtime profiles from actual workloads to calculate
branch probability instead.
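For readers unfamiliar with branch hints, the sketch below shows how a `LIKELY` macro is commonly built on `__builtin_expect`. The macro definition here is an assumption modeled on common GCC/clang usage, not copied from zstd, and the `classify_*` and `taken_percent` functions are hypothetical stand-ins.

```c
/* A common way to define a branch hint on GCC and clang (assumed here,
 * not taken from zstd); other compilers get a plain pass-through. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x) (__builtin_expect(!!(x), 1))
#else
#  define LIKELY(x) (x)
#endif

/* Hypothetical stand-ins for the two code shapes. Both compute the same
 * result; the hint only influences how the compiler lays out the branches. */
unsigned classify_hinted(unsigned ofBits)
{
    if (LIKELY(ofBits > 1)) return 1;
    return 0;
}

unsigned classify_unhinted(unsigned ofBits)
{
    if (ofBits > 1) return 1;
    return 0;
}

/* Observed likelihood in whole percent from profile counts (integer division). */
unsigned taken_percent(unsigned taken, unsigned total)
{
    return (100u * taken) / total;
}
```

Since the hint never changes the computed result, removing it is purely a code-layout decision, which is why profile data can replace it.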
Han Zhu [Mon, 27 Mar 2023 22:57:55 +0000 (15:57 -0700)]
Inline BIT_reloadDStream
Inlining `BIT_reloadDStream` provided a >3% decompression speed improvement for
a clang PGO-optimized zstd binary, measured using the Silesia corpus with
compression level 1. The win comes from improved register allocation which leads
to fewer spills and reloads. Take a look at this comparison of
profile-annotated hot assembly before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
fewer moves after inlining.
In general LLVM's register allocator works better when it can see more code. For
example, when the register allocator sees a call instruction, it partitions the
registers into caller registers and callee registers, and it is not free to do
whatever it wants with all the registers for the current function. Inlining the
callee lets the register allocator access all registers and use them more
flexibly.
W. Felix Handte [Mon, 27 Mar 2023 15:24:47 +0000 (11:24 -0400)]
[contrib/pzstd] Detect and Select Maximum Available C++ Standard
Rather than remove the flag entirely, as proposed in #3499, this commit uses
the newest C++ standard the compiler supports. This keeps the build restricted
to standardized features (excluding GNU extensions) and keeps the
recency requirements of the codebase explicit.
Tested with various versions of `g++` and `clang++`.
Tobias Hieta [Wed, 22 Mar 2023 21:13:57 +0000 (22:13 +0100)]
Disable linker flag detection on MSVC/ClangCL.
This fixes compilation with clang-cl on Windows. There
is a bug in cmake so that check_linker_flag() doesn't give
the correct result when using link.exe/lld-link.exe.
Details in CMake's gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/22023
Nick Terrell [Fri, 10 Mar 2023 01:26:07 +0000 (17:26 -0800)]
[lazy] Skip over incompressible data
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.
This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important; I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.
This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.
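The step-size schedule and the 2KB cutoff described above can be sketched as follows. The function and macro names are hypothetical, and the code only models the arithmetic from the commit message rather than the actual match finder.

```c
#include <stddef.h>

#define SEARCH_STRENGTH_LOG 8     /* step grows by 1 every 2^8 = 256 bytes */
#define SKIP_CUTOFF_BYTES   2048  /* 2KB searched without finding a match */

/* Step between searched positions after `bytesSinceLastMatch` bytes were
 * processed without a match: 1 for [0, 256), 2 for [256, 512), and so on. */
size_t lazy_step(size_t bytesSinceLastMatch)
{
    return (bytesSinceLastMatch >> SEARCH_STRENGTH_LOG) + 1;
}

/* Nonzero once the cutoff is reached: from then on only the searched
 * positions are inserted into the hash tables, instead of every position. */
int insert_only_searched_positions(size_t bytesSinceLastMatch)
{
    return bytesSinceLastMatch >= SKIP_CUTOFF_BYTES;
}
```

At the 2KB mark the step is `(2048 >> 8) + 1 = 9`, which matches the "one in 9 positions" figure in the text.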
The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.
I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
Peter Pentchev [Sat, 18 Mar 2023 20:32:42 +0000 (22:32 +0200)]
Fix a Python bytes/int mismatch in CLI tests
In Python 3.x, a single element of a bytes array is returned as
an integer number. Thus, NEWLINE is an int variable, and attempting
to add it to the line array will fail with a type mismatch error
that may be demonstrated as follows:
[roam@straylight ~]$ python3 -c 'b"hello" + b"\n"[0]'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: can't concat int to bytes
[roam@straylight ~]$
Nick Terrell [Tue, 24 Jan 2023 20:21:49 +0000 (12:21 -0800)]
Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
call those instead
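The wrapper pattern from the list above might look roughly like this. It is a sketch under assumptions: `DEMO_DEPRECATED`, `demo_blockOp`, and `demo_blockOp_deprecated` are hypothetical names, and the attribute syntax shown is the GCC/clang one.

```c
#include <stddef.h>

/* Deprecation marker for GCC and clang; expands to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_DEPRECATED(msg) __attribute__((deprecated(msg)))
#else
#  define DEMO_DEPRECATED(msg)
#endif

/* Public entry point: external callers get a compile-time deprecation warning. */
DEMO_DEPRECATED("use the streaming API instead")
size_t demo_blockOp(size_t srcSize);

/* Internal wrapper without the attribute, so library code can keep calling
 * the functionality without triggering the warning. */
size_t demo_blockOp_deprecated(size_t srcSize)
{
    return srcSize;  /* placeholder body, for illustration only */
}

/* The public function simply forwards to the internal wrapper. */
size_t demo_blockOp(size_t srcSize)
{
    return demo_blockOp_deprecated(srcSize);
}
```

Keeping the real work in the `_deprecated()` variant lets the library build cleanly while user code that calls the public name is still warned.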
Yonatan Komornik [Mon, 13 Mar 2023 22:34:13 +0000 (15:34 -0700)]
Add salt into row hash (#3528 part 2) (#3533)
Part 2 of #3528
Adds a hash salt that helps avoid regressions where consecutive compressions use the same tag space with similar data (running `zstd -b5e7 enwik8 -B128K` reproduces this regression).