Yann Collet [Thu, 22 Dec 2022 00:21:29 +0000 (16:21 -0800)]
update levels.sh test
comparing level 19 to level 22 and expecting a strictly better result from level 22
is not guaranteed to hold,
because levels 19 and 22 are very close to each other,
especially for small files,
so any noise in the final compression result
results in failing this test.
Level 22 could be compared to something much lower, like level 15,
but level 19 is required anyway, because there is a clamping test which depends on it.
Yann Collet [Wed, 21 Dec 2022 22:58:53 +0000 (14:58 -0800)]
improve compression ratio of small alphabets
fix #3328
In situations where the alphabet size is very small,
the evaluation of literal costs from the Optimal Parser is initially incorrect.
It takes some time to converge, during which compression is less efficient.
This is especially important for small files,
because there will not be enough data to converge,
so most of the parsing is selected based on incorrect metrics.
After this patch, the scenario #3328 gets fixed,
delivering the expected 29 bytes compressed size (smallest known compressed size).
Yann Collet [Wed, 28 Dec 2022 07:40:34 +0000 (23:40 -0800)]
New xp library symbol : ZSTD_CCtx_setCParams()
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
Yann Collet [Thu, 22 Dec 2022 19:30:15 +0000 (11:30 -0800)]
spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.
This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.
The impact is expected to be a no-op :
the 4-streams mode is never triggered for such a small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.
This is just meant for completeness of the specification.
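The failing sizes can be reproduced with a small sketch, assuming that each of the first three streams regenerates `(size+3)/4` bytes and that the last stream takes whatever remains. The helper name `lastStreamSize` is hypothetical and is not part of zstd.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Assumes the first three streams each regenerate (litSize+3)/4 bytes.
 * Returns the regenerated size left for the 4th stream,
 * or -1 when the first three streams already exceed litSize. */
static long lastStreamSize(size_t litSize)
{
    size_t const segment = (litSize + 3) / 4;
    if (3 * segment > litSize) return -1;   /* split is impossible */
    return (long)(litSize - 3 * segment);
}
```

Under this assumption, sizes 1, 2 and 5 leave no room for the last stream, while any size of 6 or more can be split.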
Nick Terrell [Thu, 22 Dec 2022 01:48:24 +0000 (17:48 -0800)]
[tests] Remove deprecated function from longmatch.c test
Thanks to @eli-schwartz for pointing it out!
We should maybe consider adding a helper function for applying
`ZSTD_parameters` and `ZSTD_compressionParameters` to a context.
That would aid the transition to the new API in situations like this.
Nick Terrell [Thu, 22 Dec 2022 01:21:09 +0000 (17:21 -0800)]
[cli-tests] Add --set-exact-output to update the expected output
`./run.py --set-exact-output` will update `stdout.expect` and
`stderr.expect` to match the expected output. This doesn't apply to
outputs which use `.glob` or `.ignore`.
The one that isn't pinned is the OSS-Fuzz builder and runner. They don't
offer tagged releases. I could pin to the current master commit, but I'm not
sure how desirable that is.
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
Nick Terrell [Mon, 19 Dec 2022 20:23:29 +0000 (12:23 -0800)]
[pzstd] Fixes for Windows build
* Add `Portability.h` to fix min/max issues.
* Fix conversion warnings
* Assert that windowLog <= 23, which is currently always the case.
This could be loosened, but we aren't looking to add new functionality.
Fixes on top of PR #3375 by @eli-schwartz, which added Windows CI for contrib & programs.
Yonatan Komornik [Sat, 17 Dec 2022 02:24:02 +0000 (18:24 -0800)]
Fix race condition in the Windows thread / pthread translation layer
When spawning a Windows thread we have a small worker wrapper function that translates
between the interfaces of Windows and POSIX threads.
This wrapper is given a pointer that might get stale before the worker starts running,
resulting in UB and crashes.
This commit adds synchronization so that we know the wrapper has finished reading the data
it needs before we allow the main thread to resume execution.
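A minimal sketch of this kind of handshake, written with POSIX threads so that it stays self-contained. All names here (`StartParams`, `wrapper`, `spawn`, `demo`) are hypothetical and this is not the zstd source: the spawner keeps the start parameters on its own stack and waits on a condition variable until the wrapper has copied them.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; an illustration of the handshake only. */
typedef struct {
    void* (*start_routine)(void*);
    void* arg;
    int initialized;              /* set once the wrapper has copied the fields */
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} StartParams;

static void* wrapper(void* p)
{
    StartParams* const params = (StartParams*)p;
    /* Copy everything needed while the spawner's stack frame is still alive. */
    void* (*const fn)(void*) = params->start_routine;
    void* const arg = params->arg;

    pthread_mutex_lock(&params->mutex);
    params->initialized = 1;
    pthread_cond_signal(&params->cond);
    pthread_mutex_unlock(&params->mutex);
    /* params may be gone from here on: only the local copies are used. */
    return fn(arg);
}

static int spawn(pthread_t* thread, void* (*fn)(void*), void* arg)
{
    StartParams params;
    int err;
    params.start_routine = fn;
    params.arg = arg;
    params.initialized = 0;
    if (pthread_mutex_init(&params.mutex, NULL)) return -1;
    if (pthread_cond_init(&params.cond, NULL)) {
        pthread_mutex_destroy(&params.mutex);
        return -1;
    }
    err = pthread_create(thread, NULL, wrapper, &params);
    if (err == 0) {
        /* Block until the wrapper no longer depends on `params`. */
        pthread_mutex_lock(&params.mutex);
        while (!params.initialized)
            pthread_cond_wait(&params.cond, &params.mutex);
        pthread_mutex_unlock(&params.mutex);
    }
    pthread_cond_destroy(&params.cond);
    pthread_mutex_destroy(&params.mutex);
    return err;
}

static void* addOne(void* p) { *(int*)p += 1; return NULL; }

/* Spawns one worker, joins it, and returns the updated counter. */
static int demo(void)
{
    pthread_t t;
    int counter = 41;
    if (spawn(&t, addOne, &counter) != 0) return -1;
    pthread_join(t, NULL);
    return counter;
}
```

Without the wait in `spawn`, the function could return and invalidate `params` before `wrapper` had read it, which is the stale-pointer situation described above.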
Yonatan Komornik [Fri, 16 Dec 2022 00:11:56 +0000 (16:11 -0800)]
Fixes two bugs in the Windows thread / pthread translation layer
1. If threads are resized the threads' `ZSTD_pthread_t` might move
while the worker still holds a pointer into it (see more details in #3120).
2. The join operation was waiting for a thread and then returning its `thread.arg`
as a return value, but since the `ZSTD_pthread_t thread` was passed by value it
would have a stale `arg` that wouldn't match the thread's actual return value.
This fix changes the `ZSTD_pthread_join` API and removes support for returning
a value. This means that we are diverging from the `pthread_join` API and this
is no longer just an alias.
In the future, if needed, we could return a Windows thread's return value using
`GetExitCodeThread`, but as this path wouldn't be exercised in any case, it's
preferable to not add it right now.
Nick Terrell [Thu, 15 Dec 2022 23:46:34 +0000 (15:46 -0800)]
[api][visibility] Make the visibility macros more consistent
1. Follow the scheme introduced in PR #2501 for both `zdict.h` and `zstd_errors.h`.
2. If the `*_VISIBLE` macro isn't set, but the `*_VISIBILITY` macro is, use that.
Also make this change for `zstd.h`, since we probably shouldn't have changed
that macro name without backward compatibility in the first place.
3. Change all references to `*_VISIBILITY` to `*_VISIBLE`.
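The fallback described in point 2 could look roughly like the sketch below. The `MYLIB_` prefix and the `mylib_answer` function are hypothetical placeholders, not the actual zstd macros: the new name wins when it is defined, the old name is honored otherwise, and a compiler default is used last.

```c
/* Hypothetical library prefix MYLIB_; the pattern, not the names, is the point. */
#ifndef MYLIB_VISIBLE
   /* Backward compatibility: honor the old macro name if the user set it. */
#  ifdef MYLIB_VISIBILITY
#    define MYLIB_VISIBLE MYLIB_VISIBILITY
#  elif defined(__GNUC__) && (__GNUC__ >= 4) && !defined(__MINGW32__)
#    define MYLIB_VISIBLE __attribute__ ((visibility ("default")))
#  else
#    define MYLIB_VISIBLE
#  endif
#endif

/* A public symbol is then declared with the single, consistent macro. */
MYLIB_VISIBLE int mylib_answer(void);

int mylib_answer(void) { return 42; }
```

A build that still passes `-DMYLIB_VISIBILITY=...` keeps working, while new code only ever refers to `MYLIB_VISIBLE`.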
Nick Terrell [Fri, 16 Dec 2022 18:33:38 +0000 (10:33 -0800)]
[CI] Re-enable versions-test
It seems like with the deletion of Travis CI we didn't successfully transfer the
version compatibility test. Attempt to enable the version compatibility test.
Eli Schwartz [Thu, 15 Dec 2022 02:57:59 +0000 (21:57 -0500)]
meson: add Windows CI
There are a couple of oddities here. We don't attempt to build e.g.
contrib, because that doesn't seem to work at the moment. Also notice
that each command is its own step. This happens because github actions
runs in powershell, which doesn't seem to let you abort on the first
failure.
Eli Schwartz [Thu, 15 Dec 2022 02:10:41 +0000 (21:10 -0500)]
meson: add support for running both fast and slow version of tests
playTests.sh has an option to run really slow tests. This is enabled by
default in Meson, but what we really want is to behave like the Makefile,
and run the fast ones by default, but with an option to run the slow
ones instead.
Eli Schwartz [Wed, 20 Apr 2022 00:58:33 +0000 (20:58 -0400)]
meson: don't require valgrind tests
It's entirely possible some people don't have valgrind installed, but
still want to run the tests. If they don't have it installed, then they
probably don't intend to run those precise test targets anyway.
Also, this solves an error when running the tests in an automated
environment. The valgrind tests have a hard dependency on behavior such
as `./zstd` erroring out with the message "stdin is a console, aborting"
which does not work if the automated environment doesn't have a console.
As a rough heuristic, automated environments lacking a console will
*probably* also not have valgrind, so avoiding that test definition
neatly sidesteps the issue.
Also, valgrind is not easily installable on macOS, at least homebrew
says it isn't available there. This makes it needlessly hard to
enable the testsuite on macOS.
Eli Schwartz [Fri, 16 Dec 2022 00:34:25 +0000 (19:34 -0500)]
meson: fix warning for using too-new features
In commit 031de3c69ccbf3282ed02fb49369b476730aeca8 a feature of Meson
0.50.0 was added, but the minimum specified version of Meson is 0.48.0.
Meson therefore emitted a warning:
WARNING: Project targets '>=0.48.0' but uses feature introduced in '0.50.0': required arg in compiler.has_header.
And if anyone actually used Meson 0.48.0 to build with, it would error
out with mysterious claims that the build file itself is invalid, rather
than telling the user to install a newer version of Meson.
Solve this by bumping the minimum version to align with reality. This
e.g. drops support for Debian oldstable (buster)'s packaged version of
Meson, but still works if backports are enabled, or if the user can
`pip install` a newer version.
Eli Schwartz [Fri, 16 Dec 2022 00:48:22 +0000 (19:48 -0500)]
meson: fix broken commit that broke the build
In commit 031de3c69ccbf3282ed02fb49369b476730aeca8 some code was added
that returned a boolean, but was treated as if it returned a dependency
object. This wasn't tested and could not work. Moreover, zstd no longer
built at all unless the entire programs directory was disabled and not
even evaluated.
Eli Schwartz [Wed, 14 Dec 2022 22:23:24 +0000 (17:23 -0500)]
meson: partial fix for building pzstd on MSVC
It uses non-portable compiler options unconditionally. Elsewhere, we
check the compiler ID and only add the right ones, globally. Do the same
here.
NDEBUG can actually be handled by a core option, so while we are moving
things around, do so.
Unfortunately, this doesn't fix things entirely. The remaining issue is
not Meson's issue though -- MSVC simply does not like this source code
and somehow chokes on innocent code with the inscrutable "syntax error"
and "illegal token".
Yann Collet [Thu, 15 Dec 2022 23:23:15 +0000 (15:23 -0800)]
check potential overflow of compressBound()
fixed #3323, reported by @nigeltao
Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
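To illustrate the kind of check involved, here is a sketch with a deliberately simplified bound formula. The helper `boundOrZero`, its `+ 64` margin, and the choice of returning 0 on overflow are assumptions of this example and do not reproduce the actual `ZSTD_compressBound()` computation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bound: srcSize + srcSize/256 + 64.
 * Returns 0 when the sum would not fit in a size_t,
 * instead of silently wrapping around to a small value. */
static size_t boundOrZero(size_t srcSize)
{
    size_t const extra = (srcSize >> 8) + 64;   /* cannot overflow by itself */
    if (srcSize > SIZE_MAX - extra) return 0;   /* srcSize + extra would wrap */
    return srcSize + extra;
}
```

A caller that allocates `boundOrZero(srcSize)` bytes has to treat 0 as a failure, otherwise an erroneous `srcSize` would lead to an undersized buffer.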
Nick Terrell [Thu, 15 Dec 2022 21:43:27 +0000 (13:43 -0800)]
Fix corruption that rarely occurs in 32-bit mode with wlog=25
Fix an off-by-one error in the compressor that emits corrupt blocks if:
* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset
This bug has been present since before v1.0, but couldn't easily
be triggered, since until somewhat recently zstd wasn't able to find
matches that were within 128KB of the window size.
Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce
corruption, but it could waste CPU and not emit RLE even if the block
was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog`
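On the last point: `1 << windowLog` is computed as a signed `int`, so the shift is undefined once the result no longer fits, for example with a 32-bit `int` and a shift of 31. A minimal sketch of the unsigned form follows; the helper name `windowSizeFromLog` is hypothetical and the range check is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes 2^windowLog with an unsigned shift.
 * Assumes windowLog < 32 so the result fits in 32 bits. */
static uint32_t windowSizeFromLog(unsigned windowLog)
{
    assert(windowLog < 32);
    return (uint32_t)1 << windowLog;
}
```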
Thanks to @tansy for finding the issue, and giving us a reproducer!
Nick Terrell [Thu, 15 Dec 2022 00:07:22 +0000 (16:07 -0800)]
[legacy] Remove FORCE_MEMORY_ACCESS and only use memcpy
Delete unaligned memory access code from the legacy codebase by removing all the
non-memcpy functions. We don't care about speed at all for this codebase, only
simplicity.
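A memcpy-only read looks like the sketch below. The names `readU32` and `unalignedRoundTrip` are hypothetical and chosen for this example; the point is that the pointer is never cast and dereferenced, so alignment does not matter.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value in native byte order from any address,
 * aligned or not, by copying its bytes into a local variable. */
static uint32_t readU32(const void* ptr)
{
    uint32_t value;
    memcpy(&value, ptr, sizeof(value));
    return value;
}

/* Usage: read back a value stored at an odd (unaligned) offset.
 * Returns 1 when the value round-trips. */
static int unalignedRoundTrip(void)
{
    unsigned char buffer[8] = { 0 };
    uint32_t const expected = 0x12345678u;
    memcpy(buffer + 1, &expected, sizeof(expected));
    return readU32(buffer + 1) == expected;
}
```

Compilers generally turn such a fixed-size `memcpy` into a plain load where the target allows it, so the simplicity rarely costs much.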
Fix an instance of `NULL + 0` in `ZSTD_decompressStream()`. Also, improve our
`stream_decompress` fuzzer to pass `NULL` in/out buffers to
`ZSTD_decompressStream()`, and fix 2 issues that were immediately surfaced.
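In C, arithmetic on a null pointer is undefined behavior even when the offset is zero, and sanitizers report it. One way to avoid the pattern is a small guard like the sketch below; the helper name `ptrAdd` and the `sample` array are hypothetical, and this is not necessarily how the fix was written.

```c
#include <stddef.h>

/* Hypothetical helper: advances ptr by add bytes,
 * but performs no arithmetic at all when add is 0,
 * so a NULL pointer with an empty range stays untouched. */
static const char* ptrAdd(const char* ptr, size_t add)
{
    return add > 0 ? ptr + add : ptr;
}

/* Sample data for the usage below. */
static const char sample[4] = "abc";
```

With this guard, an empty buffer described as `(NULL, 0)` yields `NULL` as its end pointer, while `ptrAdd(sample, 2)` behaves like ordinary pointer arithmetic.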