Yann Collet [Wed, 21 Feb 2024 08:22:04 +0000 (00:22 -0800)]
updated setup-msys2 to v2.22.0
following a warning in recent test reports
```
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: msys2/setup-msys2@5beef6d11f48bba68b9eb503e3adc60b23c0cc36. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
```
Yann Collet [Sat, 3 Feb 2024 22:26:18 +0000 (14:26 -0800)]
finally, a version that generalizes well
While it's not always strictly a win,
it's a win for files that see a noticeable compression ratio increase,
and only very small noise for other files.
The downside is that this patch is less efficient on 32-bit integer arrays
than the previous patch, which introduced losses on other files,
but it's still a net improvement in this scenario.
Yann Collet [Tue, 30 Jan 2024 07:25:24 +0000 (23:25 -0800)]
improve high compression ratio for files like #3793
This works great for 32-bit arrays,
notably synthetic ones with extreme regularity.
Unfortunately, it's not universal,
and in some cases, it's a loss.
Crucially, on average, it's a loss on silesia.
The most negatively impacted file is x-ray.
It deserves an investigation before suggesting it as an evolution.
Yann Collet [Mon, 29 Jan 2024 23:24:42 +0000 (15:24 -0800)]
fix Visual Studio solutions
note: we probably don't want to maintain the VS2008 solution anymore.
Its successor VS2010 is > 10 years old,
which is more or less the limit after which we can stop supporting old compilers.
Yann Collet [Tue, 16 Jan 2024 20:14:35 +0000 (12:14 -0800)]
made playTests.sh more compatible with older versions of grep
replaced `\+` with `*`.
`\+` means one or more occurrences (`[1-N]`),
while `*` means zero or more (`[0-N]`),
so they're not strictly equivalent,
but `\+` happens to be poorly supported by some flavors of grep,
and for the purpose of these tests, `*` is good enough.
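To see why `*` is looser than `\+`, here is a minimal sketch using POSIX `regcomp()`/`regexec()`, which use basic regular expression (BRE) syntax, like plain `grep`, unless `REG_EXTENDED` is passed. It assumes a POSIX system providing `<regex.h>`; `bre_search` is a hypothetical helper, not part of playTests.sh. The `bb*` pattern shows the portable BRE way to spell "one or more `b`".

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` contains a match for the POSIX basic regular
 * expression `pattern`, 0 otherwise. Hypothetical helper for illustration. */
static int bre_search(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_NOSUB); /* no REG_EXTENDED => BRE syntax */
    assert(rc == 0);                           /* pattern must compile */
    rc = regexec(&re, text, 0, NULL, 0);       /* unanchored search */
    regfree(&re);
    return rc == 0;
}
```

`bre_search("ab*c", "ac")` returns 1 because `*` also accepts zero `b`s, a string that `ab\+c` would reject; `bre_search("abb*c", "ac")` returns 0.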
Eli Schwartz [Tue, 2 Jan 2024 06:36:45 +0000 (01:36 -0500)]
CI: meson: use builtin handling for MSVC
This avoids downloading -- and periodically bumping the checksum for --
a third-party action that isn't strictly required, and thus helps keep
down dependencies and reduce update churn.
Fix a NULL pointer dereference in ZSTD_createCDict_advanced2()
If the relevant allocation returns NULL, ZSTD_createCDict_advanced_internal()
will return NULL. But ZSTD_createCDict_advanced2() doesn't check for
this and attempts to use the returned pointer anyway, which leads to
a segfault.
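The shape of the fix, as a minimal sketch: check the result of the internal constructor before touching it, and propagate the failure to the caller. Every name below (`cdict_t`, `create_cdict_internal`, `create_cdict`, `free_cdict`, and the `g_simulate_alloc_failure` test hook) is hypothetical and simplified; this is not the zstd code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real zstd types or functions. */
typedef struct {
    size_t dictSize;
    unsigned char *content;
} cdict_t;

static int g_simulate_alloc_failure = 0; /* test hook: makes every allocation fail */

static void *try_malloc(size_t n)
{
    return g_simulate_alloc_failure ? NULL : malloc(n);
}

/* Like the internal constructor: returns NULL if any allocation fails. */
static cdict_t *create_cdict_internal(size_t dictSize)
{
    cdict_t *cdict = try_malloc(sizeof(*cdict));
    if (cdict == NULL) return NULL;
    cdict->content = try_malloc(dictSize ? dictSize : 1);
    if (cdict->content == NULL) { free(cdict); return NULL; }
    cdict->dictSize = dictSize;
    return cdict;
}

static cdict_t *create_cdict(const void *dict, size_t dictSize)
{
    assert(dict != NULL || dictSize == 0);
    cdict_t *cdict = create_cdict_internal(dictSize);
    if (cdict == NULL) return NULL; /* the missing check: report failure instead of dereferencing NULL */
    if (dictSize) memcpy(cdict->content, dict, dictSize);
    return cdict;
}

static void free_cdict(cdict_t *cdict)
{
    if (cdict == NULL) return;
    free(cdict->content);
    free(cdict);
}
```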
Nick Terrell [Tue, 21 Nov 2023 21:26:25 +0000 (13:26 -0800)]
Modernize macros to use `do { } while (0)`
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to
`do { } while (0)`, and introduces trailing `;` where necessary.
No bugs were found during this migration.
The bug in Visual Studio's warning on this construct has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` that have
been present for several releases, so we don't have to worry about
breaking people's builds.
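A minimal sketch of why `do { } while (0)` is preferred: the expansion is a single statement that requires the caller's trailing `;`, so the macro composes with `if`/`else` like a function call. With a brace-only body, `MACRO(x);` expands to `{ ... };`, and the extra `;` ends the `if`, leaving the `else` orphaned (a compile error). `SWAP_INT`, `INCREMENT` and `order_pair` are hypothetical examples, not zstd macros.

```c
#include <assert.h>

/* Brace-only form (shown for contrast, not used):
 *   #define SWAP_INT(a, b) { int tmp_ = (a); (a) = (b); (b) = tmp_; }
 * `if (c) SWAP_INT(x, y); else ...` would then fail to compile. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)
#define INCREMENT(x)   do { (x)++; } while (0)

/* Puts *lo and *hi in ascending order and counts pairs that were already sorted. */
static void order_pair(int *lo, int *hi, int *alreadySorted)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);        /* one statement, so the `else` still binds */
    else
        INCREMENT(*alreadySorted);
    assert(*lo <= *hi);
}
```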
Nick Terrell [Mon, 20 Nov 2023 20:04:30 +0000 (12:04 -0800)]
[huf] Fix null pointer addition
`HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting
early for empty outputs. This doesn't change behavior, because the
function was already returning 0 in this case, just slightly later.
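In C, pointer arithmetic on a null pointer is undefined behavior even when the offset is 0 (clang's UBSan reports it as "applying zero offset to null pointer"), and an empty output buffer may legitimately be NULL. A minimal sketch of the early exit; `init_fast_args` is a hypothetical stand-in for `HUF_DecompressFastArgs_init()`, not its real signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: computes the end of the output buffer.
 * Returns 0 ("don't use the fast loop") when there is nothing to decode. */
static int init_fast_args(unsigned char *dst, size_t dstSize, unsigned char **oend)
{
    /* Exit before any pointer arithmetic: with dstSize == 0, dst may be NULL,
     * and in C even NULL + 0 is undefined behavior. */
    if (dstSize == 0)
        return 0;
    assert(dst != NULL);
    *oend = dst + dstSize;
    return 1;
}
```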
Nick Terrell [Mon, 20 Nov 2023 19:33:57 +0000 (11:33 -0800)]
[huf] Improve fast C & ASM performance on small data
* Rename `ilimit` to `ilowest` and set it equal to `src` instead of
`src + 6 + 8`. This is safe because the fast decoding loops already guarantee
never to read below `ilowest`. This allows the fast decoder to
run for at least two more iterations, because it consumes at most 7
bytes per iteration and 6 + 8 = 14 bytes are reclaimed.
* Continue the fast loop all the way until the number of safe iterations
is 0. Initially, I thought that towards the end, computing how many
safe iterations remain might become expensive. But it ends up being
slower to decode each of the 4 streams individually, which makes sense.
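The safe-iteration bound can be sketched as follows. This is a hypothetical illustration, not the zstd code: it assumes each of the 4 streams is read backwards, consumes at most 7 bytes per iteration, and never reads below `ilowest`, and it treats the number of output bytes written per stream per iteration (`symbolsPerIter`) as a parameter.

```c
#include <assert.h>
#include <stddef.h>

#define NB_STREAMS 4
#define MAX_BYTES_PER_ITER 7 /* per the text: at most 7 bytes consumed per iteration */

/* Hypothetical: how many fast-loop iterations can run with no per-symbol
 * bounds checks. Assumes ip[s] >= ilowest for every stream. */
static size_t safe_iterations(const unsigned char *ilowest,
                              const unsigned char *const ip[NB_STREAMS],
                              const size_t outRemaining[NB_STREAMS],
                              size_t symbolsPerIter)
{
    size_t iters = (size_t)-1;
    assert(symbolsPerIter > 0);
    for (int s = 0; s < NB_STREAMS; s++) {
        size_t inIters  = (size_t)(ip[s] - ilowest) / MAX_BYTES_PER_ITER;
        size_t outIters = outRemaining[s] / symbolsPerIter;
        if (inIters < iters)  iters = inIters;
        if (outIters < iters) iters = outIters;
    }
    return iters; /* the loop is re-entered until this reaches 0 */
}
```

With `ilowest = src + 6 + 8` instead of `src`, each stream has 14 fewer usable bytes, and floor((n - 14) / 7) = floor(n / 7) - 2 for n >= 14: exactly the two iterations the first bullet reclaims.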
These changes drastically speed up the Huffman decoder on the `github` dataset
from issue #3762, measured with `zstd -b1e1r github/`.
| Decoder | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s | 477 MB/s |
| Fast C | 384 MB/s | 492 MB/s |
| Assembly | 385 MB/s | 501 MB/s |
We can also look at the speed delta for different block sizes of silesia
using `zstd -b1e1r silesia.tar -B#`.
Nick Terrell [Sat, 18 Nov 2023 02:20:19 +0000 (18:20 -0800)]
[huf] Improve fast Huffman decoding speed in Linux kernel
When building the Linux kernel, gcc was not unrolling the inner loops of the Huffman
decoder, which was destroying decoding performance. The compiler was
generating crazy code with all sorts of branches. I suspect this is because of
Spectre mitigations, but I'm not certain. Once the loops were manually
unrolled, performance was restored.
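A minimal sketch of the difference between relying on the compiler to unroll a 4-stream inner loop and unrolling it by hand. `STEP_STREAM` is a hypothetical toy state update, not Huffman decoding; the point is only that the unrolled form uses constant stream indices and no loop counter, regardless of the compiler's unrolling heuristics.

```c
#include <assert.h>
#include <stdint.h>

/* Toy per-stream update standing in for "decode one symbol from stream s". */
#define STEP_STREAM(state, s) \
    do { (state)[s] = ((state)[s] << 1) ^ ((state)[s] >> 7); } while (0)

/* Loop form: performance depends on the compiler choosing to unroll it. */
static void step_loop(uint64_t state[4])
{
    for (int s = 0; s < 4; s++)
        STEP_STREAM(state, s);
}

/* Manually unrolled form: straight-line code, no counter, no loop branch. */
static void step_unrolled(uint64_t state[4])
{
    STEP_STREAM(state, 0);
    STEP_STREAM(state, 1);
    STEP_STREAM(state, 2);
    STEP_STREAM(state, 3);
}
```

Both functions compute the same result; only the generated code differs.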
Additionally, when gcc couldn't prove that the shift amount of the variable
left shift in the 4X2 decode loop was at most 63, it inserted checks to verify
it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide
this check. This is a no-op, because `entry.nbBits` is guaranteed to be
less than 64.
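A minimal sketch of the masking trick. In C, shifting a 64-bit value by 64 or more is undefined behavior; masking the shift amount with `0x3F` makes the 0..63 bound visible to the compiler, and because `nbBits` is always below 64 the mask never changes the value. `dec_entry_t` and `consume_bits` are hypothetical names; only the `nbBits` field name comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode-table entry. */
typedef struct {
    uint8_t nbBits;  /* guaranteed < 64 by construction */
    uint8_t symbol;
} dec_entry_t;

/* Shifts the bit container left by entry.nbBits. The mask is a no-op for
 * valid entries but tells the compiler the shift can never reach 64. */
static uint64_t consume_bits(uint64_t bits, dec_entry_t entry)
{
    assert(entry.nbBits < 64);
    return bits << (entry.nbBits & 0x3F);
}
```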
Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
fast C loops for issue #3762. If there is still a performance regression
after this change, users can opt out at compile time.
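A minimal sketch of how a compile-time opt-out like `HUF_DISABLE_FAST_DECODE` typically works, assuming it is defined on the compiler command line (e.g. `-DHUF_DISABLE_FAST_DECODE`). The dispatch function and both decoders are hypothetical stand-ins, not the zstd code.

```c
#include <assert.h>

/* Hypothetical decoders standing in for the fast C loop and the fallback. */
static int decode_fast(void)     { return 1; }
static int decode_fallback(void) { return 2; }

/* With HUF_DISABLE_FAST_DECODE defined, the fast path is compiled out
 * entirely and only the fallback decoder is ever called. */
static int decode_dispatch(int fastLoopUsable)
{
#ifndef HUF_DISABLE_FAST_DECODE
    if (fastLoopUsable)
        return decode_fast();
#else
    (void)fastLoopUsable;
#endif
    return decode_fallback();
}
```

Without the macro, `decode_dispatch(1)` takes the fast path; with it, every call returns the fallback's result.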
Nick Terrell [Fri, 17 Nov 2023 01:15:25 +0000 (17:15 -0800)]
[debug] Don't define g_debuglevel in the kernel
We only use `g_debuglevel` when `DEBUGLEVEL>=2`, but ISO C forbids empty
translation units, which triggers -Werror=pedantic errors, so we still
define it except in kernel environments.
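A minimal sketch of the conditional definition. `BUILDING_FOR_KERNEL` is a hypothetical macro standing in for however the kernel build is detected, and the default `DEBUGLEVEL` of 0 is an assumption for the example. Without any definition, gcc's `-pedantic` reports "ISO C forbids an empty translation unit".

```c
/* Assumed default for the example; a real build would pass -DDEBUGLEVEL=N. */
#ifndef DEBUGLEVEL
#  define DEBUGLEVEL 0
#endif

/* Keep one external definition so the translation unit is never empty,
 * except in kernel builds (hypothetical BUILDING_FOR_KERNEL), where the
 * variable is only needed when DEBUGLEVEL >= 2. */
#if !defined(BUILDING_FOR_KERNEL) || (DEBUGLEVEL >= 2)
int g_debuglevel = DEBUGLEVEL;
#endif
```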
More recent versions of CMake emit the following warning:
```
CMake Deprecation Warning at cmake/CMakeLists.txt:10 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
```