git.ipfire.org Git - thirdparty/zstd.git/log
Yann Collet [Sat, 20 Sep 2025 05:42:06 +0000 (21:42 -0800)]
Merge pull request #4487 from neiljohari/adhoc/dictionary-file-counting
make DiB_fileStats skip invalid files (fileSize <= 0) to prevent negative totals and bogus allocation
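A minimal sketch of the skipping rule described in this merge, assuming the file sizes are already available as signed 64-bit values. The `SampleStats` type and the `sumValidFileSizes` function are hypothetical names used for illustration only; this is not the actual `DiB_fileStats` code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary type, not the struct used by zstd. */
typedef struct {
    int64_t totalBytes;  /* sum of the sizes that were counted */
    int     nbCounted;   /* files that contributed to the total */
    int     nbSkipped;   /* files ignored because their size was <= 0 */
} SampleStats;

/* Sum only strictly positive sizes, so that an unreadable or empty file
 * (reported here as a size <= 0) can neither make the total negative
 * nor inflate a later allocation. */
static SampleStats sumValidFileSizes(const int64_t* fileSizes, size_t nbFiles)
{
    SampleStats stats = { 0, 0, 0 };
    size_t n;
    assert(fileSizes != NULL || nbFiles == 0);
    for (n = 0; n < nbFiles; n++) {
        if (fileSizes[n] <= 0) {
            stats.nbSkipped++;
            continue;
        }
        stats.totalBytes += fileSizes[n];
        stats.nbCounted++;
    }
    return stats;
}
```

With the sizes `{100, -1, 0, 50}`, this sketch counts two files for a total of 150 bytes and skips the other two.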
Yann Collet [Tue, 16 Sep 2025 21:55:11 +0000 (13:55 -0800)]
Merge pull request #4481 from w1m024/support-rvv-getmask
add RVV optimization for ZSTD_row_getMatchMask
Neil Johari [Tue, 16 Sep 2025 07:03:08 +0000 (00:03 -0700)]
Remove debug logging
Neil Johari [Tue, 16 Sep 2025 07:02:04 +0000 (00:02 -0700)]
Fix bug
Neil Johari [Tue, 16 Sep 2025 06:58:45 +0000 (23:58 -0700)]
Add debug logging and simple repro
w1m024 [Thu, 11 Sep 2025 20:42:40 +0000 (20:42 +0000)]
Refactor ZSTD_row_getMatchMask for RVV optimization
Performance (vs. SWAR)
- 16-byte data: 5.87x speedup
- 32-byte data: 9.63x speedup
- 64-byte data: 17.98x speedup
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
w1m024 [Tue, 9 Sep 2025 06:20:55 +0000 (06:20 +0000)]
add RVV optimization for ZSTD_row_getMatchMask
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
Yann Collet [Mon, 8 Sep 2025 14:52:31 +0000 (07:52 -0700)]
Merge pull request #4480 from facebook/dependabot/github_actions/github/codeql-action-3.30.1
Bump github/codeql-action from 3.29.4 to 3.30.1
Yann Collet [Mon, 8 Sep 2025 14:51:17 +0000 (07:51 -0700)]
Merge pull request #4479 from facebook/dependabot/github_actions/msys2/setup-msys2-2.29.0
Bump msys2/setup-msys2 from 2.28.0 to 2.29.0
dependabot[bot] [Mon, 8 Sep 2025 05:06:48 +0000 (05:06 +0000)]
Bump github/codeql-action from 3.29.4 to 3.30.1
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.29.4 to 3.30.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/4e828ff8d448a8a6e532957b1811f387a63867e8...f1f6e5f6af878fb37288ce1c627459e94dbf7d01)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.30.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] [Mon, 8 Sep 2025 05:06:40 +0000 (05:06 +0000)]
Bump msys2/setup-msys2 from 2.28.0 to 2.29.0
Bumps [msys2/setup-msys2](https://github.com/msys2/setup-msys2) from 2.28.0 to 2.29.0.
- [Release notes](https://github.com/msys2/setup-msys2/releases)
- [Changelog](https://github.com/msys2/setup-msys2/blob/main/CHANGELOG.md)
- [Commits](https://github.com/msys2/setup-msys2/compare/40677d36a502eb2cf0fb808cc9dec31bf6152638...fb197b72ce45fb24f17bf3f807a388985654d1f2)
---
updated-dependencies:
- dependency-name: msys2/setup-msys2
  dependency-version: 2.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Yann Collet [Fri, 5 Sep 2025 22:32:16 +0000 (15:32 -0700)]
Merge pull request #4472 from bgilbert/override_dependency
meson: Call `meson.override_dependency()` if Meson is new enough
Yann Collet [Wed, 3 Sep 2025 00:19:17 +0000 (17:19 -0700)]
Merge pull request #4475 from Cyan4973/default_nbThreads
Default nb threads
Yann Collet [Tue, 2 Sep 2025 23:36:44 +0000 (16:36 -0700)]
fixed minor unused variable warning
in certain compilation modes
Yann Collet [Tue, 2 Sep 2025 23:21:48 +0000 (16:21 -0700)]
benchmark uses 1 thread by default
Yann Collet [Tue, 2 Sep 2025 23:05:35 +0000 (16:05 -0700)]
only display nbThreads message when nbThreads > 1
Yann Collet [Tue, 2 Sep 2025 22:53:45 +0000 (15:53 -0700)]
specify nb of threads used during benchmarking
used to require `-v` (verbose) modifier
Yann Collet [Tue, 2 Sep 2025 22:46:51 +0000 (15:46 -0700)]
fixed -T# documentation in zstd -H
provide the local value for default nbThreads
which is dynamic and depends on local nb of cores.
Yann Collet [Tue, 2 Sep 2025 22:40:32 +0000 (15:40 -0700)]
Merge pull request #4474 from jlokier/threads-doc-fix
Update manual about the default value of `-T#`/`--threads=#`
Jamie Lokier [Tue, 2 Sep 2025 15:44:09 +0000 (16:44 +0100)]
Update manual about the default value of `-T#`/`--threads=#`
The section about `ZSTD_NBTHREADS` already explains the default number of
threads, since it changed from 1 (commit 17beeb5). But the option description
for `-T#`/`--threads=#` incorrectly said the default was still 1.
I noticed this when I found compression slower with `-T1` than without it.
Benjamin Gilbert [Thu, 28 Aug 2025 23:50:34 +0000 (18:50 -0500)]
meson: Call meson.override_dependency() if Meson is new enough
This tells Meson that we intend libzstd_dep to be used by a parent project
if the parent looks for a dependency named "libzstd". Without this, the
mapping from "libzstd" to our variable libzstd_dep must be encoded in the
Meson wrap file or in the parent's meson.build.
Yann Collet [Mon, 25 Aug 2025 16:07:01 +0000 (09:07 -0700)]
Merge pull request #4469 from facebook/dependabot/github_actions/actions/setup-java-5.0.0
Bump actions/setup-java from 4.7.1 to 5.0.0
dependabot[bot] [Mon, 25 Aug 2025 09:00:57 +0000 (09:00 +0000)]
Bump actions/setup-java from 4.7.1 to 5.0.0
Bumps [actions/setup-java](https://github.com/actions/setup-java) from 4.7.1 to 5.0.0.
- [Release notes](https://github.com/actions/setup-java/releases)
- [Commits](https://github.com/actions/setup-java/compare/c5195efecf7bdfc987ee8bae7a71cb8b11521c00...dded0888837ed1f317902acf8a20df0ad188d165)
---
updated-dependencies:
- dependency-name: actions/setup-java
  dependency-version: 5.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Yann Collet [Fri, 22 Aug 2025 00:20:33 +0000 (17:20 -0700)]
Merge pull request #4440 from arpadpanyik-arm/convert_seq_sve2
AArch64: Add SVE2 path for convertSequences_noRepcodes
Arpad Panyik [Thu, 17 Jul 2025 07:46:01 +0000 (07:46 +0000)]
AArch64: Add SVE2 path for convertSequences_noRepcodes
Add an 8-way vector length agnostic (VLA) SVE2 code path for
convertSequences_noRepcodes. It works with any SVE vector length.
Relative performance to GCC-13 using: `./fullbench -b18 -l5 enwik5`
Neoverse-V2    before (Neon)  after (SVE2)  uplift
GCC-13:        100.000%       103.209%      1.032x
GCC-14:        100.309%       134.872%      1.344x
GCC-15:        100.355%       134.827%      1.343x
Clang-18:      123.614%       128.565%      1.040x
Clang-19:      123.587%       132.984%      1.076x
Clang-20:      123.629%       133.023%      1.075x

Cortex-A720    before (Neon)  after (SVE2)  uplift
GCC-13:        100.000%       116.032%      1.160x
GCC-14:        99.700%        116.648%      1.169x
GCC-15:        100.354%       117.047%      1.166x
Clang-18:      100.447%       116.762%      1.162x
Clang-19:      100.454%       116.627%      1.160x
Clang-20:      100.452%       116.649%      1.161x
Yann Collet [Thu, 21 Aug 2025 16:30:29 +0000 (09:30 -0700)]
Merge pull request #4463 from brad0/gnu_source_qsort
Check for build environment instead of just _GNU_SOURCE
Yann Collet [Wed, 20 Aug 2025 18:23:34 +0000 (11:23 -0700)]
Merge pull request #4465 from thiru-mcw/arm64_support
WOA_support:: Add CI setup for packaging Windows on ARM artifacts
Thirumalai Nagalingam [Wed, 20 Aug 2025 11:49:48 +0000 (17:19 +0530)]
CI: Enable MSVC ARM64 job using Github WOA runner
- Reintroduce the MSVC ARM64 build configuration with "Visual Studio 17 2022"
- Update runner to `windows-11-arm` (GitHub-hosted Windows on ARM)
Thirumalai Nagalingam [Wed, 20 Aug 2025 11:42:21 +0000 (17:12 +0530)]
CI: Add CI setup for packaging Win-ARM64 artifacts
Thirumalai Nagalingam [Wed, 20 Aug 2025 11:42:05 +0000 (17:12 +0530)]
CI: Update build_package.bat for CMake builds
Yann Collet [Wed, 20 Aug 2025 00:43:11 +0000 (17:43 -0700)]
Merge pull request #4464 from facebook/cli_traces_div0
fixed a potential division by 0 in the cli trace unit
Yann Collet [Wed, 20 Aug 2025 00:13:15 +0000 (17:13 -0700)]
fixed a potential division by 0 in the cli trace unit
Brad Smith [Tue, 19 Aug 2025 13:23:38 +0000 (09:23 -0400)]
Check for build environment instead of just _GNU_SOURCE
Fixes the build on OpenBSD and NetBSD. It is too easy for _GNU_SOURCE
to be defined even on non-Linux systems. Found via py-zstandard, which
embeds a copy of zstandard, because Python defines _GNU_SOURCE.
Also simplify the Linux check: there is no need to check the rest
of the symbol names.
Yann Collet [Wed, 20 Aug 2025 00:02:48 +0000 (17:02 -0700)]
Merge pull request #4419 from AZero13/patch-1
Check for job before releasing resources
Yann Collet [Mon, 18 Aug 2025 16:10:13 +0000 (09:10 -0700)]
Merge pull request #4462 from facebook/dependabot/github_actions/actions/checkout-5
Bump actions/checkout from 4 to 5
dependabot[bot] [Mon, 18 Aug 2025 08:13:07 +0000 (08:13 +0000)]
Bump actions/checkout from 4 to 5
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Commits](https://github.com/actions/checkout/compare/v4...v5)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Yann Collet [Sun, 17 Aug 2025 19:14:42 +0000 (12:14 -0700)]
Merge pull request #4459 from Margen67/premake
Remove need for trailing forward slash in dir
Margen67 [Sun, 17 Aug 2025 07:44:39 +0000 (00:44 -0700)]
Remove need for trailing forward slash in dir
Yann Collet [Mon, 28 Jul 2025 19:01:58 +0000 (11:01 -0800)]
Merge pull request #4448 from Cyan4973/install_oses
regroup list of OSes for install inside common variable
Yann Collet [Wed, 23 Jul 2025 22:59:23 +0000 (15:59 -0700)]
regroup list of OSes for install inside common variable
within lib/install_oses.mk.
fixes #4445
Yann Collet [Mon, 28 Jul 2025 15:33:09 +0000 (07:33 -0800)]
Merge pull request #4450 from facebook/dependabot/github_actions/github/codeql-action-3.29.4
Bump github/codeql-action from 3.28.9 to 3.29.4
dependabot[bot] [Mon, 28 Jul 2025 06:30:43 +0000 (06:30 +0000)]
Bump github/codeql-action from 3.28.9 to 3.29.4
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.9 to 3.29.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/9e8d0789d4a0fa9ceb6b1738f7e269594bdd67f0...4e828ff8d448a8a6e532957b1811f387a63867e8)
---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.4
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Yann Collet [Thu, 24 Jul 2025 18:07:16 +0000 (10:07 -0800)]
Merge pull request #4447 from facebook/android-cmake
added android cmake build
Yann Collet [Wed, 23 Jul 2025 23:03:37 +0000 (15:03 -0800)]
Merge pull request #4413 from arpadpanyik-arm/huf_decode2x
AArch64: Enhance struct access in Huffman decode 2X
Yann Collet [Wed, 23 Jul 2025 23:01:36 +0000 (15:01 -0800)]
Merge pull request #4443 from facebook/opt_simplify_4442
simplify sequence resolution in zstd_opt
Yann Collet [Wed, 23 Jul 2025 21:54:18 +0000 (14:54 -0700)]
added android cmake build
is expected to fail, due to #4444
Yann Collet [Sat, 19 Jul 2025 04:21:47 +0000 (21:21 -0700)]
simplify sequence resolution in zstd_opt
initially hinted by @pitaj in #4442
Yann Collet [Sat, 19 Jul 2025 02:55:47 +0000 (18:55 -0800)]
Merge pull request #4394 from AZero13/zstd
Remove redundant setting of allJobsCompleted to 1
Yann Collet [Sat, 19 Jul 2025 02:54:49 +0000 (18:54 -0800)]
Merge pull request #4418 from arpadpanyik-arm/decode_seq_opt
AArch64: Improve ZSTD_decodeSequence performance
Yann Collet [Sat, 19 Jul 2025 02:54:24 +0000 (18:54 -0800)]
Merge pull request #4435 from zijianli1234/dev
add riscv ci
Yann Collet [Mon, 14 Jul 2025 07:52:48 +0000 (23:52 -0800)]
Merge pull request #4429 from arpadpanyik-arm/convertSequences_Neon
Improve speed of ZSTD_compressSequencesAndLiterals using Neon
Yann Collet [Mon, 14 Jul 2025 07:52:32 +0000 (23:52 -0800)]
Merge pull request #4436 from facebook/dependabot/github_actions/cygwin/cygwin-install-action-6
Bump cygwin/cygwin-install-action from 5 to 6
dependabot[bot] [Mon, 14 Jul 2025 06:27:46 +0000 (06:27 +0000)]
Bump cygwin/cygwin-install-action from 5 to 6
Bumps [cygwin/cygwin-install-action](https://github.com/cygwin/cygwin-install-action) from 5 to 6.
- [Release notes](https://github.com/cygwin/cygwin-install-action/releases)
- [Commits](https://github.com/cygwin/cygwin-install-action/compare/f61179d72284ceddc397ed07ddb444d82bf9e559...f2009323764960f80959895c7bc3bb30210afe4d)
---
updated-dependencies:
- dependency-name: cygwin/cygwin-install-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Yann Collet [Sun, 13 Jul 2025 03:44:28 +0000 (19:44 -0800)]
Merge pull request #4433 from facebook/vs2025
removed VS2019 runners
ZijianLi [Sun, 13 Jul 2025 02:46:00 +0000 (10:46 +0800)]
add -DMEM_FORCE_MEMORY_ACCESS=0 in CI RVV test
Yann Collet [Fri, 11 Jul 2025 17:29:07 +0000 (10:29 -0700)]
removed VS2019 runners
replaced by one vs2025 runner,
which is badly named since it is still running MSVC 2022,
but it's a good test showing that the matrix is able to handle multiple MSVC versions.
Arpad Panyik [Tue, 8 Jul 2025 17:09:09 +0000 (17:09 +0000)]
AArch64: Enable optimized QEMU CI builds
Add missing `-O3` flag to the compilation of AArch64 SVE2 builds
executed by QEMU. This can decrease the CI job runtime considerably.
Arpad Panyik [Tue, 8 Jul 2025 17:07:41 +0000 (17:07 +0000)]
AArch64: Add Neon path for convertSequences_noRepcodes
Add a 4-way Neon implementation for the convertSequences_noRepcodes
function. Remove 'static' keywords from all of its implementations to
be able to add unit tests.
Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5`
Neoverse-V2    before    after
Clang-18:      100.000%  311.703%
Clang-19:      100.191%  311.714%
Clang-20:      100.181%  311.723%
GCC-13:        107.520%  252.309%
GCC-14:        107.652%  253.158%
GCC-15:        107.674%  253.168%

Cortex-A720    before    after
Clang-18:      100.000%  204.512%
Clang-19:      102.825%  204.600%
Clang-20:      102.807%  204.558%
GCC-13:        110.668%  203.594%
GCC-14:        110.684%  203.978%
GCC-15:        102.864%  204.299%
Co-authored by, Thomas Daubney <Thomas.Daubney@arm.com>
Arpad Panyik [Tue, 8 Jul 2025 17:05:45 +0000 (17:05 +0000)]
Improve ZSTD_get1BlockSummary
Add a faster scalar implementation of ZSTD_get1BlockSummary which
removes the data dependency of the accumulators in the hot loop to
leverage the superscalar potential of recent out-of-order CPUs.
The new algorithm leverages SWAR (SIMD Within A Register) methodology
to exploit the capabilities of 64-bit architectures. It achieves this
by packing two 32-bit data elements into a single 64-bit register,
enabling parallel operations on these subcomponents while ensuring
that the 32-bit boundaries prevent overflow, thereby optimizing
computational efficiency.
Corresponding unit tests are included.
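The SWAR packing described above can be illustrated with a short sketch. It assumes each sequence carries a 32-bit literal length and a 32-bit match length, and that each running total stays below 2^32, so the low lane never carries into the high lane. The type and function names are hypothetical and this is not the actual `ZSTD_get1BlockSummary` code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef struct { uint32_t litLength; uint32_t matchLength; } SeqLengths;
typedef struct { uint32_t litTotal;  uint32_t matchTotal;  } LengthTotals;

/* Pack both lengths into one 64-bit word: literals in the low 32 bits,
 * match lengths in the high 32 bits. One 64-bit add then updates both
 * totals at once. */
static uint64_t packLengths(SeqLengths s)
{
    return ((uint64_t)s.matchLength << 32) | s.litLength;
}

/* Assumption: each total stays below 2^32, so no carry crosses lanes. */
static LengthTotals sumLengthsSWAR(const SeqLengths* seqs, size_t nbSeqs)
{
    /* Two independent accumulators, so consecutive additions do not
     * depend on each other inside the loop. */
    uint64_t acc0 = 0, acc1 = 0;
    size_t n = 0;
    LengthTotals totals;
    assert(seqs != NULL || nbSeqs == 0);
    for (; n + 2 <= nbSeqs; n += 2) {
        acc0 += packLengths(seqs[n]);
        acc1 += packLengths(seqs[n + 1]);
    }
    if (n < nbSeqs) acc0 += packLengths(seqs[n]);  /* odd tail element */
    acc0 += acc1;
    totals.litTotal   = (uint32_t)acc0;          /* low lane */
    totals.matchTotal = (uint32_t)(acc0 >> 32);  /* high lane */
    return totals;
}
```

For the three sequences `{3,5}`, `{10,20}`, `{1,2}`, the sketch returns a literal total of 14 and a match total of 27.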
Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5`
Neoverse-V2    before    after
GCC-13:        100.000%  290.527%
GCC-14:        100.000%  291.714%
GCC-15:        99.914%   291.495%
Clang-18:      148.072%  264.524%
Clang-19:      148.075%  264.512%
Clang-20:      148.062%  264.490%

Cortex-A720    before    after
GCC-13:        100.000%  235.261%
GCC-14:        101.064%  234.903%
GCC-15:        112.977%  218.547%
Clang-18:      127.135%  180.359%
Clang-19:      127.149%  180.297%
Clang-20:      127.154%  180.260%
Co-authored by, Thomas Daubney <Thomas.Daubney@arm.com>
ZijianLi [Mon, 7 Jul 2025 15:07:39 +0000 (23:07 +0800)]
add compiler version check.
ZijianLi [Sun, 29 Jun 2025 07:36:25 +0000 (15:36 +0800)]
fix dereferencing type-punned pointer error
ZijianLi [Sun, 29 Jun 2025 07:33:50 +0000 (15:33 +0800)]
add riscv rvv ci
Yann Collet [Wed, 25 Jun 2025 11:47:01 +0000 (07:47 -0400)]
Merge pull request #4414 from arpadpanyik-arm/copy8
AArch64: Use better block COPY8
Rose [Tue, 24 Jun 2025 18:05:08 +0000 (14:05 -0400)]
Check for job before releasing
ZSTDMT_freeCCtx calls ZSTDMT_releaseAllJobResources. When ZSTDMT_freeCCtx is called after initialization has failed, ZSTDMT_releaseAllJobResources runs on a partially initialized context, resulting in a NULL pointer dereference.
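A minimal sketch of the guard, assuming a context whose job table may still be NULL when initialization fails partway through. `Pool`, `Job` and both functions are hypothetical stand-ins and do not reproduce the ZSTDMT code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct { void* buffer; } Job;
typedef struct { Job* jobs; unsigned nbJobs; } Pool;

static void Pool_releaseAllJobResources(Pool* pool)
{
    unsigned n;
    assert(pool != NULL);
    /* The guard: after a failed initialization the job table may never
     * have been allocated, so there is nothing to release. */
    if (pool->jobs == NULL) return;
    for (n = 0; n < pool->nbJobs; n++) {
        free(pool->jobs[n].buffer);
        pool->jobs[n].buffer = NULL;
    }
}

/* Safe to call on a NULL or partially initialized pool. */
static void Pool_free(Pool* pool)
{
    if (pool == NULL) return;
    Pool_releaseAllJobResources(pool);
    free(pool->jobs);
    free(pool);
}
```

Without the early return, the loop would index a NULL `jobs` pointer whenever `nbJobs` had been set before the table allocation failed.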
Rose [Mon, 26 May 2025 19:56:55 +0000 (15:56 -0400)]
Remove redundant setting of allJobsCompleted to 1
This will do it automatically.
Arpad Panyik [Tue, 24 Jun 2025 11:26:58 +0000 (11:26 +0000)]
AArch64: Improve ZSTD_decodeSequence performance
LLVM's alias-analysis sometimes fails to see that a static-array member
of a struct cannot alias other members. This patch:
- Reduces array accesses via struct indirection to aid load/store alias
analysis under Clang.
- Converts dynamic array indexing into conditional-move arithmetic,
eliminating branches and extra loads/stores on out-of-order CPUs.
- Reloads the bitstream only when match-length bits are consumed
(assuming each reload only needs to happen once per match-length
read), improving branch-prediction rates.
- Removes the UNLIKELY() hint, which recent compilers already handle
well without cost.
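The second bullet, replacing dynamic array indexing with conditional-move arithmetic, can be sketched as follows. The function name and the three-value setup are assumptions made for illustration; the actual change in ZSTD_decodeSequence is not reproduced here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick one of three values by index without
 * indexing an array in memory. Each ternary is a plain select, which
 * a compiler may lower to a conditional move (for example CSEL on
 * AArch64) instead of an address computation followed by a load. */
static uint32_t select3(uint32_t v0, uint32_t v1, uint32_t v2, unsigned idx)
{
    uint32_t r = v0;
    assert(idx < 3);
    r = (idx == 1) ? v1 : r;
    r = (idx == 2) ? v2 : r;
    return r;
}
```

Whether a conditional move is actually emitted depends on the compiler and target; the sketch only shows the source-level shape of the transformation.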
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":
                Clang-19  Clang-20  Clang-*  GCC-14   GCC-15
1#silesia.tar:  +11.556%  +16.203%  +0.240%  +2.216%  +7.891%
2#silesia.tar:  +15.493%  +21.140%  -0.041%  +2.850%  +9.926%
3#silesia.tar:  +16.887%  +22.570%  -0.183%  +3.056%  +10.660%
4#silesia.tar:  +17.785%  +23.315%  -0.262%  +3.343%  +11.187%
5#silesia.tar:  +18.125%  +24.175%  -0.466%  +3.350%  +11.228%
6#silesia.tar:  +17.607%  +23.339%  -0.591%  +3.175%  +10.851%
7#silesia.tar:  +17.463%  +22.837%  -0.486%  +3.292%  +10.868%
* Requires Clang-21 support from LLVM commit hash
`a53003fe23cb6c871e72d70ff2d3a075a7490da2`
(Clang-21 hasn't been released as of this writing)
Co-authored by:
David Sherwood, David.Sherwood@arm.com
Ola Liljedahl, Ola.Liljedahl@arm.com
Arpad Panyik [Fri, 20 Jun 2025 15:29:17 +0000 (15:29 +0000)]
AArch64: Enhance struct access in Huffman decode 2X
In the multi-stream multi-symbol Huffman decoder GCC generates
suboptimal code - emitting more loads for HUF_DEltX2 struct member
accesses. Forcing it to use 32-bit loads and bit arithmetic to extract
the necessary parts (UBFX) improves the overall decode speed.
Also avoid integer type conversions in the symbol decodes, which
leads to better instruction selection in table lookup accesses.
On AArch64 the decoder no longer runs into register-pressure limits,
so we can simplify the hot path and improve throughput.
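The single-load idea from the first paragraph can be sketched like this. It assumes a 4-byte table entry with a 16-bit `sequence`, an 8-bit `nbBits` and an 8-bit `length` field, and a little-endian target; `Entry4` and its accessors are hypothetical names and this is not the code from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 4-byte entry layout, for illustration only. */
typedef struct { uint16_t sequence; uint8_t nbBits; uint8_t length; } Entry4;

/* Read the whole entry with one 32-bit load. memcpy avoids strict
 * aliasing problems and normally compiles to a single load. */
static uint32_t Entry4_load32(const Entry4* e)
{
    uint32_t w;
    assert(sizeof(Entry4) == sizeof(w));
    memcpy(&w, e, sizeof(w));
    return w;
}

/* Field extraction by shift and mask (little-endian assumption).
 * On AArch64 these are candidates for UBFX bit-field extracts. */
static uint16_t Entry4_sequence(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
static uint8_t  Entry4_nbBits(uint32_t w)   { return (uint8_t)((w >> 16) & 0xFFu); }
static uint8_t  Entry4_length(uint32_t w)   { return (uint8_t)(w >> 24); }
```

The trade is one load plus cheap register arithmetic instead of several narrow loads for the individual struct members.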
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":
                Clang-20  Clang-*  GCC-13   GCC-14   GCC-15
1#silesia.tar:  +0.820%   +1.365%  +2.480%  +1.348%  +0.987%
2#silesia.tar:  +0.426%   +0.784%  +1.218%  +0.665%  +0.554%
3#silesia.tar:  +0.112%   +0.389%  +0.508%  +0.188%  +0.261%
* Requires Clang-21 support from LLVM commit hash
`a53003fe23cb6c871e72d70ff2d3a075a7490da2`
(Clang-21 hasn't been released as of this writing)
Yann Collet [Mon, 23 Jun 2025 13:32:14 +0000 (06:32 -0700)]
Merge pull request #4417 from facebook/dependabot/github_actions/msys2/setup-msys2-2.28.0
Bump msys2/setup-msys2 from 2.27.0 to 2.28.0
dependabot[bot] [Mon, 23 Jun 2025 06:24:00 +0000 (06:24 +0000)]
Bump msys2/setup-msys2 from 2.27.0 to 2.28.0
Bumps [msys2/setup-msys2](https://github.com/msys2/setup-msys2) from 2.27.0 to 2.28.0.
- [Release notes](https://github.com/msys2/setup-msys2/releases)
- [Changelog](https://github.com/msys2/setup-msys2/blob/main/CHANGELOG.md)
- [Commits](https://github.com/msys2/setup-msys2/compare/61f9e5e925871ba6c9e3e8da24ede83ea27fa91f...40677d36a502eb2cf0fb808cc9dec31bf6152638)
---
updated-dependencies:
- dependency-name: msys2/setup-msys2
  dependency-version: 2.28.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Yann Collet [Sun, 22 Jun 2025 03:31:26 +0000 (20:31 -0700)]
Merge pull request #4415 from bgilbert/buildtype
meson: drop unused variable
Yann Collet [Sat, 21 Jun 2025 19:37:08 +0000 (12:37 -0700)]
Merge pull request #4416 from facebook/test_largeDictionary
added test-largeDictionary to dev-long CI script
Yann Collet [Sat, 21 Jun 2025 19:01:07 +0000 (12:01 -0700)]
update tests duration indications
Yann Collet [Sat, 21 Jun 2025 17:55:34 +0000 (10:55 -0700)]
added test-largeDictionary to dev-long CI script
Yann Collet [Sat, 21 Jun 2025 18:33:44 +0000 (11:33 -0700)]
Merge pull request #4402 from mugitya03/tests
Release resources in error paths via cleanup
jinyaoguo [Sat, 21 Jun 2025 17:43:47 +0000 (13:43 -0400)]
fix
jinyaoguo [Sat, 21 Jun 2025 17:03:13 +0000 (13:03 -0400)]
delete
jinyaoguo [Sat, 21 Jun 2025 16:57:12 +0000 (12:57 -0400)]
merge
Benjamin Gilbert [Sat, 21 Jun 2025 06:34:13 +0000 (23:34 -0700)]
meson: drop unused variable
Arpad Panyik [Fri, 20 Jun 2025 14:48:33 +0000 (14:48 +0000)]
AArch64: Use better block copy8
The vector copy is only necessary for 16-byte blocks on AArch64.
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":
                Clang-19  Clang-20  GCC-14   GCC-15
1#silesia.tar:  +0.316%   +0.865%   +0.025%  +0.096%
2#silesia.tar:  +0.689%   +1.374%   +0.027%  +0.065%
3#silesia.tar:  +0.811%   +1.654%   +0.034%  +0.033%
4#silesia.tar:  +0.912%   +1.755%   +0.027%  +0.042%
5#silesia.tar:  +0.995%   +1.826%   +0.062%  +0.094%
6#silesia.tar:  +0.976%   +1.777%   +0.065%  +0.104%
7#silesia.tar:  +0.910%   +1.738%   +0.077%  +0.110%
Yann Collet [Fri, 20 Jun 2025 06:41:38 +0000 (23:41 -0700)]
Merge pull request #4367 from ClickHouse/cfi
Add unwind information in huf_decompress_amd64.S
Yann Collet [Thu, 19 Jun 2025 21:32:32 +0000 (14:32 -0700)]
Merge pull request #4412 from Cyan4973/rm_bd
remove duplicate
Yann Collet [Wed, 18 Jun 2025 22:07:32 +0000 (15:07 -0700)]
removed duplicate
this file is already present as `largeDictionary.c`
Yann Collet [Wed, 18 Jun 2025 20:48:54 +0000 (13:48 -0700)]
Merge pull request #4411 from arpadpanyik-arm/hist_sve2
AArch64: Add SVE2 implementation of histogram computation
Yann Collet [Mon, 16 Jun 2025 17:54:43 +0000 (10:54 -0700)]
Merge pull request #4409 from bgilbert/meson-license
meson: use SPDX expression for license
Yann Collet [Mon, 16 Jun 2025 16:01:58 +0000 (09:01 -0700)]
Merge pull request #4408 from mugitya03/MLK-3
Ensure BMK_timedFnState is always freed in benchMem
Benjamin Gilbert [Sun, 15 Jun 2025 02:47:54 +0000 (19:47 -0700)]
meson: use SPDX expression for license
This is the format recommended by Meson documentation.
Arpad Panyik [Wed, 11 Jun 2025 12:19:42 +0000 (12:19 +0000)]
Add unit tests for HIST_count_wksp
The following tests are included:
- Empty input scenario test.
- Workspace size and alignment tests.
- Symbol out-of-range tests.
- Cover multiple input sizes, vary permitted maximum symbol
values, and include diverse symbol distributions.
These tests verify count table correctness, maxSymbolValuePtr
updates, and error-handling paths. They also enable automated
regression testing of the core histogram logic.
jinyaoguo [Thu, 12 Jun 2025 23:52:58 +0000 (19:52 -0400)]
Ensure BMK_timedFnState is always freed in benchMem
When an error occurs in BMK_isSuccessful_runOutcome, the code
previously skipped the call to BMK_freeTimedFnState(tfs),
leaking the allocated tfs object.
Fixed by calling BMK_freeTimedFnState(tfs) before goto _cleanOut.
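The shape of the fix can be sketched as follows. `TimedState`, its create/free helpers, the live-object counter and `runBenchmark` are hypothetical names introduced for illustration; only the `_cleanOut` label and the free-before-goto pattern come from the commit message:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the timed-function state object. */
typedef struct { int unused; } TimedState;

static int g_liveStates = 0;  /* demo-only counter of unfreed states */

static TimedState* TimedState_create(void)
{
    TimedState* s = (TimedState*)malloc(sizeof(*s));
    if (s != NULL) g_liveStates++;
    return s;
}

static void TimedState_free(TimedState* s)
{
    if (s == NULL) return;
    g_liveStates--;
    free(s);
}

/* Returns 0 on success, 1 on failure. */
static int runBenchmark(int simulateFailure)
{
    int result = 0;
    void* scratch = malloc(64);
    TimedState* tfs = TimedState_create();
    if (scratch == NULL || tfs == NULL) {
        TimedState_free(tfs);
        result = 1;
        goto _cleanOut;
    }
    if (simulateFailure) {
        /* The fix: release tfs before jumping, because the shared
         * cleanup label below does not know about it. */
        TimedState_free(tfs);
        result = 1;
        goto _cleanOut;
    }
    TimedState_free(tfs);
_cleanOut:
    free(scratch);
    return result;
}
```

After both the failing and the succeeding call, the demo counter is back at zero, which is the property the fix restores.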
Arpad Panyik [Wed, 11 Jun 2025 12:14:22 +0000 (12:14 +0000)]
AArch64: Add SVE2 implementation of histogram computation
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions -
which compute a histogram for 16 byte chunks in a vector register.
On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram for the whole symbol space (0..255) of 16
bytes of input. However, we can only accumulate 15 such 16-byte strips
before possible overflow: 15 x 16 = 240 still fits in an 8-bit counter,
while a 16th strip could reach 256. So we need to widen the 8-bit
histogram accumulators to 16 bits and save them after every 240-byte
chunk of input.
To store all in registers we would need 32 128-bit registers. Longer
SVE2 vectors could help here, if such machines become available.
The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough in the worst case. However, an LZ pass precedes the
histogram calculation, so overflowing the 16-bit accumulators should be
impossible (my assumption).
The symbol distribution is also not uniform, the lower values are more
common, so we used a 3 pass algorithm to prevent stack spilling. In the
first pass we only compute histograms for 64 symbols (4-way SIMD) while
also computing the maximum symbol value. If we have symbol values
larger than 64 we start the second pass to compute the next 96 elements
of the histogram. The final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of histogram
generation gave the best overall results for performance.
This implementation is the best performing of a number of different
cache blocking schemes tested.
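The overflow arithmetic above (8-bit accumulators flushed into 16-bit counters every 240 bytes) can be modelled in scalar C. This is only a sketch of the counting scheme, not of the SVE2 code: it uses no HISTSEG instructions, the names are hypothetical, and it assumes at most 65535 input bytes so that the 16-bit counters cannot overflow:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIP_BYTES      16   /* bytes covered by one vector strip */
#define STRIPS_PER_FLUSH 15   /* 15 * 16 = 240 <= 255: fits in 8 bits */
#define FLUSH_BYTES      (STRIP_BYTES * STRIPS_PER_FLUSH)

/* Scalar model: count into 8-bit accumulators for at most 240 bytes,
 * then widen into the 16-bit table and start again. */
static void histogramWidening(uint16_t counts[256],
                              const uint8_t* src, size_t srcSize)
{
    uint8_t narrow[256];
    size_t pos = 0;
    assert(srcSize <= 65535);  /* sketch-only bound for 16-bit counters */
    memset(counts, 0, 256 * sizeof(counts[0]));
    while (pos < srcSize) {
        size_t chunk = srcSize - pos;
        size_t i;
        unsigned s;
        if (chunk > FLUSH_BYTES) chunk = FLUSH_BYTES;
        memset(narrow, 0, sizeof(narrow));
        /* Worst case: all bytes equal, one counter reaches 240. */
        for (i = 0; i < chunk; i++) narrow[src[pos + i]]++;
        /* Flush: widen the 8-bit counts into the 16-bit table. */
        for (s = 0; s < 256; s++)
            counts[s] = (uint16_t)(counts[s] + narrow[s]);
        pos += chunk;
    }
}
```

An input of 1000 identical bytes exercises the worst case: four full 240-byte chunks plus a 40-byte tail, with no 8-bit counter ever exceeding 240.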
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
                Clang-20  GCC-14
1#silesia.tar:  +6.173%   +5.987%
2#silesia.tar:  +5.200%   +5.011%
3#silesia.tar:  +4.332%   +5.031%
4#silesia.tar:  +2.789%   +3.064%
5#silesia.tar:  +2.028%   +1.838%
6#silesia.tar:  +1.562%   +1.340%
7#silesia.tar:  +1.160%   +0.959%
Yann Collet [Mon, 9 Jun 2025 22:19:47 +0000 (15:19 -0700)]
Merge pull request #4406 from Cyan4973/separate-cmake-tests
cmake CI tests refactor
Yann Collet [Mon, 9 Jun 2025 21:55:06 +0000 (21:55 +0000)]
remove global variable
overkill and leaky to transport a test result just in one place.
Yann Collet [Mon, 9 Jun 2025 17:57:59 +0000 (10:57 -0700)]
Merge pull request #4403 from dloidolt/fix_FUZZ_malloc_rand
fuzz: Fix FUZZ_malloc_rand() to return non-NULL for zero-size allocations
Yann Collet [Mon, 9 Jun 2025 17:06:36 +0000 (10:06 -0700)]
Merge pull request #4397 from xiaoge1001/free
Fix several locations with potential memory leak
shixuantong [Sat, 31 May 2025 16:37:57 +0000 (00:37 +0800)]
Fix several locations with potential memory leak
Yann Collet [Mon, 9 Jun 2025 07:24:03 +0000 (07:24 +0000)]
fix #4405
Yann Collet [Mon, 9 Jun 2025 07:09:51 +0000 (07:09 +0000)]
fixed cmake + windows + visual + clang-cl
by removing processing of resource files in this case
Yann Collet [Mon, 9 Jun 2025 06:47:28 +0000 (06:47 +0000)]
remove fail-fast so that the outcome of other tests can be observed
Yann Collet [Mon, 9 Jun 2025 03:47:33 +0000 (03:47 +0000)]
refactor: modularize CMakeLists.txt for better maintainability
- Split monolithic 235-line CMakeLists.txt into focused modules
- Main file reduced to 78 lines with clear section organization
- Created 5 specialized modules:
* ZstdVersion.cmake - CMake policies and version management
* ZstdOptions.cmake - Build options and platform configuration
* ZstdDependencies.cmake - External dependency management
* ZstdBuild.cmake - Build targets and validation
* ZstdPackage.cmake - Package configuration generation
Benefits:
- Improved readability and maintainability
- Better separation of concerns
- Easier debugging and modification
- Preserved 100% backward compatibility
- All existing build options and targets unchanged
The refactored build system passes all tests and maintains
identical functionality while being much easier to understand
and maintain.
Yann Collet [Sun, 8 Jun 2025 23:51:55 +0000 (23:51 +0000)]
add cmake build test with ZSTD_BUILD_TESTS disabled
should reproduce #4405 and fail
Yann Collet [Sun, 8 Jun 2025 22:40:15 +0000 (22:40 +0000)]
added macos arm64 tests
and comment out windows arm64 tests due to unacceptably long queue time