AArch64: Improve ZSTD_decodeSequence performance 4418/head
author     Arpad Panyik <Arpad.Panyik@arm.com>
           Tue, 24 Jun 2025 11:26:58 +0000 (11:26 +0000)
committer  Arpad Panyik <Arpad.Panyik@arm.com>
           Tue, 24 Jun 2025 12:22:23 +0000 (12:22 +0000)
commit     a28e8182b1aa6e1e17bcdd099630ea67c4143d32
tree       2f4464c1e1417cb8cb7d5ee97881b664d42c0b81
parent     3c3b8274c517727952927c705940eb90c10c736f
AArch64: Improve ZSTD_decodeSequence performance

LLVM's alias-analysis sometimes fails to see that a static-array member
of a struct cannot alias other members. This patch:

- Reduces array accesses via struct indirection to aid load/store alias
  analysis under Clang.
- Converts dynamic array indexing into conditional-move arithmetic,
  eliminating branches and extra loads/stores on out-of-order CPUs.
- Reloads the bitstream only when match-length bits are consumed
  (assuming each reload only needs to happen once per match-length
  read), improving branch-prediction rates.
- Removes the UNLIKELY() hint; recent compilers already handle this
  branch well without it, so dropping the hint costs nothing.
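
As a rough illustration of the first two points (this is not the code
from the patch), the sketch below contrasts dynamic indexing through a
struct pointer with a version that loads the array into locals once and
picks values with ternaries. The type `demo_state_t` and both
`demo_pick_*` functions are hypothetical names invented for this
example; they only mimic the shape of a "recent offsets" history. A
compiler is free to lower the ternaries to conditional selects (such as
`csel` on AArch64), though that is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a decoder state; NOT zstd's real seqState_t. */
typedef struct {
    size_t bitContainer;   /* unrelated member sharing the struct */
    size_t prevOffset[3];  /* static-array member: three recent offsets */
} demo_state_t;

/* Baseline: pick entry `idx` and move it to the front, using dynamic
 * indexing and conditional stores through the struct pointer. */
static size_t demo_pick_indexed(demo_state_t* s, unsigned idx)
{
    assert(idx < 3);
    size_t const picked = s->prevOffset[idx];
    if (idx == 2) s->prevOffset[2] = s->prevOffset[1];
    if (idx >= 1) s->prevOffset[1] = s->prevOffset[0];
    s->prevOffset[0] = picked;
    return picked;
}

/* Same result: load each array element once into a local, choose values
 * with ternaries, then store each element exactly once. No load depends
 * on `idx`, and no store is guarded by a branch. */
static size_t demo_pick_select(demo_state_t* s, unsigned idx)
{
    assert(idx < 3);
    size_t const p0 = s->prevOffset[0];
    size_t const p1 = s->prevOffset[1];
    size_t const p2 = s->prevOffset[2];
    size_t const picked = (idx == 0) ? p0 : (idx == 1) ? p1 : p2;
    s->prevOffset[2] = (idx == 2) ? p1 : p2;
    s->prevOffset[1] = (idx >= 1) ? p0 : p1;
    s->prevOffset[0] = picked;
    return picked;
}
```

Both functions leave the history in the same state for every valid
`idx`. The second form trades a few unconditional stores for the
absence of index-dependent loads and branches; whether that is a win
depends on the compiler and the target.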

Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-19  Clang-20   Clang-*    GCC-14    GCC-15
 1#silesia.tar:  +11.556%  +16.203%   +0.240%   +2.216%   +7.891%
 2#silesia.tar:  +15.493%  +21.140%   -0.041%   +2.850%   +9.926%
 3#silesia.tar:  +16.887%  +22.570%   -0.183%   +3.056%  +10.660%
 4#silesia.tar:  +17.785%  +23.315%   -0.262%   +3.343%  +11.187%
 5#silesia.tar:  +18.125%  +24.175%   -0.466%   +3.350%  +11.228%
 6#silesia.tar:  +17.607%  +23.339%   -0.591%   +3.175%  +10.851%
 7#silesia.tar:  +17.463%  +22.837%   -0.486%   +3.292%  +10.868%

* The Clang-* column requires Clang-21 support from LLVM commit hash
  `a53003fe23cb6c871e72d70ff2d3a075a7490da2`
  (Clang-21 hasn’t been released as of this writing).

Co-authored by:
 David Sherwood, David.Sherwood@arm.com
 Ola Liljedahl, Ola.Liljedahl@arm.com
lib/decompress/zstd_decompress_block.c