AArch64: Improve ZSTD_decodeSequence performance
LLVM's alias analysis sometimes fails to prove that a static-array member
of a struct cannot alias the struct's other members. This patch:
- Reduces the number of array accesses made through struct indirection,
which helps load/store alias analysis under Clang.
- Converts dynamic array indexing into conditional-move arithmetic,
eliminating branches and extra loads/stores on out-of-order CPUs.
- Reloads the bitstream only when match-length bits have been consumed
(assuming at most one reload is needed per match-length read), which
improves branch-prediction rates.
- Removes the UNLIKELY() hint; recent compilers handle this branch well
without it.
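The first two bullets can be illustrated with a small, hypothetical sketch. `toy_state`, `apply_repeats_indexed`, `pick_select` and `apply_repeats_select` are illustrative names, not zstd code, and the move-to-front repeat-offset rule is simplified (zstd's real rule also depends on whether the literal length is zero). The second version copies the struct array into locals once and replaces `prevOffset[idx]` with conditional arithmetic; whether a given compiler actually emits conditional selects (for example AArch64 `CSEL`) is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical decoder state; not zstd's seqState_t. */
typedef struct {
    size_t prevOffset[3];   /* repeat-offset history, most recent first */
} toy_state;

/* Baseline: every step indexes the struct array through the pointer,
   so the index feeds an address computation plus a load or store. */
static size_t apply_repeats_indexed(toy_state *s, const unsigned *codes, size_t n) {
    size_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned idx = codes[i];              /* assumed to be 0, 1 or 2 */
        size_t off = s->prevOffset[idx];
        if (idx == 2) s->prevOffset[2] = s->prevOffset[1];
        if (idx != 0) s->prevOffset[1] = s->prevOffset[0];
        s->prevOffset[0] = off;               /* move-to-front */
        sum += off;
    }
    return sum;
}

/* Select one of three values with conditional arithmetic instead of r[idx]. */
static size_t pick_select(size_t r0, size_t r1, size_t r2, unsigned idx) {
    size_t r = r0;
    r = (idx == 1) ? r1 : r;
    r = (idx == 2) ? r2 : r;
    return r;
}

/* Same update, with the history copied into locals once and written back once. */
static size_t apply_repeats_select(toy_state *s, const unsigned *codes, size_t n) {
    size_t r0 = s->prevOffset[0], r1 = s->prevOffset[1], r2 = s->prevOffset[2];
    size_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned idx = codes[i];              /* assumed to be 0, 1 or 2 */
        size_t off = pick_select(r0, r1, r2, idx);
        size_t next2 = (idx == 2) ? r1 : r2;  /* r2 shifts only for idx 2 */
        size_t next1 = (idx == 0) ? r1 : r0;  /* r1 shifts for idx 1 or 2 */
        r2 = next2;
        r1 = next1;
        r0 = off;                             /* move-to-front */
        sum += off;
    }
    s->prevOffset[0] = r0;
    s->prevOffset[1] = r1;
    s->prevOffset[2] = r2;
    return sum;
}
```

Both update functions produce the same history; only the shape of the code handed to the compiler differs, with no dynamic indexing into the struct inside the loop of the second one.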
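The reload bullet can be sketched with a toy bit reader. Everything here is hypothetical: `toy_reader`, `toy_reload`, `toy_read` and `toy_read_extras` are illustrative names, the reader consumes bits forwards rather than backwards like zstd's `BIT_DStream_t`, and the field widths (offset at most 31 bits, match and literal lengths at most 16 bits) are assumptions chosen to make the bit budget easy to check. With at least 57 bits buffered on entry, a sequence that reads no match-length bits still has at least 26 bits left for the literal length, so under these assumptions the mid-sequence reload is only needed after match-length bits have been consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical forward, LSB-first bit reader (not zstd's BIT_DStream_t). */
typedef struct {
    uint64_t container;   /* buffered bits, next bit in the LSB */
    unsigned nbBits;      /* number of valid bits in container */
    const uint8_t *ptr;   /* next input byte */
    const uint8_t *end;   /* one past the last input byte */
} toy_reader;

/* Top up byte by byte; leaves at least 57 valid bits unless input runs out. */
static void toy_reload(toy_reader *r) {
    while (r->nbBits <= 56 && r->ptr < r->end) {
        r->container |= (uint64_t)*r->ptr++ << r->nbBits;
        r->nbBits += 8;
    }
}

/* Consume n bits (0 <= n <= 31); the caller guarantees they are buffered. */
static uint32_t toy_read(toy_reader *r, unsigned n) {
    assert(n <= 31 && n <= r->nbBits);
    uint32_t v = (uint32_t)(r->container & ((UINT64_C(1) << n) - 1));
    r->container >>= n;
    r->nbBits -= n;
    return v;
}

/* Read one sequence's extra bits; expects >= 57 buffered bits on entry. */
static void toy_read_extras(toy_reader *r, unsigned ofBits, unsigned mlBits,
                            unsigned llBits, uint32_t out[3]) {
    out[0] = toy_read(r, ofBits);     /* offset: at most 31 bits */
    out[1] = toy_read(r, mlBits);     /* match length: at most 16 bits */
    /* If mlBits == 0, at least 57 - 31 = 26 bits remain, enough for
       llBits <= 16, so reload only after match-length bits were read. */
    if (mlBits != 0)
        toy_reload(r);
    out[2] = toy_read(r, llBits);     /* literal length: at most 16 bits */
    toy_reload(r);                    /* restore the entry invariant */
}
```

The guard `mlBits != 0` is the only reload decision inside the sequence, so its outcome follows the match-length code directly.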
Decompression speed uplift on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":
                Clang-19  Clang-20   Clang-*    GCC-14    GCC-15
1#silesia.tar:  +11.556%  +16.203%   +0.240%   +2.216%   +7.891%
2#silesia.tar:  +15.493%  +21.140%   -0.041%   +2.850%   +9.926%
3#silesia.tar:  +16.887%  +22.570%   -0.183%   +3.056%  +10.660%
4#silesia.tar:  +17.785%  +23.315%   -0.262%   +3.343%  +11.187%
5#silesia.tar:  +18.125%  +24.175%   -0.466%   +3.350%  +11.228%
6#silesia.tar:  +17.607%  +23.339%   -0.591%   +3.175%  +10.851%
7#silesia.tar:  +17.463%  +22.837%   -0.486%   +3.292%  +10.868%
* A pre-release Clang-21 build that includes LLVM commit
`a53003fe23cb6c871e72d70ff2d3a075a7490da2` (Clang-21 had not been
released at the time of writing).
Co-authored-by: David Sherwood <David.Sherwood@arm.com>
Co-authored-by: Ola Liljedahl <Ola.Liljedahl@arm.com>