git.ipfire.org Git - thirdparty/zstd.git/commit
AArch64: Enhance struct access in Huffman decode 2X 4413/head
author Arpad Panyik <Arpad.Panyik@arm.com>
Fri, 20 Jun 2025 15:29:17 +0000 (15:29 +0000)
committer Arpad Panyik <Arpad.Panyik@arm.com>
Mon, 23 Jun 2025 14:16:25 +0000 (14:16 +0000)
commit bd38fc2c5f21536bc523f9a42c5e642ea9407ae4
tree 7c3bec3db432c506c0fe46c47309f1f3799fc7aa
parent 7eefc221696aff82e934dcf2d6e1e795d80e7b20
AArch64: Enhance struct access in Huffman decode 2X

In the multi-stream, multi-symbol Huffman decoder, GCC generates
suboptimal code, emitting more loads than necessary for HUF_DEltX2
struct member accesses. Forcing it to use 32-bit loads and bit
arithmetic (UBFX) to extract the necessary parts improves the overall
decode speed.
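
The idea of reading a 4-byte table entry with one 32-bit load and
extracting its fields with shifts and masks can be sketched as follows.
This is a minimal illustration, not the code from this commit: the
`DEltX2` struct is a hypothetical stand-in for HUF_DEltX2 with an
assumed field layout, the `entry_*` helper names are invented for the
example, and a little-endian target is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for HUF_DEltX2: one 4-byte decode table entry.
 * The field order and widths are assumptions made for this sketch. */
typedef struct {
    uint16_t sequence;  /* decoded symbol(s) packed into 16 bits */
    uint8_t  nbBits;    /* number of bits consumed from the bitstream */
    uint8_t  length;    /* number of symbols held in `sequence` */
} DEltX2;

/* The shifts below rely on the entry being exactly 32 bits wide. */
_Static_assert(sizeof(DEltX2) == 4, "entry must be 4 bytes");

/* Read the whole entry as one 32-bit value. Going through memcpy keeps
 * the access well defined in C; optimizing compilers usually turn a
 * fixed-size memcpy like this into a single load instruction. */
static uint32_t entry_load32(const DEltX2* table, size_t idx)
{
    uint32_t v;
    memcpy(&v, &table[idx], sizeof v);
    return v;
}

/* Field extraction by shift and mask (little-endian layout assumed).
 * On AArch64 a compiler can implement these with bitfield-extract
 * instructions such as UBFX instead of separate narrow loads. */
static uint16_t entry_sequence(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }
static uint8_t  entry_nbBits(uint32_t v)   { return (uint8_t)((v >> 16) & 0xFFu); }
static uint8_t  entry_length(uint32_t v)   { return (uint8_t)(v >> 24); }
```

A caller would load the entry once per decoded symbol and derive all
three fields from the same register value, rather than issuing a
separate memory access per struct member.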

Also avoid integer type conversions in the symbol decodes, which
leads to better instruction selection in table lookup accesses.

On AArch64 the decoder no longer runs into register-pressure limits,
so we can simplify the hot path and improve throughput.

Decompression speed uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-20   Clang-*    GCC-13    GCC-14    GCC-15
 1#silesia.tar:   +0.820%   +1.365%   +2.480%   +1.348%   +0.987%
 2#silesia.tar:   +0.426%   +0.784%   +1.218%   +0.665%   +0.554%
 3#silesia.tar:   +0.112%   +0.389%   +0.508%   +0.188%   +0.261%

* Requires Clang-21 support from LLVM commit hash
  `a53003fe23cb6c871e72d70ff2d3a075a7490da2`
  (Clang-21 hasn’t been released as of this writing)
lib/decompress/huf_decompress.c