git.ipfire.org Git - thirdparty/zlib-ng.git/commit
Use vaddvq_u32 for adler32 NEON horizontal reduction [develop]
author Nathan Moin Vaziri <nathan@nathanm.com>
Tue, 31 Mar 2026 20:12:33 +0000 (13:12 -0700)
committer Hans Kristian Rosbach <hk-github@circlestorm.org>
Mon, 22 Jun 2026 18:03:57 +0000 (20:03 +0200)
commit 9071377c5926189c4ee58a1072b554a202e65ead
tree b33a1b6cd3badf24b8452a534136e293dc35becb
parent b1e704fef333371754831d0bac4f7c6a0a2f3400
Use vaddvq_u32 for adler32 NEON horizontal reduction

Replace interleaved pairwise reduction with vaddvq_u32 to break the
dependency chain between s1 and s2 modulo computations. The original
code merged both accumulators through a shared addp, serializing the
subsequent umull/lsr/msub chains. Independent reductions allow them
to execute in parallel.
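
The difference in data flow can be sketched as follows. This is an illustration only, not the code from arch/arm/adler32_neon.c: the function names `reduce_interleaved` and `reduce_independent`, the array-based signatures, and the `BASE` macro are assumptions made for the example, and the scalar branch exists only so the sketch compiles off AArch64.

```c
#include <stdint.h>

#define BASE 65521u /* Adler-32 modulus: largest prime below 2^16 */

#if defined(__aarch64__)
#include <arm_neon.h>

/* Interleaved shape (illustrative): one shared pairwise add feeds both
 * sums, so neither modulo can start before the merged vector is ready. */
static void reduce_interleaved(const uint32_t a[4], const uint32_t b[4],
                               uint32_t *s1, uint32_t *s2) {
    /* p = {a0+a1, a2+a3, b0+b1, b2+b3} */
    uint32x4_t p = vpaddq_u32(vld1q_u32(a), vld1q_u32(b));
    /* q = {sum(a), sum(b)} */
    uint32x2_t q = vpadd_u32(vget_low_u32(p), vget_high_u32(p));
    *s1 = vget_lane_u32(q, 0) % BASE;
    *s2 = vget_lane_u32(q, 1) % BASE;
}

/* Independent shape (illustrative): each accumulator gets its own ADDV,
 * so the two modulo chains share no intermediate value. */
static void reduce_independent(const uint32_t a[4], const uint32_t b[4],
                               uint32_t *s1, uint32_t *s2) {
    *s1 = vaddvq_u32(vld1q_u32(a)) % BASE;
    *s2 = vaddvq_u32(vld1q_u32(b)) % BASE;
}
#else
/* Scalar stand-ins that mirror the same lane arithmetic. */
static void reduce_interleaved(const uint32_t a[4], const uint32_t b[4],
                               uint32_t *s1, uint32_t *s2) {
    uint32_t p[4] = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    *s1 = (p[0] + p[1]) % BASE;
    *s2 = (p[2] + p[3]) % BASE;
}

static void reduce_independent(const uint32_t a[4], const uint32_t b[4],
                               uint32_t *s1, uint32_t *s2) {
    *s1 = (a[0] + a[1] + a[2] + a[3]) % BASE;
    *s2 = (b[0] + b[1] + b[2] + b[3]) % BASE;
}
#endif
```

Both shapes produce the same results; only the dependency structure differs, which is what would let the two modulo sequences overlap on an out-of-order core.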

On AArch64 this maps to the ADDV instruction. A compatibility shim
in neon_intrins.h emulates this on 32-bit ARM using vadd and vpadd.
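
A minimal sketch of what such a fallback might look like is shown here. The helper names `hsum_u32x4` and `hsum4` are hypothetical, and the actual shim in neon_intrins.h may be written differently; the scalar branch is only there so the sketch builds on targets without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: horizontal sum of four 32-bit lanes. */
static inline uint32_t hsum_u32x4(uint32x4_t v) {
#if defined(__aarch64__)
    return vaddvq_u32(v); /* single ADDV instruction */
#else
    /* 32-bit ARM: fold the high half onto the low half, then pairwise add. */
    uint32x2_t half = vadd_u32(vget_low_u32(v), vget_high_u32(v)); /* {v0+v2, v1+v3} */
    return vget_lane_u32(vpadd_u32(half, half), 0);                /* v0+v1+v2+v3 */
#endif
}

/* Array-based wrapper so callers need no NEON types. */
static uint32_t hsum4(const uint32_t lanes[4]) {
    return hsum_u32x4(vld1q_u32(lanes));
}
#else
/* Scalar stand-in for targets without NEON. */
static uint32_t hsum4(const uint32_t lanes[4]) {
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
#endif
```

In every branch the sum wraps modulo 2^32, matching the unsigned lane arithmetic of the intrinsics.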
arch/arm/adler32_neon.c
arch/arm/neon_intrins.h