git.ipfire.org Git - thirdparty/linux.git/commit

author	Demian Shulhan <demyansh@gmail.com>
	Sun, 29 Mar 2026 07:43:38 +0000 (07:43 +0000)
committer	Eric Biggers <ebiggers@kernel.org>
	Sun, 29 Mar 2026 20:22:13 +0000 (13:22 -0700)
commit	63432fd625372a0e79fb00a4009af204f4edc013
tree	a6db54c5b5044e13a5779f230359c0f93abeaee3
parent	6e4d63e8993c681e1cec7d564b4e018e21e658d0

lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation

Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON
Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR
software implementation is slow, which creates a bottleneck in NVMe and
other storage subsystems.
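
For reference, the checksum being accelerated can be sketched in portable C as a bit-at-a-time loop. This is a minimal illustration, not the kernel's generic code: the function name crc64_nvme_bitwise is hypothetical, and the calling convention (pass 0 to start, pass the previous result to continue) is an assumption chosen to keep the example easy to chain. The polynomial constant is the bit-reflected form of the CRC-64/NVME polynomial 0xad93d23594c93659.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC-64/NVME polynomial 0xad93d23594c93659. */
#define CRC64_NVME_POLY_REFLECTED 0x9a6c9329ac4bc9b5ULL

/*
 * Hypothetical reference routine (not taken from the patch): consumes
 * one bit per inner-loop iteration. Pass crc = 0 for a fresh
 * computation, or the previous return value to continue over more data.
 */
uint64_t crc64_nvme_bitwise(uint64_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;	/* undo the final inversion of the previous call */
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? CRC64_NVME_POLY_REFLECTED : 0);
	}
	return ~crc;	/* final inversion */
}
```

Eight shift-and-XOR steps per input byte is what makes this style of loop slow compared with folding 16 bytes at a time using carry-less multiplies.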

The acceleration is implemented using C intrinsics (<arm_neon.h>) rather
than raw assembly for better readability and maintainability.
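
As background on what PMULL computes: in <arm_neon.h> the 64-bit form is exposed as vmull_p64(), which returns the 128-bit carry-less (polynomial, GF(2)) product of two 64-bit operands. The sketch below is a portable model of that operation only, so it can be read and run without an arm64 toolchain. The names struct u128 and clmul64 are hypothetical, and this is not code from the patch.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type for the portable model. */
struct u128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Carry-less multiply: like schoolbook multiplication, but partial
 * products are combined with XOR instead of addition, so no carries
 * propagate between bit positions.
 */
struct u128 clmul64(uint64_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i)	/* avoid an undefined shift by 64 */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

A CRC folding loop uses this product to multiply a 128-bit chunk of running state by precomputed constants, which lets it absorb 16 input bytes per step instead of one bit.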

Key highlights of this implementation:
- Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency
  spikes on large buffers.
- Pre-calculates and loads fold constants via vld1q_u64() to minimize
  register spilling.
- Benchmarks show the break-even point against the generic implementation
  is around 128 bytes. The PMULL path is enabled only for len >= 128.
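
The dispatch and chunking described in the list above can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the patch's code: all names are hypothetical, the "fast" routine is a stand-in that simply calls the bitwise loop (the real one would use PMULL inside scoped_ksimd()), and rounding each chunk down to a multiple of 16 bytes is an assumption based on the 128-bit NEON vector width.

```c
#include <stddef.h>
#include <stdint.h>

#define PMULL_MIN_LEN	128	/* break-even point quoted above */
#define SIMD_CHUNK	4096	/* at most 4KB per SIMD section */

/* Raw bitwise update with no initial or final inversion. */
static uint64_t crc64_update(uint64_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? 0x9a6c9329ac4bc9b5ULL : 0);
	}
	return crc;
}

/* Stand-in for the accelerated routine; hypothetical placeholder. */
static uint64_t crc64_update_fast(uint64_t crc, const uint8_t *p, size_t len)
{
	return crc64_update(crc, p, len);
}

uint64_t crc64_nvme_dispatch(uint64_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	if (len >= PMULL_MIN_LEN) {
		do {
			size_t chunk = len < SIMD_CHUNK ? len : SIMD_CHUNK;

			chunk &= ~(size_t)15;	/* assumed: whole 16-byte blocks */
			/* The kernel would hold the SIMD unit only for this call. */
			crc = crc64_update_fast(crc, p, chunk);
			p += chunk;
			len -= chunk;
		} while (len >= PMULL_MIN_LEN);
	}
	/* Short inputs and the tail fall through to the generic path. */
	return ~crc64_update(crc, p, len);
}
```

Bounding each SIMD section at 4KB means the task can be preempted between chunks, which is how a large buffer avoids holding the SIMD unit for one long stretch.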

Performance results (kunit crc_benchmark on Cortex-A72):
- Generic (len=4096): ~268 MB/s
- PMULL (len=4096): ~1556 MB/s (about 5.8x faster)

Signed-off-by: Demian Shulhan <demyansh@gmail.com>
Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crc/Kconfig
lib/crc/Makefile
lib/crc/arm64/crc64-neon-inner.c	[new file with mode: 0644]
lib/crc/arm64/crc64.h	[new file with mode: 0644]