git.ipfire.org Git - thirdparty/zlib-ng.git/commit

Improve sse41 adler32 performance

Rather than doing opportunistic aligned loads, we can do scalar
unaligned loads into our two halves of the checksum until we hit
alignment. Then, we can subtract from the max number of sums for the
first run through the loop.

This allows us to force aligned loads for unaligned buffers (likely a
common case for arbitrary runs of memory). This is not meaningful after
Nehalem, but on pre-Nehalem architectures it makes a substantial
difference to performance and is more foolproof than hoping for an
aligned buffer.

Improvement is around 44-50% for unaligned worst-case scenarios.
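
The alignment prologue described in the message can be sketched in portable C as follows. This is an illustration only, not the code from the commit: the function names `adler32_align_sketch` and `adler32_reference` are hypothetical, and the inner byte loop is a scalar stand-in for the aligned SSE4.1 loads the real implementation would issue. It shows the two ideas from the message: feed bytes into the two checksum halves one at a time until the pointer is 16-byte aligned, then subtract those bytes from the sum budget (NMAX) of the first run so the deferred modulo cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521u /* largest prime below 2^16 */
#define NMAX 5552u  /* max bytes summed before a modulo is required */

/* Hypothetical reference: reduces after every byte, no deferral. */
static uint32_t adler32_reference(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffffu, b = adler >> 16;
    while (len--) {
        a = (a + *buf++) % BASE;
        b = (b + a) % BASE;
    }
    return (b << 16) | a;
}

/* Hypothetical sketch of the align-first strategy. */
static uint32_t adler32_align_sketch(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffffu; /* low half of the checksum */
    uint32_t b = adler >> 16;     /* high half of the checksum */

    /* Bytes needed to reach the next 16-byte boundary (0..15). */
    size_t head = (size_t)((0 - (uintptr_t)buf) & 15u);
    if (head > len)
        head = len;

    /* The scalar head consumes part of the first run's sum budget. */
    size_t budget = NMAX - head;
    len -= head;
    while (head--) {
        a += *buf++;
        b += a;
    }

    /* buf is now 16-byte aligned (or the input is exhausted). */
    while (len > 0) {
        size_t n = len < budget ? len : budget;
        len -= n;
        /* Stand-in for the vector loop using aligned loads. */
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= BASE;
        b %= BASE;
        budget = NMAX; /* later runs get the full budget */
    }

    /* Covers inputs that ended inside the scalar head. */
    a %= BASE;
    b %= BASE;
    return (b << 16) | a;
}
```

Calling `adler32_align_sketch(1, buf + off, len)` for any byte offset `off` should give the same result as the reference, since only the grouping of the deferred modulo operations changes.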

author	Adam Stylinski <kungfujesus06@gmail.com>
	Sat, 5 Feb 2022 21:15:46 +0000 (16:15 -0500)
committer	Hans Kristian Rosbach <hk-github@circlestorm.org>
	Thu, 24 Feb 2022 14:40:33 +0000 (15:40 +0100)
commit	cd37e12f72e8b4265bf890072c2c1193991c6890
tree	80e743c46be706de03167a6f6e57a02f4229db28
parent	2b1a033f5efcf70ec6193c674628e3c4ba691248