git.ipfire.org Git - thirdparty/zlib-ng.git/commit

author	Adam Stylinski <kungfujesus06@gmail.com>
	Thu, 12 Sep 2024 21:47:30 +0000 (17:47 -0400)
committer	Hans Kristian Rosbach <hk-github@circlestorm.org>
	Sat, 12 Oct 2024 11:21:03 +0000 (13:21 +0200)
commit	e874b34e1a8d975031d7e66e3050039de08686d0
tree	68b28822c3787c635b604347a82138d877bcee95
parent	b52e70341700ac5fd68ca8584b87561911cf8a75

Make chunkset_avx2 half chunk aware

This gives us appreciable gains on a number of fronts. The first is that
we're now inlining a hot function that was previously reached through a
dispatch on a regular basis. Another is that we can now do a safe lagged
copy for distances smaller than a full 32-byte chunk, so CHUNKCOPY
regains its advantage for smaller sizes without another dispatch to a
separate function.
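To illustrate why a narrower chunk makes the lagged copy safe for smaller distances, here is a minimal portable sketch. It is not zlib-ng's code: `lagged_copy` and its `width` parameter are hypothetical, and `memcpy` stands in for a vector load/store pair. Copying `width` bytes from `out - dist` to `out` never overlaps when `dist >= width`, so a 16-byte half chunk accepts distances of 16 to 31 that a 32-byte chunk must reject.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes to out from out - dist (an overlapping LZ77-style match),
   width bytes per step. Each step reads [out - dist, out - dist + width),
   which ends at or before out whenever dist >= width, so the source and
   destination of every memcpy are disjoint. */
static uint8_t *lagged_copy(uint8_t *out, size_t dist, size_t len, size_t width) {
    assert(width > 0 && dist >= width);
    while (len >= width) {
        memcpy(out, out - dist, width); /* stands in for a vector load + store */
        out += width;
        len -= width;
    }
    for (; len > 0; len--, out++) /* short tail, one byte at a time */
        *out = *(out - dist);
    return out;
}
```

With `width = 32`, a call with `dist = 20` trips the assertion; with `width = 16`, the same match is copied in whole 16-byte steps.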

We're also now doing two overlapping writes at once and letting the CPU
do its store forwarding. This was an enhancement @dougallj had suggested
a while back.
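A portable sketch of the "two overlapping writes" loop shape, under stated assumptions: `pattern_fill` and `CHUNK` are hypothetical names, and a 32-byte array filled with the period-`dist` pattern stands in for the AVX2 register. Advancing by the largest multiple of `dist` that fits in a chunk keeps every store in phase, and issuing two stores per iteration makes them overlap whenever that advance is under 32 bytes. Any store-forwarding benefit is a hardware effect that plain C cannot show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 32 /* bytes in one AVX2 register */

/* Fill out[0..len) with the period-dist pattern that starts at out - dist,
   always storing whole CHUNK-byte patterns. Up to CHUNK - 1 bytes past
   out + len may be written, so the caller must provide that much slack. */
static uint8_t *pattern_fill(uint8_t *out, size_t dist, size_t len) {
    assert(dist > 0 && dist <= CHUNK);
    const uint8_t *from = out - dist;
    uint8_t chunk[CHUNK];
    for (size_t i = 0; i < CHUNK; i++)
        chunk[i] = from[i % dist];

    /* Largest multiple of dist that fits in a chunk. */
    size_t adv = CHUNK - (CHUNK % dist);

    /* Two stores per iteration; the second starts adv bytes after the
       first, so they overlap whenever adv < CHUNK. */
    while (len >= 2 * adv) {
        memcpy(out, chunk, CHUNK);
        memcpy(out + adv, chunk, CHUNK);
        out += 2 * adv;
        len -= 2 * adv;
    }
    /* Remainder: one store at a time. */
    while (len > 0) {
        memcpy(out, chunk, CHUNK);
        size_t step = len < adv ? len : adv;
        out += step;
        len -= step;
    }
    return out;
}
```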

Additionally, the "half chunk mag" here is fundamentally less
complicated because it doesn't require synthesizing cross-lane permutes
with a blend operation, so we can optimistically try it first if len is
small enough that a full 32-byte chunk doesn't make sense.
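Why the half chunk avoids cross-lane work can be checked with a small model. AVX2's byte shuffle (`vpshufb`) only moves bytes within each 128-bit lane. The sketch below assumes the source bytes sit at the same register positions as in memory and asks whether any byte of the high lane must come from the low lane. The helper names and the `len <= 16` cutoff in `choose_width` are hypothetical; real code also has to account for `dist`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANE_BYTES 16 /* one 128-bit lane; an AVX2 register holds two */

/* Output byte i of a pattern with period dist repeats source byte i % dist. */
static size_t pattern_index(size_t i, size_t dist) {
    assert(dist > 0);
    return i % dist;
}

/* True if some high-lane output byte (i >= 16) needs a low-lane source
   byte (< 16), which an in-lane shuffle alone cannot provide. */
static bool needs_cross_lane(size_t width, size_t dist) {
    assert(width == LANE_BYTES || width == 2 * LANE_BYTES);
    for (size_t i = LANE_BYTES; i < width; i++)
        if (pattern_index(i, dist) < LANE_BYTES)
            return true;
    return false;
}

/* Hypothetical policy: prefer the half chunk when len fits in one lane. */
static size_t choose_width(size_t len) {
    return len <= LANE_BYTES ? LANE_BYTES : 2 * LANE_BYTES;
}
```

In this model every `dist` below 32 needs cross-lane movement at 32-byte width, while a 16-byte pattern never does, which is consistent with trying the half chunk first for short lengths.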

arch/arm/chunkset_neon.c
arch/x86/chunkset_avx2.c
arch/x86/chunkset_sse2.c
arch/x86/chunkset_ssse3.c
chunkset_tpl.h