git.ipfire.org Git - thirdparty/zlib-ng.git/log

]> git.ipfire.org Git - thirdparty/zlib-ng.git/log

projects / thirdparty / zlib-ng.git / log

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 3 Feb 2022 23:02:14 +0000 (15:02 -0800)]

Remove code coverage libraries which introduce memory leak on exit.

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 3 Feb 2022 22:31:10 +0000 (14:31 -0800)]

Enable code coverage build only when code coverage report name specified.

commit | commitdiff | tree

Mika T. Lindqvist [Tue, 1 Feb 2022 14:20:38 +0000 (16:20 +0200)]

[Benchmarks] Fix adler32/vmx benchmark not found under 32-bit PowerPC

commit | commitdiff | tree

Adam Stylinski [Sun, 23 Jan 2022 16:59:57 +0000 (11:59 -0500)]

Marginal improvement by pipelining loads on NEON

The ld1{4 reg} variant saves us instructions
and only adds 3 cycles of latency to load 3
more neon/asimd registers worth of data.

commit | commitdiff | tree

Adam Stylinski [Fri, 28 Jan 2022 15:00:07 +0000 (10:00 -0500)]

More than double adler32 performance with altivec

Bits of low hanging and high hanging fruit in this round of
optimization.  Altivec has a sum characters into 4 lanes of integers
instructions (intrinsic vec_sum4s) that seems basically made for this
algorithm.  Additionally, there's a similar multiply-accumulate routine
that takes two character vectors for input and outputs a vector of 4
ints for their respective adjacent sums.  This alone was a good amount
of the performance gains.

Additionally, the shifting by 4 was still done in the loop when it was
easy to roll outside of the loop and do only once.  This removed some
latency for a dependent operand to be ready.  We also unrolled the loop
with independent sums, though, this only seems to help for much larger
input sizes.

Additionally, we reduced feeding the two 16 bit halves of the sum simply
by packing them into an aligned allocation in the stack next to each
other.  Then, when loaded, we permute and shift the values to two
separate vector registers from the same input registers.  The separation
of these scalars probably could have been done in vector registers
through some tricks but we need them in scalar GPRs anyhow every time
they leave the loop so it was naturally better to keep those separate
before hitting the vectorized code.

For the horizontal addition, the code was modified to use a sequence of
shifts and adds to produce a vector sum in the first lane.  Then, the
much cheaper vec_ste was used to store the value into a general purpose
register rather than vec_extract.

Lastly, instead of doing the relatively expensive modulus in GPRs after
we perform the scalar operations to align all of the loads in the loop,
we can instead reduce "n" here for the first round to be n minus the
alignment offset.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 31 Jan 2022 15:54:52 +0000 (07:54 -0800)]

Added sanitizer identification to CMake CI instance names.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 31 Jan 2022 15:51:01 +0000 (07:51 -0800)]

Fixed S390X CI instance code coverage report names not unique.

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 30 Jan 2022 16:30:34 +0000 (08:30 -0800)]

Remove qemu-run variable which is already defined in the toolchain files. Only configure script would need this variable defined.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 21 Jan 2022 20:15:36 +0000 (12:15 -0800)]

Fixed typo when undefined behavior sanitizer is not supported.

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 27 Jan 2022 00:07:31 +0000 (16:07 -0800)]

Remove unnecessary compiler specification from mingw configs.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 26 Jan 2022 22:36:26 +0000 (14:36 -0800)]

Removed unused CMake includes in benchmark cmake.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 26 Jan 2022 22:36:07 +0000 (14:36 -0800)]

Fixed wrong cpu check variable used for ARM in adler benchmarks.

commit | commitdiff | tree

Ilya Leoshkevich [Mon, 31 Jan 2022 12:19:18 +0000 (13:19 +0100)]

IBM Z: Upgrade self-hosted builder to v2.287.1

commit | commitdiff | tree

Mika T. Lindqvist [Tue, 25 Jan 2022 14:11:39 +0000 (16:11 +0200)]

[ARM] rename cmake/configure macros check_{acle,neon}_intrinsics to check_{acle,neon}_compiler_flag
* Currently these macros only check that the compiler flag(s) are supported, not that the compiler supports the actual intrinsics

commit | commitdiff | tree

Mika T. Lindqvist [Mon, 24 Jan 2022 06:21:10 +0000 (08:21 +0200)]

[ARM] Use armv8-crc+simd when compiling ACLE code on toolchains that don't enable FPU by default.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 26 Jan 2022 19:30:34 +0000 (11:30 -0800)]

Move _POSIX_C_SOURCE define before first stdlib.h include in zbuild.h for posix_memalign.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 18:52:12 +0000 (10:52 -0800)]

Remove zutil.h includes from many files to prevent zlib.h being included.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:57:24 +0000 (09:57 -0800)]

Move build basics to zbuild.h.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 25 Jan 2022 23:28:56 +0000 (15:28 -0800)]

Added CMake status message when code coverage is enabled.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 25 Jan 2022 05:28:13 +0000 (21:28 -0800)]

Use NATIVEFLAG in intrinsic checks that is added whenever it is enabled.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 24 Jan 2022 03:53:22 +0000 (19:53 -0800)]

Added fallback macros for add_compile_options and add_link_options.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 26 Jan 2022 01:56:12 +0000 (17:56 -0800)]

Use add_compile_options in cmake which sets both C and CXX flags.

It is necessary to break apart strings that contain multiple flags so they are processed by add_compile_options properly.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 26 Jan 2022 01:49:45 +0000 (17:49 -0800)]

Use add_link_options in cmake sanitizer detection.

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 23 Jan 2022 21:26:36 +0000 (13:26 -0800)]

Use add_link_options in cmake code coverage detection.

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 23 Jan 2022 21:48:45 +0000 (13:48 -0800)]

Update CMake toolchains for alternative gcc/gcc++ names.
Remove compiler attribute in GHA when using CMake toolchain files.

commit | commitdiff | tree

Mika T. Lindqvist [Fri, 28 Jan 2022 22:49:14 +0000 (00:49 +0200)]

[PowerPC] Default CPU to 7400 to enable VMX support in qemu-ppc

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 23 Jan 2022 20:59:01 +0000 (12:59 -0800)]

Use static keyword for vec_sumsu to prevent undefined reference error when g++ linking.

commit | commitdiff | tree

Michael Hirsch [Tue, 25 Jan 2022 00:22:01 +0000 (19:22 -0500)]

Intel compilers: update deprecated -wn to -Wall style

This removes warnings on every single target like:
icx: command line warning #10430: Unsupported command line options encountered
These options as listed are not supported.
For more information, use '-qnextgen-diag'.
option list:
-w3

Signed-off-by: Michael Hirsch <michael@scivision.dev>

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 24 Jan 2022 04:15:40 +0000 (20:15 -0800)]

Fixed GCC warning about unused variable in longest_match.

match_tpl.h:47:13: warning: unused variable ‘scan_start’ [-Wunused-variable]
47 | uint8_t scan_start[8], scan_end[8];

commit | commitdiff | tree

Adam Stylinski [Tue, 25 Jan 2022 05:16:37 +0000 (00:16 -0500)]

Make cmake and configure release flags consistent

CMake sufficiently appends -DNDEBUG to the preprocessor macros when not
compiling with debug symbols. This turns off debug level assertions and
has some other side effects. As such, we should equally append this
define to the configure scripts' CFLAGS.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 24 Jan 2022 00:18:05 +0000 (16:18 -0800)]

Remove unused fdopen define for MSVC.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 21:06:14 +0000 (13:06 -0800)]

Use pigz version 2.6 due to bug in NOTHREADS support.
https://github.com/madler/pigz/issues/97

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 21:05:44 +0000 (13:05 -0800)]

Allow setting of version when building with pigz.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:49:13 +0000 (09:49 -0800)]

Move cpu feature variant callback typedefs to cpu_features header.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:38:20 +0000 (09:38 -0800)]

Use extern keyword in slide_hash function definitions.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:37:13 +0000 (09:37 -0800)]

Group together functable definitions that use deflate_state.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:28:37 +0000 (09:28 -0800)]

Use cpu_check_features in inflate and deflate.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:27:58 +0000 (09:27 -0800)]

Move cpu_feature includes out of zutil.h.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 22 Jan 2022 17:17:19 +0000 (09:17 -0800)]

Use fixed width types in compare256 definition.

commit | commitdiff | tree

Adam Stylinski [Sun, 23 Jan 2022 01:47:45 +0000 (20:47 -0500)]

Fixed inadvertent breakage of CPUID usage

In removing the AVX512 "well-suited" flag, a second cpuid call that's
needed for feature detection was accidently removed. This brings that
back.

commit | commitdiff | tree

Adam Stylinski [Tue, 18 Jan 2022 14:47:45 +0000 (09:47 -0500)]

Remove the "avx512_well_suited" cpu flag

Now that we have confirmation that the AVX512 variants so far have been
universally better on every capable CPU we've tested them on, there's no
sense in trying to maintain a whitelist.

commit | commitdiff | tree

Adam Stylinski [Mon, 17 Jan 2022 14:27:32 +0000 (09:27 -0500)]

Improvements to avx512 adler32 implementations

Now that better benchmarks are in place, it became apparent that masked
broadcast was _not_ faster and it's actually faster to use vmovd, as
suspected.  Additionally, for the VNNI variant, we've unlocked some
additional ILP by doing a second dot product in the loop to a different
running sum that gets recombined later.  This broke a data dependency
chain and allowed the IPC be ~2.75. The result is about a 40-50%
improvement in runtime.

Additionally, we've called the lesser SIMD sized variants if the input
is too small and they happen to be compiled in.  This helps for the
impossibly small input that still is large enough to be a vector length.
For size 16 and 32 inputs I was seeing something like sub 10 ns instead
of 50 ns.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 18 Jan 2022 00:17:00 +0000 (16:17 -0800)]

Fixed AVX512-VNNI detection when compiling with ClangCl.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 18 Jan 2022 00:32:02 +0000 (16:32 -0800)]

Remove crc32 and adler32 shared library tests which don't test anyting further than what is provided by crc32 static build or examplesh.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 18 Jan 2022 00:30:10 +0000 (16:30 -0800)]

Rename test binaries with test_ prefix.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 14 Jan 2022 20:19:11 +0000 (12:19 -0800)]

Fixed incorrect version of AVX specified for inflate chunk copying in feature list.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 14 Jan 2022 20:15:50 +0000 (12:15 -0800)]

Merge feature list entries for crc32.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 15 Jan 2022 21:30:08 +0000 (13:30 -0800)]

Clean up and remove bestcmp_t type for longest_match. Use 8 byte buffer (max int comparison size) and cast to integer type when making comparison.

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 2 Jan 2022 23:27:00 +0000 (15:27 -0800)]

Clean up crc32_fold structure and clearly define the size of the fold buffer.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 8 Jan 2022 21:28:41 +0000 (13:28 -0800)]

Added adler32, compare256, crc32, and slide_hash benchmarks using Google Benchmark.

Co-authored-by: Adam Stylinski <kungfujesus06@gmail.com>

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 3 Jan 2022 21:07:24 +0000 (13:07 -0800)]

Change fuzzer binaries and solutions to prefix with fuzzer_.

commit | commitdiff | tree

Adam Stylinski [Sun, 9 Jan 2022 16:57:24 +0000 (11:57 -0500)]

Improved AVX2 adler32 performance

Did this by simply doing 32 bit horizontal sums and using the same sum
of absolute difference instructions as done in the SSE4 and AVX512_VNNI
versions.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 8 Jan 2022 22:42:09 +0000 (14:42 -0800)]

Convert compare258 to compare256 and moved 2 byte check into deflate_quick. Prevents having multiple compare258 functions with 2 byte checks.

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 9 Jan 2022 15:35:10 +0000 (07:35 -0800)]

Merge crc32_little and crc32_big with preprocessor macros for each endian. Removed crc32_generic since it is not being used.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 14 Jan 2022 01:05:16 +0000 (17:05 -0800)]

Remove unmaintained index markdown in arch/x86.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 14 Jan 2022 01:02:49 +0000 (17:02 -0800)]

Remove unmaintained and out-dated DLL FAQ.

commit | commitdiff | tree

Nathan Moinvaziri [Sun, 2 Jan 2022 21:19:47 +0000 (13:19 -0800)]

Rename x86 source files with instruction set version.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 8 Jan 2022 21:28:41 +0000 (13:28 -0800)]

Remove unused COMPARE258 macro since longest_match only uses COMPARE256.

commit | commitdiff | tree

merceyz [Sat, 1 Jan 2022 14:57:31 +0000 (15:57 +0100)]

build: fix check for SSSE3 when using Emscripten

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 15 Dec 2021 22:21:58 +0000 (14:21 -0800)]

VPCLMULQDQ implementation for Intel's CRC32 folding.
Based on PR https://github.com/jtkukunas/zlib/pull/28.

Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>

commit | commitdiff | tree

Adam Stylinski [Tue, 4 Jan 2022 15:38:39 +0000 (10:38 -0500)]

Added an SSE4 optimized adler32 checksum

This variant uses the lower number of cycles psadw insruction in place
of pmaddubsw for the running sum that does not need multiplication.

This allows this sum to be done independently, partially overlapping the
running "sum2" half of the checksum. We also have moved the shift
outside of the loop, breaking a small data dependency chain. The code
also now does a vectorized horizontal sum without having to rebase to
the adler32 base, as NMAX is defined as the maximum number of scalar
sums that can be peformed, so we're actually safe in doing this without
upgrading to higher precision. We can do a partial horizontal sum
because psadw only ends up accumulating 16 bit words in 2 vector lanes,
the other two can safely be assumed as 0.

commit | commitdiff | tree

Adam Stylinski [Tue, 4 Jan 2022 15:37:24 +0000 (10:37 -0500)]

Add SSE4.1 detection

Code leveraging this for the adler checksum is forthcoming

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 27 Dec 2021 03:56:12 +0000 (19:56 -0800)]

Use memcpy for unaligned reads.

Co-authored-by: Matija Skala <mskala@gmx.com>

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 7 Jan 2022 17:44:12 +0000 (09:44 -0800)]

Fixed missing pointers to functions when assigning to functable.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 31 Dec 2021 17:28:34 +0000 (09:28 -0800)]

Clean up crc32 extern using #elif same as in crc32_stub.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 4 Jan 2022 16:30:39 +0000 (08:30 -0800)]

Remove use_byfour compile time detection of the existence of four byte integer types since we are targeting newer systems.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 31 Dec 2021 16:22:35 +0000 (08:22 -0800)]

Move generic crc32 assignment to else statement so it can be optimized away if use_byfour is true.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 31 Dec 2021 16:15:34 +0000 (08:15 -0800)]

Don't assign C versions of compare258 and longest_match if we don't use them.

commit | commitdiff | tree

Adam Stylinski [Fri, 7 Jan 2022 20:51:09 +0000 (15:51 -0500)]

Have functioning avx512{,_vnni} adler32

The new adler32 checksum uses the VNNI instructions with appreciable
gains when possible. Otherwise, a pure avx512f variant exists which
still gives appreciable gains.

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 30 Dec 2021 04:48:47 +0000 (20:48 -0800)]

Check for err == Z_OK is always true in minideflate loops.

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 30 Dec 2021 04:45:12 +0000 (20:45 -0800)]

Fixed main function does not return value which may indicated unintended behavior.

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 30 Dec 2021 04:33:08 +0000 (20:33 -0800)]

Fixed part of conditional expression is always true since size is always greater than 0.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 29 Dec 2021 23:51:00 +0000 (15:51 -0800)]

Fixed crc32 assembly not returning hash value to correct variable.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 25 Dec 2021 04:18:05 +0000 (20:18 -0800)]

Don't build DLL sources if BUILD_SHARED_LIBS=OFF.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 25 Dec 2021 04:19:03 +0000 (20:19 -0800)]

Fixed duplicate symbol zng_inflate_table and zng_inflate_copyright when BUILD_SHARED_LIBS=OFF.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 25 Dec 2021 01:02:13 +0000 (17:02 -0800)]

Move stdint.h below zconf include to prevent unexpected characters warning on ClangCl.

commit | commitdiff | tree

Nathan Moinvaziri [Sat, 25 Dec 2021 00:42:23 +0000 (16:42 -0800)]

Fixed incorrect flag used for SSE 4.2 support with ClangCl.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 24 Dec 2021 21:08:32 +0000 (13:08 -0800)]

Fixed implicit declaration of _mm_extract_epi32 when compiling with ClangCl.

commit | commitdiff | tree

Nathan Moinvaziri [Fri, 24 Dec 2021 20:33:15 +0000 (12:33 -0800)]

Fixed wrong alignment definition used when compiling with ClangCl.

commit | commitdiff | tree

Mika Lindqvist [Thu, 6 Jan 2022 15:29:19 +0000 (17:29 +0200)]

Fix building shared tests
* Don't add non-PIC gz sources to shared executables if they are already included in shared library as PIC sources

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 4 Jan 2022 22:57:18 +0000 (14:57 -0800)]

Remove old win32 readme.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 4 Jan 2022 03:48:06 +0000 (19:48 -0800)]

Remove double check for SSE4 in configure.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 20 Dec 2021 16:23:44 +0000 (08:23 -0800)]

Fixed crc32_combine_gen declaration warning in zlib-ng API.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 20 Dec 2021 16:15:40 +0000 (08:15 -0800)]

Upgrade version of GitHub checkout actions. #1078

commit | commitdiff | tree

Hans Kristian Rosbach [Mon, 13 Dec 2021 21:30:58 +0000 (22:30 +0100)]

Fix deflateBound and compressBound returning very small size estimates.
Remove workaround in switchlevels.c, so we do actual testing of this.
Use named defines instead of magic numbers where we can.

commit | commitdiff | tree

Mika Lindqvist [Wed, 15 Dec 2021 07:18:03 +0000 (09:18 +0200)]

Avoid warning C4295 when using Visual C++ and maintainer warnings are enabled.

commit | commitdiff | tree

Hans Kristian Rosbach [Mon, 13 Dec 2021 15:24:20 +0000 (16:24 +0100)]

Remove gz_intmax implementation, since INT_MAX is always available in modern C implementations.

commit | commitdiff | tree

Hans Kristian Rosbach [Mon, 13 Dec 2021 15:46:21 +0000 (16:46 +0100)]

inttypes.h includes stdint.h, so only include one of them.

commit | commitdiff | tree

Nathan Moinvaziri [Wed, 8 Dec 2021 00:29:01 +0000 (19:29 -0500)]

Added checks and comments to ensure that when using raw mode no checksumming takes place.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 7 Dec 2021 21:24:26 +0000 (16:24 -0500)]

Added unit test to ensure that inflate with adler32 hash works on previously failed test case.

commit | commitdiff | tree

Nathan Moinvaziri [Tue, 7 Dec 2021 20:52:05 +0000 (15:52 -0500)]

Don't overwrite adler32 hash with crc32 hash. #1066

commit | commitdiff | tree

Mika Lindqvist [Sat, 4 Dec 2021 06:25:17 +0000 (08:25 +0200)]

Workaround for installation failure of wine32.

commit | commitdiff | tree

Adam Stylinski [Thu, 2 Dec 2021 22:05:55 +0000 (17:05 -0500)]

Made this work on 32 bit compilations

For some reason the movq instruction from a 128 bit register to a 64 bit
GPR is not supported in 32 bit code. A simple workaround seems to be to
invoke movl if compiling with -m32.

Also addressing some style nits.

commit | commitdiff | tree

Adam Stylinski [Sun, 24 Oct 2021 21:44:33 +0000 (17:44 -0400)]

Have horizontal sum here, decent wins

commit | commitdiff | tree

Adam Stylinski [Sun, 24 Oct 2021 23:24:53 +0000 (19:24 -0400)]

Minor efficiency improvement

This now leverages the broadcasting instrinsics with an AND mask
to load up the registers.  Additionally, there's a minor efficiency
boost here by casting up to 64 bit precision (by means of register
aliasing) so that the modulo can be safely deferred until the write
back to the full sums.

The "write" back to the stack here is actually optimized out by GCC
and turned into a write directly to a 32 bit GPR for each of the 8
elements.  This much is not new, but now, since we don't have to do a
modulus with the BASE value, we can bypass 8 64 bit multiplications,
shifts, and subtractions while in those registers.

I tried to do a horizontal reduction sum on the 8 64 bit elements since
the vpextract* set of instructions aren't exactly low latency, however
to do this safely (no overflow) it requires 2 128 bit register extractions,
8 vpmovsxdq to bring the things up to 64 bit precision, some shuffles, more
128 bit extractions to get around the 128 bit lane requirement of the shuffles,
and finally a trip to a GPR and back to do the modulus on the scalar value.
This method could have been more efficient if there were an inexpensive 64 bit
horizontal addition instruction for AVX, but there isn't.

To test this, I wrote a pretty basic benchmark using Python's zlib bindings on
a huge set of random data, carefully timing only the checksum bits.  Invoking
perf stat from within the python process after the RNG shows a lower average
number of cycles to complete and a shorter runtime.

commit | commitdiff | tree

Adam Stylinski [Sat, 23 Oct 2021 16:38:12 +0000 (12:38 -0400)]

Use immediate variant of shift instruction

Since this is constant, anyway, we may as well use the variant that
doesn't add vector register pressure, has better ILP opportunities,
and has shorter instruction latency.

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 15 Nov 2021 04:58:01 +0000 (20:58 -0800)]

Reuse adler32_len_64 in adler32_c.

commit | commitdiff | tree

Nathan Moinvaziri [Thu, 28 Oct 2021 00:58:21 +0000 (17:58 -0700)]

Fixed inflateGetDictionary length check may include bytes added by last call to inflate.

commit | commitdiff | tree

Ilya Leoshkevich [Mon, 25 Oct 2021 22:50:26 +0000 (18:50 -0400)]

DFLTCC update for window optimization from Jim & Nathan

Stop relying on software and hardware inflate window formats being the
same and act the way we already do for deflate: provide and implement
window-related hooks.

Another possibility would be to use an in-line history buffer (by not
setting HBT_CIRCULAR), but this would require an extra memmove().

Also fix a couple corner cases in the software implementation of
inflateGetDictionary() and inflateSetDictionary().

commit | commitdiff | tree

Nathan Moinvaziri [Mon, 23 Aug 2021 19:21:40 +0000 (12:21 -0700)]

Add back original version of inflate_fast for use with inflateBack.

Mirror of https://github.com/zlib-ng/zlib-ng.git

RSS Atom