git.ipfire.org Git - thirdparty/zlib-ng.git/log

Nathan Moinvaziri [Thu, 31 Mar 2022 17:04:49 +0000 (10:04 -0700)] 
Test CVE-2018-25032 against the default level and levels 1 and 2.

Nathan Moinvaziri [Mon, 28 Mar 2022 14:53:55 +0000 (07:53 -0700)] 
Added unit test against CVE-2018-25032 with default strategy.

Co-authored-by: Eric Biggers <ebiggers@kernel.org>

Nathan Moinvaziri [Sun, 27 Mar 2022 00:49:49 +0000 (17:49 -0700)] 
Added unit test against CVE-2018-25032.
Sample input from https://www.openwall.com/lists/oss-security/2022/03/26/1.

Co-authored-by: Tavis Ormandy <taviso@users.noreply.github.com>

Nathan Moinvaziri [Sun, 27 Mar 2022 00:26:16 +0000 (17:26 -0700)] 
Added missing -F argument for Z_FIXED strategy in minideflate.

Adam Stylinski [Sun, 27 Mar 2022 23:20:08 +0000 (19:20 -0400)] 
Use size_t types for len arithmetic, matching signature

This suppresses a warning and keeps everything safely the same type.
While it's unlikely that the input for any of this will exceed the size
of an unsigned 32 bit integer, this approach is cleaner than casting and
should not result in a performance degradation.

Nathan Moinvaziri [Mon, 28 Mar 2022 23:52:02 +0000 (16:52 -0700)] 
Use standalone fuzzing runner only when fuzzing engine is not found.

Nathan Moinvaziri [Sun, 27 Mar 2022 20:18:03 +0000 (13:18 -0700)] 
Allow SSE2 and AVX2 functions with -DWITH_UNALIGNED=OFF. Even though they use unaligned loads, they don't result in undefined behavior.

Adam Stylinski [Sat, 12 Mar 2022 21:09:02 +0000 (16:09 -0500)] 
Leverage inline CRC + copy

This brings back a bit of the performance that may have been sacrificed
by reverting the reorganized inflate window. Doing a copy at the same
time as a CRC is basically free.

Nathan Moinvaziri [Sun, 27 Mar 2022 20:44:58 +0000 (13:44 -0700)] 
Fixed clang signed/unsigned warning in chunkcopy_safe.

inflate_p.h:159:18: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
        tocopy = MIN(non_olap_size, len);
                 ^   ~~~~~~~~~~~~~  ~~~
zbuild.h:74:24: note: expanded from macro 'MIN'
#define MIN(a, b) ((a) > (b) ? (b) : (a))
                    ~  ^  ~
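
The warning above arises because the `MIN` expansion compares a signed `int32_t` with an unsigned `size_t`. A minimal sketch of one way to remove it, assuming the signed operand is known to be non-negative at that point. `clamp_copy_len` is a hypothetical helper, not the code in `inflate_p.h`, and the fix actually applied in the commit may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b) ((a) > (b) ? (b) : (a))

/* Hypothetical helper: clamp a copy length to the non-overlapping size. */
static size_t clamp_copy_len(int32_t non_olap_size, size_t len) {
    /* The conversion below only preserves the value for non-negative input. */
    assert(non_olap_size >= 0);
    /* Both operands are size_t now, so -Wsign-compare has nothing to flag. */
    return MIN((size_t)non_olap_size, len);
}
```

Making both operands the same unsigned type up front keeps the comparison and the result in one type, which is the same idea as the "Use size_t types for len arithmetic" commit.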

Nathan Moinvaziri [Sun, 27 Mar 2022 16:14:45 +0000 (09:14 -0700)] 
Use specific gcovr version 5.0 due to parser errors with 5.1.
https://github.com/gcovr/gcovr/issues/583

Nathan Moinvaziri [Sat, 26 Mar 2022 17:37:29 +0000 (10:37 -0700)] 
Remove unistd.h include from gzguts.h which is already included from zconf.h via zlib.h.

Nathan Moinvaziri [Sat, 26 Mar 2022 15:47:01 +0000 (08:47 -0700)] 
Use HAVE instead of HAS for variable name for consistency.

Nathan Moinvaziri [Sat, 26 Mar 2022 15:05:19 +0000 (08:05 -0700)] 
Remove detect_leaks=0 from non-ASAN cmake jobs.

Nathan Moinvaziri [Thu, 24 Mar 2022 20:01:21 +0000 (13:01 -0700)] 
Fixed error with compare256_unaligned_avx2 undefined if unaligned access is disabled.

Nathan Moinvaziri [Sun, 20 Mar 2022 02:41:24 +0000 (19:41 -0700)] 
Fixed signed comparison warning in zng_calloc_aligned.

zutil.c: In function ‘zng_calloc_aligned’:
zutil.c:133:20: warning: comparison of integer expressions of different signedness: ‘int32_t’ {aka ‘int’} and ‘long unsigned int’ [-Wsign-compare]

Nathan Moinvaziri [Sun, 20 Mar 2022 02:39:06 +0000 (19:39 -0700)] 
Fixed unused opaque variable in aligned alloc test.

test_aligned_alloc.cc: In function ‘void* zng_calloc_unaligned(void*, unsigned int, unsigned int)’:
test_aligned_alloc.cc:14:34: warning: unused parameter ‘opaque’ [-Wunused-parameter]
test_aligned_alloc.cc: In function ‘void zng_cfree_unaligned(void*, void*)’:
test_aligned_alloc.cc:28:32: warning: unused parameter ‘opaque’ [-Wunused-parameter]

Nathan Moinvaziri [Wed, 16 Mar 2022 21:21:33 +0000 (14:21 -0700)] 
Fixed operator precedence warnings in slide_hash_sse2.

slide_hash_sse2.c(58,5): warning C4554: '&': check operator precedence for possible error; use parentheses to clarify precedence
slide_hash_sse2.c(59,5): warning C4554: '&': check operator precedence for possible error; use parentheses to clarify precedence

Nathan Moinvaziri [Wed, 16 Mar 2022 21:20:23 +0000 (14:20 -0700)] 
Fixed signed/unsigned warning in chunkmemset.

chunkset_tpl.h(107,24): warning C4018: '>': signed/unsigned mismatch

Nathan Moinvaziri [Wed, 16 Mar 2022 21:10:14 +0000 (14:10 -0700)] 
Fixed MSVC warnings in chunkcopy_safe.

inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch
inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data
inffast.c
inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch
inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data
inflate.c
inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch
inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data

Rich Ercolani [Fri, 25 Mar 2022 14:06:39 +0000 (10:06 -0400)] 
Correct typo in functable

Now, I could be wrong about this being an error, but I don't see any discussion suggesting this was intended, so...

Nathan Moinvaziri [Fri, 18 Mar 2022 19:10:27 +0000 (12:10 -0700)] 
Add unit tests for compare256 variants.

Adam Stylinski [Tue, 22 Mar 2022 23:39:41 +0000 (19:39 -0400)] 
Fixed a warning about a comparison of an unsigned with a signed type

Adam Stylinski [Fri, 18 Mar 2022 23:18:10 +0000 (19:18 -0400)] 
Fix an issue with the ubsan for overflow

While this didn't _actually_ cause any issues for us, technically the
_mm512_reduce_add_epi32() intrinsic returns a signed integer and it
does the very last summation in scalar GPRs as signed integers. While
the ALU still did the math properly (the negative representation is the
same addition in hardware, just interpreted differently), the sanitizer
caught a window of inputs here that is definitely outside the range of a
signed integer for this immediate operation.

The solution, as silly as it may seem, is to implement our own 32
bit horizontal sum function that does all of the work in vector
registers. This allows us to implicitly keep things in the vector register
domain and convert at the very end, after the final sum is computed.

The compiler's sanitizer is none the wiser and the result is still
correct.
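
The arithmetic point can be illustrated without AVX-512. The following is a portable scalar model, not the vector code from the commit: it treats the 16 lanes as `uint32_t` and reduces them in unsigned arithmetic, where wraparound is defined, instead of as `int`, where overflow is undefined and is what UBSan reports. The name `hsum_u32x16` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 16-lane, 32-bit horizontal add kept in unsigned math. */
static uint32_t hsum_u32x16(const uint32_t lanes[16]) {
    uint32_t sum = 0;
    assert(lanes != NULL);
    for (size_t i = 0; i < 16; i++) {
        /* Unsigned addition wraps modulo 2^32, so totals above INT32_MAX
         * are well defined; the same total held in an int would overflow. */
        sum += lanes[i];
    }
    return sum;
}
```

With every lane set to `0x0FFFFFFF`, the total is `0xFFFFFFF0`, which fits in `uint32_t` but is outside the range of a signed 32-bit integer.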

Nathan Moinvaziri [Mon, 21 Mar 2022 18:58:07 +0000 (11:58 -0700)] 
Update language around ABI compatibility with zlib. #1081

Adam Stylinski [Sun, 20 Mar 2022 15:44:32 +0000 (11:44 -0400)] 
Rename adler32_sse41 to adler32_ssse3

As it turns out, the sum of absolute differences instruction _did_ exist
in SSSE3 all along. SSE41 introduced a stranger, less commonly used
variation of the sum of absolute difference instruction.  Knowing this,
the old SSSE3 method can be axed entirely and the SSE41 method can now
be used on CPUs only having SSSE3.

Removing this extra functable entry shrinks the code and allows for a
simpler planned refactor later for the adler checksum and copy elision.

Nathan Moinvaziri [Sat, 19 Mar 2022 22:53:30 +0000 (15:53 -0700)] 
Fixed missing checks around compare256 and longest_match definitions.

Nathan Moinvaziri [Sat, 19 Mar 2022 22:34:09 +0000 (15:34 -0700)] 
Use zmemcmp_2 in 16-bit unaligned compare256 variant.

Nathan Moinvaziri [Mon, 7 Mar 2022 03:20:29 +0000 (19:20 -0800)] 
Revert "Reorganize inflate window layout"

This reverts commit dc3b60841dbfa9cf37be3efb4568f055b4e15580.

Nathan Moinvaziri [Mon, 7 Mar 2022 03:16:49 +0000 (19:16 -0800)] 
Revert "Add back original version of inflate_fast for use with inflateBack."

This reverts commit 2d2dde43b11c40cb58a339ff4a8425bca0091c31.

Nathan Moinvaziri [Mon, 7 Mar 2022 03:16:37 +0000 (19:16 -0800)] 
Revert "DFLTCC update for window optimization from Jim & Nathan"

This reverts commit b4ca25afabba7b4bf74d36e26728006d28df891d.

Nathan Moinvaziri [Fri, 18 Mar 2022 23:05:18 +0000 (16:05 -0700)] 
Added common sanitizer flags for getting optimal stack traces.

Nathan Moinvaziri [Fri, 18 Mar 2022 18:05:58 +0000 (11:05 -0700)] 
Use halt_on_error in sanitizer options.
https://lists.llvm.org/pipermail/cfe-dev/2015-October/045710.html

Nathan Moinvaziri [Thu, 17 Mar 2022 21:39:12 +0000 (14:39 -0700)] 
Make symbolic prefix instance names consistent in NMake GHA workflow.

Nathan Moinvaziri [Thu, 17 Mar 2022 20:01:38 +0000 (13:01 -0700)] 
Fixed misspelling in NO_UNALIGNED flag.

Nathan Moinvaziri [Thu, 17 Mar 2022 19:57:41 +0000 (12:57 -0700)] 
Wrong variable used when detecting unaligned support for sanitize

Nathan Moinvaziri [Thu, 17 Mar 2022 19:22:22 +0000 (12:22 -0700)] 
Added sanitizer tests in configure GitHub Actions workflow.
Added missing OPTIONS environment variable for UBSAN in CMake GitHub workflow.

Nathan Moinvaziri [Fri, 18 Mar 2022 00:16:11 +0000 (17:16 -0700)] 
Use zutil.h which already includes zlib headers.

Nathan Moinvaziri [Wed, 16 Mar 2022 21:38:37 +0000 (14:38 -0700)] 
Remove unused zutil header.

Adam Stylinski [Fri, 18 Mar 2022 00:22:56 +0000 (20:22 -0400)] 
Fix a latent issue with chunkmemset

It would seem that on some platforms, namely those which are
!UNALIGNED64_OK, there was a likelihood of chunkmemset_safe_c copying all
the bytes before passing control flow to chunkcopy, a function which is
explicitly unsafe to be called with a zero length copy.

This fixes that bug for those platforms.

Adam Stylinski [Thu, 17 Mar 2022 02:52:44 +0000 (22:52 -0400)] 
Fix UBSAN's cry afoul

Technically, we weren't actually doing this the way C wants us to,
legally.  The zmemcpy's turn into NOPs for pretty much all > 0
optimization levels and this gets us defined behavior with the
sanitizer, putting the optimized load by arbitrary alignment into the
compiler's hands instead of ours.
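
A minimal sketch of the pattern described above: reading a multi-byte value through `memcpy` rather than dereferencing a cast pointer, which is undefined behavior in C when the address is misaligned for the type. The helper names `load_u32` and `equal_u32` are hypothetical and are not the zmemcpy/zmemcmp helpers in the repository.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from any address, aligned or not. */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    assert(p != NULL);
    /* memcpy is defined for any alignment; optimizing compilers typically
     * lower a fixed-size copy like this to a single load instruction. */
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: nonzero when the 4 bytes at a and b are identical. */
static int equal_u32(const void *a, const void *b) {
    return load_u32(a) == load_u32(b);
}
```

Since only equality is tested, the result does not depend on the byte order of the host.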

Nathan Moinvaziri [Tue, 15 Mar 2022 23:54:44 +0000 (16:54 -0700)] 
Added check for UNALIGNED64_OK when defining zmemcpy_8 and zmemcmp_8.

Nathan Moinvaziri [Tue, 15 Mar 2022 17:24:47 +0000 (10:24 -0700)] 
Added 32-bit GCC build to CMake GitHub Actions.

Nathan Moinvaziri [Tue, 15 Mar 2022 01:30:51 +0000 (18:30 -0700)] 
Make unaligned access being disabled configurable via build scripts.

Nathan Moinvaziri [Tue, 18 Jan 2022 02:47:23 +0000 (18:47 -0800)] 
Move UNALIGNED_OK detection to compile time instead of configure time.

Mika T. Lindqvist [Tue, 15 Mar 2022 16:33:17 +0000 (18:33 +0200)] 
Explicitly install dependencies for wine32.
* Allow downgrading packages to resolve conflicts

Mika Lindqvist [Tue, 15 Mar 2022 15:19:58 +0000 (17:19 +0200)] 
Don't use -mtune with ClangCl.

Mika Lindqvist [Mon, 14 Mar 2022 18:02:01 +0000 (20:02 +0200)] 
[README] Add missing FORCE_SSE2 for CMake.

Mika Lindqvist [Sun, 13 Mar 2022 15:12:42 +0000 (17:12 +0200)] 
Allow bypassing runtime feature check of TZCNT instructions.
* This avoids a conditional branch when it's known at build time that TZCNT instructions are always supported

Nathan Moinvaziri [Fri, 11 Mar 2022 23:45:06 +0000 (15:45 -0800)] 
Throw an error when input is raw deflate stream but window_bits is not supplied.

Nathan Moinvaziri [Fri, 11 Mar 2022 23:42:14 +0000 (15:42 -0800)] 
Print help when no arguments supplied to minideflate.

Nathan Moinvaziri [Fri, 11 Mar 2022 23:31:57 +0000 (15:31 -0800)] 
Append extension to output file path based on window_bits when compressing and remove extension from output file path when decompressing.

Nathan Moinvaziri [Fri, 11 Mar 2022 22:53:47 +0000 (14:53 -0800)] 
Added support for -k keep argument to minideflate. By default minideflate will now delete the input file.

Nathan Moinvaziri [Thu, 10 Mar 2022 16:57:13 +0000 (08:57 -0800)] 
Use large default buffer size for minideflate to match minigzip use of GZBUFSIZE.

Nathan Moinvaziri [Thu, 10 Mar 2022 16:53:44 +0000 (08:53 -0800)] 
Include zutil.h for definition of DEF_MEM_LEVEL.

Nathan Moinvaziri [Mon, 28 Feb 2022 17:00:26 +0000 (09:00 -0800)] 
Auto-detect wrapper when inflating and no window_bits specified.

Nathan Moinvaziri [Sun, 27 Feb 2022 18:06:13 +0000 (10:06 -0800)] 
Updated help usage with correct values for window_bits.

Nathan Moinvaziri [Wed, 23 Feb 2022 20:07:01 +0000 (12:07 -0800)] 
Fixed wrong error name when calling inflate in minideflate.

Adam Stylinski [Mon, 21 Feb 2022 21:52:17 +0000 (16:52 -0500)] 
Speed up chunkcopy and memset

This was found to have a significant impact on a highly compressible PNG
for both the encode and decode.  Some deltas show performance improving
as much as 60%+.

For the scenarios where the "dist" does not evenly divide our chunk
size, we simply repeat the bytes as many times as possible into our
vector registers.  We then copy the entire vector and then advance by the
quotient of our chunksize divided by our dist value.

If dist happens to be 1, there's no reason to not just call memset from
libc (this is likely to be just as fast if not faster).
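
A scalar sketch of the idea, under stated assumptions: a 16-byte array stands in for a vector register, `pattern_fill` and `CHUNK` are hypothetical names, and the advance is taken to be the largest whole number of `dist`-byte repetitions that fits in one chunk, so each store starts at the same phase of the pattern. The real code operates on vector registers and differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* stand-in for a vector register width */

/* Write len bytes at out by repeating the dist bytes that precede out. */
static void pattern_fill(uint8_t *out, size_t dist, size_t len) {
    const uint8_t *from = out - dist;
    uint8_t chunk[CHUNK];
    size_t adv, i;

    assert(dist > 0);
    if (dist == 1) {            /* a single repeated byte is just memset */
        memset(out, *from, len);
        return;
    }
    if (dist > CHUNK) {         /* pattern longer than a chunk: plain copy */
        for (i = 0; i < len; i++)
            out[i] = from[i];
        return;
    }
    for (i = 0; i < CHUNK; i++) /* repeat the pattern across the chunk */
        chunk[i] = from[i % dist];
    adv = CHUNK - (CHUNK % dist); /* whole repetitions per chunk, in bytes */
    while (len >= CHUNK) {
        memcpy(out, chunk, CHUNK); /* store a full chunk ... */
        out += adv;                /* ... but advance only whole patterns */
        len -= adv;
    }
    for (i = 0; i < len; i++)   /* tail shorter than one chunk */
        out[i] = chunk[i];
}
```

With `dist = 3` the chunk holds five full repetitions plus one extra byte, so each 16-byte store advances the output by 15 bytes and the extra byte is rewritten with the same value by the next store.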

Adam Stylinski [Mon, 24 Jan 2022 04:32:46 +0000 (23:32 -0500)] 
Improve SSE2 slide hash performance

At least on pre-nehalem CPUs, we get a > 50% improvement. This is
mostly due to the fact that we're opportunistically doing aligned loads
instead of unaligned loads.  This is something that is very likely to be
possible, given that the deflate stream initialization uses the zalloc
function, which most libraries don't override.  Our allocator aligns to
64 byte boundaries, meaning we can do aligned loads on even AVX512 for
the zstream->prev and zstream->head pointers. However, only pre-nehalem
CPUs _actually_ benefit from explicitly aligned load instructions.

The other thing being done here is we're unrolling the loop by a factor
of 2 so that we can get a tiny bit more ILP.  This improved performance
by another 5%-7%.

Nathan Moinvaziri [Mon, 14 Mar 2022 18:46:02 +0000 (11:46 -0700)] 
Added unit test for zng_calloc_aligned to ensure that it always returns 64-byte aligned memory allocation when requested.

Nathan Moinvaziri [Tue, 25 Jan 2022 04:58:54 +0000 (20:58 -0800)] 
Bypass memory alignment compensation if not using custom allocator.

Nathan Moinvaziri [Tue, 4 Jan 2022 18:30:54 +0000 (10:30 -0800)] 
Added memory alignment compensation functions for users who may be using custom allocators that don't align on the same boundary zlib-ng expects.
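
One common way to compensate for an allocator that does not guarantee the required alignment is to over-allocate, round the address up, and stash the original pointer just below the aligned block so it can be freed later. The sketch below illustrates that general technique with `malloc`; `alloc_aligned64` and `free_aligned64` are hypothetical names, and this is not a description of how `zng_calloc_aligned` is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64 /* boundary assumed for this sketch */

/* Return a 64-byte aligned block of at least size bytes, or NULL. */
static void *alloc_aligned64(size_t size) {
    void *raw;
    uintptr_t start, aligned;

    assert((ALIGNMENT & (ALIGNMENT - 1)) == 0); /* must be a power of two */
    /* Room for worst-case padding plus one stashed pointer. Overflow of
     * this sum for huge sizes is not handled in this sketch. */
    raw = malloc(size + ALIGNMENT - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    start = (uintptr_t)raw + sizeof(void *);
    aligned = (start + ALIGNMENT - 1) & ~(uintptr_t)(ALIGNMENT - 1);
    ((void **)aligned)[-1] = raw; /* remember what malloc really returned */
    return (void *)aligned;
}

/* Release a block obtained from alloc_aligned64. */
static void free_aligned64(void *ptr) {
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

The cost is up to 63 bytes of padding plus one pointer per allocation, which is why bypassing the compensation when the default allocator is in use, as a later commit in this log does, can be worthwhile.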

Ilya Leoshkevich [Tue, 15 Mar 2022 12:09:04 +0000 (08:09 -0400)] 
IBM Z: Delete stale self-hosted builder containers

Due to things like power outage ExecStop may not run, resulting in a
stale actions-runner container. This would prevent ExecStart from
succeeding, so try deleting such stale containers in ExecStartPre.

Nathan Moinvaziri [Mon, 7 Mar 2022 16:25:42 +0000 (08:25 -0800)] 
Use older version of Google test to support older versions of GCC.
Allow specifying alternative Google test repository and tag.

Nathan Moinvaziri [Fri, 25 Feb 2022 01:56:09 +0000 (17:56 -0800)] 
Ignore code coverage for files in _dep directory.

Nathan Moinvaziri [Thu, 3 Feb 2022 22:37:12 +0000 (14:37 -0800)] 
Compile MSAN instrumented C++ libraries for MSAN build with googletest.

Nathan Moinvaziri [Mon, 31 Jan 2022 23:59:42 +0000 (15:59 -0800)] 
Prefer posix versions of MinGW for compiling against googletest.

Nathan Moinvaziri [Mon, 31 Jan 2022 23:50:27 +0000 (15:50 -0800)] 
Added static versions of c++ libraries on S390X, MinGW, and ppc.

Nathan Moinvaziri [Thu, 27 Jan 2022 02:27:20 +0000 (18:27 -0800)] 
Specify c++ compiler, packages, and flags for Google Test in cmake workflow.

Nathan Moinvaziri [Sun, 6 Feb 2022 17:51:06 +0000 (09:51 -0800)] 
Move CVE-2003-0107 test to Google Tests.

Nathan Moinvaziri [Sun, 6 Feb 2022 17:52:27 +0000 (09:52 -0800)] 
Implement unit testing using Google Test framework.

Nathan Moinvaziri [Wed, 9 Mar 2022 22:57:22 +0000 (14:57 -0800)] 
Added ClangCl instances to GitHub Actions workflow.

Nathan Moinvaziri [Wed, 9 Mar 2022 22:50:41 +0000 (14:50 -0800)] 
Fixed inftrees.c should be compiled with infcover when zlib is a shared library.

Adam Stylinski [Mon, 21 Feb 2022 05:17:07 +0000 (00:17 -0500)] 
Adding some application-specific benchmarks

So far only PNG encode and decode with predictably compressible bytes
have been added. This gives us a rough idea of more holistic
impacts of performance improvements (and regressions).

An interesting finding from this: compared with stock zlib,
we're slower for PNG decoding at levels 8 & 9. When we are slower, we
are spending a fair amount of time in the chunk copy function. This
probably merits a closer look.

This code optionally creates an alternative benchmark binary that links
with an alternative static zlib implementation. This can be used to
quickly compare between different forks.

Adam Stylinski [Tue, 8 Feb 2022 22:09:30 +0000 (17:09 -0500)] 
Use pclmulqdq accelerated CRC for exported function

We were already using this internally for our CRC calculations, however
the exported function to CRC checksum any arbitrary stream of bytes was
still using a generic C based version that leveraged tables. This
function is now called when len is at least 64 bytes.

Adam Stylinski [Sat, 12 Feb 2022 15:26:50 +0000 (10:26 -0500)] 
Improved adler32 NEON performance by 30-47%

We unlocked some ILP by allowing for independent sums in the loop and
reducing these sums outside of the loop. Additionally, the multiplication
by 32 (now 64) is moved outside of this loop. Similar to the chromium
implementation, this code does straight 8 bit -> 16 bit additions and defers
the fused multiply accumulate outside of the loop.  However, by unrolling by
another factor of 2, the code is measurably faster. The code does fused multiply
accumulates back to as many scratch registers as we have room for in order to maximize
ILP for the 16 integer FMAs that need to occur.  The compiler seems to order them
such that the destination register is the same register as the previous instruction,
so perhaps it's not actually able to overlap, or maybe the A73's pipeline is reordering
these instructions anyway.

On the Odroid-N2, the Cortex-A73 cores are ~30-44% faster on the adler32 benchmark,
and the Cortex-A53 cores are anywhere from 34-47% faster.

Adam Stylinski [Wed, 16 Feb 2022 14:42:40 +0000 (09:42 -0500)] 
Unlocked more ILP in SSE variant of adler checksum

This helps uarchs such as sandybridge more than Yorkfield, but there
were some measurable gains on a Core 2 Quad Q9650 as well. We can sum
to two separate vs2 variables and add them back together at the end,
allowing for some overlapping multiply-adds. This was only about a 9-12%
gain on the Q9650 but it nearly doubled performance on cascade lake and
is likely to have appreciable gains on everything in between those two.
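
The idea of summing into two separate accumulators and combining them at the end can be shown in scalar C. This is a sketch of the arithmetic only, not the SSE code: within a block of `n` bytes, byte `i` contributes `(n - i)` times to the second Adler-32 sum, so the weighted sum can be split across two independent variables (here even and odd positions, an arbitrary choice for illustration). The names `adler32_split`, `w_a` and `w_b` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define BLOCK 16u         /* bytes handled per block in this sketch */

/* Adler-32 update using two independent weighted sums per block. */
static uint32_t adler32_split(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = adler >> 16;

    assert(buf != NULL || len == 0);
    while (len >= BLOCK) {
        uint32_t sum = 0, w_a = 0, w_b = 0;
        for (uint32_t i = 0; i < BLOCK; i += 2) {
            sum += (uint32_t)buf[i] + buf[i + 1];
            w_a += (BLOCK - i) * buf[i];         /* even positions */
            w_b += (BLOCK - i - 1) * buf[i + 1]; /* odd positions */
        }
        /* The old s1 is counted once per byte; the two halves are only
         * added back together here, after the inner loop. */
        s2 = (s2 + BLOCK * s1 + w_a + w_b) % ADLER_BASE;
        s1 = (s1 + sum) % ADLER_BASE;
        buf += BLOCK;
        len -= BLOCK;
    }
    while (len--) { /* byte-at-a-time tail */
        s1 = (s1 + *buf++) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}
```

Because `w_a` and `w_b` do not depend on each other, their multiply-adds can overlap in the pipeline, which is the source of the ILP gain the message describes.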

Adam Stylinski [Sat, 5 Feb 2022 21:15:46 +0000 (16:15 -0500)] 
Improve sse41 adler32 performance

Rather than doing opportunistic aligned loads, we can do scalar
unaligned loads into our two halves of the checksum until we hit
alignment.  Then, we can subtract from the max number of sums for the
first run through the loop.

This allows us to force aligned loads for unaligned buffers (likely a
common case for arbitrary runs of memory). This is not meaningful after
Nehalem, but on pre-Nehalem architectures it makes a substantial difference
to performance and is more foolproof than hoping for an aligned buffer.

Improvement is around 44-50% for unaligned worst case scenarios.
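
The "scalar until aligned" step can be sketched as follows. This is a simplified byte sum rather than a checksum, and it leaves out the adjustment to the first loop's sum count mentioned above; `bytes_to_alignment` and `sum_bytes` are hypothetical names, and the 16-byte block loop merely marks where aligned vector loads would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes to consume one at a time before p reaches an align-byte boundary. */
static size_t bytes_to_alignment(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */
    return (size_t)((0 - (uintptr_t)p) & (align - 1));
}

/* Byte sum with a scalar head, aligned 16-byte blocks, and a scalar tail. */
static uint32_t sum_bytes(const uint8_t *buf, size_t len) {
    uint32_t sum = 0;
    size_t head = bytes_to_alignment(buf, 16);
    size_t i;

    if (head > len)
        head = len;
    for (i = 0; i < head; i++) /* unaligned prefix, one byte at a time */
        sum += buf[i];
    buf += head;
    len -= head;
    while (len >= 16) {        /* buf sits on a 16-byte boundary here */
        for (i = 0; i < 16; i++)
            sum += buf[i];
        buf += 16;
        len -= 16;
    }
    for (i = 0; i < len; i++)  /* leftover tail */
        sum += buf[i];
    return sum;
}
```

The head is at most 15 bytes, so the scalar cost is bounded while every block in the main loop starts on a boundary regardless of how the caller's buffer is aligned.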

Hans Kristian Rosbach [Wed, 23 Feb 2022 20:31:11 +0000 (21:31 +0100)] 
Run libpng tests on push in addition to pull-requests.
Also run oss-fuzz on push to certain branches.

Hans Kristian Rosbach [Fri, 11 Feb 2022 14:53:00 +0000 (15:53 +0100)] 
Fix compilation of benchmark when compiler supports, but does not default to enable C++11 or higher.

Nathan Moinvaziri [Wed, 23 Feb 2022 17:44:00 +0000 (09:44 -0800)] 
Use multiple threads when running gcovr.

Nathan Moinvaziri [Wed, 23 Feb 2022 15:16:19 +0000 (07:16 -0800)] 
Exclude unreachable branches from code coverage report.

Nathan Moinvaziri [Sat, 12 Feb 2022 16:10:17 +0000 (08:10 -0800)] 
Added codecov yaml configuration to repository.

Nathan Moinvaziri [Sat, 12 Feb 2022 01:23:31 +0000 (17:23 -0800)] 
Switch to using Codecov GitHub Action.

Ilya Leoshkevich [Sun, 6 Feb 2022 20:00:49 +0000 (21:00 +0100)] 
IBM Z: Install Codecov dependencies on the self-hosted builder

Adam Stylinski [Mon, 21 Feb 2022 21:46:18 +0000 (16:46 -0500)] 
Prevent stale stub functions from being called in deflate_slow

Just in case this is the very first call to longest match, we should
assign the function pointer rather than the function itself. This
way, by the time it leaves the stub, the function pointer gets
reassigned. This was found incidentally while debugging something else.
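
One reading of this message, sketched with hypothetical names: a dispatch slot initially points at a stub that selects the real implementation and overwrites the slot. A caller that copies the slot's current value before the first call keeps invoking the stub, while a caller that holds a pointer to the slot picks up the replacement. Everything below (`match_stub`, `match_real`, `longest_match`, the two `run_*` functions) is illustrative and is not the zlib-ng functable code.

```c
#include <assert.h>

typedef int (*match_fn)(int);

static int match_stub(int x);
static int match_real(int x) { return x * 2; } /* stand-in for the chosen variant */

static match_fn longest_match = match_stub;    /* stand-in for a dispatch slot */
static int stub_calls = 0;

/* First-call stub: select the real function, patch the slot, forward. */
static int match_stub(int x) {
    stub_calls++;
    longest_match = match_real;
    return longest_match(x);
}

static void reset_table(void) {
    longest_match = match_stub;
    stub_calls = 0;
}

/* Copies the slot's value once: if that value is the stub, it stays stale. */
static int run_with_copied_function(int n) {
    match_fn fn = longest_match;
    int total = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        total += fn(i);
    return total;
}

/* Keeps a pointer to the slot: sees the reassignment after the first call. */
static int run_with_slot_pointer(int n) {
    match_fn *slot = &longest_match;
    int total = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        total += (*slot)(i);
    return total;
}
```

Both callers compute the same result; the difference is that the first goes through the stub on every iteration when it starts from a fresh table, and the second only once.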

Mika Lindqvist [Fri, 18 Feb 2022 06:27:18 +0000 (08:27 +0200)] 
Don't use -march=native when doing LD4 test for ARM/AArch64.

Mika Lindqvist [Thu, 17 Feb 2022 23:42:59 +0000 (01:42 +0200)] 
[AArch64] Add missing LD4 test for configure.

Adam Stylinski [Sun, 23 Jan 2022 05:18:17 +0000 (00:18 -0500)] 
Axe the SSE4 compare256 functions

Adam Stylinski [Sun, 23 Jan 2022 03:49:04 +0000 (22:49 -0500)] 
Write an SSE2 optimized compare256

The SSE4 variant uses the unfortunate string comparison instructions from
SSE4.2, which not only don't work on as many CPUs but are often slower
than the SSE2 counterparts except in very specific circumstances.

This version should be ~2x faster than unaligned_64 for larger strings
and about half the performance of AVX2 comparisons on identical
hardware.

This version is meant to supplement pre AVX hardware. Because of this,
we're performing 1 extra load + compare at the beginning. In the event
that we're doing a full 256 byte comparison (completely equal strings),
this will result in 2 extra SIMD comparisons if the inputs are unaligned.
Given that the loads will be absorbed by L1, this isn't super likely to
be a giant penalty, but for something like a core-i first or second gen,
where unaligned loads aren't nearly as expensive, this is going to be
_marginally_ slower in the worst case.  This allows us to have half the
loads be aligned, so that the compiler can elide the load and compare by
using a register relative pcmpeqb.

Nathan Moinvaziri [Wed, 26 Jan 2022 18:51:23 +0000 (10:51 -0800)] 
Introduce zmemcmp to use unaligned access for architectures we know support unaligned access, otherwise use memcmp.

Nathan Moinvaziri [Sun, 9 Jan 2022 23:01:23 +0000 (15:01 -0800)] 
Introduce zmemcpy to use unaligned access for architectures we know support unaligned access, otherwise use memcpy.

Nathan Moinvaziri [Sun, 9 Jan 2022 22:58:53 +0000 (14:58 -0800)] 
Simplify chunk_t type to uint64_t with memcpy calls.

Deniz Bahadir [Thu, 23 Dec 2021 00:13:03 +0000 (01:13 +0100)] 
Fix compilation with clang-cl on windows

Do not include (system) headers when processing these headers with the
resource compiler, because it might trip over the headers coming from
LLVM.

Mika Lindqvist [Mon, 7 Feb 2022 15:54:41 +0000 (17:54 +0200)] 
Update NMake GitHub Actions to use Visual Studio 2022 Enterprise.

Nathan Moinvaziri [Sun, 6 Feb 2022 18:06:02 +0000 (10:06 -0800)] 
Remove unnecessary zutil.h includes.

Nathan Moinvaziri [Fri, 4 Feb 2022 03:13:59 +0000 (19:13 -0800)] 
Fixed short name for CPU features header guard.

Nathan Moinvaziri [Fri, 4 Feb 2022 01:57:31 +0000 (17:57 -0800)] 
Remove duplicate header includes.

Nathan Moinvaziri [Thu, 27 Jan 2022 01:58:52 +0000 (17:58 -0800)] 
Only define CPU variants that require deflate_state when deflate.h has previously been included. This allows us to include cpu_features.h without including zlib.h or name mangling.

Nathan Moinvaziri [Sat, 5 Feb 2022 19:47:03 +0000 (11:47 -0800)] 
Rename CPU feature header and source files for consistency.