git.ipfire.org Git - thirdparty/zlib-ng.git/log
3 years agoAdded an SSE4 optimized adler32 checksum
Adam Stylinski [Tue, 4 Jan 2022 15:38:39 +0000 (10:38 -0500)] 
Added an SSE4 optimized adler32 checksum

This variant uses the psadbw instruction, which takes fewer cycles, in
place of pmaddubsw for the running sum that does not need multiplication.

This allows this sum to be done independently, partially overlapping the
running "sum2" half of the checksum.  We also have moved the shift
outside of the loop, breaking a small data dependency chain. The code
also now does a vectorized horizontal sum without having to rebase to
the adler32 base, as NMAX is defined as the maximum number of scalar
sums that can be performed, so we're actually safe in doing this without
upgrading to higher precision.  We can do a partial horizontal sum
because psadbw only ends up accumulating 16 bit words in 2 vector lanes;
the other two can safely be assumed to be 0.
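
As a rough illustration of the psadbw trick described above (this is not
the code from the commit): taking the sum of absolute differences against
an all-zero vector yields plain byte sums, one 16 bit result in the low
bits of each 64 bit half, so only two of the four 32 bit lanes are
non-zero.  The name sum16_bytes is made up for this sketch, and the scalar
fallback exists only so that it builds on non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* _mm_sad_epu8 (psadbw) is an SSE2 intrinsic */
#define SKETCH_HAVE_SSE2 1
#endif

/* Sum 16 bytes. Hypothetical helper, for illustration only. */
static uint32_t sum16_bytes(const uint8_t *buf) {
#ifdef SKETCH_HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_loadu_si128((const __m128i *)buf);
    /* psadbw against zero: |b - 0| summed over each 8 byte half.
     * 32 bit lanes 0 and 2 hold the two sums, lanes 1 and 3 are zero. */
    __m128i sad = _mm_sad_epu8(v, zero);
    /* Partial horizontal sum: swap the 64 bit halves and add. */
    __m128i swapped = _mm_shuffle_epi32(sad, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum = _mm_add_epi32(sad, swapped);
    return (uint32_t)_mm_cvtsi128_si32(sum);
#else
    uint32_t s = 0;
    for (size_t i = 0; i < 16; i++)
        s += buf[i];
    return s;
#endif
}
```

The largest possible result is 16 * 255 = 4080, which is why a 16 bit
partial sum per lane cannot overflow here.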

3 years agoAdd SSE4.1 detection
Adam Stylinski [Tue, 4 Jan 2022 15:37:24 +0000 (10:37 -0500)] 
Add SSE4.1 detection

Code leveraging this for the adler checksum is forthcoming
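
For reference, a minimal sketch of how SSE4.1 can be detected at run time
with GCC or Clang on x86.  This is an assumption about the general
approach, not the detection code added by this commit; sketch_has_sse41 is
a made-up name, and other compilers or architectures simply report 0.

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_X86_CPUID 1
#endif

/* Returns 1 if the CPU reports SSE4.1, 0 otherwise (including non-x86). */
static int sketch_has_sse41(void) {
#ifdef SKETCH_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* CPUID leaf 1 returns the basic feature bits in ECX and EDX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 19) & 1u); /* ECX bit 19 = SSE4.1 */
#else
    return 0;
#endif
}
```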

3 years agoUse memcpy for unaligned reads.
Nathan Moinvaziri [Mon, 27 Dec 2021 03:56:12 +0000 (19:56 -0800)] 
Use memcpy for unaligned reads.
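
A minimal sketch of the general technique, assuming nothing about the
actual zlib-ng helpers (read_u32_unaligned is a made-up name): copying
through memcpy avoids the undefined behavior of dereferencing a
misaligned pointer, and compilers typically reduce it to a single load
where the target allows that.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32 bit value from a possibly unaligned address.
 * The result is in host byte order. */
static uint32_t read_u32_unaligned(const void *buf) {
    uint32_t v;
    memcpy(&v, buf, sizeof(v));
    return v;
}
```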

Co-authored-by: Matija Skala <mskala@gmx.com>

3 years agoFixed missing pointers to functions when assigning to functable.
Nathan Moinvaziri [Fri, 7 Jan 2022 17:44:12 +0000 (09:44 -0800)] 
Fixed missing pointers to functions when assigning to functable.

3 years agoClean up crc32 extern using #elif same as in crc32_stub.
Nathan Moinvaziri [Fri, 31 Dec 2021 17:28:34 +0000 (09:28 -0800)] 
Clean up crc32 extern using #elif same as in crc32_stub.

3 years agoRemove use_byfour compile time detection of the existence of four byte integer types...
Nathan Moinvaziri [Tue, 4 Jan 2022 16:30:39 +0000 (08:30 -0800)] 
Remove use_byfour compile time detection of the existence of four byte integer types since we are targeting newer systems.

3 years agoMove generic crc32 assignment to else statement so it can be optimized away if use_by...
Nathan Moinvaziri [Fri, 31 Dec 2021 16:22:35 +0000 (08:22 -0800)] 
Move generic crc32 assignment to else statement so it can be optimized away if use_byfour is true.

3 years agoDon't assign C versions of compare258 and longest_match if we don't use them.
Nathan Moinvaziri [Fri, 31 Dec 2021 16:15:34 +0000 (08:15 -0800)] 
Don't assign C versions of compare258 and longest_match if we don't use them.

3 years agoHave functioning avx512{,_vnni} adler32
Adam Stylinski [Fri, 7 Jan 2022 20:51:09 +0000 (15:51 -0500)] 
Have functioning avx512{,_vnni} adler32

The new adler32 checksum uses the VNNI instructions with appreciable
gains when possible. Otherwise, a pure avx512f variant exists which
still gives appreciable gains.

3 years agoCheck for err == Z_OK is always true in minideflate loops.
Nathan Moinvaziri [Thu, 30 Dec 2021 04:48:47 +0000 (20:48 -0800)] 
Check for err == Z_OK is always true in minideflate loops.

3 years agoFixed main function does not return value which may indicate unintended behavior.
Nathan Moinvaziri [Thu, 30 Dec 2021 04:45:12 +0000 (20:45 -0800)] 
Fixed main function does not return value which may indicate unintended behavior.

3 years agoFixed part of conditional expression is always true since size is always greater...
Nathan Moinvaziri [Thu, 30 Dec 2021 04:33:08 +0000 (20:33 -0800)] 
Fixed part of conditional expression is always true since size is always greater than 0.

3 years agoFixed crc32 assembly not returning hash value to correct variable.
Nathan Moinvaziri [Wed, 29 Dec 2021 23:51:00 +0000 (15:51 -0800)] 
Fixed crc32 assembly not returning hash value to correct variable.

3 years agoDon't build DLL sources if BUILD_SHARED_LIBS=OFF.
Nathan Moinvaziri [Sat, 25 Dec 2021 04:18:05 +0000 (20:18 -0800)] 
Don't build DLL sources if BUILD_SHARED_LIBS=OFF.

3 years agoFixed duplicate symbol zng_inflate_table and zng_inflate_copyright when BUILD_SHARED_...
Nathan Moinvaziri [Sat, 25 Dec 2021 04:19:03 +0000 (20:19 -0800)] 
Fixed duplicate symbol zng_inflate_table and zng_inflate_copyright when BUILD_SHARED_LIBS=OFF.

3 years agoMove stdint.h below zconf include to prevent unexpected characters warning on ClangCl.
Nathan Moinvaziri [Sat, 25 Dec 2021 01:02:13 +0000 (17:02 -0800)] 
Move stdint.h below zconf include to prevent unexpected characters warning on ClangCl.

3 years agoFixed incorrect flag used for SSE 4.2 support with ClangCl.
Nathan Moinvaziri [Sat, 25 Dec 2021 00:42:23 +0000 (16:42 -0800)] 
Fixed incorrect flag used for SSE 4.2 support with ClangCl.

3 years agoFixed implicit declaration of _mm_extract_epi32 when compiling with ClangCl.
Nathan Moinvaziri [Fri, 24 Dec 2021 21:08:32 +0000 (13:08 -0800)] 
Fixed implicit declaration of _mm_extract_epi32 when compiling with ClangCl.

3 years agoFixed wrong alignment definition used when compiling with ClangCl.
Nathan Moinvaziri [Fri, 24 Dec 2021 20:33:15 +0000 (12:33 -0800)] 
Fixed wrong alignment definition used when compiling with ClangCl.

3 years agoFix building shared tests
Mika Lindqvist [Thu, 6 Jan 2022 15:29:19 +0000 (17:29 +0200)] 
Fix building shared tests
* Don't add non-PIC gz sources to shared executables if they are already included in shared library as PIC sources

3 years agoRemove old win32 readme.
Nathan Moinvaziri [Tue, 4 Jan 2022 22:57:18 +0000 (14:57 -0800)] 
Remove old win32 readme.

3 years agoRemove double check for SSE4 in configure.
Nathan Moinvaziri [Tue, 4 Jan 2022 03:48:06 +0000 (19:48 -0800)] 
Remove double check for SSE4 in configure.

3 years agoFixed crc32_combine_gen declaration warning in zlib-ng API.
Nathan Moinvaziri [Mon, 20 Dec 2021 16:23:44 +0000 (08:23 -0800)] 
Fixed crc32_combine_gen declaration warning in zlib-ng API.

3 years agoUpgrade version of GitHub checkout actions. #1078
Nathan Moinvaziri [Mon, 20 Dec 2021 16:15:40 +0000 (08:15 -0800)] 
Upgrade version of GitHub checkout actions. #1078

3 years agoFix deflateBound and compressBound returning very small size estimates.
Hans Kristian Rosbach [Mon, 13 Dec 2021 21:30:58 +0000 (22:30 +0100)] 
Fix deflateBound and compressBound returning very small size estimates.
Remove workaround in switchlevels.c, so we do actual testing of this.
Use named defines instead of magic numbers where we can.

3 years agoAvoid warning C4295 when using Visual C++ and maintainer warnings are enabled.
Mika Lindqvist [Wed, 15 Dec 2021 07:18:03 +0000 (09:18 +0200)] 
Avoid warning C4295 when using Visual C++ and maintainer warnings are enabled.

3 years agoRemove gz_intmax implementation, since INT_MAX is always available in modern C implem...
Hans Kristian Rosbach [Mon, 13 Dec 2021 15:24:20 +0000 (16:24 +0100)] 
Remove gz_intmax implementation, since INT_MAX is always available in modern C implementations.

3 years agointtypes.h includes stdint.h, so only include one of them.
Hans Kristian Rosbach [Mon, 13 Dec 2021 15:46:21 +0000 (16:46 +0100)] 
inttypes.h includes stdint.h, so only include one of them.

3 years agoAdded checks and comments to ensure that when using raw mode no checksumming takes...
Nathan Moinvaziri [Wed, 8 Dec 2021 00:29:01 +0000 (19:29 -0500)] 
Added checks and comments to ensure that when using raw mode no checksumming takes place.

3 years agoAdded unit test to ensure that inflate with adler32 hash works on previously failed...
Nathan Moinvaziri [Tue, 7 Dec 2021 21:24:26 +0000 (16:24 -0500)] 
Added unit test to ensure that inflate with adler32 hash works on previously failed test case.

3 years agoDon't overwrite adler32 hash with crc32 hash. #1066
Nathan Moinvaziri [Tue, 7 Dec 2021 20:52:05 +0000 (15:52 -0500)] 
Don't overwrite adler32 hash with crc32 hash. #1066

3 years agoWorkaround for installation failure of wine32.
Mika Lindqvist [Sat, 4 Dec 2021 06:25:17 +0000 (08:25 +0200)] 
Workaround for installation failure of wine32.

3 years agoMade this work on 32 bit compilations
Adam Stylinski [Thu, 2 Dec 2021 22:05:55 +0000 (17:05 -0500)] 
Made this work on 32 bit compilations

For some reason the movq instruction from a 128 bit register to a 64 bit
GPR is not supported in 32 bit code.  A simple workaround seems to be to
invoke movl if compiling with -m32.

Also addressing some style nits.

3 years agoHave horizontal sum here, decent wins
Adam Stylinski [Sun, 24 Oct 2021 21:44:33 +0000 (17:44 -0400)] 
Have horizontal sum here, decent wins

3 years agoMinor efficiency improvement
Adam Stylinski [Sun, 24 Oct 2021 23:24:53 +0000 (19:24 -0400)] 
Minor efficiency improvement

This now leverages the broadcasting intrinsics with an AND mask
to load up the registers.  Additionally, there's a minor efficiency
boost here by casting up to 64 bit precision (by means of register
aliasing) so that the modulo can be safely deferred until the write
back to the full sums.

The "write" back to the stack here is actually optimized out by GCC
and turned into a write directly to a 32 bit GPR for each of the 8
elements.  This much is not new, but now, since we don't have to do a
modulus with the BASE value, we can bypass 8 64 bit multiplications,
shifts, and subtractions while in those registers.

I tried to do a horizontal reduction sum on the 8 64 bit elements since
the vpextract* set of instructions aren't exactly low latency.  However,
doing this safely (no overflow) requires 2 128 bit register extractions,
8 vpmovsxdq to bring the things up to 64 bit precision, some shuffles, more
128 bit extractions to get around the 128 bit lane requirement of the shuffles,
and finally a trip to a GPR and back to do the modulus on the scalar value.
This method could have been more efficient if there were an inexpensive 64 bit
horizontal addition instruction for AVX, but there isn't.

To test this, I wrote a pretty basic benchmark using Python's zlib bindings on
a huge set of random data, carefully timing only the checksum bits.  Invoking
perf stat from within the python process after the RNG shows a lower average
number of cycles to complete and a shorter runtime.
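
The deferred modulo can be shown with a scalar sketch.  This is not the
AVX code from the commit; it only illustrates that with 64 bit
accumulators the reduction by BASE can wait until the end of a block.
The function name and the block size of 2^20 bytes are arbitrary choices
made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ADLER_BASE 65521u            /* largest prime below 2^16 */
#define SKETCH_CHUNK ((size_t)1 << 20)      /* arbitrary block size */

static uint32_t adler32_deferred(uint32_t adler, const uint8_t *buf, size_t len) {
    uint64_t s1 = adler & 0xffffu;
    uint64_t s2 = (adler >> 16) & 0xffffu;

    while (len > 0) {
        size_t n = len < SKETCH_CHUNK ? len : SKETCH_CHUNK;
        len -= n;
        /* No modulo in the inner loop: with 64 bit sums and n <= 2^20,
         * s1 stays below 2^28 and s2 below 2^49, so nothing overflows. */
        while (n--) {
            s1 += *buf++;
            s2 += s1;
        }
        /* One reduction per block instead of one per byte. */
        s1 %= SKETCH_ADLER_BASE;
        s2 %= SKETCH_ADLER_BASE;
    }
    return (uint32_t)((s2 << 16) | s1);
}
```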

3 years agoUse immediate variant of shift instruction
Adam Stylinski [Sat, 23 Oct 2021 16:38:12 +0000 (12:38 -0400)] 
Use immediate variant of shift instruction

Since this is constant, anyway, we may as well use the variant that
doesn't add vector register pressure, has better ILP opportunities,
and has shorter instruction latency.

3 years agoReuse adler32_len_64 in adler32_c.
Nathan Moinvaziri [Mon, 15 Nov 2021 04:58:01 +0000 (20:58 -0800)] 
Reuse adler32_len_64 in adler32_c.

3 years agoFixed inflateGetDictionary length check may include bytes added by last call to inflate.
Nathan Moinvaziri [Thu, 28 Oct 2021 00:58:21 +0000 (17:58 -0700)] 
Fixed inflateGetDictionary length check may include bytes added by last call to inflate.

3 years agoDFLTCC update for window optimization from Jim & Nathan
Ilya Leoshkevich [Mon, 25 Oct 2021 22:50:26 +0000 (18:50 -0400)] 
DFLTCC update for window optimization from Jim & Nathan

Stop relying on software and hardware inflate window formats being the
same and act the way we already do for deflate: provide and implement
window-related hooks.

Another possibility would be to use an in-line history buffer (by not
setting HBT_CIRCULAR), but this would require an extra memmove().

Also fix a couple corner cases in the software implementation of
inflateGetDictionary() and inflateSetDictionary().

3 years agoAdd back original version of inflate_fast for use with inflateBack.
Nathan Moinvaziri [Mon, 23 Aug 2021 19:21:40 +0000 (12:21 -0700)] 
Add back original version of inflate_fast for use with inflateBack.

3 years agoReorganize inflate window layout
Jim Kukunas [Wed, 30 Jun 2021 23:36:08 +0000 (19:36 -0400)] 
Reorganize inflate window layout

This commit significantly improves inflate performance by reorganizing the window buffer into a contiguous window and pending output buffer. The goal of this layout is to reduce branching, improve cache locality, and enable the use of crc folding with gzip input.

The window buffer is allocated as a multiple of the user-selected window size. In this commit, a factor of 2 is utilized.

The layout of the window buffer is divided into two sections. The first section, window offset [0, wsize), is reserved for history that has already been output. The second section, window offset [wsize, 2 * wsize), is reserved for buffering pending output that hasn't been flushed to the user's output buffer yet.

The history section grows downwards, towards the window offset of 0. The pending output section grows upwards, towards the end of the buffer. As a result, all of the possible distance/length data that may need to be copied is contiguous. This removes the need to stitch together output from 2 separate buffers.

In the case of gzip input, crc folding is used to copy the pending output to the user's buffers.
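
A simplified model of the layout described above, as a sketch only: the
real inflate code differs, the window here is just 8 bytes, and every
name (sketch_window, sketch_match, sketch_flush, ...) is invented for the
example.  It shows why a match copy never has to stitch two buffers
together: history ends where pending output begins.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_WSIZE 8u /* tiny window so the example is easy to follow */

typedef struct {
    uint8_t buf[2 * SKETCH_WSIZE]; /* [0, wsize) history, [wsize, 2*wsize) pending */
    size_t next;                   /* write position inside the pending half */
    size_t have;                   /* valid history bytes, stored at [wsize - have, wsize) */
} sketch_window;

static void sketch_init(sketch_window *w) {
    memset(w->buf, 0, sizeof(w->buf));
    w->next = SKETCH_WSIZE;
    w->have = 0;
}

/* Append one literal byte to pending output. Caller ensures there is room. */
static void sketch_literal(sketch_window *w, uint8_t c) {
    w->buf[w->next++] = c;
}

/* Copy len bytes from dist bytes back. The source is one contiguous range
 * even when it starts in history. Copying byte by byte lets overlapping
 * matches (dist < len) replicate correctly. Returns 0 on success. */
static int sketch_match(sketch_window *w, size_t dist, size_t len) {
    size_t avail = w->have + (w->next - SKETCH_WSIZE);
    if (dist == 0 || dist > avail || dist > SKETCH_WSIZE)
        return -1;
    if (w->next + len > 2 * SKETCH_WSIZE)
        return -1;
    while (len--) {
        w->buf[w->next] = w->buf[w->next - dist];
        w->next++;
    }
    return 0;
}

/* Hand pending bytes to the caller, then slide the most recent bytes down
 * so that history again ends at offset wsize. Returns bytes written. */
static size_t sketch_flush(sketch_window *w, uint8_t *out) {
    size_t pending = w->next - SKETCH_WSIZE;
    memcpy(out, w->buf + SKETCH_WSIZE, pending);
    size_t total = w->have + pending;
    size_t keep = total < SKETCH_WSIZE ? total : SKETCH_WSIZE;
    memmove(w->buf + SKETCH_WSIZE - keep, w->buf + w->next - keep, keep);
    w->have = keep;
    w->next = SKETCH_WSIZE;
    return pending;
}
```

In this model a gzip-aware flush would be the natural place to fold the
CRC while copying, which is what the last paragraph above refers to.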

Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>

3 years agoFixed minideflate write buffers being overwritten.
Nathan Moinvaziri [Fri, 12 Nov 2021 01:55:13 +0000 (17:55 -0800)] 
Fixed minideflate write buffers being overwritten.

3 years ago[ARM] Try to compile test for float-abi detection code.
Mika Lindqvist [Fri, 29 Oct 2021 17:08:30 +0000 (20:08 +0300)] 
[ARM] Try to compile test for float-abi detection code.

3 years ago[MacOS] Downgrade to XCode 11.7.0 for pkgcheck.
Mika Lindqvist [Fri, 29 Oct 2021 17:57:31 +0000 (20:57 +0300)] 
[MacOS] Downgrade to XCode 11.7.0 for pkgcheck.

3 years agoIBM Z: Run DFLTCC tests on the self-hosted builder
Ilya Leoshkevich [Wed, 4 Aug 2021 22:27:31 +0000 (00:27 +0200)] 
IBM Z: Run DFLTCC tests on the self-hosted builder

* Use the self-hosted builder instead of ubuntu-latest.
* Drop qemu-related settings from DFLTCC configurations.
* Install codecov only for the current user, since the self-hosted
  builder runs under a restricted non-root account.
* Use actions/checkout@v2 for configure checks, since for some reason
  actions/checkout@v1 cannot find git on the self-hosted builder.
* Update the testing section of the DFLTCC README.
* Add the infrastructure code for the self-hosted builder.

3 years agoENH: Transition to Ubuntu 18.04 in `GitHub` actions workflows
Jon Haitz Legarreta Gorroño [Wed, 13 Oct 2021 13:58:41 +0000 (09:58 -0400)] 
ENH: Transition to Ubuntu 18.04 in `GitHub` actions workflows

Transition to Ubuntu 18.04 in `GitHub` actions workflows.

Fixes:
```
Ubuntu 16.04 Clang
This request was automatically failed because there were no enabled runners online to process the request for more than 1 days.

Ubuntu 16.04 GCC
This request was automatically failed because there were no enabled runners online to process the request for more than 1 days.
```

reported for example at:
https://github.com/zlib-ng/zlib-ng/actions/runs/1326434358

Official `GitHub` notice related to the removal of the 16.04 virtual
environments:
https://github.blog/changelog/2021-04-29-github-actions-ubuntu-16-04-lts-virtual-environment-will-be-removed-on-september-20-2021/

3 years agoFix minor formatting issues
Dženan Zukić [Mon, 6 Sep 2021 18:38:09 +0000 (14:38 -0400)] 
Fix minor formatting issues

From ITK PR: https://github.com/InsightSoftwareConsortium/ITK/pull/2803
CI check: https://github.com/InsightSoftwareConsortium/ITK/runs/3864083025

commit 5434d42 adds bad whitespace:
README.md:223: new blank line at EOF.

commit 5434d42 is not allowed; missing newline at the end of file in .gitattributes.

3 years agoCOMP: Fix data loss warning
Jon Haitz Legarreta Gorroño [Sun, 10 Oct 2021 14:51:09 +0000 (10:51 -0400)] 
COMP: Fix data loss warning

Fix data loss warning.

Fixes:
```
itkzlib-ng/inflate.c(1209,24): warning C4267: '=': conversion from 'size_t' to 'unsigned long', possible loss of data
itkzlib-ng/inflate.c(1210,26): warning C4267: '=': conversion from 'size_t' to 'unsigned long', possible loss of data
```

3 years agoMake integration into bigger projects easier
Dženan Zukić [Mon, 6 Sep 2021 18:26:56 +0000 (14:26 -0400)] 
Make integration into bigger projects easier

3 years agoIBM Z: Adjust compressBound() for DFLTCC
Ilya Leoshkevich [Mon, 11 Oct 2021 10:24:20 +0000 (12:24 +0200)] 
IBM Z: Adjust compressBound() for DFLTCC

When DFLTCC was introduced, deflateBound() was adjusted, but
compressBound() was not, leading to compression failures when using
compressBound() + compress() with poorly compressible data.

3 years agoIBM Z: Sync crc_fold with DFLTCC
Ilya Leoshkevich [Mon, 11 Oct 2021 11:22:13 +0000 (13:22 +0200)] 
IBM Z: Sync crc_fold with DFLTCC

Intermediate CRC32 value was moved from strm->adler to
state->crc_fold.

3 years agoIBM Z: Do not check inflateGetDictionary() with DFLTCC
Ilya Leoshkevich [Mon, 11 Oct 2021 11:47:20 +0000 (13:47 +0200)] 
IBM Z: Do not check inflateGetDictionary() with DFLTCC

The zlib manual does not specify a strict contract for
inflateGetDictionary(); it merely says that it "Returns the sliding
dictionary being maintained by inflate", which is an implementation
detail. IBM Z inflate's behavior differs from that of software, and
may change in the future to boot.

3 years agoLink crc32_test and infcover with $(CFLAGS)
Ilya Leoshkevich [Mon, 11 Oct 2021 10:36:28 +0000 (12:36 +0200)] 
Link crc32_test and infcover with $(CFLAGS)

This fixes link failures when using CFLAGS=-m31 on IBM Z. All the
other tests are already linked this way.

3 years agoIBM Z: Fix building outside of a source directory
Ilya Leoshkevich [Mon, 11 Oct 2021 11:12:42 +0000 (13:12 +0200)] 
IBM Z: Fix building outside of a source directory

Do not use relative includes, since they are valid only within the
source directory. Rely on the build system to pass the necessary
include flags instead.

3 years agoExercise the new symbol prefix option in CI tests
Dženan Zukić [Wed, 22 Sep 2021 21:08:06 +0000 (17:08 -0400)] 
Exercise the new symbol prefix option in CI tests

3 years agoAdd support for name mangling
Dženan Zukić [Mon, 6 Sep 2021 20:39:28 +0000 (16:39 -0400)] 
Add support for name mangling

This is useful when zlib-ng is embedded into another library,
such as ITK: https://itk.org/

Closes #1025.

Co-authored-by: Mika Lindqvist <postmaster@raasu.org>

3 years agoUse helper function for printing error and exiting in example.
Nathan Moinvaziri [Mon, 12 Jul 2021 03:31:08 +0000 (20:31 -0700)] 
Use helper function for printing error and exiting in example.

3 years agoAdded code coverage for inflateGetDictionary in example.
Nathan Moinvaziri [Sun, 11 Jul 2021 23:17:30 +0000 (16:17 -0700)] 
Added code coverage for inflateGetDictionary in example.

3 years agoCall deflateBound to calculate length with custom gzip header in example.
Nathan Moinvaziri [Sun, 11 Jul 2021 23:58:48 +0000 (16:58 -0700)] 
Call deflateBound to calculate length with custom gzip header in example.

3 years agoFill out gzheader before calling deflateSetHeader for better code coverage in example.
Nathan Moinvaziri [Sun, 11 Jul 2021 23:59:21 +0000 (16:59 -0700)] 
Fill out gzheader before calling deflateSetHeader for better code coverage in example.

3 years agoAdded CI instances for CTZLL and CTZ builtin existence to improve code coverage.
Nathan Moinvaziri [Sat, 10 Jul 2021 17:08:53 +0000 (10:08 -0700)] 
Added CI instances for CTZLL and CTZ builtin existence to improve code coverage.

3 years agoFix UB in inffast.c when not using window
Ori Livneh [Mon, 23 Aug 2021 16:40:19 +0000 (12:40 -0400)] 
Fix UB in inffast.c when not using window

When not using window, `window + wsize` applies a zero offset to a null pointer, which is undefined behavior.
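
One way to avoid this class of problem, shown as a sketch and not
necessarily how inffast.c was changed: handle the null case before doing
any pointer arithmetic.  The helper name is made up.

```c
#include <stddef.h>

/* Returns one past the end of the window, or NULL when there is no window.
 * In C, arithmetic on a null pointer is undefined even for an offset of 0,
 * so the null check has to come before the addition. */
static const unsigned char *sketch_window_end(const unsigned char *window, size_t wsize) {
    return window != NULL ? window + wsize : NULL;
}
```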

3 years agoFixed trailing whitespaces and missing new lines.
Nathan Moinvaziri [Sat, 4 Sep 2021 19:16:16 +0000 (12:16 -0700)] 
Fixed trailing whitespaces and missing new lines.

3 years agoFix hangs on macOS due to loading of misaligned addresses in chunkmemset_8.
Sergey Markelov [Thu, 22 Jul 2021 17:23:26 +0000 (10:23 -0700)] 
Fix hangs on macOS due to loading of misaligned addresses in chunkmemset_8.

3 years agoInclude win directory in pigz even if not using threads.
Nathan Moinvaziri [Thu, 1 Jul 2021 21:06:06 +0000 (14:06 -0700)] 
Include win directory in pigz even if not using threads.

3 years agoFixed undefined behavior of isgraph when character is not in the range 0 through...
Nathan Moinvaziri [Tue, 17 Aug 2021 17:12:37 +0000 (10:12 -0700)] 
Fixed undefined behavior of isgraph when character is not in the range 0 through 0xFF inclusive.
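
The usual fix for this kind of issue, sketched here under the assumption
that the offending value came from a plain char (the wrapper name is
invented): convert through unsigned char so that bytes of 0x80 and above
never reach isgraph() as negative values.

```c
#include <ctype.h>

/* isgraph() is only defined for EOF and values representable as
 * unsigned char, so cast before calling it. Returns 0 or 1. */
static int sketch_isgraph(char c) {
    return isgraph((unsigned char)c) != 0;
}
```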

4 years agoUse static inline functions for crc32 folding load/save.
Nathan Moinvaziri [Sat, 3 Jul 2021 19:40:55 +0000 (12:40 -0700)] 
Use static inline functions for crc32 folding load/save.

4 years agoMove crc32 folding functions into functable.
Nathan Moinvaziri [Sat, 3 Jul 2021 00:44:08 +0000 (17:44 -0700)] 
Move crc32 folding functions into functable.

4 years agoAdded CRC32_INITIAL_VALUE to prevent initial call to crc32 function.
Nathan Moinvaziri [Fri, 2 Jul 2021 23:56:20 +0000 (16:56 -0700)] 
Added CRC32_INITIAL_VALUE to prevent initial call to crc32 function.

4 years agoAdd new crc32 unit test
Matheus Castanho [Wed, 16 Jun 2021 17:36:24 +0000 (14:36 -0300)] 
Add new crc32 unit test

4 years agoAdd optimized crc32 for POWER8 and later processors
Matheus Castanho [Wed, 16 Jun 2021 17:36:24 +0000 (14:36 -0300)] 
Add optimized crc32 for POWER8 and later processors

This commit adds an optimized version of the crc32 function based
on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/ .
The code has been relicensed to the zlib license.

This is the C implementation created by Rogerio Alves <rogealve@br.ibm.com>

It makes use of vector instructions to speed up the CRC32 algorithm.
Decompression times were improved by +30% in tests.

Based on Daniel Black's work for the original zlib (madler/zlib#478).

4 years agoStandardize crc32_stub
Matheus Castanho [Wed, 16 Jun 2021 17:36:24 +0000 (14:36 -0300)] 
Standardize crc32_stub

Reorganize statements inside crc32_stub() to match more closely the format
used for other function stubs in functable.c.
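
For readers unfamiliar with the stub pattern, a rough self-contained
sketch follows.  It is modeled loosely on the idea of functable.c and is
not the actual code: the sketch_ names and the feature flag are
hypothetical, and a plain bitwise CRC-32 stands in for the generic
implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sketch_crc32_fn)(uint32_t crc, const uint8_t *buf, size_t len);

static uint32_t sketch_crc32_stub(uint32_t crc, const uint8_t *buf, size_t len);

/* Table of function pointers; each entry starts out pointing at its stub. */
static struct {
    sketch_crc32_fn crc32;
} sketch_functable = { sketch_crc32_stub };

/* Portable bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t sketch_crc32_generic(uint32_t crc, const uint8_t *buf, size_t len) {
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical flag; a real build would set this from CPU detection. */
static int sketch_cpu_has_fast_crc = 0;

static uint32_t sketch_crc32_stub(uint32_t crc, const uint8_t *buf, size_t len) {
    /* Start from the generic version, then override if something better exists. */
    sketch_functable.crc32 = sketch_crc32_generic;
    if (sketch_cpu_has_fast_crc) {
        /* An optimized variant would be assigned here. */
    }
    /* Forward this first call; later calls bypass the stub entirely. */
    return sketch_functable.crc32(crc, buf, len);
}
```

The first call through sketch_functable.crc32 resolves the pointer; every
later call goes straight to the selected implementation.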

4 years ago[arm] Disable ACLE, UNALIGNED_OK and UNALIGNED64_OK on armv7 and earlier.
Mika Lindqvist [Wed, 21 Jul 2021 16:26:43 +0000 (19:26 +0300)] 
[arm] Disable ACLE, UNALIGNED_OK and UNALIGNED64_OK on armv7 and earlier.
* armv7 has partial support for unaligned reads, but compiler might use instructions that do not support unaligned accesses

4 years ago[PowerPC] Use templatized code for slide_hash as code for VMX and VSX is very similar
Mika Lindqvist [Tue, 22 Jun 2021 19:19:13 +0000 (22:19 +0300)] 
[PowerPC] Use templatized code for slide_hash as code for VMX and VSX is very similar
* Any differences can be handled using compiler options or added as macros before including template header

4 years agoAdd AltiVec (VMX) to supported intrinsics for adler32 and slide_hash.
Mika Lindqvist [Sun, 13 Jun 2021 17:53:16 +0000 (20:53 +0300)] 
Add AltiVec (VMX) to supported intrinsics for adler32 and slide_hash.

4 years agoAdd PowerPC without Power8 optimizations to GitHub Actions' configure and cmake workf...
Mika Lindqvist [Sun, 13 Jun 2021 17:24:09 +0000 (20:24 +0300)] 
Add PowerPC without Power8 optimizations to GitHub Actions' configure and cmake workflows.

4 years agoPowerPC: Add initial support for AltiVec.
Mika Lindqvist [Sun, 26 Mar 2017 19:54:17 +0000 (22:54 +0300)] 
PowerPC: Add initial support for AltiVec.
* Add detection of VMX instructions

4 years agoFix Z_SOLO mode
Bernhard Rosenkränzer [Sun, 27 Jun 2021 12:31:54 +0000 (14:31 +0200)] 
Fix Z_SOLO mode

Without this patch, #include <zlib.h> with Z_SOLO defined
(e.g. while building perl 5.34.0) fails because of use of
undefined types.

4 years agoRename slide source files to slide_hash to match function name.
Nathan Moinvaziri [Tue, 15 Jun 2021 03:07:41 +0000 (20:07 -0700)] 
Rename slide source files to slide_hash to match function name.

4 years agoSeparate slide_hash_c in the same way that insert_string_c is separated from deflate.c.
Nathan Moinvaziri [Tue, 15 Jun 2021 02:55:09 +0000 (19:55 -0700)] 
Separate slide_hash_c in the same way that insert_string_c is separated from deflate.c.
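
For context, a scalar sketch of what sliding a hash table means, written
from the general description of the deflate algorithm and not copied from
slide_hash_c: when the window moves by wsize bytes, every stored position
is rebased, and positions that fall out of the window become 0 ("no
match").  The function name is invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Subtract wsize from every table entry, clamping at 0. */
static void sketch_slide_table(uint16_t *table, size_t entries, uint16_t wsize) {
    for (size_t i = 0; i < entries; i++) {
        uint16_t m = table[i];
        table[i] = (uint16_t)(m >= wsize ? m - wsize : 0);
    }
}
```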

4 years agoAdded build system check for posix_memalign support.
Nathan Moinvaziri [Sat, 26 Jun 2021 00:23:34 +0000 (17:23 -0700)] 
Added build system check for posix_memalign support.

Co-authored-by: concatime <concatime@users.noreply@github.com>
Co-authored-by: Mika Lindqvist <postmaster@raasu.org>

4 years agoUse STDC11 defined earlier in zbuild.h for Z_TLS check.
Nathan Moinvaziri [Sat, 26 Jun 2021 00:23:56 +0000 (17:23 -0700)] 
Use STDC11 defined earlier in zbuild.h for Z_TLS check.

4 years agoIBM Z: Add vectorized CRC32 implementation
Ilya Leoshkevich [Tue, 6 Apr 2021 11:51:16 +0000 (13:51 +0200)] 
IBM Z: Add vectorized CRC32 implementation

While DFLTCC takes care of accelerating compression on level 1, other
levels can be sped up too by computing CRC32 using various vector
instructions.

Take the Linux kernel assembly code that does that - its original
author (Hendrik Brueckner) works for IBM at the time of writing and has
allowed reusing the code under the zlib license. Rewrite it in C for
better maintainability, but keep the original structure, variable names
and comments.

Update the documentation.

Add CI configurations.

4 years agoRemove extra division operation in chunkcopy.
Nathan Moinvaziri [Fri, 11 Jun 2021 01:03:08 +0000 (18:03 -0700)] 
Remove extra division operation in chunkcopy.

4 years agoRemove deflate_state dependency from crc_folding.
Nathan Moinvaziri [Sun, 20 Jun 2021 00:24:00 +0000 (17:24 -0700)] 
Remove deflate_state dependency from crc_folding.

4 years agoFixed missing enclosing parentheses for ZSWAP64 in zutil.h to avoid erroneous result...
cenobit [Sat, 26 Jun 2021 02:57:00 +0000 (19:57 -0700)] 
Fixed missing enclosing parentheses for ZSWAP64 in zutil.h to avoid erroneous result in inffast.c.
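
A small sketch of why the enclosing parentheses matter.  The macros below
are hypothetical 16 bit swaps, not the ZSWAP64 definition from zutil.h;
they only show how an unparenthesized expansion changes meaning once the
macro is used inside a larger expression.

```c
#include <stdint.h>

/* Same body, with and without the outer parentheses. */
#define SKETCH_SWAP16_BAD(x)  (((x) & 0xFF00u) >> 8) | (((x) & 0x00FFu) << 8)
#define SKETCH_SWAP16_GOOD(x) ((((x) & 0xFF00u) >> 8) | (((x) & 0x00FFu) << 8))

static uint32_t sketch_masked_bad(uint32_t v) {
    /* Expands to a | (b & 0xFF00u) because '&' binds tighter than '|'. */
    return SKETCH_SWAP16_BAD(v) & 0xFF00u;
}

static uint32_t sketch_masked_good(uint32_t v) {
    /* Expands to (a | b) & 0xFF00u, which is what the caller meant. */
    return SKETCH_SWAP16_GOOD(v) & 0xFF00u;
}
```

Some compilers warn about the first form when parentheses warnings are
enabled, which is one way such a bug can be caught.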

4 years agoDon't define HASH_SIZE if it is already defined.
Nathan Moinvaziri [Sat, 26 Jun 2021 04:44:22 +0000 (21:44 -0700)] 
Don't define HASH_SIZE if it is already defined.

4 years agoAdded reduced memory cmake CI job.
Nathan Moinvaziri [Fri, 25 Jun 2021 19:53:39 +0000 (12:53 -0700)] 
Added reduced memory cmake CI job.

4 years agoTurn off reduced memory cmake option by default.
Nathan Moinvaziri [Fri, 25 Jun 2021 19:52:14 +0000 (12:52 -0700)] 
Turn off reduced memory cmake option by default.

4 years ago[Power8] Add chunk*_power8.
Mika Lindqvist [Sat, 19 Jun 2021 05:58:09 +0000 (08:58 +0300)] 
[Power8] Add chunk*_power8.

4 years agoAdded reduced memory configuration option to CMake and configure.
Nathan Moinvaziri [Thu, 18 Mar 2021 06:00:44 +0000 (23:00 -0700)] 
Added reduced memory configuration option to CMake and configure.

4 years agoSwitch longest_match in deflate_slow based on whether or not rolling hash is being...
Nathan Moinvaziri [Wed, 23 Jun 2021 00:22:52 +0000 (17:22 -0700)] 
Switch longest_match in deflate_slow based on whether or not rolling hash is being used.

Co-authored-by: Hans Kristian Rosbach <hk-git@circlestorm.org>

4 years agoUse UNLIKELY for branches related to rolling hash based on performance profiling.
Hans Kristian Rosbach [Tue, 22 Jun 2021 03:43:51 +0000 (20:43 -0700)] 
Use UNLIKELY for branches related to rolling hash based on performance profiling.
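
A sketch of how such a branch hint is commonly defined with GCC and Clang.
The SKETCH_ macro names and the example function are made up, and the
macro in zlib-ng may be defined differently; on other compilers the hint
simply degrades to the plain condition.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define SKETCH_LIKELY(x)   (x)
#  define SKETCH_UNLIKELY(x) (x)
#endif

/* Count zero bytes; the zero case is hinted as the rare path. */
static size_t sketch_count_zeros(const unsigned char *buf, size_t len) {
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (SKETCH_UNLIKELY(buf[i] == 0))
            zeros++;
    }
    return zeros;
}
```

The hint only affects code layout and branch weighting, never the result,
so it should be backed by profiling as this commit was.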

Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>

4 years agoUse longest_match_slow in deflate_slow.
Nathan Moinvaziri [Tue, 22 Jun 2021 03:39:47 +0000 (20:39 -0700)] 
Use longest_match_slow in deflate_slow.

4 years agoSeparate fast-zlib matching algorithm into its own longest_match variant.
Nathan Moinvaziri [Tue, 22 Jun 2021 03:38:51 +0000 (20:38 -0700)] 
Separate fast-zlib matching algorithm into its own longest_match variant.

4 years agoMinor prev_length calculation improvement in deflate_slow.
Nathan Moinvaziri [Tue, 15 Jun 2021 00:42:24 +0000 (17:42 -0700)] 
Minor prev_length calculation improvement in deflate_slow.

4 years agoEnable rolling hash function switching for fast-zlib.
Nathan Moinvaziri [Tue, 15 Jun 2021 00:41:37 +0000 (17:41 -0700)] 
Enable rolling hash function switching for fast-zlib.

4 years agoIncorporate fast-zlib algorithm changes into longest_match.
Nathan Moinvaziri [Tue, 15 Jun 2021 00:40:03 +0000 (17:40 -0700)] 
Incorporate fast-zlib algorithm changes into longest_match.

4 years agoUse STD_MIN_MATCH instead of WANT_MIN_MATCH in deflate_slow for fast-zlib.
Nathan Moinvaziri [Sun, 13 Jun 2021 21:59:32 +0000 (14:59 -0700)] 
Use STD_MIN_MATCH instead of WANT_MIN_MATCH in deflate_slow for fast-zlib.

4 years agoSetup hash functions to be switched based on compression level.
Nathan Moinvaziri [Tue, 15 Jun 2021 00:36:55 +0000 (17:36 -0700)] 
Setup hash functions to be switched based on compression level.