git.ipfire.org Git - thirdparty/zlib-ng.git/log
4 months agoDon't run benchmarks as part of gtest, it would only prove that the benchmarks parametrized-testing 1885/head
Hans Kristian Rosbach [Mon, 17 Mar 2025 14:09:26 +0000 (15:09 +0100)] 
Don't run benchmarks as part of gtest, it would only prove that the benchmarks
work but not show the results. It can take a long time.

4 months agoAdd parameterized testing, based on #1448 by Ruben Vorderman
Hans Kristian Rosbach [Mon, 17 Mar 2025 14:00:52 +0000 (15:00 +0100)] 
Add parameterized testing, based on #1448 by Ruben Vorderman

Co-authored-by: Ruben Vorderman <r.h.p.vorderman@lumc.nl>
4 months agoports: Use memalign or _aligned_malloc, when available. Fallback to malloc
Detlef Riekenberg [Tue, 11 Mar 2025 12:38:54 +0000 (13:38 +0100)] 
ports: Use memalign or _aligned_malloc, when available. Fallback to malloc

Using "_WIN32" to decide whether the MSVC extensions
_aligned_malloc / _aligned_free are available is a bug
that breaks other compilers on Windows (OpenWatcom, for example).

Regards ... Detlef
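
A minimal sketch of the selection order described above, assuming a configure step defines one of two feature macros. The names `EXAMPLE_HAVE_ALIGNED_MALLOC`, `EXAMPLE_HAVE_MEMALIGN`, `example_alloc_aligned` and `example_free_aligned` are hypothetical and are not taken from the zlib-ng sources; the point is only that the choice keys on detected functions rather than on `_WIN32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(EXAMPLE_HAVE_ALIGNED_MALLOC) || defined(EXAMPLE_HAVE_MEMALIGN)
#  include <malloc.h>
#endif

/* Allocate `size` bytes, aligned to `align` when an aligned allocator was
 * detected. EXAMPLE_HAVE_* are hypothetical configure-time macros. */
static inline void *example_alloc_aligned(size_t size, size_t align) {
    /* Alignment must be a non-zero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);
#if defined(EXAMPLE_HAVE_ALIGNED_MALLOC)
    return _aligned_malloc(size, align);   /* MSVC-style extension */
#elif defined(EXAMPLE_HAVE_MEMALIGN)
    return memalign(align, size);          /* note the reversed argument order */
#else
    (void)align;                           /* fallback: default malloc alignment only */
    return malloc(size);
#endif
}

/* Release memory obtained from example_alloc_aligned(). */
static inline void example_free_aligned(void *ptr) {
#if defined(EXAMPLE_HAVE_ALIGNED_MALLOC)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

With neither macro defined, the sketch degrades to plain `malloc`/`free`, which is the fallback the commit subject names.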

4 months agofix the url of the s390x actions worker patch
Eddy S. [Thu, 6 Mar 2025 08:13:48 +0000 (09:13 +0100)] 
fix the url of the s390x actions worker patch

gaplib changed their patch name scheme with 1a5e012.

4 months agoFold a copy into the adler32 function for UPDATEWINDOW for neon
Adam Stylinski [Sat, 30 Nov 2024 17:01:28 +0000 (12:01 -0500)] 
Fold a copy into the adler32 function for UPDATEWINDOW for neon

So a lot of alterations had to be done to make this not worse and
so far, it's not really better, either. I had to force inlining for
the adler routine, I had to remove the x4 load instruction otherwise
pipelining stalled, and I had to use restrict pointers with a copy
idiom for GCC to inline a copy routine for the tail.

Still, we see a small benefit in benchmarks, particularly when done
with size of our window or larger. There's also an added benefit that
this will fix #1824.

5 months agoFix incorrect declaration of FORCE_SSE2
Vladislav Shchapov [Tue, 25 Feb 2025 06:42:48 +0000 (11:42 +0500)] 
Fix incorrect declaration of FORCE_SSE2

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
5 months agoChange flags to "-Werror=unguarded-availability", "-Werror=unguarded-availability...
Vladislav Shchapov [Mon, 24 Feb 2025 16:58:59 +0000 (21:58 +0500)] 
Change flags to "-Werror=unguarded-availability", "-Werror=unguarded-availability-new" and add it to maybe affected symbol checking

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
5 months agoRestore support for macOS prior to 10.15
Vladislav Shchapov [Sun, 23 Feb 2025 15:42:41 +0000 (20:42 +0500)] 
Restore support for macOS prior to 10.15

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
5 months agoMake Chorba configurable, and add a few missing header files to CMake config.
Hans Kristian Rosbach [Mon, 17 Feb 2025 20:22:51 +0000 (21:22 +0100)] 
Make Chorba configurable, and add a few missing header files to CMake config.
Add CI run without chorba enabled.

5 months agoUse OPTIMAL_CMP instead of BRAID_W to test for optimal size for Chorba.
Hans Kristian Rosbach [Mon, 17 Feb 2025 19:16:09 +0000 (20:16 +0100)] 
Use OPTIMAL_CMP instead of BRAID_W to test for optimal size for Chorba.

5 months agoClean up internal crc32 function handling.
Hans Kristian Rosbach [Mon, 17 Feb 2025 19:01:15 +0000 (20:01 +0100)] 
Clean up internal crc32 function handling.
Mark crc32_c and crc32_braid functions as internal, and remove prefix.
Reorder contents of generic_functions, and remove Z_INTERNAL hints from declarations.
Add test/benchmark output to indicate whether Chorba is used.

5 months agoReplace DO1/DO8 macros
Hans Kristian Rosbach [Mon, 17 Feb 2025 19:37:55 +0000 (20:37 +0100)] 
Replace DO1/DO8 macros

5 months agoMove Chorba defines
Hans Kristian Rosbach [Mon, 17 Feb 2025 18:57:08 +0000 (19:57 +0100)] 
Move Chorba defines

5 months agoClean up crc32_braid.
Hans Kristian Rosbach [Mon, 17 Feb 2025 18:18:22 +0000 (19:18 +0100)] 
Clean up crc32_braid.
- Rename N and W to BRAID_N and BRAID_W
- Remove override capabilities for BRAID_N and BRAID_W
- Fix formatting in crc32_braid_tbl.h
- Make makecrct not rely on crc32_braid_p.h

5 months agoAdded --installnamedir
Andrew Murray [Sun, 9 Feb 2025 21:58:39 +0000 (08:58 +1100)] 
Added --installnamedir

5 months agoimplement chorba algorithm
Sam Russell [Fri, 14 Feb 2025 11:20:54 +0000 (12:20 +0100)] 
implement chorba algorithm

5 months agoProvide --without-acle/-DWITH_ACLE options for backward compatibility
Cameron Cawley [Fri, 7 Feb 2025 20:51:02 +0000 (20:51 +0000)] 
Provide --without-acle/-DWITH_ACLE options for backward compatibility

5 months agoUse -Wa,-march with older ARM toolchains
Cameron Cawley [Thu, 29 Feb 2024 21:56:20 +0000 (21:56 +0000)] 
Use -Wa,-march with older ARM toolchains

5 months agoProvide an inline asm fallback for the ARMv8 intrinsics
Cameron Cawley [Thu, 29 Feb 2024 21:20:25 +0000 (21:20 +0000)] 
Provide an inline asm fallback for the ARMv8 intrinsics

5 months agoRename most ACLE references to ARMv8
Cameron Cawley [Thu, 29 Feb 2024 18:34:01 +0000 (18:34 +0000)] 
Rename most ACLE references to ARMv8

5 months ago2.2.4 Release 2.2.x stable 2.2.4
Hans Kristian Rosbach [Sun, 9 Feb 2025 12:19:01 +0000 (13:19 +0100)] 
2.2.4 Release

5 months agoFix shift overflow in inflate and send_code.
Mika Lindqvist [Sun, 26 Jan 2025 19:31:36 +0000 (21:31 +0200)] 
Fix shift overflow in inflate and send_code.

5 months agoFix an unfortunate bug with Visual Studio 2015
Adam Stylinski [Mon, 3 Feb 2025 02:05:37 +0000 (21:05 -0500)] 
Fix an unfortunate bug with Visual Studio 2015

Evidently this instruction, despite the intrinsic having a register operand,
is a memory-register instruction. There seems to be no alignment requirement
for the source operand. Because of this, compilers when not optimized are doing
the unaligned load and then dumping back to the stack to do the broadcasting load.
In doing this, MSVC seems to be dumping to the stack with an aligned move at an
unaligned address, causing a segfault.  GCC does not seem to make this mistake, as
it stashes to an aligned address.

If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128
bit load followed by a vinserti128. This _should_ fix this (issue #1861).

6 months agoFix -Wmaybe-uninitialized warnings in benchmarks.
Hans Kristian Rosbach [Wed, 29 Jan 2025 17:46:34 +0000 (18:46 +0100)] 
Fix -Wmaybe-uninitialized warnings in benchmarks.

6 months agoAdd uncompress benchmark
Hans Kristian Rosbach [Wed, 29 Jan 2025 15:54:36 +0000 (16:54 +0100)] 
Add uncompress benchmark

6 months agos390x: Add workaround to install custom Clang 19.1.5 rpms to actions-runner
Hans Kristian Rosbach [Sun, 26 Jan 2025 14:05:24 +0000 (15:05 +0100)] 
s390x: Add workaround to install custom Clang 19.1.5 rpms to actions-runner
image in order to avoid the VX compiler bug in older clang versions.

6 months agoRemove unused include directories
Vladislav Shchapov [Thu, 23 Jan 2025 20:45:41 +0000 (01:45 +0500)] 
Remove unused include directories

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
6 months agoRename "arch/power/fallback_builtins.h" to avoid possible conflict with "fallback_bui...
Vladislav Shchapov [Thu, 23 Jan 2025 20:45:26 +0000 (01:45 +0500)] 
Rename "arch/power/fallback_builtins.h" to avoid possible conflict with "fallback_builtins.h" in zlib-ng sources directory

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
6 months ago[abicheck] Regenerate ABI files for zlib
Mika Lindqvist [Sun, 26 Jan 2025 11:19:08 +0000 (13:19 +0200)] 
[abicheck] Regenerate ABI files for zlib
* Generate using Ubuntu 24.04.1 LTS to fix mismatch in function signatures of gzseek() and gztell()

6 months agoDisable CRC32-VX Extension for some Clang versions
Eduard Stefes [Tue, 21 Jan 2025 09:48:07 +0000 (10:48 +0100)] 
Disable CRC32-VX Extension for some Clang versions
We have to disable the CRC32-VX implementation for some Clang versions
(18 <= version < 19.1.2) that generate bad code for the IBM S390 VGFMA intrinsics.

6 months agoIncrease cmake workflow timeout
Vladislav Shchapov [Thu, 23 Jan 2025 18:25:09 +0000 (23:25 +0500)] 
Increase cmake workflow timeout

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
6 months agoUse Ubuntu 20.04 for PPC64LE tests due to broken qemu.
Nathan Moinvaziri [Mon, 20 Jan 2025 18:26:51 +0000 (10:26 -0800)] 
Use Ubuntu 20.04 for PPC64LE tests due to broken qemu.

6 months agoUse Ubuntu 22.04 for AARCH64 tests
Nathan Moinvaziri [Thu, 9 Jan 2025 23:47:06 +0000 (15:47 -0800)] 
Use Ubuntu 22.04 for AARCH64 tests

It seems that qemu might be failing. Tests on Raspberry Pi 5 with Ubuntu 24.04
appear to work just fine.

6 months agoAdd missing compiler-rt libraries for Ubuntu 24. #1840
Nathan Moinvaziri [Sun, 5 Jan 2025 16:01:41 +0000 (08:01 -0800)] 
Add missing compiler-rt libraries for Ubuntu 24. #1840

6 months agoIgnore gcovr parser errors.
Nathan Moinvaziri [Thu, 2 Jan 2025 00:20:17 +0000 (16:20 -0800)] 
Ignore gcovr parser errors.

6 months agoDon't pin gcovr version any longer. #1840
Nathan Moinvaziri [Wed, 1 Jan 2025 22:41:27 +0000 (14:41 -0800)] 
Don't pin gcovr version any longer. #1840

6 months agoUse correct version of gcov for cross-compilers.
Nathan Moinvaziri [Sun, 5 Jan 2025 06:05:25 +0000 (22:05 -0800)] 
Use correct version of gcov for cross-compilers.

6 months agoUse Ubuntu 24 crossbuild-essential packages.
Nathan Moinvaziri [Thu, 2 Jan 2025 23:17:33 +0000 (15:17 -0800)] 
Use Ubuntu 24 crossbuild-essential packages.

6 months agoRemove package qemu for Ubuntu 24. #1840
Nathan Moinvaziri [Wed, 1 Jan 2025 22:46:59 +0000 (14:46 -0800)] 
Remove package qemu for Ubuntu 24. #1840

6 months agoUpgrade CI from Clang-11 to Clang 15 for Ubuntu 24. #1840
Nathan Moinvaziri [Wed, 1 Jan 2025 22:38:12 +0000 (14:38 -0800)] 
Upgrade CI from Clang-11 to Clang 15 for Ubuntu 24. #1840

6 months agoImprove image/container rebuild script to work properly under cron.
Hans Kristian Rosbach [Sat, 4 Jan 2025 20:19:42 +0000 (21:19 +0100)] 
Improve image/container rebuild script to work properly under cron.

6 months agoWorkaround error G6E97C40B
Dmitry Kurtaev [Wed, 15 Jan 2025 17:28:44 +0000 (20:28 +0300)] 
Workaround error G6E97C40B

Warning treated as an error with GCC from Ubuntu 24.04:
```
/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/external/zlib-ng/arch/riscv/riscv_features.c(25,33): error G6E97C40B: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] [/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/libs/build-native.proj]
```
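
A small sketch of the kind of expression that triggers `-Wparentheses` and how explicit grouping silences it. The predicate and its parameter names are hypothetical; the actual expression in `riscv_features.c` is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: a feature is usable if it was forced on, or if
 * both the kernel and the hardware report support. */
static inline bool example_feature_usable(bool forced, bool kernel_ok, bool hw_ok) {
    /* Written as `forced || kernel_ok && hw_ok`, this parses the same way
     * because && binds tighter than ||, but GCC's -Wparentheses asks for the
     * grouping to be spelled out. With warnings treated as errors, the
     * unparenthesized form fails the build. */
    return forced || (kernel_ok && hw_ok);
}

/* Sanity check that the grouping behaves as intended. */
static inline void example_check_grouping(void) {
    assert(example_feature_usable(true, false, false));
    assert(!example_feature_usable(false, true, false));
}
```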

6 months agocmake: disable LTO for some configure checks
Sam James [Thu, 9 Jan 2025 11:36:40 +0000 (11:36 +0000)] 
cmake: disable LTO for some configure checks

Some of zlib-ng's configure tests define a function expecting it to be compiled but
don't call that function, or don't use its return value. This is risky with
LTO where the whole thing may be optimised out, which has happened before:
* https://github.com/zlib-ng/zlib-ng/issues/1616
* https://github.com/zlib-ng/zlib-ng/pull/1622
* https://gitlab.kitware.com/cmake/cmake/-/issues/26103

Closes: https://github.com/zlib-ng/zlib-ng/issues/1841
7 months agoForce use of latest Windows SDK with 32-bit ARM support for release workflows
Vladislav Shchapov [Wed, 1 Jan 2025 08:53:16 +0000 (13:53 +0500)] 
Force use of latest Windows SDK with 32-bit ARM support for release workflows

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
7 months ago2.2.3 Release 2.2.3
Hans Kristian Rosbach [Sun, 29 Dec 2024 18:01:35 +0000 (19:01 +0100)] 
2.2.3 Release

7 months agoContinued cleanup of old UNALIGNED_OK checks
Hans Kristian Rosbach [Fri, 20 Dec 2024 22:31:37 +0000 (23:31 +0100)] 
Continued cleanup of old UNALIGNED_OK checks
- Remove obsolete checks
- Fix checks that are inconsistent
- Stop compiling compare256/longest_match variants that never get called
- Improve how the generic compare256 functions are handled.
- Allow overriding OPTIMAL_CMP

This simplifies the code and avoids having a lot of code in the compiled library that can never get executed.

7 months agoRename functions to get rid of old and now misleading "unaligned" naming
Hans Kristian Rosbach [Sun, 22 Dec 2024 12:25:27 +0000 (13:25 +0100)] 
Rename functions to get rid of old and now misleading "unaligned" naming

7 months agoUse GCC's may_alias attribute for unaligned memory access
Cameron Cawley [Thu, 27 Jul 2023 20:07:29 +0000 (21:07 +0100)] 
Use GCC's may_alias attribute for unaligned memory access

7 months agoImproved setting of OPTIMAL_CMP on ARM
Cameron Cawley [Sun, 22 Dec 2024 13:43:30 +0000 (13:43 +0000)] 
Improved setting of OPTIMAL_CMP on ARM

7 months agoFix unaligned access in ACLE based crc32
Adam Stylinski [Sat, 21 Dec 2024 16:04:47 +0000 (11:04 -0500)] 
Fix unaligned access in ACLE based crc32

This fixes a rightful complaint from the alignment sanitizer that we
alias memory in an unaligned fashion. A nice added bonus is that this
improves performance a tiny bit on the larger buffers, perhaps due to
loops that idiomatically decrement a count and increment a single buffer
pointer rather than the maze of conditional pointer reassignments.

While here, let's write a unit test just for this. Since this is the only
variant that accesses memory in a potentially unaligned fashion that doesn't
explicitly go byte by byte or use intrinsics that don't require alignment,
we'll enable it only for this function for now. Adding more tests later if
need be should be possible. For everything else not crc, we're relying on
ubsan to hopefully catch things by chance.
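
As an illustration of the alignment-safe pattern the message refers to, the sketch below loads 32-bit words through `memcpy` and walks the buffer by decrementing a count and advancing a single pointer. It is a generic example with hypothetical names (`example_load_u32`, `example_sum_words`), not the ACLE crc32 code itself, and it sums words instead of computing a CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address with no alignment guarantee.
 * memcpy has no alignment requirement, so the alignment sanitizer has
 * nothing to complain about; compilers typically lower this to one load
 * on targets that permit unaligned access. */
static inline uint32_t example_load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Sum `words` 32-bit values starting at `buf`, which may be misaligned.
 * The loop decrements a count and increments one buffer pointer. */
static inline uint32_t example_sum_words(const unsigned char *buf, size_t words) {
    uint32_t sum = 0;
    assert(buf != NULL || words == 0);
    while (words--) {
        sum += example_load_u32(buf);
        buf += sizeof(uint32_t);
    }
    return sum;
}
```

Casting `buf` to `const uint32_t *` and dereferencing it would be the pattern that the sanitizer flags when `buf` is not 4-byte aligned.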

7 months agoUpdate s390x actions-runner docker
Hans Kristian Rosbach [Mon, 16 Sep 2024 11:15:46 +0000 (13:15 +0200)] 
Update s390x actions-runner docker

7 months agoSet OPTIMAL_CMP for 32-bit PowerPC
Cameron Cawley [Sat, 21 Dec 2024 17:30:18 +0000 (17:30 +0000)] 
Set OPTIMAL_CMP for 32-bit PowerPC

7 months agoFix "RLE" compression with big endian architectures
Adam Stylinski [Sat, 21 Dec 2024 15:09:58 +0000 (10:09 -0500)] 
Fix "RLE" compression with big endian architectures

This was missed in #1831. The RLE methods compare a string of bytes
directly with itself to directly derive a simple run length encoding.
They use similar but not identical methods to compare256. This needs
a similar endianness check at compile time to know which compare bit
count to use (leading or trailing).

7 months agoMake big endians first class citizens again
Adam Stylinski [Fri, 20 Dec 2024 23:53:51 +0000 (18:53 -0500)] 
Make big endians first class citizens again

No longer do the big iron of yore which lack SIMD optimized loads need
to search strings a byte at a time like primitive machines of the vax
era. This guard here was mostly due to the fact that the string
comparison was searched with "count trailing zero", which assumes an
endianness.  We can just conditionally use leading zeros when on big
endian and stop using the extremely naive C implementation. This makes
things a tad bit faster.
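
The endianness point can be sketched as follows, assuming a GCC or Clang compiler (for `__builtin_ctzll`, `__builtin_clzll` and the `__BYTE_ORDER__` macros). The function name `example_match_len8` is hypothetical; it compares one 8-byte block and is not zlib-ng's compare256.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return how many leading bytes (0..8) of two 8-byte blocks are equal. */
static inline unsigned example_match_len8(const unsigned char *a, const unsigned char *b) {
    uint64_t x, y, diff;
    assert(a != NULL && b != NULL);
    memcpy(&x, a, sizeof(x));
    memcpy(&y, b, sizeof(y));
    diff = x ^ y;                 /* non-zero bits mark differing bytes */
    if (diff == 0)
        return 8;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big endian: the first byte in memory is the most significant byte,
     * so the first mismatch is found by counting leading zeros. */
    return (unsigned)__builtin_clzll(diff) / 8;
#else
    /* Little endian: the first byte in memory is the least significant
     * byte, so the first mismatch is found by counting trailing zeros. */
    return (unsigned)__builtin_ctzll(diff) / 8;
#endif
}
```

Either branch compares eight bytes per step, which is what lets big-endian targets drop the byte-at-a-time loop.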

7 months agoadler32_rvv: Fix some overflow problems
Icenowy Zheng [Sat, 14 Dec 2024 17:31:48 +0000 (01:31 +0800)] 
adler32_rvv: Fix some overflow problems

There are currently some overflow problems in adler32_rvv
implementation, which can lead to wrong results for some input, and
these problems could be easily exhibited when running `git fsck` with
zlib-ng substituting the system zlib on a big git repository.

These problems and the solutions are the following:

- When the input data is long enough, the v_buf32_accu can overflow too.
  Add it to the modulo code that happens per ~NMAX bytes.
- When the vector data is reduced to scalar ones, the resulting scalar
  value (and the proceeded length) may lead to the calculation of sum2
  to overflow. Add mod BASE to all these reductions and initial
  calculation of sum2.
- When the remaining data is less than vl bytes, the code falls back to a
  scalar implementation; however, the sum2 and alder2 values have just been
  reduced from vectors and can be large enough to make sum2 overflow
  in the scalar code. Modulo them before the scalar code to prevent such
  overflow (because vl is surely quite smaller than NMAX).
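
For reference, the scalar form of the deferred-modulo idea behind the points above can be sketched like this. It is a plain C Adler-32, not the RVV implementation; `EXAMPLE_BASE` and `EXAMPLE_NMAX` carry the standard Adler-32 values 65521 and 5552, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_BASE 65521u /* largest prime smaller than 65536 */
#define EXAMPLE_NMAX 5552u  /* max bytes before 32-bit sums must be reduced */

/* Scalar Adler-32 that reduces modulo BASE once per block of at most NMAX
 * bytes. Both sums must already be below BASE when a block starts,
 * otherwise s2 can overflow 32 bits inside the block. */
static inline uint32_t example_adler32(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t s1 = adler & 0xffff;
    uint32_t s2 = (adler >> 16) & 0xffff;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = len < EXAMPLE_NMAX ? len : EXAMPLE_NMAX;
        len -= n;
        while (n--) {
            s1 += *buf++;
            s2 += s1;
        }
        s1 %= EXAMPLE_BASE;   /* reduce before the next block */
        s2 %= EXAMPLE_BASE;
    }
    return (s2 << 16) | s1;
}
```

A vector implementation that hands partially reduced sums to a loop like this one has to apply the same `% BASE` first, which is the third fix in the list.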

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
7 months agoSince we long ago make unaligned reads safe (by using memcpy or intrinsics),
Hans Kristian Rosbach [Tue, 17 Dec 2024 22:02:32 +0000 (23:02 +0100)] 
Since we long ago make unaligned reads safe (by using memcpy or intrinsics),
it is time to replace the UNALIGNED_OK checks that have since really only been
used to select the optimal comparison sizes for the arch instead.

7 months agoFix typos (#1825)
Adeel Mujahid [Fri, 20 Dec 2024 22:35:50 +0000 (00:35 +0200)] 
Fix typos (#1825)

7 months agoadded in-tree build artifacts to .gitignore
Eduard Stefes [Wed, 4 Dec 2024 08:15:27 +0000 (09:15 +0100)] 
added in-tree build artifacts to .gitignore

7 months agoRevert "Since we long ago make unaligned reads safe (by using memcpy or intrinsics),"
Hans Kristian Rosbach [Tue, 17 Dec 2024 22:09:31 +0000 (23:09 +0100)] 
Revert "Since we long ago make unaligned reads safe (by using memcpy or intrinsics),"

This reverts commit 80fffd72f316df980bb15ea0daf06ba22e3583ec.
It was mistakenly pushed to develop instead of going through a PR and the appropriate reviews.

7 months agoSince we long ago make unaligned reads safe (by using memcpy or intrinsics),
Hans Kristian Rosbach [Tue, 17 Dec 2024 22:02:32 +0000 (23:02 +0100)] 
Since we long ago make unaligned reads safe (by using memcpy or intrinsics),
it is time to replace the UNALIGNED_OK checks that have since really only been
used to select the optimal comparison sizes for the arch instead.

7 months agoImprove pipelining for AVX512 chunking
Adam Stylinski [Sat, 30 Nov 2024 14:23:28 +0000 (09:23 -0500)] 
Improve pipelining for AVX512 chunking

For reasons that aren't quite so clear, using the masked writes here
did not pipeline very well. Either setting up the mask stalled things
or masked moves have issues overlapping regular moves. Simply putting
the masked moves behind a branch that is rarely taken seemed to do the
trick in improving the ILP. While here, put masked loads behind the same
branch in case there were ever a hazard for overreading.

7 months agozbuild: Provide a fallback for "ALIGNED_(x)" for other compiler
Detlef Riekenberg [Fri, 29 Nov 2024 21:59:52 +0000 (22:59 +0100)] 
zbuild: Provide a fallback for "ALIGNED_(x)" for other compiler

7 months agoEnable AVX2 functions to be built with BMI2 instructions
Adam Stylinski [Thu, 28 Nov 2024 00:00:52 +0000 (19:00 -0500)] 
Enable AVX2 functions to be built with BMI2 instructions

While these are technically different instructions, no such CPU exists
that has AVX2 that doesn't have BMI2. Enabling BMI2 allows us to
eliminate several flag stalls by having flagless versions of shifts, and
allows us to not clobber and move around GPRs so much in scalar code.
There's usually a sizeable benefit for enabling it. Since we're building
with BMI2 for AVX2 functions, let's also just make sure the CPU claims
to support it (just to cover our bases).

7 months agoAddress deprecated cmake version warning.
Bradley Lowekamp [Tue, 26 Nov 2024 14:12:49 +0000 (09:12 -0500)] 
Address deprecated cmake version warning.

Use cmake_minimum_required(VERSION <min>...<policy_max>) syntax to set
the policy at the same time as the compatible CMake version.

8 months agoBump codecov/codecov-action from 4 to 5
dependabot[bot] [Sun, 1 Dec 2024 07:13:42 +0000 (07:13 +0000)] 
Bump codecov/codecov-action from 4 to 5

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v4...v5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
8 months agoFix native detection of CRC instruction
Adam Stylinski [Thu, 28 Nov 2024 19:05:32 +0000 (14:05 -0500)] 
Fix native detection of CRC instruction

It's unclear if raspberry pi OS's shipped GCC doesn't properly detect
ACLE or not (/proc/cpuinfo claims to support AES), but in any case, the
preprocessor macro for that flag is not defined with -march=native on a
raspberry pi 5. Unfortunately that means when built "WITH_NATIVE", we do
not get a fast CRC function.  The CRC32 preprocessor macro _IS_ defined,
and the auto detection when built without NATIVE support does properly
get dispatched to. Since we only need the scalar CRC32 and not the polynomial
stuff anyhow, let's make it be an || condition and not a && one.

8 months agoRemove unused HAVE_CHUNKMEMSET_1 define
Pavel P [Wed, 27 Nov 2024 23:18:20 +0000 (01:18 +0200)] 
Remove unused HAVE_CHUNKMEMSET_1 define

8 months agoFix casting warning/error in test_compress_bound.cc
Pavel P [Wed, 27 Nov 2024 21:13:34 +0000 (23:13 +0200)] 
Fix casting warning/error in test_compress_bound.cc

Fixes the following error when building with msvc compiler
```
test_compress_bound.cc
D:\zlib-ng\test\test_compress_bound.cc(41,50): error C2220: the following warning is treated as an error
D:\zlib-ng\test\test_compress_bound.cc(41,50): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
D:\zlib-ng\test\test_compress_bound.cc(43,68): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
```

8 months agoForce use of latest Windows SDK with 32-bit ARM support
Vladislav Shchapov [Sun, 24 Nov 2024 13:34:40 +0000 (18:34 +0500)] 
Force use of latest Windows SDK with 32-bit ARM support

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
8 months agoMake an AVX512 inflate fast with low cost masked writes
Adam Stylinski [Wed, 25 Sep 2024 21:56:36 +0000 (17:56 -0400)] 
Make an AVX512 inflate fast with low cost masked writes

This takes advantage of the fact that on AVX512 architectures, masked
moves are incredibly cheap. There are many places where we have to
fall back to the safe C implementation of chunkcopy_safe because of the
assumed overwriting that occurs. We're able to sidestep most of the branching
needed here by simply controlling the bounds of our writes with a mask.

9 months agoTry to simplify the inflate loop by collapsing most cases to chunksets
Adam Stylinski [Mon, 23 Sep 2024 22:26:04 +0000 (18:26 -0400)] 
Try to simplify the inflate loop by collapsing most cases to chunksets

9 months agoMake chunkset_avx2 half chunk aware
Adam Stylinski [Thu, 12 Sep 2024 21:47:30 +0000 (17:47 -0400)] 
Make chunkset_avx2 half chunk aware

This gives us appreciable gains on a number of fronts.  The first being
we're inlining a pretty hot function that was getting dispatched to
regularly. Another is that we're able to do a safe lagged copy of a
distance that is smaller, so CHUNKCOPY gets its teeth back here for
smaller sizes, without having to do another dispatch to a function.

We're also now doing two overlapping writes at once and letting the CPU
do its store forwarding. This was an enhancement @dougallj had suggested
a while back.
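
The two-overlapping-writes idea can be shown with a scalar analogue. This is a sketch with 8-byte moves and a hypothetical function name, not the AVX2 chunkset code, and it assumes the source and destination buffers do not overlap each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `len` bytes, 8 <= len <= 16, with exactly two 8-byte moves.
 * The second move is anchored to the end of the range, so for len < 16
 * it overlaps the first one instead of needing a byte-wise tail loop. */
static inline void example_copy_8_to_16(unsigned char *dst, const unsigned char *src, size_t len) {
    uint64_t head, tail;
    assert(len >= 8 && len <= 16);
    memcpy(&head, src, 8);             /* first 8 bytes */
    memcpy(&tail, src + len - 8, 8);   /* last 8 bytes, may overlap head */
    memcpy(dst, &head, 8);
    memcpy(dst + len - 8, &tail, 8);
}
```

No branch depends on the exact length inside the range, and nothing past `dst + len` is written.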

Additionally, the "half chunk mag" here is fundamentally less
complicated because it doesn't require synthesizing cross lane permutes
with a blend operation, so we can optimistically do that first if the
len is small enough that a full 32 byte chunk doesn't make any sense.

9 months agoSimplify avx2 chunkset a bit
Adam Stylinski [Wed, 11 Sep 2024 22:34:54 +0000 (18:34 -0400)] 
Simplify avx2 chunkset a bit

Put length 16 in the length checking ladder and take care of it there
since it's also a simple case to handle. We kind of went out of our way
to pretend 128 bit vectors didn't exist when using avx2 but this can be
handled in a single instruction. Strangely the intrinsic uses vector
register operands but the instruction itself assumes a memory operand
for the source. This also means we don't have to handle this case in our
"GET_CHUNK_MAG" function.

9 months agoReorder variables in inflate functions to reduce padding holes
Hans Kristian Rosbach [Wed, 9 Oct 2024 14:27:43 +0000 (16:27 +0200)] 
Reorder variables in inflate functions to reduce padding holes
due to variable alignment requirements.
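
A toy illustration of how ordering affects padding, using two hypothetical structs with the same four members. The members and names are made up for the example and do not correspond to the actual inflate variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in an order that forces padding after each small field. */
struct example_padded {
    uint8_t  flag;
    uint64_t total;
    uint8_t  mode;
    uint32_t bits;
};

/* Same members, ordered from largest to smallest alignment requirement. */
struct example_reordered {
    uint64_t total;
    uint32_t bits;
    uint8_t  flag;
    uint8_t  mode;
};

/* Bytes saved by the reordering on the current ABI. */
static inline size_t example_padding_saved(void) {
    assert(sizeof(struct example_reordered) <= sizeof(struct example_padded));
    return sizeof(struct example_padded) - sizeof(struct example_reordered);
}
```

On a typical 64-bit ABI the first layout occupies 24 bytes and the second 16, though the exact sizes depend on the platform.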

9 months agoconfigure: add --mandir to override $mandir on command line.
Mika Lindqvist [Sat, 28 Sep 2024 05:09:17 +0000 (08:09 +0300)] 
configure: add --mandir to override $mandir on command line.

9 months agoconfigure: Fix linker flags for Haiku.
Mika Lindqvist [Fri, 27 Sep 2024 14:09:22 +0000 (17:09 +0300)] 
configure: Fix linker flags for Haiku.

9 months agoReorder 'inflate_state' struct to improve cache-locality of variables
Hans Kristian Rosbach [Wed, 25 Sep 2024 15:25:19 +0000 (17:25 +0200)] 
Reorder 'inflate_state' struct to improve cache-locality of variables
needed by inffast (from 6 cachelines to 1).
Also fill in some unnecessary holes.

9 months agoAdd variable 'wbufsize' to track window buffer including padding, to allow
Hans Kristian Rosbach [Wed, 25 Sep 2024 15:21:28 +0000 (17:21 +0200)] 
Add variable 'wbufsize' to track window buffer including padding, to allow
the chunkset code to spill garbage data into the padding area if available.

9 months agoDon't use 'dmax' and 'sane' variables unless their checks have been compiled in.
Hans Kristian Rosbach [Wed, 25 Sep 2024 15:18:49 +0000 (17:18 +0200)] 
Don't use 'dmax' and 'sane' variables unless their checks have been compiled in.

9 months agoCompute the "safe" distance properly
Adam Stylinski [Thu, 3 Oct 2024 21:17:44 +0000 (17:17 -0400)] 
Compute the "safe" distance properly

The safe pointer that is computed is an exclusive, not inclusive, bound.
While we were probably rarely, if ever, bitten by this, it still makes
sense to apply the limit properly.

10 months agoExplicitly set CMake policy 0169 to silence warning
FantasqueX [Thu, 19 Sep 2024 16:53:18 +0000 (00:53 +0800)] 
Explicitly set CMake policy 0169 to silence warning

The recommended `FetchContent_MakeAvailable()` is introduced in CMake
3.14 which is greater than `cmake_minimum_required()`.

CMake policy settings affect subdirectories.

The `cmake_minimum_required(VERSION)` command implicitly calls
`cmake_policy(VERSION)`.

Closes https://github.com/zlib-ng/zlib-ng/issues/1788

10 months agoSimplify chunking in the copy ladder here
Adam Stylinski [Sun, 15 Sep 2024 16:23:50 +0000 (12:23 -0400)] 
Simplify chunking in the copy ladder here

As it turns out, trying to peel off the remainder with so many branches
inflated the code size so much that this function wouldn't inline
without some fairly aggressive optimization flags. Only
catching vector sized chunks here makes the loop body small enough and
having the byte by byte copy idiom at the bottom gives the compiler some
flexibility that it is likely to do something there.

10 months agoDisable MSVC warning 4324 (struct padded due to alignment)
Hans Kristian Rosbach [Wed, 25 Sep 2024 18:52:26 +0000 (20:52 +0200)] 
Disable MSVC warning 4324 (struct padded due to alignment)

10 months agoForce Visual C++ to treat source files as UTF-8.
Mika Lindqvist [Wed, 18 Sep 2024 18:55:40 +0000 (21:55 +0300)] 
Force Visual C++ to treat source files as UTF-8.

10 months agoReplace non-ascii characters to fix MSVC warning
FantasqueX [Thu, 19 Sep 2024 16:05:26 +0000 (00:05 +0800)] 
Replace non-ascii characters to fix MSVC warning

10 months ago[CI] Don't try to use macOS 11 as it's no longer supported.
Mika Lindqvist [Fri, 23 Feb 2024 11:21:28 +0000 (13:21 +0200)] 
[CI] Don't try to use macOS 11 as it's no longer supported.

10 months agoUse target include instead of raw include
Letu Ren [Tue, 17 Sep 2024 13:49:27 +0000 (21:49 +0800)] 
Use target include instead of raw include

10 months agoFix override of CMAKE_C_STANDARD, CMAKE_C_STANDARD_REQUIRED, CMAKE_C_EXTENSIONS. False...
Vladislav Shchapov [Tue, 17 Sep 2024 15:10:34 +0000 (20:10 +0500)] 
Fix override of CMAKE_C_STANDARD, CMAKE_C_STANDARD_REQUIRED, CMAKE_C_EXTENSIONS. False value is allowed for CMAKE_C_STANDARD_REQUIRED and CMAKE_C_EXTENSIONS.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
10 months agoAllow overriding CMAKE_CXX_STANDARD, CMAKE_CXX_STANDARD_REQUIRED, CMAKE_CXX_EXTENSIONS...
Vladislav Shchapov [Tue, 17 Sep 2024 15:08:41 +0000 (20:08 +0500)] 
Allow overriding CMAKE_CXX_STANDARD, CMAKE_CXX_STANDARD_REQUIRED, CMAKE_CXX_EXTENSIONS variables for tests and benchmarks.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
10 months agoFix build on aarch64 android.
Bartosz Taudul [Tue, 17 Sep 2024 10:46:11 +0000 (12:46 +0200)] 
Fix build on aarch64 android.

When building with CMake toolchain provided by NDK, the ARCH variable is
not "aarch64", but "aarch64-none-linux-android26" (or similar). The
strict string match check causes the WITH_ARMV6 option to be enabled in
such a case. In result, arch/arm/slide_hash_armv6.c is compiled, which
is not intended to be used on aarch64, and fails.

Relax the check and assume aarch64 if the ARCH variable contains aarch64.

10 months ago2.2.2 Release 2.2.2
Hans Kristian Rosbach [Sun, 15 Sep 2024 14:11:48 +0000 (16:11 +0200)] 
2.2.2 Release

10 months agoRevert "Split chunkcopy_safe to allow the first part to be inlined more often."
Hans Kristian Rosbach [Tue, 17 Sep 2024 12:09:44 +0000 (14:09 +0200)] 
Revert "Split chunkcopy_safe to allow the first part to be inlined more often."

This reverts commit 6b8efe78685926d89da0cf1c496c1022fc24f588.

New and improved chunkcopy_safe is coming soon.

10 months agoMake use of unaligned loads on big endian in insert_string
Cameron Cawley [Sun, 25 Feb 2024 18:08:53 +0000 (18:08 +0000)] 
Make use of unaligned loads on big endian in insert_string

10 months agoSplit chunkcopy_safe to allow the first part to be inlined more often.
Hans Kristian Rosbach [Wed, 11 Sep 2024 10:31:38 +0000 (12:31 +0200)] 
Split chunkcopy_safe to allow the first part to be inlined more often.

10 months ago[RISCV] Better run-time detection of RVV vector instruction support
Mika Lindqvist [Mon, 26 Aug 2024 16:26:37 +0000 (19:26 +0300)] 
[RISCV] Better run-time detection of RVV vector instruction support

Original version posted by @ncopa in #1705.

10 months agoFixed false positive HAVE_ARMV6_INTRIN value on old ARM platforms.
Alexander Smorkalov [Tue, 10 Sep 2024 14:07:09 +0000 (17:07 +0300)] 
Fixed false positive HAVE_ARMV6_INTRIN value on old ARM platforms.

10 months agoDon't use chunkunroll for inflateBack
Nathan Moinvaziri [Mon, 9 Sep 2024 20:32:33 +0000 (13:32 -0700)] 
Don't use chunkunroll for inflateBack

If the output buffer and the window buffer are the same
memory allocation, we cannot make the assumptions that chunkunroll
does, that it is okay to overwrite the output buffer.

11 months agoAddress CR feedback
Adeel Mujahid [Thu, 29 Aug 2024 11:27:43 +0000 (14:27 +0300)] 
Address CR feedback

11 months agoFix new Windows SDK build break
Adeel Mujahid [Wed, 28 Aug 2024 20:49:55 +0000 (23:49 +0300)] 
Fix new Windows SDK build break

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
11 months agoEnable warning C4242 and treat warnings as errors for Visual C++.
Mika Lindqvist [Mon, 12 Aug 2024 23:20:19 +0000 (02:20 +0300)] 
Enable warning C4242 and treat warnings as errors for Visual C++.