git.ipfire.org Git - thirdparty/zlib-ng.git/log
Vladislav Shchapov [Sat, 20 Dec 2025 22:38:50 +0000 (03:38 +0500)] 
Simplify LoongArch64 assembler. GCC 16, LLVM 22 have LASX and LSX conversion intrinsics.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Vladislav Shchapov [Sat, 20 Dec 2025 20:30:38 +0000 (01:30 +0500)] 
Improve LoongArch64 toolchain file.

Use COMPILER_SUFFIX variable to set gcc name suffix.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Adam Stylinski [Fri, 12 Dec 2025 21:23:27 +0000 (16:23 -0500)] 
Force purely aligned loads in inflate_table code length counting

At the expense of some extra stack space and eating about 4 more cache
lines, let's make these loads purely aligned. On potato CPUs such as the
Core 2, unaligned loads in a loop are not ideal. Additionally, some
SBC-based ARM chips (usually the LITTLE cores in big.LITTLE variants)
suffer a penalty for unaligned loads. This also paves the way for a trivial
altivec implementation, for which unaligned loads don't exist and need
to be synthesized with permutation vectors.

Dougall Johnson [Wed, 10 Dec 2025 03:06:06 +0000 (19:06 -0800)] 
Optimize code length counting in inflate_table using intrinsics.

https://github.com/dougallj/zlib-dougallj/commit/f23fa25aa168ef782bab5e7cd6f9df50d7bb5eb2
https://godbolt.org/z/fojxrEo4T

Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>
Nathan Moinvaziri [Fri, 26 Dec 2025 16:50:44 +0000 (08:50 -0800)] 
Add missing adler32_copy_power8 implementation

Nathan Moinvaziri [Thu, 18 Dec 2025 00:35:18 +0000 (16:35 -0800)] 
Add missing adler32_copy_ssse3 implementation

Nathan Moinvaziri [Fri, 26 Dec 2025 16:56:41 +0000 (08:56 -0800)] 
Add missing adler32_copy_vmx implementation

Nathan Moinvaziri [Thu, 18 Dec 2025 00:12:30 +0000 (16:12 -0800)] 
Add comment to adler32_copy_avx512_vnni about lower vector width usage

Nathan Moinvaziri [Fri, 26 Dec 2025 16:39:04 +0000 (08:39 -0800)] 
Add static inline/Z_FORCEINLINE to crc32_(v)pclmulqdq functions.

Nathan Moinvaziri [Fri, 26 Dec 2025 08:30:58 +0000 (00:30 -0800)] 
Use tail optimization in final barrett reduction

Fold 4x128-bit into a single 128-bit value using k1/k2 constants, then reduce
128-bits to 32-bits.

https://www.corsix.org/content/alternative-exposition-crc32_4k_pclmulqdq

Nathan Moinvaziri [Fri, 26 Dec 2025 08:15:20 +0000 (00:15 -0800)] 
Move COPY out of fold_16 inline with other fold_# functions.

Nathan Moinvaziri [Fri, 26 Dec 2025 07:47:14 +0000 (23:47 -0800)] 
Move fold calls closer to last change in xmm_crc# variables.

Nathan Moinvaziri [Fri, 26 Dec 2025 07:14:21 +0000 (23:14 -0800)] 
Handle initial crc only at the beginning of crc32_(v)pclmulqdq

Nathan Moinvaziri [Sun, 14 Dec 2025 08:57:37 +0000 (00:57 -0800)] 
Fix initial crc value loading in crc32_(v)pclmulqdq

In main function, alignment diff processing was getting in the way of XORing
the initial CRC, because it does not guarantee at least 16 bytes have been
loaded.

In fold_16, src data was being modified by the initial CRC XOR before being stored to dst.

Nathan Moinvaziri [Thu, 11 Dec 2025 07:21:47 +0000 (23:21 -0800)] 
Rename crc32_fold_pclmulqdq_tpl.h to crc32_pclmulqdq_tpl.h

Nathan Moinvaziri [Thu, 11 Dec 2025 06:59:50 +0000 (22:59 -0800)] 
Merged crc32_fold functions save, load, reset

Nathan Moinvaziri [Sun, 14 Dec 2025 18:32:02 +0000 (10:32 -0800)] 
Move crc32_fold_s struct into x86 implementation.

Nathan Moinvaziri [Fri, 19 Dec 2025 00:37:34 +0000 (16:37 -0800)] 
Update crc32_fold test and benchmarks for crc32_copy

Nathan Moinvaziri [Fri, 19 Dec 2025 00:17:18 +0000 (16:17 -0800)] 
Refactor crc32_fold functions into single crc32_copy

Vladislav Shchapov [Sat, 27 Dec 2025 10:58:03 +0000 (15:58 +0500)] 
Remove redundant instructions in 256 bit wide chunkset on LoongArch64

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Adam Stylinski [Tue, 23 Dec 2025 23:58:10 +0000 (18:58 -0500)] 
Small optimization in 256 bit wide chunkset

It turns out Intel only parses the bottom 4 bits of the shuffle vector.
This makes it already a sufficient permutation vector and saves us a
small bit of latency.
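
The behaviour this relies on can be illustrated with a scalar model of one 128-bit lane of a byte shuffle. This is only a sketch for reasoning about the semantics, not the chunkset code itself; the function name `pshufb_lane_model` is made up for illustration. Bits 4..6 of each control byte are ignored, while bit 7 zeroes the output byte.

```c
#include <stdint.h>

/* Scalar model of one 128-bit lane of PSHUFB/VPSHUFB (hypothetical helper,
 * not part of zlib-ng). */
static void pshufb_lane_model(uint8_t dst[16], const uint8_t src[16],
                              const uint8_t ctrl[16]) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        /* Bit 7 zeroes the output byte; otherwise only bits 0..3 of the
         * control byte select the source byte. */
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    }
    /* Copy through a temporary so dst may alias src. */
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}
```

Under this model, a control vector holding indices 16..31 selects the same bytes as one holding 0..15, which is why no extra masking step is needed before the shuffle.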

Nathan Moinvaziri [Sat, 13 Dec 2025 01:50:15 +0000 (17:50 -0800)] 
Use different bit accumulator type for x86 compiler optimization

Nathan Moinvaziri [Wed, 10 Dec 2025 21:34:31 +0000 (13:34 -0800)] 
Fix bits var warning conversion from unsigned int to uint8_t in MSVC

Dougall Johnson [Wed, 3 Dec 2025 07:44:56 +0000 (23:44 -0800)] 
Change code table access from pointer to value in inflate_fast.

+r doesn't appear to work on MIPS or RISC-V architectures

Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>

Nathan Moinvaziri [Thu, 18 Dec 2025 00:05:55 +0000 (16:05 -0800)] 
Apply consistent use of UNLIKELY across adler32 variants

Nathan Moinvaziri [Wed, 17 Dec 2025 02:00:11 +0000 (18:00 -0800)] 
Clean up adler32 short length functions

Hans Kristian Rosbach [Fri, 5 Dec 2025 19:04:14 +0000 (20:04 +0100)] 
Improve cmake/detect-arch.cmake to also provide bitness.
Rewrite checks in CMakeLists.txt and cmake/detect-intrinsics.cmake
to utilize the new variables.

Hans Kristian Rosbach [Wed, 10 Dec 2025 19:27:46 +0000 (20:27 +0100)] 
Reorder deflate.h variables to improve cache locality

Hans Kristian Rosbach [Thu, 11 Dec 2025 19:34:05 +0000 (20:34 +0100)] 
Use uint32_t for hash_head in update_hash/insert_string

Hans Kristian Rosbach [Thu, 11 Dec 2025 16:24:59 +0000 (17:24 +0100)] 
Use uint32_t for Pos in match_tpl.h

Hans Kristian Rosbach [Mon, 8 Dec 2025 13:30:05 +0000 (14:30 +0100)] 
- Reorder variables in longest_match, reducing gaps.
- Make window-based pointers in match_tpl.h const, only the
  pointers move, never the data.

Hans Kristian Rosbach [Mon, 8 Dec 2025 13:30:05 +0000 (14:30 +0100)] 
Use pointer arithmetic to access window in deflate_quick/deflate_fast

Hans Kristian Rosbach [Mon, 8 Dec 2025 12:18:24 +0000 (13:18 +0100)] 
- Add local window pointer to:
  deflate_quick, deflate_fast, deflate_medium and fill_window.
- Add local strm pointer in fill_window.
- Fix missed change to use local lookahead variable in match_tpl

Hans Kristian Rosbach [Mon, 8 Dec 2025 12:09:42 +0000 (13:09 +0100)] 
Deflate_state changes:
- Reduce opt_len/static_len sizes.
- Move matches/insert closer to their related variables.
  These now fill an 8-byte hole in the struct on 64-bit platforms.
- Exclude compressed_len and bits_sent if ZLIB_DEBUG is
  not enabled. Also move them to the end.
- Remove x86 MSVC-specific padding

Hans Kristian Rosbach [Mon, 8 Dec 2025 12:03:33 +0000 (13:03 +0100)] 
- Minor inlining changes in trees_emit.h:
  - Inline the small bi_windup function
  - Don't attempt inlining for the big zng_emit_dist
- Don't check for too long match in deflate_quick, it cannot happen.
- Move GOTO_NEXT_CHAIN macro outside of LONGEST_MATCH function to
  improve readability.

Vladislav Shchapov [Sat, 20 Dec 2025 14:31:01 +0000 (19:31 +0500)] 
Fix warnings: unused parameter state, comparison of integer expressions of different signedness: size_t and int64_t.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Mathias Heyer [Thu, 18 Dec 2025 01:14:41 +0000 (17:14 -0800)] 
slide_hash_sse2 and slide_hash_avx2 are not dependent on HAVE_BUILTIN_CTZ

This patch matches x86_functions.h with behavior found in functable.c

It fixes builds where HAVE_BUILTIN_CTZ remained undefined.

Nathan Moinvaziri [Fri, 12 Dec 2025 01:28:12 +0000 (17:28 -0800)] 
Change bi_reverse to use uint16_t code arg.

Nathan Moinvaziri [Sun, 7 Dec 2025 07:56:21 +0000 (23:56 -0800)] 
Use __builtin_bitreverse16 in inflate_table

https://github.com/dougallj/zlib-dougallj/commit/f23fa25aa168ef782bab5e7cd6f9df50d7bb5eb2

Nathan Moinvaziri [Sat, 6 Dec 2025 15:55:07 +0000 (07:55 -0800)] 
Use __builtin_bitreverse16 in bi_reverse if available.
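
A minimal sketch of how such a conditional use of the builtin might look, assuming the helper reverses the low `len` bits of a 16-bit code. The name `bi_reverse_sketch` and the feature macro `SKETCH_HAVE_BITREVERSE16` are invented for illustration; the actual zlib-ng detection logic and fallback may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Detect the builtin where the compiler can tell us about it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_bitreverse16)
#    define SKETCH_HAVE_BITREVERSE16 1
#  endif
#endif

/* Reverse the low `len` bits of `code` (1 <= len <= 16). */
static uint16_t bi_reverse_sketch(uint16_t code, int len) {
    assert(len >= 1 && len <= 16);
#ifdef SKETCH_HAVE_BITREVERSE16
    /* Reverse all 16 bits, then shift the wanted bits down. */
    return (uint16_t)(__builtin_bitreverse16(code) >> (16 - len));
#else
    /* Portable fallback: move one bit per iteration. */
    uint16_t res = 0;
    for (int i = 0; i < len; i++) {
        res = (uint16_t)((res << 1) | (code & 1));
        code = (uint16_t)(code >> 1);
    }
    return res;
#endif
}
```

Both paths ignore any bits of `code` above bit `len - 1`, so they should agree for every valid input.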

Dougall Johnson [Mon, 8 Dec 2025 04:11:52 +0000 (20:11 -0800)] 
Reorder code struct fields for better access patterns

Place bits field before op field in code struct to optimize memory
access. The bits field is accessed first in the hot path, so placing
it at offset 0 may improve code generation on some architectures.
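
The layout described above could look roughly like the following. This is a sketch under the assumption that the entry keeps the classic zlib fields (`op`, `bits`, `val`); the type name `code_sketch` and the helper `consume_code` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Table entry with `bits` moved to the front (illustrative layout). */
typedef struct {
    uint8_t  bits;  /* number of bits in this code; read first in the hot path */
    uint8_t  op;    /* operation, extra bits, or table link */
    uint16_t val;   /* literal, base length/distance, or table offset */
} code_sketch;

/* The entry should still pack into 4 bytes, with `bits` at offset 0. */
_Static_assert(sizeof(code_sketch) == 4, "code entry should stay 4 bytes");
_Static_assert(offsetof(code_sketch, bits) == 0, "bits should be first");

/* Hot-path style use: drop the consumed bits from the bit accumulator. */
static uint64_t consume_code(uint64_t hold, code_sketch here) {
    return hold >> here.bits;
}
```

With `bits` in the lowest byte, a little-endian target that loads the whole entry into a register can use the low byte directly as the shift count, which is one plausible reason the reordering helps code generation.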

Nathan Moinvaziri [Wed, 3 Dec 2025 03:36:54 +0000 (19:36 -0800)] 
Remove COPY ifdef from crc32 (v)pclmulqdq.

Nathan Moinvaziri [Mon, 8 Dec 2025 03:54:41 +0000 (19:54 -0800)] 
Add padding to deflate_struct until can be cleaned up along cachelines

Nathan Moinvaziri [Mon, 8 Dec 2025 03:59:36 +0000 (19:59 -0800)] 
Compute w_bits rather than storing it in the deflate_state structure

Co-authored-by: Brian Pane <brianp@brianp.net>
Nathan Moinvaziri [Sat, 6 Dec 2025 01:52:47 +0000 (17:52 -0800)] 
Compute w_mask rather than storing it in the deflate_state structure

Co-authored-by: Brian Pane <brianp@brianp.net>
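
As a rough illustration of the two commits above, both values can be derived from the window size when it is a power of two. The struct and function names here are hypothetical, and the loop stands in for whatever bit-scan the real code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the deflate state. */
typedef struct {
    uint32_t w_size; /* LZ77 window size, assumed to be a power of two */
} window_sketch;

/* Mask for wrapping positions into the window. */
static inline uint32_t sketch_w_mask(const window_sketch *s) {
    return s->w_size - 1;
}

/* log2(w_size), computed instead of stored. */
static inline uint32_t sketch_w_bits(const window_sketch *s) {
    uint32_t bits = 0;
    assert(s->w_size != 0 && (s->w_size & (s->w_size - 1)) == 0);
    for (uint32_t v = s->w_size; v > 1; v >>= 1)
        bits++;
    return bits;
}
```

Recomputing trades a subtraction or bit-scan for two fewer fields in the state structure.
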
Mika T. Lindqvist [Sat, 6 Dec 2025 21:52:57 +0000 (23:52 +0200)] 
[configure] Fix detecting -fno-lto support
* Previously, -fno-lto was assumed to be supported on non-GCC-compatible or unsupported compilers;
  support was never actually tested in those cases. Set the default to not supported.

Nathan Moinvaziri [Sat, 6 Dec 2025 03:59:06 +0000 (19:59 -0800)] 
Micro-optimization for in pointer calculation for inflate_fast REFILL

trifectatechfoundation/zlib-rs#320

Co-authored-by: Brian Pane <brianp@brianp.net>
Vladislav Shchapov [Sat, 6 Dec 2025 15:17:40 +0000 (20:17 +0500)] 
Fix for potentially uninitialized local variable ft used.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Hans Kristian Rosbach [Wed, 3 Dec 2025 12:10:06 +0000 (13:10 +0100)] 
Use local copies of s->level and s->window in deflate_slow

Hans Kristian Rosbach [Sun, 30 Nov 2025 21:31:49 +0000 (22:31 +0100)] 
Inline all uses of quick_insert_string*/quick_insert_value*.
Inline all uses of update_hash*.
Inline insert_string into deflate_quick, deflate_fast and deflate_medium.
Remove insert_string from deflate_state
Use local function pointer for insert_string.
Fix level check to actually check level and not `s->max_chain_length <= 1024`.

Nathan Moinvaziri [Wed, 3 Dec 2025 06:51:05 +0000 (22:51 -0800)] 
Wrap _cond in Assert macro in case complex statement used.
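
Why the parentheses matter can be shown with two toy macros. These are not zlib-ng's actual `Assert` definitions; the macro names and the failure counter are invented for the example.

```c
#include <stdio.h>

/* Counts reported failures so the difference is observable. */
static int sketch_assert_failures = 0;

/* Without parentheses, `!` binds only to the first operand of _cond. */
#define ASSERT_UNWRAPPED(_cond, _msg)          \
    do {                                       \
        if (!_cond) {                          \
            sketch_assert_failures++;          \
            fprintf(stderr, "%s\n", _msg);     \
        }                                      \
    } while (0)

/* With parentheses, the whole expression is negated as intended. */
#define ASSERT_WRAPPED(_cond, _msg)            \
    do {                                       \
        if (!(_cond)) {                        \
            sketch_assert_failures++;          \
            fprintf(stderr, "%s\n", _msg);     \
        }                                      \
    } while (0)
```

With `a = 0` and `b = 1`, `ASSERT_WRAPPED(a || b, ...)` passes, while `ASSERT_UNWRAPPED(a || b, ...)` expands to `if (!a || b)` and reports a failure even though the condition holds.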

Nathan Moinvaziri [Wed, 3 Dec 2025 05:23:20 +0000 (21:23 -0800)] 
Wrap support_flag for cpu features in benchmark and test macros.

Nathan Moinvaziri [Wed, 3 Dec 2025 05:19:45 +0000 (21:19 -0800)] 
Fixed casting warning in benchmark_uncompress on MSVC

benchmark_uncompress.cc(55,93): warning C4244: 'argument': conversion from 'int64_t' to 'size_t', possible loss of data

Nathan Moinvaziri [Tue, 2 Dec 2025 23:25:56 +0000 (15:25 -0800)] 
Rename adler32_fold_copy to adler32_copy (#2026)

There are no folding techniques in adler32 implementations. It is simply hashing while copying.
- Rename adler32_fold_copy to adler32_copy.
- Remove unnecessary adler32_fold.c file.
- Reorder adler32_copy functions last in source file for consistency.
- Rename adler32_rvv_impl to adler32_copy_impl for consistency.
- Replace dst != NULL with 1 in adler32_copy_neon to remove branching.
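
"Hashing while copying" can be sketched in scalar C as below. This is a plain reference-style loop, not one of the optimized zlib-ng variants; the function name and argument order are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ADLER_BASE 65521u /* largest prime below 65536 */

/* Copy src to dst while updating an Adler-32 checksum over the same bytes. */
static uint32_t adler32_copy_sketch(uint32_t adler, uint8_t *dst,
                                    const uint8_t *src, size_t len) {
    uint32_t a = adler & 0xffff;         /* running byte sum */
    uint32_t b = (adler >> 16) & 0xffff; /* running sum of sums */
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
        a = (a + src[i]) % SKETCH_ADLER_BASE;
        b = (b + a) % SKETCH_ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Starting from the usual initial value of 1, the nine bytes of "Wikipedia" give 0x11E60398, and `dst` ends up holding a copy of the input. Nothing is folded; the checksum is simply updated in the same pass as the copy.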

Brad Smith [Fri, 14 Nov 2025 11:45:41 +0000 (06:45 -0500)] 
Use elf_aux_info() on FreeBSD and OpenBSD ARM / AArch64

Use elf_aux_info() as the preferred API for modern FreeBSD and OpenBSD
ARM and AArch64. This adds 32-bit ARM support.

Sam Russell [Tue, 2 Dec 2025 19:12:17 +0000 (20:12 +0100)] 
Chorba: Add test cases for #2029

Add test case from @KungFuJesus and a few others in similar data lengths

Sam Russell [Tue, 2 Dec 2025 13:46:33 +0000 (14:46 +0100)] 
Chorba: Fix edge case bug for >256KB input

dependabot[bot] [Mon, 1 Dec 2025 07:20:48 +0000 (07:20 +0000)] 
Bump actions/checkout from 5 to 6

Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Nathan Moinvaziri [Sat, 29 Nov 2025 03:31:10 +0000 (19:31 -0800)] 
Add quick_insert_value for optimized hash insertion

Reduces the number of reads by two

Co-authored-by: Brian Pane <brianp@brianp.net>
trifectatechfoundation/zlib-rs#374
trifectatechfoundation/zlib-rs#375

Hans Kristian Rosbach [Fri, 28 Nov 2025 23:50:40 +0000 (18:50 -0500)] 
deflate_stored: use local copy of s->w_size

Hans Kristian Rosbach [Fri, 28 Nov 2025 23:49:37 +0000 (18:49 -0500)] 
Minor cleanups of some variables in deflate functions

Hans Kristian Rosbach [Tue, 25 Nov 2025 11:18:52 +0000 (12:18 +0100)] 
2.3.1 Release

Adam Stylinski [Fri, 21 Nov 2025 15:02:14 +0000 (10:02 -0500)] 
Conditionally shortcut via the chorba polynomial based on compile flags

As it turns out, the copying CRC32 variant _is_ slower when compiled
with generic flags. The reason for this is mainly extra stack spills and
the lack of operations we can overlap with the moves. However, when
compiling for an architecture with more registers, such as avx512, we no
longer have to eat all these costly stack spills and we can overlap with
a 3-operand XOR. Conditionally guarding this means that if a Linux
distribution wants to compile with -march=x86-64-v4 they get all the
upsides to this.

This code notably is not actually used if you happen to have something
that supports 512-bit-wide clmul, so this does help a somewhat narrow
range of targets (most of the earlier AVX512 implementations, pre-Ice
Lake).

We also must guard with AVX512VL, as just specifying AVX512F makes GCC
generate vpternlog instructions of 512-bit width only, so a bunch of
packing and unpacking of 512-bit to 256-bit registers and vice versa has
to occur, absolutely killing runtime. It's only with AVX512VL that there's
a 128-bit-wide vpternlog.
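
A compile-time guard of this kind might be sketched as follows. The function `xor3_16` and the macro `SKETCH_XOR3_TERNLOG` are hypothetical; the only real API used is `_mm_ternarylogic_epi32`, whose 128-bit form needs both AVX512F and AVX512VL. The fallback here is plain scalar code for simplicity, whereas the actual routines use SSE XORs.

```c
#include <stdint.h>

/* Only take the ternary-logic path when the 128-bit form is available. */
#if defined(__AVX512F__) && defined(__AVX512VL__)
#  include <immintrin.h>
#  define SKETCH_XOR3_TERNLOG 1
#endif

/* XOR three 16-byte blocks into out. */
static void xor3_16(uint8_t out[16], const uint8_t a[16],
                    const uint8_t b[16], const uint8_t c[16]) {
#ifdef SKETCH_XOR3_TERNLOG
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_loadu_si128((const __m128i *)c);
    /* imm8 0x96 is the truth table of a ^ b ^ c: one instruction, three inputs. */
    _mm_storeu_si128((__m128i *)out, _mm_ternarylogic_epi32(va, vb, vc, 0x96));
#else
    /* Generic path: two dependent XORs per byte. */
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(a[i] ^ b[i] ^ c[i]);
#endif
}
```

Because the choice is made by the preprocessor, a generic build never sees the AVX512 path, while a build targeting a level that includes AVX512VL picks it up automatically.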

Adam Stylinski [Fri, 21 Nov 2025 14:45:48 +0000 (09:45 -0500)] 
Use aligned loads in the chorba portions of the clmul crc routines

Since we go to the trouble of aligning the loads, we may as well let the
compiler know the alignment is guaranteed. We can't guarantee an aligned
store, but at least with an aligned load the compiler can elide a load
with a subsequent xor multiplication when not copying.

Mika Lindqvist [Mon, 17 Nov 2025 17:15:03 +0000 (19:15 +0200)] 
Fix build using configure
* "\i" is not valid escape code in BSD sed
* Some x86 shared sources were missing -fPIC due to using wrong variable in build rule

Fixes #2015.

Mika Lindqvist [Mon, 17 Nov 2025 08:21:36 +0000 (10:21 +0200)] 
Update Google Benchmark to v1.9.4
* Require CMake 3.13

Brad Smith [Mon, 17 Nov 2025 05:50:47 +0000 (00:50 -0500)] 
configure: Determine system architecture properly on *BSD systems

uname -m on a BSD system will provide the architecture port, e.g.
arm64, macppc, octeon, instead of the machine architecture, e.g.
aarch64, powerpc, mips64. uname -p will provide the machine
architecture. NetBSD uses x86_64, OpenBSD uses amd64, FreeBSD
is a mix between uname -p and the compiler output.

Mika Lindqvist [Mon, 17 Nov 2025 10:28:21 +0000 (12:28 +0200)] 
[CI] Downgrade "Windows GCC Native Instructions (AVX)" workflow
* Windows Server 2025 runner has broken GCC, so use Windows Server 2022 runner instead until fix is propagated to all runners

Hans Kristian Rosbach [Sun, 16 Nov 2025 18:41:18 +0000 (19:41 +0100)] 
2.3.0 RC2

Hans Kristian Rosbach [Fri, 14 Nov 2025 14:33:32 +0000 (15:33 +0100)] 
Add benchmark for crc32 fold copy implementations
Uses local functions for benchmarking some of the run-time selected variants.

Mika Lindqvist [Sun, 16 Nov 2025 12:49:44 +0000 (14:49 +0200)] 
Disable benchmark for slide_hash_c with Visual C++ too.

Hans Kristian Rosbach [Thu, 13 Nov 2025 21:54:25 +0000 (22:54 +0100)] 
Add tests for crc32_fold_copy functions

Hans Kristian Rosbach [Tue, 11 Nov 2025 16:24:26 +0000 (17:24 +0100)] 
Use CTest to simplify testing options
Add CMake variable TEST_STOCK_ZLIB to disable some tests if attempting
to run our testsuite on stock zlib.
PR depends on CMP0077, introduced by CMake 3.13.
Upped minimum compatible CMake version to 3.13, same as we have
actually been telling people was the minimum for years on the wiki.
Upped upper compatible CMake version to 3.31, my current version.

Brad Smith [Fri, 14 Nov 2025 01:02:25 +0000 (20:02 -0500)] 
Use elf_aux_info() on OpenBSD PowerPC

Hans Kristian Rosbach [Tue, 11 Nov 2025 21:47:52 +0000 (22:47 +0100)] 
- Unify crc32_chorba, chorba_sse2 and chorba_sse41 dispatch functions.
- Fixed alignment diff calculation in crc32_chorba.
- Fixed length check to happen early, avoiding extra branches for too short lengths,
this also allows removing one function call to crc32_braid_internal to handle those.
Gbench shows ~0.15-0.25ns saved per call for lengths shorter than CHORBA_SMALL_THRESHOLD.
- Avoid calculating aligned len if buffer is already aligned

Hans Kristian Rosbach [Tue, 11 Nov 2025 19:23:24 +0000 (20:23 +0100)] 
Reorganize Chorba activation.
Now WITHOUT_CHORBA will only disable the crc32_chorba C fallback.

SSE2, SSE41 and pclmul variants will still be able to use their Chorba-algorithm based code,
but their fallback to the generic crc32_chorba C code in SSE2 and SSE41 will be disabled,
reducing their performance on really big input buffers (not used during deflate/inflate,
only when calling crc32 directly).

Remove the crc32_c function (and its file crc32_c.c), instead use the normal functable
routing to select between crc32_braid and crc32_chorba.

Disable sse2 and sse4.1 variants of Chorba-crc32 on MSVC older than 2022 due to code
generation bug in 2019 causing segfaults.

Compile either crc32_chorba_small_nondestructive or crc32_chorba_small_nondestructive_32bit,
not both. Don't compile crc32_chorba_32768_nondestructive on 32bit arch.

Icenowy Zheng [Tue, 11 Nov 2025 14:47:55 +0000 (22:47 +0800)] 
riscv: features: test HWCAP regardless of kernel versions

The HWCAP facility has been there since day 1 of Linux RISC-V support
(dating back to 4.15); only the V bit definition was added in 6.5 (because
proper vector support was added in that version too).

There should be no need to test kernel version number before accessing
hwcap, only the V bit will never be present on kernel older than 6.5
(except dirty patched downstream ones).

For Xtheadvector systems that bogusly announce V bit in HWCAP, the
assembly code should be able to factor them out. This is tested on
a Sophgo SG2042 machine with 6.1 kernel.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
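
A minimal sketch of testing the V bit without any kernel version check, assuming the Linux convention that single-letter RISC-V extensions map to bit `letter - 'A'` of the HWCAP word. The helper names are invented; `getauxval` and `AT_HWCAP` are the standard Linux interfaces, and the sketch deliberately omits the extra Xtheadvector filtering mentioned above.

```c
#include <stdint.h>

#if defined(__linux__) && defined(__riscv)
#  include <sys/auxv.h> /* getauxval, AT_HWCAP */
#endif

/* Single-letter ISA extensions occupy bit (letter - 'A') in HWCAP. */
#define SKETCH_HWCAP_ISA(letter) (1UL << ((letter) - 'A'))

/* Pure helper: does this HWCAP word advertise the V extension? */
static int hwcap_has_vector(unsigned long hwcap) {
    return (hwcap & SKETCH_HWCAP_ISA('V')) != 0;
}

/* Query the running system; reports 0 on anything but Linux/RISC-V. */
static int riscv_vector_reported(void) {
#if defined(__linux__) && defined(__riscv)
    return hwcap_has_vector(getauxval(AT_HWCAP));
#else
    return 0;
#endif
}
```

On a kernel older than 6.5 the bit is simply never set, so the query returns 0 without needing to parse the kernel version first.
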
Hans Kristian Rosbach [Tue, 11 Nov 2025 16:17:35 +0000 (17:17 +0100)] 
Update README.md, add a lot of missing info, and reorder some of it.
Add missing parameter to configure help text.
Update descriptions and reorganize some options in CMake

Hans Kristian Rosbach [Fri, 31 Oct 2025 22:38:52 +0000 (23:38 +0100)] 
2.3.0 RC1

Mika Lindqvist [Sun, 2 Nov 2025 16:57:16 +0000 (18:57 +0200)] 
Initial support for nVidia toolchain
* Supports native and non-native builds for x86_64 using CMake

dependabot[bot] [Sat, 1 Nov 2025 07:04:15 +0000 (07:04 +0000)] 
Bump github/codeql-action from 3 to 4

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3 to 4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] [Sat, 1 Nov 2025 07:04:10 +0000 (07:04 +0000)] 
Bump actions/upload-artifact from 4 to 5

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] [Sat, 1 Nov 2025 07:04:03 +0000 (07:04 +0000)] 
Bump actions/download-artifact from 5 to 6

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 5 to 6.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Benjamin Buch [Thu, 23 Oct 2025 17:17:29 +0000 (19:17 +0200)] 
rename cmake config target files to avoid illegal overwrite of PACKAGE_VERSION

Cameron Cawley [Tue, 28 Oct 2025 22:34:56 +0000 (22:34 +0000)] 
Rename CMake targets to avoid clashes when used as a subproject (#1970)

Mika Lindqvist [Thu, 9 Oct 2025 08:40:16 +0000 (11:40 +0300)] 
Fix type mismatch on platforms where int32_t and uint32_t use long instead of int
* Based on PR #1934

Hans Kristian Rosbach [Fri, 10 Oct 2025 11:33:53 +0000 (13:33 +0200)] 
Improve resilience of the functable initialization; during functable init,
make sure none of the function pointers are nullpointers.

Up until now, zlib-ng and the application would have segfaulted either at the start
of processing, or at some point later depending on when a nullpointer call would happen
in the processing. In any case most likely after accepting data from the application.

Now, the deflateinit/inflateinit functions will error with Z_VERSION_ERROR, and
gzopen will return Z_STREAM_ERROR before actually processing any data.

Direct calls to functions like adler32 or crc32 will however print an error message
and call abort(), as these functions have no actual way of reporting errors.

Note: This should never happen with default builds of zlib-ng, only if it is run on
a cpu that is missing both the matching optimized and the generic fallback functions.
This can currently only happen if zlib-ng is compiled using custom cflags or by
editing the code.
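
The kind of check described above could be sketched like this. The two-entry table, the type names and the return convention are all simplifications invented for the example; the real functable has many more entries and its own error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum signature shared by the table entries. */
typedef uint32_t (*checksum_fn)(uint32_t, const uint8_t *, size_t);

/* Reduced stand-in for a dispatch table of function pointers. */
struct functable_sketch {
    checksum_fn adler32;
    checksum_fn crc32;
};

/* Returns 0 if every entry was filled in, -1 if any is still NULL. */
static int functable_verify(const struct functable_sketch *ft) {
    if (ft->adler32 == NULL || ft->crc32 == NULL)
        return -1;
    return 0;
}

/* Trivial placeholder implementation used to populate a table. */
static uint32_t checksum_stub(uint32_t seed, const uint8_t *buf, size_t len) {
    (void)buf;
    (void)len;
    return seed;
}
```

An init routine would run a check like this once after selecting implementations, so a missing entry is reported before any data is accepted rather than as a crash later on.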

Hans Kristian Rosbach [Fri, 10 Oct 2025 12:52:21 +0000 (14:52 +0200)] 
Don't build C-fallback functions that never get used on x86_64

Hans Kristian Rosbach [Fri, 10 Oct 2025 11:26:12 +0000 (13:26 +0200)] 
Remove force-sse2 config option from x86 builds.
Due to major refactoring done long ago, this option no longer avoids a branch
in a hot path, it currently only removes a single if check during init.

Hans Kristian Rosbach [Fri, 10 Oct 2025 11:15:38 +0000 (13:15 +0200)] 
Update s390x actions runner.
- Update to EL10
- Update URL to s390x runner patch

coderabbitai[bot] [Mon, 6 Oct 2025 18:36:46 +0000 (18:36 +0000)] 
📝 Add docstrings to `cleanup3`

Docstrings generation was requested by @mtl1979.

* https://github.com/zlib-ng/zlib-ng/pull/1978#issuecomment-3373304629

The following files were modified:

* `test/benchmarks/benchmark_slidehash.cc`

dependabot[bot] [Wed, 8 Oct 2025 14:06:54 +0000 (14:06 +0000)] 
Bump actions/checkout from 4 to 5

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Mika Lindqvist [Mon, 6 Oct 2025 18:22:40 +0000 (21:22 +0300)] 
Fix cast and truncation warnings.

Jeff Handley [Fri, 3 Oct 2025 16:36:43 +0000 (12:36 -0400)] 
Update terms in txtvsbin.txt

Jeff Handley [Thu, 2 Oct 2025 23:22:46 +0000 (16:22 -0700)] 
Use 'block-list' and 'allow-list' terms

Hans Kristian Rosbach [Thu, 2 Oct 2025 12:16:53 +0000 (14:16 +0200)] 
Increase minimum supported CMake version from 3.5.1 to 3.12

Cameron Cawley [Thu, 2 Oct 2025 16:14:09 +0000 (17:14 +0100)] 
Inline the CHUNKSIZE function

Cameron Cawley [Sat, 27 Sep 2025 12:26:12 +0000 (13:26 +0100)] 
Update macOS CI images

Cameron Cawley [Thu, 25 Sep 2025 15:30:53 +0000 (16:30 +0100)] 
Synchronise ARMv8 and Loongarch CRC32 implementations

Cameron Cawley [Thu, 25 Sep 2025 14:11:14 +0000 (15:11 +0100)] 
Fix -Wstrict-prototypes warnings