git.ipfire.org Git - thirdparty/zlib-ng.git/log
3 days ago Replace small/large buffer tests with parameterized test_chunked [develop]
Nathan Moinvaziri [Tue, 14 Apr 2026 03:20:26 +0000 (20:20 -0700)] 
Replace small/large buffer tests with parameterized test_chunked

test_large_buffers reset d_stream.next_out on every inflate iteration, so the
decompressed output was never compared against the source. test_chunked keeps
the input, compressed, and decompressed buffers separate and checks them with
memcmp.

New avail_out values (3, 64, 128, 256, 259) exercise inflate_fast()'s safe-mode
MATCH-state bailout around the 258-byte maximum match length.
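
The kind of check described above can be sketched with Python's standard zlib module (the stock zlib bindings, not zlib-ng's own test harness). The helper name inflate_chunked is hypothetical; the avail_out values are the ones listed in this commit. The sketch caps each call's output at avail_out bytes, accumulates the pieces in a separate buffer that is never reset, and compares the result against the source:

```python
import zlib


def inflate_chunked(compressed: bytes, avail_out: int) -> bytes:
    """Inflate with at most avail_out bytes of output per call."""
    d = zlib.decompressobj()
    out = bytearray()      # separate destination buffer, never reset
    pending = compressed
    while not d.eof:
        chunk = d.decompress(pending, avail_out)
        out += chunk
        pending = d.unconsumed_tail
        if not chunk and not pending and not d.eof:
            raise ValueError("truncated stream")
    return bytes(out)


# Literals plus a long run, so long matches are likely to occur.
source = bytes(range(256)) * 64 + b"a" * 1000
compressed = zlib.compress(source, 6)

for avail_out in (3, 64, 128, 256, 259):
    # Equivalent of the memcmp between input and decompressed buffers.
    assert inflate_chunked(compressed, avail_out) == source
```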

4 days ago Bump Google Benchmark to v1.9.5
Mika T. Lindqvist [Thu, 23 Apr 2026 11:46:39 +0000 (14:46 +0300)] 
Bump Google Benchmark to v1.9.5
* Google Benchmark v1.9.4 fails to compile with recent versions of clang and Visual C++ if warnings are treated as errors

4 days ago Add compressed and ratio fields to deflate/corpora benchmarks
Nathan Moin Vaziri [Mon, 13 Apr 2026 05:27:42 +0000 (22:27 -0700)] 
Add compressed and ratio fields to deflate/corpora benchmarks

4 days ago Add corpora benchmarks for deflate and inflate
Nathan Moin Vaziri [Wed, 8 Apr 2026 01:17:52 +0000 (18:17 -0700)] 
Add corpora benchmarks for deflate and inflate

Adds benchmark_corpora.cc which dynamically discovers and benchmarks
all files from the zlib-ng/corpora repository (silesia, calgary,
canterbury, large, snappy, etc.).

Benchmarks are registered at startup using RegisterBenchmark. If the
corpora directory is not present, no benchmarks are registered.
Deflate is tested at levels 1, 6, and 9 per file. Inflate is tested
once per file using data pre-compressed at level 9.
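
The registration rule above can be sketched as a small planning function. This is only an illustration in Python, not the benchmark_corpora.cc code: the function name and the tuple layout are hypothetical, and the real benchmarks are registered through Google Benchmark's RegisterBenchmark.

```python
import os

DEFLATE_LEVELS = (1, 6, 9)   # levels named in the commit message
INFLATE_SOURCE_LEVEL = 9     # inflate input is pre-compressed at level 9


def plan_corpora_benchmarks(corpora_dir):
    """Return (kind, path, level) tuples for every file under corpora_dir."""
    plans = []
    if not os.path.isdir(corpora_dir):
        return plans         # no corpora checkout: nothing is registered
    for root, _dirs, files in os.walk(corpora_dir):
        for name in files:
            path = os.path.join(root, name)
            for level in DEFLATE_LEVELS:
                plans.append(("deflate", path, level))
            plans.append(("inflate", path, INFLATE_SOURCE_LEVEL))
    return sorted(plans)
```

Each discovered file yields three deflate entries and one inflate entry; a missing directory yields an empty plan.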

6 days ago Add --benchmark_cooldown flag to mitigate thermal throttling
Nathan Moin Vaziri [Wed, 8 Apr 2026 01:10:20 +0000 (18:10 -0700)] 
Add --benchmark_cooldown flag to mitigate thermal throttling

Adds a --benchmark_cooldown=<seconds> flag that inserts a sleep between
benchmark families. This helps produce consistent results on systems
where sustained workloads cause thermal throttling and CPU frequency
scaling.

Uses a wrapping BenchmarkReporter that sleeps before forwarding results
to the default display reporter.

6 days ago Add /delta workflow for per-PR binary size comparison
Nathan Moin Vaziri [Tue, 14 Apr 2026 18:01:39 +0000 (11:01 -0700)] 
Add /delta workflow for per-PR binary size comparison

On a /delta PR comment the job builds the PR head and base with
RelWithDebInfo, splits the DWARF into sibling .debug companions, and
runs several tools against both stripped libraries:

- binutils size for text/data/bss totals plus a Δ row
- bloaty for sections, top 30 compile units, and top 30 symbols
- nm --defined-only --dynamic to diff the exported symbol set
- abidiff for C ABI changes (honouring test/abi/ignore)
- minigzip at levels 1-9 over silesia-small.tar and, on native
  builds, the full silesia.tar

Results come back as a "## Delta Report" PR comment with a details
block per section, reporting both head and base SHAs so offset runs
are unambiguous.

Comment syntax is /delta [arch] [-N]. Arch defaults to x86_64 and
accepts aarch64, powerpc64le, riscv64, and s390x. -N selects the Nth
commit back from the PR head so a regression can be bisected without
force-pushing. Cross-compile builds reuse cmake/toolchain-*.cmake
and run the stripped binaries under qemu-user.
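
The comment syntax can be illustrated with a small parser. This is a hypothetical sketch, not the workflow's actual implementation; in particular, treating a missing -N as 0 (the PR head itself) and rejecting unknown tokens are assumptions.

```python
import re

ARCHES = ("x86_64", "aarch64", "powerpc64le", "riscv64", "s390x")


def parse_delta(comment):
    """Parse '/delta [arch] [-N]' into (arch, commits_back), or None."""
    parts = comment.split()
    if not parts or parts[0] != "/delta":
        return None
    arch, back = "x86_64", 0          # arch defaults to x86_64
    for tok in parts[1:]:
        if tok in ARCHES:
            arch = tok
        elif re.fullmatch(r"-\d+", tok):
            back = int(tok[1:])       # Nth commit back from the PR head
        else:
            return None               # assumption: unknown tokens are rejected
    return arch, back
```

For example, parse_delta("/delta aarch64 -2") returns ("aarch64", 2), and a bare "/delta" returns ("x86_64", 0).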

6 days ago Use fallback defines for Chorba Scalar/SSE
Nathan Moinvaziri [Wed, 18 Feb 2026 08:29:00 +0000 (00:29 -0800)] 
Use fallback defines for Chorba Scalar/SSE

Gate Scalar and SSE chorba uniformly on CRC32_CHORBA_FALLBACK and
CRC32_CHORBA_SSE_FALLBACK across prototypes, dispatch, sources, tests
and benchmarks instead of spot-checking WITHOUT_CHORBA /
WITHOUT_CHORBA_SSE directly at each site.

Also move crc32_chorba_c.c into ZLIB_GENERIC_SRCS and align Makefile.in
to match so the CMake and autotools builds stay bit-identical.

6 days ago Remove inert comment about disabling Chorba SSE in X86 functions header
Nathan Moin Vaziri [Sat, 18 Apr 2026 21:16:40 +0000 (14:16 -0700)] 
Remove inert comment about disabling Chorba SSE in X86 functions header

This was never correct; it should have been WITHOUT_CHORBA_SSE, not NO_CHORBA_SSE.

6 days ago Fix typo in No Chorba CMake option name in CI
Nathan Moin Vaziri [Sat, 18 Apr 2026 20:49:12 +0000 (13:49 -0700)] 
Fix typo in No Chorba CMake option name in CI

The 'Ubuntu GCC No Chorba' matrix entry was passing -DWITH_CHORBA=OFF
since its introduction in 9d4af458, but the actual CMake option is
named WITH_CRC32_CHORBA.

6 days ago Remove CMake warning about MSVC Chorba bug
Nathan Moin Vaziri [Sat, 18 Apr 2026 20:52:57 +0000 (13:52 -0700)] 
Remove CMake warning about MSVC Chorba bug

The Chorba bug on SSE2/SSE41 has been fixed so this no longer applies.

6 days ago Merge duplicate 32-bit _mm_cvtsi64_si128 polyfills
Nathan Moin Vaziri [Fri, 17 Apr 2026 20:23:52 +0000 (13:23 -0700)] 
Merge duplicate 32-bit _mm_cvtsi64_si128 polyfills

The MSVC and GCC 32-bit polyfills for _mm_cvtsi64_si128 /
_mm_cvtsi128_si64 had identical bodies. Merge them into a single
block guarded by !__clang__ && ARCH_32BIT, with the MSVC-only
#include <intrin.h> nested inside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6 days ago Fix MSVC v142 miscompile of _mm_cvtsi64_si128 polyfill on 32-bit
Nathan Moin Vaziri [Fri, 17 Apr 2026 20:22:49 +0000 (13:22 -0700)] 
Fix MSVC v142 miscompile of _mm_cvtsi64_si128 polyfill on 32-bit

MSVC v142 (Visual Studio 2019, and VS 2022 pre-17.11) miscompiles
_mm_set_epi64x(0, a) on 32-bit Windows by routing part of the synthesis
through a GPR, clobbering live register data and causing stack corruption
in the chorba SSE2/SSE4.1 CRC32 code paths.

Replace the _mm_set_epi64x(0, a) polyfill with _mm_loadl_epi64 which
compiles to a single MOVQ xmm,m64 that bypasses the buggy synthesis
path. Also convert the GCC 32-bit _mm_cvtsi64_si128 macro to a static
inline for consistency, and drop the redundant ARCH_X86 guard since
x86_intrins.h is only reachable from x86 code.

https://developercommunity.visualstudio.com/t/10853479

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

11 days ago Fix UBSAN implicit conversion warning in test/fuzz/fuzzer_example_flush.c.
Hans Kristian Rosbach [Wed, 15 Apr 2026 13:31:48 +0000 (15:31 +0200)] 
Fix UBSAN implicit conversion warning in test/fuzz/fuzzer_example_flush.c.

11 days ago Fix UBSAN implicit conversion warning in test/test_deflate_concurrency.cc.
Hans Kristian Rosbach [Wed, 15 Apr 2026 12:58:44 +0000 (14:58 +0200)] 
Fix UBSAN implicit conversion warning in test/test_deflate_concurrency.cc.

11 days ago Fix UBSAN implicit conversion warning in test/test_shared_ng.h.
Hans Kristian Rosbach [Wed, 15 Apr 2026 12:27:27 +0000 (14:27 +0200)] 
Fix UBSAN implicit conversion warning in test/test_shared_ng.h.

11 days ago Fix UBSAN implicit conversion warning in arch/s390/crc32_vx.c.
Hans Kristian Rosbach [Wed, 15 Apr 2026 12:13:20 +0000 (14:13 +0200)] 
Fix UBSAN implicit conversion warning in arch/s390/crc32_vx.c.

11 days ago Fix UBSAN implicit conversion warning in inftrees.c.
Hans Kristian Rosbach [Wed, 15 Apr 2026 11:39:54 +0000 (13:39 +0200)] 
Fix UBSAN implicit conversion warning in inftrees.c.

Co-authored-by: Nathan Moin Vaziri <nathan@nathanm.com>

11 days ago CMake: Add 'implicit-conversion' and 'nullability' to sanitizers
Hans Kristian Rosbach [Mon, 2 Mar 2026 21:12:58 +0000 (22:12 +0100)] 
CMake: Add 'implicit-conversion' and 'nullability' to the sanitizers
we attempt to enable with UBSan builds.

11 days ago Rename longest_match_slow to longest_match_roll
Nathan Moin Vaziri [Mon, 13 Apr 2026 22:26:24 +0000 (15:26 -0700)] 
Rename longest_match_slow to longest_match_roll

The "slow" variant of longest_match uses a 3-byte rolling hash to seed
its offset-search lookups after a match has been found. Rename the
template gate, the functable entry, and all arch-specific instantiations
from *_slow to *_roll to reflect what the variant actually uses, so a
separate integer-hash offset-search variant can coexist under its own
name.

Pure rename, no behavior change.

12 days ago Add small output buffer inflate benchmark #2062
Nathan Moinvaziri [Tue, 14 Apr 2026 02:00:09 +0000 (19:00 -0700)] 
Add small output buffer inflate benchmark #2062

Benchmarks the inflate fast path with constrained output
buffers ranging from 64 to 16384 bytes per call, reproducing
the libpng decompression pattern described in the "running
off a cliff" analysis.

https://nigeltao.github.io/blog/2021/fastest-safest-png-decoder.html#running-off-a-cliff

12 days ago Remove macro and inline inflate benchmark definition directly
Nathan Moinvaziri [Tue, 10 Mar 2026 18:40:39 +0000 (11:40 -0700)] 
Remove macro and inline inflate benchmark definition directly

13 days ago Fix VPCLMULQDQ CRC32 build with partial AVX-512 baselines
Nathan Moin Vaziri [Fri, 10 Apr 2026 20:29:05 +0000 (13:29 -0700)] 
Fix VPCLMULQDQ CRC32 build with partial AVX-512 baselines

The 512-bit path in crc32_pclmulqdq_tpl.h assumed AVX-512F was
enough, but some of the intrinsics it used actually require
AVX-512DQ. Pick the correct variants based on the available
features.

13 days ago Add fallback defines to skip generic C code when native intrinsics exist
Nathan Moinvaziri [Tue, 10 Mar 2026 06:41:45 +0000 (23:41 -0700)] 
Add fallback defines to skip generic C code when native intrinsics exist

Each arch header now sets *_FALLBACK defines (ADLER32_FALLBACK,
CHUNKSET_FALLBACK, COMPARE256_FALLBACK, CRC32_BRAID_FALLBACK,
SLIDE_HASH_FALLBACK) when no native SIMD implementation exists.
Generic C source files, declarations, functable entries, tests,
and benchmarks are guarded by these defines.

13 days ago Use __attribute__((constructor)) to initialize the functable
Vladislav Shchapov [Sun, 22 Mar 2026 14:43:46 +0000 (19:43 +0500)] 
Use __attribute__((constructor)) to initialize the functable

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

13 days ago [CI] Switch CMake workflow to use MSYS2 for MinGW32.
Mika Lindqvist [Mon, 13 Apr 2026 17:34:38 +0000 (20:34 +0300)] 
[CI] Switch CMake workflow to use MSYS2 for MinGW32.

2 weeks ago Add ACCUM_ROUND macro to crc32_chorba_sse2
Nathan Moin Vaziri [Fri, 3 Apr 2026 00:16:27 +0000 (17:16 -0700)] 
Add ACCUM_ROUND macro to crc32_chorba_sse2

2 weeks ago Add ACCUM_ROUND macro to crc32_chorba_c
Nathan Moin Vaziri [Thu, 2 Apr 2026 04:38:16 +0000 (21:38 -0700)] 
Add ACCUM_ROUND macro to crc32_chorba_c

2 weeks ago Add NEXT_ROUND macro in crc32_chorba_c
Nathan Moin Vaziri [Thu, 2 Apr 2026 04:33:23 +0000 (21:33 -0700)] 
Add NEXT_ROUND macro in crc32_chorba_c

2 weeks ago Fix formatting in crc32_chorba_c
Nathan Moin Vaziri [Thu, 2 Apr 2026 04:35:31 +0000 (21:35 -0700)] 
Fix formatting in crc32_chorba_c

2 weeks ago Fix formatting in crc32_chorba_sse41
Nathan Moin Vaziri [Thu, 2 Apr 2026 04:11:21 +0000 (21:11 -0700)] 
Fix formatting in crc32_chorba_sse41

2 weeks ago Fix formatting for crc32_chorba_sse2.
Nathan Moin Vaziri [Thu, 2 Apr 2026 04:04:41 +0000 (21:04 -0700)] 
Fix formatting for crc32_chorba_sse2.

This should fix a bunch of CodeQL warnings about commented code.

2 weeks ago crc32: use may_alias for chorba buffers
cl2t [Sat, 14 Mar 2026 06:17:20 +0000 (14:17 +0800)] 
crc32: use may_alias for chorba buffers

2 weeks ago crc32: zero initialize chorba bitbuffer
cl2t [Sat, 14 Mar 2026 03:36:16 +0000 (11:36 +0800)] 
crc32: zero initialize chorba bitbuffer

2 weeks ago Extract fold_block_chorba function for PCLMULQDQ path
Nathan Moinvaziri [Thu, 12 Mar 2026 19:58:31 +0000 (12:58 -0700)] 
Extract fold_block_chorba function for PCLMULQDQ path

2 weeks ago Extract fold_block_16/8 functions for VPCLMULQDQ paths
Nathan Moinvaziri [Thu, 12 Mar 2026 19:36:27 +0000 (12:36 -0700)] 
Extract fold_block_16/8 functions for VPCLMULQDQ paths

2 weeks ago [CI] Add configure MinGW32/MinGW64 workflows.
Mika Lindqvist [Sat, 11 Apr 2026 20:41:49 +0000 (23:41 +0300)] 
[CI] Add configure MinGW32/MinGW64 workflows.

2 weeks ago Move chunk_{128,256}bit_perm_idx_lut.h, chunk_permute_table.h to arch/shared.
Vladislav Shchapov [Mon, 30 Mar 2026 08:03:59 +0000 (13:03 +0500)] 
Move chunk_{128,256}bit_perm_idx_lut.h, chunk_permute_table.h to arch/shared.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks ago Fix loongarch64 name in crc32_copy benchmark.
Vladislav Shchapov [Mon, 30 Mar 2026 07:57:32 +0000 (12:57 +0500)] 
Fix loongarch64 name in crc32_copy benchmark.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks ago Reuse unrolled ARMv8 CRC32 implementation for LoongArch64.
Vladislav Shchapov [Fri, 27 Mar 2026 21:39:33 +0000 (02:39 +0500)] 
Reuse unrolled ARMv8 CRC32 implementation for LoongArch64.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks ago Convert crc32_armv8_align, crc32_armv8_tail and crc32_copy_impl functions to template.
Vladislav Shchapov [Fri, 27 Mar 2026 21:19:39 +0000 (02:19 +0500)] 
Convert crc32_armv8_align, crc32_armv8_tail and crc32_copy_impl functions to template.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks ago [CI] Build MSVC 2026 C23 with benchmarks.
Mika Lindqvist [Sat, 11 Apr 2026 19:00:54 +0000 (22:00 +0300)] 
[CI] Build MSVC 2026 C23 with benchmarks.

2 weeks ago Add back zng_clz for big-endian and use macro for compare256
Nathan Moinvaziri [Tue, 17 Mar 2026 06:35:05 +0000 (23:35 -0700)] 
Add back zng_clz for big-endian and use macro for compare256

2 weeks ago Move S390 VX vector typedefs into vx_intrins.h
Nathan Moein Vaziri [Thu, 2 Apr 2026 02:23:27 +0000 (19:23 -0700)] 
Move S390 VX vector typedefs into vx_intrins.h

The `vector` keyword requires -fzvector which is not available on all
GCC versions (e.g. EL10). Use __attribute__((vector_size(16))) typedefs
instead, matching the existing style in crc32_vx.c.

2 weeks ago Add compatibility header for VX instructions
Nathan Moein Vaziri [Thu, 2 Apr 2026 01:08:26 +0000 (18:08 -0700)] 
Add compatibility header for VX instructions

GCC/Clang don't support vec_sub or vec_subs, but the IBM compiler does.

2 weeks ago Add slide hash optimization for S390 VX
Nathan Moinvaziri [Tue, 17 Mar 2026 07:20:57 +0000 (00:20 -0700)] 
Add slide hash optimization for S390 VX

We can clean up the build system and combine checks for VGFMA since
they are part of the base VX instruction set.

2 weeks ago Fix building GH1235 test for 32-bit MinGW
Mika T. Lindqvist [Tue, 7 Apr 2026 17:57:08 +0000 (20:57 +0300)] 
Fix building GH1235 test for 32-bit MinGW

```
C:/build/git/zlib-ng/test/gh1235.c: In function 'main':
C:/build/git/zlib-ng/test/gh1235.c:34:43: error: passing argument 2 of 'compress2' from incompatible pointer type [-Wincompatible-pointer-types]
   34 |         if (PREFIX(compress2)(compressed, &bytes, plain, i, 1) != Z_OK) return -1;
      |                                           ^~~~~~
      |                                           |
      |                                           z_size_t * {aka unsigned int *}
In file included from C:/build/git/zlib-ng/zutil.h:15,
                 from C:/build/git/zlib-ng/test/gh1235.c:4:
../zlib.h:1261:69: note: expected 'long unsigned int *' but argument is of type 'z_size_t *' {aka 'unsigned int *'}
 1261 | Z_EXTERN int Z_EXPORT compress2(unsigned char *dest, unsigned long *destLen, const unsigned char *source,
      |                                                      ~~~~~~~~~~~~~~~^~~~~~~
```

4 weeks ago Update s390x actions runner docker build scripts
Hans Kristian Rosbach [Mon, 23 Mar 2026 18:35:05 +0000 (19:35 +0100)] 
Update s390x actions runner docker build scripts

5 weeks ago Add an altivec variant of "count_lengths" in inftrees
Adam Stylinski [Sat, 7 Mar 2026 17:43:02 +0000 (12:43 -0500)] 
Add an altivec variant of "count_lengths" in inftrees

This accounts for a small bump in performance

5 weeks ago Update e2k cross compiler to version lcc-1.29.16
Vladislav Shchapov [Fri, 13 Mar 2026 15:55:47 +0000 (20:55 +0500)] 
Update e2k cross compiler to version lcc-1.29.16

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

6 weeks ago Remove check that is always true (even if WANT_MIN_MATCH was reduced from 4 to 3).
Hans Kristian Rosbach [Thu, 12 Mar 2026 19:31:30 +0000 (20:31 +0100)] 
Remove check that is always true (even if WANT_MIN_MATCH was reduced from 4 to 3).

6 weeks ago Avoid calling fizzle_matches unless checks pass
Hans Kristian Rosbach [Thu, 12 Mar 2026 16:00:39 +0000 (17:00 +0100)] 
Avoid calling fizzle_matches unless checks pass

6 weeks ago - Add local variables match_len and strstart in insert_match, to avoid extra lookups...
Hans Kristian Rosbach [Thu, 12 Mar 2026 15:35:50 +0000 (16:35 +0100)] 
- Add local variables match_len and strstart in insert_match, to avoid extra lookups from struct.
- Move the check for enough lookahead outside of the function, so the call is
  skipped entirely instead of calling and immediately returning.

6 weeks ago - Add local variable match_len in emit_match to avoid extra lookups from struct.
Hans Kristian Rosbach [Thu, 12 Mar 2026 14:32:09 +0000 (15:32 +0100)] 
- Add local variable match_len in emit_match to avoid extra lookups from struct.
- Move s->lookahead decrement to top of function; both branches of the function
  do it, and neither depends on when it happens.

6 weeks ago Add copy fallback for Adler32 ARM when building with no-unaligned-access
Nathan Moinvaziri [Thu, 12 Mar 2026 03:03:11 +0000 (20:03 -0700)] 
Add copy fallback for Adler32 ARM when building with no-unaligned-access

6 weeks ago Unroll 64-byte CRC32+copy loop for ARMv8
Nathan Moinvaziri [Fri, 13 Mar 2026 05:03:04 +0000 (22:03 -0700)] 
Unroll 64-byte CRC32+copy loop for ARMv8

Process 64 bytes per iteration using 8x uint64_t loads
with interleaved memcpy stores and __crc32d calls.
RPi5 benchmarks show 30-51% improvement over the
separate crc32 + memcpy baseline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

6 weeks ago Add fallback for ARM CRC32 copy when compiling with no-unaligned-access
Nathan Moinvaziri [Thu, 12 Mar 2026 02:49:53 +0000 (19:49 -0700)] 
Add fallback for ARM CRC32 copy when compiling with no-unaligned-access

6 weeks ago Replace memcpy with NEON intrinsics for better performance alignment
Nathan Moinvaziri [Fri, 6 Mar 2026 23:48:35 +0000 (15:48 -0800)] 
Replace memcpy with NEON intrinsics for better performance alignment

6 weeks ago Implement interleaved copying for CRC32 ARMv8 PMULL+EOR3.
Nathan Moinvaziri [Tue, 24 Feb 2026 17:14:53 +0000 (09:14 -0800)] 
Implement interleaved copying for CRC32 ARMv8 PMULL+EOR3.

6 weeks ago Implement interleaved copying for CRC32 ARMv8.
Nathan Moinvaziri [Fri, 6 Mar 2026 03:01:45 +0000 (19:01 -0800)] 
Implement interleaved copying for CRC32 ARMv8.

6 weeks ago Add shared align/tail helpers for CRC32 ARMv8.
Nathan Moinvaziri [Fri, 6 Mar 2026 03:00:54 +0000 (19:00 -0800)] 
Add shared align/tail helpers for CRC32 ARMv8.

6 weeks ago Use OSB workflow as an initial test before queueing all the other tests,
Hans Kristian Rosbach [Thu, 12 Mar 2026 19:57:21 +0000 (20:57 +0100)] 
Use OSB workflow as an initial test before queueing all the other tests.
This makes sure we don't spend a lot of CI time testing something that
won't even build.

6 weeks ago Separate match finding logic in deflate_medium
Nathan Moinvaziri [Tue, 10 Mar 2026 02:39:52 +0000 (19:39 -0700)] 
Separate match finding logic in deflate_medium

6 weeks ago [CodeQL] Add Windows.
Mika Lindqvist [Wed, 11 Mar 2026 18:28:51 +0000 (20:28 +0200)] 
[CodeQL] Add Windows.

6 weeks ago Remove ASAN from s390x qemu build, it fails for unknown reasons.
Hans Kristian Rosbach [Wed, 11 Mar 2026 15:28:09 +0000 (16:28 +0100)] 
Remove ASAN from s390x qemu build, it fails for unknown reasons.

6 weeks ago Guard against ls-remote failing
pmqs [Tue, 10 Mar 2026 12:24:32 +0000 (12:24 +0000)] 
Guard against ls-remote failing

6 weeks ago Cache LLVM C++ libraries for MSAN
Paul Marquess [Sun, 15 Feb 2026 16:18:59 +0000 (16:18 +0000)] 
Cache LLVM C++ libraries for MSAN

6 weeks ago Expand codeql testing to run on multiple platforms and two configs.
Hans Kristian Rosbach [Mon, 9 Mar 2026 13:09:14 +0000 (14:09 +0100)] 
Expand codeql testing to run on multiple platforms and two configs.

6 weeks ago [CI] Fix lint when using workflow_dispatch.
Mika Lindqvist [Tue, 10 Mar 2026 07:18:23 +0000 (09:18 +0200)] 
[CI] Fix lint when using workflow_dispatch.

6 weeks ago Add ARM64EC builds to GitHub Actions
Cameron Cawley [Tue, 10 Mar 2026 20:59:18 +0000 (20:59 +0000)] 
Add ARM64EC builds to GitHub Actions

6 weeks ago Fix CPU detection for ARM64EC
Cameron Cawley [Tue, 10 Mar 2026 20:57:50 +0000 (20:57 +0000)] 
Fix CPU detection for ARM64EC

6 weeks ago Replace macros with inline functions in deflate_quick.
Nathan Moinvaziri [Tue, 10 Mar 2026 01:03:26 +0000 (18:03 -0700)] 
Replace macros with inline functions in deflate_quick.

At -O2, Clang produces identical output and GCC produces 2 fewer instructions.

6 weeks ago Clean up dead assignments in insert_match
Nathan Moinvaziri [Tue, 10 Mar 2026 03:15:34 +0000 (20:15 -0700)] 
Clean up dead assignments in insert_match

When 56d3d985 was reverted in b85cfdf9, it restored dead
stores to match.strstart and match.match_length that
have no effect since match is passed by value. The
compiler already eliminated them; remove from source.

6 weeks ago [CI] Fix 32-bit ARM release.
Mika Lindqvist [Tue, 10 Mar 2026 07:33:04 +0000 (09:33 +0200)] 
[CI] Fix 32-bit ARM release.

6 weeks ago Add parameterized deflate tests
Nathan Moinvaziri [Tue, 10 Mar 2026 03:53:22 +0000 (20:53 -0700)] 
Add parameterized deflate tests

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

6 weeks ago Use uintptr_t for ASan function signatures and macro variables
Nathan Moinvaziri [Tue, 10 Mar 2026 03:20:40 +0000 (20:20 -0700)] 
Use uintptr_t for ASan function signatures and macro variables

The ASan runtime ABI expects uptr (pointer-sized unsigned) for both
parameters of __asan_loadN/__asan_storeN. On LLP64 targets like
Windows x64, long is 32-bit while pointers are 64-bit, truncating
size values. Use uintptr_t to match the ABI correctly.

6 weeks ago Reorganize sanitizer header for readability
Nathan Moinvaziri [Tue, 10 Mar 2026 01:37:51 +0000 (18:37 -0700)] 
Reorganize sanitizer header for readability

6 weeks ago Move ASAN/MSAN instrumentation out of zbuild.h
Nathan Moinvaziri [Tue, 10 Mar 2026 01:27:58 +0000 (18:27 -0700)] 
Move ASAN/MSAN instrumentation out of zbuild.h

Create zsanitizer.h with all sanitizer detection, declaration
stubs, and instrument_read/write/read_write macros. Include it
only in the chunkset, inflate, and dfltcc files that perform
deliberate out-of-bounds reads for performance.

6 weeks ago Simplify slide_hash_lsx
Vladislav Shchapov [Sun, 1 Feb 2026 19:11:53 +0000 (00:11 +0500)] 
Simplify slide_hash_lsx

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

6 weeks ago Slide 32 hash entries per loop iteration when using LASX
Vladislav Shchapov [Sun, 1 Feb 2026 19:11:18 +0000 (00:11 +0500)] 
Slide 32 hash entries per loop iteration when using LASX

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

7 weeks ago CI: S390x has Clang, but the qemu fallback uses a toolchain specifying gcc,
Hans Kristian Rosbach [Mon, 9 Mar 2026 19:32:34 +0000 (20:32 +0100)] 
CI: S390x has Clang, but the qemu fallback uses a toolchain specifying gcc,
so make sure we install and use gcc.

7 weeks ago CMake: Fix incorrect order of compiler flags when using sanitizers
Hans Kristian Rosbach [Mon, 9 Mar 2026 19:32:04 +0000 (20:32 +0100)] 
CMake: Fix incorrect order of compiler flags when using sanitizers

7 weeks ago Make orchestrator the parent of most workflows, and let it handle
Hans Kristian Rosbach [Mon, 9 Mar 2026 10:22:21 +0000 (11:22 +0100)] 
Make orchestrator the parent of most workflows, and let it handle
most automatic cancellations of workflows when new commits are pushed.

Workflows 'fuzz', 'lint' and 'release' have different triggers,
so handle those separately.

7 weeks ago Combine extra_lbits/base_length and extra_dbits/base_dist lookup tables
Nathan Moinvaziri [Thu, 19 Feb 2026 22:54:19 +0000 (14:54 -0800)] 
Combine extra_lbits/base_length and extra_dbits/base_dist lookup tables

Pack base values and extra bit counts into combined tables (lbase_extra,
dbase_extra) to reduce memory loads in the deflate hot path.

Each match emission now requires 2 loads instead of 4 for the extra
bits handling.
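
The idea of folding two parallel tables into one can be sketched as follows. The length bases and extra-bit counts are the standard values from RFC 1951 for length codes 257..285; the packing layout (base in the high bits, extra-bit count in the low 8 bits) and the helper names are assumptions for illustration and may not match the layout zlib-ng uses.

```python
# RFC 1951 length codes 257..285: base match length and extra-bit count.
BASE_LENGTH = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
EXTRA_LBITS = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
               3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]

# Hypothetical packing: (base << 8) | extra, one entry per length code.
LBASE_EXTRA = [(base << 8) | extra
               for base, extra in zip(BASE_LENGTH, EXTRA_LBITS)]


def length_info(code_index):
    """One table load yields both the base length and the extra-bit count."""
    packed = LBASE_EXTRA[code_index]
    return packed >> 8, packed & 0xFF


# The packed table carries the same information as the two original tables.
for i in range(len(BASE_LENGTH)):
    assert length_info(i) == (BASE_LENGTH[i], EXTRA_LBITS[i])
```

The same packing applies to the distance tables, which is how two lookups per match (one for length, one for distance) can replace four.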

Assisted-by: Claude Code

7 weeks ago Add 256-bit VPCLMULQDQ CRC32 path for systems without AVX-512.
Nathan Moinvaziri [Mon, 9 Mar 2026 07:30:04 +0000 (00:30 -0700)] 
Add 256-bit VPCLMULQDQ CRC32 path for systems without AVX-512.

Split VPCLMULQDQ CRC32 into separate AVX2 and AVX-512 compilation
units. Compute fold-by-8 constants for the AVX2 path using
bitreverse(x^d mod G(x), 33) with d=992 and d=1056.
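
The constant formula can be sketched in a few lines, assuming G(x) is the standard CRC-32 generator polynomial 0x104C11DB7 and that bitreverse(v, 33) reverses the bit order of v within a 33-bit field. The function names are hypothetical, and no constant values from the actual source are reproduced here.

```python
CRC32_POLY = 0x104C11DB7   # G(x), degree 32 (assumed standard CRC-32)


def xpow_mod(d, poly=CRC32_POLY):
    """Compute x^d mod G(x) over GF(2) by repeated multiplication by x."""
    r = 1
    for _ in range(d):
        r <<= 1
        if r >> 32:        # degree reached 32: subtract (XOR) G(x)
            r ^= poly
    return r


def bitreverse(value, width):
    """Reverse the lowest `width` bits of value."""
    out = 0
    for i in range(width):
        if (value >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def fold_constant(d):
    return bitreverse(xpow_mod(d), 33)


# The two exponents named in the commit message.
k_992, k_1056 = fold_constant(992), fold_constant(1056)
```

Reversing over 33 bits rather than 32 is equivalent to reflecting the 32-bit remainder and shifting it left by one, which is the usual form for constants in bit-reflected carry-less-multiply CRC code.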

7 weeks ago Add parameterized deflate benchmark
Nathan Moinvaziri [Fri, 27 Feb 2026 00:10:11 +0000 (16:10 -0800)] 
Add parameterized deflate benchmark

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

7 weeks ago Tests: Initialize buffer in test_crc32.cc
Hans Kristian Rosbach [Sun, 8 Mar 2026 22:30:45 +0000 (23:30 +0100)] 
Tests: Initialize buffer in test_crc32.cc

7 weeks ago Add MSAN to Aarch64.
Hans Kristian Rosbach [Sun, 8 Mar 2026 13:02:33 +0000 (14:02 +0100)] 
Add MSAN to Aarch64.
Change tests so we run UBSAN on NEON/ARMv8 code; testing without
our optimizations is less important.
Fix the Windows ARM test-skipping check.

7 weeks ago Disable sanitizer for ARM SF
pmqs [Mon, 9 Mar 2026 15:03:02 +0000 (15:03 +0000)] 
Disable sanitizer for ARM SF

7 weeks ago Disable ARM SF Jobs
pmqs [Mon, 9 Mar 2026 12:58:27 +0000 (12:58 +0000)] 
Disable ARM SF Jobs

7 weeks ago Harden sanitizer support
Paul Marquess [Sun, 15 Feb 2026 16:18:59 +0000 (16:18 +0000)] 
Harden sanitizer support

7 weeks ago [CI] Switch Windows ARM64 workflows to use native runners.
Mika Lindqvist [Mon, 9 Mar 2026 08:06:35 +0000 (10:06 +0200)] 
[CI] Switch Windows ARM64 workflows to use native runners.

7 weeks ago README: Small feature list updates
Hans Kristian Rosbach [Mon, 9 Mar 2026 09:14:57 +0000 (10:14 +0100)] 
README: Small feature list updates

7 weeks ago README: Add coveralls badge
Hans Kristian Rosbach [Mon, 9 Mar 2026 09:05:32 +0000 (10:05 +0100)] 
README: Add coveralls badge

7 weeks ago Unroll the slide hash loop similar to other ISAs
Adam Stylinski [Sat, 7 Mar 2026 18:27:27 +0000 (13:27 -0500)] 
Unroll the slide hash loop similar to other ISAs

We do this to backfill the pipeline a little bit better, particularly
on the G5.  We also conveniently operate on an entire cacheline for
this.

7 weeks ago Revert "Relax alignment requirement in NEON_accum32."
Nathan Moinvaziri [Fri, 6 Mar 2026 20:09:20 +0000 (12:09 -0800)] 
Revert "Relax alignment requirement in NEON_accum32."

This reverts commit ced54ac89cb79d8df912d741c25ea7bce9061761.

7 weeks ago Add NMAX_ALIGNED32 and use it in NEON adler32
Nathan Moinvaziri [Fri, 6 Mar 2026 19:38:28 +0000 (11:38 -0800)] 
Add NMAX_ALIGNED32 and use it in NEON adler32

Define NMAX_ALIGNED32 as NMAX rounded down to a multiple of 32 (5536)
and use it in the NEON adler32 implementation to ensure that src stays
32-byte aligned throughout the main SIMD loop. NMAX (5552) is not a
multiple of 32, so previously, after the alignment preamble, the first
iteration could process a byte count that was not a multiple of 32,
causing src to lose 32-byte alignment for all subsequent iterations.

The first iteration's budget is rounded down with ALIGN_DOWN after
subtracting align_diff, ensuring k is always a multiple of 32.
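
The arithmetic can be checked with a short sketch. NMAX, NMAX_ALIGNED32, ALIGN_DOWN and align_diff are the names used in the commit message; the Python helpers below are hypothetical stand-ins for the C macros and assume a power-of-two alignment.

```python
NMAX = 5552   # adler32 block limit quoted in the commit message


def align_down(value, alignment):
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)


NMAX_ALIGNED32 = align_down(NMAX, 32)
assert NMAX % 32 == 16          # NMAX itself is not a multiple of 32
assert NMAX_ALIGNED32 == 5536   # 173 * 32


def first_iteration_budget(align_diff):
    """Bytes for the first SIMD pass after an align_diff-byte preamble."""
    return align_down(NMAX - align_diff, 32)


# Whatever the preamble consumed, the budget stays a multiple of 32,
# so src keeps its 32-byte alignment in later iterations.
for align_diff in range(32):
    k = first_iteration_budget(align_diff)
    assert k % 32 == 0 and k + align_diff <= NMAX
```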

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

7 weeks ago Add compile-time native feature detection macros
Nathan Moinvaziri [Sat, 7 Feb 2026 08:00:44 +0000 (00:00 -0800)] 
Add compile-time native feature detection macros

Creates [ARCH]_[FEAT]_NATIVE preprocessor defines that can be re-used
in functable to bypass CPU checks.

They are derived from the DISABLE_RUNTIME_CPU_DETECTION preprocessor logic.

7 weeks ago Use ARM64 runners for all ARM-based builds
Hans Kristian Rosbach [Sat, 7 Mar 2026 23:17:09 +0000 (00:17 +0100)] 
Use ARM64 runners for all ARM-based builds

7 weeks ago Run lint in ubuntu-slim, a lightweight actions runner
Hans Kristian Rosbach [Sat, 7 Mar 2026 23:02:01 +0000 (00:02 +0100)] 
Run lint in ubuntu-slim, a lightweight actions runner

7 weeks ago GitHub workers have been increased from 2 to 4 cores; increase concurrency.
Hans Kristian Rosbach [Sat, 7 Mar 2026 22:18:54 +0000 (23:18 +0100)] 
GitHub workers have been increased from 2 to 4 cores; increase concurrency.