git.ipfire.org Git - thirdparty/zlib-ng.git/log
36 hours ago[CI] Add coveralls as trusted formula for Homebrew under MacOS develop
Mika Lindqvist [Wed, 17 Jun 2026 21:23:23 +0000 (00:23 +0300)] 
[CI] Add coveralls as trusted formula for Homebrew under MacOS
* Homebrew 6.0.0 requires explicit trust when installing formulae from
  third-party taps (repositories)

2 days agoCleanup deflate_quick variable reuse and call signatures
Nathan Moin Vaziri [Tue, 12 May 2026 06:24:09 +0000 (23:24 -0700)] 
Cleanup deflate_quick variable reuse and call signatures

Drop the lc local: the four-byte window read is hoisted above the
match-search branch, and the literal-emit takes the low byte of
str_val directly so the short-lookahead arm no longer needs its own
load.  Pass strstart into quick_start_block and quick_end_block so
they no longer reach into s for block_start.  In the early-finish
arm, pass 1 to quick_start_block since last is known to be 1.

2 days agoHoist strstart and lookahead to locals in deflate strategies
Nathan Moin Vaziri [Tue, 12 May 2026 05:11:26 +0000 (22:11 -0700)] 
Hoist strstart and lookahead to locals in deflate strategies

Lift s->strstart and s->lookahead into function-local variables in
deflate_quick, deflate_huff, and deflate_rle, synced back to s before
external callouts that observe them (fill_window, longest_match,
FLUSH_BLOCK, returns) and reloaded after callouts that mutate them.
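
The hoisting pattern described above can be sketched as follows. This is only an illustration: `toy_state`, `toy_refill`, and `toy_consume` are hypothetical stand-ins, not zlib-ng's deflate_state, fill_window, or any real strategy function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the deflate state; not zlib-ng's real struct. */
typedef struct {
    uint32_t strstart;   /* current position in the input */
    uint32_t lookahead;  /* bytes available starting at strstart */
    uint32_t remaining;  /* bytes the refill callout can still supply */
} toy_state;

/* Hypothetical callout that reads and mutates the state, like a window refill. */
static void toy_refill(toy_state *s) {
    uint32_t take = s->remaining < 4 ? s->remaining : 4;
    s->lookahead += take;
    s->remaining -= take;
}

/* Consume the input one byte at a time; returns the number of bytes consumed. */
static uint32_t toy_consume(toy_state *s) {
    uint32_t strstart = s->strstart;    /* hoisted into locals */
    uint32_t lookahead = s->lookahead;
    uint32_t consumed = 0;

    for (;;) {
        if (lookahead == 0) {
            /* Sync locals back before a callout that observes them. */
            s->strstart = strstart;
            s->lookahead = lookahead;
            toy_refill(s);
            /* Reload what the callout may have changed. */
            lookahead = s->lookahead;
            if (lookahead == 0)
                break;
        }
        strstart++;
        lookahead--;
        consumed++;
    }

    /* Sync back before returning. */
    s->strstart = strstart;
    s->lookahead = lookahead;
    return consumed;
}
```

Between callouts the loop touches only the two locals, which the compiler is free to keep in registers.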

3 days agoUse broadcast chunk store instead of memset for dist=1 on NEON/AVX2/AVX-512
Nathan Moinvaziri [Sun, 10 May 2026 08:05:49 +0000 (01:05 -0700)] 
Use broadcast chunk store instead of memset for dist=1 on NEON/AVX2/AVX-512

Benchmarks show broadcasting the byte into a vector register and
falling through to the chunk store loop beats libc memset for the
short lengths common in inflate output (29-58% on AVX-512, 15-26%
on AVX2, 33% on NEON at len=8).

In 49a6bb5d, chunkmemset_1 was replaced with memset under the
assumption that libc memset would be at least as fast. According
to benchmarking, this isn't true in all cases.

3 days agoWrap AVX-512 mask intrinsics and use them in CHUNKMEMSET
Nathan Moinvaziri [Sun, 10 May 2026 08:05:11 +0000 (01:05 -0700)] 
Wrap AVX-512 mask intrinsics and use them in CHUNKMEMSET

3 days agoRoute CHUNKMEMSET to CHUNKCOPY when dist >= len
Nathan Moin Vaziri [Sun, 10 May 2026 01:03:20 +0000 (18:03 -0700)] 
Route CHUNKMEMSET to CHUNKCOPY when dist >= len

When dist >= len the source bytes don't overlap the destination, so no pattern repeat is needed and CHUNKCOPY produces the same output. Extends the existing `dist >= sizeof(chunk_t)` shortcut so archs with chunk_t > 8 also catch dist in [chunk_t/2, chunk_t-1] with small lengths.
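
A minimal sketch of why the shortcut is safe, assuming a byte-wise reference copy as the baseline. `copy_bytewise` and `copy_match` are hypothetical names used for illustration; they are not the CHUNKMEMSET or CHUNKCOPY implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference back-reference copy: byte by byte, so an overlapping source
 * and destination repeat the last `dist` bytes as a pattern. */
static void copy_bytewise(unsigned char *out, size_t dist, size_t len) {
    const unsigned char *from = out - dist;
    for (size_t i = 0; i < len; i++)
        out[i] = from[i];
}

/* Hypothetical dispatcher: when dist >= len the ranges
 * [out - dist, out - dist + len) and [out, out + len) do not overlap,
 * so a plain copy produces the same bytes as the pattern repeat. */
static void copy_match(unsigned char *out, size_t dist, size_t len) {
    assert(dist > 0);
    if (dist >= len) {
        memcpy(out, out - dist, len);
        return;
    }
    copy_bytewise(out, dist, len);
}
```

With dist=4 and len=4 the plain copy path is taken; with dist=2 and len=6 the source overlaps the destination and the pattern-repeating path is still required.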

3 days agoSkip CHUNKMEMSET loops when length fits in one chunk
Nathan Moin Vaziri [Sun, 10 May 2026 00:24:55 +0000 (17:24 -0700)] 
Skip CHUNKMEMSET loops when length fits in one chunk

When len <= sizeof(chunk_t) a single storechunk on chunk_load gives the same output as the two while-loops + tail. The overwrite of the trailing sizeof(chunk_t) - len bytes lands in caller headroom.
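
A sketch of the single-store idea, assuming a 16-byte chunk and a chunk that is pre-filled with the repeating pattern. `TOY_CHUNK` and `toy_chunkmemset_short` are hypothetical; the real code uses the platform's chunk_t and its own load/store helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CHUNK 16  /* assumed chunk size for this sketch */

/* Write a `len`-byte pattern repeat (len <= TOY_CHUNK) with one full-chunk
 * store. The caller must guarantee TOY_CHUNK writable bytes at `out`,
 * because TOY_CHUNK - len bytes past the match are overwritten too. */
static void toy_chunkmemset_short(unsigned char *out, size_t dist, size_t len) {
    unsigned char chunk[TOY_CHUNK];
    const unsigned char *from = out - dist;

    assert(dist > 0 && len <= TOY_CHUNK);

    /* Fill the chunk with the last `dist` bytes, repeated. */
    for (size_t i = 0; i < TOY_CHUNK; i++)
        chunk[i] = from[i % dist];

    /* One fixed-size store; no loops or tail handling needed. */
    memcpy(out, chunk, TOY_CHUNK);
}
```

The bytes written beyond `len` are a continuation of the pattern and are expected to be overwritten by later output.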

3 days agoInline CHUNKMEMSET tail with bit-decomposed stores
Nathan Moin Vaziri [Sun, 10 May 2026 00:24:24 +0000 (17:24 -0700)] 
Inline CHUNKMEMSET tail with bit-decomposed stores

Replace variable-length memcpy with bitmask-gated fixed-size copies (16 + 8 + 4 + 2 + 1) so the compiler can inline each as a direct store instead of a libc call.
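
The bit-decomposed tail can be sketched like this for remainders below 32 bytes. `copy_tail` is a hypothetical helper written for illustration, not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `len` (0..31) bytes using fixed-size copies gated on the bits of len.
 * Each memcpy has a compile-time constant size, so compilers can usually
 * lower it to plain loads and stores instead of a library call. */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t len) {
    assert(len < 32);
    if (len & 16) { memcpy(dst, src, 16); dst += 16; src += 16; }
    if (len & 8)  { memcpy(dst, src, 8);  dst += 8;  src += 8;  }
    if (len & 4)  { memcpy(dst, src, 4);  dst += 4;  src += 4;  }
    if (len & 2)  { memcpy(dst, src, 2);  dst += 2;  src += 2;  }
    if (len & 1)  { *dst = *src; }
}
```

Unlike a full-chunk store, this writes exactly `len` bytes, so no headroom past the destination is needed.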

3 days agoAdd chunkmemset benchmark
Nathan Moin Vaziri [Sun, 10 May 2026 00:23:53 +0000 (17:23 -0700)] 
Add chunkmemset benchmark

Args span every dispatch arc in CHUNKMEMSET — dist=1 memset, dist>=chunk_t CHUNKCOPY, fast dist=2/4/8/16, GET_CHUNK_MAG, the unrolled loops, and the trailing remainder. Registered against chunkmemset_safe_{c,neon,sse2,ssse3,avx2,avx512,power8,rvv,lsx,lasx} where the symbol is built.

4 days agoRemove ptrdiff_t configure/cmake detection
Nathan Moin Vaziri [Thu, 16 Apr 2026 06:39:28 +0000 (23:39 -0700)] 
Remove ptrdiff_t configure/cmake detection

ptrdiff_t is defined in <stddef.h> since C89 and is guaranteed to
exist on any C99+ compiler. zlib-ng requires C99, so the build-time
check and the fallback typedef in zconf headers are unnecessary.

4 days agoAdd LIKELY / UNLIKELY hints to LONGEST_MATCH.
Hans Kristian Rosbach [Wed, 10 Jun 2026 13:13:58 +0000 (15:13 +0200)] 
Add LIKELY / UNLIKELY hints to LONGEST_MATCH.
Based on CI/Coveralls data and local benchmarking.

4 days agoUnify inconsistent handling of GZBUFSIZE, and update documentation.
Hans Kristian Rosbach [Wed, 10 Jun 2026 16:58:35 +0000 (18:58 +0200)] 
Unify inconsistent handling of GZBUFSIZE, and update documentation.

6 days agoAdd -mbmi to AVX2 and AVX512 compile flags
Nathan Moin Vaziri [Thu, 11 Jun 2026 23:38:57 +0000 (16:38 -0700)] 
Add -mbmi to AVX2 and AVX512 compile flags

The AVX2 and AVX512 flags enable BMI2 but not BMI1, and TZCNT is a
BMI1 instruction. GCC emits the rep bsf encoding that executes as
TZCNT on BMI hardware regardless, but clang gates on the feature bit
and emits plain BSF, which is slower on AMD. Every CPU with AVX2 also
has BMI1, so the flag only affects code already behind AVX2 runtime
detection.
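
For context, a sketch of the kind of code that benefits: a count-trailing-zeros operation used to locate the first mismatching byte. On x86, compilers typically lower the builtin below to TZCNT when BMI1 is enabled and to BSF otherwise. `load_le64`, `ctz64`, and `first_mismatch` are hypothetical helpers, not zlib-ng functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Endian-independent little-endian 64-bit load. */
static uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Count trailing zero bits; v must be nonzero. */
static unsigned ctz64(uint64_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzll(v);
#else
    unsigned n = 0;
    while ((v & 1) == 0) { v >>= 1; n++; }
    return n;
#endif
}

/* Index (0..7) of the first differing byte in two 8-byte blocks, or 8 if equal. */
static unsigned first_mismatch(const unsigned char *a, const unsigned char *b) {
    uint64_t diff = load_le64(a) ^ load_le64(b);
    if (diff == 0)
        return 8;
    return ctz64(diff) / 8;
}
```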

Assisted-By: Claude Opus 4.8 (1M context)

6 days agoReplace hash calculations with macros in insert_string_p.h
Hans Kristian Rosbach [Thu, 11 Jun 2026 12:27:32 +0000 (14:27 +0200)] 
Replace hash calculations with macros in insert_string_p.h

6 days agoRemove unused functions
Hans Kristian Rosbach [Wed, 10 Jun 2026 21:10:08 +0000 (23:10 +0200)] 
Remove unused functions

6 days agoClean up and simplify insert_string code.
Hans Kristian Rosbach [Wed, 10 Jun 2026 18:55:37 +0000 (20:55 +0200)] 
Clean up and simplify insert_string code.

6 days agoRemove obsolete templating of insert_string functions.
Hans Kristian Rosbach [Wed, 10 Jun 2026 18:30:35 +0000 (20:30 +0200)] 
Remove obsolete templating of insert_string functions.

6 days ago[CI] Add configure workflow for Windows AMD64 with Clang.
Mika Lindqvist [Mon, 1 Jun 2026 18:06:25 +0000 (21:06 +0300)] 
[CI] Add configure workflow for Windows AMD64 with Clang.

7 days agoAdd branch prediction hints to deflate strategy hot loops
Nathan Moin Vaziri [Fri, 15 May 2026 23:22:01 +0000 (16:22 -0700)] 
Add branch prediction hints to deflate strategy hot loops

The deflate strategies were inconsistent about branch-prediction hints
on their per-iteration lookahead checks. deflate_quick marked both the
window-refill path unlikely and the steady-state path likely;
deflate_slow marked only the steady-state path; deflate_fast and
deflate_medium marked neither, leaving the compiler to guess the inner
loop's hot/cold block placement.

Mark the window-refill and end-of-input paths unlikely and the
steady-state path likely across deflate_fast, deflate_slow,
deflate_medium, deflate_rle, and deflate_huff, following deflate_quick,
so the rarely-taken refill block stays off the hot fall-through path
and block placement is consistent across all strategies.
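
A minimal sketch of the hinting pattern, assuming GCC/Clang's `__builtin_expect`. The `HINT_LIKELY`/`HINT_UNLIKELY` macros and `toy_process` are hypothetical names for illustration; zlib-ng has its own LIKELY/UNLIKELY definitions.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#  define HINT_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define HINT_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define HINT_LIKELY(x)   (x)
#  define HINT_UNLIKELY(x) (x)
#endif

/* Process `input_len` bytes through a window of `window` bytes.
 * Returns the number of bytes processed and counts refills. */
static size_t toy_process(size_t input_len, size_t window, size_t *refills) {
    size_t lookahead = 0, processed = 0;
    *refills = 0;
    for (;;) {
        if (HINT_UNLIKELY(lookahead == 0)) {
            /* Rare path: refill the window. */
            size_t take = input_len < window ? input_len : window;
            if (HINT_UNLIKELY(take == 0))
                break;              /* end of input */
            lookahead = take;
            input_len -= take;
            (*refills)++;
        }
        /* Steady-state path: stays on the fall-through. */
        processed++;
        lookahead--;
    }
    return processed;
}
```

The hints do not change results; they only inform block placement so the refill code sits off the hot path.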

8 days agoImplement coderabbitai suggested cleanup and improvements for utils/CMakeLists.txt
Hans Kristian Rosbach [Tue, 9 Jun 2026 14:24:11 +0000 (16:24 +0200)] 
Implement coderabbitai suggested cleanup and improvements for utils/CMakeLists.txt

8 days agoAdd utils/README.md
Hans Kristian Rosbach [Tue, 9 Jun 2026 13:41:28 +0000 (15:41 +0200)] 
Add utils/README.md

8 days agoMove makecrct, makefixed and maketrees to utils/ folder.
Hans Kristian Rosbach [Tue, 9 Jun 2026 13:25:19 +0000 (15:25 +0200)] 
Move makecrct, makefixed and maketrees to utils/ folder.

8 days agoMove minideflate.c and minigzip.c to utils/ folder, now always building
Hans Kristian Rosbach [Tue, 9 Jun 2026 13:13:08 +0000 (15:13 +0200)] 
Move minideflate.c and minigzip.c to utils/ folder, now always building
these, no longer depending on tests being enabled.

8 days agoFix warnings about BRAID_W/BRAID_N being undefined.
Hans Kristian Rosbach [Mon, 8 Jun 2026 17:48:55 +0000 (19:48 +0200)] 
Fix warnings about BRAID_W/BRAID_N being undefined.

8 days agoFix warnings about __CYGWIN__ being undefined.
Hans Kristian Rosbach [Mon, 8 Jun 2026 17:48:09 +0000 (19:48 +0200)] 
Fix warnings about __CYGWIN__ being undefined.

8 days agoFix warnings about BIG_ENDIAN/LITTLE_ENDIAN being undefined
Hans Kristian Rosbach [Mon, 8 Jun 2026 17:46:48 +0000 (19:46 +0200)] 
Fix warnings about BIG_ENDIAN/LITTLE_ENDIAN being undefined

9 days ago[CI] Add configure workflow for Windows ARM64 with clang.
Mika T. Lindqvist [Sun, 31 May 2026 11:32:41 +0000 (14:32 +0300)] 
[CI] Add configure workflow for Windows ARM64 with clang.

9 days agoFix warnings triggered by NVHPC/nvc.
Mika T. Lindqvist [Fri, 22 May 2026 11:14:50 +0000 (14:14 +0300)] 
Fix warnings triggered by NVHPC/nvc.

9 days agoReject asymmetric C vs C++ machine flags
Nathan Moin Vaziri [Wed, 20 May 2026 00:08:36 +0000 (17:08 -0700)] 
Reject asymmetric C vs C++ machine flags

Different machine flags in C vs C++ flags produce undefined-reference
link failures.

2 weeks ago[CI] Add ARM64 version of MinGW64 with gcc.
Mika T. Lindqvist [Sun, 31 May 2026 10:45:47 +0000 (13:45 +0300)] 
[CI] Add ARM64 version of MinGW64 with gcc.

2 weeks agogzread: Fix compilation on AIX
Aelin Reidel [Fri, 29 May 2026 04:27:48 +0000 (06:27 +0200)] 
gzread: Fix compilation on AIX

2 weeks ago[CI] Add NVHPC to CMake workflow.
Mika Lindqvist [Wed, 20 May 2026 22:57:39 +0000 (01:57 +0300)] 
[CI] Add NVHPC to CMake workflow.

2 weeks agoLCC: Suppress warnings in Google Benchmark.
Vladislav Shchapov [Sun, 17 May 2026 17:01:13 +0000 (22:01 +0500)] 
LCC: Suppress warnings in Google Benchmark.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks agoFix "variable set but not used" warnings.
Vladislav Shchapov [Sun, 17 May 2026 15:55:05 +0000 (20:55 +0500)] 
Fix "variable set but not used" warnings.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks agoFix warning "label followed by a declaration is a C23 extension".
Vladislav Shchapov [Sun, 17 May 2026 14:54:51 +0000 (19:54 +0500)] 
Fix warning "label followed by a declaration is a C23 extension".

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

2 weeks agoEnable -Werror in GCC and Clang.
Vladislav Shchapov [Sun, 17 May 2026 14:29:37 +0000 (19:29 +0500)] 
Enable -Werror in GCC and Clang.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

3 weeks agoAlways check whether distance is too far back in inflateBack.
Mika T. Lindqvist [Tue, 19 May 2026 13:06:46 +0000 (16:06 +0300)] 
Always check whether distance is too far back in inflateBack.

3 weeks agoMake macro redefinition fatal with LCC.
Mika T. Lindqvist [Sun, 17 May 2026 19:50:26 +0000 (22:50 +0300)] 
Make macro redefinition fatal with LCC.

3 weeks agoRemove dead unlink forward declaration in minigzip tests
Nathan Moin Vaziri [Fri, 17 Apr 2026 07:07:23 +0000 (00:07 -0700)] 
Remove dead unlink forward declaration in minigzip tests

Both files include zbuild.h which now unconditionally defines
_LARGEFILE64_SOURCE, making the !defined(_LARGEFILE64_SOURCE) clause
always false and the whole block unreachable. The upstream-zlib
purpose was forward-declaring unlink on platforms without unistd.h
that aren't Windows, but zlib-ng's supported platforms either have
unistd.h or define _WIN32.

3 weeks agoRemove dead NO_FSEEKO detection
Nathan Moin Vaziri [Fri, 17 Apr 2026 06:43:59 +0000 (23:43 -0700)] 
Remove dead NO_FSEEKO detection

zlib-ng never calls fseeko; gzlib.c was rewritten to use
lseek / lseek64 / _lseeki64 directly. The NO_FSEEKO define is not
referenced by any source file, so the CMake check_function_exists
and matching configure probe are dead code.

3 weeks agoDrop dead glibc feature-macro juggling from gzguts.h
Nathan Moin Vaziri [Fri, 17 Apr 2026 06:58:18 +0000 (23:58 -0700)] 
Drop dead glibc feature-macro juggling from gzguts.h

_LARGEFILE_SOURCE enables fseeko/ftello, which zlib-ng never calls
(gzlib.c uses lseek/lseek64/_lseeki64 directly). The _FILE_OFFSET_BITS
and _TIME_BITS undefs were defensive against a consumer-provided
-D_FILE_OFFSET_BITS=64 leaking into library internals, but they sat
after zbuild.h had already included <stdio.h> so they never affected
system-header sizing. The zlib.h gzopen->gzopen64 remap they were
guarding against is already blocked by the Z_INTERNAL gate for
library builds.

3 weeks agoInclude zbuild.h first in tools and tests
Nathan Moin Vaziri [Wed, 13 May 2026 21:12:12 +0000 (14:12 -0700)] 
Include zbuild.h first in tools and tests

zbuild.h defines _LARGEFILE64_SOURCE, but the macro only takes effect
if seen before any system header. The tools and fuzz/test files that
included <stdio.h> ahead of zbuild.h processed <features.h> without it,
so _LFS64_LARGEFILE never got set and z_off64_t expanded to an unknown
off64_t once LFS64 detection moved out of the build system.

Reorder so zbuild.h precedes any system header, and drop the redundant
<stdio.h>/<stdlib.h>/<string.h> includes that zbuild.h already pulls in.

3 weeks agoMove LFS64 detection from build system to C preprocessor
Nathan Moin Vaziri [Fri, 17 Apr 2026 06:42:59 +0000 (23:42 -0700)] 
Move LFS64 detection from build system to C preprocessor

zbuild.h now defines _LARGEFILE64_SOURCE before any system header, so
glibc exposes off64_t and lseek64 where available. The _LFS64_LARGEFILE
check already in zconf.h handles per-platform gating. Drops the
redundant -D__USE_LARGEFILE64 (internal glibc macro, set automatically)
and the unused __off64_t probe that only wrote to HAVE___OFF64_T.

The configure script's _off64_t probe was log-only and the off64_t=yes
path assumed fseeko exists without testing; both are removed in favor
of the standalone fseeko check.

4 weeks agoIf runtime CPU detection is disabled then define native_* macros to generic fallbacks...
Vladislav Shchapov [Sat, 16 May 2026 04:38:14 +0000 (09:38 +0500)] 
If runtime CPU detection is disabled, define native_* macros to generic fallbacks only if they are not already defined.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

4 weeks agoShow --benchmark_cooldown in benchmark_zlib --help output
Nathan Moinvaziri [Sun, 17 May 2026 02:14:58 +0000 (19:14 -0700)] 
Show --benchmark_cooldown in benchmark_zlib --help output

Use the custom help printer callback in benchmark::Initialize to
append the --benchmark_cooldown flag to the standard help text.

4 weeks agoPass local window variable to quick_insert_string and insert_string functions.
Hans Kristian Rosbach [Mon, 11 May 2026 18:36:50 +0000 (20:36 +0200)] 
Pass local window variable to quick_insert_string and insert_string functions.

4 weeks agoUse local block_start and window variables in FLUSH_BLOCK.
Hans Kristian Rosbach [Mon, 11 May 2026 18:01:26 +0000 (20:01 +0200)] 
Use local block_start and window variables in FLUSH_BLOCK.
Also ensure all deflate methods use local window variable.
deflate_medium now passes local window to its static functions.

4 weeks agoSplit out writing deflate headers into a separate function, keeping the
Hans Kristian Rosbach [Tue, 12 May 2026 13:50:06 +0000 (15:50 +0200)] 
Split out writing deflate headers into a separate function, keeping the
deflate hot-path clean. This only benefits cases where you call deflate()
multiple times to provide more data.

4 weeks agoTest building with ClangCl for Windows ARM64
Mika Lindqvist [Mon, 4 May 2026 15:08:08 +0000 (18:08 +0300)] 
Test building with ClangCl for Windows ARM64

4 weeks agoUse Intel(R) Software Development Emulator to run tests on emulated Sapphire Rapids CPU
Vladislav Shchapov [Mon, 11 May 2026 14:30:24 +0000 (19:30 +0500)] 
Use Intel(R) Software Development Emulator to run tests on emulated Sapphire Rapids CPU

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

4 weeks agoMake CRC32 checking of headers use crc32_small directly, instead of taking
Hans Kristian Rosbach [Thu, 14 May 2026 11:06:45 +0000 (13:06 +0200)] 
Make CRC32 checking of headers use crc32_small directly, instead of taking
a detour through functable to spin up a vector optimized function when the
header is too small to be handled by it.
CRC2 and CRC4 call CRC_DO1_B macro directly since knowing the exact size
lets us inline a very small crc implementation.
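
To illustrate how small a CRC-32 routine for short inputs can be, here is a bit-at-a-time sketch using the reflected polynomial 0xEDB88320 that gzip uses. `crc32_bitwise` is a hypothetical name; this is not zlib-ng's crc32_small or its CRC_DO1_B macro, which this log does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320).
 * Pass 0 as the initial crc; the result can be fed back in to continue. */
static uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

For a header of a few bytes, a loop like this avoids both table lookups and any dispatch overhead, at the cost of eight steps per byte.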

4 weeks agoRename single-letter size-table awk variables.
Nathan Moin Vaziri [Thu, 14 May 2026 20:59:48 +0000 (13:59 -0700)] 
Rename single-letter size-table awk variables.

The awk block parsing the size command output used bt/bd/bb/bD for
base values and dt/dd/db/dD for deltas, with case-only distinction
between bd (data) and bD (dec) that's easy to misread. Spell them out
as base_text/base_data/base_bss/base_dec and d_text/d_data/d_bss/d_dec
for legibility.

4 weeks agoShow stripped library file size in delta workflow.
Nathan Moin Vaziri [Thu, 14 May 2026 19:04:25 +0000 (12:04 -0700)] 
Show stripped library file size in delta workflow.

The size table reports text/data/bss/dec from the size command, which
sums section sizes but doesn't account for ELF segment alignment padding.
A delta can show up in the dec column without changing the on-disk byte
count of the shipped .so. Add a row reporting the actual file size of
the stripped library so the shipping deliverable is visible alongside
the segment accounting.

4 weeks agoInline NEON_accum32 into adler32_copy_impl
Nathan Moinvaziri [Sun, 15 Mar 2026 02:47:21 +0000 (19:47 -0700)] 
Inline NEON_accum32 into adler32_copy_impl

Remove the separate NEON_accum32 function and inline its body
directly into the adler32_copy_impl loop. This eliminates the
function call boundary and lets src/dst pointers advance
naturally through the NEON processing iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

4 weeks agoCombine NEON_accum32_copy and NEON_accum32 into single function
Nathan Moinvaziri [Sun, 15 Mar 2026 02:43:49 +0000 (19:43 -0700)] 
Combine NEON_accum32_copy and NEON_accum32 into single function

Use a const int COPY parameter to select between the copy and
non-copy paths, matching the pattern used by adler32_copy_impl.
The copy variant uses 4x individual loads+stores (better ILP),
while the non-copy variant uses a single ld1x4 quad load.
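
The constant-parameter pattern can be sketched in scalar C as follows. `toy_sum_impl`, `toy_sum`, and `toy_sum_copy` are hypothetical; a plain byte sum stands in for the NEON accumulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared body. COPY is a compile-time constant at every call site, so after
 * inlining the compiler can drop the branch and emit two specialised loops. */
static inline uint32_t toy_sum_impl(uint8_t *dst, const uint8_t *src,
                                    size_t len, const int COPY) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        if (COPY)
            dst[i] = src[i];
        sum += src[i];
    }
    return sum;
}

/* Non-copy variant: dst is never dereferenced. */
static uint32_t toy_sum(const uint8_t *src, size_t len) {
    return toy_sum_impl(NULL, src, len, 0);
}

/* Copy variant: checksum and copy in one pass. */
static uint32_t toy_sum_copy(uint8_t *dst, const uint8_t *src, size_t len) {
    return toy_sum_impl(dst, src, len, 1);
}
```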

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

4 weeks agoRemove obsolete padding in deflate and inflate state.
Hans Kristian Rosbach [Tue, 12 May 2026 18:01:26 +0000 (20:01 +0200)] 
Remove obsolete padding in deflate and inflate state.
Reorder some elements to better pack and improve cache-locality in deflate state.

5 weeks ago[CI] Add Cygwin with gcc.
Mika T. Lindqvist [Mon, 11 May 2026 05:54:55 +0000 (08:54 +0300)] 
[CI] Add Cygwin with gcc.

5 weeks ago[CI] Cleanup action caches.
Mika T. Lindqvist [Tue, 12 May 2026 07:53:45 +0000 (10:53 +0300)] 
[CI] Cleanup action caches.

5 weeks ago[CI] Extend caching apt packages to pigz workflow.
Mika T. Lindqvist [Tue, 12 May 2026 20:54:18 +0000 (23:54 +0300)] 
[CI] Extend caching apt packages to pigz workflow.

5 weeks agoFix scan_endstr offset in longest_match slow path.
Nathan Moin Vaziri [Sat, 11 Apr 2026 03:38:29 +0000 (20:38 -0700)] 
Fix scan_endstr offset in longest_match slow path.

LONGEST_MATCH_SLOW was using len - (STD_MIN_MATCH+1) instead of
len - (STD_MIN_MATCH-1) for the end-of-string hash probe, hashing a
window inside the already-matched region instead of ending one byte
past the current match. The slow path was missing match extensions
it should have been finding. The comment and the upstream fast_zlib
source both specify the correct offset.
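
The offset arithmetic can be checked with a small sketch. It assumes STD_MIN_MATCH is 3 and that the probe covers STD_MIN_MATCH bytes; both are assumptions made for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN_MATCH 3  /* assumed value of STD_MIN_MATCH */

/* Offset of the first probed byte, relative to the match start. */
static size_t probe_start_old(size_t len) { return len - (TOY_MIN_MATCH + 1); }
static size_t probe_start_new(size_t len) { return len - (TOY_MIN_MATCH - 1); }

/* Offset of the last probed byte, assuming a TOY_MIN_MATCH-byte probe. */
static size_t probe_last(size_t start) { return start + TOY_MIN_MATCH - 1; }
```

For a match of length 10 (offsets 0 to 9), the corrected offset probes bytes 8 to 10, ending one byte past the match, while the old offset probes bytes 6 to 8, entirely inside it.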

Closes #2248.

Reported-by: Sergey "Shnatsel" Davidoff <291257+Shnatsel@users.noreply.github.com>
Reported-by: Folkert de Vries <7949978+folkertdev@users.noreply.github.com>

5 weeks agoMove shared code to composite actions.
Mika Lindqvist [Thu, 7 May 2026 21:28:46 +0000 (00:28 +0300)] 
Move shared code to composite actions.

5 weeks ago[CI] Cache Ubuntu .deb packages to speed up installing dependencies.
Mika Lindqvist [Tue, 5 May 2026 22:33:15 +0000 (01:33 +0300)] 
[CI] Cache Ubuntu .deb packages to speed up installing dependencies.
* Purge old packages and unneeded dependencies before copying remaining packages to cached directory

6 weeks agoAdd /delta workflow for loongarch64
Vladislav Shchapov [Tue, 5 May 2026 13:16:58 +0000 (18:16 +0500)] 
Add /delta workflow for loongarch64

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
6 weeks ago[CI] Force refreshing homebrew for macOS.
Mika Lindqvist [Wed, 6 May 2026 14:53:37 +0000 (17:53 +0300)] 
[CI] Force refreshing homebrew for macOS.

6 weeks ago[CI] Use older runner for Visual Studio 2022 jobs.
Mika Lindqvist [Wed, 6 May 2026 01:19:12 +0000 (04:19 +0300)] 
[CI] Use older runner for Visual Studio 2022 jobs.

6 weeks agoWhen using ALIGN_DOWN() macro, the signedness of types must match to avoid UBSAN...
Mika Lindqvist [Tue, 5 May 2026 19:59:27 +0000 (22:59 +0300)] 
When using the ALIGN_DOWN() macro, the signedness of the types must match to keep UBSAN from warning about an implicit sign change during widening.
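
A sketch of the issue, assuming a mask-based alignment macro. `TOY_ALIGN_DOWN` is a hypothetical definition; the log does not show how zlib-ng defines ALIGN_DOWN().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical alignment macro; `align` must be a power of two. */
#define TOY_ALIGN_DOWN(value, align) ((value) & ~((align) - 1))

/* Mixed signedness: ~(16 - 1) is the int -16, which is implicitly
 * converted to uint64_t. The result is correct, but sanitizers that
 * check implicit sign changes can flag the conversion. */
static uint64_t align_mixed(uint64_t v) {
    return TOY_ALIGN_DOWN(v, 16);
}

/* Matching types: the mask is computed in uint64_t from the start. */
static uint64_t align_matched(uint64_t v) {
    return TOY_ALIGN_DOWN(v, (uint64_t)16);
}
```

Both functions return the same value; only the second avoids the signed-to-unsigned conversion.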

6 weeks agoAdd early return when prev_length already exceeds lookahead
Nathan Moin Vaziri [Sun, 12 Apr 2026 17:00:09 +0000 (10:00 -0700)] 
Add early return when prev_length already exceeds lookahead

Near end-of-input the caller's prev_length can exceed the
current lookahead, making the chain walk pointless since no
match can be longer than the available input. The non-slow
path never clamped this case — break_matching was slow-path
only — leaving the output contract unguarded.

Together with the existing `if (len >= lookahead)` early
return in the update block — which stops the chain walk as
soon as a match reaches lookahead — this ensures no
unnecessary chain steps are taken. madler/zlib does the full
chain walk when prev_length exceeds lookahead and clamps
best_len at the function exit, resulting in extra work.

6 weeks agoRemove dead break_matching label from longest_match
Nathan Moin Vaziri [Sun, 12 Apr 2026 17:01:47 +0000 (10:01 -0700)] 
Remove dead break_matching label from longest_match

The lookahead guard at break_matching is unreachable because
the early return `if (len >= lookahead) return lookahead` fires
before best_len is ever assigned, keeping best_len < lookahead
as a loop invariant. Replace the three goto sites with direct
returns and delete the label entirely.

6 weeks agoAllow /delta on fork pull requests
Nathan Moin Vaziri [Mon, 4 May 2026 20:47:20 +0000 (13:47 -0700)] 
Allow /delta on fork pull requests

The author_association gate already restricts triggers to OWNER, MEMBER,
or COLLABORATOR, so a maintainer running /delta on a fork PR carries the
same trust as checking the PR out locally. Drop the fork rejection and
the unused base/head repo id parsing.

6 weeks ago[CI] Use gcov from MinGW32 when generating coverage for 32-bit builds.
Mika Lindqvist [Mon, 4 May 2026 19:01:15 +0000 (22:01 +0300)] 
[CI] Use gcov from MinGW32 when generating coverage for 32-bit builds.

6 weeks agodeflate_rle: remove unnecessary check for too long matches
Hans Kristian Rosbach [Mon, 4 May 2026 19:38:27 +0000 (21:38 +0200)] 
deflate_rle: remove unnecessary check for too long matches

6 weeks agoDeflate_fast does not have 'prev_length', fix comment.
Hans Kristian Rosbach [Sun, 3 May 2026 18:07:34 +0000 (20:07 +0200)] 
Deflate_fast does not have 'prev_length', fix comment.

6 weeks agoFix check against BUILD_ALT_BENCH that was always defined as OFF
Nathan Moin Vaziri [Wed, 22 Apr 2026 20:50:58 +0000 (13:50 -0700)] 
Fix check against BUILD_ALT_BENCH that was always defined as OFF

6 weeks agoAdd PNG decode benchmark for narrow image widths
Nathan Moinvaziri [Thu, 26 Mar 2026 19:11:59 +0000 (12:11 -0700)] 
Add PNG decode benchmark for narrow image widths

Benchmark libpng row-by-row decoding where avail_out falls
below the 260-byte inflate_fast threshold. Uses a synthetic
gradient-with-noise pixel generator that produces deflate
token distributions representative of real photographs.
Also fix encode_png to use the passed width and height
instead of the hardcoded IMWIDTH and IMHEIGHT constants.

6 weeks agoFix libpng linking and include paths for benchmark apps
Nathan Moinvaziri [Thu, 26 Mar 2026 17:51:33 +0000 (10:51 -0700)] 
Fix libpng linking and include paths for benchmark apps

The FetchContent path was missing the binary directory from
PNG_INCLUDE_DIR, causing pnglibconf.h not to be found. The
link target was hardcoded to libpng.a which does not resolve
when libpng is built via FetchContent. Use png_static as the
CMake target when fetched, and normalize both paths through
PNG_STATIC_LIBRARY and PNG_INCLUDE_DIR variables.

6 weeks agoBump actions/cache from 4 to 5
dependabot[bot] [Fri, 1 May 2026 07:52:33 +0000 (07:52 +0000)] 
Bump actions/cache from 4 to 5

Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

6 weeks agoBump actions/checkout from 4 to 6
dependabot[bot] [Fri, 1 May 2026 07:52:30 +0000 (07:52 +0000)] 
Bump actions/checkout from 4 to 6

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

6 weeks agoBump mymindstorm/setup-emsdk from 14 to 16
dependabot[bot] [Fri, 1 May 2026 07:52:24 +0000 (07:52 +0000)] 
Bump mymindstorm/setup-emsdk from 14 to 16

Bumps [mymindstorm/setup-emsdk](https://github.com/mymindstorm/setup-emsdk) from 14 to 16.
- [Release notes](https://github.com/mymindstorm/setup-emsdk/releases)
- [Commits](https://github.com/mymindstorm/setup-emsdk/compare/v14...v16)

---
updated-dependencies:
- dependency-name: mymindstorm/setup-emsdk
  dependency-version: '16'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

6 weeks agoOptimize adler32_swar alignment and remove platform conditionals
Nathan Moinvaziri [Thu, 19 Mar 2026 19:28:22 +0000 (12:28 -0700)] 
Optimize adler32_swar alignment and remove platform conditionals

6 weeks agoCall adler32_c directly in adler32_copy_c scalar fallback
Nathan Moinvaziri [Tue, 17 Mar 2026 02:03:25 +0000 (19:03 -0700)] 
Call adler32_c directly in adler32_copy_c scalar fallback

The generic copy function was calling through the function
table, which dispatched to the best SIMD implementation
instead of the scalar path. This led to incorrect benchmarks
for adler32_copy/c since it measured the SIMD path rather
than the scalar fallback. Call adler32_c directly so the
scalar copy variant actually exercises the scalar checksum.

6 weeks agoAdd SWAR scalar adler32 for 64-bit platforms with unaligned access
Michael Niedermayer [Sun, 15 Mar 2026 08:03:47 +0000 (01:03 -0700)] 
Add SWAR scalar adler32 for 64-bit platforms with unaligned access

Borrows the SWAR (SIMD Within A Register) technique from FFmpeg's
libavutil/adler32.c by Michael Niedermayer. The original splits each
8-byte load into even/odd byte lanes packed as 4x16-bit accumulators
in a uint64_t, with a running prefix sum for the s2 contribution, and
a final reduction using multiply-and-shift with positional weight
constants. The chunk size is capped at 23 iterations of 8 bytes (184
bytes) to keep the 16-bit accumulators from overflowing.

Our improvements over the original FFmpeg implementation:
  - Process 16 bytes per iteration (two 64-bit loads) instead of 8,
    halving loop overhead while staying within the 23-iteration limit.
  - Handle an 8-byte remainder after the 16-byte loop so no bytes
    fall through to the slow scalar path unnecessarily.
  - Applied to both the NMAX inner loop (adler32_c) and the combined
    copy+checksum tail path (adler32_copy_tail) for all callers.
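
The lane layout and the 23-iteration cap can be illustrated with a simplified sketch. It is not the committed code: it handles 8 bytes per iteration rather than 16, extracts lanes with shifts instead of the multiply-and-shift reduction, and uses a byte-wise load so it is endian-independent. All function names are hypothetical. With the prefix sum updated before each add, a prefix lane can reach at most 255 * (0 + 1 + ... + 22) = 64515, which still fits in 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u
#define LANE_MASK  0x00FF00FF00FF00FFull

/* Plain scalar reference. */
static uint32_t adler32_ref(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xFFFF, b = adler >> 16;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_BASE;
        b = (b + a) % ADLER_BASE;
    }
    return (b << 16) | a;
}

static uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Extract 16-bit lane k (0..3) from a packed accumulator. */
static uint32_t lane(uint64_t v, int k) {
    return (uint32_t)(v >> (16 * k)) & 0xFFFF;
}

static uint32_t adler32_swar(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xFFFF, b = adler >> 16;
    while (len >= 8) {
        size_t words = len / 8;
        if (words > 23)
            words = 23;                       /* keep 16-bit lanes from overflowing */
        uint64_t sum_even = 0, sum_odd = 0;   /* per-position byte sums */
        uint64_t pre_even = 0, pre_odd = 0;   /* running prefix sums of the above */
        for (size_t j = 0; j < words; j++) {
            uint64_t w = load_le64(buf + 8 * j);
            pre_even += sum_even;
            pre_odd  += sum_odd;
            sum_even += w & LANE_MASK;          /* bytes 0, 2, 4, 6 */
            sum_odd  += (w >> 8) & LANE_MASK;   /* bytes 1, 3, 5, 7 */
        }
        uint32_t n = (uint32_t)(words * 8);
        b += n * a;                           /* every byte position adds the old a */
        for (int k = 0; k < 4; k++) {
            uint32_t se = lane(sum_even, k), so = lane(sum_odd, k);
            a += se + so;
            /* A byte at position p of word j carries weight 8*(words-1-j) + (8-p). */
            b += 8 * (lane(pre_even, k) + lane(pre_odd, k));
            b += (uint32_t)(8 - 2 * k) * se + (uint32_t)(7 - 2 * k) * so;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
        buf += n;
        len -= n;
    }
    /* Fewer than 8 bytes left: finish with the scalar reference. */
    return adler32_ref((b << 16) | a, buf, len);
}
```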

Benchmark results (AArch64, Apple M3, 10 repetitions):
  adler32_c 4MB:   1,131,242 ns -> 232,708 ns  (-79.4%)
  adler32_c 256KB:    70,672 ns ->  14,384 ns  (-79.7%)
  adler32_c 4KB:       1,105 ns ->     228 ns  (-79.3%)
  adler32_c 512B:        141 ns ->      29 ns  (-79.3%)
  adler32_c 64B:          20 ns ->       6 ns  (-69.6%)

https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/adler32.c

Co-Authored-By: Nathan Moinvaziri <nathan@nathanm.com>

6 weeks agoSimplify safe-mode copy path selection in inflate_fast
Nathan Moinvaziri [Wed, 25 Mar 2026 00:34:05 +0000 (17:34 -0700)] 
Simplify safe-mode copy path selection in inflate_fast

The branch structure now tests safe_mode directly, which is clearer and
produces the same code on all platforms. No functional change to the copy
operations used.

6 weeks agoAdd inflateBack test for safe mode bailout MATCH state handler
Nathan Moinvaziri [Tue, 10 Mar 2026 21:28:50 +0000 (14:28 -0700)] 
Add inflateBack test for safe mode bailout MATCH state handler

6 weeks agoImprove inflate_fast performance for small output buffers
Nathan Moinvaziri [Tue, 10 Mar 2026 19:00:02 +0000 (12:00 -0700)] 
Improve inflate_fast performance for small output buffers

Lowers the inflate_fast entry threshold from 260 to 3 bytes of
available output by adding a safe_mode parameter that uses
bounds-checked copies and bails to the MATCH state when output
space is insufficient. This eliminates the performance cliff
where libpng-style row-by-row decompression falls back to the
slow inflate path for the last 260 bytes of each row.
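
A rough sketch of the bounds-checked idea, under the assumption that a match which does not fit is simply refused so the caller can resume on the slower path. `toy_copy_match` is hypothetical and much simpler than the real inflate_fast copy logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a back-reference of `len` bytes at distance `dist` into `out`.
 * In safe mode, refuse the copy when fewer than `len` output bytes remain.
 * Returns the number of bytes written, or 0 when the caller must bail out. */
static size_t toy_copy_match(unsigned char *out, size_t avail_out,
                             size_t dist, size_t len, int safe_mode) {
    const unsigned char *from = out - dist;
    if (safe_mode && len > avail_out)
        return 0;                   /* bail: nothing is written */
    for (size_t i = 0; i < len; i++)
        out[i] = from[i];           /* byte-wise, so overlap repeats the pattern */
    return len;
}
```

Without safe mode the caller has to guarantee enough output space up front, which is what the old 260-byte entry threshold provided.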

7 weeks agoReplace small/large buffer tests with parameterized test_chunked
Nathan Moinvaziri [Tue, 14 Apr 2026 03:20:26 +0000 (20:20 -0700)] 
Replace small/large buffer tests with parameterized test_chunked

test_large_buffers reset d_stream.next_out on every inflate iteration, so the
decompressed output was never compared against the source. test_chunked keeps
the input, compressed, and decompressed buffers separate and checks them with
memcmp.

New avail_out values (3, 64, 128, 256, 259) exercise inflate_fast()'s safe-mode
MATCH-state bailout around the 258-byte maximum match length.

8 weeks agoBump Google Benchmark to v1.9.5
Mika T. Lindqvist [Thu, 23 Apr 2026 11:46:39 +0000 (14:46 +0300)] 
Bump Google Benchmark to v1.9.5
* Google Benchmark v1.9.4 fails to compile with recent versions of clang and Visual C++ if warnings are treated as errors

8 weeks agoAdd compressed and ratio fields to deflate/corpora benchmarks
Nathan Moin Vaziri [Mon, 13 Apr 2026 05:27:42 +0000 (22:27 -0700)] 
Add compressed and ratio fields to deflate/corpora benchmarks

8 weeks agoAdd corpora benchmarks for deflate and inflate
Nathan Moin Vaziri [Wed, 8 Apr 2026 01:17:52 +0000 (18:17 -0700)] 
Add corpora benchmarks for deflate and inflate

Adds benchmark_corpora.cc which dynamically discovers and benchmarks
all files from the zlib-ng/corpora repository (silesia, calgary,
canterbury, large, snappy, etc.).

Benchmarks are registered at startup using RegisterBenchmark. If the
corpora directory is not present, no benchmarks are registered.
Deflate is tested at levels 1, 6, and 9 per file. Inflate is tested
once per file using data pre-compressed at level 9.

8 weeks agoAdd --benchmark_cooldown flag to mitigate thermal throttling
Nathan Moin Vaziri [Wed, 8 Apr 2026 01:10:20 +0000 (18:10 -0700)] 
Add --benchmark_cooldown flag to mitigate thermal throttling

Adds a --benchmark_cooldown=<seconds> flag that inserts a sleep between
benchmark families. This helps produce consistent results on systems
where sustained workloads cause thermal throttling and CPU frequency
scaling.

Uses a wrapping BenchmarkReporter that sleeps before forwarding results
to the default display reporter.
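
The wrapping idea can be sketched in plain C with function pointers. This is an illustration only and not Google Benchmark's C++ BenchmarkReporter interface: `cooldown_reporter`, `cooldown_report`, the injectable sleep callback, and the family names are all hypothetical, chosen so the sketch can be exercised without actually waiting.

```c
#include <assert.h>
#include <stddef.h>

/* Displays the results of one benchmark family (stand-in for the
 * default display reporter). */
typedef void (*report_fn)(void *ctx, const char *family_name);
/* Injectable sleep; a real build might pass a wrapper around sleep(). */
typedef void (*sleep_fn)(unsigned seconds);

typedef struct {
    report_fn inner;           /* reporter being wrapped */
    void *inner_ctx;           /* opaque state for the inner reporter */
    sleep_fn nap;              /* how to wait */
    unsigned cooldown_seconds; /* 0 disables the cooldown */
} cooldown_reporter;

/* Sleep first, then forward the results unchanged. */
static void cooldown_report(cooldown_reporter *r, const char *family_name) {
    assert(r != NULL && r->inner != NULL);
    if (r->cooldown_seconds > 0 && r->nap != NULL)
        r->nap(r->cooldown_seconds);
    r->inner(r->inner_ctx, family_name);
}

/* Test doubles: count seconds "slept" and reports forwarded. */
static unsigned slept_total = 0;
static unsigned reports_seen = 0;

static void fake_sleep(unsigned seconds) { slept_total += seconds; }

static void counting_report(void *ctx, const char *family_name) {
    (void)ctx;
    (void)family_name;
    reports_seen++;
}

/* Report three illustrative families; returns total seconds slept. */
static unsigned run_three_families(unsigned cooldown) {
    cooldown_reporter r = { counting_report, NULL, fake_sleep, cooldown };
    slept_total = 0;
    reports_seen = 0;
    cooldown_report(&r, "adler32");
    cooldown_report(&r, "crc32");
    cooldown_report(&r, "inflate");
    return slept_total;
}
```

Because the wrapper only delays and forwards, the inner reporter's output is unchanged; the cooldown costs wall-clock time, not measured time.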

8 weeks agoAdd /delta workflow for per-PR binary size comparison
Nathan Moin Vaziri [Tue, 14 Apr 2026 18:01:39 +0000 (11:01 -0700)] 
Add /delta workflow for per-PR binary size comparison

In response to a /delta PR comment, the job builds the PR head and base with
RelWithDebInfo, splits the DWARF into sibling .debug companions, and
runs several tools against both stripped libraries:

- binutils size for text/data/bss totals plus a Δ row
- bloaty for sections, top 30 compile units, and top 30 symbols
- nm --defined-only --dynamic to diff the exported symbol set
- abidiff for C ABI changes (honouring test/abi/ignore)
- minigzip at levels 1-9 over silesia-small.tar and, on native
  builds, the full silesia.tar

Results come back as a "## Delta Report" PR comment with a details
block per section, reporting both head and base SHAs so offset runs
are unambiguous.

Comment syntax is /delta [arch] [-N]. Arch defaults to x86_64 and
accepts aarch64, powerpc64le, riscv64, and s390x. -N selects the Nth
commit back from the PR head so a regression can be bisected without
force-pushing. Cross-compile builds reuse cmake/toolchain-*.cmake
and run the stripped binaries under qemu-user.
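
The comment grammar above is small enough to sketch as a parser. This is a hypothetical C illustration, not the workflow's own implementation: `delta_request`, `parse_delta_comment`, the 127-character input limit, and the upper bound of 999 on N are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char arch[16]; /* target architecture, "x86_64" when omitted */
    int back;      /* N from -N: commits back from the PR head, 0 if omitted */
} delta_request;

static const char *const known_arches[] = {
    "x86_64", "aarch64", "powerpc64le", "riscv64", "s390x"
};

static int is_known_arch(const char *tok) {
    size_t i;
    for (i = 0; i < sizeof(known_arches) / sizeof(known_arches[0]); i++) {
        if (strcmp(tok, known_arches[i]) == 0)
            return 1;
    }
    return 0;
}

/* Parse "/delta [arch] [-N]". Returns 1 and fills *out on success,
 * 0 on malformed input (in which case *out is unspecified). */
static int parse_delta_comment(const char *comment, delta_request *out) {
    static const char *const ws = " \t\r\n";
    char buf[128];
    char *tok;
    size_t len;

    assert(comment != NULL && out != NULL);
    len = strlen(comment);
    if (len >= sizeof(buf))
        return 0;
    memcpy(buf, comment, len + 1); /* strtok modifies its input */

    strcpy(out->arch, "x86_64");
    out->back = 0;

    tok = strtok(buf, ws);
    if (tok == NULL || strcmp(tok, "/delta") != 0)
        return 0;

    tok = strtok(NULL, ws);
    if (tok != NULL && tok[0] != '-') {
        if (!is_known_arch(tok))
            return 0;
        strcpy(out->arch, tok); /* safe: every known arch fits in 16 bytes */
        tok = strtok(NULL, ws);
    }
    if (tok != NULL) {
        char *end;
        long n;
        if (tok[0] != '-' || !isdigit((unsigned char)tok[1]))
            return 0;
        n = strtol(tok + 1, &end, 10);
        if (*end != '\0' || n <= 0 || n > 999) /* 999 is an arbitrary cap */
            return 0;
        out->back = (int)n;
        tok = strtok(NULL, ws);
    }
    return tok == NULL; /* reject trailing tokens */
}
```

Under these assumptions, "/delta" yields x86_64 with no offset, "/delta aarch64 -3" selects aarch64 three commits back from the PR head, and "/delta -2" keeps the default architecture while moving two commits back.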

8 weeks agoUse fallback defines for Chorba Scalar/SSE
Nathan Moinvaziri [Wed, 18 Feb 2026 08:29:00 +0000 (00:29 -0800)] 
Use fallback defines for Chorba Scalar/SSE

Gate Scalar and SSE chorba uniformly on CRC32_CHORBA_FALLBACK and
CRC32_CHORBA_SSE_FALLBACK across prototypes, dispatch, sources, tests
and benchmarks instead of spot-checking WITHOUT_CHORBA /
WITHOUT_CHORBA_SSE directly at each site.

Also move crc32_chorba_c.c into ZLIB_GENERIC_SRCS and align Makefile.in
to match so the CMake and autotools builds stay bit-identical.

8 weeks agoRemove inert comment about disabling Chorba SSE in X86 functions header
Nathan Moin Vaziri [Sat, 18 Apr 2026 21:16:40 +0000 (14:16 -0700)] 
Remove inert comment about disabling Chorba SSE in X86 functions header

This was never correct; it should have been WITHOUT_CHORBA_SSE, not NO_CHORBA_SSE.

8 weeks agoFix typo in No Chorba CMake option name in CI
Nathan Moin Vaziri [Sat, 18 Apr 2026 20:49:12 +0000 (13:49 -0700)] 
Fix typo in No Chorba CMake option name in CI

The 'Ubuntu GCC No Chorba' matrix entry has been passing
-DWITH_CHORBA=OFF since its introduction in 9d4af458, but the actual
CMake option is named WITH_CRC32_CHORBA.

8 weeks agoRemove CMake warning about MSVC Chorba bug
Nathan Moin Vaziri [Sat, 18 Apr 2026 20:52:57 +0000 (13:52 -0700)] 
Remove CMake warning about MSVC Chorba bug

The Chorba bug on SSE2/SSE41 has been fixed so this no longer applies.

8 weeks agoMerge duplicate 32-bit _mm_cvtsi64_si128 polyfills
Nathan Moin Vaziri [Fri, 17 Apr 2026 20:23:52 +0000 (13:23 -0700)] 
Merge duplicate 32-bit _mm_cvtsi64_si128 polyfills

The MSVC and GCC 32-bit polyfills for _mm_cvtsi64_si128 /
_mm_cvtsi128_si64 had identical bodies. Merge them into a single
block guarded by !__clang__ && ARCH_32BIT, with the MSVC-only
#include <intrin.h> nested inside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

8 weeks agoFix MSVC v142 miscompile of _mm_cvtsi64_si128 polyfill on 32-bit
Nathan Moin Vaziri [Fri, 17 Apr 2026 20:22:49 +0000 (13:22 -0700)] 
Fix MSVC v142 miscompile of _mm_cvtsi64_si128 polyfill on 32-bit

MSVC v142 (Visual Studio 2019, and VS 2022 pre-17.11) miscompiles
_mm_set_epi64x(0, a) on 32-bit Windows by routing part of the synthesis
through a GPR, clobbering live register data and causing stack corruption
in the chorba SSE2/SSE4.1 CRC32 code paths.

Replace the _mm_set_epi64x(0, a) polyfill with _mm_loadl_epi64 which
compiles to a single MOVQ xmm,m64 that bypasses the buggy synthesis
path. Also convert the GCC 32-bit _mm_cvtsi64_si128 macro to a static
inline for consistency, and drop the redundant ARCH_X86 guard since
x86_intrins.h is only reachable from x86 code.
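
A load-based polyfill of this kind can be sketched as follows. This is a minimal illustration rather than the code in x86_intrins.h: the `sketch_` names and `polyfill_roundtrip_ok` are hypothetical, and the outer SSE2 guard is there only so the standalone snippet compiles on non-x86 targets. `_mm_loadl_epi64` loads the low 64 bits from memory and zeroes the upper 64, which is the same result `_mm_set_epi64x(0, a)` is meant to produce.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> /* SSE2 intrinsics */

/* Stand-in for _mm_cvtsi64_si128 on 32-bit targets: load the value from
 * memory into the low lane; the upper lane is zeroed by the load. */
static inline __m128i sketch_cvtsi64_si128(int64_t a) {
    return _mm_loadl_epi64((const __m128i *)&a);
}

/* Stand-in for _mm_cvtsi128_si64: store the low lane back to memory. */
static inline int64_t sketch_cvtsi128_si64(__m128i v) {
    int64_t out;
    _mm_storel_epi64((__m128i *)&out, v);
    return out;
}

/* Returns 1 when the low lane holds a, the upper lane is zero, and the
 * value survives the round trip through the vector register. */
static int polyfill_roundtrip_ok(int64_t a) {
    __m128i v = sketch_cvtsi64_si128(a);
    int64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] == a && lanes[1] == 0 && sketch_cvtsi128_si64(v) == a;
}
#else
/* Not an SSE2 target: nothing to demonstrate, so report success. */
static int polyfill_roundtrip_ok(int64_t a) {
    (void)a;
    return 1;
}
#endif
```

Checking a negative value such as -1 is useful here because it confirms the upper lane is zeroed and not sign-extended.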

https://developercommunity.visualstudio.com/t/10853479

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 months agoFix UBSAN implicit conversion warning in test/fuzz/fuzzer_example_flush.c.
Hans Kristian Rosbach [Wed, 15 Apr 2026 13:31:48 +0000 (15:31 +0200)] 
Fix UBSAN implicit conversion warning in test/fuzz/fuzzer_example_flush.c.

2 months agoFix UBSAN implicit conversion warning in test/test_deflate_concurrency.cc.
Hans Kristian Rosbach [Wed, 15 Apr 2026 12:58:44 +0000 (14:58 +0200)] 
Fix UBSAN implicit conversion warning in test/test_deflate_concurrency.cc.

2 months agoFix UBSAN implicit conversion warning in test/test_shared_ng.h.
Hans Kristian Rosbach [Wed, 15 Apr 2026 12:27:27 +0000 (14:27 +0200)] 
Fix UBSAN implicit conversion warning in test/test_shared_ng.h.

2 months agoFix UBSAN implicit conversion warning in arch/s390/crc32_vx.c.
Hans Kristian Rosbach [Wed, 15 Apr 2026 12:13:20 +0000 (14:13 +0200)] 
Fix UBSAN implicit conversion warning in arch/s390/crc32_vx.c.