Enable codecov for more CI jobs.
Disable codecov where -O1 or higher is requested, since codecov sets -O0.
Disable codecov where tests are not run.
Add comments for jobs where codecov is not enabled.
Combine the ARM CI jobs testing non-NEON with those testing non-ARMv8, as these have
no optimized functions in common. For AArch64, use the no-opt config to test both
without NEON and without ARMv8.
This reduces the cmake and configure jobs by 3 each.
Also reorder and rename a few other jobs to try to use a common style.
Remove separate MMAP CI job by folding into another.
Remove separate REDUCED_MEM CI job by folding into another.
Make sure both are present for both GCC and Clang.
Add ZLIB_COMPAT to clang debug job.
Fix (impossible) infinite loop in gz_fetch() detected by the GCC-14 static analyzer.
According to the comment, gz_fetch() also assumes that state->x.have == 0, so
let's add an Assert to that effect.
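A minimal sketch of the added check, using zlib's Assert macro and gz_statep
layout (the message text here is illustrative):

    /* gz_fetch() is only called when the output buffer is empty; assert
     * the documented precondition so the loop's termination is evident
     * to both readers and static analyzers. */
    Assert(state->x.have == 0, "internal error: gz_fetch() called with data available");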
Fix symbol mangling so symbols in the shared library are exported correctly
* We need to mangle symbols in the map file; otherwise none of the symbols are exported
* Fix gz_error name conflict with zlib-ng API
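As a hedged illustration of the requirement, modeled on zlib's Z_PREFIX
mechanism (the symbol names are examples, not the actual map contents):

    /* zconf.h-style renaming: with a symbol prefix configured, public
     * names are rewritten by the preprocessor before compilation. */
    #ifdef Z_PREFIX
    #  define deflate z_deflate
    #  define inflate z_inflate
    #endif
    /* The linker map file must then list z_deflate/z_inflate; entries
     * for the unprefixed names match nothing and export no symbols. */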
Simplify the gzread.c name mangling workaround by splitting out just
the workaround into a separate file. This allows us to browse gzread.c
with code highlighting and it allows codecov to record coverage data.
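A hedged sketch of the split (the file name and mangle are illustrative, not
the actual ones):

    /* gzread_wrap.c (hypothetical): only the name-mangling workaround
     * lives here, and it then pulls in the untouched gzread.c.  With the
     * preprocessor tricks confined to this stub, gzread.c itself stays
     * plain C that editors can highlight and codecov can attribute
     * coverage to. */
    #define gzread zng_gzread_wrapped   /* illustrative mangle */
    #include "gzread.c"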
Don't count tests/tools towards overall project coverage.
Set project coverage target to 80%.
Loosen project coverage reduction threshold to 10% to avoid failing coverage
tests when CI happens to run on hosts that do not support AVX-512.
Set component coverage reduction thresholds low, except for common and
arch_x86, which need higher limits because their AVX-512 code is only covered
when CI runs on hosts that support it.
deflateInit was still checking for failed secondary allocations; this is no
longer necessary, since we only allocate a single buffer, and that allocation
has already been checked for failure before this point.
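A minimal sketch of the single-allocation pattern that makes those checks
redundant (zlib-style names; the sizes are illustrative):

    /* The one allocation is checked once; later carve-outs from the same
     * buffer cannot fail, so per-member NULL checks are unnecessary. */
    char *buf = (char *)ZALLOC(strm, 1, total_size);
    if (buf == NULL)
        return Z_MEM_ERROR;
    s->window      = (unsigned char *)buf;
    s->prev        = (Pos *)(buf + window_size);             /* no check */
    s->pending_buf = (unsigned char *)(buf + window_size + prev_size);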
Ignore benchmarks in codecov coverage reports.
We already avoid collecting coverage when running benchmarks because the
benchmarks do not perform most error checking; even though they might increase
code coverage, they won't detect most bugs unless a bug actually crashes the
whole benchmark.
Add separate components.
Wait for CI completion before posting the status report; this avoids emailing an initial report with very low coverage based only on the pigz tests.
Make the report informational; low coverage will not be a CI failure.
Disable GitHub Annotations; these are deprecated due to API limits.
Improve benchmark_compress and benchmark_uncompress.
- These now use the same generated data as benchmark_inflate.
- benchmark_uncompress now also uses level 9 for compression, so that
we also get 3-byte matches to uncompress.
- Improve error checking
- Unify code with benchmark_inflate
Add new benchmark inflate_nocrc. This lets us benchmark just the
inflate process more accurately. Also adds a new shared function for
generating highly compressible data that avoids very long matches.
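A hedged sketch of such a generator (the benchmark's actual scheme may differ;
the constants are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    /* Emit short repeating runs, but re-seed the pattern every 32 bytes
     * so the data stays highly compressible without producing the very
     * long matches that would make inflate mostly a memory copy. */
    static void gen_compressible(uint8_t *buf, size_t len, uint32_t seed) {
        uint32_t state = seed;
        for (size_t i = 0; i < len; i++) {
            if (i % 32 == 0)
                state = state * 1664525u + 1013904223u;   /* LCG step */
            buf[i] = (uint8_t)('a' + ((state >> ((i % 4) * 8)) & 7));
        }
    }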
Adam Stylinski [Fri, 12 Dec 2025 21:23:27 +0000 (16:23 -0500)]
Force purely aligned loads in inflate_table code length counting
At the expense of some extra stack space and eating about 4 more cache
lines, let's make these loads purely aligned. On potato CPUs such as the
Core 2, unaligned loads in a loop are not ideal. Additionally some SBC
based ARM chips (usually the little in big.little variants) suffer a
penalty for unaligned loads. This also paves the way for a trivial
altivec implementation, for which unaligned loads don't exist and need
to be synthesized with permutation vectors.
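A hedged sketch of the idea (buffer size and names are illustrative, not
inflate_table's actual ones; the counting loop is shown scalar for brevity):

    #include <stdint.h>
    #include <string.h>

    /* Copy the code-length array into an aligned stack buffer once: a
     * few extra cache lines of stack, in exchange for purely aligned
     * loads in the hot counting loop. */
    static void count_code_lengths(const uint16_t *lens, unsigned codes,
                                   uint16_t count[16]) {
        _Alignas(64) uint16_t aligned_lens[320];   /* codes <= 320 assumed */
        memcpy(aligned_lens, lens, codes * sizeof(uint16_t));
        for (unsigned i = 0; i < codes; i++)
            count[aligned_lens[i]]++;
    }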
Fix initial crc value loading in crc32_(v)pclmulqdq
In the main function, the alignment-diff processing was getting in the way of
XORing in the initial CRC, because it does not guarantee that at least 16 bytes
have been loaded.
In fold_16, the src data was modified by the initial CRC XOR before being
stored to dst.
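A hedged sketch of the corrected ordering with SSE intrinsics (the function
shape is illustrative):

    #include <immintrin.h>
    #include <stdint.h>

    /* XOR the initial CRC into the first 16 data bytes only after a full
     * 16 bytes have actually been loaded, and apply the XOR to a register
     * copy so that copy-mode folding (as in fold_16) stores the original,
     * unmodified src bytes to dst. */
    static inline __m128i fold_first16(__m128i first16, uint32_t init_crc,
                                       uint8_t *dst) {
        _mm_storeu_si128((__m128i *)dst, first16);  /* dst gets untouched src */
        return _mm_xor_si128(first16, _mm_cvtsi32_si128((int)init_crc));
    }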
Adam Stylinski [Tue, 23 Dec 2025 23:58:10 +0000 (18:58 -0500)]
Small optimization in 256 bit wide chunkset
It turns out Intel's byte shuffle only parses the bottom 4 bits of each byte of
the shuffle vector.
This makes the rotation vector already a sufficient permutation vector and
saves us a small bit of latency.
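For reference, a hedged sketch of why no masking is needed (function name
illustrative):

    #include <immintrin.h>

    /* _mm256_shuffle_epi8 reads only the low 4 bits of each control byte
     * (plus bit 7 for zeroing), so indices >= 16 wrap within each 128-bit
     * lane for free: the rotation vector is already a valid permutation
     * vector and no extra AND is required. */
    static inline __m256i permute_chunk(__m256i chunk, __m256i rot) {
        return _mm256_shuffle_epi8(chunk, rot);   /* bytes used mod 16 */
    }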
Improve cmake/detect-arch.cmake to also provide bitness.
Rewrite checks in CMakeLists.txt and cmake/detect-intrinsics.cmake
to utilize the new variables.
- Add a local window pointer to deflate_quick, deflate_fast, deflate_medium
  and fill_window (see the sketch below).
- Add a local strm pointer in fill_window.
- Fix a missed change to use the local lookahead variable in match_tpl.
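A hedged sketch of the local-pointer pattern (zlib-style member names; the
body is illustrative):

    #include <string.h>

    /* Hoisting hot members into locals lets the compiler keep them in
     * registers instead of reloading them through s-> on every use. */
    static void fill_window_sketch(deflate_state *s) {
        unsigned char *window = s->window;   /* local window pointer */
        z_stream *strm = s->strm;            /* local strm pointer */
        unsigned n = strm->avail_in < 1024 ? strm->avail_in : 1024;
        memcpy(window + s->strstart + s->lookahead, strm->next_in, n);
        strm->next_in  += n;
        strm->avail_in -= n;
        s->lookahead   += n;
    }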
Deflate_state changes:
- Reduce opt_len/static_len sizes.
- Move matches/insert closer to their related variables.
  These now fill an 8-byte hole in the struct on 64-bit platforms
  (see the sketch at the end of this list).
- Exclude compressed_len and bits_sent if ZLIB_DEBUG is
  not enabled. Also move them to the end.
- Remove x86 MSVC-specific padding.
- Minor inlining changes in trees_emit.h:
- Inline the small bi_windup function
- Don't attempt inlining for the big zng_emit_dist
- Don't check for a too-long match in deflate_quick; it cannot happen.
- Move GOTO_NEXT_CHAIN macro outside of LONGEST_MATCH function to
improve readability.
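A hedged illustration of the hole-filling (generic members, not deflate_state's
actual layout):

    #include <stdint.h>

    /* On LP64, a lone 32-bit member before an 8-byte-aligned pointer
     * leaves a 4-byte hole.  Pairing two 32-bit members fills it and
     * shrinks the struct by 8 bytes. */
    struct before {           /* sizeof == 24 */
        uint32_t matches;     /* 4 bytes + 4 bytes padding */
        void    *head;        /* 8 bytes */
        uint32_t insert;      /* 4 bytes + 4 bytes tail padding */
    };
    struct after {            /* sizeof == 16 */
        uint32_t matches;
        uint32_t insert;      /* fills the hole before head */
        void    *head;
    };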