This release contains two years of development and improvements to zlib-ng,
as well as fixes and changes inherited from zlib.
The 2.1.x version series has new targeted minimum buildsystem versions, as detailed on the wiki: https://github.com/zlib-ng/zlib-ng/wiki
Buildsystem:
- Many improvements to the CMake scripts.
- Improved support for detecting memory alignment functions.
- Improved support for unaligned access by letting the compiler promote code to unaligned if supported by the CPU.
- Remove x86 CPU feature detection for TZCNT, safely fall back to BSF.
- Enable using AVX512 intrinsics with GCC <9.
Optimizations and Enhancements:
- Decompression is a lot faster (56% faster measured on AVX2-capable x86-64)
- Compression is improved for Level 9, at the cost of a little performance.
- Compression is improved for Level 3, by switching from deflate_fast to deflate_medium.
- Levels 3 and 4 have been reconfigured to provide a better gradual tradeoff for speed/compression between levels 2 and 5.
- Deflate_quick (Level 1) has been improved to default to a bigger window size and to support changing the window size like the other levels.
New instruction set optimizations:
- Adler32 implementation using AVX512, AVX512-VNNI, VMX.
- CRC32-B implementation using VPCLMULQDQ & IBM-Z.
- Slide hash implementation using VMX.
- Compare256 implementations using SSE2, Neon, & POWER9.
- Inflate chunk copying using SSSE3 & VSX.
Compatibility and Porting:
- CRC-32 computation changes from madler/zlib. zlib-ng/zlib-ng#a6155234
- Compatible and up-to-date with zlib 1.2.13.
- Removed the usage of macros in zlib-ng.h, making life easier for languages that want to call the C functions without having the C preprocessor (Python, etc).
Improved support for more environments:
- Apple M1
- vcpkg
- Emscripten
Testing:
- Tests have been converted to use GTest. Many new tests have also been added.
- Gbench support has been added to easily benchmark changes to performance-critical functions.
Misc:
- Several pieces of core code have been restructured or rewritten.
- Too many changes to list here; see the git commit log for the full list of changes.
Deprecations:
- Configure no longer has the full range of tests.
- NMake is no longer actively supported and tested; it is now community supported.
- See the wiki for minimum build system versions and deprecations: https://github.com/zlib-ng/zlib-ng/wiki
Mark Adler [Fri, 17 Feb 2023 08:06:32 +0000 (00:06 -0800)]
Assure that inflatePrime() can't shift a 32-bit integer by 32 bits.
The inflate() functions never leave state->bits greater than 24, so
an inflatePrime() call could not cause this. The only way this
could have happened would be by using inflatePrime() to fill the
bit buffer with 32 bits, and then calling inflatePrime() a *second*
time asking to insert zero bits, for some reason. This commit
assures that a shift by 32 bits does not occur even in that case.
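The corner case described above can be illustrated with a minimal sketch. The `bit_buffer` type and `prime_bits()` function are hypothetical stand-ins for the inflate state and inflatePrime(), not the zlib code itself; the point is only that returning early for a zero-bit request means the shift amount can never reach 32.

```c
#include <stdint.h>

/* Hypothetical stand-in for the inflate bit buffer. */
typedef struct {
    uint32_t hold;   /* bit accumulator */
    unsigned bits;   /* number of valid bits in hold, 0..32 */
} bit_buffer;

/* Insert the low `bits` bits of `value` above the bits already held.
 * Returns 0 on success, -1 if the request does not fit. */
static int prime_bits(bit_buffer *b, int bits, int value) {
    if (bits == 0)
        return 0;   /* nothing to insert; avoids a shift by 32 when b->bits == 32 */
    if (bits < 0 || bits > 16 || b->bits + (unsigned)bits > 32)
        return -1;
    uint32_t v = (uint32_t)value & ((1u << bits) - 1u);
    b->hold += v << b->bits;    /* b->bits <= 31 here because bits >= 1 */
    b->bits += (unsigned)bits;
    return 0;
}
```

Filling the buffer with two 16-bit calls and then asking for zero bits leaves the buffer untouched instead of shifting a 32-bit value by 32, which is undefined behaviour in C.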
IBM zSystems: Fix calling deflateBound() before deflateInit()
Even though zlib officially forbids calling deflateBound() before
deflateInit(), Firefox does this anyway, and it happens to work [1],
but unfortunately not with DFLTCC [2], because the DFLTCC code assumes
that the deflate state is allocated, and segfaults when it isn't.
Bow down before Hyrum's Law and add deflateStateCheck() to
DEFLATE_BOUND_ADJUST_COMPLEN().
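As a rough illustration of the pattern (not the actual DFLTCC macro), the sketch below checks for a missing state before touching it. Every name here (`sketch_state`, `sketch_stream`, `sketch_state_missing`, `sketch_bound_adjust`, `accel_usable`) is hypothetical, and the choice to return the larger bound when no state exists is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not zlib-ng types. */
typedef struct { int accel_usable; } sketch_state;
typedef struct { sketch_state *state; } sketch_stream;

/* Non-zero when the stream has no allocated state yet. */
static int sketch_state_missing(const sketch_stream *strm) {
    return strm == NULL || strm->state == NULL;
}

/* Pick a bound without dereferencing a state that was never allocated. */
static unsigned long sketch_bound_adjust(const sketch_stream *strm,
                                         unsigned long generic_bound,
                                         unsigned long accel_bound) {
    unsigned long larger = accel_bound > generic_bound ? accel_bound : generic_bound;
    if (sketch_state_missing(strm))
        return larger;   /* no state to inspect: stay conservative */
    return strm->state->accel_usable ? accel_bound : generic_bound;
}
```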
Fix CMake check for posix_memalign and aligned_alloc
These two functions were being checked using check_function_exists. This
CMake macro does not check to see if the given function is declared in
any header as it declares its own function prototype and relies on
linking to determine function availability. This causes two issues.
Firstly, it will always succeed when the CMake toolchain file sets
CMAKE_TRY_COMPILE_TARGET_TYPE to STATIC_LIBRARY as no linking will take
place. See: https://gitlab.kitware.com/cmake/cmake/-/issues/18121
Secondly, it will not correctly detect macros or inline functions, or
whether the function is even declared in a header at all.
Switch to check_symbol_exists at CMake's recommendation, the logic of
which actually matches the same checks in the configure script.
lawadr [Thu, 30 Mar 2023 19:37:14 +0000 (20:37 +0100)]
Check for attribute aligned compiler support
Check for compiler support in CMake and the configure script. This
allows ALIGNED_ to be defined for more compilers so that more than
just Clang, GCC and MSVC can build the project.
The header locations are OS specific and not architecture specific. The
previous behaviour was to always include machine/endian.h for ARM and
AArch64 architectures on non-Windows and non-Linux OSs, causing build
failures if the OS uses other locations defined further down the
conditional block.
Georgiy Manuilov [Sun, 12 Mar 2023 13:45:53 +0000 (14:45 +0100)]
Enable using AVX512 intrinsics with GCC <9
Replace missing '_mm512_set_epi8' with
'_mm512_set_epi32' in test code for configuring;
Add fallback for '-mtune=cascadelake' flag used
when AVX512 is enabled.
Georgiy Manuilov [Sun, 12 Mar 2023 13:45:05 +0000 (14:45 +0100)]
Add fallback function for '_mm512_set_epi8' intrinsic
'_mm512_set_epi8' intrinsic is missing in GCC <9.
However, its usage can be easily eliminated in
favor of '_mm512_set_epi32' with no loss in
performance enabling older GCC to benefit from
AVX512-optimized codepaths.
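The replacement works because four adjacent 8-bit lanes occupy the same storage as one 32-bit lane. A minimal, portable sketch of that packing step is shown below; `pack_epi8x4` is a hypothetical helper written for illustration, not a function from zlib-ng, and it assumes the usual two's-complement conversion when casting to `int32_t`.

```c
#include <stdint.h>

/* Pack four 8-bit lanes into one 32-bit lane. e3 is the most significant
 * byte, matching the high-to-low argument order of the _mm512_set_*
 * intrinsics on little-endian x86. */
static int32_t pack_epi8x4(int8_t e3, int8_t e2, int8_t e1, int8_t e0) {
    uint32_t u = ((uint32_t)(uint8_t)e3 << 24) |
                 ((uint32_t)(uint8_t)e2 << 16) |
                 ((uint32_t)(uint8_t)e1 << 8)  |
                  (uint32_t)(uint8_t)e0;
    return (int32_t)u;
}
```

Sixteen such packed values, one per group of four byte arguments, could then be passed to '_mm512_set_epi32' in place of the 64 arguments of '_mm512_set_epi8'. When the byte values are compile-time constants the packing folds away, which is consistent with the "no loss in performance" remark above.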
lawadr [Mon, 20 Mar 2023 17:46:35 +0000 (17:46 +0000)]
Add member to cpu_features struct if empty
When WITH_OPTIM is off, the cpu_features struct is empty. This is not
allowed in standard C and causes a build failure with various compilers,
including MSVC.
This adds a dummy char member to the struct if it would otherwise be
empty.
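A minimal sketch of the idea follows. The `FEATURE_A` and `FEATURE_B` switches and the member names are hypothetical placeholders, not the real cpu_features layout.

```c
#include <stddef.h>

/* FEATURE_A and FEATURE_B are hypothetical build switches. */
struct features_sketch {
#ifdef FEATURE_A
    int has_a;
#endif
#ifdef FEATURE_B
    int has_b;
#endif
#if !defined(FEATURE_A) && !defined(FEATURE_B)
    char empty;   /* placeholder so the struct always has at least one member */
#endif
};
```

With neither switch defined, the struct still has one member, so it remains valid standard C.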
lawadr [Fri, 17 Mar 2023 16:35:13 +0000 (16:35 +0000)]
Fix regex for visibility attribute tests
The previous regex of `not supported` was very specific to a particular
compiler (Clang 3.4+). As Clang isn't the only compiler that throws a
warning (but otherwise succeeds) when a visibility isn't supported, make
the regex more generic to hit all such cases.
Testing on Compiler Explorer shows that looking for the string
`visibility` has a better hit rate. `attribute` is perhaps more
dangerously generic, and `hidden`/`internal` doesn't always show up in
warning messages when the visibility attribute itself isn't available.
Reduce the amount of different defines required for arch-specific optimizations.
Also removed a reference to a nonexistent adler32_sse41 in test/test_adler32.cc.
Combine some of the checks that were not identical.
Made longest_match and compare256 use the X86_NOCHECK_SSE2 override,
thus now those are also automatically enabled on x86_64.
Ilya Leoshkevich [Fri, 10 Feb 2023 12:41:07 +0000 (13:41 +0100)]
Fix warnings in benchmarks
1. Initialize len in benchmark_compare256.cc.
In function ‘typename std::enable_if<(std::is_trivially_copyable<_Tp>::value && (sizeof (Tp) <= sizeof (Tp*)))>::type benchmark::DoNotOptimize(Tp&) [with Tp = unsigned int]’,
inlined from ‘void compare256::Bench(benchmark::State&, compare256_func)’ at /zlib-ng/test/benchmarks/benchmark_compare256.cc:44:33,
inlined from ‘virtual void compare256_c_Benchmark::BenchmarkCase(benchmark::State&)’ at /zlib-ng/test/benchmarks/benchmark_compare256.cc:62:1:
/zlib-ng/_deps/benchmark-src/include/benchmark/benchmark.h:480:3: warning: ‘len’ may be used uninitialized [-Wmaybe-uninitialized]
480 | asm volatile("" : "+m,r"(value) : : "memory");
| ^~~
/zlib-ng/test/benchmarks/benchmark_compare256.cc: In member function ‘virtual void compare256_c_Benchmark::BenchmarkCase(benchmark::State&)’:
/zlib-ng/test/benchmarks/benchmark_compare256.cc:36:18: note: ‘len’ was declared here
36 | uint32_t len;
| ^~~
2. Make the loop counter unsigned in benchmark_slidehash.cc.
/zlib-ng/test/benchmarks/benchmark_slidehash.cc: In member function ‘virtual void slide_hash::SetUp(const benchmark::State&)’:
/zlib-ng/test/benchmarks/benchmark_slidehash.cc:29:31: warning: comparison of integer expressions of different signedness: ‘int32_t’ {aka ‘int’} and ‘unsigned int’ [-Wsign-compare]
29 | for (int32_t i = 0; i < HASH_SIZE; i++) {
Adjust thread counts for compiles and tests to avoid under-utilization and congestion.
The free Github Actions VMs have 2 cores, the dedicated s390x VM has 4 cores.
Disable zlib-ng internal tests when BUILD_SHARED_LIBS=ON.
When BUILD_SHARED_LIBS=ON some zlib-ng internal functions are not exported,
which are used by gtest_zlib and benchmark_zlib. Therefore, we must disable
those tests/projects.
Replace __builtin_ctz[ll] fallback functions with branchless implementations.
Added debug assert check for value = 0.
Added more details to the comment to avoid future confusion.
Added fallback logic for older MSVC versions, just in case.
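For illustration only, one portable way to count trailing zeros without data-dependent branches is sketched below. This is a generic bit-manipulation technique, not the fallback used in zlib-ng, and `ctz32_sketch` is a hypothetical name. As with the builtin, the result is only meaningful for a non-zero input.

```c
#include <stdint.h>

/* Count trailing zero bits of a non-zero 32-bit value.
 * Each comparison yields 0 or 1, so no branch on the data is needed. */
static int ctz32_sketch(uint32_t value) {
    uint32_t low = value & (0u - value);   /* isolate the lowest set bit */
    int count = 31;
    count -= ((low & 0x0000FFFFu) != 0) * 16;
    count -= ((low & 0x00FF00FFu) != 0) * 8;
    count -= ((low & 0x0F0F0F0Fu) != 0) * 4;
    count -= ((low & 0x33333333u) != 0) * 2;
    count -= ((low & 0x55555555u) != 0) * 1;
    return count;
}
```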
This should reduce the cost of indirection that occurs when calling functable
chunk copying functions inside inflate_fast. It should also allow the compiler
to optimize the inflate fast path for the specific architecture.
Mark Adler [Thu, 15 Dec 2022 17:07:13 +0000 (09:07 -0800)]
Fix bug in deflateBound() for level 0 and memLevel 9.
memLevel 9 would cause deflateBound() to assume the use of fixed
blocks, even if the compression level was 0, which forces stored
blocks. That could result in a bound less than the size of the
compressed data. Now level 0 always uses the stored blocks bound.
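To see why stored blocks need their own bound, note that each stored block in a raw deflate stream carries up to 5 bytes of framing (the block header padded to a byte boundary, then LEN and NLEN) and at most 65535 bytes of data. The sketch below is a simplified estimate under those assumptions, not the formula deflateBound() uses; `stored_bound_sketch` and its `block_size` parameter are hypothetical, and any zlib or gzip wrapper bytes are not counted.

```c
#include <stddef.h>

#define STORED_BLOCK_OVERHEAD 5u      /* padded header byte + LEN + NLEN */
#define STORED_BLOCK_MAX      65535u  /* largest LEN a stored block can carry */

/* Upper estimate for source_len bytes emitted as stored blocks of at most
 * block_size bytes each (raw deflate only, no wrapper). */
static size_t stored_bound_sketch(size_t source_len, size_t block_size) {
    if (block_size == 0 || block_size > STORED_BLOCK_MAX)
        block_size = STORED_BLOCK_MAX;
    size_t blocks = source_len / block_size + (source_len % block_size != 0);
    if (blocks == 0)
        blocks = 1;   /* empty input still needs one final block */
    return source_len + blocks * STORED_BLOCK_OVERHEAD;
}
```

Stored output is always larger than the input, so a bound derived for compressed (fixed) blocks is not guaranteed to cover it, which is the situation the fix addresses for level 0.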
Mika Lindqvist [Sat, 21 Jan 2023 23:16:11 +0000 (01:16 +0200)]
Allow disabling visibility attribute with configure
* Disable the visibility check for Cygwin, MinGW and MSYS, as the compiler will only issue a warning instead of an error for unsupported attributes.
Fix ABI checking...
* Ubuntu 22.04 uses a different format for ABI files, so old ones need to be removed
* Use a more recent zlib-ng commit to avoid issues with internal adler32 and crc32 functions