David Korth [Sat, 15 Jul 2023 14:13:20 +0000 (10:13 -0400)]
Handle ARM64EC as ARM64.
ARM64EC is a new ARM64 variant introduced in Windows 11 that uses an
ABI similar to AMD64, which allows for better interoperability with
emulated AMD64 applications. When enabled in MSVC, it defines _M_AMD64
and _M_ARM64EC, but not _M_ARM64, so we need to check for _M_ARM64EC.
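
A minimal sketch of the kind of preprocessor check this implies. SKETCH_ARCH and sketch_arch_name() are hypothetical names used only for illustration, not zlib-ng identifiers. The ARM64 branch is tested first because an ARM64EC build also defines _M_AMD64.

```c
/* Hypothetical illustration, not zlib-ng code: pick an architecture family.
 * ARM64EC is tested before AMD64 because MSVC defines _M_AMD64 for it too. */
#if defined(_M_ARM64) || defined(_M_ARM64EC) || defined(__aarch64__)
#  define SKETCH_ARCH "arm64"
#elif defined(_M_AMD64) || defined(__x86_64__)
#  define SKETCH_ARCH "x86_64"
#else
#  define SKETCH_ARCH "other"
#endif

/* Returns the family name chosen at compile time. */
const char *sketch_arch_name(void) {
    return SKETCH_ARCH;
}
```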
Define an empty __msan_unpoison() without Z_MEMORY_SANITIZER
Currently all the usages of __msan_unpoison() have to be guarded by
"#ifdef Z_MEMORY_SANITIZER". Simplify things by defining an empty
__msan_unpoison() when the code is compiled without MSan.
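
A rough sketch of how such a no-op definition might look. The exact macro body in zlib-ng may differ, and sketch_fill() is a hypothetical caller added only to show that the call site no longer needs a guard.

```c
#include <stddef.h>
#include <string.h>

#ifdef Z_MEMORY_SANITIZER
#  include <sanitizer/msan_interface.h>
#else
/* No-op stand-in (assumed form): does nothing, but still references its
 * arguments so that unused-variable warnings are avoided. */
#  define __msan_unpoison(a, size) do { (void)(a); (void)(size); } while (0)
#endif

/* Hypothetical caller: the call compiles with or without MSan. */
size_t sketch_fill(unsigned char *buf, size_t len) {
    memset(buf, 0xAB, len);
    __msan_unpoison(buf, len);
    return len;
}
```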
Use endianness-specific built-in function for gcc < 12 on PowerPC64
* Add support for cross-compiling using clang 13 and later for PowerPC64 little-endian and big-endian
* Fix detection for availability of Power9 intrinsics
Changes since 2.1.0-Beta1:
- Fix missing exported z_size_t type in zlib.h (zlib-compat mode).
- Fix two Coverity warnings
- Fix CMake GNUInstallDirs usage
- Configure/CMake improvements for compilers with early AVX512-VNNI support (GCC8.0 etc)
- Micro-optimization for AVX512 implementation of CRC32
- Optimized deflate_rle compression, also added related test and benchmark.
- Add testing of file_compress/file_uncompress in minigzip/minideflate
- Add emulated RISC-V to CI test workflow
- Add deflate_fast to switchlevels test
- Fix abicheck CI test not ignoring version string
- Fix MinGW CI test, broken by Github Actions VM image updates
Fix z_size_t definition:
- Zlib Compat: Move definition of z_size_t to zconf.h, so it is exported to applications.
Always defined as size_t to follow zlib 1.2.13 behavior with STDC compilers.
- Zlib-NG: Keeps internal definition of z_size_t in zbuild.h
This release contains two years of development and improvements to zlib-ng,
as well as fixes and changes inherited from zlib.
The 2.1.x version series has new targeted minimum buildsystem versions, as detailed on the Wiki https://github.com/zlib-ng/zlib-ng/wiki
Buildsystem:
- Many improvements to the CMake scripts.
- Improved support for detecting memory alignment functions.
- Improved support for unaligned access by letting the compiler promote code to unaligned if supported by the CPU.
- Remove x86 cpu feature detection for TZCNT, safely fallback to BSF.
- Enable using AVX512 intrinsics with GCC <9.
Optimizations and Enhancements:
- Decompression is a lot faster (56% faster measured on AVX2-capable x86-64)
- Compression is improved for Level 9, at the cost of a little performance.
- Compression is improved for Level 3, by switching from deflate_fast to deflate_medium.
- Levels 3 and 4 have been reconfigured to provide a better gradual tradeoff for speed/compression between levels 2 and 5.
- Deflate_quick (Level 1) has been improved to default to a bigger windowsize and support changing the window size like the other levels.
New instruction set optimizations:
- Adler32 implementation using AVX512, AVX512-VNNI, VMX.
- CRC32-B implementation using VPCLMULQDQ & IBM-Z.
- Slide hash implementation using VMX.
- Compare256 implementations using SSE2, Neon, & POWER9.
- Inflate chunk copying using SSSE3 & VSX.
Compatibility and Porting:
- CRC-32 computation changes from madler/zlib. zlib-ng/zlib-ng#a6155234
- Compatible and up-to-date with zlib 1.2.13.
- Removed the usage of macros in zlib-ng.h, making life easier for languages that want to call the C functions without having the C preprocessor (Python, etc).
Improved support for more environments:
- Apple M1
- vcpkg
- Emscripten
Testing:
- Tests have been converted to use GTest. Many new tests have also been added.
- Gbench support has been added to easily benchmark changes to performance-critical functions.
Misc:
- Several pieces of core code have been restructured or rewritten.
- Too many changes to list here; see the git commit log for the full list of changes.
Deprecations:
- Configure no longer has the full range of tests.
- NMake is no longer actively supported and tested; it is now community-supported.
- See the wiki for minimum build system versions and deprecations https://github.com/zlib-ng/zlib-ng/wiki
Mark Adler [Fri, 17 Feb 2023 08:06:32 +0000 (00:06 -0800)]
Assure that inflatePrime() can't shift a 32-bit integer by 32 bits.
The inflate() functions never leave state->bits greater than 24, so
an inflatePrime() call could not cause this. The only way this
could have happened would be by using inflatePrime() to fill the
bit buffer with 32 bits, and then calling inflatePrime() a *second*
time asking to insert zero bits, for some reason. This commit
assures that a shift by 32 bits does not occur even in that case.
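
The scenario above can be sketched with a simplified stand-in for the bit buffer. sketch_bitbuf and sketch_prime() are hypothetical, and the early return for zero bits is one assumed way to avoid the shift, not necessarily the exact change made. Unlike the real inflatePrime(), negative bit counts are simply rejected here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the inflate bit buffer. */
typedef struct {
    uint32_t hold;  /* bit accumulator */
    unsigned bits;  /* number of valid bits in hold, 0..32 */
} sketch_bitbuf;

/* Inserts the low `bits` bits of `value` above the bits already held.
 * Returns 0 on success, -1 if the request does not fit. */
int sketch_prime(sketch_bitbuf *b, int bits, int value) {
    if (bits == 0)
        return 0;   /* nothing to insert; avoids a shift by 32 when b->bits == 32 */
    if (bits < 0 || bits > 16 || (unsigned)bits + b->bits > 32)
        return -1;
    /* From here on b->bits <= 31, so the shift below is well defined. */
    value &= (1L << bits) - 1;
    b->hold += (uint32_t)value << b->bits;
    b->bits += (unsigned)bits;
    return 0;
}
```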
IBM zSystems: Fix calling deflateBound() before deflateInit()
Even though zlib officially forbids calling deflateBound() before
deflateInit(), Firefox does this anyway, and it happens to work [1],
but unfortunately not with DFLTCC [2], because the DFLTCC code assumes
that the deflate state is allocated, and segfaults when it isn't.
Bow down before Hyrum's Law and add deflateStateCheck() to
DEFLATE_BOUND_ADJUST_COMPLEN().
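
A simplified sketch of the idea, using hypothetical types and names rather than the real z_stream, deflateStateCheck() or DFLTCC code: the state check runs first, so the accelerator-specific field is never read through a missing state.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the stream and its deflate state. */
typedef struct { int can_accelerate; } sketch_state;
typedef struct { sketch_state *state; } sketch_stream;

/* Returns nonzero when the stream has no usable state. */
static int sketch_state_check(const sketch_stream *strm) {
    return strm == NULL || strm->state == NULL;
}

/* Returns 1 when the conservative bound should be used. The || operator
 * short-circuits, so state is only dereferenced after the check passed. */
int sketch_use_conservative_bound(const sketch_stream *strm) {
    return sketch_state_check(strm) || strm->state->can_accelerate;
}
```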
Fix CMake check for posix_memalign and aligned_alloc
These two functions were being checked using check_function_exists. This
CMake macro does not check to see if the given function is declared in
any header as it declares its own function prototype and relies on
linking to determine function availability. This causes two issues.
Firstly, it will always succeed when the CMake toolchain file sets
CMAKE_TRY_COMPILE_TARGET_TYPE to STATIC_LIBRARY as no linking will take
place. See: https://gitlab.kitware.com/cmake/cmake/-/issues/18121
Secondly, it will not correctly detect macros or inline functions, or
whether the function is even declared in a header at all.
Switch to check_symbol_exists at CMake's recommendation, the logic of
which actually matches the same checks in the configure script.
lawadr [Thu, 30 Mar 2023 19:37:14 +0000 (20:37 +0100)]
Check for attribute aligned compiler support
Check for compiler support in CMake and the configure script. This
allows ALIGNED_ to be defined for more compilers so that more than
just Clang, GCC and MSVC can build the project.
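
As an illustration, a build-system result could feed a macro definition along these lines. HAVE_ATTRIBUTE_ALIGNED is assumed here to be the name of the define produced by the check, and sketch_table and sketch_is_aligned() are hypothetical.

```c
#include <stdint.h>

/* HAVE_ATTRIBUTE_ALIGNED is assumed to come from the CMake/configure check. */
#if defined(HAVE_ATTRIBUTE_ALIGNED) || defined(__GNUC__)
#  define ALIGNED_(x) __attribute__((aligned(x)))
#elif defined(_MSC_VER)
#  define ALIGNED_(x) __declspec(align(x))
#else
#  error "no known way to request alignment with this compiler"
#endif

/* Hypothetical table that must start on a 16-byte boundary. */
static const uint8_t ALIGNED_(16) sketch_table[16] = {0};

/* Returns 1 if the table really is 16-byte aligned. */
int sketch_is_aligned(void) {
    return ((uintptr_t)sketch_table % 16) == 0;
}
```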
The header locations are OS specific and not architecture specific. The
previous behaviour was to always include machine/endian.h for ARM and
AArch64 architectures on non-Windows and non-Linux OSs, causing build
failures if the OS uses other locations defined further down the
conditional block.
Georgiy Manuilov [Sun, 12 Mar 2023 13:45:53 +0000 (14:45 +0100)]
Enable using AVX512 intrinsics with GCC <9
Replace missing '_mm512_set_epi8' with
'_mm512_set_epi32' in test code for configuring;
Add fallback for '-mtune=cascadelake' flag used
when AVX512 is enabled.
Georgiy Manuilov [Sun, 12 Mar 2023 13:45:05 +0000 (14:45 +0100)]
Add fallback function for '_mm512_set_epi8' intrinsic
'_mm512_set_epi8' intrinsic is missing in GCC <9.
However, its usage can be easily eliminated in
favor of '_mm512_set_epi32' with no loss in
performance enabling older GCC to benefit from
AVX512-optimized codepaths.
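
The substitution relies on packing four 8-bit lanes into each 32-bit argument. The sketch below shows only that packing step in portable C, without intrinsics; sketch_pack4() is a hypothetical helper, not the fallback added by this commit.

```c
#include <stdint.h>

/* Hypothetical helper: packs four bytes into one 32-bit lane, with b0 in
 * the lowest byte, matching the little-endian lane layout on x86. */
uint32_t sketch_pack4(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0) {
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8)  |  (uint32_t)b0;
}
```

Sixteen such packed values, passed from the highest lane to the lowest, would then describe the same 64 bytes as the 64 arguments of '_mm512_set_epi8'.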
lawadr [Mon, 20 Mar 2023 17:46:35 +0000 (17:46 +0000)]
Add member to cpu_features struct if empty
When WITH_OPTIM is off, the cpu_features struct is empty. This is not
allowed in standard C and causes a build failure with various compilers,
including MSVC.
This adds a dummy char member to the struct if it would otherwise be
empty.
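
A small sketch of the pattern, with hypothetical names (SKETCH_WITH_OPTIM, sketch_cpu_features) standing in for the real option and struct:

```c
#include <stddef.h>

/* Hypothetical struct: feature members exist only when optimizations are on. */
struct sketch_cpu_features {
#if defined(SKETCH_WITH_OPTIM)
    int has_feature_x;
#else
    char empty;  /* dummy member: ISO C does not allow a struct with no members */
#endif
};

/* Returns the size of the struct, which is at least 1 in either configuration. */
size_t sketch_features_size(void) {
    return sizeof(struct sketch_cpu_features);
}
```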
lawadr [Fri, 17 Mar 2023 16:35:13 +0000 (16:35 +0000)]
Fix regex for visibility attribute tests
The previous regex of `not supported` was very specific to a particular
compiler (Clang 3.4+). As Clang isn't the only compiler that throws a
warning (but otherwise succeeds) when a visibility isn't supported, make
the regex more generic to hit all such cases.
Testing on Compiler Explorer shows that looking for the string
`visibility` has a better hit rate. `attribute` is perhaps more
dangerously generic, and `hidden`/`internal` doesn't always show up in
warning messages when the visibility attribute itself isn't available.
Reduce the amount of different defines required for arch-specific optimizations.
Also removed a reference to a nonexistent adler32_sse41 in test/test_adler32.cc.
Combine some of the checks that were not identical.
Made longest_match and compare256 use the X86_NOCHECK_SSE2 override,
thus now those are also automatically enabled on x86_64.
Ilya Leoshkevich [Fri, 10 Feb 2023 12:41:07 +0000 (13:41 +0100)]
Fix warnings in benchmarks
1. Initialize len in benchmark_compare256.cc.
In function ‘typename std::enable_if<(std::is_trivially_copyable<_Tp>::value && (sizeof (Tp) <= sizeof (Tp*)))>::type benchmark::DoNotOptimize(Tp&) [with Tp = unsigned int]’,
inlined from ‘void compare256::Bench(benchmark::State&, compare256_func)’ at /zlib-ng/test/benchmarks/benchmark_compare256.cc:44:33,
inlined from ‘virtual void compare256_c_Benchmark::BenchmarkCase(benchmark::State&)’ at /zlib-ng/test/benchmarks/benchmark_compare256.cc:62:1:
/zlib-ng/_deps/benchmark-src/include/benchmark/benchmark.h:480:3: warning: ‘len’ may be used uninitialized [-Wmaybe-uninitialized]
480 | asm volatile("" : "+m,r"(value) : : "memory");
| ^~~
/zlib-ng/test/benchmarks/benchmark_compare256.cc: In member function ‘virtual void compare256_c_Benchmark::BenchmarkCase(benchmark::State&)’:
/zlib-ng/test/benchmarks/benchmark_compare256.cc:36:18: note: ‘len’ was declared here
36 | uint32_t len;
| ^~~
2. Make the loop counter unsigned in benchmark_slidehash.cc.
/zlib-ng/test/benchmarks/benchmark_slidehash.cc: In member function ‘virtual void slide_hash::SetUp(const benchmark::State&)’:
/zlib-ng/test/benchmarks/benchmark_slidehash.cc:29:31: warning: comparison of integer expressions of different signedness: ‘int32_t’ {aka ‘int’} and ‘unsigned int’ [-Wsign-compare]
29 | for (int32_t i = 0; i < HASH_SIZE; i++) {
Adjust thread counts for compiles and tests to avoid under-utilization and congestion.
The free Github Actions VMs have 2 cores, the dedicated s390x VM has 4 cores.