Clean up LIKELY/UNLIKELY definitions, making them upper-case to improve visibility.
Add LIKELY_NULL hint.
Add PREFETCH_L1, PREFETCH_L2 and PREFETCH_RW for GCC, Clang, ICC and MSVC.
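A minimal sketch of what such hint macros can look like per compiler. This is not the zlib-ng header: the locality values (3 for L1, 2 for L2) and the MSVC mapping onto `_mm_prefetch` hints are assumptions, and `sum_with_hints` is a hypothetical example caller.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
/* GCC, Clang, and ICC on Linux (which defines __GNUC__) */
#  define LIKELY(x)          __builtin_expect(!!(x), 1)
#  define UNLIKELY(x)        __builtin_expect(!!(x), 0)
/* True when x is non-NULL, while hinting that x is usually NULL. */
#  define LIKELY_NULL(x)     __builtin_expect((x) != 0, 0)
#  define PREFETCH_L1(addr)  __builtin_prefetch((addr), 0, 3)  /* read, high locality */
#  define PREFETCH_L2(addr)  __builtin_prefetch((addr), 0, 2)  /* read, lower locality */
#  define PREFETCH_RW(addr)  __builtin_prefetch((addr), 1, 2)  /* prefetch for writing */
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
/* MSVC (and ICC on Windows) on x86: no branch hints, SSE prefetch only. */
#  include <xmmintrin.h>
#  define LIKELY(x)          (x)
#  define UNLIKELY(x)        (x)
#  define LIKELY_NULL(x)     ((x) != 0)
#  define PREFETCH_L1(addr)  _mm_prefetch((const char *)(addr), _MM_HINT_T0)
#  define PREFETCH_L2(addr)  _mm_prefetch((const char *)(addr), _MM_HINT_T1)
#  define PREFETCH_RW(addr)  _mm_prefetch((const char *)(addr), _MM_HINT_T1) /* no write hint here */
#else
/* Unknown compiler: hints compile away. */
#  define LIKELY(x)          (x)
#  define UNLIKELY(x)        (x)
#  define LIKELY_NULL(x)     ((x) != 0)
#  define PREFETCH_L1(addr)  ((void)(addr))
#  define PREFETCH_L2(addr)  ((void)(addr))
#  define PREFETCH_RW(addr)  ((void)(addr))
#endif

/* Hypothetical caller: sums an array, prefetching 16 elements ahead. */
static long sum_with_hints(const int *a, size_t n) {
    long s = 0;
    if (UNLIKELY(a == NULL))
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (LIKELY(i + 16 < n))          /* never form a pointer past a + n */
            PREFETCH_L1(&a[i + 16]);
        s += a[i];
    }
    return s;
}
```

The hints only steer code layout and cache behavior; removing them must never change results, which is why the fallback branch can define them as plain expressions.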
Added github actions yaml for cmake and configure.
Added cmake toolchain files for qemu powerpc, sparc64, and s390x.
Turn off shared libs and use static builds for qemu.
Fixed options parsing overwriting maintainer mode when running in CI.
Set CI environment variable to enable fuzzers by default.
Added support for code coverage for all tests.
Reduce indirections used by send_bits and send_code.
Also simplify the debug tracing into the define instead
of using a separate static function.
x86_64 shows a small performance improvement.
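A simplified sketch of `send_bits`/`send_code` written as macros, with the debug trace folded into the define (switched on by a hypothetical `SKETCH_DEBUG` macro). The `bitwriter` struct, the 64-bit accumulator, and byte-wise flushing are assumptions for illustration and do not mirror zlib-ng's actual bit buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t  out[64];   /* pending output bytes */
    size_t   pending;   /* number of bytes in out */
    uint64_t bi_buf;    /* bit accumulator, filled LSB-first as in deflate */
    int      bi_valid;  /* valid bits in bi_buf (always < 8 between calls) */
} bitwriter;

/* Tracing lives in the define itself, not in a separate static function. */
#ifdef SKETCH_DEBUG
#  define SEND_TRACE(v, l) fprintf(stderr, "\n[%d,%d]", (l), (int)(v))
#else
#  define SEND_TRACE(v, l) ((void)0)
#endif

/* Append the low `length` (0..32) bits of `value`, flushing whole bytes. */
#define SEND_BITS(s, value, length) do {                                   \
    uint32_t v_ = (uint32_t)(value);                                        \
    int len_ = (length);                                                    \
    SEND_TRACE(v_, len_);                                                   \
    (s)->bi_buf |= ((uint64_t)v_ & ((UINT64_C(1) << len_) - 1)) << (s)->bi_valid; \
    (s)->bi_valid += len_;                                                  \
    while ((s)->bi_valid >= 8) {                                            \
        assert((s)->pending < sizeof((s)->out));                            \
        (s)->out[(s)->pending++] = (uint8_t)(s)->bi_buf;                    \
        (s)->bi_buf >>= 8;                                                  \
        (s)->bi_valid -= 8;                                                 \
    }                                                                       \
} while (0)

typedef struct { uint16_t code; uint16_t len; } code_sketch;

/* Reads the tree entry directly instead of going through helper calls. */
#define SEND_CODE(s, c, tree) SEND_BITS((s), (tree)[(c)].code, (tree)[(c)].len)
```

For example, sending the 3-bit value 5 followed by the 5-bit value 24 fills exactly one byte, 0xC5, because deflate packs bits starting from the least significant end.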
travis.yml changes:
- Select minimal vm image instead of generic ('C'), reducing size and boot time.
- Windows needs 'C' image to not hang on booting
- Update from Xenial to Bionic for linux tests.
- ppc64 images don't have Bionic, set to Xenial
- ppc64 is having misc failures with Xenial tools, so use GCC-9 and Clang-6
- ppc64 cannot run MSAN
- Add another test for more complete testing across architectures.
- Enable fuzzers, msan and/or other sanitizers for most of the tests.
- Disable ASAN leak detection on aarch64, since it crashes under qemu
- Reorder tests to be better grouped, and run the windows/macOS tests last.
- Enable verbose messages from sanitizers
CMakeLists.txt changes:
- Enable warnings by default.
- Add MAINTAINER setting to cmake
- Enables extra warnings.
- Enables fuzzers.
- Add CI detection to cmake, currently auto-enables MAINTAINER mode.
- Add detection of several more sanitizer features.
- Test sanitizer features together, not just alone.
This also helps with their inter-dependencies.
Unify detection of ARM getauxval code availability.
We don't want to compile arch-specific code when WITH_OPTIM is not set,
and the current checks don't take that into account.
travis.yml changes:
- Add shorthand variables for some of the long parameters often used
- Enable --warn in a couple configure tests that did not have it enabled.
- Make travis print out CMakeError.log or configure.log in after_failure.
- Reorder some cmake parameters to improve consistency
- Disable ccache. Downloading and uploading the cache archive is quite slow,
especially if travis is having network-connectivity issues.
Also, ccache caches gcno (coverage) files, making the coverage data wrong
because they are shared across builds, branches and PRs.
CMakeLists.txt changes:
- Enable -Wall by default in cmake.
Make travis retry its operations, to avoid many of the failed builds
caused by silly things like git failing to look up github.com
or apt failing to install deps due to a network timeout, etc.
Make curl retry the download of the codecov bash script, and
make codecov retry the uploads, since the frequent network errors
on travis are causing erratic coverage data.
Jim Kukunas [Thu, 8 Dec 2016 04:13:26 +0000 (20:13 -0800)]
crc_folding: use temp buffer for partial stores
With deflate, the only destination for crc folding was the window, which
was guaranteed to have an extra 15B of padding (because we allocated it).
This padding allowed us to handle the partial store case (len < 16)
with a regular SSE store.
For inflate, this is no longer the case. For crc folding to be
efficient, it needs to operate on large chunks of the data each call.
For inflate, this means copying the decompressed data out to the
user-provided output buffer (more so with our reorganized window). Since
it's user-provided, we don't have the padding guarantee and therefore
need to fall back to a slower method of handling partial length stores.
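A minimal sketch of the temp-buffer approach, assuming the folded data is held in a 16-byte vector: the full-width store goes into a stack buffer, and only `len` bytes are copied to the unpadded destination. `store_partial16` is a hypothetical name; on non-SSE2 targets a `memcpy` stands in for the vector store so the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

/* Write only the first `len` bytes of a 16-byte vector to dst. */
static void store_partial16(uint8_t *dst, const uint8_t vec[16], size_t len) {
    uint8_t tmp[16];
    assert(len <= 16);
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)vec);
    _mm_storeu_si128((__m128i *)tmp, v);   /* full-width store, but into tmp */
#else
    memcpy(tmp, vec, 16);                   /* portable stand-in for the vector store */
#endif
    memcpy(dst, tmp, len);                  /* only len bytes reach the user buffer */
}
```

The extra copy costs a little, which is why it is only worth taking for the partial (len < 16) tail rather than for every store.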
Nicolas Trangez [Thu, 3 Mar 2016 17:17:43 +0000 (18:17 +0100)]
crc_folding: Fix potential out-of-bounds access
In some (very rare) scenarios, the SIMD code in the `crc_folding` module
can perform out-of-bounds reads or writes, which could lead to GPF
crashes.
Here's the deal: when the `crc_fold_copy` function is called with a
non-zero `len` argument of less than 16, `src` is read through
`_mm_loadu_si128`, which always reads 16 bytes. If the `src` pointer
points to a location which contains `len` bytes, but any of the
`16 - len` out-of-bounds bytes falls in unmapped memory, this operation will
trigger a GPF.
The same goes for the `dst` pointer when written to through
`_mm_storeu_si128`.
With this patch applied, the crash no longer occurs.
We first discovered this issue through Valgrind reporting an
out-of-bounds access while running a unit-test for some code derived
from `crc_fold_copy`. In general, the out-of-bounds read is not an issue
because reads only occur in sections which are definitely mapped
(assuming page size is a multiple of 16), and garbage bytes are ignored.
While giving this some more thought, we realized that small `len` values
combined with `src` or `dst` pointers at a very specific place in the
address space can lead to GPFs.
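The "very specific place" can be stated precisely: a 16-byte load at `addr` touches `[addr, addr + 16)`, but only the first `len` bytes are known to be mapped, so it can fault only when byte `addr + 15` lies on a later page than byte `addr + len - 1`. The hypothetical helper below encodes that condition, treating addresses as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 16-byte load at addr may touch a page that the len valid
 * bytes do not cover (so it could hit unmapped memory), else 0. */
static int load16_may_fault(uintptr_t addr, size_t len, uintptr_t page_size) {
    assert(len >= 1 && len <= 16 && page_size > 0);
    uintptr_t last_valid = addr + len - 1;  /* last byte known to be mapped */
    uintptr_t last_read  = addr + 15;       /* last byte the load touches */
    return (last_valid / page_size) != (last_read / page_size);
}
```

With 4096-byte pages, a 4-byte buffer at offset 4090 is dangerous (the load reaches 4105, on the next page), while any 16-byte-aligned address never is, which matches the "page size is a multiple of 16" observation above.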
- Minor tweaks to merge request by Jim Kukunas <james.t.kukunas@linux.intel.com>
- removed C11-isms
- use unaligned load
- better integrated w/ zlib (use zalign)
- removed full example code from commit msg
Add slide_hash to functable, and enable the sse2-optimized version.
Add necessary code to cmake and configure.
Fix slide_hash_sse2 to compile with zlib-ng.
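A sketch of the scalar slide step that an SSE2 version vectorizes, plus a hypothetical function-table entry through which callers reach it. The simplified signature (arrays and sizes instead of a `deflate_state`) and the `functable_sketch` struct are assumptions, not zlib-ng's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Pos;

/* Shift every stored position down by wsize; positions that fall out of
 * the window become 0 (NIL). */
static void slide_hash_c(Pos *head, size_t head_len, Pos *prev, size_t prev_len,
                         unsigned wsize) {
    assert(wsize > 0);
    for (size_t i = 0; i < head_len; i++)
        head[i] = (Pos)(head[i] >= wsize ? head[i] - wsize : 0);
    for (size_t i = 0; i < prev_len; i++)
        prev[i] = (Pos)(prev[i] >= wsize ? prev[i] - wsize : 0);
}

/* Hypothetical function table: filled once with the best implementation for
 * the CPU (an SSE2 variant would be assigned here when available). */
struct functable_sketch {
    void (*slide_hash)(Pos *, size_t, Pos *, size_t, unsigned);
};

static struct functable_sketch functable = { slide_hash_c };
```

Because every caller goes through `functable.slide_hash`, enabling the SSE2 version only changes the table initialization, not the call sites.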
Changes to support compilation with MSVC ARM & ARM64 (#386)
* Merge aarch64 and arm cmake sections.
* Updated MSVC compiler support for ARM and ARM64.
* Moved detection for -mfpu=neon to where the flag is set to simplify add_intrinsics_option.
* Only add ${ACLEFLAG} on aarch64 if not WITH_NEON.
* Rename arch/x86/ctzl.h to fallback_builtins.h.
Clean up travis config.
Remove a duplicate config, replace with a missing variant.
Add a little more variety to some of the tests.
Add ctest to x86_64 runs.
Fixed optimizations not being used when compiler is msvc. (#376)
I mentioned this issue in #370. Optimization code such as crc_folding.c, deflate_quick_sse.c, fill_window_sse.c, and insert_string_sse.c was not being compiled when the compiler was MSVC, because the checks for the intrinsics were not being run and the HAVE_[TARGET]_INTRIN variables weren't being set. I could have simply set the HAVE_[TARGET]_INTRIN variables to ON manually for MSVC, but it is better to have one path for all compilers (one that compiles and checks some test code to decide). I have just added MSVC code where necessary in the checks.
* Rename HAVE_SSE42_INTRIN to HAVE_SSE42CRC_INLINE_ASM.
* Added msvc inline asm support to insert_string_sse.c
* Added cmake build instructions, build options, install instructions, and repository contents to README.md.
* Moved INDEX file content to README.md files.
* Added configure instructions and options.
* Added CodeFactor to build integration.
Ilya Leoshkevich [Fri, 24 May 2019 09:18:33 +0000 (11:18 +0200)]
Add "reproducible" deflate parameter
IBM Z DEFLATE CONVERSION CALL may produce different (but valid)
compressed data for the same uncompressed data. This behavior might be
unacceptable for certain use cases (e.g. reproducible builds). This
patch introduces Z_DEFLATE_REPRODUCIBLE parameter, which can be used to
indicate that this is the case, and turn off IBM Z DEFLATE CONVERSION
CALL.
Ilya Leoshkevich [Thu, 23 May 2019 11:57:43 +0000 (13:57 +0200)]
Add two new public zng_deflate{Set,Get}Params() functions
These functions allow zlib-ng callers to modify and query the
compression parameters in a future-proof way. When the caller requests a
parameter that is not supported by the current zlib-ng version, this
situation is detected and reported to the caller. The caller may modify
or query multiple parameters at once. Currently only "level" and
"strategy" parameters are supported. It is planned to add a
"reproducible" parameter, which would affect whether IBM Z DEFLATE
CONVERSION CALL is used.
Passing an enum and a void * buffer was chosen over passing strings, because
of simplicity for the caller. If strings were used, C callers would have
to call snprintf() and strtoul() for setting and getting integer-valued
parameters respectively, which is quite tedious.
Bulk updates were chosen over updating individual parameters separately,
because it might make sense to apply some parameters atomically, e.g.
level and strategy.
The new functions are defined only for zlib-ng, but not compat zlib.
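The design described above (enum-selected parameters, `void *` value buffers, per-entry status, bulk updates) can be sketched as follows. All names (`demo_param`, `demo_set_params`, and so on) are hypothetical and are not the zlib-ng declarations; the all-or-nothing policy is one reasonable reading of "apply some parameters atomically".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { P_LEVEL = 0, P_STRATEGY = 1, P_REPRODUCIBLE = 2 } demo_param;

typedef struct {
    demo_param param;   /* which parameter */
    void *buf;          /* value, interpreted according to param */
    size_t size;        /* size of *buf, lets the callee validate it */
    int status;         /* out: 0 = ok, -1 = unsupported by this version */
} demo_param_value;

typedef struct { int level, strategy, reproducible; } demo_state;

static int *demo_field(demo_state *s, demo_param p) {
    switch (p) {
    case P_LEVEL:        return &s->level;
    case P_STRATEGY:     return &s->strategy;
    case P_REPRODUCIBLE: return &s->reproducible;
    default:             return NULL;   /* parameter from a newer API version */
    }
}

/* Validate the whole batch first, then apply it: either every value is
 * applied or none is. Returns 0 on success, -1 if any entry was rejected. */
static int demo_set_params(demo_state *s, demo_param_value *v, size_t count) {
    int rc = 0;
    assert(s != NULL && (v != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        v[i].status = (demo_field(s, v[i].param) && v[i].size == sizeof(int)) ? 0 : -1;
        if (v[i].status != 0)
            rc = -1;
    }
    if (rc != 0)
        return rc;
    for (size_t i = 0; i < count; i++)
        memcpy(demo_field(s, v[i].param), v[i].buf, sizeof(int));
    return 0;
}
```

A caller built against a newer header that passes an unknown parameter gets a per-entry `status` of -1 instead of silent misbehavior, and level and strategy passed in one batch change together.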
Change CMakeLists.txt so that if WITH_GZFILEOP is OFF, gz* sources are still compiled against the tests which need them.
Remove duplicate gz functions from test code.
Always compile with gz functions when zlib tests enabled in makefile.
QEMU maintainers have found an issue related to incorrect usage of
STFLE instruction [1], which is used to get features supported by the
machine. There are three potential problems with the current usage:
- R0 must contain the number of requested doublewords *minus one*. The
existing code lacks the "minus one" part.
- Older machines may not fill all the doublewords - this is fixed by
calling `memset`.
- STFLE updates R0, but we don't tell the compiler about this - this is
fixed by using a `+` constraint.
- Not really a problem, but it's enough to load 8 bits into R0, so its
type was changed to `uint8_t`. Also, STFLE only writes to `facilities`
variable, therefore memory clobber is unnecessary.
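A sketch of the corrected call, assuming a GCC-compatible compiler; the buffer size `FACILITY_DWORDS` and both function names are hypothetical, and the inline asm is compiled only on s390x so the rest runs anywhere. It uses a read-write `+Q` memory operand so the pre-zeroed doublewords cannot be dropped as dead stores; the actual commit's constraints may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FACILITY_DWORDS 4   /* hypothetical: doublewords requested */

static void query_facilities(uint64_t fac[FACILITY_DWORDS]) {
    struct { uint64_t d[FACILITY_DWORDS]; } buf;
    /* Older machines may store fewer doublewords than requested. */
    memset(&buf, 0, sizeof buf);
#if defined(__s390x__)
    /* R0 = number of requested doublewords minus one; 8 bits suffice. */
    register uint8_t r0 __asm__("r0") = FACILITY_DWORDS - 1;
    /* "+r": STFLE updates R0.  "+Q": buf is the only memory written, so no
     * "memory" clobber is needed.  STFLE also sets the condition code. */
    __asm__ volatile("stfle %[fac]" : [fac] "+Q"(buf), [r0] "+r"(r0) : : "cc");
#endif
    memcpy(fac, buf.d, sizeof buf.d);
}

/* Facility bits are numbered from the most significant bit of doubleword 0. */
static int facility_bit(const uint64_t fac[FACILITY_DWORDS], unsigned nr) {
    assert(nr < 64u * FACILITY_DWORDS);
    return (int)((fac[nr / 64] >> (63 - nr % 64)) & 1);
}
```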
Ilya Leoshkevich [Fri, 24 May 2019 10:36:46 +0000 (12:36 +0200)]
IBM Z DFLTCC: minor documentation fixes
* Replace "DEFLATE COMPRESSION CALL" term with "DEFLATE CONVERSION
CALL", since the latter is the proper name of the new IBM Z
instruction.
* Mention CMake options in README.md.
* Replace "new macro" term with just "macro" in README.md (calling
macros "new" is not correct in this context).
* Replace "zlib" term with "zlib-ng", except in probe point names - it's
better to keep those common between gzip, zlib and zlib-ng.
This pull request adds Windows OS to travis matrix using cmake.
I had to rename the environment variables because I believed there might have been a conflict with the previous naming. I had to break the script section into multiple lines because Windows didn't like the commands chained together with &&.
Fixed compiler warnings on Windows in release mode (#349)
This pull request attempts to fix some compiler warnings on Windows when compiled in Release mode.
```
"zlib-ng\ALL_BUILD.vcxproj" (default target) (1) ->
"zlib-ng\zlibstatic.vcxproj" (default target) (6) ->
zlib-ng\deflate.c(1626): warning C4244: '=': conversion from 'uint16_t' to 'unsigned char', possible loss of data [zlib-ng\zlibstatic.vcxproj]
zlib-ng\deflate_fast.c(61): warning C4244: '=': conversion from 'uint16_t' to 'unsigned char', possible loss of data [zlib-ng\zlibstatic.vcxproj]
zlib-ng\deflate_slow.c(89): warning C4244: '=': conversion from 'uint16_t' to 'unsigned char', possible loss of data [zlib-ng\zlibstatic.vcxproj]
```
This PR only adds cmake toolchain files for ARM. The NAME/COMMAND test signature is used so that CMAKE_CROSSCOMPILING_EMULATOR is applied. The message() call prevents a warning about an unused variable when specifying CMAKE_TOOLCHAIN_FILE.
Add support for IBM Z hardware-accelerated deflate
Future versions of IBM Z mainframes will provide the DFLTCC instruction,
which implements the deflate algorithm in hardware, with compression and
decompression performance estimated to be orders of magnitude faster
than the current zlib-ng, and a ratio comparable with that of level 1.
This patch adds DFLTCC support to zlib-ng. In order to enable it, the
following build commands should be used:
$ ./configure --with-dfltcc-deflate --with-dfltcc-inflate
$ make
When built like this, zlib-ng would compress in hardware on level 1,
and in software on all other levels. Decompression will always happen
in hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.
to make it used by default) one could add -DDFLTCC_LEVEL_MASK=0x7e to
CFLAGS when building zlib-ng.
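Assuming bit N of the mask selects hardware compression for level N (consistent with 0x7e meaning levels 1-6), the mask is simply bits 1 through 6 set: (1 << 7) - (1 << 1) = 128 - 2 = 126 = 0x7e. Under the same assumption, the level-1-only behavior corresponds to 0x2. The helper below is hypothetical and only illustrates the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: bit N of mask selects DFLTCC for compression level N. */
static int level_uses_hardware(unsigned mask, int level) {
    assert(level >= 0 && level <= 9);
    return (int)((mask >> level) & 1u);
}
```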
Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired.
DFLTCC does not support every single zlib-ng feature, in particular:
* inflate(Z_BLOCK) and inflate(Z_TREES)
* inflateMark()
* inflatePrime()
* deflateParams() after the first deflate() call
When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.
This patch tries to add DFLTCC support in a least intrusive way.
All SystemZ-specific code was placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib-ng code. Below is the summary of those changes.
DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside
deflate and inflate states, ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for window, for which ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.
While for inflate software and hardware window formats match, this is
not the case for deflate. Therefore, deflateSetDictionary and
deflateGetDictionary need special handling, which is triggered using the
new DEFLATE_SET_DICTIONARY_HOOK and DEFLATE_GET_DICTIONARY_HOOK macros.
deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib-ng state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.
In order to make unsupported deflateParams(), inflatePrime() and
inflateMark() calls fail gracefully, the new DEFLATE_PARAMS_HOOK,
INFLATE_PRIME_HOOK and INFLATE_MARK_HOOK macros were introduced.
The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.
Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.
In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.
DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, bi_reverse and flush_pending were promoted from static
to ZLIB_INTERNAL and exposed via deflate.h.
Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.
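The hook pattern described above can be sketched as follows: generic code calls hook macros, an arch-specific header included earlier may define them, and otherwise software defaults apply. `DEFLATE_HOOK` and `DEFLATE_NEED_CHECKSUM` are named in the text, but their exact signatures here, the `sketch_stream` struct, and the byte-sum placeholder checksum are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software defaults; an arch-specific header could define these first. */
#ifndef DEFLATE_HOOK
#  define DEFLATE_HOOK(strm, flush, bstate) ((void)(strm), (void)(flush), (void)(bstate), 0)
#endif
#ifndef DEFLATE_NEED_CHECKSUM
#  define DEFLATE_NEED_CHECKSUM(strm) ((void)(strm), 1)
#endif

typedef struct {
    uint32_t checksum;   /* stand-in for the Adler-32/CRC-32 field */
    size_t total_in;
} sketch_stream;

/* Generic code path: it knows only about the hooks, not about DFLTCC. */
static int sketch_deflate(sketch_stream *strm, const uint8_t *in, size_t n, int flush) {
    int bstate = 0;
    assert(strm != NULL && (in != NULL || n == 0));
    if (DEFLATE_HOOK(strm, flush, &bstate))
        return bstate;                       /* hardware handled the call */
    if (DEFLATE_NEED_CHECKSUM(strm))         /* skipped when hardware checksums */
        for (size_t i = 0; i < n; i++)
            strm->checksum += in[i];         /* placeholder, not a real checksum */
    strm->total_in += n;
    /* ... software compression would follow here ... */
    return 0;
}
```

With the defaults, the hooks compile down to constants, so builds without DFLTCC pay nothing for the extra indirection.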
Sebastian Pop [Mon, 28 Jan 2019 22:05:50 +0000 (16:05 -0600)]
only call NEON adler32 for more than 16 bytes
Improves inflate performance by up to 6% on a Cortex-A73 HiKey running at 2.36 GHz
when executing the chromium benchmark on the snappy data set. In a few cases
inflate is slower by up to 0.8%. Overall performance of inflate is better by
about 0.3%.
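A sketch of length-based dispatch with the 16-byte threshold from this commit. The NEON kernel is replaced by a hypothetical placeholder that calls the scalar code, so only the dispatch logic is illustrated; the scalar routine is the standard Adler-32 definition (modulus 65521).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u

/* Standard scalar Adler-32: a = 1 + sum of bytes, b = sum of the a values. */
static uint32_t adler32_scalar(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff, b = adler >> 16;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_BASE;
        b = (b + a) % ADLER_BASE;
    }
    return (b << 16) | a;
}

/* Placeholder for a NEON kernel; reuses the scalar code so it runs anywhere. */
static uint32_t adler32_simd_placeholder(uint32_t adler, const uint8_t *buf, size_t len) {
    return adler32_scalar(adler, buf, len);
}

/* Short inputs skip the SIMD setup cost; only len > 16 takes the SIMD path. */
static uint32_t adler32_dispatch(uint32_t adler, const uint8_t *buf, size_t len) {
    if (len <= 16)
        return adler32_scalar(adler, buf, len);
    return adler32_simd_placeholder(adler, buf, len);
}
```

For example, the Adler-32 of "Wikipedia" is 0x11E60398 whichever path is taken, and because the checksum is incremental, splitting the input across calls gives the same result.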
Ilya Leoshkevich [Tue, 26 Mar 2019 09:52:02 +0000 (10:52 +0100)]
Fix building with gcc 8.2.1 and -Wall -Wextra -pedantic -Werror
* ptrdiff_t check always failed because of unused parameter
* sizeof(void *) check always failed because of double semicolon
* Sign issue in nice_match assignment
* dist parameter of set_bytes may be unused
* Parameters of main may be unused in test/example.c
* snprintf requires a _POSIX_C_SOURCE #define in test/minigzip.c,
because a _POSIX_SOURCE #define is present
Mark Adler [Sun, 20 Nov 2016 19:36:15 +0000 (11:36 -0800)]
Increase verbosity required to warn about bit length overflow.
When debugging, the Huffman coding would warn about resulting codes
greater than 15 bits in length. This is handled properly, and is
not uncommon. This increases the verbosity of the warning by one,
so that it is not displayed by default.
Ilya Leoshkevich [Tue, 26 Mar 2019 12:06:36 +0000 (13:06 +0100)]
Fix endianness detection in memcopy.h
When memcopy.h is included into inffast.c, endianness-related
preprocessor defines are not set, which leads to
BYTE_ORDER == LITTLE_ENDIAN condition being always true. This breaks
decompression at least on s390x.
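The condition is "always true" because, inside `#if`, any identifier that is not a defined macro evaluates to 0, so an undefined `BYTE_ORDER == LITTLE_ENDIAN` becomes `0 == 0`. The sketch below reproduces this with deliberately undefined, hypothetical `SKETCH_*` names, and adds a runtime probe whose answer cannot silently default to little-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Neither SKETCH_* macro is defined, so this is `#if 0 == 0`: always true,
 * even on a big-endian machine such as s390x. */
#if SKETCH_BYTE_ORDER == SKETCH_LITTLE_ENDIAN
#  define NAIVE_CHECK_SAYS_LITTLE 1
#else
#  define NAIVE_CHECK_SAYS_LITTLE 0
#endif

/* Runtime probe: inspect the first byte of a known 16-bit value. */
static int is_little_endian_runtime(void) {
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

A compile-time alternative with GCC and Clang is to compare the predefined `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` macros, which are always set by those compilers.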
Sebastian Pop [Tue, 26 Mar 2019 16:59:45 +0000 (11:59 -0500)]
fix oss-fuzz/13863
The oss fuzzers started failing with the following assert
```
ASSERT: 0 == memcmp(data + offset, buf, len)
```
after the following patch was pulled into the tree:
```
define and use chunkmemset instead of byte_memset for INFFAST_CHUNKSIZE
```
The function chunkcopysafe is assuming that the input `len` is less than 16 bytes:
```
if ((safe - out) < (ptrdiff_t)INFFAST_CHUNKSIZE) {
```
but we were called with `len = 22` because `safe` was defined too small:
```
- safe = out + (strm->avail_out - INFFAST_CHUNKSIZE);
```
and the difference `safe - out` was 16 bytes smaller than the actual `len`.
The patch fixes the initialization of `safe` to:
```
+ safe = out + strm->avail_out;
```