Mike Klein [Thu, 20 Sep 2018 20:34:42 +0000 (20:34 +0000)]
remove 16-byte alignment from deflate_state::crc0
We noticed recently on the Skia tree that if we build Chromium's zlib
with GCC, -O3, -m32, and -msse2, deflateInit2_() crashes. Might also
need -fPIC... not sure.
I tracked this down to a `movaps` (16-byte aligned store) to an address
that was only 8-byte aligned. This address was somewhere in the middle
of the deflate_state struct that deflateInit2_()'s job is to initialize.
That deflate_state struct `s` is allocated using ZALLOC, which calls any
user supplied zalloc if set, or the default if not. Neither one of
these has any special alignment contract, so generally they'll tend to
be 2*sizeof(void*) aligned. On 32-bit builds, that's 8-byte aligned.
But because we've annotated crc0 as zalign(16), the natural alignment of
the whole struct is 16-byte, and a compiler like GCC can feel free to
use 16-byte aligned stores to parts of the struct that are 16-byte
aligned, like the beginning, crc0, or any other part before or after
crc0 that happens to fall on a 16-byte boundary. With -O3 and -msse2,
GCC does exactly that, writing a few of the fields with one 16-byte
store.
The fix is simply to remove zalign(16). All the code that manipulates
this field was actually already using unaligned loads and stores. You
can see it all right at the top of crc_folding.c, CRC_LOAD and CRC_SAVE.
This bug comes from the Intel performance patches we landed a few years
ago, and isn't present in upstream zlib, Android's zlib, or Google's
internal zlib.
It doesn't seem to be tickled by Clang, and won't happen on 64-bit GCC
builds: zalloc is likely 16-byte aligned there. I _think_ it's possible
for it to trigger on non-x86 32-bit builds with GCC, but haven't tested
that. I also have not tested MSVC.
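A minimal sketch of the effect described above, assuming C11 and using made-up struct names rather than the real deflate_state layout: one over-aligned member raises the alignment of the whole struct, which is what lets the compiler assume 16-byte aligned addresses.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for deflate_state; the field names are
 * illustrative and this is not the real layout. */
struct state_with_zalign {
    int status;
    alignas(16) uint32_t crc0[4 * 5]; /* models the zalign(16) annotation */
    int level;
};

struct state_without_zalign {
    int status;
    uint32_t crc0[4 * 5];
    int level;
};

/* Alignment the compiler may assume for every object of each type. */
size_t alignment_with_zalign(void)
{
    return alignof(struct state_with_zalign);
}

size_t alignment_without_zalign(void)
{
    return alignof(struct state_without_zalign);
}

/* Nonzero if a pointer handed out by an allocator honours `alignment`. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

An allocator that only guarantees 8-byte alignment can fail the is_aligned(p, 16) check for the first struct, while the second struct never asks for more than its members need.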
Mark Adler [Wed, 18 Apr 2018 05:09:22 +0000 (22:09 -0700)]
Fix a bug that can crash deflate on some input when using Z_FIXED.
This bug was reported by Danilo Ramos of Eideticom, Inc. It has
lain in wait 13 years before being found! The bug was introduced
in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
option forces the use of fixed Huffman codes. For rare inputs with
a large number of distant matches, the pending buffer into which
the compressed data is written can overwrite the distance symbol
table which it overlays. That results in corrupted output due to
invalid distances, and can result in out-of-bound accesses,
crashing the application.
The fix here combines the distance buffer and literal/length
buffers into a single symbol buffer. Now three bytes of pending
buffer space are opened up for each literal or length/distance
pair consumed, instead of the previous two bytes. This assures
that the pending buffer cannot overwrite the symbol table, since
the maximum fixed code compressed length/distance is 31 bits, and
since there are four bytes of pending space for every three bytes
of symbol space.
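The arithmetic can be checked with a small model. This is only a sketch under stated assumptions, not zlib's code: compressed output is written from offset 0 of one shared buffer, the symbols start at a hypothetical offset sym_offset, and every symbol costs the worst-case number of bits.

```c
#include <stddef.h>

/* Returns 1 if, after consuming each of n_symbols symbols, the bytes of
 * compressed output written so far never reach past the start of the
 * next unread symbol; returns 0 if the output would overwrite symbols. */
int pending_never_overtakes_symbols(size_t n_symbols, size_t sym_offset,
                                    unsigned bytes_per_symbol,
                                    unsigned max_bits_per_symbol)
{
    size_t k;
    for (k = 1; k <= n_symbols; k++) {
        /* Worst-case output size after k symbols, rounded up to bytes. */
        size_t written = (k * max_bits_per_symbol + 7) / 8;
        /* First byte that still holds a symbol that has not been read. */
        size_t next_unread = sym_offset + k * bytes_per_symbol;
        if (written > next_unread)
            return 0;
    }
    return 1;
}
```

With 31 bits per symbol, three bytes of symbol space per symbol hold up in this model, while two bytes per symbol do not.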
Sebastian Pop [Sat, 10 Nov 2018 15:27:12 +0000 (09:27 -0600)]
fix oss-fuzz/11323: clear out s->prev buffer
zlib-ng compiled with MSAN used to fail with:
SUMMARY: MemorySanitizer: use-of-uninitialized-value /src/zlib-ng/match.c:473:60 in longest_match
Exiting
Uninitialized value was stored to memory at
#0 0x7fcaced77645 in fill_window_sse /src/zlib-ng/arch/x86/fill_window_sse.c:84:17
#1 0x7fcaced7d3d4 in deflate_quick /src/zlib-ng/arch/x86/deflate_quick.c:230:13
#2 0x7fcaced2f54b in zng_deflate /src/zlib-ng/deflate.c:951:18
#3 0x4a04e9 in test_large_deflate /src/zlib-ng/test/example.c:266:11
#4 0x4a38d2 in main /src/zlib-ng/test/example.c:539:5
#5 0x7fcace96a82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
Uninitialized value was created by a heap allocation
#0 0x45bf70 in malloc /src/llvm/projects/compiler-rt/lib/msan/msan_interceptors.cc:910
#1 0x7fcaced26cd9 in zng_deflateInit2_ /src/zlib-ng/deflate.c:315:26
#2 0x7fcaced2605a in zng_deflateInit_ /src/zlib-ng/deflate.c:224:12
#3 0x4a03c5 in test_large_deflate /src/zlib-ng/test/example.c:255:11
#4 0x4a38d2 in main /src/zlib-ng/test/example.c:539:5
#5 0x7fcace96a82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
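A minimal sketch of the kind of fix the title describes, assuming a plain malloc-backed table; alloc_cleared_table is a hypothetical helper, not a zlib-ng function. Clearing the buffer right after allocation means later reads never see uninitialized bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a table of 16-bit entries and clear it before first use.
 * Returns NULL if the allocation fails. The caller frees the result. */
uint16_t *alloc_cleared_table(size_t entries)
{
    uint16_t *table = malloc(entries * sizeof(*table));
    if (table == NULL)
        return NULL;
    memset(table, 0, entries * sizeof(*table));
    return table;
}
```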
sebpop [Wed, 7 Nov 2018 09:05:20 +0000 (03:05 -0600)]
integration of oss-fuzz in make test #204 (#206)
The requirements for an ideal integration of a project in oss-fuzz are:
https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md
- Is maintained by code owners in their RCS (Git, SVN, etc).
- Is built with the rest of the tests - no bit rot!
- Has a seed corpus with good code coverage.
- Is continuously tested on the seed corpus with ASan/UBSan/MSan
- Is fast and has no OOMs
- Has a fuzzing dictionary, if applicable
Sebastian Pop [Tue, 30 Oct 2018 15:42:49 +0000 (10:42 -0500)]
Fix test/example.c when compiled with ASAN
Before this patch
cmake -DWITH_SANITIZERS=1
make
make test
used to fail with:
Running tests...
Test project /home/hansr/github/zlib/zlib-ng
Start 1: example
1/2 Test #1: example ..........................***Failed 0.14 sec
Start 2: example64
2/2 Test #2: example64 ........................***Failed 0.13 sec
==11605==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x62e000000595,0x62e0000053b5) and [0x62e000000400, 0x62e000005220) overlap
#0 0x7fab3bcc9662 in __asan_memcpy (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x8c662)
#1 0x40f936 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 0x40f936 in read_buf /home/spop/s/zlib-ng/deflate.c:1122
#3 0x410458 in deflate_stored /home/spop/s/zlib-ng/deflate.c:1394
#4 0x4133d7 in zng_deflate /home/spop/s/zlib-ng/deflate.c:945
#5 0x402253 in test_large_deflate /home/spop/s/zlib-ng/test/example.c:275
#6 0x4014e8 in main /home/spop/s/zlib-ng/test/example.c:536
#7 0x7fab3b89382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#8 0x4018e8 in _start (/work/spop/zlib-ng/example+0x4018e8)
0x62e000000595 is located 405 bytes inside of 40000-byte region [0x62e000000400,0x62e00000a040)
allocated by thread T0 here:
#0 0x7fab3bcd579a in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x9879a)
#1 0x40147a in main /home/spop/s/zlib-ng/test/example.c:516
0x62e000000400 is located 0 bytes inside of 40000-byte region [0x62e000000400,0x62e00000a040)
allocated by thread T0 here:
#0 0x7fab3bcd579a in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x9879a)
#1 0x40147a in main /home/spop/s/zlib-ng/test/example.c:516
fix bug #183 following recommendations of Mika Lindqvist
> the problem is in line c_stream.avail_in = (unsigned int)comprLen/2;
> which feeds it too much data ... it should cap it to
> c_stream.next_out - compr instead.
Sebastian Pop [Wed, 31 Oct 2018 19:49:03 +0000 (14:49 -0500)]
fix ASAN crash on test/minigzip
Before this patch, when configuring with address sanitizer:
./configure --with-sanitizers
make
make test
used to fail with the following error:
$ echo hello world | ./minigzip
ASAN:SIGSEGV
=================================================================
==17466==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000fc80 (pc 0x7fcacddd46f8 bp 0x7ffd01ceb310 sp 0x7ffd01ceb290 T0)
#0 0x7fcacddd46f7 in _IO_fwrite (/lib/x86_64-linux-gnu/libc.so.6+0x6e6f7)
#1 0x402602 in zng_gzwrite /home/spop/s/zlib-ng/test/minigzip.c:180
#2 0x403445 in gz_compress /home/spop/s/zlib-ng/test/minigzip.c:305
#3 0x404724 in main /home/spop/s/zlib-ng/test/minigzip.c:509
#4 0x7fcacdd8682f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#5 0x4018d8 in _start (/work/spop/zlib-ng/minigzip+0x4018d8)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 _IO_fwrite
==17466==ABORTING
During compilation the following warnings point to a missing definition:
/home/spop/s/zlib-ng/test/minigzip.c:154:31: warning: implicit declaration of function 'fdopen' is invalid in C99 [-Wimplicit-function-declaration]
gz->file = path == NULL ? fdopen(fd, gz->write ? "wb" : "rb") :
^
/home/spop/s/zlib-ng/test/minigzip.c:154:29: warning: pointer/integer type mismatch in conditional expression ('int' and 'FILE *' (aka 'struct _IO_FILE *')) [-Wconditional-type-mismatch]
gz->file = path == NULL ? fdopen(fd, gz->write ? "wb" : "rb") :
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/spop/s/zlib-ng/test/minigzip.c:504:36: warning: implicit declaration of function 'fileno' is invalid in C99 [-Wimplicit-function-declaration]
file = PREFIX(gzdopen)(fileno(stdin), "rb");
^
/home/spop/s/zlib-ng/test/minigzip.c:508:36: warning: implicit declaration of function 'fileno' is invalid in C99 [-Wimplicit-function-declaration]
file = PREFIX(gzdopen)(fileno(stdout), outmode);
^
/home/spop/s/zlib-ng/test/minigzip.c:534:48: warning: implicit declaration of function 'fileno' is invalid in C99 [-Wimplicit-function-declaration]
file = PREFIX(gzdopen)(fileno(stdout), outmode);
^
5 warnings generated.
and looking at stdio.h that defines fdopen we see that it is only defined under
__USE_POSIX:
#ifdef __USE_POSIX
/* Create a new stream that refers to an existing system file descriptor. */
extern FILE *fdopen (int __fd, const char *__modes) __THROW __wur;
#endif
This patch fixes the compiler warnings and the runtime ASAN error.
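One common way to get fdopen and fileno declared under a strict C99 mode is to request the POSIX interfaces before including any system header. This is a sketch of that approach on a POSIX system, not necessarily the exact change made by the patch; the two wrapper names are hypothetical.

```c
/* Must come before the first system header so that stdio.h exposes the
 * POSIX declarations even with -std=c99. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* With the prototype visible, the return value is a real FILE * and not
 * an implicitly declared int that may be truncated on 64-bit targets. */
FILE *stream_from_fd(int fd, int for_writing)
{
    return fdopen(fd, for_writing ? "wb" : "rb");
}

int fd_of_stream(FILE *stream)
{
    return fileno(stream);
}
```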
Revert "[ARM/AArch64] Add run-time detection of ACLE and NEON instructions under Linux. * Use getauxval() to check support for ACLE CRC32 instructions * Allow disabling CRC32 instruction check"
Mika Lindqvist [Tue, 13 Mar 2018 09:26:19 +0000 (11:26 +0200)]
[ARM/AArch64] Add run-time detection of ACLE and NEON instructions under Linux.
* Use getauxval() to check support for ACLE CRC32 instructions
* Allow disabling CRC32 instruction check
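A rough sketch of run-time detection with getauxval(), assuming Linux on AArch64 where HWCAP_CRC32 is reported through AT_HWCAP; cpu_has_crc32 is a hypothetical name and other platforms simply report 0 here.

```c
#if defined(__linux__) && defined(__aarch64__)
#  include <sys/auxv.h>
#  include <asm/hwcap.h>
#endif

/* Returns 1 if the kernel reports the CRC32 instructions, else 0. */
int cpu_has_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#else
    return 0; /* unknown platform: assume the instructions are absent */
#endif
}
```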
Tell compiler to adhere to C99 standards.
Exception being newer cmake versions that will decay to gnu99 in
certain situations. This decay currently hides a warning in minigzip,
but using C99 with C_STANDARD_REQUIRED on could potentially introduce
unknown problems on other platforms, so for now we will allow this decay.
Sebastian Pop [Mon, 24 Sep 2018 14:57:48 +0000 (09:57 -0500)]
fix bug #207: avoid undefined integer overflow
zlib-ng used to fail when compiled with UBSan with this error:
deflate_slow.c:112:21: runtime error: unsigned integer overflow: 45871 - 45872 cannot be represented in type 'unsigned int'
The bug occurs in code added to zlib-ng under `#ifndef NOT_TWEAK_COMPILER`.
The original code of zlib contains a loop with two induction variables:
s->prev_length -= 2;
do {
if (++s->strstart <= max_insert) {
functable.insert_string(s, s->strstart, 1);
}
} while (--s->prev_length != 0);
The function insert_string is not executed when
!(++s->strstart <= max_insert)
i.e., when
!(s->strstart + 1 <= max_insert)
!(s->strstart < max_insert)
max_insert <= s->strstart
The function insert_string is executed when
++s->strstart <= max_insert
i.e., when
s->strstart + 1 <= max_insert
s->strstart < max_insert
The function is executed at most `max_insert - s->strstart` times, following the
exit condition of the do-while `(--s->prev_length != 0)`. If the loop exits
after evaluating the exit condition once, the function is executed once
independently of `max_insert - s->strstart`. The number of times the function
executes is the minimum between the number of iterations in the do-while loop
and `max_insert - s->strstart`.
The number of iterations of the loop is `mov_fwd = s->prev_length - 2`, and we
know that this is at least one as otherwise `--s->prev_length` would overflow.
The number of times the function insert_string is called is
`min(mov_fwd, max_insert - s->strstart)`
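The counting argument can be checked with a stand-alone sketch. Both functions are hypothetical models that only count calls, and they assume prev_length >= 3. The closed form guards the subtraction so that it never wraps when strstart is already at or past max_insert.

```c
/* Counts the insert_string calls made by the original loop. */
unsigned calls_by_loop(unsigned strstart, unsigned prev_length,
                       unsigned max_insert)
{
    unsigned calls = 0;
    prev_length -= 2;
    do {
        if (++strstart <= max_insert)
            calls++;
    } while (--prev_length != 0);
    return calls;
}

/* Same count without the loop: min(mov_fwd, max_insert - strstart),
 * or 0 when there is no room left before max_insert. */
unsigned calls_by_formula(unsigned strstart, unsigned prev_length,
                          unsigned max_insert)
{
    unsigned mov_fwd = prev_length - 2;
    unsigned room;
    if (strstart >= max_insert)
        return 0;
    room = max_insert - strstart;
    return mov_fwd < room ? mov_fwd : room;
}
```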
Sebastian Pop [Wed, 15 Aug 2018 20:28:41 +0000 (15:28 -0500)]
fix #187: remove errors exposed by undefined behavior sanitizer
Move decrement in loop to avoid the following errors:
adler32.c:91:19: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
adler32.c:136:19: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
inflate.c:972:32: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
Fix the following bugs as recommended by Mika Lindqvist:
arch/x86/deflate_quick.c:233:22: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
arch/x86/fill_window_sse.c:52:28: runtime error: unsigned integer overflow: 1 - 8192 cannot be represented in type 'unsigned int'
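A sketch of the "move the decrement" pattern on a hypothetical byte_sum function: with `while (len--)` the final test decrements 0 and wraps, whereas decrementing inside the body only ever runs when len is nonzero.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes of buf without ever computing 0 - 1 on len. */
uint32_t byte_sum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    while (len) {
        sum += *buf++;
        len--; /* only reached when len != 0 */
    }
    return sum;
}
```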
Fix ZLIB_COMPAT=OFF and WITH_GZFILEOP=ON compilation failure.
Also add this combination to travis testing.
Remove --native testing from travis, since it somehow makes this fail very often,
probably due to caching or running the executables on a different platform than
the compiler thinks it is running on.
Sebastian Pop [Wed, 15 Aug 2018 19:14:24 +0000 (14:14 -0500)]
fix bug #184: clear out buf to avoid msan use-of-uninitialized-value
Do not use bzero as suggested by Mika Lindqvist:
> You shouldn't use bzero() in new code as some compilers, like Visual C++,
> don't have it... New code should just use memset().
==4908==ERROR: MemorySanitizer: SEGV on unknown address 0x730fffffffff (pc 0x0000004b1b97 bp 0x7ffd4bf59a00 sp 0x7ffd4bf598a0 T4908)
==4908==The signal is caused by a READ memory access.
#0 0x5a0599 in fizzle_matches zlib-ng/deflate_medium.c:168:12
#1 0x59ea27 in deflate_medium zlib-ng/deflate_medium.c:296:21
#2 0x5901c5 in zng_deflate zlib-ng/deflate.c:951:18
#3 0x586955 in zng_compress2 zlib-ng/compress.c:59:15
#4 0x5861eb in LLVMFuzzerTestOneInput zlib-ng/test/fuzz/compress_fuzzer.c:18:3
#5 0x4e9b48 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:575:15
#6 0x4a2f66 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:280:6
#7 0x4b3adb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
#8 0x4a2091 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#9 0x7fa3d7ff582f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
#10 0x41ec68 in _start
Sebastian Pop [Tue, 21 Aug 2018 14:41:12 +0000 (09:41 -0500)]
fix bugs #186 and #191, oss-fuzz/9831: use-of-uninitialized-value
==1==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x59fa93 in deflate_medium zlib-ng/deflate_medium.c:259:21
#1 0x590905 in zng_deflate zlib-ng/deflate.c:951:18
#2 0x587095 in zng_compress2 zlib-ng/compress.c:59:15
#3 0x5866e3 in check_compress_level zlib-ng/test/fuzz/compress_fuzzer.c:18:3
#4 0x5862fd in LLVMFuzzerTestOneInput zlib-ng/test/fuzz/compress_fuzzer.c:38:3
#5 0x4e9b48 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:575:15
#6 0x4a2f66 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:280:6
#7 0x4b3adb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
#8 0x4a2091 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#9 0x7fea2fea482f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
#10 0x41ec68 in _start
Uninitialized value was created by a heap allocation
#0 0x45f2a0 in malloc /src/llvm/projects/compiler-rt/lib/msan/msan_interceptors.cc:910
#1 0x587d42 in zng_deflateInit2_ zlib-ng/deflate.c:284:27
#2 0x5874fa in zng_deflateInit_ zlib-ng/deflate.c:224:12
#3 0x586c95 in zng_compress2 zlib-ng/compress.c:41:11
#4 0x5866e3 in check_compress_level zlib-ng/test/fuzz/compress_fuzzer.c:18:3
#5 0x5862fd in LLVMFuzzerTestOneInput zlib-ng/test/fuzz/compress_fuzzer.c:38:3
#6 0x4e9b48 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:575:15
#7 0x4a2f66 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:280:6
#8 0x4b3adb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
#9 0x4a2091 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#10 0x7fea2fea482f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
Sebastian Pop [Fri, 24 Aug 2018 04:28:50 +0000 (23:28 -0500)]
fix #197, oss-fuzz/10036: only write 4 bytes per iteration in deflate_quick
By aggregating the two consecutive values to be written by static_emit_ptr to
s->pending_buf and writing the two values at once in a 4 byte store, we avoid
running out of the allocated buffer. We used to call quick_send_bits twice and
bumped the counter s->pending in the first call, which made the second call
write to memory beyond the safe 4 bytes that were guaranteed by the following
condition in the enclosing loop in deflate_quick:
if (s->pending + 4 >= s->pending_buf_size) {
flush_pending(s->strm);
The bug was exposed by the memory sanitizer like so:
MemorySanitizer:DEADLYSIGNAL
--
| ==1==ERROR: MemorySanitizer: SEGV on unknown address 0x730000020000 (pc 0x0000005b6ce4 bp 0x7fff59adb5e0 sp 0x7fff59adb570 T1)
| ==1==The signal is caused by a WRITE memory access.
| #0 0x5b6ce3 in quick_send_bits zlib-ng/arch/x86/deflate_quick.c:134:48
| #1 0x5b5752 in deflate_quick zlib-ng/arch/x86/deflate_quick.c:243:21
| #2 0x590a15 in zng_deflate zlib-ng/deflate.c:952:18
| #3 0x587165 in zng_compress2 zlib-ng/compress.c:59:15
| #4 0x5866d3 in check_compress_level zlib-ng/test/fuzz/compress_fuzzer.c:22:3
| #5 0x5862d8 in LLVMFuzzerTestOneInput zlib-ng/test/fuzz/compress_fuzzer.c:74:3
| #6 0x4e9b48 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:575:15
| #7 0x4a2f66 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:280:6
| #8 0x4b3adb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
| #9 0x4a2091 in main /src/libfuzzer/FuzzerMain.cpp:20:10
| #10 0x7fb8919b082f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
| #11 0x41ec68 in _start
| MemorySanitizer can not provide additional info.
| SUMMARY: MemorySanitizer: SEGV (/mnt/scratch0/clusterfuzz/slave-bot/builds/clusterfuzz-builds_zlib-ng_7ead0a3e4980f024583384fd355b6e3ddd4b2ca2/revisions/compress_fuzzer+0x5b6ce3)
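A simplified sketch of the aggregation idea, with hypothetical helper names and no claim to match the deflate_quick code: two bit fields are packed into one 32-bit value (least significant bits first) and written with a single 4-byte store, so the write stays within the 4 bytes the enclosing check guarantees.

```c
#include <stdint.h>

/* Packs code2 directly above the len1 low bits used by code1.
 * Assumes len1 < 32 and that both codes together fit in 32 bits. */
uint32_t pack_two_codes(uint32_t code1, unsigned len1, uint32_t code2)
{
    return code1 | (code2 << len1);
}

/* Writes exactly 4 bytes in little-endian order; the caller guarantees
 * that out has room for them. */
void store_le32(unsigned char *out, uint32_t value)
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```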
replaced include_directories() with target_include_directories()
using target_include_directories() with the zlib libraries prevents people from having to manually include those directories when linking to those libraries
Mika Lindqvist [Fri, 23 Mar 2018 12:48:53 +0000 (14:48 +0200)]
Separate feature checks for x86 and x86_64
* Don't check for SSE2 on anything else than i686
* Don't check for PCLMULQDQ on anything else than i686 or x86_64
* Check for SSE4.2 CRC intrinsics
Mika Lindqvist [Thu, 22 Mar 2018 09:01:18 +0000 (11:01 +0200)]
[MSVC] Fix size_t/ssize_t when using ZLIB_COMPAT. (#161)
* zconf.h.in wasn't including Windows.h, which is the correct header for pulling in the definitions from BaseTsd.h, so the type definition required for ssize_t was missing when compiling with MS Visual C++
* Various places need casts to z_size_t to get around compatibility issues. This will truncate the result when ZLIB_COMPAT is defined, so calling code should check for truncation.
* Add ZLIB_COMPAT flag to nmake Makefile and use it to determine correct
filenames instead of WITH_GZFILEOP
Richael Zhuang [Wed, 14 Mar 2018 09:31:58 +0000 (17:31 +0800)]
Fix the problem about rule to make target "zconf.h" on Arm platforms
When building zlib-ng with the --acle option on Arm platforms, the build
stops partway with the message "No rule to make
target zconf.h needed by crc32_acle.o".
This patch fixes the problem by including zconf.h or zconf-ng.h in
crc32_acle.c, depending on whether ZLIB_COMPAT is defined.
Mika Lindqvist [Fri, 16 Feb 2018 15:15:46 +0000 (17:15 +0200)]
[compat] Don't check for ZLIB_COMPAT
* ZLIB_COMPAT is always implied if using zlib.h
* Revert z_stream->adler to "unsigned long" to enforce correct alignment
of struct members
richael02 [Fri, 16 Feb 2018 10:53:22 +0000 (18:53 +0800)]
Fix build problems about NEON (#149)
* Fix build problems about NEON on AArch64
NEON is enabled by default on armv8-a platforms, and so NEON related
objects should be included when the platform is armv8-a. Errors about
adler32_neon will occur when you run ./configure on armv8-a platforms
without --neon option, because zlib-ng uses --neon option to include
NEON related objects regardless of Arm architecture.
You will have similar issue when you build the project with cmake.
This patch fixes the problem by including NEON related objects when
the platform is armv8-a(including aarch64).
Use adler32_neon only when zlib-ng is configured with --neon (or
-DWITH_NEON=ON if using cmake), or else use the default adler32
no matter what Arm architecture is.
Samuel Williams [Fri, 16 Feb 2018 10:49:55 +0000 (23:49 +1300)]
Prefer memcpy and memcmp over direct memory read/comparisons. (#135)
* Prefer memcpy and memcmp over direct memory read/comparisons.
* Some platforms have alignment requirements and unaligned direct memory
read/comparisons may result in undefined behaviour.
* Prefer memcpy and memcmp which are lowered to efficient assembly where
possible.
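A minimal sketch of the preferred pattern, with hypothetical helper names: the memcpy and memcmp calls are valid for any byte pointer, aligned or not.

```c
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes from p in native byte order; p may be unaligned. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof(value));
    return value;
}

/* Compares the first 4 bytes of a and b without casting to uint32_t *. */
int first4_equal(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, 4) == 0;
}
```

By contrast, `*(const uint32_t *)p` is only valid when p is suitably aligned for uint32_t.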
Richael Zhuang [Tue, 6 Feb 2018 05:48:00 +0000 (13:48 +0800)]
Fix the bug in crc32_acle
On armv8-a platforms if --acle is enabled, zlib-ng will use crc32_acle
instead of the default crc32. However, in crc32_acle the __crc32b() is
used to calculate the crc result of two variables with types uint32_t
and uint64_t, which gives a wrong result. The correct function to use
is __crc32d().
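A portable software model shows why the width matters. This is a sketch, not the ACLE intrinsics themselves: crc32_step is a hypothetical helper that feeds the low nbytes bytes of a word into a reflected CRC-32 (polynomial 0xEDB88320), lowest byte first. Passing 1 models a byte-wide update, which consumes only the low byte of a 64-bit operand, and passing 8 models a doubleword update.

```c
#include <stdint.h>

/* Updates crc with the low nbytes bytes of data, lowest byte first. */
uint32_t crc32_step(uint32_t crc, uint64_t data, unsigned nbytes)
{
    unsigned i;
    int bit;
    for (i = 0; i < nbytes; i++) {
        crc ^= (uint32_t)((data >> (8 * i)) & 0xff);
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}
```

Feeding a 64-bit word with nbytes set to 1 leaves seven of its bytes out of the checksum, which is the kind of wrong result the commit message describes.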
Richael Zhuang [Tue, 13 Feb 2018 02:51:02 +0000 (10:51 +0800)]
Fix dependency problem about cmake options
According to the content of CMakeLists.txt, if building with "-DZLIB_COMPAT=ON",
the value of WITH_GZFILEOP should be ON too. However, WITH_GZFILEOP is actually
OFF when you run "cmake .. -DZLIB_COMPAT=ON", which will cause errors if you
use gzfile related functions.
This patch fixes the problem by adjusting the position of WITH_GZFILEOP
option.
Add function prefix (zng_) to all exported functions to allow zlib-ng
to co-exist in an application that has been linked to something that
depends on stock zlib. Previously, that would cause random problems
since there is no way to guarantee what zlib version is being used
for each dynamically linked function.
Add the corresponding zlib-ng.h.
Tests, example and minigzip will not compile before they have been
adapted to use the correct functions as well.
Either duplicate them, so we have minigzip-ng.c for example, or add
compile-time detection in the source code.
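A sketch of how a prefix macro can provide this at compile time, in the style of the PREFIX(gzdopen) calls seen in minigzip.c. The function demo_version is hypothetical and exists only for illustration.

```c
/* In compat mode the plain zlib name is exported; otherwise every
 * exported symbol gets the zng_ prefix. */
#ifdef ZLIB_COMPAT
#  define PREFIX(x) x
#else
#  define PREFIX(x) zng_ ## x
#endif

/* Defines demo_version() or zng_demo_version() depending on the mode. */
int PREFIX(demo_version)(void)
{
    return 1;
}
```

Callers that also go through PREFIX() compile unchanged in both modes.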
nmlgc [Tue, 20 Jun 2017 20:12:42 +0000 (22:12 +0200)]
configure: For Windows builds, add the CROSS_PREFIX to $RC and $STRIP.
zlib's original win32/Makefile.gcc did the same, but this was removed in 7d17132436431d5f62cf5089623073d72d07deb0. It is kind of essential for
cross-compiling a Win32 build on Linux, since `windres` most certainly
doesn't exist, and the regular `strip` may not be able to handle DLLs.
It should probably actually be something like
RC="${RC-${CROSS_PREFIX}windres}"
and
STRIP="${STRIP-${CROSS_PREFIX}strip}"
to be consistent with the assignments of $AR, $RANLIB and $NM, but this
didn't work for some reason.
R.J.V. Bertin [Tue, 23 May 2017 17:46:55 +0000 (19:46 +0200)]
ZLIB_COMPAT: add an extra 32 bits of padding in z_stream
zlib "stock" uses an "uLong" for zstream::adler, which takes 4 more bytes in
64-bit builds. The padding makes zlib-ng a drop-in replacement for libz; without it,
the deflateInit2_() function returns a version error when called from
dependents that were built against "stock" zlib.
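The size mismatch can be illustrated with three hypothetical structs that model only the tail of z_stream. The field types are assumptions made for this sketch. Stock zlib's deflateInit2_() compares the caller's stream_size with its own sizeof(z_stream), so any difference in size shows up as a version error.

```c
#include <stddef.h>
#include <stdint.h>

/* Tail as a caller built against stock zlib sees it. */
struct stock_tail {
    int           data_type;
    unsigned long adler;
    unsigned long reserved;
};

/* Assumed tail with a 32-bit adler and no padding. */
struct ng_tail_unpadded {
    int           data_type;
    uint32_t      adler;
    unsigned long reserved;
};

/* Same, with the extra 32 bits of padding added. */
struct ng_tail_padded {
    int           data_type;
    uint32_t      adler;
    uint32_t      padding;
    unsigned long reserved;
};

size_t stock_tail_size(void)    { return sizeof(struct stock_tail); }
size_t unpadded_tail_size(void) { return sizeof(struct ng_tail_unpadded); }
size_t padded_tail_size(void)   { return sizeof(struct ng_tail_padded); }
```

On targets where unsigned long is 8 bytes, only the padded variant has the same size as the stock one.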
R.J.V. Bertin [Tue, 23 May 2017 17:32:53 +0000 (19:32 +0200)]
various CMake fixes:
- on Mac, builds can target 1 or more architectures that are not the host
architecture. Pick the first from the list and ignore the others.
A more complete implementation would warn if i386 and x86_64 builds are
mixed via the compiler options.
- use CMake's compiler IDs to detect GCC and Clang (should be applied to
icc too but I can't test)
- disable PCLMUL optimisation in 32bit Mac builds. It crashes and provides
very little gain (to builds that are probably increasingly rare)
Mat [Mon, 29 May 2017 09:06:26 +0000 (11:06 +0200)]
Fix: wrong register for BMI1 bit (#112)
The BMI1 bit is in the ebx register and not in ecx.
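A sketch of the check, assuming GCC or Clang on x86 with <cpuid.h>: BMI1 is reported in bit 3 of EBX for CPUID leaf 7, subleaf 0. The function names are hypothetical and other compilers or targets simply report 0 here.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  include <cpuid.h>
#  define DEMO_HAVE_CPUID 1
#endif

/* Extracts the BMI1 flag from the EBX value of CPUID leaf 7, subleaf 0. */
int bmi1_bit_from_ebx(uint32_t leaf7_ebx)
{
    return (int)((leaf7_ebx >> 3) & 1u);
}

/* Returns 1 if the running CPU reports BMI1, else 0. */
int cpu_has_bmi1(void)
{
#ifdef DEMO_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0; /* leaf 7 is not available */
    return bmi1_bit_from_ebx(ebx);
#else
    return 0;
#endif
}
```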
See reference: https://software.intel.com/sites/default/files/article/405250/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family.pdf
Mika Lindqvist [Wed, 3 May 2017 17:14:57 +0000 (20:14 +0300)]
Lazily initialize functable members. (#108)
- Split functableInit() function as separate functions for each functable member, so we don't need to initialize full functable in multiple places in the zlib-ng code, or to check for NULL on every invocation.
- Optimized function for each functable member is detected on first invocation and the functable item is updated for subsequent invocations.
- Remove NULL check in adler32() and adler32_z() as it is no longer needed.
- Add adler32 to functable
- Add missing call to functableinit from inflateinit
- Fix external direct calls to adler32 functions without calling functableinit
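A simplified sketch of the lazy scheme, using made-up names and a single table member: the table starts out pointing at a stub, and the stub picks an implementation on the first call and replaces itself. Thread safety is ignored here.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const unsigned char *buf, size_t len);

static uint32_t sum_c(const unsigned char *buf, size_t len);
static uint32_t sum_stub(const unsigned char *buf, size_t len);

/* Hypothetical one-member function table, initially pointing at the stub. */
struct demo_functable { sum_fn sum; };
static struct demo_functable demo_table = { sum_stub };
static int detections; /* how many times the stub ran */

/* Plain C implementation. */
static uint32_t sum_c(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    while (len) {
        sum += *buf++;
        len--;
    }
    return sum;
}

/* First call lands here: choose the implementation, patch the table,
 * then forward the call. A real build would test CPU features here. */
static uint32_t sum_stub(const unsigned char *buf, size_t len)
{
    detections++;
    demo_table.sum = sum_c;
    return demo_table.sum(buf, len);
}

uint32_t demo_sum(const unsigned char *buf, size_t len)
{
    return demo_table.sum(buf, len); /* no NULL check needed */
}

int demo_detection_count(void)
{
    return detections;
}
```

After the first call, every later call goes straight to the selected function.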
Mika Lindqvist [Mon, 24 Apr 2017 09:22:11 +0000 (12:22 +0300)]
ARM optimizations part 2 (#107)
* add adler32_neon to main dependency checking and ARM/Windows Makefile
* split non-optimized adler32 to adler32_c so we can test/compare both without recompiling.
* add detection of default floating point ABI in gcc
NOTE: This should avoid build error when gcc supports both ABIs but header for just one ABI is installed.
Add a struct func_table and function functableInit.
The struct contains pointers to select functions to be used by the
rest of zlib, and the init function selects what functions will be
used depending on what optimizations have been compiled in and what
instruction-sets are available at runtime.
Tests done on a haswell cpu running minigzip -6 compression of a
40M file show a 2.5% decrease in branches, and a 25-30% reduction
in iTLB-loads. The reduction in iTLB-loads is likely mostly due to
the inability to inline functions. This also causes a slight
performance regression of around 1%, but that might still be worth it
to make it much easier to implement new optimized functions for
various architectures and instruction sets.
The performance penalty will get smaller for functions that get more
alternative implementations to choose from, since there is no need
to add more branches to every call of the function.
Today insert_string has 1 branch to choose insert_string_sse
or insert_string_c, but if we also add for example insert_string_sse4
then that would have needed another branch, and it would probably
at some point hinder effective inlining too.
The checksum is calculated in the uncompressed PNG data and can be
made much faster by using SIMD. Tests in ARMv8 yielded an improvement
of about 3x (e.g. walltime was 350ms vs. 125ms for 4096x4096 bytes
processed 30 times).
This yields an improvement in image decoding in Chromium around 18%
(see https://bugs.chromium.org/p/chromium/issues/detail?id=688601).
Sebastian Pop [Thu, 16 Mar 2017 15:43:36 +0000 (10:43 -0500)]
inflate: improve performance of memory copy operations
When memory copy operations happen byte by byte, the processors are unable to
fuse the loads and stores together because of aliasing issues. This patch
clusters some of the memory copy operations in chunks of 16 and 8 bytes.
For byte memset, the compiler knows how to prepare the chunk to be stored.
When the memset pattern is larger than a byte, this patch builds the pattern for
chunk memset using the same technique as in Simon Hosie's patch
https://codereview.chromium.org/2722063002
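A rough sketch of the two ideas, not the patch itself: copying in fixed-size chunks through memcpy, and building an 8-byte chunk from a 2-byte pattern before storing it repeatedly. Both helper names are hypothetical, and the copy assumes that source and destination do not overlap.

```c
#include <stddef.h>
#include <string.h>

/* Copies len bytes in 16-byte and 8-byte chunks, then a byte tail. */
void copy_in_chunks(unsigned char *dst, const unsigned char *src, size_t len)
{
    while (len >= 16) {
        memcpy(dst, src, 16);
        dst += 16; src += 16; len -= 16;
    }
    if (len >= 8) {
        memcpy(dst, src, 8);
        dst += 8; src += 8; len -= 8;
    }
    while (len) {
        *dst++ = *src++;
        len--;
    }
}

/* Fills dst with a repeating 2-byte pattern using 8-byte stores. */
void fill_pattern2(unsigned char *dst, const unsigned char pat[2], size_t len)
{
    unsigned char chunk[8];
    size_t i;
    for (i = 0; i < sizeof(chunk); i++)
        chunk[i] = pat[i % 2];
    while (len >= sizeof(chunk)) {
        memcpy(dst, chunk, sizeof(chunk));
        dst += sizeof(chunk);
        len -= sizeof(chunk);
    }
    for (i = 0; i < len; i++) /* tail keeps the pattern phase */
        dst[i] = chunk[i];
}
```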
This patch improves by 50% the performance of zlib decompression of a 50K PNG on
aarch64-linux and x86_64-linux when compiled with gcc-7 or llvm-5.
The number of executed instructions reported by valgrind --tool=cachegrind
on the decompression of a 50K PNG file on aarch64-linux:
- before the patch:
I refs: 3,783,757,451
D refs: 1,574,572,882 (869,116,630 rd + 705,456,252 wr)
- with the patch:
I refs: 2,391,899,214
D refs: 899,359,836 (516,666,051 rd + 382,693,785 wr)
Compressing a 260MB directory containing the code of llvm into a tar.gz
of 35MB and then decompressing that with minigzip -d gives these timings:
on i7-4790K x86_64-linux, it takes 0.533s before the patch and 0.493s with the patch,
on Juno-r0 aarch64-linux A57, it takes 2.796s before the patch and 2.467s with the patch,
on Juno-r0 aarch64-linux A53, it takes 4.055s before the patch and 3.604s with the patch.