Jim Kukunas [Thu, 18 Jul 2013 22:45:18 +0000 (15:45 -0700)]
deflate: add new deflate_medium strategy
From: Arjan van de Ven <arjan@linux.intel.com>
As the name suggests, the deflate_medium deflate strategy is designed
to provide an intermediate strategy between deflate_fast and deflate_slow.
After finding two adjacent matches, deflate_medium scans left from
the second match in order to determine whether a better match can be
formed.
Jim Kukunas [Thu, 18 Jul 2013 20:19:05 +0000 (13:19 -0700)]
deflate: add new deflate_quick strategy for level 1
The deflate_quick strategy is designed to provide maximum
deflate performance.
deflate_quick achieves this through:
- only checking the first hash match
- using a small inline SSE4.2-optimized longest_match
- forcing a window size of 8K, and using a precomputed dist/len
table
- forcing the static Huffman tree and emitting codes immediately
instead of tallying
This patch changes the scope of flush_pending, bi_windup, and
static_ltree to ZLIB_INTERNAL and moves END_BLOCK, send_code,
put_short, and send_bits to deflate.h.
Updates the configure script to enable this by default for x86. On
systems without SSE4.2, the fallback is the deflate_fast strategy.
Jim Kukunas [Thu, 11 Jul 2013 20:49:05 +0000 (13:49 -0700)]
add PCLMULQDQ optimized CRC folding
Rather than copy the input data from strm->next_in into the window and
then compute the CRC, this patch combines these two steps into one. It
performs an SSE memory copy, while folding the data down in the SSE
registers. A final step is added, when we write the gzip trailer,
to reduce the 4 SSE registers to 32 bits.
Adds some extra padding bytes to the window to allow for SSE partial
writes.
Jim Kukunas [Thu, 18 Jul 2013 18:40:09 +0000 (11:40 -0700)]
add SSE4.2 optimized hash function
For systems supporting SSE4.2, use the crc32 instruction as a fast
hash function. Also, provide a better fallback hash.
For both new hash functions, we hash 4 bytes, instead of 3, for certain
levels. This shortens the hash chains, and also improves the quality
of each hash entry.
Jim Kukunas [Tue, 2 Jul 2013 19:09:37 +0000 (12:09 -0700)]
Adds SSE2 optimized hash shifting to fill_window.
Uses SSE2 subtraction with saturation to shift the hash in
16B chunks. Renames the old fill_window implementation to
fill_window_c(), and adds a new fill_window_sse() implementation
in fill_window_sse.c.
Moves UPDATE_HASH into deflate.h and changes the scope of
read_buf from local to ZLIB_INTERNAL for sharing between
the two implementations.
Updates the configure script to check for SSE2 intrinsics and enables
this optimization by default on x86. The runtime check for SSE2 support
only occurs on 32-bit, as x86_64 requires SSE2. Adds an explicit
rule in Makefile.in to build fill_window_sse.c with the -msse2 compiler
flag, which is required for SSE2 intrinsics.
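The saturating subtraction described above can be illustrated with a
scalar sketch. The function name slide_hash_scalar and the use of
uint16_t entries are assumptions made for illustration, not code from
the patch. An SSE2 version would apply the same operation to eight
16-bit entries per 16-byte chunk, for example with an intrinsic such
as _mm_subs_epu16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar equivalent of the hash shift. Each table entry is a
 * window position. After the window slides by wsize bytes, every position
 * is reduced by wsize, and positions that fell out of the window clamp
 * to 0 instead of wrapping around. */
static void slide_hash_scalar(uint16_t *table, size_t n, uint16_t wsize)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint16_t m = table[i];
        /* Saturating subtraction: the result never goes below zero. */
        table[i] = (uint16_t)(m >= wsize ? m - wsize : 0);
    }
}
```

The clamp to zero is what makes a single saturating instruction
sufficient: no separate compare-and-select step is needed per entry.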
Jim Kukunas [Mon, 1 Jul 2013 18:18:26 +0000 (11:18 -0700)]
Tune longest_match implementation
Separates the byte-by-byte and short-by-short longest_match
implementations into two separately tweakable versions and
splits all of the longest match functions into a separate file.
Split the end-chain and early-chain scans and provide likely/unlikely
hints to improve branch prediction.
Add an early termination condition for levels 5 and under to stop
iterating the hash chain when the match length for the current
entry is less than the current best match.
Also adjust variable types and scopes to provide better optimization
hints to the compiler.
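The likely/unlikely hints mentioned above are commonly written as thin
macros over a compiler builtin. The sketch below assumes GCC or Clang
for __builtin_expect and falls back to a plain expression elsewhere.
The macro definitions and the helper match_len are illustrative and
are not taken from the patch.

```c
/* Branch hints. __builtin_expect is a GCC/Clang builtin, so other
 * compilers get a plain pass-through. */
#if defined(__GNUC__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

/* Hypothetical helper: length of the common prefix of two buffers,
 * compared byte by byte and capped at max. */
static unsigned match_len(const unsigned char *a, const unsigned char *b,
                          unsigned max)
{
    unsigned n = 0;

    while (n < max) {
        /* A mismatch ends the scan, so it is hinted as the rare case. */
        if (unlikely(a[n] != b[n]))
            break;
        n++;
    }
    return n;
}
```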
Jim Kukunas [Wed, 17 Jul 2013 17:34:56 +0000 (10:34 -0700)]
Add preprocessor define to tune Adler32 loop unrolling.
Excessive loop unrolling is detrimental to performance. This patch
adds a preprocessor define, ADLER32_UNROLL_LESS, to reduce unrolling
factor from 16 to 8.
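As a rough illustration of the smaller unrolling factor, here is a
minimal Adler-32 sketch with the inner loop unrolled by 8. The names
adler32_unroll8, ADLER_BASE, ADLER_NMAX, DO1 and DO8 are chosen for
this example. How ADLER32_UNROLL_LESS selects between factors inside
adler32.c is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521U /* largest prime smaller than 65536 */
#define ADLER_NMAX 5552   /* bytes that can be summed before reducing */

#define DO1(buf, i) { a += (buf)[i]; b += a; }
#define DO8(buf)    DO1(buf, 0) DO1(buf, 1) DO1(buf, 2) DO1(buf, 3) \
                    DO1(buf, 4) DO1(buf, 5) DO1(buf, 6) DO1(buf, 7)

/* Adler-32 with the inner loop unrolled by 8 (illustrative sketch). */
static uint32_t adler32_unroll8(uint32_t adler, const unsigned char *buf,
                                size_t len)
{
    uint32_t a = adler & 0xffff;
    uint32_t b = (adler >> 16) & 0xffff;

    while (len > 0) {
        /* Work in chunks small enough that the 32-bit sums cannot wrap. */
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;

        len -= chunk;
        while (chunk >= 8) {
            DO8(buf)
            buf += 8;
            chunk -= 8;
        }
        while (chunk-- > 0) {   /* leftover bytes, one at a time */
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Starting from the initial value 1, the nine bytes of "Wikipedia" give
0x11E60398 with this sketch.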
Mark Adler [Sun, 14 Apr 2013 17:31:31 +0000 (10:31 -0700)]
Do not force Z_CONST for C++.
Forcing Z_CONST resulted in an issue when compiling Firefox. Now,
if someone wants to compile zlib as C++ code (which it isn't),
they will need to #define Z_CONST themselves.
Mark Adler [Mon, 25 Mar 2013 05:12:31 +0000 (22:12 -0700)]
Do not return Z_BUF_ERROR if deflateParams() has nothing to write.
If the compressed data was already at a block boundary, then
deflateParams() would report Z_BUF_ERROR, because there was nothing
to write. With this patch, Z_OK is returned in that case.
Mark Adler [Sun, 24 Mar 2013 05:27:43 +0000 (22:27 -0700)]
Remove runtime check in configure for four-byte integer type.
That didn't work when cross-compiling. Simply rely on limits.h.
If a compiler does not have limits.h, then zconf.h.in should be
modified to define Z_U4 as an unsigned four-byte integer type in
order for crc32() to be fast.
This also simplifies the check for a four-byte type using limits.h
and makes it more portable.
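A compile-time check along these lines needs nothing but limits.h, so
it also works when cross-compiling. The sketch below is an assumption
about the general shape of such a check. The names U4_TYPE and u4_t
are made up for the example and stand in for whatever Z_U4 is defined
to.

```c
#include <limits.h>

/* Pick an unsigned type with exactly 32 value bits using only the
 * limits.h maxima, with no program run on the build machine. */
#if UINT_MAX == 0xffffffffUL
#  define U4_TYPE unsigned int
#elif ULONG_MAX == 0xffffffffUL
#  define U4_TYPE unsigned long
#elif USHRT_MAX == 0xffffffffUL
#  define U4_TYPE unsigned short
#endif

/* If nothing matched, U4_TYPE stays undefined and a caller would have
 * to fall back to a slower path that does not need a four-byte type. */
#ifdef U4_TYPE
typedef U4_TYPE u4_t;
#endif
```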
Mark Adler [Tue, 19 Feb 2013 05:06:35 +0000 (21:06 -0800)]
Fix serious but very rare decompression bug in inftrees.c.
inftrees.c compared the number of used table entries to the maximum
allowed value using >= instead of >. This patch fixes those to use
>. The bug was discovered by Ignat Kolesnichenko of Yandex LC
where they have run petabytes of data through zlib. Triggering the
bug is apparently very rare, seeing as how it has been out there in
the wild for almost three years before being discovered. The bug
is instantiated only if the exact maximum number of decoding table
entries, ENOUGH_DISTS or ENOUGH_LENS is used by the block being
decoded, resulting in the false positive of overflowing the table.
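The off-by-one can be shown with a small sketch of the two comparisons.
The limits 852 and 592 are taken here as the values of ENOUGH_LENS and
ENOUGH_DISTS in inftrees.h and should be treated as assumptions. The
enum and function names are illustrative.

```c
#define ENOUGH_LENS  852 /* assumed value from inftrees.h */
#define ENOUGH_DISTS 592 /* assumed value from inftrees.h */

enum table_kind { KIND_LENS, KIND_DISTS }; /* illustrative names */

/* Before the fix: a table using exactly the maximum number of entries
 * is wrongly reported as an overflow. */
static int overflows_before(enum table_kind kind, unsigned used)
{
    return (kind == KIND_LENS  && used >= ENOUGH_LENS) ||
           (kind == KIND_DISTS && used >= ENOUGH_DISTS);
}

/* After the fix: only strictly more than the maximum is an overflow. */
static int overflows_after(enum table_kind kind, unsigned used)
{
    return (kind == KIND_LENS  && used > ENOUGH_LENS) ||
           (kind == KIND_DISTS && used > ENOUGH_DISTS);
}
```

The two functions disagree only when used equals the limit exactly,
which is consistent with how rarely the bug was triggered.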
Mark Adler [Tue, 2 Oct 2012 05:42:35 +0000 (22:42 -0700)]
Fix bug in gzclose() when gzwrite() runs out of memory.
If the deflateInit2() called for the first gzwrite() failed with a
Z_MEM_ERROR, then a subsequent gzclose() would try to free an
already freed pointer. This fixes that.
Mark Adler [Sun, 30 Sep 2012 05:23:47 +0000 (22:23 -0700)]
Fix bug where gzopen(), gzclose() would write an empty file.
A gzopen() to write (mode "w") followed immediately by a gzclose()
would output an empty zero-length file. What it should do is write
an empty gzip file, with the gzip header, empty deflate content,
and gzip trailer totalling 20 bytes. This fixes it to do that.
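The 20-byte total breaks down as a 10-byte header, 2 bytes of deflate
data, and an 8-byte trailer. The sketch below spells out one such
file by hand. The modification time, extra flags, and OS byte are
assumptions chosen for the example, and the function name empty_gzip
is hypothetical. The bytes that gzclose() actually writes may differ
in those fields.

```c
#include <stddef.h>
#include <string.h>

/* Write a minimal empty gzip file into out and return its length. */
static size_t empty_gzip(unsigned char out[20])
{
    static const unsigned char bytes[20] = {
        0x1f, 0x8b,             /* gzip magic number */
        0x08,                   /* compression method: deflate */
        0x00,                   /* flags: none */
        0x00, 0x00, 0x00, 0x00, /* modification time: not set (assumed) */
        0x00,                   /* extra flags (assumed) */
        0x03,                   /* OS: Unix (assumed) */
        0x03, 0x00,             /* final static block, end-of-block only */
        0x00, 0x00, 0x00, 0x00, /* CRC-32 of zero bytes of data */
        0x00, 0x00, 0x00, 0x00  /* uncompressed length: 0 */
    };

    memcpy(out, bytes, sizeof bytes);
    return sizeof bytes;
}
```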
Mark Adler [Fri, 24 Aug 2012 22:02:28 +0000 (15:02 -0700)]
Fix uninitialized value bug in gzputc() introduced by const patches.
Avoid the use of an uninitialized value when the write buffers have
not been initialized. A recent change to avoid the use of
strm->next_in in order to resolve some const conflicts added the use
of state->in in its place. This patch avoids the use of state->in
when it is not initialized. Nothing bad would actually happen,
since two variables set to the same uninitialized value are
subtracted. However, valgrind was rightly complaining, so this
fixes that.
Mark Adler [Sun, 19 Aug 2012 00:59:50 +0000 (17:59 -0700)]
Avoid shift equal to bits in type (caused endless loop).
Also clean up comparisons between different types, and some odd
indentation problems that showed up somehow.
A new endless loop was introduced by the clang compiler, which
apparently does odd things when the right operand of << is equal to
or greater than the number of bits in the type. The C standard in
fact states that the behavior of << is undefined in that case. The
loop was rewritten to use single-bit shifts.
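A loop built from single-bit shifts might look like the following
sketch, which finds the largest unsigned value without ever shifting
by the full width of the type. The function name
max_unsigned_by_shifts is made up for this example and is not claimed
to be the code in the patch.

```c
/* Largest value of unsigned int, found with single-bit shifts only.
 * No shift count ever reaches the width of the type, so every step
 * has defined behavior. */
static unsigned max_unsigned_by_shifts(void)
{
    unsigned p = 1, q;

    do {
        q = p;
        p <<= 1; /* shift by one bit; the top bit is simply discarded */
        p++;     /* fill in the low bit */
    } while (p > q); /* p stops growing once every bit is set */
    return q;
}
```

Unsigned arithmetic wraps by definition, so the final iteration, where
the top bit is shifted out, is well defined and terminates the loop.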
Mark Adler [Mon, 13 Aug 2012 01:08:52 +0000 (18:08 -0700)]
Clean up the usage of z_const and respect const usage within zlib.
This patch allows zlib to compile cleanly with the -Wcast-qual gcc
warning enabled, but only if ZLIB_CONST is defined, which adds
const to next_in and msg in z_stream and in the in_func prototype.
A --const option is added to ./configure which adds -DZLIB_CONST
to the compile flags, and adds -Wcast-qual to the compile flags
when ZLIBGCCWARN is set in the environment.
Mark Adler [Sun, 10 Jun 2012 05:42:24 +0000 (22:42 -0700)]
Fix configure check for veracity of compiler error return codes.
There were two problems before that this fixes. One was that the
check for the compiler error return code preceded the determination
of the compiler and its options. The other was that the checks
for compiler and library characteristics could be fooled if the
error options were set to reject K&R-style C. configure now aborts
if the compiler produces a hard error on K&R-style C.
In addition, aborts of configure are now consistent, and remove
any temporary files.
Mark Adler [Sun, 10 Jun 2012 02:15:36 +0000 (19:15 -0700)]
On Darwin, only use /usr/bin/libtool if libtool is not Apple.
The original change was to always use /usr/bin/libtool on Darwin,
in order to avoid using a GNU libtool installed by the user in the
path ahead of Apple's libtool. However someone might install a
more recent Apple libtool ahead of /usr/bin/libtool. This commit
checks to see if libtool is Apple, and uses /usr/bin/libtool if it
isn't.
Mark Adler [Sun, 3 Jun 2012 19:45:55 +0000 (12:45 -0700)]
Use _snprintf for snprintf in Microsoft C.
More than a decade later, Microsoft C does not support the C99
standard. It's good that _snprintf has a different name, since it
does not guarantee that the result is null terminated, as does
snprintf. However where _snprintf is used under Microsoft C, the
destination string is assured to be long enough, so this will not
be a problem. This occurs in two places, both in gzlib.c. Where
sprintf functionality is needed by gzprintf, vsnprintf is used in
the case of Microsoft C.
Mark Adler [Thu, 3 May 2012 06:18:38 +0000 (23:18 -0700)]
Replace use of unsafe string functions with snprintf if available.
This avoids warnings in OpenBSD that apparently can't be turned
off whenever you link strcpy, strcat, or sprintf. When snprintf
isn't available, the use of the "unsafe" string functions has
always in fact been safe, since the lengths are all checked before
those functions are called.
We do not use strlcpy or strlcat, since they are not (yet) found on
all systems. snprintf on the other hand is part of the C standard
library and is very common.
Mark Adler [Sun, 29 Apr 2012 23:18:12 +0000 (16:18 -0700)]
Fix type mismatch between get_crc_table() and crc_table.
crc_table is made using a four-byte integer (when that can be
determined). However get_crc_table() returned a pointer to an
unsigned long, which could be eight bytes. This fixes that by
creating a new z_crc_t type for the crc_table.
This type is also used for the BYFOUR crc calculations that depend
on a four-byte type. The four-byte type can now be determined by
./configure, which also solves a problem where ./configure --solo
would never use BYFOUR. Now the Z_U4 #define indicates that a
four-byte integer was found either by ./configure or by zconf.h.