Mika Lindqvist [Mon, 4 May 2015 17:08:12 +0000 (20:08 +0300)]
Fix building outside source tree
* SRCDIR points to relevant directory in source tree
* SRCTOP points to root of source tree in sub-Makefiles
* arch Makefiles use INCLUDES, and configure sets that depending on whether
we are building inside or outside the source tree.
* distclean cleans all files when building outside source tree
Daniel Axtens [Fri, 1 May 2015 05:56:21 +0000 (15:56 +1000)]
x86: Do not try X86_QUICK_STRATEGY without HAVE_SSE2_INTRIN
QUICK depends on fill_window_sse, and fails to link without it.
Therefore, disable QUICK_STRATEGY if we lack SSE2 support.
This could easily be worked around by making the QUICK code
fall back to regular fill_window, but it's probably not important:
if you care about speed you probably have SSE2.
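For illustration, the dependency could be expressed as a compile-time guard like
the sketch below (the macro and function names come from the commit message; the
commit itself simply disables the option during configuration rather than erroring):

    /* Sketch only: deflate_quick relies on fill_window_sse(), which is only
     * built when SSE2 intrinsics are available, so the two options must go
     * together. */
    #if defined(X86_QUICK_STRATEGY) && !defined(HAVE_SSE2_INTRIN)
    #  error "X86_QUICK_STRATEGY requires HAVE_SSE2_INTRIN (fill_window_sse)"
    #endif
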
Daniel Axtens [Fri, 1 May 2015 05:30:05 +0000 (15:30 +1000)]
Remove unneeded and confusing alwaysinline
alwaysinline expands to __attribute__((always_inline)).
This does not force gcc to inline the function. Instead, it allows gcc to
inline the function even when compiled without optimisations. (Normally, inline
functions are only inlined when compiled with optimisations.)[0]
alwaysinline was only used for bulk_insert_str, and it seems to have been used
in an attempt to force the function to be inlined. That won't work.
Furthermore, bulk_insert_str wasn't even declared inline, causing warnings.
Remove alwaysinline and replace with inline.
Remove the #defines, as they're no longer used.
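A minimal sketch of the difference (the helper functions and their bodies are
placeholders, not the real bulk_insert_str):

    #include <stdio.h>

    /* The old macro: expands to the GCC attribute, which by itself does not
     * make a function inline. */
    #define alwaysinline __attribute__((always_inline))

    /* Before (sketch): attribute without the inline keyword; GCC warns that
     * an always_inline function might not be inlinable. */
    static void alwaysinline insert_old(int *h, int pos) { h[pos % 16] = pos; }

    /* After (sketch): plain inline is all that is needed here. */
    static inline void insert_new(int *h, int pos) { h[pos % 16] = pos; }

    int main(void) {
        int head[16] = {0};
        insert_old(head, 3);
        insert_new(head, 5);
        printf("%d %d\n", head[3], head[5]);
        return 0;
    }
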
Mika Lindqvist [Wed, 29 Apr 2015 13:49:43 +0000 (16:49 +0300)]
* Remove assembler targets and OBJA from Visual Studio makefile
* Fix creating manifest files in Visual Studio makefile
* Add missing dependency information for match.obj in Visual Studio makefile
hansr [Thu, 6 Nov 2014 19:40:56 +0000 (20:40 +0100)]
Drop support for old systems in configure. The remaining ones should
ideally be tested by someone familiar with them and a decision made
whether to keep/remove/update the detection and settings for them.
hansr [Wed, 5 Nov 2014 12:54:07 +0000 (13:54 +0100)]
Remove support for ASMV and ASMINF defines and clean up match.c handling.
This makes it easier to implement support for ASM replacements using
configure parameters if needed later. Also since zlib-ng uses
compiler intrinsics, this needed a cleanup in any case.
Testing on a Raspberry Pi shows that -DUNALIGNED_OK and -DCRC32_UNROLL_LESS
both give a consistent performance gain, so enable these on the armv6 arch.
Also enable -DADLER32_UNROLL_LESS on the untested assumption that it will
be faster as well.
hansr [Tue, 14 Oct 2014 08:01:18 +0000 (10:01 +0200)]
Merge x86 and x86_64 handling in configure.
Add parameter to disable new strategies.
Add parameter to disable arch-specific optimizations.
(This is just the first few steps, more changes needed)
Shuxin Yang [Sun, 20 Apr 2014 22:50:33 +0000 (15:50 -0700)]
Minor enhancement to the put_short() macro. This change saw a marginal speedup
(about 0% to 3% depending on the compression level and input). I guess
the speedup likely arises from the following facts:
1) "s->pending" is now loaded once and stored once. In the original
implementation, it needed to be loaded and stored twice because the
compiler isn't able to disambiguate "s->pending" and
"s->pending_buf[]".
2) better code generation:
2.1) no instructions are needed for extracting the two bytes from a short.
2.2) fewer registers are needed.
2.3) stores to adjacent bytes are merged into a single store, albeit
at the cost of a potentially unaligned access.
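A rough sketch of the idea, assuming a little-endian target where an unaligned
16-bit store is acceptable (the exact macro in the tree may differ):

    typedef unsigned char uch;
    typedef unsigned short ush;

    /* Original zlib-style macro (sketch): two byte stores, so s->pending is
     * loaded and stored twice. */
    #define put_short_old(s, w) { \
        (s)->pending_buf[(s)->pending++] = (uch)((w) & 0xff); \
        (s)->pending_buf[(s)->pending++] = (uch)((ush)(w) >> 8); \
    }

    /* Enhanced macro (sketch): one 16-bit store and one update of s->pending.
     * Assumes little-endian byte order and tolerable unaligned access. */
    #define put_short_new(s, w) { \
        *(ush *)(&(s)->pending_buf[(s)->pending]) = (ush)(w); \
        (s)->pending += 2; \
    }

The single wider store is what enables points 1) and 2.3) above; the potentially
unaligned access noted in 2.3) is the trade-off.
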
Shuxin Yang [Tue, 18 Mar 2014 01:17:23 +0000 (18:17 -0700)]
Restructure the loop, and see about a 3% speedup in run time. I believe the
speedup arises from:
o. Removing the conditional branch in the loop.
o. Removing some indirect memory accesses:
The memory accesses to "s->prev_length" and "s->strstart" cannot be promoted
to registers because the compiler is not able to disambiguate them from the
store operation in INSERT_STRING().
o. Converting a non-countable loop to a countable loop.
I'm not sure how much this change really contributes; in general, a countable
loop is much easier to optimize than a non-countable one.
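A self-contained toy illustrating the transformation (the struct, helper macro
and numbers are made up; the real change is in deflate's match-insertion loop):

    #include <stdio.h>

    #define HASH_SIZE 64
    typedef struct { unsigned strstart, prev_length, head[HASH_SIZE]; } state;

    /* Stand-in for INSERT_STRING(): records position p in a toy hash table. */
    #define INSERT_POS(s, p) ((s)->head[(p) % HASH_SIZE] = (p))

    /* Before (sketch): non-countable do-while; the trip count is a struct
     * field decremented in the body, and a bound check runs every pass. */
    static void insert_noncountable(state *s, unsigned max_insert) {
        do {
            if (++s->strstart <= max_insert)
                INSERT_POS(s, s->strstart);
        } while (--s->prev_length != 0);
    }

    /* After (sketch): hoist the fields into locals so the trip count is a
     * loop invariant, and do the bound check once, outside the loop. */
    static void insert_countable(state *s, unsigned max_insert) {
        unsigned total = s->prev_length, start = s->strstart;
        unsigned inserts = total;
        if (start + inserts > max_insert)
            inserts = (max_insert > start) ? max_insert - start : 0;
        for (unsigned i = 1; i <= inserts; i++)
            INSERT_POS(s, start + i);
        s->strstart = start + total;
        s->prev_length = 0;
    }

    int main(void) {
        state a = { .strstart = 10, .prev_length = 5 }, b = a;
        insert_noncountable(&a, 12);
        insert_countable(&b, 12);
        printf("%u %u\n", a.strstart, b.strstart);  /* both print 15 */
        return 0;
    }
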
shuxinyang [Mon, 10 Mar 2014 00:20:02 +0000 (17:20 -0700)]
Rewrite the loops such that gcc can vectorize them using saturated subtraction
on the x86-64 architecture. This speeds up performance by some 7% on my Linux
box with a Core i7.
The original loops are legal to vectorize; gcc 4.7.* and 4.8.* somehow fail
to catch this case. There is still room to squeeze more performance out of
the vectorized code. However, since these loops now account for only about
1.5% of execution time, it is not worthwhile to squeeze out that performance
by hand-writing assembly.
The original loops are guarded with "#ifdef NOT_TWEAK_COMPILER". By default,
the modified version is picked up unless the code is compiled explicitly
with -DNOT_TWEAK_COMPILER.
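A sketch of the kind of rewrite described, using a hash-slide style loop (the
table size and test harness are illustrative):

    #include <stdio.h>

    typedef unsigned short Pos;   /* 16-bit hash-chain entries, as in deflate */
    #define HASH_SIZE 1024

    /* Original zlib-style slide (sketch): backwards do-while over a pointer,
     * which gcc 4.7.*/4.8.* did not vectorize. */
    static void slide_original(Pos *table, unsigned n, unsigned wsize) {
        Pos *p = table + n, m;
        do {
            m = *--p;
            *p = (Pos)(m >= wsize ? m - wsize : 0);
        } while (--n);
    }

    /* Rewritten (sketch): a plain counted forward loop whose body is exactly
     * "subtract wsize, clamping at zero", the shape that maps onto packed
     * unsigned saturating subtraction (e.g. psubusw) on x86-64. */
    static void slide_vectorizable(Pos *table, unsigned n, unsigned wsize) {
        unsigned i;
        for (i = 0; i < n; i++) {
            Pos m = table[i];
            table[i] = (Pos)(m >= wsize ? m - wsize : 0);
        }
    }

    int main(void) {
        Pos a[HASH_SIZE], b[HASH_SIZE];
        unsigned i;
        for (i = 0; i < HASH_SIZE; i++)
            a[i] = b[i] = (Pos)(i * 37);
        slide_original(a, HASH_SIZE, 16384);
        slide_vectorizable(b, HASH_SIZE, 16384);
        for (i = 0; i < HASH_SIZE; i++)
            if (a[i] != b[i]) { puts("mismatch"); return 1; }
        puts("ok");
        return 0;
    }
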
Jim Kukunas [Thu, 18 Jul 2013 22:45:18 +0000 (15:45 -0700)]
deflate: add new deflate_medium strategy
From: Arjan van de Ven <arjan@linux.intel.com>
As the name suggests, the deflate_medium deflate strategy is designed
to provide an intermediate strategy between deflate_fast and deflate_slow.
After finding two adjacent matches, deflate_medium scans left from
the second match in order to determine whether a better match can be
formed.
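A toy sketch of what scanning left from the second match can look like (this
is illustrative only, not the actual deflate_medium code; the match_t type and
the extend_left helper are made up):

    #include <stdio.h>

    #define MAX_MATCH 258

    typedef struct {
        unsigned start;        /* where the match begins in the window     */
        unsigned match_start;  /* where the earlier data it copies begins  */
        unsigned len;          /* current match length                     */
    } match_t;

    /* Grow the second match to the left: as long as the byte just before the
     * match still equals the byte just before its reference, start one byte
     * earlier. The bytes gained can make the pair of adjacent matches better
     * than what deflate_fast would have emitted. */
    static void extend_left(match_t *next, const unsigned char *win) {
        while (next->start > 0 && next->match_start > 0 &&
               next->len < MAX_MATCH &&
               win[next->start - 1] == win[next->match_start - 1]) {
            next->start--;
            next->match_start--;
            next->len++;
        }
    }

    int main(void) {
        const unsigned char win[] = "abcabcabc";
        match_t next = { .start = 6, .match_start = 3, .len = 3 };
        extend_left(&next, win);
        printf("start=%u match_start=%u len=%u\n",
               next.start, next.match_start, next.len);  /* 3, 0, 6 */
        return 0;
    }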