Matheus Castanho [Wed, 27 May 2020 13:06:09 +0000 (10:06 -0300)]
Add optimized slide_hash for POWER processors
This commit introduces a new slide_hash function that
uses VSX vector instructions to slide 8 hash elements at a time,
instead of just one as the standard code does.
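The idea can be sketched in portable C. This is not the commit's VSX code. It assumes zlib's 16-bit `Pos` entries and zlib's scalar rule (`m >= wsize ? m - wsize : 0`). It uses the GCC/Clang `vector_size` extension as a stand-in for POWER vector intrinsics, so one 128-bit vector holds 8 hash entries and all 8 are slid per iteration. `slide_hash_vec8` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t Pos;  /* 16-bit hash chain entries, as in zlib */

/* Scalar reference: slides one entry per iteration. */
static void slide_hash_scalar(Pos *table, size_t entries, uint16_t wsize) {
    for (size_t i = 0; i < entries; i++) {
        Pos m = table[i];
        table[i] = (Pos)(m >= wsize ? m - wsize : 0);
    }
}

/* Eight 16-bit lanes = one 128-bit vector (GCC/Clang vector extension). */
typedef uint16_t v8u16 __attribute__((vector_size(16)));

/* Hypothetical vector version: slides 8 entries per iteration. */
static void slide_hash_vec8(Pos *table, size_t entries, uint16_t wsize) {
    const v8u16 w = {wsize, wsize, wsize, wsize, wsize, wsize, wsize, wsize};
    assert(entries % 8 == 0);  /* assumption: table size is a multiple of 8 */
    for (size_t i = 0; i < entries; i += 8) {
        v8u16 m;
        memcpy(&m, table + i, sizeof(m));       /* load 8 entries */
        v8u16 keep = (v8u16)(m >= w);           /* all-ones lanes where m >= wsize */
        v8u16 slid = (m - w) & keep;            /* m - wsize there, 0 elsewhere */
        memcpy(table + i, &slid, sizeof(slid)); /* store 8 entries */
    }
}
```

The mask-and-subtract form avoids a per-lane branch. This is what makes the operation a good fit for SIMD.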
Matheus Castanho [Mon, 25 May 2020 21:10:29 +0000 (18:10 -0300)]
Preparation for POWER optimizations
Add the scaffolding for future optimizations for POWER processors. The
build can now correctly detect multiple processor sub-architectures
(ppc, ppc64 and ppc64le). It can also detect whether the features
needed for the optimizations are available at build time and at runtime.
With these changes, adding a new optimized function for POWER should be
as simple as adding a new file under arch/power/, appending build
instructions to the build files and editing functable.c accordingly.
The UNALIGNED_OK flag is now also added by default for powerpc64le
targets.
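The dispatch pattern behind "editing functable.c accordingly" can be sketched as a table of function pointers. The table is filled once, and an optimized variant is chosen only if it was compiled in and the running CPU supports it. Every name here is hypothetical, including `SKETCH_HAVE_VSX`, `struct cpu_features` and `slide_hash_power8`. On Linux, the runtime check for POWER8 (ISA 2.07) is commonly done with `getauxval(AT_HWCAP2)` and `PPC_FEATURE2_ARCH_2_07`. This sketch stubs that check out with a plain flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Pos;

/* Generic C fallback, always compiled. */
static void slide_hash_c(Pos *table, size_t entries, uint16_t wsize) {
    for (size_t i = 0; i < entries; i++)
        table[i] = (Pos)(table[i] >= wsize ? table[i] - wsize : 0);
}

#ifdef SKETCH_HAVE_VSX  /* hypothetical flag the build system would set */
void slide_hash_power8(Pos *table, size_t entries, uint16_t wsize); /* would live under arch/power/ */
#endif

/* Hypothetical result of runtime CPU detection. */
struct cpu_features {
    int has_arch_2_07;  /* POWER8 (ISA 2.07) or later */
};

struct functable_s {
    void (*slide_hash)(Pos *table, size_t entries, uint16_t wsize);
};

static struct functable_s functable;

/* Pick the best implementation that was built AND is supported at runtime. */
static void functable_init(const struct cpu_features *cf) {
    assert(cf != NULL);
    functable.slide_hash = slide_hash_c;
#ifdef SKETCH_HAVE_VSX
    if (cf->has_arch_2_07)
        functable.slide_hash = slide_hash_power8;
#else
    (void)cf;  /* no optimized variant was built for this target */
#endif
}
```

Callers always go through `functable.slide_hash(...)`. Adding a new optimized variant only touches the new file, the build files and `functable_init`.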
Fixed multi-line assembly macro in UPDATE_HASH for MSVC when using ClangCl.
https://docs.microsoft.com/en-us/cpp/assembler/inline/defining-asm-blocks-as-c-macros
insert_string_tpl.h(44,5): error : cannot use more than one symbol in memory operand
insert_string_sse.c(28,13): message : expanded from macro 'UPDATE_HASH'
Fixed casting warnings in trees.c caused by count being int while freq is uint16_t.
Changed all counting variables to uint16_t.
trees.c(431,45): warning C4244: '+=': conversion from 'int' to 'uint16_t', possible loss of data
Fixed casting warnings in calls to put_short and put_short_msb.
deflate.c(845,32): warning C4244: 'function': conversion from 'unsigned int' to 'uint16_t', possible loss of data
deflate.c(894,50): warning C4244: 'function': conversion from 'unsigned int' to 'uint16_t', possible loss of data
Fixed casting warnings when copying len into pending_buf by replacing the byte-by-byte writes with calls to put_short.
deflate.c(1413,45): warning C4244: '=': conversion from 'unsigned int' to 'unsigned char', possible loss of data
deflate.c(1414,50): warning C4244: '=': conversion from 'unsigned int' to 'unsigned char', possible loss of data
deflate.c(1415,46): warning C4244: '=': conversion from 'unsigned int' to 'unsigned char', possible loss of data
deflate.c(1416,51): warning C4244: '=': conversion from 'unsigned int' to 'unsigned char', possible loss of data
deflate_p.h(42,37): warning C4244: '=': conversion from 'unsigned int' to 'unsigned char', possible loss of data
deflate_p.h(43,42): warning C4244: '=': conversion from 'unsigned int' to 'unsigned char', possible loss of data
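Below is a sketch of what a `put_short`-style helper does. It assumes the four byte writes follow zlib's `deflate_stored()`, where they hold a stored block's LEN and NLEN. RFC 1951 stores NLEN as the one's complement of LEN, and both are 16-bit values written least significant byte first. `state_sketch` and `put_stored_len` are hypothetical, and this `put_short` is a simplified stand-in for the library's macro, not its definition. Casting to `uint16_t` once at the call boundary is what silences the narrowing warnings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for deflate_state. */
typedef struct {
    unsigned char pending_buf[64];
    uint32_t pending;
} state_sketch;

/* Simplified stand-in for put_short: append a 16-bit value, LSB first. */
static void put_short(state_sketch *s, uint16_t w) {
    assert(s->pending + 2 <= sizeof(s->pending_buf));
    s->pending_buf[s->pending++] = (unsigned char)(w & 0xff);
    s->pending_buf[s->pending++] = (unsigned char)(w >> 8);
}

/* Write a stored block's LEN and NLEN with explicit 16-bit casts. */
static void put_stored_len(state_sketch *s, unsigned int len) {
    put_short(s, (uint16_t)len);
    put_short(s, (uint16_t)~len);
}
```

For `len = 5`, this writes the bytes `05 00 FA FF`.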
Fixed signedness warnings in calls to __cpuid and __cpuidex.
x86.c(29,22): warning C4057: 'function': 'int *' differs in indirection to slightly different base types from 'unsigned int [4]'
x86.c(43,24): warning C4057: 'function': 'int *' differs in indirection to slightly different base types from 'unsigned int [4]'
Added ability to set window bits for switchlevels.
Call deflateInit with the first level. This is necessary for testing deflate_quick, where the initial window size is set to 8K.
Fixed a zero-length stored block being left open when using Z_SYNC_FLUSH. Moved the toggling of block_open back into deflate_quick, since a zero-length stored block doesn't emit an end-of-block code.
Unroll compare258_c further to improve performance.
Unify the length count variable across all compare256 variants.
Return early instead of using break, for possible performance improvements.
Split the tree-emitting code into its own header, included by both trees.c and deflate_quick.c, so that its functions can be statically linked into each for performance reasons.
Removed TRIGGER_LEVEL byte masking from INSERT_STRING and UPDATE_HASH due to poor performance on levels 6 and 9, especially with optimized versions of UPDATE_HASH.
Standardize insert_string functionality across architectures. Add conditionally compiled unaligned-access code for insert_string and quick_insert_string. Unify the SSE4.2 crc32 assembly between insert_string and quick_insert_string. Modify quick_insert_string to work across architectures.
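Below is a sketch of an insert_string whose 4-byte read is conditionally compiled on `UNALIGNED_OK`. With the flag, it does one possibly unaligned 32-bit load through `memcpy`. Without it, it assembles the bytes one at a time, so no unaligned access is needed. The multiplicative hash, `HASH_BITS`, and all function names are hypothetical; zlib-ng's real hash differs, for example the SSE4.2 crc32 variant. The two read paths may produce different values on big-endian targets, which is fine because a build uses only one of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 12  /* hypothetical table size: 4096 buckets */

/* Read 4 bytes starting at p. */
static uint32_t read_4bytes(const unsigned char *p) {
#ifdef UNALIGNED_OK
    uint32_t v;
    memcpy(&v, p, sizeof(v));  /* single 32-bit load in native byte order */
    return v;
#else
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
#endif
}

/* Hypothetical multiplicative hash keeping the top HASH_BITS bits. */
static uint32_t hash4(uint32_t v) {
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Link position pos into its hash chain: prev remembers the old head. */
static void insert_string_sketch(uint16_t *head, uint16_t *prev, uint16_t wmask,
                                 const unsigned char *window, uint32_t pos) {
    uint32_t h = hash4(read_4bytes(window + pos));
    assert(h < (1u << HASH_BITS));
    prev[pos & wmask] = head[h];
    head[h] = (uint16_t)pos;
}
```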
Fixed a segmentation fault in deflate_quick() when switching levels using deflateParams. Deflate would be initialized via deflateInit with a window size greater than 8K. deflateParams was then called to switch to level 1 without updating w_size. The fault occurred because deflate_quick did not check dist against the w_size bounds when accessing quick_dist_codes.
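A minimal sketch of the kind of bounds check this fix describes. The table layout, `QUICK_MAX_DIST` and the function names are hypothetical. The table holds RFC 1951 distance codes only for distances up to 8K, so any larger `dist` must be rejected before indexing, and the caller would emit a literal instead. The code formula follows RFC 1951: codes 0-3 cover distances 1-4, and each later pair of codes doubles the range.

```c
#include <assert.h>
#include <stdint.h>

#define QUICK_MAX_DIST 8192u  /* hypothetical: table covers an 8K window only */

static uint8_t quick_dist_codes_sketch[QUICK_MAX_DIST];

/* RFC 1951 distance code for dist in 1..32768. */
static unsigned dist_code_rfc1951(unsigned dist) {
    unsigned d, nb = 0;
    assert(dist >= 1);
    d = dist - 1;
    if (d < 4)
        return d;
    while ((d >> (nb + 1)) != 0)
        nb++;  /* nb = floor(log2(d)) */
    return 2 * nb + ((d >> (nb - 1)) & 1);
}

static void init_quick_dist_codes(void) {
    for (unsigned dist = 1; dist <= QUICK_MAX_DIST; dist++)
        quick_dist_codes_sketch[dist - 1] = (uint8_t)dist_code_rfc1951(dist);
}

/* Returns -1 when dist is outside the window or the table. */
static int quick_dist_code(unsigned dist, unsigned w_size) {
    if (dist == 0 || dist > w_size || dist > QUICK_MAX_DIST)
        return -1;
    return quick_dist_codes_sketch[dist - 1];
}
```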
Pavel P [Tue, 24 Mar 2020 03:25:08 +0000 (09:25 +0600)]
Avoid unnecessary include of windows.h from zbuild.h
zbuild.h is included by every .c file of zlib-ng. This forces every translation unit to parse all of the Windows system headers just to be able to typedef ssize_t. This change removes the windows.h include from zbuild.h and instead defines ssize_t inline, equivalent to the definitions in windows.h.
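Below is a sketch of such an inline definition, assuming the goal is to mirror BaseTsd.h. There, `SSIZE_T` is `LONG_PTR`, which is `__int64` on Win64 and `long` on 32-bit Windows. It assumes that only MSVC-style toolchains (`_MSC_VER`, which clang-cl also defines) lack `ssize_t`, while POSIX systems and MinGW get it from `<sys/types.h>`. This is an illustration, not zlib-ng's zbuild.h.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER)
/* Same widths as BaseTsd.h's SSIZE_T, without pulling in windows.h. */
#  ifdef _WIN64
typedef __int64 ssize_t;
#  else
typedef long ssize_t;
#  endif
#else
#  include <sys/types.h>  /* POSIX systems and MinGW provide ssize_t here */
#endif

static_assert(sizeof(ssize_t) == sizeof(size_t), "ssize_t should match size_t in width");
```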
Reduce the size of the 'match' struct to 8 bytes. This allows two
structs to fit into a single cacheline, resulting in a measurable speedup
in deflate_medium.
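An 8-byte layout is possible because every field fits in 16 bits. Window positions stay below 2 × 32K = 64K, and deflate match lengths never exceed 258. The field names below are hypothetical; the point is four `uint16_t` members and a compile-time size check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte match record: four 16-bit fields. */
struct match {
    uint16_t match_start;   /* window position of the matched string */
    uint16_t match_length;  /* at most 258 */
    uint16_t strstart;      /* current position in the window */
    uint16_t orgstart;      /* original start before any adjustment */
};

static_assert(sizeof(struct match) == 8, "struct match should be 8 bytes");
```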