Add optimized crc32 for POWER8 and later processors

This commit adds an optimized version of the crc32 function based
on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/ .
The code has been relicensed to the zlib license.
This C implementation was written by Rogerio Alves <rogealve@br.ibm.com>.
It uses vector instructions to speed up the CRC32 algorithm. Decompression
times improved by 30% in tests.
Based on Daniel Black's work for the original zlib (madler/zlib#478).
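For reference, here is a small sketch (not the crc32-vpmsum code) of the two
pieces the vector version builds on. The first is a bitwise CRC-32 using zlib's
reflected polynomial 0xEDB88320, which any accelerated path must match bit for
bit. The second is a portable model of the carry-less (GF(2)) multiply that the
POWER8 vpmsumd instruction performs on two doubleword pairs before XORing the
products. The names crc32_ref, clmul64 and vpmsumd_model are hypothetical, and
the model ignores vector element ordering and the folding constants the real
code uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference bitwise CRC-32 with zlib's crc32() semantics: pass 0 as the
 * initial value, or a previous result to continue a running CRC. */
static uint32_t crc32_ref(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            /* shift right; XOR in the polynomial if the dropped bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Carry-less multiply of two 64-bit polynomials over GF(2); the 128-bit
 * product is returned as hi:lo. Partial products are XORed, not added. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            l ^= a << i;
            if (i != 0)
                h ^= a >> (64 - i); /* bits shifted past bit 63 */
        }
    }
    *hi = h;
    *lo = l;
}

/* Hypothetical model of vpmsumd: two carry-less products, XORed together
 * (the "multiply-sum"). Element ordering is deliberately ignored here. */
static void vpmsumd_model(const uint64_t a[2], const uint64_t b[2],
                          uint64_t *hi, uint64_t *lo)
{
    uint64_t h0, l0, h1, l1;
    clmul64(a[0], b[0], &h0, &l0);
    clmul64(a[1], b[1], &h1, &l1);
    *hi = h0 ^ h1;
    *lo = l0 ^ l1;
}
```

Usage: crc32_ref(0, (const unsigned char *)"123456789", 9) gives the standard
CRC-32 check value 0xCBF43926, and splitting the input across two calls gives
the same result. The vector code gets its speed by using vpmsumd to fold large
blocks of input with precomputed constants instead of stepping one bit or byte
at a time.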