Uses SSE2 saturating subtraction to shift the hash table in
16-byte chunks. Renames the old fill_window implementation to
fill_window_c(), and adds a new fill_window_sse() implementation
in fill_window_sse.c.
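
For illustration only, a minimal sketch of the saturating-subtraction idea, not the code in fill_window_sse.c. Hash entries are 16-bit positions; sliding the window subtracts the window size from each one and clamps at zero. _mm_subs_epu16 does this for eight entries per 16-byte register. The name slide_hash_sketch and its signature are hypothetical, and the scalar loop is there to handle leftover entries and to let the sketch build where SSE2 is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: subtract wsize from each of n 16-bit entries,
 * clamping at 0 so that positions older than the window become "empty". */
static void slide_hash_sketch(uint16_t *table, size_t n, uint16_t wsize)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast wsize into all eight 16-bit lanes. */
    const __m128i vw = _mm_set1_epi16((short)wsize);
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(table + i));
        v = _mm_subs_epu16(v, vw);   /* unsigned subtract, saturates at 0 */
        _mm_storeu_si128((__m128i *)(table + i), v);
    }
#endif
    /* Scalar tail (and the whole loop when SSE2 is not available). */
    for (; i < n; i++)
        table[i] = (uint16_t)(table[i] >= wsize ? table[i] - wsize : 0);
}
```

The saturation replaces the per-entry compare-and-branch of the scalar loop, which is where the 16-byte chunking pays off.
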
Moves UPDATE_HASH into deflate.h and changes the scope of
read_buf from local to ZLIB_INTERNAL for sharing between
the two implementations.
Updates the configure script to check for SSE2 intrinsics and enables
this optimization by default on x86. The runtime check for SSE2 support
is performed only on 32-bit x86, because every x86_64 CPU supports SSE2. Adds an explicit
rule in Makefile.in to build fill_window_sse.c with the -msse2 compiler
flag, which is required for SSE2 intrinsics.
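
A rough sketch of how the choice between the two implementations could be made, under stated assumptions. The stub functions, cpu_has_sse2() and fill_window_dispatch() are hypothetical names used only for this illustration and are not taken from the patch. The 32-bit branch assumes a GCC-compatible compiler that provides __builtin_cpu_supports.

```c
/* Hypothetical stand-ins for fill_window_c() and fill_window_sse();
 * each returns a tag so the caller can see which path was taken. */
static const char *fill_window_c_stub(void)   { return "c"; }
static const char *fill_window_sse_stub(void) { return "sse2"; }

/* Hypothetical probe: 1 if the SSE2 path may be used, else 0. */
static int cpu_has_sse2(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return 1;   /* SSE2 is part of the x86_64 baseline: no runtime check */
#elif defined(__i386__) && defined(__GNUC__)
    return __builtin_cpu_supports("sse2") != 0;   /* 32-bit x86: ask the CPU */
#else
    return 0;   /* non-x86 or unknown compiler: use the C path */
#endif
}

/* Pick the implementation once per call; a real build could cache the result. */
static const char *fill_window_dispatch(void)
{
    return cpu_has_sse2() ? fill_window_sse_stub() : fill_window_c_stub();
}
```

Only the file containing the SSE2 code needs -msse2; the dispatcher itself can be built with the default flags, so the rest of the library still runs on CPUs without SSE2.
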