wc: use avx2 optimization when counting only lines
Use cpuid to detect CPU support for avx2 instructions.
Performance was seen to improve by 5x for a file with only newlines,
while the performance for a file with no such characters is unchanged.
* configure.ac [USE_AVX2_WC_LINECOUNT]: A new conditional,
set when __get_cpuid_count() and avx2 compiler intrinsics are supported.
* src/wc.c (avx2_supported): A new function using __get_cpuid_count()
to determine if avx2 instructions are supported.
(wc_lines): A new function refactored from wc(),
which implements the standard line counting logic,
and provides the fallback implementation for when avx2 is not supported.
* src/wc_avx2.c: A new module to implement using avx2 intrinsics.
* src/local.mk: Reference the new module. Note we build as a separate
lib so that it can be portably built with separate -mavx2 etc. flags.