After commit e0190a9d1 (wc: improve aarch64 Neon optimization for
'wc -l', 2026-03-09), on an Ampere eMAG machine:
$ yes | head -n 10000000000 > input
$ (time ./src/wc -l input)
10000000000 input
real 0m3.447s
user 0m1.533s
sys 0m1.913s
$ (export GLIBC_TUNABLES='glibc.cpu.hwcaps=-ASIMD,-AVX2,-AVX512F'; \
time ./src/wc -l input)
10000000000 input
real 0m15.758s
user 0m14.039s
sys 0m1.720s
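As a rough check on the NEWS wording, the speedup can be taken as the ratio
of the two 'real' times above. The one-liner below is only an illustrative
sketch and not part of this change; the variable name 'ratio' is arbitrary,
and the two constants are copied from the timings above:

```shell
# Ratio of wall-clock times: ASIMD masked via GLIBC_TUNABLES vs. default run.
ratio=$(awk 'BEGIN { printf "%.2f", 15.758 / 3.447 }')
echo "$ratio"   # prints 4.57
```

This is a single run on a single machine, so the figure is an upper-end
observation rather than a general guarantee, hence "up to" in NEWS.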
* NEWS: Mention the improved benchmark.
'shuf -i' now operates up to two times faster on systems with unlocked stdio
functions.
- 'wc -l' now operates up to three times faster on hosts that support Neon
- instructions.
+ 'wc -l' now operates up to four and a half times faster on hosts that support
+ Neon instructions.
'yes' now uses zero-copy I/O on Linux to significantly increase throughput.
E.g., increases from 12GiB/s to 175GiB/s were seen on some systems.