[huf] Add generic C versions of the fast decoding loops
author Nick Terrell <terrelln@fb.com>
Sat, 14 Jan 2023 00:34:52 +0000 (16:34 -0800)
committer Nick Terrell <nickrterrell@gmail.com>
Wed, 25 Jan 2023 21:47:51 +0000 (13:47 -0800)
commit 8957fef554e844ef724022075ffdf740464aa515
tree 4ca9fa3eaeef219cb743328cd10c8a73c54bb6f5
parent f3255bfeffe6e32c79d90a3757b0899e3cf6a7a9

Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my MacBook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast C decoding loops outperform the previous default implementation in
every configuration measured, but don't beat the x86-64 assembly. Additionally,
the fast C decoding loops suffer from the same stability problems that we've seen
in the past, which the assembly version doesn't have. So even though clang gets
close to assembly on x86-64, the C loops still have stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
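
To make the comparison in the table easier to check, the relative gains can be computed directly from the reported throughputs. The following is a minimal sketch, not part of the commit: the `BenchRow` type, the `NO_ASM` sentinel, and the helper names are hypothetical and exist only for this illustration, while the numbers are copied from the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sentinel for rows with no assembly result (N/A in the table). */
#define NO_ASM (-1.0)

/* One row of the benchmark table; all speeds are in MB/s. */
typedef struct {
    const char *arch;
    const char *function;
    const char *compiler;
    double default_mbps;
    double assembly_mbps;
    double fast_mbps;
} BenchRow;

/* Values copied verbatim from the table in the commit message. */
static const BenchRow bench_rows[] = {
    {"x86-64",  "4X1", "gcc-12.2.0",   1029.6, 1308.1, 1208.1},
    {"x86-64",  "4X1", "clang-14.0.6", 1019.3, 1305.6, 1276.3},
    {"x86-64",  "4X2", "gcc-12.2.0",   1348.5, 1657.0, 1374.1},
    {"x86-64",  "4X2", "clang-14.0.6", 1027.6, 1659.9, 1468.1},
    {"aarch64", "4X1", "clang-12.0.5", 1081.0, NO_ASM, 1234.9},
    {"aarch64", "4X2", "clang-12.0.5", 1270.0, NO_ASM, 1516.6},
};
static const size_t bench_row_count = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Percentage by which `value` exceeds `baseline` (negative if slower). */
double gain_percent(double baseline, double value) {
    assert(baseline > 0.0);
    return (value / baseline - 1.0) * 100.0;
}

/* Returns 1 if the fast C loop is faster than the default loop in every row. */
int fast_beats_default_everywhere(void) {
    for (size_t i = 0; i < bench_row_count; i++) {
        if (bench_rows[i].fast_mbps <= bench_rows[i].default_mbps) return 0;
    }
    return 1;
}

/* Returns 1 if the fast C loop beats the assembly loop in at least one row. */
int fast_beats_assembly_anywhere(void) {
    for (size_t i = 0; i < bench_row_count; i++) {
        const BenchRow *r = &bench_rows[i];
        if (r->assembly_mbps != NO_ASM && r->fast_mbps > r->assembly_mbps) return 1;
    }
    return 0;
}

/* Prints the fast loop's gain over default and, where available, over assembly. */
void print_gains(void) {
    for (size_t i = 0; i < bench_row_count; i++) {
        const BenchRow *r = &bench_rows[i];
        printf("%-8s %s %-13s fast vs default: %+6.1f%%",
               r->arch, r->function, r->compiler,
               gain_percent(r->default_mbps, r->fast_mbps));
        if (r->assembly_mbps != NO_ASM) {
            printf("  fast vs assembly: %+6.1f%%\n",
                   gain_percent(r->assembly_mbps, r->fast_mbps));
        } else {
            printf("  fast vs assembly: n/a\n");
        }
    }
}
```

Calling `print_gains()` from a `main` function prints one line per table row. For the first row (x86-64, 4X1, gcc) the fast loop comes out roughly 17% faster than the default loop and roughly 8% slower than the assembly loop, which matches the summary above: faster than default everywhere, never faster than assembly.
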

13 files changed:
lib/common/entropy_common.c
lib/common/huf.h
lib/common/portability_macros.h
lib/decompress/huf_decompress.c
lib/decompress/huf_decompress_amd64.S
lib/decompress/zstd_decompress.c
lib/decompress/zstd_decompress_block.c
lib/decompress/zstd_decompress_internal.h
lib/zstd.h
tests/fuzz/huf_decompress.c
tests/fuzz/huf_round_trip.c
tests/fuzzer.c
tests/zstreamtest.c