git.ipfire.org Git - thirdparty/zstd.git/commit
Improved seekable format ingestion speed for small frame size 3544/head
author Yann Collet <cyan@fb.com>
Fri, 10 Mar 2023 01:48:35 +0000 (17:48 -0800)
committer Yann Collet <cyan@fb.com>
Fri, 10 Mar 2023 02:00:30 +0000 (18:00 -0800)
commit 1df9f36c6c6cea08778d45a4adaf60e2433439a3
tree 157e3ea9aba23f96863661dbc88aee5b9876696d
parent d55a6483d7ec63b9175301d1849577f03a968ffb
Improved seekable format ingestion speed for small frame size

As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when the selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.

This is improved in this PR,
by providing approximate parameter adaptation to the compression process.
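
For illustration only, here is a minimal sketch of what adapting parameters to a small frame size can look like. It is not the code from this commit: `SketchParams`, `adapt_params`, `ceil_log2` and `SKETCH_WINDOWLOG_MIN` are hypothetical names, and the clamping rules are assumptions chosen to show the idea that match-finder tables sized for large inputs are wasteful when each frame holds only a few hundred bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter set for illustration; not zstd's actual structure. */
typedef struct {
    unsigned windowLog;  /* log2 of the match window size */
    unsigned hashLog;    /* log2 of the hash table size */
    unsigned chainLog;   /* log2 of the chain/search table size */
} SketchParams;

/* Assumed lower bound for the window, chosen for this sketch only. */
#define SKETCH_WINDOWLOG_MIN 10

/* Smallest n such that (1 << n) >= v, capped to avoid an overflowing shift. */
static unsigned ceil_log2(size_t v)
{
    unsigned n = 0;
    assert(v > 0);
    while (n < sizeof(size_t) * 8 - 1 && ((size_t)1 << n) < v) n++;
    return n;
}

/* Shrink level-derived parameters so tables are not sized far beyond
 * what a single frame of `frameSize` bytes can ever use. */
static SketchParams adapt_params(SketchParams p, size_t frameSize)
{
    unsigned srcLog = ceil_log2(frameSize);
    if (srcLog < SKETCH_WINDOWLOG_MIN) srcLog = SKETCH_WINDOWLOG_MIN;

    if (p.windowLog > srcLog)          p.windowLog = srcLog;
    if (p.hashLog > p.windowLog + 1)   p.hashLog   = p.windowLog + 1;
    if (p.chainLog > p.windowLog)      p.chainLog  = p.windowLog;
    return p;
}
```

With a 256-byte frame, parameters such as `{23, 24, 24}` would shrink to `{10, 11, 10}` under these assumed rules, so resetting the tables for each new frame touches kilobytes rather than megabytes. A large frame size leaves the level-derived values essentially unchanged.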

Tested locally on an M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35 sec (before this PR) to 2.5 sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45 sec to 0.5 sec.

These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.

The `seekable_compress` test program has been updated to allow setting the compression level,
in order to produce these performance results.
contrib/seekable_format/examples/parallel_compression.c
contrib/seekable_format/examples/parallel_processing.c
contrib/seekable_format/examples/seekable_compression.c
contrib/seekable_format/examples/seekable_decompression.c
contrib/seekable_format/tests/seekable_tests.c
contrib/seekable_format/zstd_seekable.h
contrib/seekable_format/zstdseek_compress.c