Baldur Karlsson [Tue, 13 Mar 2018 20:02:21 +0000 (20:02 +0000)]
Remove non-ASCII characters in header file comments
* Replaced a non-breaking space and an en dash with a plain space and
a hyphen.
* This means the files are simple ASCII and less likely to run into
codepage issues.
Yann Collet [Mon, 5 Mar 2018 21:50:07 +0000 (13:50 -0800)]
fix benchmark issue when measuring only decoding speed
The zstd bench module can focus on decompression speed _only_.
This is useful when trying to measure performance
on large input data compressed at a high level,
where compression time becomes problematic (too long).
This mode is triggered by the command: zstd -b -d
The problem was: in this mode,
measured decoding speed was > 10% slower
than in nominal mode (compression + decompression),
making the decompression benchmark mode much less useful.
This patch fixes the issue.
It's not completely clear why, but
moving the `memcpy()` operation sooner in the pipeline fixed it.
I can still measure some difference, but it is in the < 2% range,
so it's much more tolerable.
Also: it no longer matters in which order the
`-b` and `-d` commands are selected;
the combination always triggers bench_decodeOnly mode.
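For illustration, a minimal sketch of the decode-only measurement (the names below are hypothetical, not the actual bench.c code): the copy of the compressed data into the working buffer is done before the clock starts, so the timed region contains only `ZSTD_decompress()` calls.

```c
#include <string.h>
#include <time.h>
#include <zstd.h>

static double benchDecodeOnly_sketch(void* dstBuffer, size_t dstCapacity,
                                     const void* srcBuffer, size_t srcSize,
                                     void* workBuffer, unsigned nbRounds)
{
    clock_t begin, end;
    unsigned r;

    /* Moved out of the timed loop: copying the compressed input into the
     * working buffer is setup work, not decompression work. */
    memcpy(workBuffer, srcBuffer, srcSize);

    begin = clock();
    for (r = 0; r < nbRounds; r++) {
        size_t const dSize = ZSTD_decompress(dstBuffer, dstCapacity, workBuffer, srcSize);
        if (ZSTD_isError(dSize)) return -1.0;   /* signal error to caller */
    }
    end = clock();

    return ((double)(end - begin) / CLOCKS_PER_SEC) / nbRounds;
}
```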
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value. Still, appeasing Coverity is pretty harmless in this case.)
Yann Collet [Mon, 26 Feb 2018 22:52:23 +0000 (14:52 -0800)]
minor cleaning of huff0
Update code documentation, and properly name a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine the size of the tables inside its workspace.
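As an illustration of the workspace sizing idea (hypothetical types and names, not the actual huf_compress.c layout): describing the workspace as a struct lets table sizes come from `sizeof()` instead of hand-counted constants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the actual huf_compress.c layout: the workspace
 * is described as a struct, so table sizes and offsets follow from the type
 * instead of hand-counted "magic constants". */
typedef struct {
    unsigned count[256];            /* symbol frequencies */
    unsigned short nodeTable[512];  /* tree-building scratch space */
} HUF_tables_sketch_t;

static void* HUF_initWorkspace_sketch(void* workSpace, size_t wkspSize)
{
    if (wkspSize < sizeof(HUF_tables_sketch_t)) return NULL;   /* workspace too small */
    memset(workSpace, 0, sizeof(HUF_tables_sketch_t));
    return workSpace;   /* caller uses it as HUF_tables_sketch_t* */
}
```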
Nick Terrell [Wed, 21 Feb 2018 03:34:43 +0000 (19:34 -0800)]
Split block compressor out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
stores them in a table. It should work with any chunk size, but
is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
instead of encoding the literals directly, it passes them to a
secondary block compressor (see the sketch after this list).
The code to handle chunk sizes greater than the block size is
currently commented out, since it is unused.
The next PR will uncomment and exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
`targetLength`. Also don't emit repcode matches in the LDM block
compressor. Enabling the LDM with the optimal parser now actually improves
the compression ratio.
* The compression ratio is very similar to before. It is very slightly
different, because the repcode handling is slightly different. If I remove
immediate repcode checking in both branches, the compressed size is exactly
the same.
* The speed looks to be the same or better than before.
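To make the split concrete, here is a simplified sketch of the second phase; the types and names are illustrative and do not match the actual zstd_ldm.c declarations.

```c
#include <stddef.h>

/* Simplified sketch; these types and names are illustrative only. */
typedef struct { size_t litLength; size_t matchLength; size_t offset; } rawSeq_sketch;
typedef struct { const rawSeq_sketch* seq; size_t size; } seqStore_sketch;

/* Literals between matches are handed to a secondary block compressor
 * instead of being encoded by the LDM itself. */
typedef void (*literalCompressor)(void* state, const unsigned char* lit, size_t litSize);

static void LDM_blockCompress_sketch(const seqStore_sketch* seqs,
                                     literalCompressor onLiterals, void* state,
                                     const unsigned char* src, size_t srcSize)
{
    const unsigned char* ip = src;
    size_t i;
    for (i = 0; i < seqs->size; i++) {
        const rawSeq_sketch s = seqs->seq[i];
        onLiterals(state, ip, s.litLength);   /* delegate literals */
        /* ... emit the (offset, matchLength) pair for this sequence ... */
        ip += s.litLength + s.matchLength;
    }
    onLiterals(state, ip, (size_t)(src + srcSize - ip));   /* trailing literals */
}
```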
Up Next (in a separate PR)
--------------------------
Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
Yann Collet [Thu, 22 Feb 2018 07:52:45 +0000 (23:52 -0800)]
Implemented BMI2 functions directly within huf_decompress.c
This makes it easier to edit for maintenance and evolutions
(I plan to experiment with modifications to the Huffman decompression functions).
The methodology followed seems broadly applicable to other BMI2 modules.
Performance was tracked rigorously at each step;
there is no noticeable loss (nor win) of performance compared to the `#include` version.
Note however that 4X decoder variants tend to be extremely sensitive to code alignment.
This source code resulted in pretty good performance for gcc 7.2 and 7.3,
but future changes (even in other parts of the code) might trigger the issue again.
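The pattern, roughly (illustrative names, assuming GCC/Clang on x86_64; this is not the actual huf_decompress.c code): the hot loop is written once and expanded into two functions, one compiled with default flags and one with the "bmi2" target attribute, with a runtime flag selecting between them.

```c
#include <stddef.h>
#include <string.h>

/* The hot loop, written once (here a trivial stand-in for the decode loop). */
#define HOT_LOOP_BODY(dst, dstSize, src, srcSize) do {                      \
        size_t const n = (srcSize) < (dstSize) ? (srcSize) : (dstSize);     \
        memcpy((dst), (src), n);                                            \
        return n;                                                           \
    } while (0)

static size_t decode_default(void* dst, size_t dstSize, const void* src, size_t srcSize)
{
    HOT_LOOP_BODY(dst, dstSize, src, srcSize);
}

#if defined(__GNUC__) && defined(__x86_64__)
__attribute__((target("bmi2")))
static size_t decode_bmi2(void* dst, size_t dstSize, const void* src, size_t srcSize)
{
    HOT_LOOP_BODY(dst, dstSize, src, srcSize);   /* same body, compiled with BMI2 enabled */
}
#endif

/* Runtime dispatch: 'bmi2' is detected once (e.g. via cpuid) and passed down. */
size_t decode_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, int bmi2)
{
#if defined(__GNUC__) && defined(__x86_64__)
    if (bmi2) return decode_bmi2(dst, dstSize, src, srcSize);
#endif
    (void)bmi2;
    return decode_default(dst, dstSize, src, srcSize);
}
```

Keeping both variants in the same translation unit is what makes them easy to edit together, at the cost of the code-alignment sensitivity noted above.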
Yann Collet [Tue, 20 Feb 2018 22:48:09 +0000 (14:48 -0800)]
improve benchmark measurement for small inputs
by invoking time() once per batch, instead of once per compression / decompression run.
The batch is dynamically resized so that each round lasts approximately 1 second.
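Roughly, the measurement loop looks like this (an illustrative sketch, not the actual bench.c code; clock() stands in for the timer used there):

```c
#include <time.h>

static double timePerRun_sketch(void (*run)(void* payload), void* payload)
{
    unsigned batchSize = 1;
    for (;;) {
        unsigned i;
        clock_t const begin = clock();   /* one timer call per batch, not per run */
        for (i = 0; i < batchSize; i++) run(payload);
        {   double const elapsedSec = (double)(clock() - begin) / CLOCKS_PER_SEC;
            if (elapsedSec >= 1.0) return elapsedSec / batchSize;
            batchSize *= 2;   /* round too short to measure reliably: grow the batch and retry */
        }
    }
}
```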
Yann Collet [Wed, 7 Feb 2018 22:22:35 +0000 (14:22 -0800)]
Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex()
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).
Note: as ZSTD_reduceIndex() is rarely invoked,
the speedup does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
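The fused scan looks roughly like this (an illustrative sketch, not the actual ZSTD_reduceIndex() code): the marker check and the index reduction happen in the same pass over the table, which gives the compiler a chance to auto-vectorize it.

```c
typedef unsigned int U32;

/* Illustrative sketch: one pass both preserves the special "unsorted" marker
 * and applies the index reduction, instead of two separate table scans. */
static void reduceTable_sketch(U32* table, U32 size, U32 reducerValue, U32 unsortedMark)
{
    U32 u;
    for (u = 0; u < size; u++) {
        U32 const idx      = table[u];
        U32 const keepMark = (idx == unsortedMark);                  /* preserve the marker */
        U32 const reduced  = (idx < reducerValue) ? 0 : idx - reducerValue;
        table[u] = keepMark ? idx : reduced;
    }
}
```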
Mileage varies, depending on file and cpu type.
But a general rule is: x86 benefits less from "long-offset mode" than x64,
maybe due to register pressure.
On "entropy", long-mode is _never_ a win for x86.
On my laptop though, it may be, depending on file and compression level
(enwik8 benefits more from "long-mode" than silesia).