We gain 0.1% compression ratio on Silesia.
We gain 0.3% compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as some speed.
I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8.
Roughly, the negative levels now scale half as quickly: e.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller. If you don't think this is the right trade-off, we can
change it to multiply the step size by 2 instead of adding 1. I think
the current choice makes sense, because it gives a bit slower ratio decay.
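For illustration only, a hypothetical sketch of the two step-size schedules
discussed above (additive vs. multiplicative); the function and variable names
are made up and this is not the actual zstd source:

```c
/* Hypothetical illustration of the trade-off above (NOT the actual zstd code).
 * `negLevel` is the magnitude of a negative compression level, and the return
 * value is a match-search step size: larger steps are faster but hurt ratio. */
static unsigned stepForNegativeLevel(unsigned negLevel, int multiplicative)
{
    unsigned step = 1;
    unsigned l;
    for (l = 1; l <= negLevel; ++l) {
        if (multiplicative)
            step *= 2;   /* ratio decays quickly, speed ramps up aggressively */
        else
            step += 1;   /* additive growth: a bit slower ratio decay per level */
    }
    return step;
}
```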
Nick Terrell [Tue, 2 Apr 2019 00:25:26 +0000 (17:25 -0700)]
[examples] Update streaming_decompression.c
Update to use the new streaming API. Making progress on Issue #1548.
Tested that it can decompress files produced by `streaming_compression`.
Tested that it can decompress two frames concatenated together.
Tested that it fails on corrupted data.
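For reference, a minimal sketch of the new-API decompression loop the updated
example is built around (error handling trimmed; this is not the exact example
code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Decompress everything read from fin into fout with ZSTD_decompressStream().
 * Concatenated frames are handled naturally, since the loop keeps feeding input. */
static void decompressFile(FILE* fin, FILE* fout)
{
    size_t const inSize  = ZSTD_DStreamInSize();
    size_t const outSize = ZSTD_DStreamOutSize();
    void* const inBuf  = malloc(inSize);
    void* const outBuf = malloc(outSize);
    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    size_t read;

    while ((read = fread(inBuf, 1, inSize, fin)) > 0) {
        ZSTD_inBuffer input = { inBuf, read, 0 };
        while (input.pos < input.size) {
            ZSTD_outBuffer output = { outBuf, outSize, 0 };
            size_t const ret = ZSTD_decompressStream(dctx, &output, &input);
            if (ZSTD_isError(ret)) {
                fprintf(stderr, "decompression error: %s\n", ZSTD_getErrorName(ret));
                exit(1);
            }
            fwrite(outBuf, 1, output.pos, fout);
        }
    }

    ZSTD_freeDCtx(dctx);
    free(inBuf);
    free(outBuf);
}
```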
Nick Terrell [Tue, 2 Apr 2019 00:51:28 +0000 (17:51 -0700)]
Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must resolve the actual
compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.
Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.
I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
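A minimal sketch of exercising the estimator with explicit CCtx params
(assuming the experimental ZSTD_CCtxParams_* and
ZSTD_estimateCStreamSize_usingCCtxParams() declarations from zstd.h; the setter
names follow recent zstd.h and may differ across versions):

```c
#define ZSTD_STATIC_LINKING_ONLY   /* the estimator lives in the experimental section */
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
    ZSTD_CCtxParams_init(params, 3);                             /* compression level 3 */
    ZSTD_CCtxParams_setParameter(params, ZSTD_c_windowLog, 23);  /* explicit 8 MB window */

    /* With the fix, the estimate reflects the real window log resolved
     * via ZSTD_getCParamsFromCCtxParams(), not default parameters. */
    size_t const estimate = ZSTD_estimateCStreamSize_usingCCtxParams(params);
    printf("estimated CStream memory: %zu bytes\n", estimate);

    ZSTD_freeCCtxParams(params);
    return 0;
}
```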
Update to use the new streaming API. Making progress on Issue #1548.
Tested that multiple files can be compressed, that the output is the same as
calling `streaming_compression` on each file with the same compression level,
and that the output can be decompressed.
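A minimal sketch of the per-file compression loop such an example is built
around, using ZSTD_compressStream2() from the new advanced API (error handling
trimmed; this is not the exact example code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Compress one file; the caller can reuse the same cctx for many files,
 * resetting the session each time. */
static void compressFile(ZSTD_CCtx* cctx, FILE* fin, FILE* fout, int level)
{
    size_t const inSize  = ZSTD_CStreamInSize();
    size_t const outSize = ZSTD_CStreamOutSize();
    void* const inBuf  = malloc(inSize);
    void* const outBuf = malloc(outSize);

    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

    for (;;) {
        size_t const read = fread(inBuf, 1, inSize, fin);
        int const lastChunk = (read < inSize);
        ZSTD_EndDirective const mode = lastChunk ? ZSTD_e_end : ZSTD_e_continue;
        ZSTD_inBuffer input = { inBuf, read, 0 };
        int finished;
        do {
            ZSTD_outBuffer output = { outBuf, outSize, 0 };
            size_t const remaining = ZSTD_compressStream2(cctx, &output, &input, mode);
            fwrite(outBuf, 1, output.pos, fout);
            /* On the last chunk, loop until the frame epilogue is fully flushed. */
            finished = lastChunk ? (remaining == 0) : (input.pos == input.size);
        } while (!finished);
        if (lastChunk) break;
    }

    free(inBuf);
    free(outBuf);
}
```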
Nick Terrell [Fri, 22 Mar 2019 19:28:55 +0000 (12:28 -0700)]
[cover] Improvements for small or homogeneous data
* The algorithm would bail out as soon as it found one epoch that
contained no new segments. Change it so it now has to fail
>= 10 times in a row (10 for fastcover, 10-100 for cover).
* The algorithm uses the `maxDict` size to decide the epoch size.
When this size is absurdly large, it causes tiny epochs. Lower
bound the epoch size at 10x the segment size, and warn the user
that their training set is too small (see the sketch after this list).
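A hypothetical sketch of the epoch sizing described above; the names and
structure are illustrative only, not the actual cover.c code:

```c
#include <stdio.h>

typedef struct { unsigned num; unsigned size; } EpochInfo_t;   /* illustrative type */

/* Derive the number and size of epochs from the requested max dictionary size,
 * lower-bounding the epoch size at 10x the segment size `k` so an absurdly
 * large maxDictSize cannot force tiny epochs. */
static EpochInfo_t computeEpochs(unsigned maxDictSize, unsigned nbDmers, unsigned k)
{
    EpochInfo_t epochs;
    unsigned const minEpochSize = 10 * k;                /* lower bound: 10x segment size */
    epochs.num = maxDictSize / k;                        /* naive epoch count */
    if (epochs.num == 0) epochs.num = 1;
    epochs.size = nbDmers / epochs.num;
    if (epochs.size < minEpochSize) {
        epochs.size = minEpochSize;
        epochs.num  = nbDmers / epochs.size;
        if (epochs.num == 0) epochs.num = 1;
        fprintf(stderr, "WARNING: the training set is too small for maxdict=%u\n",
                maxDictSize);
    }
    return epochs;
}
```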
Nick Terrell [Thu, 21 Mar 2019 22:17:41 +0000 (15:17 -0700)]
[lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary, only create the cdict once we've started the
compression job. This allows the user to pass the dictionary before they
set other parameters, and is in line with the rest of the API (see the
sketch after this list).
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
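A minimal sketch of the call order this change enables (real API names; error
handling omitted):

```c
#include <zstd.h>

/* Load the dictionary first, then set parameters; the cdict is only built
 * once the compression job actually starts, so the order no longer matters. */
static size_t compressWithDict(void* dst, size_t dstCapacity,
                               const void* src, size_t srcSize,
                               const void* dict, size_t dictSize)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t cSize;

    ZSTD_CCtx_loadDictionary(cctx, dict, dictSize);               /* dictionary first */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);    /* parameters after */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);

    cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return cSize;
}
```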
Nick Terrell [Wed, 13 Mar 2019 22:23:24 +0000 (15:23 -0700)]
[libzstd] Allow compression parameters to be set with a cdict
The order you set parameters in the advanced API is not supposed to matter.
However, once `ZSTD_CCtx_refCDict()` was called, the compression parameters
could no longer be changed. Remove that restriction, and document which
parameters are used when a CDict is present.
If the CCtx is in dictionary mode, then the CDict's parameters are used.
If the CCtx is not in dictionary mode, then its requested parameters are
used.
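A minimal sketch of the documented behavior (error handling omitted):

```c
#include <zstd.h>

/* Parameters may be set before or after ZSTD_CCtx_refCDict(). While the cdict
 * is referenced (dictionary mode), the cdict's own compression parameters are
 * the ones used; the requested level below only applies once the cdict is
 * dropped (e.g. by referencing NULL). */
static size_t compressWithCDict(void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize,
                                const ZSTD_CDict* cdict)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t cSize;

    ZSTD_CCtx_refCDict(cctx, cdict);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);   /* no longer rejected */

    cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return cSize;
}
```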
shakeelrao [Wed, 13 Mar 2019 08:23:07 +0000 (01:23 -0700)]
Fix incorrectly assigned value in ZSTD_errorFrameSizeInfo
As documented in `zstd.h`, ZSTD_decompressBound returns `ZSTD_CONTENTSIZE_ERROR`
if an error occurs (not `ZSTD_CONTENTSIZE_UNKNOWN`). This is consistent with
the error checking done in ZSTD_decompressBound, particularly at line 545.
shakeelrao [Thu, 28 Feb 2019 08:42:49 +0000 (00:42 -0800)]
Provide an API function to estimate decompressed size.
Introduces a new utility function `ZSTD_findFrameCompressedSize_internal`, which
is equivalent to `ZSTD_findFrameCompressedSize`, but accepts an additional output
parameter `bound` that computes an upper bound for the decompressed data in the frame.
The new API function is named `ZSTD_decompressBound` to be consistent with
`ZSTD_compressBound` (the inverse operation). Clients will now be able to compute
an upper bound for the decompressed size of their payloads instead of guessing a large size.
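A minimal usage sketch, assuming the experimental `ZSTD_decompressBound()`
declaration from zstd.h:

```c
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_decompressBound() is experimental API */
#include <stdlib.h>
#include <zstd.h>

/* Size the output buffer from the computed bound instead of guessing.
 * Returns the decompressed data (caller frees) and its size, or NULL on error. */
static void* decompressWhole(const void* src, size_t srcSize, size_t* dstSizePtr)
{
    unsigned long long const bound = ZSTD_decompressBound(src, srcSize);
    if (bound == ZSTD_CONTENTSIZE_ERROR) return NULL;   /* invalid or truncated frame */

    void* const dst = malloc((size_t)bound);
    if (dst == NULL) return NULL;

    size_t const dSize = ZSTD_decompress(dst, (size_t)bound, src, srcSize);
    if (ZSTD_isError(dSize)) { free(dst); return NULL; }

    *dstSizePtr = dSize;
    return dst;
}
```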
Nick Terrell [Sat, 16 Feb 2019 00:15:20 +0000 (16:15 -0800)]
[libzstd] Clean up parameter code
* Move all ZSTDMT parameter setting code to ZSTD_CCtxParams_*Parameter().
ZSTDMT now calls these functions, so we can keep all the logic in the
same place.
* Clean up `ZSTD_CCtx_setParameter()` to only add extra checks where needed.
* Clean up `ZSTDMT_initJobCCtxParams()` by copying all parameters by default,
and then zeroing the ones that need to be zeroed. We had missed adding several
parameters here before, and with this approach it only needs updating when
something changes in ZSTDMT.
* Add `ZSTDMT_cParam_clampBounds()` to clamp a parameter into its valid
range. Use it to keep backwards compatibility when setting ZSTDMT parameters,
which clamp out-of-range values into the valid range instead of rejecting them
(see the sketch after this list).
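A minimal sketch of the clamping idea using the public ZSTD_cParam_getBounds();
the real helper, `ZSTDMT_cParam_clampBounds()`, is internal to ZSTDMT:

```c
#include <zstd.h>

/* Clamp a requested value into the parameter's advertised valid range
 * instead of rejecting it, mirroring the historical ZSTDMT behavior. */
static int clampParam(ZSTD_cParameter param, int value)
{
    ZSTD_bounds const bounds = ZSTD_cParam_getBounds(param);
    if (ZSTD_isError(bounds.error)) return value;   /* unknown parameter: leave unchanged */
    if (value < bounds.lowerBound) return bounds.lowerBound;
    if (value > bounds.upperBound) return bounds.upperBound;
    return value;
}
```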
Björn Ketelaars [Mon, 11 Feb 2019 23:03:11 +0000 (00:03 +0100)]
Detect symbolic links on OpenBSD
Issue #1520 describes that symbolic links are not detected on FreeBSD. The
same is true for OpenBSD. This diff fixes the issue for OpenBSD. I'm
guessing that something similar works for FreeBSD as well, but I'm
unable to test this.
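A sketch of the kind of platform-gated check involved; the macro and function
names here are illustrative, not necessarily the ones in zstd's util code:

```c
#include <sys/stat.h>

/* Symlink detection is gated per platform; adding __OpenBSD__ to the guard
 * enables the lstat()-based check there. */
#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__APPLE__)
#  define HAS_LSTAT 1
#else
#  define HAS_LSTAT 0
#endif

static int isSymbolicLink(const char* path)
{
#if HAS_LSTAT
    struct stat st;
    return (lstat(path, &st) == 0) && S_ISLNK(st.st_mode);
#else
    (void)path;
    return 0;   /* platforms without the check never report a symlink */
#endif
}
```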
Björn Ketelaars [Mon, 11 Feb 2019 10:49:35 +0000 (11:49 +0100)]
'head -c BYTES' is non-portable.
Pull request #1499 added a new test, which uses 'head -c'. The '-c'
option is non-portable (not in POSIX). Use 'dd' instead. A similar issue
has been resolved in the past (#1321).