Yann Collet [Fri, 31 Dec 2021 21:03:12 +0000 (13:03 -0800)]
fix performance issue in scenario #2966 (part 1)
When re-using a compression state, across multiple successive compressions,
the state should minimize the amount of allocation and initialization required.
This mostly matters in situations where initialization is an overwhelming task
compared to compression itself.
This can happen when the amount to compress is small,
while the compression state was given the impression that it would be much larger,
aka, streaming mode without providing a srcSize hint.
This commit fixes it, making this scenario once again on par with v1.4.9.
Note that this does not completely fix #2966,
since another heavy initialization, specific to row mode,
is also happening (and was not present in v1.4.9).
This will be fixed in a separate commit.
Yann Collet [Tue, 28 Dec 2021 19:46:15 +0000 (11:46 -0800)]
separate newRep() from updateRep()
the new contracts seems to make more sense :
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.
Most callers are actually expecting the in-place variant,
and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.
Yann Collet [Fri, 24 Dec 2021 05:58:08 +0000 (21:58 -0800)]
introduce macros STORE_OFFSET() and STORE_REPCODE()
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.
Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.
While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".
One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.
This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.
At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
Yann Collet [Thu, 23 Dec 2021 21:39:46 +0000 (13:39 -0800)]
changed seqDef.matchLength into seqDef.mlBase
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
Yann Collet [Wed, 15 Dec 2021 22:37:05 +0000 (14:37 -0800)]
fixed incorrect rowlog initialization
the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
W. Felix Handte [Mon, 13 Dec 2021 20:46:41 +0000 (15:46 -0500)]
Increment Step by 1 not 2
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)
Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a big ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
sliiightly improving compression. Speed remains pretty much unchanged.
W. Felix Handte [Mon, 13 Dec 2021 19:48:26 +0000 (14:48 -0500)]
Rewrite `step` to Track Increment Between Pairs of Positions
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.
W. Felix Handte [Fri, 10 Dec 2021 20:44:39 +0000 (15:44 -0500)]
Stagger Application of `stepSize` in ZSTD_fast
This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That
is, it always looks at adjacent pairs of positions, and only applies the
acceleration every other position. This produces a more fine-grained
acceleration.
Yann Collet [Wed, 8 Dec 2021 23:05:17 +0000 (15:05 -0800)]
remove offending static assert lines
no idea why visual + clang-cl + appveyor don't like them,
I've not been able to reproduce the issue locally,
but these static assert are very unlikely to deliver a useful signal,
I can't imagine a situation where they will be wrong,
and if they are, then a ton of other things will be broken way before reaching that point.
W. Felix Handte [Mon, 6 Dec 2021 18:47:18 +0000 (13:47 -0500)]
Reject Irregular Dictionary Files
I hadn't seen #2890, so I wrote my own version. I like this approach a little
better, since it does an explicit check for a regular file, rather than
passing a magic value.
Nick Terrell [Wed, 1 Dec 2021 20:52:23 +0000 (12:52 -0800)]
[asm] Share portability macros and restrict ASM further
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.
Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.
Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.
Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.