git.ipfire.org Git - thirdparty/zstd.git/log

fixed zstdmt cli freeze issue with large nb of threads

fileio.c kept pushing more input without giving compressed data a chance to be flushed.
This blocked the job queue when input data accumulated too fast (triggering it required defining many threads).
Fixed : fileio now flushes whatever it can after each input attempt.

Przemyslaw Skibinski [Wed, 25 Jan 2017 12:11:26 +0000 (13:11 +0100)]

improved #232 fix

Przemyslaw Skibinski [Wed, 25 Jan 2017 12:02:33 +0000 (13:02 +0100)]

Fixed https://github.com/facebook/zstd/issues/232

Przemyslaw Skibinski [Wed, 25 Jan 2017 11:24:24 +0000 (12:24 +0100)]

Merge remote-tracking branch 'refs/remotes/origin/master' into dev11

Przemyslaw Skibinski [Wed, 25 Jan 2017 10:57:28 +0000 (11:57 +0100)]

.travis.yml: different tests for "master" branch

Przemyslaw Skibinski [Wed, 25 Jan 2017 10:19:35 +0000 (11:19 +0100)]

.travis.yml: optimized order of short tests

Yann Collet [Wed, 25 Jan 2017 06:32:12 +0000 (22:32 -0800)]

overlapped section, for improved compression

Sections 2 and later read a bit of data from the previous section
in order to improve compression ratio.
This also costs some CPU, to reference the data read.

The amount of data read is currently fixed to window>>3.
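
A minimal sketch of the sizing rule, assuming the window size is 1 << windowLog. The helper name overlap_size() is hypothetical and is not part of zstdmt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes of the previous section that the
 * next section reads as reference, following the "window>>3" rule. */
static size_t overlap_size(unsigned windowLog)
{
    assert(windowLog < sizeof(size_t) * 8);   /* keep the shift well defined */
    size_t const windowSize = (size_t)1 << windowLog;
    return windowSize >> 3;                   /* one eighth of the window */
}
```

With a 1 MiB window (windowLog == 20), this gives a 128 KiB overlap.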

Yann Collet [Wed, 25 Jan 2017 01:41:49 +0000 (17:41 -0800)]

refactor job creation

code shared across ZSTDMT_{compress,flush,end}Stream(),
for easier maintenance

Yann Collet [Wed, 25 Jan 2017 01:02:26 +0000 (17:02 -0800)]

zstdmt cli and API allow selection of section sizes

By default, section sizes are 4x window size.
This new setting allows manual selection of section sizes.
The larger they are, the (slightly) better the compression ratio,
but also the higher the memory allocation cost,
and ultimately the lower the nb of possible threads,
since each section is compressed by a single thread.
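
The trade-off can be illustrated with a little arithmetic. This is only a sketch: both helper names are hypothetical, the window size is assumed to be 1 << windowLog, and the real code may treat the last partial section differently.

```c
#include <assert.h>
#include <stddef.h>

/* Default section size as described: 4x the window size. */
static size_t default_section_size(unsigned windowLog)
{
    assert(windowLog + 2 < sizeof(size_t) * 8);
    return (size_t)4 << windowLog;
}

/* Number of sections needed to cover srcSize, rounding up.
 * Since one section is compressed by one thread, this is also an upper
 * bound on how many threads can be kept busy. */
static size_t nb_sections(size_t srcSize, size_t sectionSize)
{
    assert(sectionSize > 0);
    return srcSize / sectionSize + (srcSize % sectionSize != 0);
}
```

For a 100 MiB input and a 1 MiB window, the default 4 MiB sections yield 25 sections, while manually selected 32 MiB sections yield only 4.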

It also introduces a prototype to set generic parameters,
ZSTDMT_setMTCtxParameter()

The idea is that it's possible to add enums
to extend the list of parameters that can be set this way.
This is more long-term oriented than a fixed-size struct.
Consider it as a test.
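
To illustrate the idea of an enum-driven setter, here is a toy sketch. All names (toy_param, toy_ctx, toy_setParameter) are hypothetical and do not reproduce the actual ZSTDMT_setMTCtxParameter() prototype.

```c
#include <assert.h>
#include <stddef.h>

/* New parameters are added by appending enum values, so the setter's
 * signature never changes (unlike a fixed-size parameter struct). */
typedef enum { toy_p_sectionSize, toy_p_overlapLog } toy_param;

typedef struct { unsigned sectionSize; unsigned overlapLog; } toy_ctx;

/* Returns 0 on success, 1 when the parameter is unknown. */
static int toy_setParameter(toy_ctx* ctx, toy_param param, unsigned value)
{
    assert(ctx != NULL);
    switch (param) {
    case toy_p_sectionSize: ctx->sectionSize = value; return 0;
    case toy_p_overlapLog:  ctx->overlapLog  = value; return 0;
    default:                return 1;   /* unknown parameter */
    }
}
```

A caller built against an older enum keeps working when values are appended, which is the long-term benefit mentioned above.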

Yann Collet [Tue, 24 Jan 2017 19:48:40 +0000 (11:48 -0800)]

ZSTDMT now supports frame checksum

Przemyslaw Skibinski [Tue, 24 Jan 2017 16:42:28 +0000 (17:42 +0100)]

.travis.yml: test jobs 12-15

Przemyslaw Skibinski [Tue, 24 Jan 2017 14:01:46 +0000 (15:01 +0100)]

JOB_NUMBER -eq 9

Przemyslaw Skibinski [Tue, 24 Jan 2017 12:18:50 +0000 (13:18 +0100)]

improved ZSTD_compressBlock_opt_extDict_generic

Yann Collet [Mon, 23 Jan 2017 19:43:51 +0000 (11:43 -0800)]

refactor ZSTDMT streaming flush code

now shared by both ZSTDMT_compressStream() and ZSTDMT_flushStream()

Yann Collet [Mon, 23 Jan 2017 09:43:58 +0000 (01:43 -0800)]

ZSTDMT streaming : fall back to (regular) single thread mode

when nbThreads==1

Yann Collet [Mon, 23 Jan 2017 08:56:54 +0000 (00:56 -0800)]

ZSTDMT_compressCCtx : fallback to single-thread mode when nbChunks==1

Yann Collet [Mon, 23 Jan 2017 07:49:52 +0000 (23:49 -0800)]

ZSTDMT_compressStream() becomes blocking when required to ensure forward progress

In some (rare) cases, job list could be blocked by a first job still being processed,
while all following ones are completed, waiting to be flushed.
In such a case, the current job-table implementation is unable to accept a new job.
As a consequence, a call to ZSTDMT_compressStream() can be useless (nothing read, nothing flushed),
with the risk of triggering a busy-wait on the caller side
(needlessly looping over ZSTDMT_compressStream()).

In such a case, ZSTDMT_compressStream() will block until the first job is completed and ready to flush.
It ensures some forward progress by guaranteeing it will flush at least a part of the completed job.
Energy-wasting busy-wait is avoided.
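
The difference between busy-waiting and blocking can be sketched with a condition variable. This is an illustration under assumptions, not the zstdmt job table: the toy_job type and its functions are hypothetical, and only POSIX threads calls are used.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int completed;          /* only read or written while holding mutex */
} toy_job;

static void toy_job_init(toy_job* job)
{
    assert(job != NULL);
    pthread_mutex_init(&job->mutex, NULL);
    pthread_cond_init(&job->cond, NULL);
    job->completed = 0;
}

/* Worker side: publish completion and wake up a waiting caller. */
static void toy_job_markCompleted(toy_job* job)
{
    pthread_mutex_lock(&job->mutex);
    job->completed = 1;
    pthread_cond_signal(&job->cond);
    pthread_mutex_unlock(&job->mutex);
}

/* Caller side: sleep until the job is completed instead of spinning. */
static void toy_job_waitCompleted(toy_job* job)
{
    pthread_mutex_lock(&job->mutex);
    while (!job->completed)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&job->cond, &job->mutex);
    pthread_mutex_unlock(&job->mutex);
}
```

The waiting thread consumes no CPU while the first job is still being processed, and returns as soon as there is something to flush.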

Yann Collet [Mon, 23 Jan 2017 00:40:06 +0000 (16:40 -0800)]

ZSTDMT_initCStream_usingDict() can outlive dict

Like ZSTD_initCStream_usingDict(),
ZSTDMT_initCStream_usingDict() now keeps a copy of dict internally.
This way, dict can be released :
it no longer has to outlive all future compression sessions.
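
The ownership change can be sketched as follows. The toy_stream type and its functions are hypothetical stand-ins, not the zstdmt implementation; they only show why the caller's buffer may be freed right after init.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { void* dictCopy; size_t dictSize; } toy_stream;

/* Copies dict into the stream. Returns 0 on success, 1 on allocation failure. */
static int toy_init_usingDict(toy_stream* s, const void* dict, size_t dictSize)
{
    assert(s != NULL);
    s->dictCopy = NULL;
    s->dictSize = 0;
    if (dict == NULL || dictSize == 0) return 0;   /* no dictionary to keep */
    s->dictCopy = malloc(dictSize);
    if (s->dictCopy == NULL) return 1;
    memcpy(s->dictCopy, dict, dictSize);           /* private copy */
    s->dictSize = dictSize;
    return 0;
}

static void toy_free(toy_stream* s)
{
    assert(s != NULL);
    free(s->dictCopy);
    s->dictCopy = NULL;
    s->dictSize = 0;
}
```

The cost is one extra allocation and copy per init, in exchange for a simpler lifetime contract.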

Yann Collet [Sun, 22 Jan 2017 23:54:14 +0000 (15:54 -0800)]

playtest.sh : changed sdiff into $DIFF

Yann Collet [Sun, 22 Jan 2017 06:14:08 +0000 (22:14 -0800)]

protected (mutex) read to jobCompleted, as suggested by @terrelln

Yann Collet [Sun, 22 Jan 2017 06:06:49 +0000 (22:06 -0800)]

optimized pool allocation by 1 slot

Yann Collet [Sun, 22 Jan 2017 05:56:36 +0000 (21:56 -0800)]

minor : tab to spaces

Yann Collet [Sat, 21 Jan 2017 01:23:19 +0000 (17:23 -0800)]

convert tabs to space

joys of using multiple editors from multiple environments ...

Yann Collet [Sat, 21 Jan 2017 01:18:41 +0000 (17:18 -0800)]

fixed : compilation of zstreamtest in dll mode

Yann Collet [Sat, 21 Jan 2017 00:44:50 +0000 (16:44 -0800)]

Resolved merge conflict dev+zstdmt

Yann Collet [Sat, 21 Jan 2017 00:36:29 +0000 (16:36 -0800)]

skip zstdmt at root directory

cyan4973 [Fri, 20 Jan 2017 23:24:06 +0000 (15:24 -0800)]

fixed VS2008 project

cyan4973 [Fri, 20 Jan 2017 22:49:44 +0000 (14:49 -0800)]

fixed VS2010 project

cyan4973 [Fri, 20 Jan 2017 22:00:41 +0000 (14:00 -0800)]

fixed minor warnings (Visual, conversion, doxygen)

cyan4973 [Fri, 20 Jan 2017 20:23:30 +0000 (12:23 -0800)]

updated util's time for Windows compatibility

Correctly measures time on Posix systems when running with
Multi-threading

Todo : check Windows measurement under multi-threading

Yann Collet [Fri, 20 Jan 2017 01:44:15 +0000 (17:44 -0800)]

minor refactoring : cleaner MT integration within bench

Yann Collet [Fri, 20 Jan 2017 01:33:37 +0000 (17:33 -0800)]

renamed savedRep into repToConfirm

Yann Collet [Fri, 20 Jan 2017 00:59:56 +0000 (16:59 -0800)]

zstd cli can now compress using multi-threading

added : command -T#
added : ZSTD_resetCStream() (zstdmt_compress)
added : FIO_setNbThreads() (fileio)

Yann Collet [Thu, 19 Jan 2017 23:32:07 +0000 (15:32 -0800)]

Added ZSTDMT_initCStream_advanced() variant

Correctly compress with custom params and dictionary
Added relevant fuzzer test in zstreamtest

Also :
new macro ZSTDMT_SECTION_LOGSIZE_MIN, which sets a minimum size for a full job
(note : a flush() command can still generate a partial job anytime)

Yann Collet [Thu, 19 Jan 2017 22:05:07 +0000 (14:05 -0800)]

changed MT enabling macro to ZSTD_MULTITHREAD

Yann Collet [Thu, 19 Jan 2017 21:46:30 +0000 (13:46 -0800)]

fixed minor warning (unused variable) in fuzzer

Yann Collet [Thu, 19 Jan 2017 20:12:50 +0000 (12:12 -0800)]

added streaming fuzzer tests for MT API

Also : fixed corner case, where nb of jobs completed becomes > jobQueueSize
which is possible when many flushes are issued
while there is not enough dst buffer to flush completed ones.

Yann Collet [Thu, 19 Jan 2017 18:32:55 +0000 (10:32 -0800)]

fixed Multi-threaded compression

MT compression generates a single frame.
Multi-threading operates by breaking the frame into independent sections.
But from a decoder perspective, there is no difference :
it's just a sequence of blocks.

Problem is, decoder preserves repCodes from previous block to start decoding next block.
This is also valid between sections, since they are no different than changing block.

Previous version would incorrectly initialize repcodes to their default value at the beginning of each section.
When using them, there was a mismatch between encoder (default values) and decoder (values from previous block).

This change ensures that repcodes won't be used at the beginning of a new section.
It works by setting them to 0.
This only works with regular (single segment) variants : extDict variants will fail !
Fortunately, sections beyond the 1st one belong to this category.

To be checked : btopt strategy.
This change was only validated from fast to btlazy2 strategies.

Yann Collet [Thu, 19 Jan 2017 18:18:17 +0000 (10:18 -0800)]

Simplified compressChunk job

minor refactoring : compression done in a single call on first chunk
Avoid a mutable hSize variable and eventual recombination to cSize at the end

Yann Collet [Thu, 19 Jan 2017 17:06:13 +0000 (09:06 -0800)]

Merge pull request #516 from inikep/dev11

fix for zlibWrapper

Yann Collet [Thu, 19 Jan 2017 17:02:42 +0000 (09:02 -0800)]

Merge pull request #515 from iburinoc/emptydict

Don't create dict in streaming apis if dictSize == 0

Przemyslaw Skibinski [Thu, 19 Jan 2017 11:11:22 +0000 (12:11 +0100)]

Merge remote-tracking branch 'refs/remotes/facebook/dev' into dev11

Przemyslaw Skibinski [Thu, 19 Jan 2017 11:10:52 +0000 (12:10 +0100)]

zlibWrapper: added the totalInBytes flag - we need it as strm->total_in can be reset by user

Yann Collet [Wed, 18 Jan 2017 23:32:38 +0000 (15:32 -0800)]

ZSTDMT_endStream : nullify input buffer after flush

There will be no more input after ZSTDMT_endStream invocation :
only flush/end is allowed (to fully collect compressed result).

Yann Collet [Wed, 18 Jan 2017 23:21:17 +0000 (15:21 -0800)]

Merge pull request #514 from inikep/dev11

fixed gz functions based on zlib 1.2.11

Yann Collet [Wed, 18 Jan 2017 23:18:17 +0000 (15:18 -0800)]

ZSTDMT_initCStream() supports restart from invalid state

ZSTDMT_initCStream() will correctly scrub for resources
when it detects that previous compression was not properly finished.

Yann Collet [Wed, 18 Jan 2017 22:11:37 +0000 (14:11 -0800)]

trap compression errors, collect back resources from workers

Sean Purcell [Wed, 18 Jan 2017 21:44:43 +0000 (13:44 -0800)]

Prefix notes with /**<

Yann Collet [Wed, 18 Jan 2017 20:12:10 +0000 (12:12 -0800)]

CCtxPool starts empty, as suggested by @terrelln

Also : make zstdmt now a target from root

Yann Collet [Wed, 18 Jan 2017 19:57:34 +0000 (11:57 -0800)]

fixed cmaketest

(buffer_t){NULL,0} is not considered a constant.
{NULL,0} is.
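
A small sketch of the distinction. The buffer_t layout shown here is an assumption made for illustration; only the initializer forms matter.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, for illustration only. */
typedef struct { void* start; size_t size; } buffer_t;

/* Portable: an object with static storage duration initialized from a
 * brace-enclosed list of constant expressions. */
static const buffer_t g_nullBuffer = { NULL, 0 };

/* Not portable at file scope: a compound literal is not a constant
 * expression in ISO C, so strict compilers may reject the line below.
 *
 *   static const buffer_t g_nullBuffer2 = (buffer_t){ NULL, 0 };
 */

/* Inside a function, an automatic variable needs no constant initializer,
 * so the compound literal form is accepted. */
static buffer_t make_null_buffer(void)
{
    buffer_t const b = (buffer_t){ NULL, 0 };
    assert(b.start == NULL && b.size == 0);
    return b;
}
```

Some compilers accept the commented-out form as an extension, which is why such code can build locally and still fail in a stricter test configuration.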

Przemyslaw Skibinski [Wed, 18 Jan 2017 18:04:00 +0000 (19:04 +0100)]

updated link to copyright notice

Przemyslaw Skibinski [Wed, 18 Jan 2017 13:36:10 +0000 (14:36 +0100)]

fixed clang warnings in gzread.c and gzwrite.c

Przemyslaw Skibinski [Wed, 18 Jan 2017 11:51:44 +0000 (12:51 +0100)]

gzcompatibility.h updated to zlib 1.2.11

Przemyslaw Skibinski [Wed, 18 Jan 2017 11:47:32 +0000 (12:47 +0100)]

gzwrite.c updated to zlib 1.2.11

Przemyslaw Skibinski [Wed, 18 Jan 2017 11:14:01 +0000 (12:14 +0100)]

gzread.c updated to zlib 1.2.11

Przemyslaw Skibinski [Wed, 18 Jan 2017 11:08:08 +0000 (12:08 +0100)]

gzlib.c updated to zlib 1.2.11

Przemyslaw Skibinski [Wed, 18 Jan 2017 11:01:50 +0000 (12:01 +0100)]

gzguts.h updated to zlib 1.2.11

Przemyslaw Skibinski [Wed, 18 Jan 2017 09:39:39 +0000 (10:39 +0100)]

get_crc_table only with ZLIB_VERNUM >= 0x1270

Yann Collet [Wed, 18 Jan 2017 01:46:33 +0000 (17:46 -0800)]

ZSTDMT_free() scrubs potentially unfinished jobs to release their resources

In some complex scenarios (free() without finishing compression),
it is possible that some resources are still held by jobs
and have not been collected back into the pools.
In that case, the previous version of free() would miss them.
This would be equivalent to a leak.

The new version ensures that it goes after such resources too.
It requires job consumers to properly mark resources as released,
by replacing entries by NULL after releasing back to the pool.
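
The "mark as released by writing NULL" convention can be sketched like this. The job table below is a toy with hypothetical names, and free() stands in for returning a buffer to the pool.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NB_JOBS 4

typedef struct { void* buffers[TOY_NB_JOBS]; } toy_jobTable;

/* Consumer side: give the buffer back, then mark the entry as released.
 * free(NULL) is a no-op, so releasing an already released entry is safe. */
static void toy_releaseJob(toy_jobTable* table, unsigned jobID)
{
    assert(table != NULL && jobID < TOY_NB_JOBS);
    free(table->buffers[jobID]);      /* stand-in for "return to pool" */
    table->buffers[jobID] = NULL;
}

/* free() side: scrub every entry that a job still holds.
 * Returns the number of entries that had not been released yet. */
static unsigned toy_scrubJobs(toy_jobTable* table)
{
    unsigned nbScrubbed = 0;
    unsigned jobID;
    for (jobID = 0; jobID < TOY_NB_JOBS; jobID++) {
        if (table->buffers[jobID] != NULL) {
            nbScrubbed++;
            toy_releaseJob(table, jobID);
        }
    }
    return nbScrubbed;
}
```

Because released entries are NULL, the scrubbing pass can tell held resources from released ones without any extra bookkeeping.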

Obviously, it's not recommended to free() zstdmt context mid-term,
still that's now a supported scenario.

The same methodology is also used to ensure proper resource collection
after an error is detected.

Still to do :
- detect compression errors (not just allocation ones)
- properly manage resource when init() is called without finishing previous compression.

Yann Collet [Wed, 18 Jan 2017 00:15:18 +0000 (16:15 -0800)]

ZSTDMT_{flush,end}Stream() now block on next job completion when nothing to flush

The main issue was to prevent a caller from continually looping on {flush,end}Stream()
when nothing is ready to be flushed but some compression work is still ongoing in a worker thread.
Such a continuous loop would waste energy.
The new version makes calls to {flush,end}Stream() blocking when nothing is ready to be flushed.
Of course, if all worker threads have exhausted their jobs, it returns zero (all flushes completed).

Note : There are still some remaining issues to report error codes
and properly collect back resources into pools when an error is triggered.

Yann Collet [Tue, 17 Jan 2017 23:31:16 +0000 (15:31 -0800)]

completed ZSTDMT streaming compression

Provides the baseline compression API :
size_t ZSTDMT_initCStream(ZSTDMT_CCtx* zcs, int compressionLevel);
size_t ZSTDMT_compressStream(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
size_t ZSTDMT_flushStream(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output);
size_t ZSTDMT_endStream(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output);

Not tested yet

Sean Purcell [Tue, 17 Jan 2017 19:04:08 +0000 (11:04 -0800)]

Don't create dict in streaming apis if dictSize == 0

Yann Collet [Tue, 17 Jan 2017 21:15:25 +0000 (13:15 -0800)]

updated NEWS

Przemyslaw Skibinski [Tue, 17 Jan 2017 12:02:29 +0000 (13:02 +0100)]

Merge remote-tracking branch 'refs/remotes/facebook/dev' into dev11

Przemyslaw Skibinski [Tue, 17 Jan 2017 12:02:06 +0000 (13:02 +0100)]

zlibWrapper: added get_crc_table

Przemyslaw Skibinski [Tue, 17 Jan 2017 11:40:06 +0000 (12:40 +0100)]

added "Makefile is validated"

Yann Collet [Tue, 17 Jan 2017 04:17:44 +0000 (20:17 -0800)]

Merge pull request #511 from indygreg/cdict-dictid

Set dictionary ID in ZSTD_initCStream_usingCDict()

Yann Collet [Tue, 17 Jan 2017 03:46:22 +0000 (19:46 -0800)]

added test checking dictID when using ZSTD_initCStream_usingCDict()

It shows that dictID is not properly added into frame header

Gregory Szorc [Sun, 15 Jan 2017 01:44:54 +0000 (17:44 -0800)]

Set dictionary ID in ZSTD_initCStream_usingCDict()

When porting python-zstandard to use ZSTD_initCStream_usingCDict()
so compression dictionaries could be reused, an automated test
failed due to compressed content changing.

I tracked this down to ZSTD_initCStream_usingCDict() not
setting the dictID field of the ZSTD_CCtx attached to the
ZSTD_CStream instance.

I'm not 100% convinced this is the correct or full solution,
as I'm still seeing one automated test failing with this change.

Yann Collet [Thu, 12 Jan 2017 21:10:04 +0000 (22:10 +0100)]

Merge pull request #510 from iburinoc/baddict

Fixed decompress_usingDict not propagating corrupted dictionary error

Sean Purcell [Thu, 12 Jan 2017 17:38:29 +0000 (09:38 -0800)]

Fix missing 'OK' logging on fuzzer testcase

Yann Collet [Thu, 12 Jan 2017 16:46:46 +0000 (17:46 +0100)]

fix gcc-arm warning "suggest braces around empty body"

Yann Collet [Thu, 12 Jan 2017 02:06:35 +0000 (03:06 +0100)]

zstdmt : fix : resources properly collected even when early fail

In the previous version, the main function would return early when detecting a job error.
Resources of late threads were therefore not collected back into the pools.
The new version just registers the error, but continues the collecting process.
All buffers and contexts should be released back to the pool before leaving the main function.
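
The "register the error, keep collecting" pattern might look like the sketch below. The toy_result type and toy_collect() are hypothetical; the released flag stands in for handing buffers and contexts back to their pools.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t cSize;     /* compressed size produced by the job */
    int    isError;   /* nonzero when the job failed */
    int    released;  /* set once the job's resources are back in the pool */
} toy_result;

/* Visits every job, even after a failure, so that no resource is left behind.
 * Returns the total compressed size, or 0 with *errorOut set to 1 on error. */
static size_t toy_collect(toy_result* jobs, unsigned nbJobs, int* errorOut)
{
    size_t total = 0;
    int error = 0;
    unsigned u;
    assert(jobs != NULL && errorOut != NULL);
    for (u = 0; u < nbJobs; u++) {
        jobs[u].released = 1;             /* always give resources back */
        if (jobs[u].isError) error = 1;   /* remember the error, keep going */
        else total += jobs[u].cSize;
    }
    *errorOut = error;
    return error ? 0 : total;
}
```

An early return inside the loop would be the previous behavior: jobs after the failing one would never reach the release step.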

Sean Purcell [Thu, 12 Jan 2017 01:31:06 +0000 (17:31 -0800)]

Fixed decompress_usingDict not propagating corrupted dictionary error

Yann Collet [Thu, 12 Jan 2017 01:01:28 +0000 (02:01 +0100)]

zstdmt : correctly check for cctx and buffer allocation

Result from getBuffer and getCCtx could be NULL when allocation fails.
Now correctly checks : job creation stops and the last job reports an allocation error.
releaseBuffer and releaseCCtx are now also compatible with NULL input.

Identified a new potential issue :
when an early job fails, later jobs are not collected for resource retrieval.

Yann Collet [Thu, 12 Jan 2017 00:25:46 +0000 (01:25 +0100)]

zstdmt : changed internal naming from frame to chunk

Since the result of mt compression is a single frame,
changed the naming, since the previous one implied the concatenation of multiple frames.

minor : ensures that content size is written in header

Yann Collet [Wed, 11 Jan 2017 17:21:25 +0000 (18:21 +0100)]

ZSTDMT_compress() creates a single frame

The new strategy involves cutting frame at block level.
The result is a single frame, preserving ZSTD_getDecompressedSize()

As a consequence, bench can now make a full round-trip,
since the result is compatible with ZSTD_decompress().

This strategy will not make it possible to decode the frame with multiple threads
since the exact cut between independent blocks is not known.
MT decoding needs further discussions.

Yann Collet [Wed, 11 Jan 2017 15:08:08 +0000 (16:08 +0100)]

minor refactor (release CCtx 1st) and comment clarification

Yann Collet [Wed, 11 Jan 2017 14:58:05 +0000 (15:58 +0100)]

fixed ZSTDMT_createCCtx() : checked inner objects are properly created

Yann Collet [Wed, 11 Jan 2017 14:44:26 +0000 (15:44 +0100)]

improved ZSTD_createCCtxPool() cancellation

use ZSTD_freeCCtxPool() to release the partially created pool.
avoids duplicating logic.

Also : identified a new difficult corner case :
when freeing the Pool, all CCtx should be previously released back to the pool.
Otherwise, it means some CCtx are still in use.
There is currently no clear policy on what to do in such a case.
Note : it's supposed to never happen.
Since pool creation/usage is static, it has no external user,
which limits risks.

Yann Collet [Wed, 11 Jan 2017 14:35:56 +0000 (15:35 +0100)]

fixed ZSTDMT_createCCtxPool() when inner CCtx creation fails

Yann Collet [Tue, 10 Jan 2017 05:30:28 +0000 (06:30 +0100)]

Merge pull request #509 from terrelln/dict-builder-32

Handle cover dictionary builder maximum input size for 32-bit mode

Nick Terrell [Tue, 10 Jan 2017 01:00:12 +0000 (17:00 -0800)]

Document memory requirements for COVER algorithm

Nick Terrell [Tue, 10 Jan 2017 00:50:00 +0000 (16:50 -0800)]

Handle large input size in 32-bit mode correctly