From: Yann Collet
Date: Sat, 3 Sep 2016 00:04:49 +0000 (-0700)
Subject: clarified dictionary in format description
X-Git-Tag: v1.1.0~75
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=855766d73dac1d19abb82f984015a842e09deef2;p=thirdparty%2Fzstd.git

clarified dictionary in format description
---

diff --git a/NEWS b/NEWS
index 726a9e38d..dcfabbf37 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,7 @@ v1.0.1
 New : contrib/pzstd, parallel version of zstd, by Nick Terrell
 Fixed : CLI -d output to stdout by default when input is stdin (#322)
 Fixed : CLI correctly detects console on Mac OS-X
+Fixed : compatibility with OpenBSD, reported by Juan Francisco Cantero Hurtado (#319)
 Fixed : zstd-pgo, reported by octoploid (#329)
 
 v1.0.0
diff --git a/lib/compress/zstd_compress.c b/lib/compress/zstd_compress.c
index 723d40559..9e733b8f2 100644
--- a/lib/compress/zstd_compress.c
+++ b/lib/compress/zstd_compress.c
@@ -8,7 +8,6 @@
  */
 
 
-
 /*-*******************************************************
 * Compiler specifics
 *********************************************************/
diff --git a/lib/dictBuilder/zdict.c b/lib/dictBuilder/zdict.c
index 916a481eb..4d9e3a92e 100644
--- a/lib/dictBuilder/zdict.c
+++ b/lib/dictBuilder/zdict.c
@@ -463,12 +463,6 @@ static U32 ZDICT_dictSize(const dictItem* dictList)
 }
 
 
-#define DISPLAYUPDATE(l, ...) if (g_displayLevel>=l) { \
-            if (ZDICT_clockSpan(displayClock) > refreshRate) \
-            { displayClock = clock(); DISPLAY(__VA_ARGS__); \
-            if (g_displayLevel>=4) fflush(stdout); } }
-static const clock_t refreshRate = CLOCKS_PER_SEC * 3 / 10;
-
 static size_t ZDICT_trainBuffer(dictItem* dictList, U32 dictListSize,
                             const void* const buffer, size_t bufferSize, /* buffer must end with noisy guard band */
                             const size_t* fileSizes, unsigned nbFiles,
@@ -481,6 +475,12 @@ static size_t ZDICT_trainBuffer(dictItem* dictList, U32 dictListSize,
     U32* filePos = (U32*)malloc(nbFiles * sizeof(*filePos));
     size_t result = 0;
     clock_t displayClock = 0;
+    clock_t const refreshRate = CLOCKS_PER_SEC * 3 / 10;
+
+# define DISPLAYUPDATE(l, ...) if (g_displayLevel>=l) { \
+            if (ZDICT_clockSpan(displayClock) > refreshRate) \
+            { displayClock = clock(); DISPLAY(__VA_ARGS__); \
+            if (g_displayLevel>=4) fflush(stdout); } }
 
     /* init */
     DISPLAYLEVEL(2, "\r%70s\r", ""); /* clean display line */
diff --git a/zstd_compression_format.md b/zstd_compression_format.md
index 3facb3210..8a5d7b77e 100644
--- a/zstd_compression_format.md
+++ b/zstd_compression_format.md
@@ -551,7 +551,7 @@ Let's presume the following Huffman tree must be described :
 The tree depth is 4, since its smallest element uses 4 bits.
 Value `5` will not be listed, nor will values above `5`.
 Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
-Weight formula is :
+Weight formula is :
 ```
 Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
 ```
@@ -779,7 +779,7 @@ which specifies `Baseline` and `Number_of_Bits` to add.
 _Codes_ are FSE compressed,
 and interleaved with raw additional bits in the same bitstream.
 
-##### Literals length codes
+##### Literals length codes
 
 Literals length codes are values ranging from `0` to `35` included.
 They define lengths from 0 to 131071 bytes.
@@ -1126,10 +1126,10 @@ When `Repeated_Offset2` is used, it's swapped with `Repeated_Offset1`.
 Dictionary format
 -----------------
 
-`zstd` is compatible with "pure content" dictionaries, free of any format restriction.
+`zstd` is compatible with "raw content" dictionaries, free of any format restriction.
 But dictionaries created by `zstd --train` follow a format, described here.
 
-__Pre-requisites__ : a dictionary has a known length,
+__Pre-requisites__ : a dictionary has a size,
                      defined either by a buffer limit, or a file size.
 
 | `Magic_Number` | `Dictionary_ID` | `Entropy_Tables` | `Content` |
@@ -1151,20 +1151,21 @@ _Reserved ranges :_
 - high range : >= (2^31)
 
 __`Entropy_Tables`__ : following the same format as a [compressed blocks].
-              They are stored in following order :
-              Huffman tables for literals, FSE table for offsets,
-              FSE table for match lengths, and FSE table for literals lengths.
-              It's finally followed by 3 offset values, populating recent offsets,
-              stored in order, 4-bytes little-endian each, for a total of 12 bytes.
+              They are stored in following order :
+              Huffman tables for literals, FSE table for offsets,
+              FSE table for match lengths, and FSE table for literals lengths.
+              It's finally followed by 3 offset values, populating recent offsets,
+              stored in order, 4-bytes little-endian each, for a total of 12 bytes.
 
-__`Content`__ : Where the actual dictionary content is.
-              Content size depends on Dictionary size.
+__`Content`__ : The rest of the dictionary is its content.
+              The content act as a "past" in front of data to compress or decompress.
 
 [compressed blocks]: #the-format-of-compressed_block
 
 
 Version changes
 ---------------
+- 0.2.1 : clarify field names, by Przemyslaw Skibinski
 - 0.2.0 : numerous format adjustments for zstd v0.8
 - 0.1.2 : limit Huffman tree depth to 11 bits
 - 0.1.1 : reserved dictID ranges