Restructure Format Specification

author Sean Purcell <me@seanp.xyz>

Sat, 18 Feb 2017 00:24:26 +0000 (16:24 -0800)

committer Sean Purcell <me@seanp.xyz>

Sat, 18 Feb 2017 00:24:26 +0000 (16:24 -0800)
author Sean Purcell <me@seanp.xyz>
Sat, 18 Feb 2017 00:24:26 +0000 (16:24 -0800)
committer Sean Purcell <me@seanp.xyz>
Sat, 18 Feb 2017 00:24:26 +0000 (16:24 -0800)
diff --git a/doc/zstd_compression_format.md b/doc/zstd_compression_format.md

index df983284f399913631728dc84a00f04e4b73b2b5..f08dc9537130cae9c913d851d25cf60f65f33b3f 100644 (file)
--- a/doc/zstd_compression_format.md
+++ b/doc/zstd_compression_format.md
@@ -3,7 +3,7 @@ Zstandard Compression Format
  
  ### Notices
  
-Copyright (c) 2016 Yann Collet
+Copyright (c) 2016-present Yann Collet, Facebook, Inc.
  
  Permission is granted to copy and distribute this document
  for any purpose and without charge,
@@ -16,8 +16,7 @@ Distribution of this document is unlimited.
  
  ### Version
  
-0.2.3 (27/01/17)
-
+0.2.4 (17/02/17)
  
  Introduction
  ------------
@@ -57,17 +56,15 @@ Whenever it does not support a parameter defined in the compressed stream,
  it must produce a non-ambiguous error code and associated error message
  explaining which parameter is unsupported.
  
-Overall conventions
------------
+### Overall conventions
  In this document:
  - square brackets i.e. `[` and `]` are used to indicate optional fields or parameters.
-- a naming convention for identifiers is `Mixed_Case_With_Underscores`
+- the naming convention for identifiers is `Mixed_Case_With_Underscores`
  
-Definitions
------------
-A content compressed by Zstandard is transformed into a Zstandard __frame__.
+### Definitions
+Content compressed by Zstandard is transformed into a Zstandard __frame__.
  Multiple frames can be appended into a single file or stream.
-A frame is totally independent, has a defined beginning and end,
+A frame is completely independent, has a defined beginning and end,
  and a set of parameters which tells the decoder how to decompress it.
  
  A frame encapsulates one or multiple __blocks__.
@@ -77,63 +74,33 @@ Unlike frames, each block depends on previous blocks for proper decoding.
  However, each block can be decompressed without waiting for its successor,
  allowing streaming operations.
  
-
-Frame Concatenation
--------------------
-
-In some circumstances, it may be required to append multiple frames,
-for example in order to add new data to an existing compressed file
-without re-framing it.
-
-In such case, each frame brings its own set of descriptor flags.
-Each frame is considered independent.
-The only relation between frames is their sequential order.
-
-The ability to decode multiple concatenated frames
-within a single stream or file is left outside of this specification.
-As an example, the reference `zstd` command line utility is able
-to decode all concatenated frames in their sequential order,
-delivering the final decompressed result as if it was a single content.
-
-
-Skippable Frames
-----------------
-
-| `Magic_Number` | `Frame_Size` | `User_Data` |
-|:--------------:|:------------:|:-----------:|
-|   4 bytes      |  4 bytes     |   n bytes   |
-
-Skippable frames allow the insertion of user-defined data
-into a flow of concatenated frames.
-Its design is pretty straightforward,
-with the sole objective to allow the decoder to quickly skip
-over user-defined data and continue decoding.
-
-Skippable frames defined in this specification are compatible with [LZ4] ones.
-
-[LZ4]:http://www.lz4.org
-
-__`Magic_Number`__
-
-4 Bytes, little-endian format.
-Value : 0x184D2A5?, which means any value from 0x184D2A50 to 0x184D2A5F.
-All 16 values are valid to identify a skippable frame.
-
-__`Frame_Size`__
-
-This is the size, in bytes, of the following `User_Data`
-(without including the magic number nor the size field itself).
-This field is represented using 4 Bytes, little-endian format, unsigned 32-bits.
-This means `User_Data` can’t be bigger than (2^32-1) bytes.
-
-__`User_Data`__
-
-The `User_Data` can be anything. Data will just be skipped by the decoder.
-
-
-
-General Structure of Zstandard Frame format
--------------------------------------------
+Overview
+---------
+- [Frames](#frames)
+  - [Zstandard frames](#zstandard-frames)
+    - [Blocks](#blocks)
+      - [Literals Section](#literals-section)
+      - [Sequences Section](#sequences-section)
+      - [Sequence Execution](#sequence-execution)
+  - [Skippable frames](#skippable-frames)
+- [Entropy Encoding](#entropy-encoding)
+  - [FSE](#fse)
+  - [Huffman Coding](#huffman-coding)
+- [Dictionary Format](#dictionary-format)
+
+Frames
+------
+Zstandard compressed data is made of up one or more __frames__.
+Each frame is independent and can be decompressed indepedently of other frames.
+The decompressed content of multiple concatenated frames is the concatenation of
+each frames decompressed content.
+
+There are two frame formats defined by Zstandard:
+  Zstandard frames and Skippable frames.
+Zstandard frames contain compressed data, while
+skippable frames contain no data and can be used for metadata.
+
+## Zstandard frames
  The structure of a single Zstandard frame is following:
  
  | `Magic_Number` | `Frame_Header` |`Data_Block`| [More data blocks] | [`Content_Checksum`] |
@@ -147,11 +114,11 @@ Value : 0xFD2FB528
  
  __`Frame_Header`__
  
-2 to 14 Bytes, detailed in [next part](#the-structure-of-frame_header).
+2 to 14 Bytes, detailed in [`Frame_Header`](#frame_header).
  
  __`Data_Block`__
  
-Detailed in [next chapter](#the-structure-of-data_block).
+Detailed in [`Blocks`](#blocks).
  That’s where compressed data is stored.
  
  __`Content_Checksum`__
@@ -162,10 +129,9 @@ of [xxh64() hash function](http://www.xxhash.org)
  digesting the original (decoded) data as input, and a seed of zero.
  The low 4 bytes of the checksum are stored in little endian format.
  
+### `Frame_Header`
  
-The structure of `Frame_Header`
--------------------------------
-The `Frame_Header` has a variable size, which uses a minimum of 2 bytes,
+The `Frame_Header` has a variable size, with a minimum of 2 bytes,
  and up to 14 bytes depending on optional parameters.
  The structure of `Frame_Header` is following:
  
@@ -173,10 +139,10 @@ The structure of `Frame_Header` is following:
  | ------------------------- | --------------------- | ----------------- | ---------------------- |
  | 1 byte                    | 0-1 byte              | 0-4 bytes         | 0-8 bytes              |
  
-### `Frame_Header_Descriptor`
+#### `Frame_Header_Descriptor`
  
  The first header's byte is called the `Frame_Header_Descriptor`.
-It tells which other fields are present.
+It describes which other fields are present.
  Decoding this byte is enough to tell the size of `Frame_Header`.
  
  | Bit number | Field name                |
@@ -188,7 +154,7 @@ Decoding this byte is enough to tell the size of `Frame_Header`.
  | 2          | `Content_Checksum_flag`   |
  | 1-0        | `Dictionary_ID_flag`      |
  
-In this table, bit 7 is highest bit, while bit 0 is lowest.
+In this table, bit 7 the is highest bit, while bit 0 the is lowest.
  
  __`Frame_Content_Size_flag`__
  
@@ -216,7 +182,7 @@ but `Window_Descriptor` byte is skipped.
  As a consequence, the decoder must allocate a memory segment
  of size equal or bigger than `Frame_Content_Size`.
  
-In order to preserve the decoder from unreasonable memory requirement,
+In order to preserve the decoder from unreasonable memory requirements,
  a decoder can reject a compressed frame
  which requests a memory size beyond decoder's authorized range.
  
@@ -256,7 +222,7 @@ It also specifies the size of this field as `Field_Size`.
  | ---------- | --- | --- | --- | --- |
  |`Field_Size`|  0  |  1  |  2  |  4  |
  
-### `Window_Descriptor`
+#### `Window_Descriptor`
  
  Provides guarantees on maximum back-reference distance
  that will be used within compressed data.
@@ -294,12 +260,12 @@ It's merely a recommendation though,
  decoders are free to support larger or lower limits,
  depending on local limitations.
  
-### `Dictionary_ID`
+#### `Dictionary_ID`
  
  This is a variable size field, which contains
  the ID of the dictionary required to properly decode the frame.
  Note that this field is optional. When it's not present,
-it's up to the caller to make sure it uses the correct dictionary.
+it's up to the decoder to make sure it uses the correct dictionary.
  Format is little-endian.
  
  Field size depends on `Dictionary_ID_flag`.
@@ -319,7 +285,7 @@ the following ranges are reserved for future use and should not be used :
  - high range : >= (2^31)
  
  
-### `Frame_Content_Size`
+#### `Frame_Content_Size`
  
  This is the original (uncompressed) size. This information is optional.
  The `Field_Size` is provided according to value of `Frame_Content_Size_flag`.
@@ -337,10 +303,14 @@ When `Field_Size` is 1, 4 or 8 bytes, the value is read directly.
  When `Field_Size` is 2, _the offset of 256 is added_.
  It's allowed to represent a small size (for example `18`) using any compatible variant.
  
+Blocks
+-------
+After the magic number and header of each block,
+there are some number of blocks.
+Each frame must have at least one block  but there is no upper limit
+on the number of blocks per frame.
  
-The structure of `Data_Block`
------------------------------
-The structure of `Data_Block` is following:
+The structure of a block is as follows:
  
  | `Last_Block` | `Block_Type` | `Block_Size` | `Block_Content` |
  |:------------:|:------------:|:------------:|:---------------:|
@@ -351,8 +321,9 @@ The block header (`Last_Block`, `Block_Type`, and `Block_Size`) uses 3-bytes.
  __`Last_Block`__
  
  The lowest bit signals if this block is the last one.
-Frame ends right after this block.
-It may be followed by an optional `Content_Checksum` .
+The frame will end after this one.
+It may be followed by an optional `Content_Checksum`
+(see [Zstandard Frames](#zstandard-frames)).
  
  __`Block_Type` and `Block_Size`__
  
@@ -367,15 +338,19 @@ There are 4 block types :
  | `Block_Type` | `Raw_Block` | `RLE_Block` | `Compressed_Block` | `Reserved`|
  
  - `Raw_Block` - this is an uncompressed block.
-  `Block_Size` is the number of bytes to read and copy.
+  `Block_Content` contains `Block_Size` bytes to read and copy
+  as decoded data.
+
  - `RLE_Block` - this is a single byte, repeated N times.
-  In which case, `Block_Size` is the size to regenerate,
-  while the "compressed" block is just 1 byte (the byte to repeat).
-- `Compressed_Block` - this is a [Zstandard compressed block](#the-format-of-compressed_block),
-  detailed in another section of this specification.
-  `Block_Size` is the compressed size.
-  Decompressed size is unknown,
+  `Block_Content` consists of a single byte,
+  and `Block_Size` is the number of times this byte should be repeated.
+
+- `Compressed_Block` - this is a [Zstandard compressed block](#compressed-blocks),
+  explained later on.
+  `Block_Size` is the length of `Block_Content`, the compressed data.
+  The decompressed size is unknown,
    but its maximum possible value is guaranteed (see below)
+
  - `Reserved` - this is not a block.
    This value cannot be used with current version of this specification.
  
@@ -384,42 +359,36 @@ Block sizes must respect a few rules :
  - Block decompressed size is always <= maximum back-reference distance.
  - Block decompressed size is always <= 128 KB.
  
-
-__`Block_Content`__
-
-The `Block_Content` is where the actual data to decode stands.
-It might be compressed or not, depending on previous field indications.
  A data block is not necessarily "full" :
  since an arbitrary “flush” may happen anytime,
-block decompressed content can be any size,
+block decompressed content can be any size (even empty),
  up to `Block_Maximum_Decompressed_Size`, which is the smallest of :
  - Maximum back-reference distance
  - 128 KB
  
-
-
-The format of `Compressed_Block`
---------------------------------
-The size of `Compressed_Block` must be provided using `Block_Size` field from `Data_Block`.
-The `Compressed_Block` has a guaranteed maximum regenerated size,
-in order to properly allocate destination buffer.
-See [`Data_Block`](#the-structure-of-data_block) for more details.
+Compressed Blocks
+-----------------
+To decompress a compressed block, the compressed size must be provided from
+`Block_Size` field in the block header.
  
  A compressed block consists of 2 sections :
-- [`Literals_Section`](#literals_section)
-- [`Sequences_Section`](#sequences_section)
+- [Literals Section](#literals-section)
+- [Sequences Section](#sequences-section)
+
+The results of the two sections are then combined to produce the decompressed
+data in [Sequence Execution](#sequence-execution)
  
-### Prerequisites
+#### Prerequisites
  To decode a compressed block, the following elements are necessary :
-- Previous decoded blocks, up to a distance of `Window_Size`,
-  or all previous blocks when `Single_Segment_flag` is set.
-- List of "recent offsets" from previous compressed block.
-- Decoding tables of previous compressed block for each symbol type
+- Previous decoded data, up to a distance of `Window_Size`,
+  or all previous data when `Single_Segment_flag` is set.
+- List of "recent offsets" from the previous compressed block.
+- Decoding tables of the previous compressed block for each symbol type
    (literals, literals lengths, match lengths, offsets).
  
-
-### `Literals_Section`
-
+Literals Section
+----------------
+During sequence execution, symbols from the literals section
  During sequence phase, literals will be entangled with match copy operations.
  All literals are regrouped in the first part of the block.
  They can be decoded first, and then copied during sequence operations,
@@ -443,7 +412,7 @@ using little-endian convention.
  | --------------------- | ------------- | ------------------ | ----------------- |
  |   2 bits              |  1 - 2 bits   |    5 - 20 bits     |    0 - 18 bits    |
  
-In this representation, bits on the left are smallest bits.
+In this representation, bits on the left are the lowest bits.
  
  __`Literals_Block_Type`__
  
@@ -464,14 +433,16 @@ This field uses 2 lowest bits of first byte, describing 4 different block types
  - `Repeat_Stats_Literals_Block` - This is a Huffman-compressed block,
          using Huffman tree _from previous Huffman-compressed literals block_.
          Huffman tree description will be skipped.
+        Note: If this mode is used without any previous Huffman-table in the frame
+        (or [dictionary](#dictionary-format)), this should be treated as corruption.
  
  __`Size_Format`__
  
  `Size_Format` is divided into 2 families :
  
-- For `Compressed_Block`, it requires to decode both `Compressed_Size`
-  and `Regenerated_Size` (the decompressed size). It will also decode the number of streams.
  - For `Raw_Literals_Block` and `RLE_Literals_Block` it's enough to decode `Regenerated_Size`.
+- For `Compressed_Block`, its required to decode both `Compressed_Size`
+  and `Regenerated_Size` (the decompressed size). It will also decode the number of streams.
  
  For values spanning several bytes, convention is little-endian.
  
@@ -490,266 +461,90 @@ __`Size_Format` for `Raw_Literals_Block` and `RLE_Literals_Block`__ :
                 `Literals_Section_Header` has 3 bytes.
                 `Regenerated_Size = (Header[0]>>4) + (Header[1]<<4) + (Header[2]<<12)`
  
+Only Stream1 is present for these cases.
  Note : it's allowed to represent a short value (for example `13`)
  using a long format, accepting the increased compressed data size.
  
  __`Size_Format` for `Compressed_Literals_Block` and `Repeat_Stats_Literals_Block`__ :
  
  - Value 00 : _A single stream_.
-               Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
+               Both `Regenerated_Size` and `Compressed_Size` use 10 bits (0-1023).
                 `Literals_Section_Header` has 3 bytes.
  - Value 01 : 4 streams.
-               Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
+               Both `Regenerated_Size` and `Compressed_Size` use 10 bits (0-1023).
                 `Literals_Section_Header` has 3 bytes.
  - Value 10 : 4 streams.
-               Both `Compressed_Size` and `Regenerated_Size` use 14 bits (0-16383).
+               Both `Regenerated_Size` and `Compressed_Size` use 14 bits (0-16383).
                 `Literals_Section_Header` has 4 bytes.
  - Value 11 : 4 streams.
-               Both `Compressed_Size` and `Regenerated_Size` use 18 bits (0-262143).
+               Both `Regenerated_Size` and `Compressed_Size` use 18 bits (0-262143).
                 `Literals_Section_Header` has 5 bytes.
  
  Both `Compressed_Size` and `Regenerated_Size` fields follow little-endian convention.
  Note: `Compressed_Size` __includes__ the size of the Huffman Tree description if it
  is present.
  
-#### `Huffman_Tree_Description`
-
-This section is only present when `Literals_Block_Type` type is `Compressed_Literals_Block` (`2`).
-
-Prefix coding represents symbols from an a priori known alphabet
-by bit sequences (codewords), one codeword for each symbol,
-in a manner such that different symbols may be represented
-by bit sequences of different lengths,
-but a parser can always parse an encoded string
-unambiguously symbol-by-symbol.
-
-Given an alphabet with known symbol frequencies,
-the Huffman algorithm allows the construction of an optimal prefix code
-using the fewest bits of any possible prefix codes for that alphabet.
-
-Prefix code must not exceed a maximum code length.
-More bits improve accuracy but cost more header size,
-and require more memory or more complex decoding operations.
-This specification limits maximum code length to 11 bits.
-
-
-##### Representation
-
-All literal values from zero (included) to last present one (excluded)
-are represented by `Weight` with values from `0` to `Max_Number_of_Bits`.
-Transformation from `Weight` to `Number_of_Bits` follows this formula :
-```
-Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
-```
-The last symbol's `Weight` is deduced from previously decoded ones,
-by completing to the nearest power of 2.
-This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
-
-__Example__ :
-Let's presume the following Huffman tree must be described :
-
-|     literal      |  0  |  1  |  2  |  3  |  4  |  5  |
-| ---------------- | --- | --- | --- | --- | --- | --- |
-| `Number_of_Bits` |  1  |  2  |  3  |  0  |  4  |  4  |
-
-The tree depth is 4, since its smallest element uses 4 bits.
-Value `5` will not be listed as it can be determined from the values for 0-4,
-nor will values above `5` as they are all 0.
-Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
-Weight formula is :
-```
-Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
-```
-It gives the following series of weights :
-
-| literal  |  0  |  1  |  2  |  3  |  4  |
-| -------- | --- | --- | --- | --- | --- |
-| `Weight` |  4  |  3  |  2  |  0  |  1  |
-
-The decoder will do the inverse operation :
-having collected weights of literals from `0` to `4`,
-it knows the last literal, `5`, is present with a non-zero weight.
-The weight of `5` can be determined by advancing to the next power of 2.
-The sum of `2^(Weight-1)` (excluding 0's) is :
-`8 + 4 + 2 + 0 + 1 = 15`.
-Nearest power of 2 is 16.
-Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = 1`.
-
-##### Huffman Tree header
-
-This is a single byte value (0-255),
-which describes how to decode the list of weights.
-
-- if `headerByte` >= 128 : this is a direct representation,
-  where each `Weight` is written directly as a 4 bits field (0-15).
-  They are encoded forward, 2 weights to a byte with the first weight taking
-  the top four bits and the second taking the bottom four (e.g. the following
-  operations could be used to read the weights:
-  `Weight[0] = (Byte[0] >> 4), Weight[1] = (Byte[0] & 0xf)`, etc.).
-  The full representation occupies `((Number_of_Symbols+1)/2)` bytes,
-  meaning it uses a last full byte even if `Number_of_Symbols` is odd.
-  `Number_of_Symbols = headerByte - 127`.
-  Note that maximum `Number_of_Symbols` is 255-127 = 128.
-  A larger series must necessarily use FSE compression.
-
-- if `headerByte` < 128 :
-  the series of weights is compressed by FSE.
-  The length of the FSE-compressed series is equal to `headerByte` (0-127).
-
-##### Finite State Entropy (FSE) compression of Huffman weights
-
-FSE decoding uses three operations: `Init_State`, `Decode_Symbol`, and `Update_State`.
-`Init_State` reads in the initial state value from a bitstream,
-`Decode_Symbol` outputs a symbol based on the current state,
-and `Update_State` goes to a new state based on the current state and some number of consumed bits.
-
-FSE streams must be read in reverse from the order they're encoded in,
-so bitstreams start at a certain offset and works backwards towards their base.
-
-For more on how FSE bitstreams work, see [Finite State Entropy].
-
-[Finite State Entropy]:https://github.com/Cyan4973/FiniteStateEntropy/
-
-The series of Huffman weights is compressed using FSE compression.
-It's a single bitstream with 2 interleaved states,
-sharing a single distribution table.
-
-To decode an FSE bitstream, it is necessary to know its compressed size.
-Compressed size is provided by `headerByte`.
-It's also necessary to know its _maximum possible_ decompressed size,
-which is `255`, since literal values span from `0` to `255`,
-and last symbol's weight is not represented.
-
-An FSE bitstream starts by a header, describing probabilities distribution.
-It will create a Decoding Table.
-The table must be pre-allocated, so a maximum accuracy must be fixed.
-For a list of Huffman weights, maximum accuracy is 7 bits.
-
-The FSE header format is [described in a relevant chapter](#fse-distribution-table--condensed-format),
-as well as the [FSE bitstream](#bitstream).
-The main difference is that Huffman header compression uses 2 states,
-which share the same FSE distribution table.
-The first state (`State1`) encodes the even indexed symbols,
-and the second (`State2`) encodes the odd indexes.
-State1 is initialized first, and then State2, and they take turns decoding
-a single symbol and updating their state.
-
-The number of symbols to decode is determined
-by tracking bitStream overflow condition:
-If updating state after decoding a symbol would require more bits than
-remain in the stream, it is assumed the extra bits are 0.  Then,
-the symbols for each of the final states are decoded and the process is complete.
-
-##### Conversion from weights to Huffman prefix codes
-
-All present symbols shall now have a `Weight` value.
-It is possible to transform weights into Number_of_Bits, using this formula:
-```
-Number_of_Bits = Number_of_Bits ? Max_Number_of_Bits + 1 - Weight : 0
-```
-Symbols are sorted by `Weight`. Within same `Weight`, symbols keep natural order.
-Symbols with a `Weight` of zero are removed.
-Then, starting from lowest weight, prefix codes are distributed in order.
-
-__Example__ :
-Let's presume the following list of weights has been decoded :
-
-| Literal  |  0  |  1  |  2  |  3  |  4  |  5  |
-| -------- | --- | --- | --- | --- | --- | --- |
-| `Weight` |  4  |  3  |  2  |  0  |  1  |  1  |
-
-Sorted by weight and then natural order,
-it gives the following distribution :
-
-| Literal          |  3  |  4  |  5  |  2  |  1  |   0  |
-| ---------------- | --- | --- | --- | --- | --- | ---- |
-| `Weight`         |  0  |  1  |  1  |  2  |  3  |   4  |
-| `Number_of_Bits` |  0  |  4  |  4  |  3  |  2  |   1  |
-| prefix codes     | N/A | 0000| 0001| 001 | 01  |   1  |
+### Raw Literals Block
+The data in Stream1 is `Regenerated_Size` bytes long, and contains the raw literals data
+to be used in sequence execution.
  
+### RLE Literals Block
+Stream1 consists of a single byte which should be repeated `Regenerated_Size` times
+to generate the decoded literals.
  
-#### The content of Huffman-compressed literal stream
+### Compressed Literals Block and Repeat Stats Literals Block
+Both of these modes contain Huffman encoded data 
  
-##### Bitstreams sizes
+#### `Huffman_Tree_Description`
+This section is only present when `Literals_Block_Type` type is `Compressed_Literals_Block` (`2`).
+The format of the Huffman tree description can be found at [Huffman Tree description](#huffman-tree-description).
+The size Huffman Tree description will be determined during the decoding process,
+and must be used to determine where the compressed Huffman streams begin.
  
-As seen in a previous paragraph,
-there are 2 types of Huffman-compressed literals :
-a single stream and 4 streams.
+If repeat stats mode is used, the Huffman table used in the previous compressed block will
+be used to decompress this block as well.
  
-Encoding using 4 streams is useful for CPU with multiple execution units and out-of-order operations.
-Since each stream can be decoded independently,
-it's possible to decode them up to 4x faster than a single stream,
-presuming the CPU has enough parallelism available.
+Huffman compressed data consists either 1 or 4 Huffman-coded streams.
  
-For single stream, header provides both the compressed and regenerated size.
-For 4 streams though,
-header only provides compressed and regenerated size of all 4 streams combined.
-In order to properly decode the 4 streams,
-it's necessary to know the compressed and regenerated size of each stream.
+If only one stream is present, it is a single bitstream occupying the entire
+remaining portion of the literals block, encoded as described at
+[Huffman-Coded Streams](#huffman-coded-streams).
  
-Regenerated size of each stream can be calculated by `(totalSize+3)/4`,
-except for last one, which can be up to 3 bytes smaller, to reach `totalSize`.
+If there are four streams, the literals section header only provides enough
+information to know the regenerated and compressed sizes of all four streams combined.
+The regenerated size of each stream is equal to `(totalSize+3)/4`, except for the last stream,
+which may be up to 3 bytes smaller, to reach a total decompressed size match that described
+in the literals header.
  
-Compressed size is provided explicitly : in the 4-streams variant,
-bitstreams are preceded by 3 unsigned little-endian 16-bits values.
-Each value represents the compressed size of one stream, in order.
-The last stream size is deducted from total compressed size
-and from previously decoded stream sizes :
+The compressed size of each stream is provided explicitly: the first 6 bytes of the compressed
+data consist of three 2-byte little endian fields, describing the compressed sizes
+of the first three streams.
+The last streams size is computed from the total compressed size and the size of the other
+three streams.
  
  `stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize`.
  
+Note: remember that totalCSize may be smaller than the `Compressed_Size` found in the literals
+block header as `Compressed_Size` also contains the size of the Huffman Tree description if it
+is present.
  
-##### Bitstreams read and decode
-
-Each bitstream must be read _backward_,
-that is starting from the end down to the beginning.
-Therefore it's necessary to know the size of each bitstream.
-
-It's also necessary to know exactly which _bit_ is the latest.
-This is detected by a final bit flag :
-the highest bit of latest byte is a final-bit-flag.
-Consequently, a last byte of `0` is not possible.
-And the final-bit-flag itself is not part of the useful bitstream.
-Hence, the last byte contains between 0 and 7 useful bits.
-
-For example, if the literal sequence "0145" was encoded using the prefix codes above,
-it would be encoded as:
-```
-00000001 01110000
-```
-
-|Symbol  |   5  |   4  |  1 | 0 | Padding |
-|--------|------|------|----|---|---------|
-|Encoding|`0000`|`0001`|`01`|`1`| `10000` |
-
-Starting from the end,
-it's possible to read the bitstream in a little-endian fashion,
-keeping track of already used bits.  Since the bitstream is encoded in reverse
-order, by starting at the end the symbols can be read in forward order.
-
-Reading the last `Max_Number_of_Bits` bits,
-it's then possible to compare extracted value to decoding table,
-determining the symbol to decode and number of bits to discard.
-
-The process continues up to reading the required number of symbols per stream.
-If a bitstream is not entirely and exactly consumed,
-hence reaching exactly its beginning position with _all_ bits consumed,
-the decoding process is considered faulty.
-
-### `Sequences_Section`
+Each of these 4 bitstreams is then decoded independently as a Huffman-Coded stream,
+as described at [Huffman-Coded Streams](#huffman-coded-streams)
  
+Sequences Section
+-----------------
  A compressed block is a succession of _sequences_ .
  A sequence is a literal copy command, followed by a match copy command.
  A literal copy command specifies a length.
  It is the number of bytes to be copied (or extracted) from the literal section.
  A match copy command specifies an offset and a length.
-The offset gives the position to copy from,
-which can be within a previous block.
  
  When all _sequences_ are decoded,
  if there is are any literals left in the _literal section_,
  these bytes are added at the end of the block.
  
+This is described in more detail in [Sequence Execution](#sequence-execution)
+
  The `Sequences_Section` regroup all symbols required to decode commands.
  There are 3 symbol types : literals lengths, offsets and match lengths.
  They are encoded together, interleaved, in a single _bitstream_.
@@ -762,7 +557,7 @@ followed by the bitstream.
  | -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
  
  To decode the `Sequences_Section`, it's required to know its size.
-This size is deducted from `blockSize - literalSectionSize`.
+This size is deduced from `blockSize - literalSectionSize`.
  
  
  #### `Sequences_Section_Header`
@@ -801,12 +596,21 @@ They follow the same enumeration :
  | ------------------ | ----------------- | ---------- | --------------------- | ------------- |
  | `Compression_Mode` | `Predefined_Mode` | `RLE_Mode` | `FSE_Compressed_Mode` | `Repeat_Mode` |
  
-- `Predefined_Mode` : uses a predefined distribution table.
-- `RLE_Mode` : it's a single code, repeated `Number_of_Sequences` times.
-- `Repeat_Mode` : re-use distribution table from previous compressed block.
+- `Predefined_Mode` : A predefined FSE distribution table is used, defined in
+          [default distributions](#default-distributions).
+          The table takes no space in the compressed data.
+- `RLE_Mode` : The table description consists of a single byte.
+          This code will be repeated for every sequence.
+- `Repeat_Mode` : The table used in the previous compressed block will be used again.
+          No distribution table will be present.
+          Note: this includes RLE mode, so if repeat_mode follows rle_mode the same symbol will be repeated.
+          If this mode is used without any previous sequence table in the frame
+          (or [dictionary](#dictionary-format)) to repeat, this should be treated as corruption.
  - `FSE_Compressed_Mode` : standard FSE compression.
            A distribution table will be present.
-          It will be described in [next part](#distribution-tables).
+          The format of this distribution table is described in (FSE Table Description)[#fse-table-description].
+          Note that the maximum allowed accuracy log for literals length and match length tables is 9,
+          and the maximum accuracy log for the offsets table is 8.
  
  #### The codes for literals lengths, match lengths, and offsets.
  
@@ -819,6 +623,9 @@ and interleaved with raw additional bits in the same bitstream.
  
  Literals length codes are values ranging from `0` to `35` included.
  They define lengths from 0 to 131071 bytes.
+The literals length is equal to the decoded `Baseline` plus
+the result of reading `Number_of_Bits` bits from the bitstream,
+as a little-endian value.
  
  | `Literals_Length_Code` |         0-15           |
  | ---------------------- | ---------------------- |
@@ -840,23 +647,14 @@ They define lengths from 0 to 131071 bytes.
  | `Baseline`             | 8192 |16384 |32768 |65536 |
  | `Number_of_Bits`       |  13  |  14  |  15  |  16  |
  
-##### Default distribution for literals length codes
-
-When `Compression_Mode` is `Predefined_Mode`,
-a predefined distribution is used for FSE compression.
-
-Its definition is below. It uses an accuracy of 6 bits (64 states).
-```
-short literalsLength_defaultDistribution[36] =
-        { 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,
-          2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1,
-         -1,-1,-1,-1 };
-```
  
  ##### Match length codes
  
  Match length codes are values ranging from `0` to `52` included.
  They define lengths from 3 to 131074 bytes.
+The match length is equal to the decoded `Baseline` plus
+the result of reading `Number_of_Bits` bits from the bitstream,
+as a little-endian value.
  
  | `Match_Length_Code` |         0-31            |
  | ------------------- | ----------------------- |
@@ -878,20 +676,6 @@ They define lengths from 3 to 131074 bytes.
  | `Baseline`          | 4099 | 8195 |16387 |32771 |65539 |
  | `Number_of_Bits`    |  12  |  13  |  14  |  15  |  16  |
  
-##### Default distribution for match length codes
-
-When `Compression_Mode` is defined as `Predefined_Mode`,
-a predefined distribution is used for FSE compression.
-
-Its definition is below. It uses an accuracy of 6 bits (64 states).
-```
-short matchLengths_defaultDistribution[53] =
-        { 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
-          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,
-         -1,-1,-1,-1,-1 };
-```
-
  ##### Offset codes
  
  Offset codes are values ranging from `0` to `N`.
@@ -901,7 +685,7 @@ Recommendation is to support at least up to `22`.
  For information, at the time of this writing.
  the reference decoder supports a maximum `N` value of `28` in 64-bits mode.
  
-An offset code is also the number of additional bits to read,
+An offset code is also the number of additional bits to read in little-endian fashion,
  and can be translated into an `Offset_Value` using the following formulas :
  
  ```
@@ -911,44 +695,204 @@ if (Offset_Value > 3) offset = Offset_Value - 3;
  It means that maximum `Offset_Value` is `(2^(N+1))-1` and it supports back-reference distance up to `(2^(N+1))-4`
  but is limited by [maximum back-reference distance](#window_descriptor).
  
-`Offset_Value` from 1 to 3 are special : they define "repeat codes",
-which means one of the previous offsets will be repeated.
-They are sorted in recency order, with 1 meaning the most recent one.
-See [Repeat offsets](#repeat-offsets) paragraph.
+`Offset_Value` from 1 to 3 are special : they define "repeat codes".
+This is described in more detail in [Repeat Offsets](#repeat-offsets).
  
+#### Decoding Sequences
+FSE bitstreams are read in reverse direction than written. In zstd,
+the compressor writes bits forward into a block and the decompressor
+must read the bitstream _backwards_.
  
-##### Default distribution for offset codes
-
-When `Compression_Mode` is defined as `Predefined_Mode`,
-a predefined distribution is used for FSE compression.
-
-Below is its definition. It uses an accuracy of 5 bits (32 states),
-and supports a maximum `N` of 28, allowing offset values up to 536,870,908 .
+To find the start of the bitstream it is therefore necessary to
+know the offset of the last byte of the block which can be found
+by counting `Block_Size` bytes after the block header.
  
-If any sequence in the compressed block requires an offset larger than this,
-it's not possible to use the default distribution to represent it.
+After writing the last bit containing information, the compressor
+writes a single `1`-bit and then fills the byte with 0-7 `0` bits of
+padding. The last byte of the compressed bitstream cannot be `0` for
+that reason.
  
-```
-short offsetCodes_defaultDistribution[29] =
-        { 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
-          1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 };
-```
+When decompressing, the last byte containing the padding is the first
+byte to read. The decompressor needs to skip 0-7 initial `0`-bits and
+the first `1`-bit it occurs. Afterwards, the useful part of the bitstream
+begins.
+
+FSE decoding requires a 'state' to be carried from symbol to symbol.
+For more explanation on FSE decoding, see the [FSE section](#fse).
+
+For sequence decoding, a separate state must be kept track of for each of
+literal lengths, offsets, and match lengths.
+Some FSE primitives are also used.
+For more details on the operation of these primitives, see the [FSE section](#fse).
+
+##### Starting states
+The bitstream starts with initial FSE state values,
+each using the required number of bits in their respective _accuracy_,
+decoded previously from their normalized distribution.
+
+It starts by `Literals_Length_State`,
+followed by `Offset_State`,
+and finally `Match_Length_State`.
+
+Reminder : always keep in mind that all values are read _backward_,
+so the 'start' of the bitstream is at the highest position in memory,
+immediately before the last `1`-bit for padding.
+
+After decoding the starting states, a single sequence is decoded
+`Number_Of_Sequences` times.
+These sequences are decoded in order from first to last.
+Since the compressor writes the bitstream in the forward direction,
+this means the compressor must encode the sequences starting with the last
+one and ending with the first.
+
+##### Decoding a sequence
+For each of the symbol types, the FSE state can be used to determine the appropriate code.
+The code then defines the baseline and number of bits to read for each type.
+See the [description of the codes] for how to determine these values.
+
+[description of the codes]: #the-codes-for-literals-lengths-match-lengths-and-offsets
+
+Decoding starts by reading the `Number_of_Bits` required to decode `Offset`.
+It then does the same for `Match_Length`,
+and then for `Literals_Length`.
+This sequence is then used for [sequence execution](#sequence-execution).
+
+If it is not the last sequence in the block,
+the next operation is to update states.
+Using the rules pre-calculated in the decoding tables,
+`Literals_Length_State` is updated,
+followed by `Match_Length_State`,
+and then `Offset_State`.
+See the [FSE section](#fse) for details on how to update states from the bitstream.
+
+This operation will be repeated `Number_of_Sequences` times.
+At the end, the bitstream shall be entirely consumed,
+otherwise the bitstream is considered corrupted.
+
+#### Default Distributions
+If `Predefined_Mode` is selected for a symbol type,
+its FSE decoding table is generated from a predefined distribution table defined here.
+For details on how to convert this distribution into a decoding table, see the [FSE section].
+
+[FSE section]: #from-normalized-distribution-to-decoding-tables
+
+Sequence Execution
+------------------
+Once literals and sequences have been decoded,
+they are combined to produce the decoded content of a block.
+
+Each sequence consists of a tuple of (`literals_length`, `offset_value`, `match_length`),
+decoded as described in the [Sequences Section)[#sequences-section].
+To execute a sequence, first copy `literals_length` bytes from the literals section
+to the output.
+
+Then `match_length` bytes are copied from previous decoded data.
+The offset to copy from is determined by `offset_value`:
+if `offset_value > 3`, then the offset is `offset_value - 3`.
+If `offset_value` is from 1-3, the offset is a special repeat offset value.
+See the [repeat offset](#repeat-offsets) section for how the offset is determined
+in this case.
+
+The offset is defined as from the current position, so an offset of 6
+and a match length of 3 means that 3 bytes should be copied from 6 bytes back.
+Note that all offsets must be at most equal to the window size defined by the frame header.
+
+#### Repeat offsets
+As seen in [Sequence Execution](#sequence-execution),
+the first 3 values define a repeated offset and we will call them
+`Repeated_Offset1`, `Repeated_Offset2`, and `Repeated_Offset3`.
+They are sorted in recency order, with `Repeated_Offset1` meaning "most recent one".
+
+If `offset_value == 1`, then the offset used is `Repeated_Offset1`, etc.
+
+There is an exception though, when current sequence's `literals_length = 0`.
+In this case, repeated offsets are shifted by one,
+so an `offset_value` of 1 means `Repeated_Offset2`,
+an `offset_value` of 2 means `Repeated_Offset3`,
+and an `offset_value` of 3 means `Repeated_Offset1 - 1_byte`.
+
+In the first block, the offset history is populated with the following values : 1, 4 and 8 (in order).
+
+Then each block gets its starting offset history from the ending values of the most recent compressed block.
+Note that non-compressed blocks are skipped,
+they do not contribute to offset history.
+
+[Offset Codes]: #offset-codes
+
+###### Offset updates rules
+
+The newest offset takes the lead in offset history,
+shifting others back (up to its previous place if it was already present).
+
+This means that when `Repeated_Offset1` (most recent) is used, history is unmodified.
+When `Repeated_Offset2` is used, it's swapped with `Repeated_Offset1`.
+If any other offset is used, it becomes `Repeated_Offset1` and the rest are shift back by one.
+
+Skippable Frames
+----------------
+
+| `Magic_Number` | `Frame_Size` | `User_Data` |
+|:--------------:|:------------:|:-----------:|
+|   4 bytes      |  4 bytes     |   n bytes   |
+
+Skippable frames allow the insertion of user-defined data
+into a flow of concatenated frames.
+Its design is pretty straightforward,
+with the sole objective to allow the decoder to quickly skip
+over user-defined data and continue decoding.
  
-#### Distribution tables
+Skippable frames defined in this specification are compatible with [LZ4] ones.
+
+[LZ4]:http://www.lz4.org
  
-Following the header, up to 3 distribution tables can be described.
-When present, they are in this order :
-- Literals lengths
-- Offsets
-- Match Lengths
+__`Magic_Number`__
  
-The content to decode depends on their respective encoding mode :
-- `Predefined_Mode` : no content. Use the predefined distribution table.
-- `RLE_Mode` : 1 byte. This is the only code to use across the whole compressed block.
-- `FSE_Compressed_Mode` : A distribution table is present.
-- `Repeat_Mode` : no content. Re-use distribution from previous compressed block.
+4 Bytes, little-endian format.
+Value : 0x184D2A5?, which means any value from 0x184D2A50 to 0x184D2A5F.
+All 16 values are valid to identify a skippable frame.
  
-##### FSE distribution table : condensed format
+__`Frame_Size`__
+
+This is the size, in bytes, of the following `User_Data`
+(without including the magic number nor the size field itself).
+This field is represented using 4 Bytes, little-endian format, unsigned 32-bits.
+This means `User_Data` can’t be bigger than (2^32-1) bytes.
+
+__`User_Data`__
+
+The `User_Data` can be anything. Data will just be skipped by the decoder.
+
+Entropy Encoding
+----------------
+Two types of entropy encoding are used by the Zstandard format:
+FSE, and Huffman coding.
+
+FSE
+---
+FSE, or FiniteStateEntropy is an entropy coding based on [ANS].
+FSE encoding/decoding involves a state that is carried over between symbols,
+so decoding must be done in the opposite direction as encoding.
+Therefore, all FSE bitstreams are read from end to beginning.
+
+For additional details on FSE, see [Finite State Entropy].
+
+[Finite State Entropy]:https://github.com/Cyan4973/FiniteStateEntropy/
+
+FSE decoding involves a decoding table which has a power of 2 size and three elements:
+`Symbol`, `Num_Bits`, and `Baseline`.
+The `log2` of the table size is its `Accuracy_Log`.
+The FSE state represents an index in this table.
+The next symbol in the stream is the symbol indicated by the table value for that state.
+To obtain the next state value,
+the decoder should consume `Num_Bits` bits from the stream as a little endian value and add it to baseline.
+
+To obtain the initial state value, consume `Accuracy_Log` bits from the stream as a little endian value.
+
+[ANS]: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
+
+### FSE Table Description
+To decode FSE streams, it is necessary to construct the decoding table.
+The Zstandard format encodes FSE table descriptions as follows:
  
  An FSE distribution table describes the probabilities of all symbols
  from `0` to the last present one (included)
@@ -960,8 +904,6 @@ since it will be discovered and reported by the decoding process.
  
  The bitstream starts by reporting on which scale it operates.
  `Accuracy_Log = low4bits + 5`.
-Note that maximum `Accuracy_Log` for literal and match lengths is `9`,
-and for offsets is `8`. Higher values are considered errors.
  
  Then follows each symbol value, from `0` to last present one.
  The number of bits used by each field is variable.
@@ -997,10 +939,10 @@ Probability is obtained from Value decoded by following formula :
  
  It means value `0` becomes negative probability `-1`.
  `-1` is a special probability, which means "less than 1".
-Its effect on distribution table is described in [next paragraph].
-For the purpose of calculating cumulated distribution, it counts as one.
+Its effect on distribution table is described in the [next section].
+For the purpose of calculating total allocated probability points, it counts as one.
  
-[next paragraph]:#fse-decoding--from-normalized-distribution-to-decoding-tables
+[next section]:#from-normalized-distribution-to-decoding-tables
  
  When a symbol has a __probability__ of `zero`,
  it is followed by a 2-bits repeat flag.
@@ -1018,14 +960,14 @@ and how many symbols are present.
  The bitstream consumes a round number of bytes.
  Any remaining bit within the last byte is just unused.
  
-##### FSE decoding : from normalized distribution to decoding tables
+##### From normalized distribution to decoding tables
  
  The distribution of normalized probabilities is enough
  to create a unique decoding table.
  
  It follows the following build rule :
  
-The table has a size of `tableSize = 1 << Accuracy_Log`.
+The table has a size of `Table_Size = 1 << Accuracy_Log`.
  Each cell describes the symbol decoded,
  and instructions to get the next state.
  
@@ -1045,8 +987,7 @@ position += (tableSize>>1) + (tableSize>>3) + 3;
  position &= tableSize-1;
  ```
  
-A position is skipped if already occupied,
-typically by a "less than 1" probability symbol.
+A position is skipped if already occupied by a "less than 1" probability symbol.
  `position` does not reset between symbols, it simply iterates through
  each position in the table, switching to the next symbol when enough
  states have been allocated to the current one.
@@ -1084,16 +1025,16 @@ Numbering starts from higher states using less bits.
  The next state is determined from current state
  by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
  
+See [Appendix A] for the results of this process applied to the default distributions.
  
-#### Bitstream
+[Appendix A]: #appendix-a---decoding-tables-for-predefined-codes
  
-FSE bitstreams are read in reverse direction than written. In zstd,
-the compressor writes bits forward into a block and the decompressor
-must read the bitstream _backwards_.
-
-To find the start of the bitstream it is therefore necessary to
-know the offset of the last byte of the block which can be found
-by counting `Block_Size` bytes after the block header.
+Huffman Coding
+--------------
+Zstandard Huffman-coded streams are read backwards,
+similar to the FSE bitstreams.
+Therefore, to find the start of the bitstream it is therefore necessary to
+know the offset of the last byte of the Huffman-coded stream.
  
  After writing the last bit containing information, the compressor
  writes a single `1`-bit and then fills the byte with 0-7 `0` bits of
@@ -1105,77 +1046,191 @@ byte to read. The decompressor needs to skip 0-7 initial `0`-bits and
  the first `1`-bit it occurs. Afterwards, the useful part of the bitstream
  begins.
  
-##### Starting states
+The bitstream contains Huffman-coded symbols in little-endian order,
+with the codes defined by the method below.
  
-The bitstream starts with initial state values,
-each using the required number of bits in their respective _accuracy_,
-decoded previously from their normalized distribution.
+### Huffman Tree Description
+Prefix coding represents symbols from an a priori known alphabet
+by bit sequences (codewords), one codeword for each symbol,
+in a manner such that different symbols may be represented
+by bit sequences of different lengths,
+but a parser can always parse an encoded string
+unambiguously symbol-by-symbol.
  
-It starts by `Literals_Length_State`,
-followed by `Offset_State`,
-and finally `Match_Length_State`.
+Given an alphabet with known symbol frequencies,
+the Huffman algorithm allows the construction of an optimal prefix code
+using the fewest bits of any possible prefix codes for that alphabet.
  
-Reminder : always keep in mind that all values are read _backward_.
+Prefix code must not exceed a maximum code length.
+More bits improve accuracy but cost more header size,
+and require more memory or more complex decoding operations.
+This specification limits maximum code length to 11 bits.
  
-##### Decoding a sequence
  
-A state gives a code.
-A code provides `Baseline` and `Number_of_Bits` to add.
-See [Symbol Decoding] section for details on each symbol.
+##### Representation
  
-Decoding starts by reading the `Number_of_Bits` required to decode `Offset`.
-It then does the same for `Match_Length`,
-and then for `Literals_Length`.
+All literal values from zero (included) to last present one (excluded)
+are represented by `Weight` with values from `0` to `Max_Number_of_Bits`.
+Transformation from `Weight` to `Number_of_Bits` follows this formula :
+```
+Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
+```
+The last symbol's `Weight` is deduced from previously decoded ones,
+by completing to the nearest power of 2.
+This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
  
-`Offset`, `Match_Length`, and `Literals_Length` define a sequence.
-It starts by inserting the number of literals defined by `Literals_Length`,
-then continue by copying `Match_Length` bytes from `currentPos - Offset`.
+__Example__ :
+Let's presume the following Huffman tree must be described :
  
-If it is not the last sequence in the block,
-the next operation is to update states.
-Using the rules pre-calculated in the decoding tables,
-`Literals_Length_State` is updated,
-followed by `Match_Length_State`,
-and then `Offset_State`.
+|     literal      |  0  |  1  |  2  |  3  |  4  |  5  |
+| ---------------- | --- | --- | --- | --- | --- | --- |
+| `Number_of_Bits` |  1  |  2  |  3  |  0  |  4  |  4  |
  
-This operation will be repeated `Number_of_Sequences` times.
-At the end, the bitstream shall be entirely consumed,
-otherwise the bitstream is considered corrupted.
+The tree depth is 4, since its smallest element uses 4 bits.
+Value `5` will not be listed as it can be determined from the values for 0-4,
+nor will values above `5` as they are all 0.
+Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
+Weight formula is :
+```
+Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
+```
+It gives the following series of weights :
  
-[Symbol Decoding]:#the-codes-for-literals-lengths-match-lengths-and-offsets
+| literal  |  0  |  1  |  2  |  3  |  4  |
+| -------- | --- | --- | --- | --- | --- |
+| `Weight` |  4  |  3  |  2  |  0  |  1  |
  
-##### Repeat offsets
+The decoder will do the inverse operation :
+having collected weights of literals from `0` to `4`,
+it knows the last literal, `5`, is present with a non-zero weight.
+The weight of `5` can be determined by advancing to the next power of 2.
+The sum of `2^(Weight-1)` (excluding 0's) is :
+`8 + 4 + 2 + 0 + 1 = 15`.
+Nearest power of 2 is 16.
+Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = 1`.
  
-As seen in [Offset Codes], the first 3 values define a repeated offset and we will call them `Repeated_Offset1`, `Repeated_Offset2`, and `Repeated_Offset3`.
-They are sorted in recency order, with `Repeated_Offset1` meaning "most recent one".
+##### Huffman Tree header
  
-There is an exception though, when current sequence's literals length is `0`.
-In this case, repeated offsets are shifted by one,
-so `Repeated_Offset1` becomes `Repeated_Offset2`, `Repeated_Offset2` becomes `Repeated_Offset3`,
-and `Repeated_Offset3` becomes `Repeated_Offset1 - 1_byte`.
+This is a single byte value (0-255),
+which describes how to decode the list of weights.
  
-In the first block, the offset history is populated with the following values : 1, 4 and 8 (in order).
+- if `headerByte` >= 128 : this is a direct representation,
+  where each `Weight` is written directly as a 4 bits field (0-15).
+  They are encoded forward, 2 weights to a byte with the first weight taking
+  the top four bits and the second taking the bottom four (e.g. the following
+  operations could be used to read the weights:
+  `Weight[0] = (Byte[0] >> 4), Weight[1] = (Byte[0] & 0xf)`, etc.).
+  The full representation occupies `((Number_of_Symbols+1)/2)` bytes,
+  meaning it uses a last full byte even if `Number_of_Symbols` is odd.
+  `Number_of_Symbols = headerByte - 127`.
+  Note that maximum `Number_of_Symbols` is 255-127 = 128.
+  A larger series must necessarily use FSE compression.
  
-Then each block gets its starting offset history from the ending values of the most recent compressed block.
-Note that non-compressed blocks are skipped,
-they do not contribute to offset history.
+- if `headerByte` < 128 :
+  the series of weights is compressed by FSE.
+  The length of the FSE-compressed series is equal to `headerByte` (0-127).
  
-[Offset Codes]: #offset-codes
+##### Finite State Entropy (FSE) compression of Huffman weights
  
-###### Offset updates rules
+In this case, the series of Huffman weights is compressed using FSE compression.
+It's a single bitstream with 2 interleaved states,
+sharing a single distribution table.
  
-The newest offset takes the lead in offset history,
-shifting others back (up to its previous place if it was already present).
+To decode an FSE bitstream, it is necessary to know its compressed size.
+Compressed size is provided by `headerByte`.
+It's also necessary to know its _maximum possible_ decompressed size,
+which is `255`, since literal values span from `0` to `255`,
+and last symbol's weight is not represented.
  
-This means that when `Repeated_Offset1` (most recent) is used, history is unmodified.
-When `Repeated_Offset2` is used, it's swapped with `Repeated_Offset1`.
-If any other offset is used, it becomes `Repeated_Offset1` and the rest are shift back by one.
+An FSE bitstream starts by a header, describing probabilities distribution.
+It will create a Decoding Table.
+For a list of Huffman weights, the maximum accuracy log is 7 bits.
+For more description see the [FSE header description](#fse-table-description)
+
+The Huffman header compression uses 2 states,
+which share the same FSE distribution table.
+The first state (`State1`) encodes the even indexed symbols,
+and the second (`State2`) encodes the odd indexes.
+State1 is initialized first, and then State2, and they take turns decoding
+a single symbol and updating their state.
+For more details on these FSE operations, see the [FSE section](#fse).
+
+The number of symbols to decode is determined
+by tracking bitStream overflow condition:
+If updating state after decoding a symbol would require more bits than
+remain in the stream, it is assumed the extra bits are 0.  Then,
+the symbols for each of the final states are decoded and the process is complete.
  
+##### Conversion from weights to Huffman prefix codes
+
+All present symbols shall now have a `Weight` value.
+It is possible to transform weights into Number_of_Bits, using this formula:
+```
+Number_of_Bits = Number_of_Bits ? Max_Number_of_Bits + 1 - Weight : 0
+```
+Symbols are sorted by `Weight`. Within same `Weight`, symbols keep natural order.
+Symbols with a `Weight` of zero are removed.
+Then, starting from lowest weight, prefix codes are distributed in order.
  
-Dictionary format
+__Example__ :
+Let's presume the following list of weights has been decoded :
+
+| Literal  |  0  |  1  |  2  |  3  |  4  |  5  |
+| -------- | --- | --- | --- | --- | --- | --- |
+| `Weight` |  4  |  3  |  2  |  0  |  1  |  1  |
+
+Sorted by weight and then natural order,
+it gives the following distribution :
+
+| Literal          |  3  |  4  |  5  |  2  |  1  |   0  |
+| ---------------- | --- | --- | --- | --- | --- | ---- |
+| `Weight`         |  0  |  1  |  1  |  2  |  3  |   4  |
+| `Number_of_Bits` |  0  |  4  |  4  |  3  |  2  |   1  |
+| prefix codes     | N/A | 0000| 0001| 001 | 01  |   1  |
+
+### Huffman-coded Streams
+Given a Huffman decoding table,
+it's possible to decode a Huffman-coded stream.
+
+Each bitstream must be read _backward_,
+that is starting from the end down to the beginning.
+Therefore it's necessary to know the size of each bitstream.
+
+It's also necessary to know exactly which _bit_ is the latest.
+This is detected by a final bit flag :
+the highest bit of latest byte is a final-bit-flag.
+Consequently, a last byte of `0` is not possible.
+And the final-bit-flag itself is not part of the useful bitstream.
+Hence, the last byte contains between 0 and 7 useful bits.
+
+For example, if the literal sequence "0145" was encoded using the prefix codes above,
+it would be encoded as:
+```
+00000001 01110000
+```
+
+|Symbol  |   5  |   4  |  1 | 0 | Padding |
+|--------|------|------|----|---|---------|
+|Encoding|`0000`|`0001`|`01`|`1`| `10000` |
+
+Starting from the end,
+it's possible to read the bitstream in a little-endian fashion,
+keeping track of already used bits.  Since the bitstream is encoded in reverse
+order, by starting at the end the symbols can be read in forward order.
+
+Reading the last `Max_Number_of_Bits` bits,
+it's then possible to compare extracted value to decoding table,
+determining the symbol to decode and number of bits to discard.
+
+The process continues up to reading the required number of symbols per stream.
+If a bitstream is not entirely and exactly consumed,
+hence reaching exactly its beginning position with _all_ bits consumed,
+the decoding process is considered faulty.
+
+Dictionary Format
  -----------------
  
-`zstd` is compatible with "raw content" dictionaries, free of any format restriction,
+Zstandard is compatible with "raw content" dictionaries, free of any format restriction,
  except that they must be at least 8 bytes.
  These dictionaries function as if they were just the `Content` block of a formatted
  dictionary.
@@ -1203,10 +1258,15 @@ _Reserved ranges :_
                - low range : 1 - 32767
                - high range : >= (2^31)
  
-__`Entropy_Tables`__ : following the same format as the tables in [compressed blocks].
+__`Entropy_Tables`__ : following the same format as the tables in compressed blocks.
+              See the relevant [FSE](#fse-table-description)
+              and [Huffman](#huffman-tree-description) sections for how to decode these tables.
                They are stored in following order :
                Huffman tables for literals, FSE table for offsets,
                FSE table for match lengths, and FSE table for literals lengths.
+              These tables populate the Repeat Stats literals mode and
+              Repeat distribution mode for sequence decoding.
+
                It's finally followed by 3 offset values, populating recent offsets (instead of using `{1,4,8}`),
                stored in order, 4-bytes little-endian each, for a total of 12 bytes.
                Each recent offset must have a value < dictionary size.
@@ -1214,9 +1274,13 @@ __`Entropy_Tables`__ : following the same format as the tables in [compressed bl
  __`Content`__ : The rest of the dictionary is its content.
                The content act as a "past" in front of data to compress or decompress,
                so it can be referenced in sequence commands.
+              As long as the amount of data decoded from this frame is less than or
+              equal to the window-size, sequence commands may specify offsets longer
+              than the lenght of total decoded output so far to reference back to the
+              dictionary.  After the total output has surpassed the window size however,
+              this is no longer allowed and the dictionary is no longer accessible.
  
  [compressed blocks]: #the-format-of-compressed_block
-
  Appendix A - Decoding tables for predefined codes
  -------------------------------------------------
  
@@ -1402,6 +1466,7 @@ to crosscheck that an implementation implements the decoding table generation al
  
  Version changes
  ---------------
+- 0.2.4 : section restructuring, by Sean Purcell
  - 0.2.3 : clarified several details, by Sean Purcell
  - 0.2.2 : added predefined codes, by Johannes Rudolph
  - 0.2.1 : clarify field names, by Przemyslaw Skibinski
author	Sean Purcell <me@seanp.xyz>
	Sat, 18 Feb 2017 00:24:26 +0000 (16:24 -0800)
committer	Sean Purcell <me@seanp.xyz>
	Sat, 18 Feb 2017 00:24:26 +0000 (16:24 -0800)