| ----------- | ---------- | ---------- |
| Field name | `Exponent` | `Mantissa` |
-Maximum distance is given by the following formulae :
+Maximum distance is given by the following formulas :
```
windowLog = 10 + Exponent;
windowBase = 1 << windowLog;
or all previous blocks when `Single_Segment_flag` is set.
- List of "recent offsets" from previous compressed block.
- Decoding tables of previous compressed block for each symbol type
- (literals, litLength, matchLength, offset).
+ (literals, literals lengths, match lengths, offsets).
### `Literals_Section`
#### `Huffman_Tree_Description`
-This section is only present when `Literals_Block_Type` type is `Compressed_Block` (`2`).
+This section is only present when `Literals_Block_Type` is `Compressed_Literals_Block` (`2`).
Prefix coding represents symbols from an a priori known alphabet
by bit sequences (codewords), one codeword for each symbol,
##### Representation
All literal values from zero (included) to last present one (excluded)
-are represented by `Weight` values, from 0 to `Max_Number_of_Bits`.
-Transformation from `Weight` to `Number_of_Bits` follows this formulae :
-`Number_of_Bits = Weight ? Max_Number_of_Bits + 1 - Weight : 0` .
+are represented by `Weight` with values from `0` to `Max_Number_of_Bits`.
+Transformation from `Weight` to `Number_of_Bits` follows this formula :
+```
+Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
+```
The last symbol's `Weight` is deduced from previously decoded ones,
by completing to the nearest power of 2.
This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
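
As a rough illustration of the deduction described above, the following C sketch sums `2^(Weight-1)` over the listed weights, rounds up to the next power of 2, and derives the last `Weight` from what remains. The helper names `highbit` and `deduce_last_weight` are hypothetical and are not taken from the reference decoder.

```c
#include <stddef.h>

/* Position of the highest set bit (v must be non-zero). Hypothetical helper. */
static int highbit(unsigned v)
{
    int n = 0;
    while (v >>= 1) n++;
    return n;
}

/* Deduces the Weight of the last (unlisted) symbol from the listed weights.
 * Stores the tree depth in *maxNumberOfBits.
 * Returns the deduced Weight, or -1 when the listed weights cannot be
 * completed to a power of 2 by a single extra symbol. */
static int deduce_last_weight(const unsigned char *weights, size_t count,
                              int *maxNumberOfBits)
{
    unsigned sum = 0;
    for (size_t i = 0; i < count; i++) {
        if (weights[i] > 16) return -1;   /* sketch-only guard against overflow */
        if (weights[i] > 0) sum += 1u << (weights[i] - 1);
    }
    if (sum == 0) return -1;

    int depth = highbit(sum) + 1;             /* next power of 2 strictly above sum */
    unsigned rest = (1u << depth) - sum;      /* share left for the last symbol */
    if ((rest & (rest - 1)) != 0) return -1;  /* the share must be a power of 2 */

    *maxNumberOfBits = depth;
    return highbit(rest) + 1;
}
```

For the weights `4, 3, 2, 0, 1`, the sum is 15, so the depth is 4 and the deduced last `Weight` is 1.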
The tree depth is 4, since its smallest element uses 4 bits.
Value `5` will not be listed, nor will values above `5`.
Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
-Weight formula is : `Weight = Number_of_Bits ? Max_Number_of_Bits + 1 - Number_of_Bits : 0`.
+Weight formula is :
+```
+Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
+```
It gives the following serie of weights :
| `Weight` | 4 | 3 | 2 | 0 | 1 |
- if `headerByte` < 128 :
the serie of weights is compressed by FSE.
- The length of the FSE-compressed serie is `headerByte` (0-127).
+ The length of the FSE-compressed series is equal to `headerByte` (0-127).
-##### FSE (Finite State Entropy) compression of Huffman weights
+##### Finite State Entropy (FSE) compression of Huffman weights
The serie of weights is compressed using FSE compression.
It's a single bitstream with 2 interleaved states,
##### Conversion from weights to Huffman prefix codes
All present symbols shall now have a `Weight` value.
-It is possible to transform weights into Number_of_Bits, using this formula :
-`Number_of_Bits = Number_of_Bits ? Max_Number_of_Bits + 1 - Weight : 0` .
-
+It is possible to transform weights into `Number_of_Bits`, using this formula:
+```
+Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
+```
Symbols are sorted by `Weight`. Within same `Weight`, symbols keep natural order.
Symbols with a `Weight` of zero are removed.
Then, starting from lowest weight, prefix codes are distributed in order.
| prefix codes | N/A | 0000| 0001| 001 | 01 | 1 |
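
The distribution rule above can be sketched in C as follows. This is a minimal illustration, not the reference implementation: the `PrefixCode` type and `assign_prefix_codes` function are hypothetical names, and the sketch assumes that `Max_Number_of_Bits` is already known and that the last symbol's `Weight` has been filled in.

```c
#include <stddef.h>

typedef struct {
    unsigned code;    /* prefix code value, to be read as nbBits bits */
    int      nbBits;  /* Number_of_Bits, 0 when the symbol is absent */
} PrefixCode;

/* Walks weights from lowest to highest; within one weight, symbols keep
 * their natural order. Each symbol of weight w takes 2^(w-1) slots. */
static void assign_prefix_codes(const unsigned char *weights, size_t count,
                                int maxNumberOfBits, PrefixCode *out)
{
    unsigned next = 0;  /* next free slot, counted in longest-code units */

    for (size_t s = 0; s < count; s++) {
        out[s].code = 0;
        out[s].nbBits = 0;          /* symbols with Weight 0 stay absent */
    }
    for (int w = 1; w <= maxNumberOfBits; w++) {
        for (size_t s = 0; s < count; s++) {
            if (weights[s] != w) continue;
            out[s].nbBits = maxNumberOfBits + 1 - w;
            out[s].code = next >> (w - 1);
            next += 1u << (w - 1);
        }
    }
}
```

With weights `4, 3, 2, 0, 1, 1` for literals `0` to `5` and a depth of 4, this yields the codes `1`, `01`, `001`, none, `0000` and `0001`, matching the table.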
-#### Literals bitstreams
+#### The content of Huffman-compressed literal stream
##### Bitstreams sizes
As seen in a previous paragraph,
-there are 2 flavors of Huffman-compressed literals :
-single stream, and 4-streams.
+there are 2 types of Huffman-compressed literals :
+a single stream and 4 streams.
-4-streams is useful for CPU with multiple execution units and out-of-order operations.
+Encoding using 4 streams is useful for CPUs with multiple execution units and out-of-order operations.
Since each stream can be decoded independently,
it's possible to decode them up to 4x faster than a single stream,
presuming the CPU has enough parallelism available.
For single stream, header provides both the compressed and regenerated size.
-For 4-streams though,
+For 4 streams though,
header only provides compressed and regenerated size of all 4 streams combined.
In order to properly decode the 4 streams,
it's necessary to know the compressed and regenerated size of each stream.
Each value represents the compressed size of one stream, in order.
The last stream size is deducted from total compressed size
and from previously decoded stream sizes :
+
`stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize`.
+
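
A minimal C sketch of this computation follows. It assumes that the three listed sizes are stored as 2-byte little-endian fields at the start of the literals data, which is what the `- 6` term implies. The function name `read_stream_sizes` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the three listed stream sizes and deduces the fourth.
 * src points at the 6-byte size list; totalCSize counts those 6 bytes too.
 * Returns 0 on success, -1 when the sizes are inconsistent. */
static int read_stream_sizes(const uint8_t *src, size_t totalCSize,
                             size_t streamCSize[4])
{
    size_t listed = 0;

    if (totalCSize < 6) return -1;
    for (int i = 0; i < 3; i++) {
        /* assumed layout: 2 bytes, little-endian */
        streamCSize[i] = (size_t)src[2 * i] | ((size_t)src[2 * i + 1] << 8);
        listed += streamCSize[i];
    }
    if (listed > totalCSize - 6) return -1;  /* streams would overrun the block */
    streamCSize[3] = totalCSize - 6 - listed;
    return 0;
}
```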
##### Bitstreams read and decode
Each bitstream must be read _backward_,
if there is any literal left in the _literal section_,
these bytes are added at the end of the block.
-The _Sequences_Section_ regroup all symbols required to decode commands.
+The `Sequences_Section` groups all symbols required to decode commands.
There are 3 symbol types : literals lengths, offsets and match lengths.
They are encoded together, interleaved, in a single _bitstream_.
-Each symbol is a _code_ in its own context,
-which specifies a baseline and a number of bits to add.
-_Codes_ are FSE compressed,
-and interleaved with raw additional bits in the same bitstream.
-
-The Sequences section starts by a header,
-followed by optional Probability tables for each symbol type,
+The `Sequences_Section` starts with a header,
+followed by optional probability tables for each symbol type,
followed by the bitstream.
| `Sequences_Section_Header` | [`Literals_Length_Table`] | [`Offset_Table`] | [`Match_Length_Table`] | bitStream |
| -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
-To decode the Sequence section, it's required to know its size.
+To decode the `Sequences_Section`, it's required to know its size.
This size is deducted from `blockSize - literalSectionSize`.
The last field, `Reserved`, must be all-zeroes.
-`Literals_Lengths_Mode`, `Offsets_Mode` and `Match_Lengths_Mode` define the compression mode of
-literals lengths, offsets and match lengths respectively.
+`Literals_Lengths_Mode`, `Offsets_Mode` and `Match_Lengths_Mode` define the `Compression_Mode` of
+literals lengths, offsets, and match lengths respectively.
They follow the same enumeration :
A distribution table will be present.
It will be described in [next part](#distribution-tables).
-#### Symbols decoding
+#### The codes for literals lengths, match lengths, and offsets
-##### Literals Length codes
+Each symbol is a _code_ in its own context,
+which specifies `Baseline` and `Number_of_Bits` to add.
+_Codes_ are FSE compressed,
+and interleaved with raw additional bits in the same bitstream.
+
+##### Literals length codes
Literals length codes are values ranging from `0` to `35` included.
They define lengths from 0 to 131071 bytes.
| `Literals_Length_Code` | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 16 | 18 | 20 | 22 | 24 | 28 | 32 | 40 |
+| `Baseline` | 16 | 18 | 20 | 22 | 24 | 28 | 32 | 40 |
| `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
| `Literals_Length_Code` | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 48 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 |
+| `Baseline` | 48 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 |
| `Number_of_Bits` | 4 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| `Literals_Length_Code` | 32 | 33 | 34 | 35 |
| ---------------------- | ---- | ---- | ---- | ---- |
-| Baseline | 8192 |16384 |32768 |65536 |
+| `Baseline` | 8192 |16384 |32768 |65536 |
| `Number_of_Bits` | 13 | 14 | 15 | 16 |
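
To make the use of `Baseline` and `Number_of_Bits` concrete, here is a small C sketch built from the table values for codes `16` to `35`. It assumes that codes `0` to `15` map directly to lengths 0 to 15 with no additional bits. The array and function names are hypothetical.

```c
#include <stdint.h>

/* Baseline and Number_of_Bits for Literals_Length_Code 16..35. */
static const uint32_t LL_baseline[20] = {
    16, 18, 20, 22, 24, 28, 32, 40,
    48, 64, 128, 256, 512, 1024, 2048, 4096,
    8192, 16384, 32768, 65536
};
static const uint8_t LL_bits[20] = {
    1, 1, 1, 1, 2, 2, 3, 3,
    4, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16
};

/* extraBits is the value of the LL_bits[code - 16] raw bits
 * that the caller has read from the bitstream. */
static uint32_t literals_length(unsigned code, uint32_t extraBits)
{
    if (code < 16) return code;  /* assumption: direct mapping, no extra bits */
    return LL_baseline[code - 16] + extraBits;
}
```

Each range ends exactly where the next `Baseline` begins, and the largest code with all additional bits set gives 65536 + 65535 = 131071.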
-__Default distribution__
+##### Default distribution for literals length codes
When `Compression_Mode` is `Predefined_Mode`,
a predefined distribution is used for FSE compression.
-1,-1,-1,-1 };
```
-##### Match Length codes
+##### Match length codes
Match length codes are values ranging from `0` to `52` included.
They define lengths from 3 to 131074 bytes.
| `Match_Length_Code` | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
| ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 35 | 37 | 39 | 41 | 43 | 47 | 51 | 59 |
+| `Baseline` | 35 | 37 | 39 | 41 | 43 | 47 | 51 | 59 |
| `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
| `Match_Length_Code` | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
| ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 67 | 83 | 99 | 131 | 258 | 514 | 1026 | 2050 |
+| `Baseline` | 67 | 83 | 99 | 131 | 259 | 515 | 1027 | 2051 |
| `Number_of_Bits` | 4 | 4 | 5 | 7 | 8 | 9 | 10 | 11 |
| `Match_Length_Code` | 48 | 49 | 50 | 51 | 52 |
| ------------------- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 4098 | 8194 |16486 |32770 |65538 |
+| `Baseline` | 4099 | 8195 |16387 |32771 |65539 |
| `Number_of_Bits` | 12 | 13 | 14 | 15 | 16 |
-__Default distribution__
+##### Default distribution for match length codes
When `Compression_Mode` is defined as `Predefined_Mode`,
a predefined distribution is used for FSE compression.
-Here is its definition. It uses an accuracy of 6 bits (64 states).
+Below is its definition. It uses an accuracy of 6 bits (64 states).
```
short matchLengths_defaultDistribution[53] =
{ 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
the reference decoder supports a maximum `N` value of `28` in 64-bits mode.
An offset code is also the number of additional bits to read,
-and can be translated into an `Offset_Value` using the following formulae :
+and can be translated into an `Offset_Value` using the following formulas :
```
Offset_Value = (1 << offsetCode) + readNBits(offsetCode);
if (Offset_Value > 3) offset = Offset_Value - 3;
```
-It means that maximum `Offset_Value` is `2^(N+1))-1` and it supports back-reference distance up to 2^(N+1))-4
+It means that maximum `Offset_Value` is `(2^(N+1))-1` and it supports back-reference distance up to `(2^(N+1))-4`
but is limited by [maximum back-reference distance](#window_descriptor).
-Offset_Value from 1 to 3 are special : they define "repeat codes",
+`Offset_Value` from 1 to 3 are special : they define "repeat codes",
which means one of the previous offsets will be repeated.
They are sorted in recency order, with 1 meaning the most recent one.
See [Repeat offsets](#repeat-offsets) paragraph.
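
The relation between an offset code, `Offset_Value` and repeat codes can be sketched in C as below. The function names are hypothetical, and the sketch assumes offset codes small enough to fit a 32-bit value.

```c
#include <stdint.h>

/* Offset_Value = (1 << offsetCode) + readNBits(offsetCode);
 * extraBits stands for the result of readNBits(offsetCode). */
static uint32_t offset_value(unsigned offsetCode, uint32_t extraBits)
{
    return ((uint32_t)1 << offsetCode) + extraBits;
}

/* Offset_Value 1..3 selects one of the recent offsets. */
static int is_repeat_code(uint32_t offsetValue)
{
    return offsetValue >= 1 && offsetValue <= 3;
}

/* Largest Offset_Value reachable with a maximum offset code of maxCode. */
static uint32_t max_offset_value(unsigned maxCode)
{
    return ((uint32_t)1 << (maxCode + 1)) - 1;
}
```

For a maximum code of 28, the largest `Offset_Value` is 536,870,911, which corresponds to a back-reference distance of 536,870,908 once the 3 repeat codes are subtracted.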
-__Default distribution__
+
+##### Default distribution for offset codes
When `Compression_Mode` is defined as `Predefined_Mode`,
a predefined distribution is used for FSE compression.
-Here is its definition. It uses an accuracy of 5 bits (32 states),
+Below is its definition. It uses an accuracy of 5 bits (32 states),
and supports a maximum `N` of 28, allowing offset values up to 536,870,908 .
If any sequence in the compressed block requires an offset larger than this,
Note that maximum `Accuracy_Log` for literal and match lengths is `9`,
and for offsets is `8`. Higher values are considered errors.
-Then follow each symbol value, from `0` to last present one.
+Then follows each symbol value, from `0` to last present one.
The number of bits used by each field is variable.
It depends on :
Symbols probabilities are read one by one, in order.
-Probability is obtained from Value decoded by following formulae :
+Probability is obtained from the decoded value using the following formula :
`Proba = value - 1`
It means value `0` becomes negative probability `-1`.
-`-1` is a special probability, which means `less than 1`.
+`-1` is a special probability, which means "less than 1".
Its effect on distribution table is described in [next paragraph].
For the purpose of calculating cumulated distribution, it counts as one.
The result is a list of state values.
Each state will decode the current symbol.
-To get the Number of bits and baseline required for next state,
+To get the `Number_of_Bits` and `Baseline` required for next state,
it's first necessary to sort all states in their natural order.
The lower states will need 1 more bit than higher ones.
| width | 32 | 32 | 32 | 16 | 16 |
| `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
| range number | 2 | 4 | 6 | 0 | 1 |
-| baseline | 32 | 64 | 96 | 0 | 16 |
+| `Baseline` | 32 | 64 | 96 | 0 | 16 |
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
Next state is determined from current state
-by reading the required number of bits, and adding the specified baseline.
+by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
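
One way to compute these values, sketched in C under the assumption that the states of a symbol are numbered `0` to `probability - 1` in their natural order, is shown below. The names `StateTransition` and `state_transition` are hypothetical. The formula is a common FSE construction and reproduces the table above, but it is offered as an illustration rather than as the normative method.

```c
/* Position of the highest set bit (v must be non-zero). Hypothetical helper. */
static int highbit(unsigned v)
{
    int n = 0;
    while (v >>= 1) n++;
    return n;
}

typedef struct {
    int      nbBits;    /* Number_of_Bits to read for the next state */
    unsigned baseline;  /* Baseline added to the bits read */
} StateTransition;

/* k is the rank of the state among the states of its symbol (0-based).
 * probability must be at least 1 and k must be below probability. */
static StateTransition state_transition(unsigned probability, unsigned k,
                                        int accuracyLog)
{
    unsigned x = probability + k;
    StateTransition t;
    t.nbBits = accuracyLog - highbit(x);
    t.baseline = (x << t.nbBits) - (1u << accuracyLog);
    return t;
}
```

For a symbol with 5 states in a table of 128 entries (`Accuracy_Log` of 7), ranks `0` to `4` give `Number_of_Bits` of 5, 5, 5, 4, 4 and `Baseline` of 32, 64, 96, 0, 16.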
#### Bitstream
##### Decoding a sequence
A state gives a code.
-A code provides a baseline and number of bits to add.
+A code provides `Baseline` and `Number_of_Bits` to add.
See [Symbol Decoding] section for details on each symbol.
-Decoding starts by reading the number of bits required to decode offset.
-It then does the same for match length,
-and then for literals length.
+Decoding starts by reading the `Number_of_Bits` required to decode `Offset`.
+It then does the same for `Match_Length`,
+and then for `Literals_Length`.
-Offset / matchLength / litLength define a sequence.
-It starts by inserting the number of literals defined by `litLength`,
-then continue by copying `matchLength` bytes from `currentPos - offset`.
+`Offset`, `Match_Length`, and `Literals_Length` define a sequence.
+It starts by inserting the number of literals defined by `Literals_Length`,
+then continues by copying `Match_Length` bytes from `currentPos - Offset`.
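
Executing one sequence can be sketched in C as follows. This is a simplified illustration: the function name and parameters are hypothetical, the offset is assumed to be already resolved (repeat codes handled), and matches are only allowed to reach into the current output buffer, ignoring any dictionary or earlier window content.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends literalsLength bytes from literals, then copies matchLength bytes
 * starting offset bytes behind the current position.
 * *pos is the current output position and is advanced on success.
 * Returns 0 on success, -1 on invalid input. */
static int execute_sequence(uint8_t *dst, size_t capacity, size_t *pos,
                            const uint8_t *literals, size_t literalsLength,
                            size_t matchLength, size_t offset)
{
    if (*pos > capacity) return -1;
    if (literalsLength > capacity - *pos) return -1;
    memcpy(dst + *pos, literals, literalsLength);
    *pos += literalsLength;

    if (matchLength == 0) return 0;
    if (offset == 0 || offset > *pos) return -1;
    if (matchLength > capacity - *pos) return -1;
    /* Byte-by-byte copy: the match may overlap the bytes being written. */
    for (size_t i = 0; i < matchLength; i++) {
        dst[*pos] = dst[*pos - offset];
        (*pos)++;
    }
    return 0;
}
```

The byte-by-byte loop matters when `Offset` is smaller than `Match_Length`: with literals `ab`, a `Match_Length` of 4 and an `Offset` of 2, the output is `ababab`.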
The next operation is to update states.
Using rules pre-calculated in the decoding tables,
At the end, the bitstream shall be entirely consumed,
otherwise bitstream is considered corrupted.
-[Symbol Decoding]:#symbols-decoding
+[Symbol Decoding]:#the-codes-for-literals-lengths-match-lengths-and-offsets
##### Repeat offsets
__`Entropy_Tables`__ : following the same format as a [compressed blocks].
They are stored in following order :
- Huffman tables for literals, FSE table for offset,
- FSE table for matchLenth, and FSE table for litLength.
+ Huffman tables for literals, FSE table for offsets,
+ FSE table for match lengths, and FSE table for literals lengths.
It's finally followed by 3 offset values, populating recent offsets,
stored in order, 4-bytes little-endian each, for a total of 12 bytes.