From: elasota <1137273+elasota@users.noreply.github.com>
Date: Thu, 20 Jun 2024 19:19:58 +0000 (-0400)
Subject: Throw error if Huffman weight initial states are truncated
X-Git-Tag: v1.5.7^2~108^2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=0938308ff69b3a7679898d75403832bafe43ba89;p=thirdparty%2Fzstd.git

Throw error if Huffman weight initial states are truncated
---

diff --git a/doc/decompressor_permissive.md b/doc/decompressor_permissive.md
index bd77165f0..164d6c86d 100644
--- a/doc/decompressor_permissive.md
+++ b/doc/decompressor_permissive.md
@@ -18,6 +18,26 @@ This document lists a few known cases where invalid data was formerly accepted
 by the decoder, and what has changed since.
 
 
+Truncated Huffman states
+------------------------
+
+**Last affected version**: v1.5.6
+
+**Produced by the reference compressor**: No
+
+**Example Frame**: `28b5 2ffd 0000 5500 0072 8001 0420 7e1f 02aa 00`
+
+When using FSE-compressed Huffman weights, the compressed weight bitstream
+could contain fewer bits than necessary to decode the initial states.
+
+The reference decompressor up to v1.5.6 will decode truncated or missing
+initial states as zero, which can result in a valid Huffman tree if only
+the second state is truncated.
+
+In newer versions, truncated initial states are reported as a corruption
+error by the decoder.
+
+
 Offset == 0
 -----------
 
diff --git a/doc/zstd_compression_format.md b/doc/zstd_compression_format.md
index 5cae85524..fb0090f9a 100644
--- a/doc/zstd_compression_format.md
+++ b/doc/zstd_compression_format.md
@@ -1362,6 +1362,10 @@ symbols for each of the final states are decoded and the process is complete.
 If this process would produce more weights than the maximum number of decoded
 weights (255), then the data is considered corrupted.
 
+If either of the 2 initial states are absent or truncated, then the data is
+considered corrupted. Consequently, it is not possible to encode fewer than
+2 weights using this mode.
+
 #### Conversion from weights to Huffman prefix codes
 
 All present symbols shall now have a `Weight` value.
diff --git a/lib/common/fse_decompress.c b/lib/common/fse_decompress.c
index 0dcc4640d..c8f1bb0cf 100644
--- a/lib/common/fse_decompress.c
+++ b/lib/common/fse_decompress.c
@@ -190,6 +190,8 @@ FORCE_INLINE_TEMPLATE size_t FSE_decompress_usingDTable_generic(
     FSE_initDState(&state1, &bitD, dt);
     FSE_initDState(&state2, &bitD, dt);
 
+    RETURN_ERROR_IF(BIT_reloadDStream(&bitD)==BIT_DStream_overflow, corruption_detected, "");
+
 #define FSE_GETSYMBOL(statePtr) fast ? FSE_decodeSymbolFast(statePtr, &bitD) : FSE_decodeSymbol(statePtr, &bitD)
 
     /* 4 symbols per loop */
diff --git a/tests/golden-decompression-errors/truncated_huff_state.zst b/tests/golden-decompression-errors/truncated_huff_state.zst
new file mode 100644
index 000000000..2ce18c0b7
Binary files /dev/null and b/tests/golden-decompression-errors/truncated_huff_state.zst differ
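
Illustrative note (not part of the commit): the condition this change rejects can be pictured as a bit-budget test. Each of the two initial FSE states reads `Accuracy_Log` bits, so a weight bitstream holding fewer than twice that many payload bits has a truncated or missing state. The sketch below is a simplified standalone model, not the zstd implementation. `payload_bits`, `check_initial_states`, and `init_status` are hypothetical names introduced here, and the code assumes the backward-read bitstream layout from the format document, in which the highest set bit of the last byte is an end marker rather than data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are not zstd error codes. */
typedef enum { STATES_OK = 0, STATES_TRUNCATED = 1 } init_status;

/* Count the payload bits of a backward-read bitstream.
 * Assumption: the highest set bit of the last byte is the end marker,
 * and only the bits below it (plus all earlier bytes) carry data. */
static size_t payload_bits(const unsigned char *src, size_t size)
{
    if (size == 0 || src[size - 1] == 0) return 0;  /* no end marker present */
    unsigned last = src[size - 1];
    unsigned highbit = 0;
    while (last >>= 1) highbit++;                   /* index of highest set bit */
    return (size - 1) * 8 + highbit;
}

/* Report whether both initial states can be read in full.
 * Each state consumes tableLog bits, so 2 * tableLog bits are required. */
static init_status check_initial_states(const unsigned char *src, size_t size,
                                        unsigned tableLog)
{
    /* Sanity bound for the sketch: Huffman weight tables use a small
     * accuracy log (at most 6 per the format document). */
    assert(tableLog >= 1 && tableLog <= 6);
    size_t need = 2 * (size_t)tableLog;
    return payload_bits(src, size) >= need ? STATES_OK : STATES_TRUNCATED;
}
```

For example, the two-byte stream `{0xFF, 0x07}` carries 10 payload bits in this model, which is enough for two states at an accuracy log of 5 but not at 6. The actual decoder does not count bits up front; it reads both states and then checks `BIT_reloadDStream` for `BIT_DStream_overflow`, as the hunk in `lib/common/fse_decompress.c` shows.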