Nick Terrell [Wed, 27 Sep 2023 00:53:26 +0000 (17:53 -0700)]
Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
pointer-overflow at a per-function level. This is a superior approach
because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
The end goal is to only tag these functions with
`ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions
that rely on pointer overflow, and gradually transition to using
these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
more, but these are the ones that came up within a few minutes of
running the fuzzers, and while running GitHub CI.
Nick Terrell [Thu, 24 Aug 2023 21:41:21 +0000 (14:41 -0700)]
Fix & refactor Huffman repeat tables for dictionaries
The Huffman repeat mode checker assumed that the CTable was zeroed in the region `[maxSymbolValue + 1, 256)`.
This assumption didn't hold for tables built in the dictionaries, because it didn't go through the same codepath.
Since this code was originally written, we added a header to the CTable that specifies the `tableLog`.
Add `maxSymbolValue` to that header, and check that the table's `maxSymbolValue` is at least the block's `maxSymbolValue`.
This solution is cleaner because we write this header for every CTable we build, so it can't be missed in any code path.
Dmitry V. Levin [Tue, 8 Aug 2023 08:00:00 +0000 (08:00 +0000)]
zdictlib: fix prototype mismatch
Fix the following warnings reported by the compiler when
ZDICTLIB_STATIC_API is not defined to ZDICTLIB_API:
lib/dictBuilder/cover.c:1122:21: warning: redeclaration of 'ZDICT_optimizeTrainFromBuffer_cover' with different visibility (old visibility
preserved)
lib/dictBuilder/cover.c:736:21: warning: redeclaration of 'ZDICT_trainFromBuffer_cover' with different visibility (old visibility
+preserved)
lib/dictBuilder/fastcover.c:549:1: warning: redeclaration of 'ZDICT_trainFromBuffer_fastCover' with different visibility (old visibility
preserved)
lib/dictBuilder/fastcover.c:618:1: warning: redeclaration of 'ZDICT_optimizeTrainFromBuffer_fastCover' with different visibility (old
visibility preserved)
Nick Terrell [Mon, 21 Aug 2023 18:33:29 +0000 (11:33 -0700)]
No longer reject dictionaries with literals maxSymbolValue < 255
We already have logic in our Huffman encoder to validate Huffman tables with missing symbols.
We use this for higher compression levels to re-use the previous blocks statistics, or when the dictionaries table has zero-weighted symbols.
This check was leftover as an oversight from before we added validation for Huffman tables.
I validated that the `dictionary_loader` fuzzer has coverage of every line in the `ZSTD_loadCEntropy()` function to validate that it is correctly testing this function.
W. Felix Handte [Wed, 16 Aug 2023 16:09:12 +0000 (12:09 -0400)]
Unpoison Workspace Memory Before Freeing to Custom Free
MSAN is hooked into the system malloc, but when the user provides a custom
allocator, it may not provide the same cleansing behavior. So if we leave
memory poisoned and return it to the user's allocator, where it is re-used
elsewhere, our poisoning can blow up in some other context.
Update fileio.c: fix build failure with enabled LTO
For some reasons when LTO is enabled, the compiler complains about statbuf variable not being correctly initialized, even though the variable has an assert != NULL just few lines below (FIO_getDictFileStat)
This is the fixed build failure:
x86_64-linux-gnu-gcc -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<<PKGBUILDDIR>>=/usr/src/libzstd-1.5.5+dfsg2-1 -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow -Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement -Wstrict-prototypes -Wundef -Wpointer-arith -Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings -Wredundant-decls -Wmissing-prototypes -Wc++-compat -g -Werror -Wa,--noexecstack -Wdate-time -D_FORTIFY_SOURCE=2 -DXXH_NAMESPACE=ZSTD_ -DDEBUGLEVEL=1 -DZSTD_LEGACY_SUPPORT=5 -DZSTD_MULTITHREAD -DZSTD_GZCOMPRESS -DZSTD_GZDECOMPRESS -DZSTD_LZMACOMPRESS -DZSTD_LZMADECOMPRESS -DZSTD_LZ4COMPRESS -DZSTD_LZ4DECOMPRESS -DZSTD_LEGACY_SUPPORT=5 -c -MT obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.o -MMD -MP -MF obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.d -o obj/conf_086c46a51a716b674719b8acb8484eb8/zstdcli_trace.o zstdcli_trace.c
In function ‘UTIL_isRegularFileStat’,
inlined from ‘UTIL_getFileSizeStat’ at util.c:524:10,
inlined from ‘FIO_createDResources’ at fileio.c:2230:30:
util.c:209:12: error: ‘statbuf.st_mode’ may be used uninitialized [-Werror=maybe-uninitialized]
209 | return S_ISREG(statbuf->st_mode) != 0;
| ^
fileio.c: In function ‘FIO_createDResources’:
fileio.c:2223:12: note: ‘statbuf’ declared here
2223 | stat_t statbuf;
| ^
lto1: all warnings being treated as errors
Yann Collet [Mon, 5 Jun 2023 23:03:00 +0000 (16:03 -0700)]
fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes
The sequence section starts with a number, which tells how sequences are present in the section.
If this number if 0, the section automatically ends.
The number 0 can be represented using the 1 byte or the 2 bytes formats.
That's because the 2-bytes formats fully overlaps the 1 byte format.
However, when 0 is represented using the 2-bytes format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.
Fixed this behavior, in both the reference decoder and the educational behavior.
In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.
Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
Yann Collet [Mon, 5 Jun 2023 16:51:52 +0000 (09:51 -0700)]
fix a minor inefficiency in compress_superblock
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-bytes formats are valid to represent this case,
so there was no "error", produced data remains valid,
it's just that the 1-byte format is more efficient.