thirdparty/zstd.git (git.ipfire.org)

author	Yann Collet <cyan@fb.com>
	Tue, 18 Dec 2018 20:32:58 +0000 (12:32 -0800)
committer	Yann Collet <cyan@fb.com>
	Tue, 18 Dec 2018 20:32:58 +0000 (12:32 -0800)
commit	635783da123cef1027a9d7d6fe452ca36d26d58f
tree	dcd097c9c7a0a914512cc02287c81efc33668b85
parent	373ff8b98308910bb0830ec98a269c010f40ed2f

btultra2 and very small srcSize

When srcSize is small,
the number of symbols produced is likely too small to justify dedicated probability tables.
In that case, predefined distribution tables are used instead.

btultra initialization already uses a cheap heuristic for this:
it presumes the default distribution will be used if srcSize <= 1024.

btultra2 now uses the same threshold to disable probability estimation.
Measured frequencies won't be used at the entropy stage,
so relying on them to estimate sequence costs is misleading
and results in worse compression ratios.
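As an illustration only, the shared threshold rule can be sketched in C as below. PREDEF_THRESHOLD, price_mode, select_price_mode and should_estimate_probabilities are hypothetical names, not zstd's internal API. The sketch models only the "srcSize <= 1024" decision described above, not how lib/compress/zstd_opt.c actually computes prices.

```c
#include <stddef.h>

/* Hypothetical constant mirroring the "srcSize <= 1024" rule from this commit. */
#define PREDEF_THRESHOLD 1024

/* Hypothetical enum: where the optimal parser takes its symbol costs from. */
typedef enum {
    PRICE_PREDEFINED, /* costs derived from the static, predefined distributions */
    PRICE_MEASURED    /* costs derived from frequencies measured on the input */
} price_mode;

/* Hypothetical helper: for small inputs, the entropy stage is expected to
 * fall back to predefined tables, so pricing sequences from measured
 * frequencies would model costs the encoder never actually pays. */
static price_mode select_price_mode(size_t srcSize)
{
    return (srcSize <= PREDEF_THRESHOLD) ? PRICE_PREDEFINED : PRICE_MEASURED;
}

/* Hypothetical helper: btultra2-style check built on the same threshold.
 * Returns 1 when estimating probabilities from the input is worthwhile. */
static int should_estimate_probabilities(size_t srcSize)
{
    return select_price_mode(srcSize) == PRICE_MEASURED;
}
```

Sharing one predicate between both strategies keeps btultra's initialization heuristic and btultra2's estimation cutoff from drifting apart if the threshold is ever tuned.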

This fixes a btultra2 performance issue on very small inputs.

Note that a proper solution would be
to determine which symbols will use predefined probabilities
and which will use dynamic ones.
However, the current algorithm cannot make a "per-symbol" decision,
so this would require significant modifications.

lib/compress/zstd_opt.c
programs/benchzstd.c