<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/zstd_logo86.png" alt="Zstandard"></p>

__Zstandard__, or `zstd` for short, is a fast lossless compression algorithm,
targeting real-time compression scenarios at zlib-level and better compression ratios.
It's backed by a very fast entropy stage, provided by the [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).

Zstandard's format is stable and documented in [RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple independent implementations are already available.
This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on the [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).

**Development branch status:**

[![Build Status][travisDevBadge]][travisLink]
[![Build status][AppveyorDevBadge]][AppveyorLink]
[![Build status][CircleDevBadge]][CircleLink]
[![Build status][CirrusDevBadge]][CirrusLink]
[![Fuzzing Status][OSSFuzzBadge]][OSSFuzzLink]

[travisDevBadge]: https://api.travis-ci.com/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.com/facebook/zstd
[AppveyorDevBadge]: https://ci.appveyor.com/api/projects/status/xt38wbdxjk5mrbem/branch/dev?svg=true "Windows test suite"
[AppveyorLink]: https://ci.appveyor.com/project/YannCollet/zstd-p0yf0
[CircleDevBadge]: https://circleci.com/gh/facebook/zstd/tree/dev.svg?style=shield "Short test suite"
[CircleLink]: https://circleci.com/gh/facebook/zstd
[CirrusDevBadge]: https://api.cirrus-ci.com/github/facebook/zstd.svg?branch=dev
[CirrusLink]: https://cirrus-ci.com/github/facebook/zstd
[OSSFuzzBadge]: https://oss-fuzz-build-logs.storage.googleapis.com/badges/zstd.svg
[OSSFuzzLink]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:zstd

## Benchmarks

For reference, several fast compression algorithms were tested and compared
on a desktop running Ubuntu 20.04 (`Linux 5.11.0-41-generic`),
with a Core i7-9700K CPU @ 4.9GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 9.3.0,
on the [Silesia compression corpus].

[lzbench]: https://github.com/inikep/lzbench
[Silesia compression corpus]: https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia
[gcc]: https://gcc.gnu.org/

| Compressor name         | Ratio | Compression| Decompress.|
| ---------------         | ------| -----------| ---------- |
| **zstd 1.5.1 -1**       | 2.887 |   530 MB/s |  1700 MB/s |
| [zlib] 1.2.11 -1        | 2.743 |    95 MB/s |   400 MB/s |
| brotli 1.0.9 -0         | 2.702 |   395 MB/s |   450 MB/s |
| **zstd 1.5.1 --fast=1** | 2.437 |   600 MB/s |  2150 MB/s |
| **zstd 1.5.1 --fast=3** | 2.239 |   670 MB/s |  2250 MB/s |
| quicklz 1.5.0 -1        | 2.238 |   540 MB/s |   760 MB/s |
| **zstd 1.5.1 --fast=4** | 2.148 |   710 MB/s |  2300 MB/s |
| lzo1x 2.10 -1           | 2.106 |   660 MB/s |   845 MB/s |
| [lz4] 1.9.3             | 2.101 |   740 MB/s |  4500 MB/s |
| lzf 3.6 -1              | 2.077 |   410 MB/s |   830 MB/s |
| snappy 1.1.9            | 2.073 |   550 MB/s |  1750 MB/s |

[zlib]: https://www.zlib.net/
[lz4]: https://lz4.github.io/lz4/

The negative compression levels, specified with `--fast=#`,
offer faster compression and decompression speed
at the cost of compression ratio (compared to level 1).

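To make this trade-off concrete, the sketch below turns the zstd rows of the table into rough size and time estimates. The 1000 MB input size is a hypothetical figure, and the calculation assumes the listed speeds are expressed in MB of uncompressed data per second and carry over to your data and hardware, which they may not.

```python
# Rows copied from the benchmark table above:
# name -> (compression ratio, compression MB/s, decompression MB/s)
ROWS = {
    "zstd 1.5.1 -1": (2.887, 530, 1700),
    "zstd 1.5.1 --fast=1": (2.437, 600, 2150),
    "zstd 1.5.1 --fast=3": (2.239, 670, 2250),
    "zstd 1.5.1 --fast=4": (2.148, 710, 2300),
}

INPUT_MB = 1000  # hypothetical input size, not taken from the benchmark


def estimate(ratio, comp_speed, decomp_speed, input_mb=INPUT_MB):
    """Return (compressed size in MB, compression seconds, decompression seconds).

    Assumes speeds are expressed in MB of uncompressed data per second.
    """
    return input_mb / ratio, input_mb / comp_speed, input_mb / decomp_speed


for name, row in ROWS.items():
    size_mb, comp_s, decomp_s = estimate(*row)
    print(f"{name:22s} {size_mb:7.1f} MB  {comp_s:5.2f} s  {decomp_s:5.2f} s")
```
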
Zstd can also offer stronger compression ratios at the cost of compression speed.
The speed vs compression trade-off is configurable in small increments.
Decompression speed is preserved and remains roughly the same at all settings,
a property shared by most LZ compression algorithms, such as [zlib] or lzma.

The following tests were run
on a server running Linux Debian (`Linux version 4.14.0-3-amd64`)
with a Core i7-6700K CPU @ 4.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 7.3.0,
on the [Silesia compression corpus].

Compression Speed vs Ratio | Decompression Speed
---------------------------|--------------------
![Compression Speed vs Ratio](doc/images/CSpeed2.png "Compression Speed vs Ratio") | ![Decompression Speed](doc/images/DSpeed3.png "Decompression Speed")

A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graph.
For a larger picture including slow modes, [click on this link](doc/images/DCspeed5.png).


## The case for Small Data compression

Previous charts provide results applicable to typical file and stream scenarios (several MB). Small data comes with different perspectives.

The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms, and the reason is that compression algorithms learn from past data how to compress future data. But at the beginning of a new data set, there is no "past" to build upon.

To solve this situation, Zstd offers a __training mode__, which can be used to tune the algorithm for a selected type of data.
Training Zstandard is achieved by providing it with a few samples (one file per sample). The result of this training is stored in a file called a "dictionary", which must be loaded before compression and decompression.
Using this dictionary, the compression ratio achievable on small data improves dramatically.

The following example uses the `github-users` [sample set](https://github.com/facebook/zstd/releases/tag/v1.1.3), created from the [github public API](https://developer.github.com/v3/users/#get-all-users).
It consists of roughly 10K records weighing about 1KB each.

Compression Ratio | Compression Speed | Decompression Speed
------------------|-------------------|--------------------
![Compression Ratio](doc/images/dict-cr.png "Compression Ratio") | ![Compression Speed](doc/images/dict-cs.png "Compression Speed") | ![Decompression Speed](doc/images/dict-ds.png "Decompression Speed")


These compression gains are achieved while simultaneously providing _faster_ compression and decompression speeds.

Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no _universal dictionary_).
Hence, deploying one dictionary per type of data will provide the greatest benefits.
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.

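The general idea, giving the compressor a "past" before the first byte of a small record, can be illustrated without zstd at all. The sketch below is only a stand-in: it uses the preset-dictionary feature of Python's standard `zlib` module, not Zstandard's training mode, and the record layout, the `make_record` helper and the way the dictionary is assembled (a plain concatenation of a few samples) are all assumptions made for illustration. Zstd's trainer selects dictionary content far more carefully.

```python
import json
import zlib


def make_record(i):
    """Build a small, hypothetical JSON record (not the github-users data)."""
    return json.dumps({
        "login": f"user{i}",
        "id": i,
        "url": f"https://api.example.com/users/user{i}",
        "type": "User",
        "site_admin": False,
    }).encode()


# Stand-in "dictionary": raw bytes from a few samples of the same family.
samples = [make_record(i) for i in range(100)]
dictionary = b"".join(samples[:10])


def compress(data, zdict=None):
    """Compress with zlib, optionally primed with a preset dictionary."""
    c = zlib.compressobj(level=6, zdict=zdict) if zdict else zlib.compressobj(level=6)
    return c.compress(data) + c.flush()


def decompress(blob, zdict=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


record = make_record(12345)  # a record the dictionary has never seen
plain = compress(record)
with_dict = compress(record, dictionary)

print("original bytes:       ", len(record))
print("without dictionary:   ", len(plain))
print("with preset dictionary:", len(with_dict))
```
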
### Dictionary compression How To:

1. Create the dictionary

   `zstd --train FullPathToTrainingSet/* -o dictionaryName`

2. Compress with dictionary

   `zstd -D dictionaryName FILE`

3. Decompress with dictionary

   `zstd -D dictionaryName --decompress FILE.zst`


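When these steps are driven from a script rather than typed by hand, one possible approach is to build the three command lines as argument lists. The sketch below only assembles and prints them; the helper names `dictionary_commands` and `run` are hypothetical, the paths are the placeholders from the steps above, and `run` assumes a `zstd` binary is available on `PATH`.

```python
import glob
import os
import shutil
import subprocess


def dictionary_commands(training_dir, dictionary, target):
    """Return the three how-to commands as argument lists."""
    # The shell expands FullPathToTrainingSet/* itself; here glob does the same job.
    samples = sorted(glob.glob(os.path.join(training_dir, "*")))
    return {
        "train": ["zstd", "--train", *samples, "-o", dictionary],
        "compress": ["zstd", "-D", dictionary, target],
        "decompress": ["zstd", "-D", dictionary, "--decompress", target + ".zst"],
    }


def run(argv):
    """Run one command, raising if the binary is missing or exits with an error."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    subprocess.run(argv, check=True)


# Placeholder paths, for illustration only; nothing is executed here.
commands = dictionary_commands("FullPathToTrainingSet", "dictionaryName", "FILE")
for step, argv in commands.items():
    print(step, "->", " ".join(argv))
```
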
## Build instructions

`make` is the officially maintained build system of this project.
All other build systems are "compatible" and 3rd-party maintained;
they may feature small differences in advanced options.
When your system allows it, prefer using `make` to build `zstd` and `libzstd`.

### Makefile

If your system is compatible with standard `make` (or `gmake`),
invoking `make` in the root directory will generate the `zstd` cli in the root directory.
It will also create `libzstd` in `lib/`.

Other available options include:
- `make install` : create and install zstd cli, library and man pages
- `make check` : create and run `zstd`, test its behavior on local platform

The `Makefile` follows the [GNU Standard Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html),
allowing staged install, standard flags, directory variables and command variables.

For advanced use cases, specialized compilation flags which control binary generation
are documented in [`lib/README.md`](lib/README.md#modular-build) for the `libzstd` library
and in [`programs/README.md`](programs/README.md#compilation-variables) for the `zstd` CLI.

### cmake

A `cmake` project generator is provided within `build/cmake`.
It can generate Makefiles or other build scripts
to create `zstd` binary, and `libzstd` dynamic and static libraries.

By default, `CMAKE_BUILD_TYPE` is set to `Release`.

### Meson

A Meson project is provided within [`build/meson`](build/meson). Follow
build instructions in that directory.

You can also take a look at the [`.travis.yml`](.travis.yml) file for an
example of how Meson is used to build this project.

Note that the default build type is **release**.

### VCPKG
You can build and install zstd using the [vcpkg](https://github.com/Microsoft/vcpkg/) dependency manager:

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install zstd

The zstd port in vcpkg is kept up to date by Microsoft team members and community contributors.
If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.

### Visual Studio (Windows)

In the `build` directory, you will find additional possibilities:
- Projects for Visual Studio 2005, 2008 and 2010.
  + VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
- Automated build scripts for Visual compiler by [@KrzysFR](https://github.com/KrzysFR), in `build/VS_scripts`,
  which will build `zstd` cli and `libzstd` library without any need to open Visual Studio solution.

### Buck

You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.

## Testing

You can run quick local smoke tests by executing the `playTests.sh` script from the `tests` directory.
Two env variables `$ZSTD_BIN` and `$DATAGEN_BIN` are needed for the test script to locate the zstd and datagen binaries.
For information on CI testing, please refer to TESTING.md.

## Status

Zstandard is currently deployed within Facebook. It is used continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is considered safe for production environments.

## License

Zstandard is dual-licensed under [BSD](LICENSE) and [GPLv2](COPYING).

## Contributing

The `dev` branch is the one where all contributions are merged before reaching `release`.
If you plan to propose a patch, please commit into the `dev` branch, or its own feature branch.
Direct commits to `release` are not permitted.
For more information, please read [CONTRIBUTING](CONTRIBUTING.md).