with command line syntax similar to `gzip (1)` and `xz (1)`.
It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed,
from fast modes at > 200 MB/s per core,
to strong modes with excellent compression ratios.
It also features a very fast decoder, with speeds > 500 MB/s per core.
`zstd` command line syntax is generally similar to gzip, with the following differences:
- When compressing a single file, `zstd` displays progress notifications
and result summary by default.
Use `-q` to turn them off.
- `zstd` displays a short help page when command line is an error.
Use `-q` to turn it off.
- `zstd` does not accept input from console,
though it does accept `stdin` when it's not the console.
`zstd` processes each _file_ according to the selected operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output
if it is a terminal.
### Concatenation with .zst files
It is possible to concatenate multiple `.zst` files. `zstd` will decompress
such an agglomerated file as if it were a single `.zst` file.
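
For illustration, a sketch of a valid round-trip (file names are hypothetical):

    `zstd file1 ; zstd file2`

    `cat file1.zst file2.zst > concat.zst`

    `zstd -d concat.zst -o restored`

`restored` then contains the concatenated content of `file1` and `file2`.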
OPTIONS
-------
* `-d`, `--decompress`, `--uncompress`:
Decompress.
* `-t`, `--test`:
Test the integrity of compressed _files_.
This option is equivalent to `--decompress --stdout > /dev/null`;
decompressed data is discarded and checksummed for errors.
No files are created or removed.
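
Example (file name hypothetical):

    `zstd -t backup.tar.zst`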
* `-b#`:
Benchmark file(s) using compression level #
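
Example, benchmarking levels 1 through 19 on a hypothetical file:

    `zstd -b1 -e19 data.csv`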
* `-l`, `--list`:
Display information related to a zstd compressed file, such as size, ratio, and checksum.
Some of these fields may not be available.
This command's output can be augmented with the `-v` modifier.
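
Example (file name hypothetical):

    `zstd -lv backup.tar.zst`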
### Operation modifiers
alias to the option `-9`.
### Interactions with Environment Variables
Employing environment variables to set parameters has security implications.
Therefore, this avenue is intentionally limited.
Only `ZSTD_CLEVEL` and `ZSTD_NBTHREADS` are currently supported.
They set the compression level and number of threads to use during compression, respectively.
`ZSTD_NBTHREADS` can be used to set the number of threads `zstd` will attempt to use during compression.
If the value of `ZSTD_NBTHREADS` is not a valid unsigned integer, it will be ignored with a warning message.
`ZSTD_NBTHREADS` has a default value of (`1`), and is capped at ZSTDMT_NBWORKERS_MAX==200.
`zstd` must be compiled with multithread support for this to have any effect.
They can both be overridden by corresponding command line arguments:
`-#` for compression level and `-T#` for number of compression threads.
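
Example (file name hypothetical):

    `ZSTD_CLEVEL=19 ZSTD_NBTHREADS=4 zstd data.tar`

    `ZSTD_CLEVEL=19 zstd -7 data.tar`

The second command compresses at level 7, since the command line argument takes precedence over the environment variable.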
which greatly improves efficiency on small files and messages.
It's possible to train `zstd` with a set of samples,
the result of which is saved into a file called a `dictionary`.
Then, during compression and decompression, reference the same dictionary,
using command `-D dictionaryFileName`.
Compression of small files similar to the sample set will be greatly improved.
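
A typical workflow, with hypothetical directory and file names:

    `zstd --train -r samples/ -o mydict`

    `zstd -D mydict message.json`

    `zstd -D mydict -d message.json.zst`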
* `--train FILEs`:
Use FILEs as training set to create a dictionary.
The training set should ideally contain a lot of samples (> 100),
and weigh typically 100x the target dictionary size
(for example, ~10 MB for a 100 KB dictionary).
`--train` can be combined with `-r` to indicate a directory rather than listing all the files,
which can be useful to circumvent shell expansion limits.
Since dictionary compression is mostly effective for small files,
the expectation is that the training set will only contain small files.
In the case where some samples happen to be large,
only the first 128 KB of these samples will be used for training.

`--train` supports multithreading if `zstd` is compiled with threading support (default).
Additional advanced parameters can be specified with `--train-fastcover`.
The legacy dictionary builder can be accessed with `--train-legacy`.
The slower cover dictionary builder can be accessed with `--train-cover`.
Default `--train` is equivalent to `--train-fastcover=d=8,steps=4`.

* `-o FILE`:
Dictionary saved into `FILE` (default name: dictionary).
* `--maxdict=#`:
Limit dictionary to specified size (default: 112640 bytes).
As usual, quantities are expressed in bytes by default,
and it's possible to employ suffixes (like `KB` or `MB`)
to specify larger values.
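
Example:

    `zstd --train FILEs --maxdict=64KB -o mydict`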
* `-#`:
Use `#` compression level during training (optional).
Will generate statistics more tuned for selected compression level,
* `-B#`:
Split input files into blocks of size # (default: no split)
* `-M#`, `--memory=#`:
Limit the amount of sample data loaded for training (default: 2 GB).
Note that the default (2 GB) is also the maximum.
This parameter can be useful in situations where the training set size
is not well controlled and could be potentially very large.
Since speed of the training process is directly correlated to
the size of the training sample set,
a smaller sample set leads to faster training.

In situations where the training set is larger than maximum memory,
the CLI will randomly select samples among the available ones,
up to the maximum allowed memory budget.
This is meant to improve dictionary relevance
by mitigating the potential impact of clustering,
such as selecting only files from the beginning of a list
sorted by modification date, or sorted by alphabetical order.
The randomization process is deterministic, so
training of the same list of files with the same parameters
will lead to the creation of the same dictionary.

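For example, to cap the training memory budget at 512 MB (directory name hypothetical):

    `zstd --train -r large_sample_set/ --memory=512MB -o mydict`
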
* `--dictID=#`:
A dictionary ID is a locally unique ID.
The decoder will use this value to verify it is using the right dictionary.
By default, zstd will create a 4-byte random number ID.
It's possible to provide an explicit number ID instead.
It's up to the dictionary manager to not assign twice the same ID to
2 different dictionaries.
Note that short numbers have an advantage:
an ID < 256 will only need 1 byte in the compressed frame header,
and an ID < 65536 will only need 2 bytes.
This compares favorably to the 4-byte default.

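Example (the ID value is arbitrary):

    `zstd --train FILEs --dictID=121 -o mydict`
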
* `--train-cover[=k#,d=#,steps=#,split=#,shrink[=#]]`:
Select parameters for the default dictionary builder algorithm named cover.
If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
Use legacy dictionary builder algorithm with the given dictionary
_selectivity_ (default: 9).
The smaller the _selectivity_ value, the denser the dictionary,
improving its efficiency but reducing its achievable maximum size.
`--train-legacy=s=#` is also accepted.
Examples:
ADVANCED COMPRESSION OPTIONS
----------------------------
### -B#:
Specify the size of each compression job.
This parameter is only available when multi-threading is enabled.
Each compression job is run in parallel, so this value indirectly impacts the number of active threads.
Default job size varies depending on compression level (generally `4 * windowSize`).
`-B#` makes it possible to manually select a custom size.
Note that job size must respect a minimum value which is enforced transparently.
This minimum is either 512 KB or `overlapSize`, whichever is larger.
Different job sizes will lead to non-identical compressed frames.
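
Example, assuming the usual size suffixes are accepted by `-B` (file name hypothetical):

    `zstd -T4 -B8MB -15 bigfile`

This runs 4 worker threads, with each compression job covering 8 MB of input.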
### --zstd[=options]:
`zstd` provides 22 predefined compression levels.