Adding documentation for shrink flag PR #1656

author Tyler Tran <tranqt@fb.com>

Mon, 22 Jul 2019 23:33:22 +0000 (16:33 -0700)

committer Tyler Tran <tranqt@fb.com>

Mon, 22 Jul 2019 23:33:22 +0000 (16:33 -0700)
diff --git a/programs/README.md b/programs/README.md

index d9ef5dd2a62e48f4d45970908dd4d0b83bd43b0d..c3a5590d6de5ea0e737bbad9ba5d5e8f04e97987 100644
--- a/programs/README.md
+++ b/programs/README.md
@@ -157,8 +157,8 @@ Advanced arguments :
  
  Dictionary builder :
  --train ## : create a dictionary from a training set of files
---train-cover[=k=#,d=#,steps=#,split=#] : use the cover algorithm with optional args
---train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#] : use the fastcover algorithm with optional args
+--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
+--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,shrink[=#],accel=#] : use the fastcover algorithm with optional args
  --train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
   -o file : `file` is dictionary name (default: dictionary)
  --maxdict=# : limit dictionary to specified size (default: 112640)
diff --git a/programs/zstd.1 b/programs/zstd.1

index beca9da421d1c7d868c63dad07565cb910aadca1..8ef3edd22b01d7bd56c4391984d73b9711878ea8 100644
--- a/programs/zstd.1
+++ b/programs/zstd.1
@@ -229,8 +229,8 @@ Split input files in blocks of size # (default: no split)
  A dictionary ID is a locally unique ID that a decoder can use to verify it is using the right dictionary\. By default, zstd will create a 4\-bytes random number ID\. It\'s possible to give a precise number instead\. Short numbers have an advantage : an ID < 256 will only need 1 byte in the compressed frame header, and an ID < 65536 will only need 2 bytes\. This compares favorably to 4 bytes default\. However, it\'s up to the dictionary manager to not assign twice the same ID to 2 different dictionaries\.
  .
  .TP
-\fB\-\-train\-cover[=k#,d=#,steps=#,split=#]\fR
-Select parameters for the default dictionary builder algorithm named cover\. If \fId\fR is not specified, then it tries \fId\fR = 6 and \fId\fR = 8\. If \fIk\fR is not specified, then it tries \fIsteps\fR values in the range [50, 2000]\. If \fIsteps\fR is not specified, then the default value of 40 is used\. If \fIsplit\fR is not specified or split <= 0, then the default value of 100 is used\. Requires that \fId\fR <= \fIk\fR\.
+\fB\-\-train\-cover[=k#,d=#,steps=#,split=#,shrink[=#]]\fR
+Select parameters for the default dictionary builder algorithm named cover\. If \fId\fR is not specified, then it tries \fId\fR = 6 and \fId\fR = 8\. If \fIk\fR is not specified, then it tries \fIsteps\fR values in the range [50, 2000]\. If \fIsteps\fR is not specified, then the default value of 40 is used\. If \fIsplit\fR is not specified or split <= 0, then the default value of 100 is used\. Requires that \fId\fR <= \fIk\fR\. If \fIshrink\fR flag is not used, then the default value for \fIshrinkDict\fR of 0 is used\. If \fIshrink\fR is not specified, then the default value for \fIshrinkDictMaxRegression\fR of 1 is used\.
  .
  .IP
  Selects segments of size \fIk\fR with highest score to put in the dictionary\. The score of a segment is computed by the sum of the frequencies of all the subsegments of size \fId\fR\. Generally \fId\fR should be in the range [6, 8], occasionally up to 16, but the algorithm will run faster with d <= \fI8\fR\. Good values for \fIk\fR vary widely based on the input data, but a safe range is [2 * \fId\fR, 2000]\. If \fIsplit\fR is 100, all input samples are used for both training and testing to find optimal \fId\fR and \fIk\fR to build dictionary\. Supports multithreading if \fBzstd\fR is compiled with threading support\.
@@ -254,7 +254,7 @@ Examples:
  \fBzstd \-\-train\-cover=k=50,split=60 FILEs\fR
  .
  .TP
-\fB\-\-train\-fastcover[=k#,d=#,f=#,steps=#,split=#,accel=#]\fR
+\fB\-\-train\-fastcover[=k#,d=#,f=#,steps=#,split=#,shrink[=#],accel=#]\fR
  Same as cover but with extra parameters \fIf\fR and \fIaccel\fR and different default value of split If \fIsplit\fR is not specified, then it tries \fIsplit\fR = 75\. If \fIf\fR is not specified, then it tries \fIf\fR = 20\. Requires that 0 < \fIf\fR < 32\. If \fIaccel\fR is not specified, then it tries \fIaccel\fR = 1\. Requires that 0 < \fIaccel\fR <= 10\. Requires that \fId\fR = 6 or \fId\fR = 8\.
  .
  .IP
diff --git a/programs/zstd.1.md b/programs/zstd.1.md

index 93c6fa40010ed0a220f963d24fcea21a52731074..ca4d64301b84024ef663590e7a4555850d16ff5d 100644
--- a/programs/zstd.1.md
+++ b/programs/zstd.1.md
@@ -244,13 +244,15 @@ Compression of small files similar to the sample set will be greatly improved.
      This compares favorably to 4 bytes default.
      However, it's up to the dictionary manager to not assign twice the same ID to
      2 different dictionaries.
-* `--train-cover[=k#,d=#,steps=#,split=#]`:
+* `--train-cover[=k#,d=#,steps=#,split=#,shrink[=#]]`:
      Select parameters for the default dictionary builder algorithm named cover.
      If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
      If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
      If _steps_ is not specified, then the default value of 40 is used.
      If _split_ is not specified or split <= 0, then the default value of 100 is used.
      Requires that _d_ <= _k_.
+    If _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
+    If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.
  
      Selects segments of size _k_ with highest score to put in the dictionary.
      The score of a segment is computed by the sum of the frequencies of all the
@@ -262,6 +264,9 @@ Compression of small files similar to the sample set will be greatly improved.
      If _split_ is 100, all input samples are used for both training and testing
      to find optimal _d_ and _k_ to build dictionary.
      Supports multithreading if `zstd` is compiled with threading support.
+    Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
+    in size until compression ratio of the truncated dictionary is at most
+    _shrinkDictMaxRegression_% worse than the compression ratio of the largest dictionary.
  
      Examples:
  
@@ -275,6 +280,10 @@ Compression of small files similar to the sample set will be greatly improved.
  
      `zstd --train-cover=k=50,split=60 FILEs`
  
+    `zstd --train-cover=shrink FILEs`
+
+    `zstd --train-cover=shrink=2 FILEs`
+
  * `--train-fastcover[=k#,d=#,f=#,steps=#,split=#,accel=#]`:
      Same as cover but with extra parameters _f_ and _accel_ and different default value of split
      If _split_ is not specified, then it tries _split_ = 75.
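
Reviewer note: the shrink behaviour documented in the zstd.1.md hunk above (start from a truncated dictionary of minimum size, double it until it is at most _shrinkDictMaxRegression_% worse than the largest dictionary) can be sketched roughly as follows. This is an illustrative sketch only, not the zstd implementation. The function name, its parameters, and the `compressed_size_with` callback are hypothetical, and "at most N% worse" is assumed here to mean "total compressed size at most N% larger".

```python
def select_shrunk_size(full_size, min_size, max_regression_pct, compressed_size_with):
    """Pick the smallest tried dictionary size that is 'good enough'.

    compressed_size_with(size) is a hypothetical callback returning the total
    compressed size of the test samples when the dictionary is truncated to
    `size` bytes. Smaller results are better.
    """
    # Baseline: the result obtained with the largest (untruncated) dictionary.
    largest = compressed_size_with(full_size)
    # Assumption: "N% worse" means at most N% more compressed bytes.
    allowed = largest * (1 + max_regression_pct / 100.0)

    size = min_size
    while size < full_size:
        if compressed_size_with(size) <= allowed:
            return size  # first (smallest) size within the tolerance
        size *= 2  # double the truncated dictionary and try again
    # No truncated size was acceptable, so keep the full dictionary.
    return full_size


if __name__ == "__main__":
    # Toy cost model (made up for illustration): larger dictionaries help,
    # with diminishing returns.
    toy = lambda size: 1000 + 100000 // size
    print(select_shrunk_size(4096, 256, 5, toy))
```

Under this reading, `--train-cover=shrink` corresponds to a tolerance of 1 and `--train-cover=shrink=2` to a tolerance of 2; a larger tolerance lets the loop stop at a smaller dictionary.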