Updates `ZSTD_RowFindBestMatch` comment (#3947)

author Yonatan Komornik <11005061+yoniko@users.noreply.github.com>

Tue, 12 Mar 2024 22:10:07 +0000 (15:10 -0700)

committer GitHub <noreply@github.com>

Tue, 12 Mar 2024 22:10:07 +0000 (15:10 -0700)
diff --git a/lib/compress/zstd_lazy.c b/lib/compress/zstd_lazy.c

index 3aba83c6fc38ada499909b92f929d9293844f0f1..67dd55fdb8060394698e2576e90154c328364e2a 100644 (file)
--- a/lib/compress/zstd_lazy.c
+++ b/lib/compress/zstd_lazy.c
@@ -1123,18 +1123,18 @@ ZSTD_row_getMatchMask(const BYTE* const tagRow, const BYTE tag, const U32 headGr
  
  /* The high-level approach of the SIMD row based match finder is as follows:
   * - Figure out where to insert the new entry:
- *      - Generate a hash from a byte along with an additional 1-byte "short hash". The additional byte is our "tag"
- *      - The hashTable is effectively split into groups or "rows" of 16 or 32 entries of U32, and the hash determines
+ *      - Generate a hash for the current input position and split it into one byte of tag and `rowHashLog` bits of index.
+ *           - The hash is salted by a value that changes on every context reset, so when the same table is used
+ *             we will avoid collisions that would otherwise slow us down by introducing phantom matches.
+ *      - The hashTable is effectively split into groups or "rows" of 15 or 31 entries of U32, and the index determines
   *        which row to insert into.
- *      - Determine the correct position within the row to insert the entry into. Each row of 16 or 32 can
- *        be considered as a circular buffer with a "head" index that resides in the tagTable.
- *      - Also insert the "tag" into the equivalent row and position in the tagTable.
- *          - Note: The tagTable has 17 or 33 1-byte entries per row, due to 16 or 32 tags, and 1 "head" entry.
- *                  The 17 or 33 entry rows are spaced out to occur every 32 or 64 bytes, respectively,
- *                  for alignment/performance reasons, leaving some bytes unused.
- * - Use SIMD to efficiently compare the tags in the tagTable to the 1-byte "short hash" and
+ *      - Determine the correct position within the row to insert the entry into. Each row of 15 or 31 can
+ *        be considered as a circular buffer with a "head" index that resides in the tagTable (overall 16 or 32 bytes
+ *        per row).
+ * - Use SIMD to efficiently compare the tags in the tagTable to the 1-byte tag calculated for the position and
   *   generate a bitfield that we can cycle through to check the collisions in the hash table.
   * - Pick the longest match.
+ * - Insert the tag into the equivalent row and position in the tagTable.
   */
  FORCE_INLINE_TEMPLATE
  ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
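
The layout described in the updated comment can be illustrated with a small scalar sketch. This is not zstd's code: every `sketch_`/`Sketch` name is hypothetical, the XOR salting and the choice of which hash bits become the tag versus the row index are assumptions made for illustration, and a plain loop stands in for the SIMD tag comparison. It only shows a 16-byte tag row (1 head byte plus 15 tag slots), a hash split into a 1-byte tag and `rowHashLog` bits of index, and a match bitfield built from the tags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants: one tag row is 16 bytes, byte 0 holds the "head"
 * index and bytes 1..15 hold the 15 one-byte tags. */
#define SKETCH_TAG_BITS    8u
#define SKETCH_ROW_ENTRIES 16u

typedef struct {
    uint32_t row;  /* which row of the hash table / tag table to use */
    uint8_t  tag;  /* one-byte tag stored in the tag table */
} SketchHashSplit;

/* Split a salted hash into a tag and a row index.
 * Assumption: the salt is mixed in with XOR, the tag is the low 8 bits and
 * the row index is the next rowHashLog bits. */
static SketchHashSplit sketch_split_hash(uint32_t hash, uint32_t salt, unsigned rowHashLog)
{
    SketchHashSplit s;
    uint32_t const salted = hash ^ salt;
    assert(rowHashLog >= 1 && rowHashLog <= 24);
    s.tag = (uint8_t)(salted & 0xFFu);
    s.row = (salted >> SKETCH_TAG_BITS) & ((1u << rowHashLog) - 1u);
    return s;
}

/* Scalar stand-in for the SIMD compare: bit i of the result is set when
 * tag slot i equals `tag`. Slot 0 is the head byte and is skipped. */
static uint32_t sketch_match_mask(const uint8_t tagRow[SKETCH_ROW_ENTRIES], uint8_t tag)
{
    uint32_t mask = 0;
    unsigned i;
    for (i = 1; i < SKETCH_ROW_ENTRIES; i++) {
        if (tagRow[i] == tag) mask |= 1u << i;
    }
    return mask;
}

/* Treat slots 1..15 as a circular buffer: advance the head stored in byte 0,
 * write the tag there, and return the slot that was used.
 * Assumption: the head moves forward and wraps from 15 back to 1. */
static unsigned sketch_insert_tag(uint8_t tagRow[SKETCH_ROW_ENTRIES], uint8_t tag)
{
    unsigned const next = (tagRow[0] % (SKETCH_ROW_ENTRIES - 1u)) + 1u;
    tagRow[0] = (uint8_t)next;
    tagRow[next] = tag;
    return next;
}
```

Under these assumptions, a hash of `0x12345678` with a zero salt and `rowHashLog` of 4 splits into tag `0x78` and row 6. Each set bit of the mask then names a candidate slot whose U32 entry in the hash table would still have to be checked against the input, since equal tags do not guarantee an actual match.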