Doc: improve explanation of GiST compress/decompress methods.

author Tom Lane <tgl@sss.pgh.pa.us>

Tue, 31 Mar 2026 15:23:20 +0000 (11:23 -0400)

committer Tom Lane <tgl@sss.pgh.pa.us>

Tue, 31 Mar 2026 15:23:26 +0000 (11:23 -0400)
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml

index 5c0a0c48bab59b9e2c45812d02bfca2696d4b26e..3f1a01f381f9d4bca1e7e04a154db291c57290a6 100644 (file)
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -273,14 +273,10 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
     index will depend on the <function>penalty</function> and <function>picksplit</function>
     methods.
     Two optional methods are <function>compress</function> and
-   <function>decompress</function>, which allow an index to have internal tree data of
-   a different type than the data it indexes. The leaves are to be of the
-   indexed data type, while the other tree nodes can be of any C struct (but
-   you still have to follow <productname>PostgreSQL</productname> data type rules here,
-   see about <literal>varlena</literal> for variable sized data). If the tree's
-   internal data type exists at the SQL level, the <literal>STORAGE</literal> option
-   of the <command>CREATE OPERATOR CLASS</command> command can be used.
-   The optional eighth method is <function>distance</function>, which is needed
+   <function>decompress</function>, which allow an index to store keys that
+   are of a different type than the data it indexes, or are a compressed
+   representation of that type.
+   The optional eighth method <function>distance</function> is needed
     if the operator class wishes to support ordered scans (nearest-neighbor
     searches). The optional ninth method <function>fetch</function> is needed if the
     operator class wishes to support index-only scans, except when the
@@ -294,6 +290,7 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
     <filename>src/include/access/cmptype.h</filename>) into strategy numbers
     used by the operator class.  This lets the core code look up operators for
     temporal constraint indexes.
+   All these methods are described in more detail below.
   </para>
  
   <variablelist>
@@ -484,6 +481,24 @@ my_union(PG_FUNCTION_ARGS)
         in the index without modification.
        </para>
  
+      <para>
+       Use the <literal>STORAGE</literal> option of the <command>CREATE
+       OPERATOR CLASS</command> command to define the data type that is
+       stored in the index, if it is different from the data type being
+       indexed.  Be aware however that the <literal>STORAGE</literal> data
+       type is only used to define the physical properties of the index
+       entries (their <replaceable>typlen</replaceable>,
+       <replaceable>typbyval</replaceable>,
+       and <replaceable>typalign</replaceable> attributes).  What is
+       actually in the index datums is under the control of the
+       <function>compress</function> and <function>decompress</function>
+       methods, so long as the stored datums match those properties.
+       It is allowed for <function>compress</function> to produce different
+       representations for leaf keys than for keys on higher-level index
+       pages, so long as both representations match
+       the <literal>STORAGE</literal> data type.
+      </para>
+
        <para>
          The <acronym>SQL</acronym> declaration of the function must look like this:
  
diff --git a/src/backend/access/gist/README b/src/backend/access/gist/README

index 76e0e11f2283ad331b851ae793d5cfab4fc4e551..75445b074555390b630638030afb0ab4022bfecb 100644 (file)
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -10,9 +10,13 @@ GiST stands for Generalized Search Tree. It was introduced in the seminal paper
  Jeffrey F. Naughton, Avi Pfeffer:
  
      http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
+
+Concurrency support was described in "Concurrency and Recovery in Generalized
+Search Trees", 1997, Marcel Kornacker, C. Mohan, Joseph M. Hellerstein:
+
      https://dsf.berkeley.edu/papers/sigmod97-gist.pdf
  
-and implemented by J. Hellerstein and P. Aoki in an early version of
+GiST was implemented by J. Hellerstein and P. Aoki in an early version of
  PostgreSQL (more details are available from The GiST Indexing Project
  at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
  project it had a limited number of features and was in rare use.
@@ -55,6 +59,9 @@ The original algorithms were modified in several ways:
    it is now a single-pass algorithm.
  * Since the papers were theoretical, some details were omitted and we
    had to find out ourself how to solve some specific problems.
+* The 1997 paper above (but not the 1995 one) states that leaf pages should
+  store the original key.  While that can be done in PostgreSQL, it is
+  also possible to use a compressed representation in leaf pages.
  
  Because of the above reasons, we have revised the interaction of GiST
  core and PostgreSQL WAL system. Moreover, we encountered (and solved)
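
Illustration (not part of the commit): the new gist.sgml paragraph says that compress may produce different representations for leaf keys and for keys on higher-level pages, as long as both match the physical properties of the STORAGE type. The following is a minimal, self-contained C sketch of that idea only. It does not use the PostgreSQL API; ToyKey and every toy_* function are hypothetical names invented for this example, and the fixed 8-byte struct merely stands in for a STORAGE type with a fixed typlen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stored key.  It stands in for the STORAGE data type:
 * every key kept in the "index" must have this physical layout.
 */
typedef struct ToyKey
{
    int32_t lo;
    int32_t hi;
} ToyKey;

/* Leaf and internal forms share one fixed size (analogous to typlen). */
_Static_assert(sizeof(ToyKey) == 8, "stored key must be fixed-size");

/*
 * Leaf form: the indexed int32 value is stored as a degenerate range
 * with lo == hi.  This plays the role of compress for a leaf entry.
 */
ToyKey
toy_compress_leaf(int32_t value)
{
    ToyKey k = {value, value};

    return k;
}

/*
 * Internal form: a bounding range covering both inputs.  The meaning
 * differs from the leaf form, but the physical layout is the same.
 */
ToyKey
toy_union(ToyKey a, ToyKey b)
{
    ToyKey k;

    assert(a.lo <= a.hi && b.lo <= b.hi);
    k.lo = (a.lo < b.lo) ? a.lo : b.lo;
    k.hi = (a.hi > b.hi) ? a.hi : b.hi;
    return k;
}

/* Recover the indexed value from a leaf-form key (decompress analogue). */
int32_t
toy_decompress_leaf(ToyKey k)
{
    assert(k.lo == k.hi);
    return k.lo;
}

/* Could a subtree or leaf described by k contain the query value? */
bool
toy_consistent(ToyKey k, int32_t query)
{
    return k.lo <= query && query <= k.hi;
}
```

In this sketch the indexed type (int32_t) differs from the stored type (ToyKey), and a leaf key and an internal key are interpreted differently while occupying the same eight bytes. A real operator class would implement this through the compress, decompress, union and consistent support functions.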