ITS#9270 Additional information on indexing

author Ondřej Kuzník <ondra@mistotebe.net>

Tue, 2 Mar 2021 16:22:39 +0000 (16:22 +0000)

committer Quanah Gibson-Mount <quanah@openldap.org>

Tue, 2 Mar 2021 18:15:58 +0000 (18:15 +0000)
diff --git a/doc/guide/admin/tuning.sdf b/doc/guide/admin/tuning.sdf

index 9c8fb46b51dc6b79cf14d9e91f05d68afb62ab52..ec47eb09631bcdfe809caa43660016f7f0907da9 100644 (file)
--- a/doc/guide/admin/tuning.sdf
+++ b/doc/guide/admin/tuning.sdf
@@ -64,9 +64,20 @@ If the filter term has not been indexed, then the search must read every single
   entry in the target scope and test to see if each entry matches the filter. 
  Obviously indexing can save a lot of work when it's used correctly.
  
+In back-mdb, indexes can only track a certain number of entries per key (by
+default that number is 2^16 = 65536). If more entries' values hash to this
+key, some/all of them will have to be represented by a range of candidates,
+making the index less useful over time as deletions cannot usually be tracked
+accurately.
+
  H3: What to index
  
-You should create indices to match the actual filter terms used in
+As a general rule, to make any use of indexes, you must set up an equality
+index on objectClass:
+
+>        index objectClass eq
+
+Then you should create indices to match the actual filter terms used in
  search queries. 
  
  >        index cn,sn,givenname,mail eq
@@ -86,7 +97,8 @@ all of those entries are going to be read anyway, because they are valid
  members of the result set. In a subtree where 100% of the
  entries are going to contain the same attributes, the presence index does
  absolutely NOTHING to benefit the search, because 100% of the entries match
-that presence filter. 
+that presence filter. As an example, setting a presence index on objectClass
+provides no benefit since it is present on every entry.
  
  So the resource cost of generating the index is a
  complete waste of CPU time, disk, and memory. Don't do it unless you know
@@ -101,6 +113,32 @@ not be done, it's just wasted overhead.
  See the {{Logging}} section below on what to watch out for if you have a frequently searched
  for attribute that is unindexed.
  
+H3: Equality indexing
+
+Similarly to presence indexes, equality indexes are most useful if the
+values searched for are uncommon. Most OpenLDAP indexes work by hashing
+the normalised value and using the hash as the key. Hashing behaviour
+depends on the matching rule syntax; some matching rules also implement
+indexers that help speed up inequality (lower than, ...) queries.
+
+Check the documentation and other parts of this guide to see whether some
+indexes are mandatory: for example, replication expects certain operational
+attributes to be indexed, and the same applies if ACLs rely on filters.
+
+Approximate indexes are usually identical to equality indexes unless
+a matching rule explicitly implements them. As of OpenLDAP 2.5, only
+directoryStringApproxMatch and IA5StringApproxMatch matchers
+and indexers are implemented, currently using soundex or metaphone, with
+metaphone being the default.
+
+H3: Substring indexing
+
+Substring indexes work by splitting the value into short chunks and then
+indexing those in a similar way to how an equality index does. The storage
+space needed to store all of this data is comparable to the amount of data
+being indexed, which makes these indexes extremely heavyweight in most
+scenarios.
+
  
  H2: Logging
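
Illustration (not part of the commit): the back-mdb paragraph added above says that a key which collects too many entries falls back to a range of candidates, and that deletions can then no longer be tracked accurately. The Python sketch below is a toy model of that behaviour under stated assumptions. The class `IndexSlot`, its methods, and the tiny limit of 4 are hypothetical and chosen for readability; this is not back-mdb's actual data structure.

```python
class IndexSlot:
    """Toy model of one index key: an exact ID set that degrades to a range."""

    def __init__(self, limit=2 ** 16):
        self.limit = limit   # maximum number of entry IDs tracked exactly
        self.ids = set()     # exact candidates while under the limit
        self.range = None    # (lowest, highest) once the limit is exceeded

    def add(self, entry_id):
        if self.range is not None:
            # Already a range: only the bounds can be widened.
            lo, hi = self.range
            self.range = (min(lo, entry_id), max(hi, entry_id))
            return
        self.ids.add(entry_id)
        if len(self.ids) > self.limit:
            # Too many entries: keep the bounds, forget the members.
            self.range = (min(self.ids), max(self.ids))
            self.ids = set()

    def delete(self, entry_id):
        if self.range is None:
            self.ids.discard(entry_id)
        # In range mode membership is unknown, so this sketch removes nothing.

    def candidates(self):
        if self.range is None:
            return sorted(self.ids)
        lo, hi = self.range
        return list(range(lo, hi + 1))


slot = IndexSlot(limit=4)  # hypothetical tiny limit so the effect is visible
for entry_id in (10, 20, 30, 40, 50):
    slot.add(entry_id)
slot.delete(30)
```

In this model the fifth insertion turns the slot into the range 10 to 50, so a search on that key must consider every ID in between, and the deleted entry 30 remains a candidate.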
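
Illustration (not part of the commit): the substring indexing paragraph describes splitting each value into short chunks and indexing every chunk. The Python sketch below gives a rough feel for why that costs far more than a single equality key. It is a simplified model only: the function `substring_chunks` and its parameters `min_len`, `max_len` and `window` are hypothetical, no hashing or normalisation is performed, and the real chunk lengths are governed by the `index_substr_*` settings described in slapd.conf(5).

```python
def substring_chunks(value, min_len=2, max_len=4, window=4):
    """Return an illustrative set of substring index keys for one value."""
    keys = set()
    # Prefixes ("initial") and suffixes ("final") of each allowed length.
    for n in range(min_len, min(max_len, len(value)) + 1):
        keys.add(("initial", value[:n]))
        keys.add(("final", value[-n:]))
    # Fixed-length windows ("any") taken at every position in the value.
    for start in range(0, len(value) - window + 1):
        keys.add(("any", value[start:start + window]))
    return keys


keys = substring_chunks("administrator")
short_keys = substring_chunks("ab")
```

Under these assumptions a 13-character value produces 16 keys where an equality index would store one, which is the sense in which the storage needed grows with the amount of data being indexed.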