From 1bf3bf2fb78b32afa1ef209272cec25049f92c33 Mon Sep 17 00:00:00 2001
From: Eric Wong
Date: Mon, 3 Nov 2025 02:50:11 +0000
Subject: [PATCH] doc/cindex: flesh out documentation of supported features

--join is the big and most useful one with current cindex
functionality, so it should be documented.
---
 Documentation/public-inbox-cindex.pod | 144 +++++++++++++++++++++++---
 Documentation/public-inbox-config.pod |  14 +++
 2 files changed, 145 insertions(+), 13 deletions(-)

diff --git a/Documentation/public-inbox-cindex.pod b/Documentation/public-inbox-cindex.pod
index f7d7de6e5..9619ea252 100644
--- a/Documentation/public-inbox-cindex.pod
+++ b/Documentation/public-inbox-cindex.pod
@@ -1,24 +1,35 @@
 =head1 NAME
 
-public-inbox-cindex - create and update code repository search indices
+public-inbox-cindex - create and update coderepo search indices
 
 =head1 SYNOPSIS
 
-public-inbox-cindex [OPTIONS] -g GIT_DIR [-g GIT_DIR]...
+public-inbox-cindex -d EXTDIR [OPTIONS] --join
+
+public-inbox-cindex -d EXTDIR [OPTIONS] --update
 
-public-inbox-cindex [OPTIONS] --update
+public-inbox-cindex [OPTIONS] -g GIT_DIR [-g GIT_DIR]...
 
 =head1 DESCRIPTION
 
 public-inbox-cindex creates and updates the Xapian search index for
-git code repository (C) search. It can also associate
-(fuzzy join) coderepos with Xapian-indexed inboxes. It only indexes
-commit messages and diffs as they would show up in an email. It
-does not index the contents of blobs directly.
+git code repository (C) search. It can associate
+(fuzzy join) coderepos with Xapian-indexed inboxes to enable blob
+reconstruction in the C<$INBOX_URL/$BLOB_OID/s/>) WWW endpoint.
+It only indexes commit messages and diffs as they would show up in
+an email. It does not currently index the contents of blobs directly.
 
 Like inbox indices, coderepo indices can either be internal or
 external to a coderepo. Either way, they're both created and updated through
-public-inbox-cindex.
+public-inbox-cindex. External indices via L are recommended
+for sites hosting multiple coderepos with common history.
+
+Currently, public-inbox-cindex exists mainly to save WWW admins the
+trouble of associating hundreds/thousands of inboxes and coderepos
+with each other via C and C
+directives in L). Eventually, it will
+allow L functionality to be ported to the WWW UI and
+allow searching commits in coderepos directly via WWW interface.
 
 Once the initial indices are created by public-inbox-cindex,
 the L switch will incrementally update them.
@@ -31,10 +42,20 @@ the L switch will incrementally update them.
 
 Use the given directory as an external index. External indices are
 generally recommended to internal indices since they do not need
-write access to any code repositories themselves. They are highly
-recommended when many repositories share a common history or if
+write access to any coderepos themselves. They are highly
+recommended when many coderepos share a common history or if
 there is an M:N relationship between inboxes and coderepos.
 
+=item -g GIT_DIR
+
+=item --git-dir=GIT_DIR
+
+When not using L, the cindex will be written to
+C<$GIT_DIR/public-inbox-cindex>. May also be combined with
+L to index a single (or subset of) git coderepos.
+
+May be specified multiple times.
+
 =item -j JOBS
 
 =item --jobs=JOBS
@@ -46,6 +67,22 @@ shards will be created.
 
 Default: the number of existing Xapian shards
 
+=item --join
+
+Attempt a fuzzy association of all inboxes and coderepos to
+enhance the WWW interface. See L below.
+
+A C++ compiler, L and Xapian development files
+(e.g. libxapian-dev or xapian-core-devel) will make this operation
+orders of magnitude faster.
+
+This operation should be rerun whenever inboxes or coderepos are
+added or removed, or when one project merges with another.
+
+Web servers running PublicInbox::WWW (e.g. L or
+L) currently need to be restarted to pick up
+new (or expire old) associations.
+
 =item --reindex
 
 Forces a re-index of all commits. This can be used for in-place
@@ -55,12 +92,14 @@ upgrades and bugfixes while read-only processes are utilizing the index.
 
 =item -u
 
-Incrementally index all previously-indexed coderepos.
+Incrementally index all previously-indexed coderepos without
+checking for new ones.
 
 =item --prune
 
 Unindexes commits which are no longer accessible via git.
-Use this after L (or L).
+Use this after L (or L), or if coderepos
+are removed.
 
 =item --no-fsync
 
@@ -73,6 +112,26 @@ Use this after L (or L).
 These affect the coderepo index the same way they affect
 inbox indices. See L.
 
+A smaller value of C<--max-size> (e.g. C<--max-size=10m>)
+is highly recommended to limit memory usage for gigantic
+commits.
+
+=item --project-list=FILE
+
+The same project list used by cgit, gitweb, grokmirror and
+L. Requires L.
+
+=item --project-root=DIRECTORY
+
+=item -r DIRECTORY
+
+Specifies the top-level directory for projects in L.
+
+=item --exclude (GLOB|PATH)
+
+Exclude given coderepos when using L.
+May be specified multiple times.
+
 =back
 
 =head1 FILES
@@ -80,7 +139,7 @@ inbox indices. See L.
 For internal indices, the Xapian DB is stored in
 C<$GIT_DIR/public-inbox-cindex>.
 
-External indices are stored wherever L EXTDIR points.
+External indices are stored wherever L points.
 
 =head1 CONFIGURATION
 
@@ -93,6 +152,25 @@ External indices are stored wherever L EXTDIR points.
 These configuration knobs affect the coderepo index the same way
 they affect inbox indices. See L.
 
+=item cindex.$NAME.topdir
+
+Directory where an external coderepo index was created (with
+L).
+
+C<$NAME> is the URL prefix (without leading/trailing slashes)
+for all coderepos in the WWW interface, an empty string (C<"">)
+is allowed when repos are stored at the toplevel.
+
+Combined with L, this allows admins
+to avoid specifying a separate L entry
+for every coderepo indexed.
+
+=item cindex.$NAME.localprefix
+
+The local directory name prefix of all coderepos to be displayed
+in the WWW interface. This is typically a subdirectory in
+L
+
 =back
 
 =head1 ENVIRONMENT
@@ -118,6 +196,46 @@ Use C instead.
 Occasionally, public-inbox will update its schema version and
 require a full reindex by running this command with L.
 
+=head1 EXAMPLE
+
+Assuming you have an "all" extindex for your inboxes and store
+coderepos in C, the contents of your C
+file should include something like this:
+
+	[extindex "all"]
+		topdir = /path/to/eidx-all
+	[cindex "pub"]
+		localprefix = /path/to/repos/pub
+		topdir = /path/to/cidx-all
+
+Assuming you're using a cgit/gitweb/grokmirror/L
+compatible L<--project-list|public-inbox-clone(1)/--project-list=FILE>,
+you can periodically use L when new coderepos are added,
+deleted, or when one project merges with another:
+
+	public-inbox-cindex -d /path/to/cidx-all --max-size=10m \
+		-L medium --join \
+		--exclude='**/uninteresting-a.git' \
+		--exclude='**/uninteresting-b.git' \
+		--project-list=/path/to/repos/projects.list \
+		--project-root=/path/to/repos
+
+After this (and restarting the webserver), a project in
+C should be visible via
+C from a WWW instance and the
+bottom of the project.git HTML page should display a list of
+C. When viewing patch emails in any
+associated inbox, diff hunk headers (those C<@@ -123,4 +123,4 @@>
+lines) will link to C<$INBOX_URL/$BLOB_OID/s/> URLs which attempt
+to display the blob (reconstructing blobs from patch emails if
+necessary).
+
+Since C<--join> is expensive and (coderepo|inbox) additions/removals
+are rare, incrementally updating the index can be done more quickly
+with L:
+
+	public-inbox-cindex -d /path/to/cidx-all --max-size=10m --update
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L
diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 82b355710..00f260faf 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -174,6 +174,8 @@ May be specified more than once for M:N mapping of code repos to inboxes.
 If enabled, diff hunk headers in patch emails will link to the
 line numbers of blobs.
 
+Unnecessary with L
+
 Default: none
 
 =item publicinbox..altid
@@ -370,16 +372,26 @@ respectively
 
 Default: none
 
+=item cindex.$NAME.topdir
+
+=item cindex.$NAME.localprefix
+
+See L
+
 =item coderepo..dir
 
 The path to a git repository for "publicinbox..coderepo"
 
 Absolute pathnames longer than 244 bytes cannot be indexed with L
 
+Unnecessary with L
+
 =item coderepo..cgitUrl
 
 The URL of the cgit instance associated with the coderepo.
 
+Unnecessary with L
+
 Default: none
 
 =item coderepo.snapshots
@@ -555,6 +567,8 @@ Identical to LnameE.coderepo>, but for external indices.
 Code repos may be freely associated with any number of public
 inboxes and external indices.
 
+Unnecessary with L
+
 =back
 
 =head2 NAMED LIMITER (PSGI)
-- 
2.47.3