Eric Wong [Thu, 24 Aug 2023 01:22:31 +0000 (01:22 +0000)]
cindex: read-only association dump
This will eventually allow associating coderepos with inboxes
and vice-versa; avoiding the need for manual configuration via
tedious publicinbox.*.coderepo directives.
I'm not sure how this should be stored for WWW, yet, but it's
required since it takes about 8 hours to do this fully across
lore and git.kernel.org.
Eric Wong [Sat, 19 Aug 2023 08:30:51 +0000 (08:30 +0000)]
isearch: avoid hex string for Xapian sortable_serialise
While a string representing an integer in hex is fine for DBI and
SQLite, Xapian's sortable_serialise requires a Perl integer value.
So just retrieve the last Xapian DB document ID in this rare
code path because we can't use 64-bit integer literals in some
32-bit Perl builds (e.g. OpenBSD on i386).
Fixes: be2a0a353d60 ("isearch: support 64-bit article numbers for SQLite query")
Eric Wong [Thu, 17 Aug 2023 07:23:10 +0000 (07:23 +0000)]
t/nntp.t: attempt to quiet spurious uninitialized warnings
When running via t/run.perl ("make check-run") to reduce test
startup time, t/nntp.t occasionally hits uninitialized variable
warnings in the quote_str sub. I can't reproduce these
reliably, but scoping subs in tests reduces the chance of
conflict when we reuse interpreters.
Eric Wong [Wed, 16 Aug 2023 08:07:12 +0000 (08:07 +0000)]
search: all_terms: remove needless prefix check
The ->allterms_{begin,end} methods of Xapian::Database already
filter on prefix natively. Thus there's no need to do the
filtering ourselves (unlike per-document ->termlist_{begin,end}).
Eric Wong [Thu, 27 Jul 2023 21:18:55 +0000 (21:18 +0000)]
clone: allow running without DBI / DBD::SQLite
Due to historic reasons, LeiQuery.pm gets loaded with LEI.pm and
-clone depends on LEI. So delay loading any DBI-dependent
modules until querying is actually required.
Eric Wong [Thu, 27 Jul 2023 21:18:54 +0000 (21:18 +0000)]
Makefile.pl: *.cols: account for non-UTF-8-aware awk
When checking line length limits, the `length()' function of
mawk doesn't count non-ASCII characters properly in UTF-8
locales. Force the man(1) output to use C locale and use normal
`-' instead of multi-byte dash characters.
Eric Wong [Fri, 14 Jul 2023 09:28:47 +0000 (09:28 +0000)]
tests: t/run.perl: fix invocations with <10 tests
We must account for the maximum index of an array to avoid
filling unused slots with `undef' from out-of-bounds reads.
This is needed to avoid undefined entry errors in workers when
fewer than 10 tests are run. We'll also silence the message
when a single test is run.
While I was diagnosing this, I also noticed a small
simplification and optimization in our generation of $todo_buf
since I initially thought that was the cause of undefined
entry errors in the $todo arrayref.
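A minimal sketch of the out-of-bounds read described above, not the
actual t/run.perl code; @tests and $chunk are hypothetical names.
Slicing past the end of an array yields `undef' entries, so the upper
index is clamped to the last valid index:

```perl
use strict;
use warnings;
use List::Util qw(min);

my @tests = qw(t/a.t t/b.t t/c.t); # fewer entries than the chunk size
my $chunk = 10;

# Naive slice: indices 3..9 are out of bounds and read back as undef
my @naive = @tests[0 .. $chunk - 1];

# Clamped slice: never index past the last element
my @clamped = @tests[0 .. min($chunk - 1, $#tests)];

printf "naive=%d clamped=%d\n", scalar(@naive), scalar(@clamped);
```

The naive slice has 10 entries (7 of them undef) while the clamped one
only has the 3 real tests.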
Eric Wong [Thu, 13 Jul 2023 05:39:17 +0000 (05:39 +0000)]
t/imapd: workaround a Perl 5.36.0 readline regression
Buffered readline (and read) ops under Perl 5.36.0 fail to read
new data after writes are made by other file handles (or
processes).
To fix and improve our test, introduce a new, (currently)
test-only TailNotify class to use inotify or kevent if available
to work around it while avoiding infinite polling loops. Further
refinements to these test APIs may follow since we use the same
pattern for testing daemons in many places.
This also fixes the TEST_KILL_IMAPD condition in t/imapd.t under
GNU/Linux; AFAIK that test was never reliable under FreeBSD.
Eric Wong [Thu, 13 Jul 2023 05:40:20 +0000 (05:40 +0000)]
doc: HACKING: drop bit about Debian 9.x (stretch)
It's oldoldstable, by now; just refer to Debian stable as
the primary but keep LTS distros in mind because stuff like
CentOS 7.x needs to remain supported.
Eric Wong [Tue, 11 Jul 2023 10:29:28 +0000 (10:29 +0000)]
Makefile.PL: depend on IO::Poll in case distros split it out
IO::Poll is part of the Perl standard library, but there's
always a chance distros will make it part of another package
since it's not portable to non-POSIX-like OSes.
Eric Wong [Wed, 21 Jun 2023 10:16:57 +0000 (10:16 +0000)]
t/solver_git: drop needless `use' and Plack deps
`lei (blob|rediff)' works without Plack installed, so don't put
a dependency on Plack or anything related to HTTP aside from
the URI module which we use everywhere. This only enables testing
the solver component on systems without Plack (as the actual lei
functionality has always worked without Plack).
Eric Wong [Fri, 16 Jun 2023 23:13:01 +0000 (23:13 +0000)]
www: use correct threadid for per-thread search
For individual public-inboxes relying on extindex for per-inbox
search, we must use the threadid from the extindex over.sqlite3
rather than the per-inbox over.sqlite3 file.
Eric Wong [Thu, 15 Jun 2023 09:50:53 +0000 (09:50 +0000)]
lei: make --dedupe=content always account for Message-IDs
The content dedupe logic was originally designed for v2 public
inboxes as a fallback for when the importer sees identical
Message-IDs. Thus it did not account for Message-ID(s) in
the message itself.
This change doesn't affect saved searches (the default when
writing to a pathname or IMAP). It affects --no-save, and
outputs to stdout (even if stdout is redirected to a file).
Prior to this change, lei reused the v2 logic as-is without
accounting for Message-IDs anywhere with `--dedupe=content'
(the default). This could cause messages to be skipped when
the content matches despite Message-IDs being different.
So with this change, `lei q --dedupe=content' will hash the
Message-ID(s) in the message to ensure messages with different
Message-IDs are NOT deduplicated.
Whether or not this change is a bug fix or introduces a regression
is actually debatable. In my mind, it is better to err on the
side of showing too many messages rather than too few, even if
the actual contents of the message are identical. Making saved
searches deduplicate without accounting for Message-IDs would be
more difficult, too.
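A rough illustration of the behavior change, assuming a hypothetical
content_key() helper; this is not the real ContentHash algorithm, only
a sketch of why mixing Message-IDs into the digest keeps messages with
identical content but different Message-IDs apart:

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: hash the body, optionally mixing in Message-IDs
sub content_key {
	my ($body, $mids, $use_mids) = @_;
	my $dig = Digest::SHA->new(256);
	if ($use_mids) {
		$dig->add("mid\0$_\0") for sort @$mids;
	}
	$dig->add($body);
	$dig->hexdigest;
}

my $body = "identical body\n";
my $old_a = content_key($body, ['a@example.com'], 0);
my $old_b = content_key($body, ['b@example.com'], 0);
my $new_a = content_key($body, ['a@example.com'], 1);
my $new_b = content_key($body, ['b@example.com'], 1);
print "without Message-IDs: ", ($old_a eq $old_b ? 'same' : 'differ'), "\n";
print "with Message-IDs: ", ($new_a eq $new_b ? 'same' : 'differ'), "\n";
```

Equal keys mean one of the two messages would be skipped as a
duplicate; differing keys mean both are shown.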
Eric Wong [Thu, 15 Jun 2023 08:46:37 +0000 (08:46 +0000)]
lei import: set +(L|kw) on already-imported blobs
When import hits blobs it's already seen, we'll add labels
regardless in order to match the behavior of other inexact
matches. This is useful when importing exact copies of
messages which exist in multiple mailboxes.
I noticed this when I had a message imported from my normal IMAP
`INBOX', but also copied it to a different folder for future
reference.
Eric Wong [Fri, 9 Jun 2023 10:31:08 +0000 (10:31 +0000)]
add compat package for List::Util::uniqstr
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
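A sketch of what such a compat shim could look like, assuming a
hypothetical compat_uniqstr() name (the real package and sub names may
differ). List::Util gained uniqstr in version 1.45, so the fallback is
only used with older installs:

```perl
use strict;
use warnings;

# Fallback: keep the first occurrence of each string, preserving order
sub compat_uniqstr (@) {
	my %seen;
	grep { !$seen{$_}++ } @_;
}

# Prefer the real List::Util::uniqstr when it is available
my $uniqstr = eval { require List::Util; List::Util->can('uniqstr') }
		|| \&compat_uniqstr;

my @out = $uniqstr->(qw(b a b c a));
print "@out\n"; # b a c
```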
Eric Wong [Fri, 9 Jun 2023 10:31:07 +0000 (10:31 +0000)]
search: hoist out do_enquire for codesearch
Reusing this bit seems to make sense as mail and code search
are similar enough w.r.t. setting up sort options. This
deduplication will become more useful as -cindex will
likely combine code and mail search to generate associations
between inboxes and code repos.
Eric Wong [Fri, 9 Jun 2023 10:31:06 +0000 (10:31 +0000)]
search: add comments wrt codesearch, reduce ops
Add some comments about various usages of xdb_shards_flat and
mset since the addition of CodeSearch (and other search things)
subclassing it may become confusing.
Since we're in the area, we can also avoid extra hash
lookups/initializations and reduce Perl ops in various places.
Eric Wong [Thu, 8 Jun 2023 18:26:08 +0000 (18:26 +0000)]
t/lei.t: quiet newline warning on older Perls
Perl < 5.22 warned on newlines in the middle of a string instead
of just the end. Work around it by disabling all warnings on older
Perls while running File::Path::mkpath.
Eric Wong [Thu, 8 Jun 2023 18:04:54 +0000 (18:04 +0000)]
xapcmd: rely on File::Temp cleanup for temporary dir
remove_tree from File::Path 2.09 (from Perl 5.16.3 on CentOS 7.x)
doesn't seem to work properly on File::Temp objects. Since
File::Temp->newdir sets CLEANUP=>1 by default anyways, we'll
just rely on that to perform cleanup instead of doing it ourselves.
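For reference, a small sketch of the File::Temp behavior being relied
on; the `example-XXXXXX' template and the `shard' file name are made up
for illustration:

```perl
use strict;
use warnings;
use File::Temp;

my $path;
{
	# CLEANUP => 1 is the default for ->newdir
	my $tmp = File::Temp->newdir('example-XXXXXX', TMPDIR => 1);
	$path = $tmp->dirname;
	open(my $fh, '>', "$path/shard") or die "open: $!";
	close($fh) or die "close: $!";
	print "exists inside scope: ", (-d $path ? 1 : 0), "\n";
} # $tmp goes out of scope; File::Temp removes the directory tree
print "exists after scope: ", (-d $path ? 1 : 0), "\n";
```

No explicit remove_tree call is needed; the directory and its contents
go away with the object.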
Eric Wong [Wed, 31 May 2023 22:10:01 +0000 (22:10 +0000)]
www: more restrictive query string parsing
Only allow single-character query keys to prevent clients from
wasting memory in Perl's hash tables. We'll also perform the
utf8::decode and tr/+/ / calls once on the whole query string
to reduce op calls.
This also avoids creating an empty hash in the common case
when the QUERY_STRING is empty and instead relies on
auto-vivification of Perl.
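A simplified sketch of the key restriction, using a hypothetical
parse_qs() sub rather than the actual WWW code. It always returns a
hash (so it does not model the auto-vivification trick), and it decodes
values individually instead of the whole string:

```perl
use strict;
use warnings;

# Hypothetical parser: only single-character keys are kept
sub parse_qs {
	my ($qs) = @_;
	my %q;
	$qs //= '';
	$qs =~ tr/+/ /; # once for the whole string
	for my $pair (split(/[&;]+/, $qs)) {
		my ($k, $v) = split(/=/, $pair, 2);
		next unless defined($k) && length($k) == 1; # drop long keys
		$v //= '';
		$v =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge; # percent-decode
		utf8::decode($v);
		$q{$k} = $v;
	}
	\%q;
}

my $q = parse_qs('q=hello+world&x=m&longkey=1&t');
print join(' ', map { "$_=$q->{$_}" } sort keys %$q), "\n";
```

Here `longkey' is silently dropped, so a client can't grow the hash
with arbitrary key names.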
Eric Wong [Tue, 9 May 2023 09:15:30 +0000 (09:15 +0000)]
cindex: fix --no-scan no-op non-termination
We must account for the shards_active() recursing upon itself
when outside DS->event_loop. This is tricky, unfortunately, but
--no-scan isn't a common mode of operation. Noticed while
developing the monster --associate functionality to
automatically create bidirectional associations of
inboxes/extindices to coderepos.
Eric Wong [Mon, 8 May 2023 16:58:14 +0000 (16:58 +0000)]
cindex: fix --no-scan --prune
We must define $GIT_TODO to be non-undef when using --no-scan
for prune-only invocations to run. I'm leaning towards making
--no-scan a publicly-documented switch for -cindex; but I'm
less certain about documenting it for -index and -extindex...
Eric Wong [Sat, 6 May 2023 22:40:03 +0000 (22:40 +0000)]
isearch: support 64-bit article numbers for SQLite query
While IMAP UIDs are specified as 32-bit in RFC 3501, there's no
reason we can't support 64-bit article numbers on our end when
the time comes. Neither NNTP nor POP3 have the 32-bit
limitation, even, so it's not inconceivable that IMAP will drop
that limitation at some point, too.
Eric Wong [Thu, 4 May 2023 11:06:42 +0000 (11:06 +0000)]
xcpdb: support cindex upgrades and resharding
xcpdb is necessary for upgrading Xapian backends (e.g. glass to
honey), thus codesearch indices (cindex) must be supported.
Resharding is also useful if CPU count is altered on system
upgrades or downgrades.
cindex Xapian sharding is completely different than anything
else we do, so the resharding operation must be a special case
based on existing cindex sharding rules.
Eric Wong [Wed, 3 May 2023 11:42:15 +0000 (11:42 +0000)]
cindex: --prune + --exclude= drops repo information
--exclude= alone only prevents a coderepo from being indexed in
a particular invocation, but --prune will purge all traces of it
to ensure --update doesn't pick it up again w/o --exclude=
(unless --project-list= includes it).
Eric Wong [Mon, 1 May 2023 23:29:35 +0000 (23:29 +0000)]
daemon: improve handling of Git->async_abort
The $oid arg for Git->cat_async is defined on async_abort using
the original request, so use undefined $type to distinguish that
case in caller-supplied callbacks. async_abort isn't common, of
course, but sometimes git subprocesses can die unexpectedly.
Eric Wong [Sat, 29 Apr 2023 20:02:14 +0000 (20:02 +0000)]
solver_git: don't spew to daemon err on git apply failure
Too many patches don't apply (due to coderepos being a PITA to
associate) and interested admins can check for 404s to diagnose
them, anyways. This reduces the noise in syslog/stderr for
public-facing daemons.
Eric Wong [Fri, 28 Apr 2023 21:07:30 +0000 (21:07 +0000)]
git: make check_async callbacks identical to cat_async
This simplifies Git->cat_async_step and fixes Git->async_abort,
the latter of which was passing arguments improperly for the
--batch-check (or `info') case at the cost of making the few
check_async callers handle an extra argument.
The extra (PublicInbox::Git) $self argument for check_async
callbacks is now gone, as avoiding the temporary cyclic
reference doesn't seem worthwhile since the temporary cyclic
reference appears in the ->cat_async code paths, too.
Eric Wong [Wed, 26 Apr 2023 00:49:29 +0000 (00:49 +0000)]
xcpdb: preserve indexlevel for extindex
This likely fixes indexlevel preservation for some v2 inboxes on
some systems, too, since (apparently) we need to sort shards
numerically to get Xapian metadata working properly on a
combined (multi-shard) Xapian DB.
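To illustrate why the sort order matters, here is a tiny sketch with
made-up shard directory names 0..11; the default string sort puts shard
10 before shard 2, while a numeric sort keeps shard 0 first and the
rest in order:

```perl
use strict;
use warnings;

my @shards = (0 .. 11); # hypothetical shard directory names

my @lexical = sort @shards;             # string comparison
my @numeric = sort { $a <=> $b } @shards; # numeric comparison

print "lexical: @lexical\n";
print "numeric: @numeric\n";
```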
Eric Wong [Tue, 25 Apr 2023 11:02:58 +0000 (11:02 +0000)]
emergency: make error messages more consistent
Showing "failed" is needless if we already know the program
is die-ing. We'll prefix "BUG:" to bug messages, "W:" to
non-fatal warnings to be consistent with our newer code such
as lei.
Eric Wong [Tue, 25 Apr 2023 11:02:55 +0000 (11:02 +0000)]
cindex: simplify store_repo
It's easier to just create a new Xapian::Document and
replace it rather than to load and edit it. I don't
know if there's any performance difference one way or
the other, but fewer branches help with maintainability,
and a smaller optree lowers memory use and startup
time.
Eric Wong [Tue, 25 Apr 2023 11:02:54 +0000 (11:02 +0000)]
cindex: simplify tmpfile management for indexing
I considered making this a pipe, but we must avoid spawning
`git log --stdin --no-walk=unsorted' for the no-op case since that
still emits a commit if stdin is empty. So just get rid of an
unnecessary loop and do lseek(2) inside workers for parallelism.
Eric Wong [Tue, 25 Apr 2023 11:02:53 +0000 (11:02 +0000)]
cindex: drop unneeded module use
I initially thought I'd use the PublicInbox::Eml module and rely
on --pretty=mboxrd; but eventually decided against it since
it wasn't saving any code.
Eric Wong [Tue, 25 Apr 2023 10:50:52 +0000 (10:50 +0000)]
content_digest_dbg: improve display of To:/Cc: diffs
To: and Cc: headers can be long and differences in long lines
are easier to view when broken apart. Just split by /,/ since
Data::Dumper will delimit with "," anyways.
Eric Wong [Tue, 25 Apr 2023 10:50:51 +0000 (10:50 +0000)]
mail_diff: show headers differences in WWW /$MSGID/d/ view
Some messages only differ in the To/Cc headers because some
MTAs seem to normalize them. I was getting confused when I
saw some /d/ endpoints with no visible differences.
Eric Wong [Tue, 25 Apr 2023 10:50:50 +0000 (10:50 +0000)]
mail_diff: match ContentHash EOL and EOM behavior more closely
ContentHash currently doesn't convert CRCRLF to LF. Perhaps it
should, but for now, have diff behavior match the actual
comparison behavior used for dedupe and omit all trailing
whitespace for diff.
Eric Wong [Tue, 25 Apr 2023 10:50:49 +0000 (10:50 +0000)]
mid+contenthash: eliminate needless local variable captures
It's possible in theory that Perl could be smarter and free
memory a tad sooner this way. Regardless, fewer lines of code
are easier to navigate/read and can save optree size and reduce
parsing times.
Eric Wong [Sat, 22 Apr 2023 10:33:42 +0000 (10:33 +0000)]
cindex: rewrite prune (again) for speed
With my partial git.kernel.org mirror, this brings a full prune
down from ~75 minutes to under 5 minutes using git 2.19+. This
speedup even applies to users on slow storage (rotational HDD).
First off, xapian-delve(1) is nearly 10x faster for dumping
boolean terms by prefix than the equivalent Perl code with
Xapian bindings. This performance difference is critical since
we need to check over 5 million commits for pruning a partial
git.kernel.org mirror.
We can use sed(1) and sort(1) to massage delve output into
something suitable for the first comm(1) input.
For the second comm(1) input, the output of `git cat-file
--batch-check --batch-all-objects' against all indexed git repos
with awk(1) filtering provides the necessary output for
generating a list of indexed-but-no-longer accessible commits.
sed(1) and awk(1) are POSIX standard tools which can be roughly
2x faster than equivalent Perl for simple filters, while
sort(1) is designed to handle larger-than-memory datasets
efficiently (unlike the `sort' perlop).
With slow storage and git <2.19, the switch to --batch-all-objects
actually results in a performance regression since having git
perform sorting results in worse disk locality than the previous
sequential iteration by Xapian docid. git 2.19+ users with
`--unordered' support benefit from improved storage locality;
and speedups from storage locality dwarf the overhead of
an extra external sort(1) invocation.
Even with consumer-grade SATA-II SSDs, the combo of --unordered
and sort(1) provides a noticeable speedup since SSD latency
remains a factor for --batch-all-objects.
git <2.19 users must upgrade git to get acceptable performance
on slow storage and giant indexes, but git 2.19 was released
nearly 5 years ago so it's probably a reasonable requirement for
performance.
The only remaining downside of this change for all users is
the extra temporary disk space for sort(1) and comm(1);
but the speedup provided with git 2.19+ is well worth it.
Eric Wong [Thu, 20 Apr 2023 10:23:02 +0000 (10:23 +0000)]
lei_mail_sync: prepare to support SHA-256
I'm not sure how combining SHA-1 and SHA-256 in a single git
repo will work, eventually. But this is an obvious place to do
the right thing if we ever see a 64-byte hex string (unless git
adds support for another hash which uses 64-byte hex string
representations, which would break many assumptions elsewhere,
too...).
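A sketch of the length-based detection hinted at above, with a
hypothetical oid_format() helper. It assumes SHA-1 (40 hex characters)
and SHA-256 (64 hex characters) are the only formats in play:

```perl
use strict;
use warnings;

# Guess the object format from the length of a hex object ID
sub oid_format {
	my ($oidhex) = @_;
	return 'sha1' if $oidhex =~ /\A[0-9a-f]{40}\z/;
	return 'sha256' if $oidhex =~ /\A[0-9a-f]{64}\z/;
	undef; # neither length matched
}

print oid_format('a' x 40), "\n"; # sha1
print oid_format('b' x 64), "\n"; # sha256
```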
Eric Wong [Thu, 20 Apr 2023 00:53:30 +0000 (00:53 +0000)]
cindex: limit parallelism of extensions.objectFormat check
We can't safely spawn all `git config' processes of every
indexed git directory at once due to system resource limits
(RLIMIT_NPROC, RLIMIT_NOFILE). So queue them up and limit
parallelism that way.
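The queueing idea can be sketched with plain fork/waitpid; this is not
the event-loop code used by -cindex, and @git_dirs and $max_jobs are
hypothetical. The children exit immediately where a real checker would
run `git config':

```perl
use strict;
use warnings;
use POSIX ();

my @git_dirs = map { "/path/to/repo$_.git" } (1 .. 8); # hypothetical
my $max_jobs = 3; # stay well under RLIMIT_NPROC and RLIMIT_NOFILE
my (%running, $done);
my $peak = 0;

while (@git_dirs || %running) {
	# top up to $max_jobs children, leaving the rest queued
	while (@git_dirs && keys(%running) < $max_jobs) {
		my $dir = shift @git_dirs;
		my $pid = fork();
		die "fork: $!" unless defined $pid;
		if ($pid == 0) {
			# child: a real checker would exec git here
			POSIX::_exit(0);
		}
		$running{$pid} = $dir;
		$peak = keys(%running) if keys(%running) > $peak;
	}
	my $pid = waitpid(-1, 0); # block until any child exits
	die "waitpid: $!" if $pid < 0;
	$done++ if delete $running{$pid};
}
print "done=$done peak=$peak\n";
```

At most $max_jobs children exist at any time, regardless of how many
directories are queued.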
Eric Wong [Wed, 19 Apr 2023 21:54:48 +0000 (21:54 +0000)]
cindex: support sha256 coderepos alongside sha1
This special support is only needed for --prune at the moment
since the indexing side works on a per-repo basis. There are no
automated tests, yet, but it seems to work well on my sha256
projects when sharing a cindex with sha1 projects.
Eric Wong [Tue, 18 Apr 2023 18:39:14 +0000 (18:39 +0000)]
www_coderepo: rescan cgit project-list for new coderepos
Coderepo changes are probably more common than inbox changes, so
it probably makes sense to rescan and look for new coderepos on
404s, especially since we serve mirrored manifest.js.gz as-is.
I noticed my git.kernel.org mirror was serving manifest.js.gz
pointing to irretrievable repositories. This should stop that.
We'll also drop the underscore ('_') and use `coderepo'
everywhere to be consistent with our documentation.
We may serve new inboxes in a similar way down the line, too;
but this change only affects coderepos for now since we can
guarantee the inbox manifest.js.gz never contains irretrievable
inboxes as it's dynamically generated.
Eric Wong [Wed, 12 Apr 2023 10:17:42 +0000 (10:17 +0000)]
listener: support multi-accept like nginx
While accepting a single connection at-a-time is likely best for
multi-worker and/or load-balanced deployments; accepting
multiple connections at once should be less bad on overloaded
single-worker systems.
We can't automatically pick the best value here since worker
counts are dynamic via SIGTTIN/SIGTTOU. Process managers
(e.g. systemd) can also spawn multiple instances sharing a
single listener with no knowledge sharing between listeners.
Eric Wong [Wed, 12 Apr 2023 06:19:10 +0000 (06:19 +0000)]
lei_mail_sync: cleanup stale/dangling fids if possible
I'm not sure how it happens or if/when it was fixed, but my
earliest lei installations have hit some
"E: fid=$fid for $oidhex unknown" messages on `lei import'
invocations.
This really should've enabled the foreign keys pragma to begin
with; but we'll probably start using that in the future. For
now, at least rely on a transaction to keep things consistent
in SQLite.
Eric Wong [Wed, 12 Apr 2023 00:13:02 +0000 (00:13 +0000)]
git: parallelize manifest_entry
This saves a few milliseconds per-epoch without incurring
any dependencies on the event loop. It can be parallelized
further, of course, but it may not be worth it for -extindex
users since it's already cached.
Eric Wong [Wed, 12 Apr 2023 00:12:58 +0000 (00:12 +0000)]
git: cat_async_step: reduce batch-command info checks
This improves readability for me. Instead of checking for `info '
requests of `--batch-command' in multiple places of every
common branch, do it once per-call and stash its result.
We'll also avoid storing `$bc' for now since the only other
check is in a cold path.
Eric Wong [Sun, 9 Apr 2023 22:30:13 +0000 (22:30 +0000)]
www_coderepo: use OnDestroy to render summary view
This lets us get rid of a /bin/sh process and allows us to
rely on Qspawn to parallelize git commands.
Special treatment of the OnDestroy object is necessary to keep
its scope limited for MockHTTP. Neither the generic `plackup'
HTTP server nor our -httpd/-netd needed this scope
limitation. As a result, summary() is now called inside an
anonymous sub to keep the memory overhead of the anonymous sub
itself as small as possible. Avoiding anonymous subs entirely
would be preferable for memory savings, but it's necessary for
PSGI.
Eric Wong [Sat, 8 Apr 2023 09:23:44 +0000 (09:23 +0000)]
v2writable: drop experimental DEBUG_DIFF support
I haven't used it in 5 years, and I doubt anybody else has,
either. In any case, we have both `lei mail-diff' and diff
support in the WWW UI, now, so more convenient options are
available.
Eric Wong [Fri, 7 Apr 2023 12:40:53 +0000 (12:40 +0000)]
switch git version comparisons to vstrings, too
There are too many require_git callsites in t/*.t to change,
but we can make the rest of the code more readable and reuse
PublicInbox::Git::version() in our test suite, too.
Eric Wong [Fri, 7 Apr 2023 12:40:52 +0000 (12:40 +0000)]
searchidx: use vstring to improve readability
Perl has native `vstring' encoding for vector (or version)
strings; make use of it instead of relying on difficult-to-read
hex versions and integer shifts.
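A short sketch of vstring comparisons, assuming a hypothetical
to_vstring() helper for versions only known at runtime. Each component
becomes one character, so plain string operators compare versions
component by component:

```perl
use strict;
use warnings;

# Convert a dotted version string to the equivalent vstring
sub to_vstring {
	my ($dotted) = @_;
	join('', map { chr($_) } split(/\./, $dotted));
}

my $ver = to_vstring('2.19.0'); # hypothetical detected version
print "at least 2.19\n" if $ver ge v2.19;
print "older than 2.9\n" if $ver lt v2.9;
printf "version: %vd\n", $ver;
```

Note that 2.19 correctly sorts after 2.9 here, which a naive
comparison of the dotted strings would get wrong.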
Eric Wong [Fri, 7 Apr 2023 12:40:50 +0000 (12:40 +0000)]
umask: hoist out of InboxWritable
Since CodeSearchIdx doesn't deal with inboxes, it makes sense
to split it out from inbox-specific code and start moving
towards using OnDestroy to restore the umask at the end of
scope and reducing extra functions.
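The scope-based restore can be sketched as follows. ScopeGuard and
with_umask() are hypothetical stand-ins, not the OnDestroy API itself;
the point is only that the destructor restores the umask on any scope
exit, including die:

```perl
use strict;
use warnings;

{
	package ScopeGuard; # hypothetical stand-in for an OnDestroy-like class
	sub new { my ($cls, $cb) = @_; bless { cb => $cb }, $cls }
	sub DESTROY { $_[0]->{cb}->() }
}

sub with_umask {
	my ($mask, $cb) = @_;
	my $old = umask($mask);
	my $restore = ScopeGuard->new(sub { umask($old) });
	$cb->(); # $restore is destroyed when this sub returns or dies
}

my $before = umask;
my $inside = with_umask(0077, sub { umask });
printf "before=%04o inside=%04o after=%04o\n", $before, $inside, umask;
```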
Eric Wong [Fri, 7 Apr 2023 12:40:48 +0000 (12:40 +0000)]
cindex: improve progress display
Instead of displaying the total number of changes across all
repos next to the repo path ("$GIT_DIR: $TOTAL commits"), we'll
only show the number of changes made in that repo.
We'll also note when a prune is complete on a shard, since
prunes may often be expensive no-ops.
Eric Wong [Thu, 6 Apr 2023 12:39:53 +0000 (12:39 +0000)]
watch: close inotify FD on ->quit
For simplicity, we quit and recreate an entire watch instance
on SIGHUP. However, inotify (and signalfd) FDs are tied to
the DS event loop and stay pinned to existence that way.
Thus we explicitly close the FD in Watch->quit to prevent
leakage on SIGHUP.
Eric Wong [Thu, 6 Apr 2023 12:39:52 +0000 (12:39 +0000)]
watch: use detect_indexlevel for unconfigured inboxes
I favor leaving the publicinbox.<name>.indexlevel parameter
out of config files to make it easier to alter and reduce
sources of truth. It worked well in most cases, but
public-inbox-watch also needs to detect the indexlevel.
Moving the sub to InboxWritable (from Admin) probably makes
sense since it's a per-inbox attribute and allows -watch
to reuse it.
Eric Wong [Wed, 5 Apr 2023 11:26:56 +0000 (11:26 +0000)]
cindex: enter event loop once per run
This avoids needing to alter the sigmask for systems without
signalfd or EVFILT_SIGNAL. This will also make it easier to
workaround FreeBSD (and likely *BSD) signal behavior in the
next commit.
Eric Wong [Wed, 5 Apr 2023 11:26:53 +0000 (11:26 +0000)]
cindex: do prune work while waiting for `git log -p'
`git log -p' can take several seconds to generate its initial output.
SMP systems can be processing prunes during this delay, so let
DS do a one-shot notification for us while prune is running. On
Linux, we'll also use the biggest pipe possible so git can do
more CPU-intensive work to generate diffs while our Perl
processes are indexing and likely hitting I/O wait.
Eric Wong [Wed, 5 Apr 2023 11:26:52 +0000 (11:26 +0000)]
ipc: support awaitpid in WQ workers
Using signalfd is necessary to get reliable signal wakeups w/o
polling on fixed intervals. This change will make it possible
to use awaitpid in cidx shard workers so they can perform prune
work while waiting on the initial output of `git log -p'.
Eric Wong [Thu, 30 Mar 2023 11:29:51 +0000 (11:29 +0000)]
www: support POST /$INBOX/$MSGID/?x=m&q=
This allows filtering the contents of any existing thread using
a search query. It uses the existing THREADID column in Xapian
so we can internally add a Xapian OP_FILTER to the results.
This new functionality is orthogonal to the existing `t=1'
parameter which gives mairix-style thread expansion. It doesn't
make sense to use `t=1' with this functionality, but it's not
disallowed, either.
The indentation change in Over->next_by_mid is to ensure
DBI->prepare_cached can share across both ->next_by_mid
and ->mid2tid.
I also noticed the existing regex for `POST /$INBOX/?x=m&q=' was
allowing extra characters. With an added \z, it's now as strict
as originally intended, and AFAIK nothing was generating invalid
URLs for it.
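The effect of the \z anchor can be shown with a pair of made-up
patterns; these are not the actual WWW routing regexes, only an
illustration of how an unanchored pattern accepts trailing characters:

```perl
use strict;
use warnings;

my $loose  = qr!\A/([\w\-.]+)/!;   # hypothetical: no end anchor
my $strict = qr!\A/([\w\-.]+)/\z!; # \z: nothing may follow

for my $path ('/inbox/', '/inbox/extra/junk') {
	printf "%-20s loose=%d strict=%d\n", $path,
		($path =~ $loose ? 1 : 0), ($path =~ $strict ? 1 : 0);
}
```

Only the first path matches the strict pattern; both match the loose
one.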
Eric Wong [Wed, 29 Mar 2023 20:32:59 +0000 (20:32 +0000)]
cindex: interleave prune with indexing
We need to ensure we don't block indexing for too long while
pruning, since pruning coderepos seems more frequent and
necessary than inbox repos due to the prevalence of force
pushes with branches like `seen' (formerly `pu') in git.git.
Implement this via ->event_step and requeue mechanisms of DS so
we periodically flush our work and let indexing resume.
I originally wanted to implement this as a dedicated group
of workers, but the XS Search::Xapian bug[1] workaround
to handle uncaught C++ exceptions was expensive and complex
compared to the evented mechanism.