git.ipfire.org Git - thirdparty/public-inbox.git/log

Eric Wong [Thu, 30 Mar 2023 11:29:51 +0000 (11:29 +0000)]
www: support POST /$INBOX/$MSGID/?x=m&q=

This allows filtering the contents of any existing thread using
a search query.  It uses the existing THREADID column in Xapian
so we can internally add a Xapian OP_FILTER to the results.

This new functionality is orthogonal to the existing `t=1'
parameter which gives mairix-style thread expansion.  It doesn't
make sense to use `t=1' with this functionality, but it's not
disallowed, either.

The indentation change in Over->next_by_mid makes its SQL string
identical to the one used by ->mid2tid, so DBI->prepare_cached can
reuse a single cached statement handle for both.

I also noticed the existing regex for `POST /$INBOX/?x=m&q=' was
allowing extra characters.  With an added \z, it's now as strict
as originally intended, and AFAIK nothing was generating invalid
URLs for it.
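A minimal Python sketch of why the trailing anchor matters, using simplified, hypothetical route patterns (not the actual ones in PublicInbox::WWW).  Python's `\Z` matches only at the very end of the string, like Perl's `\z` (Perl's `\Z` also tolerates a trailing newline):

```python
import re

# Hypothetical patterns for a PATH_INFO of the form "/$INBOX/".
LOOSE = re.compile(r"\A/([^/]+)/")     # no end anchor: trailing junk is accepted
STRICT = re.compile(r"\A/([^/]+)/\Z")  # \Z here behaves like Perl's \z

def inbox_name(pattern, path_info):
    """Return the inbox name captured by `pattern`, or None on mismatch."""
    m = pattern.match(path_info)
    return m.group(1) if m else None

print(inbox_name(LOOSE, "/meta/junk"))   # meta
print(inbox_name(STRICT, "/meta/junk"))  # None
print(inbox_name(STRICT, "/meta/"))      # meta
```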

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/aaniyhk7wfm4e6m5mbukcrhevzoc6ftctyrfwvmz4fkykwwtlj@mverfng6ytas/T/

Eric Wong [Wed, 29 Mar 2023 20:32:59 +0000 (20:32 +0000)]
cindex: interleave prune with indexing

We need to ensure we don't block indexing for too long while
pruning, since pruning coderepos seems more frequent and
necessary than pruning inbox repos due to the prevalence of force
pushes with branches like `seen' (formerly `pu') in git.git.

Implement this via ->event_step and requeue mechanisms of DS so
we periodically flush our work and let indexing resume.
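The requeue idea can be sketched in Python with a plain deque standing in for the DS event loop; `Task`, `BATCH` and `run` are hypothetical names, not public-inbox APIs.  Each task does a bounded amount of work per step and goes to the back of the queue if it has more to do, so a long prune cannot starve indexing:

```python
from collections import deque

BATCH = 3  # hypothetical limit on items handled per event_step

class Task:
    def __init__(self, kind, items, log):
        self.kind = kind
        self.items = deque(items)
        self.log = log

    def event_step(self, queue):
        # Handle at most BATCH items, then yield to other queued work.
        for _ in range(min(BATCH, len(self.items))):
            self.log.append((self.kind, self.items.popleft()))
        if self.items:
            queue.append(self)  # requeue: resume after everyone else

def run(tasks):
    queue = deque(tasks)
    while queue:
        queue.popleft().event_step(queue)

log = []
run([Task("prune", range(7), log), Task("index", range(4), log)])
# prune x3, index x3, prune x3, index x1, prune x1
print([kind for kind, _ in log])
```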

I originally wanted to implement this as a dedicated group
of workers, but the XS Search::Xapian bug[1] workaround
to handle uncaught C++ exceptions was expensive and complex
compared to the evented mechanism.

[1] https://lists.xapian.org/pipermail/xapian-discuss/2023-March/009967.html
   <20230327114604.M803690@dcvr>

Eric Wong [Tue, 28 Mar 2023 02:59:04 +0000 (02:59 +0000)] 
cindex: leave SIGTSTP and SIGCONT unblocked

This makes it easier to pause and restart long-running indexing
jobs which use our event loop.

Eric Wong [Tue, 28 Mar 2023 02:59:02 +0000 (02:59 +0000)] 
cindex: always break out of event loop on $DO_QUIT

Shard workers may not die soon enough (or may get stuck), so just
let the parent die earlier since it doesn't need to commit anything.

Eric Wong [Tue, 28 Mar 2023 02:59:01 +0000 (02:59 +0000)] 
cindex: simplify some internal data structures

We'll rely more on local-ized `our' globals rather than
hashref fields.  The former is more resistant to typos
and can be checked at compile-time earlier via `perl -c'.

The {-internal} field is also renamed to {-cidx_internal}
to reduce confusion within a large code base.

Eric Wong [Tue, 28 Mar 2023 10:53:58 +0000 (10:53 +0000)] 
t/lei-refresh-mail-sync: improve test reliability

Lack of signalfd/EVFILT_SIGNAL means we need to kill a
process repeatedly to ensure it wakes up.

Eric Wong [Tue, 28 Mar 2023 11:12:36 +0000 (11:12 +0000)] 
inotify: wrap with informative error message

As encountered by Louis DeLosSantos, Linux inotify is capped by
a lesser-known limit than the standard RLIMIT_NOFILE (`ulimit -n`)
value.  Give the user a hint about the fs.inotify.max_user_instances
sysctl knob on EMFILE, since EMFILE alone may mislead users into
thinking they've hit the (typically higher) RLIMIT_NOFILE limit.

I can test this on my system using:

  perl -I lib -MPublicInbox::Inotify -E \
   'my @x = map { PublicInbox::Inotify->new } (1..128)'

But I hesitate to include it in the test suite since triggering
the limit can cause unrelated processes to fail.
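A Python sketch of the same hinting idea, assuming the caller already has the `OSError` raised by an inotify setup call (Python's standard library has no inotify wrapper, so `inotify_error_hint` is a hypothetical helper):

```python
import errno

def inotify_error_hint(err):
    """Return an error message, pointing at the inotify sysctl on EMFILE."""
    if err.errno == errno.EMFILE:
        return (f"inotify: {err.strerror} "
                "(check the fs.inotify.max_user_instances sysctl, "
                "not just RLIMIT_NOFILE)")
    return f"inotify: {err}"

print(inotify_error_hint(OSError(errno.EMFILE, "Too many open files")))
```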

Link: https://public-inbox.org/meta/CAE6jdTo8iQfNM9Yuk0Dwi-ARMxmQxX-onL8buXcQ9Ze3r0hKrg@mail.gmail.com/
Reported-by: Louis DeLosSantos <louis.delos@gmail.com>

Eric Wong [Sun, 26 Mar 2023 23:48:03 +0000 (23:48 +0000)]
git: check for --version errors

While unlikely, `git --version' may fail, so we must check for
errors by reaping the process ASAP via tied close().
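In Python terms (a sketch, not public-inbox's tied-close mechanism), the equivalent is to wait for the child and check its exit status rather than trusting whatever it printed; `tool_version` is a hypothetical helper:

```python
import subprocess
import sys

def tool_version(argv):
    """Run e.g. ["git", "--version"] and return its stripped stdout.

    check=True waits for (reaps) the child and raises CalledProcessError
    on a non-zero exit, so a failure isn't mistaken for an empty version.
    """
    res = subprocess.run(argv, stdout=subprocess.PIPE, text=True, check=True)
    return res.stdout.strip()

print(tool_version([sys.executable, "-c", "print('1.2.3')"]))  # 1.2.3
```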

Eric Wong [Sun, 26 Mar 2023 10:52:46 +0000 (10:52 +0000)] 
watch: do not recreate signalfd on SIGHUP

The normal method by which PublicInbox::DS::event_loop sets up
signals once needs some coercing to work with -watch.
Otherwise, we'll end up wasting FDs every time somebody reloads
-watch via SIGHUP.

Eric Wong [Sun, 26 Mar 2023 10:52:45 +0000 (10:52 +0000)] 
watch: avoid Mail::IMAPClient errors when disconnected

No point in issuing LOGOUT commands and causing Mail::IMAPClient
to spew a giant backtrace when we're unconnected.

Eric Wong [Sun, 26 Mar 2023 08:21:32 +0000 (08:21 +0000)] 
lei_mirror: fix sh syntax error in "make help" target

Oops :x

Fixes: 836faf5093df ("lei_mirror: add `index' target to generated Makefile")

Eric Wong [Sun, 26 Mar 2023 09:35:43 +0000 (09:35 +0000)]
Merge branch 'cindex'

* cindex: (29 commits)
  cindex: --prune checkpoints to avoid OOM
  cindex: ignore SIGPIPE
  cindex: respect existing permissions
  cindex: squelch incompatible options
  cindex: implement reindex
  cindex: add support for --prune
  cindex: filter out non-existent git directories
  spawn: show failing directory for chdir failures
  cindex: improve granularity of quit checks
  cindex: attempt to give oldest commits lowest docids
  cindex: truncate or drop body for over-sized commits
  cindex: check for checkpoint before giant messages
  cindex: implement --max-size=SIZE
  sigfd: pass signal name rather than number to callback
  cindex: handle graceful shutdown by default
  cindex: drop `unchanged' progress message
  cindex: show shard number in progress message
  cindex: implement --exclude= like -clone
  ds: @post_loop_do replaces SetPostLoopCallback
  cindex: use DS and workqueues for parallelism
  ...

Eric Wong [Thu, 23 Mar 2023 21:45:45 +0000 (21:45 +0000)] 
lei: improve bash completion involving colons

This fixes completions of labels (`+L:' for `lei import' and
`L:' for `lei q') so they can appear anywhere in the
command-line.

I mainly wanted this for `lei import $URL +L:label', but
this also fixes `lei forget-external' completions for URLs
(which involve colons).

Eric Wong [Sat, 25 Mar 2023 11:11:05 +0000 (11:11 +0000)] 
lei_store: avoid redundant work on no-op worker spawn

While ->wq_workers_start is idempotent, the pipe creation for
PublicInbox::LeiStoreErr was not and required several extra
syscalls and FD allocations.  Check the correct field required
for SOCK_SEQPACKET workers rather than pipe-based workers.

Fixes: cbc2890cb89b81cb ("lei/store: use SOCK_SEQPACKET rather than pipe")

Eric Wong [Fri, 24 Mar 2023 10:40:22 +0000 (10:40 +0000)]
cindex: --prune checkpoints to avoid OOM

Having many ->delete_document calls in a transaction still
causes Xapian to eat up a large amount of memory and OOM on my
system.

I may reimplement --prune to avoid blocking ongoing updates, but
this is a simple fix for swapping and OOMs for now.
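A minimal sketch of checkpointing, with a hypothetical `FakeDB` standing in for a writable Xapian database (the real bindings do provide `delete_document` and `commit`, but the batch size here is an arbitrary assumption).  Committing every N deletions bounds how much uncommitted state can accumulate:

```python
class FakeDB:
    """Stand-in for a writable Xapian DB that tracks uncommitted deletions."""
    def __init__(self):
        self.pending = 0
        self.max_pending = 0
        self.commits = 0

    def delete_document(self, docid):
        self.pending += 1
        self.max_pending = max(self.max_pending, self.pending)

    def commit(self):
        self.commits += 1
        self.pending = 0

def prune(db, docids, checkpoint_every=1000):
    for n, docid in enumerate(docids, 1):
        db.delete_document(docid)
        if n % checkpoint_every == 0:
            db.commit()  # checkpoint: flush buffered changes
    db.commit()  # final commit for any remainder

db = FakeDB()
prune(db, range(5), checkpoint_every=2)
print(db.commits, db.max_pending)  # 3 2
```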

Eric Wong [Tue, 21 Mar 2023 23:07:43 +0000 (23:07 +0000)] 
cindex: ignore SIGPIPE

We check for all socket write errors anyways, and I don't expect
stderr output to be significant enough to matter.

Eric Wong [Tue, 21 Mar 2023 23:07:42 +0000 (23:07 +0000)] 
cindex: respect existing permissions

For internal ($GIT_DIR/public-inbox-cindex) Xapian DBs, we can
rely on core.sharedRepository.  For external ones, we'll just
rely on existing permissions if the directory already exists.

Eric Wong [Tue, 21 Mar 2023 23:07:41 +0000 (23:07 +0000)] 
cindex: squelch incompatible options

Some options don't make sense when used together.

Eric Wong [Tue, 21 Mar 2023 23:07:40 +0000 (23:07 +0000)] 
cindex: implement reindex

For now, this allows changing --indexlevel; in the future, it will
allow us to fix yet-to-be-discovered bugs or make backwards-compatible
improvements.

Eric Wong [Tue, 21 Mar 2023 23:07:39 +0000 (23:07 +0000)] 
cindex: add support for --prune

This gets rid of both inaccessible commits AND repositories.
It only unindexes commits which git itself has already pruned,
so repos with auto GC disabled will need to be GC'd first.

Eric Wong [Tue, 21 Mar 2023 23:07:38 +0000 (23:07 +0000)] 
cindex: filter out non-existent git directories

For now, we'll just warn about them since prune support doesn't
exist yet; --prune is implemented in the next commit.

Eric Wong [Tue, 21 Mar 2023 23:07:37 +0000 (23:07 +0000)] 
spawn: show failing directory for chdir failures

Our use of `git rev-parse --git-dir' depends on our (v)fork+exec
wrapper doing chdir, so the error message is required to avoid
user confusion.  I'm still avoiding `git -C $DIR' for now since
ancient versions of git did not support it.

Eric Wong [Tue, 21 Mar 2023 23:07:36 +0000 (23:07 +0000)] 
cindex: improve granularity of quit checks

This fixes shutdown handling when shard_index() isn't running
and ensures we can shut down the process more quickly.

Eric Wong [Tue, 21 Mar 2023 23:07:35 +0000 (23:07 +0000)] 
cindex: attempt to give oldest commits lowest docids

Monotonically increasing docids may help us avoid sorting output
for the web and CLI, since recent commits are generally the most
desired search results.

`git log --reverse' incurs no extra overhead in this case, since
`--stdin' will mean git buffers the commit list in memory before
attempting to emit anything.
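A small Python sketch of the docid assignment, assuming commits arrive newest-first (as `git log` prints them) and are reversed before indexing; `assign_docids` is a hypothetical helper.  With oldest-first assignment, descending docid order is the same as newest-first order, which is what may let output skip a separate sort:

```python
def assign_docids(newest_first, next_docid=1):
    """Give the oldest commit the lowest docid, like `git log --reverse`."""
    docids = {}
    for oid in reversed(newest_first):
        docids[oid] = next_docid
        next_docid += 1
    return docids

ids = assign_docids(["c3", "c2", "c1"])  # c3 is the newest commit
print(ids)  # {'c1': 1, 'c2': 2, 'c3': 3}
```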

Eric Wong [Tue, 21 Mar 2023 23:07:34 +0000 (23:07 +0000)] 
cindex: truncate or drop body for over-sized commits

We need to get at least the commit OID indexed to
avoid redundant work.

Eric Wong [Tue, 21 Mar 2023 23:07:33 +0000 (23:07 +0000)] 
cindex: check for checkpoint before giant messages

Giant messages may put us far over the batch limit if we're
close to it.

Eric Wong [Tue, 21 Mar 2023 23:07:32 +0000 (23:07 +0000)] 
cindex: implement --max-size=SIZE

This matches existing behavior of -index and -extindex, and
will hopefully allow me to avoid OOM problems by skipping
problematic commits.

Eric Wong [Tue, 21 Mar 2023 23:07:31 +0000 (23:07 +0000)] 
sigfd: pass signal name rather than number to callback

This is consistent with normal Perl %SIG handlers, and allows
-cindex signal handlers to be implemented consistently across
platforms.
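Perl passes the signal name without the `SIG` prefix (e.g. `TERM`) as the first argument to `%SIG` handlers.  A rough Python analogue of name-based dispatch follows; `sig_name`, `ACTIONS` and `action_for` are hypothetical names used only for illustration:

```python
import signal

def sig_name(signum):
    """Map a signal number to a Perl-style name such as "TERM"."""
    name = signal.Signals(signum).name  # e.g. "SIGTERM"
    return name[3:] if name.startswith("SIG") else name

# Hypothetical table keyed by name rather than platform-specific numbers.
ACTIONS = {"TERM": "quit", "INT": "quit", "HUP": "reload"}

def action_for(signum):
    return ACTIONS.get(sig_name(signum), "ignore")

print(sig_name(signal.SIGTERM))  # TERM
```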

Eric Wong [Tue, 21 Mar 2023 23:07:30 +0000 (23:07 +0000)] 
cindex: handle graceful shutdown by default

While individual Xapian shards are consistent due to the use of
Xapian transactions, the data across shards still needs to be
in a consistent state for our search to work.

Eric Wong [Tue, 21 Mar 2023 23:07:29 +0000 (23:07 +0000)] 
cindex: drop `unchanged' progress message

It's too noisy, and a similar message isn't emitted by -clone.

Eric Wong [Tue, 21 Mar 2023 23:07:28 +0000 (23:07 +0000)] 
cindex: show shard number in progress message

Otherwise it may be confusing to see the `$nr' value walk
backwards if some shards are indexing at a slower pace.

Eric Wong [Tue, 21 Mar 2023 23:07:27 +0000 (23:07 +0000)] 
cindex: implement --exclude= like -clone

This is to ensure we can exclude certain repos which are
expensive-to-index (e.g. `**/deps.git', `**/transparency-logs/**').

Eric Wong [Tue, 21 Mar 2023 23:07:26 +0000 (23:07 +0000)] 
ds: @post_loop_do replaces SetPostLoopCallback

This allows us to avoid repeatedly using memory-intensive
anonymous subs in CodeSearchIdx where the callback is assigned
frequently.  Anonymous subs are known to leak memory in old
Perls (e.g. 5.16.3 in enterprise distros) and are still expensive in
newer Perls.  So favor the (\&subroutine, @args) form which
allows us to eliminate anonymous subs going forward.

Only CodeSearchIdx takes advantage of the new API at the moment,
since it's the biggest repeat user of post-loop callback
changes.

Getting rid of the subroutine and relying on a global `our'
variable also has two advantages:

1) Perl warnings can detect typos at compile-time, whereas the
   (now gone) method could only detect errors at run-time.

2) `our' variable assignment can be `local'-ized to a scope.
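Python has no `local`, but the calling convention can still be sketched: a module-level list (standing in for `our @post_loop_do`) holds a function reference plus its arguments, and the loop calls it after each iteration.  All names here are hypothetical, and stopping the loop when the callback returns a false value is an assumption of this sketch:

```python
post_loop_do = []  # [function, *args]; empty means "no callback"

def event_loop(run_once, max_iterations=1000):
    for _ in range(max_iterations):
        run_once()
        # Call a plain function reference with stored args: no closure needed.
        if post_loop_do and not post_loop_do[0](*post_loop_do[1:]):
            break

def below_limit(state, limit):
    return state["n"] < limit

state = {"n": 0}

def tick():
    state["n"] += 1

post_loop_do[:] = [below_limit, state, 3]
event_loop(tick)
print(state["n"])  # 3
```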

Eric Wong [Tue, 21 Mar 2023 23:07:25 +0000 (23:07 +0000)] 
cindex: use DS and workqueues for parallelism

This avoids forking new shard processes for each repo we scan,
but we can't avoid many excessive commits since we need to
ensure the `seen()' sub can avoid excessive work.

Eric Wong [Tue, 21 Mar 2023 23:07:24 +0000 (23:07 +0000)] 
searchidxshard: improve comment wording

Just something I noticed while considering using this package
for CodeSearchIdx.

Eric Wong [Tue, 21 Mar 2023 23:07:23 +0000 (23:07 +0000)] 
cindex: use read-only shards during prep phases

No need to open shards for read/write access when read-only
will do.  Since we also control how a document gets sharded,
we'll also access the shard directly instead of letting Xapian
do the mappings.

--reindex didn't work properly before this change since it was
over-indexing.  It is now broken in the opposite way in that it
doesn't do reindexing at all.  --reindex will be implemented
properly in the future.

Eric Wong [Tue, 21 Mar 2023 23:07:22 +0000 (23:07 +0000)] 
cindex: parallelize prep phases

Listing refs, fingerprinting and root scanning can all be
parallelized to reduce runtime on SMP systems.

We'll use DESTROY-based dependency management with
parallelization as in LeiMirror to handle ref listing and
fingerprinting before serializing Xapian DB access to check
against the existing fingerprint.

We'll also delay root listing until we get a fingerprint
mismatch to speed up no-op indexing.

Eric Wong [Tue, 21 Mar 2023 23:07:21 +0000 (23:07 +0000)] 
codesearch: initial cut w/ -cindex tool

It seems relying on root commits is a reasonable way to
deduplicate and handle repositories with common history.
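A Python sketch of grouping by root commits, assuming the root OIDs of each repository were already listed (e.g. via `git rev-list --max-parents=0 --all`); repository names and OIDs below are placeholders:

```python
from collections import defaultdict

def repos_by_root(repo_roots):
    """Map each root commit OID to the set of repos containing it."""
    by_root = defaultdict(set)
    for repo, roots in repo_roots.items():
        for oid in roots:
            by_root[oid].add(repo)
    return by_root

groups = repos_by_root({
    "project.git": {"aaa"},
    "project-fork.git": {"aaa"},  # a fork shares the root commit
    "other.git": {"bbb", "ccc"},  # a repo may have several roots
})
print(sorted(groups["aaa"]))  # ['project-fork.git', 'project.git']
```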

I initially wanted to shoehorn this into extindex, but decided a
separate Xapian index layout capable of being EITHER external to
handle many forks or internal (in $GIT_DIR/public-inbox-cindex)
for small projects is the right way to go.

Unlike most existing parts of public-inbox, this relies on
absolute paths of $GIT_DIR stored in the Xapian DB and does not
rely on the config file.  We'll be relying on the config file to
map absolute paths to public URL paths for WWW.

Eric Wong [Tue, 21 Mar 2023 23:07:20 +0000 (23:07 +0000)] 
test_common: create_inbox: use `$!' properly on mkdir failure

stat(2) may fail and set `$!', too, so we must stash it, first.

Eric Wong [Tue, 21 Mar 2023 23:07:19 +0000 (23:07 +0000)] 
admin: ensure resolved GIT_DIR is absolute

We'll also support the $base arg of File::Spec->rel2abs
since it should make codesearch indexing easier.

Eric Wong [Tue, 21 Mar 2023 23:07:18 +0000 (23:07 +0000)] 
admin: hoist out resolve_git_dir

We'll be using this for indexing git coderepos, and
switch to Perl 5.12 while we're at it since unicode_strings
doesn't affect this package.

Eric Wong [Tue, 21 Mar 2023 23:07:17 +0000 (23:07 +0000)] 
search: relocate all_terms from lei_search

This will be used for code_search, too.

Eric Wong [Tue, 21 Mar 2023 23:07:16 +0000 (23:07 +0000)] 
ipc: move nproc_shards from v2writable

We'll be using nproc_shards for indexing non-Inbox stuff.

Eric Wong [Sat, 25 Mar 2023 02:08:52 +0000 (02:08 +0000)] 
ipc: retry sendmsg + recvmsg calls on EINTR

I'm not sure how this went undetected for so long, but EINTR
must be checked for when working with blocking sockets.  EINTR
shouldn't happen with non-blocking sockets, but it's easier to
just use the new wrapper in most of those places, too.

I don't know what I was smoking when I left out EINTR checks :x
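The retry pattern looks like this in Python.  Note that since Python 3.5 (PEP 475) most blocking system call wrappers, socket I/O included, already retry on EINTR by themselves, so `retry_eintr` and the simulated `flaky_recv` are purely illustrative:

```python
def retry_eintr(fn, *args):
    """Call fn(*args), retrying whenever it fails with EINTR."""
    while True:
        try:
            return fn(*args)
        except InterruptedError:  # OSError subclass for errno.EINTR
            continue

attempts = []

def flaky_recv(n):
    # Simulated syscall: interrupted twice by a signal, then succeeds.
    attempts.append(n)
    if len(attempts) < 3:
        raise InterruptedError
    return b"x" * n

print(retry_eintr(flaky_recv, 4), len(attempts))  # b'xxxx' 3
```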

Eric Wong [Thu, 23 Mar 2023 08:21:54 +0000 (08:21 +0000)] 
Merge branch 'fetch.hiderefs' into mirror

* fetch.hiderefs:
  lei_mirror: use fetch.hideRefs to speed up connectivity check

Eric Wong [Sat, 18 Mar 2023 12:02:13 +0000 (12:02 +0000)] 
clone: support --purge to delete remotely-deleted repos

This lets us clean up disk space when repos are removed
on the remote side.

Eric Wong [Sat, 18 Mar 2023 12:02:12 +0000 (12:02 +0000)] 
clone: show stale directories unconditionally

--project-list= is no longer required to show stale
repositories.

Eric Wong [Sat, 18 Mar 2023 12:02:11 +0000 (12:02 +0000)] 
doc: clone: note the default value of --remote-manifest=

It may not be immediately obvious to users unfamiliar with
grokmirror.

Eric Wong [Fri, 17 Mar 2023 20:31:37 +0000 (20:31 +0000)] 
config: glob2re supports `**' to match multiple path components

This should match the behavior documented in gitglossary(7).
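A simplified Python sketch of such a translation (not the actual PublicInbox::Config::glob2re, and ignoring bracket expressions and backslash escapes): `*` and `?` stay within one path component, `**/` matches zero or more leading directories, and a `**` not followed by `/` (such as a trailing `/**`) matches everything.  Use `.match()` so the pattern is anchored at both ends:

```python
import re

def glob2re(glob):
    """Translate a glob into a compiled, fully anchored regex (sketch)."""
    out, i = [], 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")        # e.g. trailing "/**": anything inside
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")     # never crosses a path separator
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")

print(bool(glob2re("**/deps.git").match("a/b/deps.git")))  # True
print(bool(glob2re("*.git").match("a/b.git")))             # False
```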

Eric Wong [Fri, 17 Mar 2023 20:31:36 +0000 (20:31 +0000)] 
treewide: move glob2re to PublicInbox::Config

It seems suitable for the config class since globs are a
config/option thing.

Eric Wong [Wed, 15 Mar 2023 21:47:56 +0000 (21:47 +0000)] 
ds: reap_pids: remove redundant signal blocking

Blocking signals when reaping was added in b90e8d6e02, back when
the lei pager was spawned by the daemon.  Shortly afterwards in
7b79c918a5, the client script took over spawning of the pager
and made b90e8d6e02 redundant.

cf. b90e8d6e02 (ds: block signals when reaping, 2021-01-10)
    7b79c918a5 (lei: run pager in client script, 2021-01-10)

Eric Wong [Wed, 15 Feb 2023 22:20:23 +0000 (22:20 +0000)] 
lei_mirror: use fetch.hideRefs to speed up connectivity check

`git fetch' runs a connectivity check against all
refs, which is unnecessarily expensive for incremental fetches
on RAM-constrained systems.

This depends on the proposal to support `fetch.hideRefs' for `git fetch':
https://public-inbox.org/git/20230212090426.M558990@dcvr/

Eric Wong [Mon, 13 Mar 2023 20:17:14 +0000 (20:17 +0000)] 
t/solver_git: squelch non-UTF-8 commit warning

We're making an ISO-8859-1 commit for testing purposes, so use
i18n.commitEncoding to shut git up.

Eric Wong [Mon, 13 Mar 2023 19:38:27 +0000 (19:38 +0000)] 
watch: add space before "UID" or "ARTICLE" in warnings

In other words, it now shows `imap://example.com/INBOX.foo UID:123'
instead of: `imap://example.com/INBOX.fooUID:123'

Eric Wong [Mon, 13 Mar 2023 19:38:26 +0000 (19:38 +0000)] 
spamcheck: use v5.12 and golf

No problems with `unicode_strings' in these modules.  We can
also shave our LoC count in a few places.

Eric Wong [Mon, 13 Mar 2023 19:38:25 +0000 (19:38 +0000)] 
use v5.12 for various network client-side packages

None of these are affected by the Perl unicode_strings feature,
so they can `use v5.12' safely.

Eric Wong [Tue, 14 Mar 2023 20:48:19 +0000 (20:48 +0000)] 
doc: clone: fix typo in --remote-manifest= description

Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v8j4ql8k.fsf@kyleam.com/

Eric Wong [Mon, 13 Mar 2023 12:00:24 +0000 (12:00 +0000)]
doc: clone: document --remote-manifest= option

Eric Wong [Mon, 13 Mar 2023 12:00:23 +0000 (12:00 +0000)] 
lei_mirror: handle UTF-8 from manifest.js.gz properly

This should ensure we display the "git config gitweb.owner
$OWNER" command invocation properly and also ensures we set the
description properly without triggering wide character warnings.

Also tested with a smallish iproute2 repo
(/pub/scm/linux/kernel/git/toke/iproute2.git) using my mirror:

  public-inbox-clone --remote-manifest=pub/manifest.js.gz \
    --include='*/toke/iproute2.git' --inbox-config=never \
    https://80x24.org/lore $DST

Anyways, I'm fairly certain this change and its tests are
correct; but I still struggle to understand Perl's approach to
Unicode and its interactions with various JSON implementations.

Fixes: 0830817c132cb105 ("lei_mirror: show non-ASCII owner properly w/ --verbose")

Eric Wong [Mon, 13 Mar 2023 12:00:22 +0000 (12:00 +0000)]
lei_mirror: do not fetch to read-only directories

As with public-inbox-fetch, we shouldn't waste time fetching
into read-only directories, since --epoch= will make unwanted
epoch directories read-only placeholders.

Eric Wong [Mon, 13 Mar 2023 12:00:21 +0000 (12:00 +0000)] 
lei_mirror: do not re-fetch inbox.config.example

It's a significant source of latency for incremental updates at
the moment, and not really needed since it's just an example.

Eric Wong [Mon, 13 Mar 2023 12:00:20 +0000 (12:00 +0000)] 
lei_mirror: describe why the {ibx} field is used

I forgot why that hunk of code was needed :x, so maybe others
will find the comment helpful, too.

Eric Wong [Mon, 13 Mar 2023 11:58:11 +0000 (11:58 +0000)] 
t/solver_git: fix when Plack::Middleware::ReverseProxy is missing

We need to ignore the harmless warnings in stderr when
Plack::Middleware::ReverseProxy isn't installed.

Eric Wong [Sat, 11 Mar 2023 17:36:00 +0000 (17:36 +0000)] 
lei_dedupe: simplify smsg_hash sub

We can just use the sha256() sub instead of dealing with the
OO interface for a small string.

Eric Wong [Thu, 9 Mar 2023 19:28:42 +0000 (19:28 +0000)] 
doc: 2.0.0 release notes update

Did some stuff, still a ton of stuff to do :x

Eric Wong [Thu, 9 Mar 2023 19:28:41 +0000 (19:28 +0000)] 
doc: lei config: update with --edit and --list examples

I typically use --edit/-e to make changes and --list/-l to
view settings with git, and do the same with lei.

Eric Wong [Thu, 9 Mar 2023 19:28:40 +0000 (19:28 +0000)] 
doc: lei import: add hints about nntp.* and imap.* config options

I'm setting up more imports and forgot about them :x

Eric Wong [Thu, 9 Mar 2023 19:28:39 +0000 (19:28 +0000)] 
doc: technical: document weird stuff in our codebase

Hopefully this makes things less surprising to new hackers.

Eric Wong [Thu, 9 Mar 2023 19:28:38 +0000 (19:28 +0000)] 
doc: technical/ds: update blurb to note more daemons

And add a note about the various wakeup modes of kqueue|epoll
while we're at it; we use all of them!

Eric Wong [Thu, 9 Mar 2023 19:28:37 +0000 (19:28 +0000)] 
doc: technical/memory: add note about mwrap-perl

It's already fixed memory usage problems not only in our codebase,
but also the standard `Encode' XS module and `git pack-objects'.

Eric Wong [Wed, 8 Mar 2023 11:02:58 +0000 (11:02 +0000)] 
lei_mirror: unlink FETCH_HEAD when fetching forkgroups

Apparently, --no-write-fetch-head is broken in current git[1],
and it didn't exist at all in older git.  So just unlink FETCH_HEAD
as we see it, but keep using --no-write-fetch-head to avoid the
syscall and I/O overhead when we can.

[1] https://yhbt.net/lore/git/20230308100438.908471-1-e@80x24.org/

Eric Wong [Tue, 7 Mar 2023 09:54:15 +0000 (09:54 +0000)] 
test_common: run_script: drop special-case for -clone

`make check' and `make check-run' actually work fine with it,
and TMPDIR=/dev/shm prove -lvw t/clone-coderepo.t is 2-3x faster.

Eric Wong [Tue, 7 Mar 2023 09:32:37 +0000 (09:32 +0000)] 
cgit: fix smart HTTP clone interception

We need to use the proper hash and key to do coderepo lookups
since we culled a redundant data structure a few months back.

Fixes: 1802dc29bda25a54 ("www_coderepo: do not copy {-code_repos} from config")

Eric Wong [Tue, 7 Mar 2023 08:47:15 +0000 (08:47 +0000)]
sha: fix compatibility with old OpenSSL + Net::SSLeay

In older OpenSSL, EVP_get_digestbyname() didn't work properly
without calling OpenSSL_add_all_digests() first.  However,
OpenSSL_add_all_digests() is deprecated as of OpenSSL 1.1.0 in
favor of OPENSSL_init_crypto().  Of course, OPENSSL_init_crypto()
isn't available in OpenSSL 1.0.1k, nor in Net::SSLeay as of
1.93_02 (2023-02-22).

Thus, instead of relying on string lookups and conditional
subroutine calls, just call EVP_sha1() and EVP_sha256() which
work on both old and new systems.

Tested with Net::SSLeay 1.55 and OpenSSL 1.0.1k on CentOS 7.x

Eric Wong [Sun, 5 Mar 2023 22:18:11 +0000 (22:18 +0000)] 
doc: update public-inbox-clone examples and help

Basically, public-inbox-clone has become grok-pull without
config files nor absolute paths.

Eric Wong [Thu, 2 Mar 2023 00:13:14 +0000 (00:13 +0000)] 
doc: drop hosted.txt

I'll have to downsize the server due to increased hosting costs,
so stop advertising these mirrors.

The inboxes still exist for now, but will probably be proxied
through an ssh tunnel over a slow DSL connection, so it's not
worth increasing traffic to them.

Eric Wong [Mon, 27 Feb 2023 10:21:05 +0000 (10:21 +0000)] 
doc: update clone+fetch with 2.0+ switches

Because old versions will exist for a long time and our latest
documentation is visible on the web, we must document when a
switch appears to avoid confusing users of old versions.

Eric Wong [Mon, 27 Feb 2023 07:18:34 +0000 (07:18 +0000)] 
process_pipe: BINMODE: pass LAYER argument

We'll end up using this to handle `:utf8', probably.

Eric Wong [Sun, 26 Feb 2023 17:15:06 +0000 (17:15 +0000)] 
doc: note "lei q -tt" is broken with HTTP(S) remotes

I'm still trying to decide how to handle HTTP(S) remotes
properly...

Link: https://public-inbox.org/meta/20230226170931.M947721@dcvr/

Eric Wong [Fri, 24 Feb 2023 16:59:10 +0000 (16:59 +0000)]
ds: write: do not assume final wbuf entry is tmpio

The final entry of {wbuf} may be a CODE ref and not a
tmpio ARRAY ref, so we must ensure it's an ARRAY before
attempting to use `->[INDEX]' to access it.

This fixes:
  forward ->close error: Not an ARRAY reference at PublicInbox/DS.pm line 544.
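The same defensive check in Python terms, using a hypothetical `wbuf` list whose entries may be callables or `[filehandle, offset]`-style lists; `last_tmpio` and `on_flushed` are illustrative names only:

```python
def last_tmpio(wbuf):
    """Return the final wbuf entry only if it is a list (tmpio-like)."""
    if wbuf and isinstance(wbuf[-1], list):
        return wbuf[-1]
    return None  # empty buffer, or the final entry is a callback

def on_flushed():
    pass

print(last_tmpio([["tmpfile", 0], on_flushed]))  # None
print(last_tmpio([on_flushed, ["tmpfile", 0]]))  # ['tmpfile', 0]
```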

Eric Wong [Wed, 22 Feb 2023 18:17:39 +0000 (18:17 +0000)] 
examples: remove `Standard{Error,Output} = syslog' lines

systemd (247.3-7+deb11u1 on Debian 11.x) considers them "obsolete" and
emits the following to my syslog:

  Standard output type syslog is obsolete, automatically updating to journal.
  Please update your unit file, and consider removing the setting altogether.

So we'll remove it altogether, as I'm sticking with rsyslog for now.

Eric Wong [Wed, 22 Feb 2023 17:25:55 +0000 (17:25 +0000)] 
treewide: simplify File::Path mkpath/make_path callers

File::Path already accounts for the existence of directories,
handles races from redundant mkdir(2), and croaks on
unrecoverable errors.  So there's no point in doing any
of that on our end.

Furthermore, avoiding the overhead of loading File::Path doesn't
seem worth it to save 20-60ms given the overhead of loading
our other code.  Instead, try to reduce optree overhead in
our own code, since File::Path gets used in a bunch of
places.

We'll also favor the newer make_path for multi-directory
invocations to avoid bloating our own optree to create an
arrayref, but mkpath is one fewer subroutine call within
File::Path itself, right now.
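Python's `os.makedirs` offers the same convenience the paragraph relies on from File::Path: with `exist_ok=True` it tolerates directories that already exist (including ones created concurrently) and raises only on real errors, so callers need no pre-checks of their own.  `ensure_dirs` is a hypothetical helper:

```python
import os
import tempfile

def ensure_dirs(*paths):
    """Create each directory (and parents), ignoring ones that already exist."""
    for path in paths:
        os.makedirs(path, exist_ok=True)

with tempfile.TemporaryDirectory() as top:
    a = os.path.join(top, "a", "b")
    c = os.path.join(top, "c")
    ensure_dirs(a, c)
    ensure_dirs(a, c)  # second call is a harmless no-op
    print(os.path.isdir(a), os.path.isdir(c))  # True True
```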

Eric Wong [Wed, 22 Feb 2023 17:25:52 +0000 (17:25 +0000)] 
sendmsg: prefix sleep message with `#'

It's an informative message that's harmless, so hopefully
the `#' prefix puts the user's mind at ease.

(I saw it on an `lei import' against an IMAP source)

Eric Wong [Tue, 21 Feb 2023 12:17:44 +0000 (12:17 +0000)] 
lei_mirror: support --remote-manifest=URL

Since PublicInbox::WWW already generates manifest.js.gz, I'm
using an alternate path with PublicInbox::WwwStatic to host the
manifest.js.gz for coderepos at an alternate location.  The
following snippet lets me host
https://yhbt.net/lore/pub/manifest.js.gz for mirrored git
repositories, while https://yhbt.net/lore/manifest.js.gz
(no `pub') remains for inbox mirroring.

==> sample.psgi <==
use Plack::Builder; # provides builder, enable and mount
use PublicInbox::WWW;
use PublicInbox::WwwStatic;
my $www = PublicInbox::WWW->new; # use default PI_CONFIG
my $st = PublicInbox::WwwStatic->new(docroot => '/path/to/code');
my $www_cb = sub {
    my ($env) = @_;
    # serve the coderepo manifest from the static docroot, if present
    if ($env->{PATH_INFO} eq '/pub/manifest.js.gz') {
        local $env->{PATH_INFO} = '/manifest.js.gz';
        my $res = $st->call($env);
        return $res if $res->[0] != 404;
    }
    # everything else (including the inbox manifest) goes to WWW
    $www->call($env);
};
builder {
    enable 'ReverseProxy';
    enable 'Head';
    mount '/lore' => $www_cb;
}

2 years ago viewvcs: handle non-UTF-8 commit message
Eric Wong [Tue, 21 Feb 2023 11:17:58 +0000 (11:17 +0000)] 
viewvcs: handle non-UTF-8 commit message

Back in the old days, git didn't store commit encodings
and allowed messages in various encodings to enter history.
Assuming such a commit is UTF-8 trips up s/// operations
on buffers read with the `:utf8' PerlIO layer.  So we clear
Perl's internal UTF-8 flag if we end up with something
which isn't valid UTF-8.

An example is commit 7eb93c89651c47c8095d476251f2e4314656b292
in git.git ([PATCH] Simplify git script, 2005-09-07)
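A minimal sketch of the flag-clearing idea, assuming a hypothetical helper name (fixup_non_utf8); it is not the actual ViewVCS code.  Encode::_utf8_on is used only to simulate the flagged-but-malformed buffer a `:utf8' read can produce, since that layer marks data as UTF-8 without validating it.

```perl
use strict;
use warnings;
use Encode ();

# hypothetical helper: if a buffer is flagged as UTF-8 but its bytes
# aren't well-formed, clear the flag so later s/// operations treat
# it as plain octets instead of tripping over malformed characters
sub fixup_non_utf8 {
	my ($buf) = @_;
	Encode::_utf8_off($buf) if utf8::is_utf8($buf) && !utf8::valid($buf);
	$buf;
}

# Latin-1 "caf\xe9" from an old commit with no recorded encoding
my $old = "caf\xe9";
Encode::_utf8_on($old); # simulate an unvalidated `:utf8' read
my $msg = fixup_non_utf8($old);
$msg =~ s/caf/CAF/; # safe now: operates on 4 plain octets
printf "flagged=%d length=%d\n", utf8::is_utf8($msg) ? 1 : 0, length($msg);
```

Valid UTF-8 input keeps its flag, so only the legacy commits fall back to raw octets.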

2 years ago README: add POP3 bits
Eric Wong [Mon, 20 Feb 2023 11:06:03 +0000 (11:06 +0000)] 
README: add POP3 bits

Maybe this can make our newish support of POP3 more
noticeable...

2 years ago searchidx: do not index quoted Base-85 patches
Eric Wong [Mon, 20 Feb 2023 09:21:50 +0000 (09:21 +0000)] 
searchidx: do not index quoted Base-85 patches

Base-85 binary patches were a source of false positives in results,
and we've filtered them out of non-quoted text since July 2022.
Unfortunately, people were quoting binary patch contents
in replies (*sigh*) and triggering false positives in search
results.  So we must filter out base-85-looking contents from
quoted text, too.
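A rough sketch of how quoted base-85 lines could be recognized, assuming hypothetical helper names and a simple per-line heuristic that ignores the `literal'/`delta' header lines (it is not the actual SearchIdx logic).  It uses git's base-85 alphabet and binary-patch line format: a length character (A-Z for 1-26 bytes, a-z for 27-52) followed by five characters per four bytes of data.  Checking that the length character agrees with the data length keeps short quoted words like "> Thanks" from matching.

```perl
use strict;
use warnings;

# one character of git's base-85 alphabet (see base85.c in git.git)
my $B85 = qr/[0-9A-Za-z!#\$%&()*+\-;<=>?\@^_`{|}~]/;

# hypothetical helper: does this line look like quoted binary patch data?
sub looks_like_quoted_base85 {
	my ($line) = @_;
	# one or more `>' quote markers, a length char, then base-85 data
	$line =~ /^(?:>[ \t]?)+([A-Za-z])((?:$B85)+)[ \t]*$/ or return 0;
	my ($len_chr, $data) = ($1, $2);
	my $nbytes = $len_chr le 'Z' ? ord($len_chr) - ord('A') + 1
				: ord($len_chr) - ord('a') + 27;
	# every 4 decoded bytes (rounded up) take 5 encoded characters
	length($data) == int(($nbytes + 3) / 4) * 5 ? 1 : 0;
}

# hypothetical helper: drop such lines before handing text to the indexer
sub drop_quoted_base85 {
	my ($text) = @_;
	join('', grep { !looks_like_quoted_base85($_) } split(/^/m, $text));
}
```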

Followup-to: 8fda04081acde705 (search: do not index base-85 binary patches, 2022-06-20)
Followup-to: 840785917bc74c8e (searchidx: skip "delta $N" sections for base-85, 2022-07-19)

2 years ago multi_git: do not set include.path if already set
Eric Wong [Mon, 20 Feb 2023 05:32:02 +0000 (05:32 +0000)] 
multi_git: do not set include.path if already set

The epoch may already be read-only, and we don't need to cause
more I/O traffic and disk wear for no-op stuff.  This fixes
idempotent use of public-inbox-clone to update multi-epoch
inboxes.

2 years ago git_async_cat: don't mis-abort replaced process
Eric Wong [Mon, 20 Feb 2023 08:19:43 +0000 (08:19 +0000)] 
git_async_cat: don't mis-abort replaced process

When a git process gets replaced (e.g. due to new
epochs/alternates), we must be careful and not abort the wrong
one.

I suspect this fixes the problem exacerbated by --batch-command.
It was theoretically possible w/o --batch-command, but
--batch-command seems to have made it surface more readily.

This should fix "Failed to retrieve generated blob" errors from
PublicInbox/ViewVCS.pm appearing in syslog.
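The guard can be sketched as an identity check, assuming a hypothetical per-repository registry of the current git process (not the actual GitAsyncCat code): a callback holds on to the process it was issued against and only aborts if that process is still the current one.

```perl
use strict;
use warnings;

# hypothetical registry: repo path => current git process object
my %current;

# start (or replace) the git process for a repo, e.g. after new
# epochs/alternates show up
sub start_proc {
	my ($repo) = @_;
	$current{$repo} = { repo => $repo, alive => 1 };
}

# abort $proc only if it is still the current process for $repo;
# a stale callback for a replaced process must not kill its successor
sub abort_if_current {
	my ($repo, $proc) = @_;
	my $cur = $current{$repo} or return 0;
	return 0 if $cur != $proc; # refs compare by address: different object
	$cur->{alive} = 0;
	delete $current{$repo};
	1;
}

my $old = start_proc('/path/to/git'); # hypothetical path
my $new = start_proc('/path/to/git'); # replaces $old
print abort_if_current('/path/to/git', $old), "\n"; # stale: ignored
print abort_if_current('/path/to/git', $new), "\n";
```

Shutting down the replaced process itself would be handled separately; the point is only that a late error from it cannot tear down its successor.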

Link: https://public-inbox.org/meta/20230209012932.M934961@dcvr/

2 years ago search: translate d: to dt: in query
Eric Wong [Sun, 19 Feb 2023 08:18:14 +0000 (08:18 +0000)] 
search: translate d: to dt: in query

dt: is higher resolution and the YYYYMMDD column will be dropped
if there's ever another SCHEMA_VERSION update.  While the
upcoming code repo index is independent of the mail schemas,
it'll use similar query prefixes and likely use d:/dt: for
Author Date of git commits.

2 years ago search: move query transform + enquire setup out of retry loop
Eric Wong [Fri, 17 Feb 2023 10:36:14 +0000 (10:36 +0000)] 
search: move query transform + enquire setup out of retry loop

The Xapian query transformation and Enquire object setup aren't
subject to MVCC and retries, so move them outside the retry loop
to save some cycles in case we need to retry on a busy DB.
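A minimal sketch of the restructuring, with hypothetical callbacks standing in for the real query setup and the MVCC-sensitive search (not the actual PublicInbox::Search code): setup runs once before the loop, and only the part that can hit a modified database is retried.

```perl
use strict;
use warnings;

# hypothetical: $prepare builds the transformed query and Enquire-like
# object; $run_query may die with a transient error on a busy DB
sub search_with_retry {
	my ($prepare, $run_query, $max_tries) = @_;
	my $enquire = $prepare->(); # hoisted: not redone on each retry
	for my $try (1 .. $max_tries) {
		my $res = eval { $run_query->($enquire) };
		return $res if !$@;
		die $@ if $try == $max_tries;
		# real code would reopen the database here before retrying
	}
	return;
}

my ($prep_calls, $runs) = (0, 0);
my $res = search_with_retry(
	sub { ++$prep_calls; 'enquire' },
	sub { ++$runs < 3 ? die("busy\n") : "hits for $_[0]" },
	5,
);
print "$res after $runs runs, $prep_calls setup\n";
```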

2 years ago public-inbox.cgi(1): Mention AllowEncodedSlashes for Apache setups
Uwe Kleine-König [Fri, 17 Feb 2023 11:08:50 +0000 (12:08 +0100)] 
public-inbox.cgi(1): Mention AllowEncodedSlashes for Apache setups

When AllowEncodedSlashes is Off (the default setting), requests for URLs
containing %2f are answered with a 404 error without calling the CGI.
To (hopefully) spare others from debugging this issue, add a hint with
the solution.

2 years ago TODO: handle more cases of unencoded slashes
Eric Wong [Fri, 17 Feb 2023 10:32:22 +0000 (10:32 +0000)] 
TODO: handle more cases of unencoded slashes

Nowadays, mutt defaults to Message-IDs with `/' in them :<

2 years ago Makefile.PL: drop update-copyrights rule
Eric Wong [Wed, 15 Feb 2023 08:01:12 +0000 (08:01 +0000)] 
Makefile.PL: drop update-copyrights rule

I'm no longer updating them since it's noisy and acceptable
to not have them:

  https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/

I'm tired of being reminded what year it is :<

2 years ago doc: extindex update on configuration and union section
Eric Wong [Wed, 15 Feb 2023 08:01:11 +0000 (08:01 +0000)] 
doc: extindex update on configuration and union section

The coderepo indexer will use similar ideas, I think...

2 years ago doc: flow: update with newer tools, note forkability
Eric Wong [Wed, 15 Feb 2023 08:01:10 +0000 (08:01 +0000)] 
doc: flow: update with newer tools, note forkability

public-inbox-{clone,fetch,netd} are all relatively new
developments which we can document here.

We'll also update the generator Makefile snippet since there may
be more Graph::Easy-based docs coming.

2 years ago doc: WWW + cgi: favor -netd over -httpd
Eric Wong [Wed, 15 Feb 2023 08:01:09 +0000 (08:01 +0000)] 
doc: WWW + cgi: favor -netd over -httpd

-netd is strictly more powerful and a gateway drug for
imapd/nntpd/pop3d instances :>

2 years ago www_coderepo: handle unborn/dead branches in summary
Eric Wong [Tue, 14 Feb 2023 13:17:39 +0000 (13:17 +0000)] 
www_coderepo: handle unborn/dead branches in summary

We need to account for `git log' showing nothing for invalid
branches and continue to render properly.  We'll also quiet down
`git log' so it doesn't clutter our stderr.

2 years ago www_coderepo: quiet 404s on Atom feeds for dead branches
Eric Wong [Tue, 14 Feb 2023 13:17:38 +0000 (13:17 +0000)] 
www_coderepo: quiet 404s on Atom feeds for dead branches

No need to clutter up logs when a request hits a dead branch.

2 years ago lei q: do not collapse threads with `-tt'
Eric Wong [Tue, 14 Feb 2023 02:42:32 +0000 (02:42 +0000)] 
lei q: do not collapse threads with `-tt'

While having Xapian collapse threads is an easy way to reduce
the amount of deduplication work we need to do when writing
out threads, we can't rely on it when using `lei q -tt' since
that needs to flag all hits.

Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://public-inbox.org/git/Y+pgBmj0jxR+cVkD@mail.gmail.com/