git.ipfire.org Git - thirdparty/public-inbox.git/log

Eric Wong [Tue, 21 Mar 2023 23:07:42 +0000 (23:07 +0000)]
cindex: respect existing permissions

For internal ($GIT_DIR/public-inbox-cindex) Xapian DBs, we can
rely on core.sharedRepository.  For external ones, we'll just
rely on existing permissions if the directory already exists.

Eric Wong [Tue, 21 Mar 2023 23:07:41 +0000 (23:07 +0000)] 
cindex: squelch incompatible options

Some options don't make sense when used together.

Eric Wong [Tue, 21 Mar 2023 23:07:40 +0000 (23:07 +0000)] 
cindex: implement reindex

For now, this allows changing --indexlevel; in the future, it will
let us fix yet-to-be-discovered bugs or make backwards-compatible
improvements.

Eric Wong [Tue, 21 Mar 2023 23:07:39 +0000 (23:07 +0000)] 
cindex: add support for --prune

This gets rid of both inaccessible commits AND repositories.
It will only unindex commits which are pruned in git, first,
so repos with auto GC disabled will need GC to prune them.

Eric Wong [Tue, 21 Mar 2023 23:07:38 +0000 (23:07 +0000)] 
cindex: filter out non-existent git directories

We'll just warn users about our non-existent prune support
for now, and implement --prune in the next commit.

Eric Wong [Tue, 21 Mar 2023 23:07:37 +0000 (23:07 +0000)] 
spawn: show failing directory for chdir failures

Our use of `git rev-parse --git-dir' depends on our (v)fork+exec
wrapper doing chdir, so the error message is required to avoid
user confusion.  I'm still avoiding `git -C $DIR' for now since
ancient versions of git did not support it.

Eric Wong [Tue, 21 Mar 2023 23:07:36 +0000 (23:07 +0000)] 
cindex: improve granularity of quit checks

This fixes shutdown handling when shard_index() isn't running
and ensures we can shut down the process more quickly.

Eric Wong [Tue, 21 Mar 2023 23:07:35 +0000 (23:07 +0000)] 
cindex: attempt to give oldest commits lowest docids

Monotonically increasing docids may help us avoid sorting output
for the web and CLI, since recent commits are generally the most
desired search results.

`git log --reverse' incurs no extra overhead in this case, since
`--stdin' will mean git buffers the commit list in memory before
attempting to emit anything.

Eric Wong [Tue, 21 Mar 2023 23:07:34 +0000 (23:07 +0000)] 
cindex: truncate or drop body for over-sized commits

We need to get at least the commit OID indexed to
avoid redundant work.

Eric Wong [Tue, 21 Mar 2023 23:07:33 +0000 (23:07 +0000)] 
cindex: check for checkpoint before giant messages

Giant messages may put us far over the batch limit if we're
close to it.

Eric Wong [Tue, 21 Mar 2023 23:07:32 +0000 (23:07 +0000)] 
cindex: implement --max-size=SIZE

This matches existing behavior of -index and -extindex, and
will hopefully allow me to avoid OOM problems by skipping
problematic commits.

Eric Wong [Tue, 21 Mar 2023 23:07:31 +0000 (23:07 +0000)] 
sigfd: pass signal name rather than number to callback

This is consistent with normal Perl %SIG handlers, and allows
-cindex signal handlers to be implemented consistently across
platforms.
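
As a minimal sketch of the convention described above (not the actual
sigfd code), a handler that receives the signal name can be shared
across several signals, just like a plain %SIG handler.  The names
`on_signal', `install_handlers' and `seen_count' are hypothetical:

```perl
use strict;
use warnings;

my %seen; # signal name => times handled (illustration only)

# Perl passes the signal *name* (e.g. "TERM"), not its number,
# so one handler can serve many signals without a number lookup.
sub on_signal {
    my ($name) = @_;
    $seen{$name}++;
    return $name eq 'USR1' ? 'reload' : 'quit';
}

sub seen_count { my ($name) = @_; $seen{$name} // 0 }

# Install the same named sub for several signals via %SIG.
sub install_handlers {
    $SIG{$_} = \&on_signal for qw(TERM INT QUIT USR1);
}
```

A callback invoked with the name can be tested by calling it directly,
without delivering a real signal.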

Eric Wong [Tue, 21 Mar 2023 23:07:30 +0000 (23:07 +0000)] 
cindex: handle graceful shutdown by default

While individual Xapian shards are consistent due to the use of
Xapian transactions, the data across shards still needs to be
in a consistent state for our search to work.

Eric Wong [Tue, 21 Mar 2023 23:07:29 +0000 (23:07 +0000)] 
cindex: drop `unchanged' progress message

It's too noisy, and a similar message isn't emitted by -clone.

Eric Wong [Tue, 21 Mar 2023 23:07:28 +0000 (23:07 +0000)] 
cindex: show shard number in progress message

Otherwise it may be confusing to see the `$nr' value walk
backwards if some shards are indexing at a slower pace.

Eric Wong [Tue, 21 Mar 2023 23:07:27 +0000 (23:07 +0000)] 
cindex: implement --exclude= like -clone

This is to ensure we can exclude certain repos which are
expensive-to-index (e.g. `**/deps.git', `**/transparency-logs/**').

Eric Wong [Tue, 21 Mar 2023 23:07:26 +0000 (23:07 +0000)] 
ds: @post_loop_do replaces SetPostLoopCallback

This allows us to avoid repeatedly using memory-intensive
anonymous subs in CodeSearchIdx where the callback is assigned
frequently.  Anonymous subs are known to leak memory in old
Perls (e.g. 5.16.3 in enterprise distros) and are still expensive
in newer Perls.  So favor the (\&subroutine, @args) form which
allows us to eliminate anonymous subs going forward.

Only CodeSearchIdx takes advantage of the new API at the moment,
since it's the biggest repeat user of post-loop callback
changes.

Getting rid of the subroutine and relying on a global `our'
variable also has two advantages:

1) Perl warnings can detect typos at compile-time, whereas the
   (now gone) method could only detect errors at run-time.

2) `our' variable assignment can be `local'-ized to a scope
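
The two points above can be sketched as follows.  This is only an
illustration under assumed names: `post_loop_ok', `pending_left' and
`drain' are hypothetical, and the real event loop in PublicInbox::DS is
not reproduced here.

```perl
use strict;
use warnings;

# Stand-in for the package-global described above: a code ref
# followed by its arguments, instead of a freshly allocated closure.
our @post_loop_do;

# Called by an (imaginary) event loop after each iteration;
# returns true to keep looping when no callback is set.
sub post_loop_ok {
    my ($cb, @args) = @post_loop_do;
    return $cb ? $cb->(@args) : 1;
}

# A named sub: nothing anonymous is created per assignment.
sub pending_left { my ($queue) = @_; scalar(@$queue) > 0 }

sub drain {
    my ($queue) = @_;
    # `local' restores the previous value when this scope exits
    local @post_loop_do = (\&pending_left, $queue);
    my $iterations = 0;
    while (post_loop_ok()) {
        shift @$queue;
        $iterations++;
    }
    return $iterations;
}
```

With `use strict', misspelling @post_loop_do is a compile-time error,
whereas a misspelled method name would only fail when called.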

Eric Wong [Tue, 21 Mar 2023 23:07:25 +0000 (23:07 +0000)] 
cindex: use DS and workqueues for parallelism

This avoids forking new shard processes for each repo we scan,
but we can't avoid many excessive commits since we need to
ensure the `seen()' sub can avoid excessive work.

Eric Wong [Tue, 21 Mar 2023 23:07:24 +0000 (23:07 +0000)] 
searchidxshard: improve comment wording

Just something I noticed while considering using this package
for CodeSearchIdx.

Eric Wong [Tue, 21 Mar 2023 23:07:23 +0000 (23:07 +0000)] 
cindex: use read-only shards during prep phases

No need to open shards for read/write access when read-only
will do.  Since we also control how a document gets sharded,
we'll also access the shard directly instead of letting Xapian
do the mappings.

--reindex didn't work properly before this change since it was
over-indexing.  It is now broken in the opposite way in that it
doesn't do reindexing at all.  --reindex will be implemented
properly in the future.

Eric Wong [Tue, 21 Mar 2023 23:07:22 +0000 (23:07 +0000)] 
cindex: parallelize prep phases

Listing refs, fingerprinting and root scanning can all be
parallelized to reduce runtime on SMP systems.

We'll use DESTROY-based dependency management with
parallelization as in LeiMirror to handle ref listing and
fingerprinting before serializing Xapian DB access to check
against the existing fingerprint.

We'll also delay root listing until we get a fingerprint
mismatch to speed up no-op indexing.

Eric Wong [Tue, 21 Mar 2023 23:07:21 +0000 (23:07 +0000)] 
codesearch: initial cut w/ -cindex tool

It seems relying on root commits is a reasonable way to
deduplicate and handle repositories with common history.

I initially wanted to shoehorn this into extindex, but decided a
separate Xapian index layout capable of being EITHER external to
handle many forks or internal (in $GIT_DIR/public-inbox-cindex)
for small projects is the right way to go.

Unlike most existing parts of public-inbox, this relies on
absolute paths of $GIT_DIR stored in the Xapian DB and does not
rely on the config file.  We'll be relying on the config file to
map absolute paths to public URL paths for WWW.

Eric Wong [Tue, 21 Mar 2023 23:07:20 +0000 (23:07 +0000)] 
test_common: create_inbox: use `$!' properly on mkdir failure

stat(2) may fail and set `$!', too, so we must stash it, first.
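
A minimal sketch of the pattern, assuming a hypothetical helper named
`mkdir_or_err' (this is not the create_inbox code itself): the error
is copied out of `$!' before any other syscall gets a chance to
overwrite it.

```perl
use strict;
use warnings;

# Returns an error string on failure, or undef on success
# (including the case where the directory already exists).
sub mkdir_or_err {
    my ($dir) = @_;
    return undef if mkdir($dir);
    my $err = "$!"; # stash first: later syscalls may clobber $!
    return undef if -d $dir; # this stat(2) may itself set $!
    return "mkdir($dir): $err";
}
```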

Eric Wong [Tue, 21 Mar 2023 23:07:19 +0000 (23:07 +0000)] 
admin: ensure resolved GIT_DIR is absolute

We'll also support the $base arg of File::Spec->rel2abs
since it should make codesearch indexing easier.
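
For illustration, File::Spec->rel2abs takes an optional second
argument which is used as the base instead of the current directory.
The wrapper name `abs_git_dir' and the paths in the usage note are
hypothetical:

```perl
use strict;
use warnings;
use File::Spec;

# Resolve a possibly-relative git directory against an optional base.
# With no $base, rel2abs falls back to the current working directory.
sub abs_git_dir {
    my ($dir, $base) = @_;
    return File::Spec->rel2abs($dir, $base);
}
```

On a Unix-like system, abs_git_dir('foo.git', '/srv/git') yields
`/srv/git/foo.git', while an already-absolute $dir is returned
without the base being applied.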

Eric Wong [Tue, 21 Mar 2023 23:07:18 +0000 (23:07 +0000)] 
admin: hoist out resolve_git_dir

We'll be using this for indexing git coderepos, and
switch to Perl 5.12 while we're at it since unicode_strings
doesn't affect this package.

Eric Wong [Tue, 21 Mar 2023 23:07:17 +0000 (23:07 +0000)] 
search: relocate all_terms from lei_search

This will be used for code_search, too.

Eric Wong [Tue, 21 Mar 2023 23:07:16 +0000 (23:07 +0000)] 
ipc: move nproc_shards from v2writable

We'll be using nproc_shards for indexing non-Inbox stuff.

Eric Wong [Sat, 18 Mar 2023 12:02:13 +0000 (12:02 +0000)] 
clone: support --purge to delete remotely-deleted repos

This lets us clean up disk space when repos are removed
on the remote side.

Eric Wong [Sat, 18 Mar 2023 12:02:12 +0000 (12:02 +0000)] 
clone: show stale directories unconditionally

--project-list= is no longer required to show stale
repositories.

Eric Wong [Sat, 18 Mar 2023 12:02:11 +0000 (12:02 +0000)] 
doc: clone: note the default value of --remote-manifest=

It may not be immediately obvious to users unfamiliar with
grokmirror.

Eric Wong [Fri, 17 Mar 2023 20:31:37 +0000 (20:31 +0000)] 
config: glob2re supports `**' to match multiple path components

This should match the behavior documented in gitglossary(7).
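
To make the `**' semantics concrete, here is a simplified glob-to-regexp
converter.  It is NOT PublicInbox::Config::glob2re: the name
`glob_to_re', the full anchoring, and the choice that a single `*'
never crosses `/' are assumptions of this sketch.

```perl
use strict;
use warnings;

# Convert a glob into an anchored regexp:
#   `**/' => zero or more leading path components
#   `**'  => anything, including slashes
#   `*'   => anything except a slash
#   `?'   => one character except a slash
sub glob_to_re {
    my ($glob) = @_;
    my $re = '';
    while ($glob =~ m{\G(\*\*/|\*\*|\*|\?|[^*?]+)}gc) {
        my $tok = $1;
        if ($tok eq '**/') { $re .= '(?:.*/)?' }
        elsif ($tok eq '**') { $re .= '.*' }
        elsif ($tok eq '*') { $re .= '[^/]*' }
        elsif ($tok eq '?') { $re .= '[^/]' }
        else { $re .= quotemeta($tok) } # literal text
    }
    return qr/\A$re\z/;
}
```

Under these assumptions, `**/deps.git' matches both `deps.git' and
`a/b/deps.git', while `*.git' does not match `a/b.git'.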

Eric Wong [Fri, 17 Mar 2023 20:31:36 +0000 (20:31 +0000)] 
treewide: move glob2re to PublicInbox::Config

It seems suitable for the config class since globs are a
config/option thing.

Eric Wong [Wed, 15 Mar 2023 21:47:56 +0000 (21:47 +0000)] 
ds: reap_pids: remove redundant signal blocking

Blocking signals when reaping was done when the lei pager was
spawned by the daemon in b90e8d6e02.  Shortly afterwards in
7b79c918a5, the client script took over spawning of the pager
and made b90e8d6e02 redundant.

cf. b90e8d6e02 (ds: block signals when reaping, 2021-01-10)
    7b79c918a5 (lei: run pager in client script, 2021-01-10)

Eric Wong [Mon, 13 Mar 2023 20:17:14 +0000 (20:17 +0000)] 
t/solver_git: squelch non-UTF-8 commit warning

We're making an ISO-8859-1 commit for testing purposes, so use
i18n.commitEncoding to shut git up.

Eric Wong [Mon, 13 Mar 2023 19:38:27 +0000 (19:38 +0000)] 
watch: add space before "UID" or "ARTICLE" in warnings

In other words, it now shows `imap://example.com/INBOX.foo UID:123'
instead of: `imap://example.com/INBOX.fooUID:123'

Eric Wong [Mon, 13 Mar 2023 19:38:26 +0000 (19:38 +0000)] 
spamcheck: use v5.12 and golf

No problems with `unicode_strings' in these modules.  We can
also shave our LoC count in a few places.

Eric Wong [Mon, 13 Mar 2023 19:38:25 +0000 (19:38 +0000)] 
use v5.12 for various network client-side packages

None of these are affected by the Perl unicode_strings feature,
so they can `use v5.12' safely.

Eric Wong [Tue, 14 Mar 2023 20:48:19 +0000 (20:48 +0000)] 
doc: clone: fix typo in --remote-manifest= description

Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v8j4ql8k.fsf@kyleam.com/

Eric Wong [Mon, 13 Mar 2023 12:00:24 +0000 (12:00 +0000)]
doc: clone: document --remote-manifest= option

Eric Wong [Mon, 13 Mar 2023 12:00:23 +0000 (12:00 +0000)] 
lei_mirror: handle UTF-8 from manifest.js.gz properly

This should ensure we display the "git config gitweb.owner
$OWNER" command invocation properly and also ensures we set the
description properly without triggering wide character warnings.

Also tested with a smallish iproute2 repo
(/pub/scm/linux/kernel/git/toke/iproute2.git) using my mirror:

  public-inbox-clone --remote-manifest=pub/manifest.js.gz \
    --include='*/toke/iproute2.git' --inbox-config=never \
    https://80x24.org/lore $DST

Anyways, I'm fairly certain this change and its tests are
correct; but I still struggle to understand Perl's approach to
Unicode and its interactions with various JSON implementations.

Fixes: 0830817c132cb105 ("lei_mirror: show non-ASCII owner properly w/ --verbose")

Eric Wong [Mon, 13 Mar 2023 12:00:22 +0000 (12:00 +0000)]
lei_mirror: do not fetch to read-only directories

As with public-inbox-fetch, we shouldn't waste time fetching
into read-only directories, since --epoch= will make unwanted
epoch directories read-only placeholders.

Eric Wong [Mon, 13 Mar 2023 12:00:21 +0000 (12:00 +0000)] 
lei_mirror: do not re-fetch inbox.config.example

It's a significant source of latency for incremental updates at
the moment, and not really needed since it's just an example.

Eric Wong [Mon, 13 Mar 2023 12:00:20 +0000 (12:00 +0000)] 
lei_mirror: describe why the {ibx} field is used

I forgot why that hunk of code was needed :x, so maybe others
will find the comment helpful, too.

Eric Wong [Mon, 13 Mar 2023 11:58:11 +0000 (11:58 +0000)] 
t/solver_git: fix when Plack::Middleware::ReverseProxy is missing

We need to ignore the harmless warnings in stderr when
Plack::Middleware::ReverseProxy isn't installed.

Eric Wong [Sat, 11 Mar 2023 17:36:00 +0000 (17:36 +0000)] 
lei_dedupe: simplify smsg_hash sub

We can just use the sha256() sub instead of dealing with the
OO interface for a small string.
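
For comparison, both Digest::SHA interfaces produce the same digest;
the functional form is simply less code for a small string.  The two
wrapper names below are hypothetical and only exist for this
illustration:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256);

# Functional form: one call, no object for small inputs
sub digest_functional { my ($str) = @_; sha256($str) }

# OO form: mainly worthwhile when hashing data incrementally
sub digest_oo {
    my ($str) = @_;
    my $dig = Digest::SHA->new(256);
    $dig->add($str);
    return $dig->digest;
}
```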

Eric Wong [Thu, 9 Mar 2023 19:28:42 +0000 (19:28 +0000)] 
doc: 2.0.0 release notes update

Did some stuff, still a ton of stuff to do :x

Eric Wong [Thu, 9 Mar 2023 19:28:41 +0000 (19:28 +0000)] 
doc: lei config: update with --edit and --list examples

With git, I typically use --edit/-e to make changes and --list/-l
to view them; same with lei.

Eric Wong [Thu, 9 Mar 2023 19:28:40 +0000 (19:28 +0000)] 
doc: lei import: add hints about nntp.* and imap.* config options

I'm setting up more imports and forgot about them :x

Eric Wong [Thu, 9 Mar 2023 19:28:39 +0000 (19:28 +0000)] 
doc: technical: document weird stuff in our codebase

Hopefully this makes things less surprising to new hackers.

Eric Wong [Thu, 9 Mar 2023 19:28:38 +0000 (19:28 +0000)] 
doc: technical/ds: update blurb to note more daemons

And add a note about the various wakeup modes of kqueue|epoll
while we're at it; we use all of them!

Eric Wong [Thu, 9 Mar 2023 19:28:37 +0000 (19:28 +0000)] 
doc: technical/memory: add note about mwrap-perl

It's already fixed memory usage problems not only in our codebase,
but also the standard `Encode' XS module and `git pack-objects'.

Eric Wong [Wed, 8 Mar 2023 11:02:58 +0000 (11:02 +0000)] 
lei_mirror: unlink FETCH_HEAD when fetching forkgroups

Apparently, --no-write-fetch-head is broken in current git[1].
It also didn't exist in older git at all.  So just unlink FETCH_HEAD
as we see it, but keep using --no-write-fetch-head to avoid the
syscall and I/O overhead when we can.

[1] https://yhbt.net/lore/git/20230308100438.908471-1-e@80x24.org/

Eric Wong [Tue, 7 Mar 2023 09:54:15 +0000 (09:54 +0000)] 
test_common: run_script: drop special-case for -clone

`make check' and `make check-run' actually work fine with it,
and `TMPDIR=/dev/shm prove -lvw t/clone-coderepo.t' is 2-3x faster.

Eric Wong [Tue, 7 Mar 2023 09:32:37 +0000 (09:32 +0000)] 
cgit: fix smart HTTP clone interception

We need to use the proper hash and key to do coderepo lookups
since we culled a redundant data structure a few months back.

Fixes: 1802dc29bda25a54 ("www_coderepo: do not copy {-code_repos} from config")

Eric Wong [Tue, 7 Mar 2023 08:47:15 +0000 (08:47 +0000)]
sha: fix compatibility with old OpenSSL + Net::SSLeay

In older OpenSSL, EVP_get_digestbyname() didn't work properly
without calling OpenSSL_add_all_digests(), first.  However,
OpenSSL_add_all_digests() is deprecated by OpenSSL 1.1.0 in
favor of OPENSSL_init_crypto().  Of course, OPENSSL_init_crypto()
isn't available in OpenSSL 1.0.1k nor Net::SSLeay as of 1.93_02
(2023-02-22).

Thus, instead of relying on string lookups and conditional
subroutine calls, just call EVP_sha1() and EVP_sha256() which
work on both old and new systems.

Tested with Net::SSLeay 1.55 and OpenSSL 1.0.1k on CentOS 7.x.

Eric Wong [Sun, 5 Mar 2023 22:18:11 +0000 (22:18 +0000)] 
doc: update public-inbox-clone examples and help

Basically, public-inbox-clone has become grok-pull without
config files nor absolute paths.

Eric Wong [Thu, 2 Mar 2023 00:13:14 +0000 (00:13 +0000)] 
doc: drop hosted.txt

I'll have to downsize the server due to increased hosting costs,
so stop advertising these mirrors.

The inboxes still exist for now, but will probably be proxied
behind an ssh tunnel via a slow DSL connection, so it's not worth
driving more traffic to them.

Eric Wong [Mon, 27 Feb 2023 10:21:05 +0000 (10:21 +0000)] 
doc: update clone+fetch with 2.0+ switches

Because old versions will exist for a long time and our latest
documentation is visible on the web, we must document when a
switch appears to avoid confusing users of old versions.

Eric Wong [Mon, 27 Feb 2023 07:18:34 +0000 (07:18 +0000)] 
process_pipe: BINMODE: pass LAYER argument

We'll end up using this to handle `:utf8', probably.

Eric Wong [Sun, 26 Feb 2023 17:15:06 +0000 (17:15 +0000)] 
doc: note "lei q -tt" is broken with HTTP(S) remotes

I'm still trying to decide how to handle HTTP(S) remotes
properly...

Link: https://public-inbox.org/meta/20230226170931.M947721@dcvr/

Eric Wong [Fri, 24 Feb 2023 16:59:10 +0000 (16:59 +0000)]
ds: write: do not assume final wbuf entry is tmpio

The final entry of {wbuf} may be a CODE ref and not a
tmpio ARRAY ref, so we must ensure it's an ARRAY before
attempting to use `->[INDEX]' to access it.

This fixes:
  forward ->close error: Not an ARRAY reference at PublicInbox/DS.pm line 544.
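
The check can be sketched like this.  The buffer layout is simplified
and the helper name `last_tmpio' is hypothetical; only the use of
ref() before dereferencing reflects the description above.

```perl
use strict;
use warnings;

# Simplified write buffer: entries are either ARRAY refs
# (standing in for tmpio entries) or CODE refs (callbacks).
sub last_tmpio {
    my ($wbuf) = @_;
    my $last = $wbuf->[-1];
    # ref() avoids "Not an ARRAY reference" on CODE entries
    return ref($last) eq 'ARRAY' ? $last : undef;
}
```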

Eric Wong [Wed, 22 Feb 2023 18:17:39 +0000 (18:17 +0000)] 
examples: remove `Standard{Error,Output} = syslog' lines

systemd (247.3-7+deb11u1 on Debian 11.x) considers them "obsolete" and
emits the following to my syslog:

  Standard output type syslog is obsolete, automatically updating to journal.
  Please update your unit file, and consider removing the setting altogether.

So we'll remove it altogether, as I'm sticking with rsyslog for now.

Eric Wong [Wed, 22 Feb 2023 17:25:55 +0000 (17:25 +0000)] 
treewide: simplify File::Path mkpath/make_path callers

File::Path already accounts for the existence of directories,
handles races from redundant mkdir(2), and croaks on
unrecoverable errors.  So there's no point in doing any
of that on our end.

Furthermore, avoiding the overhead of loading File::Path doesn't
seem worth it to save 20-60ms given the overhead of loading
our other code.  Instead, try to reduce optree overhead in
our own code, since File::Path gets used in a bunch of
places.

We'll also favor the newer make_path for multi-directory
invocations to avoid bloating our own optree to create an
arrayref, but mkpath is one fewer subroutine call within
File::Path itself, right now.
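
A short sketch of the simplified calling style, using a hypothetical
wrapper named `ensure_dirs': no `-d' pre-checks and no error handling
are wrapped around make_path, since existing directories are accepted
and unrecoverable errors croak by default.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Create every directory in @dirs (and missing parents).
# Returns the list of directories which were actually created.
sub ensure_dirs {
    my (@dirs) = @_;
    return make_path(@dirs);
}
```

Calling it a second time with the same arguments creates nothing and
returns an empty list.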

Eric Wong [Wed, 22 Feb 2023 17:25:52 +0000 (17:25 +0000)] 
sendmsg: prefix sleep message with `#'

It's an informative message that's harmless, so hopefully
the `#' prefix puts the user's mind at ease.

(I saw it on an `lei import' against an IMAP source)

Eric Wong [Tue, 21 Feb 2023 12:17:44 +0000 (12:17 +0000)] 
lei_mirror: support --remote-manifest=URL

Since PublicInbox::WWW already generates manifest.js.gz, I'm
using an alternate path with PublicInbox::WwwStatic to host the
manifest.js.gz for coderepos at an alternate location.  The
following snippet lets me host
https://yhbt.net/lore/pub/manifest.js.gz for mirrored git
repositories, while https://yhbt.net/lore/manifest.js.gz
(no `pub') remains for inbox mirroring.

==> sample.psgi <==
use PublicInbox::WWW;
use PublicInbox::WwwStatic;
use Plack::Builder; # provides builder, enable and mount
my $www = PublicInbox::WWW->new; # use default PI_CONFIG
my $st = PublicInbox::WwwStatic->new(docroot => '/path/to/code');
my $www_cb = sub {
	my ($env) = @_;
	if ($env->{PATH_INFO} eq '/pub/manifest.js.gz') {
		# serve the coderepo manifest from the static docroot
		local $env->{PATH_INFO} = '/manifest.js.gz';
		my $res = $st->call($env);
		return $res if $res->[0] != 404;
	}
	$www->call($env); # everything else goes to the inbox WWW
};
builder {
	enable 'ReverseProxy';
	enable 'Head';
	mount '/lore' => $www_cb;
}

Eric Wong [Tue, 21 Feb 2023 11:17:58 +0000 (11:17 +0000)] 
viewvcs: handle non-UTF-8 commit message

Back in the old days, git didn't store commit encodings
and allowed messages in various encodings to enter history.
Assuming such a commit is UTF-8 trips up s/// operations
on buffers read with the `:utf8' PerlIO layer.  So clear
Perl's internal UTF-8 flag if we end up with something
which isn't valid UTF-8.

An example is commit 7eb93c89651c47c8095d476251f2e4314656b292
in git.git ([PATCH] Simplify git script, 2005-09-07)
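
A related defensive pattern, shown here only as a sketch and not as
the ViewVCS change itself: read the message as raw bytes and decode it
only when it really is valid UTF-8.  The helper name `decode_if_utf8'
is hypothetical.

```perl
use strict;
use warnings;

# Takes raw bytes (e.g. a commit message read without `:utf8').
# Returns (string, is_utf8): decoded characters when the bytes are
# valid UTF-8, otherwise the untouched bytes.
sub decode_if_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    # utf8::decode works in place and returns false on invalid input
    my $ok = utf8::decode($copy);
    return $ok ? ($copy, 1) : ($bytes, 0);
}
```

With this approach, an ISO-8859-1 message stays a byte string, so
later s/// operations never see a falsely flagged buffer.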

Eric Wong [Mon, 20 Feb 2023 11:06:03 +0000 (11:06 +0000)] 
README: add POP3 bits

Maybe this can make our newish support of POP3 more
noticeable...

Eric Wong [Mon, 20 Feb 2023 09:21:50 +0000 (09:21 +0000)] 
searchidx: do not index quoted Base-85 patches

Base-85 binary patches were a source of false positives in results,
and we've filtered them out of non-quoted text since July 2022.
Unfortunately, people were quoting binary patch contents
in replies (*sigh*) and triggering false positives in search
results.  So we must filter out base-85-looking contents from
quoted text, too.

Followup-to: 8fda04081acde705 (search: do not index base-85 binary patches, 2022-06-20)
Followup-to: 840785917bc74c8e (searchidx: skip "delta $N" sections for base-85, 2022-07-19)

Eric Wong [Mon, 20 Feb 2023 05:32:02 +0000 (05:32 +0000)]
multi_git: do not set include.path if already set

The epoch may already be read-only, and we don't need to cause
more I/O traffic and disk wear for no-op stuff.  This fixes
idempotent use of public-inbox-clone to update multi-epoch
inboxes.

Eric Wong [Mon, 20 Feb 2023 08:19:43 +0000 (08:19 +0000)] 
git_async_cat: don't mis-abort replaced process

When a git process gets replaced (e.g. due to new
epochs/alternates), we must be careful and not abort the wrong
one.

I suspect this fixes the problem exacerbated by --batch-command.
It was theoretically possible w/o --batch-command, but it seems
to have made it surface more readily.

This should fix "Failed to retrieve generated blob" errors from
PublicInbox/ViewVCS.pm appearing in syslog

Link: https://public-inbox.org/meta/20230209012932.M934961@dcvr/

Eric Wong [Sun, 19 Feb 2023 08:18:14 +0000 (08:18 +0000)]
search: translate d: to dt: in query

dt: is higher resolution and the YYYYMMDD column will be dropped
if there's ever another SCHEMA_VERSION update.  While the
upcoming code repo index is independent of the mail schemas,
it'll use similar query prefixes and likely use d:/dt: for
Author Date of git commits.

Eric Wong [Fri, 17 Feb 2023 10:36:14 +0000 (10:36 +0000)] 
search: move query transform + enquire setup out of retry loop

The Xapian query transformation and Enquire object setup aren't
subject to MVCC and retries, so move it outside the retry loop
to save some cycles in case we need to retry on a busy DB.

Uwe Kleine-König [Fri, 17 Feb 2023 11:08:50 +0000 (12:08 +0100)] 
public-inbox.cgi(1): Mention AllowEncodedSlashes for Apache setups

When AllowEncodedSlashes is Off (the default setting), requests for
URLs containing %2f are answered with a 404 error without the CGI
being called.  To (maybe) spare others from debugging this issue, add
a hint with the solution.

Eric Wong [Fri, 17 Feb 2023 10:32:22 +0000 (10:32 +0000)] 
TODO: handle more cases of unencoded slashes

Nowadays, mutt defaults to Message-IDs with `/' in them :<

Eric Wong [Wed, 15 Feb 2023 08:01:12 +0000 (08:01 +0000)] 
Makefile.PL: drop update-copyrights rule

I'm no longer updating them since it's noisy and acceptable
to not have them:

  https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/

I'm tired of being reminded what year it is :<

Eric Wong [Wed, 15 Feb 2023 08:01:11 +0000 (08:01 +0000)] 
doc: extindex update on configuration and union section

The coderepo indexer will use similar ideas, I think...

Eric Wong [Wed, 15 Feb 2023 08:01:10 +0000 (08:01 +0000)] 
doc: flow: update with newer tools, note forkability

public-inbox-{clone,fetch,netd} are all relatively new
developments which we can document, here.

We'll also update the generator Makefile snippet since there may
be more Graph::Easy-based docs coming.

Eric Wong [Wed, 15 Feb 2023 08:01:09 +0000 (08:01 +0000)] 
doc: WWW + cgi: favor -netd over -httpd

-netd is strictly more powerful and a gateway drug for
imapd/nntpd/pop3d instances :>

Eric Wong [Tue, 14 Feb 2023 13:17:39 +0000 (13:17 +0000)] 
www_coderepo: handle unborn/dead branches in summary

We need to account for `git log' showing nothing for invalid
branches and continue to render properly.  We'll also silence the
stderr of `git log' to avoid cluttering our own stderr.

Eric Wong [Tue, 14 Feb 2023 13:17:38 +0000 (13:17 +0000)] 
www_coderepo: quiet 404s on Atom feeds for dead branches

No need to clutter up logs when a request hits a dead branch.

Eric Wong [Tue, 14 Feb 2023 02:42:32 +0000 (02:42 +0000)] 
lei q: do not collapse threads with `-tt'

While having Xapian collapse threads is an easy way to reduce
the amount of deduplication work we need to do when writing
out threads, we can't rely on it when using `lei q -tt' since
that needs to flag all hits.

Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://public-inbox.org/git/Y+pgBmj0jxR+cVkD@mail.gmail.com/

Eric Wong [Mon, 13 Feb 2023 01:02:12 +0000 (01:02 +0000)]
imap: quiet Parse::RecDescent errors on bad search queries

Parse::RecDescent emits giant errors to STDERR by default
(bypassing $SIG{__WARN__}, even).  Shut it up since there's
no good way to pass those back to a client, and we don't want
clients flooding logs with bogus requests.

Eric Wong [Sun, 12 Feb 2023 23:18:28 +0000 (23:18 +0000)] 
lei_mirror: fetch most-recently-updated repos, first

Within the same forkgroup, we can assume the most recently updated
repo has the most data, so fetch those, first.  We'll save new clones
for last since we can preserve {reference} ordering for them.

Eric Wong [Sun, 12 Feb 2023 23:18:27 +0000 (23:18 +0000)] 
lei_mirror: further reduce `git config' calls

We can parse the config at once and avoid clobbering variables
which do not need changing.  We'll also do some prep work for
fetch.hideRefs proposal being discussed at
<https://public-inbox.org/git/20230209122857.M669733@dcvr/>

Eric Wong [Sun, 12 Feb 2023 03:12:03 +0000 (03:12 +0000)] 
t/lei-refresh-mail-sync: avoid kill+sleep loop

While we can't waitpid() on a daemonized process, we can abuse the
lack of FD_CLOEXEC to detect a process death.  This saves
roughly 400ms for this slow test.
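
The underlying idea can be sketched with a plain pipe: a reader sees
EOF only once every process holding the write end has exited, so no
kill+sleep polling is needed.  This simplified version uses a direct
child via fork (so waitpid still works); in the test described above,
the descriptor instead has to survive exec in a daemon, which is where
the lack of FD_CLOEXEC matters.  The name `run_and_await_eof' is
hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Runs $child_cb in a child process and returns true once the
# child (the only remaining writer) has exited.
sub run_and_await_eof {
    my ($child_cb) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) { # child: keeps $w open until it exits
        close $r;
        $child_cb->();
        POSIX::_exit(0);
    }
    close $w; # parent must drop its copy or EOF never arrives
    my $n = sysread($r, my $buf, 1); # blocks; 0 means EOF
    waitpid($pid, 0); # only possible because it is our child
    return defined($n) && $n == 0;
}
```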

Eric Wong [Fri, 10 Feb 2023 08:56:41 +0000 (08:56 +0000)] 
git_async_cat: use awaitpid

While awaitpid already registered a no-op callback in
_bidi_pipe, we can still call it again when registering it into
our event loop to ensure EPOLL_CTL_DEL fires.

Eric Wong [Fri, 10 Feb 2023 03:58:52 +0000 (03:58 +0000)] 
lei_mirror: avoid dir/file conflicts in update-ref

Using the files ref backend for git, `delete' and `create'
operations for `update-ref --stdin' need to be processed in
separate transactions to avoid conflicts in cases where a file
becomes a directory (or presumably, vice versa).
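
As a rough sketch of the split, the helper below (hypothetical name
`update_ref_batches') partitions ref changes into two inputs, each
intended for its own `git update-ref --stdin' invocation.  Running the
deletes first is an assumption of this sketch; the commit only states
that the two kinds of operation need separate transactions.

```perl
use strict;
use warnings;

# $changes: list of [ 'delete', $ref ] or [ 'create', $ref, $oid ].
# Returns (delete_input, create_input) as two separate strings.
sub update_ref_batches {
    my ($changes) = @_;
    my ($del, $add) = ('', '');
    for my $c (@$changes) {
        my ($op, $ref, $oid) = @$c;
        if ($op eq 'delete') {
            $del .= "delete $ref\n";
        } elsif ($op eq 'create') {
            $add .= "create $ref $oid\n";
        } else {
            die "unsupported op: $op";
        }
    }
    return ($del, $add);
}
```

Deleting `refs/heads/foo' in one run and creating `refs/heads/foo/bar'
in the next keeps the file and the directory of the same name from
colliding within a single transaction.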

Eric Wong [Thu, 9 Feb 2023 21:53:20 +0000 (21:53 +0000)] 
spawn_pp: fix incorrect `use'

We can't `use PublicInbox::Spawn' from SpawnPP because
PublicInbox::Spawn loads SpawnPP from BEGIN.

Fixes: 9eb8baf199cd148b (spawn_pp: use `which()' properly for pure-Perl spawn, 2023-01-29)

Eric Wong [Thu, 9 Feb 2023 12:30:59 +0000 (12:30 +0000)]
lei_mirror: show non-ASCII owner properly w/ --verbose

This makes the verbose progress output look nicer, but doesn't
affect the actual config file generation.

2 years ago lei_mirror: reduce `git config' usage
Eric Wong [Mon, 6 Feb 2023 05:56:35 +0000 (05:56 +0000)] 
lei_mirror: reduce `git config' usage

We can use `git -c $KEY=$VAL fetch' with a random remote name
that never makes it to a config file.
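
A hedged Python sketch of building such a command line (not the Perl in
lei_mirror).  The `tmp-' prefix and the default refspec are assumptions
made for illustration only.

```python
import secrets


def fetch_argv(url, refspec="+refs/*:refs/*"):
    """Build a `git fetch' argv that uses a throwaway remote via -c."""
    remote = "tmp-" + secrets.token_hex(8)  # hypothetical naming scheme
    return [
        "git",
        "-c", f"remote.{remote}.url={url}",
        "-c", f"remote.{remote}.fetch={refspec}",
        "fetch", remote,
    ]
```

Since the remote exists only on the command line, nothing has to be
written to (or cleaned up from) the config file afterwards.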

2 years ago www: sort all /$INBOX/ topics by Received: timestamp
Eric Wong [Sat, 4 Feb 2023 20:41:10 +0000 (20:41 +0000)] 
www: sort all /$INBOX/ topics by Received: timestamp

Our previous pinning prevention only worked to prevent older
(non-most-recent) topics from being pinned to the landing page,
but not the most recent window of messages.

We still sort messages within threads by Date: because that
makes git-send-email patchsets display more nicely, but we
don't want recent topics pinned due to future Date: headers.

I nearly switched sort_ds() back to sorting by Received: until
I looked back on commit 8e52e5fdea416d6fda0b8d301144af0c043a5a76
(use both Date: and Received: times, 2018-03-21) and was reminded
git-send-email relies on Date: for large series, so I added a
note about it for sort_ds().
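
A toy Python sketch of the two sort keys (not the actual Perl): messages
inside a thread are ordered by Date:, while topics are ordered by
Received:.  The Msg class is hypothetical, and using the newest Received:
time in a thread as the topic key is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    subject: str
    date: int      # from the Date: header, controlled by the sender
    received: int  # from the Received: header, set on delivery


def sort_topics(threads):
    """Order messages by Date: within threads, topics by Received:."""
    for msgs in threads:
        msgs.sort(key=lambda m: m.date)
    return sorted(threads,
                  key=lambda msgs: max(m.received for m in msgs),
                  reverse=True)
```

A message with a far-future Date: no longer floats its topic to the top,
since only Received: decides topic order.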

Reported-by: Kyle Meyer <kyle@kyleam.com>
Tested-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87edr5gx63.fsf@kyleam.com/

2 years ago lei_mirror: use --no-write-fetch-head on git 2.29+
Eric Wong [Fri, 3 Feb 2023 03:46:03 +0000 (03:46 +0000)] 
lei_mirror: use --no-write-fetch-head on git 2.29+

This avoids unnecessary writes to the FETCH_HEAD file, which is
worthless in multi-remote mirrors.  Actually, I haven't found
FETCH_HEAD useful anywhere since the `/remotes/' namespace
became popular...
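
A short Python sketch of gating the flag on the git version (not the
Perl in lei_mirror).  It assumes the version comes from the output of
`git version'; the function names are hypothetical.

```python
import re


def parse_git_version(version_output):
    """Turn e.g. 'git version 2.39.2' into (2, 39, 2)."""
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if m is None:
        raise ValueError(f"unrecognized version: {version_output!r}")
    return tuple(int(x or 0) for x in m.groups())


def fetch_flags(version_output):
    """Only pass --no-write-fetch-head to git 2.29 or newer."""
    if parse_git_version(version_output) >= (2, 29, 0):
        return ["--no-write-fetch-head"]
    return []
```

Older gits simply get no extra flag, so they keep writing FETCH_HEAD as
before.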

2 years ago www: diff: fix encoding problems when showing diff
Eric Wong [Tue, 31 Jan 2023 10:31:57 +0000 (10:31 +0000)] 
www: diff: fix encoding problems when showing diff

We need to use the utf8 layer when writing files to be diffed,
and utf8::decode the `git diff' output.  Furthermore, do the
CRLF > LF conversion early to avoid showing CRLF vs LF
differences in the diff, since that doesn't matter to MUAs
(nor our normal HTML views).

2 years ago lei: drop -watches and -lei_note_event from workers
Eric Wong [Tue, 31 Jan 2023 00:05:15 +0000 (00:05 +0000)] 
lei: drop -watches and -lei_note_event from workers

I noticed these while tracking down circular refs for commit
7b654d175cf2e31b (ipc: drop awaitpid_init to avoid circular refs, 2023-01-30).
While they're not the cause of circular refs, they're still
a waste of memory in worker processes.

2 years ago tests: make require_git and require_cmd easier-to-use
Eric Wong [Mon, 30 Jan 2023 22:50:07 +0000 (22:50 +0000)] 
tests: make require_git and require_cmd easier-to-use

We'll rely on defined(wantarray) to implicitly skip subtests,
and memoize these to reduce syscalls, since tests should
be short-lived enough to not be affected by new installations or
removals of git/xapian-compact/curl/etc...
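
The memoization half can be sketched in Python (not the Perl test
helpers; the defined(wantarray) part is Perl-specific and not modeled
here).  have_cmd is a hypothetical name.

```python
import functools
import shutil


@functools.lru_cache(maxsize=None)
def have_cmd(name):
    """Look a command up in PATH once per process, then reuse the answer."""
    return shutil.which(name) is not None
```

Repeated checks for the same command are served from the cache instead
of walking PATH again.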

2 years ago tests: make slow tests easier-to-find
Eric Wong [Mon, 30 Jan 2023 04:30:58 +0000 (04:30 +0000)] 
tests: make slow tests easier-to-find

t/run.perl now prints the 10 slowest tests at startup, and I've
added ./devel/longest-tests to print all tests sorted by
elapsed time.

This should allow us to notice outliers more quickly in the
future.
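
A tiny Python sketch of picking the slowest tests from recorded timings
(not the code in t/run.perl).  The mapping of test name to elapsed
seconds is an assumed input format.

```python
def slowest(elapsed, n=10):
    """Return the n (test, seconds) pairs with the largest elapsed time."""
    return sorted(elapsed.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Passing n=len(elapsed) gives the full list sorted by elapsed time.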

2 years ago ipc: drop awaitpid_init to avoid circular refs
Eric Wong [Mon, 30 Jan 2023 04:30:57 +0000 (04:30 +0000)] 
ipc: drop awaitpid_init to avoid circular refs

This brings t/lei-index.t back down from ~8 to ~3s.  I didn't
notice this before because the LeiNoteEvent timer was firing
every 5s and clearing circular refs, and parallel testing meant
the delay got hidden.

Fixes: 4a2a95bbc78f99c8 (ipc+lei: switch to awaitpid, 2023-01-17)

2 years ago xt/lei-auth-fail: use valid label name
Eric Wong [Sun, 29 Jan 2023 22:58:35 +0000 (22:58 +0000)] 
xt/lei-auth-fail: use valid label name

Uppercase characters aren't allowed for labels due to Xapian
boolean limitations, so we need to use lowercase labels.

Fixes: 27015c3365fd0690 (lei_input: disallow uppercase characters for labels, 2021-10-31)

2 years ago lei_input: give a hint for upper-case in labels
Eric Wong [Sun, 29 Jan 2023 22:58:34 +0000 (22:58 +0000)] 
lei_input: give a hint for upper-case in labels

I just encountered this error in xt/lei-auth-fail.t.

2 years ago content_digest_dbg: convert to arrayref and limit to lei
Eric Wong [Sun, 29 Jan 2023 10:30:42 +0000 (10:30 +0000)] 
content_digest_dbg: convert to arrayref and limit to lei

Since it's an extremely small class and not subclassed or
anything, we'll make it even smaller as an arrayref.

We also don't load this for PublicInbox::WWW or anything that
runs in public-facing daemons.