Eric Wong [Mon, 13 Mar 2023 12:00:23 +0000 (12:00 +0000)]
lei_mirror: handle UTF-8 from manifest.js.gz properly
This should ensure we display the "git config gitweb.owner
$OWNER" command invocation properly and set the description
properly without triggering wide character warnings.
Also tested with a smallish iproute2 repo
(/pub/scm/linux/kernel/git/toke/iproute2.git) using my mirror:
Anyways, I'm fairly certain this change and its tests are
correct; but I still struggle to understand Perl's approach to
Unicode and its interactions with various JSON implementations.
Fixes: 0830817c132cb105 ("lei_mirror: show non-ASCII owner properly w/ --verbose")
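To illustrate the character vs. octet distinction at play, here is a minimal sketch using the core JSON::PP module (not necessarily the JSON implementation in use here). The manifest entry and owner name are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical manifest entry as raw UTF-8 octets, as read from a gunzipped file
my $json_octets = qq({"owner":"Micha\xc5\x82"});

# ->utf8 makes decode() expect octets and return Perl character strings
my $entry = JSON::PP->new->utf8->decode($json_octets);
my $owner = $entry->{owner}; # 6 characters, the last one is U+0142

# Encode back to octets before handing the value to a byte-oriented
# filehandle or a command line; printing $owner as-is to a handle
# without an encoding layer would emit a "Wide character" warning
my $owner_octets = $owner;
utf8::encode($owner_octets); # 7 octets
```

Keeping track of which variables hold characters and which hold octets is the whole game; mixing the two is what produces warnings or mojibake.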
Eric Wong [Mon, 13 Mar 2023 12:00:22 +0000 (12:00 +0000)]
lei_mirror: do not fetch to read-only directories
As with public-inbox-fetch, we shouldn't waste time fetching
into read-only directories, since --epoch= will make unwanted
epoch directories read-only placeholders.
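A minimal sketch of such a check, assuming a plain writability test is good enough; `fetchable_dir' is a hypothetical name and not the helper used in the code. Note that -w consults the effective UID, so it is always true for root.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if $dir exists and is worth fetching into
sub fetchable_dir {
	my ($dir) = @_;
	-d $dir && -w _; # `_' reuses the stat result from -d
}

my $epoch_dir = tempdir(CLEANUP => 1); # stand-in for an epoch directory
print "would fetch into $epoch_dir\n" if fetchable_dir($epoch_dir);
```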
Eric Wong [Wed, 8 Mar 2023 11:02:58 +0000 (11:02 +0000)]
lei_mirror: unlink FETCH_HEAD when fetching forkgroups
Apparently, --no-write-fetch-head is broken in current git[1].
It also wasn't in older git, at all. So just unlink FETCH_HEAD
as we see it, but keep using --no-write-fetch-head to avoid the
syscall and I/O overhead when we can.
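The cleanup half of this can be sketched as follows; `unlink_fetch_head' is a hypothetical helper name, and the temporary directory merely stands in for a real git directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: remove FETCH_HEAD, tolerating its absence
sub unlink_fetch_head {
	my ($git_dir) = @_;
	my $f = "$git_dir/FETCH_HEAD";
	return 1 if unlink($f);
	return 0 if $!{ENOENT}; # nothing was written, nothing to do
	die "unlink($f): $!";
}

my $git_dir = tempdir(CLEANUP => 1); # stand-in for a real repo
open(my $fh, '>', "$git_dir/FETCH_HEAD") or die "open: $!";
close($fh) or die "close: $!";
my $removed = unlink_fetch_head($git_dir); # 1: file existed
my $again = unlink_fetch_head($git_dir); # 0: already gone
```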
Eric Wong [Tue, 7 Mar 2023 08:47:15 +0000 (08:47 +0000)]
sha: fix compatibility with old OpenSSL + Net::SSLeay
In older OpenSSL, EVP_get_digestbyname() didn't work properly
without calling OpenSSL_add_all_digests(), first. However,
OpenSSL_add_all_digests() is deprecated by OpenSSL 1.1.0 in
favor of OPENSSL_init_crypto(). Of course, OPENSSL_init_crypto()
isn't available in OpenSSL 1.0.1k nor Net::SSLeay as of 1.93_02
(2023-02-22).
Thus, instead of relying on string lookups and conditional
subroutine calls, just call EVP_sha1() and EVP_sha256() which
work on both old and new systems.
Tested with Net::SSLeay 1.55 and OpenSSL 1.0.1k on CentOS 7.x
Eric Wong [Mon, 27 Feb 2023 10:21:05 +0000 (10:21 +0000)]
doc: update clone+fetch with 2.0+ switches
Because old versions will exist for a long time and our latest
documentation is visible on the web, we must document when a
switch appears to avoid confusing users of old versions.
Eric Wong [Fri, 24 Feb 2023 16:59:10 +0000 (16:59 +0000)]
ds: write: do not assume final wbuf entry is tmpio
The final entry of {wbuf} may be a CODE ref and not a
tmpio ARRAY ref, so we must ensure it's an ARRAY before
attempting to use `->[INDEX]' to access it.
This fixes:
forward ->close error: Not an ARRAY reference at PublicInbox/DS.pm line 544.
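The guard boils down to checking ref() before dereferencing. A minimal sketch with a made-up buffer layout; `last_tmpio_offset' and the entry contents are hypothetical and do not mirror the real {wbuf} structure.

```perl
use strict;
use warnings;

# Hypothetical write buffer: one tmpio-like ARRAY ref, then a CODE ref
my @wbuf = ([ 'fake-filehandle', 0 ], sub { 'callback' });

# Hypothetical helper: offset stored in the final entry, or undef
sub last_tmpio_offset {
	my ($wbuf) = @_;
	my $last = $wbuf->[-1];
	ref($last) eq 'ARRAY' ? $last->[1] : undef;
}

my $off_code = last_tmpio_offset(\@wbuf); # undef: final entry is CODE
pop @wbuf;
my $off_array = last_tmpio_offset(\@wbuf); # 0: final entry is ARRAY
```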
systemd (247.3-7+deb11u1 on Debian 11.x) considers them "obsolete" and
emits the following to my syslog:
Standard output type syslog is obsolete, automatically updating to journal.
Please update your unit file, and consider removing the setting altogether.
So we'll remove it altogether, as I'm sticking with rsyslog for now.
File::Path already accounts for the existence of directories,
handles races from redundant mkdir(2), and croaks on
unrecoverable errors. So there's no point in doing any
of that on our end.
Furthermore, avoiding the overhead of loading File::Path doesn't
seem worth it to save 20-60ms given the overhead of loading
our other code. Instead, try to reduce optree overhead in
our code, since File::Path gets used in a bunch of places.
We'll also favor the newer make_path for multi-directory
invocations to avoid bloating our own optree to create an
arrayref, but mkpath is one fewer subroutine call within
File::Path itself, right now.
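For reference, a small sketch of the two File::Path call styles mentioned above; the directory names are arbitrary and live under a temporary directory.

```perl
use strict;
use warnings;
use File::Path qw(make_path mkpath);
use File::Temp qw(tempdir);

my $top = tempdir(CLEANUP => 1);
my @dirs = map { "$top/$_" } qw(a/b c/d);

make_path(@dirs); # newer interface: plain list of directories
make_path(@dirs); # repeating is harmless, existing dirs are fine
mkpath([ "$top/e/f" ]); # legacy interface: arrayref of directories
```

Neither call needs a prior -d check or an eval around EEXIST on our side.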
Eric Wong [Tue, 21 Feb 2023 12:17:44 +0000 (12:17 +0000)]
lei_mirror: support --remote-manifest=URL
Since PublicInbox::WWW already generates manifest.js.gz, I'm
using an alternate path with PublicInbox::WwwStatic to host the
manifest.js.gz for coderepos at an alternate location. The
following snippet lets me host
https://yhbt.net/lore/pub/manifest.js.gz for mirrored git
repositories, while https://yhbt.net/lore/manifest.js.gz
(no `pub') remains for inbox mirroring.
==> sample.psgi <==
use Plack::Builder; # provides builder, enable and mount
use PublicInbox::WWW;
use PublicInbox::WwwStatic;
my $www = PublicInbox::WWW->new; # use default PI_CONFIG
my $st = PublicInbox::WwwStatic->new(docroot => '/path/to/code');
my $www_cb = sub {
	my ($env) = @_;
	# serve the coderepo manifest from the static docroot
	if ($env->{PATH_INFO} eq '/pub/manifest.js.gz') {
		local $env->{PATH_INFO} = '/manifest.js.gz';
		my $res = $st->call($env);
		return $res if $res->[0] != 404;
	}
	# everything else (including a 404 above) goes to the inbox WWW
	$www->call($env);
};
builder {
	enable 'ReverseProxy';
	enable 'Head';
	mount '/lore' => $www_cb;
}
Eric Wong [Tue, 21 Feb 2023 11:17:58 +0000 (11:17 +0000)]
viewvcs: handle non-UTF-8 commit message
Back in the old days, git didn't store commit encodings
and allowed messages in various encodings to enter history.
Assuming such a commit is UTF-8 trips up s/// operations
on buffers read with the `:utf8' PerlIO layer. So clear
Perl's internal UTF-8 flag if we end up with something
which isn't valid UTF-8.
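A related way to avoid the problem, sketched here under the assumption that the buffer is read as raw octets (so this is not the flag-clearing approach described above): utf8::decode only converts the buffer when it is well-formed UTF-8 and leaves it untouched otherwise. `try_utf8' is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: decode in place if valid UTF-8, else keep octets
sub try_utf8 {
	utf8::decode($_[0]); # false (and no change) on malformed input
}

my $modern = "caf\xc3\xa9"; # valid UTF-8 octets
my $legacy = "caf\xe9"; # ISO-8859-1 octets, invalid as UTF-8
my $ok_modern = try_utf8($modern); # true, $modern is now 4 characters
my $ok_legacy = try_utf8($legacy); # false, $legacy is still 4 octets
```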
Eric Wong [Mon, 20 Feb 2023 09:21:50 +0000 (09:21 +0000)]
searchidx: do not index quoted Base-85 patches
Base-85 binary patches were a source of false positives in results
and we've filtered them out of non-quoted text since July 2022.
Unfortunately, people were quoting binary patch contents
in replies (*sigh*) and triggering false positives in search
results. So we must filter out base-85-looking contents from
quoted text, too.
Followup-to: 8fda04081acde705 (search: do not index base-85 binary patches, 2022-06-20)
Followup-to: 840785917bc74c8e (searchidx: skip "delta $N" sections for base-85, 2022-07-19)
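A rough sketch of what such filtering could look like; this is not the indexer's actual code. It assumes quoted lines start with one or more `>' prefixes and that a binary section begins with a `literal $N' or `delta $N' line. `strip_b85' and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# git's base-85 alphabet; each data line starts with a length letter
my $alphabet = join('', 0..9, 'A'..'Z', 'a'..'z') .
		'!#$%&()*+-;<=>?@^_`{|}~';
my $b85 = '[' . quotemeta($alphabet) . ']';
my $quote = qr/(?:>[ ]?)*/; # zero or more quote prefixes

# Hypothetical helper: drop binary patch sections, quoted or not
sub strip_b85 {
	my (@lines) = @_;
	my ($in_bin, @keep);
	for my $l (@lines) {
		if ($l =~ /\A$quote(?:literal|delta) [0-9]+\z/) {
			$in_bin = 1; # section header, data lines follow
			next;
		}
		next if $in_bin && $l =~ /\A$quote[A-Za-z]$b85+\z/;
		$in_bin = 0; # anything else ends the section
		push @keep, $l;
	}
	@keep;
}

my @kept = strip_b85('> literal 12', '> zcmV?d00001', '> ', '> thanks');
```

Only lines following a section header are dropped, so ordinary quoted words that happen to fit the alphabet are still indexed.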
Eric Wong [Mon, 20 Feb 2023 05:32:02 +0000 (05:32 +0000)]
multi_git: do not set include.path if already set
The epoch may already be read-only, and we don't need to cause
more I/O traffic and disk wear for no-op stuff. This fixes
idempotent use of public-inbox-clone to update multi-epoch
inboxes.
Eric Wong [Mon, 20 Feb 2023 08:19:43 +0000 (08:19 +0000)]
git_async_cat: don't mis-abort replaced process
When a git process gets replaced (e.g. due to new
epochs/alternates), we must be careful and not abort the wrong
one.
I suspect this fixes the problem exacerbated by --batch-command.
It was theoretically possible w/o --batch-command, but it seems
to have made it surface more readily.
This should fix "Failed to retrieve generated blob" errors from
PublicInbox/ViewVCS.pm appearing in syslog.
Eric Wong [Sun, 19 Feb 2023 08:18:14 +0000 (08:18 +0000)]
search: translate d: to dt: in query
dt: is higher resolution and the YYYYMMDD column will be dropped
if there's ever another SCHEMA_VERSION update. While the
upcoming code repo index is independent of the mail schemas,
it'll use similar query prefixes and likely use d:/dt: for
Author Date of git commits.
Eric Wong [Fri, 17 Feb 2023 10:36:14 +0000 (10:36 +0000)]
search: move query transform + enquire setup out of retry loop
The Xapian query transformation and Enquire object setup aren't
subject to MVCC and retries, so move it outside the retry loop
to save some cycles in case we need to retry on a busy DB.
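The shape of the change, sketched with stand-ins instead of real Xapian objects: `build_query' and `run_query' are hypothetical, and the simulated failure merely borrows the name of Xapian's DatabaseModifiedError.

```perl
use strict;
use warnings;

my ($setups, $attempts) = (0, 0);

# Hypothetical stand-in for query transformation + Enquire setup
sub build_query {
	my (@terms) = @_;
	$setups++;
	return +{ terms => \@terms };
}

# Hypothetical stand-in for the DB read; fails twice as if the DB were busy
sub run_query {
	my ($q) = @_;
	die "DatabaseModifiedError\n" if ++$attempts < 3;
	scalar @{$q->{terms}};
}

my $q = build_query(qw(foo bar)); # done once, outside the retry loop
my $nres;
for my $try (1..10) {
	$nres = eval { run_query($q) };
	last unless $@;
	die $@ if $@ !~ /DatabaseModifiedError/; # only retry the expected error
}
```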
public-inbox.cgi(1): Mention AllowEncodedSlashes for Apache setups
When AllowEncodedSlashes is Off (the default setting), Apache answers
requests for URLs containing %2f with a 404 error without calling the
CGI. To (maybe) spare others from debugging this issue, add a hint
with the solution.
Eric Wong [Tue, 14 Feb 2023 13:17:39 +0000 (13:17 +0000)]
www_coderepo: handle unborn/dead branches in summary
We need to account for `git log' showing nothing for invalid
branches and continue to render properly. We'll also quiet down
stderr from `git log' to avoid clutter.
Eric Wong [Tue, 14 Feb 2023 02:42:32 +0000 (02:42 +0000)]
lei q: do not collapse threads with `-tt'
While having Xapian collapse threads is an easy way to reduce
the amount of deduplication work we need to do when writing
out threads, we can't rely on it when using `lei q -tt' since
that needs to flag all hits.
Eric Wong [Mon, 13 Feb 2023 01:02:12 +0000 (01:02 +0000)]
imap: quiet Parse::RecDescent errors on bad search queries
Parse::RecDescent emits giant errors to STDERR by default
(bypassing $SIG{__WARN__}, even). Shut it up since there's
no good way to pass those back to a client, and we don't want
clients flooding logs with bogus requests.
Eric Wong [Sun, 12 Feb 2023 23:18:28 +0000 (23:18 +0000)]
lei_mirror: fetch most-recently-updated repos, first
Within the same forkgroup, we can assume the most recently updated
repo has the most data, so fetch those, first. We'll save new clones
for last since we can preserve {reference} ordering for them.
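As a sketch of the ordering, assuming manifest entries carry a numeric `modified' timestamp; the paths and timestamps below are invented.

```perl
use strict;
use warnings;

# Hypothetical forkgroup members keyed by path
my %manifest = (
	'/pub/a.git' => { modified => 1676000000 },
	'/pub/b.git' => { modified => 1676200000 },
	'/pub/c.git' => { modified => 1676100000 },
);

# most-recently-updated first
my @fetch_order = sort {
	$manifest{$b}->{modified} <=> $manifest{$a}->{modified}
} keys %manifest;
```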
Eric Wong [Sun, 12 Feb 2023 23:18:27 +0000 (23:18 +0000)]
lei_mirror: further reduce `git config' calls
We can parse the config at once and avoid clobbering variables
which do not need changing. We'll also do some prep work for
fetch.hideRefs proposal being discussed at
<https://public-inbox.org/git/20230209122857.M669733@dcvr/>
Eric Wong [Sun, 12 Feb 2023 03:12:03 +0000 (03:12 +0000)]
t/lei-refresh-mail-sync: avoid kill+sleep loop
While we can't waitpid() on a daemonized process, we can abuse the
lack of FD_CLOEXEC to detect a process death. This saves
roughly 400ms for this slow test.
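The trick can be sketched with a plain pipe and a double fork standing in for the daemon (the real test inherits the descriptor across exec, which this sketch skips): once every holder of the write end is gone, a read on the other end returns EOF, with no polling required.

```perl
use strict;
use warnings;
use POSIX ();

pipe(my $r, my $w) or die "pipe: $!";
my $pid = fork // die "fork: $!";
if ($pid == 0) { # child
	close $r;
	my $gpid = fork // POSIX::_exit(1);
	if ($gpid == 0) { # grandchild: daemon stand-in, keeps $w open
		select(undef, undef, undef, 0.1); # pretend to work
		POSIX::_exit(0);
	}
	POSIX::_exit(0); # child exits, parent can't waitpid the grandchild
}
close $w; # parent must drop its own copy of the write end
waitpid($pid, 0);

# blocks until the grandchild exits and its copy of $w is closed
my $n = sysread($r, my $buf, 1);
```

A return value of 0 from sysread means EOF, i.e. the last process holding the write end has died.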
Eric Wong [Fri, 10 Feb 2023 08:56:41 +0000 (08:56 +0000)]
git_async_cat: use awaitpid
While awaitpid already registered a no-op callback in
_bidi_pipe, we can still call it again when registering it into
our event loop to ensure EPOLL_CTL_DEL fires.
Eric Wong [Fri, 10 Feb 2023 03:58:52 +0000 (03:58 +0000)]
lei_mirror: avoid dir/file conflicts in update-ref
Using the files ref backend for git, `delete' and `create'
operations for `update-ref --stdin' need to be processed in
separate transactions to avoid conflicts in cases where a file
becomes a directory (or presumably, vice versa).
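A minimal sketch of the split, assuming each returned batch is fed to its own `git update-ref --stdin' invocation, deletes first. `split_ref_updates', the ref names and the object ID are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: separate deletes from everything else
sub split_ref_updates {
	my (@cmds) = @_;
	my (@del, @other);
	for my $cmd (@cmds) {
		if ($cmd =~ /\Adelete /) {
			push @del, $cmd;
		} else {
			push @other, $cmd;
		}
	}
	(join('', @del), join('', @other));
}

my $oid = 'a' x 40; # placeholder object ID
my ($deletes, $creates) = split_ref_updates(
	"create refs/heads/topic/v2 $oid\n", # needs refs/heads/topic/ as a dir
	"delete refs/heads/topic\n", # currently a file
);
```

With the files backend, `refs/heads/topic' must be gone before `refs/heads/topic/v2' can be created, hence two transactions.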
Eric Wong [Sat, 4 Feb 2023 20:41:10 +0000 (20:41 +0000)]
www: sort all /$INBOX/ topics by Received: timestamp
Our previous pinning prevention only worked to prevent older
(non-most-recent) topics from being pinned to the landing page,
but not the most recent window of messages.
We still sort messages within threads by Date: because that
makes git-send-email patchsets display more nicely, but we
don't want recent topics pinned due to future Date: headers.
I nearly switched sort_ds() back to sorting by Received: until
I looked back on commit 8e52e5fdea416d6fda0b8d301144af0c043a5a76
(use both Date: and Received: times, 2018-03-21) and was reminded
git-send-email relies on Date: for large series, so I added a
note about it for sort_ds().
Eric Wong [Fri, 3 Feb 2023 03:46:03 +0000 (03:46 +0000)]
lei_mirror: use --no-write-fetch-head on git 2.29+
This avoids unnecessary writes to the FETCH_HEAD file, which is
worthless in multi-remote mirrors. Actually, I haven't found
FETCH_HEAD useful anywhere since the `/remotes/' namespace
became popular...
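A sketch of gating the switch on the git version; `git_ver_ge' is a hypothetical helper, and the version string is hard-coded here rather than read from `git version'.

```perl
use strict;
use warnings;

# Hypothetical helper: true if "git version X.Y..." is at least $maj.$min
sub git_ver_ge {
	my ($version_line, $maj, $min) = @_;
	my ($x, $y) = $version_line =~ /\bgit version ([0-9]+)\.([0-9]+)/
		or return;
	($x <=> $maj || $y <=> $min) >= 0;
}

my $version_line = 'git version 2.39.2'; # sample `git version' output
my @cmd = qw(git fetch);
push @cmd, '--no-write-fetch-head' if git_ver_ge($version_line, 2, 29);
```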
Eric Wong [Tue, 31 Jan 2023 10:31:57 +0000 (10:31 +0000)]
www: diff: fix encoding problems when showing diff
We need to use the utf8 layer when writing files to be diffed,
and utf8::decode the `git diff' output. Furthermore, do the
CRLF > LF conversion early to avoid showing CRLF vs LF
differences in the diff, since that doesn't matter to MUAs
(nor our normal HTML views)
Eric Wong [Tue, 31 Jan 2023 00:05:15 +0000 (00:05 +0000)]
lei: drop -watches and -lei_note_event from workers
I noticed these while tracking down circular refs for commit 7b654d175cf2e31b (ipc: drop awaitpid_init to avoid circular refs, 2023-01-30).
While they're not the cause of circular refs, they're still
a waste of memory in worker processes.
Eric Wong [Mon, 30 Jan 2023 22:50:07 +0000 (22:50 +0000)]
tests: make require_git and require_cmd easier-to-use
We'll rely on defined(wantarray) to implicitly skip subtests,
and memoize these to reduce syscalls, since tests should
be short-lived enough to not be affected by new installations or
removals of git/xapian-compact/curl/etc...
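The calling convention can be sketched like this. The names are hypothetical, the PATH lookup is simplified, and die() stands in for whatever skip mechanism the test helper really uses.

```perl
use strict;
use warnings;

my %which_cache; # memoized lookups, tests are short-lived

# Hypothetical stand-in for a PATH lookup
sub which_sketch {
	my ($cmd) = @_;
	for my $dir (split /:/, $ENV{PATH} // '') {
		my $path = "$dir/$cmd";
		return $path if -f $path && -x _;
	}
	undef;
}

sub require_cmd_sketch {
	my ($cmd) = @_;
	my $path = $which_cache{$cmd} //= which_sketch($cmd) // '';
	return $path if $path ne '';
	return if defined(wantarray); # caller wants a value, let it decide
	die "SKIP: $cmd missing\n"; # void context: skip on the caller's behalf
}
```

A caller writing `my $curl = require_cmd_sketch('curl') or return;' handles the miss itself, while a bare `require_cmd_sketch('curl');' skips implicitly.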
Eric Wong [Mon, 30 Jan 2023 04:30:57 +0000 (04:30 +0000)]
ipc: drop awaitpid_init to avoid circular refs
This brings t/lei-index.t back down from ~8 to ~3s. I didn't
notice this before because the LeiNoteEvent timer was firing
every 5s and clearing circular refs, and parallel testing meant
the delay got hidden.
Fixes: 4a2a95bbc78f99c8 (ipc+lei: switch to awaitpid, 2023-01-17)
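For readers unfamiliar with the failure mode, a generic illustration of a circular ref created by a stored callback. It uses Scalar::Util::weaken to break the cycle, which is a different remedy from the one taken here (dropping awaitpid_init); the `Obj' class is made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

package Obj {
	our $destroyed = 0;
	sub new { bless {}, shift }
	sub DESTROY { $destroyed++ }
}

{
	my $o = Obj->new;
	$o->{cb} = sub { $o }; # closure holds $o, $o holds closure: a cycle
}
my $after_cycle = $Obj::destroyed; # object not freed on scope exit

{
	my $o = Obj->new;
	my $weak = $o;
	weaken($weak);
	$o->{cb} = sub { $weak }; # closure only holds a weak ref
}
my $after_weak = $Obj::destroyed; # second object freed on scope exit
```

Objects stuck in a cycle stay alive until something breaks the cycle or the process exits, which is how the delay described above can go unnoticed.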
Eric Wong [Sun, 29 Jan 2023 10:30:41 +0000 (10:30 +0000)]
use Net::SSLeay (OpenSSL) for SHA-(1|256) if installed
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as
the Digest::SHA implementation from Perl, most likely due to an
optimized assembly implementation. SHA-1 is a few percent
faster, too.
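A sketch of the fallback arrangement, assuming Net::SSLeay's EVP_* digest wrappers (EVP_MD_CTX_create, EVP_DigestInit, EVP_DigestUpdate, EVP_DigestFinal, EVP_MD_CTX_destroy) are present in the installed version; `sha256_hex_any' is a hypothetical name and not the module's API.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Prefer OpenSSL via Net::SSLeay when it is installed and new enough
my $have_ssleay = eval {
	require Net::SSLeay;
	Net::SSLeay->can('EVP_sha256') &&
		Net::SSLeay->can('EVP_MD_CTX_create');
};

# Hypothetical helper: hex SHA-256 of $data via whichever backend exists
sub sha256_hex_any {
	my ($data) = @_;
	return Digest::SHA::sha256_hex($data) unless $have_ssleay;
	my $ctx = Net::SSLeay::EVP_MD_CTX_create();
	Net::SSLeay::EVP_DigestInit($ctx, Net::SSLeay::EVP_sha256());
	Net::SSLeay::EVP_DigestUpdate($ctx, $data);
	my $bin = Net::SSLeay::EVP_DigestFinal($ctx);
	Net::SSLeay::EVP_MD_CTX_destroy($ctx);
	unpack('H*', $bin);
}

my $hex = sha256_hex_any('');
```

Both backends must produce identical digests, so callers need not know which one is in use.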
Eric Wong [Sun, 29 Jan 2023 09:45:11 +0000 (09:45 +0000)]
spawn_pp: use `which()' properly for pure-Perl spawn
I have no idea if mod_perl/mod_perl2 is used nowadays, but
we're stuck supporting it as long as mod_perl exists. So
add some tests and make minor updates to existing ones to
ensure it stays working.
Eric Wong [Sat, 28 Jan 2023 11:02:54 +0000 (11:02 +0000)]
www_coderepo: support $REPO/refs/{heads,tags}/ endpoints
These are also in cgit, but we'll include CLI hints to show
viewers how our data is generated. We don't have "$REPO/refs/"
without (heads|tags) yet, though...
Eric Wong [Thu, 26 Jan 2023 09:32:57 +0000 (09:32 +0000)]
git: drop needless checks for old git
`ambiguous' was added in git 2.21, and `dangling' was the only
other possible phrase which was inadvertently slipped in prior
to 2.21. Thus there's no need to check for `notdir' or `loop'
responses since we aren't using `git cat-file --follow-symlinks'
anywhere.
Eric Wong [Thu, 26 Jan 2023 09:32:56 +0000 (09:32 +0000)]
git: use --batch-command in git 2.36+ to save processes
`git cat-file --batch-command' combines the functionality of
`--batch' and `--batch-check' into a single process. This
reduces the number of running processes and is primarily
useful for coderepos (e.g. solver).
This also fixes prior use of `print { $git->{out} }' which is
a potential (but unlikely) bug since commit d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)
Lack of libgit2 on one of my test machines also uncovered fixes
necessary for t/imapd.t, t/nntpd.t and t/nntpd-v2.t.
Eric Wong [Wed, 25 Jan 2023 10:18:33 +0000 (10:18 +0000)]
process_pipe: warn hackers off using it for bidirectional pipes
While most uses of ->DESTROY happen in a predictable order in
long-lived daemons, process teardown on exit is chaotic and not
subject to ordering guarantees, so we must keep both ends of a
`git cat-file --batch*' pipe at the same level in the object
hierarchy.
Eric Wong [Tue, 24 Jan 2023 09:49:40 +0000 (09:49 +0000)]
viewvcs: improve tree glossary view
Add an <hr> to help delineate the glossary, note that
submodules are rare, and avoid needlessly defining the
commits-in-trees case since the extra information is likely
to overwhelm new users.
Eric Wong [Tue, 24 Jan 2023 09:49:34 +0000 (09:49 +0000)]
www_coderepo: eliminate debug log footer
WwwCoderepo is for viewing blobs already in code repositories,
so there's no place for a debug log showing which mails were
used to arrive at a given blob. The debug footer remains for
/$INBOX/$OID/s/ URLs, of course.
Eric Wong [Tue, 24 Jan 2023 09:49:33 +0000 (09:49 +0000)]
www_coderepo: show /$INBOX/?t=$DATE link for commits
While we can't inexpensively search for git commits based on the
timestamp, coderepos configured for inboxes can still look up
messages based on the inbox URL.
Eric Wong [Tue, 24 Jan 2023 09:49:30 +0000 (09:49 +0000)]
qspawn: drop lineno from command failure warning
git, cgit, or any other command failing isn't an error
we can do anything about in qspawn, so don't have Perl
emit line number info and needlessly pollute logs.
Eric Wong [Sat, 21 Jan 2023 08:58:19 +0000 (08:58 +0000)]
ds: awaitpid: do not clobber entries for reaped processes
We must only write to $AWAIT_PIDS on the initial reap attempt.
While we're at it, avoid triggering an extra wakeup if we're
doing synchronous awaitpid. This seems to eliminate most
reliance on Qspawn->DESTROY to call Qspawn->finalize.
Eric Wong [Tue, 17 Jan 2023 07:19:10 +0000 (07:19 +0000)]
ipc+lei: switch to awaitpid
This avoids awkwardly stuffing an arrayref into callbacks
which expect multiple arguments. IPC->awaitpid_init now
allows pre-registering callbacks before spawning workers.
Eric Wong [Tue, 17 Jan 2023 07:19:07 +0000 (07:19 +0000)]
eofpipe: drop {arg} support for now
The only user of EOFpipe has no args, so avoid wasting a hash
slot on it. If we need it again in the future, EOFpipe will
allow an array of args, instead.