Eric Wong [Sun, 15 Oct 2023 08:16:28 +0000 (08:16 +0000)]
learn: respect indexlevel for v1 inboxes
v2 never suffered from this bug, apparently, but -learn didn't
seem able to handle indexlevel=basic (nor respect `medium')
for v1 inboxes. I only noticed this bug because I converted
some ancient v1 inboxes to `basic' to save space.
Eric Wong [Fri, 13 Oct 2023 06:12:29 +0000 (06:12 +0000)]
xap_helper_cxx: allow sharing XDG_CACHE_HOME across ABIs
For users sharing home directories (or just XDG_CACHE_HOME)
across hosts of different architectures, we must use a compiler
and architecture-specific destination directory for storing the
binary result. Even on the same OS and architecture, different
C++ compilers may have different ABIs, so we must account for
that.
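
As a rough illustration only (Python, not the project's code), a destination directory keyed on both the compiler and the architecture might be derived as below. The `example-helper' directory name, the 16-character hash prefix, and the function name are assumptions made for this sketch; the message above does not state the actual layout.

```python
import hashlib
import os
import platform


def helper_cache_dir(cxx_version_output, machine=None, cache_home=None):
    """Return a build directory that differs per compiler and CPU architecture."""
    machine = machine or platform.machine()
    cache_home = cache_home or os.environ.get(
        "XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    # Hash the compiler's self-description so two compilers (or two versions
    # of one compiler) never share a directory, even on the same architecture.
    cxx_id = hashlib.sha256(cxx_version_output.encode()).hexdigest()[:16]
    # "example-helper" is a hypothetical name used only for this sketch.
    return os.path.join(cache_home, "example-helper", f"{machine}-{cxx_id}")
```

Hosts that share a home directory then each build into their own subdirectory instead of overwriting a binary built for another ABI.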
Eric Wong [Thu, 12 Oct 2023 00:21:00 +0000 (00:21 +0000)]
lei: quiet excessive write/seen messages
We don't want to end up dumping nr_seen/nr_write when progress
is disabled, nor do we want forked off `lei note-event' workers
to dump them when DS->Reset is called on fork.
Eric Wong [Wed, 11 Oct 2023 07:20:57 +0000 (07:20 +0000)]
lei import|tag|rm: support --commit-delay=SECONDS
Delayed commits allow users to trade off immediate safety for
throughput and reduced storage wear when running multiple
discrete commands.
This feature is currently useful for providing a way to make
t/lei-store-fail.t reliable and for ensuring `lei blob' can
retrieve messages which have not yet been committed.
In the future, it'll also be useful for the FUSE layer to batch
git activity.
Eric Wong [Wed, 11 Oct 2023 07:20:55 +0000 (07:20 +0000)]
import: cat_blob is a no-op w/o live fast-import
cat_blob is a fallback for handling files which haven't made it
onto disk to be readable by `git cat-file'. Thus spawning a new
fast-import process to retrieve a blob is pointless, as cat_blob
is only used as a last resort when `git cat-file' fails.
Eric Wong [Wed, 11 Oct 2023 07:20:54 +0000 (07:20 +0000)]
import: switch to Unix stream socket for fast-import
We use fewer file descriptors and fewer lines of code this way.
I'm not aware of any place we rely on POSIX pipe semantics with
`git fast-import', and sockets have bigger buffers by default
in most cases (even if Linux allows larger pipe buffers).
Eric Wong [Wed, 11 Oct 2023 07:20:53 +0000 (07:20 +0000)]
treewide: consolidate "From " line removal
Aside from our prior import bugs (fixed in a0c07cba0e5d8b6a
(mda: drop leading "From " lines again, 2016-06-26)), we'll
always have to be dealing with mutt piping messages to us and
`git format-patch' output. So just share the regexp so we
can use it everywhere.
It may be desirable to allow importing messages with a leading
"From " line for FUSE, even.
Additionally, some instances of this regexp needlessly added
optional `\r?' (CR) checks ahead of the `\n' (LF) element; but
they're pointless anyways since [^\n]* already matches all
non-LF bytes (CR included).
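
The point about `\r?' can be checked with a small Python sketch. The pattern below is a hypothetical stand-in, not the project's shared regexp (which is not shown in this message); it only demonstrates that a negated-LF class already swallows a trailing CR.

```python
import re

# Hypothetical pattern for illustration; the project's actual regexp is not
# shown here.  [^\n]* consumes every non-LF byte, including a trailing CR,
# so no separate \r? is needed before the \n.
LEADING_FROM = re.compile(rb"\AFrom [^\n]*\n")


def strip_leading_from(raw):
    """Drop one leading mbox-style "From " line, if present."""
    return LEADING_FROM.sub(b"", raw, count=1)
```

The same function handles LF and CRLF input, and leaves a regular `From:' header alone since that has a colon rather than a space after `From'.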
Eric Wong [Wed, 11 Oct 2023 07:20:51 +0000 (07:20 +0000)]
msgtime: quiet warnings we can do nothing about
In retrospect, warning about bad times and dates is pointless
since there's nothing actionable about it. We'll also drop an
unnecessary capture in msg_received_at while we're at it and
favor using $eml as the input variable name to match
current usage.
The note to install Date::Parse as a fallback remains since it
can be helpful in some cases (and is actionable by the user).
Eric Wong [Wed, 11 Oct 2023 07:20:50 +0000 (07:20 +0000)]
lei_xsearch: improve curl progress reporting
Instead of having tail(1) follow a file when we're in verbose
mode, unconditionally pipe stderr to a Perl 2-liner which tees
its output to a regular file with line buffering.
POSIX tee(1) isn't suitable for this task since it's required
to be completely unbuffered while we want line-buffering when
running parallel processes. Fortunately, Perl makes this easy.
This also means we no longer leave curl-err.XXXX files around
on premature shutdown if we're hit by a SIGKILL or similar and
can't exit normally.
We do need to stop and respawn the Perl process if we hit a curl
error, though, since we need to be certain the output is
flushed.
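
For readers unfamiliar with the idea, a line-buffered tee can be sketched in a few lines of Python. This is an analogue of the approach described above, not the Perl 2-liner itself; the function and parameter names are made up for the example.

```python
def tee_lines(src, log, dst):
    """Copy src to both log and dst, one complete line at a time."""
    for line in src:
        log.write(line)
        log.flush()  # each line reaches the regular file right away
        dst.write(line)
        dst.flush()  # whole lines only, so parallel writers don't split them
```

Flushing after every line is what distinguishes this from both a fully unbuffered tee(1) and a block-buffered copy.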
Eric Wong [Wed, 11 Oct 2023 07:20:49 +0000 (07:20 +0000)]
lei rediff: use ProcessIO for --drq support
This required fixing binmode support a few commits ago, along
with properly enabling autoflush in popen_wr instead of setting
it on the wrapper ProcessIO class.
Eric Wong [Tue, 10 Oct 2023 10:09:04 +0000 (10:09 +0000)]
ds: use a dummy poller during Reset
commit 1897c3be1ed644a05f96ed06cde4a9cc2ad0e5a4
(ds: Reset: replace Poller object early, 2023-10-04)
was not effective at eliminating the following message
at daemon shutdown:
Can't call method "FILENO" on an undefined value at
.../PublicInbox/Select.pm line 34 during global destruction.
This seems down to some tied objects having unpredictable
destruction order. So use a dummy class to ensure its ep_*
methods never call the tied `FILENO' method at all since
dropping the Poller object will release any resources it holds.
Eric Wong [Tue, 10 Oct 2023 10:07:56 +0000 (10:07 +0000)]
over*: avoid defined-or hash assignments with side-effects
These may've been causing strange errors[1] in t/imapd.t from
the -watch daemon, such as:
Cannot copy to HASH in scalar assignment ../PublicInbox/Over.pm
in the Over->dbh() sub. I've only noticed this failure on
FreeBSD 13.2 (Perl 5.32.1, DBD::SQLite 1.72 (bundled SQLite
3.39.4), DBI 1.643) so far, so it could also be something to do
with the versions used and/or memory layout differences
with libc or build toolchain.
Eric Wong [Tue, 10 Oct 2023 09:03:09 +0000 (09:03 +0000)]
t/nntp.t: attempt to track source of undefined vars
Occasionally, t/nntp.t spews undefined variable warnings under
`make check-run'. While the test doesn't fail, it's annoying
to see them and it could be a source of deeper problems.
Eric Wong [Mon, 9 Oct 2023 17:56:23 +0000 (17:56 +0000)]
www_coderepo: fix handling of non-UTF-8 git data
We can't assume git output is UTF-8, and we'll always have
legacy data in git coderepos. So attempt to display some
garbled text rather than nothing at all if Perl croaks
on it.
sox commit c38987e8d20505621b8d872863afa7d233ed1096
(Added raw inverse-bit u-law and A-law support. Updated *.txt files., 2001-12-13)
is an example of a commit which caused problems for me.
Eric Wong [Sun, 8 Oct 2023 22:11:48 +0000 (22:11 +0000)]
introduce ProcessIONBF for multiplexed non-blocking IO
This is required for reliable epoll/kevent/poll/select
wakeup notifications, since we have no visibility into
the buffer states used internally by Perl.
We can safely use sysread here since we never use the :utf8
nor any :encoding Perl IO layers for readable pipes.
I suspect this fixes occasional failures from t/solver_git.t
when retrieving the WwwCoderepo summary.
Eric Wong [Sun, 8 Oct 2023 22:11:47 +0000 (22:11 +0000)]
process_io: fix binmode and use it in lei_xsearch
The `binmode' perlop can only take two scalars, so passing
`@_' blindly won't work since prototypes are checked. This
means we can get IO::Uncompress::Gunzip working properly
with ProcessIO and use it for curl.
We'll also just autodie (instead of warn) on FS errors when
dealing with curl stderr; since the process will likely be
in bigger trouble soon, anyways.
Eric Wong [Sun, 8 Oct 2023 20:19:40 +0000 (20:19 +0000)]
overidx: use croak/confess instead of die
Unlike `die', `croak' can be expanded to `confess' to give a
full backtrace. We'll use `confess' on transaction failures
since that occasionally causes sporadic t/imapd.t failures on
FreeBSD (IO::Kqueue is installed, so signals are deferred).
Eric Wong [Sat, 7 Oct 2023 21:24:09 +0000 (21:24 +0000)]
process_io: pass args to awaitpid as list
Specifying {cb_args} in the options hash felt awkward to me.
Instead, just use the Perl stack like we do with awaitpid()
and pass the list down directly.
Eric Wong [Sat, 7 Oct 2023 21:24:08 +0000 (21:24 +0000)]
rename ProcessPipe to ProcessIO
Since we deal with pipes (of either direction) and bidirectional
stream sockets for this class, it's better to remove the `Pipe'
from the name and replace it with `IO' to communicate that it
works for any form of IO::Handle-like object tied to a process.
Eric Wong [Sat, 7 Oct 2023 21:24:07 +0000 (21:24 +0000)]
import: use autodie, rely on PerlIO for retries
As documented in perlipc(1), the default :perlio layer retries
the `read' perlop on EINTR. The :perlio layer also makes `read'
perform read-in-full behavior; so there's no need to loop
ourselves. Our responsibility is now only to detect short reads
in case fast-import is killed mid-stream.
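
A Python analogue of the remaining responsibility (detecting short reads) is sketched below. It assumes a buffered binary stream, which, like the :perlio layer described above, keeps reading until the requested count is satisfied or EOF is hit; the function name is hypothetical.

```python
def read_exact(stream, nbytes):
    """Read exactly nbytes from a buffered binary stream or raise EOFError."""
    # A buffered reader keeps issuing reads until nbytes arrive or EOF is
    # reached, so no retry loop is needed here.
    buf = stream.read(nbytes)
    if len(buf) != nbytes:
        # The writer went away mid-stream (e.g. it was killed).
        raise EOFError(f"short read: wanted {nbytes} bytes, got {len(buf)}")
    return buf
```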
Eric Wong [Sat, 7 Oct 2023 21:24:06 +0000 (21:24 +0000)]
ipc: use autodie for most syscalls
I'm not sure how/if we should bother recovering from these,
so just croak and let some caller deal with it. `autodie'
uses Carp internally, so setting `PERL5OPT=-MCarp=verbose'
in the environment gives us full stacktraces.
Eric Wong [Sat, 7 Oct 2023 21:24:05 +0000 (21:24 +0000)]
ipc: require fork+SOCK_SEQPACKET for wq_* functions
None of the lei internals works properly without forking and
sockets. The fallback code increases the potential to accidentally
call subs in the wrong process during the teardown phase.
We'll still support ipc_do w/o forking for now since
forking doesn't benefit small indexing runs from -mda and
such.
Eric Wong [Sun, 8 Oct 2023 05:49:34 +0000 (05:49 +0000)]
lei: fix implicit stdin support for pipes
Eric Wong <e@80x24.org> wrote:
> +++ b/t/lei-store-fail.t
> + my $cmd = [ qw(lei import -q -F mboxrd) ];
> + my $tp = start_script($cmd, undef, $opt);
Of course the lack of `-' or `--stdin' only worked on Linux and
NetBSD, but not other BSDs.
-------8<------
Subject: [PATCH] lei: fix implicit stdin support for pipes
st_mode permission bits can't be used to determine if a file or
pipe we have on stdin is readable or not. Writable regular files
can be opened O_RDONLY, and permission bits for pipes are
inconsistent across platforms.
On FreeBSD, OpenBSD, and Dragonfly, only the S_IFIFO bit is set
in st_mode with none of the permission bits set. Linux and
NetBSD have both the read and write permission bits set for both
ends of a pipe, so they're just as inaccurate but allowed
the feature to work before this change.
For now, we'll just assume our users know that stdin is intended
for input and consider any pipe or regular file to be readable.
If we were to be pedantic, we'd check O_RDONLY or O_RDWR
description flags via the F_GETFL fcntl(2) op to determine if a
pipe or socket is readable. However, I don't think it's worth
the code to do so.
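
Both approaches can be sketched in Python for comparison. This is an illustration under the assumptions stated in the message, not the code in lei; the function names are invented for the example.

```python
import fcntl
import os
import stat


def looks_like_input(fd):
    """Treat any pipe or regular file as readable, ignoring permission bits."""
    mode = os.fstat(fd).st_mode
    return stat.S_ISFIFO(mode) or stat.S_ISREG(mode)


def opened_for_reading(fd):
    """Pedantic variant: inspect the access mode of the open file description."""
    accmode = fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE
    return accmode in (os.O_RDONLY, os.O_RDWR)
```

The first check only looks at the file type bits, which are consistent across platforms; the second is the F_GETFL check that was judged not worth the code.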
Eric Wong [Fri, 6 Oct 2023 09:46:01 +0000 (09:46 +0000)]
xap_helper.h: strdup keys for DragonFlyBSD hdestroy(3)
DragonFlyBSD matches OpenBSD behavior in freeing every single key on
hdestroy(3). I suppose hdestroy(3) is neglected enough these days that
nobody cares and we'll likely introduce a small C hash table such as
khash (also used within git).
Eric Wong [Fri, 6 Oct 2023 09:46:00 +0000 (09:46 +0000)]
kqnotify: drop EV_CLEAR (edge triggering)
I'm not entirely certain how it works with the way we use
kevent. I do know IO::KQueue has hard-coded kevent retrievals
to 1000 events so it's conceivable we'd end up missing wakeups
as we don't loop or requeue in callers. So just rely on the
*BSD kernel to provide requeue behavior for us by using
level-triggering.
In any case, this seems to work around t/dir_idle.t failures
on Dragonfly due to a tmpfs bug in all versions up to v6.4.
This ensures script/lei $send_cmd usage is EINTR-safe (since
I prefer to avoid loading PublicInbox::IPC for startup time).
Overall, it saves us some code, too.
Eric Wong [Wed, 4 Oct 2023 17:46:35 +0000 (17:46 +0000)]
makefile: symlink-install: do not depend on realpath
For the Makefile, we can use $(PWD) make macro from make(1posix)
as POSIX requires all environment variables be accessible as
macros, and $PWD is a standard sh(1) environment variable.
lei.sh must quiet the stderr of realpath before falling back to
readlink(1) which is available on NetBSD.
Eric Wong [Wed, 4 Oct 2023 08:50:58 +0000 (08:50 +0000)]
ds: make %AWAIT_PIDS a hash, not hashref
This is more persistent than some of the others and we don't
swap it on use (unlike $nextq or $ToClose). In other words,
it's helpful for communicating its lifetime expectancy is
close to %DescriptorMap and not like queue-type things
such as $ToClose.
Eric Wong [Wed, 4 Oct 2023 08:50:57 +0000 (08:50 +0000)]
ds: cleanup fork + Reset support
We used to have many entries for %Stack, but nowadays it's just
the one used by next_tick, so just replace it with a $cur_runq variable.
I'm reducing reliance on hash keys for things with global scope
to ensure typos can be detected (strict||v5.12 forces us to fix
uses of undeclared variables, but they can't detect typos in
hash keys).
Eric Wong [Wed, 4 Oct 2023 08:50:56 +0000 (08:50 +0000)]
ds: Reset: replace Poller object early
Process shutdown can be chaotic and unpredictable. Try to make
it more predictable by ensuring any PublicInbox::Select object
can't hold references to any objects.
This should fix the following error I saw in syslog during a deploy:
Can't call method "FILENO" on an undefined value at
.../PublicInbox/Select.pm line 34 during global destruction.
Replacing $Poller with PublicInbox::Select (instead of undef-ing
it) means we can avoid adding branches to ->epwait and ->close
before calls to ->ep_mod and ->ep_del, respectively.
Eric Wong [Wed, 4 Oct 2023 03:49:20 +0000 (03:49 +0000)]
lei: get rid of l2m_progress PktOp callback
We already have an ->incr callback we can enhance to support
multiple counters with a single request. Furthermore, we can
just flatten the object graph by storing counters directly in
the $lei object itself to reduce hash lookups.
Eric Wong [Wed, 4 Oct 2023 03:49:16 +0000 (03:49 +0000)]
move all non-test @post_loop_do into named subs
Compared to Danga::Socket, our @post_loop_do API is designed to
make it easier to avoid anonymous subs (and their potential for
leaks in buggy old versions of Perl).
Eric Wong [Tue, 3 Oct 2023 16:18:17 +0000 (16:18 +0000)]
t/lei-q-save: quiet `no email in From: ...' warnings
PublicInbox::Import will warn if it can't extract a valid
address from an email. We need to ensure our tests capture
them to $lei_err instead of spewing them to the terminal.
While we're at it, use autodie and xsys_e to simplify some.
Eric Wong [Tue, 3 Oct 2023 09:26:01 +0000 (09:26 +0000)]
daemon: enable SO_ACCEPTFILTER on NetBSD
NetBSD 5.0+ has accept filter support from FreeBSD; and I
think we can assume all NetBSD is 5.0+ (released in 2009)
nowadays if we're already depending on Perl 5.12 from 2010.
Eric Wong [Tue, 3 Oct 2023 06:43:49 +0000 (06:43 +0000)]
lei: workers exit after they tell lei-daemon
We don't want workers continuing after their stdout has triggered
EPIPE or some other write error.
This fixes xt/lei-onion-convert.t to ensure the quit_waiter_pipe
is fully-closed at daemon teardown during tests. Using the
`exit' perlop still ensures OnDestroy callbacks will fire.
Eric Wong [Tue, 3 Oct 2023 06:43:48 +0000 (06:43 +0000)]
net_reader: support imap.sslVerify + nntp.sslVerify
These options are useful for testing as well as users stuck on
out-of-date systems, dealing with forgetful sysadmins, broken
cronjobs, and/or are willing to risk MITM attacks.
Eric Wong [Tue, 3 Oct 2023 06:43:47 +0000 (06:43 +0000)]
config: fix key-only truthy values with urlmatch
When using --get-urlmatch, we need a way to distinguish
between a key-only entry and a `key=val' pair even if the `val' is empty.
In other words, git interprets `-c imap.debug' as true and
`-c imap.debug=' as false, but an untyped --get-urlmatch
invocation has no way to distinguish between them.
So we must specify we want `--bool' (we're avoiding `--type=bool'
since that only appears in git 2.18+)
IO::Socket::SSL had an uninitialized variable warning from a bad
regexp for a few releases. This will also prepare us to support
imap.sslverify as git does and possibly other TLS-related
options.
Eric Wong [Tue, 3 Oct 2023 06:43:45 +0000 (06:43 +0000)]
net_reader: bail out on NNTP SOCKS connection failure
It takes some effort to get Net::NNTP and IO::Socket::Socks
to play nice together, but we don't want the setsockopt
call to fail on an undefined value. So die with an error
that tries to show various possible error sources.
$SOCKS_ERROR is a special variable, so even using the `//'
(defined-or) operator isn't enough to squelch warnings
about using it in its uninitialized state.
Eric Wong [Mon, 2 Oct 2023 15:00:23 +0000 (15:00 +0000)]
lei: do label/keyword parsing in optparse
Calling vmd_mod_extract after optparse causes the implicit
stdin-as-input functionality to fail, as the implicit stdin
requires a lack of inputs remaining in argv after option
parsing (along with a regular file or pipe as stdin).
This allows commands such as `lei import -F eml +kw:seen'
to work without `--stdin', `-' or any path names when
importing a single message. This also ensures commands like
`lei import +kw:seen' without any inputs/locations will fail
reliably, as the extra +kw: arg won't be a false-positive.
Eric Wong [Mon, 2 Oct 2023 14:58:07 +0000 (14:58 +0000)]
lei up: faster non-thread, single-source incremental query
When using isearch (that is v1/v2 inbox relying on extindex
for search), there's actually no guarantee that IMAP UIDs
are in the correct order with regard to Xapian docids.
Thus we must iterate through every UID(num) to see if it's
suitable to display in a saved search. The old grep filter
(before commit a6fe84489127) was not effective since it
didn't account for the mset->items correspondence.
Fortunately, this bug merely manifests in reduced performance
as of a6fe84489127. Prior to that, it could cause incorrect
keywords and labels to be applied.
Unfortunately, this behavior is hard-to-test so no test case
is included.
Eric Wong [Sun, 1 Oct 2023 09:54:29 +0000 (09:54 +0000)]
lei: deal with clients with blocked stderr
lei/store can get stuck if lei-daemon is blocked, and lei-daemon
can get stuck when a client's stderr is redirected to a pager
that isn't consumed.
So start relying on Time::HiRes::alarm to generate SIGALRM to
break out of the `print' perlop. Unfortunately, this isn't easy
since Perl auto-restarts all writes, so we dup(2) the
destination FD and close the copy in the SIGALRM handler to
force `print' to return.
Most programs (MUAs, editors, etc.) aren't equipped to deal with
non-blocking STDERR, so we can't make the stderr file description
non-blocking.
Another way to solve this problem would be to have script/lei
send a non-blocking pipe to lei-daemon in the {2} slot and
make script/lei splice messages from the pipe to stderr.
Unfortunately, that requires more work, forces more
complexity into script/lei, and slows down normal cases where
stderr doesn't get blocked.
Eric Wong [Sun, 1 Oct 2023 09:54:27 +0000 (09:54 +0000)]
treewide: enable warnings in all exec-ed processes
While forked processes inherit from the parent, exec-ed
processes need the `-w' flag passed to them. To determine
whether or not we should pass them, we must check the `$^W'
global perlvar, first.
We'll also favor `perl -e' over `perl -E' in places where
we don't rely on the latest features, since `-E' incurs
slightly more startup time overhead from loading feature.pm
(while `perl -Mv5.12' does not).
Eric Wong [Sun, 1 Oct 2023 09:54:26 +0000 (09:54 +0000)]
overidx: fix version comparison
We can't use $DBD::SQLite::sqlite_version_number with older versions of
DBD::SQLite. Thus we need to take the $DBD::SQLite::sqlite_version
string (e.g. "3.8.3", not a v-string) and convert it to a v-string with
eval for version comparisons to determine if we can fork multiple
children when using SQLite.
Fixes: fa04201baae9 ("lei: force --jobs=1,1 for SQLite < 3.8.3")
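
To show why the dotted string cannot be compared directly, here is a Python analogue that compares component-wise. It uses integer tuples where the Perl code uses v-strings; the function names and the default minimum argument are assumptions for the sketch, with 3.8.3 taken from the Fixes: line above.

```python
def version_tuple(version):
    """Turn a dotted version string such as "3.8.3" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def can_fork_workers(sqlite_version, minimum="3.8.3"):
    """Hypothetical check: True when sqlite_version is at least minimum."""
    # Tuples compare element by element, so 39 > 8 is handled numerically.
    return version_tuple(sqlite_version) >= version_tuple(minimum)
```

A plain string comparison would sort "3.39.4" before "3.8.3" because "3" sorts before "8", which is the kind of mistake a numeric comparison avoids.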
Eric Wong [Sun, 1 Oct 2023 09:54:25 +0000 (09:54 +0000)]
lei_store: unlink stderr buffer early
While we're at it, ensure we clear the Perl internal EOF
marker before attempting to read the appended-to file
handle since newer Perl may leave the internal EOF marker set.
Eric Wong [Sun, 1 Oct 2023 09:54:24 +0000 (09:54 +0000)]
lei mail-diff: don't remove temporary subdirectory
->{curdir} is localized inside MailDiff->dump_eml anyways, so it
was attempting to remove `undef' :x. Since most messages don't
have too many attachments, save some opcodes on our end and just
let File::Temp::Dir->DESTROY handle all the cleanup.
Eric Wong [Sun, 1 Oct 2023 09:54:23 +0000 (09:54 +0000)]
lei: correct exit signal
The first argument passed to Perl signal handlers is a
signal name (e.g. "TERM") and not an integer that can
be passed to the `exit' perlop. Thus we must look up the
integer value from the POSIX module.
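
The name-to-number lookup has a direct Python counterpart, sketched below. The 128+N exit status is the usual shell convention and is an assumption here; the message does not say which integer lei passes to `exit'.

```python
import signal


def exit_code_for(sig_name):
    """Map a bare signal name such as "TERM" to a 128+N exit status."""
    # Handlers in Perl receive "TERM", not "SIGTERM", so add the prefix
    # before looking up the platform's number for that signal.
    signum = signal.Signals["SIG" + sig_name].value
    return 128 + signum  # 128+N is an assumed convention for this sketch
```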
Eric Wong [Sun, 1 Oct 2023 09:54:22 +0000 (09:54 +0000)]
lei rediff: `git diff -O<order-file>' support
We can't use the `-O' switch since it conflicts with
--only|-O= to specify externals. Thus we'll introduce
a more verbose `--order-file=FILE' option when running
`git diff'.
Eric Wong [Sun, 1 Oct 2023 09:54:21 +0000 (09:54 +0000)]
git: packed_bytes: deal with glob+stat TOCTTOU
There's not much we can do about this aside from just ignoring
errors and considering un-stat-able files as zero-sized.
There's no syscalls which expose FUSE3 `readdirplus' type
functionality to userspace to avoid this problem.
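
The ignore-and-count-as-zero approach looks roughly like this in Python. It is an analogue only; the `*.pack' pattern and the function signature are assumptions, not the Perl code in PublicInbox::Git.

```python
import glob
import os


def packed_bytes(pack_dir):
    """Sum the sizes of *.pack files, counting vanished files as zero."""
    total = 0
    for path in glob.glob(os.path.join(pack_dir, "*.pack")):
        try:
            total += os.stat(path).st_size
        except OSError:
            # The file disappeared between glob and stat (for example
            # during a concurrent repack); treat it as zero-sized.
            pass
    return total
```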
Eric Wong [Sun, 1 Oct 2023 09:54:20 +0000 (09:54 +0000)]
git: improve error reporting
We can use autodie for socketpair to handle errors for us,
but we need Time::HiRes::stat so we must write the error message
ourselves if stat-ing the git executable fails.
Eric Wong [Sun, 1 Oct 2023 09:54:19 +0000 (09:54 +0000)]
process_pipe: don't run `close' unless requested
If a user is relying on reference counts to invalidate FDs
(as we do in many places), rely on them instead of explicit
`close'. This forces us to do a better job of managing refs
and avoiding redundant fields which make our code more fragile.
Eric Wong [Sun, 1 Oct 2023 09:54:17 +0000 (09:54 +0000)]
gcf2: take non-ref scalar request arg
Asking callers to pass a scalar reference is awkward and
doesn't benefit modern Perl with CoW support. Unlike
some constant error messages, it can't save any allocations
at all since there's no constant strings being passed to
libgit2.
Eric Wong [Sat, 30 Sep 2023 15:20:40 +0000 (15:20 +0000)]
git+gcf2client: switch to level-triggered wakeups
Instead of using ->requeue to emulate level-triggered wakeups in
userspace, just use level-triggered wakeups in the kernel to
save some user time at the expense of system (kernel) time. Of
course, the ready list implementation in the kernel via C is
faster than a Perl one on our end.
We must still use requeue if we've got buffered data, however.
Followup-to: 1181a7e6a853 (listener: switch to level-triggered epoll)
Eric Wong [Sat, 30 Sep 2023 15:20:39 +0000 (15:20 +0000)]
git: use Unix stream sockets for `cat-file --batch-*'
The benefit of 1MB potential pipe buffer size (on Linux) doesn't
seem noticeable when reading from git (unlike when writing to v2
shards), so Unix stream sockets seem fine, here.
This allows us to simplify our process management by using the
same socket FD for reads and writes and enables us to use our
ProcessPipe class for reaping (as we can do with Gcf2Client).
Gcf2Client no longer relies on PublicInbox::DS for write
buffering, and instead just waits for requests to complete
once the number of inflight requests hits the MAX_INFLIGHT
threshold as we do with PublicInbox::Git.
We reuse the existing MAX_INFLIGHT limit (18) that was
determined by the minimum allowed PIPE_BUF (512). (AFAIK) Unix
stream sockets have no analogy to PIPE_BUF, but all *BSDs and
Linux I've checked have default SO_RCVBUF and SO_SNDBUF values
larger than the previously-required PIPE_BUF size of 512 bytes.
Eric Wong [Sat, 30 Sep 2023 15:20:38 +0000 (15:20 +0000)]
git: decouple cat_async_retry from POSIX pipe semantics
While pipes guarantee writes of <= 512 bytes to be atomic,
Unix stream sockets (or TCP sockets) have no such guarantees.
Removing the pipe assumption will make it possible for us to
switch to bidirectional Unix stream sockets and save FDs with
`git cat-file' processes as we have with Gcf2Client. The
performance benefit of larger pipe buffers over stream sockets
isn't as relevant when interacting with git as it is with
SearchIdx shards.
Eric Wong [Sat, 30 Sep 2023 00:36:16 +0000 (00:36 +0000)]
lei convert: support reading from v1, v2, and extindex
We should be able to dump all public-inbox and extindex directories
to Maildir/mbox* or IMAP folders. Even unindexed inboxes can be
dumped as long as inbox.lock (or ssoma.lock) exists.
This change likely works for `lei tag' and other lei_input-using
things, as well, but that's untested at the moment. I mainly
want to be able to use `lei convert' to benchmark some upcoming
changes...
Eric Wong [Sat, 30 Sep 2023 00:36:15 +0000 (00:36 +0000)]
lei_input: always prefix `maildir:' internally
This allows us to reduce stats for `new' and `cur' subdirs
of the Maildir and will also make it easier for us to support
MH, v2, v1, and extindex directories as inputs.
Eric Wong [Fri, 29 Sep 2023 10:41:05 +0000 (10:41 +0000)]
git: fix unused code path for cat-file stderr reset
We haven't used _bidi_pipe idempotently in a while, so
the stderr was never getting reset on reads.
This isn't fully useful when using async eeeae20893a25956
(imap: use git-cat-file asynchronously, 2020-06-10)
So instead of truncating it on reads, we'll truncate
immediately after reading and rely on O_APPEND to keep
new writes at the end.
Fortunately, this stderr error checking isn't used
outside of solver (which is synchronous).
Eric Wong [Fri, 29 Sep 2023 02:44:06 +0000 (02:44 +0000)]
git: calculate MAX_INFLIGHT properly in Perl
Unlike C, Perl automatically converts quotients to double-precision
floating point even with UV/IV numerators and denominators. So
force the intermediate quotient to be an integer before
multiplying it by the size of each inflight array element.
This bug was inconsequential for all platforms since d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)
and inconsequential on most (or all?) Linux even before that due
to the larger 4096-byte PIPE_BUF on Linux.
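
Python 3 happens to share this behavior with Perl (`/' yields a float even for integer operands), so the difference can be shown with a short sketch. The request size and the number of array elements per request below are illustrative assumptions, not values taken from PublicInbox::Git; only the 512-byte PIPE_BUF minimum comes from the log.

```python
PIPE_BUF_MIN = 512     # minimum PIPE_BUF mentioned in this log
REQUEST_SIZE = 82      # hypothetical bytes per request line
FIELDS_PER_ENTRY = 3   # hypothetical array elements per inflight request

# True division keeps the fractional part of the quotient, so the product
# is not a whole multiple of FIELDS_PER_ENTRY.
float_result = PIPE_BUF_MIN / REQUEST_SIZE * FIELDS_PER_ENTRY

# Truncating the intermediate quotient first yields a whole number of
# requests, which is then scaled by the per-request element count.
int_result = PIPE_BUF_MIN // REQUEST_SIZE * FIELDS_PER_ENTRY
```

With these illustrative numbers the truncated form gives 18 while the floating-point form gives roughly 18.7; with a 4096-byte PIPE_BUF the two would differ by more than a whole request.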