Eric Wong [Wed, 30 Aug 2023 05:10:39 +0000 (05:10 +0000)]
treewide: drop MSG_EOR with AF_UNIX+SOCK_SEQPACKET
It's apparently not needed for AF_UNIX + SOCK_SEQPACKET as our
receivers never check for MSG_EOR in "struct msghdr".msg_flags
anyways. I don't believe POSIX is clear on the exact semantics
of MSG_EOR on this socket type. This works around truncation
problems on OpenBSD recvmsg when MSG_EOR is used by the sender.
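A minimal Python sketch (not code from this project) of the behavior
relied on here: AF_UNIX + SOCK_SEQPACKET keeps record boundaries even
when the sender passes no MSG_EOR. It assumes a platform which supports
SOCK_SEQPACKET for AF_UNIX (e.g. Linux); the function name is
hypothetical.

```python
import socket


def seqpacket_roundtrip(messages):
    """Send each message without MSG_EOR; return what the peer receives."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for msg in messages:
            a.send(msg)  # flags=0, so no MSG_EOR is set by the sender
        received = []
        for _ in messages:
            data, _ancdata, flags, _addr = b.recvmsg(65536)
            # MSG_TRUNC would mean the record did not fit in our buffer
            if flags & socket.MSG_TRUNC:
                raise RuntimeError("record truncated")
            received.append(data)
        return received
    finally:
        a.close()
        b.close()
```

Each recvmsg call returns exactly one record, so the receiver never
needs to inspect MSG_EOR to find message boundaries.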
Eric Wong [Tue, 29 Aug 2023 17:20:16 +0000 (17:20 +0000)]
t/spawn.t: workaround OpenBSD RLIMIT_CPU delays
RLIMIT_CPU on OpenBSD doesn't work reliably with few syscalls or
on mostly idle systems. Even at its most accurate, it takes an
extra second to fire compared to FreeBSD or Linux due to
internal accounting differences, but worst case even the SIGKILL
can be 50s delayed.
So rewrite the CPU burner script in Perl where we can unblock
SIGXCPU and reliably use more syscalls.
Štěpán Němec [Mon, 28 Aug 2023 10:45:13 +0000 (12:45 +0200)]
public-inbox-init: honor umask when creating config file
Creating config 0600 disregarding umask breaks scenarios where daemons
run with credentials different from config owner (but need to read the
config).
File::Temp defaults to 0600, which is unsuitable for the
recommended/typical scenario of daemons running unprivileged and with
UID different from $PI_CONFIG owner, as the daemons need to read
$PI_CONFIG.
Respecting umask might end up creating world-unreadable config, too,
but for people who use such umask that's expected behavior.
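The same idea can be sketched in Python, whose tempfile.mkstemp also
creates files as 0600 regardless of umask. This is an illustration
under that analogy, not the Perl change itself; `create_config' and
`current_umask' are hypothetical names.

```python
import os
import tempfile


def current_umask():
    """Return the process umask (os.umask can only read it by setting it)."""
    mask = os.umask(0)
    os.umask(mask)
    return mask


def create_config(path, text):
    """Write `text` to `path` atomically, with permissions honoring umask."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)  # always created as 0600
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
            # widen 0600 to what a plain open() would have produced
            os.fchmod(fh.fileno(), 0o666 & ~current_umask())
        os.rename(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

With a umask of 022 the result is 0644, so a daemon running under a
different UID can read it; with 077 it stays 0600, as the user asked.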
Štěpán Němec [Mon, 28 Aug 2023 10:42:42 +0000 (12:42 +0200)]
ci/profiles.sh: fix case matching logic
'-' could never match, remove that alternative (it might have been a
typo of '--', but that is already covered by '*--|--*' ('*' matches
the null string)).
Replace '*--*' with the equivalent '*' ('--' is always present).
It would seem clearer to just replace the whole case command with
something like '[ "$ID" -a "$VERSION_ID" ] && break' (or the
POSIX-non-deprecated equivalent '[ "$ID" ] && [ "$VERSION_ID" ]' ); I
assume a preference of using case here (e.g., to avoid syscall
overhead in case [ is not implemented as a shell builtin (which seems
far-fetched given the context, though)).
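The matching rules above can be checked with Python's fnmatch as a
stand-in for shell case patterns. This sketch assumes the case word is
built as "$ID--$VERSION_ID" (inferred from `--' always being present);
`classify' and its return values are hypothetical.

```python
from fnmatch import fnmatchcase


def classify(os_id, version_id):
    """Mimic the simplified case command on the joined word."""
    word = f"{os_id}--{version_id}"  # assumed shape of the case word
    # `*' matches the null string, so these two patterns also cover `--'
    if fnmatchcase(word, "*--") or fnmatchcase(word, "--*"):
        return "incomplete"
    # everything else contains `--' by construction, so `*' suffices
    return "ok"
```

A lone `-' pattern can never match such a word, since the word is at
least two characters long.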
Eric Wong [Sat, 26 Aug 2023 20:14:04 +0000 (20:14 +0000)]
t/xap_helper: skip test if missing SCM_RIGHTS support
xap_helper currently relies on FDs passed via SCM_RIGHTS for
robustness against $TMPDIR failures and over-eager FS cleanup
tasks. This depends on stable syscall numbers (Linux) or
Inline::C||Socket::MsgHdr being available, though, as Perl5
itself doesn't support SCM_RIGHTS.
We could probably add FIFO support to xap_helper for portability
to systems where neither Inline::C nor Socket::MsgHdr are available,
but that's for another day.
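For readers unfamiliar with SCM_RIGHTS, here is a small Python sketch
of passing an open file descriptor over an AF_UNIX socket. It is not
the xap_helper protocol; it assumes Python 3.9+ (for socket.send_fds
and socket.recv_fds), and the function names are hypothetical.

```python
import os
import socket


def send_file(sock, fileobj, tag=b"fd"):
    """Pass the descriptor behind `fileobj` to the peer via SCM_RIGHTS."""
    socket.send_fds(sock, [tag], [fileobj.fileno()])


def recv_file(sock):
    """Receive one descriptor; return (tag, binary file object)."""
    tag, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return tag, os.fdopen(fds[0], "rb")
```

Because the receiver gets a descriptor rather than a pathname, the
file may already be unlinked, which is what makes this robust against
temporary directory cleanup.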
Eric Wong [Sat, 26 Aug 2023 06:13:17 +0000 (06:13 +0000)]
xap_helper: fix C++-specific warnings
While initialization of zeroed structs in C is done via `{0}',
I've just learned from g++(1) that C++ uses `{}'. I can't seem
to get use of a single designated initializer to compile without
warnings in C++, either, so we'll just initialize them as zero
and assign them ASAP for __cleanup__ functions.
This fixes compilation warnings under -Wextra in g++ (Debian 10.2.1-6)
which adds -Wmissing-field-initializers. This also fixes compilation
warnings under -Wall in clang (FreeBSD 13.0.0) from -Wmissing.
The `xapian-bindings-perl' package contains the Xapian.pm
SWIG bindings, but doesn't adhere to the existing convention
of naming system packages after the Perl package name itself
using: "p5-${\($Perl_package_name =~ s/::/-/gr)}".
Eric Wong [Thu, 24 Aug 2023 22:07:46 +0000 (22:07 +0000)]
cindex: dump cidx shards before inboxes
Since cidx shards used for associations are typically bigger
than individual inboxes, we'll dump them first to get better
work scheduling for xap_helper processes.
This gives roughly a 5% performance improvement when doing
a full associate on (git+lore).kernel.org.
Eric Wong [Thu, 24 Aug 2023 01:22:36 +0000 (01:22 +0000)]
xap_helper: reopen+retry in MSetIterator loops
It's possible to hit a DatabaseModifiedError while iterating
through an MSet. We'll retry in these cases and cleanup some
code in both the Perl and C++ implementations.
Eric Wong [Thu, 24 Aug 2023 01:22:35 +0000 (01:22 +0000)]
cindex: implement dump_roots in C++
It's now just `dump_roots' instead of `dump_shard_roots', since
this doesn't need to be tied to the concept of shards. I'm
still shaky with C++, but intend to keep using stuff like
hsearch(3) to make life easier for C hackers :P
Eric Wong [Thu, 24 Aug 2023 01:22:34 +0000 (01:22 +0000)]
cindex: fix sorting and uniqueness
We can't rely on combining the `-u' and `-k1,1' switches of POSIX
sort(1) to do what we want. So only rely on `sort -k1,1' while
introducing a small Perl helper to fold identical prefixes into
one line. In other words, input such as:
ORS is currently the comma (`,') for inbox IDs, but it'll be a
space (` ') for coderepo root IDs. This implementation also
combines identical IDs in the 2nd column. Thus:
Becomes a single `deadbeef 0' line thanks to the use of
XS List::Util::uniq (which beats a pure Perl hash).
I attempted to implement this in awk but Perl is close enough to
gawk in performance while being shorter and easier-to-understand
due to List::Util::uniq. mawk was faster, but still not enough
to matter as the bottleneck is from iterating through Xapian
MSets.
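The folding step can be sketched in Python as follows. This is not the
Perl helper itself: `fold_prefixes' is a hypothetical name, input
lines are assumed to be pre-sorted on the first column and stripped of
newlines, and dict.fromkeys stands in for List::Util::uniq.

```python
from itertools import groupby


def fold_prefixes(sorted_lines, ors=","):
    """Fold lines sharing a first column into one line of unique IDs."""
    pairs = (line.split(None, 1) for line in sorted_lines)
    out = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        # dict.fromkeys keeps first-seen order while dropping duplicates
        ids = dict.fromkeys(val for _key, val in group)
        out.append(f"{key} {ors.join(ids)}")
    return out
```

For example, two identical `deadbeef 0' lines fold into a single
`deadbeef 0' line, while distinct IDs under one prefix are joined
with the chosen separator.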
Eric Wong [Thu, 24 Aug 2023 01:22:33 +0000 (01:22 +0000)]
introduce optional C++ xap_helper
This allows us to perform the expensive "dump_ibx" operations in
native C++ code using the Xapian C++ library. This provides the
majority of the speedup with the -cindex --associate switch.
Eventually this may be expanded to cover all uses of Xapian
within the project to ensure we have access to Xapian APIs which
aren't available in XS|SWIG bindings; and also for
ease-of-installation on systems which don't provide
pre-packaged Perl Xapian bindings (e.g. OpenBSD 7.3) but
do provide Xapian development libraries.
Most of the C++ code is still C, as I'm not remotely familiar
with C++ compared to C. I suspect many users and potential
hackers being from git, Linux kernel, and glibc world are in the
same boat.
Eric Wong [Thu, 24 Aug 2023 01:22:31 +0000 (01:22 +0000)]
cindex: read-only association dump
This will eventually allow associating coderepos with inboxes
and vice-versa; avoiding the need for manual configuration via
tedious publicinbox.*.coderepo directives.
I'm not sure how this should be stored for WWW, yet, but it's
required since it takes about 8 hours to do this fully across
lore and git.kernel.org.
Eric Wong [Sat, 19 Aug 2023 08:30:51 +0000 (08:30 +0000)]
isearch: avoid hex string for Xapian sortable_serialise
While a string representing an integer in hex is fine for DBI and
SQLite, Xapian's sortable_serialise requires a Perl integer value.
So just retrieve the last Xapian DB document ID in this rare
code path because we can't use 64-bit integer literals in some
32-bit Perl builds (e.g. OpenBSD on i386).
Fixes: be2a0a353d60 ("isearch: support 64-bit article numbers for SQLite query")
Eric Wong [Thu, 17 Aug 2023 07:23:10 +0000 (07:23 +0000)]
t/nntp.t: attempt to quiet spurious uninitialized warnings
When running via t/run.perl ("make check-run") to reduce test
startup time, t/nntp.t occasionally hits uninitialized variable
warnings in the quote_str sub. I can't reproduce these
reliably, but scoping subs in tests reduces the chance of
conflict when we reuse interpreters.
Eric Wong [Wed, 16 Aug 2023 08:07:12 +0000 (08:07 +0000)]
search: all_terms: remove needless prefix check
The ->allterms_{begin,end} methods of Xapian::Database already
filter on prefix natively. Thus there's no need to do
filtering ourselves (unlike per-document ->termlist_{begin,end}).
Eric Wong [Thu, 27 Jul 2023 21:18:55 +0000 (21:18 +0000)]
clone: allow running without DBI / DBD::SQLite
Due to historic reasons, LeiQuery.pm gets loaded with LEI.pm and
-clone depends on LEI. So delay loading any DBI-dependent
modules until querying is actually required.
Eric Wong [Thu, 27 Jul 2023 21:18:54 +0000 (21:18 +0000)]
Makefile.PL: *.cols: account for non-UTF-8-aware awk
When checking line length limits, the `length()' function of
mawk doesn't count non-ASCII characters properly in UTF-8
locales. Force the man(1) output to use C locale and use normal
`-' instead of multi-byte dash characters.
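A short Python illustration of the mismatch, assuming the non-aware
awk effectively counts bytes rather than characters (the commit only
says the count is wrong); both function names are hypothetical.

```python
def char_len(line):
    """Length as a UTF-8-aware tool would report it (code points)."""
    return len(line)


def byte_len(line):
    """Length as a byte-counting length() would report it."""
    return len(line.encode("utf-8"))


# U+2010 HYPHEN takes three bytes in UTF-8, so a line made of
# multi-byte dashes looks longer to a byte-counting check.
sample = "a\u2010b"
```

Using ASCII `-' keeps both counts equal, so the line length limit is
checked consistently.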
Eric Wong [Fri, 14 Jul 2023 09:28:47 +0000 (09:28 +0000)]
tests: t/run.perl: fix invocations with <10 tests
We must account for the maximum index of an array to avoid
filling unused slots with `undef' from out-of-bounds reads.
This is needed to avoid undefined entry errors in workers when
fewer than 10 tests are run. We'll also silence the message
when a single test is run.
While I was diagnosing this, I also noticed a small
simplification and optimization in our generation of $todo_buf
since I initially thought that was the cause of undefined
entry errors in the $todo arrayref.
Eric Wong [Thu, 13 Jul 2023 05:39:17 +0000 (05:39 +0000)]
t/imapd: workaround a Perl 5.36.0 readline regression
Buffered readline (and read) ops under Perl 5.36.0 fail to read
new data after writes are made by other file handles (or
processes).
To fix and improve our test, introduce a new, (currently)
test-only TailNotify class to use inotify or kevent if available
to work around it while avoiding infinite polling loops. Further
refinements to these test APIs are likely since we use the same
pattern for testing daemons in many places.
This also fixes the TEST_KILL_IMAPD condition in t/imapd.t under
GNU/Linux; AFAIK that test was never reliable under FreeBSD.
Eric Wong [Thu, 13 Jul 2023 05:40:20 +0000 (05:40 +0000)]
doc: HACKING: drop bit about Debian 9.x (stretch)
It's oldoldstable, by now; just refer to Debian stable as
the primary but keep LTS distros in mind because stuff like
CentOS 7.x needs to remain supported.
Eric Wong [Tue, 11 Jul 2023 10:29:28 +0000 (10:29 +0000)]
Makefile.PL: depend on IO::Poll in case distros split it out
IO::Poll is part of the Perl standard library, but there's
always a chance distros will make it part of another package
since it's not portable to non-POSIX-like OSes.
Eric Wong [Wed, 21 Jun 2023 10:16:57 +0000 (10:16 +0000)]
t/solver_git: drop needless `use' and Plack deps
`lei (blob|rediff)' works without Plack installed, so don't put
a dependency on Plack or anything related to HTTP aside from
the URI module which we use everywhere. This only enables testing
the solver component on systems without Plack (as the actual lei
functionality has always worked without Plack).
Eric Wong [Fri, 16 Jun 2023 23:13:01 +0000 (23:13 +0000)]
www: use correct threadid for per-thread search
For individual public-inboxes relying on extindex for per-inbox
search, we must use the threadid from the extindex over.sqlite3
rather than the per-inbox over.sqlite3 file.
Eric Wong [Thu, 15 Jun 2023 09:50:53 +0000 (09:50 +0000)]
lei: make --dedupe=content always account for Message-IDs
The content dedupe logic was originally designed for v2 public
inboxes as a fallback for when the importer sees identical
Message-IDs. Thus it did not account for Message-ID(s) in
the message itself.
This change doesn't affect saved searches (the default when
writing to a pathname or IMAP). It affects --no-save, and
outputs to stdout (even if stdout is redirected to a file).
Prior to this change, lei reused the v2 logic as-is without
accounting for Message-IDs anywhere with `--dedupe=content'
(the default). This could cause messages to be skipped when
the content matches despite Message-IDs being different.
So with this change, `lei q --dedupe=content' will hash the
Message-ID(s) in the message to ensure messages with different
Message-IDs are NOT deduplicated.
Whether or not this change is a bug fix or introduces a regression
is actually debatable. In my mind, it is better to err on the
side of showing too many messages rather than too few, even if
the actual contents of the message are identical. Making saved
searches deduplicate without accounting for Message-IDs would be
more difficult, too.
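The effect can be sketched in Python. This is not the project's
ContentHash algorithm: for brevity it hashes only the Subject and
body, and `content_key' is a hypothetical name. The point is only
that feeding Message-IDs into the digest keeps otherwise identical
messages apart.

```python
import hashlib
from email import message_from_string


def content_key(raw, with_mids=True):
    """Digest of a message; optionally mixes in its Message-ID(s)."""
    msg = message_from_string(raw)
    h = hashlib.sha256()
    if with_mids:
        for mid in msg.get_all("Message-ID", []):
            h.update(b"Message-ID\0" + mid.strip().encode() + b"\0")
    h.update(b"Subject\0" + (msg.get("Subject") or "").encode() + b"\0")
    h.update(b"Body\0" + msg.get_payload().encode())
    return h.hexdigest()
```

With with_mids=False, two copies differing only in Message-ID collapse
to one key (the old behavior); with the default they do not.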
Eric Wong [Thu, 15 Jun 2023 08:46:37 +0000 (08:46 +0000)]
lei import: set +(L|kw) on already-imported blobs
When import hits blobs it's already seen, we'll add labels
regardless in order to match the behavior of other inexact
matches. This is useful when importing exact copies of
messages which exist in multiple mailboxes.
I noticed this when I had a message imported from my normal IMAP
`INBOX', but also copied it to a different folder for future
reference.
Eric Wong [Fri, 9 Jun 2023 10:31:08 +0000 (10:31 +0000)]
add compat package for List::Util::uniqstr
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
Eric Wong [Fri, 9 Jun 2023 10:31:07 +0000 (10:31 +0000)]
search: hoist out do_enquire for codesearch
Reusing this bit seems to make sense as mail and code search
are similar enough w.r.t. setting up sort options. This
deduplication will become more useful as -cindex will
likely combine code and mail search to generate associations
between inboxes and code repos.
Eric Wong [Fri, 9 Jun 2023 10:31:06 +0000 (10:31 +0000)]
search: add comments wrt codesearch, reduce ops
Add some comments about various usages of xdb_shards_flat and
mset since the addition of CodeSearch (and other search things)
subclassing it may become confusing.
Since we're in the area, we can also avoid extra hash
lookups/initializations and reduce Perl ops in various places.
Eric Wong [Thu, 8 Jun 2023 18:26:08 +0000 (18:26 +0000)]
t/lei.t: quiet newline warning on older Perls
Perl < 5.22 warned on newlines in the middle of a string instead
of just the end. Workaround it by disabling all warnings on older
Perls while running File::Path::mkpath.
Eric Wong [Thu, 8 Jun 2023 18:04:54 +0000 (18:04 +0000)]
xapcmd: rely on File::Temp cleanup for temporary dir
remove_tree from File::Path 2.09 (from Perl 5.16.3 on CentOS 7.x)
doesn't seem to work properly on File::Temp objects. Since
File::Temp->newdir sets CLEANUP=>1 by default anyways, we'll
just rely on that to perform cleanup instead of doing it ourselves.
Eric Wong [Wed, 31 May 2023 22:10:01 +0000 (22:10 +0000)]
www: more restrictive query string parsing
Only allow single-character query keys to prevent clients from
wasting memory in Perl's hash tables. We'll also perform the
utf8::decode and tr/+/ / calls once on the whole query string
to reduce op calls.
This also avoids creating an empty hash in the common case
when the QUERY_STRING is empty and instead relies on
auto-vivification of Perl.
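A rough Python equivalent of the stricter parsing, for illustration
only: `parse_query' is a hypothetical name, and splitting on `&'
alone is an assumption about the separator.

```python
from urllib.parse import unquote


def parse_query(query_string):
    """Parse a query string, keeping only single-character keys."""
    params = {}
    if not query_string:
        return params
    # translate `+' to space once, on the whole string, before splitting
    query_string = query_string.replace("+", " ")
    for pair in query_string.split("&"):
        key, _sep, val = pair.partition("=")
        if len(key) == 1:  # longer keys are dropped, bounding memory use
            params[key] = unquote(val)
    return params
```

An encoded `%2B' still decodes to a literal `+' because the
translation happens before percent-decoding.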
Eric Wong [Tue, 9 May 2023 09:15:30 +0000 (09:15 +0000)]
cindex: fix --no-scan no-op non-termination
We must account for the shards_active() recursing upon itself
when outside DS->event_loop. This is tricky, unfortunately, but
--no-scan isn't a common mode of operation. Noticed while
developing the monster --associate functionality to
automatically create bidirectional associations of
inboxes/extindices to coderepos.
Eric Wong [Mon, 8 May 2023 16:58:14 +0000 (16:58 +0000)]
cindex: fix --no-scan --prune
We must define $GIT_TODO to be non-undef when using --no-scan
for prune-only invocations to run. I'm leaning towards making
--no-scan a publicly-documented switch for -cindex; but I'm
less certain about documenting it for -index and -extindex...
Eric Wong [Sat, 6 May 2023 22:40:03 +0000 (22:40 +0000)]
isearch: support 64-bit article numbers for SQLite query
While IMAP UIDs are specified as 32-bit in RFC 3501, there's no
reason we can't support 64-bit article numbers on our end when
the time comes. Neither NNTP nor POP3 have the 32-bit
limitation, even, so it's not inconceivable that IMAP will drop
that limitation at some point, too.
Eric Wong [Thu, 4 May 2023 11:06:42 +0000 (11:06 +0000)]
xcpdb: support cindex upgrades and resharding
xcpdb is necessary for upgrading Xapian backends (e.g. glass to
honey), thus codesearch indices (cindex) must be supported.
Resharding is also useful if CPU count is altered on system
upgrades or downgrades.
cindex Xapian sharding is completely different than anything
else we do, so the resharding operation must be a special case
based on existing cindex sharding rules.
Eric Wong [Wed, 3 May 2023 11:42:15 +0000 (11:42 +0000)]
cindex: --prune + --exclude= drops repo information
--exclude= alone only prevents a coderepo from being indexed in
a particular invocation, but --prune will purge all traces of it
to ensure --update doesn't pick it up again w/o --exclude=
(unless --project-list= includes it).
Eric Wong [Mon, 1 May 2023 23:29:35 +0000 (23:29 +0000)]
daemon: improve handling of Git->async_abort
The $oid arg for Git->cat_async is defined on async_abort using
the original request, so use undefined $type to distinguish that
case in caller-supplied callbacks. async_abort isn't common, of
course, but sometimes git subprocesses can die unexpectedly.
Eric Wong [Sat, 29 Apr 2023 20:02:14 +0000 (20:02 +0000)]
solver_git: don't spew to daemon err on git apply failure
Too many patches don't apply (due to coderepos being a PITA to
associate) and interested admins can check for 404s to diagnose
them, anyways. This reduces the noise in syslog/stderr for
public-facing daemons.
Eric Wong [Fri, 28 Apr 2023 21:07:30 +0000 (21:07 +0000)]
git: make check_async callbacks identical to cat_async
This simplifies Git->cat_async_step and fixes Git->async_abort,
the latter of which was passing arguments improperly for the
--batch-check (or `info') case at the cost of making the few
check_async callers handle an extra argument.
The extra (PublicInbox::Git) $self argument for check_async
callbacks is now gone, as avoiding the temporary cyclic
reference doesn't seem worthwhile since the temporary cyclic
reference appears in the ->cat_async code paths, too.
Eric Wong [Wed, 26 Apr 2023 00:49:29 +0000 (00:49 +0000)]
xcpdb: preserve indexlevel for extindex
This likely fixes indexlevel preservation for some v2 inboxes on some
systems, too, since (apparently) we need to sort shards
numerically to get Xapian metadata working properly on a
combined (multi-shard) Xapian DB.
Eric Wong [Tue, 25 Apr 2023 11:02:58 +0000 (11:02 +0000)]
emergency: make error messages more consistent
Showing "failed" is needless if we already know the program
is die-ing. We'll prefix "BUG:" to bug messages, "W:" to
non-fatal warnings to be consistent with our newer code such
as lei.
Eric Wong [Tue, 25 Apr 2023 11:02:55 +0000 (11:02 +0000)]
cindex: simplify store_repo
It's easier to just create a new Xapian::Document and
replace it rather than to load and edit it. I don't
know if there's any performance difference one way or
the other, but fewer branches help with maintainability
and a smaller optree lowers memory use and startup
time.
Eric Wong [Tue, 25 Apr 2023 11:02:54 +0000 (11:02 +0000)]
cindex: simplify tmpfile management for indexing
I considered making this a pipe, but we must avoid spawning
`git log --stdin --no-walk=unsorted' for the no-op case since that
still emits a commit if stdin is empty. So just get rid of an
unnecessary loop and do lseek(2) inside workers for parallelism.
Eric Wong [Tue, 25 Apr 2023 11:02:53 +0000 (11:02 +0000)]
cindex: drop unneeded module use
I initially thought I'd use the PublicInbox::Eml module and rely
on --pretty=mboxrd; but eventually decided against it since
it wasn't saving any code.
Eric Wong [Tue, 25 Apr 2023 10:50:52 +0000 (10:50 +0000)]
content_digest_dbg: improve display of To:/Cc: diffs
To: and Cc: headers can be long and differences in long lines
are easier to view when broken apart. Just split by /,/ since
Data::Dumper will delimit with "," anyways.
Eric Wong [Tue, 25 Apr 2023 10:50:51 +0000 (10:50 +0000)]
mail_diff: show headers differences in WWW /$MSGID/d/ view
Some messages only differ in the To/Cc headers because some
MTAs seem to normalize them. I was getting confused when I
saw some /d/ endpoints with no visible differences.
Eric Wong [Tue, 25 Apr 2023 10:50:50 +0000 (10:50 +0000)]
mail_diff: match ContentHash EOL and EOM behavior more closely
ContentHash currently doesn't convert CRCRLF to LF. Perhaps it
should, but for now, have diff behavior match the actual
comparison behavior used for dedupe and omit all trailing
whitespace for diff.
Eric Wong [Tue, 25 Apr 2023 10:50:49 +0000 (10:50 +0000)]
mid+contenthash: eliminate needless local variable captures
It's possible in theory that Perl could be smarter and free
memory a tad sooner this way. Regardless, fewer lines of code
is easier-to-navigate/read and can save optree size and reduce
parsing times.
Eric Wong [Sat, 22 Apr 2023 10:33:42 +0000 (10:33 +0000)]
cindex: rewrite prune (again) for speed
With my partial git.kernel.org mirror, this brings a full prune
down from ~75 minutes to under 5 minutes using git 2.19+. This
speedup even applies to users on slow storage (rotational HDD).
First off, xapian-delve(1) is nearly 10x faster for dumping
boolean terms by prefix than the equivalent Perl code with
Xapian bindings. This performance difference is critical since
we need to check over 5 million commits for pruning a partial
git.kernel.org mirror.
We can use sed(1) and sort(1) to massage delve output into
something suitable for the first comm(1) input.
For the second comm(1) input, the output of `git cat-file
--batch-check --batch-all-objects' against all indexed git repos
with awk(1) filtering provides the necessary output for
generating a list of indexed-but-no-longer accessible commits.
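The comparison step can be pictured as a streaming merge over two
sorted inputs. The Python sketch below assumes the wanted output is
"lines only in the first input" (what `comm -23' would print; the
exact flags are not stated here), and that both inputs are sorted and
free of duplicates. `only_in_first' is a hypothetical name.

```python
def only_in_first(first, second):
    """Yield items of sorted `first` which are absent from sorted `second`."""
    it = iter(second)
    cur = next(it, None)
    for item in first:
        # advance the second stream until it catches up with `item`
        while cur is not None and cur < item:
            cur = next(it, None)
        if cur is None or cur != item:
            yield item  # indexed, but no longer accessible
```

Like comm(1), this never holds more than one line per input in
memory, which is why sorted input is a precondition.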
sed(1) and awk(1) are POSIX standard tools which can be roughly
2x faster than equivalent Perl for simple filters, while
sort(1) is designed to handle larger-than-memory datasets
efficiently (unlike the `sort' perlop).
With slow storage and git <2.19, the switch to --batch-all-objects
actually results in a performance regression since having git
perform sorting results in worse disk locality than the previous
sequential iteration by Xapian docid. git 2.19+ users with
`--unordered' support benefit from improved storage locality;
and speedups from storage locality dwarf the extra overhead of
an extra external sort(1) invocation.
Even with consumer-grade SATA-II SSDs, the combo of --unordered
and sort(1) provides a noticeable speedup since SSD latency
remains a factor for --batch-all-objects.
git <2.19 users must upgrade git to get acceptable performance
on slow storage and giant indexes, but git 2.19 was released
nearly 5 years ago so it's probably a reasonable requirement for
performance.
The only remaining downside of this change for all users
is the extra temporary disk space for sort(1) and comm(1);
but the speedup provided with git 2.19+ is well worth it.
Eric Wong [Thu, 20 Apr 2023 10:23:02 +0000 (10:23 +0000)]
lei_mail_sync: prepare to support SHA-256
I'm not sure how combining SHA-1 and SHA-256 in a single git
repo will work, eventually. But this is an obvious place to do
the right thing if we ever see a 64-byte hex string (unless git
adds support for another hash which uses 64-byte hex string
representations, which would break many assumptions elsewhere,
too...).
Eric Wong [Thu, 20 Apr 2023 00:53:30 +0000 (00:53 +0000)]
cindex: limit parallelism of extensions.objectFormat check
We can't safely spawn all `git config' processes of every
indexed git directory at once due to system resource limits
(RLIMIT_NPROC, RLIMIT_NOFILE). So queue them up and limit
parallelism that way.
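A generic Python sketch of bounding child process concurrency. It is
not the cindex event loop code: it uses a thread pool as the queue,
and `run_limited' is a hypothetical name. At most `limit' children
(and their pipes) exist at any moment.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_limited(cmds, limit=4):
    """Run each command, never more than `limit` at once; return stdouts."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # the pool queues pending commands and only starts a new child
    # when one of the `limit` workers becomes free
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(run, cmds))


if __name__ == "__main__":
    demo = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
    print(run_limited(demo, limit=2))
```

Results come back in input order, so callers can pair each output
with the directory it was queried for.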
Eric Wong [Wed, 19 Apr 2023 21:54:48 +0000 (21:54 +0000)]
cindex: support sha256 coderepos alongside sha1
This special support is only needed for --prune at the moment
since the indexing side works on a per-repo basis. There's no
automated tests, yet, but it seems to work well on my sha256
projects when sharing a cindex with sha1 projects.
Eric Wong [Tue, 18 Apr 2023 18:39:14 +0000 (18:39 +0000)]
www_coderepo: rescan cgit project-list for new coderepos
Coderepo changes are probably more common than inbox changes, so
it probably makes sense to rescan and look for new coderepos on
404s, especially since we serve mirrored manifest.js.gz as-is.
I noticed my git.kernel.org mirror was serving manifest.js.gz
pointing to irretrievable repositories. This should stop that.
We'll also drop the underscore ('_') and use `coderepo'
everywhere to be consistent with our documentation.
We may serve new inboxes in a similar way down the line, too;
but this change only affects coderepos for now since we can
guarantee the inbox manifest.js.gz never contains irretrievable
inboxes as it's dynamically generated.
Eric Wong [Wed, 12 Apr 2023 10:17:42 +0000 (10:17 +0000)]
listener: support multi-accept like nginx
While accepting a single connection at-a-time is likely best for
multi-worker and/or load-balanced deployments; accepting
multiple connections at once should be less bad on overloaded
single-worker systems.
We can't automatically pick the best value here since worker
counts are dynamic via SIGTTIN/SIGTTOU. Process managers
(e.g. systemd) can also spawn multiple instances sharing a
single listener with no knowledge sharing between listeners.
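Multi-accept amounts to draining several pending connections per
readiness notification. A minimal Python sketch, assuming a
non-blocking listener; `accept_batch' and the `multi_accept'
parameter name are hypothetical here.

```python
import socket


def accept_batch(listener, multi_accept=1):
    """Accept up to `multi_accept` pending connections from `listener`."""
    conns = []
    for _ in range(multi_accept):
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # backlog drained before reaching the limit
        conns.append(conn)
    return conns
```

With multi_accept=1 this is the usual one-at-a-time behavior, which
leaves remaining connections for other workers sharing the listener.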
Eric Wong [Wed, 12 Apr 2023 06:19:10 +0000 (06:19 +0000)]
lei_mail_sync: cleanup stale/dangling fids if possible
I'm not sure how it happens or if/when it was fixed, but my
earliest lei installations have hit some
"E: fid=$fid for $oidhex unknown" messages on `lei import'
invocations.
This really should've enabled the foreign keys pragma to begin
with; but we'll probably start using that in the future. For
now, at least rely on a transaction to keep things consistent
in SQLite.
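The transactional cleanup can be sketched with Python's sqlite3
module. The table and column names below are illustrative only, not
necessarily the lei_mail_sync schema, and `prune_dangling' is a
hypothetical name.

```python
import sqlite3


def prune_dangling(db):
    """Delete rows whose fid has no folder; return the number removed."""
    with db:  # commits on success, rolls back if an exception escapes
        cur = db.execute(
            "DELETE FROM blob2num "
            "WHERE fid NOT IN (SELECT fid FROM folders)"
        )
        return cur.rowcount
```

Without foreign keys enforced, a transaction at least ensures readers
never observe a half-finished cleanup.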
Eric Wong [Wed, 12 Apr 2023 00:13:02 +0000 (00:13 +0000)]
git: parallelize manifest_entry
This saves a few milliseconds per-epoch without incurring
any dependencies on the event loop. It can be parallelized
further, of course, but it may not be worth it for -extindex
users since it's already cached.
Eric Wong [Wed, 12 Apr 2023 00:12:58 +0000 (00:12 +0000)]
git: cat_async_step: reduce batch-command info checks
This improves readability for me. Instead of checking for `info '
requests of `--batch-command' in multiple places of every
common branch, do it once per-call and stash its result.
We'll also avoid storing `$bc' for now since the only other
check is in a cold path.
Eric Wong [Sun, 9 Apr 2023 22:30:13 +0000 (22:30 +0000)]
www_coderepo: use OnDestroy to render summary view
This lets us get rid of a /bin/sh process and allows us to
rely on Qspawn to parallelize git commands.
Special treatment of the OnDestroy object is necessary to keep
its scope limited for MockHTTP. Neither the generic `plackup'
HTTP server nor our -httpd/-netd needed this scope
limitation. As a result, summary() is now called inside an
anonymous sub to keep the memory overhead of the anonymous sub
itself as small as possible. Avoiding anonymous subs entirely
would be preferable for memory savings, but it's necessary for
PSGI.