Eric Wong [Thu, 2 Nov 2023 09:35:33 +0000 (09:35 +0000)]
replace ProcessIO with untied PublicInbox::IO
This fixes two major problems with the use of tie for filehandles:
* no way to do fcntl, stat, etc. calls directly on the tied handle,
forcing callers to use the `tied' perlop to access the underlying
IO::Handle
* needing separate classes to handle blocking and non-blocking I/O
As a result, Git->cleanup_if_unlinked, InputPipe->consume,
and Qspawn->_yield_start have fewer bizarre bits and we
can call `$io->blocking(0)' directly instead of
`(tied *$io)->{fh}->blocking(0)'
Having a PublicInbox::IO class will also allow us to support
custom read buffering which allows inspecting the current state.
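The difference between the two calling styles can be sketched as follows. DemoTied and DemoIO are hypothetical stand-ins written for this illustration; they are not the real ProcessIO or PublicInbox::IO classes, and the sketch only assumes that the untied class is a blessed glob inheriting from IO::Handle.

```perl
use strict;
use warnings;
use IO::Handle;
use Symbol qw(gensym);

# Hypothetical tie class: the real handle is hidden in $self->{fh}
package DemoTied {
    sub TIEHANDLE { my ($cls, $fh) = @_; bless { fh => $fh }, $cls }
    sub READLINE { readline($_[0]->{fh}) }
    sub CLOSE { close($_[0]->{fh}) }
}

# Hypothetical untied class: the object *is* the handle
package DemoIO {
    our @ISA = ('IO::Handle');
}

# tied style: IO::Handle methods need the `tied' perlop to
# reach the underlying handle
pipe(my $r1, my $w1) or die "pipe: $!";
my $tied = gensym;
tie *$tied, 'DemoTied', $r1;
(tied *$tied)->{fh}->blocking(0);

# untied style: methods and stat work directly on the object
pipe(my $r2, my $w2) or die "pipe: $!";
my $io = bless $r2, 'DemoIO';
$io->blocking(0);
my @st = stat($io);

printf "tied inner blocking=%d, untied blocking=%d, st_mode=%o\n",
    $r1->blocking, $io->blocking, $st[2];
```

With the untied form, one class can serve both blocking and non-blocking callers because the mode is toggled on the object itself.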
Eric Wong [Thu, 2 Nov 2023 09:35:30 +0000 (09:35 +0000)]
treewide: use ->close to call ProcessIO->CLOSE
This will open the door for us to drop `tie' usage from
ProcessIO completely in favor of OO method dispatch. While
OO method dispatches (e.g. `$fh->close') are slower than normal
subroutine calls, it hardly matters in this case since process
teardown is a fairly rare operation and we continue to use
`close($fh)' for Maildir writes.
Eric Wong [Thu, 2 Nov 2023 09:35:29 +0000 (09:35 +0000)]
cindex: drop redundant close on regular FH
There's no need to waste optree space on close() statements for
file handles which are (effectively) read-only on their last use
and incapable of error checking in our Perl code (since they're
only read by git).
Let Perl refcounting take care of it so we have less code to
wade through when focusing on `close' statements which actually
matter.
Eric Wong [Thu, 2 Nov 2023 21:35:50 +0000 (21:35 +0000)]
ds: don't try ->close after ->accept_SSL failure
Eric Wong <e@80x24.org> wrote:
> --- a/lib/PublicInbox/DS.pm
> +++ b/lib/PublicInbox/DS.pm
> @@ -341,8 +341,8 @@ sub greet {
> my $ev = EPOLLIN;
> my $wbuf;
> if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
> - return CORE::close($sock) if $! != EAGAIN;
> - $ev = PublicInbox::TLS::epollbit() or return CORE::close($sock);
> + return $sock->close if $! != EAGAIN;
> + $ev = PublicInbox::TLS::epollbit() or return $sock->close;
> $wbuf = [ \&accept_tls_step, $self->can('do_greet')];
> }
> new($self, $sock, $ev | EPOLLONESHOT);
Noticed this on deploy:
-----8<-----
Subject: [PATCH] ds: don't try ->close after ->accept_SSL failure
An ->accept_SSL failure leaves the socket ref as a GLOB (not
IO::Handle) and unable to respond to the ->close method.
Calling close in any form isn't actually necessary at all,
so just let refcounting destroy the socket.
Eric Wong [Thu, 2 Nov 2023 09:35:28 +0000 (09:35 +0000)]
treewide: use ->close method rather than CORE::close
It's easier to read and should open the door for us to get rid
of `tie' for ProcessIO without performance penalties for
more frequently-used perlop calls. It also gives us the ability to
do `stat' directly on the object instead of the awkward `tied' thing.
Eric Wong [Thu, 2 Nov 2023 09:35:27 +0000 (09:35 +0000)]
ds: replace FD map hash table with array
FDs are array indices into the kernel, anyways, so we can
take advantage of space savings and speedups because the
majority of FDs a big process has is going to end up in
the array, anyways.
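A minimal sketch of the idea, assuming nothing about the real DS.pm internals: @fd_map is a hypothetical name, and a pair of pipe handles stands in for the event loop's objects.

```perl
use strict;
use warnings;

my @fd_map; # hypothetical: index is the integer FD, value is the object

pipe(my $r, my $w) or die "pipe: $!";
$fd_map[fileno($_)] = $_ for ($r, $w);

# an event loop hands back integer FDs, so the lookup is a plain
# array index instead of a hash lookup on a stringified key
my $fd = fileno($r);
my $obj = $fd_map[$fd];
print "fd=$fd maps to $obj\n";

# forgetting an FD leaves a slot which a later open can refill,
# since the kernel hands out the lowest free FD number
$fd_map[$fd] = undef;
```

Because the kernel allocates the lowest available FD, the array stays dense in a busy process, which is where the space savings come from.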
Eric Wong [Wed, 1 Nov 2023 06:31:48 +0000 (06:31 +0000)]
git: reschedule cleanup if active
This is necessary to reliably clean up cat-file processes for
coderepos in long-lived -netd and -httpd processes if they
haven't been accessed in a while.
Eric Wong [Tue, 31 Oct 2023 20:42:55 +0000 (20:42 +0000)]
ds: make ->close behave like CORE::close
Matching existing Perl IO semantics seems like a good idea to
reduce confusion in the future.
We'll also fix some outdated comments and update indentation
to match the rest of our code base since we're far detached from
Danga::Socket at this point.
Eric Wong [Tue, 31 Oct 2023 20:42:53 +0000 (20:42 +0000)]
watch: simplify DirIdle object cleanup
There's no need to waste time nor reach into DS internals to
map FDs to Perl objects, here. LEI.pm has never had to deal
with integer FDs for DirIdle, either.
Eric Wong [Tue, 31 Oct 2023 20:42:52 +0000 (20:42 +0000)]
ds: move maxevents further down the stack
The epoll implementation is the only one which respects the
limit (kevent would, but IO::KQueue does not). In any case,
I'm not a fan of the maxevents=1000 historical default since
it leads to fairness problems with shared non-blocking listeners
across multiple daemon workers.
Eric Wong [Tue, 31 Oct 2023 20:42:51 +0000 (20:42 +0000)]
ds: do not defer close
We can map all integer FDs to Perl objects once ->ep_wait returns,
so there's no need to play tricks elsewhere to ensure FDs can
be mapped to objects within the same event loop iteration.
Eric Wong [Tue, 31 Oct 2023 20:42:50 +0000 (20:42 +0000)]
ds: next_tick: shorten object lifetimes
Drop reference counts ASAP in case it saves us some memory
sooner rather than later. This ought to give us more predictable
resource use and ensure OnDestroy callbacks fire sooner.
There's no need to use `local' to clobber the arrayref anymore,
either.
AFAIK, this doesn't fix any known bug, but more predictability
will make it easier to debug things going forward.
Eric Wong [Tue, 31 Oct 2023 20:34:51 +0000 (20:34 +0000)]
xap_helper.pm: quiet undefined die at shutdown
Another attempt at doing what commit 35de8fdcbf290e25
(xap_helper.pm: quiet undefined warnings at shutdown, 2023-10-23)
tried to do. It turns out perl croaks (not warn/carp) when it sees
an undefined file handle, here.
Eric Wong [Mon, 30 Oct 2023 18:29:40 +0000 (18:29 +0000)]
poll+select: check EBADF + POLLNVAL errors
I hit this via select while running -cindex with some other
experimental patches. I can't reproduce the problem, though,
but this ensures we have a chance to diagnose it if it happens
again instead of looping on select(2) => EBADF.
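The failure mode can be reproduced in isolation with a stale FD. This is only a sketch of the check, not the code from the commit; it assumes a POSIX system where select(2) rejects a closed descriptor with EBADF.

```perl
use strict;
use warnings;
use Errno qw(EBADF);

pipe(my $r, my $w) or die "pipe: $!";
my $fd = fileno($r);
close($r) or die "close: $!"; # $fd is now stale

# ask select about the stale FD with a zero timeout
my $rvec = '';
vec($rvec, $fd, 1) = 1;
my $n = select(my $rout = $rvec, undef, undef, 0);
my ($errno, $msg) = ($! + 0, "$!");

# without this check, a loop would just call select again and spin
my $is_ebadf = ($n < 0 && $errno == EBADF) ? 1 : 0;
print "select failed on fd=$fd: $msg\n" if $is_ebadf;
```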
Eric Wong [Sat, 28 Oct 2023 18:01:12 +0000 (18:01 +0000)]
examples/*.service: avoid `nobody' user on systemd
systemd complains about `User=nobody' since `nobody' has access
to all files which can't be mapped to a valid UID. We'll also
switch to `Group=ssl-cert' since that ought to be able to read
TLS certificates.
Eric Wong [Fri, 27 Oct 2023 22:21:11 +0000 (22:21 +0000)]
git: avoid extra stat(2) for git version
No sane installer will update executable files in place due to
ETXTBSY on execve. So save ourselves a stat(2) call by relying
on the special `CORE::stat(_)' case to reuse the cached result
from the `-x FILE' filetest in which().
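The `_' filehandle trick looks like this in isolation. The temporary file merely stands in for an executable found by which(); the rest is standard Perl behavior.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
chmod(0755, $path) or die "chmod: $!";

my @st;
if (-x $path) {          # one stat(2), result is cached by perl
    @st = CORE::stat(_); # reuses the cached result, no new syscall
}
printf "size=%d mtime=%d\n", @st[7, 9];
```

The cached result is only as fresh as the preceding filetest, which is acceptable here given the ETXTBSY reasoning above.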
Eric Wong [Fri, 27 Oct 2023 22:21:09 +0000 (22:21 +0000)]
spawn: avoid alloca in C pi_fork_exec
We don't have thread-safety to worry about, so just leave a few
allocations at process exit at worst. We'll also update some
comments about usage while we're at it.
Eric Wong [Fri, 27 Oct 2023 01:14:35 +0000 (01:14 +0000)]
lei: don't exit lei-daemon on ovv_begin failure
When ->ovv_begin is called in LeiXSearch->do_query in the top-level
lei-daemon process, $lei->{pkt_op_p} still exists. We must make
sure we're exiting the correct process since lei->out can call
lei->fail and lei->fail calls lei->x_it.
As to avoiding how I caused ->ovv_begin failures to begin with,
that's for a much bigger change...
Eric Wong [Thu, 26 Oct 2023 08:20:07 +0000 (08:20 +0000)]
cindex: clarify fatal vs non-fatal messages
cindex must be able to handle coderepos being deleted mid-run
since `public-inbox-clone --purge' may be running at the same
time. This is a step towards handling parallel invocations
of -cindex and public-inbox-clone as gracefully as possible
by improving error messages.
Eric Wong [Thu, 26 Oct 2023 08:20:06 +0000 (08:20 +0000)]
git: cleanup un-associated coderepo processes
It's possible to have many coderepos with no inbox association
that never see git->cleanup. So instead of tying git->cleanup
to inboxes, ensure it gets armed when ->watch_async is called
(since it's only called in our -netd or -httpd servers).
Eric Wong [Wed, 25 Oct 2023 15:33:49 +0000 (15:33 +0000)]
cindex: fix large prunes
When comm(1) has a lot of data to output, we must ensure we
explicitly close FDs of processes in previous stages of the
pipeline to ensure comm(1) terminates properly.
This is difficult to test automatically with small test repos...
Fixes: 17b06aa32aac (cindex: start using run_await to simplify code)
While uncommon, some git repos have hundreds of thousands of
refs and slurping that output into memory can bloat the heap.
Introduce a sha_all sub in PublicInbox::SHA to loop until EOF
and rely on autodie for checking sysread errors.
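A loop of that shape might look like the following. sha_all_sketch is a hypothetical name and the 64K chunk size is an assumption; the real PublicInbox::SHA sub may differ in signature and details.

```perl
use strict;
use warnings;
use autodie qw(open close sysread);
use Digest::SHA;
use File::Temp qw(tempfile);

# hypothetical helper: feed a handle to a digest in fixed-size chunks
sub sha_all_sketch {
    my ($nbits, $fh) = @_;
    my $dig = Digest::SHA->new($nbits);
    my $buf;
    # sysread returns 0 at EOF; autodie dies if it returns undef
    while (sysread($fh, $buf, 65536)) { $dig->add($buf) }
    $dig;
}

# stand-in for a large ref listing
my $data = "refs/heads/master\n" x 10000;
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp $data;
close $tmp;

open my $in, '<', $path;
my $hex = sha_all_sketch(256, $in)->hexdigest;
print "$hex\n";
```

Only one chunk is held in memory at a time, so heap use stays flat regardless of how many refs the repo has.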
Eric Wong [Wed, 25 Oct 2023 00:29:49 +0000 (00:29 +0000)]
cindex: use sysread for generating fingerprint
We use sysseek for this file handle elsewhere (since it's passed
to `git rev-list --stdin' multiple times), and sysread ensures
we can use a larger read buffer than the tiny 8K BUFSIZ Perl +
glibc is constrained to.
This also ensures we autodie on sysread failures, since the
autodie import for `read' was missing and we don't call `read'
anywhere else in this file.
Eric Wong [Wed, 25 Oct 2023 00:29:46 +0000 (00:29 +0000)]
cindex: use run_await to read extensions.objectFormat
This saves us the trouble of seeking ourselves by using existing
run_await functionality. We'll also be more robust to ensure we
only handle the result if the `git config' process exited without
a signal.
Eric Wong [Wed, 25 Oct 2023 00:29:45 +0000 (00:29 +0000)]
cindex: start using run_await to simplify code
This saves us some awaitpid calls. We can also start passing
hashref redirect elements directly to pipe and open perlops,
saving us the trouble of naming some variables.
Eric Wong [Wed, 25 Oct 2023 00:29:44 +0000 (00:29 +0000)]
cindex: use timer for inits
We'll need to be in the event loop to use run_await in parallel,
so we can't start processes outside of it. This change isn't
ideal, but it likely keeps the rest of our (hotter) code simpler.
Eric Wong [Wed, 25 Oct 2023 00:29:43 +0000 (00:29 +0000)]
cindex: avoid awaitpid for popen
We can use popen_rd to pass command and callbacks to a
callback sub. This is another step which may allow us
to get rid of the wantarray forms of popen_rd/popen_wr
in the future.
Eric Wong [Wed, 25 Oct 2023 00:29:41 +0000 (00:29 +0000)]
qspawn: simplify internal argument passing
Now that psgi_return is gone, we can further simplify our
internals to support only psgi_qx and psgi_yield. Internal
argument passing is reduced and we keep the command env and
redirects in the Qspawn object for as long as it's alive.
I wanted to get rid of finalize() entirely, but it seems
trickier to do when having to support generic PSGI.
Eric Wong [Wed, 25 Oct 2023 00:29:39 +0000 (00:29 +0000)]
drop psgi_return, httpd/async and GetlineBody
Now that psgi_yield is used everywhere, the more complex
psgi_return and its helper bits can be removed. We'll also fix
some outdated comments now that everything on psgi_return has
switched to psgi_yield. GetlineResponse replaces GetlineBody
and does a better job of isolating generic PSGI-only code.
Eric Wong [Wed, 25 Oct 2023 00:29:32 +0000 (00:29 +0000)]
qspawn: introduce new psgi_yield API
This is intended to replace psgi_return and HTTPD/Async
entirely, hopefully making our code less convoluted while
maintaining the ability to handle slow clients on
memory-constrained systems.
This was made possible by the philosophy shift in commit 21a539a2df0c
(httpd/async: switch to buffering-as-fast-as-possible, 2019-06-28).
We'll still support generic PSGI via the `pull' model with a
GetlineResponse class which is similar to the old GetlineBody.
Eric Wong [Wed, 25 Oct 2023 00:29:30 +0000 (00:29 +0000)]
httpd/async: require IO arg
Callers that want to requeue can call PublicInbox::DS::requeue
directly and not go through the convoluted argument handling
via PublicInbox::HTTPD::Async->new.
Eric Wong [Wed, 25 Oct 2023 00:29:26 +0000 (00:29 +0000)]
psgi_qx: use a temporary file rather than pipe
A pipe requires more context switches, syscalls, and code to
deal with unpredictable pipe EOF vs waitpid ordering. So just
use the new spawn/aspawn features to automatically handle
slurping output into a string.
Eric Wong [Wed, 25 Oct 2023 00:29:25 +0000 (00:29 +0000)]
spawn: support synchronous run_qx
This is similar to `backtick` but supports all our existing spawn
functionality (chdir, env, rlimit, redirects, etc.). It also
supports SCALAR ref redirects like run_script in our test suite
for std{in,out,err}.
We can probably use :utf8 by default for these redirects, even.
Eric Wong [Wed, 25 Oct 2023 00:29:24 +0000 (00:29 +0000)]
limiter: split out from qspawn
It's slightly better organized this way, especially since
`publicinboxLimiter' has its own user-facing config section
and knobs. I may use it in LeiMirror and CodeSearchIdx for
process management.
Eric Wong [Thu, 19 Oct 2023 01:14:31 +0000 (01:14 +0000)]
lei: simplify startq/au_done wakeup notifications
We only need to write one byte at MUA start instead of a byte
for every LeiXSearch worker. Also, make sure it succeeds by
enabling autodie for syswrite.
When reading, we can rely on `:perlio' layer `read' semantics
to retry on EINTR to avoid looping and other error checking.
Eric Wong [Tue, 17 Oct 2023 23:38:05 +0000 (23:38 +0000)]
test_common: only hide TCP port in messages
v2:// lei outputs are on the filesystem, so putting $HOST:$PORT
is nonsensical. We'll also keep `127.0.0.1' or `[::1]' since
it's harmless and can point out obvious errors in system
configuration when testing with old Perls or libraries.
Eric Wong [Tue, 17 Oct 2023 23:37:58 +0000 (23:37 +0000)]
xt/git-http-backend: remove Net::HTTP usage
HTTP::Tiny has been part of the Perl standard library since Perl 5.14
while Net::HTTP has never been (unlike Net::NNTP or Net::POP3).
For the test which forces server-side buffering, we'll just use
a regular socket handle.
Eric Wong [Tue, 17 Oct 2023 23:37:54 +0000 (23:37 +0000)]
xap_helper: die more easily in both implementations
We don't need to tolerate bad requests since it's only handling
requests from the parent process. So simplify error management
and just die||exit if we get a bad request.
Eric Wong [Tue, 17 Oct 2023 23:37:52 +0000 (23:37 +0000)]
use read_all in more places to improve safety
`readline' ops may not detect errors on partial reads.
This saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffer so it
can work more nicely with existing code.
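As a rough illustration of the pattern, here is a read-until-EOF helper which dies on errors and can fill a caller-supplied buffer. read_all_sketch is hypothetical; its name, arguments and chunk size are assumptions and do not reflect the actual read_all signature.

```perl
use strict;
use warnings;

# hypothetical helper: slurp $fh until EOF, dying on read errors.
# An optional scalar ref lets the caller reuse a destination buffer.
sub read_all_sketch {
    my ($fh, $bufref) = @_;
    $bufref //= \(my $buf);
    $$bufref = '';
    while (1) {
        my $r = read($fh, $$bufref, 65536, length($$bufref));
        die "read: $!" unless defined $r; # error, not EOF
        last if $r == 0;                  # EOF
    }
    $$bufref;
}

# in-memory handles stand in for real files or pipes
open(my $fh, '<', \"line one\nline two\n") or die "open: $!";
my $all = read_all_sketch($fh);

my $dst = 'stale';
open($fh, '<', \"reused\n") or die "open: $!";
read_all_sketch($fh, \$dst);
print $all, $dst;
```

Unlike a bare `readline', the undef return from `read' is checked on every call, so a failure partway through cannot be mistaken for a short file.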