Eric Wong [Sat, 28 Oct 2023 18:01:12 +0000 (18:01 +0000)]
examples/*.service: avoid `nobody' user on systemd
systemd complains about `User=nobody' since `nobody' has access
to all files which can't be mapped to a valid UID. We'll also
switch to `Group=ssl-cert' since that ought to be able to read
TLS certificates.
Eric Wong [Fri, 27 Oct 2023 22:21:11 +0000 (22:21 +0000)]
git: avoid extra stat(2) for git version
No sane installer will update executable files in place due to
ETXTBSY on execve. So save ourselves a stat(2) call by relying
on the special `CORE::stat(_)' case to reuse the cached result
from the `-x FILE' filetest in which().
Eric Wong [Fri, 27 Oct 2023 22:21:09 +0000 (22:21 +0000)]
spawn: avoid alloca in C pi_fork_exec
We don't have thread-safety to worry about, so just leave a few
allocations at process exit at worst. We'll also update some
comments about usage while we're at it.
Eric Wong [Fri, 27 Oct 2023 01:14:35 +0000 (01:14 +0000)]
lei: don't exit lei-daemon on ovv_begin failure
When ->ovv_begin is called in LeiXSearch->do_query in the top-level
lei-daemon process, $lei->{pkt_op_p} still exists. We must make
sure we're exiting the correct process since lei->out can call
lei->fail and lei->fail calls lei->x_it.
As to avoiding how I caused ->ovv_begin failures to begin with,
that's for a much bigger change...
Eric Wong [Thu, 26 Oct 2023 08:20:07 +0000 (08:20 +0000)]
cindex: clarify fatal vs non-fatal messages
cindex must be able to handle coderepos being deleted mid-run
since `public-inbox-clone --purge' may be running at the same
time. This is a step towards handling parallel invocations
of -cindex and public-inbox-clone as gracefully as possible
by improving error messages.
Eric Wong [Thu, 26 Oct 2023 08:20:06 +0000 (08:20 +0000)]
git: cleanup un-associated coderepo processes
It's possible to have many coderepos with no inbox association
that never see git->cleanup. So instead of tying git->cleanup
to inboxes, ensure it gets armed when ->watch_async is called
(since it's only called in our -netd or -httpd servers).
Eric Wong [Wed, 25 Oct 2023 15:33:49 +0000 (15:33 +0000)]
cindex: fix large prunes
When comm(1) has a lot of data to output, we must ensure we
explicitly close FDs of processes in previous stages of the
pipeline to ensure comm(1) terminates properly.
This is difficult to test automatically with small test repos...
Fixes: 17b06aa32aac (cindex: start using run_await to simplify code)
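The general failure mode behind this kind of fix can be sketched as follows. This is a Python illustration, not the project's Perl code: the function name and the producer/consumer commands are hypothetical stand-ins for the real pipeline stages. If the parent keeps its copy of a pipe's write end open, the downstream reader never sees EOF and never terminates.

```python
import os
import subprocess
import sys

def count_lines_through_pipe(n_lines):
    """Feed n_lines from a producer to a consumer over a pipe we create."""
    rd, wr = os.pipe()
    producer = subprocess.Popen(
        [sys.executable, "-c", f"for i in range({n_lines}): print(i)"],
        stdout=wr)
    # The parent must drop its copy of the write end; otherwise the
    # consumer never sees EOF after the producer exits and waits forever.
    os.close(wr)
    consumer = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; print(sum(1 for _ in sys.stdin))"],
        stdin=rd, stdout=subprocess.PIPE)
    os.close(rd)  # the consumer holds its own copy now
    out, _ = consumer.communicate()
    producer.wait()
    return int(out)
```

With small inputs a missed close can go unnoticed in some pipeline layouts, which fits the remark that this is hard to test with small repos.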
While uncommon, some git repos have hundreds of thousands of
refs and slurping that output into memory can bloat the heap.
Introduce a sha_all sub in PublicInbox::SHA to loop until EOF
and rely on autodie for checking sysread errors.
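A rough Python sketch of the loop-until-EOF idea, for illustration only: `digest_until_eof` is a hypothetical name, and SHA-256 and the 1 MiB buffer are assumptions rather than what PublicInbox::SHA actually uses. The point is that memory use stays bounded by the buffer size no matter how much ref data is fed in, and read errors surface as exceptions (similar in spirit to autodie).

```python
import hashlib
import os

def digest_until_eof(fd, bufsize=1 << 20):
    """Hash everything readable from fd without slurping it into memory."""
    h = hashlib.sha256()  # assumption: the real digest may differ
    while True:
        chunk = os.read(fd, bufsize)  # raises OSError on read failure
        if not chunk:  # empty read means EOF
            return h.hexdigest()
        h.update(chunk)
```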
Eric Wong [Wed, 25 Oct 2023 00:29:49 +0000 (00:29 +0000)]
cindex: use sysread for generating fingerprint
We use sysseek for this file handle elsewhere (since it's passed
to `git rev-list --stdin' multiple times), and sysread ensures
we can use a larger read buffer than the tiny 8K BUFSIZ Perl +
glibc is constrained to.
This also ensures we autodie on sysread failures, since the
autodie import for `read' was missing and we don't call `read'
anywhere else in this file.
Eric Wong [Wed, 25 Oct 2023 00:29:46 +0000 (00:29 +0000)]
cindex: use run_await to read extensions.objectFormat
This saves us the trouble of seeking ourselves by using existing
run_await functionality. We'll also be more robust to ensure we
only handle the result if the `git config' process exited without
a signal.
Eric Wong [Wed, 25 Oct 2023 00:29:45 +0000 (00:29 +0000)]
cindex: start using run_await to simplify code
This saves us some awaitpid calls. We can also start passing
hashref redirect elements directly to pipe and open perlops,
saving us the trouble of naming some variables.
Eric Wong [Wed, 25 Oct 2023 00:29:44 +0000 (00:29 +0000)]
cindex: use timer for inits
We'll need to be in the event loop to use run_await in parallel,
so we can't start processes outside of it. This change isn't
ideal, but it likely keeps the rest of our (hotter) code simpler.
Eric Wong [Wed, 25 Oct 2023 00:29:43 +0000 (00:29 +0000)]
cindex: avoid awaitpid for popen
We can use popen_rd to pass command and callbacks to a
callback sub. This is another step which may allow us
to get rid of the wantarray forms of popen_rd/popen_wr
in the future.
Eric Wong [Wed, 25 Oct 2023 00:29:41 +0000 (00:29 +0000)]
qspawn: simplify internal argument passing
Now that psgi_return is gone, we can further simplify our
internals to support only psgi_qx and psgi_yield. Internal
argument passing is reduced and we keep the command env and
redirects in the Qspawn object for as long as it's alive.
I wanted to get rid of finalize() entirely, but it seems
trickier to do when having to support generic PSGI.
Eric Wong [Wed, 25 Oct 2023 00:29:39 +0000 (00:29 +0000)]
drop psgi_return, httpd/async and GetlineBody
Now that psgi_yield is used everywhere, the more complex
psgi_return and its helper bits can be removed. We'll also fix
some outdated comments now that everything on psgi_return has
switched to psgi_yield. GetlineResponse replaces GetlineBody
and does a better job of isolating generic PSGI-only code.
Eric Wong [Wed, 25 Oct 2023 00:29:32 +0000 (00:29 +0000)]
qspawn: introduce new psgi_yield API
This is intended to replace psgi_return and HTTPD/Async
entirely, hopefully making our code less convoluted while
maintaining the ability to handle slow clients on
memory-constrained systems.
This was made possible by the philosophy shift in commit 21a539a2df0c
(httpd/async: switch to buffering-as-fast-as-possible, 2019-06-28).
We'll still support generic PSGI via the `pull' model with a
GetlineResponse class which is similar to the old GetlineBody.
Eric Wong [Wed, 25 Oct 2023 00:29:30 +0000 (00:29 +0000)]
httpd/async: require IO arg
Callers that want to requeue can call PublicInbox::DS::requeue
directly and not go through the convoluted argument handling
via PublicInbox::HTTPD::Async->new.
Eric Wong [Wed, 25 Oct 2023 00:29:26 +0000 (00:29 +0000)]
psgi_qx: use a temporary file rather than pipe
A pipe requires more context switches, syscalls, and code to
deal with unpredictable pipe EOF vs waitpid ordering. So just
use the new spawn/aspawn features to automatically handle
slurping output into a string.
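The temporary-file approach can be sketched in Python as below. This is not the psgi_qx implementation, and `run_capture_via_tmpfile` is a hypothetical name. Because the child writes to a regular file, the parent only has to wait for the exit status and then read the file; there is no pipe EOF to order against waitpid.

```python
import subprocess
import sys
import tempfile

def run_capture_via_tmpfile(cmd):
    """Run cmd with stdout redirected to an unlinked temporary file."""
    with tempfile.TemporaryFile() as tmp:
        proc = subprocess.Popen(cmd, stdout=tmp)
        status = proc.wait()  # the only event we need to wait for
        tmp.seek(0)           # the child advanced the shared file offset
        return status, tmp.read()
```

Usage might look like `status, out = run_capture_via_tmpfile(["git", "--version"])`, assuming `git` is on the PATH.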
Eric Wong [Wed, 25 Oct 2023 00:29:25 +0000 (00:29 +0000)]
spawn: support synchronous run_qx
This is similar to `backtick` but supports all our existing spawn
functionality (chdir, env, rlimit, redirects, etc.). It also
supports SCALAR ref redirects like run_script in our test suite
for std{in,out,err}.
We can probably use :utf8 by default for these redirects, even.
Eric Wong [Wed, 25 Oct 2023 00:29:24 +0000 (00:29 +0000)]
limiter: split out from qspawn
It's slightly better organized this way, especially since
`publicinboxLimiter' has its own user-facing config section
and knobs. I may use it in LeiMirror and CodeSearchIdx for
process management.
Eric Wong [Thu, 19 Oct 2023 01:14:31 +0000 (01:14 +0000)]
lei: simplify startq/au_done wakeup notifications
We only need to write one byte at MUA start instead of a byte
for every LeiXSearch worker. Also, make sure it succeeds by
enabling autodie for syswrite.
When reading, we can rely on `:perlio' layer `read' semantics
to retry on EINTR to avoid looping and other error checking.
Eric Wong [Tue, 17 Oct 2023 23:38:05 +0000 (23:38 +0000)]
test_common: only hide TCP port in messages
v2:// lei outputs are on the filesystem, so putting $HOST:$PORT
is nonsensical. We'll also keep `127.0.0.1' or `[::1]' since
it's harmless and can point out obvious errors in system
configuration when testing with old Perls or libraries.
Eric Wong [Tue, 17 Oct 2023 23:37:58 +0000 (23:37 +0000)]
xt/git-http-backend: remove Net::HTTP usage
HTTP::Tiny has been part of the Perl standard library since Perl 5.14
while Net::HTTP has never been (unlike Net::NNTP or Net::POP3).
For the test which forces server-side buffering, we'll just use
a regular socket handle.
Eric Wong [Tue, 17 Oct 2023 23:37:54 +0000 (23:37 +0000)]
xap_helper: die more easily in both implementations
We don't need to tolerate bad requests since it's only handling
requests from the parent process. So simplify error management
and just die||exit if we get a bad request.
Eric Wong [Tue, 17 Oct 2023 23:37:52 +0000 (23:37 +0000)]
use read_all in more places to improve safety
`readline' ops may not detect errors on partial reads.
Using read_all also saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffer so it
can work more nicely with existing code.
Eric Wong [Tue, 17 Oct 2023 23:37:49 +0000 (23:37 +0000)]
git: introduce read_all function
This makes it easier to improve error checking, since the
`do { local $/; readline(FH) }' construct does not detect
errors (autodie does not cover `readline' or `<FH>').
I'm not sure exactly where this should be, but PublicInbox::Git
is used nearly everywhere in our code base and it's probably
not worth creating a new package for it.
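To illustrate why an explicit read loop makes error checking easier than a plain slurp, here is a Python sketch. It is not the project's read_all: `read_exactly` is a hypothetical helper. A short read becomes a loud error instead of silently returning truncated data.

```python
import os

def read_exactly(fd, nbytes):
    """Read exactly nbytes from fd or raise; never return a short result."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)  # raises OSError on read failure
        if not chunk:  # EOF before we got everything
            got = nbytes - remaining
            raise EOFError(f"short read: wanted {nbytes}, got {got}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```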
Eric Wong [Tue, 17 Oct 2023 23:37:46 +0000 (23:37 +0000)]
lei_mirror: start converting to autodie
This code is too noisy and not critical for startup performance;
so autodie provides a nice noise reduction while improving error
reporting in most cases.
For places where failures are expected, the `CORE::' prefix
gives us an easy escape hatch to fall back to normal error
checking.
Eric Wong [Tue, 17 Oct 2023 10:11:06 +0000 (10:11 +0000)]
input_pipe: handle noncanonical TTY
lei could get a TTY in noncanonical mode for stdin, so rely on
VMIN+VTIME to get the desired non-blocking semantics we'd expect
from a pipe or socket. This ought to prevent read(2) (Perl sysread)
from returning zero when we really want to hit EAGAIN.
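For readers unfamiliar with VMIN and VTIME, a hedged Python sketch follows. The exact values the commit uses are not stated above, so VMIN=1 and VTIME=0 here are an assumption, and `require_one_byte_per_read` is a hypothetical name. In noncanonical mode with VMIN=0 and VTIME=0, read(2) returns zero immediately when no input is pending, which is easy to mistake for EOF; VMIN=1 makes read(2) wait for at least one byte.

```python
import termios

def require_one_byte_per_read(fd):
    """On a noncanonical TTY, make read(2) wait for at least one byte."""
    attrs = termios.tcgetattr(fd)
    lflag, cc = attrs[3], attrs[6]
    if lflag & termios.ICANON:
        return False  # canonical mode: VMIN/VTIME do not apply
    cc[termios.VMIN] = 1   # assumption: minimum of one byte per read
    cc[termios.VTIME] = 0  # assumption: no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return True
```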
Eric Wong [Tue, 17 Oct 2023 10:11:05 +0000 (10:11 +0000)]
input_pipe: improve error handling
Ensure the callback is always guarded by `eval' to catch
exceptions and to force a ->close (EPOLL_CTL_DEL).
We also don't want to blindly set O_NONBLOCK on TTYs since their
O_NONBLOCK semantics aren't well-defined by POSIX. We can also
drop EPOLLET (edge-triggered) use to reduce the need to make
->requeue calls on our end.
Eric Wong [Tue, 17 Oct 2023 10:11:04 +0000 (10:11 +0000)]
lei: consolidate stdin slurp, fix warnings
We can share more code amongst stdin slurper (not streaming)
commands. This also fixes uninitialized variable warnings when
feeding an empty stdin to these commands.
Eric Wong [Sun, 15 Oct 2023 08:16:28 +0000 (08:16 +0000)]
learn: respect indexlevel for v1 inboxes
v2 never suffered from this bug, apparently, but -learn didn't
seem able to handle indexlevel=basic (nor respect `medium')
for v1 inboxes. I only noticed this bug because I converted
some ancient v1 inboxes to `basic' to save space.
Eric Wong [Fri, 13 Oct 2023 06:12:29 +0000 (06:12 +0000)]
xap_helper_cxx: allow sharing XDG_CACHE_HOME across ABIs
For users sharing home directories (or just XDG_CACHE_HOME)
across hosts of different architectures, we must use a compiler
and architecture-specific destination directory for storing the
binary result. Even on the same OS and architecture, different
C++ compilers may have different ABIs, so we must account for
that.
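One way to derive such a destination directory is sketched below in Python. The directory layout, the `example-helper` name, and the use of a hashed compiler identification string are all assumptions for illustration; the actual scheme in xap_helper_cxx may differ.

```python
import hashlib
import os
import platform

def helper_cache_dir(cache_home, compiler_id, arch=None):
    """Build a per-architecture, per-compiler directory under cache_home.

    compiler_id is any string identifying the compiler and its version,
    e.g. the output of `c++ --version` (hypothetical choice).
    """
    arch = arch or platform.machine()
    key = hashlib.sha256(compiler_id.encode()).hexdigest()[:16]
    return os.path.join(cache_home, "example-helper", f"{arch}-{key}")
```

Two hosts sharing one XDG_CACHE_HOME then build into different directories unless both architecture and compiler match.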
Eric Wong [Thu, 12 Oct 2023 00:21:00 +0000 (00:21 +0000)]
lei: quiet excessive write/seen messages
We don't want to end up dumping nr_seen/nr_write when progress
is disabled, nor do we want forked-off `lei note-event' workers to
dump them when DS->Reset is called on fork.
Eric Wong [Wed, 11 Oct 2023 07:20:57 +0000 (07:20 +0000)]
lei import|tag|rm: support --commit-delay=SECONDS
Delayed commits allow users to trade off immediate safety for
throughput and reduced storage wear when running multiple
discrete commands.
This feature is currently useful for providing a way to make
t/lei-store-fail.t reliable and for ensuring `lei blob' can
retrieve messages which have not yet been committed.
In the future, it'll also be useful for the FUSE layer to batch
git activity.
Eric Wong [Wed, 11 Oct 2023 07:20:55 +0000 (07:20 +0000)]
import: cat_blob is a no-op w/o live fast-import
cat_blob is a fallback for handling files which haven't made it
onto disk to be readable by `git cat-file'. Thus spawning a new
fast-import process to retrieve a blob is pointless, as cat_blob
is only used as a last resort when `git cat-file' fails.
Eric Wong [Wed, 11 Oct 2023 07:20:54 +0000 (07:20 +0000)]
import: switch to Unix stream socket for fast-import
We use fewer file descriptors and fewer lines of code this way.
I'm not aware of any place we rely on POSIX pipe semantics with
`git fast-import', and sockets have bigger buffers by default
in most cases (even if Linux allows larger pipe buffers).
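The descriptor savings can be seen in a small Python sketch. It is illustrative only: the child here is a toy uppercasing filter rather than `git fast-import`, and `shout_via_socketpair` is a hypothetical name. One bidirectional Unix stream socket serves as both stdin and stdout of the child, where two pipes would need four descriptors at creation time.

```python
import socket
import subprocess
import sys

def shout_via_socketpair(payload):
    """Talk to a child over one bidirectional Unix stream socket."""
    parent_sock, child_sock = socket.socketpair()  # AF_UNIX, SOCK_STREAM
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        stdin=child_sock, stdout=child_sock)  # same socket, both directions
    child_sock.close()  # the child holds its own copies now
    parent_sock.sendall(payload)
    parent_sock.shutdown(socket.SHUT_WR)  # signal EOF to the child's stdin
    chunks = []
    while True:
        chunk = parent_sock.recv(65536)
        if not chunk:  # child exited and closed its end
            break
        chunks.append(chunk)
    parent_sock.close()
    child.wait()
    return b"".join(chunks)
```

Note that `shutdown(SHUT_WR)` takes the place of closing a pipe's write end, one of the few places where socket and pipe handling differ.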