Eric Wong [Thu, 26 Sep 2024 00:55:01 +0000 (00:55 +0000)]
user_content: simplify internal API and use v5.12
We use {env} and {ibx} everywhere so there's no point in
unpacking args. There are no odd unicode_strings problems
here, either, so we can use v5.12 and autodie to reduce
`or die' checks.
Eric Wong [Tue, 24 Sep 2024 18:35:48 +0000 (18:35 +0000)]
viewvcs: fix b= generation in $REPO/tree/ listing
Queries such as `b=contrib/cssREADME' are incorrect despite
having the actual blob OID for the given file. Add a trailing
slash for files in a project subdirectory in those cases as we
do for cases we don't have a known path name.
While we're in the area, avoid needless shadowing of the `$t'
var and add a comment to describe its contents.
Eric Wong [Mon, 16 Sep 2024 21:03:01 +0000 (21:03 +0000)]
www: test address URL-fication
Probably more tests coming, but setup stuff is still on the slow
side. While email addresses can be all sorts of uncommon
characters, I'm also fairly certain we can disallow the [&;<>]
set from being URL-fied.
Eric Wong [Mon, 16 Sep 2024 21:03:00 +0000 (21:03 +0000)]
test_common: improve psgi test setup + loading
Since we have many PSGI tests nowadays, put in a `psgi' shortcut
like we do for many other components for `require_mods' to make
it easier to load a consistent set of modules.
We'll also cut down on `require_ok' and `use_ok' tests since
they should be limited to code maintained in our source tree,
not 3rd-party dependencies.
Eric Wong [Mon, 16 Sep 2024 21:02:58 +0000 (21:02 +0000)]
config: ignore blank address= and listid= entries
At the minimum, there must be a non-space character in
address= and listid= entries for matches to occur.
Filter out the obviously unmatchable entries here to
avoid potential problems elsewhere.
Eric Wong [Mon, 16 Sep 2024 09:53:59 +0000 (09:53 +0000)]
t/feed: fix uninitialized variable warnings
These warnings only happened under test conditions and never
when running under any PSGI servers. In retrospect, t/feed.t is
likely redundant nowadays and ought to be folded into existing
PSGI tests so we don't have to consider setup problems like
these.
Fixes: bbe582cdfa429 ("view: fix addr2urlmap with Plack::Builder::mount")
Eric Wong [Fri, 13 Sep 2024 22:07:24 +0000 (22:07 +0000)]
view: disable address URL-fication of possible HTML escapes
In case somebody uses local email address of `lt' or `gt' (with
no domain component, or something matching /#\d+/a), disable
URL-fication of such addresses to prevent breaking HTML output.
Somebody with better Perl regexp knowledge than I can attempt to
write a regexp which functions like \b but avoids matching `&'
to allow such local email addresses. But I suspect the use of
local-only email addresses to be limited and this isn't a real
problem in practice.
Eric Wong [Fri, 13 Sep 2024 22:07:23 +0000 (22:07 +0000)]
view: fix addr2urlmap with Plack::Builder::mount
Plack::App::URLMap does not preserve SCRIPT_NAME set for PSGI
`mount' directives when running response callbacks. Thus we
must get $ibx->base_url($ctx->{env}) calls to generate correct
full URLs when relying on publicinbox.nameIsUrl up front before
the PSGI response callback is returned.
Eric Wong [Tue, 10 Sep 2024 00:40:48 +0000 (00:40 +0000)]
view: fix x-post links for relative urls
We need to make correct relative URL paths for users configuring
publicinbox.$NAME.url as relative URL paths (e.g. matching the
inbox `$NAME').
Users of protocol-relative (e.g. `//$HOST/$NAME') and absolute URIs
(e.g. `https://example.com/$NAME') were unaffected by this bug.
Users relying on publicinbox.nameIsUrl and omitting
publicinbox.*.url entries were also immune to this bug.
Automated tests are in progress and will come in a separate commit.
Eric Wong [Wed, 11 Sep 2024 21:25:49 +0000 (21:25 +0000)]
www: preload all inboxes if using ->ALL
This ought to improve memory layout and ensure the regexp
for address => inbox linkification works when hitting
/$EXTINBOX/$MSGID/ links first (instead of /$INBOX/$MSGID)
This fill_all call is redundant for cindex users who get the
preload anyways, but necessary for non-cindex users.
This should also avoid the broken/empty regexps problem described
in 3b51fcc196e3 (view: fix addr2url mapping corruption, 2024-09-06).
Eric Wong [Fri, 6 Sep 2024 23:29:03 +0000 (23:29 +0000)]
view: fix addr2url mapping corruption
We must avoid generating a qr/\b()\b/ regexp which matches
every word boundary. This is caused by a particular set of
circumstances for WWW instances:
1. extindex must be in use
2. cindex must NOT be in use OR WWW->preload wasn't used
(custom .psgi or non-p-i-{httpd,netd} users)
3. first HTTP request hits /$EXTINDEX/$MSGID/
(where $EXTINDEX is typically `all')
On extindex-using instances without a cindex configured, the
first HTTP request hitting the extindex encounters an empty
{-by_addr} hash table. This empty {-by_addr} hash table causes
View->addr2urlmap() to return an all-matching regexp which
corrupts HTML when attempting address substitutions.
cindex-using instances avoid the problem by triggering
_fill_all() during PublicInbox::WWW->preload and ensuring
{-by_addr} of the PublicInbox::Config object is populated.
Thanks to Konstantin for the initial report and Filip for the
immensely helpful explanation of the problem.
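The failure mode described above can be sketched outside of Perl. The following is a minimal Python illustration, not the View->addr2urlmap() code itself; `build_addr_re` and `linkify` are hypothetical names, and the assumption is simply that an alternation built from an empty address table must never be compiled.

```python
import re

def build_addr_re(by_addr):
    """Return a compiled address regexp, or None when there is nothing to match.

    Joining zero addresses would yield r"\b()\b", which matches at every
    word boundary, so the empty table is rejected up front.
    """
    if not by_addr:
        return None
    # Longest addresses first so a longer address wins over its prefix.
    addrs = sorted(by_addr, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(a) for a in addrs) + r")\b")

def linkify(text, by_addr):
    """Wrap known addresses in anchors; leave text alone if the map is empty."""
    addr_re = build_addr_re(by_addr)
    if addr_re is None:
        return text
    def repl(m):
        return '<a href="%s">%s</a>' % (by_addr[m.group(1)], m.group(1))
    return addr_re.sub(repl, text)

# The degenerate pattern matches (emptily) at each word boundary:
EMPTY_MATCHES = len(re.findall(r"\b()\b", "hi there"))
```

With an empty table, `linkify` returns its input unchanged instead of inserting markup at every word boundary.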
Eric Wong [Sat, 31 Aug 2024 08:17:56 +0000 (08:17 +0000)]
tests: skip ENOSPC injection on restricted systems
Yama will not allow ptrace(2) on existing processes (only new
ones) if the kernel.yama.ptrace_scope sysctl is non-zero. Skip
those tests for now since the majority of strace(1) testing
is probably done on systems without ptrace restrictions.
Eric Wong [Fri, 30 Aug 2024 20:36:35 +0000 (20:36 +0000)]
view: fix unclosed parentheses after `raw' link
This formatting error was accidentally introduced while
converting a `qq{}' concatenation to a `say' statement. Re-add
the `)'. While we're at it, switch to a `print' statement
since we use a string literal anyways and `say' would require an
extra global variable lookup at runtime.
Eric Wong [Thu, 29 Aug 2024 23:26:03 +0000 (23:26 +0000)]
solver: use async_check for the temporary git repo
While the temporary git repo is likely in cache and not
subject to high seek latency as normal code repos are,
inflating objects still takes a non-trivial amount of time.
So use this as an opportunity to serve other clients and
exploit parallelism in SMP systems.
Eric Wong [Thu, 29 Aug 2024 23:26:02 +0000 (23:26 +0000)]
solver: use xap_helper for async search if available
The async search API using xap_helper allows -httpd/netd users
to exploit storage and CPU-level parallelism via sockets. It is
another step towards reducing head-of-line blocking in our Perl
event loop. This reduces the effect of slow storage and extremely
large search results on unrelated HTTP requests.
Eric Wong [Thu, 29 Aug 2024 23:26:01 +0000 (23:26 +0000)]
solver: use async check (`info') for coderepo
Async --batch-check or `info' batch commands allow our Perl
process to handle other requests while git is busy waiting
on slow storage or CPUs to retrieve blob information.
This improves parallelism for SMP machines in addition to
allowing the Perl process to service other HTTP/NNTP/IMAP/POP3
requests while waiting for disk seeks, zlib inflation, and
delta resolution.
Checking stderr for error hints is now potentially racy, but
it's only a hint so overall performance under worst case
scenarios is preferable to correctness.
Eric Wong [Fri, 30 Aug 2024 19:05:29 +0000 (19:05 +0000)]
t/v2writable: avoid failure on strace un-readyiness
poll(2) uses milliseconds and IO::Poll::_poll doesn't abstract that,
nor does our ->poll_in wrapper. This ensures we wait enough time
for strace to start up on overloaded systems.
Eric Wong [Fri, 30 Aug 2024 19:05:15 +0000 (19:05 +0000)]
lei: increase umask timeout
On slow or overloaded systems, 2 seconds may not be sufficient
time to wait for a lei client to respond to the umask request
from lei-daemon. Use 60s to be consistent with the FD transfer
in the general case.
While we're at it, consistently use poll_in() now that it exists
since it's a better API than vec() + select() and will give
consistent performance regardless of the FD value.
Eric Wong [Fri, 23 Aug 2024 16:30:29 +0000 (16:30 +0000)]
tls: set SSL_OP_NO_COMPRESSION explicitly
TLS compression is susceptible to the CRIME attack and
per-connection zlib contexts waste memory for idle clients.
While compression should already be off by default in modern OpenSSL,
Net::SSLeay::CTX_get_mode reveals OP_NO_COMPRESSION was not set
when created by IO::Socket::SSL::SSL_Context->new. So set it
explicitly to ensure it's really off.
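For comparison, the same "set it explicitly" approach can be sketched with Python's standard ssl module. This is an illustration of the idea only, not the Net::SSLeay or IO::Socket::SSL calls used by the change; `make_server_context` is a hypothetical helper.

```python
import ssl

def make_server_context():
    """Create a server-side TLS context with compression explicitly disabled."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # OR the flag in rather than trusting library defaults.
    ctx.options |= ssl.OP_NO_COMPRESSION
    return ctx
```

Checking `ctx.options & ssl.OP_NO_COMPRESSION` afterwards plays the role that inspecting the context options played in the commit message.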
Eric Wong [Tue, 20 Aug 2024 18:40:59 +0000 (18:40 +0000)]
t/sigfd: reduce getpid() calls and hash lookups
getpid() is no longer cached by glibc and syscalls are more
expensive nowadays, so only call it once per test. The
additional hash table depth is no longer necessary since there's
no longer a difference between signal dispatch methods now that
Sigfd uses the global %SIG.
Eric Wong [Tue, 20 Aug 2024 10:35:21 +0000 (10:35 +0000)]
lei_xsearch: allow signals during long queries
Xapian ->mset, remote Xapian calls via remote inboxes, and
lcat dumps can take a long time via wq_io_do and hold
lei_xsearch processes open for too long after a client
disconnects prematurely.
This fixes wait_for_eof shutdown timeouts on the lei-daemon quit
pipe when running t/lei-sigpipe.t with GIANT_INBOX_DIR pointed
to a meta@public-inbox.org mirror on my old laptop.
Eric Wong [Tue, 20 Aug 2024 10:35:20 +0000 (10:35 +0000)]
lei: allow Ctrl-C to interrupt IMAP+NNTP reads
Mail::IMAPClient and Net::NNTP remain synchronous APIs with
indefinite wait times on slow/unreliable connections or servers.
Since these APIs don't play nicely with signalfd or
EVFILT_SIGNAL, we will temporarily drop the reliable (but
sometimes delayed) signal handling mechanisms in favor of the
less reliable built-in signal handling of Perl to provide a
best-effort attempt to handle signals during slow operations.
Eric Wong [Tue, 20 Aug 2024 10:35:19 +0000 (10:35 +0000)]
sigfd: call normal Perl %SIG handlers
Instead of storing our own mapping of signal handler callbacks,
rely on the standard %SIG hash table which can be arbitrarily
updated from anywhere.
This makes it easier to allow existing synchronous code (e.g.
NetReader using Mail::IMAPClient or Net::NNTP) to add explicit
points where pending signals can be checked.
Additionally, it allows the `DEFAULT' (SIG_DFL) signal handler
to fire when there's no Perl subroutine to register.
Finally, this also allows us to rely on the OS + Perl itself to
dispatch signal handlers on kevent-based systems (and avoid
redundant dispatch due to our (previous) Linux-centric API). It
makes Linux signalfd the only system where we'd need to dispatch
%SIG callbacks ourselves.
Eric Wong [Tue, 20 Aug 2024 10:35:17 +0000 (10:35 +0000)]
treewide: handle EINTR for non-(signalfd|kevent)
We may encounter new architectures in Linux without syscall
number definitions or *BSD systems without IO::KQueue or kevent
support at all, so be prepared to handle signals anywhere within
the event loop in such cases.
Eric Wong [Sat, 10 Aug 2024 09:00:12 +0000 (09:00 +0000)]
extindex: support per-inbox indexheader+altid
This allows the venerable altid (e.g. gmane:1234) to finally
work for extindex users. The newer indexheader directive works
here, too. This allows a multi-inbox extindex to fully emulate
the capabilities of per-inbox Xapian indices.
For now, per-inbox indexheader and altid DO NOT work when
searching the extindex directly. In other words, gmane:1234
might work on the /git/ inbox, but not the /all/ extindex
virtual inbox. This may remain the case since altid is
typically per-inbox only, and stuff like X-Archives-Hash
can be global across inboxes.
Eric Wong [Sat, 10 Aug 2024 09:00:07 +0000 (09:00 +0000)]
www: don't memoize ->user_help contents
Generating it is cheap enough and not worth the extra memory
and long-lived allocations. We can avoid allocating a
Xapian::QueryParser object here, too, to avoid wasting memory
for xap_helper external process users.
Eric Wong [Sat, 10 Aug 2024 09:00:04 +0000 (09:00 +0000)]
search: help: avoid ':' in user prefixes
The non-':'-suffixed variation of the string is already used as
hash keys and literals elsewhere. Theoretically, a Perl
implementation can save some allocations this way (though Perl 5
currently doesn't).
In any case, we'll introduce a help2txt method to allow sharing
code between the callers in WwwText and Documentation/common.perl
Eric Wong [Sat, 10 Aug 2024 09:00:03 +0000 (09:00 +0000)]
indexheader: deduplicate common values
Since we plan on sharing IndexHeader across multiple inboxes for
large installations with thousands of inboxes, it makes sense to
deduplicate the values to save some memory at the cost of
increased startup time.
Eric Wong [Sat, 10 Aug 2024 09:00:02 +0000 (09:00 +0000)]
search: support per-inbox indexheader directive
This allows indexing arbitrary headers to allow filtering by
boolean terms or existing text rules. Disabling RFC 2047
decoding is supported, as well.
This also refactors AltId support to rely on the same mechanisms
as the IndexHeader class for indexing, user help, and
Xapian::QueryParser setup via both bindings and external
XapHelper process to avoid adding complexity to Search.pm and
SearchIdx.pm.
We'll finally document altid support in public-inbox-config(5)
since we're in the area, as it's been a stable feature for many
years, now.
Eric Wong [Wed, 14 Aug 2024 00:16:44 +0000 (00:16 +0000)]
lei_search: make missing Xapian docs for kw lookups
Missing keyword entries should be non-fatal since Xapian
data is always less important than what's in git and SQLite.
As such, Xapian data has been and remains written last, leaving
the possibility of documents being missing from Xapian but
present in SQLite and git.
This improves recovery when dealing with badly interrupted or failed
imports due to bugs or hardware failures.
Eric Wong [Wed, 14 Aug 2024 00:16:43 +0000 (00:16 +0000)]
v2writable: confess on broken {idx_shards}
There's a bug in `lei import' introduced in 4ff8e8d21ab5
(lei/store: stop shard workers + cat-file on idle, 2024-04-16)
which causes {idx_shards} to not be recreated properly.
Hopefully this can help me track it down since it's not easily
reproducible.
Eric Wong [Fri, 26 Jul 2024 21:59:26 +0000 (21:59 +0000)]
watch: add per-directory scanning diagnostics
This may help track down problems associated with a single
directory. Note we emit a separate message for each of the
`new' and `cur' subdirectories of a Maildir. Full scans only
happen at startup (or manually), so it shouldn't be too noisy
if logging to syslog.
Requested-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Eric Wong [Fri, 26 Jul 2024 21:59:25 +0000 (21:59 +0000)]
watch: only open one directory at a time when scanning
This avoids EMFILE/ENFILE for large setups with many Maildir
watch directives. It also makes adding per-directory scanning
messages easier in the next commit.
Eric Wong [Fri, 26 Jul 2024 21:59:24 +0000 (21:59 +0000)]
watch: more details about full scan start/completion
Start and stop happen infrequently and may be useful for
diagnosing problems about missing messages. A future change
will add more details about per-directory scans.
Requested-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Eric Wong [Fri, 26 Jul 2024 21:31:11 +0000 (21:31 +0000)]
t/v2writable: use 5.10.1 and autodie more
Switching to Perl v5.12 will require more review due to
unicode_strings, but 5.10.1 is an easy change and we can rely
more on autodie to simplify error checking.
Eric Wong [Fri, 26 Jul 2024 21:31:09 +0000 (21:31 +0000)]
msgmap: mid_insert: reraise on unexpected errors
SQLITE_CONSTRAINT is the only SQLite error we really expect under
normal circumstances. This avoids infinite loops when writing
to inboxes after hitting ENOSPC.
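The rule above (treat a constraint violation as expected, re-raise everything else) can be sketched with Python's sqlite3 module. The table layout and the `open_msgmap` and `mid_insert` helpers below are assumptions for illustration and do not mirror the Msgmap.pm schema or code.

```python
import sqlite3

def open_msgmap():
    """Open an in-memory table with a UNIQUE Message-ID column (assumed layout)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE msgmap ("
               "num INTEGER PRIMARY KEY AUTOINCREMENT, "
               "mid TEXT UNIQUE NOT NULL)")
    return db

def mid_insert(db, mid):
    """Insert mid and return its number, or None if it already exists.

    Only the constraint failure is swallowed; any other SQLite error
    (for example a full disk) propagates to the caller instead of
    being retried forever.
    """
    try:
        cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    except sqlite3.IntegrityError:
        return None
    return cur.lastrowid
```

In Python's binding, SQLITE_CONSTRAINT surfaces as sqlite3.IntegrityError, while errors such as a missing table or a full database surface as sqlite3.OperationalError and are deliberately not caught here.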
Eric Wong [Sun, 7 Jul 2024 06:01:58 +0000 (06:01 +0000)]
www: replace *eml_entry with *emit_eml
This further reduces the amount of copies, temporary strings,
and scratchpad use started way back in 2022. With a 700+
message thread on a /T/ endpoint, this saves roughly 1-2% time
and roughly 100 KB of memory.
Eric Wong [Sun, 7 Jul 2024 05:57:27 +0000 (05:57 +0000)]
t/www_listing: use autodie, reduce useless tests
Noisy error checking is noisy and less useful than autodie
diagnostics in case of failure. Furthermore, most of the xsys()
failures would not allow us to continue, so favor xsys_e() in
those places.
Eric Wong [Sun, 7 Jul 2024 05:57:26 +0000 (05:57 +0000)]
www: manifest.js.gz handles If-Modified-Since
While we can't avoid the expensive manifest.js.gz generation,
non-Varnish users now get the bandwidth savings from seeing a
304 response. This has no effect on Varnish users since Varnish
will forward the request to us without If-Modified-Since if it
gets a cache miss, and handle 304 for us on cache hits.
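A conditional GET check of this kind can be sketched as follows. This is a minimal Python illustration under the assumption that the response is compared against a single modification time; `status_for` is a hypothetical name and the real manifest.js.gz handling may differ.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def status_for(if_modified_since, mtime):
    """Return 304 if the client copy is at least as new as mtime, else 200.

    if_modified_since is the raw header value (or None) and mtime is a
    Unix timestamp. Unparsable headers fall back to a full 200 response.
    """
    if not if_modified_since:
        return 200
    try:
        dt = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200
    if dt.tzinfo is None:
        # Treat a zone-less date as UTC rather than local time.
        dt = dt.replace(tzinfo=timezone.utc)
    return 304 if int(mtime) <= int(dt.timestamp()) else 200
```

The body still has to be generated to learn the modification time in the scenario described above, so the saving is in bandwidth rather than CPU.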
Eric Wong [Thu, 4 Jul 2024 02:20:55 +0000 (02:20 +0000)]
http: don't requeue if using write buffer
The write buffering will already be processed inside
->event_step, so requeue will cause a needless read(2) outside
of epoll_wait/kevent(2) readiness notifications.
This ought to avoid problems in case of pipelined connections,
but those aren't possible behind a reverse proxy and AFAIK most
HTTP clients don't do pipelining. This bug was only noticed via
strace while searching for extra syscalls, and not from
real-world use.
Eric Wong [Tue, 25 Jun 2024 18:49:37 +0000 (18:49 +0000)]
speedup $EXTRACT_DIFFS callers by 1%
While Perl docs recommend against using //o, we know the regexp
won't change at runtime and there's a measurable improvement to
be found. The included perf test on a packed mirror of
meta@public-inbox.org shows a consistent ~1% improvement on my
system.
cmd_authenticate() replies to AUTHENTICATE commands with "+" CRLF but
the imap4rev1 RFC [^0] defines the following ABNF syntax for a continuation
request:
Eric Wong [Thu, 20 Jun 2024 22:54:34 +0000 (22:54 +0000)]
http: don't store `127.0.0.1' for idle clients
For persistent HTTP clients, we can set REMOTE_ADDR lazily
for the common `127.0.0.1' value and save a few bytes when
dealing with idle connections which linger in between requests.
Eric Wong [Wed, 19 Jun 2024 23:41:04 +0000 (23:41 +0000)]
http: use writev for known Content-Length responses
We could use sendmsg(2) without MSG_MORE here, too, but
writev(2) is simpler to set up and call and we may want to use it
with pipes or regular files in the future, too, not just sockets.
Eric Wong [Wed, 19 Jun 2024 23:41:02 +0000 (23:41 +0000)]
use sendmsg w/ MSG_MORE to reduce syscalls
In places where we made multiple send(..., MSG_MORE) calls in
quick succession, we now use sendmsg(2) to provide the same
semantics with fewer syscalls. While this may be less efficient
inside the kernel for small messages, syscalls are expensive
nowadays and we can avoid userspace copies and large allocations
when streaming large HTTP chunks in /T/, /t/, and t.mbox.gz
endpoints.
This allows *BSD systems lacking MSG_MORE to save some syscalls
when writing HTTP chunked encoding, among other things.
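The syscall saving for HTTP chunked encoding can be sketched with Python's socket.sendmsg, which takes a list of buffers much like sendmsg(2) takes an iovec. `send_chunk` is a hypothetical helper, and the sketch ignores short writes, which real code must handle.

```python
import socket

def send_chunk(sock, data):
    """Write one HTTP chunk (size line, payload, CRLF) with a single syscall.

    Three separate send() calls would be needed otherwise; gathering the
    buffers also avoids concatenating a large payload in userspace.
    Returns the number of bytes the kernel accepted.
    """
    head = b"%x\r\n" % len(data)
    return sock.sendmsg([head, data, b"\r\n"])
```

On a connected stream socket, `send_chunk(sock, b"hello")` puts the 10 bytes `5\r\nhello\r\n` on the wire in one call.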
Eric Wong [Wed, 19 Jun 2024 23:41:01 +0000 (23:41 +0000)]
ds: update indentation to match rest of source
Our changes aren't compatible with Danga::Socket at all at this
point. While we're at it, depend more on subroutine prototypes
to get some compile-time checking.
Eric Wong [Sun, 16 Jun 2024 23:35:32 +0000 (23:35 +0000)]
www: search patch subject in #related query
Blob OIDs would not be accurate for merges and fuzzy
applications, so include the commit title/Subject to
increase the likelihood of finding related commits.
Eric Wong [Sun, 16 Jun 2024 23:35:30 +0000 (23:35 +0000)]
www: merge dfblob query data
We combine pre and post-image blob OIDs anyways for the #related
query, so there's no need to have separate arrays to store their
intermediate values. We'll also rename {-qry} to {-qry_dfblob}
in preparation of subject-based searches.
Eric Wong [Mon, 17 Jun 2024 00:01:40 +0000 (00:01 +0000)]
www: strip and redirect on `<' and `>' in MSGID of URL
Some users may needlessly include `<' and `>' angle brackets in URLs,
so account for this common mistake and redirect users to the
non-bracketed URL. This common mistake could be learned behavior
from other sites (e.g. sr.ht) which include `<' and `>' in URLs.
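The strip-and-redirect idea can be sketched as below. This is a Python illustration under assumptions: the URL layout `/$INBOX/$MSGID/`, the `redirect_for_mid` name, and the choice to strip only one leading `<` and one trailing `>` are all hypothetical and not taken from the WWW code.

```python
from urllib.parse import quote, unquote

def redirect_for_mid(inbox, mid_enc):
    """Return a redirect path if the Message-ID had angle brackets, else None.

    mid_enc is the (possibly percent-encoded) MSGID path component.
    """
    mid = unquote(mid_enc)
    stripped = mid
    if stripped.startswith("<"):
        stripped = stripped[1:]
    if stripped.endswith(">"):
        stripped = stripped[:-1]
    if stripped == mid:
        return None  # nothing to fix, serve the request normally
    return "/%s/%s/" % (inbox, quote(stripped, safe="@"))
```

Returning None for already-clean Message-IDs keeps the common path free of redirects.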
Eric Wong [Tue, 11 Jun 2024 18:54:42 +0000 (18:54 +0000)]
solver_git: workaround truncated `b' path in patch
For messages like <780a3faf-9e44-64f4-a354-bdee39af3af5@redhat.com>
where the "diff --git" line is truncated, favor the filename from
the "+++ b/" line.
Eric Wong [Mon, 10 Jun 2024 11:34:27 +0000 (11:34 +0000)]
www: deduplicate Message-ID in threading + skeleton
xt/perf-threading.t reports a small 0.5-1.0% memory reduction in
non-ancient Perls with CoW strings for threading alone (w/o
rendering the View.pm stuff).
On informal tests using -httpd and giant Linux stable patch set
threads (700+ messages), this ends up being roughly 5MB saved in
/T/ rendering since we use the {mid} field again in the
$ctx->{mapping} table. This becomes even more beneficial if
handling parallel HTTP requests for messages in the same message
thread, even across different endpoints.
Eric Wong [Sun, 9 Jun 2024 20:05:23 +0000 (20:05 +0000)]
gzip_filter: use zlib DEF_MEM_LEVEL for gzip
Compress::Raw::Zlib uses MAX_MEM_LEVEL by default which deviates
from the zlib default. Since the zlib default is good enough for
git, nginx and varnish: it's good enough for our use. This
change reduces maximum zlib memory use by 1/3.
There's also a new note explaining why gzip happens in Perl
instead of varnish || nginx.
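The 1/3 figure follows from the deflate memory estimate documented in zlib's zconf.h, (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes plus a small fixed overhead. The Python sketch below works through that arithmetic and produces gzip output at the default memLevel; `deflate_state_bytes` and `gzip_bytes` are hypothetical helper names, not part of GzipFilter.

```python
import zlib

def deflate_state_bytes(window_bits, mem_level):
    """Approximate deflate memory use per zconf.h (fixed overhead omitted)."""
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))

def gzip_bytes(data):
    """Compress data as gzip using zlib's default memLevel (8)."""
    z = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED,
                         zlib.MAX_WBITS | 16,  # 16 selects the gzip wrapper
                         zlib.DEF_MEM_LEVEL)
    return z.compress(data) + z.flush()

# windowBits=15: memLevel 9 (max) needs 384 KiB, memLevel 8 needs 256 KiB
MAX_LEVEL_BYTES = deflate_state_bytes(15, 9)
DEF_LEVEL_BYTES = deflate_state_bytes(15, 8)
```

256 KiB is two thirds of 384 KiB, which matches the reduction stated above.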
Eric Wong [Thu, 6 Jun 2024 07:44:16 +0000 (07:44 +0000)]
www: reduce fragmentation in /t/ and /T/ endpoints
For giant threads with /t/ and /T/ endpoints, avoid generating a
large string with a medium lifetime for the thread skeleton
($ctx->{skel}). Instead, make $ctx->{skel} an arrayref and use
it to store a bunch of smaller strings.
While keeping many small strings is inefficient due to pointer
chasing, forcing a smaller distribution of sizes makes it easier
for the malloc implementation to organize and find small chunks
of memory instead of having to find (and hold) larger contiguous
chunks. When a large string is created now, its lifetime is
kept as short as possible to decrease its likelihood of causing
fragmentation.
Preliminary testing shows this appears to reduce RSS by roughly
20-40% under both glibc malloc (using a tiny
MALLOC_MMAP_THRESHOLD_=67000) on 32-bit and jemalloc 5.2.1 on
64-bit with standard settings.
Eric Wong [Thu, 6 Jun 2024 07:44:15 +0000 (07:44 +0000)]
treewide: use cached git executable lookup
Repeated stat(2) syscalls are more expensive nowadays due
to CPU vulnerability mitigations and this change also
allows bypassing some heap allocations done by Perl.
Eric Wong [Thu, 6 Jun 2024 07:44:12 +0000 (07:44 +0000)]
treewide: use \*STD(IN|OUT|ERR) consistently
The {IO} slot may not always be populated or work
(e.g. with `-t' filetest) if there's no IO handle. Merely
using `\*' is shorter than typing out `{GLOB}', so just use the
shortest form consistently.
This may fix occasional and difficult-to-reproduce failures from
redirecting STDERR in t/imap_searchqp.t
Eric Wong [Wed, 5 Jun 2024 20:03:23 +0000 (20:03 +0000)]
searchview: avoid uninitialized vals in %rmap_inc
Modules (e.g. `PublicInbox::Gcf2') may have an undef value in
the %rmap_inc hash table if an attempt has been made to load it
and failed due to a missing libgit2-dev dependency. Avoid using
it in interpolation to avoid warnings.
Eric Wong [Tue, 4 Jun 2024 22:25:20 +0000 (22:25 +0000)]
mda: do not auto-create Xapian indices
As with -learn, -mda now detects indexlevel=basic without an
explicit config setting for inboxes which only have SQLite
files. Omitting indexlevel=basic in the config file allows
users to reduce configuration file size (and RAM usage).
We'll also ensure completely unindexed v1 inboxes can stay
unindexed despite the default being indexlevel=full.
git.git commit f4aa8c8b (fetch/clone: detect dubious ownership
of local repositories, 2024-04-10) has proven to be overly aggressive
and breaks existing setups where git-http-backend is serving
read-only repositories from reasonably trusted sources and not
running hooks of any sort.
Just mark everything as safe since our public-facing instances
have always assumed writes to all git repos come from a
different user than whatever user -netd/-httpd runs as.
Eric Wong [Thu, 30 May 2024 09:45:15 +0000 (09:45 +0000)]
git: reduce spawning for rev-parse --git-path
Since every non-worktree git repo has an `objects' directory, we
can quickly stat(2) to check for its presence and avoid an
expensive process spawn. This should be the common case on
servers since it's rare to use worktrees on servers for
coderepos (or inboxes).
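The fast path can be sketched as a directory check with a process spawn as the fallback. This Python sketch is an illustration only; `objects_dir` is a hypothetical helper and the fallback command line is an assumption about how one might ask git for the path, not the code in Git.pm.

```python
import os
import subprocess

def objects_dir(git_dir):
    """Return the objects directory for git_dir, spawning git only if needed."""
    path = os.path.join(git_dir, "objects")
    if os.path.isdir(path):
        # Common case on servers: one stat(2), no fork+exec.
        return path
    # Worktrees and other unusual layouts: let git resolve the path.
    out = subprocess.run(
        ["git", "--git-dir", git_dir, "rev-parse", "--git-path", "objects"],
        check=True, capture_output=True, text=True).stdout
    return out.rstrip("\n")
```

The spawn only happens for repositories without a plain `objects` directory, which the commit message expects to be rare on servers.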
Eric Wong [Thu, 30 May 2024 09:45:14 +0000 (09:45 +0000)]
git: prefer WNOHANG for `git cat-file --batch-*'
When inside our DS event loop, ensure we don't stall on
synchronous waitpid when stopping `--batch-*' processes.
Instead of calling PublicInbox::IO::close explicitly, let
refcounting close the socket via PublicInbox::IO::DESTROY and
the SIGCHLD handler will deal with it when the kernel and event
loop get to it.
Eric Wong [Tue, 28 May 2024 21:25:02 +0000 (21:25 +0000)]
search: forbid getopt(3) switch injection in query
Search queries may start with `-', confusing getopt(3) and
Getopt::Long; so we use `--' to separate the query string
from switches.
Consequences of this bug were limited to a single broken HTTP
response for the requesting client.
It didn't allow writes to on-disk Xapian DBs, but caused
aborts on some searches or nonsensical results when using the
optional external xap_helper processes. There was no risk of
data leaks since the mset xap_helper endpoint only returns
document IDs (unsigned integers), and not terms.
The biggest danger from this bug was that it could run systems
out of space if they are configured to write out core dumps.
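The `--' separator can be demonstrated with Python's getopt module, which follows getopt(3) conventions. The `-l' switch and the `helper_argv` and `parse` helpers are hypothetical and only stand in for whatever switches a helper process accepts.

```python
import getopt

def helper_argv(query, limit):
    """Build an argument vector where the query can never be read as a switch."""
    return ["-l", str(limit), "--", query]

def parse(argv):
    """Parse switches the way a getopt-based helper would."""
    opts, rest = getopt.getopt(argv, "l:")
    return dict(opts), rest
```

With the separator in place, a query such as `-x foo' arrives intact as a positional argument; without it, the parser treats it as an unknown switch and fails.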
Eric Wong [Tue, 21 May 2024 07:14:23 +0000 (07:14 +0000)]
t/lei-tag: allow changing time for --commit-delay test
Sometimes `lei ls-label' can run slowly enough that the
previously-scheduled delayed commit happens by the time it runs.
So support tuning the delay and add a helpful message to someone
analyzing failures on slow/overloaded machines.