Eric Wong [Fri, 6 Sep 2024 23:29:03 +0000 (23:29 +0000)]
view: fix addr2url mapping corruption
We must avoid generating a qr/\b()\b/ regexp which matches
every word boundary. This is caused by a particular set of
circumstances for WWW instances:
1. extindex must be in use
2. cindex must NOT be in use OR WWW->preload wasn't used
(custom .psgi or non-p-i-{httpd,netd} users)
3. first HTTP request hits /$EXTINDEX/$MSGID/
(where $EXTINDEX is typically `all')
On extindex-using instances without a cindex configured, the
first HTTP request hitting the extindex encounters an empty
{-by_addr} hash table. This empty {-by_addr} hash table causes
View->addr2urlmap() to return an all-matching regexp which
corrupts HTML when attempting address substitutions.
cindex-using instances avoid the problem by triggering
_fill_all() during PublicInbox::WWW->preload and ensuring
{-by_addr} of the PublicInbox::Config object is populated.
Thanks to Konstantin for the initial report and Filip for the
immensely helpful explanation of the problem.
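To make the failure mode concrete, here's a minimal Python sketch. It is not the Perl code in View.pm, and `naive_addr_regex', `guarded_addr_regex' and `link_addrs' are hypothetical names. It shows how joining an empty address list produces \b()\b, and one way to guard against it: refuse to build a pattern when there's nothing to match.

```python
import re
from typing import Callable, Iterable, Optional


def naive_addr_regex(addrs: Iterable[str]) -> re.Pattern:
    # Longest addresses first so shorter ones can't shadow them.
    alt = "|".join(re.escape(a) for a in sorted(addrs, key=len, reverse=True))
    # With no addresses this degenerates to r"\b()\b": an empty group
    # that matches at every word boundary.
    return re.compile(r"\b(" + alt + r")\b")


def guarded_addr_regex(addrs: Iterable[str]) -> Optional[re.Pattern]:
    addrs = list(addrs)
    if not addrs:
        return None  # nothing to match, so build nothing
    return naive_addr_regex(addrs)


def link_addrs(text: str, addrs: Iterable[str],
               url_for: Callable[[str], str]) -> str:
    rx = guarded_addr_regex(addrs)
    if rx is None:
        return text  # leave the HTML untouched
    return rx.sub(lambda m: url_for(m.group(1)), text)
```

Returning no pattern at all (rather than one that never matches) makes the empty case explicit for every caller.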
Eric Wong [Sat, 31 Aug 2024 08:17:56 +0000 (08:17 +0000)]
tests: skip ENOSPC injection on restricted systems
Yama will not allow ptrace(2) on existing processes (only new
ones) if the kernel.yama.ptrace_scope sysctl is non-zero. Skip
those tests for now since the majority of strace(1) testing
is probably done on systems without ptrace restrictions.
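A sketch of such a skip check in Python (the real tests are Perl, and `ptrace_restricted' is a hypothetical helper). It assumes the usual sysctl location, /proc/sys/kernel/yama/ptrace_scope, which only exists when the Yama LSM is enabled:

```python
from pathlib import Path


def ptrace_restricted(path: str = "/proc/sys/kernel/yama/ptrace_scope") -> bool:
    """Return True if Yama restricts attaching to existing processes."""
    try:
        scope = int(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no Yama (or unparseable value): assume unrestricted
    # 0 = classic ptrace permissions; 1 and above restrict attaching
    # to existing non-descendant processes.
    return scope != 0
```

A test would then skip strace-based ENOSPC injection whenever `ptrace_restricted()' is true.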
Eric Wong [Fri, 30 Aug 2024 20:36:35 +0000 (20:36 +0000)]
view: fix unclosed parentheses after `raw' link
This formatting error was accidentally introduced while
converting a `qq{}' concatenation to a `say' statement. Re-add
the `)'. While we're at it, switch to a `print' statement
since we use a string literal anyways and `say' would require an
extra global variable lookup at runtime.
Eric Wong [Thu, 29 Aug 2024 23:26:03 +0000 (23:26 +0000)]
solver: use async_check for the temporary git repo
While the temporary git repo is likely in cache and not
subject to high seek latency as normal code repos are,
inflating objects still takes a non-trivial amount of time.
So use this as an opportunity to serve other clients and
exploit parallelism in SMP systems.
Eric Wong [Thu, 29 Aug 2024 23:26:02 +0000 (23:26 +0000)]
solver: use xap_helper for async search if available
The async search API using xap_helper allows -httpd/netd users
to exploit storage and CPU-level parallelism via sockets. It is
another step towards reducing head-of-line blocking in our Perl
event loop. This reduces the effect of slow storage and extremely
large search results on unrelated HTTP requests.
Eric Wong [Thu, 29 Aug 2024 23:26:01 +0000 (23:26 +0000)]
solver: use async check (`info') for coderepo
Async --batch-check or `info' batch commands allow our Perl
process to handle other requests while git is busy waiting
on slow storage or CPUs to retrieve blob information.
This improves parallelism for SMP machines in addition to
allowing the Perl process to service other HTTP/NNTP/IMAP/POP3
requests while waiting for disk seeks, zlib inflation, and
delta resolution.
Checking stderr for error hints is now potentially racy, but
it's only a hint, so overall performance under worst-case
scenarios is preferable to correctness.
Eric Wong [Fri, 30 Aug 2024 19:05:29 +0000 (19:05 +0000)]
t/v2writable: avoid failure on strace unreadiness
poll(2) uses milliseconds, and neither IO::Poll::_poll nor our
->poll_in wrapper abstracts that. Account for the units so we wait
long enough for strace to start up on overloaded systems.
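The unit mismatch is easy to hit in other languages, too. Here's a Python sketch, assuming a `poll_in(fd, seconds)' helper shaped like ours (hypothetical signature), where the conversion to milliseconds happens once inside the wrapper:

```python
import os
import select


def poll_in(fd: int, timeout_s: float) -> bool:
    """Wait up to timeout_s seconds for fd to become readable."""
    p = select.poll()
    p.register(fd, select.POLLIN)
    # poll(2) and select.poll().poll() both take milliseconds; passing
    # seconds directly would wait 1000x less time than intended.
    return bool(p.poll(int(timeout_s * 1000)))


if __name__ == "__main__":
    r, w = os.pipe()
    print(poll_in(r, 0.05))  # False: nothing written yet
    os.write(w, b"x")
    print(poll_in(r, 1))     # True
```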
Eric Wong [Fri, 30 Aug 2024 19:05:15 +0000 (19:05 +0000)]
lei: increase umask timeout
On slow or overloaded systems, 2 seconds may not be sufficient
time to wait for a lei client to respond to the umask request
from lei-daemon. Use 60s to be consistent with the FD transfer
in the general case.
While we're at it, consistently use poll_in() now that it exists
since it's a better API than vec() + select() and will give
consistent performance regardless of the FD value.
Eric Wong [Fri, 23 Aug 2024 16:30:29 +0000 (16:30 +0000)]
tls: set SSL_OP_NO_COMPRESSION explicitly
TLS compression is susceptible to the CRIME attack and
per-connection zlib contexts waste memory for idle clients.
While compression should already be off by default in modern OpenSSL,
Net::SSLeay::CTX_get_mode reveals OP_NO_COMPRESSION was not set
when created by IO::Socket::SSL::SSL_Context->new. So set it
explicitly to ensure it's really off.
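For comparison, here's the same explicit setting in Python's ssl module (a sketch; `server_context' is a hypothetical name). Python already enables OP_NO_COMPRESSION in new contexts by default, so here the line mostly documents intent:

```python
import ssl
from typing import Optional


def server_context(certfile: Optional[str] = None,
                   keyfile: Optional[str] = None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Don't rely on library defaults: force TLS compression off
    # (CRIME mitigation, and no per-connection zlib state).
    ctx.options |= ssl.OP_NO_COMPRESSION
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```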
Eric Wong [Tue, 20 Aug 2024 18:40:59 +0000 (18:40 +0000)]
t/sigfd: reduce getpid() calls and hash lookups
getpid() is no longer cached by glibc and syscalls are more
expensive nowadays, so only call it once per test. The
additional hash table depth is unnecessary now that Sigfd uses
the global %SIG, since signal dispatch methods no longer differ.
Eric Wong [Tue, 20 Aug 2024 10:35:21 +0000 (10:35 +0000)]
lei_xsearch: allow signals during long queries
Xapian ->mset, remote Xapian calls via remote inboxes, and
lcat dumps can take a long time via wq_io_do and hold
lei_xsearch processes open for too long after a client
disconnects prematurely.
This fixes wait_for_eof shutdown timeouts on the lei-daemon quit
pipe when running t/lei-sigpipe.t with GIANT_INBOX_DIR pointed
to a meta@public-inbox.org mirror on my old laptop.
Eric Wong [Tue, 20 Aug 2024 10:35:20 +0000 (10:35 +0000)]
lei: allow Ctrl-C to interrupt IMAP+NNTP reads
Mail::IMAPClient and Net::NNTP remain synchronous APIs with
indefinite wait times on slow/unreliable connections or servers.
Since these APIs don't play nicely with signalfd or
EVFILT_SIGNAL, we will temporarily drop the reliable (but
sometimes delayed) signal handling mechanisms in favor of the
less reliable built-in signal handling of Perl to provide a
best-effort attempt to handle signals during slow operations.
Eric Wong [Tue, 20 Aug 2024 10:35:19 +0000 (10:35 +0000)]
sigfd: call normal Perl %SIG handlers
Instead of storing our own mapping of signal handler callbacks,
rely on the standard %SIG hash table which can be arbitrarily
updated from anywhere.
This makes it easier to allow existing synchronous code (e.g.
NetReader using Mail::IMAPClient or Net::NNTP) to add explicit
points where pending signals can be checked.
Additionally, it allows the `DEFAULT' (SIG_DFL) signal handler
to fire when there's no Perl subroutine to register.
Finally, this also allows us to rely on the OS + Perl itself to
dispatch signal handlers on kevent-based systems (and avoid
redundant dispatch due to our previous Linux-centric API). That
leaves Linux signalfd as the only mechanism where we need to
dispatch %SIG callbacks ourselves.
Eric Wong [Tue, 20 Aug 2024 10:35:17 +0000 (10:35 +0000)]
treewide: handle EINTR for non-(signalfd|kevent)
We may encounter new architectures in Linux without syscall
number definitions or *BSD systems without IO::KQueue or kevent
support at all, so be prepared to handle signals anywhere within
the event loop in such cases.
Eric Wong [Sat, 10 Aug 2024 09:00:12 +0000 (09:00 +0000)]
extindex: support per-inbox indexheader+altid
This allows the venerable altid (e.g. gmane:1234) to finally
work for extindex users. The newer indexheader directive works
here, too. This allows a multi-inbox extindex to fully emulate
the capabilities of per-inbox Xapian indices.
For now, per-inbox indexheader and altid DO NOT work when
searching the extindex directly. In other words, gmane:1234
might work on the /git/ inbox, but not the /all/ extindex
virtual inbox. This may remain the case since altid is
typically per-inbox only, and stuff like X-Archives-Hash
can be global across inboxes.
Eric Wong [Sat, 10 Aug 2024 09:00:07 +0000 (09:00 +0000)]
www: don't memoize ->user_help contents
Generating it is cheap enough and not worth the extra memory
and long-lived allocations. We can avoid allocating a
Xapian::QueryParser object here, too, to avoid wasting memory
for xap_helper external process users.
Eric Wong [Sat, 10 Aug 2024 09:00:04 +0000 (09:00 +0000)]
search: help: avoid ':' in user prefixes
The non-':'-suffixed variation of the string is already used as
hash keys and literals elsewhere. Theoretically, a Perl
implementation can save some allocations this way (though Perl 5
currently doesn't).
In any case, we'll introduce a help2txt method to allow sharing
code between the callers in WwwText and Documentation/common.perl.
Eric Wong [Sat, 10 Aug 2024 09:00:03 +0000 (09:00 +0000)]
indexheader: deduplicate common values
Since we plan on sharing IndexHeader across multiple inboxes for
large installations with thousands of inboxes, it makes sense to
deduplicate the values to save some memory at the cost of
increased startup time.
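Here's the deduplication idea, sketched in Python with an explicit table. Python also has `sys.intern', and whether IndexHeader uses a hash like this one is an assumption. Each lookup costs a little at startup, and every repeated value after that shares one copy:

```python
from typing import Dict


class Deduper:
    """Map equal strings to one shared instance (a hand-rolled intern table)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def __call__(self, s: str) -> str:
        # The first occurrence is stored and returned; later equal strings
        # get that same object back, so their own copies can be freed.
        return self._seen.setdefault(s, s)
```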
Eric Wong [Sat, 10 Aug 2024 09:00:02 +0000 (09:00 +0000)]
search: support per-inbox indexheader directive
This allows indexing arbitrary headers for filtering by
boolean terms or existing text rules. Disabling RFC 2047
decoding is supported, as well.
This also refactors AltId support to rely on the same mechanisms
as the IndexHeader class for indexing, user help, and
Xapian::QueryParser setup via both bindings and external
XapHelper process to avoid adding complexity to Search.pm and
SearchIdx.pm.
We'll finally document altid support in public-inbox-config(5)
since we're in the area, as it's been a stable feature for many
years, now.
Eric Wong [Wed, 14 Aug 2024 00:16:44 +0000 (00:16 +0000)]
lei_search: make missing Xapian docs for kw lookups
Missing keyword entries should be non-fatal since Xapian
data is always less important than what's in git and SQLite.
As such, Xapian data has been and remains written last, leaving
the possibility of documents being missing from Xapian but
present in SQLite and git.
This improves recovery from badly interrupted or failed
imports due to bugs or hardware failures.
Eric Wong [Wed, 14 Aug 2024 00:16:43 +0000 (00:16 +0000)]
v2writable: confess on broken {idx_shards}
There's a bug in `lei import' introduced in 4ff8e8d21ab5
(lei/store: stop shard workers + cat-file on idle, 2024-04-16)
which causes {idx_shards} to not be recreated properly.
Hopefully this can help me track it down since it's not easily
reproducible.
Eric Wong [Fri, 26 Jul 2024 21:59:26 +0000 (21:59 +0000)]
watch: add per-directory scanning diagnostics
This may help track down problems associated with a single
directory. Note we emit a separate message for each of the
`new' and `cur' subdirectories of a Maildir. Full scans only
happen at startup (or manually), so it shouldn't be too noisy
if logging to syslog.
Requested-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Eric Wong [Fri, 26 Jul 2024 21:59:25 +0000 (21:59 +0000)]
watch: only open one directory at a time when scanning
This avoids EMFILE/ENFILE for large setups with many Maildir
watch directives. It also makes adding per-directory scanning
messages easier in the next commit.
Eric Wong [Fri, 26 Jul 2024 21:59:24 +0000 (21:59 +0000)]
watch: more details about full scan start/completion
Full scan starts and completions happen infrequently, and logging
them may be useful for diagnosing problems with missing messages.
A future change will add more details about per-directory scans.
Requested-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Eric Wong [Fri, 26 Jul 2024 21:31:11 +0000 (21:31 +0000)]
t/v2writable: use 5.10.1 and autodie more
Switching to Perl v5.12 will require more review due to
unicode_strings, but 5.10.1 is an easy change and we can rely
more on autodie to simplify error checking.
Eric Wong [Fri, 26 Jul 2024 21:31:09 +0000 (21:31 +0000)]
msgmap: mid_insert: reraise on unexpected errors
SQLITE_CONSTRAINT is the only SQLite error we really expect under
normal circumstances. This avoids infinite loops when writing
to inboxes after hitting ENOSPC.
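Here's the error-handling shape, sketched with Python's sqlite3 module, where SQLITE_CONSTRAINT surfaces as `sqlite3.IntegrityError'. The `msgmap' table and `mid' column are hypothetical stand-ins, not the real schema:

```python
import sqlite3
from typing import Optional


def mid_insert(db: sqlite3.Connection, mid: str) -> Optional[int]:
    """Insert a Message-ID and return its row id, or None if it exists.

    Only the expected constraint violation is swallowed; any other error
    (e.g. sqlite3.OperationalError on a full disk) propagates, so a
    caller can't loop forever retrying.
    """
    try:
        cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None  # SQLITE_CONSTRAINT: duplicate Message-ID
```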
Eric Wong [Sun, 7 Jul 2024 06:01:58 +0000 (06:01 +0000)]
www: replace *eml_entry with *emit_eml
This further reduces the number of copies, temporary strings,
and scratchpad use, continuing work started back in 2022. With a 700+
message thread on a /T/ endpoint, this saves roughly 1-2% time
and roughly 100 KB of memory.
Eric Wong [Sun, 7 Jul 2024 05:57:27 +0000 (05:57 +0000)]
t/www_listing: use autodie, reduce useless tests
Noisy error checking is noisy and less useful than autodie
diagnostics in case of failure. Furthermore, most of the xsys()
failures would not allow us to continue, so favor xsys_e() in
those places.
Eric Wong [Sun, 7 Jul 2024 05:57:26 +0000 (05:57 +0000)]
www: manifest.js.gz handles If-Modified-Since
While we can't avoid the expensive manifest.js.gz generation,
non-Varnish users now get the bandwidth savings from seeing a
304 response. This has no effect on Varnish users since Varnish
will forward the request to us without If-Modified-Since if it
gets a cache miss, and handle 304 for us on cache hits.
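Here's a Python sketch of the conditional-GET check. The names are hypothetical, and how the last-modified time of manifest.js.gz gets determined is left to the caller. HTTP dates have one-second resolution, so sub-second precision is dropped before comparing:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(dt: datetime) -> str:
    # dt must be in UTC for usegmt=True.
    return format_datetime(dt, usegmt=True)


def not_modified(if_modified_since: Optional[str], last_modified: datetime) -> bool:
    """True if a 304 can be sent instead of the full body."""
    if not if_modified_since:
        return False
    try:
        ims = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: send the full response
    if ims.tzinfo is None:
        ims = ims.replace(tzinfo=timezone.utc)
    return last_modified.replace(microsecond=0) <= ims
```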
Eric Wong [Thu, 4 Jul 2024 02:20:55 +0000 (02:20 +0000)]
http: don't requeue if using write buffer
The write buffering will already be processed inside
->event_step, so requeue will cause a needless read(2) outside
of epoll_wait/kevent(2) readiness notifications.
This ought to avoid problems in case of pipelined connections,
but those aren't possible behind a reverse proxy and AFAIK most
HTTP clients don't do pipelining. This bug was only noticed via
strace while searching for extra syscalls, and not from
real-world use.
Eric Wong [Tue, 25 Jun 2024 18:49:37 +0000 (18:49 +0000)]
speedup $EXTRACT_DIFFS callers by 1%
While Perl docs recommend against using //o, we know the regexp
won't change at runtime and there's a measurable improvement to
be found. The included perf test on a packed mirror of
meta@public-inbox.org shows a consistent ~1% improvement on my
system.
cmd_authenticate() replies to AUTHENTICATE commands with "+" CRLF but
the imap4rev1 RFC [^0] defines the following ABNF syntax for a continuation
request:
Eric Wong [Thu, 20 Jun 2024 22:54:34 +0000 (22:54 +0000)]
http: don't store `127.0.0.1' for idle clients
For persistent HTTP clients, we can set REMOTE_ADDR lazily
for the common `127.0.0.1' value and save a few bytes when
dealing with idle connections which linger in between requests.
Eric Wong [Wed, 19 Jun 2024 23:41:04 +0000 (23:41 +0000)]
http: use writev for known Content-Length responses
We could use sendmsg(2) without MSG_MORE here, too, but
writev(2) is simpler to set up and call and we may want to use it
with pipes or regular files in the future, too, not just sockets.
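In Python, `os.writev' maps directly onto writev(2). Here's a sketch (`writev_all' is a hypothetical helper, and a blocking fd is assumed) that sends headers and body together and copes with partial writes:

```python
import os
from typing import Iterable


def writev_all(fd: int, bufs: Iterable[bytes]) -> int:
    """Write every buffer using as few writev(2) calls as possible."""
    views = [memoryview(b) for b in bufs if b]
    total = 0
    while views:
        n = os.writev(fd, views)
        total += n
        # Drop fully written buffers...
        while views and n >= len(views[0]):
            n -= len(views[0])
            views.pop(0)
        # ...and trim a partially written one before retrying.
        if n:
            views[0] = views[0][n:]
    return total
```

This works the same on pipes and regular files, which is one reason to prefer it over sendmsg(2) here.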
Eric Wong [Wed, 19 Jun 2024 23:41:02 +0000 (23:41 +0000)]
use sendmsg w/ MSG_MORE to reduce syscalls
In places where we made multiple send(..., MSG_MORE) calls in
quick succession, we now use sendmsg(2) to provide the same
semantics with fewer syscalls. While this may be less efficient
inside the kernel for small messages, syscalls are expensive
nowadays and we can avoid userspace copies and large allocations
when streaming large HTTP chunks in /T/, /t/, and t.mbox.gz
endpoints.
This allows *BSD systems lacking MSG_MORE to save some syscalls
when writing HTTP chunked encoding, among other things.
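Here's a Python sketch of the chunked-encoding case: the size line, payload and trailing CRLF go out in one `socket.sendmsg' call. `send_chunk' is hypothetical. MSG_MORE is only passed when the platform defines it, which mirrors the *BSD situation above:

```python
import socket


def send_chunk(sock: socket.socket, data: bytes, more: bool = False) -> int:
    """Send one HTTP/1.1 chunk (size line, payload, CRLF) in one sendmsg(2)."""
    # MSG_MORE (Linux) hints that more data follows, like TCP_CORK.
    flags = socket.MSG_MORE if more and hasattr(socket, "MSG_MORE") else 0
    return sock.sendmsg([b"%x\r\n" % len(data), data, b"\r\n"], [], flags)
```

On non-blocking sockets sendmsg may send fewer bytes than requested, so a real implementation must buffer the remainder.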
Eric Wong [Wed, 19 Jun 2024 23:41:01 +0000 (23:41 +0000)]
ds: update indentation to match rest of source
Our changes aren't compatible with Danga::Socket at all at this
point. While we're at it, depend more on subroutine prototypes
to get some compile-time checking.
Eric Wong [Sun, 16 Jun 2024 23:35:32 +0000 (23:35 +0000)]
www: search patch subject in #related query
Blob OIDs would not be accurate for merges and fuzzy
applications, so include the commit title/Subject to
increase the likelihood of finding related commits.
Eric Wong [Sun, 16 Jun 2024 23:35:30 +0000 (23:35 +0000)]
www: merge dfblob query data
We combine pre and post-image blob OIDs anyways for the #related
query, so there's no need to have separate arrays to store their
intermediate values. We'll also rename {-qry} to {-qry_dfblob}
in preparation for subject-based searches.
Eric Wong [Mon, 17 Jun 2024 00:01:40 +0000 (00:01 +0000)]
www: strip and redirect on `<' and `>' in MSGID of URL
Some users may needlessly include `<' and `>' braces in URLs, so
account for this common mistake and redirect users to the
non-braced URL. This common mistake could be learned behavior
from other sites (e.g. sr.ht) which include `<' and `>' in URLs.
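Here's a Python sketch of the stripping step. `unbraced_mid' is a hypothetical name, and issuing the redirect and re-escaping the URL are left to the caller:

```python
import re
from typing import Optional
from urllib.parse import unquote


def unbraced_mid(mid_segment: str) -> Optional[str]:
    """Return the Message-ID without <...> if the URL segment had them."""
    mid = unquote(mid_segment)  # clients may send %3C and %3E
    m = re.fullmatch(r"<(.+)>", mid)
    return m.group(1) if m else None
```

A None result means the URL is already in canonical form and needs no redirect.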
Eric Wong [Tue, 11 Jun 2024 18:54:42 +0000 (18:54 +0000)]
solver_git: workaround truncated `b' path in patch
For messages like <780a3faf-9e44-64f4-a354-bdee39af3af5@redhat.com>
where the "diff --git" line is truncated, favor the filename from
the "+++ b/" line.
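Here's a Python sketch of that preference. `post_image_path' is hypothetical, and it ignores quoted or space-containing paths. The `diff --git' line is only a fallback, and the `+++ b/' line wins when present:

```python
from typing import Iterable, Optional


def post_image_path(lines: Iterable[str]) -> Optional[str]:
    """Pick the post-image path for one file in a git-style patch."""
    from_diff_git = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("diff --git "):
            # May be truncated by mail clients, so only a fallback.
            i = line.rfind(" b/")
            from_diff_git = line[i + 3:] if i >= 0 else None
        elif line.startswith("+++ b/"):
            return line[len("+++ b/"):]  # preferred source
        elif line.startswith("@@"):
            break  # file header is over
    return from_diff_git
```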
Eric Wong [Mon, 10 Jun 2024 11:34:27 +0000 (11:34 +0000)]
www: deduplicate Message-ID in threading + skeleton
xt/perf-threading.t reports a small 0.5-1.0% memory reduction in
non-ancient Perls with CoW strings for threading alone (w/o
rendering the View.pm stuff).
On informal tests using -httpd and giant Linux stable patch set
threads (700+ messages), this ends up being roughly 5MB saved in
/T/ rendering since we use the {mid} field again in the
$ctx->{mapping} table. This becomes even more beneficial if
handling parallel HTTP requests for messages in the same message
thread, even across different endpoints.
Eric Wong [Sun, 9 Jun 2024 20:05:23 +0000 (20:05 +0000)]
gzip_filter: use zlib DEF_MEM_LEVEL for gzip
Compress::Raw::Zlib uses MAX_MEM_LEVEL by default, which deviates
from the zlib default. Since the zlib default is good enough for
git, nginx and varnish, it's good enough for our use. This
change reduces maximum zlib memory use by 1/3.
There's also a new note explaining why gzip happens in Perl
instead of varnish || nginx.
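The 1/3 figure follows from zlib's documented deflate memory estimate, (1 << (windowBits + 2)) + (1 << (memLevel + 9)). Here's a Python sketch of the arithmetic plus a gzip deflater using DEF_MEM_LEVEL. Python's zlib already defaults to it, and `deflate_mem' and `gzip_deflater' are hypothetical names:

```python
import zlib


def deflate_mem(window_bits: int = 15, mem_level: int = 8) -> int:
    # zlib's documented estimate of deflate memory use (see zconf.h).
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))


def gzip_deflater(level: int = 6):
    # 16 + MAX_WBITS selects the gzip container; DEF_MEM_LEVEL is 8,
    # while MAX_MEM_LEVEL (Compress::Raw::Zlib's default) is 9.
    return zlib.compressobj(level=level, method=zlib.DEFLATED,
                            wbits=16 + zlib.MAX_WBITS,
                            memLevel=zlib.DEF_MEM_LEVEL)


if __name__ == "__main__":
    big, small = deflate_mem(mem_level=9), deflate_mem(mem_level=8)
    print(big, small)  # 393216 262144, i.e. one third less
```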
Eric Wong [Thu, 6 Jun 2024 07:44:16 +0000 (07:44 +0000)]
www: reduce fragmentation in /t/ and /T/ endpoints
For giant threads with /t/ and /T/ endpoints, avoid generating a
large string with a medium lifetime for the thread skeleton
($ctx->{skel}). Instead, make $ctx->{skel} an arrayref and use
it to store a bunch of smaller strings.
While keeping many small strings is inefficient due to pointer
chasing, forcing a smaller distribution of sizes makes it easier
for the malloc implementation to organize and find small chunks
of memory instead of having to find (and hold) larger contiguous
chunks. When a large string is created now, its lifetime is
kept as short as possible to decrease its likelihood of causing
fragmentation.
Preliminary testing shows this appears to reduce RSS by roughly
20-40% under both glibc malloc (using a tiny
MALLOC_MMAP_THRESHOLD_=67000) on 32-bit and jemalloc 5.2.1 on
64-bit with standard settings.
Eric Wong [Thu, 6 Jun 2024 07:44:15 +0000 (07:44 +0000)]
treewide: use cached git executable lookup
Repeated stat(2) syscalls are more expensive nowadays due
to CPU vulnerability mitigations and this change also
allows bypassing some heap allocations done by Perl.
Eric Wong [Thu, 6 Jun 2024 07:44:12 +0000 (07:44 +0000)]
treewide: use \*STD(IN|OUT|ERR) consistently
The {IO} slot may not always be populated, and referencing it may
not work (e.g. with the `-t' filetest) if there's no IO handle.
Merely using `\*' is shorter than typing out `{GLOB}', so just use
the shortest form consistently.
This may fix occasional and difficult-to-reproduce failures from
redirecting STDERR in t/imap_searchqp.t.
Eric Wong [Wed, 5 Jun 2024 20:03:23 +0000 (20:03 +0000)]
searchview: avoid uninitialized vals in %rmap_inc
Modules (e.g. `PublicInbox::Gcf2') may have an undef value in
the %rmap_inc hash table if an attempt has been made to load it
and failed due to a missing libgit2-dev dependency. Avoid using
it in interpolation to avoid warnings.
Eric Wong [Tue, 4 Jun 2024 22:25:20 +0000 (22:25 +0000)]
mda: do not auto-create Xapian indices
As with -learn, -mda now detects indexlevel=basic without an
explicit config setting for inboxes which only have SQLite
files. Omitting indexlevel=basic in the config file allows
users to reduce configuration file size (and RAM usage).
We'll also ensure completely unindexed v1 inboxes can stay
unindexed despite the default being indexlevel=full.
git.git commit f4aa8c8b (fetch/clone: detect dubious ownership
of local repositories, 2024-04-10) has proven to be overly aggressive
and breaks existing setups where git-http-backend is serving
read-only repositories from reasonably trusted sources and not
running hooks of any sort.
Just mark everything as safe since our public-facing instances
have always assumed writes to all git repos come from a
different user than whatever user -netd/-httpd runs as.
Eric Wong [Thu, 30 May 2024 09:45:15 +0000 (09:45 +0000)]
git: reduce spawning for rev-parse --git-path
Since every non-worktree git repo has an `objects' directory, we
can quickly stat(2) to check for its presence and avoid an
expensive process spawn. This should be the common case on
servers since it's rare to use worktrees on servers for
coderepos (or inboxes).
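Here's a Python sketch of the fast path (the `objects_dir' helper is hypothetical, and the fallback assumes `git' is in $PATH):

```python
import os
import subprocess


def objects_dir(git_dir: str) -> str:
    """Locate a repo's objects directory, spawning git only when needed."""
    git_dir = os.path.abspath(git_dir)
    objs = os.path.join(git_dir, "objects")
    if os.path.isdir(objs):
        return objs  # common case on servers: one stat(2), no fork+exec
    # Worktrees (or unusual layouts): let git resolve the path.
    out = subprocess.run(
        ["git", "--git-dir=" + git_dir, "rev-parse", "--git-path", "objects"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return out
```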
Eric Wong [Thu, 30 May 2024 09:45:14 +0000 (09:45 +0000)]
git: prefer WNOHANG for `git cat-file --batch-*'
When inside our DS event loop, ensure we don't stall on
synchronous waitpid when stopping `--batch-*' processes.
Instead of calling PublicInbox::IO::close explicitly, let
refcounting close the socket via PublicInbox::IO::DESTROY and
the SIGCHLD handler will deal with it when the kernel and event
loop get to it.
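Here's the non-blocking reap, sketched in Python. It requires 3.9+ for `os.waitstatus_to_exitcode', and `reap_nohang' is hypothetical. In an event loop this would be called from the SIGCHLD handler rather than right after closing the pipe:

```python
import os
from typing import Optional


def reap_nohang(pid: int) -> Optional[int]:
    """Return the child's exit code if it has exited, else None (never blocks)."""
    try:
        done, status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        return None  # already reaped elsewhere (e.g. by a SIGCHLD handler)
    if done == 0:
        return None  # still running; check again on the next SIGCHLD
    return os.waitstatus_to_exitcode(status)
```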
Eric Wong [Tue, 28 May 2024 21:25:02 +0000 (21:25 +0000)]
search: forbid getopt(3) switch injection in query
Search queries may start with `-', confusing getopt(3) and
Getopt::Long; so we use `--' to separate the query string
from switches.
Consequences of this bug were limited to a single broken HTTP
response for the requesting client.
It didn't allow writes to on-disk Xapian DBs, but caused
aborts on some searches or nonsensical results when using the
optional external xap_helper processes. There was no risk of
data leaks since the mset xap_helper endpoint only returns
document IDs (unsigned integers), and not terms.
The biggest danger from this bug was that it could run systems
out of space if they are configured to write out core dumps.
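Here's a Python sketch using the getopt module, which follows getopt(3) rules. The `-k'/`-d' switches and helper names are hypothetical and are not xap_helper's real options. Without `--', a query like `-d /etc foo' gets parsed as the `-d' switch:

```python
import getopt
from typing import Dict, List, Tuple


def helper_argv(switches: List[str], query: str) -> List[str]:
    # "--" ends option parsing, so a query starting with "-" stays an operand.
    return [*switches, "--", query]


def parse_helper_argv(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    # Hypothetical helper accepting "-k LIMIT" and "-d DIR".
    opts, operands = getopt.getopt(argv, "k:d:")
    return dict(opts), operands
```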
Eric Wong [Tue, 21 May 2024 07:14:23 +0000 (07:14 +0000)]
t/lei-tag: allow changing time for --commit-delay test
Sometimes `lei ls-label' can run slowly enough that the
previously-scheduled delayed commit happens by the time it runs.
So support tuning the delay and add a helpful message to someone
analyzing failures on slow/overloaded machines.
Eric Wong [Sun, 19 May 2024 21:55:07 +0000 (21:55 +0000)]
xap_helper: drop DB handles on EMFILE/ENFILE/etc...
This allows the process to recover if we get the SHARD_COST
calculation wrong (e.g. if new versions of Xapian use more FDs
than expected). We'll no longer attempt to recover from ENOMEM
and similar errors during Xapian DB initialization and instead
just tear down the process (as we do in other places).
Eric Wong [Sun, 19 May 2024 21:55:06 +0000 (21:55 +0000)]
xap_helper: expire DB handles when FD table is near full
For long-lived daemons across config reloads, we shouldn't keep
Xapian DBs open forever under FD pressure. So estimate the
number of FDs we need per-shard and start clearing some out
if we have too many open.
While we're at it, hoist out our ulimit_n helper and share it
across extindex and the Perl XapHelper implementation.
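Here's a Python sketch of FD-budgeted handle expiry. The real code is Perl and C++, and `HandleCache' and its 4-FDs-per-shard cost are hypothetical, loosely modeled on SHARD_COST. The budget comes from the RLIMIT_NOFILE soft limit minus a reserve:

```python
import resource
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class HandleCache:
    """LRU cache of open DB handles, bounded by an estimated FD budget."""

    def __init__(self, opener: Callable[[Hashable], object],
                 fds_per_shard: int = 4, reserve: int = 64,
                 budget: Optional[int] = None) -> None:
        if budget is None:
            soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            if soft == resource.RLIM_INFINITY:
                soft = 1 << 20  # arbitrary cap for "unlimited"
            budget = max(soft - reserve, 0)  # keep FDs for sockets, logs, etc.
        self.budget = budget
        self.fds_per_shard = fds_per_shard  # hypothetical per-shard cost
        self.opener = opener
        self.used = 0
        self.cache = OrderedDict()  # key -> (handle, estimated FD cost)

    def get(self, key: Hashable, nshards: int):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key][0]
        cost = nshards * self.fds_per_shard
        # Close least recently used handles until the new one fits.
        while self.cache and self.used + cost > self.budget:
            _key, (old, old_cost) = self.cache.popitem(last=False)
            close = getattr(old, "close", None)
            if close is not None:
                close()
            self.used -= old_cost
        handle = self.opener(key)
        self.cache[key] = (handle, cost)
        self.used += cost
        return handle
```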
Eric Wong [Sun, 19 May 2024 21:55:05 +0000 (21:55 +0000)]
xap_helper.h: memoize Xapian handles with khashl
Since we're already using khashl in the C++ implementation,
get rid of tsearch(3) and friends as well. Relying on hash
tables in both the Perl and C(++) implementation reduces
cognitive load for hackers.
Eric Wong [Sun, 19 May 2024 21:55:03 +0000 (21:55 +0000)]
xap_helper.h: use khashl.h instead of hsearch(3)
hsearch(3) and friends are just too horrid of APIs and subject
to fatal problems due to system-dependent ENTRY.key use of
strdup(3). So replace it with khashl (which is a newer, smaller
version of the widely-used khash in git.git).
We'll also be able to use khashl in the future for
the FUSE shim if liburcu isn't available.
Eric Wong [Sun, 19 May 2024 21:55:02 +0000 (21:55 +0000)]
xap_helper: key search instances by -Q params, too
In addition to the shards which comprise the xap_helper search
instance, we also account for changes in altid and indexheader
in case xap_helper lifetime exceeds the given
PublicInbox::Config.
xap_helper will be Config lifetime agnostic since it's possible
to run -netd and -httpd instances with multiple Config files,
but a single xap_helper instance (with workers) should be able
to service all of them.
Eric Wong [Sun, 19 May 2024 21:55:01 +0000 (21:55 +0000)]
config: dedupe ibx->{newsgroup}
We definitely use newsgroup names as hash keys, so get rid
of the duplicate value for some memory savings when we have
hundreds or thousands of newsgroups.
Eric Wong [Tue, 14 May 2024 06:38:06 +0000 (06:38 +0000)]
doc: limit jemalloc recommendation to 64-bit systems
My 32-bit server seems less happy with jemalloc; likely since
munmap is creating holes and it's not using sbrk by default.
jemalloc seems to need large VM space (not actual memory)
to work well, and that isn't a possibility for constrained
32-bit systems.
Eric Wong [Sat, 11 May 2024 23:29:40 +0000 (23:29 +0000)]
solver: quiet complex regexp warning for old Perl
I'm not sure when the actual recursion limit was removed,
but the warning was removed for Perl 5.37.1. In any case,
it's probably not worth doing anything about for older Perls
since it's rarely triggered and it seems nobody cares too
much about solver, anyways :<
Eric Wong [Thu, 9 May 2024 00:39:01 +0000 (00:39 +0000)]
treewide: reduce $PATH checks for `git' executable
Repeatedly checking $PATH for `git' when we need to call it
multiple times in quick succession doesn't seem useful. So
avoid some expensive stat(2) syscalls to make things less bad
for systems which require expensive CPU vulnerability
mitigations.
This also saves a bunch of memory allocations since we do the
$PATH lookup in pure Perl to avoid doing the uncacheable lookup
in a vfork-ed child.
Eric Wong [Tue, 7 May 2024 19:14:27 +0000 (19:14 +0000)]
xap_helper: unconditionally reopen DBs on reuse
Reopening Xapian DBs is a fairly cheap operation and Xapian
avoids doing work when nothing's changed, so just do it
to ensure we always get the latest updates in search results.
The old synchronous search interface worked around this by
having a timer based expiration in hopes of mitigating
fragmentation problems, but perhaps that's not worth doing
anymore now that memory fragmentation from Perl itself is
better understood.