Debian abandoned SVN and Alioth in favor of Salsa (GitLab) a
while back, so point to the new URL of the SA configs. The
new repo is limited to SA configs, while the old pkg-listmaster
repo had some other tools and plugins which were probably unused
and abandoned.
Eric Wong [Mon, 28 Oct 2024 20:47:58 +0000 (20:47 +0000)]
coderepo: sort per-inbox coderepos by score
This increases the likelihood of early solver success and looks
more logical in listings.
It's probably OK to ditch the timestamp column since the score
is far more important, and dropping it lets us reduce Xapian
lookups. The raw score is still shown for now, but it could
probably become a percentage in the future...
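A minimal Python sketch of the best-first ordering, assuming each coderepo already carries a numeric score. The `CodeRepo` type and the name-based tie-break are hypothetical illustrations, not public-inbox's Perl code:

```python
from typing import List, NamedTuple


class CodeRepo(NamedTuple):
    name: str
    score: int  # hypothetical: higher means a stronger inbox/coderepo match


def sort_coderepos(repos: List[CodeRepo]) -> List[CodeRepo]:
    """Order coderepos best-first; ties fall back to name for stable listings."""
    return sorted(repos, key=lambda r: (-r.score, r.name))
```

Assuming the solver walks repos in this order, the most likely candidate is tried first.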
Eric Wong [Fri, 25 Oct 2024 03:19:56 +0000 (03:19 +0000)]
learn: reduce parameter passing
Since $remove_or_add is now a locally-scoped anonymous sub, it
can access `global' variables under TestCommon::key2sub and
avoid shadowing the names of global variables.
Eric Wong [Fri, 25 Oct 2024 03:19:55 +0000 (03:19 +0000)]
learn: support --keep-going/-k switch
Inspired by make(1), this switch allows -learn to work with
config files which contain both read-only and read-write
inboxes.
The remove_or_add() sub is now anonymous and locally-scoped to
facilitate sharing the `@fail_ibx' variable when the entire
script is made into a subroutine for `make check-run' (via
TestCommon->key2sub). A subsequent commit will reduce needless
parameter passing of global variables for readability.
Eric Wong [Mon, 21 Oct 2024 20:24:58 +0000 (20:24 +0000)]
doc: cindex: clarify --prune switch
--prune doesn't remove commits from git, it only removes them
from the index. Thus "unindex" is a better word to describe the
removal of commits from the index.
Eric Wong [Tue, 8 Oct 2024 05:18:37 +0000 (05:18 +0000)]
v2writable: more debug output for `lei import' failures
A difficult-to-reproduce bug in `lei import' introduced in 4ff8e8d21ab5
(lei/store: stop shard workers + cat-file on idle, 2024-04-16)
causes {idx_shards} to not be recreated properly due to a shard
being locked while attempting to get a write lock on it:
Exception: Unable to get write lock on /path/to/shard0: already locked
I'm not sure if the bug is from unnecessarily holding onto a
shard too long, or incorrectly attempting to open an
already-open shard. In either case, hopefully a more complete
backtrace can be obtained since setting PERL5OPT=-MCarp=verbose
in the shared lei/store worker process isn't straightforward[*].
AFAIK, this doesn't affect normal v2 and -extindex activity,
only lei users who import mail[*].
[*] lei attempts to ensure read-after-write consistency
across parallel instances to satisfy users who simultaneously
import multiple IMAP mailboxes at once over high-latency networks.
However, Xapian, SQLite, and multi-epoch git only work well with
one writer-at-a-time so lei jumps through hoops to avoid
introducing surprising local delays and waits.
Eric Wong [Mon, 7 Oct 2024 08:30:20 +0000 (08:30 +0000)]
lock: improve error reporting
Lock errors should not happen under normal use, so use confess()
to aid debugging on failure. We'll also start using `E: ' to
denote errors (as opposed to warnings).
Eric Wong [Wed, 2 Oct 2024 22:39:02 +0000 (22:39 +0000)]
viewvcs: generate search query for merge commits
Attempt to parse commit titles out of merge commit messages and
generate search queries out of them to find the related emails
for the individual patch(es).
As with all search-related functionality, it's best-effort
and inexact, but seems somewhat successful.
Eric Wong [Wed, 2 Oct 2024 22:39:01 +0000 (22:39 +0000)]
viewvcs: use wider textarea for search query
We normally wrap text at 72 columns, but the <textarea> can be
wider in case there are long words which aren't broken apart.
With w3m, the enclosing `[' and `]' take up two columns
combined, so it works out to maximizing a standard 80-column
terminal.
Eric Wong [Mon, 30 Sep 2024 21:30:08 +0000 (21:30 +0000)]
t/{config,solver_git}: simplify error handling
autodie and the newish PublicInbox::IO::write_file make error
handling more consistent and less noisy to people reading the
code. We'll also avoid testing `git config' set behavior
of git(1) and instead bail out via `xsys_e()' if it fails
unexpectedly due to hardware problems or bugs in git.
Eric Wong [Mon, 30 Sep 2024 21:30:07 +0000 (21:30 +0000)]
xt/solver: use `psgi' shortcut for require_mods()
The `psgi' shortcut simplifies setup, reduces the likelihood
of human error from omitted modules, and avoids needless `use_ok'
tests which make the output noisier than necessary.
Eric Wong [Mon, 30 Sep 2024 21:30:06 +0000 (21:30 +0000)]
www: improve handling of missing coderepos
Git coderepos may appear and disappear during the lifetime of
an -httpd or -netd instance. This happens quite frequently on
my git.kernel.org mirror and clutters up stderr, so
we'll validate the existence of a git directory before trying
to serve anything.
Whether or not this should be the case for inboxes is yet-to-be
decided, especially since there's inbox-specific information
inside PI_CONFIG (e.g. address, newsgroup) and we only use a
project listing for coderepos.
Eric Wong [Mon, 30 Sep 2024 21:30:04 +0000 (21:30 +0000)]
t/www_static: test with our -httpd server, too
While our current HTTP implementation doesn't make special
allowances for static files, it only hurts a little to fork
off -httpd instances and ensure any future changes work as
expected.
Eric Wong [Mon, 30 Sep 2024 21:30:03 +0000 (21:30 +0000)]
t/www_static: modernize test
autodie is standard in Perl v5.10+ and we now have
PublicInbox::IO::write_file to denoise test setup code.
Then we'll favor the non-wantarray calls to tmpdir() and
rely on an overloaded stringification rather than keeping
the $for_destroy object around.
Eric Wong [Thu, 26 Sep 2024 10:56:35 +0000 (10:56 +0000)]
t/lei-import-imap: bail on missing UIDVALIDITY after import
I haven't been able to reproduce it, but I've seen the test
fail because ls-mail-sync didn't emit anything which matched
/;UIDVALIDITY=(\d+)\s*/. Add some diagnostics and bail out
right away if it happens again.
Eric Wong [Thu, 26 Sep 2024 00:55:05 +0000 (00:55 +0000)]
www: use mtime as CSS cache-buster instead of ctime
mtime can be synchronized across multiple machines via package
managers, tarballs, rsync deploys, or tools like
`git-set-file-times'. This synchronization increases cache hit
rates for browsers hitting multi-host public-inbox instances
load balancing behind a single $hostname:$port identity.
While mtime may be less correct, it's unusual that anyone would
want to intentionally alter or preserve mtime after a file is
changed on the FS.
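To illustrate why mtime makes a portable cache-buster, here is a Python sketch that appends a file's integer mtime to its URL. The `css_href` name and the `?<mtime>` query format are assumptions, not the exact URLs public-inbox emits:

```python
import os


def css_href(path: str, url: str) -> str:
    """Return url with the file's integer mtime appended as a query string.

    Hosts whose copies share the same mtime (rsync -t, tarballs, packages)
    produce identical URLs, so browser caches hit across hosts.
    """
    mtime = int(os.stat(path).st_mtime)
    return f"{url}?{mtime}"
```

Two hosts whose copies of the file share an mtime (e.g. deployed with `rsync -t') generate the same URL, so a browser's cached copy stays valid whichever host serves the page. A ctime-based value could not be synchronized this way, since ctime cannot be set from userspace.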
Eric Wong [Sat, 28 Sep 2024 18:39:35 +0000 (18:39 +0000)]
www: allow specifying CSS @import or <link> tags
Apparently, some browsers (or settings/extensions) will not
honor certain media= attributes in HTML <link> tags. So as a
workaround, users may force the use of @import statements inside the
<style> tag before the normal monospace CSS $STYLE using the new
`load' directive.
I've only tested this lightly due to swap thrashing on my system, but it
appears to work on a new Firefox profile w/ ui.systemUsesDarkTheme=1.
I couldn't get either @import or <link> working on an existing profile,
likely due to some other settings in the config.
I initially wanted to use @import by default, but the ordering
constraint prevents user-specified CSS from overriding the
default $STYLE we normally inject first.
This also ensures @import use inside a specified CSS file forces
the file to be imported first, before the default $STYLE is
inlined.
In the future, `load' may support `last' as a comma-delimited
directive to load CSS at the bottom of the page (before the
`</html>' tag).
Eric Wong [Thu, 26 Sep 2024 00:55:02 +0000 (00:55 +0000)]
www: don't reread CSS files
With preloading, we usually read our CSS files in quick
succession so they are unlikely to change in between loads.
Thus we can save some syscalls and memory (assuming newer
Perl with CoW scalars).
Since our normal Perl code is only loaded once and ignores
changes on the FS after startup, we'll treat our CSS the
same way and assume they don't change after startup.
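A Python sketch of the read-once policy, assuming a process-wide dict keyed by path (the `load_css` name is hypothetical):

```python
from typing import Dict

_css_cache: Dict[str, str] = {}


def load_css(path: str) -> str:
    """Return the CSS text for path, reading the file at most once.

    Like code loaded at startup, the file is assumed not to change
    afterwards, so later calls skip open()/read() entirely.
    """
    css = _css_cache.get(path)
    if css is None:
        with open(path, encoding='utf-8') as fh:
            css = fh.read()
        _css_cache[path] = css
    return css
```

Edits made on disk after the first load are deliberately ignored until the process restarts, matching how the Perl code itself is treated.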
Eric Wong [Thu, 26 Sep 2024 00:55:01 +0000 (00:55 +0000)]
user_content: simplify internal API and use v5.12
We use {env} and {ibx} everywhere so there's no point in
unpacking args. There are no odd unicode_strings problems
here, either, so we can use v5.12 and autodie to reduce
`or die' checks.
Eric Wong [Tue, 24 Sep 2024 18:35:48 +0000 (18:35 +0000)]
viewvcs: fix b= generation in $REPO/tree/ listing
Queries such as `b=contrib/cssREADME' are incorrect despite
having the actual blob OID for the given file. Add a trailing
slash for files in a project subdirectory in those cases as we
do for cases we don't have a known path name.
While we're in the area, avoid needless shadowing of the `$t'
var and add a comment to describe its contents.
Eric Wong [Mon, 16 Sep 2024 21:03:01 +0000 (21:03 +0000)]
www: test address URL-fication
Probably more tests coming, but setup stuff is still on the slow
side. While email addresses can be all sorts of uncommon
characters, I'm also fairly certain we can disallow the [&;<>]
set from being URL-fied.
Eric Wong [Mon, 16 Sep 2024 21:03:00 +0000 (21:03 +0000)]
test_common: improve psgi test setup + loading
Since we have many PSGI tests nowadays, put in a `psgi' shortcut
like we do for many other components for `require_mods' to make
it easier to load a consistent set of modules.
We'll also cut down on `require_ok' and `use_ok' tests since
they should be limited to code maintained in our source tree,
not 3rd-party dependencies.
Eric Wong [Mon, 16 Sep 2024 21:02:58 +0000 (21:02 +0000)]
config: ignore blank address= and listid= entries
At the minimum, there must be a non-space character in
address= and listid= entries for matches to occur.
Filter out the obviously unmatchable entries here to
avoid potential problems elsewhere.
Eric Wong [Mon, 16 Sep 2024 09:53:59 +0000 (09:53 +0000)]
t/feed: fix uninitialized variable warnings
These warnings only happened under test conditions and never
when running under any PSGI servers. In retrospect, t/feed.t is
likely redundant nowadays and ought to be folded into existing
PSGI tests so we don't have to consider setup problems like
these.
Fixes: bbe582cdfa429 ("view: fix addr2urlmap with Plack::Builder::mount")
Eric Wong [Fri, 13 Sep 2024 22:07:24 +0000 (22:07 +0000)]
view: disable address URL-fication of possible HTML escapes
In case somebody uses a local email address of `lt' or `gt' (with
no domain component, or something matching /#\d+/a), disable
URL-fication of such addresses to prevent breaking HTML output.
Somebody with better Perl regexp knowledge than I can attempt to
write a regexp which functions like \b but avoids matching `&'
to allow such local email addresses. But I suspect local-only
email addresses are rare, so this isn't a real problem in
practice.
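A Python sketch of that guard, assuming each address is checked before the URL-fication regexp runs. The `may_urlfy' helper is hypothetical; it simply refuses domain-less local parts that spell out `lt', `gt', or a numeric reference such as `#64':

```python
import re

# Domain-less local parts that would collide with HTML escapes once the
# text is escaped: "&lt;", "&gt;", and numeric references like "&#64;".
_ESCAPE_LIKE = re.compile(r'(?:lt|gt|#\d+)', re.ASCII)


def may_urlfy(addr: str) -> bool:
    """Return False for domain-less addresses that look like HTML escapes."""
    if '@' in addr:
        return True
    return _ESCAPE_LIKE.fullmatch(addr) is None
```

re.ASCII mirrors the /a modifier mentioned above, so \d only matches ASCII digits.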
Eric Wong [Fri, 13 Sep 2024 22:07:23 +0000 (22:07 +0000)]
view: fix addr2urlmap with Plack::Builder::mount
Plack::App::URLMap does not preserve SCRIPT_NAME set for PSGI
`mount' directives when running response callbacks. Thus we
must call $ibx->base_url($ctx->{env}) up front, before the PSGI
response callback is returned, to generate correct full URLs
when relying on publicinbox.nameIsUrl.
Eric Wong [Tue, 10 Sep 2024 00:40:48 +0000 (00:40 +0000)]
view: fix x-post links for relative urls
We need to make correct relative URL paths for users configuring
publicinbox.$NAME.url as relative URL paths (e.g. matching the
inbox `$NAME').
Users of protocol-relative (e.g. `//$HOST/$NAME') and absolute URIs
(e.g. `https://example.com/$NAME') were unaffected by this bug.
Users relying on publicinbox.nameIsUrl and omitting
publicinbox.*.url entries were also immune to this bug.
Automated tests are in progress and will come in a separate commit.
Eric Wong [Wed, 11 Sep 2024 21:25:49 +0000 (21:25 +0000)]
www: preload all inboxes if using ->ALL
This ought to improve memory layout and ensure the regexp
for address => inbox linkification works when hitting
/$EXTINBOX/$MSGID/ links first (instead of /$INBOX/$MSGID).
This fill_all call is redundant for cindex users, who get the
preload anyway, but necessary for non-cindex users.
This should also avoid the broken/empty regexps problem described
in 3b51fcc196e3 (view: fix addr2url mapping corruption, 2024-09-06).
Eric Wong [Fri, 6 Sep 2024 23:29:03 +0000 (23:29 +0000)]
view: fix addr2url mapping corruption
We must avoid generating a qr/\b()\b/ regexp which matches
every word boundary. This is caused by a particular set of
circumstances for WWW instances:
1. extindex must be in use
2. cindex must NOT be in use OR WWW->preload wasn't used
(custom .psgi or non-p-i-{httpd,netd} users)
3. first HTTP request hits /$EXTINDEX/$MSGID/
(where $EXTINDEX is typically `all')
On extindex-using instances without a cindex configured, the
first HTTP request hitting the extindex encounters an empty
{-by_addr} hash table. This empty {-by_addr} hash table causes
View->addr2urlmap() to return an all-matching regexp which
corrupts HTML when attempting address substitutions.
cindex-using instances avoid the problem by triggering
_fill_all() during PublicInbox::WWW->preload and ensuring
{-by_addr} of the PublicInbox::Config object is populated.
Thanks to Konstantin for the initial report and Filip for the
immensely helpful explanation of the problem.
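The failure mode is easy to reproduce with any regexp engine: joining an empty set of alternatives yields qr/\b()\b/, which matches at every word boundary. A Python sketch of the guard, with hypothetical `addr_regex' and `linkify' helpers standing in for View->addr2urlmap():

```python
import re
from typing import Dict


def addr_regex(by_addr: Dict[str, str]):
    r"""Compile a regexp matching any known address, or return None.

    With an empty mapping, '|'.join() would produce r'\b()\b', whose
    empty group matches at every word boundary and corrupts output.
    """
    if not by_addr:
        return None
    # longest first, so a longer address is preferred over its prefix
    alts = '|'.join(re.escape(a) for a in sorted(by_addr, key=len, reverse=True))
    return re.compile(r'\b(' + alts + r')\b')


def linkify(text: str, by_addr: Dict[str, str]) -> str:
    """Wrap each known address in an <a href> to its (hypothetical) URL."""
    rx = addr_regex(by_addr)
    if rx is None:
        return text  # nothing to link; leave the text untouched
    return rx.sub(
        lambda m: '<a href="%s">%s</a>' % (by_addr[m.group(1)], m.group(1)),
        text)
```

With an empty {-by_addr}-style mapping the text passes through unchanged instead of having an empty substitution applied at every word boundary.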
Eric Wong [Sat, 31 Aug 2024 08:17:56 +0000 (08:17 +0000)]
tests: skip ENOSPC injection on restricted systems
Yama will not allow ptrace(2) on existing processes (only new
ones) if the kernel.yama.ptrace_scope sysctl is non-zero. Skip
those tests for now since the majority of strace(1) testing
is probably done on systems without ptrace restrictions.
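A Python sketch of such a skip check, reading the sysctl through /proc. The `ptrace_attach_allowed' name is hypothetical, and treating a missing file as "no Yama, no extra restriction" is an assumption:

```python
def ptrace_attach_allowed(path: str = '/proc/sys/kernel/yama/ptrace_scope') -> bool:
    """True if attaching strace to an already-running process should work.

    kernel.yama.ptrace_scope=0 keeps classic ptrace permissions; any
    non-zero value restricts attaching to existing processes.
    """
    try:
        with open(path) as fh:
            return int(fh.read().strip()) == 0
    except FileNotFoundError:
        return True  # no Yama LSM loaded: no extra restriction
```

A test would call this first and skip the ENOSPC-injection cases when it returns False.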
Eric Wong [Fri, 30 Aug 2024 20:36:35 +0000 (20:36 +0000)]
view: fix unclosed parentheses after `raw' link
This formatting error was accidentally introduced while
converting a `qq{}' concatenation to a `say' statement. Re-add
the `)'. While we're at it, switch to a `print' statement
since we use a string literal anyway and `say' would require an
extra global variable lookup at runtime.
Eric Wong [Thu, 29 Aug 2024 23:26:03 +0000 (23:26 +0000)]
solver: use async_check for the temporary git repo
While the temporary git repo is likely in cache and not
subject to high seek latency as normal code repos are,
inflating objects still takes a non-trivial amount of time.
So use this as an opportunity to serve other clients and
exploit parallelism in SMP systems.
Eric Wong [Thu, 29 Aug 2024 23:26:02 +0000 (23:26 +0000)]
solver: use xap_helper for async search if available
The async search API using xap_helper allows -httpd/netd users
to exploit storage and CPU-level parallelism via sockets. It is
another step towards reducing head-of-line blocking in our Perl
event loop. This reduces the effect of slow storage and extremely
large search results on unrelated HTTP requests.
Eric Wong [Thu, 29 Aug 2024 23:26:01 +0000 (23:26 +0000)]
solver: use async check (`info') for coderepo
Async --batch-check or `info' batch commands allow our Perl
process to handle other requests while git is busy waiting
on slow storage or CPUs to retrieve blob information.
This improves parallelism for SMP machines in addition to
allowing the Perl process to service other HTTP/NNTP/IMAP/POP3
requests while waiting for disk seeks, zlib inflation, and
delta resolution.
Checking stderr for error hints is now potentially racy, but
it's only a hint so overall performance under worst case
scenarios is preferable to correctness.
Eric Wong [Fri, 30 Aug 2024 19:05:29 +0000 (19:05 +0000)]
t/v2writable: avoid failure on strace unreadiness
poll(2) uses milliseconds, and neither IO::Poll::_poll nor our
->poll_in wrapper abstracts that away. This change ensures we wait
long enough for strace to start up on overloaded systems.
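Python's select.poll() has the same millisecond interface as poll(2), so a seconds-based helper has to convert explicitly. A minimal sketch (the name mirrors the `poll_in' wrapper mentioned above, but this is not its Perl implementation):

```python
import select


def poll_in(fd: int, timeout_s: float) -> bool:
    """Return True if fd becomes readable within timeout_s seconds.

    select.poll().poll(), like poll(2), takes its timeout in
    milliseconds, so seconds must be converted; passing seconds
    unconverted would wait 1000x less than intended.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(int(timeout_s * 1000))
    return any(ev & select.POLLIN for _, ev in events)
```

Unlike a vec() + select() bitmap, poll() does not scale with the numeric value of the FD, which is the consistency argument made for poll_in() in the lei umask commit.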
Eric Wong [Fri, 30 Aug 2024 19:05:15 +0000 (19:05 +0000)]
lei: increase umask timeout
On slow or overloaded systems, 2 seconds may not be sufficient
time to wait for a lei client to respond to the umask request
from lei-daemon. Use 60s to be consistent with the FD transfer
in the general case.
While we're at it, consistently use poll_in() now that it exists
since it's a better API than vec() + select() and will give
consistent performance regardless of the FD value.
Eric Wong [Fri, 23 Aug 2024 16:30:29 +0000 (16:30 +0000)]
tls: set SSL_OP_NO_COMPRESSION explicitly
TLS compression is susceptible to the CRIME attack and
per-connection zlib contexts waste memory for idle clients.
While compression should already be off by default in modern OpenSSL,
Net::SSLeay::CTX_get_mode reveals OP_NO_COMPRESSION was not set
when created by IO::Socket::SSL::SSL_Context->new. So set it
explicitly to ensure it's really off.
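For comparison, Python's ssl module exposes the same OpenSSL flag. This sketch sets it explicitly on a server context (the `make_tls_server_context' name is hypothetical); recent CPython releases already enable OP_NO_COMPRESSION by default, so the OR is harmless belt-and-braces there:

```python
import ssl


def make_tls_server_context() -> ssl.SSLContext:
    """Server-side TLS context with TLS-level compression forced off."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CRIME mitigation and less per-connection zlib state: set it
    # explicitly instead of trusting library defaults.
    ctx.options |= ssl.OP_NO_COMPRESSION
    return ctx
```

Loading the certificate and key (e.g. with load_cert_chain()) is left to the caller.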
Eric Wong [Tue, 20 Aug 2024 18:40:59 +0000 (18:40 +0000)]
t/sigfd: reduce getpid() calls and hash lookups
getpid() is no longer cached by glibc and syscalls are more
expensive nowadays, so only call it once per test. The
additional hash table depth is no longer necessary since there's
no longer a difference between signal dispatch methods now that
Sigfd uses the global %SIG.
Eric Wong [Tue, 20 Aug 2024 10:35:21 +0000 (10:35 +0000)]
lei_xsearch: allow signals during long queries
Xapian ->mset, remote Xapian calls via remote inboxes, and
lcat dumps can take a long time via wq_io_do and hold
lei_xsearch processes open for too long after a client
disconnects prematurely.
This fixes wait_for_eof shutdown timeouts on the lei-daemon quit
pipe when running t/lei-sigpipe.t with GIANT_INBOX_DIR pointed
to a meta@public-inbox.org mirror on my old laptop.
Eric Wong [Tue, 20 Aug 2024 10:35:20 +0000 (10:35 +0000)]
lei: allow Ctrl-C to interrupt IMAP+NNTP reads
Mail::IMAPClient and Net::NNTP remain synchronous APIs with
indefinite wait times on slow/unreliable connections or servers.
Since these APIs don't play nicely with signalfd or
EVFILT_SIGNAL, we will temporarily drop the reliable (but
sometimes delayed) signal handling mechanisms in favor of the
less reliable built-in signal handling of Perl to provide a
best-effort attempt to handle signals during slow operations.
Eric Wong [Tue, 20 Aug 2024 10:35:19 +0000 (10:35 +0000)]
sigfd: call normal Perl %SIG handlers
Instead of storing our own mapping of signal handler callbacks,
rely on the standard %SIG hash table which can be arbitrarily
updated from anywhere.
This makes it easier to allow existing synchronous code (e.g.
NetReader using Mail::IMAPClient or Net::NNTP) to add explicit
points where pending signals can be checked.
Additionally, it allows the `DEFAULT' (SIG_DFL) signal handler
to fire when there's no Perl subroutine to register.
Finally, this also allows us to rely on the OS + Perl itself to
dispatch signal handlers on kevent-based systems (and avoid
redundant dispatch due to our (previous) Linux-centric API). It
makes Linux signalfd the only system where we'd need to dispatch
%SIG callbacks ourselves.
Eric Wong [Tue, 20 Aug 2024 10:35:17 +0000 (10:35 +0000)]
treewide: handle EINTR for non-(signalfd|kevent)
We may encounter new architectures in Linux without syscall
number definitions or *BSD systems without IO::KQueue or kevent
support at all, so be prepared to handle signals anywhere within
the event loop in such cases.
Eric Wong [Sat, 10 Aug 2024 09:00:12 +0000 (09:00 +0000)]
extindex: support per-inbox indexheader+altid
This allows the venerable altid (e.g. gmane:1234) to finally
work for extindex users. The newer indexheader directive works
here, too. This allows a multi-inbox extindex to fully emulate
the capabilities of per-inbox Xapian indices.
For now, per-inbox indexheader and altid DO NOT work when
searching the extindex directly. In other words, gmane:1234
might work on the /git/ inbox, but not the /all/ extindex
virtual inbox. This may remain the case since altid is
typically per-inbox only, and stuff like X-Archives-Hash
can be global across inboxes.
Eric Wong [Sat, 10 Aug 2024 09:00:07 +0000 (09:00 +0000)]
www: don't memoize ->user_help contents
Generating it is cheap enough and not worth the extra memory
and long-lived allocations. We can avoid allocating a
Xapian::QueryParser object here, too, to avoid wasting memory
for xap_helper external process users.
Eric Wong [Sat, 10 Aug 2024 09:00:04 +0000 (09:00 +0000)]
search: help: avoid ':' in user prefixes
The non-':'-suffixed variation of the string is already used as
hash keys and literals elsewhere. Theoretically, a Perl
implementation can save some allocations this way (though Perl 5
currently doesn't).
In any case, we'll introduce a help2txt method to allow sharing
code between the callers in WwwText and Documentation/common.perl.
Eric Wong [Sat, 10 Aug 2024 09:00:03 +0000 (09:00 +0000)]
indexheader: deduplicate common values
Since we plan on sharing IndexHeader across multiple inboxes for
large installations with thousands of inboxes, it makes sense to
deduplicate the values to save some memory at the cost of
increased startup time.
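A Python sketch of value deduplication, assuming directive values are hashable (strings or tuples). `dedupe' and `dedupe_directives' are hypothetical helpers, and the tuple shape in any usage is illustrative only; the point is that equal values from many inboxes end up sharing one object, at the cost of a lookup per value while loading:

```python
from typing import Dict, Hashable, List

_pool: Dict[Hashable, Hashable] = {}


def dedupe(value: Hashable) -> Hashable:
    """Return the first-seen object equal to value, storing value if new."""
    return _pool.setdefault(value, value)


def dedupe_directives(per_inbox: List[dict]) -> List[dict]:
    """Rebuild each inbox's directives so equal values share one object."""
    return [{k: dedupe(v) for k, v in d.items()} for d in per_inbox]
```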
Eric Wong [Sat, 10 Aug 2024 09:00:02 +0000 (09:00 +0000)]
search: support per-inbox indexheader directive
This allows indexing arbitrary headers to allow filtering by
boolean terms or existing text rules. Disabling RFC 2047
decoding is supported, as well.
This also refactors AltId support to rely on the same mechanisms
as the IndexHeader class for indexing, user help, and
Xapian::QueryParser setup via both bindings and external
XapHelper process to avoid adding complexity to Search.pm and
SearchIdx.pm.
We'll finally document altid support in public-inbox-config(5)
since we're in the area, as it's been a stable feature for many
years, now.
Eric Wong [Wed, 14 Aug 2024 00:16:44 +0000 (00:16 +0000)]
lei_search: make missing Xapian docs for kw lookups
Missing keyword entries should be non-fatal since Xapian
data is always less important than what's in git and SQLite.
As such, Xapian data has been and remains written last, leaving
the possibility of documents being missing from Xapian but
present in SQLite and git.
This improves recovery from badly interrupted or failed
imports due to bugs or hardware failures.
Eric Wong [Wed, 14 Aug 2024 00:16:43 +0000 (00:16 +0000)]
v2writable: confess on broken {idx_shards}
There's a bug in `lei import' introduced in 4ff8e8d21ab5
(lei/store: stop shard workers + cat-file on idle, 2024-04-16)
which causes {idx_shards} to not be recreated properly.
Hopefully this can help me track it down since it's not easily
reproducible.
Eric Wong [Fri, 26 Jul 2024 21:59:26 +0000 (21:59 +0000)]
watch: add per-directory scanning diagnostics
This may help track down problems associated with a single
directory. Note we emit a separate message for each of the
`new' and `cur' subdirectories of a Maildir. Full scans only
happen at startup (or manually), so it shouldn't be too noisy
if logging to syslog.
Requested-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Eric Wong [Fri, 26 Jul 2024 21:59:25 +0000 (21:59 +0000)]
watch: only open one directory at a time when scanning
This avoids EMFILE/ENFILE for large setups with many Maildir
watch directives. It also makes adding per-directory scanning
messages easier in the next commit.
Eric Wong [Fri, 26 Jul 2024 21:59:24 +0000 (21:59 +0000)]
watch: more details about full scan start/completion
Starts and stops happen infrequently and may be useful for
diagnosing problems with missing messages. A future change
will add more details about per-directory scans.
Requested-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
Eric Wong [Fri, 26 Jul 2024 21:31:11 +0000 (21:31 +0000)]
t/v2writable: use 5.10.1 and autodie more
Switching to Perl v5.12 will require more review due to
unicode_strings, but 5.10.1 is an easy change and we can rely
more on autodie to simplify error checking.
Eric Wong [Fri, 26 Jul 2024 21:31:09 +0000 (21:31 +0000)]
msgmap: mid_insert: reraise on unexpected errors
SQLITE_CONSTRAINT is the only SQLite error we really expect under
normal circumstances. This avoids infinite loops when writing
to inboxes after hitting ENOSPC.
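Python's sqlite3 module maps SQLITE_CONSTRAINT to sqlite3.IntegrityError, which makes the "expect one error, reraise the rest" pattern easy to sketch. The simplified msgmap table and this `mid_insert' signature are assumptions, not public-inbox's actual schema or Perl code:

```python
import sqlite3


def mid_insert(dbh: sqlite3.Connection, num: int, mid: str) -> bool:
    """Map article number num to Message-ID mid.

    Returns False if the row violates a constraint (e.g. duplicate mid).
    Any other error, such as sqlite3.OperationalError from a full disk,
    propagates so callers cannot loop forever on it.
    """
    try:
        dbh.execute('INSERT INTO msgmap (num, mid) VALUES (?, ?)', (num, mid))
    except sqlite3.IntegrityError:
        return False  # SQLITE_CONSTRAINT: the only expected failure
    return True
```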
Eric Wong [Sun, 7 Jul 2024 06:01:58 +0000 (06:01 +0000)]
www: replace *eml_entry with *emit_eml
This further reduces copies, temporary strings, and scratchpad
use, continuing work started back in 2022. With a 700+
message thread on a /T/ endpoint, this saves roughly 1-2% time
and roughly 100 KB of memory.
Eric Wong [Sun, 7 Jul 2024 05:57:27 +0000 (05:57 +0000)]
t/www_listing: use autodie, reduce useless tests
Noisy error checking is noisy and less useful than autodie
diagnostics in case of failure. Furthermore, most of the xsys()
failures would not allow us to continue, so favor xsys_e() in
those places.
Eric Wong [Sun, 7 Jul 2024 05:57:26 +0000 (05:57 +0000)]
www: manifest.js.gz handles If-Modified-Since
While we can't avoid the expensive manifest.js.gz generation,
non-Varnish users now get the bandwidth savings from seeing a
304 response. This has no effect on Varnish users since Varnish
will forward the request to us without If-Modified-Since if it
gets a cache miss, and handle 304 for us on cache hits.
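A Python sketch of the conditional-GET check using only the standard library. `conditional_status' is a hypothetical helper that ignores everything but If-Modified-Since:

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import List, Optional, Tuple


def conditional_status(mtime: int, if_modified_since: Optional[str]
                       ) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (status, headers) for a resource last modified at mtime."""
    headers = [('Last-Modified', formatdate(mtime, usegmt=True))]
    if if_modified_since:
        try:
            ims = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError, IndexError, OverflowError):
            ims = None  # unparseable header: ignore it, send the full body
        if ims is not None:
            if ims.tzinfo is None:  # e.g. "-0000": treat as UTC
                ims = ims.replace(tzinfo=timezone.utc)
            if mtime <= ims.timestamp():
                return 304, headers  # client copy is current: no body
    return 200, headers
```

The expensive generation step still has to run to learn the mtime in the manifest.js.gz case; only the body transfer is saved.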
Eric Wong [Thu, 4 Jul 2024 02:20:55 +0000 (02:20 +0000)]
http: don't requeue if using write buffer
The write buffering will already be processed inside
->event_step, so requeue will cause a needless read(2) outside
of epoll_wait/kevent(2) readiness notifications.
This ought to avoid problems in case of pipelined connections,
but those aren't possible behind a reverse proxy and AFAIK most
HTTP clients don't do pipelining. This bug was only noticed via
strace while searching for extra syscalls, and not from
real-world use.
Eric Wong [Tue, 25 Jun 2024 18:49:37 +0000 (18:49 +0000)]
speedup $EXTRACT_DIFFS callers by 1%
While Perl docs recommend against using //o, we know the regexp
won't change at runtime and there's a measurable improvement to
be found. The included perf test on a packed mirror of
meta@public-inbox.org shows a consistent ~1% improvement on my
system.
cmd_authenticate() replies to AUTHENTICATE commands with "+" CRLF but
the imap4rev1 RFC [^0] defines the following ABNF syntax for a continuation
request:
Eric Wong [Thu, 20 Jun 2024 22:54:34 +0000 (22:54 +0000)]
http: don't store `127.0.0.1' for idle clients
For persistent HTTP clients, we can set REMOTE_ADDR lazily
for the common `127.0.0.1' value and save a few bytes when
dealing with idle connections which linger in between requests.
Eric Wong [Wed, 19 Jun 2024 23:41:04 +0000 (23:41 +0000)]
http: use writev for known Content-Length responses
We could use sendmsg(2) without MSG_MORE here, too, but
writev(2) is simpler to set up and call and we may want to use it
with pipes or regular files in the future, too, not just sockets.
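To show the idea, a Python sketch that emits the header block and a known-length body with one os.writev() call. `write_response' is hypothetical, and any short write is finished with sendall():

```python
import os
import socket
from typing import List, Tuple


def write_response(sock: socket.socket, status_line: str,
                   headers: List[Tuple[str, str]], body: bytes) -> None:
    """Send header block + body with one writev(2) in the common case."""
    head = status_line + '\r\n'
    head += ''.join(f'{k}: {v}\r\n' for k, v in headers)
    head += f'Content-Length: {len(body)}\r\n\r\n'
    bufs = [head.encode('ascii'), body]
    total = sum(len(b) for b in bufs)
    written = os.writev(sock.fileno(), bufs)
    if written < total:
        # short write (e.g. full socket buffer): finish the remainder
        sock.sendall(b''.join(bufs)[written:])
```

Because writev(2) only needs a file descriptor, the same call would work on a pipe or regular file, which is the flexibility argument above.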
Eric Wong [Wed, 19 Jun 2024 23:41:02 +0000 (23:41 +0000)]
use sendmsg w/ MSG_MORE to reduce syscalls
In places where we made multiple send(..., MSG_MORE) calls in
quick succession, we now use sendmsg(2) to provide the same
semantics with fewer syscalls. While this may be less efficient
inside the kernel for small messages, syscalls are expensive
nowadays and we can avoid userspace copies and large allocations
when streaming large HTTP chunks in /T/, /t/, and t.mbox.gz
endpoints.
This allows *BSD systems lacking MSG_MORE to save some syscalls
when writing HTTP chunked encoding, among other things.
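A Python sketch of one chunked-encoding frame sent via a single sendmsg(2) call, assuming a Unix platform; socket.MSG_MORE is used only where the platform defines it (Linux), so on *BSD the call still saves syscalls without the hint. The `send_chunk' helper is hypothetical:

```python
import socket


def send_chunk(sock: socket.socket, chunk: bytes, more: bool = True) -> None:
    """Send one HTTP/1.1 chunk (hex size line, data, CRLF) via one sendmsg(2).

    MSG_MORE (Linux) hints that more data follows so the kernel may hold
    back a partial packet; pass more=False on the final chunk to flush.
    """
    flags = socket.MSG_MORE if more and hasattr(socket, 'MSG_MORE') else 0
    bufs = [b'%x\r\n' % len(chunk), chunk, b'\r\n']
    total = sum(len(b) for b in bufs)
    sent = sock.sendmsg(bufs, [], flags)
    if sent < total:
        sock.sendall(b''.join(bufs)[sent:])
```

Calling it with an empty chunk and more=False emits the terminating `0\r\n\r\n' frame and clears MSG_MORE, so anything the kernel was holding back gets flushed.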
Eric Wong [Wed, 19 Jun 2024 23:41:01 +0000 (23:41 +0000)]
ds: update indentation to match rest of source
Our changes aren't compatible with Danga::Socket at all at this
point. While we're at it, depend more on subroutine prototypes
to get some compile-time checking.
Eric Wong [Sun, 16 Jun 2024 23:35:32 +0000 (23:35 +0000)]
www: search patch subject in #related query
Blob OIDs would not be accurate for merges and fuzzy
applications, so include the commit title/Subject to
increase the likelihood of finding related commits.