Eric Wong [Fri, 10 Feb 2017 03:30:40 +0000 (03:30 +0000)]
repo: search index flushes for excessive active refs
For certain repos, having too many active refs will cause
memory usage problems. Mitigate the Xapian problems, for
now, and consider a switch to GDBM_File or similar for
repos with more refs.
Eric Wong [Thu, 9 Feb 2017 21:11:00 +0000 (21:11 +0000)]
repo: increase search index flush granularity
We need to flush Xapian more frequently to account for
gigantic commits which introduce lots of text, so do
it when accounting for each line processed, and not
for each commit processed.
Eric Wong [Thu, 9 Feb 2017 01:37:03 +0000 (01:37 +0000)]
repobrowse: shorten internal names
We'll still be keeping "repobrowse" for the public API
for use with .psgi files, but shortening the name means
less typing and we may have command-line tools, too.
Eric Wong [Thu, 9 Feb 2017 00:43:02 +0000 (00:43 +0000)]
Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
config: do not slurp lines into memory
TODO: several updates
search: schema version bump for empty References/In-Reply-To
Revert "searchidx: reindex clobbers old thread IDs"
searchidx: reindex clobbers old thread IDs
searchidx: deal with empty In-Reply-To and References headers
searchview: increase limit for displaying search results
searchview: clarify numeric summary at bottom
add filter for Subject: tags
watchmaildir: allow arguments for filters
watchmaildir: limit live importer processes
learn: implement "rm" only functionality
mime: avoid SUPER usage in Email::MIME subclass
inbox: reinstate periodic cleanup of Xapian and SQLite objects
introduce PublicInbox::MIME wrapper class
Eric Wong [Mon, 6 Feb 2017 21:39:45 +0000 (21:39 +0000)]
search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
Eric Wong [Mon, 6 Feb 2017 19:54:25 +0000 (19:54 +0000)]
searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make non-sensical threads grouped
to a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
Eric Wong [Mon, 6 Feb 2017 02:38:37 +0000 (02:38 +0000)]
searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing,
the main page for every inbox already loads 200 results;
and thread page views even load 1000! Increase this to
200 for now.
Eric Wong [Mon, 6 Feb 2017 02:07:24 +0000 (02:07 +0000)]
searchview: clarify numeric summary at bottom
Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
non-sensical ranges when no results are returned.
Eric Wong [Thu, 26 Jan 2017 02:09:36 +0000 (02:09 +0000)]
add filter for Subject: tags
Some mailing lists add annoying tags into the Subject line which
discourages readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
Eric Wong [Sat, 21 Jan 2017 11:50:58 +0000 (11:50 +0000)]
repobrowse: preserve newlines in Atom feed
Commit messages are assumed to be displayed in a terminal
with a fixed width font, so we must preserve newlines and
all whitespace as-is so ASCII art may be displayed properly.
Based on what was done for the Atom feed, this will allow us to
simplify state management through metaprogramming and avoid
placeholder characters ('D' for decoration) for empty fields.
Eric Wong [Sat, 21 Jan 2017 02:29:52 +0000 (02:29 +0000)]
repobrowse: git Atom feed uses Qspawn->psgi_return
This allows us to wait on "git log" output in a non-blocking manner
while being able to throttle on backpressure from slow clients
when used with pi-httpd.
Eric Wong [Sat, 21 Jan 2017 02:29:50 +0000 (02:29 +0000)]
qspawn: better annotate where $qx_cb is called
Hopefully this makes the code easier-to-follow for random
readers. This requires a small amount of modification to
our one caller, but this is a new, unstable API (as is
nearly all of our code).
Eric Wong [Thu, 19 Jan 2017 00:31:30 +0000 (00:31 +0000)]
learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.
Eric Wong [Wed, 18 Jan 2017 23:50:57 +0000 (23:50 +0000)]
mime: avoid SUPER usage in Email::MIME subclass
We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method. Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
Eric Wong [Wed, 18 Jan 2017 07:35:35 +0000 (07:35 +0000)]
http: cast a wider net to prevent circular references
We can more effectly nuke circular references by clearing
the entire PSGI $env, not just particular keys, when
there are self-referential fields such as "qspawn.response"
in our environment.
Eric Wong [Wed, 18 Jan 2017 07:27:03 +0000 (07:27 +0000)]
repobrowse: git snapshot waits for all commands asynchronously
This new asynchronous API, psgi_qx, will allow us to take
advantage of non-blocking I/O from even small commands;
as those may still need to wait for slow operations.
Eric Wong [Sun, 15 Jan 2017 02:26:39 +0000 (02:26 +0000)]
repobrowse: use qspawn for plain tree views
We may eventually handle tree parsing ourselves (since we
already git cat-file), but for now we can rely on ls-tree
to give good output and qspawn to manage resource allocation.
Eric Wong [Fri, 13 Jan 2017 23:10:25 +0000 (23:10 +0000)]
httpd/async: stop running command if client disconnects
If an HTTP client disconnects while we're piping the output of a
process to them, break the pipe of the process to reclaim
resources as soon as possible.
Eric Wong [Wed, 11 Jan 2017 10:13:00 +0000 (10:13 +0000)]
inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.
Eric Wong [Sat, 7 Jan 2017 02:10:23 +0000 (02:10 +0000)]
inbox: properly register cleanup timer for git processes
We still need to cleanup git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).
For SQLite and Xapian file handles, they should be capable
of managing themselves without too much trouble, so lets
try keeping them for the lifetime of a process.
Eric Wong [Sat, 7 Jan 2017 01:44:50 +0000 (01:44 +0000)]
remove incorrect comment about strftime + locales
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection. So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.
While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.
Eric Wong [Wed, 4 Jan 2017 11:20:50 +0000 (11:20 +0000)]
httpd/async: remove weaken usage
We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl; as weaken requires additional storage
and (it seems) time complexity.
Eric Wong [Mon, 26 Dec 2016 05:25:36 +0000 (05:25 +0000)]
Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master: (25 commits)
evcleanup: ensure deferred close from timers are handled ASAP
httpd/async: improve variable naming
githttpbackend: minor cleanups to improve readability
githttpbackend: simplify compatibility code
githttpbackend: minor readability improvement
http: fix clobbering of $null_io
linkify: modify argument in place
view: do not modify array during iteration
view: stop chomping off whitespace at ends of messages
view: remove unused parameter
search: lookup_mail handles modified DBs
doc: various comments on async handling
searchthread: simplify API and remove needless OO
searchthread: update comment about loop prevention
searchmsg: remove ensure_metadata
tests: add thread-all testing for benchmarking
searchmsg: do not memoize {date} field
searchmsg: remove locale-dependency for ->date
t/config.t: fix feedmax default
wwwtext: link to RFC4685 (Atom Threading)
...
Eric Wong [Mon, 26 Dec 2016 03:05:15 +0000 (03:05 +0000)]
evcleanup: ensure deferred close from timers are handled ASAP
Danga::Socket defers close() syscalls until the end of the event
loop to avoid FD recycling. Unfortunately, this is dependent on
IO events firing and waking the process up from
poll/kevent/epoll_wait.
Without any I/O activity, a socket could remain in the
@Danga::Socket::ToClose array indefinitely. Thus, we will
trigger a fake IO event after running all timers to trigger
the deferred close in Danga::Socket::PostEventLoop.
Eric Wong [Sat, 24 Dec 2016 11:52:42 +0000 (11:52 +0000)]
view: stop chomping off whitespace at ends of messages
This allows a 3-4% speedup in $MESSAGE_ID/T/ page generation
speed for a 368+ message thread. It also more faithfully
preserves the message as intended; even if the it makes the
sender look like a space-wasting slob :P
Eric Wong [Thu, 22 Dec 2016 04:38:13 +0000 (04:38 +0000)]
repobrowse: remove Plack::Request dependency
This does not make installation easier, but lightens runtime a
bit. Plack::Request is unnecessary bloat and indirection which
does things behind our back. $env has all the stuff we need.