Eric Wong [Wed, 12 Apr 2017 21:10:05 +0000 (21:10 +0000)]
Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
search: fix help message for searching within quotes
learn: scan all inboxes when learning spam
watchmaildir: do not reject lowercase flags on Maildir files
searchview: show full (&x=t) messages in ascending chronlogical order
searchview: add "t" id to link to thread overview
extmsg: use updated mail-archive.com URL
view: escape HTML description name
Eric Wong [Tue, 11 Apr 2017 23:39:54 +0000 (23:39 +0000)]
search: fix help message for searching within quotes
I'm not sure if people use either, and it's not in mairix
(where we base our abbreviations off of). Let's go
with the shorter prefix since it's easier to type.
Eric Wong [Fri, 24 Mar 2017 01:41:11 +0000 (01:41 +0000)]
searchview: show full (&x=t) messages in ascending chronlogical order
When displaying search results with full messages, it makes
more sense to show them in ascending chronological order when
going by date. Reverse chronological order makes more sense
for search results which only show the subject.
Eric Wong [Sat, 4 Mar 2017 03:52:29 +0000 (03:52 +0000)]
repoobrowse: explicit EOF handling for git async callback
We need to ensure we've fully drained the pipe before
signalling EOF to the callback, since pipelining may
not be the best choice with detachable processes
in the future.
Eric Wong [Sat, 4 Mar 2017 00:32:45 +0000 (00:32 +0000)]
repobrowse: fixup format-patch display
We need to take the revision into account when generating
patches :P While we're at it, disambiguate URLs by resolving
refnames to (un-SHAttered) hex identifiers.
Eric Wong [Fri, 3 Mar 2017 22:07:19 +0000 (22:07 +0000)]
repobrowse: src/ endpoint requires a tip to be specified
Implying a tip would make for ambiguous URLs and ruin
caching, so try to get everybody to hit the same URL.
This also simplifies some of our other code since
the tip is always in the request.
Eric Wong [Fri, 3 Mar 2017 04:09:34 +0000 (04:09 +0000)]
repobrowse: avoid excessive buffering in raw endpoint
Relying on qspawn allows us to serve arbitrarily large
files without excessive buffering. We'll special-case
small files in the future to avoid qspawn, as those
small files should fit comfortably in socket buffers.
Eric Wong [Thu, 2 Mar 2017 23:39:49 +0000 (23:39 +0000)]
repobrowse: rename "tree" endpoint to "src"
This is shorter, and makes more sense as the endpoint
displays both tree listings and actual blob sources.
This will also make rewriting existing URLs from cgit
installations easier.
Eric Wong [Thu, 2 Mar 2017 03:36:24 +0000 (03:36 +0000)]
repobrowse: rework source view to use async cat-file API
This will allow most source files to be displayed without
blocking public-inbox-httpd on slow disk access. However, we no
longer support displaying source files larger than 65536 bytes
(the size of a pipe on current Linux).
Eric Wong [Wed, 22 Feb 2017 03:01:24 +0000 (03:01 +0000)]
repobrowse: fixup revision handling
Revisions passed in the URL must not be ignored.
This fixes some bugs introduced in commit f6244586ba4f5a5e7575e1254be8c9bbe303fce9
("repobrowse: switch to new URL format to avoid query strings")
Eric Wong [Tue, 21 Feb 2017 23:05:46 +0000 (23:05 +0000)]
repobrowse: stop abbreviating commit hashes
Abbreviations can become ambiguous over time, and it seems other
tools are fine with displaying unabbreviated hashes for commits.
This should reduce workload for the search engines, too.
We do not need specialized trailing slashes if we break URL
compatibility from cgit, here. Removing trailing (and redundant)
slashes improves our cache hit rates across both server-side
(varnish, squid) and client-side (browser) layers.
Eric Wong [Fri, 17 Feb 2017 23:40:37 +0000 (23:40 +0000)]
repobrowse: minor style cleanups
Avoid using '=>' arrow notation for arrays and array references;
it is confusing and more verbose. Additionally, combine
"use constant" statements when possible.
Eric Wong [Wed, 15 Feb 2017 22:35:18 +0000 (22:35 +0000)]
repobrowse: switch to new URL format to avoid query strings
Query strings make endpoint caching more difficult since
they're order-independent. They are also more likely to be lost
or truncated inadvertently when copy+pasting, so try to
avoid them for default endpoints.
There are still some things which are broken, and follow-up
commits will be needed to fix them.
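
As a rough illustration of the caching concern above (not code from
this project; the URLs and the dict standing in for a cache are
hypothetical), two query strings that differ only in parameter order
name the same resource, yet a cache keyed on the raw URL treats them
as different entries:

```python
from urllib.parse import urlsplit, parse_qsl

def query_equivalent(a, b):
    """True if two URLs share a path and the same set of query parameters."""
    ua, ub = urlsplit(a), urlsplit(b)
    return (ua.path == ub.path and
            sorted(parse_qsl(ua.query)) == sorted(parse_qsl(ub.query)))

# Hypothetical URLs: same parameters, different order.
with_query_1 = "/example.git/tree?id=v1.0&path=lib/Foo.pm"
with_query_2 = "/example.git/tree?path=lib/Foo.pm&id=v1.0"

# A stand-in for a cache keyed on the raw URL string.
cache = {with_query_1: "rendered page"}

print(query_equivalent(with_query_1, with_query_2))  # same resource
print(with_query_2 in cache)                         # but a cache miss
```

A URL that carries the same information in its path has only one
spelling, so every client hits the same cache entry.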
Eric Wong [Tue, 14 Feb 2017 22:56:37 +0000 (22:56 +0000)]
Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
www: do not unescape PATH_INFO twice
t/mime: quiet warnings for old versions of Email::Simple
handle repeated References and In-Reply-To headers
Eric Wong [Sun, 12 Feb 2017 09:04:54 +0000 (09:04 +0000)]
searchidx: switch to accounting by message bytes
Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.
More importantly, we now flush Xapian before we have it
buffer beyond our maximum; and we do it unconditionally
to prevent even high priority processes from OOM-ing.
This causes a mismatch between how our search indexer threads
and how our HTML view handles threading. In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.
We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.
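
The byte-based accounting in the searchidx commit above can be
sketched roughly as follows. This is a minimal illustration, not the
indexer's code: the class name, the `flush` callback and `max_bytes`
are all assumptions. The point is that the flush happens before the
buffer would grow past the limit, not after.

```python
class ByteBudgetBatcher:
    """Buffer raw messages and flush before exceeding a byte budget."""

    def __init__(self, flush, max_bytes):
        self.flush = flush          # hypothetical callback taking a list
        self.max_bytes = max_bytes  # hypothetical limit, in bytes
        self.pending = []
        self.pending_bytes = 0

    def add(self, raw):
        # Flush first if adding this message would push us past the
        # budget; a single oversized message is still buffered alone.
        if self.pending and self.pending_bytes + len(raw) > self.max_bytes:
            self.commit()
        self.pending.append(raw)
        self.pending_bytes += len(raw)

    def commit(self):
        # Hand the current batch to the callback and start a new one.
        if self.pending:
            self.flush(self.pending)
        self.pending = []
        self.pending_bytes = 0
```

Counting bytes instead of messages means one very large message
triggers a flush as readily as many small ones.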
Eric Wong [Fri, 10 Feb 2017 03:30:40 +0000 (03:30 +0000)]
repo: search index flushes for excessive active refs
For certain repos, having too many active refs will cause
memory usage problems. Mitigate the Xapian problems, for
now, and consider a switch to GDBM_File or similar for
repos with more refs.
Eric Wong [Thu, 9 Feb 2017 21:11:00 +0000 (21:11 +0000)]
repo: increase search index flush granularity
We need to flush Xapian more frequently to account for
gigantic commits which introduce lots of text, so do
it when accounting for each line processed, and not
for each commit processed.
Eric Wong [Thu, 9 Feb 2017 01:37:03 +0000 (01:37 +0000)]
repobrowse: shorten internal names
We'll still be keeping "repobrowse" for the public API
for use with .psgi files, but shortening the name means
less typing and we may have command-line tools, too.
Eric Wong [Thu, 9 Feb 2017 00:43:02 +0000 (00:43 +0000)]
Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
config: do not slurp lines into memory
TODO: several updates
search: schema version bump for empty References/In-Reply-To
Revert "searchidx: reindex clobbers old thread IDs"
searchidx: reindex clobbers old thread IDs
searchidx: deal with empty In-Reply-To and References headers
searchview: increase limit for displaying search results
searchview: clarify numeric summary at bottom
add filter for Subject: tags
watchmaildir: allow arguments for filters
watchmaildir: limit live importer processes
learn: implement "rm" only functionality
mime: avoid SUPER usage in Email::MIME subclass
inbox: reinstate periodic cleanup of Xapian and SQLite objects
introduce PublicInbox::MIME wrapper class
Eric Wong [Mon, 6 Feb 2017 21:39:45 +0000 (21:39 +0000)]
search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
Eric Wong [Mon, 6 Feb 2017 19:54:25 +0000 (19:54 +0000)]
searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist, but have empty values.
Do not let empty values throw off how our search indexer ties
threads together, as they can group nonsensical threads
under a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
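
A minimal sketch of the idea, using Python's standard email module
in place of the project's own parser (the function name and regular
expression are assumptions): collect parent Message-IDs from every
References and In-Reply-To header, and never yield an empty ID.

```python
import re
from email import message_from_string

# Matches <id> tokens; requires at least one character inside the brackets.
MID_RE = re.compile(r"<([^<>\s]+)>")

def parent_ids(msg):
    """Return unique, non-empty Message-IDs this message refers to."""
    ids = []
    for name in ("References", "In-Reply-To"):
        # get_all() also covers headers that appear more than once.
        for value in msg.get_all(name, []):
            for mid in MID_RE.findall(value):
                if mid not in ids:
                    ids.append(mid)
    return ids

# Headers present but empty: no parents, so nothing is tied to "".
empty = message_from_string(
    "Message-ID: <a@example>\nIn-Reply-To: \nReferences:\n\nbody\n")
print(parent_ids(empty))
```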
Eric Wong [Mon, 6 Feb 2017 02:38:37 +0000 (02:38 +0000)]
searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing:
the main page for every inbox already loads 200 results,
and thread page views even load 1000! Increase this to
200 for now.
Eric Wong [Mon, 6 Feb 2017 02:07:24 +0000 (02:07 +0000)]
searchview: clarify numeric summary at bottom
Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
nonsensical ranges when no results are returned.
Eric Wong [Thu, 26 Jan 2017 02:09:36 +0000 (02:09 +0000)]
add filter for Subject: tags
Some mailing lists add annoying tags to the Subject line, which
discourages readers from doing proper mail organization on the
client side. The tags also waste precious screen space and
attention span.
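
As a hedged sketch of what such a filter might do (this is not the
project's filter; the function name and the tag are hypothetical),
a configured literal tag is removed from the Subject and the
leftover spacing is tidied:

```python
import re

def strip_subject_tag(subject, tag):
    """Remove every occurrence of a literal tag from a Subject line."""
    cleaned = subject.replace(tag, "")
    # Collapse whitespace left behind by the removal. Note this also
    # normalizes any other runs of whitespace in the subject.
    return re.sub(r"\s+", " ", cleaned).strip()

print(strip_subject_tag("Re: [foo-list] [PATCH] fix bar", "[foo-list]"))
```

Only the configured tag is removed, so other bracketed markers such
as "[PATCH]" survive.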
Eric Wong [Sat, 21 Jan 2017 11:50:58 +0000 (11:50 +0000)]
repobrowse: preserve newlines in Atom feed
Commit messages are assumed to be displayed in a terminal
with a fixed width font, so we must preserve newlines and
all whitespace as-is so ASCII art may be displayed properly.
Based on what was done for the Atom feed, this will allow us to
simplify state management through metaprogramming and avoid
placeholder characters ('D' for decoration) for empty fields.
Eric Wong [Sat, 21 Jan 2017 02:29:52 +0000 (02:29 +0000)]
repobrowse: git Atom feed uses Qspawn->psgi_return
This allows us to wait on "git log" output in a non-blocking manner
while being able to throttle on backpressure from slow clients
when used with pi-httpd.
Eric Wong [Sat, 21 Jan 2017 02:29:50 +0000 (02:29 +0000)]
qspawn: better annotate where $qx_cb is called
Hopefully this makes the code easier to follow for random
readers. This requires a small amount of modification to
our one caller, but this is a new, unstable API (as is
nearly all of our code).
Eric Wong [Thu, 19 Jan 2017 00:31:30 +0000 (00:31 +0000)]
learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.
Eric Wong [Wed, 18 Jan 2017 23:50:57 +0000 (23:50 +0000)]
mime: avoid SUPER usage in Email::MIME subclass
We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method. Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
Eric Wong [Wed, 18 Jan 2017 07:35:35 +0000 (07:35 +0000)]
http: cast a wider net to prevent circular references
We can more effectively nuke circular references by clearing
the entire PSGI $env, not just particular keys, when
there are self-referential fields such as "qspawn.response"
in our environment.
Eric Wong [Wed, 18 Jan 2017 07:27:03 +0000 (07:27 +0000)]
repobrowse: git snapshot waits for all commands asynchronously
This new asynchronous API, psgi_qx, will allow us to take
advantage of non-blocking I/O from even small commands,
as those may still need to wait for slow operations.
Eric Wong [Sun, 15 Jan 2017 02:26:39 +0000 (02:26 +0000)]
repobrowse: use qspawn for plain tree views
We may eventually handle tree parsing ourselves (since we
already use git cat-file), but for now we can rely on ls-tree
to give good output and qspawn to manage resource allocation.
Eric Wong [Fri, 13 Jan 2017 23:10:25 +0000 (23:10 +0000)]
httpd/async: stop running command if client disconnects
If an HTTP client disconnects while we're piping the output of a
process to them, break the pipe of the process to reclaim
resources as soon as possible.
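
The behaviour described above can be approximated with a blocking
Python sketch (the real server is event-driven; the function name
and the `client_connected` and `write` callbacks are assumptions
made for illustration). Once the client is gone, the read end of
the pipe is closed so the child's next write fails and it can exit:

```python
import subprocess

def stream_until_disconnect(argv, client_connected, write, chunk_size=4096):
    """Relay a command's stdout until EOF or until the client goes away."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        while client_connected():
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:      # EOF: the command finished on its own
                break
            write(chunk)
    finally:
        # Closing our end breaks the pipe; a child that is still
        # writing gets an error on its next write.
        proc.stdout.close()
    # Reap the child so it does not linger as a zombie.
    return proc.wait()
```

A child that ignores write errors would keep running, so a more
defensive version might also send it a signal after a timeout.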
Eric Wong [Wed, 11 Jan 2017 10:13:00 +0000 (10:13 +0000)]
inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.