Eric Wong [Sat, 21 Jan 2017 02:29:52 +0000 (02:29 +0000)]
repobrowse: git Atom feed uses Qspawn->psgi_return
This allows us to wait on "git log" output in a non-blocking manner
while being able to throttle on backpressure from slow clients
when used with pi-httpd.
Eric Wong [Sat, 21 Jan 2017 02:29:50 +0000 (02:29 +0000)]
qspawn: better annotate where $qx_cb is called
Hopefully this makes the code easier-to-follow for random
readers. This requires a small amount of modification to
our one caller, but this is a new, unstable API (as is
nearly all of our code).
Eric Wong [Wed, 18 Jan 2017 07:35:35 +0000 (07:35 +0000)]
http: cast a wider net to prevent circular references
We can more effectively nuke circular references by clearing
the entire PSGI $env, not just particular keys, when
there are self-referential fields such as "qspawn.response"
in our environment.
Eric Wong [Wed, 18 Jan 2017 07:27:03 +0000 (07:27 +0000)]
repobrowse: git snapshot waits for all commands asynchronously
This new asynchronous API, psgi_qx, will allow us to take
advantage of non-blocking I/O from even small commands,
as those may still need to wait for slow operations.
Eric Wong [Sun, 15 Jan 2017 02:26:39 +0000 (02:26 +0000)]
repobrowse: use qspawn for plain tree views
We may eventually handle tree parsing ourselves (since we
already git cat-file), but for now we can rely on ls-tree
to give good output and qspawn to manage resource allocation.
Eric Wong [Fri, 13 Jan 2017 23:10:25 +0000 (23:10 +0000)]
httpd/async: stop running command if client disconnects
If an HTTP client disconnects while we're piping the output of a
process to them, break the pipe of the process to reclaim
resources as soon as possible.
Eric Wong [Sat, 7 Jan 2017 02:10:23 +0000 (02:10 +0000)]
inbox: properly register cleanup timer for git processes
We still need to clean up git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).
SQLite and Xapian file handles should be capable of managing
themselves without too much trouble, so let's try keeping them
for the lifetime of a process.
Eric Wong [Sat, 7 Jan 2017 01:44:50 +0000 (01:44 +0000)]
remove incorrect comment about strftime + locales
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection. So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.
While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.
Eric Wong [Wed, 4 Jan 2017 11:20:50 +0000 (11:20 +0000)]
httpd/async: remove weaken usage
We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl, as weaken requires additional storage
and (it seems) adds time complexity.
Eric Wong [Mon, 26 Dec 2016 05:25:36 +0000 (05:25 +0000)]
Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master: (25 commits)
evcleanup: ensure deferred close from timers are handled ASAP
httpd/async: improve variable naming
githttpbackend: minor cleanups to improve readability
githttpbackend: simplify compatibility code
githttpbackend: minor readability improvement
http: fix clobbering of $null_io
linkify: modify argument in place
view: do not modify array during iteration
view: stop chomping off whitespace at ends of messages
view: remove unused parameter
search: lookup_mail handles modified DBs
doc: various comments on async handling
searchthread: simplify API and remove needless OO
searchthread: update comment about loop prevention
searchmsg: remove ensure_metadata
tests: add thread-all testing for benchmarking
searchmsg: do not memoize {date} field
searchmsg: remove locale-dependency for ->date
t/config.t: fix feedmax default
wwwtext: link to RFC4685 (Atom Threading)
...
Eric Wong [Mon, 26 Dec 2016 03:05:15 +0000 (03:05 +0000)]
evcleanup: ensure deferred close from timers are handled ASAP
Danga::Socket defers close() syscalls until the end of the event
loop to avoid FD recycling. Unfortunately, this is dependent on
IO events firing and waking the process up from
poll/kevent/epoll_wait.
Without any I/O activity, a socket could remain in the
@Danga::Socket::ToClose array indefinitely. Thus, we will
trigger a fake IO event after running all timers to trigger
the deferred close in Danga::Socket::PostEventLoop.
Eric Wong [Sat, 24 Dec 2016 11:52:42 +0000 (11:52 +0000)]
view: stop chomping off whitespace at ends of messages
This allows a 3-4% speedup in $MESSAGE_ID/T/ page generation
speed for a 368+ message thread. It also more faithfully
preserves the message as intended; even if it makes the
sender look like a space-wasting slob :P
Eric Wong [Thu, 22 Dec 2016 04:38:13 +0000 (04:38 +0000)]
repobrowse: remove Plack::Request dependency
This does not make installation easier, but lightens runtime a
bit. Plack::Request is unnecessary bloat and indirection which
does things behind our back. $env has all the stuff we need.
Eric Wong [Tue, 20 Dec 2016 03:03:57 +0000 (03:03 +0000)]
searchmsg: remove ensure_metadata
Instead, only preload the ->mid field for threading,
as we only need ->thread and ->path once in Search->get_thread
(but we will need the ->mid field repeatedly).
This more than doubles View->load_results performance,
according to thread-all on an inbox with over 300K messages.
Eric Wong [Wed, 14 Dec 2016 20:58:00 +0000 (20:58 +0000)]
wwwtext: remove outdated comment
I originally envisioned wwwtext being more flexible and able to
serve arbitrary blobs; but at this point I consider it redundant
and public-inbox is not wiki software.
Eric Wong [Tue, 13 Dec 2016 21:56:39 +0000 (21:56 +0000)]
Merge remote-tracking branch 'origin/repobrowse' into repobrowse
* origin/repobrowse: (98 commits)
t/repobrowse_git_httpd.t: ensure signature exists for split
t/repobrowse_git_tree.t: fix test for lack of bold
repobrowse: fix alignment of gitlink entries
repobrowse: show invalid type for tree views
repobrowse: do not bold directory names in tree view
repobrowse: reduce checks for response fh
repobrowse: larger, short-lived buffer for reading patches
repobrowse: reduce risk of callback reference cycles
repobrowse: snapshot support for cgit compatibility
test: disable warning for Plack::Test::Impl
repobrowse: avoid confusing linkification for "diff"
repobrowse: git commit view uses pi-httpd.async
repobrowse: more consistent variable naming for /commit/
repobrowse: show roughly equivalent "diff-tree" invocation
repobrowse: reduce local variables for state management
repobrowse: summary handles multiple README types
repobrowse: remove bold decorations from diff view
repobrowse: common git diff parsing code
repobrowse: implement diff view for compatibility
examples/repobrowse.psgi: disable Chunked response by default
...
Eric Wong [Mon, 12 Dec 2016 12:14:02 +0000 (12:14 +0000)]
daemon: set $now time for NNTP shutdown
commit 6e238ee3396719e578d6a90e177a71ce9f8c1ca0
("nntp: respect 3 minute idle time for shutdown")
was incomplete, and needed this change to Daemon
to be effective.
In the future, there will be more common code between
NNTP.pm and HTTP.pm.
Eric Wong [Sat, 10 Dec 2016 23:35:43 +0000 (23:35 +0000)]
search: retry document loading from Xapian
In addition to needing to retry enquire queries, we also need
to protect document loading from the Xapian DB and retry on
modification, as it seems to throw the same errors.
Checking the $@ ref for Search::Xapian::DatabaseModifiedError
is actually in the test suite for both the XS and SWIG Xapian
bindings, so we should be good as far as forward/backwards
compatibility.
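A minimal sketch of the retry-on-modification pattern described above, written in Python purely for illustration (the project itself is Perl). The exception class, the `retry_reopen` helper, and the `max_tries` limit are hypothetical names, not the project's API; the only assumption about the database object is that it offers a `reopen()` method, as Xapian databases do.

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for Search::Xapian::DatabaseModifiedError."""


def retry_reopen(db, action, max_tries=3):
    """Run action(db); if the DB changed underneath us, reopen and retry.

    Both enquire queries and document loading can be wrapped this way,
    since either may fail when a writer modifies the database.
    """
    for attempt in range(max_tries):
        try:
            return action(db)
        except DatabaseModifiedError:
            if attempt == max_tries - 1:
                raise  # give up after the last allowed attempt
            db.reopen()  # pick up the latest revision before retrying
```

Any other exception propagates immediately, so only the "database was modified" case is retried.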
Eric Wong [Sat, 10 Dec 2016 01:09:51 +0000 (01:09 +0000)]
search: always sort thread results in ascending time order
This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and resulting container vivification.
This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.
Eric Wong [Sat, 10 Dec 2016 01:09:49 +0000 (01:09 +0000)]
view: skip ghosts with no direct children
Otherwise, a malicious or broken client could populate the
thread skeleton with invalid References. We only care about
ghosts which messages correctly refer to, not totally bogus ones
which may be the result of long line or token truncation +
wrapping in MUA headers.
Eric Wong [Sat, 10 Dec 2016 03:21:29 +0000 (03:21 +0000)]
view: favor SearchMsg for In-Reply-To over Email::MIME
This should avoid warnings during thread skeleton generation if
ever the Xapian database disagrees with View.pm about which is
the proper direct parent of a message. We will treat the data
in Xapian as the truth (if Xapian is available).
Eric Wong [Sat, 10 Dec 2016 01:09:46 +0000 (01:09 +0000)]
search: favor In-Reply-To over last References iff IRT exists
Some email clients set the References headers backwards, so
trust the In-Reply-To header if (and only if) it exists and
is parseable as direct parent of the current message.
For affected repos, this will require reindexing (via
"public-inbox-index --reindex"), but there will be no
version bump for this bugfix.
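The parent-selection rule above can be sketched as follows. This is an illustrative Python approximation, not the project's Perl code: `MID_RE` and `direct_parent` are hypothetical names, and the Message-ID pattern is deliberately simplistic.

```python
import re

# Simplistic Message-ID matcher (assumption): anything between angle brackets.
MID_RE = re.compile(r"<([^<>\s]+)>")


def direct_parent(in_reply_to, references):
    """Pick the direct parent Message-ID of a message.

    In-Reply-To wins if (and only if) it exists and is parseable;
    otherwise fall back to the last entry in References.
    """
    if in_reply_to:
        m = MID_RE.search(in_reply_to)
        if m:
            return m.group(1)
    refs = MID_RE.findall(references or "")
    return refs[-1] if refs else None
```

With this rule, a client that writes References backwards no longer misplaces the message, as long as it also sends a usable In-Reply-To.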
Eric Wong [Tue, 6 Dec 2016 23:40:33 +0000 (23:40 +0000)]
linkify: implement Markdown link compatibility (again)
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
This fixes parentheses detection at sentence endings, as seen
in practice on emails.
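One way to picture the parentheses handling is the sketch below. It is an assumption-laden Python illustration, not the project's linkify code: `URL_RE` and `find_urls` are hypothetical, and the rule used here (drop a trailing ")" only when the URL has more closing than opening parentheses) is just one plausible heuristic.

```python
import re

# Deliberately loose URL matcher (assumption): runs until whitespace,
# angle brackets, or a double quote.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def find_urls(text):
    """Extract URLs, trimming sentence punctuation and unbalanced ')'."""
    urls = []
    for m in URL_RE.finditer(text):
        url = m.group(0)
        while url:
            if url[-1] in ".,;:!?":
                url = url[:-1]  # sentence-ending punctuation
            elif url[-1] == ")" and url.count(")") > url.count("("):
                url = url[:-1]  # ")" closes Markdown or prose, not the URL
            else:
                break
        urls.append(url)
    return urls
```

Balanced parentheses inside a URL are kept, so only the closing parenthesis that belongs to the surrounding text is removed.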
Eric Wong [Tue, 6 Dec 2016 23:01:39 +0000 (23:01 +0000)]
linkify: implement Markdown link compatibility
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
Eric Wong [Sat, 3 Dec 2016 00:24:06 +0000 (00:24 +0000)]
atom: switch to getline/close for response bodies
This will let us stream larger Atom document bodies without
wasting too much memory and reduce the amount of round-trip
requests needed to get necessary information.
Hopefully clients are using streaming (SAX) parsers, too.
This is the final transition in the core public-inbox
code to allow migrating to a "pull"-based body streaming
scheme which allows an HTTP server to respond appropriately
to backpressure from slow clients.
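The "pull"-based scheme can be illustrated with a body object exposing getline and close, the two methods PSGI expects of a filehandle-like response body. The Python sketch below is only an analogy to the Perl code: `PullBody`, `drain`, and the render callback are hypothetical names.

```python
_END = object()  # sentinel marking an exhausted iterator


class PullBody:
    """Response body that renders one entry per getline() call."""

    def __init__(self, entries, render):
        self._iter = iter(entries)
        self._render = render
        self.closed = False

    def getline(self):
        """Return the next chunk, or None once the body is finished."""
        if self.closed:
            return None
        entry = next(self._iter, _END)
        return None if entry is _END else self._render(entry)

    def close(self):
        """Release resources; called by the server when it is done."""
        self._iter = iter(())
        self.closed = True


def drain(body, write):
    """Server-side loop: pull a chunk only when ready to write it."""
    try:
        while True:
            chunk = body.getline()
            if chunk is None:
                break
            write(chunk)
    finally:
        body.close()
```

Because the server decides when to call getline, it can simply stop pulling while a slow client's socket is not writable, and only one chunk needs to be held in memory at a time.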
Eric Wong [Sat, 3 Dec 2016 00:24:51 +0000 (00:24 +0000)]
searchview: fix <title> tag in Atom feed
This only affects the Atom feed for search results.
"xmlstarlet val" failed to detect or warn about this,
and I only noticed this bug while working on another
patch.
Eric Wong [Tue, 29 Nov 2016 21:40:35 +0000 (21:40 +0000)]
note the source code is AGPL for cloning
This should be adequate warning for folks who may be
uncomfortable or uncertain about even possessing AGPL
source code due to employer agreements and such.
Disclaimer: I remain completely in favor of AGPL and strong
copyleft, and am more than willing to risk my own future on it.
However, I refuse to even nudge people into downloading AGPL
source code if it presents any legal risk to them.