git.ipfire.org Git - thirdparty/public-inbox.git/log
Eric Wong [Fri, 24 Feb 2017 00:47:45 +0000 (00:47 +0000)] 
git: move async detection to runtime

We don't actually know what context we'll be called under,
so detecting the mere use-ability of Danga::Socket is not
sufficient.

Eric Wong [Wed, 22 Feb 2017 03:10:43 +0000 (03:10 +0000)] 
repobrowse: eliminate unused query parameters

We will try to reduce the number of query parameters as
much as possible to make URLs more amenable to caching
at various levels.

Eric Wong [Wed, 22 Feb 2017 03:01:24 +0000 (03:01 +0000)] 
repobrowse: fixup revision handling

Revisions passed in the URL must not be ignored.
This fixes some bugs introduced in commit
f6244586ba4f5a5e7575e1254be8c9bbe303fce9
("repobrowse: switch to new URL format to avoid query strings")

Eric Wong [Tue, 21 Feb 2017 23:05:46 +0000 (23:05 +0000)] 
repobrowse: stop abbreviating commit hashes

Abbreviations can become ambiguous over time, and it seems other
tools are fine with displaying unabbreviated hashes for commits.
This should reduce workload for the search engines, too.
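
A rough sketch of how an abbreviation that is unique today can become
ambiguous later.  This is Python rather than the project's Perl; the
object IDs are made up and resolve_prefix is a hypothetical helper,
not anything from repobrowse or git.

```python
def resolve_prefix(prefix, object_ids):
    """Return the single ID starting with prefix; raise if none or several match."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]

# Made-up object IDs: the repository as it was when a link was published...
old_repo = ["3f2a9c1" + "0" * 33]
# ...and after a later object happens to share the same 7-character prefix.
new_repo = old_repo + ["3f2a9c1" + "f" * 33]

print(resolve_prefix("3f2a9c1", old_repo))  # still unique here
try:
    resolve_prefix("3f2a9c1", new_repo)
except LookupError as err:
    print("ambiguous later:", err)
```

A full 40-character ID embedded in a URL never hits the second case.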

Eric Wong [Sun, 19 Feb 2017 19:01:39 +0000 (19:01 +0000)] 
repobrowse: unconditionally remove trailing slash handling

We do not need specialized trailing slashes if we break URL
compatibility from cgit, here.  Removing trailing (and redundant)
slashes improves our cache hit rates across both server-side
(varnish, squid) and client-side (browser) layers.

Eric Wong [Sun, 19 Feb 2017 03:44:27 +0000 (03:44 +0000)] 
repobrowse: return git errors as text/plain, for now

For now, this avoids an HTML injection vector.  We'll try to
have more consistent error reporting in the future.
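
A minimal sketch of the idea, in Python rather than the project's Perl.
The function name, the 500 status and the sample error text are
assumptions for illustration only.  The error text is sent untouched,
but labeled text/plain so it is not meant to be rendered as markup.

```python
def git_error_response(stderr_text):
    """Return a PSGI/WSGI-style (status, headers, body) triple for a git failure."""
    body = stderr_text.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    return "500 Internal Server Error", headers, [body]

# Hypothetical attacker-influenced text echoed back in an error message.
status, headers, body = git_error_response(
    "fatal: bad revision '<script>alert(1)</script>'\n")
print(dict(headers)["Content-Type"])
```

The alternative would be escaping the text into an HTML error page,
which is the more consistent reporting left for later.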

Eric Wong [Fri, 17 Feb 2017 23:40:37 +0000 (23:40 +0000)] 
repobrowse: minor style cleanups

Avoid using '=>' arrow notation for arrays and array references;
it is confusing and more verbose.  Additionally, combine
"use constant" statements when possible.

Eric Wong [Fri, 17 Feb 2017 23:08:39 +0000 (23:08 +0000)] 
repobrowse: remove unnecessary import

We do not need to escape URIs in this file.

Eric Wong [Fri, 17 Feb 2017 03:31:16 +0000 (03:31 +0000)] 
repobrowse: rename "plain" endpoint to "raw"

This name is shorter and matches terminology in gitweb and
other popular git web viewers.

Eric Wong [Thu, 16 Feb 2017 23:26:01 +0000 (23:26 +0000)] 
repobrowse: memoize git symbolic-ref resolution

The "HEAD" symbolic ref is rarely changed, so
memoize it for now and avoid exposing it in URLs.

Eric Wong [Thu, 16 Feb 2017 20:53:42 +0000 (20:53 +0000)] 
repobrowse: shorten "repo_info" to "-repo"

This makes it more consistent with how we use the Inbox
objects for the main code.

Eric Wong [Thu, 16 Feb 2017 20:39:08 +0000 (20:39 +0000)] 
repo: only read description if git

Other VCSes have other means of providing the description.

Eric Wong [Wed, 15 Feb 2017 22:35:18 +0000 (22:35 +0000)] 
repobrowse: switch to new URL format to avoid query strings

Query strings make endpoint caching more difficult since
they're order-independent.  They are also more likely lost
or truncated inadvertently when copy+pasting, so try to
avoid them for default endpoints.

There are still some things which are broken, and followup
commits will be needed to fix them.
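
To illustrate the caching point with a Python sketch (not the project's
Perl; the URLs and parameter names are hypothetical, not the actual
repobrowse layout): two orderings of the same query string are distinct
cache keys, while a path-only URL has a single spelling.

```python
from urllib.parse import urlsplit, parse_qsl

# Two spellings of the same request; parameter names are hypothetical.
a = "/repo.git/tree?h=master&path=lib"
b = "/repo.git/tree?path=lib&h=master"

# A cache keyed on the raw URL stores these separately...
print(a == b)  # False

# ...although they decode to the same set of parameters.
def params(url):
    return sorted(parse_qsl(urlsplit(url).query))

print(params(a) == params(b))  # True

# A path-only form has a single spelling, so there is nothing to reorder.
path_only = "/repo.git/tree/master/lib"  # hypothetical layout
print(urlsplit(path_only).query == "")  # True
```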

Eric Wong [Wed, 15 Feb 2017 00:06:06 +0000 (00:06 +0000)] 
config: avoid circular loading dependency

We must lazily load one of them, so load Inbox later
since we need to parse the config, first.

Eric Wong [Tue, 14 Feb 2017 23:19:34 +0000 (23:19 +0000)] 
repobrowse: do not unescape PATH_INFO twice

PSGI specs already require PATH_INFO to be unescaped.

Followup-to: commit 364de65f8a6b5729027cb70228312a141430122f
("www: do not unescape PATH_INFO twice")

Eric Wong [Tue, 14 Feb 2017 22:56:37 +0000 (22:56 +0000)] 
Merge remote-tracking branch 'origin/master' into repobrowse

* origin/master:
  www: do not unescape PATH_INFO twice
  t/mime: quiet warnings for old versions of Email::Simple
  handle repeated References and In-Reply-To headers

Eric Wong [Sun, 12 Feb 2017 09:04:54 +0000 (09:04 +0000)] 
searchidx: switch to accounting by message bytes

Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.

More importantly, we now flush Xapian before we have it
buffer beyond our maximum; and we do it unconditionally
to prevent even high priority processes from OOM-ing.
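
A sketch of byte-based flush accounting, in Python and with no Xapian
involved.  ByteBudgetIndexer, max_bytes and the flush callback are all
hypothetical names; the real indexer may account differently.

```python
class ByteBudgetIndexer:
    """Buffer documents and flush before the buffered bytes exceed max_bytes."""

    def __init__(self, max_bytes, flush):
        self.max_bytes = max_bytes
        self.flush = flush          # callback taking the list of pending docs
        self.pending = []
        self.pending_bytes = 0

    def add(self, raw_message: bytes):
        size = len(raw_message)
        # Flush first if adding this message would exceed the budget.
        if self.pending and self.pending_bytes + size > self.max_bytes:
            self.commit()
        self.pending.append(raw_message)
        self.pending_bytes += size

    def commit(self):
        if self.pending:
            self.flush(self.pending)
        self.pending = []
        self.pending_bytes = 0

batches = []
idx = ByteBudgetIndexer(max_bytes=10, flush=lambda docs: batches.append(len(docs)))
for msg in [b"aaaa", b"bbbb", b"cccc", b"d" * 20, b"e"]:
    idx.add(msg)
idx.commit()
print(batches)  # [2, 1, 1, 1]
```

Note that a single message larger than the budget is still buffered on
its own in this sketch; only the accumulation across messages is capped.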

Eric Wong [Tue, 14 Feb 2017 22:45:15 +0000 (22:45 +0000)] 
www: do not unescape PATH_INFO twice

PSGI specs already require PATH_INFO to be unescaped;
so our tests were wrong, too.
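
Why decoding twice is a bug, shown with Python's urllib instead of the
project's Perl.  The request path is a made-up example; a literal "%"
in a file name is escaped as "%25" on the wire.

```python
from urllib.parse import unquote

# A client requests a path containing a literal "%41".
request_path = "/repo/%2541.txt"

path_info = unquote(request_path)   # what the server hands to the app
print(path_info)                    # /repo/%41.txt (correct)

twice = unquote(path_info)          # the bug: decoding again
print(twice)                        # /repo/A.txt (a different file)
```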

Eric Wong [Sun, 12 Feb 2017 02:41:22 +0000 (02:41 +0000)] 
t/mime: quiet warnings for old versions of Email::Simple

This is fixed in the newest versions of Email::Simple,
but not the version in Debian jessie (2.203)

Eric Wong [Sat, 11 Feb 2017 23:54:48 +0000 (23:54 +0000)] 
handle repeated References and In-Reply-To headers

It seems possible for git-send-email(1) to generate
repeated instances of References and In-Reply-To headers,
as evidenced in:

https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw

This causes a mismatch between how our search indexer threads
and how our HTML view handles threading.  In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.

We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.
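
A sketch of collecting IDs from every instance of a repeated header,
using Python's email module rather than Email::MIME.  The message and
the thread_ids helper are made up for illustration and do not reflect
how the indexer or View.pm actually do it.

```python
import re
from email import message_from_string

raw = (
    "Message-ID: <c@example.com>\n"
    "References: <a@example.com>\n"
    "References: <a@example.com> <b@example.com>\n"
    "In-Reply-To: <b@example.com>\n"
    "In-Reply-To: <b@example.com>\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

def thread_ids(msg):
    """Collect IDs from all References/In-Reply-To instances, de-duplicated in order."""
    seen = []
    for name in ("References", "In-Reply-To"):
        for value in msg.get_all(name, []):
            for mid in re.findall(r"<([^<>\s]+)>", value):
                if mid not in seen:
                    seen.append(mid)
    return seen

print(msg["References"])   # only the first instance: <a@example.com>
print(thread_ids(msg))     # ['a@example.com', 'b@example.com']
```

Two code paths that disagree on "first instance" versus "all instances"
will thread the same message differently, which is the mismatch
described above.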

Eric Wong [Sat, 11 Feb 2017 00:41:29 +0000 (00:41 +0000)] 
repo: lazily read description and cloneurl

This improves startup speed at the cost of CoW-friendliness
for long-lived daemons (which can be fixed, later).

Eric Wong [Fri, 10 Feb 2017 21:27:11 +0000 (21:27 +0000)] 
config: move try_cat function from inbox

This allows RepoConfig to be independent of the
PublicInbox::Inbox class.

Eric Wong [Fri, 10 Feb 2017 21:23:01 +0000 (21:23 +0000)] 
repo: add class for representing a code repo

This should hopefully allow us to organize our code better

Eric Wong [Fri, 10 Feb 2017 21:19:11 +0000 (21:19 +0000)] 
repogit: add prototypes for error checking

And add a note to remove git_commit_title

Eric Wong [Fri, 10 Feb 2017 03:30:40 +0000 (03:30 +0000)] 
repo: search index flushes for excessive active refs

For certain repos, having too many active refs will cause
memory usage problems.  Mitigate the Xapian problems, for
now, and consider a switch to GDBM_File or similar for
repos with more refs.

Eric Wong [Fri, 10 Feb 2017 01:51:05 +0000 (01:51 +0000)] 
search: remove unnecessary abstractions and functionality

This simplifies the code a bit and reduces the translation
overhead for looking directly at data from tools shipped
with Xapian.

While we're at it, fix thread-all.t :)

Eric Wong [Thu, 9 Feb 2017 23:50:25 +0000 (23:50 +0000)] 
repo: search index no longer indexes for --contains

It's extraordinarily expensive to add these terms for
each and every commit.

Eric Wong [Thu, 9 Feb 2017 21:11:00 +0000 (21:11 +0000)] 
repo: increase search index flush granularity

We need to flush Xapian more frequently to account for
gigantic commits which introduce lots of text, so do
it when accounting for each line processed, and not
for each commit processed.

Eric Wong [Thu, 9 Feb 2017 01:37:03 +0000 (01:37 +0000)] 
repobrowse: shorten internal names

We'll still be keeping "repobrowse" for the public API
for use with .psgi files, but shortening the name means
less typing and we may have command-line tools, too.

Eric Wong [Thu, 9 Feb 2017 00:43:02 +0000 (00:43 +0000)] 
Merge remote-tracking branch 'origin/master' into repobrowse

* origin/master:
  config: do not slurp lines into memory
  TODO: several updates
  search: schema version bump for empty References/In-Reply-To
  Revert "searchidx: reindex clobbers old thread IDs"
  searchidx: reindex clobbers old thread IDs
  searchidx: deal with empty In-Reply-To and References headers
  searchview: increase limit for displaying search results
  searchview: clarify numeric summary at bottom
  add filter for Subject: tags
  watchmaildir: allow arguments for filters
  watchmaildir: limit live importer processes
  learn: implement "rm" only functionality
  mime: avoid SUPER usage in Email::MIME subclass
  inbox: reinstate periodic cleanup of Xapian and SQLite objects
  introduce PublicInbox::MIME wrapper class

Eric Wong [Thu, 9 Feb 2017 00:26:52 +0000 (00:26 +0000)] 
repobrowse: avoid slurping lines

"foreach (<$fh>)" in Perl requests lines in array
context, so use "while" instead for lazy reading.

This follows ba4c50c20b95679580beba1ef290a4281d5285b7
in master ("config: do not slurp lines into memory")
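
The same slurp-versus-lazy distinction, sketched as a Python analog
(Perl's context rules do not exist in Python, so list() stands in for
reading in list context; all names here are hypothetical).

```python
def produce(n, produced):
    """Yield n lines, recording each one as it is generated."""
    for i in range(n):
        produced.append(i)
        yield f"line {i}\n"

def lines_read_before_first_iteration(slurp):
    produced = []
    source = produce(1000, produced)
    if slurp:
        source = list(source)       # analogous to reading in list context
    for _ in source:
        return len(produced)        # lines generated when the body first ran

print(lines_read_before_first_iteration(slurp=True))   # 1000
print(lines_read_before_first_iteration(slurp=False))  # 1
```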

Eric Wong [Wed, 8 Feb 2017 21:41:38 +0000 (21:41 +0000)] 
config: do not slurp lines into memory

There's no need to hold everything in memory, here,
since apparently "foreach" will read everything at
once in array context

(for some reason, I thought Perl5 was smart enough
 to avoid creating a temporary array, here...)

Eric Wong [Sat, 4 Feb 2017 02:20:35 +0000 (02:20 +0000)] 
repobrowse: start wiring up git search

Much more work on this will be needed, but at least explicit
flush points prevent OOMs on my system.

Eric Wong [Tue, 7 Feb 2017 22:27:52 +0000 (22:27 +0000)] 
TODO: several updates

Always plenty to do while working on this...

Eric Wong [Tue, 31 Jan 2017 22:55:58 +0000 (22:55 +0000)] 
search: hoist out git directory search index helper

We will be reusing this for indexing normal (code) repositories
using git and Xapian, too.

Eric Wong [Mon, 6 Feb 2017 21:39:45 +0000 (21:39 +0000)] 
search: schema version bump for empty References/In-Reply-To

We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.

Eric Wong [Mon, 6 Feb 2017 21:37:26 +0000 (21:37 +0000)] 
Revert "searchidx: reindex clobbers old thread IDs"

Oops, that's broken, too.  I guess the only way to reindex
after fixing the thread detection is to start from scratch.

This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.

Eric Wong [Mon, 6 Feb 2017 21:08:13 +0000 (21:08 +0000)] 
searchidx: reindex clobbers old thread IDs

We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.

Eric Wong [Mon, 6 Feb 2017 19:54:25 +0000 (19:54 +0000)] 
searchidx: deal with empty In-Reply-To and References headers

In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make nonsensical threads grouped
to a Message-Id of "" (empty string).

See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
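
A sketch of the guard, in Python rather than Perl.  referenced_ids and
thread_key are hypothetical helpers, and keying a thread on the first
referenced ID is an assumption made only to keep the example short.

```python
import re

def referenced_ids(header_values):
    """Extract non-empty message IDs from header values, skipping blank headers."""
    ids = []
    for value in header_values:
        if not value or not value.strip():
            continue  # header present but empty
        ids.extend(m for m in re.findall(r"<([^<>\s]*)>", value) if m)
    return ids

def thread_key(message_id, header_values):
    """Fall back to the message's own ID instead of grouping under ""."""
    ids = referenced_ids(header_values)
    return ids[0] if ids else message_id

print(referenced_ids(["", "   "]))                 # []
print(referenced_ids(["<>", "<a@example.com>"]))   # ['a@example.com']
print(thread_key("b@example.com", [""]))           # b@example.com
```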

Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
  <https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>

Eric Wong [Mon, 6 Feb 2017 02:38:37 +0000 (02:38 +0000)] 
searchview: increase limit for displaying search results

We are in no danger of excessive buffering or OOM-ing,
the main page for every inbox already loads 200 results;
and thread page views even load 1000!  Increase this to
200 for now.

Eric Wong [Mon, 6 Feb 2017 02:07:24 +0000 (02:07 +0000)] 
searchview: clarify numeric summary at bottom

Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
nonsensical ranges when no results are returned.

Eric Wong [Sat, 4 Feb 2017 02:21:06 +0000 (02:21 +0000)] 
repobrowse: git tag listing is now async

I'm unsure if this is even a good idea to support,
but we have it, for now.

Eric Wong [Thu, 26 Jan 2017 07:58:28 +0000 (07:58 +0000)] 
repobrowse/git/atom: remove unused subroutine

We never ended up using it.

Eric Wong [Thu, 26 Jan 2017 04:27:02 +0000 (04:27 +0000)] 
repobrowse: simplify command generation for git commands

This shortens the code quite a bit at a negligible performance cost,
and the diffstat agrees.

Eric Wong [Thu, 26 Jan 2017 02:09:36 +0000 (02:09 +0000)] 
add filter for Subject: tags

Some mailing lists add annoying tags into the Subject line which
discourages readers from doing proper mail organization on the
client side.  They also waste precious screen space and
attention span.

Remove them from our archives to reduce clutter.
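
Roughly what such a filter does, sketched in Python.  The function
name, the tag "foo-list" and the exact matching rules are assumptions;
the actual filter class and its configuration are not shown in this
log.

```python
import re

def strip_subject_tag(subject, tag):
    """Remove every "[tag]" occurrence from a Subject line and tidy whitespace."""
    pattern = re.compile(r"\s*\[" + re.escape(tag) + r"\]\s*", re.IGNORECASE)
    return pattern.sub(" ", subject).strip()

print(strip_subject_tag("[foo-list] Re: [foo-list] [PATCH] fix typo", "foo-list"))
# Re: [PATCH] fix typo
```

Only the configured tag is removed, so meaningful bracketed prefixes
such as "[PATCH]" survive.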

Eric Wong [Wed, 25 Jan 2017 21:39:06 +0000 (21:39 +0000)] 
watchmaildir: allow arguments for filters

We'll want to allow some degree of configuration for
various mailing lists.

Eric Wong [Sun, 22 Jan 2017 22:10:46 +0000 (22:10 +0000)] 
repobrowse: git summary view uses psgi_qx

This reduces one synchronous dependency from the hot path,
and psgi_return will be used in the future.

Eric Wong [Sun, 22 Jan 2017 01:52:25 +0000 (01:52 +0000)] 
t/httpd-unix: better diagnostics and comments for test

I've hit random test failures on this, so attempt to improve
diagnostics and improve documentation for this test.

Eric Wong [Sat, 21 Jan 2017 11:50:58 +0000 (11:50 +0000)] 
repobrowse: preserve newlines in Atom feed

Commit messages are assumed to be displayed in a terminal
with a fixed width font, so we must preserve newlines and
all whitespace as-is so ASCII art may be displayed properly.

Eric Wong [Sat, 21 Jan 2017 11:34:31 +0000 (11:34 +0000)] 
repobrowse: simplify git log parsing implementation

Based on what was done for the Atom feed, this will allow us to
simplify state management through metaprogramming and avoid
placeholder characters ('D' for decoration) for empty fields.

Eric Wong [Sat, 21 Jan 2017 04:41:06 +0000 (04:41 +0000)] 
repobrowse: fix full URL generation in Atom feed

We must not drop the leading slash in the URI.  This
regression was introduced when we dropped Plack::Request
dependency.

Eric Wong [Sat, 21 Jan 2017 04:35:27 +0000 (04:35 +0000)] 
repobrowse: avoid extra hash assignments for Atom feed

This should make the code somewhat easier-to-follow.

Eric Wong [Sat, 21 Jan 2017 02:29:52 +0000 (02:29 +0000)] 
repobrowse: git Atom feed uses Qspawn->psgi_return

This allows us to wait on "git log" output in a non-blocking manner
while being able to throttle on backpressure from slow clients
when used with pi-httpd.

Eric Wong [Sat, 21 Jan 2017 02:29:51 +0000 (02:29 +0000)] 
repobrowse: git Atom feed uses Qspawn->psgi_qx

This allows pi-httpd to service other I/O while we wait on "git
symbolic-ref" to run.  And psgi_return will be used in the next
commit...

Eric Wong [Sat, 21 Jan 2017 02:29:50 +0000 (02:29 +0000)] 
qspawn: better annotate where $qx_cb is called

Hopefully this makes the code easier-to-follow for random
readers.  This requires a small amount of modification to
our one caller, but this is a new, unstable API (as is
nearly all of our code).

Eric Wong [Wed, 18 Jan 2017 19:13:09 +0000 (19:13 +0000)] 
watchmaildir: limit live importer processes

We don't want to be triggering OOM or swapping on weaker
systems when we have dozens of inboxes as potential targets.

Eric Wong [Thu, 19 Jan 2017 00:31:30 +0000 (00:31 +0000)] 
learn: implement "rm" only functionality

Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.

Eric Wong [Wed, 18 Jan 2017 23:50:57 +0000 (23:50 +0000)] 
mime: avoid SUPER usage in Email::MIME subclass

We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method.  Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.

Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>

Eric Wong [Wed, 18 Jan 2017 08:17:50 +0000 (08:17 +0000)] 
repobrowse: expath is always defined

Remove an outdated comment while we're at it, too.

Eric Wong [Wed, 18 Jan 2017 07:35:35 +0000 (07:35 +0000)] 
http: cast a wider net to prevent circular references

We can more effectively nuke circular references by clearing
the entire PSGI $env, not just particular keys, when
there are self-referential fields such as "qspawn.response"
in our environment.

Eric Wong [Wed, 18 Jan 2017 07:27:03 +0000 (07:27 +0000)] 
repobrowse: git snapshot waits for all commands asynchronously

This new asynchronous API, psgi_qx, will allow us to take
advantage of non-blocking I/O from even small commands;
as those may still need to wait for slow operations.

Eric Wong [Tue, 17 Jan 2017 19:38:36 +0000 (19:38 +0000)] 
qspawn: better description

We'll probably use this in a lot of places...

Eric Wong [Sun, 15 Jan 2017 03:11:14 +0000 (03:11 +0000)] 
repobrowse: verbose git tree display uses qspawn for ls-tree

For now, qspawn provides resource management for dealing with
expensive "git ls-tree" processes.

Eric Wong [Sun, 15 Jan 2017 02:26:39 +0000 (02:26 +0000)] 
repobrowse: use qspawn for plain tree views

We may eventually handle tree parsing ourselves (since we
already git cat-file), but for now we can rely on ls-tree
to give good output and qspawn to manage resource allocation.

Eric Wong [Wed, 11 Jan 2017 08:46:35 +0000 (08:46 +0000)] 
repobrowse: git: drop unused diff parsing routines

We don't need these legacy routines anymore and use the
newer stream-friendly _sed interface.

Eric Wong [Fri, 13 Jan 2017 23:10:25 +0000 (23:10 +0000)] 
httpd/async: stop running command if client disconnects

If an HTTP client disconnects while we're piping the output of a
process to them, break the pipe of the process to reclaim
resources as soon as possible.

Eric Wong [Fri, 13 Jan 2017 22:53:20 +0000 (22:53 +0000)] 
repobrowse: simplify conditional for cat-file input

expath is always defined, even to an empty string,
so simplify the conditional for checking it.

Eric Wong [Fri, 13 Jan 2017 22:28:36 +0000 (22:28 +0000)] 
rename "GitAsyncRd" to "GitAsync"

This wrapper class actually does both reading and
writing, and a shorter name is nicer.

Eric Wong [Fri, 13 Jan 2017 22:24:45 +0000 (22:24 +0000)] 
gitasyncrd: pass a reference to Danga::Socket::write

D::S creates a reference for this, anyways, so avoid
the extra work by doing it ourselves.

Eric Wong [Fri, 13 Jan 2017 22:05:10 +0000 (22:05 +0000)] 
repobrowse: comment describing Git wrapper creation

Metaprogramming can be difficult-to-read after several
months, so leave comments in place to describe the results
of common usage.

Eric Wong [Fri, 13 Jan 2017 02:13:18 +0000 (02:13 +0000)] 
repobrowse: port git log view to qspawn streaming interface

This will prevent too many processes from being spawned at once
while also allowing us to respond to backpressure from slow
clients.
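
The process-limiting half of this can be sketched as a small FIFO
limiter.  This is Python, spawns no real processes, and is not the
qspawn code; the Limiter class below and its method names are
hypothetical, borrowing only the idea of a maximum number of running
jobs.

```python
from collections import deque

class Limiter:
    """Run at most max_running jobs at once; queue the rest in FIFO order."""

    def __init__(self, max_running):
        self.max_running = max_running
        self.running = 0
        self.waiting = deque()

    def submit(self, start):
        """start() launches a job; the caller reports completion via done()."""
        if self.running < self.max_running:
            self.running += 1
            start()
        else:
            self.waiting.append(start)

    def done(self):
        """Release one slot and start the oldest waiting job, if any."""
        self.running -= 1
        if self.waiting and self.running < self.max_running:
            self.running += 1
            self.waiting.popleft()()

started = []
lim = Limiter(max_running=2)
for name in ("log-1", "log-2", "log-3"):
    lim.submit(lambda name=name: started.append(name))
print(started)   # ['log-1', 'log-2']
lim.done()       # one job finishes, freeing a slot
print(started)   # ['log-1', 'log-2', 'log-3']
```

Backpressure from slow clients is a separate concern (pausing reads
from the child while the client's socket buffer is full) and is not
modeled here.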

Eric Wong [Wed, 11 Jan 2017 10:13:00 +0000 (10:13 +0000)] 
inbox: reinstate periodic cleanup of Xapian and SQLite objects

We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.

Eric Wong [Wed, 11 Jan 2017 04:12:29 +0000 (04:12 +0000)] 
repobrowse: make git diff output use qspawn

This is a potentially expensive operation, so we may want to
give it its own limiter channel.

Eric Wong [Wed, 11 Jan 2017 04:12:28 +0000 (04:12 +0000)] 
diff: note the dangers of gigantic anchors hash

Eric Wong [Wed, 11 Jan 2017 04:12:27 +0000 (04:12 +0000)] 
async: improve and fix out-of-date comments

Eric Wong [Wed, 11 Jan 2017 04:12:26 +0000 (04:12 +0000)] 
repobrowse: qspawn + streaming for git commit display

This prevents "git show" processes from monopolizing
the system and allows us to better handle backpressure
from gigantic commits.

Eric Wong [Wed, 11 Jan 2017 04:12:25 +0000 (04:12 +0000)] 
qspawn: fix bad error reporting on errors

Oops :x

Eric Wong [Tue, 10 Jan 2017 21:40:37 +0000 (21:40 +0000)] 
introduce PublicInbox::MIME wrapper class

This should fix problems with multipart messages where
text/plain parts lack a header.

cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
    refs/pull/28/head

In the future, we may still introduce a streaming
interface to reduce memory usage on large emails.

Eric Wong [Sun, 8 Jan 2017 04:39:18 +0000 (04:39 +0000)] 
githttpbackend: use psgi_return shortcut

This drastically cuts down the amount of duplicate code
we have in this branch.

Eric Wong [Sun, 8 Jan 2017 04:31:30 +0000 (04:31 +0000)] 
httpd/async: remove needless sysread wrapper

We don't appear to be using it anywhere

Eric Wong [Sun, 8 Jan 2017 04:25:51 +0000 (04:25 +0000)] 
Merge remote-tracking branch 'origin/master' into repobrowse

* origin/master:
  inbox: properly register cleanup timer for git processes
  search: remove subject_summary
  searchmsg: favor direct hash access over accessor methods
  remove incorrect comment about strftime + locales
  config: allow per-inbox nntpserver
  inbox: eliminate weaken usage entirely
  inbox: describe the full key name
  config: remove unused get() method
  config: always use namespaced "publicinboxlimiter"
  qspawn: prepare to support runtime reloading of Limiter
  http: remove weaken usage, reduce anonsub capture scope
  httpd/async: remove weaken usage
  http: fix spelling error
  watch: watchspam affects all configured inboxes
  doc: minor updates to design notes

Eric Wong [Sat, 31 Dec 2016 11:16:47 +0000 (11:16 +0000)] 
initial git async work

This will allow us to handle network operations while waiting
on "git cat-file" to seek and unpack things.

Eric Wong [Sat, 7 Jan 2017 22:56:03 +0000 (22:56 +0000)] 
inbox: drop $ref arg for writing destination buffer

We never used this feature, so let's drop it for now
since we can have fine-grained memory release with
reference counting, anyways.

Eric Wong [Sat, 7 Jan 2017 02:10:23 +0000 (02:10 +0000)] 
inbox: properly register cleanup timer for git processes

We still need to clean up git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).

For SQLite and Xapian file handles, they should be capable
of managing themselves without too much trouble, so let's
try keeping them for the lifetime of a process.

Eric Wong [Sat, 7 Jan 2017 01:44:52 +0000 (01:44 +0000)] 
search: remove subject_summary

Apparently it never actually got used, and the world seems
fine without it, so we can drop it.

While we're at it, consider removing our subject_path
usage from existence, too.  We are not using fancy subject-line
based URLs, here.

Eric Wong [Sat, 7 Jan 2017 01:44:51 +0000 (01:44 +0000)] 
searchmsg: favor direct hash access over accessor methods

This is faster, smaller, and more straightforward to me with
fewer layers of indirection.

Eric Wong [Sat, 7 Jan 2017 01:44:50 +0000 (01:44 +0000)] 
remove incorrect comment about strftime + locales

We only need strftime to be locale-independent when generating
dates for email and HTTP headers.  Purely numeric dates can
use strftime for ease-of-readability.

Eric Wong [Sat, 7 Jan 2017 01:44:49 +0000 (01:44 +0000)] 
config: allow per-inbox nntpserver

This allows certain inboxes to override the global nntpserver
(perhaps under a different domain).

Eric Wong [Sat, 7 Jan 2017 01:44:48 +0000 (01:44 +0000)] 
inbox: eliminate weaken usage entirely

We can do a better job initializing the data structure
so we no longer need to rely on weak references to clean up
when we ditch the config on reload.

Eric Wong [Sat, 7 Jan 2017 01:44:47 +0000 (01:44 +0000)] 
inbox: describe the full key name

Hopefully, this makes it easier for future generations to understand.

Eric Wong [Sat, 7 Jan 2017 01:44:46 +0000 (01:44 +0000)] 
config: remove unused get() method

This seems like an unnecessary abstraction, or an abstraction
on the wrong level.

Eric Wong [Sat, 7 Jan 2017 01:44:45 +0000 (01:44 +0000)] 
config: always use namespaced "publicinboxlimiter"

I'm not sure if we'll ever support sharing a config file
with other tools, but maybe we will, and "limiter" is
too generic.

Eric Wong [Sat, 7 Jan 2017 01:44:44 +0000 (01:44 +0000)] 
qspawn: prepare to support runtime reloading of Limiter

We may allow the {max} value of a limiter to be changed
in the future, so let's start accounting for it before we
spawn followup processes.

Eric Wong [Wed, 4 Jan 2017 11:20:51 +0000 (11:20 +0000)] 
http: remove weaken usage, reduce anonsub capture scope

Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection.  So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.

While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.

Eric Wong [Wed, 4 Jan 2017 11:20:50 +0000 (11:20 +0000)] 
httpd/async: remove weaken usage

We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl; as weaken requires additional storage
and (it seems) time complexity.

Eric Wong [Wed, 4 Jan 2017 11:20:49 +0000 (11:20 +0000)] 
http: fix spelling error

Oops.  And we'll be fixing circular references from now on...

Eric Wong [Mon, 2 Jan 2017 13:16:15 +0000 (13:16 +0000)] 
watch: watchspam affects all configured inboxes

If a message is spam in one mailbox, it is spam in all others a
particular user/group will care about.

Eric Wong [Mon, 26 Dec 2016 03:04:08 +0000 (03:04 +0000)] 
repobrowse: avoid empty pathspecs for future git compatibility

At the moment, we always set expath, so it will always be
defined.

Eric Wong [Mon, 26 Dec 2016 21:41:15 +0000 (21:41 +0000)] 
doc: minor updates to design notes

ssoma is not worth marketing, but perhaps our mirror of
the git mailing list archives is...

Eric Wong [Mon, 26 Dec 2016 09:58:02 +0000 (09:58 +0000)] 
spawn: remove non-blocking support, here

It is never used, and inappropriate to support in generic code.

HTTPD::Async already sets non-blocking, and it's better to do it
in -httpd-specific code since we know our -httpd can handle it.