Eric Wong [Wed, 11 Dec 2024 08:10:46 +0000 (08:10 +0000)]
cindex: adjust estimated memory cost for deletes
Based on my conversations with the Xapian lead, the cost of
deletes was overestimated by 7x in cindex. Adjust the estimated
cost of a deleted document to a more reasonable number based on
calculations discussed on the xapian-discuss list.
In any case, all of our batch size memory cost estimates are
rough since Xapian provides no way of letting us know the
memory cost of the current transaction.
Eric Wong [Wed, 11 Dec 2024 08:10:45 +0000 (08:10 +0000)]
lei/store: use global checkpoint interval
Maybe this can be made configurable at some point, but it
probably needs to be stored in the config since per-invocation
intervals won't work when multiple lei clients can be writing to
the lei/store.
Eric Wong [Wed, 11 Dec 2024 08:10:44 +0000 (08:10 +0000)]
(ext)index: use time-based commits to avoid busy timeout
With public-facing read-only daemons typically run from a
different user than writers, we cannot rely on SQLite WAL
(write-ahead-log) for parallelism since all readers need write
permissions on read-write FSes to read from WAL DBs. Since we
can't force or even encourage WAL use for public-facing inboxes,
we need to ensure long-running --reindex jobs can commit
occasionally to prevent read-only daemons from hitting the
default 30s busy_timeout set by DBD::SQLite (not SQLite itself).
This mainly affects --reindex users, but can also affect
newly-cloned inboxes which are being served by read-only
daemons while they're being indexed.
This change only benefits read-only processes, and is likely to
penalize writer performance and storage efficiency due to
increased write frequency. We still maintain and respect
--batch-size for memory size-based commits in addition to
time-based commits, but the new time-based commit interval is
necessary in case the batch size is too large or the system
is too slow to index a large batch.
While Xapian doesn't need time-based commits for read
parallelism, we commit to Xapian anyways since we want to
minimize consistency problems on interrupted indexing jobs.
Followup-to: 807abf67e14d (lei/store: auto-commit for long-running imports, 2024-11-15)
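A minimal sketch of combining size-based and time-based commits, written
in Python with the stdlib sqlite3 module rather than the project's Perl.
The index_all() name, the `kv' table, and the default thresholds are all
hypothetical; only the idea of committing on whichever limit is hit first
comes from the message above.

```python
import sqlite3
import time

def index_all(conn, rows, batch_bytes=1_000_000, interval=5.0,
              clock=time.monotonic):
    """Insert rows, committing when either the pending size or the
    elapsed time since the last commit crosses its threshold.
    Returns the number of commits performed."""
    commits = 0
    pending = 0
    last = clock()
    for key, val in rows:
        conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (key, val))
        pending += len(val)
        if pending >= batch_bytes or clock() - last >= interval:
            conn.commit()  # ends the write transaction so readers can proceed
            commits += 1
            pending = 0
            last = clock()
    conn.commit()  # final commit for whatever is still pending
    return commits + 1
```

The clock is injectable only so the interval logic can be exercised
without sleeping; a real writer would use the default monotonic clock.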
Eric Wong [Mon, 9 Dec 2024 20:57:49 +0000 (20:57 +0000)]
search_query: drop CR (`\r') from queries
w3m (and presumably other browsers) send CRLF instead of LF for
queries made from <textarea> from the VCS view. CRs require
unnecessary escaping and make URLs look ugly, so omit them
to clean up URLs and reduce memory traffic in Xapian.
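To illustrate the escaping cost, here is a small Python sketch (not the
project's Perl code). The drop_cr() helper and the sample query are
hypothetical; urllib.parse.quote is only used to show how a CR ends up
percent-encoded in a URL.

```python
from urllib.parse import quote

def drop_cr(query: str) -> str:
    """Remove every carriage return, turning CRLF into LF."""
    return query.replace("\r", "")

raw = "s:foo\r\nd:20240101.."   # as a <textarea> might submit it
escaped_raw = quote(raw)        # contains %0D%0A
escaped_clean = quote(drop_cr(raw))  # contains only %0A
```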
Eric Wong [Mon, 9 Dec 2024 20:01:28 +0000 (20:01 +0000)]
solver: fix and improve ambiguous OID debug messages
Ambiguous messages from git start with "hint:" (at least
nowadays) so we need to account for that in the regexp.
Furthermore, do not limit objects to just blobs since our
ViewVCS package is capable of displaying tags, trees, and
commits in addition to blobs.
Finally, we can generate full URLs so linkification can pick
them up and generate <a> tags.
Eric Wong [Sat, 7 Dec 2024 02:05:48 +0000 (02:05 +0000)]
v2: don't set No_COW for git repos
Unlike Xapian and SQLite DBs, data stored in git is both
precious and written sequentially. Thus disabling copy-on-write
doesn't make sense for git objects since the problems with
copy-on-write don't apply here while the benefits of CoW do.
Eric Wong [Wed, 4 Dec 2024 19:39:13 +0000 (19:39 +0000)]
sqlite: use `BLOB' column type instead of `VARBINARY'
`VARBINARY' isn't actually a documented type for SQLite but
rather the result of MySQL infecting my mind decades ago.
`BLOB' gives it the proper affinity and probably makes it easier
for 3rd-party and one-off scripts to deal with such columns and
should also make things more familiar to existing users.
Surprisingly, this appears to have no functional change with
forward or backwards compatibility since we ->bind_param binary
data with SQL_BLOB on INSERTs anyways to prevent the flexible
typing of SQLite from trying to guess types for us.
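The affinity difference can be observed with a short Python sketch using
the stdlib sqlite3 module (an illustration only; the table and column
names are made up). An undocumented type name such as `VARBINARY' falls
through to NUMERIC affinity, so a text value that looks like a number is
converted, while values bound as blobs are stored unchanged under either
declaration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (old VARBINARY, new BLOB)")
# bound as text: NUMERIC affinity converts it, BLOB affinity does not
conn.execute("INSERT INTO t VALUES (?, ?)", ("123", "123"))
# bound as blobs: stored as-is regardless of the declared type
conn.execute("INSERT INTO t VALUES (?, ?)", (b"123", b"123"))
types = conn.execute(
    "SELECT typeof(old), typeof(new) FROM t ORDER BY rowid").fetchall()
```

Binding binary data explicitly as a blob is what keeps the old and new
declarations equivalent for existing data.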
Eric Wong [Wed, 4 Dec 2024 19:39:12 +0000 (19:39 +0000)]
sqlite: avoid incorrect/deprecated `LIKE' use
The `case_sensitive_like' pragma is deprecated since
SQLite 3.44+. Furthermore, our use of `LIKE' in SharedKV->keys
seemed to be broken anyways since `LIKE' doesn't seem to
work with binary data (stored with SQL_BLOB), but neither does
`GLOB'.
So avoid `LIKE' entirely. For non-SQL_BLOB data we'll favor the
always-case-sensitive GLOB. For SQL_BLOB data, we must rely on
the Perl regexp engine from what I can tell. `GLOB' is
preferred where possible since SQLite will be able to use
indices in some cases whereas `REGEXP' cannot.
Fixing SharedKV->keys should improve bash completion for lei.
Some common SQLite-related utilities are now in a
PublicInbox::SQLiteUtil package which will be expanded to deal
with more commonalities between SQLite users in our tree.
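A rough Python illustration of the GLOB vs LIKE difference for text
keys, plus a regexp fallback applied outside SQLite for binary keys.
This is not the SharedKV implementation; the table, the helper names,
and the sample keys are hypothetical, and GLOB metacharacters in the
prefix are deliberately left unescaped for brevity.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO kv VALUES (?)",
                 [("Foo",), ("foo",), ("foobar",)])

def keys_glob(prefix):
    """Case-sensitive prefix match; SQLite may use an index for this."""
    cur = conn.execute("SELECT k FROM kv WHERE k GLOB ? ORDER BY k",
                       (prefix + "*",))
    return [row[0] for row in cur]

def keys_like(prefix):
    """LIKE is case-insensitive for ASCII unless a pragma changes it."""
    cur = conn.execute("SELECT k FROM kv WHERE k LIKE ? ORDER BY k",
                       (prefix + "%",))
    return [row[0] for row in cur]

def keys_regexp(keys, pattern: bytes):
    """Filter binary keys with the host language's regexp engine."""
    rx = re.compile(pattern)
    return [k for k in keys if rx.search(k)]
```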
Eric Wong [Sat, 30 Nov 2024 22:59:05 +0000 (22:59 +0000)]
daemon: improve warning on missing SO_ACCEPTFILTER
I noticed tests were failing on a freshly booted FreeBSD
instance due to the accf_http module not being loaded and
triggering autodie-generated error messages from setsockopt.
Instead, give a helpful warning message for users to use
kldload(8) to load the necessary filter.
We'll also relax tests to ignore the kldload warning and fix an
overzealous use of /.*/ while we're at it by using /[^\n]/ instead
to avoid filtering out subsequent lines.
Eric Wong [Sat, 30 Nov 2024 08:26:32 +0000 (08:26 +0000)]
wqblocked: use per-instance unique timer
It was possible to end up with multiple timers being armed on a
busy system from LeiNoteEvent due to the often rapid nature of
renames. Since ->flush_send will flush all pending messages for
the WQ, it's only necessary to arm a single timer for a given WQ
socket. This saves some memory and unnecessary wakeups by using
the handy unique timer mechanism in DS.
Eric Wong [Fri, 29 Nov 2024 23:54:00 +0000 (23:54 +0000)]
send_cmd: throttle `sleeping on sendmsg' messages
Emitting a message every 100ms was too much for busy machines.
Throttle it to 1.6s to avoid flooding user terminals by emitting
every 16 sleeps. A power-of-two (16) was chosen for its
optimization potential via bitwise AND. Since perl(1) doesn't
do this optimization, we open code the bitwise AND. I don't
assume an optimizing compiler for Inline::C, either, since I
find working in C far more enjoyable with optimizations off.
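The open-coded check amounts to testing the low four bits of a sleep
counter. A sketch in Python (the should_warn() name is hypothetical, and
whether the first message appears on sleep 0 or sleep 16 is an assumption
here):

```python
def should_warn(nsleep: int) -> bool:
    """True when the low four bits are clear, i.e. every 16th sleep.
    With 100ms sleeps that is one message per 1.6s."""
    return (nsleep & 15) == 0

warned = [n for n in range(40) if should_warn(n)]
```

`nsleep & 15' gives the same result as `nsleep % 16' for non-negative
counters without needing a division.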
Eric Wong [Fri, 29 Nov 2024 23:53:59 +0000 (23:53 +0000)]
send_cmd: use (practically) infinite retries for writers
Write tools (-*index, -watch, -mda, lei) should never croak due
to the system being busy. So make the retry infinite to benefit
users who run several parallel imports at once on a slower
system. The previous 5s timeout was too close to failing in
my own experience using `lei import' on an old, busy machine.
For lei (inotify || EVFILT_VNODE) watches, we now retry on busy
sockets to avoid loss of FS change notifications.
On the contrary, public-facing read-only interfaces have always
been assumed to constantly be under attack. Thus continuing to
drop requests due to a lack of kernel memory/buffers is probably
prudent.
Eric Wong [Fri, 29 Nov 2024 06:45:11 +0000 (06:45 +0000)]
lei/store: use WAL for over.sqlite3
WAL (write-ahead log) improves parallelism for readers when
they also have write access to the SQLite DB. While we
can't use WAL for public-inboxes where the -netd processes
are intended to only have read-only permissions, lei/store
always assumes read-write access.
The lei/store */ei15/over.sqlite3 DB was the only SQLite DB used
by lei without WAL. lei already set WAL for mail_sync.sqlite3
and the saved-searches/*/over.sqlite3 DBs.
Now that all SQLite DBs used by lei are WAL, commit 807abf67
(lei/store: auto-commit for long-running imports, 2024-11-15)
is no longer strictly necessary for parallelism during
long-running imports. However, 807abf67 may continue to be
useful to minimize the need to refetch after a power outage
during `lei import'.
For saved-searches, we'll make use of the new mechanism for
setting {journal_mode} per-instance.
Eric Wong [Wed, 27 Nov 2024 02:35:17 +0000 (02:35 +0000)]
xapcmd: suppress opendir + my usage warning
Apparently perl gets confused here regardless of autodie, so we
add parentheses around the subroutine call to disambiguate.
This only appears to happen when the target directory name is a
scalar variable and not if it's a constant and when the result
of opendir isn't explicitly checked.
Eric Wong [Tue, 26 Nov 2024 21:29:23 +0000 (21:29 +0000)]
lei: avoid repeatedly recreating anonymous subs
The SIGTERM handler doesn't change, so we can reuse it across
different instances without repeatedly creating a new one since
(AFAIK) perl(1) isn't able to deduplicate identical subs. In
any case, looping `local $SIG{TERM} = $coderef' reveals a minor
speedup compared to the equivalent `local $SIG{TERM} = sub {...}'.
Creating an anonymous sub for $SIG{__WARN__} every time
`lei rediff' is called is wasteful. Instead, provide a
knob to prevent the unnecessary warning from being emitted
by PublicInbox::Import in the first place so we can use the
existing warn_ignore_cb.
Eric Wong [Mon, 25 Nov 2024 22:27:48 +0000 (22:27 +0000)]
devel/try-lei: for interactive testing + debugging
This script allows creating a clean lei instance for interactive
testing without modifying a user's current lei $HOME||$XDG_*
directories. I used this to debug and test the fixes for
long-running `lei import' runs which led to
99fc3d76 (v2writable: done: force synchronous awaitpid, 2024-11-19)
and 807abf67 (lei/store: auto-commit for long-running imports, 2024-11-15).
Eric Wong [Mon, 25 Nov 2024 08:59:33 +0000 (08:59 +0000)]
imap_searchqp: attempt to suppress error messages harder
In addition to setting $::RD_ERRORS and $::RD_WARN to `undef'
for parsing the generated $prd object, we'll make those `undef'
on Parse::RecDescent object instantiation, too, in an attempt to
reduce test failures.
Furthermore, add a note about the occasional test failure and
maybe somebody else can help us figure it out since it's been
sporadically failing for a while...
Followup-to: 31ca305f28d747a0 (t/imap_searchqp: hopefully fix test reliability, 2024-04-28)
Followup-to: fa8bce03925461ef (t/imap_searchqp.t: retry bad query test on failure, 2023-10-10)
Eric Wong [Fri, 22 Nov 2024 23:05:57 +0000 (23:05 +0000)]
lei import: non-noisy by default, add --noisy switch
Email::Address::XS is too noisy by default to be useful given
the poorly formatted messages which exist in history. Quiet it
down by default since users often don't have the means to fix
such historical messages anyways.
Eric Wong [Tue, 19 Nov 2024 21:47:52 +0000 (21:47 +0000)]
v2writable: done: force synchronous awaitpid
We need to shut down shards synchronously to reliably release
the inbox write lock when inside the DS event loop (as the
lei/store subprocess is, unlike most v2writable users).
This seems to fix long-running `lei import' failures to
lei/store after repeated tests. It is a good idea anyways to
ensure exit status of shard workers are correct before returning
from ->done.
Eric Wong [Tue, 19 Nov 2024 21:47:51 +0000 (21:47 +0000)]
treewide: warn on SQLite `PRAGMA optimize' failure
While `PRAGMA optimize' isn't a strict requirement for proper
functionality anywhere, displaying the failure can help detect
bigger problems in the future in case of failing hardware.
Eric Wong [Sat, 16 Nov 2024 07:09:51 +0000 (07:09 +0000)]
admin: autodie chdir + open
autodie gives us more consistent error messages and reduces
visual noise on our end. We can also open() directly into a
hash entry without relying on a temporary variable.
Eric Wong [Sat, 16 Nov 2024 07:09:48 +0000 (07:09 +0000)]
index: use v5.12, remove outdated comment
There's no unicode_strings-dependent code in this script, so
v5.12 is safe. The comment about libeatmydata shouldn't be
necessary for -index any longer since we support --no-fsync
nowadays and Xapian 1.4+ is fairly widespread.
Eric Wong [Fri, 15 Nov 2024 22:23:15 +0000 (22:23 +0000)]
lei/store: auto-commit for long-running imports
DBD::SQLite (not SQLite itself) sets a 30s busy_timeout which we
currently do not override. This means readers can wait up to
30s for a writer to finish. For long imports exceeding 30s,
SQLite readers (for deduplication during import) can die with a
"database is locked" message while the lei/store process holds a
long write transaction open.
Forcing commits every 5s ought to fix the problem in most cases,
assuming commits themselves happen in under 25s (which isn't
always true on slow devices). 5 seconds was chosen since it
matches the default commit interval on ext* filesystems and the
vm.dirty_writeback_centisecs sysctl.
This should fix many (but not all) failures around long-running
`lei import' processes.
Eric Wong [Fri, 15 Nov 2024 02:59:32 +0000 (02:59 +0000)]
view: fix obfuscation in message/* attachments
Our address obfuscation currently relies on HTML-escaped output,
so we need to call obfuscate_addrs() after ascii_html(). This
bug only affected rare messages which include another message/*
attachment. Without this fix it didn't fail to obfuscate, but
rather showed a literal `&#8226;' in the HTML instead of the
entity it represents.
Eric Wong [Fri, 15 Nov 2024 02:59:30 +0000 (02:59 +0000)]
nntp: integerize {article} to save memory
The NNTP article number is always an integer, so ensure it's
stored as one to avoid malloc overhead since NNTP clients may
linger for minutes at a time.
Eric Wong [Fri, 15 Nov 2024 02:59:28 +0000 (02:59 +0000)]
test_common: disable fsync in git(1) commands
As with git itself, fsync(2) results in needless overhead and
storage wear in test cases where data integrity is not an issue.
I normally point TMPDIR to tmpfs when running tests, but this
still affects initial setup of data for stuff in t/data-gen as
well as improving life for users with too little RAM for a tmpfs
TMPDIR.
Eric Wong [Fri, 15 Nov 2024 02:59:27 +0000 (02:59 +0000)]
tests: fix missing modules under TEST_RUN_MODE=0
By default, we rely heavily on preload to speed up tests and
missing modules were always present by the time we hit some
tests. However, the more realistic (and significantly slower)
TEST_RUN_MODE=0 doesn't preload so we must explicitly load
missing modules.
Eric Wong [Tue, 12 Nov 2024 20:34:33 +0000 (20:34 +0000)]
lei_mirror: favor File::Spec::Functions
Function calls are preferable over `->' method dispatch in
tight loops. This can be the case when scanning alternates in
v1_done(). Since we're at it, replace all other `File::Spec->'
method dispatches with function calls since function calls can
be used to validate function prototypes at compile time.
Eric Wong [Tue, 12 Nov 2024 20:34:30 +0000 (20:34 +0000)]
cindex: rework path canonicalization check
While reading the code, I noticed inadvertent `$_' use when the
loop iterator is `$d'. Using `$_' here would result in
uninitialized variable access. I've yet to hit this case in
real-world access.
Furthermore, we can use a single pass to canonicalize existing
directories instead of relying on a grep block, first.
Finally, favor File::Spec::Functions since `->' method dispatch
is slower than normal subroutine calls by a small amount even
when both the package and method names are static and known
early in advance.
Eric Wong [Mon, 11 Nov 2024 21:56:55 +0000 (21:56 +0000)]
import: avoid uninitialized comparison on failures
readline may return undef when fast-import fails (as triggered
by t/lei-store-fail.t). Ensure we give a more informative error
message in the syslog when this happens. Arguably, having this
in the syslog when a client is connected via terminal is probably
not great, but perhaps unavoidable...
Eric Wong [Mon, 11 Nov 2024 21:56:54 +0000 (21:56 +0000)]
t/spawn: increase timeout for slow systems
Sometimes the SIGXCPU handler doesn't fire in time on an
overloaded VPS, so hopefully increasing the timeout is now
enough. The $rset allocation and bitset is now moved before
the spawn to avoid measuring any possible overhead from the
scalar creation.
Eric Wong [Mon, 11 Nov 2024 21:56:53 +0000 (21:56 +0000)]
t/inbox_idle: delay for low-res FS w/o inotify||kqueue
On systems without inotify||kqueue, changes are unreliably
detected on filesystems with low-resolution timestamps. The
FakeInotify emulation can't detect changes properly in all
cases. While this remains a problem for real-world use cases,
systems w/o inotify or IO::KQueue are probably rare, so we'll
just change this test case to accommodate old FSes which lack
high resolution timestamps.
Keep in mind that mounting an old FS on a modern kernel doesn't
automatically give it high-resolution timestamps. I discovered
this problem because I still use an ancient ext3 FS created
decades ago on a modern kernel :x
Eric Wong [Sun, 10 Nov 2024 20:43:26 +0000 (20:43 +0000)]
lei: show searches prefixed with `.'
Sometimes, a user will use an output with a basename which
starts with a `.'. glob("*") won't list files prefixed with `.'
by default and glob(".* *") requires iterating twice on my
system. So just rely on Perl regexps instead of glob to get the
directory listing done in a single pass. We can improve error
detection with autodie for opendir, as well.
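The same single-pass idea, sketched in Python instead of Perl (the
list_entries() helper and the file names are hypothetical). Python's
glob has the same blind spot for dot-prefixed names as a shell-style
glob("*"), which makes the contrast easy to show.

```python
import glob
import os
import re
import tempfile

def list_entries(dirname):
    """Single-pass listing that keeps dot-prefixed names.
    os.listdir already omits `.' and `..'; the filter mirrors what a
    readdir-based loop in other languages has to do explicitly."""
    skip = re.compile(r"\A\.\.?\Z")
    return sorted(e for e in os.listdir(dirname) if not skip.match(e))

with tempfile.TemporaryDirectory() as d:
    for name in (".hidden", "visible"):
        open(os.path.join(d, name), "w").close()
    globbed = [os.path.basename(p) for p in glob.glob(os.path.join(d, "*"))]
    listed = list_entries(d)
```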
Eric Wong [Sun, 10 Nov 2024 11:14:15 +0000 (11:14 +0000)]
lei_store_err: flush before disabling alarm
We must flush the output to ensure the alarm works properly,
so the easiest way is to rely on IO::Handle::autoflush rather
than calling an explicit close() since the SIG{ALRM} handler
may close it.
Eric Wong [Fri, 8 Nov 2024 12:06:46 +0000 (12:06 +0000)]
EOFpipe: avoid uninitialized variables in lei tests
EINTR can happen due to spurious wakeups and lead to
uninitialized variable warnings when comparing the result of
->do_read to a numeric value. Thus avoid EINTR by making the
pipe non-blocking on initialization.
Furthermore, our ->do_read API is overkill and not appropriate
for only detecting EOF on a pipe since it is designed for
parsing network protocols. Relying on level-triggering makes
more sense here, since we only want to detect EOF and don't have
to worry about event loop monopolization.
Debian abandoned SVN and Alioth in favor of Salsa (Gitlab) a
while back, so point to the new URL of the SA configs. The
new repo is limited to SA configs, while the old pkg-listmaster
repo had some other tools and plugins which were probably unused
and abandoned.
Eric Wong [Mon, 28 Oct 2024 20:47:58 +0000 (20:47 +0000)]
coderepo: sort per-inbox coderepos by score
This increases the likelihood of early solver success and looks
more logical in listings.
It's probably OK to ditch the timestamp column since the score
is far more important and we can reduce Xapian lookups this
way. The raw score is still shown, for now, but that could
probably be a percentage in the future...
Eric Wong [Fri, 25 Oct 2024 03:19:56 +0000 (03:19 +0000)]
learn: reduce parameter passing
Since $remove_or_add is now a locally-scoped anonymous sub, it
can access `global' variables under TestCommon::key2sub and
avoid shadowing the names of global variables.
Eric Wong [Fri, 25 Oct 2024 03:19:55 +0000 (03:19 +0000)]
learn: support --keep-going/-k switch
Inspired by make(1), this switch allows -learn to work with
config files which contain both read-only and read-write
inboxes.
The remove_or_add() sub is now anonymous and locally-scoped to
facilitate `@fail_ibx' variable sharing when the entire
script is made into a subroutine for `make check-run' (via
TestCommon->key2sub). A subsequent commit will reduce needless
parameter passing of global variables for readability.
Eric Wong [Mon, 21 Oct 2024 20:24:58 +0000 (20:24 +0000)]
doc: cindex: clarify --prune switch
--prune doesn't remove commits from git, it only removes them
from the index. Thus "unindex" is a better word to describe the
removal of commits from the index.
Eric Wong [Tue, 8 Oct 2024 05:18:37 +0000 (05:18 +0000)]
v2writable: more debug output for `lei import' failures
A difficult-to-reproduce bug in `lei import' introduced in 4ff8e8d21ab5
(lei/store: stop shard workers + cat-file on idle, 2024-04-16)
causes {idx_shards} to not be recreated properly due to a shard
being locked while attempting to get a write lock on it:
Exception: Unable to get write lock on /path/to/shard0: already locked
I'm not sure if the bug is from unnecessarily holding onto a
shard too long, or incorrectly attempting to open an
already-open shard. In either case, hopefully a more complete
backtrace can be obtained since setting PERL5OPT=-MCarp=verbose
in the shared lei/store worker process isn't straightforward[*].
AFAIK, this doesn't affect normal v2 and -extindex activity,
only lei users who import mail[*]
[*] lei attempts to ensure read-after-write consistency
across parallel instances to satisfy users who simultaneously
import multiple IMAP mailboxes at once over high-latency networks.
However, Xapian, SQLite, and multi-epoch git only work well with
one writer-at-a-time so lei jumps through hoops to avoid
introducing surprising local delays and waits.
Eric Wong [Mon, 7 Oct 2024 08:30:20 +0000 (08:30 +0000)]
lock: improve error reporting
Lock errors should not happen under normal use, so use confess()
to aid debugging on failure. We'll also start using `E: ' to
denote errors (as opposed to warnings).
Eric Wong [Wed, 2 Oct 2024 22:39:02 +0000 (22:39 +0000)]
viewvcs: generate search query for merge commits
Attempt to parse commit titles out of merge commit messages and
generate search queries out of them to find the related emails
for the individual patch(es).
As with all search-related functionality, it's best-effort
and inexact, but seems somewhat successful.
Eric Wong [Wed, 2 Oct 2024 22:39:01 +0000 (22:39 +0000)]
viewvcs: use wider textarea for search query
We normally wrap text at 72, but the <textarea> can be wider
in case there are long words which aren't broken apart.
With w3m, the enclosing `[' and `]' take up two columns
combined, so it works out to maximizing a standard 80-column
terminal.
Eric Wong [Mon, 30 Sep 2024 21:30:08 +0000 (21:30 +0000)]
t/{config,solver_git}: simplify error handling
autodie and the newish PublicInbox::IO::write_file make error
handling more consistent and less noisy to people reading the
code. We'll also avoid testing `git config' set behavior
of git(1) and instead bail out via `xsys_e()' if it fails
unexpectedly due to hardware problems or bugs in git.
Eric Wong [Mon, 30 Sep 2024 21:30:07 +0000 (21:30 +0000)]
xt/solver: use `psgi' shortcut for require_mods()
The `psgi' shortcut simplifies setup, reduces the likelihood
of human error from omitted modules, and avoids needless `use_ok'
tests which make the output noisier than necessary.
Eric Wong [Mon, 30 Sep 2024 21:30:06 +0000 (21:30 +0000)]
www: improve handling of missing coderepos
Git coderepos may appear and disappear during the lifetime of
an -httpd or -netd instance. This happens quite frequently on
my git.kernel.org mirror and clutters up stderr, so
we'll validate the existence of a git directory before trying
to serve anything.
Whether or not this should be the case for inboxes is yet-to-be
decided, especially since there's inbox-specific information
inside PI_CONFIG (e.g. address, newsgroup) and we only use a
project listing for coderepos.
Eric Wong [Mon, 30 Sep 2024 21:30:04 +0000 (21:30 +0000)]
t/www_static: test with our -httpd server, too
While our current HTTP implementation doesn't make special
allowances for static files, it only hurts a little to fork
off -httpd instances and ensure any future changes work as
expected.
Eric Wong [Mon, 30 Sep 2024 21:30:03 +0000 (21:30 +0000)]
t/www_static: modernize test
autodie is standard in Perl v5.10+ and we now have
PublicInbox::IO::write_file to denoise test setup code.
Then we'll favor the non-wantarray calls to tmpdir() and
rely on an overloaded stringification rather than keeping
the $for_destroy object around.
Eric Wong [Thu, 26 Sep 2024 10:56:35 +0000 (10:56 +0000)]
t/lei-import-imap: bail on missing UIDVALIDITY after import
I haven't been able to reproduce it, but I've seen the test
fail because ls-mail-sync didn't emit anything which matched
/;UIDVALIDITY=(\d+)\s*/. Add some diagnostics and bail out
right away if it happens again.
Eric Wong [Thu, 26 Sep 2024 00:55:05 +0000 (00:55 +0000)]
www: use mtime as CSS cache-buster instead of ctime
mtime can be synchronized across multiple machines via package
managers, tarballs, rsync deploys, or tools like
`git-set-file-times'. This synchronization increases cache hit
rates for browsers hitting multi-host public-inbox instances
loadbalancing behind a single $hostname:$port identity.
While mtime may be less correct, it's unusual that anyone would
want to intentionally alter or preserve mtime after a file is
changed on the FS.
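A sketch of an mtime-based cache-buster in Python (not the project's
code; the css_href() helper, the `?<mtime>' query format, and the
timestamp are assumptions for illustration). Two hosts which deploy the
same file with the same mtime would emit the same URL.

```python
import os
import tempfile

def css_href(path, url):
    """Append the file's integer mtime as a cache-busting query string."""
    return f"{url}?{int(os.stat(path).st_mtime)}"

with tempfile.NamedTemporaryFile(suffix=".css") as f:
    # pretend a deploy tool restored a known mtime
    os.utime(f.name, (1700000000, 1700000000))
    href = css_href(f.name, "/style.css")
```

ctime cannot be set this way, which is why it would differ across hosts
even when the file contents are identical.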
Eric Wong [Sat, 28 Sep 2024 18:39:35 +0000 (18:39 +0000)]
www: allow specifying CSS @import or <link> tags
Apparently, some browsers (or settings/extensions) will not
honor certain media= attributes in HTML <link> tags. So as a
workaround, users may force the use of @import statements inside the
<style> tag before the normal monospace CSS $STYLE using the new
`load' directive.
I've only lightly tested due to swap thrashing on my system, but it
appears to work on a new Firefox profile w/ ui.systemUsesDarkTheme=1.
I couldn't get @import nor <link> working on an existing profile,
likely due to some other settings in the config.
I initially wanted to use @import by default, but the ordering
constraint prevents user-specified CSS from overriding the
default $STYLE we normally inject first.
This also ensures @import use inside a specified CSS file forces
the file to be imported first, before the default $STYLE is
inlined.
In the future, `load' may support `last' as a comma-delimited
directive to load CSS at the bottom of the page (before the
`</html>' tag).
Eric Wong [Thu, 26 Sep 2024 00:55:02 +0000 (00:55 +0000)]
www: don't reread CSS files
With preloading, we usually read our CSS files in quick
succession so they are unlikely to change in between loads.
Thus we can save some syscalls and memory (assuming newer
Perl with CoW scalars).
Since our normal Perl code is only loaded once and ignores
changes on the FS after startup, we'll treat our CSS the
same way and assume they don't change after startup.
Eric Wong [Thu, 26 Sep 2024 00:55:01 +0000 (00:55 +0000)]
user_content: simplify internal API and use v5.12
We use {env} and {ibx} everywhere so there's no point in
unpacking args. There are no odd unicode_strings problems
here, either, so we can use v5.12 and autodie to reduce
`or die' checks.
Eric Wong [Tue, 24 Sep 2024 18:35:48 +0000 (18:35 +0000)]
viewvcs: fix b= generation in $REPO/tree/ listing
Queries such as `b=contrib/cssREADME' are incorrect despite
having the actual blob OID for the given file. Add a trailing
slash for files in a project subdirectory in those cases as we
do for cases we don't have a known path name.
While we're in the area, avoid needless shadowing of the `$t'
var and add a comment to describe its contents.
Eric Wong [Mon, 16 Sep 2024 21:03:01 +0000 (21:03 +0000)]
www: test address URL-fication
Probably more tests coming, but setup stuff is still on the slow
side. While email addresses can be all sorts of uncommon
characters, I'm also fairly certain we can disallow the [&;<>]
set from being URL-fied.
Eric Wong [Mon, 16 Sep 2024 21:03:00 +0000 (21:03 +0000)]
test_common: improve psgi test setup + loading
Since we have many PSGI tests nowadays, put in a `psgi' shortcut
like we do for many other components for `require_mods' to make
it easier to load a consistent set of modules.
We'll also cut down on `require_ok' and `use_ok' tests since
they should be limited to code maintained in our source tree,
not 3rd-party dependencies.
Eric Wong [Mon, 16 Sep 2024 21:02:58 +0000 (21:02 +0000)]
config: ignore blank address= and listid= entries
At the minimum, there must be a non-space character in
address= and listid= entries for matches to occur.
Filter out the obviously unmatchable entries here to
avoid potential problems elsewhere.
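The filter described here is small enough to sketch in a few lines of
Python (the usable_entries() name and the sample values are
hypothetical): keep only entries containing at least one non-space
character.

```python
import re

def usable_entries(values):
    """Drop entries which are empty or consist only of whitespace."""
    return [v for v in values if re.search(r"\S", v)]

kept = usable_entries(["", "  ", "a@example.com", "\t"])
```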
Eric Wong [Mon, 16 Sep 2024 09:53:59 +0000 (09:53 +0000)]
t/feed: fix uninitialized variable warnings
These warnings only happened under test conditions and never
when running under any PSGI servers. In retrospect, t/feed.t is
likely redundant nowadays and ought to be folded into existing
PSGI tests so we don't have to consider setup problems like
these.
Fixes: bbe582cdfa429 ("view: fix addr2urlmap with Plack::Builder::mount")
Eric Wong [Fri, 13 Sep 2024 22:07:24 +0000 (22:07 +0000)]
view: disable address URL-fication of possible HTML escapes
In case somebody uses a local email address of `lt' or `gt' (with
no domain component, or something matching /#\d+/a), disable
URL-fication of such addresses to prevent breaking HTML output.
Somebody with better Perl regexp knowledge than I can attempt to
write a regexp which functions like \b but avoids matching `&'
to allow such local email addresses. But I suspect the use of
local-only email addresses to be limited and this isn't a real
problem in practice.
Eric Wong [Fri, 13 Sep 2024 22:07:23 +0000 (22:07 +0000)]
view: fix addr2urlmap with Plack::Builder::mount
Plack::App::URLMap does not preserve SCRIPT_NAME set for PSGI
`mount' directives when running response callbacks. Thus we
must get $ibx->base_url($ctx->{env}) calls to generate correct
full URLs when relying on publicinbox.nameIsUrl up front before
the PSGI response callback is returned.
Eric Wong [Tue, 10 Sep 2024 00:40:48 +0000 (00:40 +0000)]
view: fix x-post links for relative urls
We need to make correct relative URL paths for users configuring
publicinbox.$NAME.url as relative URL paths (e.g. matching the
inbox `$NAME').
Users of protocol-relative (e.g. `//$HOST/$NAME') and absolute URIs
(e.g `https://example.com/$NAME') were unaffected by this bug.
Users relying on publicinbox.nameIsUrl and omitting
publicinbox.*.url entries were also immune to this bug.
Automated tests are in progress and will come in a separate commit.
Eric Wong [Wed, 11 Sep 2024 21:25:49 +0000 (21:25 +0000)]
www: preload all inboxes if using ->ALL
This ought to improve memory layout and ensure the regexp
for address => inbox linkification works when hitting
/$EXTINBOX/$MSGID/ links first (instead of /$INBOX/$MSGID)
This fill_all call is redundant for cindex users who get the
preload anyways, but necessary for non-cindex users.
This should also avoid the broken/empty regexps problem described in 3b51fcc196e3 (view: fix addr2url mapping corruption, 2024-09-06)
Eric Wong [Fri, 6 Sep 2024 23:29:03 +0000 (23:29 +0000)]
view: fix addr2url mapping corruption
We must avoid generating a qr/\b()\b/ regexp which matches
every word boundary. This is caused by a particular set of
circumstances for WWW instances:
1. extindex must be in use
2. cindex must NOT be in use OR WWW->preload wasn't used
(custom .psgi or non-p-i-{httpd,netd} users)
3. first HTTP request hits /$EXTINDEX/$MSGID/
(where $EXTINDEX is typically `all')
On extindex-using instances without a cindex configured, the
first HTTP request hitting the extindex encounters an empty
{-by_addr} hash table. This empty {-by_addr} hash table causes
View->addr2urlmap() to return an all-matching regexp which
corrupts HTML when attempting address substitutions.
cindex-using instances avoid the problem by triggering
_fill_all() during PublicInbox::WWW->preload and ensuring
{-by_addr} of the PublicInbox::Config object is populated.
Thanks to Konstantin for the initial report and Filip for the
immensely helpful explanation of the problem.
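The failure mode is easy to reproduce in other regexp engines. A Python
sketch (not the View->addr2urlmap code; addr_regexp() and the sample
mapping are hypothetical) shows both the all-matching pattern built from
an empty table and a guard against it:

```python
import re

def addr_regexp(by_addr):
    """Build a word-bounded alternation of known addresses, or return
    None when the table is empty instead of compiling \\b()\\b."""
    if not by_addr:
        return None
    alt = "|".join(re.escape(a)
                   for a in sorted(by_addr, key=len, reverse=True))
    return re.compile(r"\b(" + alt + r")\b")

# what an empty table produces without the guard: an empty match at
# every word boundary of the input
broken = [m.start() for m in re.finditer(r"\b()\b", "a b")]

rx = addr_regexp({"a@example.com": "https://example.com/a/"})
found = rx.findall("mail a@example.com now")
```

Substituting on the broken pattern would insert replacement text at each
of those boundaries, which is how the HTML ended up corrupted.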
Eric Wong [Sat, 31 Aug 2024 08:17:56 +0000 (08:17 +0000)]
tests: skip ENOSPC injection on restricted systems
Yama will not allow ptrace(2) on existing processes (only new
ones) if the kernel.yama.ptrace_scope sysctl is non-zero. Skip
those tests for now since the majority of strace(1) testing
is probably done on systems without ptrace restrictions.
Eric Wong [Fri, 30 Aug 2024 20:36:35 +0000 (20:36 +0000)]
view: fix unclosed parentheses after `raw' link
This formatting error was accidentally introduced while
converting a `qq{}' concatenation to a `say' statement. Re-add
the `)'. While we're at it, switch to a `print' statement
since we use a string literal anyways and `say' would require an
extra global variable lookup at runtime.
Eric Wong [Thu, 29 Aug 2024 23:26:03 +0000 (23:26 +0000)]
solver: use async_check for the temporary git repo
While the temporary git repo is likely in cache and not
subject to high seek latency as normal code repos are,
inflating objects still takes a non-trivial amount of time.
So use this as an opportunity to serve other clients and
exploit parallelism in SMP systems.
Eric Wong [Thu, 29 Aug 2024 23:26:02 +0000 (23:26 +0000)]
solver: use xap_helper for async search if available
The async search API using xap_helper allows -httpd/netd users
to exploit storage and CPU-level parallelism via sockets. It is
another step towards reducing head-of-line blocking in our Perl
event loop. This reduces the effect of slow storage and extremely
large search results on unrelated HTTP requests.
Eric Wong [Thu, 29 Aug 2024 23:26:01 +0000 (23:26 +0000)]
solver: use async check (`info') for coderepo
Async --batch-check or `info' batch commands allow our Perl
process to handle other requests while git is busy waiting
on slow storage or CPUs to retrieve blob information.
This improves parallelism for SMP machines in addition to
allowing the Perl process to service other HTTP/NNTP/IMAP/POP3
requests while waiting for disk seeks, zlib inflation, and
delta resolution.
Checking stderr for error hints is now potentially racy, but
it's only a hint so overall performance under worst case
scenarios is preferable to correctness.
Eric Wong [Fri, 30 Aug 2024 19:05:29 +0000 (19:05 +0000)]
t/v2writable: avoid failure on strace un-readyiness
poll(2) uses milliseconds, IO::Poll::_poll doesn't abstract that,
nor does our ->poll_in wrapper. This ensures we wait enough time
for strace to start up on overloaded systems.
Eric Wong [Fri, 30 Aug 2024 19:05:15 +0000 (19:05 +0000)]
lei: increase umask timeout
On slow or overloaded systems, 2 seconds may not be sufficient
time to wait for a lei client to respond to the umask request
from lei-daemon. Use 60s to be consistent with the FD transfer
in the general case.
While we're at it, consistently use poll_in() now that it exists
since it's a better API than vec() + select() and will give
consistent performance regardless of the FD value.
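For comparison, a poll-based readability check in Python (a sketch only;
this poll_in() is a hypothetical stand-in and not the project's wrapper).
As with poll(2), the timeout is in milliseconds, and the cost does not
depend on how large the descriptor number is.

```python
import os
import select

def poll_in(fd, timeout_ms):
    """Wait up to timeout_ms milliseconds for fd to become readable."""
    p = select.poll()
    p.register(fd, select.POLLIN)
    return bool(p.poll(timeout_ms))

r, w = os.pipe()
before = poll_in(r, 0)      # nothing written yet
os.write(w, b"x")
after = poll_in(r, 60_000)  # 60s expressed in milliseconds
os.close(r)
os.close(w)
```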
Eric Wong [Fri, 23 Aug 2024 16:30:29 +0000 (16:30 +0000)]
tls: set SSL_OP_NO_COMPRESSION explicitly
TLS compression is susceptible to the CRIME attack and
per-connection zlib contexts waste memory for idle clients.
While compression should already be off by default in modern OpenSSL,
Net::SSLeay::CTX_get_mode reveals OP_NO_COMPRESSION was not set
when created by IO::Socket::SSL::SSL_Context->new. So set it
explicitly to ensure it's really off.
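The equivalent explicit setting in Python's stdlib ssl module looks like
the following (an analogy only, not the IO::Socket::SSL code; the
make_server_context() name is hypothetical). Setting the option is
harmless when the library already defaults to it.

```python
import ssl

def make_server_context():
    """Create a server-side TLS context with compression explicitly off."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.options |= ssl.OP_NO_COMPRESSION
    return ctx

ctx = make_server_context()
compression_off = bool(ctx.options & ssl.OP_NO_COMPRESSION)
```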
Eric Wong [Tue, 20 Aug 2024 18:40:59 +0000 (18:40 +0000)]
t/sigfd: reduce getpid() calls and hash lookups
getpid() is no longer cached by glibc and syscalls are more
expensive nowadays, so only call it once per test. The
additional hash table depth is no longer necessary since there's
no longer a difference between signal dispatch methods now that
Sigfd uses the global %SIG.