Eric Wong [Thu, 30 Nov 2023 11:40:54 +0000 (11:40 +0000)]
cindex: fix store_repo+repo_stored on no-op
It's possible to update the fingerprint for a given repo when we
have no commits to index on because they were already done for
another repo. Thus we'll always vivify $repo_ctx->{active}
before calling store_repo since $active may've been undef.
Eric Wong [Tue, 28 Nov 2023 17:36:59 +0000 (17:36 +0000)]
www: mail_diff: fix optional address obfuscation
We need to load the proper package and fully-qualify the sub
call since we shouldn't load Hval in lei. Some users use this
feature even if it's broken, oh well :<
Eric Wong [Tue, 28 Nov 2023 14:56:26 +0000 (14:56 +0000)]
cindex: extra quit checks
We don't want to be accessing uninitialized variables on
process teardown since much of our control flow revolves
around DESTROY for dependency handling.
Eric Wong [Tue, 28 Nov 2023 14:56:25 +0000 (14:56 +0000)]
admin: resolve_git_dir respects symlinks
Absolute pathnames of git coderepos are stored in the cindex,
but we should favor paths relative to $ENV{PWD} since it
respects symlinks in the hierarchy.
Respecting symlinks makes it easier to migrate cindex to
new storage as old storage wears out and to relocate the
storage device onto another machine.
Eric Wong [Tue, 28 Nov 2023 14:56:23 +0000 (14:56 +0000)]
cindex: require `-g GIT_DIR' or `-r PROJECT_ROOT'
Accepting @ARGV without switches ends up being ambiguous with
optional parameters for --join and --show. Requiring users to
specify `--join=' or `--show=' is a bit awkward (as it is with
-clone --objstore= and the like, but that is historical baggage
we need to carry at this point...)
Eric Wong [Tue, 28 Nov 2023 14:56:22 +0000 (14:56 +0000)]
git: speed up ->git_path for non-worktrees
Only worktrees need to use `git rev-parse --git-path', so avoid
the spawn overhead of a new process. With the SolverGit.pm
limit on coderepo scans disabled and scanning over 800 git repos
for git@vger matches, this reduces xt/solver.t times by
roughly 25%.
Eric Wong [Tue, 28 Nov 2023 14:56:21 +0000 (14:56 +0000)]
www: load and use cindex join data
This is a major step in solving the problem of having to
manually associate hundreds/thousands of coderepos with
hundreds/thousands of public-inboxes to power solver
(and more).
Eric Wong [Tue, 28 Nov 2023 14:56:20 +0000 (14:56 +0000)]
hval: use File::Spec to make relative paths for href
File::Spec->abs2rel doesn't touch the filesystem at all when
given an absolute base arg ($env->{PATH_INFO}), so we can rely
on it to generate relative links that work with the `mount'
from Plack::Builder and also people running `wget -r' mirrors.
Eric Wong [Tue, 28 Nov 2023 14:56:19 +0000 (14:56 +0000)]
xap_helper: implement mset endpoint for WWW, IMAP, etc...
The C++ version will allow us to take full advantage of Xapian's
APIs for better queries, and the Perl bindings version can still
be advantageous in the future since we'll be able to support
timeouts effectively.
Eric Wong [Tue, 28 Nov 2023 14:56:18 +0000 (14:56 +0000)]
xap_helper.h: move cindex endpoints to separate file
It ought to help a bit with organization since xap_helper.h
is getting somewhat large and we'll need new endpoints to
support WWW, lei, and whatever else that needs to come.
Eric Wong [Tue, 28 Nov 2023 14:56:15 +0000 (14:56 +0000)]
t/cindex*: require SCM_RIGHTS for these tests
Code search will require SCM_RIGHTS, and Inline::C on BSDs
probably isn't too onerous a dependency for new features as
all the ones I've tested have it packaged.
Furthermore, requiring SCM_RIGHTS isn't far off since OpenBSD's
Perl is patched to route the `syscall' perlop through libc[1],
while NetBSD[2] and FreeBSD[3] actually do strive for backwards
compatibility. We'd just need to use the numbers and not rely
on syscall.ph shipped with Perl since the macro names themselves
are unstable.
Eric Wong [Tue, 28 Nov 2023 14:56:14 +0000 (14:56 +0000)]
test_common: create_*: detect changes in all parameters
Data::Dumper+B::Deparse seems fast enough to generate cache keys
with, so this makes updating and developing tests easier (as
opposed to forcing the developer to change the identifier). The
main downside is we'll have to deal with cache expiration, but
"make clean" seems overly aggressive already (it keeps blowing
away the clones made by t/cindex-join.t :<)
Eric Wong [Mon, 27 Nov 2023 22:20:59 +0000 (22:20 +0000)]
disallow NUL characters in Message-ID and List-Id
While MTAs seem to stop '\0' from appearing in headers, users
fetching archives via git remain susceptible to having '\0' land
in archives. So we'll filter them out of Xapian and SQLite DBs
to avoid interoperability problems with CLI tools since there are no
known messages in lore or any of my archives which feature them.
Avoiding '\0' will ensure all indexed Message-IDs and List-Ids
can be specified from the command-line (although some characters
will still require $(printf) contortions).
As with Message-ID, List-Id fields with /\n\t\r/ characters will
also be stripped for indexing. I will assume whatever went wrong
with the References: header in
<https://public-inbox.org/git/656C30A1EFC89F6B2082D9B6@localhost/raw>
could also happen to the List-Id header.
Eric Wong [Mon, 27 Nov 2023 10:23:48 +0000 (10:23 +0000)]
www: qs_html: fix escaping of `q' param
Our use of MID_ESC characters was only intended for the pathname
component of URIs and not appropriate for the query string
component. So use a different $unsafe parameter list for
uri_escape to make the result appropriate for query strings by
disallowing [\&\'\+=] characters. Most notably, this change
also allows us to accept `/' (slash) unescaped to make dfn: queries
nicer to look at.
Finally, we'll also add an ascii_html call on the URI-escaped
result as an extra safety measure even though it's not really
needed.
As far as I can tell, the code without this fix didn't result in
an HTML injection since all our uses of uri_escape did escape
angle brackets.
Eric Wong [Mon, 27 Nov 2023 07:26:28 +0000 (07:26 +0000)]
t/nntpd-tls: avoid test failure on OpenBSD 7.3
The LibreSSL 3.7.2 on my OpenBSD 7.3 VM seems to return 7 bytes of
junk data before EOF/ECONNRESET when a client attempts to write
plain-text to a TLS socket.
Eric Wong [Mon, 27 Nov 2023 04:05:47 +0000 (04:05 +0000)]
xap_helper.h: avoid some off_t vs size_t problems
We'll introduce a helper to cast off_t to size_t consistently
for mmap/munmap/calloc calls which require size_t. Also, an
extra check for multiplication overflow can be helpful just
in case we end up with a gigantic roots file.
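As an illustration only (this is not the code from the commit), a
helper of this kind could look like the sketch below. The names
`off_to_size' and `mul_size' are hypothetical; the assumption is that
callers want a checked conversion before passing a length to
mmap/munmap/calloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * hypothetical helper: convert an off_t (signed, possibly wider than
 * size_t) into a size_t, refusing negative or unrepresentable values
 */
static bool off_to_size(off_t off, size_t *out)
{
	assert(out);
	if (off < 0 || (uintmax_t)off > (uintmax_t)SIZE_MAX)
		return false;
	*out = (size_t)off;
	return true;
}

/* hypothetical helper: compute nmemb * size, failing on overflow */
static bool mul_size(size_t nmemb, size_t size, size_t *out)
{
	assert(out);
	if (size != 0 && nmemb > SIZE_MAX / size)
		return false;
	*out = nmemb * size;
	return true;
}
```

Funnelling every conversion through one function keeps the range
check in a single place instead of scattering casts across call sites.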
Eric Wong [Sun, 26 Nov 2023 20:07:45 +0000 (20:07 +0000)]
xap_helper: avoid strerror(3) inside signal handler
It's not async-signal-safe and the glibc implementation uses
malloc via asnprintf. Practically it's not a problem unless the
kernel OOMs and the write(2) fails to the self-pipe.
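A rough sketch of the general technique, assuming a self-pipe design
where the handler reports a failed write(2) itself: only write(2) and
hand-rolled integer formatting are used, both of which are
async-signal-safe. All names here (`self_pipe_wr', `fmt_uint',
`install_handler') are hypothetical and not taken from xap_helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* write end of the self-pipe (hypothetical name) */
static volatile sig_atomic_t self_pipe_wr = -1;

/* format v as decimal without stdio; dst needs room for 10 bytes */
static size_t fmt_uint(char *dst, unsigned v)
{
	char tmp[10];
	size_t n = 0, i;

	do {
		tmp[n++] = (char)('0' + v % 10);
		v /= 10;
	} while (v);
	for (i = 0; i < n; i++)
		dst[i] = tmp[n - 1 - i];
	return n;
}

static void warn_raw(const char *p, size_t n)
{
	ssize_t w = write(STDERR_FILENO, p, n);

	(void)w; /* nothing more can be done from inside a handler */
}

static void on_signal(int sig)
{
	int saved = errno; /* handlers must not clobber errno */
	unsigned char b = (unsigned char)sig;

	if (write(self_pipe_wr, &b, 1) != 1) {
		/* report the raw errno number instead of strerror(3) */
		static const char msg[] = "self-pipe write failed, errno=";
		char num[11];
		size_t n = fmt_uint(num, (unsigned)errno);

		num[n++] = '\n';
		warn_raw(msg, sizeof(msg) - 1);
		warn_raw(num, n);
	}
	errno = saved;
}

static void install_handler(int signo, int wr_fd)
{
	struct sigaction sa;
	int rc;

	assert(wr_fd >= 0);
	self_pipe_wr = wr_fd;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(signo, &sa, NULL);
	assert(rc == 0);
}
```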
Eric Wong [Sun, 26 Nov 2023 21:08:01 +0000 (21:08 +0000)]
drop redundant calls to DS->Reset
Reset gets called on END{} anyways to workaround DBI lifetime
problems, so there's no need to call it near exit. We can't
replace calls to POSIX::_exit with `exit' to force END{} to
run just yet, as there are still some lingering destruction
ordering problems on newer DBI and/or Perls.
Eric Wong [Sun, 26 Nov 2023 02:11:04 +0000 (02:11 +0000)]
git: improve coupling with {sock} and {inflight} fields
While the {inflight} array should be tied to the IO object even
more tightly, that's not an easy task with our current code. So
take some small steps by introducing a gcf_inflight helper to
validate the ownership of the process and to drain the inflight
array via the awaitpid callback.
This hopefully fixes problems with t/lei-q-save.t (still) hanging
occasionally on v2 outputs since git->cleanup/->DESTROY was getting
called in v2 shard workers.
Eric Wong [Sat, 25 Nov 2023 20:54:35 +0000 (20:54 +0000)]
ds: long_step: eliminate redundant fileno call
We already stash the associated FD for reporting at startup and
don't need to call `fileno' again. Found via manual code
inspection while considering the effort to make async {forward}
from PublicInbox::HTTP more like the generic long_response API
and {long_cb} field used by IMAP/NNTP/POP3.
Eric Wong [Sat, 25 Nov 2023 20:54:34 +0000 (20:54 +0000)]
select+poll: have caller retry on EINTR
We can't assume signals are blocked when neither signalfd nor
EVFILT_SIGNAL are in use. So just return an empty result so
the caller can recalculate the timeout.
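The same idea, sketched in C rather than the Perl of the actual
change: the low-level wait reports EINTR as "nothing ready", and the
caller works out how much time is left before waiting again.
`poll_once', `wait_readable' and `now_ms' are hypothetical names used
only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* one poll(2) call; EINTR is reported as an empty result (0) */
static int poll_once(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int n = poll(fds, nfds, timeout_ms);

	if (n < 0 && errno == EINTR)
		return 0;
	return n;
}

/* caller side: wait until fd is readable or the deadline passes */
static int wait_readable(int fd, int timeout_ms)
{
	long long deadline;
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;
	for (;;) {
		long long left = deadline - now_ms();
		int n;

		if (left < 0)
			left = 0;
		/* the timeout is recalculated on every pass */
		n = poll_once(&pfd, 1, (int)left);
		if (n != 0 || left == 0)
			return n; /* ready (1), error (-1) or timed out (0) */
	}
}
```

Retrying inside `poll_once' with the original timeout would stretch
the total wait each time a signal arrives, which is why the retry is
left to the caller.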
I found this bug while making xt/httpd-async-stream.t
use our event loop to reap processes but have abandoned
that effort for now since it didn't save any code.
Eric Wong [Sat, 25 Nov 2023 20:54:33 +0000 (20:54 +0000)]
http: fix pipelining during long async requests
We must not attempt to read request bodies from the HTTP client
while processing a long request since that drains pipelined
requests. The NNTP/IMAP/POP3 event_step callbacks follow the
same behavior when {long_cb} is present from ->long_response.
This bug has little real-world consequence since HTTP/1.1
pipelining is not widely-used, especially when behind varnish
or other reverse proxies.
I found this bug while randomly strace-ing an active -netd
process to see the kind of traffic it was seeing.
Eric Wong [Sat, 25 Nov 2023 01:52:25 +0000 (01:52 +0000)]
examples/unsubscribe.milter: limit scope of munging
We don't want the milter to munge List-Unsubscribe headers from
external (incoming) mlmmj lists, only lists hosted on the server
running unsubscribe.milter.
Adding support for an allow_domains file should've been enough,
but this further restricts the milter to only operating on Postfix
connections from localhost.
Eric Wong [Fri, 24 Nov 2023 09:53:46 +0000 (09:53 +0000)]
cindex: fix --join=reset and speed up incremental joins
`reset' means we want to ignore existing join data, while
the default (non-reset) means we perform an incremental
join while taking into account existing (fuzzy) join data.
Eric Wong [Wed, 22 Nov 2023 00:13:31 +0000 (00:13 +0000)]
lei_to_mail: don't close STDOUT unless it is a mbox* output
We only care about error checking when stdout is an mbox output
pointed to a pathname. This is noticeable with `lei up' with
multiple non-mbox* destinations. We'll also ensure throwing
exceptions to trigger lei->x_it from lei->do_env results in the
epoll/kqueue watch being discarded, otherwise commands may never
terminate (leading to stuck tests)
Eric Wong [Tue, 21 Nov 2023 12:43:15 +0000 (12:43 +0000)]
cindex: rename --associate to --join, test w/ real repos
The association data is just stored as deflated JSON in Xapian
metadata keys of shard[0] for now. It should be reasonably
compact and fit in memory since we'll assume sane,
non-malicious git coderepo history.
The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be
set in the environment and tests the joins against the inboxes
and coderepos of two small projects with a common history.
Internally, we'll use `ibx_off', `root_off' instead of `ibx_id'
and `root_id' since `_id' may be mistaken for columns in an SQL
database which they are not.
Eric Wong [Tue, 21 Nov 2023 12:43:14 +0000 (12:43 +0000)]
doc/cindex: point no-fsync,dangerous to -index(1)
There's no point in duplicating --no-fsync documentation across
manpages. --dangerous can be useful for reducing SSD wear, so
add a pointer to it as well.
Eric Wong [Mon, 20 Nov 2023 19:22:32 +0000 (19:22 +0000)]
searchidx: run `git patch-id' in parallel
Informal benchmarks show a rough 5% indexing improvement on an
SMP system when there are idle cores due to Xapian shards being
I/O bound (since `git patch-id' is mainly CPU bound).
This is only parallelized on a per-patch basis. Further
increasing parallelism would increase complexity and probably
not be worth it since `git patch-id' is reasonably fast while
our text indexing tends to be slow.
Eric Wong [Mon, 20 Nov 2023 08:46:01 +0000 (08:46 +0000)]
git: return upon self->close
I encountered the odd lack of `return' while chasing Gcf2 bugs
on CentOS 7.x which resulted in commit 7d06b126e939
("gcf2: fix autodie usage for older Perl") and commit e618c7654794
("gcf2client: add alias for PublicInbox::Git::fail") before
realizing the lack of `return' here wasn't the culprit behind
failures on CentOS 7.x.
However, the use of a `return' here appears required in case we
actually hit the error path, since falling through and
attempting my_readline with an undefined filehandle is always a
failure.
Fixes: e97a30e7624d ("lei: fix SIGPIPE on large result sets to pager")
Eric Wong [Thu, 16 Nov 2023 11:00:20 +0000 (11:00 +0000)]
extindex: warn and hint about --gc on bad ibx_id
Stale entries from newsgroup name changes (including adding
a `publicinbox.<name>.newsgroup' entry when none existed
before) can wreak havoc during a --reindex. So give the
hint to users about running -extindex with --gc to clean
up stale entries.
Eric Wong [Wed, 15 Nov 2023 09:21:44 +0000 (09:21 +0000)]
lei: avoid extra fork for v2 outputs
We've always forced LeiToMail to only have one process for v2
outputs anyways since v2 has its own sharding and IPC. Thus we
can use the single LeiToMail process directly to avoid extra IPC
overhead.
Eric Wong [Wed, 15 Nov 2023 09:21:43 +0000 (09:21 +0000)]
lei convert: fix repeat and idempotent v2 output
We should be able to treat v2 outputs just like any other mail
format, with the exception that content dedupe is always
enforced by the v2 format.
This allows users hosting v2 public-inboxes to catch up broken
synchronization from alternate archives such as the mbox
archives hosted by https://lists.gnu.org/
Eric Wong [Wed, 15 Nov 2023 08:24:22 +0000 (08:24 +0000)]
xap_helper_cxx: accept leading spaces from pkg-config
Eric Wong <e@80x24.org> wrote:
> Avoid mixing autodie use in different scopes since it's likely
> to cause problems like it did in Gcf2. While none of these
> fix known problems with test cases, it's likely worthwhile to
> avoid it anyways to avoid future surprises.
That XapHelperCxx change was totally necessary for running the
C++ build on CentOS 7.x (but the test is auto-skipped on any
build failure), as is this one:
--------8<--------
Subject: [PATCH] xap_helper_cxx: accept leading spaces from pkg-config
pkg-config 0.27.1 and xapian14-core-devel (1.4.24-1.el7) on
CentOS 7.x will print a leading space when running
`pkg-config --libs --cflags xapian-core'. This leading
space creates an empty string when `split' with /\s+/ as
a pattern. Instead, use the documented ' ' (SP) character
to put split into "awk mode" which eats leading (and
redundant) spaces and tabs.
Eric Wong [Wed, 15 Nov 2023 04:32:39 +0000 (04:32 +0000)]
treewide: more autodie safety fixes for older Perl
Avoid mixing autodie use in different scopes since it's likely
to cause problems like it did in Gcf2. While none of these
fix known problems with test cases, it's likely worthwhile to
avoid it anyways to avoid future surprises.
For Process::IO, we'll add some additional tests in t/io.t
to ensure we don't get unintended exceptions for try_cat.
Eric Wong [Wed, 15 Nov 2023 04:32:38 +0000 (04:32 +0000)]
gcf2: fix autodie usage for older Perl
At least on Perl v5.16.3 on CentOS 7.x, use-ing autodie within
BEGIN {} affects all subroutines in that package, too. So just
use autodie at the top-level and rely on CORE::* and try_cat
to handle cases where autodie isn't desired.
Eric Wong [Wed, 15 Nov 2023 04:32:37 +0000 (04:32 +0000)]
gcf2client: add alias for PublicInbox::Git::fail
Ensure we can ->fail properly from other subs we call within
Gcf2Client. This doesn't fix the test failures on CentOS 7.x,
but tries to make it easier to fix underlying problems and
report OOM errors and other things which the test suite doesn't
touch on.
Eric Wong [Tue, 14 Nov 2023 22:46:57 +0000 (22:46 +0000)]
ds: run @post_loop_do if any user-queued events run
This ensures we can notice shutdown events in one-shot scripts
like -cindex (and eventually -clone/-fetch/-compact) without
forcing another real event to fire.
Eric Wong [Wed, 15 Nov 2023 05:55:49 +0000 (05:55 +0000)]
cindex: fix test when missing time(1) executable
It was only there for development purposes because associate is
slow, but it causes the test to get stuck on systems where it's
not available. So remove it and just call join(1posix).
Note: this is not the `time' builtin found in shells, this
executable shows memory and pagefault info (and more with the
`-v' switch). Unfortunately, it's not installed on many systems
despite being widely-packaged.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Eric Wong [Wed, 15 Nov 2023 01:04:56 +0000 (01:04 +0000)]
lei: use -signal numbers for old Perl
Unlike modern Perls, Perl 5.16.3 on CentOS doesn't accept
negative string signals like "-TERM".
This only became a problem since commit b231d91f42d7
(treewide: enable warnings in all exec-ed processes)
made our code stricter by enabling more warnings.
In both cases, the kill is probably unnecessary and safe
to remove since we can rely on closing sockets to drop
processes.
The tests will check for strace >= 4.16, but version 4.24 that I have
does not accept --version, only -V. This works for both older and newer
strace, so switch to using "strace -V" for the check.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Eric Wong [Tue, 14 Nov 2023 00:32:20 +0000 (00:32 +0000)]
config: avoid eidx_key and newsgroup conflicts
Start lowercasing newsgroup names automatically since uppercase
names are incompatible with IMAP and POP3 and also cause
problems with both -extindex and -cindex.
We'll also warn on eidx_key and newsgroup conflicts to avoid
sometimes subtle breakage when using -extindex and -cindex.
Eric Wong [Mon, 13 Nov 2023 13:15:51 +0000 (13:15 +0000)]
cindex: support --associate-aggressive shortcut
This is shorthand for enabling --associate with the most
aggressive (and time-consuming) options available, starting from
the Unix epoch and having an unlimited window to join on.
Eric Wong [Mon, 13 Nov 2023 13:15:49 +0000 (13:15 +0000)]
cindex: do not guess integer maximum for Xapian
We can return an array to allow the caller to omit the internal
`-m' arg entirely. We'll also allow any non-positive values to
mean there's no limit; and we'll defer the "unlimited" case to
the XapHelper implementation. This frees us of having to deal
with mismatches between Perl and Xapian if Xapian was compiled
with 64-bit docid support and we're stuck on a 32-bit Perl
build.
Eric Wong [Mon, 13 Nov 2023 13:15:48 +0000 (13:15 +0000)]
xap_helper: better variable naming for key buffer
We'll use `kbuf' for the search object key, since we already use
the `fbuf' term in `struct fbuf'. This also adds an extra check
for open_memstream(3) failures in case of ENOMEM.
Eric Wong [Mon, 13 Nov 2023 13:15:47 +0000 (13:15 +0000)]
xap_helper: stricter and harsher error handling
We'll require an error stream for dump_ibx and dump_roots
commands; they're too important to ignore. Instead of writing
code to provide diagnostics for errors, rely on abort(3) and the
-ggdb3 compiler flag to generate nice core dumps for gdb since
all commands sent to xap_helper are from internal users.
We'll even abort on most usage errors since they could be
bugs in split2argv or our use of getopt(3).
We'll also just exit on ENOMEM errors since it's the easiest way
to recover from those errors by starting a new process which
closes all open Xapian DB handles.
Eric Wong [Mon, 13 Nov 2023 13:15:44 +0000 (13:15 +0000)]
cindex: delay associate until prune+indexing finish
Prune can get rid of invalid commits while indexing can add new
candidates for association, so we don't dump coderepo roots for
association until those are squared away. However, we can dump
inbox info since we don't touch inboxes while -cindex is running.
Eric Wong [Mon, 13 Nov 2023 13:15:42 +0000 (13:15 +0000)]
spawn: don't append to scalarrefs on stdout/stderr
None of our current code relies on it, and I can't imagine it's
something we'd need in the future, actually... This keeps the
door open for relying more on Spawn in TestCommon.
Eric Wong [Mon, 13 Nov 2023 13:15:41 +0000 (13:15 +0000)]
treewide: update read_all to avoid eof|close checks
read_all can be expanded to support FIFOs/pipes/sockets where
read-until-EOF behavior is desired. We can also rely on
wantarray to support splitting on EOL markers, but it's
hard-coded to support only `$/ eq "\n"' since (AFAIK)
it's the only way we use the wantarray form of `readline'.
Eric Wong [Mon, 13 Nov 2023 13:15:40 +0000 (13:15 +0000)]
xap_client: spawn C++ xap_helper directly
No need to suffer through an extra dose of slow Perl load times
when we can drive the build in the big parent Perl process and
get the executable path name to pass to spawn directly.
Eric Wong [Mon, 13 Nov 2023 13:15:39 +0000 (13:15 +0000)]
xap_helper_cxx: use -pipe by default in CXXFLAGS
-ggdb3 is already used for g++ and clang, and -pipe is supported
by clang even if it's a no-op. So just use it to speed up g++
since it saves me 30-40ms.
We'll also get rid of the explicit `-O0' since it's the default
for both clang and g++.
Eric Wong [Mon, 13 Nov 2023 13:15:38 +0000 (13:15 +0000)]
xap_helper_cxx: make the build process ccache-friendly
We need to have stable filenames and separate compilation
from the linkage stage for ccache to hit. So avoid the use
of a temporary directory and instead rely on a lock file to
guard against parallel builds.
Eric Wong [Mon, 13 Nov 2023 13:15:37 +0000 (13:15 +0000)]
xap_helper_cxx: use write_file helper
PublicInbox::IO already gets loaded by PublicInbox::Spawn, so
there's no avoiding it even if we want fast startup time :<
But startup time for this piece will be less relevant in the
near future...
Eric Wong [Mon, 13 Nov 2023 13:15:35 +0000 (13:15 +0000)]
tmpfile: check `stat' errors, use autodie for unlink
`stat' can fail due to bugs on our end or ENOMEM, but there's
no autodie support for it. So just die if `unlink' fails, since
the FS wouldn't be usable for tmpfiles in that state, anyways.
Eric Wong [Mon, 13 Nov 2023 13:15:34 +0000 (13:15 +0000)]
cindex: check `say' errors w/ close or ->flush
We actually need to rely on autodie `close' to check for errors,
since error-checking with `say' is not useful due to perlio
write buffering. We'll also stop relying on `say ... or die'
since it's needless noise.
Fixes: 19f9089343c9 (cindex: drop redundant close on regular FH)
Eric Wong [Mon, 13 Nov 2023 05:00:39 +0000 (05:00 +0000)]
xap_helper: reset getopt(3) properly in workers
I only noticed this while doing a full -cindex --associate with
--associate-date-range=30.years.ago and --associate-max=-1 (no
limit for Xapian) between local mirrors of lore and
git.kernel.org on my glibc-based system.
Apparently, glibc requires `optind = 0' to reset getopt(3) in
our workers. Oddly, glibc appeared to work fine prior to this
change for the defaults (--associate-date-range=1.year.ago..
and --associate-max=50000).
BSDs and musl have an `optreset' variable which appears to do
the same thing, but I don't have space on BSD VMs to test full
associations.
While we're at it, we'll also keep `opterr' enabled to improve
error reporting.
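A minimal sketch of such a reset, assuming a parser that may be
called repeatedly in one process. `parse_max' and its lone `-m'
option are hypothetical and only stand in for the real option
handling; the non-glibc branch assumes a libc which provides
`optreset'.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <unistd.h>

/*
 * hypothetical parser: returns the argument of -m, or a null pointer
 * if -m is absent or an unknown option is seen
 */
static const char *parse_max(int argc, char **argv)
{
	const char *max = 0;
	int c;

	assert(argc >= 1 && argv);
#ifdef __GLIBC__
	optind = 0; /* glibc: forces getopt(3) to reinitialize */
#else
	optreset = 1; /* BSDs and musl */
	optind = 1;
#endif
	opterr = 1; /* keep diagnostics on stderr enabled */
	while ((c = getopt(argc, argv, "m:")) != -1) {
		if (c != 'm')
			return 0;
		max = optarg;
	}
	return max;
}
```

Without the reset, a second call would start from the `optind' left
behind by the first and could skip the options of the new argv.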
Eric Wong [Sun, 12 Nov 2023 13:12:33 +0000 (13:12 +0000)]
lei: don't read --stdin terminals from daemon
We must use a foreground process to read from terminals
on stdin, otherwise weird things like lost keystrokes and
EIO can happen. So take advantage of ->send_exec_cmd to
spawn `cat' in the same way we spawn MUAs, pagers,
`git config --edit' and `git credential' from script/lei.
Eric Wong [Sat, 11 Nov 2023 09:04:59 +0000 (09:04 +0000)]
doc: update README.unsubscribe
The whitelist was only used in the early days of its development
and hasn't existed for a while. I've largely forgotten this
thing exists since it's been working well...
Eric Wong [Sat, 11 Nov 2023 09:04:57 +0000 (09:04 +0000)]
mda|learn|watch: support dropUniqueUnsubscribe config
List-Unsubscribe headers with unique identifiers (such as those
generated by our examples/unsubscribe.milter) should not
end up in public archives. Add a new config knob to strip
List-Unsubscribe headers if they have the
`List-Unsubscribe-Post: List-Unsubscribe=One-Click'
header.
Unfortunately, this breaks DKIM signatures if the signature
covers either of these List-Unsubscribe* headers. However,
breaking DKIM is the lesser evil compared to any archive reader
being able to stop archival by an independent archivist.
As much as I would like this to be the default, it probably
affects few users at the moment since very few mailing lists
use unique identifiers in List-Unsubscribe (but that number
has grown, recently).
Eric Wong [Sat, 11 Nov 2023 09:04:56 +0000 (09:04 +0000)]
learn: fix redundant ham import on dual matches
When learning and injecting new ham messages, we want to avoid
wasting cycles importing the same message into an inbox twice
(once for the To/Cc match and once for the List-Id match). Our
existing %seen hash turned out to be ineffective since
PublicInbox::Inbox refs get re-blessed to PublicInbox::InboxWritable.
So we stop letting class name influence the hash key for tracking by
using the reference address instead. We can get the reference address
by performing an arithmetic operation (+ 0) instead of having to
pay the cost of importing Scalar::Util::refaddr.
Eric Wong [Fri, 10 Nov 2023 22:26:33 +0000 (22:26 +0000)]
t/lei-import: skip strace for restricted systems
Systems with Yama can restrict ptrace(2) (the underlying syscall
used by strace(1)) and make it difficult to test error handling
via error injection. Just skip the tests on such systems since
it's probably not worth the effort to start using prctl(2) to
enable the test on such systems.
This seems like an easy (but WWW-specific) way to get recently
created and recently active topics as suggested by Konstantin.
To do this with Xapian will require new columns and
reindexing; and I'm not sure if the current lei handling of
search results by dumping results to a format readable by common
MUAs would work well with this. A new TUI may be required...
Eric Wong [Thu, 9 Nov 2023 10:09:45 +0000 (10:09 +0000)]
lei: get rid of autoreap usage
We can rely on Process::IO->DESTROY to close and reap
in these cases. This is the final step in eliminating
the wantarray invocations of popen_rd (and popen_wr).
Eric Wong [Thu, 9 Nov 2023 10:09:43 +0000 (10:09 +0000)]
lei_input: always close single `eml' inputs
This matches the behavior we have for multi-message mbox files
since we rely on ->close to detect errors on bad mboxes. This
ensures we'll notice errors reading single messages from stdin.
We'll also start relying more on strace error injection to test
error handling.
Eric Wong [Thu, 9 Nov 2023 10:09:42 +0000 (10:09 +0000)]
ipc: simplify partial sendmsg fallback
In the rare case sendmsg(2) isn't able to send the full amount
(due to buffers >=2GB on Linux), use print + (autodie)close
to send the remainder and retry on EINTR. `substr' should be
able to avoid a large malloc via offsets and CoW on modern Perl.
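The commit itself uses Perl's print and close; in C terms, sending
the remainder after a partial send amounts to a loop like the sketch
below. `write_rest' is a hypothetical name, and `off' is assumed to
be the number of bytes the earlier sendmsg(2) already delivered.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * write buf[off..len) to fd, retrying on EINTR and continuing after
 * short writes; returns 0 on success, -1 on any other error
 */
static int write_rest(int fd, const char *buf, size_t len, size_t off)
{
	assert(off <= len);
	while (off < len) {
		ssize_t w = write(fd, buf + off, len - off);

		if (w < 0) {
			if (errno == EINTR)
				continue; /* interrupted, try again */
			return -1;
		}
		off += (size_t)w; /* advance by an offset, no copying */
	}
	return 0;
}
```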