Eric Wong [Sun, 10 Sep 2023 02:05:32 +0000 (02:05 +0000)]
ci/profiles: strip everything after the `-' in utsname.release
This fixes the script under FreeBSD (tested 13.2). FreeBSD 13.2
has `13.2-RELEASE-p3' in its uname(2) utsname.release. While
the `.2' component is a welcome addition over the old script,
Perl parses the `-' as a subtraction operation, which isn't
what we want.
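The truncation itself is simple enough to sketch. The following is a C rendering rather than the Perl in ci/profiles, and `strip_release_suffix' is a hypothetical name used only for illustration; it assumes the caller has copied utsname.release into a writable buffer:

```c
#include <string.h>

/*
 * Truncate a utsname.release-style string at the first '-', in place.
 * "13.2-RELEASE-p3" becomes "13.2"; strings without a '-' are untouched.
 * (Hypothetical helper, not part of the project.)
 */
char *strip_release_suffix(char *release)
{
	char *dash = strchr(release, '-');

	if (dash)
		*dash = '\0';
	return release;
}
```

What remains is a plain dotted version with no `-' left for anything to misparse.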
Eric Wong [Sat, 9 Sep 2023 12:01:42 +0000 (12:01 +0000)]
xap_helper: clamp workers to USHRT_MAX
This allows us to avoid any integer overflow problems while
having enough room to grow for some future hardware, though it
looks like having hundreds of cores isn't ever going to make
it to typical servers nor workstations.
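A minimal sketch of such a clamp, assuming the requested count arrives as an unsigned long; `clamp_workers' is a hypothetical name and not necessarily how xap_helper spells it:

```c
#include <limits.h>

/*
 * Clamp a requested worker count so it always fits in an unsigned short.
 * (Hypothetical helper; the parameter type is an assumption.)
 */
unsigned short clamp_workers(unsigned long requested)
{
	return requested > USHRT_MAX ? USHRT_MAX : (unsigned short)requested;
}
```

Anything at or below USHRT_MAX passes through unchanged, so arithmetic on the result cannot overflow an unsigned short.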
Eric Wong [Sat, 9 Sep 2023 12:01:41 +0000 (12:01 +0000)]
xap_helper: use _OPENBSD_SOURCE on NetBSD for reallocarray
NetBSD prefers reallocarr(3) for predictable zero-sized
allocation behavior; but no other OS seems to have reallocarr(3).
reallocarray(3) appears in OpenBSD, FreeBSD, glibc, and musl,
so continue to go with that.
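For readers unfamiliar with it, the guarantee that makes reallocarray(3) worth the feature-macro trouble is the overflow check on nmemb * size. A portable sketch of that check follows; `checked_reallocarray' is a hypothetical stand-in written for illustration, not the libc function and not code from xap_helper:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in showing the overflow check reallocarray(3)
 * performs: refuse the request with ENOMEM instead of letting
 * nmemb * size wrap around to a small allocation.
 */
void *checked_reallocarray(void *ptr, size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```

On failure the original pointer is left untouched, as with realloc(3).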
Eric Wong [Sat, 9 Sep 2023 12:01:38 +0000 (12:01 +0000)]
ci/profiles: rewrite in Perl
Reading os-release(5) is a bit more painful, now; and still
requires using the shell. However, sharing code between *BSDs
and being able to use v-strings for version comparisons is much
easier.
Test profiles for *BSDs are also trimmed down and more focused
on portability stuff.
Eric Wong [Sat, 9 Sep 2023 12:01:37 +0000 (12:01 +0000)]
ci/run.sh: parameterize BUILD_JOBS TEST_JOBS and TEST_TARGET
Parallelizing BUILD_JOBS is usually harmless, but TEST_JOBS can
be problematic when tracking down problems on new platforms.
TEST_TARGET can be `check' or `check-run' for performance.
Eric Wong [Sat, 9 Sep 2023 12:01:35 +0000 (12:01 +0000)]
Makefile.PL: check `getconf NPROCESSORS_ONLN', too
NetBSD and OpenBSD getconf(1) don't accept a leading underscore,
while glibc getconf(1) only accepts the leading underscore
(`_NPROCESSORS_ONLN'). FreeBSD getconf(1) accepts both variants.
Eric Wong [Sat, 9 Sep 2023 01:48:38 +0000 (01:48 +0000)]
pop3d: support fcntl locks on NetBSD and OpenBSD
MboxLock already supported it since it locked the whole file,
but POP3D requires more fine-grained locking at file offsets.
I wonder if "struct flock" is old enough for it to be the same
across all the BSDs; it certainly seems so.
I originally considered using C11 `_Generic' support for the
struct offset/type dumping as I have in other projects, but
I am not ready to depend on C11 for this project, yet.
While we're modifying devel/sysdefs-list, add some Linux-only
structs to verify our `pack' templates are correct and remain
so when we encounter new architectures.
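For reference, this is the C-level operation the Perl `pack' templates have to reproduce: a lock on a byte range rather than the whole file. It is only a sketch, `lock_range' is a hypothetical name, and the use of non-blocking F_SETLK is an assumption made for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Lock or unlock `len' bytes starting at offset `start'.
 * `type' is F_RDLCK, F_WRLCK, or F_UNLCK.  F_SETLK does not block;
 * it returns -1 if another process holds a conflicting lock.
 * (Hypothetical helper, not the project's code.)
 */
int lock_range(int fd, short type, off_t start, off_t len)
{
	struct flock lk;

	memset(&lk, 0, sizeof(lk));
	lk.l_type = type;
	lk.l_whence = SEEK_SET;
	lk.l_start = start;
	lk.l_len = len; /* 0 would mean "through end of file" */
	return fcntl(fd, F_SETLK, &lk);
}
```

The field order and widths of "struct flock" differ between platforms, which is why the layout has to be verified per-OS before packing it by hand.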
Eric Wong [Fri, 8 Sep 2023 22:31:12 +0000 (22:31 +0000)]
ci/deps: redo and fix essential package handling
git depends on p5-TimeDate on FreeBSD, too, so ensure git
doesn't get uninstalled on FreeBSD. Instead of making
@precious a separate array, we can actually stuff dependencies
into the $non_auto map and save us some code.
We can also eliminate some duplication in $non_auto by
populating the Perl standard library packages in a loop.
Eric Wong [Fri, 8 Sep 2023 13:09:08 +0000 (13:09 +0000)]
ci: updates for OpenBSD
Still a work-in-progress, but OpenBSD's pkg_add/pkg_delete seem
to be working somewhat. The dependency system seems to need some
extra help to ensure leaf packages with their own dependencies
(e.g. `xapian-bindings-perl') get uninstalled before their
dependencies (`xapian-core').
Deduplicating the command-line is also required since both
pkg_add and pkg_delete will repeat the installation/removal if
a package is specified multiple times in the same invocation.
Eric Wong [Fri, 8 Sep 2023 13:09:07 +0000 (13:09 +0000)]
ci/deps: drop unnecessary mappings and add Inline
The automatic mapping can work for more packages, so redundant
entries in $non_auto are just clutter.
Unfortunately, `Inline::C' is part of `Inline' on CentOS 7.x and
OpenBSD 7.3, so we'll add $non_auto mappings for those.
We'll also depend on `IO::Compress' to simplify mappings since
that's the CPAN distribution which holds both IO::Compress::Gzip
and IO::Compress::Gunzip and I'm not aware of any packagers who
split them.
Eric Wong [Fri, 8 Sep 2023 13:09:05 +0000 (13:09 +0000)]
update docs + tests for xapian-delve use
Since -cindex uses the xapian-delve(1) command for `--prune'
functionality, we'll rename our `xapian-compact' dependency to
the Debian package name (xapian-tools) since `xapian-delve' is
in the same package.
It actually needs to be bigger than the polling interval.
I suspect I missed this due to parallel tests on a loaded
VM, but running t/dir_idle.t on an unloaded machine reproduces
the problem when neither IO::KQueue nor Linux::Inotify2 are
present.
Eric Wong [Fri, 8 Sep 2023 10:51:15 +0000 (10:51 +0000)]
watch: reset HUP + USR1 signal handlers in children
Child processes handling IMAP/NNTP aren't going to want
to handle config reloads nor forced rescans, those are
exclusively for the parent. We'll leave a note that
QUIT/TERM/INT can safely use the same callback for both
parent and children, as I nearly made the mistake of
resetting those to their default values in the child.
Eric Wong [Fri, 8 Sep 2023 10:51:14 +0000 (10:51 +0000)]
watch: set %SIG for non-signalfd/kqueue
We need to ensure there isn't a window where we lose $SIG{CHLD}
handling. This is the second part in getting t/imapd.t to pass
the reload-after-setting-imap.pollInterval test.
That said, I'm not entirely happy with the way -watch jumps
in and out of the event loop. It's historical baggage from
the pre-event_loop days.
Eric Wong [Fri, 8 Sep 2023 10:51:13 +0000 (10:51 +0000)]
ds: fix signals unblock for non-signalfd/kqueue
Using the sigset result of allowset() isn't appropriate for
SIG_UNBLOCK. We must generate a new signal set off of the $sig
dispatch map for use with SIG_UNBLOCK to actually unblock the
signals.
This is the first part in getting t/imapd.t to pass the
reload-after-setting-imap.pollInterval test when neither
signalfd nor kqueue are usable.
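The same idea in C, as a rough sketch: build a fresh set containing only the signals we have handlers for, then pass that to SIG_UNBLOCK. The array of signal numbers stands in for the keys of the $sig dispatch map, and `unblock_handled' is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Unblock exactly the signals listed in signums[], leaving every
 * other signal's blocked state alone.  (Hypothetical helper.)
 */
int unblock_handled(const int *signums, size_t n)
{
	sigset_t set;
	size_t i;

	if (sigemptyset(&set) != 0)
		return -1;
	for (i = 0; i < n; i++) {
		if (sigaddset(&set, signums[i]) != 0)
			return -1;
	}
	return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```

SIG_UNBLOCK removes the members of the given set from the current mask, so the set must name the signals to unblock, not the ones to keep blocked.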
Eric Wong [Fri, 8 Sep 2023 00:49:20 +0000 (00:49 +0000)]
fake_inotify + kqnotify: rewrite and combine code
KQNotify is now a subclass of FakeInotify since they're both
faking a subset of inotify; and both require directory scanning
via readdir() to detect new/deleted files.
ctime is no longer used with per-file stat to detect new files
with kevent. That proved too unreliable either due to low
time resolution of the NetBSD/OpenBSD VFS and/or
Time::HiRes::stat being constrained by floating point to
represent `struct timespec', so instead we fuzz the time a bit
if the ctime is recent and merely compare filenames off readdir.
This fixes t/fake_inotify.t and t/kqnotify.t failures under NetBSD
and also removes workarounds for OpenBSD in t/kqnotify.t. It
also allows us to remove delays in tests by being more
aggressive in picking up new/deleted files in watch directories
by adjusting the time to scan if the ctime is recent.
This ought to improve real-world reliability on all *BSDs
regardless of whether IO::KQueue is installed.
...on OpenBSD 7.3 (Perl 5.36, DBD::SQLite 1.70v0, DBI 1.643p0,
sqlite 3.41.0). I'm not sure exactly where the bug is, but I
suspect it's something inherent in Perl's unpredictable
destruction order at process teardown (something I've had to
workaround in the past when dealing with XS extensions).
There's no downloadable debug-* OpenBSD packages to ease
debugging for these components, either.
Eric Wong [Tue, 5 Sep 2023 07:37:25 +0000 (07:37 +0000)]
dskqxs: get rid of needless confess check
Destruction order is unpredictable at process teardown,
so confessing or warning here is unnecessary, just break
out of the sub since it only exists to delete an entry, anyways.
pipe2(.., O_CLOEXEC) on NetBSD sets the O_CLOEXEC file description
flag along with the FD_CLOEXEC file descriptor flag, so we must
not attempt to do exact matches on the file description flags.
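One way to compare flags without tripping over extras like this is to mask before comparing. The sketch below only looks at the access mode; `fd_access_mode' is a hypothetical helper, and masking with O_ACCMODE is my illustration of the idea rather than a description of what the test does:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/*
 * Return the access mode (O_RDONLY, O_WRONLY, or O_RDWR) of fd,
 * ignoring any other file description flags the OS reports.
 * Returns -1 if fcntl(2) fails.  (Hypothetical helper.)
 */
int fd_access_mode(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	return fl < 0 ? -1 : (fl & O_ACCMODE);
}
```

A comparison against the masked value keeps working no matter which additional flags a platform chooses to report from F_GETFL.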
Eric Wong [Mon, 4 Sep 2023 23:49:45 +0000 (23:49 +0000)]
xap_helper: use rpath for libxapian on NetBSD
While rpath is frowned upon by Debian and other distro packagers; it
appears to be embraced in the world of NetBSD ports and packages. This is
because ldconfig(8) on NetBSD doesn't put /usr/pkg/lib in its search
path by default. This behavior differs from the ports and packaging
systems of FreeBSD and OpenBSD which do search library paths of
pkg*-installed packages (and presumably ports).
Eric Wong [Mon, 4 Sep 2023 10:36:07 +0000 (10:36 +0000)]
test_common: start_script: set default signals
We need to ensure signal handlers in the child process aren't
inherited from the parent. This change was originally intended
to block signals all the way until PublicInbox::Daemon and
PublicInbox::Watch were fully ready to handle them (preferably
via EVFILT_SIGNAL or signalfd); but that proved unrealistic.
Now, all signal handlers are restored to their default values
before signals are unblocked.
Eric Wong [Mon, 4 Sep 2023 10:36:06 +0000 (10:36 +0000)]
tests: add `+SCM_RIGHTS' as a require_mods target
We'll also ensure the existing `lei' target expands to depend on
`+SCM_RIGHTS', and use require_mods in t/lei-import-nntp.t and
t/lei.t so they can be skipped when Inline::C and Socket::MsgHdr
are missing on OpenBSD.
Eric Wong [Mon, 4 Sep 2023 10:36:03 +0000 (10:36 +0000)]
watch: ensure children can use signal handlers
Blindly using the signal set inherited from the parent process
is wrong, since the parent (or grandparent) could've blocked all
signals. Ensure children can process signals in the event loop
when sig handlers have to use standard Perl facilities.
Eric Wong [Mon, 4 Sep 2023 10:36:02 +0000 (10:36 +0000)]
daemon: workaround pre-EVFILT_SIGNAL signals
FreeBSD and OpenBSD kqueue EVFILT_SIGNAL isn't able to handle
blocked signals which were sent before the filter is created.
This behavior differs from Linux signalfd, which can process
blocked signals that were sent before the signalfd existed.
Eric Wong [Mon, 4 Sep 2023 10:36:00 +0000 (10:36 +0000)]
t/sigfd: better checks related to SIGWINCH
Check to ensure there's a numeric value of SIGWINCH defined for
the given platform. SIGWINCH may also fire while the test is
running due to a user resizing their terminal, so a boolean test
to ensure it fired rather than an exact value check is more
correct.
Eric Wong [Mon, 4 Sep 2023 10:35:59 +0000 (10:35 +0000)]
t/sigfd: test EVFILT_SIGNAL vs signalfd differences
Verify that observed OpenBSD and FreeBSD EVFILT_SIGNAL behavior
works differently than what Linux signalfd does to ease upcoming
changes to PublicInbox::DS.
Eric Wong [Wed, 30 Aug 2023 05:10:45 +0000 (05:10 +0000)]
xap_helper.h: fix double-free on OpenBSD hdestroy
hdestroy on OpenBSD assumes each key in the table can be freed,
so use strdup to fulfil that requirement.
This behavior differs from tested behavior on glibc and FreeBSD,
as well as what I can see from reading the musl and NetBSD
source code. OpenBSD may be the only relevant OS which requires
this workaround.
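A sketch of the pattern, assuming a table already set up with hcreate(3): every key handed to ENTER is a private strdup(3) copy, so an hdestroy(3) that frees keys has something valid to free. `table_put' and `table_get' are hypothetical names, not the functions in xap_helper.h:

```c
#define _XOPEN_SOURCE 700
#include <search.h>
#include <stdlib.h>
#include <string.h>

/*
 * Insert or update key => data.  Only a heap copy of the key is ever
 * stored in the table.  Returns 0 on success, -1 on failure.
 * (Hypothetical helper; requires a prior hcreate(3).)
 */
int table_put(const char *key, void *data)
{
	ENTRY probe, *found;

	probe.key = (char *)key; /* FIND does not store this pointer */
	probe.data = NULL;
	found = hsearch(probe, FIND);
	if (found) {
		found->data = data;
		return 0;
	}
	probe.key = strdup(key);
	if (!probe.key)
		return -1;
	probe.data = data;
	if (!hsearch(probe, ENTER)) {
		free(probe.key);
		return -1;
	}
	return 0;
}

/* Look up key; returns the stored data pointer or NULL if absent. */
void *table_get(const char *key)
{
	ENTRY probe, *found;

	probe.key = (char *)key;
	probe.data = NULL;
	found = hsearch(probe, FIND);
	return found ? found->data : NULL;
}
```

On platforms where hdestroy(3) leaves keys alone, the copies are a small leak at teardown rather than a crash.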
Eric Wong [Wed, 30 Aug 2023 05:10:44 +0000 (05:10 +0000)]
xap_helper.h: limit stderr assignment to glibc+FreeBSD
This fixes the C++ xap_helper compilation on OpenBSD.
Assignable `FILE *' pointers appear to only be supported on
FreeBSD and glibc. Based on my reading of musl and NetBSD
source code, this should also fix builds on those platforms.
Eric Wong [Wed, 30 Aug 2023 05:10:43 +0000 (05:10 +0000)]
xap_helper.h: don't compress debug sections on OpenBSD
ld(1) on OpenBSD 7.3 doesn't appear to support zlib-compressed
debug sections out-of-the-box. Oh well, being able to build
this C++ bit at all is required to get acceptable performance
with -cindex --associate.
Eric Wong [Wed, 30 Aug 2023 05:10:42 +0000 (05:10 +0000)]
t/kqnotify: improve test reliability on OpenBSD
Unlike FreeBSD, OpenBSD (tested 7.3) kevent doesn't document
EVFILT_VNODE behavior when directories are being watched.
Regardless, FreeBSD semantics appear to be mostly (if
unreliably) supported. Detecting rename(2) isn't reliable
at all and events seem to get lost and the test needs to
retry the rename(2) to succeed. Fortunately, rename(2)
isn't recommended for Maildirs anyways since it can clobber
existing files.
link(2) detection appears to be merely delayed on OpenBSD,
so the test merely needs an occasional delay.
Eric Wong [Wed, 30 Aug 2023 05:10:41 +0000 (05:10 +0000)]
Makefile.PL: depend on autodie, at least for tests
While using autodie everywhere is not appropriate[*], many of
our tests and FS access code can be easier-to-write and more
readable using autodie as we've started doing in XapHelperCxx.pm
and xap_helper.t.
[*] - EAGAIN on non-blocking I/O shouldn't die, nor should
certain cases of opening maybe-missing files for reading.
Eric Wong [Wed, 30 Aug 2023 05:10:39 +0000 (05:10 +0000)]
treewide: drop MSG_EOR with AF_UNIX+SOCK_SEQPACKET
It's apparently not needed for AF_UNIX + SOCK_SEQPACKET as our
receivers never check for MSG_EOR in "struct msghdr".msg_flags
anyways. I don't believe POSIX is clear on the exact semantics
of MSG_EOR on this socket type. This works around truncation
problems on OpenBSD recvmsg when MSG_EOR is used by the sender.
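In C terms, the change amounts to passing 0 instead of MSG_EOR as the sendmsg(2) flags. A minimal sketch with hypothetical helper names follows; the MSG_TRUNC check on the receiving side is my own addition for illustration and is not something the source describes:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Send one record on a SOCK_SEQPACKET socket.  The flags argument is
 * deliberately 0: the socket type already preserves record boundaries.
 * (Hypothetical helper.)
 */
ssize_t send_record(int fd, const void *buf, size_t len)
{
	struct iovec iov;
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = (void *)buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return sendmsg(fd, &msg, 0);
}

/*
 * Receive one record.  Returns its length, or -1 on error or if the
 * record did not fit in buf (MSG_TRUNC; an illustrative extra check).
 */
ssize_t recv_record(int fd, void *buf, size_t len)
{
	struct iovec iov;
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	n = recvmsg(fd, &msg, 0);
	if (n >= 0 && (msg.msg_flags & MSG_TRUNC))
		return -1;
	return n;
}
```

Each sendmsg(2) call still arrives as exactly one record on the other end, which is all our receivers rely on.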
Eric Wong [Tue, 29 Aug 2023 17:20:16 +0000 (17:20 +0000)]
t/spawn.t: workaround OpenBSD RLIMIT_CPU delays
RLIMIT_CPU on OpenBSD doesn't work reliably with few syscalls or
on mostly idle systems. Even at its most accurate, it takes an
extra second to fire compared to FreeBSD or Linux due to
internal accounting differences, but worst case even the SIGKILL
can be 50s delayed.
So rewrite the CPU burner script in Perl where we can unblock
SIGXCPU and reliably use more syscalls.
Štěpán Němec [Mon, 28 Aug 2023 10:45:13 +0000 (12:45 +0200)]
public-inbox-init: honor umask when creating config file
Creating config 0600 disregarding umask breaks scenarios where daemons
run with credentials different from config owner (but need to read the
config).
File::Temp defaults to 0600, which is unsuitable for the
recommended/typical scenario of daemons running unprivileged and with
UID different from $PI_CONFIG owner, as the daemons need to read
$PI_CONFIG.
Respecting umask might end up creating world-unreadable config, too,
but for people who use such umask that's expected behavior.
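The underlying behavior can be sketched at the syscall level: request 0666 and let the kernel apply the process umask, instead of hard-coding 0600. This is C for illustration only (public-inbox-init is Perl), and `create_honoring_umask' is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path' for writing, asking for mode 0666 so the process
 * umask alone decides the final permissions (0644 with umask 022,
 * 0600 with umask 077).  Fails if the file already exists.
 * (Hypothetical helper, not the project's code.)
 */
int create_honoring_umask(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}
```

With the common umask of 022 the result is readable by an unprivileged daemon; users with umask 077 still get the private file they asked for.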
Štěpán Němec [Mon, 28 Aug 2023 10:42:42 +0000 (12:42 +0200)]
ci/profiles.sh: fix case matching logic
'-' could never match, remove that alternative (it might have been a
typo of '--', but that is already covered by '*--|--*' ('*' matches
the null string)).
Replace '*--*' with the equivalent '*' ('--' is always present).
It would seem clearer to just replace the whole case command with
something like '[ "$ID" -a "$VERSION_ID" ] && break' (or the
POSIX-non-deprecated equivalent '[ "$ID" ] && [ "$VERSION_ID" ]' ); I
assume a preference of using case here (e.g., to avoid syscall
overhead in case [ is not implemented as a shell builtin (which seems
far-fetched given the context, though)).
Eric Wong [Sat, 26 Aug 2023 20:14:04 +0000 (20:14 +0000)]
t/xap_helper: skip test if missing SCM_RIGHTS support
xap_helper currently relies on FDs passed via SCM_RIGHTS for
robustness against $TMPDIR failures and over-eager FS cleanup
tasks. This depends on stable syscall numbers (Linux) or
Inline::C||Socket::MsgHdr being available, though, as Perl5
itself doesn't support SCM_RIGHTS.
We could probably add FIFO support to xap_helper for portability
to systems where neither Inline::C nor Socket::MsgHdr are available,
but that's for another day.
Eric Wong [Sat, 26 Aug 2023 06:13:17 +0000 (06:13 +0000)]
xap_helper: fix C++-specific warnings
While initialization of zeroed structs in C is done via `{0}',
I've just learned from g++(1) that C++ uses `{}'. I can't seem
to get use of a single designated initializer to compile without
warnings in C++, either, so we'll just initialize them as zero
and assign them ASAP for __cleanup__ functions.
This fixes compilation warnings under -Wextra in g++ (Debian 10.2.1-6)
which adds -Wmissing-field-initializers. This also fixes compilation
warnings under -Wall in clang (FreeBSD 13.0.0) from -Wmissing.
The `xapian-bindings-perl' package contains the Xapian.pm
SWIG bindings, but doesn't adhere to the existing convention
of naming system packages after the Perl package name itself
using: "p5-${\($Perl_package_name =~ s/::/-/gr)}".
Eric Wong [Thu, 24 Aug 2023 22:07:46 +0000 (22:07 +0000)]
cindex: dump cidx shards before inboxes
Since cidx shards used for associations are typically bigger
than individual inboxes, we'll dump them first to get better
work scheduling for xap_helper processes.
This gives roughly a 5% performance improvement with doing
a full associate on (git+lore).kernel.org
Eric Wong [Thu, 24 Aug 2023 01:22:36 +0000 (01:22 +0000)]
xap_helper: reopen+retry in MSetIterator loops
It's possible to hit a DatabaseModifiedError while iterating
through an MSet. We'll retry in these cases and cleanup some
code in both the Perl and C++ implementations.
Eric Wong [Thu, 24 Aug 2023 01:22:35 +0000 (01:22 +0000)]
cindex: implement dump_roots in C++
It's now just `dump_roots' instead of `dump_shard_roots', since
this doesn't need to be tied to the concept of shards. I'm
still shaky with C++, but intend to keep using stuff like
hsearch(3) to make life easier for C hackers :P
Eric Wong [Thu, 24 Aug 2023 01:22:34 +0000 (01:22 +0000)]
cindex: fix sorting and uniqueness
We can't rely on combining the `-u' and `-k1,1' switches of POSIX
sort(1) to do what we want. So only rely on `sort -k1,1' while
introducing a small Perl helper to fold identical prefixes into
one line. In other words, input such as:
ORS is currently the comma (`,') for inbox IDs, but it'll be a
space (` ') for coderepo root IDs. This implementation also
combines identical IDs in the 2nd column. Thus:
Becomes a single `deadbeef 0' line thanks to the use of
XS List::Util::uniq (which beats a pure Perl hash).
I attempted to implement this in awk but Perl is close enough to
gawk in performance while being shorter and easier-to-understand
due to List::Util::uniq. mawk was faster, but still not enough
to matter as the bottleneck is from iterating through Xapian
MSets.
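The folding step can be sketched as follows. This is a C illustration of the idea and not the Perl helper itself: it assumes the input is already sorted on the first column and has been split into parallel key/value arrays, and `fold_sorted' is a hypothetical name. The inner scan is quadratic per key, which List::Util::uniq avoids:

```c
#include <stddef.h>
#include <string.h>

/* Append s at *pos without passing end; returns 0 if it doesn't fit. */
static int append(char **pos, const char *end, const char *s)
{
	size_t n = strlen(s);

	if ((size_t)(end - *pos) <= n)
		return 0; /* always leave room for the trailing NUL */
	memcpy(*pos, s, n);
	*pos += n;
	**pos = '\0';
	return 1;
}

/*
 * Fold rows sharing a key into one "key v1<ors>v2..." line, dropping
 * repeated values within a key.  keys[] must already be sorted.
 * Returns the number of lines written to out, or -1 if out is too
 * small.  (Hypothetical helper.)
 */
int fold_sorted(const char *const *keys, const char *const *vals, size_t n,
		const char *ors, char *out, size_t outsz)
{
	char *pos = out;
	const char *end = out + outsz;
	size_t i, j, group = 0;
	int lines = 0;

	if (outsz == 0)
		return -1;
	*out = '\0';
	for (i = 0; i < n; i++) {
		if (i == 0 || strcmp(keys[i], keys[i - 1]) != 0) {
			/* new key: finish the previous line, start another */
			if (i > 0 && !append(&pos, end, "\n"))
				return -1;
			if (!append(&pos, end, keys[i]) ||
			    !append(&pos, end, " ") ||
			    !append(&pos, end, vals[i]))
				return -1;
			group = i;
			lines++;
			continue;
		}
		/* same key: skip values already emitted for this key */
		for (j = group; j < i; j++) {
			if (strcmp(vals[j], vals[i]) == 0)
				break;
		}
		if (j < i)
			continue;
		if (!append(&pos, end, ors) || !append(&pos, end, vals[i]))
			return -1;
	}
	if (n > 0 && !append(&pos, end, "\n"))
		return -1;
	return lines;
}
```

Two input rows of `deadbeef 0' fold into a single `deadbeef 0' line, and distinct values for one key are joined with ORS.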
Eric Wong [Thu, 24 Aug 2023 01:22:33 +0000 (01:22 +0000)]
introduce optional C++ xap_helper
This allows us to perform the expensive "dump_ibx" operations in
native C++ code using the Xapian C++ library. This provides the
majority of the speedup with the -cindex --associate switch.
Eventually this may be expanded to cover all uses of Xapian
within the project to ensure we have access to Xapian APIs which
aren't available in XS|SWIG bindings; and also for
ease-of-installation on systems which don't provide
pre-packaged Perl Xapian bindings (e.g. OpenBSD 7.3) but
do provide Xapian development libraries.
Most of the C++ code is still C, as I'm not remotely familiar
with C++ compared to C. I suspect many users and potential
hackers being from git, Linux kernel, and glibc world are in the
same boat.
Eric Wong [Thu, 24 Aug 2023 01:22:31 +0000 (01:22 +0000)]
cindex: read-only association dump
This will eventually allow associating coderepos with inboxes
and vice-versa; avoiding the need for manual configuration via
tedious publicinbox.*.coderepo directives.
I'm not sure how this should be stored for WWW, yet, but it's
required since it takes about 8 hours to do this fully across
lore and git.kernel.org.
Eric Wong [Sat, 19 Aug 2023 08:30:51 +0000 (08:30 +0000)]
isearch: avoid hex string for Xapian sortable_serialise
While a string representing an integer in hex is fine for DBI and
SQLite, Xapian's sortable_serialise requires a Perl integer value.
So just retrieve the last Xapian DB document ID in this rare
code path because we can't use 64-bit integer literals in some
32-bit Perl builds (e.g. OpenBSD on i386).
Fixes: be2a0a353d60 ("isearch: support 64-bit article numbers for SQLite query")
Eric Wong [Thu, 17 Aug 2023 07:23:10 +0000 (07:23 +0000)]
t/nntp.t: attempt to quiet spurious uninitialized warnings
When running via t/run.perl ("make check-run") to reduce test
startup time, t/nntp.t occasionally hits uninitialized variable
warnings in the quote_str sub. I can't reproduce these
reliably, but scoping subs in tests reduces the chance of
conflict when we reuse interpreters.
Eric Wong [Wed, 16 Aug 2023 08:07:12 +0000 (08:07 +0000)]
search: all_terms: remove needless prefix check
The ->allterms_{begin,end} methods of Xapian::Database already
filter on prefix natively. Thus there's no need to do
filtering ourselves (unlike per-document ->termlist_{begin,end}).
Eric Wong [Thu, 27 Jul 2023 21:18:55 +0000 (21:18 +0000)]
clone: allow running without DBI / DBD::SQLite
Due to historic reasons, LeiQuery.pm gets loaded with LEI.pm and
-clone depends on LEI. So delay loading any DBI-dependent
modules until querying is actually required.
Eric Wong [Thu, 27 Jul 2023 21:18:54 +0000 (21:18 +0000)]
Makefile.pl: *.cols: account for non-UTF-8-aware awk
When checking line length limits, the `length()' function of
mawk doesn't count non-ASCII characters properly in UTF-8
locales. Force the man(1) output to use C locale and use normal
`-' instead of multi-byte dash characters.
Eric Wong [Fri, 14 Jul 2023 09:28:47 +0000 (09:28 +0000)]
tests: t/run.perl: fix invocations with <10 tests
We must account for the maximum index of an array to avoid
filling unused slots with `undef' from out-of-bounds reads.
This is needed to avoid undefined entry errors in workers when
fewer than 10 tests are run. We'll also silence the message
when a single test is run.
While I was diagnosing this, I also noticed a small
simplification and optimization in our generation of $todo_buf
since I initially thought that was the cause of undefined
entry errors in the $todo arrayref.
Eric Wong [Thu, 13 Jul 2023 05:39:17 +0000 (05:39 +0000)]
t/imapd: workaround a Perl 5.36.0 readline regression
Buffered readline (and read) ops under Perl 5.36.0 fail to read
new data after writes are made by other file handles (or
processes).
To fix and improve our test, introduce a new, (currently)
test-only TailNotify class to use inotify or kevent if available
to workaround it while avoiding infinite polling loops. Further
refinements to these test APIs may follow since we use the same
pattern for testing daemons in many places.
This also fixes the TEST_KILL_IMAPD condition in t/imapd.t under
GNU/Linux, AFAIK that test was never reliable under FreeBSD.
Eric Wong [Thu, 13 Jul 2023 05:40:20 +0000 (05:40 +0000)]
doc: HACKING: drop bit about Debian 9.x (stretch)
It's oldoldstable, by now; just refer to Debian stable as
the primary but keep LTS distros in mind because stuff like
CentOS 7.x needs to remain supported.
Eric Wong [Tue, 11 Jul 2023 10:29:28 +0000 (10:29 +0000)]
Makefile.PL: depend on IO::Poll in case distros split it out
IO::Poll is part of the Perl standard library, but there's
always a chance distros will make it part of another package
since it's not portable to non-POSIX-like OSes.
Eric Wong [Wed, 21 Jun 2023 10:16:57 +0000 (10:16 +0000)]
t/solver_git: drop needless `use' and Plack deps
`lei (blob|rediff)' works without Plack installed, so don't put
a dependency on Plack or anything related to HTTP aside from
the URI module which we use everywhere. This only enables testing
the solver component on systems without Plack (as the actual lei
functionality has always worked without Plack).
Eric Wong [Fri, 16 Jun 2023 23:13:01 +0000 (23:13 +0000)]
www: use correct threadid for per-thread search
For individual public-inboxes relying on extindex for per-inbox
search, we must use the threadid from the extindex over.sqlite3
rather than the per-inbox over.sqlite3 file.
Eric Wong [Thu, 15 Jun 2023 09:50:53 +0000 (09:50 +0000)]
lei: make --dedupe=content always account for Message-IDs
The content dedupe logic was originally designed for v2 public
inboxes as a fallback for when the importer sees identical
Message-IDs. Thus it did not account for Message-ID(s) in
the message itself.
This change doesn't affect saved searches (the default when
writing to a pathname or IMAP). It affects --no-save, and
outputs to stdout (even if stdout is redirected to a file).
Prior to this change, lei reused the v2 logic as-is without
accounting for Message-IDs anywhere with `--dedupe=content'
(the default). This could cause messages to be skipped when
the content matches despite Message-IDs being different.
So with this change, `lei q --dedupe=content' will hash the
Message-ID(s) in the message to ensure messages with different
Message-IDs are NOT deduplicated.
Whether or not this change is a bug fix or introduces a regression
is actually debatable. In my mind, it is better to err on the
side of showing too many messages rather than too few, even if
the actual contents of the message are identical. Making saved
searches deduplicate without accounting for Message-IDs would be
more difficult, too.
Eric Wong [Thu, 15 Jun 2023 08:46:37 +0000 (08:46 +0000)]
lei import: set +(L|kw) on already-imported blobs
When import hits blobs it's already seen, we'll add labels
regardless in order to match the behavior of other inexact
matches. This is useful when importing exact copies of
messages which exist in multiple mailboxes.
I noticed this when I had a message imported from my normal IMAP
`INBOX', but also copied it to a different folder for future
reference.
Eric Wong [Fri, 9 Jun 2023 10:31:08 +0000 (10:31 +0000)]
add compat package for List::Util::uniqstr
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
Eric Wong [Fri, 9 Jun 2023 10:31:07 +0000 (10:31 +0000)]
search: hoist out do_enquire for codesearch
Reusing this bit seems to make sense as mail and code search
are similar enough w.r.t. setting up sort options. This
deduplication will become more useful as -cindex will
likely combine code and mail search to generate associations
between inboxes and code repos.
Eric Wong [Fri, 9 Jun 2023 10:31:06 +0000 (10:31 +0000)]
search: add comments wrt codesearch, reduce ops
Add some comments about various usages of xdb_shards_flat and
mset since the addition of CodeSearch (and other search things)
subclassing it may become confusing.
Since we're in the area, we can also avoid extra hash
lookups/initializations and reduce Perl ops in various places.