Based on my conversations with the Xapian lead, the cost of
deletes was overestimated by 7x in cindex. Adjust the estimated
cost of a deleted document to a more reasonable number based on
calculations discussed on the xapian-discuss list.

In any case, all of our batch size memory cost estimates are
rough since Xapian provides no way of letting us know the
memory cost of the current transaction.
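
To illustrate the arithmetic, here is a minimal standalone sketch of
the per-document estimate. The est_delete_bytes() helper and the
sample document length are hypothetical and not part of this change;
only the multipliers (42 before, 6 after) come from the patch. No
Xapian calls are made here, the document length is passed in as a
plain number standing in for the get_doclength() result.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): rough number of bytes
# charged for one uncommitted delete, given the value that
# get_doclength() returned and an estimated per-term length.
sub est_delete_bytes {
	my ($doclength, $est_len) = @_;
	return $doclength * $est_len;
}

my $doclength = 100; # made-up sample value
my $old = est_delete_bytes($doclength, 42); # previous multiplier
my $new = est_delete_bytes($doclength, 6); # new $EST_LEN
printf("old=%d new=%d ratio=%d\n", $old, $new, $old / $new);
```

For a document length of 100, this prints "old=4200 new=600 ratio=7",
which matches the 7x overestimate described above. Both figures
remain rough guesses either way.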
$self->begin_txn_lazy;
}
+# <20230504084559.M203335@dcvr> thread on xapian-discuss@lists.xapian.org
+# discusses getting an estimated term length to multiply by the
+# get_doclength() result to estimate memory use of uncommitted deletes.
+# We need to estimate length here since the data may no longer be
+# available at all if we get to prune_one().
+our $EST_LEN = 6;
+
sub prune_one { # via wq_io_do in IDX_SHARDS
my ($self, $term) = @_;
my @docids = $self->docids_by_postlist($term);
for (@docids) {
- $TXN_BYTES -= $self->{xdb}->get_doclength($_) * 42;
+ $TXN_BYTES -= $self->{xdb}->get_doclength($_) * $EST_LEN;
$self->{xdb}->delete_document($_);
}
++$self->{nr_prune};