From: Eric Wong
Date: Wed, 11 Dec 2024 08:10:46 +0000 (+0000)
Subject: cindex: adjust estimated memory cost for deletes
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=c8e51be3418c8d2c3139085eba549b3fa20f64b0;p=thirdparty%2Fpublic-inbox.git

cindex: adjust estimated memory cost for deletes

Based on my conversations with the Xapian lead, the cost of
deletes was overestimated by 7x in cindex. Adjust the estimated
cost of a deleted document to a more reasonable number based on
calculations discussed on the xapian-discuss list.

In any case, all of our batch size memory cost estimates are
rough since Xapian provides no way of letting us know the
memory cost of the current transaction.
---

diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 13533a00d..8b5f5ad03 100644
--- a/lib/PublicInbox/CodeSearchIdx.pm
+++ b/lib/PublicInbox/CodeSearchIdx.pm
@@ -813,11 +813,18 @@ sub prune_init { # via wq_io_do in IDX_SHARDS
 	$self->begin_txn_lazy;
 }
 
+# <20230504084559.M203335@dcvr> thread on xapian-discuss@lists.xapian.org
+# discusses getting an estimate term length to multiply the get_doclength()
+# result to estimate memory use of uncommitted deletes. We need to estimate
+# length here since the data may no longer be available at all if we get to
+# prune_one().
+our $EST_LEN = 6;
+
 sub prune_one { # via wq_io_do in IDX_SHARDS
 	my ($self, $term) = @_;
 	my @docids = $self->docids_by_postlist($term);
 	for (@docids) {
-		$TXN_BYTES -= $self->{xdb}->get_doclength($_) * 42;
+		$TXN_BYTES -= $self->{xdb}->get_doclength($_) * $EST_LEN;
 		$self->{xdb}->delete_document($_);
 	}
 	++$self->{nr_prune};
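
As a rough illustration of the accounting change (not code from public-inbox), the following Perl sketch compares the old and new estimates for a small batch of deletes. The %doclength table and est_delete_cost() are hypothetical stand-ins for the Xapian handle and the loop in prune_one(), and the document lengths are made-up sample values; only the 42 and 6 multipliers come from the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Xapian DB handle: docid => document
# length, i.e. what get_doclength() would return for that docid.
# These lengths are invented sample values.
my %doclength = (1 => 100, 2 => 250, 3 => 50);

our $EST_LEN = 6;  # estimated term length, as introduced by the patch
my $OLD_EST = 42;  # multiplier used before the patch

# Estimated memory cost of deleting @docids when every unit of
# document length is assumed to cost $per_term bytes.
sub est_delete_cost {
	my ($per_term, @docids) = @_;
	my $bytes = 0;
	$bytes += $doclength{$_} * $per_term for @docids;
	$bytes;
}

my @docids = sort { $a <=> $b } keys %doclength;
my $old = est_delete_cost($OLD_EST, @docids);
my $new = est_delete_cost($EST_LEN, @docids);
printf "old=%d new=%d ratio=%d\n", $old, $new, $old / $new;
```

Since both estimates scale linearly with document length, the ratio is 42 / 6 = 7 regardless of the sample lengths chosen, which matches the 7x overestimate described in the commit message.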