From: Marco Bettini
Date: Tue, 22 Nov 2022 14:28:53 +0000 (+0000)
Subject: lib-fts: fts_filter_stemmer_snowball_filter() - Handle cases where the stemmer return...
X-Git-Tag: 2.4.0~3378
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=a1c36cc77977685686e829bcb9e9de79df2280ab;p=thirdparty%2Fdovecot%2Fcore.git

lib-fts: fts_filter_stemmer_snowball_filter() - Handle cases where the stemmer returns no tokens

Fixes an issue raised originally against flatcurve in GitHub Issue #37, where
in some combination of languages and filters, the indexing crashes.
The ultimate cause was the improper assumption that snowball ALWAYS returns
a token, which happens to not be true.
---

diff --git a/src/lib-fts/fts-filter-stemmer-snowball.c b/src/lib-fts/fts-filter-stemmer-snowball.c
index 96d91fdf14..d4450329db 100644
--- a/src/lib-fts/fts-filter-stemmer-snowball.c
+++ b/src/lib-fts/fts-filter-stemmer-snowball.c
@@ -87,7 +87,19 @@ fts_filter_stemmer_snowball_filter(struct fts_filter *filter,
 			"sb_stemmer_stem(len=%zu) failed: Out of memory",
 			strlen(*token));
 	}
-	*token = t_strndup(base, sb_stemmer_length(sp->stemmer));
+	int len = sb_stemmer_length(sp->stemmer);
+	if (len > 0)
+		*token = t_strndup(base, len);
+	else {
+		/* If the stemmer returns an empty token, the return value
+		 * should be 0 instead of 1 (otherwise it causes an assertion
+		 * fault in fts_filter_filter() ).
+		 * However, removing tokens may bring the same kind of issues
+		 * and inconsistencies that stopwords cause when used with
+		 * multiple languages and negations.
+		 * So, when the stemmer asks to remove a token,
+		 * keep the original token unchanged instead. */
+	}
 	return 1;
 }
 
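
The fallback described in the comment above can be illustrated outside of Dovecot. The following is a minimal sketch, not the lib-fts code: stub_stem() and stem_or_keep() are hypothetical names, and the stub "stemmer" merely drops apostrophes so that an apostrophe-only token produces an empty stem. It assumes nothing about which inputs make the real snowball stemmer return a zero-length result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stemmer: copies `word` into `out` without any
 * apostrophes and returns the stem length. A token made only of apostrophes
 * therefore yields length 0, mimicking a stemmer that returns no token. */
static int stub_stem(const char *word, char *out, size_t out_size)
{
	size_t n = 0;

	assert(out_size > 0);
	for (; *word != '\0' && n + 1 < out_size; word++) {
		if (*word != '\'')
			out[n++] = *word;
	}
	out[n] = '\0';
	return (int)n;
}

/* Mirrors the patched control flow: use the stem when it is non-empty,
 * otherwise keep the original token. Always returns 1 ("token present"),
 * so a caller never sees a success code paired with an empty token. */
static int stem_or_keep(const char *token, char *out, size_t out_size)
{
	int len = stub_stem(token, out, out_size);

	if (len <= 0) {
		/* Empty stem: fall back to the unchanged input token. */
		strncpy(out, token, out_size - 1);
		out[out_size - 1] = '\0';
	}
	return 1;
}
```

With a 32-byte output buffer, stem_or_keep("dog's", ...) leaves "dogs" in the buffer, while stem_or_keep("'", ...) leaves the original "'" in place instead of an empty string. Keeping the token, rather than returning 0 to drop it, follows the reasoning in the patch comment about avoiding stopword-like inconsistencies.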