From: Vsevolod Stakhov
Date: Sun, 5 Oct 2025 07:32:54 +0000 (+0100)
Subject: [Minor] Add safety checks for short HTML to prevent false positives
X-Git-Tag: 3.14.0~87^2~11
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=9c125b92b5f82de5ea66b3941f829b2219aec64d;p=thirdparty%2Frspamd.git

[Minor] Add safety checks for short HTML to prevent false positives

Require minimum complexity for HTML fuzzy matching:
- At least 2 links (single-link emails too generic)
- At least DOM depth 3 (flat structures too common)

This prevents false positives on trivial HTML like:

text link

Such simple structures are not unique enough for reliable fuzzy matching.
---

diff --git a/src/plugins/fuzzy_check.c b/src/plugins/fuzzy_check.c
index 3876666dcc..9568cf8410 100644
--- a/src/plugins/fuzzy_check.c
+++ b/src/plugins/fuzzy_check.c
@@ -2133,6 +2133,24 @@ fuzzy_cmd_from_html_part(struct rspamd_task *task,
 		return NULL;
 	}
 
+	/*
+	 * Additional safety checks for short HTML to prevent false positives:
+	 * - Require at least 2 links (single-link emails too generic)
+	 * - Require at least some DOM depth (flat structure too common)
+	 */
+	if (part->html_features) {
+		if (part->html_features->links.total_links < 2) {
+			msg_debug_fuzzy_check("HTML part has only %d links, too few for reliable matching",
+								  part->html_features->links.total_links);
+			return NULL;
+		}
+		if (part->html_features->max_dom_depth < 3) {
+			msg_debug_fuzzy_check("HTML part has depth %d, too shallow for reliable matching",
+								  part->html_features->max_dom_depth);
+			return NULL;
+		}
+	}
+
 	/*
 	 * HTML fuzzy uses separate cache key to avoid conflicts with text fuzzy.
 	 * Text parts can have both text hash (short text, no shingles) and HTML hash.
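
The gate added by this hunk can be read in isolation as a small predicate. The following is a minimal sketch only, not code from rspamd: `struct html_features_sketch`, `html_complex_enough`, `MIN_LINKS` and `MIN_DOM_DEPTH` are hypothetical names introduced for illustration, and the field types are assumed to be unsigned. It mirrors the patch's behaviour as shown above: a missing feature set skips the checks, while fewer than 2 links or a DOM depth below 3 rejects the part.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields the patch reads;
 * this is NOT rspamd's real html_features layout. */
struct html_features_sketch {
	unsigned int total_links;
	unsigned int max_dom_depth;
};

/* Thresholds taken from the commit message and the hunk. */
enum {
	MIN_LINKS = 2,
	MIN_DOM_DEPTH = 3
};

/* Returns true when the HTML part is complex enough to be hashed.
 * As in the patch, absent features (NULL) do not block hashing. */
static bool
html_complex_enough(const struct html_features_sketch *f)
{
	if (f == NULL) {
		return true;
	}
	if (f->total_links < MIN_LINKS) {
		return false; /* single-link emails are too generic */
	}
	if (f->max_dom_depth < MIN_DOM_DEPTH) {
		return false; /* flat structures are too common */
	}
	return true;
}
```

Under these assumptions, a part with 1 link and depth 5 is rejected, a part with 4 links and depth 2 is rejected, and a part with 2 links and depth 3 passes.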