git.ipfire.org Git - thirdparty/rspamd.git/commit
[Feature] Integrate HTML fuzzy hashing into fuzzy_check module
author Vsevolod Stakhov <vsevolod@rspamd.com>
Sat, 4 Oct 2025 18:34:48 +0000 (19:34 +0100)
committer Vsevolod Stakhov <vsevolod@rspamd.com>
Sat, 4 Oct 2025 18:34:48 +0000 (19:34 +0100)
commit 28e67afe35d7e67a77ade47e15df16c1e98f4d50
tree b01bd3fe45fc8ac846d6f0f565cc2d11cb6c334c
parent 74818212833367d973d5916497f0860836418c31

Add support for HTML structure fuzzy hashing to the fuzzy_check plugin.

Core integration:
- Add FUZZY_CMD_FLAG_HTML flag and FUZZY_RESULT_HTML result type
- Add html_shingles, min_html_tags, html_weight options to fuzzy_rule
- Implement fuzzy_cmd_from_html_part() to generate HTML fuzzy commands
- Integrate into fuzzy_generate_commands() for automatic hash generation
- Handle HTML results with configurable weight multiplier
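
For readers unfamiliar with structural hashing, the general idea can be sketched with plain w-shingling over the sequence of tag names. This is an illustrative Python sketch only: it is not the code in fuzzy_cmd_from_html_part(), and the window width, the `TagCollector` class, and the `tag_shingles` and `jaccard` helpers are all hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening tag names in document order; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_shingles(html, width=3):
    """Return the set of overlapping windows of `width` consecutive tag names.

    The width of 3 is an arbitrary choice for illustration.
    """
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    tags = collector.tags
    return {tuple(tags[i:i + width]) for i in range(len(tags) - width + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = ('<html><body><div><p>Hello</p>'
            '<a href="https://brand.example/login">Log in</a></div></body></html>')
# Same markup skeleton, different text and link target.
copied = ('<html><body><div><p>Urgent!</p>'
          '<a href="https://evil.example/login">Verify now</a></div></body></html>')
unrelated = '<html><body><table><tr><td>x</td></tr></table></body></html>'

same_structure = jaccard(tag_shingles(original), tag_shingles(copied))
other_structure = jaccard(tag_shingles(original), tag_shingles(unrelated))
```

Because only tag names enter the shingles, the copied message scores as structurally identical to the original even though its text and link differ, which is why a separate check on the link target is still needed.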

Configuration:
- html_shingles: enable/disable HTML fuzzy hashing per rule
- min_html_tags: minimum number of HTML tags a part must contain to be hashed (default 10)
- html_weight: score multiplier for HTML matches (default 1.0)
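
As a rough model of how these three options might interact, the Python sketch below gates hashing on the tag count and scales a match score by the weight. The option names and the defaults for min_html_tags and html_weight come from this commit message; everything else is an assumption made for illustration, including the `HtmlFuzzyOptions` class, the helper names, the inclusive comparison, and html_shingles being off by default.

```python
from dataclasses import dataclass


@dataclass
class HtmlFuzzyOptions:
    # Names mirror the per-rule options listed in the commit message.
    html_shingles: bool = False  # assumption: disabled unless a rule enables it
    min_html_tags: int = 10      # default stated in the commit message
    html_weight: float = 1.0     # default stated in the commit message


def should_hash_html(opts, tag_count):
    """Hash only when the feature is enabled and the part has enough tags.

    Assumes the threshold is inclusive (tag_count >= min_html_tags).
    """
    return opts.html_shingles and tag_count >= opts.min_html_tags


def weighted_score(opts, base_score):
    """Scale the score of an HTML match by the configured multiplier."""
    return base_score * opts.html_weight
```

Under these assumptions, a rule with html_shingles enabled and html_weight set to 0.5 would skip a part with 9 tags, hash a part with 10, and halve the score of any HTML match.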

Use cases:
1. Brand protection: detect phishing that copies a brand's HTML but swaps in a fake call-to-action (CTA) link
2. Spam campaigns: group messages by HTML structure
3. Template detection: identify newsletters/notifications
4. Phishing: text match + HTML CTA mismatch = suspicious
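
The phishing cases (1 and 4) reduce to one check: the message matches a known template, but its CTA link points somewhere the template's owner does not control. The Python sketch below shows that check under stated assumptions. It is not the logic in lualib/lua_fuzzy_html.lua or rules/fuzzy_html_phishing.lua; the function names and the per-template allow-list of domains are hypothetical.

```python
from urllib.parse import urlsplit


def cta_host(url):
    """Lower-cased host of a CTA link, or '' if the URL has no host."""
    return (urlsplit(url).hostname or "").lower()


def host_allowed(host, allowed_domains):
    """True if host equals an allowed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def is_suspicious(template_matched, cta_url, allowed_domains):
    """Flag a message that matches a known template but links elsewhere.

    `template_matched` stands in for a fuzzy hit on the template;
    `allowed_domains` is a hypothetical allow-list for that template.
    """
    if not template_matched:
        return False
    return not host_allowed(cta_host(cta_url), allowed_domains)
```

Matching on a leading dot keeps `notbrand.example` from passing as a subdomain of `brand.example`. A production check would more likely compare registrable domains, which this sketch does not attempt.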

Files added:
- lualib/lua_fuzzy_html.lua: helper functions for mismatch detection
- conf/modules.d/fuzzy_check_html.conf: configuration examples
- test/functional/configs/fuzzy_html_test.conf: test configuration
- rules/fuzzy_html_phishing.lua: phishing detection rules

HTML fuzzy works alongside text fuzzy:
- Both hashes generated and sent to storage
- Separate result types allow different handling
- CTA domain verification prevents false positives

Next steps:
- Performance testing on real email corpus
- Fine-tune weights and thresholds
- Collect legitimate brand templates for whitelisting
conf/modules.d/fuzzy_check_html.conf [new file with mode: 0644]
lualib/lua_fuzzy_html.lua [new file with mode: 0644]
rules/fuzzy_html_phishing.lua [new file with mode: 0644]
src/plugins/fuzzy_check.c
test/functional/configs/fuzzy_html_test.conf [new file with mode: 0644]