[Feature] Integrate HTML fuzzy hashing into fuzzy_check module
Add support for HTML structure fuzzy hashing to the fuzzy_check plugin:
Core integration:
- Add FUZZY_CMD_FLAG_HTML flag and FUZZY_RESULT_HTML result type
- Add html_shingles, min_html_tags, html_weight options to fuzzy_rule
- Implement fuzzy_cmd_from_html_part() to generate HTML fuzzy commands
- Integrate into fuzzy_generate_commands() for automatic hash generation
- Handle HTML results with a configurable weight multiplier (html_weight)
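This commit does not describe how the HTML structure hash itself is computed. As a rough illustration of the general idea behind structure shingling, here is a hypothetical C sketch that hashes every window of three consecutive tag names with 64-bit FNV-1a. The window size, the hash function and the names `html_tag_shingles` and `SHINGLE_WINDOW` are assumptions for illustration only, not the fuzzy_cmd_from_html_part() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, NOT the fuzzy_check code: shingles over tag names. */
#define SHINGLE_WINDOW 3

/* Continue a 64-bit FNV-1a hash state over len bytes of s. */
static uint64_t fnv1a_update(uint64_t h, const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hash each window of SHINGLE_WINDOW consecutive tag names.
 * Writes at most out_cap hashes to out and returns how many were written;
 * returns 0 when there are fewer tags than one window. */
static size_t html_tag_shingles(const char *const *tags, size_t ntags,
                                uint64_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i + SHINGLE_WINDOW <= ntags && n < out_cap; i++) {
        uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
        for (size_t j = 0; j < SHINGLE_WINDOW; j++) {
            h = fnv1a_update(h, tags[i + j], strlen(tags[i + j]));
            /* Separator so ("ab", "c") and ("a", "bc") hash differently. */
            h = fnv1a_update(h, "|", 1);
        }
        out[n++] = h;
    }
    return n;
}
```

Because only tag names enter the hash, two messages built from the same template but carrying different text produce the same shingles. A real implementation would likely also normalise tags and reduce the shingle set to a fixed-size signature, which this sketch omits.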
Configuration:
- html_shingles: enable/disable HTML fuzzy hashing per rule
- min_html_tags: minimum number of HTML tags a part must contain to be hashed (default 10)
- html_weight: score multiplier for HTML matches (default 1.0)
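The three options can be read as a gate plus a score multiplier. The following is a minimal C sketch under stated assumptions: html_shingles switches hashing on per rule, min_html_tags is an inclusive lower bound on a part's tag count, and html_weight multiplies the score of FUZZY_RESULT_HTML matches. The struct and function names are hypothetical and do not reflect the real fuzzy_rule layout; only the default values come from this commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Defaults stated in this commit. */
#define HTML_FUZZY_DEFAULT_MIN_TAGS 10u
#define HTML_FUZZY_DEFAULT_WEIGHT 1.0

/* Hypothetical per-rule options; field names mirror the config keys,
 * but this is not the real struct fuzzy_rule. */
struct html_fuzzy_opts {
    bool html_shingles;     /* enable HTML fuzzy hashing for this rule */
    unsigned min_html_tags; /* skip parts with fewer tags than this */
    double html_weight;     /* multiplier applied to HTML match scores */
};

/* Assumption: the threshold is inclusive (tag_count >= min_html_tags). */
static bool html_part_should_hash(const struct html_fuzzy_opts *o,
                                  size_t tag_count)
{
    return o->html_shingles && tag_count >= o->min_html_tags;
}

/* Scale the base score of an HTML-type fuzzy match by html_weight. */
static double html_match_score(const struct html_fuzzy_opts *o,
                               double base_score)
{
    return base_score * o->html_weight;
}
```

With the defaults, a part with 9 tags is skipped, a part with 10 tags is hashed, and an HTML match keeps its base score unchanged; lowering html_weight below 1.0 makes structural matches count for less than text matches.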
Use cases:
1. Brand protection: detect phishing that copies a brand's HTML but uses a fake call-to-action (CTA)
2. Spam campaigns: group messages by HTML structure
3. Template detection: identify newsletters/notifications
4. Phishing: text match + HTML CTA mismatch = suspicious
HTML fuzzy hashing works alongside text fuzzy hashing:
- Both hashes are generated and sent to storage
- Separate result types allow different handling
- CTA domain verification reduces false positives
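Use case 4 and the CTA domain check can be read together as a small decision rule. The sketch below is a hypothetical C illustration, assuming each stored hash is associated with the legitimate brand's CTA domain (for example from the whitelisted brand templates mentioned under next steps) and using exact, case-insensitive domain equality. None of these names come from fuzzy_check.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum html_verdict { VERDICT_NONE, VERDICT_TEMPLATE, VERDICT_SUSPICIOUS };

/* Case-insensitive exact domain comparison; NULL never matches. */
static bool domain_eq(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return false;
    }
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b; /* both strings must end at the same position */
}

/* Hypothetical combination of text/HTML fuzzy results with a CTA check:
 * a match whose CTA points at the expected brand domain is treated as a
 * legitimate template; a match with any other CTA is suspicious. */
static enum html_verdict classify_match(bool text_match, bool html_match,
                                        const char *cta_domain,
                                        const char *expected_domain)
{
    if (!text_match && !html_match) {
        return VERDICT_NONE;
    }
    return domain_eq(cta_domain, expected_domain) ? VERDICT_TEMPLATE
                                                  : VERDICT_SUSPICIOUS;
}
```

A production check would probably also accept subdomains of the brand domain and handle several CTAs per message; the sketch keeps a single exact comparison for brevity.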
Next steps:
- Performance testing on real email corpus
- Fine-tune weights and thresholds
- Collect legitimate brand templates for whitelisting