git.ipfire.org Git - thirdparty/rspamd.git/commit
[Fix] re_cache: Always use charset-converted content for SARAWBODY matching
author Vsevolod Stakhov <vsevolod@rspamd.com>
Fri, 6 Feb 2026 11:22:07 +0000 (11:22 +0000)
committer Vsevolod Stakhov <vsevolod@rspamd.com>
Fri, 6 Feb 2026 11:22:07 +0000 (11:22 +0000)
commit 16e3dbe5bbdcfa96c1e7d822d4bcdead7967a848
tree 35511614592ed4ad1f6077e901aa0378012e9555
parent 30024739f12ecc24101adf903f4bf228aa6dca50

Use utf_raw_content (charset-converted UTF-8 with HTML tags preserved)
for all SARAWBODY patterns, regardless of /u flag presence. The previous
approach used utf_content (which strips HTML tags on HTML parts), and
did so only for classes containing /u patterns, so non-/u patterns were
matched against raw bytes in the original charset.
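
To illustrate why matching against raw bytes in the original charset is
a problem, here is a minimal sketch. It is not rspamd code: the helper
bytes_contain() and both sample buffers are hypothetical, and a plain
byte-wise substring search stands in for a non-/u regexp match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a byte-oriented (non-/u) pattern match:
 * returns 1 if the bytes of `needle` occur contiguously in `hay`. */
int
bytes_contain(const unsigned char *hay, size_t hay_len, const char *needle)
{
    size_t n, i;

    assert(hay != NULL && needle != NULL);
    n = strlen(needle);

    if (n == 0) {
        return 1;
    }
    if (n > hay_len) {
        return 0;
    }
    for (i = 0; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0) {
            return 1;
        }
    }
    return 0;
}

/* "<b>hi</b>" as UTF-16LE: every ASCII byte is followed by 0x00 */
const unsigned char utf16le_body[] = {
    '<', 0, 'b', 0, '>', 0, 'h', 0, 'i', 0,
    '<', 0, '/', 0, 'b', 0, '>', 0
};

/* The same content after charset conversion to UTF-8, tags preserved */
const unsigned char utf8_body[] = "<b>hi</b>";
```

With these buffers, a search for "<b>" finds nothing in utf16le_body
because of the interleaved zero bytes, but it does match utf8_body.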

This prevents trivial bypass of SA rawbody rules via exotic encodings
like UTF-16 and ensures consistent matching across PCRE and Hyperscan.
Fall back to transfer-decoded parsed content only when charset
conversion fails.
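
The selection rule described above could be sketched roughly as follows.
This is an illustration under stated assumptions, not the code from
re_cache.c: the struct layouts, the field name parsed, and the function
sarawbody_input() are hypothetical. Only the name utf_raw_content comes
from this commit message, and the sketch assumes a failed conversion
leaves that buffer empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-delimited buffer */
struct buf_sketch {
    const unsigned char *begin;
    size_t len;
};

/* Hypothetical, simplified view of a text part */
struct text_part_sketch {
    /* transfer-decoded bytes, still in the original charset */
    struct buf_sketch parsed;
    /* charset-converted UTF-8 with HTML tags preserved;
     * assumed empty (begin == NULL, len == 0) if conversion failed */
    struct buf_sketch utf_raw_content;
};

/* Pick the buffer that SARAWBODY patterns are matched against,
 * independent of whether any pattern carries the /u flag. */
const struct buf_sketch *
sarawbody_input(const struct text_part_sketch *part)
{
    assert(part != NULL);

    if (part->utf_raw_content.begin != NULL &&
        part->utf_raw_content.len > 0) {
        /* Preferred: converted content, same input for all patterns */
        return &part->utf_raw_content;
    }

    /* Fallback: charset conversion failed, use transfer-decoded bytes */
    return &part->parsed;
}
```

Because the choice no longer depends on the /u flag, every pattern in
the sketch sees the same input, which is the consistency property the
message refers to for PCRE and Hyperscan.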
src/libserver/re_cache.c