git.ipfire.org Git - thirdparty/rspamd.git/commit
[Rework] Refactor extract_specific_urls to prevent DoS and use hash-based deduplication (5732/head)
author Vsevolod Stakhov <vsevolod@rspamd.com>
Fri, 7 Nov 2025 16:17:20 +0000 (16:17 +0000)
committer Vsevolod Stakhov <vsevolod@rspamd.com>
Fri, 7 Nov 2025 16:17:20 +0000 (16:17 +0000)
commit ba0c4caf7d3ed678faf623944cad234e3c4e84d0
tree 40419c2be18e7172c3c2fc913fb6c6a9a824840d
parent 4889a81784fcdfc56c003097acad96a7348dfb74

Replace tostring() with url:get_hash() throughout URL extraction to avoid
filling the Lua string interning table. This is critical for handling
malicious messages with 100k+ URLs, where each tostring() call would
create an interned string and exhaust memory.
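A minimal Lua sketch of the keying change, assuming `url:get_hash()` returns a number (the rationale above implies a non-string key). `make_url` and `unique_by_hash` are hypothetical names for illustration, not rspamd APIs; the mock only mimics a URL object's `tostring()` and `get_hash()` behaviour with a precomputed hash.

```lua
-- Hypothetical stand-in for an rspamd URL object. Real URL objects come
-- from rspamd's C side; here get_hash() just returns a precomputed number.
local function make_url(text, hash)
  return setmetatable({ text = text, hash = hash }, {
    __index = { get_hash = function(self) return self.hash end },
    __tostring = function(self) return self.text end,
  })
end

-- Deduplicate URLs, keeping the first occurrence of each in input order.
-- Previously the key would have been tostring(u); per the commit, in
-- rspamd that creates an interned Lua string for every URL. A numeric key
-- from get_hash() adds nothing to the string interning table.
local function unique_by_hash(urls)
  local seen, out = {}, {}
  for _, u in ipairs(urls) do
    local key = u:get_hash()
    if not seen[key] then
      seen[key] = true
      out[#out + 1] = u
    end
  end
  return out
end

local urls = {
  make_url('http://a.example/', 1),
  make_url('http://b.example/', 2),
  make_url('http://a.example/', 1), -- duplicate URL, same hash
}
local uniq = unique_by_hash(urls)
print(#uniq, tostring(uniq[1]), tostring(uniq[2]))
```

Only the numeric hash is used as a table key; the URL objects themselves are kept in `out`. The sketch relies on identical URLs producing identical hashes, and two distinct URLs with colliding hashes would be merged; how rspamd treats such collisions is not stated in the commit message.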

Key changes:
- Use a dual data structure: an array for results plus a hash set for O(1)
  deduplication
- Add a max_urls_to_process=50000 limit, with a warning, for DoS protection
- Track url_index for stable sorting when priorities are equal
- Fix CTA priority preservation: prevent generic phished handling from
  overwriting CTA priorities, which include phished/subject bonuses
- Add a verbose flag to the test suite for debugging
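The key changes can be sketched in Lua as follows. This is an illustration under stated assumptions, not the code in `lualib/lua_util.lua`: `collect_urls`, `fake_url`, the candidate fields `priority`/`from_cta`, and the rule that a higher priority replaces a lower one are hypothetical. Only `max_urls_to_process=50000`, `url_index` and `url:get_hash()` come from the commit message.

```lua
-- Hypothetical limit mirroring max_urls_to_process=50000 from the commit.
local MAX_URLS_TO_PROCESS = 50000

-- Each candidate is { url = <object with :get_hash()>, priority = number,
-- from_cta = boolean or nil }; these field names are illustrative only.
local function collect_urls(candidates, max_urls)
  max_urls = max_urls or MAX_URLS_TO_PROCESS
  local results = {} -- array part: one entry per unique URL
  local by_hash = {} -- set part: hash -> entry in results, O(1) lookup
  local processed = 0

  for _, c in ipairs(candidates) do
    if processed >= max_urls then
      io.stderr:write(string.format(
        'warning: %d URLs processed, ignoring the rest\n', processed))
      break
    end
    processed = processed + 1

    local h = c.url:get_hash()
    local existing = by_hash[h]
    if existing then
      -- A generic (non-CTA) pass never overwrites a CTA-derived priority;
      -- otherwise the higher priority wins (hypothetical merge rule).
      if not (existing.from_cta and not c.from_cta)
          and c.priority > existing.priority then
        existing.priority = c.priority
        existing.from_cta = c.from_cta
      end
    else
      local entry = {
        url = c.url, priority = c.priority,
        from_cta = c.from_cta, url_index = #results + 1,
      }
      results[#results + 1] = entry
      by_hash[h] = entry
    end
  end

  -- table.sort is not stable; url_index breaks ties so that URLs with
  -- equal priority keep their original order.
  table.sort(results, function(a, b)
    if a.priority ~= b.priority then
      return a.priority > b.priority
    end
    return a.url_index < b.url_index
  end)
  return results
end

-- Hypothetical stand-in for an rspamd URL object with a fixed hash.
local function fake_url(hash)
  return { get_hash = function() return hash end }
end

local sorted = collect_urls({
  { url = fake_url(10), priority = 1 },
  { url = fake_url(20), priority = 5, from_cta = true },
  { url = fake_url(20), priority = 9 }, -- generic: must not override CTA
  { url = fake_url(30), priority = 1 }, -- ties with hash 10
  { url = fake_url(40), priority = 7 }, -- beyond the limit of 4
}, 4)
for _, e in ipairs(sorted) do
  print(e.url:get_hash(), e.priority)
end
```

With a limit of 4, the fifth candidate is dropped after a single warning, the generic priority 9 does not replace the CTA priority 5 for hash 20, and the two priority-1 URLs keep their input order (hash 10 before hash 30). Counting every processed candidate, duplicates included, against the limit keeps both the array and the set bounded by `max_urls`.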

This ensures memory usage is strictly bounded regardless of malicious
input while maintaining correct URL prioritization for spam detection.
lualib/lua_util.lua
test/rspamd_test_suite.c