[Rework] Refactor extract_specific_urls to prevent DoS and use hash-based deduplication
Replace tostring() with url:get_hash() throughout URL extraction to avoid
filling the Lua string interning table. This is critical for handling
malicious messages with 100k+ URLs, where each tostring() call would create
an interned string and eventually exhaust memory.
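The idea can be sketched in Python (the actual change is Lua code using Rspamd's url:get_hash(); the digest below is only a stand-in for that method): deduplicate on a compact fixed-size digest instead of the full serialized URL, so no large strings are created or retained per URL.

```python
import hashlib

def dedup_by_hash(urls):
    """Deduplicate URLs by a compact digest rather than their full
    serialized form, keeping the first occurrence of each URL."""
    seen = set()
    result = []
    for u in urls:
        # Stand-in for Rspamd's url:get_hash(): a small fixed-size digest,
        # so the dedup set never holds full URL strings.
        h = hashlib.blake2b(u.encode(), digest_size=8).digest()
        if h not in seen:
            seen.add(h)
            result.append(u)
    return result
```

Because the set stores 8-byte digests rather than interned strings, its footprint grows with the number of *unique* URLs only, at a fixed cost per entry.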
Key changes:
- Use dual data structure: array for results + hash set for O(1) dedup
- Add max_urls_to_process=50000 limit with warning for DoS protection
- Track url_index for stable sorting when priorities are equal
- Fix CTA priority preservation: prevent the generic phished handling from
  overwriting CTA priorities, which already include phished/subject bonuses
- Add verbose flag to test suite for debugging
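The first three changes combine into one loop shape, sketched here in Python under assumed names (the real code is Lua; hash() stands in for url:get_hash(), and the input is simplified to (url, priority) pairs):

```python
def extract_urls(urls, max_urls_to_process=50000):
    """Bounded URL extraction: an array for results plus a hash set
    for O(1) dedup, with an index kept for stable tie-breaking."""
    seen = set()    # hash set: O(1) membership checks, bounded by the cap
    result = []     # array: collects unique entries in arrival order
    for url_index, (url, priority) in enumerate(urls):
        if url_index >= max_urls_to_process:
            # DoS guard: stop processing and warn instead of growing unboundedly
            print("warning: too many URLs in message, processing truncated")
            break
        key = hash(url)  # stand-in for url:get_hash()
        if key not in seen:
            seen.add(key)
            result.append((priority, url_index, url))
    # Higher priority first; url_index makes the order stable when
    # priorities are equal, so results are deterministic
    result.sort(key=lambda t: (-t[0], t[1]))
    return [url for _, _, url in result]
```

With the cap in place, both the set and the array hold at most max_urls_to_process entries regardless of how many URLs a malicious message contains.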
This keeps memory usage strictly bounded even for malicious input, while
maintaining correct URL prioritization for spam detection.