[Feature] elastic: log Reply-To, received IPs, URL metadata, and pre-result module (#6018)
* [Feature] elastic: log Reply-To, received IPs, URL metadata, and pre-result module
- reply_to_user / reply_to_domain: parsed from Reply-To via
rspamd_util.parse_mail_address, mirroring the from / mime_from split.
- received_ips: list of IPs from Received headers.
- urls and urls_cta, controlled by the new collect_urls config block:
per-URL records {url, etld, host, protocol, flags, count} plus
aggregate metrics {total, unique, max_repeats, repeat_ratio}. CTA
URLs are collected via text_part:get_cta_urls({original=true}) and
walked via :get_redirected so that hops resolved by url_redirector
are captured. They are then either kept inline at the top of urls
(sorted ahead of non-CTA URLs so they survive max_urls truncation)
or emitted into a dedicated urls_cta when separate_cta is on.
- action_forced: the module name from task:has_pre_result(), so logs
show which prefilter short-circuited the pipeline (or 'no force').
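
For illustration, the aggregate metrics could be computed roughly as in
the sketch below. This is not the module's code: url_metrics is a
hypothetical helper working on plain strings, and the repeat_ratio
formula (share of occurrences that are repeats) is an assumption, since
the definition is not spelled out above.

```lua
-- Hypothetical helper: aggregate metrics over a list of URL strings.
local function url_metrics(urls)
  local counts, unique, max_repeats = {}, 0, 0
  for _, u in ipairs(urls) do
    local c = (counts[u] or 0) + 1
    counts[u] = c
    if c == 1 then
      unique = unique + 1 -- first time this URL is seen
    end
    if c > max_repeats then
      max_repeats = c
    end
  end
  local total = #urls
  -- Assumed definition: fraction of occurrences that repeat an earlier URL.
  local repeat_ratio = total > 0 and (total - unique) / total or 0
  return {
    total = total,
    unique = unique,
    max_repeats = max_repeats,
    repeat_ratio = repeat_ratio,
  }
end

local m = url_metrics({
  'http://a.test/', 'http://b.test/', 'http://a.test/', 'http://a.test/',
})
```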
Renames get_received_delay to get_received_info (returns delay + ips
in one pass over the received chain) and replaces the local
merge_settings helper with lua_util.override_defaults — the two are
functionally equivalent recursive deep-merges, but override_defaults
is the project-wide maintained helper.
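
As a rough picture of what a recursive deep-merge of defaults and user
overrides does, here is a minimal sketch in plain Lua. deep_override is
a hypothetical stand-in, not the lua_util.override_defaults
implementation, and nesting max_urls / separate_cta under collect_urls
with these values is assumed for the example only.

```lua
-- Hypothetical stand-in for a recursive "override defaults" merge.
-- Returns a new table; neither input is modified.
local function deep_override(defaults, overrides)
  local result = {}
  for k, v in pairs(defaults) do
    if type(v) == 'table' then
      result[k] = deep_override(v, {}) -- copy nested defaults
    else
      result[k] = v
    end
  end
  for k, v in pairs(overrides or {}) do
    if type(v) == 'table' and type(result[k]) == 'table' then
      result[k] = deep_override(result[k], v) -- merge nested blocks
    else
      result[k] = v -- scalar or new key: override wins
    end
  end
  return result
end

-- Assumed example values, not the module's real defaults.
local defaults = { collect_urls = { max_urls = 20, separate_cta = false } }
local merged = deep_override(defaults, { collect_urls = { separate_cta = true } })
```

Keys left out of the override keep their default, so in this example
merged.collect_urls.max_urls is still 20.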
Signed-off-by: Dmitriy Alekseev <1865999+dragoangel@users.noreply.github.com>
* [Fix] elastic: reset queue counters on pop drain so the indices do not grow monotonically over the worker's lifetime
- Drop tostring() around url:get_text() (already a Lua string) in
url_to_record and url_key.
- Drop tostring() around url:get_flags_num() (the .. operator already
coerces numbers to strings).
- Replace tostring(url) in CTA dedup key with url:get_text() to avoid
the __tostring metamethod's percent-encoding two-pass walk.
- Drop `or nil` no-op after url:get_redirected().
- Cache url:get_host() once in url_to_record (was called twice).
- Remove dead `if on then` guard on url:get_flags() — only set bits
are inserted, so every value is true.
- Cache tostring(real_ip) in get_received_info and tostring(ip_addr) /
tostring(origin_ip) in get_general_metadata so each value is
stringified once instead of repeatedly.
- In build_urls_metadata, compute url_key(u, false) once per URL and
reuse for the CTA lookup; only recompute when full_urls is true.
- Drop sort=true from task:get_urls(): the C-level qsort order doesn't
survive, because results are rehashed for dedup and re-sorted by count.
Also remove the misleading "deterministic order, stable dedup"
comment (table.sort is unstable in standard Lua).
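
Because table.sort is not stable, records with equal counts can come
out in any relative order. If a deterministic order were ever needed, a
comparator with an explicit tie-breaker would provide it. The sketch
below is not part of this change, and the record shape is a simplified
assumption.

```lua
-- Simplified records; real per-URL records carry more fields.
local records = {
  { url = 'b.test', count = 2 },
  { url = 'a.test', count = 2 },
  { url = 'c.test', count = 5 },
}

-- Sort by count descending, then by url ascending to break ties.
table.sort(records, function(x, y)
  if x.count ~= y.count then
    return x.count > y.count
  end
  return x.url < y.url
end)

-- Collect the resulting order for inspection.
local order = {}
for i, r in ipairs(records) do
  order[i] = r.url
end
```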
Signed-off-by: Dmitriy Alekseev <1865999+dragoangel@users.noreply.github.com>
* [Fix] elastic: drop dead `or {}` after task:get_urls() and other functions that always return a table