Vsevolod Stakhov [Thu, 12 Mar 2026 12:20:31 +0000 (12:20 +0000)]
[Fix] Add reconnection with retry logic for statistics restore
When restoring large datasets (24M+ lines), Redis connections can get
terminated mid-restore. Add retry logic with automatic reconnection
that resumes from the failed pipeline chunk, avoiding HINCRBYFLOAT
double-counting by skipping past ambiguously-applied chunks.
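The skip-ambiguous-chunk policy described above can be sketched as follows. This is a minimal Python illustration, not the actual rspamadm implementation; all function names (`restore_chunks`, `send_chunk`, `reconnect`) are hypothetical.

```python
# Sketch of resume-after-reconnect chunk replay. Because HINCRBYFLOAT
# is not idempotent, a chunk whose connection died mid-exec may have
# been partially applied; one safe policy is to skip it and resume at
# the next chunk rather than replay it.
def restore_chunks(chunks, send_chunk, reconnect, max_retries=3):
    applied, skipped = 0, 0
    i, retries = 0, 0
    while i < len(chunks):
        try:
            send_chunk(chunks[i])
            applied += 1
            i += 1
            retries = 0
        except ConnectionError:
            if retries >= max_retries:
                raise
            retries += 1
            reconnect()
            skipped += 1   # ambiguous chunk: skip to avoid double-counting
            i += 1
    return applied, skipped
```

The trade-off is explicit: a skipped chunk loses at most one batch of increments, whereas replaying it could double-count every token in the batch.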
Vsevolod Stakhov [Tue, 10 Mar 2026 17:35:51 +0000 (17:35 +0000)]
[Fix] Fix external neural model merge defects
- merge_weights returns boolean, not ANN object: use ext_ann directly
- Add missing digest/symbols/distance fields for external-only set.ann
- Fix inverted alpha in merge call (alpha meant external weight, not local)
- Add missing newline at EOF in lua_kann.c
The rspamd_fuzzy_tcp_frame payload was sized for v1 encrypted replies
(136 bytes) but v2 encrypted replies are 184 bytes, causing a buffer
overflow when sending multi-flag responses over TCP with encryption.
Use a union to accommodate both v1 and v2 reply sizes.
Add multi-flag delete tests for all TCP transport modes.
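The union-sizing fix can be illustrated with ctypes; the struct layouts here are hypothetical placeholders and only the two sizes (136 and 184 bytes) come from the commit message.

```python
import ctypes

# Hypothetical reply layouts; only the payload sizes are taken from
# the commit message (v1 encrypted reply = 136 bytes, v2 = 184 bytes).
class EncryptedReplyV1(ctypes.Structure):
    _fields_ = [("payload", ctypes.c_uint8 * 136)]

class EncryptedReplyV2(ctypes.Structure):
    _fields_ = [("payload", ctypes.c_uint8 * 184)]

class ReplyFrame(ctypes.Union):
    # A union is as large as its largest member, so a frame sized for
    # it can hold either reply version without overflowing.
    _fields_ = [("v1", EncryptedReplyV1), ("v2", EncryptedReplyV2)]
```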
Rob4226 [Tue, 10 Mar 2026 09:04:36 +0000 (05:04 -0400)]
[Minor] Fix default date description in rspamadm dmarc_report help
Change the help text for the `date` argument of `rspamadm dmarc_report`
from "today" to "yesterday". When the command is run without specifying
a date, it actually processes reports for yesterday, so this update
makes the help message match the command's behavior.
Limit scheduled integration-test runs to rspamd/rspamd while keeping
manual start available in forks. This avoids unnecessary fork cron
runs and reduces noisy CI failures unrelated to upstream.
[Fix] Skip recipient check when no hash found in Redis
When a key is not found in Redis, lua_redis returns a redis.null
userdata (not nil), which is truthy and caused check_recipient()
to be called unconditionally, logging a misleading "no recipients
are matching hash" message despite no hash being stored.
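The failure mode — a non-nil "null" sentinel passing a truthiness test — can be modeled in Python, where arbitrary objects are likewise truthy. Names here are illustrative stand-ins for the Lua code.

```python
class RedisNull:
    """Stand-in for the redis.null userdata that lua_redis returns on a
    key miss; like Lua userdata, a bare Python object is truthy."""

REDIS_NULL = RedisNull()

def has_hash(value):
    # A plain `if value:` test would treat REDIS_NULL as a hit; the fix
    # is to compare against the sentinel (and nil/None) explicitly.
    return value is not None and value is not REDIS_NULL
```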
[Feature] Add external pretrained neural model support
This commit adds the ability to load pretrained neural network models
from external sources (HTTP/HTTPS) and merge them with locally trained
weights. Users can receive a pretrained model and fine-tune it with
their own data.
Model format (msgpack with magic "RNM1"):
- magic: format identifier
- version: format version (currently 1)
- model_version: model training version
- providers_digest: must match local providers config
- ann_data: serialized KANN (zstd compressed)
- pca_data: optional PCA matrix
- norm_stats, roc_thresholds: optional metadata
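A loader for this envelope might validate the documented fields before touching `ann_data`. This sketch assumes the msgpack payload has already been decoded into a dict; the error strings are illustrative.

```python
# Validate an already-decoded external model envelope against the
# documented "RNM1" format. Field names follow the commit message.
def validate_model(doc, local_providers_digest):
    if doc.get("magic") != "RNM1":
        return False, "bad magic"
    if doc.get("version") != 1:
        return False, "unsupported format version"
    if doc.get("providers_digest") != local_providers_digest:
        return False, "providers digest mismatch"
    if not doc.get("ann_data"):
        return False, "missing ann_data"
    return True, None
```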
Key changes:
- lualib/lua_neural_external.lua: new module for external model handling
- Model parsing, KANN loading, weight merging via interpolation
- Map-based loading with signature verification support
- Base model storage in Redis for future re-merge
- src/lua/lua_kann.c: Lua bindings for merge_weights and is_compatible
- Neural plugin integration:
- Register external model as callback map at config time
- Apply loaded model to all settings elements
- Automatic update checking via map infrastructure
[Fix] Preserve content flags for injected query URLs
Propagate the parent URL flags when task:inject_url() extracts nested query URLs. This keeps the content flag on URLs injected from computed parts such as PDF text, so follow-up query URLs are classified the same way as the outer injected URL.
Do not suppress URLs from mime_part:get_urls() when the same URL was already seen in another MIME part. This restores per-part URL visibility for multipart/alternative messages and keeps text/plain URLs available even when text/html contains the same links.
[Minor] Skip empty In-Reply-To header in replies check
An empty `In-Reply-To` header value ("") is truthy in Lua,
bypassing the `nil` check. In `replies_check` this caused a
misleading log entry "ignoring reply to as no recipients
are matching hash ". In `replies_check_cookie` it triggered
an unnecessary `decrypt_cookie` call.
[Fix] Fix restore stack overflow, migrate memory and speed
Restore: chunk redis pipeline into batches of 1000 commands per
exec() call to prevent Lua stack overflow on large dump files.
Dump: chunk HGETALL pipeline the same way.
Migrate: replace EVAL scripts with direct pipelined commands.
Collect all misplaced prefixes upfront, then process each with
pipelined HGETALL/HMSET/DEL. Explicit collectgarbage() every
100 prefixes to prevent memory bloat.
Restore: periodic GC and progress logging every 10 batches.
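The batching scheme amounts to slicing the command list before each exec() call; a minimal sketch (names hypothetical):

```python
def batches(commands, size=1000):
    """Yield pipeline batches so that each exec() call pushes only a
    bounded number of replies onto the Lua stack."""
    for i in range(0, len(commands), size):
        yield commands[i:i + size]
```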
[Feature] Add follow_master option for proxy mirror connections
When a mirror has a short timeout to avoid delays from misconfigured
mirrors, the mirror connection gets prematurely terminated if the
upstream takes longer than the mirror timeout. The new follow_master
option ties the mirror's lifetime to the master upstream: the mirror
stays alive while the upstream is processing and is terminated once the
upstream completes or permanently errors out.
The --classifier option must be available for all subcommands
(dump, restore, migrate), not just dump and migrate. Move it
to the top-level parser and use select_classifier in restore
handler as well.
Vsevolod Stakhov [Fri, 27 Feb 2026 15:33:45 +0000 (15:33 +0000)]
[Fix] Require classifier selection for multi-classifier configs
When multiple bayes classifiers are configured, dump and migrate now
require --classifier to select which one to operate on. With a single
classifier the flag is optional.
Also fix the SCAN pattern for token keys: use `<prefix>_*` instead of
`<prefix>*_*` to avoid matching tokens from unrelated prefixes.
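The difference between the two glob patterns can be demonstrated with `fnmatch`, whose semantics are close enough to Redis SCAN's MATCH for these prefix patterns:

```python
from fnmatch import fnmatch

prefix = "RS"               # illustrative statfile prefix
loose = prefix + "*_*"      # old, over-broad pattern
strict = prefix + "_*"      # fixed pattern

assert fnmatch("RS_token1", strict)
assert not fnmatch("RSother_token1", strict)   # unrelated prefix excluded
assert fnmatch("RSother_token1", loose)        # old pattern wrongly matched it
```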
Vsevolod Stakhov [Fri, 27 Feb 2026 14:25:25 +0000 (14:25 +0000)]
[Feature] Add shard migration and multi-class support to statistics_dump
Add `rspamadm statistics_dump migrate` subcommand for migrating per-user
Bayes data between Redis shards after the Jump Hash to Ketama transition.
The tool scans all shards, identifies misplaced prefixes via
get_upstream_by_hash, and moves them in batches using Redis Lua scripts.
Also fix multi-class Bayes support: dump/restore now handles arbitrary
statfile classes (not just binary spam/ham) by collecting all symbols
from classifier config with proper class label mapping. The dump command
now iterates all shards via all_upstreams() for complete data export.
Vsevolod Stakhov [Fri, 27 Feb 2026 11:15:36 +0000 (11:15 +0000)]
[Fix] Force recompilation of stale hyperscan classes instead of skipping
When a cached hyperscan blob fails validation during load (stale IDs
pointing to wrong re_class), mark the class with needs_recompile flag.
On subsequent exists_async check in hs_helper, ignore the "exists"
result and proceed with recompilation instead of skipping.
Vsevolod Stakhov [Fri, 27 Feb 2026 11:01:13 +0000 (11:01 +0000)]
[Fix] Do not enable HS cleanup when disable_hyperscan is set
When disable_hyperscan is true, workers skip loading hyperscan databases
and never notify main about known cache files. This caused main to delete
all cached .hs.zst files on exit since none were marked as "known".
Also promote worker hyperscan notification to info log level.
Vsevolod Stakhov [Fri, 27 Feb 2026 10:22:18 +0000 (10:22 +0000)]
[Feature] Per-class deterministic regexp IDs in re_cache
Group regexp IDs by class instead of assigning them globally.
Each class gets a deterministic base_offset in the global array,
and hyperscan stores intra-class IDs (0..M-1). This prevents
adding/removing a regexp in one class from shifting IDs in all
other classes, eliminating stale hyperscan databases and
unnecessary recompilations.
Key changes:
- Sort classes by class_id, regexps within each class by content hash
- Assign contiguous global IDs per class (base_offset + local_index)
- Use class-local regexp count in per-class hash (not global count)
- Hyperscan compile stores intra-class IDs, callback translates back
- Bump blob magic version to reject old format databases
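The ID scheme can be sketched in a few lines. Here class and regexp ordering keys are simplified stand-ins for class_id and content hash:

```python
# Deterministic per-class ID assignment: each class gets a contiguous
# block [base_offset, base_offset + M) of global IDs; hyperscan only
# ever sees the intra-class IDs 0..M-1.
def assign_ids(classes):
    """classes: {class_id: [regexp, ...]} -> (global_ids, base_offsets)"""
    ids, bases = {}, {}
    base = 0
    for cid in sorted(classes):                     # sort by class_id
        bases[cid] = base
        for local, rx in enumerate(sorted(classes[cid])):  # sort by "hash"
            ids[(cid, rx)] = base + local
        base += len(classes[cid])
    return ids, bases
```

Adding a regexp to one class shifts the base_offsets of later classes, but every intra-class ID (global minus base) is unchanged, so the compiled per-class hyperscan databases remain valid.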
Vsevolod Stakhov [Thu, 26 Feb 2026 12:54:05 +0000 (12:54 +0000)]
[Feature] Wire Lua rspamd_fasttext through maps infrastructure
Add load_map(cfg, path) to rspamd_fasttext module that loads FastText
models via the maps infrastructure (HTTP URLs + file with shared mmap).
The fasttext_embed neural provider now registers models as maps at
config time via a new init callback, enabling shared memory across
workers and automatic reload on map updates.
Vsevolod Stakhov [Wed, 25 Feb 2026 15:26:58 +0000 (15:26 +0000)]
[Fix] Fix SIGSEGV on termination in fasttext map dtor callback
Two bugs in the map callback lifecycle caused a crash during
rspamd_map_remove_all at shutdown:
1. Type mismatch: fin_callback published *target = model pointer
(fasttext_model*), but the dtor cast it to fasttext_map_data* -
the standard map pattern requires *target = data->cur_data.
2. Use-after-free: map->user_data pointed into the fasttext_langdet
object which was destroyed before rspamd_map_remove_all ran.
Fix by allocating the user_data target on cfg->cfg_pool (outlives
the lang detector), following the standard map consumer pattern,
and accessing the model through a get_model() indirection.
Vsevolod Stakhov [Wed, 25 Feb 2026 14:59:53 +0000 (14:59 +0000)]
[Fix] Use 16K map cache header for mmap alignment on ARM64
Apple Silicon requires mmap offsets to be 16K-aligned (page size is
16384, not 4096). Bump RSPAMD_MAP_CACHE_HEADER_SIZE to 16384 to work
on all common architectures.
Vsevolod Stakhov [Wed, 25 Feb 2026 14:51:07 +0000 (14:51 +0000)]
[Feature] Wire fasttext lang detector through maps infrastructure
The fasttext language detector now supports HTTP/HTTPS URLs for model
loading via the maps system, enabling automatic download, disk caching,
periodic reload, and cross-worker mmap sharing.
Changes:
- fasttext_model::load() accepts an offset parameter for mmap at a
non-zero position (used with page-aligned map cache files)
- fasttext_langdet uses rspamd_map_is_map() to detect URLs vs local
paths; URLs go through rspamd_map_add() with RSPAMD_MAP_FILE_NO_READ
- Map callbacks (read/fin/dtor) handle atomic model swap on reload
- Local file paths continue to work as before with direct loading
Vsevolod Stakhov [Wed, 25 Feb 2026 12:57:35 +0000 (12:57 +0000)]
[Feature] Page-aligned map cache header for no_file_read mmap support
Upgrade HTTP map cache file format to use a page-aligned (4096 byte)
header so that no_file_read consumers (CDB, fasttext models) can mmap
the cached file directly at a fixed offset without needing a separate
sidecar file.
Changes:
- Bump cache magic to rmcd2001; old rmcd2000 files are read gracefully
and rewritten on next update
- Header page (4096 bytes) contains struct + etag + zero padding; data
payload always starts at RSPAMD_MAP_CACHE_HEADER_SIZE offset
- For no_file_read maps with HTTP backends, pass the cache file path
to read_callback (instead of payload bytes) with no_file_read_offset
set to 4096; for file backends offset remains 0
- Add rspamd_map_get_no_file_read_offset() public API for consumers
- Refactor cache path computation into rspamd_map_cache_file_path()
helper, removing 4 duplicate hash+snprintf blocks
- Handle all 3 HTTP data delivery paths: live GET (controller),
SHM cache read (scanner workers), disk cache preload (startup)
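The cache file layout described above can be sketched as a fixed-size header page followed by the payload. The struct layout of the fixed part is hypothetical; the commit message only specifies magic + etag + zero padding within a 4096-byte page.

```python
import struct

HEADER_SIZE = 4096          # one page; payload always starts here
MAGIC = b"rmcd2001"

def build_cache_file(etag: bytes, payload: bytes) -> bytes:
    # Hypothetical fixed part: magic, etag length, etag, zero padding.
    header = MAGIC + struct.pack("<I", len(etag)) + etag
    header += b"\x00" * (HEADER_SIZE - len(header))
    return header + payload

def read_payload(blob: bytes) -> bytes:
    assert blob[:len(MAGIC)] == MAGIC
    # A no_file_read consumer can mmap the file at this fixed,
    # page-aligned offset instead of copying the payload.
    return blob[HEADER_SIZE:]
```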
Vsevolod Stakhov [Tue, 24 Feb 2026 21:42:03 +0000 (21:42 +0000)]
[Fix] Propagate source/classification URL flags to query-extracted URLs
When a URL is found inside the query string of another URL (e.g.
http://redir.com/?q=http://target.com), the inner URL now inherits
source/classification flags (FROM_TEXT, CONTENT, SUBJECT, INVISIBLE)
from the outer URL via RSPAMD_URL_FLAG_PROPAGATE_MASK.
Previously, inner URLs only received the QUERY flag, losing all context
about where the parent URL was found. This caused inconsistencies in
plugins that filter URLs by source flags (e.g. RBL content URL filtering).
Also fixes two bugs in the subject path (rspamd_url_task_subject_callback):
- hostlen check used outer URL instead of inner query URL
- QUERY flag was not set on URLs extracted from subject URL queries
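The propagation-mask idea reduces to one bitwise expression. The flag bit values below are illustrative, not Rspamd's real constants:

```python
# Illustrative flag bits (not the real RSPAMD_URL_FLAG_* values).
FLAG_FROM_TEXT = 1 << 0
FLAG_CONTENT   = 1 << 1
FLAG_SUBJECT   = 1 << 2
FLAG_INVISIBLE = 1 << 3
FLAG_QUERY     = 1 << 4
PROPAGATE_MASK = FLAG_FROM_TEXT | FLAG_CONTENT | FLAG_SUBJECT | FLAG_INVISIBLE

def inner_url_flags(outer_flags: int) -> int:
    # The inner query URL keeps its QUERY marker and inherits the outer
    # URL's source/classification context through the mask.
    return FLAG_QUERY | (outer_flags & PROPAGATE_MASK)
```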
Dmitriy Alekseev [Tue, 24 Feb 2026 17:19:40 +0000 (18:19 +0100)]
fix: properly set bayes class labels to S and H for class spam and ham class names, adjust bayes expiry to write occurrences as spam and ham instead of S and H
fix: properly set Bayes class labels to S and H for the spam and ham class names; adjust Bayes expiry to write occurrences as spam and ham instead of S and H
[Conf] Disable Validity SenderScore RBLs by default
Both bl.score.senderscore.com and score.senderscore.com require
a registered MyValidity account to function. Unregistered IPs
receive 127.255.255.255 (blocked) for all queries, making the
RBLs non-functional without prior account setup regardless of
query volume.
Disable senderscore_reputation (score.senderscore.com) by default
and update the senderscore (bl.score.senderscore.com) comment to
reflect the actual registration requirement. Users must register
their querying IPs at https://my.validity.com before enabling
either RBL.
Vsevolod Stakhov [Tue, 24 Feb 2026 13:38:31 +0000 (13:38 +0000)]
[Fix] Move binary msgpack data from KEYS to ARGV in Bayes Redis scripts
When expand_keys is enabled, lutil.template() is applied to all KEYS
arguments of EVALSHA commands. This corrupts binary msgpack blobs by
stripping 0x24 ('$') bytes, breaking str8 headers where the length
byte equals 36. Move non-key arguments (msgpack tokens, config, labels)
to ARGV which is not subject to key expansion.
Also fix msgpack_str_len off-by-one for str32 (4+len -> 5+len).
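Both problems follow directly from the msgpack string encoding. A str8 header is one type byte (0xd9) plus one length byte; for a 36-byte string that length byte is 0x24, the ASCII '$' that key expansion strips. And str32 totals 5 + len (1 type byte + 4 length bytes), the off-by-one the commit fixes. A reference implementation per the msgpack spec:

```python
# Total encoded size (header + body) of a msgpack string, per the
# msgpack specification.
def msgpack_str_len(buf: bytes) -> int:
    t = buf[0]
    if 0xA0 <= t <= 0xBF:                           # fixstr, len in low 5 bits
        return 1 + (t & 0x1F)
    if t == 0xD9:                                   # str8: 2 + len
        return 2 + buf[1]
    if t == 0xDA:                                   # str16: 3 + len
        return 3 + int.from_bytes(buf[1:3], "big")
    if t == 0xDB:                                   # str32: 5 + len (not 4)
        return 5 + int.from_bytes(buf[1:5], "big")
    raise ValueError("not a msgpack string")
```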
Vsevolod Stakhov [Tue, 24 Feb 2026 11:37:27 +0000 (11:37 +0000)]
[Fix] Fix subprocess cleanup race in spawn_process SIGCHLD handler
When the I/O handler finishes reading the subprocess reply before
SIGCHLD arrives, neither handler would call rspamd_lua_cbdata_free:
the I/O handler skips cleanup because dead=FALSE, and the SIGCHLD
handler skips cleanup because replied=TRUE. This leaves the subprocess
in the main process workers table, causing shutdown to wait for a
child that has already exited.
Fix by always calling rspamd_lua_cbdata_free in the SIGCHLD handler
when replied=TRUE, since the I/O handler has already finished and
deferred cleanup to us.
Vsevolod Stakhov [Tue, 24 Feb 2026 11:29:54 +0000 (11:29 +0000)]
[Fix] Use main Lua state in config object destructors
The periodic, cached config, and symbol callback destructors stored a
raw lua_State pointer captured at registration time. If this was a
thread/coroutine state from the thread pool, it could be garbage
collected before the destructor runs during config cleanup, causing a
use-after-free crash in luaL_unref.
Use RSPAMD_LUA_CFG_STATE(cfg) instead, which is the main Lua state
that remains valid throughout config_free until rspamd_lua_close.
Vsevolod Stakhov [Tue, 24 Feb 2026 11:07:50 +0000 (11:07 +0000)]
[Fix] Prevent LuaJIT GC stalls after neural training
The LuaJIT GC atomic phase is non-incremental and processes the entire
gray/grayagain object graph in one uninterruptible pass. After neural
training completes, the controller's Lua heap is bloated with training
temporaries, causing the next GC cycle's atomic phase to stall the
event loop at 100% CPU for an extended period.
Two fixes:
- Force collectgarbage('collect') in ann_trained callback to clean up
training temporaries before they accumulate
- Stop/restart GC around fork() in spawn_process to prevent the child
from inheriting a mid-cycle GC state that triggers thrashing
Vsevolod Stakhov [Mon, 23 Feb 2026 09:14:35 +0000 (09:14 +0000)]
[Feature] Fasttext embed: multi-scale conv1d pooling for text features
Add conv1d output mode to the fasttext_embed provider that applies
multi-scale max-over-time pooling over sliding word windows in Lua,
producing compact feature vectors for the neural plugin's dense ANN.
For each kernel size (default {1, 3, 5}), word vectors are averaged
within sliding windows, then max-pooled across positions per channel.
Each scale's features are L2-normalized independently for balanced
contribution. This replaces the previous approach of feeding raw NCW
matrices into KANN conv1d layers.
Also adds max1d and input3d layer bindings to the KANN Lua API, and
includes conv1d settings (kernel_sizes, conv_pooling, max_words) in
the providers_config_digest for automatic retraining on config change.
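The pooling scheme can be sketched in pure Python. This is an illustration of the algorithm as described, not the Lua code from the provider:

```python
import math

def conv1d_pool(word_vecs, kernel_sizes=(1, 3, 5)):
    """Multi-scale max-over-time pooling: average word vectors inside
    each sliding window of size k, max-pool per channel across window
    positions, then L2-normalize each scale independently so every
    scale contributes comparably to the final feature vector."""
    dim = len(word_vecs[0])
    feats = []
    for k in kernel_sizes:
        n = len(word_vecs) - k + 1
        if n <= 0:                      # fewer words than the kernel:
            n, k = 1, len(word_vecs)    # fall back to pooling over all
        pooled = [
            max(sum(v[d] for v in word_vecs[i:i + k]) / k for i in range(n))
            for d in range(dim)
        ]
        norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
        feats.extend(x / norm for x in pooled)
    return feats
```

The output length is `len(kernel_sizes) * dim`, a compact fixed-size vector regardless of message length, which is what makes it suitable input for a dense ANN.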
Vsevolod Stakhov [Sun, 22 Feb 2026 18:11:44 +0000 (18:11 +0000)]
[Feature] Fasttext embed: SIF word weighting for sentence vectors
Add Smooth Inverse Frequency (SIF) weighting to the fasttext embedding
provider. Common words (the, is, a) get near-zero weight while
distinctive words (viagra, invoice) get high weight, significantly
improving embedding quality without changing dimensionality.
Expose get_word_frequency() from the fasttext shim C++ API and Lua
bindings, returning p(word) = count/ntokens from the model vocabulary.
SIF is enabled by default (sif_weight=true, sif_a=1e-3). Combined with
multi-model mean+max pooling, improves F1 from 0.87 to 0.90 in testing.
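The SIF weight itself is a one-liner, `a / (a + p(word))`; a sketch of how it folds into a sentence vector (names hypothetical, the real code lives in the fasttext_embed provider):

```python
def sif_weight(p_word, a=1e-3):
    """Smooth Inverse Frequency weight. Frequent words (large p) get
    weights near 0, rare distinctive words get weights near 1."""
    return a / (a + p_word)

def sentence_vector(words, vec, p, a=1e-3):
    """Weighted average of word vectors; vec(w) returns the embedding,
    p(w) the vocabulary frequency count/ntokens."""
    dim = len(vec(words[0]))
    acc = [0.0] * dim
    for w in words:
        wgt = sif_weight(p(w), a)
        v = vec(w)
        for d in range(dim):
            acc[d] += wgt * v[d]
    return [x / len(words) for x in acc]
```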
[Minor] Replace hash table with linear search for class deduplication
Number of classes per classifier is always small (N < 10 in practice),
so hash table overhead outweighs its O(1) lookup benefit. Linear search
over the already-built UCL array is simpler and faster here.
Vsevolod Stakhov [Sun, 22 Feb 2026 11:07:49 +0000 (11:07 +0000)]
[Feature] Fasttext embed: multi-model and mean+max pooling
Use all configured language_models for every message by default
(multi_model=true), concatenating vectors from each model for
richer cross-lingual representations.
Add mean+max pooling (pooling="mean_max" default) which concatenates
the average word vector with element-wise max pooling, capturing both
typical and prominent semantic features.
With 2 quantized 50-dim models this produces 200-dim vectors instead
of 50, significantly improving classification (F1 0.51 -> 0.87 in
testing).
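Mean+max pooling is simply a concatenation, doubling the dimensionality per model; a minimal sketch:

```python
def mean_max_pool(vecs):
    """Concatenate the average word vector (typical features) with the
    element-wise max (prominent features); output is 2x the input dim."""
    dim = len(vecs[0])
    mean = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    mx = [max(v[d] for v in vecs) for d in range(dim)]
    return mean + mx
```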
Vsevolod Stakhov [Sat, 21 Feb 2026 17:27:42 +0000 (17:27 +0000)]
[Fix] Fasttext shim: fix binary format parsing and harden against corrupt models
- Fix QMatrix load order: read codesize+codes before PQ (not after)
- Fix PQ centroid count: use dim*ksub (not nsubq*ksub*dsub)
- Fix PQ centroid addressing: match FastText's get_centroids() for last sub-quantizer
- Fix dictionary load: read size_ field before nwords/nlabels
- Fix output matrix: always read qout bool between input and output matrices
- Fix subword n-gram skip: only skip single-char BOW/EOW, not full wrapped word
Add comprehensive sanity checks for all untrusted values from model files:
- Validate dimensions, entry counts, matrix sizes against sane upper bounds
- Overflow-safe multiplication for matrix element counts
- Bounds checks on centroid, codes, and dense matrix data access
- Null pointer guards on all matrix operations
- Replace throwing .at() with bounds-checked pointer return
- Limit string reads to 1024 bytes to prevent runaway allocation
- Return nullptr/false from loaders on validation failure
- Guard Lua bindings against empty/short vectors from get_word_vector
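The overflow-safe multiplication check for matrix sizes follows a standard pattern; sketched here in Python (whose ints do not overflow, so the bound stands in for a fixed-width limit):

```python
# Guard matrix allocations against untrusted rows/cols from a model
# file header, the way fixed-width C++ code must. The bound is an
# illustrative sanity limit, not the shim's actual value.
MAX_ELEMS = 1 << 32

def checked_matrix_size(rows, cols):
    if rows <= 0 or cols <= 0:
        return None
    if rows > MAX_ELEMS // cols:    # rows * cols would exceed the bound
        return None
    return rows * cols
```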
[Feature] WebUI: Add multi-class classifier support to learning UI
- Support /learnclass endpoint with Classifier and Class headers
for multi-class learning.
- Handle both old (array) and new (metadata object) /bayes/classifiers
response formats for backward compatibility.
- Add dynamic UI switching based on classifier type:
* Binary classifiers: show HAM/SPAM upload buttons
* Multi-class classifiers: show class dropdown + Learn button
- Display classifier metadata as text badges in dropdown:
[multi-class] [per-user]
- Hide "All classifiers" option when multi-class classifiers present
(different classifiers may have different class sets).
Vsevolod Stakhov [Sat, 21 Feb 2026 16:46:19 +0000 (16:46 +0000)]
[Minor] Fasttext shim: addressing review comments
- Use ICU U8_NEXT for UTF-8 iteration instead of handcrafted code
- Replace exception-based error handling with fail-bit pattern in
binary_reader, propagating errors via tl::expected
- Replace std::sort with std::reverse after min-heap extraction
Vsevolod Stakhov [Fri, 20 Feb 2026 11:21:38 +0000 (11:21 +0000)]
[Feature] Replace libfasttext with mmap-based built-in shim
Replace the external libfasttext shared library dependency with a
zero-dependency C++20 shim that reads .bin/.ftz models directly.
The large input matrix is mmap'd with MAP_SHARED/PROT_READ so all
worker processes share the same physical pages after fork.
This eliminates:
- C++ exception ABI issues across shared library boundaries
(no more fork-probe hack in lua_fasttext)
- The ENABLE_FASTTEXT cmake option (always compiled in now)
- Per-worker heap copies of the model (~500MB-7GB savings)
[Minor] Use explicit class for colorizing ham/spam in Bayesian statistics table
Use statfile.class field from /stat response to determine CSS class
for table cells instead of guessing from symbol name. Falls back to
symbol name matching for backward compatibility with older Rspamd.
Dmitriy Alekseev [Fri, 20 Feb 2026 16:31:33 +0000 (17:31 +0100)]
[Fix] Fix multiclass Bayes learn cache and classifier isolation
Propagate task->classifier from the HTTP header so class-targeted
learning skips unrelated classifiers early. Validate the requested
class exists in a statfile before tokenisation to fail fast with a
clear 404 error on misconfigured setups.
Replace numeric class_id hash cache keys in learned_ids with direct class
name strings throughout the Redis cache layer (C, Lua, Redis scripts) to
fix uint64_t precision loss through Lua's 53-bit doubles, which caused
equality checks to always fail for arbitrary multiclass class names.
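The precision loss is easy to reproduce: Lua 5.1 numbers are IEEE-754 doubles with a 53-bit mantissa, so distinct 64-bit hash values can collapse to the same double after a round-trip:

```python
# Two distinct 64-bit hash values that round to the same double: at
# magnitude 2**60 adjacent representable doubles are 256 apart.
h1 = (1 << 60) + 1
h2 = (1 << 60) + 2
assert h1 != h2
assert float(h1) == float(h2)   # indistinguishable through a Lua number
```

String class names avoid the round-trip entirely, which is why the fix replaces numeric hash keys with the class names themselves.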
Add RSPAMD_FLAG_CLASSIFIER_MULTICLASS at config load time
to route probability lookups to the correct result type per classifier.
Store multiclass and binary Bayes results under per-classifier mempool
keys (multiclass_result:<name>, bayes_prob:<name>) and update the
set/get API to take a classifier_name parameter, preventing
cross-contamination when multiple classifiers are configured.
Switch multiclass result allocation from heap (g_new0/g_new) to task
memory pool, eliminating the need for rspamd_multiclass_result_free.
Inject can_learn_prob and can_learn_class into the mempool before
invoking Lua learn conditions, replacing the legacy asymmetric
spam_min/ham_max pair with a unified min_prob threshold.
Remove the cl_skipped gate in rspamd_stat_cache_check so the Redis
learned_ids cache is always consulted before learn conditions run for
proper logging and response handling on relearning same emails.
Vsevolod Stakhov [Thu, 19 Feb 2026 17:32:10 +0000 (17:32 +0000)]
[Fix] Backward-compatible version negotiation for multi-flag fuzzy
Use an unused bit (bit 4) in the version byte as a v2 capability flag.
New clients send version 4|0x10 initially; old servers mask to 4 and
work normally, new servers see the cap bit and send v2 replies. Once
a v2 reply is received, subsequent requests use native version 5.
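The negotiation can be sketched as follows; this models the described behavior, not the actual wire-format code, and the helper names are hypothetical:

```python
V2_CAP_BIT = 0x10   # bit 4 of the version byte

def initial_version_byte():
    return 4 | V2_CAP_BIT   # 0x14: reads as version 4 to old servers

def server_reply_kind(version_byte, server_is_v2):
    ver = version_byte & 0x0F    # old servers mask to the low nibble
    if server_is_v2 and (ver >= 5 or version_byte & V2_CAP_BIT):
        return "v2"              # cap bit seen, or native version 5
    return "v1" if ver == 4 else "reject"
```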
Vsevolod Stakhov [Thu, 19 Feb 2026 16:16:18 +0000 (16:16 +0000)]
[Fix] Store small PDF objects without counting toward limit
Small objects are now stored but don't count toward max_pdf_objects,
so padding evasion objects can't exhaust the budget while legitimate
small references (like JavaScript dictionaries) are preserved.
Vsevolod Stakhov [Thu, 19 Feb 2026 14:59:58 +0000 (14:59 +0000)]
[Fix] Defeat PDF object padding evasion in extract_outer_objects
Decouple iteration limit from storage limit so that thousands of tiny
dummy obj/endobj pairs no longer consume all max_pdf_objects slots.
- Add min_obj_content_size (default 32) config: objects smaller than
this are skipped during extraction (they carry no useful content)
- extract_outer_objects now iterates ALL start/end positions but only
stores objects that pass the size filter, up to max_pdf_objects
- attach_pdf_streams similarly iterates all stream positions
- Add timeout checks in both loops to stay within pdf_process_timeout
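The decoupling of iteration from storage reduces to a filtered scan; a sketch with hypothetical names (the real loop also tracks raw byte offsets and the process deadline):

```python
def select_objects(candidates, max_objects=100, min_content_size=32,
                   deadline_hit=lambda: False):
    """Iterate over ALL obj/endobj candidates, but store only objects
    whose content passes the size filter, up to max_objects. Tiny
    padding objects are visited yet never consume storage slots."""
    stored = []
    for content in candidates:
        if len(stored) >= max_objects or deadline_hit():
            break
        if len(content) >= min_content_size:
            stored.append(content)
    return stored
```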