git.ipfire.org Git - thirdparty/rspamd.git/log

[Fix] Fix issues in logstats/mapstats from code review

- Shell-quote paths in io.popen() to prevent injection
- Fix typo: correllations -> correlations in JSON output
- Pre-compile ignored symbol regexes instead of recompiling per call
- Deduplicate score change output logic in logstats
- Use native rspamd_ip equality instead of string comparison in mapstats
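
Quoting each path before it is spliced into the io.popen() command string is what closes the injection hole. Below is a minimal sketch of the standard POSIX single-quote technique. It is written in Python for illustration and is not taken from the Lua code; the `zcat` command and the file name are assumptions.

```python
import shlex


def shell_quote(path: str) -> str:
    """Quote a path so a POSIX shell sees it as exactly one literal word."""
    # Inside single quotes every character is literal except ' itself, so each
    # embedded ' becomes '\'' : close the quote, add an escaped quote, reopen.
    return "'" + path.replace("'", "'\\''") + "'"


# A hostile file name stays a single, inert argument to the command.
hostile = "/var/log/rspamd/x'; rm -rf ~; '.log"
cmd = "zcat " + shell_quote(hostile)
args = shlex.split(cmd)  # how a POSIX shell would tokenize cmd
```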

[Test] Add functional tests for logstats and mapstats

Robot Framework tests covering:
- logstats: JSON output, text output, symbol filter, alpha_score
  warning, stdin mode, scan time display
- mapstats: map loading, inline comments for plain/IP/regexp maps,
  match counts, regexp matching

Includes test data: sample log files, plain/IP/regexp map files,
and a minimal multimap config.

[Fix] Fix missing inline comments in mapstats output

rspamd_regexp_search() truncates captures at the first unmatched
optional group, so when the score group was absent the comment
group was lost. Extract comments with Lua patterns before passing
the line body to rspamd_regexp.

[Minor] Add colored output and TTY-aware progress to logstats/mapstats

Gate spinner and ANSI escape codes behind isatty() so piped output is
clean. Add ansicolors to logstats (Ham/Spam/Junk labels, symbol names,
actions, warnings, summary) and mapstats (map status, match counts,
unmatched warnings).

[Minor] Warn when symbols are filtered by alpha_score

[Minor] Handle blank lines in mapstats maps

Replace regex-based empty line detection with Lua pattern matching
to correctly identify and preserve blank lines instead of treating
them as syntax errors.

[Minor] Use positional argument for log file in logstats/mapstats

[Fix] Fix broken ip_within in mapstats: parse CIDR and use apply_mask return value

rspamd_ip.from_string rejects '/' in CIDR notation, so strip the mask
before parsing. Also apply_mask returns a new IP object rather than
modifying in place, so capture the return values.
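
The corrected ip_within logic can be sketched in Python. Here the standard `ipaddress` module stands in for rspamd_ip, which is an assumption; this is not the mapstats code. The sketch strips the `/len` suffix before parsing, then compares the masked values that the masking step returns.

```python
import ipaddress


def ip_within(ip_str: str, cidr_str: str) -> bool:
    """Return True if ip_str falls inside cidr_str (a bare address means /32 or /128)."""
    # Strip the "/len" suffix first: the address parser only accepts bare addresses.
    addr_part, _, mask_part = cidr_str.partition("/")
    net = ipaddress.ip_address(addr_part)
    ip = ipaddress.ip_address(ip_str)
    if ip.version != net.version:
        return False
    bits = net.max_prefixlen                      # 32 for IPv4, 128 for IPv6
    prefix = int(mask_part) if mask_part else bits
    all_ones = (1 << bits) - 1
    mask = all_ones ^ ((1 << (bits - prefix)) - 1)
    # Masking yields new integers; keep and compare those results.
    return (int(ip) & mask) == (int(net) & mask)
```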

[Feature] Rewrite rspamd_stats.pl and mapstats.pl as rspamadm Lua subcommands

Add rspamadm logstats and rspamadm mapstats commands that replace the
Perl utility scripts utils/rspamd_stats.pl and utils/mapstats.pl.

- lua_log_utils.lua: shared library for log format detection, timestamp
  conversion, compressed file handling, directory scanning, and progress
  spinner
- logstats.lua: full port of rspamd_stats.pl with all options including
  symbol filtering, bidirectional symbols, groups, correlations, score
  multipliers, time range filtering, and JSON output via UCL
- mapstats.lua: full port of mapstats.pl using native rspamd_config for
  multimap access, rspamd_regexp for regex maps with full flag support,
  and rspamd_ip for IP/CIDR matching (no external dependencies)

[Test] Add unit tests for ring hash consistent upstream hashing

Verify consistency, distribution, weight-scaling, stability, and
except-parameter behaviour of the Ketama-style ring hash introduced
in 4ea750466.

[Rework] Replace broken Jump Hash with Ring Hash (Ketama) for consistent upstream hashing

Jump Consistent Hash (Lamping & Veach 2014) only handles bucket
addition/removal at the end of the range.  When an upstream in the
middle failed, the old code rehashed with mum_hash_step and retried
up to 20 times, which destroyed the consistency property: keys that
mapped to the dead node were redistributed randomly instead of
deterministically, and didn't return when the node recovered.

Replace with a Ketama-style ring hash:
- Each alive upstream gets MAX(weight,1)*100 virtual nodes on a
  sorted hash ring (keyed by name, order-independent).
- Lookup is a binary search: O(log(n*v)) instead of O(ln n) * retries.
- When an upstream fails, only its ~1/n fraction of keys slide to the
  next ring point — true minimal disruption.
- When it recovers, the same keys return — true consistency.
- The 'except' parameter walks forward on the ring instead of rehashing.
- Ring is rebuilt lazily (dirty flag set on active/inactive transitions).
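
Here is a compact Python sketch of the ring described above. It makes a few assumptions. MD5 stands in for rspamd's hash function, virtual nodes are keyed as `name#i` (a hypothetical scheme), and the class name `HashRing` is illustrative. The sketch shows the sorted ring, the binary-search lookup, and the forward walk used for `except`.

```python
import bisect
import hashlib


def _point_hash(s: str) -> int:
    # 64-bit hash of a string; MD5 is only a stand-in for rspamd's own hash.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, upstreams: dict):
        """upstreams maps each alive upstream name to its weight."""
        points = []
        for name, weight in upstreams.items():
            # MAX(weight, 1) * 100 virtual nodes per upstream, keyed by name
            for i in range(max(weight, 1) * 100):
                points.append((_point_hash(f"{name}#{i}"), name))
        points.sort()                     # sorted ring: insertion order is irrelevant
        self._hashes = [h for h, _ in points]
        self._names = [n for _, n in points]

    def get(self, key: str, except_name=None):
        """Return the upstream owning key, skipping except_name if given."""
        n = len(self._names)
        if n == 0:
            return None
        # Binary search for the first ring point >= hash(key), wrapping to 0.
        start = bisect.bisect_left(self._hashes, _point_hash(key)) % n
        # Walk forward around the ring instead of rehashing.
        for step in range(n):
            name = self._names[(start + step) % n]
            if name != except_name:
                return name
        return None
```

Because the ring is built from sorted (hash, name) pairs, rebuilding it after an upstream recovers reproduces exactly the same key assignments.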

Merge pull request #5884 from rspamd/vstakhov-ssl-server

Implement HTTPS server support for workers

Merge pull request #5883 from moisseev/webui

[Minor] Update RequireJS to 2.3.8

Merge pull request #5870 from moisseev/mapstats

[Feature] Add mapstats utility for multimap statistics analysis

[Fix] Fix proxy mirror SSL/keepalive config parsing and remove duplicate keepalive block

Add missing ssl and keepalive option parsing to mirror config parser,
and remove duplicate keepalive parsing block in upstream config parser.

[Test] Add SSL server functional tests

Add functional tests for HTTPS server support in the
merged test suite. Tests cover controller and normal
worker SSL endpoints plus plain HTTP coexistence.

[Feature] Auto-detect SSL from bind sockets, remove ssl = true option

Instead of requiring a separate `ssl = true` worker option, automatically
detect SSL need by checking if any bind socket has the ssl flag. Emit an
error if SSL bind sockets are configured but ssl_cert/ssl_key are missing.

[Fix] Fix bind line parsing to use stripped bind_line for SSL suffix

When parsing bind lines with " ssl" suffix, the suffix was stripped from
cnf->bind_line but the original unstripped str was passed to
rspamd_parse_host_port_priority, causing parse failures.
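
The essence of the fix is to split the suffix off once and pass only the stripped string onward. Below is a hypothetical Python helper that illustrates this contract. It is not rspamd's parser, and it handles only the trailing " ssl" form named in the commit.

```python
def split_ssl_suffix(bind_line: str):
    """Return (address_part, is_ssl) for a bind_socket value such as "*:11335 ssl"."""
    stripped = bind_line.strip()
    is_ssl = stripped.endswith(" ssl")
    if is_ssl:
        stripped = stripped[: -len(" ssl")].rstrip()
    # Only this stripped value should reach the host/port parser;
    # passing the original line would leave " ssl" in the address.
    return stripped, is_ssl
```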

[Feature] Implement HTTPS server support for workers

Wire up server-side SSL/TLS for HTTP workers, building on the recently
added rspamd_ssl_accept_fd() infrastructure. This enables HTTPS for
controller, normal, and proxy workers with per-bind-address granularity.

Configuration: `bind_socket = "*:11335 ssl"` plus `ssl_cert` and
`ssl_key` worker options.

- Add rspamd_init_ssl_ctx_server() for server SSL_CTX with cert+key
- Parse trailing " ssl" suffix in bind_socket lines
- Propagate is_ssl flag through bind conf to listen sockets
- Add rspamd_http_connection_accept_ssl() for async SSL handshake
- Fix write_message_common to handle server-side SSL responses
- Add SSL support to HTTP router (set_ssl, handle_socket_ssl)
- Wire up SSL in controller, normal worker, and proxy worker
- Add rspamd_worker_is_ssl_socket() utility for fd-to-SSL lookup

[Minor] Add CLAUDE.md with development guidelines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

[Feature] Add SSL server-side accept support

Add rspamd_ssl_accept_fd() function for server-side SSL handshakes:
- Mirrors existing rspamd_ssl_connect_fd() but for accepting connections
- Adds ssl_conn_init_accept state for server-side SSL state machine
- Handles SSL_accept() with proper WANT_READ/WANT_WRITE event handling

This enables SSL-capable server implementations (e.g. SMTP proxy with STARTTLS).

Merge pull request #5880 from rspamd/vstakhov-check-v3

Add /checkv3 multipart scan endpoint

[Test] Add v3 compression and proxy forwarding tests

Add C++ unit tests for zstd per-part compression round-trip
(serialize and iov paths), mixed compressed/uncompressed parts,
and body_iov segment writability for in-place encryption.

Add Robot functional tests for /checkv3 through the proxy,
both direct multipart and rspamc with zstd compression.

[Fix] protocol: Handle shared memory and whole-body compression for v3 proxy path

When the proxy forwards /checkv3 requests to a local upstream, it uses
shared memory (GET + Shm headers) instead of sending the body inline.
The v3 request handler only read from chunk/len parameters which are
empty in this case. Add Shm/Shm-Offset/Shm-Length header handling to
read the body from the shared memory segment.

Additionally, the proxy may compress the entire response body with zstd
before forwarding to the client. The v3 client finish handler parsed
multipart directly from the compressed body_buf. Add whole-body
decompression (matching the v2 handler) before multipart parsing.

[Feature] protocol: Zero-copy piecewise writev for v3 multipart responses

Add body_iov support to the HTTP message layer so the write path can use
writev with multiple iovec segments instead of a single contiguous buffer.
The v3 multipart response now builds its boundary/header strings and data
pointers as separate iovecs, avoiding extra copies of the UCL result and
rewritten message body. The cryptobox encryption path handles multiple body
segments via encryptv_nm_inplace seamlessly.
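
The piecewise layout can be illustrated with Python's `os.writev`, which gathers several buffers in one system call. This sketch shows the idea only. The boundary, the part headers and the helper names are hypothetical and do not reproduce the exact wire format rspamd produces.

```python
import os


def build_multipart_iov(boundary: str, parts):
    """Build iovec segments for a multipart/mixed body.

    parts is a list of (headers, payload) pairs; payload is any bytes-like
    object and is referenced through a memoryview rather than copied.
    """
    iov = []
    for headers, payload in parts:
        head = f"--{boundary}\r\n"
        head += "".join(f"{name}: {value}\r\n" for name, value in headers.items())
        head += "\r\n"
        iov.append(head.encode())
        iov.append(memoryview(payload))      # zero-copy reference to the data
        iov.append(b"\r\n")
    iov.append(f"--{boundary}--\r\n".encode())
    return iov


def write_iov(fd: int, iov) -> int:
    # One writev() call gathers all segments; a real writer must also handle
    # short writes by resuming from the first unwritten segment.
    return os.writev(fd, iov)
```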

[Feature] protocol: Add v3 multipart response parsing for proxy and body decompression

Proxy forwarding now handles multipart/mixed responses from /checkv3:
parse result+body parts, decompress zstd, detect msgpack, and forward
rewritten body to milter. Self-scan v3 populates conn->results for
milter and Lua comparison scripts. rspamc client decompresses zstd
body parts returned by the server.

[Refactor] protocol: Deduplicate v2/v3 request and reply handling

Extract shared helpers to eliminate duplicated logic between
rspamd_protocol_handle_headers (v2) and rspamd_protocol_handle_metadata (v3),
as well as between rspamd_protocol_http_reply and rspamd_protocol_http_reply_v3.

Request-side helpers: rspamd_protocol_set_from_envelope,
rspamd_protocol_set_ip, rspamd_protocol_set_settings_id,
rspamd_protocol_set_log_tag, rspamd_protocol_add_mail_esmtp_arg,
rspamd_protocol_add_rcpt_esmtp_arg.

Reply-side helpers: rspamd_protocol_update_history_and_log,
rspamd_protocol_update_stats, rspamd_protocol_get_rewritten_body.

[Feature] rspamc: Add --msgpack flag for v3 protocol

Add --msgpack option to rspamc that sends metadata as msgpack instead
of JSON and requests msgpack responses when using --protocol-v3.

The client serializes metadata via UCL_EMIT_MSGPACK, sets the metadata
part Content-Type to application/msgpack, and sends Accept:
application/msgpack so the server returns results in msgpack format.

Add functional tests for rspamc v3 with zstd compression, httpcrypt
encryption, msgpack metadata, and encrypted+msgpack combinations.

[Fix] protocol: Pass v3 Content-Type as mime_type parameter

The v3 reply builder was adding Content-Type (multipart/mixed with
boundary) as an HTTP header via rspamd_http_message_add_header, while
setting ctype=NULL for rspamd_http_connection_write_message. With NULL,
the HTTP library defaults to "text/plain", so the client never saw the
multipart Content-Type and fell through to plain UCL parsing.

Fix by returning the Content-Type string (pool-allocated) from
rspamd_protocol_http_reply_v3 and passing it as the mime_type parameter
directly. Also fix the same pattern in rspamd_proxy.c.

Update v3 error test expectations from 400 to 500 to match the existing
error code mapping formula (500 + err_code % 100).

[Fix] Add missing includes for Linux/GCC build

Add <cctype> for std::tolower and <string> for std::string
in multipart_form.cxx. These are transitively included on
macOS/clang but not on Linux/GCC.

[Test] Add MIME-in-message tests for /checkv3

Verify that messages with their own MIME structure (multipart/alternative,
multipart/mixed with attachments) are preserved intact when wrapped in
the outer form-data envelope. Unit tests confirm inner MIME boundaries
don't confuse the outer parser; functional tests confirm end-to-end
symbol detection (R_PARTS_DIFFER, MIME_HTML_ONLY) works via /checkv3.

[Test] Add tests for /checkv3 multipart endpoint

C++ unit tests (23 cases): multipart form parser, response builder,
and round-trip serialization. Robot Framework functional tests (6 cases):
GTUBE scan, metadata handling, settings_id, and error cases for missing
parts and malformed boundaries. Python helpers for building and parsing
multipart/form-data requests and multipart/mixed responses.

[Feature] Add v3 request validation and use safe UCL parser flags

Enforce max 2 parts (metadata + message) in /checkv3 multipart requests,
returning HTTP 400 for malformed requests with extra parts. Switch UCL
parser to UCL_PARSER_SAFE_FLAGS to disable macros/includes in untrusted
metadata input.

Merge branch 'master' into vstakhov-check-v3

[Fix] tests: Update URL expectation to match percent-encoded spaces

After commit 9f3a41069, URL tostring() re-encodes spaces as %20, so the
functional test must expect the encoded form.

[Minor] Update RequireJS to 2.3.8

[Test] Add MIME_HTML_ONLY test for multipart/mixed with html and non-text attachment

Add test case for multipart/mixed containing text/html + application/zip
to ensure MIME_HTML_ONLY fires when HTML is the only text part alongside
a non-text attachment.

[Fix] lua_content: Move PDF ligature substitutions from string unescape to text handler

StandardEncoding/MacRomanEncoding ligature substitutions (e.g. byte 0xAD -> 'ffl')
were applied to all PDF strings including /URI annotation values. This corrupted
soft hyphens (U+00AD) in URLs, preventing the URL parser from detecting zero-width
space obfuscation and setting the ZW_SPACES flag.

Move ligature substitutions to text_op_handler where they belong, so they only
apply to rendered text content (Tj/TJ operators), not to dictionary string values.

[Feature] arc: Add trusted_authserv_id option for reuse_auth_results

Allow configuring which Authentication-Results header to trust when
reuse_auth_results is enabled, by matching the authserv-id field.

Closes #5881

[Fix] lua_url: Re-encode control characters and spaces in URL tostring

The URL parser (rspamd_url_decode) decodes percent-encoded sequences
like %20 back to literal characters in the internal representation.
When tostring() returned these decoded URLs, spaces and control chars
would break subsequent re-parsing (e.g., in url_redirector redirect
chains and Redis cache round-trips). Fix by re-encoding characters
<= 0x20 on serialization, matching browser behavior: decode internally
for matching, re-encode on copy.
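
Below is a minimal Python sketch of the serialization rule. It assumes the decoded URL is held as text, and the function name is hypothetical.

```python
def reencode_url(decoded: str) -> str:
    """Percent-encode every character <= 0x20 (controls and space) for output."""
    out = []
    for ch in decoded:
        code = ord(ch)
        # Characters above 0x20 are emitted as-is; the rest become %XX.
        out.append(f"%{code:02X}" if code <= 0x20 else ch)
    return "".join(out)
```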

[Fix] re_cache: Always use charset-converted content for SARAWBODY matching

Use utf_raw_content (charset-converted UTF-8 with HTML tags preserved)
for all SARAWBODY patterns, regardless of /u flag presence. The previous
approach used utf_content (which strips HTML tags on HTML parts) and only
for classes containing /u patterns, leaving non-/u patterns matching
against raw bytes in the original charset.

This prevents trivial bypass of SA rawbody rules via exotic encodings
like UTF-16 and ensures consistent matching across PCRE and Hyperscan.
Falls back to transfer-decoded parsed content only when charset
conversion failed.

Merge pull request #5871 from KIT-CERT/fix_ratelimits

fix dynamic bucket-specific rate-limits

Merge pull request #5874 from rspamd/vstakhov-proxy-balancing

Feature: Token bucket load balancing for proxy upstreams

[Test] upstream: add token bucket unit tests

15 doctest test cases covering token bucket load balancing:
basic selection, cost formula, token return/penalty, least-loaded
preference, except parameter, exhaustion fallback, fair distribution,
custom config, empty list, null safety, large messages, multiple
inflight, mixed success/failure, and generic API fallback.

[Fix] upstream: fix stale heap_idx in token bucket

The intrusive heap swaps entire structs during swim/sink, making
up->heap_idx stale after any heap modification. The update function
would silently skip updates when the cached index pointed to a
different upstream, breaking load distribution across backends.

Fix by falling back to linear search on cache miss and refreshing
heap_idx after every heap update. Also add underflow warning for
double-return detection and improve API documentation.

Merge branch 'master' into vstakhov-proxy-balancing

[Feature] Add /checkv3 multipart scan endpoint

Implement a new /checkv3 endpoint that uses multipart/form-data for
requests and multipart/mixed for responses. Metadata (from, rcpt, ip,
settings, etc.) is sent as a structured JSON/msgpack part instead of
HTTP headers. The response includes a "result" part and an optional
"body" part for rewritten messages.

New C++ multipart parser and response builder with C bridge functions.
Per-part zstd compression support. Client-side support via rspamc
--protocol-v3 flag. Proxy self-scan path updated for v3.

[Refactor] fuzzy storage: split helper code (#5875)

Merge pull request #5878 from moisseev/webui

[Fix] WebUI: Allow computing fuzzy hashes without writable storages

[Fix] Fix printf format specifiers found by clang-plugin

[Fix] clang-plugin: add null check for struct type in check_struct_type

[Fix] clang-plugin: suppress noisy remarks and fix SANITIZER macro conflict

[Fix] clang-plugin: fix build with modern LLVM/Clang

[Fix] Use %ud instead of %u in rspamd printf format strings

[Fix] re_cache: Use debug level for missing Lua backend during config

During config initialization (configtest, startup), there's no event
loop available so the Lua backend cannot be initialized. This is
expected behavior - use debug level when try_load=true to avoid
noisy warnings during configtest.

[Test] Add test cases for MIME_HTML_ONLY with malformed multipart

Add tests for edge cases that caused a segfault when multipart/related
has no children or contains only non-text content:

- alternative-nested-rfc822.eml: multipart/alternative with HTML and
  related containing only image (no text), plus nested message/rfc822
- alternative-empty-related.eml: multipart/alternative with malformed
  related that has no proper MIME children

These test cases verify the NULL check fix for mp->children.

[Fix] message: Add NULL check for mp->children in alternative detection

The multipart children array can be NULL in some edge cases. Add NULL
checks before accessing mp->children->len to prevent segfault in
rspamd_mime_part_find_text_in_subtree() and related code paths.

[Fix] re_cache: Use debug level for startup hyperscan load failures

During worker startup, a "best-effort" synchronous hyperscan load is
attempted before hs_helper has finished compiling. When files don't
exist yet, the "no valid expressions" message was logged at info level,
which is noisy and misleading since this is expected startup behavior.

Changed to use debug level when try_load=true (startup probe), while
keeping info level for actual failures. Workers will receive async
notifications when hs_helper finishes compiling.

[Fix] WebUI: Allow computing fuzzy hashes without writable storages

[Fix] re_cache: Respect disable_hyperscan option in loading functions

Add checks for disable_hyperscan at the start of hyperscan loading
functions to prevent database loading when the option is set.

Previously, hyperscan databases would still be loaded even with
disable_hyperscan = true, causing unnecessary I/O and memory usage.

[Fix] re_cache: Use charset-converted content for UTF-8 SARAWBODY patterns

When SARAWBODY regexp class contains UTF-8 patterns (/u flag), use
utf_content (charset-converted UTF-8 with HTML preserved) instead of
parsed content. This allows Unicode patterns like \x{200b} to match
correctly.

For non-UTF patterns, continue using parsed content with raw mode
for backward compatibility with raw byte matching.

This fixes "bad utf8 input for JIT re" errors when using Unicode
patterns in rawbody rules on non-UTF-8 encoded messages.

[Fix] re_cache: Always use raw mode for SARAWBODY regexps

The parsed content is transfer-decoded (base64/QP) but NOT charset-converted,
so it may contain non-UTF-8 data even when IS_TEXT_PART_UTF is true.

Using dynamic raw flag based on IS_TEXT_PART_UTF was incorrect because that
flag indicates whether utf_content is valid UTF-8, not whether parsed content
is valid UTF-8.

Bug introduced in 0d62dd6513 (1.8.3), this restores the original behavior of
always treating SARAWBODY content as raw.

[Feature] re_cache: Improve hyperscan loading log messages

- Show regexp count alongside class count (e.g., "4662 regexps (42/42 classes)")
- Display loaded/total classes ratio for partial loads (e.g., "38/42 classes")
- Log missing classes with reasons (not cached, empty data, load failed)
- Add consistent logging to async loading path
- Include class type and type_data in missing class messages for easier debugging

[Fix] Use rspamd printf format specifiers instead of GNU

Fix format strings in hs_cache_backend.c and re_cache.c to use rspamd's
custom printf specifiers:
- %uz for gsize (unsigned size_t), not %z
- %ud for unsigned int, not %u

The GNU format specifiers cause crashes on some platforms.

[Fix] lua_magic: avoid misdetecting HTML with embedded SVG as SVG

Add svg_format_heuristic that checks for HTML markers (<!DOCTYPE html>,
<html>, <head>, <body>, <meta>) before the <svg> tag position. If HTML
markers are present, skip SVG detection and let the text heuristic
properly classify the content as HTML.

Add functional test with HTML containing embedded SVG (should detect as
HTML) and standalone SVG (should still detect as SVG).
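
The heuristic reduces to a positional check. Below is a Python sketch. It assumes case-insensitive matching and treats each marker as a tag prefix; both choices are assumptions, and the function name is hypothetical.

```python
# Markers listed in the commit; matching them case-insensitively as tag
# prefixes (so "<html lang=...>" counts) is an assumption of this sketch.
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body", "<meta")


def looks_like_svg(content: str) -> bool:
    """True only if an <svg> tag appears with no HTML marker before it."""
    lower = content.lower()
    svg_pos = lower.find("<svg")
    if svg_pos < 0:
        return False
    before = lower[:svg_pos]
    return not any(marker in before for marker in HTML_MARKERS)
```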

[Fix] Hyperscan cache: use Lua backend for sync loading, load on worker startup

Two issues addressed:

1. Sync loading now uses Lua backend exclusively instead of duplicating
   file loading logic in C. The Lua backend handles files, compression,
   and future backends (redis, http) uniformly.

2. Workers now proactively load hyperscan on startup after Lua backend
   is initialized. This fixes a race condition where workers spawned
   after hs_helper broadcasts HYPERSCAN_LOADED would never receive the
   notification and run without hyperscan acceleration.

Changes:
- Add rspamd_hs_cache_lua_load_sync() and rspamd_hs_cache_lua_exists_sync()
  to call Lua backend's sync methods from C
- Remove duplicated C file loading code from re_cache.c (zstd decompress,
  file path checking) - Lua backend handles this
- rspamd_re_cache_load_hyperscan() now requires Lua backend
- Workers try sync load on startup (best-effort, falls back to PCRE)

[Fix] Rework alternative parts detection

convert tabs to spaces

[Fix] R_PARTS_DIFFER: handle multipart/related in alternative

fix linter errors

[Feature] proxy: enable token bucket load balancing by default

Add token_bucket configuration to the default upstream in worker-proxy.inc
with sensible defaults (max_tokens=10000, scale=1024, base_cost=10).

[Feature] proxy: implement token bucket load balancing for upstreams

Add weighted load balancing algorithm that considers message size and
current backend load when selecting upstreams. Each upstream has a token
pool that gets depleted proportionally to message size and replenished
when requests complete successfully.

- Add RSPAMD_UPSTREAM_TOKEN_BUCKET rotation type
- Implement min-heap based selection for O(log n) upstream selection
- Reserve tokens proportional to message size (base_cost + size/scale)
- Return tokens on success (restores available) or failure (lost)
- Fall back to least-loaded upstream when all are token-exhausted
- Add UCL configuration: token_bucket { max_tokens, scale, min_tokens, base_cost }
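
Below is a simplified Python model of the selection policy, assuming the defaults from worker-proxy.inc. For brevity it uses a linear scan where the commit uses a min-heap, and it does not model `min_tokens`. The class and method names are hypothetical.

```python
class TokenBucketBalancer:
    """Token bucket upstream selection (illustrative; linear scan, not a heap)."""

    def __init__(self, names, max_tokens=10000, scale=1024, base_cost=10):
        self.max_tokens = max_tokens
        self.scale = scale
        self.base_cost = base_cost
        self.available = {n: max_tokens for n in names}
        self.inflight = {n: 0 for n in names}

    def cost(self, size: int) -> int:
        # Cost grows with message size: base_cost + size / scale
        return self.base_cost + size // self.scale

    def select(self, size: int, except_name=None):
        """Return (upstream, reserved_tokens) or (None, 0) if nothing is eligible."""
        candidates = [n for n in self.available if n != except_name]
        if not candidates:
            return None, 0
        need = self.cost(size)
        # Prefer the upstream with the most tokens left (the least loaded one).
        best = max(candidates, key=lambda n: self.available[n])
        if self.available[best] < need:
            # Every candidate is token-exhausted: fall back to fewest in-flight requests.
            best = min(candidates, key=lambda n: self.inflight[n])
        reserved = min(need, self.available[best])
        self.available[best] -= reserved
        self.inflight[best] += 1
        return best, reserved

    def complete(self, name, reserved: int, success: bool):
        self.inflight[name] -= 1
        if success:
            # Success restores the reserved tokens; on failure they stay lost.
            self.available[name] = min(self.max_tokens, self.available[name] + reserved)
```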

[Fix] R_PARTS_DIFFER: also handle parts without words

The previous fix only handled truly empty parts. This also handles
the case where a part has content but no extractable words (e.g.,
6 bytes of whitespace in text/plain vs 142 words in text/html).

Now check if exactly one part has normalized_hashes with words,
regardless of whether parts are marked as empty.

[Fix] R_PARTS_DIFFER: handle empty alternative parts

Previously, R_PARTS_DIFFER only triggered when both text/html and
text/plain parts had content. When one part was empty (e.g., empty
HTML with non-empty plain text), the distance was never calculated.

Now detect when exactly one part is empty and treat it as 100%
difference, which will trigger R_PARTS_DIFFER with maximum score.

fix dynamic bucket-specific rate-limits

[Minor] Fix security issues and add security documentation

- Fix command injection in configdump(): use list form open() instead of
  backticks with string concatenation
- Fix ReDoS in ProcessLog(): escape regex metacharacters in symbol names
  with \Q...\E when matching against log symbols
- Add SECURITY CONSIDERATIONS section to POD documenting trust assumptions
  for map files, configuration, and log files

[Feature] Add mapstats utility for multimap statistics analysis

Introduce a new utility to analyze Rspamd logs and count matches for
multimap module patterns. This helps identify ineffective map entries
and optimize multimap configurations.

Features:
- Parse file-based multimap configurations from rspamadm configdump
- Load and validate map files with support for:
  * IP/CIDR patterns (IPv4/IPv6)
  * Regular expressions with PCRE flags (imsxurOL)
  * Plain string patterns (domains, hostnames, etc.)
  * Full-line and inline comments
- Process Rspamd logs (plain or compressed: bz2, gz, xz, zst)
- Match log entries against map patterns
- Generate statistics report grouped by map source files
- Show match counts and comments for each pattern
- Report unmatched symbol values for debugging

Installation:
- Added to CMake install rules (installs as 'mapstats')
- Added to RPM spec file
- Debian packaging works automatically via CMake

Usage:
mapstats -l /var/log/rspamd/rspamd.log
mapstats -l /var/log/rspamd -n 5        # Last 5 rotated logs
mapstats -l /var/log/rspamd --start 2024-01-01 --end 2024-01-31

Requirements:
- Perl 5.14+
- JSON::PP (core module)
- NetAddr::IP (optional, for type=ip maps only)

[Fix] worker_util: add hyperscan handlers to controller

Controller was missing hyperscan/multipattern/regexp_map hot-swap
handlers, causing it to stay on ACISM fallback while normal workers
switched to hyperscan.

[Fix] multipattern: fix TLD pattern matching after hyperscan hot-swap

The hyperscan TLD pattern suffix (?:[^a-zA-Z0-9]|$) was consuming the
boundary character, causing match length to be one character too long.
This broke URL detection in url_tld_end() after workers hot-swapped
from ACISM to hyperscan.

Root cause: For input "adobesign.github.io/?u=xxx":
- ACISM pattern ".github.io" -> match length 10, p points to "/"
- Hyperscan pattern "\.github.io(?:[^a-zA-Z0-9]|$)" -> match length 11,
  p points to "?" (one past expected)

url_tld_end() checks if *p == '/' which failed for hyperscan.

Fix: Remove suffix from hyperscan TLD patterns and add boundary checking
in rspamd_multipattern_hs_cb(), mirroring what ACISM callback already does.
This ensures consistent match length between both backends.
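
The boundary check that the commit moves into the callback can be sketched in Python as a hypothetical helper. For simplicity it is ASCII-only and case-sensitive. The character after the suffix is inspected but is not counted in the match length.

```python
import string

_ALNUM = set(string.ascii_letters + string.digits)


def tld_match_end(text: str, start: int, suffix: str) -> int:
    """Return the end offset of suffix at text[start:], or -1 if it does not match.

    The character after the match is inspected but not consumed, so the
    returned offset points at the boundary character (e.g. "/").
    """
    end = start + len(suffix)
    if text[start:end] != suffix:
        return -1
    if end < len(text) and text[end] in _ALNUM:
        return -1                         # ".github.io" inside ".github.iox" is not a TLD hit
    return end
```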

[Fix] Make handling of unknown and broken DKIM keys conform to the RFC

[Feature] headers_checks: add Reply-To address validity checks

Add RFC 5321 compliance checks for Reply-To header:
- REPLYTO_INVALID: address doesn't pass RFC 5321 validation
- REPLYTO_LOCALPART_LONG: local-part exceeds 64 characters
- REPLYTO_DOMAIN_LONG: domain exceeds 255 characters

This helps detect spam with intentionally invalid Reply-To addresses
that users cannot actually reply to.

Closes: #5854
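
The two length limits translate into simple checks. Below is a Python sketch that assumes the address has already been extracted from the header. It models only the split at the last `@` and the 64/255 limits, not full RFC 5321 syntax. The function name is hypothetical.

```python
def replyto_length_symbols(addr: str) -> set:
    """Return the Reply-To symbols a simple structural/length check would raise.

    Only the structural and length rules are modelled; full RFC 5321
    syntax validation is out of scope for this sketch.
    """
    local, at, domain = addr.rpartition("@")
    if not at or not local or not domain:
        return {"REPLYTO_INVALID"}
    symbols = set()
    if len(local) > 64:
        symbols.add("REPLYTO_LOCALPART_LONG")
    if len(domain) > 255:
        symbols.add("REPLYTO_DOMAIN_LONG")
    return symbols
```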

Merge pull request #5869 from fatalbanana/antivirus_docs

[Minor] antivirus: fix `whitelist` description

Merge branch 'master' into antivirus_docs

[Fix] re_cache: fix stale hyperscan ID handling during config reload

When hyperscan cache is reloaded, the old hs_ids array may contain indices
that now point to different regexps in cache->re due to regexp reordering
after config reload. This caused two issues:

1. Cleanup used stale hs_ids to reset match_type, potentially resetting
   wrong regexps while leaving the actual ones with match_type=HYPERSCAN
   but hs_scratch=NULL, causing assertion failures.

2. If validation failed mid-loop while setting match_types, some regexps
   would already have match_type=HYPERSCAN before we freed scratch,
   leaving them in an inconsistent state.

Fix:
- Iterate re_class->re hash table (actual regexps in class) during cleanup
  instead of using potentially stale hs_ids
- Split validation and match_type setting into separate loops so we only
  set match_types after ALL IDs are validated

Both file-based and Redis-based loading paths are fixed.
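
The second fix is a validate-then-mutate pattern. Below is a Python sketch with hypothetical names that models match types as a plain list indexed by regexp ID.

```python
def apply_hyperscan_ids(hs_ids, class_members, match_types) -> bool:
    """Mark regexps as Hyperscan-backed only if every stored ID is valid.

    hs_ids: IDs read from the cached database
    class_members: set of regexp IDs that belong to the current re_class
    match_types: list indexed by regexp ID ("PCRE" or "HYPERSCAN"), mutated in place
    """
    # Pass 1: validate everything, mutating nothing.
    for rid in hs_ids:
        if not (0 <= rid < len(match_types)) or rid not in class_members:
            return False                  # stale data: caller discards it and recompiles
    # Pass 2: only now flip match types, so a failure can't leave a partial state.
    for rid in hs_ids:
        match_types[rid] = "HYPERSCAN"
    return True
```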

[Minor] antivirus: fix `whitelist` description

Merge pull request #5866 from rspamd/vstakhov-fuzzy-ratelimit-scripts

Enhance fuzzy blacklist handler with richer context

[Fix] cryptobox: properly bypass RHEL/CentOS 10+ crypto-policies for SHA-1 DKIM

RHEL/CentOS 10+ disables SHA-1 signatures via the rh-allow-sha1-signatures
config option, which is not bypassed by simply creating a new OSSL_LIB_CTX.

This fix:
- Creates a temporary OpenSSL config file with rh-allow-sha1-signatures=yes
- Loads it into the dedicated SHA-1 library context via OSSL_LIB_CTX_load_config
- Improves error messages to include algorithm name and RHEL-specific hints
- Captures OpenSSL error details when EVP_PKEY_verify fails
- Adds troubleshooting guidance in error messages

On non-RHEL systems, the config option is simply ignored.

[Fix] lua_hs_cache: add defensive checks for zstd_decompress

Add type checking before decompression to catch unexpected data types
from Redis. Wrap zstd_decompress in pcall to gracefully handle any
errors and provide detailed diagnostic logging when failures occur.

[Fix] re_cache: stop timer during async operations to prevent re-entry

Stop the timer before calling exists_async and save_async to prevent
the timer from firing multiple times while an async callback is pending.
Without this, the repeating timer (0.1s interval) could fire again before
the async operation completes, causing multiple concurrent async calls
for the same re_class. This led to race conditions and use-after-free
crashes when callbacks completed out of order.

[Fix] re_cache: initialize async context to prevent random callback skipping

Use g_malloc0 instead of g_malloc when allocating rspamd_re_cache_async_ctx
to ensure callback_processed flag is initialized to FALSE. Without this,
the flag could randomly be TRUE (from uninitialized memory), causing the
exists/save callbacks to be silently skipped and preventing re_class
compilation from proceeding after cache misses.

[Fix] hs_helper: fix use-after-free in Redis async cache callbacks

Remove dangerous ev_run(EVRUN_NOWAIT) calls from inside Redis callback
chains in rspamd_hs_helper_mp_exists_cb and rspamd_hs_helper_remap_exists_cb.

Calling ev_run() inside a callback can trigger Lua GC which may try to
finalize lua_redis userdata while we're still processing the callback,
causing lua_redis_gc to access already-freed memory.

Also add missing REF_RELEASE calls at the end of both callbacks to properly
release the reference from exists_async (matching the exists==true path).

Revert "[Fix] lua_redis: add defensive check in GC handler"

This reverts commit e753e063c2102d920b8b2ea15f675de04b1aaa99.

[Fix] lua_redis: add defensive check in GC handler

Add validation in lua_redis_gc to check that the context pointer
appears valid before releasing. This prevents crashes when Lua GC
collects stale userdata pointing to already-freed memory.

[Fix] hs_helper: fix use-after-free in async hyperscan cache callbacks

Add proper refcounting to async compilation contexts to prevent
use-after-free when Redis callbacks are invoked multiple times
(timeout + response) or during worker termination.

- Add ref_entry_t to async context structures
- Use REF_RETAIN before async operations and REF_RELEASE in callbacks
- Add callback_processed flag to prevent double processing
- Save entry data before ev_run that might free pending arrays

Merge pull request #5863 from moisseev/full-hashes

[Feature] WebUI: Add fuzzy hash copy and delist buttons

[Feature] fuzzy_storage: enhance blacklist handler with richer context

Pass extended information to Lua blacklist handlers:
- event_type: distinguish "new", "existing", or "blacklist" events
- ratelimit_info: bucket state (level, burst, rate, exceeded_by)
- digest: hash when session context is available
- extensions: domain and source IP from fuzzy extensions

Backwards compatible - existing handlers still receive ip and reason
as first two arguments. New arguments are optional for Lua handlers.

Optimized with early-exit checks when no handlers are registered.

[Minor] WebUI: Deduplicate fuzzy hashes in multipart messages

[Fix] cryptobox: remove redundant obj_mac.h include

NID_sha1 is already available through evp.h -> objects.h -> obj_mac.h
include chain, so explicit include is unnecessary.

[Fix] cryptobox: bypass RHEL/CentOS 10 crypto-policies for SHA-1 DKIM verification

RHEL/CentOS 10+ crypto-policies disable SHA-1 for signatures by default,
causing rsa-sha1 DKIM verification to fail. This is problematic as many
legitimate emails still use rsa-sha1 DKIM signatures.

For OpenSSL 3.0+, create a dedicated OSSL_LIB_CTX that bypasses system
crypto-policies specifically for SHA-1 DKIM signature verification.
SHA-256/SHA-512 verifications continue using the normal system context.

The legacy context is lazily initialized only when SHA-1 verification
is needed, avoiding overhead for modern rsa-sha256 signatures.

[Fix] re_cache: detect and handle stale hyperscan files with mismatched regexp IDs

When the re_class hash doesn't include global regexp indices, regexps can
be reordered while the class hash stays the same. This causes old hyperscan
files to be loaded with stale position IDs that point to wrong regexps
(possibly in different re_classes with no hyperscan loaded), leading to
assertion failures when hs_scratch is NULL.

Fix:
- Reset match_type to PCRE during cleanup before freeing hs_ids, preventing
  stale HYPERSCAN flags if reload fails
- Validate that each stored ID points to a regexp belonging to the current
  re_class before setting match_type
- Delete stale hyperscan files to trigger recompilation by hs_helper
- For Redis cache, stale entries will expire or be overwritten

Both file-based and Redis-based loading paths are fixed.