Vsevolod Stakhov [Fri, 12 Dec 2025 20:24:26 +0000 (20:24 +0000)]
[Fix] Handle connection errors with io_uring backend in HTTP client
When using io_uring, POLLERR is reported as both EV_READ and EV_WRITE.
This caused connection failures (e.g., ECONNREFUSED) to be misinterpreted
as early server responses. Check SO_ERROR before attempting to read when
the connection hasn't been established yet.
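The SO_ERROR check can be sketched in C as follows. This is an illustrative sketch, not rspamd's HTTP code: the helper names and the `SKETCH_EV_*` constants are hypothetical (the values mirror libev's `EV_READ`/`EV_WRITE`). The idea is that before the connection is known to be up, a "readable" event is verified against the socket's pending error instead of being treated as a server response.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Illustrative event bits; libev uses 0x01 for EV_READ and 0x02 for EV_WRITE. */
#define SKETCH_EV_READ  0x01
#define SKETCH_EV_WRITE 0x02

enum sketch_action { SKETCH_READ, SKETCH_WRITE, SKETCH_CONN_ERROR };

/* Return the pending socket error (0 if none), or errno if getsockopt fails.
 * Note: reading SO_ERROR also clears the pending error. */
static int sketch_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1) {
        return errno;
    }
    return err;
}

/*
 * Classify an I/O event. Until the connection is established, POLLERR may
 * arrive as EV_READ | EV_WRITE (as with io_uring), so check SO_ERROR first
 * rather than assuming the server sent an early response.
 */
static enum sketch_action sketch_classify(int fd, int events, bool connected,
                                          int *err_out)
{
    if (!connected) {
        int err = sketch_pending_error(fd);

        if (err != 0) {
            *err_out = err; /* e.g. ECONNREFUSED */
            return SKETCH_CONN_ERROR;
        }
    }
    return (events & SKETCH_EV_READ) ? SKETCH_READ : SKETCH_WRITE;
}
```

A healthy socket reports no pending error, so the event is handled normally; a refused connect would instead surface as `SKETCH_CONN_ERROR`.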
[Fix] Correct symbols column index in history and scan tables
Fixes a regression introduced in 62b136a where sorting fails with
"can't access property 'sortValue', val.options is undefined" on
the History tab, and symbol reordering doesn't work on the Scan tab.
The "file" column addition shifted the symbols column index, but
history.js and upload.js were not updated, causing symbol reordering
to target the wrong columns.
[Feature] Add multipart and msgpack formatters to metadata_exporter
- Add multipart formatter for HTTP export using form-data with separate
metadata (JSON) and message (rfc822) parts
- Add msgpack formatter for efficient binary serialization
- Add json_with_message formatter for JSON with base64-encoded message
- Deprecate meta_headers option (broken by design for complex data)
- HTTP pusher now auto-detects multipart boundary from formatter
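As a rough illustration of the body a multipart formatter produces, here is a C sketch that assembles a two-part `multipart/form-data` payload. The part names `metadata` and `message` are assumptions for illustration, not necessarily what metadata_exporter emits. The matching HTTP header would be `Content-Type: multipart/form-data; boundary=<boundary>`, which is the value the pusher needs to pick up from the formatter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a two-part multipart/form-data body: JSON metadata plus an RFC 822
 * message. Part names are illustrative assumptions. The boundary must not
 * occur inside either part. Returns the body length, or -1 if out is too small.
 */
static int sketch_build_multipart(char *out, size_t outlen, const char *boundary,
                                  const char *json_meta, const char *message)
{
    int n = snprintf(out, outlen,
                     "--%s\r\n"
                     "Content-Disposition: form-data; name=\"metadata\"\r\n"
                     "Content-Type: application/json\r\n"
                     "\r\n"
                     "%s\r\n"
                     "--%s\r\n"
                     "Content-Disposition: form-data; name=\"message\"\r\n"
                     "Content-Type: message/rfc822\r\n"
                     "\r\n"
                     "%s\r\n"
                     "--%s--\r\n",
                     boundary, json_meta, boundary, message, boundary);

    if (n < 0 || (size_t) n >= outlen) {
        return -1;
    }
    return n;
}
```

Compared with stuffing metadata into headers, separate parts keep nested JSON intact and let the receiver parse the raw message unchanged.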
[Fix] Only apply early response handling for HTTP clients
The early response detection logic should not run for server-side
connections, as it incorrectly modifies wr_pos state when the server
reads incoming requests. This was breaking spamc protocol handling.
[Fix] Handle HTTP early server responses during request write
Fix HTTP client to properly handle early server responses (e.g., 413
Too Large) that arrive before the client has finished sending the
request body. This is allowed by HTTP/1.1 (RFC 7230 Section 6.5).
- Use bitwise AND for event flag checks to handle combined EV_READ|EV_WRITE
- Watch for both READ and WRITE events during write phase
- Check for early response on write errors (EPIPE, ECONNRESET)
- Add RSPAMD_HTTP_CONN_FLAG_EARLY_RESPONSE flag to track state
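The write-phase decision can be sketched as below. This is an illustrative model, not rspamd's code: `SKETCH_FLAG_EARLY_RESPONSE` is a hypothetical stand-in for `RSPAMD_HTTP_CONN_FLAG_EARLY_RESPONSE`, and the client-only guard mirrors the "only apply early response handling for HTTP clients" fix. The key details are testing flags with `&` (an event can be `EV_READ | EV_WRITE` at once) and treating EPIPE/ECONNRESET as a cue to read a response the server may already have sent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_EV_READ  0x01
#define SKETCH_EV_WRITE 0x02
/* Hypothetical stand-in for RSPAMD_HTTP_CONN_FLAG_EARLY_RESPONSE. */
#define SKETCH_FLAG_EARLY_RESPONSE 0x1u

enum sketch_step { SKETCH_CONTINUE_WRITE, SKETCH_READ_RESPONSE, SKETCH_FAIL };

/* Decide the next step while the request is still being written.
 * write_errno is 0 if the last write succeeded. */
static enum sketch_step sketch_write_phase(bool is_client, int events,
                                           int write_errno, unsigned *flags)
{
    if (!is_client) {
        /* Server-side connections skip early-response handling entirely. */
        return write_errno != 0 ? SKETCH_FAIL : SKETCH_CONTINUE_WRITE;
    }
    /* Bitwise AND: `events == EV_READ` would miss EV_READ | EV_WRITE. */
    if (events & SKETCH_EV_READ) {
        *flags |= SKETCH_FLAG_EARLY_RESPONSE;
        return SKETCH_READ_RESPONSE;
    }
    /* The server may have replied (e.g. 413) and closed its side. */
    if (write_errno == EPIPE || write_errno == ECONNRESET) {
        *flags |= SKETCH_FLAG_EARLY_RESPONSE;
        return SKETCH_READ_RESPONSE;
    }
    return write_errno != 0 ? SKETCH_FAIL : SKETCH_CONTINUE_WRITE;
}
```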
FreeBSD 15.0 introduced native inotify support, which causes
libev to enable EV_USE_INOTIFY. On FreeBSD, struct statfs is
defined in <sys/mount.h> rather than <sys/statfs.h>.
Note: This fix obsoletes the corresponding patch file in the
FreeBSD `mail/rspamd` and `mail/rspamd-devel` ports.
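The header selection can be sketched with a preprocessor guard like the following. The helper is hypothetical; only the include layout reflects the commit message. FreeBSD declares `struct statfs` in `<sys/mount.h>` (with `<sys/param.h>` first), and the `#else` branch assumes Linux.

```c
#include <assert.h>
#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/mount.h>   /* struct statfs is declared here on FreeBSD */
#else
#include <sys/statfs.h>  /* Linux location */
#endif

/* Return the filesystem block size for path, or -1 on error (hypothetical helper). */
static long sketch_fs_block_size(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) == -1) {
        return -1;
    }
    return (long) st.f_bsize;
}
```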
[Fix] Allow default_headers_order to be configured in milter_headers
Fixes #5781: The default_headers_order setting was defined in the plugin
but never read from the configuration file. Add schema validation and
config loading for this option.
[Fix] Fix Lua 5.4 compatibility issues in neural plugin
This commit addresses several Lua 5.4 compatibility issues that caused
the neural LLM tests to fail:
1. Redis TTL must be integer (lua_cache.lua):
- Lua 5.4's tostring() produces "4.0" for floats instead of "4"
- Redis SETEX/EXPIRE commands require integer TTL values
- Fixed by using math.floor() before tostring()
2. Version number format in ANN keys (lualib/plugins/neural.lua):
- Changed string format from %s to %d for version numbers
- Ensures integer format "1" instead of potential "1.0"
3. Iterator vs table handling (src/plugins/lua/neural.lua):
- fun.map() returns an iterator, not a table
- In Lua 5.4, the # operator on an iterator returns 0
- Fixed by wrapping with fun.totable() to get a proper table
4. Nil values in table arguments (lualib/plugins/neural.lua):
- Lua 5.4 handles nil values in tables differently
- Tables like {a, b, nil, nil} have undefined length behavior
- Fixed by using empty string defaults for optional parameters
5. Redis script nil checks (neural_save_unlock.lua):
- Added empty string checks alongside nil checks
- Ensures optional fields are only set when truly provided
6. Test infrastructure improvements:
- Added logging to dummy_llm.py for debugging
- Added proper error handling and diagnostics
- Updated rspamd.robot with better dummy_llm startup logging
- Add required Host header to all HTTP/1.1 requests in tcp.lua
- Bind dummy servers to 127.0.0.1 instead of localhost to avoid
IPv6/IPv4 mismatch on systems where localhost resolves to ::1
Lua 5.4's require() returns both the module and the file path, while
LuaJIT returns only the module. Save stack top before luaL_dostring
and restore to top+1 after to keep only the first return value.
[Fix] Use ipairs for ordered iteration in header checks
pairs() does not guarantee iteration order for numeric keys. In Lua 5.4
this caused RCVD_COUNT, HAS_X_PRIO, and RCPT_COUNT symbols to select
wrong thresholds when the table was iterated in non-ascending order.
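The order dependence can be shown with a small C sketch. The threshold table shape and symbol names are assumptions for illustration, not the actual headers_checks.lua data. A "last match wins" loop picks the largest threshold only if it walks the thresholds in ascending order, which `ipairs()` guarantees and `pairs()` does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold entry: symbol applies when count >= min_count. */
struct sketch_threshold {
    int min_count;
    const char *symbol;
};

/* Last match wins, so this is only correct if t[] is sorted ascending. */
static const char *sketch_pick(const struct sketch_threshold *t, size_t n, int count)
{
    const char *res = NULL;

    for (size_t i = 0; i < n; i++) {
        if (count >= t[i].min_count) {
            res = t[i].symbol;
        }
    }
    return res;
}
```

With a shuffled table (as `pairs()` might produce), a count of 20 can end on `ONE` instead of `TWELVE`.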
[Fix] Use math.floor for Lua 5.4 integer division compatibility
In Lua 5.4, the / operator always returns a float (2/2 = 1.0), whereas
under LuaJIT tostring(2/2) yields "1". This caused test dependency
registration to fail because tostring(i/2) produced "1.0" instead of "1".
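To make the difference concrete, here is a C approximation of how the runtimes stringify numbers. It is based on our reading of the Lua sources and should be treated as an approximation. Lua 5.1 and LuaJIT format numbers with `"%.14g"`. Lua 5.4 formats floats the same way but appends `".0"` when the text looks like an integer. `math.floor()` avoids this in Lua 5.4 because it returns an integer subtype when the result fits, and integers are printed with `"%lld"`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Lua 5.1 / LuaJIT: "%.14g", so 1.0 prints as "1". */
static void sketch_tostring_51(double v, char *buf, size_t len)
{
    snprintf(buf, len, "%.14g", v);
}

/* Lua 5.4 float: "%.14g", then ".0" if the result contains only sign and digits. */
static void sketch_tostring_54_float(double v, char *buf, size_t len)
{
    snprintf(buf, len, "%.14g", v);
    if (buf[strspn(buf, "-0123456789")] == '\0') {
        strncat(buf, ".0", len - strlen(buf) - 1);
    }
}

/* Lua 5.4 math.floor: integer result when it fits, printed with "%lld". */
static void sketch_tostring_54_floor(double v, char *buf, size_t len)
{
    long long i = (long long) v;   /* truncates toward zero */

    if ((double) i > v) {
        i--;                       /* adjust negative values down to the floor */
    }
    snprintf(buf, len, "%lld", i);
}
```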
[Fix] Improve loadstring error handling for Lua 5.4 compatibility
Ensure loadstring results are checked for nil (syntax errors) before
passing them to pcall. This prevents errors when running under Lua 5.4,
where load behavior differs slightly, and when handling invalid Lua
chunks.
[Fix] Use userdata __gc for UCL objects in all Lua versions
Use userdata __gc instead of table __gc for UCL object garbage
collection in all Lua versions. Table __gc in Lua 5.2+ can cause
use-after-free crashes due to GC ordering issues when UCL objects
reference each other or config objects.
[Fix] Use locale-independent patterns in URL encoding
Replace %w with explicit A-Za-z0-9 ranges in URL encoding functions.
The %w pattern is locale-dependent and incorrectly matches high bytes
(0xE4, 0xE5, 0xE6) as word characters in UTF-8 locales like en_GB.UTF-8,
breaking URL encoding of non-ASCII characters.
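The same principle carries over to C, where `isalnum()` is also locale-dependent. This sketch percent-encodes every byte outside explicit ASCII ranges. The extra unreserved characters `-._~` come from RFC 3986 and are an assumption here, since the commit only mentions A-Za-z0-9.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Explicit ASCII ranges: unlike isalnum() or Lua's %w, unaffected by locale. */
static int sketch_is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encode src into dst; dst must hold 3 * strlen(src) + 1 bytes. */
static void sketch_url_encode(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";

    for (const unsigned char *p = (const unsigned char *) src; *p; p++) {
        if (sketch_is_unreserved(*p)) {
            *dst++ = (char) *p;
        }
        else {
            *dst++ = '%';
            *dst++ = hex[*p >> 4];
            *dst++ = hex[*p & 0x0F];
        }
    }
    *dst = '\0';
}
```

High bytes such as 0xE4 are always escaped, regardless of the process locale.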
- Add startup progress messages to stderr
- Capture exceptions with full traceback
- Write PID only after successful server start
- Log output on failure in robot tests
[Test] Fix Lua 5.4 and cffi-lua compatibility issues
- Fix rspamd_memspn to handle empty character set without crash
- Add unpack compatibility shim for Lua 5.2+ (table.unpack)
- Replace deprecated table.maxn with # operator
- Fix cffi-lua strict type checking (char* vs unsigned char*)
- Add helpers for cdata-to-number conversion (64-bit integers)
- Add proper NULL pointer detection for cffi-lua
- Fix lua_resume to use coroutine threads in Lua 5.4
- Update test expectations for Lua 5.4 tostring(float) behavior
- base32: Use tostring() for size_t values in format strings
- expressions: Use %s instead of %d for float values (Lua 5.4 strict)
- fpconv: Skip variadic function tests on cffi-lua (not supported)
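The empty-set case for `rspamd_memspn` can be illustrated with a standalone span function. The signature below is a hypothetical stand-in, not rspamd's actual prototype. An empty accept set matches nothing, so the function returns 0 before touching the set (which may then be NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the initial run of s[0..len) consisting only of bytes from accept. */
static size_t sketch_memspn(const char *s, size_t len, const char *accept, size_t accept_len)
{
    bool table[256] = {false};
    size_t i;

    if (accept_len == 0) {
        return 0; /* empty set: nothing can match, and accept is never read */
    }
    for (i = 0; i < accept_len; i++) {
        table[(unsigned char) accept[i]] = true;
    }
    i = 0;
    while (i < len && table[(unsigned char) s[i]]) {
        i++;
    }
    return i;
}
```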
[CI] Install luarocks and dependencies for cffi-lua
The Fedora CI image doesn't have luarocks pre-installed, so we need
to install it along with lua-devel and libffi-devel before we can
install cffi-lua via luarocks.
[Test] Use size ranges for gzip tests to support zlib-ng
Fedora 40+ uses zlib-ng which produces slightly different compressed
sizes than standard zlib. Instead of checking exact sizes, use
reasonable ranges that accommodate both implementations.
OpenSSL ENGINE API was deprecated in 3.0 and the header is removed
in newer versions. In dkim.c it was unused; in ssl_util.c we now
conditionally include it only for OpenSSL versions that need it.
[Fix] Fix reputation whitelist schema and selector-aware checking
- lua_maps_expressions.schema: Change rules from array to key-value
table to match actual UCL config format (fixes #5780)
- reputation.lua: Make simple whitelist maps selector-aware instead
of always assuming IP-based whitelists
[Fix] Avoid repeated simdutf implementation detection on each call
The previous code stored a pointer to simdutf's proxy singleton instead
of the actual implementation, causing detect_best_supported() to be
called on every UTF-8 validation operation.
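The fix amounts to resolving the implementation once and caching the resolved pointer. A generic C sketch of that pattern follows. The names are hypothetical and this does not use simdutf's API; the "validator" only checks for 7-bit ASCII as a placeholder. A multi-threaded version would need `pthread_once` or an atomic pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*sketch_validate_fn)(const char *buf, size_t len);

static int sketch_detect_calls = 0;

/* Placeholder validator: accepts only 7-bit ASCII, not full UTF-8. */
static bool sketch_validate_ascii(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char) buf[i] & 0x80) {
            return false;
        }
    }
    return true;
}

/* Hypothetical stand-in for the expensive CPU feature detection step. */
static sketch_validate_fn sketch_detect_best(void)
{
    sketch_detect_calls++;
    return sketch_validate_ascii;
}

/* Detect once, then reuse the cached implementation (single-threaded sketch). */
static bool sketch_validate(const char *buf, size_t len)
{
    static sketch_validate_fn impl = NULL;

    if (impl == NULL) {
        impl = sketch_detect_best();
    }
    return impl(buf, len);
}
```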
[Fix] Normalize URLs with multiple slashes between host and path
Fixes #5773: URLs like https://example.com//path were not being
normalized. The extra slashes between host and path are now collapsed
to a single slash during URL parsing.
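A minimal in-place sketch of the normalization follows. It is not rspamd's URL parser. It assumes the URL contains `://` and that no `?` or `#` appears before the path. Only the run of slashes directly after the host is collapsed; later `//` sequences in the path are left untouched.

```c
#include <assert.h>
#include <string.h>

/* Collapse the slash run right after the host into a single slash, in place. */
static void sketch_collapse_host_slashes(char *url)
{
    char *p = strstr(url, "://");
    char *first, *q;

    if (p == NULL) {
        return;
    }
    first = strchr(p + 3, '/'); /* first slash after the host (and port) */
    if (first == NULL) {
        return;
    }
    q = first;
    while (q[1] == '/') {
        q++;
    }
    if (q != first) {
        memmove(first + 1, q + 1, strlen(q + 1) + 1); /* include the NUL */
    }
}
```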
[Fix] Use double type for rspamd_scan_time_average Prometheus metric
The metric was always 0 because rspamd_metrics_add_integer() truncated
the avg_scan_time double value (typically fractions of a second) to
integer. Added rspamd_metrics_add_double() helper and use it for
avg_scan_time.
[Feature] Add --recheck-rua option to dmarc_report for RUA filtering at send time
- Add -r/--recheck-rua flag to rspamadm dmarc_report to re-check RUA
addresses against exclude_rua_addresses map before sending reports
- Extend lua_maps to support rspamadm context for external map queries,
enabling coroutine-based synchronous HTTP requests
- Works with both local maps and external (HTTP) maps
- Checks both full email address and domain-only against the map
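The two-step lookup can be sketched in C with an in-memory list standing in for the `exclude_rua_addresses` map. The list, the helper names, and the case-insensitive comparison are assumptions for illustration. An address is excluded if either the full address or its domain part is listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for a map lookup (case-insensitive). */
static bool sketch_map_contains(const char *const *map, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(map[i], key) == 0) {
            return true;
        }
    }
    return false;
}

/* Excluded if the full address or its domain part is in the map. */
static bool sketch_rua_excluded(const char *const *map, size_t n, const char *addr)
{
    const char *at = strrchr(addr, '@');

    if (sketch_map_contains(map, n, addr)) {
        return true;
    }
    return at != NULL && sketch_map_contains(map, n, at + 1);
}
```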
[Fix] Enable aliases plugin by default to restore plus-addressing
Fixes #5768: Settings lookup was broken for subaddressed recipients
(e.g., user+folder@example.com) because the aliases plugin was
disabled by default after it was moved from rules/misc.lua in 3.14.
This restores the pre-3.14 behavior where plus-tags are stripped
and virtual recipients are created for settings matching.
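Plus-tag stripping can be illustrated with a small C helper. This is a hypothetical sketch, not the aliases plugin's code, and it assumes `+` is the only subaddress delimiter. It turns `user+folder@example.com` into the `user@example.com` form that a settings lookup would match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy addr into out without the "+tag" part of the local part.
 * Returns 0 on success, -1 if out is too small. */
static int sketch_strip_plus(const char *addr, char *out, size_t outlen)
{
    const char *at = strrchr(addr, '@');
    const char *plus = strchr(addr, '+');
    int n;

    if (at != NULL && plus != NULL && plus < at) {
        n = snprintf(out, outlen, "%.*s%s", (int) (plus - addr), addr, at);
    }
    else {
        n = snprintf(out, outlen, "%s", addr);
    }
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```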
[Test] Restrict public suffix sync workflow to upstream repository
Restrict the scheduled/public-dispatch sync job so it runs only in the upstream repository (rspamd/rspamd) and not in forks. This prevents automated PRs from being opened in forks.
Vsevolod Stakhov [Sat, 29 Nov 2025 14:24:36 +0000 (14:24 +0000)]
[Feature] Auto-mark whitelist symbols with SYMBOL_TYPE_FINE flag
This change ensures that symbols with negative weight and symbols used
in whitelist composites (composites with negative score) will always
execute regardless of whether the reject threshold has been reached.
Previously, when the early-stop optimization kicked in after reaching
the reject score, whitelist symbols could be skipped, leading to
potential false positives where emails should have been whitelisted.
Changes:
- Symbols with negative weight are automatically marked as FINE during
config validation in symcache::validate()
- New rspamd_composites_mark_whitelist_deps() function traverses all
composites with negative score and marks their constituent symbols
as FINE (with transitive expansion for nested composites)
- New C API rspamd_symcache_set_symbol_fine() to programmatically set
the FINE flag with proper parent/child propagation
- FINE flag is properly synchronized between virtual symbols and their
parent symbols
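The marking rules can be modeled in C as follows. The data structures are hypothetical and much simpler than rspamd's symcache. Plain symbols with negative weight are marked FINE. Every atom of a negative-score composite is marked too, recursing into nested composites. A visited check stops repeated work and cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ATOMS 4

/* Hypothetical symbol: composite is an index into the composites array, or -1. */
struct sketch_symbol {
    const char *name;
    double weight;
    bool fine;
    int composite;
};

struct sketch_composite {
    double score;
    int atoms[SKETCH_MAX_ATOMS]; /* indices into the symbols array */
    size_t natoms;
};

/* Mark a symbol FINE; if it is a composite, expand to its atoms transitively. */
static void sketch_mark_fine(struct sketch_symbol *syms,
                             const struct sketch_composite *comps, int sym)
{
    if (syms[sym].fine) {
        return; /* already visited */
    }
    syms[sym].fine = true;
    if (syms[sym].composite >= 0) {
        const struct sketch_composite *c = &comps[syms[sym].composite];

        for (size_t i = 0; i < c->natoms; i++) {
            sketch_mark_fine(syms, comps, c->atoms[i]);
        }
    }
}

static void sketch_mark_whitelist(struct sketch_symbol *syms, size_t nsyms,
                                  const struct sketch_composite *comps, size_t ncomps)
{
    /* Plain symbols with negative weight always run. */
    for (size_t i = 0; i < nsyms; i++) {
        if (syms[i].composite < 0 && syms[i].weight < 0) {
            sketch_mark_fine(syms, comps, (int) i);
        }
    }
    /* Constituents of whitelist (negative-score) composites always run. */
    for (size_t c = 0; c < ncomps; c++) {
        if (comps[c].score < 0) {
            for (size_t i = 0; i < comps[c].natoms; i++) {
                sketch_mark_fine(syms, comps, comps[c].atoms[i]);
            }
        }
    }
}
```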
Vsevolod Stakhov [Thu, 27 Nov 2025 15:37:47 +0000 (15:37 +0000)]
[Feature] Add combinator option for multimap selector rules
This change adds support for structured data output from selectors in
multimap rules. Previously, selectors always produced concatenated
strings which made it impossible to send structured JSON data to
external map services.
New 'combinator' option for selector-type multimap rules:
- 'string' (default): concatenate results with delimiter (existing behavior)
- 'array': flatten all results into a flat array
- 'object': convert pairs of selectors into key-value object
Changes:
- lua_selectors: Added combinator registry and helper functions
- get_combinator(name): returns combinator function by name
- list_combinators(): returns available combinator names
- create_selector_closure_with_combinator(): creates closure with named combinator
- multimap: Added 'combinator' option support for selector and redis+selector maps
Vsevolod Stakhov [Thu, 27 Nov 2025 10:39:54 +0000 (10:39 +0000)]
[Feature] Add control protocol command for composites statistics
- Add RSPAMD_CONTROL_COMPOSITES_STATS command to control protocol
- Add /compositesstats endpoint to control socket
- Add 'rspamadm control compositesstats' command
- Aggregate statistics from all workers with per-worker breakdown
- Remove composites stats from controller /stat (use control socket instead)
- Statistics always collected, timing sampled 1/256 (configurable)
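A 1/256 sampling rate is commonly implemented with a counter and a bit mask. The C sketch below assumes that approach; the commit does not say how rspamd does it. Counters are always updated, and only the expensive timing is gated. The mask must be a power of two minus one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true for one call in (mask + 1); mask = 255 gives a 1/256 rate. */
static bool sketch_should_time(uint32_t *counter, uint32_t mask)
{
    return ((++*counter) & mask) == 0;
}
```

Over 1024 evaluations with `mask = 255`, exactly four are timed.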
Vsevolod Stakhov [Wed, 26 Nov 2025 12:33:26 +0000 (12:33 +0000)]
[Feature] Precompute composite atom types at config time
Resolve ATOM_COMPOSITE vs ATOM_PLAIN for all composite atoms during
configuration phase instead of lazy evaluation at runtime. This
eliminates repeated hash lookups during expression evaluation.
- Add rspamd_composites_resolve_atom_types() function
- Call after process_dependencies() and before build_inverted_index()
- Sets comp_type and ncomp pointer for each atom upfront
Vsevolod Stakhov [Wed, 26 Nov 2025 11:45:58 +0000 (11:45 +0000)]
[Fix] Copy expression string to memory pool for Lua composites
When composites are added via Lua API (rspamd_config:add_composite),
the expression string was not copied to the memory pool. The expression
parser stores pointers (atom->str) into the original string, which
became invalid after Lua garbage collected the string.
This caused the inverted index to extract garbage symbol names,
breaking composite evaluation for dynamically added composites
like MISSING_MID_ALLOWED and INVALID_MSGID_ALLOWED from mid.lua.
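The lifetime problem can be shown with a toy arena. Here `sketch_pool` is a hypothetical stand-in for an rspamd memory pool, and the "parser" only records the first atom. Atoms point into the pool-owned copy, so they stay valid after the caller's buffer is overwritten and freed. Pointing them at the caller's string would leave them dangling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy bump allocator standing in for a memory pool (hypothetical). */
struct sketch_pool {
    char buf[256];
    size_t used;
};

static char *sketch_pool_strdup(struct sketch_pool *pool, const char *s)
{
    size_t len = strlen(s) + 1;
    char *dst;

    if (pool->used + len > sizeof(pool->buf)) {
        return NULL;
    }
    dst = pool->buf + pool->used;
    memcpy(dst, s, len);
    pool->used += len;
    return dst;
}

/* Like atom->str, an atom points into the expression text rather than copying it. */
struct sketch_atom {
    const char *str;
    size_t len;
};

/* Record the first atom (text up to the first space). */
static void sketch_first_atom(const char *expr, struct sketch_atom *atom)
{
    atom->str = expr;
    atom->len = strcspn(expr, " ");
}
```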
Vsevolod Stakhov [Wed, 26 Nov 2025 09:59:50 +0000 (09:59 +0000)]
[Fix] Handle group matchers in composites inverted index
Composites that use group matchers (g:, g+:, g-:) cannot be
efficiently indexed because we don't know which symbols will
match until runtime. Add these composites to not_only_composites
list so they are always evaluated.
Vsevolod Stakhov [Wed, 26 Nov 2025 09:18:26 +0000 (09:18 +0000)]
[Fix] Improve atom polarity detection in composites inverted index
Count NOT operations from atom to root instead of just checking direct
parent. This correctly handles nested negations like !(A & B) where
atoms A and B are both under negation even though their direct parent
is AND, not NOT.
- Even number of NOTs = positive atom (must be true)
- Odd number of NOTs = negative atom (must be false)
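The parity rule can be sketched over a minimal AST with parent pointers. The node layout is hypothetical and not rspamd's expression structure. The walk counts every NOT between the atom and the root, so `A` in `!(A & B)` is correctly classified as negative even though its direct parent is AND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_node_type { SKETCH_ATOM, SKETCH_AND, SKETCH_OR, SKETCH_NOT };

struct sketch_node {
    enum sketch_node_type type;
    const struct sketch_node *parent; /* NULL for the root */
};

/* Positive iff an even number of NOT nodes lie on the path to the root. */
static bool sketch_atom_is_positive(const struct sketch_node *atom)
{
    unsigned nots = 0;

    for (const struct sketch_node *n = atom->parent; n != NULL; n = n->parent) {
        if (n->type == SKETCH_NOT) {
            nots++;
        }
    }
    return (nots % 2) == 0;
}
```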
Vsevolod Stakhov [Tue, 25 Nov 2025 17:50:48 +0000 (17:50 +0000)]
[Feature] Add bloom filter for fast negative symbol lookups
Add an inline bloom filter (1024 bits) to rspamd_scan_result structure
for O(1) negative lookups in rspamd_task_find_symbol_result().
This optimization benefits composites evaluation where most symbol
lookups are negative (symbol not present in results). The bloom filter
is updated when symbols are inserted and checked before the hash lookup.
For 50 symbols, the false positive rate is approximately 0.5%, meaning
99.5% of negative lookups will be rejected without hash table access.
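The following C sketch shows a 1024-bit inline Bloom filter. The number of hash functions (k = 2 here, both derived from one 64-bit FNV-1a hash) is an assumption, because the commit does not state it. A `false` result means the symbol is definitely absent, so the hash table lookup can be skipped. A `true` result still requires the real lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOOM_BITS 1024u

struct sketch_bloom {
    uint64_t bits[SKETCH_BLOOM_BITS / 64];
};

/* 64-bit FNV-1a hash of a NUL-terminated name. */
static uint64_t sketch_fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;

    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Derive two bit positions from the low and high halves of one hash. */
static void sketch_bloom_positions(const char *name, unsigned pos[2])
{
    uint64_t h = sketch_fnv1a(name);

    pos[0] = (unsigned) (h % SKETCH_BLOOM_BITS);
    pos[1] = (unsigned) ((h >> 32) % SKETCH_BLOOM_BITS);
}

static void sketch_bloom_add(struct sketch_bloom *b, const char *name)
{
    unsigned pos[2];

    sketch_bloom_positions(name, pos);
    for (int i = 0; i < 2; i++) {
        b->bits[pos[i] / 64] |= (uint64_t) 1 << (pos[i] % 64);
    }
}

/* false: definitely not inserted; true: maybe inserted (check the hash table). */
static bool sketch_bloom_maybe_has(const struct sketch_bloom *b, const char *name)
{
    unsigned pos[2];

    sketch_bloom_positions(name, pos);
    for (int i = 0; i < 2; i++) {
        if (!(b->bits[pos[i] / 64] & ((uint64_t) 1 << (pos[i] % 64)))) {
            return false;
        }
    }
    return true;
}
```

The standard estimate for the false-positive rate is p ≈ (1 − e^(−kn/m))^k. With m = 1024 bits and n = 50 symbols, that gives roughly 0.9% for k = 2 and 0.25% for k = 3. The exact figure therefore depends on how many hash functions are used.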
Vsevolod Stakhov [Tue, 25 Nov 2025 17:13:19 +0000 (17:13 +0000)]
[Feature] Add inverted index for composites optimization
Build an inverted index mapping symbol names to composites that contain
those symbols as positive (non-negated) atoms. This allows filtering out
composites that cannot possibly match during the first pass evaluation.
- Add rspamd_expression_atom_foreach_ex() to traverse expression atoms
with access to AST nodes (needed to detect negated atoms)
- Add rspamd_expression_node_is_op() to check if a node is an operator
- Build inverted index in composites_manager during config processing
- Track composites with only negated atoms separately (they must always
be evaluated)
- Use inverted index in composites_metric_callback for first pass to
evaluate only potentially matching composites
For configurations with many composites (4000+), this reduces the number
of composites evaluated per message from all to only those that have at
least one matching symbol present.
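The first-pass filtering can be sketched in C as follows. The structures are hypothetical, and a linear scan replaces the hash map a real implementation would use. Candidates are the composites indexed under any present symbol, plus those with no positive atoms, which must always be evaluated. Candidates still need full expression evaluation; the index only removes composites that cannot match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_COMPOSITES 8 /* composite ids must be below this */

/* Hypothetical index entry: a symbol and the composites using it positively. */
struct sketch_index_entry {
    const char *symbol;
    int composites[SKETCH_MAX_COMPOSITES];
    size_t n;
};

/* Fill out[] (capacity SKETCH_MAX_COMPOSITES) with unique candidate ids. */
static size_t sketch_candidates(const struct sketch_index_entry *idx, size_t nidx,
                                const int *always, size_t nalways,
                                const char *const *present, size_t npresent,
                                int *out)
{
    bool seen[SKETCH_MAX_COMPOSITES] = {false};
    size_t nout = 0;

    for (size_t i = 0; i < nalways; i++) {
        if (!seen[always[i]]) {
            seen[always[i]] = true;
            out[nout++] = always[i];
        }
    }
    for (size_t p = 0; p < npresent; p++) {
        for (size_t e = 0; e < nidx; e++) {
            if (strcmp(idx[e].symbol, present[p]) != 0) {
                continue;
            }
            for (size_t j = 0; j < idx[e].n; j++) {
                int c = idx[e].composites[j];

                if (!seen[c]) {
                    seen[c] = true;
                    out[nout++] = c;
                }
            }
        }
    }
    return nout;
}
```

For composites `0 = A & B`, `1 = C | D` and `2 = !E`, a message with only `B` present evaluates composites 2 and 0 and skips composite 1.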
Vsevolod Stakhov [Sun, 23 Nov 2025 11:38:18 +0000 (11:38 +0000)]
[Feature] Add rspamd_util.decode_html_entities and improve obfuscated URL detection
- Add Lua binding for HTML entity decoding (rspamd_util.decode_html_entities)
wrapping rspamd_html_decode_entitles_inplace C function
- Switch obfuscated URL detection from regexp module to rspamd_trie
for Hyperscan-accelerated multi-pattern matching
- Fix URL flag passing (use url.create with flags table instead of add_flag)
- Fix inject_url usage (doesn't return value)
- Add functional tests for obfuscated URL detection
Vsevolod Stakhov [Sat, 22 Nov 2025 13:46:55 +0000 (13:46 +0000)]
[Feature] Add obfuscated URL detection to url_suspect plugin
Detect URLs hidden in message text using various obfuscation techniques:
- Spaced protocols (h t t p s : / /)
- hxxp variants (hxxp://)
- Bracket dots (example[.]com)
- Word dots (example dot com)
- HTML entities (e.g. &#46; for dots)
Features:
- Hyperscan-based prefiltering for performance
- Normalization and URL extraction from obfuscated text
- URL injection with 'obscured' flag for further analysis
- Configurable via built-in settings or external maps
- DoS protection with strict limits
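The normalization step can be illustrated with a small in-place C routine. This is a hypothetical sketch covering only three of the listed techniques (`hxxp`, `[.]`, ` dot `). It has no Hyperscan prefilter and no DoS limits. Each replacement is no longer than the text it replaces, so writing in place never overtakes reading.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Undo "hxxp" -> "http", "[.]" -> "." and " dot " -> "." in place.
 * Spaced protocols and HTML entities are not handled in this sketch. */
static void sketch_deobfuscate(char *s)
{
    char *r = s, *w = s;

    while (*r) {
        if (strncasecmp(r, "hxxp", 4) == 0) {
            memcpy(w, "http", 4); /* also lowercases the scheme */
            w += 4;
            r += 4;
        }
        else if (strncmp(r, "[.]", 3) == 0) {
            *w++ = '.';
            r += 3;
        }
        else if (strncasecmp(r, " dot ", 5) == 0) {
            *w++ = '.';
            r += 5;
        }
        else {
            *w++ = *r++;
        }
    }
    *w = '\0';
}
```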
The url_suspect plugin had multiple critical issues:
1. R_SUSPICIOUS_URL triggered on every message with URLs, adding 25 points
due to incorrect dynamic score usage (5.0 * 5.0 instead of 1.0 * 5.0)
2. Broken compat_mode inserted R_SUSPICIOUS_URL without URL info whenever
ANY url check triggered, making it impossible to debug
3. Symbol names were unnecessarily configurable, adding complexity
4. url_suspect_group.conf was not included in groups.conf, so scores
were not loaded at all
Fixed by:
- Removed R_SUSPICIOUS_URL and compat_mode completely
- Fixed all insert_result() calls to use 1.0 dynamic weight
- Made symbol names hardcoded constants
- Added url group to groups.conf with max_score = 9.0
- Cleaned up score configuration parameters