Vsevolod Stakhov [Tue, 16 Dec 2025 11:25:29 +0000 (11:25 +0000)]
[Fix] Fix Lua 5.4 compatibility in clickhouse and elastic plugins
- Merge nested settings tables (limits, retention) to preserve defaults
when user provides partial configuration
- Use %d with math.floor() instead of %.0f for integer formatting
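The nested-merge idea from the first bullet can be sketched as below. This is a Python illustration of the technique, not the plugins' Lua code; merge_defaults and the sample keys are hypothetical names chosen for the example.

```python
def merge_defaults(defaults, overrides):
    """Return a new dict where overrides win and nested dicts merge recursively."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables: merge instead of replacing wholesale.
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults and a partial user configuration.
defaults = {
    "limits": {"max_rows": 1000, "max_interval": 60},
    "retention": {"enable": False},
}
user = {"limits": {"max_rows": 500}}

settings = merge_defaults(defaults, user)
print(settings["limits"])
```

A shallow assignment would have replaced the whole "limits" table and silently dropped max_interval; the recursive merge keeps it.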
Vsevolod Stakhov [Fri, 12 Dec 2025 20:24:26 +0000 (20:24 +0000)]
[Fix] Handle connection errors with io_uring backend in HTTP client
When using io_uring, POLLERR is reported as both EV_READ and EV_WRITE.
This caused connection failures (e.g., ECONNREFUSED) to be misinterpreted
as early server responses. Check SO_ERROR before attempting to read when
the connection hasn't been established yet.
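A minimal sketch of the "check SO_ERROR before reading" step, written in Python rather than the C used by the HTTP client. The on_io_event handler and its return values are hypothetical; only the getsockopt(SOL_SOCKET, SO_ERROR) call reflects the mechanism named above.

```python
import socket


def pending_socket_error(sock):
    """Return the pending error code on a socket (0 means no error)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def on_io_event(sock, connected):
    """Decide what to do when the event loop reports the socket as ready."""
    if not connected:
        # Readiness before the connection is established may be an error
        # in disguise, so ask the kernel before trying to read.
        err = pending_socket_error(sock)
        if err != 0:
            return ("connect_failed", err)
    return ("proceed", 0)


# A healthy socket pair has no pending error, so the handler proceeds.
a, b = socket.socketpair()
result = on_io_event(a, connected=False)
a.close()
b.close()
print(result)
```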
[Test] Skip unnecessary waiting in initial scan counter read
Since the test already starts on the Status tab, gotoTab("status")
doesn't trigger a new request, so the test was only waiting for
the autorefresh to happen, causing unnecessary delay.
Vsevolod Stakhov [Thu, 11 Dec 2025 18:11:36 +0000 (18:11 +0000)]
[Feature] Add text quality analysis for PDF garbage filtering
- Add rspamd_util.get_text_quality() function with comprehensive UTF-8
text analysis using ICU for proper Unicode classification
- Returns 18 metrics: letters, digits, punctuation, spaces, printable,
words, word_chars, total, emojis, uppercase, lowercase, ascii_chars,
non_ascii_chars, latin_vowels, latin_consonants, script_transitions,
double_spaces, non_printable
- Add confidence scoring to PDF text extraction to filter garbage tokens
(single characters, encoded data, random sequences)
- Configurable via text_quality_threshold, text_quality_min_length,
text_quality_enabled options in pdf module config
- Add unit tests for get_text_quality function
[Fix] Correct symbols column index in history and scan tables
Fixes regression introduced in 62b136a where sorting fails with
"can't access property 'sortValue', val.options is undefined" on
the History tab, and symbol reordering doesn't work on the Scan tab.
The "file" column addition shifted the symbols column index, but
history.js and upload.js were not updated, causing symbol reordering
to target wrong columns.
[Feature] Add multipart and msgpack formatters to metadata_exporter
- Add multipart formatter for HTTP export using form-data with separate
metadata (JSON) and message (rfc822) parts
- Add msgpack formatter for efficient binary serialization
- Add json_with_message formatter for JSON with base64-encoded message
- Deprecate meta_headers option (broken by design for complex data)
- HTTP pusher now auto-detects multipart boundary from formatter
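The json_with_message idea (JSON metadata plus a base64-encoded message) could look roughly like this. It is a Python sketch, not the plugin's Lua code, and the "message" field name is an assumption made for illustration.

```python
import base64
import json


def json_with_message(metadata, raw_message):
    """Serialize metadata as JSON and embed the raw message as base64."""
    payload = dict(metadata)
    # "message" is a hypothetical field name for this sketch.
    payload["message"] = base64.b64encode(raw_message).decode("ascii")
    return json.dumps(payload)


raw = b"Subject: test\r\n\r\nbody\r\n"
encoded = json_with_message({"action": "reject", "score": 15.5}, raw)
decoded = json.loads(encoded)
print(base64.b64decode(decoded["message"]) == raw)
```

Base64 keeps arbitrary message bytes safe inside a JSON string, at the cost of roughly a third more payload than the multipart or msgpack formatters would need.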
[Fix] Only apply early response handling for HTTP clients
The early response detection logic should not run for server-side
connections, as it incorrectly modifies wr_pos state when the server
reads incoming requests. This was breaking spamc protocol handling.
[Fix] Handle HTTP early server responses during request write
Fix HTTP client to properly handle early server responses (e.g., 413
Payload Too Large) that arrive before the client has finished sending
the request body. This is allowed by HTTP/1.1 (RFC 7230 Section 6.5).
- Use bitwise AND for event flag checks to handle combined EV_READ|EV_WRITE
- Watch for both READ and WRITE events during write phase
- Check for early response on write errors (EPIPE, ECONNRESET)
- Add RSPAMD_HTTP_CONN_FLAG_EARLY_RESPONSE flag to track state
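The difference between an equality test and a bitwise AND on event flags can be shown with a short Python sketch. The flag values match libev's EV_READ and EV_WRITE; the helper names are hypothetical.

```python
EV_READ = 0x01
EV_WRITE = 0x02


def is_writable_strict(events):
    """Equality check: misses events that carry more than one flag."""
    return events == EV_WRITE


def is_writable(events):
    """Bitwise check: true whenever the write bit is set."""
    return bool(events & EV_WRITE)


combined = EV_READ | EV_WRITE
print(is_writable_strict(combined), is_writable(combined))
```

With a combined EV_READ|EV_WRITE event, the equality check reports "not writable" while the bitwise check correctly sees both bits.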
FreeBSD 15.0 introduced native inotify support, which causes
libev to enable EV_USE_INOTIFY. On FreeBSD, struct statfs is
defined in <sys/mount.h> rather than <sys/statfs.h>.
Note: This fix obsoletes the corresponding patch file in the
FreeBSD `mail/rspamd` and `mail/rspamd-devel` ports.
[Fix] Allow default_headers_order to be configured in milter_headers
Fixes #5781: The default_headers_order setting was defined in the plugin
but never read from the configuration file. Add schema validation and
config loading for this option.
[Fix] Fix Lua 5.4 compatibility issues in neural plugin
This commit addresses several Lua 5.4 compatibility issues that caused
the neural LLM tests to fail:
1. Redis TTL must be integer (lua_cache.lua):
- Lua 5.4's tostring() produces "4.0" for floats instead of "4"
- Redis SETEX/EXPIRE commands require integer TTL values
- Fixed by using math.floor() before tostring()
2. Version number format in ANN keys (lualib/plugins/neural.lua):
- Changed string format from %s to %d for version numbers
- Ensures integer format "1" instead of potential "1.0"
3. Iterator vs table handling (src/plugins/lua/neural.lua):
- fun.map() returns an iterator, not a table
- In Lua 5.4, # operator on iterators returns 0
- Fixed by wrapping with fun.totable() to get a proper table
4. Nil values in table arguments (lualib/plugins/neural.lua):
- Lua 5.4 handles nil values in tables differently
- Tables like {a, b, nil, nil} have undefined length behavior
- Fixed by using empty string defaults for optional parameters
5. Redis script nil checks (neural_save_unlock.lua):
- Added empty string checks alongside nil checks
- Ensures optional fields are only set when truly provided
6. Test infrastructure improvements:
- Added logging to dummy_llm.py for debugging
- Added proper error handling and diagnostics
- Updated rspamd.robot with better dummy_llm startup logging
- Add required Host header to all HTTP/1.1 requests in tcp.lua
- Bind dummy servers to 127.0.0.1 instead of localhost to avoid
IPv6/IPv4 mismatch on systems where localhost resolves to ::1
Lua 5.4's require() returns both the module and the file path, while
LuaJIT returns only the module. Save stack top before luaL_dostring
and restore to top+1 after to keep only the first return value.
[Fix] Use ipairs for ordered iteration in header checks
pairs() does not guarantee iteration order for numeric keys. In Lua 5.4
this caused RCVD_COUNT, HAS_X_PRIO, and RCPT_COUNT symbols to select
wrong thresholds when the table was iterated in non-ascending order.
[Fix] Use math.floor for Lua 5.4 integer division compatibility
In Lua 5.4, the / operator always returns a float (2/2 = 1.0). LuaJIT has
no separate integer subtype, so the same result prints as "1". This caused
test dependency registration to fail, as tostring(i/2) produced "1.0"
instead of "1".
[Fix] Improve loadstring error handling for Lua 5.4 compatibility
Ensure loadstring results are checked for nil (syntax errors) before
passing to pcall. This prevents errors when running with Lua 5.4
compatibility where load behavior differs slightly or when handling
invalid Lua chunks.
[Fix] Use userdata __gc for UCL objects in all Lua versions
Use userdata __gc instead of table __gc for UCL object garbage
collection in all Lua versions. Table __gc in Lua 5.2+ can cause
use-after-free crashes due to GC ordering issues when UCL objects
reference each other or config objects.
[Fix] Use locale-independent patterns in URL encoding
Replace %w with explicit A-Za-z0-9 ranges in URL encoding functions.
The %w pattern is locale-dependent and incorrectly matches high bytes
(0xE4, 0xE5, 0xE6) as word characters in UTF-8 locales like en_GB.UTF-8,
breaking URL encoding of non-ASCII characters.
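A rough Python sketch of percent-encoding against an explicit byte set instead of a locale-dependent character class. The extra "-_.~" characters follow RFC 3986's unreserved set and are an assumption here; the commit itself only mentions A-Za-z0-9.

```python
# Explicit ASCII allow-list; nothing here depends on the process locale.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789"
    b"-_.~"  # assumption: RFC 3986 unreserved punctuation
)


def url_encode(data):
    """Percent-encode every byte that is not in the explicit allow-list."""
    return "".join(
        chr(byte) if byte in UNRESERVED else "%{:02X}".format(byte)
        for byte in data
    )


print(url_encode(b"caf\xe4 bar"))
```

Because membership is decided per byte against a fixed set, a high byte such as 0xE4 is always encoded, whatever the locale says about word characters.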
- Add startup progress messages to stderr
- Capture exceptions with full traceback
- Write PID only after successful server start
- Log output on failure in robot tests
[Test] Fix Lua 5.4 and cffi-lua compatibility issues
- Fix rspamd_memspn to handle empty character set without crash
- Add unpack compatibility shim for Lua 5.2+ (table.unpack)
- Replace deprecated table.maxn with # operator
- Fix cffi-lua strict type checking (char* vs unsigned char*)
- Add helpers for cdata-to-number conversion (64-bit integers)
- Add proper NULL pointer detection for cffi-lua
- Fix lua_resume to use coroutine threads in Lua 5.4
- Update test expectations for Lua 5.4 tostring(float) behavior
- base32: Use tostring() for size_t values in format strings
- expressions: Use %s instead of %d for float values (Lua 5.4 strict)
- fpconv: Skip variadic function tests on cffi-lua (not supported)
[CI] Install luarocks and dependencies for cffi-lua
The Fedora CI image doesn't have luarocks pre-installed, so we need
to install it along with lua-devel and libffi-devel before we can
install cffi-lua via luarocks.
[Test] Use size ranges for gzip tests to support zlib-ng
Fedora 40+ uses zlib-ng which produces slightly different compressed
sizes than standard zlib. Instead of checking exact sizes, use
reasonable ranges that accommodate both implementations.
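The range-based check can be illustrated in Python with the standard zlib module. The bounds below are arbitrary values picked for this sketch, not the ones used in the rspamd tests.

```python
import zlib


def size_in_range(size, low, high):
    """True when a compressed size falls inside the accepted window."""
    return low <= size <= high


data = b"hello world " * 100
compressed = zlib.compress(data)

# Correctness is checked by round-tripping; size is only bounded loosely,
# since different deflate implementations may emit different byte counts.
round_trip_ok = zlib.decompress(compressed) == data
size_ok = size_in_range(len(compressed), 8, len(data) // 2)
print(round_trip_ok, size_ok)
```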
OpenSSL ENGINE API was deprecated in 3.0 and the header is removed
in newer versions. In dkim.c it was unused; in ssl_util.c we now
conditionally include it only for OpenSSL versions that need it.
[Fix] Fix reputation whitelist schema and selector-aware checking
- lua_maps_expressions.schema: Change rules from array to key-value
table to match actual UCL config format (fixes #5780)
- reputation.lua: Make simple whitelist maps selector-aware instead
of always assuming IP-based whitelists
[Fix] Avoid repeated simdutf implementation detection on each call
The previous code stored a pointer to simdutf's proxy singleton instead
of the actual implementation, causing detect_best_supported() to be
called on every UTF-8 validation operation.
[Fix] Normalize URLs with multiple slashes between host and path
Fixes #5773: URLs like https://example.com//path were not being
normalized. The extra slashes between host and path are now collapsed
to a single slash during URL parsing.
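The normalization described above can be sketched in Python with urllib.parse. This only illustrates the intended behavior; rspamd's URL parser is written in C, and collapse_leading_slashes is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_leading_slashes(url):
    """Collapse repeated slashes between the host and the path into one."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("//"):
        # Only the run of slashes right after the host is touched.
        path = "/" + path.lstrip("/")
    return urlunsplit(parts._replace(path=path))


print(collapse_leading_slashes("https://example.com//path"))
```

Slashes deeper inside the path are left alone in this sketch, since the fix is described only for the boundary between host and path.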
[Fix] Use double type for rspamd_scan_time_average Prometheus metric
The metric was always 0 because rspamd_metrics_add_integer() truncated
the avg_scan_time double value (typically fractions of a second) to
integer. Add an rspamd_metrics_add_double() helper and use it for
avg_scan_time.
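Why the metric read 0 can be seen from a small Python sketch of the truncation. The sample value and the format_metric helper are hypothetical; the real code is C.

```python
def format_metric(name, value):
    """Render a single 'name value' metric line."""
    return "{} {}".format(name, value)


avg_scan_time = 0.35  # hypothetical average, in seconds

# Integer conversion drops the fractional part, so sub-second averages become 0.
truncated = int(avg_scan_time)

line_int = format_metric("rspamd_scan_time_average", truncated)
line_double = format_metric("rspamd_scan_time_average", avg_scan_time)
print(line_int)
print(line_double)
```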
[Feature] Add --recheck-rua option to dmarc_report for RUA filtering at send time
- Add -r/--recheck-rua flag to rspamadm dmarc_report to re-check RUA
addresses against exclude_rua_addresses map before sending reports
- Extend lua_maps to support rspamadm context for external map queries,
enabling coroutine-based synchronous HTTP requests
- Works with both local maps and external (HTTP) maps
- Checks both full email address and domain-only against the map
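The last bullet (checking both the full address and the bare domain) might look like this in Python. The set stands in for the exclude_rua_addresses map, and lowercasing the address is an assumption of this sketch.

```python
def rua_excluded(address, exclude_map):
    """True if the full address or its domain is present in the exclude map."""
    addr = address.strip().lower()  # assumption: case-insensitive matching
    domain = addr.rsplit("@", 1)[-1]
    return addr in exclude_map or domain in exclude_map


# Hypothetical map contents.
exclude = {"reports@blocked.example", "noisy.example"}

print(rua_excluded("Reports@blocked.example", exclude))
print(rua_excluded("dmarc@noisy.example", exclude))
print(rua_excluded("dmarc@fine.example", exclude))
```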
[Fix] Enable aliases plugin by default to restore plus-addressing
Fixes #5768: Settings lookup was broken for subaddressed recipients
(e.g., user+folder@example.com) because the aliases plugin was
disabled by default after it was moved from rules/misc.lua in 3.14.
This restores the pre-3.14 behavior where plus-tags are stripped
and virtual recipients are created for settings matching.
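Plus-tag stripping, as restored by this fix, can be sketched as follows. This is a Python illustration with a hypothetical helper name, not the aliases plugin's Lua code.

```python
def strip_plus_tag(address):
    """Return the address with any '+tag' removed from the local part."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # not an email address; leave untouched
    base = local.split("+", 1)[0]
    if not base:
        return address  # local part starts with '+'; nothing sensible to strip
    return base + "@" + domain


print(strip_plus_tag("user+folder@example.com"))
```

Settings can then be matched against the stripped address, which is what makes per-user rules apply to subaddressed recipients.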
[Test] Restrict public suffix sync workflow to upstream repository
Restrict the scheduled/public-dispatch sync job so it runs only in the
upstream repository (rspamd/rspamd) and not in forks. This prevents
automated PRs from being opened in forks.
Vsevolod Stakhov [Sat, 29 Nov 2025 14:24:36 +0000 (14:24 +0000)]
[Feature] Auto-mark whitelist symbols with SYMBOL_TYPE_FINE flag
This change ensures that symbols with negative weight and symbols used
in whitelist composites (composites with negative score) will always
execute regardless of whether the reject threshold has been reached.
Previously, when the early-stop optimization kicked in after reaching
the reject score, whitelist symbols could be skipped, leading to
potential false positives where emails should have been whitelisted.
Changes:
- Symbols with negative weight are automatically marked as FINE during
config validation in symcache::validate()
- New rspamd_composites_mark_whitelist_deps() function traverses all
composites with negative score and marks their constituent symbols
as FINE (with transitive expansion for nested composites)
- New C API rspamd_symcache_set_symbol_fine() to programmatically set
the FINE flag with proper parent/child propagation
- FINE flag is properly synchronized between virtual symbols and their
parent symbols
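The transitive expansion over whitelist composites can be sketched in Python as a graph walk. The data layout (a dict of composite name to constituent names, plus a score table) and all symbol names are assumptions for illustration; the actual implementation is the C/C++ code named above.

```python
def mark_whitelist_deps(composites, scores):
    """Collect symbols reachable from composites that have a negative score.

    composites: composite name -> list of constituent names
    scores: name -> score
    """
    fine = set()
    seen = set()
    stack = [name for name in composites if scores.get(name, 0.0) < 0]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guards against cycles between composites
        seen.add(name)
        for dep in composites.get(name, []):
            if dep in composites:
                stack.append(dep)  # nested composite: expand transitively
            else:
                fine.add(dep)  # plain symbol: mark as FINE
    return fine


composites = {
    "WL_COMP": ["DKIM_OK", "INNER"],
    "INNER": ["SPF_OK", "KNOWN_SENDER"],
    "SPAM_COMP": ["BAD_URL", "BAD_HDR"],
}
scores = {"WL_COMP": -5.0, "INNER": 0.0, "SPAM_COMP": 4.0}

print(sorted(mark_whitelist_deps(composites, scores)))
```

Symbols used only by SPAM_COMP stay unmarked, so the early-stop optimization still applies to them.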
Vsevolod Stakhov [Thu, 27 Nov 2025 15:37:47 +0000 (15:37 +0000)]
[Feature] Add combinator option for multimap selector rules
This change adds support for structured data output from selectors in
multimap rules. Previously, selectors always produced concatenated
strings which made it impossible to send structured JSON data to
external map services.
New 'combinator' option for selector-type multimap rules:
- 'string' (default): concatenate results with delimiter (existing behavior)
- 'array': flatten all results into a single array
- 'object': convert pairs of selectors into a key-value object
Changes:
- lua_selectors: Added combinator registry and helper functions
- get_combinator(name): returns combinator function by name
- list_combinators(): returns available combinator names
- create_selector_closure_with_combinator(): creates closure with named combinator
- multimap: Added 'combinator' option support for selector and redis+selector maps
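The three combinator modes can be approximated in Python as below. This is a sketch of the described semantics only: the function names, the ":" delimiter, and the error on an odd number of values are assumptions, not the lua_selectors API.

```python
def combine_string(results, delimiter=":"):
    """'string': concatenate results with a delimiter."""
    return delimiter.join(str(item) for item in results)


def combine_array(results):
    """'array': flatten one level of nesting into a single list."""
    flat = []
    for item in results:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


def combine_object(results):
    """'object': treat consecutive pairs as key and value."""
    if len(results) % 2 != 0:
        raise ValueError("object combinator needs an even number of values")
    return {results[i]: results[i + 1] for i in range(0, len(results), 2)}


print(combine_string(["1.2.3.4", "example.com"]))
print(combine_array([["a@x.example", "b@x.example"], "1.2.3.4"]))
print(combine_object(["ip", "1.2.3.4", "from", "a@x.example"]))
```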
Vsevolod Stakhov [Thu, 27 Nov 2025 10:39:54 +0000 (10:39 +0000)]
[Feature] Add control protocol command for composites statistics
- Add RSPAMD_CONTROL_COMPOSITES_STATS command to control protocol
- Add /compositesstats endpoint to control socket
- Add 'rspamadm control compositesstats' command
- Aggregate statistics from all workers with per-worker breakdown
- Remove composites stats from controller /stat (use control socket instead)
- Statistics are always collected; timing is sampled at 1/256 (configurable)