Vsevolod Stakhov [Wed, 31 Dec 2025 10:54:55 +0000 (10:54 +0000)]
[Feature] Add extra tables API for clickhouse plugin
Allow other plugins to dynamically register custom Clickhouse tables
via rspamd_plugins['clickhouse'].register_extra_table(). Supports
per-table schemas, row callbacks (single or multiple rows), and
independent retention settings.
Vsevolod Stakhov [Mon, 29 Dec 2025 22:28:40 +0000 (22:28 +0000)]
[Fix] Fix replxx build with LLVM 21+
- Simplify CMakeLists.txt to use CMAKE_CXX_STANDARD 20
- Replace std::unordered_map with std::map to avoid libc++ ABI issues
- Add operator< to UnicodeString for std::map compatibility
Vsevolod Stakhov [Sun, 28 Dec 2025 21:20:12 +0000 (21:20 +0000)]
[Fix] Avoid SDK headers in include path when package ROOT is specified
- Add NO_DEFAULT_PATH to FIND_PATH when PKG_ROOT is set to prevent
macOS SDK C headers from polluting include paths before libc++
- Fix typo: {RSPAMD_DEFAULT_INCLUDE_PATHS} -> ${...}
- Remove obsolete paths (/opt/csw, /sw), add /opt/homebrew for macOS
Vsevolod Stakhov [Sun, 28 Dec 2025 18:45:05 +0000 (18:45 +0000)]
[Feature] Rename fuzzy_check max_score to hits_limit for clarity
The option name max_score was confusing: it refers not to the
symbol score but to the number of fuzzy hash hits at which the
normalized score reaches ~1.0 (formula: tanh(e * hits / hits_limit)).
- Rename max_score -> hits_limit in fuzzy_check.c and default config
- Add backward compatibility: max_score is still accepted as an alias
- Add lua_cfg_transform to handle legacy configs (max_score overrides
hits_limit to ensure local.d overrides work correctly)
- Add explanatory comments in config and documentation
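The normalization formula quoted in this commit can be checked numerically. The following is a minimal Python sketch, not the code in fuzzy_check.c; the helper name `normalized_score` and the sample hits_limit of 10 are assumptions made for illustration.

```python
import math


def normalized_score(hits: float, hits_limit: float) -> float:
    """Normalized fuzzy score per the formula in the commit message:
    tanh(e * hits / hits_limit). Hypothetical helper, not rspamd code."""
    if hits_limit <= 0:
        raise ValueError("hits_limit must be positive")
    return math.tanh(math.e * hits / hits_limit)


if __name__ == "__main__":
    # With an assumed hits_limit of 10, the score approaches 1.0
    # once the number of hits reaches the limit.
    for hits in (1, 5, 10, 20):
        print(hits, round(normalized_score(hits, 10), 4))
```

At hits equal to hits_limit the value is tanh(e), which is slightly above 0.99, matching the "~1.0" wording in the message.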
Vsevolod Stakhov [Sat, 27 Dec 2025 10:59:05 +0000 (10:59 +0000)]
[Fix] Add resilience to lua_cfg_transform
- Check :type() before indexing UCL objects to handle null values
- Wrap transform sections in pcall to prevent one bad config section
from breaking the entire configuration load
- Log errors with section name for easier debugging
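The isolation idea described in this commit (the Lua code uses pcall) can be sketched in Python with try/except. This is only an analogy under stated assumptions: `apply_transforms` and the section names are hypothetical and do not mirror the actual lua_cfg_transform code.

```python
from typing import Callable, Dict, List

Config = Dict[str, object]


def apply_transforms(
    config: Config, transforms: Dict[str, Callable[[Config], None]]
) -> List[str]:
    """Run every section transform; a failure in one section is recorded
    with the section name and does not stop the remaining sections."""
    errors: List[str] = []
    for section, transform in transforms.items():
        try:
            transform(config)
        except Exception as exc:  # isolate the failure to this section
            errors.append("transform for section %r failed: %s" % (section, exc))
    return errors
```

Returning the collected messages instead of raising keeps the rest of the configuration load going, which is the behaviour the commit message describes.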
Vsevolod Stakhov [Tue, 23 Dec 2025 10:13:43 +0000 (10:13 +0000)]
[Fix] Restore Lua stack properly in second-pass MIME detection
Replace lua_settop(L, 0), which cleared the entire Lua stack instead
of restoring it to its previous state and caused segfaults when
process_message() was called from Lua unit tests.
Vsevolod Stakhov [Mon, 22 Dec 2025 11:53:37 +0000 (11:53 +0000)]
[Fix] Use Fibonacci hashing for task pointer hash
Use golden ratio multiplication for 64-bit to 32-bit pointer hashing.
This provides good distribution with minimal operations (1 multiply +
1 shift) and works well with kh_int_hash_func which is identity.
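The multiply-and-shift scheme described here can be sketched as follows. This is an illustrative Python model, not the C code from the commit; the constant is the commonly used 64-bit golden ratio value and is an assumption, since the commit message does not quote the exact constant, and `fib_hash_ptr` is a hypothetical name.

```python
# 2**64 divided by the golden ratio, as an odd 64-bit integer (assumed constant).
GOLDEN_RATIO_64 = 0x9E3779B97F4A7C15
MASK_64 = (1 << 64) - 1


def fib_hash_ptr(ptr: int) -> int:
    """Map a 64-bit pointer value to 32 bits with one multiply and one shift."""
    product = (ptr * GOLDEN_RATIO_64) & MASK_64  # emulate uint64_t wraparound
    return product >> 32  # keep the high 32 bits, which are the best mixed
```

Because the multiplication spreads the low, alignment-dominated pointer bits into the high half of the product, the result can be fed to an identity hash function without clustering in a few buckets.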
Vsevolod Stakhov [Mon, 22 Dec 2025 11:25:49 +0000 (11:25 +0000)]
[Fix] Add logging, preallocation and hash mixing to task registry
- Log error when detecting use-after-free attempt on task pointer
- Preallocate task set to 16 elements to reduce early rehashing
- Mix pointer bits using multiplicative hash for better distribution
Vsevolod Stakhov [Mon, 22 Dec 2025 10:06:19 +0000 (10:06 +0000)]
[Fix] Use pointer set instead of key map for task validation
Store task pointers in a khash set and validate them on lookup
from Lua. This works with all code paths that create task userdata
directly without going through rspamd_lua_task_push.
Vsevolod Stakhov [Sun, 21 Dec 2025 20:05:27 +0000 (20:05 +0000)]
[Feature] Add task registry for safe Lua task reference validation
Implement a global task registry that maps unique uint64_t keys to task
pointers. This prevents use-after-free bugs when Lua code holds references
to tasks that may have been freed (e.g., in async Redis callbacks).
Key changes:
- Add lua_key field to rspamd_task struct
- Implement task registry using khash (O(1) lookup)
- Store lua_key in Lua userdata instead of raw pointer
- Lookup via registry when extracting task from Lua
- Remove task from registry FIRST in rspamd_task_free()
The counter-based key approach avoids issues with:
- Pointer reuse after free (memory allocator may reuse addresses)
- Lua number precision (52-bit mantissa is sufficient for counter)
- NaN/subnormal float values that could cause issues
This fixes potential use-after-free in Redis script waitq callbacks
when Redis is unavailable longer than task lifetime.
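The counter-based registry described in this commit can be modelled in a few lines. The sketch below is a Python illustration only: the real implementation is C using khash, and the class and method names here are hypothetical.

```python
import itertools
from typing import Dict, Optional


class TaskRegistry:
    """Hypothetical model: maps monotonically increasing keys to live tasks."""

    def __init__(self) -> None:
        self._next_key = itertools.count(1)  # keys are never reused
        self._tasks: Dict[int, object] = {}

    def register(self, task: object) -> int:
        key = next(self._next_key)
        self._tasks[key] = task
        return key  # the scripting side stores this key, not a raw pointer

    def unregister(self, key: int) -> None:
        # Done first when a task is freed, so later lookups fail cleanly.
        self._tasks.pop(key, None)

    def lookup(self, key: int) -> Optional[object]:
        return self._tasks.get(key)  # None means the task is already gone
```

A callback that fires after the task was freed gets None from `lookup` instead of a dangling reference, and a new task can never collide with an old key because the counter only grows.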
Vsevolod Stakhov [Tue, 16 Dec 2025 11:25:29 +0000 (11:25 +0000)]
[Fix] Fix Lua 5.4 compatibility in clickhouse and elastic plugins
- Merge nested settings tables (limits, retention) to preserve defaults
when user provides partial configuration
- Use %d with math.floor() instead of %.0f for integer formatting
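Both points of this commit can be illustrated with a short Python sketch. It is an analogy rather than the plugin code: `merge_settings`, `format_integer` and the option names in the usage note are assumptions chosen for the example.

```python
import math
from typing import Any, Dict


def merge_settings(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user settings over defaults, descending into nested tables so
    that a partial nested table does not discard the remaining defaults."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


def format_integer(value: float) -> str:
    """Format a numeric value as an integer string, flooring it first."""
    return "%d" % math.floor(value)
```

For example, merging `{"retention": {"enable": True}}` over defaults that also define a (hypothetical) `period_months` key keeps that key, and `format_integer(4.0)` yields `"4"` rather than `"4.0"`.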
Vsevolod Stakhov [Fri, 12 Dec 2025 20:24:26 +0000 (20:24 +0000)]
[Fix] Handle connection errors with io_uring backend in HTTP client
When using io_uring, POLLERR is reported as both EV_READ and EV_WRITE.
This caused connection failures (e.g., ECONNREFUSED) to be misinterpreted
as early server responses. Check SO_ERROR before attempting to read when
the connection hasn't been established yet.
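The SO_ERROR check described here can be sketched with the standard socket API. This is a Python illustration under assumptions, not the rspamd HTTP client: `pending_socket_error` and `on_io_event` are hypothetical names, and the `connected` flag stands in for whatever state the real client tracks.

```python
import errno
import os
import socket


def pending_socket_error(sock: socket.socket) -> int:
    """Return the pending error code on the socket (0 if there is none)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def on_io_event(sock: socket.socket, connected: bool) -> str:
    """Decide what to do when the event loop reports the socket as ready."""
    if not connected:
        # Before the connection is established, a "readable" event may
        # really be an error such as ECONNREFUSED, so check that first.
        err = pending_socket_error(sock)
        if err != 0:
            name = errno.errorcode.get(err, os.strerror(err))
            return "connect failed: " + name
    return "proceed"
```

Checking the pending error before reading avoids treating a refused connection as if the server had already started to answer.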
[Test] Skip unnecessary waiting in initial scan counter read
Since the test already starts on the Status tab, gotoTab("status")
does not trigger a new request, so the test was only waiting for
the autorefresh to happen, which caused an unnecessary delay.
Vsevolod Stakhov [Thu, 11 Dec 2025 18:11:36 +0000 (18:11 +0000)]
[Feature] Add text quality analysis for PDF garbage filtering
- Add rspamd_util.get_text_quality() function with comprehensive UTF-8
text analysis using ICU for proper Unicode classification
- Returns 18 metrics: letters, digits, punctuation, spaces, printable,
words, word_chars, total, emojis, uppercase, lowercase, ascii_chars,
non_ascii_chars, latin_vowels, latin_consonants, script_transitions,
double_spaces, non_printable
- Add confidence scoring to PDF text extraction to filter garbage tokens
(single characters, encoded data, random sequences)
- Configurable via text_quality_threshold, text_quality_min_length,
text_quality_enabled options in pdf module config
- Add unit tests for get_text_quality function
[Fix] Correct symbols column index in history and scan tables
Fixes a regression introduced in 62b136a where sorting fails with
"can't access property 'sortValue', val.options is undefined" on
the History tab, and symbol reordering doesn't work on the Scan tab.
The "file" column addition shifted the symbols column index, but
history.js and upload.js were not updated, causing symbol reordering
to target the wrong columns.
[Feature] Add multipart and msgpack formatters to metadata_exporter
- Add multipart formatter for HTTP export using form-data with separate
metadata (JSON) and message (rfc822) parts
- Add msgpack formatter for efficient binary serialization
- Add json_with_message formatter for JSON with base64-encoded message
- Deprecate meta_headers option (broken by design for complex data)
- HTTP pusher now auto-detects multipart boundary from formatter
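A form-data body with separate metadata and message parts, as described in this commit, could look like the output of the sketch below. It is a Python illustration only: the part names "metadata" and "message", the filename, and the function name are assumptions, not the formatter's actual output.

```python
import json
import uuid
from typing import Dict, Tuple


def format_multipart(metadata: Dict[str, object], message: bytes) -> Tuple[str, bytes]:
    """Build a multipart/form-data body with a JSON part and an rfc822 part.
    Returns the Content-Type header value (with boundary) and the body."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="metadata"',
        b"Content-Type: application/json",
        b"",
        json.dumps(metadata).encode("utf-8"),
        delimiter,
        b'Content-Disposition: form-data; name="message"; filename="message.eml"',
        b"Content-Type: message/rfc822",
        b"",
        message,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)
```

Returning the Content-Type value together with the body mirrors the idea that the sender has to learn the boundary from the formatter rather than guess it.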
[Fix] Only apply early response handling for HTTP clients
The early response detection logic should not run for server-side
connections, as it incorrectly modifies wr_pos state when the server
reads incoming requests. This was breaking spamc protocol handling.
[Fix] Handle HTTP early server responses during request write
Fix HTTP client to properly handle early server responses (e.g., 413
Payload Too Large) that arrive before the client has finished sending
the request body. This is allowed by HTTP/1.1 (RFC 7230 Section 6.5).
- Use bitwise AND for event flag checks to handle combined EV_READ|EV_WRITE
- Watch for both READ and WRITE events during write phase
- Check for early response on write errors (EPIPE, ECONNRESET)
- Add RSPAMD_HTTP_CONN_FLAG_EARLY_RESPONSE flag to track state
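The first bullet above, testing event flags with a bitwise AND, matters because an event loop may deliver several flags at once. A minimal Python sketch follows; the numeric values match libev's EV_READ and EV_WRITE, while the function names are hypothetical.

```python
EV_READ = 0x01   # same values as libev's EV_READ / EV_WRITE
EV_WRITE = 0x02


def is_readable_exact(events: int) -> bool:
    """Fragile check: misses the case where several flags arrive together."""
    return events == EV_READ


def is_readable(events: int) -> bool:
    """Robust check: true whenever the read bit is set."""
    return bool(events & EV_READ)


def is_writable(events: int) -> bool:
    """Robust check: true whenever the write bit is set."""
    return bool(events & EV_WRITE)
```

With a combined `EV_READ | EV_WRITE` event, the equality test reports "not readable" and an early response would be missed, while the masked test sees both conditions.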
FreeBSD 15.0 introduced native inotify support, which causes
libev to enable EV_USE_INOTIFY. On FreeBSD, struct statfs is
defined in <sys/mount.h> rather than <sys/statfs.h>.
Note: This fix obsoletes the corresponding patch file in the
FreeBSD `mail/rspamd` and `mail/rspamd-devel` ports.
[Fix] Allow default_headers_order to be configured in milter_headers
Fixes #5781: The default_headers_order setting was defined in the plugin
but never read from the configuration file. Add schema validation and
config loading for this option.
[Fix] Fix Lua 5.4 compatibility issues in neural plugin
This commit addresses several Lua 5.4 compatibility issues that caused
the neural LLM tests to fail:
1. Redis TTL must be integer (lua_cache.lua):
- Lua 5.4's tostring() produces "4.0" for floats instead of "4"
- Redis SETEX/EXPIRE commands require integer TTL values
- Fixed by using math.floor() before tostring()
2. Version number format in ANN keys (lualib/plugins/neural.lua):
- Changed string format from %s to %d for version numbers
- Ensures integer format "1" instead of potential "1.0"
3. Iterator vs table handling (src/plugins/lua/neural.lua):
- fun.map() returns an iterator, not a table
- In Lua 5.4, # operator on iterators returns 0
- Fixed by wrapping with fun.totable() to get a proper table
4. Nil values in table arguments (lualib/plugins/neural.lua):
- Lua 5.4 handles nil values in tables differently
- Tables like {a, b, nil, nil} have undefined length behavior
- Fixed by using empty string defaults for optional parameters
5. Redis script nil checks (neural_save_unlock.lua):
- Added empty string checks alongside nil checks
- Ensures optional fields are only set when truly provided
6. Test infrastructure improvements:
- Added logging to dummy_llm.py for debugging
- Added proper error handling and diagnostics
- Updated rspamd.robot with better dummy_llm startup logging
- Add required Host header to all HTTP/1.1 requests in tcp.lua
- Bind dummy servers to 127.0.0.1 instead of localhost to avoid
IPv6/IPv4 mismatch on systems where localhost resolves to ::1
Lua 5.4's require() returns both the module and the file path, while
LuaJIT returns only the module. Save stack top before luaL_dostring
and restore to top+1 after to keep only the first return value.
[Fix] Use ipairs for ordered iteration in header checks
pairs() does not guarantee iteration order for numeric keys. In Lua 5.4
this caused RCVD_COUNT, HAS_X_PRIO, and RCPT_COUNT symbols to select
wrong thresholds when the table was iterated in non-ascending order.
[Fix] Use math.floor for Lua 5.4 integer division compatibility
In Lua 5.4, the / operator always returns a float (2/2 = 1.0), while
LuaJIT returns an integer (2/2 = 1). This caused test dependency
registration to fail as tostring(i/2) produced "1.0" instead of "1".
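Python 3 happens to share the behaviour described here for Lua 5.4 (its `/` operator always yields a float), so the pitfall and the fix can be reproduced as an analogy. The function names are hypothetical and this is not the test code that was changed.

```python
import math


def dependency_index_buggy(i: int) -> str:
    """Float division leaks into the string: "1.0" instead of "1"."""
    return str(i / 2)


def dependency_index_fixed(i: int) -> str:
    """Floor the quotient first so the string form is a plain integer."""
    return str(math.floor(i / 2))
```

Any key or name derived from the stringified quotient changes when the quotient becomes a float, which is why the lookup in the original test stopped matching.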
[Fix] Improve loadstring error handling for Lua 5.4 compatibility
Ensure loadstring results are checked for nil (syntax errors) before
passing to pcall. This prevents errors when running with Lua 5.4
compatibility where load behavior differs slightly or when handling
invalid Lua chunks.
[Fix] Use userdata __gc for UCL objects in all Lua versions
Use userdata __gc instead of table __gc for UCL object garbage
collection in all Lua versions. Table __gc in Lua 5.2+ can cause
use-after-free crashes due to GC ordering issues when UCL objects
reference each other or config objects.
[Fix] Use locale-independent patterns in URL encoding
Replace %w with explicit A-Za-z0-9 ranges in URL encoding functions.
The %w pattern is locale-dependent and incorrectly matches high bytes
(0xE4, 0xE5, 0xE6) as word characters in UTF-8 locales like en_GB.UTF-8,
breaking URL encoding of non-ASCII characters.
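A byte-oriented encoder with an explicit ASCII allow-list avoids any dependence on locale. The sketch below is a Python illustration, not the Lua function from the commit; the extra unreserved characters `-._~` follow RFC 3986 and are an assumption, since the commit message only mentions the A-Za-z0-9 ranges.

```python
# Explicit ASCII allow-list; nothing here depends on the process locale.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789"
    b"-._~"  # assumed extras, taken from RFC 3986 "unreserved"
)


def url_encode(data: bytes) -> str:
    """Percent-encode every byte that is not in the explicit allow-list."""
    return "".join(
        chr(byte) if byte in UNRESERVED else "%%%02X" % byte for byte in data
    )
```

High bytes such as 0xE4 are always escaped here, regardless of whether the C library would classify them as alphanumeric in the current locale.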
- Add startup progress messages to stderr
- Capture exceptions with full traceback
- Write PID only after successful server start
- Log output on failure in robot tests
[Test] Fix Lua 5.4 and cffi-lua compatibility issues
- Fix rspamd_memspn to handle empty character set without crash
- Add unpack compatibility shim for Lua 5.2+ (table.unpack)
- Replace deprecated table.maxn with # operator
- Fix cffi-lua strict type checking (char* vs unsigned char*)
- Add helpers for cdata-to-number conversion (64-bit integers)
- Add proper NULL pointer detection for cffi-lua
- Fix lua_resume to use coroutine threads in Lua 5.4
- Update test expectations for Lua 5.4 tostring(float) behavior
- base32: Use tostring() for size_t values in format strings
- expressions: Use %s instead of %d for float values (Lua 5.4 strict)
- fpconv: Skip variadic function tests on cffi-lua (not supported)
[CI] Install luarocks and dependencies for cffi-lua
The Fedora CI image doesn't have luarocks pre-installed, so we need
to install it along with lua-devel and libffi-devel before we can
install cffi-lua via luarocks.
[Test] Use size ranges for gzip tests to support zlib-ng
Fedora 40+ uses zlib-ng which produces slightly different compressed
sizes than standard zlib. Instead of checking exact sizes, use
reasonable ranges that accommodate both implementations.
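A range-based size assertion of the kind described here can be sketched as follows. This is a Python illustration rather than the actual test; the input and the bounds are assumptions picked only to show the pattern.

```python
import zlib


def compressed_size(data: bytes, level: int = 6) -> int:
    """Size in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, level))


def size_in_range(size: int, low: int, high: int) -> bool:
    """True when the size falls inside the inclusive range [low, high]."""
    return low <= size <= high


if __name__ == "__main__":
    size = compressed_size(b"a" * 1000)
    # Assumed bounds: loose enough for different deflate implementations,
    # tight enough to catch output that was not compressed at all.
    print(size, size_in_range(size, 10, 100))
```

The range still proves that compression happened and that the output is of plausible size, without pinning the test to one library's exact byte count.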