Vsevolod Stakhov [Thu, 27 Nov 2025 15:37:47 +0000 (15:37 +0000)]
[Feature] Add combinator option for multimap selector rules
This change adds support for structured data output from selectors in
multimap rules. Previously, selectors always produced concatenated
strings which made it impossible to send structured JSON data to
external map services.
New 'combinator' option for selector-type multimap rules:
- 'string' (default): concatenate results with delimiter (existing behavior)
- 'array': flatten all results into a flat array
- 'object': convert pairs of selectors into key-value object
Changes:
- lua_selectors: Added combinator registry and helper functions
- get_combinator(name): returns combinator function by name
- list_combinators(): returns available combinator names
- create_selector_closure_with_combinator(): creates closure with named combinator
- multimap: Added 'combinator' option support for selector and redis+selector maps
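The three combinator modes can be pictured with a small language-neutral sketch. It is written in Python rather than Lua purely for illustration; the function bodies, the default delimiter, one-level flattening, and dropping an unpaired trailing value are all assumptions, not the lua_selectors implementation. Only the names get_combinator and list_combinators are taken from the commit message.

```python
from typing import Any, Callable, Dict, List, Optional


def combine_string(results: List[Any], delimiter: str = ":") -> str:
    # 'string': concatenate results with a delimiter (delimiter value is assumed).
    return delimiter.join(str(r) for r in results)


def combine_array(results: List[Any]) -> List[Any]:
    # 'array': flatten results one level deep into a single flat list.
    flat: List[Any] = []
    for r in results:
        if isinstance(r, (list, tuple)):
            flat.extend(r)
        else:
            flat.append(r)
    return flat


def combine_object(results: List[Any]) -> Dict[str, Any]:
    # 'object': consecutive pairs become key/value entries.
    # An unpaired trailing element is dropped (assumption).
    return {str(results[i]): results[i + 1] for i in range(0, len(results) - 1, 2)}


# Hypothetical registry mirroring the "combinator registry" idea.
COMBINATORS: Dict[str, Callable[..., Any]] = {
    "string": combine_string,
    "array": combine_array,
    "object": combine_object,
}


def get_combinator(name: str) -> Optional[Callable[..., Any]]:
    return COMBINATORS.get(name)


def list_combinators() -> List[str]:
    return sorted(COMBINATORS)
```

With 'object', a result list such as ["from", "a@b", "ip", "1.2.3.4"] becomes a two-key mapping that can be serialised as JSON for an external map service.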
Vsevolod Stakhov [Thu, 27 Nov 2025 10:39:54 +0000 (10:39 +0000)]
[Feature] Add control protocol command for composites statistics
- Add RSPAMD_CONTROL_COMPOSITES_STATS command to control protocol
- Add /compositesstats endpoint to control socket
- Add 'rspamadm control compositesstats' command
- Aggregate statistics from all workers with per-worker breakdown
- Remove composites stats from controller /stat (use control socket instead)
- Statistics always collected, timing sampled 1/256 (configurable)
Vsevolod Stakhov [Wed, 26 Nov 2025 12:33:26 +0000 (12:33 +0000)]
[Feature] Precompute composite atom types at config time
Resolve ATOM_COMPOSITE vs ATOM_PLAIN for all composite atoms during
configuration phase instead of lazy evaluation at runtime. This
eliminates repeated hash lookups during expression evaluation.
- Add rspamd_composites_resolve_atom_types() function
- Call after process_dependencies() and before build_inverted_index()
- Sets comp_type and ncomp pointer for each atom upfront
Vsevolod Stakhov [Wed, 26 Nov 2025 11:45:58 +0000 (11:45 +0000)]
[Fix] Copy expression string to memory pool for Lua composites
When composites are added via Lua API (rspamd_config:add_composite),
the expression string was not copied to the memory pool. The expression
parser stores pointers (atom->str) into the original string, which
became invalid after Lua garbage collected the string.
This caused the inverted index to extract garbage symbol names,
breaking composite evaluation for dynamically added composites
like MISSING_MID_ALLOWED and INVALID_MSGID_ALLOWED from mid.lua.
Vsevolod Stakhov [Wed, 26 Nov 2025 09:59:50 +0000 (09:59 +0000)]
[Fix] Handle group matchers in composites inverted index
Composites that use group matchers (g:, g+:, g-:) cannot be
efficiently indexed because we don't know which symbols will
match until runtime. Add these composites to not_only_composites
list so they are always evaluated.
Vsevolod Stakhov [Wed, 26 Nov 2025 09:18:26 +0000 (09:18 +0000)]
[Fix] Improve atom polarity detection in composites inverted index
Count NOT operations from atom to root instead of just checking direct
parent. This correctly handles nested negations like !(A & B) where
atoms A and B are both under negation even though their direct parent
is AND, not NOT.
- Even number of NOTs = positive atom (must be true)
- Odd number of NOTs = negative atom (must be false)
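A minimal sketch of the parity rule, in Python for illustration only. The tuple-based AST is a hypothetical stand-in for the real expression tree, and the walk accumulates NOTs from the root downwards, which gives the same count as walking from each atom up to the root.

```python
from typing import Dict, Optional, Tuple

# Hypothetical AST shape: ("atom", name) | ("not", child) | ("and"/"or", child, ...)
Node = Tuple


def atom_polarities(node: Node, nots: int = 0,
                    out: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Map each atom name to True (positive) or False (negative)."""
    if out is None:
        out = {}
    kind = node[0]
    if kind == "atom":
        # Even number of enclosing NOTs: positive; odd: negative.
        # Simplification: an atom that occurs twice keeps its last polarity.
        out[node[1]] = (nots % 2 == 0)
    elif kind == "not":
        atom_polarities(node[1], nots + 1, out)
    else:
        # AND / OR do not change polarity, they only pass the count down.
        for child in node[1:]:
            atom_polarities(child, nots, out)
    return out


# !(A & B): both atoms are negative although their direct parent is AND.
nested = ("not", ("and", ("atom", "A"), ("atom", "B")))
```

Checking only the direct parent would classify A and B in this example as positive, which is the case the fix addresses.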
Vsevolod Stakhov [Tue, 25 Nov 2025 17:50:48 +0000 (17:50 +0000)]
[Feature] Add bloom filter for fast negative symbol lookups
Add an inline bloom filter (1024 bits) to rspamd_scan_result structure
for O(1) negative lookups in rspamd_task_find_symbol_result().
This optimization benefits composites evaluation where most symbol
lookups are negative (symbol not present in results). The bloom filter
is updated when symbols are inserted and checked before the hash lookup.
For 50 symbols, the false positive rate is approximately 0.5%, meaning
99.5% of negative lookups will be rejected without hash table access.
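The idea can be sketched as follows, in Python for illustration. Only the 1024-bit size comes from the commit message; the number of hash functions and the use of SHA-256 are assumptions, so the theoretical rate computed here need not match the figure quoted above.

```python
import hashlib
import math

BLOOM_BITS = 1024   # size stated in the commit message
NUM_HASHES = 3      # assumption: the commit does not state the hash count


class SymbolBloom:
    """Tiny bloom filter: no false negatives, occasional false positives."""

    def __init__(self) -> None:
        self.bits = 0  # Python int used as a 1024-bit bitmap

    def _positions(self, name: str):
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        # Derive NUM_HASHES bit positions from independent 4-byte slices.
        return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % BLOOM_BITS
                for i in range(NUM_HASHES)]

    def add(self, name: str) -> None:
        for p in self._positions(name):
            self.bits |= 1 << p

    def might_contain(self, name: str) -> bool:
        # False means "definitely absent": the hash lookup can be skipped.
        return all((self.bits >> p) & 1 for p in self._positions(name))


def false_positive_rate(n_items: int, m_bits: int = BLOOM_BITS,
                        k: int = NUM_HASHES) -> float:
    # Standard approximation: (1 - e^(-k*n/m))^k
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k
```

A lookup would call might_contain first and fall through to the hash table only when it returns True.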
Vsevolod Stakhov [Tue, 25 Nov 2025 17:13:19 +0000 (17:13 +0000)]
[Feature] Add inverted index for composites optimization
Build an inverted index mapping symbol names to composites that contain
those symbols as positive (non-negated) atoms. This allows filtering out
composites that cannot possibly match during the first pass evaluation.
- Add rspamd_expression_atom_foreach_ex() to traverse expression atoms
with access to AST nodes (needed to detect negated atoms)
- Add rspamd_expression_node_is_op() to check if a node is an operator
- Build inverted index in composites_manager during config processing
- Track composites with only negated atoms separately (they must always
be evaluated)
- Use inverted index in composites_metric_callback for first pass to
evaluate only potentially matching composites
For configurations with many composites (4000+), this reduces the number
of composites evaluated per message from all of them to only those with
at least one positive atom whose symbol is present in the scan result.
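A rough sketch of the index and the first-pass filter, in Python for illustration. The data shapes and function names are hypothetical; it mirrors only the behaviour described above and does not cover how the real code treats more complex expressions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input: composite name -> list of (symbol, is_positive) atoms.
Composites = Dict[str, List[Tuple[str, bool]]]


def build_index(composites: Composites):
    """Return (symbol -> composites index, composites that must always run)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    always: Set[str] = set()
    for name, atoms in composites.items():
        positives = [sym for sym, is_positive in atoms if is_positive]
        if not positives:
            # Only negated atoms: cannot be filtered by presence of a symbol.
            always.add(name)
            continue
        for sym in positives:
            index[sym].add(name)
    return index, always


def candidates(index, always: Set[str], present: Iterable[str]) -> Set[str]:
    """Composites worth evaluating given the symbols present in the result."""
    out = set(always)
    for sym in present:
        out |= index.get(sym, set())
    return out


composites = {
    "C1": [("A", True), ("B", True)],
    "C2": [("X", False)],          # only negated atoms
    "C3": [("Z", True)],
}
index, always = build_index(composites)
```

With only symbol A present, C1 and C2 are evaluated and C3 is skipped.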
Vsevolod Stakhov [Sun, 23 Nov 2025 11:38:18 +0000 (11:38 +0000)]
[Feature] Add rspamd_util.decode_html_entities and improve obfuscated URL detection
- Add Lua binding for HTML entity decoding (rspamd_util.decode_html_entities)
wrapping rspamd_html_decode_entitles_inplace C function
- Switch obfuscated URL detection from regexp module to rspamd_trie
for Hyperscan-accelerated multi-pattern matching
- Fix URL flag passing (use url.create with flags table instead of add_flag)
- Fix inject_url usage (doesn't return value)
- Add functional tests for obfuscated URL detection
Vsevolod Stakhov [Sat, 22 Nov 2025 13:46:55 +0000 (13:46 +0000)]
[Feature] Add obfuscated URL detection to url_suspect plugin
Detect URLs hidden in message text using various obfuscation techniques:
- Spaced protocols (h t t p s : / /)
- hxxp variants (hxxp://)
- Bracket dots (example[.]com)
- Word dots (example dot com)
- HTML entities (&#46; for dots)
Features:
- Hyperscan-based prefiltering for performance
- Normalization and URL extraction from obfuscated text
- URL injection with 'obscured' flag for further analysis
- Configurable via built-in settings or external maps
- DoS protection with strict limits
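The normalization step for the techniques listed above might look roughly like this. It is a Python sketch for illustration, not the plugin's Lua code; the regular expressions are assumptions, and the prefiltering and DoS limits mentioned above are left out.

```python
import html
import re

# Assumed patterns, one per obfuscation technique named in the commit message.
_SPACED_PROTO = re.compile(r"h\s*t\s*t\s*p\s*(s?)\s*:\s*/\s*/\s*", re.IGNORECASE)
_HXXP = re.compile(r"hxxp(s?)://", re.IGNORECASE)
_BRACKET_DOT = re.compile(r"\[\.\]")
_WORD_DOT = re.compile(r"\s+dot\s+", re.IGNORECASE)


def normalize_obfuscated(text: str) -> str:
    """Rewrite common URL obfuscations into a form a URL parser accepts."""
    out = html.unescape(text)             # HTML entities, e.g. an encoded dot
    out = _SPACED_PROTO.sub(r"http\1://", out)   # h t t p s : / /
    out = _HXXP.sub(r"http\1://", out)           # hxxp:// and hxxps://
    out = _BRACKET_DOT.sub(".", out)             # example[.]com
    out = _WORD_DOT.sub(".", out)                # example dot com
    return out
```

The word-dot rule is deliberately naive and would also rewrite ordinary prose, which is one reason to run such a step only on text that a cheaper prefilter has already flagged.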
The url_suspect plugin had multiple critical issues:
1. R_SUSPICIOUS_URL triggered on every message with URLs, adding 25 points
due to incorrect dynamic score usage (5.0 * 5.0 instead of 1.0 * 5.0)
2. Broken compat_mode inserted R_SUSPICIOUS_URL without URL info whenever
ANY url check triggered, making it impossible to debug
3. Symbol names were unnecessarily configurable, adding complexity
4. url_suspect_group.conf was not included in groups.conf, so scores
were not loaded at all
Fixed by:
- Removed R_SUSPICIOUS_URL and compat_mode completely
- Fixed all insert_result() calls to use 1.0 dynamic weight
- Made symbol names hardcoded constants
- Added url group to groups.conf with max_score = 9.0
- Cleaned up score configuration parameters
Vsevolod Stakhov [Thu, 20 Nov 2025 12:32:33 +0000 (12:32 +0000)]
[Fix] Prevent infinite loop in split_networks_into_chunks()
If a single IP network entry exceeded max_record_length (450 chars) on its own,
the code would enter an infinite loop. Added validation that checks whether each
item can fit before it is added to a chunk. Items that are too large
are now skipped with a warning message.
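The guard can be illustrated with a short Python sketch. The function name, the separator, and returning skipped items instead of logging a warning are assumptions; only the 450-character limit comes from the commit message.

```python
from typing import List, Tuple


def split_into_chunks(items: List[str], max_len: int = 450,
                      sep: str = " ") -> Tuple[List[str], List[str]]:
    """Pack items into chunks of at most max_len characters.

    Returns (chunks, skipped). Oversized items are skipped up front, so the
    loop always makes progress and cannot spin on an item that never fits.
    """
    chunks: List[str] = []
    skipped: List[str] = []
    current: List[str] = []
    current_len = 0
    for item in items:
        if len(item) > max_len:
            skipped.append(item)   # the real code logs a warning here
            continue
        extra = len(item) + (len(sep) if current else 0)
        if current and current_len + extra > max_len:
            # Current chunk is full: flush it and start a new one.
            chunks.append(sep.join(current))
            current, current_len = [item], len(item)
        else:
            current.append(item)
            current_len += extra
    if current:
        chunks.append(sep.join(current))
    return chunks, skipped
```

Because every item is consumed exactly once, either placed in a chunk or skipped, termination does not depend on the item sizes.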
Vsevolod Stakhov [Thu, 20 Nov 2025 11:43:40 +0000 (11:43 +0000)]
[Feature] Add SPF flattening tool with macro preservation
- Add new 'spf-flatten' command to dns_tool for optimizing SPF records
- Introduce RSPAMD_SPF_FLAG_MACRO_UNRESOLVED flag to preserve SPF macros
- Prevent macro expansion when sender IP is unavailable (flatten mode)
- SPF elements with macros (exists:, a:, mx:, ptr:) now preserved correctly
- Add multiple output formats: default, json, compact (BIND-style)
- Optimize IP addresses by removing default /32 and /128 masks
- Automatically split large SPF records into multiple includes
- Preserve qualifiers and 'all' mechanism in flattened records
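As an illustration of the mask optimization only, here is a Python sketch built on the standard ipaddress module. The function name and output formatting are hypothetical and are not taken from dns_tool.

```python
import ipaddress


def compact_spf_ip(cidr: str) -> str:
    """Format a network as an SPF ip4:/ip6: term, dropping a default mask."""
    net = ipaddress.ip_network(cidr, strict=False)
    mechanism = "ip4" if net.version == 4 else "ip6"
    full_mask = 32 if net.version == 4 else 128
    if net.prefixlen == full_mask:
        # /32 and /128 are the defaults for a single address, so omit them.
        return f"{mechanism}:{net.network_address}"
    return f"{mechanism}:{net}"
```

For example, 192.0.2.1/32 is emitted as ip4:192.0.2.1, while 192.0.2.0/24 keeps its mask.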
Vsevolod Stakhov [Wed, 19 Nov 2025 16:20:57 +0000 (16:20 +0000)]
[Fix] Fix lua_shape registry to recursively resolve nested schemas
The registry's resolve_schema function was not recursively resolving
field schemas and opts.extra in table nodes, causing mixins in nested
one_of variants to never be expanded. This broke external_relay plugin
validation where rule variants with mixins were reported as having
unknown fields.
Now recursively resolves all nested schemas including field schemas,
opts.extra, and ensures mixins are properly expanded throughout the
entire schema tree.
Vsevolod Stakhov [Wed, 19 Nov 2025 14:03:15 +0000 (14:03 +0000)]
[Fix] Fix test files to handle missing map env vars
- Add defensive checks in maps_kv.lua to skip map creation if env vars not set
- Ensure test config file is always saved to robot-save directory
Fixes issue where external_relay test would crash because maps_kv.lua tried
to create maps with nil URLs when RADIX_MAP/MAP_MAP/REGEXP_MAP env vars
were not defined.
Vsevolod Stakhov [Wed, 19 Nov 2025 12:44:15 +0000 (12:44 +0000)]
[Fix] Preserve metatables in shallowcopy and save test configs
- Fix shallowcopy to preserve metatables when copying schema objects
- Ensure configdump output and input config are always saved to robot-save
even if configdump crashes with assertion failure
Vsevolod Stakhov [Wed, 19 Nov 2025 11:50:28 +0000 (11:50 +0000)]
[Fix] Fix maybe_adjust_type callback return and add nil check
Fixed a bug where maybe_adjust_type returned only mtype instead of (data, mtype)
when mtype == 'callback'. Added a defensive check to prevent passing a nil or empty
URL to rspamd_config:add_map, which would cause an assertion failure.
Vsevolod Stakhov [Wed, 19 Nov 2025 11:26:43 +0000 (11:26 +0000)]
[Rework] Refactor T.transform to validate input first
Changed T.transform to validate input type before applying transformer.
If transformer returns nil, treat as error. Output is not type-checked.
Updated all usages and tests.
Vsevolod Stakhov [Wed, 19 Nov 2025 10:19:09 +0000 (10:19 +0000)]
[Fix] Fix transform schemas - inner schema describes result
- Transform inner schema now describes the result type, not input
- Fixed lua_maps.lua: timeout transform
- Fixed lua_redis.lua: timeout, sentinel_watch_time, sentinel_master_maxerrors, redis_version
- Fixed rbl.lua: key transforms, return_bits number transforms
- Transform functions now handle type checking before conversion
Vsevolod Stakhov [Wed, 19 Nov 2025 09:51:20 +0000 (09:51 +0000)]
[Fix] Fix RBL plugin transform schemas
- Transform inner schema now validates the result, not input
- Updated return_codes_schema and return_bits_schema
- Transform functions now handle type conversion properly
Vsevolod Stakhov [Wed, 19 Nov 2025 09:17:26 +0000 (09:17 +0000)]
[Fix] Improve lua_shape error safety
- Transform functions wrapped in pcall to catch user errors
- Default value functions wrapped in pcall
- Pattern matching (string.match, lpeg.match) wrapped in pcall
- Unresolved references return validation errors instead of throwing
- Library now never throws Lua errors on invalid input
Vsevolod Stakhov [Wed, 19 Nov 2025 09:06:35 +0000 (09:06 +0000)]
[Fix] Fix lua_shape transform logic
- Transform now validates original value before transformation
- Open tables now properly apply extra schema with transforms
- Fixes RBL plugin returncodes handling
Vsevolod Stakhov [Tue, 18 Nov 2025 12:45:29 +0000 (12:45 +0000)]
[Fix] Fix lua_shape transform compatibility and test issues
Multiple fixes to make lua_shape fully compatible with tableshape:
1. Transform return values: :transform() now returns (value) on success,
(nil, error) on failure - matching tableshape API. :check() still
returns (bool, value_or_error).
2. Transform functions: Fix check_transform to pass only value to
transform function, not (value, ctx). Transform functions expect
single argument.
3. Test updates: Update all lua_shape unit tests to use new :transform()
API with (val, err) instead of (ok, val).
4. Selector schema fixes:
- header: Accept any string for flags, not just literals
- specific_urls: Simplify boolean+string handling with single transform
All 783 tests now pass: 780 passed, 0 failed, 0 errors, 3 unassertive.
Vsevolod Stakhov [Tue, 18 Nov 2025 12:31:15 +0000 (12:31 +0000)]
[Fix] Make lua_shape :transform() tableshape-compatible
Change :transform() return values to match tableshape behavior:
- Success: return value (not true, value)
- Failure: return nil, error (not false, error)
This is required for compatibility with lua_selectors check_args which
expects the tableshape return convention. The :check() method still
returns (bool, value_or_error) for new code.
Schemas:
- return_codes_schema: Map of symbols to IP patterns
- return_bits_schema: Map of symbols to bit numbers
- rule_schema: Complete RBL rule with 50+ configuration options
Vsevolod Stakhov [Tue, 18 Nov 2025 11:09:22 +0000 (11:09 +0000)]
[Feature] Add T.callable() type to lua_shape
Add function/callable type validator to lua_shape core.
Implementation:
- check_callable(): Validates value is of type 'function'
- T.callable(opts): Constructor for callable type
- Returns type_mismatch error if value is not a function
This was missing from the initial implementation and is needed by
lua_meta.lua which validates callback functions in metafunction schemas.
Documentation added to README.md scalar types section.
Schemas:
- generic_selector: selector-based reputation with whitelist/exclusion
- redis backend: prefix, expiry, and time buckets array
- dns backend: simple DNS list configuration
Vsevolod Stakhov [Tue, 18 Nov 2025 10:07:51 +0000 (10:07 +0000)]
[Minor] Migrate src/plugins/lua/bimi.lua to lua_shape
Replace tableshape with lua_shape in BIMI plugin.
Changes:
- Uses lua_redis.enrich_schema() which returns lua_shape with mixins
- ts.string, ts.boolean, ts.number → T.string(), T.boolean(), T.number()
- ts.number + ts.string / fn → T.one_of({T.number(), T.transform(T.string(), fn)})
- :is_optional() → :optional()
- Added comprehensive documentation to all fields
Schema: settings_schema with Redis common fields plus BIMI-specific
settings for helper service URL, timeouts, VMC-only mode, and staged
HTTP timeouts (connect, SSL, write, read).
Vsevolod Stakhov [Tue, 18 Nov 2025 09:54:34 +0000 (09:54 +0000)]
[Minor] Migrate src/plugins/lua/history_redis.lua to lua_shape
Replace tableshape with lua_shape in history_redis plugin.
Changes:
- Uses lua_redis.enrich_schema() which now returns lua_shape with mixins
- ts.string, ts.boolean, ts.number → T.string(), T.boolean(), T.number()
- ts.number + ts.string / fn → T.one_of({T.number(), T.transform(T.string(), fn)})
- :is_optional() → :optional()
- Added comprehensive documentation to all fields
Schema: settings_schema with Redis common fields (via mixin) plus
history-specific fields like key_prefix, nrows, compress, and
subject privacy options.
Vsevolod Stakhov [Mon, 17 Nov 2025 21:31:55 +0000 (21:31 +0000)]
[Feature] Add callable defaults support to lua_shape
Enhance lua_shape to support function values as defaults that are
evaluated dynamically at validation time, not at schema definition time.
Core changes:
- check_optional(): Check if default is a function and call it
- check_table(): Same for table field defaults
- Enables patterns like: T.string():with_default(get_timestamp)
lua_aws.lua cleanup:
Replace ugly patterns like:
T.transform(T.one_of({T.string(), T.literal(nil)}),
function(v) return v or 'GET' end)
With clean:
T.string():with_default('GET')
For dynamic defaults (date field), use:
T.string():with_default(today_canonical) -- function ref, not call
Benefits:
- Much cleaner and more readable schemas
- Consistent with lua_shape design philosophy
- Dynamic defaults for timestamps, random values, etc.
- Static defaults for constants
Vsevolod Stakhov [Mon, 17 Nov 2025 21:12:29 +0000 (21:12 +0000)]
[Minor] Migrate lua_redis.lua to lua_shape with mixin tracking
Replace tableshape with lua_shape using first-class mixin system
to preserve origin tracking for documentation and better error reporting.
Key changes:
- common_schema: Now a proper T.table() schema (not plain table)
- Schema composition: Uses T.mixin() instead of lutil.table_merge()
- enrich_schema: Returns T.one_of() with 6 named variants, each using
mixins for redis_common and external fields
- All tables use { open = true } to allow additional fields
- Transform patterns: T.transform(T.number(), tostring) for type conversions
- Union types: T.one_of({...}) for alternatives like string | array
Benefits:
- Mixin origin tracking preserved in field metadata
- Documentation generation can show field sources
- Better error messages with mixin context
- Consistent with lua_shape design philosophy
Vsevolod Stakhov [Mon, 17 Nov 2025 18:07:34 +0000 (18:07 +0000)]
[Minor] Migrate lua_mime.lua from tableshape to lua_shape
Replace tableshape with lua_shape in message_to_ucl_schema function:
- Convert ts.shape to T.table
- Convert :describe() to :doc({ summary = ... })
- Convert :is_optional() to :optional()
- Convert ts.array_of to T.array
- Convert ts.pattern to T.string({ pattern = ... })
- Convert ts.one_of to T.enum for simple value lists
- Convert ts.integer/string/boolean to T.integer()/string()/boolean()
Vsevolod Stakhov [Mon, 17 Nov 2025 16:46:46 +0000 (16:46 +0000)]
[Minor] Migrate lua_meta.lua from tableshape to lua_shape
Replace tableshape with lua_shape for metafunction schema:
- Convert ts.shape to T.table
- Convert ts.func to T.callable()
- Convert ts.array_of to T.array
- Convert :is_optional() to :optional()
Vsevolod Stakhov [Mon, 17 Nov 2025 16:32:01 +0000 (16:32 +0000)]
[Minor] Migrate lua_maps.lua from tableshape to lua_shape
Replace tableshape with lua_shape for map configuration schemas:
- Convert ts.shape to T.table
- Convert ts.equivalent(true) to T.literal(true)
- Convert :is_optional() to :optional()
- Convert ts.one_of to T.one_of with proper braces
- Convert ts.enum with :is_optional() to T.enum():optional()
- Convert ts.array_of to T.array
Updates external_map_schema, direct_map_schema, and exports.map_schema.
No functional changes, luacheck passes.
Vsevolod Stakhov [Mon, 17 Nov 2025 15:32:50 +0000 (15:32 +0000)]
[Minor] Migrate lua_fuzzy.lua from tableshape to lua_shape
Replace tableshape with lua_shape for fuzzy policy schemas:
- Convert ts.number + ts.string / tonumber to T.transform + T.one_of
- Convert ts.array_of to T.array
- Convert ts.boolean to T.boolean()
- Convert ts.shape to T.table with open option support
Vsevolod Stakhov [Mon, 17 Nov 2025 15:31:52 +0000 (15:31 +0000)]
[Minor] Migrate lua_maps_expressions.lua from tableshape to lua_shape
Replace tableshape with lua_shape for maps expressions schema:
- Convert ts.shape to T.table
- Convert ts.array_of to T.array
- Keep reference to lua_maps.map_schema (will be migrated separately)
Vsevolod Stakhov [Mon, 17 Nov 2025 15:29:36 +0000 (15:29 +0000)]
[Minor] Migrate lua_aws.lua from tableshape to lua_shape
Replace tableshape with lua_shape for AWS parameter validation:
- Convert ts.shape to T.table
- Replace ts.string + ts['nil'] / fn with T.transform + T.one_of
- Replace ts.map_of(ts.string, ts.string) with open table validation
- Update documentation comments to reflect new schema format
Vsevolod Stakhov [Mon, 17 Nov 2025 14:43:36 +0000 (14:43 +0000)]
[Feature] Add lua_shape validation library as tableshape replacement
Implement comprehensive schema validation library with improved features:
* Better one_of error reporting with intersection analysis
* Schema-driven documentation generation with mixin tracking
* Rich type constraints (ranges, lengths, Lua patterns)
* First-class mixins with origin tracking for composition
* JSON Schema Draft 7 export for UCL validation
* Transform support with immutable semantics
* Pure Lua implementation with optional lpeg support
The library provides 4 core modules:
- core.lua: All type constructors, validation, and utilities
- registry.lua: Schema registration and reference resolution
- jsonschema.lua: JSON Schema export
- docs.lua: Documentation IR generation
Includes comprehensive test suite (44 tests, 119 assertions).
Designed to gradually replace tableshape across 22 modules.
Vsevolod Stakhov [Sat, 15 Nov 2025 15:53:17 +0000 (15:53 +0000)]
[Fix] Encode redirect URLs to handle unencoded spaces and special characters
This fixes issue #5525 where url_redirector fails when redirect Location
headers contain unencoded spaces or other special characters.
The http_parser_parse_url() function strictly requires percent-encoded URLs
per RFC 3986, but many servers send Location headers with unencoded spaces.
Changes:
- Add encode_url_for_redirect() function to percent-encode problematic characters
- Apply encoding to redirect Location headers before creating URL objects
- Preserve already-encoded sequences (no double-encoding)
- Log warnings for URLs that fail even after encoding
The fix is conservative - only encodes characters that http_parser rejects,
maintaining full backward compatibility with properly formatted URLs.
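A conservative encoder of this kind could be sketched as below, in Python for illustration. The set of characters treated as problematic (whitespace, control bytes, non-ASCII, and a '%' not followed by two hex digits) is an assumption; the commit does not list exactly which characters http_parser rejects.

```python
import re

# Assumed "problematic" set: a stray '%' or anything outside printable ASCII
# (space, control characters, non-ASCII). Valid %XX sequences are not matched.
_UNSAFE = re.compile(r"%(?![0-9A-Fa-f]{2})|[^\x21-\x7e]")


def encode_url_for_redirect(url: str) -> str:
    """Percent-encode only unsafe characters, preserving existing %XX escapes."""
    def _encode(match: "re.Match[str]") -> str:
        # Encode each UTF-8 byte of the matched character as %XX.
        return "".join(f"%{byte:02X}" for byte in match.group(0).encode("utf-8"))

    return _UNSAFE.sub(_encode, url)
```

A URL that is already well formed passes through unchanged, which is what keeps the approach backward compatible.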
Vsevolod Stakhov [Sat, 15 Nov 2025 11:01:18 +0000 (11:01 +0000)]
[Fix] Rewrite lua_url_filter using available Lua string functions
- Pass URL as rspamd_text from C (for future optimizations)
- Convert to string in Lua (acceptable - called rarely on suspicious patterns)
- Use string.find() with string.char() for control character detection
- Use string.gsub() trick for counting @ signs
- Avoid non-existent memchr() method (not implemented for rspamd_text)
- Clean, simple implementation using standard Lua functions
Performance:
- Called only when C parser encounters suspicious patterns
- Conversion overhead acceptable given low frequency
- Future: can optimize with proper memspn functions if needed
Fixes:
- Runtime error: attempt to call method 'memchr' (a nil value)
- Luacheck warning: empty if branch
Vsevolod Stakhov [Sat, 15 Nov 2025 09:49:22 +0000 (09:49 +0000)]
[Test] Add comprehensive tests for URL deep processing
Unit tests (test/lua/unit/lua_url_filter.lua):
- filter_url_string basic validation (normal, long user, multiple @)
- filter_url with URL objects
- UTF-8 validation (ASCII, Cyrillic, Japanese, invalid)
- Custom filter registration and chaining
- Issue #5731 regression test (oversized user parsing)
Functional tests (test/functional/cases/001_merged/400_url_suspect.robot):
- Moved to 001_merged for shared setup/teardown
- Long user field (80 chars) - URL_USER_LONG
- Very long user field (300 chars) - URL_USER_VERY_LONG
- Numeric IP - URL_NUMERIC_IP
- Numeric IP with user - URL_NUMERIC_IP_USER
- Suspicious TLD - URL_SUSPICIOUS_TLD
- Multiple @ signs - URL_MULTIPLE_AT_SIGNS
- Normal URLs (no false positives)
- All tests verify R_SUSPICIOUS_URL backward compatibility
Vsevolod Stakhov [Fri, 14 Nov 2025 20:21:29 +0000 (20:21 +0000)]
[Fix] Complete lua_State parameter threading through codebase
- Add forward declaration in url.c for rspamd_url_lua_consult
- Add lua_State forward declaration in html_url.hxx
- Add lua_State parameter to html_append_tag_content
- Add lua_State parameter to html_process_img_tag
- Add lua_State parameter to html_process_link_tag
- Add lua_State parameter to html_url_is_phished
- Cast void* to lua_State* when calling HTML functions from task context
- Cast lua_State* to void* when calling C API from C++ functions
- All compilation errors resolved
- Build successful
Vsevolod Stakhov [Fri, 14 Nov 2025 20:06:39 +0000 (20:06 +0000)]
[Fix] Use void* for lua_state in public API
- Change lua_State* to void* in url.h public functions
- Fixes C compilation: struct lua_State and lua_State are distinct types in C
- Cast void* to lua_State* inside implementation (url.c)
- Updated: rspamd_url_parse(), rspamd_url_find_multiple()
- Updated: rspamd_web_parse() internal function
- Updated: url_callback_data structure
- Follows C convention: opaque pointers in public headers
Vsevolod Stakhov [Fri, 14 Nov 2025 19:24:26 +0000 (19:24 +0000)]
[Fix] Add forward declaration for lua_State in url.h
- Add 'struct lua_State;' forward declaration
- Fixes compilation errors in C files that include url.h but not lua.h
- Follows Rspamd convention (same pattern as other headers)
Vsevolod Stakhov [Fri, 14 Nov 2025 18:18:32 +0000 (18:18 +0000)]
[Feature] Pass lua_State through HTML URL processing
- Add lua_State parameter to html_process_url() and html_process_url_tag()
- Add lua_State parameter to html_check_displayed_url() and html_process_displayed_href_tag()
- Add lua_State parameter to html_process_query_url()
- Pass task->cfg->lua_state from html_process_input() to all URL processing functions
- All rspamd_url_parse() calls in HTML now have proper lua_State
- HTML URL processing now benefits from Lua filter consultation
- Completes lua_State plumbing - now universally available throughout URL processing chain
Vsevolod Stakhov [Fri, 14 Nov 2025 18:07:46 +0000 (18:07 +0000)]
[Refactor] Use enum and pass lua_State through url_callback_data
- Add enum rspamd_url_lua_filter_result for return values (ACCEPT/SUSPICIOUS/REJECT)
- Replace magic numbers (0/1/2) with enum constants throughout
- Add lua_state field to struct url_callback_data
- Pass lua_State through rspamd_url_find_multiple() chain
- Update all callers: task contexts pass task->cfg->lua_state, others pass NULL
- rspamd_url_trie_generic_callback_common now uses cb->lua_state
- More idiomatic C code with proper type safety
- lua_State now universally available where URL parsing happens
Vsevolod Stakhov [Fri, 14 Nov 2025 18:04:08 +0000 (18:04 +0000)]
[Feature] Wire C->Lua URL filter consultation through parser
- Add lua_State parameter to rspamd_url_parse() and rspamd_web_parse()
- Pass lua_State through entire parsing chain
- Call rspamd_url_lua_consult() at two critical points:
* Oversized user field (>max_email_user) - line 1205
* Multiple @ signs detected - line 1227
- Lua filter can now REJECT (abort), SUSPICIOUS (mark obscured), or ACCEPT
- Update all callers: pass task->cfg->lua_state when available, NULL otherwise
- HTML parser calls: pass NULL (no task context)
- URL extraction: pass NULL (callback data doesn't have task)
- Query URL parsing: pass task->cfg->lua_state (has task context)
- Completes two-level architecture: C consults Lua on ambiguous patterns
- Add rspamd_url_lua_consult() helper function in url.c
- Function calls lua_url_filter.filter_url_string() from C
- Returns ACCEPT/SUSPICIOUS/REJECT to guide C parser
- Add filter_url_string() function in lua_url_filter.lua
- Validates URL strings and rejects obvious garbage
- Checks: length, @ count, user field size, control chars, UTF-8
- Add TODO comment at oversized user field check (line 1204)
- Infrastructure ready, needs lua_State plumbing through call chain
- This completes the two-level architecture design
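The checks listed above (length, @ count, user field size, control chars, UTF-8) can be sketched as a three-way verdict function. This is Python for illustration, not lua_url_filter.lua: the thresholds, the numeric values of the verdicts, and which check maps to SUSPICIOUS versus REJECT are all assumptions.

```python
from enum import IntEnum


class FilterResult(IntEnum):
    # Names follow the commit message; the numeric values are assumed.
    ACCEPT = 0
    SUSPICIOUS = 1
    REJECT = 2


MAX_URL_LEN = 2048   # hypothetical limit
MAX_USER_LEN = 64    # hypothetical limit


def filter_url_string(raw: bytes) -> FilterResult:
    """Cheap sanity checks run when the parser hits an ambiguous pattern."""
    if len(raw) > MAX_URL_LEN:
        return FilterResult.REJECT
    try:
        text = raw.decode("utf-8")          # invalid UTF-8 is treated as garbage
    except UnicodeDecodeError:
        return FilterResult.REJECT
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text):
        return FilterResult.REJECT          # control characters
    at_count = text.count("@")
    if at_count > 1:
        return FilterResult.SUSPICIOUS      # multiple @ signs
    if at_count == 1:
        authority = text.split("://", 1)[-1]
        user = authority.split("@", 1)[0]
        if len(user) > MAX_USER_LEN:
            return FilterResult.SUSPICIOUS  # oversized user field
    return FilterResult.ACCEPT
```

In the two-level design described here, the C parser would abort on REJECT, mark the URL as obscured on SUSPICIOUS, and continue normally on ACCEPT.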
Vsevolod Stakhov [Fri, 14 Nov 2025 14:06:37 +0000 (14:06 +0000)]
[Fix] Update config comments to guide users to local.d
- Changed comments from 'Uncomment to enable' to 'To enable, add in local.d/url_suspect.conf:'
- Users should not edit shipped config files directly
- Follow Rspamd convention: use local.d/override.d for user customizations
- Updated all map parameter comments for consistency
- Clearer path structure: use local.d/maps/ subdirectory
Vsevolod Stakhov [Fri, 14 Nov 2025 13:56:39 +0000 (13:56 +0000)]
[Refactor] Simplify configuration by removing use_*_map flags
- Removed all use_pattern_map, use_range_map, use_tld_map, etc. flags
- Maps are now implicitly enabled if configured (not nil)
- Cleaner configuration: just uncomment the map parameter to enable
- Updated init_maps() to check map existence instead of enable flags
- Updated check functions to use maps if configured
- Simpler, more intuitive configuration approach
Vsevolod Stakhov [Fri, 14 Nov 2025 13:42:06 +0000 (13:42 +0000)]
[Cleanup] Remove example maps and add doc/ to gitignore
- Removed example map files from conf/maps.d/url_suspect/
- Added doc/ to .gitignore for transient documentation
- Added conf/maps.d/url_suspect/ to .gitignore for user-created maps
- Example maps and documentation belong in separate docs repository
- Users can create their own maps in conf/maps.d/url_suspect/ as needed
Vsevolod Stakhov [Fri, 14 Nov 2025 12:27:21 +0000 (12:27 +0000)]
[Feature] Add URL deep processing architecture
This commit implements a two-level URL processing system that addresses
issue #5731 and provides flexible URL analysis with multiple specific symbols.
Core changes:
* Modified src/libserver/url.c to handle oversized user fields (fixes #5731)
* Added lualib/lua_url_filter.lua - Fast library filter during parsing
* Added src/plugins/lua/url_suspect.lua - Deep inspection plugin
* Added conf/modules.d/url_suspect.conf - Plugin configuration
* Added conf/scores.d/url_suspect_group.conf - Symbol scores
Key features:
* No new C flags - uses existing URL flags (has_user, numeric, obscured, etc.)
* Works without maps - built-in logic for common cases
* 15+ specific symbols instead of generic R_SUSPICIOUS_URL
* Backward compatible - keeps R_SUSPICIOUS_URL working
* User extensible - custom filters and checks supported
Optional features:
* Example map files for advanced customization (disabled by default)
* Whitelist, pattern matching, TLD lists