Vsevolod Stakhov [Fri, 29 Aug 2025 12:38:01 +0000 (13:38 +0100)]
Fix DKIM relaxed body canonicalization and optimize performance
This PR addresses critical issues in DKIM relaxed body canonicalization and modernizes the codebase by replacing GLib types with standard C types.
- **RFC Compliance**: Fixed incorrect canonicalization of lines containing only whitespace. Previously, such lines were not properly handled according to RFC 6376, which could lead to DKIM signature verification failures.
- **Memory Safety**: Fixed incorrect pointer dereference in `rspamd_dkim_skip_empty_lines` that could cause undefined behavior.
- **Zero-copy Optimization**: Reimplemented `rspamd_dkim_relaxed_body_step` to avoid unnecessary memory copies. The new implementation:
  - Processes input data directly without intermediate buffers
  - Reduces the number of `EVP_DigestUpdate` calls by processing larger chunks
  - Improves CPU cache efficiency
  - Results in significantly better performance for large email bodies
- Replaced all GLib types with standard C equivalents:
  - `gsize` → `size_t`
  - `gssize` → `ssize_t`
  - `gboolean` → `bool`
  - `TRUE`/`FALSE` → `true`/`false`
  - And other GLib-specific types
- Added the necessary standard headers (`stdbool.h`, `stdint.h`, `limits.h`)
- Added comprehensive debug logging for:
  - Chunk processing with size information
  - Empty line detection and skipping
  - Space collapsing operations
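The relaxed body rules the fix targets can be illustrated outside of Rspamd. The sketch below is not Rspamd's implementation (the real code is a streaming, zero-copy state machine feeding `EVP_DigestUpdate`); it is a minimal, buffer-based illustration of the three RFC 6376 relaxed-body rules: collapse inner WSP runs to a single space, drop trailing WSP on each line, and ignore empty lines at the end of the body. Input is assumed to use CRLF line endings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal illustration of RFC 6376 relaxed body canonicalization.
 * Not Rspamd's streaming implementation; `out` must be at least as
 * large as `in` plus two bytes for a possibly added final CRLF. */
static size_t relax_body(const char *in, size_t len, char *out)
{
    size_t o = 0, i = 0;
    bool any = false; /* did the body contain any non-WSP content? */

    while (i < len) {
        /* find end of line (byte before CRLF) */
        size_t eol = i;
        while (eol < len && in[eol] != '\r') eol++;
        /* ignore WSP at end of line */
        size_t end = eol;
        while (end > i && (in[end - 1] == ' ' || in[end - 1] == '\t')) end--;
        if (end > i) any = true;
        /* copy, reducing inner WSP runs to a single SP */
        bool in_wsp = false;
        for (size_t j = i; j < end; j++) {
            if (in[j] == ' ' || in[j] == '\t') { in_wsp = true; continue; }
            if (in_wsp) { out[o++] = ' '; in_wsp = false; }
            out[o++] = in[j];
        }
        out[o++] = '\r'; out[o++] = '\n';
        i = (eol < len) ? eol + 2 : len; /* skip CRLF (CRLF input assumed) */
    }
    /* ignore empty lines at the end of the body */
    while (o >= 4 && memcmp(out + o - 4, "\r\n\r\n", 4) == 0) o -= 2;
    return any ? o : 0; /* a body of only empty lines canonicalizes to "" */
}
```

For example, `"Hi \t there  \r\n\r\n"` canonicalizes to `"Hi there\r\n"`, and a body consisting only of empty lines canonicalizes to the empty string, which is exactly the whitespace-only-line case the fix addresses.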
Petr Vaněk [Fri, 29 Aug 2025 08:31:24 +0000 (10:31 +0200)]
[Fix] Use C++20 standard consistently to resolve ODR violations
This commit resolves ODR violations when compiling with -flto and
-Werror=odr [1]. The main project used the newer C++20 standard, while the
backward-cpp and simdutf libraries used the older C++11 standard. This
mismatch caused the linker to fail.
Setting C++20 standard in both libraries resolves the ODR issue.
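The change amounts to pinning the same language standard on the bundled targets as on the main project. A sketch of the kind of CMake change involved (the target names here are illustrative, not necessarily the ones used in the build):

```cmake
# Illustrative only: force the same C++ standard on bundled libraries so
# that LTO does not merge two ODR-incompatible definitions of the same
# entity compiled under C++11 and C++20.
set_target_properties(backward simdutf PROPERTIES
    CXX_STANDARD 20
    CXX_STANDARD_REQUIRED ON)
```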
This PR evolves the neural module from a symbols-only scorer into a general feature-fusion classifier with pluggable providers. It adds an LLM embedding provider, introduces trained normalization and metadata persistence, and isolates new models via a schema/prefix bump.
- The existing neural module is limited to metatokens and symbols.
- We want to combine multiple feature sources (LLM embeddings now; Bayes/FastText later).
- Ensure consistent train/infer behavior with stored normalization and provider metadata.
- Improve operability with caching, digest checks, and safer rollouts.
- Provider architecture
  - Provider registry and fusion: `collect_features(task, rule)` concatenates provider vectors with optional weights.
  - New LLM provider: `lualib/plugins/neural/providers/llm.lua`, using `rspamd_http` and `lua_cache` for Redis-backed embedding caching.
  - Symbols provider extracted to `lualib/plugins/neural/providers/symbols.lua`.
- Normalization and PCA
  - Configurable fusion normalization: none/unit/zscore.
  - Normalization stats are computed during training and applied at inference.
  - Existing global PCA preserved; loaded/saved alongside the ANN.
- Schema and compatibility
  - `plugin_ver` bumped to '3' to isolate new models from earlier profiles.
  - Redis save/load extended:
    - Profiles include `providers_digest`.
    - The ANN hash can include `providers_meta`, `norm_stats`, `pca`, `roc_thresholds`, `ann`.
  - ANN load validates the provider digest and skips applying the model on mismatch.
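The digest check exists so that a network trained on one provider set is never fed features from a different one (where vector layout and dimensionality would no longer match). A language-neutral sketch of the idea in C, hashing the ordered provider descriptions; the hash function and field layout here are illustrative, not what Rspamd actually stores:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative digest over an ordered provider list (e.g. type+model
 * strings). Rspamd's real digest may differ; the point is only that any
 * change in the provider set changes the digest, so a stored ANN trained
 * with a different set is skipped at load time. Uses 64-bit FNV-1a. */
static uint64_t providers_digest(const char *const *parts, size_t n)
{
    uint64_t h = 1469598103934665603ULL; /* FNV-1a offset basis */
    for (size_t i = 0; i < n; i++) {
        for (const char *p = parts[i]; *p; p++) {
            h ^= (uint8_t) *p;
            h *= 1099511628211ULL; /* FNV prime */
        }
        h ^= 0xff;                 /* separator between parts, so that  */
        h *= 1099511628211ULL;     /* {"ab","c"} != {"a","bc"}          */
    }
    return h;
}
```

At load time the stored digest is compared with the digest of the currently configured providers; on mismatch the model is simply not applied, which is the "skips apply" behavior described above.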
- Performance and reliability
  - LLM embeddings cached in Redis (keyed by content + model).
  - Graceful fallback to symbols if providers are not configured or fail.
  - Basic provider configuration validation.
- `lualib/plugins/neural.lua`: provider registry, fusion, normalization helpers, profile digests, training pipeline updates.
- `src/plugins/lua/neural.lua`: integrates fusion into inference/learning, loads new metadata, applies normalization, validates digest.
- `lualib/plugins/neural/providers/llm.lua`: LLM embeddings with Redis cache.
- `lualib/plugins/neural/providers/symbols.lua`: legacy symbols provider wrapper.
- `lualib/redis_scripts/neural_save_unlock.lua`: stores `providers_meta` and `norm_stats` in ANN hash.
- `NEURAL_REWORK_PLAN.md`: design and phased TODO.
- Enable LLM alongside symbols:
```ucl
neural {
  rules {
    default {
      providers = [
        { type = "symbols"; weight = 0.5; },
        { type = "llm"; model = "text-embed-1"; url = "https://api.openai.com/v1/embeddings";
          cache_ttl = 86400; weight = 1.0; }
      ];
      fusion { normalization = "zscore"; }
      roc_enabled = true;
      max_inputs = 256; # optional PCA
    }
  }
}
```
- LLM provider uses `gpt` block for defaults if present (e.g., API key). You can override `model`, `url`, `timeout`, and cache parameters per provider entry.
- Existing (v2) neural profiles remain unaffected (new `plugin_ver = '3'` prefixes).
- New profiles embed `providers_digest`; incompatible provider sets won’t be applied.
- No immediate cleanup required; TTL-based cleanup keeps old keys around until expiry.
- Validated: provider digest checks, ANN load/save roundtrip, normalization application at inference, LLM caching paths, symbols fallback.
- Please test with/without LLM provider and with `fusion.normalization = none|unit|zscore`.
- LLM latency/cost is mitigated by Redis caching; timeouts are configurable per provider.
- Privacy: use trusted endpoints; no message content leaves the host unless an external LLM provider is configured.
- Failure behavior: missing/failed providers degrade to others; training/inference can proceed with partial features.
- Rules without `providers` continue to use symbols-only behavior.
- Existing command surface unchanged; future PR will introduce `rspamc learn_neural:*` and controller endpoints.
- [x] Provider registry and fusion
- [x] LLM provider with Redis caching
- [x] Symbols provider split
- [x] Normalization (unit/zscore) with trained stats
- [x] Redis schema v3 additions and profile digest
- [x] Inference uses trained normalization
- [x] Basic provider validation and fallbacks
- [x] Plan document
- [ ] Per-provider budgets/metrics and circuit breaker for LLM
- [ ] Expand providers: Bayes and FastText/subword vectors
- [ ] Per-provider PCA and learned fusion
- [ ] New CLI (`rspamc learn_neural`) and status/invalidate endpoints
- [ ] Documentation expansion under `docs/modules/neural.md`
René Draaisma [Sat, 16 Aug 2025 08:55:40 +0000 (10:55 +0200)]
Updated gpt.lua: set `gpt-5-mini` as the default model; fixed an issue where exceeding GPT `max_completion_tokens` returned an empty reason field; set the default symbol group to GPT, with the group now also configurable in settings via `extra_symbols`; fixed an issue when no score is defined in `extra_symbols` settings (the default score is now 0).
Add a GitHub Actions workflow to run WebUI E2E tests
with Playwright on legacy and latest browser versions
against rspamd binaries built in the pipeline.