[Feature] Add symbol categories for MetaDefender and VirusTotal
Implemented a category-based symbol system for hash lookup antivirus
scanners (MetaDefender and VirusTotal) to replace dynamic scoring:
- Added 4 symbol categories: CLEAN (-0.5), LOW (2.0), MEDIUM (5.0), HIGH (8.0)
- Replaced full_score_engines with threshold-based categorization (low_category, medium_category)
- Fixed symbol registration in antivirus.lua to use rule instead of config
- Updated cache format to preserve symbol category across requests
- Added backward compatibility for old cache format
- Added symbols registration and metric score assignment
- Updated configuration documentation with examples
The new system provides:
- Clear threat categorization instead of linear interpolation
- Proper symbol weights applied automatically
- Consistent behavior between MetaDefender and VirusTotal
- Cache that preserves symbol categories
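
As an illustration of the threshold-based categorization, here is a minimal Python sketch (not the plugin's Lua code). Only the four categories and their scores come from the list above. The function names, the default thresholds, the reading of low_category and medium_category as counts of detecting engines, and the strict comparisons are all assumptions.

```python
# Scores taken from the commit message; everything else is illustrative.
CATEGORY_SCORES = {"CLEAN": -0.5, "LOW": 2.0, "MEDIUM": 5.0, "HIGH": 8.0}


def categorize(detections, low_category=5, medium_category=10):
    """Map a count of detecting engines to a symbol category.

    Assumed semantics: 0 -> CLEAN, below low_category -> LOW,
    below medium_category -> MEDIUM, otherwise HIGH.
    """
    if detections <= 0:
        return "CLEAN"
    if detections < low_category:
        return "LOW"
    if detections < medium_category:
        return "MEDIUM"
    return "HIGH"


def symbol_score(detections, **thresholds):
    """Return (category, fixed score) rather than an interpolated score."""
    category = categorize(detections, **thresholds)
    return category, CATEGORY_SCORES[category]
```

Because each category carries a fixed weight, two scanners that report the same category contribute the same score, which is the consistency the list above refers to.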
[Fix] Add nil check for vault_data in show_handler
Prevent runtime errors when parsing Vault KV v2 responses if obj.data.data is nil.
This adds a safety check before accessing vault_data.selectors, consistent with
other handlers in the file (newkey_handler and roll_handler).
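
The shape of the guard can be sketched in Python as follows. This is only an illustration of the nil check, not the handler's Lua code. The helper name extract_selectors is hypothetical, and the response is modeled as a plain dictionary with the KV v2 payload nested under data.data.

```python
def extract_selectors(obj):
    """Return the selectors from a Vault KV v2 style response, or None.

    KV v2 nests the secret payload under obj["data"]["data"]; either
    level may be missing or null, so each step is checked before use.
    """
    outer = obj.get("data") if isinstance(obj, dict) else None
    vault_data = outer.get("data") if isinstance(outer, dict) else None
    if not isinstance(vault_data, dict):
        # Nothing usable stored: report "no data" instead of failing.
        return None
    return vault_data.get("selectors")
```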
[Feature] Improve LLM prompt and add sender frequency tracking
* Update default prompt to reduce false positives on legitimate emails
- Explicitly recognize verification emails as legitimate
- Require MULTIPLE red flags for phishing classification
- Add guidance on known/frequent senders
* Add sender frequency detection in context
- Classify senders as: new, occasional, known, frequent
- Based on sender_counts from user context
- Passed to LLM via context snippet
* Prompt instructs LLM to reduce phishing score for known senders
* Helps avoid false positives on transactional/verification emails
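
A rough Python sketch of the sender frequency classification described above. The four labels and the use of sender_counts come from this commit message. The numeric cut-offs, the function names and the snippet wording are assumptions made for illustration.

```python
def classify_sender(sender, sender_counts,
                    occasional_min=1, known_min=3, frequent_min=10):
    """Label a sender as new, occasional, known or frequent.

    sender_counts maps sender address -> number of earlier messages.
    The cut-off values are hypothetical defaults.
    """
    count = sender_counts.get(sender, 0)
    if count >= frequent_min:
        return "frequent"
    if count >= known_min:
        return "known"
    if count >= occasional_min:
        return "occasional"
    return "new"


def frequency_snippet(sender, sender_counts):
    """Build the line that would be added to the LLM context."""
    return "Sender frequency: " + classify_sender(sender, sender_counts)
```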
[Feature] Improve GPT module with uncertain caching and server timeout
* Add GPT_UNCERTAIN symbol for caching uncertain classifications
- Cache results even when no consensus is reached
- Avoid repeated expensive LLM queries for borderline cases
- Set X-GPT-Reason header with detailed vote statistics
* Add server-side timeout support for OpenAI API requests
- New request_timeout parameter (optional, multiplied by 0.95)
- Only sent if explicitly configured (not all APIs support this)
- Accounts for connection setup and data transfer overhead
* Fix max_ham_prob initialization (was 0, now correctly 1.0)
* Add pcall protection for fold_header_with_encoding with raw fallback
* Improve error messages for token limit exceeded
* Add detailed logging for context snippets and consensus decisions
* Pass debug_module parameter to llm_context functions
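
One possible reading of the server-side timeout handling, sketched in Python. The 0.95 factor and the "only sent if configured" rule come from the notes above. The function name and the "timeout" field name in the request body are hypothetical and are not taken from any particular API.

```python
def build_request_body(model, messages, request_timeout=None):
    """Assemble a chat request body, adding a timeout only when configured."""
    body = {"model": model, "messages": messages}
    if request_timeout is not None:
        # Scale down so the server gives up slightly before the client
        # does, leaving room for connection setup and data transfer.
        # "timeout" is an assumed field name.
        body["timeout"] = request_timeout * 0.95
    return body
```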
[Feature] Add cache expiration timestamps to debug logs
* Show when cached data will expire in human-readable format
* Log expiration time both when caching and after successful write
* Helps with debugging cache TTL issues
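
A small Python sketch of how a human-readable expiration time can be derived from a TTL for a log line. The function name and the output format are assumptions; the actual log format is not shown in this commit message.

```python
import time


def format_expiry(ttl_seconds, now=None):
    """Describe when a cache entry written now will expire (UTC)."""
    now = time.time() if now is None else now
    expires_at = now + ttl_seconds
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(expires_at))
    return f"expires at {stamp} UTC (in {int(ttl_seconds)}s)"
```

Passing an explicit now keeps the helper deterministic, which is convenient when comparing the value logged at caching time with the one logged after the write.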
[Feature] Add bidirectional context support for LLM
* Unify context for incoming and outgoing mail
* Same identity used for authenticated/local sender and recipient
* Follows replies module pattern for direction detection
* Make llm_context.lua module-agnostic with debug_module parameter
* Improve userdata handling (use :sub instead of string.sub)
* Add nil-safety to all debug logging calls
* Add cache expiration timestamps to context logs
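
The idea of a single identity for both directions can be sketched in Python as below. This is an illustration under assumptions: outbound mail is taken to be anything from an authenticated user or a local source, and the function and parameter names are hypothetical.

```python
def context_identity(sender, recipient, authenticated_user=None,
                     from_local=False):
    """Pick the mailbox whose context a message belongs to.

    Outbound mail is keyed by the local sender, inbound mail by the
    recipient, so one user's context covers both directions.
    """
    outbound = authenticated_user is not None or from_local
    if outbound:
        return (authenticated_user or sender), "outbound"
    return recipient, "inbound"
```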
[Fix] Add full Lua traceback to HTTP callback errors
Improved error diagnostics in lua_http_finish_handler by adding
rspamd_lua_traceback handler. Now shows complete call stack with
file names and line numbers when Lua HTTP callbacks fail, making
debugging much easier.
[Feature] Add user/domain context support for LLM-based classification
* Add llm_context.lua module for Redis-based conversation context
* Context features: sliding window, top senders, keywords, flagged phrases
* Use low-level word API (get_words('full')) with stop_word flags
* Flexible gating via maps/selectors (enable_map/enable_expression)
* Update context even when GPT condition not met (BAYES_SPAM/HAM)
* Add min_messages warm-up threshold to prevent weak context injection
* Configurable scope: user/domain/esld with TTL and sliding window
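
To make the sliding window and the min_messages warm-up concrete, here is an in-memory Python stand-in. The real module stores context in Redis. The class, its methods and the snippet text are hypothetical; only the concepts (sliding window, top senders, warm-up threshold) come from the list above.

```python
from collections import Counter, deque


class UserContext:
    """Keeps the last `window` messages for one user, domain or eSLD."""

    def __init__(self, window=5, min_messages=3):
        # deque with maxlen drops the oldest entry automatically.
        self.messages = deque(maxlen=window)
        self.min_messages = min_messages

    def update(self, sender, keywords):
        """Record a message; done even when no LLM query is made."""
        self.messages.append((sender, tuple(keywords)))

    def top_senders(self, n=3):
        counts = Counter(sender for sender, _ in self.messages)
        return [sender for sender, _ in counts.most_common(n)]

    def snippet(self):
        """Return context text, or None while still warming up."""
        if len(self.messages) < self.min_messages:
            return None
        return "Top senders: " + ", ".join(self.top_senders())
```

Returning None during warm-up means a caller simply omits the context, so a handful of early messages cannot skew the classification.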
* [Feature] Archive module: Full support for encrypted ZIP archives with ZipCrypto and AES encryption
* [Feature] Archive module: Both reading and writing of AES-encrypted ZIP archives are supported
* [Feature] Archive module: Updated Lua bindings for libarchive
* [Feature] Encrypted maps: Support for encrypted maps to enable new distribution scenarios
* [Feature] Redis TLS: Configurable TLS connections in Redis backend
* [Feature] Map helpers alignment: Enforce 64-byte alignment to prevent unaligned memory access
* [Feature] Enhanced CLI for secretbox with additional security test coverage
* [Fix] MIME encoding: Major overhauls and multiple fixes for MIME encoding logic
* [Fix] MIME encoding: Improved handling and decoding of UTF-8 in MIME headers
* [Fix] Learning system: Numerous fixes to learn checks and autolearn flag handling
* [Fix] Learning system: Prevention of duplicate message learning
* [Fix] Learning system: Extended multiclass learning test coverage
* [Fix] Critical: Fixed bug when converting zero-length strings to numbers
* [Fix] Critical: Fixed XML prolog detection in lua_magic module
* [Fix] Build: Fixed build issues on 32-bit platforms
* [Fix] Compatibility: Improved compatibility with Lua versions above 5.1
* [Fix] Empty input: Addressed issues with empty input handling in lua_magic
* [Fix] Testing: Improved stability of automated testing with multiple test fixes
* [Fix] Minor compatibility improvements (buffer allocation, missing cmath include)
* Refactored rspamd_control_fill_msghdr to accept
a caller-provided control buffer, fixing the
lifetime bug where a pointer to a local array
was stored in msg_control.
* Replaced static buffers with automatic (stack)
buffers at the exact call sites of sendmsg/recvmsg,
so PowerPC and similar platforms won’t choke on
non-constant expressions.
- Removed g_strdup/g_free of TLS paths in src/lua/lua_redis.c.
- Now we:
  - Keep TLS values (booleans + strings) on the Lua stack temporarily.
  - Use an absolute table index (so gettable calls aren’t confused by
    the growing stack).
  - Call rspamd_redis_pool_connect_ext while those values are on the
    stack.
  - Pop all postponed values and then the table in one go immediately
    after the connect call.
- The C++ pool still copies into std::string on element creation; we
only ensure Lua strings live through the call without extra
allocations.
- Remove the redundant `ensure_ssl_inited` function and its calls; core
SSL init should suffice.
- Refactor TLS initiation into `redis_pool_elt::initiate_tls(...)` to
eliminate duplication
- Switch TLS flags to `bool` in the public struct
- Fix ephemeral string usage in Lua by duplicating the values into
locals and freeing them after connect. Flags are boolean. (It is
unlikely that Lua will GC the strings before we connect to Redis, but
this ensures that it cannot become a problem.)
- Remove the redis TLS options propagation unit test
* [Conf] Add defaults
* [Conf] Fix JB IDE damage
* [Feature] Add a signal from main to workers for workers ready state
* [Feature] Add lua_util.fold_header_with_encoding
* [Feature] Add some convenience options to rspamc
* [Feature] Add some more OS utility functions
* [Feature] Add symbols proxy for piecewise changes
* [Feature] Allow lua callback maps to be filled line by line
* [Feature] Allow selectors in regexp maps expressions
* [Feature] Allow passing expression flags in the regexp plugin
* [Feature] Detect part types in mime parser
* [Feature] Resolve DNS nameservers names using getaddrinfo
* [Fix] Bayes: Try to be bug-to-bug compatible
* [Fix] Check skip_hashes for the returned hashes
* [Fix] Fix DL lists initialisations
* [Fix] Fix double free in the client...
* [Fix] Fix end-to-end proxy compression
* [Fix] Fix l= calculations again
* [Fix] Fix lua state setting ambiguity
* [Fix] Fix order of descriptor closing
* [Fix] Fix probabilities overflow
* [Fix] Fix rules setup
* [Fix] Fix statfiles ordering
* [Fix] Fix various corner cases and tests
* [Fix] Fix whitelist options in the arc module
* [Fix] GPT: Fix occasional damage
* [Fix] GPT: fix processing of messages with no subject
* [Fix] Prevent WebUI crash with empty RRD
* [Fix] Store html attributes that are empty
* [Fix] Try to fix learned order
* [Fix] Use C++20 standard consistently to resolve ODR violations
* [Fix] Use a more straightforward approach for learn cache
* [Fix] fix error check in lua_dkim_tools.lua
* [Project] Add CTA analytics engine
* [Project] Add ability to create custom tokenizers for languages
* [Project] Add controller learn endpoints
* [Project] Add support of granular timeouts to plugins and maps
* [Project] Add tests and fix stuff
* [Project] Add tests for LLM provider, fix various issues with metatokens
* [Project] Apply changes to bayes_expiry plugin
* [Project] Create an isolated API for external tokenizers
* [Project] Extract more features from HTML messages
* [Project] Fix Lua API and some constexpr compatibility
* [Project] Fix binary classification and lua scripts
* [Project] Fix more calculation issues
* [Project] Fix other classification and learning issues
* [Project] Fix scoped compilation again
* [Project] Fix symbols finalisation
* [Project] Fix unlearn stuff
* [Project] Fix various issues
* [Project] Fix various other issues
* [Project] Further updates
* [Project] Implement backoff for upstreams revival
* [Project] Implement more flexible http timeouts
* [Project] Implement scoped compilation
* [Project] Implement scoped regexp cache system
* [Project] Multi-class classification project baseline
* [Project] Rework rspamc to allow training of different neural types
* [Project] Rework system of html tags to allow more tag types
* [Project] Rework tokenizers initialisation
* [Project] Some rework of the CTA defaults
* [Project] Start implementation of the rules maps
* [Project] Start to implement better revive strategy for upstreams
* [Project] Store regexp rules state to avoid incomplete/orphaned rules
* [Project] Support more common html attributes
* [Project] Take button weight into consideration
* [Project] Use re_cache scopes for maps
* [Rework] Fix logger format string mismatch
* [Rework] MIME detection via Lua Magic; enforce cfg in Lua task API
* [Rework] Restore N-ary optimizations for arithmetic-alike expressions
* [Rework] Use GLib agnostic type for words
* [Rework] Refactor MIME detection via Lua Magic; enforce cfg in Lua task API
* [Rules] Make bitcoin expression to use explicit flags
[Rework] MIME detection via Lua Magic; enforce cfg in Lua task API
- Add rspamd_mime_parser_config on cfg; remove global state and lazy init
- Initialize parser config once per cfg; preload lua_magic.detect_mime_part
- Always run detection after normal part parse; promote .eml/message parts
- Preserve detected_ext/detected_ct/detected_type and NO_TEXT flag
- Remove duplicate detection from message.c; add debug logs
- Restore CTE parsing API and fix call sites
- Enforce cfg requirement in rspamd_task.load_from_string/load_from_file/create
- Fix unit tests to pass rspamd_config to load_from_string