git.ipfire.org Git - thirdparty/rspamd.git/log
15 hours ago[Fix] cryptobox: properly bypass RHEL/CentOS 10+ crypto-policies for SHA-1 DKIM master
Vsevolod Stakhov [Thu, 29 Jan 2026 17:57:23 +0000 (17:57 +0000)] 
[Fix] cryptobox: properly bypass RHEL/CentOS 10+ crypto-policies for SHA-1 DKIM

RHEL/CentOS 10+ disables SHA-1 signatures via the rh-allow-sha1-signatures
config option, which is not bypassed by simply creating a new OSSL_LIB_CTX.

This fix:
- Creates a temporary OpenSSL config file with rh-allow-sha1-signatures=yes
- Loads it into the dedicated SHA-1 library context via OSSL_LIB_CTX_load_config
- Improves error messages to include algorithm name and RHEL-specific hints
- Captures OpenSSL error details when EVP_PKEY_verify fails
- Adds troubleshooting guidance in error messages

On non-RHEL systems, the config option is simply ignored.

16 hours ago[Fix] lua_hs_cache: add defensive checks for zstd_decompress
Vsevolod Stakhov [Thu, 29 Jan 2026 16:53:59 +0000 (16:53 +0000)] 
[Fix] lua_hs_cache: add defensive checks for zstd_decompress

Add type checking before decompression to catch unexpected data types
from Redis. Wrap zstd_decompress in pcall to gracefully handle any
errors and provide detailed diagnostic logging when failures occur.

16 hours ago[Fix] re_cache: stop timer during async operations to prevent re-entry
Vsevolod Stakhov [Thu, 29 Jan 2026 16:49:20 +0000 (16:49 +0000)] 
[Fix] re_cache: stop timer during async operations to prevent re-entry

Stop the timer before calling exists_async and save_async to prevent
the timer from firing multiple times while an async callback is pending.
Without this, the repeating timer (0.1s interval) could fire again before
the async operation completes, causing multiple concurrent async calls
for the same re_class. This led to race conditions and use-after-free
crashes when callbacks completed out of order.
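
The re-entry guard described above can be modelled in a few lines. This is only a toy sketch in Python, not the C code from the commit: `RepeatingTimer`, `Compiler`, and `on_timer` are hypothetical names, and the "async operation" is just a stored completion callback.

```python
class RepeatingTimer:
    """Toy stand-in for a repeating event-loop timer (hypothetical)."""

    def __init__(self, callback):
        self.callback = callback
        self.active = True

    def stop(self):
        self.active = False

    def start(self):
        self.active = True

    def tick(self):
        # A stopped timer does not fire.
        if self.active:
            self.callback(self)


class Compiler:
    """Starts one simulated async operation per timer firing."""

    def __init__(self):
        self.started = 0   # number of async operations launched
        self.pending = []  # completion callbacks of in-flight operations

    def on_timer(self, timer):
        timer.stop()  # stop BEFORE launching the async call
        self.started += 1

        def done():
            timer.start()  # re-arm only once the async call has completed

        self.pending.append(done)
```

Ticking the timer several times while an operation is pending launches only one operation; the next one starts after the stored completion callback has run.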

16 hours ago[Fix] re_cache: initialize async context to prevent random callback skipping
Vsevolod Stakhov [Thu, 29 Jan 2026 16:35:35 +0000 (16:35 +0000)] 
[Fix] re_cache: initialize async context to prevent random callback skipping

Use g_malloc0 instead of g_malloc when allocating rspamd_re_cache_async_ctx
to ensure callback_processed flag is initialized to FALSE. Without this,
the flag could randomly be TRUE (from uninitialized memory), causing the
exists/save callbacks to be silently skipped and preventing re_class
compilation from proceeding after cache misses.

17 hours ago[Fix] hs_helper: fix use-after-free in Redis async cache callbacks
Vsevolod Stakhov [Thu, 29 Jan 2026 15:58:41 +0000 (15:58 +0000)] 
[Fix] hs_helper: fix use-after-free in Redis async cache callbacks

Remove dangerous ev_run(EVRUN_NOWAIT) calls from inside Redis callback
chains in rspamd_hs_helper_mp_exists_cb and rspamd_hs_helper_remap_exists_cb.

Calling ev_run() inside a callback can trigger Lua GC which may try to
finalize lua_redis userdata while we're still processing the callback,
causing lua_redis_gc to access already-freed memory.

Also add missing REF_RELEASE calls at the end of both callbacks to properly
release the reference from exists_async (matching the exists==true path).

17 hours agoRevert "[Fix] lua_redis: add defensive check in GC handler"
Vsevolod Stakhov [Thu, 29 Jan 2026 15:19:46 +0000 (15:19 +0000)] 
Revert "[Fix] lua_redis: add defensive check in GC handler"

This reverts commit e753e063c2102d920b8b2ea15f675de04b1aaa99.

17 hours ago[Fix] lua_redis: add defensive check in GC handler
Vsevolod Stakhov [Thu, 29 Jan 2026 15:17:15 +0000 (15:17 +0000)] 
[Fix] lua_redis: add defensive check in GC handler

Add validation in lua_redis_gc to check that the context pointer
appears valid before releasing. This prevents crashes when Lua GC
collects stale userdata pointing to already-freed memory.

17 hours ago[Fix] hs_helper: fix use-after-free in async hyperscan cache callbacks
Vsevolod Stakhov [Thu, 29 Jan 2026 15:16:59 +0000 (15:16 +0000)] 
[Fix] hs_helper: fix use-after-free in async hyperscan cache callbacks

Add proper refcounting to async compilation contexts to prevent
use-after-free when Redis callbacks are invoked multiple times
(timeout + response) or during worker termination.

- Add ref_entry_t to async context structures
- Use REF_RETAIN before async operations and REF_RELEASE in callbacks
- Add callback_processed flag to prevent double processing
- Save entry data before ev_run that might free pending arrays
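
A simplified model of the retain/release pattern with a `callback_processed` guard, written in Python for illustration. The class and function names are assumptions and do not mirror the actual rspamd structures; the real code uses `REF_RETAIN`/`REF_RELEASE` on C structs.

```python
class AsyncCtx:
    """Hypothetical refcounted context shared with an async callback."""

    def __init__(self):
        self.refs = 1                    # reference held by the owner
        self.callback_processed = False
        self.freed = False
        self.results = []

    def retain(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True            # stand-in for the destructor


def on_reply(ctx, value):
    """Callback that may be invoked twice (e.g. timeout, then late response)."""
    if ctx.callback_processed:
        return                           # second invocation is ignored
    ctx.callback_processed = True
    ctx.results.append(value)
    ctx.release()                        # drop the reference taken before the call
```

In this sketch the owner calls `ctx.retain()` before issuing the async request, so the context stays alive until both the callback and the owner have released it.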

19 hours agoMerge pull request #5863 from moisseev/full-hashes
Vsevolod Stakhov [Thu, 29 Jan 2026 13:30:20 +0000 (13:30 +0000)] 
Merge pull request #5863 from moisseev/full-hashes

[Feature] WebUI: Add fuzzy hash copy and delist buttons

19 hours ago[Minor] WebUI: Deduplicate fuzzy hashes in multipart messages 5863/head
Alexander Moisseev [Thu, 29 Jan 2026 13:19:55 +0000 (16:19 +0300)] 
[Minor] WebUI: Deduplicate fuzzy hashes in multipart messages

22 hours ago[Fix] cryptobox: remove redundant obj_mac.h include
Vsevolod Stakhov [Thu, 29 Jan 2026 10:52:03 +0000 (10:52 +0000)] 
[Fix] cryptobox: remove redundant obj_mac.h include

NID_sha1 is already available through the evp.h -> objects.h -> obj_mac.h
include chain, so an explicit include is unnecessary.

22 hours ago[Fix] cryptobox: bypass RHEL/CentOS 10 crypto-policies for SHA-1 DKIM verification
Vsevolod Stakhov [Thu, 29 Jan 2026 10:41:15 +0000 (10:41 +0000)] 
[Fix] cryptobox: bypass RHEL/CentOS 10 crypto-policies for SHA-1 DKIM verification

RHEL/CentOS 10+ crypto-policies disable SHA-1 for signatures by default,
causing rsa-sha1 DKIM verification to fail. This is problematic as many
legitimate emails still use rsa-sha1 DKIM signatures.

For OpenSSL 3.0+, create a dedicated OSSL_LIB_CTX that bypasses system
crypto-policies specifically for SHA-1 DKIM signature verification.
SHA-256/SHA-512 verifications continue using the normal system context.

The legacy context is lazily initialized only when SHA-1 verification
is needed, avoiding overhead for modern rsa-sha256 signatures.

38 hours ago[Fix] re_cache: detect and handle stale hyperscan files with mismatched regexp IDs
Vsevolod Stakhov [Wed, 28 Jan 2026 18:44:18 +0000 (18:44 +0000)] 
[Fix] re_cache: detect and handle stale hyperscan files with mismatched regexp IDs

When the re_class hash doesn't include global regexp indices, regexps can
be reordered while the class hash stays the same. This causes old hyperscan
files to be loaded with stale position IDs that point to wrong regexps
(possibly in different re_classes with no hyperscan loaded), leading to
assertion failures when hs_scratch is NULL.

Fix:
- Reset match_type to PCRE during cleanup before freeing hs_ids, preventing
  stale HYPERSCAN flags if reload fails
- Validate that each stored ID points to a regexp belonging to the current
  re_class before setting match_type
- Delete stale hyperscan files to trigger recompilation by hs_helper
- For Redis cache, stale entries will expire or be overwritten

Both file-based and Redis-based loading paths are fixed.

44 hours ago[Fix] Fix EVP_PKEY_CTX memory leak in DKIM RSA signing
Vsevolod Stakhov [Wed, 28 Jan 2026 12:51:15 +0000 (12:51 +0000)] 
[Fix] Fix EVP_PKEY_CTX memory leak in DKIM RSA signing

The EVP_PKEY_CTX allocated in rspamd_dkim_sign() for RSA key signing
was never freed, causing continuous memory growth when using DKIM/ARC
signing with RSA keys.

Add EVP_PKEY_CTX_free() calls in all error paths and after successful
signing to properly release the OpenSSL context.

Fixes: #5865

46 hours ago[Fix] url_suspect: extract TLD from eSLD for suspicious TLD check
Vsevolod Stakhov [Wed, 28 Jan 2026 11:08:49 +0000 (11:08 +0000)] 
[Fix] url_suspect: extract TLD from eSLD for suspicious TLD check

The get_tld() function returns eSLD (e.g., "phishing.tk"), not the TLD
suffix. Extract the actual TLD by removing the first label.

Also add suspicious_tlds_map to test config since the override replaces
the default url_suspect configuration.
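
Removing the first label from an eSLD can be sketched as follows. This is an illustrative Python helper, not the Lua code from the plugin; `tld_from_esld` is a hypothetical name.

```python
def tld_from_esld(esld: str) -> str:
    """Drop the first label of an eSLD such as 'phishing.tk'."""
    _, sep, rest = esld.partition(".")
    # A name without a dot has no separate suffix; return it unchanged.
    return rest if sep else esld
```

For a multi-label suffix the same rule applies: `tld_from_esld("example.co.uk")` yields `co.uk`.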

46 hours ago[Fix] hs_helper: defer next multipattern compilation to prevent crash after Redis...
Vsevolod Stakhov [Wed, 28 Jan 2026 10:49:10 +0000 (10:49 +0000)] 
[Fix] hs_helper: defer next multipattern compilation to prevent crash after Redis timeout

When a Redis timeout occurs during an HS cache save operation, the error callback
immediately starts processing the next multipattern while still inside the
timeout handler. This leaves the connection in an inconsistent state, causing
a crash (SIGSEGV in hs_compile_multi) when compiling the next multipattern.

Fix by deferring the next operation to the next event loop iteration using
ev_timer with 0 timeout. This ensures the error handling completes fully
before starting the next operation.
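
The deferral idea can be illustrated with Python's asyncio, where `loop.call_soon` plays a role loosely analogous to an `ev_timer` with a zero timeout. This is an analogy under that assumption, not the libev code from the commit; all function names are hypothetical.

```python
import asyncio


def run_demo():
    """Return the order in which the handler and the deferred step run."""
    order = []

    async def main():
        loop = asyncio.get_running_loop()
        done = asyncio.Event()

        def next_op():
            order.append("next multipattern")
            done.set()

        def on_timeout():
            order.append("timeout handler: begin")
            # Defer instead of calling next_op() directly, so the
            # handler finishes before the next operation starts.
            loop.call_soon(next_op)
            order.append("timeout handler: end")

        on_timeout()
        await done.wait()

    asyncio.run(main())
    return order
```

The returned list shows the handler completing fully before the next operation begins.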

2 days ago[Fix] Redis hyperscan cache: use write_servers for store/delete operations
Vsevolod Stakhov [Wed, 28 Jan 2026 09:05:25 +0000 (09:05 +0000)] 
[Fix] Redis hyperscan cache: use write_servers for store/delete operations

The redis_backend was missing is_write=true in attrs for store and delete
operations, causing lua_redis.request to use read_servers instead of
write_servers. This resulted in READONLY errors when read and write servers
are configured separately.

2 days ago[Feature] Replace builtin_suspicious TLDs with map-based configuration
Vsevolod Stakhov [Wed, 28 Jan 2026 08:56:28 +0000 (08:56 +0000)] 
[Feature] Replace builtin_suspicious TLDs with map-based configuration

Convert hardcoded suspicious TLDs list to a proper map file following
rspamd's standard map loading pattern with fallback support.

Changes:
- Add conf/maps.d/suspicious_tlds.inc with default TLDs (.tk, .ml, .ga, .cf, .gq)
- Update url_suspect.conf to use fallback+file:// pattern for user overrides
- Update url_suspect.lua to load TLDs via rspamd_map_add_from_ucl()

Users can now:
- Override entirely: create local.d/maps.d/suspicious_tlds.inc
- Extend defaults: create local.d/maps.d/suspicious_tlds.inc.local
- Disable: set suspicious_tlds_map = null in local.d/url_suspect.conf

Supersedes #5864 - the map-based approach inherently handles nil/missing
config gracefully, making the type check unnecessary.

2 days ago[Minor] WebUI: Fix fuzzy hash handling edge cases
Alexander Moisseev [Tue, 27 Jan 2026 15:32:31 +0000 (18:32 +0300)] 
[Minor] WebUI: Fix fuzzy hash handling edge cases

- Fix hash collision when multiple hashes share same prefix
- Add bounds checking for fuzzy hash array access
- Add error handling for malformed JSON in hash data
- Remove unused parameter from generateFuzzyActions

2 days ago[Feature] WebUI: Add fuzzy hash copy and delist buttons
Alexander Moisseev [Tue, 27 Jan 2026 10:16:16 +0000 (13:16 +0300)] 
[Feature] WebUI: Add fuzzy hash copy and delist buttons

Add UI controls for managing fuzzy hashes in History and Scan tables:
- Copy button to copy full hashes to clipboard (newline-separated)
- Delist button to open bl.rspamd.com removal page with hashes
- Buttons are disabled (with tooltips) when hashes are unavailable
- Hashes are searchable via filter input

3 days ago[Fix] Simplify parse_sa_regexp by delegating to rspamd_regexp.create
Vsevolod Stakhov [Mon, 26 Jan 2026 17:48:55 +0000 (17:48 +0000)] 
[Fix] Simplify parse_sa_regexp by delegating to rspamd_regexp.create

Remove unnecessary flag parsing and inline modifier transform that
caused issues with invalid PCRE flags (g, u). rspamd_regexp.create
already handles /pattern/flags and m{pattern}flags formats natively.

Closes: #5858

3 days ago[Fix] GPT plugin: explicitly set POST method for API requests
Vsevolod Stakhov [Mon, 26 Jan 2026 16:40:19 +0000 (16:40 +0000)] 
[Fix] GPT plugin: explicitly set POST method for API requests

Fixes #5859

Some API providers (like Ollama) strictly require POST method on their
endpoints and return 405 Method Not Allowed for GET requests. While
rspamd_http auto-detects POST when a body is present, explicitly setting
the method ensures correct behavior in all cases.

3 days ago[Fix] Clear pending multipatterns on config reload to prevent use-after-free
Vsevolod Stakhov [Mon, 26 Jan 2026 16:30:18 +0000 (16:30 +0000)] 
[Fix] Clear pending multipatterns on config reload to prevent use-after-free

After SIGHUP reload, the global pending_compilations queue retained
stale multipattern pointers from the freed old config. When hs_helper
processed the queue, it accessed freed memory causing heap-buffer-overflow
in rspamd_multipattern_get_npatterns().

Add rspamd_multipattern_clear_pending() alongside the existing
rspamd_regexp_map_clear_pending() call before releasing old config.

3 days ago[Minor] WebUI: Export copyToClipboard for reuse across modules
Alexander Moisseev [Mon, 26 Jan 2026 13:39:24 +0000 (16:39 +0300)] 
[Minor] WebUI: Export copyToClipboard for reuse across modules

- Move from local function to ui.copyToClipboard
- Support modal (fixed positioning) and non-modal (absolute) contexts
- Simplify implementation using textarea.remove() and opacity

3 days agoMerge pull request #5860 from rspamd/vstakhov-fuzzy-history
Vsevolod Stakhov [Mon, 26 Jan 2026 13:47:00 +0000 (13:47 +0000)] 
Merge pull request #5860 from rspamd/vstakhov-fuzzy-history

[Feature] Display matched fuzzy hashes in WebUI history

4 days ago[Feature] Store matched fuzzy hashes in Redis history 5860/head
Vsevolod Stakhov [Sat, 24 Jan 2026 16:33:17 +0000 (16:33 +0000)] 
[Feature] Store matched fuzzy hashes in Redis history

Add fuzzy_hashes array to history entries by retrieving matched
hashes from task mempool in history_redis plugin.

6 days agoMerge pull request #5831 from dragoangel/patch-15
Vsevolod Stakhov [Fri, 23 Jan 2026 10:46:45 +0000 (10:46 +0000)] 
Merge pull request #5831 from dragoangel/patch-15

[Neural] Add option to skip training if store_set_only is true

6 days agoMerge pull request #5855 from dragoangel/patch-16
Vsevolod Stakhov [Fri, 23 Jan 2026 09:46:42 +0000 (09:46 +0000)] 
Merge pull request #5855 from dragoangel/patch-16

Fix learn_mode typo in neural.lua

6 days agoMerge pull request #5856 from rspamd/dependabot/pip/contrib/neural-embedding-service...
Vsevolod Stakhov [Fri, 23 Jan 2026 09:46:30 +0000 (09:46 +0000)] 
Merge pull request #5856 from rspamd/dependabot/pip/contrib/neural-embedding-service/transformers-4.53.0

Bump transformers from 4.40.0 to 4.53.0 in /contrib/neural-embedding-service

7 days agoFix after merging master 5831/head
Dmitriy Alekseev [Thu, 22 Jan 2026 21:14:42 +0000 (22:14 +0100)] 
Fix after merging master

7 days agoMerge branch 'master' into patch-15
Dmitriy Alekseev [Thu, 22 Jan 2026 21:12:07 +0000 (22:12 +0100)] 
Merge branch 'master' into patch-15

7 days agofix after master merge 5855/head
Dmitriy Alekseev [Thu, 22 Jan 2026 21:09:47 +0000 (22:09 +0100)] 
fix after master merge

7 days agoMerge branch 'master' into patch-16
Dmitriy Alekseev [Thu, 22 Jan 2026 21:02:27 +0000 (22:02 +0100)] 
Merge branch 'master' into patch-16

7 days agoBump transformers in /contrib/neural-embedding-service 5856/head
dependabot[bot] [Thu, 22 Jan 2026 21:01:55 +0000 (21:01 +0000)] 
Bump transformers in /contrib/neural-embedding-service

Bumps [transformers](https://github.com/huggingface/transformers) from 4.40.0 to 4.53.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.40.0...v4.53.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.53.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

7 days agoMerge pull request #5835 from rspamd/vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Thu, 22 Jan 2026 21:00:55 +0000 (21:00 +0000)] 
Merge pull request #5835 from rspamd/vstakhov-llm-embedding-improvements

Add expression-based autolearn for neural LLM providers

7 days ago[Fix] Use versioned key for hybrid LLM+symbols manual training 5835/head
Vsevolod Stakhov [Thu, 22 Jan 2026 19:48:30 +0000 (19:48 +0000)] 
[Fix] Use versioned key for hybrid LLM+symbols manual training

Pending key is now only used for LLM-only mode where embedding
dimensions may vary. Hybrid (LLM+symbols) and symbols-only modes
use versioned key directly since dimension includes stable symbols.

7 days ago[Fix] Use versioned key for manual training in symbols-only mode
Vsevolod Stakhov [Thu, 22 Jan 2026 18:36:26 +0000 (18:36 +0000)] 
[Fix] Use versioned key for manual training in symbols-only mode

Manual training via ANN-Train header now writes to versioned key when
no LLM provider is configured. The pending key is only used with LLM
providers where embedding dimensions may vary between versions.

7 days agoMerge branch 'master' into vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Thu, 22 Jan 2026 18:22:06 +0000 (18:22 +0000)] 
Merge branch 'master' into vstakhov-llm-embedding-improvements

7 days agoMerge branch 'master' into patch-15
Dmitriy Alekseev [Thu, 22 Jan 2026 18:20:49 +0000 (19:20 +0100)] 
Merge branch 'master' into patch-15

7 days agoFix learn_mode typo in neural.lua
Dmitriy Alekseev [Thu, 22 Jan 2026 18:19:49 +0000 (19:19 +0100)] 
Fix learn_mode typo in neural.lua

7 days ago[Fix] Match fuzzy_check.c hash generation in text_part:get_fuzzy_hashes
Vsevolod Stakhov [Thu, 22 Jan 2026 15:35:09 +0000 (15:35 +0000)] 
[Fix] Match fuzzy_check.c hash generation in text_part:get_fuzzy_hashes

Fix text_part:get_fuzzy_hashes() to produce the same hashes as the
fuzzy_check plugin's fuzzy_cmd_from_text_part():

- For short text (<32 words): hash utf_stripped_content directly instead
  of individual words, and optionally include subject
- For normal text: skip words with RSPAMD_WORD_FLAG_SKIPPED flag or
  empty stems

Add optional subject parameter to include in short text hash calculation
(matches fuzzy_check.c behavior with no_subject=false).

Update rspamadm mime stat to pass subject to get_fuzzy_hashes().

7 days ago[Fix] Stop HTTP watchers before error handlers
Vsevolod Stakhov [Thu, 22 Jan 2026 13:40:20 +0000 (13:40 +0000)] 
[Fix] Stop HTTP watchers before error handlers

7 days ago[Feature] Put subject first in LLM embedding input
Vsevolod Stakhov [Thu, 22 Jan 2026 11:51:53 +0000 (11:51 +0000)] 
[Feature] Put subject first in LLM embedding input

Subject is highly valuable for spam detection and placing it first
ensures it's always included even if text content gets truncated
by model token limits.

7 days ago[Feature] Rename neural autolearn options to match RBL module naming
Vsevolod Stakhov [Thu, 22 Jan 2026 11:21:52 +0000 (11:21 +0000)] 
[Feature] Rename neural autolearn options to match RBL module naming

Rename check_local/check_authed to exclude_local/exclude_users for
consistency with RBL module conventions. Change exclude_users default
to true (authenticated users excluded by default).

8 days agoUpdate neural.lua
Dmitriy Alekseev [Wed, 21 Jan 2026 15:05:06 +0000 (16:05 +0100)] 
Update neural.lua

8 days agoMerge branch 'master' into vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Wed, 21 Jan 2026 13:39:51 +0000 (13:39 +0000)] 
Merge branch 'master' into vstakhov-llm-embedding-improvements

8 days agoMerge pull request #5853 from rspamd/vstakhov-content-urls-rework
Vsevolod Stakhov [Wed, 21 Jan 2026 13:39:35 +0000 (13:39 +0000)] 
Merge pull request #5853 from rspamd/vstakhov-content-urls-rework

[Feature] Include content URLs by default in URL API calls

8 days ago[Test] Set include_content_urls = false for functional tests 5853/head
Vsevolod Stakhov [Wed, 21 Jan 2026 13:16:52 +0000 (13:16 +0000)] 
[Test] Set include_content_urls = false for functional tests

Preserve backward compatibility in tests by using the old default
behavior (exclude content URLs).

8 days ago[Feature] Include content URLs by default in URL API calls
Vsevolod Stakhov [Wed, 21 Jan 2026 09:57:54 +0000 (09:57 +0000)] 
[Feature] Include content URLs by default in URL API calls

- Add `include_content_urls` global option (default: true) to control
  whether URLs extracted from content (PDF, etc.) are included in API calls
- Update task:get_urls(), task:get_emails() to include content URLs by default
- Update lua_util.extract_specific_urls() to use config default when
  need_content is not explicitly specified
- Mark URLs extracted from computed/virtual parts (PDF text) with CONTENT
  flag instead of FROM_TEXT flag, since they may be clickable links
- Add commented documentation in conf/options.inc

Users who want the old behavior can set `include_content_urls = false`
in their options configuration.

9 days ago[Feature] Add order-independent table digest using XXH3 XOR accumulation
Vsevolod Stakhov [Wed, 21 Jan 2026 08:57:13 +0000 (08:57 +0000)] 
[Feature] Add order-independent table digest using XXH3 XOR accumulation

Add rspamd_cryptobox.fast_hash64() C function that returns XXH3 hash as
two 32-bit integers, enabling XOR accumulation for order-independent
hashing in Lua.

Add lua_util.unordered_table_digest() that produces consistent digests
regardless of table iteration order. This fixes issues where different
Rspamd instances produced different ANN digests for identical configs
due to non-deterministic key ordering in pairs().

The original table_digest had two bugs:
- Used pairs() which iterates in undefined order across Lua VMs
- Ignored numeric and boolean values in the hash

Update neural plugin's providers_config_digest to use the new function,
fixing the "providers config changed" warnings on identical configs.

Also update lua_maps and lua_urls_compose cache key generation to use
unordered_table_digest for more reliable cache hits.
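
A rough Python sketch of XOR accumulation for an order-independent digest. It assumes BLAKE2b truncated to 64 bits as a stand-in for XXH3 (which is not in the Python standard library), and the function names are hypothetical. Python integers are arbitrary precision, so the 64-bit value does not need to be split into two 32-bit halves as in the Lua binding.

```python
import hashlib


def _hash64(data: bytes) -> int:
    """64-bit hash; BLAKE2b here is a stand-in for XXH3."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _encode(value) -> str:
    # Nested tables are digested recursively so their order does not leak.
    if isinstance(value, dict):
        return "table:" + unordered_digest(value)
    # The type name keeps numbers, booleans and strings distinct.
    return f"{type(value).__name__}:{value!r}"


def unordered_digest(table: dict) -> str:
    """Digest that does not depend on key iteration order."""
    acc = 0
    for key, value in table.items():
        acc ^= _hash64(f"{key!r}={_encode(value)}".encode())
    return f"{acc:016x}"
```

Because XOR is commutative and associative, each key/value pair contributes the same bits whatever order the pairs are visited in, and numeric and boolean values take part in the hash.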

9 days agoMerge branch 'master' into vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Wed, 21 Jan 2026 08:19:21 +0000 (08:19 +0000)] 
Merge branch 'master' into vstakhov-llm-embedding-improvements

9 days ago[Fix] Clear pending regexp maps on config reload to prevent use-after-free
Vsevolod Stakhov [Tue, 20 Jan 2026 21:41:15 +0000 (21:41 +0000)] 
[Fix] Clear pending regexp maps on config reload to prevent use-after-free

During HUP-triggered config reload, the pending_regexp_maps array retained
pointers to re_map objects from the old config after they were freed. When
workers received "regexp map loaded" notifications, they accessed freed memory
(visible as 0x5A poison pattern in re_digest), causing SIGSEGV.

Fix by calling rspamd_regexp_map_clear_pending() before releasing the old
config in reread_config().

9 days ago[Fix] Fix race condition between I/O handler and SIGCHLD in subprocess
Vsevolod Stakhov [Tue, 20 Jan 2026 16:54:43 +0000 (16:54 +0000)] 
[Fix] Fix race condition between I/O handler and SIGCHLD in subprocess

The subprocess callback could crash when SIGCHLD handler ran concurrently
with the I/O handler processing large training results. The race:

1. I/O handler receives full data, calls callback
2. SIGCHLD fires during callback execution
3. SIGCHLD handler frees cbdata while callback still uses it
4. Callback returns, I/O handler accesses freed memory -> crash

Fix:
- Add 'dead' flag to track when child has exited
- Set 'replied' BEFORE calling callback (not after)
- SIGCHLD handler skips cleanup if replied=TRUE (I/O handler owns it)
- I/O handler does cleanup after callback if dead=TRUE
- Extract cleanup into rspamd_lua_cbdata_free() helper
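
The ownership rule in the fix list can be sketched as a small state model. This Python version is illustrative only: `CbData`, `on_sigchld`, and `on_io_complete` are hypothetical names, and it covers only the cases stated above (cleanup when the child is still alive after the callback is outside the sketch).

```python
class CbData:
    """Hypothetical callback data shared by the two handlers."""

    def __init__(self):
        self.replied = False  # I/O handler has taken ownership
        self.dead = False     # child process has exited
        self.freed = False


def cbdata_free(cb):
    cb.freed = True           # stand-in for releasing the C structure


def on_sigchld(cb):
    cb.dead = True
    if not cb.replied:
        cbdata_free(cb)       # no I/O handler in flight, safe to clean up
    # Otherwise the I/O handler owns cb and frees it after the callback.


def on_io_complete(cb, callback):
    cb.replied = True         # set BEFORE the callback, not after
    callback(cb)
    if cb.dead:
        cbdata_free(cb)       # child exited during the callback
```

Simulating a SIGCHLD that arrives while the callback is running shows that the data is still valid inside the callback and is freed only afterwards.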

9 days ago[Fix] Use rspamd_text for subprocess callback data to avoid large allocations
Vsevolod Stakhov [Tue, 20 Jan 2026 16:13:21 +0000 (16:13 +0000)] 
[Fix] Use rspamd_text for subprocess callback data to avoid large allocations

Replace lua_pushlstring with lua_new_text(FALSE) when passing subprocess
result data to Lua callbacks. This avoids copying potentially large buffers
(e.g., 2.7MB neural network training results) into Lua's heap, which could
cause crashes under memory pressure.

9 days ago[Fix] Fix ROC threshold calculation for ham/spam labels
Vsevolod Stakhov [Tue, 20 Jan 2026 16:12:41 +0000 (16:12 +0000)] 
[Fix] Fix ROC threshold calculation for ham/spam labels

The ROC calculation was checking outputs[i][1] == 0 for ham samples,
but the ceb_neg cost function uses -1.0 for ham and 1.0 for spam.
Changed to check outputs[i][1] < 0 to correctly identify ham samples.
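
A small Python illustration of why the comparison matters. The Lua code indexes `outputs[i][1]`; the 0-based equivalent here is `row[0]`, and the helper name is hypothetical.

```python
def ham_indices(outputs, fixed=True):
    """Indices of ham samples when labels are -1.0 (ham) and 1.0 (spam)."""
    if fixed:
        return [i for i, row in enumerate(outputs) if row[0] < 0]
    # Old check: never matches, because ham is labelled -1.0, not 0.
    return [i for i, row in enumerate(outputs) if row[0] == 0]
```

With labels `[[-1.0], [1.0], [-1.0]]`, the old check finds no ham samples at all, while the fixed check finds the first and third.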

9 days ago[Feature] Multi-layer funnel architecture for LLM embeddings
Vsevolod Stakhov [Tue, 20 Jan 2026 14:25:29 +0000 (14:25 +0000)] 
[Feature] Multi-layer funnel architecture for LLM embeddings

Add improved neural network architecture specifically for LLM embedding
inputs, while preserving backward compatibility for symbol-based rules.

Key changes:
- New create_embedding_ann() with multi-layer funnel architecture
- Auto-detection of LLM providers via uses_llm_embeddings()
- Support for configurable layers, dropout, layer normalization
- GELU activation by default when available (falls back to ReLU)
- Layer size auto-scaling based on input dimension:
  - >512 dims: 3 layers (0.5, 0.25, 0.125)
  - 256-512 dims: 2 layers (0.5, 0.25)
  - <256 dims: 1 layer (0.5)

Bug fixes:
- Wrap create_ann in pcall to handle errors gracefully
- Reset learning_spawned flag on ANN creation failure
- Replace assert(false) with proper error logging that resets state
- Prevents training from getting stuck after errors

New configuration options:
- layers: explicit layer size multipliers
- dropout: dropout rate (default 0.2 for embeddings)
- use_layernorm: enable layer normalization (default true)
- activation: 'gelu' or 'relu' (default 'gelu' if available)
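
The auto-scaling rule can be sketched as below. This assumes that each multiplier is applied to the input dimension and that sizes are truncated to integers; neither detail is stated in the commit message, and the function names are hypothetical.

```python
def default_layer_multipliers(n_inputs: int) -> list:
    """Pick funnel multipliers from the input dimension."""
    if n_inputs > 512:
        return [0.5, 0.25, 0.125]
    if n_inputs >= 256:
        return [0.5, 0.25]
    return [0.5]


def layer_sizes(n_inputs: int, layers=None) -> list:
    """Hidden layer sizes; `layers` overrides the automatic choice."""
    multipliers = layers if layers else default_layer_multipliers(n_inputs)
    # Assumption: multipliers are relative to the input dimension.
    return [max(1, int(n_inputs * m)) for m in multipliers]
```

Under these assumptions a 1024-dimensional embedding gets three hidden layers of 512, 256, and 128 units.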

9 days ago[Feature] Add GELU activation and expose dropout in KANN bindings
Vsevolod Stakhov [Tue, 20 Jan 2026 14:20:51 +0000 (14:20 +0000)] 
[Feature] Add GELU activation and expose dropout in KANN bindings

- Implement GELU (Gaussian Error Linear Unit) activation function
  using erf: GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
- Add proper forward and backward passes for GELU
- Register GELU as operation #37 in kad_op_list
- Expose dropout layer to Lua (function existed but wasn't registered)
- Add Lua bindings for rspamd_kann.transform.gelu

GELU is often better than ReLU for transformer-like architectures
and high-dimensional embedding inputs.
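
For reference, the erf-based formula and its derivative can be written out in Python. This is a scalar sketch of the math only, not the KANN operator; the derivative follows from GELU(x) = x * Phi(x), where Phi is the standard normal CDF.

```python
import math


def gelu(x: float) -> float:
    """GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """d/dx [x * Phi(x)] = Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

A backward pass would multiply the incoming gradient by `gelu_grad(x)` element-wise.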

9 days agoAdd GPU and vast.ai support for neural embedding service
Vsevolod Stakhov [Tue, 20 Jan 2026 12:16:36 +0000 (12:16 +0000)] 
Add GPU and vast.ai support for neural embedding service

- Add Dockerfile.gpu for GPU-accelerated inference with PyTorch CUDA
- Add requirements-gpu.txt with pinned versions for CUDA compatibility
- Add vastai-launch.sh script for deploying on vast.ai cloud GPUs
- Update README with GPU deployment instructions and model recommendations

Default GPU model: intfloat/multilingual-e5-large (100+ languages including Russian)
Tested on RTX 4090 with ~20-50ms latency per embedding.

9 days ago[Fix] Prefer higher version ANN profiles when symbol distances are equal
Vsevolod Stakhov [Tue, 20 Jan 2026 11:09:47 +0000 (11:09 +0000)] 
[Fix] Prefer higher version ANN profiles when symbol distances are equal

When multiple ANN profiles have the same symbol distance, the profile
selection would pick the first one encountered rather than the newest.
This caused issues when a newly trained ANN (version 1) existed alongside
the initial profile (version 0) - the scanner would select version 0
which had no actual ANN data.

Fix by adding a secondary selection criterion: when distances are equal,
prefer the profile with the higher version number.
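
The tie-break can be expressed as a compound sort key. A minimal Python sketch, assuming each profile is a dict with hypothetical `distance` and `version` fields:

```python
def select_profile(profiles):
    """Smallest symbol distance wins; on a tie, the higher version wins."""
    return min(profiles, key=lambda p: (p["distance"], -p["version"]))
```

With two profiles at the same distance, version 1 is now chosen over version 0 regardless of the order in which they are encountered.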

10 days agoMerge branch 'master' into patch-15
Dmitriy Alekseev [Tue, 20 Jan 2026 08:58:32 +0000 (09:58 +0100)] 
Merge branch 'master' into patch-15

10 days ago[Fix] Remove unused variables in neural controller
Vsevolod Stakhov [Mon, 19 Jan 2026 20:22:35 +0000 (20:22 +0000)] 
[Fix] Remove unused variables in neural controller

Remove unused ev_base and ann_key variables to fix luacheck warnings.

10 days ago[Fix] Fix messagepack cache decoding format string
Vsevolod Stakhov [Mon, 19 Jan 2026 19:14:11 +0000 (19:14 +0000)] 
[Fix] Fix messagepack cache decoding format string

The UCL parser's parse_text() only accepts 'msgpack' as the format
string for messagepack parsing, while to_format() accepts both
'msgpack' and 'messagepack'. This mismatch caused cached data to
fail decoding and appear as cache misses.

Fixes LLM embedding cache never being read back despite being stored.

10 days ago[Fix] Prevent concurrent neural network training races
Vsevolod Stakhov [Mon, 19 Jan 2026 18:41:02 +0000 (18:41 +0000)] 
[Fix] Prevent concurrent neural network training races

- Add learning_spawned check at start of do_train_ann to prevent
  concurrent async Redis operations
- Move learning_spawned flag to start of spawn_train for earlier
  lock acquisition
- Remove redundant flag assignments later in the training flow

10 days ago[Feature] Add pending training keys and fix neural network training issues
Vsevolod Stakhov [Mon, 19 Jan 2026 18:08:58 +0000 (18:08 +0000)] 
[Feature] Add pending training keys and fix neural network training issues

- Add pending_train_key() for version-independent training vector storage
- Fix variable shadowing bug where ann_trained callback was overwritten
- Add concurrent training prevention via learning_spawned check
- Replace assert with proper error handling for msgpack parsing
- Clean up pending keys after successful training
- Update controller endpoint to use pending keys for manual training
- Fix ev_base:sleep() to register with session events properly
- Update classifier_test.lua to support llm_embeddings classifier testing

Co-Authored-By: Claude <noreply@anthropic.com>

10 days ago[Feature] Add ev_base:sleep() method for Lua
Vsevolod Stakhov [Mon, 19 Jan 2026 15:32:43 +0000 (15:32 +0000)] 
[Feature] Add ev_base:sleep() method for Lua

Add sleep method to ev_base that supports both sync and async modes:
- ev_base:sleep(time) - sync mode using coroutines
- ev_base:sleep(time, callback) - async mode with callback

Sync mode yields the current coroutine and resumes after timeout.
Async mode schedules the callback to run after the timeout.

10 days ago[Fix] Skip external map queries when Settings header is provided
Vsevolod Stakhov [Mon, 19 Jan 2026 14:41:30 +0000 (14:41 +0000)] 
[Fix] Skip external map queries when Settings header is provided

When settings are specified manually via the Settings HTTP header,
external map queries should not be executed as they may override
the manually provided settings asynchronously.

This prevents connection errors to external maps from affecting
requests that explicitly provide their own settings.

10 days agoMerge branch 'master' into vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Mon, 19 Jan 2026 14:11:05 +0000 (14:11 +0000)] 
Merge branch 'master' into vstakhov-llm-embedding-improvements

10 days ago[Fix] Guard fuzzy TCP session cleanup
Vsevolod Stakhov [Mon, 19 Jan 2026 14:09:01 +0000 (14:09 +0000)] 
[Fix] Guard fuzzy TCP session cleanup

10 days ago[Feature] Add language-based model/URL selection for LLM embeddings
Vsevolod Stakhov [Mon, 19 Jan 2026 09:29:44 +0000 (09:29 +0000)] 
[Feature] Add language-based model/URL selection for LLM embeddings

Support language-specific embedding models via language_models config:
- Shorthand: language_models = { ru = "model-name" }
- Full config: language_models = { ru = { model, url, api_key } }

Uses get_displayed_text_part() for language detection.
Include language in cache key for proper separation.
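
One way the two config forms could be normalized is sketched below in Python. It assumes that fields missing from a per-language entry fall back to the top-level settings, which the commit message does not state; `resolve_model` and the example model names and URLs are hypothetical.

```python
def resolve_model(config: dict, language: str) -> dict:
    """Return the model/url/api_key to use for a detected language."""
    base = {k: config.get(k) for k in ("model", "url", "api_key")}
    entry = config.get("language_models", {}).get(language)
    if entry is None:
        return base                       # no override for this language
    if isinstance(entry, str):
        return {**base, "model": entry}   # shorthand: model name only
    return {**base, **entry}              # full form: override given fields


example = {
    "model": "default-model",
    "url": "http://localhost:8080",
    "api_key": None,
    "language_models": {
        "ru": "ru-model",
        "de": {"model": "de-model", "url": "http://de.example"},
    },
}
```

Here `resolve_model(example, "ru")` keeps the default URL and swaps only the model, while `"de"` overrides both model and URL.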

11 days agoMerge branch 'master' into vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Sun, 18 Jan 2026 17:31:40 +0000 (17:31 +0000)] 
Merge branch 'master' into vstakhov-llm-embedding-improvements

11 days agoMerge pull request #5845 from rspamd/feature/extract-text-limited
Vsevolod Stakhov [Sun, 18 Jan 2026 17:30:04 +0000 (17:30 +0000)] 
Merge pull request #5845 from rspamd/feature/extract-text-limited

[Feature] Add extract_text_limited for email text extraction with limits

11 days agoMerge pull request #5846 from moisseev/webui
Vsevolod Stakhov [Sun, 18 Jan 2026 17:29:51 +0000 (17:29 +0000)] 
Merge pull request #5846 from moisseev/webui

[Minor] Fix WebUI symbols frequency column sorting

11 days ago[Feature] Add reply_trim_mode for LLM input 5845/head
Vsevolod Stakhov [Sun, 18 Jan 2026 13:19:50 +0000 (13:19 +0000)] 
[Feature] Add reply_trim_mode for LLM input

11 days ago[Fix] Improve reply header trimming
Vsevolod Stakhov [Sun, 18 Jan 2026 12:04:16 +0000 (12:04 +0000)] 
[Fix] Improve reply header trimming

12 days ago[Minor] WebUI: Add frequency stddev column and units to symbols table 5846/head
Alexander Moisseev [Sun, 18 Jan 2026 08:20:58 +0000 (11:20 +0300)] 
[Minor] WebUI: Add frequency stddev column and units to symbols table

- Add frequency standard deviation column with the same exponential
scaling as frequency for consistent notation. Hidden on smaller
screens (lg breakpoint)
- Display units (hits/s for frequencies, s for time) in table headers
- Remove "s" suffix from time cells (unit now in header)

12 days ago[Fix] Calculate frequency exponent from non-zero values only
Alexander Moisseev [Sun, 18 Jan 2026 06:04:08 +0000 (09:04 +0300)] 
[Fix] Calculate frequency exponent from non-zero values only

Fix suboptimal exponential notation selection in WebUI symbols
frequency display. Previously, the exponent was calculated from
the average of all frequency values including zeros, resulting
in unnecessarily small exponents (e.g., 2300.00e-8 instead of
2.30e-5). Now only non-zero values are used for calculation,
producing more readable notation.
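
The effect can be reproduced with a small Python sketch. It assumes the shared
exponent is floor(log10(mean)), which matches the description above but is not
copied from the WebUI code; common_exponent and fmt are hypothetical names:

```python
import math


def common_exponent(values, skip_zeros=True):
    """Pick one power-of-ten exponent for a whole column of frequencies."""
    sample = [v for v in values if v != 0] if skip_zeros else list(values)
    if not sample:
        return 0
    mean = sum(sample) / len(sample)
    if mean == 0:
        return 0
    return math.floor(math.log10(abs(mean)))


def fmt(value, exp):
    # Render a value scaled to the shared exponent, e.g. "2.30e-5".
    return f"{value / 10 ** exp:.2f}e{exp}"


freqs = [2.3e-5] + [0.0] * 999  # one active symbol, many idle ones
print(fmt(2.3e-5, common_exponent(freqs, skip_zeros=False)))  # 2300.00e-8
print(fmt(2.3e-5, common_exponent(freqs)))                    # 2.30e-5
```

The zeros drag the mean down by three orders of magnitude, which is what
pushes the exponent from -5 to -8 in the first call.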

12 days ago[Feature] Add extract_text_limited for email text extraction with limits
Vsevolod Stakhov [Sat, 17 Jan 2026 15:58:14 +0000 (15:58 +0000)] 
[Feature] Add extract_text_limited for email text extraction with limits

Add lua_mime.extract_text_limited() function to extract meaningful text from
emails with long reply chains while respecting size limits.

Features:
- max_bytes: Hard limit on output size (default: 32KB)
- max_words: Alternative limit by word count
- strip_quotes: Remove quoted replies (lines starting with >)
- strip_reply_headers: Remove reply headers (On X wrote:, From: Sent:)
- strip_signatures: Remove signature blocks (-- separator, mobile signatures)
- smart_trim: Enable all heuristics

Implementation:
- Uses rspamd_text:lines() iterator for memory-efficient line processing
- No full string interning of email content (better for large emails)
- rspamd_trie for multi-pattern matching (67 signature, 44 reply patterns)
- rspamd_regexp for regex patterns (wrote:, schrieb:, etc.)
- Single-pass O(n) algorithm with early termination

Multilingual support for 10+ languages:
- English, German, French, Spanish, Russian, Portuguese, Italian
- Chinese, Japanese, Polish

Configuration API:
- lua_mime.configure_text_extraction(cfg) for custom patterns
- Supports extend_defaults to add patterns without replacing defaults

CLI integration in rspamadm mime ex:
- -L/--limit, -Q/--strip-quotes, -S/--strip-signatures
- -R/--strip-reply-headers, -T/--smart-trim

Also updates llm_common.build_llm_input() to use the new function.
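
The single-pass idea can be illustrated with a much-reduced Python sketch. It
handles only ">" quotes, the "-- " signature separator and a byte limit; the
real function is Lua and uses rspamd_text, rspamd_trie and rspamd_regexp, none
of which are modelled here. The function below merely reuses the name for
readability:

```python
def extract_text_limited(text, max_bytes=32 * 1024,
                         strip_quotes=True, strip_signatures=True):
    """Keep leading, non-quoted lines until a signature or the byte limit."""
    kept = []
    used = 0
    for line in text.splitlines():
        if strip_signatures and line.rstrip() == "--":
            # Conventional signature separator: nothing after it is kept.
            break
        if strip_quotes and line.lstrip().startswith(">"):
            # Quoted reply line.
            continue
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > max_bytes:
            # Early termination: the hard limit has been reached.
            break
        kept.append(line)
        used += size
    return "\n".join(kept)


mail = "Hi Bob\n> old quote\nThanks\n-- \nAlice\n"
print(extract_text_limited(mail))  # prints "Hi Bob" and "Thanks"
```

Each line is visited at most once and the loop stops as soon as the limit or a
signature is hit, which is the property the commit describes as single-pass
with early termination.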

12 days ago[Minor] Unify sortValue functions to arrow functions
Alexander Moisseev [Sat, 17 Jan 2026 16:45:44 +0000 (19:45 +0300)] 
[Minor] Unify sortValue functions to arrow functions

Convert all sortValue functions in FooTable column definitions to
arrow functions with consistent parameter naming across the codebase.

12 days ago[Minor] Fix WebUI symbols frequency column sorting
Alexander Moisseev [Sat, 17 Jan 2026 16:27:44 +0000 (19:27 +0300)] 
[Minor] Fix WebUI symbols frequency column sorting

Previously, frequency values with exponential notation (e.g., "0.00e-5",
"389.40e-5") were compared as strings, causing incorrect sort order.

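The string-comparison problem is easy to reproduce. The Python sketch below
uses made-up sample values in the same notation; comparing by parsed numeric
value is shown as one plausible remedy, not necessarily what the WebUI does:

```python
cells = ["389.40e-5", "0.00e-5", "12.50e-5", "4.10e-5"]

# Lexicographic order compares character by character, so "389..." sorts
# before "4..." even though it is the larger number.
as_strings = sorted(cells)

# Parsing each cell first gives the intended numeric order.
as_numbers = sorted(cells, key=float)

print(as_strings)  # ['0.00e-5', '12.50e-5', '389.40e-5', '4.10e-5']
print(as_numbers)  # ['0.00e-5', '4.10e-5', '12.50e-5', '389.40e-5']
```
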
12 days ago[Fix] Normalize request header values
Vsevolod Stakhov [Sat, 17 Jan 2026 12:19:17 +0000 (12:19 +0000)] 
[Fix] Normalize request header values

12 days ago[Fix] Stabilize neural LLM embedding training and cache keys
Vsevolod Stakhov [Sat, 17 Jan 2026 10:46:13 +0000 (10:46 +0000)] 
[Fix] Stabilize neural LLM embedding training and cache keys

13 days agoMerge branch 'master' into vstakhov-llm-embedding-improvements
Vsevolod Stakhov [Fri, 16 Jan 2026 17:45:00 +0000 (17:45 +0000)] 
Merge branch 'master' into vstakhov-llm-embedding-improvements

13 days agoMerge branch 'master' into patch-15
Dmitriy Alekseev [Fri, 16 Jan 2026 17:22:48 +0000 (18:22 +0100)] 
Merge branch 'master' into patch-15

13 days ago[Fix] Fix fuzzystat control replies
Vsevolod Stakhov [Fri, 16 Jan 2026 13:25:04 +0000 (13:25 +0000)] 
[Fix] Fix fuzzystat control replies

13 days ago[Fix] Avoid case-only alias rewrites
Vsevolod Stakhov [Fri, 16 Jan 2026 12:17:51 +0000 (12:17 +0000)] 
[Fix] Avoid case-only alias rewrites

Refs #5843

13 days ago[Fix] Respect headers_modify_mode for fuzzy hash headers
Vsevolod Stakhov [Fri, 16 Jan 2026 10:59:14 +0000 (10:59 +0000)] 
[Fix] Respect headers_modify_mode for fuzzy hash headers

2 weeks agoMerge pull request #5842 from fatalbanana/rl_compat
Vsevolod Stakhov [Fri, 16 Jan 2026 08:59:17 +0000 (08:59 +0000)] 
Merge pull request #5842 from fatalbanana/rl_compat

[Fix] ratelimit: fix compatibility with old records

2 weeks ago[Fix] ratelimit: fix compatibility with old records 5842/head
Andrew Lewis [Thu, 15 Jan 2026 15:33:46 +0000 (17:33 +0200)] 
[Fix] ratelimit: fix compatibility with old records

2 weeks agoMerge pull request #5839 from bneumeier/master
Vsevolod Stakhov [Thu, 15 Jan 2026 15:02:48 +0000 (15:02 +0000)] 
Merge pull request #5839 from bneumeier/master

Allow for use of Lua 5.5

2 weeks agoMerge pull request #5840 from moisseev/frequency
Vsevolod Stakhov [Thu, 15 Jan 2026 14:54:51 +0000 (14:54 +0000)] 
Merge pull request #5840 from moisseev/frequency

[Fix] Use proper rounding for symbol frequency statistics

2 weeks agoMerge pull request #5841 from fatalbanana/log_keys
Vsevolod Stakhov [Thu, 15 Jan 2026 14:54:37 +0000 (14:54 +0000)] 
Merge pull request #5841 from fatalbanana/log_keys

Lua: populate missing log keys

2 weeks ago[Minor] Satisfy luacheck 5841/head
Andrew Lewis [Thu, 15 Jan 2026 12:29:07 +0000 (14:29 +0200)] 
[Minor] Satisfy luacheck

2 weeks ago[Minor] Return errors from lua_redis.load_redis_script_from_file
Andrew Lewis [Thu, 15 Jan 2026 12:13:22 +0000 (14:13 +0200)] 
[Minor] Return errors from lua_redis.load_redis_script_from_file

2 weeks ago[Minor] populate missing log keys in plugins, lualib
Andrew Lewis [Thu, 15 Jan 2026 12:06:37 +0000 (14:06 +0200)] 
[Minor] populate missing log keys in plugins, lualib

2 weeks ago[Fix] Silence zlib preset dictionary inflate errors
Vsevolod Stakhov [Thu, 15 Jan 2026 10:14:55 +0000 (10:14 +0000)] 
[Fix] Silence zlib preset dictionary inflate errors

2 weeks ago[Fix] Propagate control request ids in replies
Vsevolod Stakhov [Wed, 14 Jan 2026 22:48:06 +0000 (22:48 +0000)] 
[Fix] Propagate control request ids in replies

Ensure workers include cmd->id in control replies to avoid
'unknown request id 0' warnings. Update functional control tests and
make RSPAMD_TMPDIR visible to child suites.

2 weeks ago[Fix] Use proper rounding for symbol frequency statistics 5840/head
Alexander Moisseev [Wed, 14 Jan 2026 16:11:50 +0000 (19:11 +0300)] 
[Fix] Use proper rounding for symbol frequency statistics

- Replace incorrect floor() with round() in rounding functions to avoid
  losing small values
- Increase counters API frequency precision from 3 to 6 decimal places
  (need 5 to avoid rspamc displaying values as multiples of 0.06, need 6
  for /counters endpoint itself - no additional overhead as JSON stores
  double anyway)
- Add frequency_stddev field to counters API output (fixes zero stddev in
  `rspamc counters` output)
- Clarify `rspamc counters` table header with "avg (stddev)" subheading
- Fix WebUI to preserve frequency precision before scaling

Example for symbol with frequency 0.004772 hits/sec:
- Before: /symbols returns 0.004772, /counters returns 0.004000,
  `rspamc counters` shows 0.240
- After:  /symbols returns 0.004772, /counters returns 0.004772,
  `rspamc counters` shows 0.286
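
The numbers in the example can be checked with a short Python sketch. It
assumes that `rspamc counters` shows the per-second frequency multiplied by
60, which is consistent with the 0.240 and 0.286 figures and with the
"multiples of 0.06" remark but is not stated explicitly in the commit:

```python
import math


def floor_to(x, digits):
    # Old behaviour as described: truncate towards zero for positive x.
    scale = 10 ** digits
    return math.floor(x * scale) / scale


freq = 0.004772  # hits/sec, as reported by /symbols

before = floor_to(freq, 3)  # 3 decimals, floor
after = round(freq, 6)      # 6 decimals, round

print(f"{before:.6f} -> {before * 60:.3f}")  # 0.004000 -> 0.240
print(f"{after:.6f} -> {after * 60:.3f}")    # 0.004772 -> 0.286
```

With only three decimals the smallest representable step is 0.001 hits/sec,
that is 0.06 after scaling by 60, which explains the coarse values seen before
the fix.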

2 weeks ago[Feature] Route all hyperscan cache operations through Lua backend
Vsevolod Stakhov [Wed, 14 Jan 2026 14:29:32 +0000 (14:29 +0000)] 
[Feature] Route all hyperscan cache operations through Lua backend

- Route file backend through Lua for consistency with redis/http
- Add zstd compression support with magic byte detection for backward
  compatibility (reads both .hs and .hs.zst files)
- Fix rspamd_util.stat() return value handling (returns err, stat tuple)
- Fix timer management for synchronous Lua callbacks to prevent early
  termination of re_cache compilation
- Fix use-after-free in load path by pre-counting pending items
- Add priority queue for re_cache compilation (short lists first)
- Add ev_run() flush before blocking hyperscan compilations to ensure
  busy notifications are sent
- Add hyperscan_notice_known() and hyperscan_get_platform_id() Lua APIs
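
Magic byte detection of this kind can be sketched in Python as follows. The
four-byte zstd frame magic (28 B5 2F FD) comes from the zstd format; the
helper names and the injected decompress callable are hypothetical and only
keep the example free of third-party imports:

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number, as stored on disk


def is_zstd(blob):
    """True if the blob starts with a zstd frame header."""
    return blob[:4] == ZSTD_MAGIC


def load_cached(blob, decompress):
    # New caches are compressed; older ones are stored raw. Looking at
    # the leading bytes lets one reader handle both without relying on
    # the file extension alone.
    return decompress(blob) if is_zstd(blob) else blob
```

A caller would pass the real decompressor, for example the decompress
function of whichever zstd binding is available.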

2 weeks ago[Feature] Add ASCII85 decode support for PDF text extraction
Vsevolod Stakhov [Wed, 14 Jan 2026 10:30:51 +0000 (10:30 +0000)] 
[Feature] Add ASCII85 decode support for PDF text extraction

PDFs may use ASCII85Decode filter for content streams. This was causing
text extraction to fail for such PDFs, resulting in missed URLs and emails.

- Add rspamd_decode_ascii85_buf() in str_util.c
- Add rspamd_util.decode_ascii85() Lua binding
- Add ASCII85Decode filter support in pdf.lua
- Add --raw flag to rspamadm mime urls command
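
For readers unfamiliar with the filter, the Python sketch below decodes an
ASCII85 stream using the standard library. It illustrates the encoding only
and is unrelated to the C implementation in str_util.c; the function name is
hypothetical:

```python
import base64


def decode_ascii85_stream(data):
    """Decode an ASCII85 payload as found in a PDF content stream."""
    data = data.strip()
    # PDF streams end with "~>"; a leading "<~" is optional.
    if data.startswith(b"<~"):
        data = data[2:]
    if data.endswith(b"~>"):
        data = data[:-2]
    # Whitespace inside the stream carries no meaning and is ignored.
    return base64.a85decode(data, ignorechars=b" \t\n\r\v")


encoded = base64.a85encode(b"http://example.com/") + b"~>"
print(decode_ascii85_stream(encoded))  # b'http://example.com/'
```

Until such a stream is decoded, a URL inside it is not visible to a plain-text
scan, which is why the missing filter caused missed URLs and emails.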

2 weeks ago[Fix] Refactor control socket to use ID-based request/reply matching
Vsevolod Stakhov [Tue, 13 Jan 2026 21:52:13 +0000 (21:52 +0000)] 
[Fix] Refactor control socket to use ID-based request/reply matching

Replace the serialization-based control command handling with an ID-based
approach using khash, mirroring the existing rspamd_srv_requests pattern.

Key changes:
- Add uint64_t id field to control command/reply structs
- Use khash for O(1) request lookup by ID instead of GHashTable
- Add rspamd_control_reply_handler() for centralized reply processing
- Add rspamd_control_pending_new/destroy/remove_all() API functions
- Add control_ev watcher to worker struct for reply monitoring
- Call rspamd_srv_pipe_cleanup() on worker shutdown to prevent leaks
- Handle ID collisions gracefully (warn and free old entry)

This fixes hash table iterator corruption crashes that occurred when
modifying the hash during iteration, and provides more robust concurrent
command handling.
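
The ID-based matching can be pictured with a small Python model. It is a
sketch of the general pattern only: the class and method names are
hypothetical, a dict stands in for khash, and none of it mirrors the actual C
structures:

```python
import itertools
import logging


class PendingRequests:
    """Table of in-flight control requests keyed by request ID."""

    def __init__(self):
        self._pending = {}
        self._ids = itertools.count(1)

    def new(self, callback, request_id=None):
        if request_id is None:
            request_id = next(self._ids)
        old = self._pending.pop(request_id, None)
        if old is not None:
            # ID collision: warn and drop the stale entry.
            logging.warning("replacing pending request id %d", request_id)
        self._pending[request_id] = callback
        return request_id

    def handle_reply(self, request_id, payload):
        # Direct lookup by ID: no iteration over the table is needed, so
        # entries can be removed safely while other replies are pending.
        callback = self._pending.pop(request_id, None)
        if callback is None:
            return False  # unknown or already completed request
        callback(payload)
        return True

    def remove_all(self):
        self._pending.clear()
```

Because each reply carries its request ID, replies may arrive in any order and
still reach the right callback.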