git.ipfire.org Git - thirdparty/rspamd.git/log

Vsevolod Stakhov [Fri, 6 Feb 2026 15:11:44 +0000 (15:11 +0000)]

[Fix] lua_url: Re-encode control characters and spaces in URL tostring

The URL parser (rspamd_url_decode) decodes percent-encoded sequences
like %20 back to literal characters in the internal representation.
When tostring() returned these decoded URLs, spaces and control chars
would break subsequent re-parsing (e.g., in url_redirector redirect
chains and Redis cache round-trips). Fix by re-encoding characters
<= 0x20 on serialization, matching browser behavior: decode internally
for matching, re-encode on copy.
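
A minimal sketch of the re-encoding rule described above, written in Python for brevity rather than rspamd's C. The helper name `reencode_url` is hypothetical; only the "characters <= 0x20" rule comes from the commit message.

```python
def reencode_url(decoded: str) -> str:
    """Percent-encode every character whose code point is <= 0x20.

    Hypothetical helper: it mirrors the rule in the commit message,
    not the actual lua_url implementation.
    """
    out = []
    for ch in decoded:
        if ord(ch) <= 0x20:
            # Spaces and control characters would break re-parsing, so escape them.
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


serialized = reencode_url("http://example.com/a b\tc")
```

With this rule a decoded space becomes `%20` again and a tab becomes `%09`, so the serialized string can be parsed a second time without being split at whitespace.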

Vsevolod Stakhov [Fri, 6 Feb 2026 11:22:07 +0000 (11:22 +0000)]

[Fix] re_cache: Always use charset-converted content for SARAWBODY matching

Use utf_raw_content (charset-converted UTF-8 with HTML tags preserved)
for all SARAWBODY patterns, regardless of /u flag presence. The previous
approach used utf_content (which strips HTML tags on HTML parts) and only
for classes containing /u patterns, leaving non-/u patterns matching
against raw bytes in the original charset.

This prevents trivial bypass of SA rawbody rules via exotic encodings
like UTF-16 and ensures consistent matching across PCRE and Hyperscan.
Falls back to transfer-decoded parsed content only when charset
conversion failed.
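
The bypass mentioned above can be illustrated with a small Python sketch. The rule text and sample body are made up for illustration, and Python's `re` stands in for PCRE/Hyperscan; the point is only that a byte-oriented rule misses UTF-16 content until the content is charset-converted.

```python
import re

# A byte-oriented, rawbody-style rule (illustrative pattern, not a real SA rule).
pattern = re.compile(rb"free money")

# The same phrase as it would appear in a part declared as UTF-16LE.
raw_utf16 = "free money".encode("utf-16-le")

# In UTF-16LE every ASCII character is followed by a NUL byte, so the rule misses.
matches_raw = pattern.search(raw_utf16) is not None

# After charset conversion to UTF-8 the same rule matches.
converted = raw_utf16.decode("utf-16-le").encode("utf-8")
matches_converted = pattern.search(converted) is not None
```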

Vsevolod Stakhov [Fri, 6 Feb 2026 10:17:38 +0000 (10:17 +0000)]

Merge pull request #5871 from KIT-CERT/fix_ratelimits

fix dynamic bucket-specific rate-limits

Vsevolod Stakhov [Fri, 6 Feb 2026 10:17:15 +0000 (10:17 +0000)]

Merge pull request #5874 from rspamd/vstakhov-proxy-balancing

Feature: Token bucket load balancing for proxy upstreams

Vsevolod Stakhov [Fri, 6 Feb 2026 09:26:39 +0000 (09:26 +0000)]

[Test] upstream: add token bucket unit tests

15 doctest test cases covering token bucket load balancing:
basic selection, cost formula, token return/penalty, least-loaded
preference, except parameter, exhaustion fallback, fair distribution,
custom config, empty list, null safety, large messages, multiple
inflight, mixed success/failure, and generic API fallback.

Vsevolod Stakhov [Fri, 6 Feb 2026 09:25:17 +0000 (09:25 +0000)]

[Fix] upstream: fix stale heap_idx in token bucket

The intrusive heap swaps entire structs during swim/sink, making
up->heap_idx stale after any heap modification. The update function
would silently skip updates when the cached index pointed to a
different upstream, breaking load distribution across backends.

Fix by falling back to linear search on cache miss and refreshing
heap_idx after every heap update. Also add underflow warning for
double-return detection and improve API documentation.
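
The lookup logic described above might be sketched as follows. This is a simplified Python model, not rspamd's intrusive heap: the heap is a plain list, and `Upstream`, `find_slot` and `refresh_indices` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    name: str
    heap_idx: int = 0  # cached position; may go stale when the heap reorders


def find_slot(heap: list, up: Upstream) -> int:
    """Return the upstream's real position, falling back to a linear scan."""
    idx = up.heap_idx
    if 0 <= idx < len(heap) and heap[idx] is up:
        return idx  # cache hit
    for i, entry in enumerate(heap):  # cache miss: linear search
        if entry is up:
            return i
    return -1  # not in the heap at all


def refresh_indices(heap: list) -> None:
    """Re-sync every cached index after the heap was modified."""
    for i, entry in enumerate(heap):
        entry.heap_idx = i
```

The linear scan costs O(n) on a miss, but it guarantees that an update is never silently applied to, or skipped for, the wrong upstream.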

Vsevolod Stakhov [Fri, 6 Feb 2026 08:07:36 +0000 (08:07 +0000)]

Merge branch 'master' into vstakhov-proxy-balancing

Vsevolod Stakhov [Thu, 5 Feb 2026 15:38:07 +0000 (15:38 +0000)]

[Refactor] fuzzy storage: split helper code (#5875)

Vsevolod Stakhov [Thu, 5 Feb 2026 14:52:20 +0000 (14:52 +0000)]

Merge pull request #5878 from moisseev/webui

[Fix] WebUI: Allow computing fuzzy hashes without writable storages

Vsevolod Stakhov [Thu, 5 Feb 2026 14:05:32 +0000 (14:05 +0000)]

[Fix] Fix printf format specifiers found by clang-plugin

Vsevolod Stakhov [Thu, 5 Feb 2026 14:04:57 +0000 (14:04 +0000)]

[Fix] clang-plugin: add null check for struct type in check_struct_type

Vsevolod Stakhov [Thu, 5 Feb 2026 13:10:54 +0000 (13:10 +0000)]

[Fix] clang-plugin: suppress noisy remarks and fix SANITIZER macro conflict

Vsevolod Stakhov [Thu, 5 Feb 2026 13:07:03 +0000 (13:07 +0000)]

[Fix] clang-plugin: fix build with modern LLVM/Clang

Vsevolod Stakhov [Thu, 5 Feb 2026 12:44:03 +0000 (12:44 +0000)]

[Fix] Use %ud instead of %u in rspamd printf format strings

Vsevolod Stakhov [Thu, 5 Feb 2026 09:47:44 +0000 (09:47 +0000)]

[Fix] re_cache: Use debug level for missing Lua backend during config

During config initialization (configtest, startup), there's no event
loop available so the Lua backend cannot be initialized. This is
expected behavior - use debug level when try_load=true to avoid
noisy warnings during configtest.

Vsevolod Stakhov [Thu, 5 Feb 2026 09:23:54 +0000 (09:23 +0000)]

[Test] Add test cases for MIME_HTML_ONLY with malformed multipart

Add tests for edge cases that caused a segfault when multipart/related
has no children or contains only non-text content:

- alternative-nested-rfc822.eml: multipart/alternative with HTML and
  related containing only image (no text), plus nested message/rfc822
- alternative-empty-related.eml: multipart/alternative with malformed
  related that has no proper MIME children

These test cases verify the NULL check fix for mp->children.

Vsevolod Stakhov [Thu, 5 Feb 2026 09:17:47 +0000 (09:17 +0000)]

[Fix] message: Add NULL check for mp->children in alternative detection

The multipart children array can be NULL in some edge cases. Add NULL
checks before accessing mp->children->len to prevent segfault in
rspamd_mime_part_find_text_in_subtree() and related code paths.

Vsevolod Stakhov [Thu, 5 Feb 2026 09:14:22 +0000 (09:14 +0000)]

[Fix] re_cache: Use debug level for startup hyperscan load failures

During worker startup, a "best-effort" synchronous hyperscan load is
attempted before hs_helper has finished compiling. When files don't
exist yet, the "no valid expressions" message was logged at info level,
which is noisy and misleading since this is expected startup behavior.

Changed to use debug level when try_load=true (startup probe), while
keeping info level for actual failures. Workers will receive async
notifications when hs_helper finishes compiling.

Alexander Moisseev [Thu, 5 Feb 2026 09:06:48 +0000 (12:06 +0300)]

[Fix] WebUI: Allow computing fuzzy hashes without writable storages

Vsevolod Stakhov [Thu, 5 Feb 2026 08:58:53 +0000 (08:58 +0000)]

[Fix] re_cache: Respect disable_hyperscan option in loading functions

Add checks for disable_hyperscan at the start of hyperscan loading
functions to prevent database loading when the option is set.

Previously, hyperscan databases would still be loaded even with
disable_hyperscan = true, causing unnecessary I/O and memory usage.

Vsevolod Stakhov [Thu, 5 Feb 2026 08:53:07 +0000 (08:53 +0000)]

[Fix] re_cache: Use charset-converted content for UTF-8 SARAWBODY patterns

When SARAWBODY regexp class contains UTF-8 patterns (/u flag), use
utf_content (charset-converted UTF-8 with HTML preserved) instead of
parsed content. This allows Unicode patterns like \x{200b} to match
correctly.

For non-UTF patterns, continue using parsed content with raw mode
for backward compatibility with raw byte matching.

This fixes "bad utf8 input for JIT re" errors when using Unicode
patterns in rawbody rules on non-UTF-8 encoded messages.

Vsevolod Stakhov [Wed, 4 Feb 2026 18:44:18 +0000 (18:44 +0000)]

[Fix] re_cache: Always use raw mode for SARAWBODY regexps

The parsed content is transfer-decoded (base64/QP) but NOT charset-converted,
so it may contain non-UTF-8 data even when IS_TEXT_PART_UTF is true.

Using dynamic raw flag based on IS_TEXT_PART_UTF was incorrect because that
flag indicates whether utf_content is valid UTF-8, not whether parsed content
is valid UTF-8.

The bug was introduced in 0d62dd6513 (1.8.3); this restores the original
behavior of always treating SARAWBODY content as raw.

Vsevolod Stakhov [Wed, 4 Feb 2026 17:06:40 +0000 (17:06 +0000)]

[Feature] re_cache: Improve hyperscan loading log messages

- Show regexp count alongside class count (e.g., "4662 regexps (42/42 classes)")
- Display loaded/total classes ratio for partial loads (e.g., "38/42 classes")
- Log missing classes with reasons (not cached, empty data, load failed)
- Add consistent logging to async loading path
- Include class type and type_data in missing class messages for easier debugging

Vsevolod Stakhov [Wed, 4 Feb 2026 15:10:00 +0000 (15:10 +0000)]

[Fix] Use rspamd printf format specifiers instead of GNU

Fix format strings in hs_cache_backend.c and re_cache.c to use rspamd's
custom printf specifiers:
- %uz for gsize (unsigned size_t), not %z
- %ud for unsigned int, not %u

The GNU format specifiers cause crashes on some platforms.

Vsevolod Stakhov [Wed, 4 Feb 2026 14:50:35 +0000 (14:50 +0000)]

[Fix] lua_magic: avoid misdetecting HTML with embedded SVG as SVG

Add svg_format_heuristic that checks for HTML markers (<!DOCTYPE html>,
<html>, <head>, <body>, <meta>) before the <svg> tag position. If HTML
markers are present, skip SVG detection and let the text heuristic
properly classify the content as HTML.

Add functional test with HTML containing embedded SVG (should detect as
HTML) and standalone SVG (should still detect as SVG).
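
A rough Python sketch of the heuristic described above; the real check lives in rspamd's Lua code. The function name is hypothetical, and matching the markers as case-insensitive tag prefixes (for example `<html` rather than `<html>`) is an assumption made here.

```python
# Marker list taken from the commit message, matched as lowercase tag prefixes.
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body", "<meta")


def looks_like_standalone_svg(text: str) -> bool:
    """Return True only if <svg> appears with no HTML marker before it."""
    lowered = text.lower()
    svg_pos = lowered.find("<svg")
    if svg_pos == -1:
        return False  # no <svg> tag at all
    prefix = lowered[:svg_pos]
    # Any HTML marker before the <svg> tag means this is HTML with embedded SVG.
    return not any(marker in prefix for marker in HTML_MARKERS)
```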

Vsevolod Stakhov [Wed, 4 Feb 2026 13:48:00 +0000 (13:48 +0000)]

[Fix] Hyperscan cache: use Lua backend for sync loading, load on worker startup

Two issues addressed:

1. Sync loading now uses Lua backend exclusively instead of duplicating
   file loading logic in C. The Lua backend handles files, compression,
   and future backends (redis, http) uniformly.

2. Workers now proactively load hyperscan on startup after Lua backend
   is initialized. This fixes a race condition where workers spawned
   after hs_helper broadcasts HYPERSCAN_LOADED would never receive the
   notification and run without hyperscan acceleration.

Changes:
- Add rspamd_hs_cache_lua_load_sync() and rspamd_hs_cache_lua_exists_sync()
  to call Lua backend's sync methods from C
- Remove duplicated C file loading code from re_cache.c (zstd decompress,
  file path checking) - Lua backend handles this
- rspamd_re_cache_load_hyperscan() now requires Lua backend
- Workers try sync load on startup (best-effort, falls back to PCRE)

Vsevolod Stakhov [Wed, 4 Feb 2026 12:02:06 +0000 (12:02 +0000)]

[Fix] Rework alternative parts detection

Konstantin Zangerle [Tue, 3 Feb 2026 16:28:19 +0000 (17:28 +0100)]

convert tabs to spaces

Vsevolod Stakhov [Tue, 3 Feb 2026 16:11:15 +0000 (16:11 +0000)]

[Fix] R_PARTS_DIFFER: handle multipart/related in alternative

Konstantin Zangerle [Tue, 3 Feb 2026 16:04:55 +0000 (17:04 +0100)]

fix linter errors

Vsevolod Stakhov [Tue, 3 Feb 2026 11:52:18 +0000 (11:52 +0000)]

[Feature] proxy: enable token bucket load balancing by default

Add token_bucket configuration to the default upstream in worker-proxy.inc
with sensible defaults (max_tokens=10000, scale=1024, base_cost=10).

Vsevolod Stakhov [Tue, 3 Feb 2026 11:42:27 +0000 (11:42 +0000)]

[Feature] proxy: implement token bucket load balancing for upstreams

Add weighted load balancing algorithm that considers message size and
current backend load when selecting upstreams. Each upstream has a token
pool that gets depleted proportionally to message size and replenished
when requests complete successfully.

- Add RSPAMD_UPSTREAM_TOKEN_BUCKET rotation type
- Implement min-heap based selection for O(log n) upstream selection
- Reserve tokens proportional to message size (base_cost + size/scale)
- Return tokens on success (restores available) or failure (lost)
- Fall back to least-loaded upstream when all are token-exhausted
- Add UCL configuration: token_bucket { max_tokens, scale, min_tokens, base_cost }
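
The cost formula and heap-based selection listed above could look roughly like this in Python. It is a sketch under stated assumptions, not rspamd's C implementation: the constants are the defaults quoted in the worker-proxy.inc commit, integer division for `size/scale` is assumed, the heap key (tokens currently reserved) is a guess at what "least-loaded" means, and exhaustion handling and token return are left out.

```python
import heapq

SCALE, BASE_COST = 1024, 10  # defaults quoted in the worker-proxy.inc commit


def message_cost(size_bytes: int) -> int:
    """Tokens reserved for one message: base_cost + size/scale (integer division assumed)."""
    return BASE_COST + size_bytes // SCALE


class Pool:
    """Hypothetical upstream pool keyed on currently reserved tokens."""

    def __init__(self, names):
        # Heap entries are (reserved_tokens, name); the least-loaded upstream is on top.
        self.heap = [(0, name) for name in names]
        heapq.heapify(self.heap)

    def select(self, size_bytes: int):
        """Pick the least-loaded upstream and reserve tokens for the message."""
        reserved, name = heapq.heappop(self.heap)  # O(log n)
        cost = message_cost(size_bytes)
        heapq.heappush(self.heap, (reserved + cost, name))
        return name, cost
```

In this model a 10 KiB message costs 20 tokens, so an upstream that has just accepted it sinks below idle upstreams and the next small message goes elsewhere.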

Vsevolod Stakhov [Tue, 3 Feb 2026 10:14:26 +0000 (10:14 +0000)]

[Fix] R_PARTS_DIFFER: also handle parts without words

The previous fix only handled truly empty parts. This also handles
the case where a part has content but no extractable words (e.g.,
6 bytes of whitespace in text/plain vs 142 words in text/html).

Now check if exactly one part has normalized_hashes with words,
regardless of whether parts are marked as empty.
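
The "exactly one part has words" rule might be sketched like this in Python. The function name is hypothetical, and the Jaccard distance used when both parts have words is only a stand-in, not the metric rspamd actually computes.

```python
def parts_distance(plain_words, html_words):
    """Return a distance in [0, 1], or None when there is nothing to compare."""
    has_plain = len(plain_words) > 0
    has_html = len(html_words) > 0
    if has_plain != has_html:
        # Exactly one part has extractable words: treat as 100% different.
        return 1.0
    if not has_plain:
        # Neither part has words, so no comparison is possible.
        return None
    # Both parts have words: placeholder Jaccard distance over word sets.
    a, b = set(plain_words), set(html_words)
    return 1.0 - len(a & b) / len(a | b)
```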

Vsevolod Stakhov [Mon, 2 Feb 2026 14:27:16 +0000 (14:27 +0000)]

[Fix] R_PARTS_DIFFER: handle empty alternative parts

Previously, R_PARTS_DIFFER only triggered when both text/html and
text/plain parts had content. When one part was empty (e.g., empty
HTML with non-empty plain text), the distance was never calculated.

Now detect when exactly one part is empty and treat it as 100%
difference, which will trigger R_PARTS_DIFFER with maximum score.

Konstantin Zangerle [Mon, 2 Feb 2026 13:38:46 +0000 (14:38 +0100)]

fix dynamic bucket-specific rate-limits

Vsevolod Stakhov [Sun, 1 Feb 2026 16:00:05 +0000 (16:00 +0000)]

[Fix] worker_util: add hyperscan handlers to controller

Controller was missing hyperscan/multipattern/regexp_map hot-swap
handlers, causing it to stay on ACISM fallback while normal workers
switched to hyperscan.

Vsevolod Stakhov [Sun, 1 Feb 2026 15:52:56 +0000 (15:52 +0000)]

[Fix] multipattern: fix TLD pattern matching after hyperscan hot-swap

The hyperscan TLD pattern suffix (?:[^a-zA-Z0-9]|$) was consuming the
boundary character, causing match length to be one character too long.
This broke URL detection in url_tld_end() after workers hot-swapped
from ACISM to hyperscan.

Root cause: For input "adobesign.github.io/?u=xxx":
- ACISM pattern ".github.io" -> match length 10, p points to "/"
- Hyperscan pattern "\.github.io(?:[^a-zA-Z0-9]|$)" -> match length 11,
  p points to "?" (one past expected)

url_tld_end() checks if *p == '/' which failed for hyperscan.

Fix: Remove suffix from hyperscan TLD patterns and add boundary checking
in rspamd_multipattern_hs_cb(), mirroring what ACISM callback already does.
This ensures consistent match length between both backends.
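
The off-by-one can be reproduced with Python's `re` module standing in for Hyperscan. This is an illustration only: the second dot is escaped here, and `at_boundary` is a hypothetical helper that mirrors the idea of checking the boundary without consuming it.

```python
import re
import string

ALNUM = set(string.ascii_letters + string.digits)
text = "adobesign.github.io/?u=xxx"

# Old style: the suffix group consumes the boundary character.
with_suffix = re.search(r"\.github\.io(?:[^a-zA-Z0-9]|$)", text)

# New style: match the bare suffix, then check the boundary separately.
bare = re.search(r"\.github\.io", text)


def at_boundary(s: str, end: int) -> bool:
    """True if the match ends at end of input or before a non-alphanumeric."""
    return end == len(s) or s[end] not in ALNUM


print(len(with_suffix.group()), text[with_suffix.end()])  # 11 ?
print(len(bare.group()), text[bare.end()], at_boundary(text, bare.end()))  # 10 / True
```

Checking the boundary in the callback keeps the match end on `/`, which is what a check such as `*p == '/'` expects.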

Vsevolod Stakhov [Sun, 1 Feb 2026 15:44:31 +0000 (15:44 +0000)]

[Fix] Make unknown and broken DKIM keys behaviour conforming to RFC

Vsevolod Stakhov [Sun, 1 Feb 2026 12:31:17 +0000 (12:31 +0000)]

[Feature] headers_checks: add Reply-To address validity checks

Add RFC 5321 compliance checks for Reply-To header:
- REPLYTO_INVALID: address doesn't pass RFC 5321 validation
- REPLYTO_LOCALPART_LONG: local-part exceeds 64 characters
- REPLYTO_DOMAIN_LONG: domain exceeds 255 characters

This helps detect spam with intentionally invalid Reply-To addresses
that users cannot actually reply to.

Closes: #5854
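
A simplified Python sketch of the three length and validity checks named in this commit. The function name is hypothetical, and the validity test only looks for a non-empty local-part and domain around the last `@`; real RFC 5321 validation is considerably stricter.

```python
def replyto_symbols(address: str) -> list:
    """Return the symbols that a Reply-To address would trigger in this model."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        # Stand-in for full RFC 5321 validation.
        return ["REPLYTO_INVALID"]
    symbols = []
    if len(local) > 64:
        symbols.append("REPLYTO_LOCALPART_LONG")
    if len(domain) > 255:
        symbols.append("REPLYTO_DOMAIN_LONG")
    return symbols
```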

Vsevolod Stakhov [Sun, 1 Feb 2026 09:25:29 +0000 (09:25 +0000)]

Merge pull request #5869 from fatalbanana/antivirus_docs

[Minor] antivirus: fix `whitelist` description

Vsevolod Stakhov [Sun, 1 Feb 2026 09:24:48 +0000 (09:24 +0000)]

Merge branch 'master' into antivirus_docs

Vsevolod Stakhov [Sat, 31 Jan 2026 22:17:31 +0000 (22:17 +0000)]

[Fix] re_cache: fix stale hyperscan ID handling during config reload

When hyperscan cache is reloaded, the old hs_ids array may contain indices
that now point to different regexps in cache->re due to regexp reordering
after config reload. This caused two issues:

1. Cleanup used stale hs_ids to reset match_type, potentially resetting
   wrong regexps while leaving the actual ones with match_type=HYPERSCAN
   but hs_scratch=NULL, causing assertion failures.

2. If validation failed mid-loop while setting match_types, some regexps
   would already have match_type=HYPERSCAN before we freed scratch,
   leaving them in an inconsistent state.

Fix:
- Iterate re_class->re hash table (actual regexps in class) during cleanup
  instead of using potentially stale hs_ids
- Split validation and match_type setting into separate loops so we only
  set match_types after ALL IDs are validated

Both file-based and Redis-based loading paths are fixed.
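
The "validate everything first, then mutate" split might be sketched as follows in Python. The data layout (a list of dicts with `name` and `match_type` keys) and the function name are assumptions for illustration; rspamd's C structures differ.

```python
def apply_hyperscan_ids(ids, regexps, class_members) -> bool:
    """Mark regexps as HYPERSCAN only if every stored ID is valid for this class."""
    # Phase 1: validate all IDs before touching any state.
    for i in ids:
        if i < 0 or i >= len(regexps):
            return False
        if regexps[i]["name"] not in class_members:
            return False  # stale ID pointing at a regexp from another class
    # Phase 2: every ID is valid, so it is now safe to mutate.
    for i in ids:
        regexps[i]["match_type"] = "HYPERSCAN"
    return True
```

Because nothing is modified until the whole ID list has passed validation, a failure part-way through cannot leave some regexps flagged as HYPERSCAN without a usable scratch.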

Andrew Lewis [Fri, 30 Jan 2026 17:39:24 +0000 (19:39 +0200)]

[Minor] antivirus: fix `whitelist` description

Vsevolod Stakhov [Fri, 30 Jan 2026 12:52:52 +0000 (12:52 +0000)]

Merge pull request #5866 from rspamd/vstakhov-fuzzy-ratelimit-scripts

Enhance fuzzy blacklist handler with richer context

Vsevolod Stakhov [Thu, 29 Jan 2026 17:57:23 +0000 (17:57 +0000)]

[Fix] cryptobox: properly bypass RHEL/CentOS 10+ crypto-policies for SHA-1 DKIM

RHEL/CentOS 10+ disables SHA-1 signatures via the rh-allow-sha1-signatures
config option, which is not bypassed by simply creating a new OSSL_LIB_CTX.

This fix:
- Creates a temporary OpenSSL config file with rh-allow-sha1-signatures=yes
- Loads it into the dedicated SHA-1 library context via OSSL_LIB_CTX_load_config
- Improves error messages to include algorithm name and RHEL-specific hints
- Captures OpenSSL error details when EVP_PKEY_verify fails
- Adds troubleshooting guidance in error messages

On non-RHEL systems, the config option is simply ignored.

Vsevolod Stakhov [Thu, 29 Jan 2026 16:53:59 +0000 (16:53 +0000)]

[Fix] lua_hs_cache: add defensive checks for zstd_decompress

Add type checking before decompression to catch unexpected data types
from Redis. Wrap zstd_decompress in pcall to gracefully handle any
errors and provide detailed diagnostic logging when failures occur.

Vsevolod Stakhov [Thu, 29 Jan 2026 16:49:20 +0000 (16:49 +0000)]

[Fix] re_cache: stop timer during async operations to prevent re-entry

Stop the timer before calling exists_async and save_async to prevent
the timer from firing multiple times while an async callback is pending.
Without this, the repeating timer (0.1s interval) could fire again before
the async operation completes, causing multiple concurrent async calls
for the same re_class. This led to race conditions and use-after-free
crashes when callbacks completed out of order.

Vsevolod Stakhov [Thu, 29 Jan 2026 16:35:35 +0000 (16:35 +0000)]

[Fix] re_cache: initialize async context to prevent random callback skipping

Use g_malloc0 instead of g_malloc when allocating rspamd_re_cache_async_ctx
to ensure callback_processed flag is initialized to FALSE. Without this,
the flag could randomly be TRUE (from uninitialized memory), causing the
exists/save callbacks to be silently skipped and preventing re_class
compilation from proceeding after cache misses.

Vsevolod Stakhov [Thu, 29 Jan 2026 15:58:41 +0000 (15:58 +0000)]

[Fix] hs_helper: fix use-after-free in Redis async cache callbacks

Remove dangerous ev_run(EVRUN_NOWAIT) calls from inside Redis callback
chains in rspamd_hs_helper_mp_exists_cb and rspamd_hs_helper_remap_exists_cb.

Calling ev_run() inside a callback can trigger Lua GC which may try to
finalize lua_redis userdata while we're still processing the callback,
causing lua_redis_gc to access already-freed memory.

Also add missing REF_RELEASE calls at the end of both callbacks to properly
release the reference from exists_async (matching the exists==true path).

Vsevolod Stakhov [Thu, 29 Jan 2026 15:19:46 +0000 (15:19 +0000)]

Revert "[Fix] lua_redis: add defensive check in GC handler"

This reverts commit e753e063c2102d920b8b2ea15f675de04b1aaa99.

Vsevolod Stakhov [Thu, 29 Jan 2026 15:17:15 +0000 (15:17 +0000)]

[Fix] lua_redis: add defensive check in GC handler

Add validation in lua_redis_gc to check that the context pointer
appears valid before releasing. This prevents crashes when Lua GC
collects stale userdata pointing to already-freed memory.

Vsevolod Stakhov [Thu, 29 Jan 2026 15:16:59 +0000 (15:16 +0000)]

[Fix] hs_helper: fix use-after-free in async hyperscan cache callbacks

Add proper refcounting to async compilation contexts to prevent
use-after-free when Redis callbacks are invoked multiple times
(timeout + response) or during worker termination.

- Add ref_entry_t to async context structures
- Use REF_RETAIN before async operations and REF_RELEASE in callbacks
- Add callback_processed flag to prevent double processing
- Save entry data before ev_run that might free pending arrays

Vsevolod Stakhov [Thu, 29 Jan 2026 13:30:20 +0000 (13:30 +0000)]

Merge pull request #5863 from moisseev/full-hashes

[Feature] WebUI: Add fuzzy hash copy and delist buttons

Vsevolod Stakhov [Thu, 29 Jan 2026 13:26:27 +0000 (13:26 +0000)]

[Feature] fuzzy_storage: enhance blacklist handler with richer context

Pass extended information to Lua blacklist handlers:
- event_type: distinguish "new", "existing", or "blacklist" events
- ratelimit_info: bucket state (level, burst, rate, exceeded_by)
- digest: hash when session context is available
- extensions: domain and source IP from fuzzy extensions

Backwards compatible - existing handlers still receive ip and reason
as first two arguments. New arguments are optional for Lua handlers.

Optimized with early-exit checks when no handlers are registered.

Alexander Moisseev [Thu, 29 Jan 2026 13:19:55 +0000 (16:19 +0300)]

[Minor] WebUI: Deduplicate fuzzy hashes in multipart messages

Vsevolod Stakhov [Thu, 29 Jan 2026 10:52:03 +0000 (10:52 +0000)]

[Fix] cryptobox: remove redundant obj_mac.h include

NID_sha1 is already available through evp.h -> objects.h -> obj_mac.h
include chain, so explicit include is unnecessary.

Vsevolod Stakhov [Thu, 29 Jan 2026 10:41:15 +0000 (10:41 +0000)]

[Fix] cryptobox: bypass RHEL/CentOS 10 crypto-policies for SHA-1 DKIM verification

RHEL/CentOS 10+ crypto-policies disable SHA-1 for signatures by default,
causing rsa-sha1 DKIM verification to fail. This is problematic as many
legitimate emails still use rsa-sha1 DKIM signatures.

For OpenSSL 3.0+, create a dedicated OSSL_LIB_CTX that bypasses system
crypto-policies specifically for SHA-1 DKIM signature verification.
SHA-256/SHA-512 verifications continue using the normal system context.

The legacy context is lazily initialized only when SHA-1 verification
is needed, avoiding overhead for modern rsa-sha256 signatures.

Vsevolod Stakhov [Wed, 28 Jan 2026 18:44:18 +0000 (18:44 +0000)]

[Fix] re_cache: detect and handle stale hyperscan files with mismatched regexp IDs

When the re_class hash doesn't include global regexp indices, regexps can
be reordered while the class hash stays the same. This causes old hyperscan
files to be loaded with stale position IDs that point to wrong regexps
(possibly in different re_classes with no hyperscan loaded), leading to
assertion failures when hs_scratch is NULL.

Fix:
- Reset match_type to PCRE during cleanup before freeing hs_ids, preventing
  stale HYPERSCAN flags if reload fails
- Validate that each stored ID points to a regexp belonging to the current
  re_class before setting match_type
- Delete stale hyperscan files to trigger recompilation by hs_helper
- For Redis cache, stale entries will expire or be overwritten

Both file-based and Redis-based loading paths are fixed.

Vsevolod Stakhov [Wed, 28 Jan 2026 12:51:15 +0000 (12:51 +0000)]

[Fix] Fix EVP_PKEY_CTX memory leak in DKIM RSA signing

The EVP_PKEY_CTX allocated in rspamd_dkim_sign() for RSA key signing
was never freed, causing continuous memory growth when using DKIM/ARC
signing with RSA keys.

Add EVP_PKEY_CTX_free() calls in all error paths and after successful
signing to properly release the OpenSSL context.

Fixes: #5865

Vsevolod Stakhov [Wed, 28 Jan 2026 11:08:49 +0000 (11:08 +0000)]

[Fix] url_suspect: extract TLD from eSLD for suspicious TLD check

The get_tld() function returns eSLD (e.g., "phishing.tk"), not the TLD
suffix. Extract the actual TLD by removing the first label.

Also add suspicious_tlds_map to test config since the override replaces
the default url_suspect configuration.
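
Removing the first label from an eSLD, as this commit describes, can be sketched in Python like this. `tld_from_esld` is a hypothetical name, and the actual fix is in the plugin's Lua code.

```python
def tld_from_esld(esld: str) -> str:
    """Strip the first label, e.g. 'phishing.tk' -> 'tk'."""
    _, sep, rest = esld.partition(".")
    # If there is no dot, the input is already a bare label.
    return rest if sep else esld
```

For a multi-label public suffix such as `example.co.uk`, this yields `co.uk`.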

Vsevolod Stakhov [Wed, 28 Jan 2026 10:49:10 +0000 (10:49 +0000)]

[Fix] hs_helper: defer next multipattern compilation to prevent crash after Redis timeout

When a Redis timeout occurs during HS cache save operation, the error callback
immediately starts processing the next multipattern while still inside the
timeout handler. This leaves the connection in an inconsistent state, causing
a crash (SIGSEGV in hs_compile_multi) when compiling the next multipattern.

Fix by deferring the next operation to the next event loop iteration using
ev_timer with 0 timeout. This ensures the error handling completes fully
before starting the next operation.

Vsevolod Stakhov [Wed, 28 Jan 2026 09:05:25 +0000 (09:05 +0000)]

[Fix] Redis hyperscan cache: use write_servers for store/delete operations

The redis_backend was missing is_write=true in attrs for store and delete
operations, causing lua_redis.request to use read_servers instead of
write_servers. This resulted in READONLY errors when read and write servers
are configured separately.

Vsevolod Stakhov [Wed, 28 Jan 2026 08:56:28 +0000 (08:56 +0000)]

[Feature] Replace builtin_suspicious TLDs with map-based configuration

Convert hardcoded suspicious TLDs list to a proper map file following
rspamd's standard map loading pattern with fallback support.

Changes:
- Add conf/maps.d/suspicious_tlds.inc with default TLDs (.tk, .ml, .ga, .cf, .gq)
- Update url_suspect.conf to use fallback+file:// pattern for user overrides
- Update url_suspect.lua to load TLDs via rspamd_map_add_from_ucl()

Users can now:
- Override entirely: create local.d/maps.d/suspicious_tlds.inc
- Extend defaults: create local.d/maps.d/suspicious_tlds.inc.local
- Disable: set suspicious_tlds_map = null in local.d/url_suspect.conf

Supersedes #5864 - the map-based approach inherently handles nil/missing
config gracefully, making the type check unnecessary.

Alexander Moisseev [Tue, 27 Jan 2026 15:32:31 +0000 (18:32 +0300)]

[Minor] WebUI: Fix fuzzy hash handling edge cases

- Fix hash collision when multiple hashes share same prefix
- Add bounds checking for fuzzy hash array access
- Add error handling for malformed JSON in hash data
- Remove unused parameter from generateFuzzyActions

Alexander Moisseev [Tue, 27 Jan 2026 10:16:16 +0000 (13:16 +0300)]

[Feature] WebUI: Add fuzzy hash copy and delist buttons

Add UI controls for managing fuzzy hashes in History and Scan tables:
- Copy button to copy full hashes to clipboard (newline-separated)
- Delist button to open bl.rspamd.com removal page with hashes
- Buttons are disabled (with tooltips) when hashes are unavailable
- Hashes are searchable via filter input

Vsevolod Stakhov [Mon, 26 Jan 2026 17:48:55 +0000 (17:48 +0000)]

[Fix] Simplify parse_sa_regexp by delegating to rspamd_regexp.create

Remove unnecessary flag parsing and inline modifier transform that
caused issues with invalid PCRE flags (g, u). rspamd_regexp.create
already handles /pattern/flags and m{pattern}flags formats natively.

Closes: #5858

Vsevolod Stakhov [Mon, 26 Jan 2026 16:40:19 +0000 (16:40 +0000)]

[Fix] GPT plugin: explicitly set POST method for API requests

Fixes #5859

Some API providers (like Ollama) strictly require POST method on their
endpoints and return 405 Method Not Allowed for GET requests. While
rspamd_http auto-detects POST when a body is present, explicitly setting
the method ensures correct behavior in all cases.

Vsevolod Stakhov [Mon, 26 Jan 2026 16:30:18 +0000 (16:30 +0000)]

[Fix] Clear pending multipatterns on config reload to prevent use-after-free

After SIGHUP reload, the global pending_compilations queue retained
stale multipattern pointers from the freed old config. When hs_helper
processed the queue, it accessed freed memory causing heap-buffer-overflow
in rspamd_multipattern_get_npatterns().

Add rspamd_multipattern_clear_pending() alongside the existing
rspamd_regexp_map_clear_pending() call before releasing old config.

Alexander Moisseev [Mon, 26 Jan 2026 13:39:24 +0000 (16:39 +0300)]

[Minor] WebUI: Export copyToClipboard for reuse across modules

- Move from local function to ui.copyToClipboard
- Support modal (fixed positioning) and non-modal (absolute) contexts
- Simplify implementation using textarea.remove() and opacity

Vsevolod Stakhov [Mon, 26 Jan 2026 13:47:00 +0000 (13:47 +0000)]

Merge pull request #5860 from rspamd/vstakhov-fuzzy-history

[Feature] Display matched fuzzy hashes in WebUI history

Vsevolod Stakhov [Sat, 24 Jan 2026 16:33:17 +0000 (16:33 +0000)]

[Feature] Store matched fuzzy hashes in Redis history

Add fuzzy_hashes array to history entries by retrieving matched
hashes from task mempool in history_redis plugin.

Vsevolod Stakhov [Fri, 23 Jan 2026 10:46:45 +0000 (10:46 +0000)]

Merge pull request #5831 from dragoangel/patch-15

[Neural] Add option to skip training if store_set_only is true

Vsevolod Stakhov [Fri, 23 Jan 2026 09:46:42 +0000 (09:46 +0000)]

Merge pull request #5855 from dragoangel/patch-16

Fix learn_mode typo in neural.lua

Vsevolod Stakhov [Fri, 23 Jan 2026 09:46:30 +0000 (09:46 +0000)]

Merge pull request #5856 from rspamd/dependabot/pip/contrib/neural-embedding-service/transformers-4.53.0

Bump transformers from 4.40.0 to 4.53.0 in /contrib/neural-embedding-service

Dmitriy Alekseev [Thu, 22 Jan 2026 21:14:42 +0000 (22:14 +0100)]

Fix after merging master

Dmitriy Alekseev [Thu, 22 Jan 2026 21:12:07 +0000 (22:12 +0100)]

Merge branch 'master' into patch-15

Dmitriy Alekseev [Thu, 22 Jan 2026 21:09:47 +0000 (22:09 +0100)]

fix after master merge

Dmitriy Alekseev [Thu, 22 Jan 2026 21:02:27 +0000 (22:02 +0100)]

Merge branch 'master' into patch-16

dependabot[bot] [Thu, 22 Jan 2026 21:01:55 +0000 (21:01 +0000)]

Bump transformers in /contrib/neural-embedding-service

Bumps [transformers](https://github.com/huggingface/transformers) from 4.40.0 to 4.53.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.40.0...v4.53.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.53.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Vsevolod Stakhov [Thu, 22 Jan 2026 21:00:55 +0000 (21:00 +0000)]

Merge pull request #5835 from rspamd/vstakhov-llm-embedding-improvements

Add expression-based autolearn for neural LLM providers

Vsevolod Stakhov [Thu, 22 Jan 2026 19:48:30 +0000 (19:48 +0000)]

[Fix] Use versioned key for hybrid LLM+symbols manual training

Pending key is now only used for LLM-only mode where embedding
dimensions may vary. Hybrid (LLM+symbols) and symbols-only modes
use versioned key directly since dimension includes stable symbols.

Vsevolod Stakhov [Thu, 22 Jan 2026 18:36:26 +0000 (18:36 +0000)]

[Fix] Use versioned key for manual training in symbols-only mode

Manual training via ANN-Train header now writes to versioned key when
no LLM provider is configured. The pending key is only used with LLM
providers where embedding dimensions may vary between versions.

Vsevolod Stakhov [Thu, 22 Jan 2026 18:22:06 +0000 (18:22 +0000)]

Merge branch 'master' into vstakhov-llm-embedding-improvements

Dmitriy Alekseev [Thu, 22 Jan 2026 18:20:49 +0000 (19:20 +0100)]

Merge branch 'master' into patch-15

Dmitriy Alekseev [Thu, 22 Jan 2026 18:19:49 +0000 (19:19 +0100)]

Fix learn_mode typo in neural.lua

Vsevolod Stakhov [Thu, 22 Jan 2026 15:35:09 +0000 (15:35 +0000)]

[Fix] Match fuzzy_check.c hash generation in text_part:get_fuzzy_hashes

Fix text_part:get_fuzzy_hashes() to produce the same hashes as the
fuzzy_check plugin's fuzzy_cmd_from_text_part():

- For short text (<32 words): hash utf_stripped_content directly instead
  of individual words, and optionally include subject
- For normal text: skip words with RSPAMD_WORD_FLAG_SKIPPED flag or
  empty stems

Add optional subject parameter to include in short text hash calculation
(matches fuzzy_check.c behavior with no_subject=false).

Update rspamadm mime stat to pass subject to get_fuzzy_hashes().

Vsevolod Stakhov [Thu, 22 Jan 2026 13:40:20 +0000 (13:40 +0000)]

[Fix] Stop HTTP watchers before error handlers

Vsevolod Stakhov [Thu, 22 Jan 2026 11:51:53 +0000 (11:51 +0000)]

[Feature] Put subject first in LLM embedding input

Subject is highly valuable for spam detection and placing it first
ensures it's always included even if text content gets truncated
by model token limits.
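
The effect of the ordering can be illustrated with a small sketch. It assumes whitespace-separated words as a crude stand-in for model tokens, and the function name and parameters are hypothetical; the actual provider code and tokenizer are not shown in this commit message.

```python
def build_embedding_input(subject: str, body: str, max_tokens: int) -> str:
    """Concatenate subject and body, subject first, then truncate.

    Whitespace tokens approximate model tokens here (an assumption).
    Because truncation cuts from the end, the subject survives as long
    as it fits within the limit on its own.
    """
    tokens = subject.split() + body.split()
    return " ".join(tokens[:max_tokens])
```

With a limit of five tokens, a three-word subject is kept whole and only the first two body words follow it.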

Vsevolod Stakhov [Thu, 22 Jan 2026 11:21:52 +0000 (11:21 +0000)]

[Feature] Rename neural autolearn options to match RBL module naming

Rename check_local/check_authed to exclude_local/exclude_users for
consistency with RBL module conventions. Change exclude_users default
to true (authenticated users excluded by default).

Dmitriy Alekseev [Wed, 21 Jan 2026 15:05:06 +0000 (16:05 +0100)]

Update neural.lua

Vsevolod Stakhov [Wed, 21 Jan 2026 13:39:51 +0000 (13:39 +0000)]

Merge branch 'master' into vstakhov-llm-embedding-improvements

Vsevolod Stakhov [Wed, 21 Jan 2026 13:39:35 +0000 (13:39 +0000)]

Merge pull request #5853 from rspamd/vstakhov-content-urls-rework

[Feature] Include content URLs by default in URL API calls

Vsevolod Stakhov [Wed, 21 Jan 2026 13:16:52 +0000 (13:16 +0000)]

[Test] Set include_content_urls = false for functional tests

Preserve backward compatibility in tests by using the old default
behavior (exclude content URLs).

Vsevolod Stakhov [Wed, 21 Jan 2026 09:57:54 +0000 (09:57 +0000)]

[Feature] Include content URLs by default in URL API calls

- Add `include_content_urls` global option (default: true) to control
  whether URLs extracted from content (PDF, etc.) are included in API calls
- Update task:get_urls(), task:get_emails() to include content URLs by default
- Update lua_util.extract_specific_urls() to use config default when
  need_content is not explicitly specified
- Mark URLs extracted from computed/virtual parts (PDF text) with CONTENT
  flag instead of FROM_TEXT flag, since they may be clickable links
- Add commented documentation in conf/options.inc

Users who want the old behavior can set `include_content_urls = false`
in their options configuration.
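
The fallback to the config default when `need_content` is not given can be sketched like this. It is a Python illustration only; `resolve_need_content` is a hypothetical name, and the real logic lives in lua_util.extract_specific_urls().

```python
from typing import Optional


def resolve_need_content(need_content: Optional[bool],
                         include_content_urls: bool = True) -> bool:
    """Decide whether content URLs (PDF, etc.) are included.

    An explicit caller choice always wins. When the caller passes
    nothing (None), the global `include_content_urls` option decides.
    """
    if need_content is None:
        return include_content_urls
    return need_content
```

A caller that passes `need_content=False` keeps excluding content URLs regardless of the global option, which is why an explicit value is checked first.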

Vsevolod Stakhov [Wed, 21 Jan 2026 08:57:13 +0000 (08:57 +0000)]

[Feature] Add order-independent table digest using XXH3 XOR accumulation

Add rspamd_cryptobox.fast_hash64() C function that returns XXH3 hash as
two 32-bit integers, enabling XOR accumulation for order-independent
hashing in Lua.

Add lua_util.unordered_table_digest() that produces consistent digests
regardless of table iteration order. This fixes issues where different
Rspamd instances produced different ANN digests for identical configs
due to non-deterministic key ordering in pairs().

The original table_digest had two bugs:
- Used pairs() which iterates in undefined order across Lua VMs
- Ignored numeric and boolean values in the hash

Update neural plugin's providers_config_digest to use the new function,
fixing the "providers config changed" warnings on identical configs.

Also update lua_maps and lua_urls_compose cache key generation to use
unordered_table_digest for more reliable cache hits.
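
The XOR accumulation idea can be sketched in Python as follows. This is not the Lua implementation: BLAKE2b truncated to 64 bits stands in for XXH3 (an assumption, chosen because it is in the standard library), and the helper names are hypothetical. Python integers hold 64 bits directly, so the split into two 32-bit halves described above is not needed here.

```python
import hashlib


def _hash64(data: bytes) -> int:
    # Stand-in for XXH3 (assumption): BLAKE2b truncated to 64 bits.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _encode(value) -> bytes:
    # Nested tables are reduced to their own order-independent digest.
    if isinstance(value, dict):
        return b"table:" + unordered_digest(value).to_bytes(8, "big")
    # Tag with the type name so 1, True and "1" all hash differently;
    # numbers and booleans are hashed rather than ignored.
    return f"{type(value).__name__}:{value!r}".encode()


def unordered_digest(table: dict) -> int:
    """Digest that does not depend on iteration order.

    Each key/value pair is hashed on its own and the results are
    combined with XOR, which is commutative and associative.
    """
    acc = 0
    for key, value in table.items():
        acc ^= _hash64(_encode(key) + b"=" + _encode(value))
    return acc
```

Two dictionaries with the same entries inserted in a different order produce the same digest, while changing a numeric or boolean value changes it.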

Vsevolod Stakhov [Wed, 21 Jan 2026 08:19:21 +0000 (08:19 +0000)]

Merge branch 'master' into vstakhov-llm-embedding-improvements

Vsevolod Stakhov [Tue, 20 Jan 2026 21:41:15 +0000 (21:41 +0000)]

[Fix] Clear pending regexp maps on config reload to prevent use-after-free

During HUP-triggered config reload, the pending_regexp_maps array retained
pointers to re_map objects from the old config after they were freed. When
workers received "regexp map loaded" notifications, they accessed freed memory
(visible as 0x5A poison pattern in re_digest), causing SIGSEGV.

Fix by calling rspamd_regexp_map_clear_pending() before releasing the old
config in reread_config().

Vsevolod Stakhov [Tue, 20 Jan 2026 16:54:43 +0000 (16:54 +0000)]

[Fix] Fix race condition between I/O handler and SIGCHLD in subprocess

The subprocess callback could crash when SIGCHLD handler ran concurrently
with the I/O handler processing large training results. The race:

1. I/O handler receives full data, calls callback
2. SIGCHLD fires during callback execution
3. SIGCHLD handler frees cbdata while callback still uses it
4. Callback returns, I/O handler accesses freed memory -> crash

Fix:
- Add 'dead' flag to track when child has exited
- Set 'replied' BEFORE calling callback (not after)
- SIGCHLD handler skips cleanup if replied=TRUE (I/O handler owns it)
- I/O handler does cleanup after callback if dead=TRUE
- Extract cleanup into rspamd_lua_cbdata_free() helper
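
The ownership rule in the fix list can be modelled with a small single-threaded Python sketch. It only mirrors the two flags and the order of operations from the message; the class and function names are hypothetical, and the case where the child is still alive after the callback returns is not described in the message and is left out.

```python
class CbData:
    """Toy model of the subprocess callback data."""

    def __init__(self, callback):
        self.callback = callback
        self.replied = False  # I/O handler has taken ownership
        self.dead = False     # child process has exited
        self.freed = False


def cbdata_free(cbd: CbData) -> None:
    # Single cleanup helper, so a double free is easy to detect.
    assert not cbd.freed, "double free"
    cbd.freed = True


def on_sigchld(cbd: CbData) -> None:
    cbd.dead = True
    # If a reply is in progress, the I/O handler owns the cleanup.
    if not cbd.replied:
        cbdata_free(cbd)


def on_io_complete(cbd: CbData, data: bytes) -> None:
    cbd.replied = True        # set BEFORE the callback, not after
    cbd.callback(data)        # SIGCHLD may fire while this runs
    assert not cbd.freed      # cbdata must still be valid here
    if cbd.dead:
        cbdata_free(cbd)
```

Simulating SIGCHLD from inside the callback, for example `cbd = CbData(lambda data: on_sigchld(cbd))` followed by `on_io_complete(cbd, b"result")`, frees the data exactly once, after the callback has returned.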

Vsevolod Stakhov [Tue, 20 Jan 2026 16:13:21 +0000 (16:13 +0000)]

[Fix] Use rspamd_text for subprocess callback data to avoid large allocations

Replace lua_pushlstring with lua_new_text(FALSE) when passing subprocess
result data to Lua callbacks. This avoids copying potentially large buffers
(e.g., 2.7MB neural network training results) into Lua's heap, which could
cause crashes under memory pressure.

Vsevolod Stakhov [Tue, 20 Jan 2026 16:12:41 +0000 (16:12 +0000)]

[Fix] Fix ROC threshold calculation for ham/spam labels

The ROC calculation was checking outputs[i][1] == 0 for ham samples,
but the ceb_neg cost function uses -1.0 for ham and 1.0 for spam.
Changed to check outputs[i][1] < 0 to correctly identify ham samples.
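
The difference between the two checks is easy to see on toy data. The sketch below is Python, so Lua's `outputs[i][1]` becomes `out[0]`; the function names and sample values are hypothetical and only illustrate the label test, not the full ROC calculation.

```python
def count_ham_old(outputs):
    # Old check: ham expected to be labelled 0, which never matches
    # when ham is labelled -1.0.
    return sum(1 for out in outputs if out[0] == 0)


def split_by_label(outputs, scores):
    """Split scores into ham and spam using the sign of the label.

    Ham is labelled -1.0 and spam 1.0, so `< 0` identifies ham.
    """
    ham, spam = [], []
    for out, score in zip(outputs, scores):
        (ham if out[0] < 0 else spam).append(score)
    return ham, spam
```

For labels `[[-1.0], [1.0], [-1.0]]` the old check finds no ham samples at all, while the sign-based check assigns the first and third scores to ham.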

Mirror of https://github.com/rspamd/rspamd.git