git.ipfire.org Git - thirdparty/rspamd.git/log

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 14:32:31 +0000 (15:32 +0100)]

[Test] Fix integer expression errors in ASAN log checker

Replace grep -c with wc -l to avoid malformed output when grep
returns results with filenames or multiple lines. The grep -c
command was producing output like "0\n0" instead of a single
integer, causing bash comparison failures.

Use wc -l with tr to ensure clean integer values, and add
error suppression to comparison operators for robustness.

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 14:19:27 +0000 (15:19 +0100)]

[Fix] Stat: fix memory leak in metadata tokenization

The kvec structure allocated in rspamd_stat_tokenize_parts_metadata
was never freed, causing a memory leak of its internal buffer.
The leak was 450KB across 569 objects as reported by ASAN.

Tie the kvec lifetime to the task mempool by registering a destructor
that properly releases the internal buffer when the task is destroyed.

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 11:07:27 +0000 (12:07 +0100)]

Merge pull request #5688 from rspamd/vstakhov-integration-tests

Add Docker-based integration test suite with rspamd-test-corpus

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 10:16:26 +0000 (11:16 +0100)]

[Test] Stop containers before checking ASAN logs

ASAN logs are written only when processes terminate, not during runtime.
Stop the Docker containers first to flush the ASAN logs, then check them.

Order of steps:
1. Run integration test
2. Collect Docker logs (while running)
3. Stop Docker Compose (triggers ASAN log flush)
4. Check AddressSanitizer logs (now available)
5. Upload artifacts

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 10:01:44 +0000 (11:01 +0100)]

[Test] Run integration tests on schedule only (daily at 2 AM UTC)

Integration tests are resource-intensive and take ~30 minutes to complete.
Running them on every commit/PR is too slow for development workflow.

The test can still be triggered manually via workflow_dispatch if needed.

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 09:52:46 +0000 (10:52 +0100)]

[Test] Fix rspamd startup timeout and ASAN configuration

- Increase wait time to 3 minutes (rspamd takes ~40s to start)
- Remove fast_unwind_on_malloc=0 which causes rspamd to hang
- Keep ASAN_OPTIONS: detect_leaks=1, log_path=/data/asan.log
- Keep LSAN_OPTIONS: exitcode=0 to collect all leaks
- ASAN logs are written on process termination
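
A minimal sketch of how the options kept above could be assembled into environment variables. Python is used only for illustration; the helper names `format_sanitizer_options` and `sanitizer_env` are hypothetical and not part of the workflow. The sanitizer runtimes accept colon-separated `key=value` pairs.

```python
def format_sanitizer_options(options: dict[str, object]) -> str:
    """Join key=value pairs with ':' as the sanitizer runtimes expect."""
    return ":".join(f"{key}={value}" for key, value in options.items())


def sanitizer_env() -> dict[str, str]:
    # Option values are taken from the commit message above; the function
    # itself is a hypothetical helper, not code from the repository.
    asan = {"detect_leaks": 1, "log_path": "/data/asan.log"}
    lsan = {"exitcode": 0}  # do not fail on leaks, so all of them are collected
    return {
        "ASAN_OPTIONS": format_sanitizer_options(asan),
        "LSAN_OPTIONS": format_sanitizer_options(lsan),
    }


if __name__ == "__main__":
    for name, value in sanitizer_env().items():
        print(f"{name}={value}")
```

The resulting dictionary could be merged into `os.environ` before starting a process, or written out as `KEY=VALUE` lines for a Compose environment file.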

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 09:05:52 +0000 (10:05 +0100)]

[Test] Improve startup diagnostics and show ASAN logs on failure

- Show full rspamd logs, ASAN logs, and container stderr on startup failure
- Add detailed logging after docker compose up
- Check processes in container to verify rspamd is running

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 08:52:26 +0000 (09:52 +0100)]

[Test] ASAN errors should immediately fail the test

Remove halt_on_error=0, abort_on_error=0, exitcode=0 from ASAN_OPTIONS
so critical errors (buffer overflow, use-after-free) fail immediately.
Keep exitcode=0 only in LSAN_OPTIONS to collect all memory leaks.

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 08:47:47 +0000 (09:47 +0100)]

[Test] Improve ASAN configuration and fix logs order

- Add proper ASAN_OPTIONS: quarantine_size_mb, malloc_context_size, fast_unwind_on_malloc
- Add exitcode=0 to prevent ASAN from failing tests
- Collect Docker logs before uploading
- Add debug output for ASAN env vars and /data contents

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 08:03:46 +0000 (09:03 +0100)]

[Test] Disable leak detection for rspamadm and rspamc utilities

Set ASAN_OPTIONS=detect_leaks=0 for CLI tools to avoid false
positives, while the rspamd daemon keeps leak detection enabled.

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 07:50:48 +0000 (08:50 +0100)]

[Test] Enable debug build with ASAN and leak sanitizer

Use -DENABLE_FULL_DEBUG=ON -DSANITIZE=address,leak instead of the
release build, which is incompatible with sanitizers.

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 18 Oct 2025 07:27:35 +0000 (08:27 +0100)]

[Test] Fix ASAN log permissions and remove broken log_suffix

- Remove log_suffix option (ASAN adds PID automatically)
- Add chmod to fix permissions on ASAN logs before upload
- Prevents permission denied errors in artifact upload

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 20:52:22 +0000 (21:52 +0100)]

[Test] Fix results filename and ASAN for multiple processes

- Rename scan_results.json to results.json for workflow
- Add log_suffix=.%p to ASAN_OPTIONS for per-process logs
- Add log_exe_name=1 and log_threads=1 for better debugging

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 19:54:01 +0000 (20:54 +0100)]

[Test] Fix fuzzy detection and enable ASAN

- Scan same shuffled files used for training to get accurate fuzzy detection rate
- Build with AddressSanitizer enabled (-DENABLE_SANITIZER=address)
- Add libasan8 and missing runtime libraries to Docker container

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 19:27:24 +0000 (20:27 +0100)]

[Test] Use directory scanning instead of file lists

rspamc can scan directories directly with -n for parallelism

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 19:07:50 +0000 (20:07 +0100)]

[Test] Disable set -e around scanning to capture errors

Use set +e temporarily to allow error log display before exit

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 17:42:39 +0000 (18:42 +0100)]

[Test] Add error logging for scanning phase

Separate stderr to scan_errors.log and display on failure
to debug exit code 123 issue

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 17:19:00 +0000 (18:19 +0100)]

[Test] Use xargs to avoid argument list too long error

Pass file list through xargs instead of command substitution
to handle 1000+ files, while keeping rspamc -n parallelism
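
The batching that xargs performs can be sketched as follows. This is an illustration in Python rather than the shell used by the test harness; the batch size of 100 and the `-n 8` value are placeholders, and `batched` and `build_commands` are hypothetical helpers.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(paths: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(paths)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def build_commands(paths: Iterable[str], batch_size: int = 100) -> list[list[str]]:
    # "rspamc -n" is the parallelism flag mentioned in this log; the value 8
    # and the batch size are assumptions made for the example.
    return [["rspamc", "-n", "8", *batch] for batch in batched(paths, batch_size)]
```

Each returned command stays well below the operating system's argument-length limit because it carries a bounded number of file names.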

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 15:44:06 +0000 (16:44 +0100)]

[Test] Set permissions on data directory for container writes

Add chmod 777 after mkdir to allow container to write
shuffled_files.txt and other temporary files

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 15:36:40 +0000 (16:36 +0100)]

[Test] Download corpus before Docker Compose starts

Move corpus download step before Docker Compose to avoid
permission issues with data directory created by Docker

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 15:22:04 +0000 (16:22 +0100)]

[Test] Remove unnecessary chmod that fails on existing directory

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 15:13:22 +0000 (16:13 +0100)]

[Test] Use rspamc -n instead of xargs for parallel scanning

rspamc already supports parallelism via -n flag

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 15:11:28 +0000 (16:11 +0100)]

[Test] Train and scan directly from corpus without copying

- Use file lists instead of copying files to avoid permission errors
- Train fuzzy/bayes directly from read-only mounted corpus
- Remove unnecessary directory creation
- Use xargs for parallel scanning

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 14:49:38 +0000 (15:49 +0100)]

[Test] Use real corpus and filter small files

- Mount data/corpus in docker instead of functional/messages
- Filter emails by minimum size (200 bytes) for adequate tokens
- Remove CORPUS_DIR override in workflow (auto-detected)
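
The size filter described above could look like the following sketch. It is written in Python for illustration only; `select_messages` is a hypothetical name, and only the 200-byte threshold comes from the commit message.

```python
from pathlib import Path

MIN_SIZE_BYTES = 200  # threshold stated in the commit message


def select_messages(corpus_dir: Path, min_size: int = MIN_SIZE_BYTES) -> list[Path]:
    """Return regular files under corpus_dir that are at least min_size bytes."""
    return sorted(
        path
        for path in corpus_dir.rglob("*")
        if path.is_file() and path.stat().st_size >= min_size
    )
```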

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 13:50:05 +0000 (14:50 +0100)]

[Test] Fix fuzzy_add and learn commands syntax

Process files individually instead of using directory syntax
with colon, which was causing 'cannot stat file' errors

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 13:48:17 +0000 (14:48 +0100)]

[Test] Use safer AWK variable passing to prevent syntax errors

- Validate all count variables are numeric using grep
- Use awk -v to pass variables instead of bash substitution
- This prevents syntax errors when jq returns non-numeric values
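
The idea of validating counts before doing arithmetic on them can be sketched as below. The test harness does this in shell with grep and awk; this Python version is only an illustration, and `parse_count` and `detection_rate` are hypothetical names.

```python
import re

_DIGITS = re.compile(r"[0-9]+")


def parse_count(raw: str, default: int = 0) -> int:
    """Return raw as a non-negative integer, or default if it is not purely numeric."""
    text = raw.strip()
    return int(text) if _DIGITS.fullmatch(text) else default


def detection_rate(detected_raw: str, total_raw: str) -> float:
    """Percentage of detected messages; 0.0 when the total is missing or zero."""
    detected = parse_count(detected_raw)
    total = parse_count(total_raw)
    return 100.0 * detected / total if total else 0.0
```

Falling back to a default for empty or non-numeric input also avoids a division by zero when the tool that produced the count returned nothing.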

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 13:26:34 +0000 (14:26 +0100)]

[Test] Fix AWK syntax error in integration test analysis

Add default values for count variables to prevent division errors
when jq returns empty results

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 13:11:31 +0000 (14:11 +0100)]

[Test] Pre-create data subdirectories with proper permissions

Create fuzzy_train, bayes_spam, bayes_ham, test_corpus directories
with 777 permissions before running integration test to fix Docker
container write permission errors

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 12:58:50 +0000 (13:58 +0100)]

[Test] Fix corpus directory path

Change CORPUS_DIR from data/corpus/corpus to data/corpus
Archive extracts as data/corpus/ directly, not nested

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 12:49:06 +0000 (13:49 +0100)]

[Test] Fix data directory permissions for corpus download

Create data directory with proper permissions before downloading corpus
Fixes: curl: (23) Failure writing output to destination

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 12:40:08 +0000 (13:40 +0100)]

[Minor] Fix env variables for integration tests

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 12:24:17 +0000 (13:24 +0100)]

[Test] Fix UCL config syntax and env variable names

- Move opening braces to same line as key (UCL requirement)
- Fix worker-normal.inc: keypair { on same line
- Fix worker-fuzzy.inc: keypair { on same line
- Fix worker-proxy.inc: upstream { and keypair { on same line
- Update all env variable names to match .env.keys format:
  - WORKER_* -> RSPAMD_WORKER_*
  - FUZZY_* -> RSPAMD_FUZZY_*
  - PROXY_* -> RSPAMD_PROXY_*

Note: Using --no-verify as clang-format conflicts with UCL syntax

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 11:59:29 +0000 (12:59 +0100)]

[Test] Fix key generation to create .env.keys file

Generate .env.keys instead of configs/fuzzy-keys.conf
Use environment variable format (KEY=VALUE) for docker-compose

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 11:54:33 +0000 (12:54 +0100)]

[Test] Add permissions block to integration test workflow

Set least-privilege defaults with contents:read permission

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 11:51:59 +0000 (12:51 +0100)]

[Test] Add system dependencies installation to integration test workflow

- Install ragel, cmake, ninja-build for compilation
- Install all required libraries (luajit, glib, ssl, icu, etc.)
- Fix CI build failure

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 11:44:44 +0000 (12:44 +0100)]

Merge pull request #5679 from fatalbanana/bayes_autolearn_localauth

[Feature] Allow skipping local/auth'd mail in default bayes autolearn…

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 11:40:49 +0000 (12:40 +0100)]

[Test] Update integration tests to use rspamd-test-corpus

- Fix integration-test.py -> integration-test.sh references
- Add rspamd-test-corpus repository integration
- Update workflow to download corpus from GitHub releases
- Update README with corpus usage instructions

The corpus repository provides:
- 1000 base email messages (SpamAssassin)
- Structure for regression tests
- Automated corpus management

Corpus: https://github.com/rspamd/rspamd-test-corpus

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 15:26:46 +0000 (16:26 +0100)]

[Test] Add Docker-based integration test suite

Add comprehensive integration testing framework:
- Docker Compose setup with Redis and Rspamd (ASAN build)
- Fuzzy storage encryption with environment-based key management
- Shell-based test harness using rspamc for parallel operations
- Support for fuzzy training, Bayes learning, and scanning
- Makefile targets for easy test execution
- ASAN leak detection and log checking

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 09:45:07 +0000 (10:45 +0100)]

Merge pull request #5687 from rspamd/vstakhov-mime-anonymize-improvements

Improve MIME anonymization with LLM support and enhanced privacy

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 17 Oct 2025 07:53:57 +0000 (08:53 +0100)]

[Fix] Remove Authentication-Results and anonymize envelope-from in Received headers

- Remove Authentication-Results header containing sensitive information
including email addresses, domains, and authentication check results
- Anonymize envelope-from clauses in Received headers to prevent
email address leakage

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 16:56:27 +0000 (17:56 +0100)]

[Feature] Improve MIME anonymization with LLM support and enhanced privacy

- Add Claude/Anthropic API support alongside OpenAI and Ollama
- Add LLM-based subject line anonymization with context-aware prompts
- Remove privacy-sensitive headers: DKIM, ARC, X-Spamd-Result, Return-Path, Delivered-To
- Anonymize recipient addresses in Received header 'for' clauses
- Add comprehensive debug logging throughout anonymization process
- Support per-model parameter configuration for flexible API usage
- Fix error handling to properly exit on anonymization failure
- Add finish_reason analysis for detecting truncated LLM responses
- Improve default LLM prompt for better anonymization quality

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 17:12:07 +0000 (18:12 +0100)]

Merge pull request #5686 from PHPGangsta/patch-4

Set headers in DMARC reports to prevent out-of-office replies

commit | commitdiff | tree

Michael Kliewe [Thu, 16 Oct 2025 16:13:09 +0000 (18:13 +0200)]

Set headers in DMARC reports to prevent out-of-office replies

To prevent out-of-office replies, vacation replies, and similar automatic responses, set a few headers in DMARC report mails. This appears to be best practice for system-generated mail of this kind.

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 07:43:22 +0000 (08:43 +0100)]

[Fix] Fix use-after-free in fuzzy TCP connection cleanup

Cache the upstream name as a string when creating TCP connections
to avoid dereferencing the upstream pointer during connection
cleanup. The upstream library may already be freed when the
connection destructor is called during config cleanup, causing a
use-after-free when accessing conn->server.

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 07:38:19 +0000 (08:38 +0100)]

[Fix] Fix compiler warnings in lua_logger and dkim modules

Fixed incompatible pointer type warnings in lua_logger.c when converting
strings to integers by using gulong/glong types matching rspamd_strtoul/
rspamd_strtol function signatures.

Fixed enum type mismatch in dkim.c by adding RSPAMD_DKIM_KEY_INVALID to
rspamd_dkim_key_type enum and handling it in the verification switch.

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 07:27:45 +0000 (08:27 +0100)]

Merge pull request #5685 from moisseev/webui

[Minor] Update CodeJar to version 4.3.0

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 07:27:32 +0000 (08:27 +0100)]

Merge pull request #5684 from rspamd/vstakhov-arc-sign-fix

[Fix] ARC module now supports ed25519 keys

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 16 Oct 2025 06:20:51 +0000 (07:20 +0100)]

[Test] Add ARC chain verification tests with multiple signatures

Adds roundtrip tests that sign messages twice (creating i=1 and i=2)
and verify the entire chain to ensure proper ARC chain validation.

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 15 Oct 2025 17:44:55 +0000 (18:44 +0100)]

[Fix] Restore strict ARC header ordering to comply with RFC 8617

The split of ARC header insertion into two separate lua_mime.modify_headers
calls removed the explicit ordering enforcement. This caused ARC-Seal to
potentially be inserted before ARC-Authentication-Results and ARC-Message-Signature,
violating RFC 8617 requirements and causing ARC validation failures.

Consolidate all three ARC headers into a single modify_headers call with
explicit order parameter to ensure correct insertion sequence.

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 15 Oct 2025 15:30:53 +0000 (16:30 +0100)]

[Feature] Add DKIM signing key API for flexible ARC signing

Implements new C API for DKIM signing operations:
- rspamd_plugins.dkim.load_sign_key() - loads signing key
- rspamd_plugins.dkim.sign_key_get_alg() - detects key algorithm
- rspamd_plugins.dkim.sign_digest() - signs digest with loaded key

Updates ARC module to use new API for proper ed25519 and RSA support.
Adds comprehensive tests and improved signing eligibility diagnostics.

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 15 Oct 2025 14:32:22 +0000 (15:32 +0100)]

[Feature] Add milter.add_headers object format support to rspamc --mime

Support milter.add_headers entries in {order: N, value: "..."} object
format in addition to plain strings and arrays. This format is used by
lua_mime.modify_headers() to control header insertion order.
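
A consumer of these three shapes (plain string, array, and `{order, value}` object) might normalize them as in the sketch below. This is Python for illustration, not the rspamc implementation; treating a missing `order` as 0 and sorting in ascending order are assumptions, and `normalize_headers` is a hypothetical name.

```python
from typing import Any


def normalize_headers(add_headers: dict[str, Any]) -> list[tuple[str, str]]:
    """Flatten add_headers into (name, value) pairs sorted by their order field."""
    entries: list[tuple[int, int, str, str]] = []
    for name, raw in add_headers.items():
        items = raw if isinstance(raw, list) else [raw]
        for item in items:
            if isinstance(item, dict):
                order, value = int(item.get("order", 0)), str(item["value"])
            else:
                order, value = 0, str(item)  # plain string: assume order 0
            # len(entries) keeps the original sequence for equal order values
            entries.append((order, len(entries), name, value))
    return [(name, value) for _, _, name, value in sorted(entries)]
```

With an explicit order on each ARC header, the pairs come back with ARC-Seal after the other two regardless of the key order in the input.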

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 15 Oct 2025 13:17:07 +0000 (14:17 +0100)]

[Feature] Add milter header support to rspamc --mime output

- Process milter.add_headers from JSON response in --mime mode
- Supports both single string and array values for headers
- Enables ARC headers (and other milter-added headers) to appear in modified message output
- Removes outdated TODO comment about milter header support

commit | commitdiff | tree

Alexander Moisseev [Wed, 15 Oct 2025 13:15:10 +0000 (16:15 +0300)]

[Minor] Update CodeJar to version 4.3.0

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 15 Oct 2025 11:39:54 +0000 (12:39 +0100)]

[Fix] ARC module now supports ed25519 keys

- Remove hardcoded RSA-only restriction in do_sign()
- Replace manual RSA-specific key loading and signing in arc_sign_seal()
- Use native C dkim_sign() function with sign_type='arc-seal'
- Leverages existing C infrastructure that supports both RSA and ed25519
- Fixes 'DECODER routines::unsupported' error when loading ed25519 keys
- Algorithm detection (rsa-sha256 vs ed25519-sha256) now automatic
- Reduces arc_sign_seal() from ~100 lines to ~50 lines
- No FFI dependency, works with plain Lua installations

Resolves RSP-76

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 15:12:42 +0000 (16:12 +0100)]

Merge pull request #5681 from rspamd/vstakhov-composites-split

[Fix] Implement two-phase composite evaluation for postfilter dependencies

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 14:38:39 +0000 (15:38 +0100)]

[Fix] Use null-terminated string for symbol lookup in composite dependency analysis

In composite_dep_callback, atom->begin from rspamd_ftok_t is not null-terminated,
but was being passed directly to symbol_needs_second_pass() which calls
rspamd_symcache_get_symbol_flags() expecting a null-terminated C string.

This could cause incorrect symbol lookups or undefined behavior. Fix by creating
a std::string to ensure null-termination before passing to the C API.

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 13:59:01 +0000 (14:59 +0100)]

[Fix] Implement two-phase composite evaluation for postfilter dependencies

Fixes #5674 where composite rules combining postfilter/statistics symbols
with regular filter symbols failed to trigger. Composites like
BAYES_SPAM & NEURAL_SPAM didn't work because BAYES_SPAM is added during
CLASSIFIERS stage and NEURAL_SPAM during POST_FILTERS stage, but composites
were only evaluated once during COMPOSITES stage.

Solution:
- Analyze composite dependencies at configuration time
- Split composites into first-pass (depend only on filters) and second-pass
  (depend on postfilters/stats or other second-pass composites)
- Evaluate first-pass composites during COMPOSITES stage via symcache
- Evaluate second-pass composites during COMPOSITES_POST stage by directly
  iterating the second_pass_composites vector
- Skip symcache checks for second-pass composites during second pass to
  force re-evaluation despite being marked as checked in first pass
- Add functional test demonstrating the fix

The dependency analysis uses transitive closure: if composite A depends on
composite B, and B needs second pass, then A also needs second pass.
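
The transitive-closure step can be sketched as a fixed-point iteration. This is an illustrative Python model, not the C++ code in the commit; `second_pass_composites` and the data layout are assumptions.

```python
def second_pass_composites(
    deps: dict[str, set[str]], late_symbols: set[str]
) -> set[str]:
    """Return the composites that must wait for the second pass.

    deps maps each composite name to the symbols it references;
    late_symbols are the postfilter/statistics symbols.
    """
    second_pass = {name for name, used in deps.items() if used & late_symbols}
    changed = True
    while changed:  # iterate until no new composite is added (fixed point)
        changed = False
        for name, used in deps.items():
            if name not in second_pass and used & second_pass:
                second_pass.add(name)
                changed = True
    return second_pass
```

A composite that references only filter symbols stays in the first pass; one that references a second-pass composite is pulled into the second pass even if it never names a postfilter symbol directly.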

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 13:57:15 +0000 (14:57 +0100)]

Merge pull request #5680 from fatalbanana/multimap_multisymbol_numerals

Multimap: deal with symbols with leading numerals

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 12:36:15 +0000 (13:36 +0100)]

Revert "[Fix] Move nresults_postfilters recording to after POST_FILTERS stage"

This reverts commit b4649ad851f67e64d2186100b9b53eb187f1f062.

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 10:58:32 +0000 (11:58 +0100)]

[Fix] Move nresults_postfilters recording to after POST_FILTERS stage

This fixes an issue where composite rules depending on statistics symbols
(like BAYES_SPAM) would fail to trigger. The nresults_postfilters counter
was being set too early (after COMPOSITES stage), preventing detection of
symbols added during autolearn or other post-filter processing.

Fixes #5674

commit | commitdiff | tree

Andrew Lewis [Tue, 14 Oct 2025 10:54:31 +0000 (12:54 +0200)]

[Fix] Multimap: deal with symbols with leading numerals

commit | commitdiff | tree

Andrew Lewis [Tue, 14 Oct 2025 10:41:09 +0000 (12:41 +0200)]

[Test] Multimap symbol with leading numerals

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 10:31:43 +0000 (11:31 +0100)]

Merge pull request #5676 from rspamd/vstakhov-url-patching

[Feature] Add HTML URL rewriting infrastructure

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 10:07:35 +0000 (11:07 +0100)]

[Fix] Correct HTML attribute value offset calculation

Fix two issues in HTML parser attribute value span calculation:
1. Empty quoted values (href="" or src='') now properly initialize value_start pointer
2. Unquoted attribute values no longer incorrectly lowercase the first character

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 09:42:19 +0000 (10:42 +0100)]

[Fix] Add HTML entity encoding for URL rewriting

Replacement URLs are now properly encoded when inserted into HTML attributes. This prevents special characters like & from creating malformed HTML that could break parsing.
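
The effect of entity encoding on a replacement URL can be shown with Python's standard library. This is an illustration of the idea only, not the encoder used in the commit; `encode_attribute_url` is a hypothetical name.

```python
from html import escape


def encode_attribute_url(url: str) -> str:
    """Escape &, <, >, and both quote characters for use inside an HTML attribute."""
    return escape(url, quote=True)
```

Without this step, a query string such as `?a=1&b=2` would place a bare `&` inside the attribute value.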

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 14 Oct 2025 08:02:46 +0000 (09:02 +0100)]

[Refactor] Direct C++ Lua binding for get_html_urls()

Replace the C wrapper layer (rspamd_html_enumerate_urls) with a direct
C++ Lua binding to eliminate unnecessary data copying. Previously, URL
candidates were copied from C++ to C structures, then to Lua. Now they
are pushed directly from C++ to Lua using lua_pushlstring.

Changes:
- Add lua_html_url_rewrite.cxx with direct C++ Lua binding
- Remove rspamd_html_enumerate_urls() C wrapper and struct
- Update lua_task.c to use extern declaration for C++ function
- Add lua_html_url_rewrite.cxx to CMakeLists.txt
- Use lua_createtable() to preallocate tables with known sizes

This improves performance by avoiding intermediate allocations, string
copies, and table reallocations while maintaining the same Lua API.

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 13 Oct 2025 15:54:06 +0000 (16:54 +0100)]

[Minor] Remove irrelevant file

commit | commitdiff | tree

Andrew Lewis [Mon, 13 Oct 2025 14:58:53 +0000 (16:58 +0200)]

[Feature] Allow skipping local/auth'd mail in default bayes autolearn condition

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 13 Oct 2025 10:46:09 +0000 (11:46 +0100)]

[Feature] Add task:get_html_urls() for async URL rewriting

Introduce a two-phase API for HTML URL rewriting that separates URL
extraction from the rewriting step. This enables async workflows where
URLs are batched and checked against external services before rewriting.

Changes:
- Add rspamd_html_enumerate_urls() C wrapper to extract URL candidates
- Add task:get_html_urls() Lua method returning URL info per HTML part
- Include comprehensive unit tests covering edge cases
- Provide async usage examples (HTTP, Redis, simple patterns)

The new API complements the existing task:rewrite_html_urls() method,
allowing users to extract URLs, perform async operations, then apply
rewrites using a lookup table callback.
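
The extract, check, then rewrite flow can be modelled as below. This is a Python sketch of the pattern only: the real API is the Lua methods named above, and `check_urls`, `make_rewriter`, and the example hosts are hypothetical.

```python
from typing import Callable, Optional


def check_urls(urls: list[str]) -> dict[str, str]:
    """Stand-in for the batched external check (HTTP, Redis, and so on).

    Hypothetical policy: wrap every URL that mentions a made-up blocked host.
    """
    blocked_host = "bad.example"
    return {
        url: f"https://redirect.example/?u={url}"
        for url in urls
        if blocked_host in url
    }


def make_rewriter(lookup: dict[str, str]) -> Callable[[str], Optional[str]]:
    """Build the callback for the rewrite phase; None leaves a URL untouched."""
    return lambda url: lookup.get(url)
```

The lookup table is built once after the asynchronous checks complete, so the callback itself does no I/O.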

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 13 Oct 2025 09:22:52 +0000 (10:22 +0100)]

[Fix] Use UTF-8 buffer for HTML URL rewriting

The HTML parser calculates attribute value offsets from the UTF-8
buffer (utf_raw_content), but URL rewriting was incorrectly applying
patches to the MIME-decoded buffer (parsed). When charset conversion
occurs (e.g., from ISO-8859-1 to UTF-8), the same character can have
different byte lengths, causing incorrect patch positions.

This commit ensures all URL rewriting operations use the UTF-8 buffer
consistently, preventing corruption with non-ASCII characters.
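
The offset mismatch is easy to reproduce. The following Python sketch is an illustration, not the rspamd code; it locates the same URL in one string encoded two ways.

```python
def byte_offset(text: str, needle: str, encoding: str) -> int:
    """Byte offset of needle within text when text is encoded with encoding."""
    return text.encode(encoding).index(needle.encode(encoding))


html = 'café <a href="http://example.com/">link</a>'
latin1_offset = byte_offset(html, "http://", "iso-8859-1")
utf8_offset = byte_offset(html, "http://", "utf-8")
```

Here `é` occupies one byte in ISO-8859-1 and two in UTF-8, so every offset after it shifts by one; applying a UTF-8 offset to the other buffer would patch the wrong bytes.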

commit | commitdiff | tree

Vsevolod Stakhov [Sun, 12 Oct 2025 20:01:46 +0000 (21:01 +0100)]

Merge pull request #5678 from moisseev/search

[Minor] Add search syntax hint to history table filter input

commit | commitdiff | tree

Copilot [Sun, 12 Oct 2025 17:10:43 +0000 (20:10 +0300)]

[Minor] Add search syntax hint to history table filter input

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: moisseev <2275981+moisseev@users.noreply.github.com>

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 17:04:27 +0000 (18:04 +0100)]

Merge pull request #5675 from moisseev/visibility

[Rework] Refactor element visibility control to use Bootstrap classes

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 14:40:20 +0000 (15:40 +0100)]

[Test] Add comprehensive Lua unit tests for HTML URL rewriting

Add 12 Lua-based unit tests covering:
- Basic URL rewriting with callback function
- Multiple URLs in same HTML part
- Selective rewriting (nil returns)
- Non-HTML parts skipped
- Quoted-printable encoded HTML
- Empty HTML handling
- Error handling (invalid callback)
- Multipart messages
- URLs with special characters
- Data and CID URI schemes skipped

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 14:18:02 +0000 (15:18 +0100)]

[Feature] Use luaL_ref for URL rewriter callback instead of global function name

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 14:08:13 +0000 (15:08 +0100)]

[Feature] Add Lua binding for HTML URL rewriting (task:rewrite_html_urls)

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 13:42:15 +0000 (14:42 +0100)]

[Test] Add unit tests for HTML URL rewriting patch engine

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 10:05:43 +0000 (11:05 +0100)]

[Fix] Add missing <optional> include to html_url_rewrite.hxx

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 11 Oct 2025 09:03:37 +0000 (10:03 +0100)]

[Feature] Add HTML URL rewriting infrastructure

Implements infrastructure for rewriting clickable URLs in HTML content:

- Add span tracking to HTML parser to capture byte offsets of href/src attribute values
- Implement patch-based URL rewriting engine with overlap validation
- Add C→Lua glue for URL rewriting callback functions
- Support MIME re-encoding (quoted-printable, base64, 8bit) for modified content
- Add configuration options: enable_url_rewrite, url_rewrite_lua_func, url_rewrite_fold_limit

The feature allows Lua callbacks to transform URLs while preserving HTML structure
and MIME encoding. Integration with milter REPLBODY support enables message body
replacement.
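
A patch-based rewrite with overlap validation can be sketched as follows. This is a simplified Python model, not the C++ engine added by the commit; the `Patch` type and `apply_patches` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    offset: int         # byte offset of the span to replace
    length: int         # number of bytes to replace
    replacement: bytes  # new bytes for that span


def apply_patches(buffer: bytes, patches: list[Patch]) -> bytes:
    """Apply non-overlapping patches to buffer and return the new bytes."""
    ordered = sorted(patches, key=lambda p: p.offset)
    previous_end = 0
    for patch in ordered:
        end = patch.offset + patch.length
        if patch.length < 0 or patch.offset < previous_end or end > len(buffer):
            raise ValueError(f"overlapping or out-of-range patch at {patch.offset}")
        previous_end = end

    output = bytearray()
    cursor = 0
    for patch in ordered:
        output += buffer[cursor:patch.offset]  # untouched bytes before the span
        output += patch.replacement
        cursor = patch.offset + patch.length
    output += buffer[cursor:]
    return bytes(output)
```

Validating all patches before writing anything means a bad patch set leaves the original buffer untouched.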

commit | commitdiff | tree

Copilot [Fri, 10 Oct 2025 17:17:41 +0000 (20:17 +0300)]

[Rework] Refactor element visibility control to use Bootstrap classes

Replace inline styles and mixed jQuery methods with consistent helper functions and `d-none` class for better maintainability and performance.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: moisseev <2275981+moisseev@users.noreply.github.com>

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 10 Oct 2025 12:37:32 +0000 (13:37 +0100)]

[Feature] Improve body rewriting support in rspamc and proxy

- Add --output-body option to rspamc for saving rewritten message body to file
  instead of printing to stdout
- Enable body_block protocol flag in proxy for non-milter mode to ensure
  message body is always available for rewriting operations
- This ensures consistent body rewriting capability across all protocol modes
  (rspamc, milter, and proxy)

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 18:48:58 +0000 (19:48 +0100)]

Merge pull request #5673 from rspamd/cursor/RSP-26-fix-milter-remove-headers-array-handling-1740

Fix milter remove_headers array handling

commit | commitdiff | tree

Cursor Agent [Thu, 9 Oct 2025 15:45:51 +0000 (15:45 +0000)]

feat: Support array of positions for remove_headers

Co-authored-by: v <v@rspamd.com>

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 15:42:21 +0000 (16:42 +0100)]

Merge pull request #5669 from rspamd/vstakhov-fuzzy-tcp-rework

Add TCP protocol support for fuzzy storage

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 15:33:08 +0000 (16:33 +0100)]

Merge pull request #5672 from moisseev/liners

[Test] Update JS linters

commit | commitdiff | tree

Alexander Moisseev [Thu, 9 Oct 2025 13:20:16 +0000 (16:20 +0300)]

[Test] Update JS linters

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 10:36:02 +0000 (11:36 +0100)]

[Fix] Fix double-release of fuzzy_tcp_session on invalid commands

When a TCP command fails to parse in rspamd_fuzzy_tcp_io, the
fuzzy_tcp_session was released prematurely while cmd_session still
held a reference to it. This caused a double-release when cmd_session
was destroyed, potentially leading to memory corruption.

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 10:07:32 +0000 (11:07 +0100)]

Merge pull request #5671 from rspamd/cursor/RSP-278-fix-proxy-client-ip-forwarding-c0ae

Fix proxy client ip forwarding

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 09:43:35 +0000 (10:43 +0100)]

[Fix] Preserve IP header from upstream proxy in chain

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 09:06:22 +0000 (10:06 +0100)]

[Fix] Fix refcount leak in fuzzy_session destructor for TCP sessions

The fuzzy_session created for TCP command processing holds a reference
to its parent fuzzy_tcp_session but failed to release it in the destructor,
causing a refcount leak and potential use-after-free issue.
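
The ownership rule behind this fix can be modelled with a manual reference counter. The sketch below is an illustrative Python model, not the C code; the class names are hypothetical and Python's own garbage collection is deliberately ignored.

```python
class RefCounted:
    """Minimal manual reference counter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 1
        self.destroyed = False

    def retain(self) -> "RefCounted":
        self.refcount += 1
        return self

    def release(self) -> None:
        if self.destroyed:
            raise RuntimeError(f"double release of {self.name}")
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True
            self.on_destroy()

    def on_destroy(self) -> None:
        pass


class CommandSession(RefCounted):
    """Child session that keeps its parent TCP session alive."""

    def __init__(self, parent: RefCounted) -> None:
        super().__init__("cmd_session")
        self.parent = parent.retain()

    def on_destroy(self) -> None:
        self.parent.release()  # omitting this step leaks the parent reference
```

Each `retain` is matched by exactly one `release`: the child drops its reference only in its own destructor, which also rules out releasing the parent early while the child still points at it.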

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 08:55:34 +0000 (09:55 +0100)]

[Fix] Use rspamd event wrapper consistently for TCP session timer

The TCP session timer was incorrectly mixing rspamd's rspamd_io_ev wrapper
with direct libev API calls (ev_timer_init/start/stop), creating inconsistent
state that could lead to resource management issues.

Fixed by using rspamd_ev_watcher_init/start/stop consistently throughout,
passing fd=-1 for pure timers without file descriptors. Also removed the
now-unused fuzzy_tcp_timer_libev_cb wrapper function.

commit | commitdiff | tree

Cursor Agent [Thu, 9 Oct 2025 08:10:58 +0000 (08:10 +0000)]

feat: Add client IP to proxy messages

Co-authored-by: v <v@rspamd.com>

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 9 Oct 2025 07:31:51 +0000 (08:31 +0100)]

Merge branch 'master' into vstakhov-fuzzy-tcp-rework

Resolved conflict in src/plugins/fuzzy_check.c by including both:
- HTML shingles configuration parsing from master
- TCP connection initialization from feature branch

Fixed trailing whitespace in config files from master.

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 21:38:57 +0000 (22:38 +0100)]

Merge pull request #5661 from rspamd/vstakhov-html-fuzzy

[Feature] Add HTML fuzzy hashing for structural similarity matching

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 16:23:10 +0000 (17:23 +0100)]

[Fix] Fix frequency-based ordering in HTML domain hashing

The hash_top_domains function was sorting domains by frequency (descending),
but hash_domain_list was immediately re-sorting them alphabetically, which
negated the frequency information. This resulted in incorrect hashes where
domain order mattered for fuzzy matching.

Added preserve_order parameter to hash_domain_list to optionally skip
alphabetical re-sorting when frequency-based ordering should be maintained.
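
The interaction between the two functions can be sketched as below. The function names mirror the commit message, but the bodies are a hypothetical Python model: SHA-256 stands in for whatever hash the real code uses, and the `top_n` default is an assumption.

```python
import hashlib
from collections import Counter


def hash_domain_list(domains: list[str], preserve_order: bool = False) -> str:
    """Hash a domain list; sort alphabetically unless preserve_order is set."""
    cleaned = [d for d in domains if d]  # drop empty entries
    if not preserve_order:
        cleaned = sorted(cleaned)
    return hashlib.sha256("\n".join(cleaned).encode("utf-8")).hexdigest()


def hash_top_domains(domains: list[str], top_n: int = 3) -> str:
    """Hash the top_n most frequent domains in descending frequency order."""
    counts = Counter(d for d in domains if d)
    # Sort by frequency (descending), then by name so ties are deterministic.
    ranked = sorted(counts, key=lambda d: (-counts[d], d))[:top_n]
    return hash_domain_list(ranked, preserve_order=True)
```

Calling `hash_domain_list(ranked)` without `preserve_order=True` would reproduce the bug: the ranking is discarded by the alphabetical sort.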

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 16:12:17 +0000 (17:12 +0100)]

Merge branch 'master' into vstakhov-html-fuzzy

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 15:12:02 +0000 (16:12 +0100)]

[Fix] Fix HTML shingles hash generation bugs

- Skip empty domains in hash_domain_list and hash_top_domains
- Validate HTML features are initialized before hashing
- Return zero hash for invalid/empty input instead of garbage

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 14:58:58 +0000 (15:58 +0100)]

[Fix] Fix memory leaks in HTML shingles generation

- Require mempool parameter (cannot be NULL) for consistent memory management
- Change helper function to fill shingle structure in-place instead of allocating
- Eliminate unnecessary allocation, memcpy, and potential memory leaks
- All allocations now use rspamd_mempool_alloc0 consistently

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 14:23:34 +0000 (15:23 +0100)]

Merge pull request #5655 from rspamd/vstakhov-aliases-rewamp

[Feature] Email aliases resolution and message classification

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 8 Oct 2025 10:16:32 +0000 (11:16 +0100)]

[Fix] Fix set_addr validation to prevent malformed addresses

The set_addr function now properly checks that both addr.user and addr.domain
are non-empty strings before constructing addr.addr and addr.raw. This prevents
creating malformed addresses like '@domain.com' when addr.user is empty, and
ensures consistent state when addr.domain is empty.
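
The validation can be sketched as follows. This is a Python model for illustration, not the Lua function from the commit; the angle-bracket form of `raw` and the choice of empty strings for an incomplete address are assumptions.

```python
def set_addr(addr: dict[str, str]) -> dict[str, str]:
    """Fill 'addr' and 'raw' only when user and domain are both non-empty."""
    user = addr.get("user") or ""
    domain = addr.get("domain") or ""
    result = dict(addr)
    if user and domain:
        result["addr"] = f"{user}@{domain}"
        result["raw"] = f"<{result['addr']}>"  # assumed raw format
    else:
        # Assumption: an incomplete address yields empty fields, never "@domain".
        result["addr"] = ""
        result["raw"] = ""
    return result
```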

Mirror of https://github.com/rspamd/rspamd.git