git.ipfire.org Git - thirdparty/rspamd.git/log

Neural module rework: provider-based feature fusion, LLM embeddings, normalization, and v3 schema

This PR evolves the neural module from a symbols-only scorer into a general feature-fusion classifier with pluggable providers. It adds an LLM embedding provider, introduces trained normalization and metadata persistence, and isolates new models via a schema/prefix bump.

- The existing neural module is limited to metatokens and symbols.
- We want to combine multiple feature sources (LLM embeddings now; Bayes/FastText later).
- Ensure consistent train/infer behavior with stored normalization and provider metadata.
- Improve operability with caching, digest checks, and safer rollouts.

- Provider architecture
  - Provider registry and fusion: `collect_features(task, rule)` concatenates provider vectors with optional weights.
  - New LLM provider: `lualib/plugins/neural/providers/llm.lua` using `rspamd_http` and `lua_cache` for Redis-backed embedding caching.
  - Symbols provider extracted: `lualib/plugins/neural/providers/symbols.lua`.
- Normalization and PCA
  - Configurable fusion normalization: none/unit/zscore.
  - Trained normalization stats computed during training and applied at inference.
  - Existing global PCA preserved; loaded/saved alongside ANN.
- Schema and compatibility
  - `plugin_ver` bumped to '3' to isolate from earlier profiles.
  - Redis save/load extended:
    - Profiles include `providers_digest`.
    - ANN hash can include `providers_meta`, `norm_stats`, `pca`, `roc_thresholds`, `ann`.
  - ANN load validates provider digest and skips apply on mismatch.
- Performance and reliability
  - LLM embeddings cached in Redis (content+model keyed).
  - Graceful fallback to symbols if providers not configured or fail.
  - Basic provider configuration validation.

- `lualib/plugins/neural.lua`: provider registry, fusion, normalization helpers, profile digests, training pipeline updates.
- `src/plugins/lua/neural.lua`: integrates fusion into inference/learning, loads new metadata, applies normalization, validates digest.
- `lualib/plugins/neural/providers/llm.lua`: LLM embeddings with Redis cache.
- `lualib/plugins/neural/providers/symbols.lua`: legacy symbols provider wrapper.
- `lualib/redis_scripts/neural_save_unlock.lua`: stores `providers_meta` and `norm_stats` in ANN hash.
- `NEURAL_REWORK_PLAN.md`: design and phased TODO.

- Enable LLM alongside symbols:
```ucl
neural {
  rules {
    default {
      providers = [
        { type = "symbols"; weight = 0.5; },
        { type = "llm"; model = "text-embed-1"; url = "https://api.openai.com/v1/embeddings";
          cache_ttl = 86400; weight = 1.0; }
      ];
      fusion { normalization = "zscore"; }
      roc_enabled = true;
      max_inputs = 256; # optional PCA
    }
  }
}
```
- LLM provider uses `gpt` block for defaults if present (e.g., API key). You can override `model`, `url`, `timeout`, and cache parameters per provider entry.

- Existing (v2) neural profiles remain unaffected (new `plugin_ver = '3'` prefixes).
- New profiles embed `providers_digest`; incompatible provider sets won’t be applied.
- No immediate cleanup required; TTL-based cleanup keeps old keys around until expiry.

- Validated: provider digest checks, ANN load/save roundtrip, normalization application at inference, LLM caching paths, symbols fallback.
- Please test with/without LLM provider and with `fusion.normalization = none|unit|zscore`.

- LLM latency/cost is mitigated by Redis caching; timeouts are configurable per provider.
- Privacy: use trusted endpoints; no content leaves unless configured.
- Failure behavior: missing/failed providers degrade to others; training/inference can proceed with partial features.

- Rules without `providers` continue to use symbols-only behavior.
- Existing command surface unchanged; future PR will introduce `rspamc learn_neural:*` and controller endpoints.

- [x] Provider registry and fusion
- [x] LLM provider with Redis caching
- [x] Symbols provider split
- [x] Normalization (unit/zscore) with trained stats
- [x] Redis schema v3 additions and profile digest
- [x] Inference uses trained normalization
- [x] Basic provider validation and fallbacks
- [x] Plan document
- [ ] Per-provider budgets/metrics and circuit breaker for LLM
- [ ] Expand providers: Bayes and FastText/subword vectors
- [ ] Per-provider PCA and learned fusion
- [ ] New CLI (`rspamc learn_neural`) and status/invalidate endpoints
- [ ] Documentation expansion under `docs/modules/neural.md`

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 16 Aug 2025 14:23:49 +0000 (20:23 +0600)]

Merge pull request #5572 from hunter-nl/master

Update GPT plugin to support OpenAI GPT-5 and other newer models

commit | commitdiff | tree

René Draaisma [Sat, 16 Aug 2025 08:56:15 +0000 (10:56 +0200)]

Merge branch 'master' of https://github.com/hunter-nl/rspamd

commit | commitdiff | tree

René Draaisma [Sat, 16 Aug 2025 08:55:40 +0000 (10:55 +0200)]

Updated gpt.lua to set default gpt-5-mini as model, fix issue when GPT max_completion_tokens exceeded and returned empty reason field, Set default group to GPT for Symbols, group is now also configurable in settings with extra_symbols, fix issue when no score is defined in settings at extra_symbols, default score is now 0

commit | commitdiff | tree

hunter-nl [Fri, 15 Aug 2025 15:55:43 +0000 (17:55 +0200)]

Merge branch 'master' into master

commit | commitdiff | tree

René Draaisma [Fri, 15 Aug 2025 15:55:03 +0000 (17:55 +0200)]

Updated gpt.lua to provide model parameters with the settings

commit | commitdiff | tree

hunter-nl [Fri, 15 Aug 2025 14:21:36 +0000 (16:21 +0200)]

Update gpt.lua to make use of lua_util.deepcopy function.

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 15 Aug 2025 09:11:18 +0000 (10:11 +0100)]

[Fix] Bayes: Try to be bug-to-bug compatible

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 15 Aug 2025 08:25:57 +0000 (14:25 +0600)]

Merge pull request #5574 from moisseev/e2e-playwright

[Test] Display browser version in HTML report and console

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 15 Aug 2025 08:25:46 +0000 (14:25 +0600)]

Merge pull request #5570 from fatalbanana/be_kind_to_comcast

[Minor] Drop overzealous regex from hfilter

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 15 Aug 2025 08:25:40 +0000 (14:25 +0600)]

Merge branch 'master' into be_kind_to_comcast

commit | commitdiff | tree

Alexander Moisseev [Fri, 15 Aug 2025 06:43:51 +0000 (09:43 +0300)]

[Minor] Enable multimap module in WebUI E2E workflow

commit | commitdiff | tree

Alexander Moisseev [Thu, 14 Aug 2025 15:10:35 +0000 (18:10 +0300)]

[Test] Display browser version in HTML report and console

commit | commitdiff | tree

hunter-nl [Thu, 14 Aug 2025 14:51:24 +0000 (16:51 +0200)]

Update gpt.lua to fix spaces on empty lines

To fix luacheck "line contains only whitespace"

commit | commitdiff | tree

hunter-nl [Thu, 14 Aug 2025 14:03:43 +0000 (16:03 +0200)]

Update gpt.lua to get fresh body for each model iteration

commit | commitdiff | tree

hunter-nl [Thu, 14 Aug 2025 13:03:42 +0000 (15:03 +0200)]

Update gpt.lua to remove unneeded body_base.model

Not needed in the body_base structure.

commit | commitdiff | tree

hunter-nl [Thu, 14 Aug 2025 12:57:04 +0000 (14:57 +0200)]

Update gpt.lua to handle OpenAI parallel old and new models

When in rspamd_config is specified multiple models (old/new), this is handled now correctly to set the required attributes for each model request.

commit | commitdiff | tree

hunter-nl [Thu, 14 Aug 2025 12:24:36 +0000 (14:24 +0200)]

Update gpt.lua to support newer models without temperature attribute

Newer models do not support temperature attribute anymore.

commit | commitdiff | tree

hunter-nl [Thu, 14 Aug 2025 10:36:16 +0000 (12:36 +0200)]

Update gpt.lua to support newer models with max_completion_tokens

Newer and reasoning models requires max_completion_tokens instead of max_tokens attribute.

commit | commitdiff | tree

Andrew Lewis [Wed, 13 Aug 2025 10:16:07 +0000 (12:16 +0200)]

[Minor] Drop overzealous regex from hfilter

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 12 Aug 2025 19:06:53 +0000 (01:06 +0600)]

Merge pull request #5569 from moisseev/e2e-playwright

[Test] Add WebUI E2E workflow with Playwright

commit | commitdiff | tree

Alexander Moisseev [Tue, 12 Aug 2025 16:09:07 +0000 (19:09 +0300)]

[Test] Add WebUI E2E workflow with Playwright

Add a GitHub Actions workflow to run WebUI E2E tests
with Playwright on legacy and latest browser versions
against rspamd binaries built in the pipeline.

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 12 Aug 2025 15:38:37 +0000 (16:38 +0100)]

[Minor] Add specific calculations for binary classification case

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 12 Aug 2025 13:09:25 +0000 (19:09 +0600)]

Merge pull request #5566 from fatalbanana/el10rpm

[Minor] Build on EL10

commit | commitdiff | tree

Andrew Lewis [Tue, 12 Aug 2025 11:24:49 +0000 (13:24 +0200)]

[Minor] Use clang for build on EL10

commit | commitdiff | tree

Andrew Lewis [Mon, 11 Aug 2025 12:53:41 +0000 (14:53 +0200)]

[Minor] Use embedded vectorscan on EL10

commit | commitdiff | tree

Andrew Lewis [Mon, 11 Aug 2025 12:53:25 +0000 (14:53 +0200)]

[Minor] Fix implicit declaration

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 9 Aug 2025 09:42:41 +0000 (10:42 +0100)]

[Fix] Try to fix learned order

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 7 Aug 2025 11:06:22 +0000 (17:06 +0600)]

Merge pull request #5562 from rspamd/vstakhov-proxy-compression

[Fix] Fix end-to-end proxy compression

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 6 Aug 2025 16:06:48 +0000 (17:06 +0100)]

[Fix] Fix double free in the client...

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 6 Aug 2025 15:58:58 +0000 (16:58 +0100)]

[Minor] Fix 'Compression' header logic

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 5 Aug 2025 13:10:11 +0000 (14:10 +0100)]

[Minor] Some more logic fixes

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 4 Aug 2025 19:46:33 +0000 (20:46 +0100)]

[Minor] More cleanups for compression stuff

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 4 Aug 2025 14:40:24 +0000 (15:40 +0100)]

[Fix] Fix end-to-end proxy compression

Issue: #5561

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 4 Aug 2025 11:58:44 +0000 (17:58 +0600)]

Merge pull request #5547 from rspamd/vstakhov-multi-class-bayes

[Project] Multi-class classification

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 1 Aug 2025 15:50:33 +0000 (21:50 +0600)]

Merge pull request #5559 from rspamd/vstakhov-arc-fixes

[Fix] Fix whitelist options in the arc module

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 1 Aug 2025 15:49:34 +0000 (21:49 +0600)]

Merge pull request #5556 from rspamd/vstakhov-skip-hashes-fuzzy

[Fix] Check skip_hashes for the returned hashes

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 1 Aug 2025 11:41:57 +0000 (12:41 +0100)]

[Fix] Fix whitelist options in the arc module

Issue: #5558

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 31 Jul 2025 15:51:46 +0000 (16:51 +0100)]

[Fix] Check skip_hashes for the returned hashes

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 31 Jul 2025 11:03:41 +0000 (17:03 +0600)]

Merge pull request #5555 from heptalium/meissner-fix-dmarc-reports

Use Redis write servers for write commands while generating DMARC reports

commit | commitdiff | tree

Jens Meißner [Wed, 30 Jul 2025 13:53:33 +0000 (15:53 +0200)]

Use Redis write servers for write commands while generating DMARC reports.

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 30 Jul 2025 09:01:10 +0000 (10:01 +0100)]

[Fix] Use a more straightforward approach for learn cache

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 29 Jul 2025 19:01:36 +0000 (20:01 +0100)]

[Fix] Fix various corner cases and tests

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 29 Jul 2025 13:17:00 +0000 (14:17 +0100)]

[Test] Add logic to match test id and logs id

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 29 Jul 2025 09:06:35 +0000 (10:06 +0100)]

[Minor] Add --log-tag option for rspamc

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 29 Jul 2025 08:16:42 +0000 (09:16 +0100)]

[Minor] Fix single class fallback

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 28 Jul 2025 18:22:56 +0000 (19:22 +0100)]

[Project] Apply changes to bayes_expiry plugin

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 28 Jul 2025 18:07:35 +0000 (19:07 +0100)]

[Test] Some more adjustments to the tests

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 28 Jul 2025 14:06:06 +0000 (15:06 +0100)]

[Project] Fix unlearn stuff

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 28 Jul 2025 09:41:33 +0000 (10:41 +0100)]

[Project] Fix more calculation issues

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 28 Jul 2025 08:53:43 +0000 (09:53 +0100)]

[Feature] Add some convenience options to rspamc

commit | commitdiff | tree

Vsevolod Stakhov [Sun, 27 Jul 2025 15:04:29 +0000 (16:04 +0100)]

[Minor] Further adjustments

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 26 Jul 2025 11:28:07 +0000 (12:28 +0100)]

[Fix] Fix statfiles ordering

commit | commitdiff | tree

Vsevolod Stakhov [Sat, 26 Jul 2025 11:27:50 +0000 (12:27 +0100)]

[Fix] Fix probabilities overflow

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 25 Jul 2025 14:40:06 +0000 (15:40 +0100)]

[Minor] Fix stupid change to call Redis for each class

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 25 Jul 2025 10:58:36 +0000 (11:58 +0100)]

[Project] Fix various issues

commit | commitdiff | tree

Vsevolod Stakhov [Fri, 25 Jul 2025 07:48:48 +0000 (08:48 +0100)]

[Test] Add multiclass tests

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 23 Jul 2025 17:28:46 +0000 (18:28 +0100)]

[Minor] Reduce debug level verbosity

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 23 Jul 2025 17:06:34 +0000 (18:06 +0100)]

[Doc] Add architecture document

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 23 Jul 2025 16:38:10 +0000 (22:38 +0600)]

Merge pull request #5550 from moisseev/webui

[WebUI] Add history load range controls

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 23 Jul 2025 14:17:25 +0000 (15:17 +0100)]

[Minor] Use modern bayes configuration

commit | commitdiff | tree

Vsevolod Stakhov [Wed, 23 Jul 2025 13:33:04 +0000 (14:33 +0100)]

[Project] Fix binary classification and lua scripts

commit | commitdiff | tree

Vsevolod Stakhov [Tue, 22 Jul 2025 21:30:03 +0000 (22:30 +0100)]

[Project] Further updates

commit | commitdiff | tree

Alexander Moisseev [Tue, 22 Jul 2025 16:25:05 +0000 (19:25 +0300)]

[WebUI] Add history load range controls

- Add 'Offset' and 'Count' inputs to History tab for selecting
the range of history rows to load
- Add corresponding default setting in Settings tab

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 21 Jul 2025 09:55:59 +0000 (10:55 +0100)]

[Project] Fix other classification and learning issues

commit | commitdiff | tree

Vsevolod Stakhov [Mon, 21 Jul 2025 09:07:27 +0000 (10:07 +0100)]

[Minor] Fix various issues

commit | commitdiff | tree

Vsevolod Stakhov [Sun, 20 Jul 2025 16:11:52 +0000 (17:11 +0100)]

[Project] Multi-class classification project baseline

commit | commitdiff | tree

Vsevolod Stakhov [Thu, 17 Jul 2025 11:09:06 +0000 (17:09 +0600)]

Merge pull request #5545 from rspamd/vstakhov-html-tags-overload

[Project] Rework system of html tags to allow more tag types