Repository: thirdparty/rspamd.git (git.ipfire.org)

[Fix] neural: don't strand trained ANNs behind tombstones

A trained ANN could become unreachable for workers even though
training succeeded: NEURAL_SPAM/NEURAL_HAM stopped firing while the
controller logged "ann ... is changed, our version = N, remote
version = M" indefinitely.

The root cause is a version regression, not a missing zset
registration. The new version was seeded from the in-memory set.ann,
and fill_set_ann resets set.ann.version to 0 whenever a worker has
never loaded an ANN (after a restart, or when the selected profile's
blob was missing). A worker that trained from the _4 profile
(version 4) therefore saved version 1. process_existing_ann selects
the highest version among compatible profiles, so the live version-1
blob was shadowed by the stale version-4 zset entry, whose key was
empty. The profile zset has no TTL, so this dead high-version
tombstone was immortal, and the condition perpetuated itself because
the _4 blob was never rewritten.
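The regression can be modelled in a few lines. This is a Python
illustration, not rspamd's Lua code: the Profile type, the helper
functions, the rn_default_abc_<n> key names and the blobs dict (a
stand-in for the Redis blob keys) are all hypothetical, and the "+1"
step is assumed from the version 0 -> version 1 behaviour described
above.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    redis_key: str  # hypothetical blob key; trailing _<n> encodes the version
    version: int


def next_version_old(set_ann_version: int) -> int:
    # Pre-fix seeding: the in-memory set.ann version plus one. fill_set_ann
    # resets that version to 0 when the worker never loaded an ANN.
    return set_ann_version + 1


def select_highest(profiles: list[Profile]) -> Profile:
    # Pre-fix selection: the highest version wins, live blob or not.
    return max(profiles, key=lambda p: p.version)


blobs: dict[str, bytes] = {}                    # stand-in for Redis blob keys
profiles = [Profile("rn_default_abc_4", 4)]     # zset entry whose blob is gone

v = next_version_old(0)                         # restarted worker: version reset to 0
profiles.append(Profile(f"rn_default_abc_{v}", v))
blobs[f"rn_default_abc_{v}"] = b"trained"       # training succeeded and saved a blob

chosen = select_highest(profiles)
print(v, chosen.redis_key, chosen.redis_key in blobs)
# prints: 1 rn_default_abc_4 False
```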

Three fixes:

1. Version monotonicity (lualib/plugins/neural.lua): seed the new
   version from the version of the profile actually trained from
   (encoded as the trailing _<n> of the trained-from key), taking the
   maximum with training_profile and set.ann, so the new entry always
   outranks the profile it supersedes.

2. Liveness-aware selection (src/plugins/lua/neural.lua,
   neural_maybe_invalidate.lua): when the selected profile's blob is
   missing, fall back to the next compatible profile with a live blob
   instead of going dark, and emit a throttled warning (previously a
   silent debug line). The invalidate script also garbage-collects
   profile entries that have no blob, have no training data, and are
   older than a grace window.

3. Lifetime coupling (neural_save_unlock.lua,
   src/plugins/lua/neural.lua): give the profile zset a TTL that is
   refreshed on each check_anns cycle, and refresh the blob TTL on
   every reload, so an actively used ANN never expires out from under
   its zset entry.
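The first two fixes can be sketched against the same kind of
hypothetical model. Again this is Python standing in for the Lua
changes: the Profile fields, helper names and key names are
illustrative; profile compatibility checks, the throttled warning and
the TTL refreshes of fix 3 are omitted; the zset score is assumed to
be a creation timestamp; and has_training is an assumed stand-in for
whatever marks a profile as still holding training data.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    redis_key: str  # hypothetical blob key; trailing _<n> encodes the version
    version: int
    created: float  # zset score, assumed here to be a creation timestamp


def version_from_key(key: str) -> int:
    # Parse the trailing _<n>; a key without one is treated as version 0.
    m = re.search(r"_(\d+)$", key)
    return int(m.group(1)) if m else 0


def next_version(trained_from_key: str, *other_versions: int) -> int:
    # Fix 1: seed from the version encoded in the trained-from key, taking
    # the max with other known versions (e.g. training_profile, set.ann).
    return max([version_from_key(trained_from_key), *other_versions]) + 1


def select_live(profiles: list[Profile], blobs: dict[str, bytes]) -> Optional[Profile]:
    # Fix 2: walk profiles from highest to lowest version and return the
    # first one whose blob still exists, instead of going dark.
    for p in sorted(profiles, key=lambda p: p.version, reverse=True):
        if p.redis_key in blobs:
            return p
    return None  # nothing live; the real code would log a throttled warning


def gc_dead(profiles, blobs, has_training, now, grace):
    # Invalidate-side GC: keep entries that have a blob, still have
    # training data, or are younger than the grace window.
    return [p for p in profiles
            if p.redis_key in blobs or p.redis_key in has_training
            or now - p.created < grace]


# Already-broken state: a dead version-4 entry shadows a live version-1 ANN.
blobs = {"rn_default_abc_1": b"trained"}
profiles = [Profile("rn_default_abc_4", 4, 0.0),
            Profile("rn_default_abc_1", 1, 900.0)]

live = select_live(profiles, blobs)             # falls back to the _1 blob
v = next_version("rn_default_abc_4", 0)         # set.ann.version was reset to 0
kept = gc_dead(profiles, blobs, set(), now=1000.0, grace=300.0)
print(live.redis_key, v, [p.redis_key for p in kept])
# prints: rn_default_abc_1 5 ['rn_default_abc_1']
```

Retraining from the _4 profile now yields version 5, which outranks
the tombstone, while the fallback recovers inference even before that
retraining happens.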

Adds 330_neural/005_stale_version.robot, which injects a
higher-version tombstone and asserts that inference recovers.

author	Vsevolod Stakhov <vsevolod@rspamd.com>
	Tue, 9 Jun 2026 11:43:40 +0000 (12:43 +0100)
committer	Vsevolod Stakhov <vsevolod@rspamd.com>
	Tue, 9 Jun 2026 11:44:01 +0000 (12:44 +0100)
commit	6e6f13c428f95057dbf11cf64f8e0daab057f331
tree	8ed3e7a081a07e1d35df8b3079e91e97a92b76e8
parent	9701f6831e711f3dfda1683d0dc944b119c4900d

lualib/plugins/neural.lua
lualib/redis_scripts/neural_maybe_invalidate.lua
lualib/redis_scripts/neural_save_unlock.lua
src/plugins/lua/neural.lua
test/functional/cases/330_neural/005_stale_version.robot	[new file with mode: 0644]