From: Vsevolod Stakhov
Date: Sat, 23 May 2026 10:34:17 +0000 (+0100)
Subject: [Fix] neural: resilient ANN reuse across symbol-list drift
X-Git-Tag: 4.1.0~27
X-Git-Url: http://git.ipfire.org/gitweb/index.cgi?a=commitdiff_plain;h=ed97ec8a7be8649bf87ab73287e6847e787cd5cc;p=thirdparty%2Frspamd.git

[Fix] neural: resilient ANN reuse across symbol-list drift

Two follow-up fixes that complete the "neural keeps working when
symbols change" story started by the disable_symbols_input digest
stability commit. Both motivated by inspecting the actual vbspam Redis
state on sp-collector, which showed multiple coexisting profiles per
rule and an orphaned training set (~100 spam / 15 ham) under a stale
digest.

is_profile_compatible (pure-symbols mode)

The 30% Levenshtein-drift cap rejected the prior profile on every
modest config change (new RBL, multimap addition, SA-style rule loaded
via multimap regexp_rules). When rejected, set.training_profile stayed
nil, inference went dark, and training samples had nowhere to
accumulate until a brand-new ANN trained from scratch -- weeks under
realistic class imbalance. Raise the cap to 50%, with a comment
pointing at the result_to_vector path (it builds vectors from
profile.symbols, NOT set.symbols, so loading the older profile keeps
the trained weights correctly indexed against the features that
produced them).

maybe_carryover_ann (hybrid providers + symbols)

The carryover copied an ANN blob from an old key (trained against
profile.symbols A) into a fresh key whose profile entry carries
set.symbols (current = B). load_new_ann later writes
set.ann.symbols = profile.symbols, so at inference the copied weights
got applied to indices that no longer correspond to the symbols they
were trained on -- silent garbage output. Guard the carryover with
rule.disable_symbols_input: only then does the symbol portion not
contribute to the input vector, and copied weights remain meaningful.
For hybrid mode without disable_symbols_input the existing
is_profile_compatible path already keeps inference alive via the prior
profile entry (whose own symbol list keeps weights aligned), so
skipping carryover is the correct behaviour, not a regression.

Combined with the earlier digest-stability commit, the failure modes
the user kept hitting in production -- disable_symbols_input digest
rotation, pure-symbols cap too tight, hybrid carryover misindexing --
are all addressed.
---

diff --git a/lualib/plugins/neural.lua b/lualib/plugins/neural.lua
index c2362a0a42..a116adc14a 100644
--- a/lualib/plugins/neural.lua
+++ b/lualib/plugins/neural.lua
@@ -763,8 +763,16 @@ local function is_profile_compatible(rule, set, profile_elt, current_providers_d
   if not profile_elt.symbols or not set.symbols then
     return false, math.huge
   end
+  -- Accept profiles whose symbol list still overlaps the current one by at
+  -- least 50% (i.e. Levenshtein drift < 50% of |set.symbols|). The previous
+  -- 30% threshold rejected the old profile on every modest config change
+  -- and inference went completely dark until a new ANN trained from scratch
+  -- (weeks under realistic class imbalance). With this looser cap the worker
+  -- keeps using the old profile's redis_key -- and crucially its OWN symbol
+  -- list, since result_to_vector uses profile.symbols -- so the trained
+  -- weights stay correctly indexed against the features that produced them.
   local dist = lua_util.distance_sorted(profile_elt.symbols, set.symbols)
-  if dist >= #set.symbols * 0.3 then
+  if dist >= #set.symbols * 0.5 then
     return false, dist
   end
   return true, dist
diff --git a/src/plugins/lua/neural.lua b/src/plugins/lua/neural.lua
index c6a00c4fa2..b8c787b5cc 100644
--- a/src/plugins/lua/neural.lua
+++ b/src/plugins/lua/neural.lua
@@ -85,12 +85,19 @@ local function new_ann_profile(task, rule, set, version)
     else
       rspamd_logger.infox(task, 'created new ANN profile for %s:%s, data stored at prefix %s',
           rule.prefix, set.name, profile.redis_key)
-      -- If a prior profile with the same providers_digest holds trained
-      -- weights, carry them over into the fresh profile key. This prevents
-      -- a symcache-driven profile rotation from abandoning a still-valid
-      -- ANN whenever the input vector schema is decided by providers
-      -- (rather than the symbol list).
-      if providers_digest then
+      -- Carry weights from a prior profile (same providers_digest, different
+      -- symbol-list digest) into the fresh profile key ONLY when the input
+      -- vector schema is decided entirely by providers -- i.e. when
+      -- disable_symbols_input is set. In hybrid mode (providers + symbols)
+      -- the symbol portion of the vector reshapes with symbol drift, and
+      -- load_new_ann then sets set.ann.symbols = profile.symbols (= current
+      -- symbol list), so copied weights would be indexed against features
+      -- they were never trained against -- silent garbage at inference.
+      -- For hybrid mode is_profile_compatible already routes inference to
+      -- the prior profile entry, which carries its own (older) symbol list
+      -- and therefore keeps weights correctly aligned at inference time;
+      -- skipping carryover is the right behaviour.
+      if providers_digest and rule.disable_symbols_input then
         maybe_carryover_ann(task, rule, set, ann_key, providers_digest)
       end
     end