From: Vsevolod Stakhov
Date: Thu, 4 Jun 2026 18:06:48 +0000 (+0100)
Subject: [Fix] url: keep query nesting cap a fixed functional limit
X-Git-Tag: 4.1.0~3
X-Git-Url: http://git.ipfire.org/gitweb/index.cgi?a=commitdiff_plain;h=9277f19ef3468e0fe7b3608ea65578d525985667;p=thirdparty%2Frspamd.git

[Fix] url: keep query nesting cap a fixed functional limit

f068a1156 derived RSPAMD_URL_QUERY_MAX_NESTING from the multipattern
scratch budget (MAX_REENTRANCY - 2), which silently bumped the
redirect/wrapper unwrap depth from 5 to 8 and broke the get_html_urls
unit test that pins the cap at 5.

The nesting depth is a functional/product decision, not a function of
the scratch pool size. Restore the fixed cap of 5 and instead assert at
compile time that it stays within the scratch budget (plus the enclosing
scan and the leaf TLD lookup); rspamd_multipattern_lookup() still
degrades gracefully if the bound is ever exceeded.
---

diff --git a/src/libserver/url.c b/src/libserver/url.c
index dfa3c68798..199e0e7374 100644
--- a/src/libserver/url.c
+++ b/src/libserver/url.c
@@ -28,6 +28,15 @@
 #include
 #include
 
+/*
+ * Following a query-embedded URL re-enters the URL multipattern scan; the
+ * deepest chain holds RSPAMD_URL_QUERY_MAX_NESTING scratch contexts plus the
+ * enclosing scan and the leaf TLD lookup. Keep that on the fast static-scratch
+ * path of the multipattern matcher (a deeper run still works via the graceful
+ * fallback in rspamd_multipattern_lookup, just without a cached scratch).
+ */
+G_STATIC_ASSERT(RSPAMD_URL_QUERY_MAX_NESTING + 2 <= RSPAMD_MULTIPATTERN_MAX_REENTRANCY);
+
 /* Lua URL filter consultation return values */
 enum rspamd_url_lua_filter_result {
 	RSPAMD_URL_LUA_FILTER_ACCEPT = 0, /* Continue parsing normally */
diff --git a/src/libserver/url.h b/src/libserver/url.h
index 7a67547cb4..e6241c4f4f 100644
--- a/src/libserver/url.h
+++ b/src/libserver/url.h
@@ -279,15 +279,17 @@ void rspamd_url_find_single(rspamd_mempool_t *pool,
  * How deep to follow URLs nested inside the query of an already query-extracted
  * URL (a properly escaped wrapper carries one target per encoding layer).
  *
+ * This is a functional limit on how far redirect/wrapper chains are unwrapped.
  * Each level re-enters the URL multipattern scan while the enclosing scan is
- * still on the stack. The peak number of simultaneously-held scratch contexts
- * on the deepest chain is therefore this depth plus two: one for the enclosing
- * text/subject scan, and one for the per-URL TLD lookup that rspamd_url_parse
- * runs on the freshly extracted leaf URL. Keep that within the multipattern
- * scratch budget (RSPAMD_MULTIPATTERN_MAX_REENTRANCY) so normal nesting stays
- * on the fast static-scratch path.
- */
-#define RSPAMD_URL_QUERY_MAX_NESTING (RSPAMD_MULTIPATTERN_MAX_REENTRANCY - 2)
+ * still on the stack, so the deepest chain holds this depth plus two scratch
+ * contexts (the enclosing text/subject scan, and the per-URL TLD lookup that
+ * rspamd_url_parse runs on the freshly extracted leaf). It must therefore stay
+ * comfortably below the multipattern scratch budget
+ * (RSPAMD_MULTIPATTERN_MAX_REENTRANCY); the static assert in url.c enforces
+ * that, and rspamd_multipattern_lookup() degrades gracefully if it is ever
+ * exceeded.
+ */
+#define RSPAMD_URL_QUERY_MAX_NESTING 5
 
 /**
  * Find URLs embedded in the query parameters of `url`. Unlike
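
Note on the bound: the relationship asserted in url.c can be sketched in isolation as below. This is only an illustration, not rspamd code. All SKETCH_* macros and sketch_* functions are hypothetical stand-ins; the budget value of 10 is inferred from the commit message (MAX_REENTRANCY - 2 produced a depth of 8), and C11 _Static_assert is used in place of GLib's G_STATIC_ASSERT so the fragment builds without GLib.

```c
#include <assert.h>

/* Hypothetical stand-ins for the rspamd macros. The cap of 5 is from the
 * patch; the budget of 10 is inferred (a derived depth of 8 was budget - 2). */
#define SKETCH_MAX_REENTRANCY 10
#define SKETCH_QUERY_MAX_NESTING 5

/* Fixed overhead on the deepest chain: the enclosing text/subject scan
 * plus the TLD lookup on the leaf URL. */
#define SKETCH_EXTRA_CONTEXTS 2

/* Compile-time guard: the build fails if the cap outgrows the budget. */
_Static_assert(SKETCH_QUERY_MAX_NESTING + SKETCH_EXTRA_CONTEXTS <= SKETCH_MAX_REENTRANCY,
               "query nesting cap exceeds the multipattern scratch budget");

/* Peak number of scratch contexts held when `depth` query levels are followed. */
static int sketch_peak_contexts(int depth)
{
    return depth + SKETCH_EXTRA_CONTEXTS;
}

/* Number of wrapper layers unwrapped for a chain of `layers` nested URLs. */
static int sketch_unwrapped_layers(int layers)
{
    return layers < SKETCH_QUERY_MAX_NESTING ? layers : SKETCH_QUERY_MAX_NESTING;
}
```

Under these assumed values the deepest chain holds 5 + 2 = 7 contexts, which leaves three spare slots in a budget of 10. A chain with eight wrapper layers is still cut off at five, which is the behaviour the get_html_urls test pins.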