From f535d3e031c355de639b8c61228ce3df988cdd84 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Thu, 22 Jan 2026 12:01:22 +0100
Subject: [PATCH] BUG/MEDIUM: debug: only dump Lua state when panicking

For a long time, we've tried to show the Lua state and backtrace when
dumping threads so as to be able to figure out if (and which) Lua code
was misbehaving, e.g. by performing expensive library calls. Since 3.1
with commit 365ee28510 ("BUG/MINOR: hlua: prevent LJMP in
hlua_traceback()"), it appears that the approach is more fragile (though
that fix addressed a real issue about out-of-memory), and it's possible
to occasionally observe crashes or CPU loops with "show threads" while
running Lua heavily. While users of "show threads" are rare, the
watchdog warnings, which were also enabled on 3.1, also trigger these
issues, which is even more of a concern.

This patch goes the simple way to address this for now: since the
purpose of the Lua backtrace was to help locate Lua call places upon a
panic, let's only call the backtrace on panic but not in other
situations. After a panic we obviously don't care that the Lua stack
might be corrupted since it's never going to be resumed anyway. This may
be relaxed in the future if a solution is found to reliably produce
harmless Lua backtraces.

The commit above was backported to all stable branches, so this patch
will be needed everywhere. However, TAINTED_PANIC only appeared in 2.8,
and given the rarity of this bug before 3.1, it's probably not needed
to make any extra effort to go beyond 2.8.

It's easy enough to test a version for being subject to this issue, by
running the following Lua code:

  local function stress(txn)
      for _, backend in pairs(core.backends) do
          for _, server in pairs(backend.servers) do
              local stats = server:get_stats()
          end
      end
  end

  core.register_fetches("stress", stress)

in the following config file:

  global
      stats socket /tmp/haproxy.stat level admin mode 666
      tune.lua.bool-sample-conversion normal
      lua-load-per-thread "stress.lua"

  listen stress
      bind :8001
      mode http
      timeout client 5s
      timeout server 5s
      timeout connect 5s
      http-request return status 200 content-type text/plain lf-string %[lua.stress()]
      server s1 127.0.0.1:8000

and stressing port 8001 with 100+ connections requesting / in a loop,
then issuing "show threads" on the CLI using socat in a loop as well.
Normally it instantly segfaults (sometimes during the first "show").
---
 src/debug.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/debug.c b/src/debug.c
index f921e0d68..c404139be 100644
--- a/src/debug.c
+++ b/src/debug.c
@@ -605,7 +605,11 @@ void ha_task_dump(struct buffer *buf, const struct task *task, const char *pfx)
 		chunk_appendf(buf, "%sCurrent executing a Lua HTTP service -- ", pfx);
 	}
 
-	if (hlua && hlua->T) {
+	/* only dump the Lua stack on panic because the approach is often
+	 * destructive and the running program might not recover from this
+	 * if called during warnings or "show threads".
+	 */
+	if (hlua && hlua->T && (get_tainted() & TAINTED_PANIC)) {
 		chunk_appendf(buf, "stack traceback:\n ");
 		append_prefixed_str(buf, hlua_traceback(hlua->T, "\n "), pfx, '\n', 0);
 	}
-- 
2.47.3
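
For readers who want to see the gating idea in isolation, here is a minimal standalone sketch of the pattern the hunk above applies: a potentially destructive diagnostic is only run once a process-wide "panic" taint bit is set. Every name in it (demo_get_tainted(), demo_mark_tainted(), DEMO_TAINTED_PANIC, demo_traceback(), demo_task_dump()) is a hypothetical stand-in invented for illustration; it does not reproduce HAProxy's real get_tainted(), TAINTED_PANIC value, or hlua_traceback().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a taint flag; the real TAINTED_PANIC value
 * in HAProxy may differ. */
#define DEMO_TAINTED_PANIC 0x1u

/* Process-wide taint bits (hypothetical, single-threaded demo). */
static unsigned int demo_tainted;

static unsigned int demo_get_tainted(void)
{
	return demo_tainted;
}

static void demo_mark_tainted(unsigned int flag)
{
	demo_tainted |= flag;
}

/* Counts how often the "destructive" traceback was actually invoked. */
static int demo_traceback_calls;

/* Pretend traceback: assume calling it may corrupt the interpreter
 * state, so it must not run while the program is expected to resume. */
static const char *demo_traceback(void)
{
	demo_traceback_calls++;
	return "fake.lua:3: in function 'stress'";
}

/* Writes a task dump into <out>. The traceback is appended only when
 * the task runs Lua AND the process is already panicking. Returns the
 * number of bytes written (excluding the trailing zero). */
static int demo_task_dump(char *out, size_t size, int has_lua)
{
	int len;

	assert(out != NULL && size > 0);

	len = snprintf(out, size, "task dump\n");
	if (len < 0 || (size_t)len >= size)
		return len < 0 ? 0 : (int)size - 1;

	if (has_lua && (demo_get_tainted() & DEMO_TAINTED_PANIC)) {
		int ret = snprintf(out + len, size - len,
		                   "stack traceback:\n    %s\n",
		                   demo_traceback());
		if (ret > 0)
			len += ((size_t)ret < size - len) ? ret : (int)(size - len) - 1;
	}
	return len;
}
```

Calling demo_task_dump() before demo_mark_tainted(DEMO_TAINTED_PANIC) leaves the traceback function untouched, which mirrors why "show threads" and watchdog warnings become harmless with this patch, while a panic path that sets the bit first still gets the traceback.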