BUG/MEDIUM: debug: only dump Lua state when panicking
For a long time, we've tried to show the Lua state and backtrace when
dumping threads so as to be able to figure out if (and which) Lua code
was misbehaving, e.g. by performing expensive library calls. Since 3.1,
with commit 365ee28510 ("BUG/MINOR: hlua: prevent LJMP in
hlua_traceback()"), it appears that the approach is more fragile (though
that fix addressed a real out-of-memory issue), and it's possible to
occasionally observe crashes or CPU loops with "show threads" while
running Lua heavily. While users of "show threads" are rare, the
watchdog warnings, which were also enabled in 3.1, trigger these issues
as well, which is even more of a concern.
This patch takes the simple approach to address this for now: since the
purpose of the Lua backtrace was to help locate Lua call places upon a
panic, let's only emit the backtrace on panic and not in other
situations. After a panic we obviously don't care that the Lua stack
might be corrupted since it's never going to be resumed anyway. This may
be relaxed in the future if a solution is found to reliably produce
harmless Lua backtraces.
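The gist of the change can be sketched as follows. This is a standalone
illustration of the gating logic, not the actual haproxy code: apart from
TAINTED_PANIC mentioned below, the helpers, flag value and output here
are stand-ins chosen for the example.

    /* Standalone sketch: the thread dump only descends into the Lua
     * backtrace once the process is already panicking, i.e. when the
     * TAINTED_PANIC flag has been raised. get_tainted(), the flag value
     * and dump_lua_backtrace() are illustrative stand-ins, not the exact
     * haproxy symbols.
     */
    #include <stdio.h>

    #define TAINTED_PANIC 0x200u            /* illustrative value */

    static unsigned int tainted;            /* raised once, on panic */

    static unsigned int get_tainted(void)
    {
        return tainted;
    }

    static void dump_lua_backtrace(void)
    {
        /* may be unsafe if the Lua state is in an inconsistent state */
        printf("    Lua backtrace: ...\n");
    }

    static void dump_thread_state(void)
    {
        printf("Thread 1: ...\n");

        /* Only walk the Lua stack when already panicking: the state will
         * never be resumed, so a corrupted stack cannot cause more harm.
         * "show threads" and watchdog warnings skip it entirely.
         */
        if (get_tainted() & TAINTED_PANIC)
            dump_lua_backtrace();
    }

    int main(void)
    {
        dump_thread_state();        /* warning / "show threads" path */

        tainted |= TAINTED_PANIC;   /* the watchdog panics... */
        dump_thread_state();        /* ...and the dump now includes Lua */
        return 0;
    }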
The commit above was backported to all stable branches, so this patch
will be needed everywhere. However, TAINTED_PANIC only appeared in 2.8,
and given the rarity of this bug before 3.1, it's probably not worth
making any extra effort to go beyond 2.8.
It's easy enough to check whether a given version is subject to this
issue by running the following Lua code:
    local function stress(txn)
        for _, backend in pairs(core.backends) do
            for _, server in pairs(backend.servers) do
                local stats = server:get_stats()
            end
        end
    end

    core.register_fetches("stress", stress)
in the following config file:
    global
        stats socket /tmp/haproxy.stat level admin mode 666
        tune.lua.bool-sample-conversion normal
        lua-load-per-thread "stress.lua"

    listen stress
        bind :8001
        mode http
        timeout client 5s
        timeout server 5s
        timeout connect 5s
        http-request return status 200 content-type text/plain lf-string %[lua.stress()]
        server s1 127.0.0.1:8000
and stressing port 8001 with 100+ connections requesting / in a loop,
then issuing "show threads" on the CLI in a loop as well, using socat
(e.g. echo "show threads" | socat stdio unix-connect:/tmp/haproxy.stat).
It normally segfaults instantly (sometimes during the first "show").