From 7f15a94e6553deb8074b3aa1ed2d0157b00f7aec Mon Sep 17 00:00:00 2001 From: Andrew Burgess Date: Mon, 4 Aug 2025 15:44:04 +0100 Subject: [PATCH] gdb: use kill() in gdbpy_interrupt for hosts with signal support For background, see this thread: https://inbox.sourceware.org/gdb-patches/20250612144607.27507-1-tdevries@suse.de Tom describes the issue clearly in the above thread, here's what he said: Once in a while, when running test-case gdb.base/bp-cmds-continue-ctrl-c.exp, I run into: ... Breakpoint 2, foo () at bp-cmds-continue-ctrl-c.c:23^M 23 usleep (100);^M ^CFAIL: $exp: run: stop with control-c (unexpected) (timeout) FAIL: $exp: run: stop with control-c ... This is PR python/32167, observed both on x86_64-linux and powerpc64le-linux. This is not a timeout due to accidental slowness, gdb actually hangs. The backtrace at the hang is (on cfarm120 running AlmaLinux 9.6): ... (gdb) bt #0 0x00007fffbca9dd94 in __lll_lock_wait () from /lib64/glibc-hwcaps/power10/libc.so.6 #1 0x00007fffbcaa6ddc in pthread_mutex_lock@@GLIBC_2.17 () from /lib64/glibc-hwcaps/power10/libc.so.6 #2 0x000000001067aee8 in __gthread_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:749 #3 0x000000001067afc8 in __gthread_recursive_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:811 #4 0x000000001067b0d4 in std::recursive_mutex::lock () at /usr/include/c++/11/mutex:108 #5 0x000000001067b380 in std::lock_guard::lock_guard () at /usr/include/c++/11/bits/std_mutex.h:229 #6 0x0000000010679d3c in set_quit_flag () at gdb/extension.c:865 #7 0x000000001066b6dc in handle_sigint () at gdb/event-top.c:1264 #8 0x00000000109e3b3c in handler_wrapper () at gdb/posix-hdep.c:70 #9 #10 0x00007fffbcaa6d14 in pthread_mutex_lock@@GLIBC_2.17 () from /lib64/glibc-hwcaps/power10/libc.so.6 #11 0x000000001067aee8 in __gthread_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:749 #12 0x000000001067afc8 in __gthread_recursive_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:811 #13 0x000000001067b0d4 in std::recursive_mutex::lock () at /usr/include/c++/11/mutex:108 #14 0x000000001067b380 in std::lock_guard::lock_guard () at /usr/include/c++/11/bits/std_mutex.h:229 #15 0x00000000106799cc in set_active_ext_lang () at gdb/extension.c:775 #16 0x0000000010b287ac in gdbpy_enter::gdbpy_enter () at gdb/python/python.c:232 #17 0x0000000010a8e3f8 in bpfinishpy_handle_stop () at gdb/python/py-finishbreakpoint.c:414 ... What happens here is the following: - the gdbpy_enter constructor attempts to set the current extension language to python using set_active_ext_lang - set_active_ext_lang attempts to lock ext_lang_mutex - while doing so, it is interrupted by sigint_wrapper (the SIGINT handler), handling a SIGINT - sigint_wrapper calls handle_sigint, which calls set_quit_flag, which also tries to lock ext_lang_mutex - since std::recursive_mutex::lock is not async-signal-safe, things go wrong, resulting in a hang. The hang bisects to commit 8bb8f834672 ("Fix gdb.interrupt race"), which introduced the lock, making PR python/32167 a regression since gdb 15.1. Commit 8bb8f834672 fixes PR dap/31263, a race reported by ThreadSanitizer: ... WARNING: ThreadSanitizer: data race (pid=615372) Read of size 1 at 0x00000328064c by thread T19: #0 set_active_ext_lang(extension_language_defn const*) gdb/extension.c:755 #1 scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling() gdb/extension.c:697 #2 gdbpy_interrupt gdb/python/python.c:1106 #3 cfunction_vectorcall_NOARGS Previous write of size 1 at 0x00000328064c by main thread: #0 scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling() gdb/extension.c:704 #1 fetch_inferior_event() gdb/infrun.c:4591 ... Location is global 'cooperative_sigint_handling_disabled' of size 1 at 0x00000328064c ... SUMMARY: ThreadSanitizer: data race gdb/extension.c:755 in \ set_active_ext_lang(extension_language_defn const*) ... The problem here is that gdb.interrupt is called from a worker thread, and its implementation, gdbpy_interrupt races with the main thread on some variable. The fix presented here is based on the fix that Tom proposed, but fills in the missing Mingw support. The problem is basically split into two: hosts that support unix like signals, and Mingw, which doesn't support signals. For signal supporting hosts, I've adopted the approach that Tom suggests, gdbpy_interrupt uses kill() to send SIGINT to the GDB process. This is then handled in the main thread as if the user had pressed Ctrl+C. For these hosts no locking is required, so the existing lock is removed. However, everywhere the lock currently exists I've added an assert: gdb_assert (is_main_thread ()); If this assert ever triggers then we're setting or reading the quit flag on a worker thread, this will be a problem without the mutex. For Mingw, the current mutex is retained. This is fine as there are no signals, so no chance of the mutex acquisition being interrupted by a signal, and so, deadlock shouldn't be an issue. To manage the complexity of when we need an assert, and when we need the mutex, I've created 'struct ext_lang_guard', which can be used as a RAII object. This object either performs the assertion check, or acquires the mutex, depending on the host. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32167 Co-Authored-By: Tom de Vries Approved-By: Tom Tromey --- gdb/extension.c | 79 +++++++++++++++++++++++++++++++++------------ gdb/python/python.c | 13 ++++++-- 2 files changed, 68 insertions(+), 24 deletions(-) diff --git a/gdb/extension.c b/gdb/extension.c index d34dbcdd34e..1e967c523f6 100644 --- a/gdb/extension.c +++ b/gdb/extension.c @@ -34,6 +34,8 @@ #include #include "inferior.h" #include "gdbsupport/scoped_signal_handler.h" +#include "gdbsupport/scoped_ignore_signal.h" +#include "run-on-main-thread.h" static script_sourcer_func source_gdb_script; static objfile_script_sourcer_func source_gdb_objfile_script; @@ -638,7 +640,7 @@ breakpoint_ext_lang_cond_says_stop (struct breakpoint *b) This requires cooperation with the extension languages so the support is defined here. */ -#if CXX_STD_THREAD +#if CXX_STD_THREAD && defined __MINGW32__ #include @@ -648,10 +650,19 @@ breakpoint_ext_lang_cond_says_stop (struct breakpoint *b) available, DAP will not start. This lock is held for accesses to quit_flag, active_ext_lang, and - cooperative_sigint_handling_disabled. */ + cooperative_sigint_handling_disabled. + + This lock is only required for targets that don't support kill(), and, + it is assumed, handle SIGINT not as a signal, but as a new thread. For + these targets this mutex prevents multiple threads adjusting the above + state at the same time. + + For targets that support kill() gdb.interrupt is implemented by just + sending SIGINT to the process, which is then handled in the "normal" + way. */ static std::recursive_mutex ext_lang_mutex; -#endif /* CXX_STD_THREAD */ +#endif /* CXX_STD_THREAD && defined (__MINGW32__)*/ /* This flag tracks quit requests when we haven't called out to an extension language. it also holds quit requests when we transition to @@ -700,6 +711,44 @@ void (*hook_set_active_ext_lang) () = nullptr; } #endif +namespace gdb +{ +/* Wrapper that acquires the global EXT_LANG_MUTEX, but only for hosts + that might call quit related functions from a separate thread. + Specifically, this is hosts that don't support Unix like signals + (currently only Mingw). + + For hosts with signal support, we don't try to use a mutex as a signal + might interrupt the lock acquisition, in which case deadlock will + occur. However, for these hosts, all the quit related functions are + called on the main thread (there's an assert for this), so the lack of + locking shouldn't be an issue. + + For Mingw, without signals, the implementation of the Python + gdb.interrupt function can call set_quit_flag() from a second thread, + and additionally, the emulation of SIGINT involves the creation of a + temporary thread which calls the sigint handler. So for Mingw, we do + need the mutex, but that's OK, as no signal can interrupt the lock + acquisition. */ +struct ext_lang_guard +{ + ext_lang_guard () + { +#if CXX_STD_THREAD && !defined __MINGW32__ + gdb_assert (is_main_thread ()); +#endif /* CXX_STD_THREAD && ! defined __MINGW32__ */ + } + + ~ext_lang_guard () { /* Nothing. */ } + +private: +#if CXX_STD_THREAD && defined __MINGW32__ + std::lock_guard m_guard { ext_lang_mutex }; +#endif +}; + +} + /* True if cooperative SIGINT handling is disabled. This is needed so that calls to set_active_ext_lang do not re-enable cooperative handling, which if enabled would make set_quit_flag store the @@ -708,9 +757,7 @@ static bool cooperative_sigint_handling_disabled = false; scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling () { -#if CXX_STD_THREAD - std::lock_guard guard (ext_lang_mutex); -#endif /* CXX_STD_THREAD */ + gdb::ext_lang_guard guard; /* Force the active extension language to the GDB scripting language. This ensures that a previously saved SIGINT is moved @@ -729,9 +776,7 @@ scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_ha scoped_disable_cooperative_sigint_handling::~scoped_disable_cooperative_sigint_handling () { -#if CXX_STD_THREAD - std::lock_guard guard (ext_lang_mutex); -#endif /* CXX_STD_THREAD */ + gdb::ext_lang_guard guard; cooperative_sigint_handling_disabled = m_prev_cooperative_sigint_handling_disabled; restore_active_ext_lang (m_prev_active_ext_lang_state); @@ -771,9 +816,7 @@ scoped_disable_cooperative_sigint_handling::~scoped_disable_cooperative_sigint_h struct active_ext_lang_state * set_active_ext_lang (const struct extension_language_defn *now_active) { -#if CXX_STD_THREAD - std::lock_guard guard (ext_lang_mutex); -#endif /* CXX_STD_THREAD */ + gdb::ext_lang_guard guard; #if GDB_SELF_TEST if (selftests::hook_set_active_ext_lang) @@ -827,9 +870,7 @@ set_active_ext_lang (const struct extension_language_defn *now_active) void restore_active_ext_lang (struct active_ext_lang_state *previous) { -#if CXX_STD_THREAD - std::lock_guard guard (ext_lang_mutex); -#endif /* CXX_STD_THREAD */ + gdb::ext_lang_guard guard; if (cooperative_sigint_handling_disabled) { @@ -861,9 +902,7 @@ restore_active_ext_lang (struct active_ext_lang_state *previous) void set_quit_flag () { -#if CXX_STD_THREAD - std::lock_guard guard (ext_lang_mutex); -#endif /* CXX_STD_THREAD */ + gdb::ext_lang_guard guard; if (active_ext_lang->ops != NULL && active_ext_lang->ops->set_quit_flag != NULL) @@ -886,9 +925,7 @@ set_quit_flag () bool check_quit_flag () { -#if CXX_STD_THREAD - std::lock_guard guard (ext_lang_mutex); -#endif /* CXX_STD_THREAD */ + gdb::ext_lang_guard guard; bool result = false; diff --git a/gdb/python/python.c b/gdb/python/python.c index 1af7896eb08..3b5d06944b2 100644 --- a/gdb/python/python.c +++ b/gdb/python/python.c @@ -1202,15 +1202,22 @@ gdbpy_post_event (PyObject *self, PyObject *args) static PyObject * gdbpy_interrupt (PyObject *self, PyObject *args) { +#ifdef __MINGW32__ { - /* Make sure the interrupt isn't delivered immediately somehow. - This probably is not truly needed, but at the same time it - seems more clear to be explicit about the intent. */ gdbpy_allow_threads temporarily_exit_python; scoped_disable_cooperative_sigint_handling no_python_sigint; set_quit_flag (); } +#else + { + /* For targets with support kill() just send SIGINT. This will be + handled as if the user hit Ctrl+C. This isn't exactly the same as + the above, which directly sets the quit flag. Consider, for + example, every place that install_sigint_handler is called. */ + kill (getpid (), SIGINT); + } +#endif Py_RETURN_NONE; } -- 2.47.3