For background, see this thread:
https://inbox.sourceware.org/gdb-patches/
20250612144607.27507-1-tdevries@suse.de
Tom describes the issue clearly in the above thread, here's what he
said:
Once in a while, when running test-case gdb.base/bp-cmds-continue-ctrl-c.exp,
I run into:
...
Breakpoint 2, foo () at bp-cmds-continue-ctrl-c.c:23^M
23 usleep (100);^M
^CFAIL: $exp: run: stop with control-c (unexpected) (timeout)
FAIL: $exp: run: stop with control-c
...
This is PR python/32167, observed both on x86_64-linux and powerpc64le-linux.
This is not a timeout due to accidental slowness, gdb actually hangs.
The backtrace at the hang is (on cfarm120 running AlmaLinux 9.6):
...
(gdb) bt
#0 0x00007fffbca9dd94 in __lll_lock_wait () from
/lib64/glibc-hwcaps/power10/libc.so.6
#1 0x00007fffbcaa6ddc in pthread_mutex_lock@@GLIBC_2.17 () from
/lib64/glibc-hwcaps/power10/libc.so.6
#2 0x000000001067aee8 in __gthread_mutex_lock ()
at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:749
#3 0x000000001067afc8 in __gthread_recursive_mutex_lock ()
at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:811
#4 0x000000001067b0d4 in std::recursive_mutex::lock ()
at /usr/include/c++/11/mutex:108
#5 0x000000001067b380 in std::lock_guard<std::recursive_mutex>::lock_guard ()
at /usr/include/c++/11/bits/std_mutex.h:229
#6 0x0000000010679d3c in set_quit_flag () at gdb/extension.c:865
#7 0x000000001066b6dc in handle_sigint () at gdb/event-top.c:1264
#8 0x00000000109e3b3c in handler_wrapper () at gdb/posix-hdep.c:70
#9 <signal handler called>
#10 0x00007fffbcaa6d14 in pthread_mutex_lock@@GLIBC_2.17 () from
/lib64/glibc-hwcaps/power10/libc.so.6
#11 0x000000001067aee8 in __gthread_mutex_lock ()
at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:749
#12 0x000000001067afc8 in __gthread_recursive_mutex_lock ()
at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:811
#13 0x000000001067b0d4 in std::recursive_mutex::lock ()
at /usr/include/c++/11/mutex:108
#14 0x000000001067b380 in std::lock_guard<std::recursive_mutex>::lock_guard ()
at /usr/include/c++/11/bits/std_mutex.h:229
#15 0x00000000106799cc in set_active_ext_lang ()
at gdb/extension.c:775
#16 0x0000000010b287ac in gdbpy_enter::gdbpy_enter ()
at gdb/python/python.c:232
#17 0x0000000010a8e3f8 in bpfinishpy_handle_stop ()
at gdb/python/py-finishbreakpoint.c:414
...
What happens here is the following:
- the gdbpy_enter constructor attempts to set the current extension language
to python using set_active_ext_lang
- set_active_ext_lang attempts to lock ext_lang_mutex
- while doing so, it is interrupted by sigint_wrapper (the SIGINT handler),
handling a SIGINT
- sigint_wrapper calls handle_sigint, which calls set_quit_flag, which also
tries to lock ext_lang_mutex
- since std::recursive_mutex::lock is not async-signal-safe, things go wrong,
resulting in a hang.
The hang bisects to commit
8bb8f834672 ("Fix gdb.interrupt race"), which
introduced the lock, making PR python/32167 a regression since gdb 15.1.
Commit
8bb8f834672 fixes PR dap/31263, a race reported by ThreadSanitizer:
...
WARNING: ThreadSanitizer: data race (pid=615372)
Read of size 1 at 0x00000328064c by thread T19:
#0 set_active_ext_lang(extension_language_defn const*) gdb/extension.c:755
#1 scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling()
gdb/extension.c:697
#2 gdbpy_interrupt gdb/python/python.c:1106
#3 cfunction_vectorcall_NOARGS <null>
Previous write of size 1 at 0x00000328064c by main thread:
#0 scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling()
gdb/extension.c:704
#1 fetch_inferior_event() gdb/infrun.c:4591
...
Location is global 'cooperative_sigint_handling_disabled' of size 1 at 0x00000328064c
...
SUMMARY: ThreadSanitizer: data race gdb/extension.c:755 in \
set_active_ext_lang(extension_language_defn const*)
...
The problem here is that gdb.interrupt is called from a worker thread, and its
implementation, gdbpy_interrupt races with the main thread on some variable.
The fix presented here is based on the fix that Tom proposed, but
fills in the missing Mingw support.
The problem is basically split into two: hosts that support unix like
signals, and Mingw, which doesn't support signals.
For signal supporting hosts, I've adopted the approach that Tom
suggests, gdbpy_interrupt uses kill() to send SIGINT to the GDB
process. This is then handled in the main thread as if the user had
pressed Ctrl+C. For these hosts no locking is required, so the
existing lock is removed. However, everywhere the lock currently
exists I've added an assert:
gdb_assert (is_main_thread ());
If this assert ever triggers then we're setting or reading the quit
flag on a worker thread, this will be a problem without the mutex.
For Mingw, the current mutex is retained. This is fine as there are
no signals, so no chance of the mutex acquisition being interrupted by
a signal, and so, deadlock shouldn't be an issue.
To manage the complexity of when we need an assert, and when we need
the mutex, I've created 'struct ext_lang_guard', which can be used as
a RAII object. This object either performs the assertion check, or
acquires the mutex, depending on the host.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32167
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Approved-By: Tom Tromey <tom@tromey.com>
#include <array>
#include "inferior.h"
#include "gdbsupport/scoped_signal_handler.h"
+#include "gdbsupport/scoped_ignore_signal.h"
+#include "run-on-main-thread.h"
static script_sourcer_func source_gdb_script;
static objfile_script_sourcer_func source_gdb_objfile_script;
This requires cooperation with the extension languages so the support
is defined here. */
-#if CXX_STD_THREAD
+#if CXX_STD_THREAD && defined __MINGW32__
#include <mutex>
available, DAP will not start.
This lock is held for accesses to quit_flag, active_ext_lang, and
- cooperative_sigint_handling_disabled. */
+ cooperative_sigint_handling_disabled.
+
+ This lock is only required for targets that don't support kill(), and,
+ it is assumed, handle SIGINT not as a signal, but as a new thread. For
+ these targets this mutex prevents multiple threads adjusting the above
+ state at the same time.
+
+ For targets that support kill() gdb.interrupt is implemented by just
+ sending SIGINT to the process, which is then handled in the "normal"
+ way. */
static std::recursive_mutex ext_lang_mutex;
-#endif /* CXX_STD_THREAD */
+#endif /* CXX_STD_THREAD && defined (__MINGW32__)*/
/* This flag tracks quit requests when we haven't called out to an
extension language. it also holds quit requests when we transition to
}
#endif
+namespace gdb
+{
+/* Wrapper that acquires the global EXT_LANG_MUTEX, but only for hosts
+ that might call quit related functions from a separate thread.
+ Specifically, this is hosts that don't support Unix like signals
+ (currently only Mingw).
+
+ For hosts with signal support, we don't try to use a mutex as a signal
+ might interrupt the lock acquisition, in which case deadlock will
+ occur. However, for these hosts, all the quit related functions are
+ called on the main thread (there's an assert for this), so the lack of
+ locking shouldn't be an issue.
+
+ For Mingw, without signals, the implementation of the Python
+ gdb.interrupt function can call set_quit_flag() from a second thread,
+ and additionally, the emulation of SIGINT involves the creation of a
+ temporary thread which calls the sigint handler. So for Mingw, we do
+ need the mutex, but that's OK, as no signal can interrupt the lock
+ acquisition. */
+struct ext_lang_guard
+{
+ ext_lang_guard ()
+ {
+#if CXX_STD_THREAD && !defined __MINGW32__
+ gdb_assert (is_main_thread ());
+#endif /* CXX_STD_THREAD && ! defined __MINGW32__ */
+ }
+
+ ~ext_lang_guard () { /* Nothing. */ }
+
+private:
+#if CXX_STD_THREAD && defined __MINGW32__
+ std::lock_guard<typeof (ext_lang_mutex)> m_guard { ext_lang_mutex };
+#endif
+};
+
+}
+
/* True if cooperative SIGINT handling is disabled. This is needed so
that calls to set_active_ext_lang do not re-enable cooperative
handling, which if enabled would make set_quit_flag store the
scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling ()
{
-#if CXX_STD_THREAD
- std::lock_guard guard (ext_lang_mutex);
-#endif /* CXX_STD_THREAD */
+ gdb::ext_lang_guard guard;
/* Force the active extension language to the GDB scripting
language. This ensures that a previously saved SIGINT is moved
scoped_disable_cooperative_sigint_handling::~scoped_disable_cooperative_sigint_handling ()
{
-#if CXX_STD_THREAD
- std::lock_guard guard (ext_lang_mutex);
-#endif /* CXX_STD_THREAD */
+ gdb::ext_lang_guard guard;
cooperative_sigint_handling_disabled = m_prev_cooperative_sigint_handling_disabled;
restore_active_ext_lang (m_prev_active_ext_lang_state);
struct active_ext_lang_state *
set_active_ext_lang (const struct extension_language_defn *now_active)
{
-#if CXX_STD_THREAD
- std::lock_guard guard (ext_lang_mutex);
-#endif /* CXX_STD_THREAD */
+ gdb::ext_lang_guard guard;
#if GDB_SELF_TEST
if (selftests::hook_set_active_ext_lang)
void
restore_active_ext_lang (struct active_ext_lang_state *previous)
{
-#if CXX_STD_THREAD
- std::lock_guard guard (ext_lang_mutex);
-#endif /* CXX_STD_THREAD */
+ gdb::ext_lang_guard guard;
if (cooperative_sigint_handling_disabled)
{
void
set_quit_flag ()
{
-#if CXX_STD_THREAD
- std::lock_guard guard (ext_lang_mutex);
-#endif /* CXX_STD_THREAD */
+ gdb::ext_lang_guard guard;
if (active_ext_lang->ops != NULL
&& active_ext_lang->ops->set_quit_flag != NULL)
bool
check_quit_flag ()
{
-#if CXX_STD_THREAD
- std::lock_guard guard (ext_lang_mutex);
-#endif /* CXX_STD_THREAD */
+ gdb::ext_lang_guard guard;
bool result = false;
static PyObject *
gdbpy_interrupt (PyObject *self, PyObject *args)
{
+#ifdef __MINGW32__
{
- /* Make sure the interrupt isn't delivered immediately somehow.
- This probably is not truly needed, but at the same time it
- seems more clear to be explicit about the intent. */
gdbpy_allow_threads temporarily_exit_python;
scoped_disable_cooperative_sigint_handling no_python_sigint;
set_quit_flag ();
}
+#else
+ {
+ /* For targets with support kill() just send SIGINT. This will be
+ handled as if the user hit Ctrl+C. This isn't exactly the same as
+ the above, which directly sets the quit flag. Consider, for
+ example, every place that install_sigint_handler is called. */
+ kill (getpid (), SIGINT);
+ }
+#endif
Py_RETURN_NONE;
}