I decided to try to build and test gdb on Windows.
I found a page on the wiki [1] suggesting three ways of building gdb:
- MinGW,
- MinGW on Cygwin, and
- Cygwin.
I picked Cygwin, because I've used it before (though not recently).
I managed to install Cygwin and sufficient packages to build gdb and start the
testsuite.
However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp.
[ AFAICT, similar problems were reported here [2]. ]
I managed to reproduce this hang by running just the test-case.
I attempted to kill the hanging processes by:
- first killing the inferior process, using the Cygwin "kill -9" command, and
- then killing the gdb process, likewise.
But the gdb process remained, and I had to point-and-click my way through Task
Manager to actually kill the gdb process.
I investigated this by attaching to the hanging gdb process. Looking at the
main thread, I saw it was stopped in a call to WaitForSingleObject, with
the dwMilliseconds parameter set to INFINITE.
The backtrace in more detail:
...
(gdb) bt
#0 0x00007fff196fc044 in ntdll!ZwWaitForSingleObject () from
/cygdrive/c/windows/SYSTEM32/ntdll.dll
#1 0x00007fff16bbcdcf in WaitForSingleObjectEx () from
/cygdrive/c/windows/System32/KERNELBASE.dll
#2 0x0000000100998065 in wait_for_single (handle=0x1b8, howlong=4294967295)
at gdb/windows-nat.c:435
#3 0x0000000100999aa7 in
windows_nat_target::do_synchronously(gdb::function_view<bool ()>)
(this=this@entry=0xa001c6fe0, func=...) at gdb/windows-nat.c:487
#4 0x000000010099a7fb in windows_nat_target::wait_for_debug_event_main_thread
(event=<optimized out>, this=0xa001c6fe0)
at gdb/../gdbsupport/function-view.h:296
#5 windows_nat_target::kill (this=0xa001c6fe0) at gdb/windows-nat.c:2917
#6 0x00000001008f2f86 in target_kill () at gdb/target.c:901
#7 0x000000010091fc46 in kill_or_detach (from_tty=0, inf=0xa000577d0)
at gdb/top.c:1658
#8 quit_force (exit_arg=<optimized out>, from_tty=from_tty@entry=0)
at gdb/top.c:1759
#9 0x00000001004f9ea8 in quit_command (args=args@entry=0x0,
from_tty=from_tty@entry=0) at gdb/cli/cli-cmds.c:483
#10 0x000000010091c6d0 in quit_cover () at gdb/top.c:295
#11 0x00000001005e3d8a in async_disconnect (arg=<optimized out>)
at gdb/event-top.c:1496
#12 0x0000000100499c45 in invoke_async_signal_handlers ()
at gdb/async-event.c:233
#13 0x0000000100eb23d6 in gdb_do_one_event (mstimeout=mstimeout@entry=-1)
at gdbsupport/event-loop.cc:198
#14 0x00000001006df94a in interp::do_one_event (mstimeout=-1,
this=<optimized out>) at gdb/interps.h:87
#15 start_event_loop () at gdb/main.c:402
#16 captured_command_loop () at gdb/main.c:466
#17 0x00000001006e2865 in captured_main (data=0x7ffffcba0) at gdb/main.c:1346
#18 gdb_main (args=args@entry=0x7ffffcc10) at gdb/main.c:1365
#19 0x0000000100f98c70 in main (argc=10, argv=0xa000129f0) at gdb/gdb.c:38
...
In the docs [3], I read that using an INFINITE argument to WaitForSingleObject
might cause a system deadlock.
This prompted me to try this simple change in wait_for_single:
...
   while (true)
     {
-      DWORD r = WaitForSingleObject (handle, howlong);
+      DWORD r = WaitForSingleObject (handle,
+                                     howlong == INFINITE ? 100 : howlong);
+      if (howlong == INFINITE && r == WAIT_TIMEOUT)
+        continue;
...
with the timeout of 0.1 second estimated to be:
- small enough for gdb to feel reactive, and
- big enough not to consume too many CPU cycles with looping.
And indeed, the test-case, while still failing, now finishes in ~50 seconds.
While there may be an underlying bug that triggers this behaviour, the failure
mode is so severe that I consider it a bug in itself.
Fix this by avoiding calling WaitForSingleObject with an INFINITE argument.
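As a rough standalone illustration of the idea (not the actual gdb code), the
bounded-wait loop can be sketched portably as below. The names wait_bounded,
WaitFn, WaitResult, kInfinite and kSliceMs are hypothetical and exist only for
this sketch; wait_once stands in for a call to WaitForSingleObject on some
handle, and kInfinite mirrors the value of INFINITE seen in the backtrace.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the Win32 names, so the sketch builds anywhere.
constexpr std::uint32_t kInfinite = 0xFFFFFFFFu;  // same value as INFINITE
constexpr std::uint32_t kSliceMs = 100;           // the 0.1 second slice

enum class WaitResult { Signaled, Timeout, Failed };

// Assumption: wait_once (timeout_ms) behaves like
// WaitForSingleObject (handle, timeout_ms) for some fixed handle.
using WaitFn = std::function<WaitResult (std::uint32_t timeout_ms)>;

// Wait for up to HOWLONG milliseconds.  An "infinite" wait is emulated by
// waiting in bounded slices and retrying on timeout, so the underlying
// primitive is never asked to block without a time-out.
WaitResult
wait_bounded (const WaitFn &wait_once, std::uint32_t howlong)
{
  while (true)
    {
      std::uint32_t slice = howlong == kInfinite ? kSliceMs : howlong;
      WaitResult r = wait_once (slice);

      // Only the emulated infinite wait retries; a finite wait reports
      // its timeout to the caller as usual.
      if (howlong == kInfinite && r == WaitResult::Timeout)
        continue;

      return r;
    }
}
```

A caller that previously passed INFINITE keeps the same semantics (the call
returns only once the object is signaled or the wait fails), at the cost of one
wake-up per slice while idle.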
Tested on x86_64-cygwin, by running the testsuite past the test-case.
Approved-By: Pedro Alves <pedro@palves.net>
PR tdep/32894
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32894
[1] https://sourceware.org/gdb/wiki/BuildingOnWindows
[2] https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
[3] https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject