Don't pretend infcalls don't set the inferior running (PR gdb/34082)
Commit
2954dd2b73 ("thread_info::executing+resumed ->
thread_info::internal_state"), caused a regression in
gdb.threads/hand-call-new-thread.exp:
...
(gdb) PASS: gdb.threads/hand-call-new-thread.exp: iter 1: no thread marked running
p new_thread ()
.../src/gdb/infrun.c:3742: internal-error: proceed: Assertion `!thread_is_in_step_over_chain (&tp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
FAIL: gdb.threads/hand-call-new-thread.exp: iter 2: gdb-command<p new_thread ()> (GDB internal error)
...
This commit fixes it.
Let's say we have three threads, 1, 2, and 3. User does:
(gdb) continue
This makes GDB switch all three threads to THREAD_RUNNING.
If some of those threads, now running, spawns a new thread, that
thread is also set to state THREAD_RUNNING. We end up with four
threads marked THREAD_RUNNING.
If e.g., threads 2 and 3 both hit a breakpoint that needs to be
stepped over (e.g., condition evals false), and there is only one
displaced-stepping slot, then one thread starts a displaced stepping
sequence, while the other is put in the step-over queue, waiting for
its turn.
Now, if meanwhile thread 1 hits a user-visible stop, GDB stops all
threads, and transitions all their states to THREAD_STOPPED. Any
thread that was still waiting for its turn in the step-over queue is
removed from the queue. That happens in the THREAD_RUNNING =>
THREAD_STOPPED transition, here:
thread_state
thread_info::set_state (thread_state state, bool suppress_notification)
{
...
switch (m_state)
{
case THREAD_STOPPED:
if (thread_is_in_step_over_chain (this))
global_thread_step_over_chain_remove (this);
The next time the user continues execution, if the breakpoint is still
inserted, proceed() sets them stepping the breakpoint again. And
again, if there is more than one thread that needs to step-over, and
there aren't enough slots, some threads may end up in the step-over
queue. Rinse, repeat.
All this works well with normal resumption commands, like continue,
step, next, etc.
The problem exposed by gdb.threads/hand-call-new-thread.exp is if you
resume execution with an infcall instead of a normal execution
command. In that case, proceed() skips transitioning (pre-existing)
threads to THREAD_RUNNING, here:
proceed (CORE_ADDR addr, enum gdb_signal siggnal)
{
...
/* Even if RESUME_PTID is a wildcard, and we end up resuming fewer
threads in RESUME_PTID are now running. Unless we're calling an
inferior function, as in that case we pretend the inferior
doesn't run at all. */
if (!cur_thr->control.in_infcall)
set_state (resume_target, resume_ptid, THREAD_RUNNING);
So later, when the call finishes for any reason (normal call finish,
or some other user-visible stop happens), and GDB transitions all
threads to THREAD_STOPPED, we hit the early return in
thread_info::set_state:
thread_state
thread_info::set_state (thread_state state, bool suppress_notification)
{
thread_state prev_state = m_state;
if (prev_state == state)
return prev_state; // <== EARLY RETURN
m_state = state;
switch (m_state)
{
case THREAD_STOPPED:
if (thread_is_in_step_over_chain (this))
global_thread_step_over_chain_remove (this); // NOT REACHED
break;
...
}
}
... and so if any thread had been put in the step-over queue since the
last proceed(), it will incorrectly be left still in the step-over
queue, with THREAD_STOPPED state.
If/when the user re-resumes the program again, we trip the assertion
in proceed:
(gdb) p new_thread ()
../../src/gdb/infrun.c:3742: internal-error: proceed: Assertion `!thread_is_in_step_over_chain (&tp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Before commit
2954dd2b73 ("thread_info::executing+resumed ->
thread_info::internal_state"), this didn't happen because
set_running_thread(..., running=false) would remove threads from the
step-over queue unconditionally, even if they were already marked
stopped.
I think the right fix is to stop pretending that infcalls don't set
the target running. I can't think of a reason we do that. It really
does run. Some thoughts:
- I added the code to skip set_running (nowadays 'set_state(...,
THREAD_RUNNING))' for infcalls back in commit
4d9d9d0423 over 10
years ago, but I honestly don't recall why. My guess is that it
must have been to keep backwards compatibility with something, and
the code has probably changed sufficiently since then making it no
longer necessary.
- infcalls are always synchronous, so the intermediate running state
can't be observed with commands.
- MI still suppresses *running
The running state could potentially be observed on frontends, with
e.g., MI's *running => *stopped transitions, but if that is a
problem, I think it should be handled by the interpreter layer not
emiting the notifications, instead of hacking the threads's core
state. I.e., make it a presentation detail.
I don't think it would be a problem for frontends to see *running
during infcalls, but in any case, MI is already suppressing such
notifications when they are caused by an infcall. See
mi_interp::on_target_resumed.
So e.g., if we let the threads transition to THREAD_RUNNING, with
MI, we still see no *running/*stopped:
(gdb)
p malloc(0)
&"p malloc(0)\n"
~"$2 = (void *) 0x555555560320\n"
^done
(gdb)
I don't know if it'd be a problem for DAP, but I assume not too.
Note how the code in proceed that skips setting threads to
THREAD_RUNNING only applies to already-known threads. Any new
thread that appears while the infcall is ongoing will end up marked
THREAD_RUNNING, causing frontend notifications. This shows how
hiding the running state doesn't really hide it completely.
So this is what this commit does. It lets threads transition to
THREAD_RUNNING even during infcalls, fixing the problem described
above, as now there will be proper THREAD_RUNNING => THREAD_STOPPED
transitions when the infcall finishes.
It also tweaks the testcase to spawn 10 threads per infcall instead of
one. This makes it much more likely to reproduce the problem on my
machine. Without it, the test still passes for me, which is why I
didn't see the problem before merging
2954dd2b73.
Tested on x86-64 Linux, native and gdbserver.
Approved-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I1bdc733ac865102d3f7bf0a4f7f56e6f7d75d457
commit-id:
d2abd130