The valgrind gdbserver inheritated a register cache from the original
GDBserver implementation.
The objective of this register cache was to improve the performance
of GDB-> gdbserver -> inferior by avoiding the gdbserver having to
do ptrace system calls each time GDB wants to read or write a register
when the inferior is stopped.
This register cache is however not useful for the valgrind gdbserver:
As the valgrind gdbserver being co-located with the inferior, it
can directly and efficiently read and write registers from/to the VEX
state.
This commit ensures the valgrind GDBserver directly reads from
VEX state instead of fetching the registers from the VEX state and
copying them to the gdbserver regcache.
Similarly, when GDB wants to modify a register, the valgrind GDB server now
directly writes into the VEX state instead of writing the registers
in the regcache and having the regcache flushed to the VEX state
when execution is resumed.
The files regcache.h and regcache.c are still useful as they provide
a translation between a register number, a register name on one side
and the offset in an array of bytes in the format expected by GDB.
The regcache now is only used to create this array of bytes, which is
itself only used temporarily when GDB reads or writes the complete
set of registers instead of reading/writing one register at a time.
Removing the usage of this regcache avoids the bug 458915.
The regcache was causing the bug in the following circumstances:
We have a thread executing code, while we have a bunch of threads
that are blocked in a syscall.
When a thread is blocked in a syscall, the VEX rax register is set to the
syscall nr.
A thread executing code will check from time to time if GDB tries to
attach.
When GDB attaches to the valgrind gdbserver , the thread executing code
will copy the registers from all the threads to the thread gdbserver regcache.
However, the threads blocked in a system call can be unblocked e.g.
because the epoll_wait timeout expires. In such a case, the thread will
still execute the few instructions that follow the syscall instructions
till the thread is blocked trying to acquire the scheduler lock.
These instructions are extracting the syscall return code from the host
register and copies it to the valgrind VEX state.
However, this assembly code is not aware that there is a gdbserver cache.
When the unblocked thread is on the acquire lock statement,
the GDB server regcache is now inconsistent (i.e. different from) the
real VEX state.
When finally GDB tells GDB server to continue execution, the GDB server
wrongly detected that its regcache was modified compared to the VEX state:
the regcache still contains e.g. for the rax register the syscall number
while the unblocked thread has put the syscall return code in the VEX
rax register. GDBserver then flushed the regcache rax (containing the
syscall number) to the VEX rax.
And that led to the detected bug that the syscall return code seen by
the guest application was the syscall number.
Removing the regcache ensures that GDB directly reads the values
from VEX and directly writes to VEX state.
Note that we could still have GDB reading from VEX a register value
that will be changed a few instructions later.
GDB will then show some (slightly) old/obsolete values
for some registers to the user.
This should have no consequence as long as GDB does not try to modify
the registers to execute an inferior call.
The bug did not happen systematically as most of the time, when threads are
blocked in syscalls, vgdb attaches using ptrace to the valgrind process.
When vgdb attaches with ptrace, it stops all the threads using linux syscall.
When vgdb stops the threads, the threads blocked in a syscall will not
execute the instructions between the syscall instruction and the lock
acquire, and so the problem of desynchronisation between the VEX state
and the register cache could not happen.
This commit touches architecture specific files of the gdbserver,
it has been tested on amd64/debian, on pcc64/centos and on arm64/ubuntu.
Possibly, some untested arch might not compile but the fix should be trivial.