Reg<->Reg MOV coalescing status is now a part of the HRegUsage.
This allows register allocation to query it two times without incurring
a performance penalty. This in turn allows to better keep track of
vreg<->vreg MOV coalescing so that all vregs in the coalesce chain
get the effective |dead_before| of the last vreg.
A small performance improvement has been observed because this allows
to coalesce even spilled vregs (previously only assigned ones).
Reorder allocatable registers for s390x so that the callee saved are listed first.
Helper calls always trash all caller saved registers. By listing the callee saved
first then VEX register allocator (both v2 and v3) is more likely to pick them
and does not need to spill that much before helper calls.
Reorder allocatable registers for AMD64, X86, and PPC so that the callee saved are listed first.
Helper calls always trash all caller saved registers. By listing the callee saved
first then VEX register allocator (both v2 and v3) is more likely to pick them
and does not need to spill that much before helper calls.
iselStmtVec: take notice of the IR level branch hint.
When computing the cc to be embedded within the resulting HInstrIfThenElse,
it is necessary to take notice of the branch hint, and invert the sense of
the condition code if the |else| branch is to be the OOL one.
Ivo Raisr [Wed, 13 Sep 2017 15:38:13 +0000 (17:38 +0200)]
Clone and merge the register allocator states for If-Then-Else support.
The register allocator state is cloned in stage 4, before fall-through
and out-of-line legs are processed. The states are then merged back
at the legs join.
Reduce number of spill instructions generated by VEX register allocator v3.
Keeps track whether the bound real register has been reloaded from a virtual
register recently and if this real reg is still equal to that spill slot.
Avoids unnecessary spilling that vreg later, when this rreg needs
to be reserved, usually as a caller save register for a helper call.
Ivo Raisr [Fri, 25 Aug 2017 22:19:05 +0000 (00:19 +0200)]
VEX register allocator version 3.
Implements a new version of VEX register allocator which
keeps the main state per virtual registers, as opposed
to real registers in v2. This results in a simpler and
cleaner design and much simpler implementation.
It has been observed that the new allocator executes 20-30%
faster than the previous one but could produce slightly worse
spilling decisions. Overall performance improvement when running
the Valgrind performance regression test suite has been observed
in terms of a few percent.
The new register allocator (v3) is now the default one.
The old register allocator (v2) is still kept around and can be
activated with command line option '--vex-regalloc-version=2'.
Petar Jovanovic [Tue, 22 Aug 2017 13:53:15 +0000 (15:53 +0200)]
mips: reimplement handling of div, divu and ddivu
Previous implementation misused some opcodes, and a side effect was
dead code emission.
To reimplement handling of these instructions, three new IoPs have been
introduced:
Iop_DivModU64to64, // :: I64,I64 -> I128
// of which lo half is div and hi half is mod
Iop_DivModS32to32, // :: I32,I32 -> I64
// of which lo half is div and hi half is mod
Iop_DivModU32to32, // :: I32,I32 -> I64
// of which lo half is div and hi half is mod
Ivo Raisr [Fri, 18 Aug 2017 14:53:57 +0000 (16:53 +0200)]
Recognize signal 151 (SIGLIBRT) sent by gdb.
It has been observed that gdb on Solaris sends this signal to
child processes. Unfortunately array "pass_signals" was too small
to accomodate this signal and subsequently VG_(clo_vex_control).iropt_verbosity
was overwritten.
This has been fixed now.
Petar Jovanovic [Thu, 17 Aug 2017 18:08:17 +0000 (20:08 +0200)]
mips32: finetune vfp test to avoid compiler warnings
This patch removes two compiler warnings from the test:
vfp.c: In function 'handler':
vfp.c:260:4: warning: implicit declaration of function 'exit'
[-Wimplicit-function-declaration]
exit(0);
^
vfp.c:260:4: warning: incompatible implicit declaration of built-in
function 'exit'
vfp.c: At top level:
vfp.c:258:13: warning: 'handler' defined but not used [-Wunused-function]
static void handler(int sig)
^
Ensure host stack trace has better chance to work when valgrind is exiting
When investigating bug 383275, the host stacktrace was containing
only one IP. This is because the tid corresponding to the lwpid
is dead, and so no valid thread state was returned.
This then gave a rubbish stacktop of 0, which means unwinding
stops at first frame.
So, try harder to find a valid thread state when reporting the
host stacktrace.
When a massif xtree snapshot is taken when no allocation was done,
the xtree contains no exe context.
The data structure ips_order_xecu is then szied to 0 using VG_(hintSizeXA).
m_xarray.c then allocates an empty array, while later on, a zero size
is expected to correspond to no allocated array.
Fix the problem in m_xarray.c, by not doing any allocation if the
size hint is 0.
Ivo Raisr [Fri, 28 Jul 2017 20:49:20 +0000 (20:49 +0000)]
Check whether it is ok to use compiler flag '-pie'.
Some compilers actually do not support -pie and report its usage
as an error. We need to check if it is safe to use it first.
n-i-bz
valgrind core side for Add inner requests in VEX (cfr revision 3399)
When running Valgrind under Valgrind, the VEX memory allocation
(temporary or permanent) was not checked, as there was no
inner request.
This patch changes VEX to mark the temporary and permanent
allocations with redzone, and memory is marked unaddressable
when the VEX temporary pool is cleared.
The changes are:
* add a file libvex_inner.h which mostly takes over what
was in pub_core_inner.h (which now just includes libvex_inner.h)
* modify main_util.h and main_util.c to mark the temporary
and permanent pool with memcheck pool requests to indicate
when a block is allocated or freed.
* Impact is (should be) none, unless Valgrind is configured
as an inner.
* Outer memcheck/inner regression tests run on gcc20 (amd64).
Nothing (more worrying than the 3.13 self hosting) detected
When running Valgrind under Valgrind, the VEX memory allocation
(temporary or permanent) was not checked, as there was no
inner request.
This patch changes VEX to mark the temporary and permanent
allocations with redzone, and memory is marked unaddressable
when the VEX temporary pool is cleared.
The changes are:
* add a file libvex_inner.h which mostly takes over what
was in pub_core_inner.h (which now just includes libvex_inner.h)
* modify main_util.h and main_util.c to mark the temporary
and permanent pool with memcheck pool requests to indicate
when a block is allocated or freed.
* Impact is (should be) none, unless Valgrind is configured
as an inner.
* Outer memcheck/inner regression tests run on gcc20 (amd64).
Nothing (more worrying than the 3.13 self hosting) detected
ld.so: Reject overly long LD_PRELOAD path elements
arm32 doesn't have an ld.so hardwire for index/strchr like other
architectures and so will always complain during early startup:
==9495== Conditional jump or move depends on uninitialised value(s)
==9495== at 0x401CF84: index (in /usr/lib/ld-2.25.so)
==9495==
==9495== Conditional jump or move depends on uninitialised value(s)
==9495== at 0x401CF88: index (in /usr/lib/ld-2.25.so)
index/strchr is doing a word load from a partially-written
stack-allocated buffer, therefore accessing uninitialized data.
This is normal for an optimized string function. The uninitialized
data does not affect the function result.
This can be suppressed by adding a index hardwire for ld.so on arm32
like on other arches. There even was already some commented out code
to do that. Enable that code.
After fork, vgdb activity is polled according to the nr of bbs done :
once the nr of bbs done reaches the next vgdb poll, a check for vgdb
activity is done.
This might lead to the activation of gdbserver after fork.
Such poll is however not expected, unless the children is
to be trace.
This spurious poll in the forked child can cause failures
depending on the nr of bbs done before the fork, and the
nr of bbs done between the fork and the exec.
=> disable vgdb poll in the child in the cleanup after fork
in the child, unless the children have to be traced.
At the beginning of a Valgrind gdbserver test,
2 messages are produced when launching the command
target remote | vgdb
A message output by vgdb:
relaying data between gdb and process <pid>
(this message is read by GDB from the vgdb pipe, and re-output
on stderr)
and a message produced by GDB:
Remote debugging using | ./vgdb
GDB 8.0 changes the order in which the above messages are output.
This causes 2 tests to fail, as the 'relaying' line appears
then in a part of the output deleted by a filter script.
To avoid this, change the filter scripts to always remove
this 'relaying line', which is not particularly interesting to check.
All the .exp files containining such a 'relaying' line are updated
accordingly.
This has been tested with various gdb versions (7.5, 7.7, 7.12, 8.0)
on amd64 and/or ppc64.
Thanks to Mark Wielaard, which helped to investigate this problem
by bisecting the GDB patches in GDB 8.0 causing this change of
behaviour.
Mark Wielaard [Tue, 20 Jun 2017 17:55:13 +0000 (17:55 +0000)]
Bug 381274 powerpc too chatty even with --sigill-diagnostics=no.
Even with valgrind --sigill-diagnostics=no (or -q) guest_ppc_toIR.c
will report various cases why it didn't handle an instruction. e.g.
disInstr(ppc): found the Power 8 instruction 0x10000508 that can't be
handled by Valgrind on this host. This instruction requires a host
that supports Power 8 instructions.
After which valgrind will generate a SIGILL. But in case the user uses
-q or --sigill-diagnostics=no they aren't interested in that diagnostics.
For example openssl will try some power 8 instructions while initializing
and catch the SIGILL if not supported without issue.
Guard those cases with if (sigill_diag) like the generic decode_failure.
Mark Wielaard [Sat, 17 Jun 2017 13:49:22 +0000 (13:49 +0000)]
epoll_pwait can have a NULL sigmask.
According to the epoll_pwait(2) man page:
The sigmask argument may be specified as NULL, in which case
epoll_pwait() is equivalent to epoll_wait().
But doing that under valgrind gives:
==13887== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==13887== at 0x4F2B940: epoll_pwait (epoll_pwait.c:43)
==13887== by 0x400ADE: main (syscalls-2007.c:89)
==13887== Address 0x0 is not stack'd, malloc'd or (recently) free'd
This is because the sys_epoll_pwait wrapper has:
if (ARG4)
PRE_MEM_READ( "epoll_pwait(sigmask)", ARG5, sizeof(vki_sigset_t) );
Which looks like a typo (ARG4 is timeout and ARG5 is sigmask).
This shows up with newer glibc which translates an epoll_wait call into
an epoll_pwait call with NULL sigmask.