Julian Seward [Fri, 18 Jun 2010 08:17:41 +0000 (08:17 +0000)]
Implement SSE4 instructions: PCMPGTQ PMAXUD PMINUD PMAXSB PMINSB PMULLD
I believe this covers everything that gcc-4.4 and gcc-4.5 will generate
with "-O3 -msse4.2". Note, this commit changes the set of IR ops and so
requires a from-scratch rebuild of the tree.
Julian Seward [Mon, 7 Jun 2010 16:22:22 +0000 (16:22 +0000)]
Implement SIDT and SGDT as pass-throughs to the host. It's a pretty
bad thing to do, but I can't think of a way to virtualise these
properly. Patch from Alexander Potapenko. See
https://bugs.kde.org/show_bug.cgi?id=205241#c38
Julian Seward [Tue, 4 May 2010 08:48:43 +0000 (08:48 +0000)]
Handle v7 memory fence instructions: ISB DSB DMB and their v6 equivalents:
mcr 15,0,r0,c7,c5,4 mcr 15,0,r0,c7,c10,4 mcr 15,0,r0,c7,c10,5
respectively. Re-emit them in the v6 form so as not to inhibit possible
support for v6-only platforms in the future. Extended version of a patch
from Alexander Potapenko (glider@google.com). Fixes bug 228060.
Julian Seward [Sun, 21 Feb 2010 20:40:53 +0000 (20:40 +0000)]
CVTPI2PD (which converts 2 x I32 in M64 or MMX to 2 x F64 in XMM):
only switch the x87 FPU to MMX mode in the case where the source
operand is in memory, not in an MMX register. This fixes #210264.
This is all very fishy.
* it's inconsistent with all other instructions which convert between
values in (MMX or M64) and XMM, in that they put the FPU in MMX mode
even if the source is memory, not MMX (for example, CVTPI2PS).
At least, that's what the Intel docs appear to say.
* the AMD documentation makes no mention at all of this. For example
it makes no differentiation in this matter between CVTPI2PD and
CVTPI2PS.
I wonder if Intel surreptitiously changed the behaviour of CVTPI2PD
since this code was written circa 5 years ago. Or, whether the Intel
and AMD implementations differ in this respect.
Julian Seward [Sun, 17 Jan 2010 15:47:01 +0000 (15:47 +0000)]
x86/amd64 front ends: don't chase a conditional branch that leads
back to the start of the trace. It's better to leave the IR loop
unroller to handle such cases.
Julian Seward [Fri, 15 Jan 2010 10:53:21 +0000 (10:53 +0000)]
Add logic to allow front ends to speculatively continue adding guest
instructions into IRSBs (superblocks) after conditional branches.
Currently only the x86 and amd64 front ends support this. The
assumption is that backwards conditional branches are taken and
forwards conditional branches are not taken, which is generally
regarded as plausible and is particularly effective with code compiled
by gcc at -O2, -O3 or -O -freorder-blocks (-freorder-blocks is enabled
by default at -O2 and above).
It is disabled by default. It has been seen to provide notable
speedups (eg, --tool=none for perf/bz2), and reduces the number of
block-to-block transitions dramatically, by up to half, although it
can make some programs run more slowly. It increases the amount of
generated code by at least 15%-20% and so is a net liability in terms
of icache misses and JIT time.
Julian Seward [Mon, 11 Jan 2010 10:46:18 +0000 (10:46 +0000)]
For 32-bit reads of integer guest registers, generate a 64-bit Get
followed by an Iop_64to32 narrowing, rather than doing a 32-bit Get.
This makes the Put-to-Get-forwarding optimisation work seamlessly for
code which does 32-bit register operations (very common), which it
never did before. Also add a folding rule to remove the resulting
32-to-64-to-32 widen-narrow chains.
This reduces the amount of code generated overall by about 3%, but gives
a much larger speedup, of about 11% for Memcheck running perf/bz2.c.
Not sure why this is, perhaps due to reducing store bandwidth
requirements in the generated code, or due to avoiding
store-forwarding stalls when writing/reading the guest state.
Julian Seward [Sat, 9 Jan 2010 11:43:21 +0000 (11:43 +0000)]
* support PLD (cache-preload-hint) instructions
* start of a framework for decoding instructions in NV space
* fix a couple of unused/untested RRX shifter operand cases
Julian Seward [Thu, 31 Dec 2009 19:26:03 +0000 (19:26 +0000)]
Make the x86 and amd64 back ends use the revised prototypes for
genSpill and genReload. ppc32/64 backends are still broken.
Also, tidy up associated pointer-type casting in main_main.c.
Julian Seward [Thu, 31 Dec 2009 18:00:12 +0000 (18:00 +0000)]
Merge r1925:1948 from branches/ARM. This temporarily breaks all other
targets, because a few IR primops to do with int<->float conversions
have been renamed, and because an internal interface for creating
spill/reload instructions has changed.
Julian Seward [Thu, 26 Nov 2009 17:17:37 +0000 (17:17 +0000)]
Change the IR representation of load linked and store conditional.
They are now moved out into their own new IRStmt kind (IRStmt_LLSC),
and are not treated merely as variants of standard loads (IRExpr_Load)
or stores (IRStmt_Store). This is necessary because load linked is a
load with a side effect (lodging a reservation), hence it cannot be an
IRExpr since IRExprs denote side-effect free value computations.
Fix up all front and back ends accordingly; also iropt.
Use a much faster hash function to do the self-modifying-code checks.
This reduces the extra overhead of --smc-check=all when running
Memcheck from about 75% to about 45%.
Julian Seward [Sun, 2 Aug 2009 14:35:45 +0000 (14:35 +0000)]
Implement mfpvr (mfspr 287) (bug #201585).
Also, fix a type mismatch in the generated IR for mfspr 268/269 which
would have caused an IR checker assertion failure when handling those
insns on ppc64.
Tell the register allocator on x86 that xmm0..7 are trashed across
function calls. This forces it to handle them as caller-saved, which
is (to the extent that it's possible to tell) what the ELF ABI
requires. Lack of this has been observed to corrupt floating point
computations in tools that use the xmm registers in the helper
functions called from generated code. This change brings the x86
backend into line with the amd64 backend, the latter of which has
always treated the xmm regs as caller-saved.
The x87 registers are still incorrectly handled as callee-saved.
Add new integer comparison primitives Iop_CasCmp{EQ,NE}{8,16,32,64},
which are semantically identical to Iop_Cmp{EQ,NE}{8,16,32,64}. Use
these new primitives instead of the normal ones, in the tests
following IR-level compare-and-swap operations, which establish
whether or not the CAS succeeded. This is all for Memcheck's benefit,
as it really needs to be able to identify which comparisons are
CAS-success tests and which aren't. This is all described in great
detail in memcheck/mc_translate.c in the comment
"COMMENT_ON_CasCmpEQ".
Flatten out the directory structure in the priv/ side, by pulling all
files into priv/ and giving them unique names. This makes it easier
to use automake to build all this stuff in Valgrind. It also tidies
up a directory structure which had become a bit pointlessly complex.
This branch adds proper support for atomic instructions, proper in the
sense that the atomicity is preserved through the compilation
pipeline, and thus in the instrumented code.
The change adds a new IR statement kind, IRStmt_CAS, which represents
single- and doubleword compare-and-swap. This is used as the basis
for the translation of all LOCK-prefixed instructions on x86 and
amd64.
The change also extends IRExpr_Load and IRStmt_Store so that
load-linked and store-conditional operations can be represented. This
facilitates correct translation of l[wd]arx and st[wd]cx. on ppc in
the sense that these instructions will now eventually be regenerated
at the end of the compilation pipeline.
Julian Seward [Thu, 19 Mar 2009 22:21:40 +0000 (22:21 +0000)]
In order to make it possible for Valgrind on Darwin to restart client
syscalls that have been interrupted by signals, generalise an idea
which first emerged in the guest ppc32/64 support, where it solved the
same problem on AIX.
Idea is: make all guests have a pseudo-register "IP_AT_SYSCALL", which
records the address of the most recently executed system call
instruction. Then, to back up the guest over the most recent syscall,
simply make its program counter equal to this value. This idea
already existed for the ppc32/64 guests, but the register was
called "CIA_AT_SC".
It is currently not set in guest-amd64.
This commit will break the Valgrind svn trunk (temporarily).
Julian Seward [Wed, 4 Jun 2008 09:10:38 +0000 (09:10 +0000)]
Translate "fnstsw %ax" in a slightly different way, which plays better
with Memcheck's origin tracking. It's a lame kludge; see
comments in the source.
Julian Seward [Fri, 30 May 2008 22:58:07 +0000 (22:58 +0000)]
In some obscure circumstances, the allocator would incorrectly omit a
spill store on the basis that the register being spilled had the same
value as the spill slot being written to. This change is believed to
make the equals-spill-slot optimisation correct. Fixes a bug first
observed by Nuno Lopes and later by Marc-Oliver Straub.
Julian Seward [Sun, 11 May 2008 10:11:58 +0000 (10:11 +0000)]
Compute the starting address of the instruction correctly. This has
always been wrong and can cause the next-instruction-address to be
wrong in obscure circumstances. Fixes #152818.