guest-ppc32
~~~~~~~~~~~
- store-with-update instrs: Valgrind pagefault handler expects faulting address >= current stack ptr, so we need to update the stack ptr register _before_ storing the old stack ptr
- branch_ctr_ok (bad calc for 'branch if %ctr zero' case)
- mcrf: scanning bitfields in the wrong direction
- on spotting the magic sequence, delta += 24
- updated DIPs for +ve-only args
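The store-with-update ordering can be sketched in C (a toy model, not VEX's actual code; `guest_gpr1` and `mem` are invented names for illustration):

```c
#include <stdint.h>

/* Toy model of the "stwu r1,-N(r1)" ordering fix (not VEX's actual
   code; guest_gpr1 and mem are invented names).  Valgrind's pagefault
   handler only treats a faulting store as stack growth when the
   faulting address is >= the current guest stack pointer, so the
   translation must lower r1 BEFORE storing the old r1. */

uint32_t guest_gpr1;           /* guest stack pointer, r1 */
uint32_t mem[1024];            /* toy guest memory, word-addressed */

void stwu_r1(int32_t disp)
{
    uint32_t old_sp = guest_gpr1;
    uint32_t ea     = old_sp + (uint32_t)disp;
    guest_gpr1 = ea;           /* 1: update r1 first ...          */
    mem[ea / 4] = old_sp;      /* 2: ... then store the old value */
    /* ea >= guest_gpr1 now holds, so a fault on the store is
       correctly recognised as stack extension. */
}
```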
host-ppc32
~~~~~~~~~~
- fixed CMov reg usage
- fixed Pin_Call in emit_PPC32Instr(): we already know how far we're jumping
- fixed Pin_Goto in emit_PPC32Instr(): vassert the right range of jump deltas
other-ppc32
~~~~~~~~~~~
- exported OFFSET_ppc32_(various) for valgrind
Julian Seward [Wed, 18 May 2005 11:47:47 +0000 (11:47 +0000)]
Handle XCHG rAX, reg for 32-bit regs as well as 64-bit regs. I'm not
sure this is right -- the AMD64 docs are very difficult to interpret
on the subtle point of precisely what is and isn't to be regarded as a
no-op.
Julian Seward [Thu, 12 May 2005 17:55:01 +0000 (17:55 +0000)]
Add the beginnings of what might be a general mechanism to pass
ABI-specific knowledge through the IR compilation pipeline. This
entails a new IR construction, AbiHint.
Currently there is only one kind of hint, and it is generated by the
amd64 front end. This tells whoever wants to know that a function
call or return has happened, and so the 128 bytes below %rsp should be
considered undefined.
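A hypothetical sketch of how a Memcheck-like consumer could act on the hint (the shadow array and function names here are invented for illustration, not Valgrind's real data structures):

```c
#include <stdint.h>

/* Hypothetical sketch: on seeing the AbiHint that a call or return
   happened at guest stack pointer rsp, a tool marks the 128 bytes
   below %rsp (the amd64 red zone) as undefined.  The shadow array
   and names are invented for illustration. */

#define SHADOW_SIZE 4096
uint8_t shadow[SHADOW_SIZE];   /* 1 = defined, 0 = undefined */

void mark_undefined(uint64_t addr, uint64_t len)
{
    for (uint64_t i = 0; i < len && addr + i < SHADOW_SIZE; i++)
        shadow[addr + i] = 0;
}

/* what the "call or return happened" hint would trigger */
void abi_hint_call_or_return(uint64_t rsp)
{
    mark_undefined(rsp - 128, 128);
}
```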
Julian Seward [Wed, 11 May 2005 23:16:13 +0000 (23:16 +0000)]
Allow reg-alloc to use %rbx. This is a callee-saved register and
therefore particularly valuable - bringing it into circulation reduces
the volume of code generated by memcheck by about 3%.
Julian Seward [Wed, 11 May 2005 22:55:08 +0000 (22:55 +0000)]
Ah, the joys of register allocation. You might think that giving
reg-alloc as many registers as possible maximises performance. You
would be wrong. Giving it more registers generates more spilling of
caller-saved regs around the innumerable helper calls created by
Memcheck. What we really need are zillions of callee-save registers,
but those are in short supply. Hmm, perhaps I should let it use %rbx
too -- that's listed as callee-save.
Anyway, the current arrangement allows reg-alloc to use 8
general-purpose regs and 10 xmm registers. The x87 registers are not
used at all. This seems to work fairly well.
Julian Seward [Mon, 9 May 2005 22:23:38 +0000 (22:23 +0000)]
Finish off amd64 MMX instructions before they finish me off (it's
either them or me). Honestly, the amd64 insn set has the most complex
encoding I have ever seen.
Julian Seward [Tue, 3 May 2005 12:20:15 +0000 (12:20 +0000)]
x86 guest: generate Iop_Neg* in the x86->IR phase. Intent is to
ensure that the non-shadow (real) computation done by the program will
fail if Iop_Neg* is incorrectly handled somehow. Until this point,
Iop_Neg* is only generated by memcheck and so it will not be obvious
if it is mishandled. IOW, this commit enhances verifiability of the
x86-IR-x86 pipeline.
Julian Seward [Mon, 2 May 2005 16:16:15 +0000 (16:16 +0000)]
Minor tweakage: use testl rather than andl in three places on the
basis that andl trashes the tested register whereas testl doesn't. In
two out of the three cases this makes no difference since the tested
register is a copy of some other register anyway, but hey.
Add a few new primops which allow for more concise expression of
the instrumentation Memcheck generates:
* CmpNEZ{8,16,32,64}, which are equivalent to CmpNE<sz> with one
argument zero
* Neg{8,16,32,64}, which is equivalent to Sub<sz> with the first
argument zero
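The stated equivalences can be written out directly; a C sketch of the 32-bit cases:

```c
#include <stdint.h>

/* The equivalences described above, written out for the 32-bit case. */
uint32_t CmpNEZ32(uint32_t x) { return x != 0; }   /* == CmpNE32(x, 0) */
uint32_t Neg32(uint32_t x)    { return 0u - x; }   /* == Sub32(0, x)   */
```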
For 64-bit platforms, add these primops. This gives a complete set of
primops for conversions between the integral types (I8, I16, I32,
I64), so that a widening/narrowing from any type to any other type can
be achieved in a single primop.
Fix a nasty assembler bug, in the handling of Set64, arising from
confusion over whether we were looking at a complete integer register
number or just the lower 3 bits of it. Rename functions pertaining to
messing with integer register numbers in an attempt to stop this
happening in future.
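The two views that got confused can be sketched as follows (helper names invented for illustration): an amd64 integer register number runs 0..15, but the instruction encoding splits it into the low 3 bits (placed in the ModRM byte or opcode) and the 4th bit (placed in a REX prefix bit).

```c
#include <stdint.h>

/* Sketch of the two views of an amd64 register number.  The helper
   names are invented; the point is that the complete number (0..15)
   and its low-3-bit encoding field must never be mixed up. */

uint8_t reg_lo3(uint8_t ireg)  { return ireg & 7; }        /* ModRM field  */
uint8_t reg_bit3(uint8_t ireg) { return (ireg >> 3) & 1; } /* REX.B/R bit  */
```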
When generating IR for movsd mem->reg, don't first write the entire
guest reg with zeroes and then overwrite the lower half. This forces
the back end to generate code which creates huge write-after-write
stalls in the memory system of P4s due to the different sized writes.
This apparently small change reduces the run-time of one
sse2-intensive floating point program from 145 seconds to 90 seconds
(--tool=none).
Whenever the flags thunk is set, fill in all the fields, even NDEP,
which isn't usually used. This makes redundant-PUT elimination work
better, fixing a rather subtle optimisation bug. For at least one
floating-point case this gives a significant speedup. Consider a bb
like this:
   (no flag setting insns before inc)
   inc ...
   (no flag setting insns)
   add ...
inc sets CC_OP, CC_DEP1 and CC_NDEP; the latter is expensive because a
call to calculate_eflags_c is required.
add sets CC_OP, CC_DEP1 and CC_DEP2. The CC_NDEP value is now
irrelevant, but because CC_NDEP is not overwritten, iropt cannot
remove the previous assignment to it, and so the expensive helper call
remains even though it is irrelevant.
This commit fixes that: by always setting NDEP to zero whenever its
value will be unused, any previous assignment to it will be removed by
iropt.
This change should be propagated to the amd64 front end too.
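The optimisation being unblocked can be sketched with a toy redundant-PUT eliminator (not VEX's actual iropt code; the statement representation is invented): a PUT to a guest-state offset is dead if the same offset is PUT again later in the block with no intervening GET of it.

```c
#include <stdbool.h>

/* Toy model (not VEX's iropt) of redundant-PUT elimination: a PUT to
   a guest-state offset is dead if the same offset is PUT again later
   in the block, with no GET of that offset in between. */

enum { PUT, GET };
typedef struct { int kind; int offset; } Stmt;

void elim_redundant_puts(const Stmt *s, int n, bool *dead)
{
    for (int i = 0; i < n; i++) {
        dead[i] = false;
        if (s[i].kind != PUT) continue;
        for (int j = i + 1; j < n; j++) {
            if (s[j].offset != s[i].offset) continue;
            if (s[j].kind == GET) break;   /* read first: must keep  */
            dead[i] = true;                /* overwritten: removable */
            break;
        }
    }
}
```

In these terms, once `add` also writes CC_NDEP, the earlier expensive CC_NDEP assignment made for `inc` is overwritten without being read, so a pass like this can drop it.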