Julian Seward [Wed, 25 Jan 2006 21:29:48 +0000 (21:29 +0000)]
Change the way the ppc backend does ppc32/64 float-integer
conversions. fctiw/fctid/fcfid/stfiwx are now represented explicitly
and are generated by the instruction selector. This removes the need
for hdefs.c to know anything about scratch areas on the stack and
scratch FP registers.
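A minimal sketch of what the instruction selector might now do for an
F64->I32 conversion, assuming the usual host-side isel helpers
(iselDblExpr, newVRegF/newVRegI, addInstr) are in scope; the
PPCInstr_FpCtiW / PPCInstr_FpSTFIWX constructor names are illustrative,
not the actual hdefs.c API:

    static HReg iselF64toI32 ( ISelEnv* env, IRExpr* e )
    {
       HReg fsrc = iselDblExpr(env, e);  /* F64 value to convert     */
       HReg ftmp = newVRegF(env);        /* FP scratch is now just a
                                            virtual reg, left to the
                                            register allocator       */
       HReg idst = newVRegI(env);        /* integer result           */
       /* fctiw: round to I32 per the FPSCR rounding mode; result
          sits in the low word of ftmp */
       addInstr(env, PPCInstr_FpCtiW(ftmp, fsrc));
       /* stfiwx + load: move the raw bits from the FP register file
          to the GPR file via memory */
       addInstr(env, PPCInstr_FpSTFIWX(idst, ftmp));
       return idst;
    }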
Julian Seward [Fri, 20 Jan 2006 14:19:25 +0000 (14:19 +0000)]
More ppc64-only function wrapping hacks:
- increase size of redirect stack from 8 to 16 elems
- augment the _NRADDR pseudo-register with _NRADDR_GPR2,
which is the value of R2 at the most recent divert point.
This is needed in the ELF ppc64 ABI in order to safely run
the function being wrapped.
- add pseudo-instruction to read _NRADDR_GPR2 into _GPR3.
- related change: always keep R2 up to date wrt possible memory
exceptions (no specific reason, just being conservative)
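A reduced, compilable sketch of the guest-state fields this entry is
about (the real VexGuestPPC64State has many more fields; the struct
name and types here are illustrative):

    #include <stdint.h>

    typedef struct {
       uint64_t guest_NRADDR;          /* addr of the original fn at
                                          the most recent divert
                                          point                      */
       uint64_t guest_NRADDR_GPR2;     /* R2 (TOC pointer) at that
                                          point; the ELF ppc64 ABI
                                          needs it to run the wrapped
                                          function                   */
       uint64_t guest_REDIR_SP;        /* redirect-stack pointer     */
       uint64_t guest_REDIR_STACK[16]; /* grown from 8 to 16 elems   */
    } PPC64WrapState;                  /* illustrative name */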
Julian Seward [Wed, 18 Jan 2006 04:14:52 +0000 (04:14 +0000)]
For ppc64, emit AbiHints from the front end so as to tell tools when
the 288-byte area below the stack pointer should be regarded as having
become undefined as per the ppc64 ELF ABI.
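What the emitted hint amounts to, sketched in the style of the
guest_ppc_toIR.c helpers (stmt/binop/mkU64 are the real constructors;
the three-argument IRStmt_AbiHint form with a next-insn-address is
assumed):

    static void put_redzone_AbiHint ( IRExpr* r1 /* stack ptr */,
                                      IRExpr* nia /* next insn addr */ )
    {
       /* the 288 bytes below R1 become undefined across the
          call/return, per the ppc64 ELF ABI */
       stmt( IRStmt_AbiHint( binop(Iop_Sub64, r1, mkU64(288)),
                             288, nia ) );
    }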
Julian Seward [Tue, 17 Jan 2006 01:48:46 +0000 (01:48 +0000)]
Two different sets of changes (hard to disentangle):
* Remove from Vex all knowledge about function wrapping. All the IR
trickery needed can be done on the Valgrind side, by giving
LibVEX_Translate yet another callback. This one is called just
before any instructions are disassembled into IR, allowing Valgrind
to insert its own IR preamble if it wants. It also allows Valgrind
to inhibit any insn disassembly for the block. The effect is that
Valgrind can provide arbitrary IR for a given translation and have Vex
process it as usual; that IR does not have to bear any relationship to
any guest insns anywhere.
* Consistently pass a void* closure argument as the first parameter to
all Valgrind-supplied callbacks. This gets rid of various nasty hacks
at the Valgrind side to do with passing instance-specific values
to callbacks.
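The shape this gives the new hook, as a sketch (PreambleFn is an
illustrative typedef name; the void*-closure-first convention is the
one described above):

    /* called just before any guest insns are disassembled into IR;
       returning True means "the closure's owner has supplied the
       block's IR itself -- do not disassemble anything" */
    typedef Bool (*PreambleFn) ( void* closure, IRBB* bb );

Because every callback receives the same closure, Valgrind can hang
all its per-translation state off that one pointer instead of using
globals.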
Julian Seward [Tue, 17 Jan 2006 01:42:56 +0000 (01:42 +0000)]
Give the ppc64 guest state a 16-entry pseudo-register array,
guest_REDIR_STACK. This is used (along with a stack pointer,
guest_REDIR_SP) by Valgrind to support function replacement and
wrapping on ppc64-linux. Due to the strange ppc64-linux ABI, both
replacement and wrapping require saving (%R2,%LR) pairs on a stack,
and this provides the stack.
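A minimal self-contained sketch of how such a stack is used (field
names follow the text; overflow checking omitted for brevity):

    #include <stdint.h>

    typedef struct {
       uint64_t guest_REDIR_SP;         /* index of next free slot */
       uint64_t guest_REDIR_STACK[16];  /* holds (R2, LR) pairs    */
    } RedirState;

    /* entering a redirected function: save the caller's (R2, LR) */
    static void redir_push ( RedirState* st, uint64_t r2, uint64_t lr )
    {
       st->guest_REDIR_STACK[st->guest_REDIR_SP++] = r2;
       st->guest_REDIR_STACK[st->guest_REDIR_SP++] = lr;
    }

    /* returning through the redirection: restore them */
    static void redir_pop ( RedirState* st, uint64_t* r2, uint64_t* lr )
    {
       *lr = st->guest_REDIR_STACK[--st->guest_REDIR_SP];
       *r2 = st->guest_REDIR_STACK[--st->guest_REDIR_SP];
    }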
Julian Seward [Tue, 17 Jan 2006 01:39:15 +0000 (01:39 +0000)]
Teach the ppc back end (64-bit mode only) how to deal with PutI and
GetI. These are needed by the IR trickery which supports function
replacement/wrapping on ppc64-linux.
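The sort of IR this makes code-generatable, sketched with the
front-end constructors (mkIRArray / IRStmt_PutI / IRExpr_GetI are the
real descriptor and statement builders; OFFB_REDIR_STACK is an
illustrative guest-state offset name):

    static void putRedirSlot ( IRExpr* ix, IRExpr* val )
    {
       /* describe the 16-entry I64 array in the guest state */
       IRArray* descr = mkIRArray( OFFB_REDIR_STACK, Ity_I64, 16 );
       /* guest_REDIR_STACK[ix + 0] = val  (indexed, wrap-around) */
       stmt( IRStmt_PutI(descr, ix, 0/*bias*/, val) );
    }

    static IRExpr* getRedirSlot ( IRExpr* ix )
    {
       IRArray* descr = mkIRArray( OFFB_REDIR_STACK, Ity_I64, 16 );
       return IRExpr_GetI(descr, ix, 0/*bias*/);
    }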
Note: Only the rounding mode field of the FPSCR is supported.
- Reads from any other bits return zero.
- Writes to any other bits are ignored.
- Writes to the 'exception control' bits or the 'non-IEEE mode' bit result in an emulation warning.
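A compilable model of that policy (bit positions follow the PowerPC
FPSCR layout -- RN in the low two bits, NI at bit 2, the VE/OE/UE/ZE/XE
enables at bits 3-7; the emwarn flag stands in for VEX's
emulation-warning machinery):

    #include <stdint.h>

    #define FPSCR_RN_MASK    0x03u  /* rounding mode: only state kept */
    #define FPSCR_WARN_MASK  0xFCu  /* NI bit + exception-enable bits */

    static uint32_t fpscr_read ( uint32_t guest_rn )
    {
       return guest_rn & FPSCR_RN_MASK;  /* all other bits read as 0 */
    }

    static uint32_t fpscr_write ( uint32_t new_val, int* emwarn )
    {
       if (new_val & FPSCR_WARN_MASK)
          *emwarn = 1;    /* enables / non-IEEE mode: warn, then drop */
       return new_val & FPSCR_RN_MASK;   /* everything else ignored  */
    }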
Fix switchback.c to reflect changes to the LibVEX_Translate() call
Fix test_ppc_jm1.c to reflect direct linking
- main -> __main etc
- vex_printf -> vexxx_printf etc
Fixed up front end and back end for 32-bit mul/div/cmp/shift in mode64
Backend:
- separated shifts from other alu ops
- gave {shift, mul, div, cmp} ops a bool to indicate 32|64bit insn
- fixed and implemented more mode64 cases
Also improved some IR by moving immediates to the right-hand arg of binops - the backend assumes this (see the sketch after this entry).
All integer ppc32 insns now pass switchback tests in 64bit mode.
(ppc64-only insns not yet fully tested)
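The immediates-to-the-right rule mentioned above, sketched with the
real IRExpr_Binop constructor and a hypothetical isCommutative
predicate:

    static IRExpr* mkCanonicalBinop ( IROp op, IRExpr* a1, IRExpr* a2 )
    {
       /* (op imm, e) => (op e, imm), so the back end only ever has
          to match an immediate in the second operand slot */
       if (isCommutative(op)
           && a1->tag == Iex_Const && a2->tag != Iex_Const) {
          IRExpr* tmp = a1; a1 = a2; a2 = tmp;
       }
       return IRExpr_Binop(op, a1, a2);
    }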
Julian Seward [Thu, 15 Dec 2005 14:02:34 +0000 (14:02 +0000)]
- x86 back end: change code generation convention, so that instead of
dispatchers CALLing generated code which later RETs, dispatchers
jump to generated code and it jumps back to the dispatcher. This
removes two memory references per translation run and by itself
gives a measurable performance improvement on P4. As a result,
there is new plumbing so that the caller of LibVEX_Translate can
supply the address of the dispatcher to jump back to.
This probably breaks all other targets. Do not update.
- Administrative cleanup: LibVEX_Translate has an excessive
number of arguments. Remove them all and instead add a struct
by which the arguments are supplied. Add further comments
about the meaning of some fields.
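A cut-down sketch of the argument struct (field names here are chosen
to match the flavour of the new interface, but treat the exact layout
as illustrative):

    typedef struct {
       VexArch  arch_guest;        /* architecture translated from    */
       VexArch  arch_host;         /* architecture generated for      */
       void*    callback_opaque;   /* closure passed to all callbacks */
       UChar*   guest_bytes;       /* guest insns to translate        */
       Addr64   guest_bytes_addr;  /* their guest address             */
       void*    dispatch;          /* dispatcher address generated
                                      code jumps back to              */
       /* ... instrumentation/chase-into callbacks, output buffer ... */
    } VexTranslateArgs;

    /* usage: fill one in, then pass a single pointer:
          VexTranslateResult res = LibVEX_Translate(&vta);            */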
Switchbacker updates
- no longer using home-grown linker - simply compiling and linking switchback.c with test_xxx.c
- updated to handle ppc64 (along with its weirdo function descriptors...)
- have to be careful not to use exported functions from libvex_arch_linux.a, hence vex_printf -> vexxx_printf in test_xxx.c
Julian Seward [Mon, 28 Nov 2005 13:39:37 +0000 (13:39 +0000)]
Modify the tree builder to use a fixed-size binding environment rather
than one that is potentially proportional to the length of the input
BB. This changes its complexity from quadratic to linear (in the
length of the BB) and gives a noticeable increase in the overall speed
of vex. The tradeoff is that it can no longer guarantee to build
maximal trees, but in practice it only rarely fails to do so (about 1
in 100 bbs), so the resulting degradation in code quality is
completely insignificant (unmeasurable).
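The fixed-size environment in miniature, as a self-contained sketch
(the size constant and eviction policy are illustrative; the real
builder's details may differ):

    #define N_ENV 10   /* fixed, regardless of BB length */

    typedef struct { int tmp; void* expr; } Binding;

    static Binding env[N_ENV];
    static int     n_env = 0;

    /* bind tmp = expr; if the env is full, flush the oldest binding
       back out as an explicit assignment first.  That binding can no
       longer be folded into a tree -- the rare quality loss traded
       for O(n) behaviour. */
    static void addBinding ( int tmp, void* expr,
                             void (*flushStmt)(int, void*) )
    {
       int i;
       if (n_env == N_ENV) {
          flushStmt(env[0].tmp, env[0].expr);
          for (i = 1; i < N_ENV; i++)
             env[i-1] = env[i];
          n_env--;
       }
       env[n_env].tmp  = tmp;
       env[n_env].expr = expr;
       n_env++;
    }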
Frontend
--------
Added a bunch of altivec float insns:
vaddfp, vsubfp, vmaxfp, vminfp,
vrefp, vrsqrtefp
vcmpgefp, vcmpgtfp, vcmpbfp
Made use of the fact that the ppc backend's compare insns return
zeroed lanes if either of the corresponding args is a NaN.
- perhaps better to have an irop Iop_isNan32Fx4, but that seems unnecessary work until we get into running non-native code through vex.
- better still, tighten down the spec for the compare irops wrt NaNs
Backend
-------
Separated av float ops to own insn group - they're only ever type 32x4
Added av float unary insns
Added av float cmp insns - for irops that don't map directly to native insns, native behaviour wrt NaNs is followed, requiring lane==NaN comparisons for each argument vector.
Fix usage of Iop_MullEven* to give the IR the correct meaning of which lanes are being multiplied, i.e. least significant lane = zero
(rather than the IBM-speak 'most significant = zero')
Added various helpers to construct IR:
- expand8x16*: sign/zero-extend V128_8x16 lanes => 2x V128_16x8
- breakV128to4x64*: break V128 to 4xI32's, sign/zero-extend to I64's
- mkQNarrow64to32*: un/signed saturating narrow 64 to 32
- mkV128from4x64*: narrow 4xI64's to 4xI32's, combine to V128_32x4
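For example, the zero-extending variant of expand8x16 falls out of the
stock interleave ops (mkV128/binop are the usual front-end
constructors; the helper signature is illustrative):

    static void expand8x16U ( IRExpr* vIn,
                              IRExpr** vHi16x8, IRExpr** vLo16x8 )
    {
       IRExpr* zero = mkV128(0x0000);   /* all-zero V128 */
       /* interleaving zero bytes with the source bytes widens each
          8-bit lane to 16 bits, zero-extended (ppc lane order) */
       *vHi16x8 = binop(Iop_InterleaveHI8x16, zero, vIn);
       *vLo16x8 = binop(Iop_InterleaveLO8x16, zero, vIn);
    }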