Julian Seward [Mon, 1 May 2006 02:14:17 +0000 (02:14 +0000)]
Counterpart to r1605: in the ppc insn selector, don't use the bits in
VexArchInfo.hwcaps to distinguish ppc32 and ppc64. Instead pass
the host arch around. And associated plumbing.
Don't use the bits in VexArchInfo.hwcaps to distinguish ppc32 and ppc64,
since that doesn't work properly. Instead pass the guest arch around
too. Small change with lots of associated plumbing.
Julian Seward [Tue, 7 Mar 2006 01:15:50 +0000 (01:15 +0000)]
Move the helper function for x86 'fxtract' to g_generic_x87.c so
it can be shared by the x86 and amd64 front ends, then use it to
implement fxtract on amd64.
Julian Seward [Wed, 8 Feb 2006 19:30:46 +0000 (19:30 +0000)]
Redo the way FP multiply-accumulate insns are done on ppc32/64.
Instead of splitting them up into a multiply and an add/sub, add 4 new
primops which keep the operation as a single unit. Then, in the back
end, re-emit them as a single instruction.
Reason for this is that so-called fused-multiply-accumulate -- which
is what ppc does -- generates a double-double length intermediate
result (of the multiply, 112 mantissa bits) before doing the add, and
so it is impossible to do a bit-accurate simulation of it using AddF64
and MulF64.
Unfortunately the new primops unavoidably take 4 args (a rounding mode
+ 3 FP args) and so there is a new IRExpr expression type, IRExpr_Qop
and associated supporting junk.
Julian Seward [Sat, 4 Feb 2006 15:24:00 +0000 (15:24 +0000)]
Make the CSE pass more aggressive. It now commons up Mux0X and GetI
expressions too. This generates somewhat better FP code on x86 since
it removes more redundant artefacts from the x87 FP stack simulation.
Unfortunately commoning up GetIs complicates CSE, since it is now
possible that "available expressions" collected by the CSEr will
become invalidated by writes to the guest state as we work through the
block. So there is additional code to check for this case.
Some supporting functions (getAliasingRelation_IC and
getAliasingRelation_II) have been moved earlier in the file.
Julian Seward [Fri, 3 Feb 2006 16:08:03 +0000 (16:08 +0000)]
An overhaul of VEX's floating point handling, to facilitate correct
simulation of IEEE rounding modes in all FP operations.
The fundamental change is to add a third argument to the basic
floating point primops, eg AddF64, MulF64, etc, indicating the
(IR-encoded) rounding mode to be used for that operation.
Unfortunately IR did not have any way to support three-argument
primops, which means a new kind of IRExpr has been added: a ternary
op, IRExpr_Triop, which is simply a 3-argument form of the existing IR
binary operation node. The unfortunate side effect is that the size
of the union type IRExpr has increased from 16 to 20 bytes on 32-bit
platforms, and hence the JIT chews through more memory, but this does
not appear to have a measurable effect on the JIT's performance, at
least as measured by Valgrind's perf suite.
* Add IRExpr_Triop, and add handling code to dozens of places which
examine IRExprs.
* Rename/retype a bunch of floating point IR primops to take a 3rd
rounding mode argument (which is always the first arg).
* Add extra primops AddF64r32 et al, which do double-precision FP
operations and then round to single precision, still within a 64-bit
type. This is needed to simulate PPC's fadds et al without double
rounding.
* Adjust the PPC->IR front end, to generate these new primops and
rounding modes.
* Cause the IR optimiser to do a CSE pass on blocks containing any
floating point operations. This commons up the IR rounding mode
computations, which is important for generating efficient code from
the backend.
* Adjust the IR->PPC back end, so as to emit instructions to set the
rounding mode before each FP operation. Well, at least in
principle. In practice there is a bit of cleverness to avoid
repeatedly setting it to the same value. This depends on both the
abovementioned CSE pass, and on the SSA property of IR (cool stuff,
SSA!). The effect is that for most blocks containing FP code, the
rounding mode is set just once, at the start of the block, and the
resulting overhead is minimal. See comment on
set_FPU_rounding_mode().
This change requires followup changes in memcheck. Also, the
x86/amd64 front/back ends are temporarily broken.
Julian Seward [Fri, 27 Jan 2006 21:20:15 +0000 (21:20 +0000)]
Change the way Vex represents architecture variants into something
more flexible. Prior to this change, the type VexSubArch effectively
imposed a total ordering on subarchitecture capabilities, which was
overly restrictive. This change moves to effectively using a bit-set,
allowing some features (instruction groups) to be supported or not
supported independently of each other.
Julian Seward [Wed, 25 Jan 2006 21:29:48 +0000 (21:29 +0000)]
Change the way the ppc backend does ppc32/64 float-integer
conversions. fctiw/fctid/fcfid/stfiwx are now represented explicitly
and are generated by the instruction selector. This removes the need
for hdefs.c to know anything about scratch areas on the stack and
scratch FP registers.
Julian Seward [Fri, 20 Jan 2006 14:19:25 +0000 (14:19 +0000)]
More ppc64-only function wrapping hacks:
- increase size of redirect stack from 8 to 16 elems
- augment the _NRADDR pseudo-register with _NRADDR_GPR2,
which is the value of R2 at the most recent divert point.
This is needed in the ELF ppc64 ABI in order to safely run
the function being wrapped.
- add pseudo-instruction to get _NRADDR_GPR2 into _GPR3.
- related change: always keep R2 up to date wrt possible memory
exceptions (no specific reason, just being conservative)
Julian Seward [Wed, 18 Jan 2006 04:14:52 +0000 (04:14 +0000)]
For ppc64, emit AbiHints from the front end so as to tell tools when
the 288-byte stack red zone should be regarded as having become
undefined as per the ppc64 ELF ABI.
Julian Seward [Tue, 17 Jan 2006 01:48:46 +0000 (01:48 +0000)]
Two different sets of changes (hard to disentangle):
* Remove from Vex all knowledge about function wrapping. All the IR
trickery needed can be done on the Valgrind side, by giving
LibVEX_Translate yet another callback. This one is called just
before any instructions are disassembled into IR, allowing Valgrind
to insert its own IR preamble if it wants. It also allows Valgrind
to inhibit any insn disassembly for the block. Effect is that this
allows Valgrind to provide any old IR for a given translation, and
have Vex process it as usual, yet that IR can be anything and does
not have to bear any relationship to any guest insns anywhere.
* Consistently pass a void* closure argument as the first parameter to
all Valgrind-supplied callbacks. This gets rid of various nasty hacks
at the Valgrind side to do with passing instance-specific values
to callbacks.
Julian Seward [Tue, 17 Jan 2006 01:42:56 +0000 (01:42 +0000)]
Give the ppc64 guest state a 16-entry pseudo-register array,
guest_REDIR_STACK. This is used (along with a stack pointer,
guest_REDIR_SP) by Valgrind to support function replacement and
wrapping on ppc64-linux. Due to the strange ppc64-linux ABI, both
replacement and wrapping require saving (%R2,%LR) pairs on a stack,
and this provides the stack.
Julian Seward [Tue, 17 Jan 2006 01:39:15 +0000 (01:39 +0000)]
Teach the ppc back end (64-bit mode only) how to deal with PutI and
GetI. These are needed by the IR trickery which supports
function replacement/wrapping on ppc64-linux.
Note: Only the rounding mode field of the FPSCR is supported.
- Reads from any other bits return zero.
- Writes to any other bits are ignored; writes to the 'exception
  control' bits or the 'non-ieee mode' bit result in an emulation
  warning.
Fix switchback.c to reflect changes to call of LibVEX_Translate()
Fix test_ppc_jm1.c to reflect direct linking
- main -> __main etc
- vex_printf -> vexxx_printf etc
Fixed up the front end and back end for 32-bit mul, div, cmp and shift
in mode64.
Backend:
- separated shifts from other alu ops
- gave {shift, mul, div, cmp} ops a bool to indicate 32- vs 64-bit insn
- fixed and implemented more mode64 cases
Also improved some IR by moving immediates to the right-hand arg of
binops - the backend assumes this.
All integer ppc32 insns now pass switchback tests in 64-bit mode.
(ppc64-only insns not yet fully tested)
Julian Seward [Thu, 15 Dec 2005 14:02:34 +0000 (14:02 +0000)]
- x86 back end: change code generation convention, so that instead of
dispatchers CALLing generated code which later RETs, dispatchers
jump to generated code and it jumps back to the dispatcher. This
removes two memory references per translation run and by itself
gives a measurable performance improvement on P4. As a result,
there is new plumbing so that the caller of LibVEX_Translate can
supply the address of the dispatcher to jump back to.
This probably breaks all other targets. Do not update.
- Administrative cleanup: LibVEX_Translate has an excessive
number of arguments. Remove them all and instead add a struct
by which the arguments are supplied. Add further comments
about the meaning of some fields.
Switchbacker updates
- no longer using a home-grown linker - simply compiling and linking switchback.c with test_xxx.c
- updated to handle ppc64 (along with its weirdo function descriptors...)
- have to be careful not to use exported functions from libvex_arch_linux.a, hence vex_printf -> vexxx_printf in test_xxx.c