An overhaul of VEX's floating point handling, to facilitate correct
simulation of IEEE rounding modes in all FP operations.
The fundamental change is to add a third argument to the basic
floating point primops, e.g. AddF64 and MulF64, indicating the
(IR-encoded) rounding mode to be used for that operation.
Unfortunately IR did not have any way to support three-argument
primops, which means a new kind of IRExpr has been added: a ternary
op, IRExpr_Triop, which is simply a 3-argument form of the existing IR
binary operation node. The unfortunate side effect is that the size
of the union type IRExpr has increased from 16 to 20 bytes on 32-bit
platforms, and hence the JIT chews through more memory, but this does
not appear to have a measurable effect on the JIT's performance, at
least as measured by Valgrind's perf suite.
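
As a rough illustration of the shape of this change, here is a
cut-down sketch of an expression node with a ternary variant. All
names (Ex, EX_TRIOP, count_nodes, demo_triop_node_count) are
hypothetical and much simpler than the real IRExpr; the only points
being made are that the ternary variant carries one more child pointer
than the binary one, and that every switch over the node tag needs a
new case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down expression node; NOT the real IRExpr. */
typedef enum { EX_CONST, EX_BINOP, EX_TRIOP } ExTag;

typedef struct Ex Ex;
struct Ex {
    ExTag tag;
    union {
        struct { long val; } Const;
        struct { int op; const Ex *arg1, *arg2; } Binop;
        /* One more child pointer than Binop. */
        struct { int op; const Ex *arg1, *arg2, *arg3; } Triop;
    } u;
};

/* A typical traversal: each such switch needs a new EX_TRIOP case. */
static size_t count_nodes(const Ex *e)
{
    assert(e != NULL);
    switch (e->tag) {
    case EX_CONST:
        return 1;
    case EX_BINOP:
        return 1 + count_nodes(e->u.Binop.arg1)
                 + count_nodes(e->u.Binop.arg2);
    case EX_TRIOP:
        return 1 + count_nodes(e->u.Triop.arg1)
                 + count_nodes(e->u.Triop.arg2)
                 + count_nodes(e->u.Triop.arg3);
    }
    return 0; /* not reached for valid tags */
}

/* Build op(rm, x, y), with the rounding mode as arg1, and count nodes. */
static size_t demo_triop_node_count(void)
{
    Ex rm  = { .tag = EX_CONST, .u = { .Const = { 0 } } };
    Ex x   = { .tag = EX_CONST, .u = { .Const = { 1 } } };
    Ex y   = { .tag = EX_CONST, .u = { .Const = { 2 } } };
    Ex add = { .tag = EX_TRIOP, .u = { .Triop = { 0, &rm, &x, &y } } };
    return count_nodes(&add);
}
```

In this sketch the extra pointer is what makes the ternary variant
larger than the binary one, which is consistent with the 4-byte growth
on 32-bit platforms described above.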
* Add IRExpr_Triop, and add handling code to dozens of places which
examine IRExprs.
* Rename/retype a bunch of floating point IR primops to take an extra
rounding mode argument, making three in all (the rounding mode is
always the first arg).
* Add extra primops AddF64r32 et al, which do double-precision FP
operations and then round to single precision, still within a 64-bit
type. This is needed to simulate PPC's fadds et al without double
rounding.
* Adjust the PPC->IR front end, to generate these new primops and
rounding modes.
* Cause the IR optimiser to do a CSE pass on blocks containing any
floating point operations. This commons up the IR rounding mode
computations, which is important for generating efficient code from
the backend.
* Adjust the IR->PPC back end, so as to emit instructions to set the
rounding mode before each FP operation. Well, at least in
principle. In practice there is a bit of cleverness to avoid
repeatedly setting it to the same value. This depends on both the
above-mentioned CSE pass, and on the SSA property of IR (cool stuff,
SSA!). The effect is that for most blocks containing FP code, the
rounding mode is set just once, at the start of the block, and the
resulting overhead is minimal. See comment on
set_FPU_rounding_mode().
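
The redundant-set avoidance in the back end item above can be sketched
roughly as follows. This is a minimal model, not the actual code: the
Emitter struct and the functions set_rounding_mode and count_sets are
hypothetical, and IR temporaries are modelled as plain ints. The idea
is that, because IR is SSA, a given temporary always holds the same
value, so if the rounding mode was last set from that temporary there
is nothing to emit. CSE matters because it makes the FP operations in
a block refer to the same temporary in the first place.

```c
#include <assert.h>

#define NO_TMP (-1)   /* hypothetical marker: rounding mode not yet set */

/* Hypothetical instruction-selection state for one block. */
typedef struct {
    int last_rm_tmp;    /* temp the rounding mode was last set from */
    int sets_emitted;   /* number of set-rounding-mode insns emitted */
} Emitter;

/* Request that the FPU rounding mode hold the value of rm_tmp. */
static void set_rounding_mode(Emitter *e, int rm_tmp)
{
    assert(rm_tmp >= 0);
    if (e->last_rm_tmp == rm_tmp) {
        /* SSA: same temp means same value, so the mode is already set. */
        return;
    }
    e->sets_emitted++;          /* stands in for emitting real insns */
    e->last_rm_tmp = rm_tmp;
}

/* Process a block whose FP ops use the given rounding mode temps. */
static int count_sets(const int *rm_tmps, int n)
{
    Emitter e = { NO_TMP, 0 };
    for (int i = 0; i < n; i++) {
        set_rounding_mode(&e, rm_tmps[i]);
    }
    return e.sets_emitted;
}
```

With this model, a block whose FP operations all name the same
rounding mode temporary costs a single set, while a block in which
each operation has its own (un-CSE'd) temporary costs one set per
operation.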
This change requires follow-up changes in memcheck. Also, the
x86/amd64 front/back ends are temporarily broken.
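
For reference, the double rounding problem that motivates the
AddF64r32-style primops can be reproduced in plain C. This is a
standalone illustration and has nothing to do with how VEX implements
those primops; add_then_narrow is a hypothetical helper, and the
example assumes IEEE binary32/binary64 with round-to-nearest-even.
The exact sum 1 + 2^-24 + 2^-54 lies just above the midpoint between
the floats 1.0 and 1 + 2^-23, so a single rounding to single precision
gives 1 + 2^-23. Rounding to double first discards the 2^-54 term,
leaving an exact tie, which then rounds to even, giving 1.0.

```c
#include <assert.h>
#include <float.h>

/* Add in double precision (first rounding), then narrow the result
   to single precision (second rounding). */
static float add_then_narrow(double a, double b)
{
    /* volatile forces the sum to be rounded to a real double. */
    volatile double sum = a + b;
    return (float)sum;
}

/* Operands chosen so that the two-step result is wrong. */
static const double DEMO_A = 0x1.000001p0;  /* 1 + 2^-24 */
static const double DEMO_B = 0x1p-54;       /* 2^-54     */

/* Returns 1 if the two-step result differs from the correctly
   rounded single-precision sum, which is 1 + 2^-23. */
static int double_rounding_observed(void)
{
    float two_step = add_then_narrow(DEMO_A, DEMO_B);
    float correct  = 1.0f + FLT_EPSILON;    /* 1 + 2^-23 */
    return two_step != correct;
}
```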