Julian Seward [Sun, 21 May 2006 01:02:31 +0000 (01:02 +0000)]
A couple of IR simplification hacks for the amd64 front end, so as to
avoid false errors from Memcheck. Analogous to several of the recent
commits to the x86 front end.
Julian Seward [Sun, 14 May 2006 18:46:55 +0000 (18:46 +0000)]
Add an IR folding rule to convert Add32(x,x) into Shl32(x,1). This
fixes #118466 and it also gets rid of a bunch of false positives for
KDE 3.5.2 built by gcc-4.0.2 on x86, of the form shown below.
Use of uninitialised value of size 4
at 0x4BFC342: QIconSet::pixmap(QIconSet::Size, QIconSet::Mode,
QIconSet::State) const (qiconset.cpp:530)
by 0x4555BE7: KToolBarButton::drawButton(QPainter*)
(ktoolbarbutton.cpp:536)
by 0x4CB8A0A: QButton::paintEvent(QPaintEvent*) (qbutton.cpp:887)
Julian Seward [Sat, 13 May 2006 23:08:06 +0000 (23:08 +0000)]
Add specialisation rules to simplify the IR for 'testl .. ; js ..',
'testw .. ; js ..' and 'testb .. ; js ..'. This gets rid of a bunch of
false errors in Memcheck of the form
==2398== Conditional jump or move depends on uninitialised value(s)
==2398== at 0x6C51B61: KHTMLPart::clear() (khtml_part.cpp:1370)
==2398== by 0x6C61A72: KHTMLPart::begin(KURL const&, int, int)
(khtml_part.cpp:1881)
Julian Seward [Mon, 1 May 2006 02:14:17 +0000 (02:14 +0000)]
Counterpart to r1605: in the ppc insn selector, don't use the bits
VexArchInfo.hwcaps to distinguish ppc32 and ppc64. Instead pass
the host arch around. And associated plumbing.
Don't use the bits VexArchInfo.hwcaps to distinguish ppc32 and ppc64,
since that doesn't work properly. Instead pass the guest arch around
too. Small change with lots of associated plumbing.
Julian Seward [Tue, 7 Mar 2006 01:15:50 +0000 (01:15 +0000)]
Move the helper function for x86 'fxtract' to g_generic_x87.c so
it can be shared by the x86 and amd64 front ends, then use it to
implement fxtract on amd64.
Julian Seward [Wed, 8 Feb 2006 19:30:46 +0000 (19:30 +0000)]
Redo the way FP multiply-accumulate insns are done on ppc32/64.
Instead of splitting them up into a multiply and an add/sub, add 4 new
primops which keep each operation as a single unit. Then, in the back
end, re-emit them as single instructions.
Reason for this is that so-called fused-multiply-accumulate -- which
is what ppc does -- generates a double-double length intermediate
result (of the multiply, 112 mantissa bits) before doing the add, and
so it is impossible to do a bit-accurate simulation of it using AddF64
and MulF64.
Unfortunately the new primops unavoidably take 4 args (a rounding mode
+ 3 FP args) and so there is a new IRExpr expression type, IRExpr_Qop
and associated supporting junk.
Julian Seward [Sat, 4 Feb 2006 15:24:00 +0000 (15:24 +0000)]
Make the CSE pass more aggressive. It now commons up Mux0X and GetI
expressions too. This generates somewhat better FP code on x86 since
it removes more redundant artefacts from the x87 FP stack simulation.
Unfortunately, commoning up GetIs complicates the CSE pass, since it is now
possible that "available expressions" collected by the CSEr will
become invalidated by writes to the guest state as we work through the
block. So there is additional code to check for this case.
Some supporting functions (getAliasingRelation_IC and
getAliasingRelation_II) have been moved earlier in the file.
Julian Seward [Fri, 3 Feb 2006 16:08:03 +0000 (16:08 +0000)]
An overhaul of VEX's floating point handling, to facilitate correct
simulation of IEEE rounding modes in all FP operations.
The fundamental change is to add a third argument to the basic
floating point primops, eg AddF64, MulF64, etc, indicating the
(IR-encoded) rounding mode to be used for that operation.
Unfortunately IR did not have any way to support three-argument
primops, which means a new kind of IRExpr has been added: a ternary
op, IRExpr_Triop, which is simply a 3-argument form of the existing IR
binary operation node. The unfortunate side effect is that the size
of the union type IRExpr has increased from 16 to 20 bytes on 32-bit
platforms, and hence the JIT chews through more memory, but this does
not appear to have a measurable effect on the JIT's performance, at
least as measured by Valgrind's perf suite.
* Add IRExpr_Triop, and add handling code to dozens of places which
examine IRExprs.
* Rename/retype a bunch of floating point IR primops to take a 3rd
rounding mode argument (which is always the first arg).
* Add extra primops AddF64r32 et al, which do double-precision FP
operations and then round to single precision, still within a 64-bit
type. This is needed to simulate PPC's fadds et al without double
rounding.
* Adjust the PPC->IR front end, to generate these new primops and
rounding modes.
* Cause the IR optimiser to do a CSE pass on blocks containing any
floating point operations. This commons up the IR rounding mode
computations, which is important for generating efficient code from
the backend.
* Adjust the IR->PPC back end, so as to emit instructions to set the
rounding mode before each FP operation. Well, at least in
principle. In practice there is a bit of cleverness to avoid
repeatedly setting it to the same value. This depends on both the
abovementioned CSE pass, and on the SSA property of IR (cool stuff,
SSA!). The effect is that for most blocks containing FP code, the
rounding mode is set just once, at the start of the block, and the
resulting overhead is minimal. See comment on
set_FPU_rounding_mode().
This change requires followup changes in memcheck. Also, the
x86/amd64 front/back ends are temporarily broken.
Julian Seward [Fri, 27 Jan 2006 21:20:15 +0000 (21:20 +0000)]
Change the way Vex represents architecture variants into something
more flexible. Prior to this change, the type VexSubArch effectively
imposed a total ordering on subarchitecture capabilities, which was
overly restrictive. This change moves to effectively using a bit-set,
allowing some features (instruction groups) to be supported or not
supported independently of each other.
Julian Seward [Wed, 25 Jan 2006 21:29:48 +0000 (21:29 +0000)]
Change the way the ppc backend does ppc32/64 float-integer
conversions. fctiw/fctid/fcfid/stfiwx are now represented explicitly
and are generated by the instruction selector. This removes the need
for hdefs.c to know anything about scratch areas on the stack and
scratch FP registers.