Julian Seward [Sun, 24 Dec 2006 02:20:24 +0000 (02:20 +0000)]
A large commit, but with no functional change: as suggested by Nick,
rename some IR types, structure fields and functions to make the IR a
bit easier to understand. Specifically:
dopyIR* -> deepCopyIR*
sopyIR* -> shallowCopyIR*
The presence of a .Tmp union in both IRExpr and IRStmt is
confusing. It has been renamed to RdTmp in IRExpr, reflecting
the fact that here we are getting the value of an IRTemp, and to
WrTmp in IRStmt, reflecting the fact that here we are assigning
to an IRTemp.
IRBB (IR Basic Block) is renamed to IRSB (IR SuperBlock),
reflecting the reality that Vex does not really operate in terms
of basic blocks, but in terms of superblocks - single entry,
multiple exit sequences.
IRArray is renamed to IRRegArray, to make it clearer it refers
to arrays of guest registers and not arrays in memory.
VexMiscInfo is renamed to VexAbiInfo, since that's what it is
-- relevant facts about the ABI (calling conventions, etc) for
both the guest and host platforms.
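For illustration only (not part of the commit), a minimal sketch of how the two renamed cases read after this change, assuming the constructors declared in libvex_ir.h and the post-rename addStmtToIRSB:
  #include "libvex_ir.h"

  /* Build 't2 = Add32(t1,t1)': reading an IRTemp is an expression
     (RdTmp), assigning to one is a statement (WrTmp). */
  static void add_t1_twice ( IRSB* sb, IRTemp t1, IRTemp t2 )
  {
     IRExpr* rhs = IRExpr_Binop(Iop_Add32,
                                IRExpr_RdTmp(t1), IRExpr_RdTmp(t1));
     addStmtToIRSB(sb, IRStmt_WrTmp(t2, rhs));
  }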
Julian Seward [Fri, 1 Dec 2006 02:59:17 +0000 (02:59 +0000)]
Change a stupid algorithm that deals with real register live
ranges into a less stupid one. Prior to this change, the complexity
of reg-alloc included an expensive term
O(#instrs in code sequence x #real-register live ranges in code sequence)
This commit changes that term to essentially
O(#instrs in code sequence) + O(time to sort real-reg-L-R array)
On amd64 this nearly halves the cost of register allocation and means
Valgrind performs better in translation-intensive situations (a.k.a.
starting programs). E.g. firefox start/exit falls from 119 to 113
seconds. The effect will be larger on ppc32/64, which have more real
registers and hence more real-reg live ranges to consider, and smaller
on x86, which has fewer.
The actual code the JIT produces should be unchanged. This commit
merely modifies how the register allocator handles one of its
important data structures.
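A sketch of the general technique, with simplified made-up types rather than the allocator's real data structures: sort the live-range array by start point once, then advance a cursor while walking the instructions, so each live range is visited a constant number of times instead of once per instruction.
  #include <stdlib.h>

  typedef struct { int live_after, dead_before, rreg; } RRegLR;

  static int cmp_by_start ( const void* pa, const void* pb ) {
     return ((const RRegLR*)pa)->live_after - ((const RRegLR*)pb)->live_after;
  }

  static void sweep ( RRegLR* lrs, int n_lrs, int n_instrs )
  {
     int ii, cursor = 0;
     qsort(lrs, n_lrs, sizeof(RRegLR), cmp_by_start);
     for (ii = 0; ii < n_instrs; ii++) {
        /* only the live ranges starting exactly here are examined,
           rather than rescanning the whole array at every insn */
        while (cursor < n_lrs && lrs[cursor].live_after == ii) {
           /* ... mark lrs[cursor].rreg unavailable until dead_before ... */
           cursor++;
        }
        /* ... allocate virtual registers to real ones for insn ii ... */
     }
  }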
Julian Seward [Thu, 19 Oct 2006 03:01:09 +0000 (03:01 +0000)]
When doing rlwinm in 64-bit mode, bind the intermediate 32-bit result
to a temporary so it is only computed once; the previous code computed
it twice.
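Purely to illustrate the pattern (the combining step below is invented, not the front end's actual code): the intermediate is bound to a temporary and both uses read the temporary, instead of the expression tree being cloned and evaluated twice.
  #include <stdint.h>

  /* stand-in for however the 32-bit intermediate is computed */
  static uint32_t intermediate32 ( uint64_t rs ) { return (uint32_t)rs; }

  static uint64_t uses_temp_once ( uint64_t rs )
  {
     uint32_t t = intermediate32(rs);   /* computed once, held in a temp */
     return ((uint64_t)t << 32) | t;    /* both uses read the temp */
  }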
Julian Seward [Tue, 17 Oct 2006 00:28:22 +0000 (00:28 +0000)]
Merge r1663-r1666:
- AIX5 build changes
- genoffsets.c: print the offsets of a few more ppc registers
- Get rid of a bunch of ad-hoc hacks which hardwire in certain
assumptions about guest and host ABIs. Instead pass that info
in a VexMiscInfo structure. This cleans up various grotty bits.
- Add to ppc32 guest state, redirection-stack stuff already present
in ppc64 guest state. This is to enable function redirection/
wrapping in the presence of TOC pointers in 32-bit mode.
- Add to both ppc32 and ppc64 guest states, a new pseudo-register
LR_AT_SC. This holds the link register value at the most recent
'sc', so that AIX can back up to restart a syscall if needed.
- Add to both ppc32 and ppc64 guest states, a SPRG3 register.
- Use VexMiscInfo to handle 'sc' on AIX differently from Linux:
on AIX, 'sc' continues at the location stated in the link
register, not at the next insn.
Add support for amd64 'fprem' (fixes bug 132918). This isn't exactly
right; the C3/2/1/0 FPU flags sometimes don't get set the same as
natively, and I can't figure out why.
Julian Seward [Sat, 19 Aug 2006 18:31:53 +0000 (18:31 +0000)]
Comparing a reg with itself produces a result which doesn't depend on
the contents of the reg. Therefore remove the false dependency, which
has been known to cause memcheck to produce false errors for
xlc-compiled code.
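A one-line worked example of why the dependency is false (nothing here is from the commit; the reg stands in for an ordinary integer value):
  /* r3 may be wholly uninitialised, yet the comparison result is
     fully defined: for integer values, x == x is always true */
  static int self_compare ( int r3_maybe_undef )
  {
     return r3_maybe_undef == r3_maybe_undef;
  }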
Julian Seward [Sun, 21 May 2006 01:02:31 +0000 (01:02 +0000)]
A couple of IR simplification hacks for the amd64 front end, so as to
avoid false errors from memcheck. Analogous to some of the recent
bunch of commits to x86 front end.
Julian Seward [Sun, 14 May 2006 18:46:55 +0000 (18:46 +0000)]
Add an IR folding rule to convert Add32(x,x) into Shl32(x,1). This
fixes #118466 and it also gets rid of a bunch of false positives for
KDE 3.5.2 built by gcc-4.0.2 on x86, of the form shown below.
Use of uninitialised value of size 4
at 0x4BFC342: QIconSet::pixmap(QIconSet::Size, QIconSet::Mode,
QIconSet::State) const (qiconset.cpp:530)
by 0x4555BE7: KToolBarButton::drawButton(QPainter*)
(ktoolbarbutton.cpp:536)
by 0x4CB8A0A: QButton::paintEvent(QPaintEvent*) (qbutton.cpp:887)
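Roughly the shape of the rule, written here with the post-rename names from the entry at the top of this log (illustrative, not the literal committed code): Add32(x,x) has the same value as Shl32(x,1), and in the shift form there is only one occurrence of x for Memcheck's definedness tracking to reason about.
  #include "libvex_basictypes.h"
  #include "libvex_ir.h"

  static IRExpr* fold_add32_self ( IRExpr* e )
  {
     if (e->tag == Iex_Binop
         && e->Iex.Binop.op == Iop_Add32
         && e->Iex.Binop.arg1->tag == Iex_RdTmp
         && e->Iex.Binop.arg2->tag == Iex_RdTmp
         && e->Iex.Binop.arg1->Iex.RdTmp.tmp
            == e->Iex.Binop.arg2->Iex.RdTmp.tmp)
        /* Add32(x,x)  ==>  Shl32(x,1) */
        return IRExpr_Binop(Iop_Shl32,
                            e->Iex.Binop.arg1,
                            IRExpr_Const(IRConst_U8(1)));
     return e;
  }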
Julian Seward [Sat, 13 May 2006 23:08:06 +0000 (23:08 +0000)]
Add specialisation rules to simplify the IR for 'testl .. ; js ..',
'testw .. ; js ..' and 'testb .. ; js ..'. This gets rid of a bunch of
false errors in Memcheck of the form
==2398== Conditional jump or move depends on uninitialised value(s)
==2398== at 0x6C51B61: KHTMLPart::clear() (khtml_part.cpp:1370)
==2398== by 0x6C61A72: KHTMLPart::begin(KURL const&, int, int)
(khtml_part.cpp:1881)
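A self-contained sketch of the IR such a specialisation produces (not the literal committed code; the thunk layout is assumed): after 'testl ; js', the condition collapses to a direct sign test of the 32-bit result instead of a call to the lazy flags-evaluation helper.
  #include "libvex_ir.h"

  /* For a 32-bit logical op, the thunk's dep1 field holds the result,
     so the S ('js') condition is simply "result < 0". */
  static IRExpr* spec_S_after_logicl ( IRExpr* cc_dep1 )
  {
     return IRExpr_Unop(Iop_1Uto32,
                        IRExpr_Binop(Iop_CmpLT32S,
                                     cc_dep1,
                                     IRExpr_Const(IRConst_U32(0))));
  }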
Julian Seward [Mon, 1 May 2006 02:14:17 +0000 (02:14 +0000)]
Counterpart to r1605: in the ppc insn selector, don't use the
VexArchInfo.hwcaps bits to distinguish ppc32 and ppc64. Instead pass
the host arch around, with the associated plumbing.
Don't use the VexArchInfo.hwcaps bits to distinguish ppc32 and ppc64,
since that doesn't work properly. Instead pass the guest arch around
too. A small change with a lot of associated plumbing.
Julian Seward [Tue, 7 Mar 2006 01:15:50 +0000 (01:15 +0000)]
Move the helper function for x86 'fxtract' to g_generic_x87.c so
it can be shared by the x86 and amd64 front ends, then use it to
implement fxtract on amd64.
Julian Seward [Wed, 8 Feb 2006 19:30:46 +0000 (19:30 +0000)]
Redo the way FP multiply-accumulate insns are done on ppc32/64.
Instead of splitting them up into a multiply and an add/sub, add 4 new
primops which keep the operation as a single unit. Then, in the back
end, re-emit them as a single instruction.
Reason for this is that so-called fused-multiply-accumulate -- which
is what ppc does -- generates a double-double length intermediate
result (of the multiply, 112 mantissa bits) before doing the add, and
so it is impossible to do a bit-accurate simulation of it using AddF64
and MulF64.
Unfortunately the new primops unavoidably take 4 args (a rounding mode
+ 3 FP args), and so there is a new IRExpr expression type, IRExpr_Qop,
and associated supporting junk.
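A sketch of the new node in use (illustrative; the operand order is assumed): a fused multiply-add becomes one Qop carrying the rounding mode plus three FP arguments, so the product is never rounded separately before the add.
  #include "libvex_ir.h"

  /* a*b + c, rounded once under rounding mode rm */
  static IRExpr* mk_fmadd_f64 ( IRExpr* rm, IRExpr* a, IRExpr* b, IRExpr* c )
  {
     return IRExpr_Qop(Iop_MAddF64, rm, a, b, c);
  }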
Julian Seward [Sat, 4 Feb 2006 15:24:00 +0000 (15:24 +0000)]
Make the CSE pass more aggressive. It now commons up Mux0X and GetI
expressions too. This generates somewhat better FP code on x86 since
it removes more redundant artefacts from the x87 FP stack simulation.
Unfortunately commoning up GetIs complicates the CSE pass, since it is
now possible that "available expressions" collected by the CSEr will
become invalidated by writes to the guest state as we work through the
block. So there is additional code to check for this case.
Some supporting functions (getAliasingRelation_IC and
getAliasingRelation_II) have been moved earlier in the file.
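The invalidation check, sketched with made-up minimal types (the real pass works on its own environment structure): once a statement writes the guest state, any available expression that reads a possibly-overlapping guest-state range can no longer be reused.
  typedef struct {
     int reads_guest_state;   /* entry is a GetI of the guest state?  */
     int gs_offset, gs_size;  /* guest-state byte range it reads      */
     int valid;
  } AvailEnt;

  static void invalidate_after_put ( AvailEnt* env, int n_env,
                                     int put_offset, int put_size )
  {
     int i, disjoint;
     for (i = 0; i < n_env; i++) {
        if (!env[i].valid || !env[i].reads_guest_state)
           continue;
        /* keep the entry only if it is provably disjoint from the
           written range; anything that might overlap is dropped */
        disjoint = put_offset + put_size <= env[i].gs_offset
                   || env[i].gs_offset + env[i].gs_size <= put_offset;
        if (!disjoint)
           env[i].valid = 0;
     }
  }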
Julian Seward [Fri, 3 Feb 2006 16:08:03 +0000 (16:08 +0000)]
An overhaul of VEX's floating point handling, to facilitate correct
simulation of IEEE rounding modes in all FP operations.
The fundamental change is to add a third argument to the basic
floating point primops, e.g. AddF64, MulF64, etc., indicating the
(IR-encoded) rounding mode to be used for that operation.
Unfortunately IR did not have any way to support three-argument
primops, which means a new kind of IRExpr has been added: a ternary
op, IRExpr_Triop, which is simply a 3-argument form of the existing IR
binary operation node. The unfortunate side effect is that the size
of the union type IRExpr has increased from 16 to 20 bytes on 32-bit
platforms, and hence the JIT chews through more memory, but this does
not appear to have a measurable effect on the JIT's performance, at
least as measured by Valgrind's perf suite.
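For illustration, the shape of the new node (constructor and opcode names as introduced by this change; other details assumed): where an F64 add was previously a two-argument Binop, it now carries the rounding mode as its first operand.
  #include "libvex_ir.h"

  /* a + b, rounded according to rm (an IR-encoded IRRoundingMode) */
  static IRExpr* add_f64 ( IRExpr* rm, IRExpr* a, IRExpr* b )
  {
     return IRExpr_Triop(Iop_AddF64, rm, a, b);
  }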
* Add IRExpr_Triop, and add handling code to dozens of places which
examine IRExprs.
* Rename/retype a bunch of floating point IR primops to take a 3rd
rounding mode argument (which is always the first arg).
* Add extra primops AddF64r32 et al, which do double-precision FP
operations and then round to single precision, still within a 64-bit
type. This is needed to simulate PPC's fadds et al without double
rounding.
* Adjust the PPC->IR front end, to generate these new primops and
rounding modes.
* Cause the IR optimiser to do a CSE pass on blocks containing any
floating point operations. This commons up the IR rounding mode
computations, which is important for generating efficient code from
the backend.
* Adjust the IR->PPC back end, so as to emit instructions to set the
rounding mode before each FP operation. Well, at least in
principle. In practice there is a bit of cleverness to avoid
repeatedly setting it to the same value. This depends on both the
abovementioned CSE pass, and on the SSA property of IR (cool stuff,
SSA!). The effect is that for most blocks containing FP code, the
rounding mode is set just once, at the start of the block, and the
resulting overhead is minimal. See comment on
set_FPU_rounding_mode().
This change requires follow-up changes in memcheck. Also, the
x86/amd64 front/back ends are temporarily broken.
Julian Seward [Fri, 27 Jan 2006 21:20:15 +0000 (21:20 +0000)]
Change the way Vex represents architecture variants to something more
flexible. Prior to this change, the type VexSubArch effectively imposed
a total ordering on subarchitecture capabilities, which was overly
restrictive. This change effectively moves to a bit-set, allowing some
features (instruction groups) to be supported or not supported
independently of each other.
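A sketch of the difference, with invented flag names (the real definitions live in libvex.h): with a bit-set, individual capabilities are tested independently, instead of every subarchitecture having to fit on one ordered scale.
  /* old scheme: one ordered scale, so "has A but not B" can only be
     expressed if A happens to sit below B on the scale */
  typedef enum { SubArch_sse0, SubArch_sse1, SubArch_sse2 } SubArch;

  /* new scheme: independent capability bits */
  #define HWCAP_SSE1  (1u << 0)
  #define HWCAP_SSE2  (1u << 1)
  #define HWCAP_SSE3  (1u << 2)

  static int can_use_sse2 ( unsigned int hwcaps )
  {
     return (hwcaps & HWCAP_SSE2) != 0;
  }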
Julian Seward [Wed, 25 Jan 2006 21:29:48 +0000 (21:29 +0000)]
Change the way the ppc backend does ppc32/64 float-integer
conversions. fctiw/fctid/fcfid/stfiwx are now represented explicitly
and are generated by the instruction selector. This removes the need
for hdefs.c to know anything about scratch areas on the stack and
scratch FP registers.