Julian Seward [Sat, 31 Mar 2007 14:30:12 +0000 (14:30 +0000)]
Teach the x86 back end how to generate 'lea' instructions, and generate
them in a couple of places which are important. This reduces the
amount of generated code for memcheck and none by about 1%, and (in
very unscientific tests on perf/bz2) speeds memcheck up by about 1%.
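For context, 'lea' evaluates an address expression of the form base + index*scale + displacement in a single instruction, without accessing memory. A minimal C model of that arithmetic is sketched below. The function name is hypothetical and is not taken from the VEX sources, and which IR patterns the back end actually maps to 'lea' is not stated in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what 'lea disp(base,index,scale), dst' computes.
   scale_log2 is 0..3, i.e. a scale factor of 1, 2, 4 or 8. */
static uint32_t lea_model(uint32_t base, uint32_t index,
                          unsigned scale_log2, uint32_t disp)
{
    assert(scale_log2 <= 3);
    /* 32-bit wraparound arithmetic, as for x86 address computation */
    return base + (index << scale_log2) + disp;
}
```

For example, lea_model(100, 3, 2, 8) models 'lea 8(%base,%index,4)' with base = 100 and index = 3.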
Julian Seward [Sun, 25 Mar 2007 04:14:58 +0000 (04:14 +0000)]
x86 back end: use 80-bit loads/stores for floating point spills rather
than 64-bit ones, to reduce accuracy loss. To support this, in
reg-alloc, allocate 2 64-bit spill slots for each HRcFlt64 vreg
instead of just 1.
Julian Seward [Tue, 20 Mar 2007 14:18:45 +0000 (14:18 +0000)]
x86 front end: synthesise SIGILL in the normal way for some obscure
invalid instruction cases, rather than asserting, as happened in
#143079 and #142279. amd64 equivalents to follow.
Julian Seward [Fri, 9 Mar 2007 18:07:00 +0000 (18:07 +0000)]
When generating 64-bit code, ensure that any addresses used in 4 or 8
byte loads or stores of the form reg+imm have the lowest 2 bits of imm
set to zero, so that they can safely be used in ld/ldu/lda/std/stdu
instructions. This boils down to doing an extra check in
iselWordExpr_AMode and avoiding the reg+imm case in cases where the
amode might end up in any of the abovementioned instructions.
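The extra check described above might look roughly like the following. This is a sketch only: the helper name is hypothetical, and the signed 16-bit range test is an assumption about the displacement field that this log does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the actual iselWordExpr_AMode code: may 'imm'
   be used as the displacement of a reg+imm amode that could end up in
   one of the instructions listed above?
   Assumption: the displacement must fit in a signed 16-bit field. */
static bool imm_ok_for_reg_plus_imm(int64_t imm)
{
    bool fits16   = imm >= -32768 && imm <= 32767;
    bool low2zero = (imm & 3) == 0;   /* lowest 2 bits must be zero */
    return fits16 && low2zero;
}
```

When the check fails, the instruction selector would fall back to some other addressing form instead of reg+imm.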
Julian Seward [Sat, 27 Jan 2007 00:46:28 +0000 (00:46 +0000)]
Fill in missing cases in eqIRConst. This stops iropt's CSE pass from
asserting in the presence of V128 immediates, which is a regression
in valgrind 3.2.2.
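To illustrate the kind of omission involved, here is a cut-down sketch of a tag-dispatched equality function. The types and names are hypothetical (the real IRConst has more tags), and the 16-bit encoding of the 128-bit immediate is an assumption. If the V128 case were left out of the switch, control would reach the assertion at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down constant type. */
typedef enum { C_U32, C_U64, C_V128 } ConstTag;

typedef struct {
    ConstTag tag;
    union {
        uint32_t u32;
        uint64_t u64;
        uint16_t v128;  /* assumed compact encoding of a 128-bit immediate */
    } val;
} Const;

static bool eq_const(const Const *a, const Const *b)
{
    if (a->tag != b->tag)
        return false;
    switch (a->tag) {
        case C_U32:  return a->val.u32  == b->val.u32;
        case C_U64:  return a->val.u64  == b->val.u64;
        case C_V128: return a->val.v128 == b->val.v128;
    }
    /* Only reached for a tag with no case above. */
    assert(0 && "eq_const: unhandled tag");
    return false;
}
```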
Julian Seward [Wed, 10 Jan 2007 04:59:33 +0000 (04:59 +0000)]
Implement FXSAVE on amd64. Mysteriously my Athlon64 does not seem to
write all the fields that the AMD documentation says it should: it
skips FOP, RIP and RDP, so vex's implementation writes zeroes there.
Julian Seward [Sun, 24 Dec 2006 02:20:24 +0000 (02:20 +0000)]
A large but non-functional commit: as suggested by Nick, rename some
IR types, structure fields and functions to make IR a bit easier to
understand. Specifically:
dopyIR* -> deepCopyIR*
sopyIR* -> shallowCopyIR*
The presence of a .Tmp union in both IRExpr and IRStmt is
confusing. It has been renamed to RdTmp in IRExpr, reflecting
the fact that here we are getting the value of an IRTemp, and to
WrTmp in IRStmt, reflecting the fact that here we are assigning
to an IRTemp.
IRBB (IR Basic Block) is renamed to IRSB (IR SuperBlock),
reflecting the reality that Vex does not really operate in terms
of basic blocks, but in terms of superblocks - single entry,
multiple exit sequences.
IRArray is renamed to IRRegArray, to make it clearer it refers
to arrays of guest registers and not arrays in memory.
VexMiscInfo is renamed to VexAbiInfo, since that's what it is
-- relevant facts about the ABI (calling conventions, etc) for
both the guest and host platforms.
Julian Seward [Fri, 1 Dec 2006 02:59:17 +0000 (02:59 +0000)]
Change a stupid algorithm that deals with real register live
ranges into a less stupid one. Prior to this change, the complexity
of reg-alloc included an expensive term
O(#instrs in code sequence x #real-register live ranges in code sequence)
This commit changes that term to essentially
O(#instrs in code sequence) + O(time to sort real-reg-L-R array)
On amd64 this nearly halves the cost of register allocation and means
Valgrind performs better in translation-intensive situations (a.k.a.
starting programs). E.g., firefox start/exit falls from 119 to 113
seconds. The effect will be larger on ppc32/64 as there are more real
registers and hence real-reg live ranges to consider, and will be
smaller on x86 for the same reason.
The actual code the JIT produces should be unchanged. This commit
merely modifies how the register allocator handles one of its
important data structures.
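The change in complexity can be illustrated with a small sketch. It is not the allocator's actual data structure: the LiveRange type and count_live function are hypothetical, and the task (counting how many ranges cover each instruction) is chosen only to show the sort-then-sweep pattern. Each cursor moves forwards only, so the sweep is linear in the number of instructions plus the number of ranges, on top of the cost of sorting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical live range: live for instructions in [start, end). */
typedef struct { int start; int end; } LiveRange;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* For each instruction i in 0..n_insns-1, store in n_live[i] the number
   of ranges covering i.  Two sorts plus one linear sweep, instead of
   scanning every range for every instruction. */
static void count_live(const LiveRange *rr, int n_rr, int n_insns, int *n_live)
{
    size_t n = n_rr > 0 ? (size_t)n_rr : 1;   /* avoid zero-size malloc */
    int *starts = malloc(n * sizeof(int));
    int *ends   = malloc(n * sizeof(int));
    assert(starts != NULL && ends != NULL);

    for (int k = 0; k < n_rr; k++) {
        starts[k] = rr[k].start;
        ends[k]   = rr[k].end;
    }
    qsort(starts, (size_t)n_rr, sizeof(int), cmp_int);
    qsort(ends,   (size_t)n_rr, sizeof(int), cmp_int);

    int s = 0, e = 0;   /* cursors; they only ever move forwards */
    for (int i = 0; i < n_insns; i++) {
        while (s < n_rr && starts[s] <= i) s++;  /* ranges begun by i    */
        while (e < n_rr && ends[e]   <= i) e++;  /* ranges finished by i */
        n_live[i] = s - e;
    }
    free(starts);
    free(ends);
}
```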
Julian Seward [Thu, 19 Oct 2006 03:01:09 +0000 (03:01 +0000)]
When doing rlwinm in 64-bit mode, bind the intermediate 32-bit result
to a temporary so it is only computed once. What's there currently
causes it to be computed twice.
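A sketch of the idea in plain C rather than IR follows. It assumes the usual 64-bit definition of rlwinm, in which the rotated low word is replicated into both halves of a doubleword before masking. All names are hypothetical. The rotated value is bound to 'rot' once and then used twice, which is the shape of the fix described above.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh == 0 ? x : (x << sh) | (x >> (32 - sh));
}

/* Mask in IBM bit numbering: bit 0 is the most significant of 64. */
static uint64_t mask64(unsigned mstart, unsigned mstop)
{
    uint64_t from_start = ~0ULL >> mstart;         /* bits mstart..63 */
    uint64_t upto_stop  = ~0ULL << (63 - mstop);   /* bits 0..mstop   */
    return mstart <= mstop ? (from_start & upto_stop)
                           : (from_start | upto_stop);
}

/* Hypothetical model of rlwinm in 64-bit mode. */
static uint64_t rlwinm64_model(uint64_t rs, unsigned sh,
                               unsigned mb, unsigned me)
{
    assert(sh < 32 && mb < 32 && me < 32);
    uint32_t rot = rotl32((uint32_t)rs, sh);     /* computed once       */
    uint64_t r   = ((uint64_t)rot << 32) | rot;  /* temp is used twice  */
    return r & mask64(mb + 32, me + 32);
}
```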
Julian Seward [Tue, 17 Oct 2006 00:28:22 +0000 (00:28 +0000)]
Merge r1663-r1666:
- AIX5 build changes
- genoffsets.c: print the offsets of a few more ppc registers
- Get rid of a bunch of ad-hoc hacks which hardwire in certain
assumptions about guest and host ABIs. Instead pass that info
in a VexMiscInfo structure. This cleans up various grotty bits.
- Add to ppc32 guest state, redirection-stack stuff already present
in ppc64 guest state. This is to enable function redirection/
wrapping in the presence of TOC pointers in 32-bit mode.
- Add to both ppc32 and ppc64 guest states, a new pseudo-register
LR_AT_SC. This holds the link register value at the most recent
'sc', so that AIX can back up to restart a syscall if needed.
- Add to both ppc32 and ppc64 guest states, a SPRG3 register.
- Use VexMiscInfo to handle 'sc' on AIX differently from Linux:
on AIX, 'sc' continues at the location stated in the link
register, not at the next insn.
Add support for amd64 'fprem' (fixes bug 132918). This isn't exactly
right; the C3/2/1/0 FPU flags sometimes don't get set the same as
natively, and I can't figure out why.
Julian Seward [Sat, 19 Aug 2006 18:31:53 +0000 (18:31 +0000)]
Comparing a reg with itself produces a result which doesn't depend on
the contents of the reg. Therefore remove the false dependency, which
has been known to cause memcheck to produce false errors for
xlc-compiled code.
Julian Seward [Sun, 21 May 2006 01:02:31 +0000 (01:02 +0000)]
A couple of IR simplification hacks for the amd64 front end, so as to
avoid false errors from memcheck. Analogous to some of the recent
bunch of commits to x86 front end.
Julian Seward [Sun, 14 May 2006 18:46:55 +0000 (18:46 +0000)]
Add an IR folding rule to convert Add32(x,x) into Shl32(x,1). This
fixes #118466 and it also gets rid of a bunch of false positives for
KDE 3.5.2 built by gcc-4.0.2 on x86, of the form shown below.
Use of uninitialised value of size 4
at 0x4BFC342: QIconSet::pixmap(QIconSet::Size, QIconSet::Mode,
QIconSet::State) const (qiconset.cpp:530)
by 0x4555BE7: KToolBarButton::drawButton(QPainter*)
(ktoolbarbutton.cpp:536)
by 0x4CB8A0A: QButton::paintEvent(QPaintEvent*) (qbutton.cpp:887)
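The folding rule can be sketched on a toy expression tree as follows. The Expr type and function names are hypothetical and much simpler than VEX's IRExpr; the sketch only shows the pattern match on two operands that read the same temporary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { E_TMP, E_CONST, E_ADD32, E_SHL32 } ExprTag;

/* Hypothetical toy expression node. */
typedef struct Expr {
    ExprTag tag;
    int tmp;             /* E_TMP: temporary number   */
    uint32_t con;        /* E_CONST: constant value   */
    struct Expr *a, *b;  /* E_ADD32/E_SHL32: operands */
} Expr;

static int same_tmp(const Expr *x, const Expr *y)
{
    return x->tag == E_TMP && y->tag == E_TMP && x->tmp == y->tmp;
}

/* Rewrite Add32(t,t) in place as Shl32(t,1).  'one' is caller-provided
   storage for the constant node.  Returns 1 if the rule fired. */
static int fold_add_self(Expr *e, Expr *one)
{
    assert(e != NULL && one != NULL);
    if (e->tag != E_ADD32 || !same_tmp(e->a, e->b))
        return 0;
    one->tag = E_CONST;
    one->tmp = 0;
    one->con = 1;
    one->a = one->b = NULL;
    e->tag = E_SHL32;
    e->b = one;
    return 1;
}
```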
Julian Seward [Sat, 13 May 2006 23:08:06 +0000 (23:08 +0000)]
Add specialisation rules to simplify the IR for 'testl .. ; js ..',
'testw .. ; js ..' and 'testb .. ; js ..'. This gets rid of a bunch of
false errors in Memcheck of the form
==2398== Conditional jump or move depends on uninitialised value(s)
==2398== at 0x6C51B61: KHTMLPart::clear() (khtml_part.cpp:1370)
==2398== by 0x6C61A72: KHTMLPart::begin(KURL const&, int, int)
(khtml_part.cpp:1881)
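The simplification rests on the fact that after a 'test' the sign flag is just the top bit of the AND of the two operands, so 'js' is taken exactly when that bit is set. A minimal C model is given below. The function names are hypothetical, and how the specialisation rules are actually written in VEX is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models: would 'js' be taken after 'test a, b'? */
static int js_after_testl(uint32_t a, uint32_t b)
{
    return (int)(((a & b) >> 31) & 1);   /* bit 31 of the 32-bit result */
}

static int js_after_testw(uint16_t a, uint16_t b)
{
    return (int)((((uint32_t)a & b) >> 15) & 1);   /* bit 15 */
}

static int js_after_testb(uint8_t a, uint8_t b)
{
    return (int)((((uint32_t)a & b) >> 7) & 1);    /* bit 7 */
}
```

Since only one bit of the result is consulted, the remaining bits of the operands do not influence the branch.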
Julian Seward [Mon, 1 May 2006 02:14:17 +0000 (02:14 +0000)]
Counterpart to r1605: in the ppc insn selector, don't use the bits in
VexArchInfo.hwcaps to distinguish ppc32 and ppc64. Instead pass
the host arch around. And associated plumbing.
Don't use the bits in VexArchInfo.hwcaps to distinguish ppc32 and ppc64,
since that doesn't work properly. Instead pass the guest arch around
too. Small change with lots of associated plumbing.
Julian Seward [Tue, 7 Mar 2006 01:15:50 +0000 (01:15 +0000)]
Move the helper function for x86 'fxtract' to g_generic_x87.c so
it can be shared by the x86 and amd64 front ends, then use it to
implement fxtract on amd64.
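For reference, 'fxtract' splits a floating point value into an unbiased exponent and a significand with magnitude in [1, 2). The sketch below models that on an IEEE-754 binary64 double for normal, finite, non-zero inputs only. It is not the helper in g_generic_x87.c; the name, the use of double rather than the 80-bit format, and the omission of special cases are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of fxtract for normal, finite, non-zero doubles. */
static void fxtract_model(double x, double *significand, double *exponent)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    int biased = (int)((bits >> 52) & 0x7FF);
    /* zero, denormals, infinities and NaNs are not modelled here */
    assert(biased != 0 && biased != 0x7FF);

    *exponent = (double)(biased - 1023);

    /* Keep sign and mantissa; force the exponent field to the bias,
       which gives a value of magnitude in [1, 2). */
    bits = (bits & ~(0x7FFULL << 52)) | (1023ULL << 52);
    memcpy(significand, &bits, sizeof bits);
}
```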