When generating IR for movsd mem->reg, don't first write the entire
guest reg with zeroes and then overwrite the lower half. This forces
the back end to generate code which creates huge write-after-write
stalls in the memory system of P4s due to the different sized writes.
This apparently small change reduces the run-time of one
sse2-intensive floating point program from 145 seconds to 90 seconds
(--tool=none).
Whenever the flags thunk is set, fill in all the fields, even NDEP
which isn't usually used. This makes redundant-PUT elimination work
better, fixing a rather subtle optimisation bug. For at least one
floating-point case this gives a significant speedup. Consider a bb
like this:
   (no flag setting insns before inc)
   inc ...
   (no flag setting insns)
   add ...
inc sets CC_OP, CC_DEP1 and CC_NDEP; the latter is expensive because a
call to calculate_eflags_c is required.
add sets CC_OP, CC_DEP1 and CC_DEP2. The CC_NDEP value is now
irrelevant, but because CC_NDEP is not overwritten, iropt cannot
remove the previous assignment to it, and so the expensive helper call
remains even though it is irrelevant.
This commit fixes that: by always setting NDEP to zero whenever its
value will be unused, any previous assignment to it becomes redundant
and will be removed by iropt.
This change should be propagated to the amd64 front end too.
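The effect can be illustrated with a toy model of redundant-PUT
elimination. This is only a sketch under stated assumptions: ToyPut,
the OFF_CC_* constants and both functions are hypothetical names
invented for illustration, not Vex's real IR types or iropt code, and
the model assumes no reads of the thunk fields between the writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot numbers for the four flags-thunk fields. */
enum { OFF_CC_OP, OFF_CC_DEP1, OFF_CC_DEP2, OFF_CC_NDEP, N_OFFSETS };

typedef struct {
    int is_noop;    /* 1 once the PUT has been eliminated */
    int offset;     /* which thunk field this PUT writes */
    int expensive;  /* 1 if computing the value needs a helper call */
} ToyPut;

/* Scan the block backwards. A PUT is redundant if a later PUT in the
   same block overwrites the same field. Returns the number removed. */
static int toy_remove_redundant_puts(ToyPut *stmts, size_t n)
{
    int overwritten[N_OFFSETS] = {0};
    int removed = 0;
    for (size_t i = n; i-- > 0; ) {
        if (stmts[i].is_noop)
            continue;
        assert(stmts[i].offset >= 0 && stmts[i].offset < N_OFFSETS);
        if (overwritten[stmts[i].offset]) {
            stmts[i].is_noop = 1;   /* dead write: a later PUT wins */
            removed++;
        } else {
            overwritten[stmts[i].offset] = 1;
        }
    }
    return removed;
}

/* Count the PUTs that survive and still need a helper call. */
static int toy_count_expensive(const ToyPut *stmts, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (!stmts[i].is_noop && stmts[i].expensive)
            count++;
    return count;
}
```

In this model, a block where inc writes OP, DEP1 and (expensively) NDEP
and add then writes only OP, DEP1 and DEP2 keeps the expensive NDEP
write. Appending a cheap NDEP write of zero to the add lets the
backwards scan mark the earlier one as dead.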
#include <rude_words.h>. The recent change of denotation of no-op IR
statements from NULL to IRStmt_NoOp screwed up the how-much-to-unroll
heuristics in iropt, resulting in iropt being significantly less
enthusiastic about unrolling than it was before the change. Gaaah!
This commit should fix it.
Remember to clear C2 after fsincos, as that actually makes it work
right with reasonable-sized inputs. This confirms fsincos as the
golden lemon of x87 floating point instructions, since Vex has by now
chomped through vast amounts of floating point code on x86 and this is
the first time this bug has come to light.
Julian Seward [Sat, 26 Mar 2005 20:33:38 +0000 (20:33 +0000)]
Move some conversion functions (IEEE double <-> x87 extended-real
format) into their own module, so they can be shared by the x86 and
amd64 front ends.
Julian Seward [Sat, 26 Mar 2005 12:57:39 +0000 (12:57 +0000)]
Update comment.
Stare at sanity checks but still fail to figure out how to make them
cheaper. The register allocator's sanity checks consume 15%-20% of
the total running time of Vex.
Julian Seward [Fri, 25 Mar 2005 22:33:54 +0000 (22:33 +0000)]
The helper for FXAM wasn't setting the C1 flag correctly. This bug
is more-or-less undetectable at least if you use fstsw to examine
the results of FXAM, since fstsw doesn't copy C1.
Julian Seward [Tue, 22 Mar 2005 02:24:05 +0000 (02:24 +0000)]
Turns out the recent IRStmt_NoOp hackery broke the IR optimiser quite
seriously. It was still transforming correctly, but many of the
transformations had been hampered by no longer being able to recognise
no-ops properly. This hopefully fixes it.
Julian Seward [Mon, 21 Mar 2005 01:06:20 +0000 (01:06 +0000)]
The Icc typechecker police have been round banging on our doors again.
Placating icc -Wall is a Herculean task; I don't know if it will ever
get completed.
Julian Seward [Mon, 21 Mar 2005 00:15:53 +0000 (00:15 +0000)]
Add a new IR statement kind: IRStmt_NoOp, to denote a no-operation.
These are generated by the IR optimiser. The use of no-ops replaces
the old practice of allowing a BB to contain NULL pointers in its
statement array as a way of denoting no-ops. NULL stmts are now no
longer allowed under any circumstances, and the IR sanity checker will
reject any BB containing them.
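A minimal sketch of the idea, assuming a much simplified statement
type: ToyTag, ToyStmt and both functions below are hypothetical names
made up for illustration and are not the real Vex definitions. It shows
a sanity check that rejects NULL entries, and a consumer that must now
skip no-ops by tag rather than by testing for NULL.

```c
#include <stddef.h>

/* Hypothetical statement kinds; TOY_NOOP plays the role of a no-op. */
typedef enum { TOY_NOOP, TOY_PUT, TOY_TMP } ToyTag;

typedef struct {
    ToyTag tag;
} ToyStmt;

/* Returns 1 if the statement array is acceptable, 0 if it contains a
   NULL pointer, which is no longer a legal way to denote a no-op. */
static int toy_bb_is_sane(ToyStmt *const *stmts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (stmts[i] == NULL)
            return 0;
    return 1;
}

/* Counts statements that do real work. Any pass that used to test for
   NULL has to test the tag instead, or it will overcount. */
static size_t toy_count_real_stmts(ToyStmt *const *stmts, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (stmts[i]->tag != TOY_NOOP)
            count++;
    return count;
}
```

toy_count_real_stmts assumes the array has already passed
toy_bb_is_sane, since it dereferences every entry.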
Julian Seward [Wed, 16 Mar 2005 18:19:10 +0000 (18:19 +0000)]
Add a new kind of IR stmt: "instruction marks" (IRStmt_IMark), so as
to support profiling. It is the responsibility of front ends (toIR.c)
to generate these. For each instruction, the first IR stmt emitted
should be an IMark, stating the guest address and length of the guest
instruction represented by the IR that follows. All IR stmts
following the IMark but before the next IMark are then assumed to
'belong to' the guest insn described by the first IMark. IMarks do
not denote executable code and can be ignored at any point in the
proceedings; they are an optional addition which help
profiling-annotators to navigate the IR stmt stream.
This commit adds IR-level infrastructure for IMarks and IMark
generation in the x86 front end. The amd64 and ppc32 front ends are
not yet done.
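As a rough illustration of how a profiling annotator might use these
marks, the sketch below walks a statement stream and counts the
statements that belong to one guest instruction. ToyKind, ToyMarkStmt
and toy_stmts_for_insn are hypothetical names for this example only;
the real IR types differ.

```c
#include <stddef.h>

/* Hypothetical, simplified statement: either a mark or anything else. */
typedef enum { TOY_IMARK, TOY_OTHER } ToyKind;

typedef struct {
    ToyKind kind;
    unsigned long long addr;  /* guest address (marks only) */
    int len;                  /* guest insn length (marks only) */
} ToyMarkStmt;

/* Count the statements that follow the mark for 'addr' and precede
   the next mark. Marks themselves are not executable and not counted. */
static size_t toy_stmts_for_insn(const ToyMarkStmt *s, size_t n,
                                 unsigned long long addr)
{
    size_t count = 0;
    int inside = 0;  /* 1 while under the mark for 'addr' */
    for (size_t i = 0; i < n; i++) {
        if (s[i].kind == TOY_IMARK) {
            inside = (s[i].addr == addr);
            continue;
        }
        if (inside)
            count++;
    }
    return count;
}
```

Because the marks carry no semantics of their own, a pass that does not
care about profiling can treat TOY_IMARK entries exactly like no-ops.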
Julian Seward [Wed, 16 Mar 2005 13:57:58 +0000 (13:57 +0000)]
Add guest_TISTART and guest_TILEN fields to all guest state structs,
since eventually users of the library will refer to them, and unless
they exist in all guest states, compilation failure will result.
These fields contain the start address and length of an area of icache
invalidated by any icache-flushing instruction encountered. On x86
and amd64 there is no such insn and so they are zeroed at startup and
play no further role at all. But on ppc32 they are written to as a
result of executing an 'icbi' instruction.
Julian Seward [Wed, 16 Mar 2005 11:52:25 +0000 (11:52 +0000)]
Add %EBP/%RBP to the set of registers for which redundant-PUT
elimination is not done. This is needed so that Valgrind can
construct correct stack traces on x86/amd64. Curiously enough, old
UCode valgrind didn't do this correctly, but because it doesn't
optimise as aggressively as Vex, we didn't notice. The overall
result is that Vex-based valgrind now produces more accurate stack
traces, at least on x86, than valgrind-2.4.X will.
Fixed xer_ca flag calc for subfze.
Cleaned up ghelpers.c: calc_xer_ca, calc_xer_ov.
Cleaned up toIR.c: dis_int_arith, dis_int_cmp, dis_int_logic, dis_int_shift.