r1769:
This commit provides a set of enhancements to the IR optimiser
(iropt) and to the various backend instruction selectors.
Unfortunately the changes are interrelated and cannot easily be
committed in pieces in any meaningful way. Together with the
already-committed register allocation enhancements (r1765, r1767),
they improve Memcheck's performance by 0%-10%. The improvements also
apply to other tools, to a lesser extent.
Main changes are:
* Add new IR primops Iop_Left64/32/16/8 and Iop_CmpwNEZ64/32/16/8
which Memcheck uses to express some primitive operations on
definedness (V) bits:
Left(x) = set all bits to the left of the rightmost 1 bit to 1
CmpwNEZ(x) = if x == 0 then 0 else 0xFF...FF
Left and CmpwNEZ are detailed in the USENIX 2005 paper (in which
CmpwNEZ is called PCast). The new primops expose opportunities for
IR optimisation at tree-build time. Prior to this change Memcheck
expressed Left and CmpwNEZ in terms of lower-level primitives
(logical or, negation, compares, various casts), which was simpler
but hindered further optimisation.
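As a reference for their semantics, here is a minimal C sketch
(the function names are illustrative, not VEX's); it relies on the
standard identity Left(x) == x | -x:

   #include <stdint.h>

   /* Left(x): keep x, and additionally set every bit to the left
      of the rightmost 1 bit.  -x complements all bits above the
      rightmost 1 and leaves the bits at and below it matching x,
      so x | -x gives exactly the required value (Left(0) == 0). */
   static uint64_t left64 ( uint64_t x ) {
      return x | (0ULL - x);
   }

   /* CmpwNEZ(x): 0 if x == 0, all-ones otherwise. */
   static uint64_t cmpwNEZ64 ( uint64_t x ) {
      return x == 0 ? 0 : ~0ULL;
   }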
* Enhance the IR optimiser's tree builder so it can rewrite trees
as they are constructed, according to useful identities, for example:
CmpwNEZ64( Or64 ( CmpwNEZ64(x), y ) ) --> CmpwNEZ64( Or64( x, y ) )
which gets rid of a CmpwNEZ64 operation - a win as they are relatively
expensive. See functions fold_IRExpr_Binop and fold_IRExpr_Unop.
Allowing the tree builder to rewrite trees also makes it possible to
have a single implementation of certain transformation rules which
were previously duplicated in the x86, amd64 and ppc instruction
selectors. For example:
32to1(1Uto32(x)) --> x
This simplifies the instruction selectors and gives a central place
to put such IR-level transformations, which is a Good Thing.
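To give a feel for the shape of these rules, here is a simplified
sketch of a unop fold, using VEX's IRExpr accessors; the real logic
lives in fold_IRExpr_Unop and fold_IRExpr_Binop and handles many
more cases:

   #include "libvex_ir.h"   /* IRExpr, IROp, Iop_*, Iex_* */

   /* Try to fold Unop(op, arg) at tree-build time; if no rule
      applies, just build the node.  A sketch, not the real code. */
   static IRExpr* fold_Unop_sketch ( IROp op, IRExpr* arg )
   {
      /* CmpwNEZ64( Or64( CmpwNEZ64(x), y ) )
         --> CmpwNEZ64( Or64( x, y ) ) */
      if (op == Iop_CmpwNEZ64
          && arg->tag == Iex_Binop
          && arg->Iex.Binop.op == Iop_Or64
          && arg->Iex.Binop.arg1->tag == Iex_Unop
          && arg->Iex.Binop.arg1->Iex.Unop.op == Iop_CmpwNEZ64) {
         return IRExpr_Unop(
                   Iop_CmpwNEZ64,
                   IRExpr_Binop(Iop_Or64,
                                arg->Iex.Binop.arg1->Iex.Unop.arg,
                                arg->Iex.Binop.arg2));
      }
      /* 32to1(1Uto32(x)) --> x */
      if (op == Iop_32to1
          && arg->tag == Iex_Unop
          && arg->Iex.Unop.op == Iop_1Uto32) {
         return arg->Iex.Unop.arg;
      }
      return IRExpr_Unop(op, arg);   /* no rule applies */
   }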
* Various minor refinements to the instruction selectors:
- ppc64 generates 32Sto64 as 1 instruction instead of 2
- x86 can now generate movsbl
- x86 handles 64-bit integer Mux0X better for cases typically
arising from Memchecking of FP code
- misc other patterns handled better
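For the record, the reference semantics of the two widenings
mentioned above; the single-instruction encodings ('extsw' on ppc64,
'movsbl' on x86) are what the selectors now produce (the old
multi-instruction sequences are not shown here):

   #include <stdint.h>

   /* 32Sto64: sign-extend 32 -> 64; one 'extsw' on ppc64. */
   static int64_t sx32to64 ( int32_t w ) { return (int64_t)w; }

   /* 8Sto32: sign-extend 8 -> 32; one 'movsbl' on x86. */
   static int32_t sx8to32 ( int8_t b ) { return (int32_t)b; }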
Overall these changes are a straight win - vex generates less code,
and does so a bit faster since its register allocator has to chew
through fewer instructions. The main risk is that of correctness:
making Left and CmpwNEZ explicit, and adding rewrite rules for them,
is a substantial change in the way Memcheck deals with undefined value
tracking, and I am concerned to ensure that the changes do not cause
false negatives. I _think_ it's all correct so far.
r1770:
Get rid of Iop_Neg64/32/16/8, as they are no longer used by Memcheck,
and the uses generated by the front ends are so infrequent that
emitting the equivalent Sub(0, ..) is good enough. This gets rid of
quite a few lines of code. Add isel cases for Sub(0, ..) patterns so
that the x86/amd64 backends still generate negl/negq where possible.
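A sketch of the kind of pattern match involved, using VEX's IRExpr
representation (the helper name is illustrative, not necessarily the
real one):

   #include "libvex_basictypes.h"   /* Bool */
   #include "libvex_ir.h"           /* IRExpr, IRConst, Iop_*, Ico_* */

   /* Does e match Sub32(0, x)?  If so the x86 backend can emit a
      single negl on the register holding x; similarly for Sub64
      and negq on amd64.  A sketch, not the actual isel code. */
   static Bool isSub32_Zero_X ( IRExpr* e )
   {
      return e->tag == Iex_Binop
             && e->Iex.Binop.op == Iop_Sub32
             && e->Iex.Binop.arg1->tag == Iex_Const
             && e->Iex.Binop.arg1->Iex.Const.con->tag == Ico_U32
             && e->Iex.Binop.arg1->Iex.Const.con->Ico.U32 == 0;
   }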
r1771:
Handle Left64 in the x86 backend. Fixes the failure on
none/tests/x86/insn_sse2.
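On a 32-bit host a 64-bit value lives in a register pair, so the
backend has to compute Left64 half by half. A reference computation
(not the actual isel code), again using Left(x) == x | (0 - x):

   #include <stdint.h>

   /* Compute Left64 on a (hi, lo) pair of 32-bit halves: negate
      the 64-bit value using 32-bit ops (with an explicit borrow),
      then OR with the original, halfwise. */
   static void left64_halves ( uint32_t* hiP, uint32_t* loP )
   {
      uint32_t lo = *loP, hi = *hiP;
      uint32_t nlo = 0u - lo;                        /* low half of -x  */
      uint32_t nhi = 0u - hi - (lo != 0 ? 1u : 0u);  /* high half of -x */
      *loP = lo | nlo;
      *hiP = hi | nhi;
   }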