Redo the way FP multiply-accumulate insns are done on ppc32/64.
Instead of splitting them up into a multiply and an add/sub, add 4 new
primops which keeps the operation as a single unit. Then, in the back
end, re-emit the as a single instruction.
Reason for this is that so-called fused-multiply-accumulate -- which
is what ppc does -- generates a double-double length intermediate
result (of the multiply, 112 mantissa bits) before doing the add, and
so it is impossible to do a bit-accurate simulation of it using AddF64
and MulF64.
Unfortunately the new primops unavoidably take 4 args (a rounding mode
+ 3 FP args) and so there is a new IRExpr expression type, IRExpr_Qop
and associated supporting junk.