git.ipfire.org Git - thirdparty/qemu.git/commit

author	Paolo Bonzini <pbonzini@redhat.com>
	Wed, 19 Oct 2022 11:22:06 +0000 (13:22 +0200)
committer	Paolo Bonzini <pbonzini@redhat.com>
	Sat, 22 Oct 2022 07:05:54 +0000 (09:05 +0200)
commit	2872b0f390c3fbd8f19f6b82da3dca15fa820118
tree	da951c7b5fb8902fa0184c160e00d01d6a5c7840
parent	cf5ec6641ed456e2748b211b7bbf5103bfc93098

target/i386: implement FMA instructions

The only issue with FMA instructions is that there are _a lot_ of them (30
opcodes, each of which comes in up to 4 versions depending on VEX.W and
VEX.L; a total of 96 possibilities). However, they can be implemented with
only 6 helpers, two for scalar operations and four for packed operations.
(Scalar versions do not do any merging; they only affect the bottom 32
or 64 bits of the output operand. Therefore, there are no separate XMM
and YMM versions of the scalar helpers).
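
One way to arrive at the numbers above is sketched below. This is a
hypothetical tally, not code from the patch; the mnemonic lists are an
assumption based on the FMA3 instruction set (three operand orders, six
packed and four scalar operations per order).

```python
# Hypothetical tally of FMA3 encodings; not part of the QEMU patch.
ORDERS = ("132", "213", "231")
PACKED = ("VFMADDSUB", "VFMSUBADD", "VFMADD", "VFMSUB", "VFNMADD", "VFNMSUB")
SCALAR = ("VFMADD", "VFMSUB", "VFNMADD", "VFNMSUB")

# One opcode per (operand order, operation) pair.
opcodes = len(ORDERS) * (len(PACKED) + len(SCALAR))

# Packed forms depend on VEX.W (PS/PD) and VEX.L (XMM/YMM): 4 versions each.
# Scalar forms depend on VEX.W only (SS/SD): 2 versions each.
variants = len(ORDERS) * (len(PACKED) * 4 + len(SCALAR) * 2)

# Helpers: SS and SD for scalar; PS and PD, each for XMM and YMM, for packed.
helpers = 2 + 2 * 2

print(opcodes, variants, helpers)
```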

First, we can reduce the number of helpers to one third by passing four
operands (one output and three inputs); emit.c.inc decides which operands
go to the multiply and which go to the add.
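
A minimal Python model of that reordering, assuming the usual FMA3
convention that the digits in the mnemonic name the multiplicand,
multiplier and addend in that order. The function names are hypothetical,
and plain `a * b + c` stands in for a fused operation.

```python
def pick_operands(order, op1, op2, op3):
    """Return (multiplicand, multiplier, addend) for an order such as "132"."""
    ops = {"1": op1, "2": op2, "3": op3}
    a, b, c = (ops[digit] for digit in order)
    return a, b, c


def fma_helper(a, b, c):
    # Stand-in for the single shared helper. Unlike a real FMA, this
    # rounds twice (once after the multiply, once after the add).
    return a * b + c


def vfmadd(order, op1, op2, op3):
    # The translator picks the operands; the helper never sees the order.
    return fma_helper(*pick_operands(order, op1, op2, op3))
```

With op1=2.0, op2=3.0 and op3=5.0, the "132" form computes op1 * op3 + op2,
the "213" form op2 * op1 + op3, and the "231" form op2 * op3 + op1, all
through the same helper.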

Second, the different instructions also dispatch to the same softfloat
function, so the flags for float32_muladd and float64_muladd are passed
to the helper as int arguments, with a little extra complication to
handle FMADDSUB and FMSUBADD.
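
The idea can be modelled roughly as follows. This is a sketch under
assumptions, not the patch itself: the flag constants are local stand-ins
rather than softfloat's definitions, and the `flip` argument is one
plausible way to alternate between add and subtract across elements for
FMADDSUB and FMSUBADD.

```python
NEGATE_C = 1        # hypothetical flag: negate the addend
NEGATE_PRODUCT = 2  # hypothetical flag: negate the product


def muladd(a, b, c, flags):
    # Unfused stand-in for a softfloat-style muladd taking a flags argument.
    prod = a * b
    if flags & NEGATE_PRODUCT:
        prod = -prod
    if flags & NEGATE_C:
        c = -c
    return prod + c


def packed_fma(a, b, c, flags, flip=0):
    out = []
    for x, y, z in zip(a, b, c):
        out.append(muladd(x, y, z, flags))
        flags ^= flip  # toggles the flag on every element when flip != 0
    return out


# (initial flags, flip) per operation. FMADDSUB subtracts in even
# elements and adds in odd ones; FMSUBADD does the opposite.
FLAGS = {
    "VFMADD": (0, 0),
    "VFMSUB": (NEGATE_C, 0),
    "VFNMADD": (NEGATE_PRODUCT, 0),
    "VFNMSUB": (NEGATE_PRODUCT | NEGATE_C, 0),
    "VFMADDSUB": (NEGATE_C, NEGATE_C),
    "VFMSUBADD": (0, NEGATE_C),
}
```

For example, `packed_fma(a, b, c, *FLAGS["VFMADDSUB"])` runs the same loop
as plain VFMADD; only the two integer arguments differ.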

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386/cpu.c
target/i386/ops_sse.h
target/i386/ops_sse_header.h
target/i386/tcg/decode-new.c.inc
target/i386/tcg/decode-new.h
target/i386/tcg/emit.c.inc
target/i386/tcg/translate.c
tests/tcg/i386/test-avx.py