git.ipfire.org Git - thirdparty/gcc.git/commit
RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.x
We noticed that on an AI benchmark, GCC shows a 3% performance drop compared with Clang.
This is because Clang/LLVM has a simplification that transforms vmv.v.x (avl = 1) into vmv.s.x.
Since vmv.s.x has a more flexible vsetvl demand than vmv.v.x, the transformation gives us
better chances to fuse vsetvl instructions.
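The equivalence behind this transform can be illustrated with a small scalar model. This is only a sketch, not the code in the patch: the names model_vmv_v_x, model_vmv_s_x and splat_matches_scalar_move are hypothetical, VLMAX = 4 assumes VLEN = 128 with e32/m1, and tail elements are modelled as undisturbed. vmv.v.x writes elements [0, vl), while vmv.s.x writes only element 0 whenever vl > 0, so a splat with avl = 1 produces the same result as a scalar move under any non-zero vl.

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX 4 /* assumption: VLEN = 128, SEW = 32, LMUL = 1 */

/* Model of vmv.v.x: splat x into elements [0, vl); tail left undisturbed.  */
void
model_vmv_v_x (uint32_t vd[VLMAX], uint32_t x, size_t vl)
{
  for (size_t i = 0; i < vl && i < VLMAX; i++)
    vd[i] = x;
}

/* Model of vmv.s.x: write element 0 only, and only when vl > 0.  */
void
model_vmv_s_x (uint32_t vd[VLMAX], uint32_t x, size_t vl)
{
  if (vl > 0)
    vd[0] = x;
}

/* Return 1 if vmv.v.x at vl = 1 and vmv.s.x at VL_FOR_SCALAR_MOVE
   leave the destination register in the same state.  */
int
splat_matches_scalar_move (uint32_t x, size_t vl_for_scalar_move)
{
  uint32_t a[VLMAX] = { 1, 2, 3, 4 };
  uint32_t b[VLMAX] = { 1, 2, 3, 4 };

  model_vmv_v_x (a, x, 1);
  model_vmv_s_x (b, x, vl_for_scalar_move);

  for (size_t i = 0; i < VLMAX; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

In this model the two agree for vl = 1 and vl = 4 alike, which is why the scalar move does not need its own vsetvl with avl = 1.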
Consider the following case:
void
foo (uint32_t *outputMat, uint32_t *inputMat)
{
  vuint32m1_t matRegIn0 = __riscv_vle32_v_u32m1 (inputMat, 4);
  vuint32m1_t matRegIn1 = __riscv_vle32_v_u32m1 (inputMat + 4, 4);
  vuint32m1_t matRegIn2 = __riscv_vle32_v_u32m1 (inputMat + 8, 4);
  vuint32m1_t matRegIn3 = __riscv_vle32_v_u32m1 (inputMat + 12, 4);
  vbool32_t oddMask
    = __riscv_vreinterpret_v_u32m1_b32 (__riscv_vmv_v_x_u32m1 (0xaaaa, 1));
  vuint32m1_t smallTransposeMat0
    = __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn0, matRegIn1, 1, 4);
  vuint32m1_t smallTransposeMat2
    = __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn2, matRegIn3, 1, 4);
  vuint32m1_t outMat0 = __riscv_vslideup_vx_u32m1_tu (smallTransposeMat0,
						      smallTransposeMat2, 2, 4);
  __riscv_vse32_v_u32m1 (outputMat, outMat0, 4);
}
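For readers without RVV hardware, the effect of foo can be approximated by a plain C model. This is a sketch under stated assumptions, not part of the patch: model_vslideup and foo_model are hypothetical names, the vector length is fixed at 4, and both tail and masked-off elements are treated as undisturbed (matching the _tu/_tumu intrinsics). Only the low bits of 0xaaaa matter as the mask, so elements 1 and 3 are active.

```c
#include <stddef.h>
#include <stdint.h>

#define VL 4 /* vector length used by every intrinsic in foo */

/* Model of vslideup with undisturbed tail/mask policy:
   vd[i] = vs2[i - offset] for offset <= i < VL when mask bit i is set.
   Elements below OFFSET and inactive elements keep their old value.  */
static void
model_vslideup (uint32_t vd[VL], const uint32_t vs2[VL], size_t offset,
		uint32_t mask_bits)
{
  for (size_t i = offset; i < VL; i++)
    if ((mask_bits >> i) & 1u)
      vd[i] = vs2[i - offset];
}

/* Scalar model of foo: gathers element 0 of each of the four rows.  */
void
foo_model (uint32_t *outputMat, const uint32_t *inputMat)
{
  uint32_t row0[VL], row1[VL], row2[VL], row3[VL];
  const uint32_t odd_mask = 0xaaaa;	/* bits 1 and 3 set within VL */
  const uint32_t all_active = 0xffffffffu; /* unmasked operation */

  for (size_t i = 0; i < VL; i++)
    {
      row0[i] = inputMat[i];
      row1[i] = inputMat[i + 4];
      row2[i] = inputMat[i + 8];
      row3[i] = inputMat[i + 12];
    }

  model_vslideup (row0, row1, 1, odd_mask);   /* smallTransposeMat0 */
  model_vslideup (row2, row3, 1, odd_mask);   /* smallTransposeMat2 */
  model_vslideup (row0, row2, 2, all_active); /* outMat0 */

  for (size_t i = 0; i < VL; i++)
    outputMat[i] = row0[i];
}
```

With inputMat holding 0 through 15 in order, this model writes 0, 4, 8, 12 to outputMat, i.e. the first column of the 4x4 input.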
Before this patch:
vsetivli zero,4,e32,m1,ta,ma
li a5,45056
addi a2,a1,16
addi a3,a1,32
addi a4,a1,48
vle32.v v1,0(a1)
vle32.v v4,0(a2)
vle32.v v2,0(a3)
vle32.v v3,0(a4)
addiw a5,a5,-1366
vsetivli zero,1,e32,m1,ta,ma
vmv.v.x v0,a5 ---> Since avl = 1, we can transform it into vmv.s.x
vsetivli zero,4,e32,m1,tu,mu
vslideup.vi v1,v4,1,v0.t
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret
After this patch:
li a5,45056
addi a2,a1,16
vsetivli zero,4,e32,m1,tu,mu
addiw a5,a5,-1366
vle32.v v3,0(a2)
addi a3,a1,32
addi a4,a1,48
vle32.v v1,0(a1)
vmv.s.x v0,a5
vle32.v v2,0(a3)
vslideup.vi v1,v3,1,v0.t
vle32.v v3,0(a4)
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (splat_to_scalar_move_p): New function.
* config/riscv/riscv-v.cc (splat_to_scalar_move_p): Ditto.
* config/riscv/vector.md: Simplify vmv.v.x into vmv.s.x.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/attribute-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/attribute-3.c: New test.