RISC-V: Add RVV builtin vectorization cost model

This patch fixes PR111153. Before this patch, GCC generates:

        ble     a1,zero,.L8
        addiw   a5,a1,-1
        li      a4,4
        addi    sp,sp,-16
        mv      a2,a0
        sext.w  a3,a1
        bleu    a5,a4,.L9
        srliw   a4,a3,2
        slli    a4,a4,4
        mv      a5,a0
        add     a4,a4,a0
        vsetivli        zero,4,e32,m1,ta,ma
        vmv.v.i v1,0
        vse32.v v1,0(sp)
.L4:
        vle32.v v1,0(a5) ---> This loop always processes 4 elements, which is fine for VLEN = 128 bits but wastes a large share of the compute units when VLEN > 128 bits.
        vle32.v v2,0(sp)
        addi    a5,a5,16
        vadd.vv v1,v2,v1
        vse32.v v1,0(sp)
        bne     a4,a5,.L4
        ld      a5,0(sp)
        lw      a4,0(sp)
        andi    a1,a1,-4
        srai    a5,a5,32
        addw    a5,a4,a5
        lw      a4,8(sp)
        addw    a5,a5,a4
        ld      a4,8(sp)
        srai    a4,a4,32
        addw    a0,a5,a4
        beq     a3,a1,.L15
.L3:
        subw    a3,a3,a1
        slli    a5,a1,32
        slli    a3,a3,32
        srli    a3,a3,32
        srli    a5,a5,30
        add     a2,a2,a5
        vsetvli a5,a3,e8,mf4,tu,mu
        vsetvli a4,zero,e32,m1,ta,ma
        sub     a1,a3,a5
        vmv.v.i v1,0
        vsetvli zero,a3,e32,m1,tu,ma
        vle32.v v2,0(a2)
        vmv.v.v v1,v2
        bne     a3,a5,.L21
.L7:
        vsetvli a4,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a5,v1
        addw    a0,a0,a5
.L15:
        addi    sp,sp,16
        jr      ra
.L21:
        slli    a5,a5,2
        add     a2,a2,a5
        vsetvli zero,a1,e32,m1,tu,ma
        vle32.v v2,0(a2)
        vadd.vv v1,v1,v2
        j       .L7
.L8:
        li      a0,0
        ret
.L9:
        li      a1,0
        li      a0,0
        j       .L3
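
The annotation on the `.L4` loop can be made concrete with a little arithmetic. The following is a minimal sketch in C, assuming LMUL = 1 and 32-bit elements as in the listing above; the helper name `lane_utilization_percent` is hypothetical and is not part of the patch.

```c
#include <assert.h>

/* Percentage of the 32-bit lanes in one vector register (LMUL = 1) that a
   loop fixed at 4 elements per iteration keeps busy.  Hypothetical helper
   for illustration only; vlen_bits is the hardware VLEN in bits.  */
unsigned
lane_utilization_percent (unsigned vlen_bits)
{
  assert (vlen_bits >= 32);          /* need at least one e32 lane */
  unsigned lanes = vlen_bits / 32;   /* e32 elements per register */
  unsigned used = lanes < 4 ? lanes : 4;
  return 100 * used / lanes;         /* integer division, rounds down */
}
```

Under these assumptions, a 128-bit machine uses every lane, while a 512-bit machine leaves three quarters of them idle on each iteration of the fixed-length loop.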

The root cause is that the RISC-V port lacked an RVV builtin vectorization cost model.

After this patch:

        ble     a1,zero,.L4
        vsetvli a5,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a1,e32,m1,tu,ma
        vle32.v v2,0(a0)
        slli    a4,a5,2
        sub     a1,a1,a5
        add     a0,a0,a4
        vadd.vv v1,v2,v1
        bne     a1,zero,.L3
        li      a5,0
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.s.x v2,a5
        vsetvli a5,zero,e32,m1,ta,ma
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret
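
The `.L3` loop above is length-agnostic: each iteration asks the hardware how many elements it may process. The following is a scalar C emulation of that loop, offered as a sketch only. It assumes the listing implements a 32-bit integer sum reduction (the test source is not shown here), that `vsetvli` returns min(AVL, VLMAX), and that overflow does not occur; `sum_stripmined`, `sum_scalar`, and `MAX_LANES` are hypothetical names.

```c
#include <assert.h>

/* Scalar reference for the reduction the listings appear to implement.  */
int
sum_scalar (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

#define MAX_LANES 64   /* hypothetical upper bound on VLMAX for this model */

/* Emulates the strip-mined loop.  vlmax stands in for the value that
   "vsetvli a5,zero,e32,m1,ta,ma" would return on a given machine.  */
int
sum_stripmined (const int *a, int n, int vlmax)
{
  assert (vlmax > 0 && vlmax <= MAX_LANES);
  if (n <= 0)                          /* ble a1,zero,.L4 */
    return 0;

  int acc[MAX_LANES] = { 0 };          /* vmv.v.i v1,0 */
  while (n != 0)                       /* bne a1,zero,.L3 */
    {
      int vl = n < vlmax ? n : vlmax;  /* vsetvli a5,a1,e32,m1,tu,ma */
      for (int i = 0; i < vl; i++)     /* vle32.v + vadd.vv on vl lanes; */
        acc[i] += a[i];                /* lanes >= vl keep their value (tu) */
      a += vl;                         /* slli a4,a5,2; add a0,a0,a4 */
      n -= vl;                         /* sub a1,a1,a5 */
    }

  int sum = 0;                         /* vmv.s.x v2,a5 with a5 = 0 */
  for (int i = 0; i < vlmax; i++)      /* vredsum.vs over all VLMAX lanes */
    sum += acc[i];
  return sum;                          /* vmv.x.s a0,v1 */
}
```

Because the per-iteration element count comes from `vlmax` rather than a constant 4, the same code uses the full register width on any VLEN, which is the behaviour the patch restores.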

PR target/111153

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct common_vector_cost): New struct.
(struct scalable_vector_cost): Ditto.
(struct cpu_vector_cost): Ditto.
* config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add RVV
builtin vectorization cost.
* config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
(get_common_costs): New function.
(riscv_builtin_vectorization_cost): Ditto.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New target hook.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.

author	Juzhe-Zhong <juzhe.zhong@rivai.ai>
	Thu, 14 Dec 2023 03:23:43 +0000 (11:23 +0800)
committer	Pan Li <pan2.li@intel.com>
	Thu, 14 Dec 2023 06:51:03 +0000 (14:51 +0800)
commit	5e0f67b84a615ba186ab234a9bc43df0df5a50b6
tree	da9d7a9aacd0e401475646366c9496ed4b87643a
parent	acfd33620af3519b84baecedb0eb6618c2f599a6

gcc/config/riscv/riscv-protos.h
gcc/config/riscv/riscv-vector-costs.cc
gcc/config/riscv/riscv.cc
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111153.c	[new file with mode: 0644]