[PATCH 4/7] RISC-V: Honour -mrvv-max-lmul in riscv_vector::expand_block_move

Unlike the other vector string ops, expand_block_move was using the
maximum LMUL (m8) regardless of TARGET_MAX_LMUL.

The check for whether to generate inline vector code for movmem has been
moved from movmem<mode> to riscv_vector::expand_block_move to avoid
maintaining multiple versions of similar logic.  The two checks already
differed on the minimum length for which they would generate vector
code; now that the expand_block_move threshold is used, vector code
will be generated for smaller movmem lengths than before.

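The combined behaviour is roughly as follows.  This is a standalone
model for illustration only, not the patch itself: the helper name, the
byte counts and the assumption of a 128-bit minimum VLEN are made up
for the sketch, whereas the real code works on RTL inside
riscv_vector::expand_block_move.

  #include <stdbool.h>
  #include <stddef.h>

  /* Standalone illustration only -- not GCC code.  */

  /* Hypothetical helper: bytes covered by a single vector copy at the
     given LMUL (1, 2, 4 or 8), assuming a minimum VLEN of 128 bits,
     i.e. 16 bytes per m1 register group.  */
  static size_t
  max_single_copy_bytes (int max_lmul)
  {
    return 16u * (size_t) max_lmul;
  }

  /* Decide whether to emit inline vector code for a copy of LENGTH
     bytes.  Copies larger than one register group at the LMUL allowed
     by -mrvv-max-lmul need a loop; when MOVMEM_P (the caller is the
     movmem<mode> expander) such copies are refused so that the expander
     can FAIL and fall back to a library call.  cpymem callers accept
     the loop.  */
  static bool
  should_expand_inline (size_t length, int max_lmul, bool movmem_p)
  {
    bool need_loop = length > max_single_copy_bytes (max_lmul);
    return !(need_loop && movmem_p);
  }
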
Limiting memcpy to m1 caused some memcpy loops to be generated in the
calling-convention tests, which made it awkward to add suitable
scan-assembler tests checking that the return value is set, so
-mrvv-max-lmul=m8 has been added to those tests.  Other tests have been
adjusted to expect the new m1 memcpy generation where reasonably
straightforward; otherwise -mrvv-max-lmul=m8 has been added.

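For illustration, the test adjustment is just an option change of this
shape (the placeholder stands in for whatever options each test already
uses):

  /* { dg-options "<existing options> -mrvv-max-lmul=m8" } */
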
pr111720-[0-9].c regressed because a memcpy loop is now generated
instead of a straight-line sequence.  This reveals an existing issue
where a redundant straight-line memcpy is eliminated but an equivalent
memcpy loop is not
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117205).

For example, on pr111720-0.c after this patch:
-mrvv-max-lmul=m8:
test:
        lui     a5,%hi(.LANCHOR0)
        li      a4,32
        addi    sp,sp,-32
        addi    a5,a5,%lo(.LANCHOR0)
        vsetvli zero,a4,e8,m1,ta,ma
        vle8.v  v8,0(a5)
        addi    sp,sp,32
        jr      ra

-mrvv-max-lmul=m1:
test:
        addi    sp,sp,-32
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        mv      a2,sp
        li      a3,32
.L2:
        vsetvli a4,a3,e8,m1,ta,ma
        vle8.v  v8,0(a5)
        sub     a3,a3,a4
        add     a5,a5,a4
        vse8.v  v8,0(a2)
        add     a2,a2,a4
        bne     a3,zero,.L2
        li      a5,32
        vsetvli zero,a5,e8,m1,ta,ma
        vle8.v  v8,0(sp)
        addi    sp,sp,32
        jr      ra

I have added -mrvv-max-lmul=m8 to pr111720-[0-9].c so that we continue
to test the elimination of straight-line memcpy.
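For reference, the shape of code these tests exercise is roughly the
following (a sketch inferred from the assembly above, not the actual
test source; the global, the intrinsic and the function signature are
illustrative): a 32-byte copy from a global into a local buffer whose
only use is the vector load that is returned, so the copy is redundant
once the load is forwarded to the global.

  #include <string.h>
  #include <stdint.h>
  #include <riscv_vector.h>

  static const int8_t data[32];

  vint8m1_t
  test (void)
  {
    int8_t buf[32];
    memcpy (buf, data, sizeof buf);  /* should be eliminated as redundant */
    return __riscv_vle8_v_i8m1 (buf, 32);
  }
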
gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_lmul_mode): New prototype.
(expand_block_move): Add bool parameter for movmem_p.
* config/riscv/riscv-string.cc (riscv_expand_block_move_scalar):
Pass movmem_p as false to riscv_vector::expand_block_move.
(expand_block_move): Add movmem_p parameter. Return false if
loop needed and movmem_p is true. Respect TARGET_MAX_LMUL.
* config/riscv/riscv-v.cc (get_lmul_mode): New function.
* config/riscv/riscv.md (movmem<mode>): Move checking for
whether to generate inline vector code to
riscv_vector::expand_block_move by passing movmem_p as true.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113206-1.c: Add
-mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/autovec/pr113206-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Add
-mrvv-max-lmul=m8 and adjust assembly scans.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Add
-mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Likewise.
* gcc.target/riscv/rvv/base/cpymem-1.c: Expect m1 in f1 and f2.
* gcc.target/riscv/rvv/base/cpymem-2.c: Add -mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/base/movmem-1.c: Adjust f1 to a length
that will not get vectorized.
* gcc.target/riscv/rvv/base/pr111720-0.c: Add -mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/base/pr111720-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-2.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-3.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-4.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-5.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-6.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-7.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-8.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-9.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: Expect memcpy m1
loops.
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Likewise.