AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
The new vector costs should multiply the reduction latency by count only
for single_defuse_cycle reductions.  In all other cases the multiplication
inflates the reduction latency considerably and causes missed
vectorization opportunities.
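As an illustration only, the intended rule can be sketched as below.  This
is not the code in aarch64.cc: ops_sketch, count_reduction_latency and the
parameters base, count and single_defuse_cycle are hypothetical names for
this example, with only reduction_latency taken from the ChangeLog entry.

```cpp
#include <algorithm>

// Hypothetical stand-in for the per-loop operation counts.  Only the
// reduction_latency field mentioned in the ChangeLog is modelled.
struct ops_sketch
{
  unsigned int reduction_latency = 0;
};

// base:  assumed latency of one vector reduction statement.
// count: assumed number of vector copies of that statement.
// single_defuse_cycle: true if the copies form one serial def-use chain,
// so their latencies add up; otherwise they are treated as independent.
inline void
count_reduction_latency (ops_sketch &ops, unsigned int base,
                         unsigned int count, bool single_defuse_cycle)
{
  // Scale by count only when the copies are chained one after another.
  if (single_defuse_cycle)
    base *= count;

  // Keep the largest latency seen so far for the loop.
  ops.reduction_latency = std::max (ops.reduction_latency, base);
}
```

With base 4 and count 2, this sketch records a latency of 8 for a
single_defuse_cycle reduction and 4 otherwise.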
Tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/110625
* config/aarch64/aarch64.cc (count_ops): Only '* count' for
single_defuse_cycle while counting reduction_latency.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr110625_1.c: New testcase.
* gcc.target/aarch64/pr110625_2.c: New testcase.