aarch64: convert vector shift + bitwise and + multiply to vector compare
When using SWAR (SIMD in a register) techniques a comparison operation
within such a register can be made by using a combination of shifts,
bitwise and and multiplication. If code using this scheme is
vectorized then there is potential to replace all these operations
with a single vector comparison, by reinterpreting the vector types to
match the width of the SWAR register.
For example, for the test function packed_cmp_16_32, the original
generated code is:
ldr q0, [x0]
add w1, w1, 1
ushr v0.4s, v0.4s, 15
and v0.16b, v0.16b, v2.16b
shl v1.4s, v0.4s, 16
sub v0.4s, v1.4s, v0.4s
str q0, [x0], 16
cmp w2, w1
bhi .L20
with this pattern the above can be optimized to:
ldr q0, [x0]
add w1, w1, 1
cmlt v0.8h, v0.8h, #0
str q0, [x0], 16
cmp w2, w1
bhi .L20
The effect is similar for x86-64.