Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn when
possible. This allows for better optimization when the code is inside a loop:
the all-ones constant materialized by mvni is loop-invariant and can be
hoisted out of the loop, whereas mvn must execute on every iteration.
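As an illustration (a hypothetical loop, not taken from the patch), the
hoisting matters in code like the following sketch:

#include <arm_neon.h>
#include <stddef.h>

/* Hypothetical example: after the conversion, the all-ones constant set
   up by mvni is loop-invariant, so the compiler can move it out of the
   loop; mvn, by contrast, depends on each iteration's data. */
void not_narrow_all(uint8_t *dst, const uint16_t *src, size_t n) {
  for (size_t i = 0; i + 8 <= n; i += 8) {
    uint16x8_t a = vld1q_u16(src + i);
    vst1_u8(dst + i, vshrn_n_u16(vmvnq_u16(a), 8));
  }
}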
The conversion is based on the fact that for an unsigned integer:
  -x = ~x + 1  =>  ~x = -1 - x
thus '(u8)(~x >> imm)' is equivalent to '(u8)(((u16)-1 - x) >> imm)'.
Since subhn returns the most significant half of each difference,
subhn(-1, x) computes exactly '(u8)(((u16)-1 - x) >> 8)', i.e. the
imm == 8 case above.
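A minimal scalar sanity check of that identity (illustrative only, not
part of the patch):

#include <assert.h>
#include <stdint.h>

int main(void) {
  /* Check (u8)(~x >> 8) == (u8)(((u16)-1 - x) >> 8) for all 16-bit x. */
  for (uint32_t x = 0; x <= 0xFFFF; x++) {
    uint8_t lhs = (uint8_t)((uint16_t)~x >> 8);
    uint8_t rhs = (uint8_t)((uint16_t)(0xFFFFu - x) >> 8);
    assert(lhs == rhs);
  }
  return 0;
}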
For the following function:
#include <arm_neon.h>

uint8x8_t neg_narrow_v8hi(uint16x8_t a) {
  uint16x8_t b = vmvnq_u16(a);   /* bitwise NOT (mvn): b = ~a */
  return vshrn_n_u16(b, 8);      /* shift right by 8, narrow to u8 (shrn) */
}
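In intrinsics, the transformed computation corresponds roughly to the
following sketch (not compiler output; vdupq_n_u16(0xFFFF) stands in for
the constant set up by mvni):

#include <arm_neon.h>

/* Sketch: vsubhn_u16 returns the most significant byte of each 16-bit
   difference, so subhn(-1, a) yields (u8)((0xFFFF - a) >> 8) per lane,
   which equals (u8)(~a >> 8). */
uint8x8_t neg_narrow_subhn_v8hi(uint16x8_t a) {
  uint16x8_t all_ones = vdupq_n_u16(0xFFFF);
  return vsubhn_u16(all_ones, a);
}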
Without this patch the assembly looks like:
not v0.16b, v0.16b
shrn v0.8b, v0.8h, 8
After the patch it becomes:
mvni v31.4s, 0
subhn v0.8b, v31.8h, v0.8h
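Here mvni v31.4s, 0 materializes ~0, i.e. -1 in every 16-bit lane. A small
self-contained harness to cross-check the two forms on sample inputs
(illustrative only; assumes an aarch64 target):

#include <arm_neon.h>
#include <assert.h>
#include <stdint.h>

int main(void) {
  uint16_t in[8] = {0, 1, 0x00FF, 0x0100, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF};
  uint16x8_t a = vld1q_u16(in);
  uint8x8_t before = vshrn_n_u16(vmvnq_u16(a), 8);        /* mvn + shrn   */
  uint8x8_t after  = vsubhn_u16(vdupq_n_u16(0xFFFF), a);  /* mvni + subhn */
  uint8_t r1[8], r2[8];
  vst1_u8(r1, before);
  vst1_u8(r2, after);
  for (int i = 0; i < 8; i++)
    assert(r1[i] == r2[i]);
  return 0;
}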