git.ipfire.org Git - thirdparty/gcc.git/commit

author	mtsamis <manolis.tsamis@vrull.eu>
	Mon, 1 Aug 2022 12:11:02 +0000 (14:11 +0200)
committer	Philipp Tomsich <philipp.tomsich@vrull.eu>
	Thu, 11 May 2023 19:59:09 +0000 (21:59 +0200)
commit	c4638cc4164ee572868dbb4f1a66dc7cfd34b8ba
tree	ae89f56c61b042e3413dac28d94cda11ec9aed0b
parent	9eea27e7cec6168dc603e1fb0718ded9c828945e

aarch64: convert vector shift + bitwise and + multiply to vector compare

When using SWAR (SIMD within a register) techniques, a comparison
within such a register can be implemented with a combination of
shifts, a bitwise AND, and a multiplication. If code using this scheme
is vectorized, all of these operations can potentially be replaced
with a single vector comparison by reinterpreting the vector types to
match the element width used within the SWAR register.
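
As a minimal sketch of the scheme described above, the following C
code tests the sign bit of each 16-bit lane packed in a 32-bit word.
The function body and the mask 0x00010001 are assumptions inferred
from the generated code shown in this message (a shift by 15, an AND,
and a shl by 16 followed by sub, which amounts to a multiplication by
0xffff); the actual test source may differ. lanewise_lt_zero and
check_equivalence are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SWAR form: two 16-bit lanes packed in one 32-bit register.
   Assumed shape, reconstructed from the assembly, not copied from the test. */
uint32_t packed_cmp_16_32(uint32_t a)
{
    /* Move each lane's sign bit down to the lane's bit 0, isolate it,
       then multiply by 0xffff to spread it across the whole lane. */
    return ((a >> 15) & 0x00010001u) * 0xffffu;
}

/* Hypothetical reference: the same result computed one lane at a time.
   A set sign bit means the lane is negative as a signed 16-bit value. */
uint32_t lanewise_lt_zero(uint32_t a)
{
    uint32_t lo = (a & 0x00008000u) ? 0xffffu : 0u;
    uint32_t hi = (a & 0x80000000u) ? 0xffffu : 0u;
    return (hi << 16) | lo;
}

/* Hypothetical self-check over a few sample words. */
void check_equivalence(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00008000u, 0x80000000u,
        0x7fff7fffu, 0xffffffffu, 0x1234abcdu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(packed_cmp_16_32(samples[i]) == lanewise_lt_zero(samples[i]));
}
```

Each lane of the result is all ones when the lane is negative and zero
otherwise, which is the result a signed "less than zero" vector
compare produces per element.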

For example, for the test function packed_cmp_16_32, the original
generated code is:
        ldr     q0, [x0]
        add     w1, w1, 1
        ushr    v0.4s, v0.4s, 15
        and     v0.16b, v0.16b, v2.16b
        shl     v1.4s, v0.4s, 16
        sub     v0.4s, v1.4s, v0.4s
        str     q0, [x0], 16
        cmp     w2, w1
        bhi     .L20
With this pattern, the above is optimized to:
        ldr     q0, [x0]
        add     w1, w1, 1
        cmlt    v0.8h, v0.8h, #0
        str     q0, [x0], 16
        cmp     w2, w1
        bhi     .L20
The effect is similar for x86-64.
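
The equivalence behind this rewrite can be sketched with the generic
vector extensions of GCC (also accepted by Clang): the four 32-bit
elements are reinterpreted as eight 16-bit elements and compared
against zero. This is an illustration only, not the match.pd pattern
itself. The typedef names, the helper functions, and the mask value
0x00010001 are assumptions made for this sketch.

```c
#include <stdint.h>

typedef uint32_t v4su __attribute__((vector_size(16)));
typedef int16_t  v8hi __attribute__((vector_size(16)));

/* Vectorized SWAR form: shift, AND, and multiply on 32-bit elements. */
v4su swar_form(v4su x)
{
    v4su mask = { 0x00010001u, 0x00010001u, 0x00010001u, 0x00010001u };
    v4su mult = { 0xffffu, 0xffffu, 0xffffu, 0xffffu };
    return ((x >> 15) & mask) * mult;
}

/* Compare form: view the same 128 bits as 16-bit lanes and test "< 0".
   A vector comparison yields -1 (all ones) for true and 0 for false. */
v4su compare_form(v4su x)
{
    v8hi zero = { 0, 0, 0, 0, 0, 0, 0, 0 };
    v8hi lt = (v8hi)x < zero;   /* the cast reinterprets the bits */
    return (v4su)lt;
}

/* Hypothetical helper: returns 1 if both forms agree on all elements. */
int forms_agree(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    v4su x = { a, b, c, d };
    v4su s = swar_form(x);
    v4su v = compare_form(x);
    for (int i = 0; i < 4; i++)
        if (s[i] != v[i])
            return 0;
    return 1;
}
```

Because every 16-bit lane is handled independently and written back
to the same bits, the two forms agree regardless of byte order.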

Bootstrapped and reg-tested for x86 and aarch64.

gcc/ChangeLog:

* match.pd: Simplify vector shift + bit_and + multiply.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/swar_to_vec_cmp.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>

gcc/match.pd
gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c	[new file with mode: 0644]