git.ipfire.org Git - thirdparty/gcc.git/commit

author	Kyrylo Tkachov <ktkachov@nvidia.com>
	Tue, 22 Oct 2024 14:52:36 +0000 (07:52 -0700)
committer	Kyrylo Tkachov <ktkachov@nvidia.com>
	Mon, 4 Nov 2024 08:41:09 +0000 (09:41 +0100)
commit	14cb23e743e02e6923f7e46a14717e9f561f6723
tree	47aca37dd2172d1412ace6e6746c23a260942d91
parent	19757e1c28de07b45da03117e6ff7ae3e21e5a7a

aarch64: Emit XAR for vector rotates where possible

We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long as we zero out one
of the input registers for it.  XAR exclusive-ORs its two inputs and
rotates the result right by an immediate, so with one input zeroed it
is a plain rotate.  This allows for a lower-latency sequence than the
fallback SHL+USRA, especially when we can hoist the zeroing operation
out of loops and hot code.  This should be safe to do for 64-bit vectors
as well, even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit result is later accessed through the right subregs.
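The identity relied on here can be checked with a scalar C model of a single 32-bit lane, for instance the rotate used in G1 below. This is a sketch, not GCC code: ror32, xar_lane32 and check_g1_lane are hypothetical helpers, and xar_lane32 assumes the per-lane XAR behaviour of an exclusive-OR followed by a right rotate by the immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n, for 1 <= n <= 31. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical model of one 32-bit lane of XAR: exclusive-OR the two
   inputs, then rotate the result right by the immediate. */
static uint32_t xar_lane32(uint32_t a, uint32_t b, unsigned imm)
{
    return ror32(a ^ b, imm);
}

/* Usage: with the second input zeroed (the MOVI), XAR #23 computes the
   G1 rotate (r >> 23) | (r << 9). */
void check_g1_lane(uint32_t r)
{
    assert(xar_lane32(r, 0, 23) == ((r >> 23) | (r << 9)));
}
```

Since x ^ 0 == x, the exclusive-OR step drops out and only the rotate remains, which is why a single zeroed register (hoistable out of loops) is all the extra work XAR needs.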

This strategy is used whenever XAR instructions are available.  The logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it is expected to generate a single REV*
instruction, or when XAR instructions are not present.
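The selection order described above can be summarised as a small decision function. This is an illustrative sketch only: the enum, choose_rotate_strategy and rotate_is_single_rev are hypothetical names, not GCC internals. It assumes that the single-REV* case is exactly a rotate by half the element width (REV16, REV32 or REV64 on the next narrower arrangement), and that without XAR the pre-existing path (permute if possible, else SHL+USRA) is kept.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the expansions discussed above. */
enum rotate_strategy {
    ROTATE_VIA_REV,             /* single REV16/REV32/REV64 via a permute */
    ROTATE_VIA_XAR,             /* MOVI zero + XAR */
    ROTATE_VIA_PERM_THEN_SHIFTS /* previous path: permute if possible,
                                   else SHL + USRA */
};

/* Assumed: a rotate by exactly half the element width is a single REV*,
   e.g. REV32 Vd.8H rotates each 32-bit lane by 16. */
static bool rotate_is_single_rev(unsigned elem_bits, unsigned amount)
{
    return (elem_bits == 16 || elem_bits == 32 || elem_bits == 64)
           && amount == elem_bits / 2;
}

/* Prefer a single REV*, then XAR when available, else the old path. */
static enum rotate_strategy
choose_rotate_strategy(bool have_xar, unsigned elem_bits, unsigned amount)
{
    if (rotate_is_single_rev(elem_bits, amount))
        return ROTATE_VIA_REV;
    if (have_xar)
        return ROTATE_VIA_XAR;
    return ROTATE_VIA_PERM_THEN_SHIFTS;
}

/* Usage: the G1 rotate (32-bit lanes, amount 23) and a half-width rotate. */
void check_strategy_examples(void)
{
    assert(choose_rotate_strategy(true, 32, 23) == ROTATE_VIA_XAR);
    assert(choose_rotate_strategy(false, 32, 23) == ROTATE_VIA_PERM_THEN_SHIFTS);
    assert(choose_rotate_strategy(true, 32, 16) == ROTATE_VIA_REV);
}
```

The REV check comes first because a single REV* needs no zeroed register at all, so it stays preferable even when XAR is available.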

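With this patch, for the input:
/* Assumed typedefs (GCC vector extensions); unsigned lanes make >> a
   logical shift, so the expressions below are rotates.  */
typedef unsigned int v4si __attribute__ ((vector_size (16)));
typedef unsigned char v8qi __attribute__ ((vector_size (8)));

v4si
G1 (v4si r)
{
  return (r >> 23) | (r << 9);
}

v8qi
G2 (v8qi r)
{
  return (r << 3) | (r >> 5);
}

we can generate the following assembly for +sve2: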
G1:
        movi    v31.4s, 0
        xar     z0.s, z0.s, z31.s, #23
        ret

G2:
        movi    v31.4s, 0
        xar     z0.b, z0.b, z31.b, #5
        ret

instead of the current:
G1:
        shl     v31.4s, v0.4s, 9
        usra    v31.4s, v0.4s, 23
        mov     v0.16b, v31.16b
        ret

G2:
        shl     v31.8b, v0.8b, 3
        usra    v31.8b, v0.8b, 5
        mov     v0.8b, v31.8b
        ret
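To see why the two G2 sequences agree, including for the 64-bit v8qi case, both can be modelled lane by lane in C. The helpers below are hypothetical models written for this illustration, assuming a 128-bit view of the register: SHL shifts each byte left, USRA adds each byte shifted right into the destination, and XAR exclusive-ORs then rotates right. Because r << 3 and r >> 5 occupy disjoint bits of a byte, the USRA addition behaves like an OR; and because XAR never mixes lanes, whatever sits in the upper eight bytes cannot affect the low 64 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lane-wise models for 8-bit elements. */

/* SHL Vd.8B, Vn.8B, #sh: shift each byte left. */
static void shl8(uint8_t *d, const uint8_t *n, int lanes, unsigned sh)
{
    for (int i = 0; i < lanes; i++)
        d[i] = (uint8_t)(n[i] << sh);
}

/* USRA Vd.8B, Vn.8B, #sh: add each byte of n, shifted right, into d. */
static void usra8(uint8_t *d, const uint8_t *n, int lanes, unsigned sh)
{
    for (int i = 0; i < lanes; i++)
        d[i] = (uint8_t)(d[i] + (n[i] >> sh));
}

/* XAR Zd.B, Zd.B, Zm.B, #imm (1 <= imm <= 7 here): XOR each byte with m,
   then rotate it right by imm. */
static void xar8(uint8_t *d, const uint8_t *m, int lanes, unsigned imm)
{
    for (int i = 0; i < lanes; i++) {
        unsigned x = (uint8_t)(d[i] ^ m[i]);
        d[i] = (uint8_t)((x >> imm) | (x << (8 - imm)));
    }
}

/* Returns nonzero if the old and new G2 sequences give the same low 8 bytes.
   r models the full 128-bit register; only r[0..7] is the v8qi input. */
int g2_sequences_agree(const uint8_t r[16])
{
    uint8_t old_res[8];
    uint8_t new_res[16];
    uint8_t zero[16] = {0};

    /* Old: shl v31.8b, v0.8b, 3 ; usra v31.8b, v0.8b, 5 */
    shl8(old_res, r, 8, 3);
    usra8(old_res, r, 8, 5);

    /* New: movi v31.4s, 0 ; xar z0.b, z0.b, z31.b, #5 on all 16 bytes. */
    memcpy(new_res, r, 16);
    xar8(new_res, zero, 16, 5);

    return memcmp(old_res, new_res, 8) == 0;
}

/* Usage: the upper half holds arbitrary data, yet the low 8 bytes match. */
void check_g2_examples(void)
{
    uint8_t r[16];
    for (int i = 0; i < 16; i++)
        r[i] = (uint8_t)(i * 37 + 11);
    assert(g2_sequences_agree(r));
}
```

Modelling the XAR over all 16 lanes while comparing only the low 8 mirrors the subreg argument made above for 64-bit vectors.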

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
generation of XAR sequences when possible.

gcc/testsuite/

* gcc.target/aarch64/rotate_xar_1.c: New test.

gcc/config/aarch64/aarch64.cc
gcc/testsuite/gcc.target/aarch64/rotate_xar_1.c	[new file with mode: 0644]