]> git.ipfire.org Git - thirdparty/gcc.git/commit - gcc/doc/invoke.texi
aarch64: Add an early RA for strided registers
authorRichard Sandiford <richard.sandiford@arm.com>
Thu, 7 Dec 2023 19:41:19 +0000 (19:41 +0000)
committerRichard Sandiford <richard.sandiford@arm.com>
Thu, 7 Dec 2023 19:41:19 +0000 (19:41 +0000)
commit9f0f7d802482a8958d6cdc72f1fe0c8549db2182
tree5f21e89ba9eae66488e7c56531322b62ad743cb7
parent656f092cba951fddc1e40468ad71d241ffe98566
aarch64: Add an early RA for strided registers

This pass adds a simple register allocator for FP & SIMD registers.
Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
instructions, which require a very specific grouping structure,
and so would be difficult to exploit with general allocation.

The allocator is very simple.  It gives up on anything that would
require spilling, or that it might not handle well for other reasons.

The allocator needs to track liveness at the level of individual FPRs.
Doing that fixes a lot of the PRs relating to redundant moves caused by
structure loads and stores.  That particular problem is going to be
fixed more generally for GCC 15 by Lehua's RA patches.

However, the early-RA pass runs before scheduling, so it has a chance
to bag a spill-free allocation of vector code before the scheduler moves
things around.  It could therefore still be useful for non-SME code
(e.g. for hand-scheduled ACLE code) even after Lehua's patches are in.

The pass is controlled by a tristate switch:

- -mearly-ra=all: run on all functions
- -mearly-ra=strided: run on functions that have access to strided registers
- -mearly-ra=none: don't run on any function

The patch makes -mearly-ra=all the default at -O2 and above for now.
We can revisit this for GCC 15 once Lehua's patches are in;
-mearly-ra=strided might then be more appropriate.

As said previously, the pass is very naive.  There's much more that we
could do, such as handling invariants better.  The main focus is on not
committing to a bad allocation, rather than on handling as much as
possible.

gcc/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* config.gcc: Add aarch64-early-ra.o for AArch64 targets.
* config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
* config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
* config/aarch64/aarch64.opt (mearly_ra): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc
(aarch_option_optimization_table): Use -mearly-ra=strided by
default for -O2 and above.
* config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New pass.
* config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
(make_pass_aarch64_early_ra): Declare.
* config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>):
Add a stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
(svldnt1_impl::expand, svst1_impl::expand, svstn1_impl::expand): Handle
new way of defining multi-register loads and stores.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
* config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
(@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
(@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
* config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
function.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
(stride_type): New attribute.
* config/aarch64/constraints.md (Uwd, Uwt): New constraints.
* config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
(UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
(optab): Handle them.
(LD1_COUNT, ST1_COUNT): New iterators.
* config/aarch64/aarch64-early-ra.cc: New file.

gcc/testsuite/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
output test.
* gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s
as well as .d.
* gcc.target/aarch64/sme/strided_1.c: New test.
* gcc.target/aarch64/pr109078.c: Likewise.
* gcc.target/aarch64/pr109391.c: Likewise.
* gcc.target/aarch64/sve/pr106694.c: Likewise.
23 files changed:
gcc/common/config/aarch64/aarch64-common.cc
gcc/config.gcc
gcc/config/aarch64/aarch64-early-ra.cc [new file with mode: 0644]
gcc/config/aarch64/aarch64-opts.h
gcc/config/aarch64/aarch64-passes.def
gcc/config/aarch64/aarch64-protos.h
gcc/config/aarch64/aarch64-sme.md
gcc/config/aarch64/aarch64-sve-builtins-base.cc
gcc/config/aarch64/aarch64-sve.md
gcc/config/aarch64/aarch64-sve2.md
gcc/config/aarch64/aarch64.cc
gcc/config/aarch64/aarch64.md
gcc/config/aarch64/aarch64.opt
gcc/config/aarch64/constraints.md
gcc/config/aarch64/iterators.md
gcc/config/aarch64/t-aarch64
gcc/doc/invoke.texi
gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
gcc/testsuite/gcc.target/aarch64/pr109078.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/pr109391.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sme/strided_1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/pr106694.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/shift_1.c