aarch64: Avoid redundant writes to FPMR
author Richard Sandiford <richard.sandiford@arm.com>
Thu, 23 Jan 2025 13:57:01 +0000 (13:57 +0000)
committer Richard Sandiford <richard.sandiford@arm.com>
Thu, 23 Jan 2025 13:57:01 +0000 (13:57 +0000)
commit 1886dfb27a296b31de46b44beae0f1db6c1584b6
tree c4e71b4dcba2f0f4eff4df3d13506a0dddb07428
parent ce6fc67da7f600b63985abeb39ba85440cbad549

GCC 15 is the first release to support FP8 intrinsics.
The underlying instructions depend on the value of a new register,
FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
register rather than a global register.  So:

- The FP8 intrinsics take a final uint64_t argument that
  specifies what value FPMR should have.

- If an FP8 operation is split across multiple functions,
  it is likely that those functions would have a similar argument.

If the object code has the structure:

    for (...)
      fp8_kernel (..., fpmr_value);

then fp8_kernel would set FPMR to fpmr_value each time it is
called, even though FPMR will already have that value for at
least the second and subsequent calls (and possibly the first).

The working assumption for the ABI has been that writes to
registers like FPMR can in general be more expensive than
reads and so it would be better to use a conditional write like:

       mrs     tmp, fpmr
       cmp     tmp, <value>
       beq     1f
       msr     fpmr, <value>
     1:

instead of writing the same value to FPMR repeatedly.
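
As an illustration only (not part of the patch), the effect of the
conditional write on the loop above can be modelled in portable C.
FPMR is replaced by an ordinary variable with read/write counters.
All names here (fpmr_model, set_fpmr_always, set_fpmr_if_changed,
count_writes) are hypothetical, and the initial FPMR value of 0 is
an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for the FPMR system register (hypothetical model).  */
static uint64_t fpmr_model;
static unsigned fpmr_reads, fpmr_writes;

static uint64_t read_fpmr (void) { fpmr_reads++; return fpmr_model; }
static void write_fpmr (uint64_t v) { fpmr_writes++; fpmr_model = v; }

/* Models an unconditional "msr fpmr, <value>" on every call.  */
void set_fpmr_always (uint64_t value)
{
  write_fpmr (value);
}

/* Models the mrs/cmp/beq/msr sequence: write only if the value differs.  */
void set_fpmr_if_changed (uint64_t value)
{
  if (read_fpmr () != value)
    write_fpmr (value);
}

/* Stand-in for an FP8 kernel that takes the FPMR value as its final
   argument.  SET_FPMR selects which of the two strategies is modelled.  */
static void fp8_kernel (void (*set_fpmr) (uint64_t), uint64_t fpmr_value)
{
  set_fpmr (fpmr_value);
  /* ... the FP8 operations themselves would go here ...  */
}

/* Call fp8_kernel ITERATIONS times with the same FPMR value, starting
   from a modelled FPMR of 0, and return the number of FPMR writes.  */
unsigned count_writes (void (*set_fpmr) (uint64_t), int iterations,
                       uint64_t fpmr_value)
{
  assert (iterations >= 0);
  fpmr_model = 0;
  fpmr_reads = 0;
  fpmr_writes = 0;
  for (int i = 0; i < iterations; i++)
    fp8_kernel (set_fpmr, fpmr_value);
  return fpmr_writes;
}
```

In this model, count_writes (set_fpmr_always, 8, 5) returns 8, whereas
count_writes (set_fpmr_if_changed, 8, 5) returns 1: only the first call
sees a different value.  The model says nothing about the relative cost
of reads and writes on real hardware.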

This patch implements that.  It also adds a tuning flag that suppresses
the behaviour, both to make testing easier and to support any future
cores that (for example) are able to rename FPMR.
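
The choice that the tuning flag controls can be sketched as follows.
This is a simplified string-emitting model, not the code in
aarch64.md: the flag bit TUNE_CHEAP_FPMR_WRITE, the function
emit_fpmr_write and its parameters are all hypothetical names used
only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tuning bit standing in for a flag like cheap_fpmr_write.  */
enum { TUNE_CHEAP_FPMR_WRITE = 1u << 0 };

/* Write into BUF the instruction sequence that sets FPMR from register
   SRC, using TMP as a scratch register when a comparison is needed.
   Returns the snprintf result.  */
int emit_fpmr_write (char *buf, size_t size, unsigned tune_flags,
                     const char *src, const char *tmp)
{
  /* Writes are assumed cheap: skip the read and compare.  */
  if (tune_flags & TUNE_CHEAP_FPMR_WRITE)
    return snprintf (buf, size, "\tmsr\tfpmr, %s\n", src);

  /* Default: branch around the write if FPMR already has the value.  */
  return snprintf (buf, size,
                   "\tmrs\t%s, fpmr\n"
                   "\tcmp\t%s, %s\n"
                   "\tbeq\t1f\n"
                   "\tmsr\tfpmr, %s\n"
                   "1:\n",
                   tmp, tmp, src, src);
}
```

With tune_flags equal to 0 the sketch produces the five-line
test-and-branch sequence; with TUNE_CHEAP_FPMR_WRITE set it produces
a single msr.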

Hopefully this really is the last part of the FP8 enablement.

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
	* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
	* config/aarch64/aarch64.md: Split moves into FPMR into a test
	and branch around.
	(aarch64_write_fpmr): New pattern.

gcc/testsuite/
	* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
	cheap_fpmr_write by default.
	* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
	* gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
	* gcc.target/aarch64/acle/fpmr-2.c: Likewise.
	* gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
	* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
	* gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
	* gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
	* gcc.target/aarch64/acle/fpmr-6.c: New test.

12 files changed:
gcc/config/aarch64/aarch64-tuning-flags.def
gcc/config/aarch64/aarch64.h
gcc/config/aarch64/aarch64.md
gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
gcc/testsuite/gcc.target/aarch64/acle/fp8.c
gcc/testsuite/gcc.target/aarch64/acle/fpmr-2.c
gcc/testsuite/gcc.target/aarch64/acle/fpmr-6.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/simd/vcvt_fpm.c
gcc/testsuite/gcc.target/aarch64/simd/vdot2_fpm.c
gcc/testsuite/gcc.target/aarch64/simd/vdot4_fpm.c
gcc/testsuite/gcc.target/aarch64/simd/vmla_fpm.c
gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp