The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) the value can be allocated to both GPRs
and FPRs: in the GPR case the access is an x-register ldp/stp, and in
the FPR case it is a q-register load/store.

As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
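
To make the range difference concrete, here is a minimal sketch of the two
offset checks. It is a simplified model of the architectural immediate
ranges, not the code in aarch64_classify_address, and the function names are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: offset range of an x-register LDP/STP, which takes
   a signed 7-bit immediate scaled by 8, i.e. multiples of 8 in
   [-512, 504].  */
bool
xreg_pair_offset_ok (long off)
{
  return off % 8 == 0 && off >= -512 && off <= 504;
}

/* Hypothetical helper: offset range of a single q-register load/store.
   Either the scaled form (unsigned 12-bit immediate scaled by 16, i.e.
   multiples of 16 in [0, 65520]) or the unscaled LDUR/STUR form (signed
   9-bit immediate, i.e. [-256, 255]).  */
bool
qreg_single_offset_ok (long off)
{
  if (off % 16 == 0 && off >= 0 && off <= 65520)
    return true;
  return off >= -256 && off <= 255;
}
```

Under this model an offset such as 1024 is fine for a single q-register
save but is rejected by the x-register pair check, which is the kind of
offset a mode with the T{I,F,D}mode restriction cannot address.
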

For T{F,D}mode in GCC 15 I think we could consider relaxing the
restriction imposed in aarch64_classify_address, as typically T{F,D}mode
should be allocated to FPRs. But such a change seems too invasive to
consider for GCC 14 at this stage (let alone backports).

Fortunately the new flexible load/store pair patterns in GCC 14 allow
this mode change to work without further changes. The backports are
more involved as we need to adjust the load/store pair handling to cater
for V16QImode in a few places.

Note that for the testcase we are relying on the torture options to add
-funroll-loops at -O3, which is necessary to trigger the ICE on trunk
(but not on the 13 branch).

gcc/ChangeLog:

	PR target/111677
	* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
	V16QImode for the full 16-byte FPR saves in the vector PCS case.

gcc/testsuite/ChangeLog:

	PR target/111677
	* gcc.target/aarch64/torture/pr111677.c: New test.
case ARM_PCS_SIMD:
/* The vector PCS saves the low 128 bits (which is the full
register on non-SVE targets). */
- return TFmode;
+ return V16QImode;
case ARM_PCS_SVE:
/* Use vectors of DImode for registers that need frame
--- /dev/null
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
+typedef struct {
+ long size_z;
+ int width;
+} dt_bilateral_t;
+typedef float dt_aligned_pixel_t[4];
+#pragma omp declare simd
+void dt_bilateral_splat(dt_bilateral_t *b) {
+ float *buf;
+ long offsets[8];
+ for (; b;) {
+ int firstrow;
+ for (int j = firstrow; j; j++)
+ for (int i; i < b->width; i++) {
+ dt_aligned_pixel_t contrib;
+ for (int k = 0; k < 4; k++)
+ buf[offsets[k]] += contrib[k];
+ }
+ float *dest;
+ for (int j = (long)b; j; j++) {
+ float *src = (float *)b->size_z;
+ for (int i = 0; i < (long)b; i++)
+ dest[i] += src[i];
+ }
+ }
+}