git.ipfire.org Git - thirdparty/gcc.git/log

RISC-V: Make FRM as global register [PR118103]

After we enabled the late-combine pass after the mode-switching pass, it
will try to combine the insn patterns below into a no-op:

(insn 40 5 41 2 (set (reg:SI 11 a1 [151])
  (reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi}
  (nil))
(insn 41 40 7 2 (set (reg:SI 69 frm)
  (const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore}
  (nil))
(insn 42 10 11 2 (set (reg:SI 69 frm)
  (reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore}
    (nil))

trying to combine definition of r11 in:
40: a1:SI=frm:SI
    into:
42: frm:SI=a1:SI
    instruction becomes a no-op:
(set (reg:SI 69 frm)
(reg:SI 69 frm))
original cost = 4 + 4 (weighted: 8.000000), replacement cost =
2147483647; keeping replacement
rescanning insn with uid = 42.
updating insn 42 in-place
verify found no changes in insn with uid = 42.
deleting insn 40

For example, we have code as below:
int test_example () {
  test ();

  size_t vl = 4;
  vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
  va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
  va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);

  __riscv_vse16_v_f16m1(b, va, vl);

  return 0;
}

it will be compiled to:
main:
    addi    sp,sp,-16
    sd  ra,8(sp)
    call    initialize
    lui a6,%hi(b)
    lui a2,%hi(a)
    addi    a3,a6,%lo(b)
    addi    a2,a2,%lo(a)
    li  a4,4
.L8:
    fsrmi   2
    vsetvli a5,a4,e16,m1,ta,ma
    vle16.v v1,0(a2)
    slli    a1,a5,1
    subw    a4,a4,a5
    add a2,a2,a1
    vfnmadd.vv  v1,v1,v1
    >> The fsrm a0 insn is deleted by late-combine <<
    vfmsub.vv   v1,v1,v1
    vse16.v v1,0(a3)
    add a3,a3,a1
    bgt a4,zero,.L8
    lh  a4,%lo(b)(a6)
    li  a5,-20480
    addi    a5,a5,-1382
    bne a4,a5,.L14
    ld  ra,8(sp)
    li  a0,0
    addi    sp,sp,16
    jr  ra

This patch adds the FRM register to global_regs, as it is a
cooperatively-managed global register.  The fsrm insn is then no longer
eliminated by late-combine.  The related spec17 cam4 failure may also be
caused by this issue.
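
The save/set/restore pattern that the deleted insn belongs to can be mirrored
in portable C++ with <cfenv>.  This is only an illustrative sketch and not code
from the patch: the helper name with_round_down is hypothetical, and
std::fegetround / std::fesetround merely stand in for the frrm, fsrmi and fsrm
instructions.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: run `body` with round-toward-negative-infinity in
// effect, then put back whatever rounding mode the caller had.
// Returns the rounding mode that was saved on entry.
template <typename Fn>
int with_round_down(Fn body)
{
  const int saved = std::fegetround();   // analogue of: frrm a1
  std::fesetround(FE_DOWNWARD);          // analogue of: fsrmi 2 (RDN)
  body();
  // This restore is the analogue of the fsrm that late-combine deleted.
  // Without it, every later FP operation would keep rounding down.
  std::fesetround(saved);                // analogue of: fsrm a1
  return saved;
}
```

If the final std::fesetround call were dropped, the caller would observe
FE_DOWNWARD after the helper returns, which is the same kind of leak the
missing fsrm causes in the generated loop.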

The test suites below pass with this patch.
* The rv64gcv full regression test.

PR target/118103

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
FRM to global_regs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118103-1.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

[PR modula2/118010, modula2/118183] Rebuild bootstrap tools with lseek fix

This patch rebuilds the bootstrap tools mc and pge incorporating the fix to
libc.lseek. The tool mc is changed to omit INCLUDE_MEMORY from
checkGccConfigSystem. The pge tool on rebuild now requires
--gcc-config-system to pick up the system.h containing INCLUDE_MEMORY.
After rebuild all local INCLUDE_MEMORY definitions disappear.

gcc/m2/ChangeLog:

PR modula2/117737
PR modula2/118010
* Make-maintainer.in (PGE-MC-OPTIONS): New macro.
(m2/gm2-pge-boot/$(SRC_PREFIX)M2RTS.o): Use $(PGE-MC-OPTIONS).
(m2/gm2-pge-boot/$(SRC_PREFIX)SymbolKey.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)NameKey.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)Lists.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)Output.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)bnflex.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)RTentity.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)%.o): Ditto.
(m2/gm2-pge-boot/$(SRC_PREFIX)pge.o): Ditto.
(m2/gm2-auto/pg.o): Ditto.
(m2/gm2-auto/pge.o): Ditto.
(mc-autogen): Add include directory $(GMPINC) to $(CXX).
* mc/keyc.mod (checkGccConfigSystem): Remove
INCLUDE_MEMORY define.
* mc-boot/GASCII.cc (INCLUDE_MEMORY): Removed during rebuild.
* mc-boot/GASCII.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GArgs.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GArgs.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GAssertion.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GAssertion.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GBreak.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GBreak.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GCOROUTINES.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GCmdArgs.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GCmdArgs.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GDebug.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GDebug.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GDynamicStrings.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GDynamicStrings.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GEnvironment.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GEnvironment.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GFIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GFIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GFormatStrings.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GFormatStrings.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GFpuIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GFpuIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GIndexing.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GIndexing.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GM2Dependent.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GM2Dependent.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GM2EXCEPTION.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GM2EXCEPTION.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GM2RTS.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GM2RTS.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GMemUtils.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GMemUtils.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GNumberIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GNumberIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GPushBackInput.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GPushBackInput.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GRTExceptions.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GRTExceptions.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GRTco.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GRTentity.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GRTint.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GRTint.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GSArgs.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GSArgs.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GSFIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GSFIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GSYSTEM.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GSelective.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GStdIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GStdIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GStorage.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GStorage.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GStrCase.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GStrCase.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GStrIO.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GStrIO.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GStrLib.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GStrLib.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GStringConvert.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GStringConvert.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GSysExceptions.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GSysStorage.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GSysStorage.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GTimeString.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GTimeString.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GUnixArgs.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Galists.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Galists.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gdecl.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gdecl.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gdtoa.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gerrno.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gkeyc.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gkeyc.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gldtoa.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Glibc.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Glibm.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Glists.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Glists.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcComment.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcComment.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcComp.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcComp.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcDebug.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcDebug.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcError.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcError.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcFileName.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcFileName.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcLexBuf.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcLexBuf.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcMetaError.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcMetaError.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcOptions.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcOptions.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcPreprocess.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcPreprocess.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcPretty.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcPretty.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcPrintf.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcPrintf.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcQuiet.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcQuiet.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcReserved.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcReserved.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcSearch.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcSearch.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcStack.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcStack.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcStream.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GmcStream.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcflex.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp1.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp1.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp2.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp2.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp3.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp3.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp4.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp4.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp5.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gmcp5.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GnameKey.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GnameKey.h (INCLUDE_MEMORY): Ditto.
* mc-boot/GsymbolKey.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/GsymbolKey.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gtermios.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gtop.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gvarargs.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gvarargs.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gwlists.cc (INCLUDE_MEMORY): Ditto.
* mc-boot/Gwlists.h (INCLUDE_MEMORY): Ditto.
* mc-boot/Gwrapc.h (INCLUDE_MEMORY): Ditto.
* pge-boot/GIndexing.h (INCLUDE_MEMORY): Ditto.
* pge-boot/GSEnvironment.h (INCLUDE_MEMORY): Ditto.
* pge-boot/GScan.h (INCLUDE_MEMORY): Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Fortran: fix issues with variables in BLOCK DATA [PR58857]

PR fortran/58857

gcc/fortran/ChangeLog:

* class.cc (gfc_find_derived_vtab): Declare some frontend generated
variables and procedures (_vtab, _copy, _deallocate) as artificial.
(find_intrinsic_vtab): Likewise.
* trans-decl.cc (check_block_data_decls): New helper function.
(gfc_generate_block_data): Use it to emit warnings for variables
declared in a BLOCK DATA program unit but not in a COMMON block.

gcc/testsuite/ChangeLog:

* gfortran.dg/uncommon_block_data_2.f90: New test.

Move ferror out of hot loop of file cache

glibc ferror is surprisingly expensive. Move it out of the hot loop
of finding lines by setting a flag after the actual IO operations.
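
The idea can be sketched outside of GCC as follows.  This is a simplified,
hypothetical stand-in (the class name line_reader and its layout are
assumptions, not the real file_cache_slot): the error flag is recorded once,
right after the I/O call, and the per-line loop only tests the cached flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical, simplified stand-in for a file cache slot.
class line_reader
{
public:
  explicit line_reader(std::FILE *fp) : m_fp(fp) {}

  // Refill the buffer.  This is the only place that touches the FILE, so it
  // is the only place that needs to call ferror.
  bool read_data()
  {
    char chunk[4096];
    const std::size_t n = std::fread(chunk, 1, sizeof chunk, m_fp);
    if (std::ferror(m_fp))
      m_error = true;
    m_data.append(chunk, n);
    return n > 0;
  }

  // Hot loop: consults the cached flag instead of calling ferror per line.
  bool get_next_line(std::string &line)
  {
    for (;;)
      {
        if (m_error)
          return false;
        const std::size_t nl = m_data.find('\n', m_pos);
        if (nl != std::string::npos)
          {
            line.assign(m_data, m_pos, nl - m_pos);
            m_pos = nl + 1;
            return true;
          }
        if (!read_data())
          break;
      }
    if (m_error || m_pos >= m_data.size())
      return false;
    // Last line without a trailing newline.
    line.assign(m_data, m_pos, std::string::npos);
    m_pos = m_data.size();
    return true;
  }

private:
  std::FILE *m_fp;
  std::string m_data;
  std::size_t m_pos = 0;
  bool m_error = false;
};
```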

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache_slot::m_error): New field.
(file_cache_slot::create): Clear m_error.
(file_cache_slot::file_cache_slot): Clear m_error.
(file_cache_slot::read_data): Set m_error on error.
(file_cache_slot::get_next_line): Use m_error instead of ferror.

PR modula2/118010 libc.def lseek procedure off_t bugfix

This patch fixes calls to lseek from m2 sources. The new data
type SYSTEM.COFF_T is used instead of SYSTEM.CCSIZE_T.
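
For reference, the C-level interface being matched is POSIX lseek, which both
takes and returns off_t.  The sketch below is a hypothetical helper (file_size
is not part of the patch) that keeps every offset in off_t, so the -1 error
return is never squeezed through a size-like type.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: return the size of an open file descriptor by
// seeking to the end, then restore the original position.  Every offset
// stays in off_t, the type lseek itself uses.
off_t file_size(int fd)
{
  const off_t here = lseek(fd, 0, SEEK_CUR);
  if (here == static_cast<off_t>(-1))
    return -1;
  const off_t end = lseek(fd, 0, SEEK_END);
  if (lseek(fd, here, SEEK_SET) == static_cast<off_t>(-1))
    return -1;
  return end;
}
```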

gcc/m2/ChangeLog:

PR modula2/118010
* gm2-libs-log/FileSystem.mod (doModeChange): Replace
LONGINT with COFF_T.
(SetPos): Use COFF_T for the return value and offset type
when calling lseek.
* gm2-libs/FIO.mod (SetPositionFromBeginning): Convert pos
to COFF_T.
(SetPositionFromEnd): Ditto.
* mc-boot/GFIO.cc: Rebuild.
* mc-boot/Glibc.h: Ditto.
* pge-boot/GFIO.cc: Ditto.
* pge-boot/Glibc.h: Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: Reinstate check for uninitialized bases with c++ <= 17 [PR118239]

We currently accept this code with c++ <= 17 even though it's invalid
since the base is not initialized (we properly reject it with c++ >= 20):

=== cut here ===
struct NoMut1 { int a, b; };
struct NoMut3 : NoMut1 {
  constexpr NoMut3(int a, int b) {}
};
void mutable_subobjects() {
  constexpr NoMut3 nm3(1, 2);
}
=== cut here ===

This is fallout from r0-118700-gc2b3ec18a494e3, which ignores all fields
with DECL_ARTIFICIAL in cx_check_missing_mem_inits, including those that
represent base classes and need to be checked.

This patch makes sure that we only skip fields that have DECL_ARTIFICIAL
if they don't have DECL_FIELD_IS_BASE.
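
For contrast, a well-formed variant of the testcase initializes the base in
the constructor's mem-initializer list.  This is an illustrative sketch, not
one of the tests added by the patch:

```cpp
struct NoMut1 { int a, b; };

struct NoMut3 : NoMut1 {
  // Initializing the base subobject makes the constructor usable in a
  // constant expression in every language mode.
  constexpr NoMut3(int a, int b) : NoMut1{a, b} {}
};

constexpr NoMut3 nm3(1, 2);
static_assert(nm3.a == 1 && nm3.b == 2, "base subobject is initialized");
```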

PR c++/118239

gcc/cp/ChangeLog:

* constexpr.cc (cx_check_missing_mem_inits): Don't skip fields
with DECL_FIELD_IS_BASE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-base8.C: New test.

[RISC-V][PR target/116256] Improve handling of single bit constants

So under the umbrella of pr116256 (P3 regression) I've been exploring removal
of the mvconst_internal pattern.   Not surprisingly, that's going to cause all
kinds of undesirable fallout.  While I can kind of see a path forward for that
work, it's going to require some combine work that I don't think we want to
tackle in the context of gcc-15.

Essentially without mvconst_internal we'll have fully exposed constant
synthesis prior to combine.  Remember that combine has limits on what
combinations it will perform based on how many instructions are in the source
sequence.  If we need 2+ instructions to synthesize the constant, those eat
into our budget.

In a world without mvconst_internal we'd need to either improve combine to
handle 5 insns cases (which do show up in the testsuite) or we need to
significantly improve how combine handles REG_EQUAL notes.  5 insn combinations
sound like insanity to me.  So I'd tend to lean towards the latter, though
that's going to need some refactoring and diving into note redistribution
(ugh!).

In the mean time we can start limiting mvconst_internal.  For the remaining
case in pr116256 we have this code in combine:

> (insn 8 5 10 2 (set (reg:V2048HF 138 [ _5 ])
>         (vec_duplicate:V2048HF (reg:HF 142 [ x ]))) "j.c":152:11 3712 {*vec_duplicatev2048hf}
>      (expr_list:REG_DEAD (reg:HF 142 [ x ])
>         (nil)))
> (insn 10 8 11 2 (set (reg:DI 139)
>         (const_int 2048 [0x800])) "j.c":152:11 275 {*mvconst_internal}
>      (nil))
> (insn 11 10 0 2 (set (mem:V2048HF (reg/f:DI 141 [ in ]) [1 MEM <vector(2048) _Float16> [(_Float16 *)in_7(D)]+0 S4096 A128])
>         (if_then_else:V2048HF (unspec:V2048BI [
>                     (const_vector:V2048BI [
>                             (const_int 1 [0x1]) repeated x2048
>                         ])
>                     (reg:DI 139)
>                     (const_int 2 [0x2]) repeated x3
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (reg:V2048HF 138 [ _5 ])
>             (unspec:V2048HF [
>                     (reg:DI 0 zero)
>                 ] UNSPEC_VUNDEF))) "j.c":152:11 3843 {*pred_movv2048hf}
>      (expr_list:REG_DEAD (reg/f:DI 141 [ in ])
>         (expr_list:REG_DEAD (reg:DI 0 zero)
>             (expr_list:REG_DEAD (reg:SI 66 vl)
>                 (expr_list:REG_DEAD (reg:SI 67 vtype)
>                     (expr_list:REG_DEAD (reg:V2048HF 138 [ _5 ])
>                         (expr_list:REG_DEAD (reg:DI 139)
>                             (nil))))))))

Note a couple of things.  First, insn 8 will be split shortly after combine and
will need the constant 2048, but that is obviously exposed late.  Second (of
course) is the mvconst_internal pattern at insn 10.  After split1 we'll have:

> (insn 16 5 17 2 (set (reg:DI 144)
>         (const_int 4096 [0x1000])) "j.c":152:11 -1
>      (nil))
> (insn 17 16 18 2 (set (reg:DI 143)
>         (plus:DI (reg:DI 144)
>             (const_int -2048 [0xfffffffffffff800]))) "j.c":152:11 -1
>      (expr_list:REG_EQUAL (const_int 2048 [0x800])
>         (nil)))
> (insn 18 17 19 2 (set (reg:V2048HF 138 [ _5 ])
>         (if_then_else:V2048HF (unspec:V2048BI [
>                     (const_vector:V2048BI [
>                             (const_int 1 [0x1]) repeated x2048
>                         ])
>                     (reg:DI 143)
>                     (const_int 2 [0x2]) repeated x3
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (vec_duplicate:V2048HF (reg:HF 142 [ x ]))
>             (unspec:V2048HF [
>                     (reg:DI 0 zero)
>                 ] UNSPEC_VUNDEF))) "j.c":152:11 -1
>      (nil))
> (insn 19 18 20 2 (set (reg:DI 145)
>         (const_int 4096 [0x1000])) "j.c":152:11 -1
>      (nil))
> (insn 20 19 11 2 (set (reg:DI 139)
>         (plus:DI (reg:DI 145)
>             (const_int -2048 [0xfffffffffffff800]))) "j.c":152:11 -1
>      (expr_list:REG_EQUAL (const_int 2048 [0x800])
>         (nil)))
> (insn 11 20 0 2 (set (mem:V2048HF (reg/f:DI 141 [ in ]) [1 MEM <vector(2048) _Float16> [(_Float16 *)in_7(D)]+0 S4096 A128])
>         (if_then_else:V2048HF (unspec:V2048BI [
>                     (const_vector:V2048BI [
>                             (const_int 1 [0x1]) repeated x2048
>                         ])
>                     (reg:DI 139)
>                     (const_int 2 [0x2]) repeated x3
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (reg:V2048HF 138 [ _5 ])
>             (unspec:V2048HF [
>                     (reg:DI 0 zero)
>                 ] UNSPEC_VUNDEF))) "j.c":152:11 3843 {*pred_movv2048hf}
>      (expr_list:REG_DEAD (reg/f:DI 141 [ in ])
>         (expr_list:REG_DEAD (reg:DI 0 zero)
>             (expr_list:REG_DEAD (reg:SI 66 vl)
>                 (expr_list:REG_DEAD (reg:SI 67 vtype)
>                     (expr_list:REG_DEAD (reg:V2048HF 138 [ _5 ])
>                         (expr_list:REG_DEAD (reg:DI 139)
>                             (nil))))))))

Note the synthesis of 2048 appears twice.  I seriously considered adding a
local cprop pass at this point.  That could be done with a bit of work.  It
didn't look too bad -- the biggest problem is cprop isn't designed to run once
we've left cfglayout.  But we could probably finesse that by not allowing it to
change jumps if we've left cfglayout or converting it to do the more complex
jump fixups.

You might ask why the post-reload optimizers don't help since this at least
looks like a case where they could.  After LRA the RTL looks like:

> (insn 26 5 25 2 (set (reg:DI 15 a5 [144])
>         (const_int 4096 [0x1000])) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 277 {*movdi_64bit}
>      (expr_list:REG_EQUIV (const_int 4096 [0x1000])
>         (nil)))
> (insn 25 26 19 2 (set (reg:DI 15 a5 [143])
>         (plus:DI (reg:DI 15 a5 [144])
>             (const_int -2048 [0xfffffffffffff800]))) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 5 {adddi3}
>      (expr_list:REG_EQUIV (const_int 2048 [0x800])
>         (nil)))
> (insn 19 25 20 2 (set (reg:V2048QI 100 v4 [orig:138 _11 ] [138])
>         (if_then_else:V2048QI (unspec:V2048BI [
>                     (const_vector:V2048BI [
>                             (const_int 1 [0x1]) repeated x2048
>                         ])
>                     (reg:DI 15 a5 [143])
>                     (const_int 2 [0x2]) repeated x3
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (vec_duplicate:V2048QI (reg:QI 12 a2 [145]))
>             (unspec:V2048QI [
>                     (reg:DI 0 zero)
>                 ] UNSPEC_VUNDEF))) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 4172 {*pred_broadcastv2048qi}
>      (nil))
> (insn 20 19 21 2 (set (reg:DI 15 a5 [146])
>         (const_int 4096 [0x1000])) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 277 {*movdi_64bit}
>      (expr_list:REG_EQUIV (const_int 4096 [0x1000])
>         (nil)))
> (insn 21 20 11 2 (set (reg:DI 15 a5 [139])
>         (plus:DI (reg:DI 15 a5 [146])
>             (const_int -2048 [0xfffffffffffff800]))) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 5 {adddi3}
>      (expr_list:REG_EQUIV (const_int 2048 [0x800])
>         (nil)))

Note the re-use of a5 for the constant synthesis steps.  That's going to spoil
any chance of reload_cse saving us.  That re-use also gets in the way of vsetvl
elimination and we ultimately get this code:

> foo10:
>         li      a5,4096
>         addi    a5,a5,-2048
>         vsetvli zero,a5,e16,m8,ta,ma
>         vfmv.v.f        v8,fa0
>         li      a5,4096
>         addi    a5,a5,-2048
>         vsetvli zero,a5,e16,m8,ta,ma
>         vse16.v v8,0(a0)
>         ret

The regression is we have the obviously redundant vsetvl.  The additional copy
of the synthesis is undesirable as well.

If we filter out single bit constants from mvconst_internal we trivially fix
that regression.  The only fallout is a class of saturation tests which want to
test against 0x80000000.   Under the hood this is a minor codegen issue
interacting badly with combine's deliberate rejection of simplification of
extensions of constants.  Rather than constructing the SImode constant and then
zero extending the result, we can just generate the constant we actually want
directly in DImode.
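
As a rough illustration of the two ideas in this paragraph, here is a small
sketch in plain C++.  The function names are hypothetical and the predicate is
an assumption about what "single bit constant" means; the condition actually
used in riscv.md is not shown in this message.

```cpp
#include <cstdint>

// Hypothetical predicate: true when exactly one bit of `value` is set,
// e.g. 2048 (0x800) or 0x80000000.
constexpr bool is_single_bit(std::uint64_t value)
{
  return value != 0 && (value & (value - 1)) == 0;
}

// Build 0x80000000 as a 32-bit value, then zero extend it to 64 bits...
constexpr std::uint64_t via_simode_then_zext()
{
  return static_cast<std::uint64_t>(UINT32_C(1) << 31);
}

// ...or generate the same value directly as a 64-bit constant.
constexpr std::uint64_t directly_in_dimode()
{
  return UINT64_C(1) << 31;
}

static_assert(is_single_bit(2048), "2048 has one bit set");
static_assert(is_single_bit(0x80000000u), "0x80000000 has one bit set");
static_assert(!is_single_bit(4096 - 2048 - 1), "2047 has several bits set");
static_assert(via_simode_then_zext() == directly_in_dimode(),
              "both routes produce 0x80000000");
```

Both routes yield the same 64-bit value, which is why generating the constant
directly in the wider mode loses nothing while avoiding the extension of a
constant.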

The net is we fix the regression, don't introduce any obvious new regressions
and slightly reduce our dependence on mvconst_internal.  All good in my book.
Obviously I'll wait for pre-commit CI to render a verdict.

PR target/116256
gcc/
* config/riscv/riscv.md (mvconst_internal): Reject single bit
constants.
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Improve
handling of constants.

c: Diagnose ,) at the end of OpenMP clauses [PR118639]

This is something the C++ FE has been diagnosing, but the C FE only
complained if there wasn't an identifier right after the opening (.

2025-01-25 Jakub Jelinek <jakub@redhat.com>

PR c/118639
* c-parser.cc (c_parser_omp_variable_list): Remove first variable
and emit "expected identifier" error regardless of it.

* c-c++-common/gomp/pr118639.c: New test.
* c-c++-common/goacc/cache-2.c: Remove one xfail for c.

c++: Only destruct elts of array for new expression if exception is thrown during the initialization [PR117827]

The following testcase misbehaves since r12-6328, because the elements of the
array are destructed twice, once when the callee encounters delete[] p;
and then a second time when the exception is thrown.
The array elts should only be destructed if an exception is thrown from
one of the constructors during the build_vec_init emitted code in case of
new expressions, but when the new expression completes, it is IMO the
responsibility of user code to delete[] it when it is no longer needed.

So, the following patch uses the cleanup_flags argument to build_vec_init
to get notified of the flags that need to be changed when the expression
is complete and build_disable_temp_cleanup to do the changes.
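
The intended semantics can be observed from user code with a small sketch like
the one below.  It is not the testcase from the PR; the type Elt and both
helper functions are hypothetical and only count how many elements are alive.

```cpp
#include <cassert>

// Hypothetical element type that counts live objects and can be told to
// throw from its Nth constructor call (throw_at == 0 means never).
struct Elt {
  static int live;
  static int ctor_calls;
  static int throw_at;
  Elt() { if (++ctor_calls == throw_at) throw 1; ++live; }
  ~Elt() { --live; }
};
int Elt::live = 0;
int Elt::ctor_calls = 0;
int Elt::throw_at = 0;

// Case 1: a constructor throws while new[] is still initializing the
// array.  The elements built so far are destroyed by the new-expression.
int live_after_failed_new()
{
  Elt::live = 0; Elt::ctor_calls = 0; Elt::throw_at = 3;
  try { Elt *p = new Elt[4]; delete[] p; } catch (int) {}
  return Elt::live;
}

// Case 2: new[] completes and an unrelated exception is thrown later.
// The array must stay alive until user code calls delete[].
int live_after_later_throw()
{
  Elt::live = 0; Elt::ctor_calls = 0; Elt::throw_at = 0;
  Elt *p = new Elt[4];
  try { throw 1; } catch (int) {}
  const int live = Elt::live;
  delete[] p;
  return live;
}
```

In the first case no element should remain alive; in the second, all four
should still be alive at the point the count is taken, and delete[] then
destroys each of them exactly once.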

2025-01-25 Jakub Jelinek <jakub@redhat.com>

PR c++/117827
* init.cc (build_new_1): Pass address of a make_tree_vector ()
initialized gc tree vector to build_vec_init and append
build_disable_temp_cleanup to init_expr from it.

* g++.dg/init/array66.C: New test.

PR modula2/118010 m2 libc lseek procedure interface correction

This patch corrects a typo in the definition of lseek in libc.
The second offset parameter should have been declared as COFF_T.
No errors are seen when bootstrapping using -Werror=odr
-Werror=lto-type-mismatch.

gcc/m2/ChangeLog:

PR modula2/118010
* gm2-compiler/P2SymBuild.mod (Debug): Comment out unused
procedure.
* gm2-libs/libc.def (lseek): Declare second parameter offset
as COFF_T.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++/modules: Treat unattached lambdas as TU-local [PR116568]

This fixes ICEs where unattached lambdas at class scope (for instance,
in member template instantiations) are streamed. This is only possible
in header units, as in named modules attempting to stream such lambdas
will be an error.

PR c++/116568

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind): Treat all lambdas
without a mangling scope as un-mergeable.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-8.h: New test.
* g++.dg/modules/lambda-8_a.H: New test.
* g++.dg/modules/lambda-8_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Diagnose TU-local lambdas, give mangling scope to lambdas in concepts

This fills in a hole left in r15-6378-g9016c5ac94c557 with regard to
detection of TU-local lambdas. Now that LAMBDA_EXPR_EXTRA_SCOPE is
properly set for most lambdas we can use it to detect lambdas that are
TU-local.

CWG2988 suggests that lambdas in concept definitions should not be
considered TU-local, since they are always unevaluated and should never
be emitted. This patch gives these lambdas a mangling scope (though it
will never be actually used in name mangling).

PR c++/116568

gcc/cp/ChangeLog:

* cp-tree.h (finish_concept_definition): Adjust parameters.
(start_concept_definition): Declare.
* module.cc (depset::hash::is_tu_local_entity): Use
LAMBDA_EXPR_EXTRA_SCOPE to detect TU-local lambdas.
* parser.cc (cp_parser_concept_definition): Start a lambda scope
for concept definitions.
* pt.cc (tsubst_lambda_expr): Namespace-scope lambdas may now
have extra scope.
(finish_concept_definition): Split into...
(start_concept_definition): ...this new function.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-4_b.C: Remove XFAIL, add lambda alias
testcase.
* g++.dg/modules/lambda-9.h: New test.
* g++.dg/modules/lambda-9_a.H: New test.
* g++.dg/modules/lambda-9_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Fix mangling of otherwise unattached class-scope lambdas [PR118245]

This is a step closer to implementing the suggested changes for
https://github.com/itanium-cxx-abi/cxx-abi/pull/85.  Most lambdas
defined within a class should have an extra scope of that class so that
uses across different TUs are properly merged by the linker.  This also
needs to happen during template instantiation.

While I was working on this I found some other cases where the mangling
of lambdas was incorrect and causing issues, notably the testcase
lambda-ctx3.C which currently emits the same mangling for the base class
and member lambdas, causing mysterious assembler errors since r14-9232.

One notable case not handled either here or in the ABI is what is
supposed to happen with such unattached lambdas declared in member
templates; see lambda-uneval22.  I believe that by the C++ standard,
such lambdas should also dedup across TUs, but this isn't currently
implemented, and it's not clear exactly how such lambdas should mangle.

Since this should only affect usage of lambdas in unevaluated contexts
(a C++20 feature) this patch does not add an ABI flag to control this
behaviour.

PR c++/118245

gcc/cp/ChangeLog:

* cp-tree.h (LAMBDA_EXPR_EXTRA_SCOPE): Adjust comment.
* parser.cc (cp_parser_class_head): Start (and do not finish)
lambda scope for all valid types.
(cp_parser_class_specifier): Finish lambda scope after parsing
members instead.
* pt.cc (instantiate_class_template): Add lambda scoping.

gcc/testsuite/ChangeLog:

* g++.dg/abi/lambda-ctx3.C: New test.
* g++.dg/cpp2a/lambda-uneval22.C: New test.
* g++.dg/cpp2a/lambda-uneval23.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

PR modula2/118589 Opaque type fields are visible outside implementation module

This patch fixes a bug shown when a variable declared as an opaque type is
dereferenced outside the declaration module.  The fix also improves error
recovery.  In the error cases it ensures that an error symbol is created
and the appropriate virtual token is assigned.  Finally there is a new
testsuite directory gm2.dg which contains tests to check against expected
error messages.

gcc/m2/ChangeLog:

PR modula2/118589
* gm2-compiler/M2MetaError.mod (symDesc): Add opaque type
description.
* gm2-compiler/M2Quads.mod (BuildDesignatorPointerError): New
procedure.
(BuildDesignatorPointer): Reimplement.
* gm2-compiler/P3Build.bnf (SubDesignator): Tidy up error message.
Use MetaErrorT2 rather than WriteFormat1 and use the token pos from
the quad stack.

gcc/testsuite/ChangeLog:

PR modula2/118589
* lib/gm2-dg.exp (gm2.exp): load_lib.
* gm2.dg/pim/fail/badopaque.mod: New test.
* gm2.dg/pim/fail/badopaque2.mod: New test.
* gm2.dg/pim/fail/dg-pim-fail.exp: New test.
* gm2.dg/pim/fail/opaquedefs.def: New test.
* gm2.dg/pim/fail/opaquedefs.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

aarch64: Add +cpa feature flag

This doesn't enable anything within the compiler, but it allows the
flag to be passed to the assembler. There also doesn't appear to be a
kernel cpuinfo name yet.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V9_5A): Add CPA.
* config/aarch64/aarch64-option-extensions.def (CPA): New.
* doc/invoke.texi: Document +cpa.

docs: Add +wfxt and +xs to armv9.2-a

I missed that the documentation doesn't include armv8.7-a
within armv9.2-a.

gcc/ChangeLog:

* doc/invoke.texi: Add +wfxt and +xs to armv9.2-a

aarch64: Add command line support for armv9.5-a

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V9_5A): New.
* doc/invoke.texi: Document armv9.5-a option.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/armv9p5.c: New test.

aarch64: Make AARCH64_FL_CRYPTO always unset

This feature flag bit only exists to support the +crypto alias. Outside
of option processing this bit needs to be set or unset consistently.
This patch goes with the latter option.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc: Assert that CRYPTO
bit is not set.
* config/aarch64/aarch64-feature-deps.h
(info<FEAT>.explicit_on): Unset CRYPTO bit.
(cpu_##CORE_IDENT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/crypto-alias-1.c: New test.

aarch64: Refactor aarch64_rewrite_mcpu

Use aarch64_validate_cpu instead of the existing duplicate (and worse)
version of the -mcpu parsing code.

The original code used fatal_error; I'm guessing that using error
instead should be ok.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_rewrite_selected_cpu): Refactor and inline into...
(aarch64_rewrite_mcpu): ...this.
* config/aarch64/aarch64-protos.h
(aarch64_rewrite_selected_cpu): Delete.

aarch64: Rewrite architecture strings for assembler

Add infrastructure to allow rewriting the architecture strings passed to
the assembler (either as -march options or .arch directives). There was
already canonicalisation everywhere except for an -march driver option
passed directly to the compiler; this patch applies the same
canonicalisation there as well.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_arch_string_for_assembler): New.
(aarch64_rewrite_march): New.
(aarch64_rewrite_selected_cpu): Call new function.
* config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
* config/aarch64/aarch64-protos.h
(aarch64_get_arch_string_for_assembler): New.
* config/aarch64/aarch64.cc
(aarch64_declare_function_name): Call new function.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h
(EXTRA_SPEC_FUNCTIONS): Use new macro name.
(MCPU_TO_MARCH_SPEC): Rename to...
(MARCH_REWRITE_SPEC): ...this, and extend the spec rule.
(aarch64_rewrite_march): New declaration.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
(AARCH64_BASE_SPEC_FUNCTIONS): ...this, and add new function.
(ASM_CPU_SPEC): Use new macro name.

aarch64: Inline aarch64_get_all_extension_candidates

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_all_extension_candidates): Inline into...
(aarch64_print_hint_for_extensions): ...this.

aarch64: Move arch/cpu parsing to aarch64-common.cc

Aside from moving the functions, the only changes are to make them
non-static, and to use the existing info arrays within aarch64-common.cc
instead of the info arrays remaining in aarch64.cc.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_all_extension_candidates): Move within file.
(aarch64_print_hint_candidates): Move from aarch64.cc.
(aarch64_print_hint_for_extensions): Ditto.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_core): Ditto.
(enum aarch_parse_opt_result): Ditto.
(aarch64_parse_arch): Ditto.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_mtune): Ditto.
* config/aarch64/aarch64-protos.h
(aarch64_rewrite_selected_cpu): Move within file.
(aarch64_print_hint_for_extensions): Share function prototype.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_core): Ditto.
(enum aarch_parse_opt_result): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_mtune): Ditto.
(aarch64_get_all_extension_candidates): Unshare prototype.
* config/aarch64/aarch64.cc
(aarch64_parse_arch): Move to aarch64-common.cc.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_print_hint_candidates): Ditto.
(aarch64_print_hint_for_core): Ditto.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_extensions): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mtune): Ditto.

aarch64: Inline aarch64_print_hint_for_core_or_arch

It seems odd that we add "native" to the list for -march but not for
-mcpu. This is probably a bug, but for now we'll preserve the existing
behaviour.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_print_hint_candidates): New helper function.
(aarch64_print_hint_for_core_or_arch): Inline into callers.
(aarch64_print_hint_for_core): Inline callee and use new helper.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_extensions): Use new helper.

aarch64: Adjust option parsing parameter types.

Replace `const struct processor *` in output parameters with
`aarch64_arch` or `aarch64_cpu`.

Replace `std::string` parameter in aarch64_print_hint_for_extensions with
`char *`.

Also name the return parameters more clearly and consistently.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_print_hint_for_extensions): Receive string as a char *.
(aarch64_parse_arch): Don't return a const struct processor *.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_validate_mtune): Ditto.
(aarch64_validate_mcpu): Ditto, and use temporary variables for
march/mcpu cross-check.
(aarch64_validate_march): Ditto.
(aarch64_override_options): Adjust for changed parameter types.
(aarch64_handle_attr_arch): Ditto.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_tune): Ditto.

aarch64: Rename info structs in aarch64-common.cc

Also add a (currently unused) processor field to aarch64_processor_info,
and change name from "" to NULL for the terminating array entries.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_option_extension): Rename to...
(struct aarch64_extension_info): ...this.
(all_extensions): Update type name.
(struct arch_to_arch_name): Rename to...
(struct aarch64_arch_info): ...this, and rename name field.
(all_architectures): Update type names, and move before...
(struct processor_name_to_arch): ...this. Rename to...
(struct aarch64_processor_info): ...this, rename name field and
add cpu field.
(all_cores): Update type name, and set new field.
(aarch64_parse_extension): Update names.
(aarch64_get_all_extension_candidates): Ditto.
(aarch64_rewrite_selected_cpu): Ditto.

aarch64: Remove redundant generic cpu entry

The list of cores in aarch64-common.cc included an explicit "generic"
entry, despite this entry also being present in aarch64-cores.def.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(all_cores): Remove explicit generic entry.

aarch64: Replace duplicate cpu enums

Replace `enum aarch64_processor` and `enum target_cpus` with
`enum aarch64_cpu`, and prefix the entries with `AARCH64_CPU_`.
Also rename aarch64_none to aarch64_no_cpu.

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h
(enum aarch64_processor): Rename to...
(enum aarch64_cpu): ...this, and rename the entries.
* config/aarch64/aarch64.cc
(aarch64_type): Rename type and initial value.
(struct processor): Rename member types.
(all_architectures): Rename enum members.
(all_cores): Ditto.
(aarch64_get_tune_cpu): Rename type and enum member.
* config/aarch64/aarch64.h (enum target_cpus): Remove.
(TARGET_CPU_DEFAULT): Rename default value.
(aarch64_tune): Rename type.
* config/aarch64/aarch64.opt:
(selected_tune): Rename type and default value.

aarch64: Improve mcpu/march conflict check

Features from a cpu or base architecture that were explicitly disabled
by a +nofeat option were being incorrectly added back in before checking
for conflicts between -mcpu and -march options. This patch instead
compares the returned feature masks directly.
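
The difference between the two checks can be sketched as follows. This is only an illustration: the feature bits, `parse_result`, `conflict_old` and `conflict_new` are hypothetical names, not the ones used in aarch64.cc, and the "old" check is modelled purely from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; the real masks live in the aarch64 back end.
using feature_mask = std::uint64_t;
constexpr feature_mask FEAT_FP   = 1u << 0;
constexpr feature_mask FEAT_SIMD = 1u << 1;
constexpr feature_mask FEAT_CRC  = 1u << 2;

// Result of parsing an -mcpu or -march string: the base features plus any
// "+feat" modifiers, minus any "+nofeat" modifiers.
constexpr feature_mask
parse_result (feature_mask base, feature_mask on, feature_mask off)
{
  return (base | on) & ~off;
}

// Check as described for the old code (an assumption): the base features
// are OR-ed back in, which hides an explicit "+nofeat".
constexpr bool
conflict_old (feature_mask cpu_flags, feature_mask cpu_base,
              feature_mask arch_flags, feature_mask arch_base)
{
  return (cpu_flags | cpu_base) != (arch_flags | arch_base);
}

// Check as described for the new code: compare the parsed masks directly.
constexpr bool
conflict_new (feature_mask cpu_flags, feature_mask arch_flags)
{
  return cpu_flags != arch_flags;
}

void
demo_conflict_check ()
{
  const feature_mask base = FEAT_FP | FEAT_SIMD | FEAT_CRC;
  const feature_mask cpu_flags = parse_result (base, 0, FEAT_CRC); // "+nocrc"
  const feature_mask arch_flags = parse_result (base, 0, 0);
  assert (!conflict_old (cpu_flags, base, arch_flags, base)); // mismatch missed
  assert (conflict_new (cpu_flags, arch_flags));              // mismatch seen
}
```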

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Compare
returned feature masks directly.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/target_attr_crypto_ice_1.c: Prune warning.
* gcc.target/aarch64/target_attr_crypto_ice_2.c: Ditto.

[PR118497][IRA]: Fix calculation of cost of assigning callee-saved hard reg

  Assembler code generated by GCC for PR118497 contains an unnecessary
move insn.  This happened because IRA assigns the AX reg to a pseudo
which should be in the BX reg later for a call.  The pseudo did not get
BX because LRA decided that BX would need to be saved, although BX will
be saved anyway.  The patch fixes the cost calculation.  Using hard reg
nrefs from regstat or DF would result in numerous failures, as such
nrefs include artificial reg refs.  Therefore we add a calculation of
hard reg nrefs in IRA.  We also change the regexp used for scanning the
assembler in test vartrack-1.c, as with the patch LRA assigns the
callee-saved hard reg BP instead of another callee-saved hard reg, BX,
expected by the test.

gcc/ChangeLog:

PR target/118497
* ira-int.h (target_ira_int): Add x_ira_hard_regno_nrefs.
(ira_hard_regno_nrefs): New macro.
* ira.cc (setup_hard_regno_aclass): Remove unused code.  Modify
the comment.
(setup_hard_regno_nrefs): New function.
(ira): Call it.
* ira-color.cc (calculate_saved_nregs): Check
ira_hard_regno_nrefs.

gcc/testsuite/ChangeLog:

PR target/118497
* gcc.target/i386/pr118497.c: New.
* gcc.target/i386/vartrack-1.c: Modify the regexp.

c++: ICE with nested anonymous union [PR117153]

In a template, for

  union {
    union {
      T d;
    };
  };

build_anon_union_vars creates a malformed COMPONENT_REF: we have no
DECL_NAME for the nested anon union so we create something like "object.".
Most of the front end doesn't seem to care, but if such a tree gets to
potential_constant_expression, it can cause a crash.

We can use FIELD directly for the COMPONENT_REF's member.  tsubst_stmt
should build up a proper one in:

    if (VAR_P (decl) && !DECL_NAME (decl)
        && ANON_AGGR_TYPE_P (TREE_TYPE (decl)))
      /* Anonymous aggregates are a special case.  */
      finish_anon_union (decl);

PR c++/117153

gcc/cp/ChangeLog:

* decl2.cc (build_anon_union_vars): Use FIELD for the second operand
of a COMPONENT_REF.

gcc/testsuite/ChangeLog:

* g++.dg/other/anon-union6.C: New test.
* g++.dg/other/anon-union7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

testsuite: arm: Use -std=c17 for gcc.target/arm/thumb-bitfld1.c

Using -std=c17 avoids excess errors like:
.../thumb-bitfld1.c:15:1: warning: old-style function definition [-Wold-style-definition]

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb-bitfld1.c: Use -std=c17.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Use -Os -fno-math-errno in vfp-1.c [PR116448]

gcc/testsuite/ChangeLog:

PR testsuite/116448
* gcc.target/arm/vfp-1.c: Use -Os -fno-math-errno.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Fortran: Fix UTF-8 output with A edit descriptor.

This adjusts the source len for the case where the user has
specified a field width with the A descriptor.

PR libfortran/118571

libgfortran/ChangeLog:

* io/write.c (write_utf8_char4): Adjust the src_len to the
format width w_len when greater than zero.

gcc/testsuite/ChangeLog:

* gfortran.dg/utf8_3.f03: New test.

c++/modules: Fix linkage checks for exported using-decls

This patch attempts to fix an error when building module std. The reason
for the error is that __builtin_va_list (aka struct __va_list) has internal
linkage, so mark this builtin type as TREE_PUBLIC to give struct __va_list
external linkage.

g++ -fmodules -std=c++23 -fsearch-include-path bits/std.cc -c
std.cc:3642:14:error: exporting ‘typedef __gnuc_va_list va_list’ that does not have external linkage
3642 | using std::va_list;
| ^~~~~~~
<built-in>: note: ‘struct __va_list’ declared here with internal linkage

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): Mark
__builtin_va_list as TREE_PUBLIC.
* config/arm/arm.cc (arm_build_builtin_va_list): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/builtin-8.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

jit: fix for write_reproducer [PR117886]

The original generated .c reproducer for PR jit/117886 did not compile,
with:

main.c: In function ‘create_code’:
main.c:600:9: error: initialization of ‘gcc_jit_rvalue *’ from incompatible pointer type ‘gcc_jit_lvalue *’ [-Wincompatible-pointer-types]
600 | local__1,
| ^~~~~~~~

The issue is that recording::ctor::write_reproducer was missing
creation of casts to gcc_jit_rvalue * for
gcc_jit_context_new_array_constructor and
gcc_jit_context_new_struct_constructor.

Fixed thusly.

gcc/jit/ChangeLog:
PR jit/117886
* jit-recording.cc (reproducer::get_identifier_as_rvalue): Handle
null memento.
(reproducer::get_identifier_as_lvalue): Likewise.
(reproducer::get_identifier_as_type): Likewise.
(recording::ctor::write_reproducer): Use get_identifier_as_rvalue
rather than get_identifier when writing out gcc_jit_rvalue *
expressions.

gcc/testsuite/ChangeLog:
PR jit/117886
* jit.dg/all-non-failing-tests.h: Add
test-pr117886-write-reproducer.c.
* jit.dg/test-pr117886-write-reproducer.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif-replay: respect prefix and suffix during installation [PR117670]

gcc/ChangeLog:
PR sarif-replay/117670
* Makefile.in (SARIF_REPLAY_INSTALL_NAME): New.
(install-libgdiagnostics): Use it, and exeext, rather than just
sarif-replay.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

tree-optimization/116010 - dr_may_alias regression

r15-491-gc290e6a0b7a9de fixed a latent issue with dr_analyze_innermost
and dr_may_alias where not properly analyzed DRs would yield an invalid
answer. This caused some missed optimizations in case there is not
actually any evolution in the not analyzed base part. The following
recovers this by only handling base parts which reference SSA vars
as index in the conservative way.

The gfortran.dg/vect/vect-8.f90 testcase is difficult to deal with,
so the following merely bumps the maximum number of expected vectorized loops
for both aarch64 and x86-64.

PR tree-optimization/116010
* tree-data-ref.cc (contains_ssa_ref_p_1): New function.
(contains_ssa_ref_p): Likewise.
(dr_may_alias_p): Avoid treating unanalyzed base parts without
SSA reference conservatively.

* gfortran.dg/vect/vect-8.f90: Adjust.

s390: Implement isfinite and isnormal optabs

Merge new optabs with the existing implementations for signbit and
isinf.

gcc/ChangeLog:

* config/s390/s390.h (S390_TDC_POSITIVE_ZERO): Remove.
(S390_TDC_NEGATIVE_ZERO): Remove.
(S390_TDC_POSITIVE_NORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_NORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_POSITIVE_DENORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_DENORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_POSITIVE_INFINITY): Remove.
(S390_TDC_NEGATIVE_INFINITY): Remove.
(S390_TDC_POSITIVE_QUIET_NAN): Remove.
(S390_TDC_NEGATIVE_QUIET_NAN): Remove.
(S390_TDC_POSITIVE_SIGNALING_NAN): Remove.
(S390_TDC_NEGATIVE_SIGNALING_NAN): Remove.
(S390_TDC_POSITIVE_DENORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_DENORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_POSITIVE_NORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_NORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_SIGNBIT_SET): Remove.
(S390_TDC_INFINITY): Remove.
* config/s390/s390.md (signbit<mode>2<tf_fpr>): Merge this one
(isinf<mode>2<tf_fpr>): and this one into
(<TDC_CLASS:tdc_insn><mode>2<tf_fpr>): new expander.
(isnormal<mode>2<tf_fpr>): New BFP expander.
(isnormal<mode>2): New DFP expander.
* config/s390/vector.md (signbittf2_vr): Merge this one
(isinftf2_vr): and this one into
(<tdc_insn>tf2_vr): new expander.
(signbittf2): Merge this one
(isinftf2): and this one into
(<tdc_insn>tf2): new expander.

gcc/testsuite/ChangeLog:

* gcc.target/s390/isfinite-isinf-isnormal-signbit-1.c: New test.
* gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: New test.
* gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: New test.
* gcc.target/s390/isfinite-isinf-isnormal-signbit.h: New test.

tree-optimization/118634 - improve cunroll dump

We no longer subtract the estimated number of eliminated instructions
from the estimated size after unrolling that we print; this is a bit
confusing when comparing dumps to previous releases.  The following
changes the dump from

  Estimated size after unrolling: 42

to

  Estimated size after unrolling: 42-12

for the testcase in the PR.

PR tree-optimization/118634
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely):
Dump the number of estimated eliminated insns.

Fix command flags for SVE2 faminmax

Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was
incorrect and this patch changes it so that it is gated behind
sve2+faminmax.

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md:
(*aarch64_pred_faminmax_fused): Fix to use the correct flags.
* config/aarch64/aarch64.h
(TARGET_SVE_FAMINMAX): Remove.
* config/aarch64/iterators.md: Fix iterators so that famax and
famin use correct flags.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the
correct flags.
* gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the
correct flags.
* gcc.target/aarch64/sve/faminmax_3.c: New test.

[ifcombine] check for more zero-extension cases [PR118572]

When comparing a signed narrow variable with a wider constant that has
the bit corresponding to the variable's sign bit set, we would check
that the constant is a sign-extension from that sign bit, and conclude
that the compare fails if it isn't.

When the signed variable is masked without getting the [lr]l_signbit
variable set, or when the sign bit itself is masked out, we know the
sign-extension bits from the extended variable are going to be zero,
so the constant will only compare equal if it is a zero- rather than
sign-extension from the narrow variable's precision, therefore, check
that it satisfies this property, and yield a false compare result
otherwise.
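
A small source-level illustration of the property being checked, under the assumption of an 8-bit signed variable widened to 32 bits (`masked_equals` and `zero_extends_from_8` are hypothetical helper names, not part of gimple-fold.cc):

```cpp
#include <cassert>
#include <cstdint>

// Compare a masked signed 8-bit value, widened to 32 bits, with a constant.
// The mask clears every bit above bit 7, so the widened value can never
// carry sign-extension bits.
bool
masked_equals (std::int8_t x, std::int32_t c)
{
  return (static_cast<std::int32_t> (x) & 0xff) == c;
}

// True if C is a zero-extension from 8 bits, i.e. all upper bits are clear.
bool
zero_extends_from_8 (std::int32_t c)
{
  return (c & ~0xff) == 0;
}

void
demo_masked_compare ()
{
  // 0x80 has the narrow sign bit set but is zero-extended: it can match.
  assert (zero_extends_from_8 (0x80));
  assert (masked_equals (-128, 0x80));
  // -128 is the sign-extension of 0x80: it can never match a masked value.
  assert (!zero_extends_from_8 (-128));
  assert (!masked_equals (-128, -128));
}
```

For a constant that is not a zero-extension, the compare is false for every possible value of the narrow variable, which is the "false compare result" the message refers to.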

for gcc/ChangeLog

PR tree-optimization/118572
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
unsigned the variables whose extension bits are masked out.

for gcc/testsuite/ChangeLog

PR tree-optimization/118572
* gcc.dg/field-merge-24.c: New.

[ifcombine] improve reverse checking and operand swapping

Don't reject an ifcombine field-merging opportunity just because the
left-hand operands aren't both reversed, if the second compare needs
to be swapped for operands to match.

Also mention that reversep does NOT affect the turning of range tests
into bit tests.

for gcc/ChangeLog

* gimple-fold.cc (fold_truth_andor_for_ifcombine): Document
reversep's absence of effects on range tests. Don't reject
reversep mismatches before trying compare swapping.

[ifcombine] out-of-bounds bitfield refs can trap [PR118514]

Check that BIT_FIELD_REFs of DECLs are in range before deciding they
don't trap.

Check that a replacement bitfield load is as trapping as the replaced
load.
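
The range check amounts to something like the sketch below. `bit_range_in_bounds` is a hypothetical stand-alone helper written for illustration; it is not the signature of the new bit_field_ref_in_bounds_p, which works on trees.

```cpp
#include <cassert>
#include <cstdint>

// True if the bit range [bit_offset, bit_offset + bit_size) lies entirely
// inside an object that is decl_bits bits wide.  Written so that the
// comparison cannot overflow.
bool
bit_range_in_bounds (std::uint64_t bit_offset, std::uint64_t bit_size,
                     std::uint64_t decl_bits)
{
  return bit_size <= decl_bits && bit_offset <= decl_bits - bit_size;
}

void
demo_bit_range ()
{
  assert (bit_range_in_bounds (0, 32, 32));   // whole object
  assert (bit_range_in_bounds (24, 8, 32));   // last byte
  assert (!bit_range_in_bounds (28, 8, 32));  // runs past the end: may trap
}
```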

for gcc/ChangeLog

PR tree-optimization/118514
* tree-eh.cc (bit_field_ref_in_bounds_p): New.
(tree_could_trap_p) <BIT_FIELD_REF>: Call it.
* gimple-fold.cc (make_bit_field_load): Check trapping status
of replacement load against original load.

for gcc/testsuite/ChangeLog

PR tree-optimization/118514
* gcc.dg/field-merge-23.c: New.

Daily bump.

c++: bogus error with nested lambdas [PR117602]

The error here should also check that we aren't nested in another
lambda; in it, at_function_scope_p() will be false.

PR c++/117602

gcc/cp/ChangeLog:

* cp-tree.h (current_nonlambda_scope): Add a default argument.
* lambda.cc (current_nonlambda_scope): New bool parameter. Use it.
* parser.cc (cp_parser_lambda_introducer): Use current_nonlambda_scope
to check if the lambda is non-local.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval21.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Small make_tree_vector_from_ctor improvement

After committing the append_ctor_to_tree_vector patch, I've realized
that for the larger constructors make_tree_vector_from_ctor unnecessarily
wastes one GC vector; make_tree_vector () / release_tree_vector () only
caches GC vectors from 4 to 16 allocated tree elements, so in the likely
case of a rather small ctor using make_tree_vector () can be beneficial,
we can pick something from the cache and if we don't need it later,
pt.cc calls release_tree_vector on it to return it back to the cache.
But for the larger ctors, we just eat one vector from the cache, never
use it (because the vec_safe_reserve will immediately allocate a different
vector) and never return it back to the cache.

So, the following patch passes NULL for the larger vectors, which
append_ctor_to_tree_vector handles just fine now (vec_safe_reserve will
just allocate appropriately sized vector).
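
The caching behaviour described above can be modelled roughly as follows. This is a simplified stand-in, not GCC code: `vec_cache`, `tree_vec` and `vector_for_ctor` are hypothetical names, and `std::vector` replaces the GC-allocated tree vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using tree_vec = std::vector<int>;  // stand-in for a GC tree vector

// Toy cache that, like the one described above, only keeps vectors whose
// allocated size is between 4 and 16 elements.
class vec_cache
{
  std::vector<std::unique_ptr<tree_vec>> free_list_;

public:
  std::unique_ptr<tree_vec> take ()
  {
    if (free_list_.empty ())
      {
        auto v = std::make_unique<tree_vec> ();
        v->reserve (4);
        return v;
      }
    auto v = std::move (free_list_.back ());
    free_list_.pop_back ();
    return v;
  }

  void release (std::unique_ptr<tree_vec> v)
  {
    const std::size_t cap = v->capacity ();
    if (cap >= 4 && cap <= 16)
      {
        v->clear ();
        free_list_.push_back (std::move (v));
      }
    // Otherwise the vector is simply dropped.
  }

  std::size_t cached () const { return free_list_.size (); }
};

// Only touch the cache for small constructors; large ones get a fresh,
// appropriately sized vector and leave the cache alone.
std::unique_ptr<tree_vec>
vector_for_ctor (vec_cache &cache, std::size_t nelts)
{
  if (nelts <= 16)
    return cache.take ();
  auto v = std::make_unique<tree_vec> ();
  v->reserve (nelts);
  return v;
}
```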

2025-01-23 Jakub Jelinek <jakub@redhat.com>

* c-common.cc (make_tree_vector_from_ctor): Only use make_tree_vector
for ctors with <= 16 elements.

hppa: Fix typo in ADDITIONAL_REGISTER_NAMES in pa32-regs.h

2025-01-23 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa32-regs.h (ADDITIONAL_REGISTER_NAMES): Change
register 86 name to "%fr31L".

vect: Avoid copying of uninitialized variable [PR118628]

vectorizable_{store,load} does roughly
      tree offvar;
      tree running_off;
      if (!costing_p)
        {
          ... initialize offvar ...
        }
      running_off = offvar;
      for (...)
        {
          if (costing_p)
            {
              ...
              continue;
            }
          ... use running_off ...
        }
so it unconditionally copies a variable that is sometimes uninitialized
(but then uses the copy only if it was set to something initialized).
Still, I think it is better to avoid copying around maybe-uninitialized
vars.

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118628
* tree-vect-stmts.cc (vectorizable_store, vectorizable_load):
Initialize offvar to NULL_TREE.

Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613]

PR fortran/118613

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust algorithm
for inlined version of MINLOC and MAXLOC so that arguments are only
evaluated once, and create temporaries where necessary. Document
change of algorithm.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxval_arg_eval_count.f90: New test.

AVR: PR118012 - Try to work around sick code from match.pd.

This patch tries to work around PR118012 which may use a
full fledged multiplication instead of a simple bit test.
This is because match.pd's

/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */

"optimizes" code with op in { plus, ior, xor } like

  if (a & 1)
    b = b <op> c;

to something like:

  x1 = EXTRACT_BIT0 (a);
  x2 = c MULT x1;
  b = b <op> x2;

or

  x1 = EXTRACT_BIT0 (a);
  x2 = ZERO_EXTEND (x1);
  x3 = NEG x2;
  x4 = c AND x3;
  b = b <op> x4;

which is very expensive and may even result in a libgcc call for
a 32-bit multiplication on devices that don't even have MUL.
Notice that EXTRACT_BIT0 is already more expensive (slower, more
code, more register pressure) than a bit-test + branch.
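
For reference, the three shapes compute the same value. The sketch below spells them out for op = plus on 32-bit operands; the function names are made up for this illustration, and the mask form applies the all-ones/all-zeros mask to c.

```cpp
#include <cassert>
#include <cstdint>

// Source form: bit test + conditional operation.
std::uint32_t
branch_form (std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
  if (a & 1)
    b = b + c;
  return b;
}

// First rewritten form: multiply c by the extracted bit.
std::uint32_t
mult_form (std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
  const std::uint32_t x1 = a & 1;  // EXTRACT_BIT0
  const std::uint32_t x2 = c * x1; // MULT
  return b + x2;
}

// Second rewritten form: negate the bit to get a mask of all ones or zero.
std::uint32_t
mask_form (std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
  const std::uint32_t x1 = a & 1;   // EXTRACT_BIT0
  const std::uint32_t x2 = x1;      // ZERO_EXTEND
  const std::uint32_t x3 = 0u - x2; // NEG
  const std::uint32_t x4 = c & x3;  // AND
  return b + x4;
}

void
demo_equivalence ()
{
  for (std::uint32_t a = 0; a < 4; ++a)
    {
      assert (branch_form (a, 10, 7) == mult_form (a, 10, 7));
      assert (branch_form (a, 10, 7) == mask_form (a, 10, 7));
    }
}
```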

The patch:

o Adds some combiner patterns that try to map sick code back
  to a bit test + branch.

o Adjusts costs to make MULT (x AND 1) cheap, in the hope that the
  middle-end will use that alternative (which we map to sane code).

o On devices without MUL, 32-bit multiplication was performed by a
  library call, which bypasses the MULT (x AND 1) and similar patterns.
  Therefore, mulsi3 is also allowed for devices without MUL so that
  we get a MULT pattern that can be transformed.  (Though this is
  not possible on AVR_TINY since it passes arguments on the stack).

o Adds a new command line option -mpr118012, so most of the patterns
  and cost computations can be switched off as they have
  avropt_pr118012 in their insn condition.

o Adds sign-extract.0 patterns unconditionally (no avropt_pr118012).

Notice that this patch is just a work-around, it's not a fix of the
root cause, which are the patterns in match.pd that don't care about
the target and don't even care about costs.

The work-around is incomplete, and 3 of the new tests are still failing.
This is because there are situations where it does not work:

* The MULT is realized as a library call.

* The MULT is realized as an ASHIFT, and the ASHIFT again is transformed
  into something else.  For example, with -O2 -mmcu=atmega128,
  ASHIFT(3) is transformed into ASHIFT(1) + ASHIFT(2).

PR tree-optimization/118012
PR tree-optimization/118360
gcc/
* config/avr/avr.opt (-mpr118012): New undocumented option.
* config/avr/avr-protos.h (avr_out_sextr)
(avr_emit_skip_pixop, avr_emit_skip_clear): New protos.
* config/avr/avr.cc (avr_adjust_insn_length)
[case ADJUST_LEN_SEXTR]: Handle case.
(avr_rtx_costs_1) [NEG]: Costs for NEG (ZERO_EXTEND (ZERO_EXTRACT)).
[MULT && avropt_pr118012]: Costs for MULT (x AND 1).
(avr_out_sextr, avr_emit_skip_pixop, avr_emit_skip_clear): New
functions.
* config/avr/avr.md [avropt_pr118012]: Add combine patterns with
that condition that try to work around PR118012.
(adjust_len) <sextr>: Add insn attr value.
(pixop): New code iterator.
(mulsi3) [avropt_pr118012 && !AVR_TINY]: Allow these in insn condition.
gcc/testsuite/
* gcc.target/avr/mmcu/pr118012-1.h: New file.
* gcc.target/avr/mmcu/pr118012-1-o2-m128.c: New test.
* gcc.target/avr/mmcu/pr118012-1-os-m128.c: New test.
* gcc.target/avr/mmcu/pr118012-1-o2-m103.c: New test.
* gcc.target/avr/mmcu/pr118012-1-os-m103.c: New test.
* gcc.target/avr/mmcu/pr118012-1-o2-t40.c: New test.
* gcc.target/avr/mmcu/pr118012-1-os-t40.c: New test.
* gcc.target/avr/mmcu/pr118360-1.h: New file.
* gcc.target/avr/mmcu/pr118360-1-o2-m128.c: New test.
* gcc.target/avr/mmcu/pr118360-1-os-m128.c: New test.
* gcc.target/avr/mmcu/pr118360-1-o2-m103.c: New test.
* gcc.target/avr/mmcu/pr118360-1-os-m103.c: New test.
* gcc.target/avr/mmcu/pr118360-1-o2-t40.c: New test.
* gcc.target/avr/mmcu/pr118360-1-os-t40.c: New test.

Optimize vector<bool>::operator[]

The following testcase:

  bool f(const std::vector<bool>& v, std::size_t x) {
    return v[x];
  }

is compiled as:

f(std::vector<bool, std::allocator<bool> > const&, unsigned long):
        testq   %rsi, %rsi
        leaq    63(%rsi), %rax
        movq    (%rdi), %rdx
        cmovns  %rsi, %rax
        sarq    $6, %rax
        leaq    (%rdx,%rax,8), %rdx
        movq    %rsi, %rax
        sarq    $63, %rax
        shrq    $58, %rax
        addq    %rax, %rsi
        andl    $63, %esi
        subq    %rax, %rsi
        jns     .L2
        addq    $64, %rsi
        subq    $8, %rdx
.L2:
        movl    $1, %eax
        shlx    %rsi, %rax, %rax
        andq    (%rdx), %rax
        setne   %al
        ret

which is quite expensive for simple bit access in a bitmap.  The reason is that
the bit access is implemented using iterators:
    return begin()[__n];
which in turn has to handle the situation where __n is negative, yielding the
extra conditional.

    _GLIBCXX20_CONSTEXPR
    void
    _M_incr(ptrdiff_t __i)
    {
      _M_assume_normalized();
      difference_type __n = __i + _M_offset;
      _M_p += __n / int(_S_word_bit);
      __n = __n % int(_S_word_bit);
      if (__n < 0)
        {
          __n += int(_S_word_bit);
          --_M_p;
        }
      _M_offset = static_cast<unsigned int>(__n);
    }

While we could use __builtin_unreachable to declare that __n is in the range
0...max_size (), I think it is better to implement it directly, since the
resulting code is shorter and much easier to optimize.
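
A direct implementation boils down to unsigned word/bit arithmetic, roughly as in this sketch. It is a stand-alone illustration, not the libstdc++ code: `word`, `word_bits` and `test_bit` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

using word = unsigned long;
constexpr std::size_t word_bits = sizeof (word) * CHAR_BIT;

// Read bit N of a bitmap stored as an array of words.  Because N is
// unsigned, the division and remainder become a shift and a mask, and no
// correction for negative offsets is needed.
bool
test_bit (const word *words, std::size_t n)
{
  return (words[n / word_bits] >> (n % word_bits)) & 1u;
}

void
demo_test_bit ()
{
  const word words[2] = { 0x5ul, 0x1ul };  // bits 0 and 2; then bit word_bits
  assert (test_bit (words, 0));
  assert (!test_bit (words, 1));
  assert (test_bit (words, 2));
  assert (test_bit (words, word_bits));
}
```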

We now produce:
.LFB1248:
        .cfi_startproc
        movq    (%rdi), %rax
        movq    %rsi, %rdx
        shrq    $6, %rdx
        andq    (%rax,%rdx,8), %rsi
        andl    $63, %esi
        setne   %al
        ret

Testcase suggests
        movq    (%rdi), %rax
        movl    %esi, %ecx
        shrq    $5, %rsi        # does still need to be 64-bit
        movl    (%rax,%rsi,4), %eax
        btl     %ecx, %eax
        setb    %al
        retq
Which is still one instruction shorter.

libstdc++-v3/ChangeLog:

PR target/80813
* include/bits/stl_bvector.h (vector<bool, _Alloc>::operator []): Do
not use iterators.

gcc/testsuite/ChangeLog:

PR target/80813
* g++.dg/tree-ssa/bvector-3.C: New test.

rtl-ssa: Avoid dangling phi uses [PR118562]

rtl-ssa uses degenerate phis to maintain an RPO list of
accesses in which every use is of the RPO-previous definition.
Thus, if it finds that a phi is always equal to a particular
value V, it sometimes needs to keep the phi and make V the
single input, rather than replace all uses of the phi with V.

The code to do that rerouted the phi's first input to the single
value V. But as this PR shows, it failed to unlink the uses of
the other inputs.

The specific problem in the PR was that we had:

x = PHI<x(a), V(b)>

The code replaced the first input with V and removed the second
input from the phi, but it didn't unlink the use of V associated
with that second input.
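
The intended bookkeeping can be modelled with a toy def-use structure like the one below. It is only a sketch of the idea: `value`, `phi_node`, `add_input`, `remove_use` and `make_degenerate` are hypothetical and far simpler than the rtl-ssa data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct phi_node;

// A value keeps a list of the phis that use it, one entry per use.
struct value
{
  std::vector<phi_node *> uses;
};

struct phi_node
{
  std::vector<value *> inputs;
};

void
add_input (phi_node &phi, value &v)
{
  phi.inputs.push_back (&v);
  v.uses.push_back (&phi);
}

// Unlink a single use of PHI from V's use list.
void
remove_use (value &v, phi_node &phi)
{
  auto it = std::find (v.uses.begin (), v.uses.end (), &phi);
  if (it != v.uses.end ())
    v.uses.erase (it);
}

// Turn PHI into a degenerate phi whose only input is V.  Every previous
// input is unlinked first, so no dangling use of any old input survives.
void
make_degenerate (phi_node &phi, value &v)
{
  for (value *in : phi.inputs)
    remove_use (*in, phi);
  phi.inputs.clear ();
  add_input (phi, v);
}
```

With x = PHI<x(a), V(b)>, V ends up with exactly one use from the phi after the conversion, rather than one live use plus a stale one.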

gcc/
PR rtl-optimization/118562
* rtl-ssa/blocks.cc (function_info::replace_phi): When converting
to a degenerate phi, make sure to remove all uses of the previous
inputs.

gcc/testsuite/
PR rtl-optimization/118562
* gcc.dg/torture/pr118562.c: New test.

aarch64: Avoid redundant writes to FPMR

GCC 15 is the first release to support FP8 intrinsics.
The underlying instructions depend on the value of a new register,
FPMR.  Unlike FPCR, FPMR is a normal call-clobbered/caller-save
register rather than a global register.  So:

- The FP8 intrinsics take a final uint64_t argument that
  specifies what value FPMR should have.

- If an FP8 operation is split across multiple functions,
  it is likely that those functions would have a similar argument.

If the object code has the structure:

    for (...)
      fp8_kernel (..., fpmr_value);

then fp8_kernel would set FPMR to fpmr_value each time it is
called, even though FPMR will already have that value for at
least the second and subsequent calls (and possibly the first).

The working assumption for the ABI has been that writes to
registers like FPMR can in general be more expensive than
reads and so it would be better to use a conditional write like:

       mrs     tmp, fpmr
       cmp     tmp, <value>
       beq     1f
       msr     fpmr, <value>
     1:

instead of writing the same value to FPMR repeatedly.
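
The effect on the loop above can be modelled in plain C++ with a fake register that counts writes. Everything here (`fake_fpmr`, `write_always`, `write_if_changed`, `count_writes`) is hypothetical and only mirrors the mrs/cmp/beq/msr sequence conceptually.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for FPMR that records how often it is written.
struct fake_fpmr
{
  std::uint64_t value = 0;
  unsigned writes = 0;
};

// Unconditional write: one "msr" per call.
void
write_always (fake_fpmr &reg, std::uint64_t v)
{
  reg.value = v;
  ++reg.writes;
}

// Conditional write: read, compare, and write only on a mismatch.
void
write_if_changed (fake_fpmr &reg, std::uint64_t v)
{
  if (reg.value != v)
    {
      reg.value = v;
      ++reg.writes;
    }
}

// Simulate CALLS invocations of a kernel that each set FPMR to FPMR_VALUE,
// starting from a register that holds a different value.
unsigned
count_writes (bool conditional, std::uint64_t fpmr_value, unsigned calls)
{
  fake_fpmr reg;
  reg.value = ~fpmr_value;
  for (unsigned i = 0; i < calls; ++i)
    {
      if (conditional)
        write_if_changed (reg, fpmr_value);
      else
        write_always (reg, fpmr_value);
    }
  return reg.writes;
}
```

In this model, four calls with the same value cost four writes unconditionally but only one write with the conditional sequence.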

This patch implements that.  It also adds a tuning flag that suppresses
the behaviour, both to make testing easier and to support any future
cores that (for example) are able to rename FPMR.

Hopefully this really is the last part of the FP8 enablement.

gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
* config/aarch64/aarch64.md: Split moves into FPMR into a test
and branch around.
(aarch64_write_fpmr): New pattern.

gcc/testsuite/
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Add
cheap_fpmr_write by default.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
* gcc.target/aarch64/acle/fp8.c: Add cheap_fpmr_write.
* gcc.target/aarch64/acle/fpmr-2.c: Likewise.
* gcc.target/aarch64/simd/vcvt_fpm.c: Likewise.
* gcc.target/aarch64/simd/vdot2_fpm.c: Likewise.
* gcc.target/aarch64/simd/vdot4_fpm.c: Likewise.
* gcc.target/aarch64/simd/vmla_fpm.c: Likewise.
* gcc.target/aarch64/acle/fpmr-6.c: New test.

aarch64: Fix memory cost for FPM_REGNUM

GCC 15 is going to be the first release to support FPMR.
While working on a follow-up patch, I noticed that for:

    (set (reg:DI R) ...)
    ...
    (set (reg:DI fpmr) (reg:DI R))

IRA would prefer to spill R to memory rather than allocate a GPR.
This is because the register move cost for GENERAL_REGS to
MOVEABLE_SYSREGS is very high:

  /* Moves to/from sysregs are expensive, and must go via GPR.  */
  if (from == MOVEABLE_SYSREGS)
    return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
  if (to == MOVEABLE_SYSREGS)
    return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);

but the memory cost for MOVEABLE_SYSREGS was the same as for
GENERAL_REGS, making memory much cheaper.

Loading and storing FPMR involves a GPR temporary, so the cost should
account for moving into and out of that temporary.
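
As a rough model of the cost relationship, assuming made-up base costs (only the 80 comes from the snippet quoted above; `gpr_move_cost`, `gpr_memory_cost` and both functions are hypothetical and much simpler than the aarch64.cc hooks):

```cpp
#include <cassert>

enum class reg_class { general, moveable_sysreg };

constexpr int sysreg_move_penalty = 80; // from the quoted snippet
constexpr int gpr_move_cost = 2;        // hypothetical GPR<->GPR cost
constexpr int gpr_memory_cost = 4;      // hypothetical GPR load/store cost

// Moves to or from a sysreg go via a GPR and are expensive.
constexpr int
register_move_cost (reg_class from, reg_class to)
{
  return (from == reg_class::moveable_sysreg
          || to == reg_class::moveable_sysreg)
         ? sysreg_move_penalty + gpr_move_cost
         : gpr_move_cost;
}

// Loading or storing a sysreg needs a GPR temporary, so the memory cost
// includes the move between that temporary and the sysreg.
constexpr int
memory_move_cost (reg_class rclass)
{
  return rclass == reg_class::moveable_sysreg
         ? gpr_memory_cost
           + register_move_cost (reg_class::general,
                                 reg_class::moveable_sysreg)
         : gpr_memory_cost;
}

void
demo_costs ()
{
  // With the temporary accounted for, spilling is no longer cheaper than
  // keeping the value in a GPR and moving it into the sysreg.
  assert (memory_move_cost (reg_class::moveable_sysreg)
          > register_move_cost (reg_class::general,
                                reg_class::moveable_sysreg));
}
```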

This did show up indirectly in some of the existing asm tests,
where the stack frame allocated 16 bytes for callee saves (D8)
and another 16 bytes for spilling a temporary register.

It's possible that other registers need the same treatment
and it's more than probable that this code needs a rework.
None of that seems suitable for stage 4 though.

gcc/
* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
for the cost of moving in and out of GENERAL_SYSREGS.

gcc/testsuite/
* gcc.target/aarch64/acle/fpmr-5.c: New test.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
a spill slot to be allocated.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.

aarch64: Allow FPMR source values to be zero

GCC 15 is going to be the first release to support FPMR.
The alternatives for moving values into FPMR were missing
a zero alternative, meaning that moves of zero would use an
unnecessary temporary register.

gcc/
* config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
(*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
to be zero.

gcc/testsuite/
* gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.

tree-assume: Fix UB in assume_query [PR118605]

The assume_query constructor does
assume_query::assume_query (function *f, bitmap p) : m_parm_list (p),
                                                     m_func (f)
where m_parm_list is bitmap &.  This is compile time UB, because
as soon as the constructor returns, the m_parm_list reference is still
bound to the constructor's parameter, which is no longer in scope.

Now, one possible fix would be to change the ctor argument to be bitmap &,
but that doesn't really work because in the only user of that class
we have
      auto_bitmap decls;
...
      assume_query query (fun, decls);
and auto_bitmap just has
  operator bitmap () { return &m_bits; }
Could be perhaps const bitmap &, but why?  bitmap is a pointer:
typedef class bitmap_head *bitmap;
and the EXECUTE_IF_SET_IN_BITMAP macros don't really change that pointer,
they just inspect what is inside of that bitmap_head the pointer points
to.

So, the simplest fix, I think, is to avoid references (which cause even worse
code as the pointer has to be dereferenced twice rather than once).
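
The difference between the two member types can be sketched in isolation.
This is a minimal sketch, not the GCC code: bitmap_head_sketch, bitmap_sketch,
auto_bitmap_sketch and query_sketch are hypothetical stand-ins for
bitmap_head, bitmap, auto_bitmap and assume_query.  It shows the fixed shape,
where the member keeps a copy of the pointer, and leaves the broken shape as
a comment.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's bitmap_head, bitmap and auto_bitmap.
struct bitmap_head_sketch { unsigned bits = 0; };
typedef bitmap_head_sketch *bitmap_sketch;

struct auto_bitmap_sketch
{
  bitmap_head_sketch m_bits;
  // Converts to a pointer prvalue, so it cannot bind to a non-const
  // 'bitmap_sketch &' constructor parameter.
  operator bitmap_sketch () { return &m_bits; }
};

class query_sketch
{
public:
  // The parameter is passed by value and dies when the constructor returns.
  explicit query_sketch (bitmap_sketch p) : m_parm_list (p) {}

  bool is_set (unsigned i) const { return (m_parm_list->bits >> i) & 1u; }

private:
  // Broken shape: 'bitmap_sketch &m_parm_list;' would stay bound to the
  // dead parameter 'p'.  Fixed shape: keep a copy of the pointer.
  bitmap_sketch m_parm_list;
};

// Builds a bitmap with bit 3 set and queries it through the class.
inline bool query_bit3 ()
{
  auto_bitmap_sketch decls;
  decls.m_bits.bits = 1u << 3;
  query_sketch query (decls);
  return query.is_set (3) && !query.is_set (2);
}
```

Copying the pointer costs nothing extra and the pointee (here decls.m_bits)
still outlives the query object, which is all the class needs.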

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118605
* tree-assume.cc (assume_query::m_parm_list): Change type
from bitmap & to bitmap.

OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc. This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

gcc/ChangeLog:

* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.

testsuite: i386: Adjust gcc.target/i386/cmov12.c for Sun as syntax

The new gcc.target/i386/cmov12.c test FAILs on Solaris/x86 with the
native as:

FAIL: gcc.target/i386/cmov12.c scan-assembler-times cmovg 3

This happens because as uses a different syntax for cmov:

--- cmov12.s.bu243 2025-01-21 16:55:27.038829605 +0100
+++ cmov12.s.bu24390 2025-01-21 16:55:44.565051230 +0100
@@ -41,9 +41,9 @@
leal 1(%rdx), %ebp
movl (%r11), %esi
cmpl %eax, %esi
- cmovg %ebp, %edx
- cmovg %r11, %rcx
- cmovg %esi, %eax
+ cmovl.g %ebp, %edx
+ cmovq.g %r11, %rcx
+ cmovl.g %esi, %eax

The problem is even more prominent with the upcoming gas 2.44 which
added support for the Sun as syntax on Solaris, which gcc/configure
picks up.

This patch allows for both forms.
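
As an illustration of accepting both spellings, the sketch below counts
conditional moves with one regular expression.  The pattern and the
count_cmovg helper are assumptions made for this example; the actual
scan-assembler-times pattern in cmov12.c is not shown in this message.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Counts "move if greater" instructions in either spelling:
// GNU as "cmovg", Sun as "cmovl.g" / "cmovq.g".
// The pattern is illustrative, not the one used in cmov12.c.
inline int count_cmovg (const std::string &asm_text)
{
  static const std::regex pattern ("cmov(l\\.|q\\.)?g");
  auto first = std::sregex_iterator (asm_text.begin (), asm_text.end (),
                                     pattern);
  auto last = std::sregex_iterator ();
  return static_cast<int> (std::distance (first, last));
}
```

Fed the three removed lines or the three added lines of the diff above, the
helper reports three matches in both cases.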

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2025-01-22 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/cmov12.c (scan-assembler-times): Allow for
cmovl.g etc.

c++: Fix build_omp_array_section for type dependent array_expr [PR118590]

As can be seen on the testcase, when array_expr is type dependent, assuming
it has non-NULL TREE_TYPE is just wrong, it can often have NULL type, and even
if not, blindly assuming it is a pointer or array type is also wrong.

So, like in many other spots in the C++ FE, for type dependent expressions
we want to create something which will survive until instantiation and can be
redone at that point.

Unfortunately, build_omp_array_section is called before we actually do any
kind of checking what array_expr really is, and on invalid code it can be e.g.
a TYPE_DECL on which type_dependent_expression_p ICEs (as can be seen on the
pr67522.C testcase). So, I've hacked this by checking it is not a TYPE_DECL;
I hope a TYPE_P can't make it through there when we just look up an identifier.

Anyway, this patch is not enough, we can ICE e.g. on __uint128_t[0:something]
during instantiation, so I think something needs to be done for this in pt.cc
as well.

2025-01-23 Jakub Jelinek <jakub@redhat.com>

PR c++/118590
* typeck.cc (build_omp_array_section): If array_expr is type dependent
or a TYPE_DECL, build OMP_ARRAY_SECTION with NULL type.

* g++.dg/goacc/pr118590.C: New test.

c++: Fix weird expression in test for clauses other than when/default/otherwise [PR118604]

Some clang analyzer warned about
if (!strcmp (p, "when") == 0 && !default_p)
which really looks weird, it is better to use strcmp (p, "when") != 0
or !!strcmp (p, "when"). Furthermore, as a micro optimization, it is cheaper
to evaluate default_p than calling strcmp, so that can be put first in the &&.
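
A small sketch of the two spellings follows.  The function names are
hypothetical and the old form is written with explicit parentheses so that
it compiles without a logical-not warning; both return the same value for
every input, only the readability and the evaluation order differ.

```cpp
#include <cassert>
#include <cstring>

// Parenthesised equivalent of the flagged '!strcmp (p, "when") == 0':
// the '!' applies to the strcmp result before the comparison with 0.
inline bool other_clause_old (const char *p, bool default_p)
{
  return (!std::strcmp (p, "when")) == 0 && !default_p;
}

// Rewritten form: test the cheap flag first, then compare explicitly.
inline bool other_clause_new (const char *p, bool default_p)
{
  return !default_p && std::strcmp (p, "when") != 0;
}
```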

The C test for the same thing wasn't that weird, but I think for consistency
it is better to use the same test rather than trying to be creative.

2025-01-23 Jakub Jelinek <jakub@redhat.com>

PR c++/118604
gcc/c/
* c-parser.cc (c_parser_omp_metadirective): Rewrite
condition for clauses other than when, default and otherwise.
gcc/cp/
* parser.cc (cp_parser_omp_metadirective): Test !default_p
first and use strcmp () != 0 rather than !strcmp () == 0.

builtins: Store unspecified value to *exp for inf/nan [PR114877]

The fold_builtin_frexp folding for NaN/Inf just returned the first argument
while evaluating the second argument's side-effects, rather than storing
something to what the second argument points to.

The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so not storing anything
there can result in UB if the value isn't initialized and is read later.

glibc and newlib store 0 there, musl apparently doesn't store anything.

The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
      arg = builtin_save_expr (arg);
      return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.
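
The behaviour described above can be modelled outside the compiler.  This is
a hypothetical model of the folded result, not the code in builtins.cc:
frexp_fold_sketch and stored_exp are names invented for the example.  Zero,
NaN and infinity are returned unchanged and 0 is stored through the pointer;
everything else defers to std::frexp.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of the folding: for zero, NaN and Inf return the
// argument itself and store 0 to *exp.  C leaves the stored value
// unspecified for NaN/Inf; 0 is the choice made by the patch.
inline double frexp_fold_sketch (double x, int *exp)
{
  if (x == 0.0 || std::isnan (x) || std::isinf (x))
    {
      *exp = 0;
      return x;
    }
  return std::frexp (x, exp);
}

// Returns what ends up in *exp, starting from a sentinel value so that a
// missing store would be visible.
inline int stored_exp (double x)
{
  int e = 123;
  frexp_fold_sketch (x, &e);
  return e;
}
```

With the sentinel overwritten in every case, a later read of the exponent is
well defined even for NaN and Inf inputs.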

2025-01-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.

* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.

testsuite: Only run test if alarm is available

Most baremetal toolchains will not have an implementation for alarm and
sigaction as they are target specific.
For arm-none-eabi with newlib, function signatures are exposed, but
there is no implementation and thus the test cases cause an undefined
symbol link error.

gcc/testsuite/ChangeLog:

* gcc.dg/pr78185.c: Remove dg-do and replace with
dg-require-effective-target of signal and alarm.
* gcc.dg/pr116906-1.c: Likewise.
* gcc.dg/pr116906-2.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* lib/target-supports.exp (check_effective_target_alarm): New.

gcc/ChangeLog:

* doc/sourcebuild.texi (Effective-Target Keywords): Document
'alarm'.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz.

As it turns out, logical 32-bit shifts with an offset of 25..30 can
be performed in 7 instructions or less. This beats the 7 instructions
required for the default code of a shift loop.
Plus, with zero overhead, these cases can be 3-operand.

This is only relevant for -Oz because with -Os, 3op shifts are
split with -msplit-bit-shift (which is not performed with -Oz).
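
One way to see why such shifts can be cheap on an 8-bit target is a
byte-wise view: for an offset of 25..30 only one source byte can reach the
result.  The sketch below illustrates that observation only; it is an
assumption made for this example and not necessarily the instruction
sequence the patch emits.  All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Left shift by n in 25..30: only the low byte of x can reach the result,
// and it lands in the top byte after a shift by n - 24.
inline std::uint32_t shl32_bytewise (std::uint32_t x, unsigned n)
{
  std::uint8_t low = static_cast<std::uint8_t> (x);
  std::uint8_t top = static_cast<std::uint8_t> (low << (n - 24));
  return static_cast<std::uint32_t> (top) << 24;
}

// Logical right shift by n in 25..30: only the top byte of x survives.
inline std::uint32_t shr32_bytewise (std::uint32_t x, unsigned n)
{
  std::uint8_t top = static_cast<std::uint8_t> (x >> 24);
  return static_cast<std::uint32_t> (top >> (n - 24));
}

// Checks both helpers against the plain shift operators for n = 25..30.
inline bool matches_plain_shifts (std::uint32_t x)
{
  for (unsigned n = 25; n <= 30; ++n)
    if (shl32_bytewise (x, n) != (x << n)
        || shr32_bytewise (x, n) != (x >> n))
      return false;
  return true;
}
```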

PR target/117726
gcc/
* config/avr/avr.cc (avr_ld_regno_p): New function.
(ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak.
(lshrsi3_out): Same.
(avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs.
* config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const):
Add "r,r,C4L" alternative.
(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative.
* config/avr/constraints.md (C4R, C4L): New.
gcc/testsuite/
* gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS):
Turn one option variant into -Oz.

Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087]

2025-01-23 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/96087
* trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
backend decl, it is likely that it has come from a module proc
interface. Look for the formal symbol by name in the containing
proc and use its backend decl.
* trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
same reason, match the name, rather than the symbol address to
perform the mapping.

gcc/testsuite/
PR fortran/96087
* gfortran.dg/pr96087.f90: New test.

tree-optimization/118558 - fix alignment compute with VMAT_CONTIGUOUS_REVERSE

There are calls to dr_misalignment left that do not correct for the
offset (which is vector type dependent) when the stride is negative.
Notably vect_known_alignment_in_bytes doesn't allow passing through
such an offset, which the following adds (computing the offset in
vect_known_alignment_in_bytes would be possible as well, but the
offset can be shared as seen). Eventually this function could go away.

The missing correction leads to peeling for gaps not being considered and
shortening of the access not being applied; restoring these is what fixes
the testcase on x86_64.
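
Why the offset matters can be shown with plain arithmetic.  The sketch below
assumes that a reversed contiguous access of nunits elements starts
(nunits - 1) elements before the address of the scalar reference; the helper
names are hypothetical and the code does not use any vectorizer API.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the first element touched by a reversed vector access,
// relative to the scalar reference (assumption stated in the lead-in).
inline std::int64_t reverse_access_offset (std::int64_t nunits,
                                           std::int64_t elem_size)
{
  return (1 - nunits) * elem_size;
}

// Misalignment of (addr + offset) with respect to a power-of-two alignment.
inline std::uint64_t misalignment_sketch (std::uint64_t addr,
                                          std::int64_t offset,
                                          std::uint64_t align)
{
  return (addr + static_cast<std::uint64_t> (offset)) % align;
}
```

For four 4-byte elements the offset is -12, so a reference at address 0x1000
looks 16-byte aligned when the offset is ignored but is misaligned by 4 once
it is applied.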

PR tree-optimization/118558
* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
through offset to dr_misalignment.
* tree-vect-stmts.cc (get_group_load_store_type): Compute
offset applied for negative stride and use it when querying
alignment of accesses.
(vectorizable_load): Likewise.

* gcc.dg/vect/pr118558.c: New testcase.

c++: Update mangling of lambdas in expressions

https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that
mangling a lambda expression should use 'L' rather than "tl".

gcc/cp/ChangeLog:

* mangle.cc (write_expression): Update mangling for lambdas.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling.
* g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Fix mangling of lambdas in static data member initializers [PR107741]

This fixes an issue where lambdas declared in the initializer of a
static data member within the class body do not get a mangling scope of
that variable; this results in mangled names that do not conform to the
ABI spec.

To do this, the patch splits up grokfield for this case specifically,
allowing a declaration to be built and used in start_lambda_scope before
parsing the initializer, so that record_lambda_scope works correctly.

As a drive-by, this also fixes the issue of a static member not being
visible within its own initializer.
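
For reference, this is the kind of declaration the fix is about, written as
a minimal sketch that assumes C++17.  The names holder_sketch, make_answer
and call_answer are hypothetical and are not taken from the new tests.

```cpp
#include <cassert>

struct holder_sketch
{
  // A lambda in the in-class initializer of a static data member.  Per the
  // description above, its mangling scope should be the variable
  // 'make_answer' rather than just the enclosing class.
  static constexpr auto make_answer = [] { return 42; };
};

// Calls the closure stored in the static data member.
inline int call_answer ()
{
  return holder_sketch::make_answer ();
}
```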

PR c++/107741

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Bump ABI version.

gcc/ChangeLog:

* common.opt: Add -fabi-version=20.
* doc/invoke.texi: Likewise.

gcc/cp/ChangeLog:

* cp-tree.h (start_initialized_static_member): Declare.
(finish_initialized_static_member): Declare.
* decl2.cc (start_initialized_static_member): New function.
(finish_initialized_static_member): New function.
* lambda.cc (record_lambda_scope): Support falling back to old
ABI (maybe with warning).
* parser.cc (cp_parser_member_declaration): Build decl early
when parsing an initialized static data member.

gcc/testsuite/ChangeLog:

* g++.dg/abi/macro0.C: Bump ABI version.
* g++.dg/abi/mangle74.C: Remove XFAILs.
* g++.dg/other/fold1.C: Restore originally raised error.
* g++.dg/abi/lambda-ctx2-19.C: New test.
* g++.dg/abi/lambda-ctx2-19vs20.C: New test.
* g++.dg/abi/lambda-ctx2-20.C: New test.
* g++.dg/abi/lambda-ctx2.h: New test.
* g++.dg/cpp0x/static-member-init-1.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Fix exporting temploid friends in header units [PR118582]

When we started streaming the bit to handle merging of imported temploid
friends in r15-2807, I unthinkingly only streamed it in the
'!state->is_header ()' case.

This patch reworks the streaming logic to ensure that this data is
always streamed, including for unique entities (in case that ever comes
up somehow). This does make the streaming slightly less efficient, as
functions and types will need an extra byte, but this doesn't appear to
make a huge difference to the size of the resulting module; the 'std'
module on my machine grows by 0.2% from 30671136 to 30730144 bytes.

PR c++/118582

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Always stream
imported_temploid_friends information.
(trees_in::decl_value): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118582_a.H: New test.
* g++.dg/modules/pr118582_b.H: New test.
* g++.dg/modules/pr118582_c.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

LoongArch: Fix invalid subregs in xorsign [PR118501]

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign<mode>3): Use
force_lowpart_subreg.

i386: Omit "p" for packed in intrin name for FP8 convert

gcc/ChangeLog:

* config/i386/avx10_2-512convertintrin.h:
Omit "p" for packed for FP8.
* config/i386/avx10_2convertintrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-convert-1.c: Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.

i386: Change mnemonics from VCVT[,T]NEBF162I[,U]BS to VCVT[,T]BF162I[,U]BS

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512satcvtintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2satcvtintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVTBF162IBS): Rename from UNSPEC_VCVTNEBF162IBS.
(UNSPEC_VCVTBF162IUBS): Rename from UNSPEC_VCVTNEBF162IUBS.
(UNSPEC_VCVTTBF162IBS): Rename from UNSPEC_VCVTTNEBF162IBS.
(UNSPEC_VCVTTBF162IUBS): Rename from UNSPEC_VCVTTNEBF162IUBS.
(UNSPEC_CVTNE_BF16_IBS_ITER): Rename to...
(UNSPEC_CVT_BF16_IBS_ITER): ...this. Adjust UNSPEC name.
(sat_cvt_sign_prefix): Adjust UNSPEC name.
(sat_cvt_trunc_prefix): Ditto.
(avx10_2_cvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
Rename to...
(avx10_2_cvt<sat_cvt_trunc_prefix>bf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
...this. Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-satcvt-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvttbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvttbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-satcvt-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from VCVTNEPH2[B,H]F8 to VCVTPH2[B,H]F8

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512convertintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVTPH2BF8): Rename from UNSPEC_VCVTNEPH2BF8.
(UNSPEC_VCVTPH2BF8S): Rename from UNSPEC_VCVTNEPH2BF8S.
(UNSPEC_VCVTPH2HF8): Rename from UNSPEC_VCVTNEPH2HF8.
(UNSPEC_VCVTPH2HF8S): Rename from UNSPEC_VCVTNEPH2HF8S.
(UNSPEC_CONVERTPH2FP8): Rename from UNSPEC_NECONVERTPH2FP8.
Adjust UNSPEC name.
(convertph2fp8): Rename from neconvertph2fp8. Adjust
iterator map.
(vcvt<neconvertph2fp8>v8hf): Rename to...
(vcvt<convertph2fp8>v8hf): ...this.
(*vcvt<neconvertph2fp8>v8hf): Rename to...
(*vcvt<convertph2fp8>v8hf): ...this.
(vcvt<neconvertph2fp8>v8hf_mask): Rename to...
(vcvt<convertph2fp8>v8hf_mask): ...this.
(*vcvt<neconvertph2fp8>v8hf_mask): Rename to...
(*vcvt<convertph2fp8>v8hf_mask): ...this.
(vcvt<neconvertph2fp8><mode><mask_name>): Rename to...
(vcvt<convertph2fp8><mode><mask_name>): ...this.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2hf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from VCVTNE2PH2[B,H]F8 to VCVT2PH2[B,H]F8

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512convertintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVT2PH2BF8): Rename from UNSPEC_VCVTNE2PH2BF8.
(UNSPEC_VCVT2PH2BF8S): Rename from UNSPEC_VCVTNE2PH2BF8S.
(UNSPEC_VCVT2PH2HF8): Rename from UNSPEC_VCVTNE2PH2HF8.
(UNSPEC_VCVT2PH2HF8S): Rename from UNSPEC_VCVTNE2PH2HF8S.
(UNSPEC_CONVERTFP8_PACK): Rename from UNSPEC_NECONVERTFP8_PACK.
Adjust UNSPEC name.
(convertfp8_pack): Rename from neconvertfp8_pack. Adjust
iterator map.
(vcvt<neconvertfp8_pack><mode><mask_name>): Rename to...
(vcvt<convertfp8_pack><mode><mask_name>): ...this.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from VCOMSBF16 to VCOMISBF16

Besides the mnemonics change, this patch also uses the compare
pattern instead of UNSPEC.

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/i386-expand.cc
(ix86_expand_fp_compare): Adjust comments.
(ix86_expand_builtin): Adjust switch case.
* config/i386/i386.md (cmpibf): Change instruction name output.
* config/i386/sse.md (UNSPEC_VCOMSBF16): Removed.
(avx10_2_comisbf16_v8bf): New.
(avx10_2_comsbf16_v8bf): Removed.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-comibf-1.c: Adjust asm check.
* gcc.target/i386/avx10_2-comibf-3.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-1.c: Move to...
* gcc.target/i386/avx10_2-vcomisbf16-1.c: ...here.
Adjust output and intrin call.
* gcc.target/i386/avx10_2-vcomsbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vcomisbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/pr117495.c: Adjust asm check.

i386: Change mnemonics from V[GETEXP,FPCLASS]PBF16 to V[GETEXP,FPCLASS]BF16

Besides the mnemonics change, this patch also fixes the SDE test failure
for FPCLASS.

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VFPCLASSBF16): Rename from UNSPEC_VFPCLASSPBF16.
(avx10_2_getexppbf16_<mode><mask_name>): Rename to...
(avx10_2_getexpbf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_fpclasspbf16_<mode><mask_scalar_merge_name>):
Rename to...
(avx10_2_fpclassbf16_<mode><mask_scalar_merge_name>): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfpclassbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vgetexpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vgetexpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfpclassbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

i386: Change mnemonics from V[RSQRT,SCALEF,SQRTNE]PBF16 to V[RSQRT,SCALEF,SQRT]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VSCALEFBF16): Rename from UNSPEC_VSCALEFPBF16.
(avx10_2_scalefpbf16_<mode><mask_name>): Rename to...
(avx10_2_scalefbf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_rsqrtpbf16_<mode><mask_name>): Rename to...
(avx10_2_rsqrtbf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_sqrtnepbf16_<mode><mask_name>): Rename to...
(avx10_2_sqrtbf16_<mode><mask_name>): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vscalefbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vscalefbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vsqrtbf16-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from V[GETMANT,REDUCENE,RNDSCALENE]PBF16 to V[GETMANT,REDUCE,RNDSCALE]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VRNDSCALEBF16): Rename from UNSPEC_VRNDSCALENEPBF16.
(UNSPEC_VREDUCEBF16): Rename from UNSPEC_VREDUCENEPBF16.
(UNSPEC_VGETMANTBF16): Rename from UNSPEC_VGETMANTPBF16.
(BF16IMMOP): Adjust iterator due to UNSPEC name change.
(bf16immop): Ditto.
(avx10_2_<bf16immop>pbf16_<mode><mask_name>): Rename to...
(avx10_2_<bf16immop>bf16_<mode><mask_name>): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vgetmantbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vreducebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrndscalebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vgetmantbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vreducebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrndscalebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Adjust intrin call.
* gcc.target/i386/sse-22.c: Ditto.

i386: Change mnemonics from VMINMAXNEPBF16 to VMINMAXBF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512minmaxintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2minmaxintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_MINMAXBF16): Rename from UNSPEC_MINMAXNEPBF16.
(avx10_2_minmaxnepbf16_<mode><mask_name>): Rename to...
(avx10_2_minmaxbf16_<mode><mask_name>): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-minmax-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-minmax-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vminmaxnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vminmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Adjust intrin call.
* gcc.target/i386/sse-22.c: Ditto.

i386: Change mnemonics from V[CMP,MAX,MIN]PBF16 to V[CMP,MAX,MIN]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(avx10_2_<code>pbf16_<mode><mask_name>): Rename to...
(avx10_2_<code>bf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_cmppbf16_<mode><mask_scalar_merge_name>): Rename to...
(avx10_2_cmpbf16_<mode><mask_scalar_merge_name>): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-cmp-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-smaxmin-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcmpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vminbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-smaxmin-1.c: ...here.
* gcc.target/i386/avx10_2-vcmppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vcmpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vminpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vminbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/part-vect-vec_cmpbf.c: Adjust asm check.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

i386: Change mnemonics from VF[,N]M[ADD,SUB][132,213,231]NEPBF16 to VF[,N]M[ADD,SUB][132,213,231]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
names according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(avx10_2_fmaddnepbf16_<mode>_maskz): Rename to...
(avx10_2_fmaddbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fmaddnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fmaddbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fmaddnepbf16_<mode>_mask): Rename to...
(avx10_2_fmaddbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fmaddnepbf16_<mode>_mask3): Rename to...
(avx10_2_fmaddbf16_<mode>_mask3): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16_<mode>_maskz): Rename to...
(avx10_2_fnmaddbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fnmaddbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16_<mode>_mask): Rename to...
(avx10_2_fnmaddbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16_<mode>_mask3): Rename to...
(avx10_2_fnmaddbf16_<mode>_mask3): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16_<mode>_maskz): Rename to...
(avx10_2_fmsubbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fmsubnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fmsubbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16_<mode>_mask): Rename to...
(avx10_2_fmsubbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16_<mode>_mask3): Rename to...
(avx10_2_fmsubbf16_<mode>_mask3): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16_<mode>_maskz): Rename to...
(avx10_2_fnmsubbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fnmsubnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fnmsubbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16_<mode>_mask): Rename to...
(avx10_2_fnmsubbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16_<mode>_mask3): Rename to...
(avx10_2_fnmsubbf16_<mode>_mask3): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfnmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfnmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfnmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfnmsubXXXbf16-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from V[ADDNE,DIVNE,MULNE,RCP,SUBNE]PBF16 to V[ADD,DIV,MUL,RCP,SUB]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md (div<mode>3): Adjust emit_insn.
(avx10_2_<insn>nepbf16_<mode><mask_name>): Rename to...
(avx10_2_<insn>bf16_<mode><mask_name>): ...this. Change
instruction name output.
(avx10_2_rcppbf16_<mode><mask_name>): Rename to...
(avx10_2_rcpbf16_<mode><mask_name>): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vaddbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vdivbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vmulbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrcpbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vsubbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-operations-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-fast-math-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vaddbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vdivbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vmulbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vrcppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrcpbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vsubbf16-2.c: ...here. Adjust intrin call.
* lib/target-supports.exp (check_effective_target_avx10_2):
Adjust asm usage.
(check_effective_target_avx10_2_512): Ditto.

i386: Enhance AMX tests

After the Binutils change, the previous intrin usage in these tests
makes the assembler emit a warning, so that usage needs to change.
Besides that, there are separate issues for both AMX-MOVRS and
AMX-TRANSPOSE.

For AMX-MOVRS, the t2rpntlvwrs tests wrongly used AMX-TRANSPOSE intrins.
Since the only difference between them is the "rs" hint, this did not
change the result.

For AMX-TRANSPOSE, the "t1" hint test is missing.

This patch fixes both of them. It also renames the AMX-MOVRS test file
to match the other AMX tests.

gcc/testsuite/ChangeLog:

PR target/118270
PR target/118609
* gcc.target/i386/amxmovrs-t2rpntlvw-2.c: Move to...
* gcc.target/i386/amxmovrs-2rpntlvwrs-2.c: ...here.
* gcc.target/i386/amxtranspose-2rpntlvw-2.c: Add "t1" hint test.

i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

These two testcases were missed by the previous change that added
-march=x86-64-v3 to silence warnings in -march=native testing.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vnniint16-auto-vectorize-4.c: Append
-march=x86-64-v3.
* gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.

Daily bump.

d,ada/spec: only sub nostd{inc,lib} rather than nostd{inc,lib}*

This prevents the gcc driver from erroneously accepting -nostdlib++
when Ada is enabled.

Also, similarly, -nostdinc* (where * is nonempty) is unhandled by either
the Ada or D compiler, so the spec should not substitute those
either (thanks for pointing that out, Jakub).
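
The difference between the two spec forms can be sketched as follows.
This is a simplified model, not the driver's actual spec-matching code:
matches_spec is a hypothetical helper, and it assumes only that a
trailing '*' in %{S*} selects every switch beginning with S, while
%{S} selects the switch S alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper modelling how a spec name selects driver switches.
// "name*" matches every switch that begins with "name"; a plain "name"
// matches only the switch spelled exactly that way.
bool matches_spec(const std::string &spec, const std::string &sw)
{
  assert(!spec.empty());
  if (spec.back() == '*')
    {
      const std::string prefix = spec.substr(0, spec.size() - 1);
      return sw.compare(0, prefix.size(), prefix) == 0;
    }
  return spec == sw;
}
```

In this model the old "nostdlib*" spec also selects "nostdlib++",
whereas the new "nostdlib" spec selects only "nostdlib".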

Brought to my attention by Michał Górny <mgorny@gentoo.org>.

gcc/ada/ChangeLog:

* gcc-interface/lang-specs.h: Replace %{nostdinc*} %{nostdlib*}
with %{nostdinc} %{nostdlib}.

gcc/d/ChangeLog:

* lang-specs.h: Replace %{nostdinc*} with %{nostdinc}.

gcc/testsuite/ChangeLog:

* gcc.dg/driver-nostdlibstar.c: New test.

c++: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]

On Wed, Aug 14, 2024 at 10:06:24AM +0200, Jakub Jelinek wrote:
> Though, now that I think about it again, perhaps what we could do instead
> is just make sure the _ZGVZ3barvEDC1x1y1z1wE initialization doesn't have
> a CLEANUP_POINT_EXPR in it and wrap both the _ZGVZ3barvEDC1x1y1z1wE
> and cp_finish_decomp created stuff into a single CLEANUP_POINT_EXPR.
> That way, perhaps _ZGVZ3barvEDC1x1y1z1wE could be initialized by one thread
> and _ZGVZ3barvE1x by a different, but the temporaries from _ZGVZ3barvEDC1x1y1z1wE
> initialization would be only destructed after the _ZGVZ3barvE1w guard
> was released by the thread which initialized _ZGVZ3barvEDC1x1y1z1wE.

Here is what I believe is the ABI-compatible version, which uses the separate
guard variables, so different structured binding variables can be
initialized in different threads, but the thread that did the artificial
base initialization will keep temporaries live at least until the last
guard variable is released (i.e. when even that variable has been
initialized).

2025-01-22 Jakub Jelinek <jakub@redhat.com>

PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decl): If need_decomp_init, for function scope structured
binding bases, temporarily clear stmts_are_full_exprs_p before
calling expand_static_init, after it call cp_finish_decomp and wrap
code emitted by both into maybe_cleanup_point_expr_void and ensure
cp_finish_decomp isn't called again.

* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.

c++: further tweak to cxx_eval_outermost_constant_expr [PR118396]

This patch adds an error in a !allow_non_constant case when the
initializer/object types don't match.

PR c++/118396

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Add an error call
when !allow_non_constant.

Reviewed-by: Jason Merrill <jason@redhat.com>

jit: fix startup on aarch64

libgccjit fails on startup on aarch64 (and probably other archs).

The issues are that

(a) within jit_langhook_init the call to
targetm.init_builtins can use types that aren't representable
via jit::recording::type, and

(b) targetm.init_builtins can call lang_hooks.decls.pushdecl, which
although a no-op for libgccjit has a gcc_unreachable.

Fixed thusly.

gcc/jit/ChangeLog:
* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
bail out if the inner call to tree_type_to_jit_type fails.
Don't abort on unknown types.
(jit_langhook_pushdecl): Replace gcc_unreachable with return of
NULL_TREE.
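
The bail-out for pointer types can be sketched like this. It is a
simplified model under assumed names: TreeKind, TreeType, JitType and
to_jit_type are hypothetical and do not mirror the real tree or
jit::recording::type interfaces. The point is only that an
unrepresentable pointee makes the whole conversion report failure
instead of aborting.

```cpp
#include <cassert>
#include <memory>
#include <string>

enum class TreeKind { Int, Float, Pointer, Unknown };

struct TreeType {
  TreeKind kind;
  const TreeType *pointee;  // only meaningful for TreeKind::Pointer
};

struct JitType { std::string name; };

// Returns nullptr when the type has no jit-side representation.
std::unique_ptr<JitType> to_jit_type(const TreeType &t)
{
  switch (t.kind)
    {
    case TreeKind::Int:
      return std::make_unique<JitType>(JitType{"int"});
    case TreeKind::Float:
      return std::make_unique<JitType>(JitType{"float"});
    case TreeKind::Pointer:
      {
        assert(t.pointee != nullptr);
        std::unique_ptr<JitType> inner = to_jit_type(*t.pointee);
        if (!inner)
          return nullptr;  // bail out: the pointee is not representable
        return std::make_unique<JitType>(JitType{inner->name + " *"});
      }
    case TreeKind::Unknown:
      break;
    }
  return nullptr;  // unknown types are reported as failure, not aborted on
}
```

A caller in this model checks for a null result and skips the
declaration rather than treating it as a fatal error.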

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

s390: Fix arch15 machine string for binutils

gcc/ChangeLog:

* config/s390/s390.cc: Fix arch15 machine string which must not
be empty.

aarch64: Fix aarch64_write_sysregdi predicate

While working on another MSR-related patch, I noticed that
aarch64_write_sysregdi's constraints allowed zero, but its
predicate didn't. This could in principle lead to an ICE
during or after RA, since "Z" allows the RA to rematerialise
a known zero directly into the instruction.

The usual techniques for exposing a bug like that didn't work in this
case, since the optimisers seem to make no attempt to remove redundant
zero moves (at least not for these unspec_volatiles). But the problem
still seems worth fixing pre-emptively.

gcc/
* config/aarch64/aarch64.md (aarch64_write_sysregdi): Change
the source predicate to aarch64_reg_or_zero.

gcc/testsuite/
* gcc.target/aarch64/acle/rwsr-4.c: New test.
* gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized
variable.

AVR: Add test cases for PR118591.

gcc/testsuite/
PR rtl-optimization/118591
* gcc.target/avr/torture/pr118591-1.c: New test.
* gcc.target/avr/torture/pr118591-2.c: New test.

c++: Clear TARGET_EXPR_ELIDING_P when forced to use a copy constructor due to __no_unique_address__ [PR118199]

We currently fail with a checking assert upon the following valid code
when using -fno-elide-constructors:

=== cut here ===
struct d { ~d(); };
d &b();
struct f {
  [[__no_unique_address__]] d e;
};
struct h : f  {
  h() : f{b()} {}
} i;
=== cut here ===

The problem is that split_nonconstant_init_1 detects that it cannot
elide the copy constructor due to __no_unique_address__ but does not
clear TARGET_EXPR_ELIDING_P, and due to -fno-elide-constructors, we trip
on a checking assert in cp_gimplify_expr.

This patch fixes this by making sure that we clear TARGET_EXPR_ELIDING_P
if we determine that we have to keep the copy constructor due to
__no_unique_address__. An alternative would be to just check for
elide_constructors in that assert, but I think it'd lose most of its
value if we did so.

PR c++/118199

gcc/cp/ChangeLog:

* typeck2.cc (split_nonconstant_init_1): Clear
TARGET_EXPR_ELIDING_P if we need to use a copy constructor
because of __no_unique_address__.

gcc/testsuite/ChangeLog:

* g++.dg/init/no-elide3.C: New test.

LoongArch: Fix wrong code with <optab>_alsl_reversesi_extended

The second source register of this insn cannot be the same as the
destination register.

gcc/ChangeLog:

* config/loongarch/loongarch.md
(<optab>_alsl_reversesi_extended): Add '&' to the destination
register constraint and append '0' to the first source register
constraint to indicate the destination register cannot be the same
as the second source register, and change the split condition to
reload_completed so that the insn will be split only after RA in
order to obtain allocated registers that satisfy the above
constraints.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
test.

c++: Improve cp_parser_objc_message_args compile time

On Tue, Jan 21, 2025 at 06:47:53PM +0100, Jakub Jelinek wrote:
> Indeed, I've just used what it was doing without thinking too much about it,
> sorry.
> addl_args = tree_cons (NULL_TREE, arg, addl_args);
> with addl_args = nreverse (addl_args); after the loop might be better,
> can test that incrementally. sel_args is handled the same and should have
> the same treatment.

Here is an incremental patch to do that.

Verified also on the 2 va-meth*.mm testcases (one without CPP_EMBED, one
with) that -fdump-tree-gimple is the same before/after the patch.

2025-01-22 Jakub Jelinek <jakub@redhat.com>

* parser.cc (cp_parser_objc_message_args): Use tree_cons with
nreverse at the end for both sel_args and addl_args, instead of
chainon with build_tree_list second argument.
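
The cost difference behind this change can be sketched with a plain
singly linked list. This is an illustration only: Node, cons,
append_walk and reverse_in_place are hypothetical stand-ins for
TREE_LIST nodes, tree_cons, chainon and nreverse, not the real
implementations. Appending by walking to the tail makes building an
n-element list quadratic, while prepending and reversing once at the
end is linear.

```cpp
#include <cassert>

struct Node { int value; Node *next; };

// Prepend in O(1), in the spirit of tree_cons.
Node *cons(int value, Node *list) { return new Node{value, list}; }

// Append by walking to the tail, in the spirit of chainon:
// O(length) per call.
Node *append_walk(Node *list, Node *item)
{
  if (!list)
    return item;
  Node *tail = list;
  while (tail->next)
    tail = tail->next;
  tail->next = item;
  return list;
}

// Reverse in place, in the spirit of nreverse: one O(length) pass.
Node *reverse_in_place(Node *list)
{
  Node *prev = nullptr;
  while (list)
    {
      Node *next = list->next;
      list->next = prev;
      prev = list;
      list = next;
    }
  return prev;
}

// Build 0..n-1 by repeated tail appends (quadratic overall).
Node *build_by_append(int n)
{
  assert(n >= 0);
  Node *list = nullptr;
  for (int i = 0; i < n; ++i)
    list = append_walk(list, cons(i, nullptr));
  return list;
}

// Build 0..n-1 by prepending, then reversing once (linear overall).
Node *build_by_cons_reverse(int n)
{
  assert(n >= 0);
  Node *list = nullptr;
  for (int i = 0; i < n; ++i)
    list = cons(i, list);
  return reverse_in_place(list);
}

// True if both lists hold the same values in the same order.
bool same_list(const Node *a, const Node *b)
{
  for (; a && b; a = a->next, b = b->next)
    if (a->value != b->value)
      return false;
  return a == nullptr && b == nullptr;
}

// Release every node of a list.
void free_list(Node *list)
{
  while (list)
    {
      Node *next = list->next;
      delete list;
      list = next;
    }
}
```

Both builders yield the same element order, which is consistent with
the identical -fdump-tree-gimple output reported above.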

c++: Introduce append_ctor_to_tree_vector

On Mon, Jan 20, 2025 at 05:14:33PM -0500, Jason Merrill wrote:
> > --- gcc/cp/call.cc.jj       2025-01-15 18:24:36.135503866 +0100
> > +++ gcc/cp/call.cc  2025-01-17 14:42:38.201643385 +0100
> > @@ -4258,11 +4258,30 @@ add_list_candidates (tree fns, tree firs
> >     /* Expand the CONSTRUCTOR into a new argument vec.  */
>
> Maybe we could factor out a function called something like
> append_ctor_to_tree_vector from the common code between this and
> make_tree_vector_from_ctor?
>
> But this is OK as is if you don't want to pursue that.

I had the previous patch already tested and wanted to avoid delaying
the large initializer speedup re-reversion any further, so I've committed
the patch as is.

Here is an incremental patch to factor that out.

2025-01-22  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.

c++: 'this' capture clobbered during recursive inst [PR116756]

Here during instantiation of generic lambda's op() [with I = 0] we
substitute into the call self(self, cst<1>{}) which requires recursive
instantiation of the same op() [with I = 1] (which isn't deferred due to
the lambda's deduced return type). During this recursive instantiation, the
DECL_EXPR case of tsubst_stmt clobbers LAMBDA_EXPR_THIS_CAPTURE to point
to the child op()'s specialized capture proxy instead of the parent's,
and the original value is never restored.

So later when substituting into the openSeries call in the parent op()
maybe_resolve_dummy uses the 'this' proxy belonging to the child op(),
which leads to a context mismatch ICE during gimplification of the
proxy.

An earlier version of this patch fixed this by making instantiate_body
save/restore LAMBDA_EXPR_THIS_CAPTURE during a lambda op() instantiation.
But it seems cleaner to avoid overwriting LAMBDA_EXPR_THIS_CAPTURE in the
first place by making it point to the non-specialized capture proxy, and
instead call retrieve_local_specialization as needed, which is what this
patch implements. It's natural then to not clear LAMBDA_EXPR_THIS_CAPTURE
after parsing/regenerating a lambda.

PR c++/116756

gcc/cp/ChangeLog:

* lambda.cc (lambda_expr_this_capture): Call
retrieve_local_specialization on the result of
LAMBDA_EXPR_THIS_CAPTURE for a generic lambda.
* parser.cc (cp_parser_lambda_expression): Don't clear
LAMBDA_EXPR_THIS_CAPTURE.
* pt.cc (tsubst_stmt) <case DECL_EXPR>: Don't overwrite
LAMBDA_EXPR_THIS_CAPTURE with the specialized capture.
(tsubst_lambda_expr): Don't clear LAMBDA_EXPR_THIS_CAPTURE
afterward.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if-lambda7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

Revert "[PATCH 1/2] RISC-V:Add intrinsic support for the CMOs extensions"

This reverts commit d2c8548e0ce51dac6bc51d37236c50f98fca82f0.