git.ipfire.org Git - thirdparty/gcc.git/log

Fortran: Correct variable typespec in PDT specification exprs [PR84008]

2025-09-08 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/84008
* decl.cc (insert_parameter_exprs): Correct the typespec of new
variable declarations, where the type is set to BT_PROCEDURE as
a precaution for resolution of the whole program unit.

gcc/testsuite/
PR fortran/84008
* gfortran.dg/pdt_45.f03: New test.

strlen: Handle empty constructor as memset for combining with malloc to calloc [PR87900]

This was noticed when turning memset (with constant size) into a store of an empty constructor
but can be reproduced without that.
In this case we have the following IR:
```
  p_3 = __builtin_malloc (4096);
  *p_3 = {};
```

Which we can treat the store as a memset.
So this patch adds the similar optimization as memset/malloc now for malloc/constructor.
This patch is on top of https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681439.html
(it calls allow_memset_malloc_to_calloc but that can be removed if that patch is rejected).

Changes since v1:
* v2: Correctly return false from handle_assign after removing stmt.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/87900

gcc/ChangeLog:

* tree-ssa-strlen.cc  (strlen_pass::handle_assign): Add RHS argument.
For empty constructor RHS, see if can combine with a previous malloc into
a calloc.
(strlen_pass::check_and_optimize_call): Update call to handle_assign;
passing NULL_TREE for RHS.
(strlen_pass::check_and_optimize_stmt): Update call to handle_assign.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/calloc-10.c: New test.
* gcc.dg/tree-ssa/calloc-11.c: New test.
* gcc.dg/tree-ssa/calloc-12.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

strlen: Don't do the malloc+memset->calloc optimization in some cases [PR83022]

This fixes a long standing (since GCC 5) issue where the malloc+memset->calloc
optimization would happen even if the memset was not always executed.
This is a varient of Nathan's patch: https://inbox.sourceware.org/gcc-patches/f4b5d106-8176-b7bd-709b-d435188783b0@acm.org/
Jeff Law had suggested to look at probabilities of the basic blocks to see
if it is profitable or not; I am not totally convinced that is a good idea.
Though this is an extended version of Nathan's patch as it uses post domination to see
if the memset is always called after the condition of null-ness.

PR tree-optimization/83022

gcc/ChangeLog:

* tree-ssa-strlen.cc (last_stmt_ptr_check): New function.
(allow_memset_malloc_to_calloc): New function.
(strlen_pass::handle_builtin_memset): Check to see if it is a good
idea to do the malloc+memset->calloc optimization.
(printf_strlen_execute): Free post dom info.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/calloc-6.c: New test.
* gcc.dg/tree-ssa/calloc-7.c: New test.
* gcc.dg/tree-ssa/calloc-8.c: New test.
* gcc.dg/tree-ssa/calloc-9.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

gcc: regenerate common.opt.urls

Needed to add -fdep-fusion.

gcc/ChangeLog:

* common.opt.urls: Regenerate.

Daily bump.

forwprop: Improve rejection of overlapping for copyprop of aggregates [PR121841]

Here we have:
tmp = src1[0];
dest1[0] = tmp;
where src1 and dest1 are decls.
We currently reject this as the bases are different but since the bases
are decls we know they won't overlap.
This adds the extra check to allow this.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/121841

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Allow
two different decls as bases as non-overlapping bases.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-prop-aggregate-struct-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

gcc: introduce the dep_fusion pass

>> +
>> +  // opt_pass methods:
>> +  opt_pass *clone () override { return new pass_dep_fusion (m_ctxt); }
>> +  bool gate (function *) override;
>> +  unsigned int execute (function *) override;
>
> Wouldn't it be better to add 'final' along with 'override' to opt_pass
> vfuncs?
> (See commit 725793af78064fa605ea6d9376aaf99ecb71467b, etc.)Yea.  It's easily missed.  Fixed in the obvious way.

Bootstrapped and regression tested on x86_64.  Pushed to the trunk.

gcc/
* dep-fusion.cc: Mark clone, gate and execute methods as final.

RISC-V: Add support for the XAndesvdot ISA extension.

This extension defines vector instructions to calculae of the signed/unsigned
dot product of four SEW/4-bit data and accumulate the result into a SEWbit
element for all elements in a vector register.

gcc/ChangeLog:

* config/riscv/andes-vector-builtins-bases.cc (nds_vd4dot): New class.
(class nds_vd4dotsu): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vd4dots): Ditto.
(nds_vd4dotsu): Ditto.
(nds_vd4dotu): Ditto.
* config/riscv/andes-vector.md
(@pred_nds_vd4dot<su><mode>): New pattern.
(@pred_nds_vd4dotsu<mode>): New pattern.
* config/riscv/genrvv-type-indexer.cc (main): Modify sew of QUAD_FIX,
QUAD_FIX_SIGNED and QUAD_FIX_UNSIGNED.
* config/riscv/riscv-vector-builtins.cc
(qexti_vvvv_ops): New operand information.
(qexti_su_vvvv_ops): New operand information.
(qextu_vvvv_ops): New operand information.
* config/riscv/riscv-vector-builtins.h (XANDESVDOT_EXT): New def.
(required_ext_to_isa_name): Add case XANDESVDOT_EXT.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/vector-iterators.md (NDS_QUAD_FIX): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c: New test.

[RISC-V] Fix ordering of pipeline models

I missed that the new ascalon pipeline description was put into the wrong place
during review.  The net is tests which wanted to use generic-ooo explicitly for
stability in the test output ended up getting  a different pipeline model and
different codegen than the test expected.

This tripped a small number of vsetvl failures in the testsuite.

This has spun on riscv64-elf and riscv32-elf in my tester and fixes the
regression.  I'm going to go ahead and push it as I'm likely offline this
afternoon/evening and don't want anyone else to waste their time chasing the
regression down.

gcc/
* config/riscv/riscv-opts.h (riscv_microarchitecture_type): Fix ordering.

libphobos: enable for more hppa tuples

Gentoo uses hppa1.1*-*-linux* and hppa2.0*-*-linux* instead of Debian's
hppa-*-linux*.

libphobos/ChangeLog:

* configure.tgt: Add hppa[12]*-*-linux* as a supported target.

RISC-V: Add support for the XAndesvpackfph ISA extension.

This extension defines vector instructions to extract a pair of FP16 data from
a floating-point register. Multiply the top FP16 data with the FP16 elements
and add the result with the bottom FP16 data.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_FP_16 for XAndesvpackfph.
* config/riscv/andes-vector-builtins-bases.cc (nds_vfpmad): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vfpmadt): Ditto.
(nds_vfpmadb): Ditto.
(nds_vfpmadt_frm): Ditto.
(nds_vfpmadb_frm): Ditto.
* config/riscv/andes-vector.md (@pred_nds_vfpmad<nds_tb><mode>):
New pattern.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_F16_OPS): New def.
* config/riscv/riscv-vector-builtins.cc (f16_ops): Ditto
* config/riscv/riscv-vector-builtins.def (float32_type_node): Ditto.
* config/riscv/riscv-vector-builtins.h (XANDESVPACKFPH_EXT): Ditto.
(required_ext_to_isa_name): Add case XANDESVPACKFPH_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (VHF): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c: New test.

c++: Update TLS model after processing a TLS variable

Set a tentative TLS model in grokvardecl and update TLS mode with the
default TLS access model after a TLS variable has been fully processed
if the default TLS access model is stronger.

gcc/cp/

PR c++/107393
* decl.cc (grokvardecl): Set a tentative TLS model which will be
updated by cplus_decl_attributes later.
* decl2.cc (cplus_decl_attributes): Update TLS model with the
default TLS access model if the default TLS access model is
stronger.
* pt.cc (tsubst_decl): Set TLS model only after processing a
variable.

gcc/testsuite/

PR c++/107393
* g++.dg/tls/pr107393-1.C: New test.
* g++.dg/tls/pr107393-2.C: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

AVR: ad target/121794 - Invoke zero_reg less.

gcc/
PR target/121794
* config/avr/avr.md (cmpqi3): Use cpi R,0 if possible.

libphobos: enable for powerpc64le-linux-gnu

libphobos/ChangeLog:

* configure.tgt: Add powerpc64le-linux-gnu as a supported target
when configured with --with-long-double-format=ieee.

RISC-V: Add test for vec_duplicate + vnmsub.vv unsigned combine with GR2VR cost 0, 1 and 15

Add asm dump check and run test for vec_duplicate + vnmsub.vvm
combine to vnmsub.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vnmsub.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vnmsub.vv signed combine with GR2VR cost 0, 1 and 15

Add asm dump check and run test for vec_duplicate + vnmsub.vv
combine to vnmsub.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vnmsub.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vnmsub.vv to vnmsub.vx on GR2VR cost

This patch would like to combine the vec_duplicate + vnmsub.vv to the
vnmsub.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

Before this patch:
  11   │     beq a3,zero,.L8
  12   │     vsetvli a5,zero,e32,m1,ta,ma
  13   │     vmv.v.x v2,a2
  ...
  16   │ .L3:
  17   │     vsetvli a5,a3,e32,m1,ta,ma
  ...
  22   │     vnmsub.vv v1,v2,v3
  ...
  25   │     bne a3,zero,.L3

After this patch:
  11   │     beq a3,zero,.L8
  ...
  14   │ .L3:
  15   │     vsetvli a5,a3,e32,m1,ta,ma
  ...
  20   │     vnmsub.vx v1,a2,v3
  ...
  23   │     bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vnmsac_vx_<mode>): Rename from.
(*mul_minus_vx_<mode>): Rename to and add nmsub support.
* config/riscv/vector.md (@pred_vnmsac_vx_<mode>): Rename from.
(@pred_mul_minus_vx_<mode>): Rename to and add nmsub support.
(*pred_nmsac_<mode>_scalar_undef): Rename from.
(*pred_mul_minus_vx<mode>_undef): Rename to and add nmsub support.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

doc: drop verify-canonical-types=1 ref

--param verify-canonical-types was removed back in r0-81986-g7313518b90b280.

The same verification is controlled via our generic checking framework
these days.

gcc/ChangeLog:

* doc/generic.texi (TYPE_CANONICAL): Don't mention long-removed
--param verify-canonical-types.

dep_fusion: Fix if target does not have macro fusion [PR121835]

This new pass will ICE if the target does not define the macro_fusion_pair_p
pass. The pass will not be useful in that case so it is best to return
early.

Pushed as obvious after a bootstrap on x86_64-linux-gnu.

PR rtl-optimization/121835
gcc/ChangeLog:

* dep-fusion.cc (pass_dep_fusion::execute): Return early if
macro_fusion_pair_p is null.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

gcc: introduce the dep_fusion pass

Presently, the scheduler code only considers consecutive instructions
for macro-op fusion (see sched-deps.cc::sched_macro_fuse_insns () for
details).  This patch introduces the new dep_fusion pass, which is
intended to uncover more fusion opportunities by reordering eligible
instructions to form fusible pairs (based solely on the value of the
TARGET_SCHED_MACRO_FUSION_PAIR_P hook).  This is achieved by using
the RTL-SSA framework, and only the single-use instructions are
considered for the first instruction of a pair.

Aside from reordering instructions, this pass also sets the SCHED_GROUP
flag for the second instruction so that following passes can implement
special handling of the fused pairs.  For instance, RA and regrename
should make use of this information to preserve single-output property
for some of such pairs.  Accordingly, in passes.def, this patch adds two
invocations of the new pass: just before IRA and just before regrename.

The new pass is enabled at -O2+ and -Os.

gcc/ChangeLog:

* Makefile.in (OBJS): Add dep-fusion.o.
* common.opt (fdep-fusion): Add option.
* dep-fusion.cc: New pass.
* doc/invoke.texi: Document it.
* opts.cc (default_options_table): Enable it at -O2+ and -Os.
* passes.def: Insert two instances of dep_fusion.
* tree-pass.h (make_pass_dep_fusion): Declare new function.

doc: fix -momit-leaf-frame-pointer typo

For x86, the option is -momit-leaf-frame-pointer, not -fomit-leaf-frame-pointer.

gcc/ChangeLog:

* doc/invoke.texi (x86 Options): Fix '-momit-leaf-frame-pointer' typo.

forwprop: Factor out the memcpy followed by memset optimization

As simplify_builtin_call adds more and more optimization, it is
getting bigger and bigger and easier to misunderstand, so this
factors out the memcpy followed by memset optimization (which
was the original optimization added).

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (simplify_builtin_call): Factor out
the memcpy followed by a memset optimization to ...
(simplify_builtin_memcpy_memset): Here. New function.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

forwprop: Factor out memchr optimization to its own function

As more optimizations are added to forwprop's simplify_builtin_call,
this function is becoming harder and harder to understand. To help
simplify things, this factors out the memchr optimization to its own
function like what was done when memcmp optimization was added.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (simplify_builtin_call): Factor out the memchr
optimization to ...
(simplify_builtin_memchr): Here. New function.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

ipa: Fix build on MacOS

The build is broken on MacOS since r16-3581-g1da3c4d90e678a because
ipa-inline-transform.cc uses std::max but does not include <algorithm>.

This patch fixes it by defining INCLUDE_ALGORITHM in that file.

gcc/ChangeLog:

* ipa-inline-transform.cc: Define INCLUDE_ALGORITHM.

install: Properly capitalize GNU Binutils

gcc:
PR target/69374
* doc/install.texi (Prerequisites): Properly capitalize
GNU Binutils.
(Configuration): Ditto.
(Building): Ditto.
(Specific): Ditto.

doc: consistently say 'whole-program' where appropriate

Unchanged instances are deliberate.

gcc/ChangeLog:

* doc/invoke.texi: Say 'whole-program' consistently where
appropriate.

doc: consistently spell 'GNU Binutils'

gcc/ChangeLog:

* doc/invoke.texi: Capitalize 'GNU Binutils' consistently.

doc: update incremental link vs binutils information

GNU Binutils now supports linking LTO and non-LTO objects into a single
mixed object file as of 2.44. Update the text to reflect this and fix
some minor grammar issues while at it.

gcc/ChangeLog:
PR ipa/116410

* doc/invoke.texi (Link Options): Update -flinker-output= text
to reflect GNU Binutils changes. Fix grammar.

RISC-V: Add support for the XAndesvsintload ISA extension.

This extension defines vector load instructions to move sign-extended or
zero-extended INT4 data into 8-bit vector register elements.

gcc/ChangeLog:

* config/riscv/andes-vector-builtins-bases.cc
(nds_nibbleload): New class.
* config/riscv/andes-vector-builtins-bases.h (nds_vln8): New def.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector-builtins-functions.def (nds_vln8): Ditto.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector.md (@pred_intload_mov<su><mode>): New pattern.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_Q_OPS): New def.
(DEF_RVV_QU_OPS): Ditto.
* config/riscv/riscv-vector-builtins.cc
(q_v_void_const_ptr_ops): New operand information.
(qu_v_void_const_ptr_ops): Ditto.
* config/riscv/riscv-vector-builtins.def (void_const_ptr): New def.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVSINTLOAD_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (NDS_QVI): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c: New test.

RISC-V: Add support for the XAndesvbfhcvt ISA extension.

This patch add support for XAndesvbfhcvt ISA extension.
This extension defines instructions to perform vector floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754 32-bit
single-precision floating-point (SP) data in a vector register.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_BF_16 for XAndesvbfhcvt.
* config.gcc: Add extra_objs andes-vector-builtins-bases.o
and extra_headers andes_vector.h.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): Increase size to 20.
* config/riscv/riscv-vector-builtins.cc
(f32_to_bf16_nf_w_ops): New operand information.
(f32_to_bf16_nf_w_ops): New operand information.
(DEF_RVV_FUNCTION): New def.
* config/riscv/riscv-vector-builtins.def (bf16): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVBFHCVT_EXT.
(required_extensions_specified): Ditto.
* config/riscv/t-riscv: Add andes-vector-builtins-functions.def,
andes-vector-builtins-bases.h and andes-vector-builtins-bases.o.
* config/riscv/vector-iterators.md (NDS_VWEXTBF): New iterator.
(NDS_V_DOUBLE_TRUNC_BF): New attr.
* config/riscv/andes-vector-builtins-bases.cc: New file.
* config/riscv/andes-vector-builtins-bases.h: New file.
* config/riscv/andes-vector-builtins-functions.def: New file.
* config/riscv/andes_vector.h: New file.
* config/riscv/andes-vector.md: New file.
* config/riscv/vector.md: Include andes_vector.md.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add regression for xandesvector.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfwcvtsbf16.c: New test.

RISC-V: Add tt-ascalon-d8 pipeline description

Add pipeline description for the Tenstorrent Ascalon 8 wide CPU.

gcc/ChangeLog
* config/riscv/riscv-cores.def (RISCV_TUNE): Update.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add tt_ascalon_d8.
* config/riscv/riscv.md: Update tune attribute and include
tt-ascalon-d8.md.
* config/riscv/tt-ascalon-d8.md: New file.

Fortran: Implement correct form of PDT constructors [PR84119]

2025-09-06 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/84119
* resolve.cc (reset_array_ref_to_scalar): New function using
chunk broken out from gfc_resolve_ref.
(gfc_resolve_ref): Call the new function, the first time for
PDT type parameters and the second time for LEN inquiry refs.

gcc/testsuite/
PR fortran/84119
* gfortran.dg/pdt_20.f03: Modify to deal with scalar type parm.

phiopt: Improve locations for factor out conditional operation [PR108466]

This improves the locations for the phi args and the newly created statement.
Since this is a factorization/commonizing in one case the location for the new
statement will not always be set correctly either way.
The new locations on the new phi will either be the old location of the argument
to the phi or the location of the defining statement (if it exists).
The new statement will be either the location of the phi or the location
of the defining statements if the location of the phi is unknown.

This fixes the location of an uninitialized variable warning too.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/108466
gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Give better
locations to the new phi args and the new statement.

gcc/testsuite/ChangeLog:

* gcc.dg/uninit-pr108466-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

[RISC-V] Adjust recently added test

The recently added zbb-sext test includes stdint.h and explicitly asks for the
lp64 abi (not lp64d!).

This will fail on a native riscv system as the system headers don't support
lp64 -- they assume "d" is included.

It looks like most tests are including stdint-gcc instead of stdint. Not a fan
of that, but it seems to be how we've been handling this kind of issue to-date.

gcc/testsuite
* gcc.target/riscv/zbb-sext.c: Include stdint-gcc.h instead of
stdint.h.

c++/modules: Fix exported using-directive of imported namespace [PR121702]

Currently we represent exported using-directives as a list of indices
into the namespace array that we stream.  However this list of
namespaces doesn't include any namespaces that we don't expose in this
module's purview, and so we ICE.

This patch reworks the handling to instead use the existing depset
tracking for namespaces directly.  This means that we don't need to
build up a second lookup map when streaming, and we can reuse the logic
in {read,write}_namespace.  We do need to make sure that we create a
depset for namespaces only referenced by a using-directive, though.

I don't expect to be exporting large numbers of using-directives from a
namespace, so for simplicity we stream the names as {parent, target}
pairs.

This also adjusts read handling so that we load the using-directives for
any import (including indirect) if it's in the import list for the
current TU.  Otherwise we run into issues if the using-directive is in
a namespace that is otherwise never referenced in the 'export import'ing
module, because we never walk this namespace and so never know that we
need to emit it.  To do this the patch ensures that we calculate the
import list before read_language is called.

As a drive-by fix, I noticed that with modules 'add_using_namespace'
will add duplicate using-directives because we compare usings against
the target namespace, but we then push a wrapping USING_DECL instead.
This reworks so that the contents of the structure is equivalent between
modules and non-modules code.

PR c++/121702

gcc/cp/ChangeLog:

* module.cc (enum module_state_counts): New counter.
(depset::hash::add_namespace_entities): Seed using-directive
targets for later streaming.
(module_state::write_namespaces): Don't handle using-directives
here.
(module_state::read_namespaces): Likewise.
(module_state::write_using_directives): New function.
(module_state::read_using_directives): New function.
(module_state::write_counts): Log using-directives.
(module_state::read_counts): Likewise.
(module_state::write_begin): Stream using-directives.
(module_state::read_language): Read using-directives if
directly importing.
(module_state::direct_import): Update current TU import list
before calling read_language.
* name-lookup.cc (add_using_namespace): Fix lookup of previous
using-directives.
* parser.cc (cp_parser_import_declaration): Don't set
MK_EXPORTING when performing import_module.

gcc/testsuite/ChangeLog:

* g++.dg/modules/namespace-10_c.C: Add check for log dump.
* g++.dg/modules/namespace-13_a.C: New test.
* g++.dg/modules/namespace-13_b.C: New test.
* g++.dg/modules/namespace-13_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Support ADL on non-discarded GM entities [PR121705]

[basic.lookup.argdep] p4 says that ADL also finds declarations of
functions or function templates from a point of lookup within the
module, only ignoring discarded (or internal) GM entities.

To implement this we need to create bindings for these entities so that
we can guarantee that name lookup will discover they exist. This raises
some complications, though, as we ideally would like to avoid having
bindings that contain no declarations, or emitting GM namespaces that
only contain discarded or internal functions.

This patch does this by additionally creating a new binding whenever we
call make_dependency on a non-EK_FOR_BINDING decl. We don't do this for
using-decls, as at the point of use of a GM entity we no longer know
whether we called through a using-decl or the declaration directly;
however, this behaviour is explicitly supported by [module.global.frag]
p3.6.

Creating these bindings caused g++.dg/modules/default-arg-4_* to fail.
It turns out that this makes the behaviour look identical to
g++.dg/modules/default-arg-5, which is incorrectly dg-error-ing default
value redeclarations (we only currently error because of PR c++/99000).
This patch removes the otherwise identical test and turns the dg-errors
into xfailed dg-bogus.

As a drive-by fix this also fixes an ICE when debug printing friend
function instantiations.

PR c++/121705
PR c++/117658

gcc/cp/ChangeLog:

* module.cc (depset::hash::make_dependency): Make bindings for
GM functions.
(depset::hash::add_binding_entity): Adjust comment.
(depset::hash::add_deduction_guides): Add log.
* ptree.cc (cxx_print_xnode): Handle friend functions where
TI_TEMPLATE is an OVERLOAD or IDENTIFIER.

gcc/testsuite/ChangeLog:

* g++.dg/modules/default-arg-4_a.C: XFAIL bogus errors.
* g++.dg/modules/default-arg-4_b.C: Likewise.
* g++.dg/modules/default-arg-5_a.C: Remove duplicate test.
* g++.dg/modules/default-arg-5_b.C: Likewise.
* g++.dg/modules/adl-9_a.C: New test.
* g++.dg/modules/adl-9_b.C: New test.
* g++.dg/modules/gmf-5.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

libgomp: Use consistent formatting in <omp.h>

I've noticed the new C++ part of omp.h uses libstdc++ coding conventions,
while the rest of the header (and libgomp) is formatted using the normal
gcc coding conventions like gcc/.

This patch makes it consistent.

2025-09-06 Jakub Jelinek <jakub@redhat.com>

* omp.h.in: Fix up formatting of __cplusplus >= 201103L
guarded code from libstc++ style to GCC/libgomp style.

Daily bump.

gcc: PR121757 test needs LTO effective target

gcc/testsuite/ChangeLog:
PR rtl-optimization/121757

* g++.dg/pr121757.C: Add dg-require-effective-target for lto.

Fix uninitialized variable in frontend [PR121806]

gcc/ChangeLog:
PR middle-end/121806
* gcc.cc (for_each_path): Initialize return value.

RISC-V: Check if we can vec_extract [PR121510].

For Zvfhmin a vector mode exists but the corresponding vec_extract does
not. This patch checks that a vec_extract is available and otherwise
falls back to standard handling.

PR target/121510

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Check if we can
vec_extract.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr121510.c: New test.

c: Allow translations of a lot of C FE messages

I've noticed a lot of diagnostic messages in the C FE aren't marked
for translation.
The reason is some weird coding style which wraps the string
literals into (), especially when they don't fit on a single line.
With that fixed, there were 83 unique similar messages
"both %<something%> and %<something%> in declaration specifiers"
marked for translation, which is very unfriendly to translators,
the patch brings that down to 4 (if it was ok to change order,
it could be even 3):
msgid "both %qs and %qs in declaration specifiers"
msgid "both %qs and %<__int%d%> in declaration specifiers"
msgid "both %qs and %<_Float%d%s%> in declaration specifiers"
msgid "both %<__int%d%> and %qs in declaration specifiers"

2025-09-05 Jakub Jelinek <jakub@redhat.com>

* c-decl.cc (pushtag): Remove ()s around string literal
in call to diagnostic function.
(diagnose_mismatched_decls): Likewise.
(c_check_switch_jump_warnings): Likewise.
(grokdeclarator): Likewise.
(warn_cxx_compat_finish_struct): Likewise.
(build_enumerator): Formatting fix.
(declspecs_add_type): Remove ()s around string literal
in call to diagnostic function, simplify
"both %<something%> and %<something%>" starting format
strings to "both %qs and %qs" with appropriate arguments.
Formatting fixes.
* c-typeck.cc (build_external_ref): Remove ()s around string
literal in call to diagnostic function.
(build_conditional_expr): Likewise.
* c-parser.cc (c_parser_transaction): Use G_() around string
literals. Formatting fix.
(c_parser_transaction_expression): Likewise.

rtl-ssa: Maintain clobber_group invariant [PR121757]

In order to reduce time complexity, rtl-ssa groups consecutive
clobbers together.  Each group of clobbers has a splay tree for
lookup and manipulation purposes.

This arrangement means that we might need to split a group (when
inserting a new non-clobber definition between two clobbers) or
to join consecutive groups together (when deleting an intervening
non-clobber definition).  To reduce the time complexity of these updates,
the back pointer from a clobber to its group is only updated lazily.
The invariant is supposed to be that the first clobber, last clobber,
and splay tree root have the right group at all times, whereas other
members of the group can have identifiably stale group pointers.

However, a lack of abstraction meant that only some splay tree lookups
correctly maintained this invariant.  Others did not update the group
pointer after installing a new root.

This patch adds a helper that maintains the invariant and uses it in
three places, one that was already correct and two that were wrong.
The original lookup_clobber is still used in other code that
manipulates groups as a whole.

gcc/
PR rtl-optimization/121757
* rtl-ssa/accesses.h (clobber_group::lookup_clobber): New member
function.
* rtl-ssa/accesses.cc (clobber_group::lookup_clobber): Likewise.
(clobber_group::prev_clobber, clobber_group::next_clobber)
(function_info::add_clobber): Use it.

gcc/testsuite/
PR rtl-optimization/121757
* g++.dg/pr121757.C: New test.

libstdc++: Make join_view::_Iterator::_M_get_inner noexcept [PR121804]

Since this helper (added in r16-3576-g7f7f1878eedd80) is used in the
noexcept-spec of iter_move and iter_swap, it in turn needs an accurate
noexcept-spec.

PR libstdc++/121804

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_M_get_inner):
Mark noexcept.
* testsuite/std/ranges/adaptors/join.cc (test16): New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

cobol: Improved handling of COBOL Special Registers.

COBOL Special Registers (e.g., RETURN-CODE; DEBUG-ITEM) are implemented as
global variables. These changes define them with the prefix "__ggsr__" in
their variable names so that the GDB-COBOL debugger can identify them.

The creation and handling of such variables has been streamlined with the
introduction of the "register_e" cbl_field_t::attr bit.

gcc/cobol/ChangeLog:

* genapi.cc (trace1_init): Prepend two internal variables with
underscore.
(initialize_variable_internal): Use new register_e attribute.
(psa_global): Use "__ggsr__" prefix to identify special registers
(parser_symbol_add): Use new register_e attribute.
* symbols.cc (cbl_field_attr_str): Likewise.
(symbol_table_init): Likewise.
(is_register_field): Eliminated in favor of (attr & register_e).
* symbols.h (is_register_field): Likewise.

libgcobol/ChangeLog:

* common-defs.h (enum cbl_field_attr_t): Define register_e.
* constants.cc (struct cblc_field_t): Define special registers with
"__ggsr__" prefix.

libstdc++: Document remaining C++17 implementation-defined behavior.

This also covers bad_function_call::what from C++11.

libstdc++-v3/ChangeLog:

* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2011.xml: Add entry for bad_function_call.
* doc/xml/manual/status_cxx2017.xml: Add entries for bad_any_cast
and nullptr_t output. Update entry for sf.cmath. Fix stable name for
mem.res.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

testsuite: Fix gcc.dg/torture/pr121695-1.c

This test case fails on int < 32-bit platforms obviously.
This patch undoes the macro expansion from stdint.h.

gcc/testsuite/
PR testsuite/121695
PR testsuite/52641
* gcc.dg/torture/pr121695-1.c: int -> int32_t etc.

libstdc++: Document missing implementation defined behavior for std::filesystem.

libstdc++-v3/ChangeLog:

* doc/html/manual/status.html: Regenerate the file.
* doc/xml/manual/status_cxx2017.xml: Addd more entires.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

AVR: target/121794 - Invoke zero_reg less.

There are some cases where involing zero_reg is not needed and
where there are other sequences with the same efficiency.
An example is to use SBCI R,0 instead of SBC R,__zero_reg__
when R >= R16. This may turn out to be better for small ISRs.

PR target/121794
gcc/
* config/avr/avr.cc (avr_out_compare): Only use zero_reg
when there is no other sequence of the same length.
(avr_out_plus_ext): Same.
(avr_out_plus_1): Same.

Avoid costing vector stmts with count == 0

This avoids confusing the backends.

* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Do not
cost zero remaining scalar stmts.
(vectorizable_slp_permutation): Do not cost zero actual
permutations.
* tree-vect-stmts.cc (vectorizable_load): Likewise.

tree-optimization/121802 - fix vect_setup_realignment

The following avoids looking at STMT_VINFO_VECTYPE in
vect_setup_realignment and instead passes down the relevant vector
type.

PR tree-optimization/121802
* tree-vectorizer.h (vect_setup_realignment): Add vectype
argument.
* tree-vect-data-refs.cc (vect_setup_realignment): Replace
local vectype with argument.
* tree-vect-stmts.cc (vectorizable_load): Adjust.

c++: Fix cxx_eval_cxa_builtin_fn diagnostic message

Marek Polacek reported to me internally that I've messed up one diagnostic
message in this function, with one word before final double quote on one
line and another word right after opening double quote on the next line,
with no space in between.

Fixed thusly.

2025-09-05 Jakub Jelinek <jakub@redhat.com>

* constexpr.cc (cxx_eval_cxa_builtin_fn): Add missing word separating
space into invalid_nargs diagnostics.

testsuite: Fix up fixed-point/bitint-1.c test

This test was written without _BitInt support on any target with
fixed-point support as well, so was actually never tested.

Now that it can be tested on loongarch64-linux, there is a missing
expected error, so this patch adds it.

2025-09-05 Jakub Jelinek <jakub@redhat.com>

* gcc.dg/fixed-point/bitint-1.c: Expect also error about _Sat used
without _Fract/_Accum.

Remove file that shouldn't have been committed.

2025-09-05 Jakub Jelinek <jakub@redhat.com>

* J: Remove.

testsuite, powerpc, v2: Fix vsx-vectorize-* after alignment peeling [PR118567]

On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort. And tests that are not good
> tests require constant maintenance!

Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.

The second one adds further 8 tests, which are dg-do run which #include
the former tests, don't do any dump tests and just define the checking/main
for those.

2025-09-05 Jakub Jelinek <jakub@redhat.com>

PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-9.c: New test.
* gcc.target/powerpc/vsx-vectorize-10.c: New test.
* gcc.target/powerpc/vsx-vectorize-11.c: New test.
* gcc.target/powerpc/vsx-vectorize-12.c: New test.
* gcc.target/powerpc/vsx-vectorize-13.c: New test.
* gcc.target/powerpc/vsx-vectorize-14.c: New test.
* gcc.target/powerpc/vsx-vectorize-15.c: New test.
* gcc.target/powerpc/vsx-vectorize-16.c: New test.

testsuite, powerpc, v2: Fix vsx-vectorize-* after alignment peeling [PR118567]

On Tue, Jul 01, 2025 at 02:50:40PM -0500, Segher Boessenkool wrote:
> No tests become good tests without effort. And tests that are not good
> tests require constant maintenance!

Here are two patches, either just the first one or both can be used
and both were tested on powerpc64le-linux.

The first one removes all the checking etc. stuff from the testcases,
as they are just dg-do compile, for the vectorize dump checks all we
care about are the vectorized loops they want to test.

2025-09-05 Jakub Jelinek <jakub@redhat.com>

PR testsuite/118567
* gcc.target/powerpc/vsx-vectorize-1.c: Remove includes, checking
part of main1 and main.
* gcc.target/powerpc/vsx-vectorize-2.c: Remove includes, replace
bar definition with declaration, remove main.
* gcc.target/powerpc/vsx-vectorize-3.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-4.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-5.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-6.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-7.c: Likewise.
* gcc.target/powerpc/vsx-vectorize-8.c: Likewise.

aarch64: Use SVE for V2DImode integer min/max operations

Unlike Advanced SIMD, SVE has instruction to perform smin, smax, umin, umax
on 64-bit elements.  Thus, we can use them with the fixed-width V2DImode
expander.  Most of the machinery is already there on the define_insn side,
supporting V2DImode operands of the SVE pattern.  We just need to wire up
the RTL emission to the v2di standard names for the TARGET_SVE case.

So for the smin case we now generate:
min_di:
        ldr     q30, [x0]
        ptrue   p7.b, all
        ldr     q31, [x1]
        smin    z30.d, p7/m, z30.d, z31.d
        str     q30, [x2]
        ret

min_imm_di:
        ldr     q31, [x0]
        smin    z31.d, z31.d, #5
        str     q31, [x2]
        ret

instead of the previous:
min_di:
        ldr     q30, [x0]
        ldr     q31, [x1]
        cmgt    v29.2d, v30.2d, v31.2d
        bsl     v29.16b, v31.16b, v30.16b
        str     q29, [x2]
        ret

min_imm_di:
        ldr     q31, [x0]
        mov     z30.d, #5
        cmgt    v29.2d, v30.2d, v31.2d
        bsl     v29.16b, v31.16b, v30.16b
        str     q29, [x2]
        ret

The register operand case is the same length, though the new ptrue can now be
shared and moved away.  But the immediate operand case is obviously better
as the SVE immediate form doesn't require a predicate operand.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* config/aarch64/iterators.md (sve_di_suf): New mode attribute.
* config/aarch64/aarch64-sve.md (<optab><mode>3 SVE_INT_BINARY_MULTI):
Rename to...
(<optab><mode>3<sve_di_suf>): ... This.  Use SVE_I_SIMD_DI mode
iterator.
* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Use the above
for TARGET_SVE.

gcc/testsuite/

* gcc.target/aarch64/sve/usminmax_di.c: New test.

Fortran: Check PDT parameters are of integer type [PR84432, PR114815]

2025-09-04 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/84432
PR fortran/114815
* expr.cc (gfc_check_assign_symbol): Check that components in a
PDT with a default initializer have type and length parameters
that reduce to constant integer expressions.
* trans-expr.cc (gfc_trans_assignment_1): Parameterized
components cannot have default initializers so they must be
allocated after initialization.

gcc/testsuite/
PR fortran/84432
PR fortran/114815
* gfortran.dg/pdt_26.f03: Update with default no initializer.
* gfortran.dg/pdt_27.f03: Change to test non-conforming
initializers.

Fortran: Check PDT parameters are of integer type [PR83762, PR102457]

2025-09-05 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/83762
PR fortran/102457
* decl.cc (gfc_get_pdt_instance): Check that variable PDT parm
expressions are of type integer. Note that the symbol must be
tested since the expression often appears as BT_PROCEDURE.

gcc/testsuite/
PR fortran/83762
PR fortran/102457
* gfortran.dg/pdt_44.f03: New test.
* gfortran.dg/pr95090.f90: Give the PDT parameter a value to
suppress the type error.

Daily bump.

RISC-V: Add test for vec_duplicate + vmadd.vv unsigned combine with GR2VR cost 0, 1 and 15

Add asm dump check and run test for vec_duplicate + vmadd.vvm
combine to vmadd.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vmadd.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vmadd.vv signed combine with GR2VR cost 0, 1 and 15

Add asm dump check and run test for vec_duplicate + vmadd.vv
combine to vmadd.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vmadd.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmadd-run-1-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Adjust the vmacc.vx combine test cases

To avoid generating the vmadd.vx code.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Adjust the
vmacc.vx to avoid generating vmadd.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vmadd.vv to vmadd.vx on GR2VR cost

This patch would like to combine the vec_duplicate + vmadd.vv to the
vmadd.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

Before this patch:
  11   │     beq a3,zero,.L8
  12   │     vsetvli a5,zero,e32,m1,ta,ma
  13   │     vmv.v.x v2,a2
  ...
  16   │ .L3:
  17   │     vsetvli a5,a3,e32,m1,ta,ma
  ...
  22   │     vmadd.vv v1,v2,v3
  ...
  25   │     bne a3,zero,.L3

After this patch:
  11   │     beq a3,zero,.L8
  ...
  14   │ .L3:
  15   │     vsetvli a5,a3,e32,m1,ta,ma
  ...
  20   │     vmadd.vx v1,a2,v3
  ...
  23   │     bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vmacc_vx_<mode>): Rename to
handle both the macc and madd.
(*mul_plus_vx_<mode>): Add madd pattern.
* config/riscv/vector.md (@pred_mul_plus_vx_<mode>): Rename to
handle both the macc and madd.
(*pred_macc_<mode>_scalar_undef): Remove.
(*pred_nmsac_<mode>_scalar_undef): Remove.
(*pred_mul_plus_vx<mode>_undef): Add new pattern to handle
both the vmacc and vmadd.
(@pred_mul_plus_vx<mode>): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Adjust aarch64/spaceship_1.C testcase for recent changes [PR121732]

In r16-3414 libstdc++ changed ABI for (still experimental C++20) and uses
unordered value -128 instead of 2.  Generally the change improved code
generation on all targets tested, see
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693534.html
for details.
In r16-3474 I've adjusted the middle-end and backends to use that value.
This apparently broke the spaceship_1.C test on aarch64 which scans the
exact function bodies which are now different.

The following patch adjusts the full body patterns to match.  On these
2 routines, the generated code is 1 insn longer than in the past, so if
you have ideas how to change the code generation for the common case of
-1, 0, 1, -128 value, maybe it could be improved.

2025-09-04  Jakub Jelinek  <jakub@redhat.com>

PR testsuite/121732
PR target/117013
* g++.target/aarch64/spaceship_1.C: Adjust expected fn bodies
for _Z8ss_floatff and _Z9ss_doubledd.

Fix ICE with auto-fdo and -fpartial-profiling

With -fpartial-profling we ICE building perlbench and gcc from spec2k17 since
afdo_annotate_cfg applies knowlede about zero profiles too early. This patch
moves it after the early exit when profile is 0 everywhere and also fixes
formatting issue in the next block.

gcc/ChangeLog:

* auto-profile.cc (afdo_annotate_cfg): Apply zero_bbs after early
exit for missing profile; fix formating

Fix scalng of auto-fdo profiles in liner

with auto-fdo it is possible that function bar with non-zero profile is inlined
into foo with zero profile and foo is the only caller of it. In this case
we currently scale bar to also have zero profile which makes it optimized
for size. With normal profiles this does not happen, since basic blocks with
non-zero count must have some way to be reached.

This patch makes inliner to scale caller in this case which mitigates the
problem (to some degree).

Bootstrapped/regtested x86_64-linux, plan to commit it shortly.

gcc/ChangeLog:

* ipa-inline-transform.cc (inline_call): If function with
AFDO profile is inlined into function with
GUESSED_GLOBAL0_AFDO or GUESSED_GLOBAL0_ADJUSTED, scale
caller to AFDO profile.
* profile-count.h (profile_count::apply_scale): If num is AFDO
and den is not GUESSED, make result AFDO rather then GUESSED.

MAINTAINERS: Add myself as an aarch64 port reviewer

Following on from the announcement here:
https://gcc.gnu.org/pipermail/gcc/2025-July/246267.html
adding myself as an aarch64 port reviewer.

ChangeLog:

* MAINTAINERS (Reviewers): Add myself for the aarch64 port.

optab: Add optab for isnan

Add an optab for isnan. This requires changes to the existing folding code
to extend the interclass_mathfn infrastructure to support BUILT_IN_ISNAN.
It now checks for a valid optab before emitting the generic expansion.
There is no change if no optab is defined. Update documentation.

gcc:
* builtins.cc (interclass_mathfn_icode): Add support for isnan
optab.
(expand_builtin): Add BUILT_IN_ISNAN to expand isnan optab.
(fold_builtin_interclass_mathfn): Expand BUILT_IN_ISNAN only after
checking for a valid optab.
(fold_builtin_classify): Move generic BUILT_IN_ISNAN expansion
to fold_builtin_interclass_mathfn.
(fold_builtin_1): For BUILT_IN_ISNAN first try fold_builtin_classify,
then fold_builtin_interclass_mathfn.
* optabs.def: Add isnan optab.
* doc/md.texi: Document isnan.

TLC to vect_create_epilog_for_reduction

The following removes back-and-forth of state in
vect_create_epilog_for_reduction and code that's pointless, in
particular around double reduction handling which isn't that
special as it seems.

* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Remove unnecessary code around double reductions.

arm: wrong code from vset_lane_* [PR121775]

Insufficient validation of the operands in vec_set_<mode>_internal
means that the optimizers can transform the exanded code into
something that is invalid. We then emit code based on the incorrect
RTL assuming that it is still valid. A valid pattern can only have a
single bit set in the immediate operand, representing the lane to be
written.

gcc/ChangeLog:

PR target/121775
* config/arm/neon.md (vec_set<mode>_internal, all variants):
validate the immediate operand that indicates the lane to
modify.

gcc/testsuite/ChangeLog:

PR target/121775
* gcc.target/arm/simd/vset_lane_u8.c: New test.

libstdc++: Conditionalize LWG 3569 changes to join_view

LWG 3569 adjusted join_view's iterator specification to handle non
default-constructible iterators by wrapping the corresponding data member
in std::optional, which we followed suit in r13-2649-g7aa80c82ecf3a3.

But this wrapping is unnecessary for iterators that are already
default-constructible.  Rather than unconditionally using std::optional
here, which introduces time/space overhead, this patch conditionalizes
our LWG 3569 changes on the iterator in question being non-forward (and
thus non default-constructible).  We check forwardness instead of
default-constructibility in order to accommodate input-only iterators
that satisfy but do not model default_initializable, e.g. whose default
constructor is underconstrained.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_M_satisfy):
Adjust to handle non-std::optional _M_inner as per before LWG 3569.
(join_view::_Iterator::_M_get_inner): New.
(join_view::_Iterator::_M_inner): Don't wrap in std::optional if
the iterator is forward.  Initialize.
(join_view::_Iterator::operator*): Use _M_get_inner instead
of *_M_inner.
(join_view::_Iterator::operator++): Likewise.
(join_view::_Iterator::iter_move): Likewise.
(join_view::_Iterator::iter_swap): Likewise.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Reuse _Bind_back_t functor in ranges::_Partial

This patch refactors ranges::_Partial to be implemented using _Bind_back_t.
This allows it to benefit from the changes in r16-3398-g250dd5b5604fbc,
specifically making the closure trivially copyable. Since _Bind_back_t
already provides an optimized implementation for a single bound argument,
specializations for _Partial with a single argument are now removed.

We still preserve a specialization of _Partial for trivially copy-constructible
arguments that define only a const overload of operator(). To avoid
re-checking invocability constraints, this specialization calls the now-public,
unconstrained _Binder::_S_call static method instead of the constrained
_Binder::operator().

The primary specialization of _Partial retains its operator(), which
uses a simpler __adaptor_invocable constraint that does not consider
member pointers, as they are not relevant here. This implementation also
calls _Binder::_S_call to avoid re-performing overload resolution and
invocability checks for _Binder::operator().

Finally, the _M_binder member (_Bind_back_t) is now marked
[[no_unique_address]]. This is beneficial as ranges::_Partial is used with
ranges::to, which commonly has zero or empty bound arguments (e.g., stateless
allocators, comparators, or hash functions).

libstdc++-v3/ChangeLog:

* include/bits/binders.h (_Binder::_S_call): Make public.
* include/std/ranges (ranges::_Partial<_Adaptor, _Args...>):
Replace tuple<_Args...> with _Bind_back_t<_Adaptor, _Args...>.
(ranges::_Partial<_Adaptor, _Arg>): Remove.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Move _Binder and related aliases to separate file.

bits/binders.h is already mapped in libstdc++-v3/doc/doxygen/stdheader.cc.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add bits/binders.h
* include/Makefile.in: Add bits/binders.h
* include/std/functional (std::_Indexed_bound_arg, std::_Binder)
(std::__make_bound_args, std::_Bind_front_t, std::_Bind_back_t):
Moved to bits/binders.h file, that is now included.
* include/bits/binders.h: New file.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Merge bind_front and bind_back binders

The _Bind_front and _Bind_back class templates are now merged into a single
_Binder implementation that accepts _Back as a template parameter. This makes
the bind_back implementation available in C++20 mode, allowing it to be used
for range adaptor closures.

With zero bound arguments, bind_back and bind_front have equivalent
functionality. Consequently, _Bind_back_t now produces the same type as
bind_front (_Binder<false, _Fd>). A simple copy of the functor cannot be
returned in this case, as it would visibly affect overload resolution
(see included test cases).

We also replace std::invoke in internal functions, with std::__invoke.

libstdc++-v3/ChangeLog:

* include/std/functional: (std::_Indexed_bound_arg): Fixed
indentation.
(__Bound_arg_storage::_S_apply_front)
(__Bound_arg_storage::_S_apply_front): Merged into _S_apply.
(__Bound_arg_storage::_S_apply): Merged above, add _Back template
parameter, replace std::invoke with std::__invoke.
(std::_Bind_front): Renamed to std::_Binder and add _Back
template parameter.
(std::_Binder): Renamed from std::_Bind_front.
(_Binder::_Result_t, _Binder::_S_noexcept_invoke): Define.
(_Binder::operator()): Use _Result_t and _S_noexcept_invoke.
(_Binder::_S_call): Handle zero args specially, replace std::invoke
with std::__invoke.
(std::_Bind_front_t, std::_Bind_back_t): Defined in terms
of _Binder.
(std::_Bind_back): Merged into _Binder.
* testsuite/20_util/function_objects/bind_back/1.cc: New tests.
* testsuite/20_util/function_objects/bind_back/111327.cc: Updated
error messages.
* testsuite/20_util/function_objects/bind_front/1.cc: New tests.
* testsuite/20_util/function_objects/bind_front/111327.cc: Updated
error messages.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

TLC for vectorizable_reduction

The following removes never taken paths and consolidates the
nested_cycle and double_reduc variables which are the same.

* tree-vect-loop.cc (vectorizable_reduction): Eliminate
nested_cycle in favor of double_reduc and set that where
it makes most sense. Remove never taken paths and always
true conditions.

RISC-V: Use correct target in expand_vec_perm [PR121780].

This fixes a glaring mistake in yesterday's change to the expansion of
vec_perm. We should of course move tmp_target into the real target
and not the other way around. I wonder why my testing hasn't
caught this...

PR target/121742
PR target/121780
PR target/121781

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_perm): Swap target and
tmp_target.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr121780.c: New test.
* gcc.target/riscv/rvv/autovec/pr121781.c: New test.

vect: Use poly_int64 for prolog bound.

Since peeling and version for alignment for VLA modes was introduced
(r16-3065-geee51f9a4b6) we have been seeing a lot of test suite failures
like

internal compiler error: in apply_scale, at profile-count.h:1187

This is because vect_gen_prolog_loop_niters sets the prolog bound to -1
in case align_in_elems is a non-constant poly_int.
bound - 1 is later used to scale the loop profile in scale_loop_profile
so we try to calculate with an assumed -2 iterations.

This patch changes bound_prolog to poly_int64, using a poly estimate for
frequency scaling but only records an iteration bound for the prolog if
the bound is a scalar.

PR/tree-optimization 121523

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_gen_prolog_loop_niters):
Change prolog bound to poly_int64.
(vect_gen_scalar_loop_niters): Ditto.
(vect_do_peeling): Use poly estimate for frequency scaling.

tree-optimization/121768 - bogus double reduction detected

The following changes how we detect double reductions, in particular
not setting vect_double_reduction_def on the outer PHIs when the inner
loop doesn't satisfy double reduction constraints. It also simplifies
the setup a bit by not having to detect wheter we process an inner
loop of a double reduction.

PR tree-optimization/121768
* tree-vect-loop.cc (vect_inner_phi_in_double_reduction_p): Remove.
(vect_analyze_scalar_cycles_1): Analyze inner loops of
double reductions immediately and only mark fully recognized
double reductions. Skip already analyzed inner loops.
(vect_is_simple_reduction): Change double_reduc from a flag
to an output of the inner loop PHI and to whether we are
processing an inner loop of a double reduction.

* gcc.dg/vect/pr121768.c: New testcase.

tree-optimization/121685 - accesses to *this are not trapping

When inside a method then we know the this pointer points to
an object of at least the size of the methods base type. We
can use this to compute more references as not trapping and
enable invariant motion and in turn vectorization as for a
slightly modified version of the testcase in the PR.

PR tree-optimization/121685
* tree-eh.cc (ref_outside_object_p): Split out from ...
(tree_could_trap_p): ... here. Assume the this pointer
of a method refers to an object of at least size of its
base type.

* g++.dg/vect/pr121685-1.cc: New testcase.

forwprop: Improve the reject case for copy prop [PR107051]

Currently the code rejects:
```
tmp = *a;
*b = tmp;
```
(unless *a == *b). This can be improved such that if a and b are known to
share the same base, then only reject it if they overlap; that is the
difference of the offsets (from the base) is maybe less than the size.

This fixes the testcase in comment #0 of PR 107051.

Changes since v1:
* v2: Use ranges_maybe_overlap_p instead of manually checking the overlap.
Allow for the case where the alignment is known to be greater than
the size.

PR tree-optimization/107051

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Allow for
memory sharing the same base if they known not to overlap over
the size.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-prop-aggregate-union-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

libstdc++: Add _GLIBCXX_RESOLVE_LIB_DEFECTS for 4314 in <mdspan>.

In r16-2328-g29d53f6213e0a1 we fixed a bug related to user-defined
objects that can convert to an integers only via an rvalue reference.
The same commit also implemented LWG 4314 [1], but didn't mark it with
_GLIBCXX_RESOLVE_LIB_DEFECTS. This commit adds the missing markers.

[1]: https://cplusplus.github.io/LWG/issue4314

It also fixes one cases of trailing white-space near a ctor for
aligned_accessor.

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_left::mapping::operator()): Add
_GLIBCXX_RESOLVE_LIB_DEFECTS marker for 4314.
(layout_left::mapping::operator()): Ditto.
(layout_stride::mapping::operator()): Ditto.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>

RISC-V: Always register vector built-in functions during LTO [PR110812]

Previously, vector built-in functions were not properly registered during
the LTO pipeline, causing link failures when vector intrinsics were used
in LTO builds with mixed architecture options.  This patch ensures all
vector built-in functions are always registered during LTO compilation.

The key changes include:
- Moving pragma intrinsic flag manipulation from riscv-c.cc to
  riscv-vector-builtins.cc for better encapsulation
- Registering all vector built-in functions regardless of current ISA
  extensions, deferring the actual extension checking to expansion time
- Adding proper support for built-in type registration during LTO

This approach is safe because we already perform extension requirement
checking at expansion time.  The trade-off is a slight increase in
bootstrap time for LTO builds due to registering more built-in functions.

PR target/110812

gcc/ChangeLog:

* config/riscv/riscv-c.cc (pragma_intrinsic_flags): Remove struct.
(riscv_pragma_intrinsic_flags_pollute): Remove function.
(riscv_pragma_intrinsic_flags_restore): Remove function.
(riscv_pragma_intrinsic): Simplify to only call handle_pragma_vector.
* config/riscv/riscv-vector-builtins.cc (pragma_intrinsic_flags):
Move struct definition here from riscv-c.cc.
(riscv_pragma_intrinsic_flags_pollute): Move and adapt from
riscv-c.cc, add zvfbfmin, zvfhmin and vector_elen_bf_16 support.
(riscv_pragma_intrinsic_flags_restore): Move from riscv-c.cc.
(rvv_switcher::rvv_switcher): Add pollute_flags parameter to
control flag manipulation.
(rvv_switcher::~rvv_switcher): Restore flags conditionally.
(register_builtin_types): Use rvv_switcher without polluting flags.
(get_required_extensions): Remove function.
(check_required_extensions): Simplify to only check type validity.
(function_instance::function_returns_void_p): Move implementation
from header.
(function_builder::add_function): Register placeholder for LTO.
(init_builtins): Simplify and handle LTO case.
(reinit_builtins): Remove function.
(handle_pragma_vector): Remove extension checking.
* config/riscv/riscv-vector-builtins.h
(function_instance::function_returns_void_p): Add declaration.
(function_call_info::function_returns_void_p): Remove inline
implementation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/lto/pr110812_0.c: New test.
* gcc.target/riscv/lto/pr110812_1.c: New test.
* gcc.target/riscv/lto/riscv-lto.exp: New test driver.
* gcc.target/riscv/lto/riscv_vector.h: New header wrapper.

RISC-V: Fix extension subset check in riscv_can_inline_p

The extension subset check logic in riscv_ext_is_subset was incorrectly
inverted, causing functions with more extensions to be incorrectly
rejected from being inlined into functions with fewer extensions.

This patch fixes the logic to correctly check if the callee's required
extensions are a subset of the caller's extensions. The corrected logic
now properly allows inlining when the caller has all the extensions that
the callee requires.

gcc/
* common/config/riscv/riscv-common.cc (riscv_ext_is_subset): Fix
inverted logic in extension subset check.

gcc/testsuite/
* gcc.target/riscv/can_inline_p_test-01.c: New test.
* gcc.target/riscv/can_inline_p_test-02.c: New test.
* gcc.target/riscv/can_inline_p_test-03.c: New test.
* gcc.target/riscv/can_inline_p_test-04.c: New test.
* gcc.target/riscv/riscv_vector.h: New header wrapper for vector
tests.

bitint: Fix torture/bitint-14.c on bitint_extended targets [PR117599]

This patch fixes regressions of the gcc.dg/torture/bitint-* tests
caused by r16-3036-ga76a032354ee48 with --enable-checking=all.

The errors are similar to the following:

../../gcc/testsuite/gcc.dg/torture/bitint-14.c:54:1: error: type mismatch in 'array_ref'
<unnamed-signed:63>

unsigned long

_42 = VIEW_CONVERT_EXPR<unsigned long[10]>(r575[i_10])[8];
during GIMPLE pass: bitintlower0
../../gcc/testsuite/gcc.dg/torture/bitint-14.c:54:1: internal compiler error: verify_gimple failed

The first two hunks aren't strictly necessary, I'm just trying to
avoid calling build_qualified_type when it won't be needed.

At least on s390x-linux (tried cross) bitint-14.c doesn't ICE with it
anymore.

Though, I must say the more I look at the limb_access changes, the less
I like the abi_load_p stuff, so I think what we eventually should do instead
is return values with m_limb_type always.
For bitint_extended case (but only if we can prove that the extension there
is for the right precision and right sign) or !write_p just return it,
otherwise cast to lower precision and back to m_limb_type.
And on the other side on stores, for !bitint_extended happily store whatever
the whole m_limb_type value contains, for bitint_extended do the cast to
smaller precision and back on the writes.

2025-09-04 Jakub Jelinek <jakub@redhat.com>

PR target/117599
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Move
build_qualified_type calls into the if/else if/else bodies, for
the last one set ltype to m_limb_type first, drop limb_type_a
and use ltype instead.

tree-optimization/61247 - handle peeled converted IV in SCEV

The following handles SCEV analysis of a peeled converted IV if
that IV is known to not overflow.  For

  # _15 = PHI <_4(6), 0(5)>
  # i_18 = PHI <i_11(6), 0(5)>
  i_11 = i_18 + 1;
  _4 = (long unsigned int) i_11;

we cannot analyze _15 directly since the SCC has a widening
conversion.  But we can analyze _4 to (long unsigned int) {1, +, 1}_1
which is "peeled" (it's from after the first iteration of _15).
If the un-peeled IV {0, +, 1}_1 has the same initial value as _15
and it does not overflow then _15 can be analyzed as
{0ul, +, 1ul}_1.

The following implements this in simplify_peeled_chrec.

PR tree-optimization/61247
* tree-scalar-evolution.cc (simplify_peeled_chrec):
Handle the case of a converted peeled chrec.

* gcc.dg/vect/vect-pr61247.c: New testcase.

tree-optimization/121740 - handle aggregate zeroing as skipped may-def

The following makes value-numbering handle a situation like

  D.58046 = {};
  SR.83_44->i = {};
  pretmp_41 = MEM[(struct _Optional_payload_base &)&D.58046 + 8]._M_engaged;

where the intermediate may-def SR.83_44->i = {} prevents CSE of the
load to zero.  The problem is two-fold here, one is that the code
skipping may-defs does not handle zeroing via a CTOR, the other is that
(partial) must-defs can be better handled by later code as otherwise
we may not find an appropriate definition to CSE to.

I've noticed we fail to guard against storage-order issues, so fixed
that on the fly.

PR tree-optimization/121740
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow skipping
may-defs from CTORs.  Do not skip may-defs with storage-order
issues or (partial) must-defs.

* gcc.dg/tree-ssa/ssa-fre-104.c: Un-XFAIL.
* gcc.dg/tree-ssa/ssa-fre-110.c: New testcase.

libstdc++: Add stable names to C++98 implementation-defined docs.

Stable names are based on C++03 standard document, and some of then were
changed since then.

libstdc++-v3/ChangeLog:

* doc/html/manual/status.html: Regenerated the file.
* doc/xml/manual/status_cxx1998.xml: Add stable name to
each entry.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

c++/modules: Fix ADL [PR117658]

On looking again at [basic.lookup.argdep] p4, I believe GCC hasn't fully
implemented the wording here for ADL.  This patch fixes two issues.

First, 4.3 indicates that a function exported from a named module should
be visible to ADL regardless of whether it's visible to normal name
lookup, as long as some restrictions are followed.

This patch implements this; for skipping declarations that "do not
appear in the TU containing the point of lookup" I don't think there's
anything special we need to do, as any declarations before the point of
lookup will be found in other ways anyway, and any remaining
declarations from the current TU cannot be seen regardless.

Secondly, currently we only add the exported functions along the
instantiation path of a lookup.  But I don't think this is intended by
the current wording, so this patch adjusts that.  I also clean up the
logic to do all different module processing in adl_namespace_fns so that
we don't duplicate work in traversing the module binding list
unnecessarily.

This new handling means we need to do some extra work to properly error
on overload sets containing TU-local entities (as this might actually
come up now!) but I'm leaving that for a later patch.

As a drive-by fix this also fixes an ICE for C++26 expansion statements
with finding the instantiation path.

PR c++/117658

gcc/cp/ChangeLog:

* cp-tree.h (get_originating_module): Adjust parameter names.
* module.cc (path_of_instantiation): Handle C++26 expansion
statements.
* name-lookup.cc (name_lookup::adl_namespace_fns): Handle
exported declarations attached to the same module of an
associated entity with the same innermost non-inline namespace,
and non-exported functions on the instantiation path.
(name_lookup::search_adl): Build mapping of namespace to modules
that associated entities are attached to; remove now-unneeded
instantiation path handling.

gcc/testsuite/ChangeLog:

* g++.dg/modules/adl-4_a.C: Test should pass.
* g++.dg/modules/adl-4_b.C: Test should pass.
* g++.dg/modules/adl-6_a.C: New test.
* g++.dg/modules/adl-6_b.C: New test.
* g++.dg/modules/adl-6_c.C: New test.
* g++.dg/modules/adl-7_a.C: New test.
* g++.dg/modules/adl-7_b.C: New test.
* g++.dg/modules/adl-7_c.C: New test.
* g++.dg/modules/adl-8_a.C: New test.
* g++.dg/modules/adl-8_b.C: New test.
* g++.dg/modules/adl-8_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Mark implicit inline namespaces as purview [PR121724]

When we push an existing namespace within the module purview for the
first time, we also need to mark any parent inline namespaces as purview
to not confuse the streaming logic.

PR c++/121724

gcc/cp/ChangeLog:

* name-lookup.cc (push_namespace): Mark inline namespace
contexts as purview if needed.

gcc/testsuite/ChangeLog:

* g++.dg/modules/namespace-12_a.C: New test.
* g++.dg/modules/namespace-12_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

testsuite, darwin: Suppress unwind frames in scantest-lto.c.

Currently, for Darwin unwind and EH frames are emitted without use
of .cfi_xxx instructions; the emitted frames also contain the
string 'ascii'. For the purpose of this test, omit them.

PR testsuite/112728

gcc/testsuite/ChangeLog:

* gcc.dg/scantest-lto.c: Omit unwind frames.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Daily bump.

RISC-V: Add support for the XAndesbfhcvt ISA extension.

This extension defines instructions to perform scalar floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754
32-bit single-precision floating-point (SP) data in a scalar
floating point register.

gcc/ChangeLog:

* config/riscv/andes.def: Add nds_fcvt_s_bf16 and nds_fcvt_bf16_s.
* config/riscv/riscv.md (truncsfbf2): Add TARGET_XANDESBFHCVT support.
(extendbfsf2): Ditto.
* config/riscv/riscv-builtins.cc: New AVAIL andesbfhcvt.
Add new define RISCV_ATYPE_BF and RISCV_ATYPE_SF.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandes/xandesbfhcvt-1.c: New test.
* gcc.target/riscv/xandes/xandesbfhcvt-2.c: New test.

RISC-V: Add support for the XAndesperf ISA extension.

This patch adds support for the XAndesperf ISA extension.
The 32-bit AndeStar V5 extension includes branch instructions,
load effective address instructions, and string processing
instructions for performance improvement.
New INSN patterns are added into the new file andes.md
as a seprated vender extension.

gcc/ChangeLog:

* config/riscv/constraints.md (Ou07): New constraint.
(ads_Bext): New constraint.
* config/riscv/iterators.md (ANYLE32): New iterator.
(sizen): New iterator.
(sh_limit): New iterator.
(sh_bit): New iterator.
(cs): New iterator.
* config/riscv/predicates.md (ads_branch_bbcs_operand): New predicate.
(ads_branch_bimm_operand): New predicate.
(ads_imm_extract_operand): New predicate.
(ads_extract_size_imm_si): New predicate.
(ads_extract_size_imm_di): New predicate.
(const_int5_operand): New predicate.
* config/riscv/riscv-builtins.cc:
Add new AVAIL andesperf32 and andesperf64.
Add new define RISCV_ATYPE_DI.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
* config/riscv/riscv.cc
(riscv_extend_cost): Cost for pattern 'bfo'.
(riscv_rtx_costs): Cost for XAndesperf extension.
* config/riscv/riscv.md: Add support for XAndesperf to patterns
zero_extendsidi2_internal, zero_extendhi2, extendsidi2_internal,
extend<SHORT:mode><SUPERQI:mode>2, <any_extract:optab><GPR:mode>3
and branch_on_bit.
* config/riscv/vector-iterators.md
(sz): Add sign_extract and zero_extract.
* config/riscv/andes.def: New file for vender Andes.
* config/riscv/andes.md: New file for vender Andes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/riscv.exp: Add runtest for subdir xandes.
* gcc.target/riscv/xandes/xandesperf-1.c: New test.
* gcc.target/riscv/xandes/xandesperf-10.c: New test.
* gcc.target/riscv/xandes/xandesperf-2.c: New test.
* gcc.target/riscv/xandes/xandesperf-3.c: New test.
* gcc.target/riscv/xandes/xandesperf-4.c: New test.
* gcc.target/riscv/xandes/xandesperf-5.c: New test.
* gcc.target/riscv/xandes/xandesperf-6.c: New test.
* gcc.target/riscv/xandes/xandesperf-7.c: New test.
* gcc.target/riscv/xandes/xandesperf-8.c: New test.
* gcc.target/riscv/xandes/xandesperf-9.c: New test.

RISC-V: Add basic XAndes vendor extension support.

This patch add basic support for the following XAndes ISA extensions:

XANDESPERF
XANDESBFHCVT
XANDESVBFHCVT
XANDESVSINTLOAD
XANDESVPACKFPH
XANDESVDOT

gcc/ChangeLog:

* config/riscv/riscv-ext.def: Include riscv-ext-andes.def.
* config/riscv/riscv-ext.opt (riscv_xandes_subext): New variable.
(XANDESPERF) : New mask.
(XANDESBFHCVT): Ditto.
(XANDESVBFHCVT): Ditto.
(XANDESVSINTLOAD): Ditto.
(XANDESVPACKFPH): Ditto.
(XANDESVDOT): Ditto.
* config/riscv/t-riscv: Add riscv-ext-andes.def.
* doc/riscv-ext.texi: Regenerated.
* config/riscv/riscv-ext-andes.def: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandes/xandes-predef-1.c: New test.
* gcc.target/riscv/xandes/xandes-predef-2.c: New test.
* gcc.target/riscv/xandes/xandes-predef-3.c: New test.
* gcc.target/riscv/xandes/xandes-predef-4.c: New test.
* gcc.target/riscv/xandes/xandes-predef-5.c: New test.
* gcc.target/riscv/xandes/xandes-predef-6.c: New test.

Co-author: Lino Hsing-Yu Peng (linopeng@andestech.com)
Co-author: Kai Kai-Yi Weng (kaiweng@andestech.com).

RISC-V: Add pattern for vector-scalar floating-point max

This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an smax RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f       v2,fa0
  vfmax.vv       v1,v1,v2

After, we get only one:
  vfmax.vf       v1,v1,fa0

In some cases, it also shaves off one vsetvli.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vfmax_vf_<mode>): Rename into...
(*vf<optab>_vf_<mode>): New pattern to combine vec_duplicate +
vf{min,max}.vv into vf{max,min}.vf.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: Adjust scan
dump.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmax. Also add
missing scan-dump for vfmul.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Add vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add max functions.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f64.c: New test.

Fortran: fix TRANSFER with rank 1 unlimited polymorphic SOURCE [PR121263]

PR fortran/121263

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_transfer): For an
unlimited polymorphic SOURCE to TRANSFER use saved descriptor
if possible.

gcc/testsuite/ChangeLog:

* gfortran.dg/transfer_class_5.f90: New test.

libstdc++: Implement LWG4222 'expected' constructor from a single value missing a constraint

libstdc++-v3/ChangeLog:

* include/std/expected (expected(U&&)): Add missing constraint
as per LWG 4222.
* testsuite/20_util/expected/lwg4222.cc: New test.

Signed-off-by: Yihan Wang <yronglin777@gmail.com>

[RISC-V][PR target/121213] Avoid unnecessary sign extension in amoswap sequence

This is Austin's work to remove the redundant sign extension seen in pr121213.

--

The .w form of amoswap will sign extend its result from 32 to 64 bits, thus any
explicit sign extension insn doing the same is redundant.

This uses Jivan's approach of allocating a DI temporary for an extended result
and using a promoted subreg extraction to get that result into the final
destination.

Tested with no regressions on riscv32-elf and riscv64-elf and bootstrapped on
the BPI and pioneer systems.

PR target/121213
gcc/
* config/riscv/sync.md (amo_atomic_exchange_extended<mode>):
Separate insn with sign extension for 64 bit targets.

gcc/testsuite
* gcc.target/riscv/amo/pr121213.c: Remove xfail.