git.ipfire.org Git - thirdparty/gcc.git/log

Restore STMT_VINFO_VECTYPE during analysis, set to NULL for all stmts

The following makes vect_analyze_stmt call vectorizable_* with all
STMT_VINFO_VECTYPE NULL_TREE but restores the value for eventual
iteration with single-lane SLP. It clears it for every stmt during
vect_transform_stmt.

* tree-vect-stmts.cc (vect_transform_stmt): Clear
STMT_VINFO_VECTYPE for all stmts.
(vect_analyze_stmt): Likewise. But restore at the end again.

tree-optimization/121754 - ICE with vect_reduc_type and nested cycle

The reduction guard isn't correct, STMT_VINFO_REDUC_DEF also exists
for nested cycles not part of reductions but there's no reduction
info for them.

PR tree-optimization/121754
* tree-vectorizer.h (vect_reduc_type): Simplify to not ICE
on nested cycles.

* gcc.dg/vect/pr121754.c: New testcase.
* gcc.target/aarch64/vect-pr121754.c: Likewise.

Avoid touching STMT_VINFO_VECTYPE in bump_vector_ptr

bump is always specified, so remove the STMT_VINFO_VECTYPE touching
path.

* tree-vect-data-refs.cc (bump_vector_ptr): Remove the
STMT_VINFO_VECTYPE use, bump is always specified.

Pass vectype to vect_check_gather_scatter

The strided-store path needs to have the SLP tree's vector type, so
the following patch passes down the vector type to be used to
vect_check_gather_scatter and adjusts all other callers. This
removes one of the last pieces requiring STMT_VINFO_VECTYPE
during SLP stmt analysis.

* tree-vectorizer.h (vect_check_gather_scatter): Add
vectype parameter.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Get
vectype as parameter.
(vect_analyze_data_refs): Adjust.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Likewise.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Get vectype
as parameter, pass down.
(vect_build_slp_tree_2): Adjust.
* tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Likewise.
(vect_use_strided_gather_scatters_p): Likewise.

libstdc++: Rename __cmp_cat::__unspec to __cmp_cat::__literal_zero.

This slightly improves the readability of the error message by suggesting
that a literal 0 is expected as the argument:
invalid conversion from 'int' to 'std::__cmp_cat::__literal_zero*'
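
The idiom behind the new name can be sketched as follows. This is a
simplified illustration, not the libstdc++ source: literal_zero, ordering,
its member value, and check_literal_zero are hypothetical stand-ins. A type
constructible only from a pointer to itself accepts the literal 0 (a null
pointer constant) but rejects an int variable, which is where an error of
the kind quoted above comes from.

```cpp
#include <cassert>

// Hypothetical stand-in for the renamed tag type: constructible only from a
// pointer to itself, so the literal 0 converts to it but an int does not.
struct literal_zero {
  constexpr literal_zero(literal_zero*) noexcept {}
};

// Hypothetical comparison-category type holding -1, 0 or 1.
struct ordering {
  signed char value;

  friend constexpr bool operator<(ordering o, literal_zero) noexcept {
    return o.value < 0;
  }
  friend constexpr bool operator==(ordering o, literal_zero) noexcept {
    return o.value == 0;
  }
};

// Exercises the comparisons against the literal 0.
void check_literal_zero() {
  assert(ordering{-1} < 0);
  assert(ordering{0} == 0);
  assert(!(ordering{1} < 0));
  // int zero = 0;
  // ordering{-1} < zero;  // would not compile: an int variable is not a
  //                       // null pointer constant
}
```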

libstdc++-v3/ChangeLog:

* libsupc++/compare (__cmp_cat::__literal_zero): Rename
from __unspec.
(__cmp_cat::__unspec): Rename to __literal_zero.
(operator==, operator<, operator>, operator<=, operator>=):
Replace __cmp_cat::__unspec with __cmp_cat::__literal_zero.

doc: Fix sort order for counted_by attribute

gcc/ChangeLog:

* doc/extend.texi (Common Variable Attributes): Put counted_by
in alphabetical order.

tree-cfg: Fix up assign_discriminator ICE with too large #line [PR121663]

As mentioned in the PR, LOCATION_LINE is represented in an int,
and while we have -pedantic diagnostics (and -pedantic-errors error)
for too large #line, we can still overflow into negative line
numbers up to -2 and -1.  We could overflow to that even with valid
source if it, say, has #line 2147483640 and then just has
2G+ lines after it.
Now, the ICE is because assign_discriminator{,s} uses a hash_map
with int_hash <int64_t, -1, -2>, so values -2 and -1 are reserved
for deleted and empty entries.  We just need to make sure those aren't
valid.  One possible fix would be just that
-  discrim_entry &e = map.get_or_insert (LOCATION_LINE (loc), &existed);
+  discrim_entry &e
+    = map.get_or_insert ((unsigned) LOCATION_LINE (loc), &existed);
by adding unsigned cast when the key is signed 64-bit, it will never
be -1 or -2.
But I think that is wasteful, discrim_entry is a struct with 2 unsigned
non-static data members, so for lines which can only be 0 to 0xffffffff
(sure, with wrap-around), I think just using a hash_map with 96bit elts
is better than 128bit.
So, the following patch just doesn't assign any discriminators for lines
-1U and -2U, I think that is fine, normal programs never do that.
Another possibility would be to handle lines -1U and -2U as if they were,
say, -3U.
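
The reasoning about reserved keys can be checked with a small sketch. It
assumes only what the message states: the map reserves -1 and -2 (or -1U
and -2U) for empty and deleted slots, and int is 32 bits wide. The helper
names key_as_int64, gets_discriminator, and check_reserved_keys are
hypothetical and do not appear in tree-cfg.cc.

```cpp
#include <cassert>
#include <cstdint>

// Alternative fix from the message: cast the line to unsigned before
// widening to a signed 64-bit key, so the key is never -1 or -2.
constexpr std::int64_t key_as_int64(int line) {
  return static_cast<std::int64_t>(static_cast<unsigned>(line));
}

// Approach taken by the patch: keep an unsigned key and skip the lines
// that collide with the reserved values -1U and -2U.
constexpr bool gets_discriminator(int line) {
  return static_cast<unsigned>(line) <= -3U;
}

void check_reserved_keys() {
  assert(key_as_int64(-1) == 4294967295LL);  // assumes 32-bit int
  assert(key_as_int64(-2) == 4294967294LL);
  assert(!gets_discriminator(-1));
  assert(!gets_discriminator(-2));
  assert(gets_discriminator(-3));
  assert(gets_discriminator(42));
}
```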

2025-09-02  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/121663
* tree-cfg.cc (assign_discriminator): Change map argument type
from hash_map with int_hash <int64_t, -1, -2> to one with
int_hash <unsigned, -1U, -2U>.  Cast LOCATION_LINE to unsigned.
Return early for (unsigned) LOCATION_LINE above -3U.
(assign_discriminators): Change map type from hash_map with
int_hash <int64_t, -1, -2> to one with int_hash <unsigned, -1U, -2U>.

* gcc.dg/pr121663.c: New test.

testsuite: Fix gcc.dg/tree-ssa/cswtch-[67].c on Solaris/SPARC with as

The gcc.dg/tree-ssa/cswtch-[67].c tests FAIL on Solaris/SPARC with the
native as:

FAIL: gcc.dg/tree-ssa/cswtch-6.c scan-assembler .rodata.cst16
FAIL: gcc.dg/tree-ssa/cswtch-7.c scan-assembler .rodata.cst32

The issue is the same in both cases: compared to the gas version, with
as there's only

- .section .rodata.cst32,"aM",@progbits,32
+ .section ".rodata"

It turns out that varasm.c (mergeable_constant_section) only emits the
former if HAVE_GAS_SHF_MERGE, which is 0 with the native as.

Fixed by xfailing the tests in this case.

Tested on sparc-sun-solaris2.11 with both as and gas.

2025-07-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.dg/tree-ssa/cswtch-6.c (dg-final): xfail on
sparc*-*-solaris2* && !gas.
* gcc.dg/tree-ssa/cswtch-7.c: Likewise.

RISC-V: Remove unused print_ext_doc_entry function [NFC]

The print_ext_doc_entry function and associated version_t struct in
gen-riscv-ext-opt.cc were not being used anywhere in the codebase.
Remove them to clean up the code.

gcc/
* config/riscv/gen-riscv-ext-opt.cc (version_t): Remove unused
struct.
(print_ext_doc_entry): Remove unused function.

Testsuite: Don't test vector-compare-1.C on strict alignment targets

This testcase will fail on strict alignment targets due to
the requirement of doing a possible unaligned load. This fixes
that.

Note this testcase still fails on arm (and maybe riscv) targets: while
they have unaligned loads, those loads are slow.

Pushed as obvious after testing on x86_64-linux-gnu to make sure it
is still testing.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/vector-compare-1.C: Restrict to
non_strict_align targets.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Daily bump.

install: Fix spelling of "support" and "arithmetic"

gcc:
* doc/install.texi (Configuration): Fix spelling of "support"
and "floating-point arithmetic".

Signed-off-by: Jonathan Grant <jg@jguk.org>

Fix assertion when trying to represent Ada arrays in CodeView

The LF_ARRAY CodeView type represents a C- or C++-style array, with a
length known at compile time. We were crashing when using -gcodeview
with Ada (bug #121157), as the DW_AT_upper_bound value is not an
unsigned integer but something more complicated:

0x00000123:   DW_TAG_array_type
                DW_AT_type      (0x0000014d "character")
                DW_AT_sibling   (0x00000142)

0x0000012c:     DW_TAG_subrange_type
                  DW_AT_type    (0x00000142 "integer")
                  DW_AT_lower_bound     (DW_OP_push_object_address, DW_OP_plus_uconst 0x8, DW_OP_deref, DW_OP_deref_size 0x4)
                  DW_AT_upper_bound     (DW_OP_push_object_address, DW_OP_plus_uconst 0x8, DW_OP_deref, DW_OP_plus_uconst 0x4, DW_OP_deref_size 0x4)

It doesn't look like we can represent Ada arrays in CodeView, so return
0 in get_type_num_array_type so that they come through as an unknown
type.

gcc/
* dwarf2codeview.cc (get_type_num_array_type): Don't try to
encode non-C-style arrays.

maintainer-scripts: Improve syncing of libstdc++ docs

rsync generally is a more commonly used tool for syncing data - among
others it retains time stamps and is able to remove orphaned files on
the receiver side.

We just need to exclude some directories and a symlink from being
removed as "orphaned", since they originate elsewhere.

maintainer-scripts:
* update_web_docs_libstdcxx_git: Copy our "inner" documentation
into the web area using rsync instead of cpio and remove orphaned
files.

c: Implement C2Y N3457 - The __COUNTER__ predefined macro

The following patch implements the
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3457.htm
paper without the first 3 lines in Recommended practice.
Seems GCC behavior already matches the expected behavior except for
diagnostics of more than 2147483648 __COUNTER__ expansions, so the
patch adds a diagnostic for that (but no testcase, because
#define A __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__
#define B A A A A A A A A
#define C B B B B B B B B
#define D C C C C C C C C
#define E D D D D D D D D
#define F E E E E E E E E
#define G F F F F F F F F
#define H G G G G G G G G
#define I H H H H H H H H
#define J I I I I I I I I
J J J J
__COUNTER__
just takes too long to preprocess).
Plus I've included all the snippets from the paper into one testcase.
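
As a small illustration of the behaviour the paper standardises, the sketch
below relies only on __COUNTER__ yielding successive integers on each
expansion within a translation unit. It is not taken from the new testcase,
and it is written in C++ although the macro behaves the same way in C; the
names first, second, and check_counter are chosen for illustration.

```cpp
#include <cassert>

// Each expansion of __COUNTER__ yields the next integer; the starting value
// depends on how many expansions came earlier in the translation unit.
constexpr int first = __COUNTER__;
constexpr int second = __COUNTER__;

void check_counter() {
  assert(second == first + 1);
}
```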

2025-09-01 Jakub Jelinek <jakub@redhat.com>

* macro.cc: Implement C2Y N3457 - The __COUNTER__ predefined macro.
(_cpp_builtin_macro_text): Diagnose if __COUNTER__ reaches
2147483648 value.

* gcc.dg/cpp/c2y-counter-1.c: New test.

c: Rename uimaxabs to umaxabs

The following patch implements
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3577.txt
No big deal on the GCC side, for uimaxabs we just won't
recognize it as builtin and I don't see it worth preserving
__builtin_uimaxabs, I doubt anything but gcc testsuite used
that.
But on the glibc side I think it will need to remain exported
for ABI compatibility :(

2025-09-01 Jakub Jelinek <jakub@redhat.com>

* builtins.def: Implement C2Y N3577 - Rename s/uimaxabs/umaxabs/.
(BUILT_IN_UIMAXABS): Rename to ...
(BUILT_IN_UMAXABS): ... this. Change second argument to "umaxabs".
* builtins.cc (fold_builtin_1): Use BUILT_IN_UMAXABS rather than
BUILT_IN_UIMAXABS.

* gcc.c-torture/execute/builtins/lib/abs.c (uimaxabs): Rename to ...
(umaxabs): ... this.
* gcc.c-torture/execute/builtins/uabs-2.c (uimaxabs): Rename to ...
(umaxabs): ... this.
(main_test): Use umaxabs instead of uimaxabs.
* gcc.c-torture/execute/builtins/uabs-3.c (main_test): Use umaxabs
instead of uimaxabs.

Fortran: truncate constant string passed to character,value dummy [PR121727]

PR fortran/121727

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_const_length_character_type_p): New helper
function.
(conv_dummy_value): Use it to determine if a character actual
argument has a constant length. If a character actual argument is
constant and longer than the dummy, truncate it at compile time.

gcc/testsuite/ChangeLog:

* gfortran.dg/value_10.f90: New test.

doc: Update perfwiki web address

gcc:
* doc/invoke.texi (Optimize Options): Update the perfwiki web
address.

diagnostics: Fix bootstrap fail on Darwin 32b hosts.

The use of HOST_SIZE_T_PRINT_HEX needs to be paired with a C-style
cast to (fmt_size_t); otherwise the detection mechanisms in hwint.h
are not sufficient to deal with size_t defined as 'long unsigned int',
which is done on Darwin (and I think on Windows).

This patch just makes that update.

gcc/ChangeLog:

* diagnostics/logging.h (log_param_location_t): Cast
location_t value to fmt_size_t.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

configure, Darwin: Do not claim .cfi_xxx instruction support.

While the assemblers used by Darwin that are based on LLVM, do
support .cfi_ instructions, their use triggers production of
compact unwind which currently does not interwork properly with
GCC's output.

When the system objdump is used in the configure process this is
currently working by good fortune (the objdump does not recognise
the command and we fail to detect the cfi_advance).

However, if a user has binutils objdump earlier in their PATH then
we will detect support and try to use .cfi_ which will cause later
and hard-to-diagnose issues.

Until we have this resolved, force cfi instruction use off for
Darwin.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Do not claim cfi instruction support even
if the assembler has it.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

PR target/89828 Internal compiler error on "-fno-omit-frame-pointer"

The problem was caused by an erroneous note about creating a stack frame,
which caused the cur_cfa reg to fail to assert with a value other than
the frame pointer.

This fix will generate notes that correctly update cur_cfa.

v2 changes.
Add testcase.
All tests that failed with
"internal compiler error: in dwarf2out_frame_debug_adjust_cfa, at dwarf2cfi.cc"
now pass.

PR target/89828
gcc
* config/rx/rx.cc (add_pop_cfi_notes): Release the frame pointer if it is
used.
(rx_expand_prologue): Redesigned stack pointer and frame pointer update
process.

gcc/testsuite/
* gcc.dg/pr89828.c: New.

Add default arch/tuning to shift-gf2p8affine test cases

This makes them not fail during test suite runs with overridden arch or
tunings.

gcc/testsuite/ChangeLog:

* gcc.target/i386/shift-gf2p8affine-1.c: Use -march=x86-64
-mtune=generic.
* gcc.target/i386/shift-gf2p8affine-2.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-3.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-5.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-6.c: Ditto.
* gcc.target/i386/shift-gf2p8affine-7.c: Ditto.

testsuite: arm: factorize arm_v8_neon_ok flags

Like we do in other effective-targets, add
"-mcpu=unset -march=armv8-a"
directly when setting et_arm_v8_neon_flags in arm_v8_neon_ok_nocache,
to avoid having to add these two flags in all users of arm_v8_neon_ok.

This avoids duplication and possible typos / oversights.

gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_neon_ok_nocache): Add "-mcpu=unset
-march=armv8-a" to et_arm_v8_neon_flags.
(add_options_for_vect_early_break): Remove useless "-mcpu=unset
-march=armv8-a".
(add_options_for_arm_v8_neon): Likewise.

testsuite: arm: remove arm32 check from a few effective-targets

A few arm effective-targets call check_effective_target_arm32 even
though they would force a -march=XXX flag which supports Arm and/or
Thumb-2, thus making the arm32 check useless. This has an impact when
the toolchain is configured with a default -march or -mcpu which
supports Thumb-1 only: in such a case, arm32 is false and we skip many
tests, thus reducing coverage.

This patch removes the call to check_effective_target_arm32 where it
is useless, enabling about 2000 tests.

In addition, add an early exit if the target is not an arm one, thus
saving a few compilation cycles where not needed. In all callers of
arm_neon_ok, remove the now useless "istarget arm*-*-*".

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_neon_ok_nocache): Remove arm32 check.
Add istarget arm*-*-* check.
(check_effective_target_arm_neon_fp16_ok_nocache): Likewise.
(check_effective_target_arm_neon_softfp_fp16_ok_nocache): Likewise.
(check_effective_target_arm_v8_neon_ok_nocache): Likewise.
(check_effective_target_arm_neonv2_ok_nocache): Likewise.
(check_effective_target_vect_pack_trunc): Remove istarget arm*-*-*
check.
(check_effective_target_vect_unpack): Likewise.
(check_effective_target_vect_condition): Likewise.
(check_effective_target_vect_cond_mixed): Likewise.
(available_vector_sizes): Likewise.

tree-optimization/121744 - handle CST << var in shift pattern recog

We currently do not handle promotion/demotion of 'var' when the
left operand of a variable shift is constant. There's no good
reason why, so the following fixes this omission.
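
A loop of roughly the following shape has a constant left operand and a
variable shift count of a narrower type. This is a hypothetical
illustration, not the contents of pr121744-1.c; shift_constant and
check_shift_constant are made-up names, a 32-bit unsigned is assumed, and
whether a given target actually vectorizes the loop is not claimed here.

```cpp
#include <cassert>
#include <cstddef>

// out[i] = CST << var: the shift count is loaded as an unsigned char and
// has to be promoted to the width of the shifted constant.
// Callers must keep every count below 32 to avoid undefined behaviour.
void shift_constant(unsigned* out, const unsigned char* count, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = 1u << count[i];
}

void check_shift_constant() {
  const unsigned char count[4] = {0, 1, 4, 31};
  unsigned out[4] = {};
  shift_constant(out, count, 4);
  assert(out[0] == 1u && out[1] == 2u && out[2] == 16u);
  assert(out[3] == 0x80000000u);
}
```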

PR tree-optimization/121744
* tree-vect-patterns.cc (vect_recog_vector_vector_shift_pattern):
Allow constant left operand.

* gcc.dg/vect/pr121744-1.c: New testcase.

Eliminate some STMT_VINFO_REDUC_IDX for SLP_TREE_REDUC_IDX

The following uses SLP_TREE_REDUC_IDX where it looks more appropriate.

* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use SLP_TREE_REDUC_IDX for following the SLP graph and
for identifying whether we use the 'else' in a COND.
(vectorizable_lane_reducing): Simplify check of whether
we are in a reduction.
(vectorizable_reduction): Add sanity checking around
SLP_TREE_REDUC_IDX and use it where it looks appropriate.
(vect_transform_reduction): Use SLP_TREE_REDUC_IDX.
* tree-vect-stmts.cc (vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_condition): Likewise.

Remove no longer needed STMT_VINFO_REDUC_DEF sets

The following removes no longer needed extra sets of STMT_VINFO_REDUC_DEF
and replaces a single remaining one with a more appropriate check.

* tree-vect-loop.cc (vectorizable_live_operation): Check
vect_is_reduction on the SLP node rather than
STMT_VINFO_REDUC_DEF on the stmt.
(vectorizable_reduction): Do not set STMT_VINFO_REDUC_DEF
on live stmts.

Introduce abstraction for vect reduction info, tracked from SLP nodes

While we have already the accessor info_for_reduction, its result
is a plain stmt_vec_info.  The following turns that into a class
for the purpose of changing accesses to reduction info to a new
set of accessors prefixed with VECT_REDUC_INFO and removes
the corresponding STMT_VINFO prefixed accessors where possible.

There are a few reduction-related things that are used by scalar
cycle detection and thus have to stay as-is for now and as
copies in future.

This also separates reduction info into one object per reduction
and associates it with SLP nodes, splitting it out from
stmt_vec_info, retaining (and duplicating) parts used by scalar
cycle analysis.  The data is then associated with SLP nodes
forming reduction cycles and accessible via info_for_reduction.
The data is created at SLP discovery time as we look at it even
pre-vectorizable_reduction analysis, but most of the data is
only populated by the latter.  There is no reduction info with
nested cycles that are not part of an outer reduction.
In the process this adds cycle info to each SLP tree, notably
the reduc-idx and a way to identify the reduction info.

* tree-vectorizer.h (vect_reduc_info): New.
(create_info_for_reduction): Likewise.
(VECT_REDUC_INFO_TYPE): Likewise.
(VECT_REDUC_INFO_CODE): Likewise.
(VECT_REDUC_INFO_FN): Likewise.
(VECT_REDUC_INFO_SCALAR_RESULTS): Likewise.
(VECT_REDUC_INFO_INITIAL_VALUES): Likewise.
(VECT_REDUC_INFO_REUSED_ACCUMULATOR): Likewise.
(VECT_REDUC_INFO_INDUC_COND_INITIAL_VAL): Likewise.
(VECT_REDUC_INFO_EPILOGUE_ADJUSTMENT): Likewise.
(VECT_REDUC_INFO_FORCE_SINGLE_CYCLE): Likewise.
(VECT_REDUC_INFO_RESULT_POS): Likewise.
(VECT_REDUC_INFO_VECTYPE): Likewise.
(STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL): Remove.
(STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT): Likewise.
(STMT_VINFO_FORCE_SINGLE_CYCLE): Likewise.
(STMT_VINFO_REDUC_FN): Likewise.
(STMT_VINFO_REDUC_VECTYPE): Likewise.
(vect_reusable_accumulator::reduc_info): Adjust.
(vect_reduc_type): Adjust.
(_slp_tree::cycle_info): New member.
(SLP_TREE_REDUC_IDX): Likewise.
(vect_reduc_info_s): Move/copy data from ...
(_stmt_vec_info): ... here.
(_loop_vec_info::reduc_infos): New member.
(info_for_reduction): Adjust to take SLP node.
(vect_reduc_type): Adjust.
(vect_is_reduction): Add overload for SLP node.
* tree-vectorizer.cc (vec_info::new_stmt_vec_info):
Do not initialize removed members.
(vec_info::free_stmt_vec_info): Do not release them.
* tree-vect-stmts.cc (vectorizable_condition): Adjust.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
cycle info.
(vect_build_slp_tree_2): Compute SLP reduc_idx and store
it.  Create, populate and propagate reduction info.
(vect_print_slp_tree): Print cycle info.
(vect_analyze_slp_reduc_chain): Set cycle info on the
manual added conversion node.
(vect_optimize_slp_pass::start_choosing_layouts): Adjust.
* tree-vect-loop.cc (_loop_vec_info::~_loop_vec_info):
Release reduction infos.
(info_for_reduction): Get the reduction info from
the vector in the loop_vinfo.
(vect_create_epilog_for_reduction): Adjust.
(vectorizable_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise, deal with nested
cycles not part of a double reduction having no reduction info.
* config/aarch64/aarch64.cc (aarch64_force_single_cycle):
Use VECT_REDUC_INFO_FORCE_SINGLE_CYCLE, get SLP node and use
that.
(aarch64_vector_costs::count_ops): Adjust.

install.texi: For amdgcn, update Newlib version recommendation

Add two Newlib commits to the recommended Newlib version,
fixing two other SIMD issues.
Cf. PR target/121392 and Newlib Bug 33272.

gcc/ChangeLog:

PR target/121392
* doc/install.texi (amdgcn): Mention Newlib commit
that fixes another SIMD issue.

Simplify vectorizer IV analysis

The following simplifies the flow of IV analysis a bit.

* tree-vect-loop.cc (vect_is_simple_iv_evolution): Get
stmt_info and store into STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
and STMT_VINFO_LOOP_PHI_EVOLUTION_PART here. Drop unused
output parameters.
(vect_is_nonlinear_iv_evolution): Likewise.
(vect_analyze_scalar_cycles_1): Remove redundant setting
of STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED and
STMT_VINFO_LOOP_PHI_EVOLUTION_PART.

ira: Remove soft conflict related code in improve_allocation. [PR117838]

The original intention of this code was to allow more allocnos
to share the same register, but this led to expensive allocno
overflows. Extracted a small case (a bit large, see Bugzilla
PR117838 for details) from 548.exchange2_r to analyze this
register allocation issue.

Before improve_allocation function:

a537 (cost 1896, reg42)
a20  (cost 270, reg1)
a13  (cost 144, spill)
a551 (cost 70, reg40)
a5   (cost 43, spill)
a493 (cost 30, reg42)
a499 (cost 12, reg40)

------------------------------
Dump info in improve_allocation function:

Base:
Spilling a493r125 for a5r113
Spilling a573r202 for a5r113
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a20r237 for a551r120

With patch:
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a493r125 for a551r120
------------------------------

After assign_hard_reg (at the end of improve_allocation):

Base:
a537 (cost 1896, reg1)
a20  (cost 270, spill) -----> This is unreasonable
a13  (cost 144, reg40)
a551 (cost 70, reg1)
a5   (cost 43, reg42)
a493 (cost 30, spill)
a499 (cost 12, reg1)

With patch:
a537 (cost 1896, reg42)
a20  (cost 270, reg1)
a13  (cost 144, reg40)
a551 (cost 70, reg42)
a5   (cost 43, spill)
a493 (cost 30, spill)
a499 (cost 12, reg42)
-----------------------------

Collected spec2017 performance on Znver3/Graviton4/EMR/SRF for O2 and Ofast.
No performance regression was observed.

For multi-copy O2:
SRF: 548.exchange2_r increased by 7.5%, 500.perlbench_r increased by 2.0%.
EMR: 548.exchange2_r increased by 4.5%, 500.perlbench_r increased by 1.7%.
Graviton4: 548.exchange2_r increased by 2.2%, 511.povray_r increased by 2.8%.
Znver3: 500.perlbench_r increased by 2.0%.

gcc/ChangeLog:

PR rtl-optimization/117838
* ira-color.cc (improve_allocation): Remove soft conflict related code.

Fix ICE due to wrong operand being passed to ix86_vgf2p8affine_shift_matrix.

1) Fix predicate of operands[3] in cond_<insn><mode> since only
const_vec_dup_operand is accepted for masked operations, and pass real
count to ix86_vgf2p8affine_shift_matrix.

2) Pass operands[2] instead of operands[1] to
gen_vgf2p8affineqb_<mode>_mask which expects the operand to be shifted,
but operands[1] is the mask operand in cond_<insn><mode>.

gcc/ChangeLog:

PR target/121699
* config/i386/predicates.md (const_vec_dup_operand): New
predicate.
* config/i386/sse.md (cond_<insn><mode>): Fix predicate of
operands[3], and fix wrong operands passed to
ix86_vgf2p8affine_shift_matrix and
gen_vgf2p8affineqb_<mode>_mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr121699.c: New test.

Daily bump.

xtensa: Optimize branch whether (reg:SI) is within/out the range handled by CLAMPS instruction

The CLAMPS instruction in Xtensa ISA, provided when the TARGET_CLAMPS
configuration is enabled (and also requires TARGET_MINMAX), returns the
value in the specified register clamped to between -(1<<N) and
(1<<N)-1 inclusive, where N is an immediate value from 7 to 22.

Therefore, when the above configurations are met, by comparing the clamped
result with the original value for equality, branching whether the value
is within the range mentioned above or not is implemented with fewer
instructions, especially when the upper and lower bounds of the range are
too large to fit into a single immediate assignment.

     /* example (TARGET_MINMAX and TARGET_CLAMPS) */
     extern void foo(void);
     void test0(int a) {
       if (a >= -(1 << 9) && a < (1 << 9))
         foo();
     }
     void test1(int a) {
       if (a < -(1 << 20) || a >= (1 << 20))
         foo();
     }

     ;; before
     test0:
      entry sp, 32
      addmi a2, a2, 0x200
      movi a8, 0x3ff
      bltu a8, a2, .L1
      call8 foo
     .L1:
      retw.n
     test1:
      entry sp, 32
      movi.n a9, 1
      movi.n a8, -1
      slli a9, a9, 20
      srli a8, a8, 11
      add.n a2, a2, a9
      bgeu a8, a2, .L4
      call8 foo
     .L4:
      retw.n

     ;; after
     test0:
      entry sp, 32
      clamps a8, a2, 9
      bne a2, a8, .L1
      call8 foo
     .L1:
      retw.n
     test1:
      entry sp, 32
      clamps a8, a2, 20
      beq a2, a8, .L4
      call8 foo
     .L4:
      retw.n

(note: Currently, in the RTL instruction combination pass, the possible
const_int values are fundamentally constrained by
TARGET_LEGITIMATE_CONSTANT_P() if no bare large constant assignments are
possible (i.e., neither -mconst16 nor -mauto-litpools), so limiting N to
a range of 7 to only 10 instead of to 22.  A series of forthcoming
patches will introduce an entirely new "xt_largeconst" pass that will
solve several issues including this.)
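
The equivalence the new pattern relies on can be modelled in a few lines.
This is a sketch of the arithmetic only, following the description of
CLAMPS above; clamps, in_clamps_range, and check_clamps are hypothetical
helper names and not part of the port.

```cpp
#include <cassert>

// Models the described CLAMPS result: x limited to [-(1<<n), (1<<n)-1].
constexpr int clamps(int x, int n) {
  return x < -(1 << n)      ? -(1 << n)
         : x > (1 << n) - 1 ? (1 << n) - 1
                            : x;
}

// x lies inside the range exactly when clamping leaves it unchanged, so a
// single equality test replaces the two-sided range comparison.
constexpr bool in_clamps_range(int x, int n) {
  return clamps(x, n) == x;
}

void check_clamps() {
  assert(in_clamps_range(-512, 9));
  assert(in_clamps_range(511, 9));
  assert(!in_clamps_range(512, 9));
  assert(!in_clamps_range(-513, 9));
  assert(!in_clamps_range(1 << 20, 20));
}
```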

gcc/ChangeLog:

* config/xtensa/predicates.md (alt_ubranch_operator):
New predicate.
* config/xtensa/xtensa.md (*eqne_in_range):
New insn_and_split pattern.

Fortran: Pass PDTs to dummies with VALUE attribute [PR99709]

2025-08-31 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/99709
* trans-array.cc (structure_alloc_comps): For the case
COPY_ALLOC_COMP, do a deep copy of non-allocatable PDT arrays.
Suppress the use of 'duplicate_allocatable' for PDT arrays.
* trans-expr.cc (conv_dummy_value): When passing to a PDT dummy
with the VALUE attribute, do a deep copy to ensure that
parameterized components are reallocated.

gcc/testsuite/
PR fortran/99709
* gfortran.dg/pdt_41.f03: New test.

[RISC-V] Improve initial RTL generation for SImode adds on rv64

So this is the next chunk of Shreya's work to adjust our add expanders. In this
patch we're adding support for adding a 2*s12 immediate in SI for rv64.

To recap, the basic idea is to reduce our reliance on the define_insn_and_split
that was added a year or so ago by synthesizing the more efficient sequence at
expansion time.  By handling this early rather than late the synthesized
sequence participates in the various optimizer passes in the natural way.  In
contrast using the define_insn_and_split bypasses the cost modeling in combine
and hides the synthesis until after reload has completed (which in turn leads to
the problems seen in pr120811).

This doesn't solve pr120811, but it is the last prerequisite patch before
directly tackling pr120811.

This has been bootstrapped & regression tested on the pioneer & bpi and been
through the usual testing on riscv32-elf and riscv64-elf.  Waiting on
pre-commit CI before moving forward.
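
One way to read "2*s12" is a constant that does not fit a single signed
12-bit immediate but can be written as the sum of two. The sketch below
illustrates that reading only; it is an assumption, not the code of
synthesize_add_extended, and two_imms, fits_s12, fits_two_s12,
split_two_s12, and check_split are hypothetical names.

```cpp
#include <cassert>

struct two_imms {
  long first;
  long second;
};

// A signed 12-bit immediate covers [-2048, 2047].
constexpr bool fits_s12(long v) { return v >= -2048 && v <= 2047; }

// Sums of two such immediates cover [-4096, 4094].
constexpr bool fits_two_s12(long v) { return v >= -4096 && v <= 4094; }

// Take the extreme s12 value first and leave the remainder for the second
// add; for any v accepted by fits_two_s12 both parts fit in s12.
constexpr two_imms split_two_s12(long v) {
  return v < 0 ? two_imms{-2048, v + 2048} : two_imms{2047, v - 2047};
}

void check_split() {
  assert(!fits_s12(3000) && fits_two_s12(3000));
  assert(split_two_s12(3000).first == 2047);
  assert(split_two_s12(3000).second == 953);
  assert(split_two_s12(-4096).first == -2048);
  assert(split_two_s12(-4096).second == -2048);
  assert(!fits_two_s12(4095));
}
```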

gcc/
* config/riscv/riscv-protos.h (synthesize_add_extended): Prototype.
* config/riscv/riscv.cc (synthesize_add_extended): New function.
* config/riscv/riscv.md (addsi3): For RV64, try synthesize_add_extended.

gcc/testsuite/
* gcc.target/riscv/add-synthesis-2.c: New test.

install: Drop MinGW binaries download link

This has been unavailable for well over a year.

gcc:
* doc/install.texi (Binaries): Drop MinGW.

libstdc++: Update link to Boost "Exception-Safety"

libstdc++-v3:
* doc/xml/manual/using_exceptions.xml: Update link to
Boost's "Exception-Safety"
* doc/html/manual/using_exceptions.html: Rebuild.

libstdc++: Fix bootstrap failures in src/c++26/debugging.cc

ptrace on Darwin requires <sys/types.h>.

The inline x86 asm doesn't work with the Solaris assembler.

libstdc++-v3/ChangeLog:

* src/c++26/debugging.cc [_GLIBCXX_HAVE_SYS_PTRACE_H]: Include
<sys/types.h>.
(breakpoint) [__i386__ || __x86_64__]: Use "int 0x03" instead of
"int3".

RISC-V: Add test case for unsigned scalar SAT_MUL form 4

The form 4 of unsigned scalar SAT_MUL is covered in middle-expand
already; add a test case here to cover form 4.

The test suites below pass for this patch series.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u64.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

phiopt, math-opts: Adjust spaceship_replacement and optimize_spaceship for recent libstdc++ changes [PR121698]

libstdc++ changed its ABI in <compare> for C++20 recently (under the
C++20 is still experimental rule).  In addition to the -1, 0, 1 values
for less, equal, greater it now uses -128 for unordered instead of
former 2 and changes some of the operators, instead of checks like
(_M_value & ~1) == _M_value in some cases it now uses _M_reverse()
which is negation in unsigned char type + conversion back to the original
type. _M_reverse() thus turns the -1, 0, 1, -128 values into
1, 0, -1, -128.  Note libc++ uses value -127 instead of 2/-128.

Now, the middle-end has some optimizations which rely on the particular
implementation and don't optimize if not.  One is optimize_spaceship
which on some targets (currently x86, aarch64 and s390) attempts to use
better comparison instructions (ideally just one floating point comparison
to get all 4 possible outcomes plus some flag tests or magic instead of
2 or 3 floating point comparisons).  This one can actually handle
arbitrary int non-[-1,1] values for unordered but still has a default
of 2.  The patch changes that default to -128 so that even if something
is expanded as branches if it is later during RTL optimizations determined
to convert that into partial_ordering we get better code.

The other optimization (phiopt one) is about optimizing (x <=> y) < 0
etc. into just x < y.  This one actually relies on the exact unordered
value (2) and has code to deal with that (_M_value & ~1) == _M_value
kind of tests and whatever match.pd lowers it.  So, this patch partially
rewrites it to look for -128 instead of 2, drop those
(_M_value & ~1) == _M_value pattern recognitions and instead introduces
pattern recognition of _M_reverse(), i.e. cast to unsigned char, negation
in that type and cast back to the original signed type.

With all these changes we get back the desired optimizations for all
the cases we could optimize previously (note, for the HONOR_NANS case
we don't try to optimize say (x <=> y) == 0 because the original
will raise an exception if either x or y is a NaN, while turning it into
x == y will not, but (x <=> y) <= 0 is fine (x <= y), because it
does raise those exceptions).
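
A minimal sketch of the value-level equivalence the phiopt transformation relies on, assuming the -1/0/1/-128 encoding described above. All function names are hypothetical, and the sketch models only the results, not the floating-point exception behaviour discussed in the previous paragraph.

```cpp
#include <cassert>
#include <limits>

// Assumption: -1, 0, 1 for less/equal/greater and -128 for unordered,
// as described in the commit message above.
constexpr int kUnordered = -128;

// Hypothetical model of (x <=> y) for doubles, returning the raw value.
int spaceship_value(double x, double y)
{
  if (x < y) return -1;
  if (x == y) return 0;
  if (x > y) return 1;
  return kUnordered;  // at least one operand is a NaN
}

// (x <=> y) < 0 holds only for "less", so it can become x < y.
bool less_via_spaceship(double x, double y)
{
  return spaceship_value(x, y) == -1;
}

// (x <=> y) <= 0 holds for "less" or "equal", so it can become x <= y.
bool less_equal_via_spaceship(double x, double y)
{
  int v = spaceship_value(x, y);
  return v == -1 || v == 0;
}

// Compare both forms on ordered and unordered inputs.
bool forms_agree()
{
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double samples[] = {-1.0, 0.0, 2.5, nan};
  for (double x : samples)
    for (double y : samples)
      if (less_via_spaceship(x, y) != (x < y)
          || less_equal_via_spaceship(x, y) != (x <= y))
        return false;
  return true;
}
```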

2025-08-30  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/121698
* tree-ssa-phiopt.cc (spaceship_replacement): Adjust
to handle spaceship unordered value -128 rather than 2 and
stmts from the new std::partial_ordering::_M_reverse() instead
of (_M_value & ~1) == _M_value etc.
* doc/md.texi (spaceship@var{m}4): Use -128 instead of 2.
* tree-ssa-math-opts.cc (optimize_spaceship): Adjust comments
that libstdc++ unordered value is -128 rather than 2 and use
that as the default unordered value.
* config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Use
GEN_INT (-128) instead of const2_rtx and adjust comment accordingly.
* config/aarch64/aarch64.cc (aarch64_expand_fp_spaceship): Likewise.
* config/s390/s390.cc (s390_expand_fp_spaceship): Likewise.

* gcc.dg/pr94589-2.c: Adjust for expected unordered value -128
rather than 2 and negations in unsigned char instead of and with
~1 and comparison against original value.
* gcc.dg/pr94589-4.c: Likewise.
* gcc.dg/pr94589-5.c: Likewise.
* gcc.dg/pr94589-6.c: Likewise.

doc: Improve markup for list of vector operators

gcc:
* doc/extend.texi (Vector Extensions): Improve markup for list
of operators.

doc: Update Objective-C language reference

gcc:
* doc/standards.texi (Standards): Update "Object-Oriented
Programming and the Objective-C Language" reference.

x86-64: Use UNSPEC_DTPOFF to check source operand in TLS64_COMBINE

Since the first operand of PLUS in the source of TLS64_COMBINE pattern:

(set (reg/f:DI 128)
    (plus:DI (unspec:DI [
                (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
                (reg:DI 126)
                (reg/f:DI 7 sp)
            ] UNSPEC_TLSDESC)
        (const:DI (unspec:DI [
                    (symbol_ref:DI ("bfd_error") [flags 0x1a] <var_decl 0x7fffe99d6e40 bfd_error>)
                ] UNSPEC_DTPOFF))))

is unused, use the second operand of PLUS:

(const:DI (unspec:DI [
            (symbol_ref:DI ("bfd_error") [flags 0x1a] <var_decl 0x7fffe99d6e40 bfd_error>)
        ] UNSPEC_DTPOFF))

to check if 2 TLS64_COMBINE patterns have the same source.

gcc/

PR target/121725
* config/i386/i386-features.cc
(pass_x86_cse::candidate_gnu2_tls_p): Use the UNSPEC_DTPOFF
operand to check source operand in TLS64_COMBINE pattern.

gcc/testsuite/

PR target/121725
* gcc.target/i386/pr121725-1a.c: New test.
* gcc.target/i386/pr121725-1b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

forwprop: Copy the memcmp optimization from strlen to forwprop [PR116651]

To better optimize code dealing with `memcmp == 0` where we have
a small constant size, we can inline the memcmp in those cases.
There is code to do this in strlen but that is run too late in
the case where we can figure out the value of one of the arguments
to memcmp. So this copies the optimization to forwprop.

An example of where this helps is:
```
bool cmpvect(const std::vector<int> &a) { return a == std::vector<int>{10}; }
```

Where the above should be optimized to just `return a.size() == 1 && a[0] == 10;`.
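
As a rough illustration of what inlining a small, constant-size `memcmp == 0` amounts to, here is a sketch for the 4-byte case. The function names are hypothetical, and the second function is only a source-level analogy of the result, not the GIMPLE that forwprop produces.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The form the source code uses: an equality-only memcmp of 4 bytes.
bool equal4_memcmp(const void *a, const void *b)
{
  return std::memcmp(a, b, 4) == 0;
}

// Roughly what inlining amounts to: two 4-byte loads and one compare.
// memcpy is used for the loads to stay free of alignment and aliasing issues.
bool equal4_loads(const void *a, const void *b)
{
  std::uint32_t x, y;
  std::memcpy(&x, a, sizeof x);
  std::memcpy(&y, b, sizeof y);
  return x == y;
}
```

This only works for equality tests: the sign of a general `memcmp` result depends on byte order, which a single integer compare does not preserve.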

Note the pr44130.c testcase needed to change, as otherwise it would now be optimized away.
Note the loop in pr44130.c is also vectorized now, which it was not before.

Note the optimization remains in strlen as the other part (memcmp -> memcmp_eq)
should move to either isel or fab and I didn't want to remove it just yet.

Bootstrapped and tested on x86_64-linux-gnu.

Changes since v1:
* v2: Add verification of arguments to memcmp to simplify_builtin_memcmp.

PR tree-optimization/116651
PR tree-optimization/93265
PR tree-optimization/103647
PR tree-optimization/52171

gcc/ChangeLog:

* tree-ssa-forwprop.cc (simplify_builtin_memcmp): New function.
(simplify_builtin_call): Call simplify_builtin_memcmp for memcmp
and memcmp_eq builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr44130.c: Add an inline-asm clobber.
* g++.dg/tree-ssa/vector-compare-1.C: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Revert "Fix _Decimal128 arithmetic error under FE_UPWARD."

This reverts commit 50064b2898edfb83bc37f2597a35cbd3c1c853e3.

Daily bump.

PR modula2/121709: Failed bootstrap in m2

This patch is a followup to PR modula2/121629 which uses
the cpp_include_defaults array to configure the default search path
entries. In particular it creates default search paths
based on LOCAL_INCLUDE_DIR, PREFIX_INCLUDE_DIR, gcc version path
and NATIVE_SYSTEM_HEADER_DIR.

gcc/m2/ChangeLog:

PR modula2/121709
* gm2-lang.cc (concat_component): New function.
(find_cpp_entry): Ditto.
(lookup_cpp_default): Ditto.
(add_default_include_paths): Rewrite.
(m2_pathname_root): Remove.

gcc/ChangeLog:

PR modula2/121709
* doc/gm2.texi (Module Search Path): Reflect the new
search order.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: array subscript with COND_EXPR as the array

The following minimum reproducer would miscompile with vanilla gcc:

  extern int x[10], y[10];
  bool g();
  void f() { 0[g() ? x : y] = 1; }

gcc would mistakenly treat the subexpression (g() ? x : y) as a prvalue and
move that array to the stack. The following assignment would then write to the
stack instead of to the global arrays. When optimizations are enabled, this
assignment is discarded by dse and gcc generates the following code for the
f function:

  "_Z1fi":
        jmp     "_Z1gv"

The miscompilation requires all the following conditions to be met:

  - The array subscript expression is written as idx[array], instead of
    the usual form array[idx];
  - The "array" part must be a ternary expression (COND_EXPR in gcc tree)
    and it must be an lvalue.
  - The code must be compiled with -fstrong-eval-order which is the default
    for -std=c++17 or later.

The cause of the issue lies in cp_build_array_ref, where it mistakenly
generates a COND_EXPR with ARRAY_TYPE to the IL when all the criteria above
are met. This patch tries to resolve this issue. It moves the
canonicalization step that transforms idx[array] to array[idx] early in
cp_build_array_ref to ensure we handle these two forms of array subscript
consistently.

Tested on x86_64-linux.

gcc/cp/ChangeLog:

* typeck.cc (cp_build_array_ref): Handle 0[arr] earlier.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/array-condition-expr.C: New test.

Signed-off-by: Sirui Mu <msrlancern@gmail.com>

diagnostics: add GCC_DIAGNOSTICS_LOG

Whilst experimenting with PR diagnostics/121039 (potentially capturing
suppressed diagnostics in SARIF output), I found it very useful to have
a text log from the diagnostic subsystem to track what it's doing and
the decisions it's making (e.g. exactly when and why a diagnostic is
being rejected).

This patch adds a simple logging mechanism to the diagnostics subsystem,
enabled by setting GCC_DIAGNOSTICS_LOG in the environment, which emits
nested text like this to stderr (or a named file):

warning (option_id: 668, gmsgid: "%<-Wformat-security%> ignored without %<-Wformat%>")
  diagnostics::context::diagnostic_impl (option_id: 668, kind: warning, gmsgid: "%<-Wformat-security%> ignored without %<-Wformat%>")
    diagnostics::context::report_diagnostic
    rejecting: diagnostic not enabled
    false <- diagnostics::context::diagnostic_impl
  false <- warning

This logging mechanism doesn't use pretty_printer because it can be
helpful to use it to debug pretty_printer itself.
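
The nested output shape shown above can be produced with a depth counter and an RAII helper. The following is a minimal sketch under that assumption; `nested_logger` and its members are hypothetical names, not the contents of diagnostics/logging.h, and the sketch writes to a string rather than stderr.

```cpp
#include <cassert>
#include <string>

// Hypothetical logger: all names here are illustrative, not GCC's.
class nested_logger
{
public:
  // Append one line, indented by two spaces per nesting level.
  void log(const std::string &msg)
  {
    m_text += std::string(2 * m_depth, ' ') + msg + "\n";
  }
  const std::string &text() const { return m_text; }

  // RAII helper: logs entry, indents, and logs the result on exit.
  class scope
  {
  public:
    scope(nested_logger &l, const std::string &name)
      : m_logger(l), m_name(name)
    {
      m_logger.log(m_name);
      ++m_logger.m_depth;
    }
    ~scope()
    {
      // The result line is printed at the inner depth, as in the sample.
      m_logger.log(std::string(m_result ? "true" : "false") + " <- " + m_name);
      --m_logger.m_depth;
    }
    void set_result(bool r) { m_result = r; }
  private:
    nested_logger &m_logger;
    std::string m_name;
    bool m_result = false;
  };

private:
  int m_depth = 0;
  std::string m_text;
};

// Reproduce the shape of the sample output with a rejected warning.
std::string demo_log()
{
  nested_logger logger;
  {
    nested_logger::scope outer(logger, "warning");
    {
      nested_logger::scope inner(logger, "diagnostic_impl");
      logger.log("rejecting: diagnostic not enabled");
    }
  }
  return logger.text();
}
```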

gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostics/logging.o.
* diagnostic-global-context.cc: Include "diagnostics/logging.h".
(log_function_params, auto_inc_log_depth): New "using" decls.
(verbatim): Add logging.
(emit_diagnostic): Likewise.
(emit_diagnostic_valist): Likewise.
(emit_diagnostic_valist_meta): Likewise.
(inform): Likewise.
(inform_n): Likewise.
(warning): Likewise.
(warning_at): Likewise.
(warning_meta): Likewise.
(warning_n): Likewise.
(pedwarn): Likewise.
(permerror): Likewise.
(permerror_opt): Likewise.
* diagnostics/context.cc: Include "diagnostics/logging.h".
(context::initialize): Initialize m_logger.  Add logging.
(context::finish): Add logging.  Clean up m_logger.
(context::dump): Add indent param.
(context::set_sink): Add logging.
(context::add_sink): Add logging.
(diagnostic_kind_debug_text): New.
(get_debug_string_for_kind): New.
(context::report_diagnostic): Add logging.
(context::diagnostic_impl): Likewise.
(context::diagnostic_n_impl): Likewise.
(context::end_group): Likewise.
* diagnostics/context.h: Include "diagnostics/logging.h".
(context::dump): Add indent param.
(context::get_logger): New accessor.
(context::classify_diagnostics): Add logging.
(context::push_diagnostics): Likewise.
(context::pop_diagnostics): Likewise.
(context::m_logger): New field.
* diagnostics/html-sink.cc: Include "diagnostics/logging.h".
(html_builder::flush_to_file): Add logging.
(html_sink::on_report_diagnostic): Likewise.
* diagnostics/kinds.h (get_debug_string_for_kind): New decl.
* diagnostics/logging.cc: New file.
* diagnostics/logging.h: New file.
* diagnostics/output-file.h: Include "label-text.h".
* diagnostics/sarif-sink.cc: Include "diagnostics/logging.h".
(sarif_builder::flush_to_object): Add logging.
(sarif_builder::flush_to_file): Likewise.
(sarif_sink::on_report_diagnostic): Likewise.
* diagnostics/sink.h (sink::get_logger): New.
* diagnostics/text-sink.cc: Include "diagnostics/logging.h".
(text_sink::on_report_diagnostic): Add logging.
* doc/invoke.texi (Environment Variables): Document
GCC_DIAGNOSTICS_LOG.
* opts-diagnostic.cc: Include "diagnostics/logging.h".
(handle_OPT_fdiagnostics_add_output_): Add logging.
(handle_OPT_fdiagnostics_set_output_): Likewise.

gcc/analyzer/ChangeLog:
* pending-diagnostic.cc: Include "diagnostics/logging.h".
(diagnostic_emission_context::warn): Add logging.
(diagnostic_emission_context::inform): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

xtensa: Rewrite bswapsi2_internal with compact syntax

Also, the omission of the instruction that sets the shift amount register
(SAR) to 8 is now more efficient: it is omitted if there was a previous
bswapsi2 in the same BB, but not omitted if no bswapsi2 is found or another
insn that modifies SAR is found first (see below).

Note that the five instructions for writing to SAR are as follows, along
with the insns that use them (except for bswapsi2_internal itself):

- SSA8B
    *shift_per_byte, *shlrd_per_byte
- SSA8L
    *shift_per_byte, *shlrd_per_byte
- SSR
    ashrsi3 (alt 1), lshrsi3 (alt 1), *shlrd_reg, rotrsi3 (alt 1)
- SSL
    ashlsi3_internal (alt 1), *shlrd_reg, rotlsi3 (alt 1)
- SSAI
    *shlrd_const, rotlsi3 (alt 0), rotrsi3 (alt 0)

gcc/ChangeLog:

* config/xtensa/xtensa-protos.h (xtensa_bswapsi2_output):
New function prototype.
* config/xtensa/xtensa.cc
(xtensa_bswapsi2_output_1, xtensa_bswapsi2_output):
New functions.
* config/xtensa/xtensa.md (bswapsi2_internal):
Rewrite in compact syntax and use xtensa_bswapsi2_output() as asm
output.

gcc/testsuite/ChangeLog:

* gcc.target/xtensa/bswap-SSAI8.c: New.

[RISC-V][PR target/121548] Avoid bogus index into recog operand cache

So the RISC-V port has attributes which indicate the index within the
recog_data where certain operands will be found.

For this BZ the default value for the merge_op_idx attribute on the given insn
is "2".  But the insn only has operands 0 & 1.  So we do an out of bounds array
access and boom the ICE/valgrind failure.

As we discussed in the patchwork meeting, this is all a bit clunky and has been
fairly error prone.  This doesn't add any massive checking, but does introduce
some asserts to help catch problems a bit earlier and clearer.

In particular in cases where we're already asserting that the returned index is
valid (!= INVALID_ATTRIBUTE) we also assert that the index is less than the
total number of operands.

In the get_vlmax_ta_preferred_avl routine it appears like we need to handle
these two cases more gracefully as we apparently legitimately query for the
merge_op_idx on a fairly arbitrary insn.  We just have to make sure to not
*use* the result if it's INVALID_ATTRIBUTE.  So for that code we assert that
merge_op_idx is either INVALID_ATTRIBUTE or smaller than the number of
operands.
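
The two kinds of checks described above can be sketched as follows. The constant and function names are hypothetical stand-ins; the real attribute value and assertions live in the RISC-V backend.

```cpp
#include <cassert>

// Hypothetical stand-in; the real value lives in the RISC-V backend.
constexpr int INVALID_ATTRIBUTE_SKETCH = 255;

// Strict check: IDX must be valid and must index an operand array
// of N_OPERANDS entries.
constexpr bool usable_operand_index(int idx, int n_operands)
{
  return idx != INVALID_ATTRIBUTE_SKETCH && idx >= 0 && idx < n_operands;
}

// The relaxed check described for get_vlmax_ta_preferred_avl: the index
// may be invalid, but if it is valid it must be in range.
constexpr bool invalid_or_in_range(int idx, int n_operands)
{
  return idx == INVALID_ATTRIBUTE_SKETCH || (idx >= 0 && idx < n_operands);
}
```

With an insn that only has operands 0 and 1, a default index of 2 fails both checks, which is the situation that led to the out of bounds access.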

This patch also adds overrides for 3 patterns to return INVALID_ATTRIBUTE for
merge_op_idx, similar to how they already do for mode_idx and avl_type_idx.

This has been bootstrapped and regression tested on the bpi & pioneer systems
and regression tested for riscv32-elf and riscv64-elf.  Waiting on CI before
pushing.

PR target/121548
gcc/
* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): Assert
MODE_IDX is smaller than the number of operands.
(simplify_replace_vlmax_avl): Similarly.
(pass_avlprop::get_vlmax_ta_preferred_avl): Similarly.
* config/riscv/vector.md: Override merge_op_idx computation
for simple moves, just like is done for avl_type_idx and mode_idx.

Fortran: improve compile-time checking of character dummy arguments [PR93330]

PR fortran/93330

gcc/fortran/ChangeLog:

* interface.cc (get_sym_storage_size): Add argument size_known to
indicate that the storage size could be successfully determined.
(get_expr_storage_size): Likewise.
(gfc_compare_actual_formal): Use them to handle zero-sized dummy
and actual arguments.
If a character formal argument has the pointer or allocatable
attribute, or is an array that is not assumed or explicit size,
we generate an error by default unless -std=legacy is specified,
which falls back to just giving a warning.
If -Wcharacter-truncation is given, warn on a character actual
argument longer than the dummy. Generate an error for too short
scalar character arguments if -std=f* is given instead of just a
warning.

gcc/testsuite/ChangeLog:

* gfortran.dg/argument_checking_15.f90: Adjust dg-pattern.
* gfortran.dg/bounds_check_strlen_7.f90: Add dg-pattern.
* gfortran.dg/char_length_3.f90: Adjust options.
* gfortran.dg/whole_file_24.f90: Add dg-pattern.
* gfortran.dg/whole_file_29.f90: Likewise.
* gfortran.dg/argument_checking_27.f90: New test.

RISC-V: Add patterns for vector-scalar IEEE floating-point min

This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an unspec_vfmin RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f       v2,fa0
  vfmin.vv       v1,v1,v2

After, we get only one:
  vfmin.vf       v1,v1,fa0

gcc/ChangeLog:

* config/riscv/autovec-opt.md
(*vfmin_vf_ieee_<mode>): Add new patterns to combine vec_duplicate +
vfmin.vv (unspec) into vfmin.vf.
(*vfmul_vf_<mode>, *vfrdiv_vf_<mode>, *vfmin_vf_<mode>): Fix attribute
types.
* config/riscv/vector.md (@pred_<ieee_fmaxmin_op><mode>_scalar): Allow
VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Add vfmin.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f64.c: New test.

x86: Allow by_pieces op when expanding memcpy/memset epilogue

Since

commit 401199377c50045ede560daf3f6e8b51749c2a87
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Jun 17 10:17:17 2025 +0800

    x86: Improve vector_loop/unrolled_loop for memset/memcpy

uses move_by_pieces and store_by_pieces to expand memcpy/memset epilogue
with vector_loop even when targetm.use_by_pieces_infrastructure_p returns
false, which triggers

  gcc_assert (targetm.use_by_pieces_infrastructure_p
                (len, align,
                 memsetp ? SET_BY_PIECES : STORE_BY_PIECES,
                 optimize_insn_for_speed_p ()));

in store_by_pieces.  Fix it by:

1. Add by_pieces_in_use to machine_function to indicate that by_pieces op
is currently in use.
2. Set and clear by_pieces_in_use when expanding memcpy/memset epilogue
with move_by_pieces and store_by_pieces.
3. Define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P to return true if
by_pieces_in_use is true.
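
The three steps can be modelled with a flag that is set around the epilogue expansion. This is a sketch with hypothetical names; in particular, falling back to a default answer when the flag is clear is an assumption, since the text only says the hook returns true when by_pieces_in_use is true.

```cpp
#include <cassert>

// Hypothetical model of the per-function state; only the flag matters here.
struct machine_function_sketch
{
  bool by_pieces_in_use = false;
};

// Step 3: the hook answers "yes" while the flag is set; otherwise it
// falls back to whatever the default heuristic says (assumed).
bool use_by_pieces_p(const machine_function_sketch &m, bool default_answer)
{
  return m.by_pieces_in_use || default_answer;
}

// Step 2: set the flag around the epilogue expansion, then clear it.
// Returns what the hook reported while the epilogue was being expanded.
bool expand_epilogue(machine_function_sketch &m, bool default_answer)
{
  m.by_pieces_in_use = true;
  bool allowed = use_by_pieces_p(m, default_answer);
  m.by_pieces_in_use = false;
  return allowed;
}
```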

gcc/

PR target/121096
* config/i386/i386-expand.cc (expand_cpymem_epilogue): Set and
clear by_pieces_in_use when using by_pieces op.
(expand_setmem_epilogue): Likewise.
* config/i386/i386.cc (ix86_use_by_pieces_infrastructure_p): New.
(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Likewise.
* config/i386/i386.h (machine_function): Add by_pieces_in_use.

gcc/testsuite/

PR target/121096
* gcc.target/i386/memcpy-strategy-14.c: New test.
* gcc.target/i386/memcpy-strategy-15.c: Likewise.
* gcc.target/i386/memset-strategy-10.c: Likewise.
* gcc.target/i386/memset-strategy-11.c: Likewise.
* gcc.target/i386/memset-strategy-12.c: Likewise.
* gcc.target/i386/memset-strategy-13.c: Likewise.
* gcc.target/i386/memset-strategy-14.c: Likewise.
* gcc.target/i386/memset-strategy-15.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Handle constant in any modes in setmem_epilogue_gen_val

Since the constant passed to setmem_epilogue_gen_val may not be in
word_mode, update setmem_epilogue_gen_val to handle any integer modes.

gcc/

PR target/121108
* config/i386/i386-expand.cc (setmem_epilogue_gen_val): Don't
assert op_mode == word_mode and handle any integer modes.

gcc/testsuite/

PR target/121108
* gcc.target/i386/memset-strategy-16.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86-64: Improve source operand check for TLS_CALL

Source operands of 2 TLS_CALL patterns in

(insn 10 9 11 3 (set (reg:DI 100)
        (unspec:DI [
                (symbol_ref:DI ("caml_state") [flags 0x10]  <var_decl 0x7fe10e1d9e40 caml_state>)
            ] UNSPEC_TLSDESC)) "x.c":7:16 1674 {*tls_dynamic_gnu2_lea_64_di}
     (nil))
(insn 11 10 12 3 (parallel [
            (set (reg:DI 99)
                (unspec:DI [
                        (symbol_ref:DI ("caml_state") [flags 0x10]  <var_decl 0x7fe10e1d9e40 caml_state>)
                        (reg:DI 100)
                        (reg/f:DI 7 sp)
                    ] UNSPEC_TLSDESC))
            (clobber (reg:CC 17 flags))
        ]) "x.c":7:16 1676 {*tls_dynamic_gnu2_call_64_di}
     (expr_list:REG_DEAD (reg:DI 100)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

and

(insn 19 17 20 4 (set (reg:DI 104)
        (unspec:DI [
                (symbol_ref:DI ("caml_state") [flags 0x10]  <var_decl 0x7fe10e1d9e40 caml_state>)
            ] UNSPEC_TLSDESC)) "x.c":6:10 discrim 1 1674 {*tls_dynamic_gnu2_lea_64_di}
     (nil))
(insn 20 19 21 4 (parallel [
            (set (reg:DI 103)
                (unspec:DI [
                        (symbol_ref:DI ("caml_state") [flags 0x10]  <var_decl 0x7fe10e1d9e40 caml_state>)
                        (reg:DI 104)
                        (reg/f:DI 7 sp)
                    ] UNSPEC_TLSDESC))
            (clobber (reg:CC 17 flags))
        ]) "x.c":6:10 discrim 1 1676 {*tls_dynamic_gnu2_call_64_di}
     (expr_list:REG_DEAD (reg:DI 104)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

are the same even though rtx_equal_p returns false since (reg:DI 100)
and (reg:DI 104) are set from the same symbol.  Use the UNSPEC_TLSDESC
symbol

(unspec:DI [(symbol_ref:DI ("caml_state") [flags 0x10])] UNSPEC_TLSDESC))

to check if 2 TLS_CALL patterns have the same source.

For TLS64_COMBINE, use both UNSPEC_TLSDESC and UNSPEC_DTPOFF unspecs to
check if 2 TLS64_COMBINE patterns have the same source.

gcc/

PR target/121694
* config/i386/i386-features.cc (redundant_pattern): Add
tlsdesc_val.
(pass_x86_cse): Likewise.
(pass_x86_cse::tls_set_insn_from_symbol): New member function.
(pass_x86_cse::candidate_gnu2_tls_p): Set tlsdesc_val.  For
TLS64_COMBINE, match both UNSPEC_TLSDESC and UNSPEC_DTPOFF
symbols.  For TLS64_CALL, match the UNSPEC_TLSDESC symbol.
(pass_x86_cse::x86_cse): Initialize the tlsdesc_val field in
load.  Pass the tlsdesc_val field to ix86_place_single_tls_call
for X86_CSE_TLSDESC.

gcc/testsuite/

PR target/121694
* gcc.target/i386/pr121668-1b.c: New test.
* gcc.target/i386/pr121694-1a.c: Likewise.
* gcc.target/i386/pr121694-1b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

c++: -fimplicit-constexpr testcase tweak

If B::get is (implicitly or explicitly) constexpr the individual b bindings
have constant initialization and get optimized away, so their symbols don't
appear in the assembly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/decomp26.C: Add -fimplicit-constexpr.

invoke.texi: AMD GCN - remove '(experimental)' from some gfx*-generic

GCC added generic support in r15-7406-gb5a29a93ee29a8 (Feb 2025) with an
'(experimental)' marker, also because ROCm only supported it in their
git repository and not in a released version. Since ROCm 6.4 (Apr 2025),
generic is also supported in released ROCm versions - and has been
meanwhile tested by us.

For a generic architecture whose specific architectures are well tested,
there is no reason for a binary compiled for the generic architecture to
perform any differently from one compiled for a specific architecture.
Hence, this commit removes the marker for gfx9-generic (the specific
architectures gfx900, gfx906 and gfx90c are known to work), gfx10-3-generic
(likewise for gfx1030 and gfx1036), and gfx11-generic (gfx1100 and gfx1103).

gcc/ChangeLog:

* doc/invoke.texi (AMD GCN Options: -march): Remove '(experimental)'
from gfx{9,10-3,11}-generic.

install.texi: For amdgcn, clarify which llvm-* binaries are required

Also remove future tense for ROCm as 6.4.0 has been released in April 2025
and it supports generic architectures.

gcc/ChangeLog:

* doc/install.texi (amdgcn): Clarify which binaries must be the
LLVM version and which must be installed. Update version data for
ROCm for generic architectures.

i386: Fix vect-pragma-target-[12].c testcase for -march=XYZ [PR120643]

These 2 testcases were originally designed for the default -march= of
x86_64 so if you pass -march=native (on a target with AVX512 enabled),
they will fail. To fix this, we add `-mno-sse3 -mtune=generic`
to the options to force a specific arch for the testcase.

Changes since v1:
* v2: Use -mtune=generic instead of -mprefer-vector-width=512.

Tested on a skylake-avx512 machine with -march=native.

PR testsuite/120643
gcc/testsuite/ChangeLog:

* gcc.target/i386/vect-pragma-target-1.c: Add `-mno-sse3 -mtune=generic`
to the options.
* gcc.target/i386/vect-pragma-target-2.c: Likewise.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

aarch64/testsuite: Fix vld2-1.c after r16-3201 [PR121713]

After r16-3201-gee67004474d521, this testcase started to fail as
we can now copy prop into arguments, so the number of "after previous"
checks has doubled.

Pushed after a quick check to make sure the testcase is now passing.

PR testsuite/121713
gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vld2-1.c: Update the number of "after previous"
checks.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Document -param=ix86-vect-unroll-limit.

gcc/ChangeLog:

* doc/invoke.texi: Document -param=ix86-vect-unroll-limit.

RISC-V: Add test for vec_duplicate + vnmsac.vv unsigned combine with GR2VR cost 0, 1 and 15

Add asm dump check and run test for the vec_duplicate + vnmsac.vvm
combine to vnmsac.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vnmsac.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vnmsac.vv signed combine with GR2VR cost 0, 1 and 15

Add asm dump check and run test for the vec_duplicate + vnmsac.vvm
combine to vnmsac.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vnmsac.vx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsac-run-1-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vnmsac.vv to vnmsac.vx on GR2VR cost

This patch combines vec_duplicate + vnmsac.vv into vnmsac.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if the
GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero.

Assume we have the example code below and the GR2VR cost is 0.

  #define DEF_VX_TERNARY_CASE_0(T, OP_1, OP_2, NAME) \
  void \
  test_vx_ternary_##NAME##_##T##_case_0 (T * restrict vd, T * restrict vs2, \
                                         T rs1, unsigned n) \
  { \
    for (unsigned i = 0; i < n; i++) \
      vd[i] = vd[i] OP_2 vs2[i] OP_1 rs1; \
  }

  DEF_VX_TERNARY_CASE_0(int32_t, *, +, macc)
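
For readers who do not want to expand the macro by hand, the instantiation above corresponds to the following function. This is a hand expansion written as C++, with `__restrict` (a compiler extension) standing in for C's `restrict`.

```cpp
#include <cassert>
#include <cstdint>

// Hand expansion of DEF_VX_TERNARY_CASE_0(int32_t, *, +, macc).
// __restrict is a compiler extension standing in for C's restrict.
void test_vx_ternary_macc_int32_t_case_0(int32_t *__restrict vd,
                                         int32_t *__restrict vs2,
                                         int32_t rs1, unsigned n)
{
  // Multiply each vs2 element by the scalar and accumulate into vd.
  for (unsigned i = 0; i < n; i++)
    vd[i] = vd[i] + vs2[i] * rs1;
}
```

Note that vnmsac computes vd[i] - rs1 * vs2[i], so the instantiation that actually exercises this patch presumably passes `-` as OP_2; that is an assumption, since only the macc instantiation is quoted here.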

Before this patch:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      ...
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      ...
      vnmsac.vv v1,v2,v3
      ...
      bne a3,zero,.L3

After this patch:
      beq a3,zero,.L8
      ...
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      ...
      vnmsac.vx v1,a2,v3
      ...
      bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vnmsac_vx_<mode>): Add new
pattern to combine to vx.
* config/riscv/vector.md (@pred_vnmsac_vx_<mode>): Add new
pattern to generate rtl.
(*pred_nmsac_<mode>_scalar_undef): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

Fix _Decimal128 arithmetic error under FE_UPWARD.

libgcc/config/libbid/ChangeLog:

PR target/120691
* bid128_div.c: Fix _Decimal128 arithmetic error under
FE_UPWARD.
* bid128_rem.c: Ditto.
* bid128_sqrt.c: Ditto.
* bid64_div.c (bid64_div): Ditto.
* bid64_sqrt.c (bid64_sqrt): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr120691.c: New test.

Daily bump.

fixincludes: Skip pthread_incomplete_struct_argument for modern glibc [PR118009]

The pthread_incomplete_struct_argument fix was intended for ancient
versions of Glibc (only 2.3.3 and 2.3.4, I believe). From Glibc 2.3.5
the pthread.h header already included the change to use a pointer
instead of an array, so the fixinclude was no longer used.

However, the https://sourceware.org/bugzilla/show_bug.cgi?id=26647 fix
changed the __setjmpbuf declaration to use struct __jmp_buf_tag __env[1]
again, which caused this fixinclude to start matching again. This means
that GCC now installs a "fixed" pthread.h with a change to a declaration
that is guarded by #if ! __GNUC_PREREQ (11, 0), i.e. it's not even relevant
for modern versions of GCC. The "fixed" pthread.h causes problems for
users because of changes to internal implementation details of the
pthread_cond_t type, which require the "fixed" pthread.h to be updated
with mkheaders if Glibc is updated.

This change adds a bypass to the fixinclude, so that it no longer
matches modern Glibc versions, and only applies to glibc versions 2.3.3
and 2.3.4 as originally intended.

Also remove outdated reference to svn in the comment at the top of the
generated file.

fixincludes/ChangeLog:

PR bootstrap/118009
PR bootstrap/119089
* inclhack.def (pthread_incomplete_struct_argument): Add bypass.
* fixincl.tpl: Remove reference to svn in comment.
* fixincl.x: Regenerate.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Implement C++26 <debugging> features [PR119670]

This implements P2546R5 (Debugging Support), including the P2810R4
(is_debugger_present is_replaceable) changes, allowing
std::is_debugger_present to be replaced by the program.

It would be good to provide a macOS definition of is_debugger_present as
per https://developer.apple.com/library/archive/qa/qa1361/_index.html
but that isn't included in this change.

The src/c++26/debugging.cc file defines a global volatile int which can
be set by debuggers to indicate when they are attached and detached from
a running process. This allows std::is_debugger_present() to give a
reliable answer, and additionally allows a debugger to choose how
std::breakpoint() should behave. Setting the global to a positive value
will cause std::breakpoint() to use that value as an argument to
std::raise, so debuggers that prefer SIGABRT for breakpoints can select
that. By default std::breakpoint() will use a platform-specific action
such as the INT3 instruction on x86, or GCC's __builtin_trap().

On Linux the std::is_debugger_present() function checks whether the
process is being traced by a process named "gdb", "gdbserver" or
"lldb-server", to try to avoid interpreting other tracing processes
(such as strace) as a debugger. There have been comments suggesting this
isn't desirable and that std::is_debugger_present() should just return
true for any tracing process (which is the case for non-Linux targets
that support the ptrace system call).
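
One way to find the tracing process on Linux is the TracerPid field of /proc/self/status. The sketch below parses that field from text in the status-file format. Whether libstdc++ obtains the tracer this way is an assumption, the function name is hypothetical, and the subsequent check of the tracer's name ("gdb", "gdbserver", "lldb-server") is not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Extract the TracerPid value from text in /proc/<pid>/status format.
// Returns 0 (no tracer) if the field is missing or malformed.
long parse_tracer_pid(const std::string &status_text)
{
  std::istringstream in(status_text);
  std::string line;
  const std::string key = "TracerPid:";
  while (std::getline(in, line))
    if (line.compare(0, key.size(), key) == 0)
      {
        // The value follows the key, separated by whitespace.
        std::istringstream value(line.substr(key.size()));
        long pid = 0;
        value >> pid;
        return pid;
      }
  return 0;
}
```

A non-zero result only says that some process is tracing this one; distinguishing a debugger from a tool such as strace needs the extra name check described above.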

libstdc++-v3/ChangeLog:

PR libstdc++/119670
* acinclude.m4 (GLIBCXX_CHECK_DEBUGGING): Check for facilities
needed by <debugging>.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_DEBUGGING.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (debugging): Add.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Add new header.
* src/c++26/Makefile.am: Add new file.
* src/c++26/Makefile.in: Regenerate.
* include/std/debugging: New file.
* src/c++26/debugging.cc: New file.
* testsuite/19_diagnostics/debugging/breakpoint.cc: New test.
* testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc:
New test.
* testsuite/19_diagnostics/debugging/is_debugger_present.cc: New
test.
* testsuite/19_diagnostics/debugging/is_debugger_present-2.cc:
New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

c++: > in lambda in template arg [PR107953]

As with PR116928, we need to set greater_than_is_operator_p within the
lambda delimiters.

PR c++/107953

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_expression): Set
greater_than_is_operator_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ18.C: New test.

passes: Move cleanup_eh before first tailr [PR115201]

So the current pass order is:
```
          NEXT_PASS (pass_tail_recursion);
          NEXT_PASS (pass_if_to_switch);
          NEXT_PASS (pass_convert_switch);
          NEXT_PASS (pass_cleanup_eh);
```
But nothing in if_to_switch or convert_switch changes the IR in a way
that cleanup_eh would take into account.
tail_recursion benefits the most from not having "almost" empty landing pads.
This order dates from when cleanup_eh was added in r0-92178-ga8da523f8a442f,
but it looks like it was simply placed just before inlining, without considering
that it could improve the passes beforehand.

An example where this helps is PR 115201 where we have:
```
;;   basic block 5, loop depth 0, maybe hot
;;    prev block 4, next block 6, flags: (NEW, REACHABLE, VISITED)
;;    pred:       4 (TRUE_VALUE,EXECUTABLE)
  [LP 1] # .MEM_19 = VDEF <.MEM_45>
  # USE = nonlocal escaped
  # CLB = nonlocal escaped
  D.4770 = _Z12binarySearchIi2itIiEET0_RKT_S2_S2_D.4690 (item_15(D), startD.4711, midD.4717);
  goto <bb 7>; [INV]
;;    succ:       8 (EH,EXECUTABLE)
;;                7 (FALLTHRU,EXECUTABLE)
...

;;   basic block 8, loop depth 0, maybe hot
;;    prev block 7, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:       5 (EH,EXECUTABLE)
;;                6 (EH,EXECUTABLE)
  # .MEM_7 = PHI <.MEM_19(5), .MEM_18(6)>
<L6>: [LP 1]
  # .MEM_20 = VDEF <.MEM_7>
  midD.4717 ={v} {CLOBBER(eos)};
  resx 1
;;    succ:
```

As you can see, it should be possible to remove the empty landing pad, and
then a tail recursion can happen.
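
For orientation, source of roughly the following shape leads to calls like the one in the dump. This is a hypothetical reconstruction based only on the mangled name in the dump (a `binarySearch` template taking an item and two `it<int>` values), not the testcase from the PR. Whether a given GCC version turns the recursive calls into a loop depends on the version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical iterator wrapper. The user-provided copy constructor and
// destructor make it non-trivial, so locals of this type get cleanups.
template <typename T>
struct it {
  const T* p;
  explicit it(const T* q) : p(q) {}
  it(const it& other) : p(other.p) {}
  ~it() {}
  const T& operator*() const { return *p; }
  it operator+(std::ptrdiff_t n) const { return it(p + n); }
  std::ptrdiff_t operator-(const it& o) const { return p - o.p; }
  bool operator==(const it& o) const { return p == o.p; }
};

// Search the sorted range [start, end). Returns an iterator to a matching
// element, or the insertion point if the item is absent.
template <typename T, typename It>
It binarySearch(const T& item, It start, It end) {
  if (start == end)
    return end;
  It mid = start + (end - start) / 2;
  if (*mid == item)
    return mid;
  if (item < *mid)
    return binarySearch(item, start, mid);   // call in tail position
  return binarySearch(item, mid + 1, end);   // call in tail position
}
```

The local `mid` is what needs end-of-scope handling on the exceptional path, which is where an otherwise empty landing pad can come from.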

Bootstrapped and tested x86_64-linux-gnu.

PR tree-optimization/115201
gcc/ChangeLog:

* passes.def: Move cleanup_eh before first tail_recursion.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

MAINTAINERS: add myself to write after approval

ChangeLog:

* MAINTAINERS: add myself to write after approval

RISC-V: Add pattern for vector-scalar floating-point min

This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an smin RTL instruction.

Before this patch, we have two instructions, e.g.:
  vfmv.v.f       v2,fa0
  vfmin.vv       v1,v1,v2

After, we get only one:
  vfmin.vf       v1,v1,fa0
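
A scalar loop of roughly this shape is the kind of input involved: every element is combined with the same scalar, so the vectorized form needs that scalar broadcast (the vec_duplicate) unless a vector-scalar instruction is used. The function name is hypothetical, and the flags needed for the vectorizer to reach the new pattern are not stated in this log, so treat this as an illustration rather than one of the listed tests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Element-wise minimum of a vector and a loop-invariant scalar.
void min_with_scalar(float* out, const float* in, float f, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::fmin(in[i], f);
}
```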

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vfmin_vf_<mode>): Add new pattern to
combine vec_duplicate + vfmin.vv into vfmin.vf.
* config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/floating-point-min-2.c: Adjust scan
dump.
* gcc.target/riscv/rvv/autovec/vls/floating-point-min-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmin.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for
function variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfmin.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmin-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmin-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmin-run-1-f64.c: New test.

Dump niter assumption versioning when vectorizing

The following emits the assumption that is used for versioning from
niter analysis.

* tree-vect-loop.cc (vect_analyze_loop_form): Dump
niter assumption used for versioning.

AArch64: Add isinf expander [PR 66462]

Add an expander for isinf using integer arithmetic. This is
typically faster and avoids generating spurious exceptions on
signaling NaNs. This fixes part of PR66462.

int isinf1 (float x) { return __builtin_isinf (x); }

Before:
fabs s0, s0
mov w0, 2139095039
fmov s31, w0
fcmp s0, s31
cset w0, le
eor w0, w0, 1
ret

After:
fmov w1, s0
mov w0, -16777216
cmp w0, w1, lsl 1
cset w0, eq
ret
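
The integer test in the "After" sequence can be written in portable C++ as below. This is a sketch for single precision only, assuming a 32-bit IEEE 754 float; `isinf_bits` and `agrees_with_library` are hypothetical names. Shifting left by one discards the sign bit, and the constant 0xFF000000 is the -16777216 loaded by the mov: exponent all ones, mantissa zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Integer-only infinity test: no floating-point compare is executed,
// so a signaling NaN input cannot raise an exception here.
inline bool isinf_bits(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return static_cast<std::uint32_t>(bits << 1) == 0xFF000000u;
}

// Cross-check against the library classification for one value.
inline bool agrees_with_library(float x) {
  return isinf_bits(x) == static_cast<bool>(std::isinf(x));
}
```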

gcc:
PR middle-end/66462
* config/aarch64/aarch64.md (isinf<mode>2): Add new expander.
* config/aarch64/iterators.md (mantissa_bits): Add new mode_attr.

gcc/testsuite:
PR middle-end/66462
* gcc.target/aarch64/pr66462.c: Add new test.

libstdc++: Test comparing ordering with type convertible to any pointer.

libstdc++-v3/ChangeLog:

* testsuite/18_support/comparisons/categories/zero_neg.cc: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

Compute reduction var in vectorize_fold_left_reduction

Instead of going via the PHI node accessible through the reduc-dec
link, use the scalar def of the reduction SLP node. Compute this
in vectorize_fold_left_reduction itself.

* tree-vect-loop.cc (vectorize_fold_left_reduction): Do not get
reduc_var as argument, instead compute it here.
(vect_transform_reduction): Adjust.

libstdc++: Remove implicit type conversions in std::complex

The current implementation of `complex<_Tp>` assumes that
`int` is implicitly convertible to `_Tp`, e.g., when using
`complex<_Tp>(1)`.

This patch transforms the implicit conversions into explicit type casts.
As a result, `std::complex` is now able to support more types. One
example is the type `Eigen::Half` from
https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not
implement implicit type conversions.
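
The difference can be illustrated with a toy scalar type whose constructor from int is explicit. `ExplicitReal` and `pow_unsigned` are hypothetical names; the loop is only loosely modelled on a square-and-multiply power routine and is not the library's code. Writing `T y = 1;` would not compile for such a type, whereas `T(1)` does.

```cpp
#include <cassert>

// Hypothetical scalar type with no implicit conversion from int.
struct ExplicitReal {
  double v;
  explicit ExplicitReal(int i) : v(i) {}
  explicit ExplicitReal(double d) : v(d) {}
};

inline ExplicitReal operator*(ExplicitReal a, ExplicitReal b) {
  return ExplicitReal(a.v * b.v);
}

// Square-and-multiply power using only explicit conversions from int.
template <typename T>
T pow_unsigned(T x, unsigned n) {
  T y = n % 2 ? x : T(1);  // explicit T(1) instead of relying on int -> T
  while (n >>= 1) {
    x = x * x;
    if (n % 2)
      y = y * x;
  }
  return y;
}
```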

libstdc++-v3/ChangeLog:

* include/std/complex (polar, __complex_sqrt, pow)
(__complex_pow_unsigned): Use explicit conversions from int to
the complex value_type.

libstdc++: Constrain bitset(const CharT*) constructor [PR121046]

Asking std::is_constructible_v<std::bitset<1>, NonTrivial*> gives an
error, rather than answering the query. The problem is that the
constructor for std::bitset("010101") is not constrained to only accept
pointers to char-like types, and for the second parameter (which has a
default argument) std::basic_string_view<CharT> gets instantiated. If
the type is not char-like then that has undefined behaviour, and might
trigger a static_assert to fail in the body of std::basic_string_view.

We can fix it by constraining that constructor using the requirements
for char-like types from [strings.general] p1. I've submitted LWG 4294
and proposed making this change in the standard.
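
The effect of such a constraint can be sketched on a toy class. `MiniBits`, `is_char_like` and `NonTrivial` are hypothetical, and the trait below is only an approximation of the char-like requirements, not the exact condition used in the patch. With the constraint in place, the constructibility query is answered with false instead of failing to compile.

```cpp
#include <cassert>
#include <type_traits>

// Approximation of "char-like": non-array, trivially copyable,
// trivially default constructible, standard-layout.
template <typename T>
struct is_char_like
  : std::integral_constant<bool,
      !std::is_array<T>::value
      && std::is_trivially_copyable<T>::value
      && std::is_trivially_default_constructible<T>::value
      && std::is_standard_layout<T>::value> {};

struct NonTrivial {
  NonTrivial() {}
  ~NonTrivial() {}
};

struct MiniBits {
  unsigned long value = 0;

  // Only participates in overload resolution for char-like CharT.
  template <typename CharT,
            typename = typename std::enable_if<is_char_like<CharT>::value>::type>
  explicit MiniBits(const CharT* str, CharT zero = CharT('0'),
                    CharT one = CharT('1')) {
    for (; *str != CharT(); ++str) {
      assert(*str == zero || *str == one);  // sketch: no exception handling
      value = value * 2 + (*str == one ? 1u : 0u);
    }
  }
};
```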

libstdc++-v3/ChangeLog:

PR libstdc++/121046
* include/std/bitset (bitset(const CharT*, ...)): Add
constraints on CharT type.
* testsuite/23_containers/bitset/lwg4294.cc: New test.

libstdc++: Provide helpers to interoperate between __cmp_cat::_Ord and ordering types.

This patch adds two new internal helpers for ordering types:
* __cmp_cat::__ord to retrieve an internal _Ord value,
* __cmp_cat::__make<Ordering> to create an ordering from an _Ord value.

Conversions between ordering types are now handled by __cmp_cat::__make. As a
result, ordering types no longer need to befriend each other, only the new
helpers.

The __fp_weak_ordering implementation has also been simplified by:
* using the new helpers to convert partial_ordering to weak_ordering,
* using strong_ordering to weak_ordering conversion operator,
for the __isnan_sign comparison,
* removing the unused __cat local variable.

Finally, the _Ncmp enum is removed, and the unordered enumerator is added
to the existing _Ord enum.

libstdc++-v3/ChangeLog:

* libsupc++/compare (__cmp_cat::_Ord): Add unordered enumerator.
(__cmp_cat::_Ncmp): Remove.
(__cmp_cat::__ord, __cmp_cat::__make): Define.
(partial_ordering::partial_ordering(__cmp_cat::_Ncmp)): Remove.
(operator<=>(__cmp_cat::__unspec, partial_ordering))
(partial_ordering::unordered): Replace _Ncmp with _Ord.
(std::partial_ordering, std::weak_ordering, std::strong_ordering):
Befriend __ord and __make helpers, remove friend declarations for
other orderings.
(__compare::__fp_weak_ordering): Remove unused __cat variable.
Simplify ordering conversions.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

c++/modules: Add explanatory note for incomplete types with definition in different module [PR119844]

The confusion in the PR arose because the definition of 'User' in a
separate named module did not provide an implementation for the
forward-declaration in the global module. This seems likely to be a
common mistake while people are transitioning to modules, so this patch
adds an explanatory note.

While I was looking at this I also noticed that the existing handling of
partial specialisations for this note was wrong (we pointed at the
primary template declaration rather than the relevant partial spec), so
this patch fixes that up, and also gives a more precise error message
for using a template other than by self-reference while it's being
defined.

PR c++/119844

gcc/cp/ChangeLog:

* typeck2.cc (cxx_incomplete_type_inform): Add explanation when
a similar type is complete but attached to a different module.
Also fix handling of partial specs and templates.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr119844_a.C: New test.
* g++.dg/modules/pr119844_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

PR modula2/121629: adding third party modules

This patch makes it easier to add third party modules.
cc1gm2 now appends the search directory prefix/include/m2
to the search path for non dialect specific modules.
Prior to this it appends the dialect specific subdirectories
{m2pim,m2iso,m2log,m2min} with the appropriate dialect pathname.
The patch also includes a new option -fm2-pathname-root=prefix
which allows additional prefix/m2 directories to be searched
before the default.

gcc/ChangeLog:

PR modula2/121629
* doc/gm2.texi (Module Search Path): New section.
(Compiler options): New option -fm2-pathname-root=.
New option -fm2-pathname-rootI.

gcc/m2/ChangeLog:

PR modula2/121629
* gm2-compiler/PathName.mod: Add copyright notice.
* gm2-lang.cc (named_path): Add field lib_root.
(push_back_Ipath): Set lib_root false.
(push_back_lib_root): New function.
(get_dir_sep_size): Ditto.
(add_path_component): Ditto.
(add_one_import_path): Ditto.
(add_non_dialect_specific_path): Ditto.
(foreach_lib_gen_import_path): Ditto.
(get_module_source_dir): Ditto.
(add_default_include_paths): Ditto.
(assign_flibs): Ditto.
(m2_pathname_root): Ditto.
(add_m2_import_paths): Remove function.
(gm2_langhook_post_options): Call assign_flibs.
Check np.lib_root and call foreach_lib_gen_import_path.
Replace call to add_m2_import_paths with a call to
add_default_include_paths.
(gm2_langhook_handle_option): Add case OPT_fm2_pathname_rootI_.
* gm2spec.cc (named_path): Add field lib_root.
(push_back_Ipath): Set lib_root false.
(push_back_lib_root): New function.
(add_m2_I_path): Add OPT_fm2_pathname_rootI_ option
if np.lib_root.
(lang_specific_driver): Add case OPT_fm2_pathname_root_.
* lang.opt (fm2-pathname-root=): New option.
(fm2-pathname-rootI=): Ditto.

gcc/testsuite/ChangeLog:

PR modula2/121629
* gm2/switches/pathnameroot/pass/switches-pathnameroot-pass.exp: New test.
* gm2/switches/pathnameroot/pass/test.mod: New test.
* gm2/switches/pathnameroot/pass/testlib/m2/foo.def: New test.
* gm2/switches/pathnameroot/pass/testlib/m2/foo.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

[gcn] gcc/configure.ac + install.texi - changes to detect HAVE_AS_LEB128 [PR119367]

The llvm-mc assembler by default assembles to another assembly file and not to an ELF
binary; that usually does not matter, but the LEB128 check additionally inspects the
resulting binary. Hence, when using llvm-mc as target assembler for
amdgcn-*-*, we better add the "--filetype=obj -triple=amdgcn--amdhsa" flags. The
current patch does so unconditionally, assuming that llvm-mc is always used.

Additionally, the resulting ELF file is checked, which requires an ELF reader such
as objdump. This commit adds llvm-objdump to the build documentation for amdgcn,
although, e.g., Binutils' 'objdump' would also do, as long as either
amdgcn-amdhsa-objdump or amdgcn-amdhsa/bin/objdump is found during the amdgcn
cross build.

gcc/ChangeLog:

PR debug/119367
* acinclude.m4 (gcc_GAS_FLAGS): For gcn, use "--filetype=obj
-triple=amdgcn--amdhsa", if supported.
* configure: Regenerate.
* doc/install.texi (amdgcn-*-*): Also add llvm-objdump to the list of
to-be-copied files.

c++: Fix auto return type deduction with expansion statements [PR121583]

The following testcase ICEs during expansion, because cfun->returns_struct
wasn't cleared, despite auto being deduced to int.

The problem is that check_return_type -> apply_deduced_return_type
is called when parsing the expansion stmt body, at that time
processing_template_decl is non-zero and apply_deduced_return_type
in that case doesn't do the
     if (function *fun = DECL_STRUCT_FUNCTION (fco))
       {
         bool aggr = aggregate_value_p (result, fco);
#ifdef PCC_STATIC_STRUCT_RETURN
         fun->returns_pcc_struct = aggr;
#endif
         fun->returns_struct = aggr;
       }
My assumption is that !processing_template_decl in that case
is used in the sense "the fco function is not a function template",
for function templates no reason to bother with fun->returns*struct,
nothing will care about that.
When returning a type dependent expression in the expansion stmt
body, apply_deduced_return_type just won't be called during parsing,
but when instantiating the body and all will be fine.  But when
returning a non-type-dependent expression, while check_return_type
will be called again during instantiation of the body, as the return
type is no longer auto in that case apply_deduced_return_type will not
be called again and so nothing will fix up fun->returns*struct.

The following patch fixes that by using !uses_template_parms (fco)
check instead of !processing_template_decl.

2025-08-28  Jakub Jelinek  <jakub@redhat.com>

PR c++/121583
* semantics.cc (apply_deduced_return_type): Adjust
fun->returns*_struct when !uses_template_parms (fco) instead of
when !processing_template_decl.

* g++.dg/cpp26/expansion-stmt23.C: New test.
* g++.dg/cpp26/expansion-stmt24.C: New test.

c++: Fix ICE with parameter uses in expansion stmts [PR121575]

The following testcase shows an ICE when a parameter of a non-template
function is referenced in expansion stmt body.

tsubst_expr in that case assumes that the PARM_DECL either has a registered
local specialization, is the this argument, or is in an unevaluated context.
Parameters are always defined outside of the expansion statement
for-range-declaration or body, so when the body is instantiated
outside of templates they should always map to themselves.
It could be fixed by registering local self-specializations for all the
function parameters, but just handling it in tsubst_expr seems to be easier
and less costly.
Some PARM_DECLs, e.g. from concepts, have NULL DECL_CONTEXT, those are
handled like before (and assert it is unevaluated operand), for others
this checks if the PARM_DECL is from a non-template and in that case it
will just return t.

2025-08-28 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>

PR c++/121575
* pt.cc (tsubst_expr) <case PARM_DECL>: If DECL_CONTEXT (t) isn't a
template return t for PARM_DECLs without local specialization.

* g++.dg/cpp26/expansion-stmt20.C: New test.

Avoid mult pattern if that will break reduction constraints

synth-mult introduces multiple uses of a reduction variable
in some cases, which will ultimately fail vectorization (or ICE
with a pending change). So avoid applying the pattern in such cases.

* tree-vect-patterns.cc (vect_synth_mult_by_constant): Avoid
in cases that introduce multiple uses of reduction operands.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

The divmod pattern will break reduction constraints

When we apply a divmod pattern this will break reductions by introducing
multiple uses of the reduction var, so avoid this pattern in reductions.

* tree-vect-patterns.cc (vect_recog_divmod_pattern): Avoid
for stmts participating in a reduction.

configure: Add readelf fallback for HAVE_AS_ULEB128 test [PR119367]

The following patch adds a readelf fallback if neither objdump nor otool
exists. All of GNU binutils readelf, eu-readelf and llvm-readelf can
handle it with those options.

2025-08-28 Jakub Jelinek <jakub@redhat.com>

PR debug/119367
* configure.ac (gcc_cv_as_leb128): Add fallback using readelf.
Grammar fix in comment.
* configure: Regenerate.

dwarf2out: Use DW_LNS_advance_pc instead of DW_LNS_fixed_advance_pc if possible [PR119367]

In the usual case we use .loc directives and don't emit the line table
manually.  The assembler then usually uses DW_LNS_advance_pc, which has a
uleb128 argument and in most cases needs just a single-byte operand.
But if we do emit it for whatever reason (old or buggy assembler or
-gno-as-loc{,view}-support option), we use DW_LNS_fixed_advance_pc
instead, which has a fixed 2-byte operand.  That is both wasteful
in the usual case of very small advances and, more importantly, will
just result in assembler errors if we need to advance over more than 65535
bytes.
The following patch uses DW_LNS_advance_pc instead if assembler supports
.uleb128 directive with a difference of two labels in the same section.
This is only possible if Minimum Instruction Length in the .debug_line
header is 1 (otherwise DW_LNS_advance_pc operand is multiplied by that
value and DW_LNS_fixed_advance_pc is not), but we emit 1 for that
on all targets.
Looking at dwarf2out.o (from dwarf2out.cc with this patch)
compiled with compilers before/after this change with additional -fpic
-gno-as-loc{,view}-support options, I see .debug_line section shrunk from
878067 bytes to 773381 bytes, so shrink by 12%.
Admittedly gas generated .debug_line is even smaller, 501374 bytes (with
-fpic and without -gno-as-loc{,view}-support options).
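
The size argument can be checked with a small ULEB128 encoder. This is an illustrative sketch, not code from dwarf2out.cc; `encode_uleb128` and `fits_fixed_advance` are hypothetical names. Advances below 128 take one operand byte instead of two, and advances above 65535, which do not fit the fixed 2-byte operand, still encode.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned value as ULEB128: 7 payload bits per byte,
// least significant group first, high bit set on all but the last byte.
inline std::vector<std::uint8_t> encode_uleb128(std::uint64_t value) {
  std::vector<std::uint8_t> out;
  do {
    std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80;  // more bytes follow
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// DW_LNS_fixed_advance_pc takes a fixed 2-byte operand.
inline bool fits_fixed_advance(std::uint64_t delta) {
  return delta <= 0xffff;
}
```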

2025-08-28  Jakub Jelinek  <jakub@redhat.com>

PR debug/119367
* dwarf2out.cc (output_one_line_info_table) <case LI_adv_address>: If
HAVE_AS_LEB128, use DW_LNS_advance_pc with dw2_asm_output_delta_uleb128
instead of DW_LNS_fixed_advance_pc with dw2_asm_output_delta.

Fortran: Constructors with PDT components did not work [PR82843]

2025-08-28 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/82843
* intrinsic.cc (gfc_convert_type_warn): If the 'from_ts' is a
PDT instance, copy the derived type to the target ts.
* resolve.cc (gfc_resolve_ref): A PDT component in a component
reference can be that of the pdt_template. Unconditionally use
component of the PDT instance to ensure that the backend_decl
is set during translation. Likewise if a component is
encountered that is a PDT template type, use the component
parameters to convert to the correct PDT instance.

gcc/testsuite/
PR fortran/82843
* gfortran.dg/pdt_40.f03: New test.

Fortran: Implement correct form of PDT constructors [PR82205]

2025-08-28 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/82205
* decl.cc (gfc_get_pdt_instance): Copy the default initializer
for components that are not PDT parameters or parameterized. If
any component is a pointer or allocatable set the attributes
'pointer_comp' or 'alloc_comp' of the new PDT instance.
* primary.cc (gfc_match_rvalue): Implement the correct form of
PDT constructors with 'name (type parms)(component values)'.
* trans-array.cc (structure_alloc_comps): Apply scalar default
initializers. Array initializers await the coming change in PDT
representation.
* trans-io.cc (transfer_expr): Do not output the type parms of
a PDT in list directed output.

gcc/testsuite/
PR fortran/82205
* gfortran.dg/pdt_22.f03: Use the correct form for PDT constructors.
* gfortran.dg/pdt_23.f03: Likewise.
* gfortran.dg/pdt_3.f03: Likewise.

Daily bump.

Remove xfail marker on RISC-V test

So yet another testsuite hygiene patch. This time turning XPASS -> PASS. My
tester treats those cases the same so I didn't get notified that nozicond-2.c
was passing after some recent changes.

This removes the xfail marker on that test and thus the test is expected to
pass now.

Pushing to the trunk momentarily.

gcc/testsuite/
* gcc.target/riscv/nozicond-2.c: Remove xfails.

Fortran: H edit descriptor error with -std=f95

PR fortran/114611

gcc/fortran/ChangeLog:

* io.cc: Issue an error on use of the H descriptor in
a format with -std=f95 or higher. Otherwise, issue a
warning.

gcc/testsuite/ChangeLog:

* gfortran.dg/aliasing_dummy_1.f90: Accommodate errors
and warnings as needed.
* gfortran.dg/eoshift_8.f90: Likewise.
* gfortran.dg/g77/f77-edit-h-out.f: Likewise.
* gfortran.dg/hollerith_1.f90: Likewise.
* gfortran.dg/io_constraints_1.f90: Likewise.
* gfortran.dg/io_constraints_2.f90: Likewise.
* gfortran.dg/longline.f: Likewise.
* gfortran.dg/pr20086.f90: Likewise.
* gfortran.dg/unused_artificial_dummies_1.f90: Likewise.
* gfortran.dg/x_slash_1.f: Likewise.

ifcvt: fix factor_out_operators (again) [PR121695]

r16-2648-gaebbc90d8c7c70 had a copy and pasto where
the second statement was supposed to be setting
the operand 1 of the phi but it was setting operand 0 instead.
This fixes the typo.

Push as obvious after a quick build test for x86_64-linux-gnu.

PR tree-optimization/121695

gcc/ChangeLog:

* tree-if-conv.cc (factor_out_operators): Fix typo
in assignment of the phi.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr121695-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

RISC-V: testsuite: Fix vf_vfmul and vf_vfrdiv

Fix type and remove useless DejaGnu directives.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f64.c: Fix type.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: Remove
useless dg directives.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: Likewise.

libstdc++: Use _M_reverse to reverse partial_ordering using operator<=>

The patch r16-3414-gfcb3009a32dc33 changed the representation of unordered to
optimize reversing of order, but it did not update the implementation of the
reversing operator<=>(0, partial_ordering).

libstdc++-v3/ChangeLog:

* libsupc++/compare
(operator<=>(__cmp_cat::__unspec, partial_ordering)):
Implement using _M_reverse.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Move tai_- and gps_clock::now impls out of ABI

This patch moves the std::chrono::tai_clock::now() and
std::chrono::gps_clock::now() definitions from header inlines to static
members invoked via a normal function call, in service of stabilizing the
C++20 ABI.

It also changes #if guards to mention the actual __cpp_lib_*
feature gated, not just the language version, for clarity.

New global function symbols std::chrono::tai_clock::now
and std::chrono::gps_clock::now are exported.

libstdc++-v3/ChangeLog:
* include/std/chrono (gps_clock::now, tai_clock::now): Remove
inline definitions.
* src/c++20/clock.cc (gps_clock::now, tai_clock::now): New file
for out-of-line now() impls.
* src/c++20/Makefile.am: Mention clock.cc.
* src/c++20/Makefile.in: Regenerate.
* config/abi/pre/gnu.ver: Add mangled now() symbols.