where the intermediate may-def SR.83_44->i = {} prevents CSE of the
load to zero. The problem is two-fold here, one is that the code
skipping may-defs does not handle zeroing via a CTOR, the other is that
(partial) must-defs can be better handled by later code as otherwise
we may not find an appropriate definition to CSE to.
I've noticed we fail to guard against storage-order issues, so fixed
that on the fly.
PR tree-optimization/121740
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow skipping
may-defs from CTORs. Do not skip may-defs with storage-order
issues or (partial) must-defs.
* gcc.dg/tree-ssa/ssa-fre-104.c: Un-XFAIL.
* gcc.dg/tree-ssa/ssa-fre-110.c: New testcase.
Nathaniel Shead [Sat, 23 Aug 2025 15:56:32 +0000 (01:56 +1000)]
c++/modules: Fix ADL [PR117658]
On looking again at [basic.lookup.argdep] p4, I believe GCC hasn't fully
implemented the wording here for ADL. This patch fixes two issues.
First, 4.3 indicates that a function exported from a named module should
be visible to ADL regardless of whether it's visible to normal name
lookup, as long as some restrictions are followed.
This patch implements this; for skipping declarations that "do not
appear in the TU containing the point of lookup" I don't think there's
anything special we need to do, as any declarations before the point of
lookup will be found in other ways anyway, and any remaining
declarations from the current TU cannot be seen regardless.
Secondly, currently we only add the exported functions along the
instantiation path of a lookup. But I don't think this is intended by
the current wording, so this patch adjusts that. I also clean up the
logic to do all different module processing in adl_namespace_fns so that
we don't duplicate work in traversing the module binding list
unnecessarily.
This new handling means we need to do some extra work to properly error
on overload sets containing TU-local entities (as this might actually
come up now!) but I'm leaving that for a later patch.
As a drive-by fix this also fixes an ICE for C++26 expansion statements
with finding the instantiation path.
PR c++/117658
gcc/cp/ChangeLog:
* cp-tree.h (get_originating_module): Adjust parameter names.
* module.cc (path_of_instantiation): Handle C++26 expansion
statements.
* name-lookup.cc (name_lookup::adl_namespace_fns): Handle
exported declarations attached to the same module of an
associated entity with the same innermost non-inline namespace,
and non-exported functions on the instantiation path.
(name_lookup::search_adl): Build mapping of namespace to modules
that associated entities are attached to; remove now-unneeded
instantiation path handling.
gcc/testsuite/ChangeLog:
* g++.dg/modules/adl-4_a.C: Test should pass.
* g++.dg/modules/adl-4_b.C: Test should pass.
* g++.dg/modules/adl-6_a.C: New test.
* g++.dg/modules/adl-6_b.C: New test.
* g++.dg/modules/adl-6_c.C: New test.
* g++.dg/modules/adl-7_a.C: New test.
* g++.dg/modules/adl-7_b.C: New test.
* g++.dg/modules/adl-7_c.C: New test.
* g++.dg/modules/adl-8_a.C: New test.
* g++.dg/modules/adl-8_b.C: New test.
* g++.dg/modules/adl-8_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
c++/modules: Mark implicit inline namespaces as purview [PR121724]
When we push an existing namespace within the module purview for the
first time, we also need to mark any parent inline namespaces as purview
to not confuse the streaming logic.
PR c++/121724
gcc/cp/ChangeLog:
* name-lookup.cc (push_namespace): Mark inline namespace
contexts as purview if needed.
gcc/testsuite/ChangeLog:
* g++.dg/modules/namespace-12_a.C: New test.
* g++.dg/modules/namespace-12_b.C: New test.
testsuite, darwin: Suppress unwind frames in scantest-lto.c.
Currently, for Darwin unwind and EH frames are emitted without use
of .cfi_xxx instructions; the emitted frames also contain the
string 'ascii'. For the purpose of this test, omit them.
RISC-V: Add support for the XAndesbfhcvt ISA extension.
This extension defines instructions to perform scalar floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754
32-bit single-precision floating-point (SP) data in a scalar
floating point register.
gcc/ChangeLog:
* config/riscv/andes.def: Add nds_fcvt_s_bf16 and nds_fcvt_bf16_s.
* config/riscv/riscv.md (truncsfbf2): Add TARGET_XANDESBFHCVT support.
(extendbfsf2): Ditto.
* config/riscv/riscv-builtins.cc: New AVAIL andesbfhcvt.
Add new define RISCV_ATYPE_BF and RISCV_ATYPE_SF.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xandes/xandesbfhcvt-1.c: New test.
* gcc.target/riscv/xandes/xandesbfhcvt-2.c: New test.
RISC-V: Add support for the XAndesperf ISA extension.
This patch adds support for the XAndesperf ISA extension.
The 32-bit AndeStar V5 extension includes branch instructions,
load effective address instructions, and string processing
instructions for performance improvement.
New INSN patterns are added into the new file andes.md
as a separate vendor extension.
gcc/ChangeLog:
* config/riscv/constraints.md (Ou07): New constraint.
(ads_Bext): New constraint.
* config/riscv/iterators.md (ANYLE32): New iterator.
(sizen): New iterator.
(sh_limit): New iterator.
(sh_bit): New iterator.
(cs): New iterator.
* config/riscv/predicates.md (ads_branch_bbcs_operand): New predicate.
(ads_branch_bimm_operand): New predicate.
(ads_imm_extract_operand): New predicate.
(ads_extract_size_imm_si): New predicate.
(ads_extract_size_imm_di): New predicate.
(const_int5_operand): New predicate.
* config/riscv/riscv-builtins.cc:
Add new AVAIL andesperf32 and andesperf64.
Add new define RISCV_ATYPE_DI.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
* config/riscv/riscv.cc
(riscv_extend_cost): Cost for pattern 'bfo'.
(riscv_rtx_costs): Cost for XAndesperf extension.
* config/riscv/riscv.md: Add support for XAndesperf to patterns
zero_extendsidi2_internal, zero_extendhi2, extendsidi2_internal,
extend<SHORT:mode><SUPERQI:mode>2, <any_extract:optab><GPR:mode>3
and branch_on_bit.
* config/riscv/vector-iterators.md
(sz): Add sign_extract and zero_extract.
* config/riscv/andes.def: New file for vendor Andes.
* config/riscv/andes.md: New file for vendor Andes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/riscv.exp: Add runtest for subdir xandes.
* gcc.target/riscv/xandes/xandesperf-1.c: New test.
* gcc.target/riscv/xandes/xandesperf-10.c: New test.
* gcc.target/riscv/xandes/xandesperf-2.c: New test.
* gcc.target/riscv/xandes/xandesperf-3.c: New test.
* gcc.target/riscv/xandes/xandesperf-4.c: New test.
* gcc.target/riscv/xandes/xandesperf-5.c: New test.
* gcc.target/riscv/xandes/xandesperf-6.c: New test.
* gcc.target/riscv/xandes/xandesperf-7.c: New test.
* gcc.target/riscv/xandes/xandesperf-8.c: New test.
* gcc.target/riscv/xandes/xandesperf-9.c: New test.
* config/riscv/riscv-ext.def: Include riscv-ext-andes.def.
* config/riscv/riscv-ext.opt (riscv_xandes_subext): New variable.
(XANDESPERF) : New mask.
(XANDESBFHCVT): Ditto.
(XANDESVBFHCVT): Ditto.
(XANDESVSINTLOAD): Ditto.
(XANDESVPACKFPH): Ditto.
(XANDESVDOT): Ditto.
* config/riscv/t-riscv: Add riscv-ext-andes.def.
* doc/riscv-ext.texi: Regenerated.
* config/riscv/riscv-ext-andes.def: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xandes/xandes-predef-1.c: New test.
* gcc.target/riscv/xandes/xandes-predef-2.c: New test.
* gcc.target/riscv/xandes/xandes-predef-3.c: New test.
* gcc.target/riscv/xandes/xandes-predef-4.c: New test.
* gcc.target/riscv/xandes/xandes-predef-5.c: New test.
* gcc.target/riscv/xandes/xandes-predef-6.c: New test.
Co-author: Lino Hsing-Yu Peng (linopeng@andestech.com)
Co-author: Kai Kai-Yi Weng (kaiweng@andestech.com).
RISC-V: Add pattern for vector-scalar floating-point max
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an smax RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfmax.vv v1,v1,v2
After, we get only one:
vfmax.vf v1,v1,fa0
In some cases, it also shaves off one vsetvli.
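As a rough illustration only, a loop of the following shape is the kind of source where a scalar operand gets broadcast and then fed to a vector max. The function name and signature are hypothetical, and whether the vectorizer and combine actually produce vfmax.vf for it depends on the target options and floating-point flags in use.

```cpp
#include <cstddef>

// Hypothetical kernel: element-wise maximum of a vector and a scalar.
// The scalar f is the value that would otherwise need a separate
// broadcast (vec_duplicate) before the vector max.
void max_with_scalar(float* out, const float* in, float f, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i] > f ? in[i] : f;
}
```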
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfmax_vf_<mode>): Rename into...
(*vf<optab>_vf_<mode>): New pattern to combine vec_duplicate +
vf{min,max}.vv into vf{max,min}.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-2.c: Adjust scan
dump.
* gcc.target/riscv/rvv/autovec/vls/floating-point-max-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmax. Also add
missing scan-dump for vfmul.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Add vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add max functions.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmax-run-1-f64.c: New test.
Austin Law [Wed, 3 Sep 2025 16:41:17 +0000 (10:41 -0600)]
[RISC-V][PR target/121213] Avoid unnecessary sign extension in amoswap sequence
This is Austin's work to remove the redundant sign extension seen in pr121213.
--
The .w form of amoswap will sign extend its result from 32 to 64 bits, thus any
explicit sign extension insn doing the same is redundant.
This uses Jivan's approach of allocating a DI temporary for an extended result
and using a promoted subreg extraction to get that result into the final
destination.
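A minimal sketch of source that exercises this path, assuming a 64-bit RISC-V target where int is 32 bits: the exchange returns a 32-bit value that the ABI keeps sign-extended in a 64-bit register. The function name is hypothetical and the exact instruction sequence depends on the memory order and target options.

```cpp
#include <atomic>

// Hypothetical helper: atomically store VALUE and return the old contents.
// On rv64 the 32-bit result is where a redundant sign extension could
// follow the atomic swap.
int swap_in(std::atomic<int>& slot, int value)
{
  return slot.exchange(value, std::memory_order_relaxed);
}
```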
Tested with no regressions on riscv32-elf and riscv64-elf and bootstrapped on
the BPI and pioneer systems.
PR target/121213
gcc/
* config/riscv/sync.md (amo_atomic_exchange_extended<mode>):
Separate insn with sign extension for 64 bit targets.
Jan Hubicka [Wed, 3 Sep 2025 15:55:54 +0000 (17:55 +0200)]
Do not auto-enable loop optimizations with AutoFDO
With -O2 we automatically enable several loop optimizations with -fprofile-use.
The rationale is that those optimizations are enabled only at -O3, mainly since they may
hurt performance or not pay back in code size when used blindly on all loops.
Profile feedback gives us data on number of iterations which is used by heuristics
controlling those optimizations.
Currently auto-FDO is not that good at determining the number of iterations, so I think we
do not want to enable them until we can prove that they are useful.
This is affecting primarily -O2 codegen.
Theoretically auto-FDO with lbr can be pretty good at estimating # of
iterations, but to make it useful we will need to implement multiplicity for
discriminators at least.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* opts.cc (enable_fdo_optimizations): Do not auto-enable loop
optimizations with AutoFDO.
Jan Hubicka [Wed, 3 Sep 2025 15:45:02 +0000 (17:45 +0200)]
Increase default number of LTO partitions
The number of LTO partitions should exceed number of CPUs (or hyper-threads) of
commonly used CPUs. I think it is time to increase it again and as discussed
in the LTO and toplevel asm thread, doing so scales quite well. Tmp file usage
grows from 2.7 to 2.9MB which seems acceptable. Overall build time on machine
with 256 hyperthreads is comparable.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* params.opt (-param=lto-partitions=): Increase default value from 128 to 512.
aarch64: PR target/121749: Use correct predicate for narrowing shift amounts
With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narrowing shift instructions
are now represented with standard RTL and more merging optimisations occur.
This exposed a wrong predicate for the shift amount operand.
The shift amount is the number of bits of the narrow destination, not the input
sources.
Correct this by using the vn_mode attribute when specifying the predicate, which
exists for this purpose.
I've spotted a few more narrowing shift patterns that need the restriction, so
they are updated as well.
Bootstrapped and tested on aarch64-none-linux-gnu.
Patrick Palka [Wed, 3 Sep 2025 14:10:00 +0000 (10:10 -0400)]
c++: constant non-dep init folding vs FIELD_DECL access [PR97740]
Here although the local templated variables x and y have the same
reduced constant value, only x's initializer {a.get()} is well-formed
as written since A::m has private access. We correctly reject y's
initializer {&a.m} (at instantiation time), but we also reject x's
initializer because we happen to constant fold it ahead of time, which
means at instantiation time it's already represented as a COMPONENT_REF
to a FIELD_DECL, and so when substituting this COMPONENT_REF we naively
double check that the given FIELD_DECL is accessible, which fails.
This patch sidesteps around this particular issue by not checking access
when substituting a COMPONENT_REF to a FIELD_DECL. If the target of a
COMPONENT_REF is already a FIELD_DECL (i.e. before substitution), then I
think we can assume access has been already checked appropriately.
PR c++/97740
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <case COMPONENT_REF>: Don't check access
when the given member is already a FIELD_DECL.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-97740a.C: New test.
* g++.dg/cpp0x/constexpr-97740b.C: New test.
Richard Biener [Wed, 3 Sep 2025 08:41:17 +0000 (10:41 +0200)]
tree-optimization/121756 - handle irreducible regions when sinking
The sinking code currently does not heuristically avoid placing
code into an irreducible region in the same way it avoids placing
into a deeper loop nest. Critically for the PR we may not insert
a VDEF into an irreducible region that does not contain a virtual
definition. The following adds the missing heuristic and also
a stop-gap for the VDEF issue - since we cannot determine
validity inside an irreducible region we have to reject any
VDEF movement with destination inside such region, even when
it originates there. In particular irreducible sub-cycles are
not tracked separately and can cause issues.
I chose to not complicate the already partly incomplete assert
but prune it down to essentials.
PR tree-optimization/121756
* tree-ssa-sink.cc (select_best_block): Avoid irreducible
regions in otherwise same loop depth.
(statement_sink_location): When sinking a VDEF, never place
that into an irreducible region.
Jonathan Wakely [Mon, 1 Sep 2025 17:12:27 +0000 (18:12 +0100)]
libstdc++: Fix std::get<T> for std::pair with reference members [PR121745]
Make the std::get<T> overloads for rvalues use std::forward<T>(p.first)
not std::move(p.first), so that lvalue reference members are not
incorrectly converted to rvalues.
It might appear that std::move(p).first would also work, but the
language rules say that for std::pair<T&&, U> that would produce T&
rather than the expected T&& (see the discussion in P2445R1 §8.2).
Additional tests are added to verify all combinations of reference
members, value categories, and const-qualification.
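The value-category rules involved can be checked with a few static assertions. This is a sketch, not the libstdc++ code: first_of_rvalue is a hypothetical stand-in for an rvalue overload of std::get<T>, and the aliases are illustrative.

```cpp
#include <type_traits>
#include <utility>

using RefPair  = std::pair<int&, long>;   // lvalue reference member
using RvalPair = std::pair<int&&, long>;  // rvalue reference member

// forward<T> keeps an lvalue reference member an lvalue ...
static_assert(std::is_same_v<
    decltype(std::forward<int&>(std::declval<RefPair&&>().first)), int&>);
// ... whereas move would turn it into an rvalue.
static_assert(std::is_same_v<
    decltype(std::move(std::declval<RefPair&&>().first)), int&&>);

// Member access names a reference member as an lvalue even when the pair
// itself is an rvalue, so std::move(p).first gives T& for pair<T&&, U>.
static_assert(std::is_same_v<
    decltype((std::declval<RvalPair&&>().first)), int&>);
// forward<T> produces the expected T&& in that case.
static_assert(std::is_same_v<
    decltype(std::forward<int&&>(std::declval<RvalPair&&>().first)), int&&>);

// Hypothetical stand-in for get<T>(pair<T, U>&&); not the library definition.
template<typename T, typename U>
constexpr T&& first_of_rvalue(std::pair<T, U>&& p) noexcept
{ return std::forward<T>(p.first); }
```

With T = int& the helper returns int&, so the result still refers to the original object rather than being treated as an expiring value.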
libstdc++-v3/ChangeLog:
PR libstdc++/121745
* include/bits/stl_pair.h (get): Use forward instead of move in
std::get<T> overloads for rvalue pairs.
* testsuite/20_util/pair/astuple/get_by_type.cc: Check all value
categories and cv-qualification.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
The following fixes a corner case of pattern stmt STMT_VINFO_REDUC_IDX
updating which happens auto-magically. When a 2nd pattern sequence
uses defs from inside a prior pattern sequence then the first guess
for the lookfor can be off. This happens when for example widening
patterns use vect_get_internal_def, which looks into earlier patterns.
PR tree-optimization/121758
* tree-vect-patterns.cc (vect_mark_pattern_stmts): Try
harder to find a reduction continuation.
Jonathan Wakely [Tue, 2 Sep 2025 21:30:46 +0000 (22:30 +0100)]
libstdc++: Make CTAD ignore pair(const T1&, const T2&) constructor [PR110853]
For the pair(T1, T2) explicit deduction type to decay its arguments as
intended, we need the pair(const T1&, const T2&) constructor to not be
used for CTAD. Otherwise we try to instantiate pair<T1, T2> without
decaying, which is ill-formed for function lvalues.
Use std::type_identity_t<T1> to make the constructor unusable for an
implicit deduction guide.
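The technique can be sketched on a small class template. Holder, identity_t and call_through_holder are hypothetical names; identity_t stands in for std::type_identity_t so that the sketch also builds as C++17, and none of this is the std::pair code itself.

```cpp
#include <type_traits>

// Stand-in for std::type_identity_t (C++20); an assumption of this sketch.
template<typename T> struct identity { using type = T; };
template<typename T> using identity_t = typename identity<T>::type;

// Hypothetical wrapper, not std::pair.
template<typename T>
struct Holder
{
  // T appears only in a non-deduced context here, so the implicit
  // deduction guide generated from this constructor cannot deduce T.
  Holder(const identity_t<T>& v) : value(v) { }
  T value;
};

// Explicit guide taking its argument by value, so functions and arrays decay.
template<typename T> Holder(T) -> Holder<T>;

inline int twice(int x) { return 2 * x; }

inline int call_through_holder(int x)
{
  Holder h{twice};  // a function lvalue decays to a function pointer
  static_assert(std::is_same_v<decltype(h), Holder<int (*)(int)>>);
  return h.value(x);
}
```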
libstdc++-v3/ChangeLog:
PR libstdc++/110853
* include/bits/stl_pair.h [C++20] (pair(const T1&, const T2&)):
Use std::type_identity_t<T1> for first parameter.
* testsuite/20_util/pair/cons/110853.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Tue, 2 Sep 2025 16:04:13 +0000 (17:04 +0100)]
libstdc++: Restore C++20 <chrono> support for old std::string ABI
The r16-3416-g806de30f51c8b9 change to use __cpp_lib_chrono in
preprocessor conditions broke support for <chrono> for freestanding and
the COW std::string ABI. That happened because __cpp_lib_chrono is only
defined to the C++20 value for hosted and for the new ABI, because the
full set of C++20 features are not defined for freestanding and tzdb is
not defined for the old ABI.
This introduces a new internal feature test macro that corresponds to
the features that are always supported (e.g. chrono::local_time,
chrono::year, chrono::weekday).
libstdc++-v3/ChangeLog:
* include/bits/version.def (chrono_cxx20): Define.
* include/bits/version.h: Regenerate.
* include/std/chrono: Check __glibcxx_chrono_cxx20 instead of
__cpp_lib_chrono for C++20 features that don't require the new
std::string ABI and/or can be used for freestanding.
* src/c++20/clock.cc: Adjust preprocessor condition.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
On the assignment of _8, get_inner_reference will return `MEM[(struct s1 *)t_3(D) + 4B]`
and an offset, but that does not match up with `t_3(D)`, which is how split_address_to_core_and_offset
handles pointer plus.
So this patch adds the unwrapping of the MEM_REF after the call to get_inner_reference
and makes it act like a pointer plus.
Changes since v1:
* v2: Remove check on operand 1 for poly_int_tree_p, it is always.
Add before the check to see if it fits in shwi instead of after.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121355
gcc/ChangeLog:
* fold-const.cc (split_address_to_core_and_offset): Handle a MEM_REF after the call
to get_inner_reference.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ptrdiff-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Sun, 31 Aug 2025 01:52:09 +0000 (18:52 -0700)]
Move the folding of memcmp to memcmp_eq to fold all builtins
This is a small cleanup, moving the optimization of memcmp to
memcmp_eq from the strlen pass to fab. Since the other part of the
memcmp strlen optimization was copied to forwprop, this was the
only thing left that strlen did for memcmp.
Note this move will cause memcmp_eq to be used for -Os too.
It also removes the optimization from strlen since both are now
handled elsewhere.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-ccp.cc (optimize_memcmp_eq): New function.
(pass_fold_builtins::execute): Call optimize_memcmp_eq
for memcmp.
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcmp): Remove.
(strlen_pass::check_and_optimize_call): Don't call handle_builtin_memcmp.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 2 Sep 2025 20:57:26 +0000 (13:57 -0700)]
strlen: Fixup load alignment for memcmp
Like the previous commit but for strlen copy so we can backport
this commit. The loads should have the correct alignment on them
so we need to create newly aligned types when the alignment of the
pointer is less than the alignment of the current type.
Pushed as pre-approved by https://gcc.gnu.org/pipermail/gcc-patches/2025-September/694016.html
after a bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcmp): Create
unaligned types if the alignment of the pointers is less
than the alignment of the new type.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 2 Sep 2025 00:47:55 +0000 (17:47 -0700)]
forwprop: Fix alignment of types in expansion of memcmp
I noticed that when looking into g++.dg/tree-ssa/vector-compare-1.C
failure on arm, the wrong alignment was being used for the load.
There needs to be an unaligned type here to get the correct alignment.
NOTE this means the code in strlen is also wrong but that is on its way
out so I am not sure if we should update it or not to backport to the
release branches; there could be wrong code happening too.
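A source-level analogy of the issue, as a sketch under the assumption that a 4-byte equality-only memcmp is turned into two word loads and a compare: the pointers carry no alignment guarantee, so the loads must not assume the natural alignment of the word type. The helper name is hypothetical; memcpy is used here because it is the portable way to express an unaligned load in C++.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare four bytes at p and q for equality.
// memcpy expresses loads that are valid for any alignment of p and q.
bool equal4(const void* p, const void* q)
{
  std::uint32_t a, b;
  std::memcpy(&a, p, sizeof a);
  std::memcpy(&b, q, sizeof b);
  return a == b;
}
```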
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_builtin_memcmp): Create
unaligned types if the alignment of the pointers is less
than the alignment of the new type.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
gcc/fortran
PR fortran/89707
* decl.cc (gfc_get_pdt_instance): Copy the typebound procedure
field from the PDT template. If the template interface has
kind=0, provide the new instance with an interface with a type
spec that points to that of the parameterized component.
(match_ppc_decl): When 'saved_kind_expr' this is a PDT and the
expression should be copied to the component kind_expr.
* gfortran.h: Define gfc_get_tbp.
gcc/testsuite/
PR fortran/89707
* gfortran.dg/pdt_43.f03: New test.
Paul Thomas [Tue, 2 Sep 2025 20:48:55 +0000 (21:48 +0100)]
Fortran: Handle PDTs correctly with unlimited selector [PR87669]
2025-09-02 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87669
* expr.cc (gfc_spec_list_type): If no LEN components are seen,
unconditionally return 'SPEC_ASSUMED'. This suppresses an
invalid error in match.cc(gfc_match_type_is).
gcc/testsuite/
PR fortran/87669
* gfortran.dg/pdt_42.f03: New test.
libgfortran/
PR fortran/87669
* intrinsics/extends_type_of.c (is_extension_of): Use the vptr
rather than the hash value to identify the types.
arm: testsuite: improve test compatibility of asm-hard-reg-... tests
On arm, overriding -march can lead to warnings if the testsuite
options try to pass -mcpu. Avoid these by ensuring the -mcpu is unset
before adding the architecture.
Also, improve the compatibility of asm-hard-reg-error-3.c for
hard-float environment by allowing FP instructions in the
architecture.
gcc/testsuite:
* gcc.dg/asm-hard-reg-4.c: On Arm, unset the CPU before
setting the arch.
* gcc.dg/asm-hard-reg-error-3.c: Similarly. Also add
floating-point instructions to aid hard-float variants.
Match on arm* not just arm.
Richard Biener [Tue, 2 Sep 2025 08:16:28 +0000 (10:16 +0200)]
tree-optimization/121753 - ICE with pattern breaking reduction constraints
The recent change to vect_synth_mult_by_constant failed to handle
the synth_shift_p case for alg_shift, so we still changed c * 4
to c + c + c + c. The following also amends alg_add_t2_m, alg_sub_t2_m,
alg_add_factor and alg_sub_factor appropriately.
PR tree-optimization/121753
* tree-vect-patterns.cc (vect_synth_mult_by_constant): Properly
bail when synth_shift_p and an alg_shift use. Handle other
problematic cases.
Robin Dapp [Thu, 7 Aug 2025 07:26:09 +0000 (09:26 +0200)]
RISC-V: Fix is_vlmax_len_p and use for strided ops.
This patch changes is_vlmax_len_p to handle VLS modes properly.
Before we would check if len == GET_MODE_NUNITS (mode). This works for
VLA modes but not necessarily for VLS modes. We regularly have e.g.
small VLS modes where LEN equals their number of units but which do not
span a full vector. Therefore now check if len * GET_MODE_UNIT_SIZE
(mode) equals BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL.
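The arithmetic of the new check can be sketched with plain integers. This is a simplification under stated assumptions: the real code works on poly-int values, and the function name, parameter names and sample numbers below are illustrative only.

```cpp
// Hypothetical model of the new condition: the length covers a full
// register group only if its byte size equals the bytes of one vector
// register times the maximum LMUL.
constexpr bool spans_full_group(unsigned len, unsigned unit_bytes,
                                unsigned bytes_per_vector, unsigned max_lmul)
{
  return len * unit_bytes == bytes_per_vector * max_lmul;
}

// A 4 x 4-byte VLS mode has len equal to its number of units, yet it
// fills only half of an (assumed) 32-byte vector register.
static_assert(!spans_full_group(4, 4, 32, 1), "small VLS mode is not VLMAX");
// 8 x 4 bytes does fill that register.
static_assert(spans_full_group(8, 4, 32, 1), "full register is VLMAX");
```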
Changing this uncovered an oversight in avlprop where we used
GET_MODE_NUNITS as AVL when GET_MODE_NUNITS / NF would be correct.
The testsuite is unchanged. I didn't bother to add a dedicated test
because we would have seen the fallout anyway once the gather patch
lands.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (is_vlmax_len_p): Properly handle VLS
modes.
(imm_avl_p): Fix VLS length check.
(expand_strided_load): Use is_vlmax_len_p.
(expand_strided_store): Ditto.
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute):
Use GET_MODE_NUNITS / NF as avl.
Robin Dapp [Mon, 1 Sep 2025 09:41:34 +0000 (11:41 +0200)]
RISC-V: Handle overlap in expand_vec_perm PR121742.
In a two-source gather we unconditionally overwrite target with the
first gather's result already. If op1 == target this clobbers the
source operand for the second gather. This patch uses a temporary in
that case.
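The overlap problem can be modelled with fixed-size arrays. This is a sketch, not the RISC-V expander: two_source_permute, Vec and Sel are hypothetical names, selector values are assumed to lie in [0, 2N), and op0 is taken by value so that only the op1 == target case has to be handled.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Sel = std::array<std::size_t, N>;

// Hypothetical model of a two-source permute built from two gathers.
// sel[i] < N selects op0[sel[i]]; otherwise it selects op1[sel[i] - N].
void two_source_permute(Vec& target, Vec op0, const Vec& op1, const Sel& sel)
{
  Vec tmp{};
  const Vec* second = &op1;
  if (&op1 == &target)
    {
      // The first gather overwrites every lane of target, so keep a copy
      // of op1 for the second gather when the two are the same object.
      tmp = op1;
      second = &tmp;
    }
  // First gather: writes all lanes; lanes that belong to op1 get a filler.
  for (std::size_t i = 0; i < N; ++i)
    target[i] = sel[i] < N ? op0[sel[i]] : 0;
  // Second gather: fills only the lanes that select from op1.
  for (std::size_t i = 0; i < N; ++i)
    if (sel[i] >= N)
      target[i] = (*second)[sel[i] - N];
}
```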
PR target/121742
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_perm): Use temporary if
op1 and target overlap.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121742.c: New test.
Jakub Jelinek [Tue, 2 Sep 2025 15:01:30 +0000 (17:01 +0200)]
s390: Adjust s390/spaceship-fp-*.c tests for recent changes
In r16-3414 libstdc++ changed the ABI (for still experimental C++20) and now uses
the unordered value -128 instead of 2. Generally the change improved code
generation on all targets tested, see
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693534.html
for details.
In r16-3474 I've adjusted the middle-end and backends to use that value.
This apparently broke the gcc.target/s390/spaceship-fp-2.c test,
with -ffast-math the 2 value is unreachable and so the .SPACESHIP last
argument in that case is the default, which changed from 2 to -128.
But spaceship-fp-1.c test also doesn't test what libstdc++ uses anymore,
so the following patch uses -128 in all the spots.
2025-09-02 Jakub Jelinek <jakub@redhat.com>
* gcc.target/s390/spaceship-fp-1.c: Expect .SPACESHIP call with
-128 as last argument instead of 2.
(TEST): Use -128 instead of 2.
* gcc.target/s390/spaceship-fp-2.c: Expect .SPACESHIP call with
-128 as last argument instead of 2.
(TEST): Use -128 instead of 2.
Iain Sandoe [Sat, 30 Aug 2025 11:14:58 +0000 (12:14 +0100)]
c++, contracts: Simplify contracts headers [NFC].
We have contracts-related declarations and macros split between contracts.h
and cp-tree.h, and then contracts.h is included in the latter, which means
that it is included in all c++ front end files.
This patch:
- moves all the contracts-related material to contracts.h.
- makes some functions that are only used in contracts.cc static.
- tries to group the external API for contracts into related topics.
- includes contracts.h in the front end sources that need it.
osthread.d is trying to use PPC_THREAD_STATE32 which is not defined
in thread_act.d (PPC_THREAD_STATE is defined for the 32b case). This
leads to a build fail for libdruntime.
libphobos/ChangeLog:
* libdruntime/core/thread/osthread.d: Use PPC_THREAD_STATE
instead of PPC_THREAD_STATE32.
This patch updates the generation of the RISC-V Zba extension 'shNadd.uw' instruction.
It adds instruction generation detection for 'sh1add.uw' and
'sh3add.uw'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zba-shadd.c: New test functions.
libstdc++: Move _Index_tuple, _Build_index_tuple to <type_traits>.
As preparation for implementing std::constant_wrapper that's part of the
C++26 version of the <type_traits> header, the two classes _Index_tuple
and _Build_index_tuple are moved to <type_traits>. These two helpers are
needed by std::constant_wrapper to initialize the elements of one C
array with another.
Since <bits/utility.h> already includes <type_traits>, this solution
avoids creating a very small header file for just these two internal
classes. This approach doesn't move std::index_sequence and related code
to <type_traits> and therefore doesn't change which headers provide
user-facing features.
Richard Biener [Mon, 1 Sep 2025 13:49:59 +0000 (15:49 +0200)]
Restore STMT_VINFO_VECTYPE during analysis, set to NULL for all stmts
The following makes vect_analyze_stmt call vectorizable_* with all
STMT_VINFO_VECTYPE NULL_TREE but restores the value for eventual
iteration with single-lane SLP. It clears it for every stmt during
vect_transform_stmt.
* tree-vect-stmts.cc (vect_transform_stmt): Clear
STMT_VINFO_VECTYPE for all stmts.
(vect_analyze_stmt): Likewise. But restore at the end again.
Richard Biener [Mon, 1 Sep 2025 13:01:24 +0000 (15:01 +0200)]
Pass vectype to vect_check_gather_scatter
The strided-store path needs to have the SLP tree's vector type, so
the following patch passes down the vector type to be used to
vect_check_gather_scatter and adjusts all other callers. This
removes one of the last pieces requiring STMT_VINFO_VECTYPE
during SLP stmt analysis.
* tree-vectorizer.h (vect_check_gather_scatter): Add
vectype parameter.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Get
vectype as parameter.
(vect_analyze_data_refs): Adjust.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Likewise.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Get vectype
as parameter, pass down.
(vect_build_slp_tree_2): Adjust.
* tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Likewise.
(vect_use_strided_gather_scatters_p): Likewise.
Tomasz Kamiński [Thu, 28 Aug 2025 12:56:13 +0000 (14:56 +0200)]
libstdc++: Rename __cmp_cat::__unspec to __cmp_cat::__literal_zero.
This slightly improves the readability of the error message by suggesting
that a literal 0 is expected as the argument:
invalid conversion from 'int' to 'std::__cmp_cat::__literal_zero*'
libstdc++-v3/ChangeLog:
* libsupc++/compare (__cmp_cat::__literal_zero): Rename
from __unspec.
(__cmp_cat::__unspec): Rename to __literal_zero.
(operator==, operator<, operator>, operator<=, operator>=):
Replace __cmp_cat::__unspec with __cmp_cat::__literal_zero.
Jakub Jelinek [Tue, 2 Sep 2025 10:18:52 +0000 (12:18 +0200)]
tree-cfg: Fix up assign_discriminator ICE with too large #line [PR121663]
As mentioned in the PR, LOCATION_LINE is represented in an int,
and while we have -pedantic diagnostics (and -pedantic-errors error)
for too large #line, we can still overflow into negative line
numbers up to -2 and -1. We could overflow to that even with valid
source if it, say, has #line 2147483640 and then just has
2G+ lines after it.
Now, the ICE is because assign_discriminator{,s} uses a hash_map
with int_hash <int64_t, -1, -2>, so values -2 and -1 are reserved
for deleted and empty entries. We just need to make sure those aren't
valid. One possible fix would be just that
- discrim_entry &e = map.get_or_insert (LOCATION_LINE (loc), &existed);
+ discrim_entry &e
+ = map.get_or_insert ((unsigned) LOCATION_LINE (loc), &existed);
by adding unsigned cast when the key is signed 64-bit, it will never
be -1 or -2.
But I think that is wasteful, discrim_entry is a struct with 2 unsigned
non-static data members, so for lines which can only be 0 to 0xffffffff
(sure, with wrap-around), I think just using a hash_map with 96bit elts
is better than 128bit.
So, the following patch just doesn't assign any discriminators for lines
-1U and -2U, I think that is fine, normal programs never do that.
Another possibility would be to handle lines -1U and -2U as if it was say
-3U.
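The two key schemes discussed above can be compared in a small sketch. The helper names are hypothetical and the code assumes a 32-bit unsigned int; it is not the tree-cfg.cc implementation.

```cpp
#include <climits>
#include <cstdint>

// Alternative considered first: keep the signed 64-bit key but cast the
// line to unsigned, so the key can never be the reserved -1 or -2.
constexpr std::int64_t widened_key(int line)
{
  return static_cast<unsigned>(line);
}

// Approach taken by the patch: an unsigned key with -1U and -2U reserved,
// so lines that map to those two values get no discriminator.
constexpr bool skip_discriminator(int line)
{
  return static_cast<unsigned>(line) > -3U;
}

static_assert(widened_key(-1) == static_cast<std::int64_t>(UINT_MAX),
              "a negative line becomes a large positive 64-bit key");
static_assert(skip_discriminator(-1) && skip_discriminator(-2),
              "the two reserved values are skipped");
static_assert(!skip_discriminator(-3), "-3U is still a usable key");
```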
2025-09-02 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121663
* tree-cfg.cc (assign_discriminator): Change map argument type
from hash_map with int_hash <int64_t, -1, -2> to one with
int_hash <unsigned, -1U, -2U>. Cast LOCATION_LINE to unsigned.
Return early for (unsigned) LOCATION_LINE above -3U.
(assign_discriminators): Change map type from hash_map with
int_hash <int64_t, -1, -2> to one with int_hash <unsigned, -1U, -2U>.
Kito Cheng [Wed, 27 Aug 2025 08:03:30 +0000 (16:03 +0800)]
RISC-V: Remove unused print_ext_doc_entry function [NFC]
The print_ext_doc_entry function and associated version_t struct in
gen-riscv-ext-opt.cc were not being used anywhere in the codebase.
Remove them to clean up the code.
Mark Harmstone [Fri, 29 Aug 2025 19:43:57 +0000 (20:43 +0100)]
Fix assertion when trying to represent Ada arrays in CodeView
The LF_ARRAY CodeView type represents a C- or C++-style array, with a
length known at compile time. We were crashing when using -gcodeview
with Ada (bug #121157), as the DW_AT_upper_bound value is not an
unsigned integer but something more complicated:
maintainer-scripts: Improve syncing of libstdc++ docs
rsync generally is a more commonly used tool for syncing data - among
others it retains time stamps and is able to remove orphaned files on
the receiver side.
We just need to exclude some directories and a symlink from being
removed as "orphaned", since they originate elsewhere.
maintainer-scripts:
* update_web_docs_libstdcxx_git: Copy our "inner" documentation
into the web area using rsync instead of cpio and remove orphaned
files.
Jakub Jelinek [Mon, 1 Sep 2025 19:55:49 +0000 (21:55 +0200)]
c: Implement C2Y N3457 - The __COUNTER__ predefined macro
The following patch implements the
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3457.htm
paper without the first 3 lines in Recommended practice.
Seems GCC behavior already matches the expected behavior except for
diagnostics of more than 2147483648 __COUNTER__ expansions, so the
patch adds a diagnostic for that (but no testcase, because
#define A __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__ __COUNTER__
#define B A A A A A A A A
#define C B B B B B B B B
#define D C C C C C C C C
#define E D D D D D D D D
#define F E E E E E E E E
#define G F F F F F F F F
#define H G G G G G G G G
#define I H H H H H H H H
#define J I I I I I I I I
J J J J
__COUNTER__
just takes too long to preprocess).
Plus I've included all the snippets from the paper into one testcase.
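For reference, a small sketch of the behavior being relied on, plus the arithmetic behind the omitted testcase. The constant and function names are made up for illustration; only the relative order of two __COUNTER__ expansions is assumed.

```cpp
#include <cassert>

// Each expansion of __COUNTER__ yields the next integer; only the
// difference between two consecutive expansions is checked here.
constexpr int counter_first = __COUNTER__;
constexpr int counter_second = __COUNTER__;
static_assert (counter_second == counter_first + 1, "consecutive values");

// Expansions in the omitted testcase: A expands __COUNTER__ 8 times and
// each of B .. J multiplies that by 8, so J is 8^10 == 2^30 expansions.
// J is used 4 times, followed by one more __COUNTER__.
constexpr unsigned long long expansions_in_omitted_test ()
{
  return 4ULL * (1ULL << 30) + 1;
}
```

That count is well past 2147483648, which is why such a test would reach the new diagnostic but is impractical to preprocess.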
2025-09-01 Jakub Jelinek <jakub@redhat.com>
* macro.cc: Implement C2Y N3457 - The __COUNTER__ predefined macro.
(_cpp_builtin_macro_text): Diagnose if __COUNTER__ reaches 2147483648 value.
Jakub Jelinek [Mon, 1 Sep 2025 19:47:09 +0000 (21:47 +0200)]
c: Rename uimaxabs to umaxabs
The following patch implements
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3577.txt
No big deal on the GCC side, for uimaxabs we just won't
recognize it as builtin and I don't see it worth preserving
__builtin_uimaxabs, I doubt anything but gcc testsuite used
that.
But on the glibc side I think it will need to remain exported
for ABI compatibility :(
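As a reminder of what the renamed function computes, here is a portable sketch, assuming the usual semantics of the unsigned abs family (absolute value of an intmax_t returned as uintmax_t). The name umaxabs_sketch is hypothetical and this is not the builtin's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Absolute value of an intmax_t, returned in the unsigned type so that
// INTMAX_MIN has a representable result.
constexpr std::uintmax_t umaxabs_sketch (std::intmax_t x)
{
  // Negate after converting to unsigned to avoid signed overflow.
  return x < 0 ? -static_cast<std::uintmax_t> (x)
               : static_cast<std::uintmax_t> (x);
}
```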
2025-09-01 Jakub Jelinek <jakub@redhat.com>
* builtins.def: Implement C2Y N3577 - Rename s/uimaxabs/umaxabs/.
(BUILT_IN_UIMAXABS): Rename to ...
(BUILT_IN_UMAXABS): ... this. Change second argument to "umaxabs".
* builtins.cc (fold_builtin_1): Use BUILT_IN_UMAXABS rather than
BUILT_IN_UIMAXABS.
* gcc.c-torture/execute/builtins/lib/abs.c (uimaxabs): Rename to ...
(umaxabs): ... this.
* gcc.c-torture/execute/builtins/uabs-2.c (uimaxabs): Rename to ...
(umaxabs): ... this.
(main_test): Use umaxabs instead of uimaxabs.
* gcc.c-torture/execute/builtins/uabs-3.c (main_test): Use umaxabs
instead of uimaxabs.
Harald Anlauf [Sun, 31 Aug 2025 18:42:23 +0000 (20:42 +0200)]
Fortran: truncate constant string passed to character,value dummy [PR121727]
PR fortran/121727
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_const_length_character_type_p): New helper
function.
(conv_dummy_value): Use it to determine if a character actual
argument has a constant length. If a character actual argument is
constant and longer than the dummy, truncate it at compile time.
diagnostics: Fix bootstrap fail on Darwin 32b hosts.
The use of HOST_SIZE_T_PRINT_HEX needs to be paired with a c-style
cast to (fmt_size_t) otherwise the detection mechanisms in hwint.h
are not sufficient to deal with size_t defined as 'long unsigned int'
which is done on Darwin (and I think on Windows).
This patch just makes that update.
gcc/ChangeLog:
* diagnostics/logging.h (log_param_location_t): Cast
location_t value to fmt_size_t.
configure, Darwin: Do not claim .cfi_xxx instruction support.
While the assemblers used by Darwin that are based on LLVM do
support .cfi_ instructions, their use triggers production of
compact unwind which currently does not interwork properly with
GCC's output.
When the system objdump is used in the configure process this is
currently working by good fortune (the objdump does not recognise
the command and we fail to detect the cfi_advance).
However, if a user has binutils objdump earlier in their PATH then
we will detect support and try to use .cfi_ which will cause later
and hard-to-diagnose issues.
Until we have this resolved, force cfi instruction use off for
Darwin.
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Do not claim cfi instruction support even
if the assembler has it.
Yoshinori Sato [Mon, 1 Sep 2025 17:12:17 +0000 (11:12 -0600)]
PR target/89828 Internal compiler error on "-fno-omit-frame-pointer"
The problem was caused by an erroneous note about creating a stack frame,
which caused the cur_cfa reg to fail to assert with a value other than
the frame pointer.
This fix will generate notes that correctly update cur_cfa.
v2 changes.
Add testcase.
All tests that failed with
"internal compiler error: in dwarf2out_frame_debug_adjust_cfa, at dwarf2cfi.cc"
now pass.
PR target/89828
gcc
* config/rx/rx.cc (add_pop_cfi_notes): Release the frame pointer if it is
used.
(rx_expand_prologue): Redesigned stack pointer and frame pointer update
process.
Like we do in other effective-targets, add
"-mcpu=unset -march=armv8-a"
directly when setting et_arm_v8_neon_flags in arm_v8_neon_ok_nocache,
to avoid having to add these two flags in all users of arm_v8_neon_ok.
This avoids duplication and possible typos / oversights.
testsuite: arm: remove arm32 check from a few effective-targets
A few arm effective-targets call check_effective_target_arm32 even
though they would force a -march=XXX flag which supports Arm and/or
Thumb-2, thus making the arm32 check useless. This has an impact when
the toolchain is configured with a default -march or -mcpu which
supports Thumb-1 only: in such a case, arm32 is false and we skip many
tests, thus reducing coverage.
This patch removes the call to check_effective_target_arm32 where it
is useless, enabling about 2000 tests.
In addition, add an early exit if the target is not an arm one, thus
saving a few compilation cycles where not needed. In all callers of
arm_neon_ok, remove the now useless "istarget arm*-*-*".
Richard Biener [Mon, 1 Sep 2025 11:29:23 +0000 (13:29 +0200)]
tree-optimization/121744 - handle CST << var in shift pattern recog
We currently do not handle promotion/demotion of 'var' when the
left operand of a variable shift is constant. There's no good
reason why, so the following fixes this omission.
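A loop of roughly the following shape is the kind of case this is about: a constant left operand with a variable shift amount held in a narrower type. This is only an illustrative sketch (the function name is made up), not the testcase from the PR, and whether a given target vectorizes it is not claimed here.

```cpp
#include <cassert>
#include <cstddef>

// Constant left operand, variable shift amount loaded from unsigned char.
// The amounts are assumed to be smaller than the width of int.
void shift_constant (int *out, const unsigned char *amount, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = 3 << amount[i];
}
```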
PR tree-optimization/121744
* tree-vect-patterns.cc (vect_recog_vector_vector_shift_pattern):
Allow constant left operand.
Richard Biener [Wed, 27 Aug 2025 12:59:34 +0000 (14:59 +0200)]
Eliminate some STMT_VINFO_REDUC_IDX for SLP_TREE_REDUC_IDX
The following uses SLP_TREE_REDUC_IDX where it looks more appropriate.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use SLP_TREE_REDUC_IDX for following the SLP graph and
for identifying whether we use the 'else' in a COND.
(vectorizable_lane_reducing): Simplify check of whether
we are in a reduction.
(vectorizable_reduction): Add sanity checking around
SLP_TREE_REDUC_IDX and use it where it looks appropriate.
(vect_transform_reduction): Use SLP_TREE_REDUC_IDX.
* tree-vect-stmts.cc (vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_condition): Likewise.
Richard Biener [Wed, 27 Aug 2025 13:20:03 +0000 (15:20 +0200)]
Remove no longer needed STMT_VINFO_REDUC_DEF sets
The following removes no longer needed extra sets of STMT_VINFO_REDUC_DEF
and replaces a single remaining one with a more appropriate check.
* tree-vect-loop.cc (vectorizable_live_operation): Check
vect_is_reduction on the SLP node rather than
STMT_VINFO_REDUC_DEF on the stmt.
(vectorizable_reduction): Do not set STMT_VINFO_REDUC_DEF
on live stmts.
Richard Biener [Fri, 22 Aug 2025 10:29:35 +0000 (12:29 +0200)]
Introduce abstraction for vect reduction info, tracked from SLP nodes
While we have already the accessor info_for_reduction, its result
is a plain stmt_vec_info. The following turns that into a class
for the purpose of changing accesses to reduction info to a new
set of accessors prefixed with VECT_REDUC_INFO and removes
the corresponding STMT_VINFO prefixed accessors where possible.
There are a few reduction-related things that are used by scalar
cycle detection and thus have to stay as-is for now and as
copies in future.
This also separates reduction info into one object per reduction
and associates it with SLP nodes, splitting it out from
stmt_vec_info, retaining (and duplicating) parts used by scalar
cycle analysis. The data is then associated with SLP nodes
forming reduction cycles and accessible via info_for_reduction.
The data is created at SLP discovery time as we look at it even
pre-vectorizable_reduction analysis, but most of the data is
only populated by the latter. There is no reduction info with
nested cycles that are not part of an outer reduction.
In the process this adds cycle info to each SLP tree, notably
the reduc-idx and a way to identify the reduction info.
* tree-vectorizer.h (vect_reduc_info): New.
(create_info_for_reduction): Likewise.
(VECT_REDUC_INFO_TYPE): Likewise.
(VECT_REDUC_INFO_CODE): Likewise.
(VECT_REDUC_INFO_FN): Likewise.
(VECT_REDUC_INFO_SCALAR_RESULTS): Likewise.
(VECT_REDUC_INFO_INITIAL_VALUES): Likewise.
(VECT_REDUC_INFO_REUSED_ACCUMULATOR): Likewise.
(VECT_REDUC_INFO_INDUC_COND_INITIAL_VAL): Likewise.
(VECT_REDUC_INFO_EPILOGUE_ADJUSTMENT): Likewise.
(VECT_REDUC_INFO_FORCE_SINGLE_CYCLE): Likewise.
(VECT_REDUC_INFO_RESULT_POS): Likewise.
(VECT_REDUC_INFO_VECTYPE): Likewise.
(STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL): Remove.
(STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT): Likewise.
(STMT_VINFO_FORCE_SINGLE_CYCLE): Likewise.
(STMT_VINFO_REDUC_FN): Likewise.
(STMT_VINFO_REDUC_VECTYPE): Likewise.
(vect_reusable_accumulator::reduc_info): Adjust.
(vect_reduc_type): Adjust.
(_slp_tree::cycle_info): New member.
(SLP_TREE_REDUC_IDX): Likewise.
(vect_reduc_info_s): Move/copy data from ...
(_stmt_vec_info): ... here.
(_loop_vec_info::reduc_infos): New member.
(info_for_reduction): Adjust to take SLP node.
(vect_reduc_type): Adjust.
(vect_is_reduction): Add overload for SLP node.
* tree-vectorizer.cc (vec_info::new_stmt_vec_info):
Do not initialize removed members.
(vec_info::free_stmt_vec_info): Do not release them.
* tree-vect-stmts.cc (vectorizable_condition): Adjust.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
cycle info.
(vect_build_slp_tree_2): Compute SLP reduc_idx and store
it. Create, populate and propagate reduction info.
(vect_print_slp_tree): Print cycle info.
(vect_analyze_slp_reduc_chain): Set cycle info on the
manually added conversion node.
(vect_optimize_slp_pass::start_choosing_layouts): Adjust.
* tree-vect-loop.cc (_loop_vec_info::~_loop_vec_info):
Release reduction infos.
(info_for_reduction): Get the reduction info from
the vector in the loop_vinfo.
(vect_create_epilog_for_reduction): Adjust.
(vectorizable_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise, deal with nested
cycles not part of a double reduction having no reduction info.
* config/aarch64/aarch64.cc (aarch64_force_single_cycle):
Use VECT_REDUC_INFO_FORCE_SINGLE_CYCLE, get SLP node and use
that.
(aarch64_vector_costs::count_ops): Adjust.
Richard Biener [Fri, 29 Aug 2025 11:50:32 +0000 (13:50 +0200)]
Simplify vectorizer IV analysis
The following simplifies the flow of IV analysis a bit.
* tree-vect-loop.cc (vect_is_simple_iv_evolution): Get
stmt_info and store into STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
and STMT_VINFO_LOOP_PHI_EVOLUTION_PART here. Drop unused
output parameters.
(vect_is_nonlinear_iv_evolution): Likewise.
(vect_analyze_scalar_cycles_1): Remove redundant setting
of STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED and
STMT_VINFO_LOOP_PHI_EVOLUTION_PART.
ira: Remove soft conflict related code in improve_allocation. [PR117838]
The original intention of this code was to allow more allocnos
to share the same register, but this led to expensive allocno
overflows. Extracted a small case (a bit large, see Bugzilla
PR117838 for details) from 548.exchange2_r to analyze this
register allocation issue.
------------------------------
Dump info in improve_allocation function:
Base:
Spilling a493r125 for a5r113
Spilling a573r202 for a5r113
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a20r237 for a551r120
With patch:
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a493r125 for a551r120
------------------------------
After assign_hard_reg (at the end of improve_allocation):
Collected spec2017 performance on Znver3/Graviton4/EMR/SRF for O2 and Ofast.
No performance regression was observed.
For multi-copy O2:
SRF: 548.exchange2_r increased by 7.5%, 500.perlbench_r increased by 2.0%.
EMR: 548.exchange2_r increased by 4.5%, 500.perlbench_r increased by 1.7%.
Graviton4: 548.exchange2_r increased by 2.2%, 511.povray_r increased by 2.8%.
Znver3: 500.perlbench_r increased by 2.0%.
gcc/ChangeLog:
PR rtl-optimization/117838
* ira-color.cc (improve_allocation): Remove soft conflict related code.
liuhongt [Fri, 29 Aug 2025 06:38:00 +0000 (23:38 -0700)]
Fix ICE due to wrong operand being passed to ix86_vgf2p8affine_shift_matrix.
1) Fix predicate of operands[3] in cond_<insn><mode> since only
const_vec_dup_operand is accepted for masked operations, and pass real
count to ix86_vgf2p8affine_shift_matrix.
2) Pass operands[2] instead of operands[1] to
gen_vgf2p8affineqb_<mode>_mask which expects the operand to be shifted,
but operands[1] is mask operand in cond_<insn><mode>.
gcc/ChangeLog:
PR target/121699
* config/i386/predicates.md (const_vec_dup_operand): New
predicate.
* config/i386/sse.md (cond_<insn><mode>): Fix predicate of
operands[3], and fix wrong operands passed to
ix86_vgf2p8affine_shift_matrix and
gen_vgf2p8affineqb_<mode>_mask.
xtensa: Optimize branch whether (reg:SI) is within/out the range handled by CLAMPS instruction
The CLAMPS instruction in Xtensa ISA, provided when the TARGET_CLAMPS
configuration is enabled (and also requires TARGET_MINMAX), returns the
value in the specified register clamped to between -(1<<N) and
(1<<N)-1 inclusive, where N is an immediate value from 7 to 22.
Therefore, when the above configurations are met, by comparing the clamped
result with the original value for equality, branching whether the value
is within the range mentioned above or not is implemented with fewer
instructions, especially when the upper and lower bounds of the range are
too large to fit into a single immediate assignment.
/* example (TARGET_MINMAX and TARGET_CLAMPS) */
extern void foo(void);
void test0(int a) {
if (a >= -(1 << 9) && a < (1 << 9))
foo();
}
void test1(int a) {
if (a < -(1 << 20) || a >= (1 << 20))
foo();
}
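The equivalence the new pattern relies on can be modelled portably as below. This is a sketch of the arithmetic only: clamps_model and in_clamps_range are hypothetical names, and nothing here reflects the actual insn_and_split.

```cpp
#include <cassert>

// Portable model of the CLAMPS result for a given N (7 <= N <= 22).
constexpr int clamps_model (int a, int n)
{
  return a < -(1 << n) ? -(1 << n)
         : a > (1 << n) - 1 ? (1 << n) - 1
         : a;
}

// A value is inside [-(1<<N), (1<<N)-1] exactly when clamping leaves it
// unchanged, so one clamp plus one equality test replaces two compares.
constexpr bool in_clamps_range (int a, int n)
{
  return clamps_model (a, n) == a;
}

static_assert (in_clamps_range (-(1 << 9), 9), "lower bound is inclusive");
static_assert (!in_clamps_range (1 << 9, 9), "upper bound is (1<<N)-1");
```

In these terms test0 branches on in_clamps_range (a, 9) and test1 on !in_clamps_range (a, 20).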
(note: Currently, in the RTL instruction combination pass, the possible
const_int values are fundamentally constrained by
TARGET_LEGITIMATE_CONSTANT_P() if no bare large constant assignments are
possible (i.e., neither -mconst16 nor -mauto-litpools), so limiting N to
a range of 7 to only 10 instead of to 22. A series of forthcoming
patches will introduce an entirely new "xt_largeconst" pass that will
solve several issues including this.)
gcc/ChangeLog:
* config/xtensa/predicates.md (alt_ubranch_operator):
New predicate.
* config/xtensa/xtensa.md (*eqne_in_range):
New insn_and_split pattern.
Paul Thomas [Sun, 31 Aug 2025 15:47:18 +0000 (16:47 +0100)]
Fortran: Pass PDTs to dummies with VALUE attribute [PR99709]
2025-08-31 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/99709
* trans-array.cc (structure_alloc_comps): For the case
COPY_ALLOC_COMP, do a deep copy of non-allocatable PDT arrays.
Suppress the use of 'duplicate_allocatable' for PDT arrays.
* trans-expr.cc (conv_dummy_value): When passing to a PDT dummy
with the VALUE attribute, do a deep copy to ensure that
parameterized components are reallocated.
gcc/testsuite/
PR fortran/99709
* gfortran.dg/pdt_41.f03: New test.
Shreya Munnangi [Sun, 31 Aug 2025 13:48:21 +0000 (07:48 -0600)]
[RISC-V] Improve initial RTL generation for SImode adds on rv64
So this is the next chunk of Shreya's work to adjust our add expanders. In this
patch we're adding support for adding a 2*s12 immediate in SI for rv64.
To recap, the basic idea is reduce our reliance on the define_insn_and_split
that was added a year or so ago by synthesizing the more efficient sequence at
expansion time. By handling this early rather than late the synthesized
sequence participates in the various optimizer passes in the natural way. In
contrast using the define_insn_and_split bypasses the cost modeling in combine
and hides the synthesis until after reload has completed (which in turn leads to
the problems seen in pr120811).
This doesn't solve pr120811, but it is the last prerequisite patch before
directly tackling pr120811.
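To make "2*s12 immediate" concrete, here is a rough sketch of splitting such a constant into two addi-sized parts. The helper split_add_immediate is hypothetical and only models the arithmetic; the real synthesize_add_extended works on RTL and may choose the parts differently.

```cpp
#include <cassert>

// Signed 12-bit immediate range of a single addi.
constexpr int s12_min = -2048;
constexpr int s12_max = 2047;

// Split IMM into two parts that each fit in s12.  Returns false when IMM
// is outside [2 * s12_min, 2 * s12_max] and cannot be done with two adds.
bool split_add_immediate (int imm, int &first, int &second)
{
  if (imm < 2 * s12_min || imm > 2 * s12_max)
    return false;
  // Take as much as possible in the first add, the remainder in the second.
  first = imm < s12_min ? s12_min : imm > s12_max ? s12_max : imm;
  second = imm - first;
  return true;
}
```

For example, adding 3000 becomes an add of 2047 followed by an add of 953.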
This has been bootstrapped & regression tested on the pioneer & bpi and been
through the usual testing on riscv32-elf and riscv64-elf. Waiting on
pre-commit CI before moving forward.
gcc/
* config/riscv/riscv-protos.h (synthesize_add_extended): Prototype.
* config/riscv/riscv.cc (synthesize_add_extended): New function.
* config/riscv/riscv.md (addsi3): For RV64, try synthesize_add_extended.
gcc/testsuite/
* gcc.target/riscv/add-synthesis-2.c: New test.
Pan Li [Sun, 24 Aug 2025 08:36:00 +0000 (16:36 +0800)]
RISC-V: Add test case for unsigned scalar SAT_MUL form 4
The form 4 of unsigned scalar SAT_MUL is covered in middle-expand
already, add test case here to cover form 4.
The below test suites are passed for this patch series.
* The rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-5-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-5-u8-from-u64.c: New test.
Jakub Jelinek [Sat, 30 Aug 2025 16:29:04 +0000 (18:29 +0200)]
phiopt, math-opts: Adjust spaceship_replacement and optimize_spaceship for recent libstdc++ changes [PR121698]
libstdc++ changed its ABI in <compare> for C++20 recently (under the
C++20 is still experimental rule). In addition to the -1, 0, 1 values
for less, equal, greater it now uses -128 for unordered instead of
former 2 and changes some of the operators, instead of checks like
(_M_value & ~1) == _M_value in some cases it now uses _M_reverse()
which is negation in unsigned char type + conversion back to the original
type. _M_reverse() thus turns the -1, 0, 1, -128 values into
1, 0, -1, -128. Note libc++ uses value -127 instead of 2/-128.
Now, the middle-end has some optimizations which rely on the particular
implementation and don't optimize if not. One is optimize_spaceship
which on some targets (currently x86, aarch64 and s390) attempts to use
better comparison instructions (ideally just one floating point comparison
to get all 4 possible outcomes plus some flag tests or magic instead of
2 or 3 floating point comparisons). This one can actually handle
arbitrary int non-[-1,1] values for unordered but still has a default
of 2. The patch changes that default to -128 so that even if something
is expanded as branches if it is later during RTL optimizations determined
to convert that into partial_ordering we get better code.
The other optimization (phiopt one) is about optimizing (x <=> y) < 0
etc. into just x < y. This one actually relies on the exact unordered
value (2) and has code to deal with that (_M_value & ~1) == _M_value
kind of tests and whatever match.pd lowers it. So, this patch partially
rewrites it to look for -128 instead of 2, drop those
(_M_value & ~1) == _M_value pattern recognitions and instead introduces
pattern recognition of _M_reverse(), i.e. cast to unsigned char, negation
in that type and cast back to the original signed type.
With all these changes we get back the desired optimizations for all
the cases we could optimize previously (note, for HONOR_NANS case
we don't try to optimize say (x <=> y) == 0 because the original
will raise exception if either x or y is a NaN, while turning it into
x == y will not, but (x <=> y) <= 0 is fine (x <= y), because it
does raise those exceptions).
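The _M_reverse() mapping described above can be checked with a small model. This is not the libstdc++ source, just the operation as described here (negation in unsigned char, then conversion back to the signed type), with a hypothetical function name.

```cpp
#include <cassert>

// Model of the reverse operation on the underlying value: negate in
// unsigned char, then convert back to the original signed type.
signed char reverse_model (signed char v)
{
  return static_cast<signed char> (
    static_cast<unsigned char> (-static_cast<unsigned char> (v)));
}
```

Less (-1) and greater (1) swap, while equal (0) and unordered (-128) map to themselves, which is the shape the phiopt pattern recognition now looks for.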
2025-08-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121698
* tree-ssa-phiopt.cc (spaceship_replacement): Adjust
to handle spaceship unordered value -128 rather than 2 and
stmts from the new std::partial_ordering::_M_reverse() instead
of (_M_value & ~1) == _M_value etc.
* doc/md.texi (spaceship@var{m}4): Use -128 instead of 2.
* tree-ssa-math-opts.cc (optimize_spaceship): Adjust comments
that libstdc++ unordered value is -128 rather than 2 and use
that as the default unordered value.
* config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Use
GEN_INT (-128) instead of const2_rtx and adjust comment accordingly.
* config/aarch64/aarch64.cc (aarch64_expand_fp_spaceship): Likewise.
* config/s390/s390.cc (s390_expand_fp_spaceship): Likewise.
* gcc.dg/pr94589-2.c: Adjust for expected unordered value -128
rather than 2 and negations in unsigned char instead of and with
~1 and comparison against original value.
* gcc.dg/pr94589-4.c: Likewise.
* gcc.dg/pr94589-5.c: Likewise.
* gcc.dg/pr94589-6.c: Likewise.
to check if 2 TLS_COMBINE patterns have the same source.
gcc/
PR target/121725
* config/i386/i386-features.cc
(pass_x86_cse::candidate_gnu2_tls_p): Use the UNSPEC_DTPOFF
operand to check source operand in TLS64_COMBINE pattern.
gcc/testsuite/
PR target/121725
* gcc.target/i386/pr121725-1a.c: New test.
* gcc.target/i386/pr121725-1b.c: Likewise.
Andrew Pinski [Fri, 29 Aug 2025 00:20:21 +0000 (17:20 -0700)]
forwprop: Copy the memcmp optimization from strlen to forwprop [PR116651]
To better optimize code dealing with `memcmp == 0` where we have
a small constant size, we can inline the memcmp in those cases.
There is code to do this in strlen but that is run too late in
the case where we can figure out the value of one of the arguments
to memcmp. So this copies the optimization to forwprop.
An example of where this helps is:
```
bool cmpvect(const std::vector<int> &a) { return a == std::vector<int>{10}; }
```
Where the above should be optimized to just `return a.size() == 1 && a[0] == 10;`.
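At the source level, the inlining for a small constant size amounts to something like the following. It is a hand-written sketch with made-up function names, assuming a 4-byte compare; it is not what forwprop literally emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// What the library call expresses.
bool equal4_memcmp (const void *p, const void *q)
{
  return std::memcmp (p, q, 4) == 0;
}

// Hand-written equivalent: load both sides as 32-bit values and compare.
// memcpy is used for the loads so alignment and aliasing are not an issue.
bool equal4_inline (const void *p, const void *q)
{
  std::uint32_t a, b;
  std::memcpy (&a, p, sizeof a);
  std::memcpy (&b, q, sizeof b);
  return a == b;
}
```

Only the equality result is preserved by this form, which is why it applies to `memcmp == 0` and not to ordering comparisons.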
Note pr44130.c testcase needed to change as now it will be optimized away otherwise.
Note the loop in pr44130.c is also vectorized which it was not before.
Note the optimization remains in strlen as the other part (memcmp -> memcmp_eq)
should move to either isel or fab and I didn't want to remove it just yet.
Bootstrapped and tested on x86_64-linux-gnu.
Changes since v1:
* v2: Add verification of arguments to memcmp to simplify_builtin_memcmp.
Gaius Mulley [Fri, 29 Aug 2025 21:10:29 +0000 (22:10 +0100)]
PR modula2/121709: Failed bootstrap in m2
This patch is a followup to PR modula2/121629 which uses
the cpp_include_defaults array to configure the default search path
entries. In particular it creates default search paths
based on LOCAL_INCLUDE_DIR, PREFIX_INCLUDE_DIR, gcc version path
and NATIVE_SYSTEM_HEADER_DIR.
Sirui Mu [Thu, 28 Aug 2025 13:48:24 +0000 (21:48 +0800)]
c++: array subscript with COND_EXPR as the array
The following minimum reproducer would miscompile with vanilla gcc:
extern int x[10], y[10];
bool g();
void f() { 0[g() ? x : y] = 1; }
gcc would mistakenly treat the subexpression (g() ? x : y) as a prvalue and
move that array to stack. The following assignment would then write to the
stack instead of to the global arrays. When optimizations are enabled, this
assignment is discarded by dse and gcc generates the following code for the
f function:
"_Z1fi":
jmp "_Z1gv"
The miscompilation requires all the following conditions to be met:
- The array subscription expression is written as idx[array], instead of
the usual form array[idx];
- The "array" part must be a ternary expression (COND_EXPR in gcc tree)
and it must be an lvalue.
- The code must be compiled with -fstrong-eval-order which is the default
for -std=c++17 or later.
The cause of the issue lies in cp_build_array_ref, where it mistakenly
generates a COND_EXPR with ARRAY_TYPE to the IL when all the criteria above
are met. This patch tries to resolve this issue. It moves the
canonicalization step that transforms idx[array] to array[idx] early in
cp_build_array_ref to ensure we handle these two forms of array subscription
consistently.
David Malcolm [Fri, 29 Aug 2025 18:39:37 +0000 (14:39 -0400)]
diagnostics: add GCC_DIAGNOSTICS_LOG
Whilst experimenting with PR diagnostics/121039 (potentially capturing
suppressed diagnostics in SARIF output), I found it very useful to have
a text log from the diagnostic subsystem to track what it's doing and
the decisions it's making (e.g. exactly when and why a diagnostic is
being rejected).
This patch adds a simple logging mechanism to the diagnostics subsystem,
enabled by setting GCC_DIAGNOSTICS_LOG in the environment, which emits
nested text like this to stderr (or a named file):
warning (option_id: 668, gmsgid: "%<-Wformat-security%> ignored without %<-Wformat%>")
diagnostics::context::diagnostic_impl (option_id: 668, kind: warning, gmsgid: "%<-Wformat-security%> ignored without %<-Wformat%>")
diagnostics::context::report_diagnostic
rejecting: diagnostic not enabled
false <- diagnostics::context::diagnostic_impl
false <- warning
This logging mechanism doesn't use pretty_printer because it can be
helpful to use it to debug pretty_printer itself.
xtensa: Rewrite bswapsi2_internal with compact syntax
Also, the omission of the instruction that sets the shift amount register
(SAR) to 8 is now more efficient: it is omitted if there was a previous
bswapsi2 in the same BB, but not omitted if no bswapsi2 is found or another
insn that modifies SAR is found first (see below).
Note that the five instructions for writing to SAR are as follows, along
with the insns that use them (except for bswapsi2_internal itself):
* config/xtensa/xtensa-protos.h (xtensa_bswapsi2_output):
New function prototype.
* config/xtensa/xtensa.cc
(xtensa_bswapsi2_output_1, xtensa_bswapsi2_output):
New functions.
* config/xtensa/xtensa.md (bswapsi2_internal):
Rewrite in compact syntax and use xtensa_bswapsi2_output() as asm
output.
Jeff Law [Fri, 29 Aug 2025 17:43:30 +0000 (11:43 -0600)]
[RISC-V][PR target/121548] Avoid bogus index into recog operand cache
So the RISC-V port has attributes which indicate the index within the
recog_data where certain operands will be found.
For this BZ the default value for the merge_op_idx attribute on the given insn
is "2". But the insn only has operands 0 & 1. So we do an out of bounds array
access and boom the ICE/valgrind failure.
As we discussed in the patchwork meeting, this is all a bit clunky and has been
fairly error prone. This doesn't add any massive checking, but does introduce
some asserts to help catch problems a bit earlier and clearer.
In particular in cases where we're already asserting that the returned index is
valid (!= INVALID_ATTRIBUTE) we also assert that the index is less than the
total number of operands.
In the get_vlmax_ta_preferred_avl routine it appears like we need to handle
these two cases more gracefully as we apparently legitimately query for the
merge_op_idx on a fairly arbitrary insn. We just have to make sure to not
*use* the result if it's INVALID_ATTRIBUTE. So for that code we assert that
merge_op_idx is either INVALID_ATTRIBUTE or smaller than the number of
operands.
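The two flavors of check can be sketched as predicates. All names and the sentinel value below are hypothetical stand-ins; the real INVALID_ATTRIBUTE value and the assert sites in riscv-avlprop.cc are not reproduced here.

```cpp
#include <cassert>

// Hypothetical sentinel standing in for INVALID_ATTRIBUTE.
constexpr int invalid_attribute_sketch = 255;

// Strict form: used where the index is already required to be valid.
constexpr bool index_in_bounds (int idx, int n_operands)
{
  return idx != invalid_attribute_sketch && idx >= 0 && idx < n_operands;
}

// Relaxed form: the index may be the sentinel (and must then not be
// used), otherwise it has to be smaller than the number of operands.
constexpr bool index_ok_or_invalid (int idx, int n_operands)
{
  return idx == invalid_attribute_sketch || (idx >= 0 && idx < n_operands);
}

// The case from the BZ: default merge_op_idx of 2 on an insn that only
// has operands 0 and 1 is rejected by both forms.
static_assert (!index_ok_or_invalid (2, 2), "out of bounds index is caught");
```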
This patch also adds overrides for 3 patterns to return INVALID_ATTRIBUTE for
merge_op_idx, similar to how they already do for mode_idx and avl_type_idx.
This has been bootstrapped and regression tested on the bpi & pioneer systems
and regression tested for riscv32-elf and riscv64-elf. Waiting on CI before
pushing.
PR target/121548
gcc/
* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): Assert
MODE_IDX is smaller than the number of operands.
(simplify_replace_vlmax_avl): Similarly.
(pass_avlprop::get_vlmax_ta_preferred_avl): Similarly.
* config/riscv/vector.md: Override merge_op_idx computation
for simple moves, just like is done for avl_type_idx and mode_idx.
Harald Anlauf [Thu, 28 Aug 2025 20:07:10 +0000 (22:07 +0200)]
Fortran: improve compile-time checking of character dummy arguments [PR93330]
PR fortran/93330
gcc/fortran/ChangeLog:
* interface.cc (get_sym_storage_size): Add argument size_known to
indicate that the storage size could be successfully determined.
(get_expr_storage_size): Likewise.
(gfc_compare_actual_formal): Use them to handle zero-sized dummy
and actual arguments.
If a character formal argument has the pointer or allocatable
attribute, or is an array that is not assumed or explicit size,
we generate an error by default unless -std=legacy is specified,
which falls back to just giving a warning.
If -Wcharacter-truncation is given, warn on a character actual
argument longer than the dummy. Generate an error for too short
scalar character arguments if -std=f* is given instead of just a
warning.