git.ipfire.org Git - thirdparty/gcc.git/log

tree-optimization/125185 - fix ICE with associating DOT_PROD_EXPR

When trying to discover a SLP reduction chain we eventually feed
non-binary associatable stmts to vect_slp_linearize_chain which
isn't prepared for that. Don't.

PR tree-optimization/125185
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): Guard
first vect_slp_linearize_chain call.

* gcc.dg/torture/pr125185.c: New testcase.

libstdc++: Make std::unique_ptr<void>::operator* SFINAE-friendly

This implements LWG 4324, "unique_ptr<void>::operator* is not
SFINAE-friendly", approved in Croydon, 2026.

The noexcept-specifier added to C++23 by LWG 2762 is ill-formed if the
pointer type cannot be dereferenced, which means that code which was
checking whether the function exists (e.g. in a SFINAE context) no
longer works. Such code was always questionable, because the function
body was ill-formed if the pointer isn't dereferenceable, so the SFINAE
check was probably giving the wrong answer, but it was possible to ask
the question. Since LWG 2762, just asking the question can produce an
error outside the immediate context, so operator* is no longer
SFINAE-friendly.

LWG 4324 adds a constraint to the function, so that it doesn't
participate in overload resolution if it would be ill-formed. That's
easy to implement for C++20 because we can just add a requires-clause.

For C++11/14/17 we can't constrain it easily, so just adjust the
noexcept-specifier so that it's not ill-formed. This still means you get
the wrong answer (i.e. it looks like unique_ptr<void>::operator* is
callable) but there's no error outside the immediate context. This
restores the original semantics before the LWG 2762 change, for better
or worse.
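
The difference between a constrained and an unconstrained operator* can be
sketched with a stand-in type. This is only an illustration of the idea, not
the libstdc++ change: tiny_ptr, can_deref and demo are hypothetical names, and
the enable_if constraint is just one pre-C++20 way to spell what a
requires-clause does in C++20.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a smart pointer; this is not std::unique_ptr.
template<typename T>
struct tiny_ptr
{
  T* p = nullptr;

  // Constrained operator*: removed from overload resolution when T is void,
  // so asking whether it exists never causes a hard error.
  template<typename U = T>
  typename std::enable_if<!std::is_void<U>::value, U&>::type
  operator*() const noexcept
  { return *static_cast<U*>(p); }  // body kept dependent on U
};

// Detection trait: true if *std::declval<P&>() is well-formed.
template<typename P, typename = void>
struct can_deref : std::false_type { };

template<typename P>
struct can_deref<P, decltype(static_cast<void>(*std::declval<P&>()))>
  : std::true_type { };

static_assert(can_deref<tiny_ptr<int>>::value, "int pointer is dereferenceable");
static_assert(!can_deref<tiny_ptr<void>>::value, "void pointer is not");

// Small usage check for the dereferenceable case.
inline void demo()
{
  int x = 5;
  tiny_ptr<int> q;
  q.p = &x;
  assert(*q == 5);
}
```

With the constraint in place the trait gives the right answer for both
element types; without it, the question for the void case would either
answer wrongly or fail to compile.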

libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h (unique_ptr::_Nothrow_deref): New
helper for pre-C++20.
(unique_ptr::operator*): Either constrain or use _Nothrow_deref.
* testsuite/20_util/unique_ptr/lwg4324.cc: New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

xtensa: Apply further improvement to xtensa_legitimize_address()

The load/store instructions in the Xtensa ISA have an unsigned 8-bit
displacement immediate field that scales with the byte width of the
reference.  That is, for a 1-byte reference, the displacement is between
0 and 255, for 2-bytes between 0 and 510, and for 4-bytes between 0 and
1020.

However, xtensa_legitimize_address() has not been able to take advantage
of this fact until now, and has limited the maximum displacement to 255
regardless of the reference byte width.

This patch resolves the above limitation and slightly improves the
efficiency of large positive displacements during memory accesses wider
than 1 byte.

     /* example */
     int test(short a[]) {
       return a[32767] + a[16511] + a[1];
     }

     ;; before (-O2)
      .literal_position
      .literal .LC0, 65534
     test:
      entry sp, 32
      l32r a8, .LC0
      addmi a9, a2, 0x100
      add.n a8, a2, a8
      addmi a9, a9, 0x7f00
      l16si a8, a8, 0 ;; 32767 = 65534 / 2
      l16si a9, a9, 254 ;; 16511 = (32512 + 256 + 254) / 2
      l16si a2, a2, 2
      add.n a8, a8, a9
      add.n a2, a8, a2
      retw.n

     ;; after (-O2)
     test:
      entry sp, 32
      addmi a9, a2, 0x7f00 ;; CSEd
      addmi a8, a9, 0x7f00
      l16si a8, a8, 510 ;; 32767 = (32512 + 32512 + 510) / 2
      l16si a9, a9, 510 ;; 16511 = (32512 + 510) / 2
      l16si a2, a2, 2
      add.n a8, a8, a9
      add.n a2, a8, a2
      retw.n
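
The displacement ranges and the offsets in the listing comments above can be
re-derived with a few lines of arithmetic. This is a minimal sketch;
max_displacement, fits_displacement and check_listing_arithmetic are
hypothetical helper names, not functions from xtensa.cc.

```cpp
#include <cassert>
#include <cstdint>

// Largest byte displacement a load/store of WIDTH_BYTES can encode:
// an unsigned 8-bit field scaled by the access width.
constexpr std::uint32_t max_displacement(std::uint32_t width_bytes)
{
  return 255u * width_bytes;
}

// Hypothetical helper: does OFFSET fit the immediate field directly?
constexpr bool fits_displacement(std::uint32_t offset, std::uint32_t width_bytes)
{
  return offset % width_bytes == 0 && offset <= max_displacement(width_bytes);
}

// Re-checks the arithmetic in the listing comments for 2-byte accesses.
inline void check_listing_arithmetic()
{
  assert(max_displacement(1) == 255);
  assert(max_displacement(2) == 510);
  assert(max_displacement(4) == 1020);
  // a[16511]: byte offset 33022 = 0x7f00 (one addmi) + 510.
  assert(16511 * 2 == 0x7f00 + 510);
  // a[32767]: byte offset 65534 = 0x7f00 + 0x7f00 + 510.
  assert(32767 * 2 == 0x7f00 + 0x7f00 + 510);
  assert(fits_displacement(510, 2) && !fits_displacement(512, 2));
}
```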

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_legitimize_address):
Modify to extend the upper limit of the coverable offset if the
address displacement of the corresponding machine instruction is
greater than 255.

testsuite: Fix up pr122569*.c tests [PR122569]

On Tue, May 05, 2026 at 02:27:23PM +0800, H.J. Lu wrote:
> The new tests failed with -m32 on Linux/x86-64:
>
> FAIL: gcc.dg/tree-ssa/pr122569-1.c scan-tree-dump forwprop1
> "__builtin_ctz|\\.CTZ"
> FAIL: gcc.dg/tree-ssa/pr122569-2.c scan-tree-dump forwprop1
> "__builtin_clz|\\.CLZ"
>
> Should these tests require int128?

They should first of all require ctzll resp. clzll effective targets;
if there is a function call for those, then it certainly isn't optimized.

The problem is that that isn't enough: ia32 is both a ctzll and a clzll
effective target. That is because we handle double-word __builtin_c[tl]zll
by doing 2 word ops and one conditional.
The tree-ssa-forwprop.cc optimization is checking for whether it can use
IFN_CLZ/IFN_CTZ, and that is not the case, because we only use direct optab
for that and don't have the double-word unop fallback for that.

Rather than int128 I think it is more natural to test for lp64 || llp64.

2026-05-05 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-1.c: Only require __builtin_ctz/.CTZ
on ctzll 64-bit targets.
* gcc.dg/tree-ssa/pr122569-2.c: Only require __builtin_clz/.CLZ
on clzll 64-bit targets.

Reviewed-by: Richard Biener <rguenth@suse.de>

i386: Avoid splitting 16/32-bit volatile mem test into 8-bit test [PR125180]

With -ffuse-ops-with-volatile-access, which is now even on by default,
this splitter can split a 16 or 32-bit volatile memory test into
an 8-bit volatile memory test, which is undesirable; e.g. when
it refers to some memory mapped hw registers it could misbehave.

2026-05-05 Jakub Jelinek <jakub@redhat.com>

PR target/125180
* config/i386/i386.md (HI/SI test -> QI test splitter): Punt if
operands[2] is a volatile MEM.

* gcc.target/i386/pr125180.c: New test.

Reviewed-by: Uros Bizjak <ubizjak@gmail.com>

c++: modules: Fix posix_fallocate error handling

When testing GCC on FreeBSD in a ZFS build directory, every single
g++.dg/modules test FAILS like

FAIL: g++.dg/modules/100616_a.H (internal compiler error: Segmentation fault)
FAIL: g++.dg/modules/100616_a.H (test for excess errors)
FAIL: g++.dg/modules/100616_a.H module-cmi  (gcm.cache/\$srcdir/g++.dg/modules/100616_a.H.gcm)

for a total of almost 2200.  This happens because posix_fallocate
returns ENOTSUP as documented in IEEE 1003.1-2024/XPG8:

[ENOTSUP]
    The underlying file system does not support this operation.

However, module.cc (elf_out::create_mapping) only falls back to
ftruncate for a return value of EINVAL.  This won't happen on
glibc-based systems because posix_fallocate itself emulates the
allocation under the hood, so the error is never exposed.

The patch is trivial: just also expect ENOTSUP in this situation, which
fixes all related failures.
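
The fallback described above can be sketched as follows. This is not the
code in module.cc; reserve_file_space is a hypothetical helper, and it
assumes a POSIX system that provides posix_fallocate.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not the code in module.cc: make sure FD is at least
// LENGTH bytes long.  Returns 0 on success or an errno value on failure.
inline int reserve_file_space(int fd, off_t length)
{
  assert(length > 0);

  // posix_fallocate reports errors through its return value, not errno.
  int err = posix_fallocate(fd, 0, length);
  if (err == EINVAL || err == ENOTSUP)
    {
      // The file system cannot preallocate (EINVAL on some systems, ENOTSUP
      // on others such as the FreeBSD/ZFS case above), so just set the size.
      // Unlike posix_fallocate, ftruncate can also shrink the file, so this
      // sketch assumes FD refers to a freshly created file.
      err = ftruncate(fd, length) == 0 ? 0 : errno;
    }
  return err;
}
```

Treating both error codes the same way keeps the caller independent of
whether the C library emulates the allocation or reports the missing
file system support.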

Bootstrapped without regressions on amd64-pc-freebsd15.0,
i386-pc-solaris2.11, and x86_64-pc-linux-gnu.

2026-05-01  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/cp:
* module.cc (elf_out::create_mapping) [MAPPED_WRITING]
(elf_out::create_mapping) [HAVE_POSIX_FALLOCATE]: Allow for
ENOTSUP return from posix_fallocate.

libstdc++: Remove duplicated __mdspan::__is_constant_wrapper.

Replace it with std::__is_constant_wrapper_v from utility.

libstdc++-v3/ChangeLog:

* include/std/mdspan: Replace eight spaces with tabs.
(__mdspan::__is_constant_wrapper): Remove.
(__mdspan::__acceptable_slice_type, __mdspan::__static_slice_extent)
(__mdspan::__is_unit_stride_slice, __mdspan::__canonical_range_slice)
(__mdspan::__check_inrange_index, __mdspan::__check_valid_index)
(__mdspan::__check_valid_slice): Use std::__is_constant_wrapper_v.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

tree-optimization/125124 - disable sanity checking of BB SLP partitioning

The following disables a sanity check that BB SLP partitioning correctly
partitioned the SLP graph.

PR tree-optimization/125124
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Disable
BB SLP partitioning sanity-check.

* gcc.dg/torture/pr125124.c: New testcase.

x86: Fix shift-gf2p8affine-2.c failure on non-AVX512 CPU

Enabling AVX512 via command line may cause the compiler to generate
AVX512 instructions even before the runtime CPU feature check, causing
the test to SIGILL if the CPU lacks AVX512.  Extract tests to do_test
and change main to call it only if __builtin_cpu_supports ("gfni") returns
true to avoid any AVX512 instructions in main:

main:
        movq    __cpu_features2@GOTPCREL(%rip), %rax
        testb   $1, (%rax)
        jne     .L1577
        xorl    %eax, %eax
        ret
.L1577:
        pushq   %rax
        call    do_test
        xorl    %eax, %eax
        popq    %rdx
        ret

* gcc.target/i386/shift-gf2p8affine-2.c (do_test): New function.
Extracted from main.
(main): Drop __builtin_cpu_init.  Call do_test only if
__builtin_cpu_supports ("gfni") returns true.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

configury: Use only one copy of CHECK_ATTRIBUTE_VISIBILITY macro

Currently libatomic, libgfortran, libgomp, and libitm have a version
of the CHECK_ATTRIBUTE_VISIBILITY macro.

Put the macro in its own file and have all libraries use it.

config/ChangeLog:

* visibility.m4: New file.

libatomic/ChangeLog:

* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBAT_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBAT_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.

libgfortran/ChangeLog:

* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY.

libgomp/ChangeLog:

* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.

libitm/ChangeLog:

* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBITM_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBITM_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.

Signed-off-by: Pietro Monteiro <pietro@sociotechnical.xyz>

Daily bump.

c++: Fix handling of && after a class definition [PR65271]

After r166977, we are wrongly rejecting:
struct {} && m = {};
because our code to diagnose a missing ; after a class definition doesn't
realize that && can follow a class definition.

This is similar in nature to what was done for `::` in r12-8304-g851031b2fcd5210b9676.

Bootstrapped and tested on x86_64-linux-gnu.

Changes since v1:
* v2: Remove the check on c++11 and add a few more testcases.
* v3: Move CPP_AND_AND right below CPP_AND_AND and add enum case to the testcase.

PR c++/65271

gcc/cp/ChangeLog:

* parser.cc (cp_parser_class_specifier): Accept &&.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/rv-decl1.C: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

c/c++: Declare stack protection guard as a global symbol

Add a new target hook, stack_protect_guard_symbol_p, to support the user
provided stack protection guard as a global symbol. If the hook returns
true,

1. Declare __stack_chk_guard as a global uintptr_t variable so that it
can be initialized as an integer.
2. If the user declared variable matches __stack_chk_guard, merge it with
__stack_chk_guard, including its visibility attribute.

gcc/

PR c/121911
* target.def (stack_protect_guard_symbol_p): New target hook.
* targhooks.cc (default_stack_protect_guard): Use the type of
uintptr_t, instead of ptr_type_node, if the
stack_protect_guard_symbol_p hook returns true.
* config/i386/i386.cc (ix86_stack_protect_guard_symbol_p): New.
(TARGET_STACK_PROTECT_GUARD_SYMBOL_P): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_STACK_PROTECT_GUARD_SYMBOL_P): New.

gcc/c-family/

PR c/121911
* c-common.cc (c_common_nodes_and_builtins): If the
stack_protect_guard_symbol_p hook returns true, declare a global
symbol for stack protection guard.

gcc/testsuite/

PR c/121911
* g++.target/i386/ssp-global-1.C: New test.
* g++.target/i386/ssp-global-2.C: Likewise.
* g++.target/i386/ssp-global-3.C: Likewise.
* g++.target/i386/ssp-global-hidden-1.C: Likewise.
* g++.target/i386/ssp-global-hidden-2.C: Likewise.
* g++.target/i386/ssp-global-hidden-3.C: Likewise.
* gcc.target/i386/ssp-global-2.c: Likewise.
* gcc.target/i386/ssp-global-3.c: Likewise.
* gcc.target/i386/ssp-global-4.c: Likewise.
* gcc.target/i386/ssp-global-hidden-1.c: Likewise.
* gcc.target/i386/ssp-global-hidden-2.c: Likewise.
* gcc.target/i386/ssp-global-hidden-3.c: Likewise.
* gcc.target/i386/ssp-global.c: Include <stdint.h>.
(__stack_chk_guard): Change its type to uintptr_t.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

m32c: Remove all support for M32C target.

m32c target support in GCC was deprecated in GCC 16 as the target had no
maintainer since GCC 9.

contrib/ChangeLog:

* compare-all-tests: Remove references to m32c.
* config-list.mk: Likewise.

gcc/ChangeLog:

* config/m32c/*: Delete entire directory.
* attr-urls.def: Remove references to m32c.
* config.gcc: Likewise.
* config/msp430/msp430.cc (msp430_expand_epilogue): Likewise.
* configure: Regenerate.
* configure.ac: Remove references to m32c.
* doc/extend.texi: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/md.texi: Likewise.
* explow.cc (promote_mode): Likewise.
* regenerate-opt-urls.py: Likewise.
* config/microblaze/microblaze.opt.urls: Regenerate.
* config/msp430/msp430.opt.urls: Regenerate.
* config/nds32/nds32.opt.urls: Regenerate.
* config/rl78/rl78.opt.urls: Regenerate.
* config/rs6000/sysv4.opt.urls: Regenerate.
* config/rx/elf.opt.urls: Regenerate.
* config/stormy16/stormy16.opt.urls: Regenerate.
* config/visium/visium.opt.urls: Regenerate.

libgcc/ChangeLog:

* config/m32c/*: Delete entire directory.
* config.host: Remove references to m32c.
* config/rl78/lib2div.c (C3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/20000804-1.c: Remove references to m32c.
* gcc.c-torture/compile/20001226-1.c: Likewise.
* gcc.c-torture/compile/limits-stringlit.c: Likewise.
* gcc.c-torture/execute/20020404-1.c: Likewise.
* gcc.dg/20020312-2.c: Likewise.
* gcc.dg/max-1.c: Likewise.
* gcc.dg/torture/pr26565.c: Likewise.
* gcc.dg/tree-ssa/reassoc-32.c: Likewise.
* gcc.dg/tree-ssa/reassoc-33.c: Likewise.
* gcc.dg/tree-ssa/reassoc-34.c: Likewise.
* gcc.dg/tree-ssa/reassoc-35.c: Likewise.
* gcc.dg/tree-ssa/reassoc-36.c: Likewise.
* gcc.dg/utf-array-short-wchar.c: Likewise.
* gcc.dg/utf-array.c: Likewise.
* lib/target-supports.exp: Likewise.

aarch64: Fix SVE vec_perm for VL2048 VNx16QI

SVE's vec_perm pattern is restricted to constant VLs.  There are two
expansions: one for when the selector is known to refer to only the
first vector, and one for the general case.

The first expansion uses a single TBL whereas the fallback uses a
five-instruction sequence that includes a SUB of nunits and two TBLs.

Normally the first expansion is purely an optimisation.  However,
in the specific case of a VL2048 permutation of bytes, the first
form is needed for correctness, since the SUB of nunits (256)
would be truncated to a SUB of zero.

For example, in:

  svint8_t f(svint8_t x, svint8_t y, svint8_t z) {
    return __builtin_shuffle(x, y, z);
  }

"z" can only select from "x" for VL2048.  The testcase previously
generated:

        tbl     z0.b, {z0.b}, z2.b
        tbl     z1.b, {z1.b}, z2.b
        orr     z0.d, z0.d, z1.d
        ret

where the SUB is optimised away.  This sequence is equivalent to:

    return __builtin_shuffle(x | y, x | y, z);

even though "y" should be entirely ignored.
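
The wrap-around can be reproduced with a small scalar model of the two-TBL
sequence. This is a simplified sketch under stated assumptions: bytes, tbl,
general_perm and demo are hypothetical names, the model only covers byte
elements, and it assumes the selector is already in the range [0, 2 * N).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template<std::size_t N>
using bytes = std::array<std::uint8_t, N>;

// Model of TBL on byte elements: out-of-range indices produce zero.
template<std::size_t N>
bytes<N> tbl(const bytes<N>& src, const bytes<N>& idx)
{
  bytes<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = idx[i] < N ? src[idx[i]] : 0;
  return out;
}

// Simplified model of the general two-TBL expansion.  Assumes every selector
// element is already in [0, 2N); the real sequence has further instructions.
template<std::size_t N>
bytes<N> general_perm(const bytes<N>& x, const bytes<N>& y, const bytes<N>& sel)
{
  bytes<N> sel_hi{};
  for (std::size_t i = 0; i < N; ++i)
    sel_hi[i] = static_cast<std::uint8_t>(sel[i] - N);  // SUB wraps to 8 bits
  const bytes<N> lo = tbl(x, sel), hi = tbl(y, sel_hi);
  bytes<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = lo[i] | hi[i];                               // ORR
  return out;
}

inline void demo()
{
  // N = 64 (VL512): index 3 picks x[3], index 64 + 3 picks y[3].
  bytes<64> x{}, y{}, sel{};
  x[3] = 0x11; y[3] = 0x22;
  sel[0] = 3; sel[1] = 64 + 3;
  const bytes<64> r = general_perm(x, y, sel);
  assert(r[0] == 0x11 && r[1] == 0x22);

  // N = 256 (VL2048): the SUB of 256 is a SUB of 0, so x and y are ORed.
  bytes<256> X{}, Y{}, S{};
  X[5] = 0x0f; Y[5] = 0xf0; S[0] = 5;
  assert(general_perm(X, Y, S)[0] == 0xff);  // correct answer would be 0x0f
}
```

For N = 64 the model selects from the right input, while for N = 256 the
subtraction has no effect and both lookups hit, which matches the x | y
behaviour described above.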

I used "<= nunits - 1U" rather than "< nunits" to match the existing
check and as a hopefully natural way of making the rhs unsigned.

gcc/
* config/aarch64/aarch64.cc (aarch64_expand_sve_vec_perm): Check
whether all indices of a variable selector refer to the first
values vector.

gcc/testsuite/
* gcc.target/aarch64/sve/vec_perm_2.c: New test.
* gcc.target/aarch64/sve/vec_perm_3.c: Likewise.

Regenerate xtensa.opt.urls

Fixes: 9ae50cbca946 "doc: Document several "force_l32" features for Xtensa"
gcc/ChangeLog:

* config/xtensa/xtensa.opt.urls: Regenerated.

c++, contracts: fix testsuite basic.contract.eval.p8 failed

I noticed that the following code passed.

consteval auto foo( auto x ) pre( false ) { return x; }

static_assert (foo( 1 ) == 1, "");

int main() {
  foo( 1 );
}

However, the code has contract violations.
In constexpr_call, a result with contract violations should
not be cached.

gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Do not cache
result with contract violation.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/cpp26/basic.contract.eval.p8-3.C: New test.

c++/reflection: fix ICE on is_accessible [PR124241]

Anonymous unions don't have their own access.
This patch fixes the missing check for otype in accessible_p at search.cc.

gcc/cp/ChangeLog:

PR c++/124241
* search.cc (accessible_p): Call type_context_for_name_lookup
for otype if it's anonymous union.

gcc/testsuite/ChangeLog:

PR c++/124241
* g++.dg/reflect/is_accessible2.C: Completed the TODO of the PR.

Reviewed-by: Jason Merrill <jason@redhat.com>

gcc/doc: Clarify warning for variable unused

In the www gcc-16/porting_to, two lines of comment text in
the example code for -Wunused-variable were changed from
"pre/postincrement used" to "pre/postincrement result used".
Approval there directs that the change should be propagated
back to the texi source the example came from. This is that
propagation.

gcc/ChangeLog:
* doc/invoke.texi: Insert "result" in comment text.

forwprop: allow more VPEs in simplify_vector_constructor () [PR122679]

Currently, simplify_vector_constructor () tries to rewrite a CONSTRUCTOR
expression into a VEC_PERM_EXPR, as long as constructor elements all come
from 1 or 2 source vectors.  While doing so, it protects against creating
VEC_PERM_EXPRs unsupported by the target by calling can_vec_perm_const_p
() before enacting the transformation and bailing when that returns false.

However, we can instead allow those VEC_PERM_EXPRs to be created if we
know that a later vector lowering pass will legitimize them for us.  IOW,
only if the target doesn't support the resulting permute and the
PROP_gimple_lvec property is already set, do we give up.  This patch
inserts the required checks.
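
The resulting decision can be written as a one-line predicate. This is only
a sketch of the rule stated above; may_emit_vec_perm and its parameter names
are hypothetical, with the first argument standing for the result of
can_vec_perm_const_p and the second for PROP_gimple_lvec being set.

```cpp
#include <cassert>

// Hypothetical predicate: the transformation is abandoned only when the
// target cannot do the permute AND vector lowering has already run.
constexpr bool may_emit_vec_perm(bool target_supports_perm,
                                 bool vector_lowering_done)
{
  return target_supports_perm || !vector_lowering_done;
}

static_assert(may_emit_vec_perm(true, true), "supported permutes are always fine");
static_assert(may_emit_vec_perm(false, false), "a later lowering pass will fix it up");
static_assert(!may_emit_vec_perm(false, true), "nobody left to legitimize it");

// Runtime check of the remaining combination.
inline void demo()
{
  assert(may_emit_vec_perm(true, false));
}
```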

This also allows us to remove the unnecessary vect_int requirement
(wrongly added in r16-5244-g5a2319b71e4d30) from forwprop-43.c.

(Re-)regtested on aarch64, arm, and x86_64.

PR tree-optimization/122679

gcc/ChangeLog:

* tree-ssa-forwprop.cc (simplify_vector_constructor): Check the
PROP_gimple_lvec property before returning false.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/forwprop-43.c: Remove the vect_int check.

phiprop: skip over clobbers [PR116823]

Like the already approved patch at
https://inbox.sourceware.org/gcc-patches/20240924161312.1556293-2-quic_apinski@quicinc.com/
but reworked to fit into the new simplified code and
also fixed a bug noticed for aggregates.

Aggregates include a store when doing phiprop so we need to
check if there are also loads between the original store/load
and the clobber we are skipping. Like the skipping of the
store case, I didn't see this happening enough to add the
extra checks. I did add a testcase (phiprop-5.C) which checks
this.

changes since v2:
* v3: treat aggregates special earlier and don't duplicate code.
* v2: adapt to can_handle_load instead of inline.

PR tree-optimization/116823

gcc/ChangeLog:

* tree-ssa-phiprop.cc (can_handle_load): Skip past
clobbers for !aggregate.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/phiprop-2.C: New test.
* g++.dg/tree-ssa/phiprop-4.C: New test.
* g++.dg/tree-ssa/phiprop-5.C: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Rename value_range::set_type to value_range::set_range_class

value_range::set_type doesn't set the m_type of the underlying
vrange; it merely sets m_vrange to use an appropriate vrange
subclass for the given type.

This confused me. This patch renames it to avoid other people being
similarly confused.

No functional change intended.

gcc/
* data-streamer-in.cc (streamer_read_value_range): Update for
renaming of value_range::set_type to value_range::set_range_class.
* gimple-range-gori.cc (gori_compute::compute_operand_range):
Likewise.
(gori_compute::compute_operand1_and_operand2_range): Likewise.
(gori_stmt_info::gori_stmt_info): Likewise.
(gori_calc_operands): Likewise.
(gori_name_helper): Likewise.
* ipa-cp.cc (ipcp_vr_lattice::set_to_bottom): Likewise.
* ipa-cp.h (ipcp_vr_lattice::init): Likewise.
* ipa-fnsummary.cc (evaluate_properties_for_edge): Likewise.
* ipa-prop.cc (ipa_vr::get_vrange): Likewise.
* range-op.h (range_cast): Likewise.
* value-range.h (value_range::set_type): Rename to...
(value_range::set_range_class): ...this, and add a note to the
leading comment that it doesn't set the type of the underlying
vrange.
(value_range::init): Add a similar note to the leading comment.
gcc/analyzer/
* svalue.cc (binop_svalue::maybe_get_value_range): Update for
renaming of value_range::set_type to value_range::set_range_class.
(unaryop_svalue::maybe_get_value_range): Likewise.

Ranger_cache::range_of_expr should handle no context.

range_of_expr is supposed to return a global value if there is no context,
but instead it was crashing.

* gimple-range-cache.cc (ranger_cache::range_of_expr): Handle
NULL statement.

Unify range_of_address with other range_of_* routines.

When range_of_address is called, we return immediately, missing any
potential post calculation processing.

* gimple-range-fold.cc (fold_using_range::fold_stmt): Move
range_of_address call into nested 'if' with other routines.

update_range_info can mark a statement for recalculation.

Add an alternative update_range_info method which marks the SSA_NAME as
"to be recalculated" the next time it is used.

* gimple-range-cache.cc (ranger_cache::ranger_cache): Allocate bitmap.
(ranger_cache::~ranger_cache): Free bitmap.
(ranger_cache::mark_stale): New.
(ranger_cache::get_global_range): Check if NAME is marked stale.
* gimple-range-cache.h (ranger_cache::mark_stale): New.
* gimple-range.cc (gimple_ranger::update_range_info): New variant.
* gimple-range.h (update_range_info): New prototype.
* gimple.h (gimple_set_modified): Call update_range_info.
* value-query.cc (range_query::update_range_info): New variant.
* value-query.h (range_query::update_range_info): New prototype.

Integrate bound snapping with pair construction.

Rather than build all the pairs and then apply a mask to those pairs,
apply the mask to each pair as they are constructed.

* value-range.cc (irange::intersect): Snap bounds as they are created.

get_tree_range should check the supplied range type.

get_tree_range currently checks whether value_range supports the
requested type which is incorrect. It should check whether the supplied
vrange supports the type.

* value-query.cc (range_query::get_tree_range): Check if return
range R supports the expression type.

i386: Relax predicates in BT splitters

Allow QImode subregs of AND results in HImode and SImode (and DImode
on 64-bit targets).  Also allow memory operands for the BT base operand
to increase combine opportunities and enable better insn propagation.

The BT insn is slow when using a memory base with a variable bit index,
but the register allocator can reload a memory operand into a register to
satisfy BT pattern constraints.

The patch improves code generation for the included testcase from:

mask_get_flag:
        movl    %esi, %ecx
        movl    $1, %eax
        salq    %cl, %rax
        testq   %rdi, %rax
        setne   %al
        ret
to:

mask_get_flag:
        xorl    %eax, %eax
        btq     %rsi, %rdi
        setc    %al
        ret
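
For orientation, a source function of roughly this shape would produce a
variable bit test like the one in the listings. The testcase itself is not
shown here, so this is a guess: the parameter types and the "& 63" masking
are assumptions, and only the name mask_get_flag comes from the assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical source; the parameter types and the "& 63" masking are
// assumptions.  Tests whether bit BIT of MASK is set.
inline bool mask_get_flag(std::uint64_t mask, unsigned bit)
{
  return (mask & (std::uint64_t{1} << (bit & 63))) != 0;
}

// Small usage check.
inline void demo()
{
  assert(mask_get_flag(0x4, 2));
  assert(!mask_get_flag(0x4, 1));
  assert(mask_get_flag(std::uint64_t{1} << 63, 63));
}
```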

gcc/ChangeLog:

* config/i386/i386.md (*bt<SWI48:mode>_mask): Use
int248_register_operand for operand 1 predicate.
(*jcc_bt<mode>_mask): Use nonimmediate_operand for operand 1 predicate.
(*jcc_bt<SWI48:mode>_mask_1): Use nonimmediate_operand for operand 1
predicate and int248_register_operand for operand 2 predicate.
(BT followed by CMOV splitter): Use nonimmediate_operand
for operand 1 predicate.
(*bt<mode>_setcqi): Ditto.
(*bt<mode>_setncqi): Ditto.
(*bt<mode>_setnc<mode>): Ditto.
(*bt<mode>_setncqi_2): Ditto.
(*bt<mode>_setc<mode>_mask): Use nonimmediate_operand for operand 1
predicate and int248_register_operand for operand 2 predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/bt-8.c: New test.

testsuite, X86, Darwin: 64bit Darwin does not support non-PIC code.

Making good portable function-body scan tests can be challenging.

In addition to assembler syntax and ABI differences, one also needs to
account for platform constraints. In some cases, we hope to automate
common comparisons - but there are limits to what is feasible.

64-bit Darwin does not support non-PIC code on any platform and so some
of the x86 function body scan tests which are expecting the ELF default
produce code which is too different to be realistically handled with
conditional matches.

We are just going to skip tests in this category.

gcc/testsuite/ChangeLog:

* gcc.target/i386/builtin-memmove-12.c: Skip for Darwin.
* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-13.c: Likewise.
* gcc.target/i386/memset-pr120683-17.c: Likewise.
* gcc.target/i386/memset-pr120683-18.c: Likewise.
* gcc.target/i386/memset-pr120683-19.c: Likewise.
* gcc.target/i386/memset-pr120683-22.c: Likewise.
* gcc.target/i386/memset-pr120683-23.c: Likewise.
* gcc.target/i386/memset-pr70308-1b.c: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

phiprop: Allow for one store inbetween the load and the phi which is being used to insert [PR123120]

One disadvantage of phiprop is that if there is a store between the
phi with the addresses and the new load, phiprop will not do anything.
This means that for some C++ code where you have a min of a max (or the
opposite), depending on the argument order of evaluation, phiprop might do
the transformation or it might not (see tree-ssa/phiprop-3.C for examples).
So we need to allow skipping of one store in between the load and
where the phi is located.

Aggregates include a store when doing phiprop so we need to check
if there are also loads between the original store/load and the
store we are skipping. This can be added afterwards but I didn't
see the aggregate case happening enough to make a big dent. I added
testcases (phiprop-{10,11}.c) to make sure that cases where the load
would make a difference are covered, though.

changes since v1:
* v2: rewrite can_handle_load to avoid duplicated skipping store code.

PR tree-optimization/123120
PR tree-optimization/116823
gcc/ChangeLog:

* tree-ssa-phiprop.cc (phiprop_insert_phi): Add other_vuse
argument, use it instead of the vuse on the use_stmt.
(can_handle_load): Add aggregate argument. Also return the vuse
of the load/store when the insert is allowed.
Skipping over one non-modifying store for !aggregate.
(propagate_with_phi): Update call to can_handle_load
and phiprop_insert_phi.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phiprop-8.c: New test.
* gcc.dg/tree-ssa/phiprop-9.c: New test.
* gcc.dg/tree-ssa/phiprop-10.c: New test.
* gcc.dg/tree-ssa/phiprop-11.c: New test.
* gcc.dg/tree-ssa/phiprop-12.c: New test.
* g++.dg/tree-ssa/phiprop-3.C: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

tree-optimization/125153 - testcase for fixed PR

The following adds a testcase for the PR which was fixed by
reversion of r16-303.

PR tree-optimization/125153
* gcc.dg/torture/pr125153.c: New testcase.

Revert "tree-optimization/120003 - missed jump threading"

This reverts commit 1a13684dfc7286139064f7d7341462c9995cbd1c.

middle-end/125156 - preserve edge flags in cleanup_control_expr_graph

cleanup_control_expr_graph when setting EDGE_FALLTHRU cleared all
existing edge flags such as EDGE_IRREDUCIBLE_LOOP rather than
just the no longer relevant EDGE_TRUE_VALUE and EDGE_FALSE_VALUE flags.
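
The difference between the old and the fixed behaviour comes down to
assigning the flag word versus masking it. The sketch below uses made-up
flag values and hypothetical function names; GCC's real edge flag values
differ.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only.
constexpr unsigned EDGE_FALLTHRU         = 1u << 0;
constexpr unsigned EDGE_TRUE_VALUE       = 1u << 1;
constexpr unsigned EDGE_FALSE_VALUE      = 1u << 2;
constexpr unsigned EDGE_IRREDUCIBLE_LOOP = 1u << 3;

// Old behaviour as described above: every other flag is lost.
inline unsigned make_fallthru_clobbering(unsigned /* flags */)
{
  return EDGE_FALLTHRU;
}

// Fixed behaviour: drop only the two condition flags, keep the rest.
inline unsigned make_fallthru_preserving(unsigned flags)
{
  flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
  flags |= EDGE_FALLTHRU;
  return flags;
}

inline void demo()
{
  const unsigned flags = EDGE_TRUE_VALUE | EDGE_IRREDUCIBLE_LOOP;
  assert(make_fallthru_clobbering(flags) == EDGE_FALLTHRU);
  assert(make_fallthru_preserving(flags)
         == (EDGE_FALLTHRU | EDGE_IRREDUCIBLE_LOOP));
}
```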

PR middle-end/125156
* tree-cfgcleanup.cc (cleanup_control_expr_graph): Clear
EDGE_TRUE_VALUE and EDGE_FALSE_VALUE edge flags only.

* gcc.dg/torture/pr125156.c: New testcase.

middle-end/125146 - fold_stmt fails to release SSA names

When match-and-simplify simplification fails we have to release
eventually pushed stmts.

PR middle-end/125146
* gimple-fold.cc (fold_stmt_1): Discard stmts in seq
after failed gimple_simplify as well.

rs6000: Add -mcpu=future support and built-in gating infrastructure

This patch introduces support for the -mcpu=future option, intended to
enable experimental processor features that may or may not be included
in future Power processors. The option serves as a placeholder for
development and evaluation purposes, and may be renamed if a
corresponding processor is defined.

In addition, this change adds support for gating rs6000 built-ins using
a new target predicate "future", corresponding to -mcpu=future. This
extends rs6000-gen-builtins.cc and rs6000-builtin.cc to recognize
[future] as a valid predicate, allowing new built-ins defined in .bif
files to be conditionally enabled.

Bootstrapped and Regtested on Power10 little-endian system, using the
--with-cpu=future configuration option.

2026-05-04 Kishan Parmar <kishan@linux.ibm.com>

gcc/
* config.gcc (powerpc*-*-*): Add support for
--with-cpu=future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
if the user used the -mcpu=future option.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Handle
ENB_FUTURE and issue diagnostic requiring -mcpu=future.
(rs6000_builtin_is_supported): Return TARGET_FUTURE for
ENB_FUTURE built-ins.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
(POWERPC_MASKS): Add OPTION_MASK_FUTURE.
(rs6000_cpu_opt_value): New entry for 'future' via the RS6000_CPU macro.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add
BSTZ_FUTURE for future.
(write_decls): Add ENB_FUTURE in bif_enable enum of generated header
file.
* config/rs6000/rs6000-opts.h (PROCESSOR_FUTURE): New macro.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): If -mcpu=future,
set the .machine directive to "future".
(rs6000_opt_masks): Add entry for -mfuture.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
if the user used the -mcpu=future option.
* config/rs6000/rs6000.opt (-mfuture): New option.
* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document
-mcpu=future.

gcc/testsuite/
* gcc.target/powerpc/future-1.c: New test.
* gcc.target/powerpc/future-2.c: Likewise.

doc: Document several "force_l32" features for Xtensa

This patch adds documentation for the "force_l32" features of the Xtensa
target that were added in recent patches.

gcc/ChangeLog:

* doc/extend.texi (Xtensa Named Address Spaces):
Document '__force_l32'.
(Xtensa Attributes): Document 'force_l32'.
* doc/invoke.texi (Xtensa Options):
Document '-m[no-]force-l32'.

xtensa: Implement "-mforce-l32" target-specific option

In the previous patches, both the named address space "__force_l32" and
the target-specific attribute "force_l32" were introduced for reading
sub-words from the instruction memory area.

This patch introduces a new target-specific option "-mforce-l32", which
allows sub-word reading from the instruction memory area even in the
generic address spaces (i.e., the default memory references) or without
the "force_l32" attribute.

     /* example */
     int test(unsigned int i) {
       static const char string[] __attribute__((section(".irom.text")))
         = "The quick brown fox jumps over the lazy dog.";
       return i < __builtin_strlen(string) ? string[i] : -1;
     }

     ;; result (-O2 -mforce-l32)
      .literal_position
      .literal .LC0, string$0
     test:
      entry sp, 32
      movi.n a8, 0x2b
      bltu a8, a2, .L3
      l32r a9, .LC0 ;; If -mno-force-l32,
      movi.n a8, -4 ;;
      add.n a9, a9, a2 ;; l32r a8, .LC0
      and a8, a9, a8 ;; add.n a8, a8, a2
      l32i.n a8, a8, 0 ;; l8ui a2, a8, 0
      ssa8l a9 ;;
      srl a8, a8 ;;
      extui a2, a8, 0, 8 ;;
      retw.n
     .L3:
      movi.n a2, -1
      retw.n
      .section .irom.text,"a"
     string$0:
      .string "The quick brown fox jumps over the lazy dog."

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_expand_load_force_l32_2):
New sub-function for inspecting pseudos that clearly point to the
function's stack frame.
(xtensa_expand_load_force_l32):
Add handling for loading from the generic address space when the
"-mforce-l32" option is enabled; however, obvious references to
function stack frames are excluded.
* config/xtensa/xtensa.opt (mforce-l32):
New target-specific option definition.

xtensa: Implement "force_l32" target-specific attribute

The previous patch introduced the target-specific named address space
"__force_l32", but this reserved identifier can only be used from C.

Therefore, this patch introduces a new target-specific attribute
"force_l32," which is very similar to the named address space "__force_l32,"
making that feature usable not only in C but also in other languages.

     /* example */
     extern "C" {
       unsigned int test(const char *p) {
         for (const char __attribute__((force_l32)) *q = p; ; ++q)
           if (!*q)
             return q - p;
       }
     }

     ;; result (-Os -mlittle-endian)
     test:
      entry sp, 32
      mov.n a8, a2
      movi.n a10, -4
     .L3:
      and a9, a8, a10 ;; *q : align to SImode
      l32i.n a9, a9, 0 ;; *q : load:SI
      ssa8l a8 ;; *q : shift to bit position 0
      srl a9, a9
      extui a9, a9, 0, 8 ;; *q : zero_extract:QI
      beqz.n a9, .L5
      addi.n a8, a8, 1
      j .L3
     .L5:
      sub a2, a8, a2
      retw.n

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_attribute_table,
TARGET_ATTRIBUTE_TABLE):
New definitions for target-specific attributes.
(xtensa_expand_load_force_l32_1): New sub-function for inspecting
the attribute from the specified MEM rtx.
(xtensa_expand_load_force_l32): Add handling for addresses
with offsets.
(xtensa_handle_force_l32_attribute_1,
xtensa_handle_force_l32_attribute):
New functions for handling the attribute.

xtensa: Implement "__force_l32" named address space

In the Xtensa ISA, unless the memory regions for placing machine instructions
are configured as "unified," instructions other than specific 32-bit width
load/store ones are not defined to be able to access data in such regions.

In such cases, data residing in the same memory area as the instructions,
e.g., pre-configured constant tables or string literals, cannot be read using
the usual sub-word memory load instructions when reading them in units of
1- or 2-bytes.  Instead, a series of alternative instructions are needed to
extract the desired sub-word bit by bit from the result of loading an aligned
full-word.

This patch introduces a new target-specific named address space "__force_l32"
which indicates that such considerations are necessary when loading sub-words
from memory.
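
The extraction described above can be modelled in portable C.  This is only
a sketch of the idea, not the RTL that xtensa_expand_load_force_l32 emits:
the helper names are hypothetical, and the caller is assumed to guarantee
that the whole aligned word containing the byte is readable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one.  */
static int host_is_little_endian(void)
{
  const uint32_t one = 1;
  uint8_t first;
  memcpy(&first, &one, 1);
  return first == 1;
}

/* Hypothetical helper: read one byte using only an aligned 32-bit load.  */
static uint8_t load_u8_via_aligned_word(const void *p)
{
  uintptr_t addr = (uintptr_t) p;
  uintptr_t aligned = addr & ~(uintptr_t) 3;    /* align down to 4 bytes */
  unsigned byte_index = (unsigned) (addr & 3);  /* position inside the word */
  uint32_t word;

  /* The only memory access: one aligned full-word load.  */
  memcpy(&word, (const void *) aligned, sizeof word);

  /* Choose the shift that moves the wanted byte to bits 7..0.  */
  unsigned shift = host_is_little_endian() ? byte_index * 8
                                           : (3 - byte_index) * 8;
  return (uint8_t) (word >> shift);
}

/* Compare the emulated load with a plain byte load on aligned storage.  */
static void check_aligned_word_loads(void)
{
  uint32_t storage[2];
  memcpy(storage, "abcdefg", 8);
  const unsigned char *bytes = (const unsigned char *) storage;
  for (int i = 0; i < 8; i++)
    assert(load_u8_via_aligned_word(bytes + i) == bytes[i]);
}
```

Calling check_aligned_word_loads() from a test driver exercises every byte
position in two consecutive words.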

     /* example #1 */
     struct foo {
       short a, b, c, d;
     };
     int test(void) {
       extern __force_l32 struct foo *p;
       return p->a * p->d;
     }

     ;; result #1 (-O2 -mlittle-endian)
      .literal_position
      .literal .LC0, p
     test:
      entry sp, 32
      l32r a9, .LC0 ;; the address of p
      movi.n a8, -4 ;; consolidated by fwprop/CSE
      l32i.n a9, a9, 0 ;; the value of p
      addi.n a10, a9, 6
      and a2, a9, a8 ;; p->a : align to SImode
      and a8, a10, a8 ;; p->d : align to SImode
      l32i.n a2, a2, 0 ;; p->a : load:SI
      l32i.n a8, a8, 0 ;; p->d : load:SI
      ssa8l a9 ;; p->a : shift to bit position 0
      srl a2, a2
      ssa8l a10 ;; p->d : shift to bit position 0
      srl a8, a8
      mul16s a2, a2, a8 ;; mulhisi3
      retw.n

     /* example #2 */
     char *strcpy_irom(char *dst, __force_l32 const char *src) {
       char *p = dst;
       while (*p = *src)
         ++p, ++src;
       return dst;
     }

     ;; result #2 (-Os -mbig-endian)
     strcpy_irom:
      entry sp, 32
      mov.n a9, a2
      movi.n a10, -4 ;; hoisted out
      j .L2
     .L3:
      addi.n a9, a9, 1
      addi.n a3, a3, 1
     .L2:
      and a8, a3, a10 ;; *src : align to SImode
      l32i.n a8, a8, 0 ;; *src : load:SI
      ssa8b a3 ;; *src : shift to bit position 0
      sll a8, a8
      extui a8, a8, 24, 8 ;; *src : zero_extract:QI
      s8i a8, a9, 0 ;; *p   : store:QI
      bnez.n a8, .L3
      retw.n

gcc/ChangeLog:

* config/xtensa/xtensa-protos.h
(xtensa_expand_load_force_l32): New function prototype.
* config/xtensa/xtensa.cc (#include): Add "expmed.h".
(TARGET_LEGITIMATE_ADDRESS_P):
Change a whitespace delimiter from HTAB to SPACE.
(TARGET_ADDR_SPACE_SUBSET_P, TARGET_ADDR_SPACE_CONVERT,
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P):
New macro definitions for named address space.
(xtensa_addr_space_subset_p, xtensa_addr_space_convert,
xtensa_addr_space_legitimate_address_p):
New hook function prototypes and definitions required for
implementing the named address space.
(xtensa_expand_load_force_l32): New function that generates RTXes
that perform loads from memory belonging to the named address
space.
* config/xtensa/xtensa.h (ADDR_SPACE_FORCE_L32):
New macro for the ID# of the named address space.
(REGISTER_TARGET_PRAGMAS): New hook for registering C language
identifier for the named address space.
* config/xtensa/xtensa.md
(zero_extend<mode>si2_internal): Rename from zero_extend<mode>si2.
(zero_extend<mode>si2): New RTL generation pattern that calls
xtensa_expand_load_force_l32().
(extendhisi2, extendqisi2, movhi, movqi):
Change to call xtensa_expand_load_force_l32() first.
(*shift_per_byte): Delete the insn condition.

MAINTAINERS: Add myself to write after approval

2026-05-04 Vijay Shankar <vijay@linux.ibm.com>

ChangeLog:
* MAINTAINERS: Add myself to write after approval.

[V2][RISC-V][PR rtl-optimization/124766] Simplify x + y == y into x == 0

So Richard S. noticed 3 issues in the V1 patch.  Specifically it should have
been using rtx_equal_p rather than just testing pointer equality.  That's not a
correctness issue, but could potentially allow the pattern to apply more often.

Second we should be checking for !side_effects_p on the operand we're dropping.
Easy to fix.

Finally there was a const0_rtx use that should have been CONST0_RTX.  Given how
often I mention that one to others, I'm embarrassed I missed it.

Bootstrapped on x86 and retested on the various embedded platforms.  Bootstraps
on riscv platforms, aarch64, armv7 and sh4eb are in flight.

--

So this is derived from S_regmatch in spec2017, so fairly hot.

long
frob (unsigned short *y, long z)
{
  long ret = (*y << 2) + z;
  if (ret != z)
    return 0;
  return ret;
}

It generates this code on riscv:

        lhu     a5,0(a0)
        sh2add  a5,a5,a1
        sub     a1,a1,a5
        czero.nez       a0,a5,a1
        ret

That's not bad, but the sh2add and sub are not actually needed. This may look
similar to a case Daniel was recently discussing; the major difference is the
types of the function args, which I got wrong the first time I reduced this
case.

czero instructions check their condition for zero/nonzero status. So we just
need to know if a1 has a zero/nonzero value at the czero instruction.  So
working backwards:

a1 = a1 - a5                // sub instruction
a1 = a1 - ((a5 << 2) + a1)  // substitute from sh2add
a1 = -(a5 << 2)             // a1 terms cancel out; the sign does not affect zero/nonzero

So we just need the nonzero state of a5 << 2.  Now since a5 was set by the lhu
instruction, the upper 48 bits are already known zero, so critically we know
the upper 2 bits are zero. Meaning that we can just test a5 as set by the lhu
instruction for zero/nonzero.  The net is we can generate this code instead:

        lhu     a0,0(a0)
        czero.nez       a0,a1,a0
        ret
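
As a source-level illustration of why the shorter sequence is equivalent,
the sketch below compares frob with a hand-simplified form.  The name
frob_simplified and the check routine are hypothetical, and the check only
uses values of z for which the addition cannot overflow.

```c
#include <assert.h>

/* The function from the commit message.  */
static long frob(const unsigned short *y, long z)
{
  long ret = (*y << 2) + z;
  if (ret != z)
    return 0;
  return ret;
}

/* Hand-simplified form: only the zero/nonzero state of *y matters,
   which mirrors "czero.nez a0,a1,a0" after the lhu.  */
static long frob_simplified(const unsigned short *y, long z)
{
  return *y != 0 ? 0 : z;
}

/* Compare both forms for every 16-bit input and a few values of z.  */
static void check_frob_equivalence(void)
{
  static const long z_values[] = { -5, 0, 7, 123456 };
  for (unsigned v = 0; v <= 0xFFFFu; v++)
    for (unsigned i = 0; i < sizeof z_values / sizeof z_values[0]; i++)
      {
        unsigned short y = (unsigned short) v;
        assert(frob(&y, z_values[i]) == frob_simplified(&y, z_values[i]));
      }
}
```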

It's a small, but visible instruction count savings and likely a small
performance improvement on most designs.

So the trick to get there is a small simplify-rtx improvement. We just need to
simplify
(eq/ne (plus (x) (y)) (y)) ->  (eq/ne (x) (0))
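
The identity relies on modular arithmetic, which is what RTL plus uses.  A
small exhaustive check over 8-bit values, written as a sketch with
hypothetical helper names, looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* (eq (plus x y) y) with 8-bit wrapping arithmetic.  */
static int sum_equals_addend(uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) == y;
}

/* (eq x 0), the simplified form.  */
static int is_zero(uint8_t x)
{
  return x == 0;
}

/* Verify that both forms agree for every pair of 8-bit operands.  */
static void check_plus_eq_identity(void)
{
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++)
      assert(sum_equals_addend((uint8_t) x, (uint8_t) y)
             == is_zero((uint8_t) x));
}
```

The ne variant follows by negating both sides.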

And all the right things just happen.  Bootstrapped and regression tested on a
variety of native platforms including x86, aarch64, riscv and tested across the
various embedded targets in my tester.  I'll wait for the RISC-V pre-commit CI
tester to render a verdict before going forward.

PR rtl-optimization/124766

gcc/

* simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
Simplify x + y == y constructs.

gcc/testsuite/

* gcc.target/riscv/pr124766.c: New test.

match: Optimize `A > B ? ABS(A) : B` to `MAX(A, B)` when B >= 0 [PR116700]

When B is known to be non-negative and A > B, A must be positive,
so ABS(A) == A. The whole expression (A > B ? ABS(A) : B) then
simplifies to MAX(A, B). This is caught at -O2 via VRP, but at
-O1 phiopt1 produces ABS_EXPR and no later pass simplifies it.
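
A quick way to convince oneself of the equivalence, and of the need for the
B >= 0 condition, is a brute-force comparison over a small range.  The
function names below are hypothetical and the code is only an illustration
of the identity, not of the match.pd pattern itself.

```c
#include <assert.h>
#include <stdlib.h>

/* The original expression: A > B ? ABS(A) : B.  */
static int select_abs(int a, int b)
{
  return a > b ? abs(a) : b;
}

/* The replacement: MAX(A, B).  */
static int max_int(int a, int b)
{
  return a > b ? a : b;
}

/* Both forms agree whenever b is non-negative.  */
static void check_abs_to_max(void)
{
  for (int a = -64; a <= 64; a++)
    for (int b = 0; b <= 64; b++)
      assert(select_abs(a, b) == max_int(a, b));
}
```

With a negative B the two forms can differ, for example A = -1 and B = -2
give 1 and -1 respectively.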

PR tree-optimization/116700

gcc/ChangeLog:

* match.pd: (A > B ? ABS(A) : B -> MAX(A, B)): New pattern
for non-negative B.

gcc/testsuite/ChangeLog:

* gcc.dg/pr116700.c: New test.
* gcc.dg/tree-ssa/phi-opt-48.c: New test.

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>

libbacktrace: support multiple zstd frames

Based on patch by GitHub user ofats.

* elf.c (elf_zstd_decompress_frame): New static function,
broken out of elf_zstd_decompress.
(elf_zstd_decompress): Call elf_zstd_decompress_frame in a loop.
* zstdtest.c (test_large): Compress the file in chunks.

Daily bump.

chrec: Move variable rtype definition to the scope only used

rtype here is only needed for POINTER_PLUS_EXPR and is only used
in the condition for PPE, so move it to that scope instead.

Pushed as obvious after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-chrec.cc (chrec_fold_plus_poly_poly): Move
rtype definition to right before the use.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

c++: Handle EXACT_DIV_EXPR as printing `/` [PR119567]

Before r8-4233-g6ff16d19d26a41, we would print EXACT_DIV_EXPR as `(ceiling /)`
which is wrong. Now we print it as `unknown operator` which is also wrong.
Printing it as `/` is correct here since it is similar to `FLOOR_DIV_EXPR`
except it is undefined behavior if it is not exact (so floor is fine :)).
This shows up when printing out the reason why the following is not a constexpr:
constexpr int (*p1)[0] = 0, (*p2)[0] = 0;
constexpr int k2 = p2 - p1;

Bootstrapped and tested on x86_64-linux-gnu.

PR c++/119567
gcc/cp/ChangeLog:

* error.cc (dump_expr): Treat EXACT_DIV_EXPR the same as FLOOR_DIV_EXPR.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Ada: Fix build failure for 32-bit libada on FreeBSD

The FreeBSD-specific subunit has not been adjusted to the renaming.

gcc/ada/
PR ada/125168
* libgnat/s-dorepr__freebsd.adb (Two_Prod): Adjust to renaming.
(Two_Sqr): Likewise.

tree-optimization/122569 - fix DeBruijn CLZ table validator shift-by-64 UB

simplify_count_zeroes validates DeBruijn CLZ tables by computing
(1 << (data + 1)) - 1 to simulate the value produced by the OR-cascade
b |= b >> 1; ... b |= b >> 32.  For 64-bit input with data == 63 (the
MSB bit), data + 1 equals HOST_BITS_PER_WIDE_INT, making the shift
(HOST_WIDE_INT_1U << 64) undefined behavior.  Hosts typically produce
0, so the check (0 * magic) >> 58 == 63 fails and check_table_array
returns false.

Every well-formed 64-bit DeBruijn CLZ table has an entry mapping the
all-ones value to bit 63, so this UB rejected every such table --
including the magic 0x03f79d71b4cb0a89 used in Stockfish's msb(),
zstd's bits.h, and cpython's pycore_bitutils.h.

Fix by special-casing data + 1 == HOST_BITS_PER_WIDE_INT to use
HOST_WIDE_INT_M1U.  Only the 64-bit CLZ path is affected.
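
The special case can be sketched in standalone C as follows.  This is not
the code in tree-ssa-forwprop.cc: uint64_t stands in for unsigned
HOST_WIDE_INT, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64

/* Value left by the OR-cascade when the highest set bit is DATA:
   all bits 0..DATA set.  Shifting by WORD_BITS is avoided explicitly.  */
static uint64_t ones_through_bit(unsigned data)
{
  assert(data < WORD_BITS);
  if (data + 1 == WORD_BITS)
    return UINT64_MAX;
  return ((uint64_t) 1 << (data + 1)) - 1;
}

/* Table slot selected for an OR-cascaded value.  */
static unsigned debruijn_slot(uint64_t ored, uint64_t magic)
{
  return (unsigned) ((ored * magic) >> 58);
}

/* Returns 1 if the 64 possible cascade results map to distinct slots.  */
static int magic_gives_distinct_slots(uint64_t magic)
{
  int seen[WORD_BITS] = { 0 };
  for (unsigned bit = 0; bit < WORD_BITS; bit++)
    {
      unsigned slot = debruijn_slot(ones_through_bit(bit), magic);
      if (seen[slot])
        return 0;
      seen[slot] = 1;
    }
  return 1;
}
```

With the magic constant mentioned above, the all-ones value selects slot 63,
which is the entry the validator could never reach when the shift produced 0.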

gcc/ChangeLog:

PR tree-optimization/122569
* tree-ssa-forwprop.cc (simplify_count_zeroes): Avoid
shift-by-HOST_BITS_PER_WIDE_INT UB when computing the all-ones
value for the CLZ validator.

gcc/testsuite/ChangeLog:

PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-1.c: New test.
* gcc.dg/tree-ssa/pr122569-2.c: New test.

[RISC-V][PR target/124009] Improve select between 2^n and 0 on RISC-V

So this was something I noticed a while back, I'm pretty sure while throwing
hot blocks into an LLM to see what the LLM thought might be optimizable.  In
this case it was mcf from spec2017.

So the basic idea is for code like this:

int foo(int x, int y) { return (y < x) ? 1 : -1; }

We get something like this for rv64gcbv_zicond:

        slt     a1,a1,a0        # 27    [c=4 l=4]  slt_didi3
        li      a5,2            # 28    [c=4 l=4]  *movdi_64bit/1
        czero.eqz       a0,a5,a1        # 29    [c=4 l=4]  *czero.eqz.didi
        addi    a0,a0,-1        # 17    [c=4 l=4]  *adddi3/1

That's not bad, in particular it avoids a likely tough to predict conditional
branch.  But we can do better.

Essentially the code is selecting between 1 and -1.  So if we take the output
of the SLT (0/1) shift it left by one position (0/2), then subtract one we get
a select for -1, 1.
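
In C terms the transformation looks roughly like the sketch below.  The
function names are hypothetical; the last helper illustrates the more
general "2^n or 0" case that the new splitters target.

```c
#include <assert.h>

/* The original source form.  */
static int select_branch(int x, int y)
{
  return (y < x) ? 1 : -1;
}

/* slt, shift left by one, subtract one.  */
static int select_shift(int x, int y)
{
  int flag = y < x;        /* 0 or 1, like the SLT result */
  return (flag << 1) - 1;  /* 0/2 minus one gives -1/1 */
}

/* Select between 2^n and 0 using only the comparison and a shift.  */
static int select_pow2_or_zero(int x, int y, unsigned n)
{
  return (y < x) << n;
}

/* Compare the branchy and branchless forms over a small range.  */
static void check_select_forms(void)
{
  for (int x = -8; x <= 8; x++)
    for (int y = -8; y <= 8; y++)
      assert(select_branch(x, y) == select_shift(x, y));
}
```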

After this patch we get the expected:

        slt     a1,a1,a0        # 28    [c=4 l=4]  slt_didi3
        slli    a0,a1,1 # 29    [c=4 l=4]  ashldi3
        addi    a0,a0,-1        # 17    [c=4 l=4]  *adddi3/1

It's probably not any faster on a modern design, but it will encode more
efficiently, saving either 2 or 4 bytes (potentially improving performance by
getting more ops per fetch block).    There's some very obvious
generalizations.  We can select between 2^n and 0, we can select between 2^n-1
and -1.  But we can also do things like select between 3, 5 or 9 and 0 (think
using shNadd where both source operands are the output of the slt).    There's
all kinds of interesting possibilities here.

The key is to implement a splitter which handles 2^n and 0.  Once that is in
place pre-existing code will handle the 2^n-1 and -1 case automatically.  While
cases like selecting between 9 and 0 aren't yet handled, it would be a fairly
simple extension to these new splitters with the basic framework in place.

Anyway, while working on this I realized the scc_0 iterator didn't include
any_lt, which seems like a dreadful oversight on my part. So I fixed that as
well.

Given the high degree of non-orthogonality in the sCC capabilities of the
RISC-V ISA, this is actually several splitters to deal with the different cases
of sCC we can handle in a single instruction.

Tested on riscv32-elf and riscv64-elf.  Will wait for pre-commit CI before
moving forward.

PR target/124009
gcc/

* config/riscv/iterators.md (scc_0): Add any_lt.
* config/riscv/zicond.md: Add splitters to select between 2^n and 0.

gcc/testsuite/

* gcc.target/riscv/pr124009.c: New test.

ginclude: avoid redefining __STDC_VERSION_LIMITS_H__

We define this macro after including the system's limits.h header which
may define this macro. Using glibc-2.43, for example, before this patch
every file that included limits.h would emit a warning if
-Wsystem-headers was in use.

PR c/125161

gcc/
* glimits.h (__STDC_VERSION_LIMITS_H__): Only define the macro
if it was not already defined.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>

[RISC-V][PR target/125152] Don't use stale mode in conditional move expansion

This is a trivial oversight in the recently added improvement to conditional
move generation on the RISC-V port.

We have a step which canonicalizes the comparison operands. The process of
canonicalizing may change one or both operands, including giving a new pseudo
with a different mode.

The new code failed to account for that and as a result it was using a stale
mode (QI) which caused all kinds of problems later. Just swapping the code
which canonicalizes the operand with the code that extracts the mode and
everything is happy again. Fixed a formatting nit while I was in there.

Tested on riscv32-elf and riscv64-elf. But waiting for pre-commit CI to do its
thing.

PR target/125152
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Extract the
mode after operand canonicalization.

gcc/testsuite/

* gcc.target/riscv/pr125152.c: New test.

Daily bump.

libgo: cmd/go: use 'gcloud storage cp' instead of 'gsutil cp'

In some misguided attempt at "cleanup", Google Cloud has
decided to retire 'gsutil' in favor of 'gcloud storage' instead
of leaving an entirely backwards-compatible wrapper so
that client scripts and muscle memory keep working.

In addition to breaking customers this way, they are also
sending AI bots around "cleaning up" old usages with scary
warnings that maybe the changes will break your entire world.
This is even more misguided, of course, and resulted in us
receiving CL 748661 (originally GitHub PR golang/gofrontend#13)
and then me receiving a private email asking for it to be merged.

It was easier to recreate the 4-line CL myself than to
enumerate everything that was wrong with that CL's
commit message.

I hope that only Google teams are being subjected to this.

This is based on https://go.dev/cl/748900 from the main Go repo by Russ.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/749000

phiopt: Set cfgchanged if cselim-limited happened

I noticed while improving cselim-limited that when no new phi
is created, a few empty basic blocks are left behind.
So this sets cfgcleanup when cselim-limited does
something in phiopt. cselim-5.c shows the case I
was looking into.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (pass_phiopt::execute): Set cfgcleanup
if cselim_limited returns true.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cselim-5.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Fortran/OpenMP: cleanup gfc_free_omp_namelist

Move the logic to deduce what needs to be freed from the
caller to the callee by passing the OMP_LIST_... enum value
instead of multiple bool arguments to gfc_free_omp_namelist.

Additionally, add the name 'gfc_omp_list_type' to the existing
OMP_LIST_... enum values and OMP_LIST_NONE (== OMP_LIST_NUM)
as special value.

As an enum is available, use it properly and replace 0 by
OMP_LIST_FIRST in the list walks.

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_omp_list_type): Add this name
to the existing OMP_LIST... enum; add OMP_LIST_NONE.
(gfc_free_omp_namelist): Take that enum as arg instead of bool args.
* match.cc (gfc_free_omp_namelist): Update.
* openmp.cc (gfc_free_omp_clauses, gfc_free_omp_declare_variant_list,
gfc_match_omp_clause_reduction, gfc_match_omp_clauses,
gfc_match_omp_allocate, gfc_match_omp_flush,
gfc_match_omp_declare_target, resolve_omp_clauses,
gfc_resolve_omp_parallel_blocks, resolve_omp_do,
gfc_resolve_oacc_blocks, gfc_resolve_oacc_declare): Update
gfc_free_omp_namelist call and used enum type instead of
int.
* st.cc (gfc_free_statement): Likewise.

Co-Authored-By: Julian Brown <julian@codesourcery.com>

[RISC-V][PR tree-optimization/109038] Recognize shifts+rotate as simple shift in some cases

Consider this test from pr109038:

unsigned
foo (unsigned int a)
{
  unsigned int b = a & 0x00FFFFFF;
  unsigned int c = ((b & 0x000000FF) << 8
            | (b & 0x0000FF00) << 8
            | (b & 0x00FF0000) << 8
            | (b & 0xFF000000) >> 24);
  return c;
}

We currently generate something like this for rv64gcbv:

        slli    a0,a0,40
        srli    a0,a0,40
        roriw   a0,a0,24
        ret

Two key points.  The first two shifts clear the upper 40 bits. The roriw is a
rotation of the low 32 bits by 24 positions with a sign extension from bit 31
into bits 32..63.

So we're going to have bit 31 defining bits 32..63 after the rotation and the
low 8 bits will be clear.  So we can just do

    slliw a0,a0,8

Note that doesn't even strictly need bitmanip, though the original sequence
did.  The mask is always going to be a consecutive run of on bits including
bits 31..63.   The number of bits off in the mask must be 32 - rotate count.
Put it all together and you get a nice slliw.
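
The reasoning can be checked with a small emulation.  The sketch below
models the two RV64 sequences on 64-bit values and compares them with the
source function; all helper names are hypothetical and the instruction
semantics are written out by hand as an illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* The source-level function from the test case.  */
static uint32_t foo_original(uint32_t a)
{
  uint32_t b = a & 0x00FFFFFFu;
  return (b & 0x000000FFu) << 8
         | (b & 0x0000FF00u) << 8
         | (b & 0x00FF0000u) << 8
         | (b & 0xFF000000u) >> 24;
}

/* Sign-extend a 32-bit result into 64 bits, as the *w instructions do.  */
static uint64_t sign_extend_word(uint32_t w)
{
  return (w & 0x80000000u) ? (UINT64_C(0xFFFFFFFF00000000) | w) : w;
}

/* slli 40; srli 40; roriw 24.  */
static uint64_t old_sequence(uint64_t a0)
{
  a0 = (a0 << 40) >> 40;
  uint32_t w = (uint32_t) a0;
  uint32_t rotated = (w >> 24) | (w << 8);
  return sign_extend_word(rotated);
}

/* slliw 8.  */
static uint64_t new_sequence(uint64_t a0)
{
  return sign_extend_word((uint32_t) a0 << 8);
}

/* Compare all three forms on a stream of pseudo-random inputs.  */
static void check_rotate_as_shift(void)
{
  uint64_t x = 1;
  for (int i = 0; i < 1000; i++)
    {
      x = x * UINT64_C(6364136223846793005) + UINT64_C(1442695040888963407);
      assert(old_sequence(x) == new_sequence(x));
      assert(foo_original((uint32_t) x) == (uint32_t) ((uint32_t) x << 8));
    }
}
```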

Essentially it's a 3->1 combination, so a define_insn is sufficient.

An earlier version of this patch has been in my tester for weeks, so the usual
testing has been performed.  But that version was meaningfully different (left
a trailing andi and was implemented as a splitter).  So I consider most of that
testing invalid.  This version did go through riscv32-elf and riscv64-elf
without regressions and I'll be waiting on the upstream pre-commit to render a
verdict.

PR target/109038
gcc/
* config/riscv/bitmanip.md (rotate_with_masking_to_shift): New pattern.

gcc/testsuite/
* gcc.target/riscv/pr109038.c: New test.

testsuite: don't link top-level asm tests as PIE [PR 70150]

If these tests are linked as PIE, the linker ends up creating runtime
text relocation and warns or errors out.

gcc/testsuite/

PR testsuite/70150
* gcc.dg/ipa/pr122458.c (dg-options): Add -no-pie.
* gcc.dg/lto/toplevel-extended-asm-1_0.c (dg-lto-options): Add
-no-pie.
* gcc.dg/lto/toplevel-simple-asm-1_0.c (dg-lto-options): Add
-no-pie.

i386: testsuite: disable PIE for some tests [PR 70150]

These tests use check_function_bodies.  Some of them expect a function
body that is not valid for PIE.  Some have minor difference of
"1+sym(%rip)" vs "sym+1(%rip)".  Others have extra "@PLT" in call
instructions.

gcc/testsuite/

PR testsuite/70150
* gcc.target/i386/builtin-memmove-13.c (dg-options): Add
-fno-pie.
* g++.target/i386/memset-pr108585-1a.C: Likewise.
* g++.target/i386/memset-pr108585-1b.C: Likewise.
* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-13.c: Likewise.
* gcc.target/i386/memset-pr120683-17.c: Likewise.
* gcc.target/i386/memset-pr120683-18.c: Likewise.
* gcc.target/i386/memset-pr120683-19.c: Likewise.
* gcc.target/i386/memset-pr120683-20.c: Likewise.
* gcc.target/i386/memset-pr120683-21.c: Likewise.
* gcc.target/i386/memset-pr120683-22.c: Likewise.
* gcc.target/i386/memset-pr120683-23.c: Likewise.
* gcc.target/i386/pr111657-1.c: Likewise.
* gcc.target/i386/pr120881-2a.c: Likewise.

i386: testsuite: disable stack protector for 5 tests

These tests have check_function_bodies against functions allocating
arrays on stack, so they fail with --enable-default-ssp. Disable stack
protector explicitly to fix them.

gcc/testsuite/

* g++.target/i386/memset-pr108585-1a.C (dg-options): Add
-fno-stack-protector.
* g++.target/i386/memset-pr108585-1b.C (dg-options): Likewise.
* gcc.target/i386/auto-init-padding-9.c (dg-options): Likewise.
* gcc.target/i386/memset-pr70308-1a.c (dg-options): Likewise.
* gcc.target/i386/memset-pr70308-1b.c (dg-options): Likewise.

[PATCH] RISC-V: Update riscv.opt.urls for -mmpy-option

This option is currently missing docs.

Adding the comment that regenerate-opt-urls produced.
I will add docs in a future patch. This is just to make the CI happy in
the meantime.

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Add temp fix for -mmpy-option.

Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>

Minor testsuite tweaks

gcc/testsuite/
* gnat.dg/valid_scalars2.adb: Remove -O0 option.
* gnat.dg/validity_check3.ads: Rename to...
* gnat.dg/valid_scalars3.ads: ...this.
* gnat.dg/validity_check3.adb: Rename to...
* gnat.dg/valid_scalars3.adb: ...this.

testsuite: semaphore/try_acquire_until: reorder clock::now calls

Clock calls on VxWorks are slow, so the odds that the consecutive
calls of *clock::now() will yield a different result are not
negligible. Reordering the calls avoids false positives.

for libstdc++-v3/ChangeLog

* testsuite/30_threads/semaphore/try_acquire_until.cc
(test01): Reorder calls.

match: Fix `(A>>bool) EQ 0 -> (unsigned)A LE bool` pattern for vector types [PR125139]

This pattern does not work for vector types as written. To make it work we need to
create a vec_duplicate of the `bool` value. I am not sure that is better so for
right now this just enables the pattern only for INTEGRAL_TYPE_P types (which means
non-vectors).

Pushed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR tree-optimization/125139

gcc/ChangeLog:

* match.pd (`(A>>bool) EQ 0 -> (unsigned)A LE bool`): Enable
only for INTEGRAL_TYPE_P types.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr125139-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Daily bump.

Update gcc .po files

* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
ja.po, ka.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po,
zh_CN.po, zh_TW.po: Update.

gcc: fix gcov-tool MOSTLYCLEANFILES typo

gcc/ChangeLog:

* Makefile.in (MOSTLYCLEANFILES): Fix typo of '$(exeext)'.

Signed-off-by: Sam James <sam@gentoo.org>

algol68: Correct typo exeect -> exeext

This typo was breaking compilation for Windows (which, of course, uses
the .exe extension).

gcc/algol68/ChangeLog:

* Make-lang.in: Correct typo exeect -> exeext

[PATCH v3] match.pd: (A>>bool) == 0 -> ((unsigned)A) <= bool [PR119420]

Also add its counterpart:

"(A>>bool) != 0 -> ((unsigned)A) > bool"

Changes from v2:
- gate the pattern with "#if GIMPLE"
- use 'single_use' in the rshift result
- add the NE variant
link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712431.html
Bootstrap tested in x86, aarch64 and RISC-V.
Regression tested in x86 and aarch64.
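
For unsigned A the equivalence is easy to confirm by brute force.  The
sketch below uses hypothetical function names and only illustrates the
arithmetic identity, not the match.pd pattern or its single_use gating.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* (A >> bool) == 0 */
static int shift_is_zero(unsigned a, bool b)
{
  return (a >> b) == 0;
}

/* (unsigned)A <= bool */
static int le_bool(unsigned a, bool b)
{
  return a <= (unsigned) b;
}

/* Check the EQ form and, by negation, the NE form.  */
static void check_shift_by_bool(void)
{
  for (unsigned a = 0; a < 1024; a++)
    for (int b = 0; b <= 1; b++)
      {
        assert(shift_is_zero(a, b != 0) == le_bool(a, b != 0));
        assert(((a >> b) != 0) == (a > (unsigned) b));
      }
  assert(shift_is_zero(UINT_MAX, true) == le_bool(UINT_MAX, true));
}
```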

PR tree-optimization/119420

gcc/ChangeLog

* match.pd (`(A>>bool) EQ 0 -> (unsigned)A LE bool`): New
pattern.

gcc/testsuite/ChangeLog

* gcc.dg/tree-ssa/pr119420.c: New test.

[PATCH] match.pd: make "if (c) a |= CST1 else a &= ~CST1" unconditional [PR123967]

Perlbench contains code that sets a bit if a condition is true and clears
the same bit if it is false. This can be made unconditional by always
clearing the bit and then applying bit_ior with the result of
(cond) * CST1:

(a & ~CST1) | (cond * CST1)

If "cond" is false (zero) the bit_ior is a no-op and the bit will remain
cleared, if "cond" is true we'll set the bit as intended.

Note that the transformation will add a mult into the pattern, therefore
make it valid only if type <= word_size to avoid wide int
multiplications.
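
A source-level sketch of the two forms is shown below.  The function names
are hypothetical, and the condition is normalized to 0 or 1 before the
multiplication, which is the assumption the transformation relies on.

```c
#include <assert.h>

/* The conditional form: set the bit if cond is true, clear it otherwise.  */
static unsigned update_branchy(unsigned a, int cond, unsigned cst1)
{
  if (cond)
    a |= cst1;
  else
    a &= ~cst1;
  return a;
}

/* The unconditional form: (a & ~CST1) | (cond * CST1).  */
static unsigned update_branchless(unsigned a, int cond, unsigned cst1)
{
  unsigned flag = cond != 0;  /* boolean value, 0 or 1 */
  return (a & ~cst1) | (flag * cst1);
}

/* Compare both forms for every 8-bit value and every single-bit mask.  */
static void check_conditional_bit_update(void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned k = 0; k < 8; k++)
      for (int cond = 0; cond <= 1; cond++)
        assert(update_branchy(a, cond, 1u << k)
               == update_branchless(a, cond, 1u << k));
}
```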

Bootstrapped on x86, aarch64 and rv64.
Regression tested on x86 and aarch64.

PR rtl-optimization/123967

gcc/ChangeLog:

* match.pd (`if (cond) (A | CST1) : (A & ~CST1)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr123967-2.c: New test.
* gcc.dg/tree-ssa/pr123967-3.c: New test.
* gcc.dg/tree-ssa/pr123967.c: New test.

c: argument expressions may be evaluated too often by typeof [PR124576]

When there are multiple declarators in a declaration and the type
is specified via typeof, an expression inside the argument of
typeof may be evaluated multiple times. Fix this by adding a
save expression.

PR c/124576

gcc/c/ChangeLog:
* c-decl.cc (declspecs_add_type): Add save_expr.

gcc/testsuite/ChangeLog:
* gcc.dg/pr124576.c: New test.

[PATCH v3] match.pd: (A>>C) != (B>>C) -> (A^B) >= (1<<C) [PR110010]

Also adding the variant "(A>>C) == (B>>C) -> (A^B) < (1<<C)"
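
The identity holds because (A>>C) and (B>>C) differ exactly when A^B has a
bit set at position C or above.  A brute-force check for unsigned operands
and in-range shift counts, using hypothetical helper names, could look like
this:

```c
#include <assert.h>

/* (A >> C) != (B >> C) */
static int shifted_differ(unsigned a, unsigned b, unsigned c)
{
  return (a >> c) != (b >> c);
}

/* (A ^ B) >= (1 << C); C must be smaller than the width of unsigned.  */
static int xor_at_least(unsigned a, unsigned b, unsigned c)
{
  return (a ^ b) >= (1u << c);
}

/* Compare both forms, including the EQ/LT variant by negation.  */
static void check_shift_compare(void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      for (unsigned c = 0; c < 10; c++)
        {
          assert(shifted_differ(a, b, c) == xor_at_least(a, b, c));
          assert(((a >> c) == (b >> c)) == ((a ^ b) < (1u << c)));
        }
}
```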

Bootstrapped on x86, aarch64 and rv64.
Regression tested on x86 and aarch64.

Changes from v2:
- add type_has_mode_precision_p () check
- add types_match() to simplify types comparison
- add rshift operand checks (must not be negative, must not
surpass type size)
link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/711284.html
PR tree-optimization/110010

gcc/ChangeLog:

* match.pd (`(A>>C) NE|EQ (B>>C) -> (A^B) GE|LT (1<<C)`): New
pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr110010.c: New test.

[PATCH v2 2/2] build: Set default for CPP_FOR_BUILD environment variable in all cases.

A default was set in the `"${build}" != "${host}"` case, but not in the
`"${build}" = "${host}"` case.

For a working build, this change should not make any difference. CPP_FOR_BUILD
is passed to build modules as CPP. If not set, autoconf macro AC_PROG_CC infers
CPP by trying various programs. First, it tries "$CC -E", which CPP will
default to in all cases with this patch.

The following command produces the same build directory with and without the
patch:

./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu

The following command produces a Makefile containing `CPP_FOR_BUILD = ` without
the patch and containing `CPP_FOR_BUILD = $(CC_FOR_BUILD) -E` with the patch:

./configure

ChangeLog:

* configure.ac: Set default for CPP_FOR_BUILD environment variable in all cases.
* configure: Regenerate.

Signed-off-by: Manuel Jacob <me@manueljacob.de>

[PATCH v2 1/2] build: Preserve *_FOR_BUILD environment variables in all cases.

They were preserved in the `"${build}" != "${host}"` case, but not in the
`"${build}" = "${host}"` case.

Each of the following commands produces the same build directory with and
without the patch:

./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
CC_FOR_BUILD=/tmp/gcc_for_build ./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
./configure

The following command produces a Makefile containing `CC_FOR_BUILD = $(CC)`
without the patch and containing `CC_FOR_BUILD = /tmp/gcc_for_build` with the
patch:

CC_FOR_BUILD=/tmp/gcc_for_build ./configure

ChangeLog:

* configure.ac: Preserve *_FOR_BUILD environment variables in all cases.
* configure: Regenerate.

Signed-off-by: Manuel Jacob <me@manueljacob.de>

c++/modules: merging fn w/ inst noexcept + deduced auto [PR125115]

Here when streaming in view_interface<int>::data() and merging it with
the in-TU version, we find that the streamed-in version already has its
noexcept instantiated _and_ its return type deduced.  is_matching_decl
has logic to update the in-TU version when that is the case, first by
propagating the instantiated noexcept.  But this is done by overwriting
the entire function type with the streamed-in one, which simultaneously
updates the return type as well.  This premature return type updating
breaks the later deduced return type checks which are partially in terms
of the original function type.

This patch fixes this by propagating the instantiated noexcept more
narrowly via build_exception_variant.  Also turn e_type into a
reference so that it's not stale after updating e_inner's TREE_TYPE.

PR c++/125115

gcc/cp/ChangeLog:

* module.cc (trees_in::is_matching_decl): Turn e_type into a
reference and use it instead of TREE_TYPE (e_inner).  Always
use build_exception_variant to propagate an already-instantiated
noexcept.

gcc/testsuite/ChangeLog:

* g++.dg/modules/auto-9.h: New test.
* g++.dg/modules/auto-9_a.H: New test.
* g++.dg/modules/auto-9_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

[PATCH] RISC-V: Extract fusion logic to riscv-fusion.cc

Simple non-functional change.

I'm planning to add many more cases to riscv_macro_fusion_pair_p so it
is moved to a separate source file to prevent riscv.cc from becoming
too unwieldy.

Also added some tests to verify the cases that are actually tied to
mtunes present upstream. Unfortunately, many of them are not.

Regtested for rv32gc & rv64gc with the new tests included in the baseline.

gcc/ChangeLog:

* config.gcc: Added riscv-fusion.o
* config/riscv/riscv-protos.h (enum riscv_fusion_pairs):
(riscv_macro_fusion_p): Added declaration.
(riscv_macro_fusion_pair_p): Idem.
(riscv_get_fusible_ops): Idem.
* config/riscv/riscv.cc (enum riscv_fusion_pairs):
(riscv_macro_fusion_p): Moved to riscv-fusion.cc
(riscv_fusion_enabled_p): Idem.
(riscv_set_is_add): Idem.
(riscv_set_is_addi): Idem.
(riscv_set_is_adduw): Idem.
(riscv_set_is_shNadd): Idem.
(riscv_set_is_shNadduw): Idem.
(riscv_macro_fusion_pair_p): Idem.
(riscv_get_fusible_ops): New function to access tune_param->fusible_ops
from riscv-fusion.cc.
* config/riscv/t-riscv: Added riscv-fusion.cc
* config/riscv/riscv-fusion.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fusion-auipc-addi.c: New test.
* gcc.target/riscv/fusion-lui-addi.c: New test.
* gcc.target/riscv/fusion-zexth.c: New test.
* gcc.target/riscv/fusion-zextw.c: New test.

Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>

i386: Adjust some c86-4g*.md modeling to reduce build time

Commit r17-203 caused a significant increase in GCC build time
in several environments, as folks reported, mainly due to the
excessively long execution time of genautomata.

As Alexander pointed out, the current division modeling in
c86-4g*.md can cause a combinatorial explosion in the
automaton, which in turn leads to a significant build time
increase.

Following Alexander's suggestion, this patch introduces
dedicated automatons and cpu_units for idiv and fdiv, and uses
them to update the integer division, floating point division
and square root modeling for now.  Some evaluated statistics
are listed below.

With r17-202:

    *Tested stage-1 i686 build -j 32: 255 seconds*

$ nm -CS -t d --defined-only gcc/insn-automata.o \
  | sed 's/^[0-9]* 0*//' \
  | sort -n | tail -20
13896 r slm_transitions
15360 r znver4_fp_store_transitions
16760 r znver4_ieu_transitions
17776 r bdver1_ieu_transitions
20068 r bdver1_fp_check
20068 r bdver1_fp_transitions
20983 t internal_state_transition(int, DFA_chip*)
22270 t internal_min_issue_delay(int, DFA_chip*)
26208 r slm_min_issue_delay
27244 r bdver1_fp_min_issue_delay
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
93960 r bdver3_fp_transitions
181744 r znver4_fpu_transitions

With culprit commit r17-203:

    *Tested stage-1 i686 build -j 32: 949 seconds*

$ nm -CS -t d --defined-only gcc/insn-automata.o \
  | sed 's/^[0-9]* 0*//' \
  | sort -n | tail -20
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
68160 r c86_4g_ieu_min_issue_delay
93960 r bdver3_fp_transitions
110080 r c86_4g_fp_min_issue_delay
136320 r c86_4g_ieu_transitions
181744 r znver4_fpu_transitions
220160 r c86_4g_fp_transitions
262988 r c86_4g_m7_fpu_base
475225 r c86_4g_m7_ieu_min_issue_delay
950450 r c86_4g_m7_ieu_transitions
4010567 r c86_4g_m7_fpu_min_issue_delay
5496908 r c86_4g_m7_fpu_check
5496908 r c86_4g_m7_fpu_transitions

With this patch:

    *Tested stage-1 i686 build -j 32: 257 seconds*

$ nm -CS -t d --defined-only gcc/insn-automata.o \
  | sed 's/^[0-9]* 0*//' \
  | sort -n | tail -20

20068 r bdver1_fp_transitions
22354 r c86_4g_m7_ieu_min_issue_delay
25705 t internal_state_transition(int, DFA_chip*)
26208 r slm_min_issue_delay
27164 t internal_min_issue_delay(int, DFA_chip*)
27244 r bdver1_fp_min_issue_delay
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
33728 r c86_4g_fp_transitions
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
89414 r c86_4g_m7_ieu_transitions
93960 r bdver3_fp_transitions
181744 r znver4_fpu_transitions
326322 r c86_4g_m7_fpu_min_issue_delay
1305288 r c86_4g_m7_fpu_transitions

I noticed the number of c86_4g_m7_fpu_transitions is still
large, but this patch addresses the build time issue.
To avoid impacting folks' daily builds and regular testing,
I'd like to land this patch first if possible.  We can then further
refine the c86-4g modeling and investigate the large transition
count as follow-up work, potentially as part of PR 87832.

gcc/ChangeLog:

* config/i386/c86-4g-m7.md (c86_4g_m7_idiv): New automaton.
(c86_4g_m7_fdiv): Ditto.
(c86-4g-m7-idiv): New unit.
(c86-4g-m7-fdiv): Ditto.
(c86_4g_m7_idiv_DI): Adjust unit in the reservation.
(c86_4g_m7_idiv_SI): Ditto.
(c86_4g_m7_idiv_HI): Ditto.
(c86_4g_m7_idiv_QI): Ditto.
(c86_4g_m7_idiv_DI_load): Ditto.
(c86_4g_m7_idiv_SI_load): Ditto.
(c86_4g_m7_idiv_HI_load): Ditto.
(c86_4g_m7_idiv_QI_load): Ditto.
(c86_4g_m7_fp_div): Ditto.
(c86_4g_m7_fp_div_load): Ditto.
(c86_4g_m7_fp_idiv_load): Ditto.
(c86_4g_m7_avx512_ssediv): Ditto.
(c86_4g_m7_avx512_ssediv_mem): Ditto.
(c86_4g_m7_avx512_ssediv_z): Ditto.
(c86_4g_m7_avx512_ssediv_zmem): Ditto.
(c86_4g_m7_avx512_sse_sqrt): Ditto.
(c86_4g_m7_avx512_sse_sqrt_load): Ditto.
(c86_4g_m7_fp_sqrt): Ditto.  Rename from ...
(c86_4g_m7fp_sqrt): ... here.
* config/i386/c86-4g.md (c86_4g_idiv): New automaton.
(c86_4g_fdiv): Ditto.
(c86-4g-idiv): New unit.
(c86-4g-fdiv): Ditto.
(c86_4g_idiv_DI): Ditto.
(c86_4g_idiv_SI): Ditto.
(c86_4g_idiv_HI): Ditto.
(c86_4g_idiv_QI): Ditto.
(c86_4g_idiv_mem_DI): Ditto.
(c86_4g_idiv_mem_SI): Ditto.
(c86_4g_idiv_mem_HI): Ditto.
(c86_4g_idiv_mem_QI): Ditto.
(c86_4g_fp_sqrt): Ditto.
(c86_4g_sse_sqrt_sf): Ditto.
(c86_4g_sse_sqrt_sf_mem): Ditto.
(c86_4g_sse_sqrt_df): Ditto.
(c86_4g_sse_sqrt_df_mem): Ditto.
(c86_4g_fp_op_div): Ditto.
(c86_4g_fp_op_div_load): Ditto.
(c86_4g_fp_op_idiv_load): Ditto.
(c86_4g_ssediv_ss_ps): Ditto.
(c86_4g_ssediv_ss_ps_load): Ditto.
(c86_4g_ssediv_ss_pd): Ditto.
(c86_4g_ssediv_ss_pd_load): Ditto.
(c86_4g_ssediv_avx256_ps): Ditto.
(c86_4g_ssediv_avx256_ps_load): Ditto.
(c86_4g_ssediv_avx256_pd): Ditto.
(c86_4g_ssediv_avx256_pd_load): Ditto.

Signed-off-by: Kewen Lin <linkewen@hygon.cn>

[PATCH v2] RISC-V: Add Synopsys RMX-100 series pipeline description.

This patch introduces the pipeline description for the Synopsys RMX-100 series
processor to the RISC-V GCC backend. The RMX-100 has a short, three-stage,
in-order execution pipeline with configurable multiply unit options.

The option -mmpy-option was added to control which version of the MPY unit the
core has and what the latency of multiply instructions should be, similar to
ARCv2 cores (see gcc/config/arc/arc.opt:60).

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add arc-v-rmx-100-series.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add arcv_rmx100.
(enum arcv_mpy_option_enum): New enum for ARC-V multiply options.
* config/riscv/riscv-protos.h (arcv_mpy_1c_bypass_p): New declaration.
(arcv_mpy_2c_bypass_p): New declaration.
(arcv_mpy_10c_bypass_p): New declaration.
* config/riscv/riscv.cc (arcv_mpy_1c_bypass_p): New function.
(arcv_mpy_2c_bypass_p): New function.
(arcv_mpy_10c_bypass_p): New function.
* config/riscv/riscv.md: Add arcv_rmx100.
* config/riscv/riscv.opt: New option for RMX-100 multiply unit
configuration.
* doc/riscv-mtune.texi: Document arc-v-rmx-100-series.
* config/riscv/arcv-rmx100.md: New file.

Co-authored-by: Artemiy Volkov <artemiyv@acm.org>
Co-authored-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>

[PATCH v2] RISC-V: Add Synopsys RHX-100 series pipeline description

This patch introduces the pipeline description for the Synopsys RHX-100 series
processor to the RISC-V GCC backend.  The RHX-100 features a 10-stage,
dual-issue, in-order execution pipeline architecture.

It has support for instruction fusion, which will be addressed by subsequent
patches.  Due to fusion, up to four instructions can be issued in a single
cycle.  It is modeled as four separate pipelines and the issue_rate is set to
four.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add arc-v-rhx-100-series.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add
arcv_rhx100.
* config/riscv/riscv.cc (arcv_rhx100_tune_info): New riscv_tune_param.
* config/riscv/riscv.md: Add arcv_rhx100 to tune attribute.
* doc/riscv-mtune.texi: Add RHX-100 documentation.
* config/riscv/arcv-rhx100.md: New file.

Co-authored-by: Artemiy Volkov <artemiyv@acm.org>
Co-authored-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>

[PATCH GCC17-stage1] riscv: Optimize power-of-2 boundary comparisons in conditional moves

In riscv_expand_conditional_move, detect unsigned comparisons against
power-of-2 boundaries and convert them to shift-based equality tests.
This avoids materializing large constants (e.g. 2^56 - 1) that may
require multiple instructions (bseti + sltu), replacing them with a
single srli that feeds directly into czero.eqz/czero.nez.

The transformation handles four cases:
  GTU x, (2^N-1)  ->  NE (x >> N), 0
  LEU x, (2^N-1)  ->  EQ (x >> N), 0
  GEU x, 2^N      ->  NE (x >> N), 0
  LTU x, 2^N      ->  EQ (x >> N), 0

For example, `(a & (0xff << 56)) ? b : 0` previously generated:
  bseti  a5, zero, 56
  sltu   a0, a0, a5
  czero.nez  a0, a1, a0

Now generates:
  srli      a0, a0, 56
  czero.eqz a0, a1, a0

Existing define_split patterns in riscv.md (lines 3727-3748) handle
the same optimization for standalone SCC operations, but they don't
fire in the conditional move expansion path which goes through
riscv_expand_int_scc directly.
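
The four rewrites listed above rest on a simple arithmetic fact: an unsigned x is at least 2^N exactly when x >> N is nonzero.  The following sketch checks that around the boundary; the function names are hypothetical and the code assumes a 64-bit unsigned value with 0 < N < 64, independent of anything in riscv.cc.

```cpp
#include <cstdint>

// Comparison forms as written in the four cases (hypothetical helper names).
bool gtu_mask(std::uint64_t x, unsigned n) { return x >  ((UINT64_C(1) << n) - 1); }
bool leu_mask(std::uint64_t x, unsigned n) { return x <= ((UINT64_C(1) << n) - 1); }
bool geu_pow2(std::uint64_t x, unsigned n) { return x >= (UINT64_C(1) << n); }
bool ltu_pow2(std::uint64_t x, unsigned n) { return x <  (UINT64_C(1) << n); }

// Shift-based replacement: NE (x >> N), 0.
bool shifted_nonzero(std::uint64_t x, unsigned n) { return (x >> n) != 0; }

// Returns true if all four comparisons agree with the shift-based test
// for values around the 2^n boundary.  Requires 0 < n < 64.
bool boundary_rewrites_agree(unsigned n) {
  const std::uint64_t p = UINT64_C(1) << n;
  const std::uint64_t samples[] = {0, 1, p - 2, p - 1, p, p + 1, UINT64_MAX};
  for (std::uint64_t x : samples) {
    const bool ne = shifted_nonzero(x, n);
    if (gtu_mask(x, n) != ne)  return false;   // GTU x, (2^N-1) -> NE
    if (leu_mask(x, n) != !ne) return false;   // LEU x, (2^N-1) -> EQ
    if (geu_pow2(x, n) != ne)  return false;   // GEU x, 2^N     -> NE
    if (ltu_pow2(x, n) != !ne) return false;   // LTU x, 2^N     -> EQ
  }
  return true;
}
```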

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move):
Convert unsigned comparisons against power-of-2 boundaries
to shift-based equality tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-shift-cond.c: New test.

c++/reflection: propagate cv-quals for SPLICE_SCOPE [PR125096]

tsubst_splice_scope isn't propagating cv-quals from the template tree
to the result, which causes asserts in the new test to fail wrongly due
to a missing 'const'. So let's add the cv-quals like we do in so many
other places in tsubst.

PR c++/125096

gcc/cp/ChangeLog:

* pt.cc (tsubst_splice_scope): Don't return early for
dependent_splice_p. Propagate cv-qualifiers from the
SPLICE_SCOPE to the result.
* reflect.cc (valid_splice_scope_p): Accept SPLICE_SCOPE.

gcc/testsuite/ChangeLog:

* g++.dg/reflect/mangle4.C: Move dg-error.
* g++.dg/reflect/dep16.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

build: Check solaris_{as,ld} where appropriate

Several of the gas and gnu_ld checks in gcc/configure actually need to
determine if Solaris as and ld are in use.  Since solaris_as and
solaris_ld are determined reliably now, it's clearer to check them
directly instead of !gas and !gnu_ld.

This patch does just that.  Since solaris_as/solaris_ld imply target
*-*-solaris2*, the tests can be simplified and sometimes converted from
case/esac to if/else.

Bootstrapped on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11,
x86_64-pc-linux-gnu, amd64-pc-freebsd15.0, and
x86_64-apple-darwin21.6.0.

When there are different flavours of as and/or ld depending on PATH
(/usr/bin/as vs. /usr/gnu/bin/as resp. ld on Solaris, /usr/bin/ld, LLD,
and /usr/local/bin/ld, GNU ld on FreeBSD), the builds were configured
with --with-as/--with-ld.

The Solaris tests were run for as/ld, gas/ld, and gas/gld
configurations, the FreeBSD tests with gas/gld.

In all cases, gcc/auto-host.h and gcc/Makefile were unchanged.

2026-02-08  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* configure.ac: Test solaris_as, solaris_ld instead of gas, gnu_ld.
(gcc_cv_as_working_gdwarf_n_flag): Escape '.' in filename.
* acinclude.m4 (gcc_cv_initfini_array): Test solaris_as,
solaris_ld instead of gas, gnu_ld.
* configure: Regenerate.

[PATCH] RISC-V: Fix missing braces in riscv_rtx_costs for slli.uw pattern [PR???]

The AND case in riscv_rtx_costs for the slli.uw pattern (zba extension) has a
multi-statement if body without braces. This causes the 'return true' to
execute unconditionally whenever the left operand of AND is an ASHIFT,
regardless of whether the inner condition (checking register_operand,
CONST_INT_P, and the 0xffffffff mask) is satisfied.

This effectively short-circuits the entire AND cost calculation for any
AND+ASHIFT combination when TARGET_ZBA && TARGET_64BIT && DImode,
skipping subsequent pattern checks (bclri, bclr, etc.) and the
fallthrough to PLUS/MINUS.
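
The control-flow problem described above can be reduced to a small sketch.  The functions below are hypothetical stand-ins, not the code in riscv_rtx_costs: is_ashift plays the role of the outer ASHIFT check and inner_ok the role of the register_operand/CONST_INT_P/mask check.

```cpp
// Without braces, only the assignment belongs to the inner if.  The
// return runs whenever is_ashift is true, whatever inner_ok says.
bool cost_without_braces(bool is_ashift, bool inner_ok, int *total) {
  if (is_ashift)
    {
      if (inner_ok)
        *total = 1;
      return true;    // reached even when inner_ok is false
    }
  return false;       // "keep looking at other patterns"
}

// With braces, the early return is taken only when the inner
// condition holds, so other patterns still get a chance otherwise.
bool cost_with_braces(bool is_ashift, bool inner_ok, int *total) {
  if (is_ashift)
    {
      if (inner_ok)
        {
          *total = 1;
          return true;
        }
    }
  return false;
}
```

In the sketch, a true return means "fully costed, stop here", which is the assumed meaning of 'return true' in the description above.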

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Add missing braces
around the if body for the slli.uw pattern in the AND case.

strlen: Adjust objsz arg in __strcat_chk -> __stpcpy_chk transformation [PR125079]

As the following testcase shows, we have two different transformations
of __strcat_chk.  One done in strlen_pass::handle_builtin_strcat,
which transforms __strcat_chk (x, y, z) if we know beforehand strlen (x),
so something like:
  l = strlen (x);
  __strcat_chk (x, y, z);
and since PR87672 we change that to
  l = strlen (x);
  __strcpy_chk (x + l, y, z - l);
i.e. decrease the objsz in
  if (objsz)
    {
      objsz = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (objsz), objsz,
                               fold_convert_loc (loc, TREE_TYPE (objsz),
                                                 unshare_expr (dstlen)));
      objsz = force_gimple_operand_gsi (&m_gsi, objsz, true, NULL_TREE, true,
                                        GSI_SAME_STMT);
    }
And another transformation is when we have earlier __strcat_chk (x, y, z)
call and want to compute strlen (x) after that.  In that case
get_string_length transforms
  __strcat_chk (x, y, z);
to
  t = strlen (x);
  l = __stpcpy_chk (x + t, y, z) - x;
where l is the len we are looking for.  This patch changes it similarly to
the PR87672 to
  t = strlen (x);
  l = __stpcpy_chk (x + t, y, z - t) - x;
instead.
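
Why the size argument has to shrink can be shown with a toy model.  toy_stpcpy_chk below is a hypothetical stand-in that reports failure instead of aborting (the real __stpcpy_chk aborts), and the 8-byte object size is only pretended, so that the sketch never writes out of bounds.

```cpp
#include <cstddef>
#include <cstring>

// Toy checked stpcpy: refuses the copy if src plus its NUL does not fit
// in objsz bytes starting at dst.  Returns nullptr on refusal.
char *toy_stpcpy_chk(char *dst, const char *src, std::size_t objsz) {
  const std::size_t n = std::strlen(src) + 1;
  if (n > objsz)
    return nullptr;
  std::memcpy(dst, src, n);
  return dst + n - 1;   // points at the terminating NUL, like stpcpy
}

// Appends "wxyz" to "abcde" in an object pretended to be 8 bytes long.
// The append needs 10 bytes in total, so it should be rejected.
bool append_accepted(bool adjust_size) {
  char storage[32] = "abcde";           // real storage is larger on purpose
  const std::size_t objsz = 8;          // pretended object size (z)
  const std::size_t t = std::strlen(storage);
  const std::size_t avail = adjust_size ? objsz - t : objsz;  // z - t vs z
  return toy_stpcpy_chk(storage + t, "wxyz", avail) != nullptr;
}
```

Passing the unadjusted size lets the check measure from the wrong starting point, so the overflowing append is accepted; passing z - t rejects it.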

2026-05-01  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/125079
* tree-ssa-strlen.cc (get_string_length): Transform
__strcat_chk (x, y, z) when we need strlen (x) afterwards into
l1 = strlen (x); l = __stpcpy_chk (x + l1, y, z - l1) - x;
where l is the strlen (x), instead of using z as last __stpcpy_chk
argument.

* gcc.dg/strlenopt-97.c: New test.

Reviewed-by: Richard Biener <rguenth@suse.de>

[PR target/124559][RISC-V] Improve RISC-V constant synthesis for some HImode constants

So this is a trivial little bug we found doing some comparisons against LLVM.

For the function sub2 in load-immediate.c we get this code:

        li      a5,-32768
        sh      a5,0(a0)
        xori    a5,a5,-1
        sh      a5,0(a1)

Note carefully the li+xori pair.  There's a slightly better sequence here
from an encoding standpoint.  Instead of using xori we can adjust the
synthesis sequence to target an "addi" for that statement and in doing so
we can save two code bytes of space.

The xori sequence was used because we can't do this in gcc:

(set (dest:HI) (const_int 0x8000))

We're in HI mode so the constant must be sign extended from bit 15 to a
HOST_WIDE_INT.

Fixing this isn't hard.  The key is realizing the vast majority of the time we
really don't want/need to load in HImode and in fact we're typically going to
be generating objects in word_mode.  So instead of passing in the pre-promoted
mode, pass in the post-promoted mode.
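
The arithmetic behind this can be sketched as follows.  The helper names are hypothetical, and the idea that the word-mode route starts from 32768 and subtracts one is an assumption made for illustration; the commit message does not show the new instruction sequence.

```cpp
#include <cstdint>

// Sign-extend the low 16 bits of v, which is how a 16-bit constant has
// to be represented in a wider host integer.
std::int64_t sext16(std::int64_t v) {
  const std::uint16_t low = static_cast<std::uint16_t>(v);
  return low >= 0x8000 ? static_cast<std::int64_t>(low) - 0x10000
                       : static_cast<std::int64_t>(low);
}

// The halfword that a 16-bit store would actually write.
std::uint16_t low16(std::int64_t v) {
  return static_cast<std::uint16_t>(v & 0xffff);
}

// In 16 bits, 0x8000 can only be expressed as -32768, and the second
// constant is reached by inverting all bits (the xori a5,a5,-1 step).
std::int64_t second_constant_via_not() { return ~sext16(0x8000); }

// In a wider mode, 32768 is representable directly; it stores the same
// halfword, and the second constant is one less (assumed addi route).
std::int64_t second_constant_via_add() { return INT64_C(32768) - 1; }
```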

That's fine and good with one caveat.  CSE fails to use NEG/NOT to derive a
new constant from an older constant, even if the cost is smaller, which caused
a code quality regression elsewhere on the RISC-V port.  So this patch adjusts
CSE ever-so-slightly to allow it to derive constants from a previous constant
using NOT/NEG in a fairly obvious way.

This has been in my tester for a while, so it's been through the usual
bootstrap & regression test on the Pioneer, BPI, x86 and aarch64 and others as
well as testing across the various embedded targets.

Waiting on pre-commit testing to do its thing.

PR target/124559
gcc/
* config/riscv/riscv-protos.h (riscv_move_integer): Drop mode argument.
* config/riscv/riscv.cc (riscv_move_integer): Pass mode after promotions
to riscv_build_integer.  All callers changed.
* config/riscv/riscv.md: Corresponding changes.
* cse.cc (cse_insn): Try to derive one constant from another using NOT/NEG.

libstdc++: Tweak Doxygen comments for experimental simd

I noticed that Doxygen was not documenting the contents of
<experimental/simd> as part of namespace std, because it didn't know
about the _GLIBCXX_SIMD_BEGIN_NAMESPACE and _GLIBCXX_SIMD_END_NAMESPACE
macros which open and close namespace std::experimental::parallelism_v2.

After defining those macros in the Doxygen config, the Doxygen comments
in experimental/bits/simd.h were causing namespace std to be documented
as part of the Parallelism TS v2. That's because the preprocessed code
looks like:

/** @ingroup ts_simd
* @{
*/
namespace std::experimental::inline parallelism_v2 {

This causes Doxygen to apply the @ingroup command to all three of
namespace std, namespace std::experimental, and namespace
std::experimental::parallelism_v2. I don't know if this is the intended
behaviour, but it doesn't seem useful so I've opened an issue about it:
https://github.com/doxygen/doxygen/issues/12114

To work around this, we can move the _GLIBCXX_SIMD_BEGIN_NAMESPACE macro
before the @{ group and document it separately with a @namespace
comment. That makes the @ingroup only apply to the namespace named by
the @namespace command, not to its enclosing namespaces as well. Moving
the position of the BEGIN macro also fixes the nesting, as previously we
had @{ then BEGIN then @} then END. Now we have BEGIN @{ @} END which
seems preferable.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (PREDEFINED): Add BEGIN/END macros for
the <experimental/simd> namespace.
* include/experimental/bits/simd.h: Move BEGIN macro before
Doxygen @{ group.

libstdc++: Suppress Doxygen docs for internals in <bits/locale_conv.h>

libstdc++-v3/ChangeLog:

* include/bits/locale_conv.h: Prevent namespace __detail from
being documented as part of the Locales topic.

libstdc++: Improve Doxygen comments for <iterator> contents

Use markdown and suppress unwanted docs for internal helpers.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h: Prevent Doxygen from documenting
namespace __detail as part of the Iterators topic.
* include/bits/stl_iterator_base_funcs.h: Likewise. Also mark
internal helpers as undocumented.
(distance, advance): Improve Doxygen comments.
* include/bits/stl_iterator_base_types.h (iterator): Use
markdown in Doxygen comment. Add @deprecated.
(iterator_traits): Improve wording of Doxygen comment.

libstdc++: Do not assume URBG::result_type exists [PR121919]

The ranges::sample and ranges::shuffle algorithms are supposed to work
with types which model std::uniform_random_bit_generator, which means
they should not assume that G::result_type is present. That isn't needed
to satisfy the concept. Change the algorithms to use decltype(__g())
instead of using result_type.
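
A minimal sketch of the idea, written in C++11 so that it does not depend on the concept itself: minimal_gen and draw are hypothetical names, and the generator deliberately has no nested result_type.  The value type is recovered from the call expression instead.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical generator: callable, with min() and max(), but with no
// nested result_type member.
struct minimal_gen {
  static constexpr std::uint32_t min() { return 0; }
  static constexpr std::uint32_t max() { return 0xffffffffu; }
  std::uint32_t state = 1;
  std::uint32_t operator()() {
    state = state * 1664525u + 1013904223u;   // simple LCG step
    return state;
  }
};

// Deduce the generated type from the call itself rather than from
// G::result_type, which this generator does not provide.
template <typename G>
auto draw(G &g) -> decltype(g()) {
  using result = decltype(g());
  result r = g();
  return r;
}

static_assert(std::is_same<decltype(draw(std::declval<minimal_gen &>())),
                           std::uint32_t>::value,
              "decltype(g()) recovers the generated type");
```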

This isn't sufficient to fix the bug though, because those algorithms
use std::uniform_int_distribution and that class template's operator()
overloads depend on the more restrictive uniform random bit generator
requirements, which do include the presence of a nested result_type
member.

We need to change std::uniform_int_distribution to also use decltype
instead of the nested result_type, even though the standard says that
std::uniform_int_distribution is allowed to assume that result_type
exists.

There's yet another problem, which is that a type that returns random
bool values can model the concept, but doesn't meet the named
requirements and can't be used with std::uniform_int_distribution. That
isn't addressed by this change.

libstdc++-v3/ChangeLog:

PR libstdc++/121919
* include/bits/ranges_algo.h (__sample_fn, __shuffle_fn): Use
decltype(__g()) instead of remove_reference_t<_G>::result_type.
* include/bits/uniform_int_dist.h
(uniform_int_distribution::operator()): Use decltype(__urng())
instead of _UniformRandomBitGenerator::result_type.
(uniform_int_distribution::__generate_impl): Likewise.
* testsuite/25_algorithms/sample/121919.cc: New test.
* testsuite/25_algorithms/shuffle/121919.cc: New test.

Reviewed-by: Nathan Myers <nmyers@redhat.com>

Ada: Link with PIC static Ada runtime when -pie is specified

This changes gnatlink to append _pic to the name of the static Ada runtime
when -pie is passed on the command line.

gcc/ada/
PR ada/87936
* gnatlink.adb (Gnatlink): Rename local variable and add Output_PIE
local variable; when it is set, compile the binder file with -fPIE.
(Process_Args): Set Output_PIE upon seeing -pie.
(Process_Binder_File): Append "_pic" to the name of the static Ada
runtime if Output_PIE is set.

gcc/testsuite/
* gnat.dg/pie1.adb: New file.

x86: Correct last_4x_vec_label in ix86_expand_movmem

commit b41f96465190751561f6909e858604ceab00595b
Author: H.J. Lu <hjl.tools@gmail.com>

    x86-64: Inline memmove with overlapping unaligned loads and stores

has

      rtx_code_label *last_4x_vec_label = nullptr;
      if (min_size == 0 || min_size < 4 * move_max)
        last_4x_vec_label = gen_label_rtx ();

      /* Jump to LAST_4X_VEC_LABEL if size < 4 * MOVE_MAX.  */
      if (last_4x_vec_label)
        emit_cmp_and_jump_insns (count_exp, GEN_INT (4 * move_max), LTU,
                                 nullptr, count_mode, 1,
                                 last_4x_vec_label);

...

      if (last_4x_vec_label)
        {
          /* Size > 2 * MOVE_MAX and size <= 4 * MOVE_MAX.  */
          emit_label (last_4x_vec_label);

The last_4x_vec_label block covers min_size <= 4 * MOVE_MAX, not
min_size < 4 * MOVE_MAX.  When MOVE_MAX == 16 bytes and min_size == 64,
the last_4x_vec_label isn't generated.  Change min_size < 4 * move_max
to min_size <= 4 * move_max to correct the last_4x_vec_label condition.
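
The boundary case can be restated as two predicates.  The function names and the plain unsigned parameters are hypothetical simplifications of the condition quoted above, not code from i386-expand.cc.

```cpp
// Condition before the fix: the label is created only when min_size is
// strictly below 4 * move_max (or unknown, i.e. 0).
bool needs_label_old(unsigned min_size, unsigned move_max) {
  return min_size == 0 || min_size < 4 * move_max;
}

// Condition after the fix: a size of exactly 4 * move_max is included,
// matching the "size <= 4 * MOVE_MAX" block that the label guards.
bool needs_label_new(unsigned min_size, unsigned move_max) {
  return min_size == 0 || min_size <= 4 * move_max;
}
```

With move_max == 16 and min_size == 64, the old predicate is false and the new one is true, which is the case called out in the description.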

Tested on Linux/x86-64.

gcc/

PR target/125117
* config/i386/i386-expand.cc (ix86_expand_movmem): Generate
last_4x_vec_label when min_size <= 4 * MOVE_MAX.

gcc/testsuite/

PR target/125117
* gcc.dg/pr125117.c: New test.
* gfortran.dg/pr125117.f90: Likewise.
* gcc.target/i386/builtin-memmove-10.c: Updated.
* gcc.target/i386/builtin-memmove-15.c: Likewise.
* gcc.target/i386/builtin-memmove-2a.c: Likewise.
* gcc.target/i386/builtin-memmove-2b.c: Likewise.
* gcc.target/i386/builtin-memmove-2c.c: Likewise.
* gcc.target/i386/builtin-memmove-2d.c: Likewise.
* gcc.target/i386/builtin-memmove-3a.c: Likewise.
* gcc.target/i386/builtin-memmove-3b.c: Likewise.
* gcc.target/i386/builtin-memmove-3c.c: Likewise.
* gcc.target/i386/builtin-memmove-4a.c: Likewise.
* gcc.target/i386/builtin-memmove-4b.c: Likewise.
* gcc.target/i386/builtin-memmove-4c.c: Likewise.
* gcc.target/i386/builtin-memmove-5b.c: Likewise.
* gcc.target/i386/builtin-memmove-5c.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

s390: Fix dealing with HF vector modes in s390_secondary_reload

Initial HF mode support was added in commit r16-6682-g5d6d56d837c which
is missing HF vector mode support when dealing with secondary reloads
for instructions which do not accept relative operands.

gcc/ChangeLog:

* config/s390/s390.cc (s390_secondary_reload): Add cases for HF
vector modes.
* config/s390/s390.md: Add modes V{1,2,4,8}HF to mode iterator
ALL.

tree-vect-loop: Remove useless && 1.

r16-476 has replaced && slp_node with && 1 and it remained that way
until now. This patch just removes that.

2026-05-01 Jakub Jelinek <jakub@redhat.com>

* tree-vect-loop.cc (vectorizable_reduction): Remove pointless
&& 1.

[V3][RISC-V][PR rtl-optimization/96692] Improve xor+xor+ior sequence when possible

Consider this code:

int f(int a, int b, int c)
{
    return (a ^ b) ^ (a | c);
}

For RISC-V we generate something like this:

        xor     a1,a0,a1
        or      a0,a0,a2
        xor     a0,a1,a0

But this would be better:

        andn    a0,a2,a0
        xor     a0,a0,a1
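
The two sequences compute the same value because a | c equals a ^ (c & ~a), so the two uses of a cancel under xor.  The sketch below checks this; the function names are hypothetical, and since every operation is bitwise, checking all eight single-bit combinations covers every bit position.

```cpp
#include <cstdint>

// The expression as written in the source function.
std::uint32_t original_form(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
  return (a ^ b) ^ (a | c);
}

// The andn+xor form: andn computes c & ~a, then xor with b.
std::uint32_t andn_form(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
  return (c & ~a) ^ b;
}

// Returns true if both forms agree for every combination of one-bit inputs.
bool forms_agree_on_all_bit_patterns() {
  for (std::uint32_t a = 0; a < 2; ++a)
    for (std::uint32_t b = 0; b < 2; ++b)
      for (std::uint32_t c = 0; c < 2; ++c)
        if (original_form(a, b, c) != andn_form(a, b, c))
          return false;
  return true;
}
```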

It looks like Roger tackled this earlier with splitters for x86. I'd have
leaned more towards simplify-rtx, but there may be secondary concerns at play.
So I'll attack in the RISC-V target files in a similar manner.

The patch, but not the testcase, has been in my tester for a while, so it's
been bootstrapped and regression tested on the Pioneer and BPI-F3 board and
regression tested on riscv32-elf and riscv64-elf. Obviously I'll wait for
pre-commit CI before moving forward.

PR rtl-optimization/96692
gcc/
* config/riscv/bitmanip.md (xor+xor+ior splitters): New splitters
that ultimately generate andn+xor when possible.

gcc/testsuite

* gcc.target/riscv/pr96692.c: New test.

Daily bump.

x86: Remove DI_REG/SI_REG from x86_64_int_return_registers

Since only AX/DX register pair and XMM0/XMM1 register pair are used for
function return values in 64-bit mode, remove DI_REG and SI_REG registers
from x86_64_int_return_registers and limit the number of registers used
in return values to 2 in 64-bit mode.

Tested on Linux/x86-64 and Linux/i686.

PR target/124878
* config/i386/i386.cc (x86_64_int_return_registers): Remove
DI_REG and SI_REG.
(ix86_function_value_regno_p): Remove DI_REG and SI_REG cases.
(function_value_64): Replace X86_64_REGPARM_MAX and
X86_64_SSE_REGPARM_MAX with X86_64_MAX_RETURN_NREGS and
X86_64_MAX_SSE_RETURN_NREGS for the number of registers used
in return values.
* config/i386/i386.h (X86_64_MAX_RETURN_NREGS): New. Defined
to 2.
(X86_64_MAX_SSE_RETURN_NREGS): Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Disable 16-bit imm store for TARGET_LCP_STALL

When TARGET_LCP_STALL is enabled, 16-bit immediate integer store should
be avoided. Update V_16_32_64:*mov<mode>_imm to disable 16-bit immediate
integer store when TARGET_LCP_STALL is enabled.

Tested on Linux/x86-64 and Linux/i686.

PR target/125102
* config/i386/mmx.md (V_16_32_64:*mov<mode>_imm): Disable
16-bit immediate integer store if TARGET_LCP_STALL is true.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

libstdc++: Add <bits/binders.h> to freestanding headers [PR125112]

The <ranges> header was added to the freestanding headers in
r16-3575-g1a41e52d7ecb58 but bits/binders.h that it depends on was not
moved, making <ranges> unusable with --disable-libstdcxx-hosted.

libstdc++-v3/ChangeLog:

PR libstdc++/125112
* include/Makefile.am: Move bits/binders.h from bits_headers to
bits_freestanding.
* include/Makefile.in: Regenerate.

Ada: Fix build of GNAT tools with coverage enabled

This removes an obsolete comment in the process.

gcc/
* Makefile.in (COVERAGE_FLAGS): Remove obsolete comment.

gcc/ada/
PR ada/110336
* gcc-interface/Makefile.in (COVERAGE_FLAGS): New variable.
(GCC_LINK_FLAGS): Add $(COVERAGE_FLAGS).
(ALL_CFLAGS): Likewise.
(enable_host_pie): Fold into single use.

[IRA]: Process operand NO_REGS class for reg cost calculation

record_reg_classes has no special processing for the case op_class ==
NO_REGS. This can result in a very high cost for the insn alternative.
The patch fixes this and can change generated code.

gcc/ChangeLog:

* ira-costs.cc (record_reg_classes): Correctly process the case
op_class == NO_REGS.

[IRA]: Fix soft conflict and hard reg cost calculation

When finding soft conflicts in IRA, we wrongly use the conflict allocno's
mode. This can result in more shuffling on the region borders and worse
code generation. The patch fixes this.

gcc/ChangeLog:

* ira-color.cc (assign_hard_reg): Use the right allocno mode to
call note_conflict.