Andrew Pinski [Fri, 27 Mar 2026 22:42:16 +0000 (15:42 -0700)]
phiprop: Move the check on vuse before the dominator tests
This again is a small optimization of the order of the checks here.
The dominator tests no longer determine whether the propagation can
happen, so putting them after the tests that can reject the
propagation is a good thing.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiprop.cc (propagate_with_phi): Move vuse checks
before the dominator tests.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Fri, 27 Mar 2026 22:25:13 +0000 (15:25 -0700)]
phiprop: Factor out the vdef check into new function
This is just a small cleanup and should make the code easier
to understand. It should also make it easier to allow skipping
over some store statements that don't affect the
variable being loaded.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiprop.cc (propagate_with_phi): Factor out
checking the load for vdef to ....
(can_move_into_conditional): Here.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Thu, 9 Apr 2026 19:40:22 +0000 (12:40 -0700)]
testsuite: Add phiprop testcase that is already fixed [PR116823]
This testcase was extracted from fold-const.cc but was fixed
by r16-4212-gf256a13f8aed83 which removed the clobber.
Since this is fixed separately from the other improvements,
it is in a separate patch.
Philipp Tomsich [Tue, 28 Apr 2026 17:22:35 +0000 (11:22 -0600)]
[4/6] fold-mem-offsets: Move RISC-V size-optimization workaround to the backend
The fold-mem-offsets pass contained a target-specific workaround that
skipped basic blocks optimized for size, to avoid conflicting with
RISC-V's shorten-memrefs pass. This penalized all targets.
Move the workaround to the RISC-V backend by disabling fold-mem-offsets
via SET_OPTION_IF_UNSET in riscv_option_override when optimizing for
size with compressed instructions enabled (the same condition that gates
the shorten-memrefs pass). This preserves the RISC-V behavior while
allowing other targets to fold offsets in size-optimized blocks.
gcc/ChangeLog:
* fold-mem-offsets.cc (pass_fold_mem_offsets::execute): Remove
optimize_bb_for_size_p check.
* config/riscv/riscv.cc (riscv_option_override): Disable
flag_fold_mem_offsets when optimizing for size with compressed
instructions.
This patch supports the RISC-V Zalasr [1] (load-acquire/store-release) extension. Based on Edwin Lu's old patch:
https://patchwork.sourceware.org/project/gcc/patch/20250410214940.2712673-1-ewlu@rivosinc.com/
Implements TARGET_MEMTAG_CAN_TAG_ADDRESSES and TARGET_MEMTAG_TAG_BITSIZE
for the RISC-V back end, allowing -fsanitize=hwaddress if the target
machine supports the pointer masking extension.
------
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_can_tag_addresses): New function.
(RISCV_HWASAN_TAG_SIZE): New definition.
(riscv_memtag_tag_bitsize): New function.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New definition.
(TARGET_MEMTAG_TAG_BITSIZE): Likewise.
[HWASAN] [RISC-V] Update EnableTaggingAbi for RISC-V linux. (#176616)
Cherry-picked from LLVM commit: 32d21326f3b60874fd72bbe509c06dbe5b729a32
Enabling pointer tagging in the userspace ABI for RISC-V kernels differs
from that of AArch64. It requires requesting a particular number of masked
pointer bits; an error is returned if the platform cannot accommodate
the request:
https://docs.kernel.org/arch/riscv/uabi.html#pointer-masking
While experimenting with enabling RISC-V HWASAN on GCC I was hitting the
error
> HWAddressSanitizer failed to enable tagged address syscall ABI
when attempting to run instrumented programs in the spike simulator
running kernel release 6.18. This patch successfully allows the tagged
address syscall ABI to be enabled by the support runtime.
Jeff Law [Tue, 28 Apr 2026 16:07:07 +0000 (10:07 -0600)]
[RISC-V][PR tree-optimization/94892] Improve equality test of sign bit splat against zero
One of the tests in pr94892 showed a case where we failed to convert a
sign bit splat + equality test against zero into a simple lt/ge test which
doesn't require the sign bit splat.
This is only failing on rv64, probably because the case in question has
a DI sign bit splat, then we take a lowpart SI subreg. The lowpart
dance isn't needed for rv32, though I've structured the test to verify
that we get sensible code on rv32 as well as rv64.
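As a hedged sketch (not the testcase from the PR), the idiom in C looks
roughly like this, assuming 64-bit long as on rv64:
  /* x >> 63 splats the sign bit: all ones if x is negative, zero
     otherwise, so the equality test against zero reduces to a simple
     signed comparison.  */
  int f (long x)
  {
    return (x >> 63) == 0;   /* becomes: return x >= 0;  */
  }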
Like many other patches I'm submitting now, this has been in my tester
for a while, but the test has not. I'll be waiting on the pre-commit
tester to verify sanity before moving forward. I'm particularly
interested to see how it behaves with no -march flags. It should be
taking the defaults from when the toolchain was built, which should do
what we want.
Jeff
PR tree-optimization/94892
gcc/
* config/riscv/riscv.md (sign_bit_splat_equality_test): New pattern.
Tomasz Kamiński [Tue, 28 Apr 2026 14:01:47 +0000 (16:01 +0200)]
libstdc++: Make pointer_traits::pointer_to constexpr for main template.
This resolves LWG3454, "pointer_traits::pointer_to should be constexpr",
accepted in Kona 2025.
The change is applied since C++20, i.e. the standard in which pointer_to
was made constexpr for the T* specialization.
libstdc++-v3/ChangeLog:
* include/bits/ptr_traits.h (__ptr_traits_ptr_to::pointer_to):
Define as constexpr since C++20.
* testsuite/20_util/pointer_traits/pointer_to_constexpr.cc:
New test for custom pointer-like type.
libgomp.fortran/map-subarray-6.f90: Fix and robustify
Changes:
* Actually initialize the proper variable.
* Handle the three cases explicitly: self mapping/host fallback, mapping
but host accessible, and mapping and (potentially) not host accessible.
Hence, remove 'dg-shouldfail', as the code should now always run.
* Add more checks that no pointer attachment happens, using values outside
the mapped range.
* Add several comments and handle the case that 'tgt' is actually removed
during gimplification as unused. (Two cases: once checking the result with
'tgt' removed, and once using 'tgt'/'tgt2' in the target region and then
checking the result.)
libgomp/ChangeLog:
* testsuite/libgomp.fortran/map-subarray-6.f90: Fix, extend, and
robustify.
Richard Biener [Thu, 5 Mar 2026 10:20:44 +0000 (11:20 +0100)]
Avoid live code-generation for stmts kept as scalars
The following avoids trying to code-generate live lane extracts for
scalar defs that we have to keep anyway because they are used in
SLP graph leaves as extern inputs.
This resolves the known cases of one of the workarounds in live
code-generation.
* tree-vect-slp.cc (vect_bb_slp_mark_live_stmts): Do not
attempt to live code-generate defs that are kept in scalar
form anyway.
* tree-vect-loop.cc (vectorizable_live_operation): Update
comment.
Richard Biener [Tue, 3 Mar 2026 14:09:22 +0000 (15:09 +0100)]
Cost each BB vect live lane only once
The following makes sure to cost live scalar stmts appearing in multiple
SLP nodes only once and code-generate them from the SLP node we verified
we can replace all scalar uses from.
* tree-vectorizer.h (_slp_tree::live_lanes): New vector.
(SLP_TREE_LIVE_LANES): New.
* tree-vect-loop.cc (vectorizable_live_operation): Append
to SLP_TREE_LIVE_LANES.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
SLP_TREE_LIVE_LANES.
(_slp_tree::~_slp_tree): Release SLP_TREE_LIVE_LANES.
(vect_print_slp_tree): Adjust live lane dumping, indicating
the SLP node a lane is code generated from.
(vect_bb_slp_mark_live_stmts): No longer verify we can
code-generate from all SLP nodes but at least one, picking
the first.
* tree-vect-stmts.cc (vect_transform_stmt): Iterate over
SLP_TREE_LIVE_LANES.
(vect_analyze_stmt): Also analyze reductions for live
lanes.
The following uses the vector coverage indicated by SLP_TREE_TYPE to
improve and simplify BB vector scalar costing, finally handling SLP
patterns properly.
PR tree-optimization/124222
* tree-vect-slp.cc (vect_slp_gather_vectorized_scalar_stmts): Remove.
(vect_bb_slp_scalar_cost): Simplify by using SLP_TREE_TYPE and
a use-def walk of the scalar stmts SSA uses.
(vect_bb_vectorization_profitable_p): Simplify.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr124222.c: New testcase.
Richard Biener [Mon, 2 Mar 2026 13:53:04 +0000 (14:53 +0100)]
Simplify vect_bb_slp_mark_live_stmts
The following uses the full scalar stmt coverage now denoted by
SLP_TREE_TYPE to simplify computing STMT_VINFO_LIVE_P for code
generation of live lanes.
Richard Biener [Tue, 3 Mar 2026 12:48:17 +0000 (13:48 +0100)]
Re-do vect_mark_slp_stmts to compute full scalar stmt coverage
The following re-purposes STMT_SLP_TYPE for BB vectorization to indicate
the scalar (non-pattern) stmt coverage of the vectorized SLP graph.
This will allow for simpler and more precise determining of live lanes
and scalar costing.
* tree-vect-slp.cc (vect_slp_analyze_bb_1): Split out pure_slp
marking into ...
(vect_bb_slp_mark_stmts_vectorized): ... new function. Compute
full scalar stmt coverage of the SLP graph.
(vect_slp_gather_extern_scalar_stmts): New helper.
(vect_bb_slp_mark_live_stmts): Adjust.
* tree-vect-loop.cc (vectorizable_live_operation): Likewise.
Richard Biener [Mon, 2 Mar 2026 14:10:14 +0000 (15:10 +0100)]
Move BB analysis code to make flow more obvious
The following moves BB vect live stmt marking out of
vect_slp_analyze_operations to vect_slp_analyze_bb_1, and moves SLP stmt
marking, which marks some vectorized stmts as PURE_SLP, right before it,
the only remaining consumer.
* tree-vect-slp.cc (vect_slp_analyze_operations): Move
vect_bb_slp_mark_live_stmts call ...
(vect_slp_analyze_bb_1): ... here. Move SLP stmt marking
right before it.
(vect_mark_slp_stmts): Remove unused overload.
i386: Avoid redundant classify_argument call in construct_container
In construct_container, remove the early call to classify_argument.
The examine_argument function already invokes classify_argument internally
and returns early if the argument must be passed in memory, making the
initial call in construct_container redundant. Recompute the
classification only when needed and assert that it succeeds.
While there, change the type of the in_return parameter from int to
bool in examine_argument and construct_container, and adjust all call
sites accordingly.
No functional change intended.
gcc/
* config/i386/i386.cc (construct_container): Remove redundant
early call to classify_argument. Recompute the classification
only when needed and assert that it succeeds.
(examine_argument): Change in_return parameter type to bool.
(function_arg_advance_64): Update call to examine_argument.
(function_arg_64): Update call to construct_container.
(function_value_64): Likewise.
(ix86_return_in_memory): Update call to examine_argument.
(ix86_gimplify_va_arg): Update calls to construct_container
and examine_argument.
Bohan Lei [Sat, 28 Feb 2026 02:41:32 +0000 (10:41 +0800)]
RISC-V: Specify -mcpu if --with-cpu is used
Commit 5be645a introduced support for --with-cpu. The current
implementation specifies an `-march` option based on the default cpu
value. This behavior is not consistent with how the `-mcpu` option
works, however, as the `-mcpu` option involves both `-march` and
`-mtune`. Only setting the `-march` value can be confusing for users,
who may expect the --with-cpu option to act the same way as an explicit
`-mcpu` option.
If it is some design choice, though, I would be glad to know.
Thanks,
Bohan
gcc/ChangeLog:
* config/riscv/riscv.h (OPTION_DEFAULT_SPECS): Specify -mcpu
instead of -march when --with-cpu is used.
Bohan Lei [Tue, 31 Mar 2026 07:27:05 +0000 (15:27 +0800)]
simplify-rtx: Simplify (cmp (and/ior x C1) C2)
This is v6 of
https://gcc.gnu.org/pipermail/gcc-patches/2026-March/711809.html, fixed
and enhanced as was suggested by Philipp and Andrew. The previous
version:
https://gcc.gnu.org/pipermail/gcc-patches/2026-April/714878.html.
Andrew noticed the x86 regression and suggested updating the testcase.
This patch adds missing simplifications for (cmp (and/ior x C1) C2) in
special cases. In the AND case, when (and C1 C2) is not equal to C1,
some bits set in C2 are not set in C1, and thus (eq (and x C1) C2) can
never be true. The OR case is similar when (and C1 C2) is not equal to
C2. As we know that the result of (and x C1) cannot be greater than C1,
and that the result of (ior x C1) cannot be less than C1 for unsigned
integers, the LTU, LEU, GTU, GEU cases can be optimized, too.
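As a hedged illustration with constants chosen here (not taken from the
patch):
  /* Bit 3 is set in C2 (0x8) but not in C1 (0x5), so the equality can
     never hold; and (x | 0x5) is at least 0x5 for unsigned x, so the
     unsigned less-than test is always false.  */
  int f1 (unsigned x) { return (x & 0x5) == 0x8; }  /* folds to 0 */
  int f2 (unsigned x) { return (x | 0x5) < 0x4; }   /* folds to 0 */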
The patch is meant to fix an ICE on RISC-V. In a former patch, I tried
to change the insn condition directly, but Jeff pointed out that it was
more reasonable to optimize it out before the split. As was suggested
by Jeff, this patch tries to simplify the expression in
simplify_relational_operation_1.
The URL for the former patch:
https://patchwork.sourceware.org/project/gcc/patch/20251229024238.15044-1-garthlei@linux.alibaba.com/
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
Add simplifications for `(cmp (and/ior x C1) C2)`.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113609-1.c: Change assembly check after
optimization.
* gcc.target/riscv/zbs-if_then_else-02.c: New test.
Richard Biener [Tue, 10 Mar 2026 13:39:24 +0000 (14:39 +0100)]
Add comment to vect_estimate_min_profitable_iters
The following adds a comment about how it's awkward to add the scalar loop
stmt cost vectors with scaled count to the vector loop cost vector
to estimate peeling costs.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Add comment about costing of prologue/epilogue.
Richard Biener [Tue, 10 Mar 2026 13:36:24 +0000 (14:36 +0100)]
Use scalar_costs in vect_get_known_peeling_cost
The following makes us use the scalar_costs summary when computing
the peeling cost part comparing different peelings for alignment.
That gets rid of repeated walks of LOOP_VINFO_SCALAR_ITERATION_COST
and the fallback to builtin_vectorization_cost.
* tree-vect-loop.cc (vect_get_known_peeling_cost): Use
scalar_costs instead of guesstimating it.
Richard Biener [Tue, 10 Mar 2026 13:34:56 +0000 (14:34 +0100)]
Cost scalar into vect_body
The following adjusts vect_compute_single_scalar_iteration_cost to
record stmts as vect_body rather than vect_prologue so that
scalar_costs->body_cost () will not be zero.
* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
Record stmt cost to vect_body.
Richard Biener [Tue, 10 Mar 2026 13:03:24 +0000 (14:03 +0100)]
Simplify vect_get_known_peeling_cost
The following makes vect_get_known_peeling_cost reflect what it actually
does and simplifies it with its three callers in mind, which do not
need most of what is computed. The function ends up using
legacy builtin_vectorization_cost to sum up N scalar loop copies.
With the next patch in the series this should improve, also
compile-time wise.
* tree-vectorizer.h (vect_get_known_peeling_cost): Simplify API.
* tree-vect-loop.cc (vect_get_known_peeling_cost): Avoid
all the overhead of record_stmt_cost as we only are interested
in the overall sum of the included builtin_vectorization_cost
calls.
* tree-vect-data-refs.cc (vect_peeling_hash_get_lowest_cost):
Adjust.
(vect_enhance_data_refs_alignment): Likewise.
match.pd: x != CST1 ? x + CST2 : CST3 -> x + CST2 [PR112659, PR122996]
This patch extends the conditional addition simplification introduced
in PR 122996 to handle a third constant and uniform vectors. This also
resolves the missing vector addition fold noted in PR 112659.
It simplifies x != CST1 ? x + CST2 : CST3 into x + CST2 when
CST1 + CST2 == CST3.
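A hedged example with concrete constants (not taken from the new tests):
  /* 3 + 2 == 5, so when x == 3 both arms yield 5 and the conditional
     collapses to the addition alone.  */
  int f (int x)
  {
    return x != 3 ? x + 2 : 5;   /* simplifies to: return x + 2;  */
  }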
Bootstrapped and regression tested on x86_64-pc-linux-gnu.
* match.pd: Extend conditional addition pattern to handle
a third constant and uniform vectors.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/cond-add-vec-1.C: New test (positive cases).
* g++.dg/tree-ssa/cond-add-vec-2.C: New test (negative cases).
* gcc.dg/tree-ssa/cond-add-1.c: New test (positive cases).
* gcc.dg/tree-ssa/cond-add-2.c: New test (negative cases).
Monk Chiang [Wed, 7 Jan 2026 02:59:07 +0000 (18:59 -0800)]
RISC-V: Use long jump for crossing section boundaries
When -freorder-blocks-and-partition is used, GCC places cold code in
.text.unlikely section. Jumps from hot code (.text) to cold code
(.text.unlikely) may cross section boundaries. Since the linker may
place these sections more than 1MB apart, the JAL instruction's ±1MB
range can be exceeded, causing linker errors like:
relocation truncated to fit: R_RISCV_JAL against `.text.unlikely'
This patch fixes the issue by checking CROSSING_JUMP_P in the length
attribute calculation for jump instructions. When a jump crosses
section boundaries, the length is set to 8 bytes (AUIPC+JALR) instead
of 4 bytes (JAL), ensuring the long form is used.
This approach is consistent with other backends (NDS32, SH, ARC) that
also use CROSSING_JUMP_P to handle cross-section jumps.
gcc/ChangeLog:
* config/riscv/riscv.md (length attribute): Check CROSSING_JUMP_P
for jump instructions and use length 8 for crossing jumps.
(jump): Update comment to explain when long form is used.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr-crossing-jump-1.c: New test.
* gcc.target/riscv/pr-crossing-jump-2.c: New test.
* gcc.target/riscv/pr-crossing-jump-3.c: New test.
Jakub Jelinek [Tue, 28 Apr 2026 07:28:44 +0000 (09:28 +0200)]
range-op-float: Fix ICE on undefined_p ranges [PR125039]
The following testcase ICEs at -O1 since r14-4153.
lower_bound/upper_bound methods on frange (and others) assert
they aren't called on undefined_p () ranges, because such ranges
don't really have any lower or upper bound.
Most fold_range virtual methods call empty_range_varying early
which checks if the operand ranges aren't undefined and in that case
return true and set r to varying, and then can safely use
lower_bound/upper_bound etc.
Now, operator_not_equal::fold_range did that until r14-4152 indirectly,
by calling it in frelop_early_resolve which it called unconditionally.
r14-4153 changed it not to call frelop_early_resolve in some cases
because it could misbehave as mentioned in the comment.
frelop_early_resolve has 3 conditionals it handles.
  if (!maybe_isnan (op1, op2) && relation_union (rel, my_rel) == my_rel)
doesn't apply for this case, because the
  if (rel == VREL_EQ && maybe_isnan (op1, op2))
condition means maybe_isnan (op1, op2) will be true.
  if (relation_intersect (rel, my_rel) == VREL_UNDEFINED)
is the condition which r14-4153 wanted to avoid. And finally
  if (empty_range_varying (r, type, op1, op2))
is the condition the following patch readds, so that we don't ICE on those.
2026-04-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/125039
* range-op-float.cc (operator_not_equal::fold_range): Call
empty_range_varying when not calling frelop_early_resolve.
Jakub Jelinek [Tue, 28 Apr 2026 06:54:42 +0000 (08:54 +0200)]
c, middle-end: Implement C2Y N3747 paper - Integer Sets, v5
C23 disallowed signed _BitInt(1); it only allowed unsigned _BitInt(1)
and signed _BitInt(2) and larger precisions.
The https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3747.pdf paper
changes this for C2Y and allows signed _BitInt(1) and (backwards
incompatibly) changes the type of 0wb from _BitInt(2) to _BitInt(1);
all other literals keep their earlier types. The paper contains a large
redesign of the C types hierarchy, but my understanding is that only
those two changes are changing something for users.
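A hedged sketch of what this means for user code under -std=c2y (assumed,
not from the new tests):
  signed _BitInt(1) x = 0;   /* rejected in C23, accepted in C2Y */
  /* 0wb now has type _BitInt(1) rather than _BitInt(2).  */
  _Static_assert (_Generic (0wb, _BitInt(1): 1, default: 0),
                  "0wb has type _BitInt(1)");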
2026-04-28 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree.cc (build_bitint_type): Allow build_bitint_type (1, 0).
(signed_or_unsigned_type_for): Call that for !unsignedp case
of BITINT_TYPE with bits 1.
gcc/c-family/
* c-common.cc (c_common_signed_or_unsigned_type): Use
build_bitint_type for TREE_CODE (type) == BITINT_TYPE whenever
flag_isoc2y even when precision is 1.
(c_common_get_alias_set): Don't special case BITINT_TYPE
with precision 1 for flag_isoc2y.
* c-lex.cc (interpret_integer): Use _BitInt(1) type for 0wb
if flag_isoc2y, rather than _BitInt(2).
gcc/c/
* c-decl.cc (finish_declspecs) <case cts_bitint>: Implement
C2Y N3747 - Integer Sets, v5. Allow signed _BitInt(1) for
flag_isoc2y.
gcc/testsuite/
* gcc.dg/torture/bitint-96.c: New test.
* gcc.dg/torture/bitint-97.c: New test.
* gcc.dg/torture/bitint-98.c: New test.
* gcc.dg/bitint-130.c: New test.
* gcc.dg/bitint-131.c: New test.
* gcc.dg/bitint-132.c: New test.
Andrew Pinski [Mon, 27 Apr 2026 18:30:25 +0000 (11:30 -0700)]
ivopts: Fix up doloop support for enum and bitint types [PR125036]
After r17-89-g78280307c78ead, niter analysis handles enum types, so ivcanon
will also use enum types, and now niter returns an enum type here.
Also, add_iv_candidate_for_doloop was expecting only integer types.
This fixes the issue by allowing non-integer types for what niter
returns and converting it into an integer type which is a full-mode
integer type.
Bootstrapped and tested on powerpc64le-linux-gnu.
PR tree-optimization/125036
gcc/ChangeLog:
* tree-ssa-loop-ivopts.cc (add_iv_candidate_for_doloop): Don't
assert on niter being an integer type. Convert to a full mode
integer type if non-integer or non-full-mode integer type.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr125036-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jeff Law [Mon, 27 Apr 2026 22:03:53 +0000 (16:03 -0600)]
[RISC-V][PR tree-optimization/57650] Detect more czero opportunities
So in pr57650 we have RTL like this:
> (set (reg:DI 147)
> (and:DI (gt:DI (reg:DI 153 [ y ])
> (reg:DI 154 [ z ]))
> (ne:DI (reg/v/f:DI 138 [ x ])
> (const_int 0 [0]))))
That's going to generate:
sgt a1,a1,a2
snez a5,a0
and a5,a5,a1
But with zicond we can do better. That's really just:
sgt a1,a1,a2
czero.eqz a1,a1,a0
We already had patterns to clean this kind of mess up a bit, but they needed a
bit more generalization. First they only accepted NE forms, but EQ is just as
valid and just requires us to select between czero.nez and czero.eqz. Second
the AND is commutative, so the equality test can appear in either position.
With those generalizations we can get the desired code. Note I'm not trying to
tackle the larger problems with 57650, just the low level code generation
inefficiencies.
This has been in my tester for a while without regressions and is being
exercised during a bootstrap on the BPI. I'll wait for pre-commit CI to render
a verdict.
PR tree-optimization/57650
gcc/
* config/riscv/zicond.md: Generalize patterns which identify
a logical AND of an equality test and some other sCC insn to
handle more cases.
gcc/testsuite/
* gcc.target/riscv/pr57650.c: New test.
Patrick Palka [Mon, 27 Apr 2026 21:56:18 +0000 (17:56 -0400)]
c++: fix decltype(id) for pointer-to-data-member access expr [PR124978]
Here after substitution into decltype(X), X is the expanded but not
constant-evaluated pointer-to-data-member access expression
*((const int *) *cw<Divide{42}>::value + (sizetype) *cw<&Divide::value>::value)
and finish_decltype_type wrongly strips the outermost INDIRECT_REF under
the assumption that it's an implicit dereference of a reference, but here
it's an explicit pointer dereference. This causes the decltype to yield
const int* instead of the expected int.
This patch fixes this particular bug by checking REFERENCE_REF_P instead
of INDIRECT_REF_P, which additionally verifies that the dereferenced thing
actually has reference type. The decltype now yields the correct type
modulo an unnecessary const due to the separate bug PR115314.
PR c++/124978
PR c++/115314
gcc/cp/ChangeLog:
* semantics.cc (finish_decltype_type): Check REFERENCE_REF_P
instead of INDIRECT_REF_P before stripping implicit dereferences.
Patrick Palka [Mon, 27 Apr 2026 21:52:03 +0000 (17:52 -0400)]
c++/modules: defer completion of streamed-in cNTTPs [PR124953]
Here we hit lazy loading recursion when streaming in the cNTTP object
wrap<Storage>{}, via get_template_parm_object -> cp_finish_decl ->
ensure_literal_type_for_constexpr_object -> complete_type, apparently
the class definition of wrap<Storage> hasn't been streamed in yet.
If we disable that literal type check for NTTP objects, we still hit
recursion, from layout_var_decl.
It seems prudent to defer calling cp_finish_decl for NTTP objects until
after lazy loading has completed like we do for expand_or_defer_fn and
cdtors. This patch arranges that, as a follow-up to some previous
NTTP object streaming fixes r15-3031 and r16-318.
PR c++/124953
gcc/cp/ChangeLog:
* module.cc (trees_in::tree_node) <tt_nttp_var>: Push the result
of get_template_parm_object to post_load_decls.
(post_load_processing): Call cp_finish_decl on any not yet
completed NTTP objects.
* pt.cc (get_template_parm_object): Don't call cp_finish_decl
when !check_init.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-nttp-3_a.H: New test.
* g++.dg/modules/tpl-nttp-3_b.C: New test.
Philipp Tomsich [Fri, 6 Mar 2026 09:55:00 +0000 (10:55 +0100)]
ext-dce: Promote narrow operations to wider mode when extended bits are dead
When an operation like (sign_extend:DI (plus:SI ...)) has dead extended
bits, promote the inner operation to the wider mode, eliminating the
extension wrapper. This enables combine to see DI-mode sequences and
form instructions like sh1add, sh2add, sh3add on RISC-V.
Only promote candidates that form chains — where one candidate's result
feeds into another's operand. Standalone (isolated) promotions are
skipped because they cause regressions on targets with free sign
extension (e.g., RISC-V W-suffix instructions): they prevent combine
from folding sext.w patterns and break combine split patterns that
depend on the sign_extend wrapper (sh1add, packw).
Chain detection tracks promotion candidates and their register
connections within each basic block, propagating through copies
created by optimized extensions.
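A hedged example of the kind of code that can benefit (assumed, not taken
from the testsuite; rv64 with Zba):
  /* The adds and shifts are only ever consumed in SImode (the store
     takes the low 32 bits), so the extended bits are dead and the chain
     can be promoted to DImode, where combine can form sh3add for the
     multiply-by-9.  */
  void f (int *d, int a, int b)
  {
    d[0] = (a + b) * 9;
  }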
gcc/ChangeLog:
* ext-dce.cc (promotion_candidate_info): New struct.
(copy_info): New struct.
(promotion_candidates, promotable_dests): New file-scope variables.
(consumed_by_candidate, promotion_copies): Likewise.
(ext_dce_try_promote_operation): New function to promote
sign/zero-extended arithmetic to wider mode.
(ext_dce_record_promotion_candidate): New function to record
promotion candidates for deferred chain analysis.
(ext_dce_promote_chained_candidates): New function to promote
only chained candidates.
(ext_dce_process_uses): Record candidates instead of promoting
immediately; propagate chain info through optimized copies.
(ext_dce_process_bb): Call ext_dce_promote_chained_candidates
after processing all insns in a block.
(ext_dce_init): Allocate chain detection bitmaps.
(ext_dce_finish): Free chain detection data structures.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/ext-dce-promote-2.c: Update to verify both
chain promotions (sh1add, sh3add) and standalone skipping.
Philipp Tomsich [Fri, 6 Mar 2026 09:54:13 +0000 (10:54 +0100)]
ext-dce: Only remove REG_EQUAL/EQUIV notes on successful optimization
In ext_dce_try_optimize_extension, REG_EQUAL/EQUIV notes were removed
unconditionally after attempting validate_change, even when the
validation failed and the insn was reverted to its original state.
This could cause subsequent passes to generate different (incorrect)
code because they lost the REG_EQUAL hint on an unchanged insn.
Guard the note removal with the 'ok' flag so notes are only stripped
when validate_change actually committed the transformation.
gcc/ChangeLog:
* ext-dce.cc (ext_dce_try_optimize_extension): Only remove
REG_EQUAL/EQUIV notes when validate_change succeeds.
Soumya AR [Mon, 27 Apr 2026 20:47:57 +0000 (20:47 +0000)]
aarch64: Update br_mispredict_factor for generic tunings
After some testing, we have found that a br_mispredict_factor of 7 is more
suitable than the default factor of 6 that was proposed in d7aebc72899.
6 can be too restrictive on certain workloads and reject cheaper csels in favour
of conditional branches.
On an Olympus core, this change improves SPEC2017 fp rate geomean by 1% while
the int rate geomean is unchanged. There are no visible regressions >1%.
Additionally, github.com/facebook/zstd retains the performance improvement this
patch introduced.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/tuning_models/generic.h: Update br_mispredict_factor
to 7.
Philipp Tomsich [Thu, 12 Mar 2026 20:58:09 +0000 (21:58 +0100)]
match.pd: Relax single_use for fold-to-zero comparisons
The single_use restriction on the X +- C1 CMP C2 -> X CMP C2 -+ C1
simplification (for eq/ne) prevents folding patterns like (++*a == 1)
into (*a == 0) when the defining SSA value has multiple uses.
Comparing against zero is cheaper on most targets (beqz on RISC-V,
cbz on AArch64), so the transform is profitable even when the
defining SSA has multiple uses. Relax single_use when the folded
comparison constant is zero.
For example, given:
  _1 = *a;
  _2 = _1 + 1;
  *a = _2;
  if (_2 == 1)
match.pd now produces:
  if (_1 == 0)
which generates beqz/cbz instead of li+beq/cmp+b.eq.
This is a partial fix towards the issue described in PR120283.
gcc/ChangeLog:
* match.pd: Relax single_use for eq/ne when folded constant
is zero.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/forwprop-pre-incr-cmp.c: New test.
Jason Merrill [Mon, 20 Apr 2026 19:46:05 +0000 (15:46 -0400)]
c++: constexpr union with no active member [PR124910]
Patrick pointed out that while r16-8767 made a union constant after
destroying its active member, we still weren't treating a union that never
had an active member as constant; the difference is the
CONSTRUCTOR_NO_CLEARING flag, and what that means to
reduced_constant_expression_p.
It seems to me that since P2686 [expr.const] says whether a prvalue
expression is a constant expression depends on the constituent values, and
[intro.object] says that only the active member is a constituent value of a
union, so a union with no active member has no constituent values and so is
vacuously constant, like an object of empty type. P2686 as a whole is not a
DR, but the draft was previously unclear, and CWG2658 also clarified that
copying a union is equivalent to copying the active member *if any*.
I was somewhat surprised that none of the existing tests needed to be
changed.
PR c++/124910
DR 2658
gcc/cp/ChangeLog:
* constexpr.cc (reduced_constant_expression_p): Allow a union
with no active member.
There is a typo in using dead_set instead of set in
clear_sparseset_regnos and regnos_in_sparseset_p. This can result in
wrong (stale) unused notes and wrong or worse code generation by
optimizations using unused notes after RA.
gcc/ChangeLog:
* lra-lives.cc (clear_sparseset_regnos, regnos_in_sparseset_p):
Use set instead of dead_set.
ira_memory_move_cost is used in many places in IRA, but I found 2 places where
the load and store costs are used instead of, correspondingly, the store and load
costs. The patch fixes this.
gcc/ChangeLog:
* ira-costs.cc (record_reg_classes): When calculating alt_cost use
the right cost of memory-reg move.
* ira-emit.cc (emit_move_list): Use load cost instead of store for
moving memory to reg.
Jeff Law [Mon, 27 Apr 2026 17:27:12 +0000 (11:27 -0600)]
[RISC-V][PR target/121268] Add splitters to improve andn generation
So if we have something like (and (not X) (not Y)) where X or Y is a simple
register and the other is possibly more complex, but implementable with a
single instruction, we want to split at the complex expression. Let's say
it's Y above. We want to generate Y's complement into a temporary with a
single instruction, then combine it with X via andn.
The most interesting cases for Y exploit the ~x = -x - 1 identity or
(x & -x) - 1 = (x - 1) & ~x.
If we take two functions from the PR:
unsigned int f1 (unsigned int x)
{
  return ~(x | -x);
}
unsigned int f3 (unsigned int x)
{
  return (x & -x) - 1;
}
Currently generates this on rv64:
f1:
        negw    a5,a0
        or      a0,a5,a0
        not     a0,a0
        ret
f3:
        negw    a5,a0
        and     a0,a5,a0
        addiw   a0,a0,-1
        ret
After this patch we generate:
f1:
        addiw   a5,a0,-1
        andn    a0,a5,a0
        ret
f3:
        addiw   a5,a0,-1
        andn    a0,a5,a0
        ret
I considered doing these in simplify-rtx. My biggest worry is over-fitting to
the way the RISC-V port expresses the "w" form instructions. So I stuck with a
target specific solution.
It's just a few 3->2 splitters. The bulk of the patch has been in my tester
for a while, but the last pattern is new after I did some experimentation on
rv32 to make sure it's generating sensible code too. The runs in my tester
have all been without regressions. Obviously I'll be waiting on the pre-commit
CI system to render a verdict.
PR target/121268
gcc/
* config/riscv/bitmanip.md: Add splitters to exploit identities
that relate subtraction and bitwise negation on 2's complement
arithmetic.
gcc/testsuite/
* gcc.target/riscv/pr121268.c: New test.
Muhammad Kamran [Mon, 27 Apr 2026 14:29:53 +0000 (15:29 +0100)]
aarch64/testsuite: add LTO coverage for branch-protection notes and attributes
Recent binutils (e.g. 2.46) switched AArch64 branch-protection emission
from .note.gnu.property to build attributes
(Tag_Feature_BTI, Tag_Feature_PAC, Tag_Feature_GCS) when GCC is
configured with such toolchains.
PR target/124365 exposed an issue where -flto with
-mbranch-protection=standard caused loss of branch-protection metadata in build
attributes. This was due to an LTO bug, now fixed upstream
(8b39ec70741b7fb9d059b6944f30a6743dea996a).
Add tests to verify both forms in LTO builds, covering:
• older binutils behaviour (.note.gnu.property), and
• newer binutils behaviour (build attributes).
This ensures branch-protection metadata is preserved across LTO for both
toolchain configurations.
PR target/124365
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/lto/lto.exp: New DejaGnu test driver for LTO tests
for aarch64. Copied from gcc/testsuite/gcc.target/arm/lto/lto.exp with
minor changes.
* gcc.target/aarch64/lto/pr124365-build-attributes-1_0.c: New test
for build attributes with branch protection.
* gcc.target/aarch64/lto/pr124365-build-attributes-1_1.c: Companion
source file for the LTO test.
* gcc.target/aarch64/lto/pr124365-build-attributes-2_0.c: New test
for build attributes without branch protection.
* gcc.target/aarch64/lto/pr124365-build-attributes-2_1.c: Companion
source file for the LTO test with branch protection enabled.
* gcc.target/aarch64/lto/pr124365-gnu-property-1_0.c: New test for
`.note.gnu.property` with branch protection.
* gcc.target/aarch64/lto/pr124365-gnu-property-1_1.c: Companion
source file for the LTO test.
* gcc.target/aarch64/lto/pr124365-gnu-property-2_0.c: New test for
`.note.gnu.property` without branch protection.
* gcc.target/aarch64/lto/pr124365-gnu-property-2_1.c: Companion
source file for the LTO test with branch protection enabled.
object-readelf in lib/lto.exp was hard-wired to use readelf -A,
limiting it to attribute checks. Extend it to accept a readelf option and
a regex, where the option selects the readelf flag and the regex is
matched against the output.
Add wrapper procedures for common use cases:
• attribute checks, and
• note checks.
Also add support for negative checks via an "is-negative" argument,
which requires that the regex is not present in the output.
gcc/ChangeLog:
* doc/sourcebuild.texi (Scan object metadata with readelf): Document
object-readelf-attributes, object-readelf-attributes-not,
object-readelf-notes, and object-readelf-notes-not as regex-based
checks with optional target/xfail selectors.
gcc/testsuite/ChangeLog:
* lib/lto.exp (object-readelf): Accept a readelf option and a single
regex; match against full readelf output. Keep positive/negative
behaviour via wrappers.
(object-readelf-attributes, object-readelf-attributes-not,
object-readelf-notes, object-readelf-notes-not): Implement as wrappers
over the generic matcher.
* gcc.dg-selftests/dg-final.exp (dg_final_directive_check_num_args):
Update for object-readelf-* wrappers to regex-style arguments (1..3).
* gcc.target/arm/lto/pr61123-enum-size_0.c: Update to use
object-readelf-attributes with a single regex.
[PATCH v3] tree-optimization: lower mempcpy to memcpy when result is unused [PR93556]
This patch allows the GIMPLE folder to transform __builtin_mempcpy into
__builtin_memcpy in cases where the return value is ignored. This is beneficial
because most targets have an efficient implementation for memcpy.
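A hedged sketch of the fold (not one of the new tests):
  /* The returned pointer is ignored, so the call can become memcpy,
     for which most targets have better expanders.  */
  void f (void *d, const void *s, unsigned long n)
  {
    __builtin_mempcpy (d, s, n);   /* folded to __builtin_memcpy (d, s, n) */
  }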
Existing tests that relied on the unfolded mempcpy have been duplicated - one
version now takes the folded mempcpy into account, and the other intentionally
prevents the folding from happening.
Bootstrapped and regression tested on x86_64-linux-gnu.
PR tree-optimization/93556
gcc/ChangeLog:
* gimple-fold.cc (gimple_fold_builtin_mempcpy): New function.
(gimple_fold_builtin): Handle BUILT_IN_MEMPCPY.
gcc/testsuite/ChangeLog:
* gcc.dg/pr79223.c: Rename to gcc.dg/pr79223-1.c and update scans.
* gcc.dg/tree-prof/val-prof-7.c: Rename to
gcc.dg/tree-prof/val-prof-7-1.c and update scans.
* gcc.dg/tree-ssa/builtins-folding-gimple-3.c: Update scans.
* gcc.dg/builtin-mempcpy-1.c: New test.
* gcc.dg/builtin-mempcpy-2.c: New test.
* gcc.dg/pr79223-2.c: New test.
* gcc.dg/tree-prof/val-prof-7-2.c: New test.
* gcc.dg/tree-ssa/builtins-folding-gimple-4.c: New test.
Jakub Jelinek [Mon, 27 Apr 2026 11:30:58 +0000 (13:30 +0200)]
libstdc++: Fix up std::is_scalar for std::meta::info [PR125024]
https://eel.is/c++draft/basic.types.general#9.sentence-1 says that
std::meta::info and its cv-qualified versions are scalar types too
(and in https://eel.is/c++draft/basic.fundamental#19.sentence-1
that they are fundamental types too).
Now, on the reflection side, eval_is_scalar_type is handled
in the compiler and uses SCALAR_TYPE_P (type) which includes
REFLECTION_TYPE_P check and eval_is_fundamental_type includes that
explicitly too.
std::is_fundamental uses
  template<typename _Tp>
    struct is_fundamental
    : public __or_<is_arithmetic<_Tp>, is_void<_Tp>,
                   is_null_pointer<_Tp>
#if __cpp_impl_reflection >= 202506L
                   , is_reflection<_Tp>
#endif
                  >::type
    { };
but for std::is_scalar we apparently forgot to include is_reflection.
The following patch fixes that.
2026-04-26 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/125024
* include/std/type_traits (std::is_scalar): For
__cpp_impl_reflection >= 202506L handle is_reflection types as
scalar.
* testsuite/20_util/is_scalar/reflection.cc: New test.
Jakub Jelinek [Mon, 27 Apr 2026 08:11:20 +0000 (10:11 +0200)]
testsuite: Fix up bitint-95.c test [PR124988]
I forgot to add the usual guards of bitint tests to bitint-95.c test
(which were done even in the 4 other tests from the same commit).
2026-04-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124988
* gcc.dg/torture/bitint-95.c: Add bitint effective targets and
guard parts of test which need _BitInt(192) support with
__BITINT_MAXWIDTH__ >= 192.
Richard Biener [Mon, 27 Apr 2026 07:00:40 +0000 (09:00 +0200)]
tree-optimization/125025 - ICE with niter analysis and UBSAN
The following avoids trying to compute the absolute step by
negating a signed step, instead, as done in one other place
already, first convert to unsigned and then negate.
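A hedged illustration of the idiom (assumed, not the actual code):
  /* Negating in the signed type overflows for INT_MIN; converting to
     unsigned first makes the negation well-defined.  */
  unsigned int abs_step (int step)
  {
    return step < 0 ? -(unsigned int) step : (unsigned int) step;
  }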
PR tree-optimization/125025
* tree-ssa-loop-niter.cc (number_of_iterations_ne): Avoid
negation of most negative signed integer.
(number_of_iterations_lt): Likewise.
Richard Biener [Sun, 26 Apr 2026 09:16:33 +0000 (11:16 +0200)]
tree-optimization/125019 - fix ICE with recurrence vectorization
This fixes an oversight with the PR124677 fix.
PR tree-optimization/125019
* tree-vect-loop.cc (vectorizable_recurr): Properly guard
against hitting last stmt when searching for the insertion
place.
While looking into PR 110252 a few years back, I noticed this missed
optimization in code from sel-sched.cc. I only realized today that
I could generalize it to handle all positive values rather than
just 1.
This adds the pattern to optimize:
  signed < 0 ? positive : min<signed, positive>
into:
  unsigned ts = signed;
  unsigned ps = positive;
  unsigned ru = min<ts, ps>;
  (signed)ru
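A hedged C rendering of the transform (names invented here):
  /* When s < 0, (unsigned) s is larger than any positive p, so the
     unsigned minimum automatically selects p and the sign test can go.  */
  int f (int s, int p)   /* p is known positive */
  {
    return s < 0 ? p : (s < p ? s : p);
    /* becomes: (int) ((unsigned) s < (unsigned) p
                       ? (unsigned) s : (unsigned) p)  */
  }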
gcc:
* doc/install.texi (Prerequisites): Use Binutils over binutils to
refer to that project.
(Downloading the source): Ditto.
(Configuration): Ditto.
(Building): Ditto.
(Specific): Ditto.
Clearly a permutation of a permutation is another permutation, so
the above expression can be simplified/canonicalized. Conveniently
there's already code in simplify_rtx to spot that a vec_select of
vec_select is an identity; this patch extends that functionality to
simplify a vec_select of a vec_select to a single vec_select.
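A hedged example using GNU vector extensions (masks chosen here for
illustration):
  typedef int v4si __attribute__ ((vector_size (16)));
  /* Composing the two constant permutations yields a single one; the
     combined mask here is { 3, 2, 1, 0 }.  */
  v4si f (v4si x)
  {
    v4si a = __builtin_shuffle (x, (v4si) { 2, 3, 0, 1 });
    return __builtin_shuffle (a, (v4si) { 1, 0, 3, 2 });
  }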
With this transformation in simplify-rtx.cc, combine now reports:
2026-04-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
<case VEC_SELECT>: Simplify a (non-identity) vec_select of a
vec_select.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-pshufd-2.c: New test case.
Roger Sayle [Sun, 26 Apr 2026 09:56:43 +0000 (10:56 +0100)]
PR tree-optimization/124715: pow(0,-1) sets errno with -fmath-errno
This patch addresses PR tree-optimization/124715, where it is unsafe for
GCC (specifically match.pd) to transform pow(x,-1) into 1.0/x when x may be
zero, since pow(0,-1) sets errno, unless -fno-math-errno (included in
-ffast-math) is specified.
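A hedged illustration of the observable difference (not one of the new
tests):
  #include <errno.h>
  #include <math.h>
  /* With -fmath-errno (the default), pow(0.0, -1.0) is a pole error and
     must set errno to ERANGE; 1.0 / 0.0 would return +Inf without
     touching errno.  */
  int f (void)
  {
    errno = 0;
    (void) pow (0.0, -1.0);
    return errno;   /* expected: ERANGE */
  }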
2026-04-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/124715
* match.pd (simplify pows): Check flag_errno_math before simplifying
pow(x,-1) -> 1/x when x could be zero.
gcc/testsuite/ChangeLog
PR tree-optimization/124715
* gcc.dg/no-math-errno-5.c: New test case.
* gcc.dg/no-math-errno-6.c: Likewise.
Roger Sayle [Sun, 26 Apr 2026 09:53:20 +0000 (10:53 +0100)]
i386: Refactor AVX512 comparisons in machine description sse.md.
This patch refactors/tidies up the define_insns for vector comparisons
on 512-bit vectors in sse.md. The motivation is that the current
organization (accidentally) introduces dubious instructions such as
avx512f_cmpv16si3_mask_round and avx512vl_cmpv2di3_mask_round, which
are integer comparisons that specify a floating point rounding mode!?
The problem is caused by the decomposition of mode iterators.
Currently, sse.md uses four patterns: (1) for signed comparions
of floating point and large integer modes (V48H), (2) for signed
comparisons of small integer modes (VI12), (3) for unsigned
comparisons of small integer modes (VI12) and (4) for unsigned
comparisons of large integer modes (VI48). The first pattern
also allows for variants specifying the FP rounding mode.
The refactoring below uses a more sensible decomposition into
only three patterns: (1) for [signed] comparisons of floating
point modes (VFH), (2) for signed comparisons of integers (VI1248)
and (3) for unsigned comparisons of integers (VI1248).
For the record, to show this produces the same coverage:
The simplification also allows a clean-up of predicates
(for operand[3]) as there are 8 integer comparison operators
and 32 floating point comparison operators, and we no longer
need cmp_imm_predicate to restrict range based upon <mode>.
There are no changes other than removing the non-sensical patterns
from insn-emit, insn-recog and friends.
2026-04-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/sse.md
(<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>):
Change mode iterator from V48H_AVX512VL to VFH_AVX512VL and op3's
predicate from <cmp_imm_predicate> to const_0_to_31_operand.
(<avx512>_cmp<mode>3<mask_scalar_merge_name>): Change mode
iterator from VI12_AVX512VL to VI1248_AVX512VLBW.
(<avx512>_ucmp<mode>3<mask_scalar_merge_name>): Likewise.
Jeff Law [Sun, 26 Apr 2026 00:12:27 +0000 (18:12 -0600)]
[RISC-V][PR rtl-optimization/56096] Improve equality comparisons of logical AND expressions
This BZ shows that we can improve certain comparisons for RISC-V. In
particular if we are testing the result of a logical AND for equality and one
operand of the AND requires synthesis, we may be able to do better if we right
shift away any trailing zeros from the constant and shift the other input as
well. This wins when the shifted constant does not require synthesis.
That may in turn allow improvement of a select of 0 and 2^n based on the
zero/nonzero status of a logical AND. Essentially we can rewrite the sequence
to remove a data dependency.
Concretely:
>
> unsigned f1 (unsigned x, unsigned m)
> {
> x >>= ((m & 0x008080) ? 8 : 0);
> return x;
> }
Compiles into:
> li a5,32768
> addi a5,a5,128
> and a1,a1,a5
> snez a1,a1
> slliw a1,a1,3
> srlw a0,a0,a1
> ret
But after this patch we generate this instead:
> srai a5,a1,7
> andi a5,a5,257
> li a4,8
> czero.eqz a1,a4,a5
> srlw a0,a0,a1
> ret
It's just one less instruction, but the li can issue whenever the uarch wants
before the srlw as it has no incoming dependency. So we're slightly more
dense on encoding and slightly more efficient as well. Much like 57650, I'm focused
on the low level RISC-V codegen issues, not the broader issues that are raised
in the PR.
This has been in my tree for a while, so it's been tested on riscv32-elf,
riscv64-elf and bootstrapped on the BPI which has support for czero. Waiting
on pre-commit CI before moving forward.
PR rtl-optimization/56096
gcc/
* config/riscv/riscv.md: Add new patterns to optimize certain cases with
a logical AND feeding an equality test against zero.
Andrew Pinski [Tue, 10 Feb 2026 17:41:48 +0000 (09:41 -0800)]
scev/niter: Use INTEGRAL_NB_TYPE_P instead of direct comparison to INTEGER_TYPE [PR124061]
I noticed this while looking into PR 124052. This is not the first time we had
a direct type comparison against INTEGER_TYPE which should have been different.
As mentioned in PR 124052, I didn't include bool types, so I needed a new macro
to simplify things.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/124061
gcc/ChangeLog:
* tree-scalar-evolution.cc (interpret_rhs_expr): Use
INTEGRAL_NB_TYPE_P instead of comparing the code to INTEGER_TYPE.
* tree-ssa-loop-niter.cc (number_of_iterations_ne): Likewise.
(number_of_iterations_cltz): Likewise.
(number_of_iterations_exit_assumptions): Likewise.
* tree.h (INTEGRAL_NB_TYPE_P): New macro.
gcc/testsuite/ChangeLog:
* g++.dg/opt/enum-loop-1.C: New test.
* gcc.dg/tree-ssa/bitint-loop-opt-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jeff Law [Sat, 25 Apr 2026 18:18:34 +0000 (12:18 -0600)]
[RISC-V][PR target/123904] Improve bit masking of shifted values
If we are masking off bits on the upper and lower part of a register on riscv,
depending on the precise mask it may be best implemented as a shift triplet.
i.e., shift left to clear upper bits, shift right to clear lower bits, shift left
again to put the bits into their proper position.
If the input value is already left shifted and the shift count corresponds to
the low mask bits, then we can get away with just two shifts. We shift left to
clear the relevant high bits, then shift right to put them into their proper
position.
This likely came from SPEC or coremark given it was reported to me by the RAU
team a while back. But the testcase didn't include enough breadcrumbs to know
for sure.
This has been repeatedly bootstrapped and regression tested on the Pioneer and
BPI as well as regularly regression tested on the riscv32-elf and riscv64-elf
embedded targets.
I'll wait for pre-commit CI to spin before pushing to the trunk.
PR target/123904
gcc/
* config/riscv/riscv.md (masking shifted value): New splitter to
optimize certain masking operations on shifted values.
gcc/testsuite/
* gcc.target/riscv/pr123904.c: New test.
Jeff Law [Sat, 25 Apr 2026 17:40:38 +0000 (11:40 -0600)]
[RISC-V][PR target/123838] Improve code generated for shifts with counts 31-N or 63-N
A shift count expressed as 31 - n ends up generating code like this:
        li      a5,31
        subw    a5,a5,a1
        sllw    a0,a0,a5
        ret
Note how we had to load the constant 31 into a register for the subtraction. But instead of
using 31 - n we can use a bit-not as it'll do precisely what we need in the
bits that the shift instruction actually uses. This results in:
        not     a1,a1
        sllw    a0,a0,a1
        ret
The core idea we're exploiting here is that the processor implements
SHIFT_COUNT_TRUNCATED semantics. So a SI shift only cares about the low 5 bits
and DI the low 6 bits of the shift count. And if we think about what bit
pattern -1 would be in those cases we get 31 and 63. We then exploit the
identity
  -x = ~x + 1    // identity
  -1 - x = ~x    // a tiny bit of algebra
So in these limited cases we can replace -1 - x with ~x.
I didn't implement this in simplify-rtx. It wasn't actually going to help
because while the RISC-V chip implements SHIFT_COUNT_TRUNCATED semantics, it
doesn't define SHIFT_COUNT_TRUNCATED for "reasons".
So there are two patterns. One for an X mode destination, naturally the shift
count is 31/63 - n for SI/DI respectively. It's a bit odd that the subtraction
is always SImode, but that's probably narrowing happening somewhere.
The second pattern covers the "w" forms for rv64.
This trick probably works for the zbs instructions as well. That's going to be
a whole lot more patterns and I haven't seen this idiom show up anywhere in
practice, so it doesn't seem like a good cost/benefit analysis.
This spun overnight on riscv32-elf and riscv64-elf and on the Pioneer without
regressions. I'll wait for pre-commit CI to do its thing before pushing.
PR target/123838
gcc/
* config/riscv/riscv.md: Use splitters to simplify shifts where
the shift count is 31-N or 63-N.
gcc/testsuite
* gcc.target/riscv/pr123838.c: New test.
Pan Li [Tue, 13 Jan 2026 02:03:46 +0000 (10:03 +0800)]
RISC-V: Combine vec_duplicate + vmsle.vv to vmsle.vx on GR2VR cost
This patch would like to combine the vec_duplicate + vmsle.vv into the
vmsle.vx, as in the example code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have asm code like below, where the GR2VR cost is 0.
After this patch:
11 beq a3,zero,.L8
...
14 .L3:
15 vsetvli a5,a3,e32,m1,ta,ma
...
20 vmsle.vx v1,a2,v3
...
23 bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/predicates.md: Add ge to the swappable
cmp operator iterator.
* config/riscv/riscv-v.cc (get_swapped_cmp_rtx_code): Take
care of the swapped rtx code as well.
Daniel Barboza [Wed, 18 Feb 2026 13:29:50 +0000 (10:29 -0300)]
match.pd: remove bit set/bit clear branch mispredict [PR64567]
Add two patterns to eliminate mispredicts in the following bit-op
scenarios (a sketch follows the list):
- checking if a single bit is not set, and in this case set it: always
set the bit;
- checking if a bitmask is set (even partially), and in this case clear
it: always clear the bitmask.
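A hedged sketch of the first scenario (constants invented here):
  /* Whether or not bit 2 is already set, the result is a | 4, so the
     branch (and its potential mispredict) goes away.  */
  unsigned f (unsigned a)
  {
    if (!(a & 4))
      a |= 4;
    return a;   /* folds to: return a | 4;  */
  }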
Bootstrapped and tested with x86_64-pc-linux-gnu.
PR tree-optimization/64567
gcc/ChangeLog:
* match.pd (`cond (bit_and A IMM) (bit_or A IMM) A`): New
pattern.
(`cond (bit_and A IMM) (bit_and A ~IMM) A`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr64567-2.c: New test.
* gcc.dg/tree-ssa/pr64567.c: New test.
tree-ssa-strlen: Use gimple_build/gimple_convert_to_ptrofftype [PR122989]
Replace convert_to_ptrofftype, force_gimple_operand_gsi,
gimple_build_assign, and gsi_insert_before with
gimple_convert_to_ptrofftype and gimple_build.
gcc/ChangeLog:
PR tree-optimization/122989
* tree-ssa-strlen.cc (get_string_length): Use
gimple_convert_to_ptrofftype and gimple_build instead of
convert_to_ptrofftype/force_gimple_operand_gsi/gimple_build_assign.
gld-2.46: warning: /tmp//cckSN7Ts.o: missing .note.GNU-stack section implies executable stack
gld-2.46: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
As shown in the PR, we can trigger an RTL checking abort when classifying thead
specific addressing modes. As far as I can tell, the code is supposed to be
extracting constant value from the multiply operation, but instead is
referencing the wrong object.
The fix is trivial. I don't think this is anywhere near serious enough to try
to get into the imminent gcc-16 release. So after pre-commit testing is done
I'll push to the trunk, then backport in a week or so after the gcc-16 release
has been made.
This has been regression tested on riscv64-elf and riscv32-elf. While it will
spin on the Pioneer overnight, which has the relevant thead extensions, they
aren't enabled by default, so I don't really expect any meaningful improvements
to coverage.
PR target/124984
gcc/
* config/riscv/thead.cc (th_memidx_classify_address_index): Extract
constant multiplicand value from the right object.
gcc/testsuite
* gcc.target/riscv/pr124984.c: New test.
Jeff Law [Fri, 24 Apr 2026 20:58:00 +0000 (14:58 -0600)]
[RISC-V][PR rtl-optimization/80770] Canonicalize extending byte loads for RISC-V
In the process of debugging pr80770 with Shreya it became apparent that a
failure to CSE certain memory references was inhibiting Shreya's RTL
simplification from firing in all the cases we cared about as the simplifier
requires two operands to be the same pseudo.
The failure to CSE stems from having two QI loads which are sign extended to
different sized destinations. As it turns out the code to fix that was
something I already had in flight as it's a small piece of eliminating a few
define_insn_and_split patterns (or simplifying them down to just a
define_split).
To expose the missed CSE what we really want to do is extend the value out to
word mode in a temporary, then use a lowpart extraction to set the real
destination. The key being we haven't changed the size of the load, just how
widely it gets extended. Think of it as canonicalization for the purposes of
CSE.
This isn't the full set of changes I had in flight in that space, but does
clean things up enough for QImode loads to get CSE'd better and is enough to
trigger Shreya's pr80770 changes consistently for the testcodes we have on
RISC-V.
This has been spinning in my tester for a while. So it's clean on riscv64-elf,
riscv32-elf as well as bootstrapped and regression tested on the Pioneer and
BPI-F3. I'll wait for the pre-commit tester to do its thing before pushing to
the trunk.
In case it's not obvious, I'm focused on trickling RISC-V target improvements
right now so as not to potentially interfere with the release process. So this
doesn't include Shreya's simplify-rtx.cc changes.
PR rtl-optimization/80770
gcc/
* config/riscv/riscv.md (zero_extendqi<SUPERQI:mode>2): Always extend
out to a word and use a subreg lowpart extraction to get the right bits.
(extend<SHORT:mode><SUPERQI:mode>2): Similarly.
Carter Rennick [Fri, 3 Apr 2026 13:07:38 +0000 (13:07 +0000)]
mips: Fix ICE on mips64-elf by removing MAX_FIXED_MODE_SIZE override [PR120144]
The definition of MAX_FIXED_MODE_SIZE did not account for MIPS supporting
TImode, which causes an internal compiler error when building libstdc++. Upon further
investigation, this definition appears to be a historical mistake.
This patch removes the MAX_FIXED_MODE_SIZE override, which fixes the error.
Eikansh Gupta [Tue, 31 Mar 2026 11:21:00 +0000 (16:51 +0530)]
tree-ssa-dce: eliminate dead relaxed atomic loads with no LHS [PR123966]
A relaxed atomic load whose result is never used has no observable
effect: the value is discarded and __ATOMIC_RELAXED provides no
inter-thread synchronisation guarantee.
Fix this by adding an early-return check for
BUILT_IN_ATOMIC_LOAD_1/2/4/8/16 calls that have no LHS and a
compile-time-constant relaxed memory order.
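A hedged sketch of a load that can now be removed (not the new test):
  #include <stdatomic.h>
  /* The value is discarded and the relaxed order provides no
     synchronisation, so the load is dead.  */
  void f (atomic_int *p)
  {
    (void) atomic_load_explicit (p, memory_order_relaxed);
  }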
PR tree-optimization/123966
gcc/ChangeLog:
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary):
Don't mark a relaxed atomic load with no LHS as necessary.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr123966.c: New test.
Milan Tripkovic [Fri, 24 Apr 2026 15:01:51 +0000 (09:01 -0600)]
[PATCH] RISC-V: Add vector cost model for Spacemit-X60
This patch implements a dedicated vector cost model for the Spacemit-X60
core. The cost values are derived from micro-benchmarking
data provided by the Camel CDR project.
Following discussions during the RISC-V Patchwork Meeting and based on
the upstream review process, this model applies a clamping
for long-latency instructions. Specifically, all long reservations
are capped at 7 cycles.
As we do not have access to the SPEC CPU benchmark suite, no testing
was performed using that suite. The implementation is based on the
cycle counts reported in the linked data source.
Data source:
https://camel-cdr.github.io/rvv-bench-results/spacemit_x60/index.html
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_adjust_cost): Enable
TARGET_ADJUST_LMUL_COST for spacemit_x60.
* config/riscv/spacemit-x60.md: Add vector pipeline model
for Spacemit-X60.
Co-authored-by: Dusan Stojkovic <Dusan.Stojkovic@rt-rk.com>
Co-authored-by: Nikola Ratkovac <Nikola.Ratkovac@rt-rk.com>
When P2165R4 updated __has_tuple_element in C++23 to reuse the __tuple_like
concept, it dropped the requirement on the validity of get, assuming that for
a tuple-like type of size N, get<I> on an lvalue is well-formed for any I < N.
This, however, does not hold for ranges::subrange (a tuple-like of size 2) with
a move-only iterator, for which get can only be applied to an rvalue. In
consequence, the constraints allowed instantiating elements_view for a range of
such subranges, but instantiating its iterator led to a hard error from the
iterator_category computation.
This patch applies the requirement on the validity of get also in C++23 and
later standard modes.
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__has_tuple_element): Check
if std::get<_Nm>(__t) returns referenceable type also for C++23
and later.
* testsuite/std/ranges/adaptors/elements.cc: Add test covering
vector of ranges::subrange with move-only iterator.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Thu, 26 Feb 2026 14:27:10 +0000 (15:27 +0100)]
Some TLC to vect_create_new_slp_node APIs
The following properly documents the overloads of vect_create_new_slp_node
and adjusts callers in tree-vect-slp-patterns.cc.
* tree-vect-slp.cc (vect_create_new_slp_node): Assert that 'code'
is either ERROR_MARK or VEC_PERM_EXPR. Document properly.
* tree-vect-slp-patterns.cc (vect_build_swap_evenodd_node):
Use lane_permutation_t.
(vect_build_combine_node): Likewise. Pass VEC_PERM_EXPR
as code.
[RISC-V][V2][PR target/123839] Improve subset of constant permutes for RISC-V
There's a set of constant permutes that are currently implemented
via vslideup+vcompress which requires a mask (and setup of the
mask), but which can be implemented via vslideup+vslidedown.
This has been tested on riscv{32,64}-elf as well as on a BPI-F3 which
is configured to use V by default.
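For instance, a permute of this shape (a hypothetical example; the new
test covers the real cases):
  typedef int v4si __attribute__ ((vector_size (16)));
  v4si
  f (v4si a, v4si b)
  {
    /* A contiguous slice spanning both inputs: vslidedown on a plus
       vslideup on b, with no mask or mask setup needed.  */
    return __builtin_shufflevector (a, b, 2, 3, 4, 5);
  }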
PR target/123839
gcc/
* config/riscv/riscv-v.cc (shuffle_slide_patterns): Use a
vslideup+vslidedown pair rather than a vcompress-based
sequence.
gcc/testsuite/
* gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c: Adjust
expected output.
* gcc.target/riscv/rvv/autovec/pr123839.c: New test.
Jakub Jelinek [Fri, 24 Apr 2026 12:50:23 +0000 (14:50 +0200)]
rs6000: Don't fold stuff for C++ during targetm.resolve_overloaded_builtin [PR124133]
The following testcase ICEs starting with the removal of NON_DEPENDENT_EXPR
in GCC 14. The problem is that while parsing templates if all the arguments
of the overloaded builtins are non-dependent types,
targetm.resolve_overloaded_builtin can be called on it. And trying to
fold_convert or fold_build2 subexpressions of such arguments can ICE,
because they can contain various FE specific trees, or standard trees
with NULL_TREE types, or e.g. type mismatches in binary tree operands etc.
All that goes away later when the trees are instantiated and
targetm.resolve_overloaded_builtin is called again, but if it ICEs while
doing that, it won't reach that point. And the reason to call that
hook in that case, when none of the arguments are type dependent, is to
figure out whether the result type is also non-dependent.
Given the general desire to fold stuff in the FE during parsing as little
as possible and fold it only during cp_fold later on and because from the
target *-c.cc files it isn't easily possible to find out if it is
processing_template_decl or not, the following patch just stops folding
anything in the arguments, calls convert instead of fold_convert and
just build2 instead of fold_build2 etc. when in C++ (and keeps doing what
it did for C).
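The new helpers are roughly of this shape (an assumed sketch of the
idea, not the exact rs6000-c.cc code):
  /* Fold when compiling C; for C++ just convert without folding, since
     template-parsing trees may not survive fold_convert.  */
  static tree
  c_fold_convert (tree type, tree expr)
  {
    if (c_dialect_cxx ())
      return convert (type, expr);
    return fold_convert (type, expr);
  }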
2026-04-24 Jakub Jelinek <jakub@redhat.com>
PR target/124133
* config/rs6000/rs6000-c.cc (c_fold_convert): New function.
(c_fold_build2_loc): Likewise.
(fully_fold_convert): Use c_fold_convert instead of fold_convert.
(altivec_build_resolved_builtin): Likewise. Use c_fold_build2_loc
instead of fold_build2.
(resolve_vec_mul, resolve_vec_adde_sube, resolve_vec_addec_subec):
Use c_fold_build2_loc instead of fold_build2_loc.
(resolve_vec_splats, resolve_vec_extract): Use c_fold_convert instead
of fold_convert.
(resolve_vec_insert): Use c_fold_build2_loc instead of fold_build2.
(altivec_resolve_overloaded_builtin): Use c_fold_convert instead
of fold_convert.
* g++.target/powerpc/pr124133-1.C: New test.
* g++.target/powerpc/pr124133-2.C: New test.
Reviewed-by: Michael Meissner <meissner@linux.ibm.com>
Jakub Jelinek [Fri, 24 Apr 2026 12:36:29 +0000 (14:36 +0200)]
bitintlower: Padding bit fixes, part 5 [PR123635]
The following patch is hopefully the last missing part of the _BitInt
bitint_extended padding bit fixes, this time for
__builtin_{add,sub,mul}_overflow. For __builtin_{add,sub}_overflow,
the extension in the padding bits of a partial limb (if any) is already
done in some cases during the handling of the limbs (and the last
hunk in gimple-lower-bitint.cc just adds it to one spot where it was
missing). The extension in the padding bits of a full limb of padding
bits (if any) and for __builtin_mul_overflow partial limb too is done
in finish_arith_overflow. If both var and obj are NULL, it is
__builtin_*_overflow_p or __builtin_*_overflow that ignores the result
of the operation and only cares about whether it overflowed or not; in
that case there is nothing to extend.
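A sketch of the affected shape (a hypothetical test; the new
bitint-9*.c tests are the real coverage):
  int
  f (_BitInt(135) a, _BitInt(135) b, _BitInt(135) *r)
  {
    /* On bitint_extended targets with 64-bit limbs, the 135-bit result
       has a partial top limb whose padding bits must be extended.  */
    return __builtin_add_overflow (a, b, r);
  }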
2026-04-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/123635
PR tree-optimization/124988
* gimple-lower-bitint.cc (bitint_large_huge::finish_arith_overflow):
Handle bitint_extend.
(bitint_large_huge::lower_addsub_overflow): Fix up comment spelling.
For bitint_extended extend the partial limb if any.
* gcc.dg/torture/bitint-91.c: New test.
* gcc.dg/torture/bitint-92.c: New test.
* gcc.dg/torture/bitint-93.c: New test.
* gcc.dg/torture/bitint-94.c: New test.
* gcc.dg/torture/bitint-95.c: New test.
Tomasz Kamiński [Fri, 24 Apr 2026 11:02:22 +0000 (13:02 +0200)]
libstdc++: Reject using views::iota on iota_view.
Resolves LWG4096, views::iota(views::iota(0)) should be rejected.
For __e of type _Tp that is a specialization of iota_view, the CTAD-based
expression iota_view(__e) is well-formed and creates a copy of __e.
As iota_view<decay_t<_Tp>> is ill-formed in this case (an iota_view is not
weakly_incrementable), naming that type explicitly as the return type
removes the overload from overload resolution in this case.
The (now redundant) __detail::__can_iota_view constraint in the template
head is preserved to provide error messages consistent with the adaptors
for other non-incrementable types.
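For example:
  #include <ranges>
  int main ()
  {
    auto v = std::views::iota (0);
    // auto w = std::views::iota (v);  // ill-formed after this change (LWG4096)
  }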
libstdc++-v3/ChangeLog:
* include/std/ranges (_Iota::operator()(_Tp&&)): Replace
auto return type and CTAD with iota_view<decay_t<_Tp>>.
* testsuite/std/ranges/iota/iota_view.cc: Test that
views::iota(iota_view) is rejected.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Fri, 24 Apr 2026 09:58:39 +0000 (11:58 +0200)]
libstdc++: Constrain views::adjacent(_transform)?<0> to forward_ranges.
This resolves LWG 4098, "views::adjacent<0> should reject non-forward
ranges", which was approved in Sofia 2025.
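A sketch of what is now rejected (using views::istream as a non-forward
input range):
  #include <ranges>
  #include <sstream>
  int main ()
  {
    std::istringstream s ("1 2 3");
    auto ints = std::views::istream<int> (s);    // an input_range only
    // auto a = ints | std::views::adjacent<0>;  // ill-formed after this change
  }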
libstdc++-v3/ChangeLog:
* include/std/ranges (_AdjacentTransform::operator())
(_Adjacent::operator()): Require forward_range for N == 0.
* testsuite/std/ranges/adaptors/adjacent/1.cc: Test that input
ranges are rejected.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc: Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Fri, 24 Apr 2026 09:13:02 +0000 (11:13 +0200)]
libstdc++: Add _GLIBCXX_RESOLVE_LIB_DEFECTS comment for LWG4083.
LWG4083, "views::as_rvalue should reject non-input ranges", is already
resolved, as input_range<_Range> is implied by __detail::__can_as_rvalue_view<_Range>.
A CONST_VECTOR load no larger than an integer register can use an integer
load. Use the inner mode as the scalar mode for a CONST_VECTOR load source.
gcc/
PR target/125009
* config/i386/i386-features.cc (ix86_place_single_vector_set):
Support CONST_VECTOR load no larger than integer register.
(ix86_broadcast_inner): Use inner mode as the scalar mode for
CONST_VECTOR load source.
(pass_x86_cse::x86_cse): Generate CONST_VECTOR broadcast source
for CONST_VECTOR load no larger than integer register.
gcc/testsuite/
PR target/125009
* g++.target/i386/pr125009.C: New test.
* gcc.target/i386/pr125009.c: Likewise.
Richard Biener [Wed, 15 Apr 2026 09:10:56 +0000 (11:10 +0200)]
tree-optimization/124843 - vectorize inversion of scalar bools
Scalar bool inversion vectorization fails due to bools having
bit precision. The following adds a pattern to rewrite the inversion
to the corresponding BIT_XOR_EXPR operation, which we can vectorize
just fine.
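A sketch of the kind of loop this enables (a hypothetical test):
  void
  f (_Bool *restrict r, const _Bool *restrict a, int n)
  {
    for (int i = 0; i < n; i++)
      /* A BIT_NOT of a 1-bit-precision bool, now rewritten to
         a[i] ^ 1 so the loop vectorizes.  */
      r[i] = !a[i];
  }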
PR tree-optimization/124843
* tree-vect-patterns.cc (vect_recog_bool_pattern): Recognize
BIT_NOT_EXPR of scalar bools and rewrite with BIT_XOR_EXPR.