
c++: capture of reference to global in template [PR123536]

Thanks to DR 696 (r253266), this works:

  int g;
  void fn ()
  {
    int &c = g;
    auto l = [] { c++; };
    l();
  }

because `c` in the lambda body is not an odr-use: we can
evaluate it to a constant, so there's no capture.  But when
fn is a template, we reject the code and crash.  This patch fixes
both.
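
For reference, a minimal sketch of the two shapes involved.  `fn` is the
example from above.  `fn_tmpl` is a hypothetical template counterpart; its
explicit [&] capture-default is an assumption made here so that the sketch
does not depend on this patch, and it is not taken from the new tests.

```cpp
int g;

// Non-template case from the message: `c` is a reference that can be
// evaluated to a constant, so naming it in the lambda body is not an
// odr-use and no capture is needed.
void fn ()
{
  int &c = g;
  auto l = [] { c++; };
  l();
}

// Hypothetical template counterpart.  The [&] capture-default is an
// assumption so that the sketch compiles without the fix.
template <typename T>
void fn_tmpl ()
{
  int &c = g;
  auto l = [&] { c++; };
  l();
}
```

Both functions increment g through the reference.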

Outside a template, the call to maybe_constant_value in mark_use
evaluates `c` to `(int&) &g` but in a template, it remains `c`.
Then we emit an error, and crash on the error_mark_node from
process_outer_var_ref.  One of the reasons is
      else if (TYPE_REF_P (TREE_TYPE (expression)))
        /* FIXME cp_finish_decl doesn't fold reference initializers.  */
        return true;
in value_dependent_expression_p but even if that changed, we still
wouldn't get the referent because decl_really_constant_value wouldn't
give it to us; the DECL_INITIAL is not a TREE_CONSTANT yet.

So I stopped trying to make this work in a template, and instead
I'm deferring the error in process_outer_var_ref to instantiation
when it's instantiation-dependent.  The VAR_P check there is not
to regress the diagnostic in pr57416.C.

The mark_use hunk is to fix a crash on invalid (lambda-const14.C).

PR c++/123536

gcc/cp/ChangeLog:

* cp-tree.h (process_outer_var_ref): Remove a parameter's name.
* expr.cc (mark_use): Return if mark_rvalue_use returns
error_mark_node.
* semantics.cc (process_outer_var_ref): Return decl when it is
instantiation-dependent.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-const12.C: New test.
* g++.dg/cpp0x/lambda/lambda-const13.C: New test.
* g++.dg/cpp0x/lambda/lambda-const14.C: New test.
* g++.dg/template/local11.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

aarch64: Add vec_packs_float pattern [PR123748]

This enables the vectorizer to vectorize conversions from long to float on
the aarch64 target.
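
A loop of roughly the following shape is the kind of conversion meant here.
This is only a sketch: the function name is hypothetical, the actual test in
pr123748.c may look different, and whether the loop is vectorized depends on
the target and the optimization flags.

```cpp
#include <cstddef>

// Hypothetical example: convert each long element to float.  On
// aarch64 Linux, long is 64 bits wide, so two V2DI inputs would be
// packed into one vector of four floats.
void long_to_float (const long *in, float *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<float> (in[i]);
}
```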

Bootstrapped and tested on aarch64_linux_gnu.

PR target/123748

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_packs_float_v2di): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr123748.c: New test.

Signed-off-by: Pengxuan Zheng <pengxuan.zheng@oss.qualcomm.com>

libstdc++: Fix incorrect move in flat_map::_M_try_emplace [PR125374]

PR libstdc++/125374

libstdc++-v3/ChangeLog:

* include/std/flat_map (_Flat_map_impl::_M_try_emplace): Forward
instead of unconditionally moving __k when inserting it.
* testsuite/23_containers/flat_map/1.cc (test10): New test.
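
The difference between moving and forwarding a key can be sketched outside
of flat_map.  All names below are hypothetical and the code is not the
libstdc++ implementation; it only shows why an unconditional std::move is
wrong when the caller passed an lvalue.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a container's key storage.
std::vector<std::string> keys;

// Problematic shape: always moves, so an lvalue argument is left in a
// moved-from state after the call.
template <typename K>
void insert_moving (K &&k)
{
  keys.emplace_back (std::move (k));
}

// Corrected shape: forwards, so lvalues are copied and only rvalues
// are moved.
template <typename K>
void insert_forwarding (K &&k)
{
  keys.emplace_back (std::forward<K> (k));
}
```

With insert_forwarding, a caller's lvalue key keeps its value after the
call.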

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

AArch64: Add scalar-to-vector costs for vec_construct

An anti-pattern found in compiled code when predicated tails were
enabled for basic block SLP vectorization was triggered by
byte-reversing patterns in source code, such as:

uint8_t *dst;
int size;
dst[0] = size >> 24;
dst[1] = size >> 16;
dst[2] = size >> 8;
dst[3] = size >> 0;

which would previously have compiled to:

rev    w1, w1
str    w1, [x0]

but (with tail-predication) was vectorized as:

mov     z31.b, w1
ptrue   p7.s, vl4
fmov    s30, w1
sshr    v29.2s, v30.2s, #8
insr    z31.s, s29
sshr    v30.2s, v30.2s, #16
insr    z31.s, s30
fmov    s30, w1
sshr    v30.2s, v30.2s, #24
insr    z31.s, s30
st1b    {z31.s}, p7, [x0]

One reason is that the SLP pass runs before the store-merging
pass gets a chance to coalesce 4 stores into 1 and substitute a
32 bit bswap implementation. Even ignoring that, costing of the
vectorized version (cost: 4) compared to the scalar version
(also 4) was not realistic:

_2 1 times vector_store costs 1 in body
node 0x32ee6d0 1 times vec_construct costs 3 in prologue

There were a couple of contributing issues:
1. the cost of mask construction for the vector_store (ptrue) was
omitted for BB SLP, whereas the loop vectorizer explicitly charges
for it.
2. the cost of vec_construct (elements / 2 + 1) did not incorporate
any GPR-to-SIMD register transfer costs (mov, fmov).

Since the supposed cost of the vectorised code only just reached parity
with the scalar code, addressing either of the above issues would be
sufficient to prevent vectorisation (in this specific case). It is also
less risky than changing the order of passes, and less hacky than
teaching the SLP pass about store-merging.

This commit addresses only the second issue, by adding code in
vector_costs::add_stmt_cost to charge scalar_to_vec_cost for each
element of an external def of kind vec_construct (with specific
exceptions noted below). This cost is added to the base cost
already charged by aarch64_builtin_vectorization_cost for a
vec_construct (which is assumed to cover the cost of the INSR or
equivalent instructions).

This is justifiable because SIMD-to-SIMD insertions into a vector
register generally have lower latency and higher throughput than
GPR-to-SIMD insertions.

The basic structure of the code was copied from commit
90d693bdc9d71841f51d68826ffa5bd685d7f0bc which modified the x86
backend in a similar way, but adapted to use a hash_set<tree>
instead of TREE_VISITED to guard against charging twice or more for
the same scalar op feeding an external def.
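
The arithmetic can be sketched as follows.  This is a simplified model, not
the code added to aarch64.cc: operands are plain integer ids, the function
names are hypothetical, and scalar_to_vec_cost is passed in because its real
value depends on the tuning in use.

```cpp
#include <unordered_set>
#include <vector>

// Base vec_construct cost quoted in the message: elements / 2 + 1.
int base_vec_construct_cost (int elements)
{
  return elements / 2 + 1;
}

// Hypothetical adjusted cost: add scalar_to_vec_cost once per distinct
// scalar operand.  The std::unordered_set plays the role that
// hash_set<tree> plays in the patch, so an operand is never charged
// twice.
int adjusted_vec_construct_cost (const std::vector<int> &operand_ids,
                                 int scalar_to_vec_cost)
{
  std::unordered_set<int> seen;
  int cost = base_vec_construct_cost (static_cast<int> (operand_ids.size ()));
  for (int id : operand_ids)
    if (seen.insert (id).second)
      cost += scalar_to_vec_cost;
  return cost;
}
```

For the four-element example above, the base cost is 3.  Assuming a
scalar_to_vec_cost of 1, four distinct operands raise it to 7, which together
with the vector_store cost of 1 exceeds the scalar cost of 4.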

This commit assumes that constructing a vector from memory
is no more costly than the equivalent set of scalar loads (or at least
that any difference is incorporated in the cost returned by
aarch64_builtin_vectorization_cost for vec_construct). It also assumes
that constructing a vector from scalar values of floating point type,
from a BIT_FIELD_REF/lastb that extracts from a vector register, or
from the result of a call to an inbuilt reduction function, does not
incur GPR-to-SIMD register transfer costs because such scalars are
typically already in FP/SIMD registers on AArch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_call_scalar_result_in_simd_reg_p):
New function to determine probabilistically whether a gcall
produces a scalar result in a SIMD/FP register.
(aarch64_scalar_op_to_vec_p): New function to determine
whether or not to add scalar_to_vec_cost per scalar operand
from which a vector is to be constructed.
(aarch64_external_adjust_stmt_cost): New function to adjust the
cost of an SLP tree node for a vec_construct that is fed by
values defined outside the vectorized region.
(aarch64_vector_costs::add_stmt_cost): Call the new
aarch64_external_adjust_stmt_cost function if we have an SLP
node and a vector type.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/vec_construct_1.c: New test.
* gcc.target/aarch64/sve/vec_construct_2.c: New test.
* gcc.target/aarch64/sve/vec_construct_3.c: New test.
* gcc.target/aarch64/sve/vec_construct_4.c: New test.
* gcc.target/aarch64/sve/vec_construct_5.c: New test.
* gcc.target/aarch64/vec-construct-1.c: New test.
* gcc.target/aarch64/vec-construct-10.c: New test.
* gcc.target/aarch64/vec-construct-11.c: New test.
* gcc.target/aarch64/vec-construct-12.c: New test.
* gcc.target/aarch64/vec-construct-2.c: New test.
* gcc.target/aarch64/vec-construct-3.c: New test.
* gcc.target/aarch64/vec-construct-4.c: New test.
* gcc.target/aarch64/vec-construct-5.c: New test.
* gcc.target/aarch64/vec-construct-6.c: New test.
* gcc.target/aarch64/vec-construct-7.c: New test.
* gcc.target/aarch64/vec-construct-8.c: New test.
* gcc.target/aarch64/vec-construct-9.c: New test.

libstdc++: Use allocate_at_least in vector, string (P0401) [PR118030]

Implement as much of allocator<>::allocate_at_least as possible
relying solely on known alignment behavior of standard operator
new.

Use allocate_at_least in string and vector to maximize usage of
actually allocated storage, as revealed by the allocator in use.
For user-supplied allocators this may make a big difference.

Nothing is changed in include/ext/malloc_allocator or others.
They can be updated at leisure, piecemeal.
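
To illustrate why this can matter for user-supplied allocators, here is a
minimal sketch.  The allocator, the result type and the rounding policy are
all hypothetical; the result type is only modelled on
std::allocation_result, and none of this is the libstdc++ code.

```cpp
#include <cstddef>
#include <new>

// Hypothetical result type, modelled on std::allocation_result.
template <typename T>
struct alloc_result
{
  T *ptr;
  std::size_t count;
};

// Hypothetical allocator that rounds every request up to a multiple of
// 16 elements and reports the real count back to the caller.
template <typename T>
struct rounding_allocator
{
  alloc_result<T> allocate_at_least (std::size_t n)
  {
    std::size_t actual = (n + 15) / 16 * 16;
    T *p = static_cast<T *> (::operator new (actual * sizeof (T)));
    return { p, actual };
  }

  void deallocate (T *p, std::size_t)
  {
    ::operator delete (p);
  }
};

// A container can record the returned count as its capacity instead of
// the n it asked for.
template <typename T>
std::size_t capacity_after_reserve (std::size_t n)
{
  rounding_allocator<T> a;
  alloc_result<T> r = a.allocate_at_least (n);
  std::size_t cap = r.count;
  a.deallocate (r.ptr, r.count);
  return cap;
}
```

A request for 5 elements then yields a usable capacity of 16 rather than 5.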

libstdc++-v3/ChangeLog:
PR libstdc++/118030
* config/abi/pre/gnu.ver: Expose string::_S_allocate_at_least,
_M_create_plus symbols.
* include/bits/alloc_traits.h:
(allocate_at_least): Delegate in allocator_traits<allocator<_Tp>>
specialization to allocator<_Tp>::allocate_at_least, unconditionally;
annotate [[__gnu__::always_inline__]].
(allocate_at_least): Declare "= delete;" in allocator<void>.
* include/bits/allocator.h (allocate_at_least): Delegate to base
allocate_at_least where defined, calling with explicit base-class
qualification, picking up __new_allocator member.
* include/bits/basic_string.h:
(_Alloc_result): Define new type.
(_S_allocate_at_least): Define, using it.
(_S_allocate): Minimize for legacy ABI use only.
(_M_create_plus): Declare.
(_M_create_and_place): Define, abstracting common operations.
(assign): Use _S_allocate_at_least.
* include/bits/basic_string.tcc:
(_M_create_plus): Define.
(_M_replace, reserve): Use _S_allocate_at_least.
(_M_construct, others (3x)): Use _M_create_and_place.
(_M_construct, input iterators): Use _M_create_plus.
(_M_create, _M_assign, reserve, _M_mutate): Same.
* include/bits/memory_resource.h (allocate_at_least): Define,
document.
* include/bits/new_allocator.h (allocate_at_least): Define.
(_S_check_allocation_limit): Define.
(allocate): Use _S_check_allocation_limit.
(_S_max_size): Change from _M_max_size.
(deallocate): Refine "if constexpr" logic.
* include/bits/stl_vector.h:
(_S_max_size): Move to _Vector_base.
(_Alloc_result): Define type.
(_M_allocate_at_least): Define, using allocate_at_least where supported.
(_M_allocate): Delegate to _M_allocate_at_least.
(max_size, _S_check_init_len): Use _S_max_size as moved.
(_M_create_storage, append_range, _M_allocate_and_copy,
_M_replace_storage): Define, abstracting common operations.
(_M_replace_with): Define, likewise.
(_M_range_initialize_n): Use _M_allocate_at_least.
(_M_check_len): Improve logic.
* include/bits/vector.tcc:
(reserve, _M_fill_append, _M_range_insert): Use _M_allocate_at_least
and _M_replace_storage.
(operator=, _M_assign_aux): Use _M_replace_with.
(_M_realloc_insert, _M_realloc_append, _M_default_append, insert_range):
Use _M_allocate_at_least.
(_M_fill_insert): Use _M_replace_storage, normalize whitespace.
* testsuite/util/testsuite_allocator.h:
(allocate_at_least (3x)): Define.
(allocate): Use allocate_at_least.
* testsuite/20_util/allocator/allocate_at_least.cc: Add tests.
* testsuite/21_strings/basic_string/capacity/char/18654.cc:
Loosen capacity check.
* testsuite/21_strings/basic_string/capacity/char/shrink_to_fit.cc:
Same.
* testsuite/21_strings/basic_string/capacity/wchar_t/18654.cc: Same.
* testsuite/21_strings/basic_string/capacity/wchar_t/2.cc: Same.
* testsuite/21_strings/basic_string/capacity/wchar_t/shrink_to_fit.cc:
Same.
* testsuite/23_containers/vector/capacity/shrink_to_fit.cc: Same.
* testsuite/23_containers/vector/capacity/shrink_to_fit2.cc: Same.
* testsuite/23_containers/vector/modifiers/emplace/self_emplace.cc:
Adapt to looser reserve behavior.

c++: another constexpr nested empty object [PR125336]

When looking into 125315 I came up with another test that crashes
due to an empty object.  It still crashes even after Jason's patch.

Here we have a subobject nested in an empty object:

  {.w = {.v = TARGET_EXPR <f(s)>}}

w's type is W, an empty union due to [[no_unique_address]], so
init_subob_ctx clears the ctor, but then we recurse to

  {.v = TARGET_EXPR <f(s)>}

with a null ctx->ctor so the call to get_or_insert_ctor_field in
cxx_eval_bare_aggregate crashes.  This fixes the crash similarly
to c++/125315.

no_unique_address18.C worked fine even before this patch because
there we don't have an empty object.  But let's test it also.

PR c++/125336

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_bare_aggregate): Don't call
get_or_insert_ctor_field when there is no CONSTRUCTOR.  Assert
is_empty_class.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/no_unique_address17.C: New test.
* g++.dg/cpp2a/no_unique_address18.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

OpenMP 5.0: Allow multiple clauses mapping same variable

This patch allows multiple clauses on the same construct to map the same
variable, which was not valid in OpenMP 4.5, but allowed in 5.0.

Internally, map clauses have to be deduplicated or merged before reaching the
topological sort in gimplify.cc, lest they might result in a cycle. This happens
in two places: first in the respective front-ends before any clause expansion,
then in the gimplifier just before grouping. The second pass is necessary due to
early clause expansion in the FE reintroducing some duplication (see
map-multi-2.f90).

To make duplicate detection and folding easier in Fortran, enum gfc_omp_map_op
is adjusted to have the two least significant bits mapped to FROM and TO, similar
to gomp_map_kind in gomp-constants.h.
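
The benefit of such a layout can be sketched as follows.  The enum below is
hypothetical and much smaller than gfc_omp_map_op, and the exact bit
assignment is an assumption; the real folding rules in the patch may cover
more cases.

```cpp
// Hypothetical map-kind encoding: bit 0 means TO, bit 1 means FROM.
enum map_kind : unsigned
{
  MAP_ALLOC = 0,
  MAP_TO = 1,
  MAP_FROM = 2,
  MAP_TOFROM = MAP_TO | MAP_FROM
};

// With that layout, folding two clauses that map the same variable is
// a bitwise OR of their kinds.
constexpr map_kind merge_kinds (map_kind a, map_kind b)
{
  return static_cast<map_kind> (static_cast<unsigned> (a)
                                | static_cast<unsigned> (b));
}
```

Under these assumptions, a TO clause and a FROM clause on the same variable
fold into a single TOFROM.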

This version of the patch only allows multiple clauses mapping the same variable
on OpenMP code; similar OpenACC code will still be rejected (for now). It also
fixes some minor issues: allow array section bounds to be null and run
target-map-multi-2 only on offload device.

gcc/c/ChangeLog:

* c-typeck.cc (c_finish_omp_clauses): Call omp_remove_duplicate_maps
before clause expansion.

gcc/cp/ChangeLog:

* semantics.cc (finish_omp_clauses): Likewise.

gcc/ChangeLog:

* fold-const.cc (operand_compare::operand_equal_p): Handle
OMP_ARRAY_SECTION.
* gimplify.cc (gimplify_scan_omp_clauses): Call
omp_remove_duplicate_maps after partial clause expansion.
* omp-general.cc (omp_remove_duplicate_maps): New function.
* omp-general.h (omp_remove_duplicate_maps): Declare.
* omp-low.cc (install_var_field): Add new 'tree key_expr = NULL_TREE'
default parameter. Set splay-tree lookup key to key_expr instead of
var if key_expr is non-NULL. Adjust call to install_parm_decl.
Update comments.
(scan_sharing_clauses): Use clause tree expression as splay-tree key
for map/to/from and OpenACC firstprivate cases when installing the
variable field into the send/receive record type.
(lower_oacc_reductions): Adjust to find map-clause of reduction
variable, then create receiver-ref.
(lower_omp_target): Adjust to lookup var field using clause expression.

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_omp_map_op): Dedicate the two LSB to TO and FROM.
* openmp.cc (resolve_omp_clauses): Adjust to allow duplicate
mapped variables for OpenMP.
* trans-openmp.cc (gfc_trans_omp_clauses): Remove duplicates before
clause expansion.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-map-multi-1.C: New test.
* testsuite/libgomp.c-c++-common/target-map-iterators-6.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-1.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-2.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-3.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-4.c: New test.
* testsuite/libgomp.fortran/target-map-multi-1.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-2.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-3.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-4.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-5.f90: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/clauses-2.c: Adjust testcase.
* c-c++-common/gomp/map-6.c: Adjust testcase.
* gfortran.dg/gomp/pr107214.f90: Adjust testcase.
* c-c++-common/gomp/map-multi-1.c: New test.
* c-c++-common/gomp/map-multi-2.c: New test.
* gfortran.dg/gomp/map-multi-1.f90: New test.
* gfortran.dg/gomp/map-multi-2.f90: New test.

Co-Authored-By: Chung-Lin Tang <cltang@baylibre.com>
Co-Authored-By: Sandra Loosemore <sloosemore@baylibre.com>

PR fortran/115260 - fix data corruption on inline packing/unpacking

This patch fixes a data corruption occurring when a non-contiguous slice of an
allocatable array component was passed to a procedure expecting a g77-style
argument. The problem was the inline packing (PR fortran/88821), which went
astray because gfc_trans_scalar_assign was told to deallocate the argument upon
return.

The solution was to not pass that argument if passing a g77-style array,
in effect a one-liner.

This is a regression which goes back to all supported releases.

gcc/fortran/ChangeLog:

PR fortran/115260
* trans-expr.cc (gfc_conv_subref_array_arg): Pass false to
dealloc argument of gfc_trans_scalar_assign if we are
converting a g77-style argument.

gcc/testsuite/ChangeLog:

PR fortran/115260
* gfortran.dg/pr115260.f90: New test.

Fix up some typos [PR125348]

The PR mentions some typos. I've removed those which I saw also in
Dhruv's patchset, here is the rest.

2026-05-19 Jakub Jelinek <jakub@redhat.com>

PR other/125348
gcc/
* config/i386/i386-expand.cc (ix86_expand_builtin): Fix diagnostic
typo, forth -> fourth.
gcc/ada/
* libgnat/s-regpat.ads: Fix comment spelling, paramter -> parameter.
gcc/m2/
* gm2-compiler/M2GenGCC.mod (PerformLastForIterator): Fix diagnostic
typo, intemediate -> intermediate.
gcc/testsuite/
* gcc.target/i386/pr117416-2.c (prefetch_test): Adjust expected
diagnostic spelling.
* gdc.test/compilable/dtoh_TemplateDeclaration.d: Fix comment
spelling, paramter -> parameter.

Reviewed-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

AVR: Add bitreverseqi2 insns.

Now that https://gcc.gnu.org/r17-591 has been applied, the
middle-end will express 8-bit bitreverse code in terms of
a 16-bit bitreverse. Therefore, add bitreverseqi2 insns.

PR target/50481
gcc/
* config/avr/avr.md (bitreverseqi2): New insn-and-split.
(*bitreverseqi2): New insn.

i386: Optimize ptestz(x,-1) as ptestz(x,x) on x86

This patch, inspired by PR target/90483 and libstdc++/118416, implements
some RTL expansion-time simplifications of ptest. A common idiom for
testing a vector against zero is to use ptestz(mask,-1).  Alas the code
generated for this is suboptimal, requiring materialization of an all_ones
vector.  Given that ptestz(x,y) is defined as (x & y) == 0, an equivalent
form is ptestz(mask,mask), saving an instruction (if ~0 isn't available).
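
The equivalence can be checked with a scalar model.  This sketch works on a
single 64-bit value rather than a vector and uses hypothetical function
names; it only mirrors the (x & y) == 0 definition given above.

```cpp
#include <cstdint>

// Scalar model of ptestz on one 64-bit lane: true when (x & y) == 0.
constexpr bool ptestz_model (std::uint64_t x, std::uint64_t y)
{
  return (x & y) == 0;
}

// The two forms agree for every x because x & ~0 and x & x both
// equal x.
constexpr bool forms_agree (std::uint64_t x)
{
  return ptestz_model (x, ~std::uint64_t{0}) == ptestz_model (x, x);
}
```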

Consider the function:

typedef long long v2di __attribute__ ((__vector_size__ (16)));

int foo (v2di x)
{
  return __builtin_ia32_ptestz128(x,~(v2di){0,0});
}

with -O2 -mavx2, GCC currently generates:

foo: vpcmpeqd        %xmm1, %xmm1, %xmm1
        xorl    %eax, %eax
        vptest  %xmm1, %xmm0
        sete    %al
        ret

with this patch, it now generates:

foo: xorl    %eax, %eax
        vptest  %xmm0, %xmm0
        sete    %al
        ret

2026-05-19  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/90483
PR libstdc++/118416
* config/i386/i386-expand.cc (ix86_expand_sse_ptest):  Refactor
with optimizations for PTESTZ*, PTESTC* and PTESTNZC*, including
transforming ptestz(x,-1) into ptestz(x,x).

gcc/testsuite/ChangeLog
PR target/90483
PR libstdc++/118416
* gcc.target/i386/sse4_1-ptest-8.c: New test case.
* gcc.target/i386/sse4_1-ptest-9.c: Likewise.

arm: Fix MVE load/store with writeback intrinsics [PR124870]

These intrinsics (vldr*_gather_base_wb, vstr*_scatter_base_wb) lacked
modelling of memory accesses corresponding to writeback: in this case,
they both read and write memory.

2024-04-24 Christophe Lyon <christophe.lyon@arm.com>

PR target/124870
gcc/
* config/arm/arm-mve-builtins-base.cc (vstrq_scatter_base_impl)
(vldrq_gather_base_impl): Fix call_properties.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/pr124870.c: New test.

libstdc++: Make chrono::parse fail for bad %z [PR125369]

The chrono parsing code failed to check for errors when parsing input to
match %z. The expected input is [+-]hh[mm] but if we read less than two
valid digits for the hh or mm parts we didn't set failbit in the stream,
and used the -1 error values returned for each bad digit in the offset
value. This resulted in a "successful" parse that produced a value like
-11h or -11min for the time zone offset.
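
The failure mode is easy to reproduce in isolation: combining two bad digits
that each yield -1 gives 10 * -1 + -1, which is -11.  The following is a
hypothetical stand-alone parser, not the chrono_io.h code, that shows the
kind of check that was missing.  It works on a whole string instead of a
stream and returns the offset in minutes.

```cpp
#include <cctype>
#include <optional>
#include <string_view>

// Returns the digit value of c, or -1 if c is not a decimal digit.
int digit_value (char c)
{
  return std::isdigit (static_cast<unsigned char> (c)) ? c - '0' : -1;
}

// Hypothetical parser for [+-]hh[mm].  Returns the offset in minutes,
// or std::nullopt when either field has fewer than two valid digits.
std::optional<int> parse_offset (std::string_view s)
{
  if (s.size () < 3 || (s[0] != '+' && s[0] != '-'))
    return std::nullopt;
  int sign = s[0] == '-' ? -1 : 1;

  int h1 = digit_value (s[1]), h2 = digit_value (s[2]);
  if (h1 < 0 || h2 < 0)   // reject instead of using the -1 values
    return std::nullopt;
  int minutes = (10 * h1 + h2) * 60;

  if (s.size () > 3)
    {
      if (s.size () != 5)
        return std::nullopt;
      int m1 = digit_value (s[3]), m2 = digit_value (s[4]);
      if (m1 < 0 || m2 < 0)
        return std::nullopt;
      minutes += 10 * m1 + m2;
    }
  return sign * minutes;
}
```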

libstdc++-v3/ChangeLog:

PR libstdc++/125369
* include/bits/chrono_io.h (__detail::_Parser::operator()):
Check for errors when parsing digits for a %z format.
* testsuite/std/time/parse/125369.cc: New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

tree: Move unshare_expr from gimplifier to generic tree

We now use unshare_expr in many places, even outside of gimple.
So it makes sense to move the decl to tree.h.
A few sources no longer need to include gimplify.h as a result;
I have not checked all of them, just a few where including
gimplify.h didn't seem to make sense.

This also moves the implementations of unshare_expr,
unshare_expr_without_location and copy_if_shared from gimplify.cc
to tree.cc to keep the headers "clean".

Bootstrapped and tested on x86_64-linux-gnu.

Changes since v1:
* v2: Move implementation too.

gcc/ChangeLog:

* cfgrtl.cc: Don't include gimplify.h or gimplify-me.h.
* cgraphbuild.cc: Likewise.
* emit-rtl.cc: Likewise.
* tree-ssa-dom.cc: Likewise.
* tree-ssa-dse.cc: Likewise.
* tree-ssa-loop-im.cc: Likewise.
* tree-ssa-loop-niter.cc: Likewise.
* tree-ssa-loop-unswitch.cc: Likewise.
* tree-ssa-math-opts.cc: Likewise.
* tree-ssa-phiopt.cc: Likewise.
* tree-ssa-phiprop.cc: Likewise.
* tree-ssa-pre.cc: Likewise.
* tree-ssa-propagate.cc: Likewise.
* tree-ssa-sccvn.cc: Likewise.
* gimplify.h (unshare_expr): Remove.
(unshare_expr_without_location): Remove.
(copy_if_shared): Remove.
* tree.h (unshare_expr): New decl.
(unshare_expr_without_location): Likewise.
(copy_if_shared): Likewise.
* gimplify.cc (mostly_copy_tree_r): Moved to tree.cc.
(copy_if_shared_r): Likewise.
(copy_if_shared): Likewise.
(unshare_expr): Likewise.
(prune_expr_location): Likewise.
(unshare_expr_without_location): Likewise.
* tree.cc (mostly_copy_tree_r): Moved from gimplify.cc.
(copy_if_shared_r): Likewise.
(copy_if_shared): Likewise.
(unshare_expr): Likewise.
(prune_expr_location): Likewise.
(unshare_expr_without_location): Likewise.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

i386: Use vpermilps for some non-const permutations [PR125357]

We don't use vpermilps insn for V4S[IF]mode variable permutations on
TARGET_AVX without TARGET_AVX512*.  For TARGET_AVX512* there are plenty
of permutation instructions already.  For TARGET_AVX2, the function has
special cases for one_operand_shuffle for V8SImode/V8SFmode and emits
reasonable code, but for V4SImode/V4SFmode with TARGET_AVX2 it handles
those using V8SImode/V8SFmode as two operand shuffle, which requires
2 preparation instructions, vpermd and one finalization instruction.
And for !TARGET_AVX2 && TARGET_AVX we just emit terrible code for these.

So, the following patch uses vpermilps for V4S[IF]mode one_operand_shuffle.

Trying to handle V8S[IF]mode is not worth it, for TARGET_AVX2 we already
emit good code (see above) and for !TARGET_AVX2 && TARGET_AVX V8SImode
mask is not valid vector mode, so we emit terrible code no matter what.

2026-05-19  Jakub Jelinek  <jakub@redhat.com>

PR target/125357
* config/i386/i386-expand.cc (ix86_expand_vec_perm): For
one_operand_shuffle if TARGET_AVX and not TARGET_AVX512F use
vpermilps for V4SImode/V4SFmode.  Formatting fix.

* gcc.target/i386/avx-pr125357.c: New test.
* gcc.target/i386/avx2-pr125357.c: New test.

Reviewed-by: Hongtao Liu <hongtao.liu@intel.com>

AVR: Add insns and libgcc functions for __builtin_bitreverse16/32.

gcc/
* config/avr/avr.md (bitreversehi2, bitreversesi2): New insn_and_split.
(*bitreversehi2.libgcc, *bitreversesi2.libgcc): New insns.

libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _bitreverse8, _bitreverse16,
_bitreverse24, _bitreverse32.
* config/avr/lib1funcs.S (__bitreverse8, __bitreverse16)
(__bitreverse24, __bitreverse32): New functions.

optabs: Handle bitreverse using widening or two bitreverses of halves [PR50481]

The following patch extends the widen_bswap and expand_doubleword_bswap
functions to handle also bitreverse, so that all the backends with
say just bitreversesi2 or bitreverse{s,d}i2 can handle also
bitreverse{q,h}i2 and bitreverse{d,t}i2 easily.
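
In source-level terms the two strategies look roughly like this.  The sketch
is not the optabs.cc code: bitreverse32 is a portable stand-in for a
target's bitreversesi2, and the other two function names are hypothetical.

```cpp
#include <cstdint>

// Portable 32-bit bit reversal, standing in for bitreversesi2.
constexpr std::uint32_t bitreverse32 (std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

// Widening: reverse in the wider mode, then shift the result down by
// the difference in width.
constexpr std::uint8_t bitreverse8 (std::uint8_t x)
{
  return static_cast<std::uint8_t> (bitreverse32 (x) >> 24);
}

// Doubleword: reverse each half and swap the halves.
constexpr std::uint64_t bitreverse64 (std::uint64_t x)
{
  std::uint64_t lo = bitreverse32 (static_cast<std::uint32_t> (x));
  std::uint64_t hi = bitreverse32 (static_cast<std::uint32_t> (x >> 32));
  return (lo << 32) | hi;
}
```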

2026-05-19  Jakub Jelinek  <jakub@redhat.com>

PR target/50481
* optabs.cc (widen_bswap): Add UNOPTAB argument and use it instead
of hardcoded bswap_optab.  Rename to ...
(widen_bswap_or_bitreverse): ... this.
(expand_doubleword_bswap): Add UNOPTAB argument and use it instead
of hardcoded bswap_optab.  Rename to ...
(expand_doubleword_bswap_or_bitreverse): ... this.
(expand_bitreverse): Use widen_bswap_or_bitreverse and
expand_doubleword_bswap_or_bitreverse.
(expand_unop): Adjust widen_bswap and expand_doubleword_bswap callers
to use new names and add an extra bswap_optab argument.

Reviewed-by: Jeffrey Law <jeffrey.law@oss.qualcomm.com>

x86: Don't inline cold memmove call

Replace optimize_function_for_size_p with optimize_insn_for_size_p in
ix86_expand_movmem to avoid inlining cold memmove call.

gcc/

PR target/125355
* config/i386/i386-expand.cc (ix86_expand_movmem): Replace
optimize_function_for_size_p with optimize_insn_for_size_p.

gcc/testsuite/

PR target/125355
* gcc.target/i386/pr125355-2.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

mingw: Check seh_endproc for function end

For mingw targets, one way to identify the end of a function is to check
the .seh_endproc directive.  Extend the lib/scanasm.exp change in

commit 6c9585ce44faeba5e0b6859871712f4895537d29
Author: Saurabh Jha <saurabh.jha@arm.com>
Date:   Thu Oct 9 14:04:45 2025 +0000

    aarch64: mingw: emit seh_endproc as comment

to cover all mingw targets.

* lib/scanasm.exp (configure_check-function-bodies): Check
"*-*-mingw32" instead of "aarch64*-*-mingw32".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

RISC-V: Guard 64-bit vec_extract.

Currently, reduc-6.c fails on the trunk when compiling for 32 bit.
We emit a pred_extract_first of a V2DImode during legitimization of
a move.  Normally, we would split that insn into two 32-bit extracts
but this splitter needs to be able to create pseudos which it can't
after reload.  The insn here is created during reload when we can still
create pseudos.  This patch just piggybacks on the existing handling
when no 64-bit vector elements are available (!TARGET_VECTOR_ELEN64).
Thus, we don't emit 64-bit extracts and don't need to rely on splitting
late.

PR target/125097

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Emit 32-bit
vec_extracts right away.

RISC-V: Remove cbranch_all patterns.

When first introducing the cbranch_any/_all patterns I messed up the all
pattern. After giving it more thought, I removed the patterns entirely.
Our current early-break handling via autovec-opt.md is not ideal but
similar to what the patterns would give us, so no need for confusion.
The situation will improve anyway once the no-scalar-epilogue early-break
patches land.

While at it, I tried unifying the int and float comparison emitter
functions. The latter now also has "mask" capabilities.

gcc/ChangeLog:

* config/riscv/autovec.md (<cbranch_optab><mode>): Remove.
* config/riscv/riscv-protos.h (expand_vec_cmp_float): Add mask
and else arguments.
* config/riscv/riscv-v.cc (expand_vec_cmp): Add mask and else
arguments.
(expand_vec_cmp_float): Ditto.
* config/riscv/vector-iterators.md: Remove iterators.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/early-break-5.c: Remove redundant
comments.

rs6000: Adding missed ISA 3.0 atomic memory operation instructions

Changes to amo.h include the addition of the following load atomic
operations: Compare and Swap Not Equal, Fetch and Increment Bounded,
Fetch and Increment Equal, and Fetch and Decrement Bounded.
Additionally, Store Twin is added for store atomic operations.

2026-05-19 Peter Bergner <bergner@linux.ibm.com>
Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
* config/rs6000/amo.h: Add missing atomic memory operations.
* doc/extend.texi (PowerPC Atomic Memory Operation Functions): Document
new functions.

gcc/testsuite/
* gcc.target/powerpc/amo3.c: New test.
* gcc.target/powerpc/amo4.c: Likewise.
* gcc.target/powerpc/amo5.c: Likewise.
* gcc.target/powerpc/amo6.c: Likewise.
* gcc.target/powerpc/amo7.c: Likewise.

Fix masm ptwrite again

The earlier 64bit ptwrite as 32bit fix broke Intel syntax output. Handle
that too by using an alternative. In Intel syntax the instruction
data type is defined by the operands.

I'll commit it as obvious in a day or so for 15/16/trunk, unless there
are objections.

PR target/125351

gcc/ChangeLog:

* config/i386/i386.md: Use alternative to handle masm ptwrite
syntax.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr125351.c: New test.

Daily bump.

testsuite: Fix pr112095.c for veclowering

Basically we need to test earlier, in release_ssa instead of
optimization, because release_ssa runs before vec lowering happens.
Also, "return a_" needs to be expanded to also match "<retval> = a_"
for vector types that are returned via memory.
Also add -Wno-psabi to avoid a warning/note about the vector
argument.

Pushed as obvious after testing on x86_64-linux-gnu with both
-m64 and -m32/-mno-sse to invoke the cases that matter here.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr112095.c: Add -Wno-psabi to the options.
Look at release_ssa instead of optimization. Match
"<retval> = a_" in addition to "return a_".

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

libstdc++: Move std::bitset test to correct directory

I added this test in r16-3435-gbbc0e70b610f19 but I'd previously moved
the rest of the bitset tests under 20_util, in r13-2778-g4b4b51445f7f3d.
This moves the lwg4294.cc test where it belongs.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/bitset/lwg4294.cc: Move to...
* testsuite/20_util/bitset/cons/lwg4294.cc: ...here.

[PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)

When split_loop does iteration space splitting, split_at_bb_p may
swap the guard condition so that operand 0 is always the loop IV
and operand 1 is the invariant.  For example, "t < i" (LT_EXPR)
becomes "i > t" (GT_EXPR).  This can cause initial_true to be
false, meaning loop1 handles iterations where the guard is false
and loop2 handles iterations where the guard is true.

The profile update code used true_edge->probability for loop1 and
its inverse for loop2.  That is correct only when loop1 keeps the
true branch.  When initial_true is false, loop1 keeps the false
branch and loop2 keeps the true branch, so all profile quantities
must follow those semantic edges instead of the raw true/false edge
names.

Derive loop1_edge and loop2_edge once from initial_true and use them
consistently for loop_version's then probability, the split-loop BB
count scaling, and the iteration estimate scaling.  Also make
fix_loop_bb_probability take loop1/loop2 edges explicitly, rather
than assuming its arguments are true/false edges.

The bug caused BB counts in the split loops to be swapped when
initial_true is false: the loop body whose guard is forced false
(loop1, executing fewer iterations) would get the higher profile
count, and vice versa.  It could also leave the precondition edge
probabilities and iteration estimates based on the wrong split edge.
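
The core of the fix can be modelled with plain probabilities.  This is a
simplified sketch with hypothetical types and names; the real code in
tree-ssa-loop-split.cc works on CFG edges and profile_probability values.

```cpp
// Hypothetical model of the guard's two outgoing edges.
struct edge_probs
{
  double true_prob;
  double false_prob;
};

// Probabilities assigned to the two loops produced by the split.
struct split_probs
{
  double loop1;
  double loop2;
};

// loop1 keeps the true branch only when initial_true holds; otherwise
// the roles of the two edges are swapped.
split_probs derive_split_probs (edge_probs guard, bool initial_true)
{
  if (initial_true)
    return { guard.true_prob, guard.false_prob };
  return { guard.false_prob, guard.true_prob };
}
```

Using true_prob for loop1 regardless of initial_true corresponds to the
swapped counts described above.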

gcc/ChangeLog:

* tree-ssa-loop-split.cc (fix_loop_bb_probability): Rename
parameters from true_edge/false_edge to loop1_edge/loop2_edge
and scale both loops directly from their semantic edge
probabilities.
(split_loop): Derive loop1_edge and loop2_edge from initial_true.
Use them for loop1_prob, fix_loop_bb_probability, and iteration
estimate scaling.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/loop-split-4.c: New test.

c++: remove dead code

We've had this comment to remove the code since r13-963, so it seems
it's been long enough now to go ahead and remove it.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_bare_aggregate): Remove dead code.

Reviewed-by: Jason Merrill <jason@redhat.com>

[PATCH] RISC-V: Add xt-c9501fdvt CPU support

Add the XuanTie C950 (xt-c9501fdvt) as a known RISC-V CPU. The C950
is based on the rva23s64 profile with additional extensions.

gcc/ChangeLog:

* config/riscv/riscv-cores.def: Add xt-c9501fdvt tune and core.
* doc/riscv-mcpu.texi: Regenerated.
* doc/riscv-mtune.texi: Regenerated.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-xt-c9501fdvt.c: New test.

Signed-off-by: Wang Yaduo <wangyaduo@linux.alibaba.com>

[RISC-V] Improve SI->DI zero/sign extension patterns for RISC-V

--  From the original submission in Oct 2025 --

So this is a slightly scaled back variant of a patch I've been working
on.  I'd originally planned to handle both zero and sign extensions, but
there's some fallout with the sign extension adjustments that I'm going
to need more time to tackle.  This piece stands on its own and unlocks a
subsequent patch to improve codegen.  No sense in having it possibly
miss the merge window.

This patch adjusts the core zero-extension patterns as well as one
closely related combiner pattern.

For the named expanders, we now generate shift pairs if the Zba/Zbb
extensions are not available and the source operand is a REG.  Things
are kept as-is for MEMs.
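
As an illustration of what a shift pair computes, here is a minimal C sketch.  The function name zext_shift_pair is hypothetical; it models the arithmetic of a left shift followed by a logical right shift on a 64-bit register, not the RTL the expanders emit.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low WIDTH bits of a 64-bit register value with a
   left shift followed by a logical right shift.  WIDTH is 32 for an
   SI source and 16 for an HI source.  */
uint64_t
zext_shift_pair (uint64_t reg, unsigned width)
{
  assert (width >= 1 && width <= 64);
  unsigned shift = 64 - width;

  /* The left shift discards the bits above WIDTH; the logical right
     shift moves the value back and fills the top with zeros.  */
  return (reg << shift) >> shift;
}
```

For example, zext_shift_pair (0xFFFFFFFF80000001, 32) yields 0x80000001.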

The existing define_insn_and_split is turned into a define_insn that
only handles MEM sources.  Those instructions are always available, so
no need to mess with shift pairs.  This avoids regressions with a
follow-up patch which enhances a closely related combiner pattern.

That closely related combiner pattern is a define_insn_and_split which
can now turn into a simpler define_split.  So that's adjusted as well.

The net is we drop 3 define_insn_and_splits and occasionally get better
code as a result.  It also makes it possible to improve some additional
cases which I'll handle as a followup.

The test changes are minimal and mostly related to making sure we have
the right Zb* things enabled based on what the test relies on under the
hood.  It's not even clear that part of the change is strictly necessary
anymore.  I see it more as test hygiene than anything.

This has been bootstrapped and regression tested on the Pioneer which is
a good test since it doesn't have any of the Zb* extensions and thus
relies heavily on the shift-pair approach to zero extensions.
riscv32-elf and riscv64-elf have also been regression tested.  The BPI
hasn't started chewing on this patch yet.

--

Subsequent changes were to the testsuite to ensure that --with-cpu or
--with-tune configure-time options wouldn't impact the test results.

gcc/

* config/riscv/riscv.cc (riscv_rtx_costs): Properly cost pack insns
for Zbkb.
* config/riscv/riscv.md (zero_extendsidi2): Expand into shift pairs
when the appropriate instructions are not available.
(zero_extendhi<GPR:mode>2): Similarly.
(*zero_extendsidi2_internal): Make a simple define_insn.  Only handle
MEM sources.
(*zero_extendhi<GPR2:mode>2): Similarly.
(zero_extendsidi2_shifted): Turn into a define_split and generalize
to handle more constants.
* config/riscv/predicates.md (dimode_shift_operand): New predicate.

gcc/testsuite/

* gcc.target/riscv/slt-1.c: Skip for -Oz as well.  Set explicit branch
cost.
* gcc.target/riscv/zba-shNadd-04.c: Add Zbb to command line switches.
* gcc.target/riscv/zba-slliuw.c: Add Zbs to command line switches.
* gcc.target/riscv/zbs-zext.c: Add Zbs to command line switches.
* gcc.target/riscv/shift-shift-6.c: New test.
* gcc.target/riscv/shift-shift-7.c: New test.
* gcc.target/riscv/amo/a-rvwmo-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/amo/a-ztso-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/amo/zalasr-rvwmo-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/amo/zalasr-ztso-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/pr105314.c: Set explicit branch cost.
* gcc.target/riscv/pr105314-rtl.c: Set explicit branch cost.

[RISC-V] Improve ext-dce's live bit tracking for IOR/AND with a constant argument

Investigation of a regression with some RISC-V target changes exposed a clear
missed optimization in ext-dce.cc.

In particular if we mask off bits via a logical AND, then the masked off bits
are not live-in for the other input.  Tracking that can in turn allow us to
eliminate more extensions.  There's a similar case for logical IOR when it
unconditionally turns bits on.

So if we look at this testcase:

typedef long unsigned int size_t;
struct function
{
  unsigned int curr_properties;
  unsigned int last_verified;
};
extern struct function *cfun;

void
execute_function_todo (void *data)
{
  unsigned int flags = (size_t) data;
  if (flags & (1 << 5))
    flags |= (1 << 15);
  (cfun + 0)->last_verified = flags & ((1 << 2) | (1 << 3) | (1 << 4));
}

It currently generates this code for rv64gcbv:

        andi    a5,a0,32
        sext.w  a0,a0
        beq     a5,zero,.L2
        bseti   a0,a0,15
.L2:
        lui     a5,%hi(cfun)
        ld      a5,%lo(cfun)(a5)
        andi    a0,a0,28
        sw      a0,4(a5)
        ret

Note carefully the 2nd andi instruction.  That's unconditionally turning off
bits 32..63 (and others).  Thus those bits are not relevant/live for the
incoming value in a0.  Walking backwards we find the sext.w which sign extends
from bit 31 into bits 32..63. But with bits 32..63 not being live, the sext.w
is useless.

After this patch we get:

        andi    a5,a0,32
        beq     a5,zero,.L2
        bseti   a0,a0,15
.L2:
        lui     a5,%hi(cfun)
        ld      a5,%lo(cfun)(a5)
        andi    a0,a0,28
        sw      a0,4(a5)
        ret

It doesn't trigger often based on my quick testing.
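
The reasoning about live bits can be modelled with two small transfer functions.  This is a sketch of the idea only, under the assumption that liveness is tracked as a 64-bit mask; the names live_in_and, live_in_ior and upper_half_live are hypothetical and are not taken from ext-dce.cc.

```c
#include <stdint.h>

/* Live bits of X flowing into `X & mask`: a bit cleared in MASK is
   forced to zero in the result, so the matching bit of X is dead.  */
uint64_t
live_in_and (uint64_t live_out, uint64_t mask)
{
  return live_out & mask;
}

/* Live bits of X flowing into `X | bits`: a bit set in BITS is forced
   to one in the result, so the matching bit of X is dead.  */
uint64_t
live_in_ior (uint64_t live_out, uint64_t bits)
{
  return live_out & ~bits;
}

/* Nonzero if any of bits 32..63 is live, i.e. if a sign extension
   from bit 31 could still affect an observable result.  */
int
upper_half_live (uint64_t live)
{
  return (live & 0xFFFFFFFF00000000ULL) != 0;
}
```

Walking the example backwards: the sw needs bits 0..31, the andi with 28 narrows that to bits 2..4, and the bseti of bit 15 leaves those unchanged, so no bit in 32..63 is live at the sext.w.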

Bootstrapped and regression tested on various targets including x86 and riscv.
Also tested on the usual assortment of embedded targets in my tester.  I'll
wait for pre-commit CI to give a final verdict.

gcc/
* ext-dce.cc (carry_backpropagate): Handle AND and IOR with a constant argument.

gcc/testsuite/

* gcc.target/riscv/ext-dce-2.c: New test.

analyzer: rename --param=analyzer-{bb->supernode}-explosion-factor=

This param name is confusing because its meaning changed in
r16-6063-g0b786d961d4426.  Update the name to better express what it now
controls, and clarify the wording of the warning that fires when the
analyzer hits that limit, giving more info.

gcc/analyzer/ChangeLog:
* analyzer.opt (-param=analyzer-bb-explosion-factor=): Rename to...
(-param=analyzer-supernode-explosion-factor=): ...this.
* engine.cc (exploded_graph::process_worklist): Update for change
to param name.  Clarify the -Wanalyzer-too-complex message when
hitting the overall limit on enodes by also showing the number
of snodes.

gcc/ChangeLog:
* doc/analyzer.texi: Update for change in param name.
* doc/params.texi: Likewise.  Clarify wording.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/fibonacci.c: Update for change in message
wording.
* c-c++-common/analyzer/raw-data-cst-pr117262-1.c: Update for
change in param name.
* gcc.dg/analyzer/explode-2a.c: Likewise.
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: Likewise.
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Update gcc sv.po

* sv.po: Update.

analyzer: drop unused field exploded_node::m_num_processed_stmts

I believe this became redundant in r16-6063-g0b786d961d4426.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::status_to_str): Drop unused field
m_num_processed_stmts.
(exploded_node::dump): Likewise.
(exploded_node::to_json): Likewise.
* exploded-graph.h (exploded_node::m_num_processed_stmts):
Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Modernize class optrecord_json_writer

No functional change intended.

gcc/ChangeLog:
* dump-context.h (dump_context::emit_optinfo): "info" is non-null,
so pass it by reference.
* dumpfile.cc (dump_context::end_any_optinfo): Update for above
change.
(dump_context::emit_optinfo): Likewise.
* optinfo-emit-json.cc: Update throughout to eliminate naked "new"
and "delete" in favor of std::make_unique and unique_ptr. Drop
redundant dtor. Use nullptr rather than NULL. Pass by
const-reference rather than by const-pointer in the places that
require non-null.
* optinfo-emit-json.h: Likewise.
* optinfo.cc (optinfo::emit_for_opt_problem): Update for above
changes.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

jit: clarify docs for gcc_jit_context_add_{command_line,driver}_option

gcc/jit/ChangeLog:
* docs/topics/contexts.rst
(gcc_jit_context_add_command_line_option): Clarify that adding
multiple options requires multiple calls.
(gcc_jit_context_add_driver_option): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Reduce iterations in PSTL test for Debug Mode

This test often times out, especially on machines with a large number of
cores when the tests are run with a lot of parallel jobs. I suspect that
TBB creates a lot of threads due to std::hardware_concurrency() being a
large number, but because most cores are already busy running other
tests (due to `make -jN check` with large N) the system gets
oversubscribed. In Debug Mode, the testcase runs much slower, and often
times out.

It's probably fine to just test with fewer iterations when Debug Mode is
active.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Reduce iterations for debug mode.

libstdc++: optimize std::uninitialized_move{,_n}() to memcpy when possible [PR121789]

std::uninitialized_move{,_n} delegates to the corresponding
std::uninitialized_copy() variant after wrapping with a move
iterator, but the std::uninitialized_copy() doesn't unwrap the
move iterator, therefore losing the memcpy optimization if the
iterators were just pointers.

Fix this by unwrapping the move iterator using __miter_base().

We remove operator-() in testsuite_greedy_ops.h; otherwise it breaks
the range size computation.

libstdc++-v3/ChangeLog:

PR libstdc++/121789
* include/bits/stl_uninitialized.h (uninitialized_copy):
Unwrap move iterators.
* testsuite/20_util/specialized_algorithms/uninitialized_move/121789.cc:
New test.
* testsuite/util/testsuite_greedy_ops.h (greedy_ops): Comment
out operator-(T, T).

libstdc++: Include range_access.h from <valarray>

This implements the <valarray>-related parts of section 4.8 of P3016R6.
This is treated as DR against C++11 (to expose array begin/end), to follow
similar changes to other semi-containers that were accepted as LWG issues
and treated as DR: <optional> (LWG4131), <stacktrace> (LWG3625).

libstdc++-v3/ChangeLog:

* include/std/valarray [__cplusplus >= 201103L]: Include
<bits/range_access.h>.
* testsuite/26_numerics/valarray/range_access2.cc: Remove
<iterator> include, and add test for std::size.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

[PATCH] RX: The size of the mov instruction will be corrected

> Thanks.  There's still work to do.  I spun my tester after this change
> on the rx port:
>
> Tests that now fail, but worked before (431 tests):
>
> I won't list them all.  Given how many are execution failures, there's
> likely a code generation failure in there somewhere.

> A few of them:
>
> rx-sim: gcc: gcc.c-torture/execute/20001009-2.c   -O0  execution test
> rx-sim: gcc: gcc.c-torture/execute/20020614-1.c   -O0  execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c   -O1  execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c   -O2  execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c   -O2 -flto
> -fno-use-linker-plugin -flto-partition=none  execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c   -O3 -g  execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c   -Os  execution test
> rx-sim: gcc: gcc.c-torture/execute/921016-1.c   -O0  execution test
> rx-sim: gcc: gcc.c-torture/execute/960311-1.c   -O1  execution test
> rx-sim: gcc: gcc.c-torture/execute/960311-2.c   -O1  execution test
> rx-sim: gcc: gcc.c-torture/execute/980617-1.c   -O0  execution test
> rx-sim: gcc: gcc.c-torture/execute/990324-1.c   -O0  execution test
> rx-sim: gcc: gcc.c-torture/execute/990326-1.c   -O0  execution test
>
> Anyway, seems like something for Yoshinori to look into.

The code extension was causing incorrect output.
Optimization mitigated this issue, so I didn't notice it.
The attached changes now allow the test to pass.

When expanding `extendqisi2` or `extendhisi2`, incorrect operation size
instructions were sometimes output.
This update ensures that the operation size is determined reliably.

gcc/

* config/rx/rx.cc (rx_gen_move_template): Select the mode with the smallest size
for the mov instruction.

middle-end: Optimize reversed CRC table-based implementation

The previous reversed CRC implementation used explicit bit reflection
before and after the CRC computation:

  reflect(crc_init);
  reflect(data);
  for (int i = 0; i < data_bit_size / 8; i++)
    crc = (crc << 8) ^ table[(crc >> (crc_bit_size - 8))
                             ^ (data >> (data_bit_size - (i+1) * 8) & 0xFF)];
  reflect(crc);

This patch generates a reversed polynomial lookup table directly,
eliminating the need for bit reflection operations.  The new algorithm:

  for (int i = 0; i < data_bit_size / 8; i++)
    crc = (crc >> 8) ^ table[(crc ^ (data >> (i * 8))) & 0xFF];

This improves code generation for all targets using table-based reversed
CRC, as it removes the overhead of reflecting input data and CRC values.
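
To make the new loop concrete, here is a self-contained C sketch for one common reflected CRC.  It is not the code GCC generates: the function names are hypothetical, and the polynomial 0xEDB88320, the 0xFFFFFFFF initial value and the final inversion are the usual CRC-32 conventions, assumed here only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t reversed_table[256];

/* Build the lookup table directly from the reversed polynomial, so
   that neither the input bytes nor the CRC need to be reflected.  */
void
build_reversed_table (uint32_t reversed_poly)
{
  for (uint32_t i = 0; i < 256; i++)
    {
      uint32_t r = i;
      for (int bit = 0; bit < 8; bit++)
        r = (r & 1) ? (r >> 1) ^ reversed_poly : r >> 1;
      reversed_table[i] = r;
    }
}

/* One table lookup per input byte, shifting the CRC right, matching
   the shape of the new algorithm shown above.  */
uint32_t
crc32_reversed (const unsigned char *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = (crc >> 8) ^ reversed_table[(crc ^ data[i]) & 0xFF];
  return crc ^ 0xFFFFFFFFu;
}
```

After build_reversed_table (0xEDB88320), crc32_reversed over the nine ASCII bytes "123456789" gives the standard CRC-32 check value 0xCBF43926.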

Note on code size: one could imagine sharing a single (non-reversed) table
between programs that compute both reversed and non-reversed CRCs in order
to save space under -Os.  A survey of ~10k Fedora packages (by Mariam and
Jeff Law) found no package that uses both flavors in the same binary, so
this case is not worth optimizing for.

Ref:

[1] "Reversing CRC - Theory and Practice"
  https://sar.informatik.hu-berlin.de/research/publications/SAR-PR-2006-05/SAR-PR-2006-05_.pdf

gcc/ChangeLog:

* expr.cc (calculate_reversed_crc): New function.
(assemble_reversed_crc_table): New function.
(generate_reversed_crc_table): New function.
(calculate_table_based_reversed_CRC): New function.
(expand_reversed_crc_table_based): Remove gen_reflecting_code
parameter.  Use calculate_table_based_reversed_CRC.
* expr.h (expand_reversed_crc_table_based): Update prototype.
* builtins.cc (expand_builtin_crc_table_based): Update call.
* internal-fn.cc (expand_crc_optab_fn): Update call.
* config/aarch64/aarch64.md (crc_rev<ALLI:mode><ALLX:mode>4):
Update call.
* config/i386/i386.md (crc_rev<SWI124:mode>si4): Update call.
* config/loongarch/loongarch.md (crc_rev<mode>si4): Update call.
Remove local rbit lambda.
* config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4):
Update call.  Remove TARGET_ZBKB case.
* config/riscv/riscv.cc (generate_reflecting_code_using_brev):
Remove.
* config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
Remove declaration.

RISC-V: Remove blanket else in riscv_hard_regno_mode_ok.

While looking at PR124439 I noticed that we have unreachable code in
riscv_hard_regno_mode_ok. Right now we just return false for registers
that don't match one of the first four if conditions.

This patch just removes the else.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_hard_regno_mode_ok): Remove else.

RISC-V: Fix format specifier.

Right now we get

../../gcc/config/riscv/riscv.cc: In function ‘bool riscv_check_target_clone_version(string_slice, location_t*)’:
../../gcc/config/riscv/riscv.cc:15078:17: warning: unknown conversion type character ‘B’ in format [-Wformat=]
15078 | "invalid version %qB for %<target_clones%> attribute",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../gcc/config/riscv/riscv.cc:15078:17: warning: too many arguments for format [-Wformat-extra-args]

with a GCC 15 host compiler.

This patch replaces %qB with %<%.*s%>.
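
For readers unfamiliar with the %.*s conversion, the sketch below shows how it prints a slice that is not NUL-terminated in plain printf-style formatting.  The struct slice type and format_invalid_version are hypothetical stand-ins; GCC's own diagnostic quoting directives are replaced here by literal quote characters.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a pointer-plus-length string slice.  */
struct slice
{
  const char *ptr;
  size_t len;
};

/* %.*s takes an int precision followed by a char pointer and prints
   at most that many characters, so no terminating NUL is required
   at the end of the slice.  */
int
format_invalid_version (char *buf, size_t bufsz, struct slice s)
{
  return snprintf (buf, bufsz,
                   "invalid version '%.*s' for 'target_clones' attribute",
                   (int) s.len, s.ptr);
}
```

The cast to int matters: passing a size_t where the precision is expected would be undefined behaviour on targets where the two types differ in width.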

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_check_target_clone_version): Fix
format specifier.

libstdc++: Make is_exhaustive const for layout_(left/right)_padded

This is specified as const in the standard, and required to be const-callable
per the layout mapping requirements. Its absence made calls to is_exhaustive
on an mdspan with such a layout ill-formed.

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_left_padded::is_exhaustive)
(layout_right_padded::is_exhaustive): Mark as const.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Test noexcept and
const-invocability for is_exhaustive, is_strided, and is_unique.
* testsuite/23_containers/mdspan/layouts/padded.cc: Test is_exhaustive on
const mapping.
* testsuite/23_containers/mdspan/layouts/stride.cc: Likewise.
* testsuite/23_containers/mdspan/mdspan.cc: Check const-invocability
for is_exhaustive, is_strided, is_unique.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

i386: Implement bitreverse<mode>2 optab for GFNI [PR50481]

The following patch implements the bitreverse<mode>2 optab for
-mgfni -msse2 (SSE2 because apparently -mgfni doesn't imply -msse nor
-msse2).
This is done by using gf2p8affineqb insn with a special constant
which reverses bits in each byte, and for modes wider than QImode
also by doing a byteswap afterwards.
With -m64 it emits
.LC0:
        .byte   1, 2, 4, 8, 16, 32, 64, -128
        .byte   1, 2, 4, 8, 16, 32, 64, -128
and
        movd    %edi, %xmm0
        gf2p8affineqb   $0, .LC0(%rip), %xmm0
        movd    %xmm0, %eax
for __builtin_bitreverse8,
        movd    %edi, %xmm0
        gf2p8affineqb   $0, .LC0(%rip), %xmm0
        movd    %xmm0, %eax
        rolw    $8, %ax
for __builtin_bitreverse16,
        movd    %edi, %xmm0
        gf2p8affineqb   $0, .LC0(%rip), %xmm0
        movd    %xmm0, %eax
        bswap   %eax
for __builtin_bitreverse32,
        movq    %rdi, %xmm0
        gf2p8affineqb   $0, .LC0(%rip), %xmm0
        movq    %xmm0, %rax
        bswap   %rax
for __builtin_bitreverse64, and
        movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0
        gf2p8affineqb   $0, .LC0(%rip), %xmm0
        movq    %xmm0, %rax
        pextrq  $1, %xmm0, %rdx
        bswap   %rax
        bswap   %rdx
        xchgq   %rdx, %rax
for __builtin_bitreverse128 (only the xchgq is unnecessary
and surprising, some RA issue).
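
The two-step decomposition (reverse the bits inside each byte, then swap the bytes) can be checked in portable C.  This sketch is independent of GFNI and of the expander in the patch; all function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the eight bits of one byte by swapping nibbles, then bit
   pairs, then adjacent bits.  */
uint8_t
reverse_bits_in_byte (uint8_t b)
{
  b = (uint8_t) (((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
  b = (uint8_t) (((b & 0xCC) >> 2) | ((b & 0x33) << 2));
  b = (uint8_t) (((b & 0xAA) >> 1) | ((b & 0x55) << 1));
  return b;
}

/* Step 1: reverse bits within each byte, leaving byte order alone.  */
uint32_t
reverse_each_byte32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    r |= (uint32_t) reverse_bits_in_byte ((uint8_t) (x >> (8 * i))) << (8 * i);
  return r;
}

/* Step 2: reverse the order of the four bytes.  */
uint32_t
byteswap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
}

/* Full 32-bit bit reversal as the composition of the two steps.  */
uint32_t
bitreverse32_two_step (uint32_t x)
{
  return byteswap32 (reverse_each_byte32 (x));
}

/* Bit-by-bit reference used to cross-check the two-step version.  */
uint32_t
bitreverse32_reference (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if (x & (1u << i))
      r |= 1u << (31 - i);
  return r;
}
```

For a single byte the second step is a no-op, which matches the listing above where only the wider modes need a rolw or bswap after the gf2p8affineqb.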

2026-05-18  Jakub Jelinek  <jakub@redhat.com>

PR target/50481
* config/i386/i386-protos.h (ix86_expand_gfni_bitreverse): Declare.
* config/i386/i386-expand.cc (ix86_expand_gfni_bitreverse): New
function.
* config/i386/i386.md (bitreverse<mode>2): New expander.

* gcc.target/i386/gfni-builtin-bitreverse-1.c: New test.

Reviewed-by: Hongtao Liu <hongtao.liu@intel.com>

libstdc++: Use on_month_day istream operator in ZoneInfo parsing. [PR124852]

This patch changes ZoneInfo parsing to use operator>> for on_month_day
directly, and removes the on_day tag. The operator>>(istream&, on_month_day)
is updated to leave on.month unmodified if the MONTH component is not present,
and to set failbit instead. This allows using in >> on >> time to parse
MONTH DAY TIME.

We also handle failure to parse day number N for Www>=N or Www<=N
productions, by leaving the day part of input unchanged and setting
failbit.

PR libstdc++/124852

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (on_month_day::on_day_t, on_month_day::on_day):
Remove.
(operator>>(istream&, on_month_day::day_t&)): Inlined into...
(operator>>(istream&, on_month_day)): Inlined on_month_day::on_day.
Avoid modifying on.month if MONTH is not present. Report failure
on failure to parse day for LessEq / GreaterEq.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

x86: Don't inline memmove for -Os

Update ix86_expand_movmem to return false if optimize_function_for_size_p
returns true to avoid inlining memmove for -Os.

gcc/

PR target/125355
* config/i386/i386-expand.cc (ix86_expand_movmem): Return false
for -Os.

gcc/testsuite/

PR target/125355
* gcc.target/i386/pr125355.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

uncprop: small compile time optimization with switches

In the process of converting gswitch away from CASE_LABEL_EXPR,
I found a place in uncprop (like the case in dom) where we store
the whole CASE_LABEL_EXPR. This place only needed to store the value
of the case rather than the whole case expression. This does that
small optimization and adds a few comments for the next person
to understand what is going on here. It was not obvious at my
first read of the code what it was doing or what error_mark
was being used for.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-uncprop.cc (associate_equivalences_with_edges): For switches
info only store the case low value to be recorded as
the only value. Add comments.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

ICF: Remove unneeded check for CASE_LABEL_EXPR

I noticed there was a check to see gimple_switch_label
returns a CASE_LABEL_EXPR after already using CASE_LOW/CASE_HIGH
on the same value.

This removes the check as it is always true.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* ipa-icf-gimple.cc (func_checker::compare_gimple_switch): Remove
the check on CASE_LABEL_EXPR since it is redundant.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

dom: small compile time optimization with switches

In the process of converting gswitch away from CASE_LABEL_EXPR,
I found a place in dom where we store the whole CASE_LABEL_EXPR.
This place only needed to store the value of the case rather than
the whole case expression. This does that small optimization
and adds a few comments for the next person to understand what
is going on here. It was not obvious at my first read of the code
what it was doing or what error_mark was being used for.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-dom.cc (record_edge_info): For switches
info only store the case low value to be recorded as
the only value. Add comments.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Daily bump.

PR 122245: -fc-prototypes when procedure defined via INTERFACE

This simple patch emits correct prototypes when a procedure is
defined via an interface by simply checking the presence
of an interface and using its formal arglist.

gcc/fortran/ChangeLog:

PR fortran/122245
* dump-parse-tree.cc (write_formal_arglist): Take the formal
arglist from the symbol's interface if it is present.

libgomp: Fix env.c compilation on Darwin

Darwin bootstrap is currently broken compiling libgomp/env.c:

libgomp/env.c: In function 'initialize_env':
libgomp/env.c:2476:7: error: use of logical '||' with constant operand '2097152' [-Werror=constant-logical-operand]
2476 |       || GOMP_DEFAULT_STACKSIZE)
      |       ^~
libgomp/env.c:2476:7: note: use '|' for bitwise operation
2476 |       || GOMP_DEFAULT_STACKSIZE)
      |       ^~

This is only seen on Darwin since this is the only target that defines a
non-zero GOMP_DEFAULT_STACKSIZE.

Bootstrapped without regressions on x86_64-apple-darwin25.5.0 and
x86_64-apple-darwin21.6.0.

2026-05-16  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

libgomp:
* env.c (initialize_env): Check GOMP_DEFAULT_STACKSIZE for
non-zero.

libstdc++: replace assert with __glibcxx_assert [PR125228]

Unlike `__glibcxx_assert` which is guarded by `_GLIBCXX_ASSERTIONS` and
enabled only in Debug build of libstdc++, `assert` is either always
enabled, or always disabled if manually defining `NDEBUG` before
`#include <cassert>` or `#include <assert.h>`. This not only makes
`assert` inflexible, but also introduces extra runtime overhead and/or
increased binary size in Release builds.

Uses of `assert` without `NDEBUG` introduce `__FILE__` into the final
library, and unconditionally check the assertions.

This patch replaces the uses of `assert` in ryu and debug.cc with
`__glibcxx_assert`, and removes their direct dependency on `<cassert>`.
To avoid modifying the third-party ryu headers, this patch redefines
`assert` to `__glibcxx_assert` when including the ryu headers.

libstdc++-v3/ChangeLog:

PR libstdc++/125228
* src/c++11/debug.cc: Replace assert with __glibcxx_assert,
and remove the include of <cassert>.
* src/c++17/floating_to_chars.cc: Likewise, but redefine
assert as __glibcxx_assert.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Make configure check for atomics work on Windows [PR125312]

The changes in r16-427-g86627faec10da5 do not work for native mingw-w64
builds because the #include with a hardcoded unix-style path doesn't
work for a native mingw-w64 compiler. This results in configure
detecting that native mingw builds do not support atomic builtins for
the _Atomic_word type, causing non-inline atomics to be used for
__gnu_cxx::__exchange_and_add and __gnu_cxx::__atomic_add, which is an
unintended ABI change (and inconsistent with mingw cross-compilers where
the configure test passes and so enables the inline functions using
atomic builtins).

This attempts to solve the problem by copying the atomic_word.h header
to the current working directory, so it can be included without using an
absolute path.

libstdc++-v3/ChangeLog:

PR libstdc++/125312
* acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Copy header
into cwd instead of including it via an absolute path.
* configure: Regenerate.

match.pd: Enable some __builtin_bswap* optimizations even for __builtin_bitreverse* [PR50481]

Most of the bswap optimizations equally apply also to bitreverse builtins.
The following patch enables those.

2026-05-16 Jakub Jelinek <jakub@redhat.com>

PR target/50481
* match.pd (BITREVERSE): New define_operator_list. Use it next to
BSWAP for a subset of bswap simplifications.

* gcc.dg/builtin-bitreverse-4.c: New test.

Reviewed-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

Add __builtin_bitreverse128 [PR50481]

We already have __builtin_bswap{16,32,64,128}; the last one was
added ~6 years ago.  So I think we should also have
__builtin_bitreverse128.

The following patch does that.

Note, we don't have __builtin_bswapg and I don't think we should; one can
only byteswap something which has a number of bits divisible by CHAR_BIT.
For __builtin_bitreverseg that isn't a problem, but I am not sure I want to
spend time handling it on say unsigned _BitInt(357).  Perhaps only if there
is some real-world use-case.

2026-05-16  Jakub Jelinek  <jakub@redhat.com>

PR target/50481
* doc/extend.texi (__builtin_bitreverse32, __builtin_bitreverse64):
Tweak wording for consistency with __builtin_bswap*.
(__builtin_bitreverse128): Document.
* builtins.def (BUILT_IN_BITREVERSE128): New.
* builtins.cc (expand_builtin): Handle also BUILT_IN_BITREVERSE128.
(is_inexpensive_builtin): Likewise.
* fold-const-call.cc (fold_const_call_ss): Handle also
CFN_BUILT_IN_BITREVERSE128.
* fold-const.cc (tree_call_nonnegative_warnv_p): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Handle also BUILT_IN_BITREVERSE128.
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Handle also
CFN_BUILT_IN_BITREVERSE128.
(cond_removal_in_builtin_zero_pattern): Likewise.

* gcc.dg/builtin-bitreverse-1.c: Add __builtin_bitreverse128 tests.
* gcc.dg/builtin-bitreverse-2.c: Likewise.

Reviewed-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

tree-ssa-ccp: Fix up __builtin_bitreverse* handling [PR50481]

The committed __builtin_bitreverse* patch mishandled the bitwise CCP
handling.  It is true that BUILT_IN_BITREVERSE* can be handled there
similarly to BUILT_IN_BSWAP*, but not exactly: for the latter we need
to (and do) bswap the value and mask constants, while for the former
we obviously need to bitreverse them instead.
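
A small model of the distinction, using 8-bit values where a byte swap would be the identity.  The struct known_bits representation (a value plus a mask whose set bits mean "unknown") is an assumption made for this sketch, and the function names are hypothetical; this is not the code in tree-ssa-ccp.cc.

```c
#include <stdint.h>

/* Assumed lattice element for the sketch: bits set in MASK are
   unknown, the remaining bits are known to equal those in VALUE.  */
struct known_bits
{
  uint8_t value;
  uint8_t mask;
};

/* Reverse the bit order of an 8-bit quantity.  */
uint8_t
bitreverse8 (uint8_t b)
{
  uint8_t r = 0;
  for (int i = 0; i < 8; i++)
    if (b & (1u << i))
      r |= (uint8_t) (1u << (7 - i));
  return r;
}

/* Propagate known bits through a bit reversal: both constants must be
   bit-reversed so that each known or unknown bit follows the input
   bit it describes to its new position.  */
struct known_bits
propagate_bitreverse8 (struct known_bits in)
{
  struct known_bits out;
  out.value = bitreverse8 (in.value);
  out.mask = bitreverse8 (in.mask);
  return out;
}
```

With the low nibble known to be 0001 and the high nibble unknown (value 0x01, mask 0xF0), the result has its high nibble known to be 1000 and its low nibble unknown (value 0x80, mask 0x0F).  Byte-swapping the constants instead would leave them unchanged at this width and describe the wrong bits.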

2026-05-16 Jakub Jelinek <jakub@redhat.com>

PR target/50481
* tree-ssa-ccp.cc (evaluate_stmt): Fix up
BUILT_IN_BITREVERSE{8,16,32,64} handling.

* gcc.dg/builtin-bitreverse-3.c: New test.

Reviewed-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

x86_64: Handle hard registers in TImode STV with inter-unit moves.

This patch extends the types of chains that can be converted by x86's
TImode Scalar-To-Vector (STV) pass, to include chains that originate
and/or terminate with moves from/to hard registers.  Currently STV
candidate instructions explicitly exclude those that mention hard
registers.

As motivation, consider the four following functions:

__int128 a, b, c, z;
__int128 fun();

void foo_in(__int128 x) { z = (x ^ a ^ b ^ c); }

__int128 foo_out() { return (z ^ a ^ b ^ c); }

__int128 foo_inout(__int128 x) { return (x ^ a ^ b ^ c ^ z); }

void foo_fun() { z = (fun() ^ a ^ b ^ c); }

Of these, only the first, foo_in, is currently STV converted to use
SSE instructions.  Its incoming argument is constructed from a concat
of two DImode registers, and support for this idiom was added in a
previous STV patch.  The next two functions aren't converted because
the chain terminates with a return, which places the TImode result in
a hard register.  Likewise, the final foo_fun case isn't converted as
the result from fun initiates a chain from a hard register.

This patch supports STV conversion of TImode register-to-register
moves, where either the source or the destination (but not both) is
a hard register, by implementing it as a (relatively expensive)
inter-unit move.

Before, with -O2 -mavx:

foo_out:
        movq    z(%rip), %rax
        movq    z+8(%rip), %rdx
        xorq    a(%rip), %rax
        xorq    a+8(%rip), %rdx
        xorq    b(%rip), %rax
        xorq    b+8(%rip), %rdx
        xorq    c(%rip), %rax
        xorq    c+8(%rip), %rdx
        ret

After, with -O2 -mavx:

foo_out:
        vmovdqa z(%rip), %xmm0
        vpxor   a(%rip), %xmm0, %xmm0
        vpxor   b(%rip), %xmm0, %xmm0
        vpxor   c(%rip), %xmm0, %xmm0
        vpextrq $1, %xmm0, %rdx
        vmovq   %xmm0, %rax
        ret

Likewise for foo_fun, before with -O2 -mavx:

foo_fun:
        subq    $8, %rsp
        call    fun
        movq    a(%rip), %rsi
        movq    a+8(%rip), %rdi
        xorq    b(%rip), %rsi
        xorq    b+8(%rip), %rdi
        xorq    c(%rip), %rsi
        xorq    c+8(%rip), %rdi
        xorq    %rax, %rsi
        xorq    %rdx, %rdi
        movq    %rsi, z(%rip)
        movq    %rdi, z+8(%rip)
        addq    $8, %rsp
        ret

After with -O2 -mavx:

foo_fun:
        subq    $8, %rsp
        call    fun
        vmovdqa a(%rip), %xmm0
        vpxor   b(%rip), %xmm0, %xmm0
        vmovq   %rax, %xmm2
        vpxor   c(%rip), %xmm0, %xmm0
        vpinsrq $1, %rdx, %xmm2, %xmm1
        vpxor   %xmm1, %xmm0, %xmm0
        vmovdqa %xmm0, z(%rip)
        addq    $8, %rsp
        ret

The one small subtlety in this patch is in the cost calculation
for inter-unit moves, which now correctly uses both sse_to_integer
and integer_to_sse costs.  This patch models double word transfers
between units as interunit_cost + COSTS_N_INSNS(1), i.e. the two
transfers are pipelined in parallel, so that the high latency is
accounted for once [rather than 2*interunit_cost, which assumes the
transfers take place strictly sequentially with twice the single
word transfer latency].
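
The difference between the two models is simple arithmetic, sketched below.  The COSTS_N_INSNS definition (a scale factor of 4 per instruction) and the sample inter-unit cost are assumptions for illustration, and both function names are hypothetical.

```c
/* Assumed cost scale for the sketch: one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Pipelined model: the inter-unit latency is paid once, plus one
   extra instruction for the second word.  */
int
pipelined_double_word_cost (int interunit_cost)
{
  return interunit_cost + COSTS_N_INSNS (1);
}

/* Strictly sequential model: the inter-unit latency is paid twice.  */
int
sequential_double_word_cost (int interunit_cost)
{
  return 2 * interunit_cost;
}
```

With a sample inter-unit cost of COSTS_N_INSNS (3), the pipelined model gives 16 while the sequential one gives 24, so the pipelined model makes the conversion look cheaper whenever a single transfer costs more than one instruction.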

This revision implements Hongtao's suggestions/fixes to support
TImode values in non-general hard registers, and adds two more
test cases.  Alas things turned out to be a little more complicated
than originally proposed; previously STV used PUT_MODE on TImode
pseudo registers to change their mode everywhere, but something
different is required for hard registers, which may be used in
multiple modes in a function.

To demonstrate the (additional) benefits, consider the function:

register __int128 x __asm("xmm0");
register __int128 y __asm("xmm1");
__int128 m;

void foo()
{
  m = x ^ y;
}

Previously GCC on x86_64 with -O2 generated:

foo:    movaps  %xmm0, -24(%rsp)
        movq    -24(%rsp), %rax
        movq    -16(%rsp), %rdx
        movaps  %xmm1, -24(%rsp)
        xorq    -24(%rsp), %rax
        xorq    -16(%rsp), %rdx
        movq    %rax, m(%rip)
        movq    %rdx, m+8(%rip)
        ret

With this revised patch, we now generate:

foo:    movdqa  %xmm0, %xmm2
        pxor    %xmm1, %xmm2
        movaps  %xmm2, m(%rip)
        ret

2026-05-16  Roger Sayle  <roger@nextmovesoftware.com>
    Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain): If the chain
starts with a register-to-register move from a hard register,
then the hard register's defs don't need to be converted.
(timode_scalar_chain::compute_convert_gain): Provide costs
for hard_reg-to-pseudo and pseudo-to-hard_reg moves.
Tweak speed cost of timode_concatdi_p moves.
(timode_scalar_chain::convert_insn): Add support for
hard_reg-to-pseudo and pseudo-to-hard_reg TImode transfers.
(timode_scalar_to_vector_candidate_p): Likewise.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx-stv-1.c: New test case.
* gcc.target/i386/sse2-stv-3.c: Likewise.
* gcc.target/i386/sse2-stv-4.c: Likewise.
* gcc.target/i386/sse2-stv-5.c: Likewise.

RISC-V: Add test cases for scalar unsigned SAT form 10

Form 10 is supported already, add test to make sure of it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add form 10.
* gcc.target/riscv/sat/sat_u_mul-11-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-11-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-11-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-11-u8.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test cases for scalar unsigned SAT form 9

Form 9 is supported already, add test to make sure of it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add form 9.
* gcc.target/riscv/sat/sat_u_mul-10-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-10-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-10-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-10-u8.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test cases for scalar unsigned SAT form 8

Form 8 is supported already, add test to make sure of it.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add form 8.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u64.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

libgfortran: Fix libcaf_shmem build on Solaris

libcaf_shmem doesn't currently build on Solaris.  Previously this went
unnoticed because the AX_PTHREADS autoconf macro erroneously didn't
detect pthreads support.  Once this is fixed, compilation fails:

In file included from caf/shmem/supervisor.h:35,
                 from caf/shmem/alloc.c:31:
caf/shmem/sync.h:46:25: error: conflicting types for ‘lock_t’; have ‘caf_shmem_mutex’ {aka ‘struct _pthread_mutex’}
   46 | typedef caf_shmem_mutex lock_t;
      |                         ^~~~~~
In file included from /usr/include/sys/machtypes.h:12,
                 from /usr/include/sys/types.h:17,
                 from caf/shmem/thread_support.h:33,
                 from caf/shmem/shared_memory.h:28,
                 from caf/shmem/allocator.h:31,
                 from caf/shmem/alloc.h:28,
                 from caf/shmem/alloc.c:29:
/usr/include/ia32/sys/machtypes.h:50:25: note: previous declaration of ‘lock_t’ with type ‘lock_t’ {aka ‘unsigned int’}

The lock_t definition in <ia32/sys/machtypes.h> is benign: POSIX.1
reserves the _t suffix for the implementation.  At the very least the
code should use a properly prefixed type, which this patch does
by changing the code to use caf_shmem_lock_t instead.

Bootstrapped without regressions on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.

2026-05-14  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

libgfortran:
* caf/shmem/sync.h (lock_t): Rename to caf_shmem_lock_t.
* caf/shmem.c: Adapt uses.

libstdc++: Remove Solaris workaround in 20_util/to_chars/float128_c++23.cc [PR107815]

As described in PR libstdc++/107815, one subtest of
20_util/to_chars/float128_c++23.cc was disabled on Solaris due to a bug
in printf(3C). This has been fixed since October 2023, so the
workaround can be removed.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

2026-05-13 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libstdc++-v3:
PR libstdc++/107815
* testsuite/20_util/to_chars/float128_c++23.cc (test): Remove
Solaris workaround.

match.pd: Simplify ((~x) & y) ^ (x | y)

This adds the simplification of:
  _1 = ~x_2(D);
  t1_4 = _1 & y_3(D);
  t2_5 = x_2(D) | y_3(D);
  _6 = t1_4 ^ t2_5;
  return _6;

to:
  return x_1(D);

also for ((~x) | y) ^ (x & y) version
  _1 = ~x_2(D);
  t1_4 = _1 | y_3(D);
  t2_5 = x_2(D) & y_3(D);
  _6 = t1_4 ^ t2_5;
  return _6;
to:
   int _1;
   _1 = ~x_2(D);
   return _1;
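
Both identities can be checked with a short C sketch.  The function names
are hypothetical and the exhaustive loop only covers 8-bit operand values,
which is enough because the operations act on each bit independently.

```c
#include <stdint.h>

/* ((~x) & y) ^ (x | y) is always equal to x.  */
uint32_t
fold_to_x (uint32_t x, uint32_t y)
{
  return (~x & y) ^ (x | y);
}

/* ((~x) | y) ^ (x & y) is always equal to ~x.  */
uint32_t
fold_to_not_x (uint32_t x, uint32_t y)
{
  return (~x | y) ^ (x & y);
}

/* Compare both expressions against their folded forms for all 8-bit
   operand pairs; returns 1 if every case agrees, 0 otherwise.  */
int
identities_hold (void)
{
  for (uint32_t x = 0; x < 256; x++)
    for (uint32_t y = 0; y < 256; y++)
      if (fold_to_x (x, y) != x || fold_to_not_x (x, y) != ~x)
        return 0;
  return 1;
}
```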

Bootstrapped and tested on aarch64-linux-gnu with
RUNTESTFLAGS="tree-ssa.exp".

changes since v1:
* v3: Change sf2/sg2 to sf/sg in test case
* v2:
- Update testcase to exercise GIMPLE folding
- Add additional type coverage
- Add vector and _Bool coverage
- Move code above in the file

PR tree-optimization/112095

gcc/ChangeLog:

* match.pd: Simplify ((~x) & y) ^ (x | y)
to x and ((~x) | y) ^ (x & y) to ~x.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr112095.c: New test.

Signed-off-by: Shivam Gupta <shivam98.tkg@gmail.com>

aarch64: mingw: fix support for posix threading

Currently without this --with-threads=pthread fails.

gcc/ChangeLog:

* config/i386/mingw-pthread.h:
rename to generic config/mingw/mingw-pthread.h
* config.gcc [aarch64-*-mingw*]:
Fix support for posix threading on aarch64 mingw targets.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>

Daily bump.

Configure EditorConfig for Git commit messages

Add a section for Git commit messages to the `.editorconfig` file, so
that editors with EditorConfig support will automatically format commit
messages according to the GNU style.

ChangeLog:

* .editorconfig (COMMIT_EDITMSG): New section.

ssa_operands: speed up GIMPLE_SWITCH handling

The operands of a GIMPLE_SWITCH are the index, followed by the cases.
The cases are all CASE_LABEL_EXPRs, which operands_scanner::get_expr_operands
skips anyway, so we only need to scan the index operand.
This is also the first step in changing GIMPLE_SWITCH slightly.

gcc/ChangeLog:

* tree-ssa-operands.cc (operands_scanner::parse_ssa_operands):
Process index of the gswitch only.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

testsuite: fix Wuninitialized-pr107919-1.C

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-pr107919-1.C: Fix a dg-bogus.

tree-cfg: Revert part of r8-546 [PR125290]

This reverts the group_case_labels_stmt part of r8-546-gca4d2851687875.
This is placed in the wrong location to remove the case statements that go
directly to __builtin_unreachable. In fact the removal of the case statements
makes us lose optimizations in some cases (Wuninitialized-pr107919-1.C for one).

Also this fixes PR 125290 by no longer leaving around a switch which just
has a default case.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/125290

gcc/ChangeLog:

* tree-cfg.cc (group_case_labels_stmt): Remove code that was
added to remove `cases` that goto blocks of unreachable.
* tree-ssa-forwprop.cc (optimize_unreachable): Remove the
comment about switch cases being handled.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-pr107919-1.C: Remove xfail.
* gcc.dg/analyzer/taint-assert.c: Update for the non-removal
of block containing unreachable.
* gcc.dg/torture/pr125290-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>

libstdc++: Include range_access.h from <optional> and <stacktrace>

This implements the resolutions of LWG4131 and LWG3625 (also part of section 4.8
of P3016R6). As with any other LWG issue, the changes are applied as DRs
to the oldest applicable standards:
* <optional> (LWG4131): C++26, since optional range support
* <stacktrace> (LWG3625): C++23, since introduction

libstdc++-v3/ChangeLog:

* include/std/optional [__cpp_lib_optional_range_support]:
Replace <bits/stl_iterator.h> include with <bits/range_access.h>.
* include/std/stacktrace: Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

pr124532: Reset musttail attribute in compound statements

A compound statement didn't reset the musttail state after the statement
with the attribute, which led to bogus errors later. Always reset it.

PR c/124532

gcc/c/ChangeLog:

* c-parser.cc (struct attr_state): Add reset method.
(c_parser_compound_statement_nostart): Rename a to astate.
Reset state before iterating statements.

gcc/testsuite/ChangeLog:

* c-c++-common/pr124532.c: New test.

hppa64: Use DW_EH_PE_aligned encoding on 64-bit HP-UX

In testing with GNU ld, I noticed that the HP-UX dynamic linker
doesn't support unaligned DW_EH_PE_absptr encodings.

2026-05-15 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.h (ASM_PREFERRED_EH_DATA_FORMAT): Use
DW_EH_PE_aligned encoding on 64-bit HP-UX.

testsuite: Add aarch64 SVE support to slp-reduc-15.c

Add aarch64 SVE support and use -mavx2 for x86 to support all x86
modes.

Changes:
- Add aarch64-*-* target with -march=armv8.2-a+sve
- Use -mavx2 instead of -march=x86-64-v3 to support all x86 modes
- Separate -fgimple from architecture-specific options.

Reported-by: https://linaro.atlassian.net/browse/GNU-1901
gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-15.c: Add aarch64 support and use
-mavx2 for x86.

caller_save_regs: Return the enabled registers

Since the caller can save a register only if the register is enabled in
the caller, change caller_save_regs to return the enabled registers.

PR rtl-optimization/125321
* function-abi.cc (function_abi_aggregator::caller_save_regs):
Return the enabled registers.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

libstdc++: Use IANA name for ISO-8859-1 in format tests.

Use IANA name for ISO-8859-1 as "-fexec-charset=" and add dg-require-iconv,
to make sure it is supported.

libstdc++-v3/ChangeLog:

* testsuite/std/format/debug_nonunicode.cc: Pass ISO-8859-1 as
exec-charset and make sure that it is supported.
* testsuite/std/format/fill_nonunicode.cc: Likewise.

scev: Sign extend step in peeled converted IV handling [PR 125291]

For 3 iterations of

    unsigned char flagbits;
    _877 = flagbits_832 + 254;
    _879 = (int) _877;
    # prephitmp_880 = PHI <_879(40), 6(41)>
    _70 = _68 >> prephitmp_880;

The peeled converted IV handling added in r16-3562 incorrectly analyzes
it as [6, 6 + 254, 6 + 254 * 2] instead of [6, 4, 2].  Then VRP uses the
intersect of {6, 560, 514} and {2, 4, 6}, i.e. {6} as the possible value
range, and propagates the constant 6 for _70.

Extend the step (for example, 254 => -2) to fix the issue.
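
The arithmetic can be reproduced with a small C sketch.  The helper names
are hypothetical and this does not reflect how simplify_peeled_chrec is
written; it only contrasts the IV evaluated in the narrow unsigned char
type with the two ways of predicting it in the wider type.

```c
/* Interpret the low 8 bits of STEP as a signed value, e.g. 254 -> -2.
   Written without relying on implementation-defined narrowing casts.  */
int
sign_extend_8 (unsigned int step)
{
  step &= 0xff;
  return (step & 0x80) ? (int) step - 256 : (int) step;
}

/* Value of the IV after N iterations when it is updated in the narrow
   unsigned char type and then converted to int.  */
int
narrow_iv (unsigned char init, unsigned char step, int n)
{
  unsigned char v = init;
  for (int i = 0; i < n; i++)
    v = (unsigned char) (v + step);
  return (int) v;
}

/* Value predicted in the wide type with a sign-extended step.  */
int
wide_iv_sign_extended (int init, unsigned int step, int n)
{
  return init + n * sign_extend_8 (step);
}

/* Value predicted in the wide type with the step left unextended.  */
int
wide_iv_zero_extended (int init, unsigned int step, int n)
{
  return init + n * (int) step;
}
```

With init 6 and step 254, only the sign-extended prediction matches the
narrow evaluation after the first iteration.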

PR tree-optimization/125291

gcc/

* tree-scalar-evolution.cc (simplify_peeled_chrec): Sign-extend
the step for peeled converted IV.

gcc/testsuite/

* gcc.c-torture/execute/pr125291.c: New test.

LoongArch: add spaceship expanders

This helps to optimize certain nested ternary operation producing -1, 0,
or 1 to slt[u]-slt[u]-sub.
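
As an illustration of the kind of source this targets, here is a C sketch
of a three-way comparison written both as a nested ternary and as the
difference of two comparisons.  The function names are hypothetical, and
the sketch makes no claim about which exact inputs the new expander
matches.

```c
/* Nested ternary form of a three-way comparison: -1, 0, or 1.  */
int
cmp3_ternary (long a, long b)
{
  return a < b ? -1 : (a > b ? 1 : 0);
}

/* Same result computed as the difference of two set-less-than
   results, which is the shape of an slt, slt, sub sequence.  */
int
cmp3_slt_sub (long a, long b)
{
  return (a > b) - (a < b);
}
```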

gcc/

* config/loongarch/loongarch.md (spaceship<mode>4): New
define_expand.

gcc/testsuite/

* gcc.target/loongarch/la64/spaceship.c: New test.

match.pd: Allow FNMA fold through conversions

The FMA folds in match.pd currently only match (negate @0) directly.
When the negated operand is wrapped in a type conversion
(e.g. (convert (negate @0))), the simplification to IFN_FNMA does not
trigger.

This prevents folding of patterns such as:

*c = *c - (v8u)(*a * *b);

when the multiply operands undergo vector type conversions before being
passed to FMA. In such cases the expression lowers to neg + mla/mad
instead of the more optimal msb/mls on AArch64 SVE, because the current
fold cannot see through the casts.

Extend the match pattern to allow conversions on the negated
operand and the second multiplicand:

(fmas:c (nop_convert (negate @0)) @1 @2)

The fold rewrites the expression as:

convert (IFN_FNMA (convert:ut @0) (convert:ut @1) (convert:ut @2))

The match is restricted to nop_convert on the negated operand to avoid
folding through value-changing conversions. This enables recognition of
the subtraction-of-product form even when vector element type casts are
present.

The fold is only performed when signed overflow is unobservable for
both the outer FMA operation and the inner negation. This avoids
changing sanitized overflow behaviour when looking through the nop
conversion on the negated operand.

With this change, AArch64 SVE code generation is able to select msb/mls
instead of emitting a separate neg followed by mla/mad.
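
The algebra behind the fold can be sketched for a single unsigned 32-bit
lane in C.  This is a scalar simplification with hypothetical function
names, not the vector testcase from the patch; unsigned arithmetic is used
so that wrap-around is well defined.

```c
#include <stdint.h>

/* Subtraction-of-product form, as in *c = *c - (v8u)(*a * *b),
   reduced to one unsigned 32-bit lane.  */
uint32_t
sub_of_product (uint32_t c, uint32_t a, uint32_t b)
{
  return c - a * b;
}

/* The same value written as a multiply-add with a negated first
   multiplicand, (-a) * b + c, which is the shape the fold matches.  */
uint32_t
negated_multiply_add (uint32_t c, uint32_t a, uint32_t b)
{
  return (0u - a) * b + c;
}
```

Both functions agree modulo 2^32 for all inputs, which is why looking
through a nop conversion on the negated operand does not change the value.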

This patch was bootstrapped and regression tested on aarch64-linux-gnu.

PR target/123924
gcc/
* match.pd: Allow conversions in FMA-to-FNMA fold.

gcc/testsuite/
* gcc.target/aarch64/sve/fnma_match.c: New test.
* gcc.target/aarch64/sve/pr123897.c:
Update the test to scan for FNMA in the tree dump.

Add vector_costs::add_slp_cost grouping hook

The following simplifies the earlier RFC for making it easier
for the target to correlate multiple cost events created from
a single SLP operation. Instead of changing where we record
costs this patch only adjusts the submission part.

Targets wanting to take advantage of this can implement
add_slp_cost and handle all or select cases and resort
to the default implementation to get add_stmt_cost events
for unhandled groups.

* tree-vectorizer.h (vector_costs::add_slp_cost): New.
* tree-vectorizer.cc (vector_costs::add_slp_cost): New
default version.
* tree-vect-slp.cc (add_slp_costs): Helper for dispatching
a cost vector in SLP chunks.
(vect_slp_analyze_operations): Adjust.
(li_cost_vec_cmp): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.

tree-optimization/125296 - preserve alignment of access with address forwarding

The following makes sure to preserve the alignment of the access
and not pick up that of parts of the address we forward.

PR tree-optimization/125296
* tree-ssa-forwprop.cc (forward_propagate_addr_expr_1):
Preserve alignment of the original access.

* gcc.dg/pr125206.c: New testcase.

i386: Fix up *minmax<mode>3_4 [PR125308]

IEEE min/max are not commutative and in the pattern
(define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
  [(set (match_operand:VFH 0 "register_operand" "=x,v")
        (unspec:VFH
          [(match_operand:VFH 1 "register_operand" "0,v")
           (match_operand:VFH 2 "<round_saeonly_nimm_predicate>" "xBm,<round_saeonly_constraint>")]
          IEEE_MAXMIN))]
the first operand is a register and only the second one is register/memory.
Now, the *minmax<mode>3_3 define_insn_and_split does
   rtx tmp = force_reg (<MODE>mode, operands[3]);
   rtvec v = gen_rtvec (2, tmp, operands[2]);
   operands[5] = gen_rtx_UNSPEC (<MODE>mode, v, u);
where operands[3] is the const0_operand, so operands[2] can there be
a memory, but in the *minmax<mode>3_4 case
   rtx tmp = force_reg (<MODE>mode, operands[3]);
   rtvec v = gen_rtvec (2, operands[2], tmp);
   operands[5] = gen_rtx_UNSPEC (<MODE>mode, v, u);
operands[2] goes into the operand which must be a REG, so it
is incorrect to split it into something that won't work.
Now, I've tried both disabling the define_insn_and_split and
the following patch; going from the former to the latter results in
        movaps  a, %xmm0
        pxor    %xmm1, %xmm1
-       cmpltps %xmm0, %xmm1
-       andps   %xmm1, %xmm0
+       maxps   %xmm1, %xmm0
        movaps  %xmm0, a
        ret
on the testcase, so I think it is better to match it and force_reg
(it is a pre-reload splitter) than change "nonimmediate_operand"
to "register_operand" because it won't match in that case.
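
Why operand order matters can be seen in a scalar C model of the usual
SSE min selection rule.  The function name is hypothetical, and the model
assumes the documented MINPS/MINSS behaviour of returning the second
operand whenever the comparison is false, including for NaN inputs.

```c
#include <math.h>

/* Scalar model of the SSE min selection rule: the result is the first
   operand only if it compares strictly less than the second;
   otherwise (including when either operand is a NaN) the second
   operand is returned.  */
float
x86_style_min (float a, float b)
{
  return a < b ? a : b;
}
```

Swapping the operands changes the result when exactly one of them is a
NaN, so the two operand slots of the unspec are not interchangeable.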

2026-05-15  Jakub Jelinek  <jakub@redhat.com>

PR target/125308
* config/i386/sse.md (*minmax<mode>3_4): Force also
operands[2] into a REG.

* gcc.target/i386/pr125308.c: New test.

Add __builtin_bitreverse{8,16,32,64} builtins [PR50481]

Future work could optimize this on specific targets:
- ARM: lower to RBIT
- x86 with GFNI: lower to vgf2p8affineqb
  https://wunkolo.github.io/post/2020/11/gf2p8affineqb-bit-reversal/
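
For reference, a portable C sketch of the operation follows.  It assumes
the builtins reverse the order of all bits in their operand, as the name
suggests; the ref_ function names are hypothetical and are not part of
the patch.

```c
#include <stdint.h>

/* Portable reference: reverse the order of the 8 bits in X.  */
uint8_t
ref_bitreverse8 (uint8_t x)
{
  uint8_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      r = (uint8_t) ((r << 1) | (x & 1));
      x >>= 1;
    }
  return r;
}

/* Portable reference: reverse the order of the 32 bits in X.  */
uint32_t
ref_bitreverse32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}
```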

2026-05-15  Disservin  <disservin.social@gmail.com>
    Jakub Jelinek  <jakub@redhat.com>

PR target/50481
* builtin-types.def (BT_FN_UINT8_UINT8): New.
* builtins.def (BUILT_IN_BITREVERSE8, BUILT_IN_BITREVERSE16,
BUILT_IN_BITREVERSE32, BUILT_IN_BITREVERSE64): New builtins.
* builtins.cc (expand_builtin, is_inexpensive_builtin): Handle
bitreverse builtins.
* fold-const-call.cc (fold_const_call_ss): Fold bitreverse builtins.
* fold-const.cc (tree_call_nonnegative_warnv_p): Handle
bitreverse builtins.
* optabs.def (bitreverse_optab): New.
* optabs.cc (expand_bitreverse): New function.
(expand_unop): Use it for bitreverse_optab.
* tree-ssa-ccp.cc (evaluate_stmt): Handle bitreverse builtins.
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p,
cond_removal_in_builtin_zero_pattern): Likewise.
* doc/extend.texi: Document __builtin_bitreverse{8,16,32,64}.
* doc/md.texi (bitreverse<mode>2): Document.

* gcc.dg/builtin-bitreverse-1.c: New test.
* gcc.dg/builtin-bitreverse-2.c: New test.

Signed-off-by: Disservin <disservin.social@gmail.com>

testsuite: Add testcase for consteval-only type [PR125179]

The following testcase tests that the consteval-only computation is not
quadratic. With a loop of 50000 types, I believe this would be O(50000^2)
in the earlier implementation and so would timeout.

2026-05-15 Jakub Jelinek <jakub@redhat.com>

PR c++/125179
* g++.dg/reflect/pr125179.C: New test.

c, c++: Introduce -Wconstant-logical-operand warning [PR125081]

Given the recent (data->flags && ff_genericize) vs.
(data->flags & ff_genericize) typo, I've looked at warning in similar
cases.
We don't warn for cases like that at all, clang/clang++ has
-Wconstant-logical-operand warning enabled by default.
Their behavior is:
1) only warns for rhs of &&/|| (why?)
2) don't warn if rhs is bool
3) for C++ warn if rhs is constant or folds into constant,
   for C warn if rhs is constant or folds into constant and
   that constant is not 0 or 1
4) I think it doesn't warn if rhs comes from a macro
The following patch implements similar warning with similar wording,
just provides the value of the constant, but
1) warns for lhs and rhs
2) doesn't warn if either lhs or rhs is bool
3) doesn't warn if lhs or rhs is or folds to constant 0 or 1
   (but does warn if it is constant 1 of enum type in an enum which
   has enumerator other than just 0/1 (i.e. poor man's boolean))
4) doesn't care if it comes from a macro or not
I think 64 && x is similarly suspicious to x && 64 and both
are likely meant to be 64 & x or x & 64.  I think having
&& 1 or && 0 is common even in C++, people don't always write
&& true or && false etc. and I don't see why C++ would be different
in that from C.  I think people sometimes write
if (1
#ifdef ABC
     && ABC
#endif
#ifdef DEF
     && DEF
#endif
     && 1)
and similar (or similarly with 0/true/false or ||).  And the warning
is only enabled in -Wall, not by default.
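
A minimal C sketch of the typo this warning is aimed at follows.  The
enumerator FF_GENERICIZE and its value 64 are hypothetical stand-ins, as
are the function names; the first function may itself trigger the warning
when compiled.

```c
/* Hypothetical flag bit, standing in for a constant such as
   ff_genericize; the value 64 is only an example.  */
enum { FF_GENERICIZE = 64 };

/* Typo: logical AND.  True whenever FLAGS is nonzero, regardless of
   which bits are set.  */
int
flag_set_typo (unsigned int flags)
{
  return flags && FF_GENERICIZE;
}

/* Intended: bitwise AND.  True only if the flag bit is set.  */
int
flag_set_intended (unsigned int flags)
{
  return (flags & FF_GENERICIZE) != 0;
}
```

The two functions disagree for any nonzero FLAGS value that lacks the
flag bit, for example 1.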

2026-05-15  Jakub Jelinek  <jakub@redhat.com>

PR c++/125081
gcc/
* doc/invoke.texi (Wconstant-logical-operand): Document.
gcc/c-family/
* c.opt (Wconstant-logical-operand): New option.
* c.opt.urls: Regenerate.
gcc/c/
* c-tree.h (parser_build_binary_op): Add ORIG_ARG1 argument.
* c-typeck.cc (parser_build_binary_op): Likewise.  Emit
-Wconstant-logical-operand warnings.
* c-parser.cc (c_parser_binary_expression): Adjust
parser_build_binary_op caller, pass to it the original
stack[sp - 1].expr.value before c_objc_common_truthvalue_conversion.
gcc/cp/
* typeck.cc (cp_build_binary_op): Emit -Wconstant-logical-operand
warnings.
gcc/testsuite/
* c-c++-common/Wconstant-logical-operand-1.c: New test.
* c-c++-common/Wconstant-logical-operand-2.c: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: "Joseph S. Myers" <josmyers@redhat.com>

c++: constexpr nested empty objects [PR125315]

Here we have one empty subobject inside another; we didn't create a
CONSTRUCTOR for the outer one, so we don't have it as context for the inner.

PR c++/125315

gcc/cp/ChangeLog:

* constexpr.cc (init_subob_ctx): Allow both ctor and object
to be null for an empty subobject.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/no_unique_address16.C: New test.

Daily bump.

testsuite: Remove debugging puts from check_profiling_available

Remove a stray debugging puts and an unused global declaration from
check_profiling_available. Neither affects the AutoFDO availability
check, as the wrapper path is already computed by profopt-perf-wrapper.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_profiling_available): Remove
unused global declaration and debugging output.

c++/reflection: undeduced auto, deferred noexcept [PR124628]

Various reflection queries reject functions (or variables) with an
undeduced return type.  But this assumes return type deduction has
already been attempted which is not the case if the function is a
specialization that has not yet been ODR-used or otherwise instantiated,
which we must do now.  Similarly a function can also have a deferred
noexcept-specification which we should also instantiate at this point.

Rather than inventing a new way to resolve the type of such a function
or variable for reflection purposes, I think we can just silently call
mark_used in an unevaluated context, which will behave similarly to
requires { &decl; }.  Since diagnostics (in the immediate context) get
suppressed, we'll gracefully handle deleted functions or those with
unsatisfied constraints, leaving it up to the caller to handle them.

PR c++/124628

gcc/cp/ChangeLog:

* reflect.cc (resolve_type_of_reflected_decl): New.
(get_reflection): Call resolve_type_of_reflected_decl instead
of mark_used.
(has_type): Call resolve_type_of_reflected_decl before
checking for an undeduced auto.
(eval_can_substitute): Likewise.  Also look through BASELINK.
(members_of_representable): Call resolve_type_of_reflected_decl
before checking for an undeduced auto.

gcc/testsuite/ChangeLog:

* g++.dg/reflect/can_substitute2.C: New test.
* g++.dg/reflect/members_of14.C: New test.
* g++.dg/reflect/substitute3.C: Adjust test so that f<int>'s
return type fails to get deduced.
* g++.dg/reflect/type_of3.C: Also test type_of of a templated
member function with deduced return type.

Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>

PR124316: Fix ptwrite assembler mode

Add explicit assembler mode for cases where the argument is
ambiguous. This avoids cases where a user specified 64bit ptwrite
ends up being 32bit due to an ambiguous argument.

PR target/124316

gcc/ChangeLog:

* config/i386/i386.md (ptwrite): Add explicit mode to
instruction.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr124316.c: New test.

OpenMP: mapper [C/C++] reject w/o map usage, reject C++98, fix map decay

This commit adds a check for the following 'declare mapper' restriction:
"At least one map clause that maps var or at least one element of var is
required."

It additionally fixes a bug in the map-decay code, which did not handle
map modifiers like 'always' when specified in the declare mapper's map
clause.

For C++, some checks are now also run when templates are involved.
Additionally, it turned out that the internal use of constexpr caused
bogus errors when compiled with -std=c++98; therefore, a sorry is now
shown. Solution is to use -std=c++11 or higher.

gcc/c-family/ChangeLog:

* c-omp.cc (omp_map_decayed_kind): Handle map modifiers
also for declare-mapper's map clauses.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_declare_mapper): Check that the
struct var is actually used by at least one map clause.

gcc/cp/ChangeLog:

* semantics.cc (cp_check_omp_declare_mapper): Change what
argument is expected; check that the struct var is used by at
least one map clause. Print sorry when compiling with -std=c++98.
* pt.cc (tsubst_stmt, tsubst_expr): Call it.
* parser.cc (cp_parser_omp_declare_mapper): Update call.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-mapper-10.c: Exclude C++98.
* c-c++-common/gomp/declare-mapper-15.c: Likewise.
* c-c++-common/gomp/declare-mapper-16.c: Likewise.
* c-c++-common/gomp/declare-mapper-3.c: Likewise.
* c-c++-common/gomp/declare-mapper-4.c: Likewise.
* c-c++-common/gomp/declare-mapper-5.c: Likewise.
* c-c++-common/gomp/declare-mapper-6.c: Likewise.
* c-c++-common/gomp/declare-mapper-7.c: Likewise.
* c-c++-common/gomp/declare-mapper-8.c: Likewise.
* c-c++-common/gomp/declare-mapper-9.c: Likewise.
* g++.dg/gomp/declare-mapper-1.C: Likewise.
* g++.dg/gomp/declare-mapper-2.C: Likewise.
* c-c++-common/gomp/pr122866.c: Expect sorry with C++98.
* c-c++-common/gomp/declare-mapper-11.c: Likewise.
Add dg-error for missing var-in-map-clause use.
* g++.dg/gomp/declare-mapper-3.C: Likewise.
* c-c++-common/gomp/declare-mapper-17.c: New test.
* c-c++-common/gomp/declare-mapper-18.c: New test.
* g++.dg/gomp/declare-mapper-4.C: New test.
* g++.dg/gomp/declare-mapper-5.C: New test.

PR fortran/125092 - implement checks for binding label argument mismatch.

This patch implements some checks on different interfaces to the same
C binding functions. It contains a few policy changes, and is somewhat
more permissive than the standard, but there are no constraint
violations (to my knowledge) that it misses.

Apart from checking for standards conformance, this should also
help proof code against (now or future) type-based aliasing mishaps.

Checks for global identifiers are performed on a case-insensitive
basis by default, and only sensitive when -pedantic is in force.
This makes sense if Fortran code wants to interface to "FOO" and
"foo". The restriction to case-insensitive labels comes from a time
when relevant systems had linkers which were case-insensitive, and
it is not possible to implement C (especially the C versions referenced
in the standard) with such a linker.

Return types of functions, ranks, number, type and rank of arguments
are checked. In non-pedantic mode, arguments which have the same
prototype on the C side are permitted, for example passing a scalar
or an array by reference, or arrays of different rank (both for pass
by reference and pass by descriptors). Assumed types are also
assumed to be OK. This functionality was checked in a few test
cases, so it would make little sense to remove it.

C_PTR is *not* compatible with a random argument passed by reference.
For example, a TYPE(C_PTR), VALUE argument is not compatible
with an INTEGER argument (without VALUE); C_LOC has to be used.

The one-liner in decl.cc may fix some ENTRY problems, I didn't check.

gcc/fortran/ChangeLog:

PR fortran/125092
* decl.cc (add_global_entry): Use string from the heap instead
of a pointer to stack-allocated memory.
* frontend-passes.cc (check_against_globals): If there is an error
already, return early.
* gfortran.h (gfc_symbol_rank): New prototype.
* interface.cc (symbol_rank): Rename to
(gfc_symbol_rank): this.
(gfc_check_dummy_characteristics): Use new function name.
(gfc_check_result_characteristics): Likewise.
(gfc_compare_interfaces): Likewise.
(compare_parameter): Likewise.
(get_sym_storage_size): Likewise.
(gfc_procedure_use): Likewise.
* resolve.cc (decays_to_pointer): New function.
(c_types_conform): New function.
(compare_c_binding_arglists): New function.
(gfc_verify_binding_labels): Check return types and rank
plus argument lists if there is a pre-existing global
symbol.

gcc/testsuite/ChangeLog:

PR fortran/125092
* gfortran.dg/PR100906.f90: Add -Wno-pedantic to options.
* gfortran.dg/PR100911.f90: Likewise.
* gfortran.dg/PR100915.f90: Likewise.
* gfortran.dg/PR94327.f90: Likewise.
* gfortran.dg/PR94331.f90: Likewise.
* gfortran.dg/bind_c_procs_4.f90: Add error messages, remove
warning.
* gfortran.dg/binding_label_tests_25.f90: Add error messages.
* gfortran.dg/binding_label_tests_3.f03: Add error messages.
* gfortran.dg/binding_label_tests_34.f90: Add -Wno-pedantic to
options.
* gfortran.dg/c_char_tests_4.f90: Likewise.
* gfortran.dg/c_char_tests_5.f90: Likewise.
* gfortran.dg/binding_label_tests_36.f90: New test.
* gfortran.dg/binding_label_tests_37.f90: New test.

LoongArch: Improve xor+xor+ior sequence when possible [PR 96692]

Copy the a ^ b ^ (a | c) => (c & ~a) ^ b optimization from RISC-V zbb
(r17-241) as we have the andn instruction in LA64 and LA32S.
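
The identity can be checked with a short C sketch.  The function names
are hypothetical; the exhaustive loop covers all 4-bit operand triples,
which suffices because the operations act on each bit independently.

```c
#include <stdint.h>

/* Original form: two XORs and one inclusive OR.  */
uint64_t
xor_xor_ior (uint64_t a, uint64_t b, uint64_t c)
{
  return a ^ b ^ (a | c);
}

/* Rewritten form: one AND-NOT and one XOR.  */
uint64_t
andn_xor (uint64_t a, uint64_t b, uint64_t c)
{
  return (c & ~a) ^ b;
}

/* Returns 1 if both forms agree for all 4-bit operand triples.  */
int
forms_agree (void)
{
  for (uint64_t a = 0; a < 16; a++)
    for (uint64_t b = 0; b < 16; b++)
      for (uint64_t c = 0; c < 16; c++)
        if (xor_xor_ior (a, b, c) != andn_xor (a, b, c))
          return 0;
  return 1;
}
```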

PR rtl-optimization/96692

gcc/

* config/loongarch/loongarch.md (define_split): New splitters
turning a ^ b ^ (a | c) => (c &~ a) ^ b.

gcc/testsuite/

* gcc.target/loongarch/pr96692.c: New test.

libstdc++: Merge __type_identity and type_identity for C++20.

The components that used __type_identity in C++20 mode (due to source
compatibility with older standards) led to instantiation of a class
template separate from std::type_identity for each used type. This
patch makes __type_identity an alias to type_identity if the latter is
available.

libstdc++-v3/ChangeLog:

* include/std/type_traits (__type_identity, __type_identity_t)
[__cpp_lib_type_identity]: Define as alias to type_identity
and its nested type respectively.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

[RISC-V] Drop unused parameters to restore bootstrap

Bootstraps on RISC-V recently started failing due to an unused parameter. I've
only lightly tested this, but it should restore bootstrapping. Pushing to the
trunk.

gcc/
* config/riscv/riscv-v.cc (expand_const_vector_stepped): Drop unused
SRC parameter. All callers changed.
(expand_const_vector_interleaved_stepped_npatterns): Likewise.
(expand_const_vector): Corresponding changes.

doc: Fix description of GET_MODE_MASK

This updates the description of GET_MODE_MASK in the GCC internals
manual.  Currently it says that the macro can only be used for modes
whose bitsize is less than or equal to HOST_BITS_PER_INT.  That may have
once been correct, but these days it seems to work for modes up to
HOST_BITS_PER_WIDE_INT in size.  The code in genmodes.cc:emit_mode_mask
seems to support this.  It has:

  print_maybe_const_decl ("%sunsigned HOST_WIDE_INT", "mode_mask_array",
                          "NUM_MACHINE_MODES", adj_nunits);
  puts ("\
  ((m) >= HOST_BITS_PER_WIDE_INT)             \\\n\
   ? HOST_WIDE_INT_M1U                        \\\n\
   : (HOST_WIDE_INT_1U << (m)) - 1\n");
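
The computation that this generator code emits can be sketched as a plain
C function.  The sketch assumes a 64-bit HOST_WIDE_INT; the names
SKETCH_HOST_BITS_PER_WIDE_INT and sketch_mode_mask are hypothetical and
do not appear in GCC.

```c
#include <stdint.h>

/* Assumption for this sketch: HOST_WIDE_INT is 64 bits wide.  */
#define SKETCH_HOST_BITS_PER_WIDE_INT 64

/* Mask with the low BITSIZE bits set.  Sizes at or above the host
   wide int width saturate to all ones, which also avoids an
   out-of-range shift.  */
uint64_t
sketch_mode_mask (unsigned int bitsize)
{
  return bitsize >= SKETCH_HOST_BITS_PER_WIDE_INT
         ? UINT64_MAX
         : (UINT64_C (1) << bitsize) - 1;
}
```

Nothing in this computation depends on the width of int, which is
consistent with the limit being HOST_BITS_PER_WIDE_INT.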

gcc/ChangeLog:

* doc/rtl.texi (Machine Modes): Update description of
GET_MODE_MASK to use HOST_BITS_PER_WIDE_INT instead of
HOST_BITS_PER_INT as the upper limit of the input mode's bitsize.

libstdc++: Fix reserve of size_t(-1) elements in piecewise_constant_distribution. [PR113761]

The piecewise_constant_distribution constructor taking a std::initializer_list il
unconditionally reserved il.size()-1 elements in _M_den. When il.size() was zero,
this led to unsigned overflow and an attempt to allocate size_t(-1) elements.
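
The arithmetic behind the failure can be shown in isolation. The two helper
names below are hypothetical and exist only for illustration; they are not
part of libstdc++.

```cpp
#include <cstddef>

// Unguarded: for n == 0 the unsigned subtraction wraps to size_t(-1).
constexpr std::size_t densities_unguarded(std::size_t n)
{ return n - 1; }

// Guarded: with fewer than two boundaries there is no interval at all,
// so there is nothing to reserve.
constexpr std::size_t densities_guarded(std::size_t n)
{ return n < 2 ? 0 : n - 1; }
```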

This patch addresses the above by refactoring the constructors of param_type for
both piecewise_constant_distribution and piecewise_linear_distribution to
exit early (without populating the internal vectors) if the range of intervals
contains fewer than two elements. For the constructor accepting a pair of iterators,
this is done by checking the result of __detail::__load_first2, which extracts up to
two elements and returns false if fewer than two are found.
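
A rough sketch of what such a helper could look like. The commit message
does not show the real signature of __detail::__load_first2, so the name
load_first2 and its parameters here are assumptions made for illustration.

```cpp
// Hypothetical helper: reads up to two elements from [first, last),
// advancing `first` past whatever was read.  Returns true only if two
// elements were available.  Works with single-pass input iterators.
template<typename InputIt, typename T>
constexpr bool load_first2(InputIt& first, InputIt last, T& a, T& b)
{
  if (first == last)
    return false;
  a = *first;
  ++first;
  if (first == last)
    return false;
  b = *first;
  ++first;
  return true;
}
```

A caller can return early when the result is false, and otherwise compare
`first` with `last` to tell the exactly-two case from longer ranges.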

Furthermore, if the number of intervals is equal to two (i.e. the iterator __bbegin
is at __bend after __load_first2), we store the density value on the stack and call
the newly introduced _M_initialize2 helper, which does not populate the internal
vectors if the __ints and __dens values correspond to the default configuration.

With both of the above changes, we no longer populate _M_int and _M_den with default
values, so the corresponding code that clears them in _M_initialize is no longer
necessary. This avoids any unnecessary memory allocations.

For the constructor accepting two iterators, we reserve the required space in the
_M_int vector if the iterators are forward (or model sized_sentinel in C++20). The
_M_den initialization is performed afterwards, so the size of _M_int is already
determined (even for input iterators) and can be used for the call to reserve.

Finally, the _M_initialize members are renamed to _M_configure, so that the new
implementation, which skips the check, is not used by a TU compiled with another
version that would invoke it with an empty vector and thus retrigger the issue.

PR libstdc++/113761

libstdc++-v3/ChangeLog:

* include/bits/random.h
(piecewise_constant_distribution::param_type::_M_initialize)
(piecewise_linear_distribution::param_type::_M_initialize): Remove.
(piecewise_constant_distribution::param_type::_M_configure)
(piecewise_linear_distribution::param_type::_M_configure): Define.
(piecewise_constant_distribution::param_type::_M_initialize2)
(piecewise_linear_distribution::param_type::_M_initialize2): Declare.
* include/bits/random.tcc (__detail::__load_first2): Define.
(piecewise_constant_distribution::param_type::_M_initialize)
(piecewise_linear_distribution::param_type::_M_initialize):
Rename to...
(piecewise_constant_distribution::param_type::_M_configure)
(piecewise_linear_distribution::param_type::_M_configure):
Renamed implementation of _M_initialize, that removes checks for
default values.
(piecewise_constant_distribution::param_type::_M_initialize2)
(piecewise_linear_distribution::param_type::_M_initialize2): Define.
(piecewise_constant_distribution::param_type::param_type)
(piecewise_linear_distribution::param_type::param_type):
Exit early for fewer than two intervals. Use _M_initialize2 to handle
the two-interval case. Reserve _M_int for the iterator case.
* testsuite/26_numerics/random/piecewise_constant_distribution/cons/range.cc:
Test input and forward iterators, in addition to random_access ones.
* testsuite/26_numerics/random/piecewise_linear_distribution/cons/range.cc:
Likewise.
* testsuite/26_numerics/random/piecewise_constant_distribution/cons/fallback.cc:
New test.
* testsuite/26_numerics/random/piecewise_linear_distribution/cons/fallback.cc:
New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

OpenMP: Improve interface comment for the omp_deep_mapping lang hooks

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt,
gfc_omp_deep_mapping): Improve interface comment.

gcc/ChangeLog:

* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
lhd_omp_deep_mapping): Improve interface comment.
* langhooks.h (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
lhd_omp_deep_mapping): Likewise.