git.ipfire.org Git - people/ms/gcc.git/log

i386: Fix up builtins used in avx512bf16vlintrin.h [PR108881]

The builtins used in avx512bf16vlintrin.h implementation need both
avx512bf16 and avx512vl ISAs, which the header ensures for them, but
the builtins weren't actually requiring avx512vl, so when used by hand
with just -mavx512bf16 -mno-avx512vl it resulted in ICEs.

Fixed by adding OPTION_MASK_ISA_AVX512VL to their BDESC.

2023-02-24 Jakub Jelinek <jakub@redhat.com>

PR target/108881
* config/i386/i386-builtin.def (__builtin_ia32_cvtne2ps2bf16_v16hi,
__builtin_ia32_cvtne2ps2bf16_v16hi_mask,
__builtin_ia32_cvtne2ps2bf16_v16hi_maskz,
__builtin_ia32_cvtne2ps2bf16_v8hi,
__builtin_ia32_cvtne2ps2bf16_v8hi_mask,
__builtin_ia32_cvtne2ps2bf16_v8hi_maskz,
__builtin_ia32_cvtneps2bf16_v8sf_mask,
__builtin_ia32_cvtneps2bf16_v8sf_maskz,
__builtin_ia32_cvtneps2bf16_v4sf_mask,
__builtin_ia32_cvtneps2bf16_v4sf_maskz,
__builtin_ia32_dpbf16ps_v8sf, __builtin_ia32_dpbf16ps_v8sf_mask,
__builtin_ia32_dpbf16ps_v8sf_maskz, __builtin_ia32_dpbf16ps_v4sf,
__builtin_ia32_dpbf16ps_v4sf_mask,
__builtin_ia32_dpbf16ps_v4sf_maskz): Require also
OPTION_MASK_ISA_AVX512VL.

* gcc.target/i386/avx512bf16-pr108881.c: New test.

(cherry picked from commit 0ccfa3884f638816af0f5a3f0ee2695e0771ef6d)

reassoc: Fold some statements [PR108819]

This spot in update_ops can replace one or both of the assign operands with
constants, creating 1 & 1 and similar expressions which can confuse later
passes until they are folded. Rather than folding both constants by hand
and also handling swapping of operands for commutative ops if the first one
is constant and second one is not, the following patch just uses
fold_stmt_inplace to do that. I think we shouldn't fold more than the
single statement because that could screw up the rest of the pass, we'd have
to mark all those with uids, visited and the like.

2023-02-18 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/108819
* tree-ssa-reassoc.cc (update_ops): Fold new stmt in place.

* gcc.dg/pr108819.c: New test.

(cherry picked from commit 32b5875c911f80d551d006d7473e6f1f8705857a)

libgomp: Fix up some typos in libgomp.texi

I decided to check for repeated the the in libgomp and noticed
there are several occurrences of a typo theads rather than threads
in libgomp.texi.

2023-02-16 Jakub Jelinek <jakub@redhat.com>

* libgomp.texi: Fix typos - theads -> threads.

(cherry picked from commit 0b9bd33d69d0c30330a465e6bad262d90c94d4ea)

i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

get_available_features doesn't depend on cpu_model2->__cpu_{family,model}
and just sets stuff up based on CPUID leaf 1, or some extended ones,
so I wonder why are we calling it separately for Intel, AMD and Zhaoxin
and not for all other CPUs too? I think various programs in the wild
which aren't using __builtin_cpu_{is,supports} just check the various CPUID
leafs and query bits in there, without blacklisting unknown CPU vendors,
so I think even __builtin_cpu_supports ("sse2") etc. should be reliable
if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf
1 or some extended ones. Calling it for all CPUs also means it can be
inlined because there will be just a single caller.

I have tested it on Intel and Martin tested it on AMD, but can't test it
on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in
behavior.

2023-02-09 Jakub Jelinek <jakub@redhat.com>

PR target/100758
* common/config/i386/cpuinfo.h (cpu_indicator_init): Call
get_available_features for all CPUs with max_level >= 1, rather
than just Intel or AMD.

(cherry picked from commit b24e9c083093a9e1b1007933a184c02f7ff058db)

Daily bump.

d: Fix closure fields don't get same alignment as local variable [PR109144]

Local variables with both non-local references and explicit alignment
did not propagate their alignment to either the closure field or closure
frame type, resulting in the closure being misaligned. This is now
correctly set-up when building the frame type.

PR d/109144

gcc/d/ChangeLog:

* d-codegen.cc (build_frame_type): Set frame field and type alignment.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr109144.d: New test.

(cherry picked from commit 46c4be98d1e759a406069487e5dbaad0346e7e7d)

testsuite/108985 - missing vect_simd_clones target requirement on test

Added.

PR testsuite/108985
* gcc.dg/vect/pr108950.c: Require vect_simd_clones.

(cherry picked from commit a2926653ebbc88e8bba335563fa86b44651598d6)

Daily bump.

Fortran: fix bounds check for copying of class expressions [PR106945]

In the bounds check for copying of class expressions, the number of elements
determined from a descriptor, returned as type gfc_array_index_type (i.e. a
signed type), should be converted to the type of the passed element count,
which is of type size_type_node (i.e. unsigned), for use in comparisons.

gcc/fortran/ChangeLog:

PR fortran/106945
* trans-expr.cc (gfc_copy_class_to_class): Convert element counts in
bounds check to common type for comparison.

gcc/testsuite/ChangeLog:

PR fortran/106945
* gfortran.dg/pr106945.f90: New test.

(cherry picked from commit 2cf5f485e0351bb1faf46196a99e524688f3966e)

Fortran: fix ICE with bind(c) in block data [PR104332]

gcc/fortran/ChangeLog:

PR fortran/104332
* resolve.cc (resolve_symbol): Avoid NULL pointer dereference while
checking a symbol with the BIND(C) attribute.

gcc/testsuite/ChangeLog:

PR fortran/104332
* gfortran.dg/bind_c_usage_34.f90: New test.

(cherry picked from commit e20e5d9dc11b64e8eabce6803c91cb5768207083)

ubsan: missed -fsanitize=bounds for compound ops [PR108060]

In this PR we are dealing with a missing .UBSAN_BOUNDS, so the
out-of-bounds access in the test makes the program crash before
a UBSan diagnostic was emitted. In C and C++, c_genericize gets

 a[b] = a[b] | c;

but in C, both a[b] are one identical shared tree (not in C++ because
cp_fold/ARRAY_REF created two same but not identical trees). Since
ubsan_walk_array_refs_r keeps a pset, in C we produce

 a[.UBSAN_BOUNDS (0B, SAVE_EXPR , 8);, SAVE_EXPR ;] = a[b] | c;

because the LHS is walked before the RHS.

Since r7-1900, we gimplify the RHS before the LHS. So the statement above
gets gimplified into

 _1 = a[b];
 c.0_2 = c;
 b.1 = b;
 .UBSAN_BOUNDS (0B, b.1, 8);

With this patch we produce:

 a[b] = a[.UBSAN_BOUNDS (0B, SAVE_EXPR , 8);, SAVE_EXPR ;] | c;

which gets gimplified into:

 b.0 = b;
 .UBSAN_BOUNDS (0B, b.0, 8);
 _1 = a[b.0];

therefore we emit a runtime error before making the bad array access.

I think it's OK that only the RHS gets a .UBSAN_BOUNDS, as in few lines
above: the instrumented array access dominates the array access on the
LHS, and I've verified that

 b = 0;
 a[b] = (a[b], b = -32768, a[0] | c);

works as expected: the inner a[b] is OK but we do emit an error for the
a[b] on the LHS.

For GCC 14, we could apply
<https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613687.html>
since the copy_node doesn't seem to be needed.

PR sanitizer/108060
PR sanitizer/109050

gcc/c-family/ChangeLog:

* c-gimplify.cc (ubsan_walk_array_refs_r): For a MODIFY_EXPR, instrument
the RHS before the LHS.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/bounds-17.c: New test.
* c-c++-common/ubsan/bounds-18.c: New test.
* c-c++-common/ubsan/bounds-19.c: New test.
* c-c++-common/ubsan/bounds-20.c: New test.
* c-c++-common/ubsan/bounds-21.c: New test.

(cherry picked from commit 4d0baeae315ebe7d0ec7682ea3e7c0516027c2b8)

c++: ICE with constexpr lambda [PR107280]

We crash here since r10-3661, the store_init_value hunk in particular.
Before, we called cp_fully_fold_init, so e.g.

{.str=VIEW_CONVERT_EXPR<char[8]>("")}

was folded into

{.str=""}

but now we don't fold and keep the VCE around, and it causes trouble in
cxx_eval_store_expression: in the !refs->is_empty () loop we descend on
.str's initializer but since it's wrapped in a VCE, we skip the STRING_CST
check and then crash on the CONSTRUCTOR_NO_CLEARING.

PR c++/107280

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_store_expression): Strip location wrappers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-lambda28.C: New test.

(cherry picked from commit be20dcc359bcc4677c5b9ce011d3cd7b4ce94a64)

tree-optimization/108950 - widen-sum reduction ICE

When we end up with a widen-sum with an invariant smaller operand
the reduction code uses a wrong vector type for it, causing
IL checking ICEs. The following fixes that and the inefficiency
of using a widen-sum with a widenend invariant operand as well
by actually performing the check the following comment wants.

PR tree-optimization/108950
* tree-vect-patterns.cc (vect_recog_widen_sum_pattern):
Check oprnd0 is defined in the loop.
* tree-vect-loop.cc (vectorizable_reduction): Record all
operands vector types, compute that of invariants and
properly update their SLP nodes.

* gcc.dg/vect/pr108950.c: New testcase.

(cherry picked from commit e3837b6f6c28a1d2cea3a69efbda795ea3fb8816)

tree-optimization/108821 - store motion and volatiles

The following fixes store motion to not re-issue volatile stores
to preserve TBAA behavior since that will result in the number
of volatile accesses changing.

PR tree-optimization/108821
* tree-ssa-loop-im.cc (sm_seq_valid_bb): We can also not
move volatile accesses.

* gcc.dg/tree-ssa/ssa-lim-24.c: New testcase.

(cherry picked from commit 4c4f0f7acd6b96ee744ef598cbea5c7046a33654)

tree-optimization/108816 - vect versioning check split confusion

The split of the versioning condition assumes the definition is
in the condition block which is ensured by the versioning code.
But that only works when we actually have to insert any statements
for the versioning condition. The following adjusts the guard
accordingly and asserts this condition.

PR tree-optimization/108816
* tree-vect-loop-manip.cc (vect_loop_versioning): Adjust
versioning condition split prerequesite, assert required
invariant.

* gcc.dg/torture/pr108816.c: New testcase.

(cherry picked from commit 63471c5008819bbf6ec32a6f4d8701fe57b96fa9)

tree-optimization/108793 - niter compute type mismatch

When computing the number of iterations until wrap types are mixed up,
eventually leading to checking ICEs with a pointer bitwise inversion.
The following uses niter_type for the calculation.

PR tree-optimization/108793
* tree-ssa-loop-niter.cc (number_of_iterations_until_wrap):
Use convert operands to niter_type when computing num.

* gcc.dg/torture/pr108793.c: New testcase.

(cherry picked from commit a7e706df2280de4a42f68b6c44401e4348d3593c)

tree-optimization/108724 - vectorized code getting piecewise expanded

This fixes an oversight to when removing the hard limits on using
generic vectors for the vectorizer to enable both SLP and BB
vectorization to use those.  The vectorizer relies on vector lowering
to expand plus, minus and negate to bit operations but vector
lowering has a hard limit on the minimum number of elements per
work item.  Vectorizer costs for the testcase at hand work out
to vectorize a loop with just two work items per vector and that
causes element wise expansion and spilling.

The fix for now is to re-instantiate the hard limit, matching what
vector lowering does.  For the future the way to go is to emit the
lowered sequence directly from the vectorizer instead.

PR tree-optimization/108724
* tree-vect-stmts.cc (vectorizable_operation): Avoid
using word_mode vectors when vector lowering will
decompose them to elementwise operations.

* gcc.target/i386/pr108724.c: New testcase.

(cherry picked from commit dc87e1391c55c666c7ff39d4f0dea87666f25468)

middle-end/108625 - wrong folding due to misinterpreted !

The following fixes a problem with ! handling in genmatch which isn't
conservative enough when intermediate simplifications push to the
sequence but the final operation appears to just pick an existing
(but in this case newly defined in the sequence) operand. The easiest
fix is to disallow adding to the sequence when processing !.

PR middle-end/108625
* genmatch.cc (expr::gen_transform): Also disallow resimplification
from pushing to lseq with force_leaf.
(dt_simplify::gen_1): Likewise.

* gcc.dg/pr108625.c: New testcase.

(cherry picked from commit 605d1297b91c2c7c23ccfe669e66dda5791d1f55)

middle-end/108500 - replace recursive domtree DFS traversal

The following replaces the recursive DFS traversal of the dominator
tree in assign_dfs_numbers with a tree traversal using the fact
that we have recorded parents.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

This makes r13-5325 somewhat obsolete, though not computing the
DFS numbers at all is beneficial in the cases where we perform
immediate CFG manipulations.

OK for trunk and later branch(es)?

Thanks,
Richard.

PR middle-end/108500
* dominance.cc (assign_dfs_numbers): Replace recursive DFS
with tree traversal algorithm.

(cherry picked from commit 97258480438db77e52f4b3947fa2c075b09d3fe1)

tree-optimization/107451 - SLP load vectorization issue

When vectorizing SLP loads with permutations we can access excess
elements when the load vector type is bigger than the group size
and the vectorization factor covers less groups than necessary
to fill it. Since we know the code will only access up to
group_size * VF elements in the unpermuted vector we can simply
fill the rest of the vector with whatever we want. For simplicity
this patch chooses to repeat the last group.

PR tree-optimization/107451
* tree-vect-stmts.cc (vectorizable_load): Avoid loading
SLP group members from group numbers in excess of the
vectorization factor.

* gcc.dg/torture/pr107451.c: New testcase.

(cherry picked from commit 7b2cf5041460859ca4f58e5da1308b7ef9129d8b)

tree-optimization/106904 - bogus -Wstringopt-overflow with vectors

The following avoids CSE of &ps->wp to &ps->wp.hwnd confusing
-Wstringopt-overflow by making sure to produce addresses to the
biggest container from vectorization. For this I introduce
strip_zero_offset_components which turns &ps->wp.hwnd into
&(*ps) and use that to base the vector data references on.
That will also work for addresses with variable components,
alternatively emitting pointer arithmetic via calling
get_inner_reference and gimplifying that would be possible
but likely more intrusive.

This is by no means a complete fix for all of those issues
(avoiding ADDR_EXPRs in favor of pointer arithmetic might be).
Other passes will have similar issues.

In theory that might now cause false negatives.

PR tree-optimization/106904
* tree.h (strip_zero_offset_components): Declare.
* tree.cc (strip_zero_offset_components): Define.
* tree-vect-data-refs.cc (vect_create_addr_base_for_vector_ref):
Strip zero offset components before building the address.

* gcc.dg/Wstringop-overflow-pr106904.c: New testcase.

(cherry picked from commit f8d136e50e6f82cba793483d910a2b2643108508)

Daily bump.

d: Fix undefined reference to lambda defined in private enum [PR109108]

Previously lambdas were connected to the module they were defined in.
Now they are emitted into every referencing compilation unit, and are
given one-only linkage.

PR d/109108

gcc/d/ChangeLog:

* decl.cc (function_defined_in_root_p): Remove.
(get_symbol_decl): Set DECL_LAMBDA_FUNCTION_P on function literals.
(start_function): Unconditionally unset DECL_EXTERNAL
(set_linkage_for_decl): Give lambda functions one-only linkage.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/imports/pr109108.d: New test.
* gdc.dg/torture/pr109108.d: New test.

(cherry picked from commit 423d34f61c43e400f0d5b837fe93c83963b2ecdd)

Enable scatter for generic

2023-03-06 Jan Hubicka <hubicka@ucw.cz>

PR target/108429
* config/i386/x86-tune.def (X86_TUNE_USE_SCATTER_2PARTS): Enable for
generic.
(X86_TUNE_USE_SCATTER_4PARTS): Likewise.
(X86_TUNE_USE_SCATTER): Likewise.

(cherry picked from commit b83acefb0409056b566133f66843ead6c04b8474)

Enable 512 bit vector for zen4

While internally 512 registers are splits into two 256 halves, 512 bit vectors
reduces number of instructions to retire and has chance to improve paralelism.
There are few tsvc benchmarks that improves significantly:

           runtime
benchmark  256bit  512bit
s2275      48.57   20.67    -58%
s311       32.29   16.06    -50%
s312       32.30   16.07    -50%
vsumr      32.30   16.07    -50%
s314       10.77   5.42     -50%
s313       21.52   10.85    -50%
vdotr      43.05   21.69    -50%
s316       10.80   5.64     -48%
s235       61.72   33.91    -45%
s161       15.91   9.95     -38%
s3251      32.13   20.31    -36%

And there are no benchmarks with off-noise regression.  The basic matrix
multiplication loop improves by 32%.  It is also expected that 512 bit
vectors are more power effecient (I can't masure that).

The down side is that loops with low trip counts may get slower when the
unvectorized prologue and epilogue is hit more often.  With SPECfp this
problem happens with x264 (12% regression) and bwaves (6% regression)
and this is tracked in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
and will need more work on vectorizer to support masked epilogues.

After some additional testing it seems that using 512 bit vectors by
default is now overall better choice.

Bootstrapped/regtested x86_64-linux. Plan to commit it tomorrow.

* config/i386/x86-tune.def (X86_TUNE_AVX256_OPTIMAL): Turn off
for znver4.

(cherry picked from commit a7502c4a614238ac3f80271886b217b156bdf923)

Daily bump.

c++: top level bind when rewriting coroutines [PR106188]

In the edge case of a coroutine not containing any locals, the ifcd/switch
temporaries would get added to the coroutine frame, corrupting its
layout. To prevent this, we can make sure there is always a BIND_EXPR at
the top of the function body, and thus, always a place for our new
temporaries to go without interfering with the coroutine frame.

PR c++/106188 - Incorrect frame layout after transforming conditional statement without top-level bind expression
PR c++/106713 - if (co_await ...) crashes with a jump to ud2

PR c++/106188
PR c++/106713

gcc/cp/ChangeLog:

* coroutines.cc (coro_rewrite_function_body): Ensure we have a
BIND_EXPR wrapping the function body.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr106188.C: New test.

d: Delay removing DECL_EXTERNAL from thunks until funcion has finished

Second part to fixing PR109108, don't blindly generate the associated
function definition of all referenced thunks in the compilation. Just
delay finishing a thunk until the function gets codegen itself. If the
function never gets a definition, then the thunk is left as "extern".

gcc/d/ChangeLog:

* decl.cc (finish_thunk): Unset DECL_EXTERNAL on thunk.
(make_thunk): Set DECL_EXTERNAL on thunk, don't call build_decl_tree.
(finish_function): Call finish_thunk on forward referenced thunks.

(cherry picked from commit d1bddcaf15a362d88c29337295a0aaaaaa037642)

d: Refactor DECL_ARGUMENT and DECL_RESULT generation to own function

When looking into PR109108, the reason why things go awry is because
of the logic around functions with thunks - they have their definitions
generated even when they are external. This subsequently then relied on
the detection of whether a function receiving codegen really is extern
or not, and this check ultimately prunes too much.

This is a first step to both removing the call to `build_decl_tree' from
`make_thunk' and the pruning of symbols within the `build_decl_tree'
visitor method for functions. Move the generation of DECL_ARGUMENT and
DECL_RESULT out of `build_decl_tree' and into their own functions.

gcc/d/ChangeLog:

* decl.cc (get_fndecl_result): New function.
(get_fndecl_arguments): New function.
(DeclVisitor::visit (FuncDeclaration *)): Adjust to call
get_fndecl_arguments.
(make_thunk): Adjust to call get_fndecl_arguments and
get_fndecl_result.
(start_function): Adjust to call get_fndecl_result.

(cherry picked from commit 499b07700f0e679a490c2e3b80ca7c382dd737ab)

Daily bump.

fortran: Reuse associated_dummy memory if previously allocated [PR108923]

This avoids making the associted_dummy field point to a new memory chunk
if it's already pointing somewhere, in which case doing so would leak the
previously allocated chunk.

PR fortran/108923

gcc/fortran/ChangeLog:

* intrinsic.cc (get_intrinsic_dummy_arg,
set_intrinsic_dummy_arg): Rename the former to the latter.
Remove the return value, add a reference to the lhs as argument,
and do the pointer assignment inside the function. Don't do
it if the pointer is already non-NULL.
(sort_actual): Update caller.

(cherry picked from commit 5c638095e7e0fa4de4e4f7326384a86830b25732)

fortran: Plug leak of associated_dummy memory. [PR108923]

This fixes a memory leak by accompanying the release of
gfc_actual_arglist elements' memory with a release of the
associated_dummy field memory (if allocated).
Actual argument copy is adjusted as well so that each copy can free
its field independently.

PR fortran/108923

gcc/fortran/ChangeLog:

* expr.cc (gfc_free_actual_arglist): Free associated_dummy
memory.
(gfc_copy_actual_arglist): Make a copy of the associated_dummy
field if it is set in the original element.

(cherry picked from commit 24c9edfa73632276d7698c103f35833f29804d98)

Daily bump.

Fix PR 105532: match.pd patterns calling tree_nonzero_bits with vector types

Even though this PR was reported with an ubsan issue, the problem is
tree_nonzero_bits is being called with an expression which is a vector type.
This fixes three patterns I noticed which does that.
And adds a testcase for one of the patterns.

Committed after a bootstrapped and tested on x86_64-linux-gnu with no regressions

gcc/ChangeLog:

PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/vector-shift-1.c: New test.

(cherry picked from commit 193fccaa5c3525e979a989835c47c76d2c49d10c)

Daily bump.

tree-optimization: [PR108684] ICE in verify_ssa due to simple_dce_from_worklist

In simple_dce_from_worklist, we were removing an inline-asm which had a vdef.
We should not be removing inline-asm which have a vdef as this code
does not check to the store.
This fixes that oversight. This was a latent bug exposed recently
by both VRP and removal of stores to static starting to use
simple_dce_from_worklist.

Backported after bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/108684

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist):
Check all ssa names and not just non-vdef ones
before accepting the inline-asm.
Call unlink_stmt_vdef on the statement before
removing it.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/dce-inline-asm-1.c: New test.
* gcc.c-torture/compile/dce-inline-asm-2.c: New test.
* gcc.dg/tree-ssa/pr108684-1.c: New test.

co-authored-by: Andrew Macleod <amacleod@redhat.com>
(cherry picked from commit 6a5cb782d1486b378d70857c8efae558da0eb2cc)

libstdc++: Add missing free functions for atomic_flag [PR103934]

This patch adds -
  atomic_flag_wait
  atomic_flag_wait_explicit
  atomic_flag_notify
  atomic_flag_notify_explicit

Which were missed when commit 83a1be introduced C++20 atomic wait.

libstdc++-v3/ChangeLog:

PR libstdc++/103934
* include/std/atomic (atomic_flag_wait): Add.
(atomic_flag_wait_explicit): Add.
(atomic_flag_notify): Add.
(atomic_flag_notify_explicit): Add.
* testsuite/29_atomics/atomic_flag/wait_notify/1.cc:
Add test case to cover missing atomic_flag free functions.

(cherry picked from commit 56cf9372c0596c4df4003c72dc4665a306fbfe31)

Daily bump.

OpenMP/Fortran: Fix handling of optional is_device_ptr + bind(C) [PR108546]

For is_device_ptr, optional checks should only be done before calling
libgomp, afterwards they are NULL either because of absent or, by
chance, because it is unallocated or unassociated (for pointers/allocatables).

Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.

PR middle-end/108546

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.

gcc/ChangeLog:

* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.

(cherry picked from commit 96ff97ff6574666a5509ae9fa596e7f2b6ad4f88)

Daily bump.

c++: ICE initing lifetime-extended constexpr var [PR107079]

We ICE on the simple:

 struct X { const X* x = this; };
 constexpr const X& x = X{};

where store_init_value initializes 'x' with

 &TARGET_EXPR <D.2768, {.x=(const struct X *) &<PLACEHOLDER_EXPR struct X>}>

but we must lifetime-extend via extend_ref_init_temps and we get

 _ZGR1x_.x = (const struct X *) &<PLACEHOLDER_EXPR struct X> >>>;, (const struct X &) &_ZGR1x_;

Since 'x' was declared constexpr, we do cxx_constant_init and we hit
the preeval code added in r269003 while evaluating the INIT_EXPR:

 _ZGR1x_.x = (const struct X *) &<PLACEHOLDER_EXPR struct X> >>>

but we have no ctx.ctor or ctx.object here so lookup_placeholder won't
find anything that could replace X and we ICE on
7861 /* A placeholder without a referent. We can get here when
7862 checking whether NSDMIs are noexcept, or in massage_init_elt;
7863 just say it's non-constant for now. */
7864 gcc_assert (ctx->quiet);
because cxx_constant_init means !ctx->quiet. It's not correct that
there isn't a referent. I think the following patch is a pretty
straightforward fix: pass the _ZGR var down to maybe_constant_init so
that it can replace the PLACEHOLDER_EXPR with _ZGR1x_.

The commented assert in the test doesn't pass: we complain that _ZGR1x_
isn't a constexpr variable because we don't implement DR2126 (PR101588).

PR c++/107079

gcc/cp/ChangeLog:

* call.cc (set_up_extended_ref_temp): Pass var to maybe_constant_init.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-nsdmi2.C: New test.

(cherry picked from commit 67b82bc1b29b82e4577df9491793624f3a8eb12f)

c++: error with constexpr operator() [PR107939]

Similarly to PR107938, this also started with r11-557, whereby cp_finish_decl
can call check_initializer even in a template for a constexpr initializer.

Here we are rejecting

 extern const Q q;

 template<int>
 constexpr auto p = q(0);

even though q has a constexpr operator(). It's deemed non-const by
decl_maybe_constant_var_p because even though 'q' is const it is not
of integral/enum type.

If fun is not a function pointer, we don't know if we're using it as an
lvalue or rvalue, so with this patch we pass 'any' for want_rval. With
that, p_c_e/VAR_DECL doesn't flat out reject the underlying VAR_DECL.

PR c++/107939

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>: Pass
'any' when recursing on a VAR_DECL and not a pointer to function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ74.C: Remove dg-error.
* g++.dg/cpp1y/var-templ77.C: New test.

(cherry picked from commit e09bc034d1b4d692b409fa5af52ae34480a6f4dc)

c++: thinko in extract_local_specs [PR108998]

In order to fix PR100295, r13-4730-g18499b9f848707 attempted to make
extract_local_specs walk the given pattern twice, ignoring unevaluated
operands the first time around so that we prefer to process a local
specialization in an evaluated context if it appears in one (we process
each local specialization once even if it appears multiple times in the
pattern).

But there's a thinko in the patch, namely that we don't actually walk
the pattern twice since we don't clear the visited set for the second
walk (to avoid processing a local specialization twice) and so the root
node (and any node leading up to an unevaluated operand) is considered
visited already. So the patch effectively made extract_local_specs
ignore unevaluated operands altogether, which this testcase demonstrates
isn't quite safe (extract_local_specs never sees 'aa' and we don't record
its local specialization, so later we try to specialize 'aa' on the spot
with the args {{int},{17}} which causes us to nonsensically substitute
its auto with 17.)

This patch fixes this by refining the second walk to start from the
trees we skipped over during the first walk.

PR c++/108998

gcc/cp/ChangeLog:

* pt.cc (el_data::skipped_trees): New data member.
(extract_locals_r): Push to skipped_trees any unevaluated
contexts that we skipped over.
(extract_local_specs): For the second walk, start from each
tree in skipped_trees.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic11.C: New test.

(cherry picked from commit 341e6cd8d603a334fd34657a6b454176be1c6437)

c++: get_nsdmi in template context [PR108116]

Here during ahead of time checking of C{}, we indirectly call get_nsdmi
for C::m from finish_compound_literal, which in turn calls
break_out_target_exprs for C::m's (non-templated) initializer, during
which we build a call to A::~A and check expr_noexcept_p for it (from
build_vec_delete_1). But this is all done with processing_template_decl
set, so the built A::~A call is templated (whose form was recently
changed by r12-6897-gdec8d0e5fa00ceb2) which expr_noexcept_p doesn't
expect, and we crash.

This patch fixes this by clearing processing_template_decl before
the call to break_out_target_exprs from get_nsdmi. And since it more
generally seems we shouldn't be seeing (or producing) non-templated
trees in break_out_target_exprs, this patch also adds an assert to
that effect.

PR c++/108116

gcc/cp/ChangeLog:

* constexpr.cc (maybe_constant_value): Clear
processing_template_decl before calling break_out_target_exprs.
* init.cc (get_nsdmi): Likewise.
* tree.cc (break_out_target_exprs): Assert processing_template_decl
is cleared.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template24.C: New test.

(cherry picked from commit cf59c8983ef6590f0d69014f8dc8778b5b7691c6)

c++: template friend with variadic constraints [PR107853]

When instantiating a constrained hidden template friend, we substitute
into its template-head requirements in tsubst_friend_function. For this
substitution we use the template's full argument vector whose outer
levels correspond to the instantiated class's arguments and innermost
level corresponds to the template's own level-lowered generic arguments.

But for A<int>::f here, for which the relevant argument vector is
{{int}, {Us...}}, the substitution into (C<Ts, Us> && ...) triggers the
assert in use_pack_expansion_extra_args_p since one argument is a pack
expansion and the other isn't.

And for A<int, int>::f, for which the relevant argument vector is
{{int, int}, {Us...}}, the use_pack_expansion_extra_args_p assert would
also trigger but we first get a bogus "mismatched argument pack lengths"
error from tsubst_pack_expansion.

Sidestepping the question of whether tsubst_pack_expansion should be
able to handle such substitutions, it seems we can work around this by
using only the instantiated class's arguments and not also the template
friend's own generic arguments, which is consistent with how we normally
substitute into the signature of a member template.

PR c++/107853

gcc/cp/ChangeLog:

* constraint.cc (maybe_substitute_reqs_for): Substitute into
the template-head requirements of a template friend using only
its outer arguments via outer_template_args.
* cp-tree.h (outer_template_args): Declare.
* pt.cc (outer_template_args): Define, factored out and
generalized from ...
(ctor_deduction_guides_for): ... here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend12.C: New test.
* g++.dg/cpp2a/concepts-friend13.C: New test.

(cherry picked from commit bd1fc4a219d8c0fad0ec41002e895b49e384c1c2)

c++: explicit specialization and trailing requirements [PR107864]

Here we're crashing when using the explicit specialization of the
function template g with trailing requirements ultimately because
earlier decls_match (called indirectly from register_specialization) for
for the explicit specialization returned false since the template has
trailing requirements whereas the specialization doesn't.

In r12-2230-gddd25bd1a7c8f4, we fixed a similar issue concerning template
requirements instead of trailing requirements.  We could extend that fix
to ignore trailing requirement mismatches for explicit specializations
as well, but it seems cleaner to just propagate constraints from the
specialized template to the specialization when declaring an explicit
specialization so that decls_match will naturally return true in this
case.  And it looks like determine_specialization already does this,
albeit inconsistently (only when specializing a non-template member
function of a class template as in cpp2a/concepts-explicit-spec4.C).

So this patch makes determine_specialization consistently propagate
constraints from the specialized template to the specialization, which
in turn lets us get rid of the function_requirements_equivalent_p special
case added by r12-2230.

PR c++/107864

gcc/cp/ChangeLog:

* decl.cc (function_requirements_equivalent_p): Don't check
DECL_TEMPLATE_SPECIALIZATION.
* pt.cc (determine_specialization): Propagate constraints when
specializing a function template too.  Simplify by using
add_outermost_template_args.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/explicit-spec1a.C: New test.

(cherry picked from commit 36cabc257dfb7dd4f7625896891f6c5b195a0241)

c++: requires-expr and access checking [PR107179]

Like during satisfaction, we also need to avoid deferring access checks
during substitution of a requires-expr because the outcome of an access
check can determine the value of the requires-expr. Otherwise (in
deferred access checking contexts such as within a base-clause), the
requires-expr may evaluate to the wrong result, and along the way a
failed access check may leak out from it into a non-SFINAE context and
cause a hard error (as in the below testcase).

PR c++/107179

gcc/cp/ChangeLog:

* constraint.cc (tsubst_requires_expr): Make sure we're not
deferring access checks.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires31.C: New test.

(cherry picked from commit 40c34beef620ed13c4113c893ed4335ccc1b8f92)

Daily bump.

LoongArch: Stop -mfpu from silently breaking ABI [PR109000]

In the toolchain convention, we describe -mfpu= as:

"Selects the allowed set of basic floating-point instructions and
registers. This option should not change the FP calling convention
unless it's necessary."

Though not explicitly stated, the rationale of this rule is to allow
combinations like "-mabi=lp64s -mfpu=64".  This will be useful for
running applications with LP64S/F ABI on a double-float-capable
LoongArch hardware and using a math library with LP64S/F ABI but native
double float HW instructions, for a better performance.

And now a case in Linux kernel has again proven the usefulness of this
kind of combination.  The AMDGPU DCN kernel driver needs to perform some
floating-point operation, but the entire kernel uses LP64S ABI.  So the
translation units of the AMDGPU DCN driver need to be compiled with
-mfpu=64 (the kernel lacks soft-FP routines in libgcc), but -mabi=lp64s
(or you can't link it with the other part of the kernel).

Unfortunately, currently GCC uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
determine the floating calling convention.  This causes "-mfpu=64"
silently allow using $fa* to pass parameters and return values EVEN IF
-mabi=lp64s is used.  To make things worse, the generated object file
has SOFT-FLOAT set in the eflags field so the linker will happily link
it with other LP64S ABI object files, but obviously this will lead to
bad results at runtime.  And for now all loongarch64 CPU models (-march
settings) implies -mfpu=64 on by default, so the issue makes a single
"-mabi=lp64s" option basically broken (fortunately most projects for eg
the Linux kernel have used -msoft-float which implies both -mabi=lp64s
and -mfpu=none as we've recommended in the toolchain convention doc).

The fix is simple: use TARGET_*_FLOAT_ABI instead.

I consider this a bug fix: the behavior difference from the toolchain
convention doc is a bug, and generating object files with SOFT-FLOAT
flag but parameters/return values passed through FPRs is definitely a
bug.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk and
release/gcc-12 branch?

gcc/ChangeLog:

PR target/109000
* config/loongarch/loongarch.h (FP_RETURN): Use
TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
(UNITS_PER_FP_ARG): Likewise.

gcc/testsuite/ChangeLog:

PR target/109000
* gcc.target/loongarch/flt-abi-isa-1.c: New test.
* gcc.target/loongarch/flt-abi-isa-2.c: New test.
* gcc.target/loongarch/flt-abi-isa-3.c: New test.
* gcc.target/loongarch/flt-abi-isa-4.c: New test.

(cherry picked from commit 75eccddef5784bc5e262af31f535267a9c4e993e)

Daily bump.

Fortran: fix corner case of IBITS intrinsic [PR108937]

gcc/fortran/ChangeLog:

PR fortran/108937
* trans-intrinsic.cc (gfc_conv_intrinsic_ibits): Handle corner case
LEN argument of IBITS equal to BITSIZE(I).

gcc/testsuite/ChangeLog:

PR fortran/108937
* gfortran.dg/ibits_2.f90: New test.

(cherry picked from commit 6cce953ebec274f1468d5d3a0697cf05bb43b8f6)

Fortran: reject invalid CHARACTER length of derived type components [PR96024]

gcc/fortran/ChangeLog:

PR fortran/96024
* resolve.cc (resolve_component): The type of a CHARACTER length
expression must be INTEGER.

gcc/testsuite/ChangeLog:

PR fortran/96024
* gfortran.dg/pr96024.f90: New test.

(cherry picked from commit 31303c9b5bab200754cdb7ef8cd91ae4918f3018)

Fortran: improve checking of character length specification [PR96025]

gcc/fortran/ChangeLog:

PR fortran/96025
* parse.cc (check_function_result_typed): Improve type check of
specification expression for character length and return status.
(parse_spec): Use status from above.
* resolve.cc (resolve_fntype): Prevent use of invalid specification
expression for character length.

gcc/testsuite/ChangeLog:

PR fortran/96025
* gfortran.dg/pr96025.f90: New test.

(cherry picked from commit 6c1b825b3d6499dfeacf7c79dcf4b56a393ac204)

c++: variable template and targ deduction [PR108550]

In this test, we get a bogus error because we failed to deduce the auto in
constexpr auto is_pointer_v = is_pointer<Tp>::value;
to bool. Then ensure_literal_type_for_constexpr_object thinks the object
isn't literal and an error is reported.

This is another case of the interaction between tf_partial and 'auto',
where the auto was not reduced so the deduction failed. In more detail:
we have

 Wrap1<int>()

in the code and we need to perform OR -> fn_type_unification. The targ
list is incomplete, so we do
 tsubst_flags_t ecomplain = complain | tf_partial | tf_fndecl_type;
 fntype = tsubst (TREE_TYPE (fn), explicit_targs, ecomplain, NULL_TREE);
where TREE_TYPE (fn) is struct integral_constant <T402> (void). Then
we substitute the return type, which results in tsubsting is_pointer_v<int>.
is_pointer_v is a variable template with a placeholder type:

 template <class Tp>
 constexpr auto is_pointer_v = is_pointer<Tp>::value;

so we find ourselves in lookup_and_finish_template_variable. tf_partial is
still set, so finish_template_variable -> instantiate_template -> tsubst
won't reduce the level of auto. But then we do mark_used which eventually
calls do_auto_deduction which clears tf_partial, because we want to replace
the auto now. But we hadn't reduced auto's level so this fails. And
since we're not in an immediate context, we emit a hard error.

I suppose that when we reach lookup_and_finish_template_variable it's
probably time to clear tf_partial. (I added an assert and our testsuite
doesn't have a test whereby we get to lookup_and_finish_template_variable
while tf_partial is still active.)

PR c++/108550

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Clear tf_partial.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ70.C: New test.
* g++.dg/cpp1y/var-templ71.C: New test.
* g++.dg/cpp1y/var-templ72.C: New test.

c-family: avoid compile-time-hog in c_genericize [PR108880]

This fixes a compile-time hog with UBSan. This only happened in cc1 but
not cc1plus. The problem is ultimately that c_genericize_control_stmt/
STATEMENT_LIST -> walk_tree_1 doesn't use a hash_set to remember visited
nodes, so it kept on recursing for a long time. We should be able to
use the pset that c_genericize created. We just need to use walk_tree
instead of walk_tree_w_d so that the pset is explicit.

PR c/108880

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_genericize_control_stmt) <case STATEMENT_LIST>: Pass
pset to walk_tree_1.
(c_genericize): Call walk_tree with an explicit pset.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/pr108880.c: New test.

(cherry picked from commit 1370014f2ea02ec185cf1199027575916f79fe63)

c++: ICE with -Wmismatched-tags and member template [PR106259]

-Wmismatched-tags warns about the (harmless) struct/class mismatch.
For, e.g.,

 template<typename T> struct A { };
 class A<int> a;

it works by adding A<T> to the class2loc hash table while parsing the
class-head and then, while parsing the elaborate type-specifier, we
add A<int>. At the end of c_parse_file we go through the table and
warn about the class-key mismatches. In this PR we crash though; we
have

 template<typename T> struct A {
 template<typename U> struct W { };
 };
 struct A<int>::W<int> w; // #1

where while parsing A and #1 we've stashed
 A<T>
 A<T>::W
 A<int>::W<int>
into class2loc. Then in class_decl_loc_t::diag_mismatched_tags TYPE
is A<int>::W<int>, and specialization_of gets us A<int>::W, which
is not in class2loc, so we crash on gcc_assert (cdlguide). But it's
OK not to have found A<int>::W, we should just look one "level" up,
that is, A<T>::W.

It's important to handle class specializations, so e.g.

 template<>
 struct A<char> {
 template<typename U>
 class W { };
 };

where W's class-key is different than in the primary template above,
so we should warn depending on whether we're looking into A<char>
or into a different instantiation.

PR c++/106259

gcc/cp/ChangeLog:

* parser.cc (class_decl_loc_t::diag_mismatched_tags): If the first
lookup of SPEC didn't find anything, try to look for
most_general_template.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wmismatched-tags-11.C: New test.

(cherry picked from commit 71afd0628419c5d670701cb35bc9860380c7d9fb)

c++: can't eval PTRMEM_CST in incomplete class [PR107574]

Here we're attempting to evaluate a PTRMEM_CST in a class that hasn't
been completed yet, but that doesn't work:

        /* We can't lower this until the class is complete.  */
        if (!COMPLETE_TYPE_P (DECL_CONTEXT (member)))
          return cst;

and then this unlowered PTRMEM_CST is used as EXPR in

    tree op1 = build_nop (ptrdiff_type_node, expr);

and we crash in a subsequent cp_fold_convert which gets type=ptrdiff_type_node,
expr=PTRMEM_CST and does

  else if (TREE_CODE (expr) == PTRMEM_CST
           && same_type_p (TYPE_PTRMEM_CLASS_TYPE (type),
                           PTRMEM_CST_CLASS (expr)))

where TYPE_PTRMEM_CLASS_TYPE (type) is going to crash since the type
is ptrdiff_type_node.  We could just add a TYPE_PTRMEM_P check before
accessing TYPE_PTRMEM_CLASS_TYPE but I think it's nicer to explain why
we couldn't evaluate the expression.

PR c++/107574

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Emit an error when
a PTRMEM_CST cannot be evaluated.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/ptrmem-cst1.C: New test.

(cherry picked from commit de81e06273c613d7e06cbe2c8d9e72826c638056)

c++: ICE with constexpr variable template [PR107938]

Since r11-557, cp_finish_decl can call check_initializer even in
a template for a constexpr initializer. That ultimately leads to
convert_for_assignment and check_address_or_pointer_of_packed_member,
where we crash, because it doesn't expect that the CALL_EXPR is
a function object. Q has a constexpr operator(), but since we're
in a template, q(0) is a CALL_EXPR whose CALL_EXPR_FN is just
a VAR_DECL; it hasn't been converted to Q::operator<int>(&q, 0) yet.
I propose to robustify check_address_or_pointer_of_packed_member.

var-templ74.C has an XFAIL, subject to 107939.

I noticed that our -Waddress-of-packed-member tests weren't testing
member functions, added thus. (I was tempted to check
FUNCTION_POINTER_TYPE_P but that doesn't include METHOD_TYPE.)

PR c++/107938

gcc/c-family/ChangeLog:

* c-warn.cc (check_address_or_pointer_of_packed_member): Check
POINTER_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ73.C: New test.
* g++.dg/cpp1y/var-templ74.C: New test.
* g++.dg/warn/Waddress-of-packed-member3.C: New test.

(cherry picked from commit ea718febab2a1f6e58806738abf70f1c73c6a308)

Daily bump.

s390: libatomic: Fix 16 byte atomic {cas,load,store}

This is a follow-up to commit a4c6bd0821099f6b8c0f64a96ffd9d01a025c413
introducing a runtime check for alignment for 16 byte atomic
compare-exchange, load, and store.

libatomic/ChangeLog:

* config/s390/cas_n.c: New file.
* config/s390/load_n.c: New file.
* config/s390/store_n.c: New file.

(cherry picked from commit 9056d0df830c5a295d7594d517d409d10476990d)

d: Fix ICE on explicit immutable struct import [PR108877]

Const and immutable types are built as variants of the type they are
derived from, and TYPE_STUB_DECL is not set for these variants.

PR d/108877

gcc/d/ChangeLog:

* imports.cc (ImportVisitor::visit (EnumDeclaration *)): Call
make_import on TYPE_MAIN_VARIANT.
(ImportVisitor::visit (AggregateDeclaration *)): Likewise.
(ImportVisitor::visit (ClassDeclaration *)): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/imports/pr108877a.d: New test.
* gdc.dg/pr108877.d: New test.

(cherry picked from commit ce1cea3e22f58bbddde017f8a92e59bae8892339)

Daily bump.

asan: adjust module name for global variables

As mentioned in the PR, when we use LTO, we wrongly use ltrans output
file name as a module name of a global variable. That leads to a
non-reproducible output.

After the suggested change, we emit context name of normal global
variables. And for artificial variables (like .Lubsan_data3), we use
aux_base_name (e.g. "./a.ltrans0.ltrans").

PR sanitizer/108834

gcc/ChangeLog:

* asan.cc (asan_add_global): Use proper TU name for normal
global variables (and aux_base_name for the artificial one).

gcc/testsuite/ChangeLog:

* c-c++-common/asan/global-overflow-1.c: Test line and column
info for a global variable.

(cherry picked from commit 94c9b1bb79f63d000ebb05efc155c149325e332d)

rs6000/test: Adjust some test cases on partial vector [PR96373]

As Richard pointed out in [1] and the testing on Power10, the
proposed fix for PR96373 requires some updates on a few rs6000
test cases which adopt partial vector. This patch is to fix
all of them with one extra option "-fno-trapping-math" as
Richard suggested.

Besides, the original test case also failed on Power10 without
Richard's proposed fix, this patch adds it together for a bit
better testing coverage.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610728.html

PR target/96373

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/p9-vec-length-epil-1.c: Add -fno-trapping-math.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/pr96373.c: New test.

(cherry picked from commit 4f5a1198065dc078f8099db628da7b06a2666f34)

Daily bump.

RTEMS: Tune multilib selection

gcc/ChangeLog:

* config/riscv/t-rtems: Keep only -mcmodel=medany 64-bit multilibs.
Add non-compact 32-bit multilibs.

(cherry picked from commit 35a067020e41d97bc3be15b518b3dc2a64b4aae2)

Daily bump.

libstdc++: Simplify three helper functions into one

Broadcast is a very common function. This should reduce compile-time
effort.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/108030
* include/experimental/bits/simd.h (__vector_broadcast):
Implement via __vector_broadcast_impl instead of
__call_with_n_evaluations + 2 lambdas.
(__vector_broadcast_impl): New.

(cherry picked from commit 2e29e2fbeb8936e5c85cefaf547cba42e17e137b)

libstdc++: Fix -Wsign-compare issue

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_builtin.h (_S_set): Compare as
int. The actual range of these indexes is very small.

(cherry picked from commit ffa39f7120f6e83a567d7a83ff4437f6b41036ea)

libstdc++: Add missing constexpr on simd shift implementation

Resolves -Wtautological-compare warnings about `if
(__builtin_is_constant_evaluated())` in the implementations of these
functions.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_x86.h (_S_bit_shift_left)
(_S_bit_shift_right): Declare constexpr. The implementation was
already expecting constexpr evaluation.

(cherry picked from commit fa37ac2b59ed1c379b35dbf9bd58f7849f9fd5b5)

libstdc++: Fix uses of non-reserved names in simd header

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h (__extract_part, split):
Use reserved name for template parameter.

(cherry picked from commit bb920f561e983c64d146f173dc4ebc098441a962)

Daily bump.

libstdc++: Add noexcept-specifier to std::reference_wrapper::operator()

This isn't required by the standard, but there's an LWG issue suggesting
to add it.

Also use __invoke_result instead of result_of, to match the spec in
recent standards.

libstdc++-v3/ChangeLog:

* include/bits/refwrap.h (reference_wrapper::operator()): Add
noexcept-specifier and use __invoke_result instead of result_of.
* testsuite/20_util/reference_wrapper/invoke-noexcept.cc: New test.

(cherry picked from commit e47df5eb56c4e7aca0d3e50826e5aaa1887fa446)

libstdc++: Fix std::filesystem errors with -fkeep-inline-functions [PR108636]

With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internsl. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.

libstdc++-v3/ChangeLog:

PR libstdc++/108636
* include/bits/fs_path.h (path::path(string_view, _Type))
(path::_Cmpt::_Cmpt(string_view, _Type, size_t)): Move inline
definitions to ...
* src/c++17/fs_path.cc: ... here.
* testsuite/27_io/filesystem/path/108636.cc: New test.

(cherry picked from commit db8d6fc572ec316ccfcf70b1dffe3be0b1b37212)

Daily bump.

c++: ICE with redundant capture [PR108829]

Here we crash in is_capture_proxy:

 /* Location wrappers should be stripped or otherwise handled by the
 caller before using this predicate. */
 gcc_checking_assert (!location_wrapper_p (decl));

We only crash with the redundant capture:

 int abyPage = [=, abyPage] { ... }

because prune_lambda_captures is only called when there was a default
capture, and with [=] only abyPage won't be in LAMBDA_EXPR_CAPTURE_LIST.

The problem is that LAMBDA_CAPTURE_EXPLICIT_P wasn't propagated
correctly and so var_to_maybe_prune proceeded where it shouldn't.

Co-Authored by: Patrick Palka <ppalka@redhat.com>

PR c++/108829

gcc/cp/ChangeLog:

* pt.cc (prepend_one_capture): Set LAMBDA_CAPTURE_EXPLICIT_P.
(tsubst_lambda_expr): Pass LAMBDA_CAPTURE_EXPLICIT_P to
prepend_one_capture.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-108829-2.C: New test.
* g++.dg/cpp0x/lambda/lambda-108829.C: New test.

(cherry picked from commit 02d8ab3e4e2f3d9dc12157a98c976d6698e71e29)

aarch64: Fix up bfmlal lane pattern [PR104921]

As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.

This patch fixes the constraint accordingly.

The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually assembles
when such support is available.

gcc/ChangeLog:

PR target/104921
* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
Use correct constraint for operand 3.

gcc/testsuite/ChangeLog:

PR target/104921
* gcc.target/aarch64/pr104921-1.c: New test.
* gcc.target/aarch64/pr104921-2.c: New test.
* gcc.target/aarch64/pr104921.x: Include file for new tests.

(cherry picked from commit 277e1f30a5e4e634304a7b8a532825119f0ea47f)

Daily bump.

LoongArch: Fix multiarch tuple canonization

Multiarch tuple will be coded in file or directory names in
multiarch-aware distros, so one ABI should have only one multiarch
tuple.  For example, "--target=loongarch64-linux-gnu --with-abi=lp64s"
and "--target=loongarch64-linux-gnusf" should both set multiarch tuple
to "loongarch64-linux-gnusf".  Before this commit,
"--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib"
will produce wrong result (loongarch64-linux-gnu).

A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be
used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some
non-technical reason [1].  Note that we cannot make
"loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because
to implement such an alias, we must create thousands of symlinks in the
distro and doing so would be completely unpractical.  This commit also
aligns GCC with the revision.

Tested by building cross compilers with --enable-multiarch and multiple
combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d},
and --{enable,disable}-multilib; and run "xgcc --print-multiarch" then
manually verify the result with eyesight.

[1]: https://github.com/loongson/LoongArch-Documentation/pull/80

gcc/ChangeLog:

* config.gcc (triplet_abi): Set its value based on $with_abi,
instead of $target.
(la_canonical_triplet): Set it after $triplet_abi is set
correctly.
* config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the
multiarch tuple for lp64d "loongarch64-linux-gnu" (without
"f64" suffix).

(cherry picked from commit 017849d9d88f021770a90f12fffec9aa2425ed27)

Daily bump.

libstdc++: Fix incorrect function call in -ffast-math optimization

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_math.h (__hypot): Bitcasting
between scalars requires the __bit_cast helper function instead
of simd_bit_cast.

(cherry picked from commit a5de17d9120dde7e6598a05ea4d1556c2783c69b)

libstdc++: Fix incorrect __builtin_is_constant_evaluated calls

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_not_equal_to, _SimdImplX86::_S_less)
(_SimdImplX86::_S_less_equal): Do not call
__builtin_is_constant_evaluated in constexpr-if.

(cherry picked from commit 1fd3836463c65f695831ef04c7dbda1e7a1794ba)

libstdc++: printf format string fix in testsuite

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/tests/bits/verify.h
(verify::verify): Use %zx for size_t in format string.

(cherry picked from commit 07e4648b4ab86a76ab040ea18a756e388652c882)

libstdc++: Document timeout and timeout-factor of simd tests

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/README.md: Document the timeout
and timeout-factor directives. Minor typo fixed.

(cherry picked from commit b0f4b166ada3b92da5f2917ac3f4397e99d1b58f)

libstdc++: Ensure __builtin_constant_p isn't lost on the way

The more expensive code path should only be taken if it can be optimized
away.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h
(_SimdWrapper::_M_is_constprop_none_of)
(_SimdWrapper::_M_is_constprop_all_of): Return false unless the
computed result still satisfies __builtin_constant_p.

(cherry picked from commit fea34ee491104f325682cc5fb75683b7d74a0a3b)

Fortran: error recovery on invalid assumed size reference [PR104554]

gcc/fortran/ChangeLog:

PR fortran/104554
* resolve.cc (check_assumed_size_reference): Avoid NULL pointer
dereference.

gcc/testsuite/ChangeLog:

PR fortran/104554
* gfortran.dg/pr104554.f90: New test.

(cherry picked from commit a418129273725fd02e881e6fb5e0877287a1356c)

Daily bump.

Fix PR target/90458

This is the incompatibility of -fstack-clash-protection with Windows SEH.
Now the Windows ports always enable TARGET_STACK_PROBE, which means that
the stack is always probed (out of line) so -fstack-clash-protection does
nothing more.

gcc/
PR target/90458
* config/i386/i386.cc (ix86_compute_frame_layout): Disable the
effects of -fstack-clash-protection for TARGET_STACK_PROBE.
(ix86_expand_prologue): Likewise.