git.ipfire.org Git - thirdparty/gcc.git/log

x86: Update gcc.target/i386/apx-interrupt-1.c

ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
pushed in red-zone.  Since

commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Apr 13 12:20:42 2025 -0700

    APX: Don't use red-zone with 32 GPRs and no caller-saved registers

disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
31 .cfi_restore directives.

PR target/119784
* gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
directives.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 5ed2fa4768f3d318b8ace5bd4a095596e06fad7b)

APX: Don't use red-zone with 32 GPRs and no caller-saved registers

Don't use red-zone when there are no caller-saved registers with 32 GPRs
since 128-byte red-zone is too small for 31 GPRs.

gcc/

PR target/119784
* config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone
with 32 GPRs and no caller-saved registers.

gcc/testsuite/

PR target/119784
* gcc.target/i386/pr119784a.c: New test.
* gcc.target/i386/pr119784b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801)

Extend check-function-bodies to allow label and directives

As PR target/116174 shown, we may need to verify labels and the directive
order.  Extend check-function-bodies to support matched output lines to
allow label and directives.

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit d6bb1e257fc414d21bc31faa7ddecbc93a197e3c)

RISC-V: Put jump table in text for large code model

Large code model assume the data or rodata may put far away from
text section. So we need to put jump table in text section for
large code model.

gcc/ChangeLog:

* config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if
large code model.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/jump-table-large-code-model.c: New test.

(cherry picked from commit 1d9e02bb7e0af4f3d3eaaa1a0f4961970aba5560)

RISC-V: Fix vec_duplicate[bimode] expander [PR119572].

Since r15-9062-g70391e3958db79 we perform vector bitmask initialization
via the vec_duplicate expander directly. This triggered a latent bug in
ours where we missed to mask out the single bit which resulted in an
execution FAIL of pr119114.c

The attached patch adds the 1-masking of the broadcast operand.

PR target/119572

gcc/ChangeLog:

* config/riscv/autovec.md: Mask broadcast value.

(cherry picked from commit 716d39f0a248c1003033e6a312c736180790ef70)

aarch64: Split aarch64_combinev16qi before RA [PR115258]

Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second input to the second half of the output).

In the worst case, the RA might allocate things so that the destination
of the aarch64_combinev16qi is the second input followed by the first
input.  In that case, the split form of aarch64_combinev16qi uses three
eors to swap the registers around.

This PR is about a test where this worst case occurred.  And given the
insn description, that allocation doesn't semm unreasonable.

early-ra should (hopefully) mean that we're now better at allocating
subregs of vector registers.  The upcoming RA subreg patches should
improve things further.  The best fix for the PR therefore seems
to be to split the combination before RA, so that the RA can see
the underlying moves.

Perhaps it even makes sense to do this at expand time, avoiding the need
for aarch64_combinev16qi entirely.  That deserves more experimentation
though.

gcc/
PR target/115258
* config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
the split before reload.
* config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize
into a form that handles pseudo registers.

gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258.c: New test.

(cherry picked from commit 39263ed2d39ac1cebde59bc5e72ddcad5dc7a1ec)

aarch64: Avoid unnecessary use of 2-input TBLs [PR115258]

When using TBL for (say) a V4SI permutation, the aarch64 port first
asks target-independent code to lower to a V16QI permutation.
Then, during code generation, an input like:

  (reg:V4SI R)

gets converted to:

  (subreg:V16QI (reg:V4SI R) 0)

aarch64_vectorize_vec_perm_const had:

  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
  if (op0 == op1)
    d.op1 = d.op0;
  else
    d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;

But subregs (unlike regs) are not shared, so the op0 == op1 check
always failed for this case.  We'd then force each subreg into a
fresh register, meaning that during the later:

  aarch64_expand_vec_perm_1 (d->target, d->op0, d->op1, sel);

there is no way for aarch64_expand_vec_perm_1 to realise that
d->op0 and d->op1 are the same value.  It would therefore generate
a two-input TBL in the testcase, even though a single-input TBL
is enough.

I'm not sure forcing subregs to a fresh regiter is a good idea --
it caused problems for copysign & co. -- but that's not something
to fiddle with during stage 4.  Using op0 == op1 for rtx equality
is independently wrong, so we might as well just fix that for now.

The patch gets rid of extra MOVs that are a regression from GCC 14.

The testcase is based on one from Kugan, itself based on TSVC.

gcc/
PR target/115258
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const): Use
d.one_vector_p to decide whether op1 should be a copy of op0.

gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258_2.c: New test.

Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
(cherry picked from commit 31dcf941ac78c4b1b01dc4b2ce9809f0209153b8)

vect: Enforce dr_with_seg_len::align precondition [PR116125]

tree-data-refs.cc uses alignment information to try to optimise
the code generated for alias checks.  The assumption for "normal"
non-grouped, full-width scalar accesses was that the access size
would be a multiple of the alignment.  As Richi notes in the PR,
this is a documented precondition of dr_with_seg_len:

  /* The minimum common alignment of DR's start address, SEG_LEN and
     ACCESS_SIZE.  */
  unsigned int align;

PR115192 was a case in which this assumption didn't hold.  The access
was part of an aligned 4-element group, but only the first 2 elements
of the group were accessed.  The alignment was therefore double the
access size.

In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the
alignment in one of the output routines.  But I think that was
misconceived.  The precondition means that we should cap the
alignment at source instead.

Failure to do that caused a similar wrong code bug in this PR,
where the alignment comes from a short bitfield access rather
than from a group access.

gcc/
PR tree-optimization/116125
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make
the dr_with_seg_len alignment fields describe tha access sizes as
well as the pointer alignment.
* tree-data-ref.cc (create_intersect_range_checks): Don't compensate
for invalid alignment fields here.

gcc/testsuite/
PR tree-optimization/116125
* gcc.dg/vect/pr116125.c: New test.

(cherry picked from commit e8651b80aeb86da935035e218747a6b41b611497)

LoongArch: Fix invalid subregs in xorsign [PR118501]

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign<mode>3): Use
force_lowpart_subreg.

(cherry picked from commit 9ddf4a6cc650360e620c8fd97f550bf833cc177a)

aarch64: Fix invalid subregs in xorsign [PR118501]

In the testcase, we try to use xorsign on:

   (subreg:DF (reg:TI R) 8)

i.e. the highpart of the TI.  xorsign wants to take a V2DF
paradoxical subreg of this, which is rightly rejected as a direct
operation.  In cases like this, we need to force the highpart into
a fresh register first.

gcc/
PR target/118501
* config/aarch64/aarch64.md (@xorsign<mode>3): Use
force_lowpart_subreg.

gcc/testsuite/
PR target/118501
* gcc.c-torture/compile/pr118501.c: New test.

(cherry picked from commit 6612b8e55471fabd2071a9637a06d3ffce2b05a6)

aarch64: Use force_lowpart_subreg in a BFI splitter [PR119133]

lowpart_subreg ICEs are the gift that keeps giving. This is another
case where we need to use force_lowpart_subreg instead, to handle
cases where the input is already a subreg and where the combined
subreg is not allowed as a single operation.

We don't need to check can_create_pseudo_p since the input should
be a hard register rather than a subreg if !can_create_pseudo_p.

gcc/
PR target/119133
* config/aarch64/aarch64.md
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): Use
force_lowpart_subreg.

gcc/testsuite/
PR target/119133
* gcc.dg/torture/pr119133.c: New test.

(cherry picked from commit 5ae621e2e86c00d1fb13ef6839d0c3bace762ac8)

Avoid using POINTER_DIFF_EXPR for overlap checks [PR119399]

In r10-4803-g8489e1f45b50600c I'd used POINTER_DIFF_EXPR to subtract
the two pointers involved in an overlap test. I'm not sure whether
I'd specifically chosen that over MINUS_EXPR or not; if so, the only
reason I can think of is that it is probably faster on targets with
PSImode pointers. Regardless, as the PR points out, subtracting
unrelated pointers using POINTER_DIFF_EXPR is undefined behaviour.

gcc/
PR tree-optimization/119399
* tree-data-ref.cc (create_waw_or_war_checks): Use a MINUS_EXPR
on two converted pointers, rather than converting a POINTER_DIFF_EXPR
on the pointers.

gcc/testsuite/
PR tree-optimization/119399
* gcc.dg/vect/pr119399.c: New test.

(cherry picked from commit 4c8c373495d7d863dfb7102726ac3b4b41685df4)

Add force_lowpart_subreg

optabs had a local function called lowpart_subreg_maybe_copy
that is very similar to the lowpart version of force_subreg.
This patch adds a force_lowpart_subreg wrapper around
force_subreg.

The only difference between the old and new functions is that
the old one asserted success while the new one doesn't.
It's common not to assert elsewhere when taking subregs;
normally a null result is enough.

Later patches will make more use of the new function.

gcc/
* explow.h (force_lowpart_subreg): Declare.
* explow.cc (force_lowpart_subreg): New function.

(cherry picked from commit 5f40d1c0cc6ce91ef28d326b8707b3f05e6f239c)

Make force_subreg emit nothing on failure

While adding more uses of force_subreg, I realised that it should
be more careful to emit no instructions on failure. This kind of
failure should be very rare, so I don't think it's a case worth
optimising for.

gcc/
* explow.cc (force_subreg): Emit no instructions on failure.

(cherry picked from commit 01044471ea39f9be4803c583ef2a946abc657f99)

libstdc++: Adjust comment in <numeric>

We don't need to mention ranges::out_value_result in this comment,
because <numeric> doesn't care about that name.

libstdc++-v3/ChangeLog:

* include/std/numeric: Only mention ranges::iota in comment.

libstdc++: Add test for not using reserved name 'ranges' before C++20

This is a test for a bug that was present on trunk, because 'ranges' is
not a reserved name before C++20.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Check ranges is not used as an
identifier before C++20.

libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>

In r14-7153-gadbc46942aee75 we removed a duplicate definition of
__glibcxx_want_range_iota from <ranges>, but __cpp_lib_ranges_iota
should be defined in <ranges> at all.

libstdc++-v3/ChangeLog:

* include/std/ranges (__glibcxx_want_ranges_iota): Do not
define.

(cherry picked from commit 25775e73ea4d40a55a26b71c42cc6509caf4845f)

libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]

Before this patch, using std::ranges::iota required including
<algorithm> when it should have been sufficient to only include
<numeric>.

For the backport to the release branch ranges::iota is defined in
<bits/ranges_algobase.h> so that it's available in both <numeric> and
<algorithm>. This avoids breaking code that compiles successfully using
existing releases where <algorithm> defines ranges::iota.

libstdc++-v3/ChangeLog:

PR libstdc++/108760
* include/bits/ranges_algo.h (ranges::out_value_result)
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
<bits/ranges_algobase.h>.
* include/bits/ranges_algobase.h (ranges::out_value_result):
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
here.
* include/std/numeric: Include <bits/ranges_algobase.h>.
* testsuite/25_algorithms/iota/1.cc: Renamed to ...
* testsuite/26_numerics/iota/2.cc: ... here.

Signed-off-by: Michael Levine <mlevine55@bloomberg.net>
(cherry picked from commit 0bb1db32ccf54a9de59bea718f7575f7ef22abf5)

libstdc++: Fix ranges::move and ranges::move_backward to use iter_move [PR105609]

The ranges::move and ranges::move_backward algorithms are supposed to
use ranges::iter_move(iter) instead of std::move(*iter), which matters
for an iterator type with an iter_move overload findable by ADL.

Currently those algorithms use std::__assign_one which uses std::move,
so define a new ranges::__detail::__assign_one helper function that uses
ranges::iter_move.

libstdc++-v3/ChangeLog:

PR libstdc++/105609
* include/bits/ranges_algobase.h (__detail::__assign_one): New
helper function.
(__copy_or_move, __copy_or_move_backward): Use new function
instead of std::__assign_one.
* testsuite/25_algorithms/move/constrained.cc: Check that
ADL iter_move is used in preference to std::move.
* testsuite/25_algorithms/move_backward/constrained.cc:
Likewise.

(cherry picked from commit 3866ca796d5281d33f25b4165badacf8f198c6d1)

libstdc++: Reuse std::__assign_one in <bits/ranges_algobase.h>

Use std::__assign_one instead of ranges::__assign_one. Adjust the uses,
because std::__assign_one has the arguments in the opposite order (the
same order as an assignment expression).

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h (ranges::__assign_one): Remove.
(__copy_or_move, __copy_or_move_backward): Use std::__assign_one
instead of ranges::__assign_one.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit d0a9ae1321f01c33b7ee377249cad30187061c0c)

libstdc++: Fix ranges::copy_backward for a single memcpyable element [PR117121]

The result iterator needs to be decremented before writing to it.

Improve the PR 108846 tests for all of std::copy, std::copy_n,
std::copy_backward, and the std::ranges versions.

libstdc++-v3/ChangeLog:

PR libstdc++/117121
* include/bits/ranges_algobase.h (copy_backward): Decrement
output iterator before assigning one element through it.
* testsuite/25_algorithms/copy/108846.cc: Ensure the algorithm's
effects are correct for a single memcpyable element.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.

(cherry picked from commit 27f6b376e8e196c7c85c8b47436cd2f2993768da)

libstdc++: Do not use use memmove for 1-element ranges [PR108846,PR116471]

This commit ports the fixes already applied by r13-6372-g822a11a1e642e0
to the range-based versions of copy/move algorithms.

When doing so, a further bug (PR116471) was discovered in the
implementation of the range-based algorithms: although the algorithms
are already constrained by the indirectly_copyable/movable concepts,
there was a failing static_assert in the memmove path.

This static_assert checked that iterator's value type was assignable by
using the is_copy_assignable (move) type traits. However, this is a
problem, because the traits are too strict when checking for constness;
a type like

struct S { S& operator=(S &) = default; };

is trivially copyable (and thus could benefit of the memmove path),
but it does not satisfy is_copy_assignable because the operator takes
by non-const reference.

Now, the reason for the check to be there is because a type with
a deleted assignment operator like

struct E { E& operator=(const E&) = delete; };

is still trivially copyable, but not assignable. We don't want
algorithms like std::ranges::copy to compile because they end up
selecting the memmove path, "ignoring" the fact that E isn't even
copy assignable.

But the static_assert isn't needed here any longer: as noted before,
the ranges algorithms already have the appropriate constraints; and
even if they didn't, there's now a non-discarded codepath to deal with
ranges of length 1 where there is an explicit assignment operation.

Therefore, this commit removes it. (In fact, r13-6372-g822a11a1e642e0
removed the same static_assert from the non-ranges algorithms.)

libstdc++-v3/ChangeLog:

PR libstdc++/108846
PR libstdc++/116471
* include/bits/ranges_algobase.h (__assign_one): New helper
function.
(__copy_or_move): Remove a spurious static_assert; use
__assign_one for memcpyable ranges of length 1.
(__copy_or_move_backward): Likewise.
* testsuite/25_algorithms/copy/108846.cc: Extend to range-based
algorithms, and cover both memcpyable and non-memcpyable
cases.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.
* testsuite/25_algorithms/move/108846.cc: Likewise.
* testsuite/25_algorithms/move_backward/108846.cc: Likewise.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
(cherry picked from commit 5938e0681c3907b2771ce6717988416b0ddd2f54)

libstdc++: Add missing header to <bits/ranges_algobase.h> for std::__memcmp

As noticed by Michael Levine.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h: Include <bits/stl_algobase.h>.

(cherry picked from commit 674d213ab91871652e96dc2de06e6f50682eebe0)

RISC-V: revert pr114194 tests on gcc-14 [PR118601]

The gcc-14 backport that split the pr114194 testcase for rv32 and rv64
would only generate the expected rv32 sequence if commit
6b315907c0353f71169a7555e653d29a981fef67 had also been backported, but
it wasn't. Without it, we get the same code as before on both rv32
and rv64, so revert to the original test.

for gcc/testsuite/ChangeLog

PR target/118601
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Restore.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: Remove.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: Likewise.

RISC-V: adjust testcase for gcc-14 [PR118182]

The pr118182-2.c testcase backported from gcc-15 depended on the late
combine pass after register allocation to substitute the zero constant
into the pred_broadcast to get to the expected vmv.s.x instruction.
Without that pass, we get a mfmv.s.f instead. Expect that on gcc-14.

for gcc/testsuite/ChangeLog

PR target/118182
* gcc.target/riscv/rvv/autovec/pr118182-2.c: Adjust.

Daily bump.

discriminators: Fix assigning discriminators on edge [PR113546]

The problem here is there was a compare debug since the discriminators
would still take into account debug statements. For the edge we would look
at the first statement after the labels and that might have been a debug statement.
So we need to skip over debug statements otherwise we could get different
discriminators # with and without -g.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/113546

gcc/ChangeLog:

* tree-cfg.cc (first_non_label_stmt): Rename to ...
(first_non_label_nondebug_stmt): This and use gsi_start_nondebug_after_labels_bb.
(assign_discriminators): Update call to first_non_label_nondebug_stmt.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr113546-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit c5ca45b8069229b6ad9bc845f03f46340f6316d7)

c++: wrong targs in satisfaction diagnostic context line [PR99214]

In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a satisfaction failure
error, e.g.  the below testcase without this patch we emit:
  In substitution of '... void A<int>::f<U>() ... [with U = int]'
and with this patch we emit:
  In substitution of '... void A<int>::f<U>() ... [with U = char]'.

This patch fixes this by passing the original 'args' to push_tinst_level,
which ought to properly correspond to 't'.

PR c++/99214

gcc/cp/ChangeLog:

* constraint.cc (satisfy_declaration_constraints): Pass the
original ARGS to push_tinst_level.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic20.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 00966a7fdb1478b3af5254ff3a80a3ef336c5a94)

libstdc++: Document thread-safety for COW std::string [PR21334]

The gcc4-compatible copy-on-write std::string does not conform to the
C++11 requirements on data race avoidance in standard containers.
Specifically, calling non-const member functions such as begin() and
data() needs to do the "copy on write" operation and so is most
definitely a modification of the object. As such, those non-const
members must not be called concurrently with any other uses of the
string object.

libstdc++-v3/ChangeLog:

PR libstdc++/21334
* doc/xml/manual/using.xml: Document that container data race
avoidance rules do not apply to COW std::string.
* doc/html/*: Regenerate.

(cherry picked from commit dd35f66287b7cca196a720c9641e463255dceb1c)

phiopt: Reset the number of iterations information of a loop when changing an exit from the loop [PR117243]

After r12-5300-gf98f373dd822b3, phiopt could get the following bb structure:
      |
    middle-bb -----|
      |            |
      |   |----|   |
    phi<1, 2>  |   |
    cond       |   |
      |        |   |
      |--------+---|

Which was considered 2 loops. The inner loop had esimtate of upper_bound to be 8,
due to the original `for (b = 0; b <= 7; b++)`. The outer loop was already an
infinite one.
So phiopt would come along and change the condition to be unconditionally true,
we change the inner loop to being an infinite one but don't reset the estimate
on the loop and cleanup cfg comes along and changes it into one loop but also
does not reset the estimate of the loop. Then the loop unrolling uses the old estimate
and decides to add an unreachable there.o
So the fix is when phiopt changes an exit to a loop, reset the estimates, similar to
how cleanupcfg does it when merging some basic blocks.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117243
PR tree-optimization/116749

gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Reset loop
estimates if the cond_block was an exit to a loop.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117243-1.c: New test.
* gcc.dg/torture/pr117243-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit b7c69cc072ef0da36439ebc55c513b48e68391b7)

phiopt: Fix value_replacement for middle bb having phi nodes [PR118922]

After r12-5300-gf98f373dd822b3, value_replacement would be able to look at the
following cfg structure:
```
  <bb 5> [local count: 1014686024]:
  if (h_6 != 0)
    goto <bb 7>; [94.50%]
  else
    goto <bb 6>; [5.50%]

  <bb 6> [local count: 114863530]:
  # h_6 = PHI <0(4), 1(5)>

  <bb 7> [local count: 1073741824]:
  # f_8 = PHI <0(5), h_6(6)>
  _9 = f_8 ^ 1;
  a.0_10 = a;
  _11 = _9 + a.0_10;
  if (_11 != -117)
    goto <bb 5>; [94.50%]
  else
    goto <bb 8>; [5.50%]
```

value_replacement would incorrectly think the middle bb (6) was empty and so it decides
to remove condition in bb5 and replacing it with 0 as the function thought it was `h_6 ? 0 : h_6`.
But since the there is an incoming phi node to bb6 defining h_6 that is incorrect.

The fix is to check if there is phi nodes in the middle bb and set empty_or_with_defined_p to false.
This was not needed before r12-5300-gf98f373dd822b3 because the phi would have been dead otherwise due to
other checks.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/118922

gcc/ChangeLog:

* tree-ssa-phiopt.cc (value_replacement): Set empty_or_with_defined_p
to false when there is phi nodes for the middle bb.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr118922-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 7232c005afb5002cdfd0a2dbd0e8b8f2d80250ce)

Daily bump.

Revert very recent backport of changes to the type system

The backport of the change made for PR c/113688 onto the 14 branch a couple
of weeks ago has seriously broken the LTO compiler for the Ada language on
the 14 branch, because it changes the GCC type system for the sake of C in
a way that is not compatible with simple discriminated types in Ada.  To be
more precise, useless_type_conversion_p now returns true for some (view-)
conversions that are needed by the rest of the compiler.

gcc/
PR lto/119792
Revert

Backported from master:
    2024-12-12  Martin Uecker  <uecker@tugraz.at>

PR c/113688
PR c/114014
PR c/114713
PR c/117724
* tree.cc (gimple_canonical_types_compatible_p): Add exception.
(verify_type): Add exception.

gcc/lto/
PR lto/119792
Revert

Backported from master:
    2024-12-12  Martin Uecker  <uecker@tugraz.at>
* lto-common.cc (hash_canonical_type): Add exception.

gcc/testsuite/
* gcc.dg/pr113688.c: Delete.
* gcc.dg/pr114014.c: Likewise.
* gcc.dg/pr114713.c: Likewise.
* gcc.dg/pr117724.c: Likewise

testcase: Add testcase for already fixed PR [PR118476]

This testcase was fixed by r15-3052-gc7b76a076cb2c6ded but is
a testcase that failed in a different fashion and a much older
failure than the one added with r15-3052.

Pushed as obvious after a quick test.

PR tree-optimization/118476

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr118476-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit d45a6502d1ec87d43f1a39f87cca58f1e28369c8)

match: Reject non-ssa name/min invariants in gimple_extract [PR116412]

After the conversion for phiopt's conditional operand
to use maybe_push_res_to_seq, it was found that gimple_extract
will extract out from REALPART_EXPR/IMAGPART_EXPR/VCE and BIT_FIELD_REF,
a memory load. But that extraction was not needed as memory loads are not
simplified in match and simplify. So gimple_extract should return false
in those cases.

Changes since v1:
* Move the rejection to gimple_extract from factor_out_conditional_operation.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116412

gcc/ChangeLog:

* gimple-match-exports.cc (gimple_extract): Return false if op0
was not a SSA name nor a min invariant for REALPART_EXPR/IMAGPART_EXPR/VCE
and BIT_FIELD_REF.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116412-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit c7b76a076cb2c6ded7ae208464019b04cb0531a2)

vec-lowering: Fix ABSU lowering [PR111285]

ABSU_EXPR lowering incorrectly used the resulting type
for the new expression but in the case of ABSU the resulting
type is an unsigned type and with ABSU is folded away. The fix
is to use a signed type for the expression instead.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/111285

gcc/ChangeLog:

* tree-vect-generic.cc (do_unop): Use a signed type for the
operand if the operation was ABSU_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vect-absu-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit ad0084337e901ddaedd48c14e7a5dad9fc2a093e)

backprop: Fix deleting of a phi node [PR116922]

The problem here is remove_unused_var is called on a name that is
defined by a phi node but it deletes it like removing a normal statement.
remove_phi_node should be called rather than gsi_remove for phinodes.

Note there is a possibility of using simple_dce_from_worklist instead
but that is for another day.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116922

gcc/ChangeLog:

* gimple-ssa-backprop.cc (remove_unused_var): Handle phi
nodes correctly.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116922.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit cea87c84eacdb422caeada734ba5138c994d7022)

aarch64: Fix early ra for -fno-delete-dead-exceptions [PR116927]

Early-RA was considering throwing instructions as being dead and removing
them even if -fno-delete-dead-exceptions was in use. This fixes that oversight.

Built and tested for aarch64-linux-gnu.

PR target/116927

gcc/ChangeLog:

* config/aarch64/aarch64-early-ra.cc (early_ra::is_dead_insn): Insns
that throw are not dead with -fno-delete-dead-exceptions.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr116927-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit edec4bfc99744b48da3ffde1e4f39c9aceecfd42)

phiopt: Fix VCE moving by rewriting it into cast [PR116098]

Phiopt match_and_simplify might move a well defined VCE assign statement
from being conditional to being uncondtitional; that VCE might no longer
being defined. It will need a rewrite into a cast instead.

This adds the rewriting code to move_stmt for the VCE case.
This is enough to fix the issue at hand. It should also be using rewrite_to_defined_overflow
but first I need to move the check to see a rewrite is needed into its own function
and that is causing issues (see https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663938.html).
Plus this version is easiest to backport.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116098

gcc/ChangeLog:

* tree-ssa-phiopt.cc (move_stmt): Rewrite VCEs from integer to integer
types to case.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr116098-2.c: New test.
* g++.dg/torture/pr116098-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 1f619fe25925a5f79b9c33962e7a72e1f9fa4444)

Fortran: fix issue with impure elemental subroutine and interface [PR119656]

PR fortran/119656

gcc/fortran/ChangeLog:

* interface.cc (gfc_compare_actual_formal): Fix front-end memleak
when searching for matching interfaces.
* trans-expr.cc (gfc_conv_procedure_call): If there is a formal
dummy corresponding to an absent argument, use its type, and only
fall back to inferred type otherwise.

gcc/testsuite/ChangeLog:

* gfortran.dg/optional_absent_13.f90: New test.

(cherry picked from commit 334545194d9023fb9b2f72ee0dcde8af94930f25)

Add testcase for PR lto/119792

It demonstrates a serious LTO breakage for the Ada language.

gcc/testsuite/
PR lto/119792
* gnat.dg/lto29.adb: New test.
* gnat.dg/lto29_pkg.ads: New helper.

c++: Properly fold <COND_EXPR>.*<COMPONENT> [PR114525]

We've been miscompiling the following since r0-51314-gd6b4ea8592e338 (I
did not go compile something that old, and identified this change via
git blame, so might be wrong)

=== cut here ===
struct Foo { int x; };
Foo& get (Foo &v) { return v; }
void bar () {
  Foo v; v.x = 1;
  (true ? get (v) : get (v)).*(&Foo::x) = 2;
  // v.x still equals 1 here...
}
=== cut here ===

The problem lies in build_m_component_ref, that computes the address of
the COND_EXPR using build_address to build the representation of
  (true ? get (v) : get (v)).*(&Foo::x);
and gets something like
  &(true ? get (v) : get (v))  // #1
instead of
  (true ? &get (v) : &get (v)) // #2
and the write does not go where want it to, hence the miscompile.

This patch replaces the call to build_address by a call to
cp_build_addr_expr, which gives #2, that is properly handled.

PR c++/114525

gcc/cp/ChangeLog:

* typeck2.cc (build_m_component_ref): Call cp_build_addr_expr
instead of build_address.

gcc/testsuite/ChangeLog:

* g++.dg/expr/cond18.C: New test.

(cherry picked from commit 35ce9afc84a63fb647a90cbecb2adf3e748178be)

Daily bump.

rtl-optimization/119689 - compare-debug failure with LRA

The previous change to fix LRA rematerialization broke compare-debug
for i586 bootstrap. Fixed by using prev_nonnote_nondebug_insn
instead of prev_nonnote_insn.

PR rtl-optimization/119689
PR rtl-optimization/115568
* lra-remat.cc (create_cands): Use prev_nonnote_nondebug_insn
to check whether insn2 is directly before insn.

* g++.target/i386/pr119689.C: New testcase.

(cherry picked from commit 088887de7717a22b1503760e9b79dfbe22a0f428)

[PR115568][LRA]: Use more strict output reload check in rematerialization

  In this PR case LRA rematerialized a value from inheritance insn
instead of output reload one.  This resulted in considering a
rematerilization candidate value available when it was actually
not.  As a consequence an insn after rematerliazation used the
unexpected value and this use resulted in fp exception.  The patch
fixes this bug.

gcc/ChangeLog:

PR rtl-optimization/115568
* lra-remat.cc (create_cands): Check that output reload insn is
adjacent to given insn.  Update a comment.

gcc/testsuite/ChangeLog:

PR rtl-optimization/115568
* gcc.target/i386/pr115568.c: New.

(cherry picked from commit 98545441308c2ae4d535f14b108ad6551fd927d5)

Daily bump.

c++: avoid ARM -Wunused-value [PR114970]

Because of the __builtin_is_constant_evaluated, maybe_constant_init in
expand_default_init fails, so the constexpr constructor isn't folded until
cp_fold, which builds a COMPOUND_EXPR in case the enclosing expression is
relying on the ARM behavior of returning 'this'.

As in other places, avoid -Wunused-value on artificial COMPOUND_EXPR.

PR c++/114970

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold): Suppress warnings on
return_this COMPOUND_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/opt/is_constant_evaluated4.C: New test.

(cherry picked from commit 4acdfb71d4fdaa43c2707ad7b2fb7b2b7bddfc42)

[PATCH v2] RISC-V: Fixbug for slli + addw + zext.w into sh[123]add + zext.w

Assuming we have the following variables:

unsigned long long a0, a1;
unsigned int a2;

For the expression:

a0 = (a0 << 50) >> 49;  // slli a0, a0, 50 + srli a0, a0, 49
a2 = a1 + a0;           // addw a2, a1, a0 + slli a2, a2, 32 + srli a2, a2, 32

In the optimization process of ZBA (combine pass), it would be optimized to:

a2 = a0 << 1 + a1;      // sh1add a2, a0, a1 + zext.w a2, a2

This is clearly incorrect, as it overlooks the fact that a0=a0&0x7ffe, meaning
that the bits a0[32:14] are set to zero.

gcc/ChangeLog:

* config/riscv/bitmanip.md: The optimization can only be applied if
the high bit of operands[3] is set to 1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-09.c: New test.
* gcc.target/riscv/zba-shNadd-10.c: New test.

(cherry picked from commit dd6ebc0a3473a830115995bdcaf8f797ebd085a3)

Daily bump.

c++: nested lambda capture pack [PR119345]

tsubst_stmt already registers a local capture proxy as a
local_specialization of both an outer capture proxy and the captured
variable; we also need to do that in add_extra_args.

PR c++/119345

gcc/cp/ChangeLog:

* pt.cc (add_extra_args): Also register a specialization
of the captured variable.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ14.C: New test.

(cherry picked from commit 5957b9919c9ecda6e4ca198086f8bb9ea215232c)

c++: lambda in constraint of lambda [PR119175]

Here when we went to mangle the constraints of from<0>, the outer lambda has
no mangling scope, but the inner one was treated as having the outer one as
its scope. And mangling the outer one means mangling its constraints, which
include the inner one. So infinite recursion.

But a lambda closure type isn't a scope that anything should have for
mangling, the inner lambda should also have no mangling scope.

PR c++/119175

gcc/cp/ChangeLog:

* mangle.cc (decl_mangling_context): Look through lambda type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-lambda23.C: New test.

(cherry picked from commit 39892d9618ee0f06dd09271589878b0df7b1e75d)

c++: self-dependent alias template [PR117530]

Here, instantiating B<short> means instantiating A<short>, which means
instantiating B<short>.  And then when we go to register the initial
instantiation, it conflicts with the inner one.  Fixed by checking after
tsubst whether there's already something in the hash table.  We already did
something much like this in tsubst_decl, but that doesn't handle this case.

PR c++/117530

gcc/cp/ChangeLog:

* pt.cc (instantiate_template): Check retrieve_specialization after
tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval27.C: New test.

(cherry picked from commit d034c78c7be613db3c25fddec1dd50222327117b)

c++: alias_ctad_tweaks ICE w/ inherited CTAD [PR119687]

With inherited CTAD the set of guides may be a two-dimensional overload
set (i.e. OVERLOADs of OVERLOADs) so alias_ctad_tweaks (which also does
the inherited CTAD transformation) needs to use the 2D-aware lkp_iterator
instead of ovl_iterator, or better yet use the more idiomatic lkp_range.

PR c++/119687

gcc/cp/ChangeLog:

* pt.cc (alias_ctad_tweaks): Use lkp_range / lkp_iterator
instead of ovl_iterator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/class-deduction-inherited8.C: New test.

Reviewed-by: Jason Merill <jason@redhat.com>
(cherry picked from commit 493974aa0ad8b94dbeb61f00d2acc57c94fd4809)

libstdc++: Fix conversions to key/value types for hash table insertion [PR115285]

The conversions to key_type and value_type that are performed when
inserting into _Hashtable need to be fixed to do any required
conversions explicitly. The current code assumes that conversions from
the parameter to the key_type or value_type can be done implicitly,
which isn't necessarily true.

Remove the _S_forward_key function which doesn't handle all cases and
either forward the parameter if it already has type cv key_type, or
explicitly construct a temporary of type key_type.

Similarly, the _ConvertToValueType specialization for maps doesn't
handle all cases either, for std::pair arguments only some value
categories are handled. Remove _ConvertToValueType and for the _M_insert
function for unique keys, either forward the argument unchanged or
explicitly construct a temporary of type value_type.

For the _M_insert overload for non-unique keys we don't need any
conversion at all, we can just forward the argument directly to where we
construct a node.

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_S_forward_key): Remove.
(_Hashtable::_M_insert_unique_aux): Replace _S_forward_key with
a static_cast to a type defined using conditional_t.
(_Hashtable::_M_insert): Replace _ConvertToValueType with a
static_cast to a type defined using conditional_t.
* include/bits/hashtable_policy.h (_ConvertToValueType): Remove.
* testsuite/23_containers/unordered_map/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/96088.cc: Adjust
expected number of allocations.

(cherry picked from commit 90c578654a2c96032aa6621449859243df5f641b)

libstdc++: Define __is_pair variable template for C++11

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (__is_pair): Define for C++11 and
C++14 as well.

(cherry picked from commit dd08cdccc36d084eda0e2748c772f6bf9a7f412f)

libstdc++: Fix test broken when using COW std::string

libstdc++-v3/ChangeLog:

* testsuite/23_containers/unordered_map/96088.cc (test03): Fix increments
value when _GLIBCXX_USE_CXX11_ABI is equal to 0.

(cherry picked from commit d01dc97a26d2f5034ca135f46094aa52c44cc90a)

libstdc++: Always instantiate key_type to compute hash code [PR115285]

Even if it is possible to compute a hash code from the inserted arguments
we need to instantiate the key_type to guaranty hash code consistency.

Preserve the lazy instantiation of the mapped_type in the context of
associative containers.

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_S_forward_key<_Kt>): Always return a temporary
key_type instance.
* testsuite/23_containers/unordered_map/96088.cc: Adapt to additional instanciation.
Also check that mapped_type is not instantiated when there is no insertion.
* testsuite/23_containers/unordered_multimap/96088.cc: Adapt to additional
instanciation.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/pr115285.cc: New test case.

(cherry picked from commit ee030b28004eade3da872e7ae62a526a2940a705)

RISC-V: Disable unsupported vsext/vzext patterns for XTheadVector.

XThreadVector does not support the vsext/vzext instructions; however,
due to the reuse of RVV optimizations, it may generate these instructions
in certain cases. To prevent the error "Unknown opcode 'th.vsext.vf2',"
we should disable these patterns.

V2:
Change the value of dg-do in the test case from assemble to compile, and
remove the -save-temps option.

gcc/ChangeLog:

* config/riscv/vector.md: Disable vsext/vzext for XTheadVector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vsext.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vzext.c: New test.

(cherry picked from commit 196b45caca0aae57a95bffcdd5c188994317de08)

RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

This is a follow-up to the patch below to avoid generating unrecognized
vsetivl instructions for XTheadVector.

https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html

PR target/118601

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Check with new
constraint 'vl' instead of 'K'.
(expand_vec_setmem): Likewise.
(expand_vec_cmpmem): Likewise.
* config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
(expand_load_store): Likewise.
(expand_strided_load): Likewise.
(expand_strided_store): Likewise.
(expand_lanes_load_store): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.

Reported-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 580f571be6ce80aa71fb80e7b16e01824f088229)

RISC-V: Add a new constraint to ensure that the vl of XTheadVector does not get a non-zero immediate

Although we have handled the vl of XTheadVector correctly in the
expand phase and predicates, the results show that the work is
still insufficient.

In the curr_insn_transform function, the insn is transformed from:
(insn 69 67 225 12 (set (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])
        (if_then_else:RVVM8SF (unspec:RVVMF4BI [
                    (const_vector:RVVMF4BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 209)
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM8SF 143 [ _xx ])
            (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])))
     (expr_list:REG_DEAD (reg/v:RVVM8SF 143 [ _xx ])
        (nil)))
to
(insn 69 284 225 11 (set (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] A32])
        (if_then_else:RVVM8SF (unspec:RVVMF4BI [
                    (const_vector:RVVMF4BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM8SF 104 v8 [orig:143 _xx ] [143])
            (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] A32])))
     (nil))

Looking at the log for the reload pass, it is found that "Changing pseudo 209 in
operand 3 of insn 69 on equiv 0x1".
It converts the vl operand in insn from the expected register(reg:DI 209) to the
constant 1(const_int 1 [0x1]).

This conversion occurs because, although the predicate for the vl operand is
restricted by "vector_length_operand" in the pattern, the constraint is still
"rK", which allows the transformation.

The issue is that changing the "rK" constraint to "rJ" for the constraint of vl
operand in the pattern would prevent this conversion, But unfortunately this will
conflict with RVV (RISC-V Vector Extension).

Based on the review's recommendations, the best solution for now is to create
a new constraint to distinguish between RVV and XTheadVector, which is exactly
what this patch does.

PR target/116593

gcc/ChangeLog:

* config/riscv/constraints.md (vl): New.
* config/riscv/thead-vector.md: Replacing rK with rvl.
* config/riscv/vector.md: Likewise.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Enable testsuite of XTheadVector.
* g++.target/riscv/rvv/xtheadvector/pr116593.C: New test.

(cherry picked from commit 3024b12f2cde5db3bf52b49b07e32ef3065929fb)

RISC-V: Enable and adjust the testsuite for XTheadVector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable testsuite of
XTheadVector.
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly.
* gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.

(cherry picked from commit ab24171d237a9138714f0e6d2bb38fd357ccaed9)

[PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector

error: unrecognizable insn:

(insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0)
        (unspec:RVVM1SF [
                (const_vector:RVVM1SF repeat [
                        (const_double:SF 0.0 [0x0.0p+0])
                    ])
                (reg:DI 0 zero)
                (const_int 1 [0x1])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
            ] UNSPEC_TH_VWLDST)) -1
     (nil))
during RTL pass: mode_sw

PR target/116591

gcc/ChangeLog:

* config/riscv/vector.md: Add restriction to call pred_th_whole_mov.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116591.c: New test.

(cherry picked from commit 8564d0948c72df0a66d7eb47e15c6ab43e9b25ce)

RISC-V: Fix the behavior for multilib-generator with --cmodel=large on rv32

Large code model is only supported on RV64, so we don't need to
generate the multilibs for RV32 with --cmodel=large. And the compact
code model is something we don't supported on upstream (which is
accidentally added in the past), so we need to remove it.

gcc/ChangeLog:

* config/riscv/multilib-generator: Remove the compact code model
and check large code model for RV32.

(cherry picked from commit 72dff34bcdd6f05b64bbf07739ab815e673b5946)

Daily bump.

libstdc++: Fix constraint recursion in basic_const_iterator operator- [PR115046]

It was proposed in PR112490 to also adjust basic_const_iterator's friend
operator-(sent, iter) overload alongside the r15-7757-g4342c50ca84ae5
adjustments to its comparison operators, but we lacked a concrete
testcase demonstrating fixable constraint recursion there.  It turns out
Hewill Kang's PR115046 is such a testcase!  So this patch makes the same
adjustments to that overload as well, fixing PR115046.  The LWG 4218 P/R
will need to get adjusted too.

PR libstdc++/115046
PR libstdc++/112490

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (basic_const_iterator::operator-):
Replace non-dependent basic_const_iterator function parameter with
a dependent one of type basic_const_iterator<_It2> where _It2
matches _It.
* testsuite/std/ranges/adaptors/as_const/1.cc (test04): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit d69f73c0334486f3c66937388f02008736809e87)

c++: ICE with nested default targ lambdas [PR119574]

In GCC 14 we fixed PR116567 in a more conservative way that doesn't
distinguish between the two kinds of deferred substitutions, and so
for PR119574 we instead ICE from get_innermost_template_args due to
TMPL_PARMS_DEPTH of the lambda, 2, being greater than the depth of the
augmented args, 1.

This patch works around the ICE in a best effort kind of way by guarding
the get_innermost_template_args call appropriately; I don't think it's
possible to get this completely right in GCC 14 without backporting the
proper fix for PR116567.

Note that lambda-targ13b.C present in the GCC 15 version of this patch[1]
never worked in GCC 14, and still doesn't work, which is why it's not
present in this patch.

[1]: r15-9350-gf3862ab07943d1

PR c++/119574

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): Don't call
get_innermost_template_args if we're requesting too many
levels.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ13.C: New test.
* g++.dg/cpp2a/lambda-targ13a.C: New test.

RISC-V: Fix vid const vector expander for non-npatterns size steps

Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2, 4, 4, ...}

This patch sets the step size to the requested value.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V:Bugfix for vlmul_ext and vlmul_trunc with NULL return value[pr117286]

This patch fixes following ICE:

test.c: In function 'func':
test.c:37:24: internal compiler error: Segmentation fault
   37 |     vfloat16mf2_t vc = __riscv_vlmul_trunc_v_f16m1_f16mf2(vb);
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The root cause is that vlmul_trunc has a null return value.
gimple_call <__riscv_vlmul_trunc_v_f16m1_f16mf2, NULL, vb_13>
                                                 ^^^

Passed the rv64gcv_zvfh regression test.

Singed-off-by: Li Xu <xuli1@eswincomputing.com>
PR target/117286

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Do not expand NULL return.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr117286.c: New test.

RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483]

This patch fixs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483

If prev and next satisfy the following rules, we should forbid the case
(next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma()))
in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p.
Otherwise, the tail elements of next will be polluted.

DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew,
max_sew_overlap_and_next_ratio_valid_for_prev_sew_p,
always_false, use_max_sew_and_lmul_with_next_ratio)

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
PR target/117483

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117483.c: New test.

[committed] [RISC-V] Fix false-positive uninitialized variable

Andreas noted we were getting an uninit warning after the recent constant
synthesis changes. Essentially there's no way for the uninit analysis code to
know the first entry in the CODES array is a UNKNOWN which will set X before
its first use.

So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_move_integer): Initialize "x".

RISC-V: Error early with V and no M extension.

For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.

gcc/ChangeLog:

PR target/116036

* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.

RISC-V: Reject 'd' extension with ILP32E ABI

Also add a testcase for -mabi=lp64d where 'd' is required.

gcc/ChangeLog:

PR target/116111
* config/riscv/riscv.cc (riscv_option_override): Add error.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-41.c: New test.
* gcc.target/riscv/pr116111.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Correct mode_idx attribute for viwalu wx variants [PR116149].

In PR116149 we choose a wrong vector length which causes wrong values in
a reduction.  The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length.  For the
non-scalar variants the respective operand has the correct non-widened
mode.  For the scalar variants, however, the same operand has a scalar
mode which obviously only has one unit.  This makes us choose VL = 1
leaving three elements undisturbed (so potentially -1).  Those end up
in the reduction causing the wrong result.

This patch adjusts the mode_idx just for the scalar variants of the
affected instruction patterns.

gcc/ChangeLog:

PR target/116149

* config/riscv/vector.md: Fix mode_idx attribute of scalar
widen add/sub variants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116149.c: New test.

[RISC-V][PR target/116240] Ensure object is a comparison before extracting arguments

This was supposed to go out the door yesterday, but I kept getting interrupted.

The target bits for rtx costing can't assume the rtl they're given actually
matches a target pattern.   It's just kind of inherent in how the costing
routines get called in various places.

In this particular case we're trying to cost a conditional move:

(set (dest) (if_then_else (cond) (true) (false))

On the RISC-V port the backend only allows actual conditionals for COND.  So
something like (eq (reg) (const_int 0)).  In the costing code for if-then-else
we did something like

(XEXP (XEXP (cond, 0), 0)))

Which fails miserably if COND is a terminal node like (reg) rather than (ne
(reg) (const_int 0)

So this patch tightens up the RTL scanning to ensure that we have a comparison
before we start looking at the comparison's arguments.

Run through my tester without incident, but I'll wait for the pre-commit tester
to run through a cycle before pushing to the trunk.

Jeff

ps.   We probably could support a naked REG for the condition and internally convert it to (ne (reg) (const_int 0)), but I don't think it likely happens with any regularity.

PR target/116240
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Ensure object is a
comparison before looking at its arguments.

gcc/testsuite
* gcc.target/riscv/pr116240.c: New test.

RISC-V: Delete duplicate '#define RISCV_DWARF_VLENB'

gcc/ChangeLog:

* config/riscv/riscv.h (RISCV_DWARF_VLENB): Delete.

RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
information. The incorrect example as follow:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value just need to multiply the literal 2.

PR target/116305

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Take
BYTES_PER_RISCV_VECTOR for *factor instead of riscv_bytes_per_vector_chunk.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng <zhijin.zeng@spacemit.com>

RISC-V: Add missing mode_idx for vrol and vror

We add pattern for vector rotate, but seems like we forgot adding
mode_idx which used in AVL propgation (riscv-avlprop.cc).

gcc/ChangeLog:

* config/riscv/vector.md (mode_idx): Add vrol and vror.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/rotr.c: New.

RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

When the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
refers to. For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
could actually be the a full (high) vector register of a two-register
group (at VLEN=128) or the higher part of a single register (at VLEN>128).

As the subreg is statically ambiguous we prevent such situations in
can_change_mode_class.

The culprit in PR116086 is

_12 = BIT_FIELD_REF <vect_cst__42, 128, 128>;

which can be expanded with a vector-vector extract (from V4DI to V2DI).
This patch adds a VLS-mode vector-vector extract that handles "halving"
cases like this one by sliding down the source vector, thus making sure
the correct part is used.

PR target/116086

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract<mode><v_half>): Add
vector-vector extract for VLS modes.
* config/riscv/riscv.cc (riscv_can_change_mode_class): Forbid
VLS modes larger than one vector.
* config/riscv/vector-iterators.md: Add vector-vector extract
iterators.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add effective target checks for
zvl256b and zvl512b.
* gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086.c: New test.

[PATCH v4] [target/116592] RISC-V: Fix illegal operands "th.vsetvli zero,0,e32,m8" for XTheadVector

Since the THeadVector vsetvli does not support vl as an immediate, we
need to convert 0 to zero when outputting asm.

PR target/116592

gcc/ChangeLog:

* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to
"zero"

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.

RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass

This patch fixes a bug in the current vsetvl pass.  The current pass uses
`m_vl` to determine whether the dest operand has been used by non-RVV
instructions.  However, `m_vl` may have been modified as a result of an
`update_avl` call, and thus would be no longer the dest operand of the
original instruction.  This can lead to incorrect vsetvl eliminations, as is
shown in the testcase.  In this patch, we create a `dest_vl` variable for
this scenerio.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.

riscv: Fix duplicate assmbler label in @tlsdesc<mode> insn

Use %= instead of maintaining a sequence number manually, so that it
doesn't result in a duplicate assembler label when the insn is duplicated.

PR target/116693
* config/riscv/riscv.cc (riscv_legitimize_tls_address): Don't pass
seqno to gen_tlsdesc and remove it.
* config/riscv/riscv.md (@tlsdesc<mode>): Remove operand 1. Use
%= instead of %1 in template.

[PATCH] RISC-V: Allow zero operand for DI variants of vssubu.vx

The RISC-V vector machine description relies on the helper function
`sew64_scalar_helper` to emit actual insns for the DI variants of
vssub.vx and vssubu.vx. This works with vssub.vx, but can cause
problems with vssubu.vx with the scalar operand being constant zero,
because `has_vi_variant_p` returns false, and the operand will be taken
without being loaded into a reg. The attached testcases can cause an
internal compiler error as a result.

Allowing a constant zero operand in those insns seems to be a simple
solution that only affects minimum existing code.

gcc/ChangeLog:

* config/riscv/vector.md: Allow zero operand for DI variants of
vssubu.vx

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vssubu-1.c: New test.
* gcc.target/riscv/rvv/base/vssubu-2.c: New test.

[PATCH] RISC-V: Fix th.extu operands exceeding range on rv32.

The Combine Pass may generate zero_extract instructions that are out of range.
Drawing from other architectures like AArch64, we should impose restrictions
on the "*th_extu<mode>4" pattern.

gcc/
* config/riscv/thead.md (*th_extu<mode>4): Fix th.extu
operands exceeding range on rv32.

gcc/testsuite/
* gcc.target/riscv/xtheadbb-extu-4.c: New.

[PATCH 1/2] RISC-V: Fix the outer_code when calculating the cost of SET expression.

I think it is a typo. When calculating the 'SET_SRC (x)' cost,
outer_code should be set to SET.

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Fix the outer_code
when calculating the cost of SET expression.

[PATCH v3] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

gcc/ChangeLog:

* config/riscv/riscv.md: Change "truncate" to unspec for the Zfa extension on rv32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.

RISC-V: Ensure vtype for full-register moves [PR117544].

As discussed in PR117544 the VTYPE register is not preserved across
function calls. Even though vmv1r-like instructions operate
independently of the actual vtype they still require a valid vtype. As
we cannot guarantee that the vtype is valid we must make sure to emit a
vsetvl between a function call and a vmv1r.v.

This patch makes the necessary changes by splitting the full-reg-move
insns into patterns that use the vtype register and adding vmov to the
types of instructions requiring a vset.

PR target/117544

gcc/ChangeLog:

* config/riscv/vector.md (*mov<mode>_whole): Split.
(*mov<mode>_fract): Ditto.
(*mov<mode>): Ditto.
(*mov<mode>_vls): Ditto.
(*mov<mode>_reg_whole_vtype): New pattern with vtype use.
(*mov<mode>_fract_vtype): Ditto.
(*mov<mode>_vtype): Ditto.
(*mov<mode>_vls_vtype): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-args-4.c: Expect vsetvl.
* gcc.target/riscv/rvv/base/pr117544.c: New test.

RISC-V: Add assert for insn operand out of range access [PR117878][NFC]

According to the the initial analysis of PR117878, the ice comes from
the out-of-range operand access for recog_data.operand[]. Thus, add
one assert here to expose this explicitly.

PR target/117878

gcc/ChangeLog:

* config/riscv/riscv-v.cc (vlmax_avl_type_p): Add assert for
out of range access.
(nonvlmax_avl_type_p): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix compress shuffle pattern [PR117383].

This patch makes vcompress use the tail-undisturbed policy by default
and also uses the proper VL.

PR target/117383

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): Use TU policy.
* config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c:
Expect tu.
* gcc.target/riscv/rvv/autovec/pr117383.c: New test.

[RISC-V][PR target/106544] Avoid ICEs due to bogus asms

This is a fix for a bug Andrew P filed a while back where essentially a poorly
crafted asm statement could trigger a ICE during assembly output.  Various
cases will use INTVAL (op) without verifying the operand is a CONST_INT node
first.

The usual way to handle this is via output_operand_lossage, which this patch
implements.

I focused primarily on the CONST_INT cases, there could well be other problems
in this space, if so they should get distinct bugs with testcases.

Tested in my tester on rv32 and rv64.  Waiting for pre-commit testing before
moving forward.

PR target/106544
gcc/

* config/riscv/riscv.cc (riscv_print_operand): Issue an error for
invalid operands rather than invalidly accessing INTVAL of an
object that is not a CONST_INT.  Fix one error string for 'N'.

gcc/testsuite
* gcc.target/riscv/pr106544.c: New test.

[PATCH] riscv: add mising masking in lrsc expander (PR118137)

gcc:
PR target/118137
* config/riscv/sync.md ("lrsc_atomic_exchange<mode>"): Apply mask
to shifted value.

gcc/testsuite:
PR target/118137
* gcc.dg/atomic/pr118137.c: New.

RISC-V: Disallow negative step for interleaving [PR117682]

Hi,

in PR117682 we build an interleaving pattern

  { 1, 201, 209, 25, 161, 105, 113, 185, 65, 9,
    17, 89, 225, 169, 177, 249, 129, 73, 81, 153,
    33, 233, 241, 57, 193, 137, 145, 217, 97, 41,
    49, 121 };

with negative step expecting wraparound semantics due to -fwrapv.

For building interleaved patterns we have an optimization that
does e.g.
  {1, 209, ...} = { 1, 0, 209, 0, ...}
and
  {201, 25, ...} >> 8 = { 0, 201, 0, 25, ...}
and IORs those.

The optimization only works if the lowpart bits are zero.  When
overflowing e.g. with a negative step we cannot guarantee this.

This patch makes us fall back to the generic merge handling for negative
steps.

I'm not 100% certain we're good even for positive steps.  If the
step or the vector length is large enough we'd still overflow and
have non-zero lower bits.  I haven't seen this happen during my
testing, though and the patch doesn't make things worse, so...

Regtested on rv64gcv_zvl512b.  Let's see what the CI says.

Regards
Robin

PR target/117682

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fall back to
merging if either step is negative.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr117682.c: New test.

RISC-V: Fix vsetvl compatibility predicate [PR118154].

In PR118154 we emit strided stores but the first of those does not
always have the proper VTYPE.  That's because we erroneously delete
a necessary vsetvl.

In order to determine whether to elide

(1)
      Expr[7]: VALID (insn 116, bb 17)
        Demand fields: demand_ratio_and_ge_sew demand_avl
        SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(reg:DI 0 zero)

when e.g.

(2)
      Expr[3]: VALID (insn 360, bb 15)
        Demand fields: demand_sew_lmul demand_avl
        SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(reg:DI 0 zero)
        VL=(reg:DI 13 a3 [345])

is already available, we use
sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p.

(1) requires RATIO = SEW/LMUL = 16 and an SEW >= 8.  (2) has ratio = 64,
though, so we cannot directly elide (1).

This patch uses ratio_eq_p instead of next_ratio_valid_for_prev_sew_p.

PR target/118154

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (MAX_LMUL): New define.
(pre_vsetvl::earliest_fuse_vsetvl_info): Use.
(pre_vsetvl::pre_global_vsetvl_info): New predicate with equal
ratio.
* config/riscv/riscv-vsetvl.def: Use.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.

RISC-V: Fix code gen for reduction with length 0 [PR118182]

`.MASK_LEN_FOLD_LEFT_PLUS`(or `mask_len_fold_left_plus_m`) is expecting the
return value will be the start value even if the length is 0.

However current code gen in RISC-V backend is not meet that semantic, it will
result a random garbage value if length is 0.

Let example by current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64:
        # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
        vsetvli zero,a5,e64,m1,ta,ma
        vfmv.s.f        v2,fa5     # insn 1
        vfredosum.vs    v1,v1,v2   # insn 2
        vfmv.f.s        fa5,v1     # insn 3

insn 1:
- vfmv.s.f won't do anything if VL=0, which means v2 will contain garbage value.
insn 2:
- vfredosum.vs won't do anything if VL=0, and keep vd unchanged even TA.
(v-spec say: `If vl=0, no operation is performed and the destination register
is not updated.`)
insn 3:
- vfmv.f.s will move the value from v1 even VL=0, so this is safe.

So how we fix that? we need two fix for that:

1. insn 1: need always execute with VL=1, so that we can guarantee it will
           always work as expect.
2. insn 2: Add new pattern to force `vd` use same reg as `vs1` (start value) for
           all reduction patterns, then we can guarantee vd[0] will contain the
           start value when vl=0

For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
we have to add _VL0_SAFE variant reduction to force `vd` use same reg as `vs1`
(start value).

Change since V3:
- Rename _AV to _VL0_SAFE for readability.
- Use non-VL0_SAFE version if VL is const or VLMAX.
- Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
- Two more testcase.

gcc/ChangeLog:

PR target/118182
* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
argument for expand_reduction.
(*widen_reduc_plus_scal_<mode>): Ditto.
(*fold_left_widen_plus_<mode>): Ditto.
(*mask_len_fold_left_widen_plus_<mode>): Ditto.
(*cond_widen_reduc_plus_scal_<mode>): Ditto.
(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
(*cond_widen_reduc_plus_scal_<mode>): Ditto.
* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
expand_reduction.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_and_scal_<mode>): Ditto.
(reduc_ior_scal_<mode>): Ditto.
(reduc_xor_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_smax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_fmax_scal_<mode>): Ditto.
(reduc_fmin_scal_<mode>): Ditto.
(fold_left_plus_<mode>): Ditto.
(mask_len_fold_left_plus_<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_reduction): Add one more
argument for reduction code for vl0-safe.
* config/riscv/riscv-protos.h (expand_reduction): Ditto.
* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
reduction.
(ANY_REDUC_VL0_SAFE): New.
(ANY_WREDUC_VL0_SAFE): Ditto.
(ANY_FREDUC_VL0_SAFE): Ditto.
(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
(reduc_op): Add _VL0_SAFE variant of reduction.
(order) Ditto.
* config/riscv/vector.md (@pred_<reduc_op><mode>): New.

gcc/testsuite/ChangeLog:

PR target/118182
* gfortran.target/riscv/rvv/pr118182.f: New.
* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.

RISC-V: Move fortran testcase to gfortran.target

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111395.f90: Move this file to...
* gfortran.target/riscv/rvv/pr111395.f90: ...here.
* gcc.target/riscv/rvv/fortran/pr111566.f90: Move this file to...
* gfortran.target/riscv/rvv/pr111566.f90: ...here.
* gcc.target/riscv/rvv/rvv-fortran.exp: Move this file to...
* gfortran.target/riscv/rvv/rvv.exp: ...here.

[PR target/118357] RISC-V: Disable fusing vsetvl instructions by VSETVL_VTYPE_CHANGE_ONLY for XTheadVector.

In RVV 1.0, the instruction "vsetvli zero,zero,*" indicates that the
available vector length (avl) does not change. However, in XTheadVector,
this same instruction signifies that the avl should take the maximum value.
Consequently, when fusing vsetvl instructions, the optimization labeled
"VSETVL_VTYPE_CHANGE_ONLY" is disabled for XTheadVector.

PR target/118357

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Function change_vtype_only_p always
returns false for XTheadVector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr118357.c: New test.

[RISC-V][PR target/116308] Fix generation of initial RTL for atomics

While this wasn't originally marked as a regression, it almost certainly is
given that older versions of GCC would have used libatomic and would not have
ICE'd on this code.

Basically this is another case where we directly used simplify_gen_subreg when
we should have used gen_lowpart.

When I fixed a similar bug a while back I noted the code in question as needing
another looksie.  I think at that time my brain saw the mixed modes (SI & QI)
and locked up.  But the QI stuff is just the shift count, not some deeper
issue.  So fixing is trivial.

We just replace the simplify_gen_subreg with a gen_lowpart and get on with our
lives.

Tested on rv64 and rv32 in my tester.  Waiting on pre-commit testing for final
verdict.

PR target/116308
gcc/
* config/riscv/riscv.cc (riscv_lshift_subword): Use gen_lowpart
rather than simplify_gen_subreg.

gcc/testsuite/

* gcc.target/riscv/pr116308.c: New test.

[RISC-V][PR target/116256] Fix incorrect return value for predicate

Another bug found while chasing paths to fix the remaining issues in pr116256.

This case is sometimes benign when the optimizers are enabled.  But could show
up in a -O0 compile with some patterns I was playing around with.

Basically we have a predicate that is meant to return true if bits set in the
operand are all consecutive.

That predicate would return the wrong value when presented with (const_int 0)
indicating it had a run of on bits when obviously no bits are on 😉

It's pretty obvious once you look at the implementation.

if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0)
  return false
return true;

The right shift is always going to produce 0.  0 + 1 = 1 which is a power of 2.
So exact_log2 returns 0 and we get a true result rather than a false result.

The fix is trivial.  "<=".  While inside we might as well fix the formatting.

Tested on rv32 and rv64 in my tester.  Waiting on upstream pre-commit testing
to render a verdict.

PR target/116256
gcc/
* config/riscv/predicates.md (consecutive_bits_operand): Properly
handle (const_int 0).

LoongArch: Fix awk / sed usage for compatibility

Tested with nawk, mawk, and gawk.

gcc/ChangeLog:

* config/loongarch/genopts/gen-evolution.awk: remove
usage of "asort".
* config/loongarch/genopts/genstr.sh: replace sed with awk.

(cherry picked from commit 6ed8c17c2bce631ae370d93164ceb6c1b5adf925)

LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

Avoid using gensub that FreeBSD awk lacks, use gsub and split those each
of gawk, mawk, and FreeBSD awk provides.

Reported-by: mpysw@vip.163.com
Link: https://man.freebsd.org/cgi/man.cgi?query=awk
gcc/ChangeLog:

* config/loongarch/genopts/gen-evolution.awk: Avoid using gensub
that FreeBSD awk lacks.

(cherry picked from commit 92ca72b41a74aef53978cadbda33dd38b69d3ed3)