2 years ago  arm: [MVE intrinsics] factorize vqshlq vshlq
Christophe Lyon [Wed, 8 Feb 2023 15:04:33 +0000 (15:04 +0000)] 
arm: [MVE intrinsics] factorize vqshlq vshlq

Factorize vqshlq and vshlq so that they use the same pattern.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_SHIFT_M_R, MVE_SHIFT_M_N)
(MVE_SHIFT_N, MVE_SHIFT_R): New.
(mve_insn): Add vqshl, vshl.
* config/arm/mve.md (mve_vqshlq_n_<supf><mode>)
(mve_vshlq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vqshlq_r_<supf><mode>, mve_vshlq_r_<supf><mode>): Merge into
...
(@mve_<mve_insn>q_r_<supf><mode>): ... this.
(mve_vqshlq_m_r_<supf><mode>, mve_vshlq_m_r_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_r_<supf><mode>): ... this.
(mve_vqshlq_m_n_<supf><mode>, mve_vshlq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
* config/arm/vec-common.md (mve_vshlq_<supf><mode>): Transform
into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.

2 years ago  arm: [MVE intrinsics] rework vrshlq vqrshlq
Christophe Lyon [Wed, 8 Feb 2023 15:09:18 +0000 (15:09 +0000)] 
arm: [MVE intrinsics] rework vrshlq vqrshlq

Implement vrshlq, vqrshlq using the new MVE builtins framework.
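For illustration only (not part of the patch): a minimal sketch of how these
intrinsics are used from C, assuming an MVE-enabled target (e.g.
-march=armv8.1-m.main+mve) and the arm_mve.h polymorphic forms:

#include <arm_mve.h>

int8x16_t
rsh (int8x16_t a, int8x16_t b)
{
  return vrshlq (a, b);    /* rounding shift left by per-lane amounts in b */
}

int8x16_t
qrsh (int8x16_t a, int8x16_t b)
{
  return vqrshlq (a, b);   /* saturating rounding shift left */
}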

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins-base.def (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins-base.h (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins.cc (has_inactive_argument): Handle
vqrshlq, vrshlq.
* config/arm/arm_mve.h (vrshlq): Remove.
(vrshlq_m_n): Remove.
(vrshlq_m): Remove.
(vrshlq_x): Remove.
(vrshlq_u8): Remove.
(vrshlq_n_u8): Remove.
(vrshlq_s8): Remove.
(vrshlq_n_s8): Remove.
(vrshlq_u16): Remove.
(vrshlq_n_u16): Remove.
(vrshlq_s16): Remove.
(vrshlq_n_s16): Remove.
(vrshlq_u32): Remove.
(vrshlq_n_u32): Remove.
(vrshlq_s32): Remove.
(vrshlq_n_s32): Remove.
(vrshlq_m_n_u8): Remove.
(vrshlq_m_n_s8): Remove.
(vrshlq_m_n_u16): Remove.
(vrshlq_m_n_s16): Remove.
(vrshlq_m_n_u32): Remove.
(vrshlq_m_n_s32): Remove.
(vrshlq_m_s8): Remove.
(vrshlq_m_s32): Remove.
(vrshlq_m_s16): Remove.
(vrshlq_m_u8): Remove.
(vrshlq_m_u32): Remove.
(vrshlq_m_u16): Remove.
(vrshlq_x_s8): Remove.
(vrshlq_x_s16): Remove.
(vrshlq_x_s32): Remove.
(vrshlq_x_u8): Remove.
(vrshlq_x_u16): Remove.
(vrshlq_x_u32): Remove.
(__arm_vrshlq_u8): Remove.
(__arm_vrshlq_n_u8): Remove.
(__arm_vrshlq_s8): Remove.
(__arm_vrshlq_n_s8): Remove.
(__arm_vrshlq_u16): Remove.
(__arm_vrshlq_n_u16): Remove.
(__arm_vrshlq_s16): Remove.
(__arm_vrshlq_n_s16): Remove.
(__arm_vrshlq_u32): Remove.
(__arm_vrshlq_n_u32): Remove.
(__arm_vrshlq_s32): Remove.
(__arm_vrshlq_n_s32): Remove.
(__arm_vrshlq_m_n_u8): Remove.
(__arm_vrshlq_m_n_s8): Remove.
(__arm_vrshlq_m_n_u16): Remove.
(__arm_vrshlq_m_n_s16): Remove.
(__arm_vrshlq_m_n_u32): Remove.
(__arm_vrshlq_m_n_s32): Remove.
(__arm_vrshlq_m_s8): Remove.
(__arm_vrshlq_m_s32): Remove.
(__arm_vrshlq_m_s16): Remove.
(__arm_vrshlq_m_u8): Remove.
(__arm_vrshlq_m_u32): Remove.
(__arm_vrshlq_m_u16): Remove.
(__arm_vrshlq_x_s8): Remove.
(__arm_vrshlq_x_s16): Remove.
(__arm_vrshlq_x_s32): Remove.
(__arm_vrshlq_x_u8): Remove.
(__arm_vrshlq_x_u16): Remove.
(__arm_vrshlq_x_u32): Remove.
(__arm_vrshlq): Remove.
(__arm_vrshlq_m_n): Remove.
(__arm_vrshlq_m): Remove.
(__arm_vrshlq_x): Remove.
(vqrshlq): Remove.
(vqrshlq_m_n): Remove.
(vqrshlq_m): Remove.
(vqrshlq_u8): Remove.
(vqrshlq_n_u8): Remove.
(vqrshlq_s8): Remove.
(vqrshlq_n_s8): Remove.
(vqrshlq_u16): Remove.
(vqrshlq_n_u16): Remove.
(vqrshlq_s16): Remove.
(vqrshlq_n_s16): Remove.
(vqrshlq_u32): Remove.
(vqrshlq_n_u32): Remove.
(vqrshlq_s32): Remove.
(vqrshlq_n_s32): Remove.
(vqrshlq_m_n_u8): Remove.
(vqrshlq_m_n_s8): Remove.
(vqrshlq_m_n_u16): Remove.
(vqrshlq_m_n_s16): Remove.
(vqrshlq_m_n_u32): Remove.
(vqrshlq_m_n_s32): Remove.
(vqrshlq_m_s8): Remove.
(vqrshlq_m_s32): Remove.
(vqrshlq_m_s16): Remove.
(vqrshlq_m_u8): Remove.
(vqrshlq_m_u32): Remove.
(vqrshlq_m_u16): Remove.
(__arm_vqrshlq_u8): Remove.
(__arm_vqrshlq_n_u8): Remove.
(__arm_vqrshlq_s8): Remove.
(__arm_vqrshlq_n_s8): Remove.
(__arm_vqrshlq_u16): Remove.
(__arm_vqrshlq_n_u16): Remove.
(__arm_vqrshlq_s16): Remove.
(__arm_vqrshlq_n_s16): Remove.
(__arm_vqrshlq_u32): Remove.
(__arm_vqrshlq_n_u32): Remove.
(__arm_vqrshlq_s32): Remove.
(__arm_vqrshlq_n_s32): Remove.
(__arm_vqrshlq_m_n_u8): Remove.
(__arm_vqrshlq_m_n_s8): Remove.
(__arm_vqrshlq_m_n_u16): Remove.
(__arm_vqrshlq_m_n_s16): Remove.
(__arm_vqrshlq_m_n_u32): Remove.
(__arm_vqrshlq_m_n_s32): Remove.
(__arm_vqrshlq_m_s8): Remove.
(__arm_vqrshlq_m_s32): Remove.
(__arm_vqrshlq_m_s16): Remove.
(__arm_vqrshlq_m_u8): Remove.
(__arm_vqrshlq_m_u32): Remove.
(__arm_vqrshlq_m_u16): Remove.
(__arm_vqrshlq): Remove.
(__arm_vqrshlq_m_n): Remove.
(__arm_vqrshlq_m): Remove.

2 years ago  arm: [MVE intrinsics] factorize vqrshlq vrshlq
Christophe Lyon [Wed, 8 Feb 2023 15:08:10 +0000 (15:08 +0000)] 
arm: [MVE intrinsics] factorize vqrshlq vrshlq

Factorize vqrshlq, vrshlq so that they use the same pattern.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_RSHIFT_M_N, MVE_RSHIFT_N): New.
(mve_insn): Add vqrshl, vrshl.
* config/arm/mve.md (mve_vqrshlq_n_<supf><mode>)
(mve_vrshlq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vqrshlq_m_n_<supf><mode>, mve_vrshlq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.

2 years ago  arm: [MVE intrinsics] add binary_round_lshift shape
Christophe Lyon [Wed, 8 Feb 2023 15:09:00 +0000 (15:09 +0000)] 
arm: [MVE intrinsics] add binary_round_lshift shape

This patch adds the binary_round_lshift shape description.

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_round_lshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_round_lshift): New.

2 years ago  RISC-V: Fix PR109615
Juzhe-Zhong [Fri, 5 May 2023 06:33:44 +0000 (14:33 +0800)] 
RISC-V: Fix PR109615

This patch fixes the following case:

#include <stdint.h>
#include <riscv_vector.h>

void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  if (cond)
    vl = m * 2;
  else
    vl = m * 2 * vl;

  for (size_t i = 0; i < n; i++)
    {
      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
      __riscv_vse8_v_i8mf8 (out + i, v, vl);

      vbool64_t mask = __riscv_vlm_v_b64 (in + i + 100, vl);

      vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in + i + 100, vl);
      __riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
    }

  for (size_t i = 0; i < n; i++)
    {
      vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
      __riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
    }
}

The value of "vl" comes from different blocks, so it is wrapped in a PHI node in
each block.

In the first loop, the "vl" source is a PHI node from bb 4.
In the second loop, the "vl" source is a PHI node from bb 5.
Since bb 5 is dominated by bb 4, the PHI input of "vl" in the second loop is the PHI node of "vl"
in bb 4.
So when two "vl" PHI nodes are both degenerate PHI nodes (phi->num_inputs () == 1) and their only
inputs are the same, it is safe to consider them compatible.

This patch only optimizes degenerate PHIs, since that is a safe and simple optimization.

Non-degenerate PHIs are considered incompatible unless the PHIs are the same in RTL_SSA.
TODO: non-degenerate PHIs are more complicated; we can support them when necessary in the future.

Before this patch:

...
.L2:
addi    a4,a1,100
add     t1,a0,a2
mv      t0,a0
beq     a2,zero,.L1
vsetvli zero,a3,e8,mf8,tu,mu
.L4:
addi    a6,t0,100
addi    a7,a4,-100
vle8.v  v1,0(t0)
addi    t0,t0,1
vse8.v  v1,0(a7)
vlm.v   v0,0(a6)
vle8.v  v1,0(a6),v0.t
vse8.v  v1,0(a4)
addi    a4,a4,1
bne     t0,t1,.L4
addi    a0,a0,300
addi    a1,a1,300
add     a2,a0,a2
vsetvli zero,a3,e8,mf8,ta,ma
.L5:
vle8.v  v2,0(a0)
addi    a0,a0,1
vse8.v  v2,0(a1)
addi    a1,a1,1
bne     a2,a0,.L5
.L1:
ret

After this patch:

...
.L2:
addi    a4,a1,100
add     t1,a0,a2
mv      t0,a0
beq     a2,zero,.L1
vsetvli zero,a3,e8,mf8,tu,mu
.L4:
addi    a6,t0,100
addi    a7,a4,-100
vle8.v  v1,0(t0)
addi    t0,t0,1
vse8.v  v1,0(a7)
vlm.v   v0,0(a6)
vle8.v  v1,0(a6),v0.t
vse8.v  v1,0(a4)
addi    a4,a4,1
bne     t0,t1,.L4
addi    a0,a0,300
addi    a1,a1,300
add     a2,a0,a2
.L5:
vle8.v  v2,0(a0)
addi    a0,a0,1
vse8.v  v2,0(a1)
addi    a1,a1,1
bne     a2,a0,.L5
.L1:
ret

PR target/109615

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (avl_info::multiple_source_equal_p): Add
degenerate PHI optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-74.c: Adapt testcase.
* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr109615.c: New test.

2 years ago  i386: Rename index_register_operand predicate to register_no_SP_operand
Uros Bizjak [Fri, 5 May 2023 13:12:45 +0000 (15:12 +0200)] 
i386: Rename index_register_operand predicate to register_no_SP_operand

Rename index_register_operand predicate to what it really does.

No functional change.

gcc/ChangeLog:

* config/i386/predicates.md (register_no_SP_operand):
Rename from index_register_operand.
(call_register_operand): Update for rename.
* config/i386/i386.md (*lea<mode>_general_[1234]): Update for rename.

2 years ago  match.pd: Use splits in makefile and make configurable.
Tamar Christina [Fri, 5 May 2023 12:42:17 +0000 (13:42 +0100)] 
match.pd: Use splits in makefile and make configurable.

This updates the build system to split up match.pd files into chunks of 10.
This also introduces a new flag --with-matchpd-partitions which can be used to
change the number of partitions.

For the analysis of why 10 please look at the previous patch in the series.

gcc/ChangeLog:

PR bootstrap/84402
* Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ,
GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O,
GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New.
(OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them.
(s-match): Split into s-generic-match and s-gimple-match.
* configure.ac (with-matchpd-partitions,
DEFAULT_MATCHPD_PARTITIONS): New.
* configure: Regenerate.

2 years ago  match.pd: automatically partition *-match.cc files.
Tamar Christina [Fri, 5 May 2023 12:38:50 +0000 (13:38 +0100)] 
match.pd: automatically partition *-match.cc files.

Following on from Richi's RFC[1] this is another attempt to split up match.pd
into multiple gimple-match and generic-match files.  This version is fully
automated and requires no human intervention.

First things first, some perf numbers.  The following shows the effect of the
patch on my desktop doing parallel compilation of gimple-match:

+--------+------------------+--------+------------------+
| splits | rel. improvement | splits | rel. improvement |
+--------+------------------+--------+------------------+
|      1 | 0.00%            |     33 | 91.03%           |
|      2 | 71.77%           |     34 | 84.02%           |
|      3 | 100.71%          |     35 | 83.42%           |
|      4 | 143.08%          |     36 | 78.80%           |
|      5 | 176.18%          |     37 | 74.06%           |
|      6 | 174.40%          |     38 | 55.76%           |
|      7 | 176.62%          |     39 | 66.90%           |
|      8 | 168.35%          |     40 | 18.25%           |
|      9 | 189.80%          |     41 | 16.55%           |
|     10 | 171.77%          |     42 | 47.02%           |
|     11 | 152.82%          |     43 | 15.29%           |
|     12 | 112.20%          |     44 | 21.63%           |
|     13 | 158.57%          |     45 | 41.53%           |
|     14 | 158.57%          |     46 | 21.98%           |
|     15 | 152.07%          |     47 | -42.74%          |
|     16 | 151.70%          |     48 | -32.62%          |
|     17 | 131.52%          |     49 | 11.81%           |
|     18 | 133.11%          |     50 | 34.07%           |
|     19 | 137.33%          |     51 | 2.71%            |
|     20 | 103.83%          |     52 | -22.23%          |
|     21 | 132.47%          |     53 | 32.30%           |
|     22 | 116.52%          |     54 | 21.45%           |
|     23 | 112.73%          |     55 | 40.02%           |
|     24 | 111.94%          |     56 | 42.83%           |
|     25 | 112.73%          |     57 | -9.98%           |
|     26 | 104.07%          |     58 | 18.01%           |
|     27 | 113.27%          |     59 | -4.91%           |
|     28 | 96.77%           |     60 | 22.94%           |
|     29 | 93.42%           |     61 | -3.73%           |
|     30 | 87.67%           |     62 | -27.43%          |
|     31 | 89.54%           |     63 | -1.05%           |
|     32 | 84.42%           |     64 | -5.44%           |
+--------+------------------+--------+------------------+

As can be seen, there is a point of diminishing returns in doing splits.
This comes from the fact that these match files include a sizeable number of
headers.  At a certain point the parsing overhead of the headers dominates and
the gains start to disappear.

As such I've made the default 10 splits, to allow some room for growth in the
future without needing changes to the split amount.  Since 5-10 splits show
roughly the same gains, we can afford to double the file sizes before we need
to up the split amount.  This can be controlled by the configure parameter
--with-matchpd-partitions=.

At 10 splits the sizes of the files are:

 1.2M gimple-match-1.cc
 490K gimple-match-2.cc
 459K gimple-match-3.cc
 462K gimple-match-4.cc
 466K gimple-match-5.cc
 690K gimple-match-6.cc
 517K gimple-match-7.cc
 693K gimple-match-8.cc
1011K gimple-match-9.cc
 490K gimple-match-10.cc
 210K gimple-match-auto.h

The reason gimple-match-1.cc is so large is that it got allocated a very
large function: gimple_simplify_NE_EXPR.

Because of these sporadically large functions, allocation to a split happens
based on the amount of data already written to that split instead of a simple
round-robin allocation (though the patch supports that too).  This means that
once gimple_simplify_NE_EXPR is allocated to gimple-match-1.cc, nothing else is
added to that file until the rest of the files catch up.
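As a rough sketch of the allocation idea (hypothetical code, not the actual
get_out_file implementation), the next function is written to whichever
partition has received the least output so far:

/* Hypothetical sketch: pick the partition with the fewest bytes emitted,
   so that one huge function does not skew a simple round-robin scheme.  */
static size_t
pick_partition (const size_t *bytes_written, size_t n_parts)
{
  size_t best = 0;
  for (size_t i = 1; i < n_parts; i++)
    if (bytes_written[i] < bytes_written[best])
      best = i;
  return best;
}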

To support this split a new header file *-match-auto.h is generated to allow
the individual files to compile separately.

Lastly for the auto generated files I use pragmas to silence the unused
predicate warnings instead of the previous Makefile way because I couldn't find
a way to set them without knowing the number of split files beforehand.

Finally, with this change, bootstrap time has dropped by 8 minutes on AArch64.

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-04/msg01125.html

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (emit_func, SIZED_BASED_CHUNKS, get_out_file): New.
(decision_tree::gen): Accept list of files instead of single and update
to write function definition to header and main file.
(write_predicate): Likewise.
(write_header): Emit pragmas and new includes.
(main): Create file buffers and cleanup.
(showUsage, write_header_includes): New.

2 years ago  genmatch: split shared code to gimple-match-exports.cc
Tamar Christina [Fri, 5 May 2023 12:37:49 +0000 (13:37 +0100)] 
genmatch: split shared code to gimple-match-exports.cc

In preparation for automatically splitting the match.pd files, I split the
non-static helper functions that are shared between the match.pd functions off
to another file.

This file can be compiled in parallel and also allows us to later avoid
duplicate symbol errors.

gcc/ChangeLog:

PR bootstrap/84402
* Makefile.in (OBJS): Add gimple-match-exports.o.
* genmatch.cc (decision_tree::gen): Export gimple_gimplify helpers.
* gimple-match-head.cc (gimple_simplify, gimple_resimplify1,
gimple_resimplify2, gimple_resimplify3, gimple_resimplify4,
gimple_resimplify5, constant_for_folding, convert_conditional_op,
maybe_resimplify_conditional_op, gimple_match_op::resimplify,
maybe_build_generic_op, build_call_internal, maybe_push_res_to_seq,
do_valueize, try_conditional_simplification, gimple_extract,
gimple_extract_op, canonicalize_code, commutative_binary_op_p,
commutative_ternary_op_p, first_commutative_argument,
associative_binary_op_p, directly_supported_p,
get_conditional_internal_fn): Moved to gimple-match-exports.cc
* gimple-match-exports.cc: New file.

2 years ago  match.pd: CSE the dump output check.
Tamar Christina [Fri, 5 May 2023 12:36:43 +0000 (13:36 +0100)] 
match.pd: CSE the dump output check.

This is a small quality-of-life codegen improvement for match.pd that saves
re-evaluating the condition for printing debug information in every function.

There is a small but consistent runtime and compile time win here.  The runtime
win comes from not having to evaluate the condition over again, and on Arm
platforms we now use the new test-and-branch support for booleans so that only
a single instruction is needed here.
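As an illustrative sketch (hypothetical, not the exact generated code), each
generated function now computes the dump condition once and tests the cached
boolean at every use:

/* Hypothetical shape of the generated code.  */
static bool
example_simplify (void)
{
  const bool debug_dump = dump_file && (dump_flags & TDF_FOLDING);
  if (debug_dump)
    fprintf (dump_file, "Applying a pattern\n");
  return false;
}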

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (decision_tree::gen, write_predicate): Generate new
debug_dump var.
(dt_simplify::gen_1): Use it.

2 years ago  match.pd: Remove commented out line pragmas unless -vv is used.
Tamar Christina [Fri, 5 May 2023 12:36:01 +0000 (13:36 +0100)] 
match.pd: Remove commented out line pragmas unless -vv is used.

genmatch currently outputs commented-out line directives that have no effect
but that the compiler still has to parse only to discard.

They are, however, handy when debugging genmatch output.  As such this moves
them behind the -vv flag.

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (output_line_directive): Only emit commented directive
when -vv.

2 years ago  match.pd: don't emit label if not needed
Tamar Christina [Fri, 5 May 2023 12:35:17 +0000 (13:35 +0100)] 
match.pd: don't emit label if not needed

This is a small QoL codegen improvement for match.pd to not emit labels when
they are not needed.  The codegen is nice and there is a small (but consistent)
improvement in compile time.

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (dt_simplify::gen_1): Only emit labels if used.

2 years ago  GCN: Silence unused-variable warning
Tobias Burnus [Fri, 5 May 2023 12:42:08 +0000 (14:42 +0200)] 
GCN: Silence unused-variable warning

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_builtin_vectorized_function): Remove
unused in_mode/in_n variables.

2 years ago  tree-optimization/109735 - conversion for vectorized pointer-diff
Richard Biener [Fri, 5 May 2023 06:54:28 +0000 (08:54 +0200)] 
tree-optimization/109735 - conversion for vectorized pointer-diff

There's handling in vectorizable_operation for POINTER_DIFF_EXPR
requiring conversion of the result of the unsigned operation to
a signed type.  But that's conditional on the "default" kind of
vectorization.  In this PR it's shown the emulated vector path
needs it and I think the masked operation case will, too (though
we might eventually never mask an integral MINUS_EXPR).  So the
following makes that handling unconditional.
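A minimal example (hypothetical, not the testcase from the PR) of a loop with
per-element pointer differences, the operation whose result needs this
conversion back to a signed type:

void
f (int **a, int **b, long *d, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = a[i] - b[i];   /* POINTER_DIFF_EXPR per element */
}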

PR tree-optimization/109735
* tree-vect-stmts.cc (vectorizable_operation): Perform
conversion for POINTER_DIFF_EXPR unconditionally.

2 years ago  i386: Introduce mulv2si3 instruction
Uros Bizjak [Fri, 5 May 2023 12:10:18 +0000 (14:10 +0200)] 
i386: Introduce mulv2si3 instruction

For SSE2 targets the expander unpacks input elements into the correct
position in the V4SI vector and emits PMULUDQ instruction.  The output
elements are then shuffled back to their positions in the V2SI vector.

For SSE4 targets the PMULLD instruction is emitted directly.
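A minimal example (hypothetical, not the new testcase) of code that can now
use this expander when MMX-with-SSE is available (e.g. 64-bit x86):

typedef int v2si __attribute__ ((vector_size (8)));

v2si
mul2 (v2si a, v2si b)
{
  return a * b;   /* element-wise V2SI multiplication */
}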

gcc/ChangeLog:

* config/i386/mmx.md (mulv2si3): New expander.
(*mulv2si3): New insn pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sse2-mmx-mult-vec.c: New test.

2 years ago  [libstdc++] [testsuite] xfail double-prec from_chars for ldbl
Alexandre Oliva [Fri, 5 May 2023 11:28:41 +0000 (08:28 -0300)] 
[libstdc++] [testsuite] xfail double-prec from_chars for ldbl

When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail.  Mark them as such on aarch64-*-vxworks.

for  libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double test06
on aarch64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
aarch64-vxworks.

2 years ago  nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
Tobias Burnus [Fri, 5 May 2023 09:27:32 +0000 (11:27 +0200)] 
nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if a translation does not contain any executable code. It
works with CUDA 11.1.  The code of this commit is about reverse offload;
having NULL values disables the side of reverse offload during image load.

The solution is the same as the one found by Thomas for a related issue:
adding a dummy procedure.  Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

gcc/
PR libgomp/108098

* config/nvptx/mkoffload.cc (process): Emit dummy procedure
alongside reverse-offload function table to prevent NULL values
of the function addresses.

2 years ago  builtins: Fix comment typo mpft_t -> mpfr_t
Jakub Jelinek [Fri, 5 May 2023 07:52:28 +0000 (09:52 +0200)] 
builtins: Fix comment typo mpft_t -> mpfr_t

I've noticed 4 typos in comments, fixed thusly.

2023-05-05  Jakub Jelinek  <jakub@redhat.com>

* builtins.cc (do_mpfr_ckconv, do_mpc_ckconv): Fix comment typo,
mpft_t -> mpfr_t.
* fold-const-call.cc (do_mpfr_ckconv, do_mpc_ckconv): Likewise.

2 years ago  PHIOPT: Fix diamond case of match_simplify_replacement
Andrew Pinski [Thu, 4 May 2023 17:07:50 +0000 (10:07 -0700)] 
PHIOPT: Fix diamond case of match_simplify_replacement

So it turns out I messed up checking which edge was true/false for the diamond
form.  The edges e0 and e1 here are edges from the merge block, but the
true/false edges are from the conditional block, and with diamond/threeway
form there is a bb in between on both edges.
Most of the time the check that was in match_simplify_replacement would
happen to be correct for the diamond form, as most of the time the first edge
out of the conditional is the edge for the true side of the conditional.
This is why I didn't see the issue during bootstrap/testing.

I added a fragile gimple testcase which exposes the issue.  Since there is
no way to specify the order of the edges in the gimple FE, we have to rely on
forwprop to swap the false/true edges (not their order, just swapping the
true/false flags) and hope that no cleanupcfg runs in between forwprop and the
first phiopt pass.  This is the fragile part: it is not that we will
produce wrong code, just that we won't hit what was the failing case.
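For reference, a hypothetical sketch of the "diamond" shape in C (not the
testcase from the PR): both arms of the conditional are separate blocks, so
the edges into the merge block are not the conditional's true/false edges.

int
diamond (int a, int b, int c)
{
  int r;
  if (a > b)
    r = c + 1;
  else
    r = c - 1;
  /* merge block: r = PHI <r(then bb), r(else bb)> */
  return r;
}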

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/109732

gcc/ChangeLog:

* tree-ssa-phiopt.cc (match_simplify_replacement): Fix the selection
of the argtrue/argfalse.

gcc/testsuite/ChangeLog:

* gcc.dg/pr109732.c: New test.
* gcc.dg/pr109732-1.c: New test.

2 years ago  MATCH: Add ABSU<a> == 0 to a == 0 simplification
Andrew Pinski [Thu, 4 May 2023 23:37:51 +0000 (23:37 +0000)] 
MATCH: Add ABSU<a> == 0 to a == 0 simplification

There is already an `ABS<a> == 0` to `a == 0` pattern,
this just extends that to ABSU too.
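A hypothetical illustration of the kind of source this targets (whether
ABSU_EXPR is actually formed here depends on earlier folding):

int
is_zero (int a)
{
  return (unsigned) __builtin_abs (a) == 0;  /* ABSU<a> == 0 -> a == 0 */
}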

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/109722

gcc/ChangeLog:

* match.pd: Extend the `ABS<a> == 0` pattern
to cover `ABSU<a> == 0` too.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/abs-1.c: New test.

2 years ago  Revert "c++: restore instantiate_decl assert"
Jason Merrill [Thu, 4 May 2023 22:37:19 +0000 (18:37 -0400)] 
Revert "c++: restore instantiate_decl assert"

In the testcase the assert fails because we use one member function from
another while we're in the middle of instantiating them all, which is
perfectly fine.  It seems complicated to detect this situation, so let's
remove the assert again.

PR c++/109658

This reverts commit 95d4c0d2e6318aef88ba0bc607dfc1ec6b7a612f.

gcc/testsuite/ChangeLog:

* g++.dg/template/local10.C: New test.

2 years ago  Daily bump.
GCC Administrator [Fri, 5 May 2023 00:16:36 +0000 (00:16 +0000)] 
Daily bump.

2 years ago  i386: Tighten ashift to lea splitter operand predicates [PR109733]
Uros Bizjak [Thu, 4 May 2023 18:26:12 +0000 (20:26 +0200)] 
i386: Tighten ashift to lea splitter operand predicates [PR109733]

The predicates of the ashift-to-lea post-reload splitter were too broad,
so the splitter tried to convert mask shift instructions.  Tighten the
operand predicates to match only general registers.

gcc/ChangeLog:

PR target/109733
* config/i386/predicates.md (index_reg_operand): New predicate.
* config/i386/i386.md (ashift to lea splitter): Use
general_reg_operand and index_reg_operand predicates.

2 years ago  PR modula2/109729 cannot use a CHAR type as a FOR loop iterator
Gaius Mulley [Thu, 4 May 2023 17:15:59 +0000 (18:15 +0100)] 
PR modula2/109729 cannot use a CHAR type as a FOR loop iterator

This patch introduces a new quadruple ArithAddOp which is used in
the construction of FOR loops to ensure that when constant folding
is applied it does not concatenate two constant char operands into
a string constant.  Overloading only occurs with constant operands.

gcc/m2/ChangeLog:

PR modula2/109729
* gm2-compiler/M2GenGCC.mod (CodeStatement): Detect
ArithAddOp and call CodeAddChecked.
(ResolveConstantExpressions): Detect ArithAddOp and call
FoldArithAdd.
(FoldArithAdd): New procedure.
(FoldAdd): Refactor to use FoldArithAdd.
* gm2-compiler/M2Quads.def (QuadOperator): Add ArithAddOp.
* gm2-compiler/M2Quads.mod: Remove commented imports.
(QuadFrame): Changed comments to use GNU coding standards.
(ArithPlusTok): New global variable.
(BuildForToByDo): Use ArithPlusTok instead of PlusTok.
(MakeOp): Detect ArithPlusTok and return ArithAddOp.
(WriteQuad): Add ArithAddOp clause.
(WriteOperator): Add ArithAddOp clause.
(Init): Initialize ArithPlusTok.

gcc/testsuite/ChangeLog:

PR modula2/109729
* gm2/pim/run/pass/ForChar.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

2 years ago  [2/2] aarch64: Reimplement (R){ADD,SUB}HN2 patterns with standard RTL codes
Kyrylo Tkachov [Thu, 4 May 2023 14:22:04 +0000 (15:22 +0100)] 
[2/2] aarch64: Reimplement (R){ADD,SUB}HN2 patterns with standard RTL codes

Similar to the previous patch, this one converts the high-half versions of the patterns.
With this patch we can remove the UNSPEC_* codes involved entirely.

Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn2<mode>_insn_le):
Rename and reimplement with RTL codes to...
(aarch64_<optab>hn2<mode>_insn_le): .. This.
(aarch64_r<optab>hn2<mode>_insn_le): New pattern.
(aarch64_<sur><addsub>hn2<mode>_insn_be): Rename and reimplement with RTL
codes to...
(aarch64_<optab>hn2<mode>_insn_be): ... This.
(aarch64_r<optab>hn2<mode>_insn_be): New pattern.
(aarch64_<sur><addsub>hn2<mode>): Rename and adjust expander to...
(aarch64_<optab>hn2<mode>): ... This.
(aarch64_r<optab>hn2<mode>): New expander.
* config/aarch64/iterators.md (UNSPEC_ADDHN, UNSPEC_RADDHN,
UNSPEC_SUBHN, UNSPEC_RSUBHN): Delete unspecs.
(ADDSUBHN): Delete.
(sur): Remove handling of the above.
(addsub): Likewise.

2 years ago  [1/2] aarch64: Reimplement (R){ADD,SUB}HN intrinsics with RTL codes
Kyrylo Tkachov [Thu, 4 May 2023 14:19:52 +0000 (15:19 +0100)] 
[1/2] aarch64: Reimplement (R){ADD,SUB}HN intrinsics with RTL codes

We can implement the halving-narrowing add/sub patterns with standard RTL codes as well rather than relying on unspecs.
This patch handles the low-part ones and the second patch does the high-part ones and removes the unspecs themselves.
The operation ADDHN on V4SI, for example, is represented as (truncate:V4HI ((src1:V4SI + src2:V4SI) >> 16))
and RADDHN as (truncate:V4HI ((src1:V4SI + src2:V4SI + (1 << 15)) >> 16)).
Taking this opportunity I specified the patterns returning the narrow mode and annotated them with the
<vczle><vczbe> define_subst rules to get the vec_concat-zero meta-patterns too. This allows us to simplify
the expanders somewhat too. Tests are added to check that the combinations work.

Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.
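For illustration (hypothetical, not the new testcase), the AdvSIMD intrinsics
that exercise the low-part patterns touched here:

#include <arm_neon.h>

int16x4_t
do_addhn (int32x4_t a, int32x4_t b)
{
  return vaddhn_s32 (a, b);    /* narrowing high-half add */
}

int16x4_t
do_raddhn (int32x4_t a, int32x4_t b)
{
  return vraddhn_s32 (a, b);   /* rounding narrowing high-half add */
}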

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn<mode>_insn_le):
Delete.
(aarch64_<optab>hn<mode>_insn<vczle><vczbe>): New define_insn.
(aarch64_<sur><addsub>hn<mode>_insn_be): Delete.
(aarch64_r<optab>hn<mode>_insn<vczle><vczbe>): New define_insn.
(aarch64_<sur><addsub>hn<mode>): Delete.
(aarch64_<optab>hn<mode>): New define_expand.
(aarch64_r<optab>hn<mode>): Likewise.
* config/aarch64/predicates.md (aarch64_simd_raddsubhn_imm_vec):
New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/pr99195_4.c: New test.

2 years ago  OpenACC: Further attach/detach clause fixes for Fortran [PR109622]
Julian Brown [Fri, 28 Apr 2023 22:27:54 +0000 (22:27 +0000)] 
OpenACC: Further attach/detach clause fixes for Fortran [PR109622]

This patch moves several tests introduced by the following patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html
  commit r14-325-gcacf65d74463600815773255e8b82b4043432bd7

into the proper location for OpenACC testing (thanks to Thomas for
spotting my mistake!), and also fixes a few additional problems --
missing diagnostics for non-pointer attaches, and a case where a pointer
was incorrectly dereferenced. Tests are also adjusted for vector-length
warnings on nvidia accelerators.

2023-04-29  Julian Brown  <julian@codesourcery.com>

PR fortran/109622

gcc/fortran/
* openmp.cc (resolve_omp_clauses): Add diagnostic for
non-pointer/non-allocatable attach/detach.
* trans-openmp.cc (gfc_trans_omp_clauses): Remove dereference for
pointer-to-scalar derived type component attach/detach.  Fix
attach/detach handling for descriptors.

gcc/testsuite/
* gfortran.dg/goacc/pr109622-5.f90: New test.
* gfortran.dg/goacc/pr109622-6.f90: New test.

libgomp/
* testsuite/libgomp.fortran/pr109622.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622.f90: ...to here. Ignore
vector length warning.
* testsuite/libgomp.fortran/pr109622-2.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622-2.f90: ...to here.  Add
missing copyin/copyout variable. Ignore vector length warnings.
* testsuite/libgomp.fortran/pr109622-3.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622-3.f90: ...to here.  Ignore
vector length warnings.
* testsuite/libgomp.oacc-fortran/pr109622-4.f90: New test.

2 years ago  libstdc++: Document new library version in manual
Jonathan Wakely [Thu, 4 May 2023 11:27:35 +0000 (12:27 +0100)] 
libstdc++: Document new library version in manual

libstdc++-v3/ChangeLog:

* doc/xml/manual/abi.xml (abi.versioning.history): Document
libstdc++.so.6.0.32 and GLIBCXX_3.4.32 version.
* doc/html/manual/abi.html: Regenerate.

2 years ago  libstdc++: Mention recent libgcc_s symbol versions in manual
Florian Weimer [Thu, 4 May 2023 11:25:36 +0000 (12:25 +0100)] 
libstdc++: Mention recent libgcc_s symbol versions in manual

GCC_11.0 is an aarch64-specific outlier.

libstdc++-v3/ChangeLog:

* doc/xml/manual/abi.xml (abi.versioning.history): Add
GCC_7.0.0, GCC_9.0.0, GCC_11.0, GCC_12.0.0, GCC_13.0.0 for
libgcc_s.

2 years ago  PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb
Andrew Pinski [Fri, 28 Apr 2023 23:21:50 +0000 (16:21 -0700)] 
PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb

While looking at differences between what minmax_replacement
and match_simplify_replacement do, I noticed that they sometimes
chose different edges to remove.  I decided we should be able to do
better and remove both empty basic blocks in the
case of match_simplify_replacement, as that moves the statements.

This also updates the testcases, as match_simplify_replacement
now removes the unused MIN/MAX_EXPR and they were checking for
those.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Handle
diamond form bb with forwarder only empty blocks better.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-15.c: Update test.
* gcc.dg/tree-ssa/minmax-16.c: Update test.
* gcc.dg/tree-ssa/minmax-3.c: Update test.
* gcc.dg/tree-ssa/minmax-4.c: Update test.
* gcc.dg/tree-ssa/minmax-5.c: Update test.
* gcc.dg/tree-ssa/minmax-8.c: Update test.

2 years ago  Move copy_phi_arg_into_existing_phi to common location and use it
Andrew Pinski [Tue, 2 May 2023 23:04:00 +0000 (16:04 -0700)] 
Move copy_phi_arg_into_existing_phi to common location and use it

While improving replace_phi_edge_with_variable for the diamond-formed bb
case, I needed a way to copy PHI entries from one edge to another as I am
removing a forwarding bb in between.  It was pointed out to me that the jump
threading code had copy_phi_arg_into_existing_phi, which I can use.
I also noticed that both gimple_duplicate_sese_tail and
remove_forwarder_block have similar code, so it makes sense to use that
function in those two locations too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-threadupdate.cc (copy_phi_arg_into_existing_phi): Move to ...
* tree-cfg.cc (copy_phi_arg_into_existing_phi): Here and remove static.
(gimple_duplicate_sese_tail): Use copy_phi_arg_into_existing_phi instead
of an inline version of it.
* tree-cfgcleanup.cc (remove_forwarder_block): Likewise.
* tree-cfg.h (copy_phi_arg_into_existing_phi): New declaration.

2 years ago  PHIOPT: Improve replace_phi_edge_with_variable's dce_ssa_names slightly
Andrew Pinski [Wed, 3 May 2023 03:21:12 +0000 (20:21 -0700)] 
PHIOPT: Improve replace_phi_edge_with_variable's dce_ssa_names slightly

When I added the dce_ssa_names argument, I didn't realize bitmap was a
pointer, so I used auto_bitmap() as the default argument value.  Instead
we can just use nullptr and check for a non-null pointer
before calling simple_dce_from_worklist.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Change
the default argument value for dce_ssa_names to nullptr.
Check to make sure dce_ssa_names is a non-nullptr before
calling simple_dce_from_worklist.

2 years ago  i386: Improve index_register_operand predicate
Uros Bizjak [Thu, 4 May 2023 10:59:24 +0000 (12:59 +0200)] 
i386: Improve index_register_operand predicate

Use the same approach as in register_no_elim_operand predicate, but also
reject stack_pointer_rtx operands.

gcc/ChangeLog:

* config/i386/predicates.md (index_register_operand): Reject
arg_pointer_rtx, frame_pointer_rtx, stack_pointer_rtx and
VIRTUAL_REGISTER_P operands.  Allow subregs of memory before reload.
(call_register_no_elim_operand): Rewrite as ...
(call_register_operand): ... this.
(call_insn_operand): Use call_register_operand predicate.

2 years ago  tree-optimization/109721 - emulated vectors
Richard Biener [Thu, 4 May 2023 08:06:47 +0000 (10:06 +0200)] 
tree-optimization/109721 - emulated vectors

When fixing PR109672 I noticed we let SImode AND through when
target_support_p, even though it isn't word_mode, and I didn't want to
change that, but I had to catch the case where SImode PLUS is supported
while emulated vectors rely on it being word_mode.  The following
makes sure to preserve the word_mode check when !target_support_p
to avoid excessive lowering later, even for bit operations.

PR tree-optimization/109721
* tree-vect-stmts.cc (vectorizable_operation): Make sure
to test word_mode for all !target_support_p operations.

2 years ago  aarch64: PR target/99195 annotate simple ternary ops for vec-concat with zero
Kyrylo Tkachov [Thu, 4 May 2023 08:42:37 +0000 (09:42 +0100)] 
aarch64: PR target/99195 annotate simple ternary ops for vec-concat with zero

We're now moving onto various simple ternary instructions, including some lane forms.
These include intrinsics that map down to mla, mls, fma, aba, bsl instructions.
Tests are added for lane 0 and lane 1 as for some of these instructions the lane 0 variants
use separate simpler patterns that need a separate annotation.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_<su>aba<mode>): Rename to...
(aarch64_<su>aba<mode><vczle><vczbe>): ... This.
(aarch64_mla<mode>): Rename to...
(aarch64_mla<mode><vczle><vczbe>): ... This.
(*aarch64_mla_elt<mode>): Rename to...
(*aarch64_mla_elt<mode><vczle><vczbe>): ... This.
(*aarch64_mla_elt_<vswap_width_name><mode>): Rename to...
(*aarch64_mla_elt_<vswap_width_name><mode><vczle><vczbe>): ... This.
(aarch64_mla_n<mode>): Rename to...
(aarch64_mla_n<mode><vczle><vczbe>): ... This.
(aarch64_mls<mode>): Rename to...
(aarch64_mls<mode><vczle><vczbe>): ... This.
(*aarch64_mls_elt<mode>): Rename to...
(*aarch64_mls_elt<mode><vczle><vczbe>): ... This.
(*aarch64_mls_elt_<vswap_width_name><mode>): Rename to...
(*aarch64_mls_elt_<vswap_width_name><mode><vczle><vczbe>): ... This.
(aarch64_mls_n<mode>): Rename to...
(aarch64_mls_n<mode><vczle><vczbe>): ... This.
(fma<mode>4): Rename to...
(fma<mode>4<vczle><vczbe>): ... This.
(*aarch64_fma4_elt<mode>): Rename to...
(*aarch64_fma4_elt<mode><vczle><vczbe>): ... This.
(*aarch64_fma4_elt_<vswap_width_name><mode>): Rename to...
(*aarch64_fma4_elt_<vswap_width_name><mode><vczle><vczbe>): ... This.
(*aarch64_fma4_elt_from_dup<mode>): Rename to...
(*aarch64_fma4_elt_from_dup<mode><vczle><vczbe>): ... This.
(fnma<mode>4): Rename to...
(fnma<mode>4<vczle><vczbe>): ... This.
(*aarch64_fnma4_elt<mode>): Rename to...
(*aarch64_fnma4_elt<mode><vczle><vczbe>): ... This.
(*aarch64_fnma4_elt_<vswap_width_name><mode>): Rename to...
(*aarch64_fnma4_elt_<vswap_width_name><mode><vczle><vczbe>): ... This.
(*aarch64_fnma4_elt_from_dup<mode>): Rename to...
(*aarch64_fnma4_elt_from_dup<mode><vczle><vczbe>): ... This.
(aarch64_simd_bsl<mode>_internal): Rename to...
(aarch64_simd_bsl<mode>_internal<vczle><vczbe>): ... This.
(*aarch64_simd_bsl<mode>_alt): Rename to...
(*aarch64_simd_bsl<mode>_alt<vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_3.c: New test.

2 years ago  aarch64: PR target/99195 annotate more simple binary ops for vec-concat with zero
Kyrylo Tkachov [Thu, 4 May 2023 08:41:46 +0000 (09:41 +0100)] 
aarch64: PR target/99195 annotate more simple binary ops for vec-concat with zero

More pattern annotations and tests to eliminate redundant vec-concat with zero instructions.
These are for the abd family of instructions and the pairwise floating-point max/min and fadd
operations too.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>): Rename to...
(aarch64_<su>abd<mode><vczle><vczbe>): ... This.
(fabd<mode>3): Rename to...
(fabd<mode>3<vczle><vczbe>): ... This.
(aarch64_<optab>p<mode>): Rename to...
(aarch64_<optab>p<mode><vczle><vczbe>): ... This.
(aarch64_faddp<mode>): Rename to...
(aarch64_faddp<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for more binary ops.
* gcc.target/aarch64/simd/pr99195_2.c: Add testing for more binary ops.

2 years ago  gcov: add GCOV format version to gcov -v
Martin Liska [Thu, 4 May 2023 08:27:55 +0000 (10:27 +0200)] 
gcov: add GCOV format version to gcov -v

gcc/ChangeLog:

* gcov.cc (GCOV_JSON_FORMAT_VERSION): New definition.
(print_version): Use it.
(generate_results): Likewise.

2 years ago  tree-optimization/109724 - new testcase
Richard Biener [Thu, 4 May 2023 07:41:33 +0000 (09:41 +0200)] 
tree-optimization/109724 - new testcase

The following adds a testcase for PR109724 which was caused by
backporting r13-2375-gbe1b42de9c151d and fixed by r11-199-g2b42509f8b7bdf.

PR tree-optimization/109724
* g++.dg/torture/pr109724.C: New testcase.

2 years ago  Rename last_stmt to last_nondebug_stmt
Richard Biener [Wed, 3 May 2023 11:29:22 +0000 (13:29 +0200)] 
Rename last_stmt to last_nondebug_stmt

The following renames last_stmt to last_nondebug_stmt which is
what it really does.

* tree-cfg.h (last_stmt): Rename to ...
(last_nondebug_stmt): ... this.
* tree-cfg.cc (last_stmt): Rename to ...
(last_nondebug_stmt): ... this.
(assign_discriminators): Adjust.
(group_case_labels_stmt): Likewise.
(gimple_can_duplicate_bb_p): Likewise.
(execute_fixup_cfg): Likewise.
* auto-profile.cc (afdo_propagate_circuit): Likewise.
* gimple-range.cc (gimple_ranger::range_on_exit): Likewise.
* omp-expand.cc (workshare_safe_to_combine_p): Likewise.
(determine_parallel_type): Likewise.
(adjust_context_and_scope): Likewise.
(expand_task_call): Likewise.
(remove_exit_barrier): Likewise.
(expand_omp_taskreg): Likewise.
(expand_omp_for_init_counts): Likewise.
(expand_omp_for_init_vars): Likewise.
(expand_omp_for_static_chunk): Likewise.
(expand_omp_simd): Likewise.
(expand_oacc_for): Likewise.
(expand_omp_for): Likewise.
(expand_omp_sections): Likewise.
(expand_omp_atomic_fetch_op): Likewise.
(expand_omp_atomic_cas): Likewise.
(expand_omp_atomic): Likewise.
(expand_omp_target): Likewise.
(expand_omp): Likewise.
(omp_make_gimple_edges): Likewise.
* trans-mem.cc (tm_region_init): Likewise.
* tree-inline.cc (redirect_all_calls): Likewise.
* tree-parloops.cc (gen_parallel_loop): Likewise.
* tree-ssa-loop-ch.cc (do_while_loop_p): Likewise.
* tree-ssa-loop-ivcanon.cc (canonicalize_loop_induction_variables):
Likewise.
* tree-ssa-loop-ivopts.cc (stmt_after_ip_normal_pos): Likewise.
(may_eliminate_iv): Likewise.
* tree-ssa-loop-manip.cc (standard_iv_increment_position): Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations):
Likewise.
(estimate_numbers_of_iterations): Likewise.
* tree-ssa-loop-split.cc (compute_added_num_insns): Likewise.
* tree-ssa-loop-unswitch.cc (get_predicates_for_bb): Likewise.
(set_predicates_for_bb): Likewise.
(init_loop_unswitch_info): Likewise.
(hoist_guard): Likewise.
* tree-ssa-phiopt.cc (match_simplify_replacement): Likewise.
(minmax_replacement): Likewise.
* tree-ssa-reassoc.cc (update_range_test): Likewise.
(optimize_range_tests_to_bit_test): Likewise.
(optimize_range_tests_var_bound): Likewise.
(optimize_range_tests): Likewise.
(no_side_effect_bb): Likewise.
(suitable_cond_bb): Likewise.
(maybe_optimize_range_tests): Likewise.
(reassociate_bb): Likewise.
* tree-vrp.cc (rvrp_folder::pre_fold_bb): Likewise.

2 years ago  i386: Fix up handling of debug insns in STV [PR109676]
Jakub Jelinek [Thu, 4 May 2023 07:36:05 +0000 (09:36 +0200)] 
i386: Fix up handling of debug insns in STV [PR109676]

The following testcase ICEs because STV replaces there
(debug_insn 114 47 51 8 (var_location:TI D#3 (reg:TI 91 [ p ])) -1
     (nil))
with
(debug_insn 114 47 51 8 (var_location:TI D#3 (reg:V1TI 91 [ p ])) -1
     (nil))
which is invalid because of the mode mismatch.
STV has fix_debug_reg_uses function which is supposed to fix this up
and adjust such debug insns into
(debug_insn 114 47 51 8 (var_location:TI D#3 (subreg:TI (reg:V1TI 91 [ p ]) 0)) -1
     (nil))
but it doesn't trigger here.
The IL before stv1 has:
(debug_insn 114 47 51 8 (var_location:TI D#3 (reg:TI 91 [ p ])) -1
     (nil))
...
(insn 63 62 64 8 (set (mem/c:TI (reg/f:DI 89 [ .result_ptr ]) [0 <retval>.mStorage+0 S16 A32])
        (reg:TI 91 [ p ])) "pr109676.C":4:48 87 {*movti_internal}
     (expr_list:REG_DEAD (reg:TI 91 [ p ])
        (nil)))
in bb 8 and
(insn 97 96 98 9 (set (reg:TI 91 [ p ])
        (mem/c:TI (plus:DI (reg/f:DI 19 frame)
                (const_int -32 [0xffffffffffffffe0])) [0 p+0 S16 A128])) "pr109676.C":26:12 87 {*movti_internal}
     (nil))
(insn 98 97 99 9 (set (mem/c:TI (plus:DI (reg/f:DI 19 frame)
                (const_int -64 [0xffffffffffffffc0])) [0 tmp+0 S16 A128])
        (reg:TI 91 [ p ])) "pr109676.C":26:12 87 {*movti_internal}
     (nil))
in bb9.
PUT_MODE on a REG is done in two spots in timode_scalar_chain::convert_insn,
one is:
  switch (GET_CODE (dst))
    {
    case REG:
      if (GET_MODE (dst) == TImode)
        {
          PUT_MODE (dst, V1TImode);
          fix_debug_reg_uses (dst);
        }
      if (GET_MODE (dst) == V1TImode)
when seeing the REG in SET_DEST, and the other one is the hunk the patch adjusts.
Because bb 8 comes first in the order the pass walks the bbs, we first
notice the TImode pseudo on insn 63 where it is SET_SRC, use PUT_MODE there
unconditionally, so for a shared REG it changes all other uses in the IL,
and then don't call fix_debug_reg_uses because DF_REG_DEF_CHAIN (REGNO (src))
is non-NULL - the REG is set in insn 97 but we haven't processed it yet.
Later on we process insn 97, but because the REG in SET_DEST already has
V1TImode, we don't do anything, even when the src handling code earlier
relied on it being done.

The following patch fixes this by using similar code for both dst and src,
in particular calling fix_debug_reg_uses once when we actually change REG
mode from TImode to V1TImode, and not later on.

2023-05-04  Jakub Jelinek  <jakub@redhat.com>

PR debug/109676
* config/i386/i386-features.cc (timode_scalar_chain::convert_insn):
If src is REG, change its mode to V1TImode and call fix_debug_reg_uses
for it only if it still has TImode.  Don't decide whether to call
fix_debug_reg_uses based on whether SRC is ever set or not.

* g++.target/i386/pr109676.C: New test.

2 years ago  CRIS: peephole2 an "and" with a contiguous "one-sided" sequence of 1s
Hans-Peter Nilsson [Mon, 17 Apr 2023 02:54:03 +0000 (04:54 +0200)] 
CRIS: peephole2 an "and" with a contiguous "one-sided" sequence of 1s

This kind of transformation seems pretty generic and might be a
candidate for adding to the middle-end, perhaps as part of combine.

I noticed these happened more often for LRA, which is the reason I
went on this track of low-hanging-fruit micro-optimizations that are
such an itch when noticing them while inspecting generated code for libgcc.
Unfortunately, this one improves coremark only by a few cycles at the
beginning or end (<0.0005%) for cris-elf -march=v10.  The size of the
coremark code is down by 0.4% (0.22% pre-lra).

Using an iterator from the start because other binary operations will
be added and their define_peephole2's would look exactly the same for
the .md part.

Some existing and-peephole2-related tests suffered, because many of
them were using patterns with only contiguous 1s in them: adjusted.
Also, spotted and fixed, by adding a space, some
scan-assembler-strings that were prone to spurious identifier or file
name matches.
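For illustration (hypothetical, not one of the new tests), the targeted "and"
has a mask whose 1s form one contiguous run touching either end of the word,
so it can be rewritten as two shifts:

unsigned int
keep_low (unsigned int x)
{
  return x & 0x3ffff;   /* equivalent to (x << 14) >> 14 for 32-bit x */
}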

gcc:
* config/cris/cris.cc (cris_split_constant): New function.
* config/cris/cris.md (splitop): New iterator.
(opsplit1): New define_peephole2.
* config/cris/cris-protos.h (cris_split_constant): Declare.
(cris_splittable_constant_p): New macro.

gcc/testsuite:
* gcc.target/cris/peep2-andsplit1.c: New test.
* gcc.target/cris/peep2-andu1.c, gcc.target/cris/peep2-andu2.c,
gcc.target/cris/peep2-xsrand.c, gcc.target/cris/peep2-xsrand2.c:
Adjust values to avoid interference with "opsplit1" with AND.  Add
whitespace to match-strings that may be confused with identifiers
or file names.

2 years ago  CRIS-LRA: Define TARGET_SPILL_CLASS
Hans-Peter Nilsson [Sat, 18 Feb 2023 04:19:21 +0000 (05:19 +0100)] 
CRIS-LRA: Define TARGET_SPILL_CLASS

This has no effect on arith-rand-ll (which suffers badly from LRA) and
marginal effects (0.01% improvement) on coremark, but the size of
coremark shrinks by 0.2%.  An earlier version was tested with a tree
around 2023-03 which showed (marginally) that ALL_REGS is preferable
to GENERAL_REGS.

* config/cris/cris.cc (TARGET_SPILL_CLASS): Define
to ALL_REGS.

2 years ago  PR modula2/109675 implementation of writeAddress is non portable
Gaius Mulley [Thu, 4 May 2023 00:37:05 +0000 (01:37 +0100)] 
PR modula2/109675 implementation of writeAddress is non portable

The implementation of gcc/m2/gm2-libs/DynamicStrings.mod:writeAddress
is non-portable as it casts a void * to an unsigned long int.  This
procedure has been re-implemented to use snprintf.  As it is a library,
the support tools 'mc' and 'pge' have been rebuilt.  There have been
linking changes in the library, and the underlying boolean type is now
bool since the last rebuild, hence the size of the patch.
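A minimal C sketch of the portable approach used here (illustrative only; the
actual change is in the Modula-2 library source):

#include <stdio.h>

/* Format a pointer without casting it to an integer type.  */
static void
write_address (char *buf, size_t len, void *addr)
{
  snprintf (buf, len, "%p", addr);
}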

gcc/m2/ChangeLog:

PR modula2/109675
* Make-lang.in (MC-LIB-DEFS): Remove M2LINK.def.
(BUILD-PGE-O): Remove GM2LINK.o.
* Make-maintainer.in (PPG-DEFS): New define.
(PPG-LIB-DEFS): Remove M2LINK.def.
(BUILD-BOOT-PPG-H): Add PPGDEF .h files.
(m2/ppg$(exeext)): Remove M2LINK.o
(PGE-DEPS): New define.
(m2/pg$(exeext)): Remove M2LINK.o.
(m2/gm2-pge-boot/$(SRC_PREFIX)%.o): Add -Im2/gm2-pge-boot.
(m2/pge$(exeext)): Remove M2LINK.o.
(pge-maintainer): Re-implement.
(pge-libs-push): Re-implement.
(m2/m2obj3/cc1gm2$(exeext)): Remove M2LINK.o.
* gm2-libs/DynamicStrings.mod (writeAddress): Re-implement
using snprintf.
* gm2-libs/M2Dependent.mod: Remove commented out imports.
* mc-boot/GDynamicStrings.cc: Rebuild.
* mc-boot/GFIO.cc: Rebuild.
* mc-boot/GFormatStrings.cc: Rebuild.
* mc-boot/GM2Dependent.cc: Rebuild.
* mc-boot/GM2Dependent.h: Rebuild.
* mc-boot/GM2RTS.cc: Rebuild.
* mc-boot/GM2RTS.h: Rebuild.
* mc-boot/GRTExceptions.cc: Rebuild.
* mc-boot/GRTint.cc: Rebuild.
* mc-boot/GSFIO.cc: Rebuild.
* mc-boot/GStringConvert.cc: Rebuild.
* mc-boot/Gdecl.cc: Rebuild.
* pge-boot/GASCII.cc: Rebuild.
* pge-boot/GASCII.h: Rebuild.
* pge-boot/GArgs.cc: Rebuild.
* pge-boot/GArgs.h: Rebuild.
* pge-boot/GAssertion.cc: Rebuild.
* pge-boot/GAssertion.h: Rebuild.
* pge-boot/GBreak.h: Rebuild.
* pge-boot/GCmdArgs.h: Rebuild.
* pge-boot/GDebug.cc: Rebuild.
* pge-boot/GDebug.h: Rebuild.
* pge-boot/GDynamicStrings.cc: Rebuild.
* pge-boot/GDynamicStrings.h: Rebuild.
* pge-boot/GEnvironment.h: Rebuild.
* pge-boot/GFIO.cc: Rebuild.
* pge-boot/GFIO.h: Rebuild.
* pge-boot/GFormatStrings.h:: Rebuild.
* pge-boot/GFpuIO.h:: Rebuild.
* pge-boot/GIO.cc: Rebuild.
* pge-boot/GIO.h: Rebuild.
* pge-boot/GIndexing.cc: Rebuild.
* pge-boot/GIndexing.h: Rebuild.
* pge-boot/GLists.cc: Rebuild.
* pge-boot/GLists.h: Rebuild.
* pge-boot/GM2Dependent.cc: Rebuild.
* pge-boot/GM2Dependent.h: Rebuild.
* pge-boot/GM2EXCEPTION.cc: Rebuild.
* pge-boot/GM2EXCEPTION.h: Rebuild.
* pge-boot/GM2RTS.cc: Rebuild.
* pge-boot/GM2RTS.h: Rebuild.
* pge-boot/GNameKey.cc: Rebuild.
* pge-boot/GNameKey.h: Rebuild.
* pge-boot/GNumberIO.cc: Rebuild.
* pge-boot/GNumberIO.h: Rebuild.
* pge-boot/GOutput.cc: Rebuild.
* pge-boot/GOutput.h: Rebuild.
* pge-boot/GPushBackInput.cc: Rebuild.
* pge-boot/GPushBackInput.h: Rebuild.
* pge-boot/GRTExceptions.cc: Rebuild.
* pge-boot/GRTExceptions.h: Rebuild.
* pge-boot/GSArgs.h: Rebuild.
* pge-boot/GSEnvironment.h: Rebuild.
* pge-boot/GSFIO.cc: Rebuild.
* pge-boot/GSFIO.h: Rebuild.
* pge-boot/GSYSTEM.h: Rebuild.
* pge-boot/GScan.h: Rebuild.
* pge-boot/GStdIO.cc: Rebuild.
* pge-boot/GStdIO.h: Rebuild.
* pge-boot/GStorage.cc: Rebuild.
* pge-boot/GStorage.h: Rebuild.
* pge-boot/GStrCase.cc: Rebuild.
* pge-boot/GStrCase.h: Rebuild.
* pge-boot/GStrIO.cc: Rebuild.
* pge-boot/GStrIO.h: Rebuild.
* pge-boot/GStrLib.cc: Rebuild.
* pge-boot/GStrLib.h: Rebuild.
* pge-boot/GStringConvert.h: Rebuild.
* pge-boot/GSymbolKey.cc: Rebuild.
* pge-boot/GSymbolKey.h: Rebuild.
* pge-boot/GSysExceptions.h: Rebuild.
* pge-boot/GSysStorage.cc: Rebuild.
* pge-boot/GSysStorage.h: Rebuild.
* pge-boot/GTimeString.h: Rebuild.
* pge-boot/GUnixArgs.h: Rebuild.
* pge-boot/Gbnflex.cc: Rebuild.
* pge-boot/Gbnflex.h: Rebuild.
* pge-boot/Gdtoa.h: Rebuild.
* pge-boot/Gerrno.h: Rebuild.
* pge-boot/Gldtoa.h: Rebuild.
* pge-boot/Glibc.h: Rebuild.
* pge-boot/Glibm.h: Rebuild.
* pge-boot/Gpge.cc: Rebuild.
* pge-boot/Gtermios.h: Rebuild.
* pge-boot/Gwrapc.h: Rebuild.
* mc-boot/GM2LINK.h: Removed.
* pge-boot/GM2LINK.cc: Removed.
* pge-boot/GM2LINK.h: Removed.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

2 years ago  Daily bump.
GCC Administrator [Thu, 4 May 2023 00:17:29 +0000 (00:17 +0000)] 
Daily bump.

2 years ago  CRIS-LRA: Fix uses of reload_in_progress
Hans-Peter Nilsson [Mon, 30 Jan 2023 00:53:19 +0000 (01:53 +0100)] 
CRIS-LRA: Fix uses of reload_in_progress

This shows no difference in either arith-rand-ll or coremark
numbers.  Comparing libgcc and newlib libc before/after, the only
difference can be seen in a few functions where it's mostly neutral
(newlib's _svfprintf_r et al) and one function (__gdtoa), which
improves ever so slightly (four bytes less; one load less, but one
instruction reading from memory instead of a register).

* config/cris/cris.cc (cris_side_effect_mode_ok): Use
lra_in_progress, not reload_in_progress.
* config/cris/cris.md ("movdi", "*addi_reload"): Ditto.
* config/cris/constraints.md ("Q"): Ditto.

2 years ago  libstdc++: Fix up abi.exp FAILs on powerpc64le-linux
Jakub Jelinek [Wed, 3 May 2023 20:32:50 +0000 (22:32 +0200)] 
libstdc++: Fix up abi.exp FAILs on powerpc64le-linux

This is an ABI problem on powerpc64le-linux, introduced in 13.1.
When libstdc++ is configured against old glibc, the
_ZSt10from_charsPKcS0_RDF128_St12chars_format@@GLIBCXX_3.4.31
_ZSt8to_charsPcS_DF128_@@GLIBCXX_3.4.31
_ZSt8to_charsPcS_DF128_St12chars_format@@GLIBCXX_3.4.31
_ZSt8to_charsPcS_DF128_St12chars_formati@@GLIBCXX_3.4.31
symbols are exported from the library, while when it is configured against
new enough glibc, those symbols aren't exported and we export instead
_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format@@GLIBCXX_IEEE128_3.4.29
_ZSt8to_charsPcS_u9__ieee128@@GLIBCXX_IEEE128_3.4.29
_ZSt8to_charsPcS_u9__ieee128St12chars_format@@GLIBCXX_IEEE128_3.4.29
_ZSt8to_charsPcS_u9__ieee128St12chars_formati@@GLIBCXX_IEEE128_3.4.29
together with various other @@GLIBCXX_IEEE128_3.4.{29,30,31} and
@@CXXABI_IEEE128_1.3.13 symbols.  The idea was that those *IEEE128* symbol
versions (similarly to *LDBL* symbol versions) are optional (but if it
appears, all symbols from it up to the version of the library appears),
but the base appears always.
My _Float128 from_chars/to_chars changes unfortunately broke this.
I believe nothing really uses those symbols if libstdc++ has been
configured against old glibc, so if 13.1 wasn't already released, it might
be best to make sure they aren't exported on powerpc64le-linux.
But as they were exported, I think the best resolution for this ABI
difference is to add those 4 symbols as aliases to the
GLIBCXX_IEEE128_3.4.29 *u9__ieee128* symbols, which the following patch
does.

2023-05-03  Jakub Jelinek  <jakub@redhat.com>

* src/c++17/floating_from_chars.cc
(_ZSt10from_charsPKcS0_RDF128_St12chars_format): New alias to
_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format.
* src/c++17/floating_to_chars.cc (_ZSt8to_charsPcS_DF128_): New alias to
_ZSt8to_charsPcS_u9__ieee128.
(_ZSt8to_charsPcS_DF128_St12chars_format): New alias to
_ZSt8to_charsPcS_u9__ieee128St12chars_format.
(_ZSt8to_charsPcS_DF128_St12chars_formati): New alias to
_ZSt8to_charsPcS_u9__ieee128St12chars_formati.
* config/abi/post/powerpc64le-linux-gnu/baseline_symbols.txt: Updated.

2 years ago  libstdc++: Fix up abi.exp FAILs on powerpc64-linux
Jakub Jelinek [Wed, 3 May 2023 20:31:40 +0000 (22:31 +0200)] 
libstdc++: Fix up abi.exp FAILs on powerpc64-linux

As discussed on IRC, my _Float128/_Float64x support changes broke
abi.exp testing on powerpc64-linux.

The
_ZTIDF128_@@CXXABI_1.3.14
_ZTIDF64x@@CXXABI_1.3.14
_ZTIPDF128_@@CXXABI_1.3.14
_ZTIPDF64x@@CXXABI_1.3.14
_ZTIPKDF128_@@CXXABI_1.3.14
_ZTIPKDF64x@@CXXABI_1.3.14
symbols only appear on powerpc64le-linux (both when building against
very old glibcs as well as contemporary glibcs), while they don't
appear on powerpc64-linux, because the latter never has _Float128 or
_Float64x support.

But we were using the same baseline_symbols.txt file for both
powerpc64-linux and powerpc64le-linux, even when it contained quite a lot
of stuff specific to the latter; but that was just the IEEE128 related
stuff that appears only when configured against not very old glibc.

The following patch keeps those exports as is and just splits the
config/abi/post/ files, copies the current one to powerpc64le-linux
unmodified and removes the above mentioned symbols plus all
GLIBCXX_IEEE128_3.4.{29,30,31} and CXXABI_IEEE128_1.3.13 symbols
from the powerpc64-linux version.

2023-05-03  Jakub Jelinek  <jakub@redhat.com>

* configure.host (abi_baseline_pair): Use powerpc64le-linux-gnu
rather than powerpc64-linux-gnu for powerpc64le*-linux*.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Remove
_ZTI*DF128_, _ZTI*DF64x symbols and symbols in
GLIBCXX_IEEE128_3.4.{29,30,31} and CXXABI_IEEE128_1.3.13 symbol
versions.
* config/abi/post/powerpc64le-linux-gnu/baseline_symbols.txt: New
file.

2 years agoc++: over-eager friend matching [PR109649]
Jason Merrill [Wed, 3 May 2023 17:32:55 +0000 (13:32 -0400)] 
c++: over-eager friend matching [PR109649]

This is a bug in the simplification I did around 91618: at this point
X<int>::f has DECL_IMPLICIT_INSTANTIATION set, but we have already
identified which template it corresponds to, so we don't want to call
check_explicit_specialization.

To distinguish this case we need to look at DECL_TI_TEMPLATE.  grokfndecl
has for a long time set it to the OVERLOAD in this case, while the new cases
I added for 91618 were leaving DECL_TEMPLATE_INFO null; let's adjust them to
match.
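
For illustration, a hypothetical reduced example of the kind of friend
declaration involved (not the actual friend77.C test): befriending a
member of a class template specialization whose template has already
been identified.

  template<typename T> struct X { void f (); };

  class Y
  {
    // X<int>::f has DECL_IMPLICIT_INSTANTIATION set here, but its
    // template is already known, so check_explicit_specialization
    // must not be called again for it.
    friend void X<int>::f ();
  };

  template<typename T> void X<T>::f () {}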

PR c++/91618
PR c++/109649

gcc/cp/ChangeLog:

* friend.cc (do_friend): Don't call check_explicit_specialization if
DECL_TEMPLATE_INFO is already set.
* decl2.cc (check_classfn): Set DECL_TEMPLATE_INFO.
* name-lookup.cc (set_decl_namespace): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend77.C: New test.

2 years agoAdd stats to simple_dce_from_worklist
Andrew Pinski [Tue, 2 May 2023 07:08:19 +0000 (00:08 -0700)] 
Add stats to simple_dce_from_worklist

While looking to move substitute_and_fold_engine
over to use simple_dce_from_worklist, I noticed
that we don't record stats for the removed stmts/phis.
This patch does that.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist): Record
stats on the number of removed statements and phis.

2 years agoAllow varying ranges of unknown types in irange::verify_range [PR109711]
Aldy Hernandez [Wed, 3 May 2023 15:29:24 +0000 (17:29 +0200)] 
Allow varying ranges of unknown types in irange::verify_range [PR109711]

The old legacy code allowed building ranges of unknown types so that passes
like IPA could build and propagate VARYING.  For now it's easiest to
allow the old behavior, since you can't do anything with these
ranges except build them and copy them.

Eventually we should convert all users of set_varying() to use
supported types.  I will address this in my upcoming IPA work.

PR tree-optimization/109711

gcc/ChangeLog:

* value-range.cc (irange::verify_range): Allow types of
error_mark_node.

2 years agodo not tailcall __sanitizer_cov_trace_pc [PR90746]
Alexander Monakov [Thu, 19 Jan 2023 16:25:04 +0000 (19:25 +0300)] 
do not tailcall __sanitizer_cov_trace_pc [PR90746]

When instrumentation is requested via -fsanitize-coverage=trace-pc, GCC
emits calls of __sanitizer_cov_trace_pc callback in each basic block.
This callback is supposed to be implemented by the user, and should be
able to identify the containing basic block by inspecting its return
address. Tailcalling the callback prevents that, so disallow it.
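
For illustration, a minimal user-side callback of the kind this contract
implies (a sketch, not part of the patch); it relies on
__builtin_return_address pointing into the instrumented block, which is
exactly what a tail call would break.

  // User-provided callback for -fsanitize-coverage=trace-pc; the return
  // address identifies the basic block containing the (non-tail) call.
  extern "C" void __sanitizer_cov_trace_pc (void)
  {
    void *pc = __builtin_return_address (0);
    // Record pc somewhere, e.g. append it to a coverage buffer.
    (void) pc;
  }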

gcc/ChangeLog:

PR sanitizer/90746
* calls.cc (can_implement_as_sibling_call_p): Reject calls
to __sanitizer_cov_trace_pc.

gcc/testsuite/ChangeLog:

PR sanitizer/90746
* gcc.dg/sancov/basic0.c: Verify absence of tailcall.

2 years agoaarch64: Fix ABI handling of aligned enums [PR109661]
Richard Sandiford [Wed, 3 May 2023 16:43:48 +0000 (17:43 +0100)] 
aarch64: Fix ABI handling of aligned enums [PR109661]

aarch64_function_arg_alignment has traditionally taken the alignment
of a scalar type T from TYPE_ALIGN (TYPE_MAIN_VARIANT (T)).  This is
supposed to discard any user alignment and give the alignment of the
underlying fundamental type.

PR109661 shows that this did the wrong thing for enums with
a defined underlying type, because:

(1) The enum itself could be aligned, using attributes.
(2) The enum would pick up any user alignment on the underlying type.

We get the right behaviour if we look at the TYPE_MAIN_VARIANT
of the underlying type instead.
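
For illustration, a hypothetical example of the kind of type affected
(not the actual pr109661 testcases), assuming an enum whose defined
underlying type carries extra user alignment:

  // The attribute used to leak into the enum's argument-passing
  // alignment; the fix takes the alignment from the underlying type's
  // TYPE_MAIN_VARIANT instead.
  typedef unsigned long long ull16 __attribute__ ((aligned (16)));
  enum class E : ull16 { A };

  extern void callee (int, E);
  void caller (E e) { callee (0, e); }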

As always, this affects register and stack arguments differently,
because:

(a) The code that handles register arguments only considers the
    alignment of types that occupy two registers, whereas the
    stack alignment is applied regardless of size.

(b) The code that handles register arguments tests the alignment
    for equality with 16 bytes, so that (unexpected) greater alignments
    are ignored.  The code that handles stack arguments instead caps the
    alignment to 16 bytes.

There is now (since GCC 13) an assert to trap the difference between
(a) and (b), which is how the new incompatibility showed up.

Clang already handled the testcases correctly, so this patch aligns
the GCC behaviour with the Clang behaviour.

I'm planning to remove the asserts on the branches, since we don't
want to change the ABI there.

gcc/
PR target/109661
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Add
a new ABI break parameter for GCC 14.  Set it to the alignment
of enums that have an underlying type.  Take the true alignment
of such enums from the TYPE_ALIGN of the underlying type's
TYPE_MAIN_VARIANT.
(aarch64_function_arg_boundary): Update accordingly.
(aarch64_layout_arg, aarch64_gimplify_va_arg_expr): Likewise.
Warn about ABI differences.

gcc/testsuite/
* g++.target/aarch64/pr109661-1.C: New test.
* g++.target/aarch64/pr109661-2.C: Likewise.
* g++.target/aarch64/pr109661-3.C: Likewise.
* g++.target/aarch64/pr109661-4.C: Likewise.
* gcc.target/aarch64/pr109661-1.c: Likewise.

2 years agoaarch64: Rename abi_break parameters [PR109661]
Richard Sandiford [Wed, 3 May 2023 16:43:48 +0000 (17:43 +0100)] 
aarch64: Rename abi_break parameters [PR109661]

aarch64_function_arg_alignment has two related abi_break
parameters: abi_break for a change in GCC 9, and abi_break_packed
for a related follow-on change in GCC 13.  In a sense, abi_break_packed
is a "subfix" of abi_break.

PR109661 now requires a third ABI break that is independent
of the other two.  Having abi_break for the GCC 9 break and
abi_break_<something> for the GCC 13 and GCC 14 breaks might
give the impression that they're all related, and that the GCC 14
fix (like the GCC 13 fix) is a "subfix" of the GCC 9 one.
It therefore seemed like a good idea to rename the existing
variables first.

It would be difficult to choose names that describe briefly and
precisely what went wrong in each case.  The next best thing
seemed to be to name them after the relevant GCC version.
(Of course, this might break down in future if we need two
independent fixes in the same version.  Let's hope not.)

I wondered about putting all the variables in a structure,
but one advantage of using independent variables is that it's
harder to forget to update a caller.  Maybe a fourth parameter
would be a tipping point.

gcc/
PR target/109661
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Rename
ABI break variables to abi_break_gcc_9 and abi_break_gcc_13.
(aarch64_layout_arg, aarch64_function_arg_boundary): Likewise.
(aarch64_gimplify_va_arg_expr): Likewise.

2 years agoarm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq...
Christophe Lyon [Thu, 20 Oct 2022 18:45:15 +0000 (18:45 +0000)] 
arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq

Implement vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq using the new MVE builtins framework.
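
The user-facing intrinsics are unchanged; only the builtin plumbing
moves to the new framework.  A usage sketch, assuming an MVE-enabled Arm
target and <arm_mve.h>:

  #include <arm_mve.h>

  int32x4_t halving_add (int32x4_t a, int32x4_t b)
  {
    return vhaddq_s32 (a, b);      // (a + b) >> 1, no intermediate overflow
  }

  uint8x16_t rounding_halving_add (uint8x16_t a, uint8x16_t b)
  {
    return vrhaddq_u8 (a, b);      // (a + b + 1) >> 1
  }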

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_NO_F)
(FUNCTION_WITHOUT_N_NO_F, FUNCTION_WITH_M_N_NO_U_F): New.
(vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq)
(vrmulhq): New.
* config/arm/arm-mve-builtins-base.def (vhaddq, vhsubq, vmulhq)
(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
* config/arm/arm-mve-builtins-base.h (vhaddq, vhsubq, vmulhq)
(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
* config/arm/arm_mve.h (vhsubq): Remove.
(vhaddq): Remove.
(vhaddq_m): Remove.
(vhsubq_m): Remove.
(vhaddq_x): Remove.
(vhsubq_x): Remove.
(vhsubq_u8): Remove.
(vhsubq_n_u8): Remove.
(vhaddq_u8): Remove.
(vhaddq_n_u8): Remove.
(vhsubq_s8): Remove.
(vhsubq_n_s8): Remove.
(vhaddq_s8): Remove.
(vhaddq_n_s8): Remove.
(vhsubq_u16): Remove.
(vhsubq_n_u16): Remove.
(vhaddq_u16): Remove.
(vhaddq_n_u16): Remove.
(vhsubq_s16): Remove.
(vhsubq_n_s16): Remove.
(vhaddq_s16): Remove.
(vhaddq_n_s16): Remove.
(vhsubq_u32): Remove.
(vhsubq_n_u32): Remove.
(vhaddq_u32): Remove.
(vhaddq_n_u32): Remove.
(vhsubq_s32): Remove.
(vhsubq_n_s32): Remove.
(vhaddq_s32): Remove.
(vhaddq_n_s32): Remove.
(vhaddq_m_n_s8): Remove.
(vhaddq_m_n_s32): Remove.
(vhaddq_m_n_s16): Remove.
(vhaddq_m_n_u8): Remove.
(vhaddq_m_n_u32): Remove.
(vhaddq_m_n_u16): Remove.
(vhaddq_m_s8): Remove.
(vhaddq_m_s32): Remove.
(vhaddq_m_s16): Remove.
(vhaddq_m_u8): Remove.
(vhaddq_m_u32): Remove.
(vhaddq_m_u16): Remove.
(vhsubq_m_n_s8): Remove.
(vhsubq_m_n_s32): Remove.
(vhsubq_m_n_s16): Remove.
(vhsubq_m_n_u8): Remove.
(vhsubq_m_n_u32): Remove.
(vhsubq_m_n_u16): Remove.
(vhsubq_m_s8): Remove.
(vhsubq_m_s32): Remove.
(vhsubq_m_s16): Remove.
(vhsubq_m_u8): Remove.
(vhsubq_m_u32): Remove.
(vhsubq_m_u16): Remove.
(vhaddq_x_n_s8): Remove.
(vhaddq_x_n_s16): Remove.
(vhaddq_x_n_s32): Remove.
(vhaddq_x_n_u8): Remove.
(vhaddq_x_n_u16): Remove.
(vhaddq_x_n_u32): Remove.
(vhaddq_x_s8): Remove.
(vhaddq_x_s16): Remove.
(vhaddq_x_s32): Remove.
(vhaddq_x_u8): Remove.
(vhaddq_x_u16): Remove.
(vhaddq_x_u32): Remove.
(vhsubq_x_n_s8): Remove.
(vhsubq_x_n_s16): Remove.
(vhsubq_x_n_s32): Remove.
(vhsubq_x_n_u8): Remove.
(vhsubq_x_n_u16): Remove.
(vhsubq_x_n_u32): Remove.
(vhsubq_x_s8): Remove.
(vhsubq_x_s16): Remove.
(vhsubq_x_s32): Remove.
(vhsubq_x_u8): Remove.
(vhsubq_x_u16): Remove.
(vhsubq_x_u32): Remove.
(__arm_vhsubq_u8): Remove.
(__arm_vhsubq_n_u8): Remove.
(__arm_vhaddq_u8): Remove.
(__arm_vhaddq_n_u8): Remove.
(__arm_vhsubq_s8): Remove.
(__arm_vhsubq_n_s8): Remove.
(__arm_vhaddq_s8): Remove.
(__arm_vhaddq_n_s8): Remove.
(__arm_vhsubq_u16): Remove.
(__arm_vhsubq_n_u16): Remove.
(__arm_vhaddq_u16): Remove.
(__arm_vhaddq_n_u16): Remove.
(__arm_vhsubq_s16): Remove.
(__arm_vhsubq_n_s16): Remove.
(__arm_vhaddq_s16): Remove.
(__arm_vhaddq_n_s16): Remove.
(__arm_vhsubq_u32): Remove.
(__arm_vhsubq_n_u32): Remove.
(__arm_vhaddq_u32): Remove.
(__arm_vhaddq_n_u32): Remove.
(__arm_vhsubq_s32): Remove.
(__arm_vhsubq_n_s32): Remove.
(__arm_vhaddq_s32): Remove.
(__arm_vhaddq_n_s32): Remove.
(__arm_vhaddq_m_n_s8): Remove.
(__arm_vhaddq_m_n_s32): Remove.
(__arm_vhaddq_m_n_s16): Remove.
(__arm_vhaddq_m_n_u8): Remove.
(__arm_vhaddq_m_n_u32): Remove.
(__arm_vhaddq_m_n_u16): Remove.
(__arm_vhaddq_m_s8): Remove.
(__arm_vhaddq_m_s32): Remove.
(__arm_vhaddq_m_s16): Remove.
(__arm_vhaddq_m_u8): Remove.
(__arm_vhaddq_m_u32): Remove.
(__arm_vhaddq_m_u16): Remove.
(__arm_vhsubq_m_n_s8): Remove.
(__arm_vhsubq_m_n_s32): Remove.
(__arm_vhsubq_m_n_s16): Remove.
(__arm_vhsubq_m_n_u8): Remove.
(__arm_vhsubq_m_n_u32): Remove.
(__arm_vhsubq_m_n_u16): Remove.
(__arm_vhsubq_m_s8): Remove.
(__arm_vhsubq_m_s32): Remove.
(__arm_vhsubq_m_s16): Remove.
(__arm_vhsubq_m_u8): Remove.
(__arm_vhsubq_m_u32): Remove.
(__arm_vhsubq_m_u16): Remove.
(__arm_vhaddq_x_n_s8): Remove.
(__arm_vhaddq_x_n_s16): Remove.
(__arm_vhaddq_x_n_s32): Remove.
(__arm_vhaddq_x_n_u8): Remove.
(__arm_vhaddq_x_n_u16): Remove.
(__arm_vhaddq_x_n_u32): Remove.
(__arm_vhaddq_x_s8): Remove.
(__arm_vhaddq_x_s16): Remove.
(__arm_vhaddq_x_s32): Remove.
(__arm_vhaddq_x_u8): Remove.
(__arm_vhaddq_x_u16): Remove.
(__arm_vhaddq_x_u32): Remove.
(__arm_vhsubq_x_n_s8): Remove.
(__arm_vhsubq_x_n_s16): Remove.
(__arm_vhsubq_x_n_s32): Remove.
(__arm_vhsubq_x_n_u8): Remove.
(__arm_vhsubq_x_n_u16): Remove.
(__arm_vhsubq_x_n_u32): Remove.
(__arm_vhsubq_x_s8): Remove.
(__arm_vhsubq_x_s16): Remove.
(__arm_vhsubq_x_s32): Remove.
(__arm_vhsubq_x_u8): Remove.
(__arm_vhsubq_x_u16): Remove.
(__arm_vhsubq_x_u32): Remove.
(__arm_vhsubq): Remove.
(__arm_vhaddq): Remove.
(__arm_vhaddq_m): Remove.
(__arm_vhsubq_m): Remove.
(__arm_vhaddq_x): Remove.
(__arm_vhsubq_x): Remove.
(vmulhq): Remove.
(vmulhq_m): Remove.
(vmulhq_x): Remove.
(vmulhq_u8): Remove.
(vmulhq_s8): Remove.
(vmulhq_u16): Remove.
(vmulhq_s16): Remove.
(vmulhq_u32): Remove.
(vmulhq_s32): Remove.
(vmulhq_m_s8): Remove.
(vmulhq_m_s32): Remove.
(vmulhq_m_s16): Remove.
(vmulhq_m_u8): Remove.
(vmulhq_m_u32): Remove.
(vmulhq_m_u16): Remove.
(vmulhq_x_s8): Remove.
(vmulhq_x_s16): Remove.
(vmulhq_x_s32): Remove.
(vmulhq_x_u8): Remove.
(vmulhq_x_u16): Remove.
(vmulhq_x_u32): Remove.
(__arm_vmulhq_u8): Remove.
(__arm_vmulhq_s8): Remove.
(__arm_vmulhq_u16): Remove.
(__arm_vmulhq_s16): Remove.
(__arm_vmulhq_u32): Remove.
(__arm_vmulhq_s32): Remove.
(__arm_vmulhq_m_s8): Remove.
(__arm_vmulhq_m_s32): Remove.
(__arm_vmulhq_m_s16): Remove.
(__arm_vmulhq_m_u8): Remove.
(__arm_vmulhq_m_u32): Remove.
(__arm_vmulhq_m_u16): Remove.
(__arm_vmulhq_x_s8): Remove.
(__arm_vmulhq_x_s16): Remove.
(__arm_vmulhq_x_s32): Remove.
(__arm_vmulhq_x_u8): Remove.
(__arm_vmulhq_x_u16): Remove.
(__arm_vmulhq_x_u32): Remove.
(__arm_vmulhq): Remove.
(__arm_vmulhq_m): Remove.
(__arm_vmulhq_x): Remove.
(vqsubq): Remove.
(vqaddq): Remove.
(vqaddq_m): Remove.
(vqsubq_m): Remove.
(vqsubq_u8): Remove.
(vqsubq_n_u8): Remove.
(vqaddq_u8): Remove.
(vqaddq_n_u8): Remove.
(vqsubq_s8): Remove.
(vqsubq_n_s8): Remove.
(vqaddq_s8): Remove.
(vqaddq_n_s8): Remove.
(vqsubq_u16): Remove.
(vqsubq_n_u16): Remove.
(vqaddq_u16): Remove.
(vqaddq_n_u16): Remove.
(vqsubq_s16): Remove.
(vqsubq_n_s16): Remove.
(vqaddq_s16): Remove.
(vqaddq_n_s16): Remove.
(vqsubq_u32): Remove.
(vqsubq_n_u32): Remove.
(vqaddq_u32): Remove.
(vqaddq_n_u32): Remove.
(vqsubq_s32): Remove.
(vqsubq_n_s32): Remove.
(vqaddq_s32): Remove.
(vqaddq_n_s32): Remove.
(vqaddq_m_n_s8): Remove.
(vqaddq_m_n_s32): Remove.
(vqaddq_m_n_s16): Remove.
(vqaddq_m_n_u8): Remove.
(vqaddq_m_n_u32): Remove.
(vqaddq_m_n_u16): Remove.
(vqaddq_m_s8): Remove.
(vqaddq_m_s32): Remove.
(vqaddq_m_s16): Remove.
(vqaddq_m_u8): Remove.
(vqaddq_m_u32): Remove.
(vqaddq_m_u16): Remove.
(vqsubq_m_n_s8): Remove.
(vqsubq_m_n_s32): Remove.
(vqsubq_m_n_s16): Remove.
(vqsubq_m_n_u8): Remove.
(vqsubq_m_n_u32): Remove.
(vqsubq_m_n_u16): Remove.
(vqsubq_m_s8): Remove.
(vqsubq_m_s32): Remove.
(vqsubq_m_s16): Remove.
(vqsubq_m_u8): Remove.
(vqsubq_m_u32): Remove.
(vqsubq_m_u16): Remove.
(__arm_vqsubq_u8): Remove.
(__arm_vqsubq_n_u8): Remove.
(__arm_vqaddq_u8): Remove.
(__arm_vqaddq_n_u8): Remove.
(__arm_vqsubq_s8): Remove.
(__arm_vqsubq_n_s8): Remove.
(__arm_vqaddq_s8): Remove.
(__arm_vqaddq_n_s8): Remove.
(__arm_vqsubq_u16): Remove.
(__arm_vqsubq_n_u16): Remove.
(__arm_vqaddq_u16): Remove.
(__arm_vqaddq_n_u16): Remove.
(__arm_vqsubq_s16): Remove.
(__arm_vqsubq_n_s16): Remove.
(__arm_vqaddq_s16): Remove.
(__arm_vqaddq_n_s16): Remove.
(__arm_vqsubq_u32): Remove.
(__arm_vqsubq_n_u32): Remove.
(__arm_vqaddq_u32): Remove.
(__arm_vqaddq_n_u32): Remove.
(__arm_vqsubq_s32): Remove.
(__arm_vqsubq_n_s32): Remove.
(__arm_vqaddq_s32): Remove.
(__arm_vqaddq_n_s32): Remove.
(__arm_vqaddq_m_n_s8): Remove.
(__arm_vqaddq_m_n_s32): Remove.
(__arm_vqaddq_m_n_s16): Remove.
(__arm_vqaddq_m_n_u8): Remove.
(__arm_vqaddq_m_n_u32): Remove.
(__arm_vqaddq_m_n_u16): Remove.
(__arm_vqaddq_m_s8): Remove.
(__arm_vqaddq_m_s32): Remove.
(__arm_vqaddq_m_s16): Remove.
(__arm_vqaddq_m_u8): Remove.
(__arm_vqaddq_m_u32): Remove.
(__arm_vqaddq_m_u16): Remove.
(__arm_vqsubq_m_n_s8): Remove.
(__arm_vqsubq_m_n_s32): Remove.
(__arm_vqsubq_m_n_s16): Remove.
(__arm_vqsubq_m_n_u8): Remove.
(__arm_vqsubq_m_n_u32): Remove.
(__arm_vqsubq_m_n_u16): Remove.
(__arm_vqsubq_m_s8): Remove.
(__arm_vqsubq_m_s32): Remove.
(__arm_vqsubq_m_s16): Remove.
(__arm_vqsubq_m_u8): Remove.
(__arm_vqsubq_m_u32): Remove.
(__arm_vqsubq_m_u16): Remove.
(__arm_vqsubq): Remove.
(__arm_vqaddq): Remove.
(__arm_vqaddq_m): Remove.
(__arm_vqsubq_m): Remove.
(vqdmulhq): Remove.
(vqdmulhq_m): Remove.
(vqdmulhq_s8): Remove.
(vqdmulhq_n_s8): Remove.
(vqdmulhq_s16): Remove.
(vqdmulhq_n_s16): Remove.
(vqdmulhq_s32): Remove.
(vqdmulhq_n_s32): Remove.
(vqdmulhq_m_n_s8): Remove.
(vqdmulhq_m_n_s32): Remove.
(vqdmulhq_m_n_s16): Remove.
(vqdmulhq_m_s8): Remove.
(vqdmulhq_m_s32): Remove.
(vqdmulhq_m_s16): Remove.
(__arm_vqdmulhq_s8): Remove.
(__arm_vqdmulhq_n_s8): Remove.
(__arm_vqdmulhq_s16): Remove.
(__arm_vqdmulhq_n_s16): Remove.
(__arm_vqdmulhq_s32): Remove.
(__arm_vqdmulhq_n_s32): Remove.
(__arm_vqdmulhq_m_n_s8): Remove.
(__arm_vqdmulhq_m_n_s32): Remove.
(__arm_vqdmulhq_m_n_s16): Remove.
(__arm_vqdmulhq_m_s8): Remove.
(__arm_vqdmulhq_m_s32): Remove.
(__arm_vqdmulhq_m_s16): Remove.
(__arm_vqdmulhq): Remove.
(__arm_vqdmulhq_m): Remove.
(vrhaddq): Remove.
(vrhaddq_m): Remove.
(vrhaddq_x): Remove.
(vrhaddq_u8): Remove.
(vrhaddq_s8): Remove.
(vrhaddq_u16): Remove.
(vrhaddq_s16): Remove.
(vrhaddq_u32): Remove.
(vrhaddq_s32): Remove.
(vrhaddq_m_s8): Remove.
(vrhaddq_m_s32): Remove.
(vrhaddq_m_s16): Remove.
(vrhaddq_m_u8): Remove.
(vrhaddq_m_u32): Remove.
(vrhaddq_m_u16): Remove.
(vrhaddq_x_s8): Remove.
(vrhaddq_x_s16): Remove.
(vrhaddq_x_s32): Remove.
(vrhaddq_x_u8): Remove.
(vrhaddq_x_u16): Remove.
(vrhaddq_x_u32): Remove.
(__arm_vrhaddq_u8): Remove.
(__arm_vrhaddq_s8): Remove.
(__arm_vrhaddq_u16): Remove.
(__arm_vrhaddq_s16): Remove.
(__arm_vrhaddq_u32): Remove.
(__arm_vrhaddq_s32): Remove.
(__arm_vrhaddq_m_s8): Remove.
(__arm_vrhaddq_m_s32): Remove.
(__arm_vrhaddq_m_s16): Remove.
(__arm_vrhaddq_m_u8): Remove.
(__arm_vrhaddq_m_u32): Remove.
(__arm_vrhaddq_m_u16): Remove.
(__arm_vrhaddq_x_s8): Remove.
(__arm_vrhaddq_x_s16): Remove.
(__arm_vrhaddq_x_s32): Remove.
(__arm_vrhaddq_x_u8): Remove.
(__arm_vrhaddq_x_u16): Remove.
(__arm_vrhaddq_x_u32): Remove.
(__arm_vrhaddq): Remove.
(__arm_vrhaddq_m): Remove.
(__arm_vrhaddq_x): Remove.
(vrmulhq): Remove.
(vrmulhq_m): Remove.
(vrmulhq_x): Remove.
(vrmulhq_u8): Remove.
(vrmulhq_s8): Remove.
(vrmulhq_u16): Remove.
(vrmulhq_s16): Remove.
(vrmulhq_u32): Remove.
(vrmulhq_s32): Remove.
(vrmulhq_m_s8): Remove.
(vrmulhq_m_s32): Remove.
(vrmulhq_m_s16): Remove.
(vrmulhq_m_u8): Remove.
(vrmulhq_m_u32): Remove.
(vrmulhq_m_u16): Remove.
(vrmulhq_x_s8): Remove.
(vrmulhq_x_s16): Remove.
(vrmulhq_x_s32): Remove.
(vrmulhq_x_u8): Remove.
(vrmulhq_x_u16): Remove.
(vrmulhq_x_u32): Remove.
(__arm_vrmulhq_u8): Remove.
(__arm_vrmulhq_s8): Remove.
(__arm_vrmulhq_u16): Remove.
(__arm_vrmulhq_s16): Remove.
(__arm_vrmulhq_u32): Remove.
(__arm_vrmulhq_s32): Remove.
(__arm_vrmulhq_m_s8): Remove.
(__arm_vrmulhq_m_s32): Remove.
(__arm_vrmulhq_m_s16): Remove.
(__arm_vrmulhq_m_u8): Remove.
(__arm_vrmulhq_m_u32): Remove.
(__arm_vrmulhq_m_u16): Remove.
(__arm_vrmulhq_x_s8): Remove.
(__arm_vrmulhq_x_s16): Remove.
(__arm_vrmulhq_x_s32): Remove.
(__arm_vrmulhq_x_u8): Remove.
(__arm_vrmulhq_x_u16): Remove.
(__arm_vrmulhq_x_u32): Remove.
(__arm_vrmulhq): Remove.
(__arm_vrmulhq_m): Remove.
(__arm_vrmulhq_x): Remove.

2 years agoarm: [MVE intrinsics] factorize several binary operations
Christophe Lyon [Tue, 7 Feb 2023 19:26:29 +0000 (19:26 +0000)] 
arm: [MVE intrinsics] factorize several binary operations

Factorize vabdq, vhaddq, vhsubq, vmulhq, vqaddq_u, vqdmulhq,
vqrdmulhq, vqrshlq, vqshlq, vqsubq_u, vrhaddq, vrmulhq, vrshlq
so that they use the same pattern.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_INT_SU_BINARY): New.
(mve_insn): Add vabdq, vhaddq, vhsubq, vmulhq, vqaddq, vqdmulhq,
vqrdmulhq, vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq.
(supf): Add VQDMULHQ_S, VQRDMULHQ_S.
* config/arm/mve.md (mve_vabdq_<supf><mode>)
(@mve_vhaddq_<supf><mode>, mve_vhsubq_<supf><mode>)
(mve_vmulhq_<supf><mode>, mve_vqaddq_<supf><mode>)
(mve_vqdmulhq_s<mode>, mve_vqrdmulhq_s<mode>)
(mve_vqrshlq_<supf><mode>, mve_vqshlq_<supf><mode>)
(mve_vqsubq_<supf><mode>, @mve_vrhaddq_<supf><mode>)
(mve_vrmulhq_<supf><mode>, mve_vrshlq_<supf><mode>): Merge into
...
(@mve_<mve_insn>q_<supf><mode>): ... this.
* config/arm/vec-common.md (avg<mode>3_floor, uavg<mode>3_floor)
(avg<mode>3_ceil, uavg<mode>3_ceil): Use gen_mve_q instead of
gen_mve_vhaddq / gen_mve_vrhaddq.

2 years agoarm: [MVE intrinsics] factorize several binary _m_n operations
Christophe Lyon [Tue, 7 Feb 2023 19:15:45 +0000 (19:15 +0000)] 
arm: [MVE intrinsics] factorize several binary _m_n operations

Factorize vhaddq_m_n, vhsubq_m_n, vmlaq_m_n, vmlasq_m_n, vqaddq_m_n,
vqdmlahq_m_n, vqdmlashq_m_n, vqdmulhq_m_n, vqrdmlahq_m_n,
vqrdmlashq_m_n, vqrdmulhq_m_n, vqsubq_m_n
so that they use the same pattern.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_INT_SU_M_N_BINARY): New.
(mve_insn): Add vhaddq, vhsubq, vmlaq, vmlasq, vqaddq, vqdmlahq,
vqdmlashq, vqdmulhq, vqrdmlahq, vqrdmlashq, vqrdmulhq, vqsubq.
(supf): Add VQDMLAHQ_M_N_S, VQDMLASHQ_M_N_S, VQRDMLAHQ_M_N_S,
VQRDMLASHQ_M_N_S, VQDMULHQ_M_N_S, VQRDMULHQ_M_N_S.
* config/arm/mve.md (mve_vhaddq_m_n_<supf><mode>)
(mve_vhsubq_m_n_<supf><mode>, mve_vmlaq_m_n_<supf><mode>)
(mve_vmlasq_m_n_<supf><mode>, mve_vqaddq_m_n_<supf><mode>)
(mve_vqdmlahq_m_n_s<mode>, mve_vqdmlashq_m_n_s<mode>)
(mve_vqrdmlahq_m_n_s<mode>, mve_vqrdmlashq_m_n_s<mode>)
(mve_vqsubq_m_n_<supf><mode>, mve_vqdmulhq_m_n_s<mode>)
(mve_vqrdmulhq_m_n_s<mode>): Merge into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.

2 years agoarm: [MVE intrinsics] factorize several binary _n operations
Christophe Lyon [Tue, 7 Feb 2023 19:01:52 +0000 (19:01 +0000)] 
arm: [MVE intrinsics] factorize several binary _n operations

Factorize
vhaddq_n, vhsubq_n, vqaddq_n, vqdmulhq_n, vqrdmulhq_n, vqsubq_n
so that they use the same pattern.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_INT_SU_N_BINARY): New.
(mve_insn): Add vhaddq, vhsubq, vqaddq, vqdmulhq, vqrdmulhq,
vqsubq.
(supf): Add VQDMULHQ_N_S, VQRDMULHQ_N_S.
* config/arm/mve.md (mve_vhaddq_n_<supf><mode>)
(mve_vhsubq_n_<supf><mode>, mve_vqaddq_n_<supf><mode>)
(mve_vqdmulhq_n_s<mode>, mve_vqrdmulhq_n_s<mode>)
(mve_vqsubq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.

2 years agoarm: [MVE intrinsics] factorize several binary_m operations
Christophe Lyon [Wed, 15 Feb 2023 14:38:49 +0000 (14:38 +0000)] 
arm: [MVE intrinsics] factorize several binary_m operations

Factorize m-predicated versions of vabdq, vhaddq, vhsubq, vmaxq,
vminq, vmulhq, vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq,
vqdmulhq, vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq
so that they use the same pattern.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_INT_SU_M_BINARY): New.
(mve_insn): Add vabdq, vhaddq, vhsubq, vmaxq, vminq, vmulhq,
vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq, vqdmulhq,
vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq.
(supf): Add VQDMLADHQ_M_S, VQDMLADHXQ_M_S, VQDMLSDHQ_M_S,
VQDMLSDHXQ_M_S, VQDMULHQ_M_S, VQRDMLADHQ_M_S, VQRDMLADHXQ_M_S,
VQRDMLSDHQ_M_S, VQRDMLSDHXQ_M_S, VQRDMULHQ_M_S.
* config/arm/mve.md (@mve_<mve_insn>q_m_<supf><mode>): New.
(mve_vshlq_m_<supf><mode>): Merged into
@mve_<mve_insn>q_m_<supf><mode>.
(mve_vabdq_m_<supf><mode>): Likewise.
(mve_vhaddq_m_<supf><mode>): Likewise.
(mve_vhsubq_m_<supf><mode>): Likewise.
(mve_vmaxq_m_<supf><mode>): Likewise.
(mve_vminq_m_<supf><mode>): Likewise.
(mve_vmulhq_m_<supf><mode>): Likewise.
(mve_vqaddq_m_<supf><mode>): Likewise.
(mve_vqrshlq_m_<supf><mode>): Likewise.
(mve_vqshlq_m_<supf><mode>): Likewise.
(mve_vqsubq_m_<supf><mode>): Likewise.
(mve_vrhaddq_m_<supf><mode>): Likewise.
(mve_vrmulhq_m_<supf><mode>): Likewise.
(mve_vrshlq_m_<supf><mode>): Likewise.
(mve_vqdmladhq_m_s<mode>): Likewise.
(mve_vqdmladhxq_m_s<mode>): Likewise.
(mve_vqdmlsdhq_m_s<mode>): Likewise.
(mve_vqdmlsdhxq_m_s<mode>): Likewise.
(mve_vqdmulhq_m_s<mode>): Likewise.
(mve_vqrdmladhq_m_s<mode>): Likewise.
(mve_vqrdmladhxq_m_s<mode>): Likewise.
(mve_vqrdmlsdhq_m_s<mode>): Likewise.
(mve_vqrdmlsdhxq_m_s<mode>): Likewise.
(mve_vqrdmulhq_m_s<mode>): Likewise.

2 years agoarm: [MVE intrinsics] rework vcreateq
Christophe Lyon [Wed, 15 Feb 2023 14:31:55 +0000 (14:31 +0000)] 
arm: [MVE intrinsics] rework vcreateq

Implement vcreateq using the new MVE builtins framework.
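
A usage sketch, assuming an MVE-enabled target; the intrinsic's
interface is unchanged by the rework:

  #include <stdint.h>
  #include <arm_mve.h>

  // vcreateq builds a 128-bit vector from two 64-bit halves.
  uint32x4_t make_vec (uint64_t lo, uint64_t hi)
  {
    return vcreateq_u32 (lo, hi);
  }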

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_M_N): New.
(vcreateq): New.
* config/arm/arm-mve-builtins-base.def (vcreateq): New.
* config/arm/arm-mve-builtins-base.h (vcreateq): New.
* config/arm/arm_mve.h (vcreateq_f16): Remove.
(vcreateq_f32): Remove.
(vcreateq_u8): Remove.
(vcreateq_u16): Remove.
(vcreateq_u32): Remove.
(vcreateq_u64): Remove.
(vcreateq_s8): Remove.
(vcreateq_s16): Remove.
(vcreateq_s32): Remove.
(vcreateq_s64): Remove.
(__arm_vcreateq_u8): Remove.
(__arm_vcreateq_u16): Remove.
(__arm_vcreateq_u32): Remove.
(__arm_vcreateq_u64): Remove.
(__arm_vcreateq_s8): Remove.
(__arm_vcreateq_s16): Remove.
(__arm_vcreateq_s32): Remove.
(__arm_vcreateq_s64): Remove.
(__arm_vcreateq_f16): Remove.
(__arm_vcreateq_f32): Remove.

2 years agoarm: [MVE intrinsics] factorize vcreateq
Christophe Lyon [Tue, 10 Jan 2023 16:15:00 +0000 (16:15 +0000)] 
arm: [MVE intrinsics] factorize vcreateq

We need a 'fake' iterator to be able to use mve_insn for vcreateq_f.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_FP_CREATE_ONLY): New.
(mve_insn): Add VCREATEQ_S, VCREATEQ_U, VCREATEQ_F.
* config/arm/mve.md (mve_vcreateq_f<mode>): Rename into ...
(@mve_<mve_insn>q_f<mode>): ... this.
(mve_vcreateq_<supf><mode>): Rename into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.

2 years agoarm: [MVE intrinsics] add create shape
Christophe Lyon [Wed, 15 Feb 2023 14:34:06 +0000 (14:34 +0000)] 
arm: [MVE intrinsics] add create shape

This patch adds the create shape description.

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (create): New.
* config/arm/arm-mve-builtins-shapes.h: (create): New.

2 years agoarm: [MVE intrinsics] add unspec_mve_function_exact_insn
Christophe Lyon [Fri, 23 Sep 2022 14:50:26 +0000 (14:50 +0000)] 
arm: [MVE intrinsics] add unspec_mve_function_exact_insn

Introduce a function that will be used to build intrinsics which use
UNSPECS for the versions.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn): New.

2 years agoarm: [MVE intrinsics] rework vorrq
Christophe Lyon [Mon, 20 Feb 2023 14:15:36 +0000 (14:15 +0000)] 
arm: [MVE intrinsics] rework vorrq

Implement vorrq using the new MVE builtins framework.
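
A usage sketch of the plain and merging forms, assuming an MVE-enabled
target; in the _m form the "inactive" vector comes first and the
predicate last:

  #include <arm_mve.h>

  uint16x8_t or_vec (uint16x8_t a, uint16x8_t b)
  {
    return vorrq_u16 (a, b);
  }

  uint16x8_t or_vec_masked (uint16x8_t inactive, uint16x8_t a,
                            uint16x8_t b, mve_pred16_t p)
  {
    return vorrq_m_u16 (inactive, a, b, p);
  }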

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N_NO_N_F): New.
(vorrq): New.
* config/arm/arm-mve-builtins-base.def (vorrq): New.
* config/arm/arm-mve-builtins-base.h (vorrq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vorrq.
* config/arm/arm_mve.h (vorrq): Remove.
(vorrq_m_n): Remove.
(vorrq_m): Remove.
(vorrq_x): Remove.
(vorrq_u8): Remove.
(vorrq_s8): Remove.
(vorrq_u16): Remove.
(vorrq_s16): Remove.
(vorrq_u32): Remove.
(vorrq_s32): Remove.
(vorrq_n_u16): Remove.
(vorrq_f16): Remove.
(vorrq_n_s16): Remove.
(vorrq_n_u32): Remove.
(vorrq_f32): Remove.
(vorrq_n_s32): Remove.
(vorrq_m_n_s16): Remove.
(vorrq_m_n_u16): Remove.
(vorrq_m_n_s32): Remove.
(vorrq_m_n_u32): Remove.
(vorrq_m_s8): Remove.
(vorrq_m_s32): Remove.
(vorrq_m_s16): Remove.
(vorrq_m_u8): Remove.
(vorrq_m_u32): Remove.
(vorrq_m_u16): Remove.
(vorrq_m_f32): Remove.
(vorrq_m_f16): Remove.
(vorrq_x_s8): Remove.
(vorrq_x_s16): Remove.
(vorrq_x_s32): Remove.
(vorrq_x_u8): Remove.
(vorrq_x_u16): Remove.
(vorrq_x_u32): Remove.
(vorrq_x_f16): Remove.
(vorrq_x_f32): Remove.
(__arm_vorrq_u8): Remove.
(__arm_vorrq_s8): Remove.
(__arm_vorrq_u16): Remove.
(__arm_vorrq_s16): Remove.
(__arm_vorrq_u32): Remove.
(__arm_vorrq_s32): Remove.
(__arm_vorrq_n_u16): Remove.
(__arm_vorrq_n_s16): Remove.
(__arm_vorrq_n_u32): Remove.
(__arm_vorrq_n_s32): Remove.
(__arm_vorrq_m_n_s16): Remove.
(__arm_vorrq_m_n_u16): Remove.
(__arm_vorrq_m_n_s32): Remove.
(__arm_vorrq_m_n_u32): Remove.
(__arm_vorrq_m_s8): Remove.
(__arm_vorrq_m_s32): Remove.
(__arm_vorrq_m_s16): Remove.
(__arm_vorrq_m_u8): Remove.
(__arm_vorrq_m_u32): Remove.
(__arm_vorrq_m_u16): Remove.
(__arm_vorrq_x_s8): Remove.
(__arm_vorrq_x_s16): Remove.
(__arm_vorrq_x_s32): Remove.
(__arm_vorrq_x_u8): Remove.
(__arm_vorrq_x_u16): Remove.
(__arm_vorrq_x_u32): Remove.
(__arm_vorrq_f16): Remove.
(__arm_vorrq_f32): Remove.
(__arm_vorrq_m_f32): Remove.
(__arm_vorrq_m_f16): Remove.
(__arm_vorrq_x_f16): Remove.
(__arm_vorrq_x_f32): Remove.
(__arm_vorrq): Remove.
(__arm_vorrq_m_n): Remove.
(__arm_vorrq_m): Remove.
(__arm_vorrq_x): Remove.

2 years agoarm: [MVE intrinsics] add binary_orrq shape
Christophe Lyon [Mon, 20 Feb 2023 14:04:29 +0000 (14:04 +0000)] 
arm: [MVE intrinsics] add binary_orrq shape

This patch adds the binary_orrq shape description.

MODE_n intrinsics use a set of predicates (preds_m_or_none) different
from the MODE_none ones, so we explicitly reference preds_m_or_none from
the shape; this requires making it a global array.

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_orrq): New.
* config/arm/arm-mve-builtins-shapes.h (binary_orrq): New.
* config/arm/arm-mve-builtins.cc (preds_m_or_none): Remove static.
* config/arm/arm-mve-builtins.h (preds_m_or_none): Declare.

2 years agoarm: [MVE intrinsics] rework vandq veorq
Christophe Lyon [Tue, 10 Jan 2023 15:44:02 +0000 (15:44 +0000)] 
arm: [MVE intrinsics] rework vandq veorq

Implement vandq, veorq using the new MVE builtins framework.
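
A usage sketch, assuming an MVE-enabled target; the user-visible
intrinsics keep their existing behaviour:

  #include <arm_mve.h>

  uint8x16_t and_xor (uint8x16_t a, uint8x16_t b, uint8x16_t c)
  {
    return veorq_u8 (vandq_u8 (a, b), c);   // (a & b) ^ c
  }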

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M): New.
(vandq, veorq): New.
* config/arm/arm-mve-builtins-base.def (vandq, veorq): New.
* config/arm/arm-mve-builtins-base.h (vandq, veorq): New.
* config/arm/arm_mve.h (vandq): Remove.
(vandq_m): Remove.
(vandq_x): Remove.
(vandq_u8): Remove.
(vandq_s8): Remove.
(vandq_u16): Remove.
(vandq_s16): Remove.
(vandq_u32): Remove.
(vandq_s32): Remove.
(vandq_f16): Remove.
(vandq_f32): Remove.
(vandq_m_s8): Remove.
(vandq_m_s32): Remove.
(vandq_m_s16): Remove.
(vandq_m_u8): Remove.
(vandq_m_u32): Remove.
(vandq_m_u16): Remove.
(vandq_m_f32): Remove.
(vandq_m_f16): Remove.
(vandq_x_s8): Remove.
(vandq_x_s16): Remove.
(vandq_x_s32): Remove.
(vandq_x_u8): Remove.
(vandq_x_u16): Remove.
(vandq_x_u32): Remove.
(vandq_x_f16): Remove.
(vandq_x_f32): Remove.
(__arm_vandq_u8): Remove.
(__arm_vandq_s8): Remove.
(__arm_vandq_u16): Remove.
(__arm_vandq_s16): Remove.
(__arm_vandq_u32): Remove.
(__arm_vandq_s32): Remove.
(__arm_vandq_m_s8): Remove.
(__arm_vandq_m_s32): Remove.
(__arm_vandq_m_s16): Remove.
(__arm_vandq_m_u8): Remove.
(__arm_vandq_m_u32): Remove.
(__arm_vandq_m_u16): Remove.
(__arm_vandq_x_s8): Remove.
(__arm_vandq_x_s16): Remove.
(__arm_vandq_x_s32): Remove.
(__arm_vandq_x_u8): Remove.
(__arm_vandq_x_u16): Remove.
(__arm_vandq_x_u32): Remove.
(__arm_vandq_f16): Remove.
(__arm_vandq_f32): Remove.
(__arm_vandq_m_f32): Remove.
(__arm_vandq_m_f16): Remove.
(__arm_vandq_x_f16): Remove.
(__arm_vandq_x_f32): Remove.
(__arm_vandq): Remove.
(__arm_vandq_m): Remove.
(__arm_vandq_x): Remove.
(veorq_m): Remove.
(veorq_x): Remove.
(veorq_u8): Remove.
(veorq_s8): Remove.
(veorq_u16): Remove.
(veorq_s16): Remove.
(veorq_u32): Remove.
(veorq_s32): Remove.
(veorq_f16): Remove.
(veorq_f32): Remove.
(veorq_m_s8): Remove.
(veorq_m_s32): Remove.
(veorq_m_s16): Remove.
(veorq_m_u8): Remove.
(veorq_m_u32): Remove.
(veorq_m_u16): Remove.
(veorq_m_f32): Remove.
(veorq_m_f16): Remove.
(veorq_x_s8): Remove.
(veorq_x_s16): Remove.
(veorq_x_s32): Remove.
(veorq_x_u8): Remove.
(veorq_x_u16): Remove.
(veorq_x_u32): Remove.
(veorq_x_f16): Remove.
(veorq_x_f32): Remove.
(__arm_veorq_u8): Remove.
(__arm_veorq_s8): Remove.
(__arm_veorq_u16): Remove.
(__arm_veorq_s16): Remove.
(__arm_veorq_u32): Remove.
(__arm_veorq_s32): Remove.
(__arm_veorq_m_s8): Remove.
(__arm_veorq_m_s32): Remove.
(__arm_veorq_m_s16): Remove.
(__arm_veorq_m_u8): Remove.
(__arm_veorq_m_u32): Remove.
(__arm_veorq_m_u16): Remove.
(__arm_veorq_x_s8): Remove.
(__arm_veorq_x_s16): Remove.
(__arm_veorq_x_s32): Remove.
(__arm_veorq_x_u8): Remove.
(__arm_veorq_x_u16): Remove.
(__arm_veorq_x_u32): Remove.
(__arm_veorq_f16): Remove.
(__arm_veorq_f32): Remove.
(__arm_veorq_m_f32): Remove.
(__arm_veorq_m_f16): Remove.
(__arm_veorq_x_f16): Remove.
(__arm_veorq_x_f32): Remove.
(__arm_veorq): Remove.
(__arm_veorq_m): Remove.
(__arm_veorq_x): Remove.

2 years agoarm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
Christophe Lyon [Tue, 10 Jan 2023 15:43:56 +0000 (15:43 +0000)] 
arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq

Factorize vandq, veorq, vorrq, vbicq so that they use the same
parameterized names.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC)
(MVE_FP_M_BINARY_LOGIC): New.
(MVE_INT_M_N_BINARY_LOGIC): New.
(MVE_INT_N_BINARY_LOGIC): New.
(mve_insn): Add vand, veor, vorr, vbic.
* config/arm/mve.md (mve_vandq_m_<supf><mode>)
(mve_veorq_m_<supf><mode>, mve_vorrq_m_<supf><mode>)
(mve_vbicq_m_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.
(mve_vandq_m_f<mode>, mve_veorq_m_f<mode>, mve_vorrq_m_f<mode>)
(mve_vbicq_m_f<mode>): Merge into ...
(@mve_<mve_insn>q_m_f<mode>): ... this.
(mve_vorrq_n_<supf><mode>)
(mve_vbicq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vorrq_m_n_<supf><mode>, mve_vbicq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.

2 years agoarm: [MVE intrinsics] add binary shape
Christophe Lyon [Tue, 14 Feb 2023 16:47:54 +0000 (16:47 +0000)] 
arm: [MVE intrinsics] add binary shape

This patch adds the binary shape description.

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary): New.
* config/arm/arm-mve-builtins-shapes.h (binary): New.

2 years agoarm: [MVE intrinsics] rework vaddq vmulq vsubq
Christophe Lyon [Wed, 31 Aug 2022 12:59:25 +0000 (12:59 +0000)] 
arm: [MVE intrinsics] rework vaddq vmulq vsubq

Implement vaddq, vmulq, vsubq using the new MVE builtins framework.
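
A usage sketch, assuming an MVE-enabled target (with the FP extension
for the float16 variant), showing a plain form and a predicated _m form
(inactive operand first, predicate last):

  #include <arm_mve.h>

  float16x8_t mul_add (float16x8_t a, float16x8_t b, float16x8_t c)
  {
    return vaddq_f16 (vmulq_f16 (a, b), c);
  }

  int32x4_t sub_masked (int32x4_t inactive, int32x4_t a, int32x4_t b,
                        mve_pred16_t p)
  {
    return vsubq_m_s32 (inactive, a, b, p);
  }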

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/

* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N):
New.
(vaddq, vmulq, vsubq): New.
* config/arm/arm-mve-builtins-base.def (vaddq, vmulq, vsubq): New.
* config/arm/arm-mve-builtins-base.h (vaddq, vmulq, vsubq): New.
* config/arm/arm_mve.h (vaddq): Remove.
(vaddq_m): Remove.
(vaddq_x): Remove.
(vaddq_n_u8): Remove.
(vaddq_n_s8): Remove.
(vaddq_n_u16): Remove.
(vaddq_n_s16): Remove.
(vaddq_n_u32): Remove.
(vaddq_n_s32): Remove.
(vaddq_n_f16): Remove.
(vaddq_n_f32): Remove.
(vaddq_m_n_s8): Remove.
(vaddq_m_n_s32): Remove.
(vaddq_m_n_s16): Remove.
(vaddq_m_n_u8): Remove.
(vaddq_m_n_u32): Remove.
(vaddq_m_n_u16): Remove.
(vaddq_m_s8): Remove.
(vaddq_m_s32): Remove.
(vaddq_m_s16): Remove.
(vaddq_m_u8): Remove.
(vaddq_m_u32): Remove.
(vaddq_m_u16): Remove.
(vaddq_m_f32): Remove.
(vaddq_m_f16): Remove.
(vaddq_m_n_f32): Remove.
(vaddq_m_n_f16): Remove.
(vaddq_s8): Remove.
(vaddq_s16): Remove.
(vaddq_s32): Remove.
(vaddq_u8): Remove.
(vaddq_u16): Remove.
(vaddq_u32): Remove.
(vaddq_f16): Remove.
(vaddq_f32): Remove.
(vaddq_x_s8): Remove.
(vaddq_x_s16): Remove.
(vaddq_x_s32): Remove.
(vaddq_x_n_s8): Remove.
(vaddq_x_n_s16): Remove.
(vaddq_x_n_s32): Remove.
(vaddq_x_u8): Remove.
(vaddq_x_u16): Remove.
(vaddq_x_u32): Remove.
(vaddq_x_n_u8): Remove.
(vaddq_x_n_u16): Remove.
(vaddq_x_n_u32): Remove.
(vaddq_x_f16): Remove.
(vaddq_x_f32): Remove.
(vaddq_x_n_f16): Remove.
(vaddq_x_n_f32): Remove.
(__arm_vaddq_n_u8): Remove.
(__arm_vaddq_n_s8): Remove.
(__arm_vaddq_n_u16): Remove.
(__arm_vaddq_n_s16): Remove.
(__arm_vaddq_n_u32): Remove.
(__arm_vaddq_n_s32): Remove.
(__arm_vaddq_m_n_s8): Remove.
(__arm_vaddq_m_n_s32): Remove.
(__arm_vaddq_m_n_s16): Remove.
(__arm_vaddq_m_n_u8): Remove.
(__arm_vaddq_m_n_u32): Remove.
(__arm_vaddq_m_n_u16): Remove.
(__arm_vaddq_m_s8): Remove.
(__arm_vaddq_m_s32): Remove.
(__arm_vaddq_m_s16): Remove.
(__arm_vaddq_m_u8): Remove.
(__arm_vaddq_m_u32): Remove.
(__arm_vaddq_m_u16): Remove.
(__arm_vaddq_s8): Remove.
(__arm_vaddq_s16): Remove.
(__arm_vaddq_s32): Remove.
(__arm_vaddq_u8): Remove.
(__arm_vaddq_u16): Remove.
(__arm_vaddq_u32): Remove.
(__arm_vaddq_x_s8): Remove.
(__arm_vaddq_x_s16): Remove.
(__arm_vaddq_x_s32): Remove.
(__arm_vaddq_x_n_s8): Remove.
(__arm_vaddq_x_n_s16): Remove.
(__arm_vaddq_x_n_s32): Remove.
(__arm_vaddq_x_u8): Remove.
(__arm_vaddq_x_u16): Remove.
(__arm_vaddq_x_u32): Remove.
(__arm_vaddq_x_n_u8): Remove.
(__arm_vaddq_x_n_u16): Remove.
(__arm_vaddq_x_n_u32): Remove.
(__arm_vaddq_n_f16): Remove.
(__arm_vaddq_n_f32): Remove.
(__arm_vaddq_m_f32): Remove.
(__arm_vaddq_m_f16): Remove.
(__arm_vaddq_m_n_f32): Remove.
(__arm_vaddq_m_n_f16): Remove.
(__arm_vaddq_f16): Remove.
(__arm_vaddq_f32): Remove.
(__arm_vaddq_x_f16): Remove.
(__arm_vaddq_x_f32): Remove.
(__arm_vaddq_x_n_f16): Remove.
(__arm_vaddq_x_n_f32): Remove.
(__arm_vaddq): Remove.
(__arm_vaddq_m): Remove.
(__arm_vaddq_x): Remove.
(vmulq): Remove.
(vmulq_m): Remove.
(vmulq_x): Remove.
(vmulq_u8): Remove.
(vmulq_n_u8): Remove.
(vmulq_s8): Remove.
(vmulq_n_s8): Remove.
(vmulq_u16): Remove.
(vmulq_n_u16): Remove.
(vmulq_s16): Remove.
(vmulq_n_s16): Remove.
(vmulq_u32): Remove.
(vmulq_n_u32): Remove.
(vmulq_s32): Remove.
(vmulq_n_s32): Remove.
(vmulq_n_f16): Remove.
(vmulq_f16): Remove.
(vmulq_n_f32): Remove.
(vmulq_f32): Remove.
(vmulq_m_n_s8): Remove.
(vmulq_m_n_s32): Remove.
(vmulq_m_n_s16): Remove.
(vmulq_m_n_u8): Remove.
(vmulq_m_n_u32): Remove.
(vmulq_m_n_u16): Remove.
(vmulq_m_s8): Remove.
(vmulq_m_s32): Remove.
(vmulq_m_s16): Remove.
(vmulq_m_u8): Remove.
(vmulq_m_u32): Remove.
(vmulq_m_u16): Remove.
(vmulq_m_f32): Remove.
(vmulq_m_f16): Remove.
(vmulq_m_n_f32): Remove.
(vmulq_m_n_f16): Remove.
(vmulq_x_s8): Remove.
(vmulq_x_s16): Remove.
(vmulq_x_s32): Remove.
(vmulq_x_n_s8): Remove.
(vmulq_x_n_s16): Remove.
(vmulq_x_n_s32): Remove.
(vmulq_x_u8): Remove.
(vmulq_x_u16): Remove.
(vmulq_x_u32): Remove.
(vmulq_x_n_u8): Remove.
(vmulq_x_n_u16): Remove.
(vmulq_x_n_u32): Remove.
(vmulq_x_f16): Remove.
(vmulq_x_f32): Remove.
(vmulq_x_n_f16): Remove.
(vmulq_x_n_f32): Remove.
(__arm_vmulq_u8): Remove.
(__arm_vmulq_n_u8): Remove.
(__arm_vmulq_s8): Remove.
(__arm_vmulq_n_s8): Remove.
(__arm_vmulq_u16): Remove.
(__arm_vmulq_n_u16): Remove.
(__arm_vmulq_s16): Remove.
(__arm_vmulq_n_s16): Remove.
(__arm_vmulq_u32): Remove.
(__arm_vmulq_n_u32): Remove.
(__arm_vmulq_s32): Remove.
(__arm_vmulq_n_s32): Remove.
(__arm_vmulq_m_n_s8): Remove.
(__arm_vmulq_m_n_s32): Remove.
(__arm_vmulq_m_n_s16): Remove.
(__arm_vmulq_m_n_u8): Remove.
(__arm_vmulq_m_n_u32): Remove.
(__arm_vmulq_m_n_u16): Remove.
(__arm_vmulq_m_s8): Remove.
(__arm_vmulq_m_s32): Remove.
(__arm_vmulq_m_s16): Remove.
(__arm_vmulq_m_u8): Remove.
(__arm_vmulq_m_u32): Remove.
(__arm_vmulq_m_u16): Remove.
(__arm_vmulq_x_s8): Remove.
(__arm_vmulq_x_s16): Remove.
(__arm_vmulq_x_s32): Remove.
(__arm_vmulq_x_n_s8): Remove.
(__arm_vmulq_x_n_s16): Remove.
(__arm_vmulq_x_n_s32): Remove.
(__arm_vmulq_x_u8): Remove.
(__arm_vmulq_x_u16): Remove.
(__arm_vmulq_x_u32): Remove.
(__arm_vmulq_x_n_u8): Remove.
(__arm_vmulq_x_n_u16): Remove.
(__arm_vmulq_x_n_u32): Remove.
(__arm_vmulq_n_f16): Remove.
(__arm_vmulq_f16): Remove.
(__arm_vmulq_n_f32): Remove.
(__arm_vmulq_f32): Remove.
(__arm_vmulq_m_f32): Remove.
(__arm_vmulq_m_f16): Remove.
(__arm_vmulq_m_n_f32): Remove.
(__arm_vmulq_m_n_f16): Remove.
(__arm_vmulq_x_f16): Remove.
(__arm_vmulq_x_f32): Remove.
(__arm_vmulq_x_n_f16): Remove.
(__arm_vmulq_x_n_f32): Remove.
(__arm_vmulq): Remove.
(__arm_vmulq_m): Remove.
(__arm_vmulq_x): Remove.
(vsubq): Remove.
(vsubq_m): Remove.
(vsubq_x): Remove.
(vsubq_n_f16): Remove.
(vsubq_n_f32): Remove.
(vsubq_u8): Remove.
(vsubq_n_u8): Remove.
(vsubq_s8): Remove.
(vsubq_n_s8): Remove.
(vsubq_u16): Remove.
(vsubq_n_u16): Remove.
(vsubq_s16): Remove.
(vsubq_n_s16): Remove.
(vsubq_u32): Remove.
(vsubq_n_u32): Remove.
(vsubq_s32): Remove.
(vsubq_n_s32): Remove.
(vsubq_f16): Remove.
(vsubq_f32): Remove.
(vsubq_m_s8): Remove.
(vsubq_m_u8): Remove.
(vsubq_m_s16): Remove.
(vsubq_m_u16): Remove.
(vsubq_m_s32): Remove.
(vsubq_m_u32): Remove.
(vsubq_m_n_s8): Remove.
(vsubq_m_n_s32): Remove.
(vsubq_m_n_s16): Remove.
(vsubq_m_n_u8): Remove.
(vsubq_m_n_u32): Remove.
(vsubq_m_n_u16): Remove.
(vsubq_m_f32): Remove.
(vsubq_m_f16): Remove.
(vsubq_m_n_f32): Remove.
(vsubq_m_n_f16): Remove.
(vsubq_x_s8): Remove.
(vsubq_x_s16): Remove.
(vsubq_x_s32): Remove.
(vsubq_x_n_s8): Remove.
(vsubq_x_n_s16): Remove.
(vsubq_x_n_s32): Remove.
(vsubq_x_u8): Remove.
(vsubq_x_u16): Remove.
(vsubq_x_u32): Remove.
(vsubq_x_n_u8): Remove.
(vsubq_x_n_u16): Remove.
(vsubq_x_n_u32): Remove.
(vsubq_x_f16): Remove.
(vsubq_x_f32): Remove.
(vsubq_x_n_f16): Remove.
(vsubq_x_n_f32): Remove.
(__arm_vsubq_u8): Remove.
(__arm_vsubq_n_u8): Remove.
(__arm_vsubq_s8): Remove.
(__arm_vsubq_n_s8): Remove.
(__arm_vsubq_u16): Remove.
(__arm_vsubq_n_u16): Remove.
(__arm_vsubq_s16): Remove.
(__arm_vsubq_n_s16): Remove.
(__arm_vsubq_u32): Remove.
(__arm_vsubq_n_u32): Remove.
(__arm_vsubq_s32): Remove.
(__arm_vsubq_n_s32): Remove.
(__arm_vsubq_m_s8): Remove.
(__arm_vsubq_m_u8): Remove.
(__arm_vsubq_m_s16): Remove.
(__arm_vsubq_m_u16): Remove.
(__arm_vsubq_m_s32): Remove.
(__arm_vsubq_m_u32): Remove.
(__arm_vsubq_m_n_s8): Remove.
(__arm_vsubq_m_n_s32): Remove.
(__arm_vsubq_m_n_s16): Remove.
(__arm_vsubq_m_n_u8): Remove.
(__arm_vsubq_m_n_u32): Remove.
(__arm_vsubq_m_n_u16): Remove.
(__arm_vsubq_x_s8): Remove.
(__arm_vsubq_x_s16): Remove.
(__arm_vsubq_x_s32): Remove.
(__arm_vsubq_x_n_s8): Remove.
(__arm_vsubq_x_n_s16): Remove.
(__arm_vsubq_x_n_s32): Remove.
(__arm_vsubq_x_u8): Remove.
(__arm_vsubq_x_u16): Remove.
(__arm_vsubq_x_u32): Remove.
(__arm_vsubq_x_n_u8): Remove.
(__arm_vsubq_x_n_u16): Remove.
(__arm_vsubq_x_n_u32): Remove.
(__arm_vsubq_n_f16): Remove.
(__arm_vsubq_n_f32): Remove.
(__arm_vsubq_f16): Remove.
(__arm_vsubq_f32): Remove.
(__arm_vsubq_m_f32): Remove.
(__arm_vsubq_m_f16): Remove.
(__arm_vsubq_m_n_f32): Remove.
(__arm_vsubq_m_n_f16): Remove.
(__arm_vsubq_x_f16): Remove.
(__arm_vsubq_x_f32): Remove.
(__arm_vsubq_x_n_f16): Remove.
(__arm_vsubq_x_n_f32): Remove.
(__arm_vsubq): Remove.
(__arm_vsubq_m): Remove.
(__arm_vsubq_x): Remove.
* config/arm/arm_mve_builtins.def (vsubq_u, vsubq_s, vsubq_f):
Remove.
(vmulq_u, vmulq_s, vmulq_f): Remove.
* config/arm/mve.md (mve_vsubq_<supf><mode>): Remove.
(mve_vmulq_<supf><mode>): Remove.

2 years agoarm: [MVE intrinsics] factorize vadd vsubq vmulq
Christophe Lyon [Thu, 8 Sep 2022 13:31:24 +0000 (13:31 +0000)] 
arm: [MVE intrinsics] factorize vadd vsubq vmulq

In order to avoid using a huge switch when generating all the
intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
function taking the builtin code as a parameter (e.g. mve_q_n (VADDQ_S,
...)).
This is achieved by using the new mve_insn iterator.

Having done that, it becomes easier to share similar patterns, to
avoid useless/error-prone code duplication.

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

* config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
(MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
(MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
iterators.
* config/arm/mve.md
(mve_vsubq_n_f<mode>, mve_vaddq_n_f<mode>, mve_vmulq_n_f<mode>):
Factorize into ...
(@mve_<mve_insn>q_n_f<mode>): ... this.
(mve_vaddq_n_<supf><mode>, mve_vmulq_n_<supf><mode>)
(mve_vsubq_n_<supf><mode>): Factorize into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vaddq<mode>, mve_vmulq<mode>, mve_vsubq<mode>): Factorize
into ...
(mve_<mve_addsubmul>q<mode>): ... this.
(mve_vaddq_f<mode>, mve_vmulq_f<mode>, mve_vsubq_f<mode>):
Factorize into ...
(mve_<mve_addsubmul>q_f<mode>): ... this.
(mve_vaddq_m_<supf><mode>, mve_vmulq_m_<supf><mode>)
(mve_vsubq_m_<supf><mode>): Factorize into ...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.
(mve_vaddq_m_n_<supf><mode>, mve_vmulq_m_n_<supf><mode>)
(mve_vsubq_m_n_<supf><mode>): Factorize into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
(mve_vaddq_m_f<mode>, mve_vmulq_m_f<mode>, mve_vsubq_m_f<mode>):
Factorize into ...
(@mve_<mve_insn>q_m_f<mode>): ... this.
(mve_vaddq_m_n_f<mode>, mve_vmulq_m_n_f<mode>)
(mve_vsubq_m_n_f<mode>): Factorize into ...
(@mve_<mve_insn>q_m_n_f<mode>): ... this.

2 years agoarm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
Christophe Lyon [Wed, 31 Aug 2022 12:59:05 +0000 (12:59 +0000)] 
arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn

Introduce a function that will be used to build intrinsics which use
RTX codes for the non-predicated, no-mode version, and UNSPECS
otherwise.

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/ChangeLog:

* config/arm/arm-mve-builtins-functions.h (class
unspec_based_mve_function_base): New.
(class unspec_based_mve_function_exact_insn): New.

2 years agoarm: [MVE intrinsics] add binary_opt_n shape
Christophe Lyon [Fri, 3 Feb 2023 12:56:47 +0000 (12:56 +0000)] 
arm: [MVE intrinsics] add binary_opt_n shape

This patch adds the binary_opt_n shape description.

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_opt_n): New.
* config/arm/arm-mve-builtins-shapes.h (binary_opt_n): New.

2 years agoarm: [MVE intrinsics] Rework vuninitialized
Christophe Lyon [Thu, 11 Aug 2022 16:15:03 +0000 (16:15 +0000)] 
arm: [MVE intrinsics] Rework vuninitialized

Implement vuninitialized using the new MVE builtins framework.

We need to keep the overloaded __arm_vuninitializedq definitions
because their resolution depends on the result type only, which is not
currently supported by the resolver.
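
A usage sketch, assuming an MVE-enabled target: vuninitializedq provides
a "don't care" vector, typically used as the inactive operand of a
predicated intrinsic.

  #include <arm_mve.h>

  int8x16_t add_masked (int8x16_t a, int8x16_t b, mve_pred16_t p)
  {
    // Lanes where p is false are left unspecified.
    return vaddq_m_s8 (vuninitializedq_s8 (), a, b, p);
  }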

2022-09-08  Murray Steele  <murray.steele@arm.com>
    Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class
vuninitializedq_impl): New.
* config/arm/arm-mve-builtins-base.def (vuninitializedq): New.
* config/arm/arm-mve-builtins-base.h (vuninitializedq): New
declaration.
* config/arm/arm-mve-builtins-shapes.cc (inherent): New.
* config/arm/arm-mve-builtins-shapes.h (inherent): New
declaration.
* config/arm/arm_mve_types.h (__arm_vuninitializedq): Move to ...
* config/arm/arm_mve.h (__arm_vuninitializedq): ... here.
(__arm_vuninitializedq_u8): Remove.
(__arm_vuninitializedq_u16): Remove.
(__arm_vuninitializedq_u32): Remove.
(__arm_vuninitializedq_u64): Remove.
(__arm_vuninitializedq_s8): Remove.
(__arm_vuninitializedq_s16): Remove.
(__arm_vuninitializedq_s32): Remove.
(__arm_vuninitializedq_s64): Remove.
(__arm_vuninitializedq_f16): Remove.
(__arm_vuninitializedq_f32): Remove.

2 years agoarm: [MVE intrinsics] Rework vreinterpretq
Christophe Lyon [Fri, 22 Jul 2022 12:37:11 +0000 (12:37 +0000)] 
arm: [MVE intrinsics] Rework vreinterpretq

This patch implements vreinterpretq using the new MVE intrinsics
framework.

The old definitions for vreinterpretq are removed as a consequence.
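
A usage sketch, assuming an MVE-enabled target: vreinterpretq only
relabels the element type of the 128-bit register and generates no
instructions.

  #include <arm_mve.h>

  int16x8_t view_as_s16 (uint32x4_t v)
  {
    return vreinterpretq_s16_u32 (v);
  }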

2022-09-08  Murray Steele  <murray.steele@arm.com>
    Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New function.
(parse_type): Likewise.
(parse_signature): Likewise.
(build_one): Likewise.
(build_all): Likewise.
(overloaded_base): New struct.
(unary_convert_def): Likewise.
* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
macro.
(TYPES_reinterpret_unsigned1): Likewise.
(TYPES_reinterpret_integer): Likewise.
(TYPES_reinterpret_integer1): Likewise.
(TYPES_reinterpret_float1): Likewise.
(TYPES_reinterpret_float): Likewise.
(reinterpret_integer): New.
(reinterpret_float): New.
(handle_arm_mve_h): Register builtins.
* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
(vreinterpretq_s32): Likewise.
(vreinterpretq_s64): Likewise.
(vreinterpretq_s8): Likewise.
(vreinterpretq_u16): Likewise.
(vreinterpretq_u32): Likewise.
(vreinterpretq_u64): Likewise.
(vreinterpretq_u8): Likewise.
(vreinterpretq_f16): Likewise.
(vreinterpretq_f32): Likewise.
(vreinterpretq_s16_s32): Likewise.
(vreinterpretq_s16_s64): Likewise.
(vreinterpretq_s16_s8): Likewise.
(vreinterpretq_s16_u16): Likewise.
(vreinterpretq_s16_u32): Likewise.
(vreinterpretq_s16_u64): Likewise.
(vreinterpretq_s16_u8): Likewise.
(vreinterpretq_s32_s16): Likewise.
(vreinterpretq_s32_s64): Likewise.
(vreinterpretq_s32_s8): Likewise.
(vreinterpretq_s32_u16): Likewise.
(vreinterpretq_s32_u32): Likewise.
(vreinterpretq_s32_u64): Likewise.
(vreinterpretq_s32_u8): Likewise.
(vreinterpretq_s64_s16): Likewise.
(vreinterpretq_s64_s32): Likewise.
(vreinterpretq_s64_s8): Likewise.
(vreinterpretq_s64_u16): Likewise.
(vreinterpretq_s64_u32): Likewise.
(vreinterpretq_s64_u64): Likewise.
(vreinterpretq_s64_u8): Likewise.
(vreinterpretq_s8_s16): Likewise.
(vreinterpretq_s8_s32): Likewise.
(vreinterpretq_s8_s64): Likewise.
(vreinterpretq_s8_u16): Likewise.
(vreinterpretq_s8_u32): Likewise.
(vreinterpretq_s8_u64): Likewise.
(vreinterpretq_s8_u8): Likewise.
(vreinterpretq_u16_s16): Likewise.
(vreinterpretq_u16_s32): Likewise.
(vreinterpretq_u16_s64): Likewise.
(vreinterpretq_u16_s8): Likewise.
(vreinterpretq_u16_u32): Likewise.
(vreinterpretq_u16_u64): Likewise.
(vreinterpretq_u16_u8): Likewise.
(vreinterpretq_u32_s16): Likewise.
(vreinterpretq_u32_s32): Likewise.
(vreinterpretq_u32_s64): Likewise.
(vreinterpretq_u32_s8): Likewise.
(vreinterpretq_u32_u16): Likewise.
(vreinterpretq_u32_u64): Likewise.
(vreinterpretq_u32_u8): Likewise.
(vreinterpretq_u64_s16): Likewise.
(vreinterpretq_u64_s32): Likewise.
(vreinterpretq_u64_s64): Likewise.
(vreinterpretq_u64_s8): Likewise.
(vreinterpretq_u64_u16): Likewise.
(vreinterpretq_u64_u32): Likewise.
(vreinterpretq_u64_u8): Likewise.
(vreinterpretq_u8_s16): Likewise.
(vreinterpretq_u8_s32): Likewise.
(vreinterpretq_u8_s64): Likewise.
(vreinterpretq_u8_s8): Likewise.
(vreinterpretq_u8_u16): Likewise.
(vreinterpretq_u8_u32): Likewise.
(vreinterpretq_u8_u64): Likewise.
(vreinterpretq_s32_f16): Likewise.
(vreinterpretq_s32_f32): Likewise.
(vreinterpretq_u16_f16): Likewise.
(vreinterpretq_u16_f32): Likewise.
(vreinterpretq_u32_f16): Likewise.
(vreinterpretq_u32_f32): Likewise.
(vreinterpretq_u64_f16): Likewise.
(vreinterpretq_u64_f32): Likewise.
(vreinterpretq_u8_f16): Likewise.
(vreinterpretq_u8_f32): Likewise.
(vreinterpretq_f16_f32): Likewise.
(vreinterpretq_f16_s16): Likewise.
(vreinterpretq_f16_s32): Likewise.
(vreinterpretq_f16_s64): Likewise.
(vreinterpretq_f16_s8): Likewise.
(vreinterpretq_f16_u16): Likewise.
(vreinterpretq_f16_u32): Likewise.
(vreinterpretq_f16_u64): Likewise.
(vreinterpretq_f16_u8): Likewise.
(vreinterpretq_f32_f16): Likewise.
(vreinterpretq_f32_s16): Likewise.
(vreinterpretq_f32_s32): Likewise.
(vreinterpretq_f32_s64): Likewise.
(vreinterpretq_f32_s8): Likewise.
(vreinterpretq_f32_u16): Likewise.
(vreinterpretq_f32_u32): Likewise.
(vreinterpretq_f32_u64): Likewise.
(vreinterpretq_f32_u8): Likewise.
(vreinterpretq_s16_f16): Likewise.
(vreinterpretq_s16_f32): Likewise.
(vreinterpretq_s64_f16): Likewise.
(vreinterpretq_s64_f32): Likewise.
(vreinterpretq_s8_f16): Likewise.
(vreinterpretq_s8_f32): Likewise.
(__arm_vreinterpretq_f16): Likewise.
(__arm_vreinterpretq_f32): Likewise.
(__arm_vreinterpretq_s16): Likewise.
(__arm_vreinterpretq_s32): Likewise.
(__arm_vreinterpretq_s64): Likewise.
(__arm_vreinterpretq_s8): Likewise.
(__arm_vreinterpretq_u16): Likewise.
(__arm_vreinterpretq_u32): Likewise.
(__arm_vreinterpretq_u64): Likewise.
(__arm_vreinterpretq_u8): Likewise.
* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32): Remove.
(__arm_vreinterpretq_s16_s64): Likewise.
(__arm_vreinterpretq_s16_s8): Likewise.
(__arm_vreinterpretq_s16_u16): Likewise.
(__arm_vreinterpretq_s16_u32): Likewise.
(__arm_vreinterpretq_s16_u64): Likewise.
(__arm_vreinterpretq_s16_u8): Likewise.
(__arm_vreinterpretq_s32_s16): Likewise.
(__arm_vreinterpretq_s32_s64): Likewise.
(__arm_vreinterpretq_s32_s8): Likewise.
(__arm_vreinterpretq_s32_u16): Likewise.
(__arm_vreinterpretq_s32_u32): Likewise.
(__arm_vreinterpretq_s32_u64): Likewise.
(__arm_vreinterpretq_s32_u8): Likewise.
(__arm_vreinterpretq_s64_s16): Likewise.
(__arm_vreinterpretq_s64_s32): Likewise.
(__arm_vreinterpretq_s64_s8): Likewise.
(__arm_vreinterpretq_s64_u16): Likewise.
(__arm_vreinterpretq_s64_u32): Likewise.
(__arm_vreinterpretq_s64_u64): Likewise.
(__arm_vreinterpretq_s64_u8): Likewise.
(__arm_vreinterpretq_s8_s16): Likewise.
(__arm_vreinterpretq_s8_s32): Likewise.
(__arm_vreinterpretq_s8_s64): Likewise.
(__arm_vreinterpretq_s8_u16): Likewise.
(__arm_vreinterpretq_s8_u32): Likewise.
(__arm_vreinterpretq_s8_u64): Likewise.
(__arm_vreinterpretq_s8_u8): Likewise.
(__arm_vreinterpretq_u16_s16): Likewise.
(__arm_vreinterpretq_u16_s32): Likewise.
(__arm_vreinterpretq_u16_s64): Likewise.
(__arm_vreinterpretq_u16_s8): Likewise.
(__arm_vreinterpretq_u16_u32): Likewise.
(__arm_vreinterpretq_u16_u64): Likewise.
(__arm_vreinterpretq_u16_u8): Likewise.
(__arm_vreinterpretq_u32_s16): Likewise.
(__arm_vreinterpretq_u32_s32): Likewise.
(__arm_vreinterpretq_u32_s64): Likewise.
(__arm_vreinterpretq_u32_s8): Likewise.
(__arm_vreinterpretq_u32_u16): Likewise.
(__arm_vreinterpretq_u32_u64): Likewise.
(__arm_vreinterpretq_u32_u8): Likewise.
(__arm_vreinterpretq_u64_s16): Likewise.
(__arm_vreinterpretq_u64_s32): Likewise.
(__arm_vreinterpretq_u64_s64): Likewise.
(__arm_vreinterpretq_u64_s8): Likewise.
(__arm_vreinterpretq_u64_u16): Likewise.
(__arm_vreinterpretq_u64_u32): Likewise.
(__arm_vreinterpretq_u64_u8): Likewise.
(__arm_vreinterpretq_u8_s16): Likewise.
(__arm_vreinterpretq_u8_s32): Likewise.
(__arm_vreinterpretq_u8_s64): Likewise.
(__arm_vreinterpretq_u8_s8): Likewise.
(__arm_vreinterpretq_u8_u16): Likewise.
(__arm_vreinterpretq_u8_u32): Likewise.
(__arm_vreinterpretq_u8_u64): Likewise.
(__arm_vreinterpretq_s32_f16): Likewise.
(__arm_vreinterpretq_s32_f32): Likewise.
(__arm_vreinterpretq_s16_f16): Likewise.
(__arm_vreinterpretq_s16_f32): Likewise.
(__arm_vreinterpretq_s64_f16): Likewise.
(__arm_vreinterpretq_s64_f32): Likewise.
(__arm_vreinterpretq_s8_f16): Likewise.
(__arm_vreinterpretq_s8_f32): Likewise.
(__arm_vreinterpretq_u16_f16): Likewise.
(__arm_vreinterpretq_u16_f32): Likewise.
(__arm_vreinterpretq_u32_f16): Likewise.
(__arm_vreinterpretq_u32_f32): Likewise.
(__arm_vreinterpretq_u64_f16): Likewise.
(__arm_vreinterpretq_u64_f32): Likewise.
(__arm_vreinterpretq_u8_f16): Likewise.
(__arm_vreinterpretq_u8_f32): Likewise.
(__arm_vreinterpretq_f16_f32): Likewise.
(__arm_vreinterpretq_f16_s16): Likewise.
(__arm_vreinterpretq_f16_s32): Likewise.
(__arm_vreinterpretq_f16_s64): Likewise.
(__arm_vreinterpretq_f16_s8): Likewise.
(__arm_vreinterpretq_f16_u16): Likewise.
(__arm_vreinterpretq_f16_u32): Likewise.
(__arm_vreinterpretq_f16_u64): Likewise.
(__arm_vreinterpretq_f16_u8): Likewise.
(__arm_vreinterpretq_f32_f16): Likewise.
(__arm_vreinterpretq_f32_s16): Likewise.
(__arm_vreinterpretq_f32_s32): Likewise.
(__arm_vreinterpretq_f32_s64): Likewise.
(__arm_vreinterpretq_f32_s8): Likewise.
(__arm_vreinterpretq_f32_u16): Likewise.
(__arm_vreinterpretq_f32_u32): Likewise.
(__arm_vreinterpretq_f32_u64): Likewise.
(__arm_vreinterpretq_f32_u8): Likewise.
(__arm_vreinterpretq_s16): Likewise.
(__arm_vreinterpretq_s32): Likewise.
(__arm_vreinterpretq_s64): Likewise.
(__arm_vreinterpretq_s8): Likewise.
(__arm_vreinterpretq_u16): Likewise.
(__arm_vreinterpretq_u32): Likewise.
(__arm_vreinterpretq_u64): Likewise.
(__arm_vreinterpretq_u8): Likewise.
(__arm_vreinterpretq_f16): Likewise.
(__arm_vreinterpretq_f32): Likewise.
* config/arm/mve.md (@arm_mve_reinterpret<mode>): New pattern.
* config/arm/unspecs.md: (REINTERPRET): New unspec.

gcc/testsuite/
* g++.target/arm/mve.exp: Add general-c++ and general directories.
* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.

2 years agoarm: [MVE intrinsics] Add new framework
Christophe Lyon [Fri, 15 Jul 2022 09:26:48 +0000 (10:26 +0100)] 
arm: [MVE intrinsics] Add new framework

This patch introduces the new MVE intrinsics framework, heavily
inspired by the SVE one in the aarch64 port.

Like the MVE intrinsic types implementation, the intrinsics framework
defines functions via a new pragma in arm_mve.h. A boolean parameter
is used to pass true when __ARM_MVE_PRESERVE_USER_NAMESPACE is
defined, and false when it is not, allowing for non-prefixed intrinsic
functions to be conditionally defined.

Future patches will build on this framework by adding new intrinsic
functions and adding the features needed to support them.

Differences compared to the aarch64/SVE port include:
- when present, the predicate argument is the last one with MVE (the
  first one with SVE)
- when using merging predicates ("_m" suffix), the "inactive" argument
  (if any) is inserted in the first position
- when using merging predicates ("_m" suffix), some functions do not
  have the "inactive" argument, so we maintain an exception list
- MVE intrinsics dealing with floating-point require the FP extension,
  while SVE may support different extensions
- regarding global state, MVE does not have any prefetch intrinsic, so
  we do not need a flag for this
- intrinsic names can be prefixed with "__arm", depending on whether
  preserve_user_namespace is true or false
- parse_signature: the maximum number of arguments is now a parameter,
  which helps detect an overflow with a new assert.
- suffixes and overloading can be controlled using
  explicit_mode_suffix_p and skip_overload_p in addition to
  explicit_type_suffix_p

At this implementation stage, there are some limitations compared
to aarch64/SVE, which are removed later in the series:
- "offset" mode is not supported yet
- gimple folding is not implemented

2022-09-08  Murray Steele  <murray.steele@arm.com>
    Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

* config.gcc: Add arm-mve-builtins-base.o and
arm-mve-builtins-shapes.o to extra_objs.
* config/arm/arm-builtins.cc (arm_builtin_decl): Handle MVE builtin
numberspace.
(arm_expand_builtin): Likewise.
(arm_check_builtin_call): Likewise.
(arm_describe_resolver): Likewise.
* config/arm/arm-builtins.h (enum resolver_ident): Add
arm_mve_resolver.
* config/arm/arm-c.cc (arm_pragma_arm): Handle new pragma.
(arm_resolve_overloaded_builtin): Handle MVE builtins.
(arm_register_target_pragmas): Register arm_check_builtin_call.
* config/arm/arm-mve-builtins.cc (class registered_function): New
class.
(struct registered_function_hasher): New struct.
(pred_suffixes): New table.
(mode_suffixes): New table.
(type_suffix_info): New table.
(TYPES_float16): New.
(TYPES_all_float): New.
(TYPES_integer_8): New.
(TYPES_integer_8_16): New.
(TYPES_integer_16_32): New.
(TYPES_integer_32): New.
(TYPES_signed_16_32): New.
(TYPES_signed_32): New.
(TYPES_all_signed): New.
(TYPES_all_unsigned): New.
(TYPES_all_integer): New.
(TYPES_all_integer_with_64): New.
(DEF_VECTOR_TYPE): New.
(DEF_DOUBLE_TYPE): New.
(DEF_MVE_TYPES_ARRAY): New.
(all_integer): New.
(all_integer_with_64): New.
(float16): New.
(all_float): New.
(all_signed): New.
(all_unsigned): New.
(integer_8): New.
(integer_8_16): New.
(integer_16_32): New.
(integer_32): New.
(signed_16_32): New.
(signed_32): New.
(register_vector_type): Use void_type_node for mve.fp-only types when
mve.fp is not enabled.
(register_builtin_tuple_types): Likewise.
(handle_arm_mve_h): New function.
(matches_type_p): Likewise.
(report_out_of_range): Likewise.
(report_not_enum): Likewise.
(report_missing_float): Likewise.
(report_non_ice): Likewise.
(check_requires_float): Likewise.
(function_instance::hash): Likewise.
(function_instance::call_properties): Likewise.
(function_instance::reads_global_state_p): Likewise.
(function_instance::modifies_global_state_p): Likewise.
(function_instance::could_trap_p): Likewise.
(function_instance::has_inactive_argument): Likewise.
(registered_function_hasher::hash): Likewise.
(registered_function_hasher::equal): Likewise.
(function_builder::function_builder): Likewise.
(function_builder::~function_builder): Likewise.
(function_builder::append_name): Likewise.
(function_builder::finish_name): Likewise.
(function_builder::get_name): Likewise.
(add_attribute): Likewise.
(function_builder::get_attributes): Likewise.
(function_builder::add_function): Likewise.
(function_builder::add_unique_function): Likewise.
(function_builder::add_overloaded_function): Likewise.
(function_builder::add_overloaded_functions): Likewise.
(function_builder::register_function_group): Likewise.
(function_call_info::function_call_info): Likewise.
(function_resolver::function_resolver): Likewise.
(function_resolver::get_vector_type): Likewise.
(function_resolver::get_scalar_type_name): Likewise.
(function_resolver::get_argument_type): Likewise.
(function_resolver::scalar_argument_p): Likewise.
(function_resolver::report_no_such_form): Likewise.
(function_resolver::lookup_form): Likewise.
(function_resolver::resolve_to): Likewise.
(function_resolver::infer_vector_or_tuple_type): Likewise.
(function_resolver::infer_vector_type): Likewise.
(function_resolver::require_vector_or_scalar_type): Likewise.
(function_resolver::require_vector_type): Likewise.
(function_resolver::require_matching_vector_type): Likewise.
(function_resolver::require_derived_vector_type): Likewise.
(function_resolver::require_derived_scalar_type): Likewise.
(function_resolver::require_integer_immediate): Likewise.
(function_resolver::require_scalar_type): Likewise.
(function_resolver::check_num_arguments): Likewise.
(function_resolver::check_gp_argument): Likewise.
(function_resolver::finish_opt_n_resolution): Likewise.
(function_resolver::resolve_unary): Likewise.
(function_resolver::resolve_unary_n): Likewise.
(function_resolver::resolve_uniform): Likewise.
(function_resolver::resolve_uniform_opt_n): Likewise.
(function_resolver::resolve): Likewise.
(function_checker::function_checker): Likewise.
(function_checker::argument_exists_p): Likewise.
(function_checker::require_immediate): Likewise.
(function_checker::require_immediate_enum): Likewise.
(function_checker::require_immediate_range): Likewise.
(function_checker::check): Likewise.
(gimple_folder::gimple_folder): Likewise.
(gimple_folder::fold): Likewise.
(function_expander::function_expander): Likewise.
(function_expander::direct_optab_handler): Likewise.
(function_expander::get_fallback_value): Likewise.
(function_expander::get_reg_target): Likewise.
(function_expander::add_output_operand): Likewise.
(function_expander::add_input_operand): Likewise.
(function_expander::add_integer_operand): Likewise.
(function_expander::generate_insn): Likewise.
(function_expander::use_exact_insn): Likewise.
(function_expander::use_unpred_insn): Likewise.
(function_expander::use_pred_x_insn): Likewise.
(function_expander::use_cond_insn): Likewise.
(function_expander::map_to_rtx_codes): Likewise.
(function_expander::expand): Likewise.
(resolve_overloaded_builtin): Likewise.
(check_builtin_call): Likewise.
(gimple_fold_builtin): Likewise.
(expand_builtin): Likewise.
(gt_ggc_mx): Likewise.
(gt_pch_nx): Likewise.
(gt_pch_nx): Likewise.
* config/arm/arm-mve-builtins.def (s8): Define new type suffix.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
(f16): Likewise.
(f32): Likewise.
(n): New mode.
(offset): New mode.
* config/arm/arm-mve-builtins.h (MAX_TUPLE_SIZE): New constant.
(CP_READ_FPCR): Likewise.
(CP_RAISE_FP_EXCEPTIONS): Likewise.
(CP_READ_MEMORY): Likewise.
(CP_WRITE_MEMORY): Likewise.
(enum units_index): New enum.
(enum predication_index): New.
(enum type_class_index): New.
(enum mode_suffix_index): New enum.
(enum type_suffix_index): New.
(struct mode_suffix_info): New struct.
(struct type_suffix_info): New.
(struct function_group_info): Likewise.
(class function_instance): Likewise.
(class registered_function): Likewise.
(class function_builder): Likewise.
(class function_call_info): Likewise.
(class function_resolver): Likewise.
(class function_checker): Likewise.
(class gimple_folder): Likewise.
(class function_expander): Likewise.
(get_mve_pred16_t): Likewise.
(find_mode_suffix): New function.
(class function_base): Likewise.
(class function_shape): Likewise.
(function_instance::operator==): New function.
(function_instance::operator!=): Likewise.
(function_instance::vectors_per_tuple): Likewise.
(function_instance::mode_suffix): Likewise.
(function_instance::type_suffix): Likewise.
(function_instance::scalar_type): Likewise.
(function_instance::vector_type): Likewise.
(function_instance::tuple_type): Likewise.
(function_instance::vector_mode): Likewise.
(function_call_info::function_returns_void_p): Likewise.
(function_base::call_properties): Likewise.
* config/arm/arm-protos.h (enum arm_builtin_class): Add
ARM_BUILTIN_MVE.
(handle_arm_mve_h): New.
(resolve_overloaded_builtin): New.
(check_builtin_call): New.
(gimple_fold_builtin): New.
(expand_builtin): New.
* config/arm/arm.cc (TARGET_GIMPLE_FOLD_BUILTIN): Define as
arm_gimple_fold_builtin.
(arm_gimple_fold_builtin): New function.
* config/arm/arm_mve.h: Use new arm_mve.h pragma.
* config/arm/predicates.md (arm_any_register_operand): New predicate.
* config/arm/t-arm (arm-mve-builtins.o): Add includes.
(arm-mve-builtins-shapes.o): New target.
(arm-mve-builtins-base.o): New target.
* config/arm/arm-mve-builtins-base.cc: New file.
* config/arm/arm-mve-builtins-base.def: New file.
* config/arm/arm-mve-builtins-base.h: New file.
* config/arm/arm-mve-builtins-functions.h: New file.
* config/arm/arm-mve-builtins-shapes.cc: New file.
* config/arm/arm-mve-builtins-shapes.h: New file.

Co-authored-by: Christophe Lyon <christophe.lyon@arm.com>
2 years agoarm: move builtin function codes into general numberspace
Christophe Lyon [Fri, 15 Jul 2022 09:10:14 +0000 (10:10 +0100)] 
arm: move builtin function codes into general numberspace

This patch introduces a separate numberspace for general arm builtin
function codes. The intent of this patch is to separate the space of
function codes that may be assigned to general builtins and future
MVE intrinsic functions by using the first bit of each function code
to differentiate them. This is identical to how SVE intrinsic functions
are currently differentiated from general aarch64 builtins.

Future intrinsics implementations may also make use of numberspacing by
changing the values of ARM_BUILTIN_SHIFT and ARM_BUILTIN_CLASS, and
adding themselves to the arm_builtin_class enum.
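
For illustration, the split works roughly as below (a standalone sketch
with hypothetical constants, not the actual values from arm-protos.h):

  #include <cstdio>

  enum sketch_builtin_class { SK_BUILTIN_GENERAL, SK_BUILTIN_MVE };
  static const unsigned SK_BUILTIN_SHIFT = 1;  /* low bit selects the class */
  static const unsigned SK_BUILTIN_CLASS = (1u << SK_BUILTIN_SHIFT) - 1;

  static unsigned encode (enum sketch_builtin_class c, unsigned subcode)
  {
    return (subcode << SK_BUILTIN_SHIFT) | c;
  }

  int main ()
  {
    unsigned code = encode (SK_BUILTIN_MVE, 42);
    printf ("class %u, subcode %u\n",
            code & SK_BUILTIN_CLASS, code >> SK_BUILTIN_SHIFT);
    return 0;
  }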

2022-09-08  Murray Steele  <murray.steele@arm.com>
    Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_general_add_builtin_function):
New function.
(arm_init_builtin): Use arm_general_add_builtin_function instead
of arm_add_builtin_function.
(arm_init_acle_builtins): Likewise.
(arm_init_mve_builtins): Likewise.
(arm_init_crypto_builtins): Likewise.
(arm_init_builtins): Likewise.
(arm_general_builtin_decl): New function.
(arm_builtin_decl): Defer to numberspace-specialized functions.
(arm_expand_builtin_args): Rename into arm_general_expand_builtin_args.
(arm_expand_builtin_1): Rename into arm_general_expand_builtin_1 and ...
(arm_general_expand_builtin_1): ... specialize for general builtins.
(arm_expand_acle_builtin): Use arm_general_expand_builtin
instead of arm_expand_builtin.
(arm_expand_mve_builtin): Likewise.
(arm_expand_neon_builtin): Likewise.
(arm_expand_vfp_builtin): Likewise.
(arm_general_expand_builtin): New function.
(arm_expand_builtin): Specialize for general builtins.
(arm_general_check_builtin_call): New function.
(arm_check_builtin_call): Specialize for general builtins.
(arm_describe_resolver): Validate numberspace.
(arm_cde_end_args): Likewise.
* config/arm/arm-protos.h (enum arm_builtin_class): New enum.
(ARM_BUILTIN_SHIFT, ARM_BUILTIN_CLASS): New constants.

Co-authored-by: Christophe Lyon <christophe.lyon@arm.com>
2 years agoriscv: fix error: control reaches end of non-void function
Martin Liska [Wed, 3 May 2023 14:35:26 +0000 (16:35 +0200)] 
riscv: fix error: control reaches end of non-void function

Fixes:
gcc/config/riscv/sync.md:66:1: error: control reaches end of non-void function [-Werror=return-type]
66 |   [(set (attr "length") (const_int 4))])
   | ^
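
The shape of the fix is the usual one for a fully covered switch in a
non-void function; a minimal standalone sketch (using __builtin_unreachable
as a stand-in for GCC's internal gcc_unreachable) is:

  const char *model_name (int model)
  {
    switch (model)
      {
      case 0: return "relaxed";
      case 1: return "acquire";
      case 2: return "release";
      default: __builtin_unreachable ();  /* no path falls off the end */
      }
  }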

PR target/109713

gcc/ChangeLog:

* config/riscv/sync.md: Add gcc_unreachable to a switch.

2 years agoMore last_stmt removal
Richard Biener [Wed, 3 May 2023 11:24:45 +0000 (13:24 +0200)] 
More last_stmt removal

This is the last set of changes removing calls to last_stmt in favor of
*gsi_last_bb where this is obviously correct.  As with the last changes
I tried to cleanup the code as far as dependences are concerned.

* tree-ssa-loop-split.cc (split_at_bb_p): Avoid last_stmt.
(patch_loop_exit): Likewise.
(connect_loops): Likewise.
(split_loop): Likewise.
(control_dep_semi_invariant_p): Likewise.
(do_split_loop_on_cond): Likewise.
(split_loop_on_cond): Likewise.
* tree-ssa-loop-unswitch.cc (find_unswitching_predicates_for_bb):
Likewise.
(simplify_loop_version): Likewise.
(evaluate_bbs): Likewise.
(find_loop_guard): Likewise.
(clean_up_after_unswitching): Likewise.
* tree-ssa-math-opts.cc (maybe_optimize_guarding_check):
Likewise.
(optimize_spaceship): Take a gcond * argument, avoid
last_stmt.
(math_opts_dom_walker::after_dom_children): Adjust call to
optimize_spaceship.
* tree-vrp.cc (maybe_set_nonzero_bits): Avoid last_stmt.
* value-pointer-equiv.cc (pointer_equiv_analyzer::visit_edge):
Likewise.

2 years agolibstdc++: Set _M_string_length before calling _M_dispose() [PR109703]
Kefu Chai [Mon, 1 May 2023 20:24:26 +0000 (21:24 +0100)] 
libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]

This always sets _M_string_length in the constructor for ranges of input
iterators, such as stream iterators.

We copy from the source range to the local buffer, and then repeatedly
reallocate a larger one if necessary. When disposing the old buffer,
_M_is_local() is used to tell whether the buffer is the local one or not
(and so whether it must be deallocated). In addition to comparing the
buffer address
with the local buffer, _M_is_local() has an optimization hint so that
the compiler knows that for a string using the local buffer, there is an
invariant that _M_string_length <= _S_local_capacity (added for PR109299
via r13-6915-gbf78b43873b0b7).  But we failed to set _M_string_length in
the constructor taking a pair of iterators, so the invariant might not
hold, and __builtin_unreachable() is reached. This causes UBsan errors,
and potentially misoptimization.

To ensure the invariant holds, _M_string_length is initialized to zero
before doing anything else, so that _M_is_local() doesn't see an
uninitialized value.

This issue only surfaces when constructing a string with a range of
input iterator, and the uninitialized _M_string_length happens to be
greater than _S_local_capacity, i.e., 15 for the std::string
specialization.
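
A hedged sketch of the kind of code that hits this path (constructing a
string from input iterators with more than 15 characters of input):

  #include <sstream>
  #include <string>
  #include <iterator>

  int main ()
  {
    std::istringstream in ("a string long enough to outgrow the local buffer");
    std::istreambuf_iterator<char> first (in), last;
    std::string s (first, last);  /* the (InputIterator, InputIterator) ctor */
    return s.size () > 15 ? 0 : 1;
  }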

libstdc++-v3/ChangeLog:

PR libstdc++/109703
* include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
Initialize _M_string_length.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
2 years agoriscv/linux: Don't add -latomic with -pthread
Andreas Schwab [Sat, 23 Apr 2022 13:48:42 +0000 (15:48 +0200)] 
riscv/linux: Don't add -latomic with -pthread

Now that we have support for inline subword atomic operations, it is no
longer necessary to link against libatomic.  This also fixes testsuite
failures because the framework does not properly set up the linker flags
for finding libatomic.
The use of atomic operations is also independent of the use of libpthread.

gcc/
* config/riscv/linux.h (LIB_SPEC): Don't redefine.

2 years agoRISC-V: Support segment intrinsics
Ju-Zhe Zhong [Fri, 28 Apr 2023 10:17:46 +0000 (18:17 +0800)] 
RISC-V: Support segment intrinsics

Add segment load/store intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/198

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (fold_fault_load):
New function.
(class vlseg): New class.
(class vsseg): Ditto.
(class vlsseg): Ditto.
(class vssseg): Ditto.
(class seg_indexed_load): Ditto.
(class seg_indexed_store): Ditto.
(class vlsegff): Ditto.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vlseg):
Ditto.
(vsseg): Ditto.
(vlsseg): Ditto.
(vssseg): Ditto.
(vluxseg): Ditto.
(vloxseg): Ditto.
(vsuxseg): Ditto.
(vsoxseg): Ditto.
(vlsegff): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct
seg_loadstore_def): Ditto.
(struct seg_indexed_loadstore_def): Ditto.
(struct seg_fault_load_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc
(function_builder::append_nf): New function.
* config/riscv/riscv-vector-builtins.def (vfloat32m1x2_t):
Change ptr from double into float.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h: Add segment intrinsics.
* config/riscv/riscv-vsetvl.cc (fault_first_load_p): Adapt for
segment ff load.
* config/riscv/riscv.md: Add segment instructions.
* config/riscv/vector-iterators.md: Support segment intrinsics.
* config/riscv/vector.md (@pred_unit_strided_load<mode>): New
pattern.
(@pred_unit_strided_store<mode>): Ditto.
(@pred_strided_load<mode>): Ditto.
(@pred_strided_store<mode>): Ditto.
(@pred_fault_load<mode>): Ditto.
(@pred_indexed_<order>load<V1T:mode><V1I:mode>): Ditto.
(@pred_indexed_<order>load<V2T:mode><V2I:mode>): Ditto.
(@pred_indexed_<order>load<V4T:mode><V4I:mode>): Ditto.
(@pred_indexed_<order>load<V8T:mode><V8I:mode>): Ditto.
(@pred_indexed_<order>load<V16T:mode><V16I:mode>): Ditto.
(@pred_indexed_<order>load<V32T:mode><V32I:mode>): Ditto.
(@pred_indexed_<order>load<V64T:mode><V64I:mode>): Ditto.
(@pred_indexed_<order>store<V1T:mode><V1I:mode>): Ditto.
(@pred_indexed_<order>store<V2T:mode><V2I:mode>): Ditto.
(@pred_indexed_<order>store<V4T:mode><V4I:mode>): Ditto.
(@pred_indexed_<order>store<V8T:mode><V8I:mode>): Ditto.
(@pred_indexed_<order>store<V16T:mode><V16I:mode>): Ditto.
(@pred_indexed_<order>store<V32T:mode><V32I:mode>): Ditto.
(@pred_indexed_<order>store<V64T:mode><V64I:mode>): Ditto.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
2 years agoRISC-V: Add tuple type vget/vset intrinsics
Ju-Zhe Zhong [Fri, 28 Apr 2023 10:09:53 +0000 (18:09 +0800)] 
RISC-V: Add tuple type vget/vset intrinsics

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (valid_type): Adapt for
tuple type support.
(inttype): Ditto.
(floattype): Ditto.
(main): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vset): Add
tuple type vset.
(vget): Add tuple type vget.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_TUPLE_OPS): New macro.
(vint8mf8x2_t): Ditto.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_OPS):
Ditto.
(DEF_RVV_TYPE_INDEX): Ditto.
(rvv_arg_type_info::get_tuple_subpart_type): New function.
(DEF_RVV_TUPLE_TYPE): New macro.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX):
Adapt for tuple vget/vset support.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
(tuple_subpart): Add tuple subpart base type.
* config/riscv/riscv-vector-builtins.h (struct
rvv_arg_type_info): Ditto.
(tuple_type_field): New function.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
2 years agoRISC-V: Add tuple types support
Ju-Zhe Zhong [Fri, 28 Apr 2023 10:00:38 +0000 (18:00 +0800)] 
RISC-V: Add tuple types support

gcc/ChangeLog:

* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_tuple_mode_p): New
function.
(get_nf): Ditto.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-v.cc (ENTRY): New macro.
(TUPLE_ENTRY): Ditto.
(get_nf): New function.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_TYPE):
New macro.
(register_tuple_type): New function.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TUPLE_TYPE):
New macro.
(vint8mf8x2_t): New macro.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h (DEF_RVV_TUPLE_TYPE):
Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): New
function.
(TUPLE_ENTRY): Ditto.
(riscv_v_ext_mode_p): New function.
(riscv_v_adjust_nunits): Add tuple mode adjustment.
(riscv_classify_address): Ditto.
(riscv_binary_cost): Ditto.
(riscv_rtx_costs): Ditto.
(riscv_secondary_memory_needed): Ditto.
(riscv_hard_regno_nregs): Ditto.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_vector_mode_supported_p): Ditto.
(riscv_regmode_natural_size): Ditto.
(riscv_array_mode): New function.
(TARGET_ARRAY_MODE): New target hook.
* config/riscv/riscv.md: Add tuple modes.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md (mov<mode>): Add tuple modes data
movement.
(*mov<VT:mode>_<P:mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-10.c: New test.
* gcc.target/riscv/rvv/base/abi-11.c: New test.
* gcc.target/riscv/rvv/base/abi-12.c: New test.
* gcc.target/riscv/rvv/base/abi-13.c: New test.
* gcc.target/riscv/rvv/base/abi-14.c: New test.
* gcc.target/riscv/rvv/base/abi-15.c: New test.
* gcc.target/riscv/rvv/base/abi-16.c: New test.
* gcc.target/riscv/rvv/base/abi-8.c: New test.
* gcc.target/riscv/rvv/base/abi-9.c: New test.
* gcc.target/riscv/rvv/base/tuple-1.c: New test.
* gcc.target/riscv/rvv/base/tuple-10.c: New test.
* gcc.target/riscv/rvv/base/tuple-11.c: New test.
* gcc.target/riscv/rvv/base/tuple-12.c: New test.
* gcc.target/riscv/rvv/base/tuple-13.c: New test.
* gcc.target/riscv/rvv/base/tuple-14.c: New test.
* gcc.target/riscv/rvv/base/tuple-15.c: New test.
* gcc.target/riscv/rvv/base/tuple-16.c: New test.
* gcc.target/riscv/rvv/base/tuple-17.c: New test.
* gcc.target/riscv/rvv/base/tuple-18.c: New test.
* gcc.target/riscv/rvv/base/tuple-19.c: New test.
* gcc.target/riscv/rvv/base/tuple-2.c: New test.
* gcc.target/riscv/rvv/base/tuple-20.c: New test.
* gcc.target/riscv/rvv/base/tuple-21.c: New test.
* gcc.target/riscv/rvv/base/tuple-22.c: New test.
* gcc.target/riscv/rvv/base/tuple-23.c: New test.
* gcc.target/riscv/rvv/base/tuple-24.c: New test.
* gcc.target/riscv/rvv/base/tuple-25.c: New test.
* gcc.target/riscv/rvv/base/tuple-26.c: New test.
* gcc.target/riscv/rvv/base/tuple-27.c: New test.
* gcc.target/riscv/rvv/base/tuple-3.c: New test.
* gcc.target/riscv/rvv/base/tuple-4.c: New test.
* gcc.target/riscv/rvv/base/tuple-5.c: New test.
* gcc.target/riscv/rvv/base/tuple-6.c: New test.
* gcc.target/riscv/rvv/base/tuple-7.c: New test.
* gcc.target/riscv/rvv/base/tuple-8.c: New test.
* gcc.target/riscv/rvv/base/tuple-9.c: New test.
* gcc.target/riscv/rvv/base/user-10.c: New test.
* gcc.target/riscv/rvv/base/user-11.c: New test.
* gcc.target/riscv/rvv/base/user-12.c: New test.
* gcc.target/riscv/rvv/base/user-13.c: New test.
* gcc.target/riscv/rvv/base/user-14.c: New test.
* gcc.target/riscv/rvv/base/user-15.c: New test.
* gcc.target/riscv/rvv/base/user-7.c: New test.
* gcc.target/riscv/rvv/base/user-8.c: New test.
* gcc.target/riscv/rvv/base/user-9.c: New test.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
2 years agoSpeedup cse_insn
Richard Biener [Fri, 3 Feb 2023 10:11:15 +0000 (11:11 +0100)] 
Speedup cse_insn

When cse_insn prunes src{,_folded,_eqv_here,_related} with the
equivalence set in the *_same_value chain it also searches for
an equivalence to the destination of the instruction with

          /* This is the same as the destination of the insns, we want
             to prefer it.  Copy it to src_related.  The code below will
             then give it a negative cost.  */
          if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
            src_related = p->exp;

this picks up the last such equivalence and in particular any
later duplicate will be pruned by the preceding

          else if (src_related && GET_CODE (src_related) == code
                   && rtx_equal_p (src_related, p->exp))
            src_related = 0;

first.  This wastes cycles doing extra rtx_equal_p checks.  The
following instead searches for the first destination equivalence
separately in this loop and delays using src_related for it until
we are about to process that, avoiding another redundant rtx_equal_p
check.

I came here because of a testcase with very large equivalence
lists and the compile time of cse_insn.  The patch below doesn't speed
it up significantly since there's no equivalence on the destination.

In theory this opens the possibility to track dest_related
separately, avoiding the implicit pruning of any previous
value in src_related.  As is the change should be a no-op for
code generation.

* cse.cc (cse_insn): Track an equivalence to the destination
separately and delay using src_related for it.

2 years agoImprove RTL CSE hash table hash usage
Richard Biener [Fri, 3 Feb 2023 11:11:41 +0000 (12:11 +0100)] 
Improve RTL CSE hash table hash usage

The RTL CSE hash table has a fixed number of buckets (32) each
with a linked list of entries with the same hash value.  The
actual hash values are computed using hash_rtx which uses adds
for mixing and adds the rtx CODE as CODE << 7 (apart from some
exceptions such as MEM).  The unsigned int typed hash value
is then simply truncated for the actual lookup into the fixed
size table which means that usually CODE is simply lost.

The following improves this truncation by first mixing in more
bits using xor.  It does not change the actual hash function
since that's used outside of CSE as well.

An alternative would be to bump the fixed number of buckets,
say to 256 which would retain the LSB of CODE or to 8192 which
can capture all 6 bits required for the last CODE.

As the comment in CSE says, there's invalidate_memory and
flush_hash_table done possibly frequently and those at least
need to walk all slots, so when the hash table is mostly empty
enlarging it will be a loss.  Still there should be more
regular lookups by hash, so less collisions should pay off
as well.

Without enlarging the table a better hash function is unlikely
going to make a big difference, simple statistics on the
number of collisions at insertion time shows a reduction of
around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
at the expense of reducing the average table fill by 10%
(all of this stats from looking just at fold-const.i at -O2).
Increasing HASH_SHIFT more leaves the table even more sparse
likely showing that hash_rtx uses add for mixing which is
quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
collisions.

Experimenting with using inchash instead of adds for the
mixing does not improve things when looking at the HASH_SHIFT
bumped by 2 numbers.
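
A standalone sketch of the truncation change (illustrative constants, not
the GCC code; the xor folds another HASH_SHIFT bits into the bucket index):

  #include <cstdio>

  static const unsigned HASH_SHIFT_SK = 5;  /* 32 buckets, as in cse.cc */
  static const unsigned HASH_MASK_SK = (1u << HASH_SHIFT_SK) - 1;

  /* Plain truncation: bits added as CODE << 7 are simply dropped.  */
  static unsigned bucket_plain (unsigned hash)
  {
    return hash & HASH_MASK_SK;
  }

  /* Mix higher bits in with xor before truncating.  */
  static unsigned bucket_mixed (unsigned hash)
  {
    return (hash ^ (hash >> HASH_SHIFT_SK)) & HASH_MASK_SK;
  }

  int main ()
  {
    unsigned h = 0x3 | (42u << 7);  /* a made-up hash_rtx-style value */
    printf ("plain %u, mixed %u\n", bucket_plain (h), bucket_mixed (h));
    return 0;
  }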

* cse.cc (HASH): Turn into inline function and mix
in another HASH_SHIFT bits.
(SAFE_HASH): Likewise.

2 years agoaarch64: PR target/99195 annotate HADDSUB patterns for vec-concat with zero
Kyrylo Tkachov [Wed, 3 May 2023 10:17:28 +0000 (11:17 +0100)] 
aarch64: PR target/99195 annotate HADDSUB patterns for vec-concat with zero

A further straightforward patch for the various halving intrinsics, with or without rounding, plus tests.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_<sur>h<addsub><mode>): Rename to...
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for halving and rounding
add/sub intrinsics.

2 years agoaarch64: PR target/99195 annotate simple floating-point patterns for vec-concat with...
Kyrylo Tkachov [Wed, 3 May 2023 10:15:34 +0000 (11:15 +0100)] 
aarch64: PR target/99195 annotate simple floating-point patterns for vec-concat with zero

Continuing the almost mechanical series, this patch adds annotation for some of the simple
floating-point patterns we have, and adds testing to ensure that redundant zeroing instructions
are eliminated.
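
A hedged example of the kind of source this covers (not the exact
pr99195_2.c test): the 64-bit float add already zeroes the upper half of
the vector register, so no separate zeroing of the high lanes should
remain:

  #include <arm_neon.h>

  float32x4_t f (float32x2_t a, float32x2_t b)
  {
    return vcombine_f32 (vadd_f32 (a, b), vdup_n_f32 (0.0f));
  }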

Bootstrapped and tested on aarch64-none-linux-gnu and also aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (add<mode>3): Rename to...
(add<mode>3<vczle><vczbe>): ... This.
(sub<mode>3): Rename to...
(sub<mode>3<vczle><vczbe>): ... This.
(mul<mode>3): Rename to...
(mul<mode>3<vczle><vczbe>): ... This.
(*div<mode>3): Rename to...
(*div<mode>3<vczle><vczbe>): ... This.
(neg<mode>2): Rename to...
(neg<mode>2<vczle><vczbe>): ... This.
(abs<mode>2): Rename to...
(abs<mode>2<vczle><vczbe>): ... This.
(<frint_pattern><mode>2): Rename to...
(<frint_pattern><mode>2<vczle><vczbe>): ... This.
(<fmaxmin><mode>3): Rename to...
(<fmaxmin><mode>3<vczle><vczbe>): ... This.
(*sqrt<mode>2): Rename to...
(*sqrt<mode>2<vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for some unary
and binary floating-point ops.
* gcc.target/aarch64/simd/pr99195_2.c: New test.

2 years agoDocs: Add vector register constraint for asm operands
Kito Cheng [Thu, 27 Apr 2023 14:00:39 +0000 (22:00 +0800)] 
Docs: Add vector register constraint for asm operands

`vr`, `vm` and `vd` are constraints for vector registers; these three
constraints are implemented in LLVM as well.
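
A hedged sketch of how the `vr` constraint might be used from C (assuming
a target with the V extension; vmv1r.v is chosen because it does not
depend on vl/vtype):

  #include <riscv_vector.h>

  vint32m1_t copy_whole_reg (vint32m1_t a)
  {
    vint32m1_t out;
    __asm__ ("vmv1r.v %0, %1" : "=vr" (out) : "vr" (a));
    return out;
  }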

gcc/ChangeLog:

* doc/md.texi (RISC-V): Add vr, vm, vd constraints.

2 years agoclang warning: warning: private field 'm_gc' is not used [-Wunused-private-field]
Martin Liska [Wed, 3 May 2023 09:16:30 +0000 (11:16 +0200)] 
clang warning: warning: private field 'm_gc' is not used [-Wunused-private-field]

PR tree-optimization/109693

gcc/ChangeLog:

* value-range-storage.cc (vrange_allocator::vrange_allocator):
Remove unused field.
* value-range-storage.h: Likewise.

2 years agoc++: Fix up VEC_INIT_EXPR gimplification after r12-7069
Jakub Jelinek [Wed, 3 May 2023 08:38:04 +0000 (10:38 +0200)] 
c++: Fix up VEC_INIT_EXPR gimplification after r12-7069

During patch backporting, I noticed that while most callers of cp_walk_tree
with the cp_fold_r callback were changed from passing &pset to passing
cp_fold_data &data, the VEC_INIT_EXPR gimplification has not been, so it
still passes just the address of a hash_set<tree>, and if during the
folding we ever touch data->flags, we use uninitialized data there.

The following patch changes it to do the same thing as cp_fold_function
because the VEC_INIT_EXPR gimplifications will happen on function bodies
only.

2023-05-03  Jakub Jelinek  <jakub@redhat.com>

* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false
constructed data rather than &pset to cp_walk_tree with cp_fold_r.

2 years agoc++: fix TTP level reduction cache
Jason Merrill [Tue, 14 Mar 2023 19:16:46 +0000 (15:16 -0400)] 
c++: fix TTP level reduction cache

We try to cache the result of reduce_template_parm_level so that when we
reduce the same parm multiple times we get the same result, but this wasn't
working for template template parms because in that case TYPE is a
TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same
level mismatch that we're trying to adjust for.  So in that case compare the
template parms of the template template parms instead.

The result can be seen in nontype12.C, where we previously gave three
duplicate errors on line 7 and now give only one because subsequent
substitutions use the cache.

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Fix comparison of
template template parm to cached version.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Check for duplicate error.

2 years agoDaily bump.
GCC Administrator [Wed, 3 May 2023 00:17:11 +0000 (00:17 +0000)] 
Daily bump.

2 years agoc++: simplify member template substitution
Jason Merrill [Tue, 2 May 2023 18:54:46 +0000 (14:54 -0400)] 
c++: simplify member template substitution

I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X) \
    template <class U> struct X##1; \
    template <class U> struct X##2; \
    template <class U> struct X##3; \
    template <class U> struct X##4; \
    template <class U> struct X##5; \
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

2 years agoPHIOPT: small refactoring of match_simplify_replacement.
Andrew Pinski [Fri, 28 Apr 2023 20:06:51 +0000 (13:06 -0700)] 
PHIOPT: small refactoring of match_simplify_replacement.

When I added the diamond-shaped bb form to match_simplify_replacement,
I copied the code to move the statement rather than factoring it
out into a new function. This does that refactoring into a new function
to avoid the duplicated code. It will make it easier to add support for
moving two statements (the second statement will only be a
conversion).

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (move_stmt): New function.
(match_simplify_replacement): Use move_stmt instead
of the inlined version.

2 years agoMATCH: Port CLRSB part of builtin_zero_pattern
Andrew Pinski [Fri, 28 Apr 2023 19:45:19 +0000 (12:45 -0700)] 
MATCH: Port CLRSB part of builtin_zero_pattern

This ports the clrsb builtin part of builtin_zero_pattern
to match.pd. A simple pattern to port.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
pattern.

2 years agotree-optimization: [PR109702] MATCH: Fix a ? func(a) : N patterns
Andrew Pinski [Tue, 2 May 2023 18:03:02 +0000 (11:03 -0700)] 
tree-optimization: [PR109702] MATCH: Fix a ? func(a) : N patterns

I accidentally messed up these patterns so the comparison
against 0 and the arguments were not matching up when they
needed to be.
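
A hedged illustration of what the fix guards: popcount of zero is zero, so
the first form may drop the conditional, while the second must not, because
the compared value and the builtin argument differ.

  int ok (int a)
  {
    return a != 0 ? __builtin_popcount (a) : 0;  /* may fold to popcount */
  }

  int not_ok (int a, int b)
  {
    return a != 0 ? __builtin_popcount (b) : 0;  /* must keep the select */
  }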

I committed this as obvious after a bootstrap/test on x86_64-linux-gnu

PR tree-optimization/109702

gcc/ChangeLog:

* match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-25b.c: New test.

2 years agotarget: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64
Andrew Pinski [Fri, 28 Apr 2023 05:22:34 +0000 (05:22 +0000)] 
target: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64

There is no canonical form defined for this case, so the aarch64 backend needs
a pattern to match both of these forms.

The forms are:
(set (reg/i:SI 0 x0)
    (if_then_else:SI (eq (reg:CC 66 cc)
            (const_int 0 [0]))
        (reg:SI 97)
        (const_int -1 [0xffffffffffffffff])))
and
(set (reg/i:SI 0 x0)
    (ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
                (const_int 0 [0])))
        (reg:SI 102)))

Currently the aarch64 backend matches the first form so this
patch adds a insn_and_split to match the second form and
convert it to the first form.
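
The C-level shape this covers is essentially (a sketch, not necessarily the
exact csinv-2.c test):

  int f (int a, int b)
  {
    return (a ? -1 : 0) | b;
  }

With the new insn_and_split the second RTL form is rewritten into the
first, which the backend already matches.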

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions

PR target/109657

gcc/ChangeLog:

* config/aarch64/aarch64.md (*cmov<mode>_insn_m1): New
insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/csinv-2.c: New test.

2 years agoc++: less invalidate_class_lookup_cache
Jason Merrill [Tue, 21 Mar 2023 15:12:42 +0000 (11:12 -0400)] 
c++: less invalidate_class_lookup_cache

In the testcase below, we push_to_top_level to instantiate f and g, and they
can both use the previous_class_level cache from instantiating A<int>.
Wiping the cache in pop_from_top_level is not helpful; we'll do that in
pushclass if needed.

  template <class T> struct A
  {
    int i;
    void f() { i = 42; }
    void g() { i = 24; }
  };

  int main()
  {
    A<int> a;
    a.f();
    a.g();
  }

gcc/cp/ChangeLog:

* name-lookup.cc (pop_from_top_level): Don't
invalidate_class_lookup_cache.

2 years agoc++: look for empty base at specific offset [PR109678]
Jason Merrill [Tue, 2 May 2023 01:03:45 +0000 (21:03 -0400)] 
c++: look for empty base at specific offset [PR109678]

While looking at the empty base handling for 109678, it occurred to me that
we ought to be able to look for an empty base at a specific offset, not just
in general.

PR c++/109678

gcc/cp/ChangeLog:

* cp-tree.h (lookup_base): Add offset parm.
* constexpr.cc (cxx_fold_indirect_ref_1): Pass it.
* search.cc (struct lookup_base_data_s): Add offset.
(dfs_lookup_base): Handle it.
(lookup_base): Pass it.

2 years agoc++: std::variant slow to compile [PR109678]
Jason Merrill [Mon, 1 May 2023 21:41:44 +0000 (17:41 -0400)] 
c++: std::variant slow to compile [PR109678]

Here, when dealing with a class with a complex subobject structure, we would
try and fail to find the relevant FIELD_DECL for an empty base before giving
up.  And we would do this at each level, in a combinatorially problematic
way.  Instead, we should check for an empty base first.
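
A hedged sketch of the kind of code affected (not the actual variant1.C
test): constexpr access into a std::variant with several alternatives makes
the constexpr machinery walk this sort of layered subobject structure.

  #include <variant>

  constexpr std::variant<char, short, int, long, long long,
                         unsigned, float, double> v{42};
  static_assert (std::get<int> (v) == 42, "");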

PR c++/109678

gcc/cp/ChangeLog:

* constexpr.cc (cxx_fold_indirect_ref_1): Handle empty base first.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/variant1.C: New test.

2 years agoRISC-V: Table A.6 conformance tests
Patrick O'Neill [Fri, 7 Apr 2023 20:13:21 +0000 (13:13 -0700)] 
RISC-V: Table A.6 conformance tests

These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.
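
A hedged sketch of the kind of construct these tests exercise (not one of
the actual amo-table-a-6-*.c files):

  void add_seq_cst (int *p)
  {
    /* The instruction sequence chosen for this must follow the
       strengthened Table A.6 mapping so it stays compatible with the
       Table A.7 mappings.  */
    __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);
  }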

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>