git.ipfire.org Git - thirdparty/gcc.git/log
7 months ago libstdc++: Further simplify _Hashtable inheritance hierarchy
Jonathan Wakely [Wed, 4 Dec 2024 21:52:40 +0000 (21:52 +0000)] 
libstdc++: Further simplify _Hashtable inheritance hierarchy

The main change here is using [[no_unique_address]] instead of the Empty
Base-class Optimization. Using the attribute allows us to use data
members instead of base-classes. That simplifies the inheritance
hierarchy, which means less work for the compiler. It also means that
ADL has fewer associated classes and associated namespaces to consider,
further reducing the work the compiler has to do.

Reducing the differences between the _Hashtable_ebo_helper primary
template and the partial specialization means we no longer need to use
member functions to access the stored object, because it's now always a
data member called _M_obj.  This means we can also remove a number of
other helper functions that were using those member functions to access
the object, for example we can swap the _Hash and _Equal objects
directly in _Hashtable::swap instead of calling _Hashtable_base::_M_swap
which then calls _Hash_code_base::_M_swap.

Although [[no_unique_address]] would allow us to reduce the size for
empty types that are also 'final', doing so would be an ABI break
because those types were previously excluded from using the EBO. So we
still need the _Hashtable_ebo_helper class template and a partial
specialization, so that we only use the attribute under exactly the same
conditions as we previously used the EBO. This could be avoided with a
non-standard [[no_unique_address(expr)]] attribute that took a boolean
condition, or with reflection and token sequence injection, but we don't
have either of those things.

Because _Hashtable_ebo_helper is no longer used as a base-class we don't
need to disambiguate possible identical bases, so it doesn't need an
integral non-type template parameter.
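
To illustrate the difference (a minimal sketch, not the actual _Hashtable
code, which also keys the choice on whether the type is empty and non-final):

  // Before: empty function objects were stored via the Empty Base-class
  // Optimization, adding a base class (and an ADL associated class).
  template<typename _Tp>
    struct _Hashtable_ebo_helper : private _Tp
    { /* accessor member functions */ };

  // After: the object is always a plain data member; [[no_unique_address]]
  // still lets an empty, non-final _Tp occupy no storage.
  template<typename _Tp>
    struct _Hashtable_ebo_helper
    {
      [[no_unique_address]] _Tp _M_obj;
    };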

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::swap): Swap hash
function and equality predicate here. Inline allocator swap
instead of using __alloc_on_swap.
* include/bits/hashtable_policy.h (_Hashtable_ebo_helper):
Replace EBO with no_unique_address attribute. Remove NTTP.
(_Hash_code_base): Replace base class with data member using
no_unique_address attribute.
(_Hash_code_base::_M_swap): Remove.
(_Hash_code_base::_M_hash): Remove.
(_Hashtable_base): Replace base class with data member using
no_unique_address attribute.
(_Hashtable_base::_M_swap): Remove.
(_Hashtable_alloc): Replace base class with data member using
no_unique_address attribute.

7 months ago libstdc++: Fix fancy pointer support in linked lists [PR57272]
Jonathan Wakely [Sun, 8 Dec 2024 14:34:01 +0000 (14:34 +0000)] 
libstdc++: Fix fancy pointer support in linked lists [PR57272]

The union members I used in the new _Node types for fancy pointers only
work for value types that are trivially default constructible. This
change replaces the anonymous union with a named union so it can be
given a default constructor and destructor, to leave the variant member
uninitialized.
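
A minimal sketch of the idea (hypothetical member names, not the exact
libstdc++ code):

  template<typename _Tp>
    union _Node_storage
    {
      _Node_storage() { }   // leave _M_v uninitialized
      ~_Node_storage() { }  // the container destroys _M_v explicitly
      _Tp _M_v;
    };

With the user-provided special member functions the union is usable even
when _Tp is not trivially default constructible.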

This also fixes the incorrect macro names in the alloc_ptr_ignored.cc
tests as pointed out by François, and fixes some std::list pointer
confusions that the fixed alloc_ptr_ignored.cc test revealed.

libstdc++-v3/ChangeLog:

PR libstdc++/57272
* include/bits/forward_list.h (__fwd_list::_Node): Add
user-provided special member functions to union.
* include/bits/stl_list.h (__list::_Node): Likewise.
(_Node_base::_M_hook, _Node_base::swap): Use _M_base() instead
of std::pointer_traits::pointer_to.
(_Node_base::_M_transfer): Likewise. Add noexcept.
(_List_base::_M_put_node): Use 'if constexpr' to avoid using
pointer_traits::pointer_to when not necessary.
(_List_base::_M_destroy_node): Fix parameter to be the pointer
type used internally, not the allocator's pointer.
(list::_M_create_node): Likewise.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
Check explicit instantiation of non-trivial value type.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
Likewise.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
Fix macro name.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
Likewise.

7 months ago Fix non-aligned CodeView symbols
Mark Harmstone [Sat, 30 Nov 2024 22:35:24 +0000 (22:35 +0000)] 
Fix non-aligned CodeView symbols

CodeView symbols in PDB files are aligned to four-byte boundaries. It's
not really clear what logic MSVC uses to enforce this; sometimes the
symbols are padded in the object file, sometimes the linker seems to do
the work.

It makes more sense to do this in the compiler, so fix the two instances
where we can write symbols with a non-aligned length.  S_FRAMEPROC is
unusual in that its length is not a multiple of 4, so it will always need
2 bytes of padding.  S_INLINESITE is followed by variable-length "binary
annotations", so it will also usually need padding.
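
For reference, padding a record length up to a four-byte boundary is the
usual bit trick (a sketch, not the dwarf2codeview.cc code):

  unsigned int aligned_len = (len + 3) & ~3u;   /* round up to multiple of 4 */
  unsigned int padding = aligned_len - len;     /* 0..3 bytes of zero padding */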

gcc/
* dwarf2codeview.cc (write_s_frameproc): Align output.
(write_s_inlinesite): Align output.

7 months ago Daily bump.
GCC Administrator [Mon, 16 Dec 2024 00:16:48 +0000 (00:16 +0000)] 
Daily bump.

7 months ago hppa: Implement TARGET_FRAME_POINTER_REQUIRED
John David Anglin [Sun, 15 Dec 2024 22:18:40 +0000 (17:18 -0500)] 
hppa: Implement TARGET_FRAME_POINTER_REQUIRED

If a function receives nonlocal gotos, it needs to save the frame
pointer in the argument save area.  This ensures that LRA sets
frame_pointer_needed when it saves arguments in the save area.
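
A minimal sketch of such a hook, assuming it simply keys off
cfun->has_nonlocal_label (the actual pa.cc implementation may differ):

  /* Require a frame pointer for functions that receive nonlocal gotos,
     so the frame pointer gets saved in the argument save area.  */
  static bool
  pa_frame_pointer_required (void)
  {
    return cfun->has_nonlocal_label;
  }

  #undef TARGET_FRAME_POINTER_REQUIRED
  #define TARGET_FRAME_POINTER_REQUIRED pa_frame_pointer_required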

2024-12-15  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/118018
* config/pa/pa.cc (pa_frame_pointer_required): Declare and
implement.
(TARGET_FRAME_POINTER_REQUIRED): Define.

7 months ago testsuite: Enable TImode tests on hppa64
John David Anglin [Sun, 15 Dec 2024 21:02:54 +0000 (16:02 -0500)] 
testsuite: Enable TImode tests on hppa64

2024-12-15  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ivopts-1.c: Enable TImode tests on hppa64.

7 months ago testsuite: xfail scan-assembler-times in c-c++-common/gomp/unroll-[45].c
John David Anglin [Sun, 15 Dec 2024 20:53:12 +0000 (15:53 -0500)] 
testsuite: xfail scan-assembler-times in c-c++-common/gomp/unroll-[45].c

Count differs on hppa*-*-hpux* due to hpux specific directives.

2024-12-15  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/unroll-4.c: xfail scan-assembler-times
"dummy" for hppa*-*-hpux*.
* c-c++-common/gomp/unroll-5.c: Likewise.

7 months ago testsuite: Require lto in g++.dg/modules/enum-14.C
John David Anglin [Sun, 15 Dec 2024 20:39:50 +0000 (15:39 -0500)] 
testsuite: Require lto in g++.dg/modules/enum-14.C

2024-12-15  John David Anglin  <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-14.C: Require lto.

7 months ago c++, coroutines: Use finish_if_stmt in a missed case.
Iain Sandoe [Tue, 3 Sep 2024 11:04:59 +0000 (12:04 +0100)] 
c++, coroutines: Use finish_if_stmt in a missed case.

Just shorter code.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Use
finish_if_stmt instead of manually applying the same process.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

7 months ago c++, coroutines: Make the resume index consistent for debug.
Iain Sandoe [Thu, 3 Oct 2024 08:02:59 +0000 (09:02 +0100)] 
c++, coroutines: Make the resume index consistent for debug.

At present, we only update the resume index when we actually are
at the stage that the coroutine is considered suspended. This is
on the basis that it is UB to resume or destroy a coroutine that
is not suspended (and therefore we never need to access this value
otherwise).  However, it is possible that someone could set a debug
breakpoint on the resume which can be reached without suspending
if await_ready() returns true.  In that case, the debugger would
read an incorrect resume index.  Fixed by moving the update to
just before the test for ready.
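
In pseudocode, the per-await expansion now looks roughly like this (names
are illustrative only):

  frame->resume_index = <index of this suspend point>;  /* now updated here */
  if (!awaiter.await_ready ())
    {
      awaiter.await_suspend (handle);
      /* suspend; a later resume () continues here */
    }
  awaiter.await_resume ();

so a breakpoint reached via the await_ready () == true path already sees
the correct index.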

gcc/cp/ChangeLog:

* coroutines.cc (expand_one_await_expression): Update the
resume index before checking if the coroutine is ready.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

7 months ago c++, coroutines: Ensure bind exprs are visited once [PR98935].
Iain Sandoe [Fri, 1 Nov 2024 23:30:58 +0000 (23:30 +0000)] 
c++, coroutines: Ensure bind exprs are visited once [PR98935].

Recent changes in the C++ FE and the coroutines implementation have
exposed a latent issue in which a bind expression containing a var
that we need to capture in the coroutine state gets visited twice.
This causes an ICE (from a checking assert).  Fixed by adding a pset
to the relevant tree walk.  Exit the callback early when the tree is
not a BIND_EXPR.
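
A sketch of the kind of change involved (simplified; the real functions are
named in the ChangeLog below):

  /* In the callback, ignore everything that is not a BIND_EXPR.  */
  if (TREE_CODE (*stmt) != BIND_EXPR)
    return NULL_TREE;

  /* At the call site, pass a pset so walk_tree skips already-visited
     nodes and each BIND_EXPR is processed only once.  */
  hash_set<tree> visited;
  cp_walk_tree (&fnbody, register_local_var_uses, &data, &visited);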

PR c++/98935

gcc/cp/ChangeLog:

* coroutines.cc (register_local_var_uses): Add a pset to the
tree walk to avoid visiting the same BIND_EXPR twice.  Make
an early exit for cases that the callback does not apply.
(cp_coroutine_transform::apply_transforms): Add a pset to the
tree walk for register_local_vars.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr98935.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

7 months ago arm: fix bootstrap after MVE changes
Tamar Christina [Sun, 15 Dec 2024 13:21:44 +0000 (13:21 +0000)] 
arm: fix bootstrap after MVE changes

The recent commits for MVE on Saturday have broken armhf bootstrap due to a
-Werror false positive:

    inlined from 'virtual rtx_def* {anonymous}::vstrq_scatter_base_impl::expand(arm_mve::function_expander&) const' at /gcc/config/arm/arm-mve-builtins-base.cc:352:17:
./genrtl.h:38:16: error: 'new_base' may be used uninitialized [-Werror=maybe-uninitialized]
   38 |   XEXP (rt, 1) = arg1;
/gcc/config/arm/arm-mve-builtins-base.cc: In member function 'virtual rtx_def* {anonymous}::vstrq_scatter_base_impl::expand(arm_mve::function_expander&) const':
/gcc/config/arm/arm-mve-builtins-base.cc:311:26: note: 'new_base' was declared here
  311 |     rtx insns, base_ptr, new_base;
      |                          ^~~~~~~~
In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
    inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, rtx)' at ./genrtl.h:50:26,
    inlined from 'virtual rtx_def* {anonymous}::vldrq_gather_base_impl::expand(arm_mve::function_expander&) const' at /gcc/config/arm/arm-mve-builtins-base.cc:527:17:
./genrtl.h:38:16: error: 'new_base' may be used uninitialized [-Werror=maybe-uninitialized]
   38 |   XEXP (rt, 1) = arg1;
/gcc/config/arm/arm-mve-builtins-base.cc: In member function 'virtual rtx_def* {anonymous}::vldrq_gather_base_impl::expand(arm_mve::function_expander&) const':
/gcc/config/arm/arm-mve-builtins-base.cc:486:26: note: 'new_base' was declared here
  486 |     rtx insns, base_ptr, new_base;

To fix it I just initialize the value.
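
A sketch of the change (the actual hunk may differ slightly):

  -    rtx insns, base_ptr, new_base;
  +    rtx insns, base_ptr, new_base = NULL_RTX;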

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (expand): Initialize new_base.

7 months ago Shrink back size of tree_exp from 40 bytes to 32
Jakub Jelinek [Sun, 15 Dec 2024 12:13:07 +0000 (13:13 +0100)] 
Shrink back size of tree_exp from 40 bytes to 32

The following patch implements what I've mentioned in the 64-bit
location_t thread.
struct tree_exp had an unsigned condition_uid member added for something
rarely used (-fcondition-coverage), and even there it is only used on a
very small subset of trees, only for the duration of gimplification.

The following patch uses a hash_map instead, which allows shrinking
tree_exp to its previous size (32 bytes + (number of operands - 1) * sizeof
(tree)).
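
A sketch of the replacement mechanism, using GCC's hash_map (function and
variable names follow the ChangeLog below; the real code may differ in
detail):

  /* Gimplifier-local map from condition expression to its uid, created
     lazily and deleted again in gimplify_function_tree.  */
  static hash_map<tree, unsigned> *cond_uids;

  static void
  tree_associate_condition_with_expr (tree expr, unsigned uid)
  {
    if (!cond_uids)
      cond_uids = new hash_map<tree, unsigned> ();
    cond_uids->put (expr, uid);
  }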

2024-12-15  Jakub Jelinek  <jakub@redhat.com>

* tree-core.h (struct tree_exp): Remove condition_uid member.
* tree.h (SET_EXPR_UID, EXPR_COND_UID): Remove.
* gimplify.cc (nextuid): Rename to ...
(nextconduid): ... this.
(cond_uids): New static variable.
(next_cond_uid, reset_cond_uid): Adjust for the renaming,
formatting fix.
(tree_associate_condition_with_expr): New function.
(shortcut_cond_r, tag_shortcut_cond, shortcut_cond_expr): Use it
instead of SET_EXPR_UID.
(gimplify_cond_expr): Look up cond_uid in cond_uids hash map if
non-NULL instead of using EXPR_COND_UID.
(gimplify_function_tree): Delete cond_uids and set it to NULL.

7 months ago Fortran: Pointer fcn results must not be finalized [PR117897]
Paul Thomas [Sun, 15 Dec 2024 10:42:34 +0000 (10:42 +0000)] 
Fortran: Pointer fcn results must not be finalized [PR117897]

2024-12-15  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
PR fortran/117897
* trans-expr.cc (gfc_trans_assignment_1): RHS pointer function
results must not be finalized.

gcc/testsuite/
PR fortran/117897
* gfortran.dg/finalize_59.f90: New test.

7 months ago Daily bump.
GCC Administrator [Sun, 15 Dec 2024 00:17:24 +0000 (00:17 +0000)] 
Daily bump.

7 months ago libbacktrace: don't use ZSTD_CLEVEL_DEFAULT
Ian Lance Taylor [Sat, 14 Dec 2024 22:32:11 +0000 (14:32 -0800)] 
libbacktrace: don't use ZSTD_CLEVEL_DEFAULT

PR 117812 reports that testing GCC with zstd 1.3.4 fails because
ZSTD_CLEVEL_DEFAULT is not defined, so avoid using it.
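
For illustration, the call simply passes zstd's documented default level
(3) explicitly; the buffer and size names below are assumptions, not the
actual zstdtest.c variables:

  size_t csize = ZSTD_compress (cbuf, cbuf_size, buf, size, 3);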

PR libbacktrace/117812
* zstdtest.c (test_large): Use 3 rather than ZSTD_CLEVEL_DEFAULT.

7 months ago [PATCH v3] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`
Jovan Vukic [Sat, 14 Dec 2024 21:47:35 +0000 (14:47 -0700)] 
[PATCH v3] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

Thank you for the feedback. I have made the minor changes that were requested.
Additionally, I extracted the repetitive code into a reusable helper function,
match_plus_neg_pattern, making the code much more readable. Furthermore, the
logic, code, and tests remain the same as in version 2 of the patch.
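
As a quick sanity check of the identity (not part of the patch): for
non-zero a, the bits of a - 1 below a's lowest set bit are all ones while
-a = ~(a - 1) has them all zero, and every higher bit differs as well, so
the AND is 0 (and for a == 0, (a - 1) & -a is -1 & 0 == 0).  By the same
reasoning the | and ^ forms give all-ones.  For example:

  unsigned a = 6;                 /* 0b0110 */
  unsigned x = (a - 1) & -a;      /* 0b0101 & 0b...1010 == 0 */
  unsigned y = (a - 1) | -a;      /* ~0u */
  unsigned z = (a - 1) ^ -a;      /* ~0u */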

gcc/ChangeLog:

* match.pd: New pattern.
* simplify-rtx.cc (match_plus_neg_pattern): New helper function.
(simplify_context::simplify_binary_operation_1): New
code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-11.c: New test.

7 months ago bpf: fix build adding new required arg to RESOLVE_OVERLOADED_BUILTIN
Jose E. Marchesi [Sat, 14 Dec 2024 18:15:34 +0000 (19:15 +0100)] 
bpf: fix build adding new required arg to RESOLVE_OVERLOADED_BUILTIN

gcc/ChangeLog

* config/bpf/bpf.cc (bpf_resolve_overloaded_builtin): Add argument
`complain'.

7 months ago doc: Fix typos for --enable-host-pie docs in install.texi
Heiko Eißfeldt [Sat, 14 Dec 2024 12:31:58 +0000 (12:31 +0000)] 
doc: Fix typos for --enable-host-pie docs in install.texi

gcc/ChangeLog:

* doc/install.texi (Configuration): Fix typos in documentation
for --enable-host-pie.

7 months ago gimple-fold: Fix the recent ifcombine optimization for _BitInt [PR118023]
Jakub Jelinek [Sat, 14 Dec 2024 10:28:25 +0000 (11:28 +0100)] 
gimple-fold: Fix the recent ifcombine optimization for _BitInt [PR118023]

The BIT_FIELD_REF verifier has:
          if (INTEGRAL_TYPE_P (TREE_TYPE (op))
              && !type_has_mode_precision_p (TREE_TYPE (op)))
            {
              error ("%qs of non-mode-precision operand", code_name);
              return true;
            }
check among other things, so one can't extract something out of say
_BitInt(63) or _BitInt(4096).
The new ifcombine optimization happily creates such BIT_FIELD_REFs
and ICEs during their verification.

The following patch fixes that by rejecting those in decode_field_reference.
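
A sketch of the added guard (simplified):

  /* A BIT_FIELD_REF of a non-mode-precision integral object (such as
     _BitInt(63)) would not survive verification, so give up.  */
  if (INTEGRAL_TYPE_P (TREE_TYPE (inner))
      && !type_has_mode_precision_p (TREE_TYPE (inner)))
    return NULL_TREE;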

2024-12-14  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118023
* gimple-fold.cc (decode_field_reference): Return NULL_TREE if
inner has non-type_has_mode_precision_p integral type.

* gcc.dg/bitint-119.c: New test.

7 months ago warn-access: Fix up matching_alloc_calls_p [PR118024]
Jakub Jelinek [Sat, 14 Dec 2024 10:27:20 +0000 (11:27 +0100)] 
warn-access: Fix up matching_alloc_calls_p [PR118024]

The following testcase ICEs because of a bug in matching_alloc_calls_p.
The loop was apparently meant to be walking the two attribute chains
in lock-step, but doesn't really do that.  If the first lookup_attribute
returns non-NULL, the second one is not done, so rmats in that case can
be some random unrelated attribute rather than a "malloc" attribute.  The
body assumes that rmats, when non-NULL, is a "malloc" attribute and relies
on its argument being a "malloc" argument; if it is in fact some other
attribute with an incompatible argument, it just crashes.

Now, fixing that in the obvious way, instead of doing
(amats = lookup_attribute ("malloc", amats))
 || (rmats = lookup_attribute ("malloc", rmats))
in the condition do
((amats = lookup_attribute ("malloc", amats)),
 (rmats = lookup_attribute ("malloc", rmats)),
 (amats || rmats))
fixes the testcase but regresses Wmismatched-dealloc-{2,3}.c tests.
The problem is that walking the attribute lists in lock-step is obviously
a very bad idea: there is no requirement that the same deallocators are
present in the same order on both decls, e.g. there could be an extra malloc
attribute without argument in just one of the lists, or the order of say
free/realloc could be swapped, etc.  We don't generally document nor enforce
any particular ordering of attributes (even when for some attributes we just
handle the first one rather than all).

So, this patch instead simply splits it into two loops, the first one walks
alloc_decl attributes, the second one walks dealloc_decl attributes.
If the malloc attribute argument is a built-in, that doesn't change
anything, and otherwise we have the chance to populate the whole
common_deallocs hash_set in the first loop and then can check it in the
second one (and don't need to use more expensive add method on it, can just
check contains there).  Not to mention that it also fixes the case when
the function would incorrectly return true if there wasn't a common
deallocator between the two, but dealloc_decl had 2 malloc attributes with
the same deallocator.
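
In outline, the fixed function therefore looks like this (heavily
simplified sketch; the real code also has to handle built-in
deallocators):

  hash_set<tree> common_deallocs;

  /* First loop: record every deallocator named by alloc_decl's
     "malloc" attributes.  */
  for (tree a = DECL_ATTRIBUTES (alloc_decl);
       (a = lookup_attribute ("malloc", a)); a = TREE_CHAIN (a))
    if (tree args = TREE_VALUE (a))
      common_deallocs.add (TREE_VALUE (args));

  /* Second loop: succeed if dealloc_decl names any of them.  */
  for (tree r = DECL_ATTRIBUTES (dealloc_decl);
       (r = lookup_attribute ("malloc", r)); r = TREE_CHAIN (r))
    if (tree args = TREE_VALUE (r))
      if (common_deallocs.contains (TREE_VALUE (args)))
        return true;
  return false;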

2024-12-14  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/118024
* gimple-ssa-warn-access.cc (matching_alloc_calls_p): Walk malloc
attributes of alloc_decl and dealloc_decl in separate loops rather
than in lock-step.  Use common_deallocs.contains rather than
common_deallocs.add in the second loop.

* gcc.dg/pr118024.c: New test.

7 months ago opts: Use OPTION_SET_P instead of magic value 2 for -fshort-enums default [PR118011]
Jakub Jelinek [Sat, 14 Dec 2024 10:25:08 +0000 (11:25 +0100)] 
opts: Use OPTION_SET_P instead of magic value 2 for -fshort-enums default [PR118011]

The magic default values for some options (usually -1, sometimes 2) date
from the time before we had global_options_set; I think we should
eventually get rid of all of them.

The PR is about gcc -Q --help=optimizers reporting -fshort-enums as
[enabled] when it is disabled.
For this the following patch is only a partial fix; with an explicit
gcc -Q --help=optimizers -fshort-enums
or
gcc -Q --help=optimizers -fno-short-enums
it already worked correctly before.  With this patch it will report the
correct value even with just
gcc -Q --help=optimizers
on most targets, except 32-bit arm with some options or defaults, so I
think it is a step in the right direction.
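
A sketch of the pattern the patch moves toward (illustrative only):

  /* Before: a magic default value.  */
  if (flag_short_enums == 2)
    flag_short_enums = targetm.default_short_enums ();

  /* After: ask whether the user set the option at all.  */
  if (!OPTION_SET_P (flag_short_enums))
    flag_short_enums = targetm.default_short_enums ();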

But, as I wrote in the PR, process_options isn't run before --help=,
and in its current form it shouldn't be, because it warns on some option
combinations and errors or emits sorry on others.  So ideally
process_options should take a bool argument saying whether it is run for
--help= purposes: if so, don't emit warnings and just adjust the options;
otherwise do what it currently does.

2024-12-14  Jakub Jelinek  <jakub@redhat.com>

PR c/118011
gcc/
* opts.cc (init_options_struct): Don't set opts->x_flag_short_enums to
2.
* toplev.cc (process_options): Test !OPTION_SET_P (flag_short_enums)
rather than flag_short_enums == 2.
gcc/ada/
* gcc-interface/misc.cc (gnat_post_options): Test
!OPTION_SET_P (flag_short_enums) rather than flag_short_enums == 2.

7 months ago c++: Disallow decomposition of lambda bases [PR90321]
Nathaniel Shead [Thu, 7 Nov 2024 10:37:28 +0000 (21:37 +1100)] 
c++: Disallow decomposition of lambda bases [PR90321]

Decomposition of lambda closure types is not allowed by
[dcl.struct.bind] p6, since members of a closure have no name.

r244909 made this an error, but missed the case where a lambda is used
as a base.  This patch moves the check to find_decomp_class_base to
handle this case.

As a drive-by improvement, we also slightly improve the diagnostics to
indicate why a base class was being inspected.  Ideally the diagnostic
would point directly at the relevant base, but there doesn't seem to be
an easy way to get this location just from the binfo so I don't worry
about that here.
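
A hypothetical example of the kind of code now rejected (not necessarily
the PR's testcase):

  auto l = [i = 0] { return i; };
  struct S : decltype(l) { };        // closure type used as a base

  void f ()
  {
    S s { l };
    auto [x] = s;   // error: the closure's capture members have no name
  }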

PR c++/90321

gcc/cp/ChangeLog:

* decl.cc (find_decomp_class_base): Check for decomposing a
lambda closure type.  Report base class chains if needed.
(cp_finish_decomp): Remove no-longer-needed check.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/decomp62.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>

7 months ago libstdc++: Remove duplicate using-declaration in <wchar.h>
Abdo Eid [Sat, 14 Dec 2024 01:16:10 +0000 (01:16 +0000)] 
libstdc++: Remove duplicate using-declaration in <wchar.h>

libstdc++-v3/ChangeLog:

* include/c_compatibility/wchar.h (fgetwc): Remove duplicate
using-declaration.

7 months ago Daily bump.
GCC Administrator [Sat, 14 Dec 2024 00:19:52 +0000 (00:19 +0000)] 
Daily bump.

7 months ago cse: Fix up record_jump_equiv checks [PR117095]
Jakub Jelinek [Fri, 13 Dec 2024 23:41:00 +0000 (00:41 +0100)] 
cse: Fix up record_jump_equiv checks [PR117095]

The following testcase is miscompiled on s390x-linux with -O2 -march=z15.
The problem happens during cse2, which sees in an extended basic block
(jump_insn 217 78 216 10 (parallel [
            (set (pc)
                (if_then_else (ne (reg:SI 165)
                        (const_int 1 [0x1]))
                    (label_ref 216)
                    (pc)))
            (set (reg:SI 165)
                (plus:SI (reg:SI 165)
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (scratch:SI))
            (clobber (reg:CC 33 %cc))
        ]) "t.c":14:17 discrim 1 2192 {doloop_si64}
     (int_list:REG_BR_PROB 955630228 (nil))
 -> 216)
...
(insn 99 98 100 12 (set (reg:SI 138)
        (const_int 1 [0x1])) "t.c":9:31 1507 {*movsi_zarch}
     (nil))
(insn 100 99 103 12 (parallel [
            (set (reg:SI 137)
                (minus:SI (reg:SI 138)
                    (subreg:SI (reg:HI 135 [ a ]) 0)))
            (clobber (reg:CC 33 %cc))
        ]) "t.c":9:31 1904 {*subsi3}
     (expr_list:REG_DEAD (reg:SI 138)
        (expr_list:REG_DEAD (reg:HI 135 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))
Note, cse2 has df_note_add_problem () before df_analyze, which add
     (expr_list:REG_UNUSED (reg:SI 165)
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
notes to the first insn (correctly so, %cc is clobbered there and pseudo
165 isn't used after the insn).
Now, cse_extended_basic_block has an extra optimization on conditional
jumps, where it records equivalence on the edge which continues in the ebb.
Here it sees that (ne (reg:SI 165) (const_int 1)) is false on the edge and
remembers that pseudo 165 is comparison equivalent to (const_int 1),
so on insn 100 it decides to replace (reg:SI 138) with (reg:SI 165).

This optimization isn't correct here though, because the JUMP_INSN has
multiple sets.  Before r0-77890 record_jump_equiv has been called from
cse_insn guarded on n_sets == 1 && any_condjump_p (insn), so it wouldn't
be done on the above JUMP_INSN where n_sets == 2.  But since that change
it is guarded with single_set (insn) && any_condjump_p (insn) and that
is true because of the REG_UNUSED note.  Looking at that note is
inappropriate in CSE though, because the whole intent of the pass is to
extend the lifetimes of the pseudos if equivalence is found, so the fact
that there is a REG_UNUSED note for (reg:SI 165) and that the reg isn't used
later doesn't imply that it won't be used after the optimization.
So, unless we manage to process the other sets on the JUMP_INSN (it wouldn't
be terribly hard in this exact case, the doloop insn decreases the register
by 1 and so we could just record equivalence to (const_int 0) instead, but
generally it might be hard), we should IMHO just punt if there are multiple
sets.

The patch below adds a !multiple_sets (insn) check rather than replacing
the single_set (insn) check with it, because apparently any_condjump_p uses
pc_set which supports the case where PATTERN is a SET to PC (that is a
single_set (insn) && !multiple_sets (insn), PATTERN is a PARALLEL with a
single SET to PC (likewise) and some CLOBBERs, PARALLEL with two or more
SETs where the first one is SET to PC (that could be single_set (insn)
with REG_UNUSED notes but is not !multiple_sets (insn)) or PATTERN
is UNSPEC/UNSPEC_VOLATILE with SET inside of it.  For the last case
!multiple_sets (insn) will be true, but IMHO we shouldn't try to derive
anything from those because we haven't checked the rest of the UNSPEC*
and we don't really know what it does.
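
So the guard becomes, in sketch form:

  /* Only record equivalences from a conditional jump whose pattern does
     nothing besides the jump itself.  */
  if (single_set (insn)
      && any_condjump_p (insn)
      && !multiple_sets (insn))
    record_jump_equiv (insn, taken);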

2024-12-13  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/117095
* cse.cc (cse_extended_basic_block): Don't call record_jump_equiv
if multiple_sets (insn).

* gcc.c-torture/execute/pr117095.c: New test.

7 months ago libstdc++: Avoid unnecessary copies in ranges::min/max [PR112349]
Patrick Palka [Fri, 13 Dec 2024 18:17:29 +0000 (13:17 -0500)] 
libstdc++: Avoid unnecessary copies in ranges::min/max [PR112349]

Use a local reference for the (now possibly lifetime extended) result of
*__first so that we copy it only when necessary.
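
The idea in sketch form (not the exact libstdc++ code):

  // Before: unconditionally copies the dereferenced iterator.
  //   auto __tmp = *__first;
  // After: bind a (possibly lifetime-extending) reference and copy or move
  // into __result only when this element is actually the new minimum.
  auto&& __tmp = *__first;
  if (std::invoke (__comp, std::invoke (__proj, __tmp),
                   std::invoke (__proj, __result)))
    __result = std::forward<decltype(__tmp)> (__tmp);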

PR libstdc++/112349

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__min_fn::operator()): Turn local
object __tmp into a reference.
* include/bits/ranges_util.h (__max_fn::operator()): Likewise.
* testsuite/25_algorithms/max/constrained.cc (test04): New test.
* testsuite/25_algorithms/min/constrained.cc (test04): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

7 months ago arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]
Christophe Lyon [Sun, 24 Nov 2024 18:08:48 +0000 (18:08 +0000)] 
arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

In this PR, we have to handle a case where MVE predicates are supplied
as a const_int, where individual predicates have illegal boolean
values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
fix the constant (any non-zero value is converted to all 1s) and emit
a warning.

On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
instruction level, but end-users should describe lanes rather than
bytes (so all bytes of a true-predicated lane should be '1'), see the
section on MVE intrinsics in the Arm ACLE specification.

Since force_lowpart_subreg cannot handle const_ints (because they have VOID
mode), use gen_lowpart on them, and force_lowpart_subreg otherwise.
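
A sketch of the lane-wise fix-up (illustrative only; `val' holding the
CONST_INT value is an assumption): for a V4BI predicate each lane owns four
bits of the 16-bit predicate, and any non-zero lane is widened to all ones:

  unsigned HOST_WIDE_INT fixed = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      unsigned HOST_WIDE_INT bits = (val >> (lane * 4)) & 0xf;
      if (bits != 0)
        fixed |= (unsigned HOST_WIDE_INT) 0xf << (lane * 4);
    }
  if (fixed != val)
    warning (0, "constant predicate argument with non-boolean lanes");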

2024-11-20  Christophe Lyon  <christophe.lyon@linaro.org>
    Jakub Jelinek  <jakub@redhat.com>

PR target/114801
gcc/
* config/arm/arm-mve-builtins.cc
(function_expander::add_input_operand): Handle CONST_INT
predicates.

gcc/testsuite/
* gcc.target/arm/mve/pr108443.c: Update predicate constant.
* gcc.target/arm/mve/pr108443-run.c: Likewise.
* gcc.target/arm/mve/pr114801.c: New test.

7 months ago arm: [MVE intrinsics] rework vst2q vst4q vld2q vld4q
Christophe Lyon [Wed, 13 Nov 2024 15:33:18 +0000 (15:33 +0000)] 
arm: [MVE intrinsics] rework vst2q vst4q vld2q vld4q

Implement vst2q, vst4q, vld2q and vld4q using the new MVE builtins
framework.

Since MVE uses different tuple modes than Neon, we need to use
VALID_MVE_STRUCT_MODE because VALID_NEON_STRUCT_MODE is no longer a
super-set of it, for instance in output_move_neon and
arm_print_operand_address.

In arm_hard_regno_mode_ok, the change is similar but a bit more
intrusive.

Expand the VSTRUCT iterator, so that mov<mode> and neon_mov<mode>
patterns from neon.md still work for MVE.

Besides the small updates to the patterns in mve.md, we have to update
vec_load_lanes and vec_store_lanes in vec-common.md so that the
vectorizer can handle the new modes. These patterns are now different
from Neon's, so maybe we should move them back to neon.md and mve.md.

The patch adds arm_array_mode, which is used by build_array_type_nelts
and makes it possible to support the new assert in
register_builtin_tuple_types.
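
For reference, a small ACLE-level example of the interleaving loads and
stores these built-ins implement (application code, not part of the patch):

  #include <arm_mve.h>

  /* De-interleave four pairs of int32 from in[], re-interleave into out[].  */
  void copy_pairs (const int32_t *in, int32_t *out)
  {
    int32x4x2_t v = vld2q_s32 (in);  /* val[0]: even elements, val[1]: odd */
    vst2q_s32 (out, v);
  }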

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class vst24_impl): New.
(class vld24_impl): New.
(vld2q, vld4q, vst2q, vst4q): New.
* config/arm/arm-mve-builtins-base.def (vld2q, vld4q, vst2q)
(vst4q): New.
* config/arm/arm-mve-builtins-base.h (vld2q, vld4q, vst2q, vst4q):
New.
* config/arm/arm-mve-builtins.cc (register_builtin_tuple_types):
Add more asserts.
* config/arm/arm.cc (TARGET_ARRAY_MODE): New.
(output_move_neon): Handle MVE struct modes.
(arm_print_operand_address): Likewise.
(arm_hard_regno_mode_ok): Likewise.
(arm_array_mode): New.
* config/arm/arm.h (VALID_MVE_STRUCT_MODE): Likewise.
* config/arm/arm_mve.h (vst4q): Delete.
(vst2q): Delete.
(vld2q): Delete.
(vld4q): Delete.
(vst4q_s8): Delete.
(vst4q_s16): Delete.
(vst4q_s32): Delete.
(vst4q_u8): Delete.
(vst4q_u16): Delete.
(vst4q_u32): Delete.
(vst4q_f16): Delete.
(vst4q_f32): Delete.
(vst2q_s8): Delete.
(vst2q_u8): Delete.
(vld2q_s8): Delete.
(vld2q_u8): Delete.
(vld4q_s8): Delete.
(vld4q_u8): Delete.
(vst2q_s16): Delete.
(vst2q_u16): Delete.
(vld2q_s16): Delete.
(vld2q_u16): Delete.
(vld4q_s16): Delete.
(vld4q_u16): Delete.
(vst2q_s32): Delete.
(vst2q_u32): Delete.
(vld2q_s32): Delete.
(vld2q_u32): Delete.
(vld4q_s32): Delete.
(vld4q_u32): Delete.
(vld4q_f16): Delete.
(vld2q_f16): Delete.
(vst2q_f16): Delete.
(vld4q_f32): Delete.
(vld2q_f32): Delete.
(vst2q_f32): Delete.
(__arm_vst4q_s8): Delete.
(__arm_vst4q_s16): Delete.
(__arm_vst4q_s32): Delete.
(__arm_vst4q_u8): Delete.
(__arm_vst4q_u16): Delete.
(__arm_vst4q_u32): Delete.
(__arm_vst2q_s8): Delete.
(__arm_vst2q_u8): Delete.
(__arm_vld2q_s8): Delete.
(__arm_vld2q_u8): Delete.
(__arm_vld4q_s8): Delete.
(__arm_vld4q_u8): Delete.
(__arm_vst2q_s16): Delete.
(__arm_vst2q_u16): Delete.
(__arm_vld2q_s16): Delete.
(__arm_vld2q_u16): Delete.
(__arm_vld4q_s16): Delete.
(__arm_vld4q_u16): Delete.
(__arm_vst2q_s32): Delete.
(__arm_vst2q_u32): Delete.
(__arm_vld2q_s32): Delete.
(__arm_vld2q_u32): Delete.
(__arm_vld4q_s32): Delete.
(__arm_vld4q_u32): Delete.
(__arm_vst4q_f16): Delete.
(__arm_vst4q_f32): Delete.
(__arm_vld4q_f16): Delete.
(__arm_vld2q_f16): Delete.
(__arm_vst2q_f16): Delete.
(__arm_vld4q_f32): Delete.
(__arm_vld2q_f32): Delete.
(__arm_vst2q_f32): Delete.
(__arm_vst4q): Delete.
(__arm_vst2q): Delete.
(__arm_vld2q): Delete.
(__arm_vld4q): Delete.
* config/arm/arm_mve_builtins.def (vst4q, vst2q, vld4q, vld2q):
Delete.
* config/arm/iterators.md (VSTRUCT): Add V2x16QI, V2x8HI, V2x4SI,
V2x8HF, V2x4SF, V4x16QI, V4x8HI, V4x4SI, V4x8HF, V4x4SF.
(MVE_VLD2_VST2, MVE_vld2_vst2, MVE_VLD4_VST4, MVE_vld4_vst4): New.
* config/arm/mve.md (mve_vst4q<mode>): Update into ...
(@mve_vst4q<mode>): ... this.
(mve_vst2q<mode>): Update into ...
(@mve_vst2q<mode>): ... this.
(mve_vld2q<mode>): Update into ...
(@mve_vld2q<mode>): ... this.
(mve_vld4q<mode>): Update into ...
(@mve_vld4q<mode>): ... this.
* config/arm/vec-common.md (vec_load_lanesoi<mode>) Remove MVE
support.
(vec_load_lanesxi<mode>): Likewise.
(vec_store_lanesoi<mode>): Likewise.
(vec_store_lanesxi<mode>): Likewise.
(vec_load_lanes<MVE_vld2_vst2><mode>):
New.
(vec_store_lanes<MVE_vld2_vst2><mode>): New.
(vec_load_lanes<MVE_vld4_vst4><mode>): New.
(vec_store_lanes<MVE_vld4_vst4><mode>): New.

7 months ago arm: [MVE intrinsics] fix store shape to support tuples
Christophe Lyon [Wed, 13 Nov 2024 15:31:21 +0000 (15:31 +0000)] 
arm: [MVE intrinsics] fix store shape to support tuples

Now that tuples are properly supported, we can update the store shape to
expect "t0" instead of "v0" as the last argument.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct store_def): Add
support for tuples.

7 months ago arm: [MVE intrinsics] add support for tuples
Christophe Lyon [Wed, 13 Nov 2024 15:30:44 +0000 (15:30 +0000)] 
arm: [MVE intrinsics] add support for tuples

This patch is largely a copy/paste from the aarch64 SVE counterpart,
and adds support for tuples to the MVE intrinsics framework.

Introduce function_resolver::infer_tuple_type which will be used to
resolve overloaded vst2q and vst4q function names in a later patch.

Fix access to acle_vector_types in a few places, as well as in
infer_vector_or_tuple_type because we should shift the tuple size to
the right by one bit when computing the array index.

The new wrap_type_in_struct, register_type_decl and infer_tuple_type
are largely copies of the aarch64 versions, and
register_builtin_tuple_types is very similar.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (parse_type): Fix access
to acle_vector_types.
* config/arm/arm-mve-builtins.cc (wrap_type_in_struct): New.
(register_type_decl): New.
(register_builtin_tuple_types): Fix support for tuples.
(function_resolver::infer_tuple_type): New.
* config/arm/arm-mve-builtins.h
(function_resolver::infer_tuple_type): Declare.
(function_instance::tuple_type): Fix access to acle_vector_types.

7 months ago arm: [MVE intrinsics] add modes for tuples
Christophe Lyon [Wed, 13 Nov 2024 15:28:29 +0000 (15:28 +0000)] 
arm: [MVE intrinsics] add modes for tuples

Add V2x and V4x modes, like we do on aarch64 for Advanced SIMD
q-registers.

gcc/ChangeLog:

* config/arm/arm-modes.def (MVE_STRUCT_MODES): New.

7 months ago arm: [MVE intrinsics] remove V2DF from MVE_vecs iterator
Christophe Lyon [Wed, 16 Aug 2023 13:43:16 +0000 (13:43 +0000)] 
arm: [MVE intrinsics] remove V2DF from MVE_vecs iterator

V2DF is not supported by MVE, so remove it from the only iterator
which contains it.

gcc/ChangeLog:

* config/arm/iterators.md (MVE_vecs): Remove V2DF.

7 months ago arm: [MVE intrinsics] Fix condition for vec_extract patterns
Christophe Lyon [Wed, 16 Aug 2023 13:42:53 +0000 (13:42 +0000)] 
arm: [MVE intrinsics] Fix condition for vec_extract patterns

Remove floating-point condition from mve_vec_extract_sext_internal and
mve_vec_extract_zext_internal, since the MVE_2 iterator does not
include any FP mode.

gcc/ChangeLog:

* config/arm/mve.md (mve_vec_extract_sext_internal): Fix
condition.
(mve_vec_extract_zext_internal): Likewise.

7 months ago arm: [MVE intrinsics] remove useless call_properties implementations.
Christophe Lyon [Tue, 5 Nov 2024 22:43:04 +0000 (22:43 +0000)] 
arm: [MVE intrinsics] remove useless call_properties implementations.

vstrq_impl derives from store_truncating and vldrq_impl derives from
load_extending which both implement call_properties.

No need to re-implement them in the derived classes.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (vstrq_impl): Remove
call_properties.
(vldrq_impl): Likewise.

7 months ago arm: [MVE intrinsics] rework vldr gather_base_wb
Christophe Lyon [Thu, 31 Oct 2024 09:56:05 +0000 (09:56 +0000)] 
arm: [MVE intrinsics] rework vldr gather_base_wb

Implement vldr?q_gather_base_wb using the new MVE builtins framework.

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_ldrgbwbxu_qualifiers)
(arm_ldrgbwbxu_z_qualifiers, arm_ldrgbwbs_qualifiers)
(arm_ldrgbwbu_qualifiers, arm_ldrgbwbs_z_qualifiers)
(arm_ldrgbwbu_z_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (vldrq_gather_base_impl):
Add support for MODE_wb.
* config/arm/arm-mve-builtins-shapes.cc (struct
load_gather_base_def): Likewise.
* config/arm/arm_mve.h (vldrdq_gather_base_wb_s64): Delete.
(vldrdq_gather_base_wb_u64): Delete.
(vldrdq_gather_base_wb_z_s64): Delete.
(vldrdq_gather_base_wb_z_u64): Delete.
(vldrwq_gather_base_wb_f32): Delete.
(vldrwq_gather_base_wb_s32): Delete.
(vldrwq_gather_base_wb_u32): Delete.
(vldrwq_gather_base_wb_z_f32): Delete.
(vldrwq_gather_base_wb_z_s32): Delete.
(vldrwq_gather_base_wb_z_u32): Delete.
(__arm_vldrdq_gather_base_wb_s64): Delete.
(__arm_vldrdq_gather_base_wb_u64): Delete.
(__arm_vldrdq_gather_base_wb_z_s64): Delete.
(__arm_vldrdq_gather_base_wb_z_u64): Delete.
(__arm_vldrwq_gather_base_wb_s32): Delete.
(__arm_vldrwq_gather_base_wb_u32): Delete.
(__arm_vldrwq_gather_base_wb_z_s32): Delete.
(__arm_vldrwq_gather_base_wb_z_u32): Delete.
(__arm_vldrwq_gather_base_wb_f32): Delete.
(__arm_vldrwq_gather_base_wb_z_f32): Delete.
* config/arm/arm_mve_builtins.def (vldrwq_gather_base_nowb_z_u)
(vldrdq_gather_base_nowb_z_u, vldrwq_gather_base_nowb_u)
(vldrdq_gather_base_nowb_u, vldrwq_gather_base_nowb_z_s)
(vldrwq_gather_base_nowb_z_f, vldrdq_gather_base_nowb_z_s)
(vldrwq_gather_base_nowb_s, vldrwq_gather_base_nowb_f)
(vldrdq_gather_base_nowb_s, vldrdq_gather_base_wb_z_s)
(vldrdq_gather_base_wb_z_u, vldrdq_gather_base_wb_s)
(vldrdq_gather_base_wb_u, vldrwq_gather_base_wb_z_s)
(vldrwq_gather_base_wb_z_f, vldrwq_gather_base_wb_z_u)
(vldrwq_gather_base_wb_s, vldrwq_gather_base_wb_f)
(vldrwq_gather_base_wb_u): Delete
* config/arm/iterators.md (supf): Remove VLDRWQGBWB_S,
VLDRWQGBWB_U, VLDRDQGBWB_S, VLDRDQGBWB_U.
(VLDRWGBWBQ, VLDRDGBWBQ): Delete.
* config/arm/mve.md (mve_vldrwq_gather_base_wb_<supf>v4si): Delete.
(mve_vldrwq_gather_base_nowb_<supf>v4si): Delete.
(mve_vldrwq_gather_base_wb_<supf>v4si_insn): Delete.
(mve_vldrwq_gather_base_wb_z_<supf>v4si): Delete.
(mve_vldrwq_gather_base_nowb_z_<supf>v4si): Delete.
(mve_vldrwq_gather_base_wb_z_<supf>v4si_insn): Delete.
(mve_vldrwq_gather_base_wb_fv4sf): Delete.
(mve_vldrwq_gather_base_nowb_fv4sf): Delete.
(mve_vldrwq_gather_base_wb_fv4sf_insn): Delete.
(mve_vldrwq_gather_base_wb_z_fv4sf): Delete.
(mve_vldrwq_gather_base_nowb_z_fv4sf): Delete.
(mve_vldrwq_gather_base_wb_z_fv4sf_insn): Delete.
(mve_vldrdq_gather_base_wb_<supf>v2di): Delete.
(mve_vldrdq_gather_base_nowb_<supf>v2di): Delete.
(mve_vldrdq_gather_base_wb_<supf>v2di_insn): Delete.
(mve_vldrdq_gather_base_wb_z_<supf>v2di): Delete.
(mve_vldrdq_gather_base_nowb_z_<supf>v2di): Delete.
(mve_vldrdq_gather_base_wb_z_<supf>v2di_insn): Delete.
(@mve_vldrq_gather_base_wb_<mode>): New.
(@mve_vldrq_gather_base_wb_z_<mode>): New.
* config/arm/unspecs.md (VLDRWQGBWB_S, VLDRWQGBWB_U, VLDRWQGBWB_F)
(VLDRDQGBWB_S, VLDRDQGBWB_U): Delete
(VLDRGBWBQ, VLDRGBWBQ_Z): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
Update expected output.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
Likewise.

7 months ago arm: [MVE intrinsics] rework vldr gather_base
Christophe Lyon [Wed, 30 Oct 2024 17:32:48 +0000 (17:32 +0000)] 
arm: [MVE intrinsics] rework vldr gather_base

Implement vldr?q_gather_base using the new MVE builtins framework.

The patch updates two testcases rather than using different iterators
for predicated and non-predicated versions. According to ACLE:
vldrdq_gather_base_s64 is expected to generate VLDRD.64
vldrdq_gather_base_z_s64 is expected to generate VLDRDT.U64

Both are equally valid, however.

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_ldrgbs_qualifiers)
(arm_ldrgbu_qualifiers, arm_ldrgbs_z_qualifiers)
(arm_ldrgbu_z_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (class
vldrq_gather_base_impl): New.
(vldrdq_gather_base, vldrwq_gather_base): New.
* config/arm/arm-mve-builtins-base.def (vldrdq_gather_base)
(vldrwq_gather_base): New.
* config/arm/arm-mve-builtins-base.h: (vldrdq_gather_base)
(vldrwq_gather_base): New.
* config/arm/arm_mve.h (vldrwq_gather_base_s32): Delete.
(vldrwq_gather_base_u32): Delete.
(vldrwq_gather_base_z_u32): Delete.
(vldrwq_gather_base_z_s32): Delete.
(vldrdq_gather_base_s64): Delete.
(vldrdq_gather_base_u64): Delete.
(vldrdq_gather_base_z_s64): Delete.
(vldrdq_gather_base_z_u64): Delete.
(vldrwq_gather_base_f32): Delete.
(vldrwq_gather_base_z_f32): Delete.
(__arm_vldrwq_gather_base_s32): Delete.
(__arm_vldrwq_gather_base_u32): Delete.
(__arm_vldrwq_gather_base_z_s32): Delete.
(__arm_vldrwq_gather_base_z_u32): Delete.
(__arm_vldrdq_gather_base_s64): Delete.
(__arm_vldrdq_gather_base_u64): Delete.
(__arm_vldrdq_gather_base_z_s64): Delete.
(__arm_vldrdq_gather_base_z_u64): Delete.
(__arm_vldrwq_gather_base_f32): Delete.
(__arm_vldrwq_gather_base_z_f32): Delete.
* config/arm/arm_mve_builtins.def (vldrwq_gather_base_s)
(vldrwq_gather_base_u, vldrwq_gather_base_z_s)
(vldrwq_gather_base_z_u, vldrdq_gather_base_s)
(vldrwq_gather_base_f, vldrdq_gather_base_z_s)
(vldrwq_gather_base_z_f, vldrdq_gather_base_u)
(vldrdq_gather_base_z_u): Delete.
* config/arm/iterators.md (supf): Remove VLDRWQGB_S, VLDRWQGB_U,
VLDRDQGB_S, VLDRDQGB_U.
(VLDRWGBQ, VLDRDGBQ): Delete.
* config/arm/mve.md (mve_vldrwq_gather_base_<supf>v4si): Delete.
(mve_vldrwq_gather_base_z_<supf>v4si): Delete.
(mve_vldrdq_gather_base_<supf>v2di): Delete.
(mve_vldrdq_gather_base_z_<supf>v2di): Delete.
(mve_vldrwq_gather_base_fv4sf): Delete.
(mve_vldrwq_gather_base_z_fv4sf): Delete.
(@mve_vldrq_gather_base_<mode>): New.
(@mve_vldrq_gather_base_z_<mode>): New.
* config/arm/unspecs.md (VLDRWQGB_S, VLDRWQGB_U, VLDRDQGB_S)
(VLDRDQGB_U, VLDRWQGB_F): Delete.
(VLDRGBQ, VLDRGBQ_Z): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c: Update
expected output.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c:
Likewise.

7 months ago arm: [MVE intrinsics] add load_gather_base shape
Christophe Lyon [Wed, 30 Oct 2024 09:02:15 +0000 (09:02 +0000)] 
arm: [MVE intrinsics] add load_gather_base shape

This patch adds the load_gather_base shape description.

Unlike other load_gather shapes, this one does not support overloaded
forms.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct
load_gather_base_def): New.
* config/arm/arm-mve-builtins-shapes.h: (load_gather_base): New.

7 months ago arm: [MVE intrinsics] rework vldr gather_shifted_offset
Christophe Lyon [Tue, 29 Oct 2024 21:10:10 +0000 (21:10 +0000)] 
arm: [MVE intrinsics] rework vldr gather_shifted_offset

Implement vldr?q_gather_shifted_offset using the new MVE builtins
framework.

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_ldrgu_qualifiers)
(arm_ldrgs_qualifiers, arm_ldrgs_z_qualifiers)
(arm_ldrgu_z_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (vldrq_gather_impl): Add
support for shifted version.
(vldrdq_gather_shifted, vldrhq_gather_shifted)
(vldrwq_gather_shifted): New.
* config/arm/arm-mve-builtins-base.def (vldrdq_gather_shifted)
(vldrhq_gather_shifted, vldrwq_gather_shifted): New.
* config/arm/arm-mve-builtins-base.h (vldrdq_gather_shifted)
(vldrhq_gather_shifted, vldrwq_gather_shifted): New.
* config/arm/arm_mve.h (vldrhq_gather_shifted_offset): Delete.
(vldrhq_gather_shifted_offset_z): Delete.
(vldrdq_gather_shifted_offset): Delete.
(vldrdq_gather_shifted_offset_z): Delete.
(vldrwq_gather_shifted_offset): Delete.
(vldrwq_gather_shifted_offset_z): Delete.
(vldrhq_gather_shifted_offset_s32): Delete.
(vldrhq_gather_shifted_offset_s16): Delete.
(vldrhq_gather_shifted_offset_u32): Delete.
(vldrhq_gather_shifted_offset_u16): Delete.
(vldrhq_gather_shifted_offset_z_s32): Delete.
(vldrhq_gather_shifted_offset_z_s16): Delete.
(vldrhq_gather_shifted_offset_z_u32): Delete.
(vldrhq_gather_shifted_offset_z_u16): Delete.
(vldrdq_gather_shifted_offset_s64): Delete.
(vldrdq_gather_shifted_offset_u64): Delete.
(vldrdq_gather_shifted_offset_z_s64): Delete.
(vldrdq_gather_shifted_offset_z_u64): Delete.
(vldrhq_gather_shifted_offset_f16): Delete.
(vldrhq_gather_shifted_offset_z_f16): Delete.
(vldrwq_gather_shifted_offset_f32): Delete.
(vldrwq_gather_shifted_offset_s32): Delete.
(vldrwq_gather_shifted_offset_u32): Delete.
(vldrwq_gather_shifted_offset_z_f32): Delete.
(vldrwq_gather_shifted_offset_z_s32): Delete.
(vldrwq_gather_shifted_offset_z_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_s32): Delete.
(__arm_vldrhq_gather_shifted_offset_s16): Delete.
(__arm_vldrhq_gather_shifted_offset_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_u16): Delete.
(__arm_vldrhq_gather_shifted_offset_z_s32): Delete.
(__arm_vldrhq_gather_shifted_offset_z_s16): Delete.
(__arm_vldrhq_gather_shifted_offset_z_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_z_u16): Delete.
(__arm_vldrdq_gather_shifted_offset_s64): Delete.
(__arm_vldrdq_gather_shifted_offset_u64): Delete.
(__arm_vldrdq_gather_shifted_offset_z_s64): Delete.
(__arm_vldrdq_gather_shifted_offset_z_u64): Delete.
(__arm_vldrwq_gather_shifted_offset_s32): Delete.
(__arm_vldrwq_gather_shifted_offset_u32): Delete.
(__arm_vldrwq_gather_shifted_offset_z_s32): Delete.
(__arm_vldrwq_gather_shifted_offset_z_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_f16): Delete.
(__arm_vldrhq_gather_shifted_offset_z_f16): Delete.
(__arm_vldrwq_gather_shifted_offset_f32): Delete.
(__arm_vldrwq_gather_shifted_offset_z_f32): Delete.
(__arm_vldrhq_gather_shifted_offset): Delete.
(__arm_vldrhq_gather_shifted_offset_z): Delete.
(__arm_vldrdq_gather_shifted_offset): Delete.
(__arm_vldrdq_gather_shifted_offset_z): Delete.
(__arm_vldrwq_gather_shifted_offset): Delete.
(__arm_vldrwq_gather_shifted_offset_z): Delete.
* config/arm/arm_mve_builtins.def
(vldrhq_gather_shifted_offset_z_u, vldrhq_gather_shifted_offset_u)
(vldrhq_gather_shifted_offset_z_s, vldrhq_gather_shifted_offset_s)
(vldrdq_gather_shifted_offset_s, vldrhq_gather_shifted_offset_f)
(vldrwq_gather_shifted_offset_f, vldrwq_gather_shifted_offset_s)
(vldrdq_gather_shifted_offset_z_s)
(vldrhq_gather_shifted_offset_z_f)
(vldrwq_gather_shifted_offset_z_f)
(vldrwq_gather_shifted_offset_z_s, vldrdq_gather_shifted_offset_u)
(vldrwq_gather_shifted_offset_u, vldrdq_gather_shifted_offset_z_u)
(vldrwq_gather_shifted_offset_z_u): Delete.
* config/arm/iterators.md (supf): Remove VLDRHQGSO_S, VLDRHQGSO_U,
VLDRDQGSO_S, VLDRDQGSO_U, VLDRWQGSO_S, VLDRWQGSO_U.
(VLDRHGSOQ, VLDRDGSOQ, VLDRWGSOQ): Delete.
* config/arm/mve.md
(mve_vldrhq_gather_shifted_offset_<supf><mode>): Delete.
(mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Delete.
(mve_vldrdq_gather_shifted_offset_<supf>v2di): Delete.
(mve_vldrdq_gather_shifted_offset_z_<supf>v2di): Delete.
(mve_vldrhq_gather_shifted_offset_fv8hf): Delete.
(mve_vldrhq_gather_shifted_offset_z_fv8hf): Delete.
(mve_vldrwq_gather_shifted_offset_fv4sf): Delete.
(mve_vldrwq_gather_shifted_offset_<supf>v4si): Delete.
(mve_vldrwq_gather_shifted_offset_z_fv4sf): Delete.
(mve_vldrwq_gather_shifted_offset_z_<supf>v4si): Delete.
(@mve_vldrq_gather_shifted_offset_<mode>): New.
(@mve_vldrq_gather_shifted_offset_extend_v4si<US>): New.
(@mve_vldrq_gather_shifted_offset_z_<mode>): New.
(@mve_vldrq_gather_shifted_offset_z_extend_v4si<US>): New.
* config/arm/unspecs.md (VLDRHQGSO_S, VLDRHQGSO_U, VLDRDQGSO_S)
(VLDRDQGSO_U, VLDRHQGSO_F, VLDRWQGSO_F, VLDRWQGSO_S, VLDRWQGSO_U):
Delete.
(VLDRGSOQ, VLDRGSOQ_Z, VLDRGSOQ_EXT, VLDRGSOQ_EXT_Z): New.

7 months ago arm: [MVE intrinsics] rework vldr gather_offset
Christophe Lyon [Tue, 29 Oct 2024 10:34:23 +0000 (10:34 +0000)] 
arm: [MVE intrinsics] rework vldr gather_offset

Implement vldr?q_gather_offset using the new MVE builtins framework.

The patch introduces a new attribute iterator (MVE_u_elem) to
accommodate the fact that ACLE's expected output description uses "uNN"
for all modes, except V8HF where it expects ".f16".  Using "V_sz_elem"
would work, but would require updating several testcases.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class vldrq_gather_impl):
New.
(vldrbq_gather, vldrdq_gather, vldrhq_gather, vldrwq_gather): New.
* config/arm/arm-mve-builtins-base.def (vldrbq_gather)
(vldrdq_gather, vldrhq_gather, vldrwq_gather): New.
* config/arm/arm-mve-builtins-base.h (vldrbq_gather)
(vldrdq_gather, vldrhq_gather, vldrwq_gather): New.
* config/arm/arm_mve.h (vldrbq_gather_offset): Delete.
(vldrbq_gather_offset_z): Delete.
(vldrhq_gather_offset): Delete.
(vldrhq_gather_offset_z): Delete.
(vldrdq_gather_offset): Delete.
(vldrdq_gather_offset_z): Delete.
(vldrwq_gather_offset): Delete.
(vldrwq_gather_offset_z): Delete.
(vldrbq_gather_offset_u8): Delete.
(vldrbq_gather_offset_s8): Delete.
(vldrbq_gather_offset_u16): Delete.
(vldrbq_gather_offset_s16): Delete.
(vldrbq_gather_offset_u32): Delete.
(vldrbq_gather_offset_s32): Delete.
(vldrbq_gather_offset_z_s16): Delete.
(vldrbq_gather_offset_z_u8): Delete.
(vldrbq_gather_offset_z_s32): Delete.
(vldrbq_gather_offset_z_u16): Delete.
(vldrbq_gather_offset_z_u32): Delete.
(vldrbq_gather_offset_z_s8): Delete.
(vldrhq_gather_offset_s32): Delete.
(vldrhq_gather_offset_s16): Delete.
(vldrhq_gather_offset_u32): Delete.
(vldrhq_gather_offset_u16): Delete.
(vldrhq_gather_offset_z_s32): Delete.
(vldrhq_gather_offset_z_s16): Delete.
(vldrhq_gather_offset_z_u32): Delete.
(vldrhq_gather_offset_z_u16): Delete.
(vldrdq_gather_offset_s64): Delete.
(vldrdq_gather_offset_u64): Delete.
(vldrdq_gather_offset_z_s64): Delete.
(vldrdq_gather_offset_z_u64): Delete.
(vldrhq_gather_offset_f16): Delete.
(vldrhq_gather_offset_z_f16): Delete.
(vldrwq_gather_offset_f32): Delete.
(vldrwq_gather_offset_s32): Delete.
(vldrwq_gather_offset_u32): Delete.
(vldrwq_gather_offset_z_f32): Delete.
(vldrwq_gather_offset_z_s32): Delete.
(vldrwq_gather_offset_z_u32): Delete.
(__arm_vldrbq_gather_offset_u8): Delete.
(__arm_vldrbq_gather_offset_s8): Delete.
(__arm_vldrbq_gather_offset_u16): Delete.
(__arm_vldrbq_gather_offset_s16): Delete.
(__arm_vldrbq_gather_offset_u32): Delete.
(__arm_vldrbq_gather_offset_s32): Delete.
(__arm_vldrbq_gather_offset_z_s8): Delete.
(__arm_vldrbq_gather_offset_z_s32): Delete.
(__arm_vldrbq_gather_offset_z_s16): Delete.
(__arm_vldrbq_gather_offset_z_u8): Delete.
(__arm_vldrbq_gather_offset_z_u32): Delete.
(__arm_vldrbq_gather_offset_z_u16): Delete.
(__arm_vldrhq_gather_offset_s32): Delete.
(__arm_vldrhq_gather_offset_s16): Delete.
(__arm_vldrhq_gather_offset_u32): Delete.
(__arm_vldrhq_gather_offset_u16): Delete.
(__arm_vldrhq_gather_offset_z_s32): Delete.
(__arm_vldrhq_gather_offset_z_s16): Delete.
(__arm_vldrhq_gather_offset_z_u32): Delete.
(__arm_vldrhq_gather_offset_z_u16): Delete.
(__arm_vldrdq_gather_offset_s64): Delete.
(__arm_vldrdq_gather_offset_u64): Delete.
(__arm_vldrdq_gather_offset_z_s64): Delete.
(__arm_vldrdq_gather_offset_z_u64): Delete.
(__arm_vldrwq_gather_offset_s32): Delete.
(__arm_vldrwq_gather_offset_u32): Delete.
(__arm_vldrwq_gather_offset_z_s32): Delete.
(__arm_vldrwq_gather_offset_z_u32): Delete.
(__arm_vldrhq_gather_offset_f16): Delete.
(__arm_vldrhq_gather_offset_z_f16): Delete.
(__arm_vldrwq_gather_offset_f32): Delete.
(__arm_vldrwq_gather_offset_z_f32): Delete.
(__arm_vldrbq_gather_offset): Delete.
(__arm_vldrbq_gather_offset_z): Delete.
(__arm_vldrhq_gather_offset): Delete.
(__arm_vldrhq_gather_offset_z): Delete.
(__arm_vldrdq_gather_offset): Delete.
(__arm_vldrdq_gather_offset_z): Delete.
(__arm_vldrwq_gather_offset): Delete.
(__arm_vldrwq_gather_offset_z): Delete.
* config/arm/arm_mve_builtins.def (vldrbq_gather_offset_u)
(vldrbq_gather_offset_s, vldrbq_gather_offset_z_s)
(vldrbq_gather_offset_z_u, vldrhq_gather_offset_z_u)
(vldrhq_gather_offset_u, vldrhq_gather_offset_z_s)
(vldrhq_gather_offset_s, vldrdq_gather_offset_s)
(vldrhq_gather_offset_f, vldrwq_gather_offset_f)
(vldrwq_gather_offset_s, vldrdq_gather_offset_z_s)
(vldrhq_gather_offset_z_f, vldrwq_gather_offset_z_f)
(vldrwq_gather_offset_z_s, vldrdq_gather_offset_u)
(vldrwq_gather_offset_u, vldrdq_gather_offset_z_u)
(vldrwq_gather_offset_z_u): Delete.
* config/arm/iterators.md (MVE_u_elem): New.
(supf): Remove VLDRBQGO_S, VLDRBQGO_U, VLDRHQGO_S, VLDRHQGO_U,
VLDRDQGO_S, VLDRDQGO_U, VLDRWQGO_S, VLDRWQGO_U.
(VLDRBGOQ, VLDRHGOQ, VLDRDGOQ, VLDRWGOQ): Delete.
* config/arm/mve.md (mve_vldrbq_gather_offset_<supf><mode>):
Delete.
(mve_vldrbq_gather_offset_z_<supf><mode>): Delete.
(mve_vldrhq_gather_offset_<supf><mode>): Delete.
(mve_vldrhq_gather_offset_z_<supf><mode>): Delete.
(mve_vldrdq_gather_offset_<supf>v2di): Delete.
(mve_vldrdq_gather_offset_z_<supf>v2di): Delete.
(mve_vldrhq_gather_offset_fv8hf): Delete.
(mve_vldrhq_gather_offset_z_fv8hf): Delete.
(mve_vldrwq_gather_offset_fv4sf): Delete.
(mve_vldrwq_gather_offset_<supf>v4si): Delete.
(mve_vldrwq_gather_offset_z_fv4sf): Delete.
(mve_vldrwq_gather_offset_z_<supf>v4si): Delete.
(@mve_vldrq_gather_offset_<mode>): New.
(@mve_vldrq_gather_offset_extend_<mode><US>): New.
(@mve_vldrq_gather_offset_z_<mode>): New.
(@mve_vldrq_gather_offset_z_extend_<mode><US>): New.
* config/arm/unspecs.md (VLDRBQGO_S, VLDRBQGO_U, VLDRHQGO_S)
(VLDRHQGO_U, VLDRDQGO_S, VLDRDQGO_U, VLDRHQGO_F, VLDRWQGO_F)
(VLDRWQGO_S, VLDRWQGO_U): Delete.
(VLDRGOQ, VLDRGOQ_Z, VLDRGOQ_EXT, VLDRGOQ_EXT_Z): New.

7 months ago arm: [MVE intrinsics] add load_ext_gather_offset shape
Christophe Lyon [Thu, 17 Oct 2024 19:35:21 +0000 (19:35 +0000)] 
arm: [MVE intrinsics] add load_ext_gather_offset shape

This patch adds the load_ext_gather_offset shape description.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct load_ext_gather):
New.
(struct load_ext_gather_offset_def): New.
* config/arm/arm-mve-builtins-shapes.h (load_ext_gather_offset):
New.

7 months ago arm: [MVE intrinsics] rework vstr scatter_base_wb
Christophe Lyon [Tue, 15 Oct 2024 14:46:37 +0000 (14:46 +0000)] 
arm: [MVE intrinsics] rework vstr scatter_base_wb

Implement vstr?q_scatter_base_wb using the new MVE builtins framework.

The patch introduces a new 'b' type for signatures, which
represents the type of the 'base' argument of vstr?q_scatter_base_wb.

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_strsbwbs_qualifiers)
(arm_strsbwbu_qualifiers, arm_strsbwbs_p_qualifiers)
(arm_strsbwbu_p_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (vstrq_scatter_base_impl):
Add support for MODE_wb.
* config/arm/arm-mve-builtins-shapes.cc (parse_type): Add support
for 'b' type.
(store_scatter_base): Add support for MODE_wb.
* config/arm/arm-mve-builtins.cc
(function_resolver::require_pointer_to_type): New.
* config/arm/arm-mve-builtins.h
(function_resolver::require_pointer_to_type): New.
* config/arm/arm_mve.h (vstrdq_scatter_base_wb): Delete.
(vstrdq_scatter_base_wb_p): Delete.
(vstrwq_scatter_base_wb_p): Delete.
(vstrwq_scatter_base_wb): Delete.
(vstrdq_scatter_base_wb_p_s64): Delete.
(vstrdq_scatter_base_wb_p_u64): Delete.
(vstrdq_scatter_base_wb_s64): Delete.
(vstrdq_scatter_base_wb_u64): Delete.
(vstrwq_scatter_base_wb_p_s32): Delete.
(vstrwq_scatter_base_wb_p_f32): Delete.
(vstrwq_scatter_base_wb_p_u32): Delete.
(vstrwq_scatter_base_wb_s32): Delete.
(vstrwq_scatter_base_wb_u32): Delete.
(vstrwq_scatter_base_wb_f32): Delete.
(__arm_vstrdq_scatter_base_wb_s64): Delete.
(__arm_vstrdq_scatter_base_wb_u64): Delete.
(__arm_vstrdq_scatter_base_wb_p_s64): Delete.
(__arm_vstrdq_scatter_base_wb_p_u64): Delete.
(__arm_vstrwq_scatter_base_wb_p_s32): Delete.
(__arm_vstrwq_scatter_base_wb_p_u32): Delete.
(__arm_vstrwq_scatter_base_wb_s32): Delete.
(__arm_vstrwq_scatter_base_wb_u32): Delete.
(__arm_vstrwq_scatter_base_wb_f32): Delete.
(__arm_vstrwq_scatter_base_wb_p_f32): Delete.
(__arm_vstrdq_scatter_base_wb): Delete.
(__arm_vstrdq_scatter_base_wb_p): Delete.
(__arm_vstrwq_scatter_base_wb_p): Delete.
(__arm_vstrwq_scatter_base_wb): Delete.
* config/arm/arm_mve_builtins.def (vstrwq_scatter_base_wb_u)
(vstrdq_scatter_base_wb_u, vstrwq_scatter_base_wb_p_u)
(vstrdq_scatter_base_wb_p_u, vstrwq_scatter_base_wb_s)
(vstrwq_scatter_base_wb_f, vstrdq_scatter_base_wb_s)
(vstrwq_scatter_base_wb_p_s, vstrwq_scatter_base_wb_p_f)
(vstrdq_scatter_base_wb_p_s): Delete.
* config/arm/iterators.md (supf): Remove VSTRWQSBWB_S,
VSTRWQSBWB_U, VSTRDQSBWB_S, VSTRDQSBWB_U.
(VSTRDSBQ, VSTRWSBWBQ, VSTRDSBWBQ): Delete.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_<supf>v4si): Delete.
(mve_vstrwq_scatter_base_wb_p_<supf>v4si): Delete.
(mve_vstrwq_scatter_base_wb_fv4sf): Delete.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Delete.
(mve_vstrdq_scatter_base_wb_<supf>v2di): Delete.
(mve_vstrdq_scatter_base_wb_p_<supf>v2di): Delete.
(@mve_vstrq_scatter_base_wb_<mode>): New.
(@mve_vstrq_scatter_base_wb_p_<mode>): New.
* config/arm/unspecs.md (VSTRWQSBWB_S, VSTRWQSBWB_U, VSTRWQSBWB_F)
(VSTRDQSBWB_S, VSTRDQSBWB_U): Delete.
(VSTRSBWBQ, VSTRSBWBQ_P): New.

7 months ago arm: [MVE intrinsics] rework vstr scatter_base
Christophe Lyon [Sun, 13 Oct 2024 21:03:26 +0000 (21:03 +0000)] 
arm: [MVE intrinsics] rework vstr scatter_base

Implement vstr?q_scatter_base using the new MVE builtins framework.

We need to introduce a new iterator (MVE_4) to support the set needed
by vstr?q_scatter_base (V4SI V4SF V2DI).

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_strsbs_qualifiers)
(arm_strsbu_qualifiers, arm_strsbs_p_qualifiers)
(arm_strsbu_p_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (class
vstrq_scatter_base_impl): New.
(vstrwq_scatter_base, vstrdq_scatter_base): New.
* config/arm/arm-mve-builtins-base.def (vstrwq_scatter_base)
(vstrdq_scatter_base): New.
* config/arm/arm-mve-builtins-base.h (vstrwq_scatter_base)
(vstrdq_scatter_base): New.
* config/arm/arm_mve.h (vstrwq_scatter_base): Delete.
(vstrwq_scatter_base_p): Delete.
(vstrdq_scatter_base_p): Delete.
(vstrdq_scatter_base): Delete.
(vstrwq_scatter_base_s32): Delete.
(vstrwq_scatter_base_u32): Delete.
(vstrwq_scatter_base_p_s32): Delete.
(vstrwq_scatter_base_p_u32): Delete.
(vstrdq_scatter_base_p_s64): Delete.
(vstrdq_scatter_base_p_u64): Delete.
(vstrdq_scatter_base_s64): Delete.
(vstrdq_scatter_base_u64): Delete.
(vstrwq_scatter_base_f32): Delete.
(vstrwq_scatter_base_p_f32): Delete.
(__arm_vstrwq_scatter_base_s32): Delete.
(__arm_vstrwq_scatter_base_u32): Delete.
(__arm_vstrwq_scatter_base_p_s32): Delete.
(__arm_vstrwq_scatter_base_p_u32): Delete.
(__arm_vstrdq_scatter_base_p_s64): Delete.
(__arm_vstrdq_scatter_base_p_u64): Delete.
(__arm_vstrdq_scatter_base_s64): Delete.
(__arm_vstrdq_scatter_base_u64): Delete.
(__arm_vstrwq_scatter_base_f32): Delete.
(__arm_vstrwq_scatter_base_p_f32): Delete.
(__arm_vstrwq_scatter_base): Delete.
(__arm_vstrwq_scatter_base_p): Delete.
(__arm_vstrdq_scatter_base_p): Delete.
(__arm_vstrdq_scatter_base): Delete.
* config/arm/arm_mve_builtins.def (vstrwq_scatter_base_s)
(vstrwq_scatter_base_u, vstrwq_scatter_base_p_s)
(vstrwq_scatter_base_p_u, vstrdq_scatter_base_s)
(vstrwq_scatter_base_f, vstrdq_scatter_base_p_s)
(vstrwq_scatter_base_p_f, vstrdq_scatter_base_u)
(vstrdq_scatter_base_p_u): Delete.
* config/arm/iterators.md (MVE_4): New.
(supf): Remove VSTRWQSB_S, VSTRWQSB_U.
(VSTRWSBQ): Delete.
* config/arm/mve.md (mve_vstrwq_scatter_base_<supf>v4si): Delete.
(mve_vstrwq_scatter_base_p_<supf>v4si): Delete.
(mve_vstrdq_scatter_base_p_<supf>v2di): Delete.
(mve_vstrdq_scatter_base_<supf>v2di): Delete.
(mve_vstrwq_scatter_base_fv4sf): Delete.
(mve_vstrwq_scatter_base_p_fv4sf): Delete.
(@mve_vstrq_scatter_base_<mode>): New.
(@mve_vstrq_scatter_base_p_<mode>): New.
* config/arm/unspecs.md (VSTRWQSB_S, VSTRWQSB_U, VSTRWQSB_F):
Delete.
(VSTRSBQ, VSTRSBQ_P): New.

7 months agoarm: [MVE intrinsics] Add store_scatter_base shape
Christophe Lyon [Sun, 13 Oct 2024 21:03:12 +0000 (21:03 +0000)] 
arm: [MVE intrinsics] Add store_scatter_base shape

This patch adds the store_scatter_base shape description.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (store_scatter_base): New.
* config/arm/arm-mve-builtins-shapes.h (store_scatter_base): New.

7 months agoarm: [MVE intrinsics] Check immediate is a multiple in a range
Christophe Lyon [Mon, 4 Nov 2024 21:41:47 +0000 (21:41 +0000)] 
arm: [MVE intrinsics] Check immediate is a multiple in a range

This patch adds support to check that an immediate is a multiple of a
given value in a given range.

This will be used for instance by scatter_base to check that offset is
in +/-4*[0..127].

Unlike require_immediate_range, require_immediate_range_multiple
accepts signed range bounds to handle the above case.
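
For illustration only, the condition require_immediate_range_multiple is
meant to enforce for the scatter_base offset (a hedged sketch, not the
actual implementation) is:

  /* Offset must be a multiple of 4 in the range -4*127 .. 4*127.  */
  static int
  scatter_base_offset_ok (long off)
  {
    return off % 4 == 0 && off >= -4 * 127 && off <= 4 * 127;
  }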

gcc/ChangeLog:

* config/arm/arm-mve-builtins.cc (report_out_of_range_multiple):
New.
(function_checker::require_signed_immediate): New.
(function_checker::require_immediate_range_multiple): New.
* config/arm/arm-mve-builtins.h
(function_checker::require_immediate_range_multiple): New.
(function_checker::require_signed_immediate): New.

7 months agoarm: [MVE intrinsics] rework vstr_scatter_shifted_offset
Christophe Lyon [Fri, 11 Oct 2024 15:37:03 +0000 (15:37 +0000)] 
arm: [MVE intrinsics] rework vstr_scatter_shifted_offset

Implement vstr?q_scatter_shifted_offset intrinsics using the MVE
builtins framework.

We use the same approach as the previous patch, and we now have four
sets of patterns:
- vector scatter stores with shifted offset (non-truncating)
- predicated vector scatter stores with shifted offset (non-truncating)
- truncating vector scatter stores with shifted offset
- predicated truncating vector scatter stores with shifted offset

Note that the truncating patterns do not use an iterator since there
is only one such variant: V4SI to V4HI.
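
As an illustration of that truncating variant (a scalar sketch of the
semantics, not the generated code), storing V4SI lanes as V4HI elements
with a shifted offset behaves like:

  #include <stdint.h>

  /* Each 32-bit lane is truncated to 16 bits and stored at
     base + (off[i] << 1), i.e. off[i] is a halfword element index.  */
  static void
  scatter_shifted_trunc_sketch (int16_t *base, const uint32_t off[4],
                                const int32_t val[4])
  {
    for (int i = 0; i < 4; i++)
      base[off[i]] = (int16_t) val[i];
  }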

We need to introduce new iterators:
- MVE_VLD_ST_scatter_shifted, same as MVE_VLD_ST_scatter without V16QI
- MVE_scatter_shift to map the mode to the shift amount

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_strss_qualifiers)
(arm_strsu_qualifiers, arm_strsu_p_qualifiers)
(arm_strss_p_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl):
Add support for shifted version.
(vstrdq_scatter_shifted, vstrhq_scatter_shifted)
(vstrwq_scatter_shifted): New.
* config/arm/arm-mve-builtins-base.def (vstrhq_scatter_shifted)
(vstrwq_scatter_shifted, vstrdq_scatter_shifted): New.
* config/arm/arm-mve-builtins-base.h (vstrhq_scatter_shifted)
(vstrwq_scatter_shifted, vstrdq_scatter_shifted): New.
* config/arm/arm_mve.h (vstrhq_scatter_shifted_offset): Delete.
(vstrhq_scatter_shifted_offset_p): Delete.
(vstrdq_scatter_shifted_offset_p): Delete.
(vstrdq_scatter_shifted_offset): Delete.
(vstrwq_scatter_shifted_offset_p): Delete.
(vstrwq_scatter_shifted_offset): Delete.
(vstrhq_scatter_shifted_offset_s32): Delete.
(vstrhq_scatter_shifted_offset_s16): Delete.
(vstrhq_scatter_shifted_offset_u32): Delete.
(vstrhq_scatter_shifted_offset_u16): Delete.
(vstrhq_scatter_shifted_offset_p_s32): Delete.
(vstrhq_scatter_shifted_offset_p_s16): Delete.
(vstrhq_scatter_shifted_offset_p_u32): Delete.
(vstrhq_scatter_shifted_offset_p_u16): Delete.
(vstrdq_scatter_shifted_offset_p_s64): Delete.
(vstrdq_scatter_shifted_offset_p_u64): Delete.
(vstrdq_scatter_shifted_offset_s64): Delete.
(vstrdq_scatter_shifted_offset_u64): Delete.
(vstrhq_scatter_shifted_offset_f16): Delete.
(vstrhq_scatter_shifted_offset_p_f16): Delete.
(vstrwq_scatter_shifted_offset_f32): Delete.
(vstrwq_scatter_shifted_offset_p_f32): Delete.
(vstrwq_scatter_shifted_offset_p_s32): Delete.
(vstrwq_scatter_shifted_offset_p_u32): Delete.
(vstrwq_scatter_shifted_offset_s32): Delete.
(vstrwq_scatter_shifted_offset_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_s32): Delete.
(__arm_vstrhq_scatter_shifted_offset_s16): Delete.
(__arm_vstrhq_scatter_shifted_offset_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_u16): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_s32): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_s16): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_u16): Delete.
(__arm_vstrdq_scatter_shifted_offset_p_s64): Delete.
(__arm_vstrdq_scatter_shifted_offset_p_u64): Delete.
(__arm_vstrdq_scatter_shifted_offset_s64): Delete.
(__arm_vstrdq_scatter_shifted_offset_u64): Delete.
(__arm_vstrwq_scatter_shifted_offset_p_s32): Delete.
(__arm_vstrwq_scatter_shifted_offset_p_u32): Delete.
(__arm_vstrwq_scatter_shifted_offset_s32): Delete.
(__arm_vstrwq_scatter_shifted_offset_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_f16): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_f16): Delete.
(__arm_vstrwq_scatter_shifted_offset_f32): Delete.
(__arm_vstrwq_scatter_shifted_offset_p_f32): Delete.
(__arm_vstrhq_scatter_shifted_offset): Delete.
(__arm_vstrhq_scatter_shifted_offset_p): Delete.
(__arm_vstrdq_scatter_shifted_offset_p): Delete.
(__arm_vstrdq_scatter_shifted_offset): Delete.
(__arm_vstrwq_scatter_shifted_offset_p): Delete.
(__arm_vstrwq_scatter_shifted_offset): Delete.
* config/arm/arm_mve_builtins.def
(vstrhq_scatter_shifted_offset_p_u)
(vstrhq_scatter_shifted_offset_u)
(vstrhq_scatter_shifted_offset_p_s)
(vstrhq_scatter_shifted_offset_s, vstrdq_scatter_shifted_offset_s)
(vstrhq_scatter_shifted_offset_f, vstrwq_scatter_shifted_offset_f)
(vstrwq_scatter_shifted_offset_s)
(vstrdq_scatter_shifted_offset_p_s)
(vstrhq_scatter_shifted_offset_p_f)
(vstrwq_scatter_shifted_offset_p_f)
(vstrwq_scatter_shifted_offset_p_s)
(vstrdq_scatter_shifted_offset_u, vstrwq_scatter_shifted_offset_u)
(vstrdq_scatter_shifted_offset_p_u)
(vstrwq_scatter_shifted_offset_p_u): Delete.
* config/arm/iterators.md (MVE_VLD_ST_scatter_shifted): New.
(MVE_scatter_shift): New.
(supf): Remove VSTRHQSSO_S, VSTRHQSSO_U, VSTRDQSSO_S, VSTRDQSSO_U,
VSTRWQSSO_U, VSTRWQSSO_S.
(VSTRHSSOQ, VSTRDSSOQ, VSTRWSSOQ): Delete.
* config/arm/mve.md (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Delete.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Delete.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>): Delete.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Delete.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Delete.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Delete.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di): Delete.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Delete.
(mve_vstrhq_scatter_shifted_offset_fv8hf): Delete.
(mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Delete.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf): Delete.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Delete.
(mve_vstrwq_scatter_shifted_offset_fv4sf): Delete.
(mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Delete.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf): Delete.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Delete.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Delete.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Delete.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si): Delete.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Delete.
(@mve_vstrq_scatter_shifted_offset_<mode>): New.
(@mve_vstrq_scatter_shifted_offset_p_<mode>): New.
(mve_vstrq_truncate_scatter_shifted_offset_v4si): New.
(mve_vstrq_truncate_scatter_shifted_offset_p_v4si): New.
* config/arm/unspecs.md (VSTRDQSSO_S, VSTRDQSSO_U, VSTRWQSSO_S)
(VSTRWQSSO_U, VSTRHQSSO_F, VSTRWQSSO_F, VSTRHQSSO_S, VSTRHQSSO_U):
Delete.
(VSTRSSOQ, VSTRSSOQ_P, VSTRSSOQ_TRUNC, VSTRSSOQ_TRUNC_P): New.

7 months agoarm: [MVE intrinsics] rework vstr?q_scatter_offset
Christophe Lyon [Tue, 29 Oct 2024 10:31:53 +0000 (10:31 +0000)] 
arm: [MVE intrinsics] rework vstr?q_scatter_offset

This patch implements vstr?q_scatter_offset using the new MVE builtins
framework.

It uses a similar approach to a previous patch which grouped
truncating and non-truncating stores in two sets of patterns, rather
than having groups of patterns depending on the destination size.

We need to add the 'integer_64' types of suffixes in order to support
vstrdq_scatter_offset.

The patch introduces the MVE_VLD_ST_scatter iterator, similar to
MVE_VLD_ST but which also includes V2DI (again, for
vstrdq_scatter_offset).

The new MVE_scatter_offset mode attribute is used to map the
destination type to the offset type (both are usually equal, except
when the destination is floating-point).

We end up with four sets of patterns:
- vector scatter stores with offset (non-truncating)
- predicated vector scatter stores with offset (non-truncating)
- truncating vector scatter stores with offset
- predicated truncating vector scatter stores with offset

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl):
New.
(vstrbq_scatter, vstrhq_scatter, vstrwq_scatter, vstrdq_scatter):
New.
* config/arm/arm-mve-builtins-base.def (vstrbq_scatter)
(vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New.
* config/arm/arm-mve-builtins-base.h (vstrbq_scatter)
(vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New.
* config/arm/arm-mve-builtins.cc (integer_64): New.
* config/arm/arm_mve.h (vstrbq_scatter_offset): Delete.
(vstrbq_scatter_offset_p): Delete.
(vstrhq_scatter_offset): Delete.
(vstrhq_scatter_offset_p): Delete.
(vstrdq_scatter_offset_p): Delete.
(vstrdq_scatter_offset): Delete.
(vstrwq_scatter_offset_p): Delete.
(vstrwq_scatter_offset): Delete.
(vstrbq_scatter_offset_s8): Delete.
(vstrbq_scatter_offset_u8): Delete.
(vstrbq_scatter_offset_u16): Delete.
(vstrbq_scatter_offset_s16): Delete.
(vstrbq_scatter_offset_u32): Delete.
(vstrbq_scatter_offset_s32): Delete.
(vstrbq_scatter_offset_p_s8): Delete.
(vstrbq_scatter_offset_p_s32): Delete.
(vstrbq_scatter_offset_p_s16): Delete.
(vstrbq_scatter_offset_p_u8): Delete.
(vstrbq_scatter_offset_p_u32): Delete.
(vstrbq_scatter_offset_p_u16): Delete.
(vstrhq_scatter_offset_s32): Delete.
(vstrhq_scatter_offset_s16): Delete.
(vstrhq_scatter_offset_u32): Delete.
(vstrhq_scatter_offset_u16): Delete.
(vstrhq_scatter_offset_p_s32): Delete.
(vstrhq_scatter_offset_p_s16): Delete.
(vstrhq_scatter_offset_p_u32): Delete.
(vstrhq_scatter_offset_p_u16): Delete.
(vstrdq_scatter_offset_p_s64): Delete.
(vstrdq_scatter_offset_p_u64): Delete.
(vstrdq_scatter_offset_s64): Delete.
(vstrdq_scatter_offset_u64): Delete.
(vstrhq_scatter_offset_f16): Delete.
(vstrhq_scatter_offset_p_f16): Delete.
(vstrwq_scatter_offset_f32): Delete.
(vstrwq_scatter_offset_p_f32): Delete.
(vstrwq_scatter_offset_p_s32): Delete.
(vstrwq_scatter_offset_p_u32): Delete.
(vstrwq_scatter_offset_s32): Delete.
(vstrwq_scatter_offset_u32): Delete.
(__arm_vstrbq_scatter_offset_s8): Delete.
(__arm_vstrbq_scatter_offset_s32): Delete.
(__arm_vstrbq_scatter_offset_s16): Delete.
(__arm_vstrbq_scatter_offset_u8): Delete.
(__arm_vstrbq_scatter_offset_u32): Delete.
(__arm_vstrbq_scatter_offset_u16): Delete.
(__arm_vstrbq_scatter_offset_p_s8): Delete.
(__arm_vstrbq_scatter_offset_p_s32): Delete.
(__arm_vstrbq_scatter_offset_p_s16): Delete.
(__arm_vstrbq_scatter_offset_p_u8): Delete.
(__arm_vstrbq_scatter_offset_p_u32): Delete.
(__arm_vstrbq_scatter_offset_p_u16): Delete.
(__arm_vstrhq_scatter_offset_s32): Delete.
(__arm_vstrhq_scatter_offset_s16): Delete.
(__arm_vstrhq_scatter_offset_u32): Delete.
(__arm_vstrhq_scatter_offset_u16): Delete.
(__arm_vstrhq_scatter_offset_p_s32): Delete.
(__arm_vstrhq_scatter_offset_p_s16): Delete.
(__arm_vstrhq_scatter_offset_p_u32): Delete.
(__arm_vstrhq_scatter_offset_p_u16): Delete.
(__arm_vstrdq_scatter_offset_p_s64): Delete.
(__arm_vstrdq_scatter_offset_p_u64): Delete.
(__arm_vstrdq_scatter_offset_s64): Delete.
(__arm_vstrdq_scatter_offset_u64): Delete.
(__arm_vstrwq_scatter_offset_p_s32): Delete.
(__arm_vstrwq_scatter_offset_p_u32): Delete.
(__arm_vstrwq_scatter_offset_s32): Delete.
(__arm_vstrwq_scatter_offset_u32): Delete.
(__arm_vstrhq_scatter_offset_f16): Delete.
(__arm_vstrhq_scatter_offset_p_f16): Delete.
(__arm_vstrwq_scatter_offset_f32): Delete.
(__arm_vstrwq_scatter_offset_p_f32): Delete.
(__arm_vstrbq_scatter_offset): Delete.
(__arm_vstrbq_scatter_offset_p): Delete.
(__arm_vstrhq_scatter_offset): Delete.
(__arm_vstrhq_scatter_offset_p): Delete.
(__arm_vstrdq_scatter_offset_p): Delete.
(__arm_vstrdq_scatter_offset): Delete.
(__arm_vstrwq_scatter_offset_p): Delete.
(__arm_vstrwq_scatter_offset): Delete.
* config/arm/arm_mve_builtins.def (vstrbq_scatter_offset_s)
(vstrbq_scatter_offset_u, vstrbq_scatter_offset_p_s)
(vstrbq_scatter_offset_p_u, vstrhq_scatter_offset_p_u)
(vstrhq_scatter_offset_u, vstrhq_scatter_offset_p_s)
(vstrhq_scatter_offset_s, vstrdq_scatter_offset_s)
(vstrhq_scatter_offset_f, vstrwq_scatter_offset_f)
(vstrwq_scatter_offset_s, vstrdq_scatter_offset_p_s)
(vstrhq_scatter_offset_p_f, vstrwq_scatter_offset_p_f)
(vstrwq_scatter_offset_p_s, vstrdq_scatter_offset_u)
(vstrwq_scatter_offset_u, vstrdq_scatter_offset_p_u)
(vstrwq_scatter_offset_p_u): Delete.
* config/arm/iterators.md (MVE_VLD_ST_scatter): New.
(MVE_scatter_offset): New.
(MVE_elem_ch): Add entry for V2DI.
(supf): Remove VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S, VSTRHQSO_U,
VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_U, VSTRWQSO_S.
(VSTRBSOQ, VSTRHSOQ, VSTRDSOQ, VSTRWSOQ): Delete.
* config/arm/mve.md (mve_vstrbq_scatter_offset_<supf><mode>):
Delete.
(mve_vstrbq_scatter_offset_<supf><mode>_insn): Delete.
(mve_vstrbq_scatter_offset_p_<supf><mode>): Delete.
(mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Delete.
(mve_vstrhq_scatter_offset_p_<supf><mode>): Delete.
(mve_vstrhq_scatter_offset_p_<supf><mode>_insn): Delete.
(mve_vstrhq_scatter_offset_<supf><mode>): Delete.
(mve_vstrhq_scatter_offset_<supf><mode>_insn): Delete.
(mve_vstrdq_scatter_offset_p_<supf>v2di): Delete.
(mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Delete.
(mve_vstrdq_scatter_offset_<supf>v2di): Delete.
(mve_vstrdq_scatter_offset_<supf>v2di_insn): Delete.
(mve_vstrhq_scatter_offset_fv8hf): Delete.
(mve_vstrhq_scatter_offset_fv8hf_insn): Delete.
(mve_vstrhq_scatter_offset_p_fv8hf): Delete.
(mve_vstrhq_scatter_offset_p_fv8hf_insn): Delete.
(mve_vstrwq_scatter_offset_fv4sf): Delete.
(mve_vstrwq_scatter_offset_fv4sf_insn): Delete.
(mve_vstrwq_scatter_offset_p_fv4sf): Delete.
(mve_vstrwq_scatter_offset_p_fv4sf_insn): Delete.
(mve_vstrwq_scatter_offset_p_<supf>v4si): Delete.
(mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Delete.
(mve_vstrwq_scatter_offset_<supf>v4si): Delete.
(mve_vstrwq_scatter_offset_<supf>v4si_insn): Delete.
(@mve_vstrq_scatter_offset_<mode>): New.
(@mve_vstrq_scatter_offset_p_<mode>): New.
(@mve_vstrq_truncate_scatter_offset_<mode>): New.
(@mve_vstrq_truncate_scatter_offset_p_<mode>): New.
* config/arm/unspecs.md (VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S)
(VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_S, VSTRWQSO_U, VSTRHQSO_F)
(VSTRWQSO_F, VSTRHQSO_U): Delete.
(VSTRQSO, VSTRQSO_P, VSTRQSO_TRUNC, VSTRQSO_TRUNC_P): New.

7 months agoarm: [MVE intrinsics] add store_scatter_offset shape
Christophe Lyon [Thu, 10 Oct 2024 10:48:42 +0000 (10:48 +0000)] 
arm: [MVE intrinsics] add store_scatter_offset shape

This patch adds the store_scatter_offset shape and uses a new helper
class (store_scatter), which will also be used by later patches.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct store_scatter): New.
(struct store_scatter_offset_def): New.
* config/arm/arm-mve-builtins-shapes.h (store_scatter_offset): New.

7 months agoarm: [MVE intrinsics] add mode_after_pred helper in function_shape
Christophe Lyon [Thu, 10 Oct 2024 16:35:23 +0000 (16:35 +0000)] 
arm: [MVE intrinsics] add mode_after_pred helper in function_shape

This new helper returns true if the mode suffix goes after the
predicate suffix.  This is true in most cases, so the base
implementations in nonoverloaded_base and overloaded_base return true.
For instance: vaddq_m_n_s32.

This will be useful in later patches to implement
vstr?q_scatter_offset_p (_p appears after _offset).

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct
nonoverloaded_base): Implement mode_after_pred.
(struct overloaded_base): Likewise.
* config/arm/arm-mve-builtins.cc (function_builder::get_name):
Call mode_after_pred as needed.
* config/arm/arm-mve-builtins.h (function_shape): Add
mode_after_pred.

7 months agoC++: reject OpenMP directives in constexpr functions
Tobias Burnus [Fri, 13 Dec 2024 13:27:08 +0000 (14:27 +0100)] 
C++: reject OpenMP directives in constexpr functions

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_construct, cp_parser_pragma): Reject
OpenMP expressions in constexpr functions.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/pr108607.C: Update dg-error.
* g++.dg/gomp/pr79664.C: Update dg-error.
* g++.dg/gomp/omp-constexpr.C: New test.

7 months agogenrecog: Split into separate partitions [PR111600].
Robin Dapp [Tue, 26 Nov 2024 13:44:17 +0000 (14:44 +0100)] 
genrecog: Split into separate partitions [PR111600].

Hi,

this patch makes genrecog split its output into separate files (10 by
default) in the same vein as genemit does.  The changes are mostly
mechanical again, changing printfs and puts to fprintf.
As insn-recog.cc relies on being able to call other recog functions, a
header insn-recog.h is introduced that pre-declares all of those.

For simplicity the number of files is determined by (re-using)
--with-insnemit-partitions.  Naming suggestions welcome :)

Bootstrapped and regtested on x86 and power10, regtested on riscv.
aarch64 bootstrap is currently blocked because of the
"maybe uninitialized" issue discussed on IRC.

Regards
 Robin

PR target/111600

gcc/ChangeLog:

* Makefile.in:  Add insn-recog split.
* configure: Regenerate.
* configure.ac: Document that the number of insnemit partitions is
used for insn-recog as well.
* genconditions.cc (write_one_condition): Use fprintf.
* genpreds.cc (write_predicate_expr): Ditto.
(write_init_reg_class_start_regs): Ditto.
* genrecog.cc (write_header): Add header file to includes.
(printf_indent): Use fprintf.
(change_state): Ditto.
(print_code): Ditto.
(print_host_wide_int): Ditto.
(print_parameter_value): Ditto.
(print_test_rtx): Ditto.
(print_nonbool_test): Ditto.
(print_label_value): Ditto.
(print_test): Ditto.
(print_decision): Ditto.
(print_state): Ditto.
(print_subroutine_call): Ditto.
(print_acceptance): Ditto.
(print_subroutine_start): Ditto.
(print_pattern): Ditto.
(print_subroutine): Ditto.
(print_subroutine_group): Ditto.
(handle_arg): Add -O and -H for output and header file handling.
(main): Use callback.
* gentarget-def.cc (def_target_insn): Use fprintf.
* read-md.cc (md_reader::print_c_condition): Ditto.
* read-md.h (class md_reader): Ditto.

7 months agolibstdc++: Fix uninitialized data in std::basic_spanbuf::seekoff
Jonathan Wakely [Fri, 13 Dec 2024 10:54:29 +0000 (10:54 +0000)] 
libstdc++: Fix uninitialized data in std::basic_spanbuf::seekoff

I noticed a -Wmaybe-uninitialized warning for this function, which turns
out to be correct. If the caller passes a valid std::ios_base::seekdir
value then there's no problem, but if they pass std::seekdir(999) then
we don't initialize the __base variable before adding it to __off.

Rather than initialize it to an arbitrary value, we should return an
error.

Also add [[unlikely]] attributes to the paths that return an error.

libstdc++-v3/ChangeLog:

* include/std/spanstream (basic_spanbuf::seekoff): Return an
error for invalid seekdir values.

7 months agolibstdc++: Swap expressions in noexcept-specifier of ranges::not_equal_to
Jonathan Wakely [Thu, 12 Dec 2024 23:24:39 +0000 (23:24 +0000)] 
libstdc++: Swap expressions in noexcept-specifier of ranges::not_equal_to

Although this should never make a difference for sensible code, we
should really make the expression in the noexcept-specifier match the
expression in the function body.

libstdc++-v3/ChangeLog:

* include/bits/ranges_cmp.h (not_equal_to): Make order of
expressions in noexcept-specifier match the body.
* testsuite/20_util/function_objects/range.cmp/not_equal_to.cc:
Check noexcept.

7 months agolibstdc++: Fix -Wsign-compare warning in <regex>
Jonathan Wakely [Fri, 13 Dec 2024 11:59:08 +0000 (11:59 +0000)] 
libstdc++: Fix -Wsign-compare warning in <regex>

libstdc++-v3/ChangeLog:

* include/bits/regex.tcc: Fix -Wsign-compare warning.

7 months agolibstdc++: Fix -Wreorder warning in <pstl/parallel_backend_tbb.h>
Jonathan Wakely [Fri, 13 Dec 2024 11:00:19 +0000 (11:00 +0000)] 
libstdc++: Fix -Wreorder warning in <pstl/parallel_backend_tbb.h>

libstdc++-v3/ChangeLog:

* include/pstl/parallel_backend_tbb.h (__merge_func): Fix order
of mem-initializers.

7 months agolibstdc++: Fix -Wmisleading-indentation warning in testcase
Jonathan Wakely [Thu, 12 Dec 2024 23:50:13 +0000 (23:50 +0000)] 
libstdc++: Fix -Wmisleading-indentation warning in testcase

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/random/random_device/entropy.cc: Fix
indentation to avoid -Wmisleading-indentation warning.

7 months agoAArch64: Set L1 data cache size according to size on CPUs
Tamar Christina [Fri, 13 Dec 2024 11:20:18 +0000 (11:20 +0000)] 
AArch64: Set L1 data cache size according to size on CPUs

This sets the L1 data cache size for some cores based on their size in their
Technical Reference Manuals.

Today the port minimum is 256 bytes, as explained in commit
g:9a99559a478111f7fbeec29bd78344df7651c707; however, like Neoverse V2, most cores
actually define the L1 cache size as 64 bytes.  The generic Armv9-A model was
already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this
change follows suit for a few other cores based on their TRMs.

This results in less memory pressure when running on large core count machines.

gcc/ChangeLog:

* config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b.
* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
* config/aarch64/tuning_models/neoversen1.h: Likewise.
* config/aarch64/tuning_models/neoversen2.h: Likewise.
* config/aarch64/tuning_models/neoversen3.h: Likewise.
* config/aarch64/tuning_models/neoversev1.h: Likewise.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
(neoversev2_prefetch_tune): Removed.
* config/aarch64/tuning_models/neoversev3.h: Likewise.
* config/aarch64/tuning_models/neoversev3ae.h: Likewise.

7 months agoAArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Tamar Christina [Fri, 13 Dec 2024 11:17:55 +0000 (11:17 +0000)] 
AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

GCC 15 added two new fusions CMP+CSEL and CMP+CSET.

This patch enables them for cores that support them, based on their Software
Optimization Guides, and generically on Armv9-A.  Even if a core does not
support them, there's no negative performance impact.

gcc/ChangeLog:

* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE):
New.
* config/aarch64/tuning_models/neoverse512tvb.h: Use it.
* config/aarch64/tuning_models/neoversen2.h: Use it.
* config/aarch64/tuning_models/neoversen3.h: Use it.
* config/aarch64/tuning_models/neoversev1.h: Use it.
* config/aarch64/tuning_models/neoversev2.h: Use it.
* config/aarch64/tuning_models/neoversev3.h: Use it.
* config/aarch64/tuning_models/neoversev3ae.h: Use it.
* config/aarch64/tuning_models/cortexx925.h: Add fusions.
* config/aarch64/tuning_models/generic_armv9_a.h: Add fusions.

7 months agoi386: Add vec_fm{addsub,subadd}v2sf4 patterns [PR116979]
Jakub Jelinek [Fri, 13 Dec 2024 09:31:04 +0000 (10:31 +0100)] 
i386: Add vec_fm{addsub,subadd}v2sf4 patterns [PR116979]

As mentioned in the PR, the addition of the vec_addsubv2sf3 expander caused
the testcase to be vectorized and to no longer use fma.
The following patch adds new expanders so that it can be vectorized
again with the alternating add/sub fma instructions.

There is a bug on the SLP cost computation side which causes it
not to count some scalar multiplication costs, but I think the patch
is desirable anyway before that is fixed, and the testcase for now just
uses -fvect-cost-model=unlimited.
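
For illustration (this is not the PR testcase), a V2SF computation of the
following shape is the kind of code that can map to the new
vec_fmaddsubv2sf4 pattern when FMA contraction is enabled:

  /* Even lane: multiply-subtract; odd lane: multiply-add.  */
  void
  fmaddsub_v2sf (float *restrict r, const float *a, const float *b,
                 const float *c)
  {
    r[0] = a[0] * b[0] - c[0];
    r[1] = a[1] * b[1] + c[1];
  }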

2024-12-13  Jakub Jelinek  <jakub@redhat.com>

PR target/116979
* config/i386/mmx.md (vec_fmaddsubv2sf4, vec_fmsubaddv2sf4): New
define_expand patterns.

* gcc.target/i386/pr116979.c: New test.

7 months agoRISC-V: Improve slide1up pattern.
Robin Dapp [Sat, 16 Nov 2024 14:13:09 +0000 (15:13 +0100)] 
RISC-V: Improve slide1up pattern.

This patch adds a second variant to implement the extract/slide1up
pattern.  In order to do a permutation like
<3, 4, 5, 6> from vectors <0, 1, 2, 3> and <4, 5, 6, 7>
we currently extract <3> from the first vector and re-insert it into the
second vector.  Unless register-file crossing latency is essentially
zero it should be preferable to first slide the second vector up by
one, then slide down the first vector by (nunits - 1).
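
In scalar terms (a sketch for illustration, with nunits = 4), the shuffle
being discussed is:

  #define N 4

  /* r = { a[N-1], b[0], ..., b[N-2] }: slide b up by one and slide a
     down by N-1, instead of extract + insert.  */
  static void
  off_by_one_shuffle (int *restrict r, const int *a, const int *b)
  {
    r[0] = a[N - 1];
    for (int i = 1; i < N; i++)
      r[i] = b[i - 1];
  }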

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_register_move_cost):
Export.
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
Rename...
(shuffle_off_by_one_patterns): ... to this and add slideup/slidedown
variant.
(expand_vec_perm_const_1): Call renamed function.
* config/riscv/riscv.cc (riscv_secondary_memory_needed): Remove
static.
(riscv_register_move_cost): Add VR<->GR/FR handling.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112599-2.c: Adjust test
expectation.

7 months agoRISC-V: Add even/odd vec_perm_const pattern.
Robin Dapp [Thu, 17 Oct 2024 09:33:19 +0000 (11:33 +0200)] 
RISC-V: Add even/odd vec_perm_const pattern.

This adds handling for even/odd patterns.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_even_odd_patterns): New
function.
(expand_vec_perm_const_1): Use new function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-evenodd-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-evenodd.c: New test.

7 months agoRISC-V: Add interleave pattern.
Robin Dapp [Wed, 16 Oct 2024 20:39:08 +0000 (22:39 +0200)] 
RISC-V: Add interleave pattern.

This patch adds efficient handling of interleaving patterns like
[0 4 1 5] to vec_perm_const.  It is implemented by a slideup and a
gather.
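
In scalar terms (illustration only, assuming 4-element vectors), the
[0 4 1 5] permutation interleaves the low halves of the two inputs:

  /* r = { a[0], b[0], a[1], b[1] }.  */
  static void
  interleave_low (int *restrict r, const int *a, const int *b)
  {
    for (int i = 0; i < 2; i++)
      {
        r[2 * i]     = a[i];
        r[2 * i + 1] = b[i];
      }
  }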

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_interleave_patterns): New
function.
(expand_vec_perm_const_1): Use new function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-interleave-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-interleave.c: New test.

7 months agoRISC-V: Add slide to perm_const strategies.
Robin Dapp [Mon, 16 Sep 2024 20:22:14 +0000 (22:22 +0200)] 
RISC-V: Add slide to perm_const strategies.

This patch adds a shuffle_slide_patterns to expand_vec_perm_const.
It recognizes permutations like

  {0, 1, 4, 5}
or
  {2, 3, 6, 7}

which can be constructed by a slideup or slidedown of one of the vectors
into the other one.
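
In scalar terms (illustration only, with 4-element vectors), these two
permutations are:

  static void
  slide_shuffles (int *restrict lo, int *restrict hi,
                  const int *a, const int *b)
  {
    /* {0, 1, 4, 5}: slideup of b by nunits/2 over a.  */
    lo[0] = a[0]; lo[1] = a[1]; lo[2] = b[0]; lo[3] = b[1];
    /* {2, 3, 6, 7}: slidedown of a by nunits/2 under b.  */
    hi[0] = a[2]; hi[1] = a[3]; hi[2] = b[2]; hi[3] = b[3];
  }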

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_slide_patterns): New.
(expand_vec_perm_const_1): Call new function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-slide-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-slide.c: New test.

7 months agoRISC-V: Emit vector shift pattern for const_vector [PR117353].
Robin Dapp [Thu, 12 Dec 2024 09:33:28 +0000 (10:33 +0100)] 
RISC-V: Emit vector shift pattern for const_vector [PR117353].

In PR117353 and PR117878 we expand a const vector during reload.  For
this we use an unpredicated left shift.  Normally an insn like this is
split but as we introduce it late and cannot create pseudos anymore
it remains unpredicated and is not recognized by the vsetvl pass (where
we expect all insns to be in predicated RVV format).

This patch directly emits a predicated shift instead.  We could
distinguish between !lra_in_progress and lra_in_progress and emit
an unpredicated shift in the former case but we're not very likely
to optimize it anyway so it doesn't seem worth it.

PR target/117353
PR target/117878

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Use predicated
instead of simple shift.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr117353.c: New test.

7 months agotestsuite: Fix typo in directive names
Jakub Jelinek [Fri, 13 Dec 2024 08:40:50 +0000 (09:40 +0100)] 
testsuite: Fix typo in directive names

Some directives in the test were #errror rather than #error.

2024-12-13  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/cpp/embed-1.c: Use #error rather than #errror.

7 months agoRISC-V: Make vector strided load alias all other memories
Pan Li [Fri, 13 Dec 2024 02:45:38 +0000 (10:45 +0800)] 
RISC-V: Make vector strided load alias all other memories

The vector strided load doesn't include the (mem:BLK (scratch)) to
alias all other memories.  It will make the alias analysis only
consider the base address of the strided load and promote the store
before the strided load.  For example, as below:

  #define STEP 10

  char d[225];
  int e[STEP];

  int main() {
    // store 0, 10, 20, 30, 40, 50, 60, 70, 80, 90
    for (long h = 0; h < STEP; ++h)
      d[h * STEP] = 9;

    // load 30, 40, 50, 60, 70, 80, 90
    // store 3,  4,  5,  6,  7,  8,  9
    for (int h = 3; h < STEP; h += 1)
      e[h] = d[h * STEP];

    if (e[5] != 9) {
      __builtin_abort ();
    }

    return 0;
  }

The asm dump will be:
main:
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        li      a4,9
        sb      a4,30(a5)
        addi    a3,a5,30
        vsetivli        zero,7,e32,m1,ta,ma
        li      a2,10
        vlse8.v v2,0(a3),a2 // depends on 30(a5), 40(a5), ... 90(a5) but
                            // only 30(a5) has been promoted before vlse.
                            // It is store after load mistake.
        addi    a3,a5,252
        sb      a4,0(a5)
        sb      a4,10(a5)
        sb      a4,20(a5)
        sb      a4,40(a5)
        vzext.vf4       v1,v2
        sb      a4,50(a5)
        sb      a4,60(a5)
        vse32.v v1,0(a3)
        li      a0,0
        sb      a4,70(a5)
        sb      a4,80(a5)
        sb      a4,90(a5)
        lw      a5,260(a5)
        beq     a5,a4,.L4
        li      a0,123

After this patch:
main:
        vsetivli        zero,4,e32,m1,ta,ma
        vmv.v.i v1,9
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        addi    a4,a5,244
        vse32.v v1,0(a4)
        li      a4,9
        sb      a4,0(a5)
        sb      a4,10(a5)
        sb      a4,20(a5)
        sb      a4,30(a5)
        sb      a4,40(a5)
        sb      a4,50(a5)
        sb      a4,60(a5)
        sb      a4,70(a5)
        sb      a4,80(a5)
        sb      a4,90(a5)
        vsetivli        zero,3,e32,m1,ta,ma
        addi    a4,a5,70
        li      a3,10
        vlse8.v v2,0(a4),a3
        addi    a5,a5,260
        li      a0,0
        vzext.vf4       v1,v2
        vse32.v v1,0(a5)
        ret

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

PR target/117990

gcc/ChangeLog:

* config/riscv/vector.md: Add the (mem:BLK (scratch)) to the
vector strided load.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr117990-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
7 months agoada: Fix internal error on packed record with 0-size component
Eric Botcazou [Tue, 19 Nov 2024 18:14:53 +0000 (19:14 +0100)] 
ada: Fix internal error on packed record with 0-size component

The problem is that the order of components listed in a constant CONSTRUCTOR
does not match that of the associated record type.

gcc/ada/ChangeLog:

* gcc-interface/utils2.cc (compare_elmt_bitpos): Deal specially with
0-sized components when the bit position is the same.

7 months agoada: Pass artificial_p to create_type_decl
Tom Tromey [Fri, 15 Nov 2024 17:12:28 +0000 (10:12 -0700)] 
ada: Pass artificial_p to create_type_decl

The recent "nameless types" change to gcc-interface caused the gdb
pretty-printer for VSS to fail.  This happens because one call to
create_type_decl unconditionally passes "true" as the "artificial_p"
parameter.  This patch changes this call to instead pass the entity's
local artificial_p value instead.  This makes sense, I think, because
the type decl being created for debug purposes (as the comment says)
is there to represent the relevant entity from the source.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Pass artificial_p to
create_type_decl.

7 months agoada: Cleanup preanalysis of static expressions
Javier Miranda [Sun, 1 Dec 2024 19:02:52 +0000 (19:02 +0000)] 
ada: Cleanup preanalysis of static expressions

During preanalysis, the frontend does not generate freeze nodes.
The exception to this rule occurs during the preanalysis of default
and per-object expressions, where static expressions are frozen.

A patch merged six years ago to address an issue in this area introduced
additional complexity and confusion regarding the frontend's behavior in
such cases. The purpose of this patch is to revert that change, simplifying
the support for the preanalysis of static expressions to make it cleaner
and easier to understand.

gcc/ada/ChangeLog:

* sem.ads (Inside_Preanalysis_Without_Freezing): Removed.
* sem.adb (Semantics): Remove Inside_Preanalysis_Without_Freezing.
* sem_ch6.adb (Preanalyze_Formal_Expression): Removed.
* sem_ch3.ads (Preanalyze_Assert_Expression): Add documentation.
(Preanalyze_Spec_Expression): Add documentation.
* sem_ch3.adb (Preanalyze_Assert_Expression): Code cleanup.
(Preanalyze_Default_Expression): Code cleanup.
* sem_res.ads (Preanalyze_With_Freezing_And_Resolve): Removed.
* sem_res.adb (Preanalyze_With_Freezing_And_Resolve): Removed.
(Preanalyze_And_Resolve): Code cleanup.
* freeze.adb (Freeze_Entity): No freeze under strict preanalysis.
(Freeze_Expression): Code cleanup.
(Freeze_Expr_Types): Replace call to Preanalyze_Spec_Expression by
strict preanalysis during preanalysis of a duplicate of the
expression performed to have available the minimum decoration
to locate referenced unfrozen types.
* sem_aggr.adb (Resolve_Array_Aggregate): Minor code cleanup.
* sem_attr.adb (Resolve_Attribute): Add documentation.
* sem_ch13.adb (Resolve_Aspect_Expressions[Aspect_Default_Value]):
Replace call to Preanalyze_Spec_Expression by Preanalyze_And_Resolve.
(Resolve_Aspect_Expressions[Aspect_Default_Component_Value]): Ditto.
* sem_ch8.adb (Set_Entity_Or_Discriminal): Code cleanup.
* sem_prag.adb (Analyze_Initial_Condition_In_Decl_Part): Replace
call to Preanalyze_Assert_Expression by call to Preanalyze_And_Resolve.
(Analyze_Pre_Post_Condition): Replace call to Preanalyze_Spec_Expression
by call to Preanalyze_Assert_Expression.
* sem_util.ads (In_Pragma_Expression): Adding a formal to extend the
functionality of this subprogram.
(Within_Static_Expression): New subprogram.
* sem_util.adb (In_Pragma_Expression): Ditto.
(Within_Static_Expression): Ditto.
* checks.adb (Install_Null_Excluding_Check): No check during preanalysis.
(Install_Primitive_Elaboration_Check): Ditto.

7 months agoada: Improve expansion of nested conditional expressions in return statements
Eric Botcazou [Sun, 1 Dec 2024 22:42:36 +0000 (23:42 +0100)] 
ada: Improve expansion of nested conditional expressions in return statements

This arranges for nested conditional expressions in simple return statements
to have their expansion delayed until the returns are distributed into their
dependent expressions.  This comprises the case of the elsif part of an if
expression present in the source code.

This also distributes qualified expressions into the dependent expressions
of conditional expressions, although this seems to occur rarely in practice.

gcc/ada/ChangeLog:

* exp_aggr.ads (Is_Delayed_Conditional_Expression): Move to...
* exp_aggr.adb (Is_Delayed_Conditional_Expression): Move to...
(Convert_To_Assignments): Use Delay_Conditional_Expressions_Between.
* exp_ch3.adb (Expand_N_Object_Declaration): Reset the Analyzed flag
by means of Unanalyze_Delayed_Conditional_Expression.
* exp_ch4.adb (Expand_N_Case_Expression): Likewise.  Delay expanding
the expression if it is in the context of a simple return statement.
(Expand_N_If_Expression): Likewise.
(Expand_N_Qualified_Expression): Fold identical operand.  Distribute
the expression into an operand that is a conditional expression with
expansion delayed.
(Process_Transient_In_Expression): Also test the parent node for the
presence of a simple return statement.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Test the unconditional
parent node for the presence of a simple return statement.
* exp_util.ads (Delayed Expansion): New description.
(Delay_Conditional_Expressions_Between): New procedure.
(Is_Delayed_Conditional_Expression): ...here.
(Unanalyze_Delayed_Conditional_Expression): New procedure.
(Unconditional_Parent): New function.
* exp_util.adb (Find_Hook_Context): Take into account conditional
statements coming from conditional expressions.
(Within_Conditional_Expression): Likewise.
(Delay_Conditional_Expressions_Between): New procedure.
(Is_Delayed_Conditional_Expression): ...here.
(Unanalyze_Delayed_Conditional_Expression): New procedure.
(Unconditional_Parent): New function.
* sinfo.ads (Expansion_Delayed): Adjust description.

7 months agoada: Fix indentation in record component declarations
Piotr Trojanek [Mon, 2 Dec 2024 15:37:38 +0000 (16:37 +0100)] 
ada: Fix indentation in record component declarations

Code cleanup.

gcc/ada/ChangeLog:

* exp_aggr.adb (Case_Bounds): Fix indentation.
* sem_case.adb (Choice_Bounds): Likewise.
* libgnat/s-dourea.ads (Double_T): Likewise.
* libgnat/s-excmac__arm.ads (Cleanup_Cache_Type): Likewise.

7 months agoada: Fix code indentation
Piotr Trojanek [Fri, 29 Nov 2024 22:30:07 +0000 (23:30 +0100)] 
ada: Fix code indentation

Fix uncontroversial coding style violations detected by an experiment with
a tree-sitter indentation support in Emacs.

gcc/ada/ChangeLog:

* atree.adb, diagnostics-pretty_emitter.adb,
diagnostics-utils.adb, einfo-utils.adb, errout.adb, exp_aggr.adb,
exp_ch3.adb, exp_ch5.adb, exp_ch6.adb, exp_ch7.adb, exp_imgv.adb,
exp_pakd.adb, exp_prag.adb, exp_unst.adb, exp_util.adb, gnatchop.adb,
gnatlink.adb, inline.adb, itypes.adb, osint.adb, rtsfind.adb,
sem_aggr.adb, sem_ch10.adb, sem_ch12.adb, sem_ch13.adb, sem_ch3.adb,
sem_ch4.adb, sem_dim.adb, sem_elab.adb, sem_prag.adb, sem_util.adb,
sprint.adb, switch-m.adb, table.adb: Fix code indentation.

7 months agoada: Fix fixed point text-io when subtype has dynamic range
Marc Poulhiès [Fri, 29 Nov 2024 08:15:42 +0000 (09:15 +0100)] 
ada: Fix fixed point text-io when subtype has dynamic range

When the fixed point subtype has dynamic range, for example in the
context of a generic procedure Test where Fixed_Type is a type formal:

  procedure Test (Low, High : Fixed_Type) is
    type New_Subtype is new Fixed_Type range Low .. High;
    package New_Io is new Text_IO.Fixed_IO (New_Subtype);

the compiler would complain with:
 non-static universal integer value out of range

Have the check use the Base type for checking what integer type can be
used. If a given integer type can be used for a base type, it can
also be used for any of its subtypes.

gcc/ada/ChangeLog:

* libgnat/a-tifiio.adb (OK_Get_32): Use 'Base.
(OK_Put_32, OK_Get_64, OK_Put_64): Likewise.
* libgnat/a-tifiio__128.adb (OK_Get_32, OK_Put_32, OK_Get_64)
(OK_Put_64, OK_Get_128, OK_Put_128): Likewise.
* libgnat/a-wtfiio.adb (OK_Get_32): Likewise.
(OK_Put_32, OK_Get_64, OK_Put_64): Likewise.
* libgnat/a-wtfiio__128.adb (OK_Get_32, OK_Put_32, OK_Get_64)
(OK_Put_64, OK_Get_128, OK_Put_128): Likewise.
* libgnat/a-ztfiio.adb (OK_Get_32): Likewise.
(OK_Put_32, OK_Get_64, OK_Put_64): Likewise.
* libgnat/a-ztfiio__128.adb (OK_Get_32, OK_Put_32, OK_Get_64)
(OK_Put_64, OK_Get_128, OK_Put_128): Likewise.

7 months agoada: Refactor code of Check_Ambiguous_Call and Valid_Conversion
Javier Miranda [Fri, 29 Nov 2024 11:05:31 +0000 (11:05 +0000)] 
ada: Refactor code of Check_Ambiguous_Call and Valid_Conversion

gcc/ada/ChangeLog:

* sem_res.adb (Report_Ambiguous_Argument): Code cleanup.
(Resolve): Code cleanup.

7 months agoada: Implement new rules about effectively volatile types in SPARK
Piotr Trojanek [Fri, 22 Nov 2024 13:31:52 +0000 (14:31 +0100)] 
ada: Implement new rules about effectively volatile types in SPARK

New rules make record types effectively volatile based on the effective
volatility of their components; same for effectively volatile for
reading. Now volatility composition for records works like volatility
composition for arrays.

gcc/ada/ChangeLog:

* sem_util.adb (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Implement new rule for
record types.
* sem_util.ads (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Adjust comments.

7 months agoada: Remove unused parameter from volatile type queries
Piotr Trojanek [Fri, 22 Nov 2024 10:31:38 +0000 (11:31 +0100)] 
ada: Remove unused parameter from volatile type queries

Routines Is_Effectively_Volatile and Is_Effectively_Volatile_For_Reading
were always called with the Ignore_Protected parameter set to True (or had
it passed unmodified on recursive calls), so this parameter wasn't
actually needed.

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* sem_util.adb (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Remove Ignore_Protected
parameter.
(Is_Effectively_Volatile_Object,
Is_Effectively_Volatile_Object_For_Reading): Remove
single-parameter wrappers that are needed to instantiate
generic subprogram.
* sem_util.ads (Is_Effectively_Volatile,
Is_Effectively_Volatile_For_Reading): Remove parameter; adjust
comment.

7 months agoada: Elide copy for calls in allocators for nonlimited by-reference types
Eric Botcazou [Fri, 29 Nov 2024 08:21:09 +0000 (09:21 +0100)] 
ada: Elide copy for calls in allocators for nonlimited by-reference types

This prevents a temporary from being created on the primary stack to hold
the result of the function calls before it is copied to the newly allocated
memory in the nonlimited by-reference case.

That's already not done in the nonlimited non-by-reference case and there is
no reason to do it in the former case either.  The main issue is the call to
Remove_Side_Effects in Expand_Allocator_Expression, but its only purpose is
to cover the problematic processing done in Build_Allocate_Deallocate_Proc
on (part of) the expression; once this is fixed, the call is unnecessary.

The change also contains another small fix to deal with the corner case of
allocators for access-to-access types.

gcc/ada/ChangeLog:

* exp_ch4.adb (Expand_Allocator_Expression): Do not preventively
call Remove_Side_Effects on the expression in the nonlimited
by-reference case.  Always call Build_Allocate_Deallocate_Proc
in the default case.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Bail out if the call
is the qualified expression of an allocator.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Replace all the
calls to Relocate_Node by calls to Duplicate_Subexpr_No_Checks.

7 months agoada: Remove last call to Preanalyze_And_Resolve from Exp_Aggr
Eric Botcazou [Fri, 29 Nov 2024 08:04:09 +0000 (09:04 +0100)] 
ada: Remove last call to Preanalyze_And_Resolve from Exp_Aggr

All the expressions are now at least preanalyzed in a non-iterated context,
so we do not need to redo it in Aggr_Assignment_OK_For_Backend, given that
Is_OK_Aggregate explicitly rejects iterated component associations.

gcc/ada/ChangeLog:

* exp_aggr.adb (Aggr_Assignment_OK_For_Backend): Do not call again
Preanalyze_And_Resolve on the expression.

7 months agoada: Fix breakage of GNATprove introduced by latest change
Eric Botcazou [Thu, 28 Nov 2024 22:05:52 +0000 (23:05 +0100)] 
ada: Fix breakage of GNATprove introduced by latest change

gcc/ada/ChangeLog:

* sem_aggr.adb (Resolve_Aggr_Expr): Always perform a full analysis
of the expression in SPARK mode.

7 months agoada: Fix typo in reference manual
Ronan Desplanques [Thu, 28 Nov 2024 13:25:55 +0000 (14:25 +0100)] 
ada: Fix typo in reference manual

gcc/ada/ChangeLog:

* doc/gnat_rm/gnat_language_extensions.rst: Fix typo.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

7 months agoada: Fix dangling reference with user-defined indexing of function call
Eric Botcazou [Wed, 27 Nov 2024 12:03:08 +0000 (13:03 +0100)] 
ada: Fix dangling reference with user-defined indexing of function call

This happens with a noncontrolled type because the user-defined indexing is
expanded into a function call that binds the lifetime of the original call
to its return value.  The temporary must be created explicitly in this case,
so that the front-end can control its lifetime.

gcc/ada/ChangeLog:

* exp_ch6.adb (Expand_Call_Helper): Also create a temporary in the
case of a noncontrolled user-defined indexing.

7 months agoada: Fix documentation of Ada.Real_Time.Timing_Events
Ronan Desplanques [Mon, 25 Nov 2024 16:39:04 +0000 (17:39 +0100)] 
ada: Fix documentation of Ada.Real_Time.Timing_Events

The GNAT reference manual stated that GNAT did not implement this
language-defined package, but GNAT in fact does offer an implementation
of it.

gcc/ada/ChangeLog:

* doc/gnat_rm/standard_library_routines.rst: Fix documentation.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

7 months agoada: Exclude library units from gnatcov instrumentation
Ronan Desplanques [Tue, 26 Nov 2024 12:45:32 +0000 (13:45 +0100)] 
ada: Exclude library units from gnatcov instrumentation

Before this patch, we instrumented code that's only used during the
build process to generate more code. This patch marks the
code-generating code so it's not instrumented for coverage.

gcc/ada/ChangeLog:

* gnat2.gpr: Add library units to coverage exclusion list.

7 months agoada: Further work in semantic analysis of iterated component associations
Eric Botcazou [Thu, 24 Oct 2024 15:09:39 +0000 (17:09 +0200)] 
ada: Further work in semantic analysis of iterated component associations

This finishes up the transition to preanalysis of a copy of the expression
for iterated component associations in all contexts, thus obviating the need
to clean things up afterward.

However, this requires a larger cleanup in semantics analysis of aggregates,
in particular for others choices, which are currently skipped in Sem_Aggr,
with Exp_Aggr trying to patch things up afterward but leaving some legality
loopholes in the end.  That's why this makes sure that all the expressions
appearing in aggregates are either analyzed or preanalyzed by Sem_Aggr, as
documented in the spec of Sem, modulo the copy in an iteration context.

gcc/ada/ChangeLog:

* exp_aggr.adb (Build_Array_Aggr_Code): Remove obsolete comment.
(Convert_To_Positional): Remove Ctyp local variable.
(Is_Static_Element): Remove Dims parameter and do not preanalyze the
expression there.
(Expand_Array_Aggregate): Make Ctyp a constant.
(Compute_Others_Present): Do not preanalyze the expression there.
* sem_aggr.adb (Resolve_Array_Aggregate): New Ctyp constant.  Use it
throughout the procedure to denote the component type.
(Resolve_Aggr_Expr): Always preanalyze a copy of the expression in
an iteration context.  Preanalyze it directly when the expander is
active and the choice may cover multiple components.  Otherwise,
fully analyze it.
Do not reanalyze an iterated component association with an others
choice either when there are positional components.
(Resolve_Iterated_Component_Association): Do not remove references
from the expression after invoking Resolve_Aggr_Expr on it.

7 months agoada: Remove implicit assumption in the double case
Eric Botcazou [Tue, 26 Nov 2024 20:20:08 +0000 (21:20 +0100)] 
ada: Remove implicit assumption in the double case

The assumption is fulfilled in all the instantiations of the package, but
it should not be made in the generic code.

gcc/ada/ChangeLog:

* libgnat/s-imager.adb (Set_Image_Real): In the case where a double
integer is needed, do not implicitly assume that it can contain up to
'Digits of the floating-point type.

7 months agoada: Adjust cut-off for scaling of floating-point numbers
Eric Botcazou [Tue, 26 Nov 2024 10:25:54 +0000 (11:25 +0100)] 
ada: Adjust cut-off for scaling of floating-point numbers

The value needs to take into account denormals and encompass Maxdigs.

gcc/ada/ChangeLog:

* libgnat/s-imager.adb (Maxscaling): Change to Natural constant and
add Maxdigs to value.

7 months agoFix -fstrict-flex-arrays documentation, again [PR111659]
Sandra Loosemore [Fri, 13 Dec 2024 00:26:29 +0000 (00:26 +0000)] 
Fix -fstrict-flex-arrays documentation, again [PR111659]

My previous attempt to fix this issue ended up garbling the text
instead.  Trying again to make the descriptions of the attribute and
command-line option consistent.

gcc/ChangeLog
PR middle-end/111659
* doc/extend.texi (Common Variable Attributes): Copy-edit description
of the strict_flex_array attribute levels.
* doc/invoke.texi (C Dialect Options): Swap documented behavior for
levels 0 and 3.  Copy the description for the other levels from the
attribute instead of indirecting to it.

7 months agoDaily bump.
GCC Administrator [Fri, 13 Dec 2024 00:19:02 +0000 (00:19 +0000)] 
Daily bump.

7 months agolibstdc++: Fix some -Wsign-compare warnings in the testsuite
Jonathan Wakely [Thu, 12 Dec 2024 23:17:28 +0000 (23:17 +0000)] 
libstdc++: Fix some -Wsign-compare warnings in the testsuite

libstdc++-v3/ChangeLog:

* testsuite/23_containers/unordered_map/modifiers/reserve.cc:
Cast to size_t to fix -Wsign-compare warning.
* testsuite/23_containers/unordered_set/hash_policy/71181.cc:
Likewise.
* testsuite/23_containers/unordered_set/insert/move_range.cc:
Likewise.

7 months agolibstdc++: Fix -Wsign-compare warnings in bits/hashtable_policy.h
Jonathan Wakely [Thu, 12 Dec 2024 21:07:08 +0000 (21:07 +0000)] 
libstdc++: Fix -Wsign-compare warnings in bits/hashtable_policy.h

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Local_iterator_base): Fix
-Wsign-compare warnings.

7 months agolibstdc++: Fix typo in comment in src/c++17/fs_dir.cc
Jonathan Wakely [Thu, 12 Dec 2024 20:38:54 +0000 (20:38 +0000)] 
libstdc++: Fix typo in comment in src/c++17/fs_dir.cc

libstdc++-v3/ChangeLog:

* src/c++17/fs_dir.cc: Fix typo in comment.

7 months agohppa: Remove extra clobber from divsi3, udivsi3, modsi3 and umodsi3 patterns
John David Anglin [Thu, 12 Dec 2024 20:22:22 +0000 (15:22 -0500)] 
hppa: Remove extra clobber from divsi3, udivsi3, modsi3 and umodsi3 patterns

The $$divI, $$divU, $$remI and $$remU millicode calls clobber r1,
r26, r25 and the return link register (r31 or r2).  We don't need
to clobber any other registers.

2024-12-12  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.cc (pa_emit_hpdiv_const): Clobber r1, r26,
r25 and return register.
* config/pa/pa.md (divsi3): Revise clobbers and operands.
Remove second clobber from div:SI insns.
(udivsi3, modsi3, umodsi3): Likewise.

7 months agoRegenerate attr-urls.def.
Sandra Loosemore [Thu, 12 Dec 2024 20:12:42 +0000 (20:12 +0000)] 
Regenerate attr-urls.def.

I noticed there is this new generated file that needs to be updated by
"make regenerate-attr-urls" similarly to "make regenerate-opt-urls", but
nobody had done that recently as the buildbot does not nag about it yet.

gcc/ChangeLog

* attr-urls.def: Regenerate.

7 months agoClean up documentation of -Wsuggest-attribute= [PR115532]
Sandra Loosemore [Thu, 12 Dec 2024 19:56:04 +0000 (19:56 +0000)] 
Clean up documentation of -Wsuggest-attribute= [PR115532]

The list of -Wsuggest-attribute= variants was out of date in the option
summary (and getting too long to fit on one line), and an index entry was
missing for -Wsuggest-attribute=returns_nonnull.

gcc/c-family/ChangeLog
PR c/115532
* c.opt.urls: Regenerated.

gcc/ChangeLog
PR c/115532
* common.opt.urls: Regenerated.
* doc/invoke.texi (Option Summary): Don't try to list all the
-Wsuggest-attribute= variants inline here.
(Warning Options): Likewise.  Add @opindex for
Wsuggest-attribute=returns_nonnull and its no- form.  Remove
@itemx for no- form.

Co-Authored-By: Peter Eisentraut <peter@eisentraut.org>
7 months agomatch.pd: Defer some CTZ/CLZ foldings until after ubsan pass for -fsanitize=builtin...
Jakub Jelinek [Thu, 12 Dec 2024 18:47:46 +0000 (19:47 +0100)] 
match.pd: Defer some CTZ/CLZ foldings until after ubsan pass for -fsanitize=builtin [PR115127]

As the following testcase shows, -fsanitize=builtin instruments the
builtins in the ubsan pass which is done shortly after going into
SSA, but if optimizations optimize the builtins away before that,
nothing is instrumented.  Now, I think it is just fine if the
result of the builtins isn't used in any way and we just DCE them,
but in the following optimizations the result is used.
So, the following patch for -fsanitize=builtin only defers the
optimizations that might turn single-argument CLZ/CTZ (aka undefined
at zero) into something else until the ubsan pass is done.
Now, we don't have PROP_ubsan and I am not sure it is worth adding it;
there is PROP_ssa set by the ssa pass which is 3 passes before
ubsan, but there are only 2 warning passes in between, so PROP_ssa
looked good enough to me.
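
As a hedged illustration (not the new testcase), this is the kind of code
where the fold would otherwise run before the instrumentation:

  /* With -fsanitize=builtin, folding the comparison away before the
     ubsan pass would lose the runtime check that x is nonzero.  */
  int
  ctz_is_two (unsigned int x)
  {
    return __builtin_ctz (x) == 2;  /* undefined when x == 0 */
  }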

2024-12-12  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/115127
* match.pd (clz (X) == C, ctz (X) == C, ctz (X) >= C): Don't
optimize if -fsanitize=builtin and not yet in SSA form.

* c-c++-common/ubsan/builtin-2.c: New test.

7 months agoOpenMP: Enable has_device_addr clause for 'dispatch' in C/C++
Tobias Burnus [Thu, 12 Dec 2024 17:58:59 +0000 (18:58 +0100)] 
OpenMP: Enable has_device_addr clause for 'dispatch' in C/C++

The 'has_device_addr' of 'dispatch' has to be seen in conjunction with the
'need_device_addr' modifier to the 'adjust_args' clause of 'declare variant'.
As the latter has not yet been implemented, 'has_device_addr' has no real
effect. However, to prepare for 'need_device_addr' and as a service to the user:

For C, where 'need_device_addr' is not permitted (contrary to C++ and Fortran),
a note is output when the user tries to use it (alongside the existing
error that either 'nothing' or 'need_device_ptr' was expected).

And, on the ME side, it is lightly handled by diagnosing when - for the
same argument - there is a mismatch between the variant's adjust_args
'need_device_ptr' modifier and dispatch having an 'has_device_addr' clause
(or likewise for need_device_addr with is_device_ptr) as, according to the
spec, those are completely separate.
Thus, 'dispatch' will still do the host to device pointer conversion for
a 'need_device_ptr' argument, even if it appeared in a 'has_device_addr'
clause.

gcc/c/ChangeLog:

* c-parser.cc (OMP_DISPATCH_CLAUSE_MASK): Add has_device_addr clause.
(c_finish_omp_declare_variant): Add an 'inform' telling the user that
'need_device_addr' is invalid for C.

gcc/cp/ChangeLog:

* parser.cc (OMP_DISPATCH_CLAUSE_MASK): Add has_device_addr clause.

gcc/ChangeLog:

* gimplify.cc (gimplify_call_expr): When handling OpenMP's dispatch,
add diagnostic when there is a ptr vs. addr mismatch between
need_device_{addr,ptr} and {is,has}_device_{ptr,addr}, respectively.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/adjust-args-3.c: New test.
* gcc.dg/gomp/adjust-args-2.c: New test.

7 months agoFortran: Fix testsuite regressions after r15-5083 [PR117797]
Paul Thomas [Thu, 12 Dec 2024 17:50:56 +0000 (17:50 +0000)] 
Fortran: Fix testsuite regressions after r15-5083 [PR117797]

2024-12-12  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
PR fortran/117797
* trans-array.cc (class_array_element_size): New function.
(gfc_get_array_span): Refactor, using class_array_element_size
to return the span for descriptors that are the _data component
of a class expression and then class dummy references. Revert
the conditions to those before r15-5083 tidying up using 'sym'.

gcc/testsuite/
PR fortran/117797
* gfortran.dg/pr117797.f90: New test.

7 months agoFix precondition failure with Ada.Numerics.Generic_Real_Arrays.Eigenvalues
Eric Botcazou [Thu, 12 Dec 2024 15:25:09 +0000 (16:25 +0100)] 
Fix precondition failure with Ada.Numerics.Generic_Real_Arrays.Eigenvalues

This fixes a precondition failure triggered when the Eigenvalues routine
of Ada.Numerics.Generic_Real_Arrays is instantiated with -gnata, because
it calls Sort_Eigensystem on an empty vector.

gcc/ada
PR ada/117996
* libgnat/a-ngrear.adb (Jacobi): Remove default value for
Compute_Vectors formal parameter.
(Sort_Eigensystem): Add Compute_Vectors formal parameter.  Do not
modify the Vectors if Compute_Vectors is False.
(Eigensystem): Pass True as Compute_Vectors to Sort_Eigensystem.
(Eigenvalues): Pass False as Compute_Vectors to Sort_Eigensystem.

gcc/testsuite
* gnat.dg/matrix1.adb: New test.

7 months agoAVR: target/118000 - Fix copymem from address-spaces.
Georg-Johann Lay [Thu, 12 Dec 2024 15:13:39 +0000 (16:13 +0100)] 
AVR: target/118000 - Fix copymem from address-spaces.

* rampz_rtx et al. were missing MEM_VOLATILE_P.  This is needed because
  avr_emit_cpymemhi sets RAMPZ explicitly with an insn of its own.

* avr_out_cpymem was missing a final RAMPZ = 0 on EBI devices.

This only affects the __flash1 ... __flash5 address spaces since the
other address spaces use different routines.
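
A hypothetical sketch of the affected situation (assuming avr-gcc on an
EBI device, where __flash1 is reached via RAMPZ):

typedef struct { char data[64]; } blob_t;

const __flash1 blob_t lookup = { { 1, 2, 3 } };
blob_t ram_copy;

void
fetch (void)
{
  /* This block copy from __flash1 goes through avr_emit_cpymemhi /
     avr_out_cpymem, which set RAMPZ to select the flash segment; RAMPZ
     is now marked MEM_VOLATILE_P, and on EBI devices it is restored to
     0 after the copy.  */
  ram_copy = lookup;
}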

gcc/
PR target/118000
* config/avr/avr.cc (avr_init_expanders) <sreg_rtx>
<rampd_rtx, rampx_rtx, rampy_rtx, rampz_rtx>: Set MEM_VOLATILE_P.
(avr_out_cpymem) [ELPM && EBI]: Restore RAMPZ to 0 after.

7 months agoifcombine field-merge: set upper bound for get_best_mode
Alexandre Oliva [Thu, 12 Dec 2024 14:43:09 +0000 (11:43 -0300)] 
ifcombine field-merge: set upper bound for get_best_mode

A bootstrap on aarch64-linux-gnu revealed that sometimes (for example,
when building shorten_branches in final.cc) we will find such things
as MEM <unsigned int>, where unsigned int happens to be a variant of
the original unsigned int type, but with 64-bit alignment.  This
unusual alignment circumstance caused (i) get_inner_reference not to
look inside the MEM and (ii) get_best_mode to choose DImode instead of
SImode to access the object, so we built a BIT_FIELD_REF that
attempted to select all 64 bits of a 32-bit object; that failed
gimple verification ("position plus size exceeds size of referenced
object") because there aren't that many bits in the unsigned int
object.

This patch avoids this failure mode by limiting the bitfield range
to the size of the inner object, when that size is a known constant.

This enables us to avoid creating a BIT_FIELD_REF and to reuse the load
expr instead, but that still introduced a separate load.  The extra
load would presumably get optimized out later, but it is easy enough to
avoid in the first place by reusing the SSA_NAME the value was
originally loaded into, so I implemented that in make_bit_field_load.
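
For context, a hypothetical sketch of the kind of source ifcombine's
field merging handles (not the shorten_branches code from final.cc that
exposed the bug):

struct flags
{
  unsigned int kind : 8;
  unsigned int mode : 8;
  unsigned int rest : 16;
};

int
is_special (const struct flags *f)
{
  /* Both tests read adjacent bits of the same 32-bit word, so ifcombine
     can merge them into a single load, mask and compare.  get_best_mode
     chooses the mode for that combined access, and the bitregion it
     sees is now capped at the size of the loaded object.  */
  return f->kind == 3 && f->mode == 1;
}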

for  gcc/ChangeLog

* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit the
size of the bitregion in get_best_mode calls by the inner
object's type size, if known.
(make_bit_field_load): Reuse SSA_NAME if we're attempting to
issue an identical load.