git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Refactor std::uninitialized_{copy,fill,fill_n} algos [PR68350]

This refactors the std::uninitialized_copy, std::uninitialized_fill and
std::uninitialized_fill_n algorithms to directly perform memcpy/memset
optimizations instead of dispatching to std::copy/std::fill/std::fill_n.

The reasons for this are:

- Use 'if constexpr' to simplify and optimize compilation throughput, so
  dispatching to specialized class templates is only needed for C++98
  mode.
- Use memcpy instead of memmove, because the conditions on
  non-overlapping ranges are stronger for std::uninitialized_copy than
  for std::copy. Using memcpy might be a minor optimization.
- No special case for creating a range of one element, which std::copy
  needs to deal with (see PR libstdc++/108846). The uninitialized algos
  create new objects, which reuses storage and is allowed to clobber
  tail padding.
- Relax the conditions for using memcpy/memset, because the C++20 rules
  on implicit-lifetime types mean that we can rely on memcpy to begin
  lifetimes of trivially copyable types.  We don't need to require
  trivially default constructible, so don't need to limit the
  optimization to trivial types. See PR 68350 for more details.
- Remove the dependency on std::copy and std::fill. This should mean
  that stl_uninitialized.h no longer needs to include all of
  stl_algobase.h.  This isn't quite true yet, because we still use
  std::fill in __uninitialized_default and still use std::fill_n in
  __uninitialized_default_n. That will be fixed later.
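
As a rough illustration of the approach listed above (this is not the
actual libstdc++ code), the sketch below uses 'if constexpr' to choose
between a memcpy fast path and a construct-in-place loop. The name
sketch_uninitialized_copy and the exact trait condition are assumptions
made for this example; it only handles pointer ranges, needs C++17, and
omits the cleanup a real implementation performs if a constructor throws.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper for illustration only; pointer ranges only.
template<typename T, typename U>
U* sketch_uninitialized_copy(const T* first, const T* last, U* out)
{
  if constexpr (std::is_same_v<T, U> && std::is_trivially_copyable_v<U>)
    {
      // The ranges must not overlap, so memcpy (rather than memmove) suffices.
      const std::size_t n = static_cast<std::size_t>(last - first);
      if (n != 0)
        std::memcpy(out, first, n * sizeof(U));
      return out + n;
    }
  else
    {
      // General path: construct each element in the uninitialized storage.
      // (Exception cleanup is omitted in this sketch.)
      for (; first != last; ++first, ++out)
        ::new (static_cast<void*>(out)) U(*first);
      return out;
    }
}
```

Copying int to int takes the memcpy branch here, while copying int to
long takes the loop, because the element types differ.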

Several tests need changes to the diagnostics matched by dg-error
because we no longer use the __constructible() function that contained a
static assert. Now we just get straightforward errors for attempting
to use a deleted constructor.

Two tests needed more significant changes to the actual expected results
of executing the tests, because they were checking for old behaviour
which was incorrect according to the standard.
20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
std::copy to be used for a call to std::uninitialized_copy involving two
trivially copyable types. That was incorrect behaviour, because a
non-trivial constructor should have been used, but using std::copy used
trivial default initialization followed by assignment.
20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
the behaviour with a non-integral Size passed to uninitialized_fill_n,
but I wrote the test looking at the requirements of uninitialized_copy_n
which are not the same as uninitialized_fill_n. The former uses --n and
tests n > 0, but the latter just tests n-- (which will never be false
for a floating-point value with a fractional part).
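
The difference between the two loop shapes can be sketched as follows.
These are hypothetical counting functions written for this note, not the
library code; the 'cap' parameter is an addition so that the second
loop terminates when its condition never becomes false.

```cpp
#include <cstddef>

// Loop shape described for uninitialized_copy_n: decrement with --n and
// test n > 0.
template<typename Size>
std::size_t count_copy_n_style(Size n)
{
  std::size_t iterations = 0;
  for (; n > 0; --n)
    ++iterations;
  return iterations;
}

// Loop shape described for uninitialized_fill_n: test n-- only.
// 'cap' is an assumption of this sketch, used to stop a runaway loop.
template<typename Size>
std::size_t count_fill_n_style(Size n, std::size_t cap)
{
  std::size_t iterations = 0;
  while (n-- && iterations < cap)
    ++iterations;
  return iterations;
}
```

With n = 2.5 the first shape stops after three iterations, whereas the
second steps through 2.5, 1.5, 0.5, -0.5 and so on without ever reaching
zero, so only the cap ends it. With an integral n both shapes agree.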

libstdc++-v3/ChangeLog:

PR libstdc++/68350
PR libstdc++/93059
* include/bits/stl_uninitialized.h (__check_constructible)
(_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
[C++98] (__unwrappable_niter): New trait.
(__uninitialized_copy<true>): Replace use of std::copy.
(uninitialized_copy): Fix Doxygen comments. Open-code memcpy
optimization for C++11 and later.
(__uninitialized_fill<true>): Replace use of std::fill.
(uninitialized_fill): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
(__uninitialized_fill_n<true>): Replace use of std::fill_n.
(uninitialized_fill_n): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
Adjust expected behaviour to match what the standard specifies.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error directives.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc:
Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Likewise.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Likewise.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Move std::__niter_base and std::__niter_wrap to stl_iterator.h

Move the functions for unwrapping and rewrapping __normal_iterator
objects to the same file as the definition of __normal_iterator itself.

This will allow a later commit to make use of std::__niter_base in other
headers without having to include all of <bits/stl_algobase.h>.

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
to ...
* include/bits/stl_iterator.h: ... here.
(__niter_base, __miter_base): Move all overloads to the end of
the header.
* testsuite/24_iterators/normal_iterator/wrapping.cc: New test.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

SVE intrinsics: Add fold_active_lanes_to method to refactor svmul and svdiv.

As suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html,
this patch adds the method gimple_folder::fold_active_lanes_to (tree X).
This method folds active lanes to X and sets inactive lanes according to
the predication, returning a new gimple statement. That makes folding of
SVE intrinsics easier and reduces code duplication in the
svxxx_impl::fold implementations.
Using this new method, svdiv_impl::fold and svmul_impl::fold were refactored.
Additionally, the method was used for two optimizations:
1) Fold svdiv to the dividend, if the divisor is all ones and
2) for svmul, if one of the operands is all ones, fold to the other operand.
Both optimizations were previously applied to _x and _m predication on
the RTL level, but not for _z, where svdiv/svmul were still being used.
For both optimizations, codegen was improved by this patch, for example by
skipping sel instructions with all-same operands and replacing sel
instructions by mov instructions.
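
The intended lane behaviour can be modelled in scalar code roughly as
below. This is only a sketch of the semantics described above, not the
GIMPLE-level implementation: the function name, the Pred enum and the
merge_src parameter (standing in for the value inactive lanes keep under
_m predication) are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

enum class Pred { m, z, x };  // merging, zeroing, "don't care"

// Hypothetical scalar model: active lanes take x[i]; inactive lanes follow
// the predication. For _x any value is acceptable, so this model reuses x[i].
template<std::size_t N>
std::array<int, N> model_fold_active_lanes_to(const std::array<bool, N>& pg,
                                              const std::array<int, N>& x,
                                              const std::array<int, N>& merge_src,
                                              Pred pred)
{
  std::array<int, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    {
      if (pg[i] || pred == Pred::x)
        result[i] = x[i];
      else if (pred == Pred::m)
        result[i] = merge_src[i];
      else
        result[i] = 0;
    }
  return result;
}
```

In this model, folding a division by all ones with _z predication yields
the dividend in active lanes and zero elsewhere, which is the case that
previously still emitted svdiv.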

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Refactor using fold_active_lanes_to and fold to dividend, if the
divisor is all ones.
(svmul_impl::fold): Refactor using fold_active_lanes_to and fold
to the other operand, if one of the operands is all ones.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_active_lanes_to (tree).
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::fold_active_lanes_to): Add new method to fold
active lanes to given argument and set inactive lanes
according to the predication.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
* gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
* gcc.target/aarch64/sve/mul_const_run.c: Likewise.

[5/n] remove trapv-*.c special-casing of gcc.dg/vect/ files

The following makes -ftrapv explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named trapv-*
* gcc.dg/vect/trapv-vect-reduc-4.c: Add dg-additional-options -ftrapv.

[4/n] remove wrapv-*.c special-casing of gcc.dg/vect/ files

The following makes -fwrapv explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named wrapv-*
* gcc.dg/vect/wrapv-vect-7.c: Add dg-additional-options -fwrapv.
* gcc.dg/vect/wrapv-vect-reduc-2char.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-2short.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c: Likewise.

[3/n] remove fast-math-*.c special-casing of gcc.dg/vect/ files

The following makes -ffast-math explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named fast-math-*
* gcc.dg/vect/fast-math-bb-slp-call-1.c: Add dg-additional-options
-ffast-math.
* gcc.dg/vect/fast-math-bb-slp-call-2.c: Likewise.
* gcc.dg/vect/fast-math-bb-slp-call-3.c: Likewise.
* gcc.dg/vect/fast-math-ifcvt-1.c: Likewise.
* gcc.dg/vect/fast-math-pr35982.c: Likewise.
* gcc.dg/vect/fast-math-pr43074.c: Likewise.
* gcc.dg/vect/fast-math-pr44152.c: Likewise.
* gcc.dg/vect/fast-math-pr55281.c: Likewise.
* gcc.dg/vect/fast-math-slp-27.c: Likewise.
* gcc.dg/vect/fast-math-slp-38.c: Likewise.
* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
* gcc.dg/vect/fast-math-vect-call-2.c: Likewise.
* gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
* gcc.dg/vect/fast-math-vect-outer-7.c: Likewise.
* gcc.dg/vect/fast-math-vect-pow-1.c: Likewise.
* gcc.dg/vect/fast-math-vect-pow-2.c: Likewise.
* gcc.dg/vect/fast-math-vect-pr25911.c: Likewise.
* gcc.dg/vect/fast-math-vect-pr29925.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-5.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-7.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-8.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-9.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Likewise.

[2/n] remove no-vfa-*.c special-casing of gcc.dg/vect/ files

The following makes --param vect-max-version-for-alias-checks=0
explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named no-vfa-*
* gcc.dg/vect/no-vfa-pr29145.c: Add dg-additional-options
--param vect-max-version-for-alias-checks=0.
* gcc.dg/vect/no-vfa-vect-101.c: Likewise.
* gcc.dg/vect/no-vfa-vect-102.c: Likewise.
* gcc.dg/vect/no-vfa-vect-102a.c: Likewise.
* gcc.dg/vect/no-vfa-vect-37.c: Likewise.
* gcc.dg/vect/no-vfa-vect-43.c: Likewise.
* gcc.dg/vect/no-vfa-vect-45.c: Likewise.
* gcc.dg/vect/no-vfa-vect-49.c: Likewise.
* gcc.dg/vect/no-vfa-vect-51.c: Likewise.
* gcc.dg/vect/no-vfa-vect-53.c: Likewise.
* gcc.dg/vect/no-vfa-vect-57.c: Likewise.
* gcc.dg/vect/no-vfa-vect-61.c: Likewise.
* gcc.dg/vect/no-vfa-vect-79.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
* gcc.dg/vect/no-vfa-vect-dv-2.c: Likewise.

Adjust assert in vect_build_slp_tree_2

The assert in SLP discovery when we handle masked operations is
confusingly wide - all gather variants should be caught by
the earlier STMT_VINFO_GATHER_SCATTER_P.

* tree-vect-slp.cc (vect_build_slp_tree_2): Only expect
IFN_MASK_LOAD for masked loads that are not
STMT_VINFO_GATHER_SCATTER_P.

MAINTAINERS: Add myself as pair fusion and aarch64 ldp/stp maintainer

ChangeLog:

* MAINTAINERS (CPU Port Maintainers): Add myself as aarch64 ldp/stp
maintainer.
(Various Maintainers): Add myself as pair fusion maintainer.

testsuite: Add necessary dejagnu directives to pr115815_0.c

I have received an email from the Linaro infrastructure that the test
gcc.dg/lto/pr115815_0.c which I added is failing on arm-eabi and I
realized that not only is it missing dg-require-effective-target
global_constructor but it actually lacks any dejagnu directives at all,
which means it is unnecessarily running both at -O0 and -O2 and there is
an unnecessary run test too.  All fixed by this patch.

I have not actually verified that the failure goes away on arm-eabi
but have very high hopes it will.  I have verified that the test still
checks for the bug and also that it passes by running:

  make -k check-gcc RUNTESTFLAGS="lto.exp=*pr115815*"

gcc/testsuite/ChangeLog:

2024-10-14  Martin Jambor  <mjambor@suse.cz>

* gcc.dg/lto/pr115815_0.c: Add dejagnu directives.

middle-end: Fix GSI for gcond root [PR117140]

When finding the gsi to use for code of the root statements we should use the
one of the original statement rather than the gcond which may be inside a
pattern.

Without this the emitted instructions may be discarded later.

gcc/ChangeLog:

PR tree-optimization/117140
* tree-vect-slp.cc (vectorize_slp_instance_root_stmt): Use gsi from
original statement.

gcc/testsuite/ChangeLog:

PR tree-optimization/117140
* gcc.dg/vect/vect-early-break_129-pr117140.c: New test.

middle-end: Fix VEC_PERM_EXPR lowering since relaxation of vector sizes

In GCC 14 VEC_PERM_EXPR was relaxed to be able to permute to a 2x larger vector
than the size of the input vectors. However various passes and transformations
were not updated to account for this.

I have patches in this area that I will be upstreaming with individual patches
that expose them.

This one is that vectlower tries to lower based on the size of the input vectors
rather than the size of the output. As a consequence it creates an invalid
vector of half the size.

Luckily we ICE because the resulting nunits doesn't match the vector size.
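
To make the sizing issue concrete, here is a scalar model of a two-input
permute whose result has more lanes than either input. It is only an
illustration of why the lane count must come from the output type; the
function name and the use of std::array are assumptions of this sketch
and have nothing to do with the GIMPLE representation.

```cpp
#include <array>
#include <cstddef>

// Scalar model of a two-input permute whose result has Out lanes, where Out
// may differ from the number of input lanes In (for example Out == 2 * In).
template<std::size_t In, std::size_t Out>
std::array<int, Out> model_vec_perm(const std::array<int, In>& a,
                                    const std::array<int, In>& b,
                                    const std::array<std::size_t, Out>& sel)
{
  std::array<int, Out> result{};
  for (std::size_t i = 0; i < Out; ++i)
    {
      // Indices select from the concatenation of a and b.
      const std::size_t idx = sel[i] % (2 * In);
      result[i] = idx < In ? a[idx] : b[idx - In];
    }
  return result;
}
```

Sizing the result by In instead of Out would drop half of the selected
elements when Out == 2 * In, which corresponds to the half-size vector
described above.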

gcc/ChangeLog:

* tree-vect-generic.cc (lower_vec_perm): Use output vector size instead
of input vector when determining output nunits.

gcc/testsuite/ChangeLog:

* gcc.dg/vec-perm-lower.c: New test.

AArch64: use movi d0, #0 to clear SVE registers instead of mov z0.d, #0

This patch changes SVE to use Adv. SIMD movi 0 to clear SVE registers when not
in SVE streaming mode. As the Neoverse Software Optimization guides indicate,
SVE mov #0 is not a zero cost move.

When in streaming mode we continue to use SVE's mov to clear the registers.

Tests have already been updated.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_output_sve_mov_immediate): Use
fmov for SVE zeros.

AArch64: support encoding integer immediates using floating point moves

This patch extends our immediate SIMD generation cases to support generating
integer immediates using a floating-point move if the integer immediate maps
to an exact FP value.

As an example:

uint32x4_t f1() {
    return vdupq_n_u32(0x3f800000);
}

currently generates:

f1:
        adrp    x0, .LC0
        ldr     q0, [x0, #:lo12:.LC0]
        ret

i.e. a load, but with this change:

f1:
        fmov    v0.4s, 1.0e+0
        ret

Such immediates are common in e.g. our Math routines in glibc because they are
created to extract or mark part of an FP immediate as masks.
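
The correspondence used in the example can be checked on the host with a
small helper. This is a sketch that assumes float is IEEE-754 binary32;
float_bits is a name invented for this note.

```cpp
#include <cstdint>
#include <cstring>

// Return the bit pattern of f as a 32-bit integer (assumes binary32 float).
inline std::uint32_t float_bits(float f)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "float must be 32 bits wide");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

Under that assumption float_bits(1.0f) is 0x3f800000, which is why the
integer immediate in f1 can be materialized as fmov with 1.0.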

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_sve_valid_immediate,
aarch64_simd_valid_immediate): Refactor accepting modes and values.
(aarch64_float_const_representable_p): Refactor and extract FP checks
into ...
(aarch64_real_float_const_representable_p): ...This and fix fail
fallback from real_to_integer.
(aarch64_advsimd_valid_immediate): Use it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/const_create_using_fmov.c: New test.

AArch64: update testsuite to account for new zero moves

The patch series will adjust how zeros are created. In principle the exact
lane size a zero gets created on doesn't matter, but this makes the tests a
bit fragile.

This preparation patch will update the testsuite to accept multiple ways of
creating vector zeros, covering both the current syntax and the one being
transitioned to in the series.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ldp_stp_18.c: Update zero regexpr.
* gcc.target/aarch64/memset-corner-cases.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_bf16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_f16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_f32.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_f64.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s32.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s64.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s8.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u32.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u64.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acge_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acge_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acge_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acgt_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acgt_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acgt_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acle_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acle_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acle_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/aclt_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/aclt_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/aclt_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cmpuo_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cmpuo_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cmpuo_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u8.c: Likewise.
* gcc.target/aarch64/sve/const_fold_div_1.c: Likewise.
* gcc.target/aarch64/sve/const_fold_mul_1.c: Likewise.
* gcc.target/aarch64/sve/dup_imm_1.c: Likewise.
* gcc.target/aarch64/sve/fdup_1.c: Likewise.
* gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
* gcc.target/aarch64/sve/fold_mul_zero.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.

arm: [MVE intrinsics] use long_type_suffix / half_type_suffix helpers

In several places we are looking for a type twice or half as large as
the type suffix: this patch introduces helper functions to avoid code
duplication. long_type_suffix is similar to the SVE counterpart, but
adds an 'expected_tclass' parameter. half_type_suffix is similar to
it, but does not exist in SVE.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-shapes.cc (long_type_suffix): New.
(half_type_suffix): New.
(struct binary_move_narrow_def): Use new helper.
(struct binary_move_narrow_unsigned_def): Likewise.
(struct binary_rshift_narrow_def): Likewise.
(struct binary_rshift_narrow_unsigned_def): Likewise.
(struct binary_widen_def): Likewise.
(struct binary_widen_n_def): Likewise.
(struct binary_widen_opt_n_def): Likewise.
(struct unary_widen_def): Likewise.

arm: [MVE intrinsics] rework vsbcq vsbciq

Implement vsbcq vsbciq using the new MVE builtins framework.

We re-use most of the code introduced by the previous patches.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-base.cc (class vadc_vsbc_impl): Add
support for vsbciq and vsbcq.
(vadciq, vadcq): Add new parameter.
(vsbciq): New.
(vsbcq): New.
* config/arm/arm-mve-builtins-base.def (vsbciq): New.
(vsbcq): New.
* config/arm/arm-mve-builtins-base.h (vsbciq): New.
(vsbcq): New.
* config/arm/arm_mve.h (vsbciq): Delete.
(vsbciq_m): Delete.
(vsbcq): Delete.
(vsbcq_m): Delete.
(vsbciq_s32): Delete.
(vsbciq_u32): Delete.
(vsbciq_m_s32): Delete.
(vsbciq_m_u32): Delete.
(vsbcq_s32): Delete.
(vsbcq_u32): Delete.
(vsbcq_m_s32): Delete.
(vsbcq_m_u32): Delete.
(__arm_vsbciq_s32): Delete.
(__arm_vsbciq_u32): Delete.
(__arm_vsbciq_m_s32): Delete.
(__arm_vsbciq_m_u32): Delete.
(__arm_vsbcq_s32): Delete.
(__arm_vsbcq_u32): Delete.
(__arm_vsbcq_m_s32): Delete.
(__arm_vsbcq_m_u32): Delete.
(__arm_vsbciq): Delete.
(__arm_vsbciq_m): Delete.
(__arm_vsbcq): Delete.
(__arm_vsbcq_m): Delete.

arm: [MVE intrinsics] rework vadcq

Implement vadcq using the new MVE builtins framework.

We re-use most of the code introduced by the previous patch to support
vadciq: we just need to initialize carry from the input parameter.
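
The relationship between the two forms can be pictured with a scalar
model of a carry chain over four 32-bit lanes. This is a hypothetical
sketch, not the builtin expansion: the names are invented, and it assumes
the carry propagates from lane 0 upward. Passing carry_in == 0 models the
vadciq-like form; passing a caller-supplied carry models the vadcq-like
form.

```cpp
#include <array>
#include <cstdint>

struct AdcResult
{
  std::array<std::uint32_t, 4> lanes;
  unsigned carry_out;
};

// Hypothetical scalar model of a 4 x 32-bit add-with-carry chain.
inline AdcResult model_adc(const std::array<std::uint32_t, 4>& a,
                           const std::array<std::uint32_t, 4>& b,
                           unsigned carry_in)
{
  AdcResult r{{}, carry_in & 1u};
  for (int i = 0; i < 4; ++i)
    {
      // Widen to 64 bits so the carry out of each lane is visible in bit 32.
      const std::uint64_t sum = std::uint64_t(a[i]) + b[i] + r.carry_out;
      r.lanes[i] = static_cast<std::uint32_t>(sum);
      r.carry_out = static_cast<unsigned>(sum >> 32);
    }
  return r;
}
```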

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-base.cc (vadcq_vsbc): Add support
for vadcq.
* config/arm/arm-mve-builtins-base.def (vadcq): New.
* config/arm/arm-mve-builtins-base.h (vadcq): New.
* config/arm/arm_mve.h (vadcq): Delete.
(vadcq_m): Delete.
(vadcq_s32): Delete.
(vadcq_u32): Delete.
(vadcq_m_s32): Delete.
(vadcq_m_u32): Delete.
(__arm_vadcq_s32): Delete.
(__arm_vadcq_u32): Delete.
(__arm_vadcq_m_s32): Delete.
(__arm_vadcq_m_u32): Delete.
(__arm_vadcq): Delete.
(__arm_vadcq_m): Delete.

arm: [MVE intrinsics] rework vadciq

Implement vadciq using the new MVE builtins framework.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/

* config/arm/arm-mve-builtins-base.cc (class vadc_vsbc_impl): New.
(vadciq): New.
* config/arm/arm-mve-builtins-base.def (vadciq): New.
* config/arm/arm-mve-builtins-base.h (vadciq): New.
* config/arm/arm_mve.h (vadciq): Delete.
(vadciq_m): Delete.
(vadciq_s32): Delete.
(vadciq_u32): Delete.
(vadciq_m_s32): Delete.
(vadciq_m_u32): Delete.
(__arm_vadciq_s32): Delete.
(__arm_vadciq_u32): Delete.
(__arm_vadciq_m_s32): Delete.
(__arm_vadciq_m_u32): Delete.
(__arm_vadciq): Delete.
(__arm_vadciq_m): Delete.

arm: [MVE intrinsics] factorize vadc vadci vsbc vsbci

Factorize vadc/vsbc and vadci/vsbci so that they use the same
parameterized names.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VADCIQ_M_S, VADCIQ_M_U,
VADCIQ_U, VADCIQ_S, VADCQ_M_S, VADCQ_M_U, VADCQ_S, VADCQ_U,
VSBCIQ_M_S, VSBCIQ_M_U, VSBCIQ_S, VSBCIQ_U, VSBCQ_M_S, VSBCQ_M_U,
VSBCQ_S, VSBCQ_U.
(VADCIQ, VSBCIQ): Merge into ...
(VxCIQ): ... this.
(VADCIQ_M, VSBCIQ_M): Merge into ...
(VxCIQ_M): ... this.
(VSBCQ, VADCQ): Merge into ...
(VxCQ): ... this.
(VSBCQ_M, VADCQ_M): Merge into ...
(VxCQ_M): ... this.
* config/arm/mve.md
(mve_vadciq_<supf>v4si, mve_vsbciq_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_<supf>v4si): ... this.
(mve_vadciq_m_<supf>v4si, mve_vsbciq_m_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_m_<supf>v4si): ... this.
(mve_vadcq_<supf>v4si, mve_vsbcq_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_<supf>v4si): ... this.
(mve_vadcq_m_<supf>v4si, mve_vsbcq_m_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_m_<supf>v4si): ... this.

arm: [MVE intrinsics] add vadc_vsbc shape

This patch adds the vadc_vsbc shape description.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vadc_vsbc): New.
* config/arm/arm-mve-builtins-shapes.h (vadc_vsbc): New.

arm: [MVE intrinsics] remove vshlcq useless expanders

Since we rewrote the implementation of vshlcq intrinsics, we no longer
need these expanders.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-builtins.cc
(arm_ternop_unone_none_unone_imm_qualifiers)
(arm_ternop_none_none_unone_imm_qualifiers): Delete.
* config/arm/arm_mve_builtins.def (vshlcq_m_vec_s)
(vshlcq_m_carry_s, vshlcq_m_vec_u, vshlcq_m_carry_u): Delete.
* config/arm/mve.md (mve_vshlcq_vec_<supf><mode>): Delete.
(mve_vshlcq_carry_<supf><mode>): Delete.
(mve_vshlcq_m_vec_<supf><mode>): Delete.
(mve_vshlcq_m_carry_<supf><mode>): Delete.

arm: [MVE intrinsics] rework vshlcq

Implement vshlc using the new MVE builtins framework.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (class vshlc_impl): New.
(vshlc): New.
* config/arm/arm-mve-builtins-base.def (vshlcq): New.
* config/arm/arm-mve-builtins-base.h (vshlcq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vshlc.
* config/arm/arm_mve.h (vshlcq): Delete.
(vshlcq_m): Delete.
(vshlcq_s8): Delete.
(vshlcq_u8): Delete.
(vshlcq_s16): Delete.
(vshlcq_u16): Delete.
(vshlcq_s32): Delete.
(vshlcq_u32): Delete.
(vshlcq_m_s8): Delete.
(vshlcq_m_u8): Delete.
(vshlcq_m_s16): Delete.
(vshlcq_m_u16): Delete.
(vshlcq_m_s32): Delete.
(vshlcq_m_u32): Delete.
(__arm_vshlcq_s8): Delete.
(__arm_vshlcq_u8): Delete.
(__arm_vshlcq_s16): Delete.
(__arm_vshlcq_u16): Delete.
(__arm_vshlcq_s32): Delete.
(__arm_vshlcq_u32): Delete.
(__arm_vshlcq_m_s8): Delete.
(__arm_vshlcq_m_u8): Delete.
(__arm_vshlcq_m_s16): Delete.
(__arm_vshlcq_m_u16): Delete.
(__arm_vshlcq_m_s32): Delete.
(__arm_vshlcq_m_u32): Delete.
(__arm_vshlcq): Delete.
(__arm_vshlcq_m): Delete.
* config/arm/mve.md (mve_vshlcq_<supf><mode>): Add '@' prefix.
(mve_vshlcq_m_<supf><mode>): Likewise.

arm: [MVE intrinsics] add vshlc shape

This patch adds the vshlc shape description.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vshlc): New.
* config/arm/arm-mve-builtins-shapes.h (vshlc): New.

arm: [MVE intrinsics] remove useless v[id]wdup expanders

Like with vddup/vidup, we use code_for_mve_q_wb_u_insn, so we can drop
the expanders and their declarations as builtins, now useless.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-builtins.cc
(arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers): Delete.
* config/arm/arm_mve_builtins.def (viwdupq_wb_u, vdwdupq_wb_u)
(viwdupq_m_wb_u, vdwdupq_m_wb_u, viwdupq_m_n_u, vdwdupq_m_n_u)
(vdwdupq_n_u, viwdupq_n_u): Delete.
* config/arm/mve.md (mve_vdwdupq_n_u<mode>): Delete.
(mve_vdwdupq_wb_u<mode>): Delete.
(mve_vdwdupq_m_n_u<mode>): Delete.
(mve_vdwdupq_m_wb_u<mode>): Delete.

arm: [MVE intrinsics] update v[id]wdup tests

Testing v[id]wdup overloads with '1' as argument for uint32_t* does
not make sense: this patch adds a new 'uint32_t *a' parameter to foo2
in such tests.

The difference with v[id]dup tests (where we removed 'foo2') is that
in 'foo1' we test the overload with a variable 'wrap' parameter (b)
and we need foo2 to test the overload with an immediate (1).

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/

* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Use pointer
parameter in foo2.
* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.

arm: [MVE intrinsics] rework vdwdup viwdup

Implement vdwdup and viwdup using the new MVE builtins framework.

In order to share more code with viddup_impl, the patch swaps operands
1 and 2 in @mve_v[id]wdupq_m_wb_u<mode>_insn, so that the parameter
order is similar to what @mve_v[id]dupq_m_wb_u<mode>_insn uses.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (viddup_impl): Add support
for wrapping versions.
(vdwdupq): New.
(viwdupq): New.
* config/arm/arm-mve-builtins-base.def (vdwdupq): New.
(viwdupq): New.
* config/arm/arm-mve-builtins-base.h (vdwdupq): New.
(viwdupq): New.
* config/arm/arm_mve.h (vdwdupq_m): Delete.
(vdwdupq_u8): Delete.
(vdwdupq_u32): Delete.
(vdwdupq_u16): Delete.
(viwdupq_m): Delete.
(viwdupq_u8): Delete.
(viwdupq_u32): Delete.
(viwdupq_u16): Delete.
(vdwdupq_x_u8): Delete.
(vdwdupq_x_u16): Delete.
(vdwdupq_x_u32): Delete.
(viwdupq_x_u8): Delete.
(viwdupq_x_u16): Delete.
(viwdupq_x_u32): Delete.
(vdwdupq_m_n_u8): Delete.
(vdwdupq_m_n_u32): Delete.
(vdwdupq_m_n_u16): Delete.
(vdwdupq_m_wb_u8): Delete.
(vdwdupq_m_wb_u32): Delete.
(vdwdupq_m_wb_u16): Delete.
(vdwdupq_n_u8): Delete.
(vdwdupq_n_u32): Delete.
(vdwdupq_n_u16): Delete.
(vdwdupq_wb_u8): Delete.
(vdwdupq_wb_u32): Delete.
(vdwdupq_wb_u16): Delete.
(viwdupq_m_n_u8): Delete.
(viwdupq_m_n_u32): Delete.
(viwdupq_m_n_u16): Delete.
(viwdupq_m_wb_u8): Delete.
(viwdupq_m_wb_u32): Delete.
(viwdupq_m_wb_u16): Delete.
(viwdupq_n_u8): Delete.
(viwdupq_n_u32): Delete.
(viwdupq_n_u16): Delete.
(viwdupq_wb_u8): Delete.
(viwdupq_wb_u32): Delete.
(viwdupq_wb_u16): Delete.
(vdwdupq_x_n_u8): Delete.
(vdwdupq_x_n_u16): Delete.
(vdwdupq_x_n_u32): Delete.
(vdwdupq_x_wb_u8): Delete.
(vdwdupq_x_wb_u16): Delete.
(vdwdupq_x_wb_u32): Delete.
(viwdupq_x_n_u8): Delete.
(viwdupq_x_n_u16): Delete.
(viwdupq_x_n_u32): Delete.
(viwdupq_x_wb_u8): Delete.
(viwdupq_x_wb_u16): Delete.
(viwdupq_x_wb_u32): Delete.
(__arm_vdwdupq_m_n_u8): Delete.
(__arm_vdwdupq_m_n_u32): Delete.
(__arm_vdwdupq_m_n_u16): Delete.
(__arm_vdwdupq_m_wb_u8): Delete.
(__arm_vdwdupq_m_wb_u32): Delete.
(__arm_vdwdupq_m_wb_u16): Delete.
(__arm_vdwdupq_n_u8): Delete.
(__arm_vdwdupq_n_u32): Delete.
(__arm_vdwdupq_n_u16): Delete.
(__arm_vdwdupq_wb_u8): Delete.
(__arm_vdwdupq_wb_u32): Delete.
(__arm_vdwdupq_wb_u16): Delete.
(__arm_viwdupq_m_n_u8): Delete.
(__arm_viwdupq_m_n_u32): Delete.
(__arm_viwdupq_m_n_u16): Delete.
(__arm_viwdupq_m_wb_u8): Delete.
(__arm_viwdupq_m_wb_u32): Delete.
(__arm_viwdupq_m_wb_u16): Delete.
(__arm_viwdupq_n_u8): Delete.
(__arm_viwdupq_n_u32): Delete.
(__arm_viwdupq_n_u16): Delete.
(__arm_viwdupq_wb_u8): Delete.
(__arm_viwdupq_wb_u32): Delete.
(__arm_viwdupq_wb_u16): Delete.
(__arm_vdwdupq_x_n_u8): Delete.
(__arm_vdwdupq_x_n_u16): Delete.
(__arm_vdwdupq_x_n_u32): Delete.
(__arm_vdwdupq_x_wb_u8): Delete.
(__arm_vdwdupq_x_wb_u16): Delete.
(__arm_vdwdupq_x_wb_u32): Delete.
(__arm_viwdupq_x_n_u8): Delete.
(__arm_viwdupq_x_n_u16): Delete.
(__arm_viwdupq_x_n_u32): Delete.
(__arm_viwdupq_x_wb_u8): Delete.
(__arm_viwdupq_x_wb_u16): Delete.
(__arm_viwdupq_x_wb_u32): Delete.
(__arm_vdwdupq_m): Delete.
(__arm_vdwdupq_u8): Delete.
(__arm_vdwdupq_u32): Delete.
(__arm_vdwdupq_u16): Delete.
(__arm_viwdupq_m): Delete.
(__arm_viwdupq_u8): Delete.
(__arm_viwdupq_u32): Delete.
(__arm_viwdupq_u16): Delete.
(__arm_vdwdupq_x_u8): Delete.
(__arm_vdwdupq_x_u16): Delete.
(__arm_vdwdupq_x_u32): Delete.
(__arm_viwdupq_x_u8): Delete.
(__arm_viwdupq_x_u16): Delete.
(__arm_viwdupq_x_u32): Delete.
* config/arm/mve.md (@mve_<mve_insn>q_m_wb_u<mode>_insn): Swap
operands 1 and 2.

arm: [MVE intrinsics] add vidwdup shape

This patch adds the vidwdup shape description for vdwdup and viwdup.

It is very similar to viddup, but accounts for the additional 'wrap'
scalar parameter.
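
As a rough illustration of what the additional 'wrap' parameter means, here is a scalar C++ model of an incrementing wrap-duplicate on four 32-bit lanes. The name viwdup_u32_model and the exact wrap rule (the running offset is reset to 0 when it becomes equal to wrap) are assumptions based on our reading of the instruction, not something taken from this patch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model, not GCC code.
// Assumption: after each lane the offset grows by imm and is reset to 0
// when it becomes equal to wrap.  The final offset is written back to *a.
std::array<uint32_t, 4> viwdup_u32_model(uint32_t *a, uint32_t wrap,
                                         uint32_t imm)
{
  std::array<uint32_t, 4> lanes{};
  uint32_t offset = *a;
  for (uint32_t &lane : lanes) {
    lane = offset;
    offset += imm;
    if (offset == wrap)
      offset = 0;
  }
  *a = offset;
  return lanes;
}
```

Without the wrap argument and the reset, this degenerates to the plain incrementing form that the viddup shape describes.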

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vidwdup): New.
* config/arm/arm-mve-builtins-shapes.h (vidwdup): New.

arm: [MVE intrinsics] factorize vdwdup viwdup

Factorize vdwdup and viwdup so that they use the same parameterized
names.

Like with vddup and vidup, we do not bother with the corresponding
expanders, as we stop using them in a subsequent patch.

The patch also adds the missing attributes to vdwdupq_wb_u_insn and
viwdupq_wb_u_insn patterns.

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VIWDUPQ, VDWDUPQ,
VIWDUPQ_M, VDWDUPQ_M.
(VIDWDUPQ): New iterator.
(VIDWDUPQ_M): New iterator.
* config/arm/mve.md (mve_vdwdupq_wb_u<mode>_insn)
(mve_viwdupq_wb_u<mode>_insn): Merge into ...
(@mve_<mve_insn>q_wb_u<mode>_insn): ... this. Add missing
mve_unpredicated_insn and mve_move attributes.
(mve_vdwdupq_m_wb_u<mode>_insn, mve_viwdupq_m_wb_u<mode>_insn):
Merge into ...
(@mve_<mve_insn>q_m_wb_u<mode>_insn): ... this.

arm: [MVE intrinsics] fix checks of immediate arguments

As discussed in [1], it is better to use "su64" for immediates in
intrinsics signatures in order to provide better diagnostics
(erroneous constants are not truncated for instance). This patch thus
uses su64 instead of ss32 in binary_lshift_unsigned,
binary_rshift_narrow, binary_rshift_narrow_unsigned, ternary_lshift,
ternary_rshift.

In addition, we fix cases where we called require_integer_immediate
whereas we just want to check that the argument is a scalar, and thus
use require_scalar_type in binary_acca_int32, binary_acca_int64,
unary_int32_acc.

Finally, in binary_lshift_unsigned we just want to check that 'imm' is
an immediate, not the optional predicates.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660262.html
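
To illustrate the truncation problem mentioned above, the following sketch compares a range check done after narrowing a constant to 32 bits with the same check done on the full 64-bit value. The helper names and the [lo, hi] range are hypothetical; this is not the code in arm-mve-builtins-shapes.cc, only a model of why a wider immediate type gives better diagnostics.

```cpp
#include <cstdint>

// Hypothetical range check for an immediate, valid range [lo, hi].
static bool in_range(int64_t value, int64_t lo, int64_t hi)
{
  return value >= lo && value <= hi;
}

// Narrow to 32 bits first, then check: the high bits are lost before
// the check runs, so an erroneous constant can look valid.
bool check_after_narrowing(int64_t constant, int64_t lo, int64_t hi)
{
  uint32_t low_bits = static_cast<uint32_t>(constant);
  int32_t narrowed = static_cast<int32_t>(low_bits);
  return in_range(narrowed, lo, hi);
}

// Check the full 64-bit value: nothing is lost, the error is reported.
bool check_full_width(int64_t constant, int64_t lo, int64_t hi)
{
  return in_range(constant, lo, hi);
}
```

For example, with a valid range of [1, 8], the constant 0x100000001 is accepted by check_after_narrowing (it becomes 1) but rejected by check_full_width.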

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_acca_int32): Fix
check of scalar argument.
(binary_acca_int64): Likewise.
(binary_lshift_unsigned): Likewise.
(binary_rshift_narrow): Likewise.
(binary_rshift_narrow_unsigned): Likewise.
(ternary_lshift): Likewise.
(ternary_rshift): Likewise.
(unary_int32_acc): Likewise.

arm: [MVE intrinsics] remove v[id]dup expanders

We use code_for_mve_q_u_insn, rather than the expanders used by the
previous implementation, so we can remove the expanders and their
declaration as builtins.

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm_mve_builtins.def (vddupq_n_u, vidupq_n_u)
(vddupq_m_n_u, vidupq_m_n_u): Delete.
* config/arm/mve.md (mve_vidupq_n_u<mode>, mve_vidupq_m_n_u<mode>)
(mve_vddupq_n_u<mode>, mve_vddupq_m_n_u<mode>): Delete.

arm: [MVE intrinsics] update v[id]dup tests

Testing v[id]dup overloads with '1' as argument for uint32_t* does not
make sense: instead of choosing the '_wb' overload, we choose the
'_n', but we already do that in the '_n' tests.

This patch removes all such bogus foo2 functions.
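
The overload choice described above can be modelled with two plain C++ overloads. This is only a sketch: the names pick and Form are hypothetical, and the real intrinsics are not declared this way.

```cpp
#include <cstdint>

// Which form a call resolves to in this simplified model.
enum class Form { n, wb };

// '_n' form: the start value is passed by value.
Form pick(uint32_t) { return Form::n; }

// '_wb' form: the start value is passed by address and written back.
Form pick(uint32_t *) { return Form::wb; }
```

Here pick (1) resolves to the by-value overload because the literal 1 does not convert to uint32_t *, so such a call only re-tests the '_n' form; only a real pointer argument reaches the '_wb' form.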

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c: Remove foo2.

arm: [MVE intrinsics] rework vddup vidup

Implement vddup and vidup using the new MVE builtins framework.

We generate better code because we take advantage of the two outputs
produced by the v[id]dup instructions.

For instance, before:
ldr r3, [r0]
sub r2, r3, #8
str r2, [r0]
mov r2, r3
vddup.u16 q3, r2, #1

now:
ldr r2, [r0]
vddup.u16 q3, r2, #1
str r2, [r0]
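
For reference, a scalar C++ model of what the write-back form computes in the example above (eight 16-bit lanes, step 1). The function name is hypothetical; the behaviour is inferred from the 'before' sequence, where the value stored back is the start value minus 8.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a decrementing duplicate with write-back
// on 8 x 16-bit lanes.  Two outputs: the vector of lanes (returned) and
// the updated start value (stored through a).
std::array<uint16_t, 8> vddup_u16_wb_model(uint32_t *a, uint32_t imm)
{
  std::array<uint16_t, 8> lanes{};
  uint32_t offset = *a;
  for (uint16_t &lane : lanes) {
    lane = static_cast<uint16_t>(offset);
    offset -= imm;
  }
  *a = offset;  // start - 8 * imm
  return lanes;
}
```

With *a == 100 and imm == 1, the lanes are 100 down to 93 and *a becomes 92, which matches the 'sub r2, r3, #8' of the old sequence.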

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (class viddup_impl): New.
(vddup): New.
(vidup): New.
* config/arm/arm-mve-builtins-base.def (vddupq): New.
(vidupq): New.
* config/arm/arm-mve-builtins-base.h (vddupq): New.
(vidupq): New.
* config/arm/arm_mve.h (vddupq_m): Delete.
(vddupq_u8): Delete.
(vddupq_u32): Delete.
(vddupq_u16): Delete.
(vidupq_m): Delete.
(vidupq_u8): Delete.
(vidupq_u32): Delete.
(vidupq_u16): Delete.
(vddupq_x_u8): Delete.
(vddupq_x_u16): Delete.
(vddupq_x_u32): Delete.
(vidupq_x_u8): Delete.
(vidupq_x_u16): Delete.
(vidupq_x_u32): Delete.
(vddupq_m_n_u8): Delete.
(vddupq_m_n_u32): Delete.
(vddupq_m_n_u16): Delete.
(vddupq_m_wb_u8): Delete.
(vddupq_m_wb_u16): Delete.
(vddupq_m_wb_u32): Delete.
(vddupq_n_u8): Delete.
(vddupq_n_u32): Delete.
(vddupq_n_u16): Delete.
(vddupq_wb_u8): Delete.
(vddupq_wb_u16): Delete.
(vddupq_wb_u32): Delete.
(vidupq_m_n_u8): Delete.
(vidupq_m_n_u32): Delete.
(vidupq_m_n_u16): Delete.
(vidupq_m_wb_u8): Delete.
(vidupq_m_wb_u16): Delete.
(vidupq_m_wb_u32): Delete.
(vidupq_n_u8): Delete.
(vidupq_n_u32): Delete.
(vidupq_n_u16): Delete.
(vidupq_wb_u8): Delete.
(vidupq_wb_u16): Delete.
(vidupq_wb_u32): Delete.
(vddupq_x_n_u8): Delete.
(vddupq_x_n_u16): Delete.
(vddupq_x_n_u32): Delete.
(vddupq_x_wb_u8): Delete.
(vddupq_x_wb_u16): Delete.
(vddupq_x_wb_u32): Delete.
(vidupq_x_n_u8): Delete.
(vidupq_x_n_u16): Delete.
(vidupq_x_n_u32): Delete.
(vidupq_x_wb_u8): Delete.
(vidupq_x_wb_u16): Delete.
(vidupq_x_wb_u32): Delete.
(__arm_vddupq_m_n_u8): Delete.
(__arm_vddupq_m_n_u32): Delete.
(__arm_vddupq_m_n_u16): Delete.
(__arm_vddupq_m_wb_u8): Delete.
(__arm_vddupq_m_wb_u16): Delete.
(__arm_vddupq_m_wb_u32): Delete.
(__arm_vddupq_n_u8): Delete.
(__arm_vddupq_n_u32): Delete.
(__arm_vddupq_n_u16): Delete.
(__arm_vidupq_m_n_u8): Delete.
(__arm_vidupq_m_n_u32): Delete.
(__arm_vidupq_m_n_u16): Delete.
(__arm_vidupq_n_u8): Delete.
(__arm_vidupq_m_wb_u8): Delete.
(__arm_vidupq_m_wb_u16): Delete.
(__arm_vidupq_m_wb_u32): Delete.
(__arm_vidupq_n_u32): Delete.
(__arm_vidupq_n_u16): Delete.
(__arm_vidupq_wb_u8): Delete.
(__arm_vidupq_wb_u16): Delete.
(__arm_vidupq_wb_u32): Delete.
(__arm_vddupq_wb_u8): Delete.
(__arm_vddupq_wb_u16): Delete.
(__arm_vddupq_wb_u32): Delete.
(__arm_vddupq_x_n_u8): Delete.
(__arm_vddupq_x_n_u16): Delete.
(__arm_vddupq_x_n_u32): Delete.
(__arm_vddupq_x_wb_u8): Delete.
(__arm_vddupq_x_wb_u16): Delete.
(__arm_vddupq_x_wb_u32): Delete.
(__arm_vidupq_x_n_u8): Delete.
(__arm_vidupq_x_n_u16): Delete.
(__arm_vidupq_x_n_u32): Delete.
(__arm_vidupq_x_wb_u8): Delete.
(__arm_vidupq_x_wb_u16): Delete.
(__arm_vidupq_x_wb_u32): Delete.
(__arm_vddupq_m): Delete.
(__arm_vddupq_u8): Delete.
(__arm_vddupq_u32): Delete.
(__arm_vddupq_u16): Delete.
(__arm_vidupq_m): Delete.
(__arm_vidupq_u8): Delete.
(__arm_vidupq_u32): Delete.
(__arm_vidupq_u16): Delete.
(__arm_vddupq_x_u8): Delete.
(__arm_vddupq_x_u16): Delete.
(__arm_vddupq_x_u32): Delete.
(__arm_vidupq_x_u8): Delete.
(__arm_vidupq_x_u16): Delete.
(__arm_vidupq_x_u32): Delete.

arm: [MVE intrinsics] add viddup shape

This patch adds the viddup shape description for vidup and vddup.

This requires the addition of report_not_one_of and
function_checker::require_immediate_one_of to
gcc/config/arm/arm-mve-builtins.cc (they are copies of the aarch64 SVE
counterpart).
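
A minimal sketch of the kind of check this enables, assuming the step immediate of vidup/vddup must be 1, 2, 4 or 8 (an assumption, not stated in this message). The function name, its signature and the diagnostic text are hypothetical and do not reproduce the actual GCC helpers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for an "immediate must be one of four values"
// check.  Returns true if actual matches; otherwise fills diagnostic.
bool check_immediate_one_of(int64_t actual, int64_t v0, int64_t v1,
                            int64_t v2, int64_t v3,
                            std::string &diagnostic)
{
  if (actual == v0 || actual == v1 || actual == v2 || actual == v3) {
    diagnostic.clear();
    return true;
  }
  diagnostic = "immediate " + std::to_string(actual)
               + " is not one of " + std::to_string(v0) + ", "
               + std::to_string(v1) + ", " + std::to_string(v2) + ", "
               + std::to_string(v3);
  return false;
}
```

Splitting the test from the reporting mirrors the pair named above: one function decides, the other words the error.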

This patch also introduces MODE_wb.

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-shapes.cc (viddup): New.
* config/arm/arm-mve-builtins-shapes.h (viddup): New.
* config/arm/arm-mve-builtins.cc (report_not_one_of): New.
(function_checker::require_immediate_one_of): New.
* config/arm/arm-mve-builtins.def (wb): New mode.
* config/arm/arm-mve-builtins.h (function_checker): Add
require_immediate_one_of.

arm: [MVE intrinsics] factorize vddup vidup

Factorize vddup and vidup so that they use the same parameterized
names.

This patch updates only the (define_insn
"@mve_<mve_insn>q_u<mode>_insn") patterns and does not bother with the
(define_expand "mve_vidupq_n_u<mode>") ones, because a subsequent
patch avoids using them.

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VIDUPQ, VDDUPQ,
VIDUPQ_M, VDDUPQ_M.
(viddupq_op): New.
(viddupq_m_op): New.
(VIDDUPQ): New.
(VIDDUPQ_M): New.
* config/arm/mve.md (mve_vddupq_u<mode>_insn)
(mve_vidupq_u<mode>_insn): Merge into ...
(mve_<mve_insn>q_u<mode>_insn): ... this.
(mve_vddupq_m_wb_u<mode>_insn, mve_vidupq_m_wb_u<mode>_insn):
Merge into ...
(mve_<mve_insn>q_m_wb_u<mode>_insn): ... this.

arm: [MVE intrinsics] rework vctp

Implement vctp using the new MVE builtins framework.

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class vctpq_impl): New.
(vctp16q): New.
(vctp32q): New.
(vctp64q): New.
(vctp8q): New.
* config/arm/arm-mve-builtins-base.def (vctp16q): New.
(vctp32q): New.
(vctp64q): New.
(vctp8q): New.
* config/arm/arm-mve-builtins-base.h (vctp16q): New.
(vctp32q): New.
(vctp64q): New.
(vctp8q): New.
* config/arm/arm-mve-builtins-shapes.cc (vctp): New.
* config/arm/arm-mve-builtins-shapes.h (vctp): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Add support for vctp.
* config/arm/arm_mve.h (vctp16q): Delete.
(vctp32q): Delete.
(vctp64q): Delete.
(vctp8q): Delete.
(vctp8q_m): Delete.
(vctp64q_m): Delete.
(vctp32q_m): Delete.
(vctp16q_m): Delete.
(__arm_vctp16q): Delete.
(__arm_vctp32q): Delete.
(__arm_vctp64q): Delete.
(__arm_vctp8q): Delete.
(__arm_vctp8q_m): Delete.
(__arm_vctp64q_m): Delete.
(__arm_vctp32q_m): Delete.
(__arm_vctp16q_m): Delete.
* config/arm/mve.md (mve_vctp<MVE_vctp>q<MVE_vpred>): Add '@'
prefix.
(mve_vctp<MVE_vctp>q_m<MVE_vpred>): Likewise.

arm: [MVE intrinsics] rework vorn

Implement vorn using the new MVE builtins framework.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (vornq): New.
* config/arm/arm-mve-builtins-base.def (vornq): New.
* config/arm/arm-mve-builtins-base.h (vornq): New.
* config/arm/arm-mve-builtins-functions.h (class
unspec_based_mve_function_exact_insn_vorn): New.
* config/arm/arm_mve.h (vornq): Delete.
(vornq_m): Delete.
(vornq_x): Delete.
(vornq_u8): Delete.
(vornq_s8): Delete.
(vornq_u16): Delete.
(vornq_s16): Delete.
(vornq_u32): Delete.
(vornq_s32): Delete.
(vornq_f16): Delete.
(vornq_f32): Delete.
(vornq_m_s8): Delete.
(vornq_m_s32): Delete.
(vornq_m_s16): Delete.
(vornq_m_u8): Delete.
(vornq_m_u32): Delete.
(vornq_m_u16): Delete.
(vornq_m_f32): Delete.
(vornq_m_f16): Delete.
(vornq_x_s8): Delete.
(vornq_x_s16): Delete.
(vornq_x_s32): Delete.
(vornq_x_u8): Delete.
(vornq_x_u16): Delete.
(vornq_x_u32): Delete.
(vornq_x_f16): Delete.
(vornq_x_f32): Delete.
(__arm_vornq_u8): Delete.
(__arm_vornq_s8): Delete.
(__arm_vornq_u16): Delete.
(__arm_vornq_s16): Delete.
(__arm_vornq_u32): Delete.
(__arm_vornq_s32): Delete.
(__arm_vornq_m_s8): Delete.
(__arm_vornq_m_s32): Delete.
(__arm_vornq_m_s16): Delete.
(__arm_vornq_m_u8): Delete.
(__arm_vornq_m_u32): Delete.
(__arm_vornq_m_u16): Delete.
(__arm_vornq_x_s8): Delete.
(__arm_vornq_x_s16): Delete.
(__arm_vornq_x_s32): Delete.
(__arm_vornq_x_u8): Delete.
(__arm_vornq_x_u16): Delete.
(__arm_vornq_x_u32): Delete.
(__arm_vornq_f16): Delete.
(__arm_vornq_f32): Delete.
(__arm_vornq_m_f32): Delete.
(__arm_vornq_m_f16): Delete.
(__arm_vornq_x_f16): Delete.
(__arm_vornq_x_f32): Delete.
(__arm_vornq): Delete.
(__arm_vornq_m): Delete.
(__arm_vornq_x): Delete.

arm: [MVE intrinsics] factorize vorn

Factorize vorn so that it uses parameterized names.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC): Add VORNQ_M_S,
VORNQ_M_U.
(MVE_FP_M_BINARY_LOGIC): Add VORNQ_M_F.
(mve_insn): Add VORNQ_M_S, VORNQ_M_U, VORNQ_M_F.
* config/arm/mve.md (mve_vornq_s<mode>): Rename into ...
(@mve_vornq_s<mode>): ... this.
(mve_vornq_u<mode>): Rename into ...
(@mve_vornq_u<mode>): ... this.
(mve_vornq_f<mode>): Rename into ...
(@mve_vornq_f<mode>): ... this.
(mve_vornq_m_<supf><mode>): Merge into vand/vbic pattern.
(mve_vornq_m_f<mode>): Likewise.

arm: [MVE intrinsics] rework vbicq

Implement vbicq using the new MVE builtins framework.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (vbicq): New.
* config/arm/arm-mve-builtins-base.def (vbicq): New.
* config/arm/arm-mve-builtins-base.h (vbicq): New.
* config/arm/arm-mve-builtins-functions.h (class
unspec_based_mve_function_exact_insn_vbic): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Add support for vbicq.
* config/arm/arm_mve.h (vbicq): Delete.
(vbicq_m_n): Delete.
(vbicq_m): Delete.
(vbicq_x): Delete.
(vbicq_u8): Delete.
(vbicq_s8): Delete.
(vbicq_u16): Delete.
(vbicq_s16): Delete.
(vbicq_u32): Delete.
(vbicq_s32): Delete.
(vbicq_n_u16): Delete.
(vbicq_f16): Delete.
(vbicq_n_s16): Delete.
(vbicq_n_u32): Delete.
(vbicq_f32): Delete.
(vbicq_n_s32): Delete.
(vbicq_m_n_s16): Delete.
(vbicq_m_n_s32): Delete.
(vbicq_m_n_u16): Delete.
(vbicq_m_n_u32): Delete.
(vbicq_m_s8): Delete.
(vbicq_m_s32): Delete.
(vbicq_m_s16): Delete.
(vbicq_m_u8): Delete.
(vbicq_m_u32): Delete.
(vbicq_m_u16): Delete.
(vbicq_m_f32): Delete.
(vbicq_m_f16): Delete.
(vbicq_x_s8): Delete.
(vbicq_x_s16): Delete.
(vbicq_x_s32): Delete.
(vbicq_x_u8): Delete.
(vbicq_x_u16): Delete.
(vbicq_x_u32): Delete.
(vbicq_x_f16): Delete.
(vbicq_x_f32): Delete.
(__arm_vbicq_u8): Delete.
(__arm_vbicq_s8): Delete.
(__arm_vbicq_u16): Delete.
(__arm_vbicq_s16): Delete.
(__arm_vbicq_u32): Delete.
(__arm_vbicq_s32): Delete.
(__arm_vbicq_n_u16): Delete.
(__arm_vbicq_n_s16): Delete.
(__arm_vbicq_n_u32): Delete.
(__arm_vbicq_n_s32): Delete.
(__arm_vbicq_m_n_s16): Delete.
(__arm_vbicq_m_n_s32): Delete.
(__arm_vbicq_m_n_u16): Delete.
(__arm_vbicq_m_n_u32): Delete.
(__arm_vbicq_m_s8): Delete.
(__arm_vbicq_m_s32): Delete.
(__arm_vbicq_m_s16): Delete.
(__arm_vbicq_m_u8): Delete.
(__arm_vbicq_m_u32): Delete.
(__arm_vbicq_m_u16): Delete.
(__arm_vbicq_x_s8): Delete.
(__arm_vbicq_x_s16): Delete.
(__arm_vbicq_x_s32): Delete.
(__arm_vbicq_x_u8): Delete.
(__arm_vbicq_x_u16): Delete.
(__arm_vbicq_x_u32): Delete.
(__arm_vbicq_f16): Delete.
(__arm_vbicq_f32): Delete.
(__arm_vbicq_m_f32): Delete.
(__arm_vbicq_m_f16): Delete.
(__arm_vbicq_x_f16): Delete.
(__arm_vbicq_x_f32): Delete.
(__arm_vbicq): Delete.
(__arm_vbicq_m_n): Delete.
(__arm_vbicq_m): Delete.
(__arm_vbicq_x): Delete.
* config/arm/mve.md (mve_vbicq_u<mode>): Rename into ...
(@mve_vbicq_u<mode>): ... this.
(mve_vbicq_s<mode>): Rename into ...
(@mve_vbicq_s<mode>): ... this.
(mve_vbicq_f<mode>): Rename into ...
(@mve_vbicq_f<mode>): ... this.

arm: [MVE intrinsics] rework vcvtaq vcvtmq vcvtnq vcvtpq

Implement vcvtaq vcvtmq vcvtnq vcvtpq using the new MVE builtins
framework.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (vcvtaq): New.
(vcvtmq): New.
(vcvtnq): New.
(vcvtpq): New.
* config/arm/arm-mve-builtins-base.def (vcvtaq): New.
(vcvtmq): New.
(vcvtnq): New.
(vcvtpq): New.
* config/arm/arm-mve-builtins-base.h (vcvtaq): New.
(vcvtmq): New.
(vcvtnq): New.
(vcvtpq): New.
* config/arm/arm-mve-builtins.cc (cvtx): New type.
* config/arm/arm_mve.h (vcvtaq_m): Delete.
(vcvtmq_m): Delete.
(vcvtnq_m): Delete.
(vcvtpq_m): Delete.
(vcvtaq_s16_f16): Delete.
(vcvtaq_s32_f32): Delete.
(vcvtnq_s16_f16): Delete.
(vcvtnq_s32_f32): Delete.
(vcvtpq_s16_f16): Delete.
(vcvtpq_s32_f32): Delete.
(vcvtmq_s16_f16): Delete.
(vcvtmq_s32_f32): Delete.
(vcvtpq_u16_f16): Delete.
(vcvtpq_u32_f32): Delete.
(vcvtnq_u16_f16): Delete.
(vcvtnq_u32_f32): Delete.
(vcvtmq_u16_f16): Delete.
(vcvtmq_u32_f32): Delete.
(vcvtaq_u16_f16): Delete.
(vcvtaq_u32_f32): Delete.
(vcvtaq_m_s16_f16): Delete.
(vcvtaq_m_u16_f16): Delete.
(vcvtaq_m_s32_f32): Delete.
(vcvtaq_m_u32_f32): Delete.
(vcvtmq_m_s16_f16): Delete.
(vcvtnq_m_s16_f16): Delete.
(vcvtpq_m_s16_f16): Delete.
(vcvtmq_m_u16_f16): Delete.
(vcvtnq_m_u16_f16): Delete.
(vcvtpq_m_u16_f16): Delete.
(vcvtmq_m_s32_f32): Delete.
(vcvtnq_m_s32_f32): Delete.
(vcvtpq_m_s32_f32): Delete.
(vcvtmq_m_u32_f32): Delete.
(vcvtnq_m_u32_f32): Delete.
(vcvtpq_m_u32_f32): Delete.
(vcvtaq_x_s16_f16): Delete.
(vcvtaq_x_s32_f32): Delete.
(vcvtaq_x_u16_f16): Delete.
(vcvtaq_x_u32_f32): Delete.
(vcvtnq_x_s16_f16): Delete.
(vcvtnq_x_s32_f32): Delete.
(vcvtnq_x_u16_f16): Delete.
(vcvtnq_x_u32_f32): Delete.
(vcvtpq_x_s16_f16): Delete.
(vcvtpq_x_s32_f32): Delete.
(vcvtpq_x_u16_f16): Delete.
(vcvtpq_x_u32_f32): Delete.
(vcvtmq_x_s16_f16): Delete.
(vcvtmq_x_s32_f32): Delete.
(vcvtmq_x_u16_f16): Delete.
(vcvtmq_x_u32_f32): Delete.
(__arm_vcvtpq_u16_f16): Delete.
(__arm_vcvtpq_u32_f32): Delete.
(__arm_vcvtnq_u16_f16): Delete.
(__arm_vcvtnq_u32_f32): Delete.
(__arm_vcvtmq_u16_f16): Delete.
(__arm_vcvtmq_u32_f32): Delete.
(__arm_vcvtaq_u16_f16): Delete.
(__arm_vcvtaq_u32_f32): Delete.
(__arm_vcvtaq_s16_f16): Delete.
(__arm_vcvtaq_s32_f32): Delete.
(__arm_vcvtnq_s16_f16): Delete.
(__arm_vcvtnq_s32_f32): Delete.
(__arm_vcvtpq_s16_f16): Delete.
(__arm_vcvtpq_s32_f32): Delete.
(__arm_vcvtmq_s16_f16): Delete.
(__arm_vcvtmq_s32_f32): Delete.
(__arm_vcvtaq_m_s16_f16): Delete.
(__arm_vcvtaq_m_u16_f16): Delete.
(__arm_vcvtaq_m_s32_f32): Delete.
(__arm_vcvtaq_m_u32_f32): Delete.
(__arm_vcvtmq_m_s16_f16): Delete.
(__arm_vcvtnq_m_s16_f16): Delete.
(__arm_vcvtpq_m_s16_f16): Delete.
(__arm_vcvtmq_m_u16_f16): Delete.
(__arm_vcvtnq_m_u16_f16): Delete.
(__arm_vcvtpq_m_u16_f16): Delete.
(__arm_vcvtmq_m_s32_f32): Delete.
(__arm_vcvtnq_m_s32_f32): Delete.
(__arm_vcvtpq_m_s32_f32): Delete.
(__arm_vcvtmq_m_u32_f32): Delete.
(__arm_vcvtnq_m_u32_f32): Delete.
(__arm_vcvtpq_m_u32_f32): Delete.
(__arm_vcvtaq_x_s16_f16): Delete.
(__arm_vcvtaq_x_s32_f32): Delete.
(__arm_vcvtaq_x_u16_f16): Delete.
(__arm_vcvtaq_x_u32_f32): Delete.
(__arm_vcvtnq_x_s16_f16): Delete.
(__arm_vcvtnq_x_s32_f32): Delete.
(__arm_vcvtnq_x_u16_f16): Delete.
(__arm_vcvtnq_x_u32_f32): Delete.
(__arm_vcvtpq_x_s16_f16): Delete.
(__arm_vcvtpq_x_s32_f32): Delete.
(__arm_vcvtpq_x_u16_f16): Delete.
(__arm_vcvtpq_x_u32_f32): Delete.
(__arm_vcvtmq_x_s16_f16): Delete.
(__arm_vcvtmq_x_s32_f32): Delete.
(__arm_vcvtmq_x_u16_f16): Delete.
(__arm_vcvtmq_x_u32_f32): Delete.
(__arm_vcvtaq_m): Delete.
(__arm_vcvtmq_m): Delete.
(__arm_vcvtnq_m): Delete.
(__arm_vcvtpq_m): Delete.

arm: [MVE intrinsics] add vcvtx shape

This patch adds the vcvtx shape description for vcvtaq, vcvtmq,
vcvtnq, vcvtpq.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vcvtx): New.
* config/arm/arm-mve-builtins-shapes.h (vcvtx): New.

arm: [MVE intrinsics] factorize vcvtaq vcvtmq vcvtnq vcvtpq

Factorize vcvtaq vcvtmq vcvtnq vcvtpq builtins so that they use the
same parameterized names.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VCVTAQ_M_S, VCVTAQ_M_U,
VCVTAQ_S, VCVTAQ_U, VCVTMQ_M_S, VCVTMQ_M_U, VCVTMQ_S, VCVTMQ_U,
VCVTNQ_M_S, VCVTNQ_M_U, VCVTNQ_S, VCVTNQ_U, VCVTPQ_M_S,
VCVTPQ_M_U, VCVTPQ_S, VCVTPQ_U.
(VCVTAQ, VCVTPQ, VCVTNQ, VCVTMQ, VCVTAQ_M, VCVTMQ_M, VCVTNQ_M)
(VCVTPQ_M): Delete.
(VCVTxQ, VCVTxQ_M): New.
* config/arm/mve.md (mve_vcvtpq_<supf><mode>)
(mve_vcvtnq_<supf><mode>, mve_vcvtmq_<supf><mode>)
(mve_vcvtaq_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.
(mve_vcvtaq_m_<supf><mode>, mve_vcvtmq_m_<supf><mode>)
(mve_vcvtpq_m_<supf><mode>, mve_vcvtnq_m_<supf><mode>): Merge into
...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.

arm: [MVE intrinsics] rework vcvtbq_f16_f32 vcvttq_f16_f32 vcvtbq_f32_f16 vcvttq_f32_f16

Implement vcvtbq_f16_f32, vcvttq_f16_f32, vcvtbq_f32_f16 and
vcvttq_f32_f16 using the new MVE builtins framework.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (class vcvtxq_impl): New.
(vcvtbq, vcvttq): New.
* config/arm/arm-mve-builtins-base.def (vcvtbq, vcvttq): New.
* config/arm/arm-mve-builtins-base.h (vcvtbq, vcvttq): New.
* config/arm/arm-mve-builtins.cc (cvt_f16_f32, cvt_f32_f16): New
types.
(function_instance::has_inactive_argument): Support vcvtbq and
vcvttq.
* config/arm/arm_mve.h (vcvttq_f32): Delete.
(vcvtbq_f32): Delete.
(vcvtbq_m): Delete.
(vcvttq_m): Delete.
(vcvttq_f32_f16): Delete.
(vcvtbq_f32_f16): Delete.
(vcvttq_f16_f32): Delete.
(vcvtbq_f16_f32): Delete.
(vcvtbq_m_f16_f32): Delete.
(vcvtbq_m_f32_f16): Delete.
(vcvttq_m_f16_f32): Delete.
(vcvttq_m_f32_f16): Delete.
(vcvtbq_x_f32_f16): Delete.
(vcvttq_x_f32_f16): Delete.
(__arm_vcvttq_f32_f16): Delete.
(__arm_vcvtbq_f32_f16): Delete.
(__arm_vcvttq_f16_f32): Delete.
(__arm_vcvtbq_f16_f32): Delete.
(__arm_vcvtbq_m_f16_f32): Delete.
(__arm_vcvtbq_m_f32_f16): Delete.
(__arm_vcvttq_m_f16_f32): Delete.
(__arm_vcvttq_m_f32_f16): Delete.
(__arm_vcvtbq_x_f32_f16): Delete.
(__arm_vcvttq_x_f32_f16): Delete.
(__arm_vcvttq_f32): Delete.
(__arm_vcvtbq_f32): Delete.
(__arm_vcvtbq_m): Delete.
(__arm_vcvttq_m): Delete.

arm: [MVE intrinsics] add vcvt_f16_f32 and vcvt_f32_f16 shapes

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 shapes descriptions.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vcvt_f16_f32)
(vcvt_f32_f16): New.
* config/arm/arm-mve-builtins-shapes.h (vcvt_f16_f32)
(vcvt_f32_f16): New.

arm: [MVE intrinsics] factorize vcvtbq vcvttq

Factorize vcvtbq, vcvttq so that they use the same parameterized
names.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VCVTBQ_F16_F32,
VCVTTQ_F16_F32, VCVTBQ_F32_F16, VCVTTQ_F32_F16, VCVTBQ_M_F16_F32,
VCVTTQ_M_F16_F32, VCVTBQ_M_F32_F16, VCVTTQ_M_F32_F16.
(VCVTxQ_F16_F32): New iterator.
(VCVTxQ_F32_F16): Likewise.
(VCVTxQ_M_F16_F32): Likewise.
(VCVTxQ_M_F32_F16): Likewise.
* config/arm/mve.md (mve_vcvttq_f32_f16v4sf)
(mve_vcvtbq_f32_f16v4sf): Merge into ...
(@mve_<mve_insn>q_f32_f16v4sf): ... this.
(mve_vcvtbq_f16_f32v8hf, mve_vcvttq_f16_f32v8hf): Merge into ...
(@mve_<mve_insn>q_f16_f32v8hf): ... this.
(mve_vcvtbq_m_f16_f32v8hf, mve_vcvttq_m_f16_f32v8hf): Merge into
...
(@mve_<mve_insn>q_m_f16_f32v8hf): ... this.
(mve_vcvtbq_m_f32_f16v4sf, mve_vcvttq_m_f32_f16v4sf): Merge into
...
(@mve_<mve_insn>q_m_f32_f16v4sf): ... this.

arm: [MVE intrinsics] rework vcvtq

Implement vcvtq using the new MVE builtins framework.

In config/arm/arm-mve-builtins-base.def, the patch also restores the
alphabetical order.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (class vcvtq_impl): New.
(vcvtq): New.
* config/arm/arm-mve-builtins-base.def (vcvtq): New.
* config/arm/arm-mve-builtins-base.h (vcvtq): New.
* config/arm/arm-mve-builtins.cc (cvt): New type.
* config/arm/arm_mve.h (vcvtq): Delete.
(vcvtq_n): Delete.
(vcvtq_m): Delete.
(vcvtq_m_n): Delete.
(vcvtq_x): Delete.
(vcvtq_x_n): Delete.
(vcvtq_f16_s16): Delete.
(vcvtq_f32_s32): Delete.
(vcvtq_f16_u16): Delete.
(vcvtq_f32_u32): Delete.
(vcvtq_s16_f16): Delete.
(vcvtq_s32_f32): Delete.
(vcvtq_u16_f16): Delete.
(vcvtq_u32_f32): Delete.
(vcvtq_n_f16_s16): Delete.
(vcvtq_n_f32_s32): Delete.
(vcvtq_n_f16_u16): Delete.
(vcvtq_n_f32_u32): Delete.
(vcvtq_n_s16_f16): Delete.
(vcvtq_n_s32_f32): Delete.
(vcvtq_n_u16_f16): Delete.
(vcvtq_n_u32_f32): Delete.
(vcvtq_m_f16_s16): Delete.
(vcvtq_m_f16_u16): Delete.
(vcvtq_m_f32_s32): Delete.
(vcvtq_m_f32_u32): Delete.
(vcvtq_m_s16_f16): Delete.
(vcvtq_m_u16_f16): Delete.
(vcvtq_m_s32_f32): Delete.
(vcvtq_m_u32_f32): Delete.
(vcvtq_m_n_f16_u16): Delete.
(vcvtq_m_n_f16_s16): Delete.
(vcvtq_m_n_f32_u32): Delete.
(vcvtq_m_n_f32_s32): Delete.
(vcvtq_m_n_s32_f32): Delete.
(vcvtq_m_n_s16_f16): Delete.
(vcvtq_m_n_u32_f32): Delete.
(vcvtq_m_n_u16_f16): Delete.
(vcvtq_x_f16_u16): Delete.
(vcvtq_x_f16_s16): Delete.
(vcvtq_x_f32_s32): Delete.
(vcvtq_x_f32_u32): Delete.
(vcvtq_x_n_f16_s16): Delete.
(vcvtq_x_n_f16_u16): Delete.
(vcvtq_x_n_f32_s32): Delete.
(vcvtq_x_n_f32_u32): Delete.
(vcvtq_x_s16_f16): Delete.
(vcvtq_x_s32_f32): Delete.
(vcvtq_x_u16_f16): Delete.
(vcvtq_x_u32_f32): Delete.
(vcvtq_x_n_s16_f16): Delete.
(vcvtq_x_n_s32_f32): Delete.
(vcvtq_x_n_u16_f16): Delete.
(vcvtq_x_n_u32_f32): Delete.
(__arm_vcvtq_f16_s16): Delete.
(__arm_vcvtq_f32_s32): Delete.
(__arm_vcvtq_f16_u16): Delete.
(__arm_vcvtq_f32_u32): Delete.
(__arm_vcvtq_s16_f16): Delete.
(__arm_vcvtq_s32_f32): Delete.
(__arm_vcvtq_u16_f16): Delete.
(__arm_vcvtq_u32_f32): Delete.
(__arm_vcvtq_n_f16_s16): Delete.
(__arm_vcvtq_n_f32_s32): Delete.
(__arm_vcvtq_n_f16_u16): Delete.
(__arm_vcvtq_n_f32_u32): Delete.
(__arm_vcvtq_n_s16_f16): Delete.
(__arm_vcvtq_n_s32_f32): Delete.
(__arm_vcvtq_n_u16_f16): Delete.
(__arm_vcvtq_n_u32_f32): Delete.
(__arm_vcvtq_m_f16_s16): Delete.
(__arm_vcvtq_m_f16_u16): Delete.
(__arm_vcvtq_m_f32_s32): Delete.
(__arm_vcvtq_m_f32_u32): Delete.
(__arm_vcvtq_m_s16_f16): Delete.
(__arm_vcvtq_m_u16_f16): Delete.
(__arm_vcvtq_m_s32_f32): Delete.
(__arm_vcvtq_m_u32_f32): Delete.
(__arm_vcvtq_m_n_f16_u16): Delete.
(__arm_vcvtq_m_n_f16_s16): Delete.
(__arm_vcvtq_m_n_f32_u32): Delete.
(__arm_vcvtq_m_n_f32_s32): Delete.
(__arm_vcvtq_m_n_s32_f32): Delete.
(__arm_vcvtq_m_n_s16_f16): Delete.
(__arm_vcvtq_m_n_u32_f32): Delete.
(__arm_vcvtq_m_n_u16_f16): Delete.
(__arm_vcvtq_x_f16_u16): Delete.
(__arm_vcvtq_x_f16_s16): Delete.
(__arm_vcvtq_x_f32_s32): Delete.
(__arm_vcvtq_x_f32_u32): Delete.
(__arm_vcvtq_x_n_f16_s16): Delete.
(__arm_vcvtq_x_n_f16_u16): Delete.
(__arm_vcvtq_x_n_f32_s32): Delete.
(__arm_vcvtq_x_n_f32_u32): Delete.
(__arm_vcvtq_x_s16_f16): Delete.
(__arm_vcvtq_x_s32_f32): Delete.
(__arm_vcvtq_x_u16_f16): Delete.
(__arm_vcvtq_x_u32_f32): Delete.
(__arm_vcvtq_x_n_s16_f16): Delete.
(__arm_vcvtq_x_n_s32_f32): Delete.
(__arm_vcvtq_x_n_u16_f16): Delete.
(__arm_vcvtq_x_n_u32_f32): Delete.
(__arm_vcvtq): Delete.
(__arm_vcvtq_n): Delete.
(__arm_vcvtq_m): Delete.
(__arm_vcvtq_m_n): Delete.
(__arm_vcvtq_x): Delete.
(__arm_vcvtq_x_n): Delete.

arm: [MVE intrinsics] add vcvt shape

This patch adds the vcvt shape description.

It needs to add a new type_suffix_info parameter to
explicit_type_suffix_p (), because vcvt uses overloads for type
suffixes for integer to floating-point conversions, but not for
floating-point to integer.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc
(nonoverloaded_base::explicit_type_suffix_p): Add unused
type_suffix_info parameter.
(overloaded_base::explicit_type_suffix_p): Likewise.
(unary_n_def::explicit_type_suffix_p): Likewise.
(vcvt): New.
* config/arm/arm-mve-builtins-shapes.h (vcvt): New.
* config/arm/arm-mve-builtins.cc (function_builder::get_name): Add
new type_suffix parameter.
(function_builder::add_overloaded_functions): Likewise.
* config/arm/arm-mve-builtins.h
(function_shape::explicit_type_suffix_p): Likewise.

arm: [MVE intrinsics] factorize vcvtq

Factorize vcvtq so that it uses parameterized names.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VCVTQ_FROM_F_S,
VCVTQ_FROM_F_U, VCVTQ_M_FROM_F_S, VCVTQ_M_FROM_F_U,
VCVTQ_M_N_FROM_F_S, VCVTQ_M_N_FROM_F_U, VCVTQ_M_N_TO_F_S,
VCVTQ_M_N_TO_F_U, VCVTQ_M_TO_F_S, VCVTQ_M_TO_F_U,
VCVTQ_N_FROM_F_S, VCVTQ_N_FROM_F_U, VCVTQ_N_TO_F_S,
VCVTQ_N_TO_F_U, VCVTQ_TO_F_S, VCVTQ_TO_F_U.
* config/arm/mve.md (mve_vcvtq_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_to_f_<supf><mode>.
(mve_vcvtq_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_from_f_<supf><mode>.
(mve_vcvtq_n_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_n_to_f_<supf><mode>.
(mve_vcvtq_n_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_n_from_f_<supf><mode>.
(mve_vcvtq_m_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_to_f_<supf><mode>.
(mve_vcvtq_m_n_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_n_from_f_<supf><mode>.
(mve_vcvtq_m_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_from_f_<supf><mode>.
(mve_vcvtq_m_n_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_n_to_f_<supf><mode>.

arm: [MVE intrinsics] Cleanup arm-mve-builtins-functions.h

This patch brings no functional change but removes some code
duplication in arm-mve-builtins-functions.h and makes it easier to
read and maintain.

It introduces a new expand_unspec () member of
unspec_based_mve_function_base and makes a few classes inherit from it
instead of function_base.

This adds 3 new members containing the unspec codes for signed-int,
unsigned-int and floating-point intrinsics (no mode, no predicate).
Depending on the derived class, these will be used instead of the 3
similar RTX codes.

The new expand_unspec () handles all the possible unspecs, some of
which may not be supported by a given intrinsics family: such code
paths won't be used in that case.  Similarly, codes specific to a
family (RTX, or PRED_p for instance) should be handled by the caller
of expand_unspec ().

Thanks to this, expand () for unspec_based_mve_function_exact_insn,
unspec_mve_function_exact_insn, unspec_mve_function_exact_insn_pred_p,
unspec_mve_function_exact_insn_vshl no longer duplicate a lot of code.

The patch also makes most of PRED_m and PRED_x handling use the same
code, and uses conditional operators when computing which RTX
code/unspec to use when calling code_for_mve_q_XXX.
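
A very reduced sketch of the idea, for readers unfamiliar with the file: a base class stores one code per element kind and picks one with conditional operators. Only the three member names come from this patch; the class name, the Kind enum, the use of plain int for the codes and select_unspec are hypothetical simplifications.

```cpp
// Element kind of the intrinsic being expanded (hypothetical).
enum class Kind { signed_int, unsigned_int, floating_point };

// Simplified stand-in for a base class holding the three unspec codes.
class unspec_based_base_sketch
{
public:
  constexpr unspec_based_base_sketch(int for_sint, int for_uint,
                                     int for_fp)
    : m_unspec_for_sint(for_sint),
      m_unspec_for_uint(for_uint),
      m_unspec_for_fp(for_fp)
  {}

  // Select the code with conditional operators instead of three
  // duplicated branches.
  constexpr int select_unspec(Kind kind) const
  {
    return kind == Kind::floating_point ? m_unspec_for_fp
           : kind == Kind::unsigned_int ? m_unspec_for_uint
           : m_unspec_for_sint;
  }

protected:
  int m_unspec_for_sint;
  int m_unspec_for_uint;
  int m_unspec_for_fp;
};
```

Derived classes would then only add what is specific to their family and leave the common selection to the base.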

2024-07-11  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-functions.h
(unspec_based_mve_function_base): Add m_unspec_for_sint,
m_unspec_for_uint, m_unspec_for_fp and expand_unspec members.
(unspec_based_mve_function_exact_insn): Inherit from
unspec_based_mve_function_base and use expand_unspec.
(unspec_mve_function_exact_insn): Likewise.
(unspec_mve_function_exact_insn_pred_p): Likewise.  Use
conditionals.
(unspec_mve_function_exact_insn_vshl): Likewise.
(unspec_based_mve_function_exact_insn_vcmp): Initialize new
inherited members.  Use conditionals.
(unspec_mve_function_exact_insn_rot): Merge PRED_m and PRED_x
handling.  Use conditionals.
(unspec_mve_function_exact_insn_vmull): Likewise.
(unspec_mve_function_exact_insn_vmull_poly): Likewise.

arm: [MVE intrinsics] remove useless resolve from create shape

vcreateq has no overloaded forms, so there's no need for resolve ().

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (create_def::resolve):
Delete function.

arm: [MVE intrinsics] improve comment for orrq shape

Add a comment about the lack of "n" forms for floating-point and 8-bit
integers, to make it clearer why we use build_16_32 for MODE_n.

2024-07-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_orrq_def): Improve comment.

Relax boolean processing in vect_maybe_update_slp_op_vectype

The following makes VECTOR_BOOLEAN_TYPE_P processing consistent with
what we do without SLP.  The original motivation for rejecting
VECTOR_BOOLEAN_TYPE_P extern defs was bad code generation.  But
the non-SLP codepath happily goes along, always hitting the
case of a uniform vector, and this case specifically we can now
code-generate optimally.  So the following allows single-lane
externs as well.

Requiring patterns to code-generate can have a bad influence on
the vectorization factor, though a prototype patch of mine shows
that generating vector compares externally isn't always trivial.

The patch fixes the gcc.dg/vect/vect-early-break_82.c FAIL on x86_64
when --param vect-force-slp=1 is in effect.

PR tree-optimization/117171
* tree-vect-stmts.cc (vect_maybe_update_slp_op_vectype):
Relax vect_external_def VECTOR_BOOLEAN_TYPE_P constraint.

testsuite: arm: Corrected expected error message for cde-mve-error-1.c

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/cde-mve-error-1.c: Corrected quotation in
expected error message.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: Fix typos for AVX10.2 convert testcases

Fix typos related to types for vcvtne[,2]ph[b,h]f8[,s] testcases.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Fix typo.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.

Fortran: Add tolerance to real value comparisons.

gcc/testsuite/ChangeLog:

PR fortran/105361
* gfortran.dg/pr105361.f90: In the comparisons of
real values after a read, use a tolerance so that
subtle differences in results between different
architectures do not fail.
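
The tolerance idea can be sketched in C (the test itself is Fortran). The helper name nearly_equal and the tolerance used below are illustrative assumptions, not taken from pr105361.f90:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when A and B differ by at most TOL.
   An exact `a == b' can fail when two architectures round the
   value read from the file slightly differently.  */
static bool
nearly_equal (double a, double b, double tol)
{
  assert (tol >= 0);
  double diff = a - b;
  if (diff < 0)
    diff = -diff;
  return diff <= tol;
}
```

For example, nearly_equal (0.1 + 0.2, 0.3, 1e-9) holds even though 0.1 + 0.2 == 0.3 is false in IEEE double arithmetic.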

AVR: Rename test case to PR number.

PR rtl-optimization/117191
gcc/testsuite/
* gcc.target/avr/torture/pr117189.c: Rename to...
* gcc.target/avr/torture/pr117191.c: ...this.

aarch64: libgcc: Use -Werror

This patch adds -Werror to LIBGCC2_CFLAGS so that aarch64 can catch
warnings during bootstrap, while not impacting other targets.

The patch also adds -Wno-prio-ctor-dtor to avoid a warning when
compiling lse_init.c

libgcc/
* config/aarch64/t-aarch64: Always use -Werror
-Wno-prio-ctor-dtor.

aarch64: libgcc: add prototypes in cpuinfo

Add prototypes for __init_cpu_features_resolver and
__init_cpu_features to avoid warnings due to -Wmissing-prototypes.

libgcc/
* config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
prototype.
(__init_cpu_features): Likewise.

aarch64: libgcc: Cleanup warnings in lse.S

Since
  Commit c608ada288ced0268bbbbc1fd4136f56c34b24d4
  Author:     Zac Walker <zacwalker@microsoft.com>
  CommitDate: 2024-01-23 15:32:30 +0000

  Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target

lse.S includes aarch64-asm.h, leading to a conflicting definition of macro 'L':
- in lse.S it expands to either '' or 'L'
- in aarch64-asm.h it is used to generate .L ## label

lse.S does not use the second, so this patch just undefines L after
the inclusion of aarch64-asm.h.

libgcc/
* config/aarch64/lse.S: Undefine L() macro.

tree-object-size: Fall back to wholesize for non-const offset

Don't bail out early if the offset to a pointer in __builtin_object_size
is a variable; return the wholesize instead, since that is a better
fallback for the maximum estimate. This should keep checks in place for
fortified functions to constrain overflows to at least some extent.
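
A minimal sketch of the kind of code affected, assuming a 32-byte static buffer and a hypothetical helper max_size_at (neither comes from the testsuite change). With the fallback, a compiler containing this change is expected to report the whole object size (32) for the maximum estimate instead of giving up with (size_t) -1; the actual result depends on compiler version and optimization level:

```c
#include <assert.h>
#include <stddef.h>

static char buf[32];

/* Hypothetical helper.  noinline keeps OFF from being folded to a
   constant by inlining at the call site.  */
__attribute__ ((noinline)) static size_t
max_size_at (size_t off)
{
  assert (off <= sizeof buf);
  /* Mode 0 asks for the maximum estimate of the remaining size.  */
  return __builtin_object_size (buf + off, 0);
}
```

A caller passing a value the compiler cannot see through (for instance one read from a volatile object) should get either sizeof buf or (size_t) -1 back, never a smaller bound that could reject valid accesses.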

gcc/ChangeLog:

PR middle-end/77608
* tree-object-size.cc (plus_stmt_object_size): Drop check for
constant offset.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-1.c (test12): New test.
(main): Call it.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>

AVR: Rename test case to according PR number.

PR rtl-optimization/PR117189
gcc/testsuite/
* gcc.target/avr/torture/lra-pr116550-2.c: Rename to...
* gcc.target/avr/torture/pr117189.c: ...this.

doc: remove outdated C++ Concepts section

This was added as part of the initial Concepts TS implementation and
reflects an early version of the Concepts TS paper, which is very
different from standard C++20 concepts (and even from more recent
versions of the Concepts TS, support for which we deprecated in GCC 14
and removed for GCC 15). So there's not much to salvage from this
section besides the __is_same trait documentation which we can
conveniently move to the previous Type Traits section.

gcc/ChangeLog:

* doc/extend.texi (C++ Concepts): Remove section. Move
__is_same documentation to the previous Type Traits section.

Reviewed-by: Jason Merrill <jason@redhat.com>

SH: Fix typo of commit b717c462b96e

gcc/ChangeLog:
PR target/113533
* config/sh/sh.cc (sh_rtx_costs): Delete wrong semicolon.

rtl-optimization/116550 - Add test cases.

PR rtl-optimization/116550
gcc/testsuite/
* gcc.target/avr/torture/lra-pr116550-1.c: New file.
* gcc.target/avr/torture/lra-pr116550-2.c: New file.

[1/n] remove gcc.dg/vect special naming in favor of dg-additional-options

This kicks off removal of keying options used on testcase names as
done in gcc.dg/vect as the appropriate way to do this is using
dg-additional-options.

Starting with two of the least used ones.

This causes the moved tests to be covered by VECT_ADDITIONAL_FLAGS
processing.

* gcc.dg/vect/vect.exp: Process no-fast-math-* and
no-math-errno-* in the main set.
* gcc.dg/vect/no-fast-math-vect16.c: Add -fno-fast-math.
* gcc.dg/vect/no-math-errno-slp-32.c: Add -fno-math-errno.
* gcc.dg/vect/no-math-errno-vect-pow-1.c: Likewise.

tree-optimization/117172 - single lane SLP for non-linear inductions

The following adds single-lane SLP support for vectorizing non-linear
inductions.

This fixes a bunch of i386 specific testcases with --param vect-force-slp=1.

PR tree-optimization/117172
* tree-vect-loop.cc (vectorizable_nonlinear_induction): Add
single-lane SLP support.

testsuite: Add -march=x86-64-v3 to AVX10 testcases to silence warning for GCC built with AVX512 arch

Currently, when building GCC with --with-arch=native on AVX512
machines, running the AVX10.2 testcases produces vector size warnings.
They are expected but annoying. Simply add -march=x86-64-v3 to override
--with-arch=native and silence all the warnings.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-25.c: Add -march=x86-64-v3.
* gcc.target/i386/avx10_1-26.c: Ditto.
* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: Ditto.
* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-512-bf16-1.c: Ditto.
* gcc.target/i386/avx10_2-512-convert-1.c: Ditto.
* gcc.target/i386/avx10_2-512-media-1.c: Ditto.
* gcc.target/i386/avx10_2-512-minmax-1.c: Ditto.
* gcc.target/i386/avx10_2-512-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Ditto.
* gcc.target/i386/avx10_2-bf-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-bf-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
* gcc.target/i386/avx10_2-compare-1.c: Ditto.
* gcc.target/i386/avx10_2-compare-1b.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
* gcc.target/i386/avx10_2-media-1.c: Ditto.
* gcc.target/i386/avx10_2-minmax-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Ditto.
* gcc.target/i386/avx10_2-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcmppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttsd2sis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttsd2usis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttss2sis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttss2usis-2.c: Ditto.
* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto.
* gcc.target/i386/avx10_2-vminpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmovd-1.c: Ditto.
* gcc.target/i386/avx10_2-vmovd-2.c: Ditto.
* gcc.target/i386/avx10_2-vmovw-1.c: Ditto.
* gcc.target/i386/avx10_2-vmovw-2.c: Ditto.
* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vrcppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Ditto.

The detailed explanation from PR116550:

Test file: udivmoddi.c
problem insn: 484

Before LRA pass we have:
(insn 484 483 485 72 (parallel [
            (set (reg/v:SI 143 [ __q1 ])
                (plus:SI (reg/v:SI 143 [ __q1 ])
                    (const_int -2 [0xfffffffffffffffe])))
            (clobber (scratch:QI))
        ]) "udivmoddi.c":163:405 discrim 5 186 {addsi3}
     (nil))

LRA substitutes all scratches with new pseudos, so we have:
(insn 484 483 485 72 (parallel [
            (set (reg/v:SI 143 [ __q1 ])
                (plus:SI (reg/v:SI 143 [ __q1 ])
                    (const_int -2 [0xfffffffffffffffe])))
            (clobber (reg:QI 619))
        ]) "/mnt/d/avr-lra/udivmoddi.c":163:405 discrim 5 186 {addsi3}
     (expr_list:REG_UNUSED (reg:QI 619)
        (nil)))

Pseudo 619 is a special scratch register generated by LRA which is marked in `scratch_bitmap' and can be tested by calling `ira_former_scratch_p(regno)'.

In dump file (udivmoddi.c.317r.reload) we have:
      Creating newreg=619
Removing SCRATCH to p619 in insn #484 (nop 3)
rescanning insn with uid = 484.

After that LRA tries to spill (reg:QI 619).
It's a bug because (reg:QI 619) is an output scratch register which is already something like a spill register.

Fragment from udivmoddi.c.317r.reload:
      Choosing alt 2 in insn 484:  (0) r  (1) 0  (2) nYnn  (3) &d {addsi3}
      Creating newreg=728 from oldreg=619, assigning class LD_REGS to r728

IMHO: the bug is in lra-constraints.cc in function `get_reload_reg'
fragment of `get_reload_reg':
  if (type == OP_OUT)
    {
      /* Output reload registers tend to start out with a conservative
choice of register class.  Usually this is ALL_REGS, although
a target might narrow it (for performance reasons) through
targetm.preferred_reload_class.  It's therefore quite common
for a reload instruction to require a more restrictive class
than the class that was originally assigned to the reload register.

In these situations, it's more efficient to refine the choice
of register class rather than create a second reload register.
This also helps to avoid cycling for registers that are only
used by reload instructions.  */
      if (REG_P (original)
  && (int) REGNO (original) >= new_regno_start
  && INSN_UID (curr_insn) >= new_insn_uid_start
__________________________________^^
  && in_class_p (original, rclass, &new_class, true))
{
  unsigned int regno = REGNO (original);
  if (lra_dump_file != NULL)
    {
      fprintf (lra_dump_file, " Reuse r%d for output ", regno);
      dump_value_slim (lra_dump_file, original, 1);
    }

This condition incorrectly limits register reuse to ONLY newly generated instructions,
i.e. LRA can reuse registers only from insns generated by itself.

IMHO: It's wrong.
Scratch registers generated by LRA also have to be reused.

The patch is very simple.
On x86_64, it bootstraps+regtests fine.
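
The difference between the two conditions can be modelled with a toy sketch. This is not the code from lra-constraints.cc: the struct, the function names and the numeric limits are hypothetical, and the relaxed form is only an assumed shape (accept the register when the insn is new OR the pseudo is a former scratch):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the LRA bookkeeping values (hypothetical).  */
struct toy_state
{
  int new_regno_start;     /* first pseudo number created by LRA */
  int new_insn_uid_start;  /* first insn UID created by LRA */
};

/* Condition as quoted above: reuse only inside LRA-generated insns.  */
static bool
can_reuse_old (const struct toy_state *s, int regno, int insn_uid)
{
  return regno >= s->new_regno_start
	 && insn_uid >= s->new_insn_uid_start;
}

/* Assumed relaxed condition: former scratch pseudos are reusable
   even when the insn existed before LRA started.  */
static bool
can_reuse_relaxed (const struct toy_state *s, int regno, int insn_uid,
		   bool former_scratch)
{
  return regno >= s->new_regno_start
	 && (insn_uid >= s->new_insn_uid_start || former_scratch);
}
```

With made-up limits of 600 for new pseudos and 1000 for new insn UIDs, pseudo 619 in insn 484 is rejected by the old check and accepted by the relaxed one only when it is flagged as a former scratch.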

gcc/
PR target/116550
* lra-constraints.cc (get_reload_reg): Reuse scratch registers
generated by LRA.

Fix ICE with coarrays and submodules [PR80235]

Exposing a variable in a module and referencing it in a submodule made
the compiler ICE, because the external variable was not sorted into the
correct module. In fact the module name was not set where the variable
got built.

gcc/fortran/ChangeLog:

PR fortran/80235

* trans-decl.cc (gfc_build_qualified_array): Make sure the array
is associated to the correct module and being marked as extern.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/add_sources/submodule_1_sub.f90: New test.
* gfortran.dg/coarray/submodule_1.f90: New test.

Fix gcc.dg/vect/vect-early-break_39.c FAIL with forced SLP

The testcase shows single-element interleaving of size three
being exempted from permutation lowering via heuristics
(see also PR116973). But it wasn't supposed to apply to
non-power-of-two sizes so this amends the check to ensure
the sub-group is aligned even when the number of lanes is one.

* tree-vect-slp.cc (vect_lower_load_permutations): Avoid
exempting non-power-of-two group sizes from lowering.

c, libcpp: Partially implement C2Y N3353 paper [PR117028]

The following patch partially implements the N3353 paper.
In particular, it adds support for the delimited escape sequences
(\u{123}, \x{123}, \o{123}) which were added already for C++23,
all I had to do is split the delimited escape sequence guarding from
named universal character escape sequence guards
(\N{LATIN CAPITAL LETTER C WITH CARON}), which C++23 has but C2Y doesn't
and emit different diagnostics for C from C++ for the delimited escape
sequences.
And it adds support for the new style of octal literals, 0o137 or 0O1777.
I have so far added that just for C and not C++, because I have no idea
whether C++ will want to handle it similarly.

What the patch doesn't do is any kind of diagnostics for obsoletion of
\137 or 0137, as discussed in the PR, I think it is way too early for that.
Perhaps some non-default warning later on.
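
As a rough illustration of the new prefix (not libcpp code), a hypothetical helper toy_radix could map an integer literal's spelling to its radix. It is a sketch only: it looks at the prefix alone and ignores suffixes, digit separators and floating constants:

```c
#include <assert.h>

/* Hypothetical helper: radix implied by the prefix of an integer
   literal spelling such as "0o137" or "0x1f".  */
static int
toy_radix (const char *s)
{
  if (s[0] != '0' || s[1] == '\0')
    return 10;
  switch (s[1])
    {
    case 'x': case 'X':
      return 16;
    case 'b': case 'B':
      return 2;
    case 'o': case 'O':
      return 8;		/* new C2Y spelling, e.g. 0o137 or 0O1777 */
    default:
      return 8;		/* traditional spelling, e.g. 0137 */
    }
}
```

Both "0o137" and "0137" map to radix 8 here, which matches the intent that the two spellings denote the same value.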

2024-10-17  Jakub Jelinek  <jakub@redhat.com>

PR c/117028
libcpp/
* include/cpplib.h (struct cpp_options): Add named_uc_escape_seqs,
octal_constants and cpp_warn_c23_c2y_compat members.
(enum cpp_warning_reason): Add CPP_W_C23_C2Y_COMPAT enumerator.
* init.cc (struct lang_flags): Add named_uc_escape_seqs and
octal_constants bit-fields.
(lang_defaults): Add initializers for them into the table.
(cpp_set_lang): Initialize named_uc_escape_seqs and octal_constants.
(cpp_create_reader): Initialize cpp_warn_c23_c2y_compat to -1.
* charset.cc (_cpp_valid_ucn): Test
CPP_OPTION (pfile, named_uc_escape_seqs) rather than
CPP_OPTION (pfile, delimited_escape_seqs) in \N{} related tests.
Change wording of C cpp_pedwarning for \u{} and emit
-Wc23-c2y-compat warning for it too if needed.  Formatting fixes.
(convert_hex): Change wording of C cpp_pedwarning for \u{} and emit
-Wc23-c2y-compat warning for it too if needed.
(convert_oct): Likewise.
* expr.cc (cpp_classify_number): Handle C2Y 0o or 0O prefixed
octal constants.
(cpp_interpret_integer): Likewise.
gcc/c-family/
* c.opt (Wc23-c2y-compat): Add CPP and CppReason parameters.
* c-opts.cc (set_std_c2y): Use CLK_STDC2Y or CLK_GNUC2Y rather
than CLK_STDC23 and CLK_GNUC23.  Formatting fix.
* c-lex.cc (interpret_integer): Handle C2Y 0o or 0O prefixed
and wb/WB/uwb/UWB suffixed octal constants.
gcc/testsuite/
* gcc.dg/bitint-112.c: New test.
* gcc.dg/c23-digit-separators-1.c: Add _Static_assert for
valid binary constant with digit separator.
* gcc.dg/c23-octal-constants-1.c: New test.
* gcc.dg/c23-octal-constants-2.c: New test.
* gcc.dg/c2y-digit-separators-1.c: New test.
* gcc.dg/c2y-digit-separators-2.c: New test.
* gcc.dg/c2y-octal-constants-1.c: New test.
* gcc.dg/c2y-octal-constants-2.c: New test.
* gcc.dg/c2y-octal-constants-3.c: New test.
* gcc.dg/cpp/c23-delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/c23-delimited-escape-seq-2.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-2.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-3.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-4.c: New test.
* gcc.dg/octal-constants-1.c: New test.
* gcc.dg/octal-constants-2.c: New test.
* gcc.dg/octal-constants-3.c: New test.
* gcc.dg/octal-constants-4.c: New test.
* gcc.dg/system-octal-constants-1.c: New test.
* gcc.dg/system-octal-constants-1.h: New file.

c: Fix up speed up compilation of large char array initializers when not using #embed [PR117177]

Apparently my
c: Speed up compilation of large char array initializers when not using #embed
patch broke building glibc.

The issue is that when using CPP_EMBED, we are guaranteed by the
preprocessor that there is CPP_NUMBER CPP_COMMA before it and
CPP_COMMA CPP_NUMBER after it (or CPP_COMMA CPP_EMBED), so RAW_DATA_CST
never ends up at the end of arrays of unknown length.
Now, the c_parser_initval optimization attempted to preserve that property
rather than changing everything that e.g. infers array number of elements
from the initializer etc. to deal with RAW_DATA_CST at the end, but
it didn't take into account the possibility that there could be
CPP_COMMA followed by CPP_CLOSE_BRACE (where the CPP_COMMA is redundant).

As we are already peeking at 4 tokens in that code, peeking more would
require using raw tokens and that seems to be expensive doing it for
every pair of tokens due to vec_free done when we are out of raw tokens.

So, the following patch instead determines the case where we want
another INTEGER_CST element after it after consuming the tokens, and just
arranges for another process_init_element.
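
The problematic shape is an array of unknown length whose initializer ends with a redundant comma right before the closing brace. A tiny sketch of that syntax follows; the names are illustrative, and the array is far smaller than the large initializers for which the optimization applies:

```c
#include <assert.h>
#include <stddef.h>

/* Unknown-length array; note the redundant comma before `}'.  */
static const unsigned char data[] = { 1, 2, 3, 4, 5, 6, 7, 8, };

/* The trailing comma must not change the inferred element count.  */
static size_t
data_len (void)
{
  return sizeof data / sizeof data[0];
}
```

The element count has to come out as 8 and the last element as 8, regardless of how the parser groups the preceding numbers internally.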

2024-10-17 Jakub Jelinek <jakub@redhat.com>

PR c/117177
gcc/c/
* c-parser.cc (c_parser_initval): Instead of doing
orig_len == INT_MAX checks before consuming tokens to set
last = 1, check it after consuming it and if not followed
by CPP_COMMA CPP_NUMBER, call process_init_element once
more with the last CPP_NUMBER.
gcc/testsuite/
* c-c++-common/init-4.c: New test.

i386: Fix scalar VCOMSBF16 which only compares low word

gcc/ChangeLog:

* config/i386/sse.md(avx10_2_comsbf16_v8bf): Fixed scalar
operands.

Don't lower vpcmpu to pcmpgt since the latter is for signed comparison.

r15-1737-gb06a108f0fbffe lowers AVX512 kmask comparisons to AVX2 ones,
but wrongly lowers unsigned comparisons to signed ones; for unsigned
comparisons, only EQ/NEQ can be lowered.

This commit fixes that.
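
Why only EQ/NEQ survive the lowering can be seen on a single 8-bit lane. The scalar helpers below are an illustrative sketch with made-up names; they reinterpret the same bit patterns as signed, which relies on the usual two's-complement conversion that GCC implements:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned greater-than, as vpcmpu would compute per lane.  */
static bool
gt_unsigned (uint8_t a, uint8_t b)
{
  return a > b;
}

/* Signed greater-than on the same bits, as pcmpgt would compute.  */
static bool
gt_signed_bits (uint8_t a, uint8_t b)
{
  return (int8_t) a > (int8_t) b;
}

/* Equality on the same bits is independent of signedness.  */
static bool
eq_signed_bits (uint8_t a, uint8_t b)
{
  return (int8_t) a == (int8_t) b;
}
```

For a = 0x80 and b = 1 the unsigned comparison says a > b while the signed one does not, whereas equality gives the same answer under both interpretations.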

gcc/ChangeLog:

PR target/116940
* config/i386/sse.md (*avx2_pcmp<mode>3_7): Change
UNSPEC_PCMP_ITER to UNSPEC_PCMP.
(*avx2_pcmp<mode>3_8): New pre_reload
define_insn_and_splitter.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr116940.c: New test.

Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask) to (vec_merge (fma: op1 op2 op3) (match_dup 1)) mask)

For masked FMA, there are 2 forms of RTL representation:
1) (vec_merge (fma: op2 op1 op3) op1) mask)
2) (vec_merge (fma: op1 op2 op3) op1) mask)
This is because op1 and op2 are commutative in RTL (the second op1 is
written as (match_dup 1)).

We once tried to replace (match_dup 1)
with (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0")), but
that triggered an ICE in reload (reload can handle at most one operand with
a "0" constraint).

So the patch does the canonicalization for the backend part.
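
A scalar model of one vector lane shows why the two forms are interchangeable. This is a sketch with a hypothetical helper name, not backend code: the multiplicands may be swapped freely, while the merge operand stays tied to op1:

```c
#include <assert.h>

/* One lane of a masked FMA: a * b + c when the mask bit is set,
   otherwise the merge operand is passed through unchanged.  */
static double
masked_fma_lane (double a, double b, double c, double merge, int mask_bit)
{
  return mask_bit ? a * b + c : merge;
}
```

With op1 = 3, op2 = 4 and op3 = 5, both masked_fma_lane (op2, op1, op3, op1, 1) and masked_fma_lane (op1, op2, op3, op1, 1) give 17, and with the mask bit clear both give op1.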

gcc/ChangeLog:

PR target/117072
* config/i386/sse.md (<avx512>_fmadd_<mode>_mask<round_name>):
Relax predicates of fma operands from register_operand to
nonimmediate_operand.
(<avx512>_fmadd_<mode>_mask3<round_name>): Ditto.
(<avx512>_fmsub_<mode>_mask<round_name>): Ditto.
(<avx512>_fmsub_<mode>_mask3<round_name>): Ditto.
(<avx512>_fnmadd_<mode>_mask<round_name>): Ditto.
(<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto.
(<avx512>_fnmsub_<mode>_mask<round_name>): Ditto.
(<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto.
(<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
(<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
(<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.
(avx512f_vmfmadd_<mode>_mask<round_name>): Ditto.
(avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto.
(avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto.
(*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto.
(avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto.
(*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto.
(avx512f_vmfnmadd_<mode>_mask<round_name>): Ditto.
(avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto.
(avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto.
(*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
(*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto.
(*avx512f_vmfnmsub_<mode>_maskz_1<round_name>): Ditto.
(avx10_2_fmaddnepbf16_<mode>_mask3): Ditto.
(avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto.
(avx10_2_fmsubnepbf16_<mode>_mask3): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto.
(fmai_vmfmadd_<mode><round_name>): Swap operands[1] and operands[2].
(fmai_vmfmsub_<mode><round_name>): Ditto.
(fmai_vmfnmadd_<mode><round_name>): Ditto.
(fmai_vmfnmsub_<mode><round_name>): Ditto.
(*fmai_fmadd_<mode>): Swap operands[1] and operands[2] adjust
operands[1] predicates from register_operand to
nonimmediate_operand.
(*fmai_fmsub_<mode>): Ditto.
(*fmai_fnmadd_<mode><round_name>): Ditto.
(*fmai_fnmsub_<mode><round_name>): Ditto.

Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).

For x86 masked fma, there're 2 rtl representations
1) (vec_merge (fma op2 op1 op3) op1 mask)
2) (vec_merge (fma op1 op2 op3) op1 mask).

(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
        (vec_merge:VFH_AVX512VL
          (fma:VFH_AVX512VL
            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
          (match_dup 1)
          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
  "TARGET_AVX512F && <round_mode_condition>"
  "@
   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
  [(set_attr "type" "ssemuladd")
   (set_attr "prefix" "evex")
   (set_attr "mode" "<MODE>")])

Here op1 has constraint "0", and the second op1 is (match_dup 1).
We once tried to replace it with (match_operand:M 5
"nonimmediate_operand" "0")) to enable more flexibility for pattern
matching and recog, but it triggered an ICE in reload (reload can handle
at most one operand with a "0" constraint).

So we need to either add 2 patterns in the backend or just do the
canonicalization in the middle-end.

gcc/ChangeLog:

PR middle-end/117072
* combine.cc (maybe_swap_commutative_operands):
Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
to (vec_merge (fma op1 op2 op3) op1 mask).

Support andn_optab for x86

Add new andn pattern to match the new optab added by
r15-1890-gf379596e0ba99d. Only enable 64bit, 128bit and
256bit vector ANDN, X86-64 has mask mov instruction when
avx512 is enabled.

gcc/ChangeLog:

* config/i386/sse.md (andn<mode>3): New.
* config/i386/mmx.md (andn<mode>3): New.

gcc/testsuite/ChangeLog:

* g++.target/i386/vect-cmp.C: New test.

tree-object-size: use size_for_offset in more cases

When wholesize != size, there is a reasonable opportunity for static
object sizes also to be computed using size_for_offset, so use that.

gcc/ChangeLog:

* tree-object-size.cc (plus_stmt_object_size): Call
SIZE_FOR_OFFSET for some negative offset cases.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-3.c (test9): Adjust test.
* gcc.dg/builtin-object-size-4.c (test8): Likewise.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>

arm: [MVE intrinsics] Improve vdupq_n implementation

This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  This enables the compiler to
generate better code sequences (for instance using vmov when
possible).

The patch renames the existing mve_vdup<mode> pattern into
@mve_vdupq_n<mode>, and removes the now useless
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<supf><mode> ones.

As a side-effect, it needs to update the mve_unpredicated_insn
predicates in @mve_<mve_insn>q_m_n_<supf><mode> and
@mve_<mve_insn>q_m_n_f<mode>.

Using vec_duplicate means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
mov r0, #imm
vdup.xx q0,r0

or
ldr r0, .L4
vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
vldr.64 d0, .L4
vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:
--------------------------------
* The signed versions of vdupq_* tests lack a version with an
immediate argument.  This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
new tests this patch introduces.  Note there's no need for s8/u8 tests
because 8-bit immediates always comply with imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
movw r3, #15462
vcmp.f16 eq, q0, r3
instead of the previous:
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  <jolen.li@arm.com>
    Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for COST_DOUBLE. Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup<mode>): Rename into ...
(@mve_vdup_n<mode>): ... this.
(@mve_<mve_insn>q_n_f<mode>): Delete.
(@mve_<mve_insn>q_n_<supf><mode>): Delete.
(@mve_<mve_insn>q_m_n_<supf><mode>): Update mve_unpredicated_insn
attribute.
(@mve_<mve_insn>q_m_n_f<mode>): Likewise.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.

arm: [MVE intrinsics] fix vdup iterator

This patch fixes a bug where the mode iterator for mve_vdup<mode>
should be MVE_VLD_ST instead of MVE_vecs: V2DI and V2DF (thus vdup.64)
are not supported by MVE.

2024-07-02 Jolen Li <jolen.li@arm.com>
Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/mve.md (mve_vdup<mode>): Fix mode iterator.

diagnostics: capture backtraces in SARIF notifications [PR116602]

This patch makes the SARIF output's crash handler attempt to capture
a backtrace in JSON form within the notification's property bag.  The
precise format of the property is subject to change, but, for example,
in one of the test cases I got output like this:

"properties": {"gcc/backtrace": {"frames": [{"pc": "0x7f39c610a32d",
                                             "function": "pass_crash_test::execute(function*)",
                                             "filename": "/home/david/gcc-newgit/src/gcc/testsuite/gcc.dg/plugin/crash_test_plugin.c",
                                             "lineno": 98}]}}}],

The backtrace code is based on that in diagnostic.cc.

gcc/ChangeLog:
PR other/116602
* diagnostic-format-sarif.cc: Include "demangle.h" and
"backtrace.h".
(sarif_invocation::add_notification_for_ice): Add "backtrace"
param and pass it to ctor.
(sarif_ice_notification::sarif_ice_notification): Add "backtrace"
param and add it to property bag.
(bt_stop): New, taken from diagnostic.cc.
(struct bt_closure): New.
(bt_callback): New, adapted from diagnostic.cc.
(sarif_builder::make_stack_from_backtrace): New.
(sarif_builder::on_report_diagnostic): Attempt to get backtrace
and pass it to add_notification_for_ice.

gcc/testsuite/ChangeLog:
PR other/116602
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2_1.py: Add check
for backtrace.
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2_2.py: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: eliminate m_ice_handler_cb [PR116613]

No functional change intended.

gcc/ChangeLog:
PR other/116613
* diagnostic-format-sarif.cc
(sarif_builder::on_report_diagnostic): Move the fnotice here from
sarif_ice_handler.
(sarif_ice_handler): Delete.
(diagnostic_output_format_init_sarif): Drop setting of ice handler
callback.
* diagnostic.cc (diagnostic_context::initialize): Likewise.
(diagnostic_context::action_after_output): Rather than call
m_ice_handler_cb, instead call finish on this context.
* diagnostic.h (ice_handler_callback_t): Delete typedef.
(diagnostic_context::set_ice_handler_callback): Delete.
(diagnostic_context::m_ice_handler_cb): Delete.

gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Update for
removal of ICE callback.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: Prepare for -std=gnu23 default

Now that C23 support is essentially feature-complete, I'd like to
switch the default language version for C compilation to -std=gnu23.

This requires updating a large number of testcases that fail with the
new language version if left unchanged.  In this patch, update most of
the tests for which there is a safe change that works both before and
after the update to default language version - typically adding the
option -std=gnu17 or -Wno-old-style-definition to the tests.  (There
are also a few tests where I'd like to investigate further why they
fail with -std=gnu23, or where I think such failures show an actual
bug to fix before changing the default language version, or where it
seems more appropriate to make a testcase change that would result in
failures in the absence of the language version change rather than
just adding an option that does nothing with the gnu17 default.)

The libffi test fixes have also been submitted upstream:
<https://github.com/libffi/libffi/pull/861>.

Most of the failures requiring such changes are for one of two
reasons:

* Unprototyped function declarations with () (meaning the same as
  (void) in C23 mode) for a function then called with arguments.

* Old-style function definitions, which warn by default in C23 mode,
  so resulting in test failures for the unexpected warnings.
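
The two common cases above can be sketched in C as follows.  The function
name add is hypothetical; the forms that behave differently in C23 mode are
shown in comments only, so the snippet itself should compile the same way with
either language version.

```c
#include <assert.h>

/* Under -std=gnu17 the declaration
     int add ();
   leaves the parameters unspecified, so a later call add (1, 2) is
   accepted.  In C23 mode the same declaration means int add (void),
   and a call with arguments is diagnosed.  A prototype works in both
   modes.  */
int add (int a, int b);

/* An old-style definition such as
     int add (a, b) int a; int b; { return a + b; }
   is the kind of definition -Wno-old-style-definition is aimed at.
   The prototyped definition below avoids the diagnostic.  */
int
add (int a, int b)
{
  return a + b;
}
```

Tests that exist specifically to exercise the old forms cannot be rewritten
this way, which is one reason to add an option to them instead.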

Other reasons for failures include:

* Tests with their own definitions of bool, true and false.

* Tests of diagnostics (often with -pedantic) in cases where C23 has
  changed semantics, such as:

  - tag compatibility for structs;
  - enum values out of range of int;
  - handling of qualified array types;
  - decimal floating types formerly needing -pedantic diagnostics, but
    being standard in C23.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/testsuite/
* c-c++-common/Wcast-function-type.c: Add -std=gnu17 for C.
* c-c++-common/Wformat-pr84258.c: Add -std=gnu17 for C.
* c-c++-common/Wvarargs.c: Add -std=gnu17 for C.
* c-c++-common/analyzer/data-model-12.c: Add -std=gnu17 for C.
* c-c++-common/builtins.c: Add -std=gnu17 for C.
* c-c++-common/pointer-to-fn1.c: Add -std=gnu17 for C.
* c-c++-common/pragma-diag-17.c: Add -std=gnu17 for C.
* c-c++-common/sizeof-array-argument.c: Add
-Wno-old-style-definition for C.
* g++.dg/lto/pr54625-1_0.c: Add -std=gnu17.
* g++.dg/lto/pr54625-2_0.c: Add -std=gnu17.
* gcc.c-torture/compile/20040214-2.c: Add -std=gnu17.
* gcc.c-torture/compile/921011-2.c: Add -std=gnu17.
* gcc.c-torture/compile/931102-1.c: Add -std=gnu17.
* gcc.c-torture/compile/990801-1.c: Add -std=gnu17.
* gcc.c-torture/compile/nested-1.c: Add -std=gnu17.
* gcc.c-torture/compile/pr100241-1.c: Add -std=gnu17.
* gcc.c-torture/compile/pr106101.c: Add -std=gnu17.
* gcc.c-torture/compile/pr113616.c: Add -std=gnu17.
* gcc.c-torture/compile/pr47967.c: Add -std=gnu17.
* gcc.c-torture/compile/pr51694.c: Add -std=gnu17.
* gcc.c-torture/compile/pr71109.c: Add -std=gnu17.
* gcc.c-torture/compile/pr83051-2.c: Add -std=gnu17.
* gcc.c-torture/compile/pr89663-1.c: Add -std=gnu17.
* gcc.c-torture/compile/pr94238.c: Add -std=gnu17.
* gcc.c-torture/compile/pr96796.c: Add -std=gnu17.
* gcc.c-torture/compile/pr97576.c: Add -std=gnu17.
* gcc.c-torture/compile/udivmod4.c: Add -std=gnu17.
* gcc.c-torture/execute/20010605-2.c: Add -std=gnu17.
* gcc.c-torture/execute/20020404-1.c: Add -std=gnu17.
* gcc.c-torture/execute/20030714-1.c: Add -std=gnu17.
* gcc.c-torture/execute/20051012-1.c: Add -std=gnu17.
* gcc.c-torture/execute/20190820-1.c: Add -std=gnu17.
* gcc.c-torture/execute/920612-1.c: Add -Wno-old-style-definition.
* gcc.c-torture/execute/930608-1.c: Add -std=gnu17.
* gcc.c-torture/execute/comp-goto-1.c: Add -std=gnu17.
* gcc.c-torture/execute/ieee/fp-cmp-1.x: Add -std=gnu17.
* gcc.c-torture/execute/ieee/fp-cmp-2.x: Add -std=gnu17.
* gcc.c-torture/execute/ieee/fp-cmp-3.x: Add -std=gnu17.
* gcc.c-torture/execute/ieee/fp-cmp-4.x: New file.
* gcc.c-torture/execute/ieee/fp-cmp-4f.x: New file.
* gcc.c-torture/execute/ieee/fp-cmp-4l.x: New file.
* gcc.c-torture/execute/loop-9.c: Add -std=gnu17.
* gcc.c-torture/execute/pr103209.c: Add -std=gnu17.
* gcc.c-torture/execute/pr28289.c: Add -std=gnu17.
* gcc.c-torture/execute/pr34982.c: Add -std=gnu17.
* gcc.c-torture/execute/pr67037.c: Add -std=gnu17.
* gcc.c-torture/execute/va-arg-2.c: Add -std=gnu17.
* gcc.dg/20010202-1.c: Add -std=gnu17.
* gcc.dg/20020430-1.c: Add -std=gnu17.
* gcc.dg/20031218-3.c: Add -std=gnu17.
* gcc.dg/20040127-1.c: Add -std=gnu17.
* gcc.dg/20041014-1.c: Add -Wno-old-style-definition.
* gcc.dg/20041122-1.c: Add -std=gnu17.
* gcc.dg/20050309-1.c: Add -std=gnu17.
* gcc.dg/20061026.c: Add -std=gnu17.
* gcc.dg/20101010-1.c: Add -std=gnu17.
* gcc.dg/Warray-parameter-10.c: Add -std=gnu17.
* gcc.dg/Wbuiltin-declaration-mismatch-2.c: Add -std=gnu17.
* gcc.dg/Wbuiltin-declaration-mismatch-3.c: Add -std=gnu17.
* gcc.dg/Wbuiltin-declaration-mismatch-4.c: Add -std=gnu17.
* gcc.dg/Wbuiltin-declaration-mismatch-5.c: Add -std=gnu17.
* gcc.dg/Wbuiltin-declaration-mismatch.c: Add -std=gnu17.
* gcc.dg/Wcxx-compat-2.c: Add -std=gnu17.
* gcc.dg/Wdouble-promotion.c: Add -std=gnu17.
* gcc.dg/Wfree-nonheap-object-7.c: Add -std=gnu17.
* gcc.dg/Wimplicit-int-1.c: Add -std=gnu17.
* gcc.dg/Wimplicit-int-1a.c: Add -std=gnu17.
* gcc.dg/Wimplicit-int-2.c: Add -std=gnu17.
* gcc.dg/Wimplicit-int-3.c: Add -std=gnu17.
* gcc.dg/Wimplicit-int-4.c: Add -std=gnu17.
* gcc.dg/Wimplicit-int-4a.c: Add -std=gnu17.
* gcc.dg/Wincompatible-pointer-types-1.c: Add -std=gnu17.
* gcc.dg/Wrestrict-19.c: Add -std=gnu17.
* gcc.dg/Wrestrict-4.c: Add -std=gnu17.
* gcc.dg/Wrestrict-5.c: Add -std=gnu17.
* gcc.dg/Wstrict-overflow-20.c: Add -std=gnu17.
* gcc.dg/Wstringop-overflow-13.c: Add -std=gnu17.
* gcc.dg/analyzer/doom-d_main-IdentifyVersion.c: Add -std=gnu17.
* gcc.dg/analyzer/doom-s_sound-pr108867.c: Add -std=gnu17.
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: Add
-Wno-old-style-definition.
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Add
-Wno-old-style-definition.
* gcc.dg/analyzer/pr93355-localealias.c: Add
-Wno-old-style-definition.
* gcc.dg/analyzer/pr93375.c: Add -std=gnu17.
* gcc.dg/analyzer/pr94688.c: Add -std=gnu17.
* gcc.dg/analyzer/sensitive-1.c: Add -std=gnu17.
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
Add -std=gnu17.
* gcc.dg/analyzer/torture/pr104863.c: Add -std=gnu17.
* gcc.dg/analyzer/torture/pr93379.c: Add -std=gnu17.
* gcc.dg/array-quals-2.c: Add -std=gnu17.
* gcc.dg/attr-invalid.c: Add -Wno-old-style-definition.
* gcc.dg/auto-init-uninit-A.c: Add -Wno-old-style-definition.
* gcc.dg/builtin-choose-expr.c: Declare exit with (int) prototype.
* gcc.dg/builtin-tgmath-err-1.c: Add -std=gnu17.
* gcc.dg/builtins-30.c: Add -std=gnu17.
* gcc.dg/cast-function-1.c: Add -std=gnu17.
* gcc.dg/cleanup-1.c: Add -std=gnu17.
* gcc.dg/compat/struct-complex-1_x.c: Add -std=gnu17.
* gcc.dg/compat/struct-complex-2_x.c: Add -std=gnu17.
* gcc.dg/compat/union-m128-1_x.c: Add -std=gnu17.
* gcc.dg/debug/dwarf2/pr66482.c: Add -std=gnu17.
* gcc.dg/dfp/composite-type-2.c: Add -std=gnu17.
* gcc.dg/dfp/composite-type.c: Add -std=gnu17.
* gcc.dg/dfp/keywords-pedantic.c: Add -std=gnu17.
* gcc.dg/dremf-type-compat-1.c: Add -std=gnu17.
* gcc.dg/dremf-type-compat-2.c: Add -std=gnu17.
* gcc.dg/dremf-type-compat-3.c: Add -std=gnu17.
* gcc.dg/dremf-type-compat-4.c: Add -std=gnu17.
* gcc.dg/enum-compat-1.c: Add -std=gnu17.
* gcc.dg/enum-compat-2.c: Add -std=gnu17.
* gcc.dg/floatn-errs.c: Add -std=gnu17.
* gcc.dg/fltconst-pedantic-dfp.c: Add -std=gnu17.
* gcc.dg/format/proto.c: Add -std=gnu17.
* gcc.dg/format/sentinel-1.c: Add -std=gnu17.
* gcc.dg/gomp/declare-simd-1.c: Add -Wno-old-style-definition.
* gcc.dg/ifelse-1.c: Add -Wno-old-style-definition.
* gcc.dg/inline-33.c: Add -std=gnu17.
* gcc.dg/ipa/inline-5.c: Add -std=gnu17.
* gcc.dg/ipa/ipa-sra-21.c: Add -std=gnu17.
* gcc.dg/ipa/pr102714.c: Add -std=gnu17.
* gcc.dg/ipa/pr104813.c: Add -std=gnu17.
* gcc.dg/ipa/pr108679.c: Add -std=gnu17.
* gcc.dg/ipa/pr42706.c: Add -std=gnu17.
* gcc.dg/ipa/pr88214.c: Add -Wno-old-style-definition.
* gcc.dg/ipa/pr91853.c: Add -Wno-old-style-definition.
* gcc.dg/ipa/pr93763.c: Add -std=gnu17.
* gcc.dg/ipa/pr96482-2.c: Add -std=gnu17.
* gcc.dg/lto/20091013-1_2.c: Add -std=gnu17.
* gcc.dg/lto/20091015-1_2.c: Add -std=gnu17.
* gcc.dg/lto/pr113197_1.c: Add -std=gnu17.
* gcc.dg/lto/pr54702_1.c: Add -std=gnu17.
* gcc.dg/lto/pr99849_0.c: Add -std=gnu17.
* gcc.dg/noncompile/920923-1.c: Add -std=gnu17.
* gcc.dg/noncompile/old-style-parm-1.c: Add
-Wno-old-style-definition.
* gcc.dg/noncompile/old-style-parm-3.c: Add
-Wno-old-style-definition.
* gcc.dg/noncompile/pr30552-2.c: Add -Wno-old-style-definition.
* gcc.dg/noncompile/pr30552-3.c: Add -std=gnu17.
* gcc.dg/noncompile/pr71265.c: Add -Wno-old-style-definition.
* gcc.dg/noncompile/pr79758-2.c: Add -Wno-old-style-definition.
* gcc.dg/noncompile/pr79758.c: Add -Wno-old-style-definition.
* gcc.dg/noncompile/va-arg-1.c: Add -std=gnu17.
* gcc.dg/old-style-prom-1.c: Add -std=gnu17.
* gcc.dg/old-style-prom-2.c: Add -std=gnu17.
* gcc.dg/old-style-prom-3.c: Add -std=gnu17.
* gcc.dg/old-style-then-proto-1.c: Add -std=gnu17.
* gcc.dg/parm-incomplete-1.c: Add -std=gnu17.
* gcc.dg/parm-mismatch-1.c: Add -std=gnu17.
* gcc.dg/permerror-default.c: Add -std=gnu17.
* gcc.dg/permerror-fpermissive-nowarning.c: Add -std=gnu17.
* gcc.dg/permerror-fpermissive.c: Add -std=gnu17.
* gcc.dg/permerror-noerror.c: Add -std=gnu17.
* gcc.dg/permerror-nowarning.c: Add -std=gnu17.
* gcc.dg/permerror-pedantic.c: Add -std=gnu17.
* gcc.dg/plugin/infoleak-net-ethtool-ioctl.c: Add -std=gnu17.
* gcc.dg/pointer-array-quals-1.c: Add -std=gnu17.
* gcc.dg/pointer-array-quals-2.c: Add -std=gnu17.
* gcc.dg/pr100791.c: Add -std=gnu17.
* gcc.dg/pr100843.c: Add -std=gnu17.
* gcc.dg/pr102273.c: Add -std=gnu17.
* gcc.dg/pr102385.c: Add -std=gnu17.
* gcc.dg/pr103222.c: Add -std=gnu17.
* gcc.dg/pr105140.c: Add -std=gnu17.
* gcc.dg/pr105150.c: Add -std=gnu17.
* gcc.dg/pr105250.c: Add -std=gnu17.
* gcc.dg/pr105972.c: Add -Wno-old-style-definition.
* gcc.dg/pr111039.c: Add -std=gnu17.
* gcc.dg/pr111407.c: Add -std=gnu17.
* gcc.dg/pr111922.c: Add -Wno-old-style-definition.
* gcc.dg/pr15236.c: Add -std=gnu17.
* gcc.dg/pr17188-1.c: Add -std=gnu17.
* gcc.dg/pr20368-1.c: Add -std=gnu17.
* gcc.dg/pr20368-2.c: Add -std=gnu17.
* gcc.dg/pr20368-3.c: Add -std=gnu17.
* gcc.dg/pr27331.c: Add -Wno-old-style-definition.
* gcc.dg/pr27861-1.c: Add -std=gnu17.
* gcc.dg/pr28121.c: Add -std=gnu17.
* gcc.dg/pr28243.c: Add -std=gnu17.
* gcc.dg/pr28888.c: Add -std=gnu17.
* gcc.dg/pr29254.c: Add -std=gnu17.
* gcc.dg/pr34457-1.c: Add -std=gnu17.
* gcc.dg/pr36015.c: Add -std=gnu17.
* gcc.dg/pr38245-3.c: Add -std=gnu17.
* gcc.dg/pr38245-4.c: Add -std=gnu17.
* gcc.dg/pr41241.c: Add -std=gnu17.
* gcc.dg/pr43058.c: Add -std=gnu17.
* gcc.dg/pr44539.c: Add -std=gnu17.
* gcc.dg/pr45055.c: Add -std=gnu17.
* gcc.dg/pr50908.c: Add -Wno-old-style-definition.
* gcc.dg/pr60647-1.c: Add -Wno-old-style-definition.
* gcc.dg/pr63762.c: Add -std=gnu17.
* gcc.dg/pr63804.c: Add -std=gnu17.
* gcc.dg/pr68306-3.c: Add -std=gnu17.
* gcc.dg/pr68533.c: Add -std=gnu17.
* gcc.dg/pr69156.c: Add -std=gnu17.
* gcc.dg/pr7356-2.c: Add -Wno-old-style-definition.
* gcc.dg/pr79983.c: Add -std=gnu17.
* gcc.dg/pr83463.c: Add -std=gnu17.
* gcc.dg/pr87347.c: Add -std=gnu17.
* gcc.dg/pr89521-1.c: Add -std=gnu17.
* gcc.dg/pr89521-2.c: Add -std=gnu17.
* gcc.dg/pr90648.c: Add -std=gnu17.
* gcc.dg/pr93573-1.c: Add -std=gnu17.
* gcc.dg/pr94167.c: Add -std=gnu17.
* gcc.dg/pr94705.c: Add -std=gnu17.
* gcc.dg/pr95118.c: Add -std=gnu17.
* gcc.dg/pr96335.c: Add -std=gnu17.
* gcc.dg/pr97830.c: Add -std=gnu17.
* gcc.dg/pr97882.c: Add -std=gnu17.
* gcc.dg/pr99122-2.c: Add -std=gnu17.
* gcc.dg/pr99122-3.c: Add -std=gnu17.
* gcc.dg/qual-component-1.c: Add -std=gnu17.
* gcc.dg/sibcall-6.c: Add -Wno-old-style-definition.
* gcc.dg/sms-2.c: Add -Wno-old-style-definition.
* gcc.dg/tm/20091221.c: Add -std=gnu17.
* gcc.dg/torture/bfloat16-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float128-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float128x-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float16-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float32-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float32x-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float64-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/float64x-basic.c: Add -Wno-old-style-definition.
* gcc.dg/torture/pr102762.c: Add -std=gnu17.
* gcc.dg/torture/pr103987.c: Add -std=gnu17.
* gcc.dg/torture/pr104825.c: Add -Wno-old-style-definition.
* gcc.dg/torture/pr105166.c: Add -std=gnu17.
* gcc.dg/torture/pr105185.c: Add -Wno-old-style-definition.
* gcc.dg/torture/pr109652.c: Add -std=gnu17.
* gcc.dg/torture/pr112444.c: Add -std=gnu17.
* gcc.dg/torture/pr113895-3.c: Add -std=gnu17.
* gcc.dg/torture/pr24626-2.c: Add -std=gnu17.
* gcc.dg/torture/pr25183.c: Add -std=gnu17.
* gcc.dg/torture/pr38948.c: Add -std=gnu17.
* gcc.dg/torture/pr44807.c: Add -std=gnu17.
* gcc.dg/torture/pr47281.c: Add -std=gnu17.
* gcc.dg/torture/pr47958-1.c: Add -Wno-old-style-definition.
* gcc.dg/torture/pr48063.c: Add -std=gnu17.
* gcc.dg/torture/pr57036-1.c: Add -std=gnu17.
* gcc.dg/torture/pr57330.c: Add -std=gnu17.
* gcc.dg/torture/pr57584.c: Add -std=gnu17.
* gcc.dg/torture/pr67741.c: Add -std=gnu17.
* gcc.dg/torture/pr68104.c: Add -std=gnu17.
* gcc.dg/torture/pr69242.c: Add -std=gnu17.
* gcc.dg/torture/pr70457.c: Add -std=gnu17.
* gcc.dg/torture/pr70985.c: Add -std=gnu17.
* gcc.dg/torture/pr71606.c: Add -std=gnu17.
* gcc.dg/torture/pr71816.c: Add -std=gnu17.
* gcc.dg/torture/pr77286.c: Add -std=gnu17.
* gcc.dg/torture/pr77646.c: Add -std=gnu17.
* gcc.dg/torture/pr77677-2.c: Add -std=gnu17.
* gcc.dg/torture/pr78365.c: Add -Wno-old-style-definition.
* gcc.dg/torture/pr79732.c: Add -std=gnu17.
* gcc.dg/torture/pr80612.c: Add -std=gnu17.
* gcc.dg/torture/pr80764.c: Add -std=gnu17.
* gcc.dg/torture/pr80842.c: Add -std=gnu17.
* gcc.dg/torture/pr81900.c: Add -std=gnu17.
* gcc.dg/torture/pr82276.c: Add -std=gnu17.
* gcc.dg/torture/pr84803.c: Add -std=gnu17.
* gcc.dg/torture/pr93124.c: Add -std=gnu17.
* gcc.dg/torture/pr97330-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-prof/comp-goto-1.c: Add -std=gnu17.
* gcc.dg/tree-ssa/20030703-2.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030708-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030709-2.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030709-3.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030710-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030711-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030711-2.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030711-3.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030714-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030714-2.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030728-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030807-10.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030807-11.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030807-3.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030807-6.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030807-7.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030814-4.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030814-5.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030814-6.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20030918-1.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/20040514-2.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/loadpre7.c: Add -Wno-old-style-definition.
* gcc.dg/tree-ssa/pr111003.c: Add -std=gnu17.
* gcc.dg/tree-ssa/pr115128.c: Add -std=gnu17.
* gcc.dg/tree-ssa/pr115191.c: Add -std=gnu17.
* gcc.dg/tree-ssa/pr24840.c: Add -std=gnu17.
* gcc.dg/tree-ssa/pr69666.c: Add -std=gnu17.
* gcc.dg/tree-ssa/pr70232.c: Add -std=gnu17.
* gcc.dg/ubsan/pr79757-1.c: Add -Wno-old-style-definition.
* gcc.dg/ubsan/pr79757-2.c: Add -Wno-old-style-definition.
* gcc.dg/ubsan/pr79757-3.c: Add -Wno-old-style-definition.
* gcc.dg/ubsan/pr81223.c: Add -std=gnu17.
* gcc.dg/uninit-10-O0.c: Add -Wno-old-style-definition.
* gcc.dg/uninit-10.c: Add -Wno-old-style-definition.
* gcc.dg/uninit-32.c: Add -std=gnu17.
* gcc.dg/uninit-41.c: Add -std=gnu17.
* gcc.dg/uninit-A-O0.c: Add -Wno-old-style-definition.
* gcc.dg/uninit-A.c: Add -Wno-old-style-definition.
* gcc.dg/unused-1.c: Add -Wno-old-style-definition.
* gcc.dg/vect/bb-slp-pr114249.c: Add -std=gnu17.
* gcc.dg/vect/bb-slp-pr97486.c: Add -std=gnu17.
* gcc.dg/vect/bb-slp-subgroups-1.c: Add -std=gnu17.
* gcc.dg/vect/bb-slp-subgroups-2.c: Add -std=gnu17.
* gcc.dg/vect/bb-slp-subgroups-3.c: Add -std=gnu17.
* gcc.dg/vect/vect-early-break_111-pr113731.c: Add -std=gnu17.
* gcc.dg/vect/vect-early-break_122-pr114239.c: Add -std=gnu17.
* gcc.dg/vect/vect-multi-peel-gaps.c: Add -std=gnu17.
* gcc.dg/vla-stexp-2.c: Add -std=gnu17.
* gcc.dg/warn-1.c: Add -Wno-old-style-definition.
* gcc.dg/winline-10.c: Add -Wno-old-style-definition.
* gcc.dg/wtr-label-1.c: Add -Wno-old-style-definition.
* gcc.dg/wtr-switch-1.c: Add -Wno-old-style-definition.
* gcc.target/i386/excess-precision-3.c: Add
-Wno-old-style-definition.
* gcc.target/i386/fma4-256-nmsubXX.c: Add -std=gnu17.
* gcc.target/i386/fma4-nmsubXX.c: Add -std=gnu17.
* gcc.target/i386/nop-mcount.c: Add -Wno-old-style-definition.
* gcc.target/i386/pr102627.c: Add -std=gnu17.
* gcc.target/i386/pr106994.c: Add -std=gnu17.
* gcc.target/i386/pr68349.c: Add -std=gnu17.
* gcc.target/i386/pr97313.c: Add -std=gnu17.
* gcc.target/i386/pr99454.c: Add -std=gnu17.
* gcc.target/i386/record-mcount.c: Add -Wno-old-style-definition.

libffi/
* testsuite/libffi.call/va_struct2.c (test_fn): Cast n to void.
* testsuite/libffi.call/va_struct3.c (test_fn): Likewise.
Backported from <https://github.com/libffi/libffi/pull/861>.

c: Add some checking asserts to named loops handling code

Jonathan mentioned an issue in c_finish_bc_name reported by an unnamed
static analyzer.
It is actually a false positive, because the construction of the
loop_names vector guarantees that the last element of the vector
(if the vector is non-empty) always has either
C_DECL_LOOP_NAME (l) or C_DECL_SWITCH_NAME (l) (or both) flags
set, so c will always be non-NULL after the if at the start of the
loops.
The following patch is an attempt to help those static analyzers
(though dunno if it actually helps), by adding a checking assert.

2024-10-16 Jakub Jelinek <jakub@redhat.com>

* c-decl.cc (c_get_loop_names): Add checking assert that
c is non-NULL in the loop.
(c_finish_bc_name): Likewise.

c: Fix up uninitialized next.original_type use in #embed optimization

Jonathan pointed me at a diagnostic from an unnamed static analyzer
which found that next.original_type isn't initialized for the CPP_EMBED
case when it is parsed in a comma expression, yet
expr.original_type = next.original_type;
is done a few lines later and the expr is returned.
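
The shape of the problem can be sketched in C as below.  The types and
function names (struct expr, parse_operand, finish_expr) are hypothetical
stand-ins and not the real declarations in c-parser.cc; the point is only
that one path must set the field before it is copied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parser's expression record.  */
enum token_kind { TOK_NUMBER, TOK_EMBED };

struct expr
{
  int value;
  const char *original_type;
};

struct expr
parse_operand (enum token_kind kind, int value)
{
  struct expr next;
  next.value = value;
  if (kind == TOK_EMBED)
    /* Without this assignment the field would still be uninitialized
       when finish_expr copies it.  */
    next.original_type = "int";
  else
    next.original_type = "long";
  return next;
}

struct expr
finish_expr (struct expr next)
{
  struct expr expr;
  assert (next.original_type != NULL);
  expr.value = next.value;
  /* The kind of copy the analyzer flagged.  */
  expr.original_type = next.original_type;
  return expr;
}
```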

2024-10-16 Jakub Jelinek <jakub@redhat.com>

* c-parser.cc (c_parser_expression): Initialize next.original_type
to integer_type_node for the CPP_EMBED case.

Add libgomp.oacc-fortran/acc_on_device-1-4.f

Kind of undoes r15-4315-g9f549d216c9716 by adding the original testcase back;
namely, adding acc_on_device-1-3.f as acc_on_device-1-4.f with
-fno-builtin-acc_on_device removed.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/acc_on_device-1-4.f: New test;
same as acc_on_device-1-3.f but using the builtin function.

PR116510: Add missing fold_converts into tree switch if conversion

Passes test suite. Ok to commit?

gcc/ChangeLog:

PR middle-end/116510
* tree-if-conv.cc (predicate_bbs): Add missing fold_converts.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-3.c: New test.

Ternary operator formatting fixes

While working on PR117028 C2Y changes, I've noticed weird ternary
operator formatting (operand1 ? operand2: operand3).
The usual formatting is operand1 ? operand2 : operand3
where we have around 18000+ cases of that (counting only what fits
on one line) and
indent -nbad -bap -nbc -bbo -bl -bli2 -bls -ncdb -nce -cp1 -cs -di2 -ndj \
-nfc1 -nfca -hnl -i2 -ip5 -lp -pcs -psl -nsc -nsob
documented in
https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting
does the same.
Some code was even trying to save space as much as possible and used
operand1?operand2:operand3 or
operand1 ? operand2:operand3

Today I've grepped for such cases (the grep was '?.*[^ ]:' and I had to
skim through various false positives with that where the : matched e.g.
stuff inside of strings, or *.md pattern macros or :: scope) and the
following patch is a fix for what I found.
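
As a small illustration of the spacing in question (the function name max_int
is hypothetical), the usual form puts a space on each side of both '?' and
':'.  The variants the patch rewrites are shown in a comment only.

```c
#include <assert.h>

/* Usual formatting: operand1 ? operand2 : operand3.  */
int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* Variants of the same expression that the patch reformats:
     a > b ? a: b
     a > b?a:b
     a > b ? a:b  */
```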

2024-10-16 Jakub Jelinek <jakub@redhat.com>

gcc/
* attribs.cc (lookup_scoped_attribute_spec): ?: operator formatting
fixes.
* basic-block.h (FOR_BB_INSNS_SAFE): Likewise.
* cfgcleanup.cc (outgoing_edges_match): Likewise.
* cgraph.cc (cgraph_node::dump): Likewise.
* config/arc/arc.cc (gen_acc1, gen_acc2): Likewise.
* config/arc/arc.h (CLASS_MAX_NREGS, CONSTANT_ADDRESS_P): Likewise.
* config/arm/arm.cc (arm_print_operand): Likewise.
* config/cris/cris.md (*b<rnzcond:code><mode>): Likewise.
* config/darwin.cc (darwin_asm_declare_object_name,
darwin_emit_common): Likewise.
* config/darwin-driver.cc (darwin_driver_init): Likewise.
* config/epiphany/epiphany.md (call, sibcall, call_value,
sibcall_value): Likewise.
* config/i386/i386.cc (gen_push2): Likewise.
* config/i386/i386.h (ix86_cur_cost): Likewise.
* config/i386/openbsdelf.h (FUNCTION_PROFILER): Likewise.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins):
Likewise.
* config/loongarch/loongarch-cpu.cc (fill_native_cpu_config):
Likewise.
* config/riscv/riscv.cc (riscv_union_memmodels): Likewise.
* config/riscv/zc.md (*mva01s<X:mode>, *mvsa01<X:mode>): Likewise.
* config/rs6000/mmintrin.h (_mm_cmpeq_pi8, _mm_cmpgt_pi8,
_mm_cmpeq_pi16, _mm_cmpgt_pi16, _mm_cmpeq_pi32, _mm_cmpgt_pi32):
Likewise.
* config/v850/predicates.md (pattern_is_ok_for_prologue): Likewise.
* config/xtensa/constraints.md (d, C, W): Likewise.
* coverage.cc (coverage_begin_function, build_init_ctor,
build_gcov_exit_decl): Likewise.
* df-problems.cc (df_create_unused_note): Likewise.
* diagnostic.cc (diagnostic_set_caret_max_width): Likewise.
* diagnostic-path.cc (path_summary::path_summary): Likewise.
* expr.cc (expand_expr_divmod): Likewise.
* gcov.cc (format_gcov): Likewise.
* gcov-dump.cc (dump_gcov_file): Likewise.
* genmatch.cc (main): Likewise.
* incpath.cc (remove_duplicates, register_include_chains): Likewise.
* ipa-devirt.cc (dump_odr_type): Likewise.
* ipa-icf.cc (sem_item_optimizer::merge_classes): Likewise.
* ipa-inline.cc (inline_small_functions): Likewise.
* ipa-polymorphic-call.cc (ipa_polymorphic_call_context::dump):
Likewise.
* ipa-sra.cc (create_parameter_descriptors): Likewise.
* ipa-utils.cc (find_always_executed_bbs): Likewise.
* predict.cc (predict_loops): Likewise.
* selftest.cc (read_file): Likewise.
* sreal.h (SREAL_SIGN, SREAL_ABS): Likewise.
* tree-dump.cc (dequeue_and_dump): Likewise.
* tree-ssa-ccp.cc (bit_value_binop): Likewise.
gcc/c-family/
* c-opts.cc (c_common_init_options, c_common_handle_option,
c_common_finish, set_std_c89, set_std_c99, set_std_c11,
set_std_c17, set_std_c23, set_std_cxx98, set_std_cxx11,
set_std_cxx14, set_std_cxx17, set_std_cxx20, set_std_cxx23,
set_std_cxx26): ?: operator formatting fixes.
gcc/cp/
* search.cc (lookup_member): ?: operator formatting fixes.
* typeck.cc (cp_build_modify_expr): Likewise.
libcpp/
* expr.cc (interpret_float_suffix): ?: operator formatting fixes.

Fix bootstrap on 32-bit SPARC/Solaris

The 'U' constraint cannot be used with LRA.

gcc/
PR target/113952
PR target/117168
* config/sparc/constraints.md ('U'): Delete.
* config/sparc/sparc.md (*movdi_insn_sp32): Remove U alternatives.
(*movdf_insn_sp32): Likewise.
(*mov<VM64:mode>_insn_sp32): Likewise.
* doc/md.texi (SPARC constraints): Remove entry for 'U'.

Daily bump.

Enhance gather fallback for PR65518 with SLP

With SLP forced we fail to use gather for PR65518 on RISC-V as expected
because we're failing due to ineffective peeling for gaps. The
following appropriately moves the memory_access_type adjustment before
doing all the overrun checking since using VMAT_ELEMENTWISE means
there's no overrun.

* tree-vect-stmts.cc (get_group_load_store_type): Move
VMAT_ELEMENTWISE fallback for single-element interleaving
of too large groups before overrun checking.

* gcc.dg/vect/pr65518.c: Adjust.

tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

The following is a more complete fix for PR117050, restoring the
ability to permute non-grouped .MASK_LOAD.

PR tree-optimization/117050
* tree-vect-slp.cc (vect_build_slp_tree_2): Properly handle
non-grouped masked loads when handling permutations.

Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

The following prepares us for SLP instances with a non-uniform number
of lanes.  We already have this with load permutation lowering, but
we managed to keep that within the constraints of the per SLP instance
computed VF based on its max_nunits (with a vector type fixed for
each node) and the instance group size which is the number of lanes
in the SLP instance root.  But in the case where arbitrary splitting
and merging SLP nodes at non-power-of-two lane boundaries is allowed
this simple calculation based on the outgoing group size falls apart.

The following, instead of computing a VF during SLP instance
discovery, computes it at vect_make_slp_decision time by walking
the SLP graph and looking at each SLP node in isolation.  We do
track max_nunits per node, which could be a VF per node instead, or we
could forgo both completely (though for BB vectorization we need
to communicate a VF > 1 requirement upward, or compute that after
the fact).  In the end we'd like to delay vector type assignment
and only compute a minimum VF here, allowing vector types to
grow when the actual VF is bigger.
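
As a rough illustration of such a walk, here is a minimal sketch.  It is
not the GCC code: struct node, its fields, node_vf and update_vf_for_node
are hypothetical stand-ins, and the per-node rule (the smallest factor that
makes the lane count fill whole vectors, combined across nodes by least
common multiple) is an assumption about what a per-node VF could be.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SLP node; not GCC's slp_tree.  */
struct node {
  unsigned lanes;        /* number of scalar lanes in this node */
  unsigned max_nunits;   /* elements per vector chosen for this node */
  int visited;           /* the graph is a DAG, so visit each node once */
  size_t n_children;
  struct node **children;
};

unsigned gcd_u(unsigned a, unsigned b)
{
  while (b != 0) {
    unsigned t = a % b;
    a = b;
    b = t;
  }
  return a;
}

unsigned lcm_u(unsigned a, unsigned b)
{
  return a / gcd_u(a, b) * b;
}

/* Assumed per-node rule: the smallest factor f such that f * lanes is a
   multiple of max_nunits.  */
unsigned node_vf(const struct node *n)
{
  assert(n->lanes > 0 && n->max_nunits > 0);
  return lcm_u(n->max_nunits, n->lanes) / n->lanes;
}

/* Walk the graph below n, looking at each node in isolation and folding
   its requirement into *vf.  */
void update_vf_for_node(struct node *n, unsigned *vf)
{
  if (n == NULL || n->visited)
    return;
  n->visited = 1;
  for (size_t i = 0; i < n->n_children; i++)
    update_vf_for_node(n->children[i], vf);
  *vf = lcm_u(*vf, node_vf(n));
}
```

Starting from a VF of 1, a 3-lane node with 4-element vectors forces the
overall VF to 4 in this sketch, even when the root has only 2 lanes.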

There's a slight complication with permutes of externs / constants
as those get their vector type (and thus max_nunits) assigned late.
While we force them to have the same vector type as the result at
the moment their number of lanes can differ.  So those get handled
explicitly there right now to up the VF as needed - the alternative
is to fail vectorization; I have an addition to
vect_maybe_update_slp_op_vectype that would FAIL if the set
vector type isn't within the constraints of the VF.

* tree-vectorizer.h (SLP_INSTANCE_UNROLLING_FACTOR): Remove.
(slp_instance::unrolling_factor): Likewise.
* tree-vect-slp.cc (vect_build_slp_instance): Do not set
SLP_INSTANCE_UNROLLING_FACTOR.  Remove the then-dead code.
Compute and set max_nunits from the RHS nodes merged.
(vect_update_slp_vf_for_node): New function.
(vect_make_slp_decision): Use vect_update_slp_vf_for_node
to compute VF recursively.
(vect_build_slp_store_interleaving): Get max_nunits and
properly set that on the permute nodes built.
(vect_analyze_slp): Do not set SLP_INSTANCE_UNROLLING_FACTOR.

testsuite: Add tests for C23 __STDC_VERSION__

Add some tests for the value of __STDC_VERSION__ in C23 mode.
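
As a rough illustration of what such a check looks at, the sketch below
maps __STDC_VERSION__ values to standard names.  The helpers c_std_name
and current_c_std are hypothetical and are not taken from the new tests;
202311L is the value the published C23 standard specifies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the C standard for a __STDC_VERSION__ value.  */
const char *c_std_name(long version)
{
  assert(version >= 0);
  if (version >= 202311L) return "C23";
  if (version >= 201710L) return "C17";
  if (version >= 201112L) return "C11";
  if (version >= 199901L) return "C99";
  return "C90";
}

/* Name of the standard this translation unit is compiled under.  */
const char *current_c_std(void)
{
#ifdef __STDC_VERSION__
  return c_std_name(__STDC_VERSION__);
#else
  return "C90";   /* C90 does not define the macro at all */
#endif
}
```

Compilers that predate the final standard may report an interim value in
their C23 preview modes, which this sketch would classify as C17.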

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

* gcc.dg/c23-version-1.c, gcc.dg/c23-version-2.c,
gcc.dg/gnu23-version-1.c: New tests.

libstdc++: Fix Python deprecation warning in printers.py

python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument

The Python docs say:

  Deprecated since version 3.13: Passing count and flags as positional
  arguments is deprecated. In future Python versions they will be
  keyword-only parameters.

Using a keyword argument for count only became possible with Python 3.1,
so introduce a new function to do the substitution.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.

RISC-V: Use biggest_mode as mode for constants.

In compute_nregs_for_mode we expect that the current variable's mode is
at most as large as the biggest mode to be used for vectorization.

This might not be true for constants as they don't actually have a mode.
In that case, just use the biggest mode so max_number_of_live_regs
returns 1.

This fixes several test cases in the test suite.

gcc/ChangeLog:

PR target/116655

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
Use biggest mode instead of constant's saved mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116655.c: New test.

c: Speed up compilation of large char array initializers when not using #embed

The following patch attempts to speed up compilation of large char array
initializers when one doesn't use #embed in the source.

My testcase has been
unsigned char a[] = {
#embed "cc1gm2" limit (100000000)
};
and corresponding variant which has the middle line replaced with
dd if=cc1gm bs=100000000 count=1 | xxd -i
With embed 95.3MiB is really fast:
time ./cc1 -quiet -O2 -o test4a.s test4a.c

real    0m0.700s
user    0m0.576s
sys     0m0.123s
Without embed and without this patch it needs around 11GB of RAM and
time ./cc1 -quiet -O2 -o test4b.s test4b.c

real    2m47.230s
user    2m41.548s
sys     0m4.328s
Without embed and with this patch it needs around 3.5GB of RAM and
time ./cc1 -quiet -O2 -o test4b.s2 test4b.c

real    0m25.004s
user    0m23.655s
sys     0m1.308s
Not perfect (one needs to parse all the numbers, and libcpp also creates
strings which are pointed to by CPP_NUMBER tokens, which can take up to 4 bytes
per byte), but still an almost 7x speed improvement and 3x less compile time memory.

One drawback of the patch is that for the larger initializers the precise
locations for -Wconversion warnings are gone when initializing signed char
(or char when it is signed) arrays.

If that is important, perhaps c_maybe_optimize_large_byte_initializer could
tell the caller this is the case and c_parser_initval could emit the
warnings directly when it still knows the location_t and suppress warnings
on the RAW_DATA_CST.

2024-10-16  Jakub Jelinek  <jakub@redhat.com>

* c-tree.h (c_maybe_optimize_large_byte_initializer): Declare.
* c-parser.cc (c_parser_initval): Attempt to optimize large char array
initializers into RAW_DATA_CST.
* c-typeck.cc (c_maybe_optimize_large_byte_initializer): New function.

* c-c++-common/init-1.c: New test.
* c-c++-common/init-2.c: New test.
* c-c++-common/init-3.c: New test.

gimplify: Small RAW_DATA_CST gimplification fix

I've noticed the following testcase hangs during gimplification.

While it is gimplifying an assignment from a VAR_DECL .LCNNN to MEM_REF,
because the VAR_DECL is TREE_READONLY, it will happily pick its initializer
and try to gimplify that, which means recursing to the exact same code.

The following patch fixes that by just gimplifying the lhs and building
assignment, because the code decided that it should use copying from
a static var.

2024-10-16 Jakub Jelinek <jakub@redhat.com>

* gimplify.cc (gimplify_init_ctor_eval): For larger RAW_DATA_CST,
just gimplify cref as lvalue and add gimple assignment of rctor
to cref instead of going through gimplification of INIT_EXPR, as
the latter can suffer from infinite recursion.

* c-c++-common/cpp/embed-24.c: New test.

libcpp, c, middle-end: Optimize initializers using #embed in C

This patch actually optimizes #embed, so far in C.

For a simple testcase (for 494447200 bytes long cc1plus):
cat embed-11.c
unsigned char a[] = {
  #embed "cc1plus"
};
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

real    0m13.647s
user    0m7.157s
sys     0m2.597s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

real    0m28.649s
user    0m26.653s
sys     0m1.958s

and when configured against binutils with .base64 support
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

real    0m4.283s
user    0m2.288s
sys     0m0.859s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

real    0m6.888s
user    0m5.876s
sys     0m1.002s

(all times with --enable-checking=yes,rtl,extra compiler).

Even just
./cc1plus -E -o embed-11.i embed-11.c
(which doesn't have this optimization yet and so preprocesses it as a
1.3GB preprocessed file) needed almost 25GB of compile time RAM (but
preprocessed fine).
And when compiling that embed-11.i with -std=c23 -O0 using unpatched gcc,
I gave up after 400 seconds, when it had already eaten 45GB of RAM and hadn't
produced a single byte of embed-11.s yet.

The patch introduces a new CPP_EMBED token which contains raw memory image
virtually representing a sequence of int literals.
To simplify the parsing complexities, the preprocessor guarantees CPP_EMBED
is only emitted if there are 4+ (it actually does that for 64+ right now)
literals in the sequence and emits CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA
CPP_NUMBER tokens (with more CPP_EMBED separated by CPP_COMMA if it is
longer than 2GB, as STRING_CSTs in GCC and also the new RAW_DATA_CST etc.
are limited to INT_MAX elements).  The main reason is that the preprocessor
doesn't really know in which context #embed directive appears, there could
be e.g.
{ 25 *
  #embed "whatever"
* 2 - 15 }
or similar and dealing with this special case deep in the expression parsing
is undesirable.
With the CPP_NUMBERs around it, I believe in the C FE the only places which
need handling of the CPP_EMBED token are initializer parsing (that is the
only one which adds actual optimizations for it), comma expressions (I
believe nothing really cares whether it is 25,13,95 or
25,13,0,1,2,3,4,5,6,7,8,9,10,13,95 etc., so besides the 2 outer CPP_NUMBER
the parsing just adds one INTEGER_CST to the comma expression, I doubt users
want to be spammed with millions of -Wunused warnings per #embed),
whatever uses c_parser_expr_list (function calls, attribute arguments,
OpenMP sizes clause argument, OpenACC tile clause argument) and whatever uses
c_parser_get_builtin_args (mainly for __builtin_shufflevector).  Please correct
me if I'm wrong.
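
The token layout described above can be summarized with a small counting
sketch.  It is not the libcpp code: struct embed_plan and plan_embed are
hypothetical, and the way the middle bytes are divided into chunks is an
assumption; only the threshold (64) and the INT_MAX limit per CPP_EMBED
come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the tokens produced for one #embed.  */
struct embed_plan {
  size_t numbers;   /* CPP_NUMBER tokens */
  size_t embeds;    /* CPP_EMBED tokens */
  size_t commas;    /* CPP_COMMA tokens separating them */
};

/* threshold: smallest byte count that gets a CPP_EMBED (64 per the text).
   max_chunk: most bytes one CPP_EMBED may carry (INT_MAX per the text).  */
struct embed_plan plan_embed(size_t nbytes, size_t threshold, size_t max_chunk)
{
  struct embed_plan p = { 0, 0, 0 };
  assert(threshold >= 3 && max_chunk > 0);
  if (nbytes == 0)
    return p;
  if (nbytes < threshold) {
    /* Short data: one literal per byte, as before.  */
    p.numbers = nbytes;
  } else {
    /* First and last byte stay literals; the rest goes into CPP_EMBED.  */
    size_t middle = nbytes - 2;
    p.numbers = 2;
    p.embeds = middle / max_chunk + (middle % max_chunk != 0);
  }
  /* Every pair of adjacent tokens is separated by one comma.  */
  p.commas = p.numbers + p.embeds - 1;
  return p;
}
```

With the values from the text, a 100-byte #embed yields two CPP_NUMBER
tokens around a single CPP_EMBED, while a 10-byte one stays a plain list of
ten literals.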

The patch introduces a RAW_DATA_CST tree code, which can then be used inside
of array CONSTRUCTOR elt values.  In some sense RAW_DATA_CST is similar to
STRING_CST, but right now STRING_CST is used only if the whole array
initializer is that constant, while RAW_DATA_CST at index idx (should be
always INTEGER_CST index, another advantage of the CPP_NUMBER around is that
[30 ... 250] =
  #embed "whatever"
really does what it would do with an integer sequence there) stands for
[idx] = RAW_DATA_POINTER (val)[0],
[idx+1] = RAW_DATA_POINTER (val)[1],
...
[idx+RAW_DATA_LENGTH (val)-1] = RAW_DATA_POINTER (val)[RAW_DATA_LENGTH (val)-1].
Another important thing is that, unlike STRING_CST, which has the data
embedded in it, RAW_DATA_CST doesn't own the data; it has a RAW_DATA_OWNER
which owns the data.  The owner can be a STRING_CST (e.g. used for PCH or LTO
after reading LTO in) or another RAW_DATA_CST (with NULL RAW_DATA_OWNER,
standing for data owned by libcpp buffers).  The advantage is that it can be
cheaply peeled off, or split into multiple smaller pieces, e.g. if one uses
a designated initializer to store something into the middle of a 10GB #embed
array, in no case do we need to actually copy data around for that.
Right now RAW_DATA_CST is only used in initializers of integral arrays where
the integer type has (host) CHAR_BIT precision, so usually char/signed
char/unsigned char (for C++ later maybe std::byte); in theory we could, say,
allocate a buffer 4 times as big for conversions to an int array, depending
on endianness and storage order reversal etc., but I'm not sure if that is
something that will be actually needed in the wild.
And an optimization inside of c-common.cc attempts to undo that CPP_NUMBER
CPP_EMBED CPP_NUMBER division in case one uses #embed the usual way and
doesn't use the boundary literals in weird ways and the values there match
the surrounding bytes in the owner buffer.
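
The indexing rule and the copy-free split described above can be
illustrated with a minimal sketch.  struct raw_view, raw_view_at and
raw_view_split are hypothetical names; they are loosely modelled on the
description of RAW_DATA_POINTER and RAW_DATA_LENGTH and are not GCC's
tree_raw_data or its accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-owning view of bytes that live in someone else's
   buffer.  */
struct raw_view {
  size_t idx;                 /* array index of the first byte */
  const unsigned char *ptr;   /* points into a buffer owned elsewhere */
  size_t len;                 /* number of bytes covered */
};

/* Element that the view supplies for array index i, i.e.
   [idx + k] = ptr[k] for 0 <= k < len.  */
unsigned char raw_view_at(struct raw_view v, size_t i)
{
  assert(i >= v.idx && i - v.idx < v.len);
  return v.ptr[i - v.idx];
}

/* Split around array index `at` so that a designated initializer can
   override that one element: *before covers [idx, at) and *after covers
   (at, idx + len).  Only pointers and lengths change; no bytes are
   copied.  */
void raw_view_split(struct raw_view v, size_t at,
                    struct raw_view *before, struct raw_view *after)
{
  assert(at >= v.idx && at - v.idx < v.len);
  size_t off = at - v.idx;
  before->idx = v.idx;
  before->ptr = v.ptr;
  before->len = off;
  after->idx = at + 1;
  after->ptr = v.ptr + off + 1;
  after->len = v.len - off - 1;
}
```

Because both halves keep pointing into the same underlying buffer, the cost
of the split does not depend on how large the embedded data is.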

For LTO, in order to avoid copying perhaps gigabytes long data around,
the hacks in the streamer out/in cause the data owned by libcpp to be
streamed right into the stream and streamed back as a STRING_CST which
owns the data.

2024-10-16  Jakub Jelinek  <jakub@redhat.com>

libcpp/
* include/cpplib.h (TTYPE_TABLE): Add CPP_EMBED token type.
* files.cc (finish_embed): For limit >= 64 and C preprocessing
instead of emitting CPP_NUMBER CPP_COMMA separated sequence for the
whole embed emit it just for the first and last byte and in between
emit a CPP_EMBED token or tokens if too large.
gcc/
* treestruct.def (TS_RAW_DATA_CST): New.
* tree.def (RAW_DATA_CST): New tree code.
* tree-core.h (struct tree_raw_data): New type.
(union tree_node): Add raw_data_cst member.
* tree.h (RAW_DATA_LENGTH, RAW_DATA_POINTER, RAW_DATA_OWNER): Define.
(gt_ggc_mx, gt_pch_nx): Declare overloads for tree_raw_data *.
* tree.cc (tree_node_structure_for_code): Handle RAW_DATA_CST.
(initialize_tree_contains_struct): Handle TS_RAW_DATA_CST.
(tree_code_size): Handle RAW_DATA_CST.
(initializer_zerop): Likewise.
(gt_ggc_mx, gt_pch_nx): Define overloads for tree_raw_data *.
* gimplify.cc (gimplify_init_ctor_eval): Handle RAW_DATA_CST.
* fold-const.cc (operand_compare::operand_equal_p): Handle
RAW_DATA_CST.  Formatting fix.
(operand_compare::hash_operand): Handle RAW_DATA_CST.
(native_encode_initializer): Likewise.
(get_array_ctor_element_at_index): Likewise.
(fold): Likewise.
* gimple-fold.cc (fold_array_ctor_reference): Likewise.  Formatting
fix.
* varasm.cc (const_hash_1): Handle RAW_DATA_CST.
(initializer_constant_valid_p_1): Likewise.
(array_size_for_constructor): Likewise.
(output_constructor_regular_field): Likewise.
* expr.cc (categorize_ctor_elements_1): Likewise.
(expand_expr_real_1) <case ARRAY_REF>: Punt for RAW_DATA_CST.
* tree-streamer.cc (streamer_check_handled_ts_structures): Mark
TS_RAW_DATA_CST as handled.
* tree-streamer-in.cc (streamer_alloc_tree): Handle RAW_DATA_CST.
(lto_input_ts_raw_data_cst_tree_pointers): New function.
(streamer_read_tree_body): Call it for RAW_DATA_CST.
* tree-streamer-out.cc (write_ts_raw_data_cst_tree_pointers): New
function.
(streamer_write_tree_body): Call it for RAW_DATA_CST.
(streamer_write_tree_header): Handle RAW_DATA_CST.
* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle RAW_DATA_CST.
* tree-pretty-print.cc (dump_generic_node): Likewise.
gcc/c-family/
* c-ppoutput.cc (token_streamer::stream): Add special code to spell
CPP_EMBED token.
* c-lex.cc (c_lex_with_flags): Handle CPP_EMBED.  Formatting fix.
* c-common.cc (c_parse_error): Handle CPP_EMBED.
(braced_list_to_string): Optimize RAW_DATA_CST surrounded by
INTEGER_CSTs which match some bytes before or after RAW_DATA_CST in
its owner.
gcc/c/
* c-parser.cc (c_parser_braced_init): Handle CPP_EMBED.
(c_parser_get_builtin_args): Likewise.
(c_parser_expression): Likewise.
(c_parser_expr_list): Likewise.
* c-typeck.cc (digest_init): Handle RAW_DATA_CST.  Formatting fix.
(init_node_successor): New function.
(add_pending_init): Handle RAW_DATA_CST.
(set_nonincremental_init): Formatting fix.
(output_init_element): Handle RAW_DATA_CST.  Formatting fixes.
(maybe_split_raw_data): New function.
(process_init_element): Use maybe_split_raw_data.  Handle
RAW_DATA_CST.
gcc/testsuite/
* c-c++-common/cpp/embed-20.c: New test.
* c-c++-common/cpp/embed-21.c: New test.
* c-c++-common/cpp/embed-28.c: New test.
* gcc.dg/cpp/embed-8.c: New test.
* gcc.dg/cpp/embed-9.c: New test.
* gcc.dg/cpp/embed-10.c: New test.
* gcc.dg/cpp/embed-11.c: New test.
* gcc.dg/cpp/embed-12.c: New test.
* gcc.dg/cpp/embed-13.c: New test.
* gcc.dg/cpp/embed-14.c: New test.
* gcc.dg/cpp/embed-15.c: New test.
* gcc.dg/cpp/embed-16.c: New test.
* gcc.dg/pch/embed-1.c: New test.
* gcc.dg/pch/embed-1.hs: New test.
* gcc.dg/lto/embed-1_0.c: New test.
* gcc.dg/lto/embed-1_1.c: New test.

vax: fixup vax.opt.urls

Needed after r15-4373-gb388f65abc71c9.

gcc/ChangeLog:

* config/vax/vax.opt.urls: Adjust index for -mlra.