git.ipfire.org Git - thirdparty/gcc.git/log

testsuite: Add test directive checking removal of link_error

This test needs a directive checking the removal of the link_error.
Committed as obvious.

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/testsuite/
* gcc.dg/tree-ssa/log_ident.c: Add scan for removal of
link_error in optimized tree dump.

c++: redundant hashing in register_specialization

After r15-4050-g5dad738c1dd164 register_specialization needs to set
elt.hash to the (maybe) precomputed hash so that the lookup uses it
rather than redundantly computing it from scratch.
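
The idea of carrying a precomputed hash in the lookup element can be sketched in plain C. This is not the code in pt.cc: the struct, the function names, and the convention that a zero hash means "not yet computed" are all assumptions made for illustration.

```c
#include <assert.h>

/* Counts how often the (hypothetical) expensive hash is computed. */
static unsigned hash_calls;

/* Hypothetical stand-in for an expensive hash computation. */
unsigned demo_compute_hash(int key)
{
  hash_calls++;
  return (unsigned)key * 2654435761u;
}

/* Hypothetical lookup element: key plus an optional cached hash.
   Assumption for this sketch: hash == 0 means "not computed yet".  */
struct demo_elt { int key; unsigned hash; };

/* Hash used by the lookup: reuse the cached value when present. */
unsigned demo_elt_hash(const struct demo_elt *elt)
{
  return elt->hash ? elt->hash : demo_compute_hash(elt->key);
}

unsigned demo_hash_calls(void) { return hash_calls; }

/* Self-check: setting elt.hash up front avoids a second computation. */
void check_cached_hash(void)
{
  struct demo_elt elt = { 42, 0 };
  unsigned before = demo_hash_calls();

  elt.hash = demo_compute_hash(elt.key);   /* computed once by the caller */
  assert(demo_elt_hash(&elt) == elt.hash); /* lookup reuses the cached value */
  assert(demo_hash_calls() == before + 1);

  elt.hash = 0;                            /* without the cached value ... */
  demo_elt_hash(&elt);                     /* ... the lookup recomputes it */
  assert(demo_hash_calls() == before + 2);
}
```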

gcc/cp/ChangeLog:

* pt.cc (register_specialization): Set elt.hash.

Reviewed-by: Jason Merrill <jason@redhat.com>

testsuite: Skip pr112305.c for -O[01] on simulators

gcc.dg/torture/pr112305.c contains an inner loop that executes
0x8000_0014 times and an outer loop that executes 5 times, giving about
10 billion total executions of the inner loop body.  At -O2 and above we
are able to remove the inner loop, but at -O1 we keep a no-op loop:

        dls     lr, r3
.L3:
        subs    r3, r3, #1
        le      lr, .L3

and at -O0 we of course don't optimise.

This can lead to long execution times on simulators, possibly
triggering a timeout.
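
As a quick check of the arithmetic, the sketch below multiplies the two trip counts quoted above. The macro and function names are made up for illustration and do not come from the testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Trip counts quoted in the commit message. */
#define INNER_TRIPS UINT64_C(0x80000014)
#define OUTER_TRIPS UINT64_C(5)

/* Hypothetical helper: total executions of the inner loop body. */
uint64_t total_inner_iterations(void)
{
  return INNER_TRIPS * OUTER_TRIPS;
}

/* Self-check: the total is a little over 10 billion. */
void check_total_inner_iterations(void)
{
  uint64_t total = total_inner_iterations();
  assert(total == UINT64_C(10737418340));
  assert(total > UINT64_C(10000000000));
}
```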

gcc/testsuite
* gcc.dg/torture/pr112305.c: Skip at -O0 and -O1 for simulators.

c++/modules: Handle forward-declared class types

In some cases we can access members of a namespace-scope class without
ever having performed name-lookup on it; this can occur when a
forward-declaration of the class is used as a return type, for
instance, or with PIMPL.

One possible approach would be to do name lookup in complete_type to
force lazy loading to occur, but this seems overly expensive for a
relatively rare case. Instead, this patch generalises the existing
pending-entity support to handle this case as well.

Unfortunately this does mean that almost every class definition will be
added to the pending-entity table, and almost always unnecessarily, but
I don't see a good way to avoid this.

gcc/cp/ChangeLog:

* module.cc (depset::DB_IS_MEMBER_BIT): Rename to...
(depset::DB_IS_PENDING_BIT): ...this.
(depset::is_member): Remove.
(depset::is_pending_entity): New function.
(depset::hash::make_dependency): Mark definitions of
namespace-scope types as maybe-pending entities.
(depset::hash::add_class_entities): Rename DB_IS_MEMBER_BIT to
DB_IS_PENDING_BIT.
(depset::hash::find_dependencies): Use is_pending_entity
instead of is_member.
(module_state::write_pendings): Likewise; adjust comment.

gcc/testsuite/ChangeLog:

* g++.dg/modules/inst-4_b.C: Adjust pending-entity count.
* g++.dg/modules/member-def-1_c.C: Likewise.
* g++.dg/modules/member-def-2_c.C: Likewise.
* g++.dg/modules/tpl-spec-3_b.C: Likewise.
* g++.dg/modules/tpl-spec-4_b.C: Likewise.
* g++.dg/modules/tpl-spec-5_b.C: Likewise.
* g++.dg/modules/class-9_a.H: New test.
* g++.dg/modules/class-9_b.H: New test.
* g++.dg/modules/class-9_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

tree-optimization/117254 - ICE with access diagnostics

The diagnostics code fails to handle non-constant domain max.

PR tree-optimization/117254
* gimple-ssa-warn-access.cc (maybe_warn_nonstring_arg):
Check the array domain max is constant before using it.

* gcc.dg/pr117254.c: New testcase.

amdgcn: Refactor device settings into a def file

Almost all device-specific settings are now centralised into gcn-devices.def
for the compiler, mkoffload, and libgomp. No longer will we have to touch 10
files in multiple places just to add another device without any exotic
features. (New ISAs and devices with incompatible metadata will continue to
need a bit more.)

In order to remove the device-specific conditionals in the code a new value
HSACO_ATTR_UNSUPPORTED has been added, indicating that the assembler will
reject any setting of that option.
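
The general technique of generating several tables from one device list can be sketched with an X-macro. Everything below is hypothetical: the device names, the numeric codes, and the DEMO_* identifiers are invented, and the list macro stands in for a real .def file that would normally be pulled in with #include. It does not reproduce the layout of gcn-devices.def.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical attribute values, loosely modelled on the idea of an
   "unsupported" setting described above.  */
enum demo_attr { DEMO_ATTR_UNSUPPORTED, DEMO_ATTR_ANY };

/* Stand-in for a .def file: one entry per device (made-up data). */
#define DEMO_DEVICES(X)                  \
  X(alpha, 0x01, DEMO_ATTR_ANY)          \
  X(beta,  0x02, DEMO_ATTR_UNSUPPORTED)

/* First expansion: an enum with one enumerator per device. */
#define X(name, code, xnack) DEMO_DEV_##name,
enum demo_device { DEMO_DEVICES(X) DEMO_DEV_COUNT };
#undef X

/* Second expansion: a table of per-device settings. */
struct demo_device_info
{
  const char *name;
  int elf_code;
  enum demo_attr xnack;
};

#define X(name, code, xnack) { #name, code, xnack },
static const struct demo_device_info demo_devices[] = { DEMO_DEVICES(X) };
#undef X

/* Look up a device's code by name; returns -1 if the name is unknown. */
int demo_elf_code(const char *name)
{
  for (int i = 0; i < DEMO_DEV_COUNT; i++)
    if (strcmp(demo_devices[i].name, name) == 0)
      return demo_devices[i].elf_code;
  return -1;
}
```

With this shape, adding a device means adding one line to the list; the enum and the table pick it up on the next build.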

This incorporates some of Tobias's patch from March 2024.

Co-Authored-By: Tobias Burnus <tburnus@baylibre.com>
gcc/ChangeLog:

* config.gcc (amdgcn): Add gcn-device-macros.h to tm_file.
Add gcn-tables.opt to extra_options.
* config/gcn/gcn-hsa.h (NO_XNACK): Delete.
(NO_SRAM_ECC): Delete.
(SRAMOPT): Move definition to generated file gcn-device-macros.h.
(XNACKOPT): Likewise.
(ASM_SPEC): Redefine using generated values from gcn-device-macros.h.
* config/gcn/gcn-opts.h
(enum processor_type): Generate from gcn-devices.def.
(TARGET_VEGA10): Delete.
(TARGET_VEGA20): Delete.
(TARGET_GFX908): Delete.
(TARGET_GFX90a): Delete.
(TARGET_GFX90c): Delete.
(TARGET_GFX1030): Delete.
(TARGET_GFX1036): Delete.
(TARGET_GFX1100): Delete.
(TARGET_GFX1103): Delete.
(TARGET_XNACK): Redefine to allow for HSACO_ATTR_UNSUPPORTED.
(enum hsaco_attr_type): Add HSACO_ATTR_UNSUPPORTED.
(TARGET_TGSPLIT): New define.
* config/gcn/gcn.cc (gcn_devices): New constant table.
(gcn_option_override): Rework to use gcn_devices table.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
(gcn_hsa_declare_function_name): Rework using TARGET_* macros.
* config/gcn/gcn.h (gcn_devices): Declare struct and table.
(TARGET_CPU_CPP_BUILTINS): Rework using gcn_devices.
* config/gcn/gcn.opt: Move enum data to generated file gcn-tables.opt.
Use new names for the default values.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX900): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX906): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX908): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX90a): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX90c): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1030): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1036): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1100): Delete.
(EF_AMDGPU_MACH_AMDGCN_GFX1103): Delete.
(enum elf_arch_code): Define using gcn-devices.def.
(get_arch): Rework using gcn-devices.def.
(main): Rework using gcn-devices.def.
* config/gcn/t-gcn-hsa (gcn-tables.opt): Generate file.
(gcn-device-macros.h): Generate file.
* config/gcn/t-omp-device: Generate isa list from gcn-devices.def.
* config/gcn/gcn-devices.def: New file.
* config/gcn/gcn-tables.opt: New file.
* config/gcn/gcn-tables.opt.urls: New file.
* config/gcn/gen-gcn-device-macros.awk: New file.
* config/gcn/gen-opt-tables.awk: New file.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Generate from gcn-devices.def.
(gcn_gfx803_s): Delete.
(gcn_gfx900_s): Delete.
(gcn_gfx906_s): Delete.
(gcn_gfx908_s): Delete.
(gcn_gfx90a_s): Delete.
(gcn_gfx90c_s): Delete.
(gcn_gfx1030_s): Delete.
(gcn_gfx1036_s): Delete.
(gcn_gfx1100_s): Delete.
(gcn_gfx1103_s): Delete.
(gcn_isa_name_len): Delete.
(isa_hsa_name): Rename ...
(isa_name): ... to this, and rework using gcn-devices.def.
(isa_gcc_name): Delete.
(isa_code): Rework using gcn-devices.def.
(max_isa_vgprs): Rework using gcn-devices.def.
(isa_matches_agent): Update isa_name usage.
(GOMP_OFFLOAD_init_device): Improve diagnostic using the name.

tree-optimization/117123 - missed PHI equivalence in VN

Value-numbering can use its set of equivalences to prove that
a PHI node with args <a_1, 5, 10> is equal to a_1 iff a_1 == 5
and a_1 == 10 hold on the respective edges carrying the constants.
This breaks down when the order of PHI args is <5, 10, a_1>, as we
then drop to VARYING early.  The following mitigates this by
shuffling a copy of the edge vector to always process an SSA name
argument first, which should also handle the special case of
a two-argument <5, a_1> PHI that we already had.
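
A source-level shape that leads to such a PHI can be sketched as follows. This is an illustration only, not the contents of pr117123.C, and the order in which GCC lists the PHI arguments for it depends on how the CFG is built.

```c
#include <assert.h>

/* At the join point, r merges three values: 5 on the edge where
   a == 5, 10 on the edge where a == 10, and a itself otherwise.
   On every incoming edge the merged value equals a, so the whole
   function is equivalent to "return a".  */
int phi_equiv(int a)
{
  int r;
  if (a == 5)
    r = 5;
  else if (a == 10)
    r = 10;
  else
    r = a;
  return r;
}

/* Self-check: the function behaves as the identity. */
void check_phi_equiv(void)
{
  for (int a = -20; a <= 20; a++)
    assert(phi_equiv(a) == a);
}
```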

PR tree-optimization/117123
* tree-ssa-sccvn.cc (visit_phi): First process a non-constant
argument edge to handle more equivalences.  Remove the
two-arg special case.

* g++.dg/tree-ssa/pr117123.C: New testcase.

testsuite: Fix typo in ext-floating19.C

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/ext-floating19.C: Fix typo for bfloat16 guard.

RISC-V: Add testcases for unsigned .SAT_SUB form 1 with IMM = 1.

form 1:
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
{                                       \
  return (T)IMM >= y ? (T)IMM - y : 0;  \
}

Passed the rv64gcv regression test.

Change-Id: I8805225b445cdbbc685f4f54a4d66c7ee8f748e1
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_sub_imm-1_4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4_2.c: New test.

Match: Support IMM=1 for unsigned scalar .SAT_SUB IMM form 1

This patch adds support for .SAT_SUB when one of the operands
is the immediate IMM = 1, in form 1.

Form 1:
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
{                                       \
   return IMM >= y ? IMM - y : 0;        \
}

Take the following form 1 instance as an example:
DEF_SAT_U_SUB_IMM_FMT_1(uint8_t, 1)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_1 (uint8_t y)
{
  uint8_t _1;
  uint8_t _3;

  <bb 2> [local count: 1073741824]:
  if (y_2(D) <= 1)
    goto <bb 3>; [41.00%]
  else
    goto <bb 4>; [59.00%]

  <bb 3> [local count: 440234144]:
  _3 = y_2(D) ^ 1;

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <0(2), _3(3)>
  return _1;

}

After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_1 (uint8_t y)
{
  uint8_t _1;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = .SAT_SUB (1, y_2(D)); [tail call]
  return _1;
;;    succ:       EXIT

}
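
The pre-patch dump computes the result as y ^ 1 under the guard y <= 1. A small exhaustive check, sketched below, suggests why that is the same function as the saturating subtraction. The macro is copied from the message; before_patch_shape_imm1 and the check function are names invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
{                                       \
  return IMM >= y ? IMM - y : 0;        \
}

/* Defines sat_u_sub_imm1_uint8_t_fmt_1. */
DEF_SAT_U_SUB_IMM_FMT_1(uint8_t, 1)

/* Hypothetical C rendering of the pre-patch GIMPLE shape. */
uint8_t before_patch_shape_imm1(uint8_t y)
{
  return y <= 1 ? (uint8_t)(y ^ 1) : 0;
}

/* Self-check over every uint8_t input. */
void check_sat_u_sub_imm1(void)
{
  for (unsigned y = 0; y <= UINT8_MAX; y++)
    assert(sat_u_sub_imm1_uint8_t_fmt_1((uint8_t)y)
           == before_patch_shape_imm1((uint8_t)y));
}
```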

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:

* match.pd: Support IMM=1.

RISC-V: Add testcases for unsigned .SAT_SUB form 1 with IMM = max - 1.

form 1:
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
{                                       \
  return (T)IMM >= y ? (T)IMM - y : 0;  \
}

Passed the rv64gcv regression test.

Change-Id: Idaa1ab41f2a5785112279ea8ee2c93236457b740
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_sub_imm-1_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4_1.c: New test.

Match: Support IMM=max-1 for unsigned scalar .SAT_SUB IMM form 1

This patch adds support for .SAT_SUB when one of the operands
is the immediate IMM = max - 1, in form 1.

Form 1:
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
{                                       \
   return IMM >= y ? IMM - y : 0;        \
}

Take the following form 1 instance as an example:
DEF_SAT_U_SUB_IMM_FMT_1(uint8_t, 254)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm254_uint8_t_fmt_1 (uint8_t y)
{
  uint8_t _1;
  uint8_t _3;

  <bb 2> [local count: 1073741824]:
  if (y_2(D) != 255)
    goto <bb 3>; [66.00%]
  else
    goto <bb 4>; [34.00%]

  <bb 3> [local count: 708669600]:
  _3 = 254 - y_2(D);

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <0(2), _3(3)>
  return _1;

}

After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm254_uint8_t_fmt_1 (uint8_t y)
{
  uint8_t _1;

  <bb 2> [local count: 1073741824]:
  _1 = .SAT_SUB (254, y_2(D)); [tail call]
  return _1;

}
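
For uint8_t, the condition 254 >= y is false only for y == 255, which is why the pre-patch dump tests y != 255. The sketch below checks that equivalence over all inputs; before_patch_shape_imm254 and the check function are names invented here.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline))             \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
{                                       \
  return IMM >= y ? IMM - y : 0;        \
}

/* Defines sat_u_sub_imm254_uint8_t_fmt_1. */
DEF_SAT_U_SUB_IMM_FMT_1(uint8_t, 254)

/* Hypothetical C rendering of the pre-patch GIMPLE shape. */
uint8_t before_patch_shape_imm254(uint8_t y)
{
  return y != 255 ? (uint8_t)(254 - y) : 0;
}

/* Self-check over every uint8_t input. */
void check_sat_u_sub_imm254(void)
{
  for (unsigned y = 0; y <= UINT8_MAX; y++)
    assert(sat_u_sub_imm254_uint8_t_fmt_1((uint8_t)y)
           == before_patch_shape_imm254((uint8_t)y));
}
```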

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:

* match.pd: Support IMM=max-1.

Daily bump.

[committed][PR rtl-optimization/116488] Fix SIGN_EXTEND source handling in ext-dce

A while back I noticed that the code to call carry_backpropagate was being
called after the optimization step.  Which seemed wrong, but at the time I
didn't have a testcase showing it as a problem.  Now I have 4 :-)

The way things used to work, the extension would be stripped away before
calling carry_backpropagate, meaning carry_backpropagate would never see a
SIGN_EXTEND.  Thus the code trying to account for the sign extended bit was
never reached.

Getting that bit marked live is what's needed to fix these testcases. Fallout
is minor with just an adjustment needed to sensibly deal with vector modes in a
place where we didn't have them before.
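
Why the sign bit has to be marked live can be sketched with a simplified bit-level model. This is not the logic in ext-dce.cc, which tracks liveness in coarser groups and handles many more cases; the function name and the 64-bit mask representation are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: given the set of live bits in the result of a
   sign extension from SRC_BITS bits, return the set of bits that
   must be live in the source.  Every result bit above the source
   width is a copy of the source's sign bit, so if any such bit is
   live, the sign bit must be live too.  */
uint64_t sign_extend_src_live(uint64_t dst_live, unsigned src_bits)
{
  assert(src_bits >= 1 && src_bits < 64);
  uint64_t src_mask = (UINT64_C(1) << src_bits) - 1;
  uint64_t live = dst_live & src_mask;      /* low bits map one to one */
  if (dst_live & ~src_mask)                 /* any extended bit live?  */
    live |= UINT64_C(1) << (src_bits - 1);  /* then so is the sign bit */
  return live;
}

/* Self-check with an 8-bit source. */
void check_sign_extend_src_live(void)
{
  assert(sign_extend_src_live(0x0F, 8) == 0x0F);   /* only low bits used */
  assert(sign_extend_src_live(0xFF00, 8) == 0x80); /* only upper bits used */
  assert(sign_extend_src_live(0x101, 8) == 0x81);  /* mix of both */
}
```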

I'm still somewhat concerned about this code.  Specifically whether or not we
can get in here with arbitrarily complex RTL, and if so do we need to recurse
down and look at those sub-expressions.

So while this patch fixes the most pressing issue, I wouldn't be terribly
surprised if we're back inside this code at some point.

Bootstrapped and regression tested on x86_64, ppc64le, riscv64, s390x, mips64,
loongarch, aarch64, m68k, alpha, hppa, sh4, sh4eb, perhaps something else that
I've forgotten...  Also tested on all the crosses in my tester.

PR rtl-optimization/116488
PR rtl-optimization/116579
PR rtl-optimization/116915
PR rtl-optimization/117226
gcc/
* ext-dce.cc (carry_backpropagate): Properly handle SIGN_EXTEND, add
ZERO_EXTEND handling as well.
(ext_dce_process_uses): Call carry_backpropagate before the optimization
step.

gcc/testsuite/
* gcc.dg/torture/pr116488.c: New test.
* gcc.dg/torture/pr116579.c: New test.
* gcc.dg/torture/pr116915.c: New test.
* gcc.dg/torture/pr117226.c: New test.

RISC-V: Add testcases for form 8 of vector signed SAT_TRUNC

Form 8:
  #define DEF_VEC_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_8 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN >= x || x >= (WT)NT_MAX                     \
  ? x < 0 ? NT_MIN : NT_MAX                                     \
  : trunc;                                                      \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-8-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-8-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 7 of vector signed SAT_TRUNC

Form 7:
  #define DEF_VEC_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_7 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN > x || x >= (WT)NT_MAX                      \
  ? x < 0 ? NT_MIN : NT_MAX                                     \
  : trunc;                                                      \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-7-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-7-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 6 of vector signed SAT_TRUNC

Form 6:
  #define DEF_VEC_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_6 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN >= x || x > (WT)NT_MAX                      \
  ? x < 0 ? NT_MIN : NT_MAX                                     \
  : trunc;                                                      \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-6-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-6-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 5 of vector signed SAT_TRUNC

Form 5:
  #define DEF_VEC_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_5 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN > x || x > (WT)NT_MAX                       \
  ? x < 0 ? NT_MIN : NT_MAX                                     \
  : trunc;                                                      \
      }                                                                 \
  }
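
Forms 5 through 8 as quoted in these messages differ only in whether the two boundary comparisons are strict. Since an input equal to NT_MIN or NT_MAX saturates to itself, all four variants should compute the same result. The scalar sketch below checks this for int16_t to int8_t; the helper macro and function names are invented for this illustration and are not part of vec_sat_arith.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar variant of the "||" forms, parameterised on the
   two comparison operators.  The narrowing cast relies on GCC's
   wrap-around behaviour for out-of-range values.  */
#define DEF_SCALAR_SAT_S_TRUNC(NAME, LO_OP, HI_OP)               \
int8_t NAME (int16_t x)                                          \
{                                                                \
  int8_t trunc = (int8_t)x;                                      \
  return (int16_t)INT8_MIN LO_OP x || x HI_OP (int16_t)INT8_MAX  \
    ? x < 0 ? INT8_MIN : INT8_MAX                                \
    : trunc;                                                     \
}

DEF_SCALAR_SAT_S_TRUNC(sat_like_form_5, >,  >)
DEF_SCALAR_SAT_S_TRUNC(sat_like_form_6, >=, >)
DEF_SCALAR_SAT_S_TRUNC(sat_like_form_7, >,  >=)
DEF_SCALAR_SAT_S_TRUNC(sat_like_form_8, >=, >=)

/* Reference: plain clamp to the int8_t range. */
int8_t sat_reference(int16_t x)
{
  return x < INT8_MIN ? INT8_MIN : x > INT8_MAX ? INT8_MAX : (int8_t)x;
}

/* Self-check over every int16_t input. */
void check_sat_forms_agree(void)
{
  for (int x = INT16_MIN; x <= INT16_MAX; x++)
    {
      int8_t ref = sat_reference((int16_t)x);
      assert(sat_like_form_5((int16_t)x) == ref);
      assert(sat_like_form_6((int16_t)x) == ref);
      assert(sat_like_form_7((int16_t)x) == ref);
      assert(sat_like_form_8((int16_t)x) == ref);
    }
}
```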

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-5-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-5-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 4 of vector signed SAT_TRUNC

Form 4:
  #define DEF_VEC_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_4 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN <= x && x < (WT)NT_MAX                      \
  ? trunc                                                       \
  : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-4-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-4-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 3 of vector signed SAT_TRUNC

Form 3:
  #define DEF_VEC_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_3 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN < x && x < (WT)NT_MAX                       \
  ? trunc                                                       \
  : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-3-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-3-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 2 of vector signed SAT_TRUNC

Form 2:
  #define DEF_VEC_SAT_S_TRUNC_FMT_2(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_2 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN < x && x < (WT)NT_MAX                       \
  ? trunc                                                       \
  : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-2-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-2-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for form 1 of vector signed SAT_TRUNC

Form 1:
  #define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX                     \
  ? trunc                                                       \
  : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

The following test passes with this patch.
* The rv64gcv full regression test.

This is a test-only patch and largely obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: Add test data for
signed SAT_TRUNC.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-1-i64-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i16-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i32-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i32-to-i8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i64-to-i16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i64-to-i32.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_s_trunc-run-1-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Implement vector SAT_TRUNC for signed integer

This patch implements sstrunc for vector signed integers.

Form 1:
  #define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX                     \
          ? trunc                                                       \
          : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

DEF_VEC_SAT_S_TRUNC_FMT_1(int32_t, int64_t, INT32_MIN, INT32_MAX)
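
For reference, the per-element behaviour of Form 1 can be written out as a
plain scalar function.  This is only a sketch for the int64_t -> int32_t
instance above; the function names are illustrative and do not come from the
patch or the testsuite headers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for one element of the Form 1 loop, specialised to
   int64_t -> int32_t.  The name is illustrative only.  */
int32_t
sat_s_trunc_i64_to_i32 (int64_t x)
{
  if (x < (int64_t) INT32_MIN)
    return INT32_MIN;   /* Too small: clamp to the narrow minimum.  */
  if (x > (int64_t) INT32_MAX)
    return INT32_MAX;   /* Too large: clamp to the narrow maximum.  */
  return (int32_t) x;   /* In range: plain truncation is exact.  */
}

/* Array form, mirroring the shape of the macro-generated function.  */
void
vec_sat_s_trunc_ref (int32_t *out, const int64_t *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    out[i] = sat_s_trunc_i64_to_i32 (in[i]);
}

/* Small self-check covering both clamps and the in-range case.  */
void
sat_s_trunc_self_check (void)
{
  const int64_t in[3] = { INT64_MIN, 42, INT64_MAX };
  int32_t out[3];

  vec_sat_s_trunc_ref (out, in, 3);
  assert (out[0] == INT32_MIN && out[1] == 42 && out[2] == INT32_MAX);
}
```

The single vnclip.wi in the "after" listing is expected to produce the same
values as this reference for every input element.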

Before this patch:
  vsetvli a5,a2,e64,m1,ta,ma
  vle64.v v1,0(a1)
  slli    a3,a5,3
  slli    a4,a5,2
  sub a2,a2,a5
  add a1,a1,a3
  vadd.vv v0,v1,v5
  vsetvli zero,zero,e32,mf2,ta,ma
  vnsrl.wx    v2,v1,a6
  vncvt.x.x.w v1,v1
  vsetvli zero,zero,e64,m1,ta,ma
  vmsgtu.vv   v0,v0,v4
  vsetvli zero,zero,e32,mf2,ta,mu
  vneg.v  v2,v2
  vxor.vv v1,v2,v3,v0.t
  vse32.v v1,0(a0)
  add a0,a0,a4
  bne a2,zero,.L3

After this patch:
  vsetvli a5,a2,e32,mf2,ta,ma
  vle64.v v1,0(a1)
  slli    a3,a5,3
  slli    a4,a5,2
  sub a2,a2,a5
  add a1,a1,a3
  vnclip.wi   v1,v1,0
  vse32.v v1,0(a0)
  add a0,a0,a4
  bne a2,zero,.L3

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (sstrunc<mode><v_double_trunc>2): Add
new pattern sstrunc for double trunc.
(sstrunc<mode><v_quad_trunc>2): Ditto but for quad trunc.
(sstrunc<mode><v_oct_trunc>2): Ditto but for oct trunc.
* config/riscv/riscv-protos.h (expand_vec_double_sstrunc): Add
new func decl to expand double trunc.
(expand_vec_quad_sstrunc): Ditto but for quad trunc.
(expand_vec_oct_sstrunc): Ditto but for oct trunc.
* config/riscv/riscv-v.cc (expand_vec_double_sstrunc): Add new
func to expand double trunc.
(expand_vec_quad_sstrunc): Ditto but for quad trunc.
(expand_vec_oct_sstrunc): Ditto but for oct trunc.

Signed-off-by: Pan Li <pan2.li@intel.com>

Vect: Try the pattern of vector signed integer SAT_TRUNC

Almost the same as vector unsigned integer SAT_TRUNC, try to match
the signed version during the vector pattern matching.

The following test suites passed for this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* tree-vect-patterns.cc (gimple_signed_integer_sat_trunc): Add
new func decl for signed SAT_TRUNC.
(vect_recog_sat_trunc_pattern): Try signed match pattern for
the SAT_TRUNC.

Signed-off-by: Pan Li <pan2.li@intel.com>

Match: Support form 1 for vector signed integer SAT_TRUNC

This patch supports form 1 of the vector signed integer SAT_TRUNC,
as in the example below:

Form 1:
  #define DEF_VEC_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX)             \
  void __attribute__((noinline))                                        \
  vec_sat_s_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        NT trunc = (NT)x;                                               \
        out[i] = (WT)NT_MIN <= x && x <= (WT)NT_MAX                     \
          ? trunc                                                       \
          : x < 0 ? NT_MIN : NT_MAX;                                    \
      }                                                                 \
  }

DEF_VEC_SAT_S_TRUNC_FMT_1(int32_t, int64_t, INT32_MIN, INT32_MAX)

Before this patch:
  _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [2, 2]);
  ivtmp_64 = _87 * 8;
  vect_x_14.10_67 = .MASK_LEN_LOAD (vectp_in.8_65, 64B, { -1, ... }, _87, 0);
  vect_trunc_15.21_78 = (vector([2,2]) int) vect_x_14.10_67;
  _61 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned long>(vect_x_14.10_67);
  _32 = _61 >> 63;
  vect_patt_52.16_73 = (vector([2,2]) int) _32;
  vect__46.17_74 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>(vect_patt_52.16_73);
  vect__47.18_75 = -vect__46.17_74;
  vect__21.19_76 = VIEW_CONVERT_EXPR<vector([2,2]) int>(vect__47.18_75);
  vect_x.11_68 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned long>(vect_x_14.10_67);
  vect__5.12_69 = vect_x.11_68 + { 2147483648, ... };
  mask__34.13_70 = vect__5.12_69 > { 4294967295, ... };
  _25 = .COND_XOR (mask__34.13_70, vect__21.19_76, { 2147483647, ... }, vect_trunc_15.21_78);
  ivtmp_80 = _87 * 4;
  .MASK_LEN_STORE (vectp_out.23_81, 32B, { -1, ... }, _87, 0, _25);
  vectp_in.8_66 = vectp_in.8_65 + ivtmp_64;
  vectp_out.23_82 = vectp_out.23_81 + ivtmp_80;
  ivtmp_86 = ivtmp_85 - _87;

After this patch:
  _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
  ivtmp_65 = _77 * 8;
  vect_x_14.10_68 = .MASK_LEN_LOAD (vectp_in.8_66, 64B, { -1, ... }, _77, 0);
  vect_patt_53.11_69 = .SAT_TRUNC (vect_x_14.10_68);
  ivtmp_70 = _77 * 4;
  .MASK_LEN_STORE (vectp_out.12_71, 32B, { -1, ... }, _77, 0, vect_patt_53.11_69);
  vectp_in.8_67 = vectp_in.8_66 + ivtmp_65;
  vectp_out.12_72 = vectp_out.12_71 + ivtmp_70;
  ivtmp_76 = ivtmp_75 - _77;
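
The "before" dump avoids comparing against both bounds: it biases the wide
value by 2^31 to detect overflow with one unsigned compare, and builds the
saturated value by XOR-ing a sign mask with INT32_MAX.  A scalar rendering of
that sequence might look as follows.  This is a sketch written from the dump
above, not code from the patch, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

int32_t
sat_s_trunc_biased (int64_t x)
{
  uint64_t ux = (uint64_t) x;

  /* 0 for non-negative x, 0xFFFFFFFF for negative x: the ">> 63"
     followed by the negation in the dump.  */
  uint32_t sign_mask = 0u - (uint32_t) (ux >> 63);

  /* Adding 2^31 maps [INT32_MIN, INT32_MAX] onto [0, 2^32 - 1], so a
     larger biased value means x does not fit in 32 bits.  */
  int overflow = (ux + 2147483648u) > 4294967295u;

  /* sign_mask ^ INT32_MAX gives INT32_MAX on positive overflow and the
     bit pattern of INT32_MIN on negative overflow (the .COND_XOR).  */
  uint32_t bits = overflow ? (sign_mask ^ 2147483647u) : (uint32_t) ux;

  /* Converting an out-of-range value to int32_t is implementation-defined;
     GCC defines it as reduction modulo 2^32.  */
  return (int32_t) bits;
}

/* Self-check at the boundaries of the int32_t range.  */
void
sat_s_trunc_biased_self_check (void)
{
  assert (sat_s_trunc_biased ((int64_t) INT32_MAX) == INT32_MAX);
  assert (sat_s_trunc_biased ((int64_t) INT32_MAX + 1) == INT32_MAX);
  assert (sat_s_trunc_biased ((int64_t) INT32_MIN) == INT32_MIN);
  assert (sat_s_trunc_biased ((int64_t) INT32_MIN - 1) == INT32_MIN);
}
```

With this patch the whole sequence is recognised and replaced by a single
.SAT_TRUNC call, as the "after" dump shows.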

The following test suites passed for this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Refine matching for vector signed SAT_TRUNC form 1.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Fix costing of move to/from MOVEABLE_SYSREGS

This is necessary to prevent reload assuming that a direct FP->FPMR move
is valid.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_register_move_cost):
Increase costs involving MOVEABLE_SYSREGS.

amdgcn: silence warning

FIRST_SGPR_REG is register zero so the compiler always claims this comparison
is redundant. It's right, of course, but I'd have preferred to keep the
comparison for completeness. Probably the "correct" solution is to use an enum
for these values.
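
A minimal sketch of the situation, assuming an unsigned register number.
Only the fact that FIRST_SGPR_REG is zero comes from this message; the value
of LAST_SGPR_REG and the macro names with suffixes are hypothetical, and the
second macro is just one possible way to silence the warning, not necessarily
what the patch does.

```c
#include <assert.h>

/* Hypothetical numbering for illustration.  */
#define FIRST_SGPR_REG 0
#define LAST_SGPR_REG 101

/* "Complete" form: for an unsigned argument the lower-bound test is
   always true, which is what a warning such as -Wtype-limits reports.  */
#define SGPR_REGNO_P_FULL(N) \
  ((N) >= FIRST_SGPR_REG && (N) <= LAST_SGPR_REG)

/* Reduced form: drops the redundant lower bound.  */
#define SGPR_REGNO_P_UPPER_ONLY(N) ((N) <= LAST_SGPR_REG)

/* Returns nonzero when both forms classify REGNO the same way.  */
int
sgpr_forms_agree (unsigned regno)
{
  return SGPR_REGNO_P_FULL (regno) == SGPR_REGNO_P_UPPER_ONLY (regno);
}
```

The two forms agree for every unsigned input, which is why the compiler
considers the lower-bound comparison redundant.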

gcc/ChangeLog:

* config/gcn/gcn.h (SGPR_REGNO_P): Silence warning.

pair-fusion: Assume alias conflict if common address reg changes [PR116783]

As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn. This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).
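
The rule can be illustrated with a toy model.  This is not the pair-fusion
implementation: the struct, the single-register simplification, and the way
the alias-analysis result is passed in are all assumptions made for the
sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: an insn's address uses one register, and DEF_ID names the
   definition of that register which the insn sees.  */
struct addr_use
{
  int regno;
  int def_id;
};

/* Same address register, but different reaching definitions: the
   register was modified between the two insns.  */
bool
toy_addr_reg_conflict_p (struct addr_use a, struct addr_use b)
{
  return a.regno == b.regno && a.def_id != b.def_id;
}

/* Conservative conflict test.  ALIAS_ANALYSIS_SAYS_CONFLICT stands in
   for the result of the RTL alias query, which is only trusted when the
   address registers are consistent.  */
bool
toy_conflict_p (struct addr_use a, struct addr_use b,
                bool alias_analysis_says_conflict)
{
  if (toy_addr_reg_conflict_p (a, b))
    return true;   /* Assume an alias conflict without asking.  */
  return alias_analysis_says_conflict;
}
```

In the PR's scenario both accesses use x1 but see different definitions of
it, so the toy check reports a conflict even if alias analysis would have
answered "no conflict".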

gcc/ChangeLog:

PR rtl-optimization/116783
* pair-fusion.cc (def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(pair_fusion_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.

libstdc++: Improve 26_numerics/headers/cmath/types_std_c++0x_neg.cc

This test checks that the special functions in <cmath> are not declared
prior to C++17. But we can remove the target selector and allow it to be
tested for C++17 and later, and add target selectors to the individual
dg-error directives instead.

Also rename the test to match what it actually tests.

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/headers/cmath/types_std_c++0x_neg.cc:
Move to ...
* testsuite/26_numerics/headers/cmath/specfun_c++17.cc: here and
adjust test to be valid for all -std dialects.

libstdc++: Simplify C++98 std::vector::_M_data_ptr overload set

We don't need separate overloads for returning a const or non-const
pointer. We can make the member function const and return a non-const
pointer, and let vector::data() const convert it to const as needed.

libstdc++-v3/ChangeLog:

* include/bits/stl_vector.h (vector::_M_data_ptr): Remove
non-const overloads. Always return non-const pointer.

libstdc++: Fix order of [[...]] and __attribute__((...)) attrs [PR117220]

GCC allows these in either order, but Clang doesn't like the C++11-style
[[__nodiscard__]] coming after __attribute__((__always_inline__)).

libstdc++-v3/ChangeLog:

PR libstdc++/117220
* include/bits/stl_iterator.h: Move _GLIBCXX_NODISCARD
annotations after __attribute__((__always_inline__)).

rs6000: Correct the function code for _AMO_LD_DEC_BOUNDED

Corrected the function code for the Atomic Memory Operation "Fetch and Decrement
Bounded", changing it from 0x1A to 0x1C.

2024-10-11 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/

* config/rs6000/amo.h (enum _AMO_LD): Correct the function code for
_AMO_LD_DEC_BOUNDED.

i386: Refactor get_intel_cpu

The ISE shows that Diamond Rapids will use family 0x13, so get_intel_cpu
needs to be refactored to accept new families.  The switch is also
reordered for clarity, with earlier-added products placed on top to make
searching easier.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Refactor the
function for future expansion on different family.

RISC-V: Skip flag -flto for all saturated arithmetic test cases.

Skip flag -flto to address UNRESOLVED cases as follows:

gcc.target/riscv/sat_s_add-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects: output file does not exist
UNRESOLVED: gcc.target/riscv/sat_s_add-1.c

Change-Id: I7ff55197b6294cd473dfaa6cc350c5e2eb5960fe
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_s_add-1.c: Skip flag -flto.
* gcc.target/riscv/sat_s_add-10.c: Ditto.
* gcc.target/riscv/sat_s_add-11.c: Ditto.
* gcc.target/riscv/sat_s_add-12.c: Ditto.
* gcc.target/riscv/sat_s_add-13.c: Ditto.
* gcc.target/riscv/sat_s_add-14.c: Ditto.
* gcc.target/riscv/sat_s_add-15.c: Ditto.
* gcc.target/riscv/sat_s_add-16.c: Ditto.
* gcc.target/riscv/sat_s_add-2.c: Ditto.
* gcc.target/riscv/sat_s_add-3.c: Ditto.
* gcc.target/riscv/sat_s_add-4.c: Ditto.
* gcc.target/riscv/sat_s_add-5.c: Ditto.
* gcc.target/riscv/sat_s_add-6.c: Ditto.
* gcc.target/riscv/sat_s_add-7.c: Ditto.
* gcc.target/riscv/sat_s_add-8.c: Ditto.
* gcc.target/riscv/sat_s_add-9.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-1-i8.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-2-i8.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-3-i8.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i16.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i32.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i64.c: Ditto.
* gcc.target/riscv/sat_s_sub-4-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-1-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c: Ditto.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c: Ditto.
* gcc.target/riscv/sat_u_add-1.c: Ditto.
* gcc.target/riscv/sat_u_add-10.c: Ditto.
* gcc.target/riscv/sat_u_add-11.c: Ditto.
* gcc.target/riscv/sat_u_add-12.c: Ditto.
* gcc.target/riscv/sat_u_add-13.c: Ditto.
* gcc.target/riscv/sat_u_add-14.c: Ditto.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-16.c: Ditto.
* gcc.target/riscv/sat_u_add-17.c: Ditto.
* gcc.target/riscv/sat_u_add-18.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-2.c: Ditto.
* gcc.target/riscv/sat_u_add-20.c: Ditto.
* gcc.target/riscv/sat_u_add-21.c: Ditto.
* gcc.target/riscv/sat_u_add-22.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-24.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-4.c: Ditto.
* gcc.target/riscv/sat_u_add-5.c: Ditto.
* gcc.target/riscv/sat_u_add-6.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.
* gcc.target/riscv/sat_u_add-8.c: Ditto.
* gcc.target/riscv/sat_u_add-9.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-1.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-10.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-13.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-14.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-2.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-4.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-5.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-6.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-8.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-9.c: Ditto.
* gcc.target/riscv/sat_u_sub-1.c: Ditto.
* gcc.target/riscv/sat_u_sub-10.c: Ditto.
* gcc.target/riscv/sat_u_sub-11.c: Ditto.
* gcc.target/riscv/sat_u_sub-12.c: Ditto.
* gcc.target/riscv/sat_u_sub-13.c: Ditto.
* gcc.target/riscv/sat_u_sub-14.c: Ditto.
* gcc.target/riscv/sat_u_sub-15.c: Ditto.
* gcc.target/riscv/sat_u_sub-16.c: Ditto.
* gcc.target/riscv/sat_u_sub-17.c: Ditto.
* gcc.target/riscv/sat_u_sub-18.c: Ditto.
* gcc.target/riscv/sat_u_sub-19.c: Ditto.
* gcc.target/riscv/sat_u_sub-2.c: Ditto.
* gcc.target/riscv/sat_u_sub-20.c: Ditto.
* gcc.target/riscv/sat_u_sub-21.c: Ditto.
* gcc.target/riscv/sat_u_sub-22.c: Ditto.
* gcc.target/riscv/sat_u_sub-23.c: Ditto.
* gcc.target/riscv/sat_u_sub-24.c: Ditto.
* gcc.target/riscv/sat_u_sub-25.c: Ditto.
* gcc.target/riscv/sat_u_sub-26.c: Ditto.
* gcc.target/riscv/sat_u_sub-27.c: Ditto.
* gcc.target/riscv/sat_u_sub-28.c: Ditto.
* gcc.target/riscv/sat_u_sub-29.c: Ditto.
* gcc.target/riscv/sat_u_sub-3.c: Ditto.
* gcc.target/riscv/sat_u_sub-30.c: Ditto.
* gcc.target/riscv/sat_u_sub-31.c: Ditto.
* gcc.target/riscv/sat_u_sub-32.c: Ditto.
* gcc.target/riscv/sat_u_sub-33.c: Ditto.
* gcc.target/riscv/sat_u_sub-34.c: Ditto.
* gcc.target/riscv/sat_u_sub-35.c: Ditto.
* gcc.target/riscv/sat_u_sub-36.c: Ditto.
* gcc.target/riscv/sat_u_sub-37.c: Ditto.
* gcc.target/riscv/sat_u_sub-38.c: Ditto.
* gcc.target/riscv/sat_u_sub-39.c: Ditto.
* gcc.target/riscv/sat_u_sub-4.c: Ditto.
* gcc.target/riscv/sat_u_sub-40.c: Ditto.
* gcc.target/riscv/sat_u_sub-41.c: Ditto.
* gcc.target/riscv/sat_u_sub-42.c: Ditto.
* gcc.target/riscv/sat_u_sub-43.c: Ditto.
* gcc.target/riscv/sat_u_sub-44.c: Ditto.
* gcc.target/riscv/sat_u_sub-45.c: Ditto.
* gcc.target/riscv/sat_u_sub-46.c: Ditto.
* gcc.target/riscv/sat_u_sub-47.c: Ditto.
* gcc.target/riscv/sat_u_sub-48.c: Ditto.
* gcc.target/riscv/sat_u_sub-5.c: Ditto.
* gcc.target/riscv/sat_u_sub-6.c: Ditto.
* gcc.target/riscv/sat_u_sub-7.c: Ditto.
* gcc.target/riscv/sat_u_sub-8.c: Ditto.
* gcc.target/riscv/sat_u_sub-9.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-10.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-10_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-10_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-13.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-13_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-13_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-14.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-14_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-14_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-1_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-1_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-2_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-2_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-4.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-6.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-8.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-9.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-9_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-9_2.c: Ditto.
* gcc.target/riscv/sat_u_trunc-1.c: Ditto.
* gcc.target/riscv/sat_u_trunc-10.c: Ditto.
* gcc.target/riscv/sat_u_trunc-11.c: Ditto.
* gcc.target/riscv/sat_u_trunc-12.c: Ditto.
* gcc.target/riscv/sat_u_trunc-13.c: Ditto.
* gcc.target/riscv/sat_u_trunc-14.c: Ditto.
* gcc.target/riscv/sat_u_trunc-15.c: Ditto.
* gcc.target/riscv/sat_u_trunc-16.c: Ditto.
* gcc.target/riscv/sat_u_trunc-17.c: Ditto.
* gcc.target/riscv/sat_u_trunc-18.c: Ditto.
* gcc.target/riscv/sat_u_trunc-19.c: Ditto.
* gcc.target/riscv/sat_u_trunc-2.c: Ditto.
* gcc.target/riscv/sat_u_trunc-20.c: Ditto.
* gcc.target/riscv/sat_u_trunc-21.c: Ditto.
* gcc.target/riscv/sat_u_trunc-22.c: Ditto.
* gcc.target/riscv/sat_u_trunc-23.c: Ditto.
* gcc.target/riscv/sat_u_trunc-24.c: Ditto.
* gcc.target/riscv/sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_u_trunc-4.c: Ditto.
* gcc.target/riscv/sat_u_trunc-5.c: Ditto.
* gcc.target/riscv/sat_u_trunc-6.c: Ditto.
* gcc.target/riscv/sat_u_trunc-7.c: Ditto.
* gcc.target/riscv/sat_u_trunc-8.c: Ditto.
* gcc.target/riscv/sat_u_trunc-9.c: Ditto.

[testsuite] [arm] add effective target and options for pacbti tests

arm pac and bti tests that use -march=armv8.1-m.main get an implicit
-mthumb, which is incompatible with vxworks kernel mode.  Declaring the
requirement for an 8.1-m.main-compatible toolchain is enough to avoid
those failures, because the toolchain feature test fails in kernel mode.
However, taking the -march options from the standardized arch tests, after
testing for support for the corresponding effective target, makes it
generally safer, and enables us to drop skip directives and extraneous
option variants.

for  gcc/testsuite/ChangeLog

* gcc.target/arm/bti-1.c: Require arch, use its opts, drop skip.
* gcc.target/arm/bti-2.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-11.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-12.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.
* g++.target/arm/pac-1.C: Likewise.  Drop +mve.

Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"

r12-6103-g1a7ce8570997eb combines vpcmpuw + zero_extend to vpcmpuw
with the pre_reload splitter, but the splitter transforms the
zero_extend into a subreg, which makes reload think the upper part is
garbage.  That is not correct.

The patch changes the zero_extend define_insn_and_split to
define_insn so that the zero_extend is kept.

gcc/ChangeLog:

PR target/117159
* config/i386/sse.md
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Change from define_insn_and_split to define_insn.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Split to the zero_extend pattern.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117159.c: New test.
* gcc.target/i386/avx512bw-pr103750-1.c: Remove xfail.
* gcc.target/i386/avx512bw-pr103750-2.c: Remove xfail.

Daily bump.

Revert "[PATCH 7/7] RISC-V: Disable by pieces for vector setmem length > UNITS_PER_WORD"

This reverts commit 72ceddbfb78dbb95f0808c3eca1765e8cd48b023.

modula2: M2MetaError.{def,mod} and P2SymBuild.mod further cleanup

Further cleanups, plus improved wording of an error message.

gcc/m2/ChangeLog:

* gm2-compiler/M2MetaError.mod (op): Corrected ordering.
* gm2-compiler/P2SymBuild.def: Remove comment.
* gm2-compiler/P2SymBuild.mod (GetComparison): Replace
the word less with fewer.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Daily bump.

diagnostics: libcpp: Improve locations for _Pragma lexing diagnostics [PR114423]

libcpp is not currently set up to be able to generate valid
locations for tokens lexed from a _Pragma string. Instead, after obtaining
the tokens, it sets their locations all to the location of the _Pragma
operator itself. This makes things like _Pragma("GCC diagnostic") work well
enough, but if any diagnostics are issued during lexing, prior to resetting
the token locations, those diagnostics get issued at the invalid
locations. Fix that up by adding a new field pfile->diagnostic_override_loc
that instructs libcpp to issue diagnostics at the alternate location.

libcpp/ChangeLog:

PR preprocessor/114423
* internal.h (struct cpp_reader): Add DIAGNOSTIC_OVERRIDE_LOC
field.
* directives.cc (destringize_and_run): Set the new field to the
location of the _Pragma operator.
* errors.cc (cpp_diagnostic_at): Support DIAGNOSTIC_OVERRIDE_LOC to
temporarily issue diagnostics at a different location.
(cpp_diagnostic_with_line): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/114423
* c-c++-common/cpp/pragma-diagnostic-loc.c: New test.
* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust expected output.
* g++.dg/pch/operator-1.C: Likewise.

modula2: Tidyup gm2-compiler/M2MetaError.mod

This patch is a tidyup for gm2-compiler/M2MetaError.mod.

gcc/m2/ChangeLog:

* gm2-compiler/M2MetaError.mod (op): Alphabetically order
each case label and comment.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

phiopt: do factor_out_conditional_operation for all phis [PR112418]

Sometimes factor_out_conditional_operation can factor out
an operation that causes a phi node to become the same element.
Other times, we want to factor out a binary operator because
it can improve code generation, an example is PR 110015 (openjpeg).

Note this includes a heuristic to decide whether factoring out the
operation is profitable.  It could be extended with a better detector of
live-range extension.  Right now it uses a simple one: if the operand is
live on a dominating path, it is considered live already; otherwise, if
there are only a small number of assign statements (5 by default), the
factoring is assumed not to extend the live range too much.
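
At the source level the transformation corresponds roughly to the rewrite
below.  This is an illustrative sketch, not one of the new tests; the
function names are invented, and the real pass works on GIMPLE phi nodes
rather than on C source.

```c
#include <assert.h>

/* Before: both arms apply the same conversion, so the merge point has a
   phi of two already-converted values.  */
int
narrow_before (int c, long a, long b)
{
  int t;
  if (c)
    t = (int) a;
  else
    t = (int) b;
  return t;
}

/* After factoring: the phi selects between the unconverted values and a
   single conversion follows it.  */
int
narrow_after (int c, long a, long b)
{
  long t = c ? a : b;
  return (int) t;
}

/* The same idea for a binary operator with a common operand K.  */
int
add_before (int c, int a, int b, int k)
{
  return c ? a + k : b + k;
}

int
add_after (int c, int a, int b, int k)
{
  return (c ? a : b) + k;
}
```

In the factored forms the selected operand stays live until the shared
operation, which is the live-range cost the heuristic and
--param=phiopt-factor-max-stmts-live= are meant to bound.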

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/112418

gcc/ChangeLog:

* tree-ssa-phiopt.cc (is_factor_profitable): New function.
(factor_out_conditional_operation): Add merge argument. Remove
arg0/arg1 arguments. Return bool instead of the new phi.
Early return for virtual ops. Call is_factor_profitable to
check if the factoring would be profitable.
(pass_phiopt::execute): Call factor_out_conditional_operation
on all phis instead of just singleton phi.
* doc/invoke.texi (--param phiopt-factor-max-stmts-live=): Document.
* params.opt (--param=phiopt-factor-max-stmts-live=): New opt.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/factor_op_phi-1.c: New test.
* gcc.dg/tree-ssa/factor_op_phi-2.c: New test.
* gcc.dg/tree-ssa/factor_op_phi-3.c: New test.
* gcc.dg/tree-ssa/factor_op_phi-4.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[PATCH][v5] RISC-V: add option -m(no-)autovec-segment

Add option -m(no-)autovec-segment to allow or prevent the autovectorizer
emitting vector segment load/store instructions. This is useful for
performance experiments.
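
A loop over interleaved data is the typical case where segment loads become
candidates.  The function below is a hypothetical example, not one of the
new tests; whether the vectorizer actually picks a segment load such as
vlseg2e32.v for it depends on the target and cost model.

```c
#include <assert.h>

/* Two fields stored interleaved in memory.  */
struct point
{
  int x;
  int y;
};

/* Each iteration reads both fields of one element.  Building with and
   without -mno-autovec-segment (for example at -O3 -march=rv64gcv) and
   comparing the generated loops is the kind of experiment the option is
   meant for.  */
void
sum_fields (int *restrict out, const struct point *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i].x + in[i].y;
}
```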

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, vec_mask_len_store_lanes):
Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
* config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New macro.
* config/riscv/riscv.opt (-m(no-)autovec-segment): New option.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/no-segment.c: New test.

Add missing dg-error to unsigned_38.f90.

gcc/testsuite/ChangeLog:

PR fortran/117225
* gfortran.dg/unsigned_38.f90: Add missing dg-error directive.

[PATCH 7/7] RISC-V: Disable by pieces for vector setmem length > UNITS_PER_WORD

For fast unaligned access targets, by pieces uses up to UNITS_PER_WORD
size pieces resulting in more store instructions than needed.  For
example gcc.target/riscv/rvv/base/setmem-1.c:f1 built with
`-O3 -march=rv64gcv -mtune=thead-c906`:
```
f1:
        vsetivli        zero,8,e8,mf2,ta,ma
        vmv.v.x v1,a1
        vsetivli        zero,0,e32,mf2,ta,ma
        sb      a1,14(a0)
        vmv.x.s a4,v1
        vsetivli        zero,8,e16,m1,ta,ma
        vmv.x.s a5,v1
        vse8.v  v1,0(a0)
        sw      a4,8(a0)
        sh      a5,12(a0)
        ret
```

The slow unaligned access version built with `-O3 -march=rv64gcv` used
15 sb instructions:
```
f1:
        sb      a1,0(a0)
        sb      a1,1(a0)
        sb      a1,2(a0)
        sb      a1,3(a0)
        sb      a1,4(a0)
        sb      a1,5(a0)
        sb      a1,6(a0)
        sb      a1,7(a0)
        sb      a1,8(a0)
        sb      a1,9(a0)
        sb      a1,10(a0)
        sb      a1,11(a0)
        sb      a1,12(a0)
        sb      a1,13(a0)
        sb      a1,14(a0)
        ret
```

After this patch, the following is generated in both cases:
```
f1:
        vsetivli        zero,15,e8,m1,ta,ma
        vmv.v.x v1,a1
        vse8.v  v1,0(a0)
        ret
```
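
The test source is not shown here.  Judging by the 15 sb instructions and the
`vsetivli zero,15` above, f1 presumably fills 15 bytes with a runtime value,
so a stand-in with the same shape could be written as follows (an assumption,
not the actual contents of setmem-1.c):
```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for f1: set 15 bytes at A to the value B.  The
   length is a small compile-time constant, which is what makes the
   by-pieces expansion a candidate in the first place.  */
void
f1 (void *a, int b)
{
  memset (a, b, 15);
}
```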

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p):
New function.
(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Define.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113469.c: Expect mf2 setmem.
* gcc.target/riscv/rvv/base/setmem-2.c: Update f1 to expect
straight-line vector memset.
* gcc.target/riscv/rvv/base/setmem-3.c: Likewise.

[PATCH 5/7] RISC-V: Move vector memcpy decision making to separate function [NFC]

This moves the code for deciding whether to generate a vectorized
memcpy, what vector mode to use and whether a loop is needed out of
riscv_vector::expand_block_move and into a new function
riscv_vector::use_vector_stringop_p so that it can be reused for other string
operations.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (struct stringop_info): New.
(expand_block_move): Move decision making code to...
(use_vector_stringop_p): ...here.

[PATCH 4/7] RISC-V: Honour -mrvv-max-lmul in riscv_vector::expand_block_move

Unlike the other vector string ops, expand_block_move was using max LMUL
m8 regardless of TARGET_MAX_LMUL.

The check for whether to generate inline vector code for movmem has been
moved from movmem<mode> to riscv_vector::expand_block_move to avoid
maintaining multiple versions of similar logic.  They already differed
on the minimum length for which they would generate vector code.  Now
that the expand_block_move value is used, movmem will be generated for
smaller lengths.

Limiting memcpy to m1 caused some memcpy loops to be generated in
the calling convention tests which makes it awkward to add suitable scan
assembler tests checking the return value being set, so
-mrvv-max-lmul=m8 has been added to these tests.  Other tests have been
adjusted to expect the new memcpy m1 generation where reasonably
straight-forward, otherwise -mrvv-max-lmul=m8 has been added.

pr111720-[0-9].c regressed because a memcpy loop is generated instead
of straight-line.  This reveals an existing issue where a redundant
straight-line memcpy gets eliminated but a memcpy loop does not
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117205).

For example, on pr111720-0.c after this patch:

-mrvv-max-lmul=m8:

test:
        lui     a5,%hi(.LANCHOR0)
        li      a4,32
        addi    sp,sp,-32
        addi    a5,a5,%lo(.LANCHOR0)
        vsetvli zero,a4,e8,m1,ta,ma
        vle8.v  v8,0(a5)
        addi    sp,sp,32
        jr      ra

-mrvv-max-lmul=m1:

test:
        addi    sp,sp,-32
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        mv      a2,sp
        li      a3,32
.L2:
        vsetvli a4,a3,e8,m1,ta,ma
        vle8.v  v8,0(a5)
        sub     a3,a3,a4
        add     a5,a5,a4
        vse8.v  v8,0(a2)
        add     a2,a2,a4
        bne     a3,zero,.L2
        li      a5,32
        vsetvli zero,a5,e8,m1,ta,ma
        vle8.v  v8,0(sp)
        addi    sp,sp,32
        jr      ra

I have added -mrvv-max-lmul=m8 to pr111720-[0-9].c so that we continue
to test the elimination of straight-line memcpy.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_lmul_mode): New prototype.
(expand_block_move): Add bool parameter for movmem_p.
* config/riscv/riscv-string.cc (riscv_expand_block_move_scalar):
Pass movmem_p as false to riscv_vector::expand_block_move.
(expand_block_move): Add movmem_p parameter.  Return false if
loop needed and movmem_p is true.  Respect TARGET_MAX_LMUL.
* config/riscv/riscv-v.cc (get_lmul_mode): New function.
* config/riscv/riscv.md (movmem<mode>): Move checking for
whether to generate inline vector code to
riscv_vector::expand_block_move by passing movmem_p as true.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113206-1.c: Add
-mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/autovec/pr113206-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Add
-mrvv-max-lmul=m8 and adjust assembly scans.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c:
Likewise.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Add
-mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Likewise.
* gcc.target/riscv/rvv/base/cpymem-1.c: Expect m1 in f1 and f2.
* gcc.target/riscv/rvv/base/cpymem-2.c: Add -mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/base/movmem-1.c: Adjust f1 to a length
that will not get vectorized.
* gcc.target/riscv/rvv/base/pr111720-0.c: Add -mrvv-max-lmul=m8.
* gcc.target/riscv/rvv/base/pr111720-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-2.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-3.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-4.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-5.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-6.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-7.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-8.c: Likewise.
* gcc.target/riscv/rvv/base/pr111720-9.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: Expect memcpy m1
loops.
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Likewise.

PR modula2/115328 The FORWARD keyword is not implemented

This patch implements the FORWARD keyword found in the ISO standard.
The patch checks incoming parameters against the prior declaration
found in definition/forward sections and will issue an error based
on virtual tokens highlighting the full parameter declaration.

gcc/m2/ChangeLog:

PR modula2/115328
* gm2-compiler/M2MetaError.def: Extend comment documenting
new format specifiers.
* gm2-compiler/M2MetaError.mod (GetTokProcedure): New declaration.
(doErrorScopeModule): New procedure.
(doErrorScopeForward): Ditto.
(doErrorScopeMod): Reimplement.
(doErrorScopeFor): New procedure.
(declarationMod): Ditto.
(doErrorScopeDefinition): Ditto.
(doErrorScopeDef): Reimplement.
(declaredDef): New procedure.
(declaredFor): Ditto.
(doErrorScopeProc): Ditto.
(declaredVar): Ditto.
(declaredType): Ditto.
(declaredFull): Ditto.
* gm2-compiler/M2Options.mod (SetAutoInit): Add missing
return type.
(GetDumpGimple): Remove duplicate implementation.
* gm2-compiler/M2Quads.def (DupFrame): New procedure.
* gm2-compiler/M2Quads.mod (DupFrame): New procedure.
* gm2-compiler/M2Reserved.def (ForwardTok): New variable.
* gm2-compiler/M2Reserved.mod (ForwardTok): Initialize variable.
* gm2-compiler/M2Scaffold.mod (DeclareArgEnvParams): Add
tokno parameter for call to PutParam.
* gm2-compiler/P0SymBuild.def (EndForward): New procedure.
* gm2-compiler/P0SymBuild.mod (EndForward): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (BlockAssert): New procedure.
(ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P1Build.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P1SymBuild.def (Export): Remove unnecessary
export.
(EndBuildForward): New procedure.
* gm2-compiler/P1SymBuild.mod (StartBuildProcedure): Reimplement.
(EndBuildProcedure): Ditto.
(EndBuildForward): Ditto.
* gm2-compiler/P2Build.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P2SymBuild.def (BuildProcedureDefinedByForward):
New procedure.
(BuildProcedureDefinedByProper): Ditto.
(CheckProcedure): Ditto.
(EndBuildForward): Ditto.
* gm2-compiler/P2SymBuild.mod (EndBuildProcedure): Reimplement.
(EndBuildForward): New procedure.
(BuildFPSection): Reimplement to allow forward declaration or
checking of parameters.
(BuildProcedureDefinedByProper): New procedure.
(BuildProcedureDefinedByForward): Ditto.
(FailParameter): Remove.
(ParameterError): New procedure.
(ParameterMismatch): Ditto.
(EndBuildFormalParameters): Add parameter number check.
(GetComparison): New procedure function.
(GetSourceDesc): Ditto.
(GetCurSrcDesc): Ditto.
(GetDeclared): New procedure.
(ReturnTypeMismatch): Ditto.
(BuildFunction): Reimplement.
(CheckProcedure): New procedure.
(CheckFormalParameterSection): Reimplement using ParameterError.
* gm2-compiler/P3Build.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/P3SymBuild.def (Export): Remove unnecessary export.
(EndBuildForward): New procedure.
* gm2-compiler/P3SymBuild.mod (EndBuildForward): New procedure.
* gm2-compiler/PCBuild.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/PCSymBuild.def (EndBuildForward): New procedure.
* gm2-compiler/PCSymBuild.mod (EndBuildForward): Ditto.
* gm2-compiler/PHBuild.bnf (ProcedureDeclaration): Reimplement rule.
(PostProcedureHeading): New rule.
(ForwardDeclaration): Ditto.
(ProperProcedure): Ditto.
* gm2-compiler/SymbolTable.def (PutVarTok): New procedure.
(PutParam): Add typetok parameter.
(PutVarParam): Ditto.
(PutParamName): Ditto.
(GetDeclaredFor): New procedure function.
(AreParametersDefinedInDefinition): Ditto.
(PutParametersDefinedByForward): New procedure.
(GetParametersDefinedByForward): New procedure function.
(PutParametersDefinedByProper): New procedure.
(GetParametersDefinedByProper): New procedure function.
(GetProcedureDeclaredForward): Ditto.
(PutProcedureDeclaredForward): New procedure.
(GetProcedureDeclaredProper): New procedure function.
(PutProcedureDeclaredProper): New procedure.
(GetProcedureDeclaredDefinition): New procedure function.
(PutProcedureDeclaredDefinition): New procedure.
(GetVarDeclTypeTok): Ditto.
(PutVarDeclTypeTok): New procedure.
(GetVarDeclTok): Ditto.
(PutVarDeclTok): New procedure.
(GetVarDeclFullTok): Ditto.
* gm2-compiler/SymbolTable.mod (ProcedureDecl): New record type.
(VarDecl): Ditto.
(SymProcedure): Add new field Declared.
(SymVar): Add new field Declared.
(PutVarTok): New procedure.
(PutParam): Add typetok parameter.
(PutVarParam): Ditto.
(PutParamName): Ditto.
(GetDeclaredFor): New procedure function.
(AreParametersDefinedInDefinition): Ditto.
(PutParametersDefinedByForward): New procedure.
(GetParametersDefinedByForward): New procedure function.
(PutParametersDefinedByProper): New procedure.
(GetParametersDefinedByProper): New procedure function.
(GetProcedureDeclaredForward): Ditto.
(PutProcedureDeclaredForward): New procedure.
(GetProcedureDeclaredProper): New procedure function.
(PutProcedureDeclaredProper): New procedure.
(GetProcedureDeclaredDefinition): New procedure function.
(PutProcedureDeclaredDefinition): New procedure.
(GetVarDeclTypeTok): Ditto.
(PutVarDeclTypeTok): New procedure.
(GetVarDeclTok): Ditto.
(PutVarDeclTok): New procedure.
(GetVarDeclFullTok): Ditto.
(MakeProcedure): Initialize Declared field.
(MakeVar): Initialize Declared field.
* gm2-libs-log/FileSystem.def (FileNameChar): Add
missing return type.
* m2.flex: Add FORWARD keyword.

gcc/testsuite/ChangeLog:

PR modula2/115328
* gm2/iso/fail/badparam.def: New test.
* gm2/iso/fail/badparam.mod: New test.
* gm2/iso/fail/badparam2.def: New test.
* gm2/iso/fail/badparam2.mod: New test.
* gm2/iso/fail/badparam3.def: New test.
* gm2/iso/fail/badparam3.mod: New test.
* gm2/iso/fail/badparamarray.def: New test.
* gm2/iso/fail/badparamarray.mod: New test.
* gm2/iso/fail/simpledef1.def: New test.
* gm2/iso/fail/simpledef1.mod: New test.
* gm2/iso/fail/simpleforward.mod: New test.
* gm2/iso/fail/simpleforward2.mod: New test.
* gm2/iso/fail/simpleforward3.mod: New test.
* gm2/iso/fail/simpleforward4.mod: New test.
* gm2/iso/fail/simpleforward5.mod: New test.
* gm2/iso/fail/simpleforward7.mod: New test.
* gm2/iso/pass/simpleforward.mod: New test.
* gm2/iso/pass/simpleforward6.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Fortran: Fix translatability of diagnostic strings

gcc/fortran/ChangeLog:

* check.cc (is_c_interoperable): Use _(...) around to mark strings
as translatable.
* data.cc (gfc_assign_data_value): Move string literal to gfc_error
to make it translatable.
* resolve.cc (resolve_fl_variable, resolve_equivalence): Use G_(...)
around string literals.
* scanner.cc (skip_fixed_omp_sentinel): Replace '...' by %<...%>.
* trans-openmp.cc (gfc_split_omp_clauses,
gfc_trans_omp_declare_variant): Likewise.

Fortran: Add range-based diagnostic

GCC's diagnostic engine gained support for ranges a while ago, i.e. instead
of pointing at a single character with '^', it can also show a '~~~~^~~~~~'
range.

This patch adds support for this and adds 9 users of it, which covers the
most common cases. A single '^' can still be useful. Some location data in
gfortran is rather bad: often the matching pattern includes whitespace such
that the before or after location points to the beginning/end of the
whitespace, which can be far off, especially when comments and/or continuation
lines are involved. Otherwise, a '^' is often still sufficient, although wrong
location data only becomes obvious once ranges are used.

The 'locus' is extended to support two ways of storing the data;
gfc_current_locus always contains the old format (at least during parsing)
and gfc_current_locus shall not be used in trans*.cc. The latter permits
a nice cleanup to just use input_location. Otherwise, the new format is
only used when switching to ranges.
The only reason to convert from location_t to locus occurs in trans*.cc
for the gfc_error (etc.) diagnostic and for gfc_trans_runtime_check; there
are currently 5 such cases.  For gfc_* diagnostics, we could think of
another letter besides %L or a modifier like '%lL', if deemed useful.

In any case, the new format is just:
  locus->u.location = linemap_position_for_loc_and_offset (line_table,
                         loc->u.lb->location, loc->nextc - loc->u.lb->line);
  locus->nextc = (gfc_char_t *) -1;  /* Marker for new format. */
i.e. using the existing location_t location in the linebuffer (which
points to column 0) and adding the actually used column number as offset.

As location_t handles ranges, we just use it also to store them via:
  location = make_location (caret, begin, end)
There are a few convenience macros/functions but that's all.
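
To illustrate what a (caret, begin, end) triple expresses, here is a toy
model that renders the underline for such a range.  It is a sketch only and
does not use libcpp's line maps or location_t; the names `range_sketch`,
`underline` and `underline_usage` are hypothetical, and columns are assumed
to be 0-based with begin <= caret <= end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of a range, with fields in the same order as the
// make_location (caret, begin, end) arguments.
struct range_sketch
{
  std::size_t caret, begin, end;
};

// Build the underline printed beneath a source line:
// '~' over columns [begin, end] and '^' at the caret column.
std::string underline(const range_sketch &r)
{
  std::string out(r.end + 1, ' ');
  for (std::size_t col = r.begin; col <= r.end; ++col)
    out[col] = (col == r.caret) ? '^' : '~';
  return out;
}

// Usage: a range covering columns 2..12 with the caret at column 6.
void underline_usage()
{
  const std::string u = underline({6, 2, 12});
  assert(u.size() == 13 && u[2] == '~' && u[6] == '^' && u[12] == '~');
}
```

A single-character location is the degenerate case where caret, begin and
end coincide, which yields just '^'.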

Alongside, a few minor fixes were done: linemap_location_before_p replaces
a line-number based comparison, which does not handle the multiple statements
on one line that ';' allows for.

gcc/fortran/ChangeLog:

* data.cc (gfc_assign_data_value): Use linemap_location_before_p
and GFC_LOCUS_IS_SET.
* decl.cc (gfc_verify_c_interop_param): Make better translatable.
(build_sym, variable_decl, gfc_match_formal_arglist,
gfc_match_subroutine): Add range-based locations, use it in
diagnostic and gobble whitespace for better locations.
* error.cc (gfc_get_location_with_offset): Handle new format.
(gfc_get_location_range): New.
* expr.cc (gfc_check_assign): Use GFC_LOCUS_IS_SET.
* frontend-passes.cc (check_locus_code, check_locus_expr):
Likewise.
(runtime_error_ne): Use GFC_LOCUS_IS_SET.
* gfortran.h (locus): Change lb to union with lb and location.
(GFC_LOCUS_IS_SET): Define.
(gfc_get_location_range): New prototype.
(gfc_new_symbol, gfc_get_symbol, gfc_get_sym_tree,
gfc_get_ha_symbol, gfc_get_ha_sym_tree): Take optional locus
argument.
* io.cc (io_constraint): Use GFC_LOCUS_IS_SET.
* match.cc (gfc_match_sym_tree): Use range locus.
* openmp.cc (gfc_match_omp_variable_list,
gfc_match_omp_doacross_sink): Likewise.
* parse.cc (next_free): Update for locus struct change.
* primary.cc (gfc_match_varspec): Likewise.
(match_variable): Use range locus.
* resolve.cc (find_array_spec): Use GFC_LOCUS_IS_SET.
* scanner.cc (gfc_at_eof, gfc_at_bol, gfc_start_source_files,
gfc_advance_line, gfc_define_undef_line, skip_fixed_comments,
gfc_gobble_whitespace, include_stmt, gfc_new_file): Update
for locus struct change.
* symbol.cc (gfc_new_symbol, gfc_get_sym_tree, gfc_get_symbol,
gfc_get_ha_sym_tree, gfc_get_ha_symbol): Take optional locus.
* trans-array.cc (gfc_trans_array_constructor_value): Use %L not %C.
(gfc_trans_g77_array, gfc_trans_dummy_array_bias,
gfc_trans_class_array, gfc_trans_deferred_array): Replace
gfc_{save,set,restore}_backend_locus by directly using
input_location.
* trans-common.cc (build_equiv_decl, get_init_field): Likewise.
* trans-decl.cc (gfc_get_extern_function_decl, build_function_decl,
build_entry_thunks, gfc_null_and_pass_deferred_len,
gfc_trans_deferred_vars, gfc_trans_use_stmts, finish_oacc_declare,
gfc_generate_block_data): Likewise.
* trans-expr.cc (gfc_copy_class_to_class, gfc_conv_expr): Changes
to avoid gfc_current_locus.
* trans-io.cc (set_error_locus): Likewise.
* trans-openmp.cc (gfc_trans_omp_workshare): Use input_location directly.
* trans-stmt.cc (gfc_trans_if_1): Likewise and use GFC_LOCUS_IS_SET.
* trans-types.cc (gfc_get_union_type, gfc_get_derived_type): Likewise.
* trans.cc (gfc_locus_from_location): New.
(trans_runtime_error_vararg, gfc_trans_runtime_check): Use location_t
for file + line data.
(gfc_current_backend_file, gfc_save_backend_locus,
gfc_set_backend_locus, gfc_restore_backend_locus): Remove.
(trans_code): Use input_location directly, don't set gfc_current_locus.
* trans.h (gfc_save_backend_locus, gfc_set_backend_locus,
gfc_restore_backend_locus): Remove prototypes.
(gfc_locus_from_location): Add prototype.

gcc/testsuite/ChangeLog:

* gfortran.dg/bounds_check_25.f90: Update expected column
in the diagnostic.
* gfortran.dg/goacc/pr92793-1.f90: Likewise.
* gfortran.dg/gomp/allocate-14.f90: Likewise.
* gfortran.dg/gomp/polymorphic-mapping.f90: Likewise.
* gfortran.dg/gomp/reduction5.f90: Likewise.
* gfortran.dg/gomp/reduction6.f90: Likewise.

Fix an ICE with UNSIGNED in match_sym_complex_part.

gcc/fortran/ChangeLog:

PR fortran/117225
* primary.cc (match_sym_complex_part): An UNSIGNED in
a complex part is an error.

gcc/testsuite/ChangeLog:

PR fortran/117225
* gfortran.dg/unsigned_38.f90: New test.

runtime/testdata: fix for C23 nullptr keyword

Backport https://go.dev/cl/620955 from main repo.  Original description:

    src/runtime/testdata/testprogcgo/threadprof.go contains C code with a
    variable called nullptr.  This conflicts with the nullptr keyword in
    the C23 revision of the C standard (showing up as gccgo test build
    failures when updating GCC to use C23 by default when building C
    code).

    Rename that variable to nullpointer to avoid the clash with the
    keyword (any other name that's not a keyword would work just as well).

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/621059

diagnostics: remove forward decl of json::value from diagnostic.h

I believe this hasn't been necessary since r15-1413-gd3878c85f331c7.

gcc/ChangeLog:
* diagnostic.h (json::value): Remove forward decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: add debug dump functions

This commit expands on r15-3973-g4c7a58ac2617e2, which added
debug "dump" member functions to pretty_printer and output_buffer.

This followup adds "dump" member functions to diagnostic_context and
diagnostic_format, extends the existing dump functions and adds
indentation to make it much easier to see the various relationships
between context, format, printer, etc.

Hence you can now do:

(gdb) call global_dc->dump ()

and get a useful summary of what the diagnostic subsystem is doing;
for example:

(gdb) call global_dc->dump()
diagnostic_context:
  counts:
  output format:
    sarif_output_format
  printer:
    m_show_color: false
    m_url_format: bel
    m_buffer:
      m_formatted_obstack current object: length 0:
      m_chunk_obstack current object: length 0:
      pp_formatted_chunks: depth 0
        0: TEXT("Function ")]
        1: BEGIN_QUOTE, TEXT("program"), END_QUOTE]
        2: TEXT(" requires an argument list at ")]
        3: TEXT("(1)")]

showing the counts of all diagnostic kinds that are non-zero (none yet),
that we have a sarif output format, and the printer is part-way through
formatting a string.
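
The indentation scheme can be pictured with a miniature model in which each
dump function takes an indent and passes a larger one to the objects it
owns.  This is a sketch under assumptions, not the GCC implementation: the
types `printer_sketch` and `context_sketch` are hypothetical, they return
strings rather than writing to a FILE *, and only one field is modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a printer: dumps one field at the given indent.
struct printer_sketch
{
  bool show_color = false;

  std::string dump(int indent) const
  {
    return std::string(indent, ' ') + "m_show_color: "
           + (show_color ? "true" : "false") + "\n";
  }
};

// Hypothetical stand-in for a context that owns a printer: it prints its
// own heading, then dumps the printer two levels (4 columns) deeper.
struct context_sketch
{
  printer_sketch printer;

  std::string dump(int indent = 0) const
  {
    const std::string pad(indent, ' ');
    return pad + "diagnostic_context:\n"
           + pad + "  printer:\n"
           + printer.dump(indent + 4);
  }
};

// Usage: the nesting is visible directly in the text that is produced.
void dump_usage()
{
  assert(context_sketch{}.dump()
         == "diagnostic_context:\n  printer:\n    m_show_color: false\n");
}
```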

gcc/ChangeLog:
* diagnostic-format-json.cc (json_output_format::dump): New.
* diagnostic-format-sarif.cc (sarif_output_format::dump): New.
(sarif_file_output_format::dump): New.
* diagnostic-format-text.cc (diagnostic_text_output_format::dump):
New.
* diagnostic-format-text.h (diagnostic_text_output_format::dump):
New decl.
* diagnostic-format.h (diagnostic_output_format::dump): New decls.
* diagnostic.cc (diagnostic_context::dump): New.
(diagnostic_output_format::dump): New.
* diagnostic.h (diagnostic_context::dump): New decls.
* pretty-print-format-impl.h (pp_formatted_chunks::dump): Add
"indent" param.
* pretty-print.cc (bytes_per_hexdump_line): New constant.
(print_hexdump_line): New.
(print_hexdump): New.
(output_buffer::dump): Add "indent" param and use it.  Add
hexdump of current object in m_formatted_obstack and
m_chunk_obstack.
(pp_formatted_chunks::dump): Add "indent" param and use it.
(pretty_printer::dump): Likewise.  Add dumping of m_show_color
and m_url_format.
* pretty-print.h (output_buffer::dump): Add "indent" param.
(pretty_printer::dump): Likewise.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c
(xhtml_output_format::dump): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c: Fix -std=gnu23 -Wtraditional for () in function definitions

We don't yet have clear agreement on removing -Wtraditional (although
it seems there is little to no use for most of the warnings therein),
so fix the bug in its interaction with -std=gnu23 to continue progress
on making -std=gnu23 the default while -Wtraditional remains under
discussion.

The warning for ISO C function definitions with -Wtraditional properly
covers (void), but also wrongly warned for () in C23 mode as that has
the same semantics as (void) in that case. Keep track in c_arg_info
of when () was converted to (void) for C23 so that -Wtraditional can
avoid warning in that case (with an appropriate comment on the
definition of the new field to make clear it can be removed along with
-Wtraditional).

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-tree.h (c_arg_info): Add c23_empty_parens.
* c-decl.cc (grokparms): Set c23_empty_parens.
(build_arg_info): Clear c23_empty_parens.
(store_parm_decls_newstyle): Do not give -Wtraditional warning for
ISO C function definition if c23_empty_parens.

gcc/testsuite/
* gcc.dg/wtr-gnu17-1.c, gcc.dg/wtr-gnu23-1.c: New tests.

Daily bump.

gcc/: Merge definitions of array_type_nelts_top

There were two identical definitions, and neither of them is available
where it is needed for implementing a number-of-elements-of
operator. Merge them, and provide the single definition in
gcc/tree.{h,cc}, where it's available for that operator, which will be
added in a following commit.

gcc/ChangeLog:

* tree.h (array_type_nelts_top)
* tree.cc (array_type_nelts_top):
Define function (moved from gcc/cp/).

gcc/cp/ChangeLog:

* cp-tree.h (array_type_nelts_top)
* tree.cc (array_type_nelts_top):
Remove function (move to gcc/).

gcc/rust/ChangeLog:

* backend/rust-tree.h (array_type_nelts_top)
* backend/rust-tree.cc (array_type_nelts_top):
Remove function.

Signed-off-by: Alejandro Colomar <alx@kernel.org>

gcc/: Rename array_type_nelts => array_type_nelts_minus_one

The old name was misleading.

While at it, also rename some temporary variables that are used with
this function, for consistency.
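
As an illustration of the naming, the relationship between the two helpers
can be modelled on plain C++ array types.  This is a sketch only: the
function templates below are hypothetical models built on std::extent, not
the tree-based functions in gcc/tree.cc.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical model of array_type_nelts_top: the number of elements.
template <typename Array>
constexpr std::size_t nelts_top_sketch()
{
  return std::extent<Array>::value;
}

// Hypothetical model of array_type_nelts_minus_one: the number of elements
// minus one, i.e. the highest valid index.  A name without the suffix would
// suggest that the element count itself is returned.
template <typename Array>
constexpr std::size_t nelts_minus_one_sketch()
{
  return std::extent<Array>::value - 1;
}

static_assert(nelts_top_sketch<int[5]>() == 5, "element count");
static_assert(nelts_minus_one_sketch<int[5]>() == 4, "highest valid index");
```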

Link: <https://inbox.sourceware.org/gcc-patches/9fffd80-dca-2c7e-14b-6c9b509a7215@redhat.com/T/#m2f661c67c8f7b2c405c8c7fc3152dd85dc729120>

gcc/ChangeLog:

* tree.cc (array_type_nelts, array_type_nelts_minus_one)
* tree.h (array_type_nelts, array_type_nelts_minus_one)
* expr.cc (count_type_elements)
* config/aarch64/aarch64.cc
(pure_scalable_type_info::analyze_array)
* config/i386/i386.cc (ix86_canonical_va_list_type):
Rename array_type_nelts => array_type_nelts_minus_one
The old name was misleading.

gcc/c/ChangeLog:

* c-decl.cc (one_element_array_type_p, get_parm_array_spec)
* c-fold.cc (c_fold_array_ref):
Rename array_type_nelts => array_type_nelts_minus_one

gcc/cp/ChangeLog:

* decl.cc (reshape_init_array)
* init.cc
(build_zero_init_1)
(build_value_init_noctor)
(build_vec_init)
(build_delete)
* lambda.cc (add_capture)
* tree.cc (array_type_nelts_top):
Rename array_type_nelts => array_type_nelts_minus_one

gcc/fortran/ChangeLog:

* trans-array.cc (structure_alloc_comps)
* trans-openmp.cc
(gfc_walk_alloc_comps)
(gfc_omp_clause_linear_ctor):
Rename array_type_nelts => array_type_nelts_minus_one

gcc/rust/ChangeLog:

* backend/rust-tree.cc (array_type_nelts_top):
Rename array_type_nelts => array_type_nelts_minus_one

Suggested-by: Richard Biener <richard.guenther@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>

libbacktrace: don't get confused by overlapping address ranges

Fixes https://github.com/ianlancetaylor/libbacktrace/issues/137.

* dwarf.c (resolve_unit_addrs_overlap_walk): New static function.
(resolve_unit_addrs_overlap): New static function.
(build_dwarf_data): Call resolve_unit_addrs_overlap.

hppa: Fix up pa.opt.urls

2024-10-18 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.opt.urls: Fix for -mlra.

Handle GFC_STD_UNSIGNED like a standard in error messages.

gcc/fortran/ChangeLog:

* error.cc (notify_std_msg): Handle GFC_STD_UNSIGNED.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_37.f90: New test.

hppa: Add LRA support

LRA is not enabled by default since some new test failures remain to be
resolved.

2024-10-18 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/113933
* config/pa/pa.cc (pa_use_lra_p): Declare.
(TARGET_LRA_P): Change define to pa_use_lra_p.
(pa_use_lra_p): New function.
(legitimize_pic_address): Also check lra_in_progress.
(pa_emit_move_sequence): Likewise.
(pa_legitimate_constant_p): Likewise.
(pa_legitimate_address_p): Likewise.
(pa_secondary_reload): For floating-point loads and stores,
return NO_REGS for REG and SUBREG operands. Return
GENERAL_REGS for some shift register spills.
* config/pa/pa.opt: Add mlra option.
* config/pa/predicates.md (integer_store_memory_operand):
Also check lra_in_progress.
(floating_point_store_memory_operand): Likewise.
(reg_before_reload_operand): Likewise.

[PATCH 3/7] RISC-V: Fix vector memcpy smaller LMUL generation

If riscv_vector::expand_block_move is generating a straight-line memcpy
using a predicated store, it tries to use a smaller LMUL to reduce
register pressure if it still allows an entire transfer.

This happens in the inner loop of riscv_vector::expand_block_move,
however, the vmode chosen by this loop gets overwritten later in the
function, so I have added the missing break from the outer loop.

I have also addressed a couple of issues with the conditions of the if
statement within the inner loop.

The first condition did not make sense to me:
```
  TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT
```
I think this was supposed to be checking that the length fits within the
given LMUL, so I have changed it to do that.

The second condition:
```
  /* Avoid loosing the option of using vsetivli .  */
  && (nunits <= 31 * lmul || nunits > 31 * 8)
```
seems to imply that lmul affects the range of AVL immediate that
vsetivli can take but I don't think that is correct.  Anyway, I don't
think this condition is necessary because if we find a suitable mode we
should stick with it, regardless of whether it allowed vsetivli, rather
than continuing to try larger lmul which would increase register
pressure or smaller potential_ew which would increase AVL.  I have
removed this condition.
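
The intended selection can be sketched as "take the first LMUL whose
register group holds the whole transfer, then stop".  The model below is a
simplification under stated assumptions and is not the code in
riscv-string.cc: it ignores fractional LMUL and the element width, and the
name `smallest_lmul_sketch` and its parameters are hypothetical.

```cpp
// Return the smallest LMUL in {1, 2, 4, 8} whose register group
// (min_vlen_bits * lmul bits) can hold length_bytes, or 0 if none can.
constexpr int smallest_lmul_sketch(int min_vlen_bits, int length_bytes)
{
  for (int lmul = 1; lmul <= 8; lmul *= 2)
    if (length_bytes * 8 <= min_vlen_bits * lmul)
      return lmul;  // first fit wins; do not keep trying larger LMUL
  return 0;         // does not fit in a single register group
}

static_assert(smallest_lmul_sketch(128, 16) == 1, "16 bytes fit in m1");
static_assert(smallest_lmul_sketch(128, 17) == 2, "17 bytes need m2");
static_assert(smallest_lmul_sketch(128, 129) == 0, "too long for m8");
```

Returning at the first fit mirrors the reasoning above: once a suitable mode
is found, trying a larger LMUL would only increase register pressure.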

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Fix
condition for using smaller LMUL.  Break outer loop if a
suitable vmode has been found.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: Expect smaller lmul.
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: Likewise.
* gcc.target/riscv/rvv/base/cpymem-3.c: New test.

[PATCH 2/7] RISC-V: Fix uninitialized reg in memcpy

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Replace
`end` with `length_rtx` in gen_rtx_NE.

[PATCH 1/7] RISC-V: Fix indentation in riscv_vector::expand_block_move [NFC]

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Fix
indentation.

i386: Fix the order of operands in andn<MMXMODEI:mode>3 [PR117192]

Fix the order of operands in andn<MMXMODEI:mode>3 expander to comply
with the specification, where bitwise-complement applies to operand 2.
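
As a scalar illustration of the operand order (a sketch, not the mmx.md
expander; `andn_sketch` is a hypothetical name), the result is operand 1
ANDed with the complement of operand 2, so swapping the operands changes the
result:

```cpp
#include <cstdint>

// Scalar model of the andn semantics described above: op1 & ~op2.
constexpr std::uint32_t andn_sketch(std::uint32_t op1, std::uint32_t op2)
{
  return op1 & ~op2;
}

// The complement applies to the second operand only.
static_assert(andn_sketch(0xFF00FF00u, 0x0F0F0F0Fu) == 0xF000F000u, "op1 & ~op2");
static_assert(andn_sketch(0x0F0F0F0Fu, 0xFF00FF00u) == 0x000F000Fu, "swapped operands differ");
```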

PR target/117192

gcc/ChangeLog:

* config/i386/mmx.md (andn<MMXMODEI:mode>3): Swap operand
indexes 1 and 2 to comply with andn specification.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117192.c: New test.

libstdc++: Reuse std::__assign_one in <bits/ranges_algobase.h>

Use std::__assign_one instead of ranges::__assign_one. Adjust the uses,
because std::__assign_one has the arguments in the opposite order (the
same order as an assignment expression).

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h (ranges::__assign_one): Remove.
(__copy_or_move, __copy_or_move_backward): Use std::__assign_one
instead of ranges::__assign_one.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Add always_inline to some one-liners in <bits/stl_algobase.h>

We implement std::copy, std::fill etc. as a series of calls to other
overloads which incrementally peel off layers of iterator wrappers. This
adds a high abstraction penalty for -O0 and potentially even -O1. Add
the always_inline attribute to several functions that are just a single
return statement (and maybe a static_assert, or some concept-checking
assertions which are disabled by default).
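
The pattern looks roughly like the following sketch.  The helper names are
hypothetical and are not the libstdc++ functions; the attribute is spelled
[[gnu::always_inline]] here for readability.

```cpp
#include <cassert>

// Innermost layer that does the real work.
template <typename T>
T *fill_impl_sketch(T *first, T *last, const T &value)
{
  for (; first != last; ++first)
    *first = value;
  return first;
}

// One-line forwarding layer: a single return statement, so forcing it
// inline avoids an extra call even when optimization is off.
template <typename T>
[[gnu::always_inline]] inline T *
fill_sketch(T *first, T *last, const T &value)
{
  return fill_impl_sketch(first, last, value);
}

// Usage: the wrapper behaves exactly like the function it forwards to.
void fill_usage()
{
  int a[4] = {};
  int *end = fill_sketch(a, a + 4, 7);
  assert(end == a + 4 && a[0] == 7 && a[3] == 7);
}
```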

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (__copy_move_a1, __copy_move_a)
(__copy_move_backward_a1, __copy_move_backward_a, move_backward)
(__fill_a1, __fill_a, fill, __fill_n_a, fill_n, __equal_aux):
Add always_inline attribute to one-line forwarding functions.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Add nodiscard to std::find

I missed this one out in r14-9478-gdf483ebd24689a but I don't think that
was intentional. I see no reason std::find shouldn't be [[nodiscard]].
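
A small sketch of why the attribute makes sense (the function names are
hypothetical): the iterator returned by std::find is the only result of the
call, so a call that ignores it is almost certainly a mistake and can now be
diagnosed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if value occurs in v.
bool contains_sketch(const std::vector<int> &v, int value)
{
  // A bare "std::find (v.begin (), v.end (), value);" statement here would
  // discard the result, which is what [[nodiscard]] is meant to flag.
  return std::find(v.begin(), v.end(), value) != v.end();
}

// Usage: the returned iterator is compared against end().
void contains_usage()
{
  const std::vector<int> v = {1, 2, 3};
  assert(contains_sketch(v, 2));
  assert(!contains_sketch(v, 5));
}
```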

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h (find): Add nodiscard.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Inline memmove optimizations for std::copy etc. [PR115444]

This removes all the __copy_move class template specializations that
decide how to optimize std::copy and std::copy_n. We can inline those
optimizations into the algorithms, using if-constexpr (and macros for
C++98 compatibility) and remove the code dispatching to the various
class template specializations.

Doing this means we implement the optimization directly for std::copy_n
instead of deferring to std::copy. That avoids the unwanted consequence
of advancing the iterator in copy_n only to take the difference later to
get back to the length that we already had in copy_n originally (as
described in PR 115444).

With the new flattened implementations, we can also lower contiguous
iterators to pointers in std::copy/std::copy_n/std::copy_backward, so
that they benefit from the same memmove optimizations as pointers.
There's a subtlety though: contiguous iterators can potentially throw
exceptions to exit the algorithm early. So we can only transform the
loop to memmove if dereferencing the iterator is noexcept. We don't
check that incrementing the iterator is noexcept because we advance the
contiguous iterators before using memmove, so that if incrementing would
throw, that happens first. I am writing a proposal (P3349R0) which would
make this unnecessary, so I hope we can drop the nothrow requirements
later.

This change also solves PR 114817 by checking is_trivially_assignable
before optimizing copy/copy_n etc. to memmove. It's not enough to check
that the types are trivially copyable (a precondition for using memmove
at all), we also need to check that the specific assignment that would
be performed by the algorithm is also trivial. Replacing a non-trivial
assignment with memmove would be observable, so not allowed.
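
The distinction between the two checks can be seen with a type like the one
sketched below.  The type `tricky_sketch` is hypothetical and is not taken
from the new tests: it is trivially copyable, yet the assignment from a
non-const lvalue, which is what a copy loop would perform, selects a
user-provided operator and so is not trivial.

```cpp
#include <type_traits>

struct tricky_sketch
{
  int value;

  // A template is never a copy assignment operator, so it does not affect
  // trivial copyability, but it is the better match for non-const lvalues.
  template <typename T>
  tricky_sketch &operator=(T &)
  {
    value = -1;  // observable side effect that memmove would skip
    return *this;
  }
};

// Trivially copyable, so memmove is permitted in principle ...
static_assert(std::is_trivially_copyable<tricky_sketch>::value,
              "trivially copyable");
// ... and assignment from a const lvalue is trivial ...
static_assert(std::is_trivially_assignable<tricky_sketch &,
                                           const tricky_sketch &>::value,
              "const lvalue uses the implicit copy assignment");
// ... but assignment from a non-const lvalue is not.
static_assert(!std::is_trivially_assignable<tricky_sketch &,
                                            tricky_sketch &>::value,
              "non-const lvalue uses the template");
```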

libstdc++-v3/ChangeLog:

PR libstdc++/115444
PR libstdc++/114817
* include/bits/stl_algo.h (__copy_n): Remove generic overload
and overload for random access iterators.
(copy_n): Inline generic version of __copy_n here. Do not defer
to std::copy for random access iterators.
* include/bits/stl_algobase.h (__copy_move): Remove.
(__nothrow_contiguous_iterator, __memcpyable_iterators): New
concepts.
(__assign_one, _GLIBCXX_TO_ADDR, _GLIBCXX_ADVANCE): New helpers.
(__copy_move_a2): Inline __copy_move logic and conditional
memmove optimization into the most generic overload.
(__copy_n_a): Likewise.
(__copy_move_backward): Remove.
(__copy_move_backward_a2): Inline __copy_move_backward logic and
memmove optimization into the most generic overload.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/114817.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_copy_n/114817.cc:
New test.
* testsuite/25_algorithms/copy/114817.cc: New test.
* testsuite/25_algorithms/copy/115444.cc: New test.
* testsuite/25_algorithms/copy_n/114817.cc: New test.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Make __normal_iterator constexpr, always_inline, nodiscard

The __gnu_cxx::__normal_iterator type we use for std::vector::iterator
is not specified by the standard, it's an implementation detail. This
means it's not constrained by the rule that forbids strengthening
constexpr. We can make it meet the constexpr iterator requirements for
older standards, not only when it's required to be for C++20.

The non-const member functions can't be constexpr in C++11, so use
_GLIBCXX14_CONSTEXPR for those. For all constructors, const members
and non-member operator overloads, use _GLIBCXX_CONSTEXPR or just
constexpr.

We can also liberally add [[nodiscard]] and [[gnu::always_inline]]
attributes to those functions.
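
As a rough illustration of the idea (not the __normal_iterator code
itself), a pointer-wrapping iterator with constexpr and nodiscard members
might look like this. ptr_iter_sketch, sum_sketch and values_sketch are
hypothetical names, and the sketch assumes C++17 for the attribute syntax.

```cpp
#include <cassert>

// Hypothetical minimal iterator wrapping a pointer.
template<typename T>
class ptr_iter_sketch
{
  T* cur_;

public:
  constexpr explicit ptr_iter_sketch(T* p) noexcept : cur_(p) { }

  [[nodiscard]] constexpr T& operator*() const noexcept { return *cur_; }

  // A non-const member: in C++11 a constexpr member function is
  // implicitly const, so this can only be constexpr from C++14 on.
  constexpr ptr_iter_sketch& operator++() noexcept
  { ++cur_; return *this; }

  [[nodiscard]] friend constexpr bool
  operator==(ptr_iter_sketch a, ptr_iter_sketch b) noexcept
  { return a.cur_ == b.cur_; }

  [[nodiscard]] friend constexpr bool
  operator!=(ptr_iter_sketch a, ptr_iter_sketch b) noexcept
  { return !(a == b); }
};

// Sum a range through the wrapper; usable in constant expressions.
constexpr int sum_sketch(const int* first, const int* last)
{
  int total = 0;
  for (ptr_iter_sketch<const int> it(first), end(last); it != end; ++it)
    total += *it;
  return total;
}

constexpr int values_sketch[] = {1, 2, 3, 4};
static_assert(sum_sketch(values_sketch, values_sketch + 4) == 10,
              "the wrapper works at compile time");
```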

Also change some internal helpers for std::move_iterator which can be
unconditionally constexpr and marked nodiscard.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (__normal_iterator): Make all
members and overloaded operators constexpr before C++20, and add
always_inline attribute.
(__to_address): Add nodiscard and always_inline attributes.
(__make_move_if_noexcept_iterator): Add nodiscard
and make unconditionally constexpr.
(__niter_base(__normal_iterator), __niter_base(Iter)):
Add nodiscard and always_inline attributes.
(__niter_base(reverse_iterator), __niter_base(move_iterator))
(__miter_base): Add inline.
(__niter_wrap(From, To)): Add nodiscard attribute.
(__niter_wrap(const Iter&, Iter)): Add nodiscard and
always_inline attributes.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Refactor std::uninitialized_{copy,fill,fill_n} algos [PR68350]

This refactors the std::uninitialized_copy, std::uninitialized_fill and
std::uninitialized_fill_n algorithms to directly perform memcpy/memset
optimizations instead of dispatching to std::copy/std::fill/std::fill_n.

The reasons for this are:

- Use 'if constexpr' to simplify and optimize compilation throughput, so
  dispatching to specialized class templates is only needed for C++98
  mode.
- Use memcpy instead of memmove, because the conditions on
  non-overlapping ranges are stronger for std::uninitialized_copy than
  for std::copy. Using memcpy might be a minor optimization.
- No special case for creating a range of one element, which std::copy
  needs to deal with (see PR libstdc++/108846). The uninitialized algos
  create new objects, which reuses storage and is allowed to clobber
  tail padding.
- Relax the conditions for using memcpy/memset, because the C++20 rules
  on implicit-lifetime types mean that we can rely on memcpy to begin
  lifetimes of trivially copyable types.  We don't need to require
  trivially default constructible, so don't need to limit the
  optimization to trivial types. See PR 68350 for more details.
- Remove the dependency on std::copy and std::fill. This should mean
  that stl_uninitialized.h no longer needs to include all of
  stl_algobase.h.  This isn't quite true yet, because we still use
  std::fill in __uninitialized_default and still use std::fill_n in
  __uninitialized_default_n. That will be fixed later.
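
A minimal sketch of the memcpy idea from the list above, for the
same-type case only. This is not the libstdc++ implementation:
uninit_copy_sketch and Point are hypothetical names, and the exact
condition used to enable memcpy is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical helper: create n objects of type T in raw storage at dst
// as copies of the objects at src.
template<typename T>
T* uninit_copy_sketch(const T* src, std::size_t n, T* dst)
{
  if constexpr (std::is_trivially_copyable_v<T>
                && std::is_trivially_copy_constructible_v<T>)
    {
      // The destination is raw storage, so it cannot overlap the live
      // source objects. memcpy is sufficient; memmove is not needed.
      if (n != 0)
        std::memcpy(static_cast<void*>(dst), src, n * sizeof(T));
      return dst + n;
    }
  else
    {
      std::size_t done = 0;
      try
        {
          for (; done < n; ++done)
            ::new (static_cast<void*>(dst + done)) T(src[done]);
        }
      catch (...)
        {
          // Destroy what was constructed so far, then rethrow.
          std::destroy(dst, dst + done);
          throw;
        }
      return dst + n;
    }
}

// Trivially copyable but not trivially default constructible, so it is
// not a trivial type, yet it still qualifies for the memcpy branch.
struct Point
{
  int x;
  int y;
  Point() : x(1), y(2) { }
};

static_assert(std::is_trivially_copyable_v<Point>, "memcpy is allowed");
static_assert(!std::is_trivially_default_constructible_v<Point>,
              "default construction is not trivial");
```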

Several tests need changes to the diagnostics matched by dg-error
because we no longer use the __constructible() function that contained a
static assert. Now we just get straightforward errors for attempting
to use a deleted constructor.

Two tests needed more significant changes to the actual expected results
of executing the tests, because they were checking for old behaviour
which was incorrect according to the standard.
20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
std::copy to be used for a call to std::uninitialized_copy involving two
trivially copyable types. That was incorrect behaviour, because a
non-trivial constructor should have been used, but using std::copy used
trivial default initialization followed by assignment.
20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
the behaviour with a non-integral Size passed to uninitialized_fill_n,
but I wrote the test looking at the requirements of uninitialized_copy_n
which are not the same as uninitialized_fill_n. The former uses --n and
tests n > 0, but the latter just tests n-- (which will never be false
for a floating-point value with a fractional part).
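
The difference between the two loop shapes can be seen with a small
counting experiment. The function names and the iteration cap are
hypothetical; the cap exists only so that the second loop terminates in
this sketch.

```cpp
#include <cassert>

// Loop shape "for (; n > 0; --n)", as described for uninitialized_copy_n.
template<typename Size>
int count_compare_then_decrement(Size n, int cap)
{
  int iterations = 0;
  for (; n > 0 && iterations < cap; --n)
    ++iterations;
  return iterations;
}

// Loop shape "while (n--)", as described for uninitialized_fill_n.
// The cap is checked first so n is not decremented once it is reached.
template<typename Size>
int count_post_decrement(Size n, int cap)
{
  int iterations = 0;
  while (iterations < cap && n--)
    ++iterations;
  return iterations;
}
```

With n = 2.5 the first shape stops after three iterations (2.5, 1.5,
0.5), while the second passes through 0.5 and -0.5 without ever hitting
exactly zero and runs until the cap. For integral or whole-number values
both shapes agree.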

libstdc++-v3/ChangeLog:

PR libstdc++/68350
PR libstdc++/93059
* include/bits/stl_uninitialized.h (__check_constructible)
(_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
[C++98] (__unwrappable_niter): New trait.
(__uninitialized_copy<true>): Replace use of std::copy.
(uninitialized_copy): Fix Doxygen comments. Open-code memcpy
optimization for C++11 and later.
(__uninitialized_fill<true>): Replace use of std::fill.
(uninitialized_fill): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
(__uninitialized_fill_n<true>): Replace use of std::fill_n.
(uninitialized_fill_n): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
Adjust expected behaviour to match what the standard specifies.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error directives.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc:
Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Likewise.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Likewise.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Move std::__niter_base and std::__niter_wrap to stl_iterator.h

Move the functions for unwrapping and rewrapping __normal_iterator
objects to the same file as the definition of __normal_iterator itself.

This will allow a later commit to make use of std::__niter_base in other
headers without having to include all of <bits/stl_algobase.h>.
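
The unwrap/rewrap pattern can be sketched with public std::vector
operations. The helpers below are hypothetical stand-ins, not
std::__niter_base or std::__niter_wrap, and they take the owning vector
as an extra argument because portable code cannot look inside the
iterator.

```cpp
#include <cassert>
#include <vector>

// Hypothetical "unwrap": turn a vector iterator into a raw pointer.
template<typename T>
T* to_pointer_sketch(std::vector<T>& v, typename std::vector<T>::iterator it)
{ return v.data() + (it - v.begin()); }

// Hypothetical "rewrap": turn a raw pointer back into a vector iterator.
template<typename T>
typename std::vector<T>::iterator
to_iterator_sketch(std::vector<T>& v, T* p)
{ return v.begin() + (p - v.data()); }

// An algorithm written once, for raw pointers only.
inline int* fill_raw_sketch(int* first, int* last, int value)
{
  for (; first != last; ++first)
    *first = value;
  return first;
}

// Front end: unwrap the iterators, run the pointer algorithm, then
// rewrap the result so the caller gets the iterator type it passed in.
inline std::vector<int>::iterator
fill_sketch(std::vector<int>& v, std::vector<int>::iterator first,
            std::vector<int>::iterator last, int value)
{
  int* end = fill_raw_sketch(to_pointer_sketch(v, first),
                             to_pointer_sketch(v, last), value);
  return to_iterator_sketch(v, end);
}
```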

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
to ...
* include/bits/stl_iterator.h: ... here.
(__niter_base, __miter_base): Move all overloads to the end of
the header.
* testsuite/24_iterators/normal_iterator/wrapping.cc: New test.

Reviewed-by: Patrick Palka <ppalka@redhat.com>

SVE intrinsics: Add fold_active_lanes_to method to refactor svmul and svdiv.

As suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html,
this patch adds the method gimple_folder::fold_active_lanes_to (tree X).
This method folds active lanes to X and sets inactive lanes according to
the predication, returning a new gimple statement. That makes folding of
SVE intrinsics easier and reduces code duplication in the
svxxx_impl::fold implementations.
Using this new method, svdiv_impl::fold and svmul_impl::fold were refactored.
Additionally, the method was used for two optimizations:
1) Fold svdiv to the dividend, if the divisor is all ones and
2) for svmul, if one of the operands is all ones, fold to the other operand.
Both optimizations were previously applied to _x and _m predication on
the RTL level, but not for _z, where svdiv/svmul were still being used.
For both optimizations, codegen was improved by this patch, for example by
skipping sel instructions with all-same operands and replacing sel
instructions by mov instructions.
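
The folding rule can be modelled on fixed-size arrays. This is a scalar
model only, not the gimple_folder code: the type and function names are
hypothetical, four unsigned lanes stand in for a length-agnostic SVE
vector, and _x predication is left out because its inactive lanes are
unspecified.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

enum class Pred { M, Z };  // merging (_m) and zeroing (_z) predication

// Active lanes take the lane of x. Inactive lanes keep op1 for _m and
// become zero for _z.
inline Vec fold_active_lanes_model(const Mask& pg, const Vec& op1,
                                   const Vec& x, Pred pred)
{
  Vec result{};
  for (std::size_t i = 0; i < kLanes; ++i)
    result[i] = pg[i] ? x[i] : (pred == Pred::M ? op1[i] : 0u);
  return result;
}

// Reference model of a predicated divide (division by zero yields 0).
inline Vec svdiv_model(const Mask& pg, const Vec& op1, const Vec& op2,
                       Pred pred)
{
  Vec quotient{};
  for (std::size_t i = 0; i < kLanes; ++i)
    quotient[i] = op2[i] == 0 ? 0u : op1[i] / op2[i];
  return fold_active_lanes_model(pg, op1, quotient, pred);
}

// Reference model of a predicated multiply (wraps modulo 2^32).
inline Vec svmul_model(const Mask& pg, const Vec& op1, const Vec& op2,
                       Pred pred)
{
  Vec product{};
  for (std::size_t i = 0; i < kLanes; ++i)
    product[i] = op1[i] * op2[i];
  return fold_active_lanes_model(pg, op1, product, pred);
}
```

In this model, dividing by an all-ones vector gives the same result as
folding the active lanes to the dividend, and multiplying by an all-ones
vector gives the same result as folding the active lanes to the other
operand.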

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Refactor using fold_active_lanes_to and fold to dividend, if the
divisor is all ones.
(svmul_impl::fold): Refactor using fold_active_lanes_to and fold
to the other operand, if one of the operands is all ones.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_active_lanes_to (tree).
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::fold_active_lanes_to): Add new method to fold
active lanes to given argument and set inactive lanes
according to the predication.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
* gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
* gcc.target/aarch64/sve/mul_const_run.c: Likewise.

[5/n] remove trapv-*.c special-casing of gcc.dg/vect/ files

The following makes -ftrapv explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named trapv-*
* gcc.dg/vect/trapv-vect-reduc-4.c: Add dg-additional-options -ftrapv.

[4/n] remove wrapv-*.c special-casing of gcc.dg/vect/ files

The following makes -fwrapv explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named wrapv-*
* gcc.dg/vect/wrapv-vect-7.c: Add dg-additional-options -fwrapv.
* gcc.dg/vect/wrapv-vect-reduc-2char.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-2short.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c: Likewise.

[3/n] remove fast-math-*.c special-casing of gcc.dg/vect/ files

The following makes -ffast-math explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named fast-math-*
* gcc.dg/vect/fast-math-bb-slp-call-1.c: Add dg-additional-options
-ffast-math.
* gcc.dg/vect/fast-math-bb-slp-call-2.c: Likewise.
* gcc.dg/vect/fast-math-bb-slp-call-3.c: Likewise.
* gcc.dg/vect/fast-math-ifcvt-1.c: Likewise.
* gcc.dg/vect/fast-math-pr35982.c: Likewise.
* gcc.dg/vect/fast-math-pr43074.c: Likewise.
* gcc.dg/vect/fast-math-pr44152.c: Likewise.
* gcc.dg/vect/fast-math-pr55281.c: Likewise.
* gcc.dg/vect/fast-math-slp-27.c: Likewise.
* gcc.dg/vect/fast-math-slp-38.c: Likewise.
* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
* gcc.dg/vect/fast-math-vect-call-2.c: Likewise.
* gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
* gcc.dg/vect/fast-math-vect-outer-7.c: Likewise.
* gcc.dg/vect/fast-math-vect-pow-1.c: Likewise.
* gcc.dg/vect/fast-math-vect-pow-2.c: Likewise.
* gcc.dg/vect/fast-math-vect-pr25911.c: Likewise.
* gcc.dg/vect/fast-math-vect-pr29925.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-5.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-7.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-8.c: Likewise.
* gcc.dg/vect/fast-math-vect-reduc-9.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-double.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-float.c: Likewise.
* gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Likewise.

[2/n] remove no-vfa-*.c special-casing of gcc.dg/vect/ files

The following makes --param vect-max-version-for-alias-checks=0
explicit.

* gcc.dg/vect/vect.exp: Remove special-casing of tests
named no-vfa-*
* gcc.dg/vect/no-vfa-pr29145.c: Add dg-additional-options
--param vect-max-version-for-alias-checks=0.
* gcc.dg/vect/no-vfa-vect-101.c: Likewise.
* gcc.dg/vect/no-vfa-vect-102.c: Likewise.
* gcc.dg/vect/no-vfa-vect-102a.c: Likewise.
* gcc.dg/vect/no-vfa-vect-37.c: Likewise.
* gcc.dg/vect/no-vfa-vect-43.c: Likewise.
* gcc.dg/vect/no-vfa-vect-45.c: Likewise.
* gcc.dg/vect/no-vfa-vect-49.c: Likewise.
* gcc.dg/vect/no-vfa-vect-51.c: Likewise.
* gcc.dg/vect/no-vfa-vect-53.c: Likewise.
* gcc.dg/vect/no-vfa-vect-57.c: Likewise.
* gcc.dg/vect/no-vfa-vect-61.c: Likewise.
* gcc.dg/vect/no-vfa-vect-79.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
* gcc.dg/vect/no-vfa-vect-dv-2.c: Likewise.

Adjust assert in vect_build_slp_tree_2

The assert in SLP discovery when we handle masked operations is
confusingly wide - all gather variants should be caught by
the earlier STMT_VINFO_GATHER_SCATTER_P.

* tree-vect-slp.cc (vect_build_slp_tree_2): Only expect
IFN_MASK_LOAD for masked loads that are not
STMT_VINFO_GATHER_SCATTER_P.

MAINTAINERS: Add myself as pair fusion and aarch64 ldp/stp maintainer

ChangeLog:

* MAINTAINERS (CPU Port Maintainers): Add myself as aarch64 ldp/stp
maintainer.
(Various Maintainers): Add myself as pair fusion maintainer.

testsuite: Add necessary dejagnu directives to pr115815_0.c

I have received an email from the Linaro infrastructure that the test
gcc.dg/lto/pr115815_0.c which I added is failing on arm-eabi and I
realized that not only is it missing dg-require-effective-target
global_constructor but actually any dejagnu directives at all, which
means it is unnecessarily running both at -O0 and -O2 and there is an
unnecessary run test too.  All fixed by this patch.

I have not actually verified that the failure goes away on arm-eabi
but have very high hopes it will.  I have verified that the test still
checks for the bug and also that it passes by running:

  make -k check-gcc RUNTESTFLAGS="lto.exp=*pr115815*"

gcc/testsuite/ChangeLog:

2024-10-14  Martin Jambor  <mjambor@suse.cz>

* gcc.dg/lto/pr115815_0.c: Add dejagnu directives.

middle-end: Fix GSI for gcond root [PR117140]

When finding the gsi to use for code of the root statements we should use the
one of the original statement rather than the gcond which may be inside a
pattern.

Without this the emitted instructions may be discarded later.

gcc/ChangeLog:

PR tree-optimization/117140
* tree-vect-slp.cc (vectorize_slp_instance_root_stmt): Use gsi from
original statement.

gcc/testsuite/ChangeLog:

PR tree-optimization/117140
* gcc.dg/vect/vect-early-break_129-pr117140.c: New test.

middle-end: Fix VEC_PERM_EXPR lowering since relaxation of vector sizes

In GCC 14 VEC_PERM_EXPR was relaxed to be able to permute to a 2x larger vector
than the size of the input vectors. However various passes and transformations
were not updated to account for this.

I have patches in this area that I will be upstreaming as individual patches,
together with the testcases that expose the problems.

This one is that vectlower tries to lower based on the size of the input vectors
rather than the size of the output. As a consequence it creates an invalid
vector of half the size.

Luckily we ICE because the resulting nunits doesn't match the vector size.
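
As a simplified model of the sizing rule (not the tree-vect-generic.cc
code), a permute whose result may be longer than its inputs has to size
the result from the output, not from the inputs. vec_perm_model is a
hypothetical name, and using the selector length as the output length is
an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of a two-input permute: element i of the result is element
// sel[i] of the concatenation of v0 and v1.
inline std::vector<int>
vec_perm_model(const std::vector<int>& v0, const std::vector<int>& v1,
               const std::vector<std::size_t>& sel)
{
  assert(v0.size() == v1.size() && !v0.empty());
  const std::size_t n_in = v0.size();

  // Size the result by the output (selector) length. Using n_in here
  // would produce a vector of half the size when the output is 2x wider.
  std::vector<int> out(sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i)
    {
      const std::size_t idx = sel[i] % (2 * n_in);
      out[i] = idx < n_in ? v0[idx] : v1[idx - n_in];
    }
  return out;
}
```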

gcc/ChangeLog:

* tree-vect-generic.cc (lower_vec_perm): Use output vector size instead
of input vector when determining output nunits.

gcc/testsuite/ChangeLog:

* gcc.dg/vec-perm-lower.c: New test.

AArch64: use movi d0, #0 to clear SVE registers instead of mov z0.d, #0

This patch changes SVE to use Adv. SIMD movi 0 to clear SVE registers when not
in SVE streaming mode, because the Neoverse Software Optimization guides
indicate that SVE mov #0 is not a zero cost move.

When in streaming mode we continue to use SVE's mov to clear the registers.

Tests have already been updated.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_output_sve_mov_immediate): Use
fmov for SVE zeros.

AArch64: support encoding integer immediates using floating point moves

This patch extends our immediate SIMD generation cases to support generating
integer immediates using floating-point operations if the integer immediate maps
to an exact FP value.

As an example:

uint32x4_t f1() {
    return vdupq_n_u32(0x3f800000);
}

currently generates:

f1:
        adrp    x0, .LC0
        ldr     q0, [x0, #:lo12:.LC0]
        ret

i.e. a load, but with this change:

f1:
        fmov    v0.4s, 1.0e+0
        ret

Such immediates are common in e.g. our Math routines in glibc because they are
created to extract or mark part of an FP immediate as masks.
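
The reason the example works is that 0x3f800000 is the IEEE 754
single-precision bit pattern of 1.0. A small sketch of that check
follows. The helper names are hypothetical, float is assumed to be IEEE
binary32, and the bit-pattern test reflects my understanding of the
8-bit FP immediate form (a sign, a 3-bit exponent and a 4-bit fraction);
it is not taken from aarch64.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern (assumes IEEE binary32).
inline std::uint32_t float_bits(float f)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float");
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Assumed encodable form: the low 19 fraction bits are zero, bits 29..25
// are all equal, and bit 30 is their inverse. That corresponds to values
// +/- (n / 16) * 2^r with 16 <= n <= 31 and -3 <= r <= 4.
inline bool is_fmov_imm_bits_sketch(std::uint32_t bits)
{
  if ((bits & 0x7ffffu) != 0)
    return false;
  const std::uint32_t b30 = (bits >> 30) & 1u;
  const std::uint32_t b29_25 = (bits >> 25) & 0x1fu;
  return b30 ? b29_25 == 0u : b29_25 == 0x1fu;
}
```

Under these assumptions 0x3f800000 passes the test, while zero and
patterns with low fraction bits set do not.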

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_sve_valid_immediate,
aarch64_simd_valid_immediate): Refactor accepting modes and values.
(aarch64_float_const_representable_p): Refactor and extract FP checks
into ...
(aarch64_real_float_const_representable_p): ...This and fix fail
fallback from real_to_integer.
(aarch64_advsimd_valid_immediate): Use it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/const_create_using_fmov.c: New test.

AArch64: update testsuite to account for new zero moves

The patch series will adjust how zeros are created. In principle the exact
lane size a zero gets created on doesn't matter, but this makes the tests a
bit fragile.

This preparation patch updates the testsuite to accept multiple ways of
creating vector zeros, covering both the current syntax and the one being
transitioned to in the series.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ldp_stp_18.c: Update zero regexpr.
* gcc.target/aarch64/memset-corner-cases.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_bf16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_f16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_f32.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_f64.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s32.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s64.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_s8.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u16.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u32.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u64.c: Likewise.
* gcc.target/aarch64/sme/acle-asm/revd_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acge_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acge_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acge_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acgt_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acgt_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acgt_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acle_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acle_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/acle_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/aclt_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/aclt_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/aclt_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cmpuo_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cmpuo_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cmpuo_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u8.c: Likewise.
* gcc.target/aarch64/sve/const_fold_div_1.c: Likewise.
* gcc.target/aarch64/sve/const_fold_mul_1.c: Likewise.
* gcc.target/aarch64/sve/dup_imm_1.c: Likewise.
* gcc.target/aarch64/sve/fdup_1.c: Likewise.
* gcc.target/aarch64/sve/fold_div_zero.c: Likewise.
* gcc.target/aarch64/sve/fold_mul_zero.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.

arm: [MVE intrinsics] use long_type_suffix / half_type_suffix helpers

In several places we are looking for a type twice or half as large as
the type suffix: this patch introduces helper functions to avoid code
duplication. long_type_suffix is similar to the SVE counterpart, but
adds an 'expected_tclass' parameter. half_type_suffix is similar to
it, but does not exist in SVE.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-shapes.cc (long_type_suffix): New.
(half_type_suffix): New.
(struct binary_move_narrow_def): Use new helper.
(struct binary_move_narrow_unsigned_def): Likewise.
(struct binary_rshift_narrow_def): Likewise.
(struct binary_rshift_narrow_unsigned_def): Likewise.
(struct binary_widen_def): Likewise.
(struct binary_widen_n_def): Likewise.
(struct binary_widen_opt_n_def): Likewise.
(struct unary_widen_def): Likewise.

arm: [MVE intrinsics] rework vsbcq vsbciq

Implement vsbcq vsbciq using the new MVE builtins framework.

We re-use most of the code introduced by the previous patches.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-base.cc (class vadc_vsbc_impl): Add
support for vsbciq and vsbcq.
(vadciq, vadcq): Add new parameter.
(vsbciq): New.
(vsbcq): New.
* config/arm/arm-mve-builtins-base.def (vsbciq): New.
(vsbcq): New.
* config/arm/arm-mve-builtins-base.h (vsbciq): New.
(vsbcq): New.
* config/arm/arm_mve.h (vsbciq): Delete.
(vsbciq_m): Delete.
(vsbcq): Delete.
(vsbcq_m): Delete.
(vsbciq_s32): Delete.
(vsbciq_u32): Delete.
(vsbciq_m_s32): Delete.
(vsbciq_m_u32): Delete.
(vsbcq_s32): Delete.
(vsbcq_u32): Delete.
(vsbcq_m_s32): Delete.
(vsbcq_m_u32): Delete.
(__arm_vsbciq_s32): Delete.
(__arm_vsbciq_u32): Delete.
(__arm_vsbciq_m_s32): Delete.
(__arm_vsbciq_m_u32): Delete.
(__arm_vsbcq_s32): Delete.
(__arm_vsbcq_u32): Delete.
(__arm_vsbcq_m_s32): Delete.
(__arm_vsbcq_m_u32): Delete.
(__arm_vsbciq): Delete.
(__arm_vsbciq_m): Delete.
(__arm_vsbcq): Delete.
(__arm_vsbcq_m): Delete.

arm: [MVE intrinsics] rework vadcq

Implement vadcq using the new MVE builtins framework.

We re-use most of the code introduced by the previous patch to support
vadciq: we just need to initialize carry from the input parameter.
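
The relationship between the two intrinsics can be pictured with a
scalar model. This is only a sketch under stated assumptions: the names
are hypothetical, the carry is assumed to chain from lane 0 up to lane 3
across four 32-bit lanes, and none of this is the vadc_vsbc_impl code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Shared core: add lane by lane, feeding each lane's carry-out into the
// next lane. On return, carry holds the carry-out of the last lane.
inline V4 add_with_carry_model(const V4& a, const V4& b, unsigned& carry)
{
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    {
      const std::uint64_t sum
        = static_cast<std::uint64_t>(a[i]) + b[i] + (carry & 1u);
      r[i] = static_cast<std::uint32_t>(sum);
      carry = static_cast<unsigned>(sum >> 32);
    }
  return r;
}

// vadciq-like: the initial carry is always 0; carry_out is write-only.
inline V4 vadci_model(const V4& a, const V4& b, unsigned& carry_out)
{
  unsigned carry = 0;
  V4 r = add_with_carry_model(a, b, carry);
  carry_out = carry;
  return r;
}

// vadcq-like: same core, but the initial carry comes from the caller.
inline V4 vadc_model(const V4& a, const V4& b, unsigned& carry)
{
  return add_with_carry_model(a, b, carry);
}
```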

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/

* config/arm/arm-mve-builtins-base.cc (vadcq_vsbc): Add support
for vadcq.
* config/arm/arm-mve-builtins-base.def (vadcq): New.
* config/arm/arm-mve-builtins-base.h (vadcq): New.
* config/arm/arm_mve.h (vadcq): Delete.
(vadcq_m): Delete.
(vadcq_s32): Delete.
(vadcq_u32): Delete.
(vadcq_m_s32): Delete.
(vadcq_m_u32): Delete.
(__arm_vadcq_s32): Delete.
(__arm_vadcq_u32): Delete.
(__arm_vadcq_m_s32): Delete.
(__arm_vadcq_m_u32): Delete.
(__arm_vadcq): Delete.
(__arm_vadcq_m): Delete.

arm: [MVE intrinsics] rework vadciq

Implement vadciq using the new MVE builtins framework.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/

* config/arm/arm-mve-builtins-base.cc (class vadc_vsbc_impl): New.
(vadciq): New.
* config/arm/arm-mve-builtins-base.def (vadciq): New.
* config/arm/arm-mve-builtins-base.h (vadciq): New.
* config/arm/arm_mve.h (vadciq): Delete.
(vadciq_m): Delete.
(vadciq_s32): Delete.
(vadciq_u32): Delete.
(vadciq_m_s32): Delete.
(vadciq_m_u32): Delete.
(__arm_vadciq_s32): Delete.
(__arm_vadciq_u32): Delete.
(__arm_vadciq_m_s32): Delete.
(__arm_vadciq_m_u32): Delete.
(__arm_vadciq): Delete.
(__arm_vadciq_m): Delete.

arm: [MVE intrinsics] factorize vadc vadci vsbc vsbci

Factorize vadc/vsbc and vadci/vsbci so that they use the same
parameterized names.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add VADCIQ_M_S, VADCIQ_M_U,
VADCIQ_U, VADCIQ_S, VADCQ_M_S, VADCQ_M_U, VADCQ_S, VADCQ_U,
VSBCIQ_M_S, VSBCIQ_M_U, VSBCIQ_S, VSBCIQ_U, VSBCQ_M_S, VSBCQ_M_U,
VSBCQ_S, VSBCQ_U.
(VADCIQ, VSBCIQ): Merge into ...
(VxCIQ): ... this.
(VADCIQ_M, VSBCIQ_M): Merge into ...
(VxCIQ_M): ... this.
(VSBCQ, VADCQ): Merge into ...
(VxCQ): ... this.
(VSBCQ_M, VADCQ_M): Merge into ...
(VxCQ_M): ... this.
* config/arm/mve.md
(mve_vadciq_<supf>v4si, mve_vsbciq_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_<supf>v4si): ... this.
(mve_vadciq_m_<supf>v4si, mve_vsbciq_m_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_m_<supf>v4si): ... this.
(mve_vadcq_<supf>v4si, mve_vsbcq_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_<supf>v4si): ... this.
(mve_vadcq_m_<supf>v4si, mve_vsbcq_m_<supf>v4si): Merge into ...
(@mve_<mve_insn>q_m_<supf>v4si): ... this.

arm: [MVE intrinsics] add vadc_vsbc shape

This patch adds the vadc_vsbc shape description.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vadc_vsbc): New.
* config/arm/arm-mve-builtins-shapes.h (vadc_vsbc): New.

arm: [MVE intrinsics] remove vshlcq useless expanders

Since we rewrote the implementation of vshlcq intrinsics, we no longer
need these expanders.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-builtins.cc
(arm_ternop_unone_none_unone_imm_qualifiers)
(arm_ternop_none_none_unone_imm_qualifiers): Delete.
* config/arm/arm_mve_builtins.def (vshlcq_m_vec_s)
(vshlcq_m_carry_s, vshlcq_m_vec_u, vshlcq_m_carry_u): Delete.
* config/arm/mve.md (mve_vshlcq_vec_<supf><mode>): Delete.
(mve_vshlcq_carry_<supf><mode>): Delete.
(mve_vshlcq_m_vec_<supf><mode>): Delete.
(mve_vshlcq_m_carry_<supf><mode>): Delete.

arm: [MVE intrinsics] rework vshlcq

Implement vshlc using the new MVE builtins framework.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (class vshlc_impl): New.
(vshlc): New.
* config/arm/arm-mve-builtins-base.def (vshlcq): New.
* config/arm/arm-mve-builtins-base.h (vshlcq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vshlc.
* config/arm/arm_mve.h (vshlcq): Delete.
(vshlcq_m): Delete.
(vshlcq_s8): Delete.
(vshlcq_u8): Delete.
(vshlcq_s16): Delete.
(vshlcq_u16): Delete.
(vshlcq_s32): Delete.
(vshlcq_u32): Delete.
(vshlcq_m_s8): Delete.
(vshlcq_m_u8): Delete.
(vshlcq_m_s16): Delete.
(vshlcq_m_u16): Delete.
(vshlcq_m_s32): Delete.
(vshlcq_m_u32): Delete.
(__arm_vshlcq_s8): Delete.
(__arm_vshlcq_u8): Delete.
(__arm_vshlcq_s16): Delete.
(__arm_vshlcq_u16): Delete.
(__arm_vshlcq_s32): Delete.
(__arm_vshlcq_u32): Delete.
(__arm_vshlcq_m_s8): Delete.
(__arm_vshlcq_m_u8): Delete.
(__arm_vshlcq_m_s16): Delete.
(__arm_vshlcq_m_u16): Delete.
(__arm_vshlcq_m_s32): Delete.
(__arm_vshlcq_m_u32): Delete.
(__arm_vshlcq): Delete.
(__arm_vshlcq_m): Delete.
* config/arm/mve.md (mve_vshlcq_<supf><mode>): Add '@' prefix.
(mve_vshlcq_m_<supf><mode>): Likewise.

arm: [MVE intrinsics] add vshlc shape

This patch adds the vshlc shape description.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vshlc): New.
* config/arm/arm-mve-builtins-shapes.h (vshlc): New.

arm: [MVE intrinsics] remove useless v[id]wdup expanders

Like with vddup/vidup, we use code_for_mve_q_wb_u_insn, so we can drop
the expanders and their declarations as builtins, now useless.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-builtins.cc
(arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers): Delete.
* config/arm/arm_mve_builtins.def (viwdupq_wb_u, vdwdupq_wb_u)
(viwdupq_m_wb_u, vdwdupq_m_wb_u, viwdupq_m_n_u, vdwdupq_m_n_u)
(vdwdupq_n_u, viwdupq_n_u): Delete.
* config/arm/mve.md (mve_vdwdupq_n_u<mode>): Delete.
(mve_vdwdupq_wb_u<mode>): Delete.
(mve_vdwdupq_m_n_u<mode>): Delete.
(mve_vdwdupq_m_wb_u<mode>): Delete.

arm: [MVE intrinsics] update v[id]wdup tests

Testing v[id]wdup overloads with '1' as argument for uint32_t* does
not make sense: this patch adds a new 'uint32_t *a' parameter to foo2
in such tests.

The difference with v[id]dup tests (where we removed 'foo2') is that
in 'foo1' we test the overload with a variable 'wrap' parameter (b)
and we need foo2 to test the overload with an immediate (1).

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/

* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Use pointer
parameter in foo2.
* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.

arm: [MVE intrinsics] rework vdwdup viwdup

Implement vdwdup and viwdup using the new MVE builtins framework.

In order to share more code with viddup_impl, the patch swaps operands
1 and 2 in @mve_v[id]wdupq_m_wb_u<mode>_insn, so that the parameter
order is similar to what @mve_v[id]dupq_m_wb_u<mode>_insn uses.

2024-08-28 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (viddup_impl): Add support
for wrapping versions.
(vdwdupq): New.
(viwdupq): New.
* config/arm/arm-mve-builtins-base.def (vdwdupq): New.
(viwdupq): New.
* config/arm/arm-mve-builtins-base.h (vdwdupq): New.
(viwdupq): New.
* config/arm/arm_mve.h (vdwdupq_m): Delete.
(vdwdupq_u8): Delete.
(vdwdupq_u32): Delete.
(vdwdupq_u16): Delete.
(viwdupq_m): Delete.
(viwdupq_u8): Delete.
(viwdupq_u32): Delete.
(viwdupq_u16): Delete.
(vdwdupq_x_u8): Delete.
(vdwdupq_x_u16): Delete.
(vdwdupq_x_u32): Delete.
(viwdupq_x_u8): Delete.
(viwdupq_x_u16): Delete.
(viwdupq_x_u32): Delete.
(vdwdupq_m_n_u8): Delete.
(vdwdupq_m_n_u32): Delete.
(vdwdupq_m_n_u16): Delete.
(vdwdupq_m_wb_u8): Delete.
(vdwdupq_m_wb_u32): Delete.
(vdwdupq_m_wb_u16): Delete.
(vdwdupq_n_u8): Delete.
(vdwdupq_n_u32): Delete.
(vdwdupq_n_u16): Delete.
(vdwdupq_wb_u8): Delete.
(vdwdupq_wb_u32): Delete.
(vdwdupq_wb_u16): Delete.
(viwdupq_m_n_u8): Delete.
(viwdupq_m_n_u32): Delete.
(viwdupq_m_n_u16): Delete.
(viwdupq_m_wb_u8): Delete.
(viwdupq_m_wb_u32): Delete.
(viwdupq_m_wb_u16): Delete.
(viwdupq_n_u8): Delete.
(viwdupq_n_u32): Delete.
(viwdupq_n_u16): Delete.
(viwdupq_wb_u8): Delete.
(viwdupq_wb_u32): Delete.
(viwdupq_wb_u16): Delete.
(vdwdupq_x_n_u8): Delete.
(vdwdupq_x_n_u16): Delete.
(vdwdupq_x_n_u32): Delete.
(vdwdupq_x_wb_u8): Delete.
(vdwdupq_x_wb_u16): Delete.
(vdwdupq_x_wb_u32): Delete.
(viwdupq_x_n_u8): Delete.
(viwdupq_x_n_u16): Delete.
(viwdupq_x_n_u32): Delete.
(viwdupq_x_wb_u8): Delete.
(viwdupq_x_wb_u16): Delete.
(viwdupq_x_wb_u32): Delete.
(__arm_vdwdupq_m_n_u8): Delete.
(__arm_vdwdupq_m_n_u32): Delete.
(__arm_vdwdupq_m_n_u16): Delete.
(__arm_vdwdupq_m_wb_u8): Delete.
(__arm_vdwdupq_m_wb_u32): Delete.
(__arm_vdwdupq_m_wb_u16): Delete.
(__arm_vdwdupq_n_u8): Delete.
(__arm_vdwdupq_n_u32): Delete.
(__arm_vdwdupq_n_u16): Delete.
(__arm_vdwdupq_wb_u8): Delete.
(__arm_vdwdupq_wb_u32): Delete.
(__arm_vdwdupq_wb_u16): Delete.
(__arm_viwdupq_m_n_u8): Delete.
(__arm_viwdupq_m_n_u32): Delete.
(__arm_viwdupq_m_n_u16): Delete.
(__arm_viwdupq_m_wb_u8): Delete.
(__arm_viwdupq_m_wb_u32): Delete.
(__arm_viwdupq_m_wb_u16): Delete.
(__arm_viwdupq_n_u8): Delete.
(__arm_viwdupq_n_u32): Delete.
(__arm_viwdupq_n_u16): Delete.
(__arm_viwdupq_wb_u8): Delete.
(__arm_viwdupq_wb_u32): Delete.
(__arm_viwdupq_wb_u16): Delete.
(__arm_vdwdupq_x_n_u8): Delete.
(__arm_vdwdupq_x_n_u16): Delete.
(__arm_vdwdupq_x_n_u32): Delete.
(__arm_vdwdupq_x_wb_u8): Delete.
(__arm_vdwdupq_x_wb_u16): Delete.
(__arm_vdwdupq_x_wb_u32): Delete.
(__arm_viwdupq_x_n_u8): Delete.
(__arm_viwdupq_x_n_u16): Delete.
(__arm_viwdupq_x_n_u32): Delete.
(__arm_viwdupq_x_wb_u8): Delete.
(__arm_viwdupq_x_wb_u16): Delete.
(__arm_viwdupq_x_wb_u32): Delete.
(__arm_vdwdupq_m): Delete.
(__arm_vdwdupq_u8): Delete.
(__arm_vdwdupq_u32): Delete.
(__arm_vdwdupq_u16): Delete.
(__arm_viwdupq_m): Delete.
(__arm_viwdupq_u8): Delete.
(__arm_viwdupq_u32): Delete.
(__arm_viwdupq_u16): Delete.
(__arm_vdwdupq_x_u8): Delete.
(__arm_vdwdupq_x_u16): Delete.
(__arm_vdwdupq_x_u32): Delete.
(__arm_viwdupq_x_u8): Delete.
(__arm_viwdupq_x_u16): Delete.
(__arm_viwdupq_x_u32): Delete.
* config/arm/mve.md (@mve_<mve_insn>q_m_wb_u<mode>_insn): Swap
operands 1 and 2.

arm: [MVE intrinsics] add vidwdup shape

This patch adds the vidwdup shape description for vdwdup and viwdup.

It is very similar to viddup, but accounts for the additional 'wrap'
scalar parameter.
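
For readers unfamiliar with the 'wrap' parameter, the following is a
minimal scalar sketch of the wrapping behaviour as we understand it from
the architecture description. It is not GCC code: the names model_viwdup
and model_vdwdup are hypothetical, lanes are modelled as a plain array,
and the architectural constraints on the operands (for instance that the
step is one of 1, 2, 4 or 8) are not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, not GCC code: fill OUT with N values
   starting at START, stepping up by STEP and wrapping back to 0 when
   the running offset reaches WRAP.  Returns the next offset, i.e. the
   value a write-back ("_wb") variant would store.  */
static uint32_t
model_viwdup (uint32_t *out, size_t n, uint32_t start, uint32_t wrap,
              uint32_t step)
{
  assert (step != 0);
  uint32_t offset = start;
  for (size_t i = 0; i < n; i++)
    {
      out[i] = offset;
      offset += step;
      if (offset == wrap)
        offset = 0;
    }
  return offset;
}

/* Decrementing counterpart (assumed semantics): when the running
   offset is 0 it is reloaded with WRAP before STEP is subtracted.  */
static uint32_t
model_vdwdup (uint32_t *out, size_t n, uint32_t start, uint32_t wrap,
              uint32_t step)
{
  assert (step != 0);
  uint32_t offset = start;
  for (size_t i = 0; i < n; i++)
    {
      out[i] = offset;
      if (offset == 0)
        offset = wrap;
      offset -= step;
    }
  return offset;
}
```

In this model 'wrap' is the extra scalar that distinguishes the vidwdup
shape from viddup; the start value, step and lane count play the same
roles as in the non-wrapping case.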

2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vidwdup): New.
* config/arm/arm-mve-builtins-shapes.h (vidwdup): New.