git.ipfire.org Git - thirdparty/gcc.git/log

Libquadmath: update doc for some constants

libquadmath/ChangeLog:

* libquadmath.texi (M_LOG2Eq, M_LOG10Eq, M_1_PIq,
M_2_PIq, M_2_SQRTPIq, M_SQRT1_2q): Adjust descriptioni
of these constants.

gimple ssa: switchconv: Use __builtin_popcount and support more types in exp transform [PR116355]

The gen_pow2p function generates (a & -a) == a as a fallback for
POPCOUNT (a) == 1.  Not only is the bitmagic not equivalent to
POPCOUNT (a) == 1 but it also introduces UB (consider signed
a = INT_MIN).

This patch rewrites gen_pow2p to always use __builtin_popcount instead.
This means that what the end result GIMPLE code is gets decided by an
already existing machinery in a later pass.  That is a cleaner solution
I think.  This existing machinery also uses a ^ (a - 1) > a - 1 which is
the correct bitmagic.

While rewriting gen_pow2p I had to add logic for converting the
operand's type to a type that __builtin_popcount accepts.  I naturally
also added this logic to gen_log2.  Thanks to this, exponential index
transform gains the capability to handle all operand types with
precision at most that of long long int.

gcc/ChangeLog:

PR tree-optimization/116355
* tree-switch-conversion.cc (can_log2): Add capability to
suggest converting the operand to a different type.
(gen_log2): Add capability to generate a conversion in case the
operand is of a type incompatible with the logarithm operation.
(can_pow2p): New function.
(gen_pow2p): Rewrite to use __builtin_popcount instead of
manually inserting an internal fn call or bitmagic.  Also add
capability to generate a conversion.
(switch_conversion::is_exp_index_transform_viable): Call
can_pow2p.  Store types suggested by can_log2 and gen_log2.
(switch_conversion::exp_index_transform): Params of gen_pow2p
and gen_log2 changed so update their calls.
* tree-switch-conversion.h: Add m_exp_index_transform_log2_type
and m_exp_index_transform_pow2p_type to switch_conversion class
to track type conversions needed to generate the "is power of 2"
and logarithm operations.

gcc/testsuite/ChangeLog:

PR tree-optimization/116355
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of POPCOUNT internal fn after switch conversion.  Test
for it after __builtin_popcount has had a chance to get
expanded.
* gcc.target/i386/switch-exp-transform-3.c: Also test char and
short.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

libstdc++: avoid -Wsign-compare

-Wsign-compare complained about these comparisons between (unsigned) size_t
and (signed) streamsize, or between (unsigned) native_handle_type
and (signed) -1. Fixed by adding casts to unify the types.

libstdc++-v3/ChangeLog:

* include/std/istream: Add cast to avoid -Wsign-compare.
* include/std/stacktrace: Likewise.

testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

This extends the scan-ltrans-tree* helpers to create RTL variants. This
is needed to check the behaviour of an RTL pass under LTO.

gcc/ChangeLog:

PR libstdc++/116140
* doc/sourcebuild.texi: Document ltrans-rtl value of kind for
scan-<kind>-dump*.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
(scan-ltrans-rtl-dump-not): New.
(scan-ltrans-rtl-dump-dem): New.
(scan-ltrans-rtl-dump-dem-not): New.
(scan-ltrans-rtl-dump-times): New.

Add debug overload for slp_instance

I found it helpful to be able to print a whole SLP instance from gdb.

* tree-vect-slp.cc (debug): Add overload for slp_instance.

Fix leak of SLP nodes when building store interleaving

The following fixes a leak of the discovered single-lane store
SLP nodes from which we only use their children. This uncovers
a latent reference counting issue in the interleaving build where
we fail to increment their reference count.

* tree-vect-slp.cc (vect_build_slp_store_interleaving):
Fix reference counting.
(vect_build_slp_instance): Release rhs_nodes.

Split out vect_build_slp_store_interleaving

This splits out SLP store interleaving into a separate function.

* tree-vect-slp.cc (vect_build_slp_store_interleaving): Split
out from ...
(vect_build_slp_instance): Here.

c++: add missing -Wc++??-extensions checks

The pedwarns for each of these features should be silenced by
the appropriate -Wno-c++??-extensions.

The handle_pragma_diagnostic_impl change is necessary so that we handle
-Wc++23-extensions early so it's available to interpret_float while lexing.

gcc/c-family/ChangeLog:

* c-pragma.cc (handle_pragma_diagnostic_impl): Also handle
-Wc++23-extensions early.
* c-lex.cc (interpret_float): Use -Wc++23-extensions for extended
floating point literal pedwarn.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Use
-Wc++20-extensions for auto parameter pedwarn.
* pt.cc (do_decl_instantiation, do_type_instantiation): Use
-Wc++11-extensions for 'extern template'.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/extern_template-7.C: New test.
* g++.dg/cpp23/ext-floating19.C: New test.
* g++.dg/cpp2a/abbrev-fn1.C: New test.

libgomp: Add interop types and routines to OpenMP's headers and module

This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.

While a stub implementation is provided, only with foreign runtime
support by the libgomp GPU plugins and with the 'interop' directive,
this becomes really useful.

libgomp/ChangeLog:

* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
* libgomp.map (GOMP_5.1.3): New; add interop routines.
* omp.h.in: Add interop typedefs, enum and prototypes.
(__GOMP_DEFAULT_NULL): Define.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Use it for the optional depend argument.
* omp_lib.f90.in: Add paramters and interfaces for interop.
* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
-ffree-length-80.
* target.c (omp_get_num_interop_properties, omp_get_interop_int,
omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
* config/gcn/target.c (omp_get_num_interop_properties,
omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
omp_get_interop_name, omp_get_interop_type_desc,
omp_get_interop_rc_desc): Add.
* config/nvptx/target.c (omp_get_num_interop_properties,
omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
omp_get_interop_name, omp_get_interop_type_desc,
omp_get_interop_rc_desc): Add.
* testsuite/libgomp.c-c++-common/interop-routines-1.c: New test.
* testsuite/libgomp.c-c++-common/interop-routines-2.c: New test.
* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
* testsuite/libgomp.fortran/interop-routines-3.F: New test.
* testsuite/libgomp.fortran/interop-routines-4.F: New test.
* testsuite/libgomp.fortran/interop-routines-5.F: New test.
* testsuite/libgomp.fortran/interop-routines-6.F: New test.
* testsuite/libgomp.fortran/interop-routines-7.F90: New test.

libstdc++: fix testcase regexp

The unescaped * broke the match.

libstdc++-v3/ChangeLog:

* testsuite/20_util/default_delete/void_neg.cc: Fix regexp quoting.

libstdc++: avoid -Wzero-as-null-pointer-constant

libstdc++-v3/ChangeLog:

* include/std/coroutine (coroutine_handle): Use nullptr instead of
0 as initializer for _M_fr_ptr.

libstdc++: add missing return

The return seems to have been lost in the r15-1858 RAII overhaul.

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h (__uninitialized_move_copy): Add
missing return.

libstdc++: remove extra semicolons

The semicolons after each macro invocation here end up following the closing
brace of a function, leading to -Wextra-semi pedwarns.

libstdc++-v3/ChangeLog:

* include/decimal/decimal.h (_DEFINE_DECIMAL_BINARY_OP_WITH_INT):
Remove redundant semicolons.

Test: Move pr116278 run test to dg/torture [NFC]

Move the run test of pr116278 to dg/torture and leave the risc-v the
asm check under risc-v part.

PR target/116278

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116278-run-1.c: Take compile instead of run.
* gcc.target/riscv/pr116278-run-2.c: Ditto.
* gcc.dg/torture/pr116278-run-1.c: New test.
* gcc.dg/torture/pr116278-run-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Vect: Reconcile the const_int operand type of unsigned .SAT_ADD

The .SAT_ADD has 2 operand, when one of the operand may be INTEGER_CST.
For example _1 = .SAT_ADD (_2, 9) comes from below sample code.

Form 3:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    T ret;                                                             \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      }                                                                \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)

It will fail to vectorize as the vectorizable_call will check the
operands is type_compatiable but the imm will be (const_int 9) with
the SImode, which is different from _2 (DImode).  Aka:

uint64_t _1;
uint64_t _2;
_1 = .SAT_ADD (_2, 9);

This patch would like to reconcile the imm operand to the operand type
mode of _2 by fold_convert to make the vectorizable_call happy.

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_sat_add_pattern): Add fold
convert for const_int to the type of operand 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add missing mode_idx for vrol and vror

We add pattern for vector rotate, but seems like we forgot adding
mode_idx which used in AVL propgation (riscv-avlprop.cc).

gcc/ChangeLog:

* config/riscv/vector.md (mode_idx): Add vrol and vror.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/rotr.c: New.

Match: Support form 1 for scalar signed integer .SAT_ADD

This patch would like to support the form 1 of the scalar signed
integer .SAT_ADD.  Aka below example:

Form 1:
  #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))                  \
  sat_s_add_##T##_fmt_1 (T x, T y)             \
  {                                            \
    T sum = (UT)x + (UT)y;                     \
    return (x ^ y) < 0                         \
      ? sum                                    \
      : (sum ^ x) >= 0                         \
        ? sum                                  \
        : x < 0 ? MIN : MAX;                   \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)

We can tell the difference before and after this patch if backend
implemented the ssadd<m>3 pattern similar as below.

Before this patch:
   4   │ __attribute__((noinline))
   5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
   6   │ {
   7   │   int64_t sum;
   8   │   long unsigned int x.0_1;
   9   │   long unsigned int y.1_2;
  10   │   long unsigned int _3;
  11   │   long int _4;
  12   │   long int _5;
  13   │   int64_t _6;
  14   │   _Bool _11;
  15   │   long int _12;
  16   │   long int _13;
  17   │   long int _14;
  18   │   long int _16;
  19   │   long int _17;
  20   │
  21   │ ;;   basic block 2, loop depth 0
  22   │ ;;    pred:       ENTRY
  23   │   x.0_1 = (long unsigned int) x_7(D);
  24   │   y.1_2 = (long unsigned int) y_8(D);
  25   │   _3 = x.0_1 + y.1_2;
  26   │   sum_9 = (int64_t) _3;
  27   │   _4 = x_7(D) ^ y_8(D);
  28   │   _5 = x_7(D) ^ sum_9;
  29   │   _17 = ~_4;
  30   │   _16 = _5 & _17;
  31   │   if (_16 < 0)
  32   │     goto <bb 3>; [41.00%]
  33   │   else
  34   │     goto <bb 4>; [59.00%]
  35   │ ;;    succ:       3
  36   │ ;;                4
  37   │
  38   │ ;;   basic block 3, loop depth 0
  39   │ ;;    pred:       2
  40   │   _11 = x_7(D) < 0;
  41   │   _12 = (long int) _11;
  42   │   _13 = -_12;
  43   │   _14 = _13 ^ 9223372036854775807;
  44   │ ;;    succ:       4
  45   │
  46   │ ;;   basic block 4, loop depth 0
  47   │ ;;    pred:       2
  48   │ ;;                3
  49   │   # _6 = PHI <sum_9(2), _14(3)>
  50   │   return _6;
  51   │ ;;    succ:       EXIT
  52   │
  53   │ }

After this patch:
   4   │ __attribute__((noinline))
   5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
   6   │ {
   7   │   int64_t _4;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;    pred:       ENTRY
  11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  12   │   return _4;
  13   │ ;;    succ:       EXIT
  14   │
  15   │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add the matching for signed .SAT_ADD.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
matching func decl.
(match_unsigned_saturation_add): Try signed .SAT_ADD and rename
to ...
(match_saturation_add): ... here.
(math_opts_dom_walker::after_dom_children): Update the above renamed
func from caller.

Signed-off-by: Pan Li <pan2.li@intel.com>

Fix PR testsuite/116271, gcc.dg/vect/tsvc/vect-tsvc-s176.c fails

gcc/testsuite:
PR testsuite/116271
* gcc.dg/vect/tsvc/vect-tsvc-s176.c [TRUNCATE_TEST]: Make sure
that m stays the same as the loop bound of the middle loop.
* gcc.dg/vect/tsvc/tsvc.h (get_expected_result) <s176> [TRUNCATE_TEST]:
Adjust expected value.

RISC-V: Add testcases for unsigned scalar .SAT_SUB IMM form 4

This patch would like to add test cases for the unsigned scalar
.SAT_SUB IMM form 4.  Aka:

Form 4:
  #define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \
  T __attribute__((noinline))             \
  sat_u_sub_imm##IMM##_##T##_fmt_4 (T x)  \
  {                                       \
    return x > (T)IMM ? x - (T)IMM : 0;   \
  }

DEF_SAT_U_SUB_IMM_FMT_4(uint64_t, 23)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-13.c: New test.
* gcc.target/riscv/sat_u_sub_imm-13_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-13_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-16.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_SUB IMM form 3

This patch would like to add test cases for the unsigned scalar
.SAT_SUB IMM form 3.  Aka:

Form 3:
  #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
  T __attribute__((noinline))             \
  sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)  \
  {                                       \
    return (T)IMM > y ? (T)IMM - y : 0;   \
  }

DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 23)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

Fix test failing on sparc

SPARC does not support vectorizing conditions, which this test relies
on. Use vect_condition as effective target.

Committed as obvious.

PR testsuite/116500

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: Use vect_condition to
check if vectorizing conditions is supported for target.

Update gcc zh_CN.po

* zh_CN.po: Update.

c++/coroutines: fix actor cases not being added to the current switch [PR109867]

Previously, we were building and inserting case_labels manually, which
led to them not being added into the currently running switch via
c_add_case_label. This led to false diagnostics that the user could not
act on.

PR c++/109867

gcc/cp/ChangeLog:

* coroutines.cc (expand_one_await_expression): Replace uses of
build_case_label with finish_case_label.
(build_actor_fn): Ditto.
(create_anon_label_with_ctx): Remove now-unused function.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr109867.C: New test.

Reviewed-by: Iain Sandoe <iain@sandoe.co.uk>

m68k: Accept ASHIFT like MULT in address operand

When LRA pulls an address operand out of a MEM it caninoicalizes a
containing MULT into ASHIFT. Adjust the address decomposer to recognize
this form.

PR target/116413
* config/m68k/m68k.cc (m68k_decompose_index): Accept ASHIFT like
MULT.
(m68k_rtx_costs) [PLUS]: Likewise.
(m68k_legitimize_address): Likewise.

c++: Don't show constructor internal name in error message [PR105483]

We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
not the type" error for this invalid code:

=== cut here ===
struct X {};
void g () {
X::X x;
}
=== cut here ===

The problem is that we use %<%T::%D%> to build the error message, while
%qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
what this patch does.

It also skips until the end of the statement and returns error_mark_node
for this and the preceding if block, to avoid emitting extra (useless)
errors.

PR c++/105483

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): Use %qE instead of
incorrect %<%T::%D%>. Skip to end of statement and return
error_mark_node in case of error.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error36.C: Adjust test expectation.
* g++.dg/tc1/dr147.C: Likewise.
* g++.old-deja/g++.other/typename1.C: Likewise.
* g++.dg/diagnostic/pr105483.C: New test.

RISC-V: Move helper functions above expand_const_vector

These subroutines will be used in expand_const_vector in a future patch.
Relocate so expand_const_vector can use them.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vector_init_insert_elems): Relocate.
(expand_vector_init_trailing_same_elem): Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Allow non-duplicate bool patterns in expand_const_vector

Currently we assert when encountering a non-duplicate boolean vector.
This patch allows non-duplicate vectors to fall through to the
gcc_unreachable and assert there.

This will be useful when adding a catch-all pattern to emit costs and
handle arbitary vectors.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Allow non-duplicate
to fall through other patterns before asserting.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Handle 0.0 floating point pattern costing to match const_vector expander

The comment previously here stated that the Wc0/Wc1 cases are handled by
the vi constraint but that is not true for the 0.0 Wc0 case.

gcc/ChangeLog:

* config/riscv/riscv-v.h (valid_vec_immediate_p): Add new helper.
* config/riscv/riscv-v.cc (valid_vec_immediate_p): Ditto.
(expand_const_vector): Use new helper.
* config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point
case.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Emit costs for bool and stepped const vectors

These cases are handled in the expander
(riscv-v.cc:expand_const_vector). We need the vector builder to detect
these cases so extract that out into a new riscv-v.h header file.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h.
* config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for
bool/stepped const vectors.
* config/riscv/riscv-v.h: New file.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Handle case when constant vector construction target rtx is not a register

This manifests in RTL that is optimized away which causes runtime failures
in the testsuite. Update all patterns to use a temp result register if required.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if
needed.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Reorder insn cost match order to match corresponding expander match order

The corresponding expander (riscv-v.cc:expand_const_vector) matches
const_vec_duplicate_p before const_vec_series_p. Reorder to match this
behavior when calculating costs.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Relocate.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Fix vid const vector expander for non-npatterns size steps

Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2, 4, 4, ...}

This patch sets the step size to the requested value.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

arm: Always use vmov.f64 instead of vmov.f32 with MVE

With MVE, vmov.f64 is always supported (no need for +fp.dp extension).

This patch updates two patterns:
- in movdi_vfp, we incorrectly checked
  TARGET_VFP_SINGLE || TARGET_HAVE_MVE instead of
  TARGET_VFP_SINGLE && !TARGET_HAVE_MVE, and didn't take into account
  these two possibilities when computing the length attribute.

- in thumb2_movdf_vfp, we checked only TARGET_VFP_SINGLE.

No need to update movdf_vfp, since it is enabled only for TARGET_ARM
(which is not the case when MVE is enabled).

The patch also updates gcc.target/arm/armv8_1m-fp64-move-1.c, to
accept only vmov.f64 instead of vmov.f32.

Tested on arm-none-eabi with:
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp+fp.dp

2024-08-21  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/vfp.md (movdi_vfp, thumb2_movdf_vfp): Handle MVE
case.

gcc/testsuite/
* gcc.target/arm/armv8_1m-fp64-move-1.c: Update expected code.

pr116174.c: Add the missing */

* gcc.target/i386/pr116174.c: Add the missing */.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Extend check-function-bodies to allow label and directives

As PR target/116174 shown, we may need to verify labels and the directive
order.  Extend check-function-bodies to support matched output lines to
allow label and directives.

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

LRA: Fix setup_sp_offset

This is part of making m68k work with LRA.  See PR116429.
In short: setup_sp_offset is internally inconsistent.  It wants to
setup the sp_offset for newly generated instructions.  sp_offset for
an instruction is always the state of the sp-offset right before that
instruction.  For that it starts at the (assumed correct) sp_offset
of the instruction right after the given (new) sequence, and then
iterates that sequence forward simulating its effects on sp_offset.

That can't ever be right: either it needs to start at the front
and simulate forward, or start at the end and simulate backward.
The former seems to be the more natural way.  Funnily the local
variable holding that instruction is also called 'before'.

This changes it to the first variant: start before the sequence,
do one simulation step to get the sp-offset state in front of the
sequence and then continue simulating.

More details: in the problematic testcase we start with this
situation (sp_off before 550 is 0):

  550: [--sp] = 0             sp_off = 0  {pushexthisi_const}
  551: [--sp] = 37            sp_off = -4 {pushexthisi_const}
  552: [--sp] = r37           sp_off = -8 {movsi_m68k2}
  554: [--sp] = r116 - r37    sp_off = -12 {subsi3}
  556: call                   sp_off = -16

insn 554 doesn't match its constraints and needs some reloads:

      Creating newreg=262, assigning class DATA_REGS to r262
  554: r262:SI=r262:SI-r37:SI
      REG_ARGS_SIZE 0x10
    Inserting insn reload before:
  996: r262:SI=r116:SI
    Inserting insn reload after:
  997: [--%sp:SI]=r262:SI

         Considering alt=0 of insn 997:   (0) =g  (1) damSKT
            1 Non pseudo reload: reject++
          overall=1,losers=0,rld_nregs=0
      Choosing alt 0 in insn 997:  (0) =g  (1) damSKT {*movsi_m68k2} (sp_off=-16)

Note how insn 997 (the after-reload) now has sp_off=-16 already.  It all
goes downhill from there.  We end up with these insns:

  552: [--sp] = r37           sp_off = -8 {movsi_m68k2}
  996: r262 = r116            sp_off = -12
  554: r262 = r262 - r37      sp_off = -12
  997: [--sp] = r262          sp_off = -16  (!!! should be -12)
  556: call                   sp_off = -16

The call insn sp_off remains at the correct -16, but internally it's already
inconsistent here.  If the sp_off before an insn is -16, and that insn
pre_decs sp, then the after-insn sp_off should be -20.

PR target/116429
* lra.cc (setup_sp_offset): Start with sp_offset from
before the new sequence, not from after.

LRA: Don't use 0 as initialization for sp_offset

this is part of making m68k work with LRA.  See PR116374.
m68k has the property that sometimes the elimation offset
between %sp and %argptr is zero.  During setting up elimination
infrastructure it's changes between sp_offset and previous_offset
that feed into insns_with_changed_offsets that ultimately will
setup looking at the instructions so marked.

But the initial values for sp_offset and previous_offset are
also zero.  So if the targets INITIAL_ELIMINATION_OFFSET (called
in update_reg_eliminate) is zero then nothing changes, the
instructions in question don't get into the list to consider and
the sp_offset tracking goes wrong.

Solve this by initializing those member with -1 instead of zero.
An initial offset of that value seems very unlikely, as it's
in word-sized increments.  This then also reveals a problem in
eliminate_regs_in_insn where it always uses sp_offset-previous_offset
as offset adjustment, even in the first_p pass.  That was harmless
when previous_offset was uninitialized as zero.  But all the other
code uses a different idiom of checking for first_p (or rather
update_p which is !replace_p&&!first_p), and using sp_offset directly.
So use that as well in eliminate_regs_in_insn.

PR target/116374
* lra-eliminations.cc (init_elim_table): Use -1 as initializer.
(update_reg_eliminate): Accept -1 as not-yet-used marker.
(eliminate_regs_in_insn): Use previous_sp_offset only when
not first_p.

final: go down ASHIFT in walk_alter_subreg

when experimenting with m68k plus LRA one of the
changes in the backend is to accept ASHIFTs (not only
MULT) as scale code for address indices.  When then not
turning on LRA but using reload those addresses are
presented to it which chokes on them.  While reload is
going away the change to make them work doesn't really hurt
(and generally seems useful, as MULT and ASHIFT really are
no different).  So just add it.

PR target/116413
* final.cc (walk_alter_subreg): Recurse on AHIFT.

libstdc++: Do not use std::vector<bool>::reference default ctor [PR115098]

This default constructor was made private by r15-3124-gb25b101bc38000 so
the pretty printer tests need a fix to stop using it. There's no
conforming way to get a default-constructed 'reference' now, e.g. trying
to access an element of a default-constructed std::vector<bool> will
trigger an assertion. Remove the tests, but leave a comment in the
printer code about handling it.

libstdc++-v3/ChangeLog:

PR libstdc++/115098
* python/libstdcxx/v6/printers.py (StdBitReferencePrinter): Add
comment.
* testsuite/libstdc++-prettyprinters/simple.cc: Do not default
construct std::vector<bool>::reference.
* testsuite/libstdc++-prettyprinters/simple11.cc: Likewise.

c++: Add most missing C++20 and C++23 names to cxxapi-data.csv

This includes uncommenting the atomic_flag non-member functions, which
were added by PR libstdc++/103934.

Also generate a hint for std::ignore, which was recently tweaked to be
more generally useful by P2968R2, which r15-2324 implemented.

gcc/cp/ChangeLog:

* cxxapi-data.csv: Add C++20 and C++23 names from <chrono>,
<format>, <generator>, <iterator>, <print>, and <stdfloat>.
Set cxx11 dialect for std::ignore in <tuple>. Uncomment
atomic_flag functions from <atomic>.
* std-name-hint.gperf: Regenerate.
* std-name-hint.h: Regenerate.

c++: Add correct copyright dates to output of gen-cxxapi-file.py

This ensures the generated output says something like 2022-2024 rather
than just 2024.

gcc/cp/ChangeLog:

* gen-cxxapi-file.py: Fix copyright dates in generated output.

testsuite: Fix ending of comment in test cases

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: Fixed dg-comment.
* gcc.dg/pr71071.c: Likewise.
* gcc.dg/tree-ssa/noreturn-1.c: Likewise.
* gcc.dg/tree-ssa/pr56727.c: Likewise.
* gcc.target/arc/loop-2.cpp: Likewise.
* gcc.target/arc/loop-3.c: Likewise.
* gcc.target/arc/pr9001107555.c: Likewise.
* gcc.target/arm/armv8_1m-fp16-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.
* gcc.target/i386/amxint8-asmatt-1.c: Likewise.
* gcc.target/i386/amxint8-asmintel-1.c: Likewise.
* gcc.target/i386/avx512bw-vpermt2w-1.c: Likewise.
* gcc.target/i386/avx512vbmi-vpermt2b-1.c: Likewise.
* gcc.target/i386/endbr_immediate.c: Likewise.
* gcc.target/i386/pr96539.c: Likewise.
* gcc.target/i386/sse2-pr98461-2.c: Likewise.
* gcc.target/m68k/pr39726.c: Likewise.
* gcc.target/m68k/pr52076-1.c: Likewise.
* gcc.target/m68k/pr52076-2.c: Likewise.
* gcc.target/nvptx/v2si-vec-set-extract.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Un-XFAIL 'gcc.dg/signbit-5.c' for GCN

It XPASSes after recent commit 5a3387938d4d95717cac29eecd0ba53e0ef9094d
"testsuite: Add -fwrapv to signbit-5.c".

gcc/testsuite/
* gcc.dg/signbit-5.c: Un-XFAIL for GCN.

Handle arithmetic on eliminated address indices [PR116413]

This patch fixes gcc.c-torture/compile/opout.c for m68k with LRA
enabled.  The test has:

...
z (a, b)
{
  return (int) &a + (int) &b + (int) x + (int) z;
}

so it adds the address of two incoming arguments.  This ends up
being treated as an LEA in which the "index" is the incoming
argument pointer, which the LEA multiplies by 2.  The incoming
argument pointer is then eliminated, leading to:

(plus:SI (plus:SI (ashift:SI (plus:SI (reg/f:SI 24 %argptr)
                (const_int -4 [0xfffffffffffffffc]))
            (const_int 1 [0x1]))
        (reg/f:SI 41 [ _6 ]))
    (const_int 20 [0x14]))

In the address_info scheme, the innermost plus has to be treated
as the index "term", since that's the thing that's subject to
index_reg_class.

gcc/
PR middle-end/116413
* rtl.h (address_info): Update commentary.
* rtlanal.cc (valid_base_or_index_term_p): New function, split
out from...
(get_base_term, get_index_term): ...here.  Handle elimination PLUSes.

lra: Don't apply eliminations to allocated registers [PR116321]

The sequence of events in this PR is that:

- the function has many addresses in which only a single hard base
  register is acceptable.  Let's call the hard register H.

- IRA allocates that register to one of the pseudo base registers.
  Let's call the pseudo register P.

- Some of the other addresses that require H occur when P is still live.

- LRA therefore has to spill P.

- When it reallocates P, LRA chooses to use FRAME_POINTER_REGNUM,
  which has been eliminated to the stack pointer.  (This is ok,
  since the frame register is free.)

- Spilling P causes LRA to reprocess the instruction that uses P.

- When reprocessing the address that has P as its base, LRA first
  applies the new allocation, to get FRAME_POINTER_REGNUM,
  and then applies the elimination, to get the stack pointer.

The last step seems wrong: the elimination should only apply to
pre-existing uses of FRAME_POINTER_REGNUM, not to uses that result
from allocating pseudos.  Applying both means that we get the wrong
register number, and therefore the wrong class.

The PR is about an existing testcase that fails with LRA on m86k.

gcc/
PR middle-end/116321
* lra-constraints.cc (get_hard_regno): Only apply eliminations
to existing hard registers.
(get_reg_class): Likewise.

c++, coroutines: The frame pointer is used in the helpers [PR116482].

We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions. Fixed
by making the parameters DECL_ARTIFICIAL.

PR c++/116482

gcc/cp/ChangeLog:

* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116482.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

tree-optimization/116460 - ICE with DCE in forwprop

The following avoids removing stmts with defs that might still have
uses in the IL before calling simple_dce_from_worklist which might
remove those as that will wreck debug stmt generation.  Instead first
perform use-based DCE and then remove stmts which may have uses in
code that CFG cleanup will remove.  This requires tracking stmts
in to_remove by their SSA def so we can check whether it was removed
before without running into the issue that PHIs can be ggc_free()d
upon removal.  So this adds to_remove_defs in addition to to_remove
which has to stay to track GIMPLE_NOPs we want to elide.

PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): First do
simple_dce_from_worklist and then remove stmts in to_remove.
Track defs to be removed in to_remove_defs.

* g++.dg/torture/pr116460.C: New testcase.

Fix another inline7.c test failure on sparc targets

This new test was reported to be still failing on sparc targets.
Here the number of DW_AT_ranges dropped to zero.
The test should pass on this architecture with -Os, -O2 and -O3.
I tried to improve also different known problematic targets,
where only one subroutine had DW_AT_ranges:
Those are armhf (arm with hard float), powerpc and powerpc64.
The best option is to use -Os: So far the only one, where
all two inline instances in this test had two DW_AT_ranges.

gcc/testsuite/ChangeLog:

PR other/116462
* gcc.dg/debug/dwarf2/inline7.c: Switch to -Os optimization.

RISC-V: Support IMM for operand 1 of ussub pattern

This patch would like to allow IMM for the operand 1 of ussub pattern.
Aka .SAT_SUB(x, 22) as the below example.

Form 2:
  #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
  T __attribute__((noinline))             \
  sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
  {                                       \
    return x >= (T)IMM ? x - (T)IMM : 0;  \
  }

DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022)

It is almost the as support imm for operand 0 of ussub pattern, but
allow the second operand to be imm insted of the first operand.

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_ussub): Gen xmode for the
second operand, aka y in parameter.
* config/riscv/riscv.md (ussub<mode>3): Allow const_int for operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

c++/modules: Fix include translation for already-seen headers [PR99243]

After importing a header unit we learn about and setup any header
modules that we transitively depend on. However, this causes
'set_filename' to fail an assertion if we then come across this header
as an #include and attempt to translate it into a module. We still need
to do this translation so that libcpp learns that this is a header unit,
but we shouldn't error just because we've already seen it as an import.

Instead this patch merely checks and errors to handle the case of a
broken mapper implementation which supplies a different CMI path from
the one we already got.

As a drive-by fix, also make failing to find the CMI for a module be a
fatal error: any further errors in the TU are unlikely to be helpful.

PR c++/99243

gcc/cp/ChangeLog:

* module.cc (module_state::set_filename): Handle repeated calls
to 'set_filename' as long as the CMI path matches.
(maybe_translate_include): Adjust comment.

gcc/testsuite/ChangeLog:

* g++.dg/modules/map-2.C: Prune additional fatal error message.
* g++.dg/modules/inc-xlate-4_a.H: New test.
* g++.dg/modules/inc-xlate-4_b.H: New test.
* g++.dg/modules/inc-xlate-4_c.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Clean up include translation [PR110980]

Currently the handling of include translation is confusing to read,
using a tri-state integer without much clarity on what different states
mean. This patch cleans this up to use explicit enumerators indicating
the different possible states instead, and fixes a bug where the option
'-flang-info-include-translate' ended being accidentally unusable.

PR c++/110980

gcc/cp/ChangeLog:

* module.cc (maybe_translate_include): Clean up.

gcc/testsuite/ChangeLog:

* g++.dg/modules/inc-xlate-2_a.H: New test.
* g++.dg/modules/inc-xlate-2_b.H: New test.
* g++.dg/modules/inc-xlate-3.h: New test.
* g++.dg/modules/inc-xlate-3_a.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

MIPS: Include missing mips16.S in libgcc/lib1funcs.S

mips16.S was missing since
commit 29b74545531f6afbee9fc38c267524326dbfbedf
Date: Thu Jun 1 10:14:24 2023 +0800

MIPS: Add speculation_barrier support

Without mips16.S included, some symbols will miss for mips16, and
so some software will fail to build.

libgcc/ChangeLog:

* config/mips/lib1funcs.S: Includes mips16.S.

combine.cc (make_more_copies): Copy attributes from the original pseudo, PR115883

The first of the late-combine passes, propagates some of the copies
made during the (in-time-)combine pass in make_more_copies into the
users of the "original" pseudo registers and removes the "old"
pseudos.  That effectively removes attributes such as REG_POINTER,
which matter to LRA.  The quoted PR is for an ICE-manifesting bug that
was exposed by the late-combine pass and went back to hiding with this
patch until commit r15-2937-g3673b7054ec2, the fix for PR116236, when
it was actually fixed.  To wit, this patch is only incidentally
related to that bug.

In other words, the REG_POINTER attribute should not be required for
LRA to work correctly.  This patch merely corrects state for those
propagated register-uses to ante late-combine.

For reasons not investigated, this fixes a failing test
"FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3"
for x86_64-linux-gnu.

PR middle-end/115883
* combine.cc (make_more_copies): Copy attributes from the original
pseudo to the new copy.

c++/coros: do not assume coros don't nest [PR113457]

In the testcase presented in the PR, during template expansion, an
tsubst of an operand causes a lambda coroutine to be processed, causing
it to get an initial suspend and final suspend.  The code for assigning
awaitable var names (get_awaitable_var) assumed that the sequence Is ->
Is -> Fs -> Fs is impossible (i.e. that one could only 'open' one
coroutine before closing it at a time), and reset the counter used for
unique numbering each time a final suspend occured.  This assumption is
false in a few cases, usually when lambdas are involved.

Instead of storing this counter in a static-storage variable, we can
store it in coroutine_info.  This struct is local to each function, so
we don't need to worry about "cross-contamination" nor resetting.

PR c++/113457

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Add integer field
awaitable_number.  This is a counter used for assigning unique
names to awaitable temporaries.
(get_awaitable_var): Use awaitable_number from coroutine_info
instead of the static int awn.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr113457-1.C: New test.
* g++.dg/coroutines/pr113457.C: New test.

coroutines: diagnose usage of alloca in coroutines

We do not support it currently, and the resulting memory can only be
used inside a single resumption, so best not confuse the user with it.

PR c++/115858 - Incompatibility of coroutines and alloca()

gcc/ChangeLog:

* coroutine-passes.cc (execute_early_expand_coro_ifns): Emit a
sorry if a statement is an alloca call.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115858.C: New test.

diagnostics: move output formats from diagnostic.{c,h} to their own files

In particular, move the classic text output code to a
diagnostic-text.cc (analogous to -json.cc and -sarif.cc).

No functional change intended.

gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostic-format-text.o.
* diagnostic-format-json.cc: Include "diagnostic-format.h".
* diagnostic-format-sarif.cc: Likewise.
* diagnostic-format-text.cc: New file, using material from
diagnostics.cc.
* diagnostic-global-context.cc: Include
"diagnostic-format.h".
* diagnostic-format-text.h: New file, using material from
diagnostics.h.
* diagnostic-format.h: New file, using material from
diagnostics.h.
* diagnostic.cc: Include "diagnostic-format.h" and
"diagnostic-format-text.h".
(diagnostic_text_output_format::~diagnostic_text_output_format):
Move to diagnostic-format-text.cc.
(diagnostic_text_output_format::on_report_diagnostic): Likewise.
(diagnostic_text_output_format::on_diagram): Likewise.
(diagnostic_text_output_format::print_any_cwe): Likewise.
(diagnostic_text_output_format::print_any_rules): Likewise.
(diagnostic_text_output_format::print_option_information):
Likewise.
* diagnostic.h (class diagnostic_output_format): Move to
diagnostic-format.h.
(class diagnostic_text_output_format): Move to
diagnostic-format-text.h.
(diagnostic_output_format_init): Move to
diagnostic-format.h.
(diagnostic_output_format_init_json_stderr): Likewise.
(diagnostic_output_format_init_json_file): Likewise.
(diagnostic_output_format_init_sarif_stderr): Likewise.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
* gcc.cc: Include "diagnostic-format.h".
* opts.cc: Include "diagnostic-format.h".

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_group_plugin.c: Include
"diagnostic-format-text.h".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: consolidate on_{begin,end}_diagnostic into on_report_diagnostic

Previously diagnostic_context::report_diagnostic had, after the call to
pp_format (phases 1 and 2 of formatting the message):

  m_output_format->on_begin_diagnostic (*diagnostic);
  pp_output_formatted_text (this->printer, m_urlifier);
  if (m_show_cwe)
    print_any_cwe (*diagnostic);
  if (m_show_rules)
    print_any_rules (*diagnostic);
  if (m_show_option_requested)
  print_option_information (*diagnostic, orig_diag_kind);
  m_output_format->on_end_diagnostic (*diagnostic, orig_diag_kind);

This patch replaces all of the above with a single call to

  m_output_format->on_report_diagnostic (*diagnostic, orig_diag_kind);

moving responsibility for phase 3 of formatting and printing the result
from diagnostic_context to the output format.

This simplifies diagnostic_context::report_diagnostic and allows us to
move the code that prints CWEs, rules, and option information in textual
form from diagnostic_context to diagnostic_text_output_format, where it
belongs.

No functional change intended.

gcc/ChangeLog:
* diagnostic-format-json.cc
(json_output_format::on_begin_diagnostic): Delete.
(json_output_format::on_end_diagnostic): Rename to...
(json_output_format::on_report_diagnostic): ...this and add call
to pp_output_formatted_text.
(diagnostic_output_format_init_json): Drop unnecessary calls
to disable textual printing of CWEs, rules, and options.
* diagnostic-format-sarif.cc (sarif_builder::end_diagnostic):
Rename to...
(sarif_builder::on_report_diagnostic): ...this and add call to
pp_output_formatted_text.
(sarif_output_format::on_begin_diagnostic): Delete.
(sarif_output_format::on_end_diagnostic): Rename to...
(sarif_output_format::on_report_diagnostic): ...this and update
call to m_builder accordingly.
(diagnostic_output_format_init_sarif): Drop unnecessary calls
to disable textual printing of CWEs, rules, and options.
* diagnostic.cc (diagnostic_context::print_any_cwe): Convert to...
(diagnostic_text_output_format::print_any_cwe): ...this.
(diagnostic_context::print_any_rules): Convert to...
(diagnostic_text_output_format::print_any_rules): ...this.
(diagnostic_context::print_option_information): Convert to...
(diagnostic_text_output_format::print_option_information):
...this.
(diagnostic_context::report_diagnostic): Replace calls to the
output format's on_begin_diagnostic, to pp_output_formatted_text,
printing CWE, rules, option info, and the call to the format's
on_end_diagnostic with a call to the format's
on_report_diagnostic.
(diagnostic_text_output_format::on_begin_diagnostic): Delete.
(diagnostic_text_output_format::on_end_diagnostic): Delete.
(diagnostic_text_output_format::on_report_diagnostic): New vfunc,
which effectively does the on_begin_diagnostic, the call to
pp_output_formatted_text, the calls for printing CWE, rules,
option info, and the call to the diagnostic_finalizer.
* diagnostic.h (diagnostic_output_format::on_begin_diagnostic):
Delete.
(diagnostic_output_format::on_end_diagnostic): Delete.
(diagnostic_output_format::on_report_diagnostic): New.
(diagnostic_text_output_format::on_begin_diagnostic): Delete.
(diagnostic_text_output_format::on_end_diagnostic): Delete.
(diagnostic_text_output_format::on_report_diagnostic): New.
(class diagnostic_context): Add friend class
diagnostic_text_output_format.
(diagnostic_context::get_urlifier): New accessor.
(diagnostic_context::print_any_cwe): Move decl...
(diagnostic_text_output_format::print_any_cwe): ...to here.
(diagnostic_context::print_any_rules): Move decl...
(diagnostic_text_output_format::print_any_rules): ...to here.
(diagnostic_context::print_option_information): Move decl...
(diagnostic_text_output_format::print_option_information): ...to
here.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: add event IDs to multithreaded event plugin test

Add test coverage of "%@" in event messages in a multithreaded
execution path.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c:
Update expected output.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py:
Likewise.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c
(test_diagnostic_path::add_event_2): Return the id of the added
event.
(test_diagnostic_path::add_event_2_with_event_id): New.
(example_4): Add event IDs to the deadlock messages indicating
where the locks where acquired.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite: generalize support for Python tests for SARIF output

In r15-2354-g4d1f71d49e396c I added the ability to use Python to write
tests of SARIF output via a new "run-sarif-pytest" based
on "run-gcov-pytest", with a sarif.py support script in
testsuite/gcc.dg/sarif-output.

This followup patch:
(a) removes the limitation of such tests needing to be in
testsuite/gcc.dg/sarif-output by moving sarif.py to testsuite/lib
and adding logic to add that directory to PYTHONPATH when invoking
pytest.

(b) uses this to replace fragile regexp-based tests in
gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.c with
Python logic that verifies the structure within the generated JSON,
and to add test coverage for SARIF output relating to GCC plugins.

gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add comments noting that we don't
yet capture any diagnostic_metadata::rules associated with a
diagnostic.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-metadata-sarif.c: New test,
based on diagnostic-test-metadata.c.
* gcc.dg/plugin/diagnostic-test-metadata-sarif.py: New script.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.c:
Replace scan-sarif-file directives with run-sarif-pytest, to
run...
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py:
...this new test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
diagnostic-test-metadata-sarif.c.
* gcc.dg/sarif-output/sarif.py: Move to...
* lib/sarif.py: ...here.
* lib/scansarif.exp (run-sarif-pytest): Prepend "lib" to
PYTHONPATH before running python scripts.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: fixes to selftests

Add selftest coverage for %{ and %} in pretty-print.cc

No functional change intended.

gcc/ChangeLog:
* pretty-print.cc (selftest::test_urls): Make static.
(selftest::test_urls_from_braces): New.
(selftest::test_null_urls): Make static.
(selftest::test_urlification): Likewise.
(selftest::pretty_print_cc_tests): Call test_urls_from_braces.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

json.h: fix typo in comment

gcc/ChangeLog:
* json.h: Fix typo in comment about missing INCLUDE_MEMORY.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: Check template parameters in member class template specialization [PR115716]

We currently ICE upon the following invalid code, because we don't check
that the template parameters in a member class template specialization
are correct.

=== cut here ===
template <typename T> struct x {
  template <typename U> struct y {
    typedef T result2;
  };
};
template<> template<typename U, typename> struct x<int>::y {
  typedef double result2;
};
int main() {
  x<int>::y<int>::result2 xxx2;
}
=== cut here ===

This patch fixes the PR by calling redeclare_class_template.

PR c++/115716

gcc/cp/ChangeLog:

* pt.cc (maybe_process_partial_specialization): Call
redeclare_class_template.

gcc/testsuite/ChangeLog:

* g++.dg/template/spec42.C: New test.
* g++.dg/template/spec43.C: New test.

Remove an unneeded include that was added by mistake.

gcc/ChangeLog:

* tree-if-conv.cc: Remove unneeded include from last change.

Fix bootstap-errors due to enabling -gvariable-location-views

This recent change triggered various bootstap-errors, mostly on
x86 targets because line info advance address entries were output
in the wrong section table.
The switch to the wrong line table happened in dwarfout_set_ignored_loc.
It must use the same section as the earlier called
dwarf2out_switch_text_section.

But also ft32-elf was affected, because the assembler choked on
something simple as ".2byte .LM2-.LM1", but fortunately it is
able to use native location views, the configure test was just
not executed because the ft32 "nop" instruction was missing.

gcc/ChangeLog:

PR debug/116470
* configure.ac: Add the "nop" instruction for cpu type ft32.
* configure: Regenerate.
* dwarf2out.cc (dwarf2out_set_ignored_loc): Use the correct
line info section.

libcpp: deduplicate definition of padding size

Tie together the two functions that ensure tail padding with
search_line_ssse3 via CPP_BUFFER_PADDING macro.

libcpp/ChangeLog:

* internal.h (CPP_BUFFER_PADDING): New macro; use it ...
* charset.cc (_cpp_convert_input): ...here, and ...
* files.cc (read_file_guts): ...here, and ...
* lex.cc (search_line_ssse3): here.

tree-optimization/116460 - improve forwprop compile-time

The following improves forwprop block reachability which I noticed
when debugging PR116460 and what is also noted in the comment. It
avoids processing blocks in natural loops determined unreachable,
thereby making the issue in PR116460 latent.

PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): Do not
process blocks in unreachable natural loops.

Delay edge removal in forwprop

SSA forwprop has switch simplification code that calls remove edge
and as side-effect releases dominator info.  For a followup we want
to retain that so the following delays removing edges until the end
of the pass.  As usual we have to deal with parts of the edge
vanishing due to EH/abnormal pruning so record edges as basic-block
index pairs and remove them only when they are still there.

* tree-ssa-forwprop.cc (simplify_gimple_switch_label_vec):
Delay removing edges and releasing dominator info, instead
record into edges_to_remove vector.
(simplify_gimple_switch): Pass through vector of to remove
edges.
(pass_forwprop::execute): Likewise.  Remove queued edges.

vect: Fix STMT_VINFO_DEF_TYPE check for odd/even widen mult [PR116348]

After fixing PR116142 some code started to trigger an ICE with -O3
-march=znver4. Per Richard Biener who actually made this fix:

"supportable_widening_operation fails at transform time - that's likely
because vectorizable_reduction "puns" defs to internal_def"

so the check should use STMT_VINFO_REDUC_DEF instead of checking if
STMT_VINFO_DEF_TYPE is vect_reduction_def.

gcc/ChangeLog:

PR tree-optimization/116348
* tree-vect-stmts.cc (supportable_widening_operation): Use
STMT_VINFO_REDUC_DEF (x) instead of
STMT_VINFO_DEF_TYPE (x) == vect_reduction_def.

gcc/testsuite/ChangeLog:

PR tree-optimization/116348
* gcc.c-torture/compile/pr116438.c: New test.

Co-authored-by: Richard Biener <rguenther@suse.de>

Match: Add int type fits check for .SAT_ADD imm operand

This patch would like to add strict check for imm operand of .SAT_ADD
matching. We have no type checking for imm operand in previous, which
may result in unexpected IL to be catched by .SAT_ADD pattern.

We leverage the int_fits_type_p here to make sure the imm operand is
a int type fits the result type of the .SAT_ADD. For example:

Fits uint8_t:
uint8_t a;
uint8_t sum = .SAT_ADD (a, 12);
uint8_t sum = .SAT_ADD (a, 12u);
uint8_t sum = .SAT_ADD (a, 126u);
uint8_t sum = .SAT_ADD (a, 128u);
uint8_t sum = .SAT_ADD (a, 228);
uint8_t sum = .SAT_ADD (a, 223u);

Not fits uint8_t:
uint8_t a;
uint8_t sum = .SAT_ADD (a, -1);
uint8_t sum = .SAT_ADD (a, 256u);
uint8_t sum = .SAT_ADD (a, 257);

The below test suite are passed for this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add int_fits_type_p check for .SAT_ADD imm operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_add_imm-11.c: Adjust test case for imm.
* gcc.target/riscv/sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_add_imm_type_check-1.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-10.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-11.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-12.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-13.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-14.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-15.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-16.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-17.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-18.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-19.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-2.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-20.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-21.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-22.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-23.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-24.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-25.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-26.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-27.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-28.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-29.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-3.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-30.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-31.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-32.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-33.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-34.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-35.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-36.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-37.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-38.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-39.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-4.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-40.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-41.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-42.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-43.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-44.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-45.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-46.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-47.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-48.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-49.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-5.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-50.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-51.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-52.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-6.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-7.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-8.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

expand: Use the correct mode for store flags for popcount [PR116480]

When expanding popcount used for equal to 1 (or rather __builtin_stdc_has_single_bit),
the wrong mode was bsing used for the mode of the store flags. We were using the mode
of the argument to popcount but since popcount's return value is always int, the mode
of the expansion here should have been the mode of the return type rater than the argument.

Built and tested on aarch64-linux-gnu with no regressions.
Also bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/116480

gcc/ChangeLog:

* internal-fn.cc (expand_POPCOUNT): Use the correct mode
for store flags.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116480-1.c: New test.
* gcc.dg/torture/pr116480-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

i386: Add bf8 -> fp16 intrin

Since BF8 and FP16 have same bits for exponent, the type conversion
between them is just a cast for fraction part. We will use a sequence
of instrctions instead of new instructions to do that. For convenience,
intrins are also provided.

gcc/ChangeLog:

* config/i386/avx10_2-512convertintrin.h
(_mm512_cvtpbf8_ph): New.
(_mm512_mask_cvtpbf8_ph): Ditto.
(_mm512_maskz_cvtpbf8_ph): Ditto.
* config/i386/avx10_2convertintrin.h
(_mm_cvtpbf8_ph): Ditto.
(_mm_mask_cvtpbf8_ph): Ditto.
(_mm_maskz_cvtpbf8_ph): Ditto.
(_mm256_cvtpbf8_ph): Ditto.
(_mm256_mask_cvtpbf8_ph): Ditto.
(_mm256_maskz_cvtpbf8_ph): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-convert-1.c: Add tests for new
intrin.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.

AVX10.2: Support compare instructions

gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_ssecom_setcc): Mention behavior change on flags.
(ix86_expand_sse_comi): Handle AVX10.2 behavior.
(ix86_expand_sse_comi_round): Ditto.
(ix86_expand_round_builtin): Ditto.
(ix86_expand_builtin): Change function call.
* config/i386/i386.md (UNSPEC_COMX): New unspec.
* config/i386/sse.md
(avx10_2_v<unord>comx<ssemodesuffix><round_saeonly_name>): New.
(<sse>_<unord>comi<round_saeonly_name>): Add HFmode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-compare-1.c: New test.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>

AVX10.2: Support vector copy instructions

gcc/ChangeLog:

* config.gcc: Add avx10_2copyintrin.h.
* config/i386/i386.md (avx10_2): New isa attribute.
* config/i386/immintrin.h: Include avx10_2copyintrin.h.
* config/i386/sse.md
(sse_movss_<mode>): Add new constraints to handle AVX10.2.
(vec_set<mode>_0): Ditto.
(@vec_set<mode>_0): Ditto.
(vec_set<mode>_0): Ditto.
(avx512fp16_mov<mode>): Ditto.
(*vec_set<mode>_0_1): New split.
* config/i386/avx10_2copyintrin.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-vmovd-1.c: New test.
* gcc.target/i386/avx10_2-vmovd-2.c: Ditto.
* gcc.target/i386/avx10_2-vmovw-1.c: Ditto.
* gcc.target/i386/avx10_2-vmovw-2.c: Ditto.

AVX10.2: Support minmax instructions

gcc/ChangeLog:

* config.gcc: Add avx10_2-512minmaxintrin.h and
avx10_2minmaxintrin.h.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (V8BF, V8BF, V8BF, INT, V8BF, UQI),
(V16BF, V16BF, V16BF, INT, V16BF, UHI),
(V32BF, V32BF, V32BF, INT, V32BF, USI),
(V8HF, V8HF, V8HF, INT, V8HF, UQI),
(V8DF, V8DF, V8DF, INT, V8DF, UQI, INT),
(V32HF, V32HF, V32HF, INT, V32HF, USI, INT),
(V16HF, V16HF, V16HF, INT, V16HF, UHI, INT),
(V16SF, V16SF, V16SF, INT, V16SF, UHI, INT).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc
(ix86_expand_args_builtin): Handle V8BF_FTYPE_V8BF_V8BF_INT_V8BF_UQI,
V16BF_FTYPE_V16BF_V16BF_INT_V16BF_UHI,
V32BF_FTYPE_V32BF_V32BF_INT_V32BF_USI,
V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI,
(ix86_expand_round_builtin): Handle V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI_INT,
V32HF_FTYPE_V32HF_V32HF_INT_V32HF_USI_INT,
V16HF_FTYPE_V16HF_V16HF_INT_V16HF_UHI_INT.
V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI_INT.
* config/i386/immintrin.h: Include avx10_2-512mixmaxintrin.h and
avx10_2minmaxintrin.h.
* config/i386/sse.md (VFH_AVX10_2): New.
(avx10_2_vminmaxnepbf16_<mode><mask_name>): New define_insn.
(avx10_2_minmaxp<mode><mask_name><round_saeonly_name>): Ditto.
(avx10_2_minmaxs<mode><mask_scalar_name><round_saeonly_scalar_name>): Ditto.
* config/i386/avx10_2-512minmaxintrin.h: New file.
* config/i386/avx10_2minmaxintrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add macros.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx512f-helper.h: Add helper function.
* gcc.target/i386/avx10-minmax-helper.h: New helper file.
* gcc.target/i386/avx10_2-512-minmax-1.c: New test.
* gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10_2-minmax-1.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxsh-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxss-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>

[PATCH 2/2] AVX10.2: Support saturating convert instructions

gcc/ChangeLog:

* config/i386/avx10_2-512satcvtintrin.h: Add new intrin.
* config/i386/avx10_2satcvtintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md (VF1_VF2_AVX10_2): New iterator.
(VF2_AVX10_2): Ditto.
(VI8_AVX10_2): Ditto.
(sat_cvt_sign_prefix): Add new UNSPEC.
(UNSPEC_SAT_CVT_DS_SIGN_ITER): New iterator.
(pd2dqssuff): Ditto.
(avx10_2_vcvtt<castmode>2<sat_cvt_sign_prefix>dqs<mode><mask_name><round_saeonly_name>):
New.
(avx10_2_vcvttpd2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_vcvttps2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_vcvttsd2<sat_cvt_sign_prefix>sis<mode><round_saeonly_name>):
Ditto.
(avx10_2_vcvttss2<sat_cvt_sign_prefix>sis<mode><round_saeonly_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add macros.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-satcvt-1.c: Add test.
* gcc.target/i386/avx10_2-512-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c: New test.
* gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttsd2sis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttsd2usis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttss2sis-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttss2usis-2.c: Ditto.

[PATCH 1/2] AVX10.2: Support saturating convert instructions

gcc/ChangeLog:

* config.gcc: Add avx10_2satcvtintrin.h and
avx10_2-512satcvtintrin.h.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI),
(V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI),
(V16SI, V16SF, V16SI, UHI, INT), (V16HI, V16BF, V16HI, UHI, INT),
(V32HI, V32BF, V32HI, USI, INT).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
V32HI_FTYPE_V32BF_V32HI_USI, V16HI_FTYPE_V16BF_V16HI_UHI,
V8HI_FTYPE_V8BF_V8HI_UQI.
(ix86_expand_round_builtin): Handle V32HI_FTYPE_V32BF_V32HI_USI_INT,
V16SI_FTYPE_V16SF_V16SI_UHI_INT, V16HI_FTYPE_V16BF_V16HI_UHI_INT.
* config/i386/immintrin.h: Include avx10_2satcvtintrin.h and
avx10_2-512savcvtintrin.h.
* config/i386/sse.md:
(UNSPEC_CVTNE_BF16_IBS_ITER): New iterator.
(sat_cvt_sign_prefix): Ditto.
(sat_cvt_trunc_prefix): Ditto.
(UNSPEC_CVT_PH_IBS_ITER): Ditto.
(UNSPEC_CVTT_PH_IBS_ITER): Ditto.
(UNSPEC_CVT_PS_IBS_ITER): Ditto.
(UNSPEC_CVTT_PS_IBS_ITER): Ditto.
(avx10_2_cvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
New define_insn.
(avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Ditto.
(avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Ditto.
(avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
* config/i386/avx10_2-512satcvtintrin.h: New file.
* config/i386/avx10_2satcvtintrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add macros.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx512f-helper.h: Add new test macro.
* gcc.target/i386/m512-check.h: Add new type.
* gcc.target/i386/avx10_2-512-satcvt-1.c: New test.
* gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto.

[PATCH 2/2] AVX10.2: Support BF16 instructions

gcc/ChangeLog:

* config/i386/avx10_2-512bf16intrin.h: Add new intrinsics.
* config/i386/avx10_2bf16intrin.h: Diito.
* config/i386/i386-builtin-types.def : Add new DEF_FUNCTION_TYPE
for new type.
* config/i386/i386-builtin.def (BDESC): Add new buildin.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle new type.
* config/i386/sse.md (vecmemsuffix): Add vector BF mode.
(avx10_2_rsqrtpbf16_<mode><mask_name>): New define_insn.
(avx10_2_sqrtnepbf16_<mode><mask_name>): Ditto.
(avx10_2_rcppbf16_<mode><mask_name>): Ditto.
(avx10_2_getexppbf16_<mode><mask_name>): Ditto.
(BF16IMMOP): New iterator.
(bf16immop): Ditto.
(avx10_2_<bf16immop>pbf16_<mode><mask_name>): New define_insn.
(avx10_2_fpclasspbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_cmppbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_comsbf16_v8bf): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10-check.h: Add AVX10_SCALAR.
* gcc.target/i386/avx10-helper.h: Add helper functions.
* gcc.target/i386/avx10_2-512-bf16-1.c: Add new tests.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx-1.c: Add macros.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: New test.
* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcmppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrcppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Ditto.

Co-authored-by: Levy Hsu <admin@levyhsu.com>

[PATCH 1/2] AVX10.2: Support BF16 instructions

gcc/ChangeLog:

* config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h.
* config/i386/i386-builtin-types.def : Add new
DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF,
V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF,
V8BF_FTYPE_V8BF_V8BF_UQI, V16BF_FTYPE_V16BF_V16BF_UHI,
V32BF_FTYPE_V32BF_V32BF_USI, V32BF_FTYPE_V32BF_V32BF_V32BF_USI,
V8BF_FTYPE_V8BF_V8BF_V8BF_UQI and V16BF_FTYPE_V16BF_V16BF_V16BF_UHI.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle new DEF_FUNCTION_TYPE.
* config/i386/immintrin.h: Include avx10_2-512bf16intrin.h and
avx10_2bf16intrin.h.
* config/i386/sse.md
(VBF_AVX10_2): New iterator.
(avx10_2_scalefpbf16_<mode><mask_name>): New define_insn.
(avx10_2_<code>nepbf16_<mode><mask_name>): Ditto.
(avx10_2_<insn>nepbf16_<mode><mask_name>): Ditto.
(avx10_2_fmaddnepbf16_<mode>_maskz): New expander.
(avx10_2_fnmaddnepbf16_<mode>_maskz): Ditto.
(avx10_2_fmsubnepbf16_<mode>_maskz): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_maskz): Ditto.
(avx10_2_fmaddnepbf16_<mode><sd_maskz_name>): New define_insn.
(avx10_2_fmaddnepbf16_<mode>_mask): Ditto.
(avx10_2_fmaddnepbf16_<mode>_mask3): Ditto.
(avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmaddnepbf16_<mode>_mask): Ditto.
(avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto.
(avx10_2_fmsubnepbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fmsubnepbf16_<mode>_mask): Ditto.
(avx10_2_fmsubnepbf16_<mode>_mask3): Ditto.
(avx10_2_fnmsubnepbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_mask): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto.
* config/i386/avx10_2-512bf16intrin.h: New file.
* config/i386/avx10_2bf16intrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-helper.h: Add MAKE_MASK_MERGE and MAKE_MASK_ZERO
for bf16_uw.
* gcc.target/i386/m512-check.h: Add union512bf16_uw, union256bf16_uw,
union128bf16_uw and CHECK_EXP for them.
* gcc.target/i386/avx10-helper.h: New file.
* gcc.target/i386/avx10_2-512-bf16-1.c: New test.
* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Ditto.

Co-authored-by: Levy Hsu <admin@levyhsu.com>

AVX10.2: Support convert instructions

gcc/ChangeLog:

* config.gcc: Add avx10_2-512convertintrin.h and
avx10_2convertintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_POINTER_TYPE
and DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle AVX10.2.
(ix86_expand_round_builtin): Ditto.
* config/i386/immintrin.h: Include avx10_2-512convertintrin.h,
avx10_2convertintrin.h.
* config/i386/sse.md (VHF_AVX10_2): New iterator.
(bf16_ph): Add 512 bit mode.
(avx10_2_cvt2ps2phx_<mode><mask_name<round_name>): New define_insn.
(ssebvecmode): New iterator.
(UNSPEC_NECONVERTFP8_PACK): Ditto.
(neconvertfp8_pack): Ditto.
(vcvt<neconvertfp8_pack><mode><mask_name>): New define_insn.
(ssebvecmode_2): New iterator.
(UNSPEC_VCVTBIASPH2FP8_PACK): Ditto.
(biasph2fp8_pack): Ditto.
(vcvt<biasph2fp8_pack>v8hf): New expander.
(vcvt<biasph2fp8_pack>v8hf_mask): Ditto.
(*vcvt<biasph2bf8_pack>v8hf): New define_insn.
(*vcvt<biasph2fp8_pack>v8hf_mask): Ditto.
(VHF_AVX10_2_2): New iterator.
(vcvt<biasph2fp8_pack><mode><mask_name>): New define_insn.
(VHF_256_512): New iterator.
(ph2fp8suff): Ditto.
(UNSPEC_NECONVERTPH2FP8_PACK): Ditto.
(neconvertph2fp8): Ditto.
(vcvt<neconvertph2fp8>v8hf_mask): New expander.
(*vcvt<neconvertph2fp8>v8hf): New define_insn.
(*vcvt<neconvertph2fp8>v8hf_mask): Ditto.
(vcvt<neconvertph2fp8><mode><mask_name>): Ditto.
(vcvthf82ph<mode><mask_name>): Ditto.
* config/i386/avx10_2-512convertintrin.h: New file.
* config/i386/avx10_2convertintrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add macros for const.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-convert-1.c: New test.
* gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Ditto.
* gcc.target/i386/fp8-helper.h: New helper file.

Co-authored-by: Levy Hsu <admin@levyhsu.com>
Co-authored-by: Kong Lingling <lingling.kong@intel.com>

[PATCH 2/2] AVX10.2: Support media instructions

gcc/ChangeLog:

* config/i386/avx10_2-512mediaintrin.h: Add new intrins.
* config/i386/avx10_2mediaintrin.h: Ditto.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.cc (def_builtin): Handle shared
builtins between AVXVNNIINT16 and AVX10.2.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
* config/i386/sse.md (unspec): Add UNSPEC_VDPPHPS.
(avx10_2_mpsadbw<mask_name>): New define_insn.
(<mask_codefor><sse4_1_avx2>_mpsadbw<mask_name>): Ditto.
(vpdp<vpdpwprodtype>_<mode>): Add AVX10_2_256.
(vpdp<vpdpwprodtype>_v16si): New defin_insn.
(vpdp<vpdpwprodtype>_<mode>_mask): Ditto.
(*vpdp<vpdpwprodtype>_<mode>_maskz): Ditto.
(vpdp<vpdpwprodtype>_<mode>_maskz): New expander.
(vdpphps_<mode>): New define_insn.
(vdpphps_<mode>_mask): Ditto.
(*vdpphps_<mode>_maskz): Ditto.
(vdpphps_<mode>_maskz): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avxvnniint16-1.c: Add new macro test.
* gcc.target/i386/avx-1.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-media-1.c: Add test.
* gcc.target/i386/avx10_2-media-1.c: Ditto.
* gcc.target/i386/avxvnniint16-builtin.c: New test.
* gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>

[PATCH 1/2] AVX10.2: Support media instructions

gcc/ChangeLog

* config.gcc: Add avx10_2mediaintrin.h and
avx10_2-512mediaintrin.h.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.cc (def_builtin): Handle shared
builtins between AVXVNNIINT8 and AVX10.2.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
* config/i386/immintrin.h: Include avx10_2mediaintrin.h and
avx10_2-512mediaintrin.h
* config/i386/sse.md: (VI4_AVX10_2): New.
(vpdp<vpdotprodtype>_<mode>): Add AVX10_2_256.
(vpdp<vpdotprodtype>_v16si): New define_insn.
(vpdp<vpdotprodtype>_<mode>_mask): Ditto.
(*vpdp<vpdotprodtype>_<mode>_maskz): Ditto.
(vpdp<vpdotprodtype>_<mode>_maskz): New expander.
* config/i386/avx10_2-512mediaintrin.h: New file.
* config/i386/avx10_2mediaintrin.h: Ditto.

gcc/testsuite/ChangeLog

* gcc.target/i386/avx512f-helper.h: Reuse AVX512F macros
for AVX10.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* lib/target-supports.exp
(check_effective_target_avx10_2): New.
(check_effective_target_avx10_2_512): Ditto.
* gcc.target/i386/avx10-check.h: New test file.
* gcc.target/i386/avx10-helper.h: Ditto.
* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
* gcc.target/i386/avx10_2-512-media-1.c: Ditto.
* gcc.target/i386/avx10_2-media-1.c: Ditto..
* gcc.target/i386/avxvnniint8-builtin.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>

i386: Refactor m512-check.h

After AVX10 introduction, we still want to use AVX512 helper functions
to avoid duplicate code. In order to reuse them, we need to do some refactor
to make sure each function define happen under correct ISA to avoid ABI
warnings.

gcc/testsuite/ChangeLog:

* gcc.target/i386/m512-check.h: Wrap the function define with
correct vector size.

RISC-V: Support IMM for operand 0 of ussub pattern

This patch would like to allow IMM for the operand 0 of ussub pattern.
Aka .SAT_SUB(1023, y) as the below example.

Form 1:
  #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
  T __attribute__((noinline))             \
  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
  {                                       \
    return (T)IMM >= y ? (T)IMM - y : 0;  \
  }

DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)

Before this patch:
  10   │ sat_u_sub_imm82_uint64_t_fmt_1:
  11   │     li  a5,82
  12   │     bgtu    a0,a5,.L3
  13   │     sub a0,a5,a0
  14   │     ret
  15   │ .L3:
  16   │     li  a0,0
  17   │     ret

After this patch:
  10   │ sat_u_sub_imm82_uint64_t_fmt_1:
  11   │     li  a5,82
  12   │     sltu    a4,a5,a0
  13   │     addi    a4,a4,-1
  14   │     sub a0,a5,a0
  15   │     and a0,a4,a0
  16   │     ret

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
func impl to gen xmode rtx reg from operand rtx.
(riscv_expand_ussub): Gen xmode reg for operand 1.
* config/riscv/riscv.md: Allow const_int for operand 1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned vector .SAT_TRUNC form 4

This patch would like to add test cases for the unsigned vector
.SAT_TRUNC form 4.  Aka:

Form 4:
  #define DEF_VEC_SAT_U_TRUNC_FMT_4(NT, WT)                             \
  void __attribute__((noinline))                                        \
  vec_sat_u_trunc_##NT##_##WT##_fmt_4 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        bool not_overflow = in[i] <= (WT)(NT)(-1);                      \
        out[i] = ((NT)in[i]) | (NT)((NT)not_overflow - 1);              \
      }                                                                 \
  }

DEF_VEC_SAT_U_TRUNC_FMT_4 (uint32_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-19.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-20.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-21.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-22.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-23.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-24.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 4

This patch would like to add test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 4.  Aka:

Form 4:
  #define DEF_SAT_U_TRUNC_FMT_4(NT, WT)          \
  NT __attribute__((noinline))                   \
  sat_u_trunc_##WT##_to_##NT##_fmt_4 (WT x)      \
  {                                              \
    bool not_overflow = x <= (WT)(NT)(-1);       \
    return ((NT)x) | (NT)((NT)not_overflow - 1); \
  }

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-19.c: New test.
* gcc.target/riscv/sat_u_trunc-20.c: New test.
* gcc.target/riscv/sat_u_trunc-21.c: New test.
* gcc.target/riscv/sat_u_trunc-22.c: New test.
* gcc.target/riscv/sat_u_trunc-23.c: New test.
* gcc.target/riscv/sat_u_trunc-24.c: New test.
* gcc.target/riscv/sat_u_trunc-run-19.c: New test.
* gcc.target/riscv/sat_u_trunc-run-20.c: New test.
* gcc.target/riscv/sat_u_trunc-run-21.c: New test.
* gcc.target/riscv/sat_u_trunc-run-22.c: New test.
* gcc.target/riscv/sat_u_trunc-run-23.c: New test.
* gcc.target/riscv/sat_u_trunc-run-24.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

RISC-V: Fix double mode under RV32 not utilize vf

Currently, some binops of vector vs double scalar under RV32 can't
translated to vf but vfmv+vxx.vv.

The cause is that vec_duplicate is also expanded to broadcast for double mode
under RV32. last-combine can't process expanded broadcast.

gcc/ChangeLog:

* config/riscv/vector.md: Add !FLOAT_MODE_P constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto.

[PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move.

The previous patch:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd
aimed to eliminate redundant MOV instructions by removing calling
emit_clobber in lower-subreg.cc's resolve_simple_move.

First, I found that another patch address this issue:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bdf2737cda53a83332db1a1a021653447b05a7e7
and even without removing calling emit_clobber,
the instruction generation is still as expected.

Second, removing the CLOBBER expression will have side effects.
When there is no CLOBBER expression and only SUBREG assignments exist,
according to the logic of the 'df_lr_bb_local_compute' function,
the register will be added to the basic block LR IN set.
This will cause the register's lifetime to span the entire function,
resulting in increased register pressure. Taking the newly added test case
'gcc/testsuite/gcc.target/riscv/pr43644.c' as an example,
removing the CLOBBER expression will lead to spill in some registers.

gcc/:
* lower-subreg.cc (resolve_simple_move): Re-add calling emit_clobber
immediately before moving a multi-word register by parts.

gcc/testsuite/:
* gcc.target/riscv/pr43644.c: New test case.

testsuite: Run array54.C only for sync_int_long targets

The test case uses "atomic<int>", which fails to link on
pru-unknown-elf target due to missing __atomic_load_4 symbol.

Fix by filtering for sync_int_long effective target. Ensured that the
test still passes for x86_64-pc-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/init/array54.C: Require sync_int_long effective target.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

Support if conversion for switches

The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.

Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that only have a single non default block,
They are handled similar to COND in if conversion.

This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.

PR tree-optimization/115866

gcc/ChangeLog:

* tree-if-conv.cc (if_convertible_switch_p): New function.
(if_convertible_stmt_p): Check for switch.
(get_loop_body_in_if_conv_order): Handle switch.
(predicate_bbs): Likewise.
(predicate_statements): Likewise.
(remove_conditions_and_labels): Likewise.
(ifcvt_split_critical_edges): Likewise.
(ifcvt_local_dce): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.

Write CodeView information about static locals in optimized code

Write CodeView S_LDATA32 symbols for static locals in optimized code. We have
to handle these separately, as they come after the S_FRAMEPROC, plus you can't
have S_BLOCK32 symbols like you can in unoptimized code.

gcc/
* dwarf2codeview.cc (write_optimized_static_local_vars): New function.
(write_function): Call write_optimized_static_local_vars.

Write CodeView S_FRAMEPROC symbols

Write S_FRAMEPROC symbols, which aren't very useful but seem to be necessary
for Microsoft debuggers to function properly. These symbols come after S_LOCAL
symbols for optimized variables, but before S_REGISTER and S_REGREL32 for
unoptimized variables.

gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_FRAMEPROC.
(write_s_frameproc): New function.
(write_function): Call write_s_frameproc.

Write CodeView information about optimized stack variables

Outputs S_DEFRANGE_REGISTER_REL symbols for optimized local variables that are
on the stack, consisting of the stack register, the offset, and the code range
for which this applies.

gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_DEFRANGE_REGISTER_REL.
(write_defrange_register_rel): New function.
(write_optimized_local_variable_loc): Add fbloc param, and call
write_defrange_register_rel.
(write_optimized_local_variable): Add fbloc param.
(write_optimized_function_vars): Add fbloc param.

Write CodeView information about enregistered optimized variables

Enable variable tracking when outputting CodeView debug information, and make
it so that we issue debug symbols for optimized variables in registers. This
consists of S_LOCAL symbols, which give the name and the type of local
variables, followed by S_DEFRANGE_REGISTER symbols for the register and the
code for which this applies.

gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_LOCAL and
S_DEFRANGE_REGISTER.
(write_s_local): New function.
(write_defrange_register): New function.
(write_optimized_local_variable_loc): New function.
(write_optimized_local_variable): New function.
(write_optimized_function_vars): New function.
(write_function): Call write_optimized_function_vars if variable
tracking enabled.
* dwarf2out.cc (typedef var_loc_view): Move to dwarf2out.h.
(struct dw_loc_list_struct): Likewise.
* dwarf2out.h (typedef var_loc_view): Move from dwarf2out.h.
(struct dw_loc_list_struct): Likewise.
* opts.cc (finish_options): Enable variable tracking for CodeView.

i386: Update STV's gains for TImode arithmetic right shifts on AVX2.

This patch tweaks timode_scalar_chain::compute_convert_gain to better
reflect the expansion of V1TImode arithmetic right shifts by the i386
backend.  The comment "see ix86_expand_v1ti_ashiftrt" appears after
"case ASHIFTRT" in compute_convert_gain, and the changes below attempt
to better match the logic used there.

The original motivating example is:

__int128 m1;
void foo()
{
  m1 = (m1 << 8) >> 8;
}

which with -O2 -mavx2 we fail to convert to vector form due to the
inappropriate cost of the arithmetic right shift.

  Instruction gain -16 for     7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
  Total gain: -3
  Chain #1 conversion is not profitable

This is reporting that the ASHIFTRT is four instructions worse using
vectors than in scalar form, which is incorrect as the AVX2 expansion
of this shift only requires three instructions (and the scalar form
requires two).

With more accurate costs in timode_scalar_chain::compute_convert_gain
we now see (with -O2 -mavx2):

  Instruction gain -4 for     7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
  Total gain: 9
  Converting chain #1...

which results in:

foo: vmovdqa m1(%rip), %xmm0
        vpslldq $1, %xmm0, %xmm0
        vpsrad  $8, %xmm0, %xmm1
        vpsrldq $1, %xmm0, %xmm0
        vpblendd        $7, %xmm0, %xmm1, %xmm0
        vmovdqa %xmm0, m1(%rip)
        ret

2024-08-25  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain)
<case ASHIFTRT>: Update to match ix86_expand_v1ti_ashiftrt.

Disable late-combine in another RISC-V test

Another test where the output was slightly twiddled by late-combine in which
simply disabling late-combine seems to be the best option.

> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ...
> FAIL: gcc.target/riscv/cm_mv_rv32.c -Os check-function-bodies sum

Pushing to the trunk.

gcc/testsuite
* gcc.target/riscv/cm_mv_rv32.c: Disable late-combine.

[committed] Fix assembly scan for RISC-V VLS tests

Surya's IRA patch from June slightly improves the code we generate for the
vls/calling-conventions tests on RISC-V.  Specifically it removes an
unnecessary move from the instruction stream.  This (of course) broke those
tests:

> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp ...
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3

This patch does the natural adjustment of those tests by dropping the moves
from the scan.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Update
expected output.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: Likewise.

Turn off late-combine for a few risc-v specific tests

Just minor testsuite adjustments -- several of the shorten-memref tests are
slightly twiddled by the late-combine pass:

> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ...
> FAIL: gcc.target/riscv/shorten-memrefs-2.c   -Os   scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi
> XPASS: gcc.target/riscv/shorten-memrefs-3.c   -Os   scan-assembler-not load2a:\n.*addi[ \t]*[at][0-9],[at][0-9],[0-9]*
> FAIL: gcc.target/riscv/shorten-memrefs-5.c   -Os   scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi
> FAIL: gcc.target/riscv/shorten-memrefs-8.c   -Os   scan-assembler store:\n(\t?\\.[^\n]*\n)*\taddi\ta[0-7],a[0-7],1
This patch just turns off the late-combine pass for those tests.  Locally I'd
adjusted all the shorten-memref patches, but a quick re-rest shows that only 4
tests seem affected right now.

Anyway, pushing to the trunk to slightly clean up our test results.

gcc/testsuite
* gcc.target/riscv/shorten-memrefs-2.c: Turn off late-combine.
* gcc.target/riscv/shorten-memrefs-3.c: Likewise.
* gcc.target/riscv/shorten-memrefs-5.c: Likewise.
* gcc.target/riscv/shorten-memrefs-8.c: Likewise.

modula2 testsuite: new libc unit test

This patch provides a simple unit test for snprintf and atof against
the libc definition module.

gcc/testsuite/ChangeLog:

* gm2/calling-c/libc/run/pass/calling-c-libc-run-pass.exp: New test.
* gm2/calling-c/libc/run/pass/testlibcstr.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>