git.ipfire.org Git - thirdparty/gcc.git/log

Add overflow API for plus minus mult on range

Previous reviews suggested that adding overflow APIs to range-op would be
useful. These APIs help check whether overflow can happen when operating
on two ranges, for example with plus, minus, and mult.
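
As a rough illustration of what an "overflow free" check on two ranges means, here is a minimal sketch for 32-bit signed intervals. The type range32 and the three function names are hypothetical and are not GCC's irange or range_operator API; the sketch only shows the interval arithmetic idea, using a 64-bit intermediate to test the extreme results.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed interval [lo, hi] of 32-bit signed values.
// Illustration only; this is not GCC's irange.
struct range32 {
  std::int32_t lo;
  std::int32_t hi;
};

// True if a 64-bit intermediate result is representable in 32 bits.
static bool fits32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min()
         && v <= std::numeric_limits<std::int32_t>::max();
}

// a + b cannot overflow if both extreme sums fit.
bool plus_overflow_free(range32 a, range32 b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  return fits32(std::int64_t{a.lo} + b.lo)
         && fits32(std::int64_t{a.hi} + b.hi);
}

// a - b: the extremes are a.lo - b.hi and a.hi - b.lo.
bool minus_overflow_free(range32 a, range32 b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  return fits32(std::int64_t{a.lo} - b.hi)
         && fits32(std::int64_t{a.hi} - b.lo);
}

// a * b: the extremes are among the four corner products.
bool mult_overflow_free(range32 a, range32 b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  const std::int64_t corners[] = {
    std::int64_t{a.lo} * b.lo, std::int64_t{a.lo} * b.hi,
    std::int64_t{a.hi} * b.lo, std::int64_t{a.hi} * b.hi,
  };
  for (std::int64_t c : corners)
    if (!fits32(c))
      return false;
  return true;
}
```

A caller would ask, for example, whether plus_overflow_free ({0, 10}, {0, 10}) holds before rewriting an expression in a way that is only valid without wrap-around.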

Previous discussions are here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html

gcc/ChangeLog:

* range-op-mixed.h (operator_plus::overflow_free_p): New declare.
(operator_minus::overflow_free_p): New declare.
(operator_mult::overflow_free_p): New declare.
* range-op.cc (range_op_handler::overflow_free_p): New function.
(range_operator::overflow_free_p): New default function.
(operator_plus::overflow_free_p): New function.
(operator_minus::overflow_free_p): New function.
(operator_mult::overflow_free_p): New function.
* range-op.h (range_op_handler::overflow_free_p): New declare.
(range_operator::overflow_free_p): New declare.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

Daily bump.

analyzer: implement reference count checking for CPython plugin [PR107646]

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but
ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

This is a WIP in several ways:
- Currently, functions returning PyObject * are assumed to always
  produce a new reference.
- The validation of reference count is only for PyObjects created within
  a function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* engine.cc (impl_region_model_context::warn): New optional
parameter.
* exploded-graph.h (class impl_region_model_context): Likewise.
* region-model.cc (region_model::pop_frame): New callback
feature for region_model::pop_frame.
* region-model.h (struct append_regions_cb_data): Likewise.
(class region_model): Likewise.
(class region_model_context): New optional parameter.
(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference
count checking for PyObjects.
* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here
(and added more tests).
* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-no-Python-h.c: ...here (and
added more tests).
* gcc.dg/plugin/plugin.exp: New tests.
* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>

Analyzer: include algorithm header

gcc/analyzer/ChangeLog:

* region-model.cc: Define INCLUDE_ALGORITHM.

pru: Add cstore expansion patterns

Add cstore patterns for the two specific operations which can be
efficiently expanded using the UMIN instruction:
X != 0
X == 0
The rest of the operations are rejected, and left to be expanded
by the common expansion code.
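
The arithmetic behind the two supported comparisons can be sketched in scalar C++ as follows. The function names are hypothetical, and the way "X == 0" is derived from the minimum is an assumption for illustration; the commit message only states that both operations can be expanded efficiently with UMIN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// X != 0 as an unsigned minimum: min(X, 1) is 0 for X == 0 and 1 otherwise.
std::uint32_t cstore_ne0(std::uint32_t x) {
  std::uint32_t r = std::min<std::uint32_t>(x, 1u);
  assert(r <= 1u);  // the result is always a 0/1 flag
  return r;
}

// X == 0 derived from the same minimum by flipping the low bit.
// Assumption: the commit does not spell out how the PRU pattern forms this.
std::uint32_t cstore_eq0(std::uint32_t x) {
  return cstore_ne0(x) ^ 1u;
}
```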

PR target/106562

gcc/ChangeLog:

* config/pru/predicates.md (const_0_operand): New predicate.
(pru_cstore_comparison_operator): Ditto.
* config/pru/pru.md (cstore<mode>4): New pattern.
(cstoredi4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/pru/pr106562-10.c: New test.
* gcc.target/pru/pr106562-11.c: New test.
* gcc.target/pru/pr106562-5.c: New test.
* gcc.target/pru/pr106562-6.c: New test.
* gcc.target/pru/pr106562-7.c: New test.
* gcc.target/pru/pr106562-8.c: New test.
* gcc.target/pru/pr106562-9.c: New test.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

c++: CWG 2359, wrong copy-init with designated init [PR91319]

This CWG issue clarifies that designated initializers support direct-initialization.
Just be careful what Note 2 in [dcl.init.aggr]/4.2 says: "If the
initialization is by designated-initializer-clause, its form determines
whether copy-initialization or direct-initialization is performed." Hence
this patch sets CONSTRUCTOR_IS_DIRECT_INIT only when we are dealing with
".x{}", but not ".x = {}".

PR c++/91319

gcc/cp/ChangeLog:

* parser.cc (cp_parser_initializer_list): Set CONSTRUCTOR_IS_DIRECT_INIT
when the designated initializer is of the .x{} form.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig30.C: New test.

c++: disallow constinit on functions [PR111173]

[dcl.constinit]/1: The constinit specifier shall be applied only to a declaration
of a variable with static or thread storage duration.

and while we detect

  constinit int fn();

we weren't detecting

  using F = int();
  constinit F f;

PR c++/111173

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Disallow constinit on functions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit19.C: New test.

tree-optimization/111228 - fix testcase

* gcc.dg/tree-ssa/forwprop-42.c: Use __UINT64_TYPE__ instead
of unsigned long.

test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

Like ARM SVE, add RVV variable length xfail.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-7.c: Add RVV.

test: Adapt slp-26.c check for RVV

Fix FAILs:
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

These fail because RVV, like amdgcn, is able to vectorize the test with VLS modes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-26.c: Adapt for RVV.

fortran: Restore interface to its previous state on error [PR48776]

Keep memory of the content of the current interface body being parsed
and restore it to its previous state if it has been modified at the time
a parse attempt fails.

This fixes memory errors and random segmentation faults caused by
dangling symbol pointers kept in interfaces' linked lists of symbols.
If a parsing attempt fails and symbols are freed, they should also be
removed from the current interface linked list.

As the list of symbols is a linked list, and parsing only adds new
symbols to the head of the list, all that is needed to track the
previous content of the list is a pointer to its previous head.
This adds such a pointer, and the restoration of the list of symbols
to that pointer on error.
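
The save-and-restore idea can be sketched with a plain singly linked list. The names node, push_front and drop_elements_before are hypothetical stand-ins chosen for this illustration and do not reproduce the gfortran data structures; the sketch only assumes, as described above, that new elements are always added at the head.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list element; stands in for an interface entry.
struct node {
  int id;
  node *next;
};

// New elements are only ever added at the head of the list.
void push_front(node *&head, int id) {
  head = new node{id, head};
}

// Frees every element in front of `stop` and makes `stop` the head again.
// `stop` must be a former head of this list (or nullptr to free everything).
void drop_elements_before(node *&head, node *stop) {
  while (head != nullptr && head != stop) {
    node *dead = head;
    head = head->next;
    delete dead;
  }
  assert(head == stop);
}

// Number of elements reachable from head.
std::size_t length(const node *head) {
  std::size_t n = 0;
  for (; head != nullptr; head = head->next)
    ++n;
  return n;
}
```

Remembering the head before a parse attempt and calling drop_elements_before with it on failure leaves the list exactly as it was, with no element pointing at freed data.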

PR fortran/48776

gcc/fortran/ChangeLog:

* gfortran.h (gfc_drop_interface_elements_before): New prototype.
(gfc_current_interface_head): Return a reference to the pointer.
* interface.cc (gfc_current_interface_head): Ditto.
(free_interface_elements_until): New function, generalizing
gfc_free_interface.
(gfc_free_interface): Use free_interface_elements_until.
(gfc_drop_interface_elements_before): New function.
* parse.cc
(current_interface_ptr, previous_interface_head): New static variables.
(current_interface_valid_p, get_current_interface_ptr): New functions.
(decode_statement): Initialize previous_interface_head.
(reject_statement): Restore current interface pointer to point to
previous_interface_head.

gcc/testsuite/ChangeLog:

* gfortran.dg/interface_procedure_1.f90: New test.

tree-optimization/111228 - combine two VEC_PERM_EXPRs

The following adds simplification of two VEC_PERM_EXPRs where
the later one replaces all elements from either the first or the
second input of the earlier permute. This allows a three input
permute to be simplified to a two input one.
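
A scalar model may help to see when the fold is possible. The sketch below uses fixed four-lane arrays; vec_perm and combine_perms are hypothetical helper names for this illustration and are not the match.pd pattern itself. Selector values 0..N-1 pick from the first input and N..2N-1 from the second.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;
using vec = std::array<int, N>;
using sel = std::array<unsigned, N>;

// Scalar model of a two-input permute.
vec vec_perm(const vec &a, const vec &b, const sel &s) {
  vec r{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(s[i] < 2 * N);
    r[i] = s[i] < N ? a[s[i]] : b[s[i] - N];
  }
  return r;
}

// Tries to fold vec_perm (vec_perm (x, y, s1), z, s2) into a single permute
// of (x or y) and z.  This only works if every lane the outer permute keeps
// from the inner result originates from the same inner input.
bool combine_perms(const sel &s1, const sel &s2, bool &uses_y, sel &out) {
  bool from_x = false, from_y = false;
  for (std::size_t i = 0; i < N; ++i) {
    if (s2[i] < N) {
      unsigned inner = s1[s2[i]];   // where the kept lane came from
      if (inner < N)
        from_x = true;
      else
        from_y = true;
      out[i] = inner % N;           // index into the surviving inner input
    } else {
      out[i] = s2[i];               // lane taken from z, unchanged
    }
  }
  if (from_x && from_y)
    return false;                   // still needs three inputs
  uses_y = from_y;
  return true;
}
```

When combine_perms succeeds, vec_perm (uses_y ? y : x, z, out) gives the same lanes as the original pair of permutes.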

I'm following the existing two input simplification case and only
allow non-VLA permutes. The now existing three cases and the
single case in tree-ssa-forwprop.cc somehow ask for merging,
I'm not doing this as part of this change though.

PR tree-optimization/111228
* match.pd ((vec_perm (vec_perm ..) @5 ..) -> (vec_perm @x @5 ..)):
New simplifications.

* gcc.dg/tree-ssa/forwprop-42.c: New testcase.

RISC-V: Remove movmisalign pattern for VLA modes

This patch fixes the following failures in the "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test

Spike report:
z  0000000000000000 ra 00000000000100f4 sp 0000003ffffffb30 gp 0000000000012cc8
tp 0000000000000000 t0 00000000000102d4 t1 000000000000000f t2 0000000000000000
s0 0000000000000000 s1 0000000000000000 a0 00000000000101a6 a1 0000000000000008
a2 0000000000000010 a3 0000000000012401 a4 0000000000012480 a5 0000000000000020
a6 000000000000001f a7 00000000000000d6 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 00000000000101ec va/inst 000000000206dc07 sr 8000000200006620
Load access fault!

(spike)
core   0: 0x0000000000010204 (0x02065087) vle16.v v1, (a2)
core   0: exception trap_load_address_misaligned, epc 0x0000000000010204
core   0:           tval 0x0000000000012c81
(spike) reg 0 a2
0x0000000000012c81

According to the RVV ISA, we can't use "vle16.v" if the address is only byte-aligned.
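
The faulting address from the Spike log can be checked against the element size with a one-line helper. This is only a sketch; naturally_aligned is a hypothetical name, and the element size of 2 bytes is taken from the "16" in vle16.v.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of the element size in bytes.
bool naturally_aligned(std::uint64_t addr, unsigned elem_bytes) {
  assert(elem_bytes != 0);
  return addr % elem_bytes == 0;
}
```

With the reported tval, naturally_aligned (0x12c81, 2) is false, which matches the misaligned-load trap, while the same address is fine for byte-sized elements.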

Such issue is caused by this GIMPLE IR:

vect__1.15_17 = .MASK_LEN_LOAD (vectp_t.13_15, 8B, { -1, ... }, _24, 0);

For partial vectorization, the "8B" byte alignment here is incorrect.

After this patch, the loop is no longer vectorized:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

I will re-enable auto-vectorization using a different approach in a follow-up patch.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign<mode>): Delete.

test: Fix XPASS of RVV

XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1

Like ARM SVE, Fix these XPASS for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
* gcc.dg/vect/vect-outer-4e.c: Ditto.
* gcc.dg/vect/vect-outer-4f.c: Ditto.
* gcc.dg/vect/vect-outer-4g.c: Ditto.
* gcc.dg/vect/vect-outer-4k.c: Ditto.
* gcc.dg/vect/vect-outer-4l.c: Ditto.

test: Add xfail for riscv_vector

As with ARM SVE, when scalable vectorization is enabled for RVV,
we can't constant-fold these yet, on either ARM SVE or RVV.

Ok for trunk ?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr88598-1.c: Add riscv_vector.
* gcc.dg/vect/pr88598-2.c: Ditto.
* gcc.dg/vect/pr88598-3.c: Ditto.

RISC-V: support cm.mva01s cm.mvsa01 in zcmp

Signed-off-by: Die Li <lidie@eswincomputing.com>
Co-Authored-By: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s<X:mode>): New pattern.
(*mvsa01<X:mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.

RISC-V: support cm.popretz in zcmp

Generate cm.popretz instead of cm.popret if return value is 0.

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_zcmp_can_use_popretz): true if popretz can be used
(riscv_gen_multi_pop_insn): interface to generate cm.pop[ret][z]
(riscv_expand_epilogue): expand cm.pop[ret][z] in epilogue
* config/riscv/riscv.md: define A0_REGNUM
* config/riscv/zc.md
(@gpr_multi_popretz_up_to_ra_<mode>): md for popretz ra
(@gpr_multi_popretz_up_to_s0_<mode>): md for popretz ra, s0
(@gpr_multi_popretz_up_to_s1_<mode>): likewise
(@gpr_multi_popretz_up_to_s2_<mode>): likewise
(@gpr_multi_popretz_up_to_s3_<mode>): likewise
(@gpr_multi_popretz_up_to_s4_<mode>): likewise
(@gpr_multi_popretz_up_to_s5_<mode>): likewise
(@gpr_multi_popretz_up_to_s6_<mode>): likewise
(@gpr_multi_popretz_up_to_s7_<mode>): likewise
(@gpr_multi_popretz_up_to_s8_<mode>): likewise
(@gpr_multi_popretz_up_to_s9_<mode>): likewise
(@gpr_multi_popretz_up_to_s11_<mode>): likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: add testcase for cm.popretz in rv32e
* gcc.target/riscv/rv32i_zcmp.c: add testcase for cm.popretz in rv32i

RISC-V: support cm.push cm.pop cm.popret in zcmp

Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.

Pre-allocation not only saves callee saved GPRs, but also saves callee saved FPRs and
local variables if any.

Please note that cm.push pushes ra, s0-s11 in the reverse order of what save-restore does,
so the .cfi directives have been adapted in this patch.

gcc/ChangeLog:

* config/riscv/iterators.md
(slot0_offset): slot 0 offset in stack GPRs area in bytes
(slot1_offset): slot 1 offset in stack GPRs area in bytes
(slot2_offset): likewise
(slot3_offset): likewise
(slot4_offset): likewise
(slot5_offset): likewise
(slot6_offset): likewise
(slot7_offset): likewise
(slot8_offset): likewise
(slot9_offset): likewise
(slot10_offset): likewise
(slot11_offset): likewise
(slot12_offset): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust popping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust popping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p): declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and popping in zcmp
(CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11)
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_push_fpr.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.

tree-ssa-strlen: Fix up handling of conditionally zero memcpy [PR110914]

The following testcase is miscompiled since r279392 aka r10-5451-gef29b12cfbb4979.
The strlen pass has an adjust_last_stmt function, which performs mainly strcat
or strcat-like optimizations (say strcpy (x, "abcd"); strcat (x, p);
or equivalent memcpy (x, "abcd", strlen ("abcd") + 1); char *q = strchr (x, 0);
memcpy (q, p, strlen (p)); etc.), where the first stmt stores a '\0' character
at the end but the next one immediately overwrites it and so the first memcpy can be
adjusted to store 1 fewer bytes.  handle_builtin_memcpy called this function
in two spots, the first one guarded like:
  if (olddsi != NULL
      && tree_fits_uhwi_p (len)
      && !integer_zerop (len))
    adjust_last_stmt (olddsi, stmt, false);
i.e. only for constant non-zero length.  The other spot can call it even
for non-constant length but in that case we punt before that if that length
isn't length of some string + 1, so again non-zero.
The r279392 change I assume wanted to add some warning stuff and changed it
like
   if (olddsi != NULL
-      && tree_fits_uhwi_p (len)
       && !integer_zerop (len))
-    adjust_last_stmt (olddsi, stmt, false);
+    {
+      maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
+      adjust_last_stmt (olddsi, stmt, false);
+    }
While maybe_warn_overflow possibly handles non-constant length fine,
adjust_last_stmt really relies on length to be non-zero, which
!integer_zerop (len) alone doesn't guarantee.  While we could for
len being SSA_NAME ask the ranger or tree_expr_nonzero_p, I think
adjust_last_stmt will not benefit from it much, so the following patch
just restores the above condition/previous behavior for the adjust_last_stmt
call only.

2023-08-30  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/110914
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcpy): Don't call
adjust_last_stmt unless len is known constant.

* gcc.c-torture/execute/pr110914.c: New test.

store-merging: Fix up >= 64 bit insertion [PR111015]

The following testcase shows that we mishandle bit insertion for
info->bitsize >= 64. The problem is in using unsigned HOST_WIDE_INT
shift + subtraction + build_int_cst to compute mask, the shift invokes
UB at compile time for info->bitsize 64 and larger and e.g. on the testcase
with info->bitsize happens to compute mask of 0x3f rather than
0x3f'ffffffff'ffffffff.

The patch fixes that by using wide_int wi::mask + wide_int_to_tree, so it
handles masks in any precision (up to WIDE_INT_MAX_PRECISION ;) ).
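
The difference between the two approaches can be sketched without GCC's wide_int machinery. The code below is a stand-in that builds a mask of up to 128 bits from two 64-bit limbs; mask128, low_mask64 and make_mask are hypothetical names, not the wi::mask API, and the 70-bit width used in the usage note is inferred from the expected mask value quoted above (6 + 64 set bits).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-limb mask; stands in for a wide integer.
struct mask128 {
  std::uint64_t hi;
  std::uint64_t lo;
};

// Low `bits` bits set within one 64-bit limb.  The bits == 64 case is
// handled separately because shifting a 64-bit value by 64 is undefined.
static std::uint64_t low_mask64(unsigned bits) {
  assert(bits <= 64);
  return bits == 64 ? ~std::uint64_t{0}
                    : (std::uint64_t{1} << bits) - 1;
}

// Low `bits` bits set across both limbs, for bits in [0, 128].
mask128 make_mask(unsigned bits) {
  assert(bits <= 128);
  if (bits <= 64)
    return {0, low_mask64(bits)};
  return {low_mask64(bits - 64), ~std::uint64_t{0}};
}
```

Here make_mask (70) yields hi == 0x3f with all ones in lo, whereas a naive single-limb (1 << bits) - 1 has no defined result for any width of 64 or more.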

2023-08-30 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/111015
* gimple-ssa-store-merging.cc
(imm_store_chain_info::output_merged_store): Use wi::mask and
wide_int_to_tree instead of unsigned HOST_WIDE_INT shift and
build_int_cst to build BIT_AND_EXPR mask.

* gcc.dg/pr111015.c: New test.

middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant.

Bootstrap and Regression on X86 passed.

Ok for trunk?

gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.

RISC-V: Make arch-24.c to test "success" case

arch-24.c and arch-25.c are exactly the same and redundant. The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.

RISC-V: Make sure we get VL REG operand for VLMAX vsetvl

Fix ICE in "vect" testsuite:

FAIL: gcc.dg/vect/pr64495.c (internal compiler error: in df_uses_record, at df-scan.cc:2958)
FAIL: gcc.dg/vect/pr64495.c (test for excess errors)

After this patch, all currently known VSETVL pass related bugs in "vect" are fixed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(vector_insn_info::get_avl_or_vl_reg): Fix bug.

RISC-V: Enable movmisalign for VLS modes

The previous patch, which removed the movmisalign pattern for VLA modes to fix a
run-time bug, disabled vectorization of misaligned data movement.

After checking the LLVM code, I found that LLVM supports misalignment for VLS modes.

Before this patch:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

After this patch:

foo:
lui a0,%hi(.LANCHOR0)
addi a0,a0,%lo(.LANCHOR0)
addi sp,sp,-16
addi a1,a0,1
li a2,64
sd ra,8(sp)
vsetvli zero,a2,e8,m4,ta,ma
addi a0,a0,128
vle8.v v4,0(a1)
vse8.v v4,0(a0)
call memcmp
bne a0,zero,.L6
ld ra,8(sp)
addi sp,sp,16
jr ra
.L6:
call abort

Note this patch has passed all testcases in "vect" which are related to alignment.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (movmisalign<mode>): New pattern.
* config/riscv/riscv.cc (riscv_support_vector_misalignment): Support
VLS misalign.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: New test.

Daily bump.

RISC-V: Use splitter to generate zicond in another case

So in analyzing Ventana's internal tree against the trunk it became apparent
that the current zicond code is missing a case that helps coremark's bitwise
CRC implementation.

Here's a minimized testcase:

long xor1(long crc, long poly)
{
  if (crc & 1)
    crc ^= poly;

  return crc;
}

i.e., it's just a conditional xor.

We generate this:

        andi    a5,a0,1
        neg     a5,a5
        and     a5,a5,a1
        xor     a0,a5,a0
        ret

But we should instead generate:

        andi    a5,a0,1
        czero.eqz       a5,a1,a5
        xor     a0,a5,a0
        ret
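
The equivalence of the two instruction sequences can be checked with a small scalar model. In this sketch czero_eqz models the Zicond semantics rd = (rs2 == 0) ? 0 : rs1; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Branchy form, as in the minimized testcase.
std::int64_t xor1_branch(std::int64_t crc, std::int64_t poly) {
  if (crc & 1)
    crc ^= poly;
  return crc;
}

// Models the currently generated andi / neg / and / xor sequence.
std::int64_t xor1_mask(std::int64_t crc, std::int64_t poly) {
  std::int64_t bit = crc & 1;
  std::int64_t mask = -bit;          // 0 or all ones
  return (mask & poly) ^ crc;
}

// Model of czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.
static std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}

// Models the desired andi / czero.eqz / xor sequence.
std::int64_t xor1_czero(std::int64_t crc, std::int64_t poly) {
  std::int64_t bit = crc & 1;
  assert(bit == 0 || bit == 1);
  return czero_eqz(poly, bit) ^ crc;
}
```

All three functions return the same value for any inputs; the czero form simply replaces the neg and and pair with one conditional-zero operation.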

Combine wants to generate:

Trying 7, 8 -> 9:
    7: r140:DI=r137:DI&0x1
    8: r141:DI=-r140:DI
      REG_DEAD r140:DI
    9: r142:DI=r141:DI&r144:DI
      REG_DEAD r144:DI
      REG_DEAD r141:DI
Failed to match this instruction:
(set (reg:DI 142)
    (and:DI (sign_extract:DI (reg/v:DI 137 [ crc ])
            (const_int 1 [0x1])
            (const_int 0 [0]))
        (reg:DI 144)))

A splitter can rewrite the above into a suitable if-then-else construct and
squeeze an instruction out of that pesky CRC loop.  Sadly it doesn't really
help anything else.

The patch includes two variants.  One that uses ZBS, the other uses an ANDI
logical to produce the input condition.

gcc/
* config/riscv/zicond.md: New splitters to rewrite single bit
sign extension as the condition to a czero in the desired form.

gcc/testsuite
* gcc.target/riscv/zicond-xor-01.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

analyzer: new warning: -Wanalyzer-overlapping-buffers [PR99860]

gcc/ChangeLog:
PR analyzer/99860
* Makefile.in (ANALYZER_OBJS): Add analyzer/ranges.o.

gcc/analyzer/ChangeLog:
PR analyzer/99860
* analyzer-selftests.cc (selftest::run_analyzer_selftests): Call
selftest::analyzer_ranges_cc_tests.
* analyzer-selftests.h (selftest::run_analyzer_selftests): New
decl.
* analyzer.opt (Wanalyzer-overlapping-buffers): New option.
* call-details.cc: Include "analyzer/ranges.h" and "make-unique.h".
(class overlapping_buffers): New.
(call_details::complain_about_overlap): New.
* call-details.h (call_details::complain_about_overlap): New decl.
* kf.cc (kf_memcpy_memmove::impl_call_pre): Call
cd.complain_about_overlap for memcpy and memcpy_chk.
(kf_strcat::impl_call_pre): Call cd.complain_about_overlap.
(kf_strcpy::impl_call_pre): Likewise.
* ranges.cc: New file.
* ranges.h: New file.

gcc/ChangeLog:
PR analyzer/99860
* doc/invoke.texi: Add -Wanalyzer-overlapping-buffers.

gcc/testsuite/ChangeLog:
PR analyzer/99860
* c-c++-common/analyzer/overlapping-buffers.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: tweaks for explicit conversion fns diagnostic

1) When saying that a conversion is erroneous because it would use
an explicit constructor, it might be nice to show where exactly
the explicit constructor is located.  For example, with this patch:

[...]
explicit.C:4:12: note: 'S::S(int)' declared here
    4 |   explicit S(int) { }
      |            ^

2) When a conversion doesn't work out merely because the conversion
function necessary to do the conversion couldn't be used because
it was marked explicit, it would be useful to the user to say so,
rather than just saying "cannot convert".  For example, with this patch:

explicit.C:13:12: error: cannot convert 'S' to 'bool' in initialization
   13 |   bool b = S{1};
      |            ^~~~
      |            |
      |            S
explicit.C:5:12: note: explicit conversion function was not considered
    5 |   explicit operator bool() const { return true; }
      |            ^~~~~~~~

gcc/cp/ChangeLog:

* call.cc (convert_like_internal): Show where the conversion function
was declared.
(maybe_show_nonconverting_candidate): New.
* cp-tree.h (maybe_show_nonconverting_candidate): Declare.
* typeck.cc (convert_for_assignment): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/explicit.C: New test.

RISC-V: Added zvfh support for zfa extensions.

This is a follow-up to the zfa extension support, added according to the recommendations
for zvfh and the patch from Tsukasa OI <research_trasio@irq.a4lg.com>. The new test
zfa-fli-5.c is also based on that patch.

Ref:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627284.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628492.html

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
zvfh can generate zfa extended instruction fli.h, just like zfh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fli-7.c: Change fa0 to fa\[0-9\] to avoid
assigning register numbers that are non-zero.
* gcc.target/riscv/zfa-fli-8.c: Ditto.
* gcc.target/riscv/zfa-fli-5.c: New test.

RISC-V: generate builtin macro for compilation with strict alignment

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.
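
User code could consume the new predefined macros roughly as sketched below. The macro names come from the ChangeLog entry for this commit; the enum and function names are hypothetical, and the fallback branch covers targets or compilers that define none of the three macros.

```cpp
#include <cassert>

// Hypothetical classification of the target's unaligned access behaviour.
enum class unaligned_policy { avoid, slow, fast, unknown };

// Maps the predefined macros to a policy at compile time.
constexpr unaligned_policy detected_policy() {
#if defined(__riscv_unaligned_avoid)
  return unaligned_policy::avoid;
#elif defined(__riscv_unaligned_slow)
  return unaligned_policy::slow;
#elif defined(__riscv_unaligned_fast)
  return unaligned_policy::fast;
#else
  return unaligned_policy::unknown;   // non-RISC-V target or older compiler
#endif
}

// Human-readable name, e.g. for logging which code path was selected.
const char *policy_name(unaligned_policy p) {
  switch (p) {
    case unaligned_policy::avoid: return "avoid";
    case unaligned_policy::slow:  return "slow";
    case unaligned_policy::fast:  return "fast";
    default:                      return "unknown";
  }
}
```

A library could, for instance, pick a byte-wise copy loop when the policy is avoid and a word-wise one when it is fast.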

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Generate
__riscv_unaligned_avoid with value 1 or
__riscv_unaligned_slow with value 1 or
__riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override): Define
riscv_user_wants_strict_align. Set
riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
__riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>

libgccjit: add support for `restrict` attribute on function parameters

gcc/jit/ChangeLog:
* jit-playback.cc: Remove trailing whitespace characters.
* jit-playback.h: Add get_restrict method.
* jit-recording.cc: Add get_restrict methods.
* jit-recording.h: Add get_restrict methods.
* libgccjit++.h: Add get_restrict methods.
* libgccjit.cc: Add gcc_jit_type_get_restrict.
* libgccjit.h: Declare gcc_jit_type_get_restrict.
* libgccjit.map: Declare gcc_jit_type_get_restrict.

gcc/testsuite/ChangeLog:
* jit.dg/test-restrict.c: Add test for __restrict__ attribute.
* jit.dg/all-non-failing-tests.h: Add test-restrict.c to the list.

gcc/jit/ChangeLog:
* docs/topics/compatibility.rst: Add documentation for LIBGCCJIT_ABI_25.
* docs/topics/types.rst: Add documentation for gcc_jit_type_get_restrict.

Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com>

RISC-V: Add Types to Un-Typed Vector Instructions

Updates vector instructions to ensure that no instruction is left
without a type attribute. Creates a placeholder type "vector" for
instructions where a type isn't clear.

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
    const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
    const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
    const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
    const int RM)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md (UNSPEC_DQUAN): New unspec.
(dfp_dqua_<mode>, dfp_dquai_<mode>): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448

analyzer: improve strdup handling [PR105899]

gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf.cc (kf_strdup::impl_call_pre): Set size of
dynamically-allocated buffer. Simulate copying the string from
the source region to the new buffer.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* c-c++-common/analyzer/pr99193-2.c: Add
-Wno-analyzer-too-complex.
* gcc.dg/analyzer/strdup-1.c: Include "analyzer-decls.h".
(test_concrete_strlen): New.
(test_symbolic_strlen): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Fix one ICE for vect test vect-multitypes-5

An ICE occurs when building vect-multitypes-5.c with a command similar to the one below:

riscv64-unknown-elf-gcc -O3 \
  -march=rv64imafdcv -mabi=lp64d -mcmodel=medlow \
  -fdiagnostics-plain-output -flto -ffat-lto-objects \
  --param riscv-autovec-preference=scalable -Wno-psabi \
  -ftree-vectorize -fno-tree-loop-distribute-patterns \
  -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details \
  gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c -o test.elf -lm

The RTL below is not handled by riscv_legitimize_const_move and
falls through to the default path. The default force_const_mem then
returns NULL_RTX, and the compiler ICEs when operating on that
NULL_RTX.

(const:DI
  (plus:DI
    (symbol_ref:DI ("ic") [flags 0x2] <var_decl 0x7fe57740be10 ic>)
    (const_poly_int:DI [16, 16])))

This patch handles this RTL in riscv_legitimize_const_move.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): New declaration.
(riscv_legitimize_const_move): Handle ref plus const poly.

RISC-V: Add stub support for existing extensions (unprivileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does nothing with them.

This commit adds stub support for standard unprivileged extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c, except for the
not-yet-merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

RISC-V: Add stub support for existing extensions (vendor)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does nothing with them.

This commit adds stub support for vendor extensions to
riscv_ext_version_table (no riscv_implied_info entries to add; all
information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.

RISC-V: Add stub support for existing extensions (privileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does nothing with them.

As a start, this commit adds stub support for *privileged* extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.

RISC-V: Make PR 102957 tests more comprehensive

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change will partially break the original intent of PR
102957 test case because we wanted to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.

RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function, expand_cond_len_op, that does the main work.
The expand_cond_len_{unop,binop,ternop} functions only care about how
to pass the operands to the intrinsic patterns.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself.

tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle, which adds/subtracts carry-in (from lower limbs) and computes
carry-out (to higher limbs).  Before optimizations (unless user writes
it intentionally that way already), all the steps look the same, but
optimizations simplify the handling of the least significant limb
(one which adds/subtracts 0 carry-in) to just a single
.{ADD,SUB}_OVERFLOW and the handling of the most significant limb
if the computed carry-out is ignored to normal addition/subtraction
of multiple operands.
Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
a more significant limb above it is matched into .U{ADD,SUB}C; this
isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
functionally equal to .UADDC (x, y, 0) (provided the types of operands
are the same and result is complex type with that type element), and
it also has code to match the most significant limb with ignored carry-out
(in that case one pattern match turns both the penultimate limb pair of
.{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
of the 4 values (2 carries) into another .U{ADD,SUB}C).

As the following patch shows, what we weren't handling is the case when
one uses either the __builtin_{add,sub}c builtins or hand written forms
thereof (either __builtin_*_overflow or even that written by hand) for
just 2 limbs, where the least significant has 0 carry-in and the most
significant ignores carry-out.  The following patch matches that, e.g.
  _16 = .ADD_OVERFLOW (_1, _2);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _15 = _3 + _4;
  _12 = _15 + _18;
into
  _16 = .UADDC (_1, _2, 0);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _19 = .UADDC (_3, _4, _18);
  _12 = IMAGPART_EXPR <_19>;
so that we can emit better code.
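
For illustration, source code of roughly the following shape is the kind of two-limb addition described above. This is only a minimal sketch: the `u128` struct and the `add_u128` name are hypothetical and are not taken from the patch or its testcase; `__builtin_add_overflow` is the existing GCC built-in.

```c
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit limbs.  */
struct u128 { uint64_t lo, hi; };

struct u128 add_u128(struct u128 a, struct u128 b)
{
    struct u128 r;
    /* Least significant limb: carry-in is 0, carry-out is kept.  */
    unsigned carry = __builtin_add_overflow(a.lo, b.lo, &r.lo);
    /* Most significant limb: carry-in is added, carry-out is ignored.  */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

The low limb corresponds to the single .ADD_OVERFLOW call and the high limb to the plain three-operand addition in the GIMPLE shown above.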

As the 2 later comments show, we must do that carefully, because the
pass walks the IL from first to last stmt in a bb and we must avoid
pattern matching this way something that should be matched on a later
instruction differently.

2023-08-29  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/79173
PR middle-end/111209
* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
carry-out on higher limb.  Don't match it though if it could be
matched later on 4 argument addition/subtraction.

* gcc.target/i386/pr79173-12.c: New test.

MATCH: Move `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p

This moves the match pattern `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p.
This also allows comparisons to be optimized and catches the missed `(~x | y) & (x ^ y)`
transformation into `~x & y`.
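
As a sanity check of the algebra only, both identities can be verified exhaustively on 8-bit values. The first pattern reducing to `x & y` is stated here as a known Boolean identity rather than quoted from the patch, and the function name below is hypothetical. The sketch does not show how match.pd implements the simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if both identities hold for every pair of 8-bit values.  */
bool cmpbit_identities_hold(void)
{
    for (unsigned i = 0; i < 256; i++) {
        for (unsigned j = 0; j < 256; j++) {
            uint8_t x = (uint8_t)i, y = (uint8_t)j;

            /* (x | y) & (~x ^ y) == x & y  */
            uint8_t lhs1 = (uint8_t)((x | y) & (~x ^ y));
            uint8_t rhs1 = (uint8_t)(x & y);

            /* (~x | y) & (x ^ y) == ~x & y  */
            uint8_t lhs2 = (uint8_t)((~x | y) & (x ^ y));
            uint8_t rhs2 = (uint8_t)(~x & y);

            if (lhs1 != rhs1 || lhs2 != rhs2)
                return false;
        }
    }
    return true;
}
```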

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/111147
* match.pd (`(x | y) & (~x ^ y)`): Use bitwise_inverted_equal_p
instead of matching bit_not.

gcc/testsuite/ChangeLog:

PR tree-optimization/111147
* gcc.dg/tree-ssa/cmpbit-4.c: New test.

vect test: Remove xfail for riscv

We are planning to enable "vect" testsuite with scalable vector auto-vectorization.

This case XPASSes, as it does for ARM SVE:
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-12.c: Add riscv xfail.

arm: Fix bootstrap / add missing initializer in MVE type_suffixes

My recent patch r14-3519-g9bae37ec8dc320 (arm: [MVE intrinsics] add
support for p8 and p16 polynomial types) added a new member to
type_suffix_info, but I forgot to add the corresponding initializer to
type_suffixes.

Committed as obvious.

2023-08-29 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins.cc (type_suffixes): Add missing
initializer.

RISC-V: Fix ASM check of vlmax_switch_vtype-16.c

Notice there is a failure:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c -O2 scan-assembler-times vsetvli\\s+zero,\\s*zero 2

Change the expected count from "2" to "3"; the generated assembly is correct and better.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Fix ASM check.

RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

Fix a bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_insn_info::get_avl_or_vl_reg): New function.
(pass_vsetvl::compute_local_properties): Fix bug.
(pass_vsetvl::commit_vsetvls): Ditto.
* config/riscv/riscv-vsetvl.h: New function.

RISC-V: Fix error combine of pred_mov pattern

This patch fixes PR110943, in which wrong code is produced. The cause is
an incorrect combine of some pred_mov patterns. Consider this code:

```
#include <riscv_vector.h>

void foo9 (void *base, void *out, size_t vl)
{
    int64_t scalar = *(int64_t*)(base + 100);
    vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
    *(vint64m2_t*)out = v;
}
```

RTL before combine pass:

```
(insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ])
        (if_then_else:RVVM2DI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM2DI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM2DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di})
(insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128])
        (reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 {*movrvvm2di_whole})
```

RTL after combine pass:
```
(insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128])
        (if_then_else:RVVM2DI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM2DI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM2DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di})
```

This combine changes the semantics of insn 14. I split the @pred_mov pattern and
restrict the condition of @pred_mov.

PR target/110943

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_int_or_double_0_operand):
New predicate.
* config/riscv/riscv-vector-builtins.cc (function_expander::function_expander):
force_reg mem target operand.
* config/riscv/vector.md (@pred_mov<mode>): Wrapper.
(*pred_mov<mode>): Remove imm -> reg pattern.
(*pred_broadcast<mode>_imm): Add imm -> reg pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Adjust.
* gcc.target/riscv/rvv/base/pr110943.c: New test.

mklog: fix bugs of --append option

This little patch fixes two bugs of mklog.py with the --append option.
The first bug is that the regexp used is not accurate enough to
determine the top of the diff area. The second bug is that if `---`
is not a true start, it needs to be added back to the patch file.
It also fixes a Python code formatting error that Martin reported.

contrib/ChangeLog:

* mklog.py: Fix bugs.

LoongArch: Enable '-free' starting at -O2.

gcc/ChangeLog:

* common/config/loongarch/loongarch-common.cc:
Enable '-free' on O2 and above.
* doc/invoke.texi: Modify the description information
of the '-free' compilation option and add the LoongArch
description.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/sign-extend.c: New test.

Daily bump.

RISC-V: Fix documentation of __builtin_riscv_pause

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

RISC-V: __builtin_riscv_pause for all environment

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension (in
the assembler). However, GCC emits "pause" unconditionally, causing an
assembler error when compiling code that uses __builtin_riscv_pause while
the 'Zihintpause' extension is disabled.

However, the "pause" instruction code (0x0100000f) is a HINT and emitting its
instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x0100000f" instead of "pause" only if the extension is disabled (making
the diagnostics better).
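
The constant 0x0100000f follows from the FENCE instruction layout: PAUSE is encoded as a FENCE with pred=W, succ=0, fm=0 and rd=rs1=x0. The sketch below only reconstructs that value; the function names are hypothetical and are not part of GCC or this patch.

```c
#include <stdint.h>

/* Encode a FENCE instruction (MISC-MEM opcode) with rs1, funct3 and rd
   all zero.  fm occupies bits 31:28, pred bits 27:24, succ bits 23:20.  */
static uint32_t encode_fence(uint32_t fm, uint32_t pred, uint32_t succ)
{
    const uint32_t opcode_misc_mem = 0x0f;
    return (fm << 28) | (pred << 24) | (succ << 20) | opcode_misc_mem;
}

/* PAUSE: fm = 0, predecessor set = W only (bit 24), successor set empty.  */
uint32_t encode_pause(void)
{
    return encode_fence(0, 0x1, 0);
}
```

Because the word is an ordinary HINT encoding, an assembler without 'Zihintpause' can still emit it through `.insn`, which is what the commit relies on.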

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Implement the 'Zihintpause' extension, version 2.0.
(riscv_ext_flag_table): Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause".
(riscv_builtins): Add "pause" extension.
* config/riscv/riscv-opts.h (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Adjust output based on
TARGET_ZIHINTPAUSE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c: New test when the 'Zihintpause'
extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c: New test when the 'Zihintpause'
extension is disabled.

Fix cond-bool-2.c on powerpc and other targets

This adds `--param logical-op-non-short-circuit=1` to the testcase
so it becomes a target-independent testcase now.
I filed PR 111217 for the variant of the testcase which fails independently
of the param.

Committed as obvious after testing to make sure it passes on powerpc now.

gcc/testsuite/ChangeLog:

PR testsuite/111215
* gcc.dg/tree-ssa/cond-bool-2.c: Add
`--param logical-op-non-short-circuit=1` to the options.

MATCH: Move `(X & ~Y) | (~X & Y)` over to use bitwise_inverted_equal_p

This moves the pattern `(X & ~Y) | (~X & Y)` to use bitwise_inverted_equal_p
so we can simplify earlier the case where X and Y are defined by comparisons.
We were able to optimize to (!X)^(!Y) in the end due to the pattern added in
r14-3110-g7fb65f102851248bafa0815 and the older pattern r13-4620-g4d9db4bdd458.
But folding it earlier is better.
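
When X and Y are comparison results, the inverted operand usually appears as the opposite comparison rather than as an explicit `~`. A minimal sketch of such an input follows, with hypothetical function names and assuming 0/1-valued comparison results; it is not taken from the new testcase.

```c
#include <stdbool.h>

/* (X & ~Y) | (~X & Y) with X = (a < b) and Y = (c < d); the inversions
   are spelled as the opposite comparisons.  */
bool differ_long_form(int a, int b, int c, int d)
{
    return ((a < b) & (c >= d)) | ((a >= b) & (c < d));
}

/* The equivalent exclusive-or form.  */
bool differ_xor_form(int a, int b, int c, int d)
{
    return (a < b) ^ (c < d);
}
```

Both functions return true exactly when one of the two comparisons holds and the other does not.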

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note pr87009.c now gets `return x ^ s;` in one case where the test had been expecting
`return s ^ x;`. Both are valid and should be exactly the same; we just now choose a slightly
different order of simplification, which causes the order of the operands to be different.

gcc/ChangeLog:

* match.pd (`(X & ~Y) | (~X & Y)`): Use bitwise_inverted_equal_p
instead of specifically checking for ~X.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cmpbit-3.c: New test.
* gcc.dg/pr87009.c: Update test.

MATCH: Remove redundant pattern for `(x | y) & ~x`

After r14-2885-gb9237226fdc938, this pattern becomes
redundant as we match it using bitwise_inverted_equal_p.

There is already a testcase (gcc.dg/nand.c) for this pattern
and it still passes after the removal.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/111146
* match.pd (`(x | y) & ~x`, `(x & y) | ~x`): Remove
redundant pattern.

PHIOPT: Add dump for match and simplify and early phiopt

This adds a dump of the full result of the match-and-simplify
for phiopt, specifically to show whether we are rejecting something
due to being in early phi-opt.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Add dump information
when resimplify returns true.
(match_simplify_replacement): Print only if accepted the match-and-simplify
result rather than the full sequence.

RISC-V: Fix uninitialized probability for GIMPLE IR tests

This patch fixes an uninitialized probability in GIMPLE IR code tests:
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Skip
never probability.
(pass_vsetvl::compute_probabilities): Fix uninitialized probability.

RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

This patch fixes this bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (internal compiler error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Fix bug.

arm: [MVE intrinsics] rework vmullbq_poly vmulltq_poly

Implement vmull[bt]q_poly using the new MVE builtins framework.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (vmullbq_poly)
(vmulltq_poly): New.
* config/arm/arm-mve-builtins-base.def (vmullbq_poly)
(vmulltq_poly): New.
* config/arm/arm-mve-builtins-base.h (vmullbq_poly)
(vmulltq_poly): New.
* config/arm/arm_mve.h (vmulltq_poly): Remove.
(vmullbq_poly): Remove.
(vmullbq_poly_m): Remove.
(vmulltq_poly_m): Remove.
(vmullbq_poly_x): Remove.
(vmulltq_poly_x): Remove.
(vmulltq_poly_p8): Remove.
(vmullbq_poly_p8): Remove.
(vmulltq_poly_p16): Remove.
(vmullbq_poly_p16): Remove.
(vmullbq_poly_m_p8): Remove.
(vmullbq_poly_m_p16): Remove.
(vmulltq_poly_m_p8): Remove.
(vmulltq_poly_m_p16): Remove.
(vmullbq_poly_x_p8): Remove.
(vmullbq_poly_x_p16): Remove.
(vmulltq_poly_x_p8): Remove.
(vmulltq_poly_x_p16): Remove.
(__arm_vmulltq_poly_p8): Remove.
(__arm_vmullbq_poly_p8): Remove.
(__arm_vmulltq_poly_p16): Remove.
(__arm_vmullbq_poly_p16): Remove.
(__arm_vmullbq_poly_m_p8): Remove.
(__arm_vmullbq_poly_m_p16): Remove.
(__arm_vmulltq_poly_m_p8): Remove.
(__arm_vmulltq_poly_m_p16): Remove.
(__arm_vmullbq_poly_x_p8): Remove.
(__arm_vmullbq_poly_x_p16): Remove.
(__arm_vmulltq_poly_x_p8): Remove.
(__arm_vmulltq_poly_x_p16): Remove.
(__arm_vmulltq_poly): Remove.
(__arm_vmullbq_poly): Remove.
(__arm_vmullbq_poly_m): Remove.
(__arm_vmulltq_poly_m): Remove.
(__arm_vmullbq_poly_x): Remove.
(__arm_vmulltq_poly_x): Remove.

arm: [MVE intrinsics] add unspec_mve_function_exact_insn_vmull_poly

Introduce a function that will be used to build vmull[bt]q_poly
intrinsics that use poly types.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_vmull_poly): New.

arm: [MVE intrinsics] add binary_widen_poly shape

This patch adds the binary_widen_poly shape description.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_widen_poly): New.
* config/arm/arm-mve-builtins-shapes.h (binary_widen_poly): New.

arm: [MVE intrinsics] add support for U and p formats in parse_element_type

Introduce these two format specifiers to define the shape of
vmull[bt]q_poly intrinsics.

'U' is used to define a double-width unsigned
'p' is used to define an element of 'poly' type.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): Add
support for 'U' and 'p' format specifiers.

arm: [MVE intrinsics] add support for p8 and p16 polynomial types

Although they look like aliases for u8 and u16, we need to define them
so that we can handle p8 and p16 suffixes with the general framework.

They will be used by vmull[bt]q_poly intrinsics.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins.cc (type_suffixes): Handle poly_p
field.
(TYPES_poly_8_16): New.
(poly_8_16): New.
* config/arm/arm-mve-builtins.def (p8): New type suffix.
(p16): Likewise.
* config/arm/arm-mve-builtins.h (enum type_class_index): Add
TYPE_poly.
(struct type_suffix_info): Add poly_p field.

arm: [MVE intrinsics] rework vmullbq_int vmulltq_int

Implement vmullbq_int, vmulltq_int using the new MVE builtins
framework.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-base.cc (vmullbq_int, vmulltq_int):
New.
* config/arm/arm-mve-builtins-base.def (vmullbq_int, vmulltq_int):
New.
* config/arm/arm-mve-builtins-base.h (vmullbq_int, vmulltq_int):
New.
* config/arm/arm_mve.h (vmulltq_int): Remove.
(vmullbq_int): Remove.
(vmullbq_int_m): Remove.
(vmulltq_int_m): Remove.
(vmullbq_int_x): Remove.
(vmulltq_int_x): Remove.
(vmulltq_int_u8): Remove.
(vmullbq_int_u8): Remove.
(vmulltq_int_s8): Remove.
(vmullbq_int_s8): Remove.
(vmulltq_int_u16): Remove.
(vmullbq_int_u16): Remove.
(vmulltq_int_s16): Remove.
(vmullbq_int_s16): Remove.
(vmulltq_int_u32): Remove.
(vmullbq_int_u32): Remove.
(vmulltq_int_s32): Remove.
(vmullbq_int_s32): Remove.
(vmullbq_int_m_s8): Remove.
(vmullbq_int_m_s32): Remove.
(vmullbq_int_m_s16): Remove.
(vmullbq_int_m_u8): Remove.
(vmullbq_int_m_u32): Remove.
(vmullbq_int_m_u16): Remove.
(vmulltq_int_m_s8): Remove.
(vmulltq_int_m_s32): Remove.
(vmulltq_int_m_s16): Remove.
(vmulltq_int_m_u8): Remove.
(vmulltq_int_m_u32): Remove.
(vmulltq_int_m_u16): Remove.
(vmullbq_int_x_s8): Remove.
(vmullbq_int_x_s16): Remove.
(vmullbq_int_x_s32): Remove.
(vmullbq_int_x_u8): Remove.
(vmullbq_int_x_u16): Remove.
(vmullbq_int_x_u32): Remove.
(vmulltq_int_x_s8): Remove.
(vmulltq_int_x_s16): Remove.
(vmulltq_int_x_s32): Remove.
(vmulltq_int_x_u8): Remove.
(vmulltq_int_x_u16): Remove.
(vmulltq_int_x_u32): Remove.
(__arm_vmulltq_int_u8): Remove.
(__arm_vmullbq_int_u8): Remove.
(__arm_vmulltq_int_s8): Remove.
(__arm_vmullbq_int_s8): Remove.
(__arm_vmulltq_int_u16): Remove.
(__arm_vmullbq_int_u16): Remove.
(__arm_vmulltq_int_s16): Remove.
(__arm_vmullbq_int_s16): Remove.
(__arm_vmulltq_int_u32): Remove.
(__arm_vmullbq_int_u32): Remove.
(__arm_vmulltq_int_s32): Remove.
(__arm_vmullbq_int_s32): Remove.
(__arm_vmullbq_int_m_s8): Remove.
(__arm_vmullbq_int_m_s32): Remove.
(__arm_vmullbq_int_m_s16): Remove.
(__arm_vmullbq_int_m_u8): Remove.
(__arm_vmullbq_int_m_u32): Remove.
(__arm_vmullbq_int_m_u16): Remove.
(__arm_vmulltq_int_m_s8): Remove.
(__arm_vmulltq_int_m_s32): Remove.
(__arm_vmulltq_int_m_s16): Remove.
(__arm_vmulltq_int_m_u8): Remove.
(__arm_vmulltq_int_m_u32): Remove.
(__arm_vmulltq_int_m_u16): Remove.
(__arm_vmullbq_int_x_s8): Remove.
(__arm_vmullbq_int_x_s16): Remove.
(__arm_vmullbq_int_x_s32): Remove.
(__arm_vmullbq_int_x_u8): Remove.
(__arm_vmullbq_int_x_u16): Remove.
(__arm_vmullbq_int_x_u32): Remove.
(__arm_vmulltq_int_x_s8): Remove.
(__arm_vmulltq_int_x_s16): Remove.
(__arm_vmulltq_int_x_s32): Remove.
(__arm_vmulltq_int_x_u8): Remove.
(__arm_vmulltq_int_x_u16): Remove.
(__arm_vmulltq_int_x_u32): Remove.
(__arm_vmulltq_int): Remove.
(__arm_vmullbq_int): Remove.
(__arm_vmullbq_int_m): Remove.
(__arm_vmulltq_int_m): Remove.
(__arm_vmullbq_int_x): Remove.
(__arm_vmulltq_int_x): Remove.

arm: [MVE intrinsics] add binary_widen shape

This patch adds the binary_widen shape description.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/:

* config/arm/arm-mve-builtins-shapes.cc (binary_widen): New.
* config/arm/arm-mve-builtins-shapes.h (binary_widen): New.

arm: [MVE intrinsics] add unspec_mve_function_exact_insn_vmull

Introduce a function that will be used to build vmull intrinsics with
the _int variant.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_vmull): New.

arm: [MVE intrinsics] factorize vmullbq vmulltq

Factorize vmullbq, vmulltq so that they use the same parameterized
names.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (mve_insn): Add vmullb, vmullt.
(isu): Add VMULLBQ_INT_S, VMULLBQ_INT_U, VMULLTQ_INT_S,
VMULLTQ_INT_U.
(supf): Add VMULLBQ_POLY_P, VMULLTQ_POLY_P, VMULLBQ_POLY_M_P,
VMULLTQ_POLY_M_P.
(VMULLBQ_INT, VMULLTQ_INT, VMULLBQ_INT_M, VMULLTQ_INT_M): Delete.
(VMULLxQ_INT, VMULLxQ_POLY, VMULLxQ_INT_M, VMULLxQ_POLY_M): New.
* config/arm/mve.md (mve_vmullbq_int_<supf><mode>)
(mve_vmulltq_int_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_int_<supf><mode>) ... this.
(mve_vmulltq_poly_p<mode>, mve_vmullbq_poly_p<mode>): Merge into ...
(@mve_<mve_insn>q_poly_<supf><mode>): ... this.
(mve_vmullbq_int_m_<supf><mode>, mve_vmulltq_int_m_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_int_m_<supf><mode>): ... this.
(mve_vmullbq_poly_m_p<mode>, mve_vmulltq_poly_m_p<mode>): Merge into ...
(@mve_<mve_insn>q_poly_m_<supf><mode>): ... this.

arm: [MVE intrinsics] Remove dead check for float type in parse_element_type

Fix a likely copy/paste error, where we check if ch == 'f' after we
checked it's either 's' or 'u'.
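
A minimal sketch of why such a check is dead. This is not the real
parse_element_type code: the enum, the function names and the return values
are hypothetical and only mirror the control-flow shape described above.
Once a branch requires ch to be 's' or 'u', a nested comparison against 'f'
can never succeed, so removing it does not change behaviour.

```c
#include <assert.h>

/* Hypothetical result kinds, for illustration only.  */
enum elt_kind { ELT_SIGNED, ELT_UNSIGNED, ELT_FLOAT, ELT_UNKNOWN };

/* Shape of the code before the fix: the inner test is unreachable.  */
enum elt_kind
classify_before (char ch)
{
  if (ch == 's' || ch == 'u')
    {
      if (ch == 'f')   /* Dead: ch is already known to be 's' or 'u'.  */
        return ELT_FLOAT;
      return ch == 's' ? ELT_SIGNED : ELT_UNSIGNED;
    }
  return ELT_UNKNOWN;
}

/* Shape of the code after the fix: the dead test is gone.  */
enum elt_kind
classify_after (char ch)
{
  if (ch == 's' || ch == 'u')
    return ch == 's' ? ELT_SIGNED : ELT_UNSIGNED;
  return ELT_UNKNOWN;
}

/* Both versions agree on every 7-bit character.  */
void
dead_check_example (void)
{
  for (int c = 0; c < 128; c++)
    assert (classify_before ((char) c) == classify_after ((char) c));
}
```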

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (parse_element_type):
Remove dead check.

arm: [MVE intrinsics] fix binary_acca_int32 and binary_acca_int64 shapes

Fix these two shapes, where we were failing to check the last
non-predicate parameter.

2023-08-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_acca_int32): Fix loop bound.
(binary_acca_int64): Likewise.

[frange] Handle relations in LTGT_EXPR.

LTGT_EXPR hasn't been handling relations, especially with NANs as a
possibility.  This handles them while documenting how relations work
in a world with NANs.

Basically we need to special case VREL_EQ before calling
frelop_early_resolve.  Note that VREL_EQ on entry to a range-op entry
is really VREL_EQ U NAN, but to make sure about the NAN possibility,
one must look at the operands.  However, even VREL_EQ U NAN is false
for LTGT_EXPR since the latter is just NE_EXPR without a NAN.

After we handle VREL_EQ, we drop down to frelop_early_resolve
pretending to be a NE_EXPR, and everything should just map correctly.
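
As a source-level illustration of the semantics described above (not the
range-op code itself), the C99 macro islessgreater behaves like LTGT_EXPR
and != behaves like NE_EXPR. The helper names ltgt, ne and ltgt_example are
made up for this sketch. The two differ only when an operand is a NAN, and
LTGT is false both for equal operands and for unordered ones, which is why
"VREL_EQ U NAN" folds to false.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* LTGT semantics: true only when x < y or x > y, so false for a NAN.  */
bool
ltgt (double x, double y)
{
  return islessgreater (x, y);
}

/* NE semantics: also true when the operands are unordered.  */
bool
ne (double x, double y)
{
  return x != y;
}

void
ltgt_example (void)
{
  double nan_val = NAN;

  /* Ordered and different: both predicates hold.  */
  assert (ltgt (1.0, 2.0) && ne (1.0, 2.0));
  /* Equal: both predicates are false.  */
  assert (!ltgt (1.0, 1.0) && !ne (1.0, 1.0));
  /* Unordered: NE is true, LTGT is false.  */
  assert (ne (nan_val, 1.0) && !ltgt (nan_val, 1.0));
  assert (ne (nan_val, nan_val) && !ltgt (nan_val, nan_val));
}
```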

2023-08-28  Aldy Hernandez  <aldyh@redhat.com>

* range-op-float.cc (fold_range): Handle relations.

LoongArch: Remove redundant sign extension instructions caused by SLT instructions.

On the 64-bit LoongArch architecture, the SLT instruction does not distinguish
between 64-bit and 32-bit operations, so if an operand of slt is SImode, the
operand must be explicitly sign-extended.

However, in cases such as the test below, the sign extension is redundant:

extern int src1, src2, src3;

int
test (void)
{
  int data1 = src1 + src2;
  int data2 = src1 + src3;
  return data1 > data2 ? data1 : data2;
}

Assembly code before optimization:
...
add.w $r4,$r4,$r14
add.w $r13,$r13,$r14
slli.w $r12,$r4,0
slli.w $r14,$r13,0
slt $r12,$r12,$r14
masknez $r4,$r4,$r12
maskeqz $r12,$r13,$r12
or $r4,$r4,$r12
slli.w $r4,$r4,0
...

After optimization:
...
add.w $r12,$r12,$r14
add.w $r13,$r13,$r14
slt $r4,$r12,$r13
masknez $r12,$r12,$r4
maskeqz $r4,$r13,$r4
or $r4,$r12,$r4
...

As in this test, when both operands of SLT are produced by an addition,
add.w already sign-extends its result implicitly, so the two operands
of SLT do not require a separate sign extension.
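
The redundancy can be checked with a simplified model of the instructions
involved. The helpers below (sext32, add_w, slli_w0, slt) are assumptions
made for this sketch, not code from the patch. Each models a 64-bit register
operation: add.w and slli.w write a sign-extended 32-bit result, and slt
compares full 64-bit registers. Applying slli.w ...,0 to a value that add.w
produced leaves it unchanged, so the slt result is the same with or without
it.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit register value.  The narrowing
   conversion to int32_t wraps on GCC (implementation-defined in ISO C).  */
int64_t
sext32 (uint64_t v)
{
  return (int64_t) (int32_t) (uint32_t) v;
}

/* Model of add.w: 32-bit add, result sign-extended to 64 bits.  */
int64_t
add_w (int64_t a, int64_t b)
{
  return sext32 ((uint64_t) a + (uint64_t) b);
}

/* Model of slli.w rd,rj,0: re-sign-extend the low 32 bits.  */
int64_t
slli_w0 (int64_t a)
{
  return sext32 ((uint64_t) a);
}

/* Model of slt: signed comparison of the full 64-bit registers.  */
int
slt (int64_t a, int64_t b)
{
  return a < b;
}

void
slt_example (void)
{
  int64_t x = add_w (INT32_MAX, 1);   /* Wraps to INT32_MIN.  */
  int64_t y = add_w (-5, 3);          /* -2.  */

  assert (x == INT32_MIN);
  /* The extra extension is the identity on add.w results.  */
  assert (slli_w0 (x) == x && slli_w0 (y) == y);
  assert (slt (x, y) == slt (slli_w0 (x), slli_w0 (y)));
}
```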

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Optimize the function implementation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/slt-sign-extend.c: New test.

RISC-V: Fix VSETVL test failures

Committed.

Fix failures:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vlm\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 5
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9][0-9]\\:\\s+vle32\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle16\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 2
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times add\\ta[0-7],a[0-7],a[0-7]\\s+\\.L[0-9][0-9]\\:\\s+vle8\\.v\\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\\s*\$[a-x0-9]+\$ 3

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-8.c: Adapt tests.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.

Use vmaskmov{ps,pd} for VI48_128_256 when TARGET_AVX2 is not available.

vpmaskmov{d,q} is available for TARGET_AVX2 and vmaskmov{ps,pd} is
available for TARGET_AVX, so without TARGET_AVX2 we can use vmaskmov{ps,pd}
for VI48_128_256.

gcc/ChangeLog:

PR target/111119
* config/i386/sse.md (V48_AVX2): Rename to ..
(V48_128_256): .. this.
(ssefltmodesuffix): Extend to V4SF/V8SF/V2DF/V4DF.
(<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>): Change
V48_AVX2 to V48_128_256, also generate vmaskmov{ps,pd} for
integral modes when TARGET_AVX2 is not available.
(<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>): Ditto.
(maskload<mode><sseintvecmodelower>): Change V48_AVX2 to
V48_128_256.
(maskstore<mode><sseintvecmodelower>): Ditto.

RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

This patch refactors Phase 3 (Demand fusion) and renames it to Earliest fusion.
I did the refactoring for the following reasons:

  1. The current implementation of Phase 3 does too many things, which makes the code
     messy and hard to maintain.
  2. Previously, demand fusion was done explicitly: the pass decided how to fuse VSETVLs,
     where the VSETVL fusion should happen, and whether the fusion point (location)
     was correct and optimal, etc.

     Because that was so much to handle, I had added the following functions:

        enum fusion_type get_backward_fusion_type (const bb_info *,
                                                   const vector_insn_info &);
        bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
        bool backward_demand_fusion (void);
        bool forward_demand_fusion (void);
        bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETVL fusion was optimal and correct. In my downstream testing I found
     that this approach is neither reliable nor optimal.

     Instead, this patch uses 'compute_earliest', an LCM function, to fuse multiple
     'compatible' VSETVL demand infos if they have the same earliest edge.  We let LCM decide
     almost everything about demand fusion for us. The only thing we do ourselves (not LCM) is
     check whether the VSETVL demand infos are compatible. That's all we need to do.
     I believe this approach is much more reliable and optimal than before (we already have
     many testcases that check this refactoring patch).
  3. Using the LCM approach for demand fusion is more reliable and yields a better CFG than before.
  ...

Here are the basics of this patch's approach:

Consider the following case:

for
  for
    for
      ...
         for
           if (...)
             VSETVL 1 demand: RATIO = 32 and TU policy.
           else if (...)
             VSETVL 2 demand: SEW = 16.
           else
             VSETVL 3 demand: MU policy.

   - 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3.
     They have the same earliest edge, which is outside the 1st inner-most loop.

   - Then we check that these 3 VSETVL demand infos are compatible, and fuse them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2, TU, MU.

   - Then the later phase (Phase 4), LCM PRE (partial redundancy elimination), will hoist this VSETVL
     to the outer-most loop, so that we get optimal codegen.
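
The fused result in the example can be reproduced with a small sketch. It
is not the riscv-vsetvl.cc implementation: struct demand, compatible_p, fuse
and lmul_fraction_denominator are hypothetical names, and the record keeps
only the four fields used in the example. Conflicting policy demands (such
as TA versus TU) are not modelled. The one piece of arithmetic used is
RATIO = SEW / LMUL, so RATIO = 32 and SEW = 16 give LMUL = 1/2, i.e. MF2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified demand record; 0 or false means "no demand".  */
struct demand
{
  int sew;     /* Element width in bits.  */
  int ratio;   /* SEW / LMUL.  */
  bool tu;     /* Tail-undisturbed policy demanded.  */
  bool mu;     /* Mask-undisturbed policy demanded.  */
};

/* Two demands are compatible when every field both of them set agrees.  */
bool
compatible_p (struct demand a, struct demand b)
{
  if (a.sew && b.sew && a.sew != b.sew)
    return false;
  if (a.ratio && b.ratio && a.ratio != b.ratio)
    return false;
  return true;
}

/* Union of two compatible demands.  */
struct demand
fuse (struct demand a, struct demand b)
{
  struct demand r;
  r.sew = a.sew ? a.sew : b.sew;
  r.ratio = a.ratio ? a.ratio : b.ratio;
  r.tu = a.tu || b.tu;
  r.mu = a.mu || b.mu;
  return r;
}

/* LMUL = SEW / RATIO.  Returns the denominator when LMUL is fractional
   (2 means MF2) and 0 when it is not or cannot be derived.  */
int
lmul_fraction_denominator (struct demand d)
{
  if (d.sew == 0 || d.ratio <= d.sew)
    return 0;
  return d.ratio / d.sew;
}

void
fusion_example (void)
{
  struct demand d1 = { 0, 32, true, false };   /* RATIO = 32, TU.  */
  struct demand d2 = { 16, 0, false, false };  /* SEW = 16.  */
  struct demand d3 = { 0, 0, false, true };    /* MU.  */

  assert (compatible_p (d1, d2));
  struct demand d12 = fuse (d1, d2);
  assert (compatible_p (d12, d3));
  struct demand all = fuse (d12, d3);

  /* SEW = 16, LMUL = MF2, TU, MU, as in the description above.  */
  assert (all.sew == 16 && all.tu && all.mu);
  assert (lmul_fraction_denominator (all) == 2);
}
```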

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p):
New function.
(after_or_same_p): Ditto.
(find_reg_killed_by): Delete.
(has_vsetvl_killed_avl_p): Ditto.
(anticipatable_occurrence_p): Refactor.
(any_set_in_bb_p): Delete.
(count_regno_occurrences): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(demands_can_be_fused_p): Ditto.
(earliest_pred_can_be_fused_p): New function.
(vsetvl_dominated_by_p): Ditto.
(vector_insn_info::parse_insn): Refactor.
(vector_insn_info::merge): Refactor.
(vector_insn_info::dump): Refactor.
(vector_infos_manager::vector_infos_manager): Refactor.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Refactor.
(vector_infos_manager::free_bitmap_vectors): Refactor.
(vector_infos_manager::dump): Refactor.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Ditto.
(pass_vsetvl::get_backward_fusion_type): Delete.
(pass_vsetvl::hard_empty_block_p): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.
(pass_vsetvl::forward_demand_fusion): Ditto.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Ditto.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::vsetvl_fusion): Ditto.
(pass_vsetvl::commit_vsetvls): Refactor.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Refactor.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-27.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-28.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-29.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-30.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-35.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-36.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-50.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-51.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-67.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-69.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-70.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-71.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-72.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-76.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-77.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-93.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-94.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-96.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/ffload-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-45.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-103.c: New test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-13.c: New test.

Daily bump.

RISC-V: Fix spill-11.c testsuite failure

Jivan's work also results in using a different save/restore function for the
spill-11 test, so the expected output needs minor adjusting.

gcc/testsuite
* gcc.target/riscv/rvv/base/spill-11.c: Adjust expected output.

RISC-V: Fix spill-12 test

Jivan's recent work on IRA results in more efficient code for this test. This
adjusts the expected output for the removal of 5 instructions and conversion of
an addi into a simple mv.

gcc/testsuite
* gcc.target/riscv/rvv/base/spill-12.c: Update expected output.

RISC-V: Fix xtheadcondmov-indirect.c

The pressure sensitive scheduling change perturbs the output ever so slightly
for this test. Seemed easiest to just turn that off rather than generalize the
expected output enough to work across all the relevant optimization options.

gcc/testsuite/
* gcc.target/riscv/xtheadcondmov-indirect.c: Turn off pressure
sensitive scheduling.

analyzer: Move gcc.dg/analyzer tests to c-c++-common (1) [PR96395]

First batch of moving tests from under gcc.dg/analyzer into
c-c++-common/analyzer.

C builtins are not recognized as such by C++; therefore,
this patch no longer uses tree.h:fndecl_built_in_p to recognize
a builtin function, but rather the function names.

Thus functions named like C builtins - such as calloc, sprintf ... -
are recognized as such by the analyzer in both C and C++ sources.
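
A minimal sketch of name-based recognition, for illustration only. The
analyzer registers known_function objects with its known-function manager;
the flat table and the function known_builtin_name_p below are hypothetical
stand-ins. The names are taken from the kf_* classes listed in the ChangeLog
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of function names treated as builtins.  */
static const char *const known_names[] = {
  "alloca", "calloc", "free", "malloc", "memcpy", "memmove", "memset",
  "realloc", "sprintf", "strcat", "strchr", "strcpy", "strdup", "strlen",
  "strndup"
};

/* Return true if NAME matches a known builtin name.  Only the name is
   consulted, so the answer is the same for C and C++ sources.  */
bool
known_builtin_name_p (const char *name)
{
  for (size_t i = 0; i < sizeof known_names / sizeof known_names[0]; i++)
    if (strcmp (name, known_names[i]) == 0)
      return true;
  return false;
}

void
name_lookup_example (void)
{
  assert (known_builtin_name_p ("calloc"));
  assert (known_builtin_name_p ("sprintf"));
  assert (!known_builtin_name_p ("my_alloc"));
}
```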

For user-declared functions named after builtins, the builtin's function_decl
tree is now preferred over the function_decl the user declared, even
when the FE considers the declarations to mismatch
(Wbuiltin-declaration-mismatch emitted). This mainly matters
for the handling of function attributes: the analyzer uses
the builtin's attributes defined in gcc/builtins.def.

Signed-off-by: benjamin priour <priour.be@gmail.com>
gcc/analyzer/ChangeLog:

PR analyzer/96395
* analyzer.h (class known_function): Add virtual casts
to builtin_known_function.
(class builtin_known_function): New subclass of known_function
for builtins.
* kf.cc (class kf_alloca): Now derived from
builtin_known_function.
(class kf_calloc): Likewise.
(class kf_free): Likewise.
(class kf_malloc): Likewise.
(class kf_memcpy_memmove): Likewise.
(class kf_memset): Likewise.
(class kf_realloc): Likewise.
(class kf_strchr): Likewise.
(class kf_sprintf): Likewise.
(class kf_strcat): Likewise.
(class kf_strcpy): Likewise.
(class kf_strdup): Likewise.
(class kf_strlen): Likewise.
(class kf_strndup): Likewise.
(register_known_functions): Builtins are now registered as
known_functions by name rather than by their BUILTIN_CODE.
* known-function-manager.cc (get_normal_builtin): New overload.
* known-function-manager.h: New overload declaration.
* region-model.cc (region_model::get_builtin_kf): New function.
* region-model.h (class region_model): Add declaration of
get_builtin_kf.
* sm-fd.cc: For calls recognized as builtins, use the
attributes of that builtin as defined in gcc/builtins.def
rather than the user's.
* sm-malloc.cc (malloc_state_machine::on_stmt): Likewise.

gcc/testsuite/ChangeLog:

PR analyzer/96395
* gcc.dg/analyzer/aliasing-3.c: Moved to...
* c-c++-common/analyzer/aliasing-3.c: ...here.
* gcc.dg/analyzer/aliasing-pr106473.c: Moved to...
* c-c++-common/analyzer/aliasing-pr106473.c: ...here.
* gcc.dg/analyzer/asm-x86-dyndbg-2.c: Moved to...
* c-c++-common/analyzer/asm-x86-dyndbg-2.c: ...here.
* gcc.dg/analyzer/asm-x86-lp64-2.c: Moved to...
* c-c++-common/analyzer/asm-x86-lp64-2.c: ...here.
* gcc.dg/analyzer/atomic-builtins-haproxy-proxy.c: Moved to...
* c-c++-common/analyzer/atomic-builtins-haproxy-proxy.c: ...here.
* gcc.dg/analyzer/atomic-builtins-qemu-sockets.c: Moved to...
* c-c++-common/analyzer/atomic-builtins-qemu-sockets.c: ...here.
* gcc.dg/analyzer/attr-malloc-6.c: Moved to...
* c-c++-common/analyzer/attr-malloc-6.c: ...here.
* gcc.dg/analyzer/attr-malloc-CVE-2019-19078-usb-leak.c: Moved to...
* c-c++-common/analyzer/attr-malloc-CVE-2019-19078-usb-leak.c: ...here.
* gcc.dg/analyzer/attr-tainted_args-1.c: Moved to...
* c-c++-common/analyzer/attr-tainted_args-1.c: ...here.
* gcc.dg/analyzer/call-summaries-pr107158.c: Moved to...
* c-c++-common/analyzer/call-summaries-pr107158.c: ...here.
* gcc.dg/analyzer/calloc-1.c: Moved to...
* c-c++-common/analyzer/calloc-1.c: ...here.
* gcc.dg/analyzer/compound-assignment-5.c: Moved to...
* c-c++-common/analyzer/compound-assignment-5.c: ...here.
* gcc.dg/analyzer/coreutils-cksum-pr108664.c: Moved to...
* c-c++-common/analyzer/coreutils-cksum-pr108664.c: ...here.
* gcc.dg/analyzer/coreutils-sum-pr108666.c: Moved to...
* c-c++-common/analyzer/coreutils-sum-pr108666.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr108455-1.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr108455-1.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr108455-git-pack-revindex.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr108455-git-pack-revindex.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr108475-1.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr108475-1.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr108475-haproxy-tcpcheck.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr108475-haproxy-tcpcheck.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr109060-haproxy-cfgparse.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr109060-haproxy-cfgparse.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr109239-linux-bus.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr109239-linux-bus.c: ...here.
* gcc.dg/analyzer/deref-before-check-pr77425.c: Moved to...
* c-c++-common/analyzer/deref-before-check-pr77425.c: ...here.
* gcc.dg/analyzer/exec-1.c: Moved to...
* c-c++-common/analyzer/exec-1.c: ...here.
* gcc.dg/analyzer/feasibility-3.c: Moved to...
* c-c++-common/analyzer/feasibility-3.c: ...here.
* gcc.dg/analyzer/fields.c: Moved to...
* c-c++-common/analyzer/fields.c: ...here.
* gcc.dg/analyzer/function-ptr-5.c: Moved to...
* c-c++-common/analyzer/function-ptr-5.c: ...here.
* gcc.dg/analyzer/infinite-recursion-pr108524-1.c: Moved to...
* c-c++-common/analyzer/infinite-recursion-pr108524-1.c: ...here.
* gcc.dg/analyzer/infinite-recursion-pr108524-2.c: Moved to...
* c-c++-common/analyzer/infinite-recursion-pr108524-2.c: ...here.
* gcc.dg/analyzer/infinite-recursion-pr108524-qobject-json-parser.c: Moved to...
* c-c++-common/analyzer/infinite-recursion-pr108524-qobject-json-parser.c: ...here.
* gcc.dg/analyzer/init.c: Moved to...
* c-c++-common/analyzer/init.c: ...here.
* gcc.dg/analyzer/inlining-3-multiline.c: Moved to...
* c-c++-common/analyzer/inlining-3-multiline.c: ...here.
* gcc.dg/analyzer/inlining-3.c: Moved to...
* c-c++-common/analyzer/inlining-3.c: ...here.
* gcc.dg/analyzer/inlining-4-multiline.c: Moved to...
* c-c++-common/analyzer/inlining-4-multiline.c: ...here.
* gcc.dg/analyzer/inlining-4.c: Moved to...
* c-c++-common/analyzer/inlining-4.c: ...here.
* gcc.dg/analyzer/leak-pr105906.c: Moved to...
* c-c++-common/analyzer/leak-pr105906.c: ...here.
* gcc.dg/analyzer/leak-pr108045-with-call-summaries.c: Moved to...
* c-c++-common/analyzer/leak-pr108045-with-call-summaries.c: ...here.
* gcc.dg/analyzer/leak-pr108045-without-call-summaries.c: Moved to...
* c-c++-common/analyzer/leak-pr108045-without-call-summaries.c: ...here.
* gcc.dg/analyzer/leak-pr109059-1.c: Moved to...
* c-c++-common/analyzer/leak-pr109059-1.c: ...here.
* gcc.dg/analyzer/leak-pr109059-2.c: Moved to...
* c-c++-common/analyzer/leak-pr109059-2.c: ...here.
* gcc.dg/analyzer/malloc-2.c: Moved to...
* c-c++-common/analyzer/malloc-2.c: ...here.
* gcc.dg/analyzer/memcpy-2.c: Moved to...
* c-c++-common/analyzer/memcpy-2.c: ...here.
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c: Moved to...
* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c: ...here.
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c: Moved to...
* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c: ...here.
* gcc.dg/analyzer/null-deref-pr108806-qemu.c: Moved to...
* c-c++-common/analyzer/null-deref-pr108806-qemu.c: ...here.
* gcc.dg/analyzer/null-deref-pr108830.c: Moved to...
* c-c++-common/analyzer/null-deref-pr108830.c: ...here.
* gcc.dg/analyzer/pr101962.c: Moved to...
* c-c++-common/analyzer/pr101962.c: ...here.
* gcc.dg/analyzer/pr103217-2.c: Moved to...
* c-c++-common/analyzer/pr103217-2.c: ...here.
* gcc.dg/analyzer/pr103217.c: Moved to...
* c-c++-common/analyzer/pr103217.c: ...here.
* gcc.dg/analyzer/pr104029.c: Moved to...
* c-c++-common/analyzer/pr104029.c: ...here.
* gcc.dg/analyzer/pr104062.c: Moved to...
* c-c++-common/analyzer/pr104062.c: ...here.
* gcc.dg/analyzer/pr105783.c: Moved to...
* c-c++-common/analyzer/pr105783.c: ...here.
* gcc.dg/analyzer/pr107345.c: Moved to...
* c-c++-common/analyzer/pr107345.c: ...here.
* gcc.dg/analyzer/pr93695-1.c: Moved to...
* c-c++-common/analyzer/pr93695-1.c: ...here.
* gcc.dg/analyzer/pr94596.c: Moved to...
* c-c++-common/analyzer/pr94596.c: ...here.
* gcc.dg/analyzer/pr94839.c: Moved to...
* c-c++-common/analyzer/pr94839.c: ...here.
* gcc.dg/analyzer/pr95152-4.c: C only.
* gcc.dg/analyzer/pr95152-5.c: C only.
* gcc.dg/analyzer/pr95240.c: Moved to...
* c-c++-common/analyzer/pr95240.c: ...here.
* gcc.dg/analyzer/pr96639.c: Moved to...
* c-c++-common/analyzer/pr96639.c: ...here.
* gcc.dg/analyzer/pr96653.c: Moved to...
* c-c++-common/analyzer/pr96653.c: ...here.
* gcc.dg/analyzer/pr96792.c: Moved to...
* c-c++-common/analyzer/pr96792.c: ...here.
* gcc.dg/analyzer/pr96841.c: Moved to...
* c-c++-common/analyzer/pr96841.c: ...here.
* gcc.dg/analyzer/pr98564.c: Moved to...
* c-c++-common/analyzer/pr98564.c: ...here.
* gcc.dg/analyzer/pr98628.c: Moved to...
* c-c++-common/analyzer/pr98628.c: ...here.
* gcc.dg/analyzer/pr98969.c: Moved to...
* c-c++-common/analyzer/pr98969.c: ...here.
* gcc.dg/analyzer/pr99193-2.c: Moved to...
* c-c++-common/analyzer/pr99193-2.c: ...here.
* gcc.dg/analyzer/pr99193-3.c: Moved to...
* c-c++-common/analyzer/pr99193-3.c: ...here.
* gcc.dg/analyzer/pr99716-1.c: Moved to...
* c-c++-common/analyzer/pr99716-1.c: ...here.
* gcc.dg/analyzer/pr99774-1.c: Moved to...
* c-c++-common/analyzer/pr99774-1.c: ...here.
* gcc.dg/analyzer/realloc-1.c: Moved to...
* c-c++-common/analyzer/realloc-1.c: ...here.
* gcc.dg/analyzer/realloc-2.c: Moved to...
* c-c++-common/analyzer/realloc-2.c: ...here.
* gcc.dg/analyzer/realloc-3.c: Moved to...
* c-c++-common/analyzer/realloc-3.c: ...here.
* gcc.dg/analyzer/realloc-4.c: Moved to...
* c-c++-common/analyzer/realloc-4.c: ...here.
* gcc.dg/analyzer/realloc-5.c: Moved to...
* c-c++-common/analyzer/realloc-5.c: ...here.
* gcc.dg/analyzer/realloc-pr110014.c: Moved to...
* c-c++-common/analyzer/realloc-pr110014.c: ...here.
* gcc.dg/analyzer/snprintf-concat.c: Moved to...
* c-c++-common/analyzer/snprintf-concat.c: ...here.
* gcc.dg/analyzer/sock-1.c: Moved to...
* c-c++-common/analyzer/sock-1.c: ...here.
* gcc.dg/analyzer/sprintf-concat.c: Moved to...
* c-c++-common/analyzer/sprintf-concat.c: ...here.
* gcc.dg/analyzer/string-ops-concat-pair.c: Moved to...
* c-c++-common/analyzer/string-ops-concat-pair.c: ...here.
* gcc.dg/analyzer/string-ops-dup.c: Moved to...
* c-c++-common/analyzer/string-ops-dup.c: ...here.
* gcc.dg/analyzer/switch-enum-pr105273-git-vreportf-2.c: Moved to...
* c-c++-common/analyzer/switch-enum-pr105273-git-vreportf-2.c: ...here.
* gcc.dg/analyzer/symbolic-12.c: Moved to...
* c-c++-common/analyzer/symbolic-12.c: ...here.
* gcc.dg/analyzer/uninit-alloca.c: Moved to...
* c-c++-common/analyzer/uninit-alloca.c: ...here.
* gcc.dg/analyzer/untracked-2.c: Moved to...
* c-c++-common/analyzer/untracked-2.c: ...here.
* gcc.dg/analyzer/vasprintf-1.c: Moved to...
* c-c++-common/analyzer/vasprintf-1.c: ...here.
* gcc.dg/analyzer/write-to-const-1.c: Moved to...
* c-c++-common/analyzer/write-to-const-1.c: ...here.
* gcc.dg/analyzer/write-to-function-1.c: C only.
* gcc.dg/analyzer/write-to-string-literal-1.c: Moved to...
* c-c++-common/analyzer/write-to-string-literal-1.c: ...here.
* gcc.dg/analyzer/write-to-string-literal-4-disabled.c: Moved to...
* c-c++-common/analyzer/write-to-string-literal-4-disabled.c: ...here.
* gcc.dg/analyzer/write-to-string-literal-5.c: Moved to...
* c-c++-common/analyzer/write-to-string-literal-5.c: ...here.
* g++.dg/analyzer/analyzer.exp: Now also run tests under
c-c++-common/analyzer.
* gcc.dg/analyzer/analyzer-decls.h: Add NULL definition.
* gcc.dg/analyzer/analyzer.exp: Now also run tests under
c-c++-common/analyzer.
* gcc.dg/analyzer/pr104369-1.c: C only.
* gcc.dg/analyzer/pr104369-2.c: Likewise.
* gcc.dg/analyzer/pr93355-localealias-feasibility-2.c: Likewise.
* gcc.dg/analyzer/sprintf-1.c: Split into C-only and
C++-friendly bits.
* gcc.dg/analyzer/allocation-size-multiline-1.c: Removed.
* gcc.dg/analyzer/allocation-size-multiline-2.c: Removed.
* gcc.dg/analyzer/allocation-size-multiline-3.c: Removed.
* gcc.dg/analyzer/data-model-11.c: Removed.
* gcc.dg/analyzer/pr61861.c: C only.
* gcc.dg/analyzer/pr93457.c: Removed.
* gcc.dg/analyzer/pr97568.c: Removed.
* gcc.dg/analyzer/write-to-string-literal-4.c: Removed.
* c-c++-common/analyzer/allocation-size-multiline-1.c: New test.
* c-c++-common/analyzer/allocation-size-multiline-2.c: New test.
* c-c++-common/analyzer/allocation-size-multiline-3.c: New test.
* c-c++-common/analyzer/data-model-11.c: New test.
* c-c++-common/analyzer/pr93457.c: New test.
* c-c++-common/analyzer/pr97568.c: New test.
* c-c++-common/analyzer/sprintf-2.c: C++-friendly bit of
previous gcc.dg/analyzer/sprintf-1.c.
* c-c++-common/analyzer/write-to-string-literal-4.c: New test.

Daily bump.

Fortran: Supply a missing dereference [PR92586]

2023-08-26 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/92586
* trans-expr.cc (gfc_trans_arrayfunc_assign): Supply a missing
dereference for the call to gfc_deallocate_alloc_comp_no_caf.

gcc/testsuite/
PR fortran/92586
* gfortran.dg/pr92586.f90: New test.

RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

Consider the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li a5,32
vmv.v.x v8,a1
vl8re32.v v0,0(a0)
vid.v v16
vmslt.vv v0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addi a5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vm v8,v16,v0
vslidedown.vx v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li a0,66
ret

--param=riscv-autovec-preference=scalable

condition_reduction:
csrr a6,vlenb
mv a2,a0
li a3,32
li a0,66
srli a6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma    ----> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
slli a1,a5,2
vmv.v.x v2,a6
vmslt.vv v0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addi a5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vm v2,v3,v0
vslidedown.vx v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret

The VLA-vectorized code contains a redundant vsetvli instruction, which is a VSETVL pass issue.

The vsetvl issue is not addressed by this patch but will be fixed soon.
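
As a rough scalar model of what the vcpop.m / vcompress.vm / vslidedown.vx / vmv.x.s sequence above computes, the sketch below may help. The helper names fold_extract_last and condition_reduction_model are made up for illustration and are not functions added by this patch.

```c
#include <stddef.h>

/* Hypothetical scalar model: return the last element of VALUES whose
   MASK entry is set, or FALLBACK when no entry is set.  */
static int
fold_extract_last (int fallback, const unsigned char *mask,
                   const int *values, size_t n)
{
  /* Models vcpop.m: count the active mask elements.  */
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    count += mask[i] != 0;

  /* Models the beq on a zero count: nothing matched.  */
  if (count == 0)
    return fallback;

  /* Models vcompress.vm (pack active elements to the front) followed by
     vslidedown.vx by count - 1 and vmv.x.s (read the last packed one).  */
  size_t packed = 0;
  int last = fallback;
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      {
        if (packed == count - 1)
          last = values[i];
        packed++;
      }
  return last;
}

/* The loop from the case above, expressed through the model.  */
static int
condition_reduction_model (const int *a, int min_v)
{
  unsigned char mask[4];
  int index[4];
  for (int i = 0; i < 4; i++)
    {
      index[i] = i;            /* models vid.v */
      mask[i] = a[i] < min_v;  /* models vmslt.vv */
    }
  return fold_extract_last (66, mask, index, 4);
}
```

For an input such as {5, 1, 7, 2} with min_v = 3, the model yields index 3; when no element is below min_v it yields the start value 66.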

gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

Fix phi-opt-34.c testcase

Somehow the new testcase worked when I was testing it, but not
when I re-ran the full testsuite. Anyway, the issue
was just a simple space before the `}` of the dg-options directive.

Committed as obvious.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-34.c: Fix dg-options directive.

Daily bump.

RISC-V: Add Types to Un-Typed Sync Instructions

Updates the sync instructions to ensure that no insn is left without
a type attribute. Updates a total of 9 insns to have type "atomic"
or type "multi" based on the number of assembly instructions generated.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Update types to "multi" or
"atomic" based on the number of assembly lines generated.
* config/riscv/sync-ztso.md: Likewise.
* config/riscv/sync.md: Likewise.

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

RISC-V: Make stack_save_restore tests more robust

Spurred by Jivan's patch and a desire for cleaner test results, I went ahead and
made the stack_save_restore tests independent of the precise stack size by
using a regexp.

gcc/testsuite/
* gcc.target/riscv/stack_save_restore_1.c: Robustify.
* gcc.target/riscv/stack_save_restore_2.c: Robustify.

[committed] RISC-V: Fix minor testsuite problem with zicond

I thought I had already fixed this, but clearly if I did, I didn't include it
in any upstream commits.

With -Og the optimizers are hindered in various ways and this prevents using
zicond. So skip this test with -Og (it was already being skipped at -O0).

gcc/testsuite
* gcc.target/riscv/zicond-primitiveSemantics.c: Disable for -Og.

[PATCH v10] RISC-V: Add support for the Zfa extension

This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb

The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

What needs special explanation is:
1, According to riscv-spec, "The FCVTMOD.W.D instruction was added principally to
  accelerate the processing of JavaScript Numbers.", so it seems that no implementation
  is required.

2, The instructions FMINM and FMAXM correspond to the C23 library functions fminimum and fmaximum.
  Therefore, this patch has simply implemented the pattern of fminm<hf\sf\df>3 and
  fmaxm<hf\sf\df>3 to prepare for later.
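
For readers unfamiliar with the C23 functions mentioned in point 2, the following is a minimal sketch of the semantics as I understand them: a NaN in either operand gives NaN, and -0.0 orders below +0.0. The names fminimum_model and fmaximum_model are hypothetical; this is neither the libm implementation nor code from this patch.

```c
#include <math.h>

/* Hypothetical model of C23 fminimum for double.  */
static double
fminimum_model (double x, double y)
{
  if (isnan (x) || isnan (y))
    return NAN;                  /* a NaN operand propagates */
  if (x == 0.0 && y == 0.0)
    return signbit (x) ? x : y;  /* -0.0 orders below +0.0 */
  return x < y ? x : y;
}

/* Hypothetical model of C23 fmaximum for double.  */
static double
fmaximum_model (double x, double y)
{
  if (isnan (x) || isnan (y))
    return NAN;                  /* a NaN operand propagates */
  if (x == 0.0 && y == 0.0)
    return signbit (x) ? y : x;  /* +0.0 orders above -0.0 */
  return x > y ? x : y;
}
```

This differs from fmin/fmax, which return the non-NaN operand when exactly one operand is NaN.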

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm<mode>3): New.
(fmaxm<mode>3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New.
* config/riscv/riscv.opt: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fli-1.c: New test.
* gcc.target/riscv/zfa-fli-2.c: New test.
* gcc.target/riscv/zfa-fli-3.c: New test.
* gcc.target/riscv/zfa-fli-4.c: New test.
* gcc.target/riscv/zfa-fli-6.c: New test.
* gcc.target/riscv/zfa-fli-7.c: New test.
* gcc.target/riscv/zfa-fli-8.c: New test.

Co-authored-by: Tsukasa OI <research_trasio@irq.a4lg.com>

OpenMP: Document support for imperfectly-nested loops.

libgomp/ChangeLog
* libgomp.texi (OpenMP 5.0): Imperfectly-nested loops are done.

OpenMP: Fortran support for imperfectly-nested loops

OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.

In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.

gcc/fortran/ChangeLog
* gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(find_nested_loop_in_chain): New.
(find_nested_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(make_structured_block): New.
(restructure_intervening_code): New.
(is_outer_iteration_variable): Do not assume loops are perfectly
nested.
(check_nested_loop_in_chain): New.
(check_nested_loop_in_block_state): New.
(check_nested_loop_in_block_symbol): New.
(check_nested_loop_in_block): New.
(expr_uses_intervening_var): New.
(is_intervening_var): New.
(expr_is_invariant): Do not assume loops are perfectly nested.
(resolve_omp_do): Handle imperfectly-nested loops.
* trans-stmt.cc (gfc_trans_block_construct): Generate
OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.

gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/imperfect-gotos.f90: New.
* gfortran.dg/gomp/imperfect-invalid-scope.f90: New.
* gfortran.dg/gomp/imperfect1.f90: New.
* gfortran.dg/gomp/imperfect2.f90: New.
* gfortran.dg/gomp/imperfect3.f90: New.
* gfortran.dg/gomp/imperfect4.f90: New.
* gfortran.dg/gomp/imperfect5.f90: New.

libgomp/ChangeLog
* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
* testsuite/libgomp.fortran/imperfect1.f90: New.
* testsuite/libgomp.fortran/imperfect2.f90: New.
* testsuite/libgomp.fortran/imperfect3.f90: New.
* testsuite/libgomp.fortran/imperfect4.f90: New.
* testsuite/libgomp.fortran/target-imperfect1.f90: New.
* testsuite/libgomp.fortran/target-imperfect2.f90: New.
* testsuite/libgomp.fortran/target-imperfect3.f90: New.
* testsuite/libgomp.fortran/target-imperfect4.f90: New.

OpenMP: New C/C++ testcases for imperfectly nested loops.

gcc/testsuite/ChangeLog
* c-c++-common/gomp/imperfect-attributes.c: New.
* c-c++-common/gomp/imperfect-badloops.c: New.
* c-c++-common/gomp/imperfect-blocks.c: New.
* c-c++-common/gomp/imperfect-extension.c: New.
* c-c++-common/gomp/imperfect-gotos.c: New.
* c-c++-common/gomp/imperfect-invalid-scope.c: New.
* c-c++-common/gomp/imperfect-labels.c: New.
* c-c++-common/gomp/imperfect-legacy-syntax.c: New.
* c-c++-common/gomp/imperfect-pragmas.c: New.
* c-c++-common/gomp/imperfect1.c: New.
* c-c++-common/gomp/imperfect2.c: New.
* c-c++-common/gomp/imperfect3.c: New.
* c-c++-common/gomp/imperfect4.c: New.
* c-c++-common/gomp/imperfect5.c: New.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/imperfect1.c: New.
* testsuite/libgomp.c-c++-common/imperfect2.c: New.
* testsuite/libgomp.c-c++-common/imperfect3.c: New.
* testsuite/libgomp.c-c++-common/imperfect4.c: New.
* testsuite/libgomp.c-c++-common/imperfect5.c: New.
* testsuite/libgomp.c-c++-common/imperfect6.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect1.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect2.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect3.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect4.c: New.

OpenMP: C++ support for imperfectly-nested loops

OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements.  Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.

gcc/cp/ChangeLog
* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
* parser.cc (struct omp_for_parse_data): New.
(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
in intervening code.
(check_omp_intervening_code): New.
(cp_parser_statement_seq_opt): Special-case nested loops, blocks,
and other constructs for OpenMP loops.
(cp_parser_iteration_statement): Reject loops in intervening code.
(cp_parser_omp_for_loop_init): Expand comments and tweak the
interface slightly to better distinguish input/output parameters.
(cp_convert_omp_range_for): Likewise.
(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
and largely rewritten.  Add more comments.
(insert_structured_blocks): New.
(find_structured_blocks): New.
(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
New.
(fixup_blocks_walker): New.
(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
of a loop.  Add logic to reshuffle the bits of code collected
during parsing so intervening code gets moved to the loop body.
(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
is now redundant.
(cp_parser_omp_simd): Likewise.
(cp_parser_omp_for): Likewise.
(cp_parser_omp_distribute): Likewise.
(cp_parser_oacc_loop): Likewise.
(cp_parser_omp_taskloop): Likewise.
(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
* parser.h (struct cp_parser): Add omp_for_parse_state field.
* pt.cc (tsubst_omp_for_iterator): Adjust call to
cp_convert_omp_range_for.
* semantics.cc (finish_omp_for): Try harder to preserve location
of loop variable init expression for use in diagnostics.
(struct fofb_data, finish_omp_for_block_walker): New.
(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
nested inside BIND instead of directly in BIND itself.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
* g++.dg/gomp/attrs-imperfect1.C: New test.
* g++.dg/gomp/attrs-imperfect2.C: New test.
* g++.dg/gomp/attrs-imperfect3.C: New test.
* g++.dg/gomp/attrs-imperfect4.C: New test.
* g++.dg/gomp/attrs-imperfect5.C: New test.
* g++.dg/gomp/pr41967.C: Adjust expected error patterns.
* g++.dg/gomp/tpl-imperfect-gotos.C: New test.
* g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test.

libgomp/ChangeLog
* testsuite/libgomp.c++/attrs-imperfect1.C: New test.
* testsuite/libgomp.c++/attrs-imperfect2.C: New test.
* testsuite/libgomp.c++/attrs-imperfect3.C: New test.
* testsuite/libgomp.c++/attrs-imperfect4.C: New test.
* testsuite/libgomp.c++/attrs-imperfect5.C: New test.
* testsuite/libgomp.c++/attrs-imperfect6.C: New test.
* testsuite/libgomp.c++/imperfect-class-1.C: New test.
* testsuite/libgomp.c++/imperfect-class-2.C: New test.
* testsuite/libgomp.c++/imperfect-class-3.C: New test.
* testsuite/libgomp.c++/imperfect-destructor.C: New test.
* testsuite/libgomp.c++/imperfect-template-1.C: New test.
* testsuite/libgomp.c++/imperfect-template-2.C: New test.
* testsuite/libgomp.c++/imperfect-template-3.C: New test.

OpenMP: C front end support for imperfectly-nested loops

OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.
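
As an illustration of the kind of loop nest this enables, here is a small sketch with intervening code between the two collapsed loops. The function and macro names are invented for the example, and it is not taken from the new testcases. Without -fopenmp the pragma is ignored and the function computes the same sum serially.

```c
#define ROWS 4
#define COLS 3

/* Hypothetical example: add a per-row offset to every element and sum.  */
static int
sum_with_row_offsets (int a[ROWS][COLS], int offs[ROWS])
{
  int total = 0;

#pragma omp parallel for collapse(2) reduction(+ : total)
  for (int i = 0; i < ROWS; i++)
    {
      /* Intervening code: sits between the two collapsed loops, so the
         nest is not perfectly nested.  */
      int off = offs[i];
      for (int j = 0; j < COLS; j++)
        total += a[i][j] + off;
    }

  return total;
}
```

The inner loop bounds deliberately do not depend on anything computed in the intervening code.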

New common C/C++ testcases are in a separate patch.

gcc/c-family/ChangeLog
* c-common.h (c_omp_check_loop_binding_exprs): Declare.
* c-omp.cc: Include tree-iterator.h.
(find_binding_in_body): New.
(check_loop_binding_expr_r): New.
(LOCATION_OR): New.
(check_loop_binding_expr): New.
(c_omp_check_loop_binding_exprs): New.

gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
(struct omp_for_parse_data): New.
(check_omp_intervening_code): New.
(add_structured_block_stmt): New.
(c_parser_compound_statement_nostart): Recognize intervening code,
nested loops, and other things that need special handling in
OpenMP loop constructs.
(c_parser_while_statement): Error on loop in intervening code.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_postfix_expression_after_primary): Error on calls to
the OpenMP runtime in intervening code.
(c_parser_pragma): Error on OpenMP pragmas in intervening code.
(c_parser_omp_loop_nest): New.
(c_parser_omp_for_loop): Rewrite to use recursive descent, calling
c_parser_omp_loop_nest to do the heavy lifting.

gcc/ChangeLog
* omp-api.h: New.
* omp-general.cc (omp_runtime_api_procname): New.
(omp_runtime_api_call): Moved here from omp-low.cc, and make
non-static.
* omp-general.h: Include omp-api.h.
* omp-low.cc (omp_runtime_api_call): Delete this copy.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/collapse-1.c: Update for new C error behavior.
* c-c++-common/goacc/tile-2.c: Likewise.
* gcc.dg/gomp/collapse-1.c: Likewise.

OpenMP: Add OMP_STRUCTURED_BLOCK and GIMPLE_OMP_STRUCTURED_BLOCK.

In order to detect invalid jumps in and out of intervening code in
imperfectly-nested loops, the front ends need to insert some sort of
marker to identify the structured block sequences that they push into
the inner body of the loop. The error checking happens in the
diagnose_omp_blocks pass, between gimplification and OMP lowering, so
we need both GENERIC and GIMPLE representations of these markers.
They are removed in OMP lowering so no subsequent passes need to know
about them.

This patch doesn't include any front-end changes to generate the new
data structures.

gcc/cp/ChangeLog
* constexpr.cc (cxx_eval_constant_expression): Handle
OMP_STRUCTURED_BLOCK.
* pt.cc (tsubst_expr): Likewise.

gcc/ChangeLog
* doc/generic.texi (OpenMP): Document OMP_STRUCTURED_BLOCK.
* doc/gimple.texi (GIMPLE instruction set): Add
GIMPLE_OMP_STRUCTURED_BLOCK.
(GIMPLE_OMP_STRUCTURED_BLOCK): New subsection.
* gimple-low.cc (lower_stmt): Error on GIMPLE_OMP_STRUCTURED_BLOCK.
* gimple-pretty-print.cc (dump_gimple_omp_block): Handle
GIMPLE_OMP_STRUCTURED_BLOCK.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_structured_block): New.
* gimple.def (GIMPLE_OMP_STRUCTURED_BLOCK): New.
* gimple.h (gimple_build_omp_structured_block): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_STRUCTURED_BLOCK.
(CASE_GIMPLE_OMP): Likewise.
* gimplify.cc (is_gimple_stmt): Handle OMP_STRUCTURED_BLOCK.
(gimplify_expr): Likewise.
* omp-expand.cc (GIMPLE_OMP_STRUCTURED_BLOCK): Error on
GIMPLE_OMP_STRUCTURED_BLOCK.
* omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_STRUCTURED_BLOCK.
(lower_omp_1): Likewise.
(diagnose_sb_1): Likewise.
(diagnose_sb_2): Likewise.
* tree-inline.cc (remap_gimple_stmt): Handle
GIMPLE_OMP_STRUCTURED_BLOCK.
(estimate_num_insns): Likewise.
* tree-nested.cc (convert_nonlocal_reference_stmt): Likewise.
(convert_local_reference_stmt): Likewise.
(convert_gimple_call): Likewise.
* tree-pretty-print.cc (dump_generic_node): Handle
OMP_STRUCTURED_BLOCK.
* tree.def (OMP_STRUCTURED_BLOCK): New.
* tree.h (OMP_STRUCTURED_BLOCK_BODY): New.

RISC-V: Enable Hoist to GCSE simple constants

Hoist's want_to_gcse_p () calls rtx_cost () to compute the max distance for
hoist candidates. For a simple const (say 6, which needs a separate insn "LI 6")
the backend currently returns 0, causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insn to set up were working
fine since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 111139).
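
A minimal sketch of the kind of source this affects, assuming a simple constant stored on both arms of a branch. It is not the actual gcc.target/riscv/gcse-const.c testcase, and the names are invented for the example.

```c
int slot_a, slot_b;

/* Both arms store the same simple constant.  On RISC-V each store first
   needs the value in a register (an "li" of 6), so once the constant has
   a non-zero cost, code hoisting can consider materializing it a single
   time before the branch instead of once per arm.  */
void
store_six (int cond)
{
  if (cond)
    slot_a = 6;
  else
    slot_b = 6;
}
```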

To keep testsuite parity, some V tests that started failing in the new
costing regime need updating.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Adjust const_int
cost. Add some comments about different constants handling.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

MATCH: Move `a ? one_zero : one_zero` matching after min/max matching

In PR 106677, I noticed that on the trunk we were producing:
```
  _25 = SR.116_117 == 0;
  _27 = (unsigned char) _25;
  _32 = _27 | SR.116_117;
```
From `SR.115_117 != 0 ? SR.115_117 : 1`
Rather than:
```
  _119 = MAX_EXPR <1, SR.115_117>;
```
Or (rather)
```
  _119 = SR.115_117 | 1;
```
Due to the order of the patterns.
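
The forms above can be compared exhaustively on an 8-bit unsigned value. This is only a sketch with hypothetical helper names. It also assumes the `| 1` form is equivalent only when the value is already known to be 0 or 1, which I take to be the situation in the PR.

```c
/* `a != 0 ? a : 1`, the source form.  */
static unsigned char
form_cond (unsigned char a) { return a != 0 ? a : 1; }

/* `(a == 0) | a`, the form currently produced.  */
static unsigned char
form_eq_or (unsigned char a) { return (unsigned char) (a == 0) | a; }

/* `MAX_EXPR <1, a>`.  */
static unsigned char
form_max (unsigned char a) { return a > 1 ? a : 1; }

/* `a | 1`.  */
static unsigned char
form_or_one (unsigned char a) { return a | 1; }

/* Return 1 if the forms agree for every value in [0, LIMIT], LIMIT <= 255.
   The `a | 1` form is only included when INCLUDE_OR_ONE is nonzero.  */
static int
forms_agree (unsigned limit, int include_or_one)
{
  for (unsigned v = 0; v <= limit; v++)
    {
      unsigned char a = (unsigned char) v;
      unsigned char want = form_cond (a);
      if (form_eq_or (a) != want || form_max (a) != want)
        return 0;
      if (include_or_one && form_or_one (a) != want)
        return 0;
    }
  return 1;
}
```

Here forms_agree (255, 0) and forms_agree (1, 1) both hold, while forms_agree (255, 1) does not: for a = 2 the conditional gives 2 but `a | 1` gives 3.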

Committed as approved with the new comment and testcase.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (`a ? one_zero : one_zero`): Move
below detection of minmax.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-34.c: New test.

MATCH: `a | C -> C` when we know that `a & ~C == 0`

Even though this is handled by other code inside both VRP and CCP,
sometimes we want to optimize this outside of VRP and CCP.
An example is given in PR 106677 where phiopt will happen
after VRP (which removes a cast for a comparison) and then
phiopt will optimize the phi to be `a | 1` which can then
be optimized to `1` due to this patch.
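
The identity behind the pattern can be checked by brute force on 8-bit values. This is a sketch only; the function name is made up and has nothing to do with how match.pd implements the pattern.

```c
/* Return 1 if, for every 8-bit A and C with (A & ~C) == 0,
   A | C == C holds; return 0 on the first counterexample.  */
static int
or_with_covering_constant_is_constant (void)
{
  for (unsigned c = 0; c < 256; c++)
    for (unsigned a = 0; a < 256; a++)
      if ((a & ~c) == 0 && (a | c) != c)
        return 0;
  return 1;
}
```

In the PR 106677 case, `a` is known to be 0 or 1 and C is 1, so `a & ~1` is 0 and `a | 1` folds to 1.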

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note: similar code already exists in simplify_rtx for the RTL level;
it was moved from combine to simplify_rtx in r0-72539-gbd1ef757767f6d.

gcc/ChangeLog:

* match.pd (`a | C -> C`): New pattern.

Fortran: improve bounds checking for DATA with implied-do [PR35095]

gcc/fortran/ChangeLog:

PR fortran/35095
* data.cc (get_array_index): Add bounds-checking code and return error
status. Overindexing will be allowed as an extension for -std=legacy
and generate an error in standard-conforming mode.
(gfc_assign_data_value): Use error status from get_array_index for
graceful error recovery.

gcc/testsuite/ChangeLog:

PR fortran/35095
* gfortran.dg/data_bounds_1.f90: Adjust options to disable warnings.
* gfortran.dg/data_bounds_2.f90: New test.

fortran: Rename TRUE/FALSE to true/false in *.cc files

gcc/fortran/ChangeLog:

* match.cc (gfc_match_equivalence): Rename TRUE/FALSE to true/false.
* module.cc (check_access): Ditto.
* primary.cc (match_real_constant): Ditto.
* trans-array.cc (gfc_trans_allocate_array_storage): Ditto.
(get_array_ctor_strlen): Ditto.
* trans-common.cc (find_equivalence): Ditto.
(add_equivalences): Ditto.