git.ipfire.org Git - thirdparty/gcc.git/log

c++: merge tsubst_copy into tsubst_copy_and_build

The relationship between tsubst_copy_and_build and tsubst_copy (two of
the main template argument substitution routines for expression trees)
is rather hazy.  The former is mostly a superset of the latter, with
some differences.

The main apparent difference is their handling of various tree codes,
but much of the tree code handling in tsubst_copy appears to be dead
code.  This is because tsubst_copy mostly gets (directly) called on
id-expressions rather than on arbitrary expressions.  The interesting
tree codes are PARM_DECL, VAR_DECL, BIT_NOT_EXPR, SCOPE_REF,
TEMPLATE_ID_EXPR and IDENTIFIER_NODE:

* for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
   followed by doing some extra handling of its own
* for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
   calls (i.e. the first operand is an identifier or a type)
* for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
   refrains from doing name lookup of the terminal name

Other more minor differences are that tsubst_copy exits early when
'args' is null, and it calls maybe_dependent_member_ref, and finally
it dispatches to tsubst for type trees.[1]

Thus tsubst_copy is similar enough to tsubst_copy_and_build that it
makes sense to merge the two functions, with the main difference we
want to preserve being tsubst_copy's lack of name lookup for id-expressions.
This patch achieves this via a new tsubst flag tf_no_name_lookup which
controls name lookup and resolution of a (top-level) id-expression.

[1]: Exiting early for null 'args' doesn't seem right since it means we
return templated trees even when !processing_template_decl.  And
dispatching to tsubst for type trees muddles the distinction between
types and expressions, which makes things less clear at the call site.
So these properties of tsubst_copy don't seem worth preserving.
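
As a rough illustration of the flag pattern described above (clear the
flag on entry, remember whether it was set, and let it affect only the
top-level id-expression), here is a minimal stand-alone sketch.  The bit
values, the result struct and substitute_id are hypothetical and are not
the code in pt.cc.

```cpp
// Hypothetical miniature of the tf_no_name_lookup pattern; the bit values
// below are made up for illustration and do not match cp-tree.h.
enum subst_flags : unsigned {
  tf_none           = 0,
  tf_error          = 1u << 0,
  tf_no_name_lookup = 1u << 1
};

struct result {
  bool looked_up;              // was the top-level name looked up?
  unsigned flags_for_operands; // flags passed on to recursive substitution
};

// Stand-in for substituting into a top-level id-expression.
result substitute_id(unsigned complain) {
  // Remember whether the caller asked us to skip name lookup ...
  const bool no_name_lookup = (complain & tf_no_name_lookup) != 0;
  // ... and clear the flag so that operands get ordinary handling.
  complain &= ~tf_no_name_lookup;
  return { !no_name_lookup, complain };
}
```

With these made-up values, substitute_id (tf_error | tf_no_name_lookup)
skips the lookup and hands plain tf_error on to its operands.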

N.B. the diff for this patch looks much cleaner when generated using
the "patience diff" algorithm via Git's --patience flag.

gcc/cp/ChangeLog:

* cp-tree.h (enum tsubst_flags): Add tf_no_name_lookup.
* pt.cc (tsubst_pack_expansion): Use tsubst for substituting
BASES_TYPE.
(tsubst_decl) <case USING_DECL>: Use tsubst_name instead of
tsubst_copy.
(tsubst) <case TEMPLATE_TYPE_PARM>: Use tsubst_copy_and_build
instead of tsubst_copy for substituting
CLASS_PLACEHOLDER_TEMPLATE.
<case TYPENAME_TYPE>: Use tsubst_name instead of tsubst_copy for
substituting TYPENAME_TYPE_FULLNAME.
(tsubst_name): Define.
(tsubst_qualified_id): Use tsubst_name instead of tsubst_copy
for substituting the component name of a SCOPE_REF.
(tsubst_copy): Remove.
(tsubst_copy_and_build): Clear tf_no_name_lookup at the start,
and remember if it was set.  Call maybe_dependent_member_ref if
tf_no_name_lookup was not set.
<case IDENTIFIER_NODE>: Don't do name lookup if tf_no_name_lookup
was set.
<case TEMPLATE_ID_EXPR>: If tf_no_name_lookup was set, use
tsubst_name instead of tsubst_copy_and_build to substitute the
template and don't finish the template-id.
<case BIT_NOT_EXPR>: Handle identifier and type operand (if
tf_no_name_lookup was set).
<case SCOPE_REF>: Avoid trying to resolve a SCOPE_REF if
tf_no_name_lookup was set by calling build_qualified_name directly
instead of tsubst_qualified_id.
<case SIZEOF_EXPR>: Handling of sizeof...  copied from tsubst_copy.
<case CALL_EXPR>: Use tsubst_name instead of tsubst_copy to
substitute a TEMPLATE_ID_EXPR callee naming an unresolved template.
<case COMPONENT_REF>: Likewise to substitute the member.
<case FUNCTION_DECL>: Copied from tsubst_copy and merged with ...
<case VAR_DECL, PARM_DECL>: ... these.  Initial handling copied
from tsubst_copy.  Optimize local variable substitution by
trying retrieve_local_specialization before checking
uses_template_parms.
<case CONST_DECL>: Copied from tsubst_copy.
<case FIELD_DECL>: Likewise.
<case NAMESPACE_DECL>: Likewise.
<case OVERLOAD>: Likewise.
<case TEMPLATE_DECL>: Likewise.
<case TEMPLATE_PARM_INDEX>: Likewise.
<case TYPE_DECL>: Likewise.
<case CLEANUP_POINT_EXPR>: Likewise.
<case OFFSET_REF>: Likewise.
<case EXPR_PACK_EXPANSION>: Likewise.
<case NONTYPE_ARGUMENT_PACK>: Likewise.
<case *_CST>: Likewise.
<case *_*_FOLD_EXPR>: Likewise.
<case DEBUG_BEGIN_STMT>: Likewise.
<case CO_AWAIT_EXPR>: Likewise.
<case TRAIT_EXPR>: Use tsubst and tsubst_copy_and_build instead
of tsubst_copy.
<default>: Copied from tsubst_copy.
(tsubst_initializer_list): Use tsubst and tsubst_copy_and_build
instead of tsubst_copy.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: non-static memfn call dependence cleanup [PR106086]

In cp_parser_postfix_expression, and in the CALL_EXPR case of
tsubst_copy_and_build, we essentially repeat the type-dependent and
COMPONENT_REF callee cases of finish_call_expr. This patch deduplicates
this logic by making both spots consistently go through finish_call_expr.

This allows us to easily fix PR106086 -- which is about us neglecting to
capture 'this' when we resolve a use of a non-static member function of
the current instantiation only at lambda regeneration time -- by moving
the call to maybe_generic_this_capture from the parser to finish_call_expr
so that we consider capturing 'this' at regeneration time as well.
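
For orientation, the kind of code involved looks roughly like the sketch
below: a generic lambda inside a class template calls a non-static member
function with a dependent argument, so the call is only resolved when the
lambda is regenerated, and 'this' must be captured implicitly.  This is an
assumed illustration, not the testcase added by the patch, and it is not
claimed to reproduce the bug.

```cpp
template<class T>
struct counter {
  T value{};

  // Non-static member function of the current instantiation.
  T bump(T by) { value += by; return value; }

  T run() {
    // The argument depends on the generic lambda parameter, so the call
    // to bump needs an implicit capture of 'this' via the [&] default.
    auto step = [&](auto by) { return bump(by); };
    step(T{2});
    return step(T{3});
  }
};

// Small driver: 0 + 2 + 3 accumulated through the captured 'this'.
int demo() {
  counter<int> c;
  return c.run();
}
```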

PR c++/106086

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Consolidate three
calls to finish_call_expr, one to build_new_method_call and
one to build_min_nt_call_vec into one call to finish_call_expr.
Don't call maybe_generic_this_capture here.
* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Remove
COMPONENT_REF callee handling.
(type_dependent_expression_p): Use t_d_object_e_p instead of
t_d_e_p for COMPONENT_REF and OFFSET_REF.
* semantics.cc (finish_call_expr): In the type-dependent case,
call maybe_generic_this_capture here instead.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash127.C: Expect additional error due to
being able to check the member access expression ahead of time.
Strengthen the test by not instantiating the class template.
* g++.dg/cpp1y/lambda-generic-this5.C: New test.

c++: remove NON_DEPENDENT_EXPR, part 2

This follow-up patch removes build_non_dependent_expr (and
make_args_non_dependent) and calls thereof, no functional change.

gcc/cp/ChangeLog:

* call.cc (build_new_method_call): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
* coroutines.cc (finish_co_return_stmt): Likewise.
* cp-tree.h (build_non_dependent_expr): Remove.
(make_args_non_dependent): Remove.
* decl2.cc (grok_array_decl): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
(build_offset_ref_call_from_tree): Likewise.
* init.cc (build_new): Likewise.
* pt.cc (make_args_non_dependent): Remove.
(test_build_non_dependent_expr): Remove.
(cp_pt_cc_tests): Adjust.
* semantics.cc (finish_expr_stmt): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
(finish_for_expr): Likewise.
(finish_call_expr): Likewise.
(finish_omp_atomic): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.
(build_x_indirect_ref): Likewise.
(build_x_binary_op): Likewise.
(build_x_array_ref): Likewise.
(build_x_vec_perm_expr): Likewise.
(build_x_shufflevector): Likewise.
(build_x_unary_op): Likewise.
(cp_build_addressof): Likewise.
(build_x_conditional_expr): Likewise.
(build_x_compound_expr): Likewise.
(build_static_cast): Likewise.
(build_x_modify_expr): Likewise.
(check_return_expr): Likewise.
* typeck2.cc (build_x_arrow): Likewise.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: remove NON_DEPENDENT_EXPR, part 1

This tree code dates all the way back to r69130[1] which implemented
typing of non-dependent expressions.  Its motivation was never clear (to
me at least) since its documentation in e.g. cp-tree.def doesn't seem
accurate anymore.  build_non_dependent_expr has since gained a bunch of
edge cases about whether or how to wrap certain templated trees, making
it hard to reason about in general.

So this patch removes this tree code, and temporarily turns
build_non_dependent_expr into the identity function.  The subsequent
patch will remove build_non_dependent_expr and adjust its callers
appropriately.

We now need to more thoroughly handle templated (sub)trees in a couple
of places which previously didn't need to since they didn't look through
NON_DEPENDENT_EXPR.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2003-July/109355.html

gcc/c-family/ChangeLog:

* c-warn.cc (check_address_or_pointer_of_packed_member): Handle
type-dependent callee of CALL_EXPR.

gcc/cp/ChangeLog:

* class.cc (instantiate_type): Remove NON_DEPENDENT_EXPR
handling.
* constexpr.cc (cxx_eval_constant_expression): Likewise.
(potential_constant_expression_1): Likewise.
* coroutines.cc (coro_validate_builtin_call): Don't
expect ALIGNOF_EXPR to be wrapped in NON_DEPENDENT_EXPR.
* cp-objcp-common.cc (cp_common_init_ts): Remove
NON_DEPENDENT_EXPR handling.
* cp-tree.def (NON_DEPENDENT_EXPR): Remove.
* cp-tree.h (build_non_dependent_expr): Temporarily redefine as
the identity function.
* cvt.cc (maybe_warn_nodiscard): Handle type-dependent and
variable callee of CALL_EXPR.
* cxx-pretty-print.cc (cxx_pretty_printer::expression): Remove
NON_DEPENDENT_EXPR handling.
* error.cc (dump_decl): Likewise.
(dump_expr): Likewise.
* expr.cc (mark_use): Likewise.
(mark_exp_read): Likewise.
* pt.cc (build_non_dependent_expr): Remove.
* tree.cc (lvalue_kind): Remove NON_DEPENDENT_EXPR handling.
(cp_stabilize_reference): Likewise.
* typeck.cc (warn_for_null_address): Likewise.
(cp_build_binary_op): Handle type-dependent SIZEOF_EXPR operands.
(cp_build_unary_op) <case TRUTH_NOT_EXPR>: Don't fold inside a
template.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/var-concept3.C: Adjust expected diagnostic
for attempting to call a variable concept.

Reviewed-by: Jason Merrill <jason@redhat.com>

middle-end: don't pass loop_vinfo to vect_set_loop_condition during prolog peeling

During the refactoring I had passed loop_vinfo on to vect_set_loop_condition
during prolog peeling. This parameter is unused in most cases except in
vect_set_loop_condition_partial_vectors, where its behaviour depends on whether
loop_vinfo is NULL or not. Apparently this code expects it to be NULL and then
reads the structures from a different location.

This fixes the failing testcase which was not using the lens values determined
earlier in vectorizable_store because it was looking it up in the given
loop_vinfo instead.

gcc/ChangeLog:

PR tree-optimization/111866
* tree-vect-loop-manip.cc (vect_do_peeling): Pass null as vinfo to
vect_set_loop_condition during prolog peeling.

tree-optimization/111383 - testcase for fixed PR

PR tree-optimization/111383
PR tree-optimization/110243
gcc/testsuite/
* gcc.dg/torture/pr111383.c: New testcase.

tree-optimization/111445 - simple_iv simplification fault

The following fixes a missed check in the simple_iv attempt
to simplify (signed T)((unsigned T) base + step) where it
allows a truncating inner conversion leading to wrong code.
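
To see why a truncating inner conversion matters, the sketch below
evaluates the expression literally for a 32-bit T with a 64-bit base and
compares it with what dropping the conversions would give.  The function
names are hypothetical and this is only an arithmetic illustration, not
the code in tree-scalar-evolution.cc.

```cpp
#include <cstdint>

// (signed T)((unsigned T) base + step) with T being 32 bits wide,
// evaluated exactly as written.  The inner conversion truncates when
// base is wider than T.
constexpr std::int32_t literal_form(std::int64_t base, std::uint32_t step) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(base) + step);
}

// What one gets by treating the conversions as no-ops and adding in the
// type of base.  Only equivalent when the inner conversion merely changes
// the sign and the result stays in range.
constexpr std::int64_t simplified_form(std::int64_t base, std::uint32_t step) {
  return base + step;
}
```

For base = 5 and step = 1 both forms give 6, but for base = 4294967296
(2^32) the literal form gives 1 while the simplified form gives
4294967297.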

PR tree-optimization/111445
* tree-scalar-evolution.cc (simple_iv_with_niters):
Add missing check for a sign-conversion.

* gcc.dg/torture/pr111445.c: New testcase.

tree-optimization/110243 - IVOPTs introducing undefined overflow

The following addresses IVOPTs rewriting expressions in its
strip_offset without caring for definedness of overflow. Rather
than the earlier attempt of just using the proper
split_constant_offset from data-ref analysis the following adjusts
IVOPTs helper trying to minimize changes from this fix, possibly
easing backports.
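
The general idea of rewriting to unsigned arithmetic can be sketched as
follows.  The helper names are hypothetical and this is not the
strip_offset_1 code; it only shows that an intermediate which would
overflow in signed arithmetic is well-defined when computed in the
corresponding unsigned type.

```cpp
#include <climits>

// Computes (a + b) - b.  Done directly on int, a + b has undefined
// behaviour when it overflows, even though the final value is just a.
// Done on unsigned, the intermediate wraps and the result is well-defined.
int add_then_sub_unsigned(int a, int b) {
  unsigned ua = static_cast<unsigned>(a);
  unsigned ub = static_cast<unsigned>(b);
  // Converting an out-of-range unsigned value back to int is modular in
  // C++20 and implementation-defined (modular in GCC) before that.
  return static_cast<int>((ua + ub) - ub);
}

// INT_MAX + 1 would overflow as int; the unsigned form returns INT_MAX.
int no_overflow_example() {
  return add_then_sub_unsigned(INT_MAX, 1);
}
```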

PR tree-optimization/110243
PR tree-optimization/111336
* tree-ssa-loop-ivopts.cc (strip_offset_1): Rewrite
operations with undefined behavior on overflow to
unsigned arithmetic.

* gcc.dg/torture/pr110243.c: New testcase.
* gcc.dg/torture/pr111336.c: Likewise.

tree-optimization/111891 - fix assert in vectorizable_simd_clone_call

The following fixes the assert in vectorizable_simd_clone_call to
assert we have a vector type during transform. Whether we have
one during analysis depends on whether another SLP user decided
on the type of a constant/external already. When we end up with
a mismatch in desire the updating will fail and make vectorization
fail.

PR tree-optimization/111891
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Fix
assert.

* gfortran.dg/pr111891.f90: New testcase.

amdgcn: add -march=gfx1030 EXPERIMENTAL

Accept the architecture configure option and resolve build failures.  This is
enough to build binaries, but I've not got a device to test it on, so there
are probably runtime issues to fix.  The cache control instructions might be
unsafe (or too conservative), and the kernel metadata might be off.  Vector
reductions will need to be reworked for RDNA2.  In principle, it would be
better to use wavefrontsize32 for this architecture, but that would mean
switching everything to allow SImode masks, so wavefrontsize64 it is.

The multilib is not included in the default configuration so either configure
--with-arch=gfx1030 or include it in --with-multilib-list=gfx1030,....

The majority of this patch has no effect on other devices, but changing from
using scalar writes for the exit value to vector writes means we don't need
the scalar cache write-back instruction anywhere (which doesn't exist in RDNA2).

gcc/ChangeLog:

* config.gcc: Allow --with-arch=gfx1030.
* config/gcn/gcn-hsa.h (NO_XNACK): gfx1030 does not support xnack.
(ASM_SPEC): gfx1030 needs -mattr=+wavefrontsize64 set.
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1030.
(TARGET_GFX1030): New.
(TARGET_RDNA2): New.
* config/gcn/gcn-valu.md (@dpp_move<mode>): Disable for RDNA2.
(addc<mode>3<exec_vcc>): Add RDNA2 syntax variant.
(subc<mode>3<exec_vcc>): Likewise.
(<convop><mode><vndi>2_exec): Add RDNA2 alternatives.
(vec_cmp<mode>di): Likewise.
(vec_cmp<u><mode>di): Likewise.
(vec_cmp<mode>di_exec): Likewise.
(vec_cmp<u><mode>di_exec): Likewise.
(vec_cmp<mode>di_dup): Likewise.
(vec_cmp<mode>di_dup_exec): Likewise.
(reduc_<reduc_op>_scal_<mode>): Disable for RDNA2.
(*<reduc_op>_dpp_shr_<mode>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Recognise gfx1030.
(gcn_global_address_p): RDNA2 only allows smaller offsets.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_omp_device_kind_arch_isa): Recognise gfx1030.
(gcn_expand_epilogue): Use VGPRs instead of SGPRs.
(output_file_start): Configure gfx1030.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __RDNA2__;
(ASSEMBLER_DIALECT): New.
* config/gcn/gcn.md (rdna): New define_attr.
(enabled): Use "rdna" attribute.
(gcn_return): Remove s_dcache_wb.
(addcsi3_scalar): Add RDNA2 syntax variant.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(mulsi3): v_mul_lo_i32 should be v_mul_lo_u32 on all ISA.
(*memory_barrier): Add RDNA2 syntax variant.
(atomic_load<mode>): Add RDNA2 cache control variants, and disable
scalar atomics for RDNA2.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx1030.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(main): Recognise -march=gfx1030.
* config/gcn/t-omp-device: Add gfx1030 isa.

libgcc/ChangeLog:

* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Set false for __RDNA2__.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(isa_hsa_name): Recognise gfx1030.
(isa_code): Likewise.
* team.c (defined): Remove s_endpgm.

tree-optimization/111000 - restrict invariant motion of shifts

The following restricts moving variable shifts to when they are
always executed in the loop as we currently do not have an efficient
way to rewrite them to something that is unconditionally
well-defined and value range analysis will otherwise compute
invalid ranges for the shift operand.
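
A loop of the following shape shows the hazard in source terms.  It is a
hypothetical example, not the pr111000.c testcase: the shift is loop
invariant but only conditionally executed, so hoisting it in front of the
loop would evaluate it even when no iteration does, which is undefined
for n >= 32.

```cpp
#include <cstdint>

// Bit i of enabled_mask says whether iteration i performs the shift.
// Assumes iters <= 32 so that the mask test itself is well-defined.
std::uint32_t sum_guarded_shifts(std::uint32_t v, std::uint32_t n,
                                 std::uint32_t enabled_mask, int iters) {
  std::uint32_t acc = 0;
  for (int i = 0; i < iters; ++i)
    if ((enabled_mask >> i) & 1u)
      acc += v << n;  // invariant, but not always executed
  return acc;
}
```

Calling it with n = 40 and an all-zero mask is fine as written, since the
shift is never evaluated.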

PR tree-optimization/111000
* stor-layout.h (element_precision): Move ..
* tree.h (element_precision): .. here.
* tree-ssa-loop-im.cc (movement_possibility_1): Restrict
motion of shifts and rotates.

* gcc.dg/torture/pr111000.c: New testcase.

Control flow redundancy hardening

This patch introduces an optional hardening pass to catch unexpected
execution flows.  Functions are transformed so that basic blocks set a
bit in an automatic array, and (non-exceptional) function exit edges
check that the bits in the array represent an expected execution path
in the CFG.

Functions with multiple exit edges, or with too many blocks, call an
out-of-line checker builtin implemented in libgcc.  For simpler
functions, the verification is performed in-line.
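
A hand-written analogue of the transformation, for a function with two
paths, might look like the sketch below.  Everything in it is
hypothetical: the real pass works on GIMPLE basic blocks and edges and
uses its own checking logic, whereas this simplified version compares the
visited set against a whitelist of complete paths.

```cpp
#include <cstdint>
#include <cstdlib>

// One bit per "basic block" of the function below.
enum : std::uint32_t {
  BB_ENTRY = 1u << 0,
  BB_THEN  = 1u << 1,
  BB_ELSE  = 1u << 2,
  BB_EXIT  = 1u << 3
};

// The only expected executions are entry -> then -> exit and
// entry -> else -> exit.
constexpr bool path_is_valid(std::uint32_t visited) {
  return visited == (BB_ENTRY | BB_THEN | BB_EXIT)
      || visited == (BB_ENTRY | BB_ELSE | BB_EXIT);
}

int hardened_abs(int x) {
  std::uint32_t visited = 0;  // automatic "array" of visited bits
  visited |= BB_ENTRY;
  int r;
  if (x < 0) {
    visited |= BB_THEN;
    r = -x;
  } else {
    visited |= BB_ELSE;
    r = x;
  }
  visited |= BB_EXIT;
  if (!path_is_valid(visited))  // in-line check on the exit edge
    std::abort();
  return r;
}
```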

-fharden-control-flow-redundancy enables the pass for eligible
functions, --param hardcfr-max-blocks sets a block count limit for
functions to be eligible, and --param hardcfr-max-inline-blocks
tunes the "too many blocks" limit for in-line verification.
-fhardcfr-skip-leaf makes leaf functions non-eligible.

Additional -fhardcfr-check-* options are added to enable checking at
exception escape points, before potential sibcalls, hereby dubbed
returning calls, and before noreturn calls and exception raises.  A
notable case is the distinction between noreturn calls expected to
throw and those expected to terminate or loop forever: the default
setting for -fhardcfr-check-noreturn-calls, no-xthrow, performs
checking before the latter, but the former only gets checking in the
exception handler.  GCC can only tell them apart by explicitly marking
noreturn functions expected to raise with the newly-introduced
expected_throw attribute, and the corresponding ECF_XTHROW flag.

for  gcc/ChangeLog

* tree-core.h (ECF_XTHROW): New macro.
* tree.cc (set_call_expr): Add expected_throw attribute when
ECF_XTHROW is set.
(build_common_builtin_node): Add ECF_XTHROW to
__cxa_end_cleanup and _Unwind_Resume or _Unwind_SjLj_Resume.
* calls.cc (flags_from_decl_or_type): Check for expected_throw
attribute to set ECF_XTHROW.
* gimple.cc (gimple_build_call_from_tree): Propagate
ECF_XTHROW from decl flags to gimple call...
(gimple_call_flags): ... and back.
* gimple.h (GF_CALL_XTHROW): New gf_mask flag.
(gimple_call_set_expected_throw): New.
(gimple_call_expected_throw_p): New.
* Makefile.in (OBJS): Add gimple-harden-control-flow.o.
* builtins.def (BUILT_IN___HARDCFR_CHECK): New.
* common.opt (fharden-control-flow-redundancy): New.
(-fhardcfr-check-returning-calls): New.
(-fhardcfr-check-exceptions): New.
(-fhardcfr-check-noreturn-calls=*): New.
(Enum hardcfr_check_noreturn_calls): New.
(fhardcfr-skip-leaf): New.
* doc/invoke.texi: Document them.
(hardcfr-max-blocks, hardcfr-max-inline-blocks): New params.
* flag-types.h (enum hardcfr_noret): New.
* gimple-harden-control-flow.cc: New.
* params.opt (-param=hardcfr-max-blocks=): New.
(-param=hardcfr-max-inline-blocks=): New.
* passes.def (pass_harden_control_flow_redundancy): Add.
* tree-pass.h (make_pass_harden_control_flow_redundancy):
Declare.
* doc/extend.texi: Document expected_throw attribute.

for  gcc/ada/ChangeLog

* gcc-interface/trans.cc (gigi): Mark __gnat_reraise_zcx with
ECF_XTHROW.
(build_raise_check): Likewise for all rcheck subprograms.

for  gcc/c-family/ChangeLog

* c-attribs.cc (handle_expected_throw_attribute): New.
(c_common_attribute_table): Add expected_throw.

for  gcc/cp/ChangeLog

* decl.cc (push_throw_library_fn): Mark with ECF_XTHROW.
* except.cc (build_throw): Likewise __cxa_throw,
_ITM_cxa_throw, __cxa_rethrow.

for  gcc/testsuite/ChangeLog

* c-c++-common/torture/harden-cfr.c: New.
* c-c++-common/harden-cfr-noret-never-O0.c: New.
* c-c++-common/torture/harden-cfr-noret-never.c: New.
* c-c++-common/torture/harden-cfr-noret-noexcept.c: New.
* c-c++-common/torture/harden-cfr-noret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-noret.c: New.
* c-c++-common/torture/harden-cfr-notail.c: New.
* c-c++-common/torture/harden-cfr-returning.c: New.
* c-c++-common/torture/harden-cfr-tail.c: New.
* c-c++-common/torture/harden-cfr-abrt-always.c: New.
* c-c++-common/torture/harden-cfr-abrt-never.c: New.
* c-c++-common/torture/harden-cfr-abrt-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-abrt-nothrow.c: New.
* c-c++-common/torture/harden-cfr-abrt.c: New.
* c-c++-common/torture/harden-cfr-always.c: New.
* c-c++-common/torture/harden-cfr-never.c: New.
* c-c++-common/torture/harden-cfr-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-always.c: New.
* c-c++-common/torture/harden-cfr-bret-never.c: New.
* c-c++-common/torture/harden-cfr-bret-noopt.c: New.
* c-c++-common/torture/harden-cfr-bret-noret.c: New.
* c-c++-common/torture/harden-cfr-bret-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-bret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-retcl.c: New.
* c-c++-common/torture/harden-cfr-bret.c: New.
* g++.dg/harden-cfr-throw-always-O0.C: New.
* g++.dg/harden-cfr-throw-returning-O0.C: New.
* g++.dg/torture/harden-cfr-noret-always-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-never-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-always.C: New.
* g++.dg/torture/harden-cfr-throw-never.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow-expected.C: New.
* g++.dg/torture/harden-cfr-throw-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-nocleanup.C: New.
* g++.dg/torture/harden-cfr-throw-returning.C: New.
* g++.dg/torture/harden-cfr-throw.C: New.
* gcc.dg/torture/harden-cfr-noret-no-nothrow.c: New.
* gcc.dg/torture/harden-cfr-tail-ub.c: New.
* gnat.dg/hardcfr.adb: New.

for  libgcc/ChangeLog

* Makefile.in (LIB2ADD): Add hardcfr.c.
* hardcfr.c: New.

rtl-ssa: Don't leave NOTE_INSN_DELETED around

This patch tweaks change_insns to also call ::remove_insn to ensure the
underlying RTL insn gets removed from the insn chain in the case of a
deletion.

This avoids leaving NOTE_INSN_DELETED around after deleting insns.

For movement, the RTL insn chain is updated earlier in change_insns with
the call to move_insn. For deletion, it seems reasonable to do it here.

gcc/ChangeLog:

* rtl-ssa/changes.cc (function_info::change_insns): Ensure we call
::remove_insn on deleted insns.

Document {L,R}ROTATE_EXPR

The following amends the {L,R}SHIFT_EXPR documentation with
documentation about the {L,R}ROTATE_EXPR case.

* doc/generic.texi ({L,R}ROTATE_EXPR): Document.

SH: Fix PR 101177

Fix accidentally inverted comparison.

gcc/ChangeLog:

PR target/101177
* config/sh/sh.md (unnamed split pattern): Fix comparison of
find_regno_note result.

Rewrite more refs for epilogue vectorization

The following makes sure to rewrite all gather/scatter detected by
dataref analysis plus stmts classified as VMAT_GATHER_SCATTER. Maybe
we need to rewrite all refs; the following covers the cases I've
run into so far.

* tree-vect-loop.cc (update_epilogue_loop_vinfo): Rewrite
both STMT_VINFO_GATHER_SCATTER_P and VMAT_GATHER_SCATTER
stmt refs.

Fixup vect_get_and_check_slp_defs for gathers and .MASK_LOAD

I went a little bit too simple with implementing SLP gather support
for emulated and builtin based gathers. The following fixes the
conflict that appears when running into .MASK_LOAD where we rely
on vect_get_operand_map and the bolted-on STMT_VINFO_GATHER_SCATTER_P
checking wrecks that. The following properly integrates this with
vect_get_operand_map, adding another special index referring to
the vect_check_gather_scatter analyzed offset.

This unbreaks aarch64 (and hopefully riscv); I'll follow up with
more fixes and testsuite coverage for x86 where I think I got
masked gather SLP support wrong.

* tree-vect-slp.cc (off_map, off_op0_map, off_arg2_map,
off_arg3_arg2_map): New.
(vect_get_operand_map): Get flag whether the stmt was
recognized as gather or scatter and use the above
accordingly.
(vect_get_and_check_slp_defs): Adjust.
(vect_build_slp_tree_2): Likewise.

omp_lib.f90.in: Deprecate omp_lock_hint_* for OpenMP 5.0

The omp_lock_hint_* parameters were deprecated in favor of
omp_sync_hint_*. While omp.h contained deprecation markers for those,
the omp_lib module only contained them for omp_{g,s}_nested.

Note: The -Wdeprecated-declarations warning will only become active once
openmp_version / _OPENMP is bumped from 201511 (4.5) to 201811 (5.0).

libgomp/ChangeLog:

* omp_lib.f90.in: Tag omp_lock_hint_* as being deprecated when
_OPENMP >= 201811.

RISC-V: Rename some variables of vector_block_info[NFC]

1. Remove "m_" prefix as they are not private members.
2. Rename infos -> local_infos, info -> global_info to clarify their meaning.

Pushed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): Rename variables.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.
(pre_vsetvl::emit_vsetvl): Ditto.

ifcvt: Support bitfield lowering of multiple-exit loops

With the patch enabling the vectorization of early-breaks, we'd like to allow
bitfield lowering in such loops, which requires the relaxation of allowing
multiple exits when doing so. In order to avoid a similar issue to PR107275,
the code that rejects loops with certain types of gimple_stmts was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
to lower bitfields in loops we are not going to vectorize anyway.

This also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
shouldn't as it will never come across them. I made sure to add a comment to
make clear that there is a direct connection between the two and if we were to
enable vectorization of any other gimple statement we should make sure both
handle it.
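
A loop of the kind this is aimed at might look like the sketch below: a
bit-field read in the body plus a second exit.  The struct and function
are hypothetical and are not taken from the new tests, and no claim is
made about whether this exact loop gets vectorized.

```cpp
struct packet {
  unsigned len  : 12;
  unsigned kind : 4;
};

// Sums the 'len' bit-field until the running total exceeds 'limit'.
// The 'break' gives the loop a second exit besides i == n.
unsigned sum_len_until_limit(const packet *p, int n, unsigned limit) {
  unsigned sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += p[i].len;   // bit-field load, a candidate for lowering
    if (sum > limit)
      break;           // early exit
  }
  return sum;
}
```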

gcc/ChangeLog:

* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
(get_loop_body_in_if_conv_order): ... to here.
(if_convertible_loop_p): Remove single_exit check.
(tree_if_conversion): Move single_exit check to if-conversion part and
support multiple exits.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-8.c: New test.
* gcc.dg/vect/vect-bitfield-read-9.c: New test.

Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>

middle-end: Enable bit-field vectorization to work correctly when we're vectorizing inside conds

The bitfield vectorization support does not currently recognize bitfields inside
gconds. This means they can't be used as conditions for early break
vectorization which is a functionality we require.

This adds support for them by explicitly matching and handling gcond as a
source.
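
In source terms, a bit-field inside a gcond arises when the loop exit
condition itself reads a bit-field, as in the sketch below.  The names are
hypothetical; the actual coverage is in the vect-early-break tests
mentioned in this message.

```cpp
struct entry {
  unsigned kind : 4;
  unsigned pad  : 28;
};

// Returns the index of the first element whose 'kind' bit-field equals
// 'kind', or -1.  The comparison that controls the early exit reads the
// bit-field directly.
int find_kind(const entry *p, int n, unsigned kind) {
  for (int i = 0; i < n; ++i)
    if (p[i].kind == kind)
      return i;
  return -1;
}
```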

Testcases are added in the testsuite update patch as the only way to get there
is with the early break vectorization.   See tests:

  - vect-early-break_20.c
  - vect-early-break_21.c

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
from original statement.
(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.

Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>

Fix testcase failures raised by -mevex512 support

Hi, all

This patch aims to fix some scan-asm failures in pr89229-{5,6,7}b.c, which occur
because we emit scalar vmov{s,d} here when trying to use x/ymm 16+ without
avx512vl but with avx512f+evex512.

If no one objects to this change in behavior, we intend to resolve these
failures by modifying the testcases.

BRs,
Lin

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr89229-5b.c: Modify test.
* gcc.target/i386/pr89229-6b.c: Ditto.
* gcc.target/i386/pr89229-7b.c: Ditto.

RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

Confirmed that the dynamic LMUL algorithm works well, choosing LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

But it generates horrible register spilling.

The root cause is that we didn't hoist the vmv.v.x outside the loop, which
increases the SLP loop register pressure.

So, change the CONST_VECTOR move into a vec_duplicate splitter so that we can gain better optimizations:

1. better LICM.
2. More opportunities of transforming 'vv' into 'vx' in the future.

Before this patch:

f3:
        ble     a4,zero,.L8
        csrr    t0,vlenb
        slli    t1,t0,4
        csrr    a6,vlenb
        sub     sp,sp,t1
        csrr    a5,vlenb
        slli    a6,a6,3
        slli    a5,a5,2
        add     a6,a6,sp
        vsetvli a7,zero,e16,m8,ta,ma
        slli    a4,a4,3
        vid.v   v8
        addi    t6,a5,-1
        vand.vi v8,v8,-2
        neg     t5,a5
        vs8r.v  v8,0(sp)
        vadd.vi v8,v8,1
        vs8r.v  v8,0(a6)
        j       .L4
.L12:
        vsetvli a7,zero,e16,m8,ta,ma
.L4:
        csrr    t0,vlenb
        slli    t0,t0,3
        vl8re16.v       v16,0(sp)
        add     t0,t0,sp
        vmv.v.x v8,t6
        mv      t1,a4
        vand.vv v24,v16,v8
        mv      a6,a4
        vl8re16.v       v16,0(t0)
        vand.vv v8,v16,v8
        bleu    a4,a5,.L3
        mv      a6,a5
.L3:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v  v20,0(a2)
        vle8.v  v16,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v24
        vadd.vv v4,v16,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v  v4,0(a0)
        vle8.v  v20,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v8
        vadd.vv v4,v4,v16
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v  v4,0(a1)
        add     a4,a4,t5
        add     a0,a0,a5
        add     a3,a3,a5
        add     a1,a1,a5
        add     a2,a2,a5
        bgtu    t1,a5,.L12
        csrr    t0,vlenb
        slli    t1,t0,4
        add     sp,sp,t1
        jr      ra
.L8:
        ret

After this patch:

f3:
        ble     a4,zero,.L6
        csrr    a6,vlenb
        csrr    a5,vlenb
        slli    a6,a6,2
        slli    a5,a5,2
        addi    a6,a6,-1
        slli    a4,a4,3
        neg     t5,a5
        vsetvli t1,zero,e16,m8,ta,ma
        vmv.v.x v24,a6
        vid.v   v8
        vand.vi v8,v8,-2
        vadd.vi v16,v8,1
        vand.vv v8,v8,v24
        vand.vv v16,v16,v24
.L4:
        mv      t1,a4
        mv      a6,a4
        bleu    a4,a5,.L3
        mv      a6,a5
.L3:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v  v28,0(a2)
        vle8.v  v24,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v28,v8
        vadd.vv v4,v24,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v  v4,0(a0)
        vle8.v  v28,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v28,v16
        vadd.vv v4,v4,v24
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v  v4,0(a1)
        add     a4,a4,t5
        add     a0,a0,a5
        add     a3,a3,a5
        add     a1,a1,a5
        add     a2,a2,a5
        bgtu    t1,a5,.L4
.L6:
        ret

Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test

They all fail because of bugs in the VSETVL pass:

   10dd4:       0c707057                vsetvli zero,zero,e8,mf2,ta,ma
   10dd8:       5e06b8d7                vmv.v.i v17,13
   10ddc:       9ed030d7                vmv1r.v v1,v13
   10de0:       b21040d7                vncvt.x.x.w     v1,v1           ----> raises an illegal instruction since we don't have SEW = 8 -> SEW = 4 narrowing.
   10de4:       5e0785d7                vmv.v.v v11,v15

I have confirmed that the recent VSETVL refactor patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633231.html) fixes all of them.

So this patch should be committed after the VSETVL refactor patch.

PR target/111848

gcc/ChangeLog:

* config/riscv/riscv-selftests.cc (run_const_vector_selftests): Adapt selftest.
* config/riscv/riscv-v.cc (expand_const_vector): Change it into vec_duplicate splitter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: New test.

RISC-V: Refactor and cleanup vsetvl pass

This patch refactors and cleans up the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:

1. Introduce a virtual CFG for vsetvl infos; Phases 1, 2 and 3 only maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification and
   deletion of vsetvl insns based on the virtual CFG. A basic block in the
   virtual CFG is called vsetvl_block_info and the vsetvl information inside
   it is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unify the demand system;
   this phase only fuses local vsetvl infos in the forward direction.
3. Refactor Phase 3: the decision whether to uplift a vsetvl info to a
   predecessor basic block now uses a more unified method, namely whether
   the set of reaching vsetvl definitions contains a vsetvl info compatible
   with it.
4. Place all modifications of the RTL in Phase 4 and Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on the fully optimized vsetvl infos. Phase 5 removes the
   avl operand from the RVV instructions and removes the unused dest operand
   register from the vsetvl insns.

These modifications resulted in some testcases needing to be updated. The reasons
for updating are summarized below:

1. more optimized
   vlmax_back_prop-{25,26}.c
   vlmax_conflict-{3,12}.c/vsetvl-{13,23}.c/
   avl_single-{23,84,95}.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c
4. add some bugfix testcases.
   pr111037-{3,4}.c
   avl_single-{89,104,105,106,107,108,109}.c

PR target/111037
PR target/111234
PR target/111725

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
(debug): Removed.
(compute_reaching_defintion): New.
(enum vsetvl_type): Moved.
(vlmax_avl_p): Moved.
(enum emit_type): Moved.
(vlmul_to_str): Moved.
(vlmax_avl_insn_p): Removed.
(policy_to_str): Moved.
(loop_basic_block_p): Removed.
(valid_sew_p): Removed.
(vsetvl_insn_p): Moved.
(vsetvl_vtype_change_only_p): Removed.
(after_or_same_p): Removed.
(before_p): Removed.
(anticipatable_occurrence_p): Removed.
(available_occurrence_p): Removed.
(insn_should_be_added_p): Removed.
(get_all_sets): Moved.
(get_same_bb_set): Moved.
(gen_vsetvl_pat): Removed.
(calculate_vlmul): Moved.
(get_max_int_sew): New.
(emit_vsetvl_insn): Removed.
(get_max_float_sew): New.
(eliminate_insn): Removed.
(insert_vsetvl): Removed.
(count_regno_occurrences): Moved.
(get_vl_vtype_info): Removed.
(enum def_type): Moved.
(validate_change_or_fail): Moved.
(change_insn): Removed.
(get_all_real_uses): Moved.
(get_forward_read_vl_insn): Removed.
(get_backward_fault_first_load_insn): Removed.
(change_vsetvl_insn): Removed.
(avl_source_has_vsetvl_p): Removed.
(source_equal_p): Moved.
(calculate_sew): Removed.
(same_equiv_note_p): Moved.
(get_expr_id): New.
(incompatible_avl_p): Removed.
(get_regno): New.
(different_sew_p): Removed.
(get_bb_index): New.
(different_lmul_p): Removed.
(has_no_uses): Moved.
(different_ratio_p): Removed.
(different_tail_policy_p): Removed.
(different_mask_policy_p): Removed.
(possible_zero_avl_p): Removed.
(enum demand_flags): New.
(second_ratio_invalid_for_first_sew_p): Removed.
(second_ratio_invalid_for_first_lmul_p): Removed.
(enum class): New.
(float_insn_valid_sew_p): Removed.
(second_sew_less_than_first_sew_p): Removed.
(first_sew_less_than_second_sew_p): Removed.
(class vsetvl_info): New.
(compare_lmul): Removed.
(second_lmul_less_than_first_lmul_p): Removed.
(second_ratio_less_than_first_ratio_p): Removed.
(DEF_INCOMPATIBLE_COND): Removed.
(greatest_sew): Removed.
(first_sew): Removed.
(second_sew): Removed.
(first_vlmul): Removed.
(second_vlmul): Removed.
(first_ratio): Removed.
(second_ratio): Removed.
(vlmul_for_first_sew_second_ratio): Removed.
(vlmul_for_greatest_sew_second_ratio): Removed.
(ratio_for_second_sew_first_vlmul): Removed.
(class vsetvl_block_info): New.
(DEF_SEW_LMUL_FUSE_RULE): New.
(always_unavailable): Removed.
(avl_unavailable_p): Removed.
(class demand_system): New.
(sew_unavailable_p): Removed.
(lmul_unavailable_p): Removed.
(ge_sew_unavailable_p): Removed.
(ge_sew_lmul_unavailable_p): Removed.
(ge_sew_ratio_unavailable_p): Removed.
(DEF_UNAVAILABLE_COND): Removed.
(same_sew_lmul_demand_p): Removed.
(propagate_avl_across_demands_p): Removed.
(reg_available_p): Removed.
(support_relaxed_compatible_p): Removed.
(demands_can_be_fused_p): Removed.
(earliest_pred_can_be_fused_p): Removed.
(vsetvl_dominated_by_p): Removed.
(avl_info::avl_info): Removed.
(avl_info::single_source_equal_p): Removed.
(avl_info::multiple_source_equal_p): Removed.
(DEF_SEW_LMUL_RULE): New.
(avl_info::operator=): Removed.
(avl_info::operator==): Removed.
(DEF_POLICY_RULE): New.
(avl_info::operator!=): Removed.
(avl_info::has_non_zero_avl): Removed.
(vl_vtype_info::vl_vtype_info): Removed.
(vl_vtype_info::operator==): Removed.
(DEF_AVL_RULE): New.
(vl_vtype_info::operator!=): Removed.
(vl_vtype_info::same_avl_p): Removed.
(vl_vtype_info::same_vtype_p): Removed.
(vl_vtype_info::same_vlmax_p): Removed.
(vector_insn_info::operator>=): Removed.
(vector_insn_info::operator==): Removed.
(class pre_vsetvl): New.
(vector_insn_info::parse_insn): Removed.
(vector_insn_info::compatible_p): Removed.
(vector_insn_info::skip_avl_compatible_p): Removed.
(vector_insn_info::compatible_avl_p): Removed.
(vector_insn_info::compatible_vtype_p): Removed.
(vector_insn_info::available_p): Removed.
(vector_insn_info::fuse_avl): Removed.
(vector_insn_info::fuse_sew_lmul): Removed.
(vector_insn_info::fuse_tail_policy): Removed.
(vector_insn_info::fuse_mask_policy): Removed.
(vector_insn_info::local_merge): Removed.
(vector_insn_info::global_merge): Removed.
(vector_insn_info::get_avl_or_vl_reg): Removed.
(vector_insn_info::update_fault_first_load_avl): Removed.
(vector_insn_info::dump): Removed.
(vector_infos_manager::vector_infos_manager): Removed.
(vector_infos_manager::create_expr): Removed.
(vector_infos_manager::get_expr_id): Removed.
(vector_infos_manager::all_same_ratio_p): Removed.
(vector_infos_manager::all_avail_in_compatible_p): Removed.
(vector_infos_manager::all_same_avl_p): Removed.
(vector_infos_manager::expr_set_num): Removed.
(vector_infos_manager::release): Removed.
(vector_infos_manager::create_bitmap_vectors): Removed.
(vector_infos_manager::free_bitmap_vectors): Removed.
(vector_infos_manager::dump): Removed.
(class pass_vsetvl): Adjust.
(pass_vsetvl::get_vector_info): Removed.
(pass_vsetvl::get_block_info): Removed.
(pass_vsetvl::update_vector_info): Removed.
(pass_vsetvl::update_block_info): Removed.
(pre_vsetvl::compute_avl_def_data): New.
(pass_vsetvl::simple_vsetvl): Removed.
(pass_vsetvl::compute_local_backward_infos): Removed.
(pass_vsetvl::need_vsetvl): Removed.
(pass_vsetvl::transfer_before): Removed.
(pass_vsetvl::transfer_after): Removed.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pass_vsetvl::emit_local_forward_vsetvls): Removed.
(pass_vsetvl::prune_expressions): Removed.
(pass_vsetvl::compute_local_properties): Removed.
(pre_vsetvl::compute_lcm_local_properties): New.
(pass_vsetvl::earliest_fusion): Removed.
(pre_vsetvl::fuse_local_vsetvl_info): New.
(pass_vsetvl::vsetvl_fusion): Removed.
(pass_vsetvl::can_refine_vsetvl_p): Removed.
(pre_vsetvl::earliest_fuse_vsetvl_info): New.
(pass_vsetvl::refine_vsetvls): Removed.
(pass_vsetvl::cleanup_vsetvls): Removed.
(pass_vsetvl::commit_vsetvls): Removed.
(pass_vsetvl::pre_vsetvl): Removed.
(pass_vsetvl::get_vsetvl_at_end): Removed.
(local_avl_compatible_p): Removed.
(pass_vsetvl::local_eliminate_vsetvl_insn): Removed.
(pre_vsetvl::pre_global_vsetvl_info): New.
(get_first_vsetvl_before_rvv_insns): Removed.
(pass_vsetvl::global_eliminate_vsetvl_insn): Removed.
(pre_vsetvl::emit_vsetvl): New.
(pass_vsetvl::ssa_post_optimization): Removed.
(pre_vsetvl::cleaup): New.
(pre_vsetvl::remove_avl_operand): New.
(pass_vsetvl::df_post_optimization): Removed.
(pre_vsetvl::remove_unused_dest_operand): New.
(pass_vsetvl::init): Removed.
(pass_vsetvl::done): Removed.
(pass_vsetvl::compute_probabilities): Removed.
(pass_vsetvl::lazy_vsetvl): Adjust.
(pass_vsetvl::execute): Adjust.
* config/riscv/riscv-vsetvl.def (DEF_INCOMPATIBLE_COND): Removed.
(DEF_SEW_LMUL_RULE): New.
(DEF_SEW_LMUL_FUSE_RULE): Removed.
(DEF_POLICY_RULE): New.
(DEF_UNAVAILABLE_COND): Removed.
(DEF_AVL_RULE): New demand type.
(sew_lmul): New demand type.
(ratio_only): New demand type.
(sew_only): New demand type.
(ge_sew): New demand type.
(ratio_and_ge_sew): New demand type.
(tail_mask_policy): New demand type.
(tail_policy_only): New demand type.
(mask_policy_only): New demand type.
(ignore_policy): New demand type.
(avl): New demand type.
(non_zero_avl): New demand type.
(ignore_avl): New demand type.
* config/riscv/t-riscv: Removed riscv-vsetvl.h
* config/riscv/riscv-vsetvl.h: Removed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109773-1.c: Adjust.
* gcc.target/riscv/rvv/base/pr111037-1.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-1.c: ...here.
* gcc.target/riscv/rvv/base/pr111037-2.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-2.c: ...here.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-13.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-104.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-105.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-106.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-108.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-109.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-4.c: New test.

return edge in make_eh_edges

The need to initialize edge probabilities has made make_eh_edges
undesirably hard to use.  I suppose we don't want make_eh_edges to
initialize the probability of the newly-added edge itself, so that the
caller takes care of it, but identifying the added edge in need of
adjustments is inefficient and cumbersome.  Change make_eh_edges so
that it returns the added edge.
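
The calling pattern this enables can be sketched outside of GCC. The snippet below is only an illustration: `Edge`, `Graph`, `add_edge` and `add_edge_with_probability` are hypothetical stand-ins, not GCC's `edge` type or the real `make_eh_edges` signature. It shows why returning a handle to the new edge spares the caller a search before it sets the probability.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CFG edge; not GCC's edge type.
struct Edge {
    int src;
    int dest;
    double probability = -1.0;  // negative means "not yet initialized"
};

// Hypothetical stand-in for a CFG that owns its edges.
struct Graph {
    std::vector<Edge> edges;

    // Adds an edge and returns its index, so the caller can finish
    // initializing it without searching for it afterwards.
    std::size_t add_edge(int src, int dest) {
        edges.push_back(Edge{src, dest});
        return edges.size() - 1;
    }
};

// Caller side: the returned index identifies the new edge directly.
double add_edge_with_probability(Graph &g, int src, int dest, double p) {
    std::size_t idx = g.add_edge(src, dest);
    g.edges[idx].probability = p;
    return g.edges[idx].probability;
}

// Small self-check of the sketch.
void check_edge_sketch() {
    Graph g;
    assert(add_edge_with_probability(g, 0, 1, 0.25) == 0.25);
    assert(g.edges.size() == 1);
}
```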

for  gcc/ChangeLog

* tree-eh.cc (make_eh_edges): Return the new edge.
* tree-eh.h (make_eh_edges): Likewise.

c++: indirect change of active union member in constexpr [PR101631,PR102286]

This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.

To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extraneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.

This patch also ensures that constructors for a union field mark that
field as the active member before entering the call itself; this ensures
that modifications of the field within the constructor's body don't
cause false positives (as these will not appear to be member access
expressions). This means that we no longer need to start the lifetime of
empty union members after the constructor body completes.

As a drive-by fix, this patch also ensures that value-initialised unions
are considered to have activated their initial member for the purpose of
checking stores and accesses, which catches some additional mistakes
pre-C++20.
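
As a rough illustration of the distinction, the sketch below switches the active member through direct member access, which C++20 permits during constant evaluation. The names `U` and `switch_by_member_access` are made up for this example and are not taken from the new tests. The indirect form is left as a comment because it is expected to be rejected in a constant expression.

```cpp
#include <cassert>

union U {
    int a;
    float b;
};

// Changing the active member in a constant expression needs C++20,
// so fall back to a plain inline function for older standards.
#if __cplusplus >= 202002L
#define MAYBE_CONSTEXPR constexpr
#else
#define MAYBE_CONSTEXPR inline
#endif

MAYBE_CONSTEXPR int switch_by_member_access() {
    U u{};       // value-initialised: 'a' is the active member
    u.b = 1.5f;  // direct member access: 'b' becomes active
    u.a = 42;    // direct member access: 'a' becomes active again
    return u.a;
}

#if __cplusplus >= 202002L
static_assert(switch_by_member_access() == 42,
              "member access may change the active member");
#endif

// Expected to be rejected in a constant expression, because the store
// does not go through a member access expression:
//
//   constexpr int switch_indirectly() {
//       U u{};
//       float *p = &u.b;
//       *p = 1.5f;   // 'b' is not the active member here
//       return 0;
//   }
//   static_assert(switch_indirectly() == 0, "");

// Run-time check of the permitted form.
void check_union_sketch() {
    assert(switch_by_member_access() == 42);
}
```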

PR c++/101631
PR c++/102286

gcc/cp/ChangeLog:

* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_component_reference): Check against first member of
value-initialised union.
(cxx_eval_store_expression): Activate member for
value-initialised union. Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
* g++.dg/cpp1y/constexpr-union6.C: New test.
* g++.dg/cpp1y/constexpr-union7.C: New test.
* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.
* g++.dg/cpp2a/constexpr-union6.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Improve diagnostics for constexpr cast from void*

This patch improves the errors given when casting from void* in C++26 to
include the expected type if the types of the pointed-to objects were
not similar. It also ensures (for all standard modes) that void* casts
are checked even for DECL_ARTIFICIAL declarations, such as
lifetime-extended temporaries, and are only ignored in cases where we
know it's OK (e.g. source_location::current) or have no other choice
(heap-allocated data).

gcc/cp/ChangeLog:

* constexpr.cc (is_std_source_location_current): New.
(cxx_eval_constant_expression): Only ignore cast from void* for
specific cases and improve other diagnostics.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-cast4.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

c++: small tweak for cp_fold_r

This patch is an optimization tweak for cp_fold_r.  If we cp_fold_r the
COND_EXPR's op0 first, we may be able to evaluate it to a constant if -O.
cp_fold has:

        if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
            && !flag_no_inline)
...
            r = maybe_constant_value (x, /*decl=*/NULL_TREE,

flag_no_inline is 1 for -O0:

  if (opts->x_optimize == 0)
    {
      /* Inlining does not work if not optimizing,
         so force it not to be done.  */
      opts->x_warn_inline = 0;
      opts->x_flag_no_inline = 1;
    }

but otherwise it's 0 and cp_fold will maybe_constant_value calls to
constexpr functions.  And if it doesn't, then folding the COND_EXPR
will keep both arms, and we can avoid calling maybe_constant_value.

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_r): Don't call maybe_constant_value.

doc: Update contrib.texi

I noticed that Patrick is missing here.

gcc/ChangeLog:

* doc/contrib.texi: Add entry for Patrick Palka.

vect: Use inbranch simdclones in masked loops

This patch enables the compiler to use inbranch simdclones when generating
masked loops in autovectorization.
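
For context, a minimal sketch of source code that asks for an inbranch clone is shown below. The function names are made up for illustration. Whether the vectorizer actually picks the masked clone depends on the target and options (for example `-fopenmp-simd` plus a target with partial vector support); without OpenMP flags the pragma is simply ignored and the code runs as scalar code.

```cpp
#include <cassert>

// Requests vector clones that take an extra mask argument, so they can be
// called for a subset of lanes only.
#pragma omp declare simd inbranch
int scale(int x) {
    return 3 * x + 1;
}

// The call is conditional, so only a masked ("inbranch") clone fits here.
// In a fully masked loop the loop mask would also have to be honored.
void apply_if_positive(const int *in, int *out, int n) {
    for (int i = 0; i < n; ++i) {
        if (in[i] > 0)
            out[i] = scale(in[i]);
        else
            out[i] = 0;
    }
}

// Runs the loop on a small input and sums the results.
int sum_example() {
    int in[8] = {1, -2, 3, -4, 5, -6, 7, -8};
    int out[8];
    apply_if_positive(in, out, 8);
    int total = 0;
    for (int i = 0; i < 8; ++i)
        total += out[i];
    return total;
}

// Small self-check: scale(1) + scale(3) + scale(5) + scale(7) = 52.
void check_simd_sketch() {
    assert(sum_example() == 52);
}
```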

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
compatible with mask parameters in clone.
* tree-vect-stmts.cc (vect_build_all_ones_mask): Allow vector boolean
typed masks.
(vectorizable_simd_clone_call): Enable the use of masked clones in
fully masked loops.

vect: don't allow fully masked loops with non-masked simd clones [PR 110485]

When analyzing a loop and choosing a simdclone to use it is possible to choose
a simdclone that cannot be used 'inbranch' for a loop that can use partial
vectors. This may lead to the vectorizer deciding to use partial vectors which
are not supported for notinbranch simd clones. This patch fixes that by
disabling the use of partial vectors once a notinbranch simd clone has been
selected.

gcc/ChangeLog:

PR tree-optimization/110485
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
vectors usage if a notinbranch simdclone has been selected.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/pr110485.c: New test.

vect: Fix vect_get_smallest_scalar_type for simd clones

The vect_get_smallest_scalar_type helper function was using any argument to a
simd clone call when trying to determine the smallest scalar type that would be
vectorized. This included the function pointer type in a MASK_CALL for
instance, and would result in the wrong type being selected. Instead this
patch special cases simd_clone_call's and uses only scalar types of the
original function that get transformed into vector types.

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special case
simd clone calls and only use types that are mapped to vectors.
(simd_clone_call_p): New helper function.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
between targets with different pointer sizes.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.

parloops: Allow poly nit and bound

Teach parloops how to handle a poly nit and bound ahead of the changes to
enable non-constant simdlen.

gcc/ChangeLog:

* tree-parloops.cc (try_transform_to_exit_first_loop_alt): Accept
poly NIT and ALT_BOUND.

parloops: Copy target and optimizations when creating a function clone

SVE simd clones need to be compiled with an SVE target enabled, or the argument
types will not be created properly. To achieve this we need to copy
DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the
clones. I decided it was probably also a good idea to copy
DECL_FUNCTION_SPECIFIC_OPTIMIZATION in case the original function is meant to
be compiled with specific optimization options.

gcc/ChangeLog:

* tree-parloops.cc (create_loop_fn): Copy specific target and
optimization options to clone.

omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS

Refactor simd clone handling code ahead of support for poly simdlen.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_subparts): Remove.
(simd_clone_init_simd_arrays): Replace simd_clone_subparts with
TYPE_VECTOR_SUBPARTS.
(ipa_simd_modify_function_body): Likewise.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Likewise.
(simd_clone_subparts): Remove.

libstdc++: [_Hashtable] Do not reuse untrusted cached hash code

On merge, reuse a merged node's possibly cached hash code only if we are on the
same type of hash and this hash is stateless.

Usage of function pointers or std::function as hash functor will prevent reusing
cached hash code.

libstdc++-v3/ChangeLog

* include/bits/hashtable_policy.h
(_Hash_code_base::_M_hash_code(const _Hash&, const _Hash_node_value<>&)): Remove.
(_Hash_code_base::_M_hash_code<_H2>(const _H2&, const _Hash_node_value<>&)): Remove.
* include/bits/hashtable.h
(_M_src_hash_code<_H2>(const _H2&, const key_type&, const __node_value_type&)): New.
(_M_merge_unique<>, _M_merge_multi<>): Use latter.
* testsuite/23_containers/unordered_map/modifiers/merge.cc
(test04, test05, test06): New test cases.

c: Fix ICE when an argument was an error mark [PR100532]

In the case of convert_argument, we would return the same expression
back rather than error_mark_node after the error message about
trying to convert to an incomplete type. This causes issues in
the gimplifier trying to see if another conversion is needed.

The code here predates the revision history, so it may be that nobody
ever noticed that we should return error_mark_node.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/100532

gcc/c/ChangeLog:

* c-typeck.cc (convert_argument): After erroring out
about an incomplete type return error_mark_node.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100532-1.c: New test.

c: Don't warn about converting NULL to different sso endian [PR104822]

Just as we don't warn about converting a NULL pointer constant to a
different named address space, we should not warn about converting it to
a different sso endianness either.
This adds the simple check.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/104822

gcc/c/ChangeLog:

* c-typeck.cc (convert_for_assignment): Check for null pointer
before warning about an incompatible scalar storage order.

gcc/testsuite/ChangeLog:

* gcc.dg/sso-18.c: New test.
* gcc.dg/sso-19.c: New test.

ABOUT-GCC-NLS: add usage guidance

gcc/ChangeLog:

* ABOUT-GCC-NLS: Add usage guidance.

diagnostic: rename new permerror overloads

While checking another change, I noticed that the new permerror overloads
break gettext with "permerror used incompatibly as both
--keyword=permerror:2 --flag=permerror:2:gcc-internal-format and
--keyword=permerror:3 --flag=permerror:3:gcc-internal-format". So let's
change the name.

gcc/ChangeLog:

* diagnostic-core.h (permerror): Rename new overloads...
(permerror_opt): To this.
* diagnostic.cc: Likewise.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Adjust.

c++: use G_ instead of _

Since these strings are passed to error_at, they should be marked for
translation with G_, like other diagnostic messages, rather than _, which
forces immediate (redundant) translation. The use of N_ is less
problematic, but also imprecise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_primary_expression): Use G_.
(cp_parser_using_enum): Likewise.
* decl.cc (identify_goto): Likewise.

ada: Support new SPARK aspect Side_Effects

SPARK RM 6.1.11 introduces a new aspect Side_Effects to denote
those functions which may have output parameters, write global
variables, raise exceptions and not terminate. This adds support
for this aspect and the corresponding pragma in the frontend.

Handling of this aspect in the frontend is very similar to
the handling of aspect Extensions_Visible: both are Boolean
aspects whose expression should be static, they can be specified
on the same entities, with the same rule of inheritance from
overridden to overriding primitives for tagged types.

There is no impact on code generation.

gcc/ada/

* aspects.ads: Add aspect Side_Effects.
* contracts.adb (Add_Pre_Post_Condition)
(Inherit_Subprogram_Contract): Add support for new contract.
* contracts.ads: Update comments.
* einfo-utils.adb (Get_Pragma): Add support.
* einfo-utils.ads (Prag): Update comment.
* errout.ads: Add explain codes.
* par-prag.adb (Prag): Add support.
* sem_ch13.adb (Analyze_Aspect_Specifications)
(Check_Aspect_At_Freeze_Point): Add support.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper)
(Analyze_Subprogram_Declaration): Call new analysis procedure to
check SPARK legality rules.
(Analyze_SPARK_Subprogram_Specification): New procedure to check
SPARK legality rules. Use an explain code for the error.
(Analyze_Subprogram_Specification): Move checks to new subprogram.
This code was effectively dead, as the kind for parameters was set
to E_Void at this point to detect early references.
* sem_ch6.ads (Analyze_Subprogram_Specification): Add new
procedure.
* sem_prag.adb (Analyze_Depends_In_Decl_Part)
(Analyze_Global_In_Decl_Part): Adapt legality check to apply only
to functions without side-effects.
(Analyze_If_Present): Extract functionality in new procedure
Analyze_If_Present_Internal.
(Analyze_If_Present_Internal): New procedure to analyze given
pragma kind.
(Analyze_Pragmas_If_Present): New procedure to analyze given
pragma kind associated with a declaration.
(Analyze_Pragma): Adapt support for Always_Terminates and
Exceptional_Cases. Add support for Side_Effects. Make sure to call
Analyze_If_Present to ensure pragma Side_Effects is analyzed prior
to analyzing pragmas Global and Depends. Use explain codes for the
errors.
* sem_prag.ads (Analyze_Pragmas_If_Present): Add new procedure.
* sem_util.adb (Is_Function_With_Side_Effects): New query function
to determine if a function is a function with side-effects.
* sem_util.ads (Is_Function_With_Side_Effects): Same.
* snames.ads-tmpl: Declare new names for pragma and aspect.
* doc/gnat_rm/implementation_defined_aspects.rst: Document new aspect.
* doc/gnat_rm/implementation_defined_pragmas.rst: Document new pragma.
* gnat_rm.texi: Regenerate.

ada: Refactor code to remove GNATcheck violation

Rewrite for loop containing an exit (which violates GNATcheck
rule Exits_From_Conditional_Loops), to use a while loop
which contains the exit criteria in its condition.
Also, move special case of first time through loop, to come
before loop.

gcc/ada/

* libgnat/s-imagef.adb (Set_Image_Fixed): Refactor loop.

ada: Add pragma Annotate for GNATcheck exemptions

Exempt the GNATcheck rule "Unassigned_OUT_Parameters"
with the rationale "the OUT parameter is assigned by component".

gcc/ada/

* libgnat/s-imguti.adb (Set_Decimal_Digits): Add pragma to exempt
Unassigned_OUT_Parameters.
(Set_Floating_Invalid_Value): Likewise.

ada: Document gnatbind -Q switch

Add documentation for the -Q gnatbind switch in GNAT User's Guide and
improve gnatbind's help output for the switch to emphasize that it adds the
requested number of stacks to the secondary stack pool generated by the
binder.

gcc/ada/

* bindusg.adb (Display): Make it clear -Q adds to the number of
secondary stacks generated by the binder.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst:
Document the -Q gnatbind switch and fix references to old
runtimes.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: Seize opportunity to reuse List_Length

This patch is intended as a readability improvement. It doesn't
change the behavior of the compiler.

gcc/ada/

* sem_ch3.adb (Constrain_Array): Replace manual list length
computation by call to List_Length.

ada: Simplify "not Present" with "No"

gcc/ada/

* exp_aggr.adb (Expand_Container_Aggregate): Simplify with "No".

c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]

As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
the specific case of -Wunknown-pragmas, because that warning is issued
during preprocessing, but not by libcpp directly (it comes from the
cb_def_pragma callback). Address that by handling this pragma in
addition to libcpp pragmas during the early pragma handler.
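
A minimal sketch of the usage this is meant to support is shown below. `example_vendor_hint` is a made-up pragma name chosen because GCC does not recognize it, and `sum_to` exists only so the file has something to run. The intent is that, when compiling with `-Wunknown-pragmas` (enabled by `-Wall`), the warning is suppressed for the region between push and pop.

```cpp
#include <cassert>

// Suppress -Wunknown-pragmas only around the vendor-specific pragma.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunknown-pragmas"
// 'example_vendor_hint' is a hypothetical pragma that GCC does not know.
#pragma example_vendor_hint unroll
#pragma GCC diagnostic pop

// Ordinary code; the unknown pragma above has no effect on it.
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}

// Small self-check: 1 + 2 + 3 + 4 = 10.
void check_pragma_sketch() {
    assert(sum_to(4) == 10);
}
```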

gcc/c-family/ChangeLog:

PR c++/89038
* c-pragma.cc (handle_pragma_diagnostic_impl): Handle
-Wunknown-pragmas during early processing.

gcc/testsuite/ChangeLog:

PR c++/89038
* c-c++-common/cpp/Wunknown-pragmas-1.c: New test.

libcpp: testsuite: Add test for fixed _Pragma bug [PR82335]

This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
that is not represented elsewhere.

gcc/testsuite/ChangeLog:

PR preprocessor/82335
* c-c++-common/cpp/diagnostic-pragma-3.c: New test.

middle-end: don't create LC-SSA PHI variables for PHI nodes who dominate loop

As the testcase shows, when a PHI node dominates the loop there is no new
definition inside the loop.  As such there would be no PHI nodes to update.

When we maintain LCSSA form we create an intermediate node in between the two
loops to thread along the value.  However, later on when we update the second
loop we don't have any PHI nodes to update and so adjust_phi_and_debug_stmts
does nothing.  This leaves us with an incorrect PHI node.  Normally this is
harmless and just gets ignored.  But in the case of the vUSE chain we end up
corrupting the chain.

As such whenever a PHI node's argument dominates the loop, we should remove
the newly created PHI node after edge redirection.

The one exception to this is when the loop has been versioned.  In such cases
the versioned loop may not use the value but the second loop can.

When this happens and we add the loop guard, it can't find the original value
for use inside the guard block unless the join block has the PHI.

The next refactoring in the series moves the formation of the guard block
inside peeling itself.  Here we have all the information and wouldn't
need to re-create it later.

gcc/ChangeLog:

PR tree-optimization/111860
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Remove PHI nodes that dominate loop.

gcc/testsuite/ChangeLog:

PR tree-optimization/111860
* gcc.dg/vect/pr111860.c: New test.

tree-optimization/111131 - SLP for non-IFN gathers

The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target). That includes support for emulated gathers but also
the legacy x86 builtin path.

PR tree-optimization/111131
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.

* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one. Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.

Refactor x86 vectorized gather path

The following moves the builtin decl gather vectorization path alongside
the internal function and emulated gather vectorization paths,
simplifying the existing function down to generating the call and
required conversions to the actual argument types.  This thereby
exposes the unique support for twice as many offset
or data vector lanes.  It also makes the code path handle SLP
in principle (but SLP build needs adjustments for this, patch coming).

* tree-vect-stmts.cc (vect_build_gather_load_calls): Rename
to ...
(vect_build_one_gather_load_call): ... this.  Refactor,
inline widening/narrowing support ...
(vectorizable_load): ... here, do gather vectorization
with builtin decls along other gather vectorization.

aarch64: Generalise TFmode load/store pair patterns

This patch generalises the TFmode load/store pair patterns to TImode and
TDmode.  This brings them in line with the DXmode patterns, and uses the
same technique with separate mode iterators (TX and TX2) to allow for
distinct modes in each arm of the load/store pair.

For example, in combination with the post-RA load/store pair fusion pass
in the following patch, this improves the codegen for the following
varargs testcase involving TImode stores:

void g(void *);
int foo(int x, ...)
{
    __builtin_va_list ap;
    __builtin_va_start (ap, x);
    g(&ap);
    __builtin_va_end (ap);
}

from:

foo:
.LFB0:
stp x29, x30, [sp, -240]!
.LCFI0:
mov w9, -56
mov w8, -128
mov x29, sp
add x10, sp, 176
stp x1, x2, [sp, 184]
add x1, sp, 240
add x0, sp, 16
stp x1, x1, [sp, 16]
str x10, [sp, 32]
stp w9, w8, [sp, 40]
str q0, [sp, 48]
str q1, [sp, 64]
str q2, [sp, 80]
str q3, [sp, 96]
str q4, [sp, 112]
str q5, [sp, 128]
str q6, [sp, 144]
str q7, [sp, 160]
stp x3, x4, [sp, 200]
stp x5, x6, [sp, 216]
str x7, [sp, 232]
bl g
ldp x29, x30, [sp], 240
.LCFI1:
ret

to:

foo:
.LFB0:
stp x29, x30, [sp, -240]!
.LCFI0:
mov w9, -56
mov w8, -128
mov x29, sp
add x10, sp, 176
stp x1, x2, [sp, 184]
add x1, sp, 240
add x0, sp, 16
stp x1, x1, [sp, 16]
str x10, [sp, 32]
stp w9, w8, [sp, 40]
stp q0, q1, [sp, 48]
stp q2, q3, [sp, 80]
stp q4, q5, [sp, 112]
stp q6, q7, [sp, 144]
stp x3, x4, [sp, 200]
stp x5, x6, [sp, 216]
str x7, [sp, 232]
bl g
ldp x29, x30, [sp], 240
.LCFI1:
ret

Note that this patch isn't needed if we only use the mode
canonicalization approach in the new ldp fusion pass (since we
canonicalize T{I,F,D}mode to V16QImode), but we seem to get slightly
better performance with mode canonicalization disabled (see
--param=aarch64-ldp-canonicalize-modes in the following patch).

gcc/ChangeLog:

* config/aarch64/aarch64.md (load_pair_dw_tftf): Rename to ...
(load_pair_dw_<TX:mode><TX2:mode>): ... this.
(store_pair_dw_tftf): Rename to ...
(store_pair_dw_<TX:mode><TX2:mode>): ... this.
* config/aarch64/iterators.md (TX2): New.

aarch64, testsuite: Fix up pr71727.c

The test is trying to check that we don't use q-register stores with
-mstrict-align, so actually check specifically for that.

This is a prerequisite to avoid regressing:

scan-assembler-not "add\tx0, x0, :"

with the upcoming ldp fusion pass, as we change where the ldps are
formed such that a register is used rather than a symbolic (lo_sum)
address for the first load.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr71727.c: Adjust scan-assembler-not to
make sure we don't have q-register stores with -mstrict-align.

aarch64, testsuite: Tweak sve/pcs/args_9.c to allow stps

With the new ldp/stp pass enabled, there is a change in the codegen for
this test as follows:

        add     x8, sp, 16
        ptrue   p3.h, mul3
        str     p3, [x8]
-       str     x8, [sp, 8]
-       str     x9, [sp]
+       stp     x9, x8, [sp]
        ptrue   p3.d, vl8
        ptrue   p2.s, vl7
        ptrue   p1.h, vl6

i.e. we now form an stp that we were missing previously. This patch
adjusts the scan-assembler such that it should pass whether or not
we form the stp.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pcs/args_9.c: Adjust scan-assemblers to
allow for stp.

aarch64, testsuite: Prevent stp in lr_free_1.c

The test is looking for individual stores which are able to be merged
into stp instructions. The test currently passes -fno-schedule-fusion
-fno-peephole2, presumably to prevent these stores from being turned
into stps, but this is no longer sufficient with the new ldp/stp fusion
pass.

As such, we add --param=aarch64-stp-policy=never to prevent stps being
formed.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/lr_free_1.c: Add
--param=aarch64-stp-policy=never to dg-options.

rtl-ssa: Support inferring uses of mem in change_insns

Currently, rtl_ssa::change_insns requires all new uses and defs to be
specified explicitly.  This turns out to be rather inconvenient for
forming load pairs in the new aarch64 load pair pass, as the pass has to
determine which mem def the final load pair consumes, and then obtain or
create a suitable use (i.e. significant bookkeeping, just to keep the
RTL-SSA IR consistent).  It turns out to be much more convenient to
allow change_insns to infer which def is consumed and create a suitable
use of mem itself.  This patch does that.

gcc/ChangeLog:

* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
parameter to give final insn position, infer use of mem if it isn't
specified explicitly.
(function_info::change_insns): Pass down final insn position to
finalize_new_accesses.
* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.

rtl-ssa: Add entry point to allow re-parenting uses

This is needed by the upcoming aarch64 load pair pass, as it can
re-order stores (when alias analysis determines this is safe) and thus
change which mem def a given use consumes (in the RTL-SSA view, there is
no alias disambiguation of memory).

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::reparent_use): New.
* rtl-ssa/functions.h (function_info): Declare new member
function reparent_use.

rtl-ssa: Add drop_memory_access helper

Add a helper routine to access-utils.h which removes the memory access
from an access_array, if it has one.

gcc/ChangeLog:

* rtl-ssa/access-utils.h (drop_memory_access): New.

rtl-ssa: Fix bug in function_info::add_insn_after

In the case that !insn->is_debug_insn () && next->is_debug_insn (), this
function was missing an update of the prev pointer on the first nondebug
insn following the sequence of debug insns starting at next.

This can lead to corruption of the insn chain, in that we end up with:

insn->next_any_insn ()->prev_any_insn () != insn

in this case. This patch fixes that.

gcc/ChangeLog:

* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we
update the prev pointer on the following nondebug insn in the
case that !insn->is_debug_insn () && next->is_debug_insn ().

x86: Correct ISA enabled for clients since Arrow Lake

gcc/ChangeLog:

* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
Also make Clearwater Forest depend on Sierra Forest.
* config/i386/i386-options.cc: Revise the order of the macro
definition to avoid confusion.
* doc/extend.texi: Revise documentation.
* doc/invoke.texi: Correct documentation.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Group Clearwater Forest
with atom cores.

amdgcn: deprecate Fiji device and multilib

LLVM wants to remove it, which breaks our build. This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.

I'm selecting gfx900 as the new default as that's the least likely for users
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.

gcc/ChangeLog:

* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.

LoongArch: Implement the new vector cost model framework.

This patch makes LoongArch use the new vector hooks and implements the costing
function determine_suggested_unroll_factor, so that it can suggest the
unroll factor for a given loop being vectorized, based on the vec_ops analysis
done during vector costing and the available issue information. It follows the
aarch64 and rs6000 ports.

The patch also reduces the cost of unaligned stores, making it equal to the
cost of aligned ones in order to avoid odd alignment peeling.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from
vector_costs. Add a constructor.
(loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to
adjust the cost for inner loops.
(loongarch_vector_costs::count_operations): New function.
(loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
(loongarch_vector_costs::finish_cost): Ditto.
(loongarch_builtin_vectorization_cost): Adjust.
* config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
(loongarch-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* config/loongarch/genopts/loongarch.opt.in
(loongarch-vect-unroll-limit): Ditto.
(loongarch-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* doc/invoke.texi (loongarch-vect-unroll-limit): Document new option.

LoongArch: Implement vec_widen standard names.

Add support for vec_widen lo/hi patterns. These do not directly
match on LoongArch LASX instructions but can be emulated with
even/odd + vector merge.
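
The emulation can be modelled in scalar C as follows. This is only a sketch: the function names and the 8-lane vector width are made up for illustration, "lo" is taken to mean the low-numbered lanes, and any target-specific lane layout is ignored. It is not the code in loongarch_expand_vec_widen_hilo.

```c
#include <assert.h>
#include <stdint.h>

#define N 8   /* narrow lanes per vector in this model (assumption) */

/* Widening multiply of the even-numbered (start == 0) or odd-numbered
   (start == 1) narrow lanes.  Hypothetical helper.  */
static void
widen_mult_stride2 (const int32_t a[N], const int32_t b[N], int start,
                    int64_t out[N / 2])
{
  assert (start == 0 || start == 1);
  for (int i = 0; i < N / 2; i++)
    out[i] = (int64_t) a[2 * i + start] * b[2 * i + start];
}

/* Merge step: interleave the low (high == 0) or high (high != 0) halves
   of the even and odd results to get the products of lanes 0..3 or 4..7.  */
void
widen_mult_half (const int32_t a[N], const int32_t b[N], int high,
                 int64_t out[N / 2])
{
  int64_t even[N / 2], odd[N / 2];
  widen_mult_stride2 (a, b, 0, even);
  widen_mult_stride2 (a, b, 1, odd);

  int base = high ? N / 4 : 0;
  for (int i = 0; i < N / 4; i++)
    {
      out[2 * i] = even[base + i];
      out[2 * i + 1] = odd[base + i];
    }
}
```

Calling widen_mult_half with high == 0 yields the widened products of lanes 0 to 3, and with high != 0 those of lanes 4 to 7, which is what a lo/hi pattern is expected to produce in this model.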

gcc/ChangeLog:

* config/loongarch/lasx.md
(vec_widen_<su>mult_even_v8si): New patterns.
(vec_widen_<su>add_hi_<mode>): Ditto.
(vec_widen_<su>add_lo_<mode>): Ditto.
(vec_widen_<su>sub_hi_<mode>): Ditto.
(vec_widen_<su>sub_lo_<mode>): Ditto.
(vec_widen_<su>mult_hi_<mode>): Ditto.
(vec_widen_<su>mult_lo_<mode>): Ditto.
* config/loongarch/loongarch.md (u_bool): New iterator.
* config/loongarch/loongarch-protos.h
(loongarch_expand_vec_widen_hilo): New prototype.
* config/loongarch/loongarch.cc
(loongarch_expand_vec_interleave): New function.
(loongarch_expand_vec_widen_hilo): New function.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-widen-add.c: New test.
* gcc.target/loongarch/vect-widen-mul.c: New test.
* gcc.target/loongarch/vect-widen-sub.c: New test.

LoongArch: Implement avg and sad standard names.

gcc/ChangeLog:

* config/loongarch/lasx.md
(avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv32qi): Ditto.
(ssadv32qi): Ditto.
* config/loongarch/lsx.md
(avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv16qi): Ditto.
(ssadv16qi): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/avg-ceil-lasx.c: New test.
* gcc.target/loongarch/avg-ceil-lsx.c: New test.
* gcc.target/loongarch/avg-floor-lasx.c: New test.
* gcc.target/loongarch/avg-floor-lsx.c: New test.
* gcc.target/loongarch/sad-lasx.c: New test.
* gcc.target/loongarch/sad-lsx.c: New test.
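
For reference, a scalar sketch of what the unsigned variants of these standard names compute per element, going by the usual meaning of the avg floor/ceil and sum-of-absolute-differences patterns. The function names are hypothetical, and usad_model collapses the vector accumulator into a single scalar for brevity. The signed variants are analogous.

```c
#include <assert.h>
#include <stdint.h>

/* Average without intermediate truncation, rounding down.  */
uint8_t
uavg_floor_model (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + b) >> 1);
}

/* Average without intermediate truncation, rounding up.  */
uint8_t
uavg_ceil_model (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + b + 1) >> 1);
}

/* Sum of absolute differences of N byte pairs, added to ACC.  */
uint32_t
usad_model (const uint8_t *a, const uint8_t *b, int n, uint32_t acc)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    acc += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
  return acc;
}
```

The widening to unsigned before the addition is the point of the avg patterns: 255 and 255 average to 255 rather than wrapping.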

Daily bump.

Fix expansion of `(a & 2) != 1`

I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes, as the above form would have been optimized away.
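
A minimal illustration of the form in question (the function name is hypothetical and this is not the PR111863 testcase): `a & 2` is either 0 or 2, so the comparison against 1 is always true. If the `& 2` were dropped, the test would become `a != 1`, which is false for a == 1.

```c
#include <assert.h>

/* Hypothetical helper, not the testcase from the PR.
   a & 2 can only be 0 or 2, so it never equals 1.  */
int
bit_test_ne_1 (int a)
{
  int masked = a & 2;
  assert (masked == 0 || masked == 2);
  return masked != 1;
}
```

The interesting input is a == 1: a correct expansion returns 1 for it, while an expansion that lost the mask would return 0.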

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/111863

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't overwrite arg0
when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111863-1.c: New test.

[c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error

When checking whether a function declaration has a conflict due to
promotions, there is no test to see whether the type is an error mark before
calling c_type_promotes_to. c_type_promotes_to is not ready for error_mark and
causes an ICE.

This adds a check for error before the call of c_type_promotes_to.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101364

gcc/c/ChangeLog:

* c-decl.cc (diagnose_arglist_conflict): Test for
error mark before calling c_type_promotes_to.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101364-1.c: New test.

Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node

This is a simple error recovery issue introduced when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for error operand for its
arguments before getting the main variant.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101285

gcc/c/ChangeLog:

* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
operands early.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101285-1.c: New test.

PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.

gcc/ChangeLog:
PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
chooses base element from arg, ensure that it's a natural stepped
sequence.
(build_vec_cst_rand): New param natural_stepped and use it to
construct a naturally stepped sequence.
(test_nunits_min_2): Add new unit tests Case 6 and Case 7.

pru: Implement TARGET_INSN_COST

This patch slightly improves the embench-iot benchmark score for
PRU code size.  There is also small improvement in a few real-world
firmware programs.

  Embench-iot size
  ------------------------------------------
  Benchmark          before   after    delta
  ---------           ----    ----     -----
  aha-mont64          4.15    4.15         0
  crc32               6.04    6.04         0
  cubic              21.64   21.62     -0.02
  edn                 6.37    6.37         0
  huffbench          18.63   18.55     -0.08
  matmult-int         5.44    5.44         0
  md5sum             25.56   25.43     -0.13
  minver             12.82   12.76     -0.06
  nbody              15.09   14.97     -0.12
  nettle-aes          4.75    4.75         0
  nettle-sha256       4.67    4.67         0
  nsichneu            3.77    3.77         0
  picojpeg            4.11    4.11         0
  primecount          7.90    7.90         0
  qrduino             7.18    7.16     -0.02
  sglib-combined     13.63   13.59     -0.04
  slre                5.19    5.19         0
  st                 14.23   14.12     -0.11
  statemate           2.34    2.34         0
  tarfind            36.85   36.64     -0.21
  ud                 10.51   10.46     -0.05
  wikisort            7.44    7.41     -0.03
  ---------          -----   -----
  Geometric mean      8.42    8.40     -0.02
  Geometric SD        2.00    2.00         0
  Geometric range    12.68   12.62     -0.06

gcc/ChangeLog:

* config/pru/pru.cc (pru_insn_cost): New function.
(TARGET_INSN_COST): Define for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

LibF7: Implement mul_mant for devices without MUL instruction.

libgcc/config/avr/libf7/
* libf7-asm.sx (mul_mant): Implement for devices without MUL.
* asm-defs.h (wmov) [!HAVE_MUL]: Fix regno computation.
* t-libf7 (F7_ASM_FLAGS): Add -g0.

aarch64: Replace duplicated selftests

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
Test <= instead of testing < twice.

cse: Workaround GCC < 5 bug in cse_insn [PR111852]

Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile
cse.cc which contains a variable with rtx_def type, because rtx_def
contains a union with poly_uint16 element. poly_int template has
defaulted default constructor and a variadic template constructor which
could have empty parameter pack. GCC < 5 treated it as non-trivially
constructible class and deleted rtunion and rtx_def default constructors.

For the cse_insn purposes, all we need is a variable with size and alignment
of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and
fill in like rtx is normally allocated from heap, so this patch for
GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.

2023-10-18 Jakub Jelinek <jakub@redhat.com>

PR bootstrap/111852
* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
using rtx_def type for memory_extend_buf, use unsigned char
array with size of rtx_def and its alignment.

diagnostic: add permerror variants with opt

In the discussion of promoting some pedwarns to be errors by default, rather
than move them all into -fpermissive it seems to me to make sense to support
DK_PERMERROR with an option flag. This way will also work with
-fpermissive, but users can also still use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.

So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header
with various flags. The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.

This seems similar to but a bit better than the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically, in
that now -w by itself is not enough to silence the -Wnarrowing
error (integer-pack2.C).

gcc/ChangeLog:

* doc/invoke.texi: Move -fpermissive to Warning Options.
* diagnostic.cc (update_effective_level_from_pragmas): Remove
redundant system header check.
(diagnostic_report_diagnostic): Move down syshdr/-w check.
(diagnostic_impl): Handle DK_PERMERROR with an option number.
(permerror): Add new overloads.
* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

* g++.dg/ext/integer-pack2.C: Add -fpermissive.
* g++.dg/diagnostic/sys-narrow.h: New test.
* g++.dg/diagnostic/sys-narrow1.C: New test.
* g++.dg/diagnostic/sys-narrow1a.C: New test.
* g++.dg/diagnostic/sys-narrow1b.C: New test.
* g++.dg/diagnostic/sys-narrow1c.C: New test.
* g++.dg/diagnostic/sys-narrow1d.C: New test.
* g++.dg/diagnostic/sys-narrow1e.C: New test.
* g++.dg/diagnostic/sys-narrow1f.C: New test.
* g++.dg/diagnostic/sys-narrow1g.C: New test.
* g++.dg/diagnostic/sys-narrow1h.C: New test.
* g++.dg/diagnostic/sys-narrow1i.C: New test.

OpenMP: Avoid ICE with LTO and 'omp allocate'

gcc/ChangeLog:

* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
to prevent the auxiliary statement list from reaching LTO.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-13a.f90: New test.

tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845]

GCC ICEs on the first testcase.  Successful match_uaddc_usubc ends up with
some dead stmts which DCE will (hopefully) remove later on.
The ICE is because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
  /* Remove some statements which can't be kept in the IL because they
     use SSA_NAME whose setter is going to be removed too.  */
section for the same reason (the reason for the freed SSA_NAMEs is that
we don't really have a replacement for those cases - all we have after
a match is combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try
to match x = op1 + op2 + carry1 + carry2; or
x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but
left around the 2 temporary stmts as dead; if we were unlucky enough that
those referenced the carry flag that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.

While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one, we supported e.g.
x = ((op1 + op2) + op3) + op4;
x = op1 + ((op2 + op3) + op4);
but not
x = (op1 + (op2 + op3)) + op4;
x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what
we were searching for (if non-NULL) - rhs[0] is inspected in the first
loop and has different handling for the MINUS_EXPR case.
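
To make the limb terminology concrete, here is a hypothetical 3-limb addition written with __builtin_add_overflow (not one of the PR111845 testcases). The middle limb produces two carry bits, and the most significant limb sums four operands, here in one of the newly supported associations. Whether GCC matches this exact source to its uaddc patterns is not something the sketch claims.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 3-limb numbers, least significant limb first.
   Hypothetical example for illustration only.  */
void
add3 (const uint64_t a[3], const uint64_t b[3], uint64_t r[3])
{
  /* Limb 0: one addition, one carry out.  */
  uint64_t c0 = __builtin_add_overflow (a[0], b[0], &r[0]);

  /* Limb 1: two additions, hence two carry bits; at most one is set.  */
  uint64_t t;
  uint64_t c1 = __builtin_add_overflow (a[1], b[1], &t);
  uint64_t c2 = __builtin_add_overflow (t, c0, &r[1]);
  assert (c1 + c2 <= 1);

  /* Most significant limb: overflow is ignored, so the four operands
     are simply summed, written as (op1 + (op2 + op3)) + op4.  */
  r[2] = (a[2] + (b[2] + c1)) + c2;
}
```

Since unsigned addition is associative and commutative, every parenthesization of the last statement computes the same value; the patch is about recognizing more of them.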

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/111845
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
statements for the 4 operand addition or subtraction of 3 operands
from 1 operand cases and remove them when successful.  Look for
nested additions even from rhs[2], not just rhs[1].

* gcc.dg/pr111845.c: New test.
* gcc.target/i386/pr111845.c: New test.

nvptx: Use fatal_error when -march= is missing not an assert [PR111093]

gcc/ChangeLog:

PR target/111093
* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error
instead of an assert ICE when no -march= has been specified.

Darwin: Check as for .build_version support and use it if available.

This adds support for the minimum OS version data in assembler files.
At present, we have no mechanism to detect the SDK version in use, and
so that is omitted from build_versions.

We follow the implementation in clang: '.build_version' is only emitted
(where supported) for target macOS versions >= 10.14. For earlier macOS
we fall back to using a '.macosx_version_min' directive. The latter is
also emitted when the assembler supports it but not '.build_version'.
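
The selection described above can be sketched as a small decision function. This is one reading of the commit message, with hypothetical names throughout; it is not the code in darwin_file_start.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they do not appear in config/darwin.cc.  */
enum version_directive
{
  DIRECTIVE_NONE,
  DIRECTIVE_MACOSX_VERSION_MIN,
  DIRECTIVE_BUILD_VERSION
};

/* Pick the directive for a target macOS version MAJOR.MINOR given what
   the assembler was found to support at configure time.  */
enum version_directive
choose_version_directive (int major, int minor,
                          bool as_has_build_version,
                          bool as_has_version_min)
{
  assert (major >= 10 && minor >= 0);
  bool at_least_10_14 = major > 10 || (major == 10 && minor >= 14);

  if (as_has_build_version && at_least_10_14)
    return DIRECTIVE_BUILD_VERSION;
  if (as_has_version_min)
    return DIRECTIVE_MACOSX_VERSION_MIN;
  return DIRECTIVE_NONE;
}
```

With both directives supported, 10.13 selects '.macosx_version_min' and 10.14 or later selects '.build_version'.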

gcc/ChangeLog:

* config.in: Regenerate.
* config/darwin.cc (darwin_file_start): Add assembler directives
for the target OS version, where these are supported by the
assembler.
(darwin_override_options): Check for building >= macOS 10.14.
* configure: Regenerate.
* configure.ac: Check for assembler support of .build_version
directives.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

ifcvt: rewrite args handling to remove lookups

This refactors the code to remove the args cache and index lookups
in favor of a single structure. It also, again, removes the use of
std::sort as previously requested, but avoids the new asserts in
trunk.

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
(typedef struct ifcvt_arg_entry): New.
(cmp_arg_entry): New.
(gen_phi_arg_condition, gen_phi_nest_statement,
predicate_scalar_phi): Use them.

AArch64: Rewrite simd move immediate patterns to new syntax

This rewrites the simd MOV patterns to use the new compact syntax.
No change in semantics is expected. This will be needed in follow on patches.

This also merges the splits into the define_insn which will also be needed soon.

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Rewrite to new syntax.
(*aarch64_simd_mov<VQMOV:mode>): Rewrite to new syntax and merge in
splits.

middle-end: ifcvt: Allow any const IFN in conditional blocks

When ifcvt was initially added masking was not a thing and as such it was
rather conservative in what it supported.

For builtins it only allowed C99 builtin functions which it knew it can fold
away.

These days the vectorizer is able to deal with needing to mask IFNs itself.
vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after
the operation to emulate the masking.

This is then used by match.pd to convert the IFN into a masked variant if it's
available.

For these reasons the restriction in ifconvert is no longer required and we
needlessly block vectorization when we can effectively handle the operations.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Note: This patch is part of a patch series and tests for it are added in the
AArch64 patch that adds support for the optab.

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.

middle-end: Fold vec_cond into conditional ternary or binary operation when sharing operand [PR109154]

When we have a vector conditional on a masked target which is doing a selection
on the result of a conditional operation where one of the operands of the
conditional operation is the other operand of the select, then we can fold the
vector conditional into the operation.

Concretely this transforms

  c = mask1 ? (masked_op mask2 a b) : b

into

  c = masked_op (mask1 & mask2) a b

The mask is then propagated upwards by the compiler.  In the SVE case we don't
end up needing a mask AND here since `mask2` will end up in the instruction
creating `mask` which gives us a natural &.

Such transformations are more common now in GCC 13+ as PRE has now started
unsharing common code when that can make one branch fully independent.

e.g. in this case `b` becomes a loop invariant value after PRE.

This transformation removes the extra select for masked architectures but
doesn't fix the general case.
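
The equivalence can be checked lane by lane with a scalar model. The sketch below uses addition as the masked operation and assumes that an inactive lane of the masked operation yields b, i.e. the same value the outer select falls back to. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar model of one lane of a masked addition.  Assumption: an
   inactive lane yields b.  */
static int
masked_add (bool mask, int a, int b)
{
  return mask ? a + b : b;
}

/* c = mask1 ? (masked_op mask2 a b) : b  */
int
select_then_op (bool mask1, bool mask2, int a, int b)
{
  return mask1 ? masked_add (mask2, a, b) : b;
}

/* c = masked_op (mask1 & mask2) a b  */
int
folded_op (bool mask1, bool mask2, int a, int b)
{
  return masked_add (mask1 && mask2, a, b);
}

/* Compare both forms for all four mask combinations.  */
void
check_lane (int a, int b)
{
  for (int m1 = 0; m1 <= 1; m1++)
    for (int m2 = 0; m2 <= 1; m2++)
      assert (select_then_op (m1, m2, a, b) == folded_op (m1, m2, a, b));
}
```

Under that assumption both forms agree for every mask combination, which is why the extra select is redundant.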

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new cond_op rule.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/pre_cond_share_1.c: New test.

LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

During the review of an LLVM change [1], on LA464 we found that zeroing
an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.

Re-instantiate integer mask to traditional vector mask support

The following allows passing integer mask data as a traditional
vector mask for OMP SIMD clone calls, which is required due to
the limited set of OMP SIMD clones in the x86 ABI when using
AVX512 but a preferred vector size of 256 bits.

* tree-vect-stmts.cc (vectorizable_simd_clone_call):
Relax check to again allow passing integer mode masks
as traditional vectors.

middle-end: maintain LCSSA throughout loop peeling

This final patch updates peeling to maintain LCSSA all the way through.

It's significantly easier to maintain it during peeling while we still know
where all new edges connect rather than touching it up later as is currently
being done.

This allows us to remove many of the helper functions that touch up the loops
at various parts.  The only complication is for loop distribution where we
should be able to use the same; however, ldist, depending on whether
redirect_lc_phi_defs is true or not, will either try to maintain a limited LCSSA
form itself or remove all non-virtual phis.

The problem here is that if we maintain LCSSA then in some cases the blocks
connecting the two loops get PHIs to keep the loop IV up to date.

However there is no loop: the guard condition is rewritten as 0 != 0, so the
"loop" always exits.  However, due to the PHI nodes the probabilities get
completely wrong.  It seems to think that the impossible exit is the likely
edge.  This causes incorrect warnings, and the presence of the PHIs prevents the
blocks from being simplified.

While it may be possible to make ldist work with LCSSA form, doing so seems more
work than not.  For that reason the peeling code has an additional parameter
used by only ldist to not connect the two loops during peeling.

This preserves the current behaviour from ldist until I can dive into the
implementation more.  Hopefully that's ok for now.

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
asserts.
(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
(find_guard_arg): Look value up through explicit edge and original defs.
(vect_do_peeling): Use it.
(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
Remove.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
optional param to turn off LCSSA mode.

middle-end: updated niters analysis to handle multiple exits.

This second part updates niters analysis to be able to analyze any number of
exits.  If we have multiple exits we determine the main exit by finding the
first counting IV.
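
As a hypothetical example of such a loop (not taken from the GCC testsuite), a linear search has two exits: one controlled by the counting IV and one that depends on the data. Going by the description above, the `i < n` exit would be the candidate for the main exit; at the time of this patch such loops are still rejected later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example of a loop with two exits.  */
int
find_first (const int *a, int n, int key)
{
  assert (a != NULL || n <= 0);
  for (int i = 0; i < n; i++)   /* exit 1: counting IV reaches n */
    if (a[i] == key)
      return i;                 /* exit 2: data-dependent early exit */
  return -1;
}
```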

The change allows the vectorizer to pass analysis for multiple-exit loops, but we
later gracefully reject them.  It does however allow us to test if the exit
handling is using the right exit everywhere.

Additionally since we analyze all exits, we now return all conditions for them
and determine which condition belongs to the main exit.

The main condition is needed because the vectorizer needs to ignore the main IV
condition during vectorization as it will replace it during codegen.

To track versioned loops we extend the contract between ifcvt and the vectorizer
to store the exit number in aux so that we can match it up again during peeling.

gcc/ChangeLog:

* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
it.
* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
(vec_init_loop_exit_info): Extend analysis when multiple exits.
(vect_analyze_loop_form): Record conds and determine main cond.
(vect_create_loop_vinfo): Extend bookkeeping of conds.
(vect_analyze_loop): Release conds.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
LOOP_VINFO_LOOP_IV_COND):  New.
(struct vect_loop_form_info): Add conds, alt_loop_conds;
(struct loop_vec_info): Add conds, loop_iv_cond.

middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

This is extracted out of the patch series to support early break vectorization
in order to simplify the review of that patch series.

The goal of this one is to separate out the refactoring from the new
functionality.

This first patch separates out the vectorizer's definition of an exit to their
own values inside loop_vinfo.  During vectorization we can have three separate
copies for each loop: scalar, vectorized, epilogue.  The scalar loop can also be
the versioned loop before peeling.

Because of this we track 3 different exits inside loop_vinfo corresponding to
each of these loops.  Additionally each function that uses an exit, when not
obviously clear which exit is needed will now take the exit explicitly as an
argument.

This is because often times the callers switch the loops being passed around.
While the caller knows which loops it is, the callee does not.

For now the loop exits are simply initialized to same value as before determined
by single_exit (..).

No change in functionality is expected throughout this patch series.

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
(loop_distribution::distribute_loop): Bail out if not single exit.
* tree-scalar-evolution.cc (get_loop_exit_condition): New.
* tree-scalar-evolution.h (get_loop_exit_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
explicitly.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
vect_set_loop_condition_partial_vectors_avx512,
vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
take exit.
(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
return the corresponding new peeled exit.
(slpeel_can_duplicate_loop_p): Explicitly take exit.
(find_loop_location): Handle not knowing an explicit exit.
(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
find_guard_arg, slpeel_update_phi_nodes_for_loops,
slpeel_update_phi_nodes_for_guard2): Use new exits.
(vect_do_peeling): Update bookkeeping to keep track of exits.
* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
analyze.
(vec_init_loop_exit_info): New.
(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
vec_epilogue_loop_iv, scalar_loop_iv.
(vect_analyze_loop_form): Initialize exits.
(vect_create_loop_vinfo): Set main exit.
(vect_create_epilog_for_reduction, vectorizable_live_operation,
vect_transform_loop): Use it.
(scale_profile_for_vect_loop): Explicitly take exit to scale.
* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
LOOP_VINFO_SCALAR_IV_EXIT): New.
(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
scalar_loop_iv.
(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
(vec_init_loop_exit_info): New.
(struct vect_loop_form_info): Add loop_exit.

middle-end: refactor vectorizable_comparison to make the main body re-usable.

Vectorization of a gcond starts off essentially the same as vectorizing a
comparison with the only difference being how the operands are extracted.

This refactors vectorizable_comparison such that we now have a generic function
that can be used from vectorizable_early_break. The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
to ...
(vectorizable_comparison_1): ...This.

RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx

This patch optimizes the following permutation with a consecutive pattern index:

typedef char vnx16i __attribute__ ((vector_size (16)));

#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

        lui     a5,%hi(.LC0)
        addi    a5,a5,%lo(.LC0)
        vsetivli        zero,16,e8,m1,ta,ma
        vle8.v  v3,0(a5)
        vle8.v  v2,0(a1)
        vrgather.vv     v1,v2,v3
        vse8.v  v1,0(a0)
        ret

After this patch:

        vsetivli        zero,16,e8,mf8,ta,ma
        vle8.v  v2,0(a1)
        vsetivli        zero,4,e32,mf2,ta,ma
        vrgather.vi     v1,v2,3
        vsetivli        zero,16,e8,mf8,ta,ma
        vse8.v  v1,0(a0)
        ret

Overall this removes one instruction, a vector load, which is much more expensive
than VL toggling.

Also, with this patch, we use vrgather.vi, which reduces vector register consumption by one.
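
The recognition step can be illustrated with a minimal sketch. It is not the
shuffle_consecutive_patterns code added by this patch: the function name
consecutive_pattern_p and its parameters are hypothetical, and it assumes a
byte mask whose indices all select from the first operand. It checks that the
mask is a run of WIDTH consecutive indices repeated across the vector, and
returns the index of the wider element being broadcast (3 for MASK_16 viewed
as four 32-bit elements, matching the vrgather.vi immediate above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC's implementation.  Return true if MASK
   (NELTS byte indices) is a run of WIDTH consecutive indices, starting
   at a multiple of WIDTH, repeated across the whole mask.  On success
   store the index of the WIDTH-byte element being broadcast in
   *WIDE_INDEX.  Assumes every index selects from the first operand.  */
bool
consecutive_pattern_p (const unsigned char *mask, size_t nelts,
                       size_t width, unsigned *wide_index)
{
  assert (nelts != 0 && width != 0);

  /* The run must tile the mask and be aligned to the wider element.  */
  if (nelts % width != 0 || mask[0] % width != 0)
    return false;

  /* The whole run must come from the first operand.  */
  if ((size_t) mask[0] + width > nelts)
    return false;

  /* Every position must repeat the first run.  */
  for (size_t i = 0; i < nelts; i++)
    if ((size_t) mask[i] != (size_t) mask[0] + i % width)
      return false;

  *wide_index = mask[0] / width;
  return true;
}
```

For MASK_16 with WIDTH 4, the run 12..15 starts at byte 12, so the sketch
reports wide element 12 / 4 = 3.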

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.

fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example

gcc/fortran/ChangeLog:

* intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to
the example to make it safer.

Initial Panther Lake Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther
Lake.
* common/config/i386/i386-common.cc (processor_name):
Ditto.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_PANTHERLAKE.
* config.gcc: Add -march=pantherlake.
* config/i386/driver-i386.cc (host_detect_local_cpu): Refactor
the if clause. Handle pantherlake.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Handle pantherlake.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_PANTHERLAKE): New.
(m_CORE_HYBRID): Add pantherlake.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.

x86: Add m_CORE_HYBRID for hybrid clients tuning

gcc/ChangeLog:

* config/i386/i386-options.cc (m_CORE_HYBRID): New.
* config/i386/x86-tune.def: Replace hybrid client tune with
m_CORE_HYBRID.

Initial Clearwater Forest Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Clearwater Forest.
* common/config/i386/i386-common.cc (processor_name):
Add Clearwater Forest.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_CLEARWATERFOREST.
* config.gcc: Add -march=clearwaterforest.
* config/i386/driver-i386.cc (host_detect_local_cpu): Handle
clearwaterforest.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_CLEARWATERFOREST): New.
(m_CORE_ATOM): Add clearwaterforest.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.

Support 32/64-bit vectorization for _Float16 fma related operations.

gcc/ChangeLog:

* config/i386/mmx.md (fma<mode>4): New expander.
(fms<mode>4): Ditto.
(fnma<mode>4): Ditto.
(fnms<mode>4): Ditto.
(vec_fmaddsubv4hf4): Ditto.
(vec_fmsubaddv4hf4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
* gcc.target/i386/part-vect-fmahf-1.c: New test.

RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]

Robin previously mentioned that dynamic LMUL causes an ICE in SPEC:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html

which is caused by an assertion failure.

When we enable more tests in rvv.exp with dynamic LMUL, the issue can be
reproduced. It is tracked in PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832

Now, we enable more tests in rvv.exp in this patch and fix the bug.

PR target/111832

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.

Daily bump.

aarch64: Put LR save slot first in more cases

Now that the prologue and epilogue code iterates over saved
registers in offset order, we can put the LR save slot first
without compromising LDP/STP formation.

This isn't worthwhile when shadow call stacks are enabled, since the
first two registers are also push/pop candidates, and LR cannot be
popped when shadow call stacks are enabled. (LR is instead loaded
first and compared against the shadow stack's value.)

But otherwise, it seems better to put the LR save slot first,
to reduce unnecessary variation with the layout for stack clash
protection.

gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make
the position of the LR save slot dependent on stack clash
protection unless shadow call stacks are enabled.

gcc/testsuite/
* gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19.
* gcc.target/aarch64/test_frame_4.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_10.c: Likewise.

aarch64: Use vecs to store register save order

aarch64_save/restore_callee_saves looped over registers in register
number order.  This in turn meant that we could only use LDP and STP
for registers that were consecutive both number-wise and
offset-wise (after unsaved registers are excluded).

This patch instead builds lists of the registers that we've decided to
save, in offset order.  We can then form LDP/STP pairs regardless of
register number order, which in turn means that we can put the LR save
slot first without losing LDP/STP opportunities.
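
The effect of walking the saved registers in offset order can be shown with a
simplified model. This is only a sketch, not the aarch64.cc code: struct
saved_reg and count_pairs are hypothetical names, and the model pairs any two
neighbours whose slots are adjacent, ignoring register classes and the other
conditions the real backend checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one saved register: its number and the offset
   of its save slot within the frame.  */
struct saved_reg
{
  unsigned regno;
  long offset;
};

/* Count the LDP/STP-style pairs formed when walking REGS in the given
   order, pairing neighbours whose slots are SLOT_SIZE bytes apart.  */
size_t
count_pairs (const struct saved_reg *regs, size_t n, long slot_size)
{
  assert (slot_size > 0);

  size_t pairs = 0;
  for (size_t i = 0; i + 1 < n; )
    if (regs[i + 1].offset - regs[i].offset == slot_size)
      {
        /* Both registers are consumed by one paired access.  */
        pairs++;
        i += 2;
      }
    else
      i++;
  return pairs;
}
```

With the LR slot first (x30 at offset 0, then x19, x20, x21 at 8, 16, 24),
walking in register-number order pairs only x19/x20, whereas walking in offset
order pairs x30/x19 and x20/x21.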

gcc/
* config/aarch64/aarch64.h (aarch64_frame): Add vectors that
store the lists of saved GPRs, FPRs and predicate registers.
* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize
the lists of saved registers.  Use them to choose push candidates.
Invalidate pop candidates if we're not going to do a pop.
(aarch64_next_callee_save): Delete.
(aarch64_save_callee_saves): Take a list of registers,
rather than a range.  Make !skip_wb select only write-back
candidates.
(aarch64_expand_prologue): Update calls accordingly.
(aarch64_restore_callee_saves): Take a list of registers,
rather than a range.  Always skip pop candidates.  Also skip
LR if shadow call stacks are enabled.
(aarch64_expand_epilogue): Update calls accordingly.

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores
to happen in offset order.
* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.