Uros Bizjak [Wed, 21 Jun 2023 19:55:30 +0000 (21:55 +0200)]
function: Change return type of predicate function from int to bool
Also change some internal variables to bool and some functions to void.
gcc/ChangeLog:
* function.h (emit_initial_value_sets):
Change return type from int to void.
(aggregate_value_p): Change return type from int to bool.
(prologue_contains): Ditto.
(epilogue_contains): Ditto.
(prologue_epilogue_contains): Ditto.
* function.cc (temp_slot): Make "in_use" variable bool.
(make_slot_available): Update for changed "in_use" variable.
(assign_stack_temp_for_type): Ditto.
(emit_initial_value_sets): Change return type from int to void
and update function body accordingly.
(instantiate_virtual_regs): Ditto.
(rest_of_handle_thread_prologue_and_epilogue): Ditto.
(safe_insn_predicate): Change return type from int to bool.
(aggregate_value_p): Change return type from int to bool
and update function body accordingly.
(prologue_contains): Change return type from int to bool.
(prologue_epilogue_contains): Ditto.
Paul Thomas [Wed, 21 Jun 2023 16:05:58 +0000 (17:05 +0100)]
Fortran: Fix some bugs in associate [PR87477]
2023-06-21 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87477
PR fortran/88688
PR fortran/94380
PR fortran/107900
PR fortran/110224
* decl.cc (char_len_param_value): Fix memory leak.
(resolve_block_construct): Remove unnecessary static decls.
* expr.cc (gfc_is_ptr_fcn): New function.
(gfc_check_vardef_context): Use it to permit pointer function
result selectors to be used for associate names in variable
definition context.
* gfortran.h: Prototype for gfc_is_ptr_fcn.
* match.cc (build_associate_name): New function.
(gfc_match_select_type): Use the new function to replace inline
version and to build a new associate name for the case where
the supplied associate name is already used for that purpose.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (gfc_get_symbol_decl): Unlimited polymorphic
variables need deferred initialisation of the vptr.
(gfc_trans_deferred_vars): Do the vptr initialisation.
* trans-stmt.cc (trans_associate_var): Ensure that a pointer
associate name points to the target of the selector and not
the selector itself.
gcc/testsuite/
PR fortran/87477
PR fortran/107900
* gfortran.dg/pr107900.f90: New test.
PR fortran/110224
* gfortran.dg/pr110224.f90: New test.
PR fortran/88688
* gfortran.dg/pr88688.f90: New test.
PR fortran/94380
* gfortran.dg/pr94380.f90: New test.
PR fortran/95398
* gfortran.dg/pr95398.f90: Set -std=f2008, bump the line
numbers in the error tests by two and change the text in two.
Kyrylo Tkachov [Wed, 21 Jun 2023 12:43:26 +0000 (13:43 +0100)]
aarch64: Avoid same input and output Z register for gather loads
The architecture recommends that load-gather instructions avoid using the same
Z register for the load address and the destination, and the Software Optimization
Guides for Arm cores recommend that as well.
This means that for code like:
svuint64_t
food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
{
return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets));
}
we'll want to avoid generating the current:
food:
ld1d z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
add z0.d, z1.d, z0.d
ret
However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.
This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* Tying the input Z register operand in the original alternatives to 0
* Penalising the original alternatives with '?'
This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable and regular change.
The benchmark numbers on a Neoverse V1 on fprate look okay:
Benchmark          diff
503.bwaves_r       0.00%
507.cactuBSSN_r    0.00%
508.namd_r         0.00%
510.parest_r       0.55%
511.povray_r       0.22%
519.lbm_r          0.00%
521.wrf_r          0.00%
526.blender_r      0.00%
527.cam4_r         0.56%
538.imagick_r      0.00%
544.nab_r          0.00%
549.fotonik3d_r    0.00%
554.roms_r         0.00%
fprate             0.10%
Bootstrapped and tested on aarch64-none-linux-gnu.
Kyrylo Tkachov [Wed, 21 Jun 2023 12:40:15 +0000 (13:40 +0100)]
aarch64: Convert SVE gather patterns to compact syntax
This patch converts the SVE load gather patterns to the new compact syntax
that Tamar introduced. This allows for a future patch I want to contribute
to add more alternatives that are better viewed in the more compact form.
Some lines in these patterns are now longer than 80 characters, but I think that's unavoidable
and those patterns already had overly long constraint strings.
No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
Richard Biener [Wed, 21 Jun 2023 09:12:36 +0000 (11:12 +0200)]
Hide IVOPTs strip_offset
PR110243 shows that strip_offset has some correctness issues. The following
avoids using it from loop distribution, which can use the more correct
split_constant_offset from data-ref analysis instead. The patch then
un-exports the function from IVOPTs.
* tree-loop-distribution.cc (classify_builtin_st): Use
split_constant_offset.
* tree-ssa-loop-ivopts.h (strip_offset): Remove.
* tree-ssa-loop-ivopts.cc (strip_offset): Make static.
Kyrylo Tkachov [Wed, 21 Jun 2023 11:03:22 +0000 (12:03 +0100)]
aarch64: Convert SVE gather patterns to compact syntax
This patch converts the SVE load gather patterns to the new compact syntax
that Tamar introduced. This allows for a future patch I want to contribute
to add more alternatives that are better viewed in the more compact form.
Some lines in these patterns are now longer than 80 characters, but I think that's unavoidable
and those patterns already had overly long constraint strings.
No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
Tamar Christina [Wed, 21 Jun 2023 09:34:54 +0000 (10:34 +0100)]
docs: replace backslashchar [PR 110329].
It seems like @backslashchar{} is a relatively new addition
to texinfo. Other parts of the docs use @samp{\} so use it
here too so older distros work.
Richard Biener [Mon, 19 Jun 2023 10:28:32 +0000 (12:28 +0200)]
[i386] Reject too large vectors for partial vector vectorization
The following works around the x86 backend not making the
vectorizer compare the costs of the different possible vector
sizes the backend advertises through the vector_modes hook. When
masked epilogues or main loops are enabled, this means we will
select the preferred vector mode, which is usually the largest, even
for loops whose iteration count is nowhere near the number of lanes
in the vector. When not using masking, the vectorizer would reject any
mode resulting in a VF bigger than the number of iterations,
but with masking the excess lanes are simply masked out.
So this overloads the finish_cost function and matches for
the problematic case, forcing a high cost to make us try a
smaller vector size.
* config/i386/i386.cc (ix86_vector_costs::finish_cost):
Overload. For masked main loops make sure the vectorization
factor isn't more than double the number of iterations.
* gcc.target/i386/vect-partial-vectors-1.c: New testcase.
* gcc.target/i386/vect-partial-vectors-2.c: Likewise.
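The rule in the ChangeLog entry (for masked main loops, the vectorization factor must not exceed double the number of iterations) can be sketched in plain C. This is a minimal, hypothetical model: the helper names and the cost constant are illustrative assumptions, not the actual ix86_vector_costs::finish_cost code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cost value used to make a vector mode unattractive. */
#define HYPOTHETICAL_PROHIBITIVE_COST 0x7fffffffu

/* True if a masked main loop would leave more than half of its lanes idle:
   the vectorization factor (VF) exceeds twice the known iteration count. */
static bool
masked_vf_too_large (uint64_t vf, uint64_t niters)
{
  assert (niters > 0);  /* only meaningful for a known, nonzero trip count */
  return vf > 2 * niters;
}

/* Hypothetical stand-in for the finish_cost override: keep the computed
   body cost unless the check above fires for a masked main loop. */
static unsigned
adjust_body_cost (unsigned body_cost, bool masked_main_loop,
                  uint64_t vf, uint64_t niters)
{
  if (masked_main_loop && masked_vf_too_large (vf, niters))
    return HYPOTHETICAL_PROHIBITIVE_COST;  /* nudge toward a smaller vector size */
  return body_cost;
}
```

For a loop with 5 iterations, a mode giving VF 16 would be penalized (16 > 10) while VF 8 would be kept, which is what lets the vectorizer fall back to a smaller vector size.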
Jan Beulich [Wed, 21 Jun 2023 06:03:05 +0000 (08:03 +0200)]
x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
There's no reason to constrain this to AVX512VL, unless instructed so by
-mprefer-vector-width=, as the wider operation is unusable for more
narrow operands only when the possible memory source is a non-broadcast
one. This way even the scalar copysign<mode>3 can benefit from the
operation being a single-insn one (leaving aside moves which the
compiler decides to insert for unclear reasons, and leaving aside the
fact that bcst_mem_operand() is too restrictive for broadcast to be
embedded right into VPTERNLOG*).
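To see why copysign can be a single ternary-logic instruction, here is a scalar C model of a three-input bitwise truth-table operation in the style of VPTERNLOG. It is a sketch only: it models the semantics on one 64-bit value, and the immediate 0xCA (bitwise select "a ? b : c") is our choice for illustration, not necessarily the immediate GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* For each bit position, the bits of a, b and c form a 3-bit index
   (a is the most significant) into the 8-bit truth table imm8. */
static uint64_t
ternlog64 (uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    {
      unsigned idx = (unsigned) ((((a >> i) & 1) << 2)
                                 | (((b >> i) & 1) << 1)
                                 | ((c >> i) & 1));
      r |= (uint64_t) ((imm8 >> idx) & 1) << i;
    }
  return r;
}

/* With a = sign-bit mask, b = bits of y and c = bits of x, the select
   table 0xCA takes the sign bit from y and every other bit from x. */
static double
copysign_via_ternlog (double x, double y)
{
  uint64_t xb, yb, rb, mask = UINT64_C (1) << 63;
  double r;
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  rb = ternlog64 (mask, yb, xb, 0xCA);
  memcpy (&r, &rb, sizeof r);
  return r;
}
```

The sign-bit mask is the constant that ix86_build_signbit_mask() materializes; only one lane of it is needed for the scalar case.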
While there also bring *<avx512>_vternlog<mode>_all's in sync with that
of the three splitters.
Along with this also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocation in .rodata.*, filled with zeros which are never
read.
gcc/
* config/i386/i386-expand.cc (ix86_expand_copysign): Request
value duplication by ix86_build_signbit_mask() when AVX512F and
not HFmode.
* config/i386/sse.md (*<avx512>_vternlog<mode>_all): Convert to
2-alternative form. Adjust "mode" attribute. Add "enabled"
attribute.
(*<avx512>_vpternlog<mode>_1): Also permit when TARGET_AVX512F
&& !TARGET_PREFER_AVX256.
(*<avx512>_vpternlog<mode>_2): Likewise.
(*<avx512>_vpternlog<mode>_3): Likewise.
liuhongt [Wed, 31 May 2023 03:20:46 +0000 (11:20 +0800)]
Use intermediate integer type for float_expr/fix_trunc_expr when no direct optab exists.
We already use an intermediate type in the WIDEN case, but not for NONE;
this patch extends that.
gcc/ChangeLog:
PR target/110018
* tree-vect-stmts.cc (vectorizable_conversion): Use
intermediate integer type for float_expr/fix_trunc_expr when
no direct optab exists.
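A scalar analogy of the idea: when there is no direct conversion between two same-width types, convert through a narrower intermediate integer type. This sketch assumes that value-range information has proved the source fits in 32 bits; the function names are hypothetical and it is not the vectorizer code itself.

```c
#include <assert.h>
#include <stdint.h>

/* FLOAT_EXPR direction: int64 -> double via an intermediate int32,
   valid only under the (assumed) range precondition. */
static double
float_via_int32 (int64_t x)
{
  assert (x >= INT32_MIN && x <= INT32_MAX);  /* range precondition */
  return (double) (int32_t) x;
}

/* FIX_TRUNC_EXPR direction: double -> int32 (truncating), then widen
   to int64, again assuming the truncated value fits in 32 bits. */
static int64_t
fix_trunc_via_int32 (double d)
{
  assert (d > (double) INT32_MIN - 1.0 && d < (double) INT32_MAX + 1.0);
  return (int64_t) (int32_t) d;
}
```

In the vectorized form, each step maps to a conversion the target does support, so the pair replaces the missing direct optab.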
Lewis Hyatt [Wed, 3 Aug 2022 14:46:23 +0000 (10:46 -0400)]
libcpp: Improve location for macro names [PR66290]
When libcpp reports diagnostics whose locus is a macro name (such as for
-Wunused-macros), it uses the location in the cpp_macro object that was
stored by _cpp_new_macro. This is currently set to pfile->directive_line,
which contains the line number only and no column information. This patch
changes the stored location to the src_loc for the token defining the macro
name, which includes the location and range information.
libcpp/ChangeLog:
PR c++/66290
* macro.cc (_cpp_create_definition): Add location argument.
* internal.h (_cpp_create_definition): Adjust prototype.
* directives.cc (do_define): Pass new location argument to
_cpp_create_definition.
(do_undef): Stop passing inferior location to cpp_warning_with_line;
the default from cpp_warning is better.
(cpp_pop_definition): Pass new location argument to
_cpp_create_definition.
* pch.cc (cpp_read_state): Likewise.
Several gcc.target/aarch64/sve/pcs tests started failing after 6a2e8dcbbd4, because the tests weren't robust against whether
an indirect argument register or the stack pointer was used as
the base for stores.
The patch allows either base register when there is only one
indirect argument. It disables -fcprop-registers in cases where
there are sometimes multiple indirect arguments, since the name
of the argument register is then an important part of the test.
Disabling -fcprop-registers gives poor final register allocation,
since:
* combine's make_more_copies hack adds extra redundant moves
* code with those moves is not allocated as well as moves without them
* we often rely on -fcprop-registers to clean up the allocation later
The patch therefore disables combine in the same tests as
cprop-registers.
The SVE handling of stack clash protection copied the stack
pointer to X11 before the probe and set up X11 as the CFA
for unwind purposes:
/* This is done to provide unwinding information for the stack
adjustments we're about to do, however to prevent the optimizers
from removing the R11 move and leaving the CFA note (which would be
very wrong) we tie the old and new stack pointer together.
The tie will expand to nothing but the optimizers will not touch
the instruction. */
rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));
/* We want the CFA independent of the stack pointer for the
duration of the loop. */
add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
RTX_FRAME_RELATED_P (insn) = 1;
-fcprop-registers is now smart enough to realise that X11 = SP,
replace X11 with SP in the stack tie, and delete the instruction
created above.
This patch tries to prevent that by making stack_tie fussy about
the register numbers. It fixes failures in
gcc.target/aarch64/sve/pcs/stack_clash*.c.
gcc/
* config/aarch64/aarch64.md (stack_tie): Hard-code the first
register operand to the stack pointer. Require the second register
operand to have the number specified in a separate const_int operand.
* config/aarch64/aarch64.cc (aarch64_emit_stack_tie): New function.
(aarch64_allocate_and_probe_stack_space): Use it.
(aarch64_expand_prologue, aarch64_expand_epilogue): Likewise.
Jakub Jelinek [Tue, 20 Jun 2023 18:17:41 +0000 (20:17 +0200)]
tree-ssa-math-opts: Small uaddc/usubc pattern matching improvement [PR79173]
In the following testcase we fail to pattern recognize the least significant
.UADDC call. The reason is that arg3 in that case is
_3 = .ADD_OVERFLOW (...);
_2 = __imag__ _3;
_1 = _2 != 0;
arg3 = (unsigned long) _1;
and while before the changes arg3 has a single use in some .ADD_OVERFLOW
later on, we add a .UADDC call next to it (and gsi_remove/gsi_replace only
what is strictly necessary and leave quite a few dead stmts around which
next DCE cleans up) and so it all of a sudden isn't used just once, but twice
(.ADD_OVERFLOW and .UADDC) and so uaddc_cast fails. While we could tweak
uaddc_cast and not require has_single_use in these uses, there is also
no vrp that would figure out that because __imag__ _3 is in [0, 1] range,
it can just use arg3 = __imag__ _3; and drop the comparison and cast.
We already search if either arg2 or arg3 is ultimately set from __imag__
of .{{ADD,SUB}_OVERFLOW,U{ADD,SUB}C} call, so the following patch just
remembers the lhs of __imag__ from that case and uses it later.
2023-06-20 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember lhs of
IMAGPART_EXPR of arg2/arg3 and use that as arg3 if it has the right
type.
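For context, the source shape that match_uaddc_usubc targets is roughly a multi-limb addition written with __builtin_add_overflow (a GCC/Clang builtin). In this sketch the carry is the sum of two overflow flags; the casts of those flags are the "(unsigned long) _1" statements mentioned above. The helper names are ours, for illustration.

```c
#include <assert.h>
#include <limits.h>

/* One limb: x + y + carry_in.  At most one of the two additions can
   overflow when carry_in is 0 or 1, so carry_out is c1 + c2. */
static unsigned long
add_limb (unsigned long x, unsigned long y, unsigned long carry_in,
          unsigned long *carry_out)
{
  unsigned long sum;
  assert (carry_in <= 1);
  unsigned long c1 = __builtin_add_overflow (x, y, &sum);
  unsigned long c2 = __builtin_add_overflow (sum, carry_in, &sum);
  *carry_out = c1 + c2;
  return sum;
}

/* Two-limb addition r = a + b (least significant limb first);
   returns the final carry. */
static unsigned long
add2 (unsigned long r[2], const unsigned long a[2], const unsigned long b[2])
{
  unsigned long carry;
  r[0] = add_limb (a[0], b[0], 0, &carry);
  r[1] = add_limb (a[1], b[1], carry, &carry);
  return carry;
}
```

The least significant limb is the case the patch fixes: its carry feeds the next limb as a 0/1 value derived from __imag__ of an .ADD_OVERFLOW result.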
Uros Bizjak [Tue, 20 Jun 2023 17:42:21 +0000 (19:42 +0200)]
calls: Change return type of predicate function from int to bool
Also change some internal variables and some function arguments to bool.
gcc/ChangeLog:
* calls.h (setjmp_call_p): Change return type from int to bool.
* calls.cc (struct arg_data): Change "pass_on_stack" to bool.
(store_one_arg): Change return type from int to bool
and adjust function body accordingly. Change "sibcall_failure"
variable to bool.
(finalize_must_preallocate): Ditto. Change *must_preallocate pointer
argument to bool. Change "partial_seen" variable to bool.
(load_register_parameters): Change *sibcall_failure
pointer argument to bool.
(check_sibcall_argument_overlap_1): Change return type from int to bool
and adjust function body accordingly.
(check_sibcall_argument_overlap): Ditto. Change
"mark_stored_args_map" argument to bool.
(emit_call_1): Change "already_popped" variable to bool.
(setjmp_call_p): Change return type from int to bool
and adjust function body accordingly.
(initialize_argument_information): Change *must_preallocate
pointer argument to bool.
(expand_call): Change "pcc_struct_value", "must_preallocate"
and "sibcall_failure" variables to bool.
(emit_library_call_value_1): Change "pcc_struct_value"
variable to bool.
Ian Lance Taylor [Mon, 19 Jun 2023 21:57:54 +0000 (14:57 -0700)]
runtime: use a C function to call mmap
The final argument to mmap, of type off_t, varies.
In CL 445375 we changed it to always use the C off_t type,
but that broke 32-bit big-endian Linux systems. On those systems,
using the C off_t type requires calling the mmap64 function.
In C this is automatically handled by the <sys/mman.h> file.
In Go, we would have to change the magic //extern comment to
call mmap64 when appropriate. Rather than try to get that right,
we instead go through a C function that uses C implicit type
conversions to pick the right type.
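A minimal sketch of such a C shim, assuming a hypothetical name and fixed-width argument types chosen for illustration (the actual libgo function may differ). The Go side would pass plain integers, and the ordinary C call lets the compiler convert the offset to this target's off_t; which mmap symbol is actually called is decided by <sys/mman.h>.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical shim: the implicit uintptr_t -> off_t conversion happens
   here, in C, instead of in a Go //extern declaration. */
void *
hypothetical_go_mmap (void *addr, uintptr_t length, int prot, int flags,
                      int fd, uintptr_t offset)
{
  return mmap (addr, length, prot, flags, fd, offset);
}
```

Usage: mapping one anonymous page and unmapping it again.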
Martin Jambor [Tue, 20 Jun 2023 16:15:22 +0000 (18:15 +0200)]
ipa-sra: Disable candidates with no known callers (PR 110276)
In IPA-SRA we use can_be_local_p () predicate rather than just plain
local call graph flag in order to figure out whether the node is a
part of an external API that we cannot change. Although there are
cases where this can allow more transformations, it also means we can
analyze functions which have no callers at all, which is pointless.
Moreover, it makes an assert of hint propagation trigger, which checks
that we have looked at callers before processing hints that come from
them. This has been reported as PR 110276.
This patch simply adds a check that a node has at least one caller
into the early checks and makes the node a non-candidate for any
transformation if it does not.
gcc/ChangeLog:
2023-06-16 Martin Jambor <mjambor@suse.cz>
PR ipa/110276
* ipa-sra.cc (struct caller_issues): New field there_is_one.
(check_for_caller_issues): Set it.
(check_all_callers_for_issues): Check it.
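The early check can be modelled as a walk over the callers that records whether any caller exists at all. This is a simplified, hypothetical sketch: only the there_is_one field name comes from the ChangeLog, and the other types and fields are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct caller_model
{
  bool call_is_supported;  /* hypothetical stand-in for per-call checks */
};

struct caller_issues_model
{
  bool there_is_one;           /* set once any caller has been seen */
  bool unsupported_call_seen;  /* stand-in for the other caller issues */
};

static void
check_callers (const struct caller_model *callers, size_t n,
               struct caller_issues_model *issues)
{
  issues->there_is_one = false;
  issues->unsupported_call_seen = false;
  for (size_t i = 0; i < n; i++)
    {
      issues->there_is_one = true;
      if (!callers[i].call_is_supported)
        issues->unsupported_call_seen = true;
    }
}

/* A node with no callers is never a candidate for transformation. */
static bool
is_candidate (const struct caller_model *callers, size_t n)
{
  struct caller_issues_model issues;
  check_callers (callers, n, &issues);
  return issues.there_is_one && !issues.unsupported_call_seen;
}
```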
Martin Jambor [Tue, 20 Jun 2023 16:15:22 +0000 (18:15 +0200)]
ipa-cp: Avoid long linear searches through DECL_ARGUMENTS
There have been concerns that linear searches through DECL_ARGUMENTS
that are often necessary to compute the index of a particular
PARM_DECL which is the key to results of IPA-CP can happen often
enough to be a compile time issue, especially if we plug the results
into value numbering, as I intend to do with a follow-up patch.
This patch creates a vector sorted according to PARM_DECLs to do the look-up
for all functions which have some information discovered by IPA-CP and which
have 32 parameters or more. 32 is a hard-wired magical constant here to
capture the trade-off between the memory allocation overhead and length of the
linear search. I do not think it is worth making it a --param but if people
think it appropriate, I can turn it into one.
gcc/ChangeLog:
2023-05-31 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_uid_to_idx_map_elt): New type.
(struct ipcp_transformation): Rearrange members according to
C++ class coding convention, add m_uid_to_idx,
get_param_index and maybe_create_parm_idx_map.
* ipa-cp.cc (ipcp_transformation::get_param_index): New function.
(compare_uids): Likewise.
(ipcp_transformation::maybe_create_parm_idx_map): Likewise.
* ipa-prop.cc (ipcp_get_parm_bits): Use get_param_index.
(ipcp_update_bits): Accept TS as a parameter, assume it is not NULL.
(ipcp_update_vr): Likewise.
(ipcp_transform_function): Call maybe_create_parm_idx_map of TS, bail
out quickly if empty, pass it to ipcp_update_bits and ipcp_update_vr.
Carl Love [Tue, 20 Jun 2023 15:40:30 +0000 (11:40 -0400)]
rs6000: Add builtins for IEEE 128-bit floating point values
Add support for the following builtins:
__vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
__vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
__ieee128 scalar_insert_exp (__vector unsigned __int128,
__vector unsigned long long);
The instructions used in the builtins operate on vector registers, so
returning a scalar type requires moving the result out of the vector
register. There is no clean, performant way to do this, and user code
typically needs the result as a vector anyway.
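As a reminder of what these builtins extract, here is a portable C model of the IEEE 128-bit fields, with the value represented as its high and low 64-bit halves (1 sign bit, 15 exponent bits, 112 fraction bits). It is not the builtins themselves. The treatment of the implicit bit (set only when the exponent is neither 0 nor all ones) is our assumption about the significand-extraction semantics.

```c
#include <assert.h>
#include <stdint.h>

struct ieee128_bits { uint64_t hi, lo; };

/* Biased 15-bit exponent. */
static uint64_t
model_extract_exp (struct ieee128_bits v)
{
  return (v.hi >> 48) & 0x7fff;
}

/* 113-bit significand in hi:lo: the 112 fraction bits plus the
   implicit leading bit, assumed set only for normal numbers. */
static struct ieee128_bits
model_extract_sig (struct ieee128_bits v)
{
  uint64_t exp = model_extract_exp (v);
  uint64_t implicit = (exp != 0 && exp != 0x7fff) ? 1 : 0;
  struct ieee128_bits sig
    = { (v.hi & UINT64_C (0xffffffffffff)) | (implicit << 48), v.lo };
  return sig;
}
```

For 1.0 (high half 0x3fff000000000000), the exponent is 0x3fff and the significand is just the implicit bit.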
gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxsigqp_tf to CODE_FOR_xsxsigqp_tf_ti.
Rename CODE_FOR_xsxsigqp_kf to CODE_FOR_xsxsigqp_kf_ti.
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def
(__builtin_vsx_scalar_extract_exp_to_vec,
__builtin_vsx_scalar_extract_sig_to_vec,
__builtin_vsx_scalar_insert_exp_vqp): Add new builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/rs6000/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_<mode> to xsxexpqp_<IEEE128:mode>_<V2DI_DI:mode>.
Rename xsxsigqp_<mode> to xsxsigqp_<IEEE128:mode>_<VEC_TI:mode>.
Rename xsiexpqp_<mode> to xsiexpqp_<IEEE128:mode>_<V2DI_DI:mode>.
* doc/extend.texi (scalar_extract_exp_to_vec,
scalar_extract_sig_to_vec): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.
gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/scalar-extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/scalar-insert-exp-16.c: New test case.
Juzhe-Zhong [Tue, 20 Jun 2023 09:00:31 +0000 (17:00 +0800)]
RISC-V: Optimize codegen of VLA SLP
Add comments for Robin:
We want to create a pattern where value[ix] = floor (ix / NPATTERNS) * NPATTERNS.
As NPATTERNS is always a power of two, we can rewrite this as
value[ix] = ix & -NPATTERNS.
Recently, I figured out a better approach to code generation for VLA stepped vectors.
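The masking identity can be checked with a small scalar sketch (the helper name is ours). For a power-of-two npatterns, ix & -npatterns clears the low log2(npatterns) bits of ix, which rounds ix down to a multiple of npatterns, that is floor (ix / npatterns) * npatterns. With npatterns = 4, indices 0..7 map to { 0, 0, 0, 0, 4, 4, 4, 4 }.

```c
#include <assert.h>
#include <stdint.h>

/* Round ix down to a multiple of npatterns, which must be a power of two.
   Unsigned negation gives the mask ~(npatterns - 1). */
static uint64_t
round_down_to_pattern (uint64_t ix, uint64_t npatterns)
{
  assert (npatterns != 0 && (npatterns & (npatterns - 1)) == 0);
  return ix & -npatterns;
}
```

In the vector code this corresponds to a single AND of the index (VID) vector with the constant -NPATTERNS, with no division needed.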
Tobias Burnus [Tue, 20 Jun 2023 11:46:11 +0000 (13:46 +0200)]
Fortran: Fix parse-dump-tree for OpenMP ALLOCATE clause
Commit r14-1301-gd64e8e1224708e added u2.allocator to gfc_omp_namelist
for better readability and to permit using namelist->expr for code
like the following:
!$omp allocators allocate(align(32) : dt%alloc_comp)
allocate (dt%alloc_comp(5))
!$omp allocate(dt%alloc_comp2) align(64)
allocate (dt%alloc_comp2(10))
However, for the parse-tree dump the change was incomplete.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_namelist): Fix dump of the allocator
modifier of OMP_LIST_ALLOCATE.
Eric Botcazou [Fri, 9 Jun 2023 18:24:41 +0000 (20:24 +0200)]
ada: Minor tweaks
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Pass
the NULL_TREE explicitly and test imported_p in lieu of
Is_Imported. <E_Function>: Remove public_flag local variable and
make extern_flag local variable a constant.
Yannick Moy [Thu, 8 Jun 2023 14:52:24 +0000 (16:52 +0200)]
ada: Fix crash on inlining in GNATprove
After the recent change on detection of non-inlining, calls inside
the iterator part of a quantified expression were not considered
as preventing inlining anymore, leading to a crash later on inside
GNATprove. Now fixed.
gcc/ada/
* sem_res.adb (Resolve_Call): Fix change that replaced test for
quantified expressions by the test for potentially unevaluated
contexts. Both should be performed.
Eric Botcazou [Thu, 1 Jun 2023 11:25:53 +0000 (13:25 +0200)]
ada: Further fixes to handling of private views in instances
This removes more bypasses for private views in instances that are present
in type predicates (Conforming_Types, Covers, Specific_Type and Wrong_Type),
which in exchange requires additional work in Sem_Ch12 to restore the proper
view of types during the instantiation of generic bodies.
The main mechanism for this is the Has_Private_View flag, but it comes with
the limitations that 1) there must be a direct reference to the global type
in the generic construct (either a reference to a global object of this type
or the explicit declaration of a local object of this type), which is not
always the case e.g. for loop parameters and 2) it can deal with a single
type at a time, e.g. it cannot deal with an array type and its component
type if their respective views are not the same in the instance.
To overcome the second limitation, a new Has_Secondary_Private_View flag
is introduced to deal with a secondary type, which as of this writing is
either the component type of an array type or the designated type of an
access type (together they make up the vast majority of the problematic
cases for the Has_Private_View flag alone). This new mechanism subsumes
a specific treatment for them that was added in Copy_Generic_Node a few
years ago, although a specific treatment still needs to be preserved for
comparison and equality operators in a narrower case.
Additional handling is also introduced to overcome the first limitation
for loop parameters in Copy_Generic_Node, and a relaxed condition is used
in Exp_Ch7.Convert_View to generate an unchecked conversion between views.
gcc/ada/
* exp_ch7.adb (Convert_View): Detect more cases of mismatches for
private types and use Implementation_Base_Type as main criterion.
* gen_il-fields.ads (Opt_Field_Enum): Add
Has_Secondary_Private_View
* gen_il-gen-gen_nodes.adb (N_Expanded_Name): Likewise.
(N_Direct_Name): Likewise.
(N_Op): Likewise.
* sem_ch12.ads (Check_Private_View): Document the usage of second
flag Has_Secondary_Private_View.
* sem_ch12.adb (Get_Associated_Entity): New function to retrieve
the ultimate associated entity, if any.
(Check_Private_View): Implement Has_Secondary_Private_View
support.
(Copy_Generic_Node): Remove specific treatment for Component_Type
of an array type and Designated_Type of an access type. Add
specific treatment for comparison and equality operators, as well
as iterator and loop parameter specifications.
(Instantiate_Type): Implement Has_Secondary_Private_View support.
(Requires_Delayed_Save): Call Get_Associated_Entity.
(Set_Global_Type): Implement Has_Secondary_Private_View support.
* sem_ch6.adb (Conforming_Types): Remove bypass for private views
in instances.
* sem_type.adb (Covers): Return true if Is_Subtype_Of does so.
Remove bypass for private views in instances.
(Specific_Type): Likewise.
* sem_util.adb (Wrong_Type): Likewise.
* sinfo.ads (Has_Secondary_Private_View): Document new flag.
The Preelaborate pragma the removed comment was referring to was
indeed present in AI 167, as well as in clause 5.3 of the rationale
for Ada 2012, but it never made it into the 2012 version of the
reference manual.
Richard Biener [Mon, 19 Jun 2023 12:19:47 +0000 (14:19 +0200)]
Improve DSE to handle stores before __builtin_unreachable ()
DSE isn't good at identifying program points that end the lifetime
of variables that are not associated with virtual operands. But
at least for those that end basic blocks, we can handle the simple
case where the ending is in the same basic block as the definition
we want to elide. That should already catch quite a few common cases.
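A small C illustration of the pattern, using the GCC builtin __builtin_unreachable (the function name is hypothetical). The store in the x < 0 block sits in a basic block that ends in __builtin_unreachable (), so no valid execution can observe it and it is dead. Whether a given compiler version removes it is not something this sketch checks; the observable behavior is the same either way.

```c
#include <assert.h>

static void
store_value (int *p, int x)
{
  if (x < 0)
    {
      *p = 0;  /* dead store: this block ends in __builtin_unreachable () */
      __builtin_unreachable ();
    }
  *p = x;
}
```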
* tree-ssa-dse.cc (dse_classify_store): When we found
no defs and the basic-block with the original definition
ends in __builtin_unreachable[_trap] the store is dead.
* gcc.dg/tree-ssa/ssa-dse-47.c: New testcase.
* c-c++-common/asan/pr106558.c: Avoid undefined behavior
due to missing return.
Richard Biener [Tue, 20 Jun 2023 07:51:40 +0000 (09:51 +0200)]
Update virtual SSA form manually where easily possible in phiprop
This keeps virtual SSA form up-to-date in phiprop when easily possible.
Only when we deal with aggregate copies the work would be too
heavy-handed in general.
* tree-ssa-phiprop.cc (phiprop_insert_phi): For simple loads
keep the virtual SSA form up-to-date.
Kyrylo Tkachov [Tue, 20 Jun 2023 10:03:47 +0000 (11:03 +0100)]
aarch64: Optimise ADDP with same source operands
We've been asked to optimise the testcase in this patch: a 64-bit ADDP of
the low and high halves of the same 128-bit vector. This can be done by a
single .4s ADDP followed by just reading the bottom 64 bits. A splitter for
this is quite straightforward now that all the vec_concat stuff is collapsed
by simplify-rtx.
With this patch we generate a single:
addp v0.4s, v0.4s, v0.4s
instead of:
dup d31, v0.d[1]
addp v0.2s, v0.2s, v31.2s
ret
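The equivalence behind the splitter can be checked with scalar C models of the AArch64 ADDP (pairwise add) instruction, in which the operands are concatenated (first operand in the low lanes) and adjacent lanes are summed. These are sketches of the semantics on 32-bit lanes, using unsigned arithmetic to match the wrapping hardware behavior; they are not intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* addp vd.2s, vn.2s, vm.2s */
static void
addp_2s (uint32_t d[2], const uint32_t n[2], const uint32_t m[2])
{
  uint32_t r[2] = { n[0] + n[1], m[0] + m[1] };
  memcpy (d, r, sizeof r);
}

/* addp vd.4s, vn.4s, vm.4s */
static void
addp_4s (uint32_t d[4], const uint32_t n[4], const uint32_t m[4])
{
  uint32_t r[4] = { n[0] + n[1], n[2] + n[3], m[0] + m[1], m[2] + m[3] };
  memcpy (d, r, sizeof r);
}
```

For v = { 1, 2, 30, 40 }, addp_2s on the low half { 1, 2 } and the high half { 30, 40 } gives { 3, 70 }, which matches the bottom two lanes of addp_4s (v, v) = { 3, 70, 3, 70 }. This is why reading the low 64 bits of the .4s form is enough.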
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_addp_same_reg<mode>):
New define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/addp-same-low_1.c: New test.
Yannick Moy [Thu, 8 Jun 2023 09:12:25 +0000 (09:12 +0000)]
ada: Do not issue warning on postcondition in some cases
Warning on suspicious postcondition is not relevant if contract
Exceptional_Cases is present, or if contract Always_Terminates is
present with a non-statically True value, as in those cases the
postcondition can be used to indicate constraints on those pre-states
for which the subprogram might terminate normally.
gcc/ada/
* sem_util.adb (Check_Result_And_Post_State): Do not warn in cases
where the warning could be spurious.
Yannick Moy [Wed, 10 May 2023 14:10:54 +0000 (16:10 +0200)]
ada: Add the ability to add error codes to error messages
Add a new character sequence [] for error codes in error messages
handled by Error_Msg procedures, to use for SPARK-related errors.
Display of additional information on the error or warning based on
the error code is delegated to GNATprove.
gcc/ada/
* err_vars.ads (Error_Msg_Code): New variable for error codes.
* errout.adb (Error_Msg_Internal): Display continuation message
when an error code was present.
(Set_Msg_Text): Handle character sequence [] for error codes.
* errout.ads: Document new insertion sequence [].
(Error_Msg_Code): New renaming.
* erroutc.adb (Prescan_Message): Detect presence of error code.
(Set_Msg_Insertion_Code): Handle new insertion sequence [].
* erroutc.ads (Has_Error_Code): New variable for prescan.
(Set_Msg_Insertion_Code): Handle new insertion sequence [].
* contracts.adb (Check_Type_Or_Object_External_Properties):
Replace reference to SPARK RM section by an error code.
* sem_elab.adb (SPARK_Processor): Same.
* sem_prag.adb (Check_Missing_Part_Of): Same.
* sem_res.adb (Resolve_Actuals, Resolve_Entity_Name): Same.
Piotr Trojanek [Mon, 5 Jun 2023 08:30:39 +0000 (10:30 +0200)]
ada: Fix for attribute Range in Exceptional_Cases
Attribute Range is now handled like First and Last when occurring within
the consequence of Exceptional_Cases, i.e. attribute Range is not
considered to be a read of a formal parameter that would not be allowed
in the contract.
gcc/ada/
* sem_res.adb (Resolve_Entity_Name): Handle Range like First and Last.
Jose Ruiz [Tue, 30 May 2023 14:08:38 +0000 (16:08 +0200)]
ada: Document partition-wide Ada signal handlers
Indicate the signal handlers that are set by the Ada
run time, and explain how to prevent them if needed.
gcc/ada/
* doc/gnat_ugn/the_gnat_compilation_model.rst
(Partition-Wide Settings): Add this subsection to document
configuration settings made by the Ada run time.
* gnat_ugn.texi: Regenerate.
Piotr Trojanek [Thu, 1 Jun 2023 07:59:40 +0000 (09:59 +0200)]
ada: Fix for quantified expressions in Exceptional_Cases
When detecting illegal uses of formal parameters of the current
subprogram in contract of its Exceptional_Cases, we relied on the
Current_Scope. However, quantified expressions introduce an implicit
scope, which we need to take into account.
Bob Duff [Wed, 31 May 2023 13:21:44 +0000 (09:21 -0400)]
ada: Fix bug in predicate checks with address clauses
This patch fixes a compiler bug triggered by having a type with some
defaulted components, and a predicate, and an object of that type with
an address clause. In this case, the compiler was crashing.
gcc/ada/
* sem_ch3.adb (Analyze_Object_Declaration): Remove predicate-check
generation if there is an address clause. These are unnecessary,
and cause gigi to crash.
* exp_util.ads (Following_Address_Clause): Remove obsolete "???"
comments. The suggested changes were done long ago.
Eric Botcazou [Wed, 31 May 2023 12:32:59 +0000 (14:32 +0200)]
ada: Fix fallout of fix to handling of private views in instances
Check_Actual_Type incorrectly switches the view of a private type declared
in the enclosing scope of a generic unit but that has a private ancestor.
gcc/ada/
* einfo.ads (Has_Private_Ancestor): Fix inaccuracy in description.
* sem_ch12.adb (Check_Actual_Type): Do not switch the view of the
type if it has a private ancestor.
Eric Botcazou [Mon, 29 May 2023 10:02:28 +0000 (12:02 +0200)]
ada: Small fixes to handling of private views in instances
The main change is the removal of the special bypass for private views in
Resolve_Implicit_Dereference, which in exchange requires additional work
in Check_Generic_Actuals and a couple more calls to Set_Global_Type in
Save_References_In_Identifier. This also removes an unused parameter in
Convert_View and adds a missing comment in Build_Derived_Record_Type.
gcc/ada/
* exp_ch7.adb (Convert_View): Remove Ind parameter and adjust.
* sem_ch12.adb (Check_Generic_Actuals): Check the type of both in
and in out actual objects, as well as the type of formal parameters
of actual subprograms. Extend the condition under which the views
are swapped to nested generic constructs.
(Save_References_In_Identifier): Call Set_Global_Type on a global
identifier rewritten as an explicit dereference, either directly
or after having first been rewritten as a function call.
(Save_References_In_Operator): Set N2 unconditionally and reuse it.
* sem_ch3.adb (Build_Derived_Record_Type): Add missing comment.
* sem_res.adb (Resolve_Implicit_Dereference): Remove special bypass
for private views in instances.
Eric Botcazou [Thu, 25 May 2023 22:09:14 +0000 (00:09 +0200)]
ada: Fix internal error on aggregate within container aggregate
This just applies the same fix to Expand_Array_Aggregate as the one that was
recently applied to Convert_To_Assignments.
gcc/ada/
* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
(Expand_Array_Aggregate): Do not delay the expansion if the parent
node is a container aggregate.
Ghjuvan Lacambre [Fri, 26 May 2023 11:26:21 +0000 (13:26 +0200)]
ada: Fix -fdiagnostics-format=json not printing all messages
The previous version of this code stopped printing messages as soon as
it encountered a deleted or continuation message. This was wrong:
continuation and deleted messages can be followed by live messages that
still need to be printed.
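A minimal sketch of the corrected loop shape, assuming a hypothetical linked list of messages with `deleted` and `continuation` flags (these names are illustrative, not GNAT's actual error tables): entries that should not be printed on their own are skipped with `continue`, rather than ending the traversal as a `break` would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message record; GNAT's real error table differs.  */
struct msg
{
  bool deleted;        /* message was suppressed */
  bool continuation;   /* not printed as a top-level entry here */
  struct msg *next;
};

/* Count the messages that would be printed as top-level entries.
   Deleted and continuation messages are skipped, but the walk goes on,
   because live messages may follow them.  */
size_t count_printable (const struct msg *m)
{
  size_t n = 0;
  for (; m != NULL; m = m->next)
    {
      if (m->deleted || m->continuation)
        continue;   /* the buggy version effectively stopped here */
      n++;
    }
  return n;
}
```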
Eric Botcazou [Thu, 25 May 2023 14:40:35 +0000 (16:40 +0200)]
ada: Introduce -gnateH switch to force reverse Bit_Order threshold to 64
This can be helpful for legacy code that still makes use of an original
reverse Bit_Order clause, i.e. without a Scalar_Storage_Order clause.
gcc/ada/
* doc/gnat_ugn/building_executable_programs_with_gnat.rst (Compiler
Switches): Document -gnateH.
* opt.ads (Reverse_Bit_Order_Threshold): New variable.
* sem_ch13.adb (Adjust_Record_For_Reverse_Bit_Order): Use its value
if it is nonnegative instead of System_Max_Integer_Size.
* switch-c.adb (Scan_Front_End_Switches): Deal with -gnateH.
* usage.adb (Usage): Print -gnateH.
* gnat_ugn.texi: Regenerate.
Marc Poulhiès [Mon, 22 May 2023 11:59:05 +0000 (13:59 +0200)]
ada: Fix type derivation of subtype of derived type
Deriving from a subtype of a derived type of a private type, whose full
view is itself a derived type of a discriminated record with a known
discriminant, was failing with the error message:
invalid constraint: type has no discriminant
The compiler needs to use the full view to be able to constrain the
type.
Also fix a minor typo in comments.
gcc/ada/
* sem_ch3.adb (Build_Derived_Record_Type): Use full view as
Parent_Base if needed.
Ghjuvan Lacambre [Tue, 23 May 2023 15:50:24 +0000 (17:50 +0200)]
ada: Pass Error_Node to calls to Error_Msg in lib-load.adb
When not passing Error_Node, Error_Msg will treat Current_Node as the
node attached to the message. When this happens in lib-load.adb due to a
file that cannot be loaded, Current_Node might reference a node that
doesn't actually exist. This is a problem when using -gnatdJ and
-fdiagnostics-format, as in this case GNAT will attempt to retrieve
information from the node attached to the message and thus crash when
said node is invalid.
gcc/ada/
* lib-load.adb (Load_Unit): Pass Error_Node to calls to Error_Msg.
Claire Dross [Thu, 30 Mar 2023 09:09:33 +0000 (11:09 +0200)]
ada: Remove references to Might_Not_Return and Always_Return
The Might_Not_Return and Always_Return annotations for GNATprove
should now be replaced by the two more precise aspects
Exceptional_Cases and Always_Terminates.
They allow specifying whether a subprogram is allowed to raise
exceptions or to fail to complete.
Javier Miranda [Wed, 3 May 2023 17:30:51 +0000 (17:30 +0000)]
ada: Spurious error on package instantiation
The compiler reports spurious errors when processing the instantiation
of a generic package if the instantiation is performed in the
body of a package that has a private type T, a dispatching
primitive of T has the same name as a component of T, and
an extension of T is used as the actual parameter for a
formal derived type of T in the instantiation.
gcc/ada/
* sem_ch4.adb
(Try_Selected_Component_In_Instance): New subprogram; factorizes
existing code.
(Find_Component_In_Instance): Moved inside the new subprogram.
(Analyze_Selected_Component): Invoke the new subprogram before
trying the Object.Operation notation.
ada: Fix edge case in Ada.Calendar.Formatting.Time_Of
Before this patch, Ada.Calendar.Formatting.Time_Of executed extra code
when passed a number of seconds equal to the number of seconds in a day.
This caused the result to be off, perhaps because a statement resetting
the number of seconds to zero was missing.
Instead of adding such a statement, this patch removes the special
handling of the problematic case, which gives the intended result.
gcc/ada/
* libgnat/a-calfor.adb (Time_Of): Fix handling of special case.
Jan Beulich [Tue, 20 Jun 2023 07:05:48 +0000 (09:05 +0200)]
x86: correct and improve "*vec_dupv2di"
The input constraint for the %vmovddup alternative was wrong, as the
upper 16 XMM registers require AVX512VL to be used with this insn. To
compensate, introduce a new alternative permitting all 32 registers, by
broadcasting to the full 512 bits in that case if AVX512VL is not
available.
Richard Biener [Mon, 19 Jun 2023 07:23:16 +0000 (09:23 +0200)]
debug/110295 - mixed up early/late debug for member DIEs
When we process a scope typedef during early debug creation, and
we have already created a DIE for the type when the decl is
TYPE_DECL_IS_STUB and this DIE is still in limbo, we end up
just re-parenting that type DIE instead of properly creating
a DIE for the decl, which would eventually pick up the now-completed
type and create DIEs for the members. Instead, this is currently
deferred to the second time we come here, when we annotate the
DIEs with locations late; by then the type DIE is no longer
in limbo and we fall through, doing the job for the decl.
The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.
PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.
Lehua Ding [Sun, 18 Jun 2023 11:41:57 +0000 (19:41 +0800)]
RISC-V: Add tuple vector mode psABI checking and simplify code
This patch does several things:
1. Adds the missing check for tuple vector modes.
2. Extends the scope of checking to all vector types; previously it
was only for scalable vector types.
3. Simplifies the logic for determining the code of a vector type that
will be lowered to vector tmode code.
Jin Ma [Mon, 19 Jun 2023 19:02:47 +0000 (13:02 -0600)]
RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.
To prevent interrupt functions from changing the FCSR, it needs to be saved
and restored at the beginning and end of the function.
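The FCSR combines the accrued exception flags (fflags, bits 4:0) and the dynamic rounding mode (frm, bits 7:5). The sketch below simulates it with a plain variable to show why a handler must save and restore it; `frcsr`/`fscsr` here are hypothetical C stand-ins for the instructions, not the prologue/epilogue code GCC actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated FCSR: fflags in bits 4:0, frm (rounding mode) in bits 7:5.  */
static uint32_t fcsr;

/* Hypothetical stand-ins for the frcsr/fscsr instructions.  */
static uint32_t frcsr (void) { return fcsr; }
static void fscsr (uint32_t v) { fcsr = v; }

/* FP work inside the handler: like real FP instructions, it updates
   FCSR, here by setting the NX (inexact) flag, bit 0.  */
static void fp_work (void) { fcsr |= 0x1u; }

/* Handler body with the fix applied: FCSR is saved on entry and
   restored on exit, so the interrupted code never observes flags or
   a rounding mode that it did not set itself.  */
void isr (void)
{
  uint32_t saved = frcsr ();              /* prologue */
  fcsr = (fcsr & ~0xe0u) | (1u << 5);     /* frm = 001 (RTZ) */
  fp_work ();
  fscsr (saved);                          /* epilogue */
}
```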
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for FCSR.
(riscv_for_each_saved_reg): Save and restore FCSR in interrupt functions.
* config/riscv/riscv.md (riscv_frcsr): New patterns.
(riscv_fscsr): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/interrupt-fcsr-1.c: New test.
* gcc.target/riscv/interrupt-fcsr-2.c: New test.
* gcc.target/riscv/interrupt-fcsr-3.c: New test.
Jan Hubicka [Mon, 19 Jun 2023 16:28:17 +0000 (18:28 +0200)]
optimize std::max early
We currently produce very bad code on loops using std::vector as a stack, since
we fail to inline push_back, which in turn prevents SRA, and we fail to optimize
out some store-to-load pairs.
I looked into why this function is not inlined by GCC while it is inlined by
clang. We currently estimate it at 66 instructions, and the inline limits are
15 at -O2 and 30 at -O3. Clang has a similar estimate, but still decides to
inline at -O2.
I looked into the reason why the body is so large, and one problem I spotted is
the way std::max is implemented, by taking and returning references to the values.
const T& max( const T& a, const T& b );
This makes it necessary to store the values to memory and load them later,
and max is used by the code computing the new size of the vector on resize.
We optimize this to MAX_EXPR, but only during late optimizations. I think this
is a common enough coding pattern that we ought to make it transparent to
early opts and IPA. The following is the easiest fix: it simply adds the phiprop
pass, which turns the PHI of address values into a PHI of values, so that later
FRE can propagate values across memory, phiopt can discover the MAX_EXPR
pattern, and DSE can remove the memory stores.
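In C terms, a reference-returning max is a select between two addresses followed by a load through the result. The sketch below (function names are illustrative, and the growth computation `size + max (size, n)` is an assumed shape) contrasts that with the value form that phiprop, followed by FRE and phiopt, is meant to recover.

```c
#include <assert.h>

/* Reference-style max, like std::max (const T &, const T &):
   returns the address of B if *A < *B, otherwise of A.  */
static const int *max_ref (const int *a, const int *b)
{
  return (*a < *b) ? b : a;
}

/* Before: once max_ref is inlined, the select is a PHI of the two
   addresses followed by a load, so SIZE and N must live in memory.  */
int grow_before (int size, int n)
{
  return size + *max_ref (&size, &n);
}

/* After phiprop: the load is moved into the PHI arguments, giving a
   PHI of values.  FRE can then forward the stored values, phiopt can
   turn the select into MAX_EXPR, and DSE can drop the dead stores.  */
int grow_after (int size, int n)
{
  return size + ((size < n) ? n : size);
}
```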
gcc/ChangeLog:
PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow cloning.
Tamar Christina [Mon, 19 Jun 2023 14:47:46 +0000 (15:47 +0100)]
New compact syntax for insn and insn_split in Machine Descriptions.
This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.
With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.
The idea behind this compact syntax is that it is often quite hard to
correlate the entries in the constraints list, attributes and instruction lists.
One has to count, and this is often tedious. Additionally, when changing a single
line in the insn, multiple lines in the diff change, making it harder to see what's
going on.
This new syntax takes into account many of the common things that are done in MD
files. It's also worth saying that this version is intended to deal with the
common case of string-based alternatives. For C chunks we have some ideas,
but those are not intended to be addressed here.
The main syntax rules are as follows (See docs for full rules):
- Template must start with "{@" and end with "}" to use the new syntax.
- "{@" is followed by a layout in parentheses which is "cons:" followed by
a list of match_operand/match_scratch IDs, then a semicolon, then the
same for attributes ("attrs:"). Both sections are optional (so you can
use only cons, or only attrs, or both), and cons must come before attrs
if present.
- Each alternative begins with any amount of whitespace.
- Following the whitespace is a comma-separated list of constraints and/or
attributes within brackets [], with sections separated by a semicolon.
- Following the closing ']' is any amount of whitespace, and then the actual
asm output.
- Spaces are allowed in the list (they will simply be removed).
- All alternatives should be specified: a blank list should be
"[,,]", "[,,;,]" etc., not "[]" or "" (however genattr may segfault if
you leave certain attributes empty, I have found).
- The actual constraint string in the match_operand or match_scratch, and
the attribute string in the set_attr, must be blank or an empty string
(you can't combine the old and new syntaxes).
- The common idiom * return can be shortened by using <<.
- Any unexpanded iterators left during processing will result in an error at
compile time. If for some reason <> is needed in the output then these
must be escaped using \.
- Within an {@ block both multiline and single-line C comments are allowed, but
when used outside of a C block they must be the only non-whitespace blocks on
the line.
- Inside an {@ block any unexpanded iterators will result in a compile time
fault instead of incorrect assembly being generated at runtime. If the
literal <> is needed in the output this needs to be escaped with \<\>.
- This check is not performed inside C blocks (lines starting with *).
- Instead of copying the previous instruction again in the next pattern, one
can use ^ to refer to the previous asm string.
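To make the alternative-line rules concrete, here is a small parser sketch for a single line such as `[ r , m ; yes ] mov\t%0, %1`. It is a hypothetical illustration, not the code added to gensupport.cc: it drops spaces inside the brackets, splits fields on commas, switches from constraints to attributes at the semicolon, and returns the remaining text as the asm output. It does not check field counts against the layout, expand `^` or `<<`, or handle comments.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FIELDS 8
#define FIELD_LEN 16

/* Hypothetical parsed form of one alternative.  */
struct alt
{
  int ncons, nattrs;
  char cons[MAX_FIELDS][FIELD_LEN];
  char attrs[MAX_FIELDS][FIELD_LEN];
  const char *asm_out;   /* points into the input line */
};

/* Parse "  [c1, c2; a1] template".  Returns false on malformed input.  */
bool parse_alternative (const char *p, struct alt *out)
{
  memset (out, 0, sizeof *out);   /* also NUL-terminates every field */
  while (isspace ((unsigned char) *p))
    p++;
  if (*p++ != '[')
    return false;

  bool in_attrs = false;
  int len = 0;
  for (;; p++)
    {
      char c = *p;
      if (c == '\0')
        return false;                      /* missing ']' */
      if (isspace ((unsigned char) c))
        continue;                          /* spaces are simply removed */
      int *count = in_attrs ? &out->nattrs : &out->ncons;
      char (*fields)[FIELD_LEN] = in_attrs ? out->attrs : out->cons;
      if (c == ',' || c == ';' || c == ']')
        {
          if (*count >= MAX_FIELDS)
            return false;
          (*count)++;                      /* finish the (possibly empty) field */
          len = 0;
          if (c == ']')
            break;
          if (c == ';')
            {
              if (in_attrs)
                return false;              /* at most two sections */
              in_attrs = true;
            }
          continue;
        }
      if (*count >= MAX_FIELDS || len >= FIELD_LEN - 1)
        return false;
      fields[*count][len++] = c;
    }
  p++;                                     /* skip ']' */
  while (isspace ((unsigned char) *p))
    p++;
  out->asm_out = p;
  return true;
}
```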
This patch works by blindly transforming the new syntax into the old syntax,
so it doesn't do extensive checking. However, it does verify that:
- The correct number of constraints/attributes are specified.
- You haven't mixed old and new syntax.
- The specified operand IDs/attribute names actually exist.
- You don't have duplicate cons
If something goes wrong, it may write invalid constraints/attributes/template
back into the rtx. But this shouldn't matter because error_at will cause the
program to fail on exit anyway.
Because this transformation occurs as early as possible (before patterns are
queued), the rest of the compiler can completely ignore the new syntax and
assume that the old syntax will always be used.
This doesn't seem to have any measurable effect on the runtime of gen*
programs.
gcc/ChangeLog:
* gensupport.cc (class conlist, add_constraints, add_attributes,
skip_spaces, expect_char, preprocess_compact_syntax,
parse_section_layout, parse_section, convert_syntax): New.
(process_rtx): Check for conversion.
* genoutput.cc (process_template): Check for unresolved iterators.
(class data): Add compact_syntax_p.
(gen_insn): Use it.
* gensupport.h (compact_syntax): New.
(hash-set.h): Include.
* doc/md.texi: Document it.
Uros Bizjak [Mon, 19 Jun 2023 09:49:08 +0000 (11:49 +0200)]
recog: Change return type of predicate functions from int to bool
Also change some internal variables to bool and change return type of
split_all_insns_noflow to void.
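The conversion follows a simple pattern. A before/after sketch on a hypothetical predicate (not one of the functions listed below) shows the shape: the return type and internal flags become `bool`, and `1`/`0` become `true`/`false`, with no change in behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: int used as a truth value.  */
static int all_positive_int (const int *v, int n)
{
  int win = 1;
  for (int i = 0; i < n; i++)
    if (v[i] <= 0)
      win = 0;
  return win;
}

/* After: identical logic; the type now states that only two
   values are possible.  */
static bool all_positive (const int *v, int n)
{
  bool win = true;
  for (int i = 0; i < n; i++)
    if (v[i] <= 0)
      win = false;
  return win;
}
```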
gcc/ChangeLog:
* recog.h (check_asm_operands): Change return type from int to bool.
(insn_invalid_p): Ditto.
(verify_changes): Ditto.
(apply_change_group): Ditto.
(constrain_operands): Ditto.
(constrain_operands_cached): Ditto.
(validate_replace_rtx_subexp): Ditto.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(added_clobbers_hard_reg_p): Ditto.
(peep2_regno_dead_p): Ditto.
(peep2_reg_dead_p): Ditto.
(store_data_bypass_p): Ditto.
(if_test_bypass_p): Ditto.
* rtl.h (split_all_insns_noflow): Change
return type from unsigned int to void.
* genemit.cc (output_added_clobbers_hard_reg_p): Change return type
of generated added_clobbers_hard_reg_p from int to bool and adjust
function body accordingly. Change "used" variable type from
int to bool.
* recog.cc (check_asm_operands): Change return type
from int to bool and adjust function body accordingly.
(insn_invalid_p): Ditto. Change "is_asm" variable to bool.
(verify_changes): Change return type from int to bool.
(apply_change_group): Change return type from int to bool
and adjust function body accordingly.
(validate_replace_rtx_subexp): Change return type from int to bool.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(constrain_operands_cached): Ditto.
(constrain_operands): Ditto. Change "lose" and "win"
variables type from int to bool.
(split_all_insns_noflow): Change return type from unsigned int
to void and adjust function body accordingly.
(peep2_regno_dead_p): Change return type from int to bool.
(peep2_reg_dead_p): Ditto.
(peep2_find_free_register): Change "success"
variable type from int to bool.
(store_data_bypass_p_1): Change return type from int to bool.
(store_data_bypass_p): Ditto.
Pan Li [Sun, 18 Jun 2023 15:07:53 +0000 (23:07 +0800)]
RISC-V: Bugfix for RVV widening reduction in ZVE32/64
The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following:
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf; // ZVE64
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf; // ZVE32
Thus there will be a problem here. For example, for zve32 we will call
code_for_reduc (max, VNx1HF, VNx1HF), which will return the code of
the ZVE128+ pattern instead of the ZVE32 one.
This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32
will use code_for_reduc (max, VNx1HF, VNx2HF), and then the correct code of
ZVE32 will be returned as expected.
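The ambiguity can be modeled as a lookup keyed on (code, input mode, result mode). In this sketch every enum value is a placeholder rather than a real GCC identifier, and a linear search stands in for the generated code_for_* function: when all variants share a key, the first entry always wins; once the result mode differs per variant, each key is unique.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder enums; none of these are GCC's real identifiers.  */
enum red_code { RED_MAX };
enum vmode { IN_MODE, RET_A, RET_B, RET_C };
enum icode { ICODE_NONE, ICODE_ZVE128, ICODE_ZVE64, ICODE_ZVE32 };

struct entry
{
  enum red_code code;
  enum vmode in, ret;
  enum icode insn;
};

/* Before: every variant is keyed on (IN_MODE, IN_MODE).  */
static const struct entry before[] = {
  { RED_MAX, IN_MODE, IN_MODE, ICODE_ZVE128 },
  { RED_MAX, IN_MODE, IN_MODE, ICODE_ZVE64 },
  { RED_MAX, IN_MODE, IN_MODE, ICODE_ZVE32 },
};

/* After: each variant has its own result-vector mode.  */
static const struct entry after[] = {
  { RED_MAX, IN_MODE, RET_A, ICODE_ZVE128 },
  { RED_MAX, IN_MODE, RET_B, ICODE_ZVE64 },
  { RED_MAX, IN_MODE, RET_C, ICODE_ZVE32 },
};

/* First matching entry wins.  */
static enum icode
lookup (const struct entry *t, size_t n, enum red_code code,
        enum vmode in, enum vmode ret)
{
  for (size_t i = 0; i < n; i++)
    if (t[i].code == code && t[i].in == in && t[i].ret == ret)
      return t[i].insn;
  return ICODE_NONE;
}
```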
Please note both GCC 13 and 14 are impacted by this issue.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
PR target/110299
* gcc.target/riscv/rvv/base/pr110299-1.c: New test.
* gcc.target/riscv/rvv/base/pr110299-1.h: New test.
* gcc.target/riscv/rvv/base/pr110299-2.c: New test.
* gcc.target/riscv/rvv/base/pr110299-2.h: New test.
* gcc.target/riscv/rvv/base/pr110299-3.c: New test.
* gcc.target/riscv/rvv/base/pr110299-3.h: New test.
* gcc.target/riscv/rvv/base/pr110299-4.c: New test.
* gcc.target/riscv/rvv/base/pr110299-4.h: New test.
Pan Li [Sat, 17 Jun 2023 14:11:02 +0000 (22:11 +0800)]
RISC-V: Bugfix for RVV float reduction in ZVE32/64
The RVV float reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following:
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf; // ZVE64
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf; // ZVE32
Thus there will be a problem here. For example, for zve32 we will call
code_for_reduc (max, VNx1HF, VNx1HF), which will return the code of
the ZVE128+ pattern instead of the ZVE32 one.
This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32
will use code_for_reduc (max, VNx1HF, VNx2HF), and then the correct code of
ZVE32 will be returned as expected.
Please note both GCC 13 and 14 are impacted by this issue.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
PR target/110277
* gcc.target/riscv/rvv/base/pr110277-1.c: New test.
* gcc.target/riscv/rvv/base/pr110277-1.h: New test.
* gcc.target/riscv/rvv/base/pr110277-2.c: New test.
* gcc.target/riscv/rvv/base/pr110277-2.h: New test.
Andrew Stubbs [Thu, 27 Apr 2023 14:34:28 +0000 (15:34 +0100)]
amdgcn: implement vector div and mod libfuncs
Also divmod, but only for scalar modes, for now (because there are no complex
int vectors yet).
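Conceptually, a vector division or modulus libfunc applies the scalar operation lane by lane, while the scalar-only divmod produces quotient and remainder in one call. The sketch below uses plain arrays and hypothetical names (`vec_div`, `vec_mod`, `divmod`); the real libgcc entry points in lib2-vec_divmod*.c have their own names, types, and calling convention.

```c
#include <assert.h>

#define LANES 4   /* assumed lane count for illustration */

/* Lane-wise quotient; each lane is independent.  */
void vec_div (int r[LANES], const int a[LANES], const int b[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      assert (b[i] != 0);   /* division by zero is undefined;
                               INT_MIN / -1 is not guarded here */
      r[i] = a[i] / b[i];   /* C truncates toward zero */
    }
}

/* Lane-wise remainder.  */
void vec_mod (int r[LANES], const int a[LANES], const int b[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      assert (b[i] != 0);
      r[i] = a[i] % b[i];   /* sign follows the dividend */
    }
}

/* Scalar divmod: quotient returned, remainder stored through REM.  */
int divmod (int a, int b, int *rem)
{
  assert (b != 0);
  *rem = a % b;
  return a / b;
}
```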
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_divmod_libfunc): New function.
(gcn_init_libfuncs): Add div and mod functions for all modes.
Add placeholders for divmod functions.
(TARGET_EXPAND_DIVMOD_LIBFUNC): Define.
libgcc/ChangeLog:
* config/gcn/lib2-divmod-di.c: Reimplement like lib2-divmod.c.
* config/gcn/lib2-divmod.c: Likewise.
* config/gcn/lib2-gcn.h: Add new types and prototypes for all the
new vector libfuncs.
* config/gcn/t-amdgcn: Add new files.
* config/gcn/amdgcn_veclib.h: New file.
* config/gcn/lib2-vec_divmod-di.c: New file.
* config/gcn/lib2-vec_divmod-hi.c: New file.
* config/gcn/lib2-vec_divmod-qi.c: New file.
* config/gcn/lib2-vec_divmod.c: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/predcom-2.c: Avoid vectors on amdgcn.
* gcc.dg/unroll-8.c: Likewise.
* gcc.dg/vect/slp-26.c: Change expected results on amdgcn.
* lib/target-supports.exp
(check_effective_target_vect_int_mod): Add amdgcn.
(check_effective_target_divmod): Likewise.
* gcc.target/gcn/simd-math-3-16.c: New test.
* gcc.target/gcn/simd-math-3-2.c: New test.
* gcc.target/gcn/simd-math-3-32.c: New test.
* gcc.target/gcn/simd-math-3-4.c: New test.
* gcc.target/gcn/simd-math-3-8.c: New test.
* gcc.target/gcn/simd-math-3-char-16.c: New test.
* gcc.target/gcn/simd-math-3-char-2.c: New test.
* gcc.target/gcn/simd-math-3-char-32.c: New test.
* gcc.target/gcn/simd-math-3-char-4.c: New test.
* gcc.target/gcn/simd-math-3-char-8.c: New test.
* gcc.target/gcn/simd-math-3-char-run-16.c: New test.
* gcc.target/gcn/simd-math-3-char-run-2.c: New test.
* gcc.target/gcn/simd-math-3-char-run-32.c: New test.
* gcc.target/gcn/simd-math-3-char-run-4.c: New test.
* gcc.target/gcn/simd-math-3-char-run-8.c: New test.
* gcc.target/gcn/simd-math-3-char-run.c: New test.
* gcc.target/gcn/simd-math-3-char.c: New test.
* gcc.target/gcn/simd-math-3-long-16.c: New test.
* gcc.target/gcn/simd-math-3-long-2.c: New test.
* gcc.target/gcn/simd-math-3-long-32.c: New test.
* gcc.target/gcn/simd-math-3-long-4.c: New test.
* gcc.target/gcn/simd-math-3-long-8.c: New test.
* gcc.target/gcn/simd-math-3-long-run-16.c: New test.
* gcc.target/gcn/simd-math-3-long-run-2.c: New test.
* gcc.target/gcn/simd-math-3-long-run-32.c: New test.
* gcc.target/gcn/simd-math-3-long-run-4.c: New test.
* gcc.target/gcn/simd-math-3-long-run-8.c: New test.
* gcc.target/gcn/simd-math-3-long-run.c: New test.
* gcc.target/gcn/simd-math-3-long.c: New test.
* gcc.target/gcn/simd-math-3-run-16.c: New test.
* gcc.target/gcn/simd-math-3-run-2.c: New test.
* gcc.target/gcn/simd-math-3-run-32.c: New test.
* gcc.target/gcn/simd-math-3-run-4.c: New test.
* gcc.target/gcn/simd-math-3-run-8.c: New test.
* gcc.target/gcn/simd-math-3-run.c: New test.
* gcc.target/gcn/simd-math-3-short-16.c: New test.
* gcc.target/gcn/simd-math-3-short-2.c: New test.
* gcc.target/gcn/simd-math-3-short-32.c: New test.
* gcc.target/gcn/simd-math-3-short-4.c: New test.
* gcc.target/gcn/simd-math-3-short-8.c: New test.
* gcc.target/gcn/simd-math-3-short-run-16.c: New test.
* gcc.target/gcn/simd-math-3-short-run-2.c: New test.
* gcc.target/gcn/simd-math-3-short-run-32.c: New test.
* gcc.target/gcn/simd-math-3-short-run-4.c: New test.
* gcc.target/gcn/simd-math-3-short-run-8.c: New test.
* gcc.target/gcn/simd-math-3-short-run.c: New test.
* gcc.target/gcn/simd-math-3-short.c: New test.
* gcc.target/gcn/simd-math-3.c: New test.
* gcc.target/gcn/simd-math-4-char-run.c: New test.
* gcc.target/gcn/simd-math-4-char.c: New test.
* gcc.target/gcn/simd-math-4-long-run.c: New test.
* gcc.target/gcn/simd-math-4-long.c: New test.
* gcc.target/gcn/simd-math-4-run.c: New test.
* gcc.target/gcn/simd-math-4-short-run.c: New test.
* gcc.target/gcn/simd-math-4-short.c: New test.
* gcc.target/gcn/simd-math-4.c: New test.
* gcc.target/gcn/simd-math-5-16.c: New test.
* gcc.target/gcn/simd-math-5-32.c: New test.
* gcc.target/gcn/simd-math-5-4.c: New test.
* gcc.target/gcn/simd-math-5-8.c: New test.
* gcc.target/gcn/simd-math-5-char-16.c: New test.
* gcc.target/gcn/simd-math-5-char-32.c: New test.
* gcc.target/gcn/simd-math-5-char-4.c: New test.
* gcc.target/gcn/simd-math-5-char-8.c: New test.
* gcc.target/gcn/simd-math-5-char-run-16.c: New test.
* gcc.target/gcn/simd-math-5-char-run-32.c: New test.
* gcc.target/gcn/simd-math-5-char-run-4.c: New test.
* gcc.target/gcn/simd-math-5-char-run-8.c: New test.
* gcc.target/gcn/simd-math-5-char-run.c: New test.
* gcc.target/gcn/simd-math-5-char.c: New test.
* gcc.target/gcn/simd-math-5-long-16.c: New test.
* gcc.target/gcn/simd-math-5-long-32.c: New test.
* gcc.target/gcn/simd-math-5-long-4.c: New test.
* gcc.target/gcn/simd-math-5-long-8.c: New test.
* gcc.target/gcn/simd-math-5-long-run-16.c: New test.
* gcc.target/gcn/simd-math-5-long-run-32.c: New test.
* gcc.target/gcn/simd-math-5-long-run-4.c: New test.
* gcc.target/gcn/simd-math-5-long-run-8.c: New test.
* gcc.target/gcn/simd-math-5-long-run.c: New test.
* gcc.target/gcn/simd-math-5-long.c: New test.
* gcc.target/gcn/simd-math-5-run-16.c: New test.
* gcc.target/gcn/simd-math-5-run-32.c: New test.
* gcc.target/gcn/simd-math-5-run-4.c: New test.
* gcc.target/gcn/simd-math-5-run-8.c: New test.
* gcc.target/gcn/simd-math-5-run.c: New test.
* gcc.target/gcn/simd-math-5-short-16.c: New test.
* gcc.target/gcn/simd-math-5-short-32.c: New test.
* gcc.target/gcn/simd-math-5-short-4.c: New test.
* gcc.target/gcn/simd-math-5-short-8.c: New test.
* gcc.target/gcn/simd-math-5-short-run-16.c: New test.
* gcc.target/gcn/simd-math-5-short-run-32.c: New test.
* gcc.target/gcn/simd-math-5-short-run-4.c: New test.
* gcc.target/gcn/simd-math-5-short-run-8.c: New test.
* gcc.target/gcn/simd-math-5-short-run.c: New test.
* gcc.target/gcn/simd-math-5-short.c: New test.
* gcc.target/gcn/simd-math-5.c: New test.
Andrew Stubbs [Fri, 16 Jun 2023 16:48:23 +0000 (17:48 +0100)]
amdgcn: Delete inactive libfuncs
The HImode libfuncs weren't called and trying to enable them fails because
TARGET_PROMOTE_FUNCTION_MODE wants to widen the arguments but the signedness
isn't known.
Richard Biener [Mon, 19 Jun 2023 06:20:16 +0000 (08:20 +0200)]
Remove -save-temps from tests using -flto
The following removes -save-temps, for which there seems to be no
good reason, from tests that also run with -flto added. It can
cause ltrans files to race with other multilibs being tested, and I'm
frequently seeing linker complaints that the architecture
doesn't match here.
I'm not sure whether the .ltrans.o files end up in a directory that is
not gccN/-specific, or whether we end up sharing the same directory for
different multilibs (and whether that is easily avoidable).
Richard Biener [Mon, 19 Jun 2023 07:52:45 +0000 (09:52 +0200)]
tree-optimization/110298 - CFG cleanup and stale nb_iterations
When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names. But we do this only after cleaning
up the CFG which in turn can end up accessing it. Fixed by
swapping the two.
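The bug is an ordering problem: cached per-loop data that may refer to released objects must be cleared before any pass that reads it runs. The toy model below uses hypothetical structures (not GCC's `class loop` or its CFG cleanup) to show the corrected order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ssa_name { bool released; };

/* Toy loop: nb_iterations caches an expression that may reference
   an SSA name removed by unrolling.  */
struct loop { struct ssa_name *nb_iterations; };

/* Stand-in for CFG cleanup, which may consult the cached iteration
   count.  Returns false if it would have read a released name.  */
static bool cleanup_cfg (const struct loop *l)
{
  return l->nb_iterations == NULL || !l->nb_iterations->released;
}

/* After unrolling has released NAME: clear the cached info first,
   then clean up the CFG (the fix swaps these two steps).  */
bool finish_unrolling (struct loop *l, struct ssa_name *name)
{
  name->released = true;      /* unrolling removed the SSA name */
  l->nb_iterations = NULL;    /* forget stale info ... */
  return cleanup_cfg (l);     /* ... before anything can read it */
}
```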
PR tree-optimization/110298
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.
Kyrylo Tkachov [Mon, 19 Jun 2023 09:56:37 +0000 (10:56 +0100)]
simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector
In the testcase for this patch we try to vec_concat the lowpart and highpart of a vector, but the lowpart is expressed as a subreg.
simplify-rtx.cc does not recognise this and combine ends up trying to match:
Trying 7 -> 8:
    7: r93:V2SI=vec_select(r95:V4SI,parallel)
    8: r97:V4SI=vec_concat(r95:V4SI#0,r93:V2SI)
      REG_DEAD r95:V4SI
      REG_DEAD r93:V2SI
Failed to match this instruction:
(set (reg:V4SI 97)
    (vec_concat:V4SI (subreg:V2SI (reg/v:V4SI 95 [ a ]) 0)
        (vec_select:V2SI (reg/v:V4SI 95 [ a ])
            (parallel:V4SI [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                ]))))
This should be just (set (reg:V4SI 97) (reg:V4SI 95)). This patch adds such a simplification.
The testcase is a bit artificial, but I do have other aarch64-specific patterns that I want to optimise later
that rely on this simplification happening.
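The matching condition can be sketched on a toy description of the two operands. This is a hypothetical model, not the simplify-rtx.cc code: it assumes the lowpart subreg covers exactly elements 0 to nunits/2 - 1 and ignores the endianness details a real implementation must handle. The concatenation is the original register when both halves come from the same register and the selected indices are nunits/2 through nunits - 1, in order.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operands: the low half is a lowpart subreg of LOW_REG (assumed
   to cover exactly the first half); the high half selects elements
   SEL[0..NSEL-1] of HIGH_REG.  */
struct concat_operands
{
  int low_reg;
  int high_reg;
  const int *sel;
  int nsel;
  int nunits;   /* element count of the full vector */
};

/* True if vec_concat (lowpart (X), vec_select (X, [n/2 .. n-1]))
   can be replaced by X itself.  */
bool concat_is_identity (const struct concat_operands *op)
{
  int half = op->nunits / 2;
  if (op->low_reg != op->high_reg || op->nsel != op->nunits - half)
    return false;
  for (int i = 0; i < op->nsel; i++)
    if (op->sel[i] != half + i)
      return false;
  return true;
}
```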
Without this patch for the testcase we generate:
foo:
        dup     d31, v0.d[1]
        ins     v0.d[1], v31.d[0]
        ret
whereas we should just not generate anything as the operation is ultimately a no-op.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify vec_concat of lowpart subreg and high part vec_select.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/low-high-combine_1.c: New test.
Tobias Burnus [Mon, 19 Jun 2023 08:24:08 +0000 (10:24 +0200)]
Doc update: -foffload-options= examples + OpenMP in Fortran intrinsic modules
With LTO, the -O.. flags of the host are passed on to the LTO compiler, which
also includes the offloading compilers. Therefore, using -foffload-options=-O3 is
misleading, as it implies that without it the offloading compilers would only use
their default optimization level. Hence, this flag has now been removed from the
usage examples.
The Fortran documentation lists the contents (except for API routines)
of the intrinsic OpenMP modules OMP_LIB and OMP_LIB_KINDS; this commit adds
two missing named constants and also links to the OpenMP 5.1 and 5.2
specifications for completeness.
gcc/ChangeLog:
* doc/invoke.texi (-foffload-options): Remove '-O3' from the examples.
gcc/fortran/ChangeLog:
* intrinsic.texi (OpenMP Modules OMP_LIB and OMP_LIB_KINDS): Also
add references to the OpenMP 5.1 and 5.2 spec; add omp_initial_device
and omp_invalid_device named constants.
The warning was raised on accessing SFRs at addresses below the default
page size, as gcc considers accessing addresses in the first page of
memory as suspicious. This doesn't apply to an embedded target like the
avr, where both flash and RAM have zero as a valid address. Zero is also
a valid address in named address spaces (__memx, flash<n>, etc.).
This commit implements TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID for the avr
target and reports to gcc that zero is a valid address on all
address spaces. It also disables flag_delete_null_pointer_checks
based on the target hook, and modifies target-supports.exp to add avr
to the list of targets that always keep null pointer checks. This fixes
a bunch of DejaGNU failures that occur otherwise.
PR target/105523
gcc/ChangeLog:
* common/config/avr/avr-common.cc: Remove setting
of OPT_fdelete_null_pointer_checks.
* config/avr/avr.cc (avr_option_override): Clear
flag_delete_null_pointer_checks if zero_address_valid.
(avr_addr_space_zero_address_valid): New function.
(TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): Provide target
hook.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_keeps_null_pointer_checks): Add
avr.
* gcc.target/avr/pr105523.c: New test.
Ju-Zhe Zhong [Mon, 19 Jun 2023 08:07:09 +0000 (10:07 +0200)]
VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
like RISC-V that use a length in loop control.
Normalize loads/stores into LEN_MASK_{LOAD,STORE} as long as either the length
or the mask is valid. The length is the outcome of SELECT_VL or MIN_EXPR.
The mask is the outcome of a comparison.
The LEN_MASK_{LOAD,STORE} format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
Case 1 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    a[i] = b[i] + c[i];
IR (does not use LEN_MASK_*):
  v1 = MEM (...)
  v2 = MEM (...)
  v3 = v1 + v2
  MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR (LEN_MASK_* with length = VF, mask = comparison):
  mask = comparison
  v1 = LEN_MASK_LOAD (length = VF, mask)
  v2 = LEN_MASK_LOAD (length = VF, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:
  for (int i = 0; i < n; i++)
    a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  mask = comparison
  v1 = LEN_MASK_LOAD (length = loop_len, mask)
  v2 = LEN_MASK_LOAD (length = loop_len, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask, v3)
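The load/store semantics used in these cases can be sketched as scalar C: a lane is active when its index is below the length and its mask bit is set. The function names, the fixed VF of 4, and zeroing the inactive lanes of a load are assumptions for illustration (the description does not say what inactive lanes contain), and the align operand is omitted. `cond_add` follows Case 4 with loop_len = MIN (VF, n - i).

```c
#include <assert.h>
#include <stdbool.h>

#define VF 4   /* assumed vectorization factor */

/* Hypothetical scalar model of LEN_MASK_LOAD (ptr, align, length, mask).  */
void len_mask_load (int dst[VF], const int *ptr, int length,
                    const bool mask[VF])
{
  assert (length >= 0 && length <= VF);
  for (int i = 0; i < VF; i++)
    dst[i] = (i < length && mask[i]) ? ptr[i] : 0;   /* inactive: assumed 0 */
}

/* Hypothetical scalar model of LEN_MASK_STORE (ptr, align, length, mask, vec).  */
void len_mask_store (int *ptr, int length, const bool mask[VF],
                     const int vec[VF])
{
  assert (length >= 0 && length <= VF);
  for (int i = 0; i < VF; i++)
    if (i < length && mask[i])
      ptr[i] = vec[i];   /* inactive lanes leave memory untouched */
}

/* Case 4: if (cond[i]) a[i] = b[i] + c[i], strip-mined by VF.  */
void cond_add (int *a, const int *b, const int *c, const bool *cond, int n)
{
  for (int i = 0; i < n; i += VF)
    {
      int loop_len = n - i < VF ? n - i : VF;   /* MIN (VF, n - i) */
      bool mask[VF];
      int v1[VF], v2[VF], v3[VF];
      for (int j = 0; j < VF; j++)
        mask[j] = j < loop_len && cond[i + j];  /* guarded to stay in bounds */
      len_mask_load (v1, b + i, loop_len, mask);
      len_mask_load (v2, c + i, loop_len, mask);
      for (int j = 0; j < VF; j++)
        v3[j] = v1[j] + v2[j];
      len_mask_store (a + i, loop_len, mask, v3);
    }
}
```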
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
gcc/ChangeLog:
Robin Dapp [Tue, 6 Jun 2023 19:58:25 +0000 (21:58 +0200)]
RISC-V: Add autovec FP binary operations.
This implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.
The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.
As long as we do not have full middle-end support we need
-ffast-math for the tests.
In order to allow proper _Float16 support, this patch disables
the general promotion of _Float16 to float when TARGET_ZVFH is defined,
similar to TARGET_ZFH or TARGET_ZHINX.
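For orientation, the sketch below shows the kind of loop these expanders target, one per operation. It is written for illustration and is not one of the patch's tests; it uses float rather than _Float16, since _Float16 depends on zvfh. Whether such loops are vectorized into vfadd, vfsub, vfmul, vfdiv, vfmax and vfmin depends on the target options, and per the description -ffast-math is currently needed.

```c
#include <assert.h>

/* Generate one element-wise loop per binary operation.  */
#define DEF_BINOP(NAME, EXPR)                                 \
  void NAME (float *restrict a, const float *restrict b,      \
             const float *restrict c, int n)                  \
  {                                                           \
    for (int i = 0; i < n; i++)                               \
      a[i] = (EXPR);                                          \
  }

DEF_BINOP (vadd, b[i] + c[i])
DEF_BINOP (vsub, b[i] - c[i])
DEF_BINOP (vmul, b[i] * c[i])
DEF_BINOP (vdiv, b[i] / c[i])
DEF_BINOP (vmax, b[i] > c[i] ? b[i] : c[i])
DEF_BINOP (vmin, b[i] < c[i] ? b[i] : c[i])
```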
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode>3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(enum vxrm_field_enum): Rename this...
(enum fixed_point_rounding_mode): ...to this.
(enum frm_field_enum): Rename this...
(enum floating_point_rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.
Robin Dapp [Mon, 5 Jun 2023 11:12:01 +0000 (13:12 +0200)]
RISC-V: Add sign-extending variants for vmv.x.s.
When the destination register of a vmv.x.s needs to be sign-extended to
XLEN, we currently emit an sext insn. Since vmv.x.s already performs this
sign extension, this patch adds two instruction patterns that include
sign_extend for the destination operand.
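The redundancy can be checked with a scalar model. On RV64 (XLEN = 64), vmv.x.s sign-extends the SEW-bit element 0 to XLEN. A following 32-bit sign extension (such as sext.w) therefore cannot change the value for 8-bit or 16-bit elements.

The helper names below are hypothetical. The sketch exhaustively tests every 8- and 16-bit input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of vmv.x.s with XLEN = 64: element 0 is copied
   to the scalar register and sign-extended from SEW to 64 bits. */
static int64_t vmv_x_s_qi(int8_t elem0) { return (int64_t) elem0; }
static int64_t vmv_x_s_hi(int16_t elem0) { return (int64_t) elem0; }

/* Model of sext.w: sign-extend the low 32 bits of a register. */
static int64_t sext_w(int64_t x) { return (int64_t) (int32_t) x; }

/* Check that a sign extension after vmv.x.s never changes the value
   for any 8-bit or 16-bit element. */
static bool sext_after_vmv_is_redundant(void)
{
  for (int v = INT16_MIN; v <= INT16_MAX; v++)
    {
      int64_t r = vmv_x_s_hi((int16_t) v);
      if (sext_w(r) != r)
        return false;
      if (v >= INT8_MIN && v <= INT8_MAX)
        {
          int64_t q = vmv_x_s_qi((int8_t) v);
          if (sext_w(q) != q)
            return false;
        }
    }
  return true;
}
```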
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Add VI_QH iterator.
* config/riscv/autovec-opt.md
(@pred_extract_first_sextdi<mode>): New vmv.x.s pattern
that includes sign extension.
(@pred_extract_first_sextsi<mode>): Ditto for SImode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Ensure
that no sext insns are present.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
Robin Dapp [Thu, 1 Jun 2023 12:18:57 +0000 (14:18 +0200)]
RISC-V: Implement vec_set and vec_extract.
This implements the vec_set and vec_extract patterns for integer and
floating-point data types. For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.
vec_extract is done by sliding the requested element down to index 0
and moving it to a scalar register with v(f)mv.[xf].s.
The patch does not include vector-vector extraction, which
will be done at a later time.
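The slide-based scheme can be modelled in scalar C. The sketch assumes a hypothetical 8-element int vector, and its function names are illustrative rather than GCC's:

- vec_set broadcasts the value and then slides it up by idx with vl = idx + 1, so only element idx is written.
- vec_extract slides element idx down to position 0 and reads it there, as vmv.x.s would.

```c
#include <stddef.h>

#define NUNITS 8 /* assumed number of elements in the vector */

/* vslideup: vd[0..offset-1] stay unchanged; vd[i] = vs2[i - offset]
   for offset <= i < vl.  vd and vs2 must not overlap. */
static void vslideup(int vd[NUNITS], const int vs2[NUNITS],
                     size_t offset, size_t vl)
{
  for (size_t i = offset; i < vl && i < NUNITS; i++)
    vd[i] = vs2[i - offset];
}

/* vslidedown: vd[i] = vs2[i + offset] for i < vl, or 0 when the source
   index runs past the end of the vector. */
static void vslidedown(int vd[NUNITS], const int vs2[NUNITS],
                       size_t offset, size_t vl)
{
  for (size_t i = 0; i < vl && i < NUNITS; i++)
    vd[i] = i + offset < NUNITS ? vs2[i + offset] : 0;
}

/* vec_set: broadcast x (like vmv.v.x), then slide up by idx with
   vl = idx + 1, so the effective length of the write is 1. */
static void vec_set(int v[NUNITS], int x, size_t idx)
{
  int bcast[NUNITS];
  for (size_t i = 0; i < NUNITS; i++)
    bcast[i] = x;
  vslideup(v, bcast, idx, idx + 1);
}

/* vec_extract: slide element idx down to index 0, then read element 0
   as vmv.x.s would. */
static int vec_extract(const int v[NUNITS], size_t idx)
{
  int tmp[NUNITS];
  vslidedown(tmp, v, idx, NUNITS);
  return tmp[0];
}
```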
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
New test.
Tobias Burnus [Mon, 19 Jun 2023 07:52:10 +0000 (09:52 +0200)]
libgomp.c/target-51.c: Accept more error-msg variants in dg-output
Depending on the details, the testcase can fail with different but
related messages; all of the following could be observed for this
testcase:
libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading
libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found
libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available
Previously, the last two were tested for with 'target offload_device' and
'! offload_device', respectively. Now all three are accepted by matching
'.*' immediately after 'but', without distinguishing whether the effective
target is an offload_device or not.
(For completeness, there is a fourth error that follows this pattern:
'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.)
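The relaxed check amounts to an unanchored regular expression that accepts anything after 'but'. A POSIX C sketch follows. The pattern is an approximation for illustration and is not the exact dg-output string in target-51.c.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Approximate form of the relaxed dg-output check: any text may
   follow "but".  Returns false if the pattern fails to compile. */
static bool matches_offload_error(const char *msg)
{
  regex_t re;
  if (regcomp(&re,
              "libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but .*",
              REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec(&re, msg, 0, NULL, 0) == 0;
  regfree(&re);
  return ok;
}
```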
libgomp/
* testsuite/libgomp.c/target-51.c: Accept more error msg variants
as expected dg-output.
Richard Biener [Tue, 6 Jun 2023 11:05:56 +0000 (13:05 +0200)]
AVX512 fully masked vectorization
This implements fully masked vectorization or a masked epilog for
AVX512-style masks, which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV, like SVE has with while_ult.
Instead, the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation; a suitable mask test instruction is available).
As with RVV, code generation prefers a decrementing IV, though IVOPTs
messes things up in some cases by removing that IV and replacing
it with an incrementing one used for address generation.
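A scalar C model of this control scheme has two parts:

- The mask is a 16-bit integer with one bit per lane, as with AVX512 mask registers. It is produced by comparing the constant lane indices {0, 1, ..., 15} against the remaining iteration count.
- The loop itself is driven by a decrementing scalar IV.

The lane count of 16 (512-bit vectors of float) and all names are assumptions made for this sketch. This is not the vectorizer's generated code.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* assumed: 16 x float per 512-bit vector */

/* Models the vector compare {0,1,...,15} < broadcast(remaining):
   bit i of the integer mask is set iff lane i is active. */
static uint16_t compare_mask(size_t remaining)
{
  uint16_t m = 0;
  for (unsigned i = 0; i < LANES; i++)
    if (i < remaining)
      m |= (uint16_t) (1u << i);
  return m;
}

/* Fully masked sum: every iteration is masked, and the loop is
   controlled by the decrementing scalar IV `remaining`. */
static float masked_sum(const float *x, size_t n)
{
  float acc[LANES] = {0};
  size_t base = 0;
  for (size_t remaining = n; remaining > 0;)
    {
      uint16_t m = compare_mask(remaining);
      for (unsigned i = 0; i < LANES; i++)
        if (m & (1u << i)) /* masked load and add */
          acc[i] += x[base + i];
      remaining -= remaining < LANES ? remaining : LANES;
      base += LANES;
    }
  float s = 0.0f;
  for (unsigned i = 0; i < LANES; i++)
    s += acc[i];
  return s;
}
```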
One of the motivating testcases is from PR108410, which in turn
is extracted from x264, where large-size vectorization shows
issues with small trip-count loops. Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of fewer than 32 iterations.
'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop. From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.
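For contrast with a fully masked loop, a masked epilog runs unmasked full-width iterations first. It applies a mask only to the final partial iteration.

This is again a hypothetical scalar sketch assuming 16 lanes. For fewer than 16 remaining elements, the mask `(1u << remaining) - 1` equals the result of the lane-index compare.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* assumed: 16 x float per 512-bit vector */

/* Sum with an unmasked main loop followed by a single masked epilog. */
static float epilog_masked_sum(const float *x, size_t n)
{
  float acc[LANES] = {0};
  size_t base = 0;

  /* Main loop: only full vectors, so no mask is needed. */
  for (; n - base >= LANES; base += LANES)
    for (unsigned i = 0; i < LANES; i++)
      acc[i] += x[base + i];

  /* Masked epilog: fewer than LANES elements remain, so the low
     `remaining` bits of the mask are set (0 if nothing remains). */
  size_t remaining = n - base;
  uint16_t mask = (uint16_t) ((1u << remaining) - 1u);
  for (unsigned i = 0; i < LANES; i++)
    if (mask & (1u << i))
      acc[i] += x[base + i];

  float s = 0.0f;
  for (unsigned i = 0; i < LANES; i++)
    s += acc[i];
  return s;
}
```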
This patch does not enable using fully masked loops or
masked epilogues by default. More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
Implementation-wise, this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE,
which could be exploited further to unify some of the flags
we have right now, but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls. Instead, that's now
left to the final analysis phase, which tries forming the rgroup_controls
vector using while_ult and, if that fails, now tries AVX512 style,
which needs a different organization and instead fills a hash_map
with the relevant info. vect_get_loop_mask now has two implementations,
one for each of the two mask styles we then have.
I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
The vect_prepare_for_masked_peels hunk might run into issues with
SVE; I didn't check yet, but using LOOP_VINFO_RGROUP_COMPARE_TYPE
looked odd.
Bootstrapped and tested on x86_64-unknown-linux-gnu. I've run
the testsuite with --param vect-partial-vector-usage=2 with and
without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
and one latent wrong-code (PR110237).
* tree-vectorizer.h (enum vect_partial_vector_style): New.
(_loop_vec_info::partial_vector_style): Likewise.
(LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise.
(rgroup_controls::compare_type): Add.
(vec_loop_masks): Change from a typedef to auto_vec<>
to a structure.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Adjust. Convert niters_skip to compare_type.
(vect_set_loop_condition_partial_vectors_avx512): New function
implementing the AVX512 partial vector codegen.
(vect_set_loop_condition): Dispatch to the correct
vect_set_loop_condition_partial_vectors_* function based on
LOOP_VINFO_PARTIAL_VECTORS_STYLE.
(vect_prepare_for_masked_peels): Compute LOOP_VINFO_MASK_SKIP_NITERS
in the original niter type.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
partial_vector_style.
(can_produce_all_loop_masks_p): Adjust.
(vect_verify_full_masking): Produce the rgroup_controls vector
here. Set LOOP_VINFO_PARTIAL_VECTORS_STYLE on success.
(vect_verify_full_masking_avx512): New function implementing
verification of AVX512 style masking.
(vect_verify_loop_lens): Set LOOP_VINFO_PARTIAL_VECTORS_STYLE.
(vect_analyze_loop_2): Also try AVX512 style masking.
Adjust condition.
(vect_estimate_min_profitable_iters): Implement AVX512 style
mask producing cost.
(vect_record_loop_mask): Do not build the rgroup_controls
vector here but record masks in a hash-set.
(vect_get_loop_mask): Implement AVX512 style mask query,
complementing the existing while_ult style.