stage 2: implementation of k-arity promotion/reduction in the series "Improving
effectiveness and generality of autovectorization using unified representation".
The permute nodes within primitive reorder tree(PRT) generated from input
program can have any arity depending upon stride of accesses. However, the
target cannot have instructions to support all arities. Hence, we need to
promote or reduce the arity of PRT to enable successful tree tiling.
In classic autovectorization, if vectorization stride > 2, arity reduction is
performed by generating cascaded extract and interleave instructions as
described by "Auto-vectorization of Interleaved Data for SIMD" by D. Nuzman,
I. Rosen and A. Zaks.
Moreover, to enable SLP across loop, "Loop-aware SLP in GCC" by D. Nuzman,
I. Rosen and A. Zaks unrolls loop till stride = vector size.
k-arity reduction/promotion algorithm makes use of modulo arithmetic to generate
PRT of desired arity for both above-mentioned cases.
Single ILV node of arity k can be reduced into cascaded ILV nodes with single
node of arity m with children of arity k/m such that ith child of original ILV
node becomes floor (i/m) th child of (i%m) th child of new parent.
Single EXTR node with k parts and i selector can be reduced into cascaded EXTR
nodes such that parent EXTR node has m parts and i/(k/m) selection on child EXTR
node with k/m parts and i % (k/m) selection.
Similarly, loop unrolling to get desired arity m can be represented as arity
promotion from k to m.
Single ILV node of arity k can be promoted to single ILV node of arity m by
adding extraction with m/k parts and selection i/k of i%k the child of original
tree as ith child of new ILV node.
To enable loop-aware SLP, we first promote arity of input PRT to maximum vector
size permissible on the architecture. This can have impact on vector code size,
though performance will be the same. To eliminate redundant ILV and EXTR
operations, thereby undoing unneccessary unrolling, we can perform unity
reduction optimization:
- EXTR_m,x (ILV_M(S1, S2, ... Sm)) => Sx
- ILV_m (EXTR_0(S), EXTR_1(S),...EXTR_m-1(S)) => S
Later we apply arity promotion reduction algorithm on the output tree to get tree
with desired arity. For now, we are supporting target arity = 2, as most of the
architectures have support for that. However, the code can be extended for
additional arity supports as well.
We have also implemented unity reduction optimization which eliminates redundant
ILV and EXTR nodes thereby undoing unneccessary unrolling - which can bloat up
the code size otherwise.
Add new pass to perform autovectorization using unified representation - Current GCC framework does not give complete overview of the loop to be vectorized ...
Add new pass to perform autovectorization using unified representation - Current
GCC framework does not give complete overview of the loop to be vectorized : it
either breaks the loop across body, or across iterations. Because of which these
data structures can not be reused for our approach which gathers all the
information of loop body at one place using primitive permute operations. Hence,
define new data structures and populate them.
Add support for vectorization of LOAD/STORE instructions
a. Create permute order tree for the loop with LOAD and STORE instructions
for single or multi-dimensional arrays, aggregates within nested loops.
This change adds new pass to perform autovectorization using unified
representation, defines new data structures to cater to this requirement and
creates primitive reorder tree for LOAD/STORE instructions within the loop.
The whole loop is represented using the ITER_NODE, which have information about
- The preparatory statements for vectorization to be executed before entering
the loop (like initialization of vectors, prepping for reduction operations,
peeling etc.)
- Vectorizable loop body represented as PRIMOP_TREE (primitive reordering tree)
- Final statements (For peeling, variable loop bound, COLLAPSE operation for
reduction etc.)
- Other loop attributes (loop bound, peeling needed, dependences, etc.)
Memory accesses within a loop have definite repetitive pattern which can be
captured using primitive permute operators which can be used to determine
desired permute order for the vector computations. The PRIMOP_TREE is AST which
records all computations and permutations required to store destination vector
into continuous memory at the end of all iterations of the loop. It can have
INTERLEAVE, CONCAT, EXTRACT, SPLIT, ITER or any compute operation as
intermediate node. Leaf nodes can either be memory reference, constant or vector
of loop invariants. Depending upon the operation, PRIMOP_TREE holds appropriate
information about the statement within the loop which is necessary for
vectorization.
At this stage, these data structures are populated by gathering all the
information of the loop, statements within the loop and correlation of the
statements within the loop. Moreover the loop body is analyzed to check if
vectorization of each statement is possible. One has to note however that this
analysis phase will give worst-case estimate of instruction selection, as it
checks if specific named pattern is defined in .md for the target. It not
necessarily give optimal cover which is aim of the transformation phase using
tree tiling algorithm - and can be invoked only once the loop body is
represented using primitive reoder tree.
At this stage, the focus is to create permute order tree for the loop with LOAD
and STORE instructions only. The code we intend to compile is of the form
FOR(i = 0; i < N; i + +)
{
stmt 1 : D[k ∗ i + d 1 ] =S 1 [k ∗ i + c 11 ]
stmt 2 : D[k ∗ i + d 2 ] =S 1 [k ∗ i + c 21 ]
...
stmt k : D[k ∗ i + d k ] =S 1 [k ∗ i + c k 1 ]
}
Here we are assuming that any data reference can be represented using base + k *
index + offset (The data structure struct data_reference from GCC is used
currently for this purpose). If not, the address is normalized to convert to
such representation.
Martin Liska [Fri, 8 Jul 2016 07:52:03 +0000 (09:52 +0200)]
Do not consider COMPLEX_TYPE as fold_convertible_p
PR middle-end/71606
* fold-const.c (fold_convertible_p): As COMPLEX_TYPE
folding produces SAVE_EXPRs, thus return false for the type.
* gcc.dg/torture/pr71606.c: New test.
* exp_ch6.adb (Expand_Internal_Init_Call): Subsidiary procedure
to Expand_Protected_ Subprogram_Call, to handle properly a
call to a protected function that provides the initialization
expression for a private component of the same protected type.
* sem_ch9.adb (Analyze_Protected_Definition): Layout must be
applied to itypes generated for a private operation of a protected
type that has a formal of an anonymous access to subprogram,
because these itypes have no freeze nodes and are frozen in place.
* sem_ch4.adb (Analyze_Selected_Component): If prefix is a
protected type and it is not a current instance, do not examine
the first private component of the type.
2016-07-07 Arnaud Charlet <charlet@adacore.com>
* exp_imgv.adb, g-dynhta.adb, s-regexp.adb, s-fatgen.adb, s-poosiz.adb:
Minor removal of extra whitespace.
* einfo.ads: minor removal of repeated "as" in comment
* adainit.h, adainit.c (__gnat_is_read_accessible_file): New
subprogram.
(__gnat_is_write_accessible_file): New subprogram.
* s-os_lib.ads, s-os_lib.adb (Is_Read_Accessible_File): New subprogram.
(Is_Write_Accessible_File): New subprogram.
2016-07-07 Justin Squirek <squirek@adacore.com>
* sem_ch12.adb (Install_Body): Minor refactoring in the order
of local functions.
(In_Same_Scope): Change loop condition to be more expressive.
* sem_ch12.adb (In_Same_Scope): Created this function to check
a generic package definition against an instantiation for scope
dependancies.
(Install_Body): Add function In_Same_Scope and
amend conditional in charge of delaying the package instance.
(Is_In_Main_Unit): Add guard to check if parent is present in
assignment of Current_Unit.
Martin Liska [Thu, 7 Jul 2016 13:15:39 +0000 (15:15 +0200)]
Optimize fortran loops with +-1 step.
* gfortran.dg/do_1.f90: Remove a corner case that triggers
an undefined behavior.
* gfortran.dg/do_3.F90: Likewise.
* gfortran.dg/do_check_11.f90: New test.
* gfortran.dg/do_check_12.f90: New test.
* gfortran.dg/do_corner_warn.f90: New test.
* lang.opt (Wundefined-do-loop): New option.
* resolve.c (gfc_resolve_iterator): Warn for Wundefined-do-loop.
(gfc_trans_simple_do): Generate a c-style loop.
(gfc_trans_do): Fix GNU coding style.
* invoke.texi: Mention the new warning.
Martin Liska [Thu, 7 Jul 2016 13:11:05 +0000 (15:11 +0200)]
Add PRED_FORTRAN_LOOP_PREHEADER to DO loops with step
* trans-stmt.c (gfc_trans_do): Add expect builtin for DO
loops with step bigger than +-1.
* gfortran.dg/predict-1.f90: Ammend the test.
* gfortran.dg/predict-2.f90: Likewise.
* sem_prag.ads, sem_prag.adb (Build_Classwide_Expression): Include
overridden operation as parameter, in order to map formals of
the overridden and overring operation properly prior to rewriting
the inherited condition.
* freeze.adb (Check_Inherited_Cnonditions): Change call to
Build_Class_Wide_Expression accordingly. In Spark_Mode, add
call to analyze the contract of the parent operation, prior to
mapping formals between operations.
2016-07-07 Arnaud Charlet <charlet@adacore.com>
* adabkend.adb (Scan_Back_End_Switches): Ignore -o/-G switches
as done in back_end.adb.
(Scan_Compiler_Args): Remove special case for CodePeer/SPARK, no longer
needed, and prevents proper handling of multi-unit sources.
2016-07-07 Thomas Quinot <quinot@adacore.com>
* g-sechas.adb, g-sechas.ads (GNAT.Secure_Hashes.H): Add Hash_Stream
type with Write primitive calling Update on the underlying context
(and dummy Read primitive raising P_E).
sem_ch6.adb (Process_Formals): Set ghost flag on formal entities of ghost subprograms.
2016-07-07 Yannick Moy <moy@adacore.com>
* sem_ch6.adb (Process_Formals): Set ghost flag
on formal entities of ghost subprograms.
* ghost.adb (Check_Ghost_Context.Is_OK_Ghost_Context): Accept ghost
entities in use type clauses.
Martin Liska [Thu, 7 Jul 2016 12:03:39 +0000 (14:03 +0200)]
Prevent LTO wrappers to process a recursive execution
* file-find.c (remove_prefix): New function.
* file-find.h (remove_prefix): Declare the function.
* gcc-ar.c (main): Skip a folder of the wrapper if
a wrapped binary would point to the same file.
arm-arches.def (armv8-m.base): Define new architecture.
2016-07-07 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
* config/arm/arm-arches.def (armv8-m.base): Define new architecture.
(armv8-m.main): Likewise.
(armv8-m.main+dsp): Likewise.
* config/arm/arm-protos.h (FL_FOR_ARCH8M_BASE): Define.
(FL_FOR_ARCH8M_MAIN): Likewise.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/bpabi.h: Add armv8-m.base, armv8-m.main and
armv8-m.main+dsp to BE8_LINK_SPEC.
* config/arm/arm.h (TARGET_HAVE_LDACQ): Exclude ARMv8-M.
(enum base_architecture): Add BASE_ARCH_8M_BASE and BASE_ARCH_8M_MAIN.
* config/arm/arm.c (arm_arch_name): Increase size to work with ARMv8-M
Baseline and Mainline.
(arm_option_override_internal): Also disable arm_restrict_it when
!arm_arch_notm. Update comment for -munaligned-access to also cover
ARMv8-M Baseline.
(arm_file_start): Increase buffer size for printing architecture name.
* doc/invoke.texi: Document architectures armv8-m.base, armv8-m.main
and armv8-m.main+dsp.
(mno-unaligned-access): Clarify that this is disabled by default for
ARMv8-M Baseline architectures as well.
gcc/testsuite/
* lib/target-supports.exp: Generate add_options_for_arm_arch_FUNC and
check_effective_target_arm_arch_FUNC_multilib for ARMv8-M Baseline and
ARMv8-M Mainline architectures.
libgcc/
* config/arm/lib1funcs.S (__ARM_ARCH__): Define to 8 for ARMv8-M.
elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to decide whether to prevent...
2016-07-07 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
* config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
decide whether to prevent some libgcc routines being included for some
multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
link between this condition and the one in
libgcc/config/arm/lib1func.S.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
__ARM_ARCH_ISA_ARM to test for Cortex-M devices.
libgcc/
* config/arm/bpabi-v6m.S: Clarify what architectures is the
implementation suitable for.
* config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
for all Thumb-1 only targets.
(NOT_ISA_TARGET_32BIT): Define for Thumb-1 only targets.
(THUMB_LDIV0): Test for NOT_ISA_TARGET_32BIT rather than
__ARM_ARCH_6M__.
(EQUIV): Likewise.
(ARM_FUNC_ALIAS): Likewise.
(umodsi3): Add check to __ARM_ARCH_ISA_THUMB != 1 to guard the idiv
version.
(modsi3): Likewise.
(clzsi2): Test for NOT_ISA_TARGET_32BIT rather than __ARM_ARCH_6M__.
(clzdi2): Likewise.
(ctzsi2): Likewise.
(L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
__ARM_ARCH_6M__ in guard for checking whether it is defined.
(final includes): Test for NOT_ISA_TARGET_32BIT rather than
__ARM_ARCH_6M__ and add comment to indicate the connection between
this condition and the one in gcc/config/arm/elf.h.
* config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
__ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
* config/arm/t-softfp: Likewise.
Richard Biener [Thu, 7 Jul 2016 07:43:35 +0000 (07:43 +0000)]
tree-ssa-pre.c: Include alias.h.
2016-07-07 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c: Include alias.h.
(compute_avail): If we have multiple VN_REFERENCEs with the
same hashtable entry adjust that to make it a valid replacement
for all of them with respect to alignment and aliasing
when doing insertion.
* tree-ssa-sccvn.h (vn_reference_operands_for_lookup): Declare.
* tree-ssa-sccvn.c (vn_reference_operands_for_lookup): New function.
rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)
Similar to PR70098, which is about integers in floating point registers,
we can have the completely analogous problem with vector registers as well
now that we allow integers in vector registers. So, this patch solves it
in the same way. This only works for targets with direct move.
To recap: register allocation can decide to put an integer mode value in
a floating point or vector register. If that register is used in a bd*z
instruction, which is a jump instruction, reload can not do an output
reload on it (it does not do output reloads on any jump insns), so the
float or vector register will remain, and we have to allow it here or
recog will ICE. Later on we will split this to valid instructions,
including a move from that fp/vec register to an int register; it is this
move that will still fail (PR70098) if we do not have direct move enabled.
PR target/70098
PR target/71763
* config/rs6000/rs6000.md (*ctr<mode>_internal1, *ctr<mode>_internal2,
*ctr<mode>_internal5, *ctr<mode>_internal6): Add *wi to the output
constraint.
gcc/testsuite/
PR target/70098
PR target/71763
* gcc.target/powerpc/pr71763.c: New file.
sem_ch6.adb (Analyze_Expression_Function): Mark body of expression function as ghost if needed when created.
2016-07-06 Yannick Moy <moy@adacore.com>
* sem_ch6.adb (Analyze_Expression_Function): Mark body of
expression function as ghost if needed when created.
* sem_prag.adb (Analyze_Pragma.Process_Inline.Set_Inline_Flags):
Remove special case.
* gnat1drv.adb: Code clean up. Do not emit any
code generation errors when the unit is ignored Ghost.
2016-07-06 Ed Schonberg <schonberg@adacore.com>
* sem_eval.adb (Check_Non_Static_Context): If the expression
is a real literal of a floating point type that is part of a
larger expression and is not a static expression, transform it
into a machine number now so that the rest of the computation,
even if other components are static, is not evaluated with
extra precision.
2016-07-06 Javier Miranda <miranda@adacore.com>
* sem_ch13.adb (Freeze_Entity_Checks): Undo previous patch and move the
needed functionality to Analyze_Freeze_Generic_Entity.
(Analyze_Freeze_Generic_Entity): If the entity is not already frozen
and has delayed aspects then analyze them.
2016-07-06 Yannick Moy <moy@adacore.com>
* sem_prag.adb (Analyze_Pragma.Process_Inline.Set_Inline_Flags):
Special case for unanalyzed body entity of ghost expression function.
Javier Miranda [Wed, 6 Jul 2016 13:14:25 +0000 (13:14 +0000)]
sem_ch7.adb (Analyze_Package_Specification): Insert its freezing nodes after the last declaration.
2016-07-06 Javier Miranda <miranda@adacore.com>
* sem_ch7.adb (Analyze_Package_Specification): Insert its
freezing nodes after the last declaration. Needed to ensure
that global entities referenced in aspects of frozen types are
properly handled.
* freeze.adb (Freeze_Entity): Minor code reorganization to ensure
that freezing nodes of generic packages are handled.
* sem_ch13.adb (Freeze_Entity_Checks): Handle N_Freeze_Generic nodes.
* sem_ch12.adb (Save_References_In_Identifier): Handle selected
components which denote a named number that is constant folded
in the analyzed copy of the tree.
* exp_aggr.adb Remove with and use clauses for Exp_Ch11 and Inline.
(Initialize_Array_Component): Protect the initialization
statements in an abort defer / undefer block when the associated
component is controlled.
(Initialize_Record_Component): Protect the initialization statements
in an abort defer / undefer block when the associated component is
controlled.
(Process_Transient_Component_Completion): Use Build_Abort_Undefer_Block
to create an abort defer / undefer block.
* exp_ch3.adb Remove with and use clauses for Exp_ch11 and Inline.
(Default_Initialize_Object): Use Build_Abort_Undefer_Block to
create an abort defer / undefer block.
* exp_ch5.adb (Expand_N_Assignment_Statement): Mark an abort
defer / undefer block as such.
* exp_ch9.adb (Find_Enclosing_Context): Do not consider an abort
defer / undefer block as a suitable context for an activation
chain or a master.
* exp_util.adb Add with and use clauses for Exp_Ch11.
(Build_Abort_Undefer_Block): New routine.
* exp_util.ads (Build_Abort_Undefer_Block): New routine.
* sinfo.adb (Is_Abort_Block): New routine.
(Set_Is_Abort_Block): New routine.
* sinfo.ads New attribute Is_Abort_Block along with occurrences
in nodes.
(Is_Abort_Block): New routine along with pragma Inline.
(Set_Is_Abort_Block): New routine along with pragma Inline.
2016-07-06 Justin Squirek <squirek@adacore.com>
* sem_ch4.adb (Analyze_One_Call): Add a conditional to handle
disambiguation.
* einfo.adb Flag252 is now used as Is_Finalized_Transient. Flag295
is now used as Is_Ignored_Transient.
(Is_Finalized_Transient): New routine.
(Is_Ignored_Transient): New routine.
(Is_Processed_Transient): Removed.
(Set_Is_Finalized_Transient): New routine.
(Set_Is_Ignored_Transient): New routine.
(Set_Is_Processed_Transient): Removed.
(Write_Entity_Flags): Output Flag252 and Flag295.
* einfo.ads: New attributes Is_Finalized_Transient
and Is_Ignored_Transient along with occurrences in
entities. Remove attribute Is_Processed_Transient.
(Is_Finalized_Transient): New routine along with pragma Inline.
(Is_Ignored_Transient): New routine along with pragma Inline.
(Is_Processed_Transient): Removed along with pragma Inline.
(Set_Is_Finalized_Transient): New routine along with pragma Inline.
(Set_Is_Ignored_Transient): New routine along with pragma Inline.
(Set_Is_Processed_Transient): Removed along with pragma Inline.
* exp_aggr.adb Add with and use clauses for Exp_Ch11 and Inline.
(Build_Record_Aggr_Code): Change the handling
of controlled record components.
(Ctrl_Init_Expression): Removed.
(Gen_Assign): Add new formal parameter In_Loop
along with comment on usage. Remove local variables Stmt and
Stmt_Expr. Change the handling of controlled array components.
(Gen_Loop): Update the call to Gen_Assign.
(Gen_While): Update the call to Gen_Assign.
(Initialize_Array_Component): New routine.
(Initialize_Ctrl_Array_Component): New routine.
(Initialize_Ctrl_Record_Component): New routine.
(Initialize_Record_Component): New routine.
(Process_Transient_Component): New routine.
(Process_Transient_Component_Completion): New routine.
* exp_ch4.adb (Process_Transient_In_Expression): New routine.
(Process_Transient_Object): Removed. Replace all existing calls
to this routine with calls to Process_Transient_In_Expression.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Remove local constant
Is_Elem_Ref. Update the comment on ignoring transients.
* exp_ch7.adb (Process_Declarations): Do not process ignored
or finalized transient objects.
(Process_Transient_In_Scope): New routine.
(Process_Transients_In_Scope): New routine.
(Process_Transient_Objects): Removed. Replace all existing calls
to this routine with calls to Process_Transients_In_Scope.
* exp_util.adb (Build_Transient_Object_Statements): New routine.
(Is_Finalizable_Transient): Do not consider a transient object
which has been finalized.
(Requires_Cleanup_Actions): Do not consider ignored or finalized
transient objects.
* exp_util.ads (Build_Transient_Object_Statements): New routine.
* sem_aggr.adb: Major code clean up.
* sem_res.adb: Update documentation.
2016-07-06 Ed Schonberg <schonberg@adacore.com>
* sem_ch3.adb (Analyze_Subtype_Declaration): For generated
subtypes, such as actual subtypes of unconstrained formals,
inherit predicate functions, if any, from the parent type rather
than creating redundant new ones.
* exp_attr.adb, sem_attr.adb, sem_ch13.adb: Minor reformatting.
2016-07-06 Arnaud Charlet <charlet@adacore.com>
* lib.adb (Check_Same_Extended_Unit): Prevent looping forever.
* gnatbind.adb: Disable some consistency checks in codepeer mode,
which are not needed.
2016-07-06 Ed Schonberg <schonberg@adacore.com>
* sem_ch12.adb (Check_Fixed_Point_Actual): Add a warning when
a formal fixed point type is instantiated with a type that has
a user-defined arithmetic operations, but the generic has no
corresponding formal functions. This is worth a warning because
of the special semantics of fixed-point operators.
Bob Duff [Wed, 6 Jul 2016 12:32:35 +0000 (12:32 +0000)]
sem_attr.adb (Analyze_Attribute): Allow any expression of discrete type.
2016-07-06 Bob Duff <duff@adacore.com>
* sem_attr.adb (Analyze_Attribute): Allow any expression of
discrete type.
* exp_attr.adb (Expand_N_Attribute_Reference): Change the
constant-folding code to correctly handle cases newly allowed
by Analyze_Attribute.
[7/7] Add negative and zero strides to vect_memory_access_type
This patch uses the vect_memory_access_type from patch 6 to represent
the effect of a negative contiguous stride or a zero stride. The latter
is valid only for loads.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vect_memory_access_type): Add
VMAT_INVARIANT, VMAT_CONTIGUOUS_DOWN and VMAT_CONTIGUOUS_REVERSED.
* tree-vect-stmts.c (compare_step_with_zero): New function.
(perm_mask_for_reverse): Move further up file.
(get_group_load_store_type): Stick to VMAT_ELEMENTWISE if the
step is negative.
(get_negative_load_store_type): New function.
(get_load_store_type): Call it. Add an ncopies argument.
(vectorizable_mask_load_store): Update call accordingly and
remove tests for negative steps.
(vectorizable_store, vectorizable_load): Likewise. Handle new
memory_access_types.
This is the main patch in the series. It adds a new enum and routines
for classifying a vector load or store implementation.
Originally there were three motivations:
(1) Reduce cut-&-paste
(2) Make the chosen vectorisation strategy more obvious. At the
moment this is derived implicitly from various other bits of
state (GROUPED, STRIDED, SLP, etc.)
(3) Decouple the vectorisation strategy from those other bits of state,
so that there can be a choice of implementation for a given scalar
statement. The specific problem here is that we class:
for (...)
{
... = a[i * x];
... = a[i * x + 1];
}
as "strided and grouped" but:
for (...)
{
... = a[i * 7];
... = a[i * 7 + 1];
}
as "non-strided and grouped". Before the patch, "strided and
grouped" loads would always try to use separate scalar loads
while "non-strided and grouped" loads would always try to use
load-and-permute. But load-and-permute is never supported for
a group size of 7, so the effect was that the first loop was
vectorisable and the second wasn't. It seemed odd that not
knowing x (but accepting it could be 7) would allow more
optimisation opportunities than knowing x is 7.
Unfortunately, it looks like we underestimate the cost of separate
scalar accesses on at least aarch64, so I've disabled (3) for now;
see the "if" statement at the end of get_load_store_type. I think
the patch still does (1) and (2), so that's the justification for
it in its current form. It also means that (3) is now simply a
case of removing the FIXME code, once the cost model problems have
been sorted out. (I did wonder about adding a --param, but that
seems overkill. I hope to get back to this during GCC 7 stage 1.)
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vect_memory_access_type): New enum.
(_stmt_vec_info): Add a memory_access_type field.
(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
(vect_model_store_cost): Take an access type instead of a boolean.
(vect_model_load_cost): Likewise.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to
vect_model_store_cost and vect_model_load_cost.
* tree-vect-stmts.c (vec_load_store_type): New enum.
(vect_model_store_cost): Take an access type instead of a
store_lanes_p boolean. Simplify tests.
(vect_model_load_cost): Likewise, but for load_lanes_p.
(get_group_load_store_type, get_load_store_type): New functions.
(vectorizable_store): Use get_load_store_type. Record the access
type in STMT_VINFO_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise.
(vectorizable_mask_load_store): Likewise. Replace is_store
variable with vls_type.
This patch moves the fix for PR65518 to the code that checks whether
load-and-permute operations are supported. If the group size is
greater than the vectorisation factor, it would still be possible
to fall back to elementwise loads (as for strided groups) rather
than fail vectorisation entirely.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vect_grouped_load_supported): Add a
single_element_p parameter.
* tree-vect-data-refs.c (vect_grouped_load_supported): Likewise.
Check the PR65518 case here rather than in vectorizable_load.
* tree-vect-loop.c (vect_analyze_loop_2): Update call accordignly.
* tree-vect-stmts.c (vectorizable_load): Likewise.
This patch just refactors the gather/scatter support so that all
information is in a single structure, rather than separate variables.
This reduces the number of arguments to a function added in patch 6.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (gather_scatter_info): New structure.
(vect_check_gather_scatter): Return a bool rather than a decl.
Replace return-by-pointer arguments with a single
gather_scatter_info *.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
(vect_analyze_data_refs): Update call accordingly.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
(vectorizable_mask_load_store): Likewise. Also record the
offset dt and vectype in the gather_scatter_info.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
/* Costs of the stores. */
if (STMT_VINFO_STRIDED_P (stmt_info)
&& !STMT_VINFO_GROUPED_ACCESS (stmt_info))
{
/* N scalar stores plus extracting the elements. */
inside_cost += record_stmt_cost (body_cost_vec,
ncopies * TYPE_VECTOR_SUBPARTS (vectype),
scalar_store, stmt_info, 0, vect_body);
But non-SLP strided groups also use individual scalar stores rather than
vector stores, so I think we should skip this only for SLP groups.
The same applies to vect_model_load_cost.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-stmts.c (vect_model_store_cost): For non-SLP
strided groups, use the cost of N scalar accesses instead
of ncopies vector accesses.
(vect_model_load_cost): Likewise.
gcc/
* tree-vect-stmts.c (vect_cost_group_size): Delete.
(vect_model_store_cost): Avoid calling it. Use first_stmt_p
variable to indicate when once-per-group costs are being used.
(vect_model_load_cost): Likewise. Fix comment and misindented code.
I recently relaxed the peeling-for-gaps conditions for LD3 but
kept them as-is for load-and-permute. I don't think the conditions
are needed for load-and-permute either though. No current load-and-
permute should load outside the group, so if there is no gap at the end,
the final vector element loaded will correspond to an element loaded
by the original scalar loop.
The patch for PR68559 (a missed optimisation PR) increased the peeled
cases from "exact_log2 (groupsize) == -1" to "vf % group_size == 0", so
before that fix, we didn't peel for gaps if there was no gap at the end
of the group and if the group size was a power of 2.
The only current non-power-of-2 load-and-permute size is 3, which
doesn't require loading more than 3 vectors.
Andreas Krebbel [Wed, 6 Jul 2016 07:05:11 +0000 (07:05 +0000)]
S/390: Fix vecinit expansion.
The fallback routine in the S/390 vecinit expander did not check
whether each of the initializer elements is a proper general_operand.
Since revision r236582 the expander is invoked also with e.g. symbol
refs with an odd addend resulting in invalid insns.
Fixed by forcing the element into a register in such cases.
gcc/ChangeLog:
2016-07-06 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* config/s390/s390.c (s390_expand_vec_init): Force initializer
element to register if it doesn't match general_operand.
any_cast doesn't work with rvalue reference targets and cannot
move with a value target.
* include/experimental/any (any(_ValueType&&)): Constrain and
add an overload that doesn't forward.
(any_cast(any&&)): Constrain and add an overload that moves.
* testsuite/experimental/any/misc/any_cast.cc: Add tests for
the functionality added by LWG 2509.
rs6000-protos.h (rs6000_split_signbit): New prototype.
[gcc]
2016-07-05 Michael Meissner <meissner@linux.vnet.ibm.com>
Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/rs6000-protos.h (rs6000_split_signbit): New
prototype.
* config/rs6000/rs6000.c (rs6000_split_signbit): New function.
* config/rs6000/rs6000.md (UNSPEC_SIGNBIT): New constant.
(SIGNBIT): New mode iterator.
(Fsignbit): New mode attribute.
(signbit<mode>2): Change operand1 to match FLOAT128 instead of
IBM128; dispatch to gen_signbit{kf,tf}2_dm for __float128
when direct moves are available.
(signbit<mode>2_dm): New define_insn_and_split).
(signbit<mode>2_dm2): New define_insn.
[gcc/testsuite]
2016-07-05 Michael Meissner <meissner@linux.vnet.ibm.com>
Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/signbit-1.c: New test.
* gcc.target/powerpc/signbit-2.c: New test.
* gcc.target/powerpc/signbit-3.c: New test.
Co-Authored-By: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
From-SVN: r238016
[RTL ifcvt] PR rtl-optimization/71594: ICE in noce_emit_cmove due to mismatched source modes
PR rtl-optimization/71594
* ifcvt.c (noce_convert_multiple_sets): Wrap new_val or old_val
into subregs of appropriate mode before trying to emit a conditional
move.
David Malcolm [Tue, 5 Jul 2016 15:50:54 +0000 (15:50 +0000)]
PR c++/62314: add fixit hint for "expected ';' after class definition"
gcc/cp/ChangeLog:
PR c++/62314
* parser.c (cp_parser_class_specifier_1): When reporting
missing semicolons, use a fixit-hint to suggest insertion
of a semicolon immediately after the closing brace,
offsetting the reported column accordingly.
gcc/testsuite/ChangeLog:
PR c++/62314
* gcc/testsuite/g++.dg/parse/error5.C: Update column
number of missing semicolon error.
* g++.dg/pr62314-2.C: New test case.
Eric Botcazou [Tue, 5 Jul 2016 10:32:43 +0000 (10:32 +0000)]
decl.c (gnat_to_gnu_entity): Invoke global_bindings_p last when possible.
* gcc-interface/decl.c (gnat_to_gnu_entity): Invoke global_bindings_p
last when possible. Do not call elaborate_expression_2 on offsets in
local record types and avoid useless processing for constant offsets.
François Dumont [Mon, 4 Jul 2016 14:52:54 +0000 (14:52 +0000)]
Add tests for inserting aliased objects into std::vector
2016-07-04 François Dumont <fdumont@gcc.gnu.org>
* testsuite/23_containers/vector/modifiers/emplace/self_emplace.cc:
New test.
* testsuite/23_containers/vector/modifiers/insert/self_insert.cc: New
test.
Jonathan Wakely [Mon, 4 Jul 2016 14:52:46 +0000 (15:52 +0100)]
Fix std::vector's use of temporary objects
* include/bits/stl_vector.h (emplace(const_iterator, _Args&&...)):
Define inline. Forward to _M_emplace_aux.
(insert(const_iterator, value_type&&)): Forward to _M_insert_rval.
(_M_insert_rval, _M_emplace_aux): Declare new functions.
(_Temporary_value): New RAII type using allocator to construct/destroy.
(_S_insert_aux_assign): Remove.
(_M_insert_aux): Make non-variadic.
* include/bits/vector.tcc (insert(const_iterator, const value_type&)):
Use _Temporary_value.
(emplace(const_iterator, _Args&&...)): Remove definition.
(_M_insert_rval, _M_emplace_aux): Define.
(_M_insert_aux): Make non-variadic, stop using _S_insert_aux_assign.
(_M_fill_insert): Use _Temporary_value.
* testsuite/23_containers/vector/allocator/construction.cc: New test.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc:
Adjust expected results for emplacing an lvalue with reallocation.
* testsuite/23_containers/vector/check_construct_destroy.cc: Adjust
expected results to account for construction/destruction of temporary
using allocator.
S/390: Add support for z13 instructions lochi and locghi.
The attached patch adds patterns to make use of the z13 LOCHI and
LOCGHI instructions.
gcc/ChangeLog:
2016-07-04 Dominik Vogt <vogt@linux.vnet.ibm.com>
* config/s390/s390.md: Add "z13" cpu_facility.
("*mov<mode>cc"): Add support for z13 instructions lochi and locghi.
* config/s390/predicates.md ("loc_operand"): New predicate for "load on
condition" type instructions.
gcc/testsuite/ChangeLog:
2016-07-04 Dominik Vogt <vogt@linux.vnet.ibm.com>
* gcc.target/s390/vector/vec-scalar-cmp-1.c: Expect lochi instead
of locr.
* gcc.target/s390/loc-1.c: New test.
i386.c (ix86_expand_vec_perm): Add handle one-operand permutation for TARGET_AVX512F.
gcc/
* config/i386/i386.c (ix86_expand_vec_perm): Add handle one-operand
permutation for TARGET_AVX512F.
(ix86_expand_vec_one_operand_perm_avx512): New function.
(expand_vec_perm_1): Invoke introduced function.
* tree-vect-loop.c (vect_transform_loop): Clear-up safelen value since
it may be not valid after vectorization.
gcc/testsuite/
* gcc/testsuite/gcc.target/i386/avx512f-vect-perm-1.c: New test.
* gcc/testsuite/gcc.target/i386/avx512f-vect-perm-2.c: New test.
re PR libstdc++/71313 ([Filesystem TS] remove_all fails to remove directory contents recursively)
PR libstdc++/71313
* src/filesystem/ops.cc (remove_all(const path&, error_code&)):
Call remove_all for children of a directory.
* testsuite/experimental/filesystem/operations/create_directories.cc:
Adjust.
* sem_attr.adb (Analyze_Attribute_Old_Result): The attributes can
appear in the postcondition of a subprogram renaming declaration,
when the renamed entity is an attribute reference that is a
function (such as 'Value).
* sem_attr.adb (Eval_Attribute): It doesn't
need to be static, just known at compile time, so use
Compile_Time_Known_Value instead of Is_Static_Expression.
This is an efficiency improvement over the previous bug fix.
* sem_ch13.adb (Analyze_One_Aspect): Use Original_Node to detect
illegal aspects on subprogram renaming declarations that may
have been rewritten as bodies.
2016-07-04 Arnaud Charlet <charlet@adacore.com>
* sem_intr.adb (Errint): Do not emit error message in
Relaxed_RM_Semantics mode.
Bob Duff [Mon, 4 Jul 2016 12:30:44 +0000 (12:30 +0000)]
sem_attr.adb (Eval_Attribute): The code was assuming that X'Enum_Rep...
2016-07-04 Bob Duff <duff@adacore.com>
* sem_attr.adb (Eval_Attribute): The code was assuming
that X'Enum_Rep, where X denotes a constant, can be constant
folded. Fix it so it makes that assumption only when X denotes
a STATIC constant.