rsandifo [Fri, 4 Aug 2017 10:40:35 +0000 (10:40 +0000)]
Use base inequality for some vector alias checks
This patch checks whether two data references x and y cannot
partially overlap and so are independent whenever &x != &y.
We can then use this in the vectoriser to optimise alias checks.
gcc/
2016-08-04 Richard Sandiford <richard.sandiford@linaro.org>
* hash-traits.h (pair_hash): New struct.
* tree-data-ref.h (data_dependence_relation): Add object_a and
object_b fields.
(DDR_OBJECT_A, DDR_OBJECT_B): New macros.
* tree-data-ref.c (initialize_data_dependence_relation): Initialize
DDR_OBJECT_A and DDR_OBJECT_B.
* tree-vectorizer.h (vec_object_pair): New type.
(_loop_vec_info): Add a check_unequal_addrs field.
(LOOP_VINFO_CHECK_UNEQUAL_ADDRS): New macro.
(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Return true if there is an
entry in check_unequal_addrs. Check comp_alias_ddrs instead of
may_alias_ddrs.
* tree-vect-loop.c (destroy_loop_vec_info): Release
LOOP_VINFO_CHECK_UNEQUAL_ADDRS.
(vect_analyze_loop_2): Likewise, when restarting.
(vect_estimate_min_profitable_iters): Estimate the cost of
LOOP_VINFO_CHECK_UNEQUAL_ADDRS.
* tree-vect-data-refs.c: Include tree-hash-traits.h.
(vect_prune_runtime_alias_test_list): Try to handle conflicts
using LOOP_VINFO_CHECK_UNEQUAL_ADDRS, if the data dependence allows.
Count such tests in the final summary.
* tree-vect-loop-manip.c (chain_cond_expr): New function.
(vect_create_cond_for_align_checks): Use it.
(vect_create_cond_for_unequal_addrs): New function.
(vect_loop_versioning): Call it.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-6.c: New test.
rsandifo [Fri, 4 Aug 2017 10:39:44 +0000 (10:39 +0000)]
Handle data dependence relations with different bases
This patch tries to calculate conservatively-correct distance
vectors for two references whose base addresses are not the same.
It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
isn't guaranteed to occur.
The motivating example is:
struct s { int x[8]; };
void
f (struct s *a, struct s *b)
{
for (int i = 0; i < 8; ++i)
a->x[i] += b->x[i];
}
in which the "a" and "b" accesses are either independent or have a
dependence distance of 0 (assuming -fstrict-aliasing). Neither case
prevents vectorisation, so we can vectorise without an alias check.
I'd originally wanted to do the same thing for arrays as well, e.g.:
void
f (int a[][8], struct b[][8])
{
for (int i = 0; i < 8; ++i)
a[0][i] += b[0][i];
}
I think this is valid because C11 6.7.6.2/6 says:
For two array types to be compatible, both shall have compatible
element types, and if both size specifiers are present, and are
integer constant expressions, then both size specifiers shall have
the same constant value.
So if we access an array through an int (*)[8], it must have type X[8]
or X[], where X is compatible with int. It doesn't seem possible in
either case for "a[0]" and "b[0]" to overlap when "a != b".
However, as the comment above "if (same_base_p)" explains, GCC is more
forgiving: it supports arbitrary overlap of arrays and allows arrays to
be accessed with different dimensionality. There are examples of this
in PR50067. The patch therefore only handles references that end in a
structure field access.
There are two ways of handling these dependences in the vectoriser:
use them to limit VF, or check at runtime as before. I've gone for
the approach of checking at runtime if we can, to avoid limiting VF
unnecessarily, but falling back to a VF cap when runtime checks aren't
allowed.
The patch tests whether we queued an alias check with a dependence
distance of X and then picked a VF <= X, in which case it's safe to
drop the alias check. Since vect_prune_runtime_alias_check_list
can be called twice with different VF for the same loop, it's no
longer safe to clear may_alias_ddrs on exit. Instead we should use
comp_alias_ddrs to check whether versioning is necessary.
2017-08-04 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.h (subscript): Add access_fn field.
(data_dependence_relation): Add could_be_independent_p.
(SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
(same_access_functions): Move to tree-data-ref.c.
* tree-data-ref.c (ref_contains_union_access_p): New function.
(access_fn_component_p): Likewise.
(access_fn_components_comparable_p): Likewise.
(dr_analyze_indices): Add a reference to access_fn_component_p.
(dump_data_dependence_relation): Use SUB_ACCESS_FN instead of
DR_ACCESS_FN.
(constant_access_functions): Likewise.
(add_other_self_distances): Likewise.
(same_access_functions): Likewise. (Moved from tree-data-ref.h.)
(initialize_data_dependence_relation): Use XCNEW and remove
explicit zeroing of DDR_REVERSED_P. Look for a subsequence
of access functions that have the same type. Allow the
subsequence to end with different bases in some circumstances.
Record the chosen access functions in SUB_ACCESS_FN.
(build_classic_dist_vector_1): Replace ddr_a and ddr_b with
a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN.
(subscript_dependence_tester_1): Likewise dra and drb.
(build_classic_dist_vector): Update calls accordingly.
(subscript_dependence_tester): Likewise.
* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check
DDR_COULD_BE_INDEPENDENT_P.
* tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test
comp_alias_ddrs instead of may_alias_ddrs.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr):
New function.
(vect_analyze_data_ref_dependence): Use it if
DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded
distance vectors if that fails.
(dependence_distance_ge_vf): New function.
(vect_prune_runtime_alias_test_list): Use it. Don't clear
LOOP_VINFO_MAY_ALIAS_DDRS.
vries [Fri, 4 Aug 2017 07:27:05 +0000 (07:27 +0000)]
Add missing edge probability in simd_clone_adjust
Currently we generate an if with probability set on only one of the two edges:
<bb 5> [0.00%] [count: INV]:
_5 = mask.3[iter.6_3];
if (_5 == 0)
goto <bb 6>; [INV] [count: INV]
else
goto <bb 2>; [100.00%] [count: INV]
Add the missing edge probability, and set the split to unlikely/likely:
if (_5 == 0)
goto <bb 6>; [19.99%] [count: INV]
else
goto <bb 2>; [80.01%] [count: INV]
rguenth [Thu, 3 Aug 2017 14:08:56 +0000 (14:08 +0000)]
2017-08-03 Richard Biener <rguenther@suse.de>
* lto-symtab.h (lto_symtab_prevail_decl): Do not use
DECL_ABSTRACT_ORIGIN as flag we can end up using that. Instead
use DECL_LANG_FLAG_0.
(lto_symtab_prevail_decl): Likewise.
amonakov [Thu, 3 Aug 2017 13:39:47 +0000 (13:39 +0000)]
toplev: avoid recursive emergency_dump_function
* toplev.c (dumpfile.h): New include.
(internal_error_reentered): New static function. Use it...
(internal_error_function): ...here to handle reentered internal_error.
rguenth [Thu, 3 Aug 2017 11:52:00 +0000 (11:52 +0000)]
2017-08-03 Richard Biener <rguenther@suse.de>
PR middle-end/81148
* fold-const.c (split_tree): Add minus_var and minus_con
arguments, remove unused loc arg. Never generate NEGATE_EXPRs
here but always use minus_*.
(associate_trees): Assert we never associate with MINUS_EXPR
and NULL first operand. Do not recurse for PLUS_EXPR operands
when associating as MINUS_EXPR either.
(fold_binary_loc): Track minus_var and minus_con.
jakub [Thu, 3 Aug 2017 09:43:11 +0000 (09:43 +0000)]
PR driver/81650
* calls.c (alloc_max_size): Use HOST_WIDE_INT_UC (10??)
instead of 10??LU, perform unit multiplication in wide_int,
don't change alloc_object_size_limit if the limit is larger
than SSIZE_MAX.
jakub [Thu, 3 Aug 2017 09:41:55 +0000 (09:41 +0000)]
PR tree-optimization/81655
PR tree-optimization/81588
* tree-ssa-reassoc.c (optimize_range_tests_var_bound): Handle also
the case when ranges[i].low and high are 1 for unsigned type with
precision 1.
jakub [Thu, 3 Aug 2017 08:34:16 +0000 (08:34 +0000)]
PR middle-end/81052
* omp-low.c (diagnose_sb_0): Handle flag_openmp_simd like flag_openmp.
(pass_diagnose_omp_blocks::gate): Enable also for flag_openmp_simd.
ian [Wed, 2 Aug 2017 16:27:17 +0000 (16:27 +0000)]
compiler: only finalize embedded fields before finalizing methods
When finalizing the methods of a named struct type, we used to
finalize all the field types first. That can fail if the field types
refer indirectly to the named type. Change it to just finalize the
embedded field types first, and the rest of the fields later.
jakub [Wed, 2 Aug 2017 07:13:25 +0000 (07:13 +0000)]
PR middle-end/79499
* function.c (thread_prologue_and_epilogue_insns): Determine blocks
for find_many_sub_basic_blocks bitmap by looking up BLOCK_FOR_INSN
of first NONDEBUG_INSN_P in each of the split_prologue_seq and
prologue_seq sequences - if any.
rguenth [Wed, 2 Aug 2017 06:57:12 +0000 (06:57 +0000)]
2017-08-02 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (vectorizable_store): Perform vector extracts
via vectors if supported, integer extracts via punning if supported
or otherwise vector extracts.
ebotcazou [Tue, 1 Aug 2017 22:15:32 +0000 (22:15 +0000)]
* c-ada-spec.c (has_static_fields): Look only into fields.
(dump_generic_ada_node): Small tweak.
(dump_nested_types): Look only into fields.
(print_ada_declaration): Look only into methods. Small tweak.
(print_ada_struct_decl): Look only into fields. Use DECL_VIRTUAL_P.
tkoenig [Tue, 1 Aug 2017 17:59:11 +0000 (17:59 +0000)]
2017-08-01 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/45435
* lang.opt (fc-prototypes): Add option.
* gfortran.h (gfc_typespec): Add interop_kind to struct.
(gfc_dump_c_prototypes): Add prototype.
* decl.c (gfc_match_kind_spec): Copy symbol used for kind to typespec.
* parse.c (gfc_parse_file): Call gfc_dump_prototypes.
* dump-parse-tree.c (gfc_dump_c_prototypes): New function.
(type_return): New enum.
(get_c_type_name): New function.
(write_decl): New function.
(write_type): New function.
(write_variable): New function.
(write_proc): New function.
(write_interop_decl): New function.
* invoke.texi: Document -fc-prototypes.
marxin [Tue, 1 Aug 2017 17:21:29 +0000 (17:21 +0000)]
Make mempcpy more optimal (PR middle-end/70140).
2017-08-01 Martin Liska <mliska@suse.cz>
PR middle-end/70140
* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
2017-08-01 Martin Liska <mliska@suse.cz>
PR middle-end/70140
* builtins.c (expand_builtin_memcpy_args): Remove.
(expand_builtin_memcpy): Call newly added function
expand_builtin_memory_copy_args.
(expand_builtin_memcpy_with_bounds): Likewise.
(expand_builtin_mempcpy): Remove last argument.
(expand_builtin_mempcpy_with_bounds): Likewise.
(expand_builtin_memory_copy_args): New function created from
expand_builtin_mempcpy_args with small modifications.
(expand_builtin_mempcpy_args): Remove.
(expand_builtin_stpcpy): Remove unused argument.
(expand_builtin): Likewise.
(expand_builtin_with_bounds): Likewise.
jakub [Tue, 1 Aug 2017 16:34:31 +0000 (16:34 +0000)]
PR target/81622
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): For
__builtin_vec_cmpne verify both arguments are compatible vectors
before looking at TYPE_MODE on the element type. For __builtin_vec_ld
verify arg1_type is a pointer or array type. For __builtin_vec_st,
move computation of aligned to after checking the argument types.
Formatting fixes.
gcc/
* config.gcc (arm-wrs-vxworks*): Rework to handle arm-wrs-vxworks7 as
well as arm-wrs-vxworks. Update target_cpu_name from arm6 (arch v3) to
arm8 (arch v4).
* config/arm/vxworks.h (MAYBE_TARGET_BPABI_CPP_BUILTINS): New, helper
for TARGET_OS_CPP_BUILTIN.
(TARGET_OS_CPP_BUILTIN): Invoke MAYBE_TARGET_BPABI_CPP_BUILTINS(),
refine CPU definitions for arm_arch5 and add those for arm_arch6 and
arm_arch7.
(MAYBE_ASM_ABI_SPEC): New, helper for SUBTARGET_EXTRA_ASM_SPEC,
passing required abi options to the assembler for EABI configurations.
(EXTRA_CC1_SPEC): New macro, to help prevent the implicit production
of .text.hot and .text.unlikely sections for kernel modules when
using ARM style exceptions.
(CC1_SPEC): Remove obsolete attempt at mimicking Diab toolchain
options. Add EXTRA_CC1_SPEC.
(VXWORKS_ENDIAN_SPEC): Adjust comment and remove handling of Diab
toolchain options.
(DWARF2_UNWIND_INFO): Redefine to handle the pre/post VxWorks 7
transition.
(ARM_TARGET2_DWARF_FORMAT): Define.
* config/arm/t-vxworks: Adjust multilib control to removal of the
Diab command line options.
libgcc/
* config.host (arm-wrs-vxworks*): Rework to handle arm-wrs-vxworks7
as well as arm-wrs-vxworks.
* config/arm/t-vxworks7: New file. Add unwind-arm-vxworks.c to
LIB2ADDEH.
* config/arm/unwind-arm-vxworks.c: New file. Provide dummy
__exidx_start and __exidx_end for downloadable modules.
jgreenhalgh [Tue, 1 Aug 2017 12:59:05 +0000 (12:59 +0000)]
Remove flag_tree_vectorize
gcc/
* common.opt (ftree-vectorize): No longer set flag_tree_vectorize.
(ftree-loop-vectorize): Set as EnabledBy ftree-vectorize.
(ftree-slp-vectorize): Likewise.
* omp-expand (expand_omp_simd): Remove flag_tree_vectorize, as it
can no longer be set independent of flag_tree_loop_vectorize.
* omp-general.c (emp_max_vf): Likewise.
* opts.c (enable_fdo_optimizations): Remove references to
flag_tree_vectorize, these are now implicit.
(common_handle_option): Remove handling for OPT_ftree_vectorize,
and leave it for the options machinery.
marxin [Tue, 1 Aug 2017 11:59:27 +0000 (11:59 +0000)]
Make mempcpy more optimal (PR middle-end/70140).
2017-08-01 Martin Liska <mliska@suse.cz>
PR middle-end/70140
* gcc.dg/string-opt-1.c: Adjust test-case to scan for memcpy.
2017-08-01 Martin Liska <mliska@suse.cz>
PR middle-end/70140
* builtins.c (expand_builtin_memcpy_args): Remove.
(expand_builtin_memcpy): Call newly added function
expand_builtin_memory_copy_args.
(expand_builtin_memcpy_with_bounds): Likewise.
(expand_builtin_mempcpy): Remove last argument.
(expand_builtin_mempcpy_with_bounds): Likewise.
(expand_builtin_memory_copy_args): New function created from
expand_builtin_mempcpy_args with small modifications.
(expand_builtin_mempcpy_args): Remove.
(expand_builtin_stpcpy): Remove unused argument.
(expand_builtin): Likewise.
(expand_builtin_with_bounds): Likewise.
rguenth [Tue, 1 Aug 2017 10:47:14 +0000 (10:47 +0000)]
2017-08-01 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c (print_pre_expr): Handle NULL expr.
(compute_antic): Seed worklist with exit block predecessors.
* cfganal.c (dfs_find_deadend): For a cycle return the source
of the edge closing it.
* gcc.dg/tree-ssa/ssa-dce-3.c: Adjust.
* gcc.dg/tree-ssa/split-path-5.c: Remove case with just dead
endless loop.
* gcc.dg/uninit-23.c: Adjust.
jakub [Tue, 1 Aug 2017 08:32:37 +0000 (08:32 +0000)]
PR tree-optimization/81588
* tree-ssa-reassoc.c (optimize_range_tests_var_bound): If
ranges[i].in_p, invert comparison code ccode. For >/>=,
swap rhs1 and rhs2 and comparison code unconditionally,
for </<= don't do that. Don't swap rhs1/rhs2 again if
ranges[i].in_p, instead invert comparison code ccode if
opcode or oe->rank is BIT_IOR_EXPR.
* gcc.dg/tree-ssa/pr81588.c: New test.
* gcc.dg/pr81588.c: New test.
* gcc.c-torture/execute/pr81588.c: New test.
* config/rs6000/rs6000-c: Add support for built-in functions
vector signed char vec_xl_be (signed long long, signed char *);
vector unsigned char vec_xl_be (signed long long, unsigned char *);
vector signed int vec_xl_be (signed long long, signed int *);
vector unsigned int vec_xl_be (signed long long, unsigned int *);
vector signed long long vec_xl_be (signed long long, signed long long *);
vector unsigned long long vec_xl_be (signed long long, unsigned long long *);
vector signed short vec_xl_be (signed long long, signed short *);
vector unsigned short vec_xl_be (signed long long, unsigned short *);
vector double vec_xl_be (signed long long, double *);
vector float vec_xl_be (signed long long, float *);
* config/rs6000/altivec.h (vec_xl_be): Add #define.
* config/rs6000/rs6000-builtin.def (XL_BE_V16QI, XL_BE_V8HI, XL_BE_V4SI,
XL_BE_V2DI, XL_BE_V4SF, XL_BE_V2DF, XL_BE): Add definitions for the builtins.
* config/rs6000/rs6000.c (altivec_expand_xl_be_builtin): Add function.
(altivec_expand_builtin): Add switch statement to call altivec_expand_xl_be
for each builtin.
(altivec_init_builtins): Add def_builtin for _builtin_vsx_le_be_v8hi,
__builtin_vsx_le_be_v4si, __builtin_vsx_le_be_v2di, __builtin_vsx_le_be_v4sf,
__builtin_vsx_le_be_v2df, __builtin_vsx_le_be_v16qi.
* doc/extend.texi: Update the built-in documentation file for the
new built-in functions.
gcc/testsuite/ChangeLog:
2017-07-31 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/builtins-4-runnable.c: Add test cases for the
new builtins.
pr79793-1.c and pr79793-2.c are failed when GCC is configured with
--with-cpu=slm since lea is used to adjust stack, instead of sub/add.
This patch uses -mtune=generic to always generate sub and add.
* gcc.target/i386/pr79793-1.c: Compile with -mtune=generic.
* gcc.target/i386/pr79793-2.c: Likewise.
With IBM z14 officially announced we can add support for z14 as
preferred CPU name. We still pass arch12 to Binutils in order to keep
older Binutils versions supported.
gcc/ChangeLog:
2017-07-31 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
* config.gcc: Add z14.
* config/s390/driver-native.c (s390_host_detect_local_cpu): Add
CPU model numbers for z13s and z14.
* config/s390/s390-c.c (s390_resolve_overloaded_builtin): Replace
arch12 with z14.
* config/s390/s390-opts.h (enum processor_type): Rename
PROCESSOR_ARCH12 to PROCESSOR_3906_Z14.
* config/s390/s390.c (processor_table): Add field for CPU name to
be passed to Binutils.
(s390_asm_output_machine_for_arch): Use the new field in
processor_table for Binutils.
(s390_expand_builtin): Replace arch12 with z14.
(s390_issue_rate): Rename PROCESSOR_ARCH12 to PROCESSOR_3906_Z14.
(s390_get_sched_attrmask): Likewise.
(s390_get_unit_mask): Likewise.
* config/s390/s390.opt: Add z14 to processor_type enum.