Martin Sebor [Thu, 24 Nov 2016 22:45:18 +0000 (15:45 -0700)]
PR tree-optimization/78476 - snprintf(0, 0, ...) with known arguments not optimized away
gcc/testsuite/ChangeLog:
PR tree-optimization/78476
* gcc.dg/tree-ssa/builtin-sprintf-5.c: New test.
gcc/ChangeLog:
PR tree-optimization/78476
* gimple-ssa-sprintf.c (struct pass_sprintf_length::call_info):
Add a member.
(handle_gimple_call): Adjust signature.
(try_substitute_return_value): Remove calls to bounded functions
with zero buffer size whose result is known.
(pass_sprintf_length::execute): Adjust call to handle_gimple_call.
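The kind of call this change targets can be illustrated with a minimal sketch, assuming a hosted C99 environment. The function name formatted_length is hypothetical and does not come from the patch or its test. Because the format and argument are known, the result is a constant that the pass should be able to substitute for the call.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of characters that formatting
   the constant 123 with "%i" would produce.  With a zero-size buffer,
   snprintf writes nothing and only computes the length.  */
int
formatted_length (void)
{
  return snprintf (0, 0, "%i", 123);
}
```

Here the call has no side effects, so once its result is known the call itself is dead.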
Vladimir Makarov [Thu, 24 Nov 2016 19:54:27 +0000 (19:54 +0000)]
re PR rtl-optimization/77541 (wrong code with 512bit vectors of int128 @ -O1)
2016-11-24 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/77541
* lra-constraints.c (struct input_reload): Add field match_p.
(get_reload_reg): Check modes of input reloads to generate unique
value reload pseudo.
(match_reload): Add input reload pseudo for the current insn.
James Greenhalgh [Thu, 24 Nov 2016 18:19:29 +0000 (18:19 +0000)]
[Patch AArch64 13/17] Enable _Float16 for AArch64
gcc/
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Update
__FLT_EVAL_METHOD__ and __FLT_EVAL_METHOD_C99__ when we switch
architecture levels.
* config/aarch64/aarch64.c (aarch64_promoted_type): Only promote
the aarch64_fp16_type_node, not all HFmode types.
(aarch64_libgcc_floating_mode_supported_p): Support HFmode.
(aarch64_scalar_mode_supported_p): Likewise.
(aarch64_excess_precision): New.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Define.
(TARGET_SCALAR_MODE_SUPPORTED_P): Likewise.
(TARGET_C_EXCESS_PRECISION): Likewise.
Eric Botcazou [Thu, 24 Nov 2016 15:30:17 +0000 (15:30 +0000)]
sparc-common.c (sparc_option_optimization_table): Enable REE at -O2 and higher.
* common/config/sparc/sparc-common.c (sparc_option_optimization_table):
Enable REE at -O2 and higher.
* config/sparc/sparc.c (sparc_option_override): Disable it by default
in 32-bit mode.
Kyrylo Tkachov [Thu, 24 Nov 2016 15:22:34 +0000 (15:22 +0000)]
[TER] PR target/48863 : Don't replace expressions across local register variable definitions
PR target/48863
PR inline-asm/70184
* tree-ssa-ter.c (temp_expr_table): Add reg_vars_cnt field.
(new_temp_expr_table): Initialise reg_vars_cnt.
(free_temp_expr_table): Release reg_vars_cnt.
(process_replaceable): Add reg_vars_cnt argument, set reg_vars_cnt
field of TAB.
(find_replaceable_in_bb): Use the above to record register variable
write occurrences and cancel replacement across them.
Eric Botcazou [Thu, 24 Nov 2016 15:01:32 +0000 (15:01 +0000)]
re PR rtl-optimization/78437 (invalid sign-extend conversion in REE pass)
PR rtl-optimization/78437
* ree.c (get_uses): New function.
(combine_reaching_defs): When a copy is needed, return false if any
reaching use of the source register reads it in a mode larger than
the mode it is set in and WORD_REGISTER_OPERATIONS is true.
Richard Biener [Thu, 24 Nov 2016 12:25:22 +0000 (12:25 +0000)]
re PR tree-optimization/71595 (ICE on valid code at -O2 and -O3 on x86_64-linux-gnu: in check_loop_closed_ssa_use, at tree-ssa-loop-manip.c:704)
2016-11-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/71595
* cfgloopmanip.h (remove_path): Add irred_invalidated and
loop_closed_ssa_invalidated parameters, defaulted to NULL.
* cfgloopmanip.c (remove_path): Likewise, pass them along to
called functions. Only fix irred flags if the caller didn't
request state.
* tree-ssa-loop-ivcanon.c (unloop_loops): Use add_bb_to_loop.
(unloop_loops): Pass irred_invalidated and loop_closed_ssa_invalidated
to remove_path.
Bernd Schmidt [Thu, 24 Nov 2016 12:22:16 +0000 (12:22 +0000)]
re PR rtl-optimization/78120 (If conversion no longer performed)
PR rtl-optimization/78120
* ifcvt.c (noce_conversion_profitable_p): Check original cost in all
cases, and additionally test against max_seq_cost for speed
optimization.
(noce_process_if_block): Compute an estimate for the original cost when
optimizing for speed, using the minimum of then and else block costs.
testsuite/
PR rtl-optimization/78120
* gcc.target/i386/pr78120.c: New test.
Eric Botcazou [Thu, 24 Nov 2016 12:02:53 +0000 (12:02 +0000)]
re PR middle-end/78429 (ICE in set_value_range, at tree-vrp.c on non-standard boolean)
PR middle-end/78429
* tree.h (wi::fits_to_boolean_p): New predicate.
(wi::fits_to_tree_p): Use it for boolean types.
* tree.c (int_fits_type_p): Likewise.
Martin Liska [Thu, 24 Nov 2016 09:42:18 +0000 (10:42 +0100)]
Fix print_node for CONSTRUCTORs
* print-tree.c (struct bucket): Remove.
(print_node): Add new argument which drives whether a tree node
is printed briefly or not.
(debug_tree): Replace a custom hash table with hash_set<T>.
* print-tree.h (print_node): Add the argument.
Joseph Myers [Wed, 23 Nov 2016 23:34:05 +0000 (23:34 +0000)]
Fix e500 offset handling for TImode.
Given my previous fix for a missing insn pattern for e500, building
glibc runs into an assembler error "Error: operand out of range (256
is not between 0 and 248)". This comes from an insn:
This patch adjusts the offset handling for TImode - and TDmode and
PTImode in case such subregs can arise for them - to be the same as
for TFmode, so that proper SPE offset checks are made in the
TARGET_E500_DOUBLE case.
This allows the glibc build to complete. Testing shows 372 FAILs
across the gcc, g++ and libstdc++ testsuites; more cleanup is
certainly needed, but this gets to the point where the toolchain at
least builds so it's possible to compare test results when fixing
bugs.
* config/rs6000/rs6000.c (rs6000_legitimate_offset_address_p): For
TARGET_E500_DOUBLE, handle TDmode, TImode and PTImode the same as
TFmode, IFmode and KFmode.
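The assembler error quoted in the message can be reproduced with a small model of the offset check. This is a sketch only, assuming that an SPE displacement must be a multiple of 8 in the range 0..248 and that a 16-byte value is accessed as two 8-byte halves. Both function names are hypothetical and are not taken from rs6000.c.

```c
#include <stdbool.h>

/* Assumed SPE constraint: a displacement is encodable only if it is a
   multiple of 8 in the range 0..248.  */
static bool
spe_offset_ok (long offset)
{
  return (offset & ~0xf8L) == 0;
}

/* Hypothetical check for a 16-byte access that is split into two
   8-byte halves: both halves must be encodable.  */
bool
spe_offset_ok_for_16_bytes (long offset)
{
  return spe_offset_ok (offset) && spe_offset_ok (offset + 8);
}
```

Under these assumptions an offset of 248 passes the single check, but the second half lands at 256, which matches the "256 is not between 0 and 248" diagnostic.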
Joseph Myers [Wed, 23 Nov 2016 23:32:54 +0000 (23:32 +0000)]
Add another e500 subreg pattern.
Building glibc for powerpc-linux-gnuspe --enable-e500-double, given
the patch <https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02404.html>
applied, fails with errors such as:
../sysdeps/ieee754/ldbl-128ibm/s_modfl.c: In function '__modfl':
../sysdeps/ieee754/ldbl-128ibm/s_modfl.c:91:1: error: unrecognizable insn:
}
^
(insn 31 30 32 2 (set (reg:DF 203)
(subreg:DF (reg:TI 202) 8)) "../sysdeps/ieee754/ldbl-128ibm/s_modfl.c":44 -1
(nil))
../sysdeps/ieee754/ldbl-128ibm/s_modfl.c:91:1: internal compiler error: in extract_insn, at recog.c:2311
This patch adds an insn pattern similar to various patterns already
present to handle extracting such a subreg. This allows the glibc
build to get further, until it runs into an assembler error for which
I have another patch.
gcc:
* config/rs6000/spe.md (*frob_<SPE64:mode>_ti_8): New insn
pattern.
gcc/testsuite:
* gcc.c-torture/compile/20161123-1.c: New test.
Jakub Jelinek [Wed, 23 Nov 2016 19:50:23 +0000 (20:50 +0100)]
re PR tree-optimization/78482 (wrong code at -O3 in both 32-bit and 64-bit modes on x86_64-linux-gnu)
PR tree-optimization/78482
* gcc.dg/torture/pr78482.c (c, d): Use signed char instead of char.
(bar): New function.
(main): Call bar instead of printf.
Jakub Jelinek [Wed, 23 Nov 2016 19:28:41 +0000 (20:28 +0100)]
re PR middle-end/69183 (ICE when using OpenMP PRIVATE keyword in OMP DO loop not explicitly encapsulated in OMP PARALLEL region)
PR middle-end/69183
* omp-low.c (build_outer_var_ref): Change lastprivate argument
to code, pass it recursively, adjust uses. For OMP_CLAUSE_PRIVATE
on worksharing constructs, treat it like clauses on simd construct.
Formatting fix.
(lower_rec_input_clauses): For OMP_CLAUSE_PRIVATE_OUTER_REF pass
OMP_CLAUSE_PRIVATE as last argument to build_outer_var_ref.
(lower_lastprivate_clauses): Pass OMP_CLAUSE_LASTPRIVATE instead
of true as last argument to build_outer_var_ref.
Uros Bizjak [Wed, 23 Nov 2016 19:05:53 +0000 (20:05 +0100)]
i386.md (*movqi_internal): Calculate mode attribute of alternatives 7,8,9 depending on TARGET_AVX512DQ.
* gcc.target/config/i386.md (*movqi_internal): Calculate mode
attribute of alternatives 7,8,9 depending on TARGET_AVX512DQ.
<TYPE_MSKMOV>: Emit kmovw for MODE_HI insn mode attribute.
(*k<logic><mode>): Calculate mode attribute depending on
TARGET_AVX512DQ. Emit k<logic>w for MODE_HI insn mode attribute.
(*andqi_1): Calculate mode attribute of alternative 3 depending
on TARGET_AVX512DQ. Emit kandw for MODE_HI insn mode attribute.
(kandn<mode>): Calculate mode attribute of alternative 2 depending
on TARGET_AVX512DQ. Emit kandnw for MODE_HI insn mode attribute.
(kxnor<mode>): Merge insn patterns using SWI1248_AVX512BW mode
iterator. Calculate mode attribute of alternative 1 depending
on TARGET_AVX512DQ. Emit kxnorw for MODE_HI insn mode attribute.
(*one_cmplqi2_1): Calculate mode attribute of alternative 2 depending
on TARGET_AVX512DQ. Emit knotw for MODE_HI insn mode attribute.
Jakub Jelinek [Wed, 23 Nov 2016 18:45:27 +0000 (19:45 +0100)]
re PR c++/77907 (Add "const" to argument of constexpr constructor causes the object to be left in unconstructed state)
PR c++/77907
* cp-gimplify.c (cp_fold) <case CALL_EXPR>: When calling constructor
and maybe_constant_value returns non-CALL_EXPR, create INIT_EXPR
with the object on lhs and maybe_constant_value returned expr on rhs.
PR middle-end/78153
* gimple-fold.c (fold_stmt_1): Handle case for GIMPLE_RETURN.
* tree-vrp.c (extract_range_basic): Handle case for
CFN_BUILT_IN_STRLEN.
testsuite/
* gcc.dg/tree-ssa/pr78153-1.c: New test.
* gcc.dg/tree-ssa/pr78153-2.c: Likewise.
James Greenhalgh [Wed, 23 Nov 2016 17:33:39 +0000 (17:33 +0000)]
[Patch 16/17 libgcc ARM] Half to double precision conversions
gcc/
* config/arm/arm.c (arm_convert_to_type): Delete.
(TARGET_CONVERT_TO_TYPE): Delete.
(arm_init_libfuncs): Enable trunc_optab from DFmode to HFmode.
(arm_libcall_uses_aapcs_base): Add trunc_optab from DF- to HFmode.
* config/arm/arm.h (TARGET_FP16_TO_DOUBLE): New.
* config/arm/arm.md (truncdfhf2): Only convert through SFmode if we
are in fast math mode, and have no single step hardware instruction.
(extendhfdf2): Only expand through SFmode if we don't have a
single-step hardware instruction.
* config/arm/vfp.md (*truncdfhf2): New.
(extendhfdf2): Likewise.
* config/arm/fp16.c (struct format): New.
(binary32): New.
(__gnu_float2h_internal): New. Body moved from
__gnu_f2h_internal and generalize.
(__gnu_f2h_internal): Move body to function __gnu_float2h_internal.
Call it with binary32.
Co-Authored-By: Matthew Wahab <matthew.wahab@arm.com>
From-SVN: r242781
James Greenhalgh [Wed, 23 Nov 2016 17:23:12 +0000 (17:23 +0000)]
[Patch 6/17] Migrate excess precision logic to use TARGET_EXCESS_PRECISION
gcc/
* toplev.c (init_excess_precision): Delete most logic.
* tree.c (excess_precision_type): Rewrite to use
TARGET_EXCESS_PRECISION.
* doc/invoke.texi (-fexcess-precision): Document behaviour in a
more generic fashion.
* ginclude/float.h: Wrap definition of FLT_EVAL_METHOD in
__STDC_WANT_IEC_60559_TYPES_EXT__.
gcc/c-family/
* c-common.c (excess_precision_mode_join): New.
(c_ts18661_flt_eval_method): New.
(c_c11_flt_eval_method): Likewise.
(c_flt_eval_method): Likewise.
* c-common.h (excess_precision_mode_join): New.
(c_flt_eval_method): Likewise.
* c-cppbuiltin.c (c_cpp_flt_eval_method_iec_559): New.
(cpp_iec_559_value): Call it.
(c_cpp_builtins): Modify logic for __LIBGCC_*_EXCESS_PRECISION__,
call c_flt_eval_method to set __FLT_EVAL_METHOD__ and
__FLT_EVAL_METHOD_TS_18661_3__.
Jakub Jelinek [Wed, 23 Nov 2016 15:59:25 +0000 (16:59 +0100)]
re PR c++/71450 (ICE on invalid C++11 code on x86_64-linux-gnu: in tree check: expected record_type or union_type or qual_union_type, have template_type_parm in lookup_base, at cp/search.c:203)
PR c++/71450
* pt.c (tsubst_copy): Return error_mark_node when mark_used
fails, even when complain & tf_error.
* g++.dg/cpp0x/pr71450-1.C: New test.
* g++.dg/cpp0x/pr71450-2.C: New test.
Jakub Jelinek [Wed, 23 Nov 2016 15:54:39 +0000 (16:54 +0100)]
re PR c++/77739 (internal compiler error: in create_tmp_var, at gimple-expr.c:524)
PR c++/77739
* cp-gimplify.c (cp_gimplify_tree) <case VEC_INIT_EXPR>: Pass
false as handle_invisiref_parm_p to cp_genericize_tree.
(struct cp_genericize_data): Add handle_invisiref_parm_p field.
(cp_genericize_r): Don't wrap is_invisiref_parm into references
if !wtd->handle_invisiref_parm_p.
(cp_genericize_tree): Add handle_invisiref_parm_p argument,
set wtd.handle_invisiref_parm_p to it.
(cp_genericize): Pass true as handle_invisiref_parm_p to
cp_genericize_tree. Formatting fix.
Martin Jambor [Wed, 23 Nov 2016 14:51:02 +0000 (15:51 +0100)]
backport: hsa-builtins.def: New file.
Merge from HSA branch to trunk
2016-11-23 Martin Jambor <mjambor@suse.cz>
Martin Liska <mliska@suse.cz>
gcc/
* hsa-builtins.def: New file.
* Makefile.in (BUILTINS_DEF): Add hsa-builtins.def dependency.
* builtins.def: Include hsa-builtins.def.
(DEF_HSA_BUILTIN): New macro.
* dumpfile.h (OPTGROUP_OPENMP): Define.
* dumpfile.c (optgroup_options): Added OPTGROUP_OPENMP.
* gimple.h (gf_mask): Added elements GF_OMP_FOR_GRID_INTRA_GROUP and
GF_OMP_FOR_GRID_GROUP_ITER.
(gimple_omp_for_grid_phony): Added checking assert.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_for_grid_intra_group): New function.
(gimple_omp_for_set_grid_intra_group): Likewise.
(gimple_omp_for_grid_group_iter): Likewise.
(gimple_omp_for_set_grid_group_iter): Likewise.
* omp-low.c (check_omp_nesting_restrictions): Allow GRID loop where
previously only distribute loop was permitted.
(lower_lastprivate_clauses): Allow non tcc_comparison predicates.
(grid_get_kernel_launch_attributes): Support multiple HSA grid
dimensions.
(grid_expand_omp_for_loop): Likewise and also support standalone
distribute constructs. New parameter INTRA_GROUP, updated both users.
(grid_expand_target_grid_body): Support standalone distribute
constructs.
(pass_data_expand_omp): Changed optinfo_flags to OPTGROUP_OPENMP.
(pass_data_expand_omp_ssa): Likewise.
(pass_data_omp_device_lower): Likewise.
(pass_data_lower_omp): Likewise.
(pass_data_diagnose_omp_blocks): Likewise.
(pass_data_oacc_device_lower): Likewise.
(pass_data_omp_target_link): Likewise.
(grid_lastprivate_predicate): New function.
(lower_omp_for_lastprivate): Call grid_lastprivate_predicate for
gridified loops.
(lower_omp_for): Support standalone distribute constructs.
(grid_prop): New type.
(grid_safe_assignment_p): Check for assignments to group_sizes, new
parameter GRID.
(grid_seq_only_contains_local_assignments): New parameter GRID, pass
it to callee.
(grid_find_single_omp_among_assignments_1): Likewise, improve missed
optimization info messages.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Do not bail out for SIMDs.
(grid_parallel_clauses_gridifiable): New function.
(grid_inner_loop_gridifiable_p): Likewise.
(grid_dist_follows_simple_pattern): Likewise.
(grid_gfor_follows_tiling_pattern): Likewise.
(grid_call_permissible_in_distribute_p): Likewise.
(grid_handle_call_in_distribute): Likewise.
(grid_dist_follows_tiling_pattern): Likewise.
(grid_target_follows_gridifiable_pattern): Support standalone distribute
constructs.
(grid_var_segment): New enum.
(grid_mark_variable_segment): New function.
(grid_copy_leading_local_assignments): Call grid_mark_variable_segment
if a new argument says so.
(grid_process_grid_body): New function.
(grid_eliminate_combined_simd_part): Likewise.
(grid_mark_tiling_loops): Likewise.
(grid_mark_tiling_parallels_and_loops): Likewise.
(grid_process_kernel_body_copy): Support standalone distribute
constructs.
(grid_attempt_target_gridification): New grid variable holding overall
gridification state. Support standalone distribute constructs and
collapse clauses.
* doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP.
* hsa.h (hsa_bb): Add method append_phi.
(hsa_insn_br): Renamed to hsa_insn_cbr, renamed all
occurrences in all files too.
(hsa_insn_br): New class, now the ancestor of hsa_insn_cbr.
(is_a_helper <hsa_insn_br *>::test): New function.
(is_a_helper <hsa_insn_cbr *>::test): Adjust to only cover conditional
branch instructions.
(hsa_insn_signal): Make a direct descendant of
hsa_insn_basic. Add memorder constructor parameter and
m_memory_order and m_signalop member variables.
(hsa_insn_queue): Changed constructor parameters to common form.
Added m_segment and m_memory_order member variables.
(hsa_summary_t): Add private member function
process_gpu_implementation_attributes.
(hsa_function_summary): Rename m_binded_function to
m_bound_function.
(hsa_insn_basic_p): Remove typedef.
(hsa_op_with_type): Change hsa_insn_basic_p into plain pointers.
(hsa_op_reg_p): Remove typedef.
(hsa_function_representation): Change hsa_op_reg_p into plain
pointers.
(hsa_insn_phi): Removed new and delete operators.
(hsa_insn_br): Likewise.
(hsa_insn_cbr): Likewise.
(hsa_insn_sbr): Likewise.
(hsa_insn_cmp): Likewise.
(hsa_insn_mem): Likewise.
(hsa_insn_atomic): Likewise.
(hsa_insn_signal): Likewise.
(hsa_insn_seg): Likewise.
(hsa_insn_call): Likewise.
(hsa_insn_arg_block): Likewise.
(hsa_insn_comment): Likewise.
(hsa_insn_srctype): Likewise.
(hsa_insn_packed): Likewise.
(hsa_insn_cvt): Likewise.
(hsa_insn_alloca): Likewise.
* hsa.c (hsa_destroy_insn): Also handle instances of hsa_insn_br.
(process_gpu_implementation_attributes): New function.
(link_functions): Move some functionality into it. Adjust after
renaming m_binded_functions to m_bound_functions.
(hsa_insn_basic::op_output_p): Add BRIG_OPCODE_DEBUGTRAP
to the list of instructions with no output registers.
(get_in_type): Return this if it is a register of
matching size.
(hsa_get_declaration_name): Moved to...
* hsa-gen.c (hsa_get_declaration_name): ...here. Allocate
temporary string on an obstack instead of from ggc.
(query_hsa_grid): Renamed to query_hsa_grid_dim, reimplemented, cut
down to two overloads.
(hsa_allocp_operand_address): Removed.
(hsa_allocp_operand_immed): Likewise.
(hsa_allocp_operand_reg): Likewise.
(hsa_allocp_operand_code_list): Likewise.
(hsa_allocp_operand_operand_list): Likewise.
(hsa_allocp_inst_basic): Likewise.
(hsa_allocp_inst_phi): Likewise.
(hsa_allocp_inst_mem): Likewise.
(hsa_allocp_inst_atomic): Likewise.
(hsa_allocp_inst_signal): Likewise.
(hsa_allocp_inst_seg): Likewise.
(hsa_allocp_inst_cmp): Likewise.
(hsa_allocp_inst_br): Likewise.
(hsa_allocp_inst_sbr): Likewise.
(hsa_allocp_inst_call): Likewise.
(hsa_allocp_inst_arg_block): Likewise.
(hsa_allocp_inst_comment): Likewise.
(hsa_allocp_inst_queue): Likewise.
(hsa_allocp_inst_srctype): Likewise.
(hsa_allocp_inst_packed): Likewise.
(hsa_allocp_inst_cvt): Likewise.
(hsa_allocp_inst_alloca): Likewise.
(hsa_allocp_bb): Likewise.
(hsa_obstack): New.
(hsa_init_data_for_cfun): Initialize obstack.
(hsa_deinit_data_for_cfun): Release memory of the obstack.
(hsa_op_immed::operator new): Use obstack instead of object_allocator.
(hsa_op_reg::operator new): Likewise.
(hsa_op_address::operator new): Likewise.
(hsa_op_code_list::operator new): Likewise.
(hsa_op_operand_list::operator new): Likewise.
(hsa_insn_basic::operator new): Likewise.
(hsa_insn_phi::operator new): Likewise.
(hsa_insn_br::operator new): Likewise.
(hsa_insn_sbr::operator new): Likewise.
(hsa_insn_cmp::operator new): Likewise.
(hsa_insn_mem::operator new): Likewise.
(hsa_insn_atomic::operator new): Likewise.
(hsa_insn_signal::operator new): Likewise.
(hsa_insn_seg::operator new): Likewise.
(hsa_insn_call::operator new): Likewise.
(hsa_insn_arg_block::operator new): Likewise.
(hsa_insn_comment::operator new): Likewise.
(hsa_insn_srctype::operator new): Likewise.
(hsa_insn_packed::operator new): Likewise.
(hsa_insn_cvt::operator new): Likewise.
(hsa_insn_alloca::operator new): Likewise.
(hsa_init_new_bb): Likewise.
(hsa_bb::append_phi): New function.
(gen_hsa_phi_from_gimple_phi): Use it.
(get_symbol_for_decl): Fix distinguishing between
global and local functions. Put local variables into a segment
according to their attribute or static flag, if there is one.
(hsa_insn_br::hsa_insn_br): New.
(hsa_insn_br::operator new): Likewise.
(hsa_insn_cbr::hsa_insn_cbr): Set width via ancestor constructor.
(query_hsa_grid_nodim): New function.
(multiply_grid_dim_characteristics): Likewise.
(gen_get_num_threads): Likewise.
(gen_get_num_teams): Reimplemented.
(gen_get_team_num): Likewise.
(gen_hsa_insns_for_known_library_call): Updated calls to the above
helper functions.
(get_memory_order_name): Removed.
(get_memory_order): Likewise.
(hsa_memorder_from_tree): New function.
(gen_hsa_ternary_atomic_for_builtin): Renamed to
gen_hsa_atomic_for_builtin, can also create signals.
(gen_hsa_insns_for_call): Handle many new builtins. Adjust to use
hsa_memorder_from_tree and gen_hsa_atomic_for_builtin.
(hsa_insn_atomic): Fix function comment.
(hsa_insn_signal::hsa_insn_signal): Fix comment. Update call to
ancestor constructor and initialization of new member variables.
(hsa_insn_queue::hsa_insn_queue): Added initialization of new
member variables.
(hsa_get_host_function): Handle functions with no bound CPU
implementation. Fix binded to bound.
(get_brig_function_name): Likewise.
(HSA_SORRY_ATV): Remove semicolon after macro.
(HSA_SORRY_AT): Likewise.
(omp_simple_builtin::generate): Add missing semicolons.
(hsa_insn_phi::operator new): Removed.
(hsa_insn_br::operator new): Likewise.
(hsa_insn_cbr::operator new): Likewise.
(hsa_insn_sbr::operator new): Likewise.
(hsa_insn_cmp::operator new): Likewise.
(hsa_insn_mem::operator new): Likewise.
(hsa_insn_atomic::operator new): Likewise.
(hsa_insn_signal::operator new): Likewise.
(hsa_insn_seg::operator new): Likewise.
(hsa_insn_call::operator new): Likewise.
(hsa_insn_arg_block::operator new): Likewise.
(hsa_insn_comment::operator new): Likewise.
(hsa_insn_srctype::operator new): Likewise.
(hsa_insn_packed::operator new): Likewise.
(hsa_insn_cvt::operator new): Likewise.
(hsa_insn_alloca::operator new): Likewise.
(get_symbol_for_decl): Accept CONST_DECLs, put them to
readonly segment.
(gen_hsa_addr): Also process CONST_DECLs.
(gen_hsa_addr_insns): Process CONST_DECLs by creating private
copies.
(gen_hsa_unary_operation): Make sure the function does
not use bittype source type for firstbit and lastbit operations.
(gen_hsa_popcount_to_dest): Make sure the function uses a bittype
source type.
* hsa-brig.c (emit_insn_operands): Cope with zero operands in an
instruction.
(emit_branch_insn): Renamed to emit_cond_branch_insn.
Emit the width stored in the class.
(emit_generic_branch_insn): New function.
(emit_insn): Call emit_generic_branch_insn.
(emit_signal_insn): Remove obsolete comment. Update
member variable name, pick a type according to profile.
(emit_alloca_insn): Remove obsolete comment.
(emit_atomic_insn): Likewise.
(emit_queue_insn): Get segment and memory order from the IR object.
(hsa_brig_section): Make allocate_new_chunk, chunks
and cur_chunk private, add a default NULL parameter to add method.
(hsa_brig_section::add): Added a new parameter, store pointer to
output data there if it is non-NULL.
(emit_function_directives): Use this new parameter instead of
calculating the pointer itself, fix function comment.
(hsa_brig_emit_function): Add forgotten endian conversion.
(hsa_output_kernels): Remove unnecessary building of
kernel_dependencies_vector_type.
(emit_immediate_operand): Declare.
(emit_directive_variable): Also emit initializers of CONST_DECLs.
(gen_hsa_insn_for_internal_fn_call): Also handle IFN_RSQRT.
(verify_function_arguments): Properly detect variadic
arguments.
* hsa-dump.c (hsa_width_specifier_name): New function.
(dump_hsa_insn_1): Dump generic branch instructions, update signal
member variable name. Special dumping for queue objects.
* ipa-hsa.c (process_hsa_functions): Adjust after renaming
m_binded_functions to m_bound_functions. Copy externally visible flag
to the node.
(ipa_hsa_write_summary): Likewise.
(ipa_hsa_read_section): Likewise.
gcc/fortran/
* f95-lang.c (DEF_HSA_BUILTIN): New macro.
Richard Biener [Wed, 23 Nov 2016 14:40:05 +0000 (14:40 +0000)]
re PR tree-optimization/78396 (gcc.dg/vect/bb-slp-cond-1.c FAILs after fix for PR77848)
2016-11-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/78396
* tree-vectorizer.c (vectorize_loops): If an innermost loop didn't
vectorize try vectorizing an if-converted body using BB vectorization.
This isn't intended to change the behaviour, just rewrite the
existing logic in a different (and hopefully clearer) way.
The new form -- particularly the part based on the "block"
concept -- is easier to convert to polynomial sizes.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* rtlanal.c (subreg_get_info): Use more local variables.
Remark that for HARD_REGNO_NREGS_HAS_PADDING, each scalar unit
occupies at least one register. Assume that full hard registers
have consistent endianness. Share previously-duplicated if block.
Rework the main handling so that it operates on independently-
addressable YMODE-sized blocks. Use subreg_size_lowpart_offset
to check lowpart offsets, without trying to find an equivalent
integer mode first. Handle WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
as a final register-endianness correction.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242758
combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)
r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
It all has the same root cause: that patch makes combine convert every
lowpart subreg of a logical shift right to a zero_extract. This cannot
work at all if it is not a constant shift, and it has to be a bit more
careful exactly which bits it extracts.
PR target/77881
PR bootstrap/78390
PR target/78438
PR bootstrap/78477
* combine.c (make_compound_operation_int): Do not convert a subreg of
a non-constant logical shift right to a zero_extract. Handle the case
where some zero bits have been shifted into the range covered by that
subreg.
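The second half of the fix, about zero bits shifted into the subreg's range, can be checked with plain integer arithmetic. The sketch below models a 32-bit value and its 8-bit low part. The helper names are hypothetical and the code is not taken from combine.c. It only shows why the extraction width has to shrink when the shift count is close to the top of the source.

```c
#include <stdint.h>

/* Low 8 bits of a 32-bit logical shift right by a constant.  */
uint8_t
lowpart_of_shift (uint32_t x, unsigned shift)
{
  return (uint8_t) (x >> shift);
}

/* Hypothetical model of a zero-extract: WIDTH bits of X starting at bit
   POS, zero-extended.  Requires 0 < WIDTH and POS + WIDTH <= 32.  */
uint32_t
zero_extract (uint32_t x, unsigned width, unsigned pos)
{
  uint32_t mask = width == 32 ? UINT32_MAX : (((uint32_t) 1 << width) - 1);
  return (x >> pos) & mask;
}

/* Width that an equivalent extraction can use: at most 8 bits, but
   never past the top of the 32-bit source.  Requires SHIFT < 32.  */
unsigned
extract_width (unsigned shift)
{
  unsigned remaining = 32 - shift;
  return remaining < 8 ? remaining : 8;
}
```

With a shift of 28 only 4 source bits remain, so an 8-bit extraction at position 28 would reach past bit 31, while a 4-bit extraction gives the same value as the shifted low part.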
Provide versions of subreg_lowpart_offset and subreg_highpart_offset
that work on mode sizes rather than modes. Also provide a routine
that converts an lsb position to a subreg offset.
The intent (in combination with later patches) is to move the
handling of the BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN case into
just two places, so that for other combinations we don't have
to split offsets into words and subwords.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* rtl.h (subreg_size_offset_from_lsb): Declare.
(subreg_offset_from_lsb): New function.
(subreg_size_lowpart_offset): Declare.
(subreg_lowpart_offset): Turn into an inline function.
(subreg_size_highpart_offset): Declare.
(subreg_highpart_offset): Turn into an inline function.
* emit-rtl.c (subreg_size_lowpart_offset): New function.
(subreg_size_highpart_offset): Likewise.
* rtlanal.c (subreg_size_offset_from_lsb): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242755
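The idea of computing lowpart and highpart offsets from sizes alone can be sketched as follows. This is a simplified model, not GCC's implementation: it assumes byte and word endianness agree and takes the endianness as a parameter, whereas the real routines also cover the BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN case. The function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC's implementation: byte offset of the
   least significant OUTER_BYTES of an INNER_BYTES-wide value, assuming
   byte and word endianness agree.  */
unsigned
lowpart_offset (unsigned outer_bytes, unsigned inner_bytes, bool big_endian)
{
  if (outer_bytes >= inner_bytes)
    return 0;
  return big_endian ? inner_bytes - outer_bytes : 0;
}

/* Likewise for the most significant OUTER_BYTES.  */
unsigned
highpart_offset (unsigned outer_bytes, unsigned inner_bytes, bool big_endian)
{
  if (outer_bytes >= inner_bytes)
    return 0;
  return big_endian ? 0 : inner_bytes - outer_bytes;
}
```

For a 4-byte part of an 8-byte value, the low part sits at offset 0 on a little-endian target and at offset 4 on a big-endian one, and the high part is the mirror image.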
Richard Biener [Wed, 23 Nov 2016 14:25:48 +0000 (14:25 +0000)]
re PR tree-optimization/78482 (wrong code at -O3 in both 32-bit and 64-bit modes on x86_64-linux-gnu)
2016-11-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/78482
* tree-cfgcleanup.c: Include tree-ssa-loop-niter.h.
(remove_forwarder_block_with_phi): When merging with a loop
header creates a new latch reset number of iteration information
of the loop.
Bin Cheng [Wed, 23 Nov 2016 12:44:08 +0000 (12:44 +0000)]
fold-const.c (fold_cond_expr_with_comparison): Move simplification for A cmp C1 ? A : C2 to below, also simplify remaining code.
* fold-const.c (fold_cond_expr_with_comparison): Move simplification
for A cmp C1 ? A : C2 to below, also simplify remaining code.
* match.pd: Move and extend simplification from above to here:
(cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)).
* tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func.
(predicate_scalar_phi): Call fold_stmt using the new valueize func.
gcc/testsuite
* gcc.dg/fold-cond_expr-1.c: New test.
* gcc.dg/fold-condcmpconv-1.c: New test.
* gcc.dg/fold-condcmpconv-2.c: New test.
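The source-level shape of the "A cmp C1 ? A : C2" simplification can be shown with a small sketch. The functions below are hypothetical illustrations and are not taken from the new tests. They assume the simplest cases, where C1 equals C2 or differs from it by one.

```c
/* Conditional form with C1 == C2.  */
int
clamp_cond (int x)
{
  return x < 10 ? x : 10;
}

/* Conditional form with C1 == C2 - 1; for integers this selects the
   same values as the form above.  */
int
clamp_cond_le (int x)
{
  return x <= 9 ? x : 10;
}

/* Reference MIN form that both conditionals are equivalent to.  */
int
clamp_min (int x)
{
  int bound = 10;
  return x < bound ? x : bound;
}
```

All three functions return the smaller of x and 10, which is why the conditional can be expressed as a MIN operation.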
Paolo Bonzini [Wed, 23 Nov 2016 10:06:07 +0000 (10:06 +0000)]
system.h (HAVE_DESIGNATED_INITIALIZERS, [...]): Do not use "defined" in macros.
gcc:
2016-11-23 Paolo Bonzini <bonzini@gnu.org>
* system.h (HAVE_DESIGNATED_INITIALIZERS,
HAVE_DESIGNATED_UNION_INITIALIZERS): Do not use
"defined" in macros.
* doc/cpp.texi (Defined): Mention -Wexpansion-to-defined.
* doc/cppopts.texi (Invocation): Document -Wexpansion-to-defined.
* doc/invoke.texi (Warning Options): Document -Wexpansion-to-defined.
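The pattern that -Wexpansion-to-defined is meant to flag, and one portable way to avoid it, can be sketched as below. FEATURE_X and HAVE_FEATURE_X are hypothetical names chosen for illustration. The rewrite shown is one reasonable approach and is not a copy of the system.h change.

```c
/* Non-portable: "defined" appears as a result of macro expansion,
   which is what -Wexpansion-to-defined reports when the macro is
   used in an #if.
     #define HAVE_FEATURE_X (defined (FEATURE_X) && FEATURE_X > 1)
     #if HAVE_FEATURE_X ...  */

/* Portable alternative: evaluate "defined" directly in #if and give
   the macro a plain 0/1 value.  */
#if defined (FEATURE_X) && FEATURE_X > 1
# define HAVE_FEATURE_X 1
#else
# define HAVE_FEATURE_X 0
#endif

int
have_feature_x (void)
{
  return HAVE_FEATURE_X;
}
```

With the second form the macro expands to a plain integer constant, so any later #if that uses it no longer depends on how a preprocessor treats "defined" produced by expansion.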
gcc/c-family:
2016-11-23 Paolo Bonzini <bonzini@gnu.org>
* c.opt (Wexpansion-to-defined): New.
gcc/testsuite:
2016-11-23 Paolo Bonzini <bonzini@gnu.org>
* gcc.dg/cpp/defined.c: Mark newly introduced warnings and
adjust for warning->pedwarn change.
* gcc.dg/cpp/defined-syshdr.c,
gcc.dg/cpp/defined-Wexpansion-to-defined.c,
gcc.dg/cpp/defined-Wextra-Wno-expansion-to-defined.c,
gcc.dg/cpp/defined-Wextra.c,
gcc.dg/cpp/defined-Wno-expansion-to-defined.c: New testcases.
libcpp:
2016-11-23 Paolo Bonzini <bonzini@gnu.org>
* include/cpplib.h (struct cpp_options): Add new member
warn_expansion_to_defined.
(CPP_W_EXPANSION_TO_DEFINED): New enum member.
* expr.c (parse_defined): Warn for all uses of "defined"
in macros, and tie warning to CPP_W_EXPANSION_TO_DEFINED.
Make it a pedwarning instead of a warning.
* system.h (HAVE_DESIGNATED_INITIALIZERS): Do not use
"defined" in macros.
The test fails for avr because fn1 does not get inlined into fn2. Inlining
occurs for x86_64 because fn1's computed size equals call_stmt_size. For the
avr, 32 bit memory moves are more expensive, and b[3] = p10[a] results in
a bigger size for fn1, preventing the inlining.
Add -finline-small-functions to force early inliner to inline fn1.
Georg-Johann Lay [Wed, 23 Nov 2016 09:17:57 +0000 (09:17 +0000)]
re PR target/60300 ([avr] Suboptimal stack pointer manipulation for frame setup)
gcc/
PR target/60300
* config/avr/constraints.md (Csp): Widen range to [-11..6].
* config/avr/avr.c (avr_prologue_setup_frame): Limit number
of RCALLs in prologue to 3.
Jakub Jelinek [Wed, 23 Nov 2016 08:08:47 +0000 (09:08 +0100)]
re PR target/78451 (FAIL: gcc.target/i386/sse-22a.c: error: inlining failed in call to always_inline '_mm512_setzero_ps')
PR target/78451
* c-pragma.c (handle_pragma_target): Don't replace
current_target_pragma, but chainon the new args to the current one.
* gcc.target/i386/pr78451.c: New test.
* gcc.target/i386/pr69255-1.c: Use #pragma GCC push_options
and #pragma GCC pop_options around the first #pragma GCC target.
* gcc.target/i386/pr69255-2.c: Likewise.
* gcc.target/i386/pr69255-3.c: Likewise.
Michael Collison [Wed, 23 Nov 2016 07:47:25 +0000 (07:47 +0000)]
2016-11-22 Michael Collison <michael.collison@arm.com>
* config/aarch64/aarch64-protos.h
(aarch64_and_split_imm1, aarch64_and_split_imm2)
(aarch64_and_bitmask_imm): New prototypes.
* config/aarch64/aarch64.c (aarch64_and_split_imm1):
New overloaded function to create bit mask covering the
lowest to highest bits set.
(aarch64_and_split_imm2): New overloaded functions to create bit
mask of zeros between first and last bit set.
(aarch64_and_bitmask_imm): New function to determine if an integer
is a valid two instruction "and" operation.
* config/aarch64/aarch64.md (and<mode>3): New define_insn and _split
allowing wider range of constants with "and" operations.
(ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
"and" operator from matching restricted constant range used for
ior and xor operators.
* config/aarch64/constraints.md (UsO constraint): New SImode constraint
for constants in "and" operations.
(UsP constraint): New DImode constraint for constants in "and" operations.
* config/aarch64/iterators.md (lconst2): New mode iterator.
(LOGICAL2): New code iterator.
* config/aarch64/predicates.md (aarch64_logical_and_immediate): New
predicate.
(aarch64_logical_and_operand): New predicate allowing extended constants
for "and" operations.
* testsuite/gcc.target/aarch64/and_const.c: New test to verify
additional constants are recognized and fewer instructions generated.
* testsuite/gcc.target/aarch64/and_const2.c: New test to verify
additional constants are recognized and fewer instructions generated.
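The two-mask split described in the aarch64.c entries can be modelled in portable C. This is a sketch based only on the descriptions above: the names and_split_imm1 and and_split_imm2 are hypothetical stand-ins, 64-bit values are assumed, and the separate question of whether each mask is a valid bitmask immediate is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the first mask: ones from the lowest set bit
   of VAL up to and including its highest set bit.  VAL must be nonzero.  */
uint64_t
and_split_imm1 (uint64_t val)
{
  assert (val != 0);
  unsigned lowest = 0, highest = 63;
  while (!((val >> lowest) & 1))
    lowest++;
  while (!((val >> highest) & 1))
    highest--;
  /* (2 << highest) wraps to 0 when highest == 63, which still gives the
     right mask in unsigned arithmetic.  */
  return ((uint64_t) 2 << highest) - ((uint64_t) 1 << lowest);
}

/* Second mask: VAL with every bit outside the first mask forced to one,
   so the zeros lie only between the first and last set bits.  */
uint64_t
and_split_imm2 (uint64_t val)
{
  return val | ~and_split_imm1 (val);
}
```

ANDing with both masks in turn is equivalent to ANDing with the original constant, since the bitwise AND of the two masks reproduces it.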