PR c++/80955
* lex.c (lex_string): When checking for a valid macro for the
warning related to -Wliteral-suffix (CPP_W_LITERAL_SUFFIX),
check that the macro name does not start with an underscore
before calling is_macro().
Martin Liska [Mon, 6 Nov 2017 09:02:15 +0000 (10:02 +0100)]
Instrument function exit with __builtin_unreachable in C++
2017-11-06 Martin Liska <mliska@suse.cz>
PR middle-end/82404
* c-opts.c (c_common_post_options): Set -Wreturn-type for C++
FE.
* c.opt: Set default value of warn_return_type.
2017-11-06 Martin Liska <mliska@suse.cz>
PR middle-end/82404
* constexpr.c (cxx_eval_builtin_function_call): Handle
__builtin_unreachable call.
(get_function_named_in_call): Declare function earlier.
(constexpr_fn_retval): Skip __builtin_unreachable.
* cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
...
(cp_maybe_instrument_return): ... this.
(cp_genericize): Call the function unconditionally.
2017-11-06 Martin Liska <mliska@suse.cz>
PR middle-end/82404
* options.c (gfc_post_options): Set default value of
-Wreturn-type to false.
...to avoid a warning about uninitialised wide_ints.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vrp.c (vrp_int_const_binop): Return true on success and
return the value by pointer.
(extract_range_from_multiplicative_op_1): Update accordingly.
Return as soon as an operation fails.
Thomas Koenig [Sun, 5 Nov 2017 17:24:37 +0000 (17:24 +0000)]
re PR fortran/82471 (Reorder loop for unfavorable index ordering in DO CONCURRENT and FORALL)
2017-11-05 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/82471
* lang.opt (ffrontend-loop-interchange): New option.
(Wfrontend-loop-interchange): New option.
* options.c (gfc_post_options): Handle ffrontend-loop-interchange.
* frontend-passes.c (gfc_run_passes): Run
optimize_namespace if flag_frontend_optimize or
flag_frontend_loop_interchange are set.
(optimize_namespace): Run functions according to flags set;
also call index_interchange.
(ind_type): New function.
(has_var): New function.
(index_cost): New function.
(loop_comp): New function.
2017-11-05 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/82471
* gfortran.dg/loop_interchange_1.f90: New test.
Paul Thomas [Sun, 5 Nov 2017 14:32:05 +0000 (14:32 +0000)]
re PR fortran/78641 ([OOP] ICE on polymorphic allocatable function in array constructor)
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/78641
* resolve.c (resolve_ordinary_assign): Do not add the _data
component for class valued array constructors being assigned
to derived type arrays.
* trans-array.c (gfc_trans_array_ctor_element): Take the _data
of class valued elements for assignment to derived type arrays.
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/78641
* gfortran.dg/class_66.f90: New test.
Paul Thomas [Sun, 5 Nov 2017 12:38:42 +0000 (12:38 +0000)]
re PR fortran/81447 ([7/8] gfortran fails to recognize the exact dynamic type of a polymorphic entity that was allocated in a external procedure)
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/81447
PR fortran/82783
* resolve.c (resolve_component): There is no need to resolve
the components of a use associated vtype.
(resolve_fl_derived): Unconditionally generate a vtable for any
module derived type, as long as the standard is F2003 or later
and it is not a vtype or a PDT template.
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/81447
* gfortran.dg/class_65.f90: New test.
* gfortran.dg/alloc_comp_basics_1.f90: Increase builtin_free
count from 18 to 21.
* gfortran.dg/allocatable_scalar_9.f90: Increase builtin_free
count from 32 to 54.
* gfortran.dg/auto_dealloc_1.f90: Increase builtin_free
count from 4 to 10.
* gfortran.dg/coarray_lib_realloc_1.f90: Increase builtin_free
count from 3 to 6. Likewise _gfortran_caf_deregister from 2 to
3, builtin_malloc from 1 to 4 and builtin_memcpy|= MEM from
2 to 5.
* gfortran.dg/finalize_28.f90: Increase builtin_free
count from 3 to 6.
* gfortran.dg/move_alloc_15.f90: Increase builtin_free and
builtin_malloc counts from 11 to 14.
* gfortran.dg/typebound_proc_27.f03: Increase builtin_free
count from 7 to 10. Likewise builtin_malloc from 12 to 15.
Tom de Vries [Sun, 5 Nov 2017 09:58:27 +0000 (09:58 +0000)]
Remove semicolon after do {} while (0) in DEF_SANITIZER_BUILTIN
2017-11-05 Tom de Vries <tom@codesourcery.com>
PR other/82784
* asan.c (DEF_SANITIZER_BUILTIN_1): Factor out of ...
(DEF_SANITIZER_BUILTIN): ... here.
(initialize_sanitizer_builtins): Use DEF_SANITIZER_BUILTIN_1 instead of
DEF_SANITIZER_BUILTIN in if stmt. Add missing semicolon.
Michael Clark [Sun, 5 Nov 2017 00:42:54 +0000 (00:42 +0000)]
RISC-V: Emit "i" suffix for instructions with immediate operands
This changes makes GCC asm output use instruction names that are
consistent with the RISC-V ISA manual. The assembler accepts
immediate-operand instructions without the "i" suffix, so this all
worked before, it's just a bit cleaner to match the ISA manual more
closely.
Andrew Waterman [Sun, 5 Nov 2017 00:30:40 +0000 (00:30 +0000)]
RISC-V: Set SLOW_BYTE_ACCESS=1
When implementing the RISC-V port, I took the name of this macro at
face value. It appears we were mistaken in what this means, here's a
quote from the SPARC port that better describes what SLOW_BYTE_ACCESS
does
/* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
better than access by words when possible, so grab a whole word
and maybe make use of that. */
I've added the comment to our port as well.
See https://gcc.gnu.org/ml/gcc/2017-08/msg00202.html for more
discussion. Thanks to Michael Clark and Andrew Pinski for the help!
gcc/ChangeLog
2017-11-04 Andrew Waterman <andrew@sifive.com>
* config/riscv/riscv.h (SLOW_BYTE_ACCESS): Change to 1.
Daniel Santos [Sat, 4 Nov 2017 22:38:43 +0000 (22:38 +0000)]
PR target/82002 Part 2: Correct non-immediate offset/invalid INSN
When we are realigning the stack pointer, making an ms_abi to sysv_abi
call and allocating 2GiB or more on the stack we end up with an invalid
INSN due to a non-immediate offset. This occurs both with and without
-mcall-ms2sysv-xlogues. Additionally, the stack allocation with
-mcall-ms2sysv-xlogues is ignoring (silently disabling) stack checking,
stack clash checking and probing.
This patch fixes these problems by:
1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save.
2. Rearrange where we emit SSE saves or stub call:
a. Before frame allocation when offset from frame to save area is >= 2GiB.
b. After frame allocation when frame is < 2GiB. (Stack allocations
prior to the stub call can't be combined with those afterwards, so
this is better when possible.)
3. Modify choose_baseaddr to take an optional scratch_regno argument
and never return rtx that cannot be used as an immediate.
gcc:
config/i386/i386.c (choose_basereg): Use optional scratch
register and add assertion.
(x86_emit_outlined_ms2sysv_save): Use scratch register when
needed, and don't allocate stack.
(ix86_expand_prologue): Rearrange where SSE saves/stub call is
emitted, correct wrong allocation with -mcall-ms2sysv-xlogues.
(ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets.
gcc/testsuite:
gcc.target/i386/pr82002-2a.c: Change from xfail to fail.
gcc.target/i386/pr82002-2b.c: Likewise.
Andreas Tobler [Sat, 4 Nov 2017 19:40:23 +0000 (20:40 +0100)]
re PR libgcc/82635 (std::thread's join broken on FreeBSD with all GCCs >= 5)
2017-11-04 Andreas Tobler <andreast@gcc.gnu.org>
PR libgcc/82635
* config/i386/freebsd-unwind.h (MD_FALLBACK_FRAME_STATE_FOR): Use a
sysctl to determine whether we're in a trampoline.
Keep the pattern matching method for systems without
KERN_PROC_SIGTRAMP sysctl.
trans-expr.c (gfc_trans_assignment_1): Character kind conversion may create a loop variant temporary, too.
gcc/fortran/ChangeLog:
2017-11-04 Andre Vehreschild <vehre@gcc.gnu.org>
* trans-expr.c (gfc_trans_assignment_1): Character kind conversion may
create a loop variant temporary, too.
* trans-intrinsic.c (conv_caf_send): Treat char arrays as arrays and
not as scalars.
* trans.c (get_array_span): Take the character kind into account when
doing pointer arithmetic.
gcc/testsuite/ChangeLog:
2017-11-04 Andre Vehreschild <vehre@gcc.gnu.org>
* gfortran.dg/coarray/send_char_array_1.f90: New test.
Thomas Koenig [Sat, 4 Nov 2017 13:20:32 +0000 (13:20 +0000)]
re PR fortran/29600 ([F03] MINLOC and MAXLOC take an optional KIND argument)
2017-11-04 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/29600
* gfortran.h (gfc_check_f): Replace fm3l with fm4l.
* intrinsic.h (gfc_resolve_maxloc): Add gfc_expr * to argument
list in protoytpe.
(gfc_resolve_minloc): Likewise.
* check.c (gfc_check_minloc_maxloc): Handle kind argument.
* intrinsic.c (add_sym_3_ml): Rename to
(add_sym_4_ml): and handle kind argument.
(add_function): Replace add_sym_3ml with add_sym_4ml and add
extra arguments for maxloc and minloc.
(check_specific): Change use of check.f3ml with check.f4ml.
* iresolve.c (gfc_resolve_maxloc): Handle kind argument. If
the kind is smaller than the smallest library version available,
use gfc_default_integer_kind and convert afterwards.
(gfc_resolve_minloc): Likewise.
2017-11-04 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/29600
* gfortran.dg/minmaxloc_8.f90: New test.
Steven G. Kargl [Sat, 4 Nov 2017 00:34:40 +0000 (00:34 +0000)]
re PR fortran/82796 (Private+equivalence in used module breaks compilation of pure function)
2017-11-01 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/82796
* resolve.c (resolve_equivalence): An entity in a common block within
a module cannot appear in an equivalence statement if the entity is
with a pure procedure.
2017-11-01 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/82796
* gfortran.dg/equiv_pure.f90: New test.
* config/i386/i386.c (ix86_emit_restore_reg_using_pop): Prototype.
(ix86_adjust_stack_and_probe_stack_clash): Use a push/pop sequence
to probe at the start of a noreturn function.
Jakub Jelinek [Fri, 3 Nov 2017 19:08:25 +0000 (20:08 +0100)]
re PR tree-optimization/78821 (GCC7: Copying whole 32 bits structure field by field not optimised into copying whole 32 bits at once)
PR tree-optimization/78821
* gimple-ssa-store-merging.c: Update the file comment.
(MAX_STORE_ALIAS_CHECKS): Define.
(struct store_operand_info): New type.
(store_operand_info::store_operand_info): New constructor.
(struct store_immediate_info): Add rhs_code and ops data members.
(store_immediate_info::store_immediate_info): Add rhscode, op0r
and op1r arguments to the ctor, initialize corresponding data members.
(struct merged_store_group): Add load_align_base and load_align
data members.
(merged_store_group::merged_store_group): Initialize them.
(merged_store_group::do_merge): Update them.
(merged_store_group::apply_stores): Pick the constant for
encode_tree_to_bitpos from one of the two operands, or skip
encode_tree_to_bitpos if neither operand is a constant.
(class pass_store_merging): Add process_store method decl. Remove
bool argument from terminate_all_aliasing_chains method decl.
(pass_store_merging::terminate_all_aliasing_chains): Remove
var_offset_p argument and corresponding handling.
(stmts_may_clobber_ref_p): New function.
(compatible_load_p): New function.
(imm_store_chain_info::coalesce_immediate_stores): Terminate group
if there is overlap and rhs_code is not INTEGER_CST. For
non-overlapping stores terminate group if rhs is not mergeable.
(get_alias_type_for_stmts): Change first argument from
auto_vec<gimple *> & to vec<gimple *> &. Add IS_LOAD, CLIQUEP and
BASEP arguments. If IS_LOAD is true, look at rhs1 of the stmts
instead of lhs. Compute *CLIQUEP and *BASEP in addition to the
alias type.
(get_location_for_stmts): Change first argument from
auto_vec<gimple *> & to vec<gimple *> &.
(struct split_store): Remove orig_stmts data member, add orig_stores.
(split_store::split_store): Create orig_stores rather than orig_stmts.
(find_constituent_stmts): Renamed to ...
(find_constituent_stores): ... this. Change second argument from
vec<gimple *> * to vec<store_immediate_info *> *, push pointers
to info structures rather than the statements.
(split_group): Rename ALLOW_UNALIGNED argument to
ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
it. Adjust find_constituent_stores caller.
(imm_store_chain_info::output_merged_store): Handle rhs_code other
than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
get_location_for_stmts callers. Set MR_DEPENDENCE_CLIQUE and
MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
(mem_valid_for_store_merging): New function.
(handled_load): New function.
(pass_store_merging::process_store): New method.
(pass_store_merging::execute): Use process_store method. Adjust
terminate_all_aliasing_chains caller.
* gcc.dg/store_merging_13.c: New test.
* gcc.dg/store_merging_14.c: New test.
Wilco Dijkstra [Fri, 3 Nov 2017 18:19:33 +0000 (18:19 +0000)]
Improve aarch64_legitimate_constant_p
This patch further improves aarch64_legitimate_constant_p. Allow all
integer, floating point and vector constants. Allow label references
and non-anchor symbols with an immediate offset. This allows such
constants to be rematerialized, resulting in smaller code and fewer stack
spills. SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%.
gcc/
* config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
Return true for more constants, symbols and label references.
(aarch64_valid_floating_const): Remove unused function.
Wilco Dijkstra [Fri, 3 Nov 2017 16:29:47 +0000 (16:29 +0000)]
re PR c++/82768 (ICE in synthesize_implicit_template_parm, at cp/parser.c:39338)
Fix PR82768
Forcing LR at the bottom of the frame caused a few test failures.
Since there are some cases that generate worse code, revert this
part, and the frame tests pass again.
gcc/
PR target/82786
* config/aarch64/aarch64.c (aarch64_layout_frame):
Undo forcing of LR at bottom of frame.
* asan.c (create_cond_insert_point): Maintain profile.
* ipa-utils.c (ipa_merge_profiles): Be sure only IPA profiles are
merged.
* basic-block.h (struct basic_block_def): Remove frequency.
(EDGE_FREQUENCY): Use to_frequency
* bb-reorder.c (push_to_next_round_p): Use only IPA counts for global
heuristics.
(find_traces): Update to use to_frequency.
(find_traces_1_round): Likewise; use only IPA counts.
(bb_to_key): Likewise.
(connect_traces): Use IPA counts only.
(copy_bb_p): Update to use to_frequency.
(fix_up_crossing_landing_pad): Likewise.
(sanitize_hot_paths): Likewise.
* bt-load.c (basic_block_freq): Likewise.
* cfg.c (init_flow): Set count_max to uninitialized.
(check_bb_profile): Remove frequencies; check counts.
(dump_bb_info): Do not dump frequencies.
(update_bb_profile_for_threading): Update counts only.
(scale_bbs_frequencies_int): Likewise.
(MAX_SAFE_MULTIPLIER): Remove.
(scale_bbs_frequencies_gcov_type): Update counts only.
(scale_bbs_frequencies_profile_count): Update counts only.
(scale_bbs_frequencies): Update counts only.
* cfg.h (struct control_flow_graph): Add count-max.
(update_bb_profile_for_threading): Update prototype.
* cfgbuild.c (find_bb_boundaries): Do not update frequencies.
(find_many_sub_basic_blocks): Likewise.
* cfgcleanup.c (try_forward_edges): Likewise.
(try_crossjump_to_edge): Likewise.
* cfgexpand.c (expand_gimple_cond): Likewise.
(expand_gimple_tailcall): Likewise.
(construct_init_block): Likewise.
(construct_exit_block): Likewise.
* cfghooks.c (verify_flow_info): Check consistency of counts.
(dump_bb_for_graph): Do not dump frequencies.
(split_block_1): Do not update frequencies.
(split_edge): Do not update frequencies.
(make_forwarder_block): Do not update frequencies.
(duplicate_block): Do not update frequencies.
(account_profile_record): Do not update frequencies.
* cfgloop.c (find_subloop_latch_edge_by_profile): Use IPA counts
for global heuristics.
* cfgloopanal.c (average_num_loop_insns): Update to use to_frequency.
(expected_loop_iterations_unbounded): Use counts only.
* cfgloopmanip.c (scale_loop_profile): Simplify.
(create_empty_loop_on_edge): Simplify
(loopify): Simplify
(duplicate_loop_to_header_edge): Simplify
* cfgrtl.c (force_nonfallthru_and_redirect): Update profile.
(update_br_prob_note): Take care of removing note when profile
becomes undefined.
(relink_block_chain): Do not dump frequency.
(rtl_account_profile_record): Use to_frequency.
* cgraph.c (symbol_table::create_edge): Convert count to ipa count.
(cgraph_edge::redirect_call_stmt_to_calle): Conver tcount to ipa count.
(cgraph_update_edges_for_call_stmt_node): Likewise.
(cgraph_edge::verify_count_and_frequency): Update.
(cgraph_node::verify_node): Temporarily disable frequency verification.
* cgraphbuild.c (compute_call_stmt_bb_frequency): Use
to_cgraph_frequency.
(cgraph_edge::rebuild_edges): Convert to ipa counts.
* cgraphunit.c (init_lowered_empty_function): Do not initialize
frequencies.
(cgraph_node::expand_thunk): Update profile.
* except.c (dw2_build_landing_pads): Do not update frequency.
* final.c (compute_alignments): Use to_frequency.
(dump_basic_block_info): Do not dump frequency.
* gimple-pretty-print.c (dump_profile): Do not dump frequency.
(dump_gimple_bb_header): Do not dump frequency.
* gimple-ssa-isolate-paths.c (isolate_path): Do not update frequency;
do update count.
* gimple-streamer-in.c (input_bb): Do not stream frequency.
* gimple-streamer-out.c (output_bb): Do not stream frequency.
* haifa-sched.c (sched_pressure_start_bb): Use to_freuqency.
(init_before_recovery): Do not update frequency.
(sched_create_recovery_edges): Do not update frequency.
* hsa-gen.c (convert_switch_statements): Do not update frequency.
* ipa-cp.c (ipcp_propagate_stage): Update search for max_count.
(ipa_cp_c_finalize): Set max_count to uninitialized.
* ipa-fnsummary.c (get_minimal_bb): Use counts.
(param_change_prob): Use counts.
* ipa-profile.c (ipa_profile_generate_summary): Do not summarize
local profiles.
* ipa-split.c (consider_split): Use to_frequency.
(split_function): Use to_frequency.
* ira-build.c (loop_compare_func): Likewise.
(mark_loops_for_removal): Likewise.
(mark_all_loops_for_removal): Likewise.
* loop-doloop.c (doloop_modify): Do not update frequency.
* loop-unroll.c (unroll_loop_runtime_iterations): Do not update
frequency.
* lto-streamer-in.c (input_function): Update count_max.
* omp-expand.c (expand_omp_taskreg): Update count_max.
* omp-simd-clone.c (simd_clone_adjust): Update profile.
* predict.c (maybe_hot_frequency_p): Use to_frequency.
(maybe_hot_count_p): Use ipa counts only.
(maybe_hot_bb_p): Simplify.
(maybe_hot_edge_p): Simplify.
(probably_never_executed): Do not take frequency argument.
(probably_never_executed_bb_p): Do not pass frequency.
(probably_never_executed_edge_p): Likewise.
(combine_predictions_for_bb): Check that profile is nonzero.
(propagate_freq): Do not set frequency.
(drop_profile): Simplify.
(counts_to_freqs): Simplify.
(expensive_function_p): Use to_frequency.
(propagate_unlikely_bbs_forward): Simplify.
(determine_unlikely_bbs): Simplify.
(estimate_bb_frequencies): Add hack to silence graphite issues.
(compute_function_frequency): Use ipa counts.
(pass_profile::execute): Update.
(rebuild_frequencies): Use counts only.
(force_edge_cold): Use counts only.
* profile-count.c (profile_count::dump): Dump new count types.
(profile_count::differs_from_p): Check compatiblity.
(profile_count::to_frequency): New function.
(profile_count::to_cgraph_frequency): New function.
* profile-count.h (struct function): Declare.
(enum profile_quality): Add profile_guessed_local and
profile_guessed_global0.
(class profile_proability): Decrease number of bits to 29;
update from_reg_br_prob_note and to_reg_br_prob_note.
(class profile_count: Update comment; decrease number of bits
to 61. Check compatibility.
(profile_count::compatible_p): New private member function.
(profile_count::ipa_p): New member function.
(profile_count::operator<): Handle global zero correctly.
(profile_count::operator>): Handle global zero correctly.
(profile_count::operator<=): Handle global zero correctly.
(profile_count::operator>=): Handle global zero correctly.
(profile_count::nonzero_p): New member function.
(profile_count::force_nonzero): New member function.
(profile_count::max): New member function.
(profile_count::apply_scale): Handle IPA scalling.
(profile_count::guessed_local): New member function.
(profile_count::global0): New member function.
(profile_count::ipa): New member function.
(profile_count::to_frequency): Declare.
(profile_count::to_cgraph_frequency): Declare.
* profile.c (OVERLAP_BASE): Delete.
(compute_frequency_overlap): Delete.
(compute_branch_probabilities): Do not use compute_frequency_overlap.
* regs.h (REG_FREQ_FROM_BB): Use to_frequency.
* sched-ebb.c (rank): Use counts only.
* shrink-wrap.c (handle_simple_exit): Use counts only.
(try_shrink_wrapping): Use counts only.
(place_prologue_for_one_component): Use counts only.
* tracer.c (find_best_predecessor): Use to_frequency.
(find_trace): Use to_frequency.
(tail_duplicate): Use to_frequency.
* trans-mem.c (expand_transaction): Do not update frequency.
* tree-call-cdce.c: Do not update frequency.
* tree-cfg.c (gimple_find_sub_bbs): Likewise.
(gimple_merge_blocks): Likewise.
(gimple_split_edge): Likewise.
(gimple_duplicate_sese_region): Likewise.
(gimple_duplicate_sese_tail): Likewise.
(move_sese_region_to_fn): Likewise.
(gimple_account_profile_record): Likewise.
(insert_cond_bb): Likewise.
* tree-complex.c (expand_complex_div_wide): Likewise.
* tree-eh.c (lower_resx): Update profile.
* tree-inline.c (copy_bb): Simplify count scaling; do not scale
frequencies.
(initialize_cfun): Do not initialize frequencies
(freqs_to_counts): Delete.
(copy_cfg_body): Ignore count parameter.
(copy_body): Update.
(expand_call_inline): Update count_max.
(optimize_inline_calls): Update count_max.
(tree_function_versioning): Update count_max.
* tree-ssa-coalesce.c (coalesce_cost_bb): Use to_frequency.
* tree-ssa-ifcombine.c (update_profile_after_ifcombine): Do not update
frequency.
* tree-ssa-loop-im.c (execute_sm_if_changed): Use counts only.
* tree-ssa-loop-ivcanon.c (unloop_loops): Do not update freuqency.
(try_peel_loop): Likewise.
* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Use
to_frequency.
* tree-ssa-loop-manip.c (niter_for_unrolled_loop): Pass -1.
(tree_transform_and_unroll_loop): Do not use frequencies
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations):
Use reliable prediction only.
* tree-ssa-loop-unswitch.c (hoist_guard): Do not use frequencies.
* tree-ssa-sink.c (select_best_block): Use to_frequency.
* tree-ssa-tail-merge.c (replace_block_by): Temporarily disable
probability scaling.
* tree-ssa-threadupdate.c (create_block_for_threading): Do
not update frequency
(any_remaining_duplicated_blocks): Likewise.
(update_profile): Likewise.
(estimated_freqs_path): Delete.
(freqs_to_counts_path): Delete.
(clear_counts_path): Delete.
(ssa_fix_duplicate_block_edges): Likewise.
(duplicate_thread_path): Likewise.
* tree-switch-conversion.c (gen_inbound_check): Use counts.
* tree-tailcall.c (decrease_profile): Do not update frequency.
(eliminate_tail_call): Likewise.
* tree-vect-loop-manip.c (vect_do_peeling): Likewise.
* tree-vect-loop.c (scale_profile_for_vect_loop): Likewise.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
* ubsan.c (ubsan_expand_null_ifn): Update profile.
(ubsan_expand_ptr_ifn): Update profile.
* value-prof.c (gimple_ic): Simplify.
* value-prof.h (gimple_ic): Update prototype.
* ipa-inline-transform.c (inline_transform): Fix scaling conditoins.
* ipa-inline.c (compute_uninlined_call_time): Be sure that
counts are nonzero.
(want_inline_self_recursive_call_p): Likewise.
(resolve_noninline_speculation): Only cummulate defined counts.
(inline_small_functions): Use nonzero_p.
(ipa_inline): Do not access freed node.
Wilco Dijkstra [Fri, 3 Nov 2017 15:20:53 +0000 (15:20 +0000)]
Set default sched pressure algorithm
The Arm backend sets the default sched-pressure algorithm to SCHED_PRESSURE_MODEL.
Benchmarking on AArch64 shows this speeds up floating point performance on SPEC -
eg. CactusBSSN improves by ~16%. The gains are mostly due to less spilling,
so enable this on AArch64 by default.
gcc/
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set PARAM_SCHED_PRESSURE_ALGORITHM to SCHED_PRESSURE_MODEL.
The rs6000 port currently has an *lt0_disi define_insn, setting the DI
result to whether the SI argument is negative or not. It turns out the
generic optimisers cannot always figure out in the other cases either
that this is just a shift for us. This patch adds patterns for all
four SI/DI combinations.
PR82809: register handling in ix86_vector_duplicate_value
When adding the call to gen_vec_duplicate, I failed to notice that
code further down modified the VEC_DUPLICATE in place. That isn't
safe if gen_vec_duplicate returned a const_vector.
2017-11-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR target/82809
* config/i386/i386.c (ix86_vector_duplicate_value): Use
gen_vec_duplicate after forcing the scalar into a register.
This adds some extra debug info to the dump file for combine: print
the insns that are input to try_combine. I was worried printing more
will make the dump file only harder to read, but especially the info
from the REG_DEAD notes is invaluable.
* combine (try_combine): Print the insns input to try_combine to the
dump file.
David Malcolm [Thu, 2 Nov 2017 20:13:18 +0000 (20:13 +0000)]
Add selftest for diagnostic_get_location_text
gcc/ChangeLog:
* diagnostic.c: Include "selftest-diagnostic.h".
(selftest::assert_location_text): New function.
(selftest::test_diagnostic_get_location_text): New function.
(selftest::diagnostic_c_tests): Call it.
David Malcolm [Thu, 2 Nov 2017 20:09:18 +0000 (20:09 +0000)]
Move selftest::test_diagnostic_context to its own header
It's useful to not rely on global_dc in selftests, so this patch
moves class selftest::test_diagnostic_context from
diagnostic-show-locus.c to a new header and source file.
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add selftest-diagnostic.o.
* diagnostic-show-locus.c: Include "selftest-diagnostic.h".
(class selftest::test_diagnostic_context): Move to...
* selftest-diagnostic.c: New file.
* selftest-diagnostic.h: New file.
James Bowman [Thu, 2 Nov 2017 19:41:02 +0000 (19:41 +0000)]
Add FT32B support
FT32B is a new FT32 architecture type. FT32B has a code compression
scheme which uses linker relaxations. It also has a security option to
prevent reads from program memory.
Nathan Sidwell [Thu, 2 Nov 2017 18:26:29 +0000 (18:26 +0000)]
[PR c++/82710] false positive paren warning
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00119.html
PR c++/82710
* decl.c (grokdeclarator): Don't warn when parens protect a return
type from a qualified name.
Wilco Dijkstra [Thu, 2 Nov 2017 15:12:51 +0000 (15:12 +0000)]
Define MALLOC_ABI_ALIGNMENT
The AArch64 backend currently doesn't set MALLOC_ABI_ALIGNMENT, so
add this to enable alignment optimizations on malloc pointers.
Use the same value as STACK_BOUNDARY and BIGGEST_ALIGNMENT.
gcc/
* config/aarch64/aarch64.h (MALLOC_ABI_ALIGNMENT): New define.
Jeff Law [Wed, 1 Nov 2017 22:52:34 +0000 (16:52 -0600)]
tree-ssa-ccp.c (ccp_folder): New class derived from substitute_and_fold_engine.
* tree-ssa-ccp.c (ccp_folder): New class derived from
substitute_and_fold_engine.
(ccp_folder::get_value): New member function.
(ccp_folder::fold_stmt): Renamed from ccp_fold_stmt.
(ccp_fold_stmt): Remove prototype.
(ccp_finalize): Call substitute_and_fold from the ccp_class.
* tree-ssa-copy.c (copy_folder): New class derived from
substitute_and_fold_engine.
(copy_folder::get_value): Renamed from get_value.
(fini_copy_prop): Call substitute_and_fold from copy_folder class.
* tree-vrp.c (vrp_folder): New class derived from
substitute_and_fold_engine.
(vrp_folder::fold_stmt): Renamed from vrp_fold_stmt.
(vrp_folder::get_value): New member function.
(vrp_finalize): Call substitute_and_fold from vrp_folder class.
(evrp_dom_walker::before_dom_children): Similarly for replace_uses_in.
* tree-ssa-propagate.h (substitute_and_fold_engine): New class to
provide a class interface to folder/substitute routines.
(ssa_prop_fold_stmt_fn): Remove typedef.
(ssa_prop_get_value_fn): Likewise.
(subsitute_and_fold): Remove prototype.
(replace_uses_in): Likewise.
* tree-ssa-propagate.c (substitute_and_fold_engine::replace_uses_in):
Renamed from replace_uses_in. Call the virtual member function
(substitute_and_fold_engine::replace_phi_args_in): Similarly.
(substitute_and_fold_dom_walker): Remove initialization of
data member entries for calbacks. Add substitute_and_fold_engine
member and initialize it.
(substitute_and_fold_dom_walker::before_dom_children0: Use the
member functions for get_value, replace_phi_args_in c
replace_uses_in, and fold_stmt calls.
(substitute_and_fold_engine::substitute_and_fold): Renamed from
substitute_and_fold. Remove assert. Update ctor call.
* tree-ssa-propagate.h (ssa_prop_visit_stmt_fn): Remove typedef.
(ssa_prop_visit_phi_fn): Likewise.
(class ssa_propagation_engine): New class to provide an interface
into ssa_propagate.
* tree-ssa-propagate.c (ssa_prop_visit_stmt): Remove file scoped
variable.
(ssa_prop_visit_phi): Likewise.
(ssa_propagation_engine::simulate_stmt): Moved into class.
Call visit_phi/visit_stmt from the class rather than via
file scoped static variables.
(ssa_propagation_engine::simulate_block): Moved into class.
(ssa_propagation_engine::process_ssa_edge_worklist): Similarly.
(ssa_propagation_engine::ssa_propagate): Similarly. No longer
set file scoped statics for the visit_stmt/visit_phi callbacks.
* tree-complex.c (complex_propagate): New class derived from
ssa_propagation_engine.
(complex_propagate::visit_stmt): Renamed from complex_visit_stmt.
(complex_propagate::visit_phi): Renamed from complex_visit_phi.
(tree_lower_complex): Call ssa_propagate via the complex_propagate
class.
* tree-ssa-ccp.c: (ccp_propagate): New class derived from
ssa_propagation_engine.
(ccp_propagate::visit_phi): Renamed from ccp_visit_phi_node.
(ccp_propagate::visit_stmt): Renamed from ccp_visit_stmt.
(do_ssa_ccp): Call ssa_propagate from the ccp_propagate class.
* tree-ssa-copy.c (copy_prop): New class derived from
ssa_propagation_engine.
(copy_prop::visit_stmt): Renamed from copy_prop_visit_stmt.
(copy_prop::visit_phi): Renamed from copy_prop_visit_phi_node.
(execute_copy_prop): Call ssa_propagate from the copy_prop class.
* tree-vrp.c (vrp_prop): New class derived from ssa_propagation_engine.
(vrp_prop::visit_stmt): Renamed from vrp_visit_stmt.
(vrp_prop::visit_phi): Renamed from vrp_visit_phi_node.
(execute_vrp): Call ssa_propagate from the vrp_prop class.
Jakub Jelinek [Wed, 1 Nov 2017 21:52:21 +0000 (22:52 +0100)]
re PR rtl-optimization/82778 (crash: insn does not satisfy its constraints)
PR rtl-optimization/82778
PR rtl-optimization/82597
* compare-elim.c (struct comparison): Add in_a_setter field.
(find_comparison_dom_walker::before_dom_children): Remove killed
bitmap and df_simulate_find_defs call, instead walk the defs.
Compute last_setter and initialize in_a_setter. Merge definitions
with first initialization for a few variables.
(try_validate_parallel): Use insn_invalid_p instead of
recog_memoized. Return insn rather than just the pattern.
(try_merge_compare): Fix up comment. Don't uselessly test if
in_a is a REG_P. Use cmp->in_a_setter instead of walking UD
chains.
(execute_compare_elim_after_reload): Remove df_chain_add_problem
call.
* g++.dg/opt/pr82778.C: New test.
2017-11-01 Michael Collison <michael.collison@arm.com>
PR rtl-optimization/82597
* gcc.dg/pr82597.c: New test.
aarch64_rtx_costs uses the number of registers in a mode as the basis
of SET costs. This patch makes it get the number of registers from
aarch64_hard_regno_nregs rather than repeating the calcalation inline.
Handling SVE modes in aarch64_hard_regno_nregs is then enough to get
the correct SET cost as well.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_rtx_costs): Use
aarch64_hard_regno_nregs to get the number of registers
in a mode.
Reviewed-By: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254327
The SVE port uses the public constraints "Upl" and "Upa" to mean
"low predicate register" and "any predicate register" respectively.
"Upl" was already used as an internal-only constraint by the
addition patterns, so this patch renames it to "Uaa" ("two adds
needed").
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
Reviewed-By: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254326
This patch simply moves code around, in order to make the later
patches easier to read, and to avoid forward declarations.
It doesn't add the missing function comments because the interfaces
will change in a later patch.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254325
[AArch64] Generate permute patterns using rtx builders
This patch replaces switch statements that call specific generator
functions with code that constructs the rtl pattern directly.
This seemed to scale better to SVE and also seems less error-prone.
As a side-effect, the patch fixes the REV handling for diff==1,
vmode==E_V4HFmode and adds missing support for diff==3,
vmode==E_V4HFmode.
To compensate for the lack of switches that check for specific modes,
the patch makes aarch64_expand_vec_perm_const_1 reject permutes on
single-element vectors (specifically V1DImode).
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp)
(aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev)
(aarch64_evpc_dup): Generate rtl direcly, rather than using
named expanders.
(aarch64_expand_vec_perm_const_1): Explicitly check for permutes
of a single element.
* config/aarch64/iterators.md: Add a comment above the permute
unspecs to say that they are generated directly by
aarch64_expand_vec_perm_const.
* config/aarch64/aarch64-simd.md: Likewise the permute instructions.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254324
François Dumont [Wed, 1 Nov 2017 17:52:05 +0000 (17:52 +0000)]
printers.py (StdExpAnyPrinter.__init__): Strip typename versioned namespace before the substitution.
2017-11-01 François Dumont <fdumont@gcc.gnu.org>
* python/libstdcxx/v6/printers.py (StdExpAnyPrinter.__init__): Strip
typename versioned namespace before the substitution.
(StdExpOptionalPrinter.__init__): Likewise.
(StdVariantPrinter.__init__): Likewise.
(Printer.add_version): Inject versioned namespace after std or
__gnu_cxx.
(build_libstdcxx_dictionary): Adapt add_version usages, always pass
namespace first and symbol second.
combine: Fix bug in giving up placing REG_DEAD notes (PR82683)
When we have a REG_DEAD note for a reg that is set in the new I2, we
drop the note on the floor (we cannot find whether to place it on I2
or on I3). But the code I added to do this has a bug and does not
always actually drop it. This patch fixes it.
But that on its own is too pessimistic, it turns out, and we generate
worse code. One case where we do know where to place the note is if
it came from I3 (it should go to I3 again). Doing this fixes all of
the regressions.
PR rtl-optimization/64682
PR rtl-optimization/69567
PR rtl-optimization/69737
PR rtl-optimization/82683
* combine.c (distribute_notes) <REG_DEAD>: If the new I2 sets the same
register mentioned in the note, drop the note, unless it came from I3,
in which case it should go to I3 again.
This patch moves the check for an overlapping byte to normalize_ref
from its callers, so that it's easier to convert to poly_ints later.
It's not really worth it on its own.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-dse.c (normalize_ref): Check whether the ranges overlap
and return false if not.
(clear_bytes_written_by, live_bytes_read): Update accordingly.
Most GCC ranges seem to be represented as an offset and a size (rather
than a start and inclusive end or start and exclusive end). The usual
test for whether X is in a range is of course:
x >= start && x < start + size
or:
x >= start && x - start < size
which means that an empty range of size 0 contains nothing. But other
range tests aren't as obvious.
The usual test for whether one range is contained within another
range is:
i.e. the ranges overlap if one range contains the start of the other
range. This leads to strange results like:
(start X, size 0) is a subrange of (start X, size 0) but
(start X, size 0) does not overlap (start X, size 0)
Similarly:
(start 4, size 0) is a subrange of (start 2, size 2) but
(start 4, size 0) does not overlap (start 2, size 2)
It seems like "X is a subrange of Y" should imply "X overlaps Y".
This becomes harder to ignore with the runtime sizes and offsets
added for SVE. The most obvious fix seemed to be to say that
an empty range does not overlap anything, and is therefore not
a subrange of anything.
Using the new definition of subranges didn't seem to cause any
codegen differences in the testsuite. But there was one change
with the new definition of overlapping ranges. strncpy-chk.c has:
The strncpy is detected as a zero-size write, and so with the new
definition of overlapping ranges, we treat the strncpy as having
no effect on the strcmp (which is true). The reaching definition
is the memset instead.
This patch makes ranges_overlap_p return false for zero-sized
ranges, even if the other range has an unknown size.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-alias.h (ranges_overlap_p): Return false if either
range is known to be empty.
in simplify-rtx.c. If we're dealing with CONST_VECTORs, it's better
to use CONST_VECTOR_NUNITS, since that remains constant even after the
SVE patches. In other cases we can get the number from GET_MODE_NUNITS.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
and CONST_VECTOR_NUNITS instead of computing the number of units from
the byte sizes of the vector and element.
(simplify_binary_operation_1): Likewise.
(simplify_const_binary_operation): Likewise.
(simplify_ternary_operation): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254311