Jakub Jelinek [Sat, 19 Nov 2016 18:57:56 +0000 (19:57 +0100)]
i386.c (ix86_can_inline_p): Use || instead of & when checking if callee's isa flags are subset of caller's...
* config/i386/i386.c (ix86_can_inline_p): Use || instead of &
when checking if callee's isa flags are subset of caller's isa flags.
Fix comment wording.
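
The subset test spans two flag words, so the callee fails it as soon as either word carries a bit the caller lacks. A minimal sketch of that condition follows; the struct and function names are hypothetical and are not taken from i386.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a target's two ISA flag words.  */
struct isa_flags
{
  uint64_t isa;
  uint64_t isa2;
};

/* Return true if every ISA bit set in CALLEE is also set in CALLER.
   The callee is NOT a subset as soon as either word has an extra bit,
   so the two "not a subset" tests are combined with ||.  */
static bool
isa_subset_p (const struct isa_flags *caller, const struct isa_flags *callee)
{
  if ((caller->isa & callee->isa) != callee->isa
      || (caller->isa2 & callee->isa2) != callee->isa2)
    return false;
  return true;
}
```

Combining the two comparisons with & would reject the callee only when both words had extra bits, which is why the logical OR is the right operator here.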
Jakub Jelinek [Sat, 19 Nov 2016 18:56:47 +0000 (19:56 +0100)]
i386.c (def_builtin, [...]): Formatting fixes.
* config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2,
ix86_add_new_builtins): Formatting fixes.
(ix86_expand_builtin): Use || instead of && for isa vs. isa2.
(ix86_get_builtin): Likewise.
* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
don't initialize it, don't use it for the case where it isn't
provable %{z} nor using the same argument, instead move merge
argument into a new pseudo and use that as target. Formatting fixes.
Jakub Jelinek [Fri, 18 Nov 2016 22:21:31 +0000 (23:21 +0100)]
re PR middle-end/78419 (ICE with target_clone on invalid target)
PR middle-end/78419
* multiple_target.c (get_attr_len): Start with argnum and increment
argnum on every arg. Use strchr in a loop instead of counting commas
manually.
(get_attr_str): Increment argnum for every comma in the string.
(separate_attrs): Use for instead of while loop, simplify.
(expand_target_clones): Rename defenition argument to definition.
Free attrs and attr_str even when diagnosing errors. Temporarily
change input_location around targetm.target_option.valid_attribute_p
calls. Don't emit warning or errors if that function fails.
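
Counting commas with strchr in a loop, as the get_attr_len entry describes, might look roughly like the sketch below. The helper name is hypothetical and this is not the multiple_target.c code:

```c
#include <stddef.h>
#include <string.h>

/* Count the comma-separated pieces of STR, e.g. "avx,sse4.2,default"
   has 3.  Hypothetical helper for illustration; an empty string counts
   as one (empty) piece.  */
static size_t
count_comma_separated (const char *str)
{
  size_t count = 1;
  for (const char *p = strchr (str, ','); p != NULL; p = strchr (p + 1, ','))
    count++;
  return count;
}
```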
Jakub Jelinek [Fri, 18 Nov 2016 21:55:46 +0000 (22:55 +0100)]
re PR debug/78191 (ICE in calc_die_sizes)
* dwarf2out.c (size_of_discr_list): Fix typo in function comment.
PR debug/78191
* dwarf2out.c (abbrev_opt_base_type_end): New variable.
(die_abbrev_cmp): Sort dies with die_abbrev smaller than
abbrev_opt_base_type_end only by increasing die_abbrev, before
any other dies.
(optimize_abbrev_table): Don't change abbrev numbers of
base types and CU or optimize implicit consts in them if
calc_base_type_die_sizes has been called during build_abbrev_table.
(calc_base_type_die_sizes): If abbrev_opt_start, set
abbrev_opt_base_type_end to one plus largest base type's
die_abbrev.
Ian Lance Taylor [Fri, 18 Nov 2016 17:48:29 +0000 (17:48 +0000)]
runtime: move schedt type and sched var from C to Go
This doesn't change any actual code, it just starts using the Go
definition of the schedt type and the sched variable rather than the C
definitions.
The schedt type is tweaked slightly for gccgo. We aren't going to
release goroutine stacks, so we don't need separate gfreeStack and
gfreeNostack lists. We only have one size of defer function, so we
don't need a list of 5 different pools.
gcc/testsuite/
* gcc.target/arm/optional_thumb-1.c: New test.
* gcc.target/arm/optional_thumb-2.c: New test.
* gcc.target/arm/optional_thumb-3.c: New test.
Dominik Vogt [Fri, 18 Nov 2016 14:50:27 +0000 (14:50 +0000)]
S/390: Lower requirements for successful htm tests.
The attached patch makes the htm tests on s390 less sensitive to
spurious abort. Please check the commit comment for details. The
modified tests have been run once on a zEC12.
Dominik Vogt [Fri, 18 Nov 2016 14:28:49 +0000 (14:28 +0000)]
RS6000: Fix PR 77359: Properly align local variables in functions calling alloca.
gcc/ChangeLog:
2016-11-18 Dominik Vogt <vogt@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (rs6000_stack_info): PR/77359: Properly align
local variables in functions calling alloca. Also update the ASCII
drawings.
* config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
PR/77359: Likewise.
* config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
PR/77359: Copy AIX specific versions of the rs6000.h macros to aix.h.
This change makes the code less sensitive to the exact type of the mode,
i.e. it forces a conversion where necessary. This becomes important
when wrappers like scalar_int_mode and scalar_mode can also be used
instead of machine_mode.
Using rtx_mode_t also abstracts away the representation. The fact that
it's a std::pair rather than a custom class isn't important to users of
the interface.
gcc/
2016-11-18 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
This may no longer be necessary with the current version
of the SVE patches, but it does at least make things consistent
with the TYPE_MODE/SET_TYPE_MODE split.
gcc/ada/
2016-11-16 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* gcc-interface/utils.c (create_label_decl): Use SET_DECL_MODE.
gcc/c/
2016-11-16 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* c-decl.c (merge_decls): Use SET_DECL_MODE.
(make_label, finish_struct): Likewise.
gcc/cp/
2016-11-16 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
For code like the testcase in PR71785 GCC factors all the indirect branches
to a single dispatcher that then everything jumps to. This is because
having many indirect branches, each with many jump targets, does not scale
in large parts of the compiler. Very late in the pass pipeline (right
before peephole2) the indirect branches are then unfactored again, by
the duplicate_computed_gotos pass.
This pass works by replacing branches to such a common dispatcher by a
copy of the dispatcher. For code like this testcase this does not work
so well: most cases do a single addition instruction right before the
dispatcher, but not all, and we end up with only two indirect jumps: the
one without the addition, and the one with the addition in its own basic
block, and now everything else jumps _there_.
This patch rewrites the algorithm to deal with this. It also makes it
simpler: it does not need the "candidates" array anymore, it does not
need RTL layout mode, it does not need cleanup_cfg, and it does not
need to keep track of what blocks it already visited.
PR rtl-optimization/71785
* bb-reorder.c (maybe_duplicate_computed_goto): New function.
(duplicate_computed_gotos): New function.
(pass_duplicate_computed_gotos::execute): Rewrite.
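
For readers unfamiliar with computed gotos, the sketch below shows the general shape of code this pass targets: a small interpreter in which every handler ends in its own indirect branch. It relies on the GNU C "labels as values" extension, is not the PR71785 testcase, and its opcode set is invented for illustration:

```c
/* A tiny bytecode interpreter using the GNU C "labels as values"
   extension (supported by GCC and Clang).  Opcodes: 0 = halt,
   1 = increment, 2 = double.  Opcodes are not range-checked.  */
static int
run (const unsigned char *code)
{
  static void *const dispatch[] = { &&op_halt, &&op_inc, &&op_dbl };
  int acc = 0;

  goto *dispatch[*code++];

op_inc:
  acc += 1;
  goto *dispatch[*code++];

op_dbl:
  acc *= 2;
  goto *dispatch[*code++];

op_halt:
  return acc;
}
```

Each `goto *` in the source is a separate indirect branch; the compiler may factor them into one dispatcher internally and then duplicate that dispatcher again late in the pipeline.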
Ian Lance Taylor [Fri, 18 Nov 2016 04:05:10 +0000 (04:05 +0000)]
Update libgo/configure to restore it to the master version.
Update a few binary files that were changed in the master gc repo,
copied into the gofrontend repo, but not correctly copied into the GCC
repo. The changes are all minor and do not affect any actual tests.
Two instances of "http" changed to "https", and two timestamps were
zeroed out.
Ian Lance Taylor [Fri, 18 Nov 2016 00:15:38 +0000 (00:15 +0000)]
runtime, reflect: rewrite Go to FFI type conversion in Go
As we move toward the Go 1.7 garbage collector, it's essential that all
allocation of values that can contain Go pointers be done using the
correct type descriptor. That is simplest if we do all such allocation
in Go code. This rewrites the code that converts from a Go type to a
libffi CIF into Go.
Jason Merrill [Thu, 17 Nov 2016 22:40:28 +0000 (17:40 -0500)]
PR c++/78193 - inherited ctor regressions on sparc32.
* call.c (build_over_call): Don't set CALL_FROM_THUNK_P here.
(build_call_a): Set it here, and don't insert EMPTY_CLASS_EXPR.
(convert_like_real) [ck_rvalue]: Also pass non-addressable
types along directly.
Andrew Burgess [Thu, 17 Nov 2016 22:40:05 +0000 (22:40 +0000)]
arc/nps400: New peephole2 pattern to allow more cmem loads
In the case where we access a single bit from a value and use this in an
EQ/NE comparison, GCC will convert this into a sign-extend and GE/LT
comparison.
Normally this would be fine, however, if the value is in CMEM memory,
then we don't have a sign-extending load available (using the special
short CMEM load instructions), and instead we end up using a long form
load with LIMM, which is less efficient.
This peephole optimisation looks for the sign-extend followed by GE/LT
pattern and converts this back into a load and EQ/NE comparison.
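
The equivalence the peephole exploits can be shown in plain C, assuming the tested bit is the top bit of a byte. This is an illustration of the two forms only, not the RTL pattern from arc.md:

```c
#include <stdbool.h>
#include <stdint.h>

/* Test the top bit of a byte with a mask and an NE comparison.  */
static bool
top_bit_set_mask (uint8_t x)
{
  return (x & 0x80) != 0;
}

/* The same test expressed as a sign-extension followed by LT: once the
   byte is reinterpreted as signed, the top bit is the sign bit.  The
   conversion to int8_t is implementation-defined before C23; GCC and
   Clang wrap modulo 256.  */
static bool
top_bit_set_sign (uint8_t x)
{
  return (int8_t) x < 0;
}
```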
gcc/ChangeLog:
* config/arc/arc.md (cmem bit/sign-extend peephole2): New peephole
to make better use of cmem loads in the case where a single bit is
being accessed.
* config/arc/predicates.md (ge_lt_comparison_operator): New
predicate.
gcc/testsuite/ChangeLog:
* gcc.target/arc/cmem-bit-1.c: New file.
* gcc.target/arc/cmem-bit-2.c: New file.
* gcc.target/arc/cmem-bit-3.c: New file.
* gcc.target/arc/cmem-bit-4.c: New file.
Michael Meissner [Thu, 17 Nov 2016 21:42:13 +0000 (21:42 +0000)]
re PR target/78101 (PowerPC 64-bit little endian fusion failure with -O3 -mcpu=power9)
[gcc]
2016-11-17 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/78101
* config/rs6000/predicates.md (fusion_addis_mem_combo_load): Add
the appropriate checks for SFmode/DFmode load/stores in GPR
registers.
(fusion_addis_mem_combo_store): Likewise.
* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Rename
fusion_fpr_* to fusion_vsx_* and add in support for ISA 3.0 scalar
d-form instructions for traditional Altivec registers.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.
* config/rs6000/rs6000.md (p9 fusion store peephole2): Remove
early clobber from scratch register. Do not match if the register
being stored is the scratch register.
(fusion_vsx_<P:mode>_<FPR_FUSION:mode>_load): Rename fusion_fpr_*
to fusion_vsx_* and add in support for ISA 3.0 scalar d-form
instructions for traditional Altivec registers.
(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
(fusion_vsx_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
[gcc/testsuite]
2016-11-17 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/78101
* gcc.target/powerpc/fusion4.c: New test.
Fix PR77933: stack corruption on ARM when using high registers and LR
2016-11-17 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
being live in the function and lr needing to be saved. Distinguish
between already saved pushable registers and registers to push.
Check for LR being an available pushable register.
gcc/testsuite/
PR target/77933
* gcc.target/arm/pr77933-1.c: New test.
* gcc.target/arm/pr77933-2.c: Likewise.
* config/i386/i386.md (cmpstrnsi): New test to bail out if neither
string input is a string constant.
* builtins.c (expand_builtin_strncmp): Attempt expansion of strncmp
via cmpstrnsi even if neither string is constant.
Jakub Jelinek [Thu, 17 Nov 2016 17:09:13 +0000 (18:09 +0100)]
re PR middle-end/78201 (ICE in tree_to_shwi, at tree.h:4037 (seen both on ARM32 and AArch64))
PR middle-end/78201
* varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
Don't test decl != NULL. Don't look at DECL_SIZE, but DECL_SIZE_UNIT
instead, return false if it is NULL, or doesn't fit into uhwi, or
is larger or equal to targetm.max_anchor_offset.
Pip Cet [Thu, 17 Nov 2016 16:16:38 +0000 (16:16 +0000)]
re PR rtl-optimization/78355 (LRA generates unaligned accesses when SLOW_UNALIGNED_ACCESS is 1)
PR rtl-optimization/78355
* doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Document that the macro only
needs to deal with unaligned accesses.
* doc/tm.texi: Regenerate.
* lra-constraints.c (simplify_operand_subreg): Only invoke
SLOW_UNALIGNED_ACCESS on innermode if the MEM is not aligned enough.
Co-Authored-By: Eric Botcazou <ebotcazou@adacore.com>
From-SVN: r242554
David Malcolm [Thu, 17 Nov 2016 15:55:26 +0000 (15:55 +0000)]
Fix locations within raw strings
Whilst investigating PR preprocessor/78324 I noticed that the
substring location code currently doesn't handle raw strings
correctly: it fails to skip the 'R', opening quote, delimiter
and opening parenthesis.
For example, an attempt to underline chars 4-7 with caret at 6 of
this raw string yields this erroneous output:
__emit_string_literal_range (R"foo(0123456789)foo",
~~^~
With the patch, the correct range/caret is printed:
gcc/ChangeLog:
* input.c (selftest::test_lexer_string_locations_long_line): New
function.
(selftest::test_lexer_string_locations_raw_string_multiline): New
function.
(selftest::input_c_tests): Call the new functions, via
for_each_line_table_case.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-string-literals-1.c
(test_raw_string_one_liner): New function.
(test_raw_string_multiline): New function.
libcpp/ChangeLog:
* charset.c (cpp_interpret_string_1): Skip locations from
loc_reader when advancing 'p' when handling raw strings.
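
To make the skipped prefix concrete, the sketch below computes how many characters precede the content of a raw string literal. The helper is hypothetical and much simpler than cpp_interpret_string_1; in particular it ignores encoding prefixes such as u8R:

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first content character of the raw string
   literal LIT (e.g. R"foo(0123456789)foo"), i.e. the number of
   characters taken up by the 'R', the opening quote, the delimiter and
   the opening parenthesis.  Return 0 if LIT does not start like a raw
   string.  Hypothetical helper for illustration only.  */
static size_t
raw_string_content_offset (const char *lit)
{
  if (lit[0] != 'R' || lit[1] != '"')
    return 0;
  const char *paren = strchr (lit + 2, '(');
  if (paren == NULL)
    return 0;
  return (size_t) (paren - lit) + 1;
}
```

For the literal used in the example above, the content starts six characters in, so a location for "chars 4-7" of the string has to be shifted by that amount.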
* config/arc/dp-hack.h (ARC_OPTFPE): Define.
(__ARC_NORM__): Use ARC_OPTFPE instead.
* config/arc/fp-hack.h: Likewise.
* config/arc/lib1funcs.S (ARC_OPTFPE): Define.
(__ARC_MPY__): Use it instead of __ARC700__ and __HS__.
Richard Biener [Thu, 17 Nov 2016 12:38:47 +0000 (12:38 +0000)]
common.opt (ftree-loop-if-convert-stores): Mark as preserved for backward compatibility.
2016-11-17 Richard Biener <rguenther@suse.de>
* common.opt (ftree-loop-if-convert-stores): Mark as preserved for
backward compatibility.
* doc/invoke.texi (ftree-loop-if-convert-stores): Remove.
* tree-if-conv.c (pass_if_conversion::gate): Do not test
flag_tree_loop_if_convert_stores.
(pass_if_conversion::execute): Likewise.
Richard Biener [Thu, 17 Nov 2016 08:42:50 +0000 (08:42 +0000)]
re PR middle-end/78306 ([CilkPlus] "inlining failed in call to always_inline ‘memset’: function not inlinable" with -fcilkplus)
2016-11-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/78306
* ipa-inline-analysis.c (initialize_inline_failed): Do not
inhibit inlining if function calls cilk_spawn.
(can_inline_edge_p): Likewise.
Andrew Pinski [Thu, 17 Nov 2016 01:19:04 +0000 (17:19 -0800)]
aarch64.opt (mverbose-cost-dump): New option.
2016-11-16 Andrew Pinski <apinski@cavium.com>
* config/aarch64/aarch64.opt (mverbose-cost-dump): New option.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Use
flag_aarch64_verbose_cost instead of checking for details dump.
(aarch64_rtx_costs_wrapper): Likewise.
David Tolnay [Wed, 16 Nov 2016 23:09:27 +0000 (23:09 +0000)]
libiberty: Add Rust symbol demangling.
Adds Rust symbol demangler. Rust mangles symbols using GNU_V3 style,
adding a hash and various special character substitutions. This adds
a new rust style to cplus_demangle and adds 3 helper functions
rust_demangle, rust_demangle_sym and rust_is_mangled.
rust-demangle.c was written by David. Mark did the code formatting to
GNU style and integration into the gcc/libiberty build system and
testsuite.
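
As a rough illustration of the "GNU_V3 style plus a hash" shape, the sketch below checks whether a symbol ends in a hash component. It assumes the hash is encoded as "17h" followed by 16 hexadecimal digits just before the closing 'E'; that layout and the function name are assumptions made for this example, and this is not the rust_is_mangled implementation:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if SYM looks like an Itanium-style "_ZN...E" name whose
   last path component is "17h" followed by 16 hexadecimal digits.
   Illustrative only; the hash layout is an assumption.  */
static bool
looks_like_legacy_rust_symbol (const char *sym)
{
  const size_t hash_len = 3 + 16;   /* "17h" plus 16 hex digits.  */
  size_t len = strlen (sym);

  /* Need "_ZN", at least one character of path, the hash and 'E'.  */
  if (len < 3 + 1 + hash_len + 1)
    return false;
  if (strncmp (sym, "_ZN", 3) != 0 || sym[len - 1] != 'E')
    return false;

  const char *hash = sym + len - 1 - hash_len;
  if (strncmp (hash, "17h", 3) != 0)
    return false;
  for (size_t i = 3; i < hash_len; i++)
    if (!isxdigit ((unsigned char) hash[i]))
      return false;
  return true;
}
```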
include/ChangeLog:
2016-11-03 David Tolnay <dtolnay@gmail.com>
Mark Wielaard <mark@klomp.org>
* demangle.h (DMGL_RUST): New macro.
(DMGL_STYLE_MASK): Add DMGL_RUST.
(demangling_styles): Add dlang_rust.
(RUST_DEMANGLING_STYLE_STRING): New macro.
(RUST_DEMANGLING): New macro.
(rust_demangle): New prototype.
(rust_is_mangled): Likewise.
(rust_demangle_sym): Likewise.
libiberty/ChangeLog:
2016-11-03 David Tolnay <dtolnay@gmail.com>
Mark Wielaard <mark@klomp.org>
* Makefile.in (CFILES): Add rust-demangle.c.
(REQUIRED_OFILES): Add rust-demangle.o.
* cplus-dem.c (libiberty_demanglers): Add rust_demangling case.
(cplus_demangle): Handle RUST_DEMANGLING.
(rust_demangle): New function.
* rust-demangle.c: New file.
* testsuite/Makefile.in (really-check): Add check-rust-demangle.
(check-rust-demangle): New rule.
* testsuite/rust-demangle-expected: New file.
Co-Authored-By: Mark Wielaard <mark@klomp.org>
From-SVN: r242524
Bill Schmidt [Wed, 16 Nov 2016 22:17:10 +0000 (22:17 +0000)]
re PR tree-optimization/77848 (Gimple if-conversion results in redundant comparisons)
2016-11-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
PR tree-optimization/77848
* tree-if-conv.c (version_loop_for_if_conversion): When versioning
an outer loop, only save basic block aux information for the inner
loop.
(versionable_outer_loop_p): New function.
(tree_if_conversion): Version the outer loop instead of the inner
one if the pattern will be recognized for outer-loop
vectorization.
Andrew Burgess [Wed, 16 Nov 2016 22:10:52 +0000 (22:10 +0000)]
gcc: remove unneeded global related to hot/cold partitioning
The `user_defined_section_attribute' is used as part of the condition to
determine if GCC should partition blocks within a function into hot and
cold blocks. This global is initially false, and is set to true from
within the file parse phase of GCC, as part of the attribute handling
hook.
The `user_defined_section_attribute' is reset to false as part of the
final pass of GCC. However, the final pass is part of the optimisation
phase of the compiler, and so if at any point during the file parse
phase any function, or data, has a section attribute the global
`user_defined_section_attribute' will be set to true.
When GCC performs the block partitioning pass on the first function, if
`user_defined_section_attribute' is true then the function will not be
partitioned. Notice though, that due to the above, whether we partition
this first function or not has nothing to do with whether the function
has a section attribute, instead, if any function or data in the parsed
file has a section attribute then we don't partition the first
function.
After performing (or not) the block partitioning pass on the first
function we perform the final pass on the first function, at which point
we reset `user_defined_section_attribute' to false. As parsing is
complete by this point, we will never set
`user_defined_section_attribute' to true after that, and so all of the
following functions will have the partition blocks pass performed on
them, even if the function has a section attribute and so should not be
partitioned.
Luckily we don't end up partitioning functions that should not be
partitioned though. Due to the way that functions are selected during
the assembler writing phase, if a function has a section attribute this
takes priority over any hot/cold block partitioning that has been done.
What we see from the above then is that the
`user_defined_section_attribute' mechanism is broken. It was originally
created when GCC parsed, optimised, and generated assembler one function
at a time. Now that we deal with the whole file in one go, we need to
update the mechanism used to gate the block partitioning pass.
This patch does this by looking specifically for a section attribute on
the function DECL, which removes the need for a global variable, and
will work whether we parse the whole file in one go, or one function at
a time.
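
The difference between the two gating schemes can be modelled in a few lines. The sketch below is a simulation with hypothetical types and names, not GCC code: it parses a whole "file" first, then processes the functions one at a time, resetting the global after each one as the final pass did.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a function in a translation unit.  */
struct func
{
  bool has_section_attr;  /* Declared with a section attribute?  */
  bool old_gate;          /* Partitioning allowed by the old scheme.  */
  bool new_gate;          /* Partitioning allowed by the new scheme.  */
};

static void
compute_gates (struct func *funcs, size_t n)
{
  /* Parse phase: the old global is set if ANY function has the
     attribute.  */
  bool global_flag = false;
  for (size_t i = 0; i < n; i++)
    if (funcs[i].has_section_attr)
      global_flag = true;

  /* Optimisation phase, one function at a time.  */
  for (size_t i = 0; i < n; i++)
    {
      funcs[i].old_gate = !global_flag;
      funcs[i].new_gate = !funcs[i].has_section_attr;
      /* The final pass resets the global after each function.  */
      global_flag = false;
    }
}
```

With three functions of which only the second has a section attribute, the old scheme blocks the first function and allows the other two, while the per-function check blocks exactly the second.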
A few new tests have been added. These check for the case where a
function is not partitioned when it could be.
gcc/ChangeLog:
* gcc/bb-reorder.c: Remove 'toplev.h' include.
(pass_partition_blocks::gate): No longer check
user_defined_section_attribute, instead check the function decl
for a section attribute.
* gcc/c-family/c-attribs.c (handle_section_attribute): No longer
set user_defined_section_attribute.
* gcc/final.c (rest_of_handle_final): Likewise.
* gcc/toplev.c: Remove definition of user_defined_section_attribute.
* gcc/toplev.h: Remove declaration of
user_defined_section_attribute.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/section-attr-1.c: New file.
* gcc.dg/tree-prof/section-attr-2.c: New file.
* gcc.dg/tree-prof/section-attr-3.c: New file.
gcc/
* config/mips/mips.c (mips_output_jump): Output R_MICROMIPS_JALR
rather than R_MIPS_JALR relocation in microMIPS code. Do not
cancel short delay slots in PIC call relaxation.
gcc/testsuite/
* gcc.target/mips/call-1.c (dg-options): Add `-mno-micromips'.
(dg-final): Remove microMIPS JALRS mnemonic matching.
* gcc.target/mips/call-2.c (dg-options): Add `-mno-micromips'.
(dg-final): Remove microMIPS JALRS mnemonic matching.
* gcc.target/mips/call-3.c (dg-options): Add `-mno-micromips'.
(dg-final): Remove microMIPS JALRS mnemonic matching.
* gcc.target/mips/call-4.c (dg-options): Add `-mno-micromips'.
* gcc.target/mips/call-5.c (dg-options): Add `-mno-micromips'.
* gcc.target/mips/call-6.c (dg-options): Add `-mno-micromips'.
* gcc.target/mips/call-1u.c: New test case.
* gcc.target/mips/call-2u.c: New test case.
* gcc.target/mips/call-3u.c: New test case.
* gcc.target/mips/call-4u.c: New test case.
* gcc.target/mips/call-5u.c: New test case.
* gcc.target/mips/call-6u.c: New test case.
Wilco Dijkstra [Wed, 16 Nov 2016 18:10:34 +0000 (18:10 +0000)]
Looking at PR77308, one of the issues is that the bswap optimization phase doesn't work on ARM.
Looking at PR77308, one of the issues is that the bswap optimization
phase doesn't work on ARM. This is due to an odd check that uses
SLOW_UNALIGNED_ACCESS (which is always true on ARM). Since the testcase
in PR77308 generates much better code with this patch (~13% fewer
instructions), it seems best to remove this check.
gcc/
* tree-ssa-math-opts.c (bswap_replace): Remove test
of SLOW_UNALIGNED_ACCESS.
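
For context, the sketch below shows the kind of source pattern a byte-swap recognition pass looks for: a 32-bit value assembled byte by byte from memory that has no particular alignment. It is a generic example, not the PR77308 testcase:

```c
#include <stdint.h>

/* Assemble a 32-bit big-endian value from four bytes.  On a
   little-endian target this is equivalent to a 32-bit load followed
   by a byte swap.  */
static uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
         | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)
         | (uint32_t) p[3];
}
```

Because the pointer is only byte-aligned, a check that rejects the transformation whenever unaligned access is considered slow would prevent the replacement on such code.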