jakub [Tue, 30 Jan 2018 20:03:04 +0000 (20:03 +0000)]
PR rtl-optimization/83986
* sched-deps.c (sched_analyze_insn): For frame related insns, add anti
dependence against last_pending_memory_flush in addition to
pending_jump_insns.
aoliva [Tue, 30 Jan 2018 17:40:50 +0000 (17:40 +0000)]
[PR81611] accept copies in simple_iv_increment_p
If there are copies between the GIMPLE_PHI at the loop body and the
increment that reaches it (presumably through a back edge), still
regard it as a simple_iv_increment, so that we won't consider the
value in the back edge eligible for forwprop. Doing so would risk
making both the phi node and the conflicting incremented value live
within the loop, and would force the phi node to be preserved for
propagated uses after the loop.
jakub [Tue, 30 Jan 2018 15:58:22 +0000 (15:58 +0000)]
PR tree-optimization/84111
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip
inner loops added during recursion, as they don't have up-to-date
SSA form.
hubicka [Tue, 30 Jan 2018 13:17:40 +0000 (13:17 +0000)]
PR lto/83954
* lto-symtab.c (warn_type_compatibility_p): Silence false positive
for type match warning on arrays of pointers.
* gcc.dg/lto/pr83954.h: New testcase.
* gcc.dg/lto/pr83954_0.c: New testcase.
* gcc.dg/lto/pr83954_1.c: New testcase.
rguenth [Tue, 30 Jan 2018 11:19:47 +0000 (11:19 +0000)]
2018-01-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/83008
* tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost
invariant and constant vector uses in stmts when they need
more than one stmt.
rsandifo [Tue, 30 Jan 2018 09:48:24 +0000 (09:48 +0000)]
[AArch64] Fix sve/extract_[12].c for big-endian SVE
sve/extract_[12].c were relying on the target-independent optimisation
that removes a redundant vec_select, so that we don't end up with
things like:
dup v0.4s, v0.4s[0]
...use s0...
But that optimisation rightly doesn't trigger for big-endian targets,
because GCC expects lane 0 to be in the high part of the register
rather than the low part.
SVE breaks this assumption -- see the comment at the head of
aarch64-sve.md for details -- so the optimisation is valid for
both endiannesses. Long term, we probably need some kind of target
hook to make GCC aware of this.
But there's another problem with the current extract pattern: it doesn't
tell the register allocator how cheap an extraction of lane 0 is with
tied registers. It seems better to split the lane 0 case out into
its own pattern and use tied operands for the FPR<-SIMD case,
so that using different registers has the cost of an extra reload.
I think we want this for both endiannesses, regardless of the hook
described above.
Also, the gen_lowpart in this pattern fails for aarch64_be due to
TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
instead. We're only creating this rtl in order to print it, so there's
no need for anything fancier.
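At the source level, a lane-0 extraction is simply element 0 of a vector. The sketch below uses GCC's generic vector extension. This is an assumption for illustration: the actual extract_[12].c tests target SVE and are not reproduced here, and `extract_lane0` is a hypothetical name.

```c
#include <assert.h>

/* GCC generic vector extension: four 32-bit ints in one 128-bit vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: extract element 0.  Subscripting follows array
   (memory) order on both endiannesses; which end of the register holds
   that element is a target detail the backend has to model.  */
static int extract_lane0(v4si x)
{
  return x[0];
}
```

Because `x[0]` means the same element on both endiannesses, the big-endian question above only concerns where the backend keeps that element inside the register.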
2018-01-30 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
pattern.
(*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
Use gen_rtx_REG rather than gen_lowpart.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@257178 138bc75d-0d04-0410-961f-82ee72b054a4
rsandifo [Tue, 30 Jan 2018 09:45:58 +0000 (09:45 +0000)]
Fix LRA subreg calculation for big-endian targets
LRA was using a subreg offset of 0 whenever constraints matched
two operands with different modes. That leads to an invalid offset
(and ICE) on big-endian targets if one of the modes is narrower
than a word. E.g. if a (reg:SI X) is matched to a (reg:QI Y),
the big-endian subreg should be (subreg:QI (reg:SI X) 3) rather
than (subreg:QI (reg:SI X) 0).
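The offset arithmetic can be sketched as follows. `lowpart_byte_offset` is a hypothetical helper, not GCC's `subreg_lowpart_offset`. It assumes byte and word endianness agree and ignores the multi-register case discussed below.

```c
#include <assert.h>

/* Hypothetical helper: byte offset of the lowpart of an OUTER_SIZE-byte
   value inside an INNER_SIZE-byte register.  Assumes byte and word
   endianness agree (a simplification of subreg_lowpart_offset).  */
static unsigned lowpart_byte_offset(unsigned outer_size, unsigned inner_size,
                                    int big_endian)
{
  assert(outer_size <= inner_size);
  /* Little-endian: the least significant byte comes first.
     Big-endian: it comes last, so skip the high-order bytes.  */
  return big_endian ? inner_size - outer_size : 0;
}
```

With QImode (1 byte) inside SImode (4 bytes), this gives 0 on little-endian and 3 on big-endian, which matches (subreg:QI (reg:SI X) 3).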
But this raises the issue of what the behaviour should be when the
matched operands occupy different numbers of registers. Should the
register numbers match, or should the locations of the lsbs match?
Although the documentation isn't clear, reload went for the second
interpretation (which seems the most natural to me):
/* On a REG_WORDS_BIG_ENDIAN machine, point to the last register of a
multiple hard register group of scalar integer registers, so that
for example (reg:DI 0) and (reg:SI 1) will be considered the same
register. */
So I think this means that we can/must use the lowpart offset
unconditionally, rather than trying to separate out the multi-register
case. This also matches the LRA handling of constant integers, which
already uses lowpart subregs.
The patch fixes gcc.target/aarch64/sve/extract_[34].c for aarch64_be.
2018-01-30 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* lra-constraints.c (match_reload): Use subreg_lowpart_offset
rather than 0 when creating partial subregs.
ktkachov [Tue, 30 Jan 2018 09:13:39 +0000 (09:13 +0000)]
[testsuite] XFAIL gcc.dg/tree-ssa/ssa-dom-cse-2.c on non-NEON arm targets
This test fails to optimise away the PLUS reduction in the loop on arm targets when vectorisation
is not enabled due to the absence of SIMD instructions.
From reading the logs and the PR I gather that the presence or absence of SIMD affects the passing of this test
on other targets as well, as evidenced by the long list of xfail targets.
This list looks quite unwieldy to me, but here is a patch adding non-NEON arm to that list.
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: XFAIL on !arm_neon arm targets.
law [Tue, 30 Jan 2018 05:30:40 +0000 (05:30 +0000)]
PR testsuite/81010
* gcc.target/powerpc/pr56605.c: Update various dg- directives to
better match other tests which require vsx. Verify the zero
extension is part of the test in the combiner dump.
ian [Tue, 30 Jan 2018 04:48:55 +0000 (04:48 +0000)]
internal/syscall/unix: add randomTrap for sh/shbe
CL 84555 added support for the SuperH architecture, but didn't add the
randomTrap definition to be used for the getrandom syscall on Linux.
Add it now.
meissner [Mon, 29 Jan 2018 22:30:34 +0000 (22:30 +0000)]
2018-01-29 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81550
* config/rs6000/rs6000.c (rs6000_setup_reg_addr_masks): If DFmode
and SFmode can go in Altivec registers (-mcpu=power7 for DFmode,
-mcpu=power8 for SFmode) don't set the PRE_INCDEC or PRE_MODIFY
flags. This restores the settings used before the 2017-07-24 change.
Turning off pre increment/decrement/modify allows IVOPTS to
optimize DF/SF loops where the index is an int.
ian [Mon, 29 Jan 2018 20:58:23 +0000 (20:58 +0000)]
compiler: don't insert write barriers if we've seen errors
The compiler skips the escape analysis pass if it has seen any errors.
The write barrier pass, especially the check-escapes portion, relies
on escape analysis running. So don't run this pass if there have been
any errors, as it may cause further unreliable error reports.
rguenth [Mon, 29 Jan 2018 15:22:55 +0000 (15:22 +0000)]
2018-01-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/84086
* tree-ssanames.c: Include cfgloop.h and tree-scalar-evolution.h.
(flush_ssaname_freelist): When SSA names were released, reset
the SCEV hash table.
redi [Mon, 29 Jan 2018 12:33:32 +0000 (12:33 +0000)]
PR libstdc++/83658 fix exception-safety in std::any::emplace
PR libstdc++/83658
* include/std/any (any::__do_emplace): Only set _M_manager after
constructing the contained object.
* testsuite/20_util/any/misc/any_cast_neg.cc: Adjust dg-error line.
* testsuite/20_util/any/modifiers/83658.cc: New test.
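The ordering rule in this fix ("only set _M_manager after constructing the contained object") can be illustrated with a C analogue. C has no exceptions, so a failing constructor is modelled here by a `false` return. The `holder` type and all function names are hypothetical and are not libstdc++ code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type-erased holder loosely modelled on std::any: a NULL
   manager means "empty"; otherwise the manager destroys the object.  */
typedef void (*manager_fn)(void *storage);

struct holder
{
  manager_fn manager;
  unsigned char storage[16];
};

static void int_manager(void *storage) { (void) storage; /* trivial dtor */ }

/* Hypothetical constructor that can fail, standing in for a throwing one. */
static bool construct_int(void *storage, int value)
{
  if (value < 0)
    return false;
  memcpy(storage, &value, sizeof value);
  return true;
}

/* Construct first, publish the manager last.  */
static bool holder_emplace_int(struct holder *h, int value)
{
  if (h->manager != NULL)
    h->manager(h->storage);    /* destroy the previous object */
  h->manager = NULL;           /* holder is now empty */
  if (!construct_int(h->storage, value))
    return false;              /* still empty, never half-initialised */
  h->manager = int_manager;
  return true;
}
```

If the manager were set before construction, a failure would leave a holder that claims to own an object that was never built.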
jakub [Sat, 27 Jan 2018 06:27:47 +0000 (06:27 +0000)]
* c-cppbuiltin.c (c_cpp_builtins): Use ggc_strdup for the fp_suffix
argument.
(LAZY_HEX_FP_VALUES_CNT): Define.
(lazy_hex_fp_values): Allow up to LAZY_HEX_FP_VALUES_CNT lazy hex fp
values rather than just 12.
(builtin_define_with_hex_fp_value): Likewise.
* include/cpplib.h (enum cpp_builtin_type): Change BT_LAST_USER from
BT_FIRST_USER + 31 to BT_FIRST_USER + 63.
segher [Fri, 26 Jan 2018 21:08:47 +0000 (21:08 +0000)]
rs6000: Fix safe-indirect-jump-[18].c
This patch merges the safe-indirect-jump-1.c and -8.c testcases,
since they do the same thing. On the 64-bit and AIX ABIs the indirect
call is not a sibcall, since there is code generated after the call
(the restore of r2). On the 32-bit non-AIX ABIs it is a sibcall.
* gcc.target/powerpc/safe-indirect-jump-1.c: Build on all targets.
Make expected output depend on whether we expect sibcalls or not.
* gcc.target/powerpc/safe-indirect-jump-8.c: Delete (merged into
safe-indirect-jump-1.c).
kargl [Fri, 26 Jan 2018 19:33:16 +0000 (19:33 +0000)]
2018-01-26 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/83998
* simplify.c (compute_dot_product): Initialize result to INTEGER(1) 0
or .false. The summation does the correct type conversion.
(gfc_simplify_dot_product): Special case zero-sized arrays.
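The zero-sized case follows from the Fortran semantics of DOT_PRODUCT. A numeric result is the sum of products starting from zero. A LOGICAL result is ANY(VECTOR_A .AND. VECTOR_B), which starts from .false.. The C sketch below only restates those semantics and is not gfortran's simplify.c code. The `long` accumulator stands in for the type conversion the summation performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Numeric DOT_PRODUCT: starts from zero, so a zero-sized array yields 0. */
static long dot_product_int(const int *a, const int *b, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (long) a[i] * b[i];
  return sum;
}

/* LOGICAL DOT_PRODUCT, i.e. ANY(a .AND. b): starts from .false., so a
   zero-sized array yields .false.  */
static bool dot_product_logical(const bool *a, const bool *b, size_t n)
{
  bool result = false;
  for (size_t i = 0; i < n; i++)
    result = result || (a[i] && b[i]);
  return result;
}
```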
This patch fixes the testsuite failures gcc.target/aarch64/subs_compare_1.c and subs_compare_2.c
The tests check that we combine a sequence like:
sub w2, w0, w1
cmp w0, w1
into
subs w2, w0, w1
This is done by a couple of peepholes in aarch64.md.
Unfortunately due to scheduling and other optimisations the SUB and CMP
can come in a different order:
cmp w0, w1
sub w0, w0, w1
And the existing peepholes cannot catch that and we fail to combine the two.
This patch adds a peephole that matches the CMP as the first insn and the SUB as the second
and outputs a SUBS. This is almost equivalent to the existing peephole that matches SUB first and CMP second
except that it doesn't have the restriction that the output register of the SUB must not be one of the input registers.
Remember that "sub w0, w0, w1 ; cmp w0, w1" is *not* equivalent to "subs w0, w0, w1", because the CMP then compares
the already-updated w0, but "cmp w0, w1 ; sub w0, w0, w1" is, because the CMP sees the original operands.
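The equivalence argument can be checked with a small C model of the three sequences. The model is a simplification that tracks only w0 and the Z flag, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: only the result in w0 and the Z flag are tracked. */
struct cpu { uint32_t w0; bool z; };

/* "subs w0, w0, w1": flags come from the original w0 - w1.  */
static struct cpu run_subs(uint32_t w0, uint32_t w1)
{
  struct cpu s = { w0 - w1, (w0 - w1) == 0 };
  return s;
}

/* "cmp w0, w1 ; sub w0, w0, w1": the CMP sees the original operands.  */
static struct cpu run_cmp_then_sub(uint32_t w0, uint32_t w1)
{
  struct cpu s;
  s.z = (w0 - w1) == 0;   /* cmp: set flags, discard the difference */
  s.w0 = w0 - w1;         /* sub: write w0, flags untouched */
  return s;
}

/* "sub w0, w0, w1 ; cmp w0, w1": the CMP sees the already-updated w0. */
static struct cpu run_sub_then_cmp(uint32_t w0, uint32_t w1)
{
  struct cpu s;
  s.w0 = w0 - w1;
  s.z = (s.w0 - w1) == 0;
  return s;
}
```

For w0 = w1 = 5, SUBS and CMP;SUB both set Z, but SUB;CMP compares 0 against 5 and clears it.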
So this is what this patch does. It adds a peephole for the case above and one for the SUB-immediate variant
(because the SUB-immediate is represented as PLUS-of-negated-immediate and thus has different RTL structure).
Bootstrapped and tested on aarch64-none-linux-gnu.
* config/aarch64/aarch64.md: Add peepholes for CMP + SUB -> SUBS
and CMP + SUB-immediate -> SUBS.
rguenth [Fri, 26 Jan 2018 14:50:25 +0000 (14:50 +0000)]
2018-01-26 Richard Biener <rguenther@suse.de>
PR rtl-optimization/84003
* dse.c (record_store): Only record redundant stores when
the earlier store aliases at least all accesses the later one does.
* g++.dg/torture/pr77745.C: Mark foo noinline to trigger
latent bug in DSE if NOINLINE is appropriately defined.
* g++.dg/torture/pr77745-2.C: New testcase including pr77745.C
and defining NOINLINE.
ktkachov [Fri, 26 Jan 2018 14:34:34 +0000 (14:34 +0000)]
[arm] XFAIL advsimd-intrinsics/vld1x2.c
This recently added test fails on arm. We haven't implemented these intrinsics for arm
(any volunteers?) so for now let's XFAIL these on that target.
Also, the float64 versions of these intrinsics are not supposed to be available on arm
so this patch slightly adjusts the test to not include them for aarch32.
In any case the entire test is XFAILed on arm, so this doesn't have any noticeable
effect.
The same number of tests still PASS on aarch64, while on arm they now appear as XFAIL
rather than FAIL.
* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Make float64
tests specific to aarch64. XFAIL test on arm.
* config/arc/arc-arch.h (arc_tune_attr): Add ARC_TUNE_CORE_3.
* config/arc/arc.c (arc_sched_issue_rate): Use ARC_TUNE_... .
(arc_init): Likewise.
(arc_override_options): Likewise.
(arc_file_start): Choose Tag_ARC_CPU_variation based on arc_tune
value.
(hwloop_fail): Use TARGET_DBNZ when we want to check for dbnz insn
support.
* config/arc/arc.h (TARGET_DBNZ): Define.
* config/arc/arc.md (attr tune): Add core_3, use ARC_TUNE_... to
properly set the tune attribute.
(dbnz): Use TARGET_DBNZ guard.
* config/arc/arc.opt (mtune): Add core3 option.
claziss [Fri, 26 Jan 2018 11:34:16 +0000 (11:34 +0000)]
[ARC] Rework delegitimate_address hook
The delegitimize-address hook is used to undo the obfuscating effect of PIC
addresses, returning the address in a form the compiler understands.
The old version of the hook was outdated and unable to recognize the
addresses currently generated by the ARC backend.
claziss [Fri, 26 Jan 2018 11:33:22 +0000 (11:33 +0000)]
[ARC] Add JLI support.
The ARCv2 ISA provides the JLI instruction, a two-byte instruction
that can be used to reduce code size in an application. To make use of it,
we provide two new function attributes, 'jli_always' and 'jli_fixed', which
force the compiler to call the indicated function using a jli_s
instruction. The compiler also generates the entries in the JLI table for
functions with the 'jli_always' attribute. In the case of 'jli_fixed',
the compiler assumes a fixed position of the function in the JLI
table. Thus, the user needs to provide an assembly file with the JLI table
for the final link. This is useful when we want to have one table in ROM
and a second table in RAM.
The jli instruction can also be forced without the need to annotate
the source code, via the '-mjli-always' option.
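A minimal usage sketch of the two attributes follows. `add_one` and `rom_checksum` are hypothetical functions, and the slot index 2 is arbitrary. The `__arc__` macro is assumed to be predefined on ARC targets and is used here so the file also builds elsewhere.

```c
#include <assert.h>

/* Use the ARC-only attributes when targeting ARC (__arc__ assumed to be
   the predefined macro); expand to nothing on other targets.  */
#if defined(__arc__)
#define JLI_ALWAYS __attribute__((jli_always))
#define JLI_FIXED(idx) __attribute__((jli_fixed(idx)))
#else
#define JLI_ALWAYS
#define JLI_FIXED(idx)
#endif

/* Hypothetical: called via jli_s; the compiler emits its JLI table entry. */
JLI_ALWAYS int add_one(int x);

/* Hypothetical ROM routine at fixed JLI slot 2; the table itself must come
   from a user-supplied assembly file at link time.  */
JLI_FIXED(2) int rom_checksum(int x);

int add_one(int x)
{
  return x + 1;
}
```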
gcc/
2018-01-26 Claudiu Zissulescu <claziss@synopsys.com>
John Eric Martin <John.Martin@emmicro-us.com>
sebastianperta [Fri, 26 Jan 2018 10:55:31 +0000 (10:55 +0000)]
2018-01-25 Sebastian Perta <sebastian.perta@renesas.com>
* config/rl78/rl78.c: If operand 2 is a constant, avoid adding 0
and use incw and decw where possible.
* testsuite/gcc.target/rl78/test_addsi3_internal.c: New file.
rguenth [Fri, 26 Jan 2018 10:30:36 +0000 (10:30 +0000)]
2018-01-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/81082
* fold-const.c (fold_plusminus_mult_expr): Do not perform the
association if it requires casting to unsigned.
* match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
from fold_plusminus_mult_expr to catch important cases late when
range info is available.
* gcc.dg/vect/pr81082.c: New testcase.
* gcc.dg/tree-ssa/loop-15.c: XFAIL the (int)((unsigned)n + -1U) * n + n
simplification to n * n.
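One way to see why the association is delicate: rewriting (A * C) - (B * C) as (A - B) * C can introduce a signed overflow (undefined behaviour) that the original expression did not have. Folding early therefore had to do the arithmetic in unsigned, whereas late range information can show the overflow is impossible. The sketch below checks both forms exactly in 64-bit arithmetic. The helper names are hypothetical and the code does not mirror match.pd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool fits_int32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

/* Does evaluating (a * c) - (b * c) in int32 overflow anywhere?  */
static bool original_overflows(int32_t a, int32_t b, int32_t c)
{
  int64_t ac = (int64_t) a * c, bc = (int64_t) b * c;
  return !fits_int32(ac) || !fits_int32(bc) || !fits_int32(ac - bc);
}

/* Does evaluating the factored form (a - b) * c in int32 overflow?  */
static bool factored_overflows(int32_t a, int32_t b, int32_t c)
{
  int64_t diff = (int64_t) a - b;
  return !fits_int32(diff) || !fits_int32(diff * c);
}
```

With A = INT32_MIN, B = 1, C = 0, the original is 0 - 0, but the factored form computes INT32_MIN - 1, which overflows.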
ian [Thu, 25 Jan 2018 23:10:35 +0000 (23:10 +0000)]
compiler: deref receiver types in mangled names
This was the original intent, as reflected in the long comment at the
start of names.cc, but I forgot to implement it.
Also, remove a leading ".0" from the final name. That could occur for
a method whose receiver type starts with 'u', as in that case we
prepend a space to the mangled name, to avoid confusion with the
Unicode mangling, and the space turns into ".0".
Also, if the Unicode encoding would cause the final name to start with
"..u" or "..U", add a leading underscore.
Patch gotest to not get fooled by some names.
The result of these changes is that all symbols start with a letter or
an underscore.
pault [Thu, 25 Jan 2018 19:09:40 +0000 (19:09 +0000)]
2018-01-25 Paul Thomas <pault@gcc.gnu.org>
PR fortran/37577
* array.c (gfc_match_array_ref): If standard earlier than F2008
it is an error if the reference dimension is greater than 7.
* libgfortran.h: Increase GFC_MAX_DIMENSIONS to 15. Change the
dtype masks and shifts accordingly.
* trans-array.c (gfc_conv_descriptor_dtype): Use the dtype
type node to check the field.
(gfc_conv_descriptor_dtype): Access the rank field of dtype.
(duplicate_allocatable_coarray): Access the rank field of the
dtype descriptor rather than the dtype itself.
* trans-expr.c (get_scalar_to_descriptor_type): Store the type
of 'scalar' on entry and use its TREE_TYPE if it is ARRAY_TYPE
(i.e. a character).
(gfc_conv_procedure_call): Pass TREE_OPERAND (tmp,0) to
get_scalar_to_descriptor_type if the actual expression is a
constant.
(gfc_trans_structure_assign): Assign the rank directly to the
dtype rank field.
* trans-intrinsic.c (gfc_conv_intrinsic_rank): Cast the result
to default integer kind.
(gfc_conv_intrinsic_sizeof): Obtain the element size from the
'elem_len' field of the dtype.
* trans-io.c (gfc_build_io_library_fndecls): Replace
gfc_int4_type_node with dtype_type_node where necessary.
(transfer_namelist_element): Use gfc_get_dtype_rank_type for
scalars.
* trans-types.c : Provide 'get_dtype_type_node' to access the
dtype_type_node and, if necessary, build it.
The maximum size of an array element is now determined by the
maximum value of size_t.
Update the description of the array descriptor, including the
typedef for the dtype_type.
(gfc_get_dtype_rank_type): Build a constructor for the dtype.
Distinguish RECORD_TYPEs that are BT_DERIVED or BT_CLASS.
(gfc_get_array_descriptor_base): Change the type of the dtype
field to dtype_type_node.
(gfc_get_array_descr_info): Get the offset to the rank field of
the dtype.
* trans-types.h : Add a prototype for 'get_dtype_type_node ()'.
* trans.h : Define the indices of the dtype fields.
2018-01-25 Paul Thomas <pault@gcc.gnu.org>
PR fortran/37577
* gfortran.dg/coarray_18.f90: Allow dimension 15 for F2008.
* gfortran.dg/coarray_lib_this_image_2.f90: Change 'array1' to
'array01' in the tree dump comparison.
* gfortran.dg/coarray_lib_token_4.f90: Likewise.
* gfortran.dg/inline_sum_1.f90: Similar - allow two digits.
* gfortran.dg/rank_1.f90: Allow dimension 15 for F2008.
2018-01-25 Paul Thomas <pault@gcc.gnu.org>
PR fortran/37577
* caf/single.c (_gfortran_caf_failed_images): Access the 'type'
and 'elem_len' fields of the dtype instead of the shifts.
(_gfortran_caf_stopped_images): Likewise.
* intrinsics/associated.c (associated): Compare the 'type' and
'elem_len' fields instead of the dtype.
* caf/date_and_time.c : Access the dtype fields rather than using
shifts and masks.
* io/transfer.c (transfer_array): Comment on item count.
(set_nml_var, st_set_nml_var): Change dtype type and use fields.
(st_set_nml_dtio_var): Likewise.
* libgfortran.h : Change definition of GFC_ARRAY_DESCRIPTOR and
add a typedef for the dtype_type. Change the GFC_DTYPE_* macros
to access the dtype fields.
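The rank limit follows from the old packed encoding. The sketch below assumes the old layout placed the rank in bits 0-2, the type in bits 3-5, and the element size above bit 6, so three rank bits cap the rank at 7. It also shows a struct modelled on the new dtype_type. All names and the exact field list are assumptions for illustration, not copied from libgfortran.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed old packing: rank in bits 0-2, type in bits 3-5, size above. */
#define OLD_RANK_MASK 0x07
#define OLD_TYPE_SHIFT 3
#define OLD_TYPE_MASK 0x38
#define OLD_SIZE_SHIFT 6

static size_t old_pack(size_t rank, size_t type, size_t elem_len)
{
  return (rank & OLD_RANK_MASK)
         | ((type << OLD_TYPE_SHIFT) & OLD_TYPE_MASK)
         | (elem_len << OLD_SIZE_SHIFT);
}

static size_t old_rank(size_t dtype) { return dtype & OLD_RANK_MASK; }
static size_t old_elem_len(size_t dtype) { return dtype >> OLD_SIZE_SHIFT; }

/* Illustrative new-style dtype: one member per field, so rank 15 fits and
   elem_len can use the full range of size_t.  */
typedef struct
{
  size_t elem_len;
  int version;
  signed char rank;
  signed char type;
  signed short attribute;
} sketch_dtype;
```

Under the packed scheme a rank of 15 is silently truncated to 7, whereas the struct field holds it directly.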
ian [Thu, 25 Jan 2018 18:14:04 +0000 (18:14 +0000)]
* elf.c (elf_open_debugfile_by_debuglink): Don't check CRC if the
desired CRC is zero.
(elf_add): Don't clear *found_sym and *found_dwarf if debuginfo.
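A sketch of the new rule: a desired CRC of zero means "don't check". It assumes the .gnu_debuglink checksum is the standard CRC-32 (reflected polynomial 0xedb88320, as in zlib). Both function names are hypothetical and this is not libbacktrace's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise standard CRC-32 (reflected polynomial 0xedb88320).  */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xffffffffu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
    }
  return ~crc;
}

/* Hypothetical check: a desired CRC of zero disables the comparison.  */
static bool debuglink_matches(uint32_t desired_crc,
                              const unsigned char *buf, size_t len)
{
  if (desired_crc == 0)
    return true;
  return crc32_ieee(buf, len) == desired_crc;
}
```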
ian [Thu, 25 Jan 2018 17:44:19 +0000 (17:44 +0000)]
runtime: fix lfstackUnpack on ia64
The top three region number bits must be masked out before
right-shifting the address bits into place, otherwise they will be
copied down into the lower always-zero address bits.
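The masking-order bug can be reproduced with a C model. This is a sketch under assumed parameters: the real code is Go in libgo, and the 55-bit address width, the layout, and all names below are illustrative. The region number stays in the top three bits, the remaining address bits are shifted up, and a counter fills the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: region in bits 61-63 (kept in place), address shifted
   up by 64 - ADDR_BITS, counter in the low CNT_BITS bits.  */
#define ADDR_BITS 55
#define CNT_BITS (64 - ADDR_BITS + 3)
#define REGION_MASK (UINT64_C(7) << 61)
#define LOW61_MASK (~REGION_MASK)
#define CNT_MASK ((UINT64_C(1) << CNT_BITS) - 1)

static uint64_t pack(uint64_t addr, uint64_t cnt)
{
  return ((addr << (64 - ADDR_BITS)) & LOW61_MASK)
         | (addr & REGION_MASK)
         | (cnt & CNT_MASK);
}

/* Buggy: shifting first drags the region bits down into bits 52-54.  */
static uint64_t unpack_buggy(uint64_t val)
{
  return ((val >> CNT_BITS << 3) & LOW61_MASK) | (val & REGION_MASK);
}

/* Fixed: mask out the region bits before shifting.  */
static uint64_t unpack_fixed(uint64_t val)
{
  return ((val & LOW61_MASK) >> CNT_BITS << 3) | (val & REGION_MASK);
}
```

For an address in region 5, the buggy unpack returns the address with bits 52 and 54 also set, while the fixed version round-trips exactly.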
hubicka [Thu, 25 Jan 2018 17:24:06 +0000 (17:24 +0000)]
PR middle-end/83055
* predict.c (drop_profile): Do not push/pop cfun; update also
node->count.
(handle_missing_profiles): Fix logic looking for zero profiles.