jcmvbkbc [Mon, 8 May 2017 23:53:14 +0000 (23:53 +0000)]
xtensa: add support for SSP
gcc/
2017-05-08 Max Filippov <jcmvbkbc@gmail.com>
* config/xtensa/xtensa-protos.h
(xtensa_initial_elimination_offset): New declaration.
* config/xtensa/xtensa.c (xtensa_initial_elimination_offset):
New function. Move its body from the INITIAL_ELIMINATION_OFFSET
macro definition, add case for FRAME_POINTER_REGNUM when
FRAME_GROWS_DOWNWARD.
* config/xtensa/xtensa.h (FRAME_GROWS_DOWNWARD): New macro
definition.
(INITIAL_ELIMINATION_OFFSET): Replace body with call to
xtensa_initial_elimination_offset.
tkoenig [Mon, 8 May 2017 18:22:44 +0000 (18:22 +0000)]
2017-05-08 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/79930
* frontend-passes.c (matmul_to_var_expr): New function,
add prototype.
(matmul_to_var_code): Likewise.
(optimize_namespace): Use them from gfc_code_walker.
2017-05-08 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/79930
* gfortran.dg/inline_transpose_1.f90: Add
-finline-matmul-limit=0 to options.
* gfortran.dg/matmul_5.f90: Likewise.
* gfortran.dg/vect/vect-8.f90: Likewise.
* gfortran.dg/inline_matmul_14.f90: New test.
* gfortran.dg/inline_matmul_15.f90: New test.
nathan [Mon, 8 May 2017 17:59:03 +0000 (17:59 +0000)]
* decl.c (builtin_function_1): Set DECL_ANTICIPATED before pushing.
(start_preparsed_function): Do decl pushing before setting
current_function_decl and announcing it.
rsandifo [Mon, 8 May 2017 16:18:49 +0000 (16:18 +0000)]
[AArch64] Tighten move constraints for symbolic operands
The movsi and movdi constraints allowed the source to be any
absolute symbolic expression ("S"). That's OK for operands that
have already been vetted by the aarch64_mov_operand predicate but
causes problems if the register allocator substitutes an equivalence
(the usual "the constraints can't accept more than the predicates"
restriction).
Although all other uses of "S" in the backend are redundant and could
in principle be removed, "S" itself is a publicly-documented constraint
and so we'd have to keep its definition. This patch therefore adds a
new "Usa" constraint for legitimate absolute address operands.
2017-05-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/constraints.md (Usa): New constraint.
* config/aarch64/aarch64.md (*movsi_aarch64, *movdi_aarch64): Use it.
thopre01 [Mon, 8 May 2017 14:35:56 +0000 (14:35 +0000)]
Define TM_MULTILIB_CONFIG for ARM multilib
TM_MULTILIB_CONFIG is not set in config.gcc when building with multilib
for arm targets, leading to config/arm/t-multilib not including any of
the files (t-aprofile and t-rmprofile) defining the architecture and
FPU to build multilib for. This patch fixes that by setting
TM_MULTILIB_CONFIG to with_multilib_list's value after it has been
checked. It also fixes a trailing whitespace issue.
2017-05-08 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
* config.gcc (arm*-*-*): Set TM_MULTILIB_CONFIG from
with_multilib_list after it has been checked.
rguenth [Mon, 8 May 2017 12:52:44 +0000 (12:52 +0000)]
2017-05-08 Richard Biener <rguenther@suse.de>
* tree-vrp.c (gimple_assign_nonzero_warnv_p): Rename to ...
(gimple_assign_nonzero): ... this and remove strict_overflow_p
argument.
(gimple_stmt_nonzero_warnv_p): Rename to ...
(gimple_stmt_nonzero_p): ... this and remove strict_overflow_p
argument.
(vrp_stmt_computes_nonzero): Remove strict_overflow_p argument.
(extract_range_basic): Adjust, do not disable propagation on
strict overflow sensitive simplification.
(vrp_visit_cond_stmt): Likewise.
law [Sun, 7 May 2017 15:10:55 +0000 (15:10 +0000)]
2017-05-07 Jeff Law <law@redhat.com>
Revert:
2017-05-06 Jeff Law <law@redhat.com>
PR tree-optimization/78496
* tree-vrp.c (simplify_assert_expr_using_ranges): Remove debugging
code.
PR tree-optimization/78496
* tree-vrp.c (simplify_assert_expr_using_ranges): New function.
(simplify_stmt_using_ranges): Call it.
(vrp_dom_walker::before_dom_children): Extract equivalences
from an ASSERT_EXPR with an equality comparison against a
constant.
Revert:
2017-05-06 Jeff Law <law@redhat.com>
PR tree-optimization/78496
* gcc.dg/tree-ssa/ssa-thread-16.c: New test.
* gcc.dg/tree-ssa/ssa-thread-17.c: New test.
law [Sat, 6 May 2017 15:03:40 +0000 (15:03 +0000)]
PR tree-optimization/78496
* tree-vrp.c (simplify_assert_expr_using_ranges): New function.
(simplify_stmt_using_ranges): Call it.
(vrp_dom_walker::before_dom_children): Extract equivalences
from an ASSERT_EXPR with an equality comparison against a
constant.
PR tree-optimization/78496
* gcc.dg/tree-ssa/ssa-thread-16.c: New test.
* gcc.dg/tree-ssa/ssa-thread-17.c: New test.
rsandifo [Sat, 6 May 2017 07:46:48 +0000 (07:46 +0000)]
Record equivalences for spill registers
If we decide to allocate a call-clobbered register R to a value that
is live across a call, LRA will create a new spill register TMPR,
insert:
TMPR <- R
before the call and
R <- TMPR
after it. But if we then failed to allocate a register to TMPR, we would
always spill it to the stack, even if R was known to be equivalent to
a constant or to some existing memory location. And on AArch64, we'd
always fail to allocate such a register for 128-bit Advanced SIMD modes,
since no registers of those modes are call-preserved.
This patch avoids the problem by copying the equivalence information
from the original pseudo to the spill register. It means that the
code for the testcase is as good with -O2 as it is with -O,
whereas previously the -O code was better.
[Based on the code ARM contributed in branches/ARM/sve-branch@247248]
2017-05-06 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* lra-constraints.c (lra_copy_reg_equiv): New function.
(split_reg): Use it to copy equivalence information from the
original register to the spill register.
gcc/testsuite/
* gcc.target/aarch64/spill_1.c: New test.
uros [Sat, 6 May 2017 07:01:51 +0000 (07:01 +0000)]
* config/i386/i386.c (ext_80387_constant_init): Do not explicitly
initialize to zero.
(init_regs): Remove declaration.
(function_arg_advance_32): Initialize error_p as boolean variable.
dmalcolm [Fri, 5 May 2017 21:03:07 +0000 (21:03 +0000)]
Get rid of macros for diagnostic_report_current_module
diagnostic.h has a couple of macros (diagnostic_last_module_changed
and diagnostic_set_last_module) which are only used within
diagnostic_report_current_module.
This patch eliminates the macros in favor of static functions within
diagnostic.c.
No functional change intended.
gcc/ChangeLog:
* diagnostic.c (last_module_changed_p): New function.
(set_last_module): New function.
(diagnostic_report_current_module): Convert macro usage to
the above functions.
* diagnostic.h (diagnostic_context::last_module): Strengthen
from const line_map * to const line_map_ordinary *.
(diagnostic_last_module_changed): Delete macro.
(diagnostic_set_last_module): Delete macro.
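The general refactoring pattern here is to replace function-like macros with static functions that the compiler can type-check. The sketch below shows that pattern. The struct layouts are simplified hypothetical stand-ins, not GCC's real diagnostic_context or line_map_ordinary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's types.  */
struct line_map_ordinary { int to_line; };
struct diag_context { const struct line_map_ordinary *last_module; };

/* Before: untyped macros, e.g.
     #define last_module_changed(CTX, MAP) ((CTX)->last_module != (MAP))
     #define set_last_module(CTX, MAP) ((CTX)->last_module = (MAP))  */

/* After: file-local functions with typed parameters.  */
static bool
last_module_changed_p (const struct diag_context *ctx,
                       const struct line_map_ordinary *map)
{
  return ctx->last_module != map;
}

static void
set_last_module (struct diag_context *ctx,
                 const struct line_map_ordinary *map)
{
  ctx->last_module = map;
}
```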
dmalcolm [Fri, 5 May 2017 20:56:36 +0000 (20:56 +0000)]
diagnostic.c: add print_option_information
This patch simplifies diagnostic_report_diagnostic by moving
option-printing to a new subroutine.
Doing so required a slight rewrite. In both the old and new
code, context->option_name returns a malloc-ed string.
The old behavior was to then use ACONCAT to manipulate the
format_spec, appending the option metadata.
ACONCAT calculates the buffer size, then uses alloca, and then copies the
data to the on-stack buffer.
Given the alloca, this needs rewriting when moving the printing to
a subroutine. In the new version, the metadata is simply printed
using pp_* calls (so it's hitting the obstack within the
pretty_printer).
This means we can get rid of the save/restore of format_spec: I don't
believe anything else in the code modifies it.
It also seems inherently simpler; it seems odd to me to be
appending metadata to the formatting string, rather than simply
printing the metadata after the formatted string is printed
(the old code also assumed that no option name contained a '%').
No functional change intended.
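The two approaches differ in how the option name reaches the output. In the old approach it became part of the format string. In the new one it is printed as data after the message. The sketch below uses plain vsnprintf/snprintf into a caller buffer as a stand-in for the pretty_printer pp_* calls. The function name format_with_option is hypothetical. Because the option name only ever reaches "%s", a '%' inside it is printed literally.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format the message, then append " [OPTION_NAME]" as plain data.  */
static char *
format_with_option (char *buf, size_t size, const char *option_name,
                    const char *fmt, ...)
{
  assert (size > 0);
  va_list ap;
  va_start (ap, fmt);
  vsnprintf (buf, size, fmt, ap);   /* the diagnostic text itself */
  va_end (ap);
  if (option_name)
    {
      size_t len = strlen (buf);    /* append after whatever fit */
      /* Never spliced into a format string, so '%' is harmless.  */
      snprintf (buf + len, size - len, " [%s]", option_name);
    }
  return buf;
}
```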
gcc/ChangeLog:
* diagnostic.c (diagnostic_report_diagnostic): Eliminate
save/restore of format_spec. Move option-printing code to...
(print_option_information): ...this new function, and
reimplement by simply printing to the pretty_printer,
rather than appending to the format string.
palmer [Fri, 5 May 2017 20:24:46 +0000 (20:24 +0000)]
RISC-V: Add -mstrict-align option
The RISC-V user ISA permits misaligned accesses, but they may trap
and be emulated. That emulation software needs to be compiled assuming
strict alignment.
Even when strict alignment is not required, set SLOW_UNALIGNED_ACCESS
based upon -mtune to avoid a performance pitfall.
gcc/ChangeLog:
2017-05-04 Andrew Waterman <andrew@sifive.com>
* config/riscv/riscv.opt (mstrict-align): New option.
* config/riscv/riscv.h (STRICT_ALIGNMENT): Use it. Update comment.
(SLOW_UNALIGNED_ACCESS): Define.
(riscv_slow_unaligned_access): Declare.
* config/riscv/riscv.c (riscv_tune_info): Add slow_unaligned_access
field.
(riscv_slow_unaligned_access): New variable.
(rocket_tune_info): Set slow_unaligned_access to true.
(optimize_size_tune_info): Set slow_unaligned_access to false.
(riscv_cpu_info_table): Add entry for optimize_size_tune_info.
(riscv_valid_lo_sum_p): Use TARGET_STRICT_ALIGN.
(riscv_option_override): Set riscv_slow_unaligned_access.
* doc/invoke.texi: Add -mstrict-align to RISC-V.
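Portable C code usually performs a possibly misaligned access through memcpy. With memcpy, the compiler rather than the source chooses the instruction sequence: a single load or store where misaligned accesses are acceptable, or smaller aligned accesses where strict alignment applies. The choice is expected to follow STRICT_ALIGNMENT and SLOW_UNALIGNED_ACCESS for the selected target and -mtune. The helpers below are hypothetical illustrations, not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned.  */
static uint32_t
load_u32 (const void *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Write a 32-bit value to an address that may not be 4-byte aligned.  */
static void
store_u32 (void *p, uint32_t v)
{
  memcpy (p, &v, sizeof v);
}
```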
meissner [Fri, 5 May 2017 20:21:15 +0000 (20:21 +0000)]
[gcc]
2017-05-05 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/79038
PR target/79202
PR target/79203
* config/rs6000/rs6000.md (u code attribute): Add FIX and
UNSIGNED_FIX.
(extendsi<mode>2): Add support for doing sign extension via
VUPKHSW and XXPERMDI if the value is in Altivec registers and we
don't have ISA 3.0 instructions.
(extendsi<mode>2 splitter): Likewise.
(fix_trunc<mode>si2): If we are at ISA 2.07 (VSX small integer),
generate the normal insns since SImode can now go in vector
registers. Disallow the special UNSPECs needed for previous
machines to hide SImode being used. Add new insns
fctiw{,w}_<mode>_smallint if SImode can go in vector registers.
(fix_trunc<mode>si2_stfiwx): Likewise.
(fix_trunc<mode>si2_internal): Likewise.
(fixuns_trunc<mode>si2): Likewise.
(fixuns_trunc<mode>si2_stfiwx): Likewise.
(fctiw<u>z_<mode>_smallint): Likewise.
(fctiw<u>z_<mode>_mem): New combiner pattern to prevent conversion
of floating point to 32-bit integer from doing a direct move to
the GPR registers to do a store.
(fctiwz_<mode>): Break long line.
[gcc/testsuite]
2017-05-05 Michael Meissner <meissner@linux.vnet.ibm.com>
sje [Fri, 5 May 2017 17:00:46 +0000 (17:00 +0000)]
2017-05-05 Steve Ellcey <sellcey@cavium.com>
* doc/invoke.texi (-fopt-info): Explicitly say order of options
included in -fopt-info does not matter.
* doc/optinfo.texi (-fopt-info): Fix description of default
behaviour. Explicitly say order of options included in -fopt-info
does not matter.
thopre01 [Fri, 5 May 2017 16:50:40 +0000 (16:50 +0000)]
[ARM] Allow combination of aprofile and rmprofile multilibs
2017-05-05 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
* config.gcc: Allow combinations of aprofile and rmprofile values for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of MULTILIB_*
variables. Remove setting of ISA and floating-point ABI in
MULTILIB_OPTIONS and MULTILIB_DIRNAMES. Set architecture and FPU in
MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS
and MULTILIB_DIRNAMES respectively. Add comment to introduce all
matches. Add architecture matches for marvell-pj4 and generic-armv7-a
CPU options.
* config/arm/t-rmprofile: Likewise except for the matches changes.
* doc/install.texi (--with-multilib-list): Document the combination of
aprofile and rmprofile values and warn about pitfalls in doing that.
wilco [Fri, 5 May 2017 16:18:17 +0000 (16:18 +0000)]
Float to int moves currently generate inefficient code due to
hacks used in the movsi and movdi patterns. The 'r = w' variant
uses '*' which tells the register allocator to ignore it.
As a result the float to int moves typically spill to the stack,
which is extremely inefficient.
gcc/
* config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.
(movdi_aarch64): Likewise.
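Reinterpreting a float's bits as an integer is a typical source of a float-to-int move. The value sits in an FP/SIMD register and must reach a general register, which is the "r = w" alternative discussed above. After this change one would expect an optimized AArch64 build to use a single register move such as "fmov w0, s0" rather than a round trip through the stack. The exact output depends on GCC version and options. float_bits is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Return the IEEE-754 bit pattern of F as a 32-bit integer.  */
static uint32_t
float_bits (float f)
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  return i;
}
```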
jakub [Fri, 5 May 2017 16:02:44 +0000 (16:02 +0000)]
PR tree-optimization/80632
* tree-switch-conversion.c (struct switch_conv_info): Add target_vop
field.
(build_arrays): Initialize it for virtual phis.
(fix_phi_nodes): Use it for virtual phis.
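tree-switch-conversion rewrites switches whose cases only select constants into a bounds check plus a load from a static array. The C-level sketch below shows that equivalence with hypothetical function names. It does not show the virtual-PHI (memory SSA) bookkeeping that this PR actually fixes.

```c
/* A switch that only selects constants.  */
static int
classify_switch (unsigned c)
{
  switch (c)
    {
    case 0: return 10;
    case 1: return 20;
    case 2: return 20;
    case 3: return 40;
    default: return -1;
    }
}

/* Roughly the shape after conversion: range check, then table lookup.  */
static int
classify_table (unsigned c)
{
  static const int table[] = { 10, 20, 20, 40 };
  return c < sizeof table / sizeof table[0] ? table[c] : -1;
}
```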
jakub [Fri, 5 May 2017 15:43:22 +0000 (15:43 +0000)]
PR tree-optimization/80558
* tree-vrp.c (extract_range_from_binary_expr_1): Optimize
[x, y] op z into [x op z, y op z] for op & or | if conditions
are met.
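The entry does not spell out the conditions, and the transformation is not valid unconditionally. For example, [1, 2] & 1 gives the values {1, 0}, but the endpoints yield the inverted "range" [1, 0]. The brute-force checker below tests the claim on concrete values. It is only a way to explore the idea and does not reproduce the conditions used in tree-vrp.c. The function name is hypothetical, and it assumes x <= y.

```c
#include <stdbool.h>

/* True if every v op z with v in [x, y] lies in [x op z, y op z].  */
static bool
endpoint_range_ok (unsigned x, unsigned y, unsigned z, bool use_or)
{
  unsigned lo = use_or ? (x | z) : (x & z);
  unsigned hi = use_or ? (y | z) : (y & z);
  if (lo > hi)
    return false;
  for (unsigned v = x; ; v++)
    {
      unsigned r = use_or ? (v | z) : (v & z);
      if (r < lo || r > hi)
        return false;
      if (v == y)   /* stop at y without overflowing past UINT_MAX */
        break;
    }
  return true;
}
```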
thopre01 [Fri, 5 May 2017 15:41:28 +0000 (15:41 +0000)]
[ARM] PR71607: Fix ICE when loading constant
2017-05-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Prakhar Bahuguna <prakhar.bahuguna@arm.com>
gcc/
PR target/71607
* config/arm/arm.md (use_literal_pool): Remove.
(64-bit immediate split): No longer takes cost into consideration
if arm_disable_literal_pool is enabled.
* config/arm/arm.c (arm_tls_referenced_p): Add diagnostic if TLS is
used when arm_disable_literal_pool is enabled.
(arm_max_const_double_inline_cost): Remove use of
arm_disable_literal_pool.
(push_minipool_fix): Add assert.
(arm_reorg): Add return if arm_disable_literal_pool is enabled.
* config/arm/vfp.md (no_literal_pool_df_immediate): New.
(no_literal_pool_sf_immediate): New.
2017-05-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Thomas Preud'homme <thomas.preudhomme@arm.com>
Prakhar Bahuguna <prakhar.bahuguna@arm.com>
wilco [Fri, 5 May 2017 09:40:01 +0000 (09:40 +0000)]
Code scheduling for Cortex-A53 isn't as good as it could be. It turns out
code runs faster overall if we place loads and stores with a dependency
closer together. To achieve this effect, this patch adds a bypass between
cortex_a53_load1 and cortex_a53_load*/cortex_a53_store* if the result of an
earlier load is used in an address calculation. This significantly improved
benchmark scores in a proprietary benchmark suite.
gcc/
* config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
New function.
(arm_early_store_addr_dep_ptr): Likewise.
* config/arm/aarch-common-protos.h
(arm_early_load_addr_dep_ptr): Add prototype.
(arm_early_store_addr_dep_ptr): Likewise.
* config/arm/cortex-a53.md: Add new bypasses.
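The dependency the new bypasses target is a load whose result feeds the address of a later load or store. Pointer chasing is the textbook case, shown below with hypothetical names. The scheduling effect itself is not visible at the C level, and nothing here reproduces the benchmark results mentioned above.

```c
#include <stddef.h>

struct node { struct node *next; int value; };

/* Each iteration loads n->next and immediately uses it as the base
   address of the next iteration's loads.  */
static int
sum_list (const struct node *n)
{
  int sum = 0;
  for (; n != NULL; n = n->next)
    sum += n->value;
  return sum;
}
```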
jakub [Fri, 5 May 2017 07:35:13 +0000 (07:35 +0000)]
* tree.c (next_type_uid): Change type to unsigned.
(type_hash_canon): Decrement back next_type_uid if
freeing a type node with the highest TYPE_UID. For INTEGER_TYPEs
also ggc_free TYPE_MIN_VALUE, TYPE_MAX_VALUE and TYPE_CACHED_VALUES
if possible.
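The uid reuse described for type_hash_canon follows a simple counter scheme. The sketch below models it with hypothetical names, not GCC code. Uids come from an unsigned counter, and freeing the node that holds the most recently issued uid steps the counter back so that uid is handed out again.

```c
#include <assert.h>

static unsigned next_uid = 1;

static unsigned
alloc_uid (void)
{
  return next_uid++;
}

/* Only the highest issued uid can be reclaimed; others are left alone.  */
static void
release_uid (unsigned uid)
{
  assert (uid < next_uid);
  if (uid + 1 == next_uid)
    next_uid--;
}
```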
wilco [Thu, 4 May 2017 17:52:03 +0000 (17:52 +0000)]
Many supported cores use the AUTOPREFETCHER_WEAK setting which tries
to order loads and stores to improve streaming performance. Since significant
gains were reported in http://patchwork.ozlabs.org/patch/534469/ it seems
like a good idea to enable this setting too for -mcpu=generic. Since the
weak model only keeps the order if it doesn't make the schedule worse, it
should not impact performance adversely on cores that don't show a gain.
wilco [Thu, 4 May 2017 17:49:19 +0000 (17:49 +0000)]
Set jump alignment to 4 for Cortex cores as it reduces codesize by 0.4% on
average with no obvious performance difference. See original discussion of
the overheads of various alignments:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html.
gcc/
* config/aarch64/aarch64.c (cortexa35_tunings): Set jump alignment to 4.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(cortexa73_tunings): Likewise.
wilco [Thu, 4 May 2017 17:43:43 +0000 (17:43 +0000)]
With -mcpu=generic the loop alignment is currently 4. All but one of the
supported cores use 8 or higher. Since using 8 provides performance gains
on several cores, it is best to use that by default. As discussed in [1],
the jump alignment has no effect on performance, yet has a relatively high
codesize cost [2], so setting it to 4 is best. This gives a 0.2% overall
codesize improvement as well as performance gains in several benchmarks.
gcc/
* config/aarch64/aarch64.c (generic_tunings): Set jump alignment to 4.
Set loop alignment to 8.
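The code-size side of the alignment argument comes down to padding arithmetic. Every A64 instruction is 4 bytes and code is 4-byte aligned, so an alignment of 4 never needs padding. An alignment of 8 can cost one 4-byte padding instruction per aligned label. The helper below computes the padding needed to reach a power-of-two boundary. It is a hypothetical illustration; the actual padding is emitted by the assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to advance ADDR to the next multiple of ALIGN
   (ALIGN must be a power of two).  */
static uint64_t
align_padding (uint64_t addr, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (align - (addr & (align - 1))) & (align - 1);
}
```

For example, a label at 0x1004 needs no padding for 4-byte alignment, 4 bytes for 8-byte alignment and 12 bytes for 16-byte alignment.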
wilco [Thu, 4 May 2017 17:05:28 +0000 (17:05 +0000)]
All cores which add a cpu_addrcost_table use a non-zero value for
HI and TI mode shifts (a non-zero value for general indexing also
applies to all shifts). Given this, it makes no sense to use a
different setting in generic_addrcost_table. So change it so that
all supported cores, including -mcpu=generic, now generate the same:
int f(short *p, short *q, long x) { return p[x] + q[x]; }
jamborm [Thu, 4 May 2017 16:19:20 +0000 (16:19 +0000)]
[PR 80622] Treat const pools as initialized in SRA
2017-05-04 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/80622
* tree-sra.c (comes_initialized_p): New function.
(build_accesses_from_assign): Only set write lazily when
comes_initialized_p is false.
(analyze_access_subtree): Use comes_initialized_p.
(propagate_subaccesses_across_link): Assert !comes_initialized_p
instead of testing for PARM_DECL.
ktkachov [Thu, 4 May 2017 16:14:37 +0000 (16:14 +0000)]
[AArch64] Accept more addressing modes for PRFM
* config/aarch64/aarch64.md (prefetch): Adjust predicate and
constraint on operand 0 to allow more general addressing modes.
Adjust output template.
* config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
New function.
* config/aarch64/aarch64-protos.h
(aarch64_address_valid_for_prefetch_p): Declare prototype.
* config/aarch64/constraints.md (Dp): New address constraint.
* config/aarch64/predicates.md (aarch64_prefetch_operand): New
predicate.
* gcc.target/aarch64/prfm_imm_offset_1.c: New test.
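Source-level prefetches reach the prefetch pattern through GCC's __builtin_prefetch (addr, rw, locality). The loop below passes a base-plus-offset address, the kind of operand that the relaxed predicate and constraint may now accept directly. Whether the offset actually folds into the PRFM instruction depends on the form of the address. PREFETCH_AHEAD and sum_with_prefetch are hypothetical, and the prefetch distance is arbitrary.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 16   /* hypothetical prefetch distance, in elements */

static long
sum_with_prefetch (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i + PREFETCH_AHEAD < n)
        __builtin_prefetch (&a[i + PREFETCH_AHEAD], 0, 3);  /* read, keep cached */
      sum += a[i];
    }
  return sum;
}
```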
hubicka [Thu, 4 May 2017 13:57:35 +0000 (13:57 +0000)]
* ipa-cp.c (perform_estimation_of_a_value): Drop base_time parameter;
update use of estimate_ipcp_clone_size_and_time.
(estimate_local_effects): Update use of
estimate_ipcp_clone_size_and_time and perform_estimation_of_a_value.
* ipa-inline.h (estimate_ipcp_clone_size_and_time): Update prototype.
* ipa-inline-analysis.c (estimate_ipcp_clone_size_and_time):
Return nonspecialized time.
rguenth [Thu, 4 May 2017 13:29:08 +0000 (13:29 +0000)]
2017-05-04 Richard Biener <rguenther@suse.de>
* tree-ssa-alias.c (get_continuation_for_phi): Improve looking
for the last VUSE whose def dominates the PHI. Directly call
maybe_skip_until.
(get_continuation_for_phi_1): Remove.
rsandifo [Thu, 4 May 2017 11:37:05 +0000 (11:37 +0000)]
Cap niter_for_unrolled_loop to upper bound
For the reasons explained in PR77536, niter_for_unrolled_loop assumes 5
iterations in the absence of profiling information, although it doesn't
increase beyond the estimate for the original loop. This left a hole in
which the new estimate could be less than the old one but still greater
than the limit imposed by CEIL (nb_iterations_upper_bound, unroll factor).
2017-05-04 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-loop-manip.c (niter_for_unrolled_loop): Add commentary
to explain the use of truncating division. Cap the number of
iterations to the maximum given by nb_iterations_upper_bound,
if defined.
gcc/testsuite/
* gcc.dg/vect/vect-profile-1.c: New test.
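The no-profile case described above can be modelled numerically. The sketch assumes the estimate starts at 5 and is capped by the original loop's estimate, as the text says, and adds the new cap CEIL (upper bound, unroll factor). Function and parameter names are hypothetical, and the profile-based path is not modelled.

```c
#include <assert.h>

static unsigned
ceil_div (unsigned a, unsigned b)
{
  assert (b != 0);
  return a / b + (a % b != 0);   /* avoids overflow of a + b - 1 */
}

static unsigned
unrolled_niter_estimate (unsigned orig_estimate, unsigned upper_bound,
                         unsigned factor)
{
  unsigned est = 5;                  /* default guess without profile data */
  if (est > orig_estimate)
    est = orig_estimate;             /* never above the original estimate */
  unsigned cap = ceil_div (upper_bound, factor);
  if (est > cap)
    est = cap;                       /* the cap added by this patch */
  return est;
}
```

For example, with an original estimate of 8, an upper bound of 10 and an unroll factor of 4, the old logic gave 5, which exceeds ceil (10 / 4) = 3. The capped result is 3.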
thopre01 [Thu, 4 May 2017 10:26:25 +0000 (10:26 +0000)]
[ARM] Enable Purecode for ARMv8-M Baseline
This patch adds support for purecode to ARMv8-M Baseline, in addition to
the existing support for ARMv7-M and ARMv8-M Mainline.
2017-05-04 Prakhar Bahuguna <prakhar.bahuguna@arm.com>
Andre Simoes Dias Vieira <andre.simoesdiasvieira@arm.com>
gcc/
* config/arm/arm.md (movsi): Change TARGET_32BIT to TARGET_HAVE_MOVT.
(movt splitter): Likewise.
* config/arm/arm.c (arm_option_check_internal): Change arm_arch_thumb2
to TARGET_HAVE_MOVT, and merge with -mslow-flash-data check.
(const_ok_for_arm): Change else to else if (TARGET_THUMB2) and add else
block for Thumb-1 with MOVT.
(thumb2_legitimate_address_p): Move code block ...
(can_avoid_literal_pool_for_label_p): ... into this new function.
(thumb1_legitimate_address_p): Add check for TARGET_HAVE_MOVT and
literal pool.
(thumb_legitimate_constant_p): Add conditional on TARGET_HAVE_MOVT.
* doc/invoke.texi (-mpure-code): Change "ARMv7-M targets" to
"M-profile targets with the MOVT instruction".
gcc/testsuite/
* gcc.target/arm/pure-code/pure-code.exp: Add conditional for
check_effective_target_arm_thumb1_movt_ok.
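MOVT matters for -mpure-code because pure code cannot load constants from a literal pool placed in the code section. A MOVW/MOVT pair instead builds any 32-bit constant from two 16-bit immediates. MOVW writes the low half and clears the top, and MOVT writes the top half while keeping the bottom. The functions below model those two instructions in C. They are a hypothetical illustration, not compiler code.

```c
#include <stdint.h>

/* MOVW Rd, #imm16: Rd = imm16 (upper half cleared).  */
static uint32_t
movw (uint16_t imm)
{
  return imm;
}

/* MOVT Rd, #imm16: replace the upper half, keep the lower half.  */
static uint32_t
movt (uint32_t reg, uint16_t imm)
{
  return (reg & 0xffffu) | ((uint32_t) imm << 16);
}
```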
thopre01 [Thu, 4 May 2017 10:16:04 +0000 (10:16 +0000)]
[ARM] Rename FPSCR builtins to correct names
The GCC documentation in section 6.60.8 ARM Floating Point Status and
Control Intrinsics states that the FPSCR register can be read and
written to using the intrinsics __builtin_arm_get_fpscr and
__builtin_arm_set_fpscr. However, these are misnamed within GCC itself
and these intrinsic names are not recognised.
This patch corrects the intrinsic names to match the documentation, and
adds tests to verify these intrinsics generate the correct
instructions.
gcc/
* config/arm/arm-builtins.c (arm_init_builtins): Rename
__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
__builtin_arm_stfscr to __builtin_arm_set_fpscr.
gcc/testsuite/
* gcc.target/arm/fpscr.c: New file.
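With the corrected names, code can use the documented builtins as written. The sketch below changes the rounding mode field, FPSCR.RMode in bits [23:22], with 0 = to nearest, 1 = towards +inf, 2 = towards -inf and 3 = towards zero. The builtin calls are guarded so they are only compiled for 32-bit ARM with an FPU. The helper names and macros are hypothetical.

```c
#include <stdint.h>

#define FPSCR_RMODE_SHIFT 22
#define FPSCR_RMODE_MASK (3u << FPSCR_RMODE_SHIFT)

/* Return FPSCR with its rounding-mode field replaced by MODE (0-3).  */
static uint32_t
fpscr_with_rmode (uint32_t fpscr, unsigned mode)
{
  return (fpscr & ~FPSCR_RMODE_MASK) | ((mode & 3u) << FPSCR_RMODE_SHIFT);
}

#if defined (__arm__) && defined (__ARM_FP)
/* Switch to round-towards-zero using the documented builtin names.  */
static void
set_round_towards_zero (void)
{
  unsigned int fpscr = __builtin_arm_get_fpscr ();
  __builtin_arm_set_fpscr (fpscr_with_rmode (fpscr, 3));
}
#endif
```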
rguenth [Thu, 4 May 2017 09:08:01 +0000 (09:08 +0000)]
2017-05-04 Richard Biener <rguenther@suse.de>
* tree.c (array_at_struct_end_p): Handle arrays at struct
end with flexarrays more conservatively. Refactor and treat
arrays of arrays or aggregates more strict. Fix
VIEW_CONVERT_EXPR handling. Remove allow_compref argument.
* tree.c (array_at_struct_end_p): Adjust prototype.
* emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust.
* gimple-fold.c (get_range_strlen): Likewise.
* tree-chkp.c (chkp_may_narrow_to_field): Likewise.
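Flexible array members are the most common example of a trailing array that array_at_struct_end_p has to recognise. Accesses to data[i] are valid as far as the allocation extends, not up to any declared bound. The sketch below builds such an object. The struct and function names are hypothetical, and the sketch does not describe how the refactored function treats each case.

```c
#include <stdlib.h>
#include <string.h>

struct buffer
{
  size_t len;
  char data[];   /* flexible array member: size set at allocation time */
};

/* Allocate a buffer holding a copy of S, including its terminator.  */
static struct buffer *
buffer_new (const char *s)
{
  size_t len = strlen (s);
  struct buffer *b = malloc (sizeof *b + len + 1);
  if (b == NULL)
    return NULL;
  b->len = len;
  memcpy (b->data, s, len + 1);
  return b;
}
```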