Wilco Dijkstra [Mon, 1 Aug 2016 16:37:24 +0000 (16:37 +0000)]
This patch optimizes the prolog and epilog code to reduce the number of instructions and avoid multiple writes to SP.
This patch optimizes the prolog and epilog code to reduce the number of
instructions and avoid multiple writes to SP. The key idea is that epilogs
are almost exact reverses of prologs, and thus all the decisions only need
to be taken once. The frame layout is decided in aarch64_layout_frame()
and decisions recorded in the new aarch64_frame fields initial_adjust,
callee_adjust, callee_offset and final_adjust.
The epilog reverses this, and may omit step 3 if alloca wasn't used.
gcc/
* config/aarch64/aarch64.h (aarch64_frame):
Remove padding0 and hardfp_offset. Add locals_offset,
initial_adjust, callee_adjust, callee_offset and final_adjust.
* config/aarch64/aarch64.c (aarch64_layout_frame):
Remove unused padding0 and hardfp_offset initializations.
Choose frame layout and set frame variables accordingly.
Use INVALID_REGNUM instead of FIRST_PSEUDO_REGISTER.
(aarch64_push_regs): Use INVALID_REGNUM, not FIRST_PSEUDO_REGISTER.
(aarch64_pop_regs): Likewise.
(aarch64_expand_prologue): Remove all decision code, just emit
prolog according to frame variables.
(aarch64_expand_epilogue): Remove all decision code, just emit
epilog according to frame variables.
(aarch64_initial_elimination_offset): Use offset to local/arg area.
testsuite/
* gcc.target/aarch64/test_frame_10.c: Fix test to check for a
single stack adjustment, no writeback.
* gcc.target/aarch64/test_frame_12.c: Likewise.
* gcc.target/aarch64/test_frame_13.c: Likewise.
* gcc.target/aarch64/test_frame_15.c: Likewise.
* gcc.target/aarch64/test_frame_6.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_8.c: Likewise.
* gcc.target/aarch64/test_frame_16.c: New test.
Jason Merrill [Mon, 1 Aug 2016 15:01:03 +0000 (11:01 -0400)]
PR c++/72766 - ICE with VLA
* constexpr.c (cxx_eval_pointer_plus_expression): Check constancy
of nelts.
* cp-gimplify.c (cp_fully_fold): Only maybe_constant_value in
C++11 and up.
H.J. Lu [Mon, 1 Aug 2016 14:46:01 +0000 (14:46 +0000)]
Convert V1TImode register to TImode in debug insn
TImode register referenced in debug insn can be converted to V1TImode by
scalar to vector optimization. When converting a TImode store to V1TImode,
we need to check all debug insns on its use chain to convert the V1TImode
register to SUBREG TImode if source register is undefined.
gcc/
PR target/72748
* config/i386/i386.c (timode_scalar_chain::convert_insn): Call
fix_debug_reg_uses after changing source register mode to
V1TImode if source register is undefined.
gcc/testsuite/
PR target/72748
* gcc.target/i386/pr72748.c: New test.
Jonathan Wakely [Mon, 1 Aug 2016 12:17:43 +0000 (13:17 +0100)]
Run std::ios_base enum tests for C++11 and up
* testsuite/27_io/ios_base/types/fmtflags/case_label.cc: Make test
supported for C++11 and later.
* testsuite/27_io/ios_base/types/iostate/case_label.cc: Likewise.
* testsuite/27_io/ios_base/types/openmode/case_label.cc: Likewise.
Kyrylo Tkachov [Mon, 1 Aug 2016 10:20:03 +0000 (10:20 +0000)]
[AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP
* config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
addresses.
* gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
Jonathan Wakely [Sun, 31 Jul 2016 18:46:30 +0000 (19:46 +0100)]
Fix non-portable std::regex test and test more cases
* testsuite/28_regex/basic_regex/ctors/basic/raw_string.cc: Fix
test to not rely on GNU extension (escaped normal characters in POSIX
BRE). Enable tests for other strings which are now supported.
Steven G. Kargl [Sun, 31 Jul 2016 01:51:37 +0000 (01:51 +0000)]
re PR fortran/41922 (Diagnostic: No location shown for overlappingly initialized EQUIVALENCEd character vars)
2016-07-30 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/41922
* target-memory.c (expr_to_char): Pass in locus and use it in error
messages.
(gfc_merge_initializers): Ditto.
* target-memory.h: Update prototype for gfc_merge_initializers ().
* trans-common.c (get_init_field): Use the correct locus.
2016-07-30 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/41922
* gfortran.dg/equiv_constraint_5.f90: Adjust the error message.
* gfortran.dg/equiv_constraint_7.f90: Ditto.
* gfortran.dg/pr41922.f90: New test.
Michael Meissner [Sat, 30 Jul 2016 22:31:16 +0000 (22:31 +0000)]
rs6000-protos.h (rs6000_adjust_vec_address): New function that takes a vector memory address...
[gcc]
2016-07-30 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000-protos.h (rs6000_adjust_vec_address): New
function that takes a vector memory address, a hard register, an
element number and a temporary base register, and recreates an
address that points to the appropriate element within the vector.
* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Likewise.
(rs6000_split_vec_extract_var): Add support for the target of a
vec_extract with variable element number being a scalar memory
location.
(rtx_is_swappable_p): VLSO insns (UNSPEC_VSX_VSLOW) are not
swappable.
* config/rs6000/vsx.md (vsx_extract_<mode>_load): Replace
vsx_extract_<mode>_load insn with a new insn that optimizes
storing either element to a memory location, using scratch
registers to pick apart the vector and reconstruct the address.
(vsx_extract_<P:mode>_<VSX_D:mode>_load): Likewise.
(vsx_extract_<mode>_store): Rework alternatives to more correctly
support Altivec registers. Add support for ISA 3.0 Altivec d-form
store instruction.
(vsx_extract_<mode>_var): Add support for extracting a variable
element number from memory.
[gcc/testsuite]
2016-07-30 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/vec-extract-2.c: New tests for vec_extract of
vector double or vector long where the vector is in memory.
* gcc.target/powerpc/vec-extract-3.c: Likewise.
* gcc.target/powerpc/vec-extract-4.c: Likewise.
Bin Cheng [Fri, 29 Jul 2016 15:48:25 +0000 (15:48 +0000)]
re PR tree-optimization/57558 (Loop not vectorized if iteration count could be infinite)
PR tree-optimization/57558
* tree-vect-loop-manip.c (vect_create_cond_for_niters_checks): New
function.
(vect_loop_versioning): Support versioning with niter assumptions.
* tree-vect-loop.c (tree-ssa-loop.h): Include header file.
(vect_get_loop_niters): New parameter. Reimplement to support
assumptions in loop niter info.
(vect_analyze_loop_form_1, vect_analyze_loop_form): Ditto.
(new_loop_vec_info): Init LOOP_VINFO_NITERS_ASSUMPTIONS.
(vect_estimate_min_profitable_iters): Use LOOP_REQUIRES_VERSIONING.
Support loop versioning for niters.
* tree-vectorizer.c (tree-ssa-loop-niter.h): Include header file.
(vect_free_loop_info_assumptions): New function.
(vectorize_loops): Free loop niter info for loops with flag
LOOP_F_ASSUMPTIONS set if vectorization failed.
* tree-vectorizer.h (struct _loop_vec_info): New field
num_iters_assumptions.
(LOOP_VINFO_NITERS_ASSUMPTIONS): New macro.
(LOOP_REQUIRES_VERSIONING_FOR_NITERS): New macro.
(LOOP_REQUIRES_VERSIONING): New macro.
(vect_free_loop_info_assumptions): New decl.
gcc/testsuite
PR tree-optimization/57558
* gcc.dg/vect/pr57558-1.c: New test.
* gcc.dg/vect/pr57558-2.c: New test.
Jason Merrill [Fri, 29 Jul 2016 14:03:26 +0000 (10:03 -0400)]
PR c++/72457 - ICE with list-value-initialized base.
* init.c (expand_aggr_init_1): Only handle value-init of bases.
* constexpr.c (build_data_member_initialization): Handle multiple
initializers for the same field.
re PR rtl-optimization/71976 (insn-combiner deletes a live 64-bit shift)
gcc/
PR rtl-optimization/71976
* combine.c (get_last_value): Return 0 if the argument for which
the function is called has a wider mode than the recorded value.
Jonathan Wakely [Fri, 29 Jul 2016 10:42:17 +0000 (11:42 +0100)]
New libstdc++ symbol version for new basic_string symbols
* acinclude.m4 (libtool_VERSION): Bump to 6:23:0.
* config/abi/pre/gnu.ver: Add 3.4.23 version for new basic_string
symbols.
* configure: Regenerate.
* testsuite/util/testsuite_abi.cc: Add new symbol version.
gfortran: Fix allocation of diagnostig string (was too small).
The attached patch fixes an out of bound write to memory allocated
with alloca() on the stack. This rarely ever happened because on
one hand -fbounds-check needs to be enabled, and on the other hand
alloca() used to allocate a few bytes extra most of the time so
most of the time the excess write did no harm.
gcc/fortran/ChangeLog:
* trans-array.c (gfc_conv_array_ref): Fix allocation of diagnostic
message (was too small).
Michael Meissner [Thu, 28 Jul 2016 21:02:06 +0000 (21:02 +0000)]
rs6000-protos.h (rs6000_split_vec_extract_var): New declaration.
[gcc]
2016-07-28 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000-protos.h (rs6000_split_vec_extract_var):
New declaration.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Add support for vec_extract of vector double or vector long having
a variable element number on 64-bit ISA 2.07 systems or newer.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract):
Likewise.
(rs6000_split_vec_extract_var): New function to split a
vec_extract built-in function with variable element number.
(rtx_is_swappable_p): Variable vec_extracts and shifts are not
swappable.
* config/rs6000/vsx.md (UNSPEC_VSX_VSLO): New unspec.
(UNSPEC_VSX_EXTRACT): Likewise.
(vsx_extract_<mode>, VSX_D iterator): Fix constraints to allow
direct move instructions to be generated on 64-bit ISA 2.07
systems and newer, and to take advantage of the ISA 3.0 MFVSRLD
instruction.
(vsx_vslo_<mode>): New insn to do VSLO on V2DFmode and V2DImode
arguments for vec_extract variable element.
(vsx_extract_<mode>_var, VSX_D iterator): New insn to support
vec_extract with variable element on V2DFmode and V2DImode
vectors.
* config/rs6000/rs6000.h (TARGET_VEXTRACTUB): Remove
-mupper-regs-df requirement, since it isn't needed.
(TARGET_DIRECT_MOVE_64BIT): New macro to say whether we can
do direct moves on 64-bit systems, which allows optimization of
vec_extract on 64-bit ISA 2.07 systems and newer.
[gcc/testsuite]
2016-07-28 Michael Meissner <meissner@linux.vnet.ibm.com>
Jonathan Wakely [Thu, 28 Jul 2016 21:00:34 +0000 (22:00 +0100)]
Use dg-additional-options in libstdc++ tests
* testsuite/17_intro/headers/c++2011/stdc++.cc: Change target-specific
dg-options to dg-additional-options so that default options are used.
* testsuite/17_intro/headers/c++2011/stdc++_multiple_inclusion.cc:
Likewise.
* testsuite/17_intro/headers/c++2014/stdc++.cc: Likewise.
* testsuite/17_intro/headers/c++2014/stdc++_multiple_inclusion.cc:
Likewise.
* testsuite/29_atomics/atomic_flag/test_and_set/explicit-hle.cc:
Use dg-additional-options instead of repeating the common options.
Paul Thomas [Thu, 28 Jul 2016 14:47:02 +0000 (14:47 +0000)]
[multiple changes]
2016-07-28 Steven G. Kargl <kargl@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/71883
* frontend-passes.c (gfc_run_passes): Bail out if there are any
errors.
* error.c (gfc_internal_error): If there are any errors in the
buffer, exit with EXIT_FAILURE.
2016-07-28 Paul Thomas <pault@gcc.gnu.org>
PR fortran/71883
* gfortran.dg/pr71883.f90 : New test.
On AArch64 the UXTB and UXTH instructions are aliases of UBFM,
which does a shift as part of its operation. An AND immediate is a
simpler operation, and might be faster on some implementations, so
it is better to emit this this instead of UBFM.
Benchmarking showed no difference on implementations where UBFM has
the same performance as AND, and minor speedups across several
benchmarks on an implementation where UBFM is slower than AND.
Bootstrapped and tested on aarch64-none-elf.
gcc/
* config/aarch64/aarch64.md
(zero_extend<SHORT:mode><GPI:mode>2_aarch64): Change output
statement and type.
(<optab>qihi2_aarch64): Likewise, and split into two.
(extendqihi2_aarch64): New.
(zero_extendqihi2_aarch64): New.
* config/aarch64/iterators.md (ldrxt): Remove.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Change cost of
uxtb/uxth.
This patch improves the readability of the prolog and epilog code by moving some code into separate functions.
This patch improves the readability of the prolog and epilog code by moving
some code into separate functions. There is no difference in generated code.
gcc/
* config/aarch64/aarch64.c (aarch64_pushwb_pair_reg): Rename.
(aarch64_push_reg): New function to push 1 or 2 registers.
(aarch64_pop_reg): New function to pop 1 or 2 registers.
(aarch64_expand_prologue): Use aarch64_push_regs.
(aarch64_expand_epilogue): Use aarch64_pop_regs.
re PR middle-end/71734 (FAIL: libgomp.fortran/simd4.f90 -O3 -g execution test)
gcc/
2016-07-28 Yuri Rumyantsev <ysrumyan@gmail.com>
PR tree-optimization/71734
* tree-ssa-loop-im.c (ref_indep_loop_p_1): Pass value of safelen
attribute instead of REF_LOOP and use it.
(ref_indep_loop_p_2): Use SAFELEN argument instead of REF_LOOP and
set it for Loops having non-zero safelen attribute.
(ref_indep_loop_p): Pass zero as initial value for safelen.
gcc/testsuite/
2016-07-28 Yuri Rumyantsev <ysrumyan@gmail.com>
PR tree-optimization/71734
* g++.dg/vect/pr70729-nest.cc: New test.