Michael Clark [Tue, 7 Nov 2017 16:58:37 +0000 (16:58 +0000)]
RISC-V: Define MUSL_DYNAMIC_LINKER
Use no suffix at all in the musl dynamic linker name for hard
float ABI. Use -sf and -sp suffixes in musl dynamic linker name
for soft float and single precision ABIs. The following table
outlines the musl interpreter names for the RISC-V ABI names.
[AArch64] Use aarch64_reg_or_imm instead of nonmemory_operand
Some of the shift expanders accepted nonmemory_operands but were only
able to handle register_operands or CONST_INTs. This is probably
academic without SVE, since we're not likely to see shifts by other
types of constant (const_wide_ints, consts, etc). But for SVE,
it's possible for a vectorised shift induction to have a CONST_POLY_INT
shift amount.
This patch makes the expanders use aarch64_reg_or_imm instead.
2017-11-07 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/aarch64.md (ashl<mode>3, ashr<mode>3, lshr<mode>3)
(rotr<mode>3, rotl<mode>3): Use aarch64_reg_or_imm instead of
nonmmory_operand.
Sudakshina Das [Tue, 7 Nov 2017 12:23:38 +0000 (12:23 +0000)]
PR80131: Simplification of 1U << (31 - x)
Currently the code A << (B - C) is not simplified.
However at least a more specific case of 1U << (C -x) where
C = precision(type) - 1 can be simplified to (1 << C) >> x.
This is done by adding a new simplification rule in match.pd.
2017-11-07 Sudakshina Das <sudi.das@arm.com>
gcc/
PR middle-end/80131
* match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.
testsuite/
PR middle-end/80131
* testsuite/gcc.dg/pr80131-1.c: New Test.
Alan Modra [Tue, 7 Nov 2017 04:54:01 +0000 (15:24 +1030)]
Require ngettext in test of system gettext implementation
gcc currently uses ngettext in a number of places (gcc/cp/pt.c,
gcc/diagnostic.c, gcc/collect2.c). Apparently there are (or used to
be) gettext implementations that lack ngettext. See config/gettext.m4.
This patch arranges for intl/ to be compiled when the system gettext
lacks ngettext.
* configure.ac: Invoke AM_GNU_GETTEXT with need_ngettext.
* configure: Regenerate.
We want to actually use isel, so we shouldn't disable it. It is
already not set by default on CPUs that don't have it, or where we
do not want to use it.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
disable isel if it was not set explicitly.
James Bowman [Tue, 7 Nov 2017 01:10:18 +0000 (01:10 +0000)]
FT32 makes use of multiple address spaces.
FT32 makes use of multiple address spaces. When trying to inspect
objects in GDB, GDB was treating them as a straight "const". The cause
seems to be in GCC DWARF2 output.
This output is handled in gcc/gcc/dwarf2out.c, where modified_type_die()
checks that TYPE has qualifiers CV_QUALS. However while TYPE has
ADDR_SPACE qualifiers, the modified_type_die() explicitly discards the
ADDR_SPACE qualifiers.
This patch retains the ADDR_SPACE qualifiers as modified_type_die()
outputs the DWARF type tree. This allows the types to match, and correct
type information for the object is emitted.
[gcc]
2017-11-06 James Bowman <james.bowman@ftdichip.com>
"make check" runs make recursively to check each package. Pass
the flags through. So it is possible to run "make check" with
different settings easily.
[AArch64] Pass number of units to aarch64_expand_vec_perm(_const)
This patch passes the number of units to aarch64_expand_vec_perm
and aarch64_expand_vec_perm_const, which avoids a to_constant ()
once GET_MODE_NUNITS is variable.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm)
(aarch64_expand_vec_perm_const): Take the number of units too.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm)
(aarch64_expand_vec_perm_const): Likewise.
* config/aarch64/aarch64-simd.md (vec_perm_const<mode>)
(vec_perm<mode>): Update accordingly.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254469
[AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half
This patch passes the number of units to aarch64_simd_vect_par_cnst_half,
which avoids a to_constant () once GET_MODE_NUNITS is variable.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_simd_vect_par_cnst_half):
Take the number of units too.
* config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Likewise.
(aarch64_simd_check_vect_par_cnst_half): Update call accordingly,
but check for a vector mode before rather than after the call.
* config/aarch64/aarch64-simd.md (aarch64_split_simd_mov<mode>)
(move_hi_quad_<mode>, vec_unpack<su>_hi_<mode>)
(vec_unpack<su>_lo_<mode, vec_widen_<su>mult_lo_<mode>)
(vec_widen_<su>mult_hi_<mode>, vec_unpacks_lo_<mode>)
(vec_unpacks_hi_<mode>, aarch64_saddl2<mode>, aarch64_uaddl2<mode>)
(aarch64_ssubl2<mode>, aarch64_usubl2<mode>, widen_ssum<mode>3)
(widen_usum<mode>3, aarch64_saddw2<mode>, aarch64_uaddw2<mode>)
(aarch64_ssubw2<mode>, aarch64_usubw2<mode>, aarch64_sqdmlal2<mode>)
(aarch64_sqdmlsl2<mode>, aarch64_sqdmlal2_lane<mode>)
(aarch64_sqdmlal2_laneq<mode>, aarch64_sqdmlsl2_lane<mode>)
(aarch64_sqdmlsl2_laneq<mode>, aarch64_sqdmlal2_n<mode>)
(aarch64_sqdmlsl2_n<mode>, aarch64_sqdmull2<mode>)
(aarch64_sqdmull2_lane<mode>, aarch64_sqdmull2_laneq<mode>)
(aarch64_sqdmull2_n<mode>): Update accordingly.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254468
[AArch64] Pass number of units to aarch64_reverse_mask
This patch passes the number of units to aarch64_reverse_mask,
which avoids a to_constant () once GET_MODE_NUNITS is variable.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_reverse_mask): Take
the number of units too.
* config/aarch64/aarch64.c (aarch64_reverse_mask): Likewise.
* config/aarch64/aarch64-simd.md (vec_load_lanesoi<mode>)
(vec_store_lanesoi<mode>, vec_load_lanesci<mode>)
(vec_store_lanesci<mode>, vec_load_lanesxi<mode>)
(vec_store_lanesxi<mode>): Update accordingly.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254467
Later patches turn the number of vector units into a poly_int.
We deliberately don't support applying GEN_INT to those (except
in target code that doesn't distinguish between poly_ints and normal
constants); gen_int_mode needs to be used instead.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_endian_lane_rtx): Declare.
* config/aarch64/aarch64.c (aarch64_endian_lane_rtx): New function.
* config/aarch64/aarch64.h (ENDIAN_LANE_N): Take the number
of units rather than the mode.
* config/aarch64/iterators.md (nunits): New mode attribute.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args):
Use aarch64_endian_lane_rtx instead of GEN_INT (ENDIAN_LANE_N ...).
* config/aarch64/aarch64-simd.md (aarch64_dup_lane<mode>)
(aarch64_dup_lane_<vswap_width_name><mode>, *aarch64_mul3_elt<mode>)
(*aarch64_mul3_elt_<vswap_width_name><mode>): Likewise.
(*aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt<mode>): Likewise.
(*aarch64_mla_elt_<vswap_width_name><mode>, *aarch64_mls_elt<mode>)
(*aarch64_mls_elt_<vswap_width_name><mode>, *aarch64_fma4_elt<mode>)
(*aarch64_fma4_elt_<vswap_width_name><mode>):: Likewise.
(*aarch64_fma4_elt_to_64v2df, *aarch64_fnma4_elt<mode>): Likewise.
(*aarch64_fnma4_elt_<vswap_width_name><mode>): Likewise.
(*aarch64_fnma4_elt_to_64v2df, reduc_plus_scal_<mode>): Likewise.
(reduc_plus_scal_v4sf, reduc_<maxmin_uns>_scal_<mode>): Likewise.
(reduc_<maxmin_uns>_scal_<mode>): Likewise.
(*aarch64_get_lane_extend<GPI:mode><VDQQH:mode>): Likewise.
(*aarch64_get_lane_zero_extendsi<mode>): Likewise.
(aarch64_get_lane<mode>, *aarch64_mulx_elt_<vswap_width_name><mode>)
(*aarch64_mulx_elt<mode>, *aarch64_vgetfmulx<mode>): Likewise.
(aarch64_sq<r>dmulh_lane<mode>, aarch64_sq<r>dmulh_laneq<mode>)
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Likewise.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Likewise.
(aarch64_sqdml<SBINQOPS:as>l_lane<mode>): Likewise.
(aarch64_sqdml<SBINQOPS:as>l_laneq<mode>): Likewise.
(aarch64_sqdml<SBINQOPS:as>l2_lane<mode>_internal): Likewise.
(aarch64_sqdml<SBINQOPS:as>l2_laneq<mode>_internal): Likewise.
(aarch64_sqdmull_lane<mode>, aarch64_sqdmull_laneq<mode>): Likewise.
(aarch64_sqdmull2_lane<mode>_internal): Likewise.
(aarch64_sqdmull2_laneq<mode>_internal): Likewise.
(aarch64_vec_load_lanesoi_lane<mode>): Likewise.
(aarch64_vec_store_lanesoi_lane<mode>): Likewise.
(aarch64_vec_load_lanesci_lane<mode>): Likewise.
(aarch64_vec_store_lanesci_lane<mode>): Likewise.
(aarch64_vec_load_lanesxi_lane<mode>): Likewise.
(aarch64_vec_store_lanesxi_lane<mode>): Likewise.
(aarch64_simd_vec_set<mode>): Update use of ENDIAN_LANE_N.
(aarch64_simd_vec_setv2di): Likewise.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com> Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254466
Wilco Dijkstra [Mon, 6 Nov 2017 19:26:27 +0000 (19:26 +0000)]
[Arm] Cleanup IT attributes
A recent change to remove the movdi_vfp_cortexa8 meant that ldrd was used in
ITs block even when arm_restrict_it was enabled. Rather than just fixing this
latent issue, change the default of predicable_short_it to "no" so that only
16-bit instructions need to be marked with it. As a result there are far fewer
patterns that need the attribute, and omitting predicable_short_it is no longer
causing issues.
* config/arm/arm.md (predicable_short_it): Change default to "no",
improve documentation, remove uses that are identical to the default.
(enabled_for_depr_it): Rename to enabled_for_short_it.
* gcc/config/arm/arm-fixed.md (predicable_short_it): Remove default uses.
* gcc/config/arm/ldmstm.md (predicable_short_it): Likewise.
* gcc/config/arm/sync.md (predicable_short_it): Likewise.
* gcc/config/arm/thumb2.md (predicable_short_it): Likewise.
* gcc/config/arm/vfp.md (predicable_short_it): Likewise.
re PR target/82748 (ICE with __builtin_fabsq and __float128 in copy_to_mode_reg, at explow.c:612)
[gcc]
2017-11-06 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/82748
* config/rs6000/rs6000-builtin.def (BU_FLOAT128_1): Delete
float128 helper macros, which are no longer used after deleting
the old 'q' built-in functions, and moving the round to odd
built-in functions to being special built-in functions.
(BU_FLOAT128_2): Likewise.
(BU_FLOAT128_1_HW): Likewise.
(BU_FLOAT128_2_HW): Likewise.
(BU_FLOAT128_3_HW): Likewise.
(FABSQ): Delete old 'q' built-in functions.
(COPYSIGNQ): Likewise.
(SQRTF128_ODD): Move round to odd built-in functions to be
special built-in functions, so that we can handle
-mabi=ieeelongdouble.
(TRUNCF128_ODD): Likewise.
(ADDF128_ODD): Likewise.
(SUBF128_ODD): Likewise.
(MULF128_ODD): Likewise.
(DIVF128_ODD): Likewise.
(FMAF128_ODD): Likewise.
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Map old 'q'
built-in names to 'f128'.
* config/rs6000/rs6000.c (rs6000_fold_builtin): Remove folding the
old 'q' built-in functions, as the machine independent code for
'f128' built-in functions handles this.
(rs6000_expand_builtin): Add expansion for float128 round to odd
functions, keying off on -mabi=ieeelongdouble of whether to use
the KFmode or TFmode variant.
(rs6000_init_builtins): Initialize the _Float128 round to odd
built-in functions.
* doc/extend.texi (PowerPC Built-in Functions): Document the old
_Float128 'q' built-in functions are now mapped into the new
'f128' built-in functions.
[gcc/testsuite]
2017-11-06 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/82748
* gcc.target/powerpc/pr82748-1.c: New test.
* gcc.target/powerpc/pr82748-2.c: Likewise.
Jakub Jelinek [Mon, 6 Nov 2017 16:29:11 +0000 (17:29 +0100)]
re PR tree-optimization/82838 (ICE in verify_ssa failed w/ store-merging)
PR tree-optimization/82838
* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Call force_gimple_operand_1
on a separate gimple_seq which is then appended to seq.
In this PR we tried to create a widening multiply of two 3-bit numbers,
but that isn't a widening multiply at the optab/rtl level, since both
the input and output still have the same mode.
We could trap this either in is_widening_mult_p or (as the patch does)
in the routines that actually ask for an optab. The latter seemed
more natural since is_widening_mult_p doesn't otherwise care about modes.
2017-11-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
PR tree-optimization/82816
* tree-ssa-math-opts.c (convert_mult_to_widen): Return false
if the modes of the two types are the same.
(convert_plusminus_to_widen): Likewise.
gcc/testsuite/
* gcc.c-torture/compile/pr82816.c: New test.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254454
Bill Schmidt [Mon, 6 Nov 2017 13:47:46 +0000 (13:47 +0000)]
[gcc]
2017-11-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/altivec.md (*p9_vadu<mode>3) Rename to
p9_vadu<mode>3.
(usadv16qi): New define_expand.
(usadv8hi): New define_expand.
[gcc/testsuite]
2017-11-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/sad-vectorize-1.c: New file.
* gcc.target/powerpc/sad-vectorize-2.c: New file.
* gcc.target/powerpc/sad-vectorize-3.c: New file.
* gcc.target/powerpc/sad-vectorize-4.c: New file.
* testsuite/libgomp.c++/loop-2.C: Return a value
for functions with non-void return type, or change type to void,
or add -Wno-return-type for test.
* testsuite/libgomp.c++/loop-4.C: Likewise.
* testsuite/libgomp.c++/parallel-1.C: Likewise.
* testsuite/libgomp.c++/shared-1.C: Likewise.
* testsuite/libgomp.c++/single-1.C: Likewise.
* testsuite/libgomp.c++/single-2.C: Likewise.
2017-11-06 Martin Liska <mliska@suse.cz>
* testsuite/27_io/basic_fstream/cons/char/path.cc (main):
Return a value for functions with non-void return type,
or change type to void, or add -Wno-return-type for test.
* testsuite/27_io/basic_ifstream/cons/char/path.cc (main):
Likewise.
* testsuite/27_io/basic_ofstream/open/char/path.cc (main):
Likewise.
PR c++/80955
* lex.c (lex_string): When checking for a valid macro for the
warning related to -Wliteral-suffix (CPP_W_LITERAL_SUFFIX),
check that the macro name does not start with an underscore
before calling is_macro().
Martin Liska [Mon, 6 Nov 2017 09:02:15 +0000 (10:02 +0100)]
Instrument function exit with __builtin_unreachable in C++
2017-11-06 Martin Liska <mliska@suse.cz>
PR middle-end/82404
* c-opts.c (c_common_post_options): Set -Wreturn-type for C++
FE.
* c.opt: Set default value of warn_return_type.
2017-11-06 Martin Liska <mliska@suse.cz>
PR middle-end/82404
* constexpr.c (cxx_eval_builtin_function_call): Handle
__builtin_unreachable call.
(get_function_named_in_call): Declare function earlier.
(constexpr_fn_retval): Skip __builtin_unreachable.
* cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
...
(cp_maybe_instrument_return): ... this.
(cp_genericize): Call the function unconditionally.
2017-11-06 Martin Liska <mliska@suse.cz>
PR middle-end/82404
* options.c (gfc_post_options): Set default value of
-Wreturn-type to false.
...to avoid a warning about uninitialised wide_ints.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vrp.c (vrp_int_const_binop): Return true on success and
return the value by pointer.
(extract_range_from_multiplicative_op_1): Update accordingly.
Return as soon as an operation fails.
Thomas Koenig [Sun, 5 Nov 2017 17:24:37 +0000 (17:24 +0000)]
re PR fortran/82471 (Reorder loop for unfavorable index ordering in DO CONCURRENT and FORALL)
2017-11-05 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/82471
* lang.opt (ffrontend-loop-interchange): New option.
(Wfrontend-loop-interchange): New option.
* options.c (gfc_post_options): Handle ffrontend-loop-interchange.
* frontend-passes.c (gfc_run_passes): Run
optimize_namespace if flag_frontend_optimize or
flag_frontend_loop_interchange are set.
(optimize_namespace): Run functions according to flags set;
also call index_interchange.
(ind_type): New function.
(has_var): New function.
(index_cost): New function.
(loop_comp): New function.
2017-11-05 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/82471
* gfortran.dg/loop_interchange_1.f90: New test.
Paul Thomas [Sun, 5 Nov 2017 14:32:05 +0000 (14:32 +0000)]
re PR fortran/78641 ([OOP] ICE on polymorphic allocatable function in array constructor)
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/78641
* resolve.c (resolve_ordinary_assign): Do not add the _data
component for class valued array constructors being assigned
to derived type arrays.
* trans-array.c (gfc_trans_array_ctor_element): Take the _data
of class valued elements for assignment to derived type arrays.
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/78641
* gfortran.dg/class_66.f90: New test.
Paul Thomas [Sun, 5 Nov 2017 12:38:42 +0000 (12:38 +0000)]
re PR fortran/81447 ([7/8] gfortran fails to recognize the exact dynamic type of a polymorphic entity that was allocated in a external procedure)
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/81447
PR fortran/82783
* resolve.c (resolve_component): There is no need to resolve
the components of a use associated vtype.
(resolve_fl_derived): Unconditionally generate a vtable for any
module derived type, as long as the standard is F2003 or later
and it is not a vtype or a PDT template.
2017-11-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/81447
* gfortran.dg/class_65.f90: New test.
* gfortran.dg/alloc_comp_basics_1.f90: Increase builtin_free
count from 18 to 21.
* gfortran.dg/allocatable_scalar_9.f90: Increase builtin_free
count from 32 to 54.
* gfortran.dg/auto_dealloc_1.f90: Increase builtin_free
count from 4 to 10.
* gfortran.dg/coarray_lib_realloc_1.f90: Increase builtin_free
count from 3 to 6. Likewise _gfortran_caf_deregister from 2 to
3, builtin_malloc from 1 to 4 and builtin_memcpy|= MEM from
2 to 5.
* gfortran.dg/finalize_28.f90: Increase builtin_free
count from 3 to 6.
* gfortran.dg/move_alloc_15.f90: Increase builtin_free and
builtin_malloc counts from 11 to 14.
* gfortran.dg/typebound_proc_27.f03: Increase builtin_free
count from 7 to 10. Likewise builtin_malloc from 12 to 15.
Tom de Vries [Sun, 5 Nov 2017 09:58:27 +0000 (09:58 +0000)]
Remove semicolon after do {} while (0) in DEF_SANITIZER_BUILTIN
2017-11-05 Tom de Vries <tom@codesourcery.com>
PR other/82784
* asan.c (DEF_SANITIZER_BUILTIN_1): Factor out of ...
(DEF_SANITIZER_BUILTIN): ... here.
(initialize_sanitizer_builtins): Use DEF_SANITIZER_BUILTIN_1 instead of
DEF_SANITIZER_BUILTIN in if stmt. Add missing semicolon.
Michael Clark [Sun, 5 Nov 2017 00:42:54 +0000 (00:42 +0000)]
RISC-V: Emit "i" suffix for instructions with immediate operands
This changes makes GCC asm output use instruction names that are
consistent with the RISC-V ISA manual. The assembler accepts
immediate-operand instructions without the "i" suffix, so this all
worked before, it's just a bit cleaner to match the ISA manual more
closely.
Andrew Waterman [Sun, 5 Nov 2017 00:30:40 +0000 (00:30 +0000)]
RISC-V: Set SLOW_BYTE_ACCESS=1
When implementing the RISC-V port, I took the name of this macro at
face value. It appears we were mistaken in what this means, here's a
quote from the SPARC port that better describes what SLOW_BYTE_ACCESS
does
/* Nonzero if access to memory by bytes is slow and undesirable.
For RISC chips, it means that access to memory by bytes is no
better than access by words when possible, so grab a whole word
and maybe make use of that. */
I've added the comment to our port as well.
See https://gcc.gnu.org/ml/gcc/2017-08/msg00202.html for more
discussion. Thanks to Michael Clark and Andrew Pinski for the help!
gcc/ChangeLog
2017-11-04 Andrew Waterman <andrew@sifive.com>
* config/riscv/riscv.h (SLOW_BYTE_ACCESS): Change to 1.
Daniel Santos [Sat, 4 Nov 2017 22:38:43 +0000 (22:38 +0000)]
PR target/82002 Part 2: Correct non-immediate offset/invalid INSN
When we are realigning the stack pointer, making an ms_abi to sysv_abi
call and allocating 2GiB or more on the stack we end up with an invalid
INSN due to a non-immediate offset. This occurs both with and without
-mcall-ms2sysv-xlogues. Additionally, the stack allocation with
-mcall-ms2sysv-xlogues is ignoring (silently disabling) stack checking,
stack clash checking and probing.
This patch fixes these problems by:
1. No longer allocate stack space in ix86_emit_outlined_ms2sysv_save.
2. Rearrange where we emit SSE saves or stub call:
a. Before frame allocation when offset from frame to save area is >= 2GiB.
b. After frame allocation when frame is < 2GiB. (Stack allocations
prior to the stub call can't be combined with those afterwards, so
this is better when possible.)
3. Modify choose_baseaddr to take an optional scratch_regno argument
and never return rtx that cannot be used as an immediate.
gcc:
config/i386/i386.c (choose_basereg): Use optional scratch
register and add assertion.
(x86_emit_outlined_ms2sysv_save): Use scratch register when
needed, and don't allocate stack.
(ix86_expand_prologue): Rearrange where SSE saves/stub call is
emitted, correct wrong allocation with -mcall-ms2sysv-xlogues.
(ix86_emit_outlined_ms2sysv_restore): Fix non-immediate offsets.
gcc/testsuite:
gcc.target/i386/pr82002-2a.c: Change from xfail to fail.
gcc.target/i386/pr82002-2b.c: Likewise.
Andreas Tobler [Sat, 4 Nov 2017 19:40:23 +0000 (20:40 +0100)]
re PR libgcc/82635 (std::thread's join broken on FreeBSD with all GCCs >= 5)
2017-11-04 Andreas Tobler <andreast@gcc.gnu.org>
PR libgcc/82635
* config/i386/freebsd-unwind.h (MD_FALLBACK_FRAME_STATE_FOR): Use a
sysctl to determine whether we're in a trampoline.
Keep the pattern matching method for systems without
KERN_PROC_SIGTRAMP sysctl.
trans-expr.c (gfc_trans_assignment_1): Character kind conversion may create a loop variant temporary, too.
gcc/fortran/ChangeLog:
2017-11-04 Andre Vehreschild <vehre@gcc.gnu.org>
* trans-expr.c (gfc_trans_assignment_1): Character kind conversion may
create a loop variant temporary, too.
* trans-intrinsic.c (conv_caf_send): Treat char arrays as arrays and
not as scalars.
* trans.c (get_array_span): Take the character kind into account when
doing pointer arithmetic.
gcc/testsuite/ChangeLog:
2017-11-04 Andre Vehreschild <vehre@gcc.gnu.org>
* gfortran.dg/coarray/send_char_array_1.f90: New test.
Thomas Koenig [Sat, 4 Nov 2017 13:20:32 +0000 (13:20 +0000)]
re PR fortran/29600 ([F03] MINLOC and MAXLOC take an optional KIND argument)
2017-11-04 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/29600
* gfortran.h (gfc_check_f): Replace fm3l with fm4l.
* intrinsic.h (gfc_resolve_maxloc): Add gfc_expr * to argument
list in protoytpe.
(gfc_resolve_minloc): Likewise.
* check.c (gfc_check_minloc_maxloc): Handle kind argument.
* intrinsic.c (add_sym_3_ml): Rename to
(add_sym_4_ml): and handle kind argument.
(add_function): Replace add_sym_3ml with add_sym_4ml and add
extra arguments for maxloc and minloc.
(check_specific): Change use of check.f3ml with check.f4ml.
* iresolve.c (gfc_resolve_maxloc): Handle kind argument. If
the kind is smaller than the smallest library version available,
use gfc_default_integer_kind and convert afterwards.
(gfc_resolve_minloc): Likewise.
2017-11-04 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/29600
* gfortran.dg/minmaxloc_8.f90: New test.
Steven G. Kargl [Sat, 4 Nov 2017 00:34:40 +0000 (00:34 +0000)]
re PR fortran/82796 (Private+equivalence in used module breaks compilation of pure function)
2017-11-01 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/82796
* resolve.c (resolve_equivalence): An entity in a common block within
a module cannot appear in an equivalence statement if the entity is
with a pure procedure.
2017-11-01 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/82796
* gfortran.dg/equiv_pure.f90: New test.
* config/i386/i386.c (ix86_emit_restore_reg_using_pop): Prototype.
(ix86_adjust_stack_and_probe_stack_clash): Use a push/pop sequence
to probe at the start of a noreturn function.
Jakub Jelinek [Fri, 3 Nov 2017 19:08:25 +0000 (20:08 +0100)]
re PR tree-optimization/78821 (GCC7: Copying whole 32 bits structure field by field not optimised into copying whole 32 bits at once)
PR tree-optimization/78821
* gimple-ssa-store-merging.c: Update the file comment.
(MAX_STORE_ALIAS_CHECKS): Define.
(struct store_operand_info): New type.
(store_operand_info::store_operand_info): New constructor.
(struct store_immediate_info): Add rhs_code and ops data members.
(store_immediate_info::store_immediate_info): Add rhscode, op0r
and op1r arguments to the ctor, initialize corresponding data members.
(struct merged_store_group): Add load_align_base and load_align
data members.
(merged_store_group::merged_store_group): Initialize them.
(merged_store_group::do_merge): Update them.
(merged_store_group::apply_stores): Pick the constant for
encode_tree_to_bitpos from one of the two operands, or skip
encode_tree_to_bitpos if neither operand is a constant.
(class pass_store_merging): Add process_store method decl. Remove
bool argument from terminate_all_aliasing_chains method decl.
(pass_store_merging::terminate_all_aliasing_chains): Remove
var_offset_p argument and corresponding handling.
(stmts_may_clobber_ref_p): New function.
(compatible_load_p): New function.
(imm_store_chain_info::coalesce_immediate_stores): Terminate group
if there is overlap and rhs_code is not INTEGER_CST. For
non-overlapping stores terminate group if rhs is not mergeable.
(get_alias_type_for_stmts): Change first argument from
auto_vec<gimple *> & to vec<gimple *> &. Add IS_LOAD, CLIQUEP and
BASEP arguments. If IS_LOAD is true, look at rhs1 of the stmts
instead of lhs. Compute *CLIQUEP and *BASEP in addition to the
alias type.
(get_location_for_stmts): Change first argument from
auto_vec<gimple *> & to vec<gimple *> &.
(struct split_store): Remove orig_stmts data member, add orig_stores.
(split_store::split_store): Create orig_stores rather than orig_stmts.
(find_constituent_stmts): Renamed to ...
(find_constituent_stores): ... this. Change second argument from
vec<gimple *> * to vec<store_immediate_info *> *, push pointers
to info structures rather than the statements.
(split_group): Rename ALLOW_UNALIGNED argument to
ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
it. Adjust find_constituent_stores caller.
(imm_store_chain_info::output_merged_store): Handle rhs_code other
than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
get_location_for_stmts callers. Set MR_DEPENDENCE_CLIQUE and
MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
(mem_valid_for_store_merging): New function.
(handled_load): New function.
(pass_store_merging::process_store): New method.
(pass_store_merging::execute): Use process_store method. Adjust
terminate_all_aliasing_chains caller.
* gcc.dg/store_merging_13.c: New test.
* gcc.dg/store_merging_14.c: New test.
Wilco Dijkstra [Fri, 3 Nov 2017 18:19:33 +0000 (18:19 +0000)]
Improve aarch64_legitimate_constant_p
This patch further improves aarch64_legitimate_constant_p. Allow all
integer, floating point and vector constants. Allow label references
and non-anchor symbols with an immediate offset. This allows such
constants to be rematerialized, resulting in smaller code and fewer stack
spills. SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%.
gcc/
* config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
Return true for more constants, symbols and label references.
(aarch64_valid_floating_const): Remove unused function.
Wilco Dijkstra [Fri, 3 Nov 2017 16:29:47 +0000 (16:29 +0000)]
re PR c++/82768 (ICE in synthesize_implicit_template_parm, at cp/parser.c:39338)
Fix PR82768
Forcing LR at the bottom of the frame caused a few test failures.
Since there are some cases that generate worse code, revert this
part, and the frame tests pass again.
gcc/
PR target/82786
* config/aarch64/aarch64.c (aarch64_layout_frame):
Undo forcing of LR at bottom of frame.
* asan.c (create_cond_insert_point): Maintain profile.
* ipa-utils.c (ipa_merge_profiles): Be sure only IPA profiles are
merged.
* basic-block.h (struct basic_block_def): Remove frequency.
(EDGE_FREQUENCY): Use to_frequency
* bb-reorder.c (push_to_next_round_p): Use only IPA counts for global
heuristics.
(find_traces): Update to use to_frequency.
(find_traces_1_round): Likewise; use only IPA counts.
(bb_to_key): Likewise.
(connect_traces): Use IPA counts only.
(copy_bb_p): Update to use to_frequency.
(fix_up_crossing_landing_pad): Likewise.
(sanitize_hot_paths): Likewise.
* bt-load.c (basic_block_freq): Likewise.
* cfg.c (init_flow): Set count_max to uninitialized.
(check_bb_profile): Remove frequencies; check counts.
(dump_bb_info): Do not dump frequencies.
(update_bb_profile_for_threading): Update counts only.
(scale_bbs_frequencies_int): Likewise.
(MAX_SAFE_MULTIPLIER): Remove.
(scale_bbs_frequencies_gcov_type): Update counts only.
(scale_bbs_frequencies_profile_count): Update counts only.
(scale_bbs_frequencies): Update counts only.
* cfg.h (struct control_flow_graph): Add count-max.
(update_bb_profile_for_threading): Update prototype.
* cfgbuild.c (find_bb_boundaries): Do not update frequencies.
(find_many_sub_basic_blocks): Likewise.
* cfgcleanup.c (try_forward_edges): Likewise.
(try_crossjump_to_edge): Likewise.
* cfgexpand.c (expand_gimple_cond): Likewise.
(expand_gimple_tailcall): Likewise.
(construct_init_block): Likewise.
(construct_exit_block): Likewise.
* cfghooks.c (verify_flow_info): Check consistency of counts.
(dump_bb_for_graph): Do not dump frequencies.
(split_block_1): Do not update frequencies.
(split_edge): Do not update frequencies.
(make_forwarder_block): Do not update frequencies.
(duplicate_block): Do not update frequencies.
(account_profile_record): Do not update frequencies.
* cfgloop.c (find_subloop_latch_edge_by_profile): Use IPA counts
for global heuristics.
* cfgloopanal.c (average_num_loop_insns): Update to use to_frequency.
(expected_loop_iterations_unbounded): Use counts only.
* cfgloopmanip.c (scale_loop_profile): Simplify.
(create_empty_loop_on_edge): Simplify
(loopify): Simplify
(duplicate_loop_to_header_edge): Simplify
* cfgrtl.c (force_nonfallthru_and_redirect): Update profile.
(update_br_prob_note): Take care of removing note when profile
becomes undefined.
(relink_block_chain): Do not dump frequency.
(rtl_account_profile_record): Use to_frequency.
* cgraph.c (symbol_table::create_edge): Convert count to ipa count.
(cgraph_edge::redirect_call_stmt_to_calle): Conver tcount to ipa count.
(cgraph_update_edges_for_call_stmt_node): Likewise.
(cgraph_edge::verify_count_and_frequency): Update.
(cgraph_node::verify_node): Temporarily disable frequency verification.
* cgraphbuild.c (compute_call_stmt_bb_frequency): Use
to_cgraph_frequency.
(cgraph_edge::rebuild_edges): Convert to ipa counts.
* cgraphunit.c (init_lowered_empty_function): Do not initialize
frequencies.
(cgraph_node::expand_thunk): Update profile.
* except.c (dw2_build_landing_pads): Do not update frequency.
* final.c (compute_alignments): Use to_frequency.
(dump_basic_block_info): Do not dump frequency.
* gimple-pretty-print.c (dump_profile): Do not dump frequency.
(dump_gimple_bb_header): Do not dump frequency.
* gimple-ssa-isolate-paths.c (isolate_path): Do not update frequency;
do update count.
* gimple-streamer-in.c (input_bb): Do not stream frequency.
* gimple-streamer-out.c (output_bb): Do not stream frequency.
* haifa-sched.c (sched_pressure_start_bb): Use to_freuqency.
(init_before_recovery): Do not update frequency.
(sched_create_recovery_edges): Do not update frequency.
* hsa-gen.c (convert_switch_statements): Do not update frequency.
* ipa-cp.c (ipcp_propagate_stage): Update search for max_count.
(ipa_cp_c_finalize): Set max_count to uninitialized.
* ipa-fnsummary.c (get_minimal_bb): Use counts.
(param_change_prob): Use counts.
* ipa-profile.c (ipa_profile_generate_summary): Do not summarize
local profiles.
* ipa-split.c (consider_split): Use to_frequency.
(split_function): Use to_frequency.
* ira-build.c (loop_compare_func): Likewise.
(mark_loops_for_removal): Likewise.
(mark_all_loops_for_removal): Likewise.
* loop-doloop.c (doloop_modify): Do not update frequency.
* loop-unroll.c (unroll_loop_runtime_iterations): Do not update
frequency.
* lto-streamer-in.c (input_function): Update count_max.
* omp-expand.c (expand_omp_taskreg): Update count_max.
* omp-simd-clone.c (simd_clone_adjust): Update profile.
* predict.c (maybe_hot_frequency_p): Use to_frequency.
(maybe_hot_count_p): Use ipa counts only.
(maybe_hot_bb_p): Simplify.
(maybe_hot_edge_p): Simplify.
(probably_never_executed): Do not take frequency argument.
(probably_never_executed_bb_p): Do not pass frequency.
(probably_never_executed_edge_p): Likewise.
(combine_predictions_for_bb): Check that profile is nonzero.
(propagate_freq): Do not set frequency.
(drop_profile): Simplify.
(counts_to_freqs): Simplify.
(expensive_function_p): Use to_frequency.
(propagate_unlikely_bbs_forward): Simplify.
(determine_unlikely_bbs): Simplify.
(estimate_bb_frequencies): Add hack to silence graphite issues.
(compute_function_frequency): Use ipa counts.
(pass_profile::execute): Update.
(rebuild_frequencies): Use counts only.
(force_edge_cold): Use counts only.
* profile-count.c (profile_count::dump): Dump new count types.
(profile_count::differs_from_p): Check compatiblity.
(profile_count::to_frequency): New function.
(profile_count::to_cgraph_frequency): New function.
* profile-count.h (struct function): Declare.
(enum profile_quality): Add profile_guessed_local and
profile_guessed_global0.
(class profile_proability): Decrease number of bits to 29;
update from_reg_br_prob_note and to_reg_br_prob_note.
(class profile_count: Update comment; decrease number of bits
to 61. Check compatibility.
(profile_count::compatible_p): New private member function.
(profile_count::ipa_p): New member function.
(profile_count::operator<): Handle global zero correctly.
(profile_count::operator>): Handle global zero correctly.
(profile_count::operator<=): Handle global zero correctly.
(profile_count::operator>=): Handle global zero correctly.
(profile_count::nonzero_p): New member function.
(profile_count::force_nonzero): New member function.
(profile_count::max): New member function.
(profile_count::apply_scale): Handle IPA scalling.
(profile_count::guessed_local): New member function.
(profile_count::global0): New member function.
(profile_count::ipa): New member function.
(profile_count::to_frequency): Declare.
(profile_count::to_cgraph_frequency): Declare.
* profile.c (OVERLAP_BASE): Delete.
(compute_frequency_overlap): Delete.
(compute_branch_probabilities): Do not use compute_frequency_overlap.
* regs.h (REG_FREQ_FROM_BB): Use to_frequency.
* sched-ebb.c (rank): Use counts only.
* shrink-wrap.c (handle_simple_exit): Use counts only.
(try_shrink_wrapping): Use counts only.
(place_prologue_for_one_component): Use counts only.
* tracer.c (find_best_predecessor): Use to_frequency.
(find_trace): Use to_frequency.
(tail_duplicate): Use to_frequency.
* trans-mem.c (expand_transaction): Do not update frequency.
* tree-call-cdce.c: Do not update frequency.
* tree-cfg.c (gimple_find_sub_bbs): Likewise.
(gimple_merge_blocks): Likewise.
(gimple_split_edge): Likewise.
(gimple_duplicate_sese_region): Likewise.
(gimple_duplicate_sese_tail): Likewise.
(move_sese_region_to_fn): Likewise.
(gimple_account_profile_record): Likewise.
(insert_cond_bb): Likewise.
* tree-complex.c (expand_complex_div_wide): Likewise.
* tree-eh.c (lower_resx): Update profile.
* tree-inline.c (copy_bb): Simplify count scaling; do not scale
frequencies.
(initialize_cfun): Do not initialize frequencies
(freqs_to_counts): Delete.
(copy_cfg_body): Ignore count parameter.
(copy_body): Update.
(expand_call_inline): Update count_max.
(optimize_inline_calls): Update count_max.
(tree_function_versioning): Update count_max.
* tree-ssa-coalesce.c (coalesce_cost_bb): Use to_frequency.
* tree-ssa-ifcombine.c (update_profile_after_ifcombine): Do not update
frequency.
* tree-ssa-loop-im.c (execute_sm_if_changed): Use counts only.
* tree-ssa-loop-ivcanon.c (unloop_loops): Do not update freuqency.
(try_peel_loop): Likewise.
* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Use
to_frequency.
* tree-ssa-loop-manip.c (niter_for_unrolled_loop): Pass -1.
(tree_transform_and_unroll_loop): Do not use frequencies
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations):
Use reliable prediction only.
* tree-ssa-loop-unswitch.c (hoist_guard): Do not use frequencies.
* tree-ssa-sink.c (select_best_block): Use to_frequency.
* tree-ssa-tail-merge.c (replace_block_by): Temporarily disable
probability scaling.
* tree-ssa-threadupdate.c (create_block_for_threading): Do
not update frequency
(any_remaining_duplicated_blocks): Likewise.
(update_profile): Likewise.
(estimated_freqs_path): Delete.
(freqs_to_counts_path): Delete.
(clear_counts_path): Delete.
(ssa_fix_duplicate_block_edges): Likewise.
(duplicate_thread_path): Likewise.
* tree-switch-conversion.c (gen_inbound_check): Use counts.
* tree-tailcall.c (decrease_profile): Do not update frequency.
(eliminate_tail_call): Likewise.
* tree-vect-loop-manip.c (vect_do_peeling): Likewise.
* tree-vect-loop.c (scale_profile_for_vect_loop): Likewise.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
* ubsan.c (ubsan_expand_null_ifn): Update profile.
(ubsan_expand_ptr_ifn): Update profile.
* value-prof.c (gimple_ic): Simplify.
* value-prof.h (gimple_ic): Update prototype.
* ipa-inline-transform.c (inline_transform): Fix scaling conditoins.
* ipa-inline.c (compute_uninlined_call_time): Be sure that
counts are nonzero.
(want_inline_self_recursive_call_p): Likewise.
(resolve_noninline_speculation): Only cummulate defined counts.
(inline_small_functions): Use nonzero_p.
(ipa_inline): Do not access freed node.
Wilco Dijkstra [Fri, 3 Nov 2017 15:20:53 +0000 (15:20 +0000)]
Set default sched pressure algorithm
The Arm backend sets the default sched-pressure algorithm to SCHED_PRESSURE_MODEL.
Benchmarking on AArch64 shows this speeds up floating point performance on SPEC -
eg. CactusBSSN improves by ~16%. The gains are mostly due to less spilling,
so enable this on AArch64 by default.
gcc/
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set PARAM_SCHED_PRESSURE_ALGORITHM to SCHED_PRESSURE_MODEL.
The rs6000 port currently has an *lt0_disi define_insn, setting the DI
result to whether the SI argument is negative or not. It turns out the
generic optimisers cannot always figure out in the other cases either
that this is just a shift for us. This patch adds patterns for all
four SI/DI combinations.