Kelvin Nilsen [Tue, 8 May 2018 17:29:52 +0000 (17:29 +0000)]
extend.texi (PowerPC Built-in Functions): Rename this subsection.
gcc/ChangeLog:
2018-05-08 Kelvin Nilsen <kelvin@gcc.gnu.org>
* doc/extend.texi (PowerPC Built-in Functions): Rename this
subsection.
(Basic PowerPC Built-in Functions): The new name of the
subsection previously known as "PowerPC Built-in Functions".
(Basic PowerPC Built-in Functions Available on all Configurations):
New subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.05): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.06): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.07): New
subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 3.0): New
subsubsection.
Jakub Jelinek [Tue, 8 May 2018 16:17:34 +0000 (18:17 +0200)]
re PR target/85683 (GCC 8 stopped using RMW (Read Modify Write) instructions on x86[_64])
PR target/85683
* config/i386/i386.md: Add peepholes for mem {+,-,&,|,^}= x; mem != 0
after cmpelim optimization.
* gcc.target/i386/pr49095.c: Add -masm=att to dg-options. Add
scan-assembler-times checking that except for [fh]*xor other functions
don't use any load instructions.
Jonathan Wakely [Tue, 8 May 2018 13:05:04 +0000 (14:05 +0100)]
PR libstdc++/85672 #undef _GLIBCXX_USE_FLOAT128 when not supported
Restore the behaviour in GCC 8 and earlier where _GLIBCXX_USE_FLOAT128
is not defined when configure detects support is missing. This avoids
having three states where the macro is either 1, 0, or undefined.
PR libstdc++/85672
* include/Makefile.am [!ENABLE_FLOAT128]: Change c++config.h entry
to #undef _GLIBCXX_USE_FLOAT128 instead of defining it to zero.
* include/Makefile.in: Regenerate.
* include/bits/c++config (_GLIBCXX_USE_FLOAT128): Move definition
within conditional block.
Jakub Jelinek [Tue, 8 May 2018 12:16:19 +0000 (14:16 +0200)]
re PR target/85572 (faster code for absolute value of __v2di)
PR target/85572
* config/i386/i386.c (ix86_expand_sse2_abs): Handle E_V2DImode and
E_V4DImode.
* config/i386/sse.md (abs<mode>2): Use VI_AVX2 iterator instead of
VI1248_AVX512VL_AVX512BW. Handle V2DImode and V4DImode if not
TARGET_AVX512VL using ix86_expand_sse2_abs. Formatting fixes.
* g++.dg/other/sse2-pr85572-1.C: New test.
* g++.dg/other/sse2-pr85572-2.C: New test.
* g++.dg/other/sse4-pr85572-1.C: New test.
* g++.dg/other/avx2-pr85572-1.C: New test.
[arm] PR target/85658 Fix operator precedence errors in parsecpu.awk
There are a number of places in parsecpu.awk where I've managed to get
the operator precedence between ! and 'in' incorrect (! binds more
tightly). In most cases this just makes a consistency test
ineffective, but in a few cases it means we fail to correctly diagnose
errors by the user (for example, when passing an invalid cpu or
architecture name to configure. This patch fixes all the cases I
could find, based on searching for all uses of the two operators in
the same expression. The tweak to the API of check_fpu is to bring it
into line with the other check functions - it now returns the result
rather than printing it directly. The caller now does the printing,
in the same way that the chkarch and chkcpu commands do.
PR target/85658
* config/arm/parsecpu.awk (check_cpu): Fix operator precedence.
(check_arch): Likewise.
(check_fpu): Return the result rather than printing it.
(end arch): Fix operator precedence.
(end cpu): Likewise.
(END): Print the result from check_fpu.
This patch adds SVE patterns that combine a PTRUE-predicated
comparison with a separate AND. The main benefit is for
optimising ANDs with the loop predicate, as in the testcase.
However, one of the potential drawbacks is that it triggers
even for cases in which two naturally-parallel comparisons
are ANDed together. Whether that's a win or a less will
depend on the schedule, but it has the potential to be a win
more often than a loss.
The combine patterns are undeniably ugly. One way of getting
around them would be to allow 1->1 "splits" when combining
2 instructions, as well as 1->2 splits when combining more
than 2 instructions (although that wouldn't really be a split).
Another would be to have a way of defining target-specific
rtx simplifications. branches/ARM/sve-branch has a prototype
implementation of that, but it would need some clean-up before being
ready to submit. It would also be good to make it closer to the
match.pd style.
Until then, I think what the combine patterns are doing is the
"correct" implementation given the current infrastructure.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c: Do not expect any ANDs.
XFAIL the BIC test.
* gcc.target/aarch64/sve/vcond_7.c: New test.
* gcc.target/aarch64/sve/vcond_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r260031
This patch rewrites the SVE comparison handling so that it uses
UNSPEC_MERGE_PTRUE for comparisons that are known to be predicated
on a PTRUE, for consistency with other patterns. Specific unspecs
are then only needed for truly predicated floating-point comparisons,
such as those used in the expansion of UNEQ for flag_trapping_math.
The patch also makes sure that the comparison expanders attach
a REG_EQUAL note to instructions that use UNSPEC_MERGE_PTRUE,
so passes can use that as an alternative to the unspec pattern.
(This happens automatically for optabs. The problem was that
this code emits instruction patterns directly.)
No specific benefit on its own, but it lays the groundwork for
the next patch.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/iterators.md (UNSPEC_COND_LO, UNSPEC_COND_LS)
(UNSPEC_COND_HI, UNSPEC_COND_HS, UNSPEC_COND_UO): Delete.
(SVE_INT_CMP, SVE_FP_CMP): New code iterators.
(cmp_op, sve_imm_con): New code attributes.
(SVE_COND_INT_CMP, imm_con): Delete.
(cmp_op): Remove above unspecs from int attribute.
* config/aarch64/aarch64-sve.md (*vec_cmp<cmp_op>_<mode>): Rename
to...
(*cmp<cmp_op><mode>): ...this. Use UNSPEC_MERGE_PTRUE instead of
comparison-specific unspecs.
(*vec_cmp<cmp_op>_<mode>_ptest): Rename to...
(*cmp<cmp_op><mode>_ptest): ...this and adjust likewise.
(*vec_cmp<cmp_op>_<mode>_cc): Rename to...
(*cmp<cmp_op><mode>_cc): ...this and adjust likewise.
(*vec_fcm<cmp_op><mode>): Rename to...
(*fcm<cmp_op><mode>): ...this and adjust likewise.
(*vec_fcmuo<mode>): Rename to...
(*fcmuo<mode>): ...this and adjust likewise.
(*pred_fcm<cmp_op><mode>): New pattern.
* config/aarch64/aarch64.c (aarch64_emit_unop, aarch64_emit_binop)
(aarch64_emit_sve_ptrue_op, aarch64_emit_sve_ptrue_op_cc): New
functions.
(aarch64_unspec_cond_code): Remove handling of LTU, GTU, LEU, GEU
and UNORDERED.
(aarch64_gen_unspec_cond, aarch64_emit_unspec_cond): Delete.
(aarch64_emit_sve_predicated_cond): New function.
(aarch64_expand_sve_vec_cmp_int): Use aarch64_emit_sve_ptrue_op_cc.
(aarch64_emit_unspec_cond_or): Replace with...
(aarch64_emit_sve_or_conds): ...this new function. Use
aarch64_emit_sve_ptrue_op for the individual comparisons and
aarch64_emit_binop to OR them together.
(aarch64_emit_inverted_unspec_cond): Replace with...
(aarch64_emit_sve_inverted_cond): ...this new function. Use
aarch64_emit_sve_ptrue_op for the comparison and
aarch64_emit_unop to invert the result.
(aarch64_expand_sve_vec_cmp_float): Update after the above
changes. Use aarch64_emit_sve_ptrue_op for native comparisons.
sve/vcond_6.c was effectively testing a three-input logical operation,
since the result of BINOP needed to be ANDed with the loop predicate
before loading src[i]. This patch makes it really test a binary
operation instead. A later patch will add (and optimise) the
three-operand case.
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_6.c (LOOP): Unconditionally
load from src[i].
Thomas Koenig [Tue, 8 May 2018 07:47:19 +0000 (07:47 +0000)]
re PR fortran/54613 ([F08] Add FINDLOC plus support MAXLOC/MINLOC with KIND=/BACK=)
2018-05-08 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/54613
* check.c (gfc_check_minmaxloc): Remove error for BACK not being
implemented. Use gfc_logical_4_kind for BACK.
* simplify.c (min_max_choose): Add optional argument back_val.
Handle it.
(simplify_minmaxloc_to_scalar): Add argument back_val. Pass
back_val to min_max_choose.
(simplify_minmaxloc_to_nodim): Likewise.
(simplify_minmaxloc_to_array): Likewise.
(gfc_simplify_minmaxloc): Add argument back, handle it.
Pass back_val to specific simplification functions.
(gfc_simplify_minloc): Remove ATTRIBUTE_UNUSED from argument back,
pass it on to gfc_simplify_minmaxloc.
(gfc_simplify_maxloc): Likewise.
* trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc): Adjust
comment. If BACK is true, use greater or equal (or lesser or
equal) insteal of greater (or lesser). Mark the condition of
having found a value which exceeds the limit as unlikely.
Jason Merrill [Mon, 7 May 2018 23:50:16 +0000 (19:50 -0400)]
PR c++/85646 - lambda visibility.
* decl2.c (determine_visibility): Don't mess with template arguments
from the containing scope.
(vague_linkage_p): Check DECL_ABSTRACT_P before looking at a 'tor
thunk.
Luis Machado [Mon, 7 May 2018 15:47:14 +0000 (15:47 +0000)]
re PR bootstrap/85681 (r259995 breaks bootstrap on x86_64-*-freebsd)
2018-05-07 Luis Machado <luis.machado@linaro.org>
PR bootstrap/85681
Revert:
2018-05-07 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
2018-05-07 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
Luis Machado [Mon, 7 May 2018 14:12:54 +0000 (14:12 +0000)]
Introduce prefetch-dynamic-strides option.
The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.
Currently we prefetch these references if the pass thinks there is benefit to
doing so. But, since this is all based on heuristics, it's not always the case
that we end up with better performance.
For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.
This also aligns GCC with what LLVM does for Falkor.
Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.
2018-05-07 Luis Machado <luis.machado@linaro.org>
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
prefetch-dynamic-strides setting.
Luis Machado [Mon, 7 May 2018 14:08:55 +0000 (14:08 +0000)]
Introduce prefetch-minimum stride option
This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:
* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.
* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.
I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.
The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.
2018-05-07 Luis Machado <luis.machado@linaro.org>
Introduce option to limit software prefetching to known constant
strides above a specific threshold with the goal of preventing
conflicts with a hardware prefetcher.
gcc/
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
Tom de Vries [Mon, 7 May 2018 11:33:45 +0000 (11:33 +0000)]
[openacc, testsuite] Allow installed testing of libgomp to find gomp-constants.h
2018-05-07 Tom de Vries <tom@codesourcery.com>
PR testsuite/85677
* testsuite/lib/libgomp.exp (libgomp_init): Move inclusion of top-level
include directory in ALWAYS_CFLAGS out of $blddir != "" condition.
re PR fortran/85507 (ICE in gfc_dep_resolver, at fortran/dependency.c:2258)
gcc/fortran/ChangeLog:
2018-05-06 Andre Vehreschild <vehre@gcc.gnu.org>
PR fortran/85507
* dependency.c (gfc_dep_resolver): Revert looking at coarray dimension
introduced by r259385.
* trans-intrinsic.c (conv_caf_send): Always report a dependency for
same variables in coarray assignments.
Roland McGrath [Sat, 5 May 2018 23:35:25 +0000 (23:35 +0000)]
PR other/77609: Let the assembler choose ELF section types for miscellaneous named sections
gcc/
PR other/77609
* varasm.c (default_section_type_flags): Set SECTION_NOTYPE for
any section for which we don't know a specific type it should have,
regardless of name. Previously this was done only for the exact
names ".init_array", ".fini_array", and ".preinit_array".
(default_elf_asm_named_section): Add comment about
relationship with default_section_type_flags and SECTION_NOTYPE.
(get_section): Don't consider it a type conflict if one side has
SECTION_NOTYPE and the other doesn't, as long as neither has the
SECTION_BSS et al used in the default_section_type_flags logic.
Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
is given, these optimizations are disabled.
With this flag, gccbrig can generate GENERIC that assumes we are
targeting a phsa-runtime based implementation, which allows us
to expose the work-item context accesses to retrieve WI IDs etc.
which helps optimizers.
First optimization that takes advantage of this is to get rid of
the setworkitemid calls whenever we have non-inlined calls that
use IDs internally.
Other optimizations added in this commit:
- expand absoluteid to similar level of simplicity as workitemid.
At the moment absoluteid is the best indexing ID to end up with
WG vectorization.
- propagate ID variables closer to their uses. This is mainly
to avoid known useless casts, which confuse at least scalar
evolution analysis.
- use signed long long for storing IDs. Unsigned integers have
defined wraparound semantics, which confuse at least scalar
evolution analysis, leading to unvectorizable WI loops.
- also refactor some BRIG function generation helpers to brig_function.
- no point in having the wi-loop as a for-loop. It's really
a do...while and SCEV can analyze it just fine still.
- add consts to ptrs etc. in BRIG builtin defs.
Improves optimization opportunities.
- add qualifiers to generated function parameters.
Const and restrict on the hidden local/private pointers,
the arg buffer and the context pointer help some optimizations.
HSA assumes all program scope HSAIL symbols can be queried from
the host runtime API, thus cannot be removed by the IPA.
Getting some inlining happening in the finalized binary required:
* explicitly marking the 'prog' scope functions and the launcher
function "externally_visible" to avoid the inliner removing it
* also the host_def ptr is set to externally visible, otherwise
IPA assumes it's never set
* adding the 'inline' keyword to functions to enable inlining,
otherwise GCC defaults to replaceable functions (one can link
over the previous one) which cannot be inlined
* replacing all calls to declarations with calls to definitions to
enable the inliner to find the definition
* to fix missing hidden argument types in the generated functions.
These were ignored silently until GCC started to be able to
inline calls to such functions.
* do not gimplify before fixing the call targets. Otherwise the
calls get detached and the definitions are not found. The reason
why this happens is not clear, but gimplifying only after call
target decl->def conversion fixes this.
libgo: fix for unaligned read in go-unwind.c's read_encoded_value()
Change code to work properly reading unaligned data on architectures
that don't support unaliged reads. This fixes a regression (broke
Solaris/sparc) introduced in https://golang.org/cl/90235.
Alan Modra [Fri, 4 May 2018 13:47:11 +0000 (23:17 +0930)]
libffi PowerPC64 ELFv1 fp arg fixes
The ELFv1 ABI says: "Single precision floating point values are mapped
to the second word in a single doubleword" and also "Floating point
registers f1 through f13 are used consecutively to pass up to 13
floating point values, one member aggregates passed by value
containing a floating point value, and to pass complex floating point
values".
libffi wasn't expecting float args in the second word, and wasn't
passing one member aggregates in fp registers. This patch fixes those
problems, making use of the existing ELFv2 homogeneous aggregate
support since a one element fp struct is a special case of an
homogeneous aggregate.
I've also set a flag when returning pointers that might be used one
day. This is just a tidy since the ppc64 assembly support code
currently doesn't test FLAG_RETURNS_64BITS for integer types..
* src/powerpc/ffi_linux64.c (discover_homogeneous_aggregate):
Compile for ELFv1 too, handling single element aggregates.
(ffi_prep_cif_linux64_core): Call discover_homogeneous_aggregate
for ELFv1. Set FLAG_RETURNS_64BITS for FFI_TYPE_POINTER return.
(ffi_prep_args64): Call discover_homogeneous_aggregate for ELFv1,
and handle single element structs containing float or double
as if the element wasn't wrapped in a struct. Store floats in
second word of doubleword slot when big-endian.
(ffi_closure_helper_LINUX64): Similarly.
This removes the special Xilinx FP support. It was deprecated in
GCC 8.
After this patch all of TARGET_{DOUBLE,SINGLE}_FLOAT,
TARGET_{DF,SF}_INSN, and TARGET_{DF,SF}_FPR are replaced by
TARGET_HARD_FLOAT. Also the fp_type attribute is deleted.
Richard Biener [Fri, 4 May 2018 07:30:50 +0000 (07:30 +0000)]
re PR tree-optimization/85627 (ICE in update_phi_components in tree-complex.c)
2018-05-04 Richard Biener <rguenther@suse.de>
PR middle-end/85627
* tree-complex.c (update_complex_assignment): We are always in SSA form.
(expand_complex_div_wide): Likewise.
(expand_complex_operations_1): Likewise.
(expand_complex_libcall): Preserve EH info of the original stmt.
(tree_lower_complex): Handle removed blocks.
* tree.c (build_common_builtin_nodes): Do not set ECF_NOTRHOW
on complex multiplication and division libcall builtins.
Richard Biener [Fri, 4 May 2018 07:25:54 +0000 (07:25 +0000)]
re PR lto/85574 (LTO bootstapped binaries differ)
2018-05-04 Richard Biener <rguenther@suse.de>
PR middle-end/85574
* fold-const.c (negate_expr_p): Restrict negation of operand
zero of a division to when we know that can happen without
overflow.
(fold_negate_expr_1): Likewise.
* gcc.dg/torture/pr85574.c: New testcase.
* gcc.dg/torture/pr57656.c: Use dg-additional-options.
Jakub Jelinek [Fri, 4 May 2018 07:19:45 +0000 (09:19 +0200)]
re PR tree-optimization/85466 (Performance is slow when doing 'branchless' conditional style math operations)
PR libstdc++/85466
* real.h (real_nextafter): Declare.
* real.c (real_nextafter): New function.
* fold-const-call.c (fold_const_nextafter): New function.
(fold_const_call_sss): Call it for CASE_CFN_NEXTAFTER and
CASE_CFN_NEXTTOWARD.
(fold_const_call_1): For CASE_CFN_NEXTTOWARD call fold_const_call_sss
even when arg1_mode is different from arg0_mode.
* gcc.dg/nextafter-1.c: New test.
* gcc.dg/nextafter-2.c: New test.
* gcc.dg/nextafter-3.c: New test.
* gcc.dg/nextafter-4.c: New test.
In https://golang.org/cl/111097 the gc version of cmd/go was updated
to include some gofrontend-specific changes. The gofrontend code
already has different versions of those changes; this CL makes the
gofrontend match the upstream code.
Jonathan Wakely [Thu, 3 May 2018 22:58:43 +0000 (23:58 +0100)]
PR libstdc++/82644 define TR1 hypergeometric functions in strict modes
Following a recent change for PR 82644 the non-standard hypergeomtric
functions are not defined by <cmath> when __STRICT_ANSI__ is defined
(e.g. for -std=c++17, or -std=c++14 -D__STDCPP_WANT_MATH_SPEC_FUNCS__).
That caused errors in <tr1/cmath> because the using-declarations for
tr1::hyperg et al are invalid in strict modes.
The solution is to define the TR1 hypergeometric functions inline in
<tr1/cmath> if __STRICT_ANSI__ is defined.
Jakub Jelinek [Thu, 3 May 2018 18:59:39 +0000 (20:59 +0200)]
re PR target/85530 ([X86] _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 not implemented)
PR target/85530
* config/i386/avx512fintrin.h (_mm512_mullox_epi64,
_mm512_mask_mullox_epi64): New intrinsics.
* gcc.target/i386/avx512f-vpmullq-1.c: New test.
* gcc.target/i386/avx512f-vpmullq-2.c: New test.
* gcc.target/i386/avx512dq-vpmullq-3.c: New test.
* gcc.target/i386/avx512dq-vpmullq-4.c: New test.
Jonathan Wakely [Thu, 3 May 2018 18:58:00 +0000 (19:58 +0100)]
PR libstdc++/85632 fix wraparound in filesystem::space
On 32-bit targets any values over 4GB would wrap and produce the wrong
result.
PR libstdc++/85632 use uintmax_t for arithmetic
* src/filesystem/ops.cc (experimental::filesystem::space): Perform
arithmetic in result type.
* src/filesystem/std-ops.cc (filesystem::space): Likewise.
* testsuite/27_io/filesystem/operations/space.cc: Check total capacity
is greater than free space.
* testsuite/experimental/filesystem/operations/space.cc: New.
compiler: avoid crashing on invalid non-integer array length
Tweak the array type checking code to avoid crashing on array types
whose length expressions are explicit non-integer types (for example,
"float64(10)"). If such constructs are seen, issue an "invalid array
bound" error.
The standard requires that the std::thread constructor is constrained so
it can't be called with a first argument of type std::thread. The
current implementation only meets that requirement if the constructor is
called with one argument, by using deleted overloads. This uses an
enable_if constraint to enforce the requirement for any number of
arguments.
Also add a static assertion to give a more readable error for invalid
arguments that cannot be invoked. Also simplify _Invoker to reduce the
error cascade for ill-formed instantiations with non-invocable
arguments.
PR libstdc++/84535
* include/std/thread (thread::__not_same): New SFINAE helper.
(thread::thread(_Callable&&, _Args&&...)): Add SFINAE constraint that
first argument is not a std::thread. Add static assertion to check
INVOKE expression is valid.
(thread::thread(thread&), thread::thread(const thread&&)): Remove.
(thread::_Invoke::_M_invoke, thread::_Invoke::operator()): Use
__invoke_result for return types and remove exception specifications.
* testsuite/30_threads/thread/cons/84535.cc: New.
Tom de Vries [Thu, 3 May 2018 13:47:14 +0000 (13:47 +0000)]
[testsuite] Add scan-offload-tree-dump
2018-05-03 Tom de Vries <tom@codesourcery.com>
PR testsuite/85106
* lib/scanoffloadtree.exp: New file.
* testsuite/lib/libgomp-dg.exp (libgomp-dg-test): Add save-temps to
extra_tool_flags if it contains an -foffload=-fdump-* flag.
* testsuite/lib/libgomp.exp: Include scanoffloadtree.exp.
* testsuite/libgomp.oacc-c/vec.c: Use scan-offload-tree-dump.
* doc/sourcebuild.texi (Commands for use in dg-final, Scan optimization
dump files): Add offload-tree.
Richard Biener [Thu, 3 May 2018 13:20:02 +0000 (13:20 +0000)]
re PR tree-optimization/85615 (ICE at -O2 and above on valid code on x86_64-linux-gnu: in dfs_enumerate_from, at cfganal.c:1197)
2018-05-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/85615
* tree-ssa-threadupdate.c (thread_block_1): Only allow exits
to loops not nested in BBs loop father to avoid creating multi-entry
loops.
Kyrylo Tkachov [Thu, 3 May 2018 12:59:43 +0000 (12:59 +0000)]
[tree-complex.c] PR tree-optimization/70291: Inline floating-point complex multiplication more aggressively
We can improve the performance of complex floating-point multiplications by inlining the expansion a bit more aggressively.
We can inline complex x = a * b as:
x = (ar*br - ai*bi) + i(ar*bi + br*ai);
if (isunordered (__real__ x, __imag__ x))
x = __muldc3 (a, b); //Or __mulsc3 for single-precision
That way the common case where no NaNs are produced we can avoid the libgcc call and fall back to the
NaN handling stuff in libgcc if either components of the expansion are NaN.
The implementation is done in expand_complex_multiplication in tree-complex.c and the above expansion
will be done when optimising for -O1 and greater and when not optimising for size.
At -O0 and -Os the single call to libgcc will be emitted.
For the code:
__complex double
foo (__complex double a, __complex double b)
{
return a * b;
}
We will now emit at -O2 for aarch64:
foo:
fmul d16, d1, d3
fmul d6, d1, d2
fnmsub d5, d0, d2, d16
fmadd d4, d0, d3, d6
fcmp d5, d4
bvs .L8
fmov d1, d4
fmov d0, d5
ret
.L8:
stp x29, x30, [sp, -16]!
mov x29, sp
bl __muldc3
ldp x29, x30, [sp], 16
ret
Instead of just a branch to __muldc3.
PR tree-optimization/70291
* tree-complex.c (expand_complex_libcall): Add type, inplace_p
arguments. Change return type to tree. Emit libcall as a new
statement rather than replacing existing one when inplace_p is true.
(expand_complex_multiplication_components): New function.
(expand_complex_multiplication): Expand floating-point complex
multiplication using the above.
(expand_complex_division): Rename inner_type parameter to type.
Update expand_complex_libcall call-site.
(expand_complex_operations_1): Update expand_complex_multiplication
and expand_complex_division call-sites.
* gcc.dg/complex-6.c: New test.
* gcc.dg/complex-7.c: Likewise.
Jakub Jelinek [Thu, 3 May 2018 09:29:39 +0000 (11:29 +0200)]
re PR other/85622 (gcc-8.1.0/NEWS says it's not released yet)
PR other/85622
* gcc_release: For -f, verify contrib/gennews has the major version
pages listed and both index.html and changes.html have been updated
for the new release.
These tests used to be disabled in the gofrontend since the go tool
didn't support build IDs for the gofrontend. It does now, so enable
the tests again.
This patch adds explicit references to various types and constants
defined by the header files included by sysinfo.c (used to drive the
generation of gen-sysinfo.go as part of the libgo build via the GCC
"-fdump-go-spec" option).
The intent is to enable clients to gather the same info generated by
"-fdump-go-spec" by instead reading the generated DWARF from a
sysinfo.o object file compiled with "-g". Some compilers (notably
clang) try to omit DWARF records for a given type unless there is an
explicit use of it in the translation unit; the additional references
are to insure that everything we want to see in the DWARF shows up.