Jonathan Wakely [Mon, 21 May 2018 12:52:44 +0000 (13:52 +0100)]
Fix std::filesystem::absolute for empty paths
* src/filesystem/std-ops.cc (absolute): Report an error for empty
paths.
(weakly_canonical(const path&)): Do not call canonical on empty path.
(weakly_canonical(const path&, error_code&)): Likewise.
* testsuite/27_io/filesystem/operations/absolute.cc: Check for errors.
Kyrylo Tkachov [Mon, 21 May 2018 11:21:07 +0000 (11:21 +0000)]
[AArch64] Implement usadv16qi and ssadv16qi standard names
This patch implements the usadv16qi and ssadv16qi standard names.
See the thread at on gcc@gcc.gnu.org [1] for background.
The V16QImode variant is important to get right as it is the most commonly used pattern:
reducing vectors of bytes into an int.
The midend expects the optab to compute the absolute differences of operands 1 and 2 and
reduce them while widening along the way up to SImode. So the inputs are V16QImode and
the output is V4SImode.
I've tried out a few different strategies for that, the one I settled with is to emit:
UABDL2 tmp.8h, op1.16b, op2.16b
UABAL tmp.8h, op1.16b, op2.16b
UADALP op3.4s, tmp.8h
To work through the semantics let's say operands 1 and 2 are:
op1 { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 }
op2 { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 }
op3 { c0, c1, c2, c3 }
The UABDL2 takes the upper V8QI elements, computes their absolute differences, widens them and stores them into the V8HImode tmp:
The UABAL after that takes the lower V8QI elements, computes their absolute differences, widens them and accumulates them into the V8HImode tmp from the previous step:
Finally the UADALP does a pairwise widening reduction and accumulation into the V4SImode op3:
op3 { c0+ABS(a[8]-b[8])+ABS(a[0]-b[0])+ABS(a[9]-b[9])+ABS(a[1]-b[1]), c1+ABS(a[10]-b[10])+ABS(a[2]-b[2])+ABS(a[11]-b[11])+ABS(a[3]-b[3]), c2+ABS(a[12]-b[12])+ABS(a[4]-b[4])+ABS(a[13]-b[13])+ABS(a[5]-b[5]), c3+ABS(a[14]-b[14])+ABS(a[6]-b[6])+ABS(a[15]-b[15])+ABS(a[7]-b[7]) }
(sorry for the text dump)
Remember, according to [1] the exact reduction sequence doesn't matter (for integer arithmetic at least).
I've considered other sequences as well (thanks Wilco), for example
* UABD + UADDLP + UADALP
* UABLD2 + UABDL + UADALP + UADALP
I ended up settling in the sequence in this patch as it's short (3 instructions) and in the future we can potentially
look to optimise multiple occurrences of these into something even faster (for example accumulating into H registers for longer
before doing a single UADALP in the end to accumulate into the final S register).
If your microarchitecture has some some strong preferences for a particular sequence, please let me know or, even better, propose a patch
to parametrise the generation sequence by code (or the appropriate RTX cost).
This expansion allows the vectoriser to avoid unpacking the bytes in two steps and performing V4SI arithmetic on them.
So, for the code:
unsigned char pix1[N], pix2[N];
int foo (void)
{
int i_sum = 0;
int i;
for (i = 0; i < 16; i++)
i_sum += __builtin_abs (pix1[i] - pix2[i]);
So I expect this new expansion to be better than the status quo in any case.
Bootstrapped and tested on aarch64-none-linux-gnu.
This gives about 8% on 525.x264_r from SPEC2017 on a Cortex-A72.
* config/aarch64/aarch64.md ("unspec"): Define UNSPEC_SABAL,
UNSPEC_SABDL2, UNSPEC_SADALP, UNSPEC_UABAL, UNSPEC_UABDL2,
UNSPEC_UADALP values.
* config/aarch64/iterators.md (ABAL): New int iterator.
(ABDL2): Likewise.
(ADALP): Likewise.
(sur): Add mappings for the above.
* config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>_3):
New define_insn.
(aarch64_<sur>abal<mode>_4): Likewise.
(aarch64_<sur>adalp<mode>_3): Likewise.
(<sur>sadv16qi): New define_expand.
Tamar Christina [Mon, 21 May 2018 10:33:30 +0000 (10:33 +0000)]
Add missing AArch64 NEON instrinctics for Armv8.2-a to Armv8.4-a
This patch adds the missing neon intrinsics for all 128 bit vector Integer modes for the
three-way XOR and negate and xor instructions for Arm8.2-a to Armv8.4-a.
gcc/
2018-05-21 Tamar Christina <tamar.christina@arm.com>
Alexey Brodkin [Mon, 21 May 2018 09:56:57 +0000 (09:56 +0000)]
[ARC] Add multilib support for linux targets
We used to build baremetal (AKA Elf32) multilibbed toolchains for years
now but never made that for Linux targets since there were problems with
uClibc n multilib setup. Now with help of Crosstool-NG it is finally
possible to create uClibc-based multilibbed toolchains and so we add
relevant CPUs for multilib in case of configuration for "arc*-*-linux*".
This will be essentially useful for glibc-based multilibbbed toolchains
in the future.
Janus Weil [Mon, 21 May 2018 06:45:55 +0000 (08:45 +0200)]
re PR fortran/85841 ([F2018] reject deleted features)
2018-05-21 Janus Weil <janus@gcc.gnu.org>
PR fortran/85841
* libgfortran.h: New macros GFC_STD_OPT_*.
* error.c (notify_std_msg): New function.
(gfc_notify_std): Adjust such that it can handle combinations of
GFC_STD_* flags in the 'std' argument, not just a single one.
* match.c (match_arithmetic_if, gfc_match_if): Reject arithmetic if
in Fortran 2018.
(gfc_match_stopcode): Use GFC_STD_OPT_* macros.
* options.c (set_default_std_flags): Warn for F2018 deleted features
by default.
(gfc_handle_option): F2018 deleted features are allowed in earlier
standards.
* symbol.c (gfc_define_st_label, gfc_reference_st_label): Reject
nonblock do constructs in Fortran 2018.
Paul Thomas [Sun, 20 May 2018 10:54:24 +0000 (10:54 +0000)]
re PR fortran/82275 (gfortran rejects valid & accepts invalid reference to dimension-remapped type SELECT TYPE selector)
2018-05-20 Paul Thomas <pault@gcc.gnu.org>
PR fortran/82275
Correcting ChangeLogs
* match.c (gfc_match_type_spec): Go through the array ref and
decrement 'rank' for every dimension that is an element.
2018-05-20 Paul Thomas <pault@gcc.gnu.org>
PR fortran/82275
Correcting ChangeLogs
* gfortran.dg/select_type_42.f90: New test.
Paul Thomas [Sun, 20 May 2018 10:08:24 +0000 (10:08 +0000)]
re PR fortran/80657 (Loop in character function declaration)
2018-05-19 Paul Thomas <pault@gcc.gnu.org>
PR fortran/80657
* resolve.c (flag_fn_result_spec): Use the 'sym' argument to
test for self refs to the function result in the character len
expression. If a self reference is found, emit an error and
return true.
(resolve_fntype): Use the function symbol in the calls to the
above.
2018-05-19 Paul Thomas <pault@gcc.gnu.org>
PR fortran/80657
* gfortran.dg/char_result_18.f90: New test.
Paul Thomas [Sun, 20 May 2018 10:04:46 +0000 (10:04 +0000)]
re PR fortran/49636 ([F03] ASSOCIATE construct confused with slightly complicated case)
2018-05-20 Paul Thomas <pault@gcc.gnu.org>
PR fortran/49636
* trans-array.c (gfc_get_array_span): Renamed from
'get_array_span'.
(gfc_conv_expr_descriptor): Change references to above.
* trans-array.h : Add prototype for 'gfc_get_array_span'.
* trans-stmt.c (trans_associate_var): If the associate name is
a subref array pointer, use gfc_get_array_span for the span.
2018-05-20 Paul Thomas <pault@gcc.gnu.org>
PR fortran/49636
* gfortran.dg/associate_38.f90: New test.
Paul Thomas [Sun, 20 May 2018 09:59:54 +0000 (09:59 +0000)]
re PR fortran/82923 (Automatic allocation of deferred length character using function result)
2018-05-19 Paul Thomas <pault@gcc.gnu.org>
PR fortran/82923
PR fortran/66694
PR fortran/82617
* trans-array.c (gfc_alloc_allocatable_for_assignment): Set the
charlen backend_decl of the rhs expr to ss->info->string_length
so that the value in the current scope is used.
2018-05-19 Paul Thomas <pault@gcc.gnu.org>
PR fortran/82923
* gfortran.dg/allocate_assumed_charlen_4.f90: New test. Note
that the patch fixes PR66694 & PR82617, although the testcases
are not explicitly included.
Uros Bizjak [Sat, 19 May 2018 13:43:06 +0000 (15:43 +0200)]
i386.md (rex64namesuffix): New mode attribute.
* config/i386/i386.md (rex64namesuffix): New mode attribute.
* config/i386/sse.md (sse_cvtsi2ss<rex64namesuffix><round_name>):
Merge insn pattern from sse_cvtsi2ss<round_name> and
sse_cvtsi2ssq<round_name> using SWI48 mode iterator.
(sse_cvtss2si<rex64namesuffix><round_name>): Merge insn pattern
from sse_cvtss2si<round_name> and sse_cvtss2siq<round_name>
using SWI48 mode iterator.
(sse_cvtss2si<rex64namesuffix>_2): Merge insn pattern from
sse_cvtss2si_2 and sse_cvtss2siq_2 using SWI48 mode iterator.
(sse_cvttss2si<rex64namesuffix><round_saeonly_name>): Merge insn
pattern from sse_cvttss2si<round_saeonly_name>
and sse_cvttss2siq<round_saeonly_name> using SWI48 mode iterator.
(avx512f_vcvtss2usi<rex64namesuffix><round_name>): Merge insn pattern
from avx512f_vcvtss2usi<round_name> and avx512f_vcvtss2usiq<round_name>
using SWI48 mode iterator.
(avx512f_vcvttss2usi<rex64namesuffix><round_saeonly_name>): Merge
insn pattern from avx512f_vcvttss2usi<round_saeonly_name> and
avx512f_vcvttss2usiq<round_saeonly_name> using SWI48 mode iterator.
(avx512f_vcvtsd2usi<rex64namesuffix><round_name>): Merge insn pattern
from avx512f_vcvtsd2usi<round_name> and avx512f_vcvtsd2usiq<round_name>
using SWI48 mode iterator.
(avx512f_vcvttsd2usi<rex64namesuffix><round_saeonly_name>): Merge
insn pattern from avx512f_vcvttsd2usi<round_saeonly_name> and
avx512f_vcvttsd2usiq<round_saeonly_name> using SWI48 mode iterator.
(sse2_cvtsd2si<rex64namesuffix><round_name>): Merge insn pattern from
sse2_cvtsd2si<round_name> and sse2_cvtsd2siq<round_name> using
SWI48 mode iterator.
(sse2_cvtsd2si<rex64namesuffix>_2): Merge insn pattern from
sse2_cvtsd2si_2 and sse2_cvtsd2siq_2 using SWI48 mode iterator.
(sse_cvttsd2si<rex64namesuffix><round_saeonly_name>): Merge insn
pattern from sse_cvttsd2si<round_saeonly_name>
and sse_cvttsd2siq<round_saeonly_name> using SWI48 mode iterator.
Paul Thomas [Sat, 19 May 2018 10:49:50 +0000 (10:49 +0000)]
re PR fortran/82923 (Automatic allocation of deferred length character using function result)
2018-05-19 Paul Thomas <pault@gcc.gnu.org>
PR fortran/82923
PR fortran/66694
PR fortran/82617
* trans-array.c (gfc_alloc_allocatable_for_assignment): Set the
charlen backend_decl of the rhs expr to ss->info->string_length
so that the value in the current scope is used.
2018-05-19 Paul Thomas <pault@gcc.gnu.org>
PR fortran/82923
* gfortran.dg/allocate_assumed_charlen_4.f90: New test. Note
that the patch fixes PR66694 & PR82617, although the testcases
are not explicitly included.
Jonathan Wakely [Sat, 19 May 2018 02:03:42 +0000 (03:03 +0100)]
Fix std::codecvt_utf8<wchar_t> for Mingw
* src/c++11/codecvt.cc (__codecvt_utf8_base<wchar_t>::do_in)
[__SIZEOF_WCHAR_T__==2 && __BYTE_ORDER__!=__ORDER_BIG_ENDIAN__]: Set
little_endian element in bitmask.
* testsuite/22_locale/codecvt/codecvt_utf8/69703.cc: Run all tests.
* testsuite/22_locale/codecvt/codecvt_utf8/wchar_t/1.cc: New.
Jonathan Wakely [Fri, 18 May 2018 16:14:04 +0000 (17:14 +0100)]
PR libstdc++/85098 add missing definitions for static constants
In C++11 and C++14 any odr-use of these constants requires a definition
at namespace-scope. In C++17 they are implicitly inline and so the
namespace-scope redeclarations are redundant (and allowing them is
deprecated).
PR libstdc++/85098
* include/bits/regex.h [__cplusplus < 201703L] (basic_regex::icase)
(basic_regex::nosubs, basic_regex::optimize, basic_regex::collate)
(basic_regex::ECMAScript, basic_regex::basic, basic_regex::extended)
(basic_regex::awk, basic_regex::grep, basic_regex::egrep): Add
definitions.
* include/bits/regex_automaton.h (_NFA::_M_insert_state): Adjust
whitespace.
* include/bits/regex_compiler.tcc (__INSERT_REGEX_MATCHER): Add
braces around body of do-while.
* testsuite/28_regex/basic_regex/85098.cc: New
Kyrylo Tkachov [Fri, 18 May 2018 13:10:36 +0000 (13:10 +0000)]
[arm][2/2] Remove support for -march=armv3 and older
We deprecated architecture versions earlier than Armv4T in GCC 6 [1].
This patch removes support for architectures lower than Armv4.
That is the -march values armv2, armv2a, armv3, armv3m are removed
with this patch. I did not remove armv4 because it's a bit more
involved code-wise and there has been some pushback on the implications
for -mcpu=strongarm support.
Removing armv3m and earlier though is pretty straightforward.
This allows us to get rid of the armv3m and mode32 feature bits
in arm-cpus.in as they can be assumed to be universally available.
Consequently the mcpu values arm2, arm250, arm3, arm6, arm60, arm600, arm610, arm620, arm7, arm7d, arm7di, arm70, arm700, arm700i, arm710, arm720, arm710c, arm7100, arm7500, arm7500fe, arm7m, arm7dm, arm7dm are now also removed.
Bootstrapped and tested on arm-none-linux-gnueabihf and on arm-none-eabi
with an aprofile multilib configuration (which builds quite a lot of library
configurations).
Kyrylo Tkachov [Fri, 18 May 2018 13:08:16 +0000 (13:08 +0000)]
[arm][1/2] Remove support for deprecated -march=armv5 and armv5e
The -march=armv5 and armv5e options have been deprecated in GCC 7 [1].
This patch removes support for them.
It's mostly mechanical stuff. The functionality that was previously
gated on arm_arch5 is now gated on arm_arch5t and the functionality
that was gated on arm_arch5e is now gated on arm_arch5te.
A path in TARGET_OS_CPP_BUILTINS for VxWorks is now unreachable and
therefore is deleted.
References to armv5 and armv5e are deleted/updated throughout the
source tree and testsuite.
Bootstrapped and tested on arm-none-linux-gnueabihf.
Also built a cc1 for arm-wrs-vxworks as a sanity check.
* config/arm/arm-cpus.in (armv5, armv5e): Delete features.
(armv5t, armv5te): New features.
(ARMv5, ARMv5e): Delete fgroups.
(ARMv5t, ARMv5te): Adjust for above changes.
(ARMv6m): Likewise.
(armv5, armv5e): Delete arches.
* config/arm/arm.md (*call_reg_armv5): Use arm_arch5t instead of
arm_arch5.
(*call_reg_arm): Likewise.
(*call_value_reg_armv5): Likewise.
(*call_value_reg_arm): Likewise.
(*call_symbol): Likewise.
(*call_value_symbol): Likewise.
(*sibcall_insn): Likewise.
(*sibcall_value_insn): Likewise.
(clzsi2): Likewise.
(prefetch): Likewise.
(define_split and define_peephole2 dependent on arm_arch5):
Likewise.
* config/arm/arm.h (TARGET_LDRD): Use arm_arch5te instead of
arm_arch5e.
(TARGET_ARM_QBIT): Likewise.
(TARGET_DSP_MULTIPLY): Likewise.
(enum base_architecture): Delete BASE_ARCH_5, BASE_ARCH_5E.
(arm_arch5, arm_arch5e): Delete.
(arm_arch5t, arm_arch5te): Declare.
* config/arm/arm.c (arm_arch5, arm_arch5e): Delete.
(arm_arch5t): Declare.
(arm_option_reconfigure_globals): Update for the above.
(arm_options_perform_arch_sanity_checks): Update comment, replace
use of arm_arch5 with arm_arch5t.
(use_return_insn): Likewise.
(arm_emit_call_insn): Likewise.
(output_return_instruction): Likewise.
(arm_final_prescan_insn): Likewise.
(arm_coproc_builtin_available): Likewise.
* config/arm/arm-c.c (arm_cpu_builtins): Replace arm_arch5 and
arm_arch5e with arm_arch5t and arm_arch5te.
* config/arm/arm-protos.h (arm_arch5, arm_arch5e): Delete.
(arm_arch5t, arm_arch5te): Declare.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-arm-elf: Remove references to armv5, armv5e.
* config/arm/t-multilib: Likewise.
* config/arm/thumb1.md (*call_reg_thumb1_v5): Check arm_arch5t
instead of arm_arch5.
(*call_reg_thumb1): Likewise.
(*call_value_reg_thumb1_v5): Likewise.
(*call_value_reg_thumb1): Likewise.
* config/arm/vxworks.h (TARGET_OS_CPP_BUILTINS): Remove now
unreachable path.
* doc/invoke.texi (ARM Options): Remove references to armv5, armv5e.
* gcc.target/arm/pr40887.c: Update comment.
* lib/target-supports.exp: Don't generate effective target checks
and related helpers for armv5. Update comment.
* gcc.target/arm/armv5_thumb_isa.c: Delete.
* gcc.target/arm/di-longlong64-sync-withhelpers.c: Update effective
target check and options.
* config/arm/libunwind.S: Update comment relating to armv5.
Martin Liska [Fri, 18 May 2018 13:06:31 +0000 (15:06 +0200)]
gcov: add new option -t that prints output to stdout (PR gcov-profile/84846).
2018-05-18 Martin Liska <mliska@suse.cz>
PR gcov-profile/84846
* gcov.c (print_usage): Add new -t option.
(process_args): Handle the option.
(generate_results): Use stdout as output when requested by
the option.
2018-05-18 Martin Liska <mliska@suse.cz>
PR gcov-profile/84846
* doc/gcov.texi: Document -t option of gcov tool.
Toon Moene [Fri, 18 May 2018 09:07:39 +0000 (09:07 +0000)]
invoke.texi: Move -floop-unroll-and-jam documentation directly after that of -floop-interchange.
2018-05-18 Toon Moene <toon@moene.org>
* doc/invoke.texi: Move -floop-unroll-and-jam documentation
directly after that of -floop-interchange. Indicate that both
options are enabled by default when specifying -O3.
Kyrylo Tkachov [Fri, 18 May 2018 08:52:30 +0000 (08:52 +0000)]
[AArch64] Unify vec_set patterns, support floating-point vector modes properly
We've a deficiency in our vec_set family of patterns.
We don't support directly loading a vector lane using LD1 for V2DImode and all the vector floating-point modes.
We do do it correctly for the other integer vector modes (V4SI, V8HI etc) though.
The alternatives on the relative floating-point patterns only allow a register-to-register INS instruction.
That means if we want to load a value into a vector lane we must first load it into a scalar register and then
perform an INS, which is wasteful.
There is also an explicit V2DI vec_set expander dangling around for no reason that I can see. It seems to do the
exact same things as the other vec_set expanders. This patch removes that.
It now unifies all vec_set expansions into a single "vec_set<mode>" define_expand using the catch-all VALL_F16 iterator.
With this patch we avoid loading values into scalar registers and then doing an explicit INS on them to move them into
the desired vector lanes. For example for:
typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));
v2di
foo_v2di (long long *a, long long *b)
{
v2di res = { *a, *b };
return res;
}
* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode
iterator. Delete separate integer-mode vec_set<mode> expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>): Delete all other patterns with that name.
Use VALL_F16 mode iterator. Add LD1 alternative and use vwcore for
the "w, r" alternative.
Martin Liska [Fri, 18 May 2018 08:43:19 +0000 (10:43 +0200)]
Radically simplify emission of balanced tree for switch statements.
2018-05-18 Martin Liska <mliska@suse.cz>
* passes.def: Add pass_lower_switch and pass_lower_switch_O0.
* tree-pass.h (make_pass_lower_switch_O0): New function.
* tree-switch-conversion.c (node_has_low_bound): Remove.
(node_has_high_bound): Likewise.
(node_is_bounded): Likewise.
(class pass_lower_switch): Make it a template type and create
two instances.
(pass_lower_switch::execute): Add template argument.
(make_pass_lower_switch): New function.
(make_pass_lower_switch_O0): New function.
(do_jump_if_equal): Remove.
(emit_case_nodes): Simplify to just handle all 3 cases and leave
all the hard work to tree optimization passes.
2018-05-18 Martin Liska <mliska@suse.cz>
* gcc.dg/tree-ssa/vrp104.c: Adjust dump file that is scanned.
* gcc.dg/tree-prof/update-loopch.c: Likewise.
Martin Liska [Fri, 18 May 2018 08:42:15 +0000 (10:42 +0200)]
Support lower and upper limit for -fdbg-cnt flag.
2018-05-18 Martin Liska <mliska@suse.cz>
* dbgcnt.c (limit_low): Renamed from limit.
(limit_high): New variable.
(dbg_cnt_is_enabled): Check for upper limit.
(dbg_cnt): Adjust dumping.
(dbg_cnt_set_limit_by_index): Add new argument for high
value.
(dbg_cnt_set_limit_by_name): Likewise.
(dbg_cnt_process_single_pair): Parse new format.
(dbg_cnt_process_opt): Use strtok.
(dbg_cnt_list_all_counters): Remove 'value' and add
'limit_high'.
* doc/invoke.texi: Document changes.
2018-05-18 Martin Liska <mliska@suse.cz>
* gcc.dg/ipa/ipa-icf-39.c: New test.
* gcc.dg/pr68766.c: Adjust pruned output.
There are four optabs for various forms of fused multiply-add:
fma, fms, fnma and fnms. Of these, only fma had a direct gimple
representation. For the other three we relied on special pattern-
matching during expand, although tree-ssa-math-opts.c did have
some code to try to second-guess what expand would do.
This patch removes the old FMA_EXPR representation of fma and
introduces four new internal functions, one for each optab.
IFN_FMA is tied to BUILT_IN_FMA* while the other three are
independent directly-mapped internal functions. It's then
possible to do the pattern-matching in match.pd and
tree-ssa-math-opts.c (via folding) can select the exact
FMA-based operation.
The BRIG & HSA parts are a best guess, but seem relatively simple.
2018-05-18 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* doc/sourcebuild.texi (scalar_all_fma): Document.
* tree.def (FMA_EXPR): Delete.
* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
* internal-fn.c (ternary_direct): New macro.
(expand_ternary_optab_fn): Likewise.
(direct_ternary_optab_supported_p): Likewise.
* Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
* builtins.c (fold_builtin_fma): Delete.
(fold_builtin_3): Don't call it.
* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (operand_equal_p): Likewise.
(fold_ternary_loc): Likewise.
* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
* gimple.c (DEFTREECODE): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-eh.c (operation_could_trap_p): Likewise.
(stmt_could_throw_1_p): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
* tree-ssa-loop-im.c (stmt_cost): Likewise.
* tree-ssa-operands.c (get_expr_operands): Likewise.
* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
* fold-const-call.h (fold_fma): Delete.
* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS,
CFN_FNMA and CFN_FNMS.
(fold_fma): Delete.
* genmatch.c (combined_fn): New enum.
(commutative_ternary_tree_code): Remove FMA_EXPR handling.
(commutative_op): New function.
(commutate): Use it. Handle more than 2 operands.
(dt_operand::gen_gimple_expr): Use commutative_op.
(parser::parse_expr): Allow :c to be used with non-binary
operators if the commutative operand is known.
* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
CFN_FMS, CFN_FNMA and CFN_FNMS.
(backprop::process_assign_use): Remove FMA_EXPR handling.
* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
(gen_hsa_fma): New function.
(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
IFN_FNMA and IFN_FNMS.
* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
* gimple-fold.h (follow_all_ssa_edges): Declare.
* gimple-fold.c (follow_all_ssa_edges): New function.
* tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the
gimple_build interface and use follow_all_ssa_edges to fold the result.
(convert_mult_to_fma): Use direct_internal_fn_suppoerted_p
instead of checking for optabs directly.
* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
rather than FMA_EXPRs.
* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
call to IFN_FMA instead of an FMA_EXPR.
gcc/brig/
* brigfrontend/brig-function.cc
(brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
for BRIG_OPCODE_FMA.
(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
as a call.
Jim Wilson [Thu, 17 May 2018 22:37:38 +0000 (22:37 +0000)]
RISC-V: Optimize switch with sign-extended index.
gcc/
* expr.c (do_tablejump): When converting index to Pmode, if we have a
sign extended promoted subreg, and the range does not have the sign bit
set, then do a sign extend.
* config/riscv/riscv.c (riscv_extend_comparands): In unsigned QImode
test, check for sign extended subreg and/or constant operands, and
do a sign extend in that case.
Jonathan Wakely [Thu, 17 May 2018 15:36:25 +0000 (16:36 +0100)]
PR libstdc++/85818 ensure path::preferred_separator is defined
Because path.cc is compiled with -std=gnu++17 the static constexpr
data member is implicitly 'inline' and so no definition gets emitted
unless it gets used in that translation unit. Other translation units
built as C++11 or C++14 still require a namespace-scope definition of
the variable, so mark the definition as used.
Jonathan Wakely [Thu, 17 May 2018 15:03:29 +0000 (16:03 +0100)]
PR libstdc++/85812 fix memory leak in std::make_exception_ptr
PR libstdc++/85812
* libsupc++/cxxabi_init_exception.h (__cxa_free_exception): Declare.
* libsupc++/exception_ptr.h (make_exception_ptr) [__cpp_exceptions]:
Refactor to separate non-throwing and throwing implementations.
[__cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI]: Deallocate the memory
if constructing the object throws.
This patch gets the gimple FE to parse calls to internal functions.
The only non-obvious thing was how the functions should be written
to avoid clashes with real function names. One option would be to
go the magic number of underscores route, but we already do that for
built-in functions, and it would be good to keep them visually
distinct. In the end I borrowed the local/internal label convention
from asm and used:
x = .SQRT (y);
2018-05-17 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* internal-fn.h (lookup_internal_fn): Declare
* internal-fn.c (lookup_internal_fn): New function.
* gimple.c (gimple_build_call_from_tree): Handle calls to
internal functions.
* gimple-pretty-print.c (dump_gimple_call): Print "." before
internal function names.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-scopedtables.c (expr_hash_elt::print): Likewise.
gcc/c/
* gimple-parser.c: Include internal-fn.h.
(c_parser_gimple_statement): Treat a leading CPP_DOT as a call.
(c_parser_gimple_call_internal): New function.
(c_parser_gimple_postfix_expression): Use it to handle CPP_DOT.
Fix typos in comment.
gcc/testsuite/
* gcc.dg/gimplefe-28.c: New test.
* gcc.dg/asan/use-after-scope-9.c: Adjust expected output for
internal function calls.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
This patch makes the function versions of gimple_build and
gimple_simplify take combined_fns rather than built_in_codes,
so that they work with internal functions too. The old
gimple_builds were unused, so no existing callers need
to be updated.
2018-05-17 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* gimple-fold.h (gimple_build): Make the function forms take
combined_fn rather than built_in_function.
(gimple_simplify): Likewise.
* gimple-match-head.c (gimple_simplify): Likewise.
* gimple-fold.c (gimple_build): Likewise.
* tree-vect-loop.c (get_initial_def_for_reduction): Use gimple_build
rather than gimple_build_call_internal.
(get_initial_defs_for_reduction): Likewise.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_live_operation): Likewise.
Jakub Jelinek [Thu, 17 May 2018 09:47:52 +0000 (11:47 +0200)]
avx512fintrin.h (_mm512_set_epi16, [...]): New intrinsics.
* config/i386/avx512fintrin.h (_mm512_set_epi16, _mm512_set_epi8,
_mm512_setzero): New intrinsics.
* gcc.target/i386/avx512f-set-v32hi-1.c: New test.
* gcc.target/i386/avx512f-set-v32hi-2.c: New test.
* gcc.target/i386/avx512f-set-v32hi-3.c: New test.
* gcc.target/i386/avx512f-set-v32hi-4.c: New test.
* gcc.target/i386/avx512f-set-v32hi-5.c: New test.
* gcc.target/i386/avx512f-set-v64qi-1.c: New test.
* gcc.target/i386/avx512f-set-v64qi-2.c: New test.
* gcc.target/i386/avx512f-set-v64qi-3.c: New test.
* gcc.target/i386/avx512f-set-v64qi-4.c: New test.
* gcc.target/i386/avx512f-set-v64qi-5.c: New test.
* gcc.target/i386/avx512f-setzero-1.c: New test.
James Greenhalgh [Thu, 17 May 2018 09:39:02 +0000 (09:39 +0000)]
[patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to simply use the fact that we will always be
overwriting the remaining bits, and simply move the first element to the corrcet
place (implicitly zeroing all other bits).
This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.
Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix
this,
add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
to catch:
(vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
This is similar to the existing patterns which are tested in this
function, without requiring the second operand to also be a vec_duplicate.
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
code generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify
vec_merge across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.
Mark Wielaard [Wed, 16 May 2018 18:20:08 +0000 (18:20 +0000)]
DWARF: Add header for .debug_str_offsets table for dwarf_version 5.
DWARF5 defines a small header for .debug_str_offsets. Since we only use
it for split dwarf .dwo files we don't need to keep track of the actual
index offset in an attribute.
gcc/ChangeLog
* dwarf2out.c (count_index_strings): New function.
(output_indirect_strings): Call count_index_strings and generate
header for dwarf_version >= 5.
Mark Wielaard [Wed, 16 May 2018 18:02:25 +0000 (18:02 +0000)]
DWARF: Emit DWARF5 forms for indirect addresses and string offsets.
We already emit DWARF5 attributes and tables for indirect addresses
and string offsets, but still use GNU forms. Add a new helper function
dwarf_FORM () for emitting the right form.
Currently we only use the uleb128 forms. But DWARF5 also allows
1, 2, 3 and 4 byte forms (DW_FORM_strx[1234] and DW_FORM_addrx[1234])
which might be more space efficient.
Carl Love [Wed, 16 May 2018 16:06:08 +0000 (16:06 +0000)]
vsx-vector-6-be.c: Remove file.
gcc/testsuite/ChangeLog:
2018-05-16 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vsx-vector-6-be.c: Remove file.
* gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file.
* gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file.
* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for
xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi.
Wilco Dijkstra [Wed, 16 May 2018 14:33:16 +0000 (14:33 +0000)]
[AArch64] Improve register allocation of fma
This patch improves register allocation of fma by preferring to update the
accumulator register. This is done by adding fma insns with operand 1 as the
accumulator. The register allocator considers copy preferences only in operand
order, so if the first operand is dead, it has the highest chance of being
reused as the destination. As a result code using fma often has a better
register allocation. Performance of SPECFP2017 improves by over 0.5% on some
implementations, while it had no effect on other implementations. Fma is more
readable too, in a simple example we now generate:
Richard Biener [Wed, 16 May 2018 13:08:04 +0000 (13:08 +0000)]
tree-vectorizer.h (struct stmt_info_for_cost): Add where member.
2018-05-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (struct stmt_info_for_cost): Add where member.
(dump_stmt_cost): Declare.
(add_stmt_cost): Dump cost we add.
(add_stmt_costs): New function.
(vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
No longer exported.
(vect_analyze_stmt): Adjust prototype.
(vectorizable_condition): Likewise.
(vectorizable_live_operation): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
* tree-vect-loop.c (vect_analyze_loop_operations): Create local
cost vector to pass to vectorizable_ and record afterwards.
(vect_model_reduction_cost): Take cost vector argument and adjust.
(vect_model_induction_cost): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_NUMBER_OF_VEC_STMTS.
(vect_analyze_slp_cost_1): Remove.
(vect_analyze_slp_cost): Likewise.
(vect_slp_analyze_node_operations): Take visited args and
a target cost vector. Avoid processing already visited stmt sets.
(vect_slp_analyze_operations): Use a local cost vector to gather
costs and register those of non-discarded instances.
(vect_bb_vectorization_profitable_p): Use add_stmt_costs.
(vect_schedule_slp_instance): Remove copying of
SLP_TREE_NUMBER_OF_VEC_STMTS. Instead assert that it is not
zero.
* tree-vect-stmts.c (record_stmt_cost): Remove path directly
adding cost. Record cost entry location.
(vect_prologue_cost_for_slp_op): Function to compute cost of
a constant or invariant generated for SLP vect in the prologue,
split out from vect_analyze_slp_cost_1.
(vect_model_simple_cost): Make static. Adjust for SLP costing.
(vect_model_promotion_demotion_cost): Likewise.
(vect_model_store_cost): Likewise, make static.
(vect_model_load_cost): Likewise.
(vectorizable_bswap): Add cost vector arg and adjust.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(can_vectorize_live_stmts): Likewise.
(vect_analyze_stmt): Likewise.
(vect_transform_stmt): Adjust calls to vectorizable_*.
* tree-vectorizer.c: Include gimple-pretty-print.h.
(dump_stmt_cost): New function.
Richard Biener [Wed, 16 May 2018 13:02:27 +0000 (13:02 +0000)]
params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param.
2018-05-16 Richard Biener <rguenther@suse.de>
* params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param.
* doc/invoke.texi (dse-max-alias-queries-per-store): Document.
* tree-ssa-dse.c: Include tree-ssa-loop.h.
(check_name): New callback.
(dse_classify_store): Track cycles via a visited bitmap of PHI
defs and simplify handling of in-loop and across loop dead stores
and properly fail for loop-variant refs. Handle byte-tracking with
multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for
limiting the walk.
* gcc.dg/tree-ssa/ssa-dse-32.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-33.c: Likewise.
* gcc.dg/uninit-pr81897-2.c: Use -fno-tree-dse.
Handle vector boolean types when calculating the SLP unroll factor
The SLP unrolling factor is calculated by finding the smallest
scalar type for each SLP statement and taking the number of required
lanes from the vector versions of those scalar types. E.g. for an
int32->int64 conversion, it's the vector of int32s rather than the
vector of int64s that determines the unroll factor.
We rely on tree-vect-patterns.c to replace boolean operations like:
bool a, b, c;
a = b & c;
with integer operations of whatever the best size is in context.
E.g. if b and c are fed by comparisons of ints, a, b and c will become
the appropriate size for an int comparison. For most targets this means
that a, b and c will end up as int-sized themselves, but on targets like
SVE and AVX512 with packed vector booleans, they'll instead become a
small bitfield like :1, padded to a byte for memory purposes.
The SLP code would then take these scalar types and try to calculate
the vector type for them, causing the unroll factor to be much higher
than necessary.
This patch tries to make the SLP code use the same approach as the
loop vectorizer, by splitting out the code that calculates the
statement vector type and the vector type that should be used for
the number of units.
2018-05-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,
split out from...
(vect_build_slp_tree_1): ...here. Use vect_get_vector_types_for_stmt
to determine the statement's vector type and the vector type that
should be used for calculating nunits. Deal with cases in which
the type has to be deferred.
(vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt
and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1)
(vect_determine_vf_for_stmt): New functions, split out from...
(vect_determine_vectorization_factor): ...here.
* tree-vect-stmts.c (vect_get_vector_types_for_stmt)
(vect_get_mask_type_for_stmt): New functions, split out from
vect_determine_vectorization_factor.