Steven G. Kargl [Sun, 18 Mar 2018 17:51:57 +0000 (17:51 +0000)]
re PR fortran/77414 (ICE in create_function_arglist, at fortran/trans-decl.c:2410)
2018-03-18 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/77414
* decl.c (get_proc_name): Check for a subroutine re-defined in
the contain portion of a subroutine. Change language of existing
error message to better describe the issue. While here fix whitespace
issues.
Don't try to vectorise COND_EXPR reduction chains (PR 84913)
The testcase ICEd for both SVE and AVX512 because we were trying
to vectorise a chain of COND_EXPRs as a reduction and getting
confused by reduc_index == -1.
2018-03-18 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/84913
* tree-vect-loop.c (vectorizable_reduction): Don't try to
vectorize chains of COND_EXPRs.
gcc/testsuite/
PR tree-optimization/84913
* gfortran.dg/vect/pr84913.f90: New test.
Jakub Jelinek [Sat, 17 Mar 2018 11:12:00 +0000 (12:12 +0100)]
re PR target/84902 (549.fotonik3d_r from SPEC2017 fails verification with -Ofast -march=native on Zen since r258518)
PR target/84902
* config/i386/i386.c (initial_ix86_tune_features,
initial_ix86_arch_features): Use unsigned HOST_WIDE_INT rather than
unsigned long long.
(set_ix86_tune_features): Change ix86_tune_mask from unsigned int
to unsigned HOST_WIDE_INT, initialize to HOST_WIDE_INT_1U << ix86_tune
rather than 1u << ix86_tune. Formatting fix.
(ix86_option_override_internal): Change ix86_arch_mask from
unsigned int to unsigned HOST_WIDE_INT, initialize to
HOST_WIDE_INT_1U << ix86_arch rather than 1u << ix86_arch.
(ix86_function_specific_restore): Likewise.
Jakub Jelinek [Fri, 16 Mar 2018 21:01:16 +0000 (22:01 +0100)]
re PR target/84899 (ICE: in final_scan_insn_1, at final.c:3139 (error: could not split insn))
PR target/84899
* postreload.c (reload_combine_recognize_pattern): Perform
INTVAL addition in unsigned HOST_WIDE_INT type to avoid UB and
truncate_int_for_mode the result for the destination's mode.
Ian Lance Taylor [Fri, 16 Mar 2018 19:01:40 +0000 (19:01 +0000)]
libgo: add runtime/pprof/internal/profile.gox to noinst_DATA
Also add noinst_DATA to CHECK_DEPS; it's not needed in practice since
`make` will build noinst_DATA, but it's logically required and will
make a difference if any of the noinst_DATA sources change between
`make` and `make check`.
Tony Reix figured out why omitting packages from noinst_DATA didn't
seem to matter: because if gccgo can't find foo.gox, it will fall back
to reading the export data in foo.o, and foo.o will exist because
these packages go into libgo.a.
Vladimir Makarov [Fri, 16 Mar 2018 18:48:26 +0000 (18:48 +0000)]
re PR target/84876 (ICE on invalid code in lra_assign at gcc/lra-assigns.c:1601 since r258504)
2018-03-16 Vladimir Makarov <vmakarov@redhat.com>
PR target/84876
* lra-assigns.c (lra_split_hard_reg_for): Don't use
regno_allocno_class_array and sorted_pseudos.
* lra-constraints.c (spill_hard_reg_in_range): Ignore hard regs in
insns where regno is used.
2018-03-16 Vladimir Makarov <vmakarov@redhat.com>
PR target/84876
* gcc.target/i386/pr84876.c: New test.
Jakub Jelinek [Fri, 16 Mar 2018 12:46:12 +0000 (13:46 +0100)]
re PR c++/79937 (ICE in replace_placeholders_r)
PR c++/79937
PR c++/82410
* tree.h (TARGET_EXPR_NO_ELIDE): Define.
* gimplify.c (gimplify_modify_expr_rhs): Don't elide TARGET_EXPRs with
TARGET_EXPR_NO_ELIDE flag set unless *expr_p is INIT_EXPR.
* cp-tree.h (CONSTRUCTOR_PLACEHOLDER_BOUNDARY): Define.
(find_placeholder): Declare.
* tree.c (struct replace_placeholders_t): Add exp member.
(replace_placeholders_r): Don't walk into ctors with
CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag set, unless they are equal to
d->exp. Replace PLACEHOLDER_EXPR with unshare_expr (x) rather than x.
(replace_placeholders): Initialize data.exp.
(find_placeholders_r, find_placeholders): New functions.
* typeck2.c (process_init_constructor_record,
process_init_constructor_union): Set CONSTRUCTOR_PLACEHOLDER_BOUNDARY
if adding NSDMI on which find_placeholder returns true.
* call.c (build_over_call): Don't call replace_placeholders here.
* cp-gimplify.c (cp_genericize_r): Set TARGET_EXPR_NO_ELIDE on
TARGET_EXPRs with CONSTRUCTOR_PLACEHOLDER_BOUNDARY set on
TARGET_EXPR_INITIAL.
(cp_fold): Copy over CONSTRUCTOR_PLACEHOLDER_BOUNDARY bit to new
ctor.
* g++.dg/cpp1y/pr79937-1.C: New test.
* g++.dg/cpp1y/pr79937-2.C: New test.
* g++.dg/cpp1y/pr79937-3.C: New test.
* g++.dg/cpp1y/pr79937-4.C: New test.
* g++.dg/cpp1y/pr82410.C: New test.
Jakub Jelinek [Fri, 16 Mar 2018 08:06:41 +0000 (09:06 +0100)]
re PR tree-optimization/84841 (ICE: tree check: expected ssa_name, have real_cst in rewrite_expr_tree_parallel, at tree-ssa-reassoc.c:4624)
PR tree-optimization/84841
* tree-ssa-reassoc.c (INTEGER_CONST_TYPE): Change to 1 << 4 from
1 << 3.
(FLOAT_ONE_CONST_TYPE): Define.
(constant_type): Return FLOAT_ONE_CONST_TYPE for -1.0 and 1.0.
(sort_by_operand_rank): Put entries with higher constant_type last
rather than first to match comments.
Mark Doffman [Fri, 16 Mar 2018 05:07:39 +0000 (23:07 -0600)]
03-16-2018 Mark Doffman <mark.doffman@codethink.co.uk>
Jim MacArthur <jim.macarthur@codethink.co.uk>
* gfortran.dg/automatic_1.f90: New test.
* gfortran.dg/automatic_repeat.f90: New test
* gfortran.dg/automatic_save.f90: New test.
* gfortran.dg/vax_structure.f90: New test.
H.J. Lu [Thu, 15 Mar 2018 17:54:40 +0000 (17:54 +0000)]
i386: Don't generate alias for function return thunk
Function return thunks shouldn't be aliased to indirect branch thunks
since indirect branch thunks are placed in COMDAT section and a COMDAT
section with indirect branch may not have function return thunk. This
patch generates function return thunks directly.
gcc/
PR target/84574
* config/i386/i386.c (indirect_thunk_needed): Update comments.
(indirect_thunk_bnd_needed): Likewise.
(indirect_thunks_used): Likewise.
(indirect_thunks_bnd_used): Likewise.
(indirect_return_needed): New.
(indirect_return_bnd_needed): Likewise.
(output_indirect_thunk_function): Add a bool argument for
function return.
(output_indirect_thunk_function): Don't generate alias for
function return thunk.
(ix86_code_end): Call output_indirect_thunk_function to generate
function return thunks.
(ix86_output_function_return): Set indirect_return_bnd_needed
and indirect_return_needed instead of indirect_thunk_bnd_needed
and indirect_thunk_needed.
Jakub Jelinek [Thu, 15 Mar 2018 17:45:01 +0000 (18:45 +0100)]
re PR c++/84222 ([[deprecated]] class complains about internal class usage)
PR c++/84222
* cp-tree.h (cp_warn_deprecated_use): Declare.
* tree.c (cp_warn_deprecated_use): New function.
* typeck2.c (build_functional_cast): Use it.
* decl.c (grokparms): Likewise.
(grokdeclarator): Likewise. Temporarily push nested class scope
around grokparms call for out of class member definitions.
* g++.dg/warn/deprecated.C (T::member3): Change dg-warning to dg-bogus.
* g++.dg/warn/deprecated-6.C (T::member3): Likewise.
* g++.dg/warn/deprecated-13.C: New test.
Ian Lance Taylor [Thu, 15 Mar 2018 16:56:24 +0000 (16:56 +0000)]
cmd/go: force LANG=C when looking for compiler version
Tested by installing the gcc-locales package and running
LANG=de_DE.utf8 go build hello.go
Without this change, that fails, as described at
https://gcc.gnu.org/PR84765.
Jakub Jelinek [Thu, 15 Mar 2018 07:37:53 +0000 (08:37 +0100)]
re PR c/84853 (ICE: verify_gimple failed (expand_shift_1))
PR c/84853
* c-typeck.c (build_binary_op) <case RSHIFT_EXPR, case LSHIFT_EXPR>:
If code1 is INTEGER_TYPE, only allow code0 VECTOR_TYPE if it has
INTEGER_TYPE element type.
Jason Merrill [Thu, 15 Mar 2018 03:49:07 +0000 (23:49 -0400)]
PR c++/84801 - ICE with unexpanded pack in lambda.
We avoid complaining about unexpanded packs when inside a lambda,
since the lambda as a whole could be part of a pack expansion.
But that can only be true if the lambda is in a template context.
* pt.c (check_for_bare_parameter_packs): Don't return early for a
lambda in non-template context.
Jonathan Wakely [Wed, 14 Mar 2018 23:02:01 +0000 (23:02 +0000)]
PR libstdc++/78420 Make std::less etc. yield total order for pointers
In order for std::less<T*> etc. to meet the total order requirements of
[comparisons] p2 we need to cast unrelated pointers to uintptr_t before
comparing them. Those casts aren't allowed in constant expressions, so
only cast when __builtin_constant_p says the result of the comparison is
not a compile-time constant (because the arguments are not constants, or
the result of the comparison is unspecified). When the result is
constant just compare the pointers directly without casting.
This ensures that the function can be called in constant expressions
with suitable arguments, but still yields a total order even for
otherwise unspecified pointer comparisons.
For std::less<void> etc. add new overloads for pointers, which use
std::less<common_type_t<T*,U*>> directly. Also change the generic
overloads to detect when the comparison would call a built-in relational
operator with pointer operands, and dispatch that case to the
corresponding specialization for void pointers.
PR libstdc++/78420
* include/bits/stl_function.h (greater<_Tp*>, less<_Tp*>)
(greater_equal<_Tp*>, less_equal<_Tp>*): Add partial specializations
to ensure total order for pointers.
(greater<void>, less<void>, greater_equal<void>, less_equal<void>):
Add operator() overloads for pointer arguments and make generic
overloads dispatch to new _S_cmp functions when comparisons would
use built-in operators for pointers.
* testsuite/20_util/function_objects/comparisons_pointer.cc: New.
Carl Love [Wed, 14 Mar 2018 23:01:12 +0000 (23:01 +0000)]
re PR target/84422 (ICE on various builtin test functions when compiled with -mcpu=power7)
gcc/ChangeLog:
2018-03-14 Carl Love <cel@us.ibm.com>
PR target/84422
* config/rs6000/rs6000-builtin.def: Change expansion for
VMULESW to BU_P8V_AV_2.
Change expansion for VMULEUW to BU_P8V_AV_2.
* config/rs6000/rs6000.c: Change
ALTIVEC_BUILTIN_VMULESW to P8V_BUILTIN_VMULESW.
Change ALTIVEC_BUILTIN_VMULEUW to P8V_BUILTIN_VMULEUW.
Change ALTIVEC_BUILTIN_VMULOSW to P8V_BUILTIN_VMULOSW.
Change ALTIVEC_BUILTIN_VMULOUW to P8V_BUILTIN_VMULOUW.
* config/rs6000/rs6000-c.c: Change
ALTIVEC_BUILTIN_VMULESW to P8V_BUILTIN_VMULESW.
Change ALTIVEC_BUILTIN_VMULEUW to P8V_BUILTIN_VMULEUW.
Change ALTIVEC_BUILTIN_VMULOSW to P8V_BUILTIN_VMULOSW.
Change ALTIVEC_BUILTIN_VMULOUW to P8V_BUILTIN_VMULOUW.
The underlying problem is the mix of types for storing line numbers:
in parts of libcpp and diagnostic-show-locus.c we use linenum_type;
in other places (including libcpp's expanded_location) we use int.
I looked at using linenum_type throughout, but doing so turned into
a large patch, so this patch fixes the ICE in a less invasive way
by merely using linenum_type more consistently just within
diagnostic-show-locus.c, and fixing line_span::comparator to properly
handle line numbers (and line number differences) >= 2^31, by using
a new helper function for linenum_type differences, computing the
difference using long long, and using the sign of the difference
(as the difference might not fit in the "int" return type imposed
by qsort).
gcc/ChangeLog:
PR c/84852
* diagnostic-show-locus.c (class layout_point): Convert m_line
from int to linenum_type.
(line_span::comparator): Use linenum "compare" function when
comparing line numbers.
(test_line_span): New function.
(layout_range::contains_point): Convert param "row" from int to
linenum_type.
(layout_range::intersects_line_p): Likewise.
(layout::will_show_line_p): Likewise.
(layout::print_source_line): Likewise.
(layout::should_print_annotation_line_p): Likewise.
(layout::print_annotation_line): Likewise.
(layout::print_leading_fixits): Likewise.
(layout::annotation_line_showed_range_p): Likewise.
(struct line_corrections): Likewise for field m_row.
(line_corrections::line_corrections): Likewise for param "row".
(layout::print_trailing_fixits): Likewise.
(layout::get_state_at_point): Likewise.
(layout::get_x_bound_for_row): Likewise.
(layout::print_line): Likewise.
(diagnostic_show_locus): Likewise for locals "last_line" and
"row".
(selftest::diagnostic_show_locus_c_tests): Call test_line_span.
* input.c (selftest::test_linenum_comparisons): New function.
(selftest::input_c_tests): Call it.
* selftest.c (selftest::test_assertions): Test ASSERT_GT,
ASSERT_GT_AT, ASSERT_LT, and ASSERT_LT_AT.
* selftest.h (ASSERT_GT): New macro.
(ASSERT_GT_AT): New macro.
(ASSERT_LT): New macro.
(ASSERT_LT_AT): New macro.
gcc/testsuite/ChangeLog:
PR c/84852
* gcc.dg/fixits-pr84852-1.c: New test.
* gcc.dg/fixits-pr84852-2.c: New test.
libcpp/ChangeLog:
* include/line-map.h (compare): New function on linenum_type.
combine: Don't make log_links for pc_rtx (PR84780 #c10)
distribute_links tries to place a log_link for whatever the destination
of the modified instruction is. It shouldn't do that when that dest
is pc_rtx, which isn't actually a register.
* combine.c (distribute_links): Don't make a link based on pc_rtx.
Martin Liska [Wed, 14 Mar 2018 11:17:01 +0000 (12:17 +0100)]
Fix tree statistics with -fmem-report.
2018-03-14 Martin Liska <mliska@suse.cz>
* tree.c (record_node_allocation_statistics): Use
get_stats_node_kind.
(get_stats_node_kind): New function extracted from
record_node_allocation_statistics.
(free_node): Use get_stats_node_kind.
scan-assembler-times and scan-tree-dump-times dejagnu directives show a
different output in the summary files depending on whether they PASS or
FAIL. This means that dg-cmp-results would not show a regression because
it would not see a connection between the two output.
The difference comes from the FAIL showing the number of actual times
the pattern was match, presumably to help debugging. This patch moves
the info regarding the actual number of times the pattern match in a
separate verbose message. This keeps the message unchanged but let
developers have the required debug message with -v.
2018-03-14 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Move FAIL debug info into a
separate verbose message.
* lib/scandump.exp (scan-dump-times): Likewise.
Jakub Jelinek [Wed, 14 Mar 2018 08:50:23 +0000 (09:50 +0100)]
re PR sanitizer/83392 (FAIL: c-c++-common/ubsan/ptr-overflow-sanitization-1.c scan-tree-dump-times)
PR sanitizer/83392
* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Handle also
INTEGER_CST offset, add it together with bitpos / 8 and
sign extend based on POINTER_SIZE.
* c-c++-common/ubsan/ptr-overflow-sanitization-1.c: Adjust expected
check count from 17 to 14.
PR target/78090
* config/i386/constraints.md (Yc): New register constraint.
* config/i386/i386.md (*float<SWI48:mode><MODEF:mode>2_mixed):
Use Yc constraint for alternative 2 of operand 0. Remove
preferred_for_speed attribute.
Vladimir Makarov [Tue, 13 Mar 2018 20:42:49 +0000 (20:42 +0000)]
re PR target/83712 ("Unable to find a register to spill" when compiling for thumb1)
2018-03-13 Vladimir Makarov <vmakarov@redhat.com>
PR target/83712
* lra-assigns.c (find_all_spills_for): Ignore uninteresting
pseudos.
(assign_by_spills): Return a flag of reload assignment failure.
Do not process the reload assignment failures. Do not spill other
reload pseudos if they has the same reg class. Update n if
necessary.
(lra_assign): Add a return arg. Set up from the result of
assign_by_spills call.
(find_reload_regno_insns, lra_split_hard_reg_for): New functions.
* lra-constraints.c (split_reg): Add a new arg. Use it instead of
usage_insns if it is not NULL.
(spill_hard_reg_in_range): New function.
(split_if_necessary, inherit_in_ebb): Pass a new arg to split_reg.
* lra-int.h (spill_hard_reg_in_range, lra_split_hard_reg_for): New
function prototypes.
(lra_assign): Change prototype.
* lra.c (lra): Add code to deal with fails by splitting hard reg
live ranges.
Jakub Jelinek [Tue, 13 Mar 2018 20:32:54 +0000 (21:32 +0100)]
re PR c++/84843 (C++ ICE on builtin redefinition since r258391)
PR c++/84843
* decl.c (duplicate_decls): For redefinition of built-in, use error
and return error_mark_node. For redeclaration, return error_mark_node
rather than olddecl if !flag_permissive.
* g++.dg/ext/pr84843-1.C: New test.
* g++.dg/ext/pr84843-2.C: New test.
Palmer Dabbelt [Tue, 13 Mar 2018 18:35:06 +0000 (18:35 +0000)]
RISC-V: Add and document the "-mno-relax" option
RISC-V relies on aggressive linker relaxation to get good code size. As
a result no text symbol addresses can be known until link time, which
means that alignment must be handled during the link. This alignment
pass is essentially just another linker relaxation, so this has the
unfortunate side effect that linker relaxation is required for
correctness on many RISC-V targets.
The RISC-V assembler has supported an ".option norelax" for a long time
because there are situations in which linker relaxation is a bad idea --
the canonical example is when trying to materialize the initial value of
the global pointer into a register, which would otherwise be relaxed to
a NOP. We've been relying on users who want to disable relaxation for
an entire link to pass "-Wl,--no-relax", but that still relies on the
linker relaxing R_RISCV_ALIGN to handle alignment despite it not being
strictly necessary.
This patch adds a GCC option, "-mno-relax", that disable linker
relaxation by adding ".option norelax" to the top of every generated
assembly file. The assembler is smart enough to handle alignment at
assemble time for files that have never emitted a relaxable relocation,
so this is sufficient to really disable all relaxations in the linker,
which results in significantly faster link times for large objects.
This also has the side effect of allowing toolchains that don't support
linker relaxation (LLVM and the Linux module loader) to function
correctly. Toolchains that don't support linker relaxation should
default to "-mno-relax" and error when presented with any R_RISCV_ALIGN
relocation as those need to be handled for correctness.
gcc/ChangeLog
2018-03-13 Palmer Dabbelt <palmer@sifive.com>
* config/riscv/riscv.opt (mrelax): New option.
* config/riscv/riscv.c (riscv_file_start): Emit ".option
"norelax" when riscv_mrelax is disabled.
* doc/invoke.texi (RISC-V): Document "-mrelax" and "-mno-relax".
Currently, function output_init_element () does not evaluate the left
hand expression in a comma operator that's used for a struct
initializer field if the right hand side is zero-sized. However, the
left hand expression must be evaluated if it's found to have side
effects (for example, a function call).
Patch was successfully bootstrapped and tested on x86_64-linux.
gcc/c:
2018-03-13 David Pagan <dave.pagan@oracle.com>
PR c/46921
* c-typeck.c (output_init_element): Ensure field initializer
expression is always evaluated if there are side effects.
gcc/testsuite:
2018-03-13 David Pagan <dave.pagan@oracle.com>
Martin Sebor [Tue, 13 Mar 2018 15:33:16 +0000 (15:33 +0000)]
PR tree-optimization/84725 - enable attribute nonstring for all narrow character types
gcc/c-family/ChangeLog:
PR tree-optimization/84725
* c-attribs.c (handle_nonstring_attribute): Allow attribute nonstring
with all three narrow character types, including their qualified forms.
gcc/testsuite/ChangeLog:
PR tree-optimization/84725
* c-c++-common/Wstringop-truncation-4.c: New test.
* c-c++-common/attr-nonstring-5.c: New test.
[SLP/AArch64] Fix unpack handling for big-endian SVE
I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
lanes. This meant that both the SVE patterns and the handling of
fully-masked loops were wrong.
The patch deals with that by making sure that all vec_unpack* optabs
are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
define_insn. This in turn meant that we can get rid of the duplication
between the signed and unsigned patterns for predicates. (We provide
implementations of both the signed and unsigned optabs because the sign
doesn't matter for predicates: every element contains only one
significant bit.)
Also, the float unpacks need to unpack one half of the input vector,
but the unpacked upper bits are "don't care". There are two obvious
ways of handling that: use an unpack (filling with zeros) or use a ZIP
(filling with a duplicate of the low bits). The code previously used
unpacks, but the sequence involved a subreg that is semantically an
element reverse on big-endian targets. Using the ZIP patterns avoids
that, and at the moment there's no reason to prefer one over the other
for performance reasons, so the patch switches to ZIP unconditionally.
As the comment says, it would be easy to optimise this later if UUNPK
turns out to be better for some implementations.
2018-03-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
for big-endian.
* config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
* config/aarch64/aarch64-sve.md
(*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...
(aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.
(*extend<mode><Vwide>2): Rename to...
(aarch64_sve_extend<mode><Vwide>2): ...this.
(vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,
renaming the old pattern to...
(aarch64_sve_punpk<perm_hilo>_<mode>): ...this. Only define
unsigned packs.
(vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a
define_expand, renaming the old pattern to...
(aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.
(*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.
(vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into
account when deciding which SVE instruction the optab should use.
(vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
than unpacks.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
tlsdesc calls are guaranteed to preserve all Advanced SIMD registers,
but are not guaranteed to preserve the SVE extension of them.
The calls also don't preserve the SVE predicate registers.
The long-term plan for handling the SVE vector registers is CLOBBER_HIGH,
which adds a clobber equivalent of TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
The pattern can then directly model the fact that the low 128 bits are
preserved and the upper bits are clobbered.
However, it's too late now for that to be included in GCC 8, so this
patch conservatively treats the whole vector register as being clobbered.
This has the obvious disadvantage that compiling for SVE can make NEON
code worse, but I don't think there's much we can do about that until
CLOBBER_HIGH is in.
2018-03-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/aarch64.md (V4_REGNUM, V8_REGNUM, V12_REGNUM)
(V20_REGNUM, V24_REGNUM, V28_REGNUM, P1_REGNUM, P2_REGNUM, P3_REGNUM)
(P4_REGNUM, P5_REGNUM, P6_REGNUM, P8_REGNUM, P9_REGNUM, P10_REGNUM)
(P11_REGNUM, P12_REGNUM, P13_REGNUM, P14_REGNUM): New define_constants.
(tlsdesc_small_<mode>): Turn a define_expand and use
tlsdesc_small_sve_<mode> for SVE. Rename original define_insn to...
(tlsdesc_small_advsimd_<mode>): ...this.
(tlsdesc_small_sve_<mode>): New pattern.
gcc/testsuite/
* gcc.target/aarch64/sve/tls_1.c: New test.
* gcc.target/aarch64/sve/tls_2.C: Likewise.
One advantage of the new permute handling compared to the old way is
that we can now easily take advantage of the vectoriser's divmod patterns
for SVE.
2018-03-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/aarch64/iterators.md (UNSPEC_SMUL_HIGHPART)
(UNSPEC_UMUL_HIGHPART): New constants.
(MUL_HIGHPART): New int iteraor.
(su): Handle UNSPEC_SMUL_HIGHPART and UNSPEC_UMUL_HIGHPART.
* config/aarch64/aarch64-sve.md (<su>mul<mode>3_highpart): New
define_expand.
(*<su>mul<mode>3_highpart): New define_insn.
gcc/testsuite/
* gcc.target/aarch64/sve/mul_highpart_1.c: New test.
* gcc.target/aarch64/sve/mul_highpart_1_run.c: Likewise.