Ian Lance Taylor [Wed, 19 May 2021 01:09:27 +0000 (18:09 -0700)]
libgo: update configure to current sources
Change-Id: I12766baf02bfdf2233f1c5bde1a270f06b020aa7
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/321076
Trust: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com>
Xionghu Luo [Wed, 19 May 2021 02:34:18 +0000 (21:34 -0500)]
Run pass_sink_code once more before store_merging
Gimple sink code pass runs quite early, there may be some new
oppertunities exposed by later gimple optmization passes, this patch
runs the sink code pass once more before store_merging. For detailed
discussion, please refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html
Tested the SPEC2017 performance on P8LE, 544.nab_r is improved
by 2.43%, but no big changes to other cases, GEOMEAN is improved quite
small with 0.25%.
Jason Merrill [Tue, 18 May 2021 21:15:42 +0000 (17:15 -0400)]
c++: ICE with bad definition of decimal32 [PR100261]
The change to only look at the global binding for non-classes meant that
here, when dealing with decimal32 which is magically mangled like its first
non-static data member, we got a collision with the mangling for float.
Fixed by also looking up an existing binding for such magical classes.
Here we have a pack expansion of a template template parameter pack, of
which the pattern is a TEMPLATE_DECL, which strip_typedefs doesn't want to
see.
PR c++/100372
gcc/cp/ChangeLog:
* tree.c (strip_typedefs): Only look at the pattern of a
TYPE_PACK_EXPANSION if it's a type.
Bill Schmidt [Tue, 18 May 2021 14:04:39 +0000 (09:04 -0500)]
rs6000: Remove old psabi warnings
Long ago we were forced to make some small ABI breaks to correct errors
in the implementation, and we added warning messages for the changes
from GCC 4.9 to GCC 5. Enough time has passed that these are now just
irritants, so let's remove them. Also clean up associated macros using
rs6000_special_adjust_field_align_p, which has been always returning
false for a long time.
Marek Polacek [Thu, 11 Mar 2021 19:01:52 +0000 (14:01 -0500)]
c++: Prune dead functions.
I was looking at the LCOV coverage report for the C++ FE and
found a bunch of unused functions that I think we can remove.
Obviously, I left alone various dump_* and debug_* routines.
I haven't removed cp_build_function_call although it is also
currently unused.
* lambda_return_type: was used in parser.c in GCC 7, unused since r255950,
* classtype_has_non_deleted_copy_ctor: appeared in GCC 10, its usage
was removed in c++/95350,
* contains_wildcard_p: used in GCC 9, unused since r276764,
* get_template_head_requirements: seems to never have been used,
* check_constrained_friend: seems to never have been used,
* subsumes_constraints: unused since r276764,
* push_void_library_fn: usage removed in r248328,
* get_template_parms_at_level: unused since r157857,
* get_pattern_parm: unused since r275387.
(Some of the seemingly unused functions, such as set_global_friend, are
actually used in libcc1.)
Jason Merrill [Tue, 18 May 2021 16:18:56 +0000 (12:18 -0400)]
c++: non-static member, decltype, {} [PR100205]
This test was fixed by my second patch for PR93314, which distinguishes
between constant-expression and potentially-constant-evaluated contexts in a
way that my first patch did not.
Jason Merrill [Tue, 18 May 2021 16:06:36 +0000 (12:06 -0400)]
c++: "perfect" implicitly deleted move [PR100644]
Here we were ignoring the template constructor because the implicit move
constructor had all perfect conversions. But CWG1402 says that an
implicitly deleted move constructor is ignored by overload resolution; we
implement that instead by preferring any other candidate in joust, to get
better diagnostics, but that means we need to handle that case here as well.
gcc/cp/ChangeLog:
PR c++/100644
* call.c (perfect_candidate_p): An implicitly deleted move
is not perfect.
David Malcolm [Tue, 18 May 2021 16:29:58 +0000 (12:29 -0400)]
analyzer: fix missing leak after call to strsep [PR100615]
PR analyzer/100615 reports a missing leak diagnostic.
The issue is that the code calls strsep which the analyzer doesn't
have special knowledge of, and so conservatively assumes that it
could free the pointer, so drops malloc state for it.
Properly "teaching" the analyzer about strsep would require it
to support bifurcating state at a call, which is currently fiddly to
do, so for now this patch notes that strsep doesn't affect the
malloc state machine, allowing the analyzer to correctly detect the leak.
gcc/analyzer/ChangeLog:
PR analyzer/100615
* sm-malloc.cc: Include "analyzer/function-set.h".
(malloc_state_machine::on_stmt): Call unaffected_by_call_p and
bail on the functions it recognizes.
(malloc_state_machine::unaffected_by_call_p): New.
gcc/testsuite/ChangeLog:
PR analyzer/100615
* gcc.dg/analyzer/pr100615.c: New test.
Uros Bizjak [Tue, 18 May 2021 15:25:54 +0000 (17:25 +0200)]
i386: Implement 4-byte vector support [PR100637]
Add infrastructure, logic and arithmetic support for 4-byte vectors.
These can be used with SSE2 targets, where movd instructions from/to
XMM registers are available. x86_64 ABI passes 4-byte vectors in
integer registers, so also add logic operations with integer registers.
2021-05-18 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/100637
* config/i386/i386.h (VALID_SSE2_REG_MODE):
Add V4QI and V2HI modes.
(VALID_INT_MODE_P): Ditto.
* config/i386/mmx.md (VI_32): New mode iterator.
(mmxvecsize): Handle V4QI and V2HI.
(Yv_Yw): Ditto.
(mov<VI_32:mode>): New expander.
(*mov<mode>_internal): New insn pattern.
(movmisalign<VI_32:mode>): New expander.
(neg<VI_32:mode>): New expander.
(<plusminus:insn><VI_32:mode>3): New expander.
(*<plusminus:insn><VI_32:mode>3): New insn pattern.
(mulv2hi3): New expander.
(*mulv2hi3): New insn pattern.
(one_cmpl<VI_32:mode>2): New expander.
(*andnot<VI_32:mode>3): New insn pattern.
(<any_logic:code><VI_32:mode>3): New expander.
(*<any_logic:code><VI_32:mode>3): New insn pattern.
gcc/testsuite/
PR target/100637
* gcc.target/i386/pr100637-1b.c: New test.
* gcc.target/i386/pr100637-1w.c: Ditto.
* gcc.target/i386/pr92658-avx2-2.c: Do not XFAIL scan for pmovsxbq.
* gcc.target/i386/pr92658-avx2.c: Do not XFAIL scan for pmovzxbq.
* gcc.target/i386/pr92658-avx512vl.c: Do not XFAIL scan for vpmovdb.
* gcc.target/i386/pr92658-sse4-2.c: Do not XFAIL scan for
pmovsxbd and pmovsxwq.
* gcc.target/i386/pr92658-sse4.c: Do not XFAIL scan for
pmovzxbd and pmovzxwq.
Patrick Palka [Tue, 18 May 2021 14:21:27 +0000 (10:21 -0400)]
libstdc++: Fix access issue in elements_view::_Sentinel [PR100631]
In the earlier commit r12-854 I forgot to also rewrite the other operator-
overload in terms of the split-out member function _M_distance_from.
libstdc++-v3/ChangeLog:
PR libstdc++/100631
* include/std/ranges (elements_view::_Sentinel::operator-): Use
_M_distance_from in the other operator- overload too.
* testsuite/std/ranges/adaptors/elements.cc (test06): Augment test.
Uros Bizjak [Tue, 18 May 2021 13:45:54 +0000 (15:45 +0200)]
i386: Fix split_double_mode with paradoxical subreg [PR100626]
split_double_mode calls simplify_gen_subreg, which fails for the
high half of the paradoxical subreg. Return temporary register
instead of NULL RTX in this case.
2021-05-18 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/100626
* config/i386/i386-expand.c (split_double_mode): Return
temporary register when simplify_gen_subreg fails with
the high half od the paradoxical subreg.
Richard Biener [Mon, 17 May 2021 14:35:38 +0000 (16:35 +0200)]
Avoid setting TREE_ADDRESSABLE on stack vars during RTL expansion
This avoids setting TREE_ADDRESSABLE on variables we want to force to
the stack. Instead track those in a temporary bitmap and force
stack expansion that way, leaving TREE_ADDRESSABLE alone, not
pessimizing future alias queries.
2021-05-17 Richard Biener <rguenther@suse.de>
* cfgexpand.c (expand_one_var): Pass in forced_stack_var
and honor it when expanding.
(expand_used_vars_for_block): Pass through forced_stack_var.
(expand_used_vars): Likewise.
(discover_nonconstant_array_refs_r): Set bits in
forced_stack_vars instead of marking vars TREE_ADDRESSABLE.
(avoid_type_punning_on_regs): Likewise.
(discover_nonconstant_array_refs): Likewise.
(pass_expand::execute): Create and pass down forced_stack_var
bitmap. For parameters and returns temporarily set
TREE_ADDRESSABLE when expand_function_start.
Thomas Schwinge [Mon, 27 Apr 2020 06:22:36 +0000 (08:22 +0200)]
[libgomp, testsuite] Don't shadow global 'offload_targets' variable
See local 'offload_targets' variable in
'libgomp/testsuite/lib/libgomp.exp:libgomp_check_effective_target_offload_target'
vs. global 'libgomp/testsuite/libgomp-test-support.exp.in:offload_targets'
variable.
libgomp/
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_nvptx): Don't shadow global
'offload_targets' variable.
Thomas Schwinge [Thu, 6 May 2021 09:59:42 +0000 (11:59 +0200)]
Add 'dg-note', 'dg-lto-note'
That's 'dg-message "note: [...]"' with a twist: inhibit default notes pruning,
such that "if 'dg-note' is used at least once in a testcase, [notes] are not
pruned and instead must *all* be handled explicitly".
The rationale is that either you're not interested in notes at all (default
behavior of pruning all notes), but often, when you're interested in one note,
you're in fact interested in all notes, and especially interested if
*additional* notes appear over time, as GCC evolves.
gcc/testsuite/
* lib/gcc-dg.exp: Implement 'dg-note'.
* lib/prune.exp: Likewise.
* gcc.dg/vect/nodump-vect-opt-info-2.c: Use 'dg-note', and
'dg-prune-output "note: ".
* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Use
'dg-note', match up additional notes, one class of them with
XFAILed 'dg-bogus'.
* lib/lto.exp: Implement 'dg-lto-note'.
* g++.dg/lto/odr-1_0.C: Use 'dg-lto-note', match up additional
notes.
* g++.dg/lto/odr-1_1.C: Likewise.
* g++.dg/lto/odr-2_1.C: Likewise.
libstdc++-v3/
* testsuite/lib/prune.exp: Add note about 'dg-note'.
gcc/
* doc/sourcebuild.texi: Document 'dg-note'.
Tobias Burnus [Tue, 18 May 2021 09:56:05 +0000 (11:56 +0200)]
gcc/configure.ac: Fix cross build by using $(CFLAGS-$@) [PR100598]
BUILD_CFLAGS is set by configure; by default, BUILD_CFLAGS = $(ALL_CFLAGS)
is used. The latter contains (see gcc/Makefile.in) $(CFLAGS-$@), which is used
to pass .o-file specific flags to the compiler.
For cross builds, BUILD_CFLAGS is constructed in configure{,.ac} and missed
the $(CFLAGS-$@) - despite the comment above ALL_CFLAGS that configure.ac
might have to kept in sync.
Richard Biener [Tue, 18 May 2021 09:33:29 +0000 (11:33 +0200)]
operand scanner TLC
This applies some TLC to mark_address_taken which ends up setting
TREE_ADDRESSABLE on nodes where it doesn't have any semantics.
It also does work (incomplete) that get_base_address already does,
likewise we'll never get WITH_SIZE_EXPR in this context and thus
get_base_address never fails.
Jakub Jelinek [Tue, 18 May 2021 08:26:45 +0000 (10:26 +0200)]
regcprop: Avoid DCE of asm goto [PR100590]
The following testcase ICEs, because copyprop_hardreg_forward_1 decides
to DCE asm goto with REG_UNUSED notes (because the output is unused and
asm isn't volatile). But that DCE just removes the asm goto, leaving
a bb with two successors and no insn at the end that would allow that.
The following patch makes sure we drop that way only INSNs and not
JUMP_INSNs or CALL_INSNs.
2021-05-18 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/100590
* regcprop.c (copyprop_hardreg_forward_1): Only DCE dead sets if
they are NONJUMP_INSN_P.
Jakub Jelinek [Tue, 18 May 2021 08:10:17 +0000 (10:10 +0200)]
function: Set dummy DECL_ASSEMBLER_NAME in push_dummy_function [PR100580]
Last year I've added cgraph_node::get_create calls for the dummy
functions used for -fdump-passes, so that it interacts well with pass
disabling/enabling which is cgraph uid based.
Unfortunately, as the following testcase shows, when assembler hash
is present, that wants to compute DECL_ASSEMBLER_NAME and the C++ FE
is unprepared to handle it on the dummy functions which don't have
DECL_NAME etc.
The following patch fixes it by setting up a dummy DECL_ASSEMBLER_NAME
on these, so that the FEs don't need to compute it.
2021-05-18 Jakub Jelinek <jakub@redhat.com>
PR c++/100580
* function.c (push_dummy_function): Set DECL_ARTIFICIAL and
DECL_ASSEMBLER_NAME on the fn_decl.
As mentioned earlier, spaceship_replacement didn't optimize partial_ordering
>= 0 comparisons, because the possible values are -1, 0, 1, 2 and the
>= comparison is implemented as (res & 1) == res to choose the 0 and 1
cases from that. As we optimize that only with -ffinite-math-only, the
2 case is assumed not to happen and my earlier match.pd change optimizes
(res & 1) == res into (res & ~1) == 0, so this patch pattern matches
that case and handles it like res >= 0.
2021-05-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94589
* tree-ssa-phiopt.c (spaceship_replacement): Pattern match
phi result used in (res & ~1) == 0 comparison as res >= 0 as
res == 2 would be UB with -ffinite-math-only.
* g++.dg/opt/pr94589-2.C: Adjust scan-tree-dump count from 14 to 12.
Richard Biener [Wed, 12 May 2021 07:20:17 +0000 (09:20 +0200)]
c/100547 - reject overly large vector_size attributes
This rejects a number of vector components that does not fit an 'int'
which is an internal limitation of RTVEC. This requires adjusting
gcc.dg/attr-vector_size.c which checks for much larger
supported vectors. Note that the RTVEC limitation is a host specific
limitation (unless we change this 'int' to int32_t), but should be
32bits in practice everywhere.
2021-05-12 Richard Biener <rguenther@suse.de>
PR c/100547
gcc/c-family/
* c-attribs.c (type_valid_for_vector_size): Reject too large nunits.
Reword existing nunit diagnostic.
gcc/testsuite/
* gcc.dg/pr100547.c: New testcase.
* gcc.dg/attr-vector_size.c: Adjust.
Andreas Krebbel [Tue, 27 Apr 2021 08:09:06 +0000 (10:09 +0200)]
PR100281 C++: Fix SImode pointer handling
The problem appears to be triggered by two locations in the front-end
where non-POINTER_SIZE pointers aren't handled right now.
1. An assertion in strip_typedefs is triggered because the alignment
of the types don't match. This in turn is caused by creating the new
type with build_pointer_type instead of taking the type of the
original pointer into account.
2. An assertion in cp_convert_to_pointer is triggered which expects
the target type to always have POINTER_SIZE.
gcc/cp/ChangeLog:
PR c++/100281
* cvt.c (cp_convert_to_pointer): Use the size of the target
pointer type.
* tree.c (cp_build_reference_type): Call
cp_build_reference_type_for_mode with VOIDmode.
(cp_build_reference_type_for_mode): Rename from
cp_build_reference_type. Add MODE argument and invoke
build_reference_type_for_mode.
(strip_typedefs): Use build_pointer_type_for_mode and
cp_build_reference_type_for_mode for pointers and references.
gcc/ChangeLog:
PR c++/100281
* tree.c (build_reference_type_for_mode)
(build_pointer_type_for_mode): Pick pointer mode if MODE argument
is VOIDmode.
(build_reference_type, build_pointer_type): Invoke
build_*_type_for_mode with VOIDmode.
gcc/testsuite/ChangeLog:
PR c++/100281
* g++.target/s390/pr100281-1.C: New test.
* g++.target/s390/pr100281-2.C: New test.
Patrick Palka [Tue, 18 May 2021 04:28:44 +0000 (00:28 -0400)]
libstdc++: Fix up semiregular-box partial specialization [PR100475]
This makes the in-place constructor of our partial specialization of
__box for already-semiregular types perform direct-non-list-initialization
(in accordance with the specification of the primary template), and
additionally makes the member function data() use std::__addressof.
libstdc++-v3/ChangeLog:
PR libstdc++/100475
* include/std/ranges (__box::__box): Use non-list-initialization
in member initializer list of in-place constructor of the
partial specialization for semiregular types.
(__box::operator->): Use std::__addressof.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc
(test02): New test.
* testsuite/std/ranges/single_view.cc (test04): New test.
Patrick Palka [Tue, 18 May 2021 04:26:25 +0000 (00:26 -0400)]
libstdc++: Fix condition for memoizing reverse_view::begin() [PR100621]
A range being a random access range isn't a sufficient condition for
ranges::next(iter, sent) to have constant time complexity -- it must
also have a sized sentinel. This adjusts the memoization condition for
reverse_view accordingly.
libstdc++-v3/ChangeLog:
PR libstdc++/100621
* include/std/ranges (reverse_view::_S_needs_cached_begin):
Set to true if the underlying non-common random-access range
doesn't have a sized sentinel.
Patrick Palka [Tue, 18 May 2021 04:26:07 +0000 (00:26 -0400)]
libstdc++: Fix miscellaneous issues with elements_view::_Sentinel [PR100631]
libstdc++-v3/ChangeLog:
PR libstdc++/100631
* include/std/ranges (elements_view::_Iterator): Also befriend
_Sentinel<!_Const>.
(elements_view::_Sentinel::_M_equal): Templatize.
(elements_view::_Sentinel::_M_distance_from): Split out from ...
(elements_view::_Sentinel::operator-): ... here. Depend on
_Base2 instead of _Base in the return type.
* testsuite/std/ranges/adaptors/elements.cc (test06, test07):
New tests.
openmp: Notify team barrier of pending tasks in omp_fulfill_event
The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.
Harald Anlauf [Mon, 17 May 2021 19:35:38 +0000 (21:35 +0200)]
PR fortran/98411 - Pointless warning for static variables
Variables with explicit SAVE attribute cannot end up on the stack.
There is no point in checking whether they should be moved off the
stack to static storage.
gcc/fortran/ChangeLog:
PR fortran/98411
* trans-decl.c (gfc_finish_var_decl): Add check for explicit SAVE
attribute.
gcc/testsuite/ChangeLog:
PR fortran/98411
* gfortran.dg/pr98411.f90: New test.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until value change observed.
(__waiter_base::_M_laundered): New member.
(__waiter_base::_M_notify): Check _M_laundered to determine
whether to wake one or all.
(__detail::__atomic_compare): Return true if call to
__builtin_memcmp() == 0.
(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.
Tom de Vries [Mon, 17 May 2021 08:11:52 +0000 (10:11 +0200)]
[nvptx] Handle memmodel for atomic ops
The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.
Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.
Tested libgomp on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2021-05-17 Tom de Vries <tdevries@suse.de>
PR target/100497
* config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare
* config/nvptx/nvptx.c (nvptx_output_barrier)
(nvptx_output_atomic_insn): New function.
(nvptx_print_operand): Add support for 'B'.
* config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic
insns.
Jonathan Wakely [Mon, 17 May 2021 10:54:06 +0000 (11:54 +0100)]
libstdc++: Fix filesystem::path constraints for volatile [PR 100630]
The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which mean it considers
conversion from an rvalue. When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.
Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.
libstdc++-v3/ChangeLog:
PR libstdc++/100630
* include/experimental/bits/fs_path.h (__is_constructible_from):
Test construction from a const lvalue, not an rvalue.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.
Aldy Hernandez [Thu, 13 May 2021 20:09:58 +0000 (16:09 -0400)]
Bail in bounds_of_var_in_loop if scev returns NULL.
Both initial_condition_in_loop_num and evolution_part_in_loop_num
can return NULL. This patch exits if either one is NULL. Presumably
this didn't happen before, because adjust_range_with_scev was called
far less frequently than in ranger, which can call it for every PHI.
gcc/ChangeLog:
PR tree-optimization/100349
* vr-values.c (bounds_of_var_in_loop): Bail if scev returns
NULL.
Tamar Christina [Mon, 17 May 2021 14:22:39 +0000 (15:22 +0100)]
AArch64: Have -mcpu=native and -march=native enable extensions when CPU is unknown
Currently when using -mcpu=native or -march=native on a CPU that is unknown to
the compiler the compiler currently just used -march=armv8-a and enables none
of the extensions.
To make this a bit more useful this patch changes it to still use -march=armv8.a
but to enable the extensions. We still cannot do tuning but at least if using
this on a future SVE core the compiler will at the very least enable SVE etc.
gcc/ChangeLog:
* config/aarch64/driver-aarch64.c (DEFAULT_ARCH): New.
(host_detect_local_cpu): Use it.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/info_16: New test.
* gcc.target/aarch64/cpunative/info_17: New test.
* gcc.target/aarch64/cpunative/native_cpu_16.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_17.c: New test.
Richard Biener [Mon, 17 May 2021 11:56:14 +0000 (13:56 +0200)]
middle-end/100582 - fix array_at_struct_end_p for vector indexing
Vector indexing leaves us with ARRAY_REFs of VIEW_CONVERT_EXPRs,
sth which array_at_struct_end_p considers a array-at-struct-end
even when there's an underlying decl visible. The following fixes
the latter.
2021-05-17 Richard Biener <rguenther@suse.de>
PR middle-end/100582
* tree.c (array_at_struct_end_p): Get to the base of the
reference before looking for the underlying decl.
Joern Rennecke [Mon, 17 May 2021 12:44:49 +0000 (13:44 +0100)]
Improve message for wrong number of alternatives.
gcc/
* genoutput.c (validate_insn_alternatives) Make "wrong number of
alternatives" message more specific, and remove assumption on where
the problem is.
Christophe Lyon [Mon, 17 May 2021 12:31:58 +0000 (12:31 +0000)]
arm: Auto-vectorization for MVE: add __fp16 support to VCMP
This patch adds __fp16 support to the previous patch that added vcmp
support with MVE. For this we update existing expanders to use VDQWH
iterator, and add a new expander vcond<VH_cvtto><mode>. In the
process we need to create suitable iterators, and update v_cmp_result
as needed.
gcc/
* config/arm/iterators.md (V16): New iterator.
(VH_cvtto): New iterator.
(v_cmp_result): Added V4HF and V8HF support.
* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>): Use VDQWH.
(vcond<mode><mode>): Likewise.
(vcond_mask_<mode><v_cmp_result>): Likewise.
(vcond<VH_cvtto><mode>): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
* gcc.target/arm/simd/mve-vcmp-f16.c: New test for
auto-vectorization.
* gcc.target/arm/armv8_2-fp16-arith-1.c: Adjust since we now
vectorize float16_t vectors.
Christophe Lyon [Mon, 17 May 2021 12:29:42 +0000 (12:29 +0000)]
arm: Auto-vectorization for MVE: vcmp
Since MVE has a different set of vector comparison operators from
Neon, we have to update the expansion to take into account the new
ones, for instance 'NE' for which MVE does not require to use 'EQ'
with the inverted condition.
Conversely, Neon supports comparisons with #0, MVE does not.
For:
typedef long int vs32 __attribute__((vector_size(16)));
vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
gcc/testsuite
* gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors.
* gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors.
* gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC
vectors.
* gcc.target/arm/simd/mve-vcmp-f32.c: New test for
auto-vectorization.
* gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
liuhongt [Thu, 13 May 2021 05:08:16 +0000 (13:08 +0800)]
Fix ICE [PR target/100549]
When arg0 is same as arg1 in __builtin_ia32_pcmpgtw,
gimple_build (&stmts, GT_EXPR, cmp_type, arg0, arg1) will simplify the
comparison to vector constant 0, no stmts is generated, which causes
ICE in gsi_insert_before (gsi, stmts, GSI_SAME_STMT). So use
gsi_insert_seq_before instead which will handle NULL seq.
gcc/ChangeLog:
PR target/100549
* config/i386/i386.c (ix86_gimple_fold_builtin): Use
gsi_insert_seq_before instead.
gcc/testsuite/ChangeLog:
PR target/100549
* gcc.target/i386/pr100549.c: New test.
Christophe Lyon [Mon, 17 May 2021 12:02:40 +0000 (12:02 +0000)]
testsuite/arm: Add mve-vadd-scalar-1.c test
This patch adds a test for the scalar mode of vadd, precisely noting
that we do not yet use the T2 variants of vadd, which take a scalar as
final argument.
Christophe Lyon [Mon, 17 May 2021 11:53:14 +0000 (11:53 +0000)]
testsuite/arm: Fix and rename arm_qbit_ok into arm_sat_ok effective-target
The acle/saturation.c test uses __[su]sat() and
__saturation_occurred() intrinsics but __[su]sat() are defined in
acle.h if __ARM_FEATURE_SAT true, while __saturation_occurred()
depends on __ARM_FEATURE_QBIT.
QBIT is a v5te feature, while SAT is available since v6, so the test
really needs __ARM_FEATURE_SAT, to have both available.
This patch renames arm_qbit_ok into arm_sat_ok and checks
__ARM_FEATURE_SAT. It updates acle/saturation.c accordingly.
This enables the test to pass on arm-eabi with default cpu/fpu/mode,
where arm_qbit previously used -march=armv5te instead of armv6 now.
Jonathan Wakely [Fri, 14 May 2021 13:19:50 +0000 (14:19 +0100)]
libstdc++: Allow lualatex to be used for Doxygen PDF
This allows the Doxygen PDF to be built using lualatex instead of
pdflatex, which solves a problem with pdflatex running out of memory
sometimes. This is done by adding a --latex_cmd option to the
run_doxygen script, which then sets the specified command in the
generated user.cfg file used by Doxygen. The makefile is adjusted to
pass --latex_cmd=$(LATEX_CMD) to the script, so using running make with
LATEX_CMD=lualatex will override the default.
Additionally, this does some refactoring of the doc/Makefile.am rules
and the run_doxygen script.
libstdc++-v3/ChangeLog:
* doc/Makefile.am: Simplify doxygen recipes and use --latex_cmd.
* doc/Makefile.in: Regenerate.
* doc/doxygen/user.cfg.in (LATEX_CMD_NAME): Add placeholder
value.
* scripts/run_doxygen (print_usage): Always print to stdout and
do not exit.
(fail): New function for exiting on error.
(parse_options): Handle --latex_cmd. Do not treat --help the
same as errors. Simplify handling of required arguments.
Martin Liska [Thu, 13 May 2021 11:41:17 +0000 (13:41 +0200)]
LTO: merge -flto=foo both from IL and linker cmdline
gcc/ChangeLog:
* lto-wrapper.c (merge_flto_options): Factor out a new function.
(merge_and_complain): Use it.
(run_gcc): Merge also linker command line -flto=foo argument
with IL files.
Richard Biener [Mon, 17 May 2021 06:51:03 +0000 (08:51 +0200)]
Update mpfr version to 3.1.6
This updates the mpfr version to 3.1.6 which is the last bugfix
release from the 3.1.x series and avoids printing the version
is buggy but acceptable from our configury.
2021-05-17 Richard Biener <rguenther@suse.de>
contrib/ChangeLog:
* download_prerequisites: Update mpfr version to 3.1.6.
* prerequisites.md5: Update.
* prerequisites.sha512: Likewise.
Christophe Lyon [Sun, 16 May 2021 13:48:21 +0000 (13:48 +0000)]
arm: remove error in CPP_SPEC when -mlittle-endian and -mbig-endian are used together
arm.h has had this error message since 1997, but it is no longer
needed since option parsing has been improved: -mXXX-endian is handled
via arm.opt and updates the BIG_END mask. So, the last
instance of -mXXX-endian on the command line wins.
Tested on many arm* configurations, with no impact on the testsuite results.
Christophe Lyon [Sun, 16 May 2021 13:46:06 +0000 (13:46 +0000)]
testsuite/arm: Improve unsigned-float.c
The test requires an FPU, so use -march=armv7-a+fp -mfpu=auto instead
of -march=armv7-a.
We also remove dg-require-effective-target arm_fp_ok, but keep
dg-add-options arm_fp: this enables the test to pass on arm-eabi
configured with default cpu/fpu/mode.
dg-require-effective-target arm_fp_ok fails on such a configuration
for lack of FPU, since dg-options are not taken into account by
dg-require-effective-target.
Add -march=armv7-a+fp -mfpu=auto is sufficient for arm_fp options to
be acceptable.
This enables the test to pass on all the arm-eabi configurations I'm
testing, as well as arm-linux-gnueabi when forcing -march=armv5t.
reorg.c (fill_slots_from_thread): Reinstate code typoed out in "Remove CC0".
The typo here, is obviously mistaken removal of lines next
to a line that was validly removed. Targets affected are
those with a delay-slot *and* defining TARGET_FLAGS_REGNUM.
In-tree, a git-grep says the only ones matching are CRIS,
h8300 and visium. The code removal has the effect of
wrong-code, not reverting the effect of r11-2814.
I'm "guessing" it was the effect of an incorrect conflict
resolution in preparatory work for the r12-440 / bd1cd0d0e0fe / "Remove CC0" commit, when rebasing a related
branch, and not testing any of the affected targets. Either
way, the effect was a btest-gcc.sh state of "regress-1152"
for cris-elf. FWIW, I wrote the removed code (sans the
validly removed cc0 line), a part of what was committed at
2020-08-24 as 0e6c51de8ec47 / r11-2814.
This commit gets cris-elf test-results back to a sane state
(tested at 0ffdbc85d9a6 / r12-761).
gcc:
* reorg.c (fill_slots_from_thread): Reinstate code typoed out in
"Remove CC0".
Revert:
PR tree-optimization/100453
* tree-sra.c (sra_modify_assign): All const base accesses do not
need refreshing, not just those from decl_pool.
(sra_modify_assign): Do not refresh into a const base decl.
gcc/testsuite/ChangeLog:
2021-05-12 Martin Jambor <mjambor@suse.cz>
Revert:
PR tree-optimization/100453
* gcc.dg/tree-ssa/pr100453.c: New test.
Jakub Jelinek [Sat, 15 May 2021 08:12:11 +0000 (10:12 +0200)]
regcprop: Fix another cprop_hardreg bug [PR100342]
On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
> Ah, ok, thanks for the extra context.
>
> So AIUI the problem when recording xmm2<-di isn't just:
>
> [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>
> but also that:
>
> [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>
> For example, all registers in this sequence can be part of the same chain:
>
> (set (reg:HI R1) (reg:HI R0))
> (set (reg:SI R2) (reg:SI R1)) // [A]
> (set (reg:DI R3) (reg:DI R2)) // [A]
> (set (reg:SI R4) (reg:SI R[0-3]))
> (set (reg:HI R5) (reg:HI R[0-4]))
>
> But:
>
> (set (reg:SI R1) (reg:SI R0))
> (set (reg:HI R2) (reg:HI R1))
> (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
>
> When this happens, I guess we have two choices:
>
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
> (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
>
> I think we should test [B] as well as [A] though. The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate “if” or “else if”,
> with its own comment.
Unfortunately, we now have a testcase that shows that testing also [B]
is a problem (unfortunately now latent on the trunk, only reproduces
on 10 and 11 branches).
The comment in the patch tries to list just the interesting instructions,
we have a 64-bit value, copy low 8 bit of those to another register,
copy full 64 bits to another register and then clobber the original register.
Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
we have QI si, DI bp , si being the oldest_regno.
Next DI si is copied into DI dx. Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is
DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.
The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
&& partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode))
return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, what is the oldest_regno in the chain is
something that changes over time, so relying on it for anything is
problematic, something could have a different oldest_regno and later
on get a different oldest_regno (perhaps with different mode) because
the oldest_regno got overwritten and it can change both ways.
The following patch effectively implements your (2) above.
2021-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded destination
mode too or punt if !REG_CAN_CHANGE_MODE_P.
Comparing DECL_SOURCE_LOCATION like the GCC 11 patch for PR 95870 will also
work for user-defined functions, if we update their location when
instantiating. Another option would be to use LAMBDA_EXPR_REGEN_INFO for
lambdas, but this way is even simpler.
Kyrylo Tkachov [Fri, 14 May 2021 09:05:42 +0000 (10:05 +0100)]
aarch64: Make sqdmlal2 patterns match canonical RTL
The sqdmlal2 patterns are hidden beneath the SBINQOPS iterator and unfortunately they don't match
canonical RTL because the simple accumulate operand comes in the first arm of the SS_PLUS.
This patch splits the SS_PLUS and SS_MINUS forms with the SS_PLUS operands set up to match
the canonical form, where the complex operand comes first.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(aarch64_sqdml<SBINQOPS:as>l2_lane<mode>_internal): Split into...
(aarch64_sqdmlsl2_lane<mode>_internal): ... This...
(aarch64_sqdmlal2_lane<mode>_internal): ... And this.
(aarch64_sqdml<SBINQOPS:as>l2_laneq<mode>_internal): Split into ...
(aarch64_sqdmlsl2_laneq<mode>_internal): ... This...
(aarch64_sqdmlal2_laneq<mode>_internal): ... And this.
(aarch64_sqdml<SBINQOPS:as>l2_n<mode>_internal): Split into...
(aarch64_sqdmlsl2_n<mode>_internal): ... This...
(aarch64_sqdmlal2_n<mode>_internal): ... And this.