Jakub Jelinek [Fri, 19 Feb 2021 11:14:39 +0000 (12:14 +0100)]
tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]
The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR
labels are always the first label if there is more than one label.
When merging blocks, we don't honor that though.
On the following testcase, we try to merge blocks:
<bb 13> [count: 0]:
<L2>:
S::~S (&s);
and
<bb 15> [count: 0]:
<L0>:
resx 1
where <L2> is landing pad and <L0> is FORCED_LABEL. And the code puts
the FORCED_LABEL before the landing pad label, violating the verification
requirements.
The following patch fixes it by moving the FORCED_LABEL after the
DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label.
2021-02-19 Jakub Jelinek <jakub@redhat.com>
PR ipa/99034
* tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing
pad or non-local label, put FORCED_LABELs from bb b after that label
rather than before it.
Jakub Jelinek [Thu, 18 Feb 2021 21:17:52 +0000 (22:17 +0100)]
c: Fix ICE with -fexcess-precision=standard [PR99136]
The following testcase ICEs on i686-linux, because c_finish_return wraps
c_fully_folded retval back into EXCESS_PRECISION_EXPR, but when the function
return type is void, we don't call convert_for_assignment on it that would
then be fully folded again, but just put the retval into RETURN_EXPR's
operand, so nothing removes it anymore and during gimplification we
ICE as EXCESS_PRECISION_EXPR is not handled.
This patch fixes it by not adding that EXCESS_PRECISION_EXPR in functions
returning void, the return value is ignored and all we need is evaluate any
side-effects of the expression.
2021-02-18 Jakub Jelinek <jakub@redhat.com>
PR c/99136
* c-typeck.c (c_finish_return): Don't wrap retval into
EXCESS_PRECISION_EXPR in functions that return void.
Jakub Jelinek [Wed, 17 Feb 2021 14:03:25 +0000 (15:03 +0100)]
c++: Fix up build_zero_init_1 once more [PR99106]
My earlier build_zero_init_1 patch for flexible array members created
an empty CONSTRUCTOR. As the following testcase shows, that doesn't work
very well because the middle-end doesn't expect CONSTRUCTOR elements with
incomplete type (that the empty CONSTRUCTOR at the end of outer CONSTRUCTOR
had).
The following patch just doesn't add any CONSTRUCTOR for the flexible array
members, it doesn't seem to be needed.
2021-02-17 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/99106
* init.c (build_zero_init_1): For flexible array members just return
NULL_TREE instead of returning empty CONSTRUCTOR with non-complete
ARRAY_TYPE.
Jakub Jelinek [Mon, 15 Feb 2021 08:16:06 +0000 (09:16 +0100)]
match.pd: Fix up A % (cast) (pow2cst << B) simplification [PR99079]
The (mod @0 (convert?@3 (power_of_two_cand@1 @2))) simplification
uses tree_nop_conversion_p (type, TREE_TYPE (@3)) condition, but I believe
it doesn't check what it was meant to check. On convert?@3
TREE_TYPE (@3) is not the type of what it has been converted from, but
what it has been converted to, which needs to be (because it is operand
of normal binary operation) equal or compatible to type of the modulo
result and first operand - type.
I could fix that by using && tree_nop_conversion_p (type, TREE_TYPE (@1))
and be done with it, but actually most of the non-nop conversions are IMHO
ok and so we would regress those optimizations.
In particular, if we have say narrowing conversions (foo5 and foo6 in
the new testcase), I think we are fine, either the shift of the power of two
constant after narrowing conversion is still that power of two (or negation
of that) and then it will still work, or the result of narrowing conversion
is 0 and then we would have UB which we can ignore.
Similarly, widening conversions where the shift result is unsigned are fine,
or even widening conversions where the shift result is signed, but we sign
extend to a signed wider divisor, the problematic case of INT_MIN will
become x % (long long) INT_MIN and we can still optimize that to
x & (long long) INT_MAX.
What doesn't work is the case in the pr99079.c testcase, widening conversion
of a signed shift result to wider unsigned divisor, where if the shift
is negative, we end up with x % (unsigned long long) INT_MIN which is
x % 0xffffffff80000000ULL where the divisor is not a power of two and
we can't optimize that to x & 0x7fffffffULL.
So, the patch rejects only the single problematic case.
Furthermore, when the shift result is signed, we were introducing UB into
a program which previously didn't have one (well, left shift into the sign
bit is UB in some language/version pairs, but it is definitely valid in
C++20 - wonder if I shouldn't move the gcc.c-torture/execute/pr99079.c
testcase to g++.dg/torture/pr99079.C and use -std=c++20), by adding that
subtraction of 1, x % (1 << 31) in C++20 is well defined, but
x & ((1 << 31) - 1) triggers UB on the subtraction.
So, the patch performs the subtraction in the unsigned type if it isn't
wrapping.
2021-02-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/99079
* match.pd (A % (pow2pcst << N) -> A & ((pow2pcst << N) - 1)): Remove
useless tree_nop_conversion_p (type, TREE_TYPE (@3)) check. Instead
require both type and TREE_TYPE (@1) to be integral types and either
type having smaller or equal precision, or TREE_TYPE (@1) being
unsigned type, or type being signed type. If TREE_TYPE (@1)
doesn't have wrapping overflow, perform the subtraction of one in
unsigned type.
* gcc.dg/fold-modpow2-2.c: New test.
* gcc.c-torture/execute/pr99079.c: New test.
Jakub Jelinek [Fri, 12 Feb 2021 08:55:46 +0000 (09:55 +0100)]
c++: Fix endless errors on invalid requirement seq [PR97742]
As the testcase shows, if we reach CPP_EOF during parsing of requirement
sequence, we end up with endless loop where we always report invalid
requirement expression, don't consume any token (as we are at eof) and
repeat.
This patch stops the loop when we reach CPP_EOF.
2021-02-12 Jakub Jelinek <jakub@redhat.com>
PR c++/97742
* parser.c (cp_parser_requirement_seq): Stop iterating after reaching
CPP_EOF.
Jakub Jelinek [Thu, 11 Feb 2021 16:24:17 +0000 (17:24 +0100)]
c++: Fix zero initialization of flexible array members [PR99033]
array_type_nelts returns error_mark_node for type of flexible array members
and build_zero_init_1 was placing an error_mark_node into the CONSTRUCTOR,
on which e.g. varasm ICEs. I think there is nothing erroneous on zero
initialization of flexible array members though, such arrays should simply
get no elements, like they do if such classes are constructed (everything
except when some larger initializer comes from an explicit initializer).
So, this patch handles [] arrays in zero initialization like [0] arrays
and fixes handling of the [0] arrays - the
tree_int_cst_equal (max_index, integer_minus_one_node) check
didn't do what it thought it would do, max_index is typically unsigned
integer (sizetype) and so it is never equal to a -1.
What the patch doesn't do and maybe would be desirable is if it returns
error_mark_node for other reasons let the recursive callers not stick that
into CONSTRUCTOR but return error_mark_node instead. But I don't have a
testcase where that would be needed right now.
2021-02-11 Jakub Jelinek <jakub@redhat.com>
PR c++/99033
* init.c (build_zero_init_1): Handle zero initialiation of
flexible array members like initialization of [0] arrays.
Use integer_minus_onep instead of comparison to integer_minus_one_node
and integer_zerop instead of comparison against size_zero_node.
Formatting fixes.
Jakub Jelinek [Wed, 10 Feb 2021 18:52:37 +0000 (19:52 +0100)]
varasm: Fix ICE with -fsyntax-only [PR99035]
My FE change from 2 years ago uses TREE_ASM_WRITTEN in -fsyntax-only
mode more aggressively to avoid "expanding" functions multiple times.
With -fsyntax-only nothing is really expanded, so I think it is acceptable
to adjust the assert and allow declare_weak at any time, with -fsyntax-only
we know it is during parsing only anyway.
2021-02-10 Jakub Jelinek <jakub@redhat.com>
PR c++/99035
* varasm.c (declare_weak): For -fsyntax-only, allow even
TREE_ASM_WRITTEN function decls.
Jakub Jelinek [Wed, 10 Feb 2021 18:31:15 +0000 (19:31 +0100)]
c++: Consider addresses of heap artificial vars always non-NULL [PR98988, PR99031]
With -fno-delete-null-pointer-checks which is e.g. implied by
-fsanitize=undefined or default on some embedded targets, the middle-end
folder doesn't consider addresses of global VAR_DECLs to be non-NULL, as one
of them could have address 0. Still, I think malloc/operator new (at least
the nonthrowing) relies on NULL returns meaning allocation failure rather
than success. Furthermore, the artificial VAR_DECLs we create for
constexpr new never actually live in the address space of the program,
so we can pretend they will never be NULL too.
> I'm surprised that nonzero_address has such a limited set of things it will
> actually believe have non-zero addresses with
> -fno-delete-null-pointer-checks. But it seems that we should be able to
> arrange to satisfy
>
> > if (definition && !DECL_EXTERNAL (decl)
>
> since these "variables" are indeed defined within the current translation
> unit.
Doing that seems to work and as added benefit it fixes another PR that has
been filed recently. I need to create the varpool node explicitly and call
a method that sets the definition member in there, but I can also unregister
those varpool nodes at the end of constexpr processing, as the processing
ensured they don't leak outside of the processing.
2021-02-10 Jakub Jelinek <jakub@redhat.com>
PR c++/98988
PR c++/99031
* constexpr.c: Include cgraph.h.
(cxx_eval_call_expression): Call varpool_node::finalize_decl on
heap artificial vars.
(cxx_eval_outermost_constant_expr): Remove varpool nodes for
heap artificial vars.
* g++.dg/cpp2a/constexpr-new16.C: New test.
* g++.dg/cpp2a/constexpr-new17.C: New test.
Jakub Jelinek [Wed, 10 Feb 2021 09:34:58 +0000 (10:34 +0100)]
openmp: Temporarily disable into_ssa when gimplifying OpenMP reduction clauses [PR99007]
gimplify_scan_omp_clauses was already calling gimplify_expr with false as
last argument to make sure it is not an SSA_NAME, but as the testcases show,
that is not enough, SSA_NAME temporaries created during that gimplification
can be reused too and we can't allow SSA_NAMEs to be used across OpenMP
region boundaries, as we can only firstprivatize decls.
Fixed by temporarily disabling into_ssa.
2021-02-10 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99007
* gimplify.c (gimplify_scan_omp_clauses): For MEM_REF on reductions,
temporarily disable gimplify_ctxp->into_ssa around gimplify_expr
calls.
* g++.dg/gomp/pr99007.C: New test.
* gcc.dg/gomp/pr99007-1.c: New test.
* gcc.dg/gomp/pr99007-2.c: New test.
* gcc.dg/gomp/pr99007-3.c: New test.
Jakub Jelinek [Fri, 5 Feb 2021 09:22:07 +0000 (10:22 +0100)]
c++: Fix ICE with structured binding initialized to incomplete array [PR97878]
We ICE on the following testcase, for incomplete array a on auto [b] { a }; without
giving any kind of diagnostics, with auto [c] = a; during error-recovery.
The problem is that we get too far through check_initializer and e.g.
store_init_value -> constexpr stuff can't deal with incomplete array types.
As the type of the structured binding artificial variable is always deduced,
I think it is easiest to diagnose this early, even if they have array types
we'll need their deduced type to be complete rather than just its element
type.
2021-02-05 Jakub Jelinek <jakub@redhat.com>
PR c++/97878
* decl.c (check_array_initializer): For structured bindings, require
the array type to be complete.
Jakub Jelinek [Wed, 3 Feb 2021 08:09:26 +0000 (09:09 +0100)]
ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]
As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in
some insns) at expand_binop or expand_unop and expects it to do something
(and then will check if it created valid insns and punts if not).
These functions in the end if the operands don't match try to
copy_to_mode_reg the operands, which does
if (!general_operand (x, VOIDmode))
x = force_operand (x, temp);
but, force_operand is far from handling all possible RTLs, it will ICE for
all more unusual RTL codes. Basically handles just simple arithmetic and
unary RTL operations if they have an optab and
expand_simple_binop/expand_simple_unop ICE on others.
The following patch fixes it by adding some operand verification (whether
there is a hope that copy_to_mode_reg will succeed on those). It is added
both to noce_emit_move_insn (not needed for this exact testcase,
that function simply tries to recog the insn as is and if it fails,
handles some simple binop/unop cases; the patch performs the verification
of their operands) and noce_try_sign_mask.
2021-02-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97487
* ifcvt.c (noce_can_force_operand): New function.
(noce_emit_move_insn): Use it.
(noce_try_sign_mask): Likewise. Formatting fix.
* gcc.dg/pr97487-1.c: New test.
* gcc.dg/pr97487-2.c: New test.
Jakub Jelinek [Wed, 3 Feb 2021 08:07:36 +0000 (09:07 +0100)]
lra-constraints: Fix error-recovery for bad inline-asms [PR97971]
The following testcase has ice-on-invalid, it can't be reloaded, but we
shouldn't ICE the compiler because the user typed non-sense.
In current_insn_transform we have:
if (process_alt_operands (reused_alternative_num))
alt_p = true;
if (check_only_p)
return ! alt_p || best_losers != 0;
/* If insn is commutative (it's safe to exchange a certain pair of
operands) then we need to try each alternative twice, the second
time matching those two operands as if we had exchanged them. To
do this, really exchange them in operands.
If we have just tried the alternatives the second time, return
operands to normal and drop through. */
if (! alt_p && ! sec_mem_p)
{
/* No alternative works with reloads?? */
if (INSN_CODE (curr_insn) >= 0)
fatal_insn ("unable to generate reloads for:", curr_insn);
error_for_asm (curr_insn,
"inconsistent operand constraints in an %<asm%>");
lra_asm_error_p = true;
...
and so handle inline asms there differently (and delete/nullify them after
this) - fatal_insn is only called for non-inline asm.
But in process_alt_operands we do:
/* Both the earlyclobber operand and conflicting operand
cannot both be user defined hard registers. */
if (HARD_REGISTER_P (operand_reg[i])
&& REG_USERVAR_P (operand_reg[i])
&& operand_reg[j] != NULL_RTX
&& HARD_REGISTER_P (operand_reg[j])
&& REG_USERVAR_P (operand_reg[j]))
fatal_insn ("unable to generate reloads for "
"impossible constraints:", curr_insn);
and thus ICE even for inline-asms.
I think it is inappropriate to delete/nullify the insn in
process_alt_operands, as it could be done e.g. in the check_only_p mode,
so this patch just returns false in that case, which results in the
caller have alt_p false, and as inline asm isn't simple move, sec_mem_p
will be also false (and it isn't commutative either), so for check_only_p
it will suggests to the callers it isn't ok and otherwise will emit
error and delete/nullify the inline asm insn.
2021-02-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97971
* lra-constraints.c (process_alt_operands): For inline asm, don't call
fatal_insn, but instead return false.
Jakub Jelinek [Wed, 3 Feb 2021 08:04:26 +0000 (09:04 +0100)]
i386: Remove V1DImode shift expanders [PR98287]
On Tue, Feb 02, 2021 at 02:23:55PM +0100, Richard Biener wrote:
> All I say is that the x86 target
> should either not advertise V1DF shifts or advertise the basic
> ops that reasonable simplification would expect to exist.
The backend has several V1?Imode shifts, but optab only for those V1DImode
ones:
Harald Anlauf [Wed, 10 Mar 2021 21:59:50 +0000 (22:59 +0100)]
PR fortran/99205 - Out of memory with undefined character length
A character variable appearing as a data statement object cannot
be automatic, thus it shall have constant length.
gcc/fortran/ChangeLog:
PR fortran/99205
* data.c (gfc_assign_data_value): Reject non-constant character
length for lvalue.
* trans-array.c (gfc_conv_array_initializer): Restrict loop to
elements which are defined to avoid NULL pointer dereference.
gcc/testsuite/ChangeLog:
PR fortran/99205
* gfortran.dg/data_char_4.f90: New test.
* gfortran.dg/data_char_5.f90: New test.
Eric Botcazou [Fri, 19 Mar 2021 08:21:11 +0000 (09:21 +0100)]
Fix segfault during encoding of CONSTRUCTORs
The segfault occurs in native_encode_initializer when it is encoding the
CONSTRUCTOR for an array whose lower bound is negative (it's OK in Ada).
The computation of the current position is done in HOST_WIDE_INT and this
does not work for arrays whose original range has a negative lower bound
and a positive upper bound; the computation must be done in sizetype
instead so that it may wrap around.
gcc/
PR middle-end/99641
* fold-const.c (native_encode_initializer) <CONSTRUCTOR>: For an
array type, do the computation of the current position in sizetype.
Sinan Lin [Thu, 4 Mar 2021 10:02:39 +0000 (18:02 +0800)]
PR target/99314: Fix integer signedness issue for cpymem pattern expansion.
Third operand of cpymem pattern is unsigned HOST_WIDE_INT, however we
are interpret that as signed HOST_WIDE_INT, that not a problem in
most case, but when the value is large than signed HOST_WIDE_INT, it
might screw up since we have using that value to calculate the buffer
size.
2021-03-05 Sinan Lin <sinan@isrc.iscas.ac.cn>
Kito Cheng <kito.cheng@sifive.com>
gcc/ChangeLog:
* config/riscv/riscv.c (riscv_block_move_straight): Change type
to unsigned HOST_WIDE_INT for parameter and local variable with
HOST_WIDE_INT type.
(riscv_adjust_block_mem): Ditto.
(riscv_block_move_loop): Ditto.
(riscv_expand_block_move): Ditto.
Kyrylo Tkachov [Thu, 18 Mar 2021 08:57:01 +0000 (08:57 +0000)]
aarch64: Improve generic SVE tuning defaults
This patch adds the recently-added tweak to split some SVE VL-based scalar
operations [1] to the generic tuning used for SVE, as enabled by adding +sve
to the -march flag, for example -march=armv8.2-a+sve.
The recommendation for best performance on a particular CPU remains unchanged:
use the -mcpu option for that CPU, where possible. -mcpu=native makes this
straightforward for native compilation.
The tweak to split out SVE VL-based scalar operations is a consistent win for
the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of
SPEC2017 on A64FX with this tweak on didn't show any non-noise differences.
It is also expected to be neutral on SVE2 implementations.
Therefore, the patch enables the tweak for generic +sve tuning e.g.
-march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it,
therefore the tweak is disabled for generic tuning when +sve2 is in
-march e.g. -march=armv8.2-a+sve2.
The implementation of this approach requires a bit of custom logic in
aarch64_override_options_internal to handle these kinds of
architecture-dependent decisions, but we do believe the user-facing principle
here is important to implement.
In general, for the generic target we're using a decision framework that looks
like:
* If all cores that are known to benefit from an optimization
are of architecture X, and all other cores that implement X or above
are not impacted, or have a very slight impact, we will consider it for
generic tuning for architecture X.
* We will not enable that optimisation for generic tuning for architecture X+1
if no known cores of architecture X+1 or above will benefit.
This framework allows us to improve generic tuning for CPUs of generation X
while avoiding accumulating tweaks for future CPUs of generation X+1, X+2...
that do not need them, and thus avoid even the slight negative effects of
these optimisations if the user is willing to tell us the desired architecture
accurately.
X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a etc)
or optional architecture extensions (like SVE, SVE2).
* config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning): Define.
(aarch64_override_options_internal): Use it.
(generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
tune_flags.
Jason Merrill [Fri, 5 Mar 2021 22:07:25 +0000 (17:07 -0500)]
testsuite: Update testcase for PR96078 fix [PR99363]
My fix for PR96078 made us stop warning about flatten on an alias if the
target has the alias, which is exactly the case tested here. So let's
remove the expected warning and add a similar case which does warn.
Kyrylo Tkachov [Wed, 17 Mar 2021 18:21:05 +0000 (18:21 +0000)]
aarch64: Fix status return logic in RNG intrinsics
There is a bug with the RNG intrinsics in their return code. The definition says:
"Stores a 64-bit random number into the object pointed to by the argument and returns zero.
If the implementation could not generate a random number within a reasonable period of time
the object pointed to by the input is set to zero and a non-zero value is returned."
This means we should be testing whether to return non-zero with:
CSET W0, EQ
rather than NE.
This patch fixes that.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (aarch64_expand_rng_builtin): Use EQ
to compare against CC_REG rather than NE.
Peter Bergner [Wed, 17 Mar 2021 15:52:26 +0000 (10:52 -0500)]
rs6000: Fix disassembling a vector pair in gcc-10 in little-endian mode
In gcc-10, we don't handle disassembling a vector pair in little-endian mode
correctly. The solution is to make use of the disassemble accumulator code
that is endian friendly.
gcc/
2021-03-17 Peter Bergner <bergner@linux.ibm.com>
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
disassembling a vector pair vector by vector in little-endian mode.
Martin Jambor [Tue, 16 Mar 2021 15:42:41 +0000 (16:42 +0100)]
ipa: Fix resolving speculations through cgraph_edge::set_call_stmt
In the PR 98078 testcase, speculative call-graph edges which were
created by IPA-CP are confirmed during inlining but
cgraph_edge::set_call_stmt does not take it very well.
The function enters the update_speculative branch and updates the
edges in the speculation bundle separately (by a recursive call), but
when it processes the first direct edge, most of the bundle actually
ceases to exist because it is devirtualized. It nevertheless goes on
to attempt to update the indirect edge (that has just been removed),
which surprisingly gets as far as adding the edge to the
call_site_hash, the same devirtualized edge for the second time, and
that triggers an assert.
Fixed by this patch which makes the function aware that it is about to
resolve a speculation and do so instead of updating components of
speculation. Also, it does so before dealing with the hash because
the speculation resolution code needs the hash to point to the first
speculative direct edge and also cleans the hash up by calling
update_call_stmt_hash_for_removing_direct_edge.
Bootstrapped and tested on x86_64-linux, also profile-LTO-bootstrapped
on the same system.
gcc/ChangeLog:
2021-01-20 Martin Jambor <mjambor@suse.cz>
PR ipa/98078
* cgraph.c (cgraph_edge::set_call_stmt): Do not update all
corresponding speculative edges if we are about to resolve
sepculation. Make edge direct (and so resolve speculations) before
removing it from call_site_hash.
(cgraph_edge::make_direct): Relax the initial assert to allow calling
the function on speculative direct edges.
Richard Biener [Wed, 24 Feb 2021 08:18:05 +0000 (09:18 +0100)]
c/99224 - avoid ICEing on invalid __builtin_next_arg
This avoids crashes with __builtin_next_arg on non-parameters. For
the specific testcase we arrive with an anonymous SSA_NAME so that
SSA_NAME_VAR becomes NULL and we crash.
This fixes an ordering problem with verifying that no intermediate
computations in a reduction path are used outside of the chain. The
check was disabled for value-preserving conversions at the tail
but whether a stmt was a conversion or not was only computed after
the first use. The following fixes this by re-ordering things
accordingly.
2021-02-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/99253
* tree-vect-loop.c (check_reduction_path): First compute
code, then verify out-of-loop uses.
Kyrylo Tkachov [Fri, 12 Mar 2021 13:18:00 +0000 (13:18 +0000)]
aarch64: Set AARCH64_EXTRA_TUNE_PREFER_ADVSIMD_AUTOVEC for Neoverse N2
This patch tweaks the Neoverse N2 tuning on the GCC 10 branch to have it
in line with GCC 8 and 9 to prefer AdvancedSIMD over SVE for
auto-vectorisation.
gcc/ChangeLog:
* config/aarch64/aarch64.c (neoversen2_tunings): Set
AARCH64_EXTRA_TUNE_PREFER_ADVSIMD_AUTOVEC tune_flags.
We were missing a check in function_resolver::require_vector_type to see
if the argument type was already invalid. This was causing us to attempt
to emit a diagnostic and subsequently ICE in print_type. Fixed thusly.
Peter Bergner [Mon, 8 Mar 2021 18:20:41 +0000 (12:20 -0600)]
rs6000: Fix invalid splits when using Altivec style addresses [PR98959]
The rs6000_emit_le_vsx_* functions assume they are not passed an Altivec
style "& ~16" address. However, some of our expanders and splitters do
not verify we do not have an Altivec style address before calling those
functions, leading to an ICE. The solution here is to guard the expanders
and splitters to ensure we do not call them if we're given an Altivec style
address.
2021-03-08 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/98959
* config/rs6000/rs6000.c (rs6000_emit_le_vsx_permute): Add an assert
to ensure we do not have an Altivec style address.
* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): Disable if passed
an Altivec style address.
(*vsx_le_perm_store_<mode>): Likewise.
(splitters after *vsx_le_perm_store_<mode>): Likewise.
(vsx_load_<mode>): Disable special expander if passed an Altivec
style address.
(vsx_store_<mode>): Likewise.
gcc/testsuite/
PR target/98959
* gcc.target/powerpc/pr98959.c: New test.
Peter Bergner [Fri, 26 Feb 2021 03:35:28 +0000 (21:35 -0600)]
rs6000: Fix ICE in rs6000_init_builtins when compiling with -mcpu=440 [PR99279]
The initialization of compat builtins assumes the builtin we are creating
a compatible builtin for exists and ICEs if it doesn't. However, there are
valid reasons why some builtins are disabled for a particular compile.
In this case, the MMA builtins are disabled for -mcpu=440 (and other cpus),
so instead of ICEing, we should just skip adding the MMA compat builtin.
2021-02-25 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/99279
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Replace assert
with an "if" test.
Peter Bergner [Tue, 23 Feb 2021 23:16:10 +0000 (17:16 -0600)]
rs6000: Add support for compatibility built-ins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. It's too late to remove the
old names, so this patch renames the built-ins to the new names and then
adds support for creating compatibility built-ins (ie, multiple built-in
functions generate the same code) and then creates compatibility built-ins
using the old names.
2021-02-23 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/mma.md (mma_assemble_pair): Rename from this...
(vsx_assemble_pair): ...to this.
* config/rs6000/rs6000-builtin.def (BU_MMA_V2, BU_MMA_V3,
BU_COMPAT): New macros.
(mma_assemble_pair): Rename from this...
(vsx_assemble_pair): ...to this.
(mma_disassemble_pair): Rename from this...
(vsx_disassemble_pair): ...to this.
(mma_assemble_pair): New compatibility built-in.
(mma_disassemble_pair): Likewise.
* config/rs6000/rs6000-call.c (struct builtin_compatibility): New.
(RS6000_BUILTIN_COMPAT): Define.
(bdesc_compat): New.
(rs6000_gimple_fold_mma_builtin): Use VSX_BUILTIN_ASSEMBLE_PAIR.
(rs6000_init_builtins): Register compatibility built-ins.
(mma_init_builtins): Use VSX_BUILTIN_ASSEMBLE_PAIR,
and VSX_BUILTIN_DISASSEMBLE_PAIR.
* doc/extend.texi (__builtin_mma_assemble_pair): Rename from this...
(__builtin_vsx_assemble_pair): ...to this.
(__builtin_mma_disassemble_pair): Rename from this...
(__builtin_vsx_disassemble_pair): ...to this.
gcc/testsuite/
* gcc.target/powerpc/mma-builtin-4.c: Add tests for
__builtin_vsx_assemble_pair and __builtin_vsx_disassemble_pair.
Add __has_builtin tests for built-ins.
Update expected instruction counts.
Peter Bergner [Thu, 11 Feb 2021 20:15:26 +0000 (14:15 -0600)]
rs6000: Fix invalid address used in MMA built-in function
The mma_assemble_input_operand predicate is too lenient on the memory
operands it will accept, leading to an ICE when illegitimate addresses
are passed in. The solution is to only accept memory operands with
addresses that are valid for quad word memory accesses. The test case
is a minimized test case from the Eigen library. The creduced test case
is very noisy with respect to warnings, so the test case has added -w to
silence them.
2021-02-11 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/99041
* config/rs6000/predicates.md (mma_assemble_input_operand): Restrict
memory addresses that are legal for quad word accesses.
gcc/testsuite/
PR target/99041
* g++.target/powerpc/pr99041.C: New test.
Eric Botcazou [Wed, 10 Mar 2021 11:04:25 +0000 (12:04 +0100)]
Fix ICE on atomic enumeration type with LTO
This is a strange regression whereby an enumeration type declared as
atomic (or volatile) incorrectly triggers the ODR machinery for its
values in LTO mode.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Build a TYPE_STUB_DECL
for the main variant of an enumeration type declared as volatile.
gcc/testsuite/
* gnat.dg/specs/lto25.ads: New test.
Eric Botcazou [Tue, 9 Mar 2021 15:12:22 +0000 (16:12 +0100)]
Fix internal error on lambda function
This boils down to the RTL expander trying to take the address of a DECL
whose RTX is a register.
gcc/
PR c++/90448
* calls.c (initialize_argument_information): When the argument
is passed by reference, do not make a copy in a thunk only if
the argument is already in memory. Remove redundant test for
the case of callee copy.
Kyrylo Tkachov [Mon, 8 Mar 2021 09:35:14 +0000 (09:35 +0000)]
aarch64: Add internal tune flag to minimise VL-based scalar ops
This is a backport of the cse_sve_vl_constants tuning param to GCC 10.
Bootstrapped and tested on the branch on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
Define.
* config/aarch64/aarch64.md (add<mode>3): Force CONST_POLY_INT immediates
into a register when the above is enabled.
* config/aarch64/aarch64.c (neoversev1_tunings):
AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
(aarch64_rtx_costs): Use AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
gcc/testsuite/
* gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.
Eric Botcazou [Fri, 5 Mar 2021 11:45:41 +0000 (12:45 +0100)]
Fix build breakage with latest glibc release
gcc/ada/
PR ada/99264
* init.c (__gnat_alternate_sta) [Linux]: Remove preprocessor test on
MINSIGSTKSZ and bump size to 32KB.
* libgnarl/s-osinte__linux.ads (Alternate_Stack_Size): Bump to 32KB.
Jason Merrill [Wed, 3 Mar 2021 22:38:55 +0000 (17:38 -0500)]
c++: Normalization and deduction guide rewriting [PR96199]
This is a subset of r11-2748; we don't want to rewrite all deduction guides
in GCC 10, but we do need the push_nested_class in normalization to avoid
breaking cmcstl2.
gcc/cp/ChangeLog:
PR c++/96199
* cp-tree.h (struct push_nested_class_guard): New.
* constraint.cc (get_normalized_constraints_from_decl): Use it.
Jason Merrill [Wed, 3 Mar 2021 04:59:00 +0000 (23:59 -0500)]
c++: C++17 and decltype of multi-operator expression [PR95675]
A call that is the immediate operand of decltype has special semantics: no
temporary is produced, so it's OK for the return type to be e.g. incomplete.
But we were treating (e | f) the same way, which confused overload
resolution when we then tried to evaluate ... | g.
Fixed by making build_temp do what its name says, and force the C++17
temporary materialization conversion.
gcc/cp/ChangeLog:
PR c++/95675
* call.c (build_temp): Wrap a CALL_EXPR in a TARGET_EXPR
if it didn't get one before.
gcc/testsuite/ChangeLog:
PR c++/95675
* g++.dg/cpp0x/decltype-call5.C: New test.
* g++.dg/cpp0x/decltype-call6.C: New test.
Jason Merrill [Fri, 12 Feb 2021 03:01:19 +0000 (22:01 -0500)]
cgraph: flatten and same_body aliases [PR96078]
The patch for PR92372 made us start warning about a flatten attribute on an
alias. But in the case of C++ 'tor base/complete variants, the user didn't
create the alias. If the alias target also has the attribute, the alias
points to a flattened function, so we shouldn't warn.
gcc/ChangeLog:
PR c++/96078
* cgraphunit.c (process_function_and_variable_attributes): Don't
warn about flatten on an alias if the target also has it.
* cgraph.h (symtab_node::get_alias_target_tree): New.
gcc/testsuite/ChangeLog:
PR c++/96078
* g++.dg/ext/attr-flatten1.C: New test.
Jason Merrill [Thu, 25 Feb 2021 21:47:53 +0000 (16:47 -0500)]
c++: Fix class NTTP constness handling [PR98810]
Here, when substituting still-dependent args into an alias template, we see
a non-const type because the default argument is non-const, and is not a
template parm object because it's still dependent.
gcc/cp/ChangeLog:
PR c++/98810
* pt.c (tsubst_copy) [VIEW_CONVERT_EXPR]: Add const
to a class non-type template argument that needs it.
gcc/testsuite/ChangeLog:
PR c++/98810
* g++.dg/cpp2a/nontype-class-defarg1.C: New test.
Eric Botcazou [Wed, 3 Mar 2021 11:25:03 +0000 (12:25 +0100)]
Fix ICE with pathologically large frames
gcc/
PR target/99234
* config/i386/i386.c (ix86_compute_frame_layout): For a SEH target,
point back the hard frame pointer to its default location when the
frame is larger than SEH_MAX_FRAME_SIZE.
Richard Biener [Wed, 20 Jan 2021 07:48:34 +0000 (08:48 +0100)]
tree-optimization/98758 - fix integer arithmetic in data-ref analysis
This fixes some int arithmetic issues and a bogus truncation.
2021-01-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/98758
* tree-data-ref.c (int_divides_p): Use lambda_int arguments.
(lambda_matrix_right_hermite): Avoid undefinedness with
signed integer abs and multiplication.
(analyze_subscript_affine_affine): Use lambda_int.
Richard Biener [Wed, 13 Jan 2021 08:43:52 +0000 (09:43 +0100)]
tree-optimization/98640 - fix bogus sign-extension with VN
VN tried to express a sign extension from int to long of
a trucated quantity with a plain conversion but that loses the
truncation. Since there's no single operand doing truncate plus
sign extend (there was a proposed SEXT_EXPR to do that at some
point mapping to RTL sign_extract) don't bother to appropriately
model this with two ops (which the VN insert machinery doesn't
handle and which is unlikely to CSE fully).
2021-01-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/98640
* tree-ssa-sccvn.c (visit_nary_op): Do not try to
handle plus or minus from a truncated operand to be
sign-extended.
This fixes a double-counting in the reduction cost when vectorizing
the reduction through the regular vectorizable_* functions.
2021-01-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/98526
* tree-vect-loop.c (vect_model_reduction_cost): Remove costing
of the actual reduction op for the regular case.
(vectorizable_reduction): Cost the stmts
vect_transform_reduction produces here.
Richard Biener [Thu, 19 Nov 2020 08:06:50 +0000 (09:06 +0100)]
tree-optimization/97897 - complex lowering on abnormal edges
This fixes complex lowering to not put constants into abnormal
edge PHI values by making sure abnormally used SSA names are
VARYING in its propagation lattice.
2020-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/97897
* tree-complex.c (complex_propagate::visit_stmt): Make sure
abnormally used SSA names are VARYING.
(complex_propagate::visit_phi): Likewise.
Eric Botcazou [Tue, 2 Mar 2021 16:58:46 +0000 (17:58 +0100)]
Fix PR ada/99095
This is a regression present on the mainline and 10 branch, where we fail
to make the bounds explicit for the return value of a function returning
an unconstrained array of a limited record type.
gcc/ada/
PR ada/99095
* sem_ch8.adb (Check_Constrained_Object): Restrict again the special
optimization for limited types to non-array types except in the case
of an extended return statement.
gcc/testsuite/
* gnat.dg/limited5.adb: New test.
Kito Cheng [Tue, 7 Jul 2020 08:20:53 +0000 (16:20 +0800)]
RISC-V: Implement __builtin_thread_pointer
RISC-V has a dedicate register for thread pointer which is specified in psABI
doc, so we could support __builtin_thread_pointer in straightforward way.
Note: clang/llvm was supported __builtin_thread_pointer for RISC-V port
recently.
- https://reviews.llvm.org/rGaabc24acf0d5f8677bd22fe9c108581e07c3e180
Richard Earnshaw [Mon, 22 Feb 2021 15:00:53 +0000 (15:00 +0000)]
arm: force use of r4 for __gnu_cmse_nonsecure_call when !FPCXT [PR99271]
Commit r10-6017 relaxed the constraint on thumb2 calls to
__gnu_cmse_nonsecure_call to allow any register for the call address.
Although the initial code expansion continues to use r4 with the FPCXT
extension is not enabled, the change was unsafe because subsequent
optimizations could use the additional freedom to change which
register was being used.
To fix this we need to split the output patterns in the machine
description to use distinct recognizers: one with the additional
freedom when FPCXT is enabled an another that retains the original
restrictions when the extension is not available.
gcc:
PR target/99271
* config/arm/thumb2.md (nonsecure_call_reg_thumb2_fpcxt): New pattern.
(nonsecure_call_value_reg_thumb2_fpcxt): Likewise.
(nonsecure_call_reg_thumb2): Restrict to using r4 for the callee
address and disable when the FPCXT is not available.
(nonsecure_call_value_reg_thumb2): Likewise.
gcc/testsuite:
* gcc.target/arm/cmse/cmse-18.c: New test.
Eric Botcazou [Mon, 1 Mar 2021 06:53:05 +0000 (07:53 +0100)]
Fix wrong result for 1.0/3.0 at -O2 -fno-omit-frame-pointer -frounding-math
This wrong-code PR for the C++ compiler on x86-64/Windows is a regression
in GCC 9 and later, but the underlying issue has probably been there since
SEH was implemented and is exposed by this comment in config/i386/winnt.c:
/* SEH records saves relative to the "current" stack pointer, whether
or not there's a frame pointer in place. This tracks the current
stack pointer offset from the CFA. */
HOST_WIDE_INT sp_offset;
That's not what the (current) Microsoft documentation says; instead it says:
/* SEH records offsets relative to the lowest address of the fixed stack
allocation. If there is no frame pointer, these offsets are from the
stack pointer; if there is a frame pointer, these offsets are from the
value of the stack pointer when the frame pointer was established, i.e.
the frame pointer minus the offset in the .seh_setframe directive. */
That's why the implementation is correct only under the condition that the
frame pointer be established *after* the fixed stack allocation; as a matter
of fact, that's clearly the model underpinning SEH, but is the opposite of
what is done e.g. on Linux.
However the issue is mostly papered over in practice because:
1. SEH forces use_fast_prologue_epilogue to false, which in turns forces
save_regs_using_mov to false, so the general regs are always pushed when
they need to be saved, which eliminates the offset computation for them.
2. As soon as a frame is larger than 240 bytes, the frame pointer is fixed
arbitrarily to 128 bytes above the stack pointer, which of course requires
that it be established after the fixed stack allocation.
So you need a small frame clobbering one of the call-saved XMM registers in
order to generate wrong SEH unwind info.
The attached fix makes sure that the frame pointer is always established
after the fixed stack allocation by pointing it at or below the lowest used
register save area, i.e. the SSE save area, and removing the special early
saves in the prologue; the end result is a uniform prologue sequence for
SEH whatever the frame size. And it avoids a discrepancy between cases
where the number of saved general regs is even and cases where it is odd.
gcc/
PR target/99234
* config/i386/i386.c (ix86_compute_frame_layout): For a SEH target,
point the hard frame pointer to the SSE register save area instead
of the general register save area. Perform only minimal adjustment
for small frames if it is initially not correctly aligned.
(ix86_expand_prologue): Remove early saves for a SEH target.
* config/i386/winnt.c (struct seh_frame_state): Document constraint.
gcc/testsuite/
* g++.dg/eh/seh-xmm-unwind.C: New test.
Jason Merrill [Fri, 26 Feb 2021 10:45:02 +0000 (05:45 -0500)]
c++: Allow GNU attributes before lambda -> [PR90333]
In my 9.3/10 patch for 90333 I allowed attributes between [] and (), and
after the trailing return type, but not in the place that GCC 8 expected
them, and we've gotten several bug reports about that. So let's allow them
there, as well.
gcc/cp/ChangeLog:
PR c++/90333
* parser.c (cp_parser_lambda_declarator_opt): Accept GNU attributes
between () and ->.
gcc/testsuite/ChangeLog:
PR c++/90333
* g++.dg/ext/attr-lambda3.C: New test.
Jason Merrill [Fri, 12 Feb 2021 00:45:22 +0000 (19:45 -0500)]
c++: variadic lambda template and empty pack [PR97246]
In get<0>, Is is empty, so the first parameter pack of the lambda is empty,
but after the fix for PR94546 we were wrongly associating it with the
partial instantiation of 'v'.
Substrings were not reduced early enough for use in initializations,
such as DATA statements. Add an early simplification for substrings
with constant starting and ending points.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_resolve_substring): Add prototype.
* primary.c (match_string_constant): Simplify substrings with
constant starting and ending points.
* resolve.c: Rename resolve_substring to gfc_resolve_substring.
(gfc_resolve_ref): Use renamed function gfc_resolve_substring.
gcc/testsuite/ChangeLog:
* substr_10.f90: New test.
* substr_9.f90: New test.
Christophe Lyon [Wed, 24 Feb 2021 15:51:52 +0000 (15:51 +0000)]
arm: Fix CMSE support detection in libgcc (PR target/99157)
As discussed in the PR, the Makefile fragment lacks a double '$' to
get the return-code from GCC invocation, resulting is CMSE support
missing from multilibs.
I checked that the simple patch proposed in the PR fixes the problem.
2021-02-23 Christophe Lyon <christophe.lyon@linaro.org>
Hau Hsu <hsuhau617@gmail.com>
PR target/99157
libgcc/
* config/arm/t-arm: Fix cmse support detection.
Paul Thomas [Tue, 23 Feb 2021 19:29:04 +0000 (19:29 +0000)]
Fortran: Fix for class defined operators [PR99124].
2021-02-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/99124
* resolve.c (resolve_fl_procedure): Include class results in
the test for F2018, C15100.
* trans-array.c (get_class_info_from_ss): Do not use the saved
descriptor to obtain the class expression for variables. Use
gfc_get_class_from_expr instead.
gcc/testsuite/
PR fortran/99124
* gfortran.dg/class_defined_operator_2.f03 : New test.
* gfortran.dg/elemental_result_2.f90 : New test.
* gfortran.dg/class_assign_4.f90: Correct the non-conforming
elemental function with an allocatable result with an operator
interface with array dummies and result.
PR target/85074
* config/pa/pa.c (TARGET_ASM_CAN_OUTPUT_MI_THUNK): Define as
hook_bool_const_tree_hwi_hwi_const_tree_true.
(pa_asm_output_mi_thunk): Add support for nonzero vcall_offset.
Kyrylo Tkachov [Fri, 19 Feb 2021 09:02:58 +0000 (09:02 +0000)]
aarch64: Introduce prefer_advsimd_autovec to GCC 10
This patch introduces the prefer_advsimd_autovec internal tune flag that's already available on the GCC 8 and 9 branches.
It allows a CPU tuning to specify that it prefers Advanced SIMD for autovectorisation rather than SVE.
In GCC 10 onwards this can be easily adjusted through the aarch64_autovec_preference param in the options override hook.
The neoversev1_tunings struct makes use of this tuning flag
Bootstrapped and tested on aarch64-none-linux-gnu.
Confirmed that an --param aarch64-autovec-preference can override the CPU setting if the user really wishes to.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (prefer_advsimd_autovec): Define.
* config/aarch64/aarch64.c (neoversev1_tunings): Use it.
(aarch64_override_options_internal): Adjust aarch64_autovec_preference
param when prefer_advsimd_autovec is enabled.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd_autovec_only_1.c: New test.